Changes

Summary

  1. [SPARK-35318][SQL] Hide internal view properties for describe table cmd (commit: 3f5a209)
  2. [SPARK-35240][SS] Use CheckpointFileManager for checkpoint file (commit: c6d3f37)
  3. [SPARK-35215][SQL] Update custom metric per certain rows and at the end (commit: 6cd5cf5)
  4. [SPARK-34526][SS] Ignore the error when checking the path in (commit: dfb3343)
  5. [SPARK-35326][BUILD] Upgrade Jersey to 2.34 (commit: bb93547)
  6. [SPARK-35326][BUILD][FOLLOWUP] Update dependency manifest files (commit: 482b43d)
  7. [SPARK-35293][SQL][TESTS][FOLLOWUP] Update the hash key to refresh (commit: e834ef7)
  8. [SPARK-35306][MLLIB][TESTS] Add benchmark results for BLASBenchmark (commit: 94bbca3)
  9. [SPARK-35133][SQL] Explain codegen works with AQE (commit: 42f59ca)
  10. [SPARK-34701][SQL][FOLLOW-UP] Children/innerChildren should be mutually (commit: 33c1034)
  11. [SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which (commit: e83910f)
  12. [SPARK-35020][SQL] Group exception messages in catalyst/util (commit: cf2c4ba)
  13. [SPARK-35333][SQL] Skip object null check in Invoke if possible (commit: 9aa18df)
  14. [SPARK-35144][SQL] Migrate to transformWithPruning for object rules (commit: 72d3266)
  15. [SPARK-35021][SQL] Group exception messages in connector/catalog (commit: d3b92ee)
  16. [SPARK-35175][BUILD] Add linter for JavaScript source files (commit: 2634dba)
  17. [SPARK-35297][CORE][DOC][MINOR] Modify the comment about the executor (commit: 6f0ef93)
  18. [SPARK-35288][SQL] StaticInvoke should find the method without exact (commit: 33fbf56)
  19. [SPARK-35321][SQL] Don't register Hive permanent functions when creating (commit: b4ec9e2)
  20. [SPARK-35261][SQL] Support static magic method for stateless Java (commit: f47e0f8)
  21. [SPARK-35232][SQL] Nested column pruning should retain column metadata (commit: 323a6e8)
  22. [SPARK-35331][SQL] Support resolving missing attrs for (commit: b025780)
  23. [SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause (commit: 06c4009)
Commit 3f5a20919cfa10af8e687bdee4c09f21f5fea3d4 by wenchen
[SPARK-35318][SQL] Hide internal view properties for describe table cmd

### What changes were proposed in this pull request?
Hide internal view properties in the output of the describe table command, because those
properties are generated by Spark and should be transparent to the end user.

### Why are the changes needed?
Avoid confusing users with internal properties.

### Does this PR introduce _any_ user-facing change?
Yes.
Before this change, the user sees the following output for `describe formatted test_view`:
```
....
Table Properties       [view.catalogAndNamespace.numParts=2, view.catalogAndNamespace.part.0=spark_catalog, view.catalogAndNamespace.part.1=default, view.query.out.col.0=c, view.query.out.col.1=v, view.query.out.numCols=2, view.referredTempFunctionsNames=[], view.referredTempViewNames=[]]
...
```
After this change, the internal properties are hidden for `describe formatted test_view`:
```
...
Table Properties        []
...
```
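
As a rough illustration of the idea, the sketch below drops properties whose keys start with `view.` before they are displayed. The class name, method name, and the prefix check are assumptions based on the output shown above; this is not the code in `interface.scala`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not Spark's implementation.
class ViewPropertyFilter {
    // Assumed prefix, taken from the keys shown in the "before" output.
    static final String INTERNAL_PREFIX = "view.";

    // Returns a copy of the properties without the internal view.* entries.
    static Map<String, String> visibleProperties(Map<String, String> props) {
        Map<String, String> visible = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (!e.getKey().startsWith(INTERNAL_PREFIX)) {
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }
}
```

User-defined properties pass through unchanged; only keys with the assumed internal prefix are hidden.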

### How was this patch tested?
existing UT

Closes #32441 from linhongliu-db/hide-properties.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3f5a209)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/create_view.sql.out
The file was modified sql/core/src/test/resources/sql-tests/results/show-tables.sql.out
The file was modified sql/core/src/test/resources/sql-tests/results/charvarchar.sql.out
The file was modified sql/core/src/test/resources/sql-tests/results/describe.sql.out
Commit c6d3f3778faa308308492fd758d2e9bd027f4768 by viirya
[SPARK-35240][SS] Use CheckpointFileManager for checkpoint file manipulation

### What changes were proposed in this pull request?

This patch changes a few places that use the `FileSystem` API to manipulate checkpoint files so that they use `CheckpointFileManager` instead.

### Why are the changes needed?

`CheckpointFileManager` is designed to handle checkpoint file manipulation. However, there are a few places that expose `FileSystem` for checkpoint files/paths. We should use `CheckpointFileManager` to manipulate checkpoint files. For example, we may want to use a dedicated storage system for checkpoint files. If all checkpoint file manipulation is performed through `CheckpointFileManager`, we only need to implement `CheckpointFileManager` for that storage system, and don't need to implement the `FileSystem` API for it.
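
The benefit of routing all access through one narrow interface can be sketched as follows. `SimpleCheckpointManager` is a hypothetical, heavily reduced stand-in and does not reproduce the real `CheckpointFileManager` methods; the in-memory map stands in for an arbitrary storage system.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, reduced stand-in for a checkpoint file manager interface.
interface SimpleCheckpointManager {
    boolean exists(String path);
    void write(String path, byte[] data);
    byte[] read(String path);
    void delete(String path);
}

// A storage system only has to implement the small interface above,
// not a full file system API. Here the "storage" is an in-memory map.
class InMemoryCheckpointManager implements SimpleCheckpointManager {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    public boolean exists(String path) { return files.containsKey(path); }

    public void write(String path, byte[] data) { files.put(path, data.clone()); }

    public byte[] read(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("No such file: " + path);
        }
        return data.clone();
    }

    public void delete(String path) { files.remove(path); }
}

// Caller code depends only on the manager, never on the storage API.
class MetadataWriter {
    // Writes the metadata file once; later calls leave the existing file alone.
    static void writeMetadata(SimpleCheckpointManager manager, String dir, String id) {
        String path = dir + "/metadata";
        if (!manager.exists(path)) {
            manager.write(path, id.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Swapping in a different storage backend then means writing another implementation of the small interface, while `MetadataWriter` stays untouched.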

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit tests.

Closes #32361 from viirya/checkpoint-manager.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: c6d3f37)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ResolveWriteToStream.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CheckpointFileManager.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetadata.scala
Commit 6cd5cf57229050ba9542a644ed0a4c844949a832 by wenchen
[SPARK-35215][SQL] Update custom metric per certain rows and at the end of the task

### What changes were proposed in this pull request?

This patch changes custom metric updating to happen once every fixed number of rows (currently 100), instead of once per row.

### Why are the changes needed?

Based on the previous discussion at https://github.com/apache/spark/pull/31451#discussion_r605413557, we should update custom metrics only once every fixed number of rows (e.g. 100) and also at the end of the task. Updating per row provides little benefit.
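
A minimal sketch of this update policy, assuming a hypothetical `BatchedMetricUpdater` class and a callback that receives the current row count (neither is taken from the Spark code):

```java
import java.util.function.LongConsumer;

// Hypothetical helper: publishes a metric every N rows and once at the end.
class BatchedMetricUpdater {
    private final int interval;          // e.g. 100 rows
    private final LongConsumer publish;  // receives the current row count
    private long rows = 0;

    BatchedMetricUpdater(int interval, LongConsumer publish) {
        this.interval = interval;
        this.publish = publish;
    }

    // Called once per row; publishes only on every interval-th row.
    void onRow() {
        rows++;
        if (rows % interval == 0) {
            publish.accept(rows);
        }
    }

    // Called at task completion so the final value is always reported.
    void onTaskEnd() {
        publish.accept(rows);
    }
}
```

With an interval of 100 and 250 rows, the callback fires at rows 100 and 200 and once more at the end of the task, instead of 250 times.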

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit test.

Closes #32330 from viirya/metric-update.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 6cd5cf5)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceRDD.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/metric/CustomMetrics.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDD.scala
Commit dfb3343423304dbd4b10e41cac6610d4c961cdeb by kabhwan.opensource
[SPARK-34526][SS] Ignore the error when checking the path in FileStreamSink.hasMetadata

### What changes were proposed in this pull request?
When checking the path in `FileStreamSink.hasMetadata`, we should ignore the error and assume the user wants to read a batch output.
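
The intended behavior can be sketched as a guarded check. The names below are hypothetical, and the predicate stands in for a file system lookup that may throw; the real method catches a different set of errors.

```java
import java.util.function.Predicate;

// Hypothetical sketch of "ignore the error and assume batch output".
class MetadataCheck {
    // pathHasMetadataDir stands in for a file system lookup that may throw.
    static boolean hasMetadata(String path, Predicate<String> pathHasMetadataDir) {
        try {
            return pathHasMetadataDir.test(path);
        } catch (RuntimeException e) {
            // Treat any failure as "no streaming metadata": read as batch output.
            return false;
        }
    }
}
```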

### Why are the changes needed?
Keep the original behavior of ignoring the error.

### Does this PR introduce _any_ user-facing change?
Yes.
Path checking no longer throws an exception when checking the file sink format.

### How was this patch tested?
New UT added.

Closes #31638 from xuanyuanking/SPARK-34526.

Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(commit: dfb3343)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
Commit bb93547cdf0791c38dffaf2ca28bf04b85680100 by dhyun
[SPARK-35326][BUILD] Upgrade Jersey to 2.34

### What changes were proposed in this pull request?

This PR upgrades Jersey to 2.34.

### Why are the changes needed?

CVE-2021-28168, a local information disclosure vulnerability, is reported (https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-28168).
Spark 3.1.1, 3.0.2 and 3.2.0 use an affected version 2.30.

### Does this PR introduce _any_ user-facing change?

It's not clear how much the impact is but Spark uses an affected version of Jersey so I think it's better to upgrade it just in case.

### How was this patch tested?

CI.

Closes #32453 from sarutak/upgrade-jersey.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: bb93547)
The file was modified pom.xml
Commit 482b43d78de2fbeb85a2ba54c59e08dab45f59aa by dhyun
[SPARK-35326][BUILD][FOLLOWUP] Update dependency manifest files

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/32453.

### Why are the changes needed?

Jenkins doesn't check dependency manifest files.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action or manually.

Closes #32458 from dongjoon-hyun/SPARK-35326.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 482b43d)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3
Commit e834ef74dcbfc29f5288a41392dc3d5c08119fcf by dhyun
[SPARK-35293][SQL][TESTS][FOLLOWUP] Update the hash key to refresh TPC-DS cache data in forked GA jobs

### What changes were proposed in this pull request?

This is a follow-up PR of #32420 that updates the hash key to refresh TPC-DS cache data in forked GA jobs.

### Why are the changes needed?

To recover GA jobs.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA passed.

Closes #32460 from maropu/SPARK-35293-FOLLOWUP.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: e834ef7)
The file was modified .github/workflows/build_and_test.yml
Commit 94bbca3e55c924f85869cef2dd08bd4703ee1697 by gurwls223
[SPARK-35306][MLLIB][TESTS] Add benchmark results for BLASBenchmark created by GitHub Actions machines

### What changes were proposed in this pull request?
This PR adds benchmark results for `BLASBenchmark` created by GitHub Actions machines.
Benchmark result files are added for both JDK 8 (`BLASBenchmark-results.txt`) and JDK 11 (`BLASBenchmark-jdk11-results.txt`) in `{SPARK_HOME}/mllib-local/benchmarks/`.

### Why are the changes needed?
In [SPARK-34950](https://issues.apache.org/jira/browse/SPARK-34950), benchmark results were updated to the ones created by GitHub Actions machines.
As benchmark results for `BLASBenchmark` (added in [SPARK-33882](https://issues.apache.org/jira/browse/SPARK-33882) and [SPARK-35150](https://issues.apache.org/jira/browse/SPARK-35150)) are not currently available in the repository, this PR adds them.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
The benchmark results were obtained by running the tests with the GitHub Actions workflow in my forked repository.
You can refer to the test results and output files from the links below.
- https://github.com/byungsoo-oh/spark/actions/runs/809900377
- https://github.com/byungsoo-oh/spark/actions/runs/810084610

Closes #32435 from byungsoo-oh/SPARK-35306.

Authored-by: byungsoo <byungsoo@byungsoo-pc.tn.corp.samsungelectronics.net>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 94bbca3)
The file was added mllib-local/benchmarks/BLASBenchmark-jdk11-results.txt
The file was added mllib-local/benchmarks/BLASBenchmark-results.txt
Commit 42f59caf735993f520220920c968214f669db5ba by dhyun
[SPARK-35133][SQL] Explain codegen works with AQE

### What changes were proposed in this pull request?

`EXPLAIN CODEGEN <query>` (and `Dataset.explain("codegen")`) prints the generated code for each stage of the plan. The current implementation matches the `WholeStageCodegenExec` operator in the query plan and prints its generated code (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala#L111-L118 ). This does not work with AQE, because the whole query plan is wrapped inside `AdaptiveSparkPlanExec` and the whole-stage code-gen physical plan rule (`CollapseCodegenStages`) is not run eagerly. Since AQE is now enabled by default, this introduces an unexpected behavior change for `EXPLAIN` queries (and `Dataset.explain`).

The change is to explain code-gen for the current executed plan of AQE.

### Why are the changes needed?

Make `EXPLAIN CODEGEN` work the same as before.

### Does this PR introduce _any_ user-facing change?

No (when comparing with latest Spark release 3.1.1).

### How was this patch tested?

Added unit test in `ExplainSuite.scala`.

Closes #32430 from c21/explain-aqe.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 42f59ca)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala
Commit 33c1034315af126b3fbcaab385a9c5e8561c1709 by wenchen
[SPARK-34701][SQL][FOLLOW-UP] Children/innerChildren should be mutually exclusive for AnalysisOnlyCommand

### What changes were proposed in this pull request?

This is a follow up to https://github.com/apache/spark/pull/32032#discussion_r620928086. Basically, `children`/`innerChildren` should be mutually exclusive for `AlterViewAsCommand` and `CreateViewCommand`, which extend `AnalysisOnlyCommand`. Otherwise, there could be an issue in the `EXPLAIN` command. Currently, this is not an issue, because these commands will be analyzed (children will always be empty) when the `EXPLAIN` command is run.
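
One way to picture the mutual exclusion is a command that exposes its wrapped plan through exactly one of the two accessors, depending on whether it has been analyzed. The class below is a hypothetical stand-in (a string plays the role of the plan) and is only a sketch of the invariant, not the actual `AnalysisOnlyCommand` code.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a command that wraps a single plan.
class AnalysisOnlyCommandSketch {
    private final String plan;        // stands in for a logical plan
    private final boolean isAnalyzed;

    AnalysisOnlyCommandSketch(String plan, boolean isAnalyzed) {
        this.plan = plan;
        this.isAnalyzed = isAnalyzed;
    }

    // Non-empty only before analysis.
    List<String> children() {
        return isAnalyzed
            ? Collections.<String>emptyList()
            : Collections.singletonList(plan);
    }

    // Non-empty only after analysis, so the plan is never listed twice.
    List<String> innerChildren() {
        return isAnalyzed
            ? Collections.singletonList(plan)
            : Collections.<String>emptyList();
    }
}
```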

### Why are the changes needed?

To be future-proof in case these commands are used directly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added new tests.

Closes #32447 from imback82/SPARK-34701-followup.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 33c1034)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala
Commit e83910f1f89d39d90153219cf5c2f44b070f75b6 by wenchen
[SPARK-26164][SQL][FOLLOWUP] WriteTaskStatsTracker should know which file the row is written to

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/32198

Before https://github.com/apache/spark/pull/32198, in `WriteTaskStatsTracker.newRow`, we knew that the row was written to the current file. After https://github.com/apache/spark/pull/32198, this connection is no longer known.

This PR adds a file path parameter to `WriteTaskStatsTracker.newRow` to restore the connection.
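
To see why a custom tracker might need the path, consider a sketch that counts rows per output file. `RowTracker` is a hypothetical, single-method stand-in; the real `WriteTaskStatsTracker` has more callbacks and different parameter types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker interface; the real trait has more callbacks.
interface RowTracker {
    void newRow(String filePath, Object row);
}

// Example custom tracker that needs to know which file each row goes to.
class RowsPerFileTracker implements RowTracker {
    final Map<String, Long> rowsPerFile = new HashMap<>();

    @Override
    public void newRow(String filePath, Object row) {
        // Increment the counter for the file this row is written to.
        rowsPerFile.merge(filePath, 1L, Long::sum);
    }
}
```

Without the path argument, such a tracker would have to guess which file is current, which is exactly the information that was lost.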

### Why are the changes needed?

To not break some custom `WriteTaskStatsTracker` implementations.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #32459 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: e83910f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/CustomWriteTaskStatsTrackerSuite.scala
Commit cf2c4ba584dd5ae2f8608ff9164ee907be751ca2 by wenchen
[SPARK-35020][SQL] Group exception messages in catalyst/util

### What changes were proposed in this pull request?
This PR groups exception messages in `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util`.

### Why are the changes needed?
It will largely help with the standardization of error messages and their maintenance.
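
The grouping pattern can be sketched as a central class of error factory methods that call sites use instead of building messages inline. All names and the message text below are made up for illustration; they are not methods of `QueryExecutionErrors` or `QueryCompilationErrors`.

```java
// Hypothetical central place for error construction.
class ExecutionErrorsSketch {
    static IllegalArgumentException negativeValueError(String name, long value) {
        return new IllegalArgumentException(
            name + " must be non-negative, got " + value);
    }
}

// Hypothetical call site in a utility class.
class IntervalSketch {
    static long checkedMonths(long months) {
        if (months < 0) {
            // The message lives in one shared place, not at the call site.
            throw ExecutionErrorsSketch.negativeValueError("months", months);
        }
        return months;
    }
}
```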

### Does this PR introduce _any_ user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any existing behavior.

Closes #32367 from beliefer/SPARK-35020.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: cf2c4ba)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TypeUtils.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilder.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
Commit 9aa18dfe19561914498e013f8597973a9bf946f7 by wenchen
[SPARK-35333][SQL] Skip object null check in Invoke if possible

### What changes were proposed in this pull request?

If `targetObject` is not nullable, we don't need the object null check in `Invoke`.
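
As a heavily simplified illustration, a code generator can emit the guard only when the target may be null. The generator below is hypothetical and produces plain strings; it does not reflect the actual code generation in `objects.scala`.

```java
// Hypothetical, heavily simplified code generator for a method call.
class InvokeCodegenSketch {
    static String genCall(String obj, String method, boolean objNullable) {
        String call = obj + "." + method + "()";
        if (!objNullable) {
            // Target can never be null: emit the call directly.
            return "result = " + call + ";";
        }
        // Otherwise guard the call with a null check.
        return "if (" + obj + " != null) { result = " + call
            + "; } else { isNull = true; }";
    }
}
```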

### Why are the changes needed?

small perf improvement

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #32466 from cloud-fan/invoke.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 9aa18df)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
Commit 72d32662d4744440e286a639783fed8dcf6c3948 by ltnwgl
[SPARK-35144][SQL] Migrate to transformWithPruning for object rules

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- APPEND_COLUMNS
- DESERIALIZE_TO_OBJECT
- LAMBDA_VARIABLE
- MAP_OBJECTS
- SERIALIZE_FROM_OBJECT
- PROJECT
- TYPED_FILTER

Added tree traversal pruning to the following rules dealing with objects:
- EliminateSerialization
- CombineTypedFilters
- EliminateMapObjects
- ObjectSerializerPruning

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.
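
The pruning idea can be sketched with a tree whose nodes cache the set of patterns present in their subtree, so a traversal can skip any subtree that cannot contain a match. The enum, node class, and `visit` method are hypothetical and much simpler than Catalyst's `TreePattern` machinery.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;

// Hypothetical subset of pattern tags.
enum TreePatternSketch { PROJECT, TYPED_FILTER, MAP_OBJECTS }

// Hypothetical tree node that caches which patterns occur in its subtree.
class PlanNode {
    final String name;
    final List<PlanNode> children;
    final EnumSet<TreePatternSketch> subtree;

    PlanNode(String name, EnumSet<TreePatternSketch> own, PlanNode... children) {
        this.name = name;
        this.children = Arrays.asList(children);
        this.subtree = EnumSet.copyOf(own);
        for (PlanNode c : children) {
            this.subtree.addAll(c.subtree);
        }
    }
}

class PruningTraversal {
    // Records the names of the nodes the traversal actually visits.
    static void visit(PlanNode n, TreePatternSketch required, List<String> visited) {
        if (!n.subtree.contains(required)) {
            return; // Nothing in this subtree can match: skip it entirely.
        }
        visited.add(n.name);
        for (PlanNode c : n.children) {
            visit(c, required, visited);
        }
    }
}
```

A rule that only cares about `TYPED_FILTER` never descends into branches without one, which is where the reduction in traversal work comes from.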

### How was this patch tested?

Existing tests.

Closes #32451 from sigmod/object.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: 72d3266)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleIdCollection.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala
Commit d3b92eec454380207f3d5b5c0974cdaf8aa78b68 by wenchen
[SPARK-35021][SQL] Group exception messages in connector/catalog

### What changes were proposed in this pull request?
This PR groups exception messages in `sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog`.

### Why are the changes needed?
It will largely help with the standardization of error messages and their maintenance.

### Does this PR introduce _any_ user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any existing behavior.

Closes #32377 from beliefer/SPARK-35021.

Lead-authored-by: beliefer <beliefer@163.com>
Co-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: d3b92ee)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
Commit 2634dbac35c5e8d5216b38fd4256f5fd059f341f by sarutak
[SPARK-35175][BUILD] Add linter for JavaScript source files

### What changes were proposed in this pull request?

This PR proposes to add a linter for JavaScript source files.
[ESLint](https://eslint.org/) seems to be a popular linter for JavaScript, so I chose it.

### Why are the changes needed?

A linter lets us check style and keep the code clean.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually run `dev/lint-js` (Node.js and npm are required).

This PR also fixes existing style issues (mainly indentation) so that the linter passes.

Closes #32274 from sarutak/introduce-eslint.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: 2634dba)
The file was modified core/src/main/resources/org/apache/spark/ui/static/historypage.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/executorspage.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/structured-streaming-page.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/timeline-view.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/log-view.js
The file was added dev/package-lock.json
The file was modified .gitignore
The file was modified core/src/main/resources/org/apache/spark/ui/static/streaming-page.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/table.js
The file was modified dev/.rat-excludes
The file was modified core/src/main/resources/org/apache/spark/ui/static/utils.js
The file was added dev/eslint.json
The file was added dev/lint-js
The file was modified core/src/main/resources/org/apache/spark/ui/static/stagepage.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/historypage-common.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/spark-dag-viz.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/webui.js
The file was modified core/src/main/resources/org/apache/spark/ui/static/initialize-tooltips.js
The file was added dev/package.json
Commit 6f0ef93f9a72ee704785cfe2421e3fe3587b4df6 by yamamuro
[SPARK-35297][CORE][DOC][MINOR] Modify the comment about the executor

### What changes were proposed in this pull request?
Spark executors can now also be used with the Kubernetes scheduler, so the comment in `Executor.scala` should be updated accordingly.

### Why are the changes needed?
This is a comment-only change.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
no

Closes #32426 from jerqi/master.

Authored-by: RoryQi <1242949407@qq.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 6f0ef93)
The file was modified core/src/main/scala/org/apache/spark/executor/Executor.scala
Commit 33fbf5647b4a5587c78ac51339c0cbc9d70547a4 by viirya
[SPARK-35288][SQL] StaticInvoke should find the method without exact argument classes match

### What changes were proposed in this pull request?

This patch proposes to make `StaticInvoke` able to find a method by the given method name even when the parameter types do not exactly match the argument classes.

### Why are the changes needed?

Unlike `Invoke`, `StaticInvoke` only tries to get the method with the exact argument classes. If the parameter types of the method being called do not exactly match the argument classes, `StaticInvoke` cannot find the method.

`StaticInvoke` should be able to find the method in such cases too.
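
The difference between an exact lookup and a more lenient one can be sketched with plain Java reflection. The fallback rule used here (first public method with the same name and parameter count) is an assumption chosen for brevity, not necessarily the rule Spark applies, and `MethodFinderSketch` is a hypothetical name.

```java
import java.lang.reflect.Method;

class MethodFinderSketch {
    // Tries an exact-signature lookup first, then falls back to a search by
    // method name and parameter count. Returns null if nothing matches.
    static Method findMethod(Class<?> cls, String name, Class<?>... argClasses) {
        try {
            return cls.getMethod(name, argClasses);
        } catch (NoSuchMethodException e) {
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(name)
                        && m.getParameterCount() == argClasses.length) {
                    return m;
                }
            }
            return null;
        }
    }
}
```

For example, `java.util.Objects.isNull` is declared with an `Object` parameter, so an exact lookup with `String.class` throws `NoSuchMethodException`, while the fallback still finds the method.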

### Does this PR introduce _any_ user-facing change?

Yes. `StaticInvoke` can find a method even when the argument classes do not exactly match.

### How was this patch tested?

Unit test.

Closes #32413 from viirya/static-invoke.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 33fbf56)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
Commit b4ec9e230484db88c6220c27e43e3db11f3bdeef by dhyun
[SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client

### What changes were proposed in this pull request?

Instantiate a new Hive client through `Hive.getWithFastCheck(conf, false)` instead of `Hive.get(conf)`.

### Why are the changes needed?

[HIVE-10319](https://issues.apache.org/jira/browse/HIVE-10319) introduced a new API `get_all_functions`, which is only supported in Hive 1.3.0/2.0.0 and up. As a result, when Spark 3.x talks to an HMS service of version 1.2 or lower, the following error occurs:
```
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3897)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
        at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
        ... 96 more
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3845)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3833)
```

`get_all_functions` is called only when `doRegisterAllFns` is set to true:
```java
  private Hive(HiveConf c, boolean doRegisterAllFns) throws HiveException {
    conf = c;
    if (doRegisterAllFns) {
      registerAllFunctionsOnce();
    }
  }
```

This registers all Hive permanent functions defined in HMS in Hive's `FunctionRegistry` class by iterating through the results of `get_all_functions`. For Spark, this seems unnecessary, because it loads Hive permanent (not built-in) UDFs by calling the HMS API directly, i.e., `get_function`. The `FunctionRegistry` is only used for loading Hive built-in functions that are not supported by Spark. At this time, that applies only to `histogram_numeric`.

### Does this PR introduce _any_ user-facing change?

Yes. With this fix, Spark should be able to talk to an HMS server running Hive 1.2.x and lower (with HIVE-24608 too).

### How was this patch tested?

Manually started an HMS server of Hive version 1.2.2, with patched Hive 2.3.8 using HIVE-24608. Without the PR, it failed with the above exception. With the PR, the error disappeared and I could successfully perform common operations such as create table, create database, list tables, etc.

Closes #32446 from sunchao/SPARK-35321.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: b4ec9e2)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
Commit f47e0f83794fc9beee3c07dca4c0bb7e0eab81e4 by dhyun
[SPARK-35261][SQL] Support static magic method for stateless Java ScalarFunction

### What changes were proposed in this pull request?

This allows a `ScalarFunction` implemented in Java to optionally declare the magic method `invoke` as static, which can be used if the UDF is stateless. Compared to the non-static method, it can potentially give better performance due to the elimination of dynamic dispatch, etc.

Also added a benchmark to measure performance of: the default `produceResult`, non-static magic method and static magic method.

### Why are the changes needed?

For UDFs that are stateless (e.g., no need to maintain intermediate state between function calls), it is better to allow users to implement the UDF as a static method, which could potentially give better performance.
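
The difference between the two flavors can be sketched with two plain Java classes. To stay self-contained, the classes below do not implement Spark's `ScalarFunction` interface, and the reflective check is only an illustrative assumption about how a static `invoke` might be detected; all class names are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Stateless function: invoke is static, so no instance is needed to call it.
class LongAddStatic {
    public static long invoke(long left, long right) {
        return left + right;
    }
}

// Variant for comparison: invoke is an instance method.
class LongAddInstance {
    public long invoke(long left, long right) {
        return left + right;
    }
}

class MagicMethodLookup {
    // Returns true when the class has a public static invoke(long, long).
    static boolean hasStaticInvoke(Class<?> cls) {
        try {
            Method m = cls.getMethod("invoke", long.class, long.class);
            return Modifier.isStatic(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

A caller that detects the static variant can call `LongAddStatic.invoke(1L, 2L)` directly, without holding a function object.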

### Does this PR introduce _any_ user-facing change?

Yes. Spark users can now choose to define a static magic method for a `ScalarFunction` when it is written in Java and the UDF is stateless.

### How was this patch tested?

Added new UT.

Closes #32407 from sunchao/SPARK-35261.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: f47e0f8)
The file was added sql/core/benchmarks/V2FunctionBenchmark-results.txt
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ScalarFunction.java
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/connector/catalog/functions/JavaStrLen.java
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2FunctionSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/connector/functions/V2FunctionBenchmark.scala
The file was added sql/core/benchmarks/V2FunctionBenchmark-jdk11-results.txt
The file was added sql/core/src/test/java/test/org/apache/spark/sql/connector/catalog/functions/JavaLongAdd.java
Commit 323a6e848e29cab7890fab572400863e73faed4b by viirya
[SPARK-35232][SQL] Nested column pruning should retain column metadata

### What changes were proposed in this pull request?

Retain column metadata during the process of nested column pruning, when constructing `StructField`.

To test the above change, this PR also adds column projection logic to `InMemoryTable`. Without the fix, `DSV2CharVarcharDDLTestSuite` fails.

### Why are the changes needed?

The column metadata is used in a few places, for example to reconstruct CHAR/VARCHAR information as in [SPARK-33901](https://issues.apache.org/jira/browse/SPARK-33901). Therefore, we should retain it during nested column pruning.
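
A minimal sketch of the fix, using a hypothetical flat `FieldSketch` class in place of `StructField` and a made-up metadata key: when a pruned schema is rebuilt, the metadata has to be copied along with the name and type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, flattened stand-in for a struct field.
class FieldSketch {
    final String name;
    final String dataType;
    final Map<String, String> metadata;

    FieldSketch(String name, String dataType, Map<String, String> metadata) {
        this.name = name;
        this.dataType = dataType;
        this.metadata = metadata;
    }
}

class PruneSketch {
    // Keeps only the requested fields and carries their metadata over.
    static List<FieldSketch> prune(List<FieldSketch> fields, Set<String> requested) {
        List<FieldSketch> kept = new ArrayList<>();
        for (FieldSketch f : fields) {
            if (requested.contains(f.name)) {
                // Rebuilding from name and type alone would drop the metadata.
                kept.add(new FieldSketch(f.name, f.dataType, f.metadata));
            }
        }
        return kept;
    }
}
```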

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #32354 from sunchao/SPARK-35232.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 323a6e8)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTable.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala
Commit b0257801d582cbb5ab4a90a9fc0735a2127a0d41 by dongjoon
[SPARK-35331][SQL] Support resolving missing attrs for distribute/cluster by/repartition hint

### What changes were proposed in this pull request?

This PR makes the following query work. Before the fix, analysis failed with the `AnalysisException` shown in the plan output below.

```sql
select a b from values(1) t(a) distribute by a;
```

```logtalk
== Parsed Logical Plan ==
'RepartitionByExpression ['a]
+- 'Project ['a AS b#42]
   +- 'SubqueryAlias t
      +- 'UnresolvedInlineTable [a], [List(1)]

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: cannot resolve 'a' given input columns: [b]; line 1 pos 62;
'RepartitionByExpression ['a]
+- Project [a#48 AS b#42]
   +- SubqueryAlias t
      +- LocalRelation [a#48]
```
### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

yes, the original attributes can be used in `distribute by` / `cluster by` and hints like `/*+ REPARTITION(3, c) */`

### How was this patch tested?

new tests

Closes #32465 from yaooqinn/SPARK-35331.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: b025780)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
Commit 06c40091a6d2218132b43e625c9d7acbc9affc9e by yamamuro
[SPARK-35327][SQL][TESTS] Filters out the TPC-DS queries that can cause flaky test results

### What changes were proposed in this pull request?

This PR proposes to filter out TPCDS v1.4 q6 and q75 in `TPCDSQueryTestSuite`.

I saw `TPCDSQueryTestSuite` fail nondeterministically because the output row order differed from that in the golden files. For example, the failure in the GA job, https://github.com/linhongliu-db/spark/runs/2507928605?check_suite_focus=true, happened because the `tpcds/q6.sql` query output rows were only sorted by `cnt`:

https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds/q6.sql#L20
Actually, `tpcds/q6.sql` and `tpcds-v2.7.0/q6.sql` are almost the same; the only difference is that `tpcds-v2.7.0/q6.sql` sorts by both `cnt` and `a.ca_state`:
https://github.com/apache/spark/blob/a0c76a8755a148e2bd774edcda12fe20f2f38c75/sql/core/src/test/resources/tpcds-v2.7.0/q6.sql#L22
So, I think it's okay to test only `tpcds-v2.7.0/q6.sql` in this case (q75 has the same issue).
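
Why the second sort key matters can be sketched outside SQL. With ties on `cnt`, the relative order of the tied rows depends on the order in which they arrive; adding the state as a tie-breaker makes the result independent of the input order. The row class and comparator below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical result row: a state and its count.
class StateCount {
    final String state;
    final long cnt;

    StateCount(String state, long cnt) {
        this.state = state;
        this.cnt = cnt;
    }
}

class OrderingSketch {
    // cnt alone does not break ties; adding state as a second key does.
    static final Comparator<StateCount> BY_CNT_THEN_STATE =
        Comparator.comparingLong((StateCount r) -> r.cnt)
                  .thenComparing((StateCount r) -> r.state);

    // Returns the states in sorted order without modifying the input list.
    static List<String> sortedStates(List<StateCount> rows) {
        List<StateCount> copy = new ArrayList<>(rows);
        copy.sort(BY_CNT_THEN_STATE);
        List<String> states = new ArrayList<>();
        for (StateCount r : copy) {
            states.add(r.state);
        }
        return states;
    }
}
```

Two inputs holding the same rows in different orders then produce the same output, which is what a golden-file comparison needs.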

### Why are the changes needed?

For stable testing.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

GA passed.

Closes #32454 from maropu/CleanUpTpcdsQueries.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 06c4009)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/TPCDSQueryTestSuite.scala
The file was removed sql/core/src/test/resources/tpcds-query-results/v1_4/q6.sql.out
The file was modified sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
The file was removed sql/core/src/test/resources/tpcds-query-results/v1_4/q75.sql.out