Changes

Summary

  1. [PYTHON][MINOR] Fix docstring of DataFrame.join (commit: 747ad18) (details)
  2. [SPARK-34291][ML] LSH hashDistance optimization (commit: a9969fa) (details)
  3. [SPARK-34141][SQL] Remove side effect from ExtractGenerator (commit: c73f70b) (details)
  4. [SPARK-34356][ML] OVR transform fix potential column conflict (commit: 178dc50) (details)
  5. [SPARK-34387][CORE][TESTS] Add ZStandardBenchmark (commit: 466c045) (details)
  6. [SPARK-34385][SQL] Unwrap `SparkUpgradeException` in v2 Parquet (commit: 6845d26) (details)
  7. [SPARK-34370][SQL] Support Avro schema evolution for partitioned Hive (commit: cc508d1) (details)
  8. [SPARK-34342][SQL] Format DateLiteral and TimestampLiteral toString (commit: 6e05e99) (details)
  9. [SPARK-34390][CORE] Enable Zstandard buffer pool by default (commit: eb5558e) (details)
  10. [SPARK-34398][DOCS] Fix PySpark migration link (commit: 34a1a65) (details)
  11. [SPARK-34158] Incorrect url of the only developer Matei in pom.xml (commit: 30ef3d6) (details)
  12. [SPARK-34346][CORE][TESTS][FOLLOWUP] Fix UT by removing core-site.xml (commit: dcaf62a) (details)
  13. [SPARK-34391][BUILD] Upgrade commons-io to 2.8.0 (commit: 329f945) (details)
  14. [SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with (commit: b344e91) (details)
  15. [SPARK-34355][SQL] Add log and time cost for commit job (commit: 9270238) (details)
  16. [SPARK-34157][SQL] Unify output of SHOW TABLES and pass output (commit: 2c243c9) (details)
  17. [SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other (commit: 70a79e9) (details)
  18. [SPARK-33354][DOC] Remove an unnecessary quote in doc (commit: 88ced28) (details)
  19. [MINOR] Add a note about pip installation test in RC for release vote (commit: 556ecd6) (details)
  20. [SPARK-34157][BUILD][FOLLOW-UP] Fix Scala 2.13 compilation error via (commit: 70ef196) (details)
  21. [SPARK-34377][SQL] Add new parquet datasource options to control (commit: a854906) (details)
  22. [SPARK-33438][SQL] Eagerly init objects with defined SQL Confs for (commit: 037bfb2) (details)
  23. [MINOR][SQL][FOLLOW-UP] Add assertion to (commit: d1131bc) (details)
  24. [SPARK-34388][SQL] Propagate the registered UDF name to ScalaUDF, (commit: c92e408) (details)
  25. [SPARK-34168][SQL] Support DPP in AQE when the join is Broadcast hash (commit: 3b26bc2) (details)
  26. [SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or (commit: 777d51e) (details)
  27. [SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows (commit: e65b28c) (details)
  28. [SPARK-34395][SQL] Clean up unused code for code simplifications (commit: 37fe8c6) (details)
  29. [SPARK-34405][CORE] Fix mean value of timersLabels in the (commit: a1e75ed) (details)
  30. [SPARK-34407][K8S] KubernetesClusterSchedulerBackend.stop should clean (commit: ea339c3) (details)
  31. [SPARK-34355][CORE][SQL][FOLLOWUP] Log commit time in all File Writer (commit: 7ea3a33) (details)
  32. [MINOR][ML][TESTS] Increase tolerance to make NaiveBayesSuite more (commit: 18b3010) (details)
  33. [SPARK-34334][K8S] Correctly identify timed out pending pod requests as (commit: b2dc38b) (details)
  34. [SPARK-34209][SQL] Delegate table name validation to the session catalog (commit: cf7a13c) (details)
  35. [SPARK-34363][CORE] Add an option for limiting storage for migrated (commit: 2b51843) (details)
  36. [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & (commit: 50641d2) (details)
  37. Revert "[SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning (commit: c8628c9) (details)
  38. [SPARK-34080][ML][PYTHON][FOLLOW-UP] Update score function in (commit: 1fbd576) (details)
  39. [SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & (commit: 5248ecb) (details)
  40. [SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS Keep consistence with (commit: 3e12e9d) (details)
  41. [SPARK-34311][SQL] PostgresDialect can't treat arrays of some types (commit: f79305a) (details)
  42. [SPARK-34240][SQL] Unify output of `SHOW TBLPROPERTIES` clause's output (commit: 123365e) (details)
  43. [SPARK-34137][SQL] Update suquery's stats when build LogicalPlan's stats (commit: 2f387b4) (details)
  44. [SPARK-33995][SQL] Expose make_interval as a Scala function (commit: e6753c9) (details)
  45. [SPARK-31816][SQL][DOCS] Added high level description about JDBC (commit: 0a37a95) (details)
  46. [SPARK-34347][SQL] CatalogImpl.uncacheTable should invalidate in cascade (commit: 0986f16) (details)
  47. [SPARK-34404][SQL] Add new Avro datasource options to control datetime (commit: c082c53) (details)
  48. [SPARK-34234][SQL] Remove TreeNodeException that didn't work (commit: 32a523b) (details)
Commit 747ad1809b4026aae4a7bedec2cac485bddcd5f2 by srowen
[PYTHON][MINOR] Fix docstring of DataFrame.join

### What changes were proposed in this pull request?
Fix docstring of PySpark `DataFrame.join`.

### Why are the changes needed?
For a better view of PySpark documentation.

### Does this PR introduce _any_ user-facing change?
No (only documentation changes).

### How was this patch tested?
Manual test.

From
![image](https://user-images.githubusercontent.com/47337188/106977730-c14ab080-670f-11eb-8df8-5aea90902104.png)

To
![image](https://user-images.githubusercontent.com/47337188/106977834-ed663180-670f-11eb-9c5e-d09be26e0ca8.png)

Closes #31463 from xinrong-databricks/fixDoc.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 747ad18)
The file was modified python/pyspark/sql/dataframe.py (diff)
Commit a9969faca7253e630709466ddebcbbe28d052f96 by srowen
[SPARK-34291][ML] LSH hashDistance optimization

### What changes were proposed in this pull request?
`hashDistance` optimization: if the two vectors in a pair are the same, return 0.0 directly.

### Why are the changes needed?
It should be faster than the existing implementation because of the short-circuit.
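
As an illustration, here is a minimal, self-contained sketch of the short-circuit idea. This is not the actual Spark code: the real `hashDistance` in `MinHashLSH` operates on sequences of hash vectors, while this sketch assumes plain `Array[Double]` hash values.

```scala
// Sketch only: once a matching pair of hash values is found, the distance
// cannot be lower than 0.0, so return immediately instead of scanning on.
def hashDistance(x: Array[Double], y: Array[Double]): Double = {
  var minDist = Double.MaxValue
  var i = 0
  while (i < x.length) {
    if (x(i) == y(i)) return 0.0 // short-circuit
    val d = math.abs(x(i) - y(i))
    if (d < minDist) minDist = d
    i += 1
  }
  minDist
}
```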

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suites.

Closes #31394 from zhengruifeng/min_hash_distance_opt.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: a9969fa)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/LSHTest.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/MinHashLSH.scala (diff)
Commit c73f70bb0dc0234d349a362b379e5ff6c0029965 by srowen
[SPARK-34141][SQL] Remove side effect from ExtractGenerator

### What changes were proposed in this pull request?

Rewrote one `ExtractGenerator` case such that it does not rely on a side effect of the `flatMap` function.

### Why are the changes needed?

With the dataframe api it is possible to have a lazy sequence as the `output` of a `LogicalPlan`. When exploding a column on this dataframe using the `withColumn("newName", explode(col("name")))` method, the `ExtractGenerator` does not extract the generator and `CheckAnalysis` would throw an exception.

### Does this PR introduce _any_ user-facing change?

Bug fix.
Before this fix, the workaround was to put `.select("*")` before the explode; a repro is sketched below.
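
A repro sketch of the scenario, assuming an active `SparkSession` named `spark` (column names and data are illustrative):

```scala
import org.apache.spark.sql.functions.{col, explode}

// A DataFrame with an array column, exploded via withColumn. Before this
// fix, a plan whose `output` was a lazy sequence could fail CheckAnalysis
// here; the workaround was to call .select("*") first.
val df = spark.range(1).selectExpr("array('a', 'b') AS name")
df.withColumn("newName", explode(col("name"))).show()
```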

### How was this patch tested?

UT

Closes #31213 from tanelk/SPARK-34141_extract_generator.

Authored-by: tanel.kiis@gmail.com <tanel.kiis@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: c73f70b)
The file was added sql/catalyst/src/test/scala-2.12/org/apache/spark/sql/catalyst/analysis/ExtractGeneratorSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was added sql/catalyst/src/test/scala-2.13/org/apache/spark/sql/catalyst/analysis/ExtractGeneratorSuite.scala
Commit 178dc50b7aa07e565743077611101d8e7c593df6 by srowen
[SPARK-34356][ML] OVR transform fix potential column conflict

### What changes were proposed in this pull request?
1. Clear `predictionCol` and `probabilityCol` and use a temporary raw-prediction column, to avoid potential column conflicts;
2. Use an array instead of a map, to keep in line with the Python side;
3. Simplify `transform`.

### Why are the changes needed?
If the input dataset has a column whose name matches `predictionCol`, `probabilityCol`, or `rawPredictionCol`, `transform` will fail.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a test suite.

Closes #31472 from zhengruifeng/ovr_submodel_skip_pred_prob.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 178dc50)
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala (diff)
Commit 466c045bfac20b6ce19f5a3732e76a5de4eb4e4a by dhyun
[SPARK-34387][CORE][TESTS] Add ZStandardBenchmark

### What changes were proposed in this pull request?

This PR aims to add ZStandardBenchmark as a base-line.

### Why are the changes needed?

This will prevent any regression when we upgrade Zstandard library in the future.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually.

Closes #31498 from dongjoon-hyun/SPARK-ZSTD-BENCH.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 466c045)
The file was added core/benchmarks/ZStandardBenchmark-results.txt
The file was added core/benchmarks/ZStandardBenchmark-jdk11-results.txt
The file was added core/src/test/scala/org/apache/spark/io/ZStandardBenchmark.scala
Commit 6845d26057c4c600e5183c0e93cafb35db282851 by dhyun
[SPARK-34385][SQL] Unwrap `SparkUpgradeException` in v2 Parquet datasource

### What changes were proposed in this pull request?
Unwrap `SparkUpgradeException` from `ParquetDecodingException` in v2 `FilePartitionReader` in the same way as v1 implementation does: https://github.com/apache/spark/blob/3a299aa6480ac22501512cd0310d31a441d7dfdc/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L180-L183

### Why are the changes needed?
1. To be compatible with v1 implementation of the Parquet datasource.
2. To improve UX with Spark SQL by making `SparkUpgradeException` more visible.

### Does this PR introduce _any_ user-facing change?
Yes, it can.

### How was this patch tested?
By running the affected test suites:
```
$ build/sbt "sql/test:testOnly *ParquetRebaseDatetimeV1Suite"
$ build/sbt "sql/test:testOnly *ParquetRebaseDatetimeV2Suite"
```

Closes #31497 from MaxGekk/parquet-spark-upgrade-exception.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 6845d26)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FilePartitionReader.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRebaseDatetimeSuite.scala (diff)
Commit cc508d17c7176bd3504c9f3af68794e1b98f8a81 by dhyun
[SPARK-34370][SQL] Support Avro schema evolution for partitioned Hive tables using "avro.schema.url"

### What changes were proposed in this pull request?

With https://github.com/apache/spark/pull/31133, Avro schema evolution was introduced for partitioned Hive tables where the schema is given by `avro.schema.literal`.
Here that functionality is extended to support schema evolution where the schema is defined via `avro.schema.url`.

### Why are the changes needed?

Without this PR the problem described in https://github.com/apache/spark/pull/31133 can be reproduced with tables where `avro.schema.url` is used, because in this case the property value given at the partition level is always used for `avro.schema.url`.

So, for example, when a new column (with a default value) is added to the table, one of the following problems happens:
- when the new field is added after the last one, the cell values will be nulls instead of the default value
- when the schema is extended somewhere before the last field, values will be listed at the wrong column positions

A similar error happens when one of the fields is removed from the schema.

For details please check the attached unit tests where both cases are covered.

### Does this PR introduce _any_ user-facing change?

Fixes the potential value error.

### How was this patch tested?

The existing unit tests for schema evolution are generalized and reused.
New tests:
- `SPARK-34370: support Avro schema evolution (add column with avro.schema.url)`
- `SPARK-34370: support Avro schema evolution (remove column with avro.schema.url)`

Closes #31501 from attilapiros/SPARK-34370.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: cc508d1)
The file was added sql/hive/src/test/resources/schemaWithTwoFields.avsc
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was added sql/hive/src/test/resources/schemaWithOneField.avsc
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala (diff)
Commit 6e05e99143539963bc0ab3ec053ca4a0100caba3 by dhyun
[SPARK-34342][SQL] Format DateLiteral and TimestampLiteral toString

### What changes were proposed in this pull request?

This PR formats the `toString` output of `DateLiteral` and `TimestampLiteral`. For example:
```sql
SELECT * FROM date_dim WHERE d_date BETWEEN (cast('2000-03-11' AS DATE) - INTERVAL 30 days) AND (cast('2000-03-11' AS DATE) + INTERVAL 30 days)
```
Before this PR:
```
Condition : (((isnotnull(d_date#18) AND (d_date#18 >= 10997)) AND (d_date#18 <= 11057))
```
After this PR:
```
Condition : (((isnotnull(d_date#14) AND (d_date#14 >= 2000-02-10)) AND (d_date#14 <= 2000-04-10))
```

### Why are the changes needed?

Make the plan more readable.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #31455 from wangyum/SPARK-34342.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 6e05e99)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q21.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q40.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q40.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q40/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q21/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q40/explain.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q21/explain.txt (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q21.sf100/explain.txt (diff)
Commit eb5558ed35d51215de2ab12156e477cd7130a66a by dhyun
[SPARK-34390][CORE] Enable Zstandard buffer pool by default

### What changes were proposed in this pull request?

This PR aims to enable ZStandard JNI BufferPool by default in Apache Spark 3.2.0.
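
For reference, a sketch of how a user could opt back out of the pool. The config key here is an assumption based on the modified config `package.scala`:

```scala
import org.apache.spark.SparkConf

// Disable the (now default-on) Zstd buffer pool; the key name is assumed,
// see core/src/main/scala/org/apache/spark/internal/config/package.scala.
val conf = new SparkConf()
  .set("spark.io.compression.codec", "zstd")
  .set("spark.io.compression.zstd.bufferPool.enabled", "false")
```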

### Why are the changes needed?

**1. SPEED UP**
SPARK-34387 shows the speed-up on both Java8/Java11 by adding [ZStandardBenchmark](https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/io/ZStandardBenchmark.scala).

**2. MEMORY USAGE**
The following are the memory usage graphs while running [ZStandardBenchmark](https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/io/ZStandardBenchmark.scala) on Java 11 with an increased N (100000) to make the difference easier to visualize. In the charts, the first half is the memory consumption without the buffer pool, while the second half is with the buffer pool. The difference is noticeable.
```scala
-  val N = 10000
+  val N = 100000
```

- Compression
![Screenshot from 2021-02-06 18-41-17](https://user-images.githubusercontent.com/9700541/107134909-0c4cfb00-68ab-11eb-9273-82cbecdebfba.png)

- Decompression
![Screenshot from 2021-02-06 18-43-05](https://user-images.githubusercontent.com/9700541/107134927-2edf1400-68ab-11eb-97c4-5cd101e91bb0.png)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the existing UTs.

Closes #31502 from dongjoon-hyun/SPARK-34390.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: eb5558e)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
Commit 34a1a65b398c4469eb97cd458ee172dc76b7ef56 by gurwls223
[SPARK-34398][DOCS] Fix PySpark migration link

### What changes were proposed in this pull request?

Fix the broken link in docs/pyspark-migration-guide.md.

### Why are the changes needed?
The link was broken.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually built and checked.

Closes #31514 from raphaelauv/patch-2.

Authored-by: raphaelauv <raphaelauv@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 34a1a65)
The file was modified docs/pyspark-migration-guide.md (diff)
Commit 30ef3d6e1c00bd1f28e71511576daad223ba8b22 by gurwls223
[SPARK-34158] Incorrect url of the only developer Matei in pom.xml

### What changes were proposed in this pull request?

Update the incorrect URL of the only developer, Matei, in pom.xml.

### Why are the changes needed?

The current link was broken

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Changed the link to https://cs.stanford.edu/people/matei/.

Closes #31512 from pingsutw/SPARK-34158.

Authored-by: Kevin <pingsutw@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 30ef3d6)
The file was modified pom.xml (diff)
Commit dcaf62afea8791e49a44c2062fe14bafdcc0e92f by gurwls223
[SPARK-34346][CORE][TESTS][FOLLOWUP] Fix UT by removing core-site.xml

### What changes were proposed in this pull request?

This is a follow-up for SPARK-34346, which caused flakiness due to the addition of the `core-site.xml` test resource file. This PR aims to remove the test resource `core/src/test/resources/core-site.xml` from the `core` module.

### Why are the changes needed?

Due to the test resource `core-site.xml`, the YARN UT became flaky in GitHub Actions and Jenkins.
```
$ build/sbt "yarn/testOnly *.YarnClusterSuite -- -z SPARK-16414" -Pyarn
...
[info] YarnClusterSuite:
[info] - yarn-cluster should respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630) *** FAILED *** (20 seconds, 209 milliseconds)
[info]   FAILED did not equal FINISHED (stdout/stderr was not captured) (BaseYarnClusterSuite.scala:210)
```

To isolate this further, we could use `SPARK_TEST_HADOOP_CONF_DIR` like the `yarn` module's `yarn/Client` does, but that seems like overkill in the `core` module.
```
// SPARK-23630: during testing, Spark scripts filter out hadoop conf dirs so that user's
// environments do not interfere with tests. This allows a special env variable during
// tests so that custom conf dirs can be used by unit tests.
val confDirs = Seq("HADOOP_CONF_DIR", "YARN_CONF_DIR") ++
  (if (Utils.isTesting) Seq("SPARK_TEST_HADOOP_CONF_DIR") else Nil)
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #31515 from dongjoon-hyun/SPARK-34346-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: dcaf62a)
The file was removed core/src/test/resources/core-site.xml
The file was modified core/src/test/scala/org/apache/spark/SparkContextSuite.scala (diff)
Commit 329f945534a91c5c84594dc53971af23e3aff545 by dhyun
[SPARK-34391][BUILD] Upgrade commons-io to 2.8.0

### What changes were proposed in this pull request?

This PR aims to upgrade `commons-io` from 2.5 to 2.8.0 for Apache Spark 3.2.0.

### Why are the changes needed?

`2.5` was released on 2016-04-22. This will bring the latest bug fixes.
- [2020-09-05: 2.8.0](https://commons.apache.org/proper/commons-io/changes-report.html#a2.8.0)
- [2020-05-24: 2.7](https://commons.apache.org/proper/commons-io/changes-report.html#a2.7)
- [2017-10-15: 2.6](https://commons.apache.org/proper/commons-io/changes-report.html#a2.6)

### Does this PR introduce _any_ user-facing change?

Yes, but this is a compatible dependency change.

### How was this patch tested?

Pass the CIs.

Closes #31503 from dongjoon-hyun/SPARK-34391.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 329f945)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified pom.xml (diff)
Commit b344e91368ffdb22152d0450875e0549fb902200 by gurwls223
[SPARK-34375][CORE][K8S][TEST] Replaces 'Mockito.initMocks' with 'Mockito.openMocks'

### What changes were proposed in this pull request?
`Mockito.initMocks(Object)` is a deprecated API; `Mockito.openMocks(Object).close()` should be used instead, as sketched below.
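
A minimal sketch of the replacement pattern in a test class (the class and field names are illustrative):

```scala
import org.mockito.{Mock, MockitoAnnotations}

class ExampleSuite {
  @Mock private var dependency: Runnable = _

  def setUp(): Unit = {
    // Before (deprecated): MockitoAnnotations.initMocks(this)
    // After: openMocks returns an AutoCloseable handle for the mocks.
    MockitoAnnotations.openMocks(this).close()
  }
}
```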

### Why are the changes needed?
Clean up deprecated API usage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #31487 from LuciferYang/mockito-api.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: b344e91)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/K8sSubmitOpSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/sort/io/LocalDiskShuffleMapOutputWriterSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala (diff)
The file was modified core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriterSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/sort/IndexShuffleBlockResolverSuite.scala (diff)
The file was modified core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSourceSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleWriterSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManagerSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsPollingSnapshotSourceSuite.scala (diff)
The file was modified core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/worker/WorkerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/ShuffleBlockPusherSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala (diff)
Commit 92702384732f52f802bbad08820e91532af54289 by kabhwan.opensource
[SPARK-34355][SQL] Add log and time cost for commit job

### What changes were proposed in this pull request?

Add some info logs around the job commit.

### Why are the changes needed?

The commit job is a heavy operation, and we have seen Spark block at this point in the code many times due to slow RPCs to the NameNode or other causes.

It's better to record the time that the commit job costs; a sketch of the pattern follows.
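
A self-contained sketch of the logging pattern (illustrative names; the actual change lives in `FileFormatWriter`):

```scala
// Measure and log how long a commit action takes; `commit` stands in for
// the committer.commitJob(...) call in the real code.
def timedCommit(commit: => Unit): Unit = {
  val start = System.nanoTime()
  commit
  val elapsedMs = (System.nanoTime() - start) / 1000000
  println(s"Write job committed. Elapsed time: $elapsedMs ms.")
}

timedCommit { Thread.sleep(10) } // usage with a dummy commit action
```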

### Does this PR introduce _any_ user-facing change?

Yes, more info logs.

### How was this patch tested?

Not needed.

Closes #31471 from ulysses-you/add-commit-log.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
(commit: 9270238)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala (diff)
Commit 2c243c93d9b107ed19fa21002d54deccd17ce50a by wenchen
[SPARK-34157][SQL] Unify output of SHOW TABLES and pass output attributes properly

### What changes were proposed in this pull request?
The current implementation of some DDL commands does not unify the output and does not pass the output attributes properly to the physical command.
For example, `ShowTables` outputs the attribute `namespace`, but `ShowTablesCommand` outputs the attribute `database`.

In the query plan, this PR passes the output attributes from `ShowTables` to `ShowTablesCommand`, and from `ShowTableExtended` to `ShowTablesCommand`.

Take `show tables` and `show table extended like 'tbl'` as examples.
The output before this PR:

`show tables`
| database | tableName | isTemporary |
| -- | -- | -- |
| default | tbl | false |

If the catalog is the v2 session catalog, the output before this PR:
| namespace | tableName |
| -- | -- |
| default | tbl |

`show table extended like 'tbl'`
| database | tableName | isTemporary | information |
| -- | -- | -- | -- |
| default | tbl | false | Database: default... |

The output after this PR:

`show tables`
| namespace | tableName | isTemporary |
| -- | -- | -- |
| default | tbl | false |

`show table extended like 'tbl'`
| namespace | tableName | isTemporary | information |
| -- | -- | -- | -- |
| default | tbl | false | Database: default... |

### Why are the changes needed?
This PR has two benefits:
First, it unifies the schema of the output of SHOW TABLES.
Second, passing the output attributes keeps the expression IDs unchanged, which avoids bugs when we apply more operators on top of the command's output DataFrame (a sketch follows).
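
For example (a sketch assuming an active `SparkSession` named `spark`), an operator applied on top of the command's output now resolves against the command's own, stable attributes:

```scala
// Filter on top of the SHOW TABLES output; with properly propagated output
// attributes, the column reference resolves without expr ID mismatches.
val tables = spark.sql("SHOW TABLES")
tables.filter(tables("tableName") === "tbl").show()
```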

### Does this PR introduce _any_ user-facing change?
Yes.
The output schema of `SHOW TABLES` replaces `database` with `namespace`.

### How was this patch tested?
Jenkins test.

Closes #31245 from beliefer/SPARK-34157.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 2c243c9)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesSuiteBase.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show-tables.sql.out (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2CommandExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowTablesSuite.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified python/pyspark/sql/context.py (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
Commit 70a79e920a97b8323ba5b51836c075bf0b0434a1 by wenchen
[SPARK-34239][SQL][FOLLOW_UP] SHOW COLUMNS Keep consistence with other `SHOW` command

### What changes were proposed in this pull request?
Keep consistency with other `SHOW` commands, according to https://github.com/apache/spark/pull/31341#issuecomment-774613080

### Why are the changes needed?
Keep consistency.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed

Closes #31518 from AngersZhuuuu/SPARK-34239-followup.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 70a79e9)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
Commit 88ced28141beb696791ae67eac35219de942bf31 by gurwls223
[SPARK-33354][DOC] Remove an unnecessary quote in doc

### What changes were proposed in this pull request?

Remove an unnecessary quote in the documentation.
Super trivial.

### Why are the changes needed?

Fix a mistake.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Just doc

Closes #31523 from gengliangwang/removeQuote.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 88ced28)
The file was modified docs/sql-ref-ansi-compliance.md (diff)
Commit 556ecd681a02ec4080c41491727e5cedc4be7e51 by gurwls223
[MINOR] Add a note about pip installation test in RC for release vote template

### What changes were proposed in this pull request?

This PR proposes to add a note about pip installation test in RC for release vote template.

### Why are the changes needed?

To encourage PySpark users to test the PyPI distribution and pip installation.

### Does this PR introduce _any_ user-facing change?

No. It will be used for release vote.

### How was this patch tested?

N/A

Closes #31527 from HyukjinKwon/minor-update-vote-templ.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 556ecd6)
The file was modified dev/create-release/vote.tmpl (diff)
Commit 70ef196d59c355e9fcba2abd7d0feda23d7f2c1e by gurwls223
[SPARK-34157][BUILD][FOLLOW-UP] Fix Scala 2.13 compilation error via using Array.deep

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/31245:

```
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:112:53: value deep is not a member of Array[String]
[error]         assert(sql("show tables").schema.fieldNames.deep ==
[error]                                                     ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:115:72: value deep is not a member of Array[String]
[error]         assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
[error]                                                                        ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:121:55: value deep is not a member of Array[String]
[error]           assert(sql("show tables").schema.fieldNames.deep ==
[error]                                                       ^
[error] /home/runner/work/spark/spark/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala:124:74: value deep is not a member of Array[String]
[error]           assert(sql("show table extended like 'tbl'").schema.fieldNames.deep ==
[error]                                                                          ^
```

It broke the Scala 2.13 build. This PR works around the problem by using ScalaTest's `===`, which can compare `Array`s safely; a portable alternative is sketched below.
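
A portable sketch of the comparison without `Array.deep` (removed in Scala 2.13): converting to `Seq` compares element-wise, and ScalaTest's `===` likewise compares arrays by content.

```scala
// Instead of fieldNames.deep == expected.deep (Scala 2.12 only):
val fieldNames: Array[String] = Array("namespace", "tableName", "isTemporary")
assert(fieldNames.toSeq == Seq("namespace", "tableName", "isTemporary"))
```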

### Why are the changes needed?

To fix the build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

CI in this PR should test it out.

Closes #31526 from HyukjinKwon/SPARK-34157.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 70ef196)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala (diff)
Commit a85490659f45410be3588c669248dc4f534d2a71 by wenchen
[SPARK-34377][SQL] Add new parquet datasource options to control datetime rebasing in read

### What changes were proposed in this pull request?
In the PR, I propose new options for the Parquet datasource:
1. `datetimeRebaseMode`
2. `int96RebaseMode`

Both options influence the loading of ancient date and timestamp column values from parquet files. The `datetimeRebaseMode` option affects values of the `DATE`, `TIMESTAMP_MICROS` and `TIMESTAMP_MILLIS` types, while `int96RebaseMode` affects the loading of `INT96` timestamps.

The options support the same values as the SQL configs `spark.sql.legacy.parquet.datetimeRebaseModeInRead` and `spark.sql.legacy.parquet.int96RebaseModeInRead`, namely:
- `"LEGACY"`, when an option is set to this value, Spark rebases dates/timestamps from the legacy hybrid calendar (Julian + Gregorian) to the Proleptic Gregorian calendar.
- `"CORRECTED"`, dates/timestamps are read AS IS from parquet files.
- `"EXCEPTION"`, when it is set as an option value, Spark will fail the reading if it sees ancient dates/timestamps that are ambiguous between the two calendars.

### Why are the changes needed?
1. The new options allow loading parquet files from two or more sources with different rebasing modes in the same query. For instance:
```scala
val df1 = spark.read.option("datetimeRebaseMode", "legacy").parquet(folder1)
val df2 = spark.read.option("datetimeRebaseMode", "corrected").parquet(folder2)
df1.join(df2, ...)
```
Before the changes, this was impossible because the SQL config `spark.sql.legacy.parquet.datetimeRebaseModeInRead` influences both reads.

2. Mixing of Dataset/DataFrame and RDD APIs should become possible. Since SQL configs are not propagated through RDDs, the following code fails on ancient timestamps:
```scala
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "legacy")
spark.read.parquet(folder).distinct.rdd.collect()
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test suites:
```
$ build/sbt "sql/test:testOnly *ParquetRebaseDatetimeV1Suite"
$ build/sbt "sql/test:testOnly *ParquetRebaseDatetimeV2Suite"
```

Closes #31489 from MaxGekk/parquet-rebase-options.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: a854906)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRebaseDatetimeSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScan.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (diff)
Commit 037bfb2dbcb73cfbd73f0fd9abe0b38789a182a2 by gurwls223
[SPARK-33438][SQL] Eagerly init objects with defined SQL Confs for command `set -v`

### What changes were proposed in this pull request?
In Spark, `set -v` is defined as "Queries all properties that are defined in the SQLConf of the sparkSession".
But other external modules also define properties and register them with SQLConf. In this case,
they can't be displayed by `set -v` until the conf object is initialized (i.e. the object is referenced at least once).

In this PR, I propose to eagerly initialize all the objects registered to SQLConf, so that `set -v` will always output
the complete set of properties. A sketch of the failure mode follows.
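
An illustrative sketch of the failure mode, using a hypothetical external module (access to `SQLConf.buildConf` is assumed here for illustration):

```scala
import org.apache.spark.sql.internal.SQLConf

// Hypothetical external module registering a property with SQLConf.
object MyModuleConf {
  val MY_FLAG = SQLConf.buildConf("spark.myModule.flag")
    .doc("Example flag registered by an external module.")
    .booleanConf
    .createWithDefault(false)
}
// Before this PR, `SET -v` would omit spark.myModule.flag until the object
// was initialized, e.g. by referencing MyModuleConf.MY_FLAG once.
```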

### Why are the changes needed?
Improve the `set -v` command so it produces complete and deterministic results.

### Does this PR introduce _any_ user-facing change?
`set -v` command will dump more configs

### How was this patch tested?
existing tests

Closes #30363 from linhongliu-db/set-v.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 037bfb2)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit d1131bc85028ea0f78ac9ef73bba731080f1ff6a by srowen
[MINOR][SQL][FOLLOW-UP] Add assertion to FixedLengthRowBasedKeyValueBatch

### What changes were proposed in this pull request?
Adds an assert to the `FixedLengthRowBasedKeyValueBatch#appendRow` method to check the incoming `vlen` and `klen` by comparing them with the lengths stored as member variables, as a follow-up to https://github.com/apache/spark/pull/30788. A sketch of the check is shown below.
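
A sketch of the check in Scala (the real code is Java; the class shape and field names are illustrative):

```scala
// Reject rows whose key/value lengths differ from the fixed lengths this
// batch was constructed with.
class FixedLengthBatch(val klen: Int, val vlen: Int) {
  def appendRow(incomingKlen: Int, incomingVlen: Int): Unit = {
    assert(incomingKlen == klen && incomingVlen == vlen,
      s"appendRow() expects fixed lengths ($klen, $vlen) " +
        s"but got ($incomingKlen, $incomingVlen)")
    // ... append the row ...
  }
}
```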

### Why are the changes needed?
Add an assert statement to catch similar bugs in the future.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran some tests locally, though not easy to test.

Closes #31447 from yliou/SPARK-33726-Assert.

Authored-by: yliou <yliou@berkeley.edu>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: d1131bc)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java (diff)
Commit c92e408aa1c7f5eed29ad21d5b6cb570b5cc5758 by wenchen
[SPARK-34388][SQL] Propagate the registered UDF name to ScalaUDF, ScalaUDAF and ScalaAggregator

### What changes were proposed in this pull request?

This PR proposes to propagate the name used when registering UDFs to `ScalaUDF`, `ScalaUDAF` and `ScalaAggregator`.

Note that `PythonUDF` gets the name correctly: https://github.com/apache/spark/blob/466c045bfac20b6ce19f5a3732e76a5de4eb4e4a/python/pyspark/sql/udf.py#L358-L359
, and the same for Hive UDFs:
https://github.com/apache/spark/blob/466c045bfac20b6ce19f5a3732e76a5de4eb4e4a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionCatalog.scala#L67
### Why are the changes needed?

This PR can help in the following scenarios:
1) Better EXPLAIN output
2) By adding `def name: String` to `UserDefinedExpression`, we can match an expression by `UserDefinedExpression` and look up the catalog, a use case needed for #31273.

### Does this PR introduce _any_ user-facing change?

The EXPLAIN output involving udfs will be changed to use the name used for UDF registration.

For example, for the following:
```
sql("CREATE TEMPORARY FUNCTION test_udf AS 'org.apache.spark.examples.sql.Spark33084'")
sql("SELECT test_udf(col1) FROM VALUES (1), (2), (3)").explain(true)
```
The output of the optimized plan will change from:
```
Aggregate [spark33084(cast(col1#223 as bigint), org.apache.spark.examples.sql.Spark330846906be0f, 1, 1) AS spark33084(col1)#237]
+- LocalRelation [col1#223]
```
to
```
Aggregate [test_udf(cast(col1#223 as bigint), org.apache.spark.examples.sql.Spark330847a62d697, 1, 1, Some(test_udf)) AS test_udf(col1)#237]
+- LocalRelation [col1#223]
```

### How was this patch tested?

Added new tests.

Closes #31500 from imback82/udaf_name.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c92e408)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
Commit 3b26bc25362a245a610c3e222b971b4ae612bc3e by wenchen
[SPARK-34168][SQL] Support DPP in AQE when the join is Broadcast hash join at the beginning

### What changes were proposed in this pull request?
This PR enables DPP in AQE when the join is a broadcast hash join at the beginning, so queries can benefit from the performance improvements of both DPP and AQE at the same time. It makes use of the result of the build side and then inserts the DPP filter into the probe side.

### Why are the changes needed?
Improve performance

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a new UT.

Closes #31258 from JkSelf/supportDPP1.

Authored-by: jiake <ke.a.jia@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3b26bc2)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveSubqueries.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryAdaptiveBroadcastExec.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantProjectsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala (diff)
Commit 777d51e7e346e59d78c6c87544f413cd7fa928c1 by srowen
[SPARK-34374][SQL][DSTREAM] Use standard methods to extract keys or values from a Map

### What changes were proposed in this pull request?
Use standard methods to extract `keys` or `values` from a `Map`; this is semantically consistent and uses `DefaultKeySet` and `DefaultValuesIterable` instead of a manual loop.

**Before**
```
map.map(_._1)
map.map(_._2)
```

**After**
```
map.keys
map.values
```

### Why are the changes needed?
Code simplifications.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #31484 from LuciferYang/keys-and-values.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 777d51e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverSchedulingPolicy.scala (diff)
Commit e65b28cf7d9680ebdf96833a6f2d38ffd61c7d21 by gurwls223
[SPARK-34352][SQL] Improve SQLQueryTestSuite so as could run on windows system

### What changes were proposed in this pull request?
The current implementation of `SQLQueryTestSuite` cannot run on Windows systems,
because the code below fails on Windows:
`assume(TestUtils.testCommandAvailable("/bin/bash"))`

For operating systems that do not support `/bin/bash`, we just skip some tests.

### Why are the changes needed?
`SQLQueryTestSuite` has a bug on Windows systems.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Jenkins test

Closes #31466 from beliefer/SPARK-34352.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: e65b28c)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (diff)
Commit 37fe8c6d3cd1c5aae3a61fb9b32ca4595267f1bb by srowen
[SPARK-34395][SQL] Clean up unused code for code simplifications

### What changes were proposed in this pull request?
Currently, we pass the default value `EmptyRow` to the `checkEvaluation` method in `StringExpressionsSuite`, but `EmptyRow` is already the default value of that parameter of `checkEvaluation`.

We can drop the explicit argument as a code simplification.

### Why are the changes needed?
For code simplification.

**before**:
```
def testConcat(inputs: String*): Unit = {
  val expected = if (inputs.contains(null)) null else inputs.mkString
  checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected, EmptyRow)
}
```
**after**:
```
def testConcat(inputs: String*): Unit = {
  val expected = if (inputs.contains(null)) null else inputs.mkString
  checkEvaluation(Concat(inputs.map(Literal.create(_, StringType))), expected)
}
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Actions build.

Closes #31510 from yikf/master.

Authored-by: yikf <13468507104@163.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 37fe8c6)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala (diff)
Commit a1e75edc39c11e85d8a4917c3e82282fa974be96 by dhyun
[SPARK-34405][CORE] Fix mean value of timersLabels in the PrometheusServlet class

### What changes were proposed in this pull request?
The `getMetricsSnapshot` method of the `PrometheusServlet` class produces a wrong value: it should report the mean, but it takes the max instead.

### Why are the changes needed?

The mean value of `timersLabels` in the `PrometheusServlet` class is wrong; see line 105 of this class:

```
sb.append(s"${prefix}Mean$timersLabels ${snapshot.getMax}\n")
```
it should be
```
sb.append(s"${prefix}Mean$timersLabels ${snapshot.getMean}\n")
```

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

![image](https://user-images.githubusercontent.com/5170878/107313576-cc199280-6acd-11eb-9384-b6abf71c0f90.png)

Closes #31532 from 397090770/SPARK-34405.

Authored-by: wyp <wyphao.2007@163.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: a1e75ed)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala (diff)
Commit ea339c38b43c59931257386efdd490507f7de64d by dhyun
[SPARK-34407][K8S] KubernetesClusterSchedulerBackend.stop should clean up K8s resources

### What changes were proposed in this pull request?

This PR aims to fix `KubernetesClusterSchedulerBackend.stop` to wrap `super.stop` with `Utils.tryLogNonFatalError`.

### Why are the changes needed?

[CoarseGrainedSchedulerBackend.stop](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L559) may throw `SparkException` and this causes K8s resource (pod and configmap) leakage.
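
A self-contained sketch of the wrapping pattern (simplified; `Utils.tryLogNonFatalError` is the real Spark helper, re-implemented here so the snippet runs standalone):

```scala
import scala.util.control.NonFatal

// Log-and-continue wrapper, mirroring Utils.tryLogNonFatalError.
def tryLogNonFatalError(block: => Unit): Unit = {
  try block catch {
    case NonFatal(t) => println(s"Uncaught exception: $t")
  }
}

def stop(): Unit = {
  // Even if super.stop() throws a SparkException, the cleanup still runs.
  tryLogNonFatalError { throw new RuntimeException("super.stop() failed") }
  println("deleting executor pods and config maps") // K8s cleanup
}
```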

### Does this PR introduce _any_ user-facing change?

No. This is a bug fix.

### How was this patch tested?

Pass the CI with the newly added test case.

Closes #31533 from dongjoon-hyun/SPARK-34407.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: ea339c3)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala (diff)
Commit 7ea3a336b99915f09174a4c3e47fa17f30b88890 by kabhwan.opensource
[SPARK-34355][CORE][SQL][FOLLOWUP] Log commit time in all File Writer

### What changes were proposed in this pull request?
While working on https://issues.apache.org/jira/browse/SPARK-34399, based on https://github.com/apache/spark/pull/31471,
we found that `FileBatchWrite` also uses `FileFormatWrite.processStates()`, so the commit duration needs to be logged in the other writers too.
In this PR:

1. Extract a commit job method in `SparkHadoopWriter`.
2. Address the other commit writers.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No

Closes #31520 from AngersZhuuuu/SPARK-34355-followup.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
(commit: 7ea3a33)
The file was modified core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileBatchWrite.scala (diff)
Commit 18b30107adb37d3c7a767a20cc02813f0fdb86da by gurwls223
[MINOR][ML][TESTS] Increase tolerance to make NaiveBayesSuite more robust

### What changes were proposed in this pull request?
Increase the relative tolerance from 0.2 to 0.35.

### Why are the changes needed?
Fix a flaky test.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT.

Closes #31536 from WeichenXu123/ES-65815.

Authored-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 18b3010)
The file was modified mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala (diff)
Commit b2dc38b6546552cf3fcfdcd466d7d04d9aa3078c by hkarau
[SPARK-34334][K8S] Correctly identify timed out pending pod requests as excess request

### What changes were proposed in this pull request?

Fix the identification of timed-out pending pod requests as excess requests to delete, for the case when the excess is higher than the number of newly created timed-out requests and there are some non-timed-out newly created requests too.

### Why are the changes needed?

After https://github.com/apache/spark/pull/29981, only timed-out newly created requests and timed-out pending requests are taken as excess requests.

But there is a small bug when the excess is higher than the number of newly created timed-out requests and there are some non-timed-out newly created requests as well, because all the newly created requests are counted as excess requests when items are chosen from the timed-out pending pod requests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

There is new unit test added: `SPARK-34334: correctly identify timed out pending pod requests as excess`.

Closes #31445 from attilapiros/SPARK-34334.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: b2dc38b)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala (diff)
Commit cf7a13c363ef5d56556c9d70e7811bf6a40de55f by hkarau
[SPARK-34209][SQL] Delegate table name validation to the session catalog

### What changes were proposed in this pull request?

Delegate table name validation to the session catalog

### Why are the changes needed?

Querying of tables with nested namespaces.

### Does this PR introduce _any_ user-facing change?

SQL queries can now reference tables in nested namespaces.

### How was this patch tested?

Unit tests updated.

Closes #31427 from holdenk/SPARK-34209-delegate-table-name-validation-to-the-catalog.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: cf7a13c)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala (diff)
Commit 2b51843ca41236f8cec29c406ea35ce1088364cf by hkarau
[SPARK-34363][CORE] Add an option for limiting storage for migrated shuffle blocks

### What changes were proposed in this pull request?

Allow users to configure a maximum amount of shuffle blocks to be stored and reject remote shuffle blocks when this threshold is exceeded.

### Why are the changes needed?

In disk constrained environments with large amount of shuffle data, migrations may result in excessive disk pressure on the nodes. On Kube nodes this can result in cascading failures when combined with `emptyDir`.

### Does this PR introduce _any_ user-facing change?

Yes, new configuration parameter.

### How was this patch tested?

New unit tests.

Closes #31493 from holdenk/SPARK-34337-reject-disk-blocks-when-under-disk-pressure.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: 2b51843)
The file was modified core/src/main/scala/org/apache/spark/shuffle/MigratableResolver.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
Commit 50641d2e3d659f51432aa2c0e6b9af76d71a5796 by hkarau
[SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes

### What changes were proposed in this pull request?

Allow users to have Spark attempt to decommission excluded executors.
Since excluded executors may be flaky, this also adds the ability for users to specify a time limit after which a decommissioning executor will be killed by Spark.

### Why are the changes needed?

This may help prevent fetch failures from excluded executors, and also handle the situation in which executors

### Does this PR introduce _any_ user-facing change?

Yes, two new configuration flags for the behaviour.

### How was this patch tested?

Extended unit and integration tests.

Closes #31249 from holdenk/configure-inaccessibleList-kill-to-use-decommissioning.

Lead-authored-by: Holden Karau <hkarau@apple.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: 50641d2)
The file was modified core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/HealthTracker.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala (diff)
Commit c8628c943cd12bbad7561bdc297cea9ff23becc7 by gurwls223
Revert "[SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes"

This reverts commit 50641d2e3d659f51432aa2c0e6b9af76d71a5796.
(commit: c8628c9)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/HealthTracker.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala (diff)
Commit 1fbd5764105e2c09caf4ab57a7095dd794307b02 by gurwls223
[SPARK-34080][ML][PYTHON][FOLLOW-UP] Update score function in UnivariateFeatureSelector document

### What changes were proposed in this pull request?

This follows up #31160 to update the score functions in the documentation.

### Why are the changes needed?

Currently we use `f_classif`, `chi2`, and `f_regression`, which follow sklearn's naming. That is good to have, but it would be nicer to also document the formal score function names alongside sklearn's.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No new tests; this is a documentation-only change.

Closes #31531 from viirya/SPARK-34080-minor.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 1fbd576)
The file was modified python/pyspark/ml/feature.py (diff)
The file was modified docs/ml-features.md (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/UnivariateFeatureSelector.scala (diff)
Commit 5248ecb5ab0c9fdbd5f333e751d1f5fb3c514d14 by hkarau
[SPARK-34104][SPARK-34105][CORE][K8S] Maximum decommissioning time & allow decommissioning for excludes

### What changes were proposed in this pull request?

Allow users to have Spark attempt to decommission excluded executors.
Since excluded executors may be flaky, this also adds the ability for users to specify a time limit after which a decommissioning executor will be killed by Spark.

### Why are the changes needed?

This may help prevent fetch failures from excluded executors, and also handle the situation in which a flaky excluded executor never finishes decommissioning on its own.

### Does this PR introduce _any_ user-facing change?

Yes, two new configuration flags for the behaviour.
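
As a rough sketch of wiring up the two flags: the `forceKillTimeout` key matches the executor-decommission configs documented for Spark, while the exclude-side key is a hypothetical name used here for illustration; check `internal/config/package.scala` for the exact flags added by this change:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Hard cap on how long a decommissioning executor may linger before Spark kills it.
  .set("spark.executor.decommission.forceKillTimeout", "120s")
  // Hypothetical key: decommission (rather than kill) excluded executors.
  .set("spark.excludeOnFailure.decommission.enabled", "true")
```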

### How was this patch tested?

Extended unit and integration tests.

Closes #31539 from holdenk/re=enable-SPARK-34104-SPARK-34105.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: 5248ecb)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/HealthTracker.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala (diff)
Commit 3e12e9d2ee2a0d6d0137a956356dfe888412d16a by wenchen
[SPARK-34238][SQL][FOLLOW_UP] SHOW PARTITIONS: keep consistency with other `SHOW` commands

### What changes were proposed in this pull request?
Keep consistency with other `SHOW` commands, as discussed in https://github.com/apache/spark/pull/31341#issuecomment-774613080

### Why are the changes needed?
Keep consistency with other `SHOW` commands.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #31516 from AngersZhuuuu/SPARK-34238-follow-up.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3e12e9d)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
Commit f79305a402b94a87e77c5dbc4a6559aa587f7d4a by yamamuro
[SPARK-34311][SQL] PostgresDialect can't handle arrays of some types

### What changes were proposed in this pull request?

This PR fixes the issue that `PostgresDialect` can't handle arrays of some types.
Though PostgreSQL supports a wide range of types (https://www.postgresql.org/docs/13/datatype.html), the current `PostgresDialect` can't handle arrays of the following types.

* xml
* tsvector
* tsquery
* macaddr
* macaddr8
* txid_snapshot
* pg_snapshot
* point
* line
* lseg
* box
* path
* polygon
* circle
* pg_lsn
* bit varying
* interval

NOTE: PostgreSQL doesn't implement arrays of serial types so this PR doesn't care about them.

### Why are the changes needed?

To provide better support for PostgreSQL.

### Does this PR introduce _any_ user-facing change?

Yes. `PostgresDialect` can now handle arrays of the types shown above.
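
A minimal read sketch (the URL, credentials, and table name are placeholders):

```scala
// Reading a PostgreSQL table whose columns include array types such as macaddr[] or point[].
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "arrays_table")
  .option("user", "postgres")
  .option("password", "secret")
  .load()
df.printSchema() // array columns now map to Spark's ArrayType instead of failing
```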

### How was this patch tested?

New test.

Closes #31419 from sarutak/postgres-array-types.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: f79305a)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala (diff)
Commit 123365e05cb6dabc3b6d3ecc5b3728221b0c33c2 by wenchen
[SPARK-34240][SQL] Unify the `SHOW TBLPROPERTIES` command's output attribute schema and ExprID

### What changes were proposed in this pull request?
Passing around the output attributes has further benefits, such as keeping the exprIds unchanged to avoid bugs when we apply more operators on top of the command's output DataFrame.

This PR does two things:

1. After this PR, a `SHOW TBLPROPERTIES` command's output shows both `key` and `value` columns whether or not you specify a table property `key`. Before this PR, the output only showed a `value` column when you specified a property `key`.
2. Keep the `SHOW TBLPROPERTIES` command's output attribute exprIds unchanged.
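
To illustrate why the stable exprIds matter, here is a small sketch of composing an operator on top of the command's output DataFrame (the table and property names are placeholders):

```scala
import org.apache.spark.sql.functions.col

// With the unified (key, value) schema and stable exprIds, downstream operators
// on the command's output resolve correctly.
spark.sql("SHOW TBLPROPERTIES table_name")
  .filter(col("key") === "some.property")
  .show()
```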

### Why are the changes needed?
1. Keep the `SHOW TBLPROPERTIES` output schema consistent
2. Keep the `SHOW TBLPROPERTIES` command's output attribute exprIds unchanged.

### Does this PR introduce _any_ user-facing change?
After this PR, a `SHOW TBLPROPERTIES` command's output shows both `key` and `value` columns whether or not you specify a table property `key`. Before this PR, the output only showed a `value` column when you specified a property `key`.

Before this PR:
```
sql > SHOW TBLPROPERTIES table_name('key')
value
value_of_key
```

After this PR
```
sql > SHOW TBLPROPERTIES table_name('key')
key value
key value_of_key
```

### How was this patch tested?
Added UT

Closes #31378 from AngersZhuuuu/SPARK-34240.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 123365e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show-tblproperties.sql.out (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
Commit 2f387b41e888778eb44572c3dca1af2a1a942ef5 by wenchen
[SPARK-34137][SQL] Update subquery's stats when building LogicalPlan's stats

### What changes were proposed in this pull request?
When explaining SQL in cost mode, the treeString for a subquery doesn't show its statistics:

How to reproduce:
```
spark.sql("create table t1 using parquet as select id as a, id as b from range(1000)")
spark.sql("create table t2 using parquet as select id as c, id as d from range(2000)")

spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("set spark.sql.cbo.enabled=true")

spark.sql(
  """
    |WITH max_store_sales AS
    |  (SELECT max(csales) tpcds_cmax
    |  FROM (SELECT
    |    sum(b) csales
    |  FROM t1 WHERE a < 100 ) x),
    |best_ss_customer AS
    |  (SELECT
    |    c
    |  FROM t2
    |  WHERE d > (SELECT * FROM max_store_sales))
    |
    |SELECT c FROM best_ss_customer
    |""".stripMargin).explain("cost")
```
Before this PR's output:
```
== Optimized Logical Plan ==
Project [c#4263L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
+- Filter (isnotnull(d#4264L) AND (d#4264L > scalar-subquery#4262 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
   :  +- Aggregate [max(csales#4260L) AS tpcds_cmax#4261L]
   :     +- Aggregate [sum(b#4266L) AS csales#4260L]
   :        +- Project [b#4266L]
   :           +- Filter ((a#4265L < 100) AND isnotnull(a#4265L))
   :              +- Relation default.t1[a#4265L,b#4266L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
   +- Relation default.t2[c#4263L,d#4264L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
```

After this pr:
```
== Optimized Logical Plan ==
Project [c#4481L], Statistics(sizeInBytes=31.3 KiB, rowCount=2.00E+3)
+- Filter (isnotnull(d#4482L) AND (d#4482L > scalar-subquery#4480 [])), Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)
   :  +- Aggregate [max(csales#4478L) AS tpcds_cmax#4479L], Statistics(sizeInBytes=16.0 B, rowCount=1)
   :     +- Aggregate [sum(b#4484L) AS csales#4478L], Statistics(sizeInBytes=16.0 B, rowCount=1)
   :        +- Project [b#4484L], Statistics(sizeInBytes=1616.0 B, rowCount=101)
   :           +- Filter (isnotnull(a#4483L) AND (a#4483L < 100)), Statistics(sizeInBytes=2.4 KiB, rowCount=101)
   :              +- Relation[a#4483L,b#4484L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
   +- Relation[c#4481L,d#4482L] parquet, Statistics(sizeInBytes=46.9 KiB, rowCount=2.00E+3)

```

### Why are the changes needed?
Completes the statistics in the explain treeString.

### Does this PR introduce _any_ user-facing change?
When users run explain in cost mode, they can now see the subquery's statistics too.

### How was this patch tested?
Added UT

Closes #31485 from AngersZhuuuu/SPARK-34137.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 2f387b4)
The file was added sql/core/src/test/resources/sql-tests/inputs/explain-cbo.sql
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala (diff)
The file was added sql/core/src/test/resources/sql-tests/results/explain-cbo.sql.out
Commit e6753c9402b5c40d9e2af662f28bd4f07a0bae17 by wenchen
[SPARK-33995][SQL] Expose make_interval as a Scala function

### What changes were proposed in this pull request?

This pull request exposes the `make_interval` function, [as suggested here](https://github.com/apache/spark/pull/31000#pullrequestreview-560812433), and as agreed to [here](https://github.com/apache/spark/pull/31000#issuecomment-754856820) and [here](https://github.com/apache/spark/pull/31000#issuecomment-755040234).

This powerful little function allows for idiomatic datetime arithmetic via the Scala API:

```scala
// add two hours
df.withColumn("plus_2_hours", col("first_datetime") + make_interval(hours = lit(2)))

// subtract one week and 30 seconds
col("d") - make_interval(weeks = lit(1), secs = lit(30))
```

The `make_interval` [SQL function](https://github.com/apache/spark/pull/26446) already exists.

Here is [the JIRA ticket](https://issues.apache.org/jira/browse/SPARK-33995) for this PR.

### Why are the changes needed?

The Spark API makes it easy to perform datetime addition / subtraction with months (`add_months`) and days (`date_add`).  Users need to write code like this to perform datetime addition with years, weeks, hours, minutes, or seconds:

```scala
df.withColumn("plus_2_hours", expr("first_datetime + INTERVAL 2 hours"))
```

We don't want to force users to manipulate SQL strings when they're using the Scala API.

### Does this PR introduce _any_ user-facing change?

Yes, this PR adds `make_interval` to the `org.apache.spark.sql.functions` API.

This single function will benefit a lot of users.  It's a small increase in the surface of the API for a big gain.

### How was this patch tested?

This was tested via unit tests.

cc: MaxGekk

Closes #31073 from MrPowers/SPARK-33995.

Authored-by: MrPowers <matthewkevinpowers@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: e6753c9)
The file was added sql/core/src/test/java/test/org/apache/spark/sql/JavaDateFunctionsSuite.java
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
Commit 0a37a9522414d4ffffb851d3e3aa962adc61ec66 by yamamuro
[SPARK-31816][SQL][DOCS] Added high level description about JDBC connection providers for users/developers

### What changes were proposed in this pull request?
The JDBC connection provider API and the embedded connection providers have already been added to the code, but there is no in-depth description of the internals. In this PR I've added both user and developer documentation, and additionally added an example custom JDBC connection provider.
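
For orientation, a custom provider roughly follows the shape below and is registered via a `META-INF/services/org.apache.spark.sql.jdbc.JdbcConnectionProvider` file, as the added example does. The method signatures are sketched from the developer API and should be verified against `JdbcConnectionProvider` in your Spark version:

```scala
import java.sql.{Connection, Driver}
import java.util.Properties

import org.apache.spark.sql.jdbc.JdbcConnectionProvider

// Sketch of a trivial provider that simply defers to the JDBC driver.
class ExampleProvider extends JdbcConnectionProvider {
  override val name: String = "example"

  // Claim only the URLs this provider understands.
  override def canHandle(driver: Driver, options: Map[String, String]): Boolean =
    options.get("url").exists(_.startsWith("jdbc:example:"))

  override def getConnection(driver: Driver, options: Map[String, String]): Connection = {
    val props = new Properties()
    options.foreach { case (k, v) => props.setProperty(k, v) }
    driver.connect(options("url"), props)
  }

  // Depending on the Spark version this extra method may be required; it reports
  // whether getConnection mutates global security state (e.g. JAAS configuration).
  override def modifiesSecurityContext(driver: Driver, options: Map[String, String]): Boolean =
    false
}
```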

### Why are the changes needed?
There was no documentation or example custom JDBC provider.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
```
cd docs/
SKIP_API=1 jekyll build
```
Screenshot 2021-02-02 at 16 35 43: https://user-images.githubusercontent.com/18561820/106623428-e48d2880-6574-11eb-8d14-e5c2aa7c37f1.png

Closes #31384 from gaborgsomogyi/SPARK-31816.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 0a37a95)
The file was added examples/src/main/resources/META-INF/services/org.apache.spark.sql.jdbc.JdbcConnectionProvider
The file was modified docs/sql-data-sources-jdbc.md (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/jdbc/README.md
The file was added examples/src/main/scala/org/apache/spark/examples/sql/jdbc/ExampleJdbcConnectionProvider.scala
Commit 0986f16c8d2f0d648d5239958dd4bccac43890c2 by viirya
[SPARK-34347][SQL] CatalogImpl.uncacheTable should invalidate in cascade for temp views

### What changes were proposed in this pull request?

This PR includes the following changes:
1. in `CatalogImpl.uncacheTable`, invalidate caches in cascade when the target table is
a temp view, and `spark.sql.legacy.storeAnalyzedPlanForView` is false (default value).
2. make `SessionCatalog.lookupTempView` public and return processed temp view plan (i.e., with `View` op).

### Why are the changes needed?

Following [SPARK-34052](https://issues.apache.org/jira/browse/SPARK-34052) (#31107), we should invalidate in cascade for `CatalogImpl.uncacheTable` when the table is a temp view, so that the behavior is consistent.

### Does this PR introduce _any_ user-facing change?

Yes, `SQLContext.uncacheTable` now un-caches a temp view in cascade by default.
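
A small sketch of the cascading behaviour (view names are placeholders):

```scala
// Un-caching a temp view now also invalidates caches built on top of it.
spark.range(10).createOrReplaceTempView("base_view")
spark.sql("CACHE TABLE base_view")
spark.sql("SELECT id * 2 AS doubled FROM base_view").createOrReplaceTempView("derived_view")
spark.sql("CACHE TABLE derived_view")

spark.catalog.uncacheTable("base_view") // derived_view's cache is invalidated too
```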

### How was this patch tested?

Added a UT

Closes #31462 from sunchao/SPARK-34347.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 0986f16)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala (diff)
Commit c082c537de6d4a2b8443179f9f04b62abdc66353 by wenchen
[SPARK-34404][SQL] Add new Avro datasource options to control datetime rebasing in read

### What changes were proposed in this pull request?
In the PR, I propose a new option, `datetimeRebaseMode`, for the Avro datasource. The option influences how ancient date and timestamp column values are loaded from Avro files.

The option supports the same values as the SQL config `spark.sql.legacy.avro.datetimeRebaseModeInRead`, namely:
- `"LEGACY"`, when an option is set to this value, Spark rebases dates/timestamps from the legacy hybrid calendar (Julian + Gregorian) to the Proleptic Gregorian calendar.
- `"CORRECTED"`, dates/timestamps are read AS IS from avro files.
- `"EXCEPTION"`, when it is set as an option value, Spark will fail the reading if it sees ancient dates/timestamps that are ambiguous between the two calendars.

### Why are the changes needed?
1. The new option allows loading Avro files from two or more sources in different rebasing modes within the same query. For instance:
```scala
val df1 = spark.read.option("datetimeRebaseMode", "legacy").format("avro").load(folder1)
val df2 = spark.read.option("datetimeRebaseMode", "corrected").format("avro").load(folder2)
df1.join(df2, ...)
```
Before the changes, this was impossible because the SQL config `spark.sql.legacy.avro.datetimeRebaseModeInRead` affects both reads.

2. Mixing Dataset/DataFrame and RDD APIs becomes possible. Since SQL configs are not propagated through RDDs, the following code fails on ancient timestamps:
```scala
spark.conf.set("spark.sql.legacy.avro.datetimeRebaseModeInRead", "legacy")
spark.read.format("avro").load(folder).distinct.rdd.collect()
```

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the modified test suites:
```
$ build/sbt "test:testOnly *AvroV1Suite"
$ build/sbt "test:testOnly *AvroV2Suite"
```

Closes #31529 from MaxGekk/avro-rebase-options.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c082c53)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSerdeSuite.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroPartitionReaderFactory.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala (diff)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala (diff)
Commit 32a523b56fc329e858f3a463314c13e3710be5e0 by wenchen
[SPARK-34234][SQL] Remove TreeNodeException that didn't work

### What changes were proposed in this pull request?
`TreeNodeException` makes the error message unclear, and it didn't work well.
Because `TreeNodeException` is redundant, we can remove it.

Here is an example:
```
val df = Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4)).toDF("x", "y")
val hashAggDF = df.groupBy("x").agg(sum("y"))
```
The above code will use `HashAggregateExec`. In order to ensure that an exception will be thrown when executing `HashAggregateExec`, I added `throw new RuntimeException("calculate error")` into https://github.com/apache/spark/blob/72b7f8abfb60d0008f1f9bed94ce1c367a7d7cce/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L85

So, if the above code is executed, `RuntimeException("calculate error")` will be thrown.
Before this PR, the error is:
```
execute, tree:
ShuffleQueryStage 0
+- Exchange hashpartitioning(x#105, 5), ENSURE_REQUIREMENTS, [id=#168]
   +- HashAggregate(keys=[x#105], functions=[partial_sum(y#106)], output=[x#105, sum#118L])
      +- Project [_1#100 AS x#105, _2#101 AS y#106]
         +- LocalTableScan [_1#100, _2#101]

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
ShuffleQueryStage 0
+- Exchange hashpartitioning(x#105, 5), ENSURE_REQUIREMENTS, [id=#168]
   +- HashAggregate(keys=[x#105], functions=[partial_sum(y#106)], output=[x#105, sum#118L])
      +- Project [_1#100 AS x#105, _2#101 AS y#106]
         +- LocalTableScan [_1#100, _2#101]

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.doMaterialize(QueryStageExec.scala:163)
at org.apache.spark.sql.execution.adaptive.QueryStageExec.$anonfun$materialize$1(QueryStageExec.scala:81)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.adaptive.QueryStageExec.materialize(QueryStageExec.scala:79)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5(AdaptiveSparkPlanExec.scala:207)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5$adapted(AdaptiveSparkPlanExec.scala:205)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:205)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:179)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:289)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3708)
at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2977)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3699)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3697)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2977)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$assertNoExceptions$3(DataFrameAggregateSuite.scala:665)
at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
at org.apache.spark.sql.DataFrameAggregateSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(DataFrameAggregateSuite.scala:37)
at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246)
at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244)
at org.apache.spark.sql.DataFrameAggregateSuite.withSQLConf(DataFrameAggregateSuite.scala:37)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$assertNoExceptions$2(DataFrameAggregateSuite.scala:659)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$assertNoExceptions$2$adapted(DataFrameAggregateSuite.scala:655)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
at org.apache.spark.sql.DataFrameAggregateSuite.assertNoExceptions(DataFrameAggregateSuite.scala:655)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$new$126(DataFrameAggregateSuite.scala:695)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$new$126$adapted(DataFrameAggregateSuite.scala:695)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$new$125(DataFrameAggregateSuite.scala:695)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176)
at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:61)
at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:61)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
at org.scalatest.Suite.run(Suite.scala:1112)
at org.scalatest.Suite.run$(Suite.scala:1094)
at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:61)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
at org.scalatest.tools.Runner$.run(Runner.scala:798)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:131)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
HashAggregate(keys=[x#105], functions=[partial_sum(y#106)], output=[x#105, sum#118L])
+- Project [_1#100 AS x#105, _2#101 AS y#106]
   +- LocalTableScan [_1#100, _2#101]

at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doExecute(HashAggregateExec.scala:84)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:118)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:118)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture$lzycompute(ShuffleExchangeExec.scala:122)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture(ShuffleExchangeExec.scala:121)
at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.$anonfun$doMaterialize$1(QueryStageExec.scala:163)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
... 91 more
Caused by: java.lang.RuntimeException: calculate error
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doExecute$1(HashAggregateExec.scala:85)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
... 103 more
```

After this PR, the error is:
```
calculate error
java.lang.RuntimeException: calculate error
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doExecute(HashAggregateExec.scala:84)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:117)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:117)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture$lzycompute(ShuffleExchangeExec.scala:121)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.mapOutputStatisticsFuture(ShuffleExchangeExec.scala:120)
at org.apache.spark.sql.execution.adaptive.ShuffleQueryStageExec.doMaterialize(QueryStageExec.scala:161)
at org.apache.spark.sql.execution.adaptive.QueryStageExec.$anonfun$materialize$1(QueryStageExec.scala:80)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.adaptive.QueryStageExec.materialize(QueryStageExec.scala:78)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5(AdaptiveSparkPlanExec.scala:207)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$5$adapted(AdaptiveSparkPlanExec.scala:205)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$getFinalPhysicalPlan$1(AdaptiveSparkPlanExec.scala:205)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.getFinalPhysicalPlan(AdaptiveSparkPlanExec.scala:179)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:289)
at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3708)
at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2977)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3699)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3697)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2977)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$assertNoExceptions$3(DataFrameAggregateSuite.scala:665)
at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54)
at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38)
at org.apache.spark.sql.DataFrameAggregateSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(DataFrameAggregateSuite.scala:37)
at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:246)
at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:244)
at org.apache.spark.sql.DataFrameAggregateSuite.withSQLConf(DataFrameAggregateSuite.scala:37)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$assertNoExceptions$2(DataFrameAggregateSuite.scala:659)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$assertNoExceptions$2$adapted(DataFrameAggregateSuite.scala:655)
at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
at org.apache.spark.sql.DataFrameAggregateSuite.assertNoExceptions(DataFrameAggregateSuite.scala:655)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$new$126(DataFrameAggregateSuite.scala:695)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$new$126$adapted(DataFrameAggregateSuite.scala:695)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.DataFrameAggregateSuite.$anonfun$new$125(DataFrameAggregateSuite.scala:695)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:176)
at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:61)
at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234)
at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227)
at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:61)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:233)
at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:233)
at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:232)
at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1563)
at org.scalatest.Suite.run(Suite.scala:1112)
at org.scalatest.Suite.run$(Suite.scala:1094)
at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1563)
at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:237)
at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:237)
at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:236)
at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:61)
at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13(Runner.scala:1320)
at org.scalatest.tools.Runner$.$anonfun$doRunRunRunDaDoRunRun$13$adapted(Runner.scala:1314)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1314)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24(Runner.scala:993)
at org.scalatest.tools.Runner$.$anonfun$runOptionallyWithPassFailReporter$24$adapted(Runner.scala:971)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1480)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:971)
at org.scalatest.tools.Runner$.run(Runner.scala:798)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:131)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
```

### Why are the changes needed?
`TreeNodeException` didn't work well.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Jenkins test.

Closes #31337 from beliefer/SPARK-34234.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 32a523b)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was removed sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/errors/package.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/RuleExecutorSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/ExpandExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/ReferenceSort.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerStructuralIntegrityCheckerSuite.scala (diff)