Changes

Summary

  1. [SPARK-35456][CORE] Print the invalid value in config validation error message (commit: 2e9936d) (details)
  2. [SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations (commit: e1296ea) (details)
  3. [SPARK-35479][SQL] Format PartitionFilters IN strings in scan nodes (commit: 1a923f5) (details)
  4. [SPARK-35063][SQL] Group exception messages in sql/catalyst (commit: c740c09) (details)
  5. [SPARK-35427][SQL][TESTS] Check the `EXCEPTION` rebase mode for Avro/Parquet (commit: 2b08070) (details)
  6. [SPARK-35063][SQL][FOLLOWUP] Fix scala 2.13 error (commit: 8373785) (details)
  7. [SPARK-35454][SQL] One LogicalPlan can match multiple dataset ids (commit: 34284c0) (details)
  8. [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page (commit: 0fe65b5) (details)
  9. [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page (commit: 419ddcb) (details)
  10. [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page (commit: d2bdd65) (details)
  11. [SPARK-35482][K8S] Use `spark.blockManager.port` not the wrong `spark.blockmanager.port` in BasicExecutorFeatureStep (commit: d957426) (details)
  12. [SPARK-34558][SQL][FOLLOWUP] Do not fail Spark startup with a broken FileSystem (commit: 7274e3a) (details)
  13. [SPARK-34558][SQL][TESTS][FOLLOWUP] Fix a test to use an unknown filesystem (commit: cc05daa) (details)
  14. [SPARK-35439][SQL] Children subexpr should come before parent subexpr (commit: 066944c) (details)
  15. [SPARK-35465][PYTHON] Set up the mypy configuration to enable disallow_untyped_defs check for pandas APIs on Spark module (commit: 2616d5c) (details)
  16. [SPARK-28551][SQL][FOLLOWUP] Use the corrected hadoop conf (commit: b624b7e) (details)
  17. [SPARK-35487][BUILD] Upgrade dropwizard metrics to 4.2.0 (commit: 6bd6e46) (details)
  18. [MINOR][SQL] Change the script name for creating oracle docker image (commit: 0549caf) (details)
  19. [SPARK-35488][BUILD] Upgrade ASM to 7.3.1 (commit: 003294c) (details)
  20. [SPARK-35489][BUILD] Upgrade ORC to 1.6.8 (commit: fa424ac) (details)
  21. [SPARK-35480][SQL] Make percentile_approx work with pivot (commit: 1d9f09d) (details)
  22. [SPARK-35463][BUILD][FOLLOWUP] Redirect output for skipping checksum check (commit: 594ffd2) (details)
  23. [MINOR][FOLLOWUP] Update SHA for the oracle docker image (commit: a59a214) (details)
  24. [SPARK-35226][SQL][FOLLOWUP] Fix test added in SPARK-35226 for DB2KrbIntegrationSuite (commit: 1a43415) (details)
  25. [SPARK-35493][K8S] make `spark.blockManager.port` fallback for `spark.driver.blockManager.port` the same as other cluster managers (commit: 96b0548) (details)
  26. [SPARK-35492][BUILD] Upgrade httpcore to 4.4.14 (commit: d5868eb) (details)
  27. [SPARK-35410][SQL] SubExpr elimination should not include redundant children exprs in conditional expression (commit: 9e1b204) (details)
Commit 2e9936db93aae6e5b16b907b74ec0ae75b29cf9a by gurwls223
[SPARK-35456][CORE] Print the invalid value in config validation error message

### What changes were proposed in this pull request?

Print the invalid value in the config validation error message for `checkValue`, just as `checkValues` already does.
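
As a hedged sketch using Spark's internal `ConfigBuilder` API (the config entry below is hypothetical, and the exact message wording is illustrative):

```scala
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical entry; checkValue takes a predicate plus an error message.
val MAX_RETRIES = ConfigBuilder("spark.test.maxRetries")
  .intConf
  .checkValue(_ >= 0, "retries must be non-negative")
  .createWithDefault(3)

// Setting spark.test.maxRetries=-1 previously failed with only
// "retries must be non-negative"; after this change the message also
// names the offending value and key, e.g. something like
// "'-1' in spark.test.maxRetries is invalid. retries must be non-negative".
```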

### Why are the changes needed?

Invalid configuration values can arise in many ways; this PR helps users and developers identify which config an error relates to.

### Does this PR introduce _any_ user-facing change?

Yes, but only the error message.

### How was this patch tested?

Yes; modified existing tests.

Closes #32600 from yaooqinn/SPARK-35456.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 2e9936d)
The file was modified core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/timezone.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfEntrySuite.scala (diff)
Commit e1296eab5fa23d40b23a2a7330e65aec9e7e1a85 by ltnwgl
[SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations

### What changes were proposed in this pull request?

This PR reduces the execution time of `DeduplicateRelations` by:

1) using a `Set` instead of a `Seq` to check for duplicate relations

2) avoiding plan output traversal and attribute rewrites when there are no changes in the child plans (see the sketch below)
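
A minimal sketch of both ideas, using hypothetical stand-in types rather than the actual catalyst plan classes:

```scala
import scala.collection.mutable

object DedupSketch {
  // Hypothetical stand-in for a leaf relation identified by an id.
  final case class Relation(id: Long)
  private var counter = 1000L
  private def freshId(): Long = { counter += 1; counter }

  // Returns deduplicated relations plus a flag so callers can skip the
  // expensive output traversal and attribute rewrite when nothing changed.
  def dedup(relations: Seq[Relation]): (Seq[Relation], Boolean) = {
    val seen = mutable.HashSet.empty[Relation] // Set: O(1) membership checks
    var changed = false
    val out = relations.map { r =>
      if (seen.add(r)) r // first occurrence: keep as-is
      else { changed = true; Relation(freshId()) } // duplicate: re-key (stands in for re-aliasing)
    }
    (out, changed)
  }
}
```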

### Why are the changes needed?

Rule `DeduplicateRelations` is slow.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Run `TPCDSQuerySuite` and checked the run time of `DeduplicateRelations`. The time has been reduced by 77.9% after this PR.

Closes #32590 from Ngone51/improve-dedup.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: e1296ea)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala (diff)
Commit 1a923f5319f5ba62811cea1208c95b4ae8e65a42 by wenchen
[SPARK-35479][SQL] Format PartitionFilters IN strings in scan nodes

### What changes were proposed in this pull request?

This PR proposes to format strings correctly for `PushedFilters`. For example, `explain()` for the query below prints `v in (array('a'), null)` as `PushedFilters: [In(v, [WrappedArray(a),null])]`;

```
scala> sql("create table t (v array<string>) using parquet")
scala> sql("select * from t where v in (array('a'), null)").explain()
== Physical Plan ==
*(1) Filter v#4 IN ([a],null)
+- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN ([a],null)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], PartitionFilters: [], PushedFilters: [In(v, [WrappedArray(a),null])], ReadSchema: struct<v:array<string>>
```
This PR makes `explain()` print it as `PushedFilters: [In(v, [[a],null])]`;
```
scala> sql("select * from t where v in (array('a'), null)").explain()
== Physical Plan ==
*(1) Filter v#4 IN ([a],null)
+- FileScan parquet default.t[v#4] Batched: false, DataFilters: [v#4 IN ([a],null)], Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-3.1.1-bin-hadoop2.7/spark-warehouse/t], PartitionFilters: [], PushedFilters: [In(v, [[a],null])], ReadSchema: struct<v:array<string>>
```
NOTE: This PR includes a bugfix caused by #32577 (See the cloud-fan comment: https://github.com/apache/spark/pull/32577/files#r636108150).
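
A hedged sketch of the kind of formatting fix involved (this helper is illustrative, not the exact code in `filters.scala`):

```scala
// Render filter values recursively so nested collections print as [a]
// rather than WrappedArray(a).
def formatValue(v: Any): String = v match {
  case null => "null"
  case seq: Seq[_] => seq.map(formatValue).mkString("[", ",", "]")
  case arr: Array[_] => arr.map(formatValue).mkString("[", ",", "]")
  case other => other.toString
}

formatValue(Seq(Seq("a"), null)) // "[[a],null]"
```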

### Why are the changes needed?

To improve explain strings.

### Does this PR introduce _any_ user-facing change?

Yes, this PR improves the explain strings for pushed-down filters.

### How was this patch tested?

Added tests in `SQLQueryTestSuite`.

Closes #32615 from maropu/ExplainPartitionFilters.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 1a923f5)
The file was modified sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/explain.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/explain.sql (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala (diff)
Commit c740c097e086ddaa30737ae63501d7ca547ea4c5 by wenchen
[SPARK-35063][SQL] Group exception messages in sql/catalyst

### What changes were proposed in this pull request?
This PR groups exception messages in `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst`.
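
The pattern, in a hedged sketch: call sites throw through centralized error factories such as `QueryCompilationErrors` and `QueryExecutionErrors` (both appear in the diff below) instead of constructing messages inline. The method and message here are hypothetical:

```scala
object QueryCompilationErrors {
  // Illustrative entry only, not an actual Spark error method.
  def unresolvedColumnError(name: String): Throwable =
    new IllegalStateException(s"Column '$name' does not exist")
}

// A call site then throws the centrally defined error:
// throw QueryCompilationErrors.unresolvedColumnError(colName)
```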

### Why are the changes needed?
It will largely help standardize error messages and ease their maintenance.

### Does this PR introduce _any_ user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests; all original tests pass, to make sure no existing behavior breaks.

Closes #32478 from beliefer/SPARK-35063.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c740c09)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleIdCollection.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CTESubstitution.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/FailureSafeParser.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/ValueInterval.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUnion.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/First.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharVarcharUtils.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit 2b08070e79134304fa1a15c74e94c5f1c6359ac7 by wenchen
[SPARK-35427][SQL][TESTS] Check the `EXCEPTION` rebase mode for Avro/Parquet

### What changes were proposed in this pull request?
Add tests to check the `EXCEPTION` rebase mode explicitly in the datasources (a sketch follows the list):
- Parquet: `DATE` type and `TIMESTAMP`: `INT96`, `TIMESTAMP_MICROS`, `TIMESTAMP_MILLIS`
- Avro: `DATE` type and `TIMESTAMP`: `timestamp-millis` and `timestamp-micros`.
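
A minimal sketch of such a check on the Parquet write path, assuming the Spark 3.1-era config key and that the failure surfaces as a `SparkException` whose cause is a `SparkUpgradeException` (the exact wrapping is an assumption here):

```scala
import org.apache.spark.SparkException
import org.scalatest.Assertions.intercept
import spark.implicits._

// EXCEPTION mode: writing a pre-Gregorian date must fail rather than be
// silently rebased.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "EXCEPTION")
val err = intercept[SparkException] {
  Seq(java.sql.Date.valueOf("1001-01-01")).toDF("d")
    .write.mode("overwrite").parquet("/tmp/rebase-exception-check")
}
```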

### Why are the changes needed?
1. To improve test coverage
2. The `EXCEPTION` rebase mode should be checked independently from the default settings.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the affected test suites:
```
$ build/sbt "test:testOnly *AvroV2Suite"
$ build/sbt "test:testOnly *ParquetRebaseDatetimeV1Suite"
```

Closes #32574 from MaxGekk/test-rebase-exception.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 2b08070)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRebaseDatetimeSuite.scala (diff)
Commit 83737852f08b609d705e0a6aaa56dee6bf2bb614 by wenchen
[SPARK-35063][SQL][FOLLOWUP] Fix scala 2.13 error

### What changes were proposed in this pull request?

Fix a Scala 2.13 compile error in `TableOutputResolver` introduced by SPARK-35063.

### Why are the changes needed?

Fix the Scala compile error.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GA

Closes #32617 from ulysses-you/scala2-13.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 8373785)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala (diff)
Commit 34284c06496cd621792c0f9dfc90435da0ab9eb5 by wenchen
[SPARK-35454][SQL] One LogicalPlan can match multiple dataset ids

### What changes were proposed in this pull request?

Change the type of `DATASET_ID_TAG` from `Long` to `HashSet[Long]` to allow the logical plan to match multiple datasets.
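
A simplified sketch of the tag change (the real tag lives on catalyst's `TreeNode`; the class here is a stand-in):

```scala
import scala.collection.mutable

final class PlanNode {
  // Before: at most one owning dataset id could be recorded per plan.
  // After: an unchanged plan shared by several Datasets records all of them.
  val datasetIds: mutable.HashSet[Long] = mutable.HashSet.empty[Long]
}

val plan = new PlanNode
plan.datasetIds += 1L // id of df1
plan.datasetIds += 2L // id of df2 = df1.toDF(), which reuses the same plan
```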

### Why are the changes needed?

During the transformation from one Dataset to another, the `DATASET_ID_TAG` of the logical plan won't change if the plan itself doesn't change:

https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L234-L237

However, the dataset id always changes, even if the logical plan doesn't change:
https://github.com/apache/spark/blob/b5241c97b17a1139a4ff719bfce7f68aef094d95/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L207-L208

And this can lead to a mismatch between the dataset's id and a column's `__dataset_id`. E.g.,

```scala
  test("SPARK-28344: fail ambiguous self join - Dataset.colRegex as column ref") {
    // The test can fail if we change it to:
    // val df1 = spark.range(3).toDF()
    // val df2 = df1.filter($"id" > 0).toDF()
    val df1 = spark.range(3)
    val df2 = df1.filter($"id" > 0)

    withSQLConf(
      SQLConf.FAIL_AMBIGUOUS_SELF_JOIN_ENABLED.key -> "true",
      SQLConf.CROSS_JOINS_ENABLED.key -> "true") {
      assertAmbiguousSelfJoin(df1.join(df2, df1.colRegex("id") > df2.colRegex("id")))
    }
  }
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.

Closes #32616 from Ngone51/fix-ambiguous-join.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 34284c0)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/analysis/DetectAmbiguousSelfJoin.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala (diff)
Commit 0fe65b536585b2c4b65df355dde45adb583f0f1c by gurwls223
[SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

### What changes were proposed in this pull request?

This PR proposes to move ORC data source options from Python, Scala and Java into a single page.

### Why are the changes needed?

So far, the documentation for ORC data source options has been spread across the API documentation for each language. This makes the many options inconvenient to manage, so it is more efficient to document all options on a single page and link to that page from each language's API documentation.

### Does this PR introduce _any_ user-facing change?

Yes; after this change, the documents appear as shown below:

- "ORC Files" page
![Screen Shot 2021-05-21 at 2 07 14 PM](https://user-images.githubusercontent.com/44108233/119085078-f4564d00-ba3d-11eb-8990-3ba031d809da.png)

- Python
![Screen Shot 2021-05-21 at 2 06 46 PM](https://user-images.githubusercontent.com/44108233/119085097-00daa580-ba3e-11eb-8017-ac5a95a7c053.png)

- Scala
![Screen Shot 2021-05-21 at 2 06 09 PM](https://user-images.githubusercontent.com/44108233/119085135-164fcf80-ba3e-11eb-9cac-78dded523f38.png)

- Java
![Screen Shot 2021-05-21 at 2 06 30 PM](https://user-images.githubusercontent.com/44108233/119085125-118b1b80-ba3e-11eb-9434-f26612d7da13.png)

### How was this patch tested?

Manually build docs and confirm the page.

Closes #32546 from itholic/SPARK-35395.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 0fe65b5)
The file was modified docs/sql-data-sources-orc.md (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
Commit 419ddcb2a433ee913648b5fae58a8f368824b457 by gurwls223
[SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

### What changes were proposed in this pull request?

This PR proposes to move JSON data source options from Python, Scala and Java into a single page.

### Why are the changes needed?

So far, the documentation for JSON data source options has been spread across the API documentation for each language. This makes the many options inconvenient to manage, so it is more efficient to document all options on a single page and link to that page from each language's API documentation.

### Does this PR introduce _any_ user-facing change?

Yes; after this change, the documents appear as shown below:

- "JSON Files" page
<img width="876" alt="Screen Shot 2021-05-20 at 8 48 27 PM" src="https://user-images.githubusercontent.com/44108233/118973662-ddb3e580-b9ac-11eb-987c-8139aa9c3fe2.png">

- Python
<img width="714" alt="Screen Shot 2021-04-16 at 5 04 11 PM" src="https://user-images.githubusercontent.com/44108233/114992491-ca0cef00-9ed5-11eb-9d0f-4de60d8b2516.png">

- Scala
<img width="726" alt="Screen Shot 2021-04-16 at 5 04 54 PM" src="https://user-images.githubusercontent.com/44108233/114992594-e315a000-9ed5-11eb-8bd3-af7e568fcfe1.png">

- Java
<img width="911" alt="Screen Shot 2021-04-16 at 5 06 11 PM" src="https://user-images.githubusercontent.com/44108233/114992751-10624e00-9ed6-11eb-888c-8668d3c74289.png">

### How was this patch tested?

Manually build docs and confirm the page.

Closes #32204 from itholic/SPARK-35081.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 419ddcb)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (diff)
The file was modified docs/sql-data-sources-json.md (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala (diff)
Commit d2bdd6595ec3495672c60e225948c99bfaeff04a by gurwls223
[SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page

### What changes were proposed in this pull request?

This PR proposes to move Parquet data source options from Python, Scala and Java into a single page.

### Why are the changes needed?

So far, the documentation for Parquet data source options has been spread across the API documentation for each language. This makes the many options inconvenient to manage, so it is more efficient to document all options on a single page and link to that page from each language's API documentation.

### Does this PR introduce _any_ user-facing change?

Yes; after this change, the documents appear as shown below:

- "Parquet Files" page
![Screen Shot 2021-05-21 at 1 35 08 PM](https://user-images.githubusercontent.com/44108233/119082866-e7375f00-ba39-11eb-9ade-a931a5957b34.png)

- Python
![Screen Shot 2021-05-21 at 1 38 27 PM](https://user-images.githubusercontent.com/44108233/119082879-eef70380-ba39-11eb-9e8e-ee50eed98dbe.png)

- Scala
![Screen Shot 2021-05-21 at 1 36 52 PM](https://user-images.githubusercontent.com/44108233/119082884-f1595d80-ba39-11eb-98d5-966657df65f7.png)

- Java
![Screen Shot 2021-05-21 at 1 37 19 PM](https://user-images.githubusercontent.com/44108233/119082888-f4544e00-ba39-11eb-8bf8-47ce78ec0b01.png)

### How was this patch tested?

Manually build docs and confirm the page.

Closes #32161 from itholic/SPARK-34491.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: d2bdd65)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified docs/sql-data-sources-parquet.md (diff)
Commit d957426351149dd1b4e1106d1230f395934f61d2 by dhyun
[SPARK-35482][K8S] Use `spark.blockManager.port` not the wrong `spark.blockmanager.port` in BasicExecutorFeatureStep

### What changes were proposed in this pull request?

Most Spark conf keys are case-sensitive, including `spark.blockManager.port`, so we cannot get the correct port number with `spark.blockmanager.port`.
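
A quick illustration of the bug class using `SparkConf` directly:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
conf.set("spark.blockManager.port", "10000")

conf.getOption("spark.blockManager.port") // Some("10000")
conf.getOption("spark.blockmanager.port") // None: the wrong-cased key misses
```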

This PR changes the wrong key to `spark.blockManager.port` in `BasicExecutorFeatureStep`.

This PR also ensures a fast failure when the port value is invalid for executor containers. When 0 is specified (valid as a random port for Spark, but invalid in a k8s request), it should not be put in the `containerPort` field of the executor pod spec. We do not want executor pods to continuously fail to be created because of invalid requests.

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

Closes #32621 from yaooqinn/SPARK-35482.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: d957426)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala (diff)
Commit 7274e3a4d24b85c99c42c115a227df42074ea739 by dhyun
[SPARK-34558][SQL][FOLLOWUP] Do not fail Spark startup with a broken FileSystem

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/31671

https://github.com/apache/spark/pull/31671 qualifies the warehouse path at the beginning, which may fail Spark startup if something goes wrong, e.g. the underlying FileSystem can't be initialized.

This PR falls back to the old behavior and leaves the warehouse path unqualified if qualifying fails.
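
A hedged sketch of the fallback (the real change is in `SharedState`; this helper is illustrative):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

def qualifiedWarehousePath(raw: String, hadoopConf: Configuration): String =
  try {
    val path = new Path(raw)
    // Qualifying resolves the path against its FileSystem (adds scheme/authority).
    path.getFileSystem(hadoopConf).makeQualified(path).toString
  } catch {
    case _: Exception =>
      // Broken FileSystem: keep the unqualified path so Spark can still start.
      raw
  }
```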

### Why are the changes needed?

Fix a regression. It's important to always be able to start a Spark app (e.g. spark-shell) so that we can debug.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

a new test case

Closes #32622 from cloud-fan/follow2.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 7274e3a)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala (diff)
Commit cc05daa8842a6c2fa8d58217f5448b123c59136b by dhyun
[SPARK-34558][SQL][TESTS][FOLLOWUP] Fix a test to use an unknown filesystem

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/32622 to fix a test case.

### Why are the changes needed?

Fix a wrong test case name, and make the test case trigger the expected error correctly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #32623 from dongjoon-hyun/SPARK-34558.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: cc05daa)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala (diff)
Commit 066944c1bd7c577f8cb17332cc0d1eb0a84cd297 by viirya
[SPARK-35439][SQL] Children subexpr should come before parent subexpr

### What changes were proposed in this pull request?

This patch sorts equivalent expressions based on their child-parent relation.

### Why are the changes needed?

`EquivalentExpressions` maintains a map of equivalent expressions. It is currently a `HashMap`, so the insertion order is not guaranteed to be preserved later. Subexpression elimination relies on retrieving subexpressions from the map. If there are child-parent relationships among the subexpressions, we want the child expressions to come before the parent expressions, so that we can replace child expressions inside parent expressions with the subexpression evaluation.

For example, suppose `add = Add(Literal(1), Literal(2))` and its parent expression `Add(Literal(3), add)`.

Case 1: child subexpr comes first.
```scala
addExprTree(add)
addExprTree(Add(Literal(3), add))
addExprTree(Add(Literal(3), add))
```

Case 2: parent subexpr comes first. For this case, we need to sort equivalent expressions.
```
addExprTree(Add(Literal(3), add))  => We add `Add(Literal(3), add)` into the map first, then add `add` into the map
addExprTree(add)
addExprTree(Add(Literal(3), add))
```

Since we sort the equivalent expressions anyway, we don't need a `LinkedHashMap`; plain sorting suffices.
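
One way to get the child-before-parent order, sketched with a hypothetical expression type, is to sort candidates by tree height so that smaller (child) expressions come first:

```scala
final case class Expr(name: String, children: Seq[Expr] = Nil)

def height(e: Expr): Int =
  if (e.children.isEmpty) 1 else 1 + e.children.map(height).max

// Children (smaller height) sort before any parent that contains them.
def sortForElimination(subexprs: Seq[Expr]): Seq[Expr] =
  subexprs.sortBy(height)
```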

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added tests.

Closes #32586 from viirya/use-listhashmap.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 066944c)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala (diff)
Commit 2616d5cc1d4b4cf42c9124eb12a7d70b909e548b by ueshin
[SPARK-35465][PYTHON] Set up the mypy configuration to enable disallow_untyped_defs check for pandas APIs on Spark module

### What changes were proposed in this pull request?

Sets up the `mypy` configuration to enable `disallow_untyped_defs` check for pandas APIs on Spark module.

### Why are the changes needed?

Currently, many functions in the main code of the pandas APIs on Spark module are still missing type annotations, and the `mypy` check `disallow_untyped_defs` is disabled.

We should add more type annotations and enable the mypy check.

### Does this PR introduce _any_ user-facing change?

Yes.
This PR adds more type annotations in pandas APIs on Spark module, which can impact interaction with development tools for users.

### How was this patch tested?

The mypy check with a new configuration and existing tests should pass.

Closes #32614 from ueshin/issues/SPARK-35465/disallow_untyped_defs.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: 2616d5c)
The file was modified python/pyspark/pandas/indexes/multi.py (diff)
The file was modified python/pyspark/pandas/extensions.py (diff)
The file was modified python/pyspark/pandas/internal.py (diff)
The file was modified python/pyspark/pandas/mlflow.py (diff)
The file was modified python/pyspark/pandas/base.py (diff)
The file was modified python/pyspark/pandas/numpy_compat.py (diff)
The file was modified python/pyspark/pandas/__init__.py (diff)
The file was modified python/pyspark/pandas/exceptions.py (diff)
The file was modified python/pyspark/pandas/indexes/category.py (diff)
The file was modified python/pyspark/pandas/spark/utils.py (diff)
The file was modified python/pyspark/pandas/spark/functions.py (diff)
The file was modified python/pyspark/pandas/datetimes.py (diff)
The file was modified python/pyspark/pandas/indexes/datetimes.py (diff)
The file was modified python/pyspark/pandas/strings.py (diff)
The file was modified python/pyspark/pandas/config.py (diff)
The file was modified python/pyspark/pandas/indexes/numeric.py (diff)
The file was modified python/pyspark/pandas/series.py (diff)
The file was modified python/mypy.ini (diff)
The file was modified python/pyspark/pandas/typedef/string_typehints.py (diff)
The file was modified python/pyspark/pandas/categorical.py (diff)
The file was modified python/pyspark/pandas/sql_processor.py (diff)
The file was modified python/pyspark/pandas/indexes/base.py (diff)
The file was modified python/pyspark/pandas/ml.py (diff)
Commit b624b7e93f3ae07d9550ef3fd258d73f0eeb1bf6 by yao
[SPARK-28551][SQL][FOLLOWUP] Use the corrected hadoop conf

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/32411, to fix a mistake and use `sparkSession.sessionState.newHadoopConf`, which includes SQL configs, instead of `sparkSession.sparkContext.hadoopConfiguration`.
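
A minimal sketch of the difference (note that `sessionState` is `private[sql]`, so this runs inside Spark's own `sql` package; the config key is just an example):

```scala
// newHadoopConf() clones the core Hadoop conf and then overlays the
// session-level SQL confs, so per-session overrides take effect.
spark.conf.set("dfs.blocksize", "134217728") // session-level override
spark.sessionState.newHadoopConf().get("dfs.blocksize") // "134217728"
spark.sparkContext.hadoopConfiguration.get("dfs.blocksize") // core value only
```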

### Why are the changes needed?

fix mistake

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #32618 from cloud-fan/follow1.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Kent Yao <yao@apache.org>
(commit: b624b7e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala (diff)
Commit 6bd6e46aeccf2f0706461dd5a66ecb53b5ec38ef by dhyun
[SPARK-35487][BUILD] Upgrade dropwizard metrics to 4.2.0

### What changes were proposed in this pull request?

This PR upgrades Dropwizard metrics to 4.2.0.
I also modified the corresponding links in `docs/monitoring.md`.

### Why are the changes needed?

The latest version was released last week and it contains some improvements.
https://github.com/dropwizard/metrics/releases/tag/v4.2.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Build succeeds and all the modified links are reachable.

Closes #32628 from sarutak/upgrade-dropwizard-4.2.0.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 6bd6e46)
The file was modified docs/monitoring.md (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
Commit 0549caf07a8556a2fd107306d3342664ed217423 by dhyun
[MINOR][SQL] Change the script name for creating oracle docker image

### What changes were proposed in this pull request?

This PR changes the name in the comment in `jdbc.OracleIntegrationSuite` and `v2.OracleIntegrationSuite`.
The script is for creating oracle docker image.

### Why are the changes needed?

The script is now named `buildContainerImage`, not `buildDockerImage`.
- https://github.com/oracle/docker-images/commit/d918f5a4c6da9b0f294afb8650d1f6f951964262#diff-be303ab32e74192aca829e5ea259a0aec07aac23a6049120fb337ec4efa601b0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that I can build the image with `./buildContainerImage.sh -v 18.4.0 -x`.

Closes #32629 from sarutak/change-oracle-container-script-name.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 0549caf)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala (diff)
Commit 003294ce1dbe03ac00573ca92f954e31b162386b by sarutak
[SPARK-35488][BUILD] Upgrade ASM to 7.3.1

### What changes were proposed in this pull request?
This PR aims to upgrade ASM to 7.3.1.

- https://issues.apache.org/jira/browse/XBEAN-323
- https://asm.ow2.io/versions.html

### Why are the changes needed?
ASM 7.3.1 brings the following changes:

- new V15 constant
- experimental support for PermittedSubtypes and RecordComponent
- bug fixes
  - 317885: SKIP_DEBUG now skips MethodParameters attributes

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran with the existing UTs

Closes #32634 from vinodkc/br_build_upgrade_asm.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: 003294c)
The file was modified pom.xml (diff)
The file was modified project/plugins.sbt (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
Commit fa424ac2b851ceaf3b2094cafa218d2867cc0da1 by dhyun
[SPARK-35489][BUILD] Upgrade ORC to 1.6.8

### What changes were proposed in this pull request?

This PR aims to upgrade ORC to 1.6.8.

### Why are the changes needed?

This will bring the latest bug fixes.
- https://orc.apache.org/news/2021/05/21/ORC-1.6.8/

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the existing CIs.

Closes #32635 from dongjoon-hyun/SPARK-35489.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: fa424ac)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
Commit 1d9f09decb280beb313f3ff0384364a40c47c4b4 by gurwls223
[SPARK-35480][SQL] Make percentile_approx work with pivot

### What changes were proposed in this pull request?

This PR proposes to avoid wrapping if-else around the constant literals for `percentage` and `accuracy` in `percentile_approx`. They are expected to be literals (or foldable expressions).
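
A hedged sketch of the idea in catalyst terms (the helper name is illustrative, not the actual code in `Analyzer.scala`):

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, If, Literal}

// When rewriting aggregate arguments for pivot, only non-foldable inputs need
// the If(pivot-column-matches, value, null) wrapper; foldable arguments such
// as the percentage/accuracy literals pass through unchanged, so they remain
// constants and satisfy percentile_approx's "constant literal" check.
def pivotAggArg(e: Expression, pivotMatch: Expression): Expression =
  if (e.foldable) e else If(pivotMatch, e, Literal.create(null, e.dataType))
```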

Pivot works by two-phase aggregation, and it works by changing the input to `null` for non-matched values (of the pivot column and value).

Note that for some types (basically non-nested types), pivot supports an optimized path without this change-input-to-`null` logic, so the issue fixed by this PR only affects complex types.

```scala
val df = Seq(
  ("a", -1.0), ("a", 5.5), ("a", 2.5), ("b", 3.0), ("b", 5.2)).toDF("type", "value")
  .groupBy().pivot("type", Seq("a", "b")).agg(
    percentile_approx(col("value"), array(lit(0.5)), lit(10000)))
df.show()
```

**Before:**

```
org.apache.spark.sql.AnalysisException: cannot resolve 'percentile_approx((IF((type <=> CAST('a' AS STRING)), value, CAST(NULL AS DOUBLE))), (IF((type <=> CAST('a' AS STRING)), array(0.5D), NULL)), (IF((type <=> CAST('a' AS STRING)), 10000, CAST(NULL AS INT))))' due to data type mismatch: The accuracy or percentage provided must be a constant literal;
'Aggregate [percentile_approx(if ((type#7 <=> cast(a as string))) value#8 else cast(null as double), if ((type#7 <=> cast(a as string))) array(0.5) else cast(null as array<double>), if ((type#7 <=> cast(a as string))) 10000 else cast(null as int), 0, 0) AS a#16, percentile_approx(if ((type#7 <=> cast(b as string))) value#8 else cast(null as double), if ((type#7 <=> cast(b as string))) array(0.5) else cast(null as array<double>), if ((type#7 <=> cast(b as string))) 10000 else cast(null as int), 0, 0) AS b#18]
+- Project [_1#2 AS type#7, _2#3 AS value#8]
   +- LocalRelation [_1#2, _2#3]
```

**After:**

```
+-----+-----+
|    a|    b|
+-----+-----+
|[2.5]|[3.0]|
+-----+-----+
```

### Why are the changes needed?

To make percentile_approx work with pivot as expected

### Does this PR introduce _any_ user-facing change?

Yes. It threw an exception but now it returns a correct result as shown above.

### How was this patch tested?

Manually tested and unit test was added.

Closes #32619 from HyukjinKwon/SPARK-35480.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 1d9f09d)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFramePivotSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit 594ffd2db224c89f5375645a7a249d4befca6163 by dhyun
[SPARK-35463][BUILD][FOLLOWUP] Redirect output for skipping checksum check

### What changes were proposed in this pull request?

This patch is a followup of SPARK-35463. In SPARK-35463, we output a message to stdout; now we redirect it to stderr.

### Why are the changes needed?

All `echo` statements in `build/mvn` should redirect to stderr if they are not followed by `exit`, because other scripts use `build/mvn` and parse its stdout. If we don't redirect, callers can pick up invalid output, e.g. "Skipping checksum because shasum is not installed." being read as the `commons-cli` version.

### Does this PR introduce _any_ user-facing change?

No. Dev only.

### How was this patch tested?

Manually test on internal system.

Closes #32637 from viirya/fix-build.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 594ffd2)
The file was modified build/mvn (diff)
Commit a59a214610dd96a6be654ea94ac513ec491e99a2 by dhyun
[MINOR][FOLLOWUP] Update SHA for the oracle docker image

### What changes were proposed in this pull request?

This PR updates SHA for the oracle docker image in the comment in `OracleIntegrationSuite` and `v2.OracleIntegrationSuite`.
The SHA for the latest image is `3f422c4a35b423dfcdbcc57a84f01db6c82eb6c1`

### Why are the changes needed?

The script name for creating the oracle docker image was changed in #32629 to follow the latest image, so we also need to update the corresponding SHA.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The value is from `git log`.

Closes #32630 from sarutak/followup-oracle-script-name.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: a59a214)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala (diff)
Commit 1a43415d8de1d6462d64a6b6f8daa505a3a81970 by dhyun
[SPARK-35226][SQL][FOLLOWUP] Fix test added in SPARK-35226 for DB2KrbIntegrationSuite

### What changes were proposed in this pull request?

This PR fixes a test added in SPARK-35226 (#32344).

### Why are the changes needed?

`SELECT 1` is not a valid query for DB2, which requires a `FROM` clause (e.g. `SELECT 1 FROM SYSIBM.SYSDUMMY1`).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

DB2KrbIntegrationSuite passes on my laptop.

I also confirmed all the KrbIntegrationSuites pass with the following command.
```
build/sbt -Phive -Phive-thriftserver -Pdocker-integration-tests "testOnly org.apache.spark.sql.jdbc.*KrbIntegrationSuite"
```

Closes #32632 from sarutak/followup-SPARK-35226.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 1a43415)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerKrbJDBCIntegrationSuite.scala (diff)
Commit 96b0548ab6d5fe36833812f7b6424c984f75c6dd by dhyun
[SPARK-35493][K8S] make `spark.blockManager.port` fallback for `spark.driver.blockManager.port` the same as other cluster managers

### What changes were proposed in this pull request?

`spark.blockManager.port` does not work for k8s driver pods now; we should make it work as it does with other cluster managers.
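
A hedged sketch of the fallback pattern (simplified; the real change uses Spark's typed config entries in `BasicDriverFeatureStep`):

```scala
import org.apache.spark.SparkConf

def driverBlockManagerPort(conf: SparkConf): Int =
  conf.getOption("spark.driver.blockManager.port")
    .orElse(conf.getOption("spark.blockManager.port")) // the new fallback
    .map(_.toInt)
    .getOrElse(0) // 0 means pick a random port
```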

### Why are the changes needed?

`spark.blockManager.port` should be able to work for the Spark driver pod

### Does this PR introduce _any_ user-facing change?

Yes, `spark.blockManager.port` will be respected iff it is present and `spark.driver.blockManager.port` is absent

### How was this patch tested?

new tests

Closes #32639 from yaooqinn/SPARK-35493.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 96b0548)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala (diff)
Commit d5868ebc398c7b69dbaaaa3218bc9b177252a3c8 by dhyun
[SPARK-35492][BUILD] Upgrade httpcore to 4.4.14

### What changes were proposed in this pull request?
This PR aims to upgrade Apache HttpCore from 4.4.12 to 4.4.14.

### Why are the changes needed?
Stability improvements in httpcore 4.4.14:

- Bug fix: Non-blocking TLSv1.3 connections can end up in an infinite event spin when closed concurrently by the local and the remote endpoints.
- HTTPCORE-647: Non-blocking connection terminated due to 'java.io.IOException: Broken pipe' can enter an infinite loop flushing buffered output data.
- PR #201, HTTPCORE-634: Fix race condition in AbstractConnPool that can cause internal state corruption.
- HTTPCORE-612: DefaultConnectionReuseStrategy incorrectly used int to represent Content-Length value instead of long.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
With Jenkins Tests

Closes #32638 from vinodkc/br_build_upgrade_httpcore.

Authored-by: Vinod KC <vinod.kc.in@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: d5868eb)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
Commit 9e1b204bcce4a8fe24c1edd8271197277b5017f4 by dhyun
[SPARK-35410][SQL] SubExpr elimination should not include redundant children exprs in conditional expression

### What changes were proposed in this pull request?

This patch fixes a bug when dealing with common expressions in conditional expressions such as `CaseWhen` during subexpression elimination.

For example, previously we found common expressions among the conditions of `CaseWhen`, but their child expressions were also counted in. We should not count these child expressions as common expressions.
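
An illustration with the DataFrame API (column names are hypothetical):

```scala
import org.apache.spark.sql.functions.{col, when}

// (a + b) appears in two branch conditions, so it is a genuine common
// subexpression and should be evaluated once. Its children `a` and `b`
// on their own should not additionally be hoisted as common subexpressions.
val caseExpr = when(col("a") + col("b") > 1, col("x"))
  .when(col("a") + col("b") > 2, col("y"))
  .otherwise(col("z"))
```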

### Why are the changes needed?

If the redundant child expressions are counted as common expressions too, they will be redundantly evaluated, and the subexpression elimination opportunity is missed.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added tests.

Closes #32559 from viirya/SPARK-35410.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 9e1b204)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala (diff)