Changes

Summary

  1. [SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2.13 on GA (commit: b763db3) (details)
  2. [SPARK-35559][TEST] Speed up one test in AdaptiveQueryExecSuite (commit: 678592a) (details)
  3. [SPARK-35538][SQL] Migrate transformAllExpressions call sites to use (commit: 5c8a141) (details)
  4. [SPARK-31168][BUILD] Upgrade Scala to 2.12.14 (commit: 6c4b60f) (details)
  5. [SPARK-35532][TESTS] Ensure mllib and kafka-0-10 module can be maven (commit: 16d9de8) (details)
  6. [SPARK-35526][CORE][SQL][ML][MLLIB] Re-Cleanup `procedure syntax is (commit: 09d039d) (details)
  7. [SPARK-35550][BUILD] Upgrade Jackson to 2.12.3 (commit: ff27264) (details)
  8. [SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile (commit: 1a55019) (details)
  9. [SPARK-35545][SQL] Split SubqueryExpression's children field into outer (commit: 806da9d) (details)
  10. [SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub (commit: c225196) (details)
  11. [SPARK-35566][SS] Fix StateStoreRestoreExec output rows (commit: 73ba449) (details)
  12. [SPARK-35562][DOC] Fix docs about Kubernetes and Yarn (commit: 8c69e9c) (details)
  13. [SPARK-34808][SQL] Removes outer join if it only has DISTINCT on (commit: 6cd6c43) (details)
  14. [SPARK-35575][INFRA] Recover updating build status in GitHub Actions (commit: 14e12c6) (details)
  15. [SPARK-35411][SQL][FOLLOWUP] Handle Currying Product while serializing (commit: 1603775) (details)
Commit b763db3efdd6a58e34c136b03426371400afefd1 by sarutak
[SPARK-35194][SQL][FOLLOWUP] Recover build error with Scala 2.13 on GA

### What changes were proposed in this pull request?

This PR fixes a build error with Scala 2.13 on GA.
#32301 seems to bring this error.

### Why are the changes needed?

To recover CI.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA

Closes #32696 from sarutak/followup-SPARK-35194.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: b763db3)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala (diff)
Commit 678592a6121a5237f05956e7d9f0565d82d1860a by dhyun
[SPARK-35559][TEST] Speed up one test in AdaptiveQueryExecSuite

### What changes were proposed in this pull request?

I just noticed that `AdaptiveQueryExecSuite.SPARK-34091: Batch shuffle fetch in AQE partition coalescing` takes more than 10 minutes to finish, which is unacceptable.

This PR sets the shuffle partitions to 10 in that test, so that the test can finish with 5 seconds.

### Why are the changes needed?

speed up the test

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #32695 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 678592a)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
Commit 5c8a141d03ddd6da33b27f417b4c78c7fc6c3c28 by xingbo.jiang
[SPARK-35538][SQL] Migrate transformAllExpressions call sites to use transformAllExpressionsWithPruning

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- EXCHANGE
- IN_SUBQUERY_EXEC
- UPDATE_FIELDS

Migrated `transformAllExpressions` call sites to use `transformAllExpressionsWithPruning`

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.
Perf diff:
Rule name | Total Time (baseline) | Total Time (experiment) | experiment/baseline
OptimizeUpdateFields | 54646396 | 27444424 | 0.5
ReplaceUpdateFieldsExpression  | 24694303 | 2087517 | 0.08

Closes #32643 from sigmod/all_expressions.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Xingbo Jiang <xingbo.jiang@databricks.com>
(commit: 5c8a141)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UpdateFields.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReuseAdaptiveSubquery.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveDynamicPruningFilters.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/DynamicPruning.scala (diff)
Commit 6c4b60f3b33287c8f1607bc09981e96b9aa7e2a4 by dhyun
[SPARK-31168][BUILD] Upgrade Scala to 2.12.14

### What changes were proposed in this pull request?

This PR is the 4th try to upgrade Scala 2.12.x in order to see the feasibility.
- https://github.com/apache/spark/pull/27929 (Upgrade Scala to 2.12.11, wangyum )
- https://github.com/apache/spark/pull/30940 (Upgrade Scala to 2.12.12, viirya )
- https://github.com/apache/spark/pull/31223 (Upgrade Scala to 2.12.13, dongjoon-hyun )

Note that Scala 2.12.14 has the following fix for Apache Spark community.
- Fix cyclic error in runtime reflection (protobuf), a regression that prevented Spark upgrading to 2.12.13

REQUIREMENTS:
- [x] `silencer` library is released via https://github.com/ghik/silencer/pull/66
- [x] `genjavadoc` library is released via https://github.com/lightbend/genjavadoc/issues/282

### Why are the changes needed?

Apache Spark was stuck to 2.12.10 due to the regression in Scala 2.12.11/2.12.12/2.12.13. This will bring all the bug fixes.
- https://github.com/scala/scala/releases/tag/v2.12.14
- https://github.com/scala/scala/releases/tag/v2.12.13
- https://github.com/scala/scala/releases/tag/v2.12.12
- https://github.com/scala/scala/releases/tag/v2.12.11

### Does this PR introduce _any_ user-facing change?

Yes, but this is a bug-fixed version.

### How was this patch tested?

Pass the CIs.

Closes #32697 from dongjoon-hyun/SPARK-31168.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 6c4b60f)
The file was modifieddocs/_config.yml (diff)
The file was modifieddev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modifiedpom.xml (diff)
The file was modifieddev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modifiedproject/SparkBuild.scala (diff)
Commit 16d9de815eab1b13614a3c448457e3d918c3d893 by dhyun
[SPARK-35532][TESTS] Ensure mllib and kafka-0-10 module can be maven test independently in Scala 2.13

### What changes were proposed in this pull request?
Before this pr, when we execute maven test command to test `mllib` and `kafka-0-10` module independently, there are some Java UTs failed, the key error messages are as follows:

```
java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
```

and

```
java.lang.NoClassDefFoundError: scala/collection/parallel/immutable/ParVector
```

The UTs need `scala-parallel-collections_2.13`,  but it not in classpath when we run `mvn test -pl mllib -Pscala-2.13` and `mvn test -pl external/kafka-0-10 -Pscala-2.13`.

So the main change of this pr is add `scala-2.13` profile to `mllib/pom.xml` and `external/kafka-0-10/pom.xml`, the `scala-2.13` profile include dependency on `scala-parallel-collections_2.13`, then these two modules can maven test independently.

### Why are the changes needed?
Ensure mllib and kafka-0-10 module can be maven test independently in Scala 2.13

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass the GitHub Action Scala 2.13 job
- Manual test:

1. Execute
```
dev/change-scala-version.sh 2.13
mvn clean install -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13
```

2. Execute

```
mvn test -pl mllib -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13
```

**Before**

6 Java UTs failed:

```
[ERROR] Errors:
[ERROR]   JavaStreamingLogisticRegressionSuite.javaAPI:78 » TestFailed 20005 was not les...
[ERROR]   JavaStreamingKMeansSuite.javaAPI:78 » TestFailed 20040 was not less than 20000...
[ERROR]   JavaPrefixSpanSuite.runPrefixSpan:45 » NoClassDefFound scala/collection/parall...
[ERROR]   JavaPrefixSpanSuite.runPrefixSpanSaveLoad:67 » NoClassDefFound scala/collectio...
[ERROR]   JavaStreamingLinearRegressionSuite.javaAPI:77 » TestFailed 20014 was not less ...
[ERROR]   JavaStatisticsSuite.streamingTest:112 » TestFailed 20043 was not less than 200...
[INFO]
[ERROR] Tests run: 122, Failures: 0, Errors: 6, Skipped: 0
```

**After**

```
[INFO] Tests run: 122, Failures: 0, Errors: 0, Skipped: 0

Run completed in 28 minutes, 32 seconds.
Total number of tests run: 1654
Suites: completed 208, aborted 0
Tests: succeeded 1654, failed 0, canceled 0, ignored 7, pending 0
All tests passed.
```

3. Execute

```
mvn test -pl external/kafka-0-10 -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13
```

**Before**

2 Java UTs failed:

```
[ERROR] Errors:
[ERROR] org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite.testKafkaStream
[ERROR]   Run 1: JavaDirectKafkaStreamSuite.testKafkaStream:170 expected:<[topic1-1, topic1-2, topic2-1, topic1-3, topic2-2, topic2-3]> but was:<[]>
[ERROR]   Run 2: JavaDirectKafkaStreamSuite.tearDown:57 » NoClassDefFound scala/collection/para...
[ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0
```

**After**

```
[INFO] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0

Run completed in 1 minute, 3 seconds.
Total number of tests run: 21
Suites: completed 4, aborted 0
Tests: succeeded 21, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

```

Closes #32676 from LuciferYang/mllib-kafka-mvn-test.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 16d9de8)
The file was modifiedexternal/kafka-0-10/pom.xml (diff)
The file was modifiedmllib/pom.xml (diff)
Commit 09d039da56fa26f5e6efa555c9229f68dcbf30b3 by dhyun
[SPARK-35526][CORE][SQL][ML][MLLIB] Re-Cleanup `procedure syntax is deprecated` compilation warning in Scala 2.13

### What changes were proposed in this pull request?
After SPARK-29291 and SPARK-33352, there are still some compilation warnings about `procedure syntax is deprecated` as follows:

```
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:723: [deprecation   | origin= | version=2.13.0] procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `registerMergeResult`'s return type
[WARNING] [Warn] /spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:748: [deprecation   | origin= | version=2.13.0] procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `unregisterMergeResult`'s return type
[WARNING] [Warn] /spark/core/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala:223: [deprecation   | origin= | version=2.13.0] procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `testSimpleSpillingForAllCodecs`'s return type
[WARNING] [Warn] /spark/mllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASBenchmark.scala:53: [deprecation   | origin= | version=2.13.0] procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `runBLASBenchmark`'s return type
[WARNING] [Warn] /spark/sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala:110: [deprecation   | origin= | version=2.13.0] procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `assertEmptyRootPath`'s return type
[WARNING] [Warn] /spark/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala:602: [deprecation   | origin= | version=2.13.0] procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `executeCTASWithNonEmptyLocation`'s return type
```

So the main change of this pr is cleanup these compilation warnings.

### Why are the changes needed?
Eliminate compilation warnings in Scala 2.13 and this change should be compatible with Scala 2.12

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #32669 from LuciferYang/re-clean-procedure-syntax.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 09d039d)
The file was modifiedcore/src/main/scala/org/apache/spark/MapOutputTracker.scala (diff)
The file was modifiedmllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASBenchmark.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala (diff)
The file was modifiedcore/src/test/scala/org/apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala (diff)
The file was modifiedsql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (diff)
Commit ff27264ae53130d0dcdc7b8665ab7147e8f43371 by gurwls223
[SPARK-35550][BUILD] Upgrade Jackson to 2.12.3

### What changes were proposed in this pull request?
This pr upgrade Jackson version to 2.12.3.
Jackson Release 2.12.3: [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12.3](https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.12.3)

### Why are the changes needed?
Upgrade to a new version to bring potential bug fixes like [https://github.com/FasterXML/jackson-modules-java8/issues/207](https://github.com/FasterXML/jackson-modules-java8/issues/207)  and avro's master has been upgraded to Jackson to 2.12.3

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #32688 from LuciferYang/SPARK-35550.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: ff27264)
The file was modifieddev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modifieddev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modifiedpom.xml (diff)
Commit 1a55019b1fce91c1dd1a36044dca2714b5bb1247 by dhyun
[SPARK-31168][BUILD][FOLLOWUP] Update scala-2.12 profile

### What changes were proposed in this pull request?

This PR is a follow-up of https://github.com/apache/spark/pull/32697 to update the missed part.
After SPARK-34774, we have Scala 2.12 version in `scala-2.12` profile.

### Why are the changes needed?

To be consistent.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs and manual.

**BEFORE**
```
$ build/mvn help:evaluate -Pscala-2.12 -Dexpression=scala.version | grep "^2.12"
Using `mvn` from path: /usr/local/bin/mvn
2.12.10
```

**AFTER**
```
$ build/mvn help:evaluate -Pscala-2.12 -Dexpression=scala.version | grep "^2.12"
Using `mvn` from path: /usr/local/bin/mvn
2.12.14
```

Closes #32707 from dongjoon-hyun/SPARK-31168-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 1a55019)
The file was modifiedpom.xml (diff)
Commit 806da9d6fae403f88aac42213a58923cf6c2cb05 by wenchen
[SPARK-35545][SQL] Split SubqueryExpression's children field into outer attributes and join conditions

### What changes were proposed in this pull request?
This PR refactors `SubqueryExpression` class. It removes the children field from SubqueryExpression's constructor and adds `outerAttrs` and `joinCond`.

### Why are the changes needed?
Currently, the children field of a subquery expression is used to store both collected outer references in the subquery plan and join conditions after correlated predicates are pulled up.

For example:
`SELECT (SELECT max(c1) FROM t1 WHERE t1.c1 = t2.c1) FROM t2`

During the analysis phase, outer references in the subquery are stored in the children field: `scalar-subquery [t2.c1]`, but after the optimizer rule `PullupCorrelatedPredicates`, the children field will be used to store the join conditions, which contain both the inner and the outer references: `scalar-subquery [t1.c1 = t2.c1]`. This is why the references of SubqueryExpression excludes the inner plan's output:
https://github.com/apache/spark/blob/29ed1a2de42e7a663f764192fce157a9f23029b3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala#L68-L69

This can be confusing and error-prone. The references for a subquery expression should always be defined as outer attribute references.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

Closes #32687 from allisonwang-db/refactor-subquery-expr.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 806da9d)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveSubqueries.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit c225196be0d18975a3d290b0b9d7283d764be322 by dhyun
[SPARK-35507][INFRA] Add Python 3.9 in the docker image for GitHub Action

### What changes were proposed in this pull request?

This PR aims to add `Python 3.9.5` and updates the docker image references except SparkR job.

### Why are the changes needed?

To save GitHub Action resource and be more robust on the the Python and R library changes.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes #32706 from dongjoon-hyun/SPARK-35507.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: c225196)
The file was modified.github/workflows/build_and_test.yml (diff)
Commit 73ba4492b1deea953e4f22dbf36dfcacd81c0f8a by gurwls223
[SPARK-35566][SS] Fix StateStoreRestoreExec output rows

### What changes were proposed in this pull request?

This is a minor change to update how `StateStoreRestoreExec` computes its number of output rows. Previously we only count input rows, but the optionally restored rows are not counted in.

### Why are the changes needed?

Currently the number of output rows of `StateStoreRestoreExec` only counts the each input row. But it actually outputs input rows + optional restored rows. We should provide correct number of output rows.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #32703 from viirya/fix-outputrows.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 73ba449)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala (diff)
Commit 8c69e9cd944415aaf301163128306f678e5bbcca by dhyun
[SPARK-35562][DOC] Fix docs about Kubernetes and Yarn

Fixed some places in cluster-overview that are obsolete (i.e. not mentioning Kubernetes), and also fixed the Yarn spark-submit sample command in submitting-applications.

### What changes were proposed in this pull request?

This is to fix the docs in "Cluster Overview" and "Submitting Applications" for places where Kubernetes is missed (mostly due to obsolete docs that haven't got updated) and where Yarn sample spark-submit command is incorrectly written.

### Why are the changes needed?

To help the Spark users who uses Kubernetes as cluster manager to have a correct idea when reading the "Cluster Overview" doc page. Also to make the sample spark-submit command for Yarn actually runnable in the "Submitting Applications" doc page, by removing the invalid comment after line continuation char `\`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No test, as this is doc fix.

Closes #32701 from huskysun/doc-fix.

Authored-by: Shiqi Sun <s.sun@salesforce.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 8c69e9c)
The file was modifieddocs/cluster-overview.md (diff)
The file was modifieddocs/submitting-applications.md (diff)
Commit 6cd6c438f238f2e749fdf96ab7bc2da8334c1502 by yumwang
[SPARK-34808][SQL] Removes outer join if it only has DISTINCT on streamed side

### What changes were proposed in this pull request?

This pr add new rule to removes outer join if it only has distinct on streamed side. For example:
```scala
spark.range(200L).selectExpr("id AS a").createTempView("t1")
spark.range(300L).selectExpr("id AS b").createTempView("t2")
spark.sql("SELECT DISTINCT a FROM t1 LEFT JOIN t2 ON a = b").explain(true)
```

Before this pr:
```
== Optimized Logical Plan ==
Aggregate [a#2L], [a#2L]
+- Project [a#2L]
   +- Join LeftOuter, (a#2L = b#6L)
      :- Project [id#0L AS a#2L]
      :  +- Range (0, 200, step=1, splits=Some(2))
      +- Project [id#4L AS b#6L]
         +- Range (0, 300, step=1, splits=Some(2))
```

After this pr:
```
== Optimized Logical Plan ==
Aggregate [a#2L], [a#2L]
+- Project [id#0L AS a#2L]
   +- Range (0, 200, step=1, splits=Some(2))
```

### Why are the changes needed?

Improve query performance. [DB2](https://www.ibm.com/docs/en/db2-for-zos/11?topic=manipulation-how-db2-simplifies-join-operations) support this feature:
![image](https://user-images.githubusercontent.com/5399861/119594277-0d7c4680-be0e-11eb-8bd4-366d8c4639f0.png)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #31908 from wangyum/SPARK-34808.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(commit: 6cd6c43)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/AggregateOptimizeSuite.scala (diff)
Commit 14e12c64d35ae488cc6ce4d1c53e519f29a74d45 by gurwls223
[SPARK-35575][INFRA] Recover updating build status in GitHub Actions

### What changes were proposed in this pull request?

This PR fixes the logic to be fault tolerant when it gets the status of the workflow run from PR author's forked repository.

Looks like https://github.com/apache/spark/pull/32483 removed and disabled (see also https://github.com/apache/spark/pull/32486/checks?check_run_id=2648696751) the GitHub actions workflow runs in the forked repositories, and the detection logic in the main repo fails because the runs don't exist anymore.

See also https://github.com/apache/spark/runs/2709537998?check_suite_focus=true

### Why are the changes needed?

To recover the status update of GitHub Actions in PRs.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It cannot be tested without being merged.

Closes #32711 from HyukjinKwon/SPARK-35575.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 14e12c6)
The file was modified.github/workflows/update_build_status.yml (diff)
Commit 16037759343902108e6196babacb002628e82e24 by wenchen
[SPARK-35411][SQL][FOLLOWUP] Handle Currying Product while serializing TreeNode to JSON

### What changes were proposed in this pull request?
Handle Currying Product while serializing TreeNode to JSON. While processing [Product](https://github.com/apache/spark/blob/v3.1.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L820), we may get an assert error for cases like Currying Product because of the mismatch of sizes between field name and field values.
Fallback to use reflection to get all the values for constructor parameters when we  meet such cases.

### Why are the changes needed?
Avoid throwing error while serializing TreeNode to JSON, try to output as much information as possible.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
New UT case added.

Closes #32713 from ivoson/SPARK-35411-followup.

Authored-by: Tengfei Huang <tengfei.h@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 1603775)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala (diff)