Changes

Summary

  1. [SPARK-35617][INFRA] Update GitHub Action docker image to 20210602 (commit: 2550490) (details)
  2. [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor (commit: b9e53f8) (details)
  3. [SPARK-35620][BUILD][PYTHON] Remove documentation build in Python linter (commit: d478cff) (details)
  4. [SPARK-35528][DOCS] Add more options at Data Source Options pages (commit: e0bccc1) (details)
  5. [SPARK-35574][BUILD] Add a compile arg to turn compilation warnings (commit: 4a549f2) (details)
  6. [SPARK-35081][DOCS] Add Data Source Option links to missing documents (commit: 2658bc5) (details)
  7. [SPARK-35609][BUILD] Add style rules to prohibit to use a Guava's API (commit: c532f82) (details)
  8. [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet (commit: cfde117) (details)
  9. [SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer (commit: 4f0db87) (details)
  10. [SPARK-35580][SQL] Implement canonicalized method for (commit: 0342dcb) (details)
  11. [SPARK-35612][SQL] Support LZ4 compression in ORC data source (commit: 878527d) (details)
  12. [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in (commit: 7eeb07d) (details)
  13. [SPARK-35589][CORE][TESTS][FOLLOWUP] Remove the duplicated test coverage (commit: 745bd09) (details)
  14. [SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation (commit: 3d158f9) (details)
  15. [SPARK-35642][INFRA] Split pyspark-pandas tests to rebalance the test (commit: 221553c) (details)
  16. [SPARK-35396][CORE][FOLLOWUP] Free memory entry immediately (commit: 63ab38f) (details)
  17. [SPARK-35523] Fix the default value in Data Source Options page (commit: 7c32415) (details)
  18. [SPARK-35648][PYTHON] Refine and add dependencies needed for dev in (commit: 807b400) (details)
  19. [SPARK-35636][SQL] Lambda keys should not be referenced outside of the (commit: 53a758b) (details)
  20. [SPARK-35629][SQL] Use better exception type if database doesn't exist (commit: c7fb0e1) (details)
  21. [SPARK-35568][SQL] Add the BroadcastExchange after re-optimizing the (commit: 6ce5f24) (details)
  22. [SPARK-21957][SQL][FOLLOWUP] Support CURRENT_USER without tailing (commit: dc3317f) (details)
  23. [SPARK-32975][K8S] Add config for driver readiness timeout before (commit: 497c80a) (details)
  24. [SPARK-35643][PYTHON] Fix ambiguous reference in functions.py column() (commit: f2c0a04) (details)
  25. [SPARK-35446] Override getJDBCType in MySQLDialect to map FloatType to (commit: b5678be) (details)
  26. [SPARK-35621][SQL] Add rule id pruning to the TypeCoercion rule (commit: 7bc364b) (details)
  27. [SPARK-35655][BUILD] Upgrade HtmlUnit and its related artifacts to 2.50 (commit: 510bde4) (details)
Commit 2550490c0985d0e7bfb9fd5239a7401d2261ee4a by dhyun
[SPARK-35617][INFRA] Update GitHub Action docker image to 20210602

### What changes were proposed in this pull request?

This PR aims to update GitHub Action docker image with the following updates.
1. Add `pip` explicitly to Python 3.8/3.9
2. Add `plotly` to Python 3.8.
3. Since SPARK-35573 fixes SparkR UT failures on R 4.1.0, update SparkR job to run R 4.1.0.

### Why are the changes needed?

To improve the GitHub Action test infra and unblock #32737

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes #32755 from dongjoon-hyun/SPARK-35617.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 2550490)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit b9e53f89379c34bde36b9c37471e21f037092749 by yi.wu
[SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

### What changes were proposed in this pull request?

This patch proposes a fix to prevent triggering BlockManager reregistration while `StopExecutor` msg is in-flight.
Here, on receiving the `StopExecutor` message, we do not remove the corresponding `BlockManagerInfo` from the `blockManagerInfo` map; instead, we mark it as dead by updating the corresponding `executorRemovalTs`. A separate cleanup thread runs periodically to remove stale `BlockManagerInfo` entries from the `blockManagerInfo` map.

Now if a recently removed `BlockManager` tries to register, the driver simply ignores it since the `blockManagerInfo` map already contains an entry for it. The same applies to `BlockManagerHeartbeat`: if the `BlockManager` belongs to a recently removed executor, the `blockManagerInfo` map still contains an entry for it, so we do not ask the corresponding `BlockManager` to re-register.
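
A minimal sketch of this pattern, assuming simplified types (only `blockManagerInfo` and `executorRemovalTs` come from this description; everything else is illustrative and not the actual `BlockManagerMasterEndpoint` code):

```scala
import scala.collection.mutable

// Simplified stand-in for the real BlockManagerInfo bookkeeping.
case class ManagerInfo(execId: String, var executorRemovalTs: Option[Long] = None)

val blockManagerInfo = mutable.Map[String, ManagerInfo]()

// On StopExecutor: mark the entry as dead instead of removing it right away.
def onStopExecutor(execId: String): Unit =
  blockManagerInfo.get(execId).foreach(_.executorRemovalTs = Some(System.currentTimeMillis()))

// Re-registration from a recently removed executor is ignored because the entry still exists.
def onRegisterBlockManager(execId: String): Unit =
  if (!blockManagerInfo.contains(execId)) {
    blockManagerInfo(execId) = ManagerInfo(execId)
  }

// Periodic cleanup: drop entries whose removal timestamp is older than the timeout.
def cleanupStaleEntries(timeoutMs: Long): Unit = {
  val now = System.currentTimeMillis()
  val stale = blockManagerInfo.collect {
    case (id, info) if info.executorRemovalTs.exists(ts => now - ts >= timeoutMs) => id
  }
  stale.foreach(blockManagerInfo.remove)
}
```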

### Why are the changes needed?

These changes are needed since BlockManager reregistration while an executor is shutting down causes inconsistent bookkeeping of executors in Spark.
Consider the following scenario:
- `CoarseGrainedSchedulerBackend` issues async `StopExecutor` on executorEndpoint
- `CoarseGrainedSchedulerBackend` removes that executor from Driver's internal data structures and publishes `SparkListenerExecutorRemoved` on the `listenerBus`.
- Executor has still not processed `StopExecutor` from the Driver
- Driver receives a heartbeat from the Executor; since it cannot find the `executorId` in its data structures, it responds with `HeartbeatResponse(reregisterBlockManager = true)`
- `BlockManager` on the Executor reregisters with the `BlockManagerMaster` and `SparkListenerBlockManagerAdded` is published on the `listenerBus`
- Executor starts processing the `StopExecutor` and exits
- `AppStatusListener` picks the `SparkListenerBlockManagerAdded` event and updates `AppStatusStore`
- `statusTracker.getExecutorInfos` refers to `AppStatusStore` to get the list of executors, which returns the dead executor as alive.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Modified the existing unittests.
- Ran a simple test application on minikube that asserts that the number of executors is zero once the executor idle timeout is reached.

Closes #32114 from sumeetgajjar/SPARK-35011.

Authored-by: Sumeet Gajjar <sumeetgajjar93@gmail.com>
Signed-off-by: yi.wu <yi.wu@databricks.com>
(commit: b9e53f8)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/Utils.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala (diff)
Commit d478cff8bbf2455c4f494ea9a176623dfc560b38 by gurwls223
[SPARK-35620][BUILD][PYTHON] Remove documentation build in Python linter

### What changes were proposed in this pull request?

This PR proposes to remove PySpark documentation build in linter check because:

- to speed up CI build by removing duplicate documentation build (linter and doc build)
- to prepare for https://github.com/apache/spark/pull/32726. With that PR, the PySpark documentation build requires a full Spark build to generate the plot images in the PySpark documentation, which makes less sense to require in the Python linter.
- to remove unnecessary dependency installation for Python linter in CI

### Why are the changes needed?

The Python linter script includes a documentation build. Because of this, we run the documentation build twice in CI, require unnecessary dependencies to be installed, and take extra time. It makes more sense to exclude this from the Python linter.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested, and it will be tested in CI.

Closes #32760 from HyukjinKwon/SPARK-35620.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: d478cff)
The file was modified .github/workflows/build_and_test.yml (diff)
The file was modified dev/lint-python (diff)
Commit e0bccc1831a0fc8997c2d3a7ed826336fb6962d9 by gurwls223
[SPARK-35528][DOCS] Add more options at Data Source Options pages

### What changes were proposed in this pull request?

This PR proposes adding more methods for setting data source options to the `Data Source Option` page of each data source.

For example, Data Source Option page for JSON as below:

- Before
<img width="322" alt="Screen Shot 2021-06-03 at 10 51 54 AM" src="https://user-images.githubusercontent.com/44108233/120574245-eb13aa00-c459-11eb-9f81-0b356023bcb5.png">

- After
<img width="470" alt="Screen Shot 2021-06-03 at 10 52 21 AM" src="https://user-images.githubusercontent.com/44108233/120574253-ed760400-c459-11eb-9008-1f075e0b9267.png">

### Why are the changes needed?

To provide users with the various ways of setting options for each data source.

### Does this PR introduce _any_ user-facing change?

Yes, now the document provides more methods for setting options than before, as in above screen capture.

### How was this patch tested?

Manually built the docs and checked them one by one.

Closes #32757 from itholic/SPARK-35528.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: e0bccc1)
The file was modified docs/sql-data-sources-parquet.md (diff)
The file was modified docs/sql-data-sources-orc.md (diff)
The file was modified docs/sql-data-sources-json.md (diff)
The file was modified docs/sql-data-sources-csv.md (diff)
Commit 4a549f2de232208253625706375aa0964571ceb8 by gurwls223
[SPARK-35574][BUILD] Add a compile arg to turn compilation warnings related to `procedure syntax` to compilation errors in Scala 2.13

### What changes were proposed in this pull request?
There are several PRs fixing compilation warnings related to `procedure syntax`, such as SPARK-29291, SPARK-33352 and SPARK-35526. To prevent the recurrence of similar problems, this PR adds a compile arg to convert `procedure syntax`-related compilation warnings to compilation errors in Scala 2.13.
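
As an illustration only (the exact compile arg added by this PR may differ), Scala 2.13's `-Wconf` flag can escalate this specific deprecation message to an error, for example in `project/SparkBuild.scala`:

```scala
// sbt settings excerpt: a hypothetical sketch of the -Wconf escalation approach.
scalacOptions ++= {
  if (scalaBinaryVersion.value == "2.13") {
    // turn "procedure syntax is deprecated" warnings into compilation errors
    Seq("-Wconf:msg=procedure syntax is deprecated:e")
  } else {
    Seq.empty
  }
}
```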

### Why are the changes needed?
Prevent the recurrence of compilation warnings related to `procedure syntax is deprecated`

### Does this PR introduce _any_ user-facing change?
`procedure syntax` is no longer allowed in Spark code with Scala 2.13. For constructors, the definition should be `this(...) = { }`, not `this(...) { }`; for methods without a return type, the definition should be `def methodName(...): Unit = {}`, not `def methodName(...) {}`.

### How was this patch tested?

- Pass the GitHub Action Scala 2.13 job

- Manual test:

Do some code change like:

```
Index: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
===================================================================
-67,7 +67,7
private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
   extends SparkListener with ThreadSafeRpcEndpoint with Logging {

-  def this(sc: SparkContext) = {
+  def this(sc: SparkContext) {
     this(sc, new SystemClock)
   }

Index: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
===================================================================
-720,7 +720,7
     }
   }

-  def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus): Unit = {
+  def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus) {
     shuffleStatuses(shuffleId).addMergeResult(reduceId, status)
   }
```

**sbt compilation with the Scala 2.13 profile fails as follows:**

```
[error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70:29: procedure syntax is deprecated for constructors: add `=`, as in method definition
[error]   def this(sc: SparkContext) {
[error]                             ^
[error] /home/runner/work/spark/spark/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:723:79: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `registerMergeResult`'s return type
[error]   def registerMergeResult(shuffleId: Int, reduceId: Int, status: MergeStatus) {
[error]                                                                               ^
[error] two errors found
[error] (core / Compile / compileIncremental) Compilation failed
[error] Total time: 136 s (02:16), completed May 31, 2021 10:06:50 AM

Error: Process completed with exit code 1.
```

**Maven compilation with the Scala 2.13 profile fails as follows:**

```
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine/core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala:70: procedure syntax is deprecated for constructors: add `=`, as in method definition
[ERROR] [Error] /Users/yangjie01/SourceCode/git/spark-mine/core/src/main/scala/org/apache/spark/MapOutputTracker.scala:723: procedure syntax is deprecated: instead, add `: Unit =` to explicitly declare `registerMergeResult`'s return type
[ERROR] two errors found
```

Closes #32710 from LuciferYang/SPARK-35574.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 4a549f2)
The file was modified pom.xml (diff)
The file was modified project/SparkBuild.scala (diff)
Commit 2658bc590fec51e2266d03121c85b47f553022ec by gurwls223
[SPARK-35081][DOCS] Add Data Source Option links to missing documents

### What changes were proposed in this pull request?

This PR proposes adding the missing links to the Data Source Options page for related functions such as `to_csv`, `to_json`, `from_csv`, `from_json`, `schema_of_csv`, `schema_of_json`.

- Before
<img width="797" alt="Screen Shot 2021-06-03 at 11 39 17 AM" src="https://user-images.githubusercontent.com/44108233/120578877-7b092200-c461-11eb-9e24-bd5349445c66.png">

- After
<img width="776" alt="Screen Shot 2021-06-03 at 11 59 14 AM" src="https://user-images.githubusercontent.com/44108233/120579868-29fa2d80-c463-11eb-9329-bd6c8f068f5b.png">

### Why are the changes needed?

To provide users with the available options in detail via the proper documentation link.

### Does this PR introduce _any_ user-facing change?

Yes, the link to Data Source Options page is added to the API documentations, as shown in the above screen capture.

### How was this patch tested?

Manually built the docs and checked one by one.

Closes #32762 from itholic/SPARK-35081.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 2658bc5)
The file was modified python/pyspark/sql/functions.py (diff)
Commit c532f8260ee2f2f4170dc50f7e890fafab438b76 by sarutak
[SPARK-35609][BUILD] Add style rules to prohibit to use a Guava's API which is incompatible with newer versions

### What changes were proposed in this pull request?

This PR adds rules to `checkstyle.xml` and `scalastyle-config.xml` to avoid introducing `Objects.toStringHelper`, a Guava API which is no longer present in newer Guava versions.

### Why are the changes needed?

SPARK-30272 (#26911) replaced `Objects.toStringHelper`, an API that Guava 14 provides, with the `commons.lang3` API because `Objects.toStringHelper` is no longer present in newer Guava.
But `toStringHelper` was introduced into Spark again and had to be replaced again in SPARK-35420 (#32567).
I think it's better to have a style rule to avoid such repetition.

SPARK-30272 replaced some APIs aside from `Objects.toStringHelper`, but only `Objects.toStringHelper` seems to affect Spark for now, so I add rules only for it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that `lint-java` and `lint-scala` detect the usage of `toStringHelper` and let the lint check fail.
```
$ dev/lint-java
exec: curl --silent --show-error -L https://downloads.lightbend.com/scala/2.12.14/scala-2.12.14.tgz
Using `mvn` from path: /opt/maven/3.6.3//bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/network/protocol/OneWayMessage.java:[78] (regexp) RegexpSinglelineJava: Avoid using Object.toStringHelper. Use ToStringBuilder instead.

$ dev/lint-scala
Scalastyle checks failed at following occurrences:
[error] /home/kou/work/oss/spark/core/src/main/scala/org/apache/spark/rdd/RDD.scala:93:25: Avoid using Object.toStringHelper. Use ToStringBuilder instead.
[error] Total time: 25 s, completed 2021/06/02 16:18:25
```

Closes #32740 from sarutak/style-rule-for-guava.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: c532f82)
The file was modified scalastyle-config.xml (diff)
The file was modified dev/checkstyle.xml (diff)
Commit cfde117c6fec758c44f77fe9f9379ba38940b51a by wenchen
[SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate

### What changes were proposed in this pull request?

This PR adds `In`/`InSet` predicate support for `UnwrapCastInBinaryComparison`.

The current implementation doesn't push down filters for `In`/`InSet` predicates that contain a `Cast`.

For instance:

```scala
spark.range(50).selectExpr("cast(id as int) as id").write.mode("overwrite").parquet("/tmp/parquet/t1")
spark.read.parquet("/tmp/parquet/t1").where("id in (1L, 2L, 4L)").explain
```

before this pr:

```
== Physical Plan ==
*(1) Filter cast(id#5 as bigint) IN (1,2,4)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#5] Batched: true, DataFilters: [cast(id#5 as bigint) IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:int>
```

after this pr:

```
== Physical Plan ==
*(1) Filter id#95 IN (1,2,4)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#95] Batched: true, DataFilters: [id#95 IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [In(id, [1,2,4])], ReadSchema: struct<id:int>
```

### Does this PR introduce _any_ user-facing change?

No.
### How was this patch tested?

New test.

Closes #32488 from cfmcgrady/SPARK-35316.

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: cfde117)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala (diff)
Commit 4f0db872a089257f1e894060e1757f9e5b973142 by mridulatgmail.com
[SPARK-35416][K8S][FOLLOWUP] Use Set instead of ArrayBuffer

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/32564 .

### Why are the changes needed?

To use Set instead of ArrayBuffer and add a return type.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #32758 from dongjoon-hyun/SPARK-35416-2.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(commit: 4f0db87)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala (diff)
Commit 0342dcb6289bae67f6ab742b8fd03b2d653e52ea by viirya
[SPARK-35580][SQL] Implement canonicalized method for HigherOrderFunction

### What changes were proposed in this pull request?

This patch implements the `canonicalized` method for `HigherOrderFunction`. Basically, it canonicalizes the names of all `NamedLambdaVariable`s and their `ExprId`s. The name and `ExprId` of a `NamedLambdaVariable` are unique, but to compare semantic equality between `HigherOrderFunction`s, we can canonicalize them.

### Why are the changes needed?

The default `canonicalized` method does not work for `HigherOrderFunction`. It prevents subexpression elimination from working for higher-order functions.

Manually check the generated code for:
```scala
val df = Seq(Seq(1, 2, 3)).toDF("a")
df.select(transform($"a", x => x + 1), transform($"a", x => x + 1)).collect()
```

The code for `transform(input[0, array<int>, true], lambdafunction((lambda x_20#19041 + 1), lambda x_20#19041, false)),transform(input[0, array<int>, true], lambdafunction((lambda x_21#19042 + 1), lambda x_21#19042, false))`, generated by `GenerateUnsafeProjection`.

Before:

```java
/* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
...
/* 028 */   public UnsafeRow apply(InternalRow i) {
...
/* 034 */     Object obj_0 = ((Expression) references[0]).eval(i);
...
/* 062 */     Object obj_1 = ((Expression) references[1]).eval(i);
...
/* 093 */ }
```

After:
```java
/* 005 */ class SpecificUnsafeProjection extends org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
...
/* 031 */   public UnsafeRow apply(InternalRow i) {
...
/* 033 */     subExpr_0(i);
...
/* 086 */   private void subExpr_0(InternalRow i) {
/* 087 */     Object obj_0 = ((Expression) references[0]).eval(i);
/* 088 */     boolean isNull_0 = obj_0 == null;
/* 089 */     ArrayData value_0 = null;
/* 090 */     if (!isNull_0) {
/* 091 */       value_0 = (ArrayData) obj_0;
/* 092 */     }
/* 093 */     subExprIsNull_0 = isNull_0;
/* 094 */     mutableStateArray_0[0] = value_0;
/* 095 */   }
/* 096 */
/* 097 */ }
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests and a manual check of the generated code.

Closes #32735 from viirya/higher-func-canonicalize.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 0342dcb)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala (diff)
Commit 878527d9fae8945d087ec871bb0a5f49b6341939 by dhyun
[SPARK-35612][SQL] Support LZ4 compression in ORC data source

### What changes were proposed in this pull request?

This PR aims to support LZ4 compression in the ORC data source.

### Why are the changes needed?

Apache ORC supports LZ4 compression, but we cannot set LZ4 compression in the ORC data source.

**BEFORE**

```scala
scala> spark.range(10).write.option("compression", "lz4").orc("/tmp/lz4")
java.lang.IllegalArgumentException: Codec [lz4] is not available. Available codecs are uncompressed, lzo, snappy, zlib, none, zstd.
```

**AFTER**

```scala
scala> spark.range(10).write.option("compression", "lz4").orc("/tmp/lz4")
```
```bash
$ orc-tools meta /tmp/lz4
Processing data file file:/tmp/lz4/part-00000-6a244eee-b092-4c79-a977-fb8a69dde2eb-c000.lz4.orc [length: 222]
Structure for file:/tmp/lz4/part-00000-6a244eee-b092-4c79-a977-fb8a69dde2eb-c000.lz4.orc
File Version: 0.12 with ORC_517
Rows: 10
Compression: LZ4
Compression size: 262144
Type: struct<id:bigint>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 10 hasNull: false
    Column 1: count: 10 hasNull: false bytesOnDisk: 7 min: 0 max: 9 sum: 45

File Statistics:
  Column 0: count: 10 hasNull: false
  Column 1: count: 10 hasNull: false bytesOnDisk: 7 min: 0 max: 9 sum: 45

Stripes:
  Stripe: offset: 3 data: 7 rows: 10 tail: 35 index: 35
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 1 section ROW_INDEX start: 14 length 24
    Stream: column 1 section DATA start: 38 length 7
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2

File length: 222 bytes
Padding length: 0 bytes
Padding ratio: 0%

User Metadata:
  org.apache.spark.version=3.2.0
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Pass the newly added test case.

Closes #32751 from fornaix/spark-35612.

Authored-by: fornaix <foxnaix@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 878527d)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala (diff)
The file was modified docs/sql-data-sources-orc.md (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 7eeb07d0f9cb5f405d0913c27fabe4c136667c4b by dhyun
[SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

### What changes were proposed in this pull request?

In the build_and_test workflow, tests are run against both Python 3.6 and Python 3.9. However, only libraries installed in Python 3.6 are listed. We should list Python 3.9's installed libraries as well.

### Why are the changes needed?

Listing Python 3.9's installed libraries is helpful for debugging.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual check.

Closes #32737 from xinrong-databricks/ci_py3.9lib.

Lead-authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Co-authored-by: xinrong-databricks <47337188+xinrong-databricks@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 7eeb07d)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 745bd090f7802276f9138244ec40b36db8bf40f2 by gurwls223
[SPARK-35589][CORE][TESTS][FOLLOWUP] Remove the duplicated test coverage

### What changes were proposed in this pull request?

This removes the accidental duplicated test coverage.

### Why are the changes needed?

To save the test resources.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A because this is a removal of the duplicated test coverage.

Closes #32774 from dongjoon-hyun/SPARK-35589-3.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 745bd09)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala (diff)
Commit 3d158f9c913912b672ad93821b09971e7741b4a1 by gurwls223
[SPARK-35587][PYTHON][DOCS] Initial porting of Koalas documentation

### What changes were proposed in this pull request?

This PR proposes to port Koalas documentation to PySpark documentation as its initial step.
It ports the documentation almost as-is, except for these differences:

- Renamed import from `databricks.koalas` to `pyspark.pandas`.
- Renamed `to_koalas` -> `to_pandas_on_spark`
- Renamed `(Series|DataFrame).koalas` -> `(Series|DataFrame).pandas_on_spark`
- Added a `ps_` prefix in the RST file names of Koalas documentation

Other than that,

- Excluded `python/docs/build/html` in linter
- Fixed GA dependency installation

### Why are the changes needed?

To document pandas APIs on Spark.

### Does this PR introduce _any_ user-facing change?

Yes, it adds new documentations.

### How was this patch tested?

Manually built the docs and checked the output.

Closes #32726 from HyukjinKwon/SPARK-35587.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 3d158f9)
The file was added python/docs/source/getting_started/ps_install.rst
The file was added python/docs/source/user_guide/ps_best_practices.rst
The file was added python/docs/source/reference/ps_extensions.rst
The file was added python/docs/source/user_guide/ps_from_to_dbms.rst
The file was added python/docs/source/user_guide/ps_typehints.rst
The file was added python/docs/source/user_guide/ps_options.rst
The file was modified python/docs/source/conf.py (diff)
The file was modified python/docs/source/reference/index.rst (diff)
The file was modified python/docs/source/development/index.rst (diff)
The file was added python/docs/source/reference/ps_frame.rst
The file was added python/docs/source/reference/ps_series.rst
The file was modified python/docs/source/index.rst (diff)
The file was added python/docs/source/development/ps_contributing.rst
The file was added python/docs/source/reference/ps_ml.rst
The file was added python/docs/source/development/ps_design.rst
The file was added python/docs/source/user_guide/ps_types.rst
The file was added python/docs/source/getting_started/ps_videos_blogs.rst
The file was added python/docs/source/reference/ps_window.rst
The file was modified python/docs/source/user_guide/index.rst (diff)
The file was modified python/docs/source/getting_started/index.rst (diff)
The file was added python/docs/source/getting_started/ps_10mins.ipynb
The file was modified .github/workflows/build_and_test.yml (diff)
The file was added python/docs/source/reference/ps_groupby.rst
The file was modified dev/tox.ini (diff)
The file was added python/docs/source/reference/ps_indexing.rst
The file was added python/docs/source/reference/ps_io.rst
The file was added python/docs/source/user_guide/ps_pandas_pyspark.rst
The file was added python/docs/source/reference/ps_general_functions.rst
The file was modified docs/_plugins/copy_api_dirs.rb (diff)
The file was added python/docs/source/user_guide/ps_faq.rst
The file was added python/docs/source/user_guide/ps_transform_apply.rst
The file was modified python/pyspark/pandas/accessors.py (diff)
Commit 221553c204951ae663523e6a5beeb5a31066fb90 by gurwls223
[SPARK-35642][INFRA] Split pyspark-pandas tests to rebalance the test duration

### What changes were proposed in this pull request?

Splits some tests in the `pyspark-pandas` module into separate test slots to rebalance the test duration.

Picked the top 12 tests from the previous runs so that the total times are almost even.

### Why are the changes needed?

Currently the `pyspark-pandas` module tests take a long time, so we should rebalance them.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #32778 from ueshin/issues/SPARK-35642/split-pandas-on-spark-tests.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 221553c)
The file was modified dev/sparktestsupport/modules.py (diff)
The file was modified dev/run-tests.py (diff)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 63ab38f917b3936f5bc025ac7b0d4e281414efef by dhyun
[SPARK-35396][CORE][FOLLOWUP] Free memory entry immediately

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/32534 and proposes to free the memory entry immediately instead of doing it asynchronously in the background. The reasons are:
1. It's a bit weird to free the resource in an asynchronous way.
2. We free the off-heap memory entry in the same thread, and it's better to be consistent with it.
3. We can simplify the code quite a bit.

This PR also simplifies the tests to reuse the class definition.

### Why are the changes needed?

code simplification

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes #32743 from cloud-fan/follow.

Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 63ab38f)
The file was modified core/src/test/scala/org/apache/spark/storage/MemoryStoreSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala (diff)
Commit 7c324156693888e1adf0bd294050072c574af62c by gurwls223
[SPARK-35523] Fix the default value in Data Source Options page

### What changes were proposed in this pull request?

This PR proposes to fix the default values in the Data Source Options page based on the Scala documentation.

### Why are the changes needed?

Some of the existing default values in the Data Source Options page follow the Python documentation, which has `None` as the default value for all options.

### Does this PR introduce _any_ user-facing change?

Yes, the default values in the Data Source Options page are fixed (from `None` to the proper default values).

- Before
<img width="361" alt="Screen Shot 2021-06-02 at 6 31 12 PM" src="https://user-images.githubusercontent.com/44108233/120456594-b8719f00-c3d0-11eb-9778-071ab2ba9f45.png">

- After
<img width="562" alt="Screen Shot 2021-06-02 at 6 32 47 PM" src="https://user-images.githubusercontent.com/44108233/120456844-f1117880-c3d0-11eb-9c7c-9dcd66776444.png">

### How was this patch tested?

Manually built the docs and checked one by one.

Closes #32745 from itholic/SPARK-35523.

Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 7c32415)
The file was modified docs/sql-data-sources-avro.md (diff)
The file was modified docs/sql-data-sources-csv.md (diff)
The file was modified docs/sql-data-sources-parquet.md (diff)
The file was modified docs/sql-data-sources-jdbc.md (diff)
The file was modified docs/sql-data-sources-json.md (diff)
The file was modified docs/sql-data-sources-text.md (diff)
The file was modified docs/sql-data-sources-orc.md (diff)
Commit 807b4006cacd31b5dc244d3d91116c6248de55f6 by gurwls223
[SPARK-35648][PYTHON] Refine and add dependencies needed for dev in dev/requirement.txt

### What changes were proposed in this pull request?

This PR proposes to update the `dev/requirements.txt` file.

NOTE that:
- This file isn't used anywhere in the Apache Spark CI. It's just for convenience.
- To minimize the maintenance overhead, I removed all lower bounds of dependencies, which means that using the latest versions of them should work in a clean environment (e.g., you can reinstall all of them).

### Why are the changes needed?

To document the dependencies needed for Spark development, and to make environment setup easier.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Logically derived from setup.py and other places like CI.

Closes #32780 from HyukjinKwon/SPARK-35648.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 807b400)
The file was modified dev/requirements.txt (diff)
Commit 53a758b51b79c52ac5d4bf3fad72765e55607f36 by gurwls223
[SPARK-35636][SQL] Lambda keys should not be referenced outside of the lambda function

### What changes were proposed in this pull request?

Sets `references` for `NamedLambdaVariable` and `LambdaFunction`.

| Expression  | NamedLambdaVariable | LambdaFunction |
| --- | --- | --- |
| References before | None | All function references |
| References after | self.toAttribute | Function references minus arguments' references |

In `NestedColumnAliasing`, this means that `ExtractValue(ExtractValue(attr, lv: NamedLambdaVariable), ...)` now references both `attr` and `lv`, rather than just `attr`. As a result, it will not be included in the nested column references.
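
A hypothetical repro of the query shape behind the Example 1 plans below (assumed column types; requires a `SparkSession` with `spark.implicits._` imported):

```scala
// kvs: map from key to a struct with a nested field v1; keys: array of lambda keys.
case class V(v1: String, v2: String)
val kvsDF  = Seq(Map(1 -> V("a", "b"))).toDF("kvs")
val keysDF = Seq(Seq(1)).toDF("keys")

// The lambda key indexes into kvs, so kvs[key].v1 must not be extracted
// outside the lambda function by NestedColumnAliasing.
val df = kvsDF.crossJoin(keysDF)
  .selectExpr("transform(keys, key -> kvs[key].v1) AS a")
```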

### Why are the changes needed?

Before, the lambda key was referenced outside of the lambda function.

#### Example 1

Before:
```
Project [transform(keys#0, lambdafunction(_extract_v1#0, lambda key#0, false)) AS a#0]
+- 'Join Cross
   :- Project [kvs#0[lambda key#0].v1 AS _extract_v1#0]
   :  +- LocalRelation <empty>, [kvs#0]
   +- LocalRelation <empty>, [keys#0]
```

After:
```
Project [transform(keys#418, lambdafunction(kvs#417[lambda key#420].v1, lambda key#420, false)) AS a#419]
+- Join Cross
   :- LocalRelation <empty>, [kvs#417]
   +- LocalRelation <empty>, [keys#418]
```

#### Example 2

Before:
```
Project [transform(keys#0, lambdafunction(kvs#0[lambda key#0].v1, lambda key#0, false)) AS a#0]
+- GlobalLimit 5
  +- LocalLimit 5
    +- Project [keys#0, _extract_v1#0 AS _extract_v1#0]
      +- GlobalLimit 5
        +- LocalLimit 5
          +- Project [kvs#0[lambda key#0].v1 AS _extract_v1#0, keys#0]
            +- LocalRelation <empty>, [kvs#0, keys#0]
```

After:
```
Project [transform(keys#428, lambdafunction(kvs#427[lambda key#430].v1, lambda key#430, false)) AS a#429]
+- GlobalLimit 5
  +- LocalLimit 5
    +- Project [keys#428, kvs#427]
      +- GlobalLimit 5
        +- LocalLimit 5
          +- LocalRelation <empty>, [kvs#427, keys#428]
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Scala unit tests for the examples above

Closes #32773 from karenfeng/SPARK-35636.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 53a758b)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasingSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala (diff)
Commit c7fb0e18be28aaca133d330afcc7e85c3e4c6d7a by gengliang
[SPARK-35629][SQL] Use better exception type if database doesn't exist on `drop database`

### What changes were proposed in this pull request?

Add a database existence check in `SessionCatalog`.
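
A minimal sketch of such a check (assumed shape, not necessarily the exact patch; `formatName` stands in for whatever name normalization `SessionCatalog` applies):

```scala
// Inside SessionCatalog (illustrative): verify existence before delegating to the
// external catalog so we can raise a Spark-side NoSuchDatabaseException instead of
// surfacing Hive's NoSuchObjectException.
def dropDatabase(db: String, ignoreIfNotExists: Boolean, cascade: Boolean): Unit = {
  val dbName = formatName(db)  // hypothetical helper for name normalization
  if (!ignoreIfNotExists && !externalCatalog.databaseExists(dbName)) {
    throw new NoSuchDatabaseException(dbName)
  }
  externalCatalog.dropDatabase(dbName, ignoreIfNotExists, cascade)
}
```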

### Why are the changes needed?

Currently, executing `drop database test` will throw an unfriendly error message.

```
Error in query: org.apache.hadoop.hive.metastore.api.NoSuchObjectException: test
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.metastore.api.NoSuchObjectException: test
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
at org.apache.spark.sql.hive.HiveExternalCatalog.dropDatabase(HiveExternalCatalog.scala:200)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.dropDatabase(ExternalCatalogWithListener.scala:53)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.dropDatabase(SessionCatalog.scala:273)
at org.apache.spark.sql.execution.command.DropDatabaseCommand.run(ddl.scala:111)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3707)
```

### Does this PR introduce _any_ user-facing change?

Yes, a cleaner error message.

### How was this patch tested?

Add test.

Closes #32768 from ulysses-you/SPARK-35629.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: c7fb0e1)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
Commit 6ce5f2491c156e0bf4af713948e780d3215a992d by wenchen
[SPARK-35568][SQL] Add the BroadcastExchange after re-optimizing the physical plan to fix the UnsupportedOperationException when enabling both AQE and DPP

### What changes were proposed in this pull request?
This PR is to fix the `UnsupportedOperationException` described in [PR#32705](https://github.com/apache/spark/pull/32705).
When AQE and DPP are turned on at the same time, the `BroadcastExchange` included in the DPP filter is not added through the `EnsureRequirement` rule. Therefore, when AQE optimizes the DPP filter, there is no way to add the `BroadcastExchange` through the `EnsureRequirement` rule in the `reOptimize` method, which eventually leads to the loss of the `BroadcastExchange` in the final physical plan. This PR adds a `BroadcastExchange` node in the `reOptimize` method if the current plan is a DPP filter.

### Why are the changes needed?
bug fix

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a new unit test.

Closes #32741 from JkSelf/fixDPP+AQEbug.

Authored-by: Ke Jia <ke.a.jia@intel.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 6ce5f24)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala (diff)
Commit dc3317fdf9c3fd718ef23c191fe00f730f341c79 by wenchen
[SPARK-21957][SQL][FOLLOWUP] Support CURRENT_USER without tailing parentheses

### What changes were proposed in this pull request?

This is a followup for 345d35ed1aa508bd2228603f94cc8aa0ad998906. In this PR we support CURRENT_USER without trailing parentheses in default mode. For ANSI mode, we can only use CURRENT_USER without trailing parentheses, because it is a reserved keyword that cannot be used as a function name.
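
An assumed usage sketch of the resulting behavior (requires a `SparkSession` named `spark`):

```scala
// Default (non-ANSI) mode: both forms are accepted.
spark.sql("SELECT CURRENT_USER").show()
spark.sql("SELECT current_user()").show()

// ANSI mode: only the form without trailing parentheses is accepted,
// because CURRENT_USER is a reserved keyword there.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CURRENT_USER").show()
```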

### Why are the changes needed?

1. make it the same as current_date/current_timestamp
2. better ANSI compliance
### Does this PR introduce _any_ user-facing change?

no, just a followup

### How was this patch tested?

new tests

Closes #32770 from yaooqinn/SPARK-21957-F.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: dc3317f)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/MiscFunctionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala (diff)
Commit 497c80a1ad7fdd605b75c8a6601fce35c7449578 by dhyun
[SPARK-32975][K8S] Add config for driver readiness timeout before executors start

### What changes were proposed in this pull request?
Add a new config that controls the timeout of waiting for driver pod's readiness before allocating executor pods. This wait only happens once on application start.

### Why are the changes needed?
The driver's headless service can be resolved by DNS only after the driver pod is ready. If the executor tries to connect to the headless service before the driver pod is ready, it will hit an UnknownHostException and get into an error state, but will not be restarted. **This case usually happens when the driver pod has sidecar containers but hasn't finished their creation when executors start.** So basically there is a race condition. This issue can be mitigated by tweaking this config.

### Does this PR introduce _any_ user-facing change?
A new config `spark.kubernetes.allocation.driver.readinessTimeout` added.
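
An assumed usage sketch; `"10s"` is an illustrative value, not a documented default:

```scala
import org.apache.spark.SparkConf

// Wait up to 10 seconds for the driver pod to become ready before creating executor pods.
val conf = new SparkConf()
  .set("spark.kubernetes.allocation.driver.readinessTimeout", "10s")
```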

### How was this patch tested?
Existing tests.

Closes #32752 from cchriswu/SPARK-32975-fix.

Lead-authored-by: Chris Wu <wucaowei19@gmail.com>
Co-authored-by: Chris Wu <wcaowei@vmware.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 497c80a)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala (diff)
Commit f2c0a049a699c57a22e7e422eb8914670c10df6c by gurwls223
[SPARK-35643][PYTHON] Fix ambiguous reference in functions.py column()

### What changes were proposed in this pull request?
In functions.py, there is a function `def column(col)`. There is also another function in the same file, `def col(col)`. This leads to some ambiguity about whether the parameter or the function is being referred to. In PySpark 3.1.2, this leads to `TypeError: 'str' object is not callable` when the function `column(col)` is called - the highest preference is given to the string variable in scope, as opposed to the function `col` in the file, which is what was intended.

This PR fixes that ambiguity by changing the variable name to `col_like`. I have filed this as an issue on JIRA here - https://issues.apache.org/jira/browse/SPARK-35643.

### Why are the changes needed?
In PySpark 3.1.2, we see `TypeError: 'str' object is not callable` when the `column()` function is called. This PR fixes that error.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
I don't believe this patch needs additional testing.

Closes #32771 from keerthanvasist/col.

Lead-authored-by: Keerthan Vasist <kvasist@amazon.com>
Co-authored-by: keerthanvasist <kvasist@amazon.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: f2c0a04)
The file was modified python/pyspark/sql/functions.py (diff)
Commit b5678bee1e5ef2903518e91dc539a3f42ee3371b by gurwls223
[SPARK-35446] Override getJDBCType in MySQLDialect to map FloatType to FLOAT

### What changes were proposed in this pull request?

Override `getJDBCType` method in `MySQLDialect` so that `FloatType` is mapped to `FLOAT` instead of `REAL`

### Why are the changes needed?

MySQL treats `REAL` as a synonym to `DOUBLE` by default (see https://dev.mysql.com/doc/refman/8.0/en/numeric-types.html). Therefore, when creating a table with a column of `REAL` type, it will be created as `DOUBLE`. However, currently, `MySQLDialect` does not provide an implementation for `getJDBCType`, and will thus ultimately fall back to `JdbcUtils.getCommonJDBCType`, which maps `FloatType` to `REAL`. This change is needed so that we can properly map the `FloatType` to `FLOAT` for MySQL.
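
A sketch of what such an override might look like (assumed shape, not necessarily identical to the merged change):

```scala
import java.sql.Types

import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
import org.apache.spark.sql.types.{DataType, FloatType}

// Illustrative dialect object: map FloatType to MySQL's FLOAT and defer
// everything else to the common JDBC type mapping.
object MySQLDialectSketch extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:mysql")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case FloatType => Option(JdbcType("FLOAT", Types.FLOAT))
    case _ => JdbcUtils.getCommonJDBCType(dt)
  }
}
```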

### Does this PR introduce _any_ user-facing change?
Prior to this PR, writing a DataFrame with a `FloatType` column to a MySQL table would create a `DOUBLE` column. After the PR, it creates a `FLOAT` column.

### How was this patch tested?
Added a test case in `JDBCSuite` that verifies the mapping.

Closes #32605 from mariosmeim-db/SPARK-35446.

Authored-by: Marios Meimaris <marios.meimaris@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: b5678be)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/MySQLDialect.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala (diff)
Commit 7bc364beed9ae4507c77282b77f1397a3cfe2de8 by gengliang
[SPARK-35621][SQL] Add rule id pruning to the TypeCoercion rule

### What changes were proposed in this pull request?

- Added TreeNode.transformUpWithBeforeAndAfterRuleOnChildren(...);
- Call transformUpWithBeforeAndAfterRuleOnChildren in TypeCoercionRule.

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.

### How was this patch tested?

Existing tests.
Performance diff:

| Rule | Baseline | Experiment (wo. ruleId) | Experiment (wo. ruleId)/Baseline | Experiment (w. ruleId) | Experiment (w. ruleId)/Baseline |
| --- | --- | --- | --- | --- | --- |
| CombinedTypeCoercionRule | 665020354 | 567320034 | 0.85 | 330798240 | 0.50 |


Closes #32761 from sigmod/transform.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: 7bc364b)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleIdCollection.scala (diff)
Commit 510bde460a2689d88fd127f5412b635b7912a5dd by gengliang
[SPARK-35655][BUILD] Upgrade HtmlUnit and its related artifacts to 2.50

### What changes were proposed in this pull request?

This PR upgrades HtmlUnit and its related artifacts to `2.50`.

### Why are the changes needed?

We currently use 2.40, but 2.41+ seems to include a bunch of bug fixes and improvements, especially for JavaScript and CSS.
https://htmlunit.sourceforge.io/changes-report.html#a2.50.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

All the `UISeleniumSuite` tests, which use HtmlUnit, pass on my laptop.

Closes #32789 from sarutak/upgrade-htmlunit250.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: 510bde4)
The file was modified pom.xml (diff)