Changes

Summary

  1. [SPARK-33957][BUILD] Update commons-lang3 to 3.11 (commit: bd346f4) (details)
  2. [SPARK-33956][SQL] Add rowCount for Range operator (commit: 4cd6805) (details)
  3. [SPARK-33961][BUILD] Upgrade SBT to 1.4.6 (commit: 1c25bea) (details)
  4. [SPARK-33959][SQL] Improve the statistics estimation of the Tail (commit: 6c5ba81) (details)
  5. [SPARK-33963][SQL] Canonicalize `HiveTableRelation` w/o table stats (commit: fc7d016) (details)
  6. [SPARK-33962][SS] Fix incorrect min partition condition (commit: cfd4a08) (details)
  7. [SPARK-33955][SS] Add latest offsets to source progress (commit: 963c60f) (details)
  8. [SPARK-33398] Fix loading tree models prior to Spark 3.0 (commit: 6b7527e) (details)
  9. [SPARK-33950][SQL] Refresh cache in v1 `ALTER TABLE .. DROP PARTITION` (commit: 67195d0) (details)
  10. [SPARK-33951][SQL] Distinguish the error between filter and distinct (commit: b037930) (details)
  11. [SPARK-33954][SQL] Some operator missing rowCount when enable CBO (commit: 2a68ed7) (details)
  12. [SPARK-33934][SQL] Add SparkFile's root dir to env property PATH (commit: adac633) (details)
  13. [SPARK-33888][SQL] JDBC SQL TIME type represents incorrectly as TimestampType, it should be physical Int in millis (commit: 0b647fe) (details)
  14. [SPARK-33965][SQL][TESTS] Recognize `spark_catalog` by `CACHE TABLE` in Hive table names (commit: 8b3fb43) (details)
  15. [SPARK-33978][SQL] Support ZSTD compression in ORC data source (commit: 271c4f6) (details)
  16. [SPARK-33844][SQL] InsertIntoHiveDir command should check col name too (commit: 8583a46) (details)
  17. [SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables (commit: ddc0d51) (details)
  18. [SPARK-33984][PYTHON] Upgrade to Py4J 0.10.9.1 (commit: 6b86aa0) (details)
  19. [SPARK-33990][SQL][TESTS] Remove partition data by v2 `ALTER TABLE .. DROP PARTITION` (commit: fc3f226) (details)
  20. [SPARK-33988][SQL][TEST] Add an option to enable CBO in TPCDSQueryBenchmark (commit: 414d323) (details)
  21. [SPARK-33983][PYTHON] Update cloudpickle to v1.6.0 (commit: d6322bf) (details)
  22. [SPARK-33980][SS] Invalidate char/varchar in spark.readStream.schema (commit: ac4651a) (details)
  23. [SPARK-33996][BUILD] Upgrade checkstyle plugins (commit: 90f4ecf) (details)
  24. [SPARK-33987][SQL] Refresh cache in v2 `ALTER TABLE .. DROP PARTITION` (commit: 84c1f43) (details)
  25. [SPARK-33894][SQL] Change visibility of private case classes in mllib to avoid runtime compilation errors with Scala 2.13 (commit: 9b4173f) (details)
  26. [SPARK-33908][CORE][FOLLOWUP] Correct Scaladoc of resolveDependencyPaths/resolveMavenDependencies (commit: 559f411) (details)
  27. [SPARK-33964][SQL] Combine distinct unions in more cases (commit: bb6d6b5) (details)
  28. [SPARK-33794][SQL] NextDay expression throw runtime IllegalArgumentException when receiving invalid input under ANSI mode (commit: 976e97a) (details)
  29. [SPARK-33998][SQL] Provide an API to create an InternalRow in (commit: 6b00fdc) (details)
  30. [SPARK-34001][SQL][TESTS] Remove unused runShowTablesSql() in (commit: 15a863f) (details)
  31. [SPARK-33992][SQL] override transformUpWithNewOutput to add (commit: f0ffe0c) (details)
  32. [SPARK-34000][CORE] Fix stageAttemptToNumSpeculativeTasks (commit: a7d3fcd) (details)
  33. [SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in (commit: a071826) (details)
  34. [SPARK-33935][SQL] Fix CBO cost function (commit: f252a93) (details)
  35. [SPARK-33919][SQL][TESTS] Unify v1 and v2 SHOW NAMESPACES tests (commit: 122f8f0) (details)
  36. [SPARK-34007][BUILD] Downgrade scala-maven-plugin to 4.3.0 (commit: 356fdc9) (details)
  37. [SPARK-32017][PYTHON][FOLLOW-UP] Rename HADOOP_VERSION to (commit: 329850c) (details)
  38. [SPARK-33999][BUILD] Make sbt unidoc success with JDK11 (commit: acf0a4f) (details)
  39. [SPARK-34010][SQL][DODCS] Use python3 instead of python in SQL (commit: 8d09f96) (details)
  40. [SPARK-34009][BUILD] To activate profile 'aarch64' based on OS settings (commit: 14c2eda) (details)
  41. [SPARK-33542][SQL] Group exception messages in catalyst/catalog (commit: cc1d9d2) (details)
  42. [SPARK-33874][K8S][FOLLOWUP] Handle long lived sidecars - clean up (commit: 171db85) (details)
  43. [SPARK-34012][SQL] Keep behavior consistent when conf (commit: e279ed3) (details)
  44. [SPARK-34011][SQL] Refresh cache in `ALTER TABLE .. RENAME TO PARTITION` (commit: b77d11d) (details)
  45. [SPARK-34015][R] Fixing input timing in gapply (commit: 3d8ee49) (details)
  46. [SPARK-33029][CORE][WEBUI] Fix the UI executor page incorrectly marking (commit: 2951082) (details)
  47. [SPARK-34004][SQL] Change FrameLessOffsetWindowFunction as sealed (commit: 2ab77d6) (details)
  48. [SPARK-34008][BUILD] Upgrade derby to 10.14.2.0 (commit: b1c4fc7) (details)
  49. [SPARK-33635][SS] Adjust the order of check in (commit: fa93090) (details)
  50. [SPARK-33934][SQL][FOLLOW-UP] Use SubProcessor's exit code as assert (commit: c0d0dba) (details)
  51. [SPARK-33948][SQL] Fix CodeGen error of MapObjects.doGenCode method in (commit: 45a4ff8) (details)
  52. [SPARK-33938][SQL] Optimize Like Any/All by LikeSimplification (commit: 26d8df3) (details)
  53. [SPARK-32221][K8S] Avoid possible errors due to incorrect file size or (commit: f64dfa8) (details)
  54. [SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side (commit: ff284fb) (details)
  55. [SPARK-34022][DOCS] Support latest mkdocs in SQL built-in function docs (commit: 0d86a02) (details)
  56. [SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' (commit: 6788304) (details)
  57. [SPARK-32685][SQL][FOLLOW-UP] Update migration guide about change (commit: 3cdc4ef) (details)
  58. [SPARK-34022][DOCS][FOLLOW-UP] Fix typo in SQL built-in function docs (commit: a0269bb) (details)
  59. [SPARK-34029][SQL][TESTS] Add OrcEncryptionSuite and FakeKeyProvider (commit: 8bb70bf) (details)
  60. [SPARK-33806][SQL][FOLLOWUP] Fold RepartitionExpression num partition (commit: f9daf03) (details)
Commit bd346f4a2d078dd36d2fcadf3d5025389b124814 by dhyun
[SPARK-33957][BUILD] Update commons-lang3 to 3.11

### What changes were proposed in this pull request?

This PR aims to update commons-lang3 to 3.11 to support Java 16+ better.

### Why are the changes needed?

commons-lang3 has the following bug fixes and Java 16 support.
- https://commons.apache.org/proper/commons-lang/changes-report.html#a3.11

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?
Pass the CIs.

Closes #30990 from williamhyun/Commons-lang3.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: bd346f4)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
Commit 4cd680581a948fb4d7701842ac2cd9e12328089d by dhyun
[SPARK-33956][SQL] Add rowCount for Range operator

### What changes were proposed in this pull request?

This PR adds rowCount for the `Range` operator:
```scala
spark.sql("set spark.sql.cbo.enabled=true")
spark.sql("select id from range(100)").explain("cost")
```

Before this pr:
```
== Optimized Logical Plan ==
Range (0, 100, step=1, splits=None), Statistics(sizeInBytes=800.0 B)
```

After this pr:
```
== Optimized Logical Plan ==
Range (0, 100, step=1, splits=None), Statistics(sizeInBytes=800.0 B, rowCount=100)
```

### Why are the changes needed?

[`JoinEstimation.estimateInnerOuterJoin`](https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L55-L156) needs the row count.
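
A minimal sketch of the underlying arithmetic (the helper name is illustrative, not the actual `basicLogicalOperators.scala` code): the row count of `Range(start, end, step)` can be derived from its parameters alone, which is why the operator can always report it.

```scala
// Illustrative only: number of elements produced by Range(start, end, step).
def rangeRowCount(start: Long, end: Long, step: Long): BigInt = {
  if (step == 0 || (step > 0 && start >= end) || (step < 0 && start <= end)) BigInt(0)
  else (BigInt(end) - BigInt(start) + step + (if (step > 0) -1 else 1)) / step
}

rangeRowCount(0, 100, 1)  // 100, matching rowCount=100 in the plan above
```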

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30989 from wangyum/SPARK-33956.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 4cd6805)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala (diff)
Commit 1c25bea0bbe8365523d2a3d6b06da03d67f25794 by dhyun
[SPARK-33961][BUILD] Upgrade SBT to 1.4.6

### What changes were proposed in this pull request?

This PR aims to upgrade SBT to 1.4.6 to fix the SBT regression.

### Why are the changes needed?

[SBT 1.4.6](https://github.com/sbt/sbt/releases/tag/v1.4.6) has the following fixes
- Updates to Coursier 2.0.8, which fixes the cache directory setting on Windows
- Fixes performance regression in shell tab completion
- Fixes match error when using withDottyCompat
- Fixes thread-safety in AnalysisCallback handler

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #30993 from dongjoon-hyun/SPARK-SBT-1.4.6.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 1c25bea)
The file was modified project/build.properties (diff)
Commit 6c5ba8169ae64fdcefd8530c2b38326178f5fa92 by gurwls223
[SPARK-33959][SQL] Improve the statistics estimation of the Tail

### What changes were proposed in this pull request?

This PR improves the statistics estimation of `Tail`:

```scala
spark.sql("set spark.sql.cbo.enabled=true")
spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as e").write.saveAsTable("t1")
println(Tail(Literal(5), spark.sql("SELECT * FROM t1").queryExecution.logical).queryExecution.stringWithStats)
```

Before this pr:
```
== Optimized Logical Plan ==
Tail 5, Statistics(sizeInBytes=3.8 KiB)
+- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
```

After this pr:
```
== Optimized Logical Plan ==
Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5)
+- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
```
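
A minimal sketch of the estimation idea, under the assumption that a per-row size is derived from the output schema (not the exact visitor code): `Tail(n)` keeps at most `n` rows, so the row count is capped at `n` and the size is rebuilt from the per-row size. With four bigint columns (roughly 8 bytes of row overhead plus 4 * 8 bytes of data), 5 rows give about 200 B, as in the plan above.

```scala
// Illustrative only: stats for Tail(limit) given the child's row count and a per-row size.
def tailStats(limit: BigInt, childRowCount: Option[BigInt], bytesPerRow: BigInt): (BigInt, BigInt) = {
  val rows = childRowCount.map(_.min(limit)).getOrElse(limit)
  (bytesPerRow * rows, rows) // (sizeInBytes, rowCount)
}

tailStats(BigInt(5), None, BigInt(40)) // (200, 5)
```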

### Why are the changes needed?

Improve statistics estimation.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30991 from wangyum/SPARK-33959.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 6c5ba81)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/BasicStatsPlanVisitor.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlanVisitor.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala (diff)
Commit fc7d0165d29e04a8e78577c853a701bdd8a2af4a by gurwls223
[SPARK-33963][SQL] Canonicalize `HiveTableRelation` w/o table stats

### What changes were proposed in this pull request?
Skip table stats when canonicalizing `HiveTableRelation`.
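
A minimal sketch of the idea with made-up types (the real change is in catalyst's `interface.scala`): canonicalization drops the statistics so that two relations differing only in table stats compare equal, which keeps cache lookups matching after the stats change.

```scala
// Illustrative only: the canonical form ignores the (mutable) statistics.
case class Relation(table: String, stats: Option[BigInt]) {
  def canonicalized: Relation = copy(stats = None)
}

Relation("t", Some(BigInt(100))).canonicalized == Relation("t", None).canonicalized // true
```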

### Why are the changes needed?
The changes fix a regression compared to Spark 3.0; see SPARK-33963.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, Spark behaves as in version 3.0.1.

### How was this patch tested?
By running new UT:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CachedTableSuite"
```

Closes #30995 from MaxGekk/fix-caching-hive-table.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: fc7d016)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala (diff)
Commit cfd4a083987f985da4659333c718561c19e0cbfe by dhyun
[SPARK-33962][SS] Fix incorrect min partition condition

### What changes were proposed in this pull request?

This patch fixes an incorrect condition when comparing offset range size and min partition config.

### Why are the changes needed?

When calculating offset ranges, we consider the `minPartitions` configuration. If `minPartitions` is not set or is less than or equal to the size of the given ranges, there are already enough partitions at Kafka, so we don't need to split offsets to satisfy the min partition requirement. But the current condition, `offsetRanges.size > minPartitions.get`, is not correct, so `getRanges` currently splits offsets in unnecessary cases.

Besides, in the non-split case, we can assign a preferred executor location and reuse the `KafkaConsumer`. So unnecessarily splitting the offset range misses the chance to reuse the `KafkaConsumer`.
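
A minimal sketch of the corrected check described above (the helper is illustrative, not the actual `KafkaOffsetRangeCalculator` code): splitting is only needed when `minPartitions` is set and strictly greater than the number of ranges Kafka already provides.

```scala
// Illustrative only: do NOT split when numRanges >= minPartitions (or minPartitions is unset).
def needsSplitting(numRanges: Int, minPartitions: Option[Int]): Boolean =
  minPartitions.exists(_ > numRanges)

needsSplitting(numRanges = 4, minPartitions = Some(4)) // false: enough partitions already
```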

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test.

Manual test in Spark cluster with Kafka.

Closes #30994 from viirya/ss-minor4.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: cfd4a08)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculatorSuite.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetRangeCalculator.scala (diff)
Commit 963c60fe49a54c05cc1c50cb7abce864c5322bdf by dhyun
[SPARK-33955][SS] Add latest offsets to source progress

### What changes were proposed in this pull request?

This patch proposes to add latest offset to source progress for streaming queries.

### Why are the changes needed?

Currently we record start and end offsets per source in the streaming progress. The latest offset is important information for a streaming process, but the progress lacks this info. We can use it to track the processing lag and adjust streaming queries. We should add the latest offset to the source progress.

### Does this PR introduce _any_ user-facing change?

Yes, it adds a new metric for the latest source offset to the source progress.

### How was this patch tested?

Unit test. Manually test in Spark cluster:

```
    "description" : "KafkaV2[Subscribe[page_view_events]]",
    "startOffset" : {
      "page_view_events" : {
        "2" : 582370921,
        "4" : 391910836,
        "1" : 631009201,
        "3" : 406601346,
        "0" : 195799112
      }
    },
    "endOffset" : {
      "page_view_events" : {
        "2" : 583764414,
        "4" : 392338002,
        "1" : 632183480,
        "3" : 407101489,
        "0" : 197304028
      }
    },
    "latestOffset" : {
      "page_view_events" : {
        "2" : 589852545,
        "4" : 394204277,
        "1" : 637313869,
        "3" : 409286602,
        "0" : 203878962
      }
    },
    "numInputRows" : 4999997,
    "inputRowsPerSecond" : 29287.70501405811,
```

Closes #30988 from viirya/latest-offset.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 963c60f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/SupportsAdmissionControl.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala (diff)
The file was modified project/MimaExcludes.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala (diff)
Commit 6b7527e381591bcd51be205853aea3e349893139 by srowen
[SPARK-33398] Fix loading tree models prior to Spark 3.0

### What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/21632/files#diff-0fdae8a6782091746ed20ea43f77b639f9c6a5f072dd2f600fcf9a7b37db4f47, a new field `rawCount` was added to `NodeData`, which causes tree models trained in 2.4 to fail to load in 3.0/3.1/master.
The field `rawCount` is only used in training, not in `transform`/`predict`/`featureImportance`, so this PR just sets it to -1L when loading such models.
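
A minimal sketch of the loading fallback (an assumed helper, not the exact `treeModels.scala` code): when the stored `NodeData` row has no `rawCount` column, as in models written by Spark 2.4, default it to -1L.

```scala
import org.apache.spark.sql.Row

// Illustrative only: read rawCount if the column exists, otherwise fall back to -1L.
def rawCountOf(nodeData: Row): Long =
  if (nodeData.schema != null && nodeData.schema.fieldNames.contains("rawCount")) {
    nodeData.getAs[Long]("rawCount")
  } else {
    -1L // value used when loading pre-3.0 models
  }
```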

### Why are the changes needed?
To support loading old tree models in 3.0/3.1/master.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added test suites.

Closes #30889 from zhengruifeng/fix_tree_load.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 6b7527e)
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/data/_SUCCESS
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/metadata/_SUCCESS
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/treesMetadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/treesMetadata/.part-00000-21082d24-b666-4c4e-a823-70c7afdcbdc5-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/dtr-2.4.7/metadata/_SUCCESS
The file was added mllib/src/test/resources/ml-models/dtr-2.4.7/data/.part-00000-39b027f0-a437-4b3d-84af-d861adcb9ca8-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/treesMetadata/part-00000-21082d24-b666-4c4e-a823-70c7afdcbdc5-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/data/._SUCCESS.crc
The file was modified mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala (diff)
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/data/part-00000-e41a7b98-91f8-4485-b112-25b4b11c9009-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/metadata/part-00000
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala (diff)
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/data/_SUCCESS
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/metadata/_SUCCESS
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/data/part-00000-3b5433ff-d346-4511-9aab-639288bfae6d-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/data/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/metadata/part-00000
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/metadata/_SUCCESS
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/treesMetadata/part-00000-6b9124f5-87fe-4fd8-ad9c-4be239c2215a-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/metadata/.part-00000.crc
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/data/.part-00000-3b5433ff-d346-4511-9aab-639288bfae6d-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/metadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/metadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/treesMetadata/_SUCCESS
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/data/_SUCCESS
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/data/.part-00000-e41a7b98-91f8-4485-b112-25b4b11c9009-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/metadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/metadata/.part-00000.crc
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/metadata/part-00000
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/data/part-00000-dacbde64-c861-41c7-91c0-6da8cc01fb43-c000.snappy.parquet
The file was modified mllib/src/main/scala/org/apache/spark/ml/tree/treeModels.scala (diff)
The file was added mllib/src/test/resources/ml-models/dtc-2.4.7/metadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/dtr-2.4.7/data/_SUCCESS
The file was added mllib/src/test/resources/ml-models/dtr-2.4.7/metadata/.part-00000.crc
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/metadata/.part-00000.crc
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/treesMetadata/.part-00000-6b9124f5-87fe-4fd8-ad9c-4be239c2215a-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/treesMetadata/_SUCCESS
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala (diff)
The file was added mllib/src/test/resources/ml-models/dtc-2.4.7/data/.part-00000-bd7ae42f-c890-406c-894c-ca4eac67c690-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/dtc-2.4.7/data/_SUCCESS
The file was added mllib/src/test/resources/ml-models/dtr-2.4.7/metadata/part-00000
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala (diff)
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/data/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/treesMetadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/treesMetadata/_SUCCESS
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/metadata/._SUCCESS.crc
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala (diff)
The file was added mllib/src/test/resources/ml-models/dtr-2.4.7/data/part-00000-39b027f0-a437-4b3d-84af-d861adcb9ca8-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/data/.part-00000-dacbde64-c861-41c7-91c0-6da8cc01fb43-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/treesMetadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/dtr-2.4.7/data/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/dtc-2.4.7/metadata/part-00000
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/StringIndexerSuite.scala (diff)
The file was added mllib/src/test/resources/ml-models/dtc-2.4.7/data/part-00000-bd7ae42f-c890-406c-894c-ca4eac67c690-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/data/.part-00000-4a69607d-6edb-40fc-b681-981caaeca996-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/rfc-2.4.7/metadata/.part-00000.crc
The file was added mllib/src/test/resources/ml-models/dtc-2.4.7/metadata/_SUCCESS
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/treesMetadata/.part-00000-81137d9f-31e3-4a90-813c-ddc394101e21-c000.snappy.parquet.crc
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/data/_SUCCESS
The file was added mllib/src/test/resources/ml-models/dtr-2.4.7/metadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/treesMetadata/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/treesMetadata/part-00000-dfe4db51-d349-447a-9b86-d95edaabcde8-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/gbtr-2.4.7/metadata/part-00000
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/treesMetadata/part-00000-81137d9f-31e3-4a90-813c-ddc394101e21-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/metadata/_SUCCESS
The file was added mllib/src/test/resources/ml-models/dtc-2.4.7/metadata/.part-00000.crc
The file was added mllib/src/test/resources/ml-models/dtc-2.4.7/data/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/gbtc-2.4.7/data/._SUCCESS.crc
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/treesMetadata/_SUCCESS
The file was modified mllib/src/test/scala/org/apache/spark/ml/regression/RandomForestRegressorSuite.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/HashingTFSuite.scala (diff)
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/data/part-00000-4a69607d-6edb-40fc-b681-981caaeca996-c000.snappy.parquet
The file was added mllib/src/test/resources/ml-models/rfr-2.4.7/treesMetadata/.part-00000-dfe4db51-d349-447a-9b86-d95edaabcde8-c000.snappy.parquet.crc
Commit 67195d0d977caa5a458e8a609c434205f9b54d1b by wenchen
[SPARK-33950][SQL] Refresh cache in v1 `ALTER TABLE .. DROP PARTITION`

### What changes were proposed in this pull request?
Invoke `refreshTable()` from `AlterTableDropPartitionCommand.run()` after dropping the partitions. In particular, this invalidates the cache associated with the modified table.
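
The user-visible effect is equivalent to refreshing the table manually after the drop; a rough sketch (assuming an existing `SparkSession` named `spark`, as in `spark-shell`):

```scala
// Illustrative only: the command now triggers the equivalent of this refresh internally.
spark.sql("ALTER TABLE tbl1 DROP PARTITION (part0=0)")
spark.catalog.refreshTable("tbl1") // invalidates the cached data of tbl1
```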

### Why are the changes needed?
This fixes the issues portrayed by the example:
```sql
spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED BY (part0);
spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
spark-sql> CACHE TABLE tbl1;
spark-sql> SELECT * FROM tbl1;
0 0
1 1
spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
spark-sql> SELECT * FROM tbl1;
0 0
1 1
```
The last query must not return `0 0` since that row was deleted by the previous command.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes for the example above:
```sql
...
spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
spark-sql> SELECT * FROM tbl1;
1 1
```

### How was this patch tested?
By running the affected test suite:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #30983 from MaxGekk/drop-partition-refresh-cache.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 67195d0)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableDropPartitionSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
Commit b037930952a341f4ed956a8f1839852992feaadc by wenchen
[SPARK-33951][SQL] Distinguish the error between filter and distinct

### What changes were proposed in this pull request?
The error messages for specifying FILTER and DISTINCT for an aggregate function are mixed together and should be separated. Separating them increases readability and ease of use.

### Why are the changes needed?
Increase readability and ease of use.

### Does this PR introduce _any_ user-facing change?
'Yes'.

### How was this patch tested?
Jenkins test

Closes #30982 from beliefer/SPARK-33951.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: b037930)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/higherOrderFunctions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/QueryCompilationErrors.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala (diff)
Commit 2a68ed71e4402c2864202aa78a54d9921c257990 by wenchen
[SPARK-33954][SQL] Some operator missing rowCount when enable CBO

### What changes were proposed in this pull request?

This PR fixes some operators missing rowCount when CBO is enabled, e.g.:
```scala
spark.range(1000).selectExpr("id as a", "id as b").write.saveAsTable("t1")
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
spark.sql("set spark.sql.cbo.enabled=true")
spark.sql("set spark.sql.cbo.planStats.enabled=true")
spark.sql("select * from (select * from t1 distribute by a limit 100) distribute by b").explain("cost")
```

Before this pr:
```
== Optimized Logical Plan ==
RepartitionByExpression [b#2129L], Statistics(sizeInBytes=2.3 KiB)
+- GlobalLimit 100, Statistics(sizeInBytes=2.3 KiB, rowCount=100)
   +- LocalLimit 100, Statistics(sizeInBytes=23.4 KiB)
      +- RepartitionByExpression [a#2128L], Statistics(sizeInBytes=23.4 KiB)
         +- Relation[a#2128L,b#2129L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
```

After this pr:
```
== Optimized Logical Plan ==
RepartitionByExpression [b#2129L], Statistics(sizeInBytes=2.3 KiB, rowCount=100)
+- GlobalLimit 100, Statistics(sizeInBytes=2.3 KiB, rowCount=100)
   +- LocalLimit 100, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
      +- RepartitionByExpression [a#2128L], Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
         +- Relation[a#2128L,b#2129L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)

```

### Why are the changes needed?

[`JoinEstimation.estimateInnerOuterJoin`](https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L55-L156) needs the row count.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30987 from wangyum/SPARK-33954.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 2a68ed7)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/BasicStatsPlanVisitor.scala (diff)
Commit adac633f93f05442f57456f7dcc0d59822d14c2b by gurwls223
[SPARK-33934][SQL] Add SparkFile's root dir to env property PATH

### What changes were proposed in this pull request?
In hive we always use
```
add file /path/to/script.py;
select transform(col1, col2, ..)
using 'script.py' as (col1, col2, ...)
from ...
```
Since in Spark we wrap the script command with `/bin/bash -c`, in this case it throws `script.py: command not found`.

This PR adds the SparkFiles root dir path to the execution env property `PATH`, so the sub-process will find `script.py` as a program under `PATH`.
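
A minimal sketch of the idea (an assumed helper, not the exact `BaseScriptTransformationExec` code): prepend the SparkFiles root dir to `PATH` in the child-process environment so a file added via `ADD FILE` resolves by its bare name.

```scala
import java.io.File
import org.apache.spark.SparkFiles

// Illustrative only: build the script sub-process environment with the SparkFiles dir on PATH.
def scriptEnv(base: Map[String, String]): Map[String, String] = {
  val oldPath = base.getOrElse("PATH", "")
  base + ("PATH" -> (SparkFiles.getRootDirectory() + File.pathSeparator + oldPath))
}
```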

### Why are the changes needed?
Support SQL migration from Hive to Spark.

### Does this PR introduce _any_ user-facing change?
Users can directly use the script file name as the program in script transform SQL.

```
add file /path/to/script.py;
select transform(col1, col2, ..)
using 'script.py' as (col1, col2, ...)
from ...
```
### How was this patch tested?
UT

Closes #30973 from AngersZhuuuu/SPARK-33934.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: adac633)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala (diff)
The file was modified sql/core/src/test/resources/test_script.py (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala (diff)
Commit 0b647fe69cf201b4dcbc0f4dfc0eb504a523571d by wenchen
[SPARK-33888][SQL] JDBC SQL TIME type represents incorrectly as TimestampType, it should be physical Int in millis

### What changes were proposed in this pull request?
The JDBC SQL TIME type is incorrectly represented as TimestampType; we change it to a physical Int in millis for now.

### Why are the changes needed?
Currently, for JDBC, the SQL TIME type is incorrectly represented as Spark TimestampType. It should be represented as a physical int in millis: a time of day, with no reference to a particular calendar, time zone or date, with a precision of one millisecond, storing the number of milliseconds after midnight, 00:00:00.000.
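
A minimal sketch of the intended representation (an assumed helper, not the PR's `JdbcUtils` code): a SQL TIME value stored as the number of milliseconds after midnight, independent of calendar, time zone and date.

```scala
// Illustrative only: milliseconds-of-day for a java.sql.Time value.
def millisOfDay(t: java.sql.Time): Int =
  (t.toLocalTime.toNanoOfDay / 1000000L).toInt

millisOfDay(java.sql.Time.valueOf("00:00:01")) // 1000
```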

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Close #30902

Closes #30902 from saikocat/SPARK-33888.

Lead-authored-by: Hoa <hoameomu@gmail.com>
Co-authored-by: Hoa <saikocatz@gmail.com>
Co-authored-by: Duc Hoa, Nguyen <hoa.nd@teko.vn>
Co-authored-by: Duc Hoa, Nguyen <hoameomu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 0b647fe)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala (diff)
Commit 8b3fb43f408594ebcb9313b0c0d4c5982ba1ae31 by wenchen
[SPARK-33965][SQL][TESTS] Recognize `spark_catalog` by `CACHE TABLE` in Hive table names

### What changes were proposed in this pull request?
Remove the special handling of `CacheTable` in `TestHiveQueryExecution.analyzed` because it prevents supporting `spark_catalog` in Hive table names. `spark_catalog` can be handled by the few lines below:
```scala
      case UnresolvedRelation(ident, _, _) =>
        if (ident.length > 1 && ident.head.equalsIgnoreCase(CatalogManager.SESSION_CATALOG_NAME)) {
```
added by https://github.com/apache/spark/pull/30883.

### Why are the changes needed?
1. To have feature parity with v1 In-Memory catalog.
2. To be able to write unified tests for In-Memory and Hive external catalogs.

### Does this PR introduce _any_ user-facing change?
Should not.

### How was this patch tested?
By running the test suite with new UT:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CachedTableSuite"
```

Closes #30997 from MaxGekk/cache-table-spark_catalog.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 8b3fb43)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala (diff)
Commit 271c4f6e00b7bc7c47d84a8e59018e84a19c9822 by dhyun
[SPARK-33978][SQL] Support ZSTD compression in ORC data source

### What changes were proposed in this pull request?

This PR aims to support ZSTD compression in ORC data source.

### Why are the changes needed?

Apache ORC 1.6 supports ZSTD compression to generate more compact files and save the storage cost.
- https://issues.apache.org/jira/browse/ORC-363

**BEFORE**
```scala
scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd")
java.lang.IllegalArgumentException: Codec [zstd] is not available. Available codecs are uncompressed, lzo, snappy, zlib, none.
```

**AFTER**
```scala
scala> spark.range(10).write.option("compression", "zstd").orc("/tmp/zstd")
```

```bash
$ orc-tools meta /tmp/zstd
Processing data file file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc [length: 230]
Structure for file:/tmp/zstd/part-00011-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc
File Version: 0.12 with ORC_14
Rows: 1
Compression: ZSTD
Compression size: 262144
Calendar: Julian/Gregorian
Type: struct<id:bigint>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 1 hasNull: false
    Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9

File Statistics:
  Column 0: count: 1 hasNull: false
  Column 1: count: 1 hasNull: false bytesOnDisk: 6 min: 9 max: 9 sum: 9

Stripes:
  Stripe: offset: 3 data: 6 rows: 1 tail: 35 index: 35
    Stream: column 0 section ROW_INDEX start: 3 length 11
    Stream: column 1 section ROW_INDEX start: 14 length 24
    Stream: column 1 section DATA start: 38 length 6
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT_V2

File length: 230 bytes
Padding length: 0 bytes
Padding ratio: 0%

User Metadata:
  org.apache.spark.version=3.2.0
```

### Does this PR introduce _any_ user-facing change?

Yes, this is a new feature.

### How was this patch tested?

Pass the newly added test case.

Closes #31002 from dongjoon-hyun/SPARK-33978.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 271c4f6)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala (diff)
Commit 8583a4605f74cd439bbf109bf9d551e0ec697910 by wenchen
[SPARK-33844][SQL] InsertIntoHiveDir command should check col name too

### What changes were proposed in this pull request?

In hive-1.2.1, the Hive SerDe just splits `serdeConstants.LIST_COLUMNS` and `serdeConstants.LIST_COLUMN_TYPES` by comma.

When we run the following UT on Spark 2.4:
```
  test("insert overwrite directory with comma col name") {
    withTempDir { dir =>
      val path = dir.toURI.getPath

      val v1 =
        s"""
           | INSERT OVERWRITE DIRECTORY '${path}'
           | STORED AS TEXTFILE
           | SELECT 1 as a, 'c' as b, if(1 = 1, "true", "false")
         """.stripMargin

      sql(v1).explain(true)

      sql(v1).show()
    }
  }
```
it fails as below, since a generated column name contains `,`, so the column names and column types end up with different sizes.
```
19:56:05.618 ERROR org.apache.spark.sql.execution.datasources.FileFormatWriter:  [ angerszhu ] Aborting job dd774f18-93fa-431f-9468-3534c7d8acda.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.hadoop.hive.serde2.SerDeException: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 5 elements while columns.types has 3 elements!
at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.extractColumnInfo(LazySerDeParameters.java:145)
at org.apache.hadoop.hive.serde2.lazy.LazySerDeParameters.<init>(LazySerDeParameters.java:85)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.initialize(LazySimpleSerDe.java:125)
at org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:119)
at org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:103)
at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:108)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:287)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:219)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:218)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:461)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

```

Since hive-2.3, COLUMN_NAME_DELIMITER is set to a special char when a column name contains ',':
https://github.com/apache/hive/blob/6f4c35c9e904d226451c465effdc5bfd31d395a0/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L1180-L1188
https://github.com/apache/hive/blob/6f4c35c9e904d226451c465effdc5bfd31d395a0/metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java#L1044-L1075

And in script transform, we parse column names to avoid this problem:
https://github.com/apache/spark/blob/554600c2af0dbc8979955807658fafef5dc66c08/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala#L257-L261

So I think `InsertIntoHiveDirCommand` should do the same thing too. I have verified that this approach makes spark-2.4 work well.
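
A minimal sketch of one way to avoid the size mismatch (an assumed helper; the PR follows the script-transform approach linked above): make sure no column name handed to the Hive SerDe still contains the ',' that `serdeConstants.LIST_COLUMNS` is split on.

```scala
// Illustrative only: alias columns whose generated names contain the comma delimiter.
def safeColumnName(name: String, index: Int): String =
  if (name.contains(",")) s"_col$index" else name

safeColumnName("""if(1 = 1, "true", "false")""", 2) // "_col2"
```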

### Why are the changes needed?
Safer use of the SerDe.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #30850 from AngersZhuuuu/SPARK-33844.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 8583a46)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala (diff)
Commit ddc0d5148ac6decde160cca847b5db5d6de1be58 by wenchen
[SPARK-33875][SQL] Implement DESCRIBE COLUMN for v2 tables

### What changes were proposed in this pull request?

This PR proposes to implement `DESCRIBE COLUMN` for v2 tables.

Note that the `isExtended` option is not implemented in this PR.

### Why are the changes needed?

Parity with v1 tables.

### Does this PR introduce _any_ user-facing change?

Yes, now, `DESCRIBE COLUMN` works for v2 tables.
```scala
sql("CREATE TABLE testcat.tbl (id bigint, data string COMMENT 'hello') USING foo")
sql("DESCRIBE testcat.tbl data").show
```
```
+---------+----------+
|info_name|info_value|
+---------+----------+
| col_name|      data|
|data_type|    string|
|  comment|     hello|
+---------+----------+
```

Before this PR, the command would fail with: `Describing columns is not supported for v2 tables.`

### How was this patch tested?

Added new test.

Closes #30881 from imback82/describe_col_v2.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: ddc0d51)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/describe.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/QueryCompilationErrors.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
Commit 6b86aa0b524b4d19b91ab434d2088667c9a1e662 by dhyun
[SPARK-33984][PYTHON] Upgrade to Py4J 0.10.9.1

### What changes were proposed in this pull request?

This PR upgrades Py4J from 0.10.9 to 0.10.9.1, which contains some bug fixes and improvements.
In particular, it contains one bug fix (https://github.com/bartdag/py4j/commit/4152353ac142a7c6d177e0d8f5d420d92c846a30).

### Why are the changes needed?

To leverage fixes from the upstream in Py4J.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Jenkins build and GitHub Actions will test it out.

Closes #31009 from HyukjinKwon/SPARK-33984.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 6b86aa0)
The file was modified core/pom.xml (diff)
The file was modified sbin/spark-config.sh (diff)
The file was modified python/setup.py (diff)
The file was modified core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala (diff)
The file was modified python/docs/Makefile (diff)
The file was modified python/docs/make2.bat (diff)
The file was modified bin/pyspark (diff)
The file was added python/lib/py4j-0.10.9.1-src.zip
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was removed python/lib/py4j-0.10.9-src.zip
The file was modified bin/pyspark2.cmd (diff)
Commit fc3f22645e5c542e80a086d96da384feb6afe121 by dhyun
[SPARK-33990][SQL][TESTS] Remove partition data by v2 `ALTER TABLE .. DROP PARTITION`

### What changes were proposed in this pull request?
Remove partition data on `ALTER TABLE .. DROP PARTITION` in the V2 table catalogs used in tests.

### Why are the changes needed?
This is a bug fix. Before the fix, `ALTER TABLE .. DROP PARTITION` does not remove the data belonging to the dropped partition. As a consequence, a `SELECT` query returns the removed data.
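
A minimal sketch of the fix's intent with made-up structures (the real change is in the `InMemory*PartitionTable` test catalogs): dropping a partition must also discard the rows stored for it, not just its entry in the partition map.

```scala
import scala.collection.mutable

// Illustrative only: remove the partition entry AND its buffered rows.
def dropPartition(ident: Seq[Any],
    partitions: mutable.Map[Seq[Any], mutable.Buffer[Seq[Any]]]): Boolean = {
  partitions.remove(ident) match {
    case Some(rows) => rows.clear(); true
    case None => false
  }
}
```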

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running test suites for v1 and v2 catalogs:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #31014 from MaxGekk/fix-drop-partition-v2.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: fc3f226)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionSuiteBase.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryPartitionTable.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryAtomicPartitionTable.scala (diff)
Commit 414d323d6c92584beb87e1c426e4beab5ddbd452 by dhyun
[SPARK-33988][SQL][TEST] Add an option to enable CBO in TPCDSQueryBenchmark

### What changes were proposed in this pull request?

This PR intends to add a new option `--cbo` to enable CBO in TPCDSQueryBenchmark. I think this option is useful so as to monitor performance changes with CBO enabled.

### Why are the changes needed?

To monitor performance changes with CBO enabled.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually checked.

Closes #31011 from maropu/AddOptionForCBOInTPCDSBenchmark.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 414d323)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmarkArguments.scala (diff)
Commit d6322bf70c622f4068e510975e9f53c8e18bf59c by dhyun
[SPARK-33983][PYTHON] Update cloudpickle to v1.6.0

### What changes were proposed in this pull request?

This PR proposes to upgrade cloudpickle from 1.5.0 to 1.6.0.
It virtually contains one fix:

https://github.com/cloudpipe/cloudpickle/commit/4510be850d55bc60decf86953324f98bc3199f9e

From a cursory look, this isn't a regression, and not even properly supported in Python:

```python
>>> import pickle
>>> pickle.dumps({}.keys())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot pickle 'dict_keys' object
```

So it seems fine not to backport.

### Why are the changes needed?

To leverage bug fixes from the cloudpickle upstream.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Jenkins build and GitHub actions build will test it out.

Closes #31007 from HyukjinKwon/cloudpickle-upgrade.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: d6322bf)
The file was modified python/pyspark/cloudpickle/cloudpickle_fast.py (diff)
The file was modified python/pyspark/cloudpickle/cloudpickle.py (diff)
The file was modified python/pyspark/cloudpickle/__init__.py (diff)
Commit ac4651a7d19b248c86290d419ac3f6d69ed2b61e by dhyun
[SPARK-33980][SS] Invalidate char/varchar in spark.readStream.schema

### What changes were proposed in this pull request?

Invalidate char/varchar in `spark.readStream.schema`, just like what we've done for `spark.read.schema` in da72b87374a7be5416b99ed016dc2fc9da0ed88a.

### Why are the changes needed?

Bug fix: char/varchar is only for table schemas while `spark.sql.legacy.charVarcharAsString=false`.

### Does this PR introduce _any_ user-facing change?

Yes, char/varchar will fail to define streaming readers when `spark.sql.legacy.charVarcharAsString=false`.
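
A rough illustration of the new behavior (assuming an existing `SparkSession` named `spark`, as in `spark-shell`, and `spark.sql.legacy.charVarcharAsString=false`):

```scala
// Expected to fail with an AnalysisException after this change, mirroring spark.read.schema:
spark.readStream.schema("v VARCHAR(3)").format("parquet").load("/tmp/in")
```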

### How was this patch tested?

new tests

Closes #31003 from yaooqinn/SPARK-33980.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: ac4651a)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
Commit 90f4ecf8cc07505c7cdea90b07fc60151c62ee2d by dhyun
[SPARK-33996][BUILD] Upgrade checkstyle plugins

### What changes were proposed in this pull request?

This PR aims to upgrade `checkstyle` Maven plugins and its dependency, `com.puppycrawl.tools:checkstyle`.

### Why are the changes needed?

The changes are needed to support Java 14+ better.
- https://checkstyle.org/releasenotes.html#Release_8.39
- https://checkstyle.org/releasenotes.html#Release_8.38
- https://checkstyle.org/releasenotes.html#Release_8.37
- https://checkstyle.org/releasenotes.html#Release_8.36
- https://checkstyle.org/releasenotes.html#Release_8.35
- https://checkstyle.org/releasenotes.html#Release_8.34
- https://checkstyle.org/releasenotes.html#Release_8.33
- https://checkstyle.org/releasenotes.html#Release_8.32
- https://checkstyle.org/releasenotes.html#Release_8.31
- https://checkstyle.org/releasenotes.html#Release_8.30

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CI.

Closes #31019 from williamhyun/checkstyle.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 90f4ecf)
The file was modified pom.xml (diff)
Commit 84c1f436690c76bfd3bd1a664dba303cfc8381da by dhyun
[SPARK-33987][SQL] Refresh cache in v2 `ALTER TABLE .. DROP PARTITION`

### What changes were proposed in this pull request?
1. Refresh the cache associated with tables from v2 table catalogs in the `ALTER TABLE .. DROP PARTITION` command.
2. Port the test for v1 catalogs to the base suite to run it for v2 table catalog.

### Why are the changes needed?
The changes fix incorrect query results from cached V2 table altered by `ALTER TABLE .. DROP PARTITION`, see the added test and SPARK-33987.

### Does this PR introduce _any_ user-facing change?
Yes, it could if users have v2 table catalogs.

### How was this patch tested?
By running unified tests for `ALTER TABLE .. DROP PARTITION`:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #31017 from MaxGekk/drop-partition-refresh-cache-v2.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 84c1f43)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionSuiteBase.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableDropPartitionSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
Commit 9b4173fa95047fed94e2fe323ad281fb48deffda by dhyun
[SPARK-33894][SQL] Change visibility of private case classes in mllib to avoid runtime compilation errors with Scala 2.13

### What changes were proposed in this pull request?
Change the visibility modifier of two case classes defined inside objects in mllib from `private` to `private[OuterClass]`.
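
A minimal sketch of the pattern with an illustrative object name (the real ones live in `Word2Vec.scala` and `KMeansModel.scala`): widening the case class from `private` to `private[OuterObject]` so code generated in the same scope can reach its accessors.

```scala
object OuterObject {
  // before: private case class Data(word: String)
  private[OuterObject] case class Data(word: String)
}
```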

### Why are the changes needed?
Without this change, when running tests for Scala 2.13, you get runtime code generation errors. These errors look like this:
```
[info] Cause: java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 73, Column 65: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 73, Column 65: No applicable constructor/method found for zero actual parameters; candidates are: "public java.lang.String org.apache.spark.ml.feature.Word2VecModel$Data.word()"
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests now pass for Scala 2.13

Closes #31018 from koertkuipers/feat-visibility-scala213.

Authored-by: Koert Kuipers <koert@tresata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 9b4173f)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala (diff)
Commit 559f411da83856a81ac39cf79df8487cc5a06134 by dhyun
[SPARK-33908][CORE][FOLLOWUP] Correct Scaladoc of resolveDependencyPaths/resolveMavenDependencies

### What changes were proposed in this pull request?
Fix the incorrect doc from the last change: https://github.com/apache/spark/pull/30922#discussion_r551453193

### Why are the changes needed?
Fix the doc.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Builds finished correctly.

Closes #31016 from AngersZhuuuu/SPARK-33908-FOLLOW-UP.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 559f411)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/DependencyUtils.scala (diff)
Commit bb6d6b560287ac83e79012fe8dcbe5dcd2e7a904 by gurwls223
[SPARK-33964][SQL] Combine distinct unions in more cases

### What changes were proposed in this pull request?

Added the `RemoveNoopOperators` rule to the `Union` optimization batch. Also made sure that `RemoveNoopOperators` is idempotent.

### Why are the changes needed?

In several TPCDS queries the `CombineUnions` rule does not manage to combine unions, because they have noop `Project`s between them.
The `Project`s will be removed by `RemoveNoopOperators`, but by then `ReplaceDistinctWithAggregate` has been applied and there are aggregates between the unions. Adding a copy of `RemoveNoopOperators` earlier in the optimization chain allows `CombineUnions` to work on more queries.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New UTs and the output of `PlanStabilitySuite`

Closes #30996 from tanelk/SPARK-33964_combine_unions.

Authored-by: tanel.kiis@gmail.com <tanel.kiis@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: bb6d6b5)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q75.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q36a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q80a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q5a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q80a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q77a/explain.txt (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q77a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q77a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q75/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q80a/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q75/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q36a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q75/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q5a/simplified.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q86a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q75.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q86a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q70a/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q80a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q36a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q70a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q77a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q75/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q86a.sf100/simplified.txt (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveNoopOperatorsSuite.scala
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q70a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q75.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q36a/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q70a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q86a/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q5a/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q75.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q5a.sf100/explain.txt (diff)
Commit 976e97a80de66d520167f58bdc9082e4bbbc9639 by wenchen
[SPARK-33794][SQL] NextDay expression throw runtime IllegalArgumentException when receiving invalid input under ANSI mode

### What changes were proposed in this pull request?

Instead of returning NULL, the next_day function throws a runtime IllegalArgumentException when ANSI mode is enabled and it receives invalid input for the dayOfWeek parameter.

### Why are the changes needed?

For ANSI mode compliance.

### Does this PR introduce _any_ user-facing change?

Yes.
When spark.sql.ansi.enabled = true, the next_day function will throw an IllegalArgumentException when receiving invalid input for the dayOfWeek parameter.
When spark.sql.ansi.enabled = false, the behaviour is the same as before.
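
A minimal sketch of the behavior described above (the exact error message may differ):

```scala
// With ANSI mode off, an invalid day-of-week argument yields NULL.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT next_day('2021-01-01', 'invalid_day')").show()     // -> NULL

// With ANSI mode on, the same call fails at runtime with IllegalArgumentException.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT next_day('2021-01-01', 'invalid_day')").collect()  // -> throws
```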

### How was this patch tested?

Ansi mode is tested with existing tests.
End-to-end tests have been added.

Closes #30807 from chongguang/SPARK-33794.

Authored-by: Chongguang LIU <chongguang.liu@laposte.fr>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 976e97a)
The file was modifiedsql/core/src/test/resources/sql-tests/results/datetime.sql.out (diff)
The file was modifieddocs/sql-ref-ansi-compliance.md (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/results/datetime-legacy.sql.out (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/inputs/datetime.sql (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
Commit 6b00fdc756e85ce7affded605d6d8e0a5308c1ed by wenchen
[SPARK-33998][SQL] Provide an API to create an InternalRow in V2CommandExec

### What changes were proposed in this pull request?

There are many v2 commands such as `SHOW TABLES`, `DESCRIBE TABLE`, etc. that require creating `InternalRow`s. Currently, the code to create `InternalRow`s is duplicated across many commands, and it can be moved into `V2CommandExec` to remove the duplication.
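
A minimal sketch of what such a shared helper could look like, assuming simplified logic (this is not the actual method added to `V2CommandExec`):

```scala
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.unsafe.types.UTF8String

// Hypothetical helper: every command builds InternalRows through one place instead of
// repeating the conversion logic. Strings must be stored as UTF8String in an InternalRow.
def toInternalRow(values: Any*): InternalRow =
  new GenericInternalRow(values.map {
    case s: String => UTF8String.fromString(s)
    case other     => other
  }.toArray)

// e.g. a SHOW TABLES-style command could emit: toInternalRow("default", "tbl1", false)
```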

### Why are the changes needed?

To clean up duplicate code.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing test since this is just refactoring.

Closes #31020 from imback82/refactor_v2_command.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 6b00fdc)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablePropertiesExec.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeColumnExec.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowCurrentNamespaceExec.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeTableExec.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2CommandExec.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowNamespacesExec.scala (diff)
Commit 15a863fd54aa76cbb0f2a076bd94773529536add by dhyun
[SPARK-34001][SQL][TESTS] Remove unused runShowTablesSql() in DataSourceV2SQLSuite.scala

### What changes were proposed in this pull request?

After #30287, `runShowTablesSql()` in `DataSourceV2SQLSuite.scala` is no longer used. This PR removes the unused method.

### Why are the changes needed?

To remove unused method.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing test.

Closes #31022 from imback82/33382-followup.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 15a863f)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
Commit f0ffe0cd652188873f2ec007e4e282744717a0b3 by wenchen
[SPARK-33992][SQL] override transformUpWithNewOutput to add allowInvokingTransformsInAnalyzer

### What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/29643, we moved the plan rewriting methods to `QueryPlan`. We need to override `transformUpWithNewOutput` to add `allowInvokingTransformsInAnalyzer`, because it and `resolveOperatorsUpWithNewOutput` are called in the analyzer.

For example, `PaddingAndLengthCheckForCharVarchar` could fail a query when calling `resolveOperatorsUpWithNewOutput`, with:
```
[info] - char/varchar resolution in sub query  *** FAILED *** (367 milliseconds)
[info]   java.lang.RuntimeException: This method should not be called in the analyzer
[info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule(AnalysisHelper.scala:150)
[info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.assertNotAnalysisRule$(AnalysisHelper.scala:146)
[info]   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.assertNotAnalysisRule(LogicalPlan.scala:29)
[info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown(AnalysisHelper.scala:161)
[info]   at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDown$(AnalysisHelper.scala:160)
[info]   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
[info]   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
[info]   at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$updateOuterReferencesInSubquery(QueryPlan.scala:267)
```
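
A simplified sketch of the override pattern described above, using toy types rather than the real `QueryPlan`/`AnalysisHelper` signatures:

```scala
// Toy stand-ins: the plan-rewriting method now lives in the parent class, so the
// logical-plan trait re-overrides it and wraps the call in the analyzer guard.
class QueryPlanLike {
  def transformUpWithNewOutput(rule: String => String): String = rule("plan")
}

trait AnalysisHelperLike extends QueryPlanLike {
  // stands in for AnalysisHelper.allowInvokingTransformsInAnalyzer { ... }
  def allowInvokingTransformsInAnalyzer[T](body: => T): T = body

  override def transformUpWithNewOutput(rule: String => String): String =
    allowInvokingTransformsInAnalyzer {
      super.transformUpWithNewOutput(rule)
    }
}
```
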
### Why are the changes needed?

Trivial bug fix.
### Does this PR introduce _any_ user-facing change?

no
### How was this patch tested?

new tests

Closes #31013 from yaooqinn/SPARK-33992.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f0ffe0c)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala (diff)
Commit a7d3fcd354289c1d0f5c80887b4f33beb3ad96a2 by dhyun
[SPARK-34000][CORE] Fix stageAttemptToNumSpeculativeTasks java.util.NoSuchElementException

### What changes were proposed in this pull request?
From the log below, stage 600 could be removed from `stageAttemptToNumSpeculativeTasks` by `onStageCompleted()`, but the speculative task 306.1 in stage 600 then threw `NoSuchElementException` when it entered `onTaskEnd()`.
```
21/01/04 03:00:32,259 WARN [task-result-getter-2] scheduler.TaskSetManager:69 : Lost task 306.1 in stage 600.0 (TID 283610, hdc49-mcc10-01-0510-4108-039-tess0097.stratus.rno.ebay.com, executor 27): TaskKilled (another attempt succeeded)
21/01/04 03:00:32,259 INFO [task-result-getter-2] scheduler.TaskSetManager:57 : Task 306.1 in stage 600.0 (TID 283610) failed, but the task will not be re-executed (either because the task failed with a shuffle data fetch failure, so the
previous stage needs to be re-run, or because a different copy of the task has already succeeded).
21/01/04 03:00:32,259 INFO [task-result-getter-2] cluster.YarnClusterScheduler:57 : Removed TaskSet 600.0, whose tasks have all completed, from pool default
21/01/04 03:00:32,259 INFO [HiveServer2-Handler-Pool: Thread-5853] thriftserver.SparkExecuteStatementOperation:190 : Returning result set with 50 rows from offsets [5378600, 5378650) with 1fe245f8-a7f9-4ec0-bcb5-8cf324cbbb47
21/01/04 03:00:32,260 ERROR [spark-listener-group-executorManagement] scheduler.AsyncEventQueue:94 : Listener ExecutorAllocationListener threw an exception
java.util.NoSuchElementException: key not found: Stage 600 (Attempt 0)
        at scala.collection.MapLike.default(MapLike.scala:235)
        at scala.collection.MapLike.default$(MapLike.scala:234)
        at scala.collection.AbstractMap.default(Map.scala:63)
        at scala.collection.mutable.HashMap.apply(HashMap.scala:69)
        at org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:621)
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:45)
        at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:38)
        at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:115)
        at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:99)
        at org.apache.spark.scheduler.AsyncEventQueue.super$postToAll(AsyncEventQueue.scala:116)
        at org.apache.spark.scheduler.AsyncEventQueue.$anonfun$dispatch$1(AsyncEventQueue.scala:116)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:102)
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.$anonfun$run$1(AsyncEventQueue.scala:97)
        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1320)
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$2.run(AsyncEventQueue.scala:97)
```
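
A hedged sketch of the defensive pattern such a listener fix typically takes (hypothetical names, not the exact patch): look the stage up with `get` instead of `apply`, and skip the update when the stage has already been cleaned up.

```scala
import scala.collection.mutable

// Hypothetical counter keyed by (stageId, stageAttemptId).
val stageAttemptToNumSpeculativeTasks = mutable.HashMap.empty[(Int, Int), Int]

def onSpeculativeTaskEnd(stageId: Int, attempt: Int): Unit = {
  // apply() throws NoSuchElementException if onStageCompleted() already removed the key;
  // get() lets late events for completed stages be ignored instead.
  stageAttemptToNumSpeculativeTasks.get((stageId, attempt)) match {
    case Some(n) => stageAttemptToNumSpeculativeTasks((stageId, attempt)) = n - 1
    case None    => // stage already completed and cleaned up; drop the late event
  }
}
```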

### Why are the changes needed?
To avoid throwing the java.util.NoSuchElementException

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
This is a protective patch, and it's not easy to reproduce in a UT because the event order is not fixed in an async queue.

Closes #31025 from LantaoJin/SPARK-34000.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: a7d3fcd)
The file was modifiedcore/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala (diff)
Commit a071826f72cd717a58bf37b877f805490f7a147f by yamamuro
[SPARK-33100][SQL] Ignore a semicolon inside a bracketed comment in spark-sql

### What changes were proposed in this pull request?
Currently, spark-sql does not support parsing SQL statements that contain bracketed comments.
For the sql statements:
```
/* SELECT 'test'; */
SELECT 'test';
```
They would be split into two statements:
The first one: `/* SELECT 'test'`
The second one: `*/ SELECT 'test'`

Then an exception would be thrown because the first one is illegal.
In this PR, we ignore the content inside bracketed comments while splitting the SQL statements.
Besides, we ignore comments without any content.
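
A simplified sketch of the splitting logic, under the assumption of a single-pass scan (this toy version ignores quoted strings, line comments, and nested brackets, which the real `SparkSQLCLIDriver` change has to handle):

```scala
// Splits a SQL script on ';' while ignoring semicolons inside /* ... */ comments.
def splitStatements(script: String): Seq[String] = {
  val statements = scala.collection.mutable.ArrayBuffer.empty[String]
  val current = new StringBuilder
  var i = 0
  var insideBracketedComment = false
  while (i < script.length) {
    if (!insideBracketedComment && script.startsWith("/*", i)) {
      insideBracketedComment = true; current.append("/*"); i += 2
    } else if (insideBracketedComment && script.startsWith("*/", i)) {
      insideBracketedComment = false; current.append("*/"); i += 2
    } else if (!insideBracketedComment && script.charAt(i) == ';') {
      statements += current.toString.trim; current.clear(); i += 1
    } else {
      current.append(script.charAt(i)); i += 1
    }
  }
  if (current.toString.trim.nonEmpty) statements += current.toString.trim
  statements.toSeq
}

// splitStatements("/* SELECT 'test'; */ SELECT 'test';")
//   => Seq("/* SELECT 'test'; */ SELECT 'test'")
```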

### Why are the changes needed?
spark-sql might split statements inside bracketed comments, which is not correct.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Added UT.

Closes #29982 from turboFei/SPARK-33110.

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: turbofei <fwang12@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: a071826)
The file was modifiedsql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala (diff)
The file was modifiedsql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala (diff)
Commit f252a9334e49dc359dd9255fcfe17a6bc75b8781 by yamamuro
[SPARK-33935][SQL] Fix CBO cost function

### What changes were proposed in this pull request?

Changed the cost function in CBO to match documentation.

### Why are the changes needed?

The parameter `spark.sql.cbo.joinReorder.card.weight` is documented as:
```
The weight of cardinality (number of rows) for plan cost comparison in join reorder: rows * weight + size * (1 - weight).
```
The implementation in `JoinReorderDP.betterThan` does not match this documentation:
```
def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
      if (other.planCost.card == 0 || other.planCost.size == 0) {
        false
      } else {
        val relativeRows = BigDecimal(this.planCost.card) / BigDecimal(other.planCost.card)
        val relativeSize = BigDecimal(this.planCost.size) / BigDecimal(other.planCost.size)
        relativeRows * conf.joinReorderCardWeight +
          relativeSize * (1 - conf.joinReorderCardWeight) < 1
      }
    }
```

This different implementation has an unfortunate consequence:
given two plans A and B, both A betterThan B and B betterThan A can return false. This happens when one plan has many rows with small sizes and the other has few rows with large sizes.

Example values that exhibit this phenomenon with the default weight value (0.7):
A.card = 500, B.card = 300
A.size = 30, B.size = 80
Both A betterThan B and B betterThan A would have a score above 1 and would return false.

This happens with several of the TPCDS queries.

The new implementation does not have this behavior.
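
Reproducing the numbers above in a few lines (weight = 0.7, following the documented formula quoted earlier):

```scala
val (cardA, sizeA) = (BigDecimal(500), BigDecimal(30))
val (cardB, sizeB) = (BigDecimal(300), BigDecimal(80))
val w = BigDecimal("0.7")

// Old relative-cost comparison: neither plan is "better", so the ordering is unstable.
def oldBetter(c1: BigDecimal, s1: BigDecimal, c2: BigDecimal, s2: BigDecimal): Boolean =
  (c1 / c2) * w + (s1 / s2) * (BigDecimal(1) - w) < 1
println(oldBetter(cardA, sizeA, cardB, sizeB))  // false (score ~1.28)
println(oldBetter(cardB, sizeB, cardA, sizeA))  // false (score ~1.22)

// Documented formula: rows * weight + size * (1 - weight) gives each plan an absolute
// cost, so exactly one of the two comparisons can hold.
def cost(card: BigDecimal, size: BigDecimal): BigDecimal = card * w + size * (BigDecimal(1) - w)
println(cost(cardA, sizeA))  // 359.0
println(cost(cardB, sizeB))  // 234.0 -> B is consistently the better plan
```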

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New and existing UTs

Closes #30965 from tanelk/SPARK-33935_cbo_cost_function.

Authored-by: tanel.kiis@gmail.com <tanel.kiis@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: f252a93)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q17.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q18.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q52.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q33.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q18a.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q52.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q13.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24a.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q91.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24b.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q17.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q91.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q18a.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q13.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q33.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q81.sf100/simplified.txt (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24b.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q72.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q72.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q55.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q72.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q81.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q72.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q55.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24a.sf100/explain.txt (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/joinReorder/JoinReorderSuite.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/joinReorder/StarJoinCostBasedReorderSuite.scala (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q19.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q18.sf100/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q25.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q25.sf100/explain.txt (diff)
Commit 122f8f0fdb0fdc87a5970f4b39938a0496bd4b4b by wenchen
[SPARK-33919][SQL][TESTS] Unify v1 and v2 SHOW NAMESPACES tests

### What changes were proposed in this pull request?
1. Port DS V2 tests from `DataSourceV2SQLSuite` to the base test suite `ShowNamespacesSuiteBase` to run those tests for v1 catalogs.
2. Port DS v1 tests from `DDLSuite` to `ShowNamespacesSuiteBase` to run the tests for v2 catalogs too.

### Why are the changes needed?
To improve test coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running new test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowNamespacesSuite"
```

Closes #30937 from MaxGekk/unify-show-namespaces-tests.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 122f8f0)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was addedsql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowNamespacesSuite.scala
The file was addedsql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowNamespacesSuite.scala
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was addedsql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowNamespacesSuiteBase.scala
The file was addedsql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowNamespacesParserSuite.scala
The file was addedsql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowNamespacesSuite.scala
Commit 356fdc9a7fc88fd07751c40b920043eaebeb0abf by gurwls223
[SPARK-34007][BUILD] Downgrade scala-maven-plugin to 4.3.0

### What changes were proposed in this pull request?

This PR is a partial revert of https://github.com/apache/spark/pull/30456 by downgrading scala-maven-plugin from 4.4.0 to 4.3.0.

Currently, when you run the docker release script (`./dev/create-release/do-release-docker.sh`), it fails to compile as below during incremental compilation with zinc for an unknown reason:

```
[INFO] Compiling 21 Scala sources and 3 Java sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes ...
[ERROR] ## Exception when compiling 24 sources to /opt/spark-rm/output/spark-3.1.0-bin-hadoop2.7/resource-managers/yarn/target/scala-2.12/test-classes
java.lang.SecurityException: class "javax.servlet.SessionCookieConfig"'s signer information does not match signer information of other classes in the same package
java.lang.ClassLoader.checkCerts(ClassLoader.java:891)
java.lang.ClassLoader.preDefineClass(ClassLoader.java:661)
java.lang.ClassLoader.defineClass(ClassLoader.java:754)
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
java.net.URLClassLoader.access$100(URLClassLoader.java:74)
java.net.URLClassLoader$1.run(URLClassLoader.java:369)
java.net.URLClassLoader$1.run(URLClassLoader.java:363)
java.security.AccessController.doPrivileged(Native Method)
java.net.URLClassLoader.findClass(URLClassLoader.java:362)
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
java.lang.Class.getDeclaredMethods0(Native Method)
java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
java.lang.Class.privateGetPublicMethods(Class.java:2902)
java.lang.Class.getMethods(Class.java:1615)
sbt.internal.inc.ClassToAPI$.toDefinitions0(ClassToAPI.scala:170)
sbt.internal.inc.ClassToAPI$.$anonfun$toDefinitions$1(ClassToAPI.scala:123)
scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
sbt.internal.inc.ClassToAPI$.toDefinitions(ClassToAPI.scala:123)
sbt.internal.inc.ClassToAPI$.$anonfun$process$1(ClassToAPI.scala:3
```

This happens when building Spark with Hadoop 2. It doesn't reproduce when you build this module alone; it seems to depend on the build sequence in the release script.

This is fixed by downgrading. It looks like there is a regression in scala-maven-plugin somewhere between 4.3.0 and 4.4.0.

### Why are the changes needed?

To unblock the release.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It can be tested as below:

```bash
./dev/create-release/do-release-docker.sh -d $WORKING_DIR
```

Closes #31031 from HyukjinKwon/SPARK-34007.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 356fdc9)
The file was modifiedpom.xml (diff)
Commit 329850c667305053e4433c4c6da0e47b231302d4 by gurwls223
[SPARK-32017][PYTHON][FOLLOW-UP] Rename HADOOP_VERSION to PYSPARK_HADOOP_VERSION in pip installation option

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/29703.
It renames the `HADOOP_VERSION` environment variable to `PYSPARK_HADOOP_VERSION` in case `HADOOP_VERSION` is already being used elsewhere. Arguably `HADOOP_VERSION` is a pretty common name; I see it here and there:
- https://www.ibm.com/support/knowledgecenter/SSZUMP_7.2.1/install_grid_sym/understanding_advanced_edition.html
- https://cwiki.apache.org/confluence/display/ARROW/HDFS+Filesystem+Support
- http://crs4.github.io/pydoop/_pydoop1/installation.html

### Why are the changes needed?

To avoid an unexpected conflict with an existing environment variable.

### Does this PR introduce _any_ user-facing change?

It renames the environment variable but it's not released yet.

### How was this patch tested?

Existing unittests will test.

Closes #31028 from HyukjinKwon/SPARK-32017-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 329850c)
The file was modifiedpython/setup.py (diff)
The file was modifiedpython/docs/source/getting_started/install.rst (diff)
The file was modifiedpython/pyspark/find_spark_home.py (diff)
Commit acf0a4fac2983a89c663d1622bf03a2e5929d121 by gurwls223
[SPARK-33999][BUILD] Make sbt unidoc success with JDK11

### What changes were proposed in this pull request?

This PR fixes an issue where `sbt unidoc` fails with JDK 11.
With the current master, `sbt unidoc` fails because the generated Java sources cause syntax errors.
As of JDK 11, the default doclet seems to reject such syntax errors.

Usually it's enough to specify the `--ignore-source-errors` option when `javadoc` runs to suppress the syntax errors, but unfortunately we will then get an internal error.

```
[error] javadoc: error - An internal exception has occurred.
[error] (java.lang.NullPointerException)
[error] Please file a bug against the javadoc tool via the Java bug reporting page
[error] (http://bugreport.java.com) after checking the Bug Database (http://bugs.java.com)
[error] for duplicates. Include error messages and the following diagnostic in your report. Thank you.
[error] java.lang.NullPointerException
[error] at jdk.compiler/com.sun.tools.javac.code.Types.erasure(Types.java:2340)
[error] at jdk.compiler/com.sun.tools.javac.code.Types$14.visitTypeVar(Types.java:2398)
[error] at jdk.compiler/com.sun.tools.javac.code.Types$14.visitTypeVar(Types.java:2348)
[error] at jdk.compiler/com.sun.tools.javac.code.Type$TypeVar.accept(Type.java:1659)
[error] at jdk.compiler/com.sun.tools.javac.code.Types$DefaultTypeVisitor.visit(Types.java:4857)
[error] at jdk.compiler/com.sun.tools.javac.code.Types.erasure(Types.java:2343)
[error] at jdk.compiler/com.sun.tools.javac.code.Types.erasure(Types.java:2329)
[error] at jdk.compiler/com.sun.tools.javac.model.JavacTypes.erasure(JavacTypes.java:134)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.toolkit.util.Utils$5.visitTypeVariable(Utils.java:1069)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.toolkit.util.Utils$5.visitTypeVariable(Utils.java:1048)
[error] at jdk.compiler/com.sun.tools.javac.code.Type$TypeVar.accept(Type.java:1695)
[error] at java.compiler11.0.9.1/javax.lang.model.util.AbstractTypeVisitor6.visit(AbstractTypeVisitor6.java:104)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.toolkit.util.Utils.asTypeElement(Utils.java:1086)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.LinkInfoImpl.setContext(LinkInfoImpl.java:410)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.LinkInfoImpl.<init>(LinkInfoImpl.java:285)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.LinkFactoryImpl.getTypeParameterLink(LinkFactoryImpl.java:184)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.LinkFactoryImpl.getTypeParameterLinks(LinkFactoryImpl.java:167)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.toolkit.util.links.LinkFactory.getLink(LinkFactory.java:196)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.HtmlDocletWriter.getLink(HtmlDocletWriter.java:679)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.HtmlDocletWriter.addPreQualifiedClassLink(HtmlDocletWriter.java:814)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.HtmlDocletWriter.addPreQualifiedStrongClassLink(HtmlDocletWriter.java:839)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.AbstractTreeWriter.addPartialInfo(AbstractTreeWriter.java:185)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.AbstractTreeWriter.addLevelInfo(AbstractTreeWriter.java:92)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.AbstractTreeWriter.addLevelInfo(AbstractTreeWriter.java:94)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.AbstractTreeWriter.addTree(AbstractTreeWriter.java:129)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.AbstractTreeWriter.addTree(AbstractTreeWriter.java:112)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.PackageTreeWriter.generatePackageTreeFile(PackageTreeWriter.java:115)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.PackageTreeWriter.generate(PackageTreeWriter.java:92)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.formats.html.HtmlDoclet.generatePackageFiles(HtmlDoclet.java:312)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.toolkit.AbstractDoclet.startGeneration(AbstractDoclet.java:210)
[error] at jdk.javadoc/jdk.javadoc.internal.doclets.toolkit.AbstractDoclet.run(AbstractDoclet.java:114)
[error] at jdk.javadoc/jdk.javadoc.doclet.StandardDoclet.run(StandardDoclet.java:72)
[error] at jdk.javadoc/jdk.javadoc.internal.tool.Start.parseAndExecute(Start.java:588)
[error] at jdk.javadoc/jdk.javadoc.internal.tool.Start.begin(Start.java:432)
[error] at jdk.javadoc/jdk.javadoc.internal.tool.Start.begin(Start.java:345)
[error] at jdk.javadoc/jdk.javadoc.internal.tool.Main.execute(Main.java:63)
[error] at jdk.javadoc/jdk.javadoc.internal.tool.Main.main(Main.java:52)
```

I found the internal error happens when a generated Java class comes from a Scala class that is package-private and generic.
I also found that if we don't generate the class hierarchy tree in the Javadoc, we can suppress the internal error for JDK 11 and later.

### Why are the changes needed?

Make the build succeed with sbt and JDK 11.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed the following command successfully finish with JDK8 and JDK11.
```
$ build/sbt -Phive -Phive-thriftserver -Pyarn -Pkubernetes -Pmesos -Pspark-ganglia-lgpl -Pkinesis-asl -Phadoop-cloud clean unidoc
```

I also confirmed html files are successfully generated under `target/javaunidoc`.

Closes #31023 from sarutak/fix-genjavadoc-java11.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: acf0a4f)
The file was modifiedproject/SparkBuild.scala (diff)
Commit 8d09f9649510bf5d812c82b04f7711b9252a7db0 by gurwls223
[SPARK-34010][SQL][DODCS] Use python3 instead of python in SQL documentation build

### What changes were proposed in this pull request?

This PR proposes to use python3 instead of python in the SQL documentation build.
After SPARK-29672, we use `python3` everywhere else in Spark dev, so `sql/create-docs.sh` should be fixed too.
This blocks the release because the release container does not have `python`, only `python3`.

### Why are the changes needed?

To unblock the release.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

I manually ran the script

Closes #31041 from HyukjinKwon/SPARK-34010.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 8d09f96)
The file was modifiedsql/create-docs.sh (diff)
Commit 14c2edae7e8e02e18a24862a6c113b02719d4785 by gurwls223
[SPARK-34009][BUILD] To activate profile 'aarch64' based on OS settings

Activate the 'aarch64' profile based on OS settings automatically, instead of requiring the '-Paarch64' parameter in the Maven build, so that the same command can be used to build on aarch64.

### What changes were proposed in this pull request?
Activate profile 'aarch64' based on OS

### Why are the changes needed?
After this change, we can build Spark on aarch64 using the same command as on x86.

### Does this PR introduce _any_ user-facing change?
No.
After this change, there is no need to pass '-Paarch64' when building, though passing the parameter still works.

### How was this patch tested?
ARM daily CI.

Closes #31036 from huangtianhua/SPARK-34009.

Authored-by: huangtianhua <huangtianhua223@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 14c2eda)
The file was modifiedpom.xml (diff)
Commit cc1d9d25fb4c2e4af912d6f9802de8f351c32deb by wenchen
[SPARK-33542][SQL] Group exception messages in catalyst/catalog

### What changes were proposed in this pull request?
This PR groups exception messages in `/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog`.

### Why are the changes needed?
It will largely help with standardization of error messages and its maintenance.

### Does this PR introduce _any_ user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any existing behavior.

Closes #30870 from beliefer/SPARK-33542.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: cc1d9d2)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/QueryCompilationErrors.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/functionResources.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/QueryExecutionErrors.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/GlobalTempViewManager.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala (diff)
Commit 171db85aa2cdacf39caeb26162569275076fd52f by dhyun
[SPARK-33874][K8S][FOLLOWUP] Handle long lived sidecars - clean up logging

### What changes were proposed in this pull request?

Switch log level from warn to debug when the spark container is not present in the pod's container statuses.

### Why are the changes needed?

There are many non-critical situations where the Spark container may not be present, and the warning log level is too high.

### Does this PR introduce _any_ user-facing change?

Log message change.

### How was this patch tested?

N/A

Closes #31047 from holdenk/SPARK-33874-follow-up.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 171db85)
The file was modifiedresource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala (diff)
Commit e279ed304475a6d5a9fbf739fe9ed32ef58171cb by yamamuro
[SPARK-34012][SQL] Keep behavior consistent when conf `spark.sql.legacy.parser.havingWithoutGroupByAsWhere` is true with migration guide

### What changes were proposed in this pull request?
In https://github.com/apache/spark/pull/22696 we made HAVING without GROUP BY mean a global aggregate.
But since HAVING used to be treated as a Filter, that caused a lot of analysis errors; after https://github.com/apache/spark/pull/28294 we use `UnresolvedHaving` instead of `Filter` to solve that problem. However, it broke the original legacy behavior of treating `SELECT 1 FROM range(10) HAVING true` as `SELECT 1 FROM range(10) WHERE true`.
This PR fixes the issue and adds a UT.
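
A hedged illustration of the two behaviors, assuming the legacy config named in the title controls the switch:

```scala
// Legacy behavior: HAVING without GROUP BY is treated as WHERE -> ten rows of 1.
spark.conf.set("spark.sql.legacy.parser.havingWithoutGroupByAsWhere", "true")
spark.sql("SELECT 1 FROM range(10) HAVING true").show()

// Default behavior: HAVING without GROUP BY means a global aggregate -> a single row.
spark.conf.set("spark.sql.legacy.parser.havingWithoutGroupByAsWhere", "false")
spark.sql("SELECT 1 FROM range(10) HAVING true").show()
```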

### Why are the changes needed?
Keep consistent behavior of migration guide.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
added UT

Closes #31039 from AngersZhuuuu/SPARK-25780-Follow-up.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: e279ed3)
The file was modifiedsql/core/src/test/resources/sql-tests/inputs/group-by.sql (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/results/group-by.sql.out (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
Commit b77d11dfd942ee2164dde2f5c25c6aaed65c444c by gurwls223
[SPARK-34011][SQL] Refresh cache in `ALTER TABLE .. RENAME TO PARTITION`

### What changes were proposed in this pull request?
1. Invoke `refreshTable()` from `AlterTableRenamePartitionCommand.run()` after partitions renaming. In particular, this re-creates the cache associated with the modified table.
2. Refresh the cache associated with tables from v2 table catalogs in the `ALTER TABLE .. RENAME TO PARTITION` command.

### Why are the changes needed?
This fixes the issues portrayed by the example:
```sql
spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED BY (part0);
spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
spark-sql> CACHE TABLE tbl1;
spark-sql> SELECT * FROM tbl1;
0 0
1 1
spark-sql> ALTER TABLE tbl1 PARTITION (part0=0) RENAME TO PARTITION (part0=2);
spark-sql> SELECT * FROM tbl1;
0 0
1 1
```
The last query must return `0 2` instead of the stale `0 0`, since the partition was renamed by the previous command.

### Does this PR introduce _any_ user-facing change?
Yes. After the changes for the example above:
```sql
...
spark-sql> ALTER TABLE tbl1 PARTITION (part0=0) RENAME TO PARTITION (part0=2);
spark-sql> SELECT * FROM tbl1;
0 2
1 1
```

### How was this patch tested?
By running the affected test suite:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"
```

Closes #31044 from MaxGekk/rename-partition-refresh-cache.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: b77d11d)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableRenamePartitionSuiteBase.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableRenamePartitionExec.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
Commit 3d8ee492d6cd0c086988f2970bc6ea1d70a98368 by gurwls223
[SPARK-34015][R] Fixing input timing in gapply

### What changes were proposed in this pull request?

When sparkR is run at log level INFO, a summary of how the worker spent its time processing the partition is printed. There is a logic error where it is over-reporting the time inputting rows.

In detail: the variable inputElap in a wider context is used to mark the end of reading rows, but in the part changed here it was used as a local variable for measuring the beginning of compute time in a loop over the groups in the partition. Thus, the error is not observable if there is only one group per partition, which is what you get in unit tests.

For our application, here's what a log entry looks like before these changes were applied:

`20/10/09 04:08:58 INFO RRunner: Times: boot = 0.013 s, init = 0.005 s, broadcast = 0.000 s, read-input = 529.471 s, compute = 492.037 s, write-output = 0.020 s, total = 1021.546 s`

this indicates that we're spending more time reading rows than operating on the rows.

After these changes, it looks like this:

`20/12/15 06:43:29 INFO RRunner: Times: boot = 0.013 s, init = 0.010 s, broadcast = 0.000 s, read-input = 120.275 s, compute = 1680.161 s, write-output = 0.045 s, total = 1812.553 s`
### Why are the changes needed?

Metrics shouldn't mislead?

### Does this PR introduce _any_ user-facing change?

Aside from no longer misleading, no

### How was this patch tested?

Unit tests passed. Field test results seem plausible.

Closes #31021 from WamBamBoozle/input_timing.

Authored-by: Tom.Howland <Tom.Howland@target.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 3d8ee49)
The file was modifiedR/pkg/inst/worker/worker.R (diff)
Commit 29510821a0e3b1e09a7710ed02a0fa1caab506af by dhyun
[SPARK-33029][CORE][WEBUI] Fix the UI executor page incorrectly marking the driver as excluded

### What changes were proposed in this pull request?
Filter out the driver entity when updating the exclusion status of live executors (including the driver), so the driver won't be marked as excluded in the UI even if the node that hosts the driver has been marked as excluded.

### Why are the changes needed?
Before this change, if we run Spark in standalone mode with spark.blacklist.enabled=true, the driver will be marked as excluded when the host that runs the driver has been marked as excluded. This is incorrect because the exclude-list feature excludes executors only, and the driver is still active.
![image](https://user-images.githubusercontent.com/26694233/103238740-35c05180-4911-11eb-99a2-c87c059ba0cf.png)
After the fix, the driver won't be marked as excluded.
![image](https://user-images.githubusercontent.com/26694233/103238806-6f915800-4911-11eb-80d5-3c99266cfd0a.png)

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test. Reopened the UI and confirmed the driver is no longer marked as excluded.

Closes #30954 from baohe-zhang/SPARK-33029.

Authored-by: Baohe Zhang <baohe.zhang@verizonmedia.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 2951082)
The file was modifiedcore/src/test/resources/HistoryServerExpectations/executor_memory_usage_expectation.json (diff)
The file was modifiedcore/src/test/resources/HistoryServerExpectations/executor_node_excludeOnFailure_expectation.json (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/status/AppStatusListener.scala (diff)
Commit 2ab77d634f2e87b080786f4f39cb17e0994bc550 by dhyun
[SPARK-34004][SQL] Change FrameLessOffsetWindowFunction as sealed abstract class

### What changes were proposed in this pull request?
Change `FrameLessOffsetWindowFunction` to a sealed abstract class so that pattern matching is simplified.
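
A generic sketch of why sealing helps (toy hierarchy, not the actual Spark classes): the compiler can check matches on the sealed parent for exhaustiveness, so call sites no longer need a catch-all case.

```scala
sealed abstract class OffsetWindowFn { def offset: Int }
case class Lead(offset: Int) extends OffsetWindowFn
case class Lag(offset: Int) extends OffsetWindowFn

def describe(fn: OffsetWindowFn): String = fn match {
  case Lead(n) => s"lead by $n"
  case Lag(n)  => s"lag by $n"   // no default needed: the hierarchy is sealed
}
```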

### Why are the changes needed?
Simplify pattern match

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Jenkins test

Closes #31026 from beliefer/SPARK-30789-followup.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 2ab77d6)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala (diff)
Commit b1c4fc7fc71530d2d257500484f959282f5b6d44 by dhyun
[SPARK-34008][BUILD] Upgrade derby to 10.14.2.0

### What changes were proposed in this pull request?

This PR upgrades `derby` to `10.14.2.0`.

You can check the major changes from the following URLs.

* 10.13.1.1 http://svn.apache.org/repos/asf/db/derby/code/tags/10.13.1.1/RELEASE-NOTES.html
* 10.14.1.0 http://svn.apache.org/repos/asf/db/derby/code/tags/10.14.1.0/RELEASE-NOTES.html
* 10.14.2.0 http://svn.apache.org/repos/asf/db/derby/code/tags/10.14.2.0/RELEASE-NOTES.html

### Why are the changes needed?

It seems to be the final release that supports `JDK8` as the minimum required version.
From `10.15.1.3` onward, the minimum required version is `JDK9`.
https://db.apache.org/derby/

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #31032 from sarutak/upgrade-derby.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: b1c4fc7)
The file was modifiedpom.xml (diff)
The file was modifieddev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modifieddev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
Commit fa9309001a47a2b87f7a735f964537886ed9bd4c by dhyun
[SPARK-33635][SS] Adjust the order of check in KafkaTokenUtil.needTokenUpdate to remedy perf regression

### What changes were proposed in this pull request?

This PR proposes to adjust the order of checks in KafkaTokenUtil.needTokenUpdate, so that short-circuiting applies to the non-delegation-token cases (insecure, or secured without a delegation token) and remedies most of the performance regression.
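
A generic sketch of this kind of reordering, with hypothetical names rather than the actual `KafkaTokenUtil` code: the cheap check that rules out the common non-token case goes first, so `&&` short-circuits before the expensive lookup runs.

```scala
// Stand-in for a costly check, such as resolving the current delegation token.
def expensiveTokenLookupChanged(): Boolean = { Thread.sleep(1); false }

// Before (hypothetical): the expensive lookup runs for every check, even when the
// cluster does not use delegation tokens at all.
def needTokenUpdateSlow(usesDelegationToken: Boolean): Boolean =
  expensiveTokenLookupChanged() && usesDelegationToken

// After (hypothetical): insecure or non-token setups return immediately.
def needTokenUpdateFast(usesDelegationToken: Boolean): Boolean =
  usesDelegationToken && expensiveTokenLookupChanged()
```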

### Why are the changes needed?

There's a serious performance regression between Spark 2.4 and Spark 3.0 on the read path against the Kafka data source.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually ran a reproducer (https://github.com/codegorillauk/spark-kafka-read, modified to just count instead of writing to a Kafka topic) and measured the time.

> the branch applying the change with adding measurement

https://github.com/HeartSaVioR/spark/commits/debug-SPARK-33635-v3.0.1

> the branch only adding measurement

https://github.com/HeartSaVioR/spark/commits/debug-original-ver-SPARK-33635-v3.0.1

> the result (before the fix)

count: 10280000
Took 41.634007047 secs

21/01/06 13:16:07 INFO KafkaDataConsumer: debug ver. 17-original
21/01/06 13:16:07 INFO KafkaDataConsumer: Total time taken to retrieve: 82118 ms

> the result (after the fix)

count: 10280000
Took 7.964058475 secs

21/01/06 13:08:22 INFO KafkaDataConsumer: debug ver. 17
21/01/06 13:08:22 INFO KafkaDataConsumer: Total time taken to retrieve: 987 ms

Closes #31056 from HeartSaVioR/SPARK-33635.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: fa93090)
The file was modifiedexternal/kafka-0-10-token-provider/src/main/scala/org/apache/spark/kafka010/KafkaTokenUtil.scala (diff)
Commit c0d0dbabdb264180d5a88e2656e4a2fe353f21f1 by dhyun
[SPARK-33934][SQL][FOLLOW-UP] Use SubProcessor's exit code as assert condition to fix flaky test

### What changes were proposed in this pull request?
Follow up on the review comment and fix the flaky test: https://github.com/apache/spark/pull/30973#issuecomment-754852130.
This flaky test is similar to https://github.com/apache/spark/pull/30896.

Some tasks fail with a root cause, but the driver may return the error without the root cause. The UT is changed to check the status exit code, since the exit codes of different root causes are not the same.

### Why are the changes needed?
Fix flaky test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existed UT

Closes #31046 from AngersZhuuuu/SPARK-33934-FOLLOW-UP.

Lead-authored-by: angerszhu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: c0d0dba)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala (diff)
The file was modifiedsql/core/src/test/resources/test_script.py (diff)
Commit 45a4ff8e5472ed724b1bba40ce4ee5d314bf72c2 by dhyun
[SPARK-33948][SQL] Fix CodeGen error of MapObjects.doGenCode method in Scala 2.13

### What changes were proposed in this pull request?
The `MapObjects.doGenCode` method generates wrong code when `inputDataType` is `ArrayBuffer`.

For example, in the `encode/decode for Tuple2: (ArrayBuffer[(String, String)],ArrayBuffer((a,b))) (codegen path)` case in `ExpressionEncoderSuite`, the erroneous part of the generated code is as follows:

```
/* 126 */   private scala.collection.mutable.ArrayBuffer MapObjects_0(InternalRow i) {
/* 127 */     boolean isNull_4 = i.isNullAt(1);
/* 128 */     ArrayData value_4 = isNull_4 ?
/* 129 */     null : (i.getArray(1));
/* 130 */     scala.collection.mutable.ArrayBuffer value_3 = null;
/* 131 */
/* 132 */     if (!isNull_4) {
/* 133 */
/* 134 */       int dataLength_0 = value_4.numElements();
/* 135 */
/* 136 */       scala.Tuple2[] convertedArray_0 = null;
/* 137 */       convertedArray_0 = new scala.Tuple2[dataLength_0];
/* 138 */
/* 139 */
/* 140 */       int loopIndex_0 = 0;
/* 141 */
/* 142 */       while (loopIndex_0 < dataLength_0) {
/* 143 */         value_MapObject_lambda_variable_1 = (InternalRow) (value_4.getStruct(loopIndex_0, 2));
/* 144 */         isNull_MapObject_lambda_variable_1 = value_4.isNullAt(loopIndex_0);
/* 145 */
/* 146 */         boolean isNull_5 = false;
/* 147 */         scala.Tuple2 value_5 = null;
/* 148 */         if (!false && isNull_MapObject_lambda_variable_1) {
/* 149 */
/* 150 */           isNull_5 = true;
/* 151 */           value_5 = ((scala.Tuple2)null);
/* 152 */         } else {
/* 153 */           scala.Tuple2 value_13 = NewInstance_0(i);
/* 154 */           isNull_5 = false;
/* 155 */           value_5 = value_13;
/* 156 */         }
/* 157 */         if (isNull_5) {
/* 158 */           convertedArray_0[loopIndex_0] = null;
/* 159 */         } else {
/* 160 */           convertedArray_0[loopIndex_0] = value_5;
/* 161 */         }
/* 162 */
/* 163 */         loopIndex_0 += 1;
/* 164 */       }
/* 165 */
/* 166 */       value_3 = new org.apache.spark.sql.catalyst.util.GenericArrayData(convertedArray_0);
/* 167 */     }
/* 168 */     globalIsNull_0 = isNull_4;
/* 169 */     return value_3;
/* 170 */   }

```

Line 166 in the generated code tries to assign a `GenericArrayData` to `value_3` (an `ArrayBuffer`), because the `ArrayBuffer` type no longer matches the `s.c.i.Seq` branch in `MapObjects.doGenCode` in Scala 2.13.

So this PR changes the branch to use `s.c.Seq` instead of the `Seq` alias, so that the `ArrayBuffer` type enters the same branch as in Scala 2.12.
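
The alias difference in one small check (this is the language-level behavior behind the fix, not the Spark code itself):

```scala
import scala.collection.mutable.ArrayBuffer

// In Scala 2.13 the `Seq` alias points to scala.collection.immutable.Seq, so an
// ArrayBuffer no longer satisfies it; scala.collection.Seq still does in both versions.
val buf = ArrayBuffer(1, 2, 3)
println(buf.isInstanceOf[Seq[_]])                   // false in 2.13 (true in 2.12)
println(buf.isInstanceOf[scala.collection.Seq[_]])  // true in both
```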

After this PR, the generated code when `inputDataType` is `ArrayBuffer` is as follows:

```
/* 126 */   private scala.collection.mutable.ArrayBuffer MapObjects_0(InternalRow i) {
/* 127 */     boolean isNull_4 = i.isNullAt(1);
/* 128 */     ArrayData value_4 = isNull_4 ?
/* 129 */     null : (i.getArray(1));
/* 130 */     scala.collection.mutable.ArrayBuffer value_3 = null;
/* 131 */
/* 132 */     if (!isNull_4) {
/* 133 */
/* 134 */       int dataLength_0 = value_4.numElements();
/* 135 */
/* 136 */       scala.collection.mutable.Builder collectionBuilder_0 = scala.collection.mutable.ArrayBuffer$.MODULE$.newBuilder();
/* 137 */       collectionBuilder_0.sizeHint(dataLength_0);
/* 138 */
/* 139 */
/* 140 */       int loopIndex_0 = 0;
/* 141 */
/* 142 */       while (loopIndex_0 < dataLength_0) {
/* 143 */         value_MapObject_lambda_variable_1 = (InternalRow) (value_4.getStruct(loopIndex_0, 2));
/* 144 */         isNull_MapObject_lambda_variable_1 = value_4.isNullAt(loopIndex_0);
/* 145 */
/* 146 */         boolean isNull_5 = false;
/* 147 */         scala.Tuple2 value_5 = null;
/* 148 */         if (!false && isNull_MapObject_lambda_variable_1) {
/* 149 */
/* 150 */           isNull_5 = true;
/* 151 */           value_5 = ((scala.Tuple2)null);
/* 152 */         } else {
/* 153 */           scala.Tuple2 value_13 = NewInstance_0(i);
/* 154 */           isNull_5 = false;
/* 155 */           value_5 = value_13;
/* 156 */         }
/* 157 */         if (isNull_5) {
/* 158 */           collectionBuilder_0.$plus$eq(null);
/* 159 */         } else {
/* 160 */           collectionBuilder_0.$plus$eq(value_5);
/* 161 */         }
/* 162 */
/* 163 */         loopIndex_0 += 1;
/* 164 */       }
/* 165 */
/* 166 */       value_3 = (scala.collection.mutable.ArrayBuffer) collectionBuilder_0.result();
/* 167 */     }
/* 168 */     globalIsNull_0 = isNull_4;
/* 169 */     return value_3;
/* 170 */   }
```

### Why are the changes needed?
Bug fix in Scala 2.13

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass the Jenkins or GitHub Action
- Manual test `sql/catalyst` and `sql/core` in Scala 2.13 passed

```
mvn clean test -pl sql/catalyst -Pscala-2.13

Run completed in 11 minutes, 23 seconds.
Total number of tests run: 4711
Suites: completed 261, aborted 0
Tests: succeeded 4711, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
```

- Manually cherry-picked this PR to branch 3.1 and tested `sql/catalyst` in Scala 2.13; it passed

```
mvn clean test -pl sql/catalyst -Pscala-2.13

Run completed in 11 minutes, 18 seconds.
Total number of tests run: 4655
Suites: completed 256, aborted 0
Tests: succeeded 4655, failed 0, canceled 0, ignored 5, pending 0
```

Closes #31055 from LuciferYang/SPARK-33948.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 45a4ff8)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala (diff)
Commit 26d8df300a1a57e220b1a0f9814795f68101f28b by wenchen
[SPARK-33938][SQL] Optimize Like Any/All by LikeSimplification

### What changes were proposed in this pull request?
We should optimize Like Any/All by LikeSimplification to improve performance.
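
A hedged example of the query shapes this targets (assumed illustration, not taken from the PR): patterns that use only a simple prefix/suffix/infix wildcard can typically be simplified to cheaper string predicates instead of full pattern matching.

```scala
spark.sql("SELECT * FROM t WHERE name LIKE ANY ('abc%', 'xyz%')")
spark.sql("SELECT * FROM t WHERE name LIKE ALL ('%spark%', '%sql%')")
```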

### Why are the changes needed?
Optimize Like Any/All

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #30975 from beliefer/SPARK-33938.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 26d8df3)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala (diff)
Commit f64dfa8727b785f333a0c10f5f7175ab51f22764 by prashsh1
[SPARK-32221][K8S] Avoid possible errors due to incorrect file size or type supplied in spark conf

### What changes were proposed in this pull request?

Skip files if they are binary or very large to fit the configMap's max size.

### Why are the changes needed?

A ConfigMap cannot hold binary files, and there is also a limit on how much data a ConfigMap can hold.
This limit can be configured by the k8s cluster admin. This PR skips such files (with a warning) instead of failing with weird runtime errors.
If such files are not skipped, then it would result in mount errors or encoding errors (if binary files are submitted).

### Does this PR introduce _any_ user-facing change?

Yes. In simple words, it avoids possible errors due to negligence (for example, placing a large file or a binary file in SPARK_CONF_DIR) and thus improves the user experience.

### How was this patch tested?

Added relevant tests and improved existing tests.

Closes #30472 from ScrapCodes/SPARK-32221/avoid-conf-propagate-errors.

Lead-authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Co-authored-by: Prashant Sharma <prashant@apache.org>
Signed-off-by: Prashant Sharma <prashsh1@in.ibm.com>
(commit: f64dfa8)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/ClientSuite.scala (diff)
The file was added resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtilsSuite.scala
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientUtils.scala (diff)
Commit ff284fb6ac624b2f38ef12f9b840be3077cd27a6 by gurwls223
[SPARK-30681][PYTHON][FOLLOW-UP] Keep the name similar with Scala side in higher order functions

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/27406. It fixes the naming to match the Scala side.

Note that there is already a bit of inconsistency, e.g. `col`, `e`, `expr` and `column`; that part is left unchanged here, but other differences such as `zero` vs `initialValue` or `col1`/`col2` vs `left`/`right` look unnecessary.
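
For reference, a hedged Scala-side sketch of the higher-order function signatures the Python names are being aligned with (the data and session setup are illustrative; the parameter names follow the `initialValue` and `left`/`right` naming mentioned above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HigherOrderFunctionNames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("hof-names").getOrCreate()
    import spark.implicits._

    val df = Seq((Seq(1, 2, 3), Seq(10, 20, 30))).toDF("xs", "ys")

    df.select(
      // Scala side: aggregate(expr, initialValue, merge) — the second argument
      // is named initialValue rather than zero.
      aggregate($"xs", lit(0), (acc, x) => acc + x).as("sum"),
      // Scala side: zip_with(left, right, f) — the two input columns are named
      // left and right rather than col1/col2.
      zip_with($"xs", $"ys", (l, r) => l + r).as("pairwise_sum")
    ).show()

    spark.stop()
  }
}
```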

### Why are the changes needed?

To make the usage consistent with the Scala side.

### Does this PR introduce _any_ user-facing change?

No, this is not released yet.

### How was this patch tested?

GitHub Actions and Jenkins build will test it out.

Closes #31062 from HyukjinKwon/SPARK-30681.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: ff284fb)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified python/pyspark/sql/functions.pyi (diff)
Commit 0d86a02ffbaf53c403a4c68bac0041e84acb0cdd by gurwls223
[SPARK-34022][DOCS] Support latest mkdocs in SQL built-in function docs

### What changes were proposed in this pull request?

This PR adds support for the latest mkdocs and makes the sidebar show properly. It also works with lower mkdocs versions.

Before:

![Screen Shot 2021-01-06 at 5 11 56 PM](https://user-images.githubusercontent.com/6477701/103745131-4e7fe400-5042-11eb-9c09-84f9f95e9fb9.png)

After:

![Screen Shot 2021-01-06 at 5 10 53 PM](https://user-images.githubusercontent.com/6477701/103745139-5049a780-5042-11eb-8ded-30b6f7ef48aa.png)

### Why are the changes needed?

This is a regression in the documentation.

### Does this PR introduce _any_ user-facing change?

Technically no, since this is not released yet. It fixes the sidebar list so that it appears properly.

### How was this patch tested?

Manually built the docs via `./sql/create-docs.sh` and `open ./sql/site/index.html`

Closes #31061 from HyukjinKwon/SPARK-34022.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 0d86a02)
The file was modified sql/gen-sql-api-docs.py (diff)
Commit 6788304240c416d173ebdb3d544f3361c6b9fe8e by yamamuro
[SPARK-33977][SQL][DOCS] Add doc for "'like any' and 'like all' operators"

### What changes were proposed in this pull request?
Add documentation for the 'like any' and 'like all' operators in `sql-ref-syntax-qry-select-like.md`.

### Why are the changes needed?
Make the usage of 'like any' and 'like all' known to more users.

### Does this PR introduce _any_ user-facing change?
Yes.

<img width="687" alt="Screen Shot 2021-01-06 at 21 10 38" src="https://user-images.githubusercontent.com/692303/103767385-dc1ffb80-5063-11eb-9529-89479531425f.png">
<img width="495" alt="Screen Shot 2021-01-06 at 21 11 06" src="https://user-images.githubusercontent.com/692303/103767391-dde9bf00-5063-11eb-82ce-63bdd11593a1.png">
<img width="406" alt="Screen Shot 2021-01-06 at 21 11 20" src="https://user-images.githubusercontent.com/692303/103767396-df1aec00-5063-11eb-8e81-a192e6c72431.png">

### How was this patch tested?
No tests

Closes #31008 from beliefer/SPARK-33977.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 6788304)
The file was modified docs/sql-ref-syntax-qry-select-like.md (diff)
Commit 3cdc4ef5b41ce1254610436a8721ea517124d62e by wenchen
[SPARK-32685][SQL][FOLLOW-UP] Update migration guide about changing the default field.delim to '\t' when user specifies serde

### What changes were proposed in this pull request?
Update migration guide according to https://github.com/apache/spark/pull/30942#issuecomment-755054562

### Why are the changes needed?
Update the migration guide.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #31051 from AngersZhuuuu/SPARK-32685-FOLLOW-UP.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3cdc4ef)
The file was modified docs/sql-migration-guide.md (diff)
Commit a0269bb419a37c31850e02884385b889cd153133 by dhyun
[SPARK-34022][DOCS][FOLLOW-UP] Fix typo in SQL built-in function docs

### What changes were proposed in this pull request?

This PR is a follow-up of #31061. It fixes a typo in a document: `Finctions` -> `Functions`

### Why are the changes needed?

Make the change better documented.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #31069 from kiszk/SPARK-34022-followup.

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: a0269bb)
The file was modified sql/gen-sql-api-docs.py (diff)
Commit 8bb70bf0d646f6d54d17690d23ee935e452e747e by dhyun
[SPARK-34029][SQL][TESTS] Add OrcEncryptionSuite and FakeKeyProvider

### What changes were proposed in this pull request?

This PR aims to add a basis for a columnar encryption test framework by adding `OrcEncryptionSuite` and `FakeKeyProvider`.

Please note that we will improve this further in both Apache Spark and Apache ORC in the Apache Spark 3.2.0 timeframe.
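
A hedged sketch of what such a test might exercise, assuming the ORC 1.6 `orc.encrypt`/`orc.mask` write options and a key name served by a test key provider (the option values, key name, and path here are assumptions, not the suite's actual contents):

```scala
import org.apache.spark.sql.SparkSession

object OrcEncryptionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("orc-encryption").getOrCreate()
    import spark.implicits._

    val df = Seq(("alice", "123-45-6789"), ("bob", "987-65-4321")).toDF("name", "ssn")

    // "orc.encrypt" maps a key name to the columns it protects; "orc.mask"
    // controls what readers without the key see for those columns.
    df.write
      .option("orc.encrypt", "pii:ssn")
      .option("orc.mask", "nullify:ssn")
      .mode("overwrite")
      .orc("/tmp/orc-encrypted")

    spark.read.orc("/tmp/orc-encrypted").show()
    spark.stop()
  }
}
```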

### Why are the changes needed?

Apache ORC 1.6 supports columnar encryption.

### Does this PR introduce _any_ user-facing change?

No. This is for a test case.

### How was this patch tested?

Pass the newly added test suite.

Closes #31065 from dongjoon-hyun/SPARK-34029.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 8bb70bf)
The file was modified project/SparkBuild.scala (diff)
The file was added sql/core/src/test/resources/META-INF/services/org.apache.hadoop.crypto.key.KeyProviderFactory
The file was added sql/core/src/test/java/test/org/apache/spark/sql/execution/datasources/orc/FakeKeyProvider.java
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala
Commit f9daf035f473fea12a2ee67428db8d78f29973d5 by dhyun
[SPARK-33806][SQL][FOLLOWUP] Fold RepartitionExpression num partition should check if partition expression is empty

### What changes were proposed in this pull request?

Add a check for whether the partition expressions are empty.

### Why are the changes needed?

We should keep `spark.range(1).hint("REPARTITION_BY_RANGE")` using the default shuffle partition number instead of folding it to 1.
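
A hedged sketch of the behavior being preserved, assuming the default `spark.sql.shuffle.partitions` value of 200 (the session setup is illustrative):

```scala
import org.apache.spark.sql.SparkSession

object RepartitionHintExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.shuffle.partitions", "200")
      .appName("repartition-hint")
      .getOrCreate()

    // No partition expressions are given, so the exchange should keep the
    // default shuffle partition number (200 here) rather than folding to 1.
    val df = spark.range(1).hint("REPARTITION_BY_RANGE")
    df.explain()

    spark.stop()
  }
}
```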

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Add test.

Closes #31074 from ulysses-you/SPARK-33806-FOLLOWUP.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: f9daf03)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)