Changes

Summary

  1. [MINOR][INFRA] Use the latest image for GitHub Action jobs (commit: f94cb53)
  2. [SPARK-31953][SS] Add Spark Structured Streaming History Server Support (commit: 4f96670)
  3. [SPARK-33610][ML] Imputer transform skip duplicate head() job (commit: 90d4d7d)
  4. [SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable` (commit: 878cc0e)
  5. [SPARK-22798][PYTHON][ML][FOLLOWUP] Add labelsArray to PySpark StringIndexer (commit: 0880989)
  6. [SPARK-33636][PYTHON][ML][FOLLOWUP] Update since tag of labelsArray in StringIndexer (commit: 3b2ff16)
  7. [SPARK-20044][SQL] Add new function DATE_FROM_UNIX_DATE and UNIX_DATE (commit: ff13f57)
  8. [SPARK-26218][SQL][FOLLOW UP] Fix the corner case of codegen when casting float to Integer (commit: 512fb32)
  9. [SPARK-30098][SQL] Add a configuration to use default datasource as provider for CREATE TABLE command (commit: 0706e64)
  10. [SPARK-33629][PYTHON] Make spark.buffer.size configuration visible on driver side (commit: bd71186)
  11. [SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete (commit: aa13e20)
  12. [SPARK-33634][SQL][TESTS] Use Analyzer in PlanResolutionSuite (commit: 63f9d47)
  13. [SPARK-33520][ML][PYSPARK] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/evaluator (commit: 7e759b2)
  14. [SPARK-33650][SQL] Fix the error from ALTER TABLE .. ADD/DROP PARTITION for non-supported partition management table (commit: 8594958)
  15. [SPARK-33649][SQL][DOC] Improve the doc of spark.sql.ansi.enabled (commit: 29e415d)
  16. [SPARK-32405][SQL][FOLLOWUP] Remove USING _ in CREATE TABLE in JDBCTableCatalog docker tests (commit: e22ddb6)
  17. [SPARK-33142][SPARK-33647][SQL] Store SQL text for SQL temp view (commit: e02324f)
  18. [SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog (commit: 15579ba)
  19. [SPARK-33658][SQL] Suggest using Datetime conversion functions for invalid ANSI casting (commit: e838066)
  20. [SPARK-33571][SQL][DOCS] Add a ref to INT96 config from the doc for `spark.sql.legacy.parquet.datetimeRebaseModeInWrite/Read` (commit: 94c144b)
  21. [SPARK-33577][SS] Add support for V1Table in stream writer table API and create table if not exist by default (commit: 325abf7)
  22. [SPARK-33656][TESTS] Add option to keep container after tests finish for DockerJDBCIntegrationSuites for debug (commit: 91baab7)
  23. [SPARK-33640][TESTS] Extend connection timeout to DB server for DB2IntegrationSuite and its variants (commit: 976e897)
  24. [SPARK-27237][SS] Introduce State schema validation among query restart (commit: 233a849)
  25. [SPARK-33615][K8S] Make 'spark.archives' working in Kubernates (commit: 990bee9)
  26. [SPARK-33141][SQL][FOLLOW-UP] Store the max nested view depth in AnalysisContext (commit: acc211d)
  27. [SPARK-33660][DOCS][SS] Fix Kafka Headers Documentation (commit: d671e05)
  28. [SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT (commit: de9818f)
  29. [SPARK-33141][SQL][FOLLOW-UP] Fix Scala 2.13 compilation (commit: b6b45bc)
  30. [SPARK-33472][SQL][FOLLOW-UP] Update RemoveRedundantSorts comment (commit: 960d6af)
  31. [SPARK-33651][SQL] Allow CREATE EXTERNAL TABLE with LOCATION for data source tables (commit: 1b4e35d)
Commit f94cb53a90558285541090d484a6ae9938fe02e8 by gurwls223
[MINOR][INFRA] Use the latest image for GitHub Action jobs

### What changes were proposed in this pull request?

Currently, GitHub Actions jobs use two different Docker images.

```
$ git grep dongjoon/apache-spark-github-action-image
.github/workflows/build_and_test.yml:      image: dongjoon/apache-spark-github-action-image:20201015
.github/workflows/build_and_test.yml:      image: dongjoon/apache-spark-github-action-image:20201025
```

This PR aims to make it consistent by using the latest one.
```
- image: dongjoon/apache-spark-github-action-image:20201015
+ image: dongjoon/apache-spark-github-action-image:20201025
```

### Why are the changes needed?

This is for better maintainability. The image size is almost the same.
```
$ docker images | grep 202010
dongjoon/apache-spark-github-action-image                       20201025               37adfa3d226a   5 weeks ago     2.18GB
dongjoon/apache-spark-github-action-image                       20201015               ff6fee8dc36d   6 weeks ago     2.16GB
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes #30578 from dongjoon-hyun/SPARK-MINOR.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: f94cb53)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 4f9667035886a67e6c9a4e8fad2efa390e87ca68 by zsxwing
[SPARK-31953][SS] Add Spark Structured Streaming History Server Support

### What changes were proposed in this pull request?

Add Spark Structured Streaming History Server Support.

### Why are the changes needed?

Add a streaming query history server plugin.

![image](https://user-images.githubusercontent.com/7402327/84248291-d26cfe80-ab3b-11ea-86d2-98205fa2bcc4.png)
![image](https://user-images.githubusercontent.com/7402327/84248347-e44ea180-ab3b-11ea-81de-eefe207656f2.png)
![image](https://user-images.githubusercontent.com/7402327/84248396-f0d2fa00-ab3b-11ea-9b0d-e410115471b0.png)

- Follow-ups
  - Query duration should not update in history UI.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Update UT.

Closes #28781 from uncleGen/SPARK-31953.

Lead-authored-by: uncleGen <hustyugm@gmail.com>
Co-authored-by: Genmao Yu <hustyugm@gmail.com>
Co-authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
(commit: 4f96670)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryTab.scala (diff)
The file was added sql/core/src/test/resources/spark-events/local-1596020211915
The file was added sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryHistorySuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/deploy/history/Utils.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/UIUtils.scala (diff)
The file was modified dev/.rat-excludes (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPageSuite.scala (diff)
The file was modified sql/core/src/main/resources/META-INF/services/org.apache.spark.status.AppHistoryServerPlugin (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryPage.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala (diff)
Commit 90d4d7d43ffd29ad780dc7c5588b7e55a73aba97 by ruifengz
[SPARK-33610][ML] Imputer transform skip duplicate head() job

### What changes were proposed in this pull request?
On each call of `transform`, a `head()` job is triggered; this can be skipped by caching the result in a lazy val.

### Why are the changes needed?
avoiding duplicate head() jobs

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing tests

Closes #30550 from zhengruifeng/imputer_transform.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(commit: 90d4d7d)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala (diff)
Commit 878cc0e6e95f300a0a58c742654f53a28b30b174 by zsxwing
[SPARK-32896][SS][FOLLOW-UP] Rename the API to `toTable`

### What changes were proposed in this pull request?
As discussed in https://github.com/apache/spark/pull/30521#discussion_r531463427, rename the API to `toTable`.
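For reference, a minimal PySpark sketch of the renamed API (a hedged example: it assumes the corresponding Python wrapper is available in this build, and the checkpoint path and table name are made up):
```
# Start a streaming query that writes its output into a table.
df = spark.readStream.format("rate").load()
query = (df.writeStream
           .option("checkpointLocation", "/tmp/rate_chk")  # hypothetical path
           .toTable("rate_events"))                        # the renamed API
query.stop()
```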

### Why are the changes needed?
Rename the API for further extension and accuracy.

### Does this PR introduce _any_ user-facing change?
Yes, it's an API change but the new API is not released yet.

### How was this patch tested?
Existing UT.

Closes #30571 from xuanyuanking/SPARK-32896-follow.

Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
(commit: 878cc0e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala (diff)
Commit 08809897554a48065c2280c709d7efba28fa441d by gurwls223
[SPARK-22798][PYTHON][ML][FOLLOWUP] Add labelsArray to PySpark StringIndexer

### What changes were proposed in this pull request?

This is a followup to add missing `labelsArray` to PySpark `StringIndexer`.

### Why are the changes needed?

`labelsArray` is for the multi-column case of `StringIndexer`. We should provide this accessor on the PySpark side too.
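A minimal sketch of the accessor (assuming a SparkSession named `spark`; the data and column names are made up):
```
from pyspark.ml.feature import StringIndexer

df = spark.createDataFrame([("a", "x"), ("b", "y"), ("a", "y")], ["col1", "col2"])
indexer = StringIndexer(inputCols=["col1", "col2"],
                        outputCols=["col1_idx", "col2_idx"])
model = indexer.fit(df)
# One label array per input column in the multi-column case.
print(model.labelsArray)  # e.g. [['a', 'b'], ['y', 'x']]
```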

### Does this PR introduce _any_ user-facing change?

Yes, `labelsArray` was missing in PySpark `StringIndexer` in Spark 3.0.

### How was this patch tested?

Unit test.

Closes #30579 from viirya/SPARK-22798-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 0880989)
The file was modified python/pyspark/ml/feature.py (diff)
The file was modified python/pyspark/ml/tests/test_feature.py (diff)
Commit 3b2ff16ee6e457daade0ecb9f96955c8ed73f2a5 by gurwls223
[SPARK-33636][PYTHON][ML][FOLLOWUP] Update since tag of labelsArray in StringIndexer

### What changes were proposed in this pull request?

This is to update `labelsArray`'s since tag.

### Why are the changes needed?

The original change was backported to branch-3.0 for the 3.0.2 release, so the since tag should be updated to reflect that.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A. Just tag change.

Closes #30582 from viirya/SPARK-33636-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 3b2ff16)
The file was modified python/pyspark/ml/feature.py (diff)
Commit ff13f574e67ff9e2c38167368dc6190455e8ed7f by wenchen
[SPARK-20044][SQL] Add new function DATE_FROM_UNIX_DATE and UNIX_DATE

### What changes were proposed in this pull request?

Add new functions DATE_FROM_UNIX_DATE and UNIX_DATE for conversion between Date type and Numeric types.
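A small usage sketch (assuming a SparkSession named `spark` on a build that includes these functions):
```
# DATE_FROM_UNIX_DATE converts days since 1970-01-01 into a DATE;
# UNIX_DATE does the reverse.
spark.sql("SELECT DATE_FROM_UNIX_DATE(1) AS d, UNIX_DATE(DATE'1970-01-02') AS n").show()
# +----------+---+
# |         d|  n|
# +----------+---+
# |1970-01-02|  1|
# +----------+---+
```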

### Why are the changes needed?

1. Explicit conversion between Date type and Numeric types is disallowed in ANSI mode. We need to provide new functions for users to complete the conversion.

2. We have introduced new functions, modeled after BigQuery, for conversion between Timestamp type and Numeric types: TIMESTAMP_SECONDS, TIMESTAMP_MILLIS, TIMESTAMP_MICROS, UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS. It makes sense to add functions for conversion between Date type and Numeric types as well.

### Does this PR introduce _any_ user-facing change?

Yes, two new datetime functions are added.

### How was this patch tested?

Unit tests

Closes #30588 from gengliangwang/dateToNumber.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: ff13f57)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime-legacy.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/datetime.sql (diff)
The file was modified sql/core/src/test/resources/sql-functions/sql-expression-schema.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
Commit 512fb32b38e4694abd9f667581cdd5e99dee811f by wenchen
[SPARK-26218][SQL][FOLLOW UP] Fix the corner case of codegen when casting float to Integer

### What changes were proposed in this pull request?
This is a followup of [#27151](https://github.com/apache/spark/pull/27151). It fixes the same issue for the codegen path.

### Why are the changes needed?
Without the fix, the codegen path can return corrupted results.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added Unit test.

Closes #30585 from luluorta/SPARK-26218.

Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 512fb32)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
Commit 0706e64c49f66431560cdbecb28adcda244c3342 by wenchen
[SPARK-30098][SQL] Add a configuration to use default datasource as provider for CREATE TABLE command

### What changes were proposed in this pull request?

For the CREATE TABLE [AS SELECT] command, create a native Parquet table if neither USING nor STORED AS is specified and `spark.sql.legacy.createHiveTableByDefault` is false.

This is a retry after we unify the CREATE TABLE syntax. It partially reverts https://github.com/apache/spark/commit/d2bec5e265e0aa4fa527c3f43cfe738cdbdc4598

This PR also allows `CREATE EXTERNAL TABLE` when `LOCATION` is present. This was not allowed for data source tables before, which was an unnecessary behavior difference from Hive tables.
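A hedged sketch of the new behavior (the table name is made up, and the exact `DESCRIBE` output may vary):
```
spark.conf.set("spark.sql.legacy.createHiveTableByDefault", "false")
# No USING / STORED AS clause: with the config disabled this becomes a native Parquet table.
spark.sql("CREATE TABLE t_native AS SELECT 1 AS id")
spark.sql("DESCRIBE FORMATTED t_native").show(truncate=False)  # Provider should show 'parquet'
```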

### Why are the changes needed?

Changing from Hive text table to native Parquet table has many benefits:
1. be consistent with `DataFrameWriter.saveAsTable`.
2. better performance
3. better support for nested types (Hive text tables don't work well with nested types; e.g. `insert into t values struct(null)` actually inserts a null value, not `struct(null)`, if `t` is a Hive text table, which leads to wrong results)
4. better interoperability as Parquet is a more popular open file format.

### Does this PR introduce _any_ user-facing change?

No by default. If the config is set, the behavior change is described below:

Behavior-wise, the change is very small as the native Parquet table is also Hive-compatible. All the Spark DDL commands that work for Hive tables also work for native Parquet tables, with two exceptions: `ALTER TABLE SET [SERDE | SERDEPROPERTIES]` and `LOAD DATA`.

char/varchar behavior has been taken care of by https://github.com/apache/spark/pull/30412, and there is no behavior difference between data source and Hive tables.

One potential issue is `CREATE TABLE ... LOCATION ...` while users want to directly access the files later. It's more like a corner case and the legacy config should be good enough.

Another potential issue is users may use Spark to create the table and then use Hive to add partitions with different serde. This is not allowed for Spark native tables.

### How was this patch tested?

Re-enable the tests

Closes #30554 from cloud-fan/create-table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 0706e64)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/QueryPartitionSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (diff)
Commit bd711863fdcdde21a7d64de8a9b6b7a8bf7c19ec by gurwls223
[SPARK-33629][PYTHON] Make spark.buffer.size configuration visible on driver side

### What changes were proposed in this pull request?
`spark.buffer.size` was not applied on the driver side in PySpark. This PR fixes that issue.

### Why are the changes needed?
Apply the mentioned config on driver side.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests + manually.

Added the following code temporarily:
```
def local_connect_and_auth(port, auth_secret):
...
            sock.connect(sa)
            print("SPARK_BUFFER_SIZE: %d" % int(os.environ.get("SPARK_BUFFER_SIZE", 65536)))  # <- this line is the addition
            sockfile = sock.makefile("rwb", int(os.environ.get("SPARK_BUFFER_SIZE", 65536)))
...
```

Test:
```
#Compile Spark

echo "spark.buffer.size 10000" >> conf/spark-defaults.conf

$ ./bin/pyspark
Python 3.8.5 (default, Jul 21 2020, 10:48:26)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
20/12/03 13:38:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/12/03 13:38:14 WARN SparkEnv: I/O encryption enabled without RPC encryption: keys will be visible on the wire.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
      /_/

Using Python version 3.8.5 (default, Jul 21 2020 10:48:26)
Spark context Web UI available at http://192.168.0.189:4040
Spark context available as 'sc' (master = local[*], app id = local-1606999094506).
SparkSession available as 'spark'.
>>> sc.setLogLevel("TRACE")
>>> sc.parallelize([0, 2, 3, 4, 6], 5).glom().collect()
...
SPARK_BUFFER_SIZE: 10000
...
[[0], [2], [3], [4], [6]]
>>>
```

Closes #30592 from gaborgsomogyi/SPARK-33629.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: bd71186)
The file was modified python/pyspark/context.py (diff)
The file was modified core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala (diff)
Commit aa13e207c9091e24aae1edcf3bb5cd35d3a27cbb by dongjoon
[SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete

### What changes were proposed in this pull request?

This PR provides us with a way to check if a data source is going to reject the delete via `deleteWhere` at planning time.

### Why are the changes needed?

The only way to support delete statements right now is to implement ``SupportsDelete``. According to its Javadoc, that interface is meant for cases when we can delete data without much effort (e.g. like deleting a complete partition in a Hive table).

This PR actually provides us with a way to check if a data source is going to reject the delete via `deleteWhere` at planning time instead of just getting an exception during execution. In the future, we can use this functionality to decide whether Spark should rewrite this delete and execute a distributed query or it can just pass a set of filters.

Consider an example of a partitioned Hive table. If we have a delete predicate like `part_col = '2020'`, we can just drop the matching partition to satisfy this delete. In this case, the data source should return `true` from `canDeleteWhere` and use the filters it accepts in `deleteWhere` to drop the partition. I consider this as a delete without significant effort. At the same time, if we have a delete predicate like `id = 10`, Hive tables would not be able to execute this delete using a metadata only operation without rewriting files. In that case, the data source should return `false` from `canDeleteWhere` and we should use a more sophisticated row-level API to find out which records should be removed (the API is yet to be discussed, but we need this PR as a basis).

If we decide to support subqueries and all delete use cases by simply extending the existing API, this will mean all data sources will have to implement a lot of Spark logic to determine which records changed. I don't think we want to go that way as the Spark logic to determine which records should be deleted is independent of the underlying data source. So the assumption is that Spark will execute a plan to find which records must be deleted for data sources that return `false` from `canDeleteWhere`.
### Does this PR introduce _any_ user-facing change?

Yes but it is backward compatible.

### How was this patch tested?

This PR comes with a new test.

Closes #30562 from aokolnychyi/spark-33623.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: aa13e20)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java (diff)
Commit 63f9d474b9ec4b66741fcca1d3c3865c32936a85 by dongjoon
[SPARK-33634][SQL][TESTS] Use Analyzer in PlanResolutionSuite

### What changes were proposed in this pull request?

Instead of using several analyzer rules, this PR uses the actual analyzer to run tests in `PlanResolutionSuite`.

### Why are the changes needed?

Make the test suite match reality.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

test-only

Closes #30574 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 63f9d47)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
Commit 7e759b2d95eb3592d62ec010297c39384173a93c by weichen.xu
[SPARK-33520][ML][PYSPARK] make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/evaluator

### What changes were proposed in this pull request?
make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/model

### Why are the changes needed?
Currently, PySpark supports third-party libraries defining Python-backend estimators/evaluators, i.e., estimators that inherit `Estimator` instead of `JavaEstimator`, which can only be used in PySpark.

CrossValidator and TrainValidationSplit support tuning these Python-backend estimators, but cannot support saving/loading them, because the CrossValidator and TrainValidationSplit writer implementation uses JavaMLWriter, which requires converting the nested estimator and evaluator into Java instances.

OneVsRest saving/loading currently only supports Java-backend classifiers due to a similar issue.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test.

Closes #30471 from WeichenXu123/support_pyio_tuning.

Authored-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
(commit: 7e759b2)
The file was modified python/pyspark/ml/classification.pyi (diff)
The file was modified python/pyspark/ml/tests/test_persistence.py (diff)
The file was modified python/pyspark/ml/classification.py (diff)
The file was modified python/pyspark/ml/tuning.py (diff)
The file was modified python/pyspark/testing/mlutils.py (diff)
The file was modified python/pyspark/ml/util.pyi (diff)
The file was modified python/pyspark/ml/tuning.pyi (diff)
The file was modified python/pyspark/ml/tests/test_tuning.py (diff)
The file was modified python/pyspark/ml/util.py (diff)
Commit 85949588b71ed548a2e10d2e58183d9cce313a48 by dongjoon
[SPARK-33650][SQL] Fix the error from ALTER TABLE .. ADD/DROP PARTITION for non-supported partition management table

### What changes were proposed in this pull request?
In the PR, I propose to change the order of post-analysis checks for the `ALTER TABLE .. ADD/DROP PARTITION` command, and perform the general check (does the table support partition management at all) before specific checks.

### Why are the changes needed?
The error message for the table which doesn't support partition management can mislead users:
```java
PartitionSpecs are not resolved;;
'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
+- ResolvedTable org.apache.spark.sql.connector.InMemoryTableCatalog2fd64b11, ns1.ns2.tbl, org.apache.spark.sql.connector.InMemoryTable5d3ff859
```
because it says nothing about the root cause of the issue.

### Does this PR introduce _any_ user-facing change?
Yes. After the change, the error message will be:
```
Table ns1.ns2.tbl can not alter partitions
```

### How was this patch tested?
By running the affected test suite `AlterTablePartitionV2SQLSuite`.

Closes #30594 from MaxGekk/check-order-AlterTablePartition.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 8594958)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
Commit 29e415deac3c90936dd1466eab6b001b7f1f4959 by gengliang.wang
[SPARK-33649][SQL][DOC] Improve the doc of spark.sql.ansi.enabled

### What changes were proposed in this pull request?

Improve the documentation of SQL configuration `spark.sql.ansi.enabled`

### Why are the changes needed?

As there are more and more new features under the SQL configuration `spark.sql.ansi.enabled`, we should make it clearer:
1. what exactly it is
2. where users can find all the features of the ANSI mode
3. whether all the features come exactly from the SQL standard
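As a quick illustration of one behavior gated by this config (a sketch; the exact exception type and message depend on the Spark version):
```
spark.conf.set("spark.sql.ansi.enabled", "true")
try:
    # Under ANSI mode an invalid cast fails at runtime instead of silently returning NULL.
    spark.sql("SELECT CAST('abc' AS INT)").show()
except Exception as e:
    print("ANSI mode rejected the cast:", type(e).__name__)
```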

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

It's just doc change.

Closes #30593 from gengliangwang/reviseAnsiDoc.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(commit: 29e415d)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified docs/sql-ref-ansi-compliance.md (diff)
Commit e22ddb6740e73a5d1b4ef1ddd21e4241bf85f03c by wenchen
[SPARK-32405][SQL][FOLLOWUP] Remove USING _ in CREATE TABLE in JDBCTableCatalog docker tests

### What changes were proposed in this pull request?
remove USING _ in CREATE TABLE in JDBCTableCatalog docker tests

### Why are the changes needed?
Previously, the CREATE TABLE syntax forced users to specify a provider, so we had to add a `USING _` clause. Now that the problem is fixed, we can remove it.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Closes #30599 from huaxingao/remove_USING.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: e22ddb6)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MsSqlServerIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/MySQLIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala (diff)
Commit e02324f2dda3510dd229199e97c87ffdcc766a18 by wenchen
[SPARK-33142][SPARK-33647][SQL] Store SQL text for SQL temp view

### What changes were proposed in this pull request?
Currently, in Spark, a temp view is saved as its analyzed logical plan, while a permanent view is kept in the HMS with its original SQL text. As a result, permanent and temporary views behave differently in some cases. In this PR we store the SQL text for temporary views in order to unify the behavior between permanent and temporary views.

### Why are the changes needed?
to unify the behavior between permanent and temporary views

### Does this PR introduce _any_ user-facing change?
Yes. With this PR, the temporary view will be re-analyzed when it is referenced, so if the underlying data source changes, the view is updated as well.
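A sketch of the expected behavior after this change (table and view names are made up):
```
spark.range(3).write.saveAsTable("base_tbl")
spark.sql("CREATE TEMPORARY VIEW v AS SELECT * FROM base_tbl")

# Recreate the underlying table with a different schema.
spark.sql("DROP TABLE base_tbl")
spark.range(2).selectExpr("id", "id * 10 AS extra").write.saveAsTable("base_tbl")

# Because the SQL text is stored, the view is re-analyzed on reference
# and picks up the new column.
spark.sql("SELECT * FROM v").show()
```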

### How was this patch tested?
existing and newly added test cases

Closes #30567 from linhongliu-db/SPARK-33142.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: e02324f)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show-tblproperties.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/invalid-correlation.sql.out (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/describe.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/create_view.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by-filter.sql.out (diff)
Commit 15579ba1f82e321a694130d4c9db2a6524e9ae2e by wenchen
[SPARK-33430][SQL] Support namespaces in JDBC v2 Table Catalog

### What changes were proposed in this pull request?
Add namespace support in the JDBC v2 Table Catalog by making `JDBCTableCatalog` extend `SupportsNamespaces`.

### Why are the changes needed?
make v2 JDBC implementation complete

### Does this PR introduce _any_ user-facing change?
Yes. The following methods are added to `JDBCTableCatalog` (see the usage sketch after this list):

- listNamespaces
- listNamespaces(String[] namespace)
- namespaceExists(String[] namespace)
- loadNamespaceMetadata(String[] namespace)
- createNamespace
- alterNamespace
- dropNamespace
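
A hedged usage sketch showing how the namespace support surfaces through SQL, assuming a JDBC v2 catalog registered against a hypothetical PostgreSQL instance (user/password options omitted):
```
spark.conf.set("spark.sql.catalog.pg",
               "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
spark.conf.set("spark.sql.catalog.pg.url", "jdbc:postgresql://localhost:5432/mydb")
spark.conf.set("spark.sql.catalog.pg.driver", "org.postgresql.Driver")

spark.sql("CREATE NAMESPACE pg.ns1")       # routed to createNamespace
spark.sql("SHOW NAMESPACES IN pg").show()  # routed to listNamespaces
spark.sql("DROP NAMESPACE pg.ns1")         # routed to dropNamespace
```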

### How was this patch tested?
Add new docker tests

Closes #30473 from huaxingao/name_space.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 15579ba)
The file was added external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/PostgresNamespaceSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala (diff)
The file was added external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCNamespaceTest.scala
Commit e8380665c7e3aca446631964f49e09f264dee1c2 by gurwls223
[SPARK-33658][SQL] Suggest using Datetime conversion functions for invalid ANSI casting

### What changes were proposed in this pull request?

Suggest that users use the datetime conversion functions in the error message for invalid explicit ANSI casts.

### Why are the changes needed?

In ANSI mode, explicit cast between DateTime types and Numeric types is not allowed.
As of now, we have introduced new functions `UNIX_SECONDS`/`UNIX_MILLIS`/`UNIX_MICROS`/`UNIX_DATE`/`DATE_FROM_UNIX_DATE`, we can show suggestions to users so that they can complete these type conversions precisely and easily in ANSI mode.
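For example, instead of casting between datetime and numeric values directly, the dedicated functions can be used (a sketch, assuming ANSI mode is enabled):
```
spark.conf.set("spark.sql.ansi.enabled", "true")
# CAST between datetime and numeric types is rejected under ANSI mode;
# the error message now points at the dedicated conversion functions instead.
spark.sql("SELECT UNIX_SECONDS(TIMESTAMP'2020-12-04 00:00:00') AS secs, "
          "DATE_FROM_UNIX_DATE(18600) AS d").show()
```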

### Does this PR introduce _any_ user-facing change?

Yes, better error messages

### How was this patch tested?

Unit test

Closes #30603 from gengliangwang/improveErrorMsgOfExplicitCast.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: e838066)
The file was modified docs/sql-ref-ansi-compliance.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala (diff)
Commit 94c144bdd05d6c751dcd907161e1b965e637f69c by gurwls223
[SPARK-33571][SQL][DOCS] Add a ref to INT96 config from the doc for `spark.sql.legacy.parquet.datetimeRebaseModeInWrite/Read`

### What changes were proposed in this pull request?
For the SQL configs `spark.sql.legacy.parquet.datetimeRebaseModeInWrite` and `spark.sql.legacy.parquet.datetimeRebaseModeInRead`, improve their descriptions by:
1. Explicitly document on which parquet types, those configs influence on
2. Refer to corresponding configs for `INT96`
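
For context, a hedged sketch of the configs in question (names as of this branch; the INT96 variants are the separate configs referenced above):
```
# DATE / TIMESTAMP_MICROS / TIMESTAMP_MILLIS columns:
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
# INT96 timestamps are controlled separately:
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "CORRECTED")
```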

### Why are the changes needed?
To avoid user confusion like that reported in SPARK-33571, and to make the config description more precise.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running `./dev/scalastyle`.

Closes #30596 from MaxGekk/clarify-rebase-docs.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 94c144b)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 325abf7957373161d2cf0921d35567235186d6eb by kabhwan.opensource
[SPARK-33577][SS] Add support for V1Table in stream writer table API and create table if not exist by default

### What changes were proposed in this pull request?
After SPARK-32896, we have a table API for the stream writer, but it only supports DataSource v2 tables. Here we add the following enhancements:

- Create non-existing tables by default
- Support both managed and external V1Tables

### Why are the changes needed?
Make the API cover more use cases, especially file-provider-based tables.

### Does this PR introduce _any_ user-facing change?
Yes, new features added.

### How was this patch tested?
Add new UTs.

Closes #30521 from xuanyuanking/SPARK-33577.

Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
(commit: 325abf7)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala (diff)
Commit 91baab77f7e0a5102ac069846f0e2920bb2dd15a by dongjoon
[SPARK-33656][TESTS] Add option to keep container after tests finish for DockerJDBCIntegrationSuites for debug

### What changes were proposed in this pull request?

This PR adds an option to keep the container after DockerJDBCIntegrationSuites (e.g. DB2IntegrationSuite, PostgresIntegrationSuite) finish.
By setting the system property `spark.test.docker.keepContainer` to `true`, we can use this option.

### Why are the changes needed?

If an error occurs during the tests, it is useful to keep the container for debugging.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that the container is kept after the test by the following commands.
```
# With sbt
$ build/sbt -Dspark.test.docker.keepContainer=true -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite"

# With Maven
$ build/mvn -Dspark.test.docker.keepContainer=true -Pdocker-integration-tests -Phive -Phive-thriftserver -Dtest=none -DwildcardSuites=org.apache.spark.sql.jdbc.MariaDBKrbIntegrationSuite test

$ docker container ls
```

I also confirmed that there are no regression for all the subclasses of `DockerJDBCIntegrationSuite` with sbt/Maven.
* MariaDBKrbIntegrationSuite
* DB2KrbIntegrationSuite
* PostgresKrbIntegrationSuite
* MySQLIntegrationSuite
* PostgresIntegrationSuite
* DB2IntegrationSuite
* MsSqlServerIntegrationSuite
* OracleIntegrationSuite
* v2.MySQLIntegrationSuite
* v2.PostgresIntegrationSuite
* v2.DB2IntegrationSuite
* v2.MsSqlServerIntegrationSuite
* v2.OracleIntegrationSuite

NOTE: `DB2IntegrationSuite`, `v2.DB2IntegrationSuite` and `DB2KrbIntegrationSuite` can fail because the connection timeout is too short. It's a separate issue and I'll fix it in #30583

Closes #30601 from sarutak/keepContainer.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 91baab7)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala (diff)
The file was modified pom.xml (diff)
Commit 976e8970399a1a0fef4c826d4fdd1a138ca52c77 by dongjoon
[SPARK-33640][TESTS] Extend connection timeout to DB server for DB2IntegrationSuite and its variants

### What changes were proposed in this pull request?

This PR extends the connection timeout to the DB server for DB2IntegrationSuite and its variants.

The container image ibmcom/db2 creates a database when it starts up.
The database creation can take over 2 minutes.

DB2IntegrationSuite and its variants use the container image but the connection timeout is set to 2 minutes so these suites almost always fail.
### Why are the changes needed?

To pass those suites.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed the suites pass with the following commands.
```
$ build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.DB2IntegrationSuite"
$ build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.v2.DB2IntegrationSuite"
$ build/sbt -Pdocker-integration-tests -Phive -Phive-thriftserver package "testOnly org.apache.spark.sql.jdbc.DB2KrbIntegrationSuite"
```

Closes #30583 from sarutak/extend-timeout-for-db2.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 976e897)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2KrbIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala (diff)
Commit 233a8494c8cc7bc8a4a9393ec512943749f11bef by gurwls223
[SPARK-27237][SS] Introduce State schema validation among query restart

## What changes were proposed in this pull request?

Please refer to the description of [SPARK-27237](https://issues.apache.org/jira/browse/SPARK-27237) for the rationale behind this patch.

This patch proposes to introduce state schema validation by storing the key schema and value schema to a `schema` file (the first time the query runs) and verifying that the new key and value schemas of the state are compatible with the existing ones. To be clear about the definition of "compatible": a state schema is "compatible" when the number of fields is the same and the data type of each field is the same - Spark has been allowing renaming of fields.

This patch prevents running a query with an incompatible state schema, which reduces the chance of nondeterministic behavior (renaming a field is arguably also a smell of semantic incompatibility, but end users may just be changing the name, so we can't tell), as well as providing a more informative error message.

## How was this patch tested?

Added UTs.

Closes #24173 from HeartSaVioR/SPARK-27237.

Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Co-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 233a849)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityCheckerSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingStateStoreFormatCompatibilitySuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateSchemaCompatibilityChecker.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataVersionUtil.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreConf.scala (diff)
Commit 990bee9c58ea9abd8c4f04f20c78c6d5b720406a by gurwls223
[SPARK-33615][K8S] Make 'spark.archives' working in Kubernates

### What changes were proposed in this pull request?

This PR proposes to make the `spark.archives` configuration work in Kubernetes.
It works without a problem in a standalone cluster, but there seems to be a bug in Kubernetes:
it fails to fetch the file on the driver side as below:

```
20/12/03 13:33:53 INFO SparkContext: Added JAR file:/tmp/spark-75004286-c83a-4369-b624-14c5d2d2a748/spark-examples_2.12-3.1.0-SNAPSHOT.jar at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar with timestamp 1607002432558
20/12/03 13:33:53 INFO SparkContext: Added archive file:///tmp/tmp4542734800151332666.txt.tar.gz#test_tar_gz at spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz with timestamp 1607002432558
20/12/03 13:33:53 INFO TransportClientFactory: Successfully created connection to spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc/172.17.0.4:7078 after 83 ms (47 ms spent in bootstraps)
20/12/03 13:33:53 INFO Utils: Fetching spark://spark-test-app-48ae737628cee6f8-driver-svc.spark-integration-test.svc:7078/files/tmp4542734800151332666.txt.tar.gz to /tmp/spark-66573e24-27a3-427c-99f4-36f06d9e9cd5/fetchFileTemp2665785666227461849.tmp
20/12/03 13:33:53 ERROR SparkContext: Error initializing SparkContext.
java.lang.RuntimeException: Stream '/files/tmp4542734800151332666.txt.tar.gz' was not found.
at org.apache.spark.network.client.TransportResponseHandler.handle(TransportResponseHandler.java:242)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:142)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
```

This is because `spark.archives` was not actually added on the driver side correctly. The changes here fix it by adding and resolving URIs correctly.

### Why are the changes needed?

The `spark.archives` feature can be leveraged for many things such as Conda support. We should make it work in Kubernetes as well.
This is a bug fix too.
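
For reference, a minimal PySpark sketch of the feature (the archive path is hypothetical; the fragment after `#` is the directory name the archive is unpacked under):
```
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.archives", "hdfs:///deps/pyspark_env.tar.gz#environment")
         .getOrCreate())
```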

### Does this PR introduce _any_ user-facing change?

No, this feature is not out yet.

### How was this patch tested?

I manually tested with Minikube 1.15.1. Due to an environment issue (?), I had to use a custom namespace, service account and roles. The `default` service account does not work for me and complains that it doesn't have permission to get/list pods, etc.

```bash
minikube delete
minikube start --cpus 12 --memory 16384
kubectl create namespace spark-integration-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-integration-test
EOF
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test
dev/make-distribution.sh --pip --tgz -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.1.0-SNAPSHOT-bin-3.2.0.tgz  --service-account spark --namespace spark-integration-test
```

Closes #30581 from HyukjinKwon/SPARK-33615.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 990bee9)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala (diff)
The file was modified docs/running-on-kubernetes.md (diff)
The file was modified core/src/main/scala/org/apache/spark/SparkContext.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/Utils.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala (diff)
Commit acc211d2cf0e6ab94f6578e1eb488766fd20fa4e by wenchen
[SPARK-33141][SQL][FOLLOW-UP] Store the max nested view depth in AnalysisContext

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/30289. It removes the hack in `View.effectiveSQLConf`, by putting the max nested view depth in `AnalysisContext`. Then we don't get the max nested view depth from the active SQLConf, which keeps changing during nested view resolution.

### Why are the changes needed?

remove hacks.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

If I just remove the hack, `SimpleSQLViewSuite.restrict the nested level of a view` fails. With this fix, it passes again.

Closes #30575 from cloud-fan/view.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: acc211d)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala (diff)
Commit d671e053e9806d6b4e43a39f5018aa9718790160 by kabhwan.opensource
[SPARK-33660][DOCS][SS] Fix Kafka Headers Documentation

### What changes were proposed in this pull request?

Update the Kafka headers documentation; the type is no longer a map but an array.

[jira](https://issues.apache.org/jira/browse/SPARK-33660)

### Why are the changes needed?
To help users

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?

It is only documentation

Closes #30605 from Gschiavon/SPARK-33660-fix-kafka-headers-documentation.

Authored-by: german <germanschiavon@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
(commit: d671e05)
The file was modified docs/structured-streaming-kafka-integration.md (diff)
Commit de9818f043c1ebcda321077633f93072feba601f by dongjoon
[SPARK-33662][BUILD] Setting version to 3.2.0-SNAPSHOT

### What changes were proposed in this pull request?

This PR aims to update `master` branch version to 3.2.0-SNAPSHOT.

### Why are the changes needed?

Start to prepare Apache Spark 3.2.0.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Pass the CIs.

Closes #30606 from dongjoon-hyun/SPARK-3.2.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: de9818f)
The file was modified launcher/pom.xml (diff)
The file was modified streaming/pom.xml (diff)
The file was modified external/avro/pom.xml (diff)
The file was modified pom.xml (diff)
The file was modified common/network-shuffle/pom.xml (diff)
The file was modified common/sketch/pom.xml (diff)
The file was modified mllib-local/pom.xml (diff)
The file was modified external/spark-ganglia-lgpl/pom.xml (diff)
The file was modified tools/pom.xml (diff)
The file was modified graphx/pom.xml (diff)
The file was modified external/kafka-0-10/pom.xml (diff)
The file was modified assembly/pom.xml (diff)
The file was modified common/network-yarn/pom.xml (diff)
The file was modified resource-managers/kubernetes/integration-tests/pom.xml (diff)
The file was modified resource-managers/kubernetes/core/pom.xml (diff)
The file was modified external/kinesis-asl/pom.xml (diff)
The file was modified core/pom.xml (diff)
The file was modified project/MimaExcludes.scala (diff)
The file was modified examples/pom.xml (diff)
The file was modified external/kafka-0-10-assembly/pom.xml (diff)
The file was modified common/network-common/pom.xml (diff)
The file was modified sql/hive-thriftserver/pom.xml (diff)
The file was modified R/pkg/DESCRIPTION (diff)
The file was modified sql/core/pom.xml (diff)
The file was modified common/kvstore/pom.xml (diff)
The file was modified python/pyspark/version.py (diff)
The file was modified docs/_config.yml (diff)
The file was modified external/kafka-0-10-sql/pom.xml (diff)
The file was modified external/docker-integration-tests/pom.xml (diff)
The file was modified mllib/pom.xml (diff)
The file was modified repl/pom.xml (diff)
The file was modified sql/catalyst/pom.xml (diff)
The file was modified sql/hive/pom.xml (diff)
The file was modified external/kinesis-asl-assembly/pom.xml (diff)
The file was modified hadoop-cloud/pom.xml (diff)
The file was modified common/tags/pom.xml (diff)
The file was modified resource-managers/mesos/pom.xml (diff)
The file was modified common/unsafe/pom.xml (diff)
The file was modified resource-managers/yarn/pom.xml (diff)
The file was modified external/kafka-0-10-token-provider/pom.xml (diff)
Commit b6b45bc695706201693572bfb87bcee310548945 by dongjoon
[SPARK-33141][SQL][FOLLOW-UP] Fix Scala 2.13 compilation

### What changes were proposed in this pull request?

This PR aims to fix Scala 2.13 compilation.

### Why are the changes needed?

To recover Scala 2.13.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GitHub Action Scala 2.13 build job.

Closes #30611 from dongjoon-hyun/SPARK-33141.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: b6b45bc)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala (diff)
Commit 960d6af75d5ef29b1efcf0d03e7db840270382e6 by dongjoon
[SPARK-33472][SQL][FOLLOW-UP] Update RemoveRedundantSorts comment

### What changes were proposed in this pull request?
This PR is a follow-up for #30373 that updates the comment for RemoveRedundantSorts in QueryExecution.

### Why are the changes needed?
To update an incorrect comment.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #30584 from allisonwang-db/spark-33472-followup.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 960d6af)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala (diff)
Commit 1b4e35d1a8acf7b744e11b9ac9ca8f81de6db5e5 by dongjoon
[SPARK-33651][SQL] Allow CREATE EXTERNAL TABLE with LOCATION for data source tables

### What changes were proposed in this pull request?

This PR removes the restriction and allows CREATE EXTERNAL TABLE with LOCATION for data source tables. It also moves the check from the analyzer rule `ResolveSessionCatalog` to `SessionCatalog`, so that a v2 session catalog can override it.

### Why are the changes needed?

It's an unnecessary behavior difference that a Hive serde table can be created with `CREATE EXTERNAL TABLE` if LOCATION is present, while data source tables don't allow `CREATE EXTERNAL TABLE` at all.

### Does this PR introduce _any_ user-facing change?

Yes, now `CREATE EXTERNAL TABLE ... USING ... LOCATION ...` is allowed.
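
A short sketch of the now-allowed statement (path and schema are made up):
```
spark.sql("""
  CREATE EXTERNAL TABLE ext_events (id INT, name STRING)
  USING parquet
  LOCATION '/tmp/ext_events'
""")
```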

### How was this patch tested?

new tests

Closes #30595 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 1b4e35d)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/CreateTableAsSelectSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/TestV2SessionCatalogBase.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSessionCatalogSuite.scala (diff)