Changes

Summary

  1. [SPARK-33762][BUILD] Upgrade commons-codec to 1.15 (commit: 99848e5) (details)
  2. [SPARK-33766][BUILD] Upgrade Jackson to 2.11.4 (commit: 01b73ae) (details)
  3. [SPARK-33589][SQL][FOLLOWUP] Replace Throwable with NonFatal (commit: 94bc2d6) (details)
  4. [SPARK-33764][SS] Make state store maintenance interval as SQL config (commit: 45af3c9) (details)
  5. [SPARK-33690][SQL] Escape meta-characters in showString (commit: 8197ee3) (details)
  6. [SPARK-33723][SQL] ANSI mode: Casting String to Date should throw exception on parse error (commit: 6e86279) (details)
  7. [SPARK-33757][INFRA][R][FOLLOWUP] Provide more simple solution (commit: b135db3) (details)
  8. [SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness (commit: 4d47ac4) (details)
  9. [SPARK-33770][SQL][TESTS] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path (commit: 9160d59) (details)
  10. [SPARK-33768][SQL] Remove `retainData` from `AlterTableDropPartition` (commit: 817f58d) (details)
  11. [SPARK-33546][SQL] Enable row format file format validation in CREATE TABLE LIKE (commit: e7fe92f) (details)
  12. [SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases (commit: b7c8210) (details)
  13. [SPARK-33751][SQL] Migrate ALTER VIEW ... AS command to use UnresolvedView to resolve the identifier (commit: a84c8d8) (details)
  14. [SPARK-33673][SQL] Avoid push down partition filters to ParquetScan for DataSourceV2 (commit: cd0356d) (details)
  15. [SPARK-33716][K8S] Fix potential race condition during pod termination (commit: bf2c88c) (details)
  16. [SPARK-33428][SQL] Conv UDF use BigInt to avoid Long value overflow (commit: 5f9a7fe) (details)
  17. [SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field (commit: 839d689) (details)
  18. [SPARK-33779][SQL] DataSource V2: API to request distribution and ordering on write (commit: 82aca7e) (details)
  19. [SPARK-33779][SQL][FOLLOW-UP] Fix Java Linter error (commit: bb60fb1) (details)
  20. [SPARK-33261][K8S] Add a developer API for custom feature steps (commit: 5885cc1) (details)
  21. [SPARK-33771][SQL][TESTS] Fix Invalid value for HourOfAmPm when testing on JDK 14 (commit: 412d86e) (details)
  22. [SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS (commit: f156718) (details)
  23. [SPARK-33653][SQL] DSv2: REFRESH TABLE should recache the table itself (commit: 49d3256) (details)
  24. [SPARK-33748][K8S] Respect environment variables and configurations for Python executables (commit: a99a47c) (details)
  25. [SPARK-33785][SQL] Migrate ALTER TABLE ... RECOVER PARTITIONS to use UnresolvedTable to resolve the identifier (commit: 366beda) (details)
  26. [SPARK-33767][SQL][TESTS] Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests (commit: 141e26d) (details)
  27. [SPARK-33273][SQL] Fix a race condition in subquery execution (commit: 0304252) (details)
  28. [SPARK-33769][SQL] Improve the next-day function of the sql component to (commit: 20f6d63) (details)
  29. [SPARK-33752][SQL] Avoid the getSimpleMessage of AnalysisException adds (commit: 58cb2ba) (details)
  30. [SPARK-33758][SQL] Prune unrequired partitionings from (commit: 23083aa) (details)
  31. [SPARK-33617][SQL][FOLLOWUP] refine the default parallelism SQL config (commit: 40c37d6) (details)
  32. [SPARK-33735][SQL] Handle UPDATE in ReplaceNullWithFalseInPredicate (commit: 4d56d43) (details)
  33. [SPARK-22256][MESOS] Introduce spark.mesos.driver.memoryOverhead (commit: 87c5836) (details)
  34. [SPARK-33788][SQL] Throw NoSuchPartitionsException from (commit: 3dfdcf4) (details)
  35. [SPARK-33796][DOCS] Show hidden text from the left menu of Spark Doc (commit: dd042f5) (details)
  36. [SPARK-33793][TESTS] Introduce withExecutor to ensure proper cleanup in (commit: ddff94f) (details)
Commit 99848e530f8528283bb21afac2f89984924f2235 by dongjoon
[SPARK-33762][BUILD] Upgrade commons-codec to 1.15

### What changes were proposed in this pull request?

Upgrade `commons-codec` to 1.15.
### Why are the changes needed?

Open Source scans are reporting a potential encoding/decoding issue related to versions of commons-codec prior to 1.13. Commit referenced: https://github.com/apache/commons-codec/commit/48b615756d1d770091ea3322eefc08011ee8b113

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #30740 from n-marion/SPARK-33762_upgrade-commons-codec.

Authored-by: Nicholas Marion <nmarion@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 99848e5)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
Commit 01b73ae6388279514d61c14a9dc9718a34dad465 by dongjoon
[SPARK-33766][BUILD] Upgrade Jackson to 2.11.4

### What changes were proposed in this pull request?

This PR upgrades Jackson to 2.11.4.
Jackson Release 2.11: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.11

### Why are the changes needed?

Make dependency upgrades easier, because Jackson 2.10 is not compatible with 2.11:
```
com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.5 requires Jackson Databind version >= 2.10.0 and < 2.11.0
```
[Avro](https://issues.apache.org/jira/browse/AVRO-2967) has upgraded Jackson to 2.11.3.
[Parquet](https://issues.apache.org/jira/browse/PARQUET-1895) has upgraded Jackson to 2.11.2.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #30746 from wangyum/SPARK-33766.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 01b73ae)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
Commit 94bc2d61a2598d995df8eb79fe450b0e5f6d7582 by dongjoon
[SPARK-33589][SQL][FOLLOWUP] Replace Throwable with NonFatal

### What changes were proposed in this pull request?

This PR replaces `Throwable` with `NonFatal`.
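For reference, a minimal, self-contained sketch of the pattern (not the actual `SparkSQLSessionManager` code):

```scala
import scala.util.control.NonFatal

object NonFatalExample {
  def risky(): Unit = throw new IllegalStateException("boom")

  def main(args: Array[String]): Unit = {
    try risky() catch {
      // NonFatal matches ordinary exceptions but deliberately does not match
      // fatal ones (OutOfMemoryError, InterruptedException, ControlThrowable,
      // ...), so those keep propagating instead of being silently swallowed.
      case NonFatal(e) => println(s"Recovered from non-fatal error: ${e.getMessage}")
    }
  }
}
```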

### Why are the changes needed?

Code improvement: catching `Throwable` would also swallow fatal errors, while `NonFatal` lets them propagate.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #30744 from wangyum/SPARK-33589-2.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 94bc2d6)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala (diff)
Commit 45af3c96889eba1958055206f10524299d0be61c by dongjoon
[SPARK-33764][SS] Make state store maintenance interval as SQL config

### What changes were proposed in this pull request?

Currently the maintenance interval is hard-coded in `StateStore`. This patch proposes to make it a SQL config.

### Why are the changes needed?

The maintenance interval is currently hard-coded in `StateStore`. For consistency, it should be placed together with the other Structured Streaming configs, and `SQLConf` provides a better way to attach documentation and a default value.

### Does this PR introduce _any_ user-facing change?

Yes. Previously, users set the maintenance interval through a Spark config; now they can use a SQL config to set it.
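For illustration, a sketch of setting it at runtime, assuming the new SQL config key is `spark.sql.streaming.stateStore.maintenanceInterval`:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()

// As a SQL config, the interval carries its own doc and default value and
// can be set per session through the usual runtime conf API:
spark.conf.set("spark.sql.streaming.stateStore.maintenanceInterval", "120s")
```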

### How was this patch tested?

Unit test.

Closes #30741 from viirya/maintenance-interval-sqlconfig.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 45af3c9)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreConf.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 8197ee3b15265d39f05f192934b7d7e661713eaa by dongjoon
[SPARK-33690][SQL] Escape meta-characters in showString

### What changes were proposed in this pull request?

This PR intends to escape meta-characters (e.g., \n and \t) in `Dataset.showString`.
Before this PR:
```
scala> Seq("aaa\nbbb\t\tccccc").toDF("value").show()
+--------------+
|         value|
+--------------+
|aaa
bbb ccccc|
+--------------+
```
After this PR:
```
+-----------------+
|            value|
+-----------------+
|aaa\nbbb\t\tccccc|
+-----------------+
```
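The escaping itself boils down to replacing the meta-characters with their two-character escape sequences before the cells are padded; a rough sketch of the idea (not the exact `Dataset.showString` code):

```scala
// Each meta-character becomes a visible backslash escape, so a cell can no
// longer break the ASCII table layout.
def escapeMetaCharacters(str: String): String =
  str.replace("\n", "\\n")
    .replace("\r", "\\r")
    .replace("\t", "\\t")

println(escapeMetaCharacters("aaa\nbbb\t\tccccc")) // prints aaa\nbbb\t\tccccc
```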

### Why are the changes needed?

For better output.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a unit test.

Closes #30647 from maropu/EscapeMetaInShow.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 8197ee3)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
Commit 6e862792fbc6c0916ad04f1c23dc4acbc5f5a53b by gurwls223
[SPARK-33723][SQL] ANSI mode: Casting String to Date should throw exception on parse error

### What changes were proposed in this pull request?

Currently, when casting a string as timestamp type in ANSI mode, Spark throws a runtime exception on parsing error.
However, the result for casting a string to date is always null. We should throw an exception on parsing error as well.
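For example, in `spark-shell` (a sketch; the exact exception type and message depend on the implementation):

```scala
spark.conf.set("spark.sql.ansi.enabled", true)

// A well-formed string still casts fine:
spark.sql("SELECT CAST('2020-12-15' AS DATE)").show()

// With this change, a malformed string throws at runtime instead of
// silently producing NULL, matching the existing timestamp behavior:
spark.sql("SELECT CAST('not-a-date' AS DATE)").show() // throws a parse exception
```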

### Why are the changes needed?

Add a missing feature for ANSI mode.

### Does this PR introduce _any_ user-facing change?

Yes. In ANSI mode, casting a string to date will throw an exception on a parsing error.

### How was this patch tested?

Unit test

Closes #30687 from gengliangwang/castDate.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 6e86279)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime-legacy.sql.out (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala (diff)
The file was modified docs/sql-ref-ansi-compliance.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/datetime.sql (diff)
Commit b135db3b1a5c0b2170e98b97f6160bcf55903799 by dongjoon
[SPARK-33757][INFRA][R][FOLLOWUP] Provide more simple solution

### What changes were proposed in this pull request?

This PR proposes a better solution for the R build failure on GitHub Actions.
The issue is solved in #30737 but I noticed the following two things.

* We can use the latest `usethis` if we install additional libraries on the GitHub Actions environment.
* For tests on AppVeyor, `usethis` is not necessary, so I partially revert the previous change.

### Why are the changes needed?

To provide a simpler solution.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Confirmed on GitHub Actions and AppVeyor on my account.

Closes #30753 from sarutak/followup-SPARK-33757.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: b135db3)
The file was modified appveyor.yml (diff)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 4d47ac4b4b20a475c2f416c7d614318b31323041 by wenchen
[SPARK-33705][SQL][TEST] Fix HiveThriftHttpServerSuite flakiness

### What changes were proposed in this pull request?
To fix these flaky tests:

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132345/testReport/
```
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.JDBC query execution
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.Checks Hive version
org.apache.spark.sql.hive.thriftserver.HiveThriftHttpServerSuite.SPARK-24829 Checks cast as float
```

The root cause here is a jar conflict issue: `NewCookie.isHttpOnly` is not defined in the conflicting `jsr311-api.jar`.
The transitive artifact `jsr311-api.jar` of `hadoop-client` is excluded on the Maven side. See https://issues.apache.org/jira/browse/SPARK-27179.

The Jenkins PR builder and GitHub Actions use `SBT` as the build tool.

First, the exclusion rule from Maven is not followed by SBT, so I could see `jsr311-api.jar` from the Maven cache being added to the classpath directly. **This seems to be a bug of the `sbt-pom-reader` plugin, but I'm not sure.**

Then I added an `ExcludeRule` for the `hive-thriftserver` module on the SBT side and did see `jsr311-api.jar` gone, but the CI jobs still failed with the same error.

I added a trace log in `ThriftHttpServlet`:

```
ERROR ThriftHttpServlet: !!!!!!!!! Suspect???????? --->
file:/home/jenkins/workspace/SparkPullRequestBuilder/assembly/target/scala-2.12/jars/jsr311-api-1.1.1.jar
```
And the log pointed out that the assembly phase copied it to `assembly/target/scala-2.12/jars/`, which is added to the classpath too. With the help of the SBT `dependencyTree` tool, I saw `jsr311-api` again as a transitive dependency of `jersey-core` from the `yarn` module with a `test` scope. So **this seems to be another bug on the SBT side, in the `sbt-assembly` plugin.** It copied a test-scope transitive artifact to the assembly output.

In this PR, I defined some rules in SparkBuild.scala to bypass the potential bugs from the SBT side.

First, exclude `jsr311` across the whole project, and then add it back separately to the YARN module on the SBT side.
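On the sbt side, that idea looks roughly like the following sketch (not the exact `SparkBuild.scala` change; the coordinates follow the `jsr311-api-1.1.1.jar` seen in the log above):

```scala
// Project-wide: never let the conflicting artifact onto the classpath.
excludeDependencies += ExclusionRule("javax.ws.rs", "jsr311-api")

// YARN module only: add it back explicitly where it is actually needed.
libraryDependencies += "javax.ws.rs" % "jsr311-api" % "1.1.1"
```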

Additionally, the HiveThriftServer2 suites were reworked to reduce flakiness too, but that is not related to the bugs I have found so far.

### Why are the changes needed?

To fix the flaky tests listed above.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Passing Jenkins and GitHub Actions.

Closes #30643 from yaooqinn/HiveThriftHttpServerSuite.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 4d47ac4)
The file was modified project/SparkBuild.scala (diff)
The file was modified resource-managers/yarn/pom.xml (diff)
The file was modified LICENSE-binary (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkThriftServerProtocolVersionsSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/UISeleniumSuite.scala (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified core/pom.xml (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified pom.xml (diff)
The file was removed sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/JdbcConnectionUriSuite.scala
Commit 9160d59ae379910ca3bbd04ee25d336afff28abd by gurwls223
[SPARK-33770][SQL][TESTS] Fix the `ALTER TABLE .. DROP PARTITION` tests that delete files out of partition path

### What changes were proposed in this pull request?
Modify the tests that add partitions with `LOCATION`, and where the number of nested folders in `LOCATION` doesn't match the number of partitioned columns. In that case, `ALTER TABLE .. DROP PARTITION` tries to access (delete) a folder outside the "base" path in `LOCATION`.

The problem belongs to Hive's MetaStore method `drop_partition_common`:
https://github.com/apache/hive/blob/8696c82d07d303b6dbb69b4d443ab6f2b241b251/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4876
which tries to delete empty partition sub-folders recursively, starting from the deepest partition sub-folder up to the base folder. In the case when the number of sub-folders is not equal to the number of partitioned columns `part_vals.size()`, the method will try to list and delete folders outside the base path.

### Why are the changes needed?
To fix test failures like https://github.com/apache/spark/pull/30643#issuecomment-743774733:
```
org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER TABLE .. ADD PARTITION Hive V1: SPARK-33521: universal type conversions of partition values
sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
at org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014)
...
Caused by: sbt.ForkMain$ForkError: org.apache.hadoop.hive.metastore.api.MetaException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897 does not exist
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partition_with_environment_context(HiveMetaStore.java:3381)
at sun.reflect.GeneratedMethodAccessor304.invoke(Unknown Source)
```

The issue can be reproduced by the following steps:
1. Create a base folder, for example: `/Users/maximgekk/tmp/part-location`
2. Create a sub-folder in the base folder and drop permissions for it:
```
$ mkdir /Users/maximgekk/tmp/part-location/aaa
$ chmod a-rwx /Users/maximgekk/tmp/part-location/aaa
$ ls -al /Users/maximgekk/tmp/part-location
total 0
drwxr-xr-x   3 maximgekk  staff    96 Dec 13 18:42 .
drwxr-xr-x  33 maximgekk  staff  1056 Dec 13 18:32 ..
d---------   2 maximgekk  staff    64 Dec 13 18:42 aaa
```
3. Create a table with a partition folder in the base folder:
```sql
spark-sql> create table tbl (id int) partitioned by (part0 int, part1 int);
spark-sql> alter table tbl add partition (part0=1,part1=2) location '/Users/maximgekk/tmp/part-location/tbl';
```
4. Try to drop this partition:
```
spark-sql> alter table tbl drop partition (part0=1,part1=2);
20/12/13 18:46:07 ERROR HiveClientImpl:
======================
Attempt to drop the partition specs in table 'tbl' database 'default':
Map(part0 -> 1, part1 -> 2)
In this attempt, the following partitions have been dropped successfully:

The remaining partitions have not been dropped:
[1, 2]
======================

Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa;
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing file:/Users/maximgekk/tmp/part-location/aaa;
```
The command fails because it tries to access the sub-folder `aaa`, which is outside the partition path `/Users/maximgekk/tmp/part-location/tbl`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the affected tests from local IDEA, which does not have access to folders outside the partition paths.

Closes #30752 from MaxGekk/fix-drop-partition-location.

Lead-authored-by: Max Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 9160d59)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableAddPartitionSuiteBase.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
Commit 817f58ddcb775dacbe1b4b2b99056a74a56f65e9 by wenchen
[SPARK-33768][SQL] Remove `retainData` from `AlterTableDropPartition`

### What changes were proposed in this pull request?
Remove the `retainData` parameter from the logical node `AlterTableDropPartition`.

### Why are the changes needed?
The `AlterTableDropPartition` command reflects the sql statement (see SqlBase.g4):
```
    | ALTER (TABLE | VIEW) multipartIdentifier
        DROP (IF EXISTS)? partitionSpec (',' partitionSpec)* PURGE?    #dropTablePartitions
```
but Spark doesn't allow specifying data retention. So, the parameter can be removed to improve code maintainability.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the test suite `DDLParserSuite`.

Closes #30748 from MaxGekk/remove-retainData.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 817f58d)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
Commit e7fe92f12991ce4ccc101c2cc01354201c9c5384 by wenchen
[SPARK-33546][SQL] Enable row format file format validation in CREATE TABLE LIKE

### What changes were proposed in this pull request?

[SPARK-33546] stated that there are three inconsistent behaviors for CREATE TABLE LIKE.

1. CREATE TABLE LIKE does not validate the user-specified hive serde. e.g., STORED AS PARQUET can't be used with ROW FORMAT SERDE.
2. CREATE TABLE LIKE requires STORED AS and ROW FORMAT SERDE to be specified together, which is not necessary.
3. CREATE TABLE LIKE does not respect the default hive serde.

This PR fixes No.1; after investigation, No.2 and No.3 turn out not to be issues.

Within Hive:

`CREATE TABLE abc ... ROW FORMAT SERDE 'xxx.xxx.SerdeClass'` (without STORED AS) will have the
following result: it uses the user-specified SerdeClass and fetches the default input/output format from the default textfile format.

```
SerDe Library:          xxx.xxx.SerdeClass
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
```

But `CREATE TABLE dst LIKE src ROW FORMAT SERDE 'xxx.xxx.SerdeClass'` (without STORED AS) will just ignore the user-specified SerdeClass and use (input, output, serdeClass) from the src table.

It's better to just throw an exception on such ambiguous behavior, so No.2 is not an issue; the PR adds some comments about this.

For No.3, `CreateTableLikeCommand` in fact uses the following logic to fall back to the src table's storageFormat when the current fileFormat.inputFormat is empty:

```
val newStorage = if (fileFormat.inputFormat.isDefined) {
      fileFormat
    } else {
      sourceTableDesc.storage.copy(locationUri = fileFormat.locationUri)
    }
```

If we tried to fill the new target table with `HiveSerDe.getDefaultStorage` when the file format and row format are not explicitly specified, it would break the CREATE TABLE LIKE semantics.

### Why are the changes needed?

Bug Fix.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added UT and Existing UT.

Closes #30705 from leanken/leanken-SPARK-33546.

Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: e7fe92f)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
Commit b7c82101352078fb10ab1822bc745c8b4fbb2590 by wenchen
[SPARK-33142][SPARK-33647][SQL][FOLLOW-UP] Add docs and test cases

### What changes were proposed in this pull request?
Addressed comments in PR #30567, including:
1. add test case for SPARK-33647 and SPARK-33142
2. add migration guide
3. add `getRawTempView` and `getRawGlobalTempView` to return the raw view info (i.e. TemporaryViewRelation)
4. other minor code clean

### Why are the changes needed?
Code clean and more test cases

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing and newly added test cases

Closes #30666 from linhongliu-db/SPARK-33142-followup.

Lead-authored-by: Linhong Liu <linhong.liu@databricks.com>
Co-authored-by: Linhong Liu <67896261+linhongliu-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: b7c8210)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
Commit a84c8d842ca027ab0f1b641146e81fc2782d150d by wenchen
[SPARK-33751][SQL] Migrate ALTER VIEW ... AS command to use UnresolvedView to resolve the identifier

### What changes were proposed in this pull request?

This PR migrates `ALTER VIEW ... AS` to use `UnresolvedView` to resolve the view identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).

The `TempViewOrV1Table` extractor in `ResolveSessionCatalog.scala` can now be removed as well.

### Why are the changes needed?

To use `UnresolvedView` for view resolution.

### Does this PR introduce _any_ user-facing change?

The exception message changes if a table is found instead of a view:
```
// OLD
"`tab1` is not a view"
```
```
// NEW
"tab1 is a table. 'ALTER VIEW ... AS' expects a view."
```

### How was this patch tested?

Updated existing tests.

Closes #30723 from imback82/alter_view_as_statement.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: a84c8d8)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
Commit cd0356df9e3cb8e8690a216b8adfac75bcf1365f by yumwang
[SPARK-33673][SQL] Avoid push down partition filters to ParquetScan for DataSourceV2

### What changes were proposed in this pull request?
As described in SPARK-33673, some test suites in `ParquetV2SchemaPruningSuite` will fail when `parquet.version` is set to 1.11.1, because Parquet returns empty results for non-existent columns since PARQUET-1765.

This PR changes `ParquetScanBuilder` to use `readDataSchema()` instead of `schema` when building `pushedParquetFilters`, to avoid pushing down partition filters to `ParquetScan` for `DataSourceV2`.

### Why are the changes needed?
Prepare for upgrading to Parquet 1.11.1.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?

- Pass the Jenkins or GitHub Action

- Manual test as follows:

```
mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.parquet.ParquetV2SchemaPruningSuite -Dparquet.version=1.11.1 test -pl sql/core -am
```

**Before**

```
Run completed in 3 minutes, 13 seconds.
Total number of tests run: 134
Suites: completed 2, aborted 0
Tests: succeeded 120, failed 14, canceled 0, ignored 0, pending 0
*** 14 TESTS FAILED ***
```

**After**

```
Run completed in 3 minutes, 46 seconds.
Total number of tests run: 134
Suites: completed 2, aborted 0
Tests: succeeded 134, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #30652 from LuciferYang/SPARK-33673.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(commit: cd0356d)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala (diff)
Commit bf2c88ccaebd8e27d9fc27c55c9955129541d3e1 by dongjoon
[SPARK-33716][K8S] Fix potential race condition during pod termination

### What changes were proposed in this pull request?

Check that the pod state is not pending or running even if there is a deletion timestamp.

### Why are the changes needed?

This can occur when the pod state and the deletion timestamp are not updated in sync by etcd, and we get a pod snapshot during an inconsistent view.
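In other words, the check takes roughly the following shape (a sketch of the idea using the fabric8 model, not the exact `ExecutorPodsSnapshot` code):

```scala
import java.util.Locale

import io.fabric8.kubernetes.api.model.Pod

// A deletion timestamp alone is not proof of termination: an inconsistent
// snapshot can carry the timestamp while the phase still says Pending or
// Running. Require both signals before treating the pod as deleted.
def isDeleted(pod: Pod): Boolean = {
  val phase = pod.getStatus.getPhase.toLowerCase(Locale.ROOT)
  pod.getMetadata.getDeletionTimestamp != null &&
    phase != "pending" && phase != "running"
}
```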

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual testing with a local Minikube on an overloaded computer that caused out-of-sync updates.

Closes #30693 from holdenk/SPARK-33716-decommissioning-race-condition-during-pod-snapshot.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: bf2c88c)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala (diff)
Commit 5f9a7fea06cbbb6bf2b40cc9b3aa4d539c996301 by wenchen
[SPARK-33428][SQL] Conv UDF use BigInt to avoid  Long value overflow

### What changes were proposed in this pull request?
Using a Long value to store the encoded value can overflow and return an unexpected result; this change uses BigInt instead of Long, which also makes the logic simpler.
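The overflow is easy to demonstrate in isolation (an illustrative snippet, not the actual `NumberConverter` code):

```scala
// 17 hex digits exceed Long.MaxValue (0x7FFFFFFFFFFFFFFF, 16 digits), so a
// Long accumulator wraps around while BigInt keeps the exact value.
val digits = "FFFFFFFFFFFFFFFFF"
val asLong = digits.foldLeft(0L)((acc, c) => acc * 16 + Character.digit(c, 16))
val asBigInt = digits.foldLeft(BigInt(0))((acc, c) => acc * 16 + Character.digit(c, 16))
println(asLong)   // -1 after wrapping around
println(asBigInt) // 295147905179352825855
```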

### Why are the changes needed?
Fix the value overflow issue.

### Does this PR introduce _any_ user-facing change?
Users can use the `conv` function to convert values bigger than `Long.MaxValue`.

### How was this patch tested?
Added UT

#### Benchmark
```
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements.  See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License.  You may obtain a copy of the License at
*
*    http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.spark.sql.execution.benchmark

import scala.util.Random

import org.apache.spark.benchmark.Benchmark
import org.apache.spark.sql.functions._
object ConvFuncBenchMark extends SqlBasedBenchmark {

  val charset =
    Array[String]("0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
      "A", "B", "C", "D", "E", "F", "G",
      "H", "I", "J", "K", "L", "M", "N",
      "O", "P", "Q", "R", "S", "T",
      "U", "V", "W", "X", "Y", "Z")

  def constructString(from: Int, length: Int): String = {
    val chars = charset.slice(0, from)
    (0 to length).map(x => {
      val v = Random.nextInt(from)
      chars(v)
    }).mkString("")
  }

  private def doBenchmark(cardinality: Long, length: Int, from: Int, toBase: Int): Unit = {
    spark.range(cardinality)
      .withColumn("str", lit(constructString(from, length)))
      .select(conv(col("str"), from, toBase))
      .noop()
  }

  /**
   * Main process of the whole benchmark.
   * Implementations of this method are supposed to use the wrapper method `runBenchmark`
   * for each benchmark scenario.
   */
  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
    val N = 1000000L
    val benchmark = new Benchmark("conv", N, output = output)
    benchmark.addCase("length 10 from 2 to 16") { _ =>
      doBenchmark(N, 10, 2, 16)
    }

    benchmark.addCase("length 10 from 2 to 10") { _ =>
      doBenchmark(N, 10, 2, 10)
    }

    benchmark.addCase("length 10 from 10 to 16") { _ =>
      doBenchmark(N, 10, 10, 16)
    }

    benchmark.addCase("length 10 from 10 to 36") { _ =>
      doBenchmark(N, 10, 10, 36)
    }

    benchmark.addCase("length 10 from 16 to 10") { _ =>
      doBenchmark(N, 10, 16, 10)
    }

    benchmark.addCase("length 10 from 16 to 36") { _ =>
      doBenchmark(N, 10, 16, 36)
    }

    benchmark.addCase("length 10 from 36 to 10") { _ =>
      doBenchmark(N, 10, 36, 10)
    }

    benchmark.addCase("length 10 from 36 to 16") { _ =>
      doBenchmark(N, 10, 36, 16)
    }

    //
    benchmark.addCase("length 20 from 10 to 16") { _ =>
      doBenchmark(N, 20, 10, 16)
    }

    benchmark.addCase("length 20 from 10 to 36") { _ =>
      doBenchmark(N, 20, 10, 36)
    }

    benchmark.addCase("length 30 from 10 to 16") { _ =>
      doBenchmark(N, 30, 10, 16)
    }

    benchmark.addCase("length 30 from 10 to 36") { _ =>
      doBenchmark(N, 30, 10, 36)
    }

    //
    benchmark.addCase("length 20 from 16 to 10") { _ =>
      doBenchmark(N, 20, 16, 10)
    }

    benchmark.addCase("length 20 from 16 to 36") { _ =>
      doBenchmark(N, 20, 16, 36)
    }

    benchmark.addCase("length 30 from 16 to 10") { _ =>
      doBenchmark(N, 30, 16, 10)
    }

    benchmark.addCase("length 30 from 16 to 36") { _ =>
      doBenchmark(N, 30, 16, 36)
    }

    benchmark.run()
  }

}
```

Result with patch:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.14.6
Intel(R) Core(TM) i5-8259U CPU  2.30GHz
conv:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
length 10 from 2 to 16                               54             73          18         18.7          53.6       1.0X
length 10 from 2 to 10                               43             47           5         23.5          42.5       1.3X
length 10 from 10 to 16                              39             47          12         25.5          39.2       1.4X
length 10 from 10 to 36                              38             42           3         26.5          37.7       1.4X
length 10 from 16 to 10                              39             41           3         25.7          38.9       1.4X
length 10 from 16 to 36                              36             41           4         27.6          36.3       1.5X
length 10 from 36 to 10                              38             40           2         26.3          38.0       1.4X
length 10 from 36 to 16                              37             39           2         26.8          37.2       1.4X
length 20 from 10 to 16                              36             39           2         27.4          36.5       1.5X
length 20 from 10 to 36                              37             39           2         27.2          36.8       1.5X
length 30 from 10 to 16                              37             39           2         27.0          37.0       1.4X
length 30 from 10 to 36                              36             38           2         27.5          36.3       1.5X
length 20 from 16 to 10                              35             38           2         28.3          35.4       1.5X
length 20 from 16 to 36                              34             38           3         29.2          34.3       1.6X
length 30 from 16 to 10                              38             40           2         26.3          38.1       1.4X
length 30 from 16 to 36                              37             38           1         27.2          36.8       1.5X
```
Result without patch:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_191-b12 on Mac OS X 10.14.6
Intel(R) Core(TM) i5-8259U CPU  2.30GHz
conv:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
length 10 from 2 to 16                               66            101          29         15.1          66.1       1.0X
length 10 from 2 to 10                               50             55           5         20.2          49.5       1.3X
length 10 from 10 to 16                              46             51           5         21.8          45.9       1.4X
length 10 from 10 to 36                              43             48           4         23.4          42.7       1.5X
length 10 from 16 to 10                              44             47           4         22.9          43.7       1.5X
length 10 from 16 to 36                              40             44           2         24.7          40.5       1.6X
length 10 from 36 to 10                              40             44           4         25.0          40.1       1.6X
length 10 from 36 to 16                              41             43           2         24.3          41.2       1.6X
length 20 from 10 to 16                              39             41           2         25.7          38.9       1.7X
length 20 from 10 to 36                              40             42           2         24.9          40.2       1.6X
length 30 from 10 to 16                              39             40           1         25.9          38.6       1.7X
length 30 from 10 to 36                              40             41           1         25.0          40.0       1.7X
length 20 from 16 to 10                              40             41           1         25.1          39.8       1.7X
length 20 from 16 to 36                              40             42           2         25.2          39.7       1.7X
length 30 from 16 to 10                              39             42           2         25.6          39.0       1.7X
length 30 from 16 to 36                              39             40           2         25.7          38.8       1.7X
```

Closes #30350 from AngersZhuuuu/SPARK-33428.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 5f9a7fe)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/NumberConverterSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConverter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala (diff)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala (diff)
Commit 839d6899adafd9a0695667656d00220d4665895d by wenchen
[SPARK-33733][SQL] PullOutNondeterministic should check and collect deterministic field

### What changes were proposed in this pull request?

The `deterministic` field is wider than `Nondeterministic`; we should keep the same range between the pull-out rule and the check-analysis rule.

### Why are the changes needed?

For example
```
select * from values(1), (4) as t(c1) order by java_method('java.lang.Math', 'abs', c1)
```

We will get an exception, since the `deterministic` field of `java_method` is false even though the expression is not a `Nondeterministic`:
```
Exception in thread "main" org.apache.spark.sql.AnalysisException: nondeterministic expressions are only allowed in
Project, Filter, Aggregate or Window, found:
java_method('java.lang.Math', 'abs', t.`c1`) ASC NULLS FIRST
in operator Sort [java_method(java.lang.Math, abs, c1#1) ASC NULLS FIRST], true
               ;;
```

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

Add test.

Closes #30703 from ulysses-you/SPARK-33733.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 839d689)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CallMethodViaReflection.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (diff)
Commit 82aca7eb8f2501dceaf610f1aaa86082153ef5ee by blue
[SPARK-33779][SQL] DataSource V2: API to request distribution and ordering on write

### What changes were proposed in this pull request?

This PR adds connector interfaces proposed in the [design doc](https://docs.google.com/document/d/1X0NsQSryvNmXBY9kcvfINeYyKC-AahZarUqg3nS1GQs/edit#) for SPARK-23889.

**Note**: This PR contains a subset of changes discussed in PR #29066.

### Why are the changes needed?

Data sources should be able to request a specific distribution and ordering of data on write. In particular, these scenarios are considered useful:
- global sort
- cluster data and sort within partitions
- local sort within partitions
- no sort

Please see the design doc above for a more detailed explanation of requirements.
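As a hedged sketch, a connector's `Write` could request the "cluster data, no sort" scenario above like this (usage inferred from the interfaces this commit adds):

```scala
import org.apache.spark.sql.connector.distributions.{Distribution, Distributions}
import org.apache.spark.sql.connector.expressions.{Expression, Expressions, SortOrder}
import org.apache.spark.sql.connector.write.{RequiresDistributionAndOrdering, Write}

class ClusteredWrite extends Write with RequiresDistributionAndOrdering {
  // Ask Spark to co-locate rows with the same `date` value in one partition.
  override def requiredDistribution(): Distribution =
    Distributions.clustered(Array[Expression](Expressions.identity("date")))

  // No ordering requirement within partitions.
  override def requiredOrdering(): Array[SortOrder] = Array.empty[SortOrder]
}
```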

### Does this PR introduce _any_ user-facing change?

This PR introduces public changes to the DS V2 by adding a logical write abstraction as we have on the read path as well as additional interfaces to represent distribution and ordering of data (please see the doc for more info).

The existing `Distribution` interface in the `read` package is read-specific and not flexible enough, as discussed in the design doc. The current proposal is to evolve these interfaces separately until they converge.

### How was this patch tested?

This patch adds only interfaces.

Closes #30706 from aokolnychyi/spark-23889-interfaces.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Ryan Blue <blue@apache.org>
(commit: 82aca7e)
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/connector/distributions/distributions.scala
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/NullOrdering.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/Distribution.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/Distributions.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/SortDirection.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/SortOrder.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/OrderedDistribution.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/UnspecifiedDistribution.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/Write.java
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expressions.java (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/WriteBuilder.java (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala (diff)
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/RequiresDistributionAndOrdering.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/distributions/ClusteredDistribution.java
Commit bb60fb1bbd97b70d60e42a0435e15862c3e3f97e by dongjoon
[SPARK-33779][SQL][FOLLOW-UP] Fix Java Linter error

### What changes were proposed in this pull request?

This PR removes unused imports.

### Why are the changes needed?

These changes are required to fix the build.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Via `dev/lint-java`.

Closes #30767 from aokolnychyi/fix-linter.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: bb60fb1)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/WriteBuilder.java (diff)
Commit 5885cc15cae9c9780530e235d2bd4bd6beda5dbb by hkarau
[SPARK-33261][K8S] Add a developer API for custom feature steps

### What changes were proposed in this pull request?

Add a developer API for custom driver & executor feature steps.

### Why are the changes needed?

While we allow templates for the basis of pod creation, some deployments need more flexibility in how the pods are configured. This adds a developer API for custom deployments.
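A hedged sketch of what a custom step could look like (the class and label are hypothetical; `configurePod` is the hook the `KubernetesFeatureConfigStep` trait exposes):

```scala
import io.fabric8.kubernetes.api.model.PodBuilder

import org.apache.spark.deploy.k8s.SparkPod
import org.apache.spark.deploy.k8s.features.KubernetesFeatureConfigStep

// A user-provided step that stamps a label onto every pod it configures;
// it would be wired in through the new developer API configuration.
class TeamLabelFeatureStep extends KubernetesFeatureConfigStep {
  override def configurePod(pod: SparkPod): SparkPod = {
    val labeled = new PodBuilder(pod.pod)
      .editOrNewMetadata()
        .addToLabels("team", "data-platform")
      .endMetadata()
      .build()
    SparkPod(labeled, pod.container)
  }
}
```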

### Does this PR introduce _any_ user-facing change?

New developer API.

### How was this patch tested?

Extended tests to verify that the custom step is applied when configured.

Closes #30206 from holdenk/SPARK-33261-allow-people-to-extend-pod-feature-steps.

Authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: 5885cc1)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkPod.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilderSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/KubernetesFeatureConfigStep.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilder.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/PodBuilderSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/submit/KubernetesDriverBuilderSuite.scala (diff)
Commit 412d86e711188ff1bd8a6387524131aa3c200503 by dongjoon
[SPARK-33771][SQL][TESTS] Fix Invalid value for HourOfAmPm when testing on JDK 14

### What changes were proposed in this pull request?

This PR fixes `Invalid value for HourOfAmPm` when testing on JDK 14.

### Why are the changes needed?

To be able to run the tests on JDK 14.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #30754 from wangyum/SPARK-33771.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 412d86e)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/TimestampFormatterSuite.scala (diff)
Commit f156718587fc33b9bf8e5abc4ae1f6fa0a5da887 by dongjoon
[SPARK-33777][SQL] Sort output of V2 SHOW PARTITIONS

### What changes were proposed in this pull request?
List partitions returned by the V2 `SHOW PARTITIONS` command in alphabetical order.

### Why are the changes needed?
To have the same behavior as:
1. V1 in-memory catalog, see https://github.com/apache/spark/blob/a28ed86a387b286745b30cd4d90b3d558205a5a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala#L546
2. V1 Hive catalogs, see https://github.com/apache/spark/blob/fab2995972761503563fa2aa547c67047c51bd33/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L715

### Does this PR introduce _any_ user-facing change?
Yes, after the changes, V2 SHOW PARTITIONS sorts its output.

### How was this patch tested?
Added new UT to the base trait `ShowPartitionsSuiteBase` which contains tests for V1 and V2.

Closes #30764 from MaxGekk/sort-show-partitions.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: f156718)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala (diff)
Commit 49d3256497cb47d03a3167a550fb9857bd3afdbd by dongjoon
[SPARK-33653][SQL] DSv2: REFRESH TABLE should recache the table itself

### What changes were proposed in this pull request?

This changes DSv2 refresh table semantics to also recache the target table itself.

### Why are the changes needed?

Currently "REFRESH TABLE" in DSv2 only invalidate all caches referencing the table. With #30403 merged which adds support for caching a DSv2 table, we should also recache the target table itself to make the behavior consistent with DSv1.

### Does this PR introduce _any_ user-facing change?

Yes, refreshing a table in DSv2 now also recaches the target table itself.

### How was this patch tested?

Added coverage of this new behavior in the existing UT for the v2 refresh table command.

Closes #30742 from sunchao/SPARK-33653.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 49d3256)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
Commit a99a47ca1df689377dbfbf4dd7258f59aee2be44 by gurwls223
[SPARK-33748][K8S] Respect environment variables and configurations for Python executables

### What changes were proposed in this pull request?

This PR proposes:

- Respect the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or the `spark.pyspark.python` and `spark.pyspark.driver.python` configurations, in Kubernetes just like other cluster types in Spark.

- Deprecate `spark.kubernetes.pyspark.pythonVersion` and guide users to set the environment variables and configurations for Python executables.
    NOTE that `spark.kubernetes.pyspark.pythonVersion` is already a no-op configuration without this PR. Default is `3` and other values are disallowed.

- In order for the Python executable settings to be used consistently, fix the `spark.archives` option to unpack into the current working directory in the driver of Kubernetes cluster mode. This behaviour is identical to YARN's cluster mode. By doing this, users can leverage Conda or virtualenv in cluster mode as below:

   ```bash
    conda create -y -n pyspark_conda_env -c conda-forge pyarrow pandas conda-pack
    conda activate pyspark_conda_env
    conda pack -f -o pyspark_conda_env.tar.gz
    PYSPARK_PYTHON=./environment/bin/python spark-submit --archives pyspark_conda_env.tar.gz#environment app.py
   ```

- Removed several unused or useless code paths such as `extractS3Key` and `renameResourcesToLocalFS`.

### Why are the changes needed?

- To provide consistent support of PySpark by using the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, or the `spark.pyspark.python` and `spark.pyspark.driver.python` configurations.
- To provide Conda and virtualenv support via `spark.archives` options.

### Does this PR introduce _any_ user-facing change?

Yes:

- `spark.kubernetes.pyspark.pythonVersion` is deprecated.
- `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, and `spark.pyspark.python` and `spark.pyspark.driver.python` configurations are respected.

### How was this patch tested?

Manually tested via:

```bash
minikube delete
minikube start --cpus 12 --memory 16384
kubectl create namespace spark-integration-test
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark-integration-test
EOF
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=spark-integration-test:spark --namespace=spark-integration-test
dev/make-distribution.sh --pip --tgz -Pkubernetes
resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh --spark-tgz `pwd`/spark-3.2.0-SNAPSHOT-bin-3.2.0.tgz  --service-account spark --namespace spark-integration-test
```

Unittests were also added.

Closes #30735 from HyukjinKwon/SPARK-33748.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: a99a47c)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (diff)
The file was modified docs/running-on-kubernetes.md (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/DriverCommandFeatureStepSuite.scala (diff)
The file was modified resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala (diff)
The file was added resource-managers/kubernetes/integration-tests/tests/python_executable_check.py
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/Utils.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/tests/py_container_checks.py (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ProcessUtils.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/DriverCommandFeatureStep.scala (diff)
Commit 366beda54a2911e59a994bfed9fb84a97aa2ab8b by wenchen
[SPARK-33785][SQL] Migrate ALTER TABLE ... RECOVER PARTITIONS to use UnresolvedTable to resolve the identifier

### What changes were proposed in this pull request?

This PR proposes to migrate `ALTER TABLE ... RECOVER PARTITIONS` to use `UnresolvedTable` to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).

Note that `ALTER TABLE ... RECOVER PARTITIONS` is not supported for v2 tables.

### Why are the changes needed?

The PR makes the resolution behavior consistent. For example,
```scala
sql("CREATE DATABASE test")
sql("CREATE TABLE spark_catalog.test.t (id bigint, val string) USING csv PARTITIONED BY (id)")
sql("CREATE TEMPORARY VIEW t AS SELECT 2")
sql("USE spark_catalog.test")
sql("ALTER TABLE t RECOVER PARTITIONS") // works fine
```
, but after this PR:
```
sql("ALTER TABLE t RECOVER PARTITIONS")
org.apache.spark.sql.AnalysisException: t is a temp view. 'ALTER TABLE ... RECOVER PARTITIONS' expects a table; line 1 pos 0
```
This is consistent with the behavior of other commands.

### Does this PR introduce _any_ user-facing change?

After this PR, `ALTER TABLE t RECOVER PARTITIONS` in the above example is resolved to a temp view `t` first instead of `spark_catalog.test.t`.

### How was this patch tested?

Updated existing tests.

Closes #30773 from imback82/alter_table_recover_part_v2.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 366beda)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
Commit 141e26d65ba92c96ce1aeaf4d93dc0bfbafda902 by wenchen
[SPARK-33767][SQL][TESTS] Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

### What changes were proposed in this pull request?
1. Move the `ALTER TABLE .. DROP PARTITION` parsing tests to `AlterTableDropPartitionParserSuite`.
2. Place the v1 tests for `ALTER TABLE .. DROP PARTITION` from `DDLSuite` and the v2 tests from `AlterTablePartitionV2SQLSuite` in the common trait `AlterTableDropPartitionSuiteBase`, so the tests run for V1, Hive V1 and V2 datasources (see the sketch below).
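
For illustration, a generic sketch of the shared-trait layout (illustrative names only, not the actual test code): common checks live once in a base trait, and each catalog flavor only supplies its own environment.

```scala
// Common checks, run by every concrete suite that mixes this trait in.
trait DropPartitionSuiteSketch {
  def version: String                  // e.g. "V1", "Hive V1" or "V2"
  def runSql(statement: String): Unit  // supplied by each concrete suite

  def checkDropPartition(): Unit =
    runSql("ALTER TABLE t DROP PARTITION (id = 1)")
}

// Each catalog-specific suite only provides its setup.
class V2DropPartitionSuiteSketch extends DropPartitionSuiteSketch {
  override def version: String = "V2"
  override def runSql(statement: String): Unit = println(s"[$version] $statement")
}
```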

### Why are the changes needed?
- The unification allows running common `ALTER TABLE .. DROP PARTITION` tests for DSv1, Hive DSv1 and DSv2.
- We can detect missing features and differences between DSv1 and DSv2 implementations.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running new test suites:
```
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *AlterTableDropPartitionParserSuite"
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #30747 from MaxGekk/unify-alter-table-drop-partition-tests.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 141e26d)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionSuiteBase.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableDropPartitionSuite.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableDropPartitionSuite.scala
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableDropPartitionSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionParserSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
Commit 03042529e3c7bfd03185e5d751086173766926c3 by gurwls223
[SPARK-33273][SQL] Fix a race condition in subquery execution

### What changes were proposed in this pull request?

If we call `SubqueryExec.executeTake`, it will call `SubqueryExec.execute` which will trigger the codegen of the query plan and create an RDD. However, `SubqueryExec` already has a thread (`SubqueryExec.relationFuture`) to execute the query plan, which means we have 2 threads triggering codegen of the same query plan at the same time.

Spark codegen is not thread-safe, as there are shared variables such as `HashAggregateExec.bufferVars`. The bug in `SubqueryExec` may lead to correctness issues.

Since https://issues.apache.org/jira/browse/SPARK-33119, `ScalarSubquery` calls `SubqueryExec.executeTake`, so flaky tests started to appear.

This PR fixes the bug by reimplementing https://github.com/apache/spark/pull/30016 . We should pass the number of rows we want to collect to `SubqueryExec` at planning time, so that we can use `executeTake` inside `SubqueryExec.relationFuture`, and the caller side should always call `SubqueryExec.executeCollect`. This PR also adds checks so that we can make sure only `SubqueryExec.executeCollect` is called.
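
For illustration, a simplified, self-contained model of the fixed design (not Spark's actual classes): the row limit is fixed at planning time, the background future is the only thread that executes the plan, and callers merely await that future.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Sketch: `executeTake` stands in for the non-thread-safe plan execution.
class SubquerySketch(executeTake: Int => Seq[Int], maxNumRows: Int) {
  // The only thread that ever runs the plan, so codegen is triggered once.
  private val relationFuture: Future[Seq[Int]] = Future(executeTake(maxNumRows))

  // Callers never execute the plan themselves; they just wait for the result.
  def executeCollect(): Seq[Int] = Await.result(relationFuture, Duration.Inf)
}
```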

### Why are the changes needed?

This fixes a correctness bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Ran `build/sbt "sql/testOnly *SQLQueryTestSuite -- -z scalar-subquery-select"` more than 10 times. Previously it failed; now it passes.

Closes #30765 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 0304252)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala (diff)
Commit 20f6d63bc109284f6f9daf5da20cb2fef560628a by gurwls223
[SPARK-33769][SQL] Improve the next-day function of the sql component to deal with Column type

### What changes were proposed in this pull request?

The proposal of this pull request is described in the JIRA ticket [SPARK-33769](https://issues.apache.org/jira/browse/SPARK-33769).

It proposes to improve the next_day function of the sql component to accept a Column for the dayOfWeek parameter.

### Why are the changes needed?

It makes this functionality easier to use.
The current signature of this function is:
> def next_day(date: Column, dayOfWeek: String): Column

It accepts the dayOfWeek parameter as a String. However, in some cases the day of week is held in a Column, with a different value for each row of the DataFrame.
A current workaround is to use the NextDay expression directly:
> NextDay(dateCol.expr, dayOfWeekCol.expr)

The proposition is to add another signature for this function:
> def next_day(date: Column, dayOfWeek: Column): Column

In fact, other functions in this Scala object already follow this pattern, for example:
> def date_sub(start: Column, days: Int): Column = date_sub(start, lit(days))
> def date_sub(start: Column, days: Column): Column = withExpr { DateSub(start.expr, days.expr) }

or

> def add_months(startDate: Column, numMonths: Int): Column = add_months(startDate, lit(numMonths))
> def add_months(startDate: Column, numMonths: Column): Column = withExpr { AddMonths(startDate.expr, numMonths.expr) }

This pull request is the same idea for the function next_day.
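
For illustration, a minimal usage sketch of the proposed overload, assuming a DataFrame `df` with a date column `d` and a string column `dow` holding day names such as "Mon":

```scala
import org.apache.spark.sql.functions.{col, next_day}

// Existing signature: the day of week is the same for every row.
val fixed = df.select(next_day(col("d"), "Mon"))

// Proposed overload: the day of week can vary per row.
val perRow = df.select(next_day(col("d"), col("dow")))
```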

### Does this PR introduce _any_ user-facing change?

Yes.
With this pull request, users of Spark will have a new signature of the function:
> def next_day(date: Column, dayOfWeek: Column): Column

But the existing function signature should still work:
> def next_day(date: Column, dayOfWeek: String): Column

So this change is backward compatible.

### How was this patch tested?

The unit tests of the next_day function have been enhanced to test the dayOfWeek parameter both as a String and as a Column.
I also added a test case for the existing signature where dayOfWeek is an invalid String; this should return null.

Closes #30761 from chongguang/SPARK-33769.

Authored-by: Chongguang LIU <chongguang.liu@laposte.fr>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 20f6d63)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala (diff)
Commit 58cb2bae747a09caff194007b5c60f19b84f7c40 by gurwls223
[SPARK-33752][SQL] Avoid the getSimpleMessage of AnalysisException adds semicolon repeatedly

### What changes were proposed in this pull request?
The current `getSimpleMessage` of `AnalysisException` may add a semicolon repeatedly. For example:
`select decode()`

The output will be:
```
org.apache.spark.sql.AnalysisException
Invalid number of arguments for function decode. Expected: 2; Found: 0;; line 1 pos 7
```
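
For illustration, a minimal sketch of the deduplication idea (a hypothetical helper, not the actual patch): strip any trailing semicolon from the base message before the position suffix is appended, so that exactly one separator remains.

```scala
// Hypothetical sketch: normalize the message so that appending the position
// information yields a single "; line ... pos ..." separator.
def simpleMessage(message: String, line: Int, startPosition: Int): String = {
  val base = message.stripSuffix(";")
  s"$base; line $line pos $startPosition"
}

// simpleMessage("... Expected: 2; Found: 0;", 1, 7)
// => "... Expected: 2; Found: 0; line 1 pos 7"
```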

### Why are the changes needed?
This fixes a bug: the error message ends up with a duplicated semicolon (`;;`).

### Does this PR introduce _any_ user-facing change?
Yes, the message of `AnalysisException` will be correct.

### How was this patch tested?
Jenkins test.

Closes #30724 from beliefer/SPARK-33752.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 58cb2ba)
The file was modified sql/core/src/test/resources/sql-tests/results/cte-nested.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/AnalysisException.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedWriteSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/select_having.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by-filter.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/csv-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-basic.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesSuiteBase.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime-legacy.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/regexp-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/widenSetOperationTypes.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/strings.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-except-all.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/intersect-all.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/json-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/string-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show_columns.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/count.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show-views.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-window.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/invalid-correlation.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show-tables.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/except-all.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part3.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/with.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by-ordinal.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/parse-schema-string.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/change-column.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/extract.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/limit.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-intersect-all.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/describe.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/limit.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/window.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part3.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part3.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/numeric.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/create_view.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-select_having.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-pivot.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/pivot.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/subquery/negative-cases/subq-input-typecheck.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-group-analytics.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-group-by.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/having.sql.out (diff)
Commit 23083aa594360938c611a45794405d81e59ecaf1 by wenchen
[SPARK-33758][SQL] Prune unrequired partitionings from AliasAwareOutputPartitionings when some columns are dropped from projection

### What changes were proposed in this pull request?
This PR prunes the unrequired output partitionings in cases where columns are dropped by Project/Aggregate nodes, etc.

### Why are the changes needed?
Consider this query:
    select t1.id from t1 JOIN t2 on t1.id = t2.id

This query will have a top-level Project node that projects only t1.id. But the outputPartitioning of this Project node will be: PartitioningCollection(HashPartitioning(t1.id), HashPartitioning(t2.id)).

Since the t2.id column is not propagated, HashPartitioning(t2.id) can be dropped from the output partitioning of the Project node.
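
For illustration, a simplified, self-contained model of the pruning idea (not Spark's actual Partitioning API): a partitioning is kept only if every column it references survives the projection.

```scala
// Simplified model: a hash partitioning is described by the columns it hashes on.
case class HashPartitioningSketch(columns: Set[String])

// Keep only the partitionings whose referenced columns are all still projected.
def prunePartitionings(
    partitionings: Seq[HashPartitioningSketch],
    projected: Set[String]): Seq[HashPartitioningSketch] =
  partitionings.filter(_.columns.subsetOf(projected))

// For the query above, projecting only t1.id drops HashPartitioning(t2.id):
prunePartitionings(
  Seq(HashPartitioningSketch(Set("t1.id")), HashPartitioningSketch(Set("t2.id"))),
  projected = Set("t1.id"))
// => Seq(HashPartitioningSketch(Set("t1.id")))
```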

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UTs.

Closes #30762 from prakharjain09/SPARK-33758-prune-partitioning.

Authored-by: Prakhar Jain <prakharjain09@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 23083aa)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala (diff)
Commit 40c37d69fd003ed6079ee8c139dba5c15915c568 by wenchen
[SPARK-33617][SQL][FOLLOWUP] refine the default parallelism SQL config

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/30559 . The default parallelism config in Spark core is not good, as it's unclear where it applies. To not inherit this problem in Spark SQL, this PR refines the default parallelism SQL config, to make it clear that it only applies to leaf nodes.

### Why are the changes needed?

Make the config clearer.
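
For illustration, a hedged usage example; the config name `spark.sql.leafNodeDefaultParallelism` is stated here as an assumption about the refined config:

```scala
// Assuming the refined config name: it applies only to leaf nodes such as
// spark.range, not to shuffles or other operators.
spark.conf.set("spark.sql.leafNodeDefaultParallelism", "8")
val numParts = spark.range(0, 100).rdd.getNumPartitions  // expected: 8
```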

### Does this PR introduce _any_ user-facing change?

It changes an unreleased config.

### How was this patch tested?

Existing tests.

Closes #30736 from cloud-fan/follow.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 40c37d6)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaMergeUtils.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/LocalTableScanExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FilePartition.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkPlanSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala (diff)
Commit 4d56d438386049b5f481ec83b69e3c89807be201 by dongjoon
[SPARK-33735][SQL] Handle UPDATE in ReplaceNullWithFalseInPredicate

### What changes were proposed in this pull request?

This PR adds `UpdateTable` to supported plans in `ReplaceNullWithFalseInPredicate`.

### Why are the changes needed?

This change allows Spark to optimize update conditions like we optimize filters.
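
For illustration, a hedged example (assuming a v2 table `t` whose catalog supports UPDATE): a null condition can never evaluate to true, so the optimizer can rewrite it to false, exactly as it already does for filter conditions.

```scala
// Before: UpdateTable(..., condition = Literal(null, BooleanType))
// After ReplaceNullWithFalseInPredicate: condition = FalseLiteral, so the
// update can be short-circuited without touching any row.
spark.sql("UPDATE t SET s = 'x' WHERE null")
```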

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR extends the existing test cases to also cover `UpdateTable`.

Closes #30787 from aokolnychyi/spark-33735.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 4d56d43)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala (diff)
Commit 87c58367cd8b1815feef754695631ce08c3cde8b by dongjoon
[SPARK-22256][MESOS] Introduce spark.mesos.driver.memoryOverhead

### What changes were proposed in this pull request?
This is a simple change to support allocating a specified amount of overhead memory for the driver's Mesos container. This is already supported for executors.

### Why are the changes needed?
This is needed to keep the driver process from exceeding memory limits and being killed off when running on Mesos.

### Does this PR introduce _any_ user-facing change?
Yes, it adds a `spark.mesos.driver.memoryOverhead` configuration option. Documentation changes for this option are included in the PR.
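
For illustration, a minimal configuration sketch (the value is a placeholder):

```scala
import org.apache.spark.SparkConf

// Reserve 512 MiB of non-heap overhead for the driver's Mesos container,
// mirroring the existing spark.mesos.executor.memoryOverhead setting.
val conf = new SparkConf()
  .set("spark.mesos.driver.memoryOverhead", "512")
```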

### How was this patch tested?
Test cases covering allocation of driver memory overhead are included in the changes.

### Other notes
This is a second attempt to get this change reviewed, accepted and merged. The original pull request was closed as stale back in January: https://github.com/apache/spark/pull/21006.
For this pull request, I took the original change by pmackles, rebased it onto the current master branch, and added a test case that was requested in the original code review.
I'm happy to make any further edits or do anything needed so that this can be included in a future Spark release. I keep having to build custom Spark distributions so that we can use Spark within our Mesos clusters.

Closes #30739 from dmcwhorter/dmcwhorter-SPARK-22256.

Lead-authored-by: David McWhorter <david_mcwhorter@premierinc.com>
Co-authored-by: Paul Mackles <pmackles@adobe.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 87c5836)
The file was modified docs/running-on-mesos.md (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala (diff)
The file was modified resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala (diff)
Commit 3dfdcf4f92ef5e739f15c22c93d673bb2233e617 by gurwls223
[SPARK-33788][SQL] Throw NoSuchPartitionsException from HiveExternalCatalog.dropPartitions()

### What changes were proposed in this pull request?
Throw `NoSuchPartitionsException` from `ALTER TABLE .. DROP PARTITION` for non-existing partitions of a table in the V1 Hive external catalog.

### Why are the changes needed?
The behaviour of the Hive external catalog deviates from the V1/V2 in-memory catalogs, which throw `NoSuchPartitionsException`. To improve the user experience with Spark SQL, it is better to throw the same exception.

### Does this PR introduce _any_ user-facing change?
Yes, the command throws `NoSuchPartitionsException` instead of the generic `AnalysisException`.
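
For illustration, a hedged example, assuming a partitioned Hive table `tbl` that has no partition `p = 2`:

```scala
// Previously this surfaced as a generic AnalysisException; with this change
// the Hive catalog throws NoSuchPartitionsException, matching the in-memory
// V1/V2 catalogs.
spark.sql("ALTER TABLE tbl DROP PARTITION (p = 2)")
// => NoSuchPartitionsException (message details elided)
```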

### How was this patch tested?
By running tests for `ALTER TABLE .. DROP PARTITION`:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #30778 from MaxGekk/hive-drop-partition-exception.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 3dfdcf4)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableDropPartitionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableDropPartitionSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableDropPartitionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionSuiteBase.scala (diff)
Commit dd042f58e7a0fd2289f6889c324c0d5e4c18ad7f by gurwls223
[SPARK-33796][DOCS] Show hidden text from the left menu of Spark Doc

### What changes were proposed in this pull request?

If the text in the left menu of the Spark documentation is too long, it gets hidden.
![sql1](https://user-images.githubusercontent.com/1097932/102249583-5ae7a580-3eb7-11eb-813c-f2e2fe019d28.jpeg)

This PR is to fix the style issue.

### Why are the changes needed?

Improve the UI of Spark documentation.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Manual test
After changes:
![sql2](https://user-images.githubusercontent.com/1097932/102249603-5fac5980-3eb7-11eb-806d-4e7b8248e6b6.jpeg)

Closes #30786 from gengliangwang/fixDocStyle.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: dd042f5)
The file was modified docs/css/main.css (diff)
Commit ddff94fd32f85072cbc5c752c337f3b89ae00bed by gurwls223
[SPARK-33793][TESTS] Introduce withExecutor to ensure proper cleanup in tests

### What changes were proposed in this pull request?
This PR introduces a helper method `withExecutor` that handles the creation of an Executor object and ensures that it is always stopped in a finally block. The tests in ExecutorSuite have been refactored to use this method.
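
For illustration, a generic sketch of the loan pattern the helper follows (illustrative types, not the actual test-helper signature):

```scala
// Anything with a stop() method can be loaned out safely.
trait Stoppable { def stop(): Unit }

// The resource is always stopped, even if the test body throws.
def withStoppable[R <: Stoppable, T](create: => R)(body: R => T): T = {
  val resource = create
  try body(resource) finally resource.stop()
}
```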

### Why are the changes needed?
Recently an issue was discovered where leaked Executors (ones not explicitly stopped after a test) can cause other tests to fail because the JVM is killed after 10 minutes. It is therefore crucial that tests always stop the Executor. By introducing this helper method, a simple pattern is established that can be easily adopted in new tests, reducing the risk of regressions.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran `ExecutorSuite` locally.

Closes #30783 from sander-goos/SPARK-33793-close-executors.

Authored-by: Sander Goos <sander.goos@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: ddff94f)
The file was modified core/src/test/scala/org/apache/spark/executor/ExecutorSuite.scala (diff)