Changes

Summary

  1. [SPARK-34434][SQL] Mention DS rebase options in `SparkUpgradeException` (commit: aca6db1) (details)
  2. [SPARK-34431][CORE] Only load `hive-site.xml` once (commit: 4fd3247) (details)
  3. [SPARK-34424][SQL][TESTS] Fix failures of HiveOrcHadoopFsRelationSuite (commit: 0316105) (details)
  4. [SPARK-33210][SQL][DOCS][FOLLOWUP] Fix descriptions of the SQL configs for the parquet INT96 rebase modes (commit: 1a11fe5) (details)
  5. [SPARK-34426][K8S][TESTS] Add driver and executors POD logs to integration tests log when the test fails (commit: 5f91245) (details)
  6. [SPARK-34333][SQL] Fix PostgresDialect to handle money types properly (commit: dd6383f) (details)
  7. [SPARK-34451][SQL] Add alternatives for datetime rebasing SQL configs and deprecate legacy configs (commit: 5957bc1) (details)
  8. [SPARK-33763] Add metrics for better tracking of dynamic allocation (commit: 76e5d75) (details)
  9. [SPARK-34446][SS][DOCS] Update doc for stream-stream join (full outer + left semi) (commit: a575e80) (details)
  10. [SPARK-34456][SQL] Remove unused write options from BatchWriteHelper (commit: 44a9aed) (details)
  11. [SPARK-33736][SQL] Handle MERGE in ReplaceNullWithFalseInPredicate (commit: 1ad3432) (details)
  12. [SPARK-34433][DOCS] Lock Jekyll version by Gemfile and Bundler (commit: bdcad33) (details)
  13. [SPARK-34455][SQL] Deprecate `spark.sql.legacy.replaceDatabricksSparkAvro.enabled` (commit: 7b549c3) (details)
  14. [SPARK-34437][SQL][DOCS] Update Spark SQL guide about the rebasing DS options and SQL configs (commit: b58f097) (details)
  15. [SPARK-34449][BUILD] Upgrade Jetty to fix CVE-2020-27218 (commit: 5167228) (details)
  16. [SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly (commit: c925e4d) (details)
  17. [SPARK-34394][SQL] Unify output of SHOW FUNCTIONS and pass output attributes properly (commit: edccf96) (details)
  18. [SPARK-33739][SQL] Jobs committed through the S3A Magic committer don't track bytes (commit: ff5115c) (details)
  19. [SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase (commit: 2787328) (details)
  20. [SPARK-34454][SQL] Mark legacy SQL configs as internal (commit: 8f7ec4b) (details)
  21. [SPARK-34467][BUILD] Upgrade Zstd-jni to 1.4.8-4 (commit: 331c6fd) (details)
  22. [SPARK-34465][SQL] Rename v2 alter table exec nodes (commit: cad469d) (details)
  23. [MINOR][SQL][DOCS] Fix the comments in the example at window function (commit: 26548ed) (details)
Commit aca6db1868c7fd7632b490e5f20254989b270e8e by dhyun
[SPARK-34434][SQL] Mention DS rebase options in `SparkUpgradeException`

### What changes were proposed in this pull request?
Mention the DS options introduced by https://github.com/apache/spark/pull/31529 and by https://github.com/apache/spark/pull/31489 in `SparkUpgradeException`.

### Why are the changes needed?
To improve user experience with Spark SQL. Before the changes, the error message recommended setting SQL configs, but the configs cannot help in some situations (see the PRs for more details).

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, the error message is:

_org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set the SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInRead' or the datasource option 'datetimeRebaseMode' to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during reading. To read the datetime values as it is, set the SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInRead' or the datasource option 'datetimeRebaseMode' to 'CORRECTED'._
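
For illustration, a minimal sketch of reading such files with the datasource option mentioned in the message (the path is a placeholder):

```scala
// Read legacy Parquet data with the rebase option from the error message above.
// "/tmp/legacy-parquet" is a placeholder path.
val df = spark.read
  .option("datetimeRebaseMode", "CORRECTED") // or "LEGACY" to rebase the values
  .parquet("/tmp/legacy-parquet")
```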

### How was this patch tested?
1. By checking coding style: `./dev/scalastyle`
2. By running the related test suite:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ParquetRebaseDatetimeV1Suite"
```

Closes #31562 from MaxGekk/rebase-upgrade-exception.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: aca6db1)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala (diff)
Commit 4fd3247bca400f31b0175813df811352b906acbf by dhyun
[SPARK-34431][CORE] Only load `hive-site.xml` once

### What changes were proposed in this pull request?
Lazily load Hive's configuration properties from `hive-site.xml` only once.
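
A hedged sketch of the load-once pattern this describes (not the actual `SparkHadoopUtil` code; the loader body is a placeholder):

```scala
// Parse hive-site.xml at most once, on first access, and reuse the result afterwards.
object HiveSiteProps {
  lazy val properties: Map[String, String] = parseHiveSite()

  private def parseHiveSite(): Map[String, String] = {
    // Placeholder: read hive-site.xml from the classpath and return its key/value pairs.
    Map.empty
  }
}
```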

### Why are the changes needed?
It is expensive to parse the same file over and over.

### Does this PR introduce _any_ user-facing change?
Should not. The changes can improve performance slightly.

### How was this patch tested?
By existing test suites such as `SparkContextSuite`.

Closes #31556 from MaxGekk/load-hive-site-once.

Authored-by: herman <herman@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 4fd3247)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala (diff)
Commit 03161055de0c132070354407160553363175c4d7 by gurwls223
[SPARK-34424][SQL][TESTS] Fix failures of HiveOrcHadoopFsRelationSuite

### What changes were proposed in this pull request?
Modify `RandomDataGenerator.forType()` to allow generation of dates/timestamps that are valid in both the Julian and Proleptic Gregorian calendars. Currently, the function can produce a date (for example `1582-10-06`) which is valid in the Proleptic Gregorian calendar but cannot be saved to ORC files as is, since the ORC format (the ORC libraries, in fact) assumes the Julian calendar. So, Spark shifts `1582-10-06` to the next valid date `1582-10-15` while saving it to ORC files. As a consequence, the test fails because it compares the original date `1582-10-06` with the date `1582-10-15` loaded back from the ORC files.

In this PR, I propose to generate dates/timestamps that are valid in both calendars for the ORC datasource until SPARK-34440 is resolved.
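
A hedged sketch of the idea (not the actual `RandomDataGenerator` change): re-draw random dates until the value falls outside the Julian-to-Gregorian cut-over gap, so it is valid in both calendars.

```scala
import java.time.LocalDate
import scala.util.Random

// Re-draw until the date is outside the 1582-10-05..1582-10-14 gap,
// which only exists in the Proleptic Gregorian calendar.
def randomDualCalendarDate(rand: Random): LocalDate = {
  val minDay = LocalDate.of(1, 1, 1).toEpochDay
  val maxDay = LocalDate.of(9999, 12, 31).toEpochDay
  def draw(): LocalDate =
    LocalDate.ofEpochDay(minDay + (rand.nextDouble() * (maxDay - minDay)).toLong)

  var date = draw()
  while (!date.isBefore(LocalDate.of(1582, 10, 5)) && !date.isAfter(LocalDate.of(1582, 10, 14))) {
    date = draw()
  }
  date
}
```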

### Why are the changes needed?
The changes fix failures of `HiveOrcHadoopFsRelationSuite`. For instance, the test "test all data types" fails with the seed **610710213676**:
```
== Results ==
!== Correct Answer - 20 ==    == Spark Answer - 20 ==
struct<index:int,col:date>   struct<index:int,col:date>
...
![9,1582-10-06]               [9,1582-10-15]
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suite:
```
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *HiveOrcHadoopFsRelationSuite"
```

Closes #31552 from MaxGekk/fix-HiveOrcHadoopFsRelationSuite.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 0316105)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala (diff)
Commit 1a11fe55017a79016dd138dd2afb4edd0a6cef2f by gurwls223
[SPARK-33210][SQL][DOCS][FOLLOWUP] Fix descriptions of the SQL configs for the parquet INT96 rebase modes

### What changes were proposed in this pull request?
Fix descriptions of the SQL configs `spark.sql.legacy.parquet.int96RebaseModeInRead` and `spark.sql.legacy.parquet.int96RebaseModeInWrite`, and mention `EXCEPTION` as the default value.

### Why are the changes needed?
This fixes incorrect descriptions that can mislead users.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running `./dev/scalastyle`.

Closes #31557 from MaxGekk/int96-exception-by-default-followup.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 1a11fe5)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 5f91245cc2242b9e9c7ba374594014aff3046d4a by kabhwan.opensource
[SPARK-34426][K8S][TESTS] Add driver and executors POD logs to integration tests log when the test fails

### What changes were proposed in this pull request?

This PR introduces a new protected method in `SparkFunSuite` which is only called when a test fails and can be used to collect logs for the failed test. In this PR it is implemented for the Kubernetes tests in the `KubernetesSuite` class, which collects all the POD logs and writes them out.

This unfortunately cannot be realized with a simple "after" method, as the test outcome is not available there.

Moreover, this PR removes `appLocator` as a method argument, as `appLocator` is available as a member variable.
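
As a hedged sketch of the pattern (not the actual `SparkFunSuite` change), a ScalaTest `withFixture` override can see the outcome, which an "after" hook cannot:

```scala
import org.scalatest.Outcome
import org.scalatest.funsuite.AnyFunSuite

// Mix-in sketch: only collect extra logs when the test did not succeed.
trait LogOnFailure extends AnyFunSuite {
  protected def logForFailedTest(): Unit = ()   // e.g. dump driver/executor POD logs

  override def withFixture(test: NoArgTest): Outcome = {
    val outcome = super.withFixture(test)
    if (!outcome.isSucceeded) logForFailedTest()
    outcome
  }
}
```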

### Why are the changes needed?

Currently, both the driver and executor logs are lost.

In [developer-tools](https://spark.apache.org/developer-tools.html) there is a hint:
"Getting logs from the pods and containers directly is an exercise left to the reader."

But when a test is executed by Jenkins and a failure happens, we really need the POD logs to analyze the problem.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

By integration testing. I checked what happens if one test fails; the output is:

```
21/02/14 11:05:34.261 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite:

===== EXTRA LOGS FOR THE FAILED TEST

21/02/14 11:05:34.278 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: BEGIN driver POD log
++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.3 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///opt/spark/tests/decommissioning.py
21/02/14 10:02:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting decom test
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/14 10:02:29 INFO SparkContext: Running Spark version 3.2.0-SNAPSHOT
21/02/14 10:02:29 INFO ResourceUtils: ==============================================================
21/02/14 10:02:29 INFO ResourceUtils: No custom resources configured for spark.driver.
21/02/14 10:02:29 INFO ResourceUtils: ==============================================================
...
21/02/14 10:03:17 INFO ShutdownHookManager: Deleting directory /var/data/spark-fa6961ed-a2c1-444c-bfeb-20e63ba0b5cf/spark-ab4b0287-6e24-4b39-837e-9b0b62c1f26f
21/02/14 10:03:17 INFO ShutdownHookManager: Deleting directory /tmp/spark-d6b11e7d-6a03-4a1d-8559-37cb853319bf

21/02/14 11:05:34.279 ScalaTest-main-running-KubernetesSuite INFO KubernetesSuite: END driver POD log
```

Closes #31561 from attilapiros/SPARK-34426.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(commit: 5f91245)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/SparkConfPropagateSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/RTestsSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/PythonTestsSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/SparkFunSuite.scala (diff)
Commit dd6383f0a3917346c0e29252d3e160ed9a77144c by sarutak
[SPARK-34333][SQL] Fix PostgresDialect to handle money types properly

### What changes were proposed in this pull request?

This PR changes the type mapping for the `money` and `money[]` types for PostgreSQL.
Currently, Spark tries to convert those types to `DoubleType` and `ArrayType` of `double` respectively,
but the JDBC driver does not seem to be able to handle those types properly.

https://github.com/pgjdbc/pgjdbc/issues/100
https://github.com/pgjdbc/pgjdbc/issues/1405

Due to these issues, we can get errors like the following.

money type.
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.204 executor driver): org.postgresql.util.PSQLException: Bad value for type double : 1,000.00
[info] at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
[info] at org.postgresql.jdbc.PgResultSet.getDouble(PgResultSet.java:2432)
[info] at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$5(JdbcUtils.scala:418)
```

money[] type.
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.204 executor driver): org.postgresql.util.PSQLException: Bad value for type double : $2,000.00
[info] at org.postgresql.jdbc.PgResultSet.toDouble(PgResultSet.java:3104)
[info] at org.postgresql.jdbc.ArrayDecoding$5.parseValue(ArrayDecoding.java:235)
[info] at org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:122)
[info] at org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:764)
[info] at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:310)
[info] at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:171)
[info] at org.postgresql.jdbc.PgArray.getArray(PgArray.java:111)
```

For the money type, a known workaround is to treat it as a string, so this PR does that.
For money[], however, there is no reasonable workaround, so this PR removes the support.
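
A hedged sketch of the mapping idea (simplified; the real `PostgresDialect.getCatalystType` has more parameters):

```scala
import org.apache.spark.sql.types.{DataType, StringType}

// Map the PostgreSQL "money" type name to StringType instead of DoubleType;
// return None to fall back to the default handling for other type names.
def catalystTypeForPostgres(typeName: String): Option[DataType] =
  if (typeName == "money") Some(StringType) else None
```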

### Why are the changes needed?

This is a bug.

### Does this PR introduce _any_ user-facing change?

Yes. Once this PR is merged, the money type is mapped to `StringType` rather than `DoubleType`, and the support for money[] is stopped.
For the money type, values below one thousand, `$100.00` for instance, work without this change, so I also updated the migration guide because it's a behavior change for such small values.
On the other hand, money[] seems not to work with any value, but it is mentioned in the migration guide just in case.

### How was this patch tested?

New test.

Closes #31442 from sarutak/fix-for-money-type.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: dd6383f)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala (diff)
Commit 5957bc18a1dfdeeca07d3aeb4fc941927b772fc1 by wenchen
[SPARK-34451][SQL] Add alternatives for datetime rebasing SQL configs and deprecate legacy configs

### What changes were proposed in this pull request?
Move the datetime rebase SQL configs out of the `legacy` namespace by:
1. Renaming the existing rebase configs, e.g. `spark.sql.legacy.parquet.datetimeRebaseModeInRead` -> `spark.sql.parquet.datetimeRebaseModeInRead`.
2. Adding the legacy configs as alternatives.
3. Deprecating the legacy rebase configs.

### Why are the changes needed?
The rebasing SQL configs like `spark.sql.legacy.parquet.datetimeRebaseModeInRead` can be used not only for migration from previous Spark versions but also to read/write datetime columns saved by other systems/frameworks/libs. So, the configs shouldn't be considered legacy configs.

### Does this PR introduce _any_ user-facing change?
Should not. Users will see a warning if they still use one of the legacy configs.

### How was this patch tested?
1. Manually checking new configs:
```scala
scala> spark.conf.get("spark.sql.parquet.datetimeRebaseModeInRead")
res0: String = EXCEPTION

scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
21/02/17 14:57:10 WARN SQLConf: The SQL config 'spark.sql.legacy.parquet.datetimeRebaseModeInRead' has been deprecated in Spark v3.2 and may be removed in the future. Use 'spark.sql.parquet.datetimeRebaseModeInRead' instead.

scala> spark.conf.get("spark.sql.parquet.datetimeRebaseModeInRead")
res2: String = LEGACY
```
2. By running a datetime rebasing test suite:
```
$ build/sbt "test:testOnly *ParquetRebaseDatetimeV1Suite"
```

Closes #31576 from MaxGekk/rebase-confs-alternatives.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 5957bc1)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala (diff)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/sources/HadoopFsRelationTest.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroOutputWriter.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRebaseDatetimeSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (diff)
Commit 76e5d75e369609b96248a34bc6015ec61936e652 by hkarau
[SPARK-33763] Add metrics for better tracking of dynamic allocation

### What changes were proposed in this pull request?

This PR adds the following metrics to track executor remove reasons during dynamic allocation (a registration sketch follows the list):
- `numberExecutorsGracefullyDecommissioned`: number of executors which reached the finished decommissioning state and shut themselves down cleanly
- `numberExecutorsDecommissionUnfinished`: executors which were asked to decommission but stopped without reaching the finished decommissioning state
- `numberExecutorsKilledByDriver`: executors killed by the driver (requested to stop)
- `numberExecutorsExitedUnexpectedly`: executors which exited without a driver request
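
A hedged sketch of how such gauges are typically wired into Spark's Dropwizard-based metrics system (illustrative names; not the actual `ExecutorAllocationManager` source):

```scala
import com.codahale.metrics.{Gauge, MetricRegistry}

// Register one of the new counters as a gauge; the () => Long supplier is assumed
// to read the current value tracked by the allocation manager.
class ExecutorAllocationMetricsSketch(gracefullyDecommissioned: () => Long) {
  val metricRegistry = new MetricRegistry()
  metricRegistry.register(
    MetricRegistry.name("numberExecutorsGracefullyDecommissioned"),
    new Gauge[Long] { override def getValue: Long = gracefullyDecommissioned() })
}
```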

### Why are the changes needed?

To better support monitoring of dynamic allocation with these metrics.

### Does this PR introduce _any_ user-facing change?

Yes. The new metrics will be available for monitoring.

### How was this patch tested?

With unit and integration tests.

Finally manually checked the new metrics in jconsole:
<img width="1054" alt="jmx" src="https://user-images.githubusercontent.com/2017933/107458686-de8adf00-6b54-11eb-86f7-41faf2fb638f.png">

Closes #31450 from attilapiros/SPARK-33763-final.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: 76e5d75)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitorSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/ExecutorLossReason.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (diff)
Commit a575e805a18a515f8707f74cf2b22777474f2f06 by kabhwan.opensource
[SPARK-34446][SS][DOCS] Update doc for stream-stream join (full outer + left semi)

### What changes were proposed in this pull request?

Per the discussion in https://issues.apache.org/jira/browse/SPARK-32883?focusedCommentId=17285057&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17285057, we should add documentation for the newly added full outer and left semi joins to the SS programming guide.

* Reworded the section "Outer Joins with Watermarking" so it covers full outer join, and updated the code snippet to show full outer and left semi joins (an illustrative left semi sketch follows this list).
* Added a section "Semi Joins with Watermarking", similar to "Outer Joins with Watermarking".
* Updated "Support matrix for joins in streaming queries" to reflect the latest state of full outer and left semi joins.

### Why are the changes needed?

It is good for users and developers to be able to follow the guide to try out these two new features.

### Does this PR introduce _any_ user-facing change?

Yes. They will see the corresponding updated guide.

### How was this patch tested?

No, just a documentation change. I previewed the markdown file in a browser.
The change to the "Support matrix for joins in streaming queries" table is also attached here.

<img width="896" alt="Screen Shot 2021-02-16 at 8 12 07 PM" src="https://user-images.githubusercontent.com/4629931/108155275-73c92e80-7093-11eb-9f0b-c8b4bb7321e5.png">

Closes #31572 from c21/ss-doc.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(commit: a575e80)
The file was modified docs/structured-streaming-programming-guide.md (diff)
Commit 44a9aed0d7dc44fbb131899b73e0e9de002747d0 by dhyun
[SPARK-34456][SQL] Remove unused write options from BatchWriteHelper

### What changes were proposed in this pull request?

This PR removes dead code from `BatchWriteHelper` after SPARK-33808.

### Why are the changes needed?

These changes simplify `BatchWriteHelper` by removing write options that are no longer needed as we build `Write` earlier.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #31581 from aokolnychyi/simplify-batch-write-helper.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 44a9aed)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V1FallbackWriters.scala (diff)
Commit 1ad343238cb82a79e51ae9d46ae704bc482ff437 by dhyun
[SPARK-33736][SQL] Handle MERGE in ReplaceNullWithFalseInPredicate

### What changes were proposed in this pull request?

This PR handles merge operations in `ReplaceNullWithFalseInPredicate`.

### Why are the changes needed?

These changes are needed to match what we already do for delete and update operations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This PR extends existing tests to cover merge operations.

Closes #31579 from aokolnychyi/spark-33736.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 1ad3432)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala (diff)
Commit bdcad33d8bba2412a8c59322cf31dbcf23be2886 by gurwls223
[SPARK-34433][DOCS] Lock Jekyll version by Gemfile and Bundler

### What changes were proposed in this pull request?

Improving the documentation and release process by pinning the Jekyll version with a Gemfile and Bundler.

Some files and their responsibilities within this PR:
- `docs/.bundle/config` is used to specify the directory "docs/.local_ruby_bundle" as the destination to install the Ruby packages into, instead of the global one which requires root access
- `docs/Gemfile` is specifying the required Jekyll version and other top level gem versions
- `docs/Gemfile.lock` is generated by running "bundle install". This file contains the exact resolved versions of all the gems, including the top level gems and all their direct and transitive dependencies. When this file is generated it contains a platform related section "PLATFORMS" (in my case, after the generation, it was "universal-darwin-19"). Still, this file must be under version control, because when the version of a gem does not match the one specified in `Gemfile` an error is raised (i.e. if the `Gemfile.lock` was generated for Jekyll 4.1.0 and the version is updated in the `Gemfile` to 4.2.0 then it triggers the error: "The bundle currently has jekyll locked at 4.1.0."). This solution is also suggested officially in [its documentation](https://bundler.io/rationale.html#checking-your-code-into-version-control). To get rid of the specific platform (like "universal-darwin-19"), first we have to add "ruby" as a platform ([which means this should work on every platform where Ruby runs](https://guides.rubygems.org/what-is-a-gem/)) by running "bundle lock --add-platform ruby", then the specific platform can be removed by "bundle lock --remove-platform universal-darwin-19".

After this the correct process to update Jekyll version is the following:
1. update the version in `Gemfile`
2. run "bundle update" which updates the `Gemfile.lock`
3. commit both files

This version-update process has been tested; for details please check the testing section.

### Why are the changes needed?

Using different Jekyll versions can generate different output documents.
This PR standardizes the process.

### Does this PR introduce _any_ user-facing change?

No, assuming the release was done via docker by using `do-release-docker.sh`.
In that case there should be no difference at all, as the same Jekyll version is specified in the Gemfile.

### How was this patch tested?

#### Testing document generation

The doc generation step was triggered via the docker release:

```
$ ./do-release-docker.sh -d ~/working -n -s docs
...
========================
= Building documentation...
Command: /opt/spark-rm/release-build.sh docs
Log file: docs.log
Skipping publish step.
```

The docs.log contains the following:
```
Building Spark docs
Fetching gem metadata from https://rubygems.org/.........
Using bundler 2.2.9
Fetching rb-fsevent 0.10.4
Fetching forwardable-extended 2.6.0
Fetching public_suffix 4.0.6
Fetching colorator 1.1.0
Fetching eventmachine 1.2.7
Fetching http_parser.rb 0.6.0
Fetching ffi 1.14.2
Fetching concurrent-ruby 1.1.8
Installing colorator 1.1.0
Installing forwardable-extended 2.6.0
Installing rb-fsevent 0.10.4
Installing public_suffix 4.0.6
Installing http_parser.rb 0.6.0 with native extensions
Installing eventmachine 1.2.7 with native extensions
Installing concurrent-ruby 1.1.8
Fetching rexml 3.2.4
Fetching liquid 4.0.3
Installing ffi 1.14.2 with native extensions
Installing rexml 3.2.4
Installing liquid 4.0.3
Fetching mercenary 0.4.0
Installing mercenary 0.4.0
Fetching rouge 3.26.0
Installing rouge 3.26.0
Fetching safe_yaml 1.0.5
Installing safe_yaml 1.0.5
Fetching unicode-display_width 1.7.0
Installing unicode-display_width 1.7.0
Fetching webrick 1.7.0
Installing webrick 1.7.0
Fetching pathutil 0.16.2
Fetching kramdown 2.3.0
Fetching terminal-table 2.0.0
Fetching addressable 2.7.0
Fetching i18n 1.8.9
Installing terminal-table 2.0.0
Installing pathutil 0.16.2
Installing i18n 1.8.9
Installing addressable 2.7.0
Installing kramdown 2.3.0
Fetching kramdown-parser-gfm 1.1.0
Installing kramdown-parser-gfm 1.1.0
Fetching rb-inotify 0.10.1
Fetching sassc 2.4.0
Fetching em-websocket 0.5.2
Installing rb-inotify 0.10.1
Installing em-websocket 0.5.2
Installing sassc 2.4.0 with native extensions
Fetching listen 3.4.1
Installing listen 3.4.1
Fetching jekyll-watch 2.2.1
Installing jekyll-watch 2.2.1
Fetching jekyll-sass-converter 2.1.0
Installing jekyll-sass-converter 2.1.0
Fetching jekyll 4.2.0
Installing jekyll 4.2.0
Fetching jekyll-redirect-from 0.16.0
Installing jekyll-redirect-from 0.16.0
Bundle complete! 4 Gemfile dependencies, 30 gems now installed.
Bundled gems are installed into `./.local_ruby_bundle`
```

#### Testing Jekyll (or other gem) update

First locally I reverted Jekyll to 4.1.0:
```
$ rm Gemfile.lock
$ rm -rf .local_ruby_bundle

# edited Gemfile to use version 4.1.0
$ cat Gemfile
source "https://rubygems.org"

gem "jekyll", "4.1.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"
$ bundle install
...
```

Testing Jekyll version before the update:

```
$ bundle exec jekyll --version
jekyll 4.1.0
```

Imitating Jekyll update coming from git by reverting my local changes:

```
$ git checkout Gemfile
Updated 1 path from the index
$ cat Gemfile
source "https://rubygems.org"

gem "jekyll", "4.2.0"
gem "rouge", "3.26.0"
gem "jekyll-redirect-from", "0.16.0"
gem "webrick", "1.7"

$ git checkout Gemfile.lock
Updated 1 path from the index
```

Run the install:

```
$ bundle install
...
```

Checking the updated Jekyll version:
```
$ bundle exec jekyll --version
jekyll 4.2.0
```

Closes #31559 from attilapiros/pin-jekyll-version.

Lead-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: bdcad33)
The file was modified dev/lint-python (diff)
The file was modified .github/workflows/build_and_test.yml (diff)
The file was modified dev/create-release/do-release-docker.sh (diff)
The file was modified dev/create-release/release-build.sh (diff)
The file was modified dev/create-release/spark-rm/Dockerfile (diff)
The file was added docs/Gemfile.lock
The file was added docs/.bundle/config
The file was added docs/Gemfile
The file was modified docs/README.md (diff)
The file was modified dev/run-tests.py (diff)
The file was modified python/docs/source/development/contributing.rst (diff)
The file was modified .gitignore (diff)
Commit 7b549c3e53812575c968006096746f18d658206f by dhyun
[SPARK-34455][SQL] Deprecate `spark.sql.legacy.replaceDatabricksSparkAvro.enabled`

### What changes were proposed in this pull request?
1. Put the SQL config `spark.sql.legacy.replaceDatabricksSparkAvro.enabled` into the list of deprecated configs `deprecatedSQLConfigs`
2. Update docs for the Avro datasource
<img width="982" alt="Screenshot 2021-02-17 at 21 04 26" src="https://user-images.githubusercontent.com/1580697/108249890-abed7180-7166-11eb-8cb7-0c246d2a34fc.png">

### Why are the changes needed?
The config has existed for long enough. We can deprecate it and recommend users to use `.format("avro")` instead (a minimal sketch follows).
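
```scala
// Use the built-in Avro source directly instead of relying on the legacy
// com.databricks.spark.avro mapping controlled by the deprecated config.
// The paths are placeholders.
val events = spark.read.format("avro").load("/tmp/events.avro")
events.write.format("avro").save("/tmp/events_copy")
```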

### Does this PR introduce _any_ user-facing change?
Should not, except for the warning with the recommendation to use the `avro` format.

### How was this patch tested?
1. By generating docs via:
```
$ SKIP_API=1 SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve --watch
```
2. Manually checking the warning:
```
scala> spark.conf.set("spark.sql.legacy.replaceDatabricksSparkAvro.enabled", false)
21/02/17 21:20:18 WARN SQLConf: The SQL config 'spark.sql.legacy.replaceDatabricksSparkAvro.enabled' has been deprecated in Spark v3.2 and may be removed in the future. Use `.format("avro")` in `DataFrameWriter` or `DataFrameReader` instead.
```

Closes #31578 from MaxGekk/deprecate-replaceDatabricksSparkAvro.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 7b549c3)
The file was modified docs/sql-data-sources-avro.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit b58f0976a92900088bb24bdd27b694c33f875a47 by gurwls223
[SPARK-34437][SQL][DOCS] Update Spark SQL guide about the rebasing DS options and SQL configs

### What changes were proposed in this pull request?
In the PR, I propose to update the Spark SQL guide about the SQL configs that are related to datetime rebasing:
- spark.sql.parquet.int96RebaseModeInWrite
- spark.sql.parquet.datetimeRebaseModeInWrite
- spark.sql.parquet.int96RebaseModeInRead
- spark.sql.parquet.datetimeRebaseModeInRead
- spark.sql.avro.datetimeRebaseModeInWrite
- spark.sql.avro.datetimeRebaseModeInRead

Parquet options added by #31489:
- datetimeRebaseMode
- int96RebaseMode

and Avro options added by #31529:
- datetimeRebaseMode

<img width="998" alt="Screenshot 2021-02-17 at 21 42 09" src="https://user-images.githubusercontent.com/1580697/108252043-3afb8900-7169-11eb-8568-511e21fa7f78.png">

### Why are the changes needed?
To inform users about supported DS options and SQL configs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By generating the doc and manually checking:
```
$ SKIP_API=1 SKIP_SCALADOC=1 SKIP_PYTHONDOC=1 SKIP_RDOC=1 jekyll serve --watch
```

Closes #31564 from MaxGekk/doc-rebase-options.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: b58f097)
The file was modified docs/sql-data-sources-avro.md (diff)
The file was modified docs/sql-data-sources-parquet.md (diff)
Commit 51672281728164db731f3f607818bffea0334eb0 by gurwls223
[SPARK-34449][BUILD] Upgrade Jetty to fix CVE-2020-27218

### What changes were proposed in this pull request?

This PR upgrades Jetty from `9.4.34` to `9.4.36`.

### Why are the changes needed?

CVE-2020-27218 affects the currently used Jetty 9.4.34.
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-27218

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Modified an existing test and added a new test which comply with the new version of Jetty.

Closes #31574 from sarutak/upgrade-jetty-9.4.36.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 5167228)
The file was modified pom.xml (diff)
The file was modified core/src/test/scala/org/apache/spark/ui/UISuite.scala (diff)
Commit c925e4d0fde835499baad386ddfdff0df4c67f45 by wenchen
[SPARK-34393][SQL] Unify output of SHOW VIEWS and pass output attributes properly

### What changes were proposed in this pull request?
The current implementation of some DDL commands does not unify the output and does not pass the output attributes properly to the physical command.
For example, the output attributes of `ShowViews` are not passed to `ShowViewsCommand` properly.

As for the query plan, this PR passes the output attributes from `ShowViews` to `ShowViewsCommand`.

### Why are the changes needed?
Passing the output attributes keeps the expression IDs unchanged, which avoids bugs when we apply more operators on top of the command's output DataFrame.
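
A hedged illustration of the scenario (output column names are assumed to be `namespace`, `viewName`, `isTemporary`):

```scala
// Operators applied on top of the command's output DataFrame rely on the
// expression IDs of the output attributes staying consistent.
val views = spark.sql("SHOW VIEWS")
views.filter("isTemporary = false").select("viewName").show()
```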

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #31508 from beliefer/SPARK-34393.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c925e4d)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
Commit edccf96caddd8a570aa819bd5fde96457d2a3749 by wenchen
[SPARK-34394][SQL] Unify output of SHOW FUNCTIONS and pass output attributes properly

### What changes were proposed in this pull request?
The current implementation of some DDL commands does not unify the output and does not pass the output attributes properly to the physical command.
For example, the output attributes of `ShowFunctions` are not passed to `ShowFunctionsCommand` properly.

As for the query plan, this PR passes the output attributes from `ShowFunctions` to `ShowFunctionsCommand`.

### Why are the changes needed?
Passing the output attributes keeps the expression IDs unchanged, which avoids bugs when we apply more operators on top of the command's output DataFrame.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #31519 from beliefer/SPARK-34394.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: edccf96)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
Commit ff5115c3ac15cf718fbfe98e07c56f3dde79a602 by tgraves
[SPARK-33739][SQL] Jobs committed through the S3A Magic committer don't track bytes

This changes BasicWriteStatsTracker to probe for a custom XAttr when the size of
the generated file is 0 bytes; if the attribute is found and parseable, its value is used as
the declared length of the output.

The matching Hadoop patch in HADOOP-17414:

* Returns all S3 object headers as XAttr attributes prefixed "header."
* Sets the custom header x-hadoop-s3a-magic-data-length to the length of
  the data in the marker file.

As a result, Spark job tracking will correctly report the amount of data uploaded
and yet to materialize.

### Why are the changes needed?

Now that S3 is consistent, it's a lot easier to use the S3A "magic" committer
which redirects a file written to `dest/__magic/job_0011/task_1245/__base/year=2020/output.avro`
to its final destination `dest/year=2020/output.avro` , adding a zero byte marker file at
the end and a json file `dest/__magic/job_0011/task_1245/__base/year=2020/output.avro.pending`
containing all the information for the job committer to complete the upload.

But: the write tracker statistics don't show progress, as they measure the length of the
created file, find the marker file and report 0 bytes.
By probing for a specific HTTP header in the marker file and parsing it if
retrieved, the real progress can be reported.

There's a matching change in Hadoop [https://github.com/apache/hadoop/pull/2530](https://github.com/apache/hadoop/pull/2530)
which adds getXAttr API support to the S3A connector and returns the headers; the magic
committer adds the relevant attributes.

If the FS being probed doesn't support the XAttr API, the header is missing
or the value not a positive long then the size of 0 is returned.
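
A hedged sketch of the probing logic described above (not the actual `BasicWriteStatsTracker` code; the attribute name comes from the Hadoop change):

```scala
import java.nio.charset.StandardCharsets
import scala.util.Try
import org.apache.hadoop.fs.{FileSystem, Path}

// If the file status reports 0 bytes, probe the XAttr API for the magic-committer
// header and use its value when it parses to a positive long; otherwise keep 0.
def declaredLength(fs: FileSystem, path: Path, statusLen: Long): Long = {
  if (statusLen > 0) {
    statusLen
  } else {
    Try {
      val bytes = fs.getXAttr(path, "header.x-hadoop-s3a-magic-data-length")
      new String(bytes, StandardCharsets.UTF_8).trim.toLong
    }.toOption.filter(_ > 0).getOrElse(0L)
  }
}
```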

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New tests in BasicWriteTaskStatsTrackerSuite which use a filter FS to
implement getXAttr on top of LocalFS; this is used to explore the set of
options:
* no XAttr API implementation (existing tests; what callers would see with
  most filesystems)
* no attribute found (HDFS, ABFS without the attribute)
* invalid data of different forms

All of these return Some(0) as file length.

The Hadoop PR verifies XAttr implementation in S3A and that
the commit protocol attaches the header to the files.

External downstream testing has done the full hadoop+spark end
to end operation, with manual review of logs to verify that the
data was successfully collected from the attribute.

Closes #30714 from steveloughran/cdpd/SPARK-33739-magic-commit-tracking-master.

Authored-by: Steve Loughran <stevel@cloudera.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
(commit: ff5115c)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteTaskStatsTrackerSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala (diff)
The file was modified docs/cloud-integration.md (diff)
Commit 27873280ffbd73be6df230b4497701794ac81d91 by srowen
[SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase

### What changes were proposed in this pull request?

Currently in `SpecificParquetRecordReaderBase` we use deprecated Parquet APIs in a few places, such as `readFooter`, `ParquetInputSplit`, `new ParquetFileReader`, `filterRowGroups`, etc. This replaces these with the newer APIs (a small sketch of the newer API shape follows this list). Specifically, this:
- Replaces `ParquetInputSplit` with `FileSplit`. We never use specific things in the former such as `rowGroupOffsets` so the swap is pretty simple.
- Removes `readFooter` calls by using `ParquetFileReader.open`
- Replace deprecated `ParquetFileReader` ctor with the newer API which takes `ParquetReadOptions`.
- Removes the unnecessary handling of the case when `rowGroupOffsets` is not null. It seems this never happens.
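
A hedged sketch of the replacement API shape (not the exact Spark change; the path is a placeholder):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.HadoopReadOptions
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Open the reader through the non-deprecated entry point and read the footer from it,
// instead of calling the deprecated readFooter/ParquetFileReader constructor.
val conf = new Configuration()
val inputFile = HadoopInputFile.fromPath(new Path("/tmp/data.parquet"), conf)
val options = HadoopReadOptions.builder(conf).build()
val reader = ParquetFileReader.open(inputFile, options)
try {
  val footer = reader.getFooter   // replaces the deprecated readFooter call
  println(footer.getBlocks.size())
} finally {
  reader.close()
}
```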

### Why are the changes needed?

The aforementioned APIs were deprecated and are going to be removed at some point in the future. This change ensures better supportability.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

This is a cleanup and relies on existing tests on the relevant code paths.

Closes #29542 from sunchao/SPARK-32703.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 2787328)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (diff)
Commit 8f7ec4b28e8b63cce03f7b2bf64fe5dbabbf982d by dhyun
[SPARK-34454][SQL] Mark legacy SQL configs as internal

### What changes were proposed in this pull request?
1. Mark the following SQL configs as internal:
    - spark.sql.legacy.allowHashOnMapType
    - spark.sql.legacy.sessionInitWithConfigDefaults
2. Add a test to check that all SQL configs from the `legacy` namespace are marked as internal configs (a sketch of such a check follows).
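
A hedged sketch of the kind of check item 2 describes (assuming `getAllDefinedConfs` only reports non-internal configs):

```scala
import org.apache.spark.sql.internal.SQLConf

// Publicly listed SQL configs (internal ones are filtered out) should not
// include anything from the legacy namespace.
val conf = new SQLConf()
val publicLegacy = conf.getAllDefinedConfs.map(_._1).filter(_.startsWith("spark.sql.legacy."))
assert(publicLegacy.isEmpty, s"Legacy SQL configs should be internal: $publicLegacy")
```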

### Why are the changes needed?
Legacy SQL configs shouldn't be set by users in common cases. The purpose of such configs is to allow switching to the old behavior in corner cases, so the configs should be marked as internal.

### Does this PR introduce _any_ user-facing change?
Should not.

### How was this patch tested?
By running the new test:
```
$ build/sbt "test:testOnly *SQLConfSuite"
```

Closes #31577 from MaxGekk/mark-legacy-configs-as-internal.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 8f7ec4b)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala (diff)
Commit 331c6fd4efcb337d903b7179b05997dca2dae2a8 by dhyun
[SPARK-34467][BUILD] Upgrade Zstd-jni to 1.4.8-4

### What changes were proposed in this pull request?

This PR aims to upgrade the Zstd-JNI library to 1.4.8-4 to bring in JNI-side optimizations.
`ZStandardBenchmark` shows that there is no regression in terms of performance, and even some improvements.

### Why are the changes needed?

https://github.com/luben/zstd-jni/commits/v1.4.8-4
- https://github.com/luben/zstd-jni/commit/be9be47fae23b5510344b67f9c0f7b258c2f5890
- https://github.com/luben/zstd-jni/commit/be51ebade15cbd4939a46581070836cc56e23ae6
- https://github.com/luben/zstd-jni/commit/44ff8b6f958bfcd75e187c6a51acf324a481cbe1

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #31585 from dongjoon-hyun/SPARK-ZSTD-1.4.8-4.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 331c6fd)
The file was modified pom.xml (diff)
The file was modified core/benchmarks/ZStandardBenchmark-results.txt (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified core/benchmarks/ZStandardBenchmark-jdk11-results.txt (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
Commit cad469d47a0828106dcd2f9a949c91d69b28ec85 by dhyun
[SPARK-34465][SQL] Rename v2 alter table exec nodes

### What changes were proposed in this pull request?
Rename the following v2 exec nodes:
- AlterTableAddPartitionExec -> AddPartitionExec
- AlterTableRenamePartitionExec -> RenamePartitionExec
- AlterTableDropPartitionExec -> DropPartitionExec

### Why are the changes needed?
- To be consistent with the v2 exec node added before: `ALTER TABLE .. RENAME TO` -> `RenameTableExec`.
- For simplicity and readability of the execution plans.

### Does this PR introduce _any_ user-facing change?
Should not, since this is an internal API.

### How was this patch tested?
By running the existing test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableAddPartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"
```

Closes #31584 from MaxGekk/rename-alter-table-exec-nodes.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: cad469d)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DropPartitionExec.scala
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableAddPartitionExec.scala
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableRenamePartitionExec.scala
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RenamePartitionExec.scala
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AddPartitionExec.scala
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala
Commit 26548edfa2445b009f63bbdbe810cdb6c289c18d by gurwls223
[MINOR][SQL][DOCS] Fix the comments in the example at window function

### What changes were proposed in this pull request?

The window function example in `functions.scala` has a comment error in the field name. The column should be `time` per `timestamp:TimestampType`.
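
For context, a minimal usage sketch matching the corrected comment (`df` is assumed to be a DataFrame with a TimestampType column named `time`):

```scala
import org.apache.spark.sql.functions.{col, window}

// Group rows into 1-minute windows over the "time" column named in the comment.
val counts = df.groupBy(window(col("time"), "1 minute")).count()
```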

### Why are the changes needed?

To deliver the correct documentation and examples.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the user-facing docs.

### How was this patch tested?

CI builds in this PR should test the documentation build.

Closes #31582 from yzjg/yzjg-patch-1.

Authored-by: yzjg <785246661@qq.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 26548ed)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)