Changes

Summary

  1. [SPARK-35041][SQL] Revise the overflow in UTF8String (commit: e70b0f8) (details)
  2. [SPARK-35043][SQL] Add condition lambda and rule id to the resolve function family (commit: 49618c9) (details)
  3. [SPARK-35014] Fix the PhysicalAggregation pattern to not rewrite foldable expressions (commit: 9cd25b4) (details)
  4. [SPARK-35045][SQL] Add an internal option to control input buffer in univocity (commit: 1f56215) (details)
  5. [SPARK-34916][SQL][FOLLOWUP] Remove duplicate code in `TreeNode.treePatternBits` (commit: ade3a1d) (details)
  6. [SPARK-34947][SQL] Streaming write to a V2 table should invalidate its associated cache (commit: 1a67089) (details)
  7. [SPARK-33604][SQL] Group exception messages in sql/execution (commit: 27bec91) (details)
  8. [SPARK-35049][CORE] Remove unused MapOutputTracker in BlockStoreShuffleReader (commit: ee7d838) (details)
  9. [SPARK-35012][PYTHON] Port Koalas DataFrame-related unit tests into PySpark (commit: 8ebc3fc) (details)
  10. [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted (commit: ef05e89) (details)
  11. [MINOR][PYTHON][DOCS] Fix docstring for pyspark.sql.DataFrameWriter.json lineSep param (commit: faa928c) (details)
  12. [SPARK-35050][DOCS][MESOS] Document deprecation of Apache Mesos in 3.2.0 (commit: 700aa17) (details)
  13. [SPARK-35033][PYTHON] Port Koalas plot unit tests into PySpark (commit: cd1e8e8) (details)
  14. [SPARK-35048][INFRA] Distribute GitHub Actions workflows to fork repositories to share the resources (commit: 2974b70) (details)
  15. [SPARK-35035][PYTHON] Port Koalas internal implementation unit tests into PySpark (commit: 47d62af) (details)
  16. [SPARK-35039][PYTHON] Remove PySpark version dependent codes (commit: 4ae57d5) (details)
  17. [SPARK-34577][SQL][FOLLOWUP] Add change of `DESC NAMESPACE`'s schema to migration guide (commit: 0fc97b5) (details)
  18. [SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN (commit: 816f6dd) (details)
  19. [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed (commit: b5241c9) (details)
  20. [SPARK-35002][YARN][TESTS][FOLLOW-UP] Fix java.net.BindException in MiniYARNCluster (commit: a153efa) (details)
  21. [SPARK-35069][SQL] TRANSFORM forbids `DISTINCT` and `ALL`, and makes the error clear (commit: 4ca9958) (details)
  22. [SPARK-35061][BUILD] Upgrade pycodestyle from 2.6.0 to 2.7.0 (commit: 3e218ad) (details)
  23. [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date (commit: de9e8b6) (details)
  24. [SPARK-33882][ML] Add a vectorized BLAS implementation (commit: 9244066) (details)
  25. [SPARK-34834][NETWORK] Fix a potential Netty memory leak in TransportResponseHandler (commit: bf9f3b8) (details)
  26. [SPARK-35044][SQL] `SET propertyKey` shall also lookup `sparkSession.sharedState.hadoopConf` to display the effective default hive/hadoop configs (commit: f32114d) (details)
  27. [SPARK-34630][PYTHON][FOLLOWUP] Add __version__ into pyspark init __all__ (commit: 31555f7) (details)
  28. [SPARK-35034][PYTHON] Port Koalas miscellaneous unit tests into PySpark (commit: 58feb85) (details)
Commit e70b0f81b32f3c5ad9c926142d7ccdc9f1a19ab4 by max.gekk
[SPARK-35041][SQL] Revise the overflow in UTF8String

### What changes were proposed in this pull request?

Add an overflow check before doing `new byte[]`.
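
As a hedged illustration of the idea (the helper below is hypothetical; the real fix is in `UTF8String.java`):

```scala
// Compute the required length in a Long so the sum itself cannot overflow,
// and fail fast with a clear message before allocating the byte array.
def allocateOutputBuffer(inputLengths: Seq[Int]): Array[Byte] = {
  val total = inputLengths.foldLeft(0L)(_ + _)
  if (total > Int.MaxValue) {
    throw new IllegalArgumentException(
      s"Cannot allocate a byte array of size $total: exceeds Int.MaxValue")
  }
  new Array[Byte](total.toInt)
}
```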

### Why are the changes needed?

Avoid overflow in extreme cases.

### Does this PR introduce _any_ user-facing change?

Possibly yes: the error message changes when an overflow occurs.

### How was this patch tested?

Pass CI.

Closes #32142 from ulysses-you/SPARK-35041.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: e70b0f8)
The file was modified common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java (diff)
Commit 49618c9543948bc0b409be14b6088c4ed371c19c by ltnwgl
[SPARK-35043][SQL] Add condition lambda and rule id to the resolve function family

### What changes were proposed in this pull request?

This PR contains:
- AnalysisHelper changes to allow the resolve function family to stop earlier without traversing the entire tree;
- Example changes in a few rules to support such pruning, e.g., ResolveRandomSeed, ResolveWindowFrame, ResolveWindowOrder, and ResolveNaturalAndUsingJoin; a call-site sketch follows below.
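
A hedged sketch of the call-site shape this enables (names follow the API introduced around this change and may differ across Spark versions; the rewrite body is a placeholder):

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreePattern.WINDOW_EXPRESSION

// The condition lambda prunes subtrees whose pattern bits contain no window
// expressions, and the rule id lets the traversal skip subtrees that this
// rule has already processed, so the whole tree is no longer traversed.
object ResolveWindowFrameSketch extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan =
    plan.resolveOperatorsUpWithPruning(
      _.containsPattern(WINDOW_EXPRESSION), ruleId) {
      case p => p // a real rule pattern-matches and rewrites the plan here
    }
}
```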

### Why are the changes needed?

It's a framework-level change for reducing the query compilation time.
In particular, if we update existing analysis rules' call sites as per the examples in this PR, the analysis time can be reduced as described in the [doc](https://docs.google.com/document/d/1SEUhkbo8X-0cYAJFYFDQhxUnKJBz4lLn3u4xR2qfWqk).

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

It is tested by existing tests.

Closes #32135 from sigmod/resolver.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: 49618c9)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleIdCollection.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala (diff)
Commit 9cd25b46b9d1de0c7cdecdabd8cf37b25ec2d78a by wenchen
[SPARK-35014] Fix the PhysicalAggregation pattern to not rewrite foldable expressions

### What changes were proposed in this pull request?

Fix PhysicalAggregation to not transform a foldable expression.
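
A hedged sketch of the guard (the `rewrittenAttribute` map below is a hypothetical stand-in for the pattern's equivalent-expression bookkeeping; the real change is in `patterns.scala`, see the file list below):

```scala
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression}

// When rewriting result expressions to reference pre-computed aggregate and
// grouping values, skip foldable (constant) expressions: rewriting them can
// produce shapes that fail later type checks such as
// RegExpReplace.checkInputDataTypes.
def rewriteResultExpressions(
    resultExpressions: Seq[Expression],
    rewrittenAttribute: PartialFunction[Expression, Attribute]): Seq[Expression] =
  resultExpressions.map(_.transformDown {
    case expr if !expr.foldable && rewrittenAttribute.isDefinedAt(expr) =>
      rewrittenAttribute(expr)
  })
```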

### Why are the changes needed?

It can potentially break certain queries like the added unit test shows.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes undesirable errors caused by a returned TypeCheckFailure from places like RegExpReplace.checkInputDataTypes.

Closes #32113 from sigmod/foldable.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 9cd25b4)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/PhysicalAggregationSuite.scala
Commit 1f562159bf61dd5e536db7841b16e74a635e7a97 by max.gekk
[SPARK-35045][SQL] Add an internal option to control input buffer in univocity

### What changes were proposed in this pull request?

This PR makes the input buffer configurable (as an internal option). This is mainly to work around uniVocity/univocity-parsers#449.

### Why are the changes needed?

To work around uniVocity/univocity-parsers#449.

### Does this PR introduce _any_ user-facing change?

No, it's an internal-only option.

### How was this patch tested?

Manually tested by modifying the unit test added in https://github.com/apache/spark/pull/31858 as below:

```diff
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
index fd25a79619d..b58f0bd3661 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
@@ -2460,6 +2460,7 @@ abstract class CSVSuite
       Seq(line).toDF.write.text(path.getAbsolutePath)
       assert(spark.read.format("csv")
         .option("delimiter", "|")
+        .option("inputBufferSize", "128")
         .option("ignoreTrailingWhiteSpace", "true").load(path.getAbsolutePath).count() == 1)
     }
   }
```
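
For reference, a hedged usage sketch of the new option from Scala (the option is internal and may change; the path is made up, and `spark` is an active `SparkSession`):

```scala
// Shrink univocity's input buffer to 128 bytes for this read via the
// internal "inputBufferSize" option added by this PR.
spark.read
  .format("csv")
  .option("delimiter", "|")
  .option("inputBufferSize", "128")
  .load("/tmp/example.csv")
```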

Closes #32145 from HyukjinKwon/SPARK-35045.

Lead-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 1f56215)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala (diff)
Commit ade3a1df82d11e927ff5800c21dc23214163c99e by ltnwgl
[SPARK-34916][SQL][FOLLOWUP] Remove duplicate code in `TreeNode.treePatternBits`

### What changes were proposed in this pull request?

Remove duplicate code in `TreeNode.treePatternBits`

### Why are the changes needed?

Code cleanup; make it easier to maintain.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #32143 from gengliangwang/getBits.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: ade3a1d)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala (diff)
Commit 1a6708918b32e821bff26a00d2d8b7236b29515f by wenchen
[SPARK-34947][SQL] Streaming write to a V2 table should invalidate its associated cache

### What changes were proposed in this pull request?

Populate table catalog and identifier from `DataStreamWriter` to `WriteToMicroBatchDataSource` so that we can invalidate cache for tables that are updated by a streaming write.

This is somewhat related to [SPARK-27484](https://issues.apache.org/jira/browse/SPARK-27484) and [SPARK-34183](https://issues.apache.org/jira/browse/SPARK-34183) (#31700), as ideally we may want to replace `WriteToMicroBatchDataSource` and `WriteToDataSourceV2` with logical write nodes and feed them to the analyzer. That would potentially change the code path involved in this PR.

### Why are the changes needed?

Currently `WriteToDataSourceV2` doesn't have cache invalidation logic, and therefore, when the target table for a micro batch streaming job is cached, the cache entry won't be removed when the table is updated.
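
A minimal sketch of the invalidation step, assuming the catalog and identifier have been threaded through to the write node (the wiring is illustrative; `CacheManager.uncacheQuery` and `DataSourceV2Relation.create` are existing APIs):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.connector.catalog.{Identifier, Table, TableCatalog}
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

// After a micro-batch commits, rebuild the table's relation and drop any
// cache entries that depend on it, so later reads observe the new data.
def invalidateCache(
    session: SparkSession,
    catalog: TableCatalog,
    table: Table,
    ident: Identifier): Unit = {
  val relation = DataSourceV2Relation.create(table, Some(catalog), Some(ident))
  session.sharedState.cacheManager.uncacheQuery(session, relation, cascade = true)
}
```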

### Does this PR introduce _any_ user-facing change?

Yes. Now, when a DSv2 table that supports streaming writes is updated by a streaming job, its cache will also be invalidated.

### How was this patch tested?

Added a new UT.

Closes #32039 from sunchao/streaming-cache.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 1a67089)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/WriteToMicroBatchDataSource.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ResolveWriteToStream.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/streaming/WriteToStream.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/streaming/WriteToStreamStatement.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
Commit 27bec91bc971b393bd91f2ec8c6483b33f844f12 by wenchen
[SPARK-33604][SQL] Group exception messages in sql/execution

### What changes were proposed in this pull request?
This PR groups exception messages in `sql/core/src/main/scala/org/apache/spark/sql/execution`.
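
A self-contained sketch of the pattern (the factory method name below is made up for illustration): call sites throw via a named method on a central errors object instead of building messages inline, so the wording is standardized in one place.

```scala
object QueryExecutionErrorsSketch {
  // One named factory per error keeps the message text in a single file.
  def unsupportedCompressionSchemeError(name: String): Throwable =
    new UnsupportedOperationException(s"Unsupported compression scheme: $name")
}

// Before: throw new UnsupportedOperationException("Unsupported compression scheme: " + name)
// After:  throw QueryExecutionErrorsSketch.unsupportedCompressionSchemeError(name)
```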

### Why are the changes needed?
It will largely help standardize error messages and ease their maintenance.

### Does this PR introduce _any_ user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any existing behavior.

Closes #31920 from beliefer/SPARK-33604.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 27bec91)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnType.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/analysis/DetectAmbiguousSelfJoin.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/simpleCosting.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/python/RowQueue.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
Commit ee7d838aaf46f9d786e0388915b422fb78952893 by mridulatgmail.com
[SPARK-35049][CORE] Remove unused MapOutputTracker in BlockStoreShuffleReader

### What changes were proposed in this pull request?
Remove unused MapOutputTracker in BlockStoreShuffleReader

### Why are the changes needed?
Remove unused MapOutputTracker in BlockStoreShuffleReader

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #32148 from AngersZhuuuu/SPARK-35049.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(commit: ee7d838)
The file was modified core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala (diff)
Commit 8ebc3fca8c68f896b1395731e0519941c8b49e67 by ueshin
[SPARK-35012][PYTHON] Port Koalas DataFrame-related unit tests into PySpark

### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas DataFrame-related unit tests to PySpark.

### Why are the changes needed?
Currently, the pandas-on-Spark modules are not fully tested. We should enable the DataFrame-related unit tests first.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Enable DataFrame-related unit tests.

Closes #32131 from xinrong-databricks/port.test_dataframe_related.

Lead-authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Co-authored-by: xinrong-databricks <47337188+xinrong-databricks@users.noreply.github.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: 8ebc3fc)
The file was added python/pyspark/pandas/tests/test_dataframe_conversion.py
The file was modified dev/sparktestsupport/modules.py (diff)
The file was added python/pyspark/pandas/tests/test_dataframe_spark_io.py
The file was added python/pyspark/pandas/tests/test_frame_spark.py
Commit ef05e89ee54095c975ef3a1f1e71a9ff90d50411 by sarutak
[SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

### What changes were proposed in this pull request?

This PR fixes an issue that `LIST FILES/JARS/ARCHIVES path1 path2 ...` cannot list all paths if at least one path is quoted.
Here is an example:
```
ADD FILE /tmp/test1;
ADD FILE /tmp/test2;

LIST FILES /tmp/test1 /tmp/test2;
file:/tmp/test1
file:/tmp/test2

LIST FILES /tmp/test1 "/tmp/test2";
file:/tmp/test2
```

In this example, the second `LIST FILES` doesn't show `file:/tmp/test1`.

To resolve this issue, I modified the syntax rule to be able to handle this case.
I also changed `SparkSQLParser` to be able to handle paths which contain white spaces.

### Why are the changes needed?

This is a bug.
I also plan to extend `ADD FILE/JAR/ARCHIVE` to take multiple paths, like Hive does, and this syntax rule change is necessary for that.

### Does this PR introduce _any_ user-facing change?

Yes. Users can pass quoted paths when using `ADD FILE/JAR/ARCHIVE`.

### How was this patch tested?

New test.

Closes #32074 from sarutak/fix-list-files-bug.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: ef05e89)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
Commit faa928cefc8c1c6d7771aacd2ae7670162346361 by gurwls223
[MINOR][PYTHON][DOCS] Fix docstring for pyspark.sql.DataFrameWriter.json lineSep param

### What changes were proposed in this pull request?

Add a new line to the `lineSep` parameter so that the doc renders correctly.

### Why are the changes needed?

> ![Rendered docstring for pyspark.sql.DataFrameWriter.json](https://user-images.githubusercontent.com/8269566/114631408-5c608900-9c71-11eb-8ded-ae1e21ae48b2.png)

The first line of the description is part of the signature and is **bolded**.

### Does this PR introduce _any_ user-facing change?

Yes, it changes how the docs for `pyspark.sql.DataFrameWriter.json` are rendered.

### How was this patch tested?

I didn't test it; I don't have the doc rendering tool chain on my machine, but the change is obvious.

Closes #32153 from AlexMooney/patch-1.

Authored-by: Alex Mooney <alexmooney@fastmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: faa928c)
The file was modified python/pyspark/sql/readwriter.py (diff)
Commit 700aa1769cdf472cfbff66ea9b8e67dcf20d0f02 by gurwls223
[SPARK-35050][DOCS][MESOS] Document deprecation of Apache Mesos in 3.2.0

### What changes were proposed in this pull request?

Deprecate Apache Mesos support for Spark 3.2.0 by adding documentation to this effect.

### Why are the changes needed?

Apache Mesos is ceasing development (https://lists.apache.org/thread.html/rab2a820507f7c846e54a847398ab20f47698ec5bce0c8e182bfe51ba%40%3Cdev.mesos.apache.org%3E); at some point we'll want to drop support, so deprecate it now.

This doesn't mean it'll go away in 3.3.0.

### Does this PR introduce _any_ user-facing change?

No, docs only.

### How was this patch tested?

N/A

Closes #32150 from srowen/SPARK-35050.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 700aa17)
The file was modified docs/core-migration-guide.md (diff)
The file was modified docs/index.md (diff)
The file was modified docs/cluster-overview.md (diff)
The file was modified docs/running-on-mesos.md (diff)
Commit cd1e8e8158d8d1e2b1a7b4c65020b8b69d209028 by gurwls223
[SPARK-35033][PYTHON] Port Koalas plot unit tests into PySpark

### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas plot unit tests to PySpark.

### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested fully. We should enable the plot unit tests.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Enable plot unit tests.

Closes #32151 from xinrong-databricks/port.plot_tests.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: cd1e8e8)
The file was added python/pyspark/pandas/tests/plot/test_frame_plot_matplotlib.py
The file was added python/pyspark/pandas/tests/plot/test_frame_plot.py
The file was added python/pyspark/pandas/tests/plot/test_series_plot.py
The file was modified dev/sparktestsupport/modules.py (diff)
The file was added python/pyspark/pandas/tests/plot/test_series_plot_plotly.py
The file was modified python/pyspark/pandas/testing/utils.py (diff)
The file was added python/pyspark/pandas/tests/plot/__init__.py
The file was added python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py
The file was added python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
Commit 2974b70d1efd4b1c5cfe7e2467766f0a9a1fec82 by gurwls223
[SPARK-35048][INFRA] Distribute GitHub Actions workflows to fork repositories to share the resources

### What changes were proposed in this pull request?

This PR proposes to leverage the GitHub Actions resources from the forked repositories instead of using the resources in the ASF organisation at GitHub.

This is how it works:

1. "Build and test" (`build_and_test.yml`)  triggers a build on any commit on any branch (except `branch-*.*`), which roughly means:
    - The original repository will trigger the build on any commits in `master` branch
    - The forked repository will trigger the build on any commit in any branch.
2. The build triggered in the forked repository will check out the original repository's `master` branch locally, and merge the branch from the forked repository into the original repository's `master` branch locally.
  Therefore, the tests in the forked repository will run after being sync'ed with the original repository's `master` branch.
3. In the original repository, it triggers a workflow that detects the workflow triggered in the forked repository and adds a comment to the PR pointing to the workflow in the forked repository.

In short, please see this example: HyukjinKwon#34

1. You create a PR and your repository triggers the workflow. Your PR uses the resources allocated to you for testing.
2. Apache Spark repository finds your workflow, and links it in a comment in your PR

**NOTE** that we will still run the tests in the original repository for each commit pushed to the `master` branch. This distributes the workflows only for PRs.

### Why are the changes needed?

ASF shares the resources across all the ASF projects, which slows development down.
Please see also:
- Discussion in the builds.a.o mailing list: https://lists.apache.org/x/thread.html/r48d079eeff292254db22705c8ef8618f87ff7adc68d56c4e5d0b4105%3Cbuilds.apache.org%3E
- Infra ticket: https://issues.apache.org/jira/browse/INFRA-21646

By distributing the workflows to use author's resources, we can get around this issue.

### Does this PR introduce _any_ user-facing change?

No, this is a dev-only change.

### How was this patch tested?

Manually tested at https://github.com/HyukjinKwon/spark/pull/34 and https://github.com/HyukjinKwon/spark/pull/33.

Closes #32092 from HyukjinKwon/poc-fork-resources.

Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 2974b70)
The file was modified dev/run-tests.py (diff)
The file was modified .github/workflows/build_and_test.yml (diff)
The file was added .github/workflows/notify_test_workflow.yml
Commit 47d62af2a92c35d4b2d3dd4c18d6d0c038938e01 by gurwls223
[SPARK-35035][PYTHON] Port Koalas internal implementation unit tests into PySpark

### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas internal implementation unit tests to PySpark.

### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested fully. We should enable the internal implementation unit tests.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Enable internal implementation unit tests.

Closes #32137 from xinrong-databricks/port.test_internal_impl.

Lead-authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Co-authored-by: xinrong-databricks <47337188+xinrong-databricks@users.noreply.github.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 47d62af)
The file was added python/pyspark/pandas/tests/test_typedef.py
The file was added python/pyspark/pandas/tests/test_internal.py
The file was added python/pyspark/pandas/tests/test_default_index.py
The file was added python/pyspark/pandas/tests/test_numpy_compat.py
The file was added python/pyspark/pandas/tests/test_utils.py
The file was modified dev/sparktestsupport/modules.py (diff)
The file was added python/pyspark/pandas/tests/test_config.py
The file was added python/pyspark/pandas/tests/test_extension.py
Commit 4ae57d5b3a1820b85b6cb7003f27f7660d00eb10 by gurwls223
[SPARK-35039][PYTHON] Remove PySpark version dependent codes

### What changes were proposed in this pull request?

Removes PySpark-version-dependent code from the `pyspark.pandas` main code.

### Why are the changes needed?

There are several places that check the PySpark version and switch the logic, but those are no longer necessary.
We should remove them.

We will do the same thing after we finish porting tests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #32138 from ueshin/issues/SPARK-35039/pyspark_version.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 4ae57d5)
The file was modified python/pyspark/pandas/utils.py (diff)
The file was modified python/pyspark/pandas/spark/functions.py (diff)
The file was modified python/pyspark/pandas/internal.py (diff)
The file was modified python/pyspark/pandas/groupby.py (diff)
The file was modified python/pyspark/pandas/accessors.py (diff)
The file was modified python/pyspark/pandas/spark/accessors.py (diff)
The file was modified python/pyspark/pandas/__init__.py (diff)
The file was modified python/pyspark/pandas/frame.py (diff)
The file was modified python/pyspark/pandas/generic.py (diff)
The file was modified python/pyspark/pandas/indexes/multi.py (diff)
The file was modified python/pyspark/pandas/series.py (diff)
The file was modified python/pyspark/pandas/namespace.py (diff)
Commit 0fc97b5bf44e4f9f0faf011a2d12e11af1cdb972 by wenchen
[SPARK-34577][SQL][FOLLOWUP] Add change of `DESC NAMESPACE`'s schema to migration guide

### What changes were proposed in this pull request?
Add change of `DESC NAMESPACE`'s schema to migration guide

### Why are the changes needed?
Update doc

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #32155 from AngersZhuuuu/SPARK-34577-followup.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 0fc97b5)
The file was modified docs/sql-migration-guide.md (diff)
Commit 816f6dd13eb35908bab8f1524c7629a5c6d585c6 by wenchen
[SPARK-34527][SQL] Resolve duplicated common columns from USING/NATURAL JOIN

### What changes were proposed in this pull request?

Adds the duplicated common columns as hidden columns to the Projection used to rewrite NATURAL/USING JOINs.

### Why are the changes needed?

Allows users to resolve either side of the NATURAL/USING JOIN's common keys.
Previously, the user could only resolve the following columns:

| Join type | Left key columns | Right key columns |
| --- | --- | --- |
| Inner | Yes | No |
| Left | Yes | No |
| Right | No | Yes |
| Outer | No | No |

### Does this PR introduce _any_ user-facing change?

Yes. The user can now symmetrically resolve the common columns from a NATURAL/USING JOIN.
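
A hedged example of the new behavior (assuming a Spark shell with tables `t1` and `t2` that share a `key` column):

```scala
// Both qualified copies of the USING column now resolve, even for a FULL
// OUTER join where previously neither side's key column was resolvable.
spark.sql("""
  SELECT t1.key, t2.key
  FROM t1 FULL OUTER JOIN t2 USING (key)
""").show()
```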

### How was this patch tested?

SQL-side tests. The behavior matches PostgreSQL and MySQL.

Closes #31666 from karenfeng/spark-34527.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 816f6dd)
The file was added sql/core/src/test/resources/sql-tests/inputs/using-join.sql
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/natural-join.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/natural-join.sql (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala (diff)
The file was added sql/core/src/test/resources/sql-tests/results/using-join.sql.out
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/AnalysisHelper.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala (diff)
Commit b5241c97b17a1139a4ff719bfce7f68aef094d95 by wenchen
[SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed

### What changes were proposed in this pull request?

This PR proposes to introduce the `AnalysisOnlyCommand` trait such that a command that extends this trait can have its children only analyzed, but not optimized. There is a corresponding analysis rule `HandleAnalysisOnlyCommand` that marks the command as analyzed after all other analysis rules are run.

This can be useful if a logical plan has children where they need to be only analyzed, but not optimized - e.g., `CREATE VIEW` or `CACHE TABLE AS`. This also addresses the issue found in #31933.

This PR also updates `CreateViewCommand`, `CacheTableAsSelect`, and `AlterViewAsCommand` to use the new trait / rule such that their children are only analyzed.
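
A hedged sketch of the trait's shape (member names approximate; the actual definition is in the `Command.scala` change listed below):

```scala
import org.apache.spark.sql.catalyst.plans.logical.{Command, LogicalPlan}

// While unanalyzed, the command exposes its children so the analyzer resolves
// them; after HandleAnalysisOnlyCommand marks it analyzed, children becomes
// Nil, so the optimizer and later phases never touch the resolved plans.
trait AnalysisOnlyCommandSketch extends Command {
  val isAnalyzed: Boolean
  def childrenToAnalyze: Seq[LogicalPlan]
  override def children: Seq[LogicalPlan] =
    if (isAnalyzed) Nil else childrenToAnalyze
  def markAsAnalyzed(): LogicalPlan
}
```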

### Why are the changes needed?

To address the issue where the plan is unnecessarily re-analyzed in `CreateViewCommand`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests should cover the changes.

Closes #32032 from imback82/skip_transform.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: b5241c9)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/explain.sql.out (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
Commit a153efa643dcb1d8e6c2242846b3db0b2be39ae7 by yumwang
[SPARK-35002][YARN][TESTS][FOLLOW-UP] Fix java.net.BindException in MiniYARNCluster

### What changes were proposed in this pull request?

This PR fixes two tests below:

https://github.com/apache/spark/runs/2320161984

```
[info] YarnShuffleIntegrationSuite:
[info] org.apache.spark.deploy.yarn.YarnShuffleIntegrationSuite *** ABORTED *** (228 milliseconds)
[info]   org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
[info]   at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:373)
[info]   at org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:128)
[info]   at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:503)
[info]   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
[info]   at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
[info]   at org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:322)
[info]   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
[info]   at org.apache.spark.deploy.yarn.BaseYarnClusterSuite.beforeAll(BaseYarnClusterSuite.scala:95)
...
[info]   Cause: java.net.BindException: Port in use: fv-az186-831:0
[info]   at org.apache.hadoop.http.HttpServer2.constructBindException(HttpServer2.java:1231)
[info]   at org.apache.hadoop.http.HttpServer2.bindForSinglePort(HttpServer2.java:1253)
[info]   at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:1316)
[info]   at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:1167)
[info]   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:449)
[info]   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:1247)
[info]   at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1356)
[info]   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
[info]   at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:365)
[info]   at org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:128)
[info]   at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:503)
[info]   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
[info]   at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
[info]   at org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:322)
[info]   at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
[info]   at org.apache.spark.deploy.yarn.BaseYarnClusterSuite.beforeAll(BaseYarnClusterSuite.scala:95)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:61)
...
```

https://github.com/apache/spark/runs/2323342094

```
[info] Test org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testBadSecret started
[error] Test org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testBadSecret failed: java.lang.AssertionError: Connecting to /10.1.0.161:39895 timed out (120000 ms), took 120.081 sec
[error]     at org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testBadSecret(ExternalShuffleSecuritySuite.java:85)
[error]     ...
[info] Test org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testBadAppId started
[error] Test org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testBadAppId failed: java.lang.AssertionError: Connecting to /10.1.0.198:44633 timed out (120000 ms), took 120.08 sec
[error]     at org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testBadAppId(ExternalShuffleSecuritySuite.java:76)
[error]     ...
[info] Test org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testValid started
[error] Test org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testValid failed: java.io.IOException: Connecting to /10.1.0.119:43575 timed out (120000 ms), took 120.089 sec
[error]     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:285)
[error]     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
[error]     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
[error]     at org.apache.spark.network.shuffle.ExternalBlockStoreClient.registerWithShuffleServer(ExternalBlockStoreClient.java:211)
[error]     at org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.validate(ExternalShuffleSecuritySuite.java:108)
[error]     at org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testValid(ExternalShuffleSecuritySuite.java:68)
[error]     ...
[info] Test org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testEncryption started
[error] Test org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testEncryption failed: java.io.IOException: Connecting to /10.1.0.248:35271 timed out (120000 ms), took 120.014 sec
[error]     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:285)
[error]     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
[error]     at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
[error]     at org.apache.spark.network.shuffle.ExternalBlockStoreClient.registerWithShuffleServer(ExternalBlockStoreClient.java:211)
[error]     at org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.validate(ExternalShuffleSecuritySuite.java:108)
[error]     at org.apache.spark.network.shuffle.ExternalShuffleSecuritySuite.testEncryption(ExternalShu
```

For the YARN cluster suites, this is difficult to fix properly, so this PR skips the tests if the cluster fails to bind.
For the shuffle-related suites, this PR uses the local host.
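
A hedged sketch of the skip logic for the YARN suites (a ScalaTest skeleton; the cluster handle and details are illustrative, and the real change is in `BaseYarnClusterSuite.scala`):

```scala
import java.net.BindException
import org.apache.hadoop.yarn.server.MiniYARNCluster
import org.scalatest.BeforeAndAfterAll
import org.scalatest.funsuite.AnyFunSuite

abstract class YarnClusterSuiteSketch extends AnyFunSuite with BeforeAndAfterAll {
  protected def yarnCluster: MiniYARNCluster // provided by concrete suites

  // MiniYARNCluster wraps the BindException in a YarnRuntimeException, so
  // walk the cause chain to find it.
  private def hasBindException(t: Throwable): Boolean =
    Iterator.iterate(t)(_.getCause).takeWhile(_ != null)
      .exists(_.isInstanceOf[BindException])

  override def beforeAll(): Unit = {
    super.beforeAll()
    try yarnCluster.start()
    catch {
      case e: Throwable if hasBindException(e) =>
        cancel("MiniYARNCluster failed to bind a port on this machine", e)
    }
  }
}
```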

### Why are the changes needed?

To make the tests stable

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

It's tested in GitHub Actions: https://github.com/HyukjinKwon/spark/runs/2340210765

Closes #32126 from HyukjinKwon/SPARK-35002-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(commit: a153efa)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/TestUtils.java (diff)
Commit 4ca99582701c67f2492f6f8e4a7def4fa2d9ce49 by wenchen
[SPARK-35069][SQL] TRANSFORM forbids `DISTINCT` and `ALL`, and makes the error clear

### What changes were proposed in this pull request?
According to https://github.com/apache/spark/pull/29087#discussion_r612267050, this adds a UT in `transform.sql`.

It seems that `distinct` is not recognized as a reserved word here:

```
-- !query
explain extended SELECT TRANSFORM(distinct b, a, c)
                   USING 'cat' AS (a, b, c)
                 FROM script_trans
                 WHERE a <= 4
-- !query schema
struct<plan:string>
-- !query output
== Parsed Logical Plan ==
'ScriptTransformation [*], cat, [a#x, b#x, c#x], ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false)
+- 'Project ['distinct AS b#x, 'a, 'c]
   +- 'Filter ('a <= 4)
      +- 'UnresolvedRelation [script_trans], [], false

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: cannot resolve 'distinct' given input columns: [script_trans.a, script_trans.b, script_trans.c]; line 1 pos 34;
'ScriptTransformation [*], cat, [a#x, b#x, c#x], ScriptInputOutputSchema(List(),List(),None,None,List(),List(),None,None,false)
+- 'Project ['distinct AS b#x, a#x, c#x]
   +- Filter (a#x <= 4)
      +- SubqueryAlias script_trans
         +- View (`script_trans`, [a#x,b#x,c#x])
            +- Project [cast(a#x as int) AS a#x, cast(b#x as int) AS b#x, cast(c#x as int) AS c#x]
               +- Project [a#x, b#x, c#x]
                  +- SubqueryAlias script_trans
                     +- LocalRelation [a#x, b#x, c#x]
```

Hive's error
![image](https://user-images.githubusercontent.com/46485123/114533170-355d8380-9c80-11eb-992f-982f0b296759.png)

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT.

Closes #32149 from AngersZhuuuu/SPARK-28227-new-followup.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 4ca9958)
The file was modified sql/core/src/test/resources/sql-tests/inputs/transform.sql (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/transform.sql.out (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
Commit 3e218ade9cf6becc5de8b20a4385e345021a509d by dhyun
[SPARK-35061][BUILD] Upgrade pycodestyle from 2.6.0 to 2.7.0

### What changes were proposed in this pull request?

This PR bumps up the version of pycodestyle from 2.6.0 to 2.7.0 released a month ago.

### Why are the changes needed?

2.7.0 includes three major fixes below (see https://readthedocs.org/projects/pycodestyle/downloads/pdf/latest/):

- Fix physical checks (such as W191) at end of file. PR #961.
- Add --indent-size option (defaulting to 4). PR #970.
- W605: fix escaped crlf false positive on windows. PR #976

The first and third ones could be useful for developers to detect style issues.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested locally.

Closes #32160 from HyukjinKwon/SPARK-35061.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 3e218ad)
The file was modified dev/lint-python (diff)
Commit de9e8b6c940354c5a9b669a75acca6333c2c6399 by max.gekk
[SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

### What changes were proposed in this pull request?
Support `date +/- day-time interval`. In the PR, I propose to update the binary arithmetic rules, and cast an input date to a timestamp at the session time zone, and then add a day-time interval to it.

### Why are the changes needed?
1. To conform to the ANSI SQL standard, which requires supporting such operations over dates and intervals:
<img width="811" alt="Screenshot 2021-03-12 at 11 36 14" src="https://user-images.githubusercontent.com/1580697/111081674-865d4900-8515-11eb-86c8-3538ecaf4804.png">
2. To fix a regression compared to the recent Spark 3.1 release with default settings.

Before the changes:
```sql
spark-sql> select date'now' + (timestamp'now' - timestamp'yesterday');
Error in query: cannot resolve 'DATE '2021-04-14' + subtracttimestamps(TIMESTAMP '2021-04-14 18:14:56.497', TIMESTAMP '2021-04-13 00:00:00')' due to data type mismatch: argument 1 requires timestamp type, however, 'DATE '2021-04-14'' is of date type.; line 1 pos 7;
'Project [unresolvedalias(cast(2021-04-14 + subtracttimestamps(2021-04-14 18:14:56.497, 2021-04-13 00:00:00, false, Some(Europe/Moscow)) as date), None)]
+- OneRowRelation
```

Spark 3.1:
```sql
spark-sql> select date'now' + (timestamp'now' - timestamp'yesterday');
2021-04-15
```

Hive:
```sql
0: jdbc:hive2://localhost:10000/default> select date'2021-04-14' + (timestamp'2020-04-14 18:15:30' - timestamp'2020-04-13 00:00:00');
+------------------------+
|          _c0           |
+------------------------+
| 2021-04-15 18:15:30.0  |
+------------------------+
```

### Does this PR introduce _any_ user-facing change?
It should not, since the new interval types have not been released yet.

After the changes:
```sql
spark-sql> select date'now' + (timestamp'now' - timestamp'yesterday');
2021-04-15 18:13:16.555
```

### How was this patch tested?
By running new tests:
```
$ build/sbt "test:testOnly *ColumnExpressionSuite"
```

Closes #32170 from MaxGekk/date-add-day-time-interval.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: de9e8b6)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala (diff)
Commit 9244066ca69a3fb5d7fe446ce0e19d108892c49d by srowen
[SPARK-33882][ML] Add a vectorized BLAS implementation

### What changes were proposed in this pull request?

This patch introduces a `VectorizedBLAS` class which implements hardware-accelerated BLAS operations. This feature is hidden behind the "vectorized" profile that you can enable by passing "-Pvectorized" to sbt or maven.

The Vector API has been introduced in JDK 16. Following discussion on the mailing list, this API is introduced transparently and needs to be enabled explicitly.
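
For flavor, here is a hedged daxpy (`y := alpha * x + y`) sketch against the JDK 16 Vector API, written in Scala; it is illustrative, not the `VectorizedBLAS` code, and needs `--add-modules jdk.incubator.vector`:

```scala
import jdk.incubator.vector.{DoubleVector, VectorSpecies}

object DaxpySketch {
  private val SPECIES: VectorSpecies[java.lang.Double] = DoubleVector.SPECIES_PREFERRED

  def daxpy(n: Int, alpha: Double, x: Array[Double], y: Array[Double]): Unit = {
    var i = 0
    val bound = SPECIES.loopBound(n)
    while (i < bound) { // vectorized main loop, SPECIES.length() lanes at a time
      val xv = DoubleVector.fromArray(SPECIES, x, i)
      val yv = DoubleVector.fromArray(SPECIES, y, i)
      xv.mul(alpha).add(yv).intoArray(y, i)
      i += SPECIES.length()
    }
    while (i < n) { // scalar tail for the remaining elements
      y(i) += alpha * x(i)
      i += 1
    }
  }
}
```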

### Why are the changes needed?

Whenever a native BLAS implementation isn't available on the system, Spark automatically falls back to a Java implementation. With the recent release of the Vector API in the OpenJDK [1], we can use hardware acceleration for such operations.

This change was also discussed on the mailing list. [2]

### Does this PR introduce _any_ user-facing change?

It introduces a build-time profile called `vectorized`. You can pass it to sbt and mvn with `-Pvectorized`. There is no change for end users of Spark, and it should only impact Spark developers. It is also disabled by default.

### How was this patch tested?

It passes `build/sbt mllib-local/test` with and without `-Pvectorized` with JDK 16. This patch also introduces benchmarks for BLAS.

The benchmark results are as follows:

```
[info] daxpy:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  37             37           0        271.5           3.7       1.0X
[info] vector                                               24             25           4        416.1           2.4       1.5X
[info]
[info] ddot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  70             70           0        143.2           7.0       1.0X
[info] vector                                               35             35           2        288.7           3.5       2.0X
[info]
[info] sdot:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  50             51           1        199.8           5.0       1.0X
[info] vector                                               15             15           0        648.7           1.5       3.2X
[info]
[info] dscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  34             34           0        295.6           3.4       1.0X
[info] vector                                               19             19           0        531.2           1.9       1.8X
[info]
[info] sscal:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  25             25           1        399.0           2.5       1.0X
[info] vector                                                8              9           1       1177.3           0.8       3.0X
[info]
[info] dgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  27             27           0          0.0       26651.5       1.0X
[info] vector                                               21             21           0          0.0       20646.3       1.3X
[info]
[info] dgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  36             36           0          0.0       35501.4       1.0X
[info] vector                                               22             22           0          0.0       21930.3       1.6X
[info]
[info] sgemv[N]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  20             20           0          0.0       20283.3       1.0X
[info] vector                                                9              9           0          0.1        8657.7       2.3X
[info]
[info] sgemv[T]:                                 Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                  30             30           0          0.0       29845.8       1.0X
[info] vector                                               10             10           1          0.1        9695.4       3.1X
[info]
[info] dgemm[N,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 182            182           0          0.5        1820.0       1.0X
[info] vector                                              160            160           1          0.6        1597.6       1.1X
[info]
[info] dgemm[N,T]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 211            211           1          0.5        2106.2       1.0X
[info] vector                                              156            157           0          0.6        1564.4       1.3X
[info]
[info] dgemm[T,N]:                               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
[info] ------------------------------------------------------------------------------------------------------------------------
[info] f2j                                                 276            276           0          0.4        2757.8       1.0X
[info] vector                                              137            137           0          0.7        1365.1       2.0X
```

/cc srowen xkrogen

[1] https://openjdk.java.net/jeps/338
[2] https://mail-archives.apache.org/mod_mbox/spark-dev/202012.mbox/%3cDM5PR2101MB11106162BB3AF32AD29C6C79B0C69DM5PR2101MB1110.namprd21.prod.outlook.com%3e

Closes #30810 from luhenry/master.

Lead-authored-by: Ludovic Henry <luhenry@microsoft.com>
Co-authored-by: Ludovic Henry <git@ludovic.dev>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 9244066)
The file was modified mllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/ann/BreezeUtil.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala (diff)
The file was added mllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASBenchmark.scala
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala (diff)
The file was modified mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala (diff)
The file was modified mllib-local/src/main/scala/org/apache/spark/ml/linalg/Matrices.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/BucketedRandomProjectionLSH.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala (diff)
The file was modified project/SparkBuild.scala (diff)
The file was modified mllib-local/pom.xml (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala (diff)
The file was added mllib-local/src/jvm-vectorized/java/org/apache/spark/ml/linalg/VectorizedBLAS.java
Commit bf9f3b884fcd6bd3428898581d4b5dca9bae6538 by srowen
[SPARK-34834][NETWORK] Fix a potential Netty memory leak in TransportResponseHandler

### What changes were proposed in this pull request?
There is a potential Netty memory leak in TransportResponseHandler.

### Why are the changes needed?
Fix a potential Netty memory leak in TransportResponseHandler.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
NO

Closes #31942 from weixiuli/SPARK-34834.

Authored-by: weixiuli <weixiuli@jd.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: bf9f3b8)
The file was modified common/network-common/src/main/java/org/apache/spark/network/client/TransportResponseHandler.java (diff)
Commit f32114d17e1c022817a16c83f33138a1b8faa7c6 by yao
[SPARK-35044][SQL] `SET propertyKey` shall also lookup `sparkSession.sharedState.hadoopConf` to display the effective default hive/hadoop configs

### What changes were proposed in this pull request?

Currently, pure SQL users have few ways to see the Hadoop configurations that may affect their jobs a lot; they are only able to get the Hadoop configs that exist in `SQLConf`, while other defaults in `SharedState.hadoopConf` display wrongly and confusingly as `<undefined>`.

The ones pre-loaded from `core-site.xml`, `hive-site.xml`, etc. will only stay in `sparkSession.sharedState.hadoopConf` or `sc._hadoopConfiguration`, not in `SQLConf`. Some of them, related to the Hive Metastore connection (never changed at Spark runtime), e.g. `hive.metastore.uris`, are clearly global, static, and unchangeable, but still worth displaying. Others, for example those related to the output codec/compression, are preset in Hadoop/Hive config files like `core-site.xml` but should remain changeable from case to case, table to table, file to file, etc. It is meaningful to show such defaults so users can change them accordingly.

In this PR, I propose to support getting a Hadoop configuration via the SET syntax, for example:
```
SET mapreduce.map.output.compress.codec;
```
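
A hedged sketch of the resulting lookup order (simplified; the real change is in `SetCommand.scala`, and `sharedState` is package-private in practice):

```scala
import org.apache.spark.sql.SparkSession

// Try SQLConf first, then fall back to the shared Hadoop configuration
// instead of reporting <undefined> for keys loaded from core-site.xml etc.
def lookupConf(session: SparkSession, key: String): String =
  session.conf.getOption(key)
    .orElse(Option(session.sharedState.hadoopConf.get(key)))
    .getOrElse("<undefined>")
```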

### Why are the changes needed?

Better user experience for pure SQL users.

### Does this PR introduce _any_ user-facing change?

Yes. When retrieving a conf that only exists in `sessionState.hadoopConf`, it previously displayed `<undefined>`; now you can see its value.

### How was this patch tested?

New test.

Closes #32144 from yaooqinn/SPARK-35044.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
(commit: f32114d)
The file was modified sql/core/src/test/resources/hive-site.xml (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/SetCommand.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
Commit 31555f777971197f31279ce4cff22de11670255a by mszymkiewicz
[SPARK-34630][PYTHON][FOLLOWUP] Add __version__ into pyspark init __all__

### What changes were proposed in this pull request?
This patch adds `__version__` into `pyspark.__init__.__all__` to export `__version__` explicitly; see more in https://github.com/apache/spark/pull/32110#issuecomment-817331896

### Why are the changes needed?
1. Make `__version__` exported explicitly.
2. Clean up `noqa: F401` on `__version__`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Python-related CI passed.

Closes #32125 from Yikun/SPARK-34629-Follow.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: zero323 <mszymkiewicz@gmail.com>
(commit: 31555f7)
The file was modified python/pyspark/__init__.pyi (diff)
The file was modified python/pyspark/__init__.py (diff)
Commit 58feb8514585d494bcfe75009c5fee0325a97b30 by gurwls223
[SPARK-35034][PYTHON] Port Koalas miscellaneous unit tests into PySpark

### What changes were proposed in this pull request?
Now that we merged the Koalas main code into the PySpark code base (#32036), we should port the Koalas miscellaneous unit tests to PySpark.

### Why are the changes needed?
Currently, the pandas-on-Spark modules are not tested fully. We should enable miscellaneous unit tests.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Enable miscellaneous unit tests.

Closes #32152 from xinrong-databricks/port.misc_tests.

Lead-authored-by: xinrong-databricks <47337188+xinrong-databricks@users.noreply.github.com>
Co-authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 58feb85)
The file was added python/pyspark/pandas/tests/test_namespace.py
The file was added python/pyspark/pandas/tests/test_groupby.py
The file was added python/pyspark/pandas/tests/test_indexing.py
The file was added python/pyspark/pandas/tests/test_csv.py
The file was added python/pyspark/pandas/tests/test_sql.py
The file was modified dev/sparktestsupport/modules.py (diff)
The file was added python/pyspark/pandas/tests/test_rolling.py
The file was added python/pyspark/pandas/tests/test_stats.py
The file was added python/pyspark/pandas/tests/test_reshape.py
The file was added python/pyspark/pandas/tests/test_categorical.py
The file was added python/pyspark/pandas/tests/test_repr.py
The file was added python/pyspark/pandas/tests/test_window.py
The file was added python/pyspark/pandas/tests/test_expanding.py