Changes

Summary

  1. [SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config (commit: 4f309ce) (details)
  2. [SPARK-35152][SQL] ANSI mode: IntegralDivide throws exception on overflow (commit: 43ad939) (details)
  3. [SPARK-35142][PYTHON][ML] Fix incorrect return type for rawPredictionUDF in OneVsRestModel (commit: b6350f5) (details)
  4. [SPARK-35171][R] Declare the markdown package as a dependency of the SparkR package (commit: 8e9e700) (details)
  5. [SPARK-35140][INFRA] Add error message guidelines to PR template (commit: 355c399) (details)
  6. [SPARK-34692][SQL] Support Not(Int) and Not(InSet) propagate null in predicate (commit: 81dbaed) (details)
  7. [SPARK-34897][SQL] Support reconcile schemas based on index after nested column pruning (commit: e609395) (details)
  8. [SPARK-35178][BUILD] Use new Apache 'closer.lua' syntax to obtain Maven (commit: 6860efe) (details)
  9. [SPARK-34692][SQL][FOLLOWUP] Add INSET to ReplaceNullWithFalseInPredicate's pattern (commit: 548e66c) (details)
  10. [SPARK-34674][CORE][K8S] Close SparkContext after the Main method has finished (commit: b17a0e6) (details)
  11. [SPARK-35177][SQL] Fix arithmetic overflow in parsing the minimal interval by IntervalUtils.fromYearMonthString (commit: bb5459f) (details)
  12. [SPARK-35180][BUILD] Allow to build SparkR with SBT (commit: c0972de) (details)
  13. [SPARK-35127][UI] When we switch between different stage-detail pages, the entry item in the newly-opened page may be blank (commit: 7242d7f) (details)
  14. [SPARK-35026][SQL] Support nested CUBE/ROLLUP/GROUPING SETS in GROUPING SETS (commit: b22d54a) (details)
  15. [SPARK-35183][SQL] Use transformAllExpressions in CombineConcats (commit: 7f7a3d8) (details)
  16. [SPARK-35110][SQL] Handle ANSI intervals in WindowExecBase (commit: 6c587d2) (details)
  17. [SPARK-35187][SQL] Fix failure on the minimal interval literal (commit: 04e2305) (details)
  18. [SPARK-34999][PYTHON] Consolidate PySpark testing utils (commit: 4d2b559) (details)
  19. [SPARK-35182][K8S] Support driver-owned on-demand PVC (commit: 6ab0048) (details)
  20. [SPARK-35040][PYTHON] Remove Spark-version related codes from test codes (commit: 4fcbf59) (details)
  21. [SPARK-35075][SQL] Add traversal pruning for subquery related rules (commit: 47f8687) (details)
  22. [SPARK-35195][SQL][TEST] Move InMemoryTable etc to org.apache.spark.sql.connector.catalog (commit: 86238d0) (details)
  23. [SPARK-35141][SQL] Support two level of hash maps for final hash aggregation (commit: cab205e) (details)
  24. [SPARK-35143][SQL][SHELL] Add default log level config for spark-sql (commit: 7582dc8) (details)
  25. [SPARK-35159][SQL][DOCS] Extract hive format doc (commit: 20d68dc) (details)
Commit 4f309cec07b1b1e5f7cddb0f98598fb1a234c2bd by wenchen
[SPARK-35096][SQL] SchemaPruning should adhere spark.sql.caseSensitive config

### What changes were proposed in this pull request?

As part of SPARK-26837, pruning of nested fields from object serializers is supported, but handling Spark's case-sensitivity setting was missed.

In this PR the column names to be pruned are resolved based on the `spark.sql.caseSensitive` config.
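
For illustration, a hypothetical sketch of the kind of query affected (class and column names are made up; this is not the PR's exact reproduction):

```scala
// With the default spark.sql.caseSensitive=false, nested-field references that differ
// only in case from the case-class schema must still be matched during serializer pruning.
case class Inner(ID: Int, name: String)
case class Outer(key: Int, inner: Inner)

import spark.implicits._
val ds = Seq(Outer(1, Inner(10, "a"))).toDS()
ds.map(_.inner).select("id").show()   // "id" written in lower case vs. the declared field "ID"
```
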
**Exception Before Fix**

```
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
  at org.apache.spark.sql.types.StructType.apply(StructType.scala:414)
  at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.$anonfun$applyOrElse$3(objects.scala:216)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at scala.collection.TraversableLike.map(TraversableLike.scala:238)
  at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
  at scala.collection.immutable.List.map(List.scala:298)
  at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:215)
  at org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning$$anonfun$apply$4.applyOrElse(objects.scala:203)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309)
  at
```

### Why are the changes needed?
After upgrading to Spark 3, the `foreachBatch` API throws `java.lang.ArrayIndexOutOfBoundsException`. This PR fixes that issue.

### Does this PR introduce _any_ user-facing change?
No. In fact, this fixes a regression.

### How was this patch tested?
Added tests and also verified manually.

Closes #32194 from sandeep-katta/SPARK-35096.

Authored-by: sandeep.katta <sandeep.katta2007@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 4f309ce)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala (diff)
Commit 43ad939a7e96744038b293b17682b2d86be2bcb9 by ltnwgl
[SPARK-35152][SQL] ANSI mode: IntegralDivide throws exception on overflow

### What changes were proposed in this pull request?

IntegralDivide should throw an exception on overflow in ANSI mode.
There is only one case that can cause that:
```
Long.MinValue div -1
```
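
For illustration, a minimal sketch of that case (assuming a `SparkSession` named `spark` with ANSI mode enabled via `spark.sql.ansi.enabled`):

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")

import spark.implicits._
// Long.MinValue div -1 cannot be represented as a Long, so with this change the query
// fails with an overflow error in ANSI mode instead of silently wrapping around.
Seq(Long.MinValue).toDF("v").selectExpr("v div -1L").show()
```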

### Why are the changes needed?

ANSI compliance

### Does this PR introduce _any_ user-facing change?

Yes, IntegralDivide throws an exception on overflow in ANSI mode

### How was this patch tested?

Unit test

Closes #32260 from gengliangwang/integralDiv.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: 43ad939)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala (diff)
Commit b6350f5bb00f99a060953850b069a419b70c329e by weichen.xu
[SPARK-35142][PYTHON][ML] Fix incorrect return type for `rawPredictionUDF` in `OneVsRestModel`

### What changes were proposed in this pull request?

Fixes incorrect return type for `rawPredictionUDF` in `OneVsRestModel`.

### Why are the changes needed?
Bugfix

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test.

Closes #32245 from harupy/SPARK-35142.

Authored-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
(commit: b6350f5)
The file was modified python/pyspark/ml/classification.py (diff)
The file was modified python/pyspark/ml/tests/test_algorithms.py (diff)
Commit 8e9e70045b02532a5bfc900581154cbc84f5ad47 by gurwls223
[SPARK-35171][R] Declare the markdown package as a dependency of the SparkR package

### What changes were proposed in this pull request?
Declare the markdown package as a dependency of the SparkR package

### Why are the changes needed?
If pandoc is not installed locally, running make-distribution.sh fails with the following message:
```
— re-building ‘sparkr-vignettes.Rmd’ using rmarkdown
Warning in engine$weave(file, quiet = quiet, encoding = enc) :
Pandoc (>= 1.12.3) not available. Falling back to R Markdown v1.
Error: processing vignette 'sparkr-vignettes.Rmd' failed with diagnostics:
The 'markdown' package should be declared as a dependency of the 'SparkR' package (e.g., in the 'Suggests' field of DESCRIPTION), because the latter contains vignette(s) built with the 'markdown' package. Please see https://github.com/yihui/knitr/issues/1864 for more information.
— failed re-building ‘sparkr-vignettes.Rmd’
```

### Does this PR introduce _any_ user-facing change?
Yes. Workaround for R packaging.

### How was this patch tested?
Manually tested. After the fix, the command `sh dev/make-distribution.sh -Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn` passes in an environment without pandoc.

Closes #32270 from xuanyuanking/SPARK-35171.

Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 8e9e700)
The file was modified R/pkg/DESCRIPTION (diff)
Commit 355c39939d9e4c87ffc9538eb822a41cb2ff93fb by gurwls223
[SPARK-35140][INFRA] Add error message guidelines to PR template

### What changes were proposed in this pull request?

Adds a link to the [error message guidelines](https://spark.apache.org/error-message-guidelines.html) to the PR template to increase visibility.

### Why are the changes needed?

Increases visibility of the error message guidelines, which are otherwise hidden in the [Contributing guidelines](https://spark.apache.org/contributing.html).

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Not needed.

Closes #32241 from karenfeng/spark-35140.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 355c399)
The file was modified .github/PULL_REQUEST_TEMPLATE (diff)
Commit 81dbaededeb450ba1b8c96af05dcafdfad928d0a by wenchen
[SPARK-34692][SQL] Support Not(Int) and Not(InSet) propagate null in predicate

### What changes were proposed in this pull request?

* Add `Not(In)` and `Not(InSet)` check in `NullPropagation` rule.
* Add more test for `In` and `Not(In)` in `Project` level.

### Why are the changes needed?

The semantics of `Not(In)` can be seen as `And(a != b, a != c)`, which matches `NullIntolerant`.

We already simplify a `NullIntolerant` expression to null if any of its children is null, e.g. `a != null` => `null`, so it is safe to do the same with `Not(In)`/`Not(InSet)`.

Note that we can only do this simplification in predicates, which is what the `ReplaceNullWithFalseInPredicate` rule does.

Let's say we have two queries:
```
select 1 not in (2, null);
select 1 where 1 not in (2, null);
```
We cannot optimize the first query since it would return `NULL` instead of `false`; the second one can be optimized.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Add test.

Closes #31797 from ulysses-you/SPARK-34692.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 81dbaed)
The file was modified sql/core/src/test/resources/sql-tests/results/predicate-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/predicate-functions.sql (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala (diff)
Commit e60939591336a2c38e5b7a6a36f45b8614b46fe5 by viirya
[SPARK-34897][SQL] Support reconcile schemas based on index after nested column pruning

### What changes were proposed in this pull request?

`StructField`s are removed when [pruning nested columns](https://github.com/apache/spark/blob/0f2c0b53e8fb18c86c67b5dd679c006db93f94a5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala#L28-L42). For example:
```scala
spark.sql(
  """
    |CREATE TABLE t1 (
    |  _col0 INT,
    |  _col1 STRING,
    |  _col2 STRUCT<c1: STRING, c2: STRING, c3: STRING, c4: BIGINT>)
    |USING ORC
    |""".stripMargin)

spark.sql("INSERT INTO t1 values(1, '2', struct('a', 'b', 'c', 10L))")

spark.sql("SELECT _col0, _col2.c1 FROM t1").show
```

Before this PR, the returned schema is ``` `_col0` INT,`_col2` STRUCT<`c1`: STRING> ``` and it throws an exception:
```
java.lang.AssertionError: assertion failed: The given data schema struct<_col0:int,_col2:struct<c1:string>> has less fields than the actual ORC physical schema, no idea which columns were dropped, fail to read.
at scala.Predef$.assert(Predef.scala:223)
at org.apache.spark.sql.execution.datasources.orc.OrcUtils$.requestedColumnIds(OrcUtils.scala:160)
```

After this PR, the returned schema is ``` `_col0` INT,`_col1` STRING,`_col2` STRUCT<`c1`: STRING> ```.

The final schema is ``` `_col0` INT,`_col2` STRUCT<`c1`: STRING> ``` after the complete column pruning:
https://github.com/apache/spark/blob/7a5647a93aaea9d1d78d9262e24fc8c010db04d0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala#L208-L213

https://github.com/apache/spark/blob/e64eb75aede71a5403a4d4436e63b1fcfdeca14d/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownUtils.scala#L96-L97

### Why are the changes needed?

Fix bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #31993 from wangyum/SPARK-34897.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: e609395)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruningSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownUtils.scala (diff)
Commit 6860efe4688ea1057e7a2e57b086721e62198d44 by dhyun
[SPARK-35178][BUILD] Use new Apache 'closer.lua' syntax to obtain Maven

### What changes were proposed in this pull request?

Use new Apache 'closer.lua' syntax to obtain Maven

### Why are the changes needed?

The Apache closer.lua redirector, which build/mvn uses to download Maven from a nearby mirror, has a new syntax; build/mvn no longer works properly without this change.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing.

Closes #32277 from srowen/SPARK-35178.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 6860efe)
The file was modified build/mvn (diff)
Commit 548e66c98a5503e0d7567d688878e1612fe8ed51 by dhyun
[SPARK-34692][SQL][FOLLOWUP] Add INSET to ReplaceNullWithFalseInPredicate's pattern

### What changes were proposed in this pull request?

The test added by #31797 has the [failure](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137741/testReport/org.apache.spark.sql.catalyst.optimizer/ReplaceNullWithFalseInPredicateSuite/SPARK_34692__Support_Not_Int__and_Not_InSet__propagate_null/). This is a followup to fix it.

### Why are the changes needed?

Due to #32157, the rule `ReplaceNullWithFalseInPredicate` checks the tree pattern before actually doing the transformation. Because a `null` in `INSET` does not carry the `NULL_LITERAL` pattern, the rule misses it and fails the newly added `not inset ...` check in `replaceNullWithFalse`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit tests.

Closes #32278 from viirya/SPARK-34692-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 548e66c)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala (diff)
Commit b17a0e6931cac98cc839c047b1b5d4ea6d052009 by dhyun
[SPARK-34674][CORE][K8S] Close SparkContext after the Main method has finished

### What changes were proposed in this pull request?
Close SparkContext after the Main method has finished, to allow SparkApplication on K8S to complete.
This is a fixed version of the [merged and reverted PR](https://github.com/apache/spark/pull/32081).

### Why are the changes needed?
If sparkContext.stop() is not called explicitly, the Spark driver process does not terminate even after its main method has completed. This behaviour differs from Spark on YARN, where stopping the SparkContext manually is not required. The problem appears to be the use of non-daemon threads, which prevent the driver JVM process from terminating.
So I have inserted code that closes the SparkContext automatically.
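
For illustration, a sketch of the scenario (the application and class names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

object ExampleApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("example").getOrCreate()
    spark.range(100).count()
    // No explicit spark.stop() here: on K8s the driver used to keep running after
    // main() returned; with this change SparkSubmit closes the SparkContext automatically.
  }
}
```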

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually on the production AWS EKS environment in my company.

Closes #32283 from kotlovs/close-spark-context-on-exit-2.

Authored-by: skotlov <skotlov@joom.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: b17a0e6)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (diff)
Commit bb5459fb26b9d0d57eadee8b10b7488607eeaeb2 by max.gekk
[SPARK-35177][SQL] Fix arithmetic overflow in parsing the minimal interval by `IntervalUtils.fromYearMonthString`

### What changes were proposed in this pull request?
`IntervalUtils.fromYearMonthString` should handle Int.MinValue months correctly.
The current logic computes the total months with `Math.addExact(Math.multiplyExact(years, 12), months)`, which overflows for a negative interval whose actual total is Int.MinValue months. This PR fixes that bug.
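
A worked illustration of the overflow, using the minimal year-month interval "-178956970-8" (the variable names below are only for illustration):

```scala
val years = 178956970
val months = 8
// Math.addExact(Math.multiplyExact(years, 12), months) tries to produce +2147483648,
// which exceeds Int.MaxValue, so it throws even though the final negated value
// (-2147483648 == Int.MinValue) is representable.
// One way to avoid the overflow is to apply the sign before accumulating:
val sign = -1
val totalMonths = Math.addExact(Math.multiplyExact(sign * years, 12), sign * months)
// totalMonths == Int.MinValue, no overflow
```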

### Why are the changes needed?
IntervalUtils.fromYearMonthString should handle Int.MinValue months correctly

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT

Closes #32281 from AngersZhuuuu/SPARK-35177.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: bb5459f)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/IntervalUtilsSuite.scala (diff)
Commit c0972dec1d49417aa2ee2e18c638be8b976833c5 by hyukjinkwon
[SPARK-35180][BUILD] Allow to build SparkR with SBT

### What changes were proposed in this pull request?

This PR proposes a change that allows us to build SparkR with SBT.

### Why are the changes needed?

In the current master, SparkR can be built only with Maven.
It's helpful if we can built it with SBT.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that I can build SparkR on Ubuntu 20.04 with the following command.
```
build/sbt -Psparkr package
```

Closes #32285 from sarutak/sbt-sparkr.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
(commit: c0972de)
The file was modified project/SparkBuild.scala (diff)
The file was modified R/README.md (diff)
Commit 7242d7f774b821cbcb564c8afeddecaf6664d159 by sarutak
[SPARK-35127][UI] When we switch between different stage-detail pages, the entry item in the newly-opened page may be blank

### What changes were proposed in this pull request?

Make sure that pageSize is not shared between different stage-detail pages.
The screenshots of the problem are placed in the attachment of [JIRA](https://issues.apache.org/jira/browse/SPARK-35127)

### Why are the changes needed?
fix the bug.

According to the reference https://datatables.net/reference/option/lengthMenu, `-1` means "display all rows". But we currently use `totalTasksToShow`, which causes the page-size select item to show as empty when we switch between different stage-detail pages.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual test. It is a small UI problem; the modification does not affect functionality and is just an adjustment of the JS configuration.

the gif below shows how the problem can be reproduced:
![reproduce](https://user-images.githubusercontent.com/52202080/115204351-f7060f80-a12a-11eb-8900-a009ad0c8870.gif)

![WeChat screenshot 20210419162849](https://user-images.githubusercontent.com/52202080/115205675-629cac80-a12c-11eb-9cb8-1939c7450e99.png)

the gif below shows the result after modified:

![after_modified](https://user-images.githubusercontent.com/52202080/115204886-91fee980-a12b-11eb-9ccb-d5900a99095d.gif)

Closes #32223 from kyoty/stages-task-empty-pagesize.

Authored-by: kyoty <echohlne@gmail.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: 7242d7f)
The file was modified core/src/main/resources/org/apache/spark/ui/static/stagepage.js (diff)
Commit b22d54a58a7733ff97c3cc0474397de0f7e14a29 by wenchen
[SPARK-35026][SQL] Support nested CUBE/ROLLUP/GROUPING SETS in GROUPING SETS

### What changes were proposed in this pull request?
PostgreSQL and Oracle both support using CUBE/ROLLUP/GROUPING SETS inside a GROUPING SETS grouping set as syntactic sugar.
![image](https://user-images.githubusercontent.com/46485123/114975588-139a1180-9eb7-11eb-8f53-498c1db934e0.png)

In this PR, we support it in Spark SQL too.

### Why are the changes needed?
Keep consistent with PostgreSQL and Oracle.

### Does this PR introduce _any_ user-facing change?
Users can write grouping analytics like:
```
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(ROLLUP(a, b));
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS((a, b), (a), ());
SELECT a, b, count(1) FROM testData GROUP BY a, GROUPING SETS(GROUPING SETS((a, b), (a), ()));
```

### How was this patch tested?
Added Test

Closes #32201 from AngersZhuuuu/SPARK-35026.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: b22d54a)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/group-analytics.sql.out (diff)
The file was modified docs/sql-ref-syntax-qry-select-groupby.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql (diff)
Commit 7f7a3d80a641f3550e9117ace2ba058c5ab2caaf by wenchen
[SPARK-35183][SQL] Use transformAllExpressions in CombineConcats

### What changes were proposed in this pull request?

Use transformAllExpressions instead of transformExpressionsDown in CombineConcats. The latter only transforms the root plan node.

### Why are the changes needed?

It allows CombineConcats to cover more cases where `concat` is not in the root plan node.
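
For illustration, a hedged sketch (column names are made up) of a plan where the nested `concat` is not in the root node:

```scala
import org.apache.spark.sql.functions._

// The nested concat lives in a Filter below the Limit, so a rewrite that only touches the
// root node's expressions would miss it, while transformAllExpressions visits every node.
val df = spark.range(10).toDF("a")
  .where(concat(concat(col("a").cast("string"), lit("-")), lit("x")) =!= lit("-x"))
  .limit(5)
// CombineConcats should flatten concat(concat(..., "-"), "x") into a single concat.
```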

### How was this patch tested?

Unit test. The updated tests would fail without the code change.

Closes #32290 from sigmod/concat.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 7f7a3d8)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CombineConcatsSuite.scala (diff)
Commit 6c587d262748a2b469a0786c244e2e555f5f5a74 by max.gekk
[SPARK-35110][SQL] Handle ANSI intervals in WindowExecBase

### What changes were proposed in this pull request?
This PR makes window frames support `YearMonthIntervalType` and `DayTimeIntervalType`.
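
For illustration, a hedged sketch of such a window query (the table and column names are hypothetical, and `ym_col` is assumed to be a `YearMonthIntervalType` column):

```scala
spark.sql(
  """
    |SELECT key, sum(amount) OVER (
    |  PARTITION BY key
    |  ORDER BY ym_col
    |  RANGE BETWEEN INTERVAL '1' MONTH PRECEDING AND CURRENT ROW) AS running_sum
    |FROM events
    |""".stripMargin)
```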

### Why are the changes needed?
Extends the functionality of window frames.

### Does this PR introduce _any_ user-facing change?
Yes. Users can use `YearMonthIntervalType` or `DayTimeIntervalType` as the sort expression for a window frame.

### How was this patch tested?
New tests

Closes #32294 from beliefer/SPARK-35110.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 6c587d2)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/window.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/window.sql (diff)
Commit 04e2305a9be8953623b851aab05e6ec8adc583a5 by max.gekk
[SPARK-35187][SQL] Fix failure on the minimal interval literal

### What changes were proposed in this pull request?
If the sign '-' is inside the interval string, everything works after https://github.com/apache/spark/commit/bb5459fb26b9d0d57eadee8b10b7488607eeaeb2:
```
spark-sql> SELECT INTERVAL '-178956970-8' YEAR TO MONTH;
-178956970-8
```
but a sign outside the interval string is not handled properly:
```
spark-sql> SELECT INTERVAL -'178956970-8' YEAR TO MONTH;
Error in query:
Error parsing interval year-month string: integer overflow(line 1, pos 16)

== SQL ==
SELECT INTERVAL -'178956970-8' YEAR TO MONTH
----------------^^^
```
This PR fixes the issue.

### Why are the changes needed?
Fix bug

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT

Closes #32296 from AngersZhuuuu/SPARK-35187.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 04e2305)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/interval.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/interval.sql (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out (diff)
Commit 4d2b559d92425aef73aafa47bb20c395c0cc35fd by ueshin
[SPARK-34999][PYTHON] Consolidate PySpark testing utils

### What changes were proposed in this pull request?
Consolidate PySpark testing utils by removing `python/pyspark/pandas/testing`, and then creating a file `pandasutils` under `python/pyspark/testing` for test utilities used in `pyspark/pandas`.

### Why are the changes needed?

`python/pyspark/pandas/testing` holds test utilities for pandas-on-Spark, and `python/pyspark/testing` contains test utilities for PySpark. Consolidating them makes the code cleaner and easier to maintain.

Updated import statements are as shown below:
- from pyspark.testing.sqlutils import SQLTestUtils
- from pyspark.testing.pandasutils import PandasOnSparkTestCase, TestUtils
(PandasOnSparkTestCase is the original ReusedSQLTestCase in `python/pyspark/pandas/testing/utils.py`)

Minor improvements include:
- Usage of missing library's requirement_message
- `except ImportError` rather than `except`
- import pyspark.pandas alias as `ps` rather than `pp`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests under python/pyspark/pandas/tests.

Closes #32177 from xinrong-databricks/port.merge_utils.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: 4d2b559)
The file was modified python/pyspark/pandas/tests/test_series_string.py (diff)
The file was modified python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby.py (diff)
The file was modified python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py (diff)
The file was modified python/pyspark/pandas/tests/test_dataframe.py (diff)
The file was modified python/pyspark/pandas/tests/test_dataframe_conversion.py (diff)
The file was modified python/pyspark/pandas/tests/indexes/test_category.py (diff)
The file was modified python/pyspark/pandas/tests/test_utils.py (diff)
The file was modified python/pyspark/pandas/tests/test_extension.py (diff)
The file was modified python/pyspark/pandas/tests/test_ops_on_diff_frames.py (diff)
The file was modified python/pyspark/pandas/tests/test_dataframe_spark_io.py (diff)
The file was modified python/pyspark/pandas/tests/test_sql.py (diff)
The file was modified python/pyspark/pandas/tests/test_internal.py (diff)
The file was modified python/pyspark/pandas/tests/test_namespace.py (diff)
The file was modified python/pyspark/pandas/tests/test_indexops_spark.py (diff)
The file was modified python/pyspark/pandas/tests/test_config.py (diff)
The file was modified python/pyspark/pandas/tests/test_csv.py (diff)
The file was modified python/pyspark/pandas/tests/plot/test_series_plot_matplotlib.py (diff)
The file was modified python/pyspark/pandas/tests/plot/test_series_plot.py (diff)
The file was modified python/pyspark/pandas/tests/indexes/test_base.py (diff)
The file was modified python/pyspark/pandas/tests/plot/test_series_plot_plotly.py (diff)
The file was modified python/pyspark/pandas/tests/test_reshape.py (diff)
The file was modified python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_rolling.py (diff)
The file was added python/pyspark/testing/pandasutils.py
The file was modified python/pyspark/testing/utils.py (diff)
The file was modified python/pyspark/pandas/tests/plot/test_frame_plot.py (diff)
The file was modified python/pyspark/pandas/tests/test_stats.py (diff)
The file was modified python/pyspark/pandas/tests/indexes/test_datetime.py (diff)
The file was modified python/pyspark/pandas/tests/test_ops_on_diff_frames_groupby_expanding.py (diff)
The file was modified python/pyspark/pandas/tests/test_categorical.py (diff)
The file was modified python/pyspark/pandas/tests/test_frame_spark.py (diff)
The file was modified python/pyspark/pandas/tests/test_series_datetime.py (diff)
The file was modified python/pyspark/pandas/tests/test_series.py (diff)
The file was modified python/pyspark/pandas/tests/test_expanding.py (diff)
The file was modified python/pyspark/pandas/tests/test_default_index.py (diff)
The file was modified python/pyspark/pandas/tests/test_groupby.py (diff)
The file was modified python/pyspark/pandas/tests/test_series_conversion.py (diff)
The file was modified python/pyspark/pandas/tests/plot/test_frame_plot_matplotlib.py (diff)
The file was modified python/pyspark/pandas/tests/test_indexing.py (diff)
The file was removed python/pyspark/pandas/testing/utils.py
The file was modified python/pyspark/pandas/tests/test_numpy_compat.py (diff)
The file was modified python/pyspark/pandas/tests/test_repr.py (diff)
The file was removed python/pyspark/pandas/testing/__init__.py
The file was modified python/pyspark/pandas/tests/test_rolling.py (diff)
The file was modified python/pyspark/pandas/tests/test_window.py (diff)
Commit 6ab00488d0b29040b27bdc4a462ef6066470d980 by dhyun
[SPARK-35182][K8S] Support driver-owned on-demand PVC

### What changes were proposed in this pull request?

This PR aims to support driver-owned on-demand PVC(Persistent Volume Claim)s. It means dynamically-created PVCs will have the `ownerReference` to `driver` pod instead of `executor` pod.

### Why are the changes needed?

This allows the K8s backend scheduler to reuse these PVCs later.

**BEFORE**
```
$ k get pvc tpcds-pvc-exec-1-pvc-0 -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
...
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod
    name: tpcds-pvc-exec-1
```

**AFTER**
```
$ k get pvc tpcds-pvc-exec-1-pvc-0 -oyaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
...
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Pod
    name: tpcds-pvc
```

### Does this PR introduce _any_ user-facing change?

No. (The default is `false`)

### How was this patch tested?

Manually check the above and pass K8s IT.

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 16 minutes, 40 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

Closes #32288 from dongjoon-hyun/SPARK-35182.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 6ab0048)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala (diff)
Commit 4fcbf59079f591dc2214bd16aade340fd380f914 by ueshin
[SPARK-35040][PYTHON] Remove Spark-version related codes from test codes

### What changes were proposed in this pull request?

Removes PySpark version dependent codes from pyspark.pandas test codes.

### Why are the changes needed?

There are several places that check the PySpark version and switch the logic, but these checks are no longer necessary.
We should remove them.

We will do the same thing after we finish porting tests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #32300 from xinrong-databricks/port.rmv_spark_version_chk_in_tests.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: 4fcbf59)
The file was modified python/pyspark/pandas/tests/test_dataframe.py (diff)
The file was modified python/pyspark/pandas/tests/indexes/test_base.py (diff)
The file was modified python/pyspark/pandas/tests/test_dataframe_spark_io.py (diff)
The file was modified python/pyspark/pandas/tests/test_ops_on_diff_frames.py (diff)
The file was modified python/pyspark/pandas/tests/test_frame_spark.py (diff)
The file was modified python/pyspark/pandas/tests/test_series.py (diff)
The file was modified python/pyspark/pandas/tests/test_repr.py (diff)
The file was modified python/pyspark/pandas/tests/test_reshape.py (diff)
Commit 47f86875f73edc3bec56d3610ab46a16ec37091c by ltnwgl
[SPARK-35075][SQL] Add traversal pruning for subquery related rules

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- DYNAMIC_PRUNING_SUBQUERY
- EXISTS_SUBQUERY
- IN_SUBQUERY
- LIST_SUBQUERY
- PLAN_EXPRESSION
- SCALAR_SUBQUERY
- FILTER

Used them in the following rules:
- ResolveSubquery
- UpdateOuterReferences
- OptimizeSubqueries
- RewritePredicateSubquery
- PullupCorrelatedPredicates
- RewriteCorrelatedScalarSubquery (not the rule itself but an internal transform call, the full support is in SPARK-35148)
- InsertAdaptiveSparkPlan
- PlanAdaptiveSubqueries

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.
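
For illustration, a hedged sketch of the pruning idiom these rules adopt (helper names are assumed from the tree-pattern framework, not verified against this exact change):

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.trees.TreePattern.SCALAR_SUBQUERY

// The transform is attempted only on subtrees whose pattern bits indicate that a
// scalar subquery can actually be present, so irrelevant plans are skipped cheaply.
def rewriteScalarSubqueries(plan: LogicalPlan): LogicalPlan =
  plan.transformDownWithPruning(_.containsPattern(SCALAR_SUBQUERY)) {
    case p => p // the actual rewrite logic is elided in this sketch
  }
```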

### How was this patch tested?

Existing tests.

Closes #32247 from sigmod/subquery.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: 47f8687)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/DynamicPruning.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/InsertAdaptiveSparkPlan.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleIdCollection.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PlanAdaptiveSubqueries.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PlanDynamicPruningFilters.scala (diff)
Commit 86238d0e88f00b5ad5357da3cb4545799fd53e47 by viirya
[SPARK-35195][SQL][TEST] Move InMemoryTable etc to org.apache.spark.sql.connector.catalog

### What changes were proposed in this pull request?

Move the following classes:
- `InMemoryAtomicPartitionTable`
- `InMemoryPartitionTable`
- `InMemoryPartitionTableCatalog`
- `InMemoryTable`
- `InMemoryTableCatalog`
- `StagingInMemoryTableCatalog`

from `org.apache.spark.sql.connector` to `org.apache.spark.sql.connector.catalog`.

### Why are the changes needed?

These classes implement catalog related interfaces but reside in `org.apache.spark.sql.connector`. A more suitable place should be `org.apache.spark.sql.connector.catalog`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #32302 from sunchao/SPARK-35195.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 86238d0)
The file was removed sql/catalyst/src/test/scala/org/apache/spark/sql/connector/StagingInMemoryTableCatalog.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2DataFrameSuite.scala (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryPartitionTableCatalog.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/V1ReadFallbackSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/CharVarcharDDLTestBase.scala (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/StagingInMemoryTableCatalog.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/CatalogManagerSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/V1WriteFallbackSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DatasourceV2SQLBase.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagementSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSessionCatalogSuite.scala (diff)
The file was removed sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/HiveResultSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CommandSuiteBase.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowNamespacesSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/WriteDistributionAndOrderingSuite.scala (diff)
The file was removed sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryPartitionTableCatalog.scala
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTable.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryAtomicPartitionTable.scala
The file was removed sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryPartitionTable.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TableLookupCacheSuite.scala (diff)
The file was removed sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTableCatalog.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsCatalogOptionsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/CreateTablePartitioningValidationSuite.scala (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameWriterV2Suite.java (diff)
The file was removed sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryAtomicPartitionTable.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/TableCatalogSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/SupportsPartitionManagementSuite.scala (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryPartitionTable.scala
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala
Commit cab205e9e4a567feec0b698cc8795449bcadf3f8 by wenchen
[SPARK-35141][SQL] Support two level of hash maps for final hash aggregation

### What changes were proposed in this pull request?

For partial hash aggregation (the code-gen path), we have two levels of hash maps for aggregation. The first level comes from `RowBasedHashMapGenerator` and is computationally faster than the second level from `UnsafeFixedWidthAggregationMap`. Introducing the two-level hash map helps improve the CPU performance of queries, as the first-level hash map normally fits in the hardware cache and uses a cheaper hash function for key lookup.

For final hash aggregation, we can also support two levels of hash maps to improve query performance further.
The original two-level hash map code mostly works for final aggregation out of the box. The major change here is to support testing fallback of final aggregation (see the changes related to `bitMaxCapacity` and `checkFallbackForGeneratedHashMap`).

Example:

An aggregation query:

```
spark.sql(
  """
    |SELECT key, avg(value)
    |FROM agg1
    |GROUP BY key
  """.stripMargin)
```

The generated code for final aggregation is [here](https://gist.github.com/c21/20c10cc8e2c7e561aafbe9b8da055242).

An aggregation query with testing fallback:
```
withSQLConf("spark.sql.TungstenAggregate.testFallbackStartsAt" -> "2, 3") {
  spark.sql(
    """
      |SELECT key, avg(value)
      |FROM agg1
      |GROUP BY key
    """.stripMargin)
}
```
The generated code for final aggregation is [here](https://gist.github.com/c21/dabf176cbc18a5e2138bc0a29e81c878). Note that there is no longer a counter condition for the first-level fast map.

### Why are the changes needed?

Improve the CPU performance of hash aggregation query in general.

For `AggregateBenchmark."Aggregate w multiple keys"`, query performance improved by about 10%.
`codegen = T` means whole-stage code-gen is enabled.
`hashmap = T` means the two-level map is enabled for partial aggregation.
`finalhashmap = T` means the two-level map is enabled for final aggregation.

```
Running benchmark: Aggregate w multiple keys
  Running case: codegen = F
  Stopped after 2 iterations, 8284 ms
  Running case: codegen = T hashmap = F
  Stopped after 2 iterations, 5424 ms
  Running case: codegen = T hashmap = T finalhashmap = F
  Stopped after 2 iterations, 4753 ms
  Running case: codegen = T hashmap = T finalhashmap = T
  Stopped after 2 iterations, 4508 ms

Java HotSpot(TM) 64-Bit Server VM 1.8.0_181-b13 on Mac OS X 10.15.7
Intel(R) Core(TM) i9-9980HK CPU  2.40GHz
Aggregate w multiple keys:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
codegen = F                                        3881           4142         370          5.4         185.1       1.0X
codegen = T hashmap = F                            2701           2712          16          7.8         128.8       1.4X
codegen = T hashmap = T finalhashmap = F           2363           2377          19          8.9         112.7       1.6X
codegen = T hashmap = T finalhashmap = T           2252           2254           3          9.3         107.4       1.7X
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test in `HashAggregationQuerySuite` and `HashAggregationQueryWithControlledFallbackSuite` already cover the test.

Closes #32242 from c21/agg.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: cab205e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala (diff)
Commit 7582dc86bc7c63aefaf31ef50c142b0dba8aa8b9 by hyukjinkwon
[SPARK-35143][SQL][SHELL] Add default log level config for spark-sql

### What changes were proposed in this pull request?
Add default log config for spark-sql

### Why are the changes needed?
The default log level for spark-sql is `WARN`, and how to change it is confusing, so we need a default config entry.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Changed the config `log4j.logger.org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver=INFO` in log4j.properties and verified that spark-sql's default log level changed accordingly.

Closes #32248 from hddong/spark-35413.

Lead-authored-by: hongdongdong <hongdongdong@cmss.chinamobile.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
(commit: 7582dc8)
The file was modified conf/log4j.properties.template (diff)
Commit 20d68dc2f47ea592ecd9ef708167a69bb1c5c47b by wenchen
[SPARK-35159][SQL][DOCS] Extract hive format doc

### What changes were proposed in this pull request?
Extract the common documentation about Hive format so that `sql-ref-syntax-ddl-create-table-hiveformat.md` and `sql-ref-syntax-qry-select-transform.md` can refer to it.

![image](https://user-images.githubusercontent.com/46485123/115802193-04641800-a411-11eb-827d-d92544881842.png)

### Why are the changes needed?
Improve doc

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not needed.

Closes #32264 from AngersZhuuuu/SPARK-35159.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 20d68dc)
The file was modified docs/sql-ref-syntax-ddl-create-table-hiveformat.md (diff)
The file was added docs/sql-ref-syntax-hive-format.md
The file was modified docs/sql-ref-syntax-qry-select-transform.md (diff)