Changes

Summary

  1. [SPARK-35255][BUILD] Automated formatting for Scala Code for Blank Lines (commit: 77e9152) (details)
  2. [SPARK-35277][BUILD] Upgrade snappy to 1.1.8.4 (commit: ac8813e) (details)
  3. [SPARK-35111][SQL] Support Cast string to year-month interval (commit: 11ea255) (details)
  4. [SPARK-35264][SQL] Support AQE side broadcastJoin threshold (commit: 39889df) (details)
  5. [SPARK-35280][K8S] Promote KubernetesUtils to DeveloperApi (commit: 4e8701a) (details)
  6. [SPARK-35273][SQL] CombineFilters support non-deterministic expressions (commit: 72e238a) (details)
  7. [SPARK-35278][SQL] Invoke should find the method with correct number of parameters (commit: 6ce1b16) (details)
  8. [SPARK-34581][SQL] Don't optimize out grouping expressions from aggregate expressions without aggregate function (commit: cfc0495) (details)
  9. [SPARK-35112][SQL] Support Cast string to day-second interval (commit: caa46ce) (details)
  10. [SPARK-35192][SQL][TESTS] Port minimal TPC-DS datagen code from databricks/spark-sql-perf (commit: cd689c9) (details)
  11. [SPARK-35285][SQL] Parse ANSI interval types in SQL schema (commit: 335f00b) (details)
  12. [SPARK-35281][SQL] StaticInvoke should not apply boxing if return type is primitive (commit: 2a8d7ed) (details)
  13. [SPARK-35176][PYTHON] Standardize input validation error type (commit: 44b7931) (details)
  14. [SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory (commit: be6ecb6) (details)
  15. [MINOR][SS][DOCS] Fix a typo in the documentation of GroupState (commit: 54e0aa1) (details)
  16. [SPARK-35250][SQL][DOCS] Fix duplicated STOP_AT_DELIMITER to SKIP_VALUE at CSV's unescapedQuoteHandling option documentation (commit: 8aaa9e8) (details)
  17. [SPARK-35292][PYTHON] Delete redundant parameter in mypy configuration (commit: 176218b) (details)
  18. [SPARK-34887][PYTHON] Port Koalas dependencies into PySpark (commit: 120c389) (details)
  19. [SPARK-35300][PYTHON][DOCS] Standardize module names in install.rst (commit: 5ecb112) (details)
  20. [SPARK-35302][INFRA] Benchmark workflow should create new files for new benchmarks (commit: a2927cb) (details)
  21. [SPARK-35308][TESTS] Fix bug in SPARK-35266 that creates benchmark files in invalid path with wrong name (commit: 9b387a1) (details)
  22. [SPARK-35294][SQL] Add tree traversal pruning in rules with dedicated files under optimizer (commit: 7fd3f8f) (details)
  23. [SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions (commit: f550e03) (details)
  24. [SPARK-34854][SQL][SS] Expose source metrics via progress report and add Kafka use-case to report delay (commit: bbdbe0f) (details)
  25. [SPARK-35315][TESTS] Keep benchmark result consistent between spark-submit and SBT (commit: 4fe4b65) (details)
  26. [SPARK-35155][SQL] Add rule id pruning to Analyzer rules (commit: 7970318) (details)
  27. [SPARK-35323][BUILD] Remove unused libraries from LICENSE-binary (commit: 0126924) (details)
  28. [SPARK-35319][K8S][BUILD] Upgrade K8s client to 5.3.1 (commit: a0c76a8) (details)
  29. [SPARK-35325][SQL][TESTS] Add nested column ORC encryption test case (commit: 19661f6) (details)
  30. [SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite (commit: 5c67d0c) (details)
Commit 77e9152898112f3acdf8e9b4820d3c7e9ac791ed by gurwls223
[SPARK-35255][BUILD] Automated formatting for Scala Code for Blank Lines

### What changes were proposed in this pull request?

https://github.com/databricks/scala-style-guide#blanklines
https://scalameta.org/scalafmt/docs/configuration.html#newlinestoplevelstatements

### How was this patch tested?

Manually tested by modifying a few files and running `./dev/scalafmt`, then checking that `./dev/scalastyle` still passed.

Closes #32383 from lipzhu/SPARK-35255.

Authored-by: lipzhu <lipzhu@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 77e9152)
The file was modified dev/.scalafmt.conf (diff)
The file was modified pom.xml (diff)
Commit ac8813e37c273565c2634b87f486230b4bf9bc46 by dhyun
[SPARK-35277][BUILD] Upgrade snappy to 1.1.8.4

### What changes were proposed in this pull request?
This PR aims to upgrade snappy to version 1.1.8.4.

### Why are the changes needed?
This will bring the latest bug fixes and improvements.
- https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-1183-2021-01-20

    - Make pure-java Snappy thread-safe
    - Improved SnappyFramedInput/OutputStream performance by using java.util.zip.CRC32C

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?
Pass the CIs.

Closes #32402 from williamhyun/snappy1184.

Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: ac8813e)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified pom.xml (diff)
Commit 11ea255283509a9f016f378df4865235a25b1851 by max.gekk
[SPARK-35111][SQL] Support Cast string to year-month interval

### What changes were proposed in this pull request?
Support Cast string to year-month interval
The supported formats are shown below:
```
ANSI_STYLE, like
INTERVAL -'-10-1' YEAR TO MONTH
HIVE_STYLE like
10-1 or -10-1

Rules from the SQL standard about ANSI_STYLE:

<interval literal> ::=
  INTERVAL [ <sign> ] <interval string> <interval qualifier>
<interval string> ::=
  <quote> <unquoted interval string> <quote>
<unquoted interval string> ::=
  [ <sign> ] { <year-month literal> | <day-time literal> }
<year-month literal> ::=
  <years value> [ <minus sign> <months value> ]
  | <months value>
<years value> ::=
  <datetime value>
<months value> ::=
  <datetime value>
<datetime value> ::=
  <unsigned integer>
<unsigned integer> ::= <digit>...
```
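
For illustration, a minimal usage sketch (assuming a running `SparkSession` named `spark`; the `INTERVAL YEAR TO MONTH` type name in SQL relies on SPARK-35285 later in this changelog):
```
spark.sql("SELECT CAST('10-1' AS INTERVAL YEAR TO MONTH)").show()  // 10 years 1 month
spark.sql("SELECT CAST('-10-1' AS INTERVAL YEAR TO MONTH)").show() // negative interval
```
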
### Why are the changes needed?
Support Cast string to year-month interval

### Does this PR introduce _any_ user-facing change?
Users can cast a year-month interval string to `YearMonthIntervalType`.

### How was this patch tested?
Added UT

Closes #32266 from AngersZhuuuu/SPARK-SPARK-35111.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 11ea255)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala (diff)
Commit 39889df32a7a916d826e255fda6fc62e2a3d7971 by wenchen
[SPARK-35264][SQL] Support AQE side broadcastJoin threshold

### What changes were proposed in this pull request?

~~This PR aims to add a new AQE optimizer rule `DynamicJoinSelection`. Like other AQE partition number configs, this rule add a new broadcast threshold config `spark.sql.adaptive.autoBroadcastJoinThreshold`.~~
This PR aims to add a flag in `Statistics` to distinguish AQE stats from normal stats, so that we can isolate some SQL configs between AQE and the normal planner.

### Why are the changes needed?

The main idea here is to isolate the join configs between the normal planner and the AQE planner, which share the same code path.

Actually, we do not fully trust the static stats when deciding whether a join can be built as a broadcast hash join. In our experience, it is very common for Spark to throw a broadcast timeout or a driver-side OOM exception when executing a somewhat large plan. And since a broadcast join is not reversible, meaning that once we convert a join to a broadcast hash join we (AQE) cannot optimize it again, it makes sense to decide whether to broadcast on the AQE side using a different SQL config.

### Does this PR introduce _any_ user-facing change?

Yes, a new config `spark.sql.adaptive.autoBroadcastJoinThreshold` is added.
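
A minimal sketch of using it (the `-1`-disables semantics are assumed to mirror `spark.sql.autoBroadcastJoinThreshold`):
```
// Keep the static planner's broadcast threshold, but disable AQE-side
// broadcast decisions entirely:
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "-1")
```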

### How was this patch tested?

Add new test.

Closes #32391 from ulysses-you/SPARK-35264.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 39889df)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 4e8701a77dff729c4e8e0ad39c16e2717c2c32fe by dhyun
[SPARK-35280][K8S] Promote KubernetesUtils to DeveloperApi

### What changes were proposed in this pull request?

Since SPARK-22757, `KubernetesUtils` has been used as an important utility class by all K8s modules and `ExternalClusterManager`s. This PR aims to promote `KubernetesUtils` to `DeveloperApi` in order to maintain it officially in a backward compatible way at Apache Spark 3.2.0.

### Why are the changes needed?

Apache Spark 3.1.1 makes the `Kubernetes` module GA and provides an extensible external cluster manager framework. To have an `ExternalClusterManager` for the K8s environment, the `KubernetesUtils` class is crucial and needs to be stable. By promoting it to a subset of the K8s developer API, we can maintain it in a more sustainable way and give better, stable functionality to K8s users.

In this PR, the `Since` annotations denote the last function signature changes, because these functions are going to become public in Apache Spark 3.2.0.

| Version | Function Name |
|-|-|
| 2.3.0 | parsePrefixedKeyValuePairs |
| 2.3.0 | requireNandDefined |
| 2.3.0 | parsePrefixedKeyValuePairs |
| 2.4.0 | parseMasterUrl |
| 3.0.0 | requireBothOrNeitherDefined |
| 3.0.0 | requireSecondIfFirstIsDefined |
| 3.0.0 | selectSparkContainer |
| 3.0.0 | formatPairsBundle |
| 3.0.0 | formatPodState |
| 3.0.0 | containersDescription |
| 3.0.0 | containerStatusDescription |
| 3.0.0 | formatTime |
| 3.0.0 | uniqueID |
| 3.0.0 | buildResourcesQuantities |
| 3.0.0 | uploadAndTransformFileUris |
| 3.0.0 | uploadFileUri |
| 3.0.0 | requireBothOrNeitherDefined |
| 3.0.0 | buildPodWithServiceAccount |
| 3.0.0 | isLocalAndResolvable |
| 3.1.1 | renameMainAppResource |
| 3.1.1 | addOwnerReference |
| 3.2.0 | loadPodFromTemplate |
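
As a hedged sketch of what the promotion looks like in code (the object and method body below are illustrative stand-ins, not the actual `KubernetesUtils` source):
```
import org.apache.spark.annotation.{DeveloperApi, Since}

@DeveloperApi
object KubernetesUtilsLike {
  // Illustrative body only; the real method lives in KubernetesUtils.scala.
  @Since("2.4.0")
  def parseMasterUrl(url: String): String = url.substring("k8s://".length)
}
```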

### Does this PR introduce _any_ user-facing change?

Yes, but these are new API additions.

### How was this patch tested?

Pass the CIs.

Closes #32406 from dongjoon-hyun/SPARK-35280.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 4e8701a)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala (diff)
Commit 72e238a790211fe44a0e61ec58a5c0a21f807968 by wenchen
[SPARK-35273][SQL] CombineFilters support non-deterministic expressions

### What changes were proposed in this pull request?

This PR makes `CombineFilters` support non-deterministic expressions. For example:
```sql
spark.sql("CREATE TABLE t1(id INT, dt STRING) using parquet PARTITIONED BY (dt)")
spark.sql("CREATE VIEW v1 AS SELECT * FROM t1 WHERE dt NOT IN ('2020-01-01', '2021-01-01')")
spark.sql("SELECT * FROM v1 WHERE dt = '2021-05-01' AND rand() <= 0.01").explain()
```

Before this pr:
```
== Physical Plan ==
*(1) Filter (isnotnull(dt#1) AND ((dt#1 = 2021-05-01) AND (rand(-6723800298719475098) <= 0.01)))
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#0,dt#1] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [NOT dt#1 IN (2020-01-01,2021-01-01)], PushedFilters: [], ReadSchema: struct<id:int>
```

After this pr:
```
== Physical Plan ==
*(1) Filter (rand(-2400509328955813273) <= 0.01)
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#0,dt#1] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(0 paths)[], PartitionFilters: [isnotnull(dt#1), NOT dt#1 IN (2020-01-01,2021-01-01), (dt#1 = 2021-05-01)], PushedFilters: [], ReadSchema: struct<id:int>
```

### Why are the changes needed?

Improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #32405 from wangyum/SPARK-35273.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 72e238a)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PruneFiltersSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
Commit 6ce1b161e96176777344beb610163636e7dfeb00 by viirya
[SPARK-35278][SQL] Invoke should find the method with correct number of parameters

### What changes were proposed in this pull request?

This patch fixes `Invoke` expression when the target object has more than one method with the given method name.

### Why are the changes needed?

`Invoke` finds the method on the target object by the given method name. If there is more than one method with that name, it is currently nondeterministic which one will be used. We should also match on the number of parameters when finding the method.
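
To illustrate the ambiguity with plain Java reflection (the class below is hypothetical; `Invoke` does the analogous lookup internally):
```
class Formatter {
  def format(x: Int): String = x.toString
  def format(x: Int, width: Int): String = ("0" * width + x).takeRight(width)
}

// Looking up by name alone returns both overloads; also filtering on the
// number of parameters (the fix) pins down a single method:
val byName = classOf[Formatter].getMethods.filter(_.getName == "format")
val byNameAndArity = byName.filter(_.getParameterCount == 2)
assert(byNameAndArity.length == 1)
```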

### Does this PR introduce _any_ user-facing change?

Yes, this fixes a bug when using `Invoke` on an object that has more than one method with the given method name.

### How was this patch tested?

Unit test.

Closes #32404 from viirya/verify-invoke-param-len.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 6ce1b16)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala (diff)
Commit cfc0495f9c6fe0e857a1a830499c3951e12054a3 by wenchen
[SPARK-34581][SQL] Don't optimize out grouping expressions from aggregate expressions without aggregate function

### What changes were proposed in this pull request?
This PR adds a new rule `PullOutGroupingExpressions` to pull out complex grouping expressions to a `Project` node under an `Aggregate`. These expressions are then referenced in both grouping expressions and aggregate expressions without aggregate functions to ensure that optimization rules don't change the aggregate expressions to invalid ones that no longer refer to any grouping expressions.

### Why are the changes needed?
If the aggregate expressions (without aggregate functions) in an `Aggregate` node are complex, then the `Optimizer` can optimize grouping expressions out of them, making the aggregate expressions invalid.

Here is a simple example:
```
SELECT not(t.id IS NULL), count(*)
FROM t
GROUP BY t.id IS NULL
```
In this case the `BooleanSimplification` rule does this:
```
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.BooleanSimplification ===
!Aggregate [isnull(id#222)], [NOT isnull(id#222) AS (NOT (id IS NULL))#226, count(1) AS c#224L]   Aggregate [isnull(id#222)], [isnotnull(id#222) AS (NOT (id IS NULL))#226, count(1) AS c#224L]
+- Project [value#219 AS id#222]                                                                 +- Project [value#219 AS id#222]
    +- LocalRelation [value#219]                                                                     +- LocalRelation [value#219]
```
where `NOT isnull(id#222)` is optimized to `isnotnull(id#222)` and so it no longer refers to any grouping expression.

Before this PR:
```
== Optimized Logical Plan ==
Aggregate [isnull(id#222)], [isnotnull(id#222) AS (NOT (id IS NULL))#234, count(1) AS c#232L]
+- Project [value#219 AS id#222]
   +- LocalRelation [value#219]
```
and running the query throws an error:
```
Couldn't find id#222 in [isnull(id#222)#230,count(1)#226L]
java.lang.IllegalStateException: Couldn't find id#222 in [isnull(id#222)#230,count(1)#226L]
```

After this PR:
```
== Optimized Logical Plan ==
Aggregate [_groupingexpression#233], [NOT _groupingexpression#233 AS (NOT (id IS NULL))#230, count(1) AS c#228L]
+- Project [isnull(value#219) AS _groupingexpression#233]
   +- LocalRelation [value#219]
```
and the query works.

### Does this PR introduce _any_ user-facing change?
Yes, the query works.

### How was this patch tested?
Added new UT.

Closes #32396 from peter-toth/SPARK-34581-keep-grouping-expressions-2.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: cfc0495)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q23b.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q23b/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q62/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q23b.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q62.sf100/explain.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala (diff)
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PullOutGroupingExpressions.scala
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q62/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q99.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q23a/explain.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q23b/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q99.sf100/explain.txt (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/group-by.sql (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q99/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q62.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q23a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q23a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q99/simplified.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q23a.sf100/simplified.txt (diff)
Commit caa46ce0b657725c6292f3e665427fdd85d1f7d2 by max.gekk
[SPARK-35112][SQL] Support Cast string to day-second interval

### What changes were proposed in this pull request?
Support Cast string to day-second interval

### Why are the changes needed?
Users can cast day-second interval string to DayTimeIntervalType.
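
For example (a sketch, assuming a `SparkSession` named `spark`; `'1 2:03:04'` reads as 1 day, 2 hours, 3 minutes, 4 seconds, and the SQL type name relies on SPARK-35285 below):
```
spark.sql("SELECT CAST('1 2:03:04' AS INTERVAL DAY TO SECOND)").show()
```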

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT

Closes #32271 from AngersZhuuuu/SPARK-35112.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: caa46ce)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala (diff)
Commit cd689c942cb8ac201431a9c9a5d9901c3a4f36be by yamamuro
[SPARK-35192][SQL][TESTS] Port minimal TPC-DS datagen code from databricks/spark-sql-perf

### What changes were proposed in this pull request?

This PR proposes to port minimal code to generate TPC-DS data from [databricks/spark-sql-perf](https://github.com/databricks/spark-sql-perf). The classes in a new class file `tpcdsDatagen.scala` are basically copied from the `databricks/spark-sql-perf` codebase.
Note that I've modified them a bit to follow the Spark code style and removed unnecessary parts from them.

The code authors of these classes are:
juliuszsompolski
npoggi
wangyum

### Why are the changes needed?

We frequently use TPCDS data now for benchmarks/tests, but the classes for the TPCDS schemas of datagen and benchmarks/tests are managed separately, e.g.,
- https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
- https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/tpcds/TPCDSTables.scala

I think this causes some inconveniences, e.g., we need to update both files in separate repositories if we update the TPCDS schema (see #32037). So, it would be useful for the Spark codebase to generate the data by referring to the same schema definition.

### Does this PR introduce _any_ user-facing change?

dev only.

### How was this patch tested?

Manually checked and GA passed.

Closes #32243 from maropu/tpcdsDatagen.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: cd689c9)
The file was added sql/core/src/test/scala/org/apache/spark/sql/TPCDSSchema.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 335f00b19b82e8fa1e634bafc750f805131398ae by gurwls223
[SPARK-35285][SQL] Parse ANSI interval types in SQL schema

### What changes were proposed in this pull request?
1. Extend Spark SQL parser to support parsing of:
    - `INTERVAL YEAR TO MONTH` to `YearMonthIntervalType`
    - `INTERVAL DAY TO SECOND` to `DayTimeIntervalType`
2. Assign new names to the ANSI interval types according to the SQL standard to be able to parse the names back by Spark SQL parser. Override the `typeName()` name of `YearMonthIntervalType`/`DayTimeIntervalType`.

### Why are the changes needed?
To be able to use new ANSI interval types in SQL. The SQL standard requires the types to be defined according to the rules:
```
<interval type> ::= INTERVAL <interval qualifier>
<interval qualifier> ::= <start field> TO <end field> | <single datetime field>
<start field> ::= <non-second primary datetime field> [ <left paren> <interval leading field precision> <right paren> ]
<end field> ::= <non-second primary datetime field> | SECOND [ <left paren> <interval fractional seconds precision> <right paren> ]
<primary datetime field> ::= <non-second primary datetime field> | SECOND
<non-second primary datetime field> ::= YEAR | MONTH | DAY | HOUR | MINUTE
<interval fractional seconds precision> ::= <unsigned integer>
<interval leading field precision> ::= <unsigned integer>
```
Currently, Spark SQL supports only `YEAR TO MONTH` and `DAY TO SECOND` as `<interval qualifier>`.
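
A small sketch of the round-trip this enables (the DDL-string entry point is chosen for illustration):
```
import org.apache.spark.sql.types.StructType

// The new type names now parse back from a DDL string:
val schema = StructType.fromDDL("ym INTERVAL YEAR TO MONTH, dt INTERVAL DAY TO SECOND")
schema.fields.foreach(f => println(s"${f.name}: ${f.dataType.typeName}"))
```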

### Does this PR introduce _any_ user-facing change?
Should not, since the types have not been released yet.

### How was this patch tested?
By running the affected tests such as:
```
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z interval.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z datetime.sql"
$ build/sbt "test:testOnly *ExpressionTypeCheckingSuite"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z windowFrameCoercion.sql"
$ build/sbt "sql/testOnly *SQLQueryTestSuite -- -z literals.sql"
```

Closes #32409 from MaxGekk/parse-ansi-interval-types.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 335f00b)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/promoteStrings.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime-legacy.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/windowFrameCoercion.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/extract.sql.out (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/types/StructTypeSuite.scala (diff)
The file was modified docs/sql-ref-ansi-compliance.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/literals.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/interval.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/YearMonthIntervalType.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/window.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/literals.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out (diff)
Commit 2a8d7ed4bf017f6d71014aeebbde6ee714439f01 by gurwls223
[SPARK-35281][SQL] StaticInvoke should not apply boxing if return type is primitive

### What changes were proposed in this pull request?

In `StaticInvoke`, when result is nullable, don't box the return value if its type is primitive.

### Why are the changes needed?

It is unnecessary to apply boxing when the method return value is of primitive type, and it would hurt performance a lot if the method is simple. The check is done in `Invoke` but not in `StaticInvoke`.
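
As a hand-written analogue of the cost being avoided (not the actual generated code):
```
def compute(): Int = 40 + 2

val unboxed: Int = compute()             // stays a JVM primitive
val boxed: java.lang.Integer = compute() // autoboxing wraps the value in an object
```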

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a UT.

Closes #32416 from sunchao/SPARK-35281.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 2a8d7ed)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala (diff)
Commit 44b7931936d9eff4d8f3054abdef3363af26afb6 by gurwls223
[SPARK-35176][PYTHON] Standardize input validation error type

### What changes were proposed in this pull request?
This PR corrects the exception types raised when function input params fail type validation, so that a `TypeError` is raised.
To make the review convenient, there are 3 commits in this PR:
- Standardize input validation error type on sql
- Standardize input validation error type on ml
- Standardize input validation error type on pandas

### Why are the changes needed?
As the Python exception doc [1] suggests, `TypeError` is "Raised when an operation or function is applied to an object of inappropriate type." However, `ValueError` is raised instead in many places in the PySpark code; this patch fixes them.

[1] https://docs.python.org/3/library/exceptions.html#TypeError

Note: this patch only addresses the existing wrong exception types for input validation; the input validation decorator/framework mentioned in [SPARK-35176](https://issues.apache.org/jira/browse/SPARK-35176) will be submitted in a separate patch.

### Does this PR introduce _any_ user-facing change?
Yes, the code now raises the right `TypeError` instead of `ValueError`.

### How was this patch tested?
Existing test case and UT

Closes #32368 from Yikun/SPARK-35176.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 44b7931)
The file was modified python/pyspark/pandas/tests/test_dataframe.py (diff)
The file was modified python/pyspark/sql/tests/test_functions.py (diff)
The file was modified python/pyspark/pandas/tests/test_series_string.py (diff)
The file was modified python/pyspark/ml/tests/test_evaluation.py (diff)
The file was modified python/pyspark/pandas/base.py (diff)
The file was modified python/pyspark/ml/base.py (diff)
The file was modified python/pyspark/mllib/linalg/distributed.py (diff)
The file was modified python/pyspark/pandas/series.py (diff)
The file was modified python/pyspark/ml/classification.py (diff)
The file was modified python/pyspark/ml/param/__init__.py (diff)
The file was modified python/pyspark/taskcontext.py (diff)
The file was modified python/pyspark/pandas/indexes/base.py (diff)
The file was added python/docs/source/migration_guide/pyspark_3.1_to_3.2.rst
The file was modified python/pyspark/pandas/tests/indexes/test_base.py (diff)
The file was modified python/pyspark/ml/regression.py (diff)
The file was modified python/pyspark/ml/tests/test_param.py (diff)
The file was modified python/pyspark/pandas/generic.py (diff)
The file was modified python/pyspark/pandas/tests/test_namespace.py (diff)
The file was modified python/pyspark/sql/dataframe.py (diff)
The file was modified python/pyspark/pandas/plot/core.py (diff)
The file was modified python/pyspark/pandas/utils.py (diff)
The file was modified python/pyspark/pandas/tests/test_groupby.py (diff)
The file was modified python/pyspark/pandas/tests/test_utils.py (diff)
The file was modified python/docs/source/migration_guide/index.rst (diff)
The file was modified python/pyspark/pandas/tests/test_series.py (diff)
The file was modified python/pyspark/pandas/frame.py (diff)
The file was modified python/pyspark/sql/tests/test_dataframe.py (diff)
The file was modified python/pyspark/pandas/config.py (diff)
The file was modified python/pyspark/ml/tests/test_base.py (diff)
The file was modified python/pyspark/pandas/groupby.py (diff)
The file was modified python/pyspark/pandas/indexes/multi.py (diff)
The file was modified python/pyspark/pandas/tests/test_ops_on_diff_frames.py (diff)
The file was modified python/pyspark/ml/evaluation.py (diff)
The file was modified python/pyspark/pandas/namespace.py (diff)
The file was modified python/pyspark/pandas/tests/test_config.py (diff)
The file was modified python/pyspark/mllib/tests/test_linalg.py (diff)
The file was modified python/pyspark/pandas/strings.py (diff)
Commit be6ecb6d19528cc80e3c09e6127d2a89e5b4aa13 by gurwls223
[SPARK-35266][TESTS] Fix error in BenchmarkBase.scala that occurs when creating benchmark files in non-existent directory

### What changes were proposed in this pull request?
This PR fixes an error in `BenchmarkBase.scala` that occurs when creating a benchmark file in a non-existent directory.

### Why are the changes needed?
When submitting a benchmark job using the `org.apache.spark.benchmark.Benchmarks` class with the `SPARK_GENERATE_BENCHMARK_FILES=1` option, an exception is raised if the directory where the benchmark file will be generated does not exist.
For more information, please refer to [SPARK-35266](https://issues.apache.org/jira/browse/SPARK-35266).
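
The shape of the fix is roughly the following (a sketch, not the exact `BenchmarkBase.scala` diff):
```
import java.io.File

val file = new File("benchmarks/BLASBenchmark-results.txt")
val dir = file.getParentFile
// Create any missing parent directories before writing the result file:
if (dir != null && !dir.exists()) dir.mkdirs()
```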

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
After building Spark, manually tested with the following command:
```
SPARK_GENERATE_BENCHMARK_FILES=1 bin/spark-submit --class \
    org.apache.spark.benchmark.Benchmarks --jars \
    "`find . -name '*-SNAPSHOT-tests.jar' -o -name '*avro*-SNAPSHOT.jar' | paste -sd ',' -`" \
    "`find . -name 'spark-core*-SNAPSHOT-tests.jar'`" \
    "org.apache.spark.ml.linalg.BLASBenchmark"
```
It successfully generated the benchmark result files.

**Why it is sufficient:**
As illustrated in the comments in `Benchmarks.scala`, the command below runs all benchmarks and generates the results:
```
SPARK_GENERATE_BENCHMARK_FILES=1 bin/spark-submit --class \
    org.apache.spark.benchmark.Benchmarks --jars \
    "`find . -name '*-SNAPSHOT-tests.jar' -o -name '*avro*-SNAPSHOT.jar' | paste -sd ',' -`" \
    "`find . -name 'spark-core*-SNAPSHOT-tests.jar'`" \
    "*"
```
Of all the benchmarks (55 in total), only `BLASBenchmark` fails due to this issue with the current code on the master branch. Thus, it is currently sufficient to test `BLASBenchmark` to validate this change.

Closes #32394 from byungsoo-oh/SPARK-35266.

Authored-by: byungsoo <byungsoo@byungsoo-pc.tn.corp.samsungelectronics.net>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: be6ecb6)
The file was modified core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala (diff)
Commit 54e0aa10c8f1fbebc5c3d6582ac314dda58e11e8 by gurwls223
[MINOR][SS][DOCS] Fix a typo in the documentation of GroupState

### What changes were proposed in this pull request?

Fixing some typos in the documentation comments.

### Why are the changes needed?

To make reading the docs more pleasant.

### Does this PR introduce _any_ user-facing change?

Yes, since the user sees the docs.

### How was this patch tested?

It was not tested, because no code was changed.

Closes #32400 from Dobiasd/patch-1.

Authored-by: Tobias Hermann <editgym@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 54e0aa1)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala (diff)
Commit 8aaa9e890acfa318775627fbe6d12eaf34bb35b1 by gurwls223
[SPARK-35250][SQL][DOCS] Fix duplicated STOP_AT_DELIMITER to SKIP_VALUE at CSV's unescapedQuoteHandling option documentation

### What changes were proposed in this pull request?

This is rather a followup of https://github.com/apache/spark/pull/30518 that should be ported back to `branch-3.1` too.
`STOP_AT_DELIMITER` was mistakenly used twice. The duplicated `STOP_AT_DELIMITER` should be `SKIP_VALUE` in the documentation.

### Why are the changes needed?

To correctly document.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the user-facing documentation.

### How was this patch tested?

I checked them via running linters.

Closes #32423 from HyukjinKwon/SPARK-35250.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 8aaa9e8)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
Commit 176218b6b8eeca99ae1e03db185e361a1cb0d1b0 by gurwls223
[SPARK-35292][PYTHON] Delete redundant parameter in mypy configuration

### What changes were proposed in this pull request?

The parameter **no_implicit_optional** is defined twice in the mypy configuration, at [line 20](https://github.com/apache/spark/blob/master/python/mypy.ini#L20) and line 105.

### Why are the changes needed?

We would like to keep the mypy configuration clean.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This patch can be tested with `dev/lint-python`

Closes #32418 from garawalid/feature/clean-mypy-config.

Authored-by: garawalid <gwalid94@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 176218b)
The file was modified python/mypy.ini (diff)
Commit 120c389b00c0b01960d90279fe72074be089d387 by gurwls223
[SPARK-34887][PYTHON] Port Koalas dependencies into PySpark

### What changes were proposed in this pull request?

Port Koalas dependencies appropriately to PySpark dependencies.

### Why are the changes needed?

pandas-on-Spark has its own required dependency and optional dependencies.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes #32386 from xinrong-databricks/portDeps.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 120c389)
The file was modified python/docs/source/getting_started/install.rst (diff)
The file was modified python/setup.py (diff)
Commit 5ecb112410a5da2d718e5d327d7c40e3ed31ca44 by gurwls223
[SPARK-35300][PYTHON][DOCS] Standardize module names in install.rst

### What changes were proposed in this pull request?

Use full names of modules in `install.rst` when specifying dependencies.

### Why are the changes needed?

Using full names makes it clearer.
In addition, it helps `pandas APIs on Spark`, as a new module, become recognized by more people.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual verification.

Closes #32427 from xinrong-databricks/nameDoc.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 5ecb112)
The file was modified python/docs/source/getting_started/install.rst (diff)
Commit a2927cb28b81a2be55cf099ca08a513e9cc6be11 by gurwls223
[SPARK-35302][INFRA] Benchmark workflow should create new files for new benchmarks

### What changes were proposed in this pull request?

Currently, it fails at `git diff --name-only` when new benchmarks are added; see https://github.com/HyukjinKwon/spark/actions/runs/808870999

We should include untracked files (the new benchmark result files) in the upload so developers can download the results.

### Why are the changes needed?

So the new benchmark results can be added and uploaded.

### Does this PR introduce _any_ user-facing change?

No, dev-only

### How was this patch tested?

Tested at:

https://github.com/HyukjinKwon/spark/actions/runs/808867285

Closes #32428 from HyukjinKwon/include-new-benchmarks.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: a2927cb)
The file was modified .github/workflows/benchmark.yml (diff)
Commit 9b387a1718a291d5bd231740e668621d6aa58422 by gurwls223
[SPARK-35308][TESTS] Fix bug in SPARK-35266 that creates benchmark files in invalid path with wrong name

### What changes were proposed in this pull request?
This PR fixes a bug in [SPARK-35266](https://issues.apache.org/jira/browse/SPARK-35266) that creates benchmark files in the invalid path with the wrong name.
e.g. For `BLASBenchmark`,
- AS-IS: Creates `benchmarksBLASBenchmark-results.txt` in `{SPARK_HOME}/mllib-local/`
- TO-BE: Creates `BLASBenchmark-results.txt` in `{SPARK_HOME}/mllib-local/benchmarks/`
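
The root cause is a missing path separator when joining the directory and the file name (illustrative; the actual change is in `BenchmarkBase.scala`):
```
import java.io.File

val dir = new File("mllib-local/benchmarks")
// AS-IS style string concatenation drops the separator:
val wrong = new File(dir.toString + "BLASBenchmark-results.txt") // .../benchmarksBLASBenchmark-results.txt
// TO-BE: resolve the file against the directory:
val right = new File(dir, "BLASBenchmark-results.txt")           // .../benchmarks/BLASBenchmark-results.txt
```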

### Why are the changes needed?
As you can see in the above example, new benchmark files cannot be created as intended due to this bug.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
After building Spark, manually tested with the following command:
```
SPARK_GENERATE_BENCHMARK_FILES=1 bin/spark-submit --class \
    org.apache.spark.benchmark.Benchmarks --jars \
    "`find . -name '*-SNAPSHOT-tests.jar' -o -name '*avro*-SNAPSHOT.jar' | paste -sd ',' -`" \
    "`find . -name 'spark-core*-SNAPSHOT-tests.jar'`" \
    "org.apache.spark.ml.linalg.BLASBenchmark"
```
It successfully generated the benchmark files as intended (`BLASBenchmark-results.txt` in `{SPARK_HOME}/mllib-local/benchmarks/`).

Closes #32432 from byungsoo-oh/SPARK-35308.

Lead-authored-by: byungsoo <byungsoo@byungsoo-pc.tn.corp.samsungelectronics.net>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 9b387a1)
The file was modified core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala (diff)
Commit 7fd3f8f9ec55b364525407213ba1c631705686c5 by ltnwgl
[SPARK-35294][SQL] Add tree traversal pruning in rules with dedicated files under optimizer

### What changes were proposed in this pull request?

Added the following TreePattern enums:
- CREATE_NAMED_STRUCT
- EXTRACT_VALUE
- JSON_TO_STRUCT
- OUTER_REFERENCE
- AGGREGATE
- LOCAL_RELATION
- EXCEPT
- LIMIT
- WINDOW

Used them in the following rules:
- DecorrelateInnerQuery
- LimitPushDownThroughWindow
- OptimizeCsvJsonExprs
- PropagateEmptyRelation
- PullOutGroupingExpressions
- PushLeftSemiLeftAntiThroughJoin
- ReplaceExceptWithFilter
- RewriteDistinctAggregates
- SimplifyConditionalsInPredicate
- UnwrapCastInBinaryComparison
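
A minimal sketch of how a rule opts into pattern-based pruning (the rule below is hypothetical; `transformWithPruning` and `containsPattern` are the existing `TreeNode` entry points these rules use):
```
import org.apache.spark.sql.catalyst.plans.logical.{LocalLimit, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.trees.TreePattern.LIMIT

object PruningSketchRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan =
    // Skip whole subtrees whose pattern bits say they contain no limit node:
    plan.transformWithPruning(_.containsPattern(LIMIT)) {
      case l: LocalLimit => l // a real rule would rewrite here
    }
}
```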

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.

### How was this patch tested?

Existing tests.

Closes #32421 from sigmod/opt.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: 7fd3f8f)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushDownThroughWindow.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreePatterns.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalsInPredicate.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceExceptWithFilter.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCsvJsonExprs.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PropagateEmptyRelation.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleIdCollection.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PullOutGroupingExpressions.scala (diff)
Commit f550e03b96638de93381734c4eada2ace02d9a4f by yamamuro
[SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions

### What changes were proposed in this pull request?

To fix lambda variable name issues in nested DataFrame functions, this PR modifies code to use a global counter for `LambdaVariables` names created by higher order functions.

This is the rework of #31887. Closes #31887.

### Why are the changes needed?

This moves away from the current hard-coded variable names, which break on nested function calls. There is currently a bug where nested transforms in particular fail (the inner variable shadows the outer variable).

For this query:
```
val df = Seq(
    (Seq(1,2,3), Seq("a", "b", "c"))
).toDF("numbers", "letters")

df.select(
    f.flatten(
        f.transform(
            $"numbers",
            (number: Column) => { f.transform(
                $"letters",
                (letter: Column) => { f.struct(
                    number.as("number"),
                    letter.as("letter")
                ) }
            ) }
        )
    ).as("zipped")
).show(10, false)
```
This is the current (incorrect) output:
```
+------------------------------------------------------------------------+
|zipped                                                                  |
+------------------------------------------------------------------------+
|[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]|
+------------------------------------------------------------------------+
```
And this is the correct output after fix:
```
+------------------------------------------------------------------------+
|zipped                                                                  |
+------------------------------------------------------------------------+
|[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]|
+------------------------------------------------------------------------+
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added the new test in `DataFrameFunctionsSuite`.

Closes #32424 from maropu/pr31887.

Lead-authored-by: dsolow <dsolow@sayari.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Co-authored-by: dmsolow <dsolow@sayarianalytics.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: f550e03)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
Commit bbdbe0f734c95ebc80facf74d00a78866e444128 by kabhwan.opensource
[SPARK-34854][SQL][SS] Expose source metrics via progress report and add Kafka use-case to report delay

### What changes were proposed in this pull request?
This pull request proposes a new API for streaming sources to signal that they can report metrics, and adds a use case: the Kafka micro-batch stream reports stats on the number of offsets by which the current offset falls behind the latest.

A public interface is added.

`metrics`: returns the metrics reported by the streaming source for the given offset.
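
On the consumer side, the reported values surface per source in the streaming progress (a sketch, assuming a streaming DataFrame `df`; the exact Kafka metric keys are not spelled out here):
```
val query = df.writeStream.format("console").start()
// ... after some batches have completed:
query.lastProgress.sources.foreach { s =>
  println(s"${s.description}: ${s.metrics}") // e.g. offsets-behind-latest stats for Kafka
}
```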

### Why are the changes needed?
The new API can expose any custom metrics for the "current" offset of a streaming source. Different from #31398, this PR makes metrics available to users through the progress report, not through the Spark UI. A use case is that people want to know how far the current offset falls behind the latest offset.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test for Kafka micro batch source v2 are added to test the Kafka use case.

Closes #31944 from yijiacui-db/SPARK-34297.

Authored-by: Yijia Cui <yijia.cui@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(commit: bbdbe0f)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala (diff)
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReportsSourceMetrics.java
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchStream.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala (diff)
Commit 4fe4b65d9e4017654c93c8f7957ae3edbd270d0b by yumwang
[SPARK-35315][TESTS] Keep benchmark result consistent between spark-submit and SBT

### What changes were proposed in this pull request?

Set `IS_TESTING` to true in `BenchmarkBase`, before running benchmarks.

### Why are the changes needed?

Currently benchmarks can be run in 2 ways: via `spark-submit` or via an SBT command. However, in the former, Spark misses some properties such as `IS_TESTING`, which are necessary to turn certain behavior like codegen (`spark.sql.codegen.factoryMode`) on or off. Therefore, the results could differ between the two. In addition, the benchmark GitHub workflow uses the spark-submit approach.
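
The gist of the change is a one-liner (a sketch; the actual code is in `BenchmarkBase.scala`, and `IS_TESTING` is keyed by the `spark.testing` system property):
```
// Set before any benchmark runs so a spark-submit run matches the SBT test env:
System.setProperty("spark.testing", "true")
```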

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #32440 from sunchao/SPARK-35315.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(commit: 4fe4b65)
The file was modified core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala (diff)
Commit 7970318296c4a35dc7aac1b68007dbb26f07562f by ltnwgl
[SPARK-35155][SQL] Add rule id pruning to Analyzer rules

### What changes were proposed in this pull request?

Added rule id based pruning to Analyzer rules in fixed point batches:

- org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns
- org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator
- org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions
- org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveBinaryArithmetic
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveInsertInto
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast
- org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns
- org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution
- org.apache.spark.sql.catalyst.analysis.DeduplicateRelations
- org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
- org.apache.spark.sql.catalyst.analysis.EliminateUnions
- org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct
- org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints
- org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveJoinStrategyHints
- org.apache.spark.sql.catalyst.analysis.ResolveInlineTables
- org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables
- org.apache.spark.sql.catalyst.analysis.ResolveTimeZone
- org.apache.spark.sql.catalyst.analysis.ResolveUnion
- org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals
- org.apache.spark.sql.catalyst.analysis.TimeWindowing

Subsequent PRs will add tree-bits-based pruning to those rules. This splits a big PR to reduce the review load.

### Why are the changes needed?

Reduce the number of tree traversals and hence improve the query compilation latency.

### How was this patch tested?

Existing tests.

Closes #32425 from sigmod/analyzer.

Authored-by: Yingyi Bu <yingyi.bu@databricks.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: 7970318)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UpdateFields.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUnion.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveInlineTables.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleIdCollection.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/timeZoneAnalysis.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/higherOrderFunctions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala (diff)
Commit 01269245681a467d2542a3a91e7d188fbf3e9162 by dhyun
[SPARK-35323][BUILD] Remove unused libraries from LICENSE-binary

### What changes were proposed in this pull request?

This PR removes unused libraries from `LICENSE-binary` file.

### Why are the changes needed?

SPARK-33212 removed many `Hadoop 3`-only transitive libraries like `dnsjava-2.1.7.jar`. We can simplify the Apache Spark LICENSE file by removing them.

### Does this PR introduce _any_ user-facing change?

Yes, but this is only a LICENSE file change.

### How was this patch tested?

Manual.

Closes #32445 from dongjoon-hyun/SPARK-35323.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 0126924)
The file was modified LICENSE-binary (diff)
Commit a0c76a8755a148e2bd774edcda12fe20f2f38c75 by dhyun
[SPARK-35319][K8S][BUILD] Upgrade K8s client to 5.3.1

### What changes were proposed in this pull request?

This PR aims to upgrade K8s client to 5.3.1.

### Why are the changes needed?

This will bring the latest bug fixes.
- https://github.com/fabric8io/kubernetes-client/releases/tag/v5.3.1

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

The K8s IT was manually tested as follows.

```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- All pods have the same service account by default
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
- Start pod creation from template
- PVs with local storage
- Launcher client dependencies
- SPARK-33615: Launcher client archives
- SPARK-33748: Launcher python client respecting PYSPARK_PYTHON
- SPARK-33748: Launcher python client respecting spark.pyspark.python and spark.pyspark.driver.python
- Launcher python client dependencies using a zip file
- Test basic decommissioning
- Test basic decommissioning with shuffle cleanup
- Test decommissioning with dynamic allocation & shuffle cleanups
- Test decommissioning timeouts
- Run SparkR on simple dataframe.R example
Run completed in 18 minutes, 33 seconds.
Total number of tests run: 27
Suites: completed 2, aborted 0
Tests: succeeded 27, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Spark Project Parent POM 3.2.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  3.959 s]
[INFO] Spark Project Tags ................................. SUCCESS [  7.830 s]
[INFO] Spark Project Local DB ............................. SUCCESS [  3.457 s]
[INFO] Spark Project Networking ........................... SUCCESS [  5.496 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  3.239 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [  9.006 s]
[INFO] Spark Project Launcher ............................. SUCCESS [  2.422 s]
[INFO] Spark Project Core ................................. SUCCESS [02:17 min]
[INFO] Spark Project Kubernetes Integration Tests ......... SUCCESS [21:05 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  23:59 min
[INFO] Finished at: 2021-05-05T11:59:19-07:00
[INFO] ------------------------------------------------------------------------
```

Closes #32443 from dongjoon-hyun/SPARK-35319.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: a0c76a8)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified resource-managers/kubernetes/core/pom.xml (diff)
The file was modified resource-managers/kubernetes/integration-tests/pom.xml (diff)
Commit 19661f6ae2638ca555a739c8fa265e24bd831977 by dhyun
[SPARK-35325][SQL][TESTS] Add nested column ORC encryption test case

### What changes were proposed in this pull request?

This PR aims to enrich ORC encryption test coverage for nested columns.

### Why are the changes needed?

This provides test coverage for ORC encryption of nested columns.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the newly added test case.

Closes #32449 from dongjoon-hyun/SPARK-35325.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 19661f6)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala (diff)
Commit 5c67d0c8f7155fba750a2deffed9689ad3e4f8fc by yamamuro
[SPARK-35293][SQL][TESTS] Use the newer dsdgen for TPCDSQueryTestSuite

### What changes were proposed in this pull request?

This PR replaces `maropu/spark-tpcds-datagen` with `databricks/tpcds-kit` so that a newer `dsdgen` is used, and updates the golden files in `tpcds-query-results` accordingly.

### Why are the changes needed?

For better testing with a newer `dsdgen`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The GitHub Actions (GA) build passed.

Closes #32420 from maropu/UseTpcdsKit.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 5c67d0c)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q77.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q80a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q41.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q24b.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q37.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q42.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q53.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q29.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q96.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q35.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q4.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q49.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q9.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q51a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q31.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q44.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q98.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q77a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q65.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q57.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q97.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q82.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q48.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q33.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q27a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q22.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q40.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q74.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q72.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q85.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q11.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q34.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q5a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q56.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q22.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q14a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q22a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q18a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q11.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q39a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q54.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q35a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q92.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q73.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q88.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q86a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q79.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q32.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q5.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q63.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q34.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q67a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q58.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q59.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q64.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q19.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q61.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q75.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q25.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q94.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q6.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q14b.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q76.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q66.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q8.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q74.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q27.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q13.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q20.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q46.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q36a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q98.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q2.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q43.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q81.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q12.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q47.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q6.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q70a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q69.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q3.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q52.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q47.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q49.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q23a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q60.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q14.sql.out (diff)
The file was modified .github/workflows/build_and_test.yml (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q51.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q50.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q84.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q93.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q21.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q57.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q26.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q83.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q39b.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q68.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q71.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q20.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/GenTPCDSData.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q89.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q7.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q91.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q10.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q24a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q86.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q36.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q10a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q55.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q30.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q64.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q45.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q90.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q72.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q38.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q62.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q14a.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q75.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v2_7/q35.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q70.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q28.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q1.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q12.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q16.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q15.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q18.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q95.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q23b.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q67.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q87.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q99.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q17.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-query-results/v1_4/q80.sql.out (diff)