Changes

Summary

  1. [SPARK-33684][BUILD] Upgrade httpclient from 4.5.6 to 4.5.13 (commit: d48ef34) (details)
  2. [SPARK-33671][SQL] Remove VIEW checks from V1 table commands (commit: 87c0560) (details)
  3. [SPARK-33676][SQL] Require exact matching of partition spec to the (commit: 26c0493) (details)
  4. [SPARK-33617][SQL] Add default parallelism configuration for Spark SQL (commit: 1e0c006) (details)
  5. [SPARK-32680][SQL] Don't Preprocess V2 CTAS with Unresolved Query (commit: d730b6b) (details)
  6. [SPARK-33641][SQL] Invalidate new char/varchar types in public APIs that (commit: da72b87) (details)
  7. [MINOR] Spelling sql not core (commit: c62b84a) (details)
  8. [SPARK-33693][SQL] deprecate spark.sql.hive.convertCTAS (commit: 6aff215) (details)
  9. [SPARK-33480][SQL][FOLLOWUP] do not expose user data in error message (commit: c0874ba) (details)
  10. [SPARK-33621][SQL] Add a way to inject data source rewrite rules (commit: 02508b6) (details)
  11. [SPARK-32320][PYSPARK] Remove mutable default arguments (commit: e4d1c10) (details)
  12. [SPARK-33680][SQL][TESTS][FOLLOWUP] Fix more test suites to have (commit: b2a7930) (details)
  13. [SPARK-33609][ML] word2vec reduce broadcast size (commit: ebd8b93) (details)
  14. [SPARK-33698][BUILD][TESTS] Fix the build error of (commit: 8bcebfa) (details)
  15. [SPARK-33664][SQL] Migrate ALTER TABLE ... RENAME TO to use (commit: 5aefc49) (details)
  16. [MINOR][INFRA] Add -Pdocker-integration-tests to GitHub Action Scala (commit: 3a6546d) (details)
  17. [SPARK-33679][SQL] Enable spark.sql.adaptive.enabled by default (commit: 031c5ef) (details)
  18. [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any (commit: 99613cd) (details)
  19. [SPARK-33688][SQL] Migrate SHOW TABLE EXTENDED to new resolution (commit: 2b30dde) (details)
  20. [SPARK-33685][SQL] Migrate DROP VIEW command to use UnresolvedView to (commit: c05ee06) (details)
  21. [MINOR] Spelling sql/core (commit: a093d6f) (details)
  22. [SPARK-33675][INFRA][FOLLOWUP] Schedule branch-3.1 snapshot at master (commit: c001dd4) (details)
  23. [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++ (commit: 6fd2345) (details)
  24. [SPARK-33695][BUILD] Upgrade to jackson to 2.10.5 and jackson-databind (commit: 3ac70f1) (details)
Commit d48ef34911b8928b66df92399119caebb24616d4 by dongjoon
[SPARK-33684][BUILD] Upgrade httpclient from 4.5.6 to 4.5.13

### What changes were proposed in this pull request?

This PR upgrades `commons.httpclient` from `4.5.6` to `4.5.13`.
4.5.6 was released over 2 years ago, so we can now move to the more stable `4.5.13`.
https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt

### Why are the changes needed?

To follow the more stable release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Should be covered by the existing tests.

Closes #30634 from sarutak/upgrade-httpclient.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: d48ef34)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified pom.xml (diff)
Commit 87c056088e853d475f1507e296ad06480862e8a7 by dongjoon
[SPARK-33671][SQL] Remove VIEW checks from V1 table commands

### What changes were proposed in this pull request?
Remove VIEW checks from the following V1 commands:
- `SHOW PARTITIONS`
- `TRUNCATE TABLE`
- `LOAD DATA`

The checks are performed earlier at:
https://github.com/apache/spark/blob/acc211d2cf0e6ab94f6578e1eb488766fd20fa4e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L885-L889
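
For illustration, a minimal sketch (run in spark-shell, with illustrative names) of how such a command against a view is already rejected during analysis, before the V1 command code runs:

```scala
// `spark` is the predefined SparkSession in spark-shell.
spark.sql("CREATE VIEW v1 AS SELECT 1 AS id")
// Expected to fail during analysis with an AnalysisException, so the later
// per-command VIEW check in tables.scala is never reached.
spark.sql("SHOW PARTITIONS v1")
```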

### Why are the changes needed?
To improve code maintenance and remove dead code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing test suites like `v1/ShowPartitionsSuite`.

1. LOAD DATA:
https://github.com/apache/spark/blob/acc211d2cf0e6ab94f6578e1eb488766fd20fa4e/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala#L176-L179
2. TRUNCATE TABLE:
https://github.com/apache/spark/blob/acc211d2cf0e6ab94f6578e1eb488766fd20fa4e/sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala#L180-L183
3. SHOW PARTITIONS:
- v1/ShowPartitionsSuite

Closes #30620 from MaxGekk/show-table-check-view.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 87c0560)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
Commit 26c0493318c2a3e5b74ff3829de88605aff8e832 by wenchen
[SPARK-33676][SQL] Require exact matching of partition spec to the schema in V2 `ALTER TABLE .. ADD/DROP PARTITION`

### What changes were proposed in this pull request?
Check that partition specs passed to V2 `ALTER TABLE .. ADD/DROP PARTITION` exactly match the partition schema (all partition fields from the schema are specified in the partition specs).

### Why are the changes needed?
1. To have the same behavior as V1 `ALTER TABLE .. ADD/DROP PARTITION`, which outputs the error:
```sql
spark-sql> create table tab1 (id int, a int, b int) using parquet partitioned by (a, b);
spark-sql> ALTER TABLE tab1 ADD PARTITION (A='9');
Error in query: Partition spec is invalid. The spec (a) must match the partition spec (a, b) defined in table '`default`.`tab1`';
```
2. To prevent future errors caused by not fully specified partition specs.

### Does this PR introduce _any_ user-facing change?
Yes. The V2 implementation of `ALTER TABLE .. ADD/DROP PARTITION` outputs the same error as the V1 commands.
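
A hedged sketch of the new V2 behavior (the catalog, namespace, and table names below are assumptions for illustration only):

```scala
// Assumes a V2 catalog registered as `testcat` via spark.sql.catalog.testcat=... .
spark.sql("CREATE TABLE testcat.ns.tab1 (id INT, a INT, b INT) USING foo PARTITIONED BY (a, b)")
// The spec omits partition column `b`, so this should now fail analysis, matching the V1 error above.
spark.sql("ALTER TABLE testcat.ns.tab1 ADD PARTITION (a = 9)")
```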

### How was this patch tested?
By running the test suite with new UT:
```
$ build/sbt "test:testOnly *AlterTablePartitionV2SQLSuite"
```

Closes #30624 from MaxGekk/add-partition-full-spec.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 26c0493)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/util/PartitioningUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
Commit 1e0c006748c031d5277ba3b906b0bbf68e6bc893 by gurwls223
[SPARK-33617][SQL] Add default parallelism configuration for Spark SQL queries

### What changes were proposed in this pull request?

This PR adds a default parallelism configuration (`spark.sql.default.parallelism`) for Spark SQL and makes it effective for `LocalTableScan`.

### Why are the changes needed?

Avoid generating small files for INSERT INTO TABLE from VALUES, for example:
```sql
CREATE TABLE t1(id int) USING parquet;
INSERT INTO TABLE t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8);
```

Before this PR:
```
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00000-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00001-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00002-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00003-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00004-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00005-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00006-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root 421 Dec  1 01:54 part-00007-4d5a3a89-2995-4328-b2ae-908febbbaf4a-c000.snappy.parquet
-rw-r--r-- 1 root root   0 Dec  1 01:54 _SUCCESS
```

After this PR, with `spark.sql.files.minPartitionNum` set to 1:
```
-rw-r--r-- 1 root root 452 Dec  1 01:59 part-00000-6de50c79-e305-4f8d-b6ae-39f46b2619c6-c000.snappy.parquet
-rw-r--r-- 1 root root   0 Dec  1 01:59 _SUCCESS
```
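
A hedged usage sketch of the configuration proposed above (the config name is taken from the description and is not a confirmed, released setting), together with the existing `spark.sql.files.minPartitionNum` setting used in the listing:

```scala
// `spark` is the predefined SparkSession in spark-shell.
spark.conf.set("spark.sql.default.parallelism", "1")   // config proposed in this PR
spark.conf.set("spark.sql.files.minPartitionNum", "1") // existing config referenced above
spark.sql("CREATE TABLE t1 (id INT) USING parquet")
spark.sql("INSERT INTO TABLE t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8)")
```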

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30559 from wangyum/SPARK-33617.

Lead-authored-by: Yuming Wang <yumwang@ebay.com>
Co-authored-by: Yuming Wang <yumwang@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 1e0c006)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaMergeUtils.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FilePartition.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/LocalTableScanExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkPlanSuite.scala (diff)
Commit d730b6bdaa92f2ca19cc8852ac58035e28d47a4f by wenchen
[SPARK-32680][SQL] Don't Preprocess V2 CTAS with Unresolved Query

### What changes were proposed in this pull request?
The analyzer rule `PreprocessTableCreation` preprocesses table-creation-related logical plans. But for
CTAS, if the sub-query can't be resolved, preprocessing it causes "Invalid call to toAttribute on unresolved
object" (instead of the user-friendly error message "table or view not found").
This PR fixes this incorrect preprocessing for CTAS using the V2 catalog.

### Why are the changes needed?
bug fix

### Does this PR introduce _any_ user-facing change?
The error message for CTAS with a non-existent table changes from:
`UnresolvedException: Invalid call to toAttribute on unresolved object, tree: xxx` to
`AnalysisException: Table or view not found: xxx`
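
A hedged reproduction sketch (catalog and table names are illustrative): a V2 CTAS whose source table does not exist should now fail with the friendlier message.

```scala
// Assumes a V2 catalog registered as `testcat`; `missing_tbl` intentionally does not exist.
// Before the fix: UnresolvedException ("Invalid call to toAttribute on unresolved object").
// After the fix:  AnalysisException ("Table or view not found: missing_tbl").
spark.sql("CREATE TABLE testcat.ns.target USING foo AS SELECT * FROM missing_tbl")
```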

### How was this patch tested?
added test

Closes #30637 from linhongliu-db/fix-ctas.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: d730b6b)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala (diff)
Commit da72b87374a7be5416b99ed016dc2fc9da0ed88a by wenchen
[SPARK-33641][SQL] Invalidate new char/varchar types in public APIs that produce incorrect results

### What changes were proposed in this pull request?

In this PR, we propose to narrow the use cases of the char/varchar data types, which are invalid now or will become invalid later.

### Why are the changes needed?
1. udf
```scala
scala> spark.udf.register("abcd", () => "12345", org.apache.spark.sql.types.VarcharType(2))

scala> spark.sql("select abcd()").show
scala.MatchError: CharType(2) (of class org.apache.spark.sql.types.VarcharType)
  at org.apache.spark.sql.catalyst.encoders.RowEncoder$.externalDataTypeFor(RowEncoder.scala:215)
  at org.apache.spark.sql.catalyst.encoders.RowEncoder$.externalDataTypeForInput(RowEncoder.scala:212)
  at org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType.<init>(objects.scala:1741)
  at org.apache.spark.sql.catalyst.encoders.RowEncoder$.$anonfun$serializerFor$3(RowEncoder.scala:175)
  at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
  at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
  at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
  at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
  at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
  at scala.collection.mutable.ArrayOps$ofRef.flatMap(ArrayOps.scala:198)
  at org.apache.spark.sql.catalyst.encoders.RowEncoder$.serializerFor(RowEncoder.scala:171)
  at org.apache.spark.sql.catalyst.encoders.RowEncoder$.apply(RowEncoder.scala:66)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:768)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:611)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:768)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:606)
  ... 47 elided
```

2. spark.createDataframe

```
scala> spark.createDataFrame(spark.read.text("README.md").rdd, new org.apache.spark.sql.types.StructType().add("c", "char(1)")).show
+--------------------+
|                   c|
+--------------------+
|      # Apache Spark|
|                    |
|Spark is a unifie...|
|high-level APIs i...|
|supports general ...|
|rich set of highe...|
|MLlib for machine...|
|and Structured St...|
|                    |
|<https://spark.ap...|
|                    |
|[![Jenkins Build]...|
|[![AppVeyor Build...|
|[![PySpark Covera...|
|                    |
|                    |
```

3. reader.schema

```
scala> spark.read.schema("a varchar(2)").text("./README.md").show(100)
+--------------------+
|                   a|
+--------------------+
|      # Apache Spark|
|                    |
|Spark is a unifie...|
|high-level APIs i...|
|supports general ...|
```
4. etc

### Does this PR introduce _any_ user-facing change?

No, we intend to avoid a potential breaking change.

### How was this patch tested?

new tests

Closes #30586 from yaooqinn/SPARK-33641.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: da72b87)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableSchemaParserSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExprUtils.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/VarcharType.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserInterface.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Column.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharVarcharUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (diff)
Commit c62b84a0432e51fd10e628088ee311dc3be73d2f by srowen
[MINOR] Spelling sql not core

### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules:
* `sql/catalyst`
* `sql/hive-thriftserver`
* `sql/hive`

Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618

NOTE: The misspellings have been reported at https://github.com/jsoref/spark/commit/706a726f87a0bbf5e31467fae9015218773db85b#commitcomment-44064356

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

There are various fixes to documentation, etc...

### How was this patch tested?

No testing was performed

Closes #30532 from jsoref/spelling-sql-not-core.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: c62b84a)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/StreamingJoinHelper.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala (diff)
The file was modified sql/hive/src/test/resources/ql/src/test/queries/clientpositive/bucketsortoptimize_insert_3.q (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnvSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/higherOrderFunctions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AliasHelper.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/GetInfoValue.java (diff)
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java (diff)
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ErrorParserSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/StructFilters.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupingAnalyticsSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (diff)
The file was modified sql/hive/src/test/resources/ql/src/test/queries/clientpositive/auto_sortmerge_join_13.q (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SetOperationSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/QuantileSummaries.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeBlockSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala (diff)
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/ColumnDescriptor.java (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/DummyListeners.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UpdateAttributeNullability.scala (diff)
The file was modified sql/hive/src/test/resources/ql/src/test/queries/clientpositive/smb_mapjoin_20.q (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ObjectExpressionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateSafeProjection.scala (diff)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/PlanHelper.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala (diff)
Commit 6aff215077e2cdf9cec187c827da63c067514e4e by dongjoon
[SPARK-33693][SQL] deprecate spark.sql.hive.convertCTAS

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/30554. Now that we have a new config for converting CREATE TABLE, we don't need the old config that only works for CTAS.

### Why are the changes needed?

It's confusing to have two configs when one can completely cover the other.

### Does this PR introduce _any_ user-facing change?

No, it's a deprecation, not a removal.

### How was this patch tested?

N/A

Closes #30651 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 6aff215)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit c0874ba9f13b9802eef4418490020692e37652ba by dongjoon
[SPARK-33480][SQL][FOLLOWUP] do not expose user data in error message

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/30412. This PR updates the error message of the char/varchar table insertion length check so that it does not expose user data.

### Why are the changes needed?

It is risky to expose user data in the error message, especially string data, as it may contain sensitive information.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

updated tests

Closes #30653 from cloud-fan/minor2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: c0874ba)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharVarcharUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
Commit 02508b68ecc56658a13d89bf798c5ef824ba2cdc by dongjoon
[SPARK-33621][SQL] Add a way to inject data source rewrite rules

### What changes were proposed in this pull request?

This PR adds a way to inject data source rewrite rules.

### Why are the changes needed?

Right now `SparkSessionExtensions` allows us to inject optimization rules, but they are added to the operator optimization batch. There are cases when users need to run rules after the operator optimization batch (e.g. when a rule relies on the fact that expressions have been optimized). Currently, this is not possible.
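
For context, a rough sketch of how rules are injected through `SparkSessionExtensions`; it uses the existing `injectOptimizerRule` entry point, and the new injection point added by this PR (for rules that run after the operator optimization batch) is intentionally not named here.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// A no-op rule, used only to show the shape of an injected rule.
case class MyRewriteRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

class MyExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // Existing entry point tied to the operator optimization batch; this PR adds an
    // analogous entry point that runs after that batch.
    extensions.injectOptimizerRule(MyRewriteRule)
  }
}
// Enable with: spark.sql.extensions=MyExtensions when building the SparkSession.
```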

### Does this PR introduce _any_ user-facing change?

Yes.

### How was this patch tested?

This PR comes with a new test.

Closes #30577 from aokolnychyi/spark-33621-v3.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 02508b6)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala (diff)
Commit e4d1c10760800563d2a30410b46e5b0cd2671c4d by ruifengz
[SPARK-32320][PYSPARK] Remove mutable default arguments

This is bad practice, and might lead to unexpected behaviour:
https://florimond.dev/blog/articles/2018/08/python-mutable-defaults-are-the-source-of-all-evil/

```
fokkodriesprongFan spark % grep -R "={}" python | grep def

python/pyspark/resource/profile.py:    def __init__(self, _java_resource_profile=None, _exec_req={}, _task_req={}):
python/pyspark/sql/functions.py:def from_json(col, schema, options={}):
python/pyspark/sql/functions.py:def to_json(col, options={}):
python/pyspark/sql/functions.py:def schema_of_json(json, options={}):
python/pyspark/sql/functions.py:def schema_of_csv(csv, options={}):
python/pyspark/sql/functions.py:def to_csv(col, options={}):
python/pyspark/sql/functions.py:def from_csv(col, schema, options={}):
python/pyspark/sql/avro/functions.py:def from_avro(data, jsonFormatSchema, options={}):
```

```
fokkodriesprongFan spark % grep -R "=\[\]" python | grep def
python/pyspark/ml/tuning.py:    def __init__(self, bestModel, avgMetrics=[], subModels=None):
python/pyspark/ml/tuning.py:    def __init__(self, bestModel, validationMetrics=[], subModels=None):
```

### What changes were proposed in this pull request?

Removing the mutable default arguments.

### Why are the changes needed?

Removing the mutable default arguments, and changing the signature to `Optional[...]`.

### Does this PR introduce _any_ user-facing change?

No 👍

### How was this patch tested?

Using the Flake8 bugbear code analysis plugin.

Closes #29122 from Fokko/SPARK-32320.

Authored-by: Fokko Driesprong <fokko@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(commit: e4d1c10)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified python/pyspark/ml/tuning.py (diff)
The file was modified python/pyspark/sql/functions.pyi (diff)
The file was modified python/pyspark/ml/regression.py (diff)
The file was modified python/pyspark/sql/avro/functions.pyi (diff)
The file was modified python/mypy.ini (diff)
The file was modified python/pyspark/ml/tuning.pyi (diff)
The file was modified python/pyspark/resource/profile.py (diff)
The file was modified dev/sparktestsupport/modules.py (diff)
The file was modified dev/tox.ini (diff)
The file was modified python/pyspark/sql/avro/functions.py (diff)
The file was modified python/pyspark/resource/profile.pyi (diff)
Commit b2a79306ef7b330c5bf4dc1337ed80ebd6e08d0c by dongjoon
[SPARK-33680][SQL][TESTS][FOLLOWUP] Fix more test suites to have explicit confs

### What changes were proposed in this pull request?

This is a follow-up for SPARK-33680 to remove the assumption on the default value of `spark.sql.adaptive.enabled`.

### Why are the changes needed?

According to the test result https://github.com/apache/spark/pull/30628#issuecomment-739866168, the [previous run](https://github.com/apache/spark/pull/30628#issuecomment-739641105) didn't run all tests.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #30655 from dongjoon-hyun/SPARK-33680.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: b2a7930)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceWithActualMetricsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala (diff)
Commit ebd8b9357af296b8859e65577ab1e16593fab50d by ruifengz
[SPARK-33609][ML] word2vec reduce broadcast size

### What changes were proposed in this pull request?
1. Directly use float vectors instead of converting to double vectors; this is about 2x faster than using vec.axpy.
2. Mark `wordList` and `wordVecNorms` lazy.
3. Avoid slicing in the computation of `wordVecNorms`.
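
A brief usage sketch (data and parameters are illustrative) of the transform path whose broadcast these changes shrink; run in spark-shell, where `spark` is predefined:

```scala
import org.apache.spark.ml.feature.Word2Vec

val docs = spark.createDataFrame(
  Seq("a b c".split(" "), "b c a".split(" ")).map(Tuple1.apply)).toDF("text")
val model = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("vec")
  .setVectorSize(3)
  .setMinCount(0)
  .fit(docs)
// transform() broadcasts the word vectors; keeping them as float arrays halves that size.
model.transform(docs).show(false)
```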

### Why are the changes needed?
Halves the broadcast size.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suites.

Closes #30548 from zhengruifeng/w2v_float32_transform.

Lead-authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Co-authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(commit: ebd8b93)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala (diff)
Commit 8bcebfa59a64123f014c01bc4fb5de8d9624f8f4 by dongjoon
[SPARK-33698][BUILD][TESTS] Fix the build error of OracleIntegrationSuite for Scala 2.13

### What changes were proposed in this pull request?

This PR fixes a build error of `OracleIntegrationSuite` with Scala 2.13.

### Why are the changes needed?

Build should pass with Scala 2.13.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I confirmed that the build passes with the following command.
```
$ build/sbt -Pdocker-integration-tests -Pscala-2.13  "docker-integration-tests/test:compile"
```

Closes #30660 from sarutak/fix-docker-integration-tests-for-scala-2.13.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 8bcebfa)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala (diff)
Commit 5aefc49b0f7047f2c928c18b371098314c2f59f0 by wenchen
[SPARK-33664][SQL] Migrate ALTER TABLE ... RENAME TO to use UnresolvedTableOrView to resolve identifier

### What changes were proposed in this pull request?

This PR proposes to migrate `ALTER [TABLE|ViEW] ... RENAME TO` to use `UnresolvedTableOrView` to resolve the table/view identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).

### Why are the changes needed?

To use `UnresolvedTableOrView` for table/view resolution. Note that `AlterTableRenameCommand` internally resolves to a temp view first, so there is no resolution behavior change with this PR.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated existing tests.

Closes #30610 from imback82/rename_v2.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 5aefc49)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala (diff)
Commit 3a6546d3858e7c184f36cb6c4fd454f2142460f0 by sarutak
[MINOR][INFRA] Add -Pdocker-integration-tests to GitHub Action Scala 2.13 build job

### What changes were proposed in this pull request?

This aims to add `-Pdocker-integration-tests` to the GitHub Action job for Scala 2.13 compilation.

### Why are the changes needed?

We fixed the Scala 2.13 compilation of this module in https://github.com/apache/spark/pull/30660. This PR will prevent accidental regressions in that module.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GitHub Action Scala 2.13 job.

Closes #30661 from dongjoon-hyun/SPARK-DOCKER-IT.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: 3a6546d)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 031c5ef280e0cba8c4718a6457a44b6cccb17f46 by dongjoon
[SPARK-33679][SQL] Enable spark.sql.adaptive.enabled by default

### What changes were proposed in this pull request?

This PR aims to enable `spark.sql.adaptive.enabled` by default for Apache Spark **3.2.0**.

### Why are the changes needed?

By switching the default for Apache Spark 3.2, the whole community can focus more seriously on stabilizing this feature in various situations.

### Does this PR introduce _any_ user-facing change?

Yes, but this is an improvement and it's supposed to have no bugs.
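
For users who hit problems with the new default, a minimal sketch of opting out per session (this configuration already exists):

```scala
// `spark` is the predefined SparkSession in spark-shell.
spark.conf.set("spark.sql.adaptive.enabled", "false")
```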

### How was this patch tested?

Pass the CIs.

Closes #30628 from dongjoon-hyun/SPARK-33679.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 031c5ef)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 99613cd5815b2de12274027dee0c0a6c0c57bd95 by yamamuro
[SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar

### What changes were proposed in this pull request?
The `LikeSimplification` rule does not work correctly in many cases where patterns contain escape characters, for example:

`SELECT s LIKE 'm%aca' ESCAPE '%' FROM t`
`SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t`

For simplicity, this PR simply skips this rule if `pattern` contains any `escapeChar`.
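
A hedged reproduction sketch based on the second pattern above; the expected result is an assumption that follows from the escape semantics, where `aa` stands for a literal `a`:

```scala
// With escape character 'a', the pattern 'maacaa' denotes the literal string "maca",
// so this should return true once LikeSimplification no longer mis-rewrites the pattern.
spark.sql("SELECT 'maca' LIKE 'maacaa' ESCAPE 'a'").show()
```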

### Why are the changes needed?
Query results can be corrupted.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added Unit test.

Closes #30625 from luluorta/SPARK-33677.

Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 99613cd)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
Commit 2b30dde24972f7123b7ee14583fdce72e9ee955f by wenchen
[SPARK-33688][SQL] Migrate SHOW TABLE EXTENDED to new resolution framework

### What changes were proposed in this pull request?
1. Remove the old statement `ShowTableStatement`.
2. Introduce the new command `ShowTableExtended` for `SHOW TABLE EXTENDED`.

This PR is the first step of the new V2 implementation of `SHOW TABLE EXTENDED`; see SPARK-33393.

### Why are the changes needed?
This is part of the effort to make the relation lookup behavior consistent: SPARK-29900.

### Does this PR introduce _any_ user-facing change?
The changes should not affect V1 tables. For V2, Spark outputs the error:
```
SHOW TABLE EXTENDED is not supported for v2 tables.
```

### How was this patch tested?
By running `SHOW TABLE EXTENDED` tests:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowTablesSuite"
```

Closes #30645 from MaxGekk/show-table-extended-statement.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 2b30dde)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowTablesSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
Commit c05ee06f5b711dd261dc94a01b4ba4ffccdf2ea0 by wenchen
[SPARK-33685][SQL] Migrate DROP VIEW command to use UnresolvedView to resolve the identifier

### What changes were proposed in this pull request?

This PR introduces `UnresolvedView` in the resolution framework to resolve the identifier.

This PR then migrates `DROP VIEW` to use `UnresolvedView` to resolve the table/view identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).
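
A small illustration (names are illustrative) of the kind of resolution the consistent rules cover; as noted below, this PR does not change the resolution behavior itself:

```scala
// `spark` is the predefined SparkSession in spark-shell.
spark.sql("CREATE TEMPORARY VIEW v AS SELECT 1 AS id")
// Under the consistent resolution rules, the temporary view `v` is resolved first and dropped.
spark.sql("DROP VIEW v")
```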

### Why are the changes needed?

To use `UnresolvedView` for view resolution. Note that there is no resolution behavior change with this PR.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Updated existing tests.

Closes #30636 from imback82/drop_view_v2.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c05ee06)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCommandsWithIfExists.scala
The file was removed sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveNoopDropTable.scala
Commit a093d6feefb0e086d19c86ae53bf92df12ccf2fa by srowen
[MINOR] Spelling sql/core

### What changes were proposed in this pull request?

This PR intends to fix typos in the sub-modules:
* `sql/core`

Split per srowen https://github.com/apache/spark/pull/30323#issuecomment-728981618

NOTE: The misspellings have been reported at https://github.com/jsoref/spark/commit/706a726f87a0bbf5e31467fae9015218773db85b#commitcomment-44064356

### Why are the changes needed?

Misspelled words make it harder to read / understand content.

### Does this PR introduce _any_ user-facing change?

There are various fixes to documentation, etc...

### How was this patch tested?

No testing was performed

Closes #30531 from jsoref/spelling-sql-core.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: a093d6f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/ExplainUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArray.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/TPCDSTableStats.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/FlatMapGroupsWithStateExecHelper.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/CompactibleFileStreamLog.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/ansi/decimalArithmeticOperations.sql (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
The file was modified sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/HiveSerDe.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/Java8DatasetAggregatorSuite.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanHelper.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SparkPlanInfoSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/postgreSQL/create_view.sql (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLogSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/CreateTableAsSelectSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/DataSourceSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala (diff)
Commit c001dd49e4e9bb42f18618afe710e401b2df3afb by dongjoon
[SPARK-33675][INFRA][FOLLOWUP] Schedule branch-3.1 snapshot at master branch

### What changes were proposed in this pull request?

The `master`/`branch-3.0`/`branch-2.4` snapshot publishing has already been successfully migrated from Jenkins to `GitHub Action`.

- https://github.com/apache/spark/actions?query=workflow%3A%22Publish+Snapshot%22

This PR aims to schedule the `branch-3.1` snapshot at the `master` branch.

### Why are the changes needed?

This is because it turns out that the `GitHub Action Schedule` works only on the `master` branch (the default branch).
- https://docs.github.com/en/free-pro-teamlatest/actions/reference/events-that-trigger-workflows#scheduled-events

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The matrix triggering is tested at the forked branch.
- https://github.com/dongjoon-hyun/spark/runs/1519015974

Closes #30674 from dongjoon-hyun/SPARK-SCHEDULE-3.1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: c001dd4)
The file was modified .github/workflows/publish_snapshot.yml (diff)
Commit 6fd234503cf1e85715ccd3bda42f29dae1daa71b by dongjoon
[SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++

### What changes were proposed in this pull request?

Currently, Spark treats 0.0 and -0.0 as semantically equal, while it still retains the difference between them so that users can see -0.0 when displaying the data set.

The comparison expressions in Spark take care of the special floating numbers and implement the correct semantic. However, Spark doesn't always use these comparison expressions to compare values, and we need to normalize the special floating numbers before comparing them in these places:
1. GROUP BY
2. join keys
3. window partition keys

This PR fixes one more place that compares values without using comparison expressions: HyperLogLog++
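
A hedged illustration of the affected aggregate: `approx_count_distinct` is backed by HyperLogLog++, so 0.0 and -0.0 should now be counted as a single distinct value.

```scala
// `spark` is the predefined SparkSession in spark-shell.
import spark.implicits._

val df = Seq(0.0, -0.0, 0.0).toDF("d")
// Expected result after this fix: 1 (the special floating-point values are normalized before hashing).
df.selectExpr("approx_count_distinct(d)").show()
```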

### Why are the changes needed?

Fix the query result

### Does this PR introduce _any_ user-facing change?

Yes, the result of HyperLogLog++ becomes correct now.

### How was this patch tested?

A new test case, plus a few more test cases (which also pass before this PR) to improve test coverage.

Closes #30673 from cloud-fan/bug.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 6fd2345)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala (diff)
Commit 3ac70f169d653f22bd04ec7bb6ebb49696807bb2 by dongjoon
[SPARK-33695][BUILD] Upgrade to jackson to 2.10.5 and jackson-databind to 2.10.5.1

### What changes were proposed in this pull request?

Upgrade the Jackson dependencies to 2.10.5 and jackson-databind to 2.10.5.1.

### Why are the changes needed?

The Jackson dependency has vulnerability CVE-2020-25649.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

Closes #30656 from n-marion/SPARK-33695_upgrade-jackson.

Authored-by: Nicholas Marion <nmarion@us.ibm.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 3ac70f1)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)