Changes

Summary

  1. [SPARK-33740][SQL] hadoop configs in hive-site.xml can overrides (commit: 31e0bac)
  2. [SPARK-33742][SQL] Throw PartitionsAlreadyExistException from (commit: fab2995)
  3. [SPARK-33295][BUILD] Upgrade ORC to 1.6.6 (commit: 1ba1732)
  4. [SPARK-33749][BUILD][PYTHON] Exclude target directory in pycodestyle and (commit: cd7a306)
  5. [SPARK-32910][SS] Remove UninterruptibleThread usage from (commit: 7895ea1)
  6. [SPARK-33527][SQL] Extend the function of decode so as consistent with (commit: 24d7e45)
  7. [SPARK-33750][SQL][TESTS] Use `hadoop-3.2` distribution in (commit: 8ac86a4)
  8. [MINOR][SQL] Spelling: enabled - legacy_setops_precedence_enbled (commit: c05f6f9)
  9. [SPARK-33754][K8S][DOCS] Update kubernetes/integration-tests/README.md (commit: d662b95)
  10. [SPARK-33527][SQL][FOLLOWUP] Fix the scala 2.13 build failure (commit: 8377aca)
Commit 31e0baca30f21f71353a27b827c2acd0e25bd9d8 by dongjoon
[SPARK-33740][SQL] hadoop configs in hive-site.xml can overrides pre-existing hadoop ones

### What changes were proposed in this pull request?
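`org.apache.hadoop.conf.Configuration#setIfUnset` also ignores keys that only have default values, so Hadoop configs in hive-site.xml could not override them. This PR lets the hive-site.xml values override the pre-existing Hadoop ones.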
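
As an illustration of the `setIfUnset` behaviour, here is a minimal Python sketch of a toy model. It is not Hadoop's code: `ToyConfiguration`, the key name, and the values are hypothetical, and the sketch assumes that `setIfUnset` writes a value only when a lookup of the key returns nothing, where lookups also see built-in defaults.

```python
class ToyConfiguration:
    """Hypothetical stand-in for a configuration object with built-in defaults."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)  # values shipped as defaults
        self._explicit = {}              # values set explicitly later on

    def get(self, key):
        # Explicit values win; otherwise fall back to the default, if any.
        if key in self._explicit:
            return self._explicit[key]
        return self._defaults.get(key)

    def set(self, key, value):
        self._explicit[key] = value

    def set_if_unset(self, key, value):
        # Assumed semantics: a key that only has a default still counts as set.
        if self.get(key) is None:
            self.set(key, value)


conf = ToyConfiguration({"some.hadoop.key": "default-value"})

# The hive-site.xml value is silently dropped because a default exists.
conf.set_if_unset("some.hadoop.key", "from-hive-site")
ignored = conf.get("some.hadoop.key")

# An unconditional set lets the hive-site.xml value win.
conf.set("some.hadoop.key", "from-hive-site")
overridden = conf.get("some.hadoop.key")
```

In this model the first lookup still returns the default, while the second returns the hive-site.xml value, which is the difference the commit message points at.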

### Why are the changes needed?

fix a regression

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

Closes #30709 from yaooqinn/SPARK-33740.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 31e0bac)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/SharedStateSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
The file was modified sql/core/src/test/resources/hive-site.xml
Commit fab2995972761503563fa2aa547c67047c51bd33 by dongjoon
[SPARK-33742][SQL] Throw PartitionsAlreadyExistException from HiveExternalCatalog.createPartitions()

### What changes were proposed in this pull request?
Throw `PartitionsAlreadyExistException` from `createPartitions()` in Hive external catalog when a partition exists. Currently, `HiveExternalCatalog.createPartitions()` throws `AlreadyExistsException` wrapped by `AnalysisException`.

In the PR, I propose to catch `AlreadyExistsException` in `HiveClientImpl` and replace it by `PartitionsAlreadyExistException`.
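
The catch-and-replace idea can be sketched in Python as follows. This is only an illustration of the pattern: both exception classes, `metastore_add_partitions`, and `create_partitions` are hypothetical stand-ins written for this example, not the Spark or Hive APIs.

```python
class AlreadyExistsException(Exception):
    """Hypothetical stand-in for the low-level metastore error."""


class PartitionsAlreadyExistException(Exception):
    """Hypothetical stand-in for the catalog-level error."""

    def __init__(self, db, table, specs):
        super().__init__(
            f"Partitions already exist in table '{table}' database '{db}': {specs}"
        )
        self.specs = specs


def metastore_add_partitions(existing, specs):
    # Toy metastore call: refuse the whole request if any partition exists.
    if any(spec in existing for spec in specs):
        raise AlreadyExistsException("partition already exists")
    existing.extend(specs)


def create_partitions(existing, db, table, specs):
    # Translate the low-level error into the exception the other catalogs throw.
    try:
        metastore_add_partitions(existing, specs)
    except AlreadyExistsException as e:
        raise PartitionsAlreadyExistException(db, table, specs) from e
```

Chaining with `from e` keeps the original error available as the cause, so callers see the unified exception type without losing the underlying detail.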

### Why are the changes needed?
The behaviour of Hive external catalog deviates from V1/V2 in-memory catalogs that throw `PartitionsAlreadyExistException`. To improve user experience with Spark SQL, it would be better to throw the same exception.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
By running existing test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableAddPartitionSuite"
```

Closes #30711 from MaxGekk/hive-partition-exception.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: fab2995)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableAddPartitionSuite.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableAddPartitionSuiteBase.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableAddPartitionSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableAddPartitionSuite.scala
Commit 1ba1732beb8e01edfc4f658d9da4eaabf68ed7cf by dongjoon
[SPARK-33295][BUILD] Upgrade ORC to 1.6.6

### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.6.6 for Apache Spark 3.2.0.

### Why are the changes needed?

This brings the latest bug fixes and features.
Apache Iceberg is already using Apache ORC 1.6.6.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #30715 from dongjoon-hyun/SPARK-33295.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 1ba1732)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3
The file was modified pom.xml
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3
Commit cd7a30641f25f99452b7eb46ee2b3c5d59b2c542 by gurwls223
[SPARK-33749][BUILD][PYTHON] Exclude target directory in pycodestyle and flake8

### What changes were proposed in this pull request?

Once you have built Spark and run the K8S tests, the Python lint fails as below:

```bash
$ ./dev/lint-python
```

Before this PR:

```
starting python compilation test...
python compilation succeeded.

downloading pycodestyle from https://raw.githubusercontent.com/PyCQA/pycodestyle/2.6.0/pycodestyle.py...
starting pycodestyle test...
pycodestyle checks failed:
./resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/python/pyspark/cloudpickle/cloudpickle.py:15:101: E501 line too long (105 > 100 characters)
./resource-managers/kubernetes/integration-tests/target/spark-dist-unpacked/python/docs/source/conf.py:60:101: E501 line too long (124 > 100 characters)
...
```

After this PR:

```
starting python compilation test...
python compilation succeeded.

downloading pycodestyle from https://raw.githubusercontent.com/PyCQA/pycodestyle/2.6.0/pycodestyle.py...
starting pycodestyle test...
pycodestyle checks passed.

starting flake8 test...
flake8 checks passed.

starting mypy test...
mypy checks passed.

starting sphinx-build tests...
sphinx-build checks passed.
```

This PR excludes the `target` directory to avoid such cases in the future.
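
To show how a glob-style exclude can keep build output out of a lint run, here is a minimal Python sketch. The exact entry added to `dev/tox.ini` is not shown in this log, so the `*/target/*` pattern, the `is_excluded` helper, and the second path are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Assumed exclude pattern; the real tox.ini entry may differ.
EXCLUDE_PATTERNS = ["*/target/*"]


def is_excluded(path, patterns=EXCLUDE_PATTERNS):
    # A path is skipped when it matches any exclude pattern.
    return any(fnmatchcase(path, pattern) for pattern in patterns)


paths = [
    # Build output unpacked by the K8S integration tests (from the log above).
    "./resource-managers/kubernetes/integration-tests/target/"
    "spark-dist-unpacked/python/docs/source/conf.py",
    # Illustrative source file that should still be linted.
    "./python/pyspark/sql/functions.py",
]

to_lint = [p for p in paths if not is_excluded(p)]
```

Only the source file remains in `to_lint`; the copy under `target` is filtered out before any style check runs.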

### Why are the changes needed?

To make it easier to run linters

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested via running `./dev/lint-python`.

Closes #30718 from HyukjinKwon/SPARK-33749.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: cd7a306)
The file was modified dev/tox.ini
Commit 7895ea1f50700b56930b3841f16c44442d26e719 by kabhwan.opensource
[SPARK-32910][SS] Remove UninterruptibleThread usage from KafkaOffsetReaderAdmin

### What changes were proposed in this pull request?
The Kafka offset reader that uses `AdminClient` still calls it from an `UninterruptibleThread`. Since there is no evidence that `AdminClient` suffers from issues similar to [KAFKA-1894](https://issues.apache.org/jira/browse/KAFKA-1894), I'm removing the `UninterruptibleThread` usage. To put the `AdminClient` under stress and make sure it works, I've created the following standalone application: https://github.com/gaborgsomogyi/kafka-admin-interruption

What this PR contains:
* Removed `UninterruptibleThread` from `KafkaOffsetReaderAdmin`
* Removed/modified comments that are no longer true
* Adapted `KafkaRelationSuite`
* Renamed `partitionsAssignedToConsumer` to `partitionsAssignedToAdmin`

### Why are the changes needed?
`KafkaOffsetReaderAdmin` doesn't need `UninterruptibleThread` usage.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests + manually with simple Kafka to Kafka query.

Closes #30668 from gaborgsomogyi/SPARK-32910.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
(commit: 7895ea1)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReaderAdmin.scala
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaRelationSuite.scala
Commit 24d7e45d31181a24a37261480fcd45a9a97db659 by wenchen
[SPARK-33527][SQL] Extend the function of decode so as consistent with mainstream databases

### What changes were proposed in this pull request?
In Spark, decode(bin, charset) - Decodes the first argument using the second argument character set.

Unfortunately, this is NOT what any other SQL vendor understands `DECODE` to do.
`DECODE` is generally shorthand for a simple case expression:

```
SELECT DECODE(c1, 1, 'Hello', 2, 'World', '!') FROM (VALUES (1), (2), (3)) AS T(c1)
=>
(Hello),
(World)
(!)
```
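
The case-expression reading of `DECODE` can be sketched in Python as below. This is a simplified model for illustration, not Spark's implementation: the `decode` helper is hypothetical, it compares values with plain equality, and it does not model SQL NULL semantics.

```python
def decode(expr, *args):
    """Return the result paired with the first search value equal to expr.

    args is search1, result1, search2, result2, ..., optionally followed by
    a default. Without a default, None is returned when nothing matches.
    """
    if len(args) < 2:
        raise ValueError("decode needs at least one search/result pair")
    has_default = len(args) % 2 == 1
    default = args[-1] if has_default else None
    pairs = args[:-1] if has_default else args
    # Walk the (search, result) pairs in order, like CASE WHEN branches.
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if expr == search:
            return result
    return default


# Mirrors the query above for c1 = 1, 2, 3.
results = [decode(c1, 1, "Hello", 2, "World", "!") for c1 in (1, 2, 3)]
```

Here `results` holds one value per input row, with the trailing `'!'` acting as the default branch for the row that matches no search value.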
Several mainstream databases support this syntax.
**Oracle**
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/DECODE.html#GUID-39341D91-3442-4730-BD34-D3CF5D4701CE
**Vertica**
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/DECODE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_____10
**DB2**
https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1447.htm
**Redshift**
https://docs.aws.amazon.com/redshift/latest/dg/r_DECODE_expression.html
**Pig**
https://pig.apache.org/docs/latest/api/org/apache/pig/piggybank/evaluation/decode/Decode.html
**Teradata**
https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/jtCpCycpEaXESG4d63kMjg
**Snowflake**
https://docs.snowflake.com/en/sql-reference/functions/decode.html

### Why are the changes needed?
It is very useful.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Jenkins test.

Closes #30479 from beliefer/SPARK-33527.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 24d7e45)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
The file was modified sql/core/src/test/resources/sql-tests/results/string-functions.sql.out
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
The file was modified sql/core/src/test/resources/sql-tests/inputs/string-functions.sql
Commit 8ac86a4c318ddc99d0a979baefd197da2ce1c2b5 by dongjoon
[SPARK-33750][SQL][TESTS] Use `hadoop-3.2` distribution in HiveExternalCatalogVersionsSuite

### What changes were proposed in this pull request?

This PR aims to use `hadoop-3.2` distribution in HiveExternalCatalogVersionsSuite if available.

### Why are the changes needed?

Apache Spark 3.1 uses Hadoop 3 by default. We need to focus more on Hadoop 3 to prepare for the future.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #30722 from dongjoon-hyun/SPARK-33750.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 8ac86a4)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
Commit c05f6f98b6b06019d99d6a92b61b877afa822d0b by wenchen
[MINOR][SQL] Spelling: enabled - legacy_setops_precedence_enbled

### What changes were proposed in this pull request?

Replace `legacy_setops_precedence_enbled` with `legacy_setops_precedence_enabled`

Alternatively, `legacy_setops_precedence_enabled` could be added, and `legacy_setops_precedence_enbled` retained, and if set the code could honor it and warn about the deprecated spelling.
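
The alternative described above, which this PR did not take, could look roughly like the following Python sketch. The `read_flag` helper, the dictionary of options, and the default of `False` are assumptions made for illustration; they do not reflect how the parser actually stores this flag.

```python
import warnings

NEW_NAME = "legacy_setops_precedence_enabled"
OLD_NAME = "legacy_setops_precedence_enbled"  # deprecated misspelling


def read_flag(options):
    """Read the flag, honoring the old spelling with a deprecation warning."""
    if NEW_NAME in options:
        return bool(options[NEW_NAME])
    if OLD_NAME in options:
        warnings.warn(
            f"'{OLD_NAME}' is a deprecated spelling; use '{NEW_NAME}'",
            DeprecationWarning,
            stacklevel=2,
        )
        return bool(options[OLD_NAME])
    return False  # assumed default when neither name is present
```

With this shape, existing consumers of the misspelled name would keep working while being nudged toward the corrected one.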

### Why are the changes needed?

`enabled` is misspelled in `legacy_setops_precedence_enbled`

### Does this PR introduce _any_ user-facing change?

Yes.

It would break current consumers.
Examples include:
* https://www.programmersought.com/article/87752082924/
* https://github.com/fugue-project/fugue/blob/125d873c38e18b5f09b032bd01ac47a0c6739ddc/fugue_sql/_antlr/fugue_sqlLexer.py
* https://github.com/search?q=legacy_setops_precedence_enbled&type=code

### How was this patch tested?

It's been included in #30323 for a while (and is now split out here)

Closes #30677 from jsoref/spelling-enabled.

Authored-by: Josh Soref <jsoref@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c05f6f9)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
Commit d662b95535f12ebbc671a283b19291f63d2a2b8c by dongjoon
[SPARK-33754][K8S][DOCS] Update kubernetes/integration-tests/README.md to follow the default Hadoop profile updated

### What changes were proposed in this pull request?

This PR updates `kubernetes/integration-tests/README.md`.

### Why are the changes needed?

To follow the current Hadoop profile (hadoop-3.2).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I have confirmed that the integration tests pass with the following command for both Hadoop 3.2 and 2.7.
```
build/mvn integration-test -am -pl :spark-kubernetes-integration-tests_2.12 \
  -Pkubernetes \
  -Pkubernetes-integration-tests \
  -Dspark.kubernetes.test.imageTag=${IMAGE_TAG} \
  -Dspark.kubernetes.test.imageRepo=docker.io/kubespark \
  -Dspark.kubernetes.test.namespace=default \
  -Dspark.kubernetes.test.deployMode=minikube \
  -Dtest.include.tags=k8s
```

Closes #30726 from sarutak/update-kube-integ-readme.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: d662b95)
The file was modified resource-managers/kubernetes/integration-tests/README.md
Commit 8377aca60a4f326f2d1533c5e570518fb7de2895 by dongjoon
[SPARK-33527][SQL][FOLLOWUP] Fix the scala 2.13 build failure

### What changes were proposed in this pull request?

This PR fixes the Scala 2.13 build failure introduced by #30479.

### Why are the changes needed?

To pass Scala 2.13 build.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Should be done by GitHub Actions.

Closes #30727 from sarutak/fix-scala213-build-failure.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 8377aca)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala