Changes

Summary

  1. [SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw (commit: 0b326d5) (details)
  2. [SPARK-33007][SQL] Simplify named_struct + get struct field + from_json (commit: 57ed5a8) (details)
  3. [SPARK-33067][SQL][TESTS][FOLLOWUP] Check error messages in (commit: 584f90c) (details)
  4. [SPARK-33017][PYTHON][DOCS][FOLLOW-UP] Add getCheckpointDir into API (commit: 5ce321d) (details)
  5. [SPARK-33034][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add, (commit: aea78d2) (details)
  6. [SPARK-33004][SQL] Migrate DESCRIBE column to use UnresolvedTableOrView (commit: 7e99fcd) (details)
  7. [SPARK-32189][DOCS][PYTHON][FOLLOW-UP] Fixed broken link and typo in (commit: 4e1ded6) (details)
  8. [SPARK-33002][PYTHON] Remove non-API annotations (commit: 72da6f8) (details)
  9. [SPARK-33036][SQL] Refactor RewriteCorrelatedScalarSubquery code to (commit: 94d648d) (details)
  10. [SPARK-32067][K8S] Use unique ConfigMap name for executor pod template (commit: 3099fd9) (details)
  11. [SPARK-33082][SQL] Remove hive-1.2 workaround code (commit: a127387) (details)
  12. [SPARK-26499][SQL][FOLLOWUP] Print the loading provider exception (commit: 23afc93) (details)
  13. [SPARK-21708][BUILD] Migrate build to sbt 1.x (commit: 6daa2ae) (details)
  14. [SPARK-33086][PYTHON] Add static annotations for pyspark.resource (commit: 37e1b0c) (details)
  15. [SPARK-32511][FOLLOW-UP][SQL][R][PYTHON] Add dropFields to SparkR and (commit: 473b3ba) (details)
  16. [SPARK-32793][SQL] Add raise_error function, adds error message (commit: 39510b0) (details)
  17. [SPARK-33089][SQL] make avro format propagate Hadoop config from DS (commit: bbc887b) (details)
  18. [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle (commit: 1c781a4) (details)
  19. [SPARK-33074][SQL] Classify dialect exceptions in JDBC v2 Table Catalog (commit: 7d6e3fb) (details)
  20. [SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential (commit: 5effa8e) (details)
Commit 0b326d532752fd4e05b08dd16c096f80afe7d727 by dhyun
[SPARK-32857][CORE] Fix flaky o.a.s.s.BarrierTaskContextSuite.throw
exception if the number of barrier() calls are not the same on every
task
### What changes were proposed in this pull request?
Fix the flaky test.
### Why are the changes needed?
The test is flaky: `Expected exception org.apache.spark.SparkException
to be thrown, but no exception was thrown`.
Check the full error stack
[here](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128548/testReport/org.apache.spark.scheduler/BarrierTaskContextSuite/throw_exception_if_the_number_of_barrier___calls_are_not_the_same_on_every_task/).
By analyzing the log below, I found that task 0 hadn't reached the
second `context.barrier()` when the other three tasks had already raised
sync timeout exceptions from the first `context.barrier()`. The timeout
exceptions were caught by the `try...catch...`. Then, each task started
another round of barrier sync from the second `context.barrier()` and
completed the sync successfully.
```
20/09/10 20:54:48.821 dispatcher-event-loop-10 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:48.822 dispatcher-event-loop-10 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 2, current progress: 1/4.
20/09/10 20:54:48.826 dispatcher-BlockManagerMaster INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:38420 (size: 2.2 KiB, free: 546.3 MiB)
20/09/10 20:54:48.908 dispatcher-event-loop-12 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:48.909 dispatcher-event-loop-12 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 1, current progress: 2/4.
20/09/10 20:54:48.959 dispatcher-event-loop-11 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:48.960 dispatcher-event-loop-11 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 3, current progress: 3/4.
20/09/10 20:54:49.616 dispatcher-CoarseGrainedScheduler INFO TaskSchedulerImpl: Skip current round of resource offers for barrier stage 0 because the barrier taskSet requires 4 slots, while the total number of available slots is 0.
20/09/10 20:54:49.899 dispatcher-event-loop-15 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:49.900 dispatcher-event-loop-15 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 1, current progress: 1/4.
20/09/10 20:54:49.965 dispatcher-event-loop-13 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:49.966 dispatcher-event-loop-13 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 3, current progress: 2/4.
20/09/10 20:54:50.112 dispatcher-event-loop-16 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:50.113 dispatcher-event-loop-16 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 0, current progress: 3/4.
20/09/10 20:54:50.609 dispatcher-CoarseGrainedScheduler INFO TaskSchedulerImpl: Skip current round of resource offers for barrier stage 0 because the barrier taskSet requires 4 slots, while the total number of available slots is 0.
20/09/10 20:54:50.826 dispatcher-event-loop-17 INFO BarrierCoordinator: Current barrier epoch for Stage 0 (Attempt 0) is 0.
20/09/10 20:54:50.827 dispatcher-event-loop-17 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received update from Task 2, current progress: 4/4.
20/09/10 20:54:50.827 dispatcher-event-loop-17 INFO BarrierCoordinator: Barrier sync epoch 0 from Stage 0 (Attempt 0) received all updates from tasks, finished successfully.
```
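For reference, a minimal sketch of the scenario being exercised (this is the shape of the test, not the suite's exact code; it assumes a running `SparkContext` named `sc` and a small `spark.barrier.sync.timeout`): one task skips a `barrier()` call that the other tasks make, so the sync counts diverge and the stage should fail.
```scala
import org.apache.spark.BarrierTaskContext

val rdd = sc.makeRDD(1 to 10, 4).barrier().mapPartitions { iter =>
  val context = BarrierTaskContext.get()
  if (context.partitionId() != 0) {
    context.barrier() // three tasks sync here; task 0 never joins this round
  }
  context.barrier()   // all four tasks sync here
  iter
}
// With spark.barrier.sync.timeout set to a small value, rdd.collect() is expected to fail
// with a SparkException raised by the barrier coordinator.
```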
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Updated the test and ran it a hundred times without failure (previously,
there could be several failures).
Closes #29732 from Ngone51/fix-flaky-throw-exception.
Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 0b326d5)
The file was modified core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala (diff)
Commit 57ed5a829b7dd8c92e5dfb7bb96373c8f464246c by dhyun
[SPARK-33007][SQL] Simplify named_struct + get struct field + from_json
expression chain
### What changes were proposed in this pull request?
This proposes to simplify named_struct + get struct field + from_json
expression chain from `struct(from_json.col1, from_json.col2,
from_json.col3...)` to `struct(from_json)`.
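As an illustration of the expression chain being simplified, consider a query that re-assembles every field of a `from_json` result into a struct (a sketch assuming a SparkSession `spark`, e.g. in spark-shell; this is not the optimizer rule itself):
```scala
import org.apache.spark.sql.functions.{col, from_json, struct}
import org.apache.spark.sql.types.StructType
import spark.implicits._

val df = Seq("""{"a": 1, "b": 2}""").toDF("json")
val parsed = from_json(col("json"), StructType.fromDDL("a INT, b INT"))

// The optimized plan for struct(parsed.a, parsed.b) previously re-built the struct field by
// field; with this change it collapses to the from_json result itself.
df.select(struct(parsed.getField("a"), parsed.getField("b")).as("s")).explain(true)
```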
### Why are the changes needed?
Simplify complex expression trees that could be produced by query
optimization or by users.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test.
Closes #29942 from viirya/SPARK-33007.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 57ed5a8)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJsonExprs.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJsonExprsSuite.scala (diff)
Commit 584f90c82e8e47cdcaab50f95e6c709f460cd789 by gurwls223
[SPARK-33067][SQL][TESTS][FOLLOWUP] Check error messages in
JDBCTableCatalogSuite
### What changes were proposed in this pull request? Get the error messages
from the expected exceptions, and check that they are reasonable.
### Why are the changes needed? To improve tests by expecting particular
error messages.
### Does this PR introduce _any_ user-facing change? No
### How was this patch tested? By running `JDBCTableCatalogSuite`.
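A generic ScalaTest pattern for this kind of check (illustrative only; the table name and expected message below are hypothetical, and the real assertions live in `JDBCTableCatalogSuite`):
```scala
import org.apache.spark.sql.AnalysisException
import org.scalatest.Assertions._

// Trigger the failure and assert on the message, instead of only on the exception type.
val msg = intercept[AnalysisException] {
  spark.sql("ALTER TABLE h2.test.not_existing_table ADD COLUMNS (c1 STRING)")
}.getMessage
assert(msg.contains("Table") && msg.contains("not found"))
```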
Closes #29957 from MaxGekk/jdbcv2-negative-tests-followup.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 584f90c)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala (diff)
Commit 5ce321dc80a699fa525ca5b69bf2c28e10f8a12a by gurwls223
[SPARK-33017][PYTHON][DOCS][FOLLOW-UP] Add getCheckpointDir into API
documentation
### What changes were proposed in this pull request?
This is a follow-up of https://github.com/apache/spark/pull/29918. We
should add it to the documentation as well.
### Why are the changes needed?
To show users new APIs.
### Does this PR introduce _any_ user-facing change?
Yes, `SparkContext.getCheckpointDir` will be documented.
### How was this patch tested?
Manually built the PySpark documentation:
```bash
cd python/docs
make clean html
cd build/html
open index.html
```
Closes #29960 from HyukjinKwon/SPARK-33017.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 5ce321d)
The file was modified python/docs/source/reference/pyspark.rst (diff)
Commit aea78d2c8cdf12f4978fa6a69107d096c07c6fec by wenchen
[SPARK-33034][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: add,
update type and nullability of columns (Oracle dialect)
### What changes were proposed in this pull request?
1. Override the default SQL strings in the Oracle Dialect for:
   - ALTER TABLE ADD COLUMN
   - ALTER TABLE UPDATE COLUMN TYPE
   - ALTER TABLE UPDATE COLUMN NULLABILITY
2. Add new docker integration test suite `jdbc/v2/OracleIntegrationSuite.scala`
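For context, the DDL shapes an Oracle dialect needs to emit differ from the defaults; the strings below are illustrative only (the exact statements are generated in `OracleDialect.scala`):
```scala
// Oracle wraps ADD column lists in parentheses and uses MODIFY rather than ALTER COLUMN.
val addColumn         = "ALTER TABLE employees ADD (nickname VARCHAR2(255))"
val updateColumnType  = "ALTER TABLE employees MODIFY salary NUMBER(10)"
val updateNullability = "ALTER TABLE employees MODIFY nickname NOT NULL"
```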
### Why are the changes needed? In SPARK-24907, we implemented JDBC v2
Table Catalog but it doesn't support some `ALTER TABLE` at the moment.
This PR supports Oracle specific `ALTER TABLE`.
### Does this PR introduce _any_ user-facing change? Yes
### How was this patch tested? By running new integration test suite:
```
$ ./build/sbt -Pdocker-integration-tests "test-only *.OracleIntegrationSuite"
```
Closes #29912 from MaxGekk/jdbcv2-oracle-alter-table.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: aea78d2)
The file was added external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DockerJDBCIntegrationSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala (diff)
Commit 7e99fcd64efa425f3c985df4fe957a3be274a49a by wenchen
[SPARK-33004][SQL] Migrate DESCRIBE column to use UnresolvedTableOrView
to resolve the identifier
### What changes were proposed in this pull request?
This PR proposes to migrate `DESCRIBE tbl colname` to use
`UnresolvedTableOrView` to resolve the table/view identifier. This
allows consistent resolution rules (temp view first, etc.) to be applied
for both v1/v2 commands. More info about the consistent resolution rule
proposal can be found in
[JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal
doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).
### Why are the changes needed?
The current behavior is not consistent between v1 and v2 commands when
resolving a temp view. In v2, the `t` in the following example is
resolved to a table:
```scala sql("CREATE TABLE testcat.ns.t (id bigint) USING foo")
sql("CREATE TEMPORARY VIEW t AS SELECT 2 as i") sql("USE testcat.ns")
sql("DESCRIBE t i") // 't' is resolved to testcat.ns.t
Describing columns is not supported for v2 tables.;
org.apache.spark.sql.AnalysisException: Describing columns is not
supported for v2 tables.;
``` whereas in v1, the `t` is resolved to a temp view:
```scala sql("CREATE DATABASE test") sql("CREATE TABLE
spark_catalog.test.t (id bigint) USING csv") sql("CREATE TEMPORARY VIEW
t AS SELECT 2 as i") sql("USE spark_catalog.test") sql("DESCRIBE t
i").show // 't' is resolved to a temp view
+---------+----------+
|info_name|info_value|
+---------+----------+
| col_name|         i|
|data_type|       int|
|  comment|      NULL|
+---------+----------+
```
### Does this PR introduce _any_ user-facing change?
After this PR, `DESCRIBE t i` is resolved to a temp view `t` instead of
`testcat.ns.t`.
### How was this patch tested?
Added a new test
Closes #29880 from imback82/describe_column_consistent.
Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 7e99fcd)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/DescribeCommandSchema.scala
The file was removed sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/DescribeTableSchema.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
Commit 4e1ded67f88ffc869379319758d923aa538554b2 by gurwls223
[SPARK-32189][DOCS][PYTHON][FOLLOW-UP] Fixed broken link and typo in
PySpark docs
### What changes were proposed in this pull request?
This PR is a follow-up of #29781 to fix a broken link and a typo.
<img width="638" alt="Screen Shot 2020-10-07 at 3 56 28 PM"
src="https://user-images.githubusercontent.com/44108233/95297583-aa0ccb00-08b5-11eb-85db-89022c76d7e1.png">
<img width="734" alt="Screen Shot 2020-10-07 at 3 55 36 PM"
src="https://user-images.githubusercontent.com/44108233/95297508-8ba6cf80-08b5-11eb-9caa-0b52a2482ada.png">
### Why are the changes needed?
The current link is not working properly because of a wrong path.
### Does this PR introduce _any_ user-facing change?
Yes, the link is working properly now.
### How was this patch tested?
Manually built the doc.
Closes #29963 from itholic/SPARK-32189-FOLLOWUP.
Authored-by: itholic <haejoon309@naver.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 4e1ded6)
The file was modified python/docs/source/development/setting_ide.rst (diff)
The file was modified python/docs/source/development/debugging.rst (diff)
Commit 72da6f86cfbdd36dac3fc440c333bc1db1935edd by gurwls223
[SPARK-33002][PYTHON] Remove non-API annotations
### What changes were proposed in this pull request?
This PR:
- removes annotations for modules which are not part of the public API.
- removes `__init__.pyi` files if no annotations beyond exports are present.
### Why are the changes needed?
Primarily to reduce maintenance overhead and as requested in the
comments to https://github.com/apache/spark/pull/29591
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests and additional MyPy checks:
```
mypy --no-incremental --config python/mypy.ini python/pyspark
MYPYPATH=python/ mypy --no-incremental --config python/mypy.ini examples/src/main/python/ml examples/src/main/python/sql examples/src/main/python/sql/streaming
```
Closes #29879 from zero323/SPARK-33002.
Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 72da6f8)
The file was modified python/pyspark/broadcast.pyi (diff)
The file was removed python/pyspark/shuffle.pyi
The file was removed python/pyspark/sql/pandas/__init__.pyi
The file was modified python/pyspark/util.py (diff)
The file was removed python/pyspark/sql/pandas/serializers.pyi
The file was removed python/pyspark/_globals.pyi
The file was removed python/pyspark/streaming/__init__.pyi
The file was removed python/pyspark/daemon.pyi
The file was removed python/pyspark/serializers.pyi
The file was removed python/pyspark/sql/pandas/typehints.pyi
The file was removed python/pyspark/sql/utils.pyi
The file was modified python/pyspark/accumulators.pyi (diff)
The file was modified python/mypy.ini (diff)
The file was removed python/pyspark/join.pyi
The file was removed python/pyspark/rddsampler.pyi
The file was removed python/pyspark/shell.pyi
The file was removed python/pyspark/find_spark_home.pyi
The file was removed python/pyspark/ml/__init__.pyi
The file was removed python/pyspark/mllib/__init__.pyi
The file was modified python/pyspark/shell.py (diff)
The file was removed python/pyspark/sql/avro/__init__.pyi
The file was removed python/pyspark/util.pyi
The file was removed python/pyspark/worker.pyi
The file was removed python/pyspark/java_gateway.pyi
The file was removed python/pyspark/sql/pandas/types.pyi
The file was removed python/pyspark/streaming/util.pyi
The file was modified python/pyspark/serializers.py (diff)
The file was removed python/pyspark/traceback_utils.pyi
The file was removed python/pyspark/resource/__init__.pyi
The file was removed python/pyspark/sql/pandas/utils.pyi
Commit 94d648dff5f24b4dea3873fd8e6609b1a099d0a2 by yamamuro
[SPARK-33036][SQL] Refactor RewriteCorrelatedScalarSubquery code to
replace exprIds in a bottom-up manner
### What changes were proposed in this pull request?
This PR intends to refactor code in `RewriteCorrelatedScalarSubquery`
for replacing `ExprId`s in a bottom-up manner instead of doing in a
top-down one.
This PR comes from the talk with cloud-fan in
https://github.com/apache/spark/pull/29585#discussion_r490371252.
### Why are the changes needed?
To improve code.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #29913 from maropu/RefactorRewriteCorrelatedScalarSubquery.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by:
Takeshi Yamamuro <yamamuro@apache.org>
(commit: 94d648d)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala (diff)
Commit 3099fd9f9d576c96642c0e66c74797b8882b70bb by dhyun
[SPARK-32067][K8S] Use unique ConfigMap name for executor pod template
### What changes were proposed in this pull request?
The pod template configmap always had the same name. This PR makes it
unique.
### Why are the changes needed?
If you schedule two Spark jobs, they will both use the same ConfigMap name,
which results in conflicts. This PR fixes that.
**BEFORE**
```
$ kubectl get cm --all-namespaces -w | grep podspec
podspec-configmap                              1      65s
```
**AFTER**
```
$ kubectl get cm --all-namespaces -w | grep podspec
aaece65ef82e4a30b7b7800aad600d4f   spark-test-app-aac9f37502b2ca55-driver-podspec-conf-map   1      0s
```
This can be seen when running the integration tests
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests and the integration tests verify that this works.
Closes #29934 from
stijndehaes/bugfix/SPARK-32067-unique-name-for-template-configmap.
Authored-by: Stijn De Haes <stijndehaes@gmail.com> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
(commit: 3099fd9)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStepSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/PodTemplateConfigMapStep.scala (diff)
Commit a127387a53e1a24e76de83c5a1858fcdbd38c3a2 by dhyun
[SPARK-33082][SQL] Remove hive-1.2 workaround code
### What changes were proposed in this pull request?
This PR removes old Hive-1.2 profile related workaround code.
### Why are the changes needed?
To simplify the code.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CI.
Closes #29961 from dongjoon-hyun/SPARK-HIVE12.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: a127387)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFilters.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/ClasspathDependenciesSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShimSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcFilterSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetastoreCatalogSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetTablesOperation.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkThriftServerProtocolVersionsSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala (diff)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala (diff)
Commit 23afc930ae2fb0f3d7fd214324351fc6a0b8253a by dhyun
[SPARK-26499][SQL][FOLLOWUP] Print the loading provider exception
starting from the INFO level
### What changes were proposed in this pull request?
1. Don't print the exception in the error message while loading a built-in provider.
2. Print the exception starting from the INFO level.
Up to the INFO level, the output is:
```
17:48:32.342 ERROR org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Failed to load built in provider.
```
and starting from the INFO level:
```
17:48:32.342 ERROR org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Failed to load built in provider.
17:48:32.342 INFO org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Loading of the provider failed with the exception:
java.util.ServiceConfigurationError: org.apache.spark.sql.jdbc.JdbcConnectionProvider: Provider org.apache.spark.sql.execution.datasources.jdbc.connection.IntentionallyFaultyConnectionProvider could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.loadProviders(ConnectionProvider.scala:41)
```
### Why are the changes needed?
To avoid "noise" in logs while running tests. Currently, logs are blown up:
```
org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider: Loading of the provider failed with the exception:
java.util.ServiceConfigurationError: org.apache.spark.sql.jdbc.JdbcConnectionProvider: Provider org.apache.spark.sql.execution.datasources.jdbc.connection.IntentionallyFaultyConnectionProvider could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232)
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
	at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.loadProviders(ConnectionProvider.scala:41)
	...
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Intentional Exception
	at org.apache.spark.sql.execution.datasources.jdbc.connection.IntentionallyFaultyConnectionProvider.<init>(IntentionallyFaultyConnectionProvider.scala:26)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
```
### Does this PR introduce _any_ user-facing change? No
### How was this patch tested? By running:
```
$ build/sbt "sql/test:testOnly
org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalogSuite"
```
Closes #29968 from MaxGekk/gaborgsomogyi-SPARK-32001-followup.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 23afc93)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProvider.scala (diff)
Commit 6daa2aeb0164277088396102897b2ea4426b9f1c by dhyun
[SPARK-21708][BUILD] Migrate build to sbt 1.x
### What changes were proposed in this pull request?
Migrate the sbt-launcher URL to download one for sbt 1.x. Update plugin
versions where required by the sbt update. Change the sbt version to the
latest release at the moment, 1.3.13. Adjust build settings according to
the plugin and sbt changes.
### Why are the changes needed?
Migration to sbt 1.x: 1. enhances the development experience, 2. updates
build plugins to bring in their new features and bug fixes, 3. enhances
build performance on the sbt side, 4. eases the move to Scala 3 / Dotty.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
All existing tests passed, both on Jenkins and via GitHub Actions, and
also manually for the Scala 2.13 profile.
Closes #29286 from gemelen/feature/sbt-1.x.
Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 6daa2ae)
The file was modified project/MimaBuild.scala (diff)
The file was modified tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala (diff)
The file was modified project/plugins.sbt (diff)
The file was added .sbtopts
The file was modified project/SparkBuild.scala (diff)
The file was modified build/sbt-launch-lib.bash (diff)
The file was modified project/MimaExcludes.scala (diff)
The file was modified project/build.properties (diff)
Commit 37e1b0c4a5e999ba420cc6eacb2f5a7100fef029 by gurwls223
[SPARK-33086][PYTHON] Add static annotations for pyspark.resource
### What changes were proposed in this pull request?
This PR replaces dynamically generated annotations for the following
modules:
- `pyspark.resource.information`
- `pyspark.resource.profile`
- `pyspark.resource.requests`
### Why are the changes needed?
These modules were not manually annotated in `pyspark-stubs`, but they are
part of the public API and we should provide more precise annotations.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
MyPy tests:
```
mypy --no-incremental --config python/mypy.ini python/pyspark
```
Closes #29969 from zero323/SPARK-32714-FOLLOW-UP-RESOURCE.
Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 37e1b0c)
The file was modified python/pyspark/resource/requests.pyi (diff)
The file was modified python/pyspark/resource/information.pyi (diff)
The file was modified python/pyspark/resource/profile.pyi (diff)
Commit 473b3ba6aa3ead60c6f3d66c982b7883e39b7ad2 by gurwls223
[SPARK-32511][FOLLOW-UP][SQL][R][PYTHON] Add dropFields to SparkR and
PySpark
### What changes were proposed in this pull request?
This PR adds `dropFields` method to:
- PySpark `Column`
- SparkR `Column`
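A sketch of the Scala `Column.dropFields` API that these new bindings mirror (assumes a SparkSession `spark`):
```scala
import org.apache.spark.sql.functions.{col, lit, struct}

val df = spark.range(1).select(struct(lit(1).as("a"), lit(2).as("b")).as("s"))
df.select(col("s").dropFields("b")).printSchema() // the struct now contains only field `a`
```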
### Why are the changes needed?
Feature parity.
### Does this PR introduce _any_ user-facing change?
No, new API.
### How was this patch tested?
- New unit tests.
- Manual verification of examples / doctests.
- Manual run of MyPy tests
Closes #29967 from zero323/SPARK-32511-FOLLOW-UP-PYSPARK-SPARKR.
Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 473b3ba)
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R (diff)
The file was modified python/pyspark/sql/tests/test_column.py (diff)
The file was modified python/pyspark/sql/column.py (diff)
The file was modified python/pyspark/sql/column.pyi (diff)
The file was modified R/pkg/NAMESPACE (diff)
The file was modified R/pkg/R/column.R (diff)
The file was modified R/pkg/R/generics.R (diff)
Commit 39510b0e9b79ca59c073bed2219d35d4b81fb7f1 by gurwls223
[SPARK-32793][SQL] Add raise_error function, adds error message
parameter to assert_true
### What changes were proposed in this pull request?
Adds a SQL function `raise_error` which underlies the refactored
`assert_true` function. `assert_true` now also (optionally) accepts a
custom error message field.
`raise_error` is exposed in SQL, Python, Scala, and R.
`assert_true` was previously only exposed in SQL; it is now also exposed
in Python, Scala, and R.
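A short sketch of the new surface in Scala (assumes a SparkSession `spark`; PySpark and SparkR expose the same functions per the description above):
```scala
import org.apache.spark.sql.functions.{assert_true, col, lit, raise_error}

val df = spark.range(5).toDF("id")
df.select(assert_true(col("id") < 10, lit("id must be < 10"))).collect() // passes silently
// df.select(raise_error(lit("custom failure message"))).collect()       // would throw at runtime
```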
### Why are the changes needed?
Improves usability of `assert_true` by clarifying error messaging, and
adds the useful helper function `raise_error`.
### Does this PR introduce _any_ user-facing change?
Yes:
- Adds `raise_error` function to the SQL, Python, Scala, and R APIs.
- Adds `assert_true` function to the SQL, Python and R APIs.
### How was this patch tested?
Adds unit tests in SQL, Python, Scala, and R for `assert_true` and
`raise_error`.
Closes #29947 from karenfeng/spark-32793.
Lead-authored-by: Karen Feng <karen.feng@databricks.com> Co-authored-by:
Hyukjin Kwon <gurwls223@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 39510b0)
The file was modified python/pyspark/sql/functions.pyi (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/misc-functions.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-functions/sql-expression-schema.md (diff)
The file was modified R/pkg/R/generics.R (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala (diff)
The file was modified R/pkg/R/functions.R (diff)
The file was modified python/docs/source/reference/pyspark.sql.rst (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/misc-functions.sql (diff)
The file was modified python/pyspark/sql/tests/test_functions.py (diff)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala (diff)
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala (diff)
Commit bbc887bf73233b8c65ace05929290c0de4f63de8 by gurwls223
[SPARK-33089][SQL] make avro format propagate Hadoop config from DS
options to underlying HDFS file system
### What changes were proposed in this pull request?
In `AvroUtils`'s `inferSchema()`, propagate Hadoop config from DS
options to underlying HDFS file system.
### Why are the changes needed?
There is a bug: when running
```scala
spark.read.format("avro").options(conf).load(path)
```
the underlying file system will not receive the `conf` options.
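An illustrative use of the propagation (the option keys here are ordinary Hadoop/S3A settings chosen as an example, not new Spark options):
```scala
// Hadoop configuration entries passed as data source options now reach the underlying
// file system during Avro schema inference as well.
spark.read
  .format("avro")
  .option("fs.s3a.access.key", sys.env.getOrElse("AWS_ACCESS_KEY_ID", ""))
  .option("fs.s3a.secret.key", sys.env.getOrElse("AWS_SECRET_ACCESS_KEY", ""))
  .load("s3a://my-bucket/events.avro")
```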
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
A unit test was added.
Closes #29971 from yuningzh-db/avro_options.
Authored-by: Yuning Zhang <yuning.zhang@databricks.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: bbc887b)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala (diff)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala (diff)
Commit 1c781a4354666bba4329e588a0e9a9fa8980303b by wenchen
[SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle
more scenarios such as PartitioningCollection
### What changes were proposed in this pull request?
This PR proposes to improve `EnsureRequirements.reorderJoinKeys` to handle the following scenarios:
1. If the keys cannot be reordered to match the left-side `HashPartitioning`, consider the right-side `HashPartitioning`.
2. Handle `PartitioningCollection`, which may contain `HashPartitioning`.
### Why are the changes needed?
1. For scenario 1, the current behavior matches either the
left-side `HashPartitioning` or the right-side `HashPartitioning`. This
means that if both sides are `HashPartitioning`, it will try to match
only the left side. The following will not consider the right-side
`HashPartitioning`:
```
val df1 = (0 until 10).map(i => (i % 5, i % 13)).toDF("i1", "j1")
val df2 = (0 until 10).map(i => (i % 7, i % 11)).toDF("i2", "j2")
df1.write.format("parquet").bucketBy(4, "i1", "j1").saveAsTable("t1")
df2.write.format("parquet").bucketBy(4, "i2", "j2").saveAsTable("t2")
val t1 = spark.table("t1")
val t2 = spark.table("t2")
val join = t1.join(t2, t1("i1") === t2("j2") && t1("i1") === t2("i2"))
join.explain

== Physical Plan ==
*(5) SortMergeJoin [i1#26, i1#26], [j2#31, i2#30], Inner
:- *(2) Sort [i1#26 ASC NULLS FIRST, i1#26 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(i1#26, i1#26, 4), true, [id=#69]
:     +- *(1) Project [i1#26, j1#27]
:        +- *(1) Filter isnotnull(i1#26)
:           +- *(1) ColumnarToRow
:              +- FileScan parquet default.t1[i1#26,j1#27] Batched: true, DataFilters: [isnotnull(i1#26)], Format: Parquet, Location: InMemoryFileIndex[..., PartitionFilters: [], PushedFilters: [IsNotNull(i1)], ReadSchema: struct<i1:int,j1:int>, SelectedBucketsCount: 4 out of 4
+- *(4) Sort [j2#31 ASC NULLS FIRST, i2#30 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(j2#31, i2#30, 4), true, [id=#79]           <===== This can be removed
      +- *(3) Project [i2#30, j2#31]
         +- *(3) Filter (((j2#31 = i2#30) AND isnotnull(j2#31)) AND isnotnull(i2#30))
            +- *(3) ColumnarToRow
               +- FileScan parquet default.t2[i2#30,j2#31] Batched: true, DataFilters: [(j2#31 = i2#30), isnotnull(j2#31), isnotnull(i2#30)], Format: Parquet, Location: InMemoryFileIndex[..., PartitionFilters: [], PushedFilters: [IsNotNull(j2), IsNotNull(i2)], ReadSchema: struct<i2:int,j2:int>, SelectedBucketsCount: 4 out of 4
```
2. For scenario 2, the current behavior does not handle `PartitioningCollection`:
```
val df1 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i1", "j1")
val df2 = (0 until 100).map(i => (i % 7, i % 11)).toDF("i2", "j2")
val df3 = (0 until 100).map(i => (i % 5, i % 13)).toDF("i3", "j3")
val join = df1.join(df2, df1("i1") === df2("i2") && df1("j1") === df2("j2")) // PartitioningCollection
val join2 = join.join(df3, join("j1") === df3("j3") && join("i1") === df3("i3"))
join2.explain

== Physical Plan ==
*(9) SortMergeJoin [j1#8, i1#7], [j3#30, i3#29], Inner
:- *(6) Sort [j1#8 ASC NULLS FIRST, i1#7 ASC NULLS FIRST], false, 0        <===== This can be removed
:  +- Exchange hashpartitioning(j1#8, i1#7, 5), true, [id=#58]             <===== This can be removed
:     +- *(5) SortMergeJoin [i1#7, j1#8], [i2#18, j2#19], Inner
:        :- *(2) Sort [i1#7 ASC NULLS FIRST, j1#8 ASC NULLS FIRST], false, 0
:        :  +- Exchange hashpartitioning(i1#7, j1#8, 5), true, [id=#45]
:        :     +- *(1) Project [_1#2 AS i1#7, _2#3 AS j1#8]
:        :        +- *(1) LocalTableScan [_1#2, _2#3]
:        +- *(4) Sort [i2#18 ASC NULLS FIRST, j2#19 ASC NULLS FIRST], false, 0
:           +- Exchange hashpartitioning(i2#18, j2#19, 5), true, [id=#51]
:              +- *(3) Project [_1#13 AS i2#18, _2#14 AS j2#19]
:                 +- *(3) LocalTableScan [_1#13, _2#14]
+- *(8) Sort [j3#30 ASC NULLS FIRST, i3#29 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(j3#30, i3#29, 5), true, [id=#64]
      +- *(7) Project [_1#24 AS i3#29, _2#25 AS j3#30]
         +- *(7) LocalTableScan [_1#24, _2#25]
```
### Does this PR introduce _any_ user-facing change?
Yes. From the examples above, the shuffle/sort nodes marked
`This can be removed` are now removed:
1. Scenario 1:
```
== Physical Plan ==
*(4) SortMergeJoin [i1#26, i1#26], [i2#30, j2#31], Inner
:- *(2) Sort [i1#26 ASC NULLS FIRST, i1#26 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(i1#26, i1#26, 4), true, [id=#67]
:     +- *(1) Project [i1#26, j1#27]
:        +- *(1) Filter isnotnull(i1#26)
:           +- *(1) ColumnarToRow
:              +- FileScan parquet default.t1[i1#26,j1#27] Batched: true, DataFilters: [isnotnull(i1#26)], Format: Parquet, Location: InMemoryFileIndex[..., PartitionFilters: [], PushedFilters: [IsNotNull(i1)], ReadSchema: struct<i1:int,j1:int>, SelectedBucketsCount: 4 out of 4
+- *(3) Sort [i2#30 ASC NULLS FIRST, j2#31 ASC NULLS FIRST], false, 0
   +- *(3) Project [i2#30, j2#31]
      +- *(3) Filter (((j2#31 = i2#30) AND isnotnull(j2#31)) AND isnotnull(i2#30))
         +- *(3) ColumnarToRow
            +- FileScan parquet default.t2[i2#30,j2#31] Batched: true, DataFilters: [(j2#31 = i2#30), isnotnull(j2#31), isnotnull(i2#30)], Format: Parquet, Location: InMemoryFileIndex[..., PartitionFilters: [], PushedFilters: [IsNotNull(j2), IsNotNull(i2)], ReadSchema: struct<i2:int,j2:int>, SelectedBucketsCount: 4 out of 4
```
2. Scenario 2:
```
== Physical Plan ==
*(8) SortMergeJoin [i1#7, j1#8], [i3#29, j3#30], Inner
:- *(5) SortMergeJoin [i1#7, j1#8], [i2#18, j2#19], Inner
:  :- *(2) Sort [i1#7 ASC NULLS FIRST, j1#8 ASC NULLS FIRST], false, 0
:  :  +- Exchange hashpartitioning(i1#7, j1#8, 5), true, [id=#43]
:  :     +- *(1) Project [_1#2 AS i1#7, _2#3 AS j1#8]
:  :        +- *(1) LocalTableScan [_1#2, _2#3]
:  +- *(4) Sort [i2#18 ASC NULLS FIRST, j2#19 ASC NULLS FIRST], false, 0
:     +- Exchange hashpartitioning(i2#18, j2#19, 5), true, [id=#49]
:        +- *(3) Project [_1#13 AS i2#18, _2#14 AS j2#19]
:           +- *(3) LocalTableScan [_1#13, _2#14]
+- *(7) Sort [i3#29 ASC NULLS FIRST, j3#30 ASC NULLS FIRST], false, 0
  +- Exchange hashpartitioning(i3#29, j3#30, 5), true, [id=#58]
     +- *(6) Project [_1#24 AS i3#29, _2#25 AS j3#30]
        +- *(6) LocalTableScan [_1#24, _2#25]
```
### How was this patch tested?
Added tests.
Closes #29074 from imback82/reorder_keys.
Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 1c781a4)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala (diff)
Commit 7d6e3fb998021b4873f3bee8a8218d2504ed88a0 by wenchen
[SPARK-33074][SQL] Classify dialect exceptions in JDBC v2 Table Catalog
### What changes were proposed in this pull request?
1. Add a new method to the `JdbcDialect` class - `classifyException()`. It converts a dialect-specific exception to Spark's `AnalysisException` or one of its sub-classes.
2. Replace the H2 exception `org.h2.jdbc.JdbcSQLException` in `JDBCTableCatalogSuite` by `AnalysisException`.
3. Add `H2Dialect`.
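A minimal sketch of what a dialect-side `classifyException()` override can look like (the dialect, H2 error code, and mapping below are illustrative rather than the shipped `H2Dialect`; it also assumes Spark-internal package access, since `AnalysisException` construction is not public API):
```scala
package org.apache.spark.sql.jdbc

import java.sql.SQLException

import org.apache.spark.sql.AnalysisException

object ExampleH2LikeDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:h2")

  // Map a driver-specific SQLException to Spark's AnalysisException hierarchy.
  override def classifyException(message: String, e: Throwable): AnalysisException = e match {
    case ex: SQLException if ex.getErrorCode == 42102 => // H2: TABLE_OR_VIEW_NOT_FOUND
      new AnalysisException(message, cause = Some(ex))
    case _ => super.classifyException(message, e)
  }
}
```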
### Why are the changes needed? Currently, the JDBC v2 Table Catalog
implementation throws dialect-specific exceptions and ignores the exceptions
defined in the `TableCatalog` interface. This PR adds a new method for
converting dialect-specific exceptions, and assumes that follow-up PRs
will implement `classifyException()`.
### Does this PR introduce _any_ user-facing change? Yes.
### How was this patch tested? By running existing test suites
`JDBCTableCatalogSuite` and `JDBCV2Suite`.
Closes #29952 from MaxGekk/jdbcv2-classify-exception.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 7d6e3fb)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala (diff)
Commit 5effa8ea261ba59214afedc2853d1b248b330ca6 by yamamuro
[SPARK-33091][SQL] Avoid using map instead of foreach to avoid potential
side effect at callers of OrcUtils.readCatalystSchema
### What changes were proposed in this pull request?
This is a kind of follow-up of SPARK-32646. A new JIRA was filed to
control the fixed versions properly.
When you use `map`, it might be lazily evaluated and never executed. To
avoid this, it is better to use `foreach`. See also SPARK-16694.
The current code does not appear to cause any bug for now, but it is best
to fix this to avoid potential issues.
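A small, general Scala illustration of the hazard (not the Spark code itself): `map` over a lazy `Iterator` does not run until the result is consumed, so side effects can silently never happen, while `foreach` runs eagerly.
```scala
val mapped = Iterator("a", "b", "c").map(x => println(s"side effect for $x"))
// nothing has been printed yet; the prints only happen if `mapped` is consumed later

Iterator("a", "b", "c").foreach(x => println(s"side effect for $x"))
// prints immediately; the intent (run a side effect, discard the result) is explicit
```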
### Why are the changes needed?
To avoid potential issues from `map` being lazy and not executed.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Ran related tests. CI in this PR should verify.
Closes #29974 from HyukjinKwon/SPARK-32646.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Takeshi
Yamamuro <yamamuro@apache.org>
(commit: 5effa8e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcPartitionReaderFactory.scala (diff)