Changes

Summary

  1. [MINOR][DOCS] Replace http to https when possible in PySpark documentation (commit: b5470ae) (details)
  2. [SPARK-34500][DOCS][EXAMPLES] Replace symbol literals with $"" in examples and documents (commit: 612d523) (details)
  3. [SPARK-34490][SQL] Analysis should fail if the view refers a dropped table (commit: be675a0) (details)
  4. [SPARK-34475][SQL] Rename logical nodes of v2 `ALTER` commands (commit: 8f994cb) (details)
  5. Revert "[SPARK-34380][SQL] Support ifExists for ALTER TABLE ... UNSET TBLPROPERTIES for v2 command" (commit: 429f8af) (details)
  6. [SPARK-34502][SQL] Remove unused parameters in join methods (commit: 443139b) (details)
  7. [SPARK-34508][SQL][TEST] Skip HiveExternalCatalogVersionsSuite if network is down (commit: 0d5d248) (details)
  8. [SPARK-34168][SQL][FOLLOWUP] Improve DynamicPartitionPruningSuiteBase (commit: 95e45c6) (details)
  9. [SPARK-31891][SQL] Support `MSCK REPAIR TABLE .. [{ADD|DROP|SYNC} PARTITIONS]` (commit: 7f27d33) (details)
  10. [SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default (commit: 2e31e2c) (details)
  11. [SPARK-26138][SQL] Pushdown limit through InnerLike when condition is empty (commit: b5afff5) (details)
  12. [MINOR][DOCS][K8S] Use hadoop-aws 3.2.2 in K8s example (commit: a6dcd55) (details)
  13. Revert "[SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase" (commit: 80bad08) (details)
  14. [SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state (commit: f542ecd) (details)
  15. [SPARK-34290][SQL] Support v2 `TRUNCATE TABLE` (commit: f64fc22) (details)
  16. [SPARK-34246][SQL] New type coercion syntax rules in ANSI mode (commit: 5d9cfd7) (details)
  17. [SPARK-34152][SQL] Make CreateViewStatement.child to be LogicalPlan's children so that it's resolved in analyze phase (commit: 714ff73) (details)
  18. [SPARK-33599][SQL][FOLLOWUP] Group exception messages in catalyst/analysis (commit: 14934f4) (details)
  19. [SPARK-34514][SQL] Push down limit for LEFT SEMI and LEFT ANTI join (commit: 6ef57d3) (details)
  20. [SPARK-31891][SQL][DOCS][FOLLOWUP] Fix typo in the description of `MSCK REPAIR TABLE` (commit: 87409c4) (details)
  21. [SPARK-34515][SQL] Fix NPE if InSet contains null value during getPartitionsByFilter (commit: 999d3b8) (details)
  22. [SPARK-32617][K8S][TESTS] Configure kubernetes client based on kubeconfig settings in kubernetes integration tests (commit: b17754a) (details)
  23. [SPARK-34497][SQL] Fix built-in JDBC connection providers to restore JVM security context changes (commit: 44eadb9) (details)
  24. [SPARK-34531][CORE] Remove Experimental API tag in PrometheusServlet (commit: 22383e3) (details)
  25. [SPARK-34520][CORE] Remove unused SecurityManager references (commit: 8a1e172) (details)
  26. [SPARK-34518][SQL] Rename `AlterTableRecoverPartitionsCommand` to `RepairTableCommand` (commit: c56af69) (details)
  27. [SPARK-34436][SQL] DPP support LIKE ANY/ALL expression (commit: 4a3200b) (details)
Commit b5470ae294b81107443dec81648b847e0b58aca5 by gurwls223
[MINOR][DOCS] Replace http to https when possible in PySpark documentation

### What changes were proposed in this pull request?

This PR proposes:
- Change http to https for better security
- Change http://apache-spark-developers-list.1001551.n3.nabble.com/ to official mailing list link (https://mail-archives.apache.org/mod_mbox/spark-dev/)

### Why are the changes needed?

For better security, and to use the official links.

### Does this PR introduce _any_ user-facing change?

Yes, it exposes more secure and correct links to PySpark end users in the PySpark documentation.

### How was this patch tested?

I manually checked that each link works.

Closes #31616 from HyukjinKwon/minor-https.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: b5470ae)
The file was modified python/docs/source/user_guide/arrow_pandas.rst (diff)
The file was modified python/docs/source/development/contributing.rst (diff)
The file was modified python/docs/source/user_guide/index.rst (diff)
The file was modified python/docs/source/migration_guide/index.rst (diff)
The file was modified python/docs/source/getting_started/index.rst (diff)
Commit 612d52315b8476dd588d75ce3001dee5786db747 by gurwls223
[SPARK-34500][DOCS][EXAMPLES] Replace symbol literals with $"" in examples and documents

### What changes were proposed in this pull request?

This PR replaces all the occurrences of symbol literals (`'name`) with string interpolation (`$"name"`) in examples and documents.
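
A hedged before/after sketch of the replacement (a minimal local example, not code from the PR):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._ // enables both the 'name and $"name" column syntax

val df = Seq(("a", 1), ("b", 2)).toDF("name", "value")

df.select('name).show()   // before: Scala Symbol literal, slated for removal
df.select($"name").show() // after: string interpolation, used in the updated docs
```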

### Why are the changes needed?

Symbol literals are used to represent columns in Spark SQL, but the Scala community seems set on removing `Symbol` completely.
As discussed in #31569, we should first replace symbol literals with `$"name"` in user-facing examples and documents.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Build docs.

Closes #31615 from sarutak/replace-symbol-literals-in-doc-and-examples.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 612d523)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Column.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/expressions/Window.scala (diff)
The file was modified docs/sql-data-sources-avro.md (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala (diff)
The file was modified examples/src/main/scala/org/apache/spark/examples/sql/SimpleTypedAggregator.scala (diff)
Commit be675a052c38a36ce5e33ba56bdc69cc8972b3e8 by wenchen
[SPARK-34490][SQL] Analysis should fail if the view refers a dropped table

### What changes were proposed in this pull request?
When resolving a view, we use the captured view name in `AnalysisContext` to
distinguish whether a relation name is a view or a table. But if the resolution fails,
other rules (e.g. `ResolveTables`) will try to resolve the relation again, but without
`AnalysisContext`. So, in this case, the resolution may be incorrect. For example,
if the view refers to a dropped table while a view with the same name exists, the
dropped table name will be resolved as that view rather than raising an unresolved-relation exception.
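
A hedged sketch of the scenario (names are illustrative, not a test from the PR):

```scala
// Illustrative reproduction of the bug scenario described above.
spark.sql("CREATE TABLE t (id INT) USING parquet")
spark.sql("CREATE VIEW v AS SELECT id FROM t") // v captures the relation name t
spark.sql("DROP TABLE t")                      // the table v refers to is gone
spark.sql("CREATE TEMPORARY VIEW t AS SELECT 1 AS id") // unrelated view, same name

// Before the fix, resolving v could silently pick up the temp view t above;
// after the fix, analysis fails with an unresolved-relation error.
spark.sql("SELECT * FROM v")
```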

### Why are the changes needed?
bugfix

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
newly added test cases

Closes #31606 from linhongliu-db/fix-temp-view-master.

Lead-authored-by: Linhong Liu <linhong.liu@databricks.com>
Co-authored-by: Linhong Liu <67896261+linhongliu-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: be675a0)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TableLookupCacheSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit 8f994cbb4a18558c2e81516ef1e339d9c8fa0d41 by wenchen
[SPARK-34475][SQL] Rename logical nodes of v2 `ALTER` commands

### What changes were proposed in this pull request?
In the PR, I propose to rename logical nodes of v2 commands in the form: `<verb> + <object>` like:
- AlterTableAddPartition -> AddPartition
- AlterTableSetLocation -> SetTableLocation

### Why are the changes needed?
1. For simplicity and readability of logical plans
2. For consistency with other logical nodes. For example, the logical node `RenameTable` for `ALTER TABLE .. RENAME TO` was added before `AlterTableRenamePartition`.

### Does this PR introduce _any_ user-facing change?
Should not, since these are non-public APIs.

### How was this patch tested?
1. Check scala style: `./dev/scalastyle`
2. Affected test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"
```

Closes #31596 from MaxGekk/rename-alter-table-logic-nodes.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 8f994cb)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionParserSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableProperties.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableAddPartitionParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableRenamePartitionParserSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableRecoverPartitionsParserSuite.scala (diff)
Commit 429f8af9b683935151c2379bc80b27162cd1c8bf by wenchen
Revert "[SPARK-34380][SQL] Support ifExists for ALTER TABLE ... UNSET TBLPROPERTIES for v2 command"

This reverts commit 9a566f83a0e126742473574476c6381f58394aed.
(commit: 429f8af)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was removed sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableProperties.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
Commit 443139b601ca87cb0e0c9c2f906d4d4c1e624e35 by viirya
[SPARK-34502][SQL] Remove unused parameters in join methods

### What changes were proposed in this pull request?

Remove unused parameters in `CoalesceBucketsInJoin`, `UnsafeCartesianRDD` and `ShuffledHashJoinExec`.

### Why are the changes needed?
Clean up

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Closes #31617 from huaxingao/join-minor.

Authored-by: Huaxin Gao <huaxing@us.ibm.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: 443139b)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/joins/CartesianProductExec.scala (diff)
Commit 0d5d248bdc4cdc71627162a3d20c42ad19f24ef4 by dhyun
[SPARK-34508][SQL][TEST] Skip HiveExternalCatalogVersionsSuite if network is down

### What changes were proposed in this pull request?

It's possible that the network is down when running Spark tests, and it's annoying to see `HiveExternalCatalogVersionsSuite` keep failing.

This PR proposes to skip this test suite if we can't get the latest Spark version from the Apache website.
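
A minimal sketch of the skip pattern, assuming ScalaTest's `assume` and a hypothetical version-fetching helper (the suite's actual code differs):

```scala
import scala.io.Source
import scala.util.Try

import org.scalatest.funsuite.AnyFunSuite

class VersionsSuiteSketch extends AnyFunSuite {
  // Hypothetical helper: returns None when the Apache website is unreachable.
  private def latestSparkVersion: Option[String] =
    Try(Source.fromURL("https://spark.apache.org/downloads.html").mkString).toOption
      .flatMap("""\d+\.\d+\.\d+""".r.findFirstIn(_))

  test("backward compatibility") {
    // assume() cancels (skips) the test instead of failing it.
    assume(latestSparkVersion.isDefined, "network is down, skipping the suite")
    // ... the actual version-compatibility checks would run here ...
  }
}
```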

### Why are the changes needed?

Make the Spark tests more robust.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #31627 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 0d5d248)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala (diff)
Commit 95e45c6257a614754e132f92b7b7239573d42b7a by dhyun
[SPARK-34168][SQL][FOLLOWUP] Improve DynamicPartitionPruningSuiteBase

### What changes were proposed in this pull request?

A few minor improvements for `DynamicPartitionPruningSuiteBase`.

### Why are the changes needed?

code cleanup

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #31625 from cloud-fan/followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 95e45c6)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala (diff)
Commit 7f27d33a3c538da6754a6c011b29aa7eb0dafe2c by dhyun
[SPARK-31891][SQL] Support `MSCK REPAIR TABLE .. [{ADD|DROP|SYNC} PARTITIONS]`

### What changes were proposed in this pull request?

In the PR, I propose to extend the `MSCK REPAIR TABLE` command, and support new options `{ADD|DROP|SYNC} PARTITIONS`. In particular:

1. Extend the logical node `RepairTable`, and add two new flags `enableAddPartitions` and `enableDropPartitions`.
2. Add similar flags to the v1 execution node `AlterTableRecoverPartitionsCommand`
3. Add new method `dropPartitions()` to `AlterTableRecoverPartitionsCommand` which drops partitions from the catalog if their locations in the file system don't exist.
4. Updated public docs about the `MSCK REPAIR TABLE` command:
<img width="1037" alt="Screenshot 2021-02-16 at 13 46 39" src="https://user-images.githubusercontent.com/1580697/108052607-7446d280-705d-11eb-8e25-7398254787a4.png">

Closes #31097

### Why are the changes needed?
- The changes allow recovering tables with removed partitions. The example below portrays the problem:
```sql
spark-sql> create table tbl2 (col int, part int) partitioned by (part);
spark-sql> insert into tbl2 partition (part=1) select 1;
spark-sql> insert into tbl2 partition (part=0) select 0;
spark-sql> show table extended like 'tbl2' partition (part = 0);
default tbl2 false Partition Values: [part=0]
Location: file:/Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0
...
```
Remove the partition (part = 0) from the filesystem:
```
$ rm -rf /Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0
```
Even after recovering, we cannot query the table:
```sql
spark-sql> msck repair table tbl2;
spark-sql> select * from tbl2;
21/01/08 22:49:13 ERROR SparkSQLDriver: Failed in [select * from tbl2]
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/Users/maximgekk/proj/apache-spark/spark-warehouse/tbl2/part=0
```

- To have feature parity with Hive: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, we can query the recovered table:
```sql
spark-sql> msck repair table tbl2 sync partitions;
spark-sql> select * from tbl2;
1 1
spark-sql> show partitions tbl2;
part=1
```

### How was this patch tested?
- By running the modified test suite:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *MsckRepairTableParserSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *PlanResolutionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRecoverPartitionsSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRecoverPartitionsParallelSuite"
```
- Added unified v1 and v2 tests for `MSCK REPAIR TABLE`:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *MsckRepairTableSuite"
```

Closes #31499 from MaxGekk/repair-table-drop-partitions.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 7f27d33)
The file was modified docs/sql-ref-syntax-ddl-repair-table.md (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/MsckRepairTableSuiteBase.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
The file was modified docs/sql-ref-ansi-compliance.md (diff)
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/MsckRepairTableSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/MsckRepairTableSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandTestUtils.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/MsckRepairTableParserSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/MsckRepairTableSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableAddPartitionSuite.scala (diff)
Commit 2e31e2c5f30742c312767f26b17396c4ecfbef72 by dhyun
[SPARK-34503][CORE] Use zstd for spark.eventLog.compression.codec by default

### What changes were proposed in this pull request?

Apache Spark 3.0 introduced `spark.eventLog.compression.codec` configuration.
For Apache Spark 3.2, this PR aims to set `zstd` as the default value for `spark.eventLog.compression.codec` configuration.
This only affects creating a new log file.
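
For reference, a hedged sketch of how the settings interact; the property names are real Spark configs, the values are illustrative:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.compress", "true") // compression itself is still off by default
  // With this change the codec below defaults to "zstd"; previously, when unset,
  // Spark fell back to spark.io.compression.codec (default "lz4"):
  .set("spark.eventLog.compression.codec", "zstd")
```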

### Why are the changes needed?

The main purpose of event logs is archiving. Many logs are generated and occupy storage, but most of them are never accessed by users.

**1. Save storage resources (and money)**

In general, ZSTD output is much smaller than LZ4 output.
For example, for a TPCDS (scale 200) log, ZSTD generates log files about 3 times smaller than LZ4.

| CODEC | SIZE (bytes) |
|---------|-------------|
| LZ4         | 184001434|
| ZSTD      |  64522396|

And, the plain file is 17.6 times bigger.
```
-rw-r--r--    1 dongjoon  staff  1135464691 Feb 21 22:31 spark-a1843ead29834f46b1125a03eca32679
-rw-r--r--    1 dongjoon  staff    64522396 Feb 21 22:31 spark-a1843ead29834f46b1125a03eca32679.zstd
```

**2. Better Usability**

We cannot decompress Spark-generated LZ4 event log files via CLI while we can for ZSTD event log files. Spark's LZ4 event log files are inconvenient to some users who want to uncompress and access them.
```
$ lz4 -d spark-d3deba027bd34435ba849e14fc2c42ef.lz4
Decoding file spark-d3deba027bd34435ba849e14fc2c42ef
Error 44 : Unrecognized header : file cannot be decoded
```
```
$ zstd -d spark-a1843ead29834f46b1125a03eca32679.zstd
spark-a1843ead29834f46b1125a03eca32679.zstd: 1135464691 bytes
```

**3. Speed**
The following results are collected by running [lzbench](https://github.com/inikep/lzbench) on the above Spark event log. Note that
- This is not a direct comparison of Spark compression/decompression codec.
- `lzbench` is an in-memory benchmark. So, it doesn't show the benefit of the reduced network traffic due to the small size of ZSTD.

Here,
- To get ZSTD 1.4.8-1 result, `lzbench` `master` branch is used because Spark is using ZSTD 1.4.8.
- To get LZ4 1.7.5 result, `lzbench` `v1.7` branch is used because Spark is using LZ4 1.7.1.
```
Compressor name      Compress. Decompress. Compr. size  Ratio Filename
memcpy               7393 MB/s  7166 MB/s  1135464691 100.00 spark-a1843ead29834f46b1125a03eca32679
zstd 1.4.8 -1        1344 MB/s  3351 MB/s    56665767   4.99 spark-a1843ead29834f46b1125a03eca32679
lz4 1.7.5            1385 MB/s  4782 MB/s   127662168  11.24 spark-a1843ead29834f46b1125a03eca32679
```

### Does this PR introduce _any_ user-facing change?

- No for apps which don't use `spark.eventLog.compress`, because `spark.eventLog.compress` is disabled by default.
- No for apps setting `spark.eventLog.compression.codec` explicitly, because this is only a change of the default value.
- Yes for apps using `spark.eventLog.compress` without setting `spark.eventLog.compression.codec`. In this case, the `spark.io.compression.codec` value (whose default is `lz4`) was previously used.

So this JIRA issue, SPARK-34503, is labeled with `releasenotes`.

### How was this patch tested?

Pass the updated UT.

Closes #31618 from dongjoon-hyun/SPARK-34503.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 2e31e2c)
The file was modified core/src/test/scala/org/apache/spark/deploy/history/EventLogFileWritersSuite.scala (diff)
The file was modified docs/core-migration-guide.md (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified docs/configuration.md (diff)
Commit b5afff59fa389b30312914ff141e97d5bc511359 by yumwang
[SPARK-26138][SQL] Pushdown limit through InnerLike when condition is empty

### What changes were proposed in this pull request?

This PR pushes down the limit through InnerLike joins when the join condition is empty (original PR: #23104). For example:
```sql
CREATE TABLE t1 using parquet AS SELECT id AS a, id AS b FROM range(2);
CREATE TABLE t2 using parquet AS SELECT id AS d FROM range(2);
SELECT * FROM t1 CROSS JOIN t2 LIMIT 10;
```
Before this pr:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- CollectLimit 10
   +- BroadcastNestedLoopJoin BuildRight, Cross
      :- FileScan parquet default.t1[a#5L,b#6L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehous..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
      +- BroadcastExchange IdentityBroadcastMode, [id=#43]
         +- FileScan parquet default.t2[d#7L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehous..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<d:bigint>
```
After this pr:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- CollectLimit 10
   +- BroadcastNestedLoopJoin BuildRight, Cross
      :- LocalLimit 10
      :  +- FileScan parquet default.t1[a#5L,b#6L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehous..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
      +- BroadcastExchange IdentityBroadcastMode, [id=#51]
         +- LocalLimit 10
            +- FileScan parquet default.t2[d#7L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/tg/f5mz46090wg7swzgdc69f8q03965_0/T/warehous..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<d:bigint>
```

### Why are the changes needed?

Improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #31567 from wangyum/SPARK-26138.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(commit: b5afff5)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushdownSuite.scala (diff)
Commit a6dcd5544dd7c21da1b93b43e5d3d7b67d097dc7 by gurwls223
[MINOR][DOCS][K8S] Use hadoop-aws 3.2.2 in K8s example

### What changes were proposed in this pull request?

This PR aims to update `Hadoop` dependency in K8S doc example.

### Why are the changes needed?

Apache Spark 3.2.0 is using Apache Hadoop 3.2.2 by default.

### Does this PR introduce _any_ user-facing change?

No. This is a doc-only change.

### How was this patch tested?

N/A

Closes #31628 from dongjoon-hyun/minor-doc.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: a6dcd55)
The file was modified docs/running-on-kubernetes.md (diff)
Commit 80bad086c806fd507b1fb197b171f87333f2fb08 by gurwls223
Revert "[SPARK-32703][SQL] Replace deprecated API calls from SpecificParquetRecordReaderBase"

This reverts commit 27873280ffbd73be6df230b4497701794ac81d91.
(commit: 80bad08)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (diff)
Commit f542ecdb0d968af9ef66b1ec7270767f4ec42c41 by wenchen
[SPARK-34245][CORE] Ensure Master removes executors that failed to send finished state

### What changes were proposed in this pull request?

Use `ask` instead of `send` to sync `ExecutorStateChanged` between the Worker and the Master, and retry (up to 5 times) on failure until the message is successfully handled by the Master. The Worker exits itself if the message cannot be delivered after 5 retries.
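
A minimal sketch of the ask-and-retry idea; the trait, message shape, and timeout below are illustrative stand-ins, not the actual Worker/Master RPC code:

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

// Hypothetical stand-ins for the RPC endpoint reference and the message.
trait MasterRef { def ask[T](msg: Any, timeout: Duration): T }
case class ExecutorStateChanged(execId: Int, state: String)

@tailrec
def syncExecutorState(master: MasterRef, msg: ExecutorStateChanged, retriesLeft: Int = 5): Unit = {
  // Unlike fire-and-forget send, ask expects a reply, so failures are visible.
  val acked = try master.ask[Boolean](msg, 10.seconds) catch { case _: Exception => false }
  if (!acked) {
    if (retriesLeft > 0) syncExecutorState(master, msg, retriesLeft - 1)
    else sys.exit(1) // the Worker exits if the Master never acknowledges
  }
}
```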

### Why are the changes needed?

If the Worker fails to send ExecutorStateChanged to the Master due to some unexpected errors, e.g., a temporary network error, then the Master can't remove the finished executor normally and thinks the executor is still alive. In the worst case, if the executor is the only executor for the application, the application can hang.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass existing tests.

Closes #31348 from Ngone51/periodically-trigger-master-schedule.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f542ecd)
The file was modified core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/master/Master.scala (diff)
Commit f64fc224665a3dd1c1581fc1966cf9924be156db by wenchen
[SPARK-34290][SQL] Support v2 `TRUNCATE TABLE`

### What changes were proposed in this pull request?
Implement the v2 execution node for the `TRUNCATE TABLE` command.
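
Illustrative usage, assuming a v2 catalog registered as `testcat` and a v2 test source `foo` (both placeholders):

```scala
spark.sql("CREATE TABLE testcat.ns.tbl (id INT) USING foo")
spark.sql("INSERT INTO testcat.ns.tbl VALUES (0), (1)")
spark.sql("TRUNCATE TABLE testcat.ns.tbl")       // now handled by a v2 execution node
spark.sql("SELECT * FROM testcat.ns.tbl").show() // empty result
```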

### Why are the changes needed?
To have feature parity with DS v1, and support truncation of v2 tables.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
By running the unified tests for v1 and v2 tables:
```
$ build/sbt -Phive -Phive-thriftserver "test:testOnly *TruncateTableSuite"
```

Closes #31605 from MaxGekk/truncate-table-v2.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f64fc22)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/TruncateTableSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/TruncateTableSuiteBase.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/TruncateTableSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/TruncateTableParserSuite.scala (diff)
Commit 5d9cfd727c21f0f50f92c0236cdeb20f3b9111dc by gengliang.wang
[SPARK-34246][SQL] New type coercion syntax rules in ANSI mode

### What changes were proposed in this pull request?

In Spark ANSI mode, the type coercion rules are based on the type precedence lists of the input data types.
As per the section "Type precedence list determination" of "ISO/IEC 9075-2:2011
Information technology — Database languages - SQL — Part 2: Foundation (SQL/Foundation)", the type precedence lists of primitive data types are as follows:

- Byte: Byte, Short, Int, Long, Decimal, Float, Double
- Short: Short, Int, Long, Decimal, Float, Double
- Int: Int, Long, Decimal, Float, Double
- Long: Long, Decimal, Float, Double
- Decimal: Any wider Numeric type
- Float: Float, Double
- Double: Double
- String: String
- Date: Date, Timestamp
- Timestamp: Timestamp
- Binary: Binary
- Boolean: Boolean
- Interval: Interval

As for complex data types, Spark determines the precedence list recursively based on their sub-types.

With the definition of the type precedence list, the general type coercion rules are as follows:
- Data type S is allowed to be implicitly cast as type T iff T is in the precedence list of S
- Comparison is allowed iff the type precedence lists of both sides have at least one common element. When evaluating the comparison, Spark casts both sides to the tightest common data type of their precedence lists.
- There should be at least one common data type among all the children's precedence lists for the following operators. The data type of the operator is the tightest common precedence data type.
```
In, Except(odd), Intersect, Greatest, Least, Union, If, CaseWhen, CreateArray, Array Concat, Sequence, MapConcat, CreateMap
```

- For complex types (struct, array, map), Spark recursively looks into the element type and applies the rules above. If the element nullability is converted from true to false, a runtime null check is added to the elements.

Note: the new type coercion system still allows implicitly converting String type literals to other primitive types, to avoid breaking too many existing Spark SQL queries. This is a special rule and is not from the ANSI SQL standard.
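
A hedged illustration of these rules (the config key is real; the queries and expected outcomes are sketched from the precedence lists above):

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")

// INT is in the precedence list of TINYINT, so the implicit cast is allowed
// and the result type is INT:
spark.sql("SELECT CAST(1 AS TINYINT) + CAST(2 AS INT)")

// A String *literal* may still be implicitly converted (the special rule above):
spark.sql("SELECT 1 + '2'")

// DATE and BOOLEAN share no common element in their precedence lists, so the
// comparison below should fail analysis under ANSI mode:
// spark.sql("SELECT DATE'2021-02-23' = true")
```
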
### Why are the changes needed?

The current type coercion rules are complex and very hard to describe and understand. For details, please refer to the attached documentation "Default Type coercion rules of Spark":
[Default Type coercion rules of Spark.pdf](https://github.com/apache/spark/files/5874362/Default.Type.coercion.rules.of.Spark.pdf)

This PR creates a new and strict type coercion system under ANSI mode. The rules are simple and clean, so that users can follow them easily.

### Does this PR introduce _any_ user-facing change?

Yes, new implicit cast syntax rules in ANSI mode. All the details are in the first section of this description.

### How was this patch tested?

Unit tests

Closes #31349 from gengliangwang/ansiImplicitConversion.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(commit: 5d9cfd7)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/text.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/postgreSQL/with.sql (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out (diff)
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/union.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/timestamp.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/with.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/strings.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableValuedFunctions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercionSuite.scala
Commit 714ff73d4aec317fddf32720d5a7a1c283921983 by wenchen
[SPARK-34152][SQL] Make CreateViewStatement.child to be LogicalPlan's children so that it's resolved in analyze phase

### What changes were proposed in this pull request?

This PR proposes to make `CreateViewStatement.child` part of `LogicalPlan`'s `children` so that it's resolved in the analyze phase.

### Why are the changes needed?

Currently, the `CreateViewStatement.child` is resolved when the create view command runs, which is inconsistent with other plan resolutions. For example, you may see the following in the physical plan:
```
== Physical Plan ==
Execute CreateViewCommand (1)
   +- CreateViewCommand (2)
         +- Project (4)
            +- UnresolvedRelation (3)
```

### Does this PR introduce _any_ user-facing change?

Yes. For the example above, you will now see the resolved plan:
```
== Physical Plan ==
Execute CreateViewCommand (1)
   +- CreateViewCommand (2)
         +- Project (5)
            +- SubqueryAlias (4)
               +- LogicalRelation (3)
```

### How was this patch tested?

Updated existing tests.

Closes #31273 from imback82/spark-34152.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 714ff73)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveHintsSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisTest.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/explain.sql.out (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
Commit 14934f42d066ca147df508b0a97bfa03223046ba by wenchen
[SPARK-33599][SQL][FOLLOWUP] Group exception messages in catalyst/analysis

### What changes were proposed in this pull request?
This PR follows up https://github.com/apache/spark/pull/30717.
Some contributors may not know about this effort and have added exceptions the old way.

### Why are the changes needed?
It will largely help with the standardization of error messages and their maintenance.

### Does this PR introduce _any_ user-facing change?
No. Error messages remain unchanged.

### How was this patch tested?
No new tests - pass all original tests to make sure it doesn't break any existing behavior.

Closes #31316 from beliefer/SPARK-33599-followup.

Lead-authored-by: beliefer <beliefer@163.com>
Co-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 14934f4)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveHintsSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala (diff)
Commit 6ef57d31cde110d9740ba6fb646818388feb8054 by wenchen
[SPARK-34514][SQL] Push down limit for LEFT SEMI and LEFT ANTI join

### What changes were proposed in this pull request?

During code review of https://github.com/apache/spark/pull/31567#discussion_r577379572, I found that we can push down the limit to the left side of LEFT SEMI and LEFT ANTI joins if the join condition is empty.

Why it's safe to push down limit:

The semantics of LEFT SEMI join without condition:
(1). if right side is non-empty, output all rows from left side.
(2). if right side is empty, output nothing.

The semantics of LEFT ANTI join without condition:
(1). if right side is non-empty, output nothing.
(2). if right side is empty, output all rows from left side.

Since the output is either all rows from the left side or nothing (all or nothing), it's safe to push down the limit to the left side.
NOTE: a LEFT SEMI / LEFT ANTI join with a non-empty condition is not safe for limit push down, because the output can be a portion of the left side's rows.

Reference: physical operator implementation for LEFT SEMI / LEFT ANTI join without condition - https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala#L200-L204 .
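
A hedged sketch of the query shape that benefits (table names are hypothetical):

```scala
// Illustrative query: a LEFT SEMI join with no join condition, under a limit.
spark.sql("SELECT * FROM t1 LEFT SEMI JOIN t2 LIMIT 5")
// Conceptually, the optimizer can now rewrite
//   Limit(5, Join(LeftSemi, t1, t2, condition = None))
// into
//   Limit(5, Join(LeftSemi, LocalLimit(5, t1), t2, condition = None))
```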

### Why are the changes needed?

Better performance. Save CPU and IO for these joins, as limit being pushed down before join.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit test in `LimitPushdownSuite.scala` and `SQLQuerySuite.scala`.

Closes #31630 from c21/limit-pushdown.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 6ef57d3)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushdownSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
Commit 87409c42bcca5d9b73b6a472017c5dd65da0718d by gurwls223
[SPARK-31891][SQL][DOCS][FOLLOWUP] Fix typo in the description of `MSCK REPAIR TABLE`

### What changes were proposed in this pull request?
Fix typo and highlight that `ADD PARTITIONS` is the default.

### Why are the changes needed?
Fix a typo which can mislead users.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
n/a

Closes #31633 from MaxGekk/repair-table-drop-partitions-followup.

Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 87409c4)
The file was modified docs/sql-ref-syntax-ddl-repair-table.md (diff)
Commit 999d3b89b6df14a5ccb94ffc2ffadb82964e9f7d by wenchen
[SPARK-34515][SQL] Fix NPE if InSet contains null value during getPartitionsByFilter

### What changes were proposed in this pull request?

Skip null values when rewriting `InSet` to `>= and <=` in getPartitionsByFilter.
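
A minimal sketch of the idea behind the fix, with illustrative names (not the actual `HiveShim` code):

```scala
// When rewriting InSet into a "col >= min and col <= max" metastore filter,
// null cannot participate in ordering, so drop it before sorting.
def inSetToRangeFilter(col: String, values: Seq[String]): Option[String] = {
  val nonNull = values.filter(_ != null) // the fix: skip null values
  if (nonNull.isEmpty) None
  else {
    val sorted = nonNull.sorted // previously this could NPE on null entries
    Some(s"$col >= '${sorted.head}' and $col <= '${sorted.last}'")
  }
}
```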

### Why are the changes needed?

Spark converts `InSet` to `>= and <=` if the number of values exceeds `spark.sql.hive.metastorePartitionPruningInSetThreshold` during partition pruning. In this case, if the values contain a null, we get an exception such as:

```
java.lang.NullPointerException
at org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:1389)
at org.apache.spark.unsafe.types.UTF8String.compareTo(UTF8String.java:50)
at scala.math.LowPriorityOrderingImplicits$$anon$3.compare(Ordering.scala:153)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
at java.util.TimSort.sort(TimSort.java:220)
at java.util.Arrays.sort(Arrays.java:1438)
at scala.collection.SeqLike.sorted(SeqLike.scala:659)
at scala.collection.SeqLike.sorted$(SeqLike.scala:647)
at scala.collection.AbstractSeq.sorted(Seq.scala:45)
at org.apache.spark.sql.hive.client.Shim_v0_13.convert$1(HiveShim.scala:772)
at org.apache.spark.sql.hive.client.Shim_v0_13.$anonfun$convertFilters$4(HiveShim.scala:826)
at scala.collection.immutable.Stream.flatMap(Stream.scala:489)
at org.apache.spark.sql.hive.client.Shim_v0_13.convertFilters(HiveShim.scala:826)
at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:848)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:750)
```

### Does this PR introduce _any_ user-facing change?

Yes, bug fix.

### How was this patch tested?

Add test.

Closes #31632 from ulysses-you/SPARK-34515.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 999d3b8)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/client/FiltersSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala (diff)
Commit b17754a8cbd2593eb2b1952e95a7eeb0f8e09cdb by hkarau
[SPARK-32617][K8S][TESTS] Configure kubernetes client based on kubeconfig settings in kubernetes integration tests

### What changes were proposed in this pull request?

Since [minikube version v1.1.0](https://github.com/kubernetes/minikube/blob/v1.1.0/CHANGELOG.md), kubectl is available as a command, so the kubeconfig settings can be accessed like:

```
$ minikube kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /Users/attilazsoltpiros/.minikube/ca.crt
    server: https://127.0.0.1:32788
  name: minikube
contexts:
- context:
    cluster: minikube
    namespace: default
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
  user:
    client-certificate: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.crt
    client-key: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.key
```

Here the vm-driver was docker and the server port (https://127.0.0.1:32788) is different from the hardcoded 8443.

So the main part of this PR introduces kubernetes client configuration based on the kubeconfig (the output of `minikube kubectl config view`) for minikube versions after v1.1.0, while the legacy way of configuration is kept, as minikube versions back to v0.34.1 should still be supported.

Moreover, the old config parsing pattern wasn't sufficient in my case: when `minikube kubectl config view` is called, a kubectl download message might be included before the first key. So I changed the parsing even for the existing keys, to keep a consistent pattern in this file.

The old parsing in an example:
```
private val HOST_PREFIX = "host:"

val hostString = statusString.find(_.contains(s"$HOST_PREFIX "))

val status1 = hostString.get.split(HOST_PREFIX)(1)
```

The new parsing:
```
private val HOST_PREFIX = "host: "

val hostString = statusString.find(_.contains(HOST_PREFIX))

hostString.get.split(HOST_PREFIX)(1)
```

So the PREFIX is extended with the extra space at the declaration (this way the two separate string operations are safer and consistent with each other), and the replace is changed to a split, taking the 2nd string from the result (which is guaranteed to contain only the text after the PREFIX when the PREFIX is a contained substring).

Finally, there is a tiny change in `dev-run-integration-tests.sh` introducing `--skip-building-dependencies`, which switches off building the maven dependencies of `kubernetes-integration-tests` from the Spark project.
This can be used when only `kubernetes-integration-tests` needs to be rebuilt because only the tests were modified.

### Why are the changes needed?

Kubernetes client configuration based on kubeconfig settings is more reliable and provides a solution which is minikube version independent.

### Does this PR introduce _any_ user-facing change?

No. This is only test code.

### How was this patch tested?

Tested manually on two minikube versions:

Minikube  v0.34.1:

```
$ minikube version
minikube version: v0.34.1

$ grep "version\|building" resource-managers/kubernetes/integration-tests/target/integration-tests.log
20/12/12 12:52:25.135 ScalaTest-main-running-DiscoverySuite INFO Minikube: minikube version: v0.34.1
20/12/12 12:52:25.761 ScalaTest-main-running-DiscoverySuite INFO Minikube: building kubernetes config with apiVersion: v1, masterUrl: https://192.168.99.103:8443, caCertFile: /Users/attilazsoltpiros/.minikube/ca.crt, clientCertFile: /Users/attilazsoltpiros/.minikube/apiserver.crt, clientKeyFile: /Users/attilazsoltpiros/.minikube/apiserver.key
```

Minikube v1.15.1
```
$ minikube version

minikube version: v1.15.1
commit: 23f40a012abb52eff365ff99a709501a61ac5876

$ grep "version\|building" resource-managers/kubernetes/integration-tests/target/integration-tests.log

20/12/13 06:25:55.086 ScalaTest-main-running-DiscoverySuite INFO Minikube: minikube version: v1.15.1
20/12/13 06:25:55.597 ScalaTest-main-running-DiscoverySuite INFO Minikube: building kubernetes config with apiVersion: v1, masterUrl: https://192.168.64.4:8443, caCertFile: /Users/attilazsoltpiros/.minikube/ca.crt, clientCertFile: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.crt, clientKeyFile: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.key

$ minikube kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /Users/attilazsoltpiros/.minikube/ca.crt
    server: https://192.168.64.4:8443
  name: minikube
contexts:
- context:
    cluster: minikube
    namespace: default
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
  user:
    client-certificate: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.crt
    client-key: /Users/attilazsoltpiros/.minikube/profiles/minikube/client.key
```

Closes #30751 from attilapiros/SPARK-32617.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: b17754a)
The file was modified resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala (diff)
Commit 44eadb943bbcec48e90398731f57a32a967d81bb by gurwls223
[SPARK-34497][SQL] Fix built-in JDBC connection providers to restore JVM security context changes

### What changes were proposed in this pull request?
Some of the built-in JDBC connection providers are changing the JVM security context to do the authentication, which is fine. The problematic part is that executors can be reused by another query. The following situation leads to incorrect behaviour:
* Query1 opens JDBC connection and changes JVM security context in Executor1
* Query2 tries to open JDBC connection but it realizes there is already an entry for that DB type in Executor1
* Query2 does not change the JVM security context and uses Query1's keytab and principal
* Query2 fails with authentication error

In this PR I've changed the code in such a way that the JVM security context is always changed, but only temporarily until the connection is built up, and then rolled back. Since `getConnection` is synchronised on `SecurityConfigurationLock`, this ends up in correct behaviour without any race.
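
A minimal sketch of the temporarily-change-then-restore pattern, assuming an illustrative lock object (the actual provider code differs):

```scala
import javax.security.auth.login.Configuration

// Illustrative stand-in for the lock guarding getConnection.
object SecurityConfigurationLock

def withRestoredSecurityContext[T](providerConfig: Configuration)(connect: => T): T =
  SecurityConfigurationLock.synchronized {
    val previous = Configuration.getConfiguration // capture the current JVM context
    try {
      Configuration.setConfiguration(providerConfig)
      connect // build the JDBC connection under the provider's context
    } finally {
      Configuration.setConfiguration(previous) // always roll back
    }
  }
```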

### Why are the changes needed?
Incorrect JVM security context handling.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit + integration tests.

Closes #31622 from gaborgsomogyi/SPARK-34497.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 44eadb9)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/MSSQLConnectionProvider.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/SecureConnectionProvider.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/PostgresConnectionProviderSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/PostgresConnectionProvider.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProviderSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProviderSuiteBase.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/MariaDBConnectionProviderSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/OracleConnectionProviderSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/MariaDBConnectionProvider.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/OracleConnectionProvider.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/ConnectionProvider.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/DB2ConnectionProvider.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/DB2ConnectionProviderSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/jdbc/connection/MSSQLConnectionProviderSuite.scala (diff)
Commit 22383e312d00ec8888cdc2d12750b7d1e7e21d99 by dhyun
[SPARK-34531][CORE] Remove Experimental API tag in PrometheusServlet

### What changes were proposed in this pull request?

The Prometheus metrics endpoints are properly marked and documented as experimental (SPARK-31674). The class `PrometheusServlet` itself is not part of an API, so this PR proposes to remove the Experimental tag from it.

### Why are the changes needed?

To avoid marking a non-API as an API.

### Does this PR introduce _any_ user-facing change?

No, the class is already `private[spark]`.

### How was this patch tested?

Existing tests should cover.

Closes #31640 from HyukjinKwon/SPARK-34531.

Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 22383e3)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala (diff)
The file was modified docs/monitoring.md (diff)
Commit 8a1e172b513ba58763336de83f94e00ceaa69255 by dhyun
[SPARK-34520][CORE] Remove unused SecurityManager references

### What changes were proposed in this pull request?

This is kind of a followup of https://github.com/apache/spark/pull/24033 and https://github.com/apache/spark/pull/30945.
Many of the references in `SecurityManager` were introduced in SPARK-1189, and the related usages were removed later in https://github.com/apache/spark/pull/24033 and https://github.com/apache/spark/pull/30945. This PR proposes to remove them.

### Why are the changes needed?

For better readability of codes.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually compiled. GitHub Actions and the Jenkins build should test it out as well.

Closes #31636 from HyukjinKwon/SPARK-34520.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 8a1e172)
The file was modified core/src/main/scala/org/apache/spark/deploy/ExternalShuffleService.scala (diff)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/ReceivedBlockHandlerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/metrics/MetricsSystemSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/master/Master.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/MetricsServlet.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/metrics/sink/GraphiteSinkSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/PrometheusServlet.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/metrics/sink/PrometheusServletSuite.scala (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/CsvSink.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcastFactory.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/Slf4jSink.scala (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/StatsdSink.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/broadcast/BroadcastManager.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerReplicationSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/ConsoleSink.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/SparkEnv.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/broadcast/BroadcastFactory.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/metrics/sink/JmxSink.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/metrics/sink/StatsdSinkSuite.scala (diff)
Commit c56af69cdf3cc68821e69fa4ef0213b5cc281ab0 by wenchen
[SPARK-34518][SQL] Rename `AlterTableRecoverPartitionsCommand` to `RepairTableCommand`

### What changes were proposed in this pull request?
Rename the execution node `AlterTableRecoverPartitionsCommand` for the commands:
- `MSCK REPAIR TABLE table [{ADD|DROP|SYNC} PARTITIONS]`
- `ALTER TABLE table RECOVER PARTITIONS`

to `RepairTableCommand`.

### Why are the changes needed?
1. After the PR https://github.com/apache/spark/pull/31499, `ALTER TABLE table RECOVER PARTITIONS` is equal to `MSCK REPAIR TABLE table ADD PARTITIONS`. Mapping the generic command `MSCK REPAIR TABLE` to the more specific execution node `AlterTableRecoverPartitionsCommand` can confuse devs in the future.
2. `ALTER TABLE table RECOVER PARTITIONS` does not support any options/extensions, so the additional parameters `enableAddPartitions` and `enableDropPartitions` in `AlterTableRecoverPartitionsCommand` are confusing as well.

### Does this PR introduce _any_ user-facing change?
No, because this is an internal API.

### How was this patch tested?
By running the existing test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRecoverPartitionsSuite"
$ build/sbt "test:testOnly *AlterTableRecoverPartitionsParserSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *MsckRepairTableSuite"
$ build/sbt "test:testOnly *MsckRepairTableParserSuite"
```

Closes #31635 from MaxGekk/rename-recover-partitions.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c56af69)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
Commit 4a3200b08ac3e7733b5a3dc7271d35e6872c5967 by wenchen
[SPARK-34436][SQL] DPP support LIKE ANY/ALL expression

### What changes were proposed in this pull request?

This PR makes DPP support LIKE ANY/ALL expressions:
```sql
SELECT date_id, product_id FROM fact_sk f
JOIN dim_store s
ON f.store_id = s.store_id WHERE s.country LIKE ANY ('%D%E%', '%A%B%')
```

### Why are the changes needed?

Improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #31563 from wangyum/SPARK-34436.

Lead-authored-by: Yuming Wang <yumwang@apache.org>
Co-authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 4a3200b)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala (diff)