Changes

Summary

  1. [MINOR] Improve flaky NaiveBayes test (commit: 11fac23) (details)
  2. [SPARK-34057][SQL] UnresolvedTableOrView should retain SQL text position (commit: 8391a4a) (details)
  3. [SPARK-33970][SQL][TEST] Add test default partition in (commit: f77eeb0) (details)
  4. [SPARK-34055][SQL][TESTS][FOLLOWUP] Check partition adding to cached (commit: 664ef18) (details)
  5. [SPARK-34060][SQL] Fix Hive table caching while updating stats by `ALTER (commit: d97e991) (details)
  6. [SPARK-31952][SQL] Fix incorrect memory spill metric when doing (commit: 4afca0f) (details)
  7. [SPARK-34065][INFRA] Cancel the duplicated jobs only in PRs at GitHub (commit: ff49317) (details)
  8. [SPARK-33084][CORE][SQL] Rename Unit test file and use fake ivy link (commit: 5ef6907) (details)
  9. [SPARK-33991][CORE][WEBUI] Repair enumeration conversion error for (commit: 1495ad8) (details)
  10. [SPARK-34002][SQL] Fix the usage of encoder in ScalaUDF (commit: 0bcbafb) (details)
  11. [MINOR][SS] Add some description about auto reset and data loss note to (commit: ad9fad7) (details)
  12. [SPARK-33970][SQL][TEST][FOLLOWUP] Use String comparison (commit: 3556929) (details)
  13. [SPARK-33711][K8S] Avoid race condition between POD lifecycle manager (commit: 6bd7a62) (details)
  14. [SPARK-34045][ML] OneVsRestModel.transform should not call setter of (commit: 7ff9ff1) (details)
  15. [SPARK-34074][SQL] Update stats only when table size changes (commit: f7cbeec) (details)
  16. [SPARK-28646][SQL][FOLLOWUP] Add legacy config for allowing (commit: 02a17e9) (details)
  17. [SPARK-34003][SQL][FOLLOWUP] Avoid pushing modified Char/Varchar sort (commit: 99f8489) (details)
  18. [SPARK-34053][INFRA][FOLLOW-UP] Disables canceling push/schedule (commit: a4b7075) (details)
  19. [SPARK-34084][SQL] Fix auto updating of table stats in `ALTER TABLE .. (commit: 6c04795) (details)
  20. [SPARK-34091][SQL] Shuffle batch fetch should be able to disable after (commit: 0099715) (details)
  21. [SPARK-34069][CORE] Kill barrier tasks should respect (commit: 65222b7) (details)
  22. [SPARK-32338][SQL][PYSPARK][FOLLOW-UP][TEST] Add more tests for slice (commit: ad8e40e) (details)
  23. [SPARK-34090][SS] Cache HadoopDelegationTokenManager.isServiceEnabled (commit: b0759dc) (details)
  24. [SPARK-34071][SQL][TESTS] Check stats of cached v1 tables after altering (commit: 861f8bb) (details)
  25. [SPARK-34086][SQL] RaiseError generates too much code and may fail (commit: 04f031a) (details)
  26. [SPARK-32850][TEST][FOLLOWUP] Fix flaky test due to timeout (commit: f64297d) (details)
  27. [SPARK-34064][SQL] Cancel the running broadcast sub-jobs when SQL (commit: f1b21ba) (details)
  28. [SPARK-34076][SQL] SQLContext.dropTempTable fails if cache is non-empty (commit: 62d82b5) (details)
  29. [SPARK-33741][CORE] Add min threshold time speculation config (commit: bd5039f) (details)
  30. [SPARK-34070][CORE][SQL] Replaces find and emptiness check with exists (commit: 8c5fecd) (details)
  31. [SPARK-34068][CORE][SQL][MLLIB][GRAPHX] Remove redundant collection (commit: 8b1ba23) (details)
  32. [SPARK-34051][SQL] Support 32-bit unicode escape in string literals (commit: 62d8466) (details)
  33. [SPARK-33690][SQL][FOLLOWUP] Escape further meta-characters in (commit: b7da108) (details)
  34. [SPARK-34103][INFRA] Fix MiMaExcludes by moving SPARK-23429 from 2.4 to (commit: 9e93fdb) (details)
  35. [SPARK-34075][SQL][CORE] Hidden directories are being listed for (commit: 467d758) (details)
  36. [SPARK-34081][SQL] Only pushdown LeftSemi/LeftAnti over Aggregate if (commit: d3ea308) (details)
  37. [SPARK-34067][SQL] PartitionPruning push down pruningHasBenefit function (commit: 1288ad8) (details)
  38. [SPARK-33989][SQL] Strip auto-generated cast when using Cast.sql (commit: 92e5cfd) (details)
  39. [SPARK-33354][FOLLOWUP][DOC] Shorten the table width of ANSI compliance (commit: feedd1b) (details)
  40. [SPARK-34116][SS][TEST] Separate state store numKeys metric test and (commit: 0e64a22) (details)
  41. [SPARK-34118][CORE][SQL] Replaces filter and check for emptiness with (commit: 8ed23ed) (details)
  42. [SPARK-34099][SQL] Keep dependents of v2 tables cached during cache (commit: adba2ec) (details)
  43. [SPARK-34101][SQL] Make spark-sql CLI configurable for the behavior of (commit: bec80d7) (details)
  44. [SPARK-34114][SQL] should not trim right for read-side char length check (commit: acd6c12) (details)
  45. [SPARK-32864][SQL] Support ORC forced positional evolution (commit: 00d43b1) (details)
Commit 11fac232c855c3b5ffb510187240a79f059a2228 by ruifengz
[MINOR] Improve flaky NaiveBayes test

### What changes were proposed in this pull request?
Improve flaky NaiveBayes test

The current test may sometimes fail under different BLAS libraries due to an absTol check, with errors like:
```
Expected 0.7 and 0.6485507246376814 to be within 0.05 using absolute tolerance...
```

* Change `absTol` to `relTol`: `absTol 0.05` allows a big difference in some cases (such as comparing 0.1 and 0.05).
* Remove the `exp` when comparing params, since `exp` amplifies the relative error.
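
A minimal sketch (plain Scala, not the suite code) of why `absTol` is too loose for small values:

```scala
// With absTol 0.05, an actual value of 0.10 against an expected 0.05 passes
// even though it is twice the expected value; relTol 0.05 rejects it.
val expected = 0.05
val actual = 0.10
val absTolPasses = math.abs(actual - expected) <= 0.05                      // true
val relTolPasses = math.abs(actual - expected) / math.abs(expected) <= 0.05 // false
```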

### Why are the changes needed?
Flaky test

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
N/A

Closes #31004 from WeichenXu123/improve_bayes_tests.

Authored-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(commit: 11fac23)
The file was modifiedmllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala (diff)
Commit 8391a4a687b5f583703acc0ed36883ef3ee8a93f by wenchen
[SPARK-34057][SQL] UnresolvedTableOrView should retain SQL text position for DDL commands

### What changes were proposed in this pull request?

Currently, there are many DDL commands where the position of the unresolved identifiers are incorrect:
```
scala> sql("DROP TABLE unknown")
org.apache.spark.sql.AnalysisException: Table or view not found: unknown; line 1 pos 0;
```
Here, the `pos` should be `11`.

This PR proposes to fix this issue for commands using `UnresolvedTableOrView`:
```
DROP TABLE unknown
DESCRIBE TABLE unknown
ANALYZE TABLE unknown COMPUTE STATISTICS
ANALYZE TABLE unknown COMPUTE STATISTICS FOR COLUMNS col
ANALYZE TABLE unknown COMPUTE STATISTICS FOR ALL COLUMNS
SHOW CREATE TABLE unknown
REFRESH TABLE unknown
SHOW COLUMNS FROM unknown
SHOW COLUMNS FROM unknown IN db
ALTER TABLE unknown RENAME TO t
ALTER VIEW unknown RENAME TO v
```

### Why are the changes needed?

To fix a bug.

### Does this PR introduce _any_ user-facing change?

Yes, now the above example will print the following:
```
org.apache.spark.sql.AnalysisException: Table or view not found: unknown; line 1 pos 11;
```

### How was this patch tested?

Add a new test.

Closes #31106 from imback82/unresolved_table_or_view_message.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 8391a4a)
The file was modifiedsql/core/src/test/resources/sql-tests/results/show_columns.sql.out (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/DropTableParserSuite.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisExceptionPositionSuite.scala (diff)
Commit f77eeb0451e60b8c4db377538d381e05f7771cf4 by gurwls223
[SPARK-33970][SQL][TEST] Add test default partition in metastoredirectsql

### What changes were proposed in this pull request?

This PR adds a test for the default partition in metastore direct SQL.

### Why are the changes needed?

Improve test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #31109 from wangyum/SPARK-33970.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: f77eeb0)
The file was modifiedsql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala (diff)
Commit 664ef184c157cd0959a61d9716a59046ed065c1b by wenchen
[SPARK-34055][SQL][TESTS][FOLLOWUP] Check partition adding to cached Hive table

### What changes were proposed in this pull request?
Replace `USING parquet` with `$defaultUsing`, which is `USING parquet` for the v1 In-Memory catalog and `USING hive` for the v1 Hive external catalog.

### Why are the changes needed?
The PR https://github.com/apache/spark/pull/31101 added a unit test, but it checks only the v1 In-Memory catalog. This PR runs the test against the Hive external catalog as well to improve test coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the affected test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
```

Closes #31117 from MaxGekk/add-partition-refresh-cache-2-followup-2.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 664ef18)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableAddPartitionSuite.scala (diff)
Commit d97e99157e8d3b7434610fd78af90911c33662c9 by wenchen
[SPARK-34060][SQL] Fix Hive table caching while updating stats by `ALTER TABLE .. DROP PARTITION`

### What changes were proposed in this pull request?
Fix canonicalization of `HiveTableRelation` by normalizing its `CatalogTable`, excluding table stats and temporary fields from the canonicalized plan.

### Why are the changes needed?
This fixes the issue demonstrated by the example below:
```scala
scala> spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", true)
scala> sql(s"CREATE TABLE tbl (id int, part int) USING hive PARTITIONED BY (part)")
scala> sql("INSERT INTO tbl PARTITION (part=0) SELECT 0")
scala> sql("INSERT INTO tbl PARTITION (part=1) SELECT 1")
scala> sql("CACHE TABLE tbl")
scala> sql("SELECT * FROM tbl").show(false)
+---+----+
|id |part|
+---+----+
|0  |0   |
|1  |1   |
+---+----+

scala> spark.catalog.isCached("tbl")
scala> sql("ALTER TABLE tbl DROP PARTITION (part=0)")
scala> spark.catalog.isCached("tbl")
res19: Boolean = false
```
`ALTER TABLE .. DROP PARTITION` must keep the table in the cache.
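
A minimal, self-contained sketch of the idea (illustrative types, not Spark's actual `CatalogTable`): normalize away stats and temporary fields before canonicalization, so updating stats does not change the plan's canonical form and cache lookups keep matching:

```scala
// Illustrative stand-in for CatalogTable with only the fields that matter here.
case class TableMeta(name: String, sizeInBytes: Option[Long], createTime: Long)

// Drop stats and temporary fields from the copy used for canonicalization.
def normalized(t: TableMeta): TableMeta = t.copy(sizeInBytes = None, createTime = -1L)

val beforeDrop = TableMeta("tbl", sizeInBytes = Some(100L), createTime = 1L)
val afterDrop  = TableMeta("tbl", sizeInBytes = Some(58L), createTime = 1L)
assert(normalized(beforeDrop) == normalized(afterDrop)) // canonical forms match
```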

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, the drop partition command keeps the table in the cache while updating table stats:
```scala
scala> sql("ALTER TABLE tbl DROP PARTITION (part=0)")
scala> spark.catalog.isCached("tbl")
res19: Boolean = true
```

### How was this patch tested?
By running new UT in `AlterTableDropPartitionSuite`.

Closes #31112 from MaxGekk/fix-caching-hive-table-2.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: d97e991)
The file was modifiedsql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableDropPartitionSuite.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/ShowCreateTableSuite.scala (diff)
Commit 4afca0f706504ac22e956c498538023dc64b0413 by wenchen
[SPARK-31952][SQL] Fix incorrect memory spill metric when doing Aggregate

### What changes were proposed in this pull request?

This PR takes over https://github.com/apache/spark/pull/28780.

1. Count the spilled memory size when creating an `UnsafeExternalSorter` from an existing `InMemorySorter`.

2. Accumulate the `totalSpillBytes` when merging two `UnsafeExternalSorter`s.

### Why are the changes needed?

As mentioned in https://github.com/apache/spark/pull/28780:

> It happens when hash aggregate downgrades to sort-based aggregate.
> `UnsafeExternalSorter.createWithExistingInMemorySorter` calls spill on an `InMemorySorter` immediately, but the memory pointed to by the `InMemorySorter` is acquired by the outside `BytesToBytesMap`, instead of the `allocatedPages` in `UnsafeExternalSorter`. So the memory spill bytes metric is always 0, but the disk bytes spilled metric is right.

Besides, this PR also fixes the `UnsafeExternalSorter.merge` by accumulating the `totalSpillBytes` of two sorters. Thus, we can report the correct spilled size in `HashAggregateExec.finishAggregate`.
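
A minimal sketch of the metric bookkeeping described above (illustrative names, not the real sorter internals):

```scala
class SorterMetrics(var totalSpillBytes: Long = 0L) {
  // Fix 1: when a sorter is created from an existing in-memory sorter that is
  // spilled immediately, count those bytes as spilled memory up front.
  def recordInitialSpill(bytes: Long): Unit = totalSpillBytes += bytes

  // Fix 2: when merging two sorters, carry over the other sorter's spilled
  // bytes so the metric is not silently dropped.
  def merge(other: SorterMetrics): Unit = totalSpillBytes += other.totalSpillBytes
}
```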

The issue can be reproduced with the following steps, by checking the SQL metrics in the UI:

```
bin/spark-shell --driver-memory 512m --executor-memory 512m --executor-cores 1 --conf "spark.default.parallelism=1"
scala> sql("select id, count(1) from range(10000000) group by id").write.csv("/tmp/result.json")
```

Before:

<img width="200" alt="WeChatfe5146180d91015e03b9a27852e9a443" src="https://user-images.githubusercontent.com/16397174/103625414-e6fc6280-4f75-11eb-8b93-c55095bdb5b8.png">

After:

<img width="200" alt="WeChat42ab0e73c5fbc3b14c12ab85d232071d" src="https://user-images.githubusercontent.com/16397174/103625420-e8c62600-4f75-11eb-8e1f-6f5e8ab561b9.png">

### Does this PR introduce _any_ user-facing change?

Yes, users can see the correct spill metrics after this PR.

### How was this patch tested?

Tested manually and added UTs.

Closes #31035 from Ngone51/SPARK-31952.

Lead-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: wangguangxin.cn <wangguangxin.cn@bytedance.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 4afca0f)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeKVExternalSorterSuite.scala (diff)
The file was modifiedsql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java (diff)
The file was modifiedcore/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java (diff)
Commit ff493173ab4d8773ce929a322f93e35c3839818c by gurwls223
[SPARK-34065][INFRA] Cancel the duplicated jobs only in PRs at GitHub Actions

### What changes were proposed in this pull request?

This is kind of a followup of https://github.com/apache/spark/pull/31104 but I decided to track it separately with a separate JIRA.

Currently the jobs are being canceled in main repo branches. If a commit is merged, for example, to master branch before the test finishes, it cancels the previous builds. This is a problem because we cannot, for example, detect logical conflict properly. We should only cancel the jobs in PRs:

![Screen Shot 2021-01-11 at 3 22 24 PM](https://user-images.githubusercontent.com/6477701/104152015-c7f04b80-5421-11eb-9e40-6b0a0e5b8442.png)

This PR proposes to not cancel jobs for commits on the main repo branches, but only in PRs.

### Why are the changes needed?

- To keep the test coverage
- To run the test in the synced master branch instead of relying on the builds made in each PR with an outdated master branch
- To detect test failures from logical conflicts from merging two conflicting PRs at the same time.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

I manually tested in
- https://github.com/HyukjinKwon/spark/pull/27
- https://github.com/HyukjinKwon/spark/pull/28

I added Yi Wu as a co-author since he helped verify the current fix in the PR above.

I checked that it does not cancel in the main repo branch:

![Screen Shot 2021-01-11 at 3 58 52 PM](https://user-images.githubusercontent.com/6477701/104153656-3afbc100-5426-11eb-9309-85f6f4fd9ff3.png)

I checked it cancels in PRs:

![Screen Shot 2021-01-11 at 3 58 45 PM](https://user-images.githubusercontent.com/6477701/104153658-3d5e1b00-5426-11eb-89f7-786c3ae6849a.png)

Closes #31121 from HyukjinKwon/SPARK-34065.

Lead-authored-by: hyukjinkwon <gurwls223@apache.org>
Co-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: ff49317)
The file was modified.github/workflows/cancel_duplicate_workflow_runs.yml (diff)
Commit 5ef6907792a7511a20001721d003ad582e11ebc6 by gurwls223
[SPARK-33084][CORE][SQL] Rename Unit test file and use fake ivy link

### What changes were proposed in this pull request?
As pointed out in https://github.com/apache/spark/pull/29966#discussion_r554514344, the test suite file used the wrong name; this PR fixes that problem.
It also changes the test to use fake Ivy links.

### Why are the changes needed?
To follow the file naming convention for test suites.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
No

Closes #31118 from AngersZhuuuu/SPARK-33084-FOLLOW-UP.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 5ef6907)
The file was removedcore/src/test/scala/org/apache/spark/util/DependencyUtils.scala
The file was modifiedcore/src/test/scala/org/apache/spark/SparkContextSuite.scala (diff)
The file was addedcore/src/test/scala/org/apache/spark/util/DependencyUtilsSuite.scala
Commit 1495ad8c46197916527236331b57dce93aa3b8ec by srowen
[SPARK-33991][CORE][WEBUI] Repair enumeration conversion error for AllJobsPage

### What changes were proposed in this pull request?
In the `AllJobsPage` class, the scheduling mode is obtained as an enumerated type by loading the `spark.scheduler.mode` configuration from `SparkConf`, but an enumeration conversion error occurs when the value of this configuration is set in lowercase.

The reason for this problem is that the values of the `SchedulingMode` enumeration class are uppercase, so the error occurs when `spark.scheduler.mode` is configured in lowercase.

Since `org.apache.spark.scheduler.TaskSchedulerImpl` converts the `spark.scheduler.mode` value to uppercase, `AllJobsPage` should do the same.
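
A minimal sketch of the fix (using Scala's `Enumeration`, mirroring what `TaskSchedulerImpl` does; simplified here):

```scala
object SchedulingMode extends Enumeration {
  val FAIR, FIFO, NONE = Value
}

val configured = "fair" // spark.scheduler.mode set in lowercase
// SchedulingMode.withName(configured) would throw NoSuchElementException;
// uppercasing the value first makes the lookup succeed.
val mode = SchedulingMode.withName(configured.toUpperCase)
```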

### Why are the changes needed?
An enumeration conversion error occurs when the value of this configuration is set in lowercase.

### How was this patch tested?
Existing tests.

Closes #31015 from yikf/master.

Authored-by: yikf <13468507104@163.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 1495ad8)
The file was modifiedcore/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala (diff)
The file was modifiedcore/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala (diff)
Commit 0bcbafb4b868234d695b553f215e21fe66e7ff77 by dhyun
[SPARK-34002][SQL] Fix the usage of encoder in ScalaUDF

### What changes were proposed in this pull request?

This patch fixes few issues when using encoders to serialize input/output in `ScalaUDF`.

### Why are the changes needed?

This fixes a bug when using encoders in Scala UDFs. First, the output data type should be corrected to the corresponding data type of the object serializer. Second, `catalystConverter` should not serialize `Option[_]` as an ordinary row, because in the `ScalaUDF` case it is serialized to a column, not the top-level row. Otherwise, there will be a redundant `value` struct wrapping the serialized `Option[_]` object.

### Does this PR introduce _any_ user-facing change?

Yes, fixing a bug of `ScalaUDF`.

### How was this patch tested?

Unit test.

Closes #31103 from viirya/SPARK-34002.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 0bcbafb)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
Commit ad9fad72a96eff346be982d598449cbf52019d2c by dhyun
[MINOR][SS] Add some description about auto reset and data loss note to SS doc

### What changes were proposed in this pull request?

This patch adds some description to the SS doc about offset reset and data loss.

### Why are the changes needed?

During recent SS testing, the behavior of gradually reducing input rows was confusing to me. Compared with Flink, I do not see a similar behavior. After looking into the code and doing some tests, I feel it is better to add some more description in the SS doc.

### Does this PR introduce _any_ user-facing change?

No, doc only.

### How was this patch tested?

Doc only.

Closes #31089 from viirya/ss-minor-5.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: ad9fad7)
The file was modifieddocs/structured-streaming-kafka-integration.md (diff)
Commit 3556929c43881831f25c28c7dfc1999310f96232 by dhyun
[SPARK-33970][SQL][TEST][FOLLOWUP] Use String comparison

### What changes were proposed in this pull request?

This is a follow-up to replace `version.toDouble > 2` with `version >= "2.0"`.

### Why are the changes needed?

`toDouble` makes assumptions about the string format and can throw `java.lang.NumberFormatException`.
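
For illustration, with version strings like those used in the suite:

```scala
// "2.3.7".toDouble throws java.lang.NumberFormatException because of the
// second dot, while a plain string comparison needs no parsing at all.
val version = "2.3.7"
val isAtLeast2 = version >= "2.0" // true; lexicographic, fine for 0.x-3.x versions
```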

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #31134 from dongjoon-hyun/SPARK-33970-FOLLOWUP.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 3556929)
The file was modifiedsql/hive/src/test/scala/org/apache/spark/sql/hive/client/HivePartitionFilteringSuite.scala (diff)
Commit 6bd7a6200f8beaab1c68b2469df05870ea788d49 by hkarau
[SPARK-33711][K8S] Avoid race condition between POD lifecycle manager and scheduler backend

### What changes were proposed in this pull request?

Missing-POD detection is extended with a timestamp (and time-limit) based check to avoid wrongly detecting PODs as missing.

The two new timestamps:
- `fullSnapshotTs` is introduced for the `ExecutorPodsSnapshot` which only updated by the pod polling snapshot source
- `registrationTs` is introduced for the `ExecutorData` and it is initialized at the executor registration at the scheduler backend

Moreover, a new config `spark.kubernetes.executor.missingPodDetectDelta` is used to specify the accepted delta between the two.
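
A minimal sketch of the resulting check (assumed names, not the exact implementation):

```scala
// Only report an executor as missing when the full snapshot was taken
// sufficiently later than the executor's registration; otherwise the
// registration may simply not be reflected in the snapshot yet.
def looksMissing(fullSnapshotTs: Long, registrationTs: Long, detectDeltaMs: Long): Boolean =
  fullSnapshotTs - registrationTs > detectDeltaMs
```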

### Why are the changes needed?

Watching a POD (`ExecutorPodsWatchSnapshotSource`) only informs about single POD changes. This could wrongly lead the executor POD lifecycle manager to detect missing PODs (PODs known by the scheduler backend but missing from the POD snapshots).

A key indicator of this error is seeing this log message:

> "The executor with ID [some_id] was not found in the cluster but we didn't get a reason why. Marking the executor as failed. The executor may have been deleted but the driver missed the deletion event."

So one of the problems is running the missing-POD detection check even when only a single POD has changed, without having a full consistent snapshot of all the PODs (see `ExecutorPodsPollingSnapshotSource`).
The other problem is the race between the executor POD lifecycle manager and the scheduler backend: even with a full snapshot, the registration at the scheduler backend could precede the snapshot polling (and the processing of those polled snapshots).

### Does this PR introduce _any_ user-facing change?

Yes. When a POD is missing, the reason message explaining the executor's exit is extended with both timestamps (the polling time and the executor registration time), and the new config is mentioned as well.

### How was this patch tested?

The existing unit tests are extended.

Closes #30675 from attilapiros/SPARK-33711.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Holden Karau <hkarau@apple.com>
(commit: 6bd7a62)
The file was modifiedcore/src/main/scala/org/apache/spark/scheduler/cluster/ExecutorData.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotSuite.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreSuite.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshotsStoreImpl.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/DeterministicExecutorPodsSnapshotsStore.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManagerSuite.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsLifecycleManager.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsSnapshot.scala (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (diff)
The file was modifiedresource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala (diff)
Commit 7ff9ff153e2f19ca2e65a7386294fc07896b5975 by ruifengz
[SPARK-34045][ML] OneVsRestModel.transform should not call setter of submodels

### What changes were proposed in this pull request?
Use a temporary model in `OneVsRestModel.transform` to avoid calling setters on the model directly.

### Why are the changes needed?
The params of a model (and its submodels) should not be changed in `transform`.
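
An illustrative sketch of the pattern (plain Scala, not the ML API): apply setters to a temporary copy so the stored submodels keep their original params:

```scala
case class SubModel(rawPredictionCol: String) {
  def withRawPredictionCol(c: String): SubModel = copy(rawPredictionCol = c)
}

val stored = SubModel("rawPrediction")
val tmp = stored.withRawPredictionCol("tmpRaw")    // used only for this transform
assert(stored.rawPredictionCol == "rawPrediction") // stored model unchanged
```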

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a test suite.

Closes #31086 from zhengruifeng/ovr_transform_tmp_model.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(commit: 7ff9ff1)
The file was modifiedmllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala (diff)
The file was modifiedmllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala (diff)
Commit f7cbeec487f779a66fdedc2895b18600a0f1cf5b by wenchen
[SPARK-34074][SQL] Update stats only when table size changes

### What changes were proposed in this pull request?
Do not alter table stats if they are the same as in the catalog (as of the most recent retrieval).
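
A minimal sketch of the guard (assumed names):

```scala
// Only call the (expensive) external catalog when the newly computed size
// differs from what the catalog already holds.
def updateStatsIfChanged(catalogSize: Option[BigInt], newSize: BigInt)
                        (alterStats: BigInt => Unit): Unit = {
  if (!catalogSize.contains(newSize)) alterStats(newSize)
}
```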

### Why are the changes needed?
The changes reduce the number of calls to Hive external catalog.

### Does this PR introduce _any_ user-facing change?
Should not.

### How was this patch tested?
By running the modified test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableAddPartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #31135 from MaxGekk/optimize-updateTableStats.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f7cbeec)
The file was modifiedsql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableDropPartitionSuite.scala (diff)
The file was modifiedsql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableAddPartitionSuite.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandTestUtils.scala (diff)
Commit 02a17e92f1d89735094385f333d87e7a53c73e30 by gurwls223
[SPARK-28646][SQL][FOLLOWUP] Add legacy config for allowing parameterless count

### What changes were proposed in this pull request?

Add a legacy configuration `spark.sql.legacy.allowParameterlessCount` in case users need the parameterless count.
This is a follow-up for https://github.com/apache/spark/pull/30541.
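
A hypothetical usage sketch in spark-shell (the flag name is taken from this PR's description):

```scala
scala> spark.conf.set("spark.sql.legacy.allowParameterlessCount", true)
scala> sql("SELECT count() FROM range(10)").show() // allowed again with the flag
```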

### Why are the changes needed?

Some users may depend on the legacy behavior, so we need a legacy flag for it.

### Does this PR introduce _any_ user-facing change?

Yes, adding a legacy flag `spark.sql.legacy.allowParameterlessCount`.

### How was this patch tested?

Unit tests

Closes #31143 from gengliangwang/countLegacy.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 02a17e9)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/inputs/count.sql (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/results/count.sql.out (diff)
Commit 99f84892a56446b41cb36d32db7ff92460dc96da by wenchen
[SPARK-34003][SQL][FOLLOWUP] Avoid pushing modified Char/Varchar sort attributes into aggregate for existing ones

### What changes were proposed in this pull request?

In https://github.com/apache/spark/commit/0f8e5dd445b03161a27893ba714db57919d8bcab, we partially fixed the rule conflicts between `PaddingAndLengthCheckForCharVarchar` and `ResolveAggregateFunctions`, but an error still occurs for SQL like `SELECT substr(v, 1, 2), sum(i) FROM t GROUP BY v ORDER BY substr(v, 1, 2)`:

```
[info]   Failed to analyze query: org.apache.spark.sql.AnalysisException: expression 'spark_catalog.default.t.`v`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
[info]   Project [substr(v, 1, 2)#100, sum(i)#101L]
[info]   +- Sort [aggOrder#102 ASC NULLS FIRST], true
[info]      +- !Aggregate [v#106], [substr(v#106, 1, 2) AS substr(v, 1, 2)#100, sum(cast(i#98 as bigint)) AS sum(i)#101L, substr(v#103, 1, 2) AS aggOrder#102
[info]         +- SubqueryAlias spark_catalog.default.t
[info]            +- Project [if ((length(v#97) <= 3)) v#97 else if ((length(rtrim(v#97, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#97) as string),  exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#97, None), 3,  ) AS v#106, i#98]
[info]               +- Relation[v#97,i#98] parquet
[info]
[info]   Project [substr(v, 1, 2)#100, sum(i)#101L]
[info]   +- Sort [aggOrder#102 ASC NULLS FIRST], true
[info]      +- !Aggregate [v#106], [substr(v#106, 1, 2) AS substr(v, 1, 2)#100, sum(cast(i#98 as bigint)) AS sum(i)#101L, substr(v#103, 1, 2) AS aggOrder#102
[info]         +- SubqueryAlias spark_catalog.default.t
[info]            +- Project [if ((length(v#97) <= 3)) v#97 else if ((length(rtrim(v#97, None)) > 3)) cast(raise_error(concat(input string of length , cast(length(v#97) as string),  exceeds varchar type length limitation: 3)) as string) else rpad(rtrim(v#97, None), 3,  ) AS v#106, i#98]
[info]               +- Relation[v#97,i#98] parquet

```
In this PR, we try to resolve the full attributes, including the original `Aggregate` expressions and the candidates in `SortOrder`, together, then use the newly re-resolved `Aggregate` expressions to determine which candidate in the `SortOrder` shall be pushed. This avoids the mismatch that occurs without this change, where the expressions returned by `executeSameContext` change once `PaddingAndLengthCheckForCharVarchar` takes effect. With this change, the expressions can be matched correctly.

For those that remain unmatched, we need to look recursively into the children to find char/varchars, instead of checking only the expression itself.

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added new tests.

Closes #31129 from yaooqinn/SPARK-34003-F.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 99f8489)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q85.sf100/simplified.txt (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q85/explain.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q85/simplified.txt (diff)
The file was modifiedsql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q85.sf100/explain.txt (diff)
Commit a4b70758d3dfe2d59fb3800321ffb2450206c26f by gurwls223
[SPARK-34053][INFRA][FOLLOW-UP] Disables canceling push/schedule workflows

Changes the configuration of the cancel workflow action so that schedule/push events are skipped from canceling. This has the effect that duplicates of all direct pushes (master merges or direct pushes to the Spark repository) are not cancelled.

### What changes were proposed in this pull request?

Update the CI cancel policy to skip direct pushes. Duplicates will only be cancelled for Pull Requests.

### Why are the changes needed?

[Apparently](https://github.com/apache/spark/pull/31104#issuecomment-758318463), the behavior of the cancel action, which also cancels duplicate master builds, is too aggressive for the Spark community. This change spares merges to master and scheduled builds from duplicate checking (as a result, all merges to master will always be built to completion).

The previous behavior of the action was that, in case of subsequent merges to master, only the latest one was guaranteed to complete. Other builds could be cancelled before completion to save job queue capacity.

### Does this PR introduce _any_ user-facing change?

No, unless the master builds were somehow user-facing (which is unlikely given the ASF release policy).
A potential downside is that a failure a specific master merge would have surfaced could instead show up in a later build, which could make investigating the root cause a bit more difficult, because the regression could have been introduced in any commit between two completed builds. But this at most affects the timeline close to a release, not the release itself.

### How was this patch tested?

This configuration parameter was tested earlier in Airflow. We used it initially, but since our master builds are heavy, our PRs already run extensive tests, and investigating failed builds is not especially difficult, we found that limiting the strain on GitHub Actions by cancelling master builds was more important for the health of the whole ASF community, and we removed that configuration.

Tested in https://github.com/potiuk/spark/runs/1688506527?check_suite_focus=true#step:2:46 where the action found other master builds in progress but did not add them as candidates to cancel.

Closes #31153 from potiuk/skip-schedule-push-branches.

Authored-by: Jarek Potiuk <potiuk@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: a4b7075)
The file was modified.github/workflows/cancel_duplicate_workflow_runs.yml (diff)
Commit 6c047958f9fcf4cac848695915deea289c65ddc1 by wenchen
[SPARK-34084][SQL] Fix auto updating of table stats in `ALTER TABLE .. ADD PARTITION`

### What changes were proposed in this pull request?
Fix an issue in `ALTER TABLE .. ADD PARTITION` which happens when:
- A table doesn't have stats
- `spark.sql.statistics.size.autoUpdate.enabled` is `true`

In that case, `ALTER TABLE .. ADD PARTITION` does not update table stats automatically.

### Why are the changes needed?
The changes fix the issue demonstrated by the example:
```sql
spark-sql> create table tbl (col0 int, part int) partitioned by (part);
spark-sql> insert into tbl partition (part = 0) select 0;
spark-sql> set spark.sql.statistics.size.autoUpdate.enabled=true;
spark-sql> alter table tbl add partition (part = 1);
```
the `add partition` command should update table stats but it does not. There are no stats in the output of:
```
spark-sql> describe table extended tbl;
```

### Does this PR introduce _any_ user-facing change?
Yes. After the changes, `ALTER TABLE .. ADD PARTITION` updates stats even when the table did not have them before the command:
```sql
spark-sql> alter table tbl add partition (part = 1);
spark-sql> describe table extended tbl;
col0 int NULL
part int NULL
# Partition Information
# col_name data_type comment
part int NULL

# Detailed Table Information
...
Statistics 2 bytes
```

### How was this patch tested?
By running new UT and existing test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
```

Closes #31149 from MaxGekk/fix-stats-in-add-partition.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 6c04795)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandTestUtils.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableAddPartitionSuite.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
Commit 0099715aaee812e1a6dc55d106c2070077164dd4 by wenchen
[SPARK-34091][SQL] Shuffle batch fetch should be able to disable after it's been enabled

### What changes were proposed in this pull request?

Fix how the shuffle batch fetch setting is applied in `ShuffledRowRDD`.

### Why are the changes needed?

Currently, we cannot disable shuffle batch fetch once it has been enabled. This PR fixes the issue so that `ShuffledRowRDD` respects `spark.sql.adaptive.fetchShuffleBlocksInBatch` at runtime.
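
A minimal logic sketch of the fix (illustrative, not the actual `ShuffledRowRDD` code):

```scala
// Decide batch fetch from the current conf value on every read, instead of
// latching an "enabled" flag that can never be turned off afterwards.
def useBatchFetch(confEnabledNow: Boolean, readerSupportsBatch: Boolean): Boolean =
  confEnabledNow && readerSupportsBatch
```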

### Does this PR introduce _any_ user-facing change?

Yes. Before this PR, users could not disable batch fetch once it had been enabled. After this PR, they can.

### How was this patch tested?

Added unit test.

Closes #31155 from Ngone51/fix-batchfetch-set-issue.

Authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 0099715)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/ShuffledRowRDD.scala (diff)
Commit 65222b705109b91b89b1aa5ca811bf667cc422cd by mridulatgmail.com
[SPARK-34069][CORE] Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL

### What changes were proposed in this pull request?

Add a `shouldInterruptTaskThread` check when killing barrier tasks.

### Why are the changes needed?

We should interrupt the task thread if the user sets the local property `SPARK_JOB_INTERRUPT_ON_CANCEL` to true.
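
For reference, `SPARK_JOB_INTERRUPT_ON_CANCEL` corresponds to the `spark.job.interruptOnCancel` local property, which can be set per job group (assuming an existing `SparkContext` `sc`):

```scala
// Interrupt task threads (after this fix, including barrier tasks) when the
// job group is cancelled.
sc.setJobGroup("my-group", "my description", interruptOnCancel = true)
// ...equivalent to setting the local property directly:
sc.setLocalProperty("spark.job.interruptOnCancel", "true")
```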

### Does this PR introduce _any_ user-facing change?

Yes, tasks will be interrupted if the user sets `SPARK_JOB_INTERRUPT_ON_CANCEL` to true.

### How was this patch tested?

Add test.

Closes #31127 from ulysses-you/SPARK-34069.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(commit: 65222b7)
The file was modifiedcore/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala (diff)
Commit ad8e40e2ab80d0fb5a88fcc991c15267316b57d7 by gurwls223
[SPARK-32338][SQL][PYSPARK][FOLLOW-UP][TEST] Add more tests for slice function

### What changes were proposed in this pull request?

This PR is a follow-up of #29138 and #29195 to add more tests for `slice` function.

### Why are the changes needed?

The original PRs are missing tests with column-based arguments instead of literals.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests and existing tests.

Closes #31159 from ueshin/issues/SPARK-32338/slice_tests.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: ad8e40e)
The file was modifiedpython/pyspark/sql/tests/test_functions.py (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala (diff)
Commit b0759dc600fc7b0e2724a2efeb60c59549f90ceb by gurwls223
[SPARK-34090][SS] Cache HadoopDelegationTokenManager.isServiceEnabled result used in KafkaTokenUtil.needTokenUpdate

### What changes were proposed in this pull request?
`HadoopDelegationTokenManager.isServiceEnabled` is quite a time-consuming operation, and it is called frequently from `KafkaTokenUtil.needTokenUpdate`, which slowed down Kafka processing heavily. SPARK-33635 changed the if condition to overcome this issue when no delegation token is used, but with delegation tokens the problem still exists.

In this PR I'm caching the `HadoopDelegationTokenManager.isServiceEnabled` result in the `KafkaDataConsumer` instances, which solves the issue. An alternative would be to cache the result inside `HadoopDelegationTokenManager`, but since it is an object function and several applications run inside one JVM, different `SparkConf` instances will arrive; caching the result per `SparkConf` instance would be overkill.
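
A minimal sketch of the caching approach (illustrative names): compute the expensive check at most once per consumer instance:

```scala
class ConsumerTokenCheck(isServiceEnabled: () => Boolean) {
  // Evaluated lazily, at most once per KafkaDataConsumer-like instance,
  // instead of on every needTokenUpdate call.
  lazy val serviceEnabled: Boolean = isServiceEnabled()
}
```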

### Why are the changes needed?
Kafka stream processing is slow with delegation token.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
* Existing unit tests
* In a Kafka-to-Kafka live query I've double-checked that the `HadoopDelegationTokenManager.isServiceEnabled` call is executed only when a new `KafkaDataConsumer` is created (a new delegation token arrives or a task fails).

Closes #31154 from gaborgsomogyi/SPARK-34090.

Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: b0759dc)
The file was modifiedexternal/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/consumer/KafkaDataConsumer.scala (diff)
The file was modifiedexternal/kafka-0-10-token-provider/src/main/scala/org/apache/spark/kafka010/KafkaTokenUtil.scala (diff)
The file was modifiedexternal/kafka-0-10-token-provider/src/test/scala/org/apache/spark/kafka010/KafkaTokenUtilSuite.scala (diff)
Commit 861f8bb5fb82e53a223ae121737fd6d54ab8ba52 by wenchen
[SPARK-34071][SQL][TESTS] Check stats of cached v1 tables after altering

### What changes were proposed in this pull request?
Port the test added by https://github.com/apache/spark/pull/31112 to:
1. v1 In-Memory catalog for `ALTER TABLE .. DROP PARTITION`
2. v1 In-Memory and Hive external catalogs for `ALTER TABLE .. ADD PARTITION`
3. v1 In-Memory and Hive external catalogs for `ALTER TABLE .. RENAME PARTITION`

### Why are the changes needed?
To improve test coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableDropPartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableRenamePartitionSuite"
```

Closes #31131 from MaxGekk/cache-stats-tests.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 861f8bb)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandTestUtils.scala (diff)
The file was modifiedsql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableDropPartitionSuite.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableAddPartitionSuite.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableRenamePartitionSuite.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableDropPartitionSuite.scala (diff)
Commit 04f031acb38f9473802d2890b82dd6db66e052be by wenchen
[SPARK-34086][SQL] RaiseError generates too much code and may fail codegen in length check for char varchar

### What changes were proposed in this pull request?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/133928/testReport/org.apache.spark.sql.execution/LogicalPlanTagInSparkPlanSuite/q41/

We can reduce more than 8000 bytes by removing the unnecessary CONCAT expression.

With this fix, for q41 in TPCDS with [Using TPCDS original definitions for char/varchar columns](https://github.com/apache/spark/pull/31012) applied, we can reduce the stage codegen size from 22523 to 14369:
```
14369  - 22523 = - 8154
```

### Why are the changes needed?

Fix the performance regression (other improvements are still needed for q41 to work well); there would be a huge performance regression if codegen fails.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Modified UTs.

Closes #31150 from yaooqinn/SPARK-34086.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 04f031a)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharVarcharUtils.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
Commit f64297d290ec7ce7df35e71634b10239b5f1a56b by gurwls223
[SPARK-32850][TEST][FOLLOWUP] Fix flaky test due to timeout

### What changes were proposed in this pull request?

Increase test timeout.

### Why are the changes needed?

It's more reasonable to use 60s instead of 6s, since many places in the code use 60s for `TestUtils.waitUntilExecutorsUp`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass existing tests.

Closes #31166 from ulysses-you/SPARK-32850-FOLLOWUP.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: f64297d)
The file was modifiedcore/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala (diff)
Commit f1b21ba50538bb2db9f958136bd15b43dc40ad5b by wenchen
[SPARK-34064][SQL] Cancel the running broadcast sub-jobs when SQL statement is cancelled

### What changes were proposed in this pull request?
#24595 introduced `private val runId: UUID = UUID.randomUUID` in `BroadcastExchangeExec` to cancel the broadcast execution in the Future when a timeout happens. Since the `runId` is a random UUID instead of inheriting the job group id, the broadcast sub-jobs keep executing after a SQL statement is cancelled. This PR uses the job group id of the outer thread as the `runId`, so that these broadcast sub-jobs are aborted when the SQL statement is cancelled.
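
A minimal sketch of the idea (assumed names): derive the broadcast run id from the caller's job group when one is set, so cancelling the job group also reaches the broadcast sub-jobs:

```scala
def broadcastRunId(callerJobGroupId: Option[String]): String =
  callerJobGroupId.getOrElse(java.util.UUID.randomUUID.toString)
```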

### Why are the changes needed?
When broadcasting a table takes too long and the SQL statement is cancelled, the background Spark job keeps running and wastes resources.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manually tested.
Broadcasting a table is too fast to cancel in a UT, but it is easy to verify manually:
1. Start a Spark thrift-server with less resource in YARN.
2. When the driver is running but no executors are launched, submit a SQL which will broadcast tables from beeline.
3. Cancel the SQL in beeline

Without the patch, broadcast sub-jobs won't be cancelled.
![Screen Shot 2021-01-11 at 12 03 13 PM](https://user-images.githubusercontent.com/1853780/104150975-ab024b00-5416-11eb-8bf9-b5167bdad80a.png)

With this patch, broadcast sub-jobs will be cancelled.
![Screen Shot 2021-01-11 at 11 43 40 AM](https://user-images.githubusercontent.com/1853780/104150994-be151b00-5416-11eb-80ff-313d423c8a2e.png)

Closes #31119 from LantaoJin/SPARK-34064.

Authored-by: LantaoJin <jinlantao@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f1b21ba)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala (diff)
Commit 62d82b5b270c2eea27ba026e66b7d6598bbaa0a6 by wenchen
[SPARK-34076][SQL] SQLContext.dropTempTable fails if cache is non-empty

### What changes were proposed in this pull request?

This changes `CatalogImpl.dropTempView` and `CatalogImpl.dropGlobalTempView` to use the analyzed logical plan instead of `viewDef`, which is unresolved.

### Why are the changes needed?

Currently, `CatalogImpl.dropTempView` is implemented as following:

```scala
override def dropTempView(viewName: String): Boolean = {
  sparkSession.sessionState.catalog.getTempView(viewName).exists { viewDef =>
    sparkSession.sharedState.cacheManager.uncacheQuery(
      sparkSession, viewDef, cascade = false)
    sessionCatalog.dropTempView(viewName)
  }
}
```

Here, the logical plan `viewDef` is not resolved, and when passed to `uncacheQuery` it can fail at the `sameResult` call, where canonicalized plans are compared. The error message looks like:
```
Invalid call to qualifier on unresolved object, tree: 'key
```

This can be reproduced via:
```scala
sql(s"CREATE TEMPORARY VIEW $v AS SELECT key FROM src LIMIT 10")
sql(s"CREATE TABLE $t AS SELECT * FROM src")
sql(s"CACHE TABLE $t")
dropTempTable(v)
```
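
A minimal sketch of the fix direction, using the internal APIs shown above (exact call sites may differ): analyze the view definition before handing it to the cache manager, so `sameResult` compares resolved plans:

```scala
val analyzed = sparkSession.sessionState.executePlan(viewDef).analyzed
sparkSession.sharedState.cacheManager.uncacheQuery(
  sparkSession, analyzed, cascade = false)
```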

### Does this PR introduce _any_ user-facing change?

The only user-facing change is that `SQLContext.dropTempTable`, which previously could fail in the above scenario, now works.

### How was this patch tested?

Added new unit tests.

Closes #31136 from sunchao/SPARK-34076.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 62d82b5)
The file was modifiedsql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala (diff)
Commit bd5039fc3542dc4ac96bea28639f1896d7919388 by tgraves
[SPARK-33741][CORE] Add min threshold time speculation config

### What changes were proposed in this pull request?
Add min threshold time speculation config

### Why are the changes needed?
When we turn on speculation with the default configs, the last 10% of tasks are subject to speculation. There are a lot of stages that run for only a few seconds to a few minutes, and in general we don't want to speculate tasks that run within some minimum threshold. Setting a minimum threshold for speculation gives us better control over speculative tasks.
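
A minimal sketch of the gating logic (illustrative, not the exact `TaskSetManager` code):

```scala
// A task becomes a speculation candidate only once its runtime exceeds both
// the quantile-based threshold and the configured minimum duration.
def speculatable(runtimeMs: Long, quantileThresholdMs: Long, minDurationMs: Long): Boolean =
  runtimeMs > math.max(quantileThresholdMs, minDurationMs)
```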

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit test

Closes #30710 from redsanket/SPARK-33741.

Lead-authored-by: schintap <schintap@verizonmedia.com>
Co-authored-by: Sanket Chintapalli <chintapalli.sanketreddy@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
(commit: bd5039f)
The file was modifieddocs/configuration.md (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/scheduler/Schedulable.scala (diff)
The file was modifiedcore/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/scheduler/Pool.scala (diff)
Commit 8c5fecda7320dce32fc36981c41407f4272a7748 by srowen
[SPARK-34070][CORE][SQL] Replaces find and emptiness check with exists

### What changes were proposed in this pull request?
This PR uses `exists` to simplify `find` + emptiness check; it is semantically equivalent but looks simpler.

**Before**

```
seq.find(p).isDefined

or

seq.find(p).isEmpty
```

**After**

```
seq.exists(p)

or

!seq.exists(p)
```
### Why are the changes needed?
Code simplification.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass the Jenkins or GitHub Action

Closes #31130 from LuciferYang/SPARK-34070.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 8c5fecd)
The file was modifiedcore/src/main/scala/org/apache/spark/storage/BlockManagerDecommissioner.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (diff)
Commit 8b1ba233f1383c395b859232c8a7bdf974a1dee2 by srowen
[SPARK-34068][CORE][SQL][MLLIB][GRAPHX] Remove redundant collection conversion

### What changes were proposed in this pull request?
There are some redundant collection conversions that can be removed; for version compatibility, they were cleaned up and verified with the Scala 2.13 profile.

### Why are the changes needed?
Remove redundant collection conversion

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass the Jenkins or GitHub  Action
- Manual test `core`, `graphx`, `mllib`, `mllib-local`, `sql`, `yarn`,`kafka-0-10` in Scala 2.13 passed

Closes #31125 from LuciferYang/SPARK-34068.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 8b1ba23)
The file was modifiedexternal/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala (diff)
The file was modifiedmllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala (diff)
The file was modifiedsql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/ReflectionUtils.scala (diff)
The file was modifiedsql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala (diff)
The file was modifiedgraphx/src/main/scala/org/apache/spark/graphx/lib/SVDPlusPlus.scala (diff)
The file was modifiedmllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala (diff)
The file was modifiedmllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala (diff)
The file was modifiedmllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala (diff)
The file was modifiedmllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modifiedresource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (diff)
The file was modifiedmllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala (diff)
The file was modifiedcore/src/main/scala/org/apache/spark/executor/ExecutorMetricsSource.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala (diff)
Commit 62d8466c74ab97f92c1a60f4a6b54d1ae4cc7826 by srowen
[SPARK-34051][SQL] Support 32-bit unicode escape in string literals

### What changes were proposed in this pull request?
This PR adds a feature that supports 32-bit Unicode escapes in string literals, as PostgreSQL and some modern programming languages do (e.g., Python 3, C++11, and Rust).
In addition to the existing 16-bit Unicode escapes like `"\u0041"`, users can express Unicode characters like `"\U00020BB7"` with this change.
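
For example (a spark-shell sketch), U+20BB7 can now be written directly instead of as a surrogate pair:

```scala
scala> sql("SELECT '\\U00020BB7' AS c").show()   // new 32-bit escape
scala> sql("SELECT '\\uD842\\uDFB7' AS c").show() // pre-existing surrogate-pair form
```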

### Why are the changes needed?
Users can express Unicode characters directly, without surrogate pairs.

### Does this PR introduce _any_ user-facing change?
Yes. Users can express all Unicode characters directly.

### How was this patch tested?
Added new assertions to the existing test case.

Closes #31096 from sarutak/32-bit-unicode-escape.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 62d8466)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala (diff)
Commit b7da108cae2d354b972b0825e8b0bae1d5d300e5 by srowen
[SPARK-33690][SQL][FOLLOWUP] Escape further meta-characters in showString

### What changes were proposed in this pull request?

This is a follow-up PR for SPARK-33690 (#30647).
In addition to the original PR, this PR intends to escape the following meta-characters in `Dataset#showString`.

  * `\r` (carriage return)
  * `\f` (form feed)
  * `\b` (backspace)
  * `\u000B` (vertical tab)
  * `\u0007` (bell)

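As a rough sketch of the kind of replacement involved (the replacement strings below are assumptions, not necessarily the exact ones used in `Dataset.scala`):

```scala
// Replace each meta-character with a visible escape sequence so it cannot
// break (or beep through) the tabular layout of show().
def escapeMetaCharacters(str: String): String = {
  str.replaceAll("\n", "\\\\n")    // newline
    .replaceAll("\r", "\\\\r")     // carriage return
    .replaceAll("\t", "\\\\t")     // tab
    .replaceAll("\f", "\\\\f")     // form feed
    .replaceAll("\b", "\\\\b")     // backspace
    .replaceAll("\u000B", "\\\\v") // vertical tab
    .replaceAll("\u0007", "\\\\a") // bell
}
```
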
### Why are the changes needed?

To avoid breaking the layout of `Dataset#showString`.
`\u0007` does not break the layout of `Dataset#showString`, but it's noisy (it beeps for each row), so it should also be escaped.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Modified the existing tests.
I also built the documentation and checked the generated HTML for `sql-migration-guide.md`.

Closes #31144 from sarutak/escape-metacharacters-in-getRows.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: b7da108)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
Commit 9e93fdb146b7af09922e7c11ce714bcc7e658115 by gurwls223
[SPARK-34103][INFRA] Fix MiMaExcludes by moving SPARK-23429 from 2.4 to 3.0

### What changes were proposed in this pull request?

This PR aims to fix the `MimaExcludes` rules by moving SPARK-23429 from 2.4 to 3.0.

### Why are the changes needed?

SPARK-23429 was added in Apache Spark 3.0.0.
This should land on `master`, `branch-3.1`, and `branch-3.0`.
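
For context, `project/MimaExcludes.scala` groups exclusion rules into per-release lists, so the fix is relocating entries of roughly this shape from the 2.4 list to the 3.0 list (the filter type and name below are placeholders, not the actual SPARK-23429 rules):

```scala
import com.typesafe.tools.mima.core._

// Placeholder entry illustrating the ProblemFilters.exclude pattern used
// throughout MimaExcludes.scala; the real rules name SPARK-23429's APIs.
lazy val v30excludes = Seq(
  ProblemFilters.exclude[DirectMissingMethodProblem](
    "org.apache.spark.example.SomeClass.someChangedMethod")
)
```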

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the MiMa rule.

Closes #31174 from dongjoon-hyun/SPARK-34103.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 9e93fdb)
The file was modified project/MimaExcludes.scala (diff)
Commit 467d7589737d2430d09f1ffbd33bf801d179f990 by gurwls223
[SPARK-34075][SQL][CORE] Hidden directories are being listed for partition inference

### What changes were proposed in this pull request?

Fix a regression from https://github.com/apache/spark/pull/29959.

In Spark, the following file paths are considered hidden and are ignored on file reads:
1. starts with "_" and doesn't contain "="
2. starts with "."

However, after the refactoring in https://github.com/apache/spark/pull/29959, hidden paths are no longer filtered out during partition inference: https://github.com/apache/spark/pull/29959/files#r556432426

This PR fixes the bug. To achieve that, the method `InMemoryFileIndex.shouldFilterOut` is refactored as `HadoopFSUtils.shouldFilterOutPathName`.
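
A minimal sketch of that rule (simplified; I believe the production method also special-cases names such as Parquet's `_metadata` summary files):

```scala
// Returns true for path names Spark treats as hidden and skips while listing.
def shouldFilterOutPathName(pathName: String): Boolean =
  pathName.startsWith(".") ||
    (pathName.startsWith("_") && !pathName.contains("="))

assert(shouldFilterOutPathName(".hidden"))
assert(shouldFilterOutPathName("_temporary"))
assert(!shouldFilterOutPathName("col=1"))   // partition directory, kept
assert(!shouldFilterOutPathName("_col=1"))  // "=" exempts it from the "_" rule
```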

### Why are the changes needed?

Bugfix

### Does this PR introduce _any_ user-facing change?

Yes, it fixes a bug for reading file paths with partitions.

### How was this patch tested?

Unit test

Closes #31169 from gengliangwang/fileListingBug.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 467d758)
The file was added core/src/test/scala/org/apache/spark/util/HadoopFSUtilsSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/HadoopFSUtils.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala (diff)
Commit d3ea308c8fe914085d38b96bbd32442fc8ebd3bd by wenchen
[SPARK-34081][SQL] Only pushdown LeftSemi/LeftAnti over Aggregate if join can be planned as broadcast join

### What changes were proposed in this pull request?

We should not push down LeftSemi/LeftAnti over an Aggregate in some cases, for example:

```scala
spark.range(50000000L).selectExpr("id % 10000 as a", "id % 10000 as b").write.saveAsTable("t1")
spark.range(40000000L).selectExpr("id % 8000 as c", "id % 8000 as d").write.saveAsTable("t2")
spark.sql("SELECT distinct a, b FROM t1 INTERSECT SELECT distinct c, d FROM t2").explain
```

Before this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[a#16L, b#17L], functions=[])
   +- HashAggregate(keys=[a#16L, b#17L], functions=[])
      +- HashAggregate(keys=[a#16L, b#17L], functions=[])
         +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#72]
            +- HashAggregate(keys=[a#16L, b#17L], functions=[])
               +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L)], LeftSemi
                  :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS FIRST], false, 0
                  :  +- Exchange hashpartitioning(coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, [id=#65]
                  :     +- FileScan parquet default.t1[a#16L,b#17L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
                  +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS FIRST], false, 0
                     +- Exchange hashpartitioning(coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, [id=#66]
                        +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                           +- Exchange hashpartitioning(c#18L, d#19L, 5), ENSURE_REQUIREMENTS, [id=#61]
                              +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                                 +- FileScan parquet default.t2[c#18L,d#19L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c:bigint,d:bigint>
```

After this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- HashAggregate(keys=[a#16L, b#17L], functions=[])
   +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#74]
      +- HashAggregate(keys=[a#16L, b#17L], functions=[])
         +- SortMergeJoin [coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L)], [coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L)], LeftSemi
            :- Sort [coalesce(a#16L, 0) ASC NULLS FIRST, isnull(a#16L) ASC NULLS FIRST, coalesce(b#17L, 0) ASC NULLS FIRST, isnull(b#17L) ASC NULLS FIRST], false, 0
            :  +- Exchange hashpartitioning(coalesce(a#16L, 0), isnull(a#16L), coalesce(b#17L, 0), isnull(b#17L), 5), ENSURE_REQUIREMENTS, [id=#67]
            :     +- HashAggregate(keys=[a#16L, b#17L], functions=[])
            :        +- Exchange hashpartitioning(a#16L, b#17L, 5), ENSURE_REQUIREMENTS, [id=#61]
            :           +- HashAggregate(keys=[a#16L, b#17L], functions=[])
            :              +- FileScan parquet default.t1[a#16L,b#17L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<a:bigint,b:bigint>
            +- Sort [coalesce(c#18L, 0) ASC NULLS FIRST, isnull(c#18L) ASC NULLS FIRST, coalesce(d#19L, 0) ASC NULLS FIRST, isnull(d#19L) ASC NULLS FIRST], false, 0
               +- Exchange hashpartitioning(coalesce(c#18L, 0), isnull(c#18L), coalesce(d#19L, 0), isnull(d#19L), 5), ENSURE_REQUIREMENTS, [id=#68]
                  +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                     +- Exchange hashpartitioning(c#18L, d#19L, 5), ENSURE_REQUIREMENTS, [id=#63]
                        +- HashAggregate(keys=[c#18L, d#19L], functions=[])
                           +- FileScan parquet default.t2[c#18L,d#19L] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/spark/spark-warehouse/org.apache.spark.sql.Data..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c:bigint,d:bigint>
```

### Why are the changes needed?

1. Pushing LeftSemi/LeftAnti down over an Aggregate can hurt performance.
2. It removes the user-added DISTINCT operator, e.g.: [q38](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q38.sql), [q87](https://github.com/apache/spark/blob/master/sql/core/src/test/resources/tpcds/q87.sql).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test and benchmark test.

SQL | Before this PR (seconds) | After this PR (seconds)
-- | -- | --
q14a | 660 | 594
q14b | 660 | 600
q38 | 55 | 29
q87 | 66 | 35

Before this PR:
![image](https://user-images.githubusercontent.com/5399861/104452849-8789fc80-55de-11eb-88da-44059899f9a9.png)

After this PR:
![image](https://user-images.githubusercontent.com/5399861/104452899-9a043600-55de-11eb-9286-d8f3a23ca3b8.png)

Closes #31145 from wangyum/SPARK-34081.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: d3ea308)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q87.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14b.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14b.sf100/simplified.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LeftSemiAntiJoinPushDownSuite.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q87.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q38.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a.sf100/simplified.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q38.sf100/simplified.txt (diff)
Commit 1288ad814e3547f57e8f84b2b1af9cc6b2321cda by yumwang
[SPARK-34067][SQL] PartitionPruning push down pruningHasBenefit function into insertPredicate function to decrease calculate time

### What changes were proposed in this pull request?
Push the `pruningHasBenefit` check in `PartitionPruning` down into the `insertPredicate` function to reduce computation time.

### Why are the changes needed?
To speed up the pruning decision in `PartitionPruning`.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing unit tests.

Closes #31122 from monkeyboy123/optimize-dynamic-pruning.

Authored-by: Dereck Li <monkeyboy.ljh@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(commit: 1288ad8)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala (diff)
Commit 92e5cfd58dc7dd5a5b1d3e2b6e93a03af5a07b67 by wenchen
[SPARK-33989][SQL] Strip auto-generated cast when using Cast.sql

### What changes were proposed in this pull request?

This PR aims to strip auto-generated casts. The main logic is:
1. Add a tag if a Cast is specified by the user.
2. Wrap `PrettyAttribute` in `usePrettyExpression`.

### Why are the changes needed?

Make SQL consistent with the DSL. Here is an inconsistent example before this PR:

```
-- output field name: FLOOR(1)
spark.emptyDataFrame.select(floor(lit(1)))

-- output field name: FLOOR(CAST(1 AS DOUBLE))
spark.sql("select floor(1)")
```

Note that we don't remove the `Cast`, so the auto-generated `Cast` still works. The only changed place is `usePrettyExpression`, where we use `PrettyAttribute` to replace `Cast` for a better SQL string.

### Does this PR introduce _any_ user-facing change?

Yes, the default field name may change.

### How was this patch tested?

Added a test and passed existing tests.

Closes #31034 from ulysses-you/SPARK-33989.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 92e5cfd)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q92/explain.txt (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Column.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/numeric.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/comparator.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-except.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/promoteStrings.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/null-propagation.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q81/simplified.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/division.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q92/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24b/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q24.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/implicitTypeCasts.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part3.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part1.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/case.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-except-all.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/window.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-count.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-having.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/caseWhenCoercion.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/ifCoercion.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-group-analytics.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-special-values.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q30.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/int2.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/array.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-group-by.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-inline-table.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/subexp-elimination.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-select_implicit.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q1.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q1/simplified.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by-filter.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24b/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/float8.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/inConversion.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/groupingsets.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/operators.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-pivot.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/decimalArithmeticOperations.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/having.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/random.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part2.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part3.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q92.sf100/simplified.txt (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveAliasesSuite.scala
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q30.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/int4.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-cross-join.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q30/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q1/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/booleanEquality.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/dateTimeOperations.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q24/simplified.txt (diff)
The file was modified python/pyspark/sql/session.py (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part4.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24a/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/decimalArithmeticOperations.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/regexp-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q24/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-select_having.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/int8.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q81.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/windowFrameCoercion.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part2.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q1.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24b.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/float4.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/pivot.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/binaryComparison.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/select_implicit.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q81/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24b.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q81.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-intersect-all.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-functions/sql-expression-schema.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-window.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q30/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/string-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/count.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-case.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q24a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/sql-compatibility-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-join.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q24.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/string-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/decimalPrecision.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/text.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/arrayJoin.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-join-empty-relation.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/misc-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/strings.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/stringCastAndExpressions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime-legacy.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/predicate-functions.sql.out (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/extract.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-natural-join.sql.out (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q92.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-outer-join.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/timestamp.sql.out (diff)
The file was modified python/pyspark/sql/context.py (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/interval.sql.out (diff)
Commit feedd1b44d571301e243ec97557f57b78fe188be by dhyun
[SPARK-33354][FOLLOWUP][DOC] Shorten the table width of ANSI compliance casting document

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/30260
It shortens the table width of ANSI compliance casting document.

### Why are the changes needed?

The table is too wide, and the doc site's UI breaks when the page is scrolled to the right.
![Screen Shot 2021-01-14 at 3 04 57 PM](https://user-images.githubusercontent.com/1097932/104565897-d2693b80-5601-11eb-9f93-5f603cfc94c1.png)

### Does this PR introduce _any_ user-facing change?

Minor document change

### How was this patch tested?

Built the doc site locally and previewed it:
![Screen Shot 2021-01-14 at 4 44 30 PM](https://user-images.githubusercontent.com/1097932/104565814-b2d21300-5601-11eb-94c4-78c785cda8ed.png)

Closes #31180 from gengliangwang/reviseAnsiDocStyle.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: feedd1b)
The file was modified docs/sql-ref-ansi-compliance.md (diff)
Commit 0e64a22b28c93474f5aa0ba02b9300d8eccfc53c by dhyun
[SPARK-34116][SS][TEST] Separate state store numKeys metric test and validate metrics after committing

### What changes were proposed in this pull request?

This patch proposes to pull the test of the `numKeys` metric into a separate test in `StateStoreSuite`.

### Why are the changes needed?

Right now in `StateStoreSuite`, the tests of get/put/remove/commit are mixed with the `numKeys` metric test. I found it flaky when testing with another `StateStore` implementation.

The current test logic is tightly bound to the in-memory map behavior of `HDFSBackedStateStore`. For example, a put can immediately show up in the `numKeys` metric.

But for a `StateStore` implementation relying on external storage, e.g. RocksDB, the metric might only be updated once the data is actually committed. And `StateStoreSuite` should be a common test suite for all kinds of `StateStore` implementations.

Specifically, we are also able to check these metrics after the state store is updated (committed). So I think we can refactor the test a little to make it easier to incorporate other `StateStore` implementations externally.

### Does this PR introduce _any_ user-facing change?

No, dev only.

### How was this patch tested?

Unit test.

Closes #31183 from viirya/SPARK-34116.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 0e64a22)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala (diff)
Commit 8ed23ed499ec7745a8e9bdc4c4fb3200fdb6c3c8 by gurwls223
[SPARK-34118][CORE][SQL] Replaces filter and check for emptiness with exists or forall

### What changes were proposed in this pull request?
This PR uses `exists` or `forall` to simplify `filter` + emptiness checks. The rewrites are semantically equivalent but simpler; see the example after the list. The rules are as follows:

- `seq.filter(p).size == 0` -> `!seq.exists(p)`
- `seq.filter(p).length > 0` -> `seq.exists(p)`
- `seq.filterNot(p).isEmpty` -> `seq.forall(p)`
- `seq.filterNot(p).nonEmpty` -> `!seq.forall(p)`
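
For example, under the first rule (the file names here are made up):

```scala
object FilterToExists {
  def main(args: Array[String]): Unit = {
    val files = Seq("part-00000.orc", "_SUCCESS")

    // Before: materializes an intermediate collection just to test emptiness.
    val anyOrcBefore = files.filter(_.endsWith(".orc")).size > 0

    // After: short-circuits on the first match without allocating a new Seq.
    val anyOrcAfter = files.exists(_.endsWith(".orc"))

    assert(anyOrcBefore == anyOrcAfter)
  }
}
```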

### Why are the changes needed?
Code simplification.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Jenkins or GitHub Actions.

Closes #31184 from LuciferYang/SPARK-34118.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 8ed23ed)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2UtilsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/api/r/RUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala (diff)
Commit adba2ec8f2e6e3279f7228482786a93bf206b8c2 by wenchen
[SPARK-34099][SQL] Keep dependents of v2 tables cached during cache refreshing in DS v2 commands

### What changes were proposed in this pull request?

This PR changes cache refreshing of v2 tables in v2 commands. In particular, v2 table dependents are no longer removed from the cache after this PR. Compared to the current implementation, we just clear the cached data of all dependents and keep them in the cache, so the next actions will fill in the cached data of the original v2 table and its dependents. In more detail:

1. Add a new method `recacheTable()` to `DataSourceV2Strategy` and pass it the exec node where the table needs to be recached. The new method uses `recacheByPlan` to refresh the data cache of v2 tables, and keeps table dependents cached **while clearing their cached data**.
2. Simplify `invalidateCache` (and rename it `invalidateTableCache`) by retargeting it at table cache invalidation only.
3. Modify a test for `REFRESH TABLE` and check that a v2 table dependent is still cached after refreshing the base table.

### Why are the changes needed?
1. This should improve user experience with table/view caching. For example, imagine that a user has cached a v2 table and a cached view based on that table, and then passes the table to an external library that drops/renames/adds partitions in the v2 table. Currently, the view ends up uncached even though the user never uncached it explicitly.
2. Improve code maintenance.
3. Reduce the number of calls to the Cache Manager when a table needs to be recached. Before the changes, `invalidateCache()` invoked the Cache Manager 3 times: `lookupCachedData()`, `uncacheQuery()` and `cacheQuery()`.
4. This should also speed up table recaching.

### Does this PR introduce _any_ user-facing change?
From the point of view of query result correctness, there are no behavior changes, but the changes might influence memory consumption and query execution time.

### How was this patch tested?
By running the existing test suites for the v2 add/drop/rename partition commands.

Closes #31172 from MaxGekk/dsv2-recache-table.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: adba2ec)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala (diff)
Commit bec80d7eec91ee83fcbb0e022b33bd526c80f423 by gurwls223
[SPARK-34101][SQL] Make spark-sql CLI configurable for the behavior of printing header by SET command

### What changes were proposed in this pull request?

This PR introduces a new property `spark.sql.cli.print.header` to let users change the behavior of printing header for spark-sql CLI by SET command.
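
An illustrative shape for such an entry in `SQLConf` (the doc string and default below are assumptions, not necessarily what the PR adds verbatim):

```scala
// Sketch of a boolean SQLConf entry gating header printing in spark-sql CLI.
val CLI_PRINT_HEADER = buildConf("spark.sql.cli.print.header")
  .doc("When true, spark-sql CLI prints the names of the columns.")
  .booleanConf
  .createWithDefault(false)
```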

### Why are the changes needed?

Like the Hive CLI, the spark-sql CLI accepts the `hive.cli.print.header` property, and we can change the behavior of printing the header.
But the spark-sql CLI doesn't allow users to change Hive-specific configurations dynamically via the SET command.
So, it's better to support changing the behavior via the SET command.

### Does this PR introduce _any_ user-facing change?

Yes. Users can dynamically change the behavior via the SET command.

### How was this patch tested?

I confirmed with the following commands/queries.
```
spark-sql> select (1) as a, (2) as b, (3) as c, (4) as d;
1 2 3 4
Time taken: 3.218 seconds, Fetched 1 row(s)
spark-sql> set spark.sql.cli.print.header=true;
key value
spark.sql.cli.print.header true
Time taken: 1.506 seconds, Fetched 1 row(s)
spark-sql> select (1) as a, (2) as b, (3) as c, (4) as d;
a b c d
1 2 3 4
Time taken: 0.79 seconds, Fetched 1 row(s)
```

Closes #31173 from sarutak/spark-sql-print-header.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: bec80d7)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala (diff)
Commit acd6c1271b3213f01ffbbcdf32a7750ebba75a63 by wenchen
[SPARK-34114][SQL] should not trim right for read-side char length check and padding

### What changes were proposed in this pull request?

On the read side, we should respect the original data instead of trimming it first.

Trimming first brings extra overhead on the codegen side, since it trims and pads the same field, and it's also unnecessary and a bug.
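
For context, a sketch of the read-side CHAR semantics involved (illustrative; table name made up):

```scala
// CHAR(5) values are padded to the declared length when read back; the fix
// ensures the stored value is padded as-is instead of being right-trimmed
// and then re-padded.
spark.sql("CREATE TABLE tc (c CHAR(5)) USING parquet")
spark.sql("INSERT INTO tc VALUES ('ab')")
spark.table("tc").head.getString(0)  // expected: "ab   " (length 5)
```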

### Why are the changes needed?

It's a bug fix, and it addresses a performance regression.

### Does this PR introduce _any_ user-facing change?

No.
### How was this patch tested?

New tests.

Closes #31181 from yaooqinn/SPARK-34114.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: acd6c12)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CharVarcharUtils.scala (diff)
Commit 00d43b1f829fb5f79f0355afbbacc804162648e5 by dhyun
[SPARK-32864][SQL] Support ORC forced positional evolution

### What changes were proposed in this pull request?
Add support for the `orc.force.positional.evolution` config, which forces ORC top-level column matching by position rather than by name.

This does work in Hive:
```
> set orc.force.positional.evolution;
+--------------------------------------+
|                 set                  |
+--------------------------------------+
| orc.force.positional.evolution=true  |
+--------------------------------------+
> create table t (c1 string, c2 string) stored as orc;
> insert into t values ('foo', 'bar');
> alter table t change c1 c3 string;
```
The ORC file in this case contains the original `c1` and `c2` columns, which don't match the metadata in HMS. But due to the positional evolution setting, Hive is capable of returning all the data:
```
> select * from t;
+--------+--------+
| t.c3   | t.c2   |
+--------+--------+
| foo    | bar    |
+--------+--------+
```
Without this PR, Spark returns `null`s for the renamed `c3` column.

After this PR, Spark returns the data in the `c3` column.
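
A sketch of exercising this from Spark (propagating the ORC option through the session conf is an assumption; it can also be supplied via the Hadoop configuration):

```scala
// Match ORC top-level columns by position, so the renamed column c3 still
// reads the data originally written as c1.
spark.conf.set("orc.force.positional.evolution", "true")
spark.sql("SELECT * FROM t").show()
// +---+---+
// | c3| c2|
// +---+---+
// |foo|bar|
// +---+---+
```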

### Why are the changes needed?
Hive/ORC does support it.

### Does this PR introduce _any_ user-facing change?
Yes, we will support `orc.force.positional.evolution`.

### How was this patch tested?
New UT.

Closes #29737 from peter-toth/SPARK-32864-support-orc-forced-positional-evolution.

Lead-authored-by: Peter Toth <peter.toth@gmail.com>
Co-authored-by: Peter Toth <ptoth@cloudera.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 00d43b1)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala (diff)