Changes

Summary

  1. [SPARK-33870][CORE] Enable spark.storage.replication.proactive by default (commit: 90d6f86) (details)
  2. [SPARK-33526][SQL][FOLLOWUP] Fix flaky test due to timeout and fix docs (commit: e853f06) (details)
  3. [SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns (commit: 2287f56) (details)
  4. [SPARK-31960][YARN][DOCS][FOLLOW-UP] Document the behaviour change of Hadoop's classpath propagation in migration guide (commit: d98c216) (details)
  5. [SPARK-33787][SQL] Allow partition purge for v2 tables (commit: 34bfb3a) (details)
  6. [SPARK-33497][SQL] Override maxRows in some LogicalPlan (commit: f421c17) (details)
  7. [SPARK-33858][SQL][TESTS] Unify v1 and v2 ALTER TABLE .. RENAME PARTITION tests (commit: cc23581) (details)
  8. [SPARK-33889][SQL] Fix NPE from `SHOW PARTITIONS` on V2 tables (commit: 303df64) (details)
  9. [SPARK-33847][SQL] Simplify CaseWhen if elseValue is None (commit: 7ffcfcf) (details)
  10. [SPARK-33891][DOCS][CORE] Update dynamic allocation related documents (commit: 47d1aa4) (details)
  11. [SPARK-32916][SHUFFLE][TEST-MAVEN][TEST-HADOOP2.7] Ensure the number of chunks in meta file and index file are equal (commit: 0677c39) (details)
  12. [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends (commit: 5c9b421) (details)
  13. [SPARK-33893][CORE] Exclude fallback block manager from executorList (commit: d467d81) (details)
  14. [SPARK-33877][SQL][FOLLOWUP] SQL reference documents for INSERT w/ a column list (commit: 368a2c3) (details)
  15. [SPARK-33835][CORE] Refactor AbstractCommandBuilder.buildJavaCommand: use firstNonEmpty (commit: 61881bb) (details)
  16. [SPARK-33659][SS] Document the current behavior for DataStreamWriter.toTable API (commit: 86c1cfc) (details)
  17. [SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands (commit: f1d3797) (details)
  18. [SPARK-33895][SQL] Char and Varchar fail in MetaOperation of ThriftServer (commit: d7dc42d) (details)
  19. [SPARK-33861][SQL] Simplify conditional in predicate (commit: 32d4a2b) (details)
  20. [SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ] (commit: 3e9821e) (details)
  21. [SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests (commit: 54a6784) (details)
  22. [SPARK-33892][SQL] Display char/varchar in DESC and SHOW CREATE TABLE (commit: 29cca68) (details)
  23. [SPARK-33900][WEBUI] Show shuffle read size / records correctly when only remotebytesread is available (commit: 700f5ab) (details)
  24. [SPARK-33857][SQL] Unify the default seed of random functions (commit: 9c30116) (details)
  25. [SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec (commit: 65a9ac2) (details)
  26. [SPARK-33084][CORE][SQL] Add jar support ivy path (commit: 10b6466) (details)
Commit 90d6f8600117da33bbb570dee6d893cfd8d35263 by dhyun
[SPARK-33870][CORE] Enable spark.storage.replication.proactive by default

### What changes were proposed in this pull request?

This PR aims to enable `spark.storage.replication.proactive` by default for Apache Spark 3.2.0.

### Why are the changes needed?

`spark.storage.replication.proactive` was added by SPARK-15355 in Apache Spark 2.2.0 and has been helpful when block manager loss occurs frequently, e.g., in K8s environments.
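
As a hedged reference, a job that prefers the old behavior can opt out explicitly (the key name comes from this PR; the rest of the conf is illustrative):

```scala
import org.apache.spark.SparkConf

// Opt back out of the new 3.2.0 default (app name illustrative).
val conf = new SparkConf()
  .setAppName("proactive-replication-opt-out")
  .set("spark.storage.replication.proactive", "false")
```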

### Does this PR introduce _any_ user-facing change?

Yes, this will make the Spark jobs more robust.

### How was this patch tested?

Pass the existing UTs.

Closes #30876 from dongjoon-hyun/SPARK-33870.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 90d6f86)
The file was modified docs/core-migration-guide.md (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
Commit e853f068f6c8f9c2aebad37115b0fad1191650ee by dhyun
[SPARK-33526][SQL][FOLLOWUP] Fix flaky test due to timeout and fix docs

### What changes were proposed in this pull request?

Make test stable and fix docs.

### Why are the changes needed?

The query sometimes times out because we set another config after setting the query timeout.
```
sbt.ForkMain$ForkError: java.sql.SQLTimeoutException: Query timed out after 0 seconds
at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:381)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$13(ThriftServerWithSparkContextSuite.scala:107)
at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$13$adapted(ThriftServerWithSparkContextSuite.scala:106)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$12(ThriftServerWithSparkContextSuite.scala:106)
at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite.$anonfun$$init$$12$adapted(ThriftServerWithSparkContextSuite.scala:89)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withJdbcStatement$4(SharedThriftServer.scala:95)
at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.$anonfun$withJdbcStatement$4$adapted(SharedThriftServer.scala:95)
```

The reason is:
1. We execute `set spark.sql.thriftServer.queryTimeout = 1`, so all subsequent operations are limited to 1s.
2. We execute `set spark.sql.thriftServer.interruptOnCancel = false/true`. This SQL hits the timeout exception if anything hangs for more than 1s, which is not what we expect.

Resetting the timeout before step 2 avoids this problem.
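
A minimal sketch of the stabilized ordering over JDBC (the URL is a placeholder; only the `SET` statements come from this PR):

```scala
import java.sql.DriverManager

// Hypothetical ThriftServer endpoint; replace with the suite's real URL.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()
try {
  stmt.execute("SET spark.sql.thriftServer.queryTimeout=1")
  // ... statements that exercise the 1s timeout ...
  stmt.execute("SET spark.sql.thriftServer.queryTimeout=0") // reset before step 2
  stmt.execute("SET spark.sql.thriftServer.interruptOnCancel=true")
} finally {
  stmt.close()
  conn.close()
}
```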

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Fix test.

Closes #30897 from ulysses-you/SPARK-33526-followup.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: e853f06)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerWithSparkContextSuite.scala (diff)
Commit 2287f56a3e105e04cf4e86283eaee12f270c09a7 by gurwls223
[SPARK-33879][SQL] Char Varchar values fails w/ match error as partition columns

### What changes were proposed in this pull request?

```sql
spark-sql> select * from t10 where c0='abcd';
20/12/22 15:43:38 ERROR SparkSQLDriver: Failed in [select * from t10 where c0='abcd']
scala.MatchError: CharType(10) (of class org.apache.spark.sql.types.CharType)
at org.apache.spark.sql.catalyst.expressions.CastBase.cast(Cast.scala:815)
at org.apache.spark.sql.catalyst.expressions.CastBase.cast$lzycompute(Cast.scala:842)
at org.apache.spark.sql.catalyst.expressions.CastBase.cast(Cast.scala:842)
at org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:844)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
at org.apache.spark.sql.catalyst.catalog.CatalogTablePartition.$anonfun$toRow$2(interface.scala:164)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:102)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at org.apache.spark.sql.types.StructType.map(StructType.scala:102)
at org.apache.spark.sql.catalyst.catalog.CatalogTablePartition.toRow(interface.scala:158)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$prunePartitionsByFilter$3(ExternalCatalogUtils.scala:157)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.$anonfun$prunePartitionsByFilter$3$adapted(ExternalCatalogUtils.scala:156)
```
`c0` is a partition column; the query fails in the partition pruning rule.

In this PR, we replace char/varchar with the string type before the CAST happens.
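
A minimal sketch of the idea (Spark's real helper is `CharVarcharUtils.replaceCharVarcharWithString`; this recursion is illustrative):

```scala
import org.apache.spark.sql.types._

// Rewrite char/varchar to string, recursing into nested types,
// before any cast over partition values is built.
def replaceCharVarcharWithString(dt: DataType): DataType = dt match {
  case ArrayType(et, containsNull) =>
    ArrayType(replaceCharVarcharWithString(et), containsNull)
  case MapType(kt, vt, valueContainsNull) =>
    MapType(replaceCharVarcharWithString(kt),
      replaceCharVarcharWithString(vt), valueContainsNull)
  case StructType(fields) =>
    StructType(fields.map(f => f.copy(dataType = replaceCharVarcharWithString(f.dataType))))
  case CharType(_) | VarcharType(_) => StringType
  case _ => dt
}
```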

### Why are the changes needed?

bugfix, see the case above

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

New tests.

Closes #30887 from yaooqinn/SPARK-33879.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 2287f56)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogUtils.scala (diff)
Commit d98c216e1959e276877c3d0a9562cc4cdd8b41bb by gurwls223
[SPARK-31960][YARN][DOCS][FOLLOW-UP] Document the behaviour change of Hadoop's classpath propagation in migration guide

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/28788, and proposes to update migration guide.

### Why are the changes needed?

To tell users about the behaviour change.

### Does this PR introduce _any_ user-facing change?

Yes, it updates migration guides for users.

### How was this patch tested?

GitHub Actions' documentation build should test it.

Closes #30903 from HyukjinKwon/SPARK-31960-followup.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: d98c216)
The file was modified docs/core-migration-guide.md (diff)
Commit 34bfb3a31d505a08e15454214d8f78933310ebb3 by wenchen
[SPARK-33787][SQL] Allow partition purge for v2 tables

### What changes were proposed in this pull request?
1. Add new methods `purgePartition()`/`purgePartitions()` to the interfaces `SupportsPartitionManagement`/`SupportsAtomicPartitionManagement`.
2. Default implementations of the new methods throw `UnsupportedOperationException`.
3. Add tests for new methods to `SupportsPartitionManagementSuite`/`SupportsAtomicPartitionManagementSuite`.
4. Add `ALTER TABLE .. DROP PARTITION` tests for DS v1 and v2.

Closes #30776
Closes #30821

### Why are the changes needed?
Currently, the `PURGE` option that users can set in `ALTER TABLE .. DROP PARTITION` is completely ignored. We should pass this flag to the catalog implementation so the catalog can decide how to handle it.

### Does this PR introduce _any_ user-facing change?
The changes can impact the behavior of `ALTER TABLE .. DROP PARTITION` for v2 tables.

### How was this patch tested?
By running the affected test suites, for instance:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
```

Closes #30886 from MaxGekk/purge-partition.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 34bfb3a)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagementSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableDropPartitionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableDropPartitionSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/AlterTableDropPartitionExec.scala (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsPartitionManagement.java (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/SupportsPartitionManagementSuite.scala (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java (diff)
Commit f421c172d976bf6844b44b0ab9d1e1fa55f380e3 by wenchen
[SPARK-33497][SQL] Override maxRows in some LogicalPlan

### What changes were proposed in this pull request?

This PR aims to override the `maxRows` method in the following `LogicalPlan`s (a sketch of the pattern follows the list):
* `ReturnAnswer`
* `Join`
* `Range`
* `Sample`
* `RepartitionOperation`
* `Deduplicate`
* `LocalRelation`
* `Window`
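
A minimal sketch of the override pattern (the node and its fields are illustrative, not Spark's exact classes):

```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LeafNode

// A Range-like relation knows exactly how many rows it produces,
// so it can advertise that through maxRows.
case class RangeLike(start: Long, end: Long, step: Long, output: Seq[Attribute])
  extends LeafNode {
  override def maxRows: Option[Long] = Some((end - start) / step)
}
```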

### Why are the changes needed?

1. Logically, we know the max-rows info for these `LogicalPlan`s.
2. We already override max rows for some `LogicalPlan`s, so we can eliminate limits in more cases if we expand the coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Add test.

Closes #30443 from ulysses-you/SPARK-33497.

Lead-authored-by: ulysses-you <youxiduo@weidian.com>
Co-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f421c17)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q61/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q61.sf100/simplified.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q61/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q90/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q90/simplified.txt (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushdownSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q61.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q28/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q28.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q28.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q90.sf100/explain.txt (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CombiningLimitsSuite.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q90.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q28/explain.txt (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala (diff)
Commit cc23581e2645c91fa8d6e6c81dc87b4221718bb1 by wenchen
[SPARK-33858][SQL][TESTS] Unify v1 and v2 ALTER TABLE .. RENAME PARTITION tests

### What changes were proposed in this pull request?
1. Move the `ALTER TABLE .. RENAME PARTITION` parsing tests to `AlterTableRenamePartitionParserSuite`.
2. Move the v1 tests for `ALTER TABLE .. RENAME PARTITION` from `DDLSuite` to `v1.AlterTableRenamePartitionSuite`, and the v2 tests from `AlterTablePartitionV2SQLSuite` to `v2.AlterTableRenamePartitionSuite`, so the tests will run for V1, Hive V1, and V2 datasources.

### Why are the changes needed?
- The unification allows running common `ALTER TABLE .. RENAME PARTITION` tests for DSv1, Hive DSv1, and DSv2.
- We can detect missing features and differences between the DSv1 and DSv2 implementations.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running new test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionParserSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"
```

Closes #30863 from MaxGekk/unify-rename-partition-tests.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: cc23581)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/AlterTableRenamePartitionSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableRenamePartitionSuiteBase.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTablePartitionV2SQLSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableRenamePartitionSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableRenamePartitionParserSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLCommandTestUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableRenamePartitionSuite.scala
Commit 303df64b466b7734b3c497955d1cca3e34fb663e by wenchen
[SPARK-33889][SQL] Fix NPE from `SHOW PARTITIONS` on V2 tables

### What changes were proposed in this pull request?
At `ShowPartitionsExec.run()`, check whether a row returned by `listPartitionIdentifiers()` contains a `null` field, and convert it to the string `"null"`.
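
A minimal sketch of the conversion (the helper name is illustrative):

```scala
// Render a null partition value as the string "null" instead of
// dereferencing it and triggering an NPE.
def renderPartitionValue(value: Any): String =
  if (value == null) "null" else value.toString
```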

### Why are the changes needed?
Because `SHOW PARTITIONS` throws an NPE on V2 tables with `null` partition values.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Added new UT to `v2.ShowPartitionsSuite`.

Closes #30904 from MaxGekk/fix-npe-show-partitions.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 303df64)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowPartitionsExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowPartitionsSuite.scala (diff)
Commit 7ffcfcf7db57fb62941130e0c7bf61bca08aa758 by wenchen
[SPARK-33847][SQL] Simplify CaseWhen if elseValue is None

### What changes were proposed in this pull request?

1. Enhance `ReplaceNullWithFalseInPredicate` to replace a `None` `elseValue` inside `CaseWhen` with `FalseLiteral` if all branch values are `FalseLiteral`. The use case is:
```sql
create table t1 using parquet as select id from range(10);
explain select id from t1 where (CASE WHEN id = 1 THEN 'a' WHEN id = 3 THEN 'b' end) = 'c';
```

Before this pr:
```
== Physical Plan ==
*(1) Filter CASE WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END
+- *(1) ColumnarToRow
   +- FileScan parquet default.t1[id#1L] Batched: true, DataFilters: [CASE WHEN (id#1L = 1) THEN false WHEN (id#1L = 3) THEN false END], Format: Parquet, Location: InMemoryFileIndex[file:/Users/yumwang/opensource/spark/spark-warehouse/org.apache.spark.sql.DataF..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<id:bigint>

```

After this pr:
```
== Physical Plan ==
LocalTableScan <empty>, [id#1L]
```

2. Enhance `SimplifyConditionals` to simplify `CaseWhen` when `elseValue` is `None` and all branch outputs are null.
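
A minimal sketch of case 1 above, assuming the catalyst expression types (this is the shape of the rewrite, not the full rule):

```scala
import org.apache.spark.sql.catalyst.expressions.{CaseWhen, Expression}
import org.apache.spark.sql.catalyst.expressions.Literal.FalseLiteral

// With no else branch and all-false branch values, the CaseWhen can only
// yield false or null, which is false in predicate context.
def simplifyAllFalseCaseWhen(e: Expression): Expression = e match {
  case CaseWhen(branches, None) if branches.forall(_._2 == FalseLiteral) =>
    FalseLiteral
  case other => other
}
```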

### Why are the changes needed?

Improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30852 from wangyum/SPARK-33847.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 7ffcfcf)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicate.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceNullWithFalseInPredicateSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PushFoldableIntoBranchesSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ReplaceNullWithFalseInPredicateEndToEndSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
Commit 47d1aa4e93f668774fd0b16c780d3b1f6200bcd8 by gurwls223
[SPARK-33891][DOCS][CORE] Update dynamic allocation related documents

### What changes were proposed in this pull request?

This PR aims to update the followings.
- Remove the outdated requirement for `spark.shuffle.service.enabled` in `configuration.md`
- Dynamic allocation section in `job-scheduling.md`

### Why are the changes needed?

To make the document up-to-date.

### Does this PR introduce _any_ user-facing change?

No, it's a documentation update.

### How was this patch tested?

Manual.

**BEFORE**
![Screen Shot 2020-12-23 at 2 22 04 AM](https://user-images.githubusercontent.com/9700541/102986441-ae647f80-44c5-11eb-97a3-87c2d368952a.png)
![Screen Shot 2020-12-23 at 2 22 34 AM](https://user-images.githubusercontent.com/9700541/102986473-bcb29b80-44c5-11eb-8eae-6802001c6dfa.png)

**AFTER**
![Screen Shot 2020-12-23 at 2 25 36 AM](https://user-images.githubusercontent.com/9700541/102986767-2df24e80-44c6-11eb-8540-e74856a4c313.png)
![Screen Shot 2020-12-23 at 2 21 13 AM](https://user-images.githubusercontent.com/9700541/102986366-8e34c080-44c5-11eb-8054-1efd07c9458c.png)

Closes #30906 from dongjoon-hyun/SPARK-33891.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 47d1aa4)
The file was modified docs/job-scheduling.md (diff)
The file was modified docs/configuration.md (diff)
Commit 0677c39009de0830d995da77332f0756c76d6b56 by mridulatgmail.com
[SPARK-32916][SHUFFLE][TEST-MAVEN][TEST-HADOOP2.7] Ensure the number of chunks in meta file and index file are equal

### What changes were proposed in this pull request?
1. Fixes for bugs in `RemoteBlockPushResolver` where the number of chunks in the meta file and index file are inconsistent due to exceptions while writing to either the index file or the meta file. This Java class was introduced in https://github.com/apache/spark/pull/30062.
- If writing to the index file fails, the position of the meta file is not reset. This means that the number of chunks in the meta file is inconsistent with the index file.
- During the exception handling while writing to the index/meta file, we just set the pointer back to the start position. If the files are closed just after this, that doesn't get rid of any of the extra bytes written to them.
2. Adds an IOException threshold (sketched after this list). If the `RemoteBlockPushResolver` encounters more IOExceptions than this threshold while updating the data/meta/index file of a shuffle partition, then it responds to the client with the exception `IOExceptions exceeded the threshold`, so that the client can stop pushing data for this shuffle partition.
3. When the update to metadata fails, the exception is not propagated back to the client. This results in an increased size of the current chunk. However, with (2) in place, the current chunk will still be of a manageable size.
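
A minimal sketch of the threshold in (2) (class and method names are illustrative):

```scala
import java.io.IOException
import java.util.concurrent.atomic.AtomicInteger

// Count IOExceptions per shuffle partition; past the threshold, fail fast
// so the client stops pushing blocks for this partition.
class IoExceptionGate(threshold: Int) {
  private val failures = new AtomicInteger(0)

  def recordFailureAndCheck(): Unit = {
    if (failures.incrementAndGet() > threshold) {
      throw new IOException("IOExceptions exceeded the threshold")
    }
  }
}
```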

### Why are the changes needed?
This fix is needed for the bugs mentioned above.
1. Moved writing to meta file after index file. This fixes the issue because if there is an exception writing to meta file, then the index file position is not updated. With this change, if there is an exception writing to index file, then none of the files are effectively updated and the same is true vice-versa.
2. Truncating the lengths of data/index/meta files when the partition is finalized.
3. When the IOExceptions have reached the threshold, it is most likely that future blocks will also face the issue. So, it is better to let the clients know so that they can stop pushing the blocks for that partition.
4. When just the meta update fails, client retries pushing the block which was successfully merged to data file. This can be avoided by letting the chunk grow slightly.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit tests for all the bugs and threshold.

Closes #30433 from otterc/SPARK-32916-followup.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(commit: 0677c39)
The file was modified common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java (diff)
Commit 5c9b421c3711ba373b4d5cbbd83a8ece91291ed0 by dhyun
[SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends

### What changes were proposed in this pull request?

This is a retry of #30177.

This is not a complete fix, and a complete fix would take a long time (#30242).
As discussed offline, at least using `ContextAwareIterator` should be helpful enough for many cases.

As the Python evaluation consumes the parent iterator in a separate thread, it could consume more data from the parent even after the task ends and the parent is closed. Thus, we should use `ContextAwareIterator` to stop consuming after the task ends.
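
A minimal sketch of the idea (Spark's actual class lives in the added `ContextAwareIterator.scala`; this version is illustrative):

```scala
import org.apache.spark.TaskContext

// Stop pulling from the upstream iterator once the task has completed or
// been interrupted, so a consumer thread cannot touch freed resources.
class ContextAwareIteratorSketch[T](context: TaskContext, delegate: Iterator[T])
  extends Iterator[T] {
  override def hasNext: Boolean =
    !context.isCompleted() && !context.isInterrupted() && delegate.hasNext
  override def next(): T = delegate.next()
}
```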

### Why are the changes needed?

A Python/Pandas UDF right after an off-heap vectorized reader could cause an executor crash.

E.g.,:

```py
spark.range(0, 100000, 1, 1).write.parquet(path)

spark.conf.set("spark.sql.columnVector.offheap.enabled", True)

def f(x):
    return 0

fUdf = udf(f, LongType())

spark.read.parquet(path).select(fUdf('id')).head()
```

This is because the Python evaluation consumes the parent iterator in a separate thread, and it consumes more data from the parent even after the task ends and the parent is closed. If an off-heap column vector exists in the parent iterator, it could cause a segmentation fault which crashes the executor.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added tests, and manually.

Closes #30899 from ueshin/issues/SPARK-33277/context_aware_iterator.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 5c9b421)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/python/MapInPandasExec.scala (diff)
The file was added core/src/main/scala/org/apache/spark/ContextAwareIterator.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonExec.scala (diff)
Commit d467d817260d6ca605c34f493e68d0877209170f by dhyun
[SPARK-33893][CORE] Exclude fallback block manager from executorList

### What changes were proposed in this pull request?

This PR aims to exclude fallback block manager from `executorList` function.

### Why are the changes needed?

When a fallback storage is used, the executors UI tab hangs because the executor list REST API result doesn't have `peakMemoryMetrics` of `ExecutorMetrics`. The root cause is that the block manager id used by fallback storage is included in the API result and it doesn't have `peakMemoryMetrics` because it's populated during HeartBeat reporting. We should hide it.

### Does this PR introduce _any_ user-facing change?

No. This is a bug fix on UI.

### How was this patch tested?

Manual. Run the following and visit Spark `executors` tab UI with browser.
```
bin/spark-shell -c spark.storage.decommission.fallbackStorage.path=file:///tmp/spark-storage/
```

Closes #30911 from dongjoon-hyun/SPARK-33893.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: d467d81)
The file was modified core/src/main/scala/org/apache/spark/status/AppStatusStore.scala (diff)
Commit 368a2c341d8f3315c759e1c2362439534a9d44e7 by dhyun
[SPARK-33877][SQL][FOLLOWUP] SQL reference documents for INSERT w/ a column list

### What changes were proposed in this pull request?

Followup of https://github.com/apache/spark/commit/a3dd8dacee8f6b316be90500f9fd8ec8997a5784, via the suggestion in https://github.com/apache/spark/pull/30888#discussion_r547822642.

### Why are the changes needed?

Doc improvement.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Passing GA doc build.

Closes #30909 from yaooqinn/SPARK-33877-F.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 368a2c3)
The file was modified docs/sql-ref-syntax-dml-insert-into.md (diff)
The file was modified docs/sql-ref-syntax-dml-insert-overwrite-table.md (diff)
Commit 61881bb6988aa0320b4bacfabbc0ee6f05f287cb by srowen
[SPARK-33835][CORE] Refactor AbstractCommandBuilder.buildJavaCommand: use firstNonEmpty

### What changes were proposed in this pull request?
Refactor `AbstractCommandBuilder.buildJavaCommand` to use `firstNonEmpty`.

### Why are the changes needed?
For better code readability; also, `firstNonEmpty` can detect a blank `javaHome` such as `"   "`.
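
An illustrative Scala equivalent of the helper (the real code is Java in the launcher module):

```scala
// Pick the first candidate that is neither null nor blank,
// so javaHome = "   " is rejected too.
def firstNonEmpty(candidates: String*): Option[String] =
  candidates.find(s => s != null && s.trim.nonEmpty)

// Hypothetical JAVA_HOME resolution order mirroring buildJavaCommand.
val javaHome = firstNonEmpty(
  sys.env.getOrElse("JAVA_HOME", null),
  System.getProperty("java.home"))
```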

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
End to End.

Closes #30831 from offthewall123/refector_AbstractCommandBuilder.

Authored-by: offthewall123 <dingyu.xu@intel.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 61881bb)
The file was modified launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java (diff)
Commit 86c1cfc5791dae5f2ee8ccd5095dbeb2243baba6 by gurwls223
[SPARK-33659][SS] Document the current behavior for DataStreamWriter.toTable API

### What changes were proposed in this pull request?
Follow up work for #30521, document the following behaviors in the API doc:

- Document the effects when the given configurations (provider/partitionBy) conflict with the existing table.
- Document the lack of functionality for creating a v2 table, and guide users to ensure the table is created beforehand, to avoid an unintended or insufficient table being created.

### Why are the changes needed?
We don't have full support yet for V2 tables created through this API (TODO: SPARK-33638).

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Document only.

Closes #30885 from xuanyuanking/SPARK-33659.

Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 86c1cfc)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamTableAPISuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
Commit f1d37972910d94c713c6a7cb7bd6ea2b52576d00 by wenchen
[SPARK-33886][SQL] UnresolvedTable should retain SQL text position for DDL commands

### What changes were proposed in this pull request?

Currently, there are many DDL commands where the position of the unresolved identifier is incorrect:
```
scala> sql("MSCK REPAIR TABLE unknown")
org.apache.spark.sql.AnalysisException: Table not found: unknown; line 1 pos 0;
```
, whereas the `pos` should be 18.

This PR proposes to fix this issue for commands using `UnresolvedTable`:
```
MSCK REPAIR TABLE t
LOAD DATA LOCAL INPATH 'filepath' INTO TABLE t
TRUNCATE TABLE t
SHOW PARTITIONS t
ALTER TABLE t RECOVER PARTITIONS
ALTER TABLE t ADD PARTITION (p=1)
ALTER TABLE t PARTITION (p=1) RENAME TO PARTITION (p=2)
ALTER TABLE t DROP PARTITION (p=1)
ALTER TABLE t SET SERDEPROPERTIES ('a'='b')
COMMENT ON TABLE t IS 'hello'"
```

### Why are the changes needed?

To fix a bug.

### Does this PR introduce _any_ user-facing change?

Yes, now the above example will print the following:
```
org.apache.spark.sql.AnalysisException: Table not found: unknown; line 1 pos 18;
```

### How was this patch tested?

Add a new suite of tests.

Closes #30900 from imback82/position_Fix.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f1d3797)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisExceptionPositionSuite.scala
Commit d7dc42d5f6bbe861c7e4ac1bb49e0830af5e19f4 by wenchen
[SPARK-33895][SQL] Char and Varchar fail in MetaOperation of ThriftServer

### What changes were proposed in this pull request?

```
Caused by: java.lang.IllegalArgumentException: Unrecognized type name: CHAR(10)
at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.toJavaSQLType(SparkGetColumnsOperation.scala:187)
at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$addToRowSet$1(SparkGetColumnsOperation.scala:203)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.addToRowSet(SparkGetColumnsOperation.scala:195)
at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$4(SparkGetColumnsOperation.scala:99)
at org.apache.spark.sql.hive.thriftserver.SparkGetColumnsOperation.$anonfun$runInternal$4$adapted(SparkGetColumnsOperation.scala:98)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
```

Meta operations target the raw table schema, so we need to handle these types there.
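
A minimal sketch of the gap being fixed (the mapping shown is illustrative, not the full one in `SparkGetColumnsOperation`):

```scala
import java.sql.Types
import org.apache.spark.sql.types._

// Map raw char/varchar types to JDBC type codes instead of failing
// on an unrecognized type name like CHAR(10).
def toJavaSQLTypeSketch(dt: DataType): Int = dt match {
  case CharType(_)    => Types.CHAR
  case VarcharType(_) => Types.VARCHAR
  case StringType     => Types.VARCHAR
  case _              => Types.OTHER
}
```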

### Why are the changes needed?

Bugfix; see the above case.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests; verified locally:

![image](https://user-images.githubusercontent.com/8326978/103069196-cdfcc480-45f9-11eb-9c6a-d4c42123c6e3.png)

Closes #30914 from yaooqinn/SPARK-33895.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: d7dc42d)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetTypeInfoOperation.scala (diff)
Commit 32d4a2b06220861efda1058b26d9a2ed3a1b2c74 by wenchen
[SPARK-33861][SQL] Simplify conditional in predicate

### What changes were proposed in this pull request?

This PR simplifies conditionals in predicates; after this change we can push the filter down to the data source (a sketch of the rewrite follows the table):

Expression | After simplify
-- | --
IF(cond, trueVal, false)                   | AND(cond, trueVal)
IF(cond, trueVal, true)                    | OR(NOT(cond), trueVal)
IF(cond, false, falseVal)                  | AND(NOT(cond), falseVal)
IF(cond, true, falseVal)                   | OR(cond, falseVal)
CASE WHEN cond THEN trueVal ELSE false END | AND(cond, trueVal)
CASE WHEN cond THEN trueVal END            | AND(cond, trueVal)
CASE WHEN cond THEN trueVal ELSE null END  | AND(cond, trueVal)
CASE WHEN cond THEN trueVal ELSE true END  | OR(NOT(cond), trueVal)
CASE WHEN cond THEN false ELSE elseVal END | AND(NOT(cond), elseVal)
CASE WHEN cond THEN false END              | false
CASE WHEN cond THEN true ELSE elseVal END  | OR(cond, elseVal)
CASE WHEN cond THEN true END               | cond
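
A minimal sketch covering the first two table rows, assuming the catalyst expression types (the new `SimplifyConditionalsInPredicate` rule handles every listed pattern):

```scala
import org.apache.spark.sql.catalyst.expressions.{And, Expression, If, Not, Or}
import org.apache.spark.sql.catalyst.expressions.Literal.{FalseLiteral, TrueLiteral}

// Rewrites valid only where the expression is used as a predicate.
def simplifyIfInPredicate(e: Expression): Expression = e match {
  case If(cond, trueVal, FalseLiteral) => And(cond, trueVal)
  case If(cond, trueVal, TrueLiteral)  => Or(Not(cond), trueVal)
  case other => other
}
```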

### Why are the changes needed?

Improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #30865 from wangyum/SPARK-33861.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 32d4a2b)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q73/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q34.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q34.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q34/explain.txt (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalsInPredicateSuite.scala
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q34/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q34/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q73/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q34.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q73.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q34.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q73.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q34/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q34/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q73/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q73.sf100/explain.txt (diff)
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalsInPredicate.scala
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q34.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q73/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q73.sf100/simplified.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q34.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q34/simplified.txt (diff)
Commit 3e9821edfd636d2bc8be8f9cc5fc87be48bebc79 by wenchen
[SPARK-33443][SQL] LEAD/LAG should support [ IGNORE NULLS | RESPECT NULLS ]

### What changes were proposed in this pull request?
Mainstream databases support `[ IGNORE NULLS | RESPECT NULLS ]` for `LEAD`/`LAG`/`NTH_VALUE`/`FIRST_VALUE`/`LAST_VALUE`, but the current implementation of `LEAD`/`LAG` doesn't support this syntax.

**Oracle**
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/LEAD.html#GUID-0A0481F1-E98F-4535-A739-FCCA8D1B5B77

**Presto**
https://prestodb.io/docs/current/functions/window.html

**Redshift**
https://docs.aws.amazon.com/redshift/latest/dg/r_WF_LEAD.html

**DB2**
https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm

**Teradata**
https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w

**Snowflake**
https://docs.snowflake.com/en/sql-reference/functions/lead.html
https://docs.snowflake.com/en/sql-reference/functions/lag.html

### Why are the changes needed?
Supporting `[ IGNORE NULLS | RESPECT NULLS ]` for `LEAD`/`LAG` is very useful.
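
A hedged usage sketch (the table and data are hypothetical; the syntax is the one added here):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

Seq((1, Some("a")), (2, None), (3, Some("c")))
  .toDF("id", "v")
  .createOrReplaceTempView("t")

// LEAD skips over the null at id = 2 instead of returning it.
spark.sql(
  "SELECT id, v, LEAD(v) IGNORE NULLS OVER (ORDER BY id) AS next_v FROM t"
).show()
```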

### Does this PR introduce _any_ user-facing change?
Yes.

### How was this patch tested?
Jenkins test.

Closes #30387 from beliefer/SPARK-33443.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3e9821e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala (diff)
Commit 54a67842e678a54e976160c5ad249767165fab0f by wenchen
[SPARK-33881][SQL][TESTS] Check null and empty string as partition values in DS v1 and v2 tests

### What changes were proposed in this pull request?
Add tests to check handling `null` and `''` (empty string) as partition values in commands `SHOW PARTITIONS`, `ALTER TABLE .. ADD PARTITION`, `ALTER TABLE .. DROP PARTITION`.

### Why are the changes needed?
To improve test coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running the modified test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.ShowPartitionsSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableAddPartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *.AlterTableDropPartitionSuite"
```

Closes #30893 from MaxGekk/partition-value-empty-string.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 54a6784)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableAddPartitionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableAddPartitionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableDropPartitionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowPartitionsSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowPartitionsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowPartitionsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/AlterTableDropPartitionSuite.scala (diff)
Commit 29cca68e9e55fae8389378de6f30d0dfa7a74010 by wenchen
[SPARK-33892][SQL] Display char/varchar in DESC and SHOW CREATE TABLE

### What changes were proposed in this pull request?

Display char/varchar in
  - DESC table
  - DESC column
  - SHOW CREATE TABLE

### Why are the changes needed?

show the correct definition for users

### Does this PR introduce _any_ user-facing change?

Yes, char/varchar columns will print as char/varchar instead of string.

### How was this patch tested?

new tests

Closes #30908 from yaooqinn/SPARK-33892.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 29cca68)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/HiveCharVarcharTestSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2CommandExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablePropertiesExec.scala (diff)
Commit 700f5ab65c1c84522302ce92d176adf229c34daa by sarutak
[SPARK-33900][WEBUI] Show shuffle read size / records correctly when only remotebytesread is available

### What changes were proposed in this pull request?
Shuffle Read Size / Records should also be displayed when remoteBytesRead > 0 and localBytesRead = 0.

current:
![image](https://user-images.githubusercontent.com/3898450/103079421-c4ca2280-460e-11eb-9e2f-49d35b5d324d.png)
fix:
![image](https://user-images.githubusercontent.com/3898450/103079439-cc89c700-460e-11eb-9a41-6b2882980d11.png)

### Why are the changes needed?
At present, the page only displays Shuffle Read Size / Records when localBytesRead > 0.
When there are only remote reads, the metrics cannot be seen on the stage page.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
manual test

Closes #30916 from cxzl25/SPARK-33900.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: 700f5ab)
The file was modified core/src/main/resources/org/apache/spark/ui/static/stagepage.js (diff)
Commit 9c30116fb428f87543155323617cf5fb700e84cd by dhyun
[SPARK-33857][SQL] Unify the default seed of random functions

### What changes were proposed in this pull request?

Unify the seed of random functions:
1. Add a placeholder expression `UnresolvedSeed` as the default seed.
2. Change the default seed of `Rand`, `Randn`, `Uuid`, and `Shuffle` to `UnresolvedSeed`.
3. Replace `UnresolvedSeed` with a real seed in the `ResolveRandomSeed` rule.

### Why are the changes needed?

`Uuid` and `Shuffle` use the `ResolveRandomSeed` rule to set the seed if the user doesn't give a seed value, while `Rand` and `Randn` do this at construction time.

It's better to unify the default seed on the Analyzer side, since we already use `ExpressionWithRandomSeed` in streaming queries.
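
A minimal self-contained sketch of the analyzer-side idea (types are illustrative, not Spark's exact API):

```scala
import scala.util.Random

sealed trait Seed
case object UnresolvedSeedSketch extends Seed
final case class ResolvedSeed(value: Long) extends Seed

// Stand-in for an ExpressionWithRandomSeed-like node.
final case class RandLike(seed: Seed) {
  def withNewSeed(value: Long): RandLike = copy(seed = ResolvedSeed(value))
}

// The rule: any random expression still carrying the placeholder seed
// gets a concrete seed during analysis.
def resolveSeed(e: RandLike): RandLike = e.seed match {
  case UnresolvedSeedSketch => e.withNewSeed(Random.nextLong())
  case _ => e
}
```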

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass existing tests and add new tests.

Closes #30864 from ulysses-you/SPARK-33857.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 9c30116)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit 65a9ac2ff4d902976bf3ef89d1d3e29c1e6d5414 by dhyun
[SPARK-30027][SQL] Support codegen for aggregate filters in HashAggregateExec

### What changes were proposed in this pull request?

This PR intends to support code generation for `HashAggregateExec` with filters.

Quick benchmark results:
```
$ ./bin/spark-shell --master=local[1] --conf spark.driver.memory=8g --conf spark.sql.shuffle.partitions=1 -v

scala> spark.range(100000000).selectExpr("id % 3 as k1", "id % 5 as k2", "rand() as v1", "rand() as v2").write.saveAsTable("t")
scala> sql("SELECT k1, k2, AVG(v1) FILTER (WHERE v2 > 0.5) FROM t GROUP BY k1, k2").write.format("noop").mode("overwrite").save()

>> Before this PR
Elapsed time: 16.170697619s

>> After this PR
Elapsed time: 6.7825313s
```

The query above is compiled into code below;

```
...
/* 285 */   private void agg_doAggregate_avg_0(boolean agg_exprIsNull_2_0, org.apache.spark.sql.catalyst.InternalRow agg_unsafeRowAggBuffer_0, double agg_expr_2_0) throws java.io.IOException {
/* 286 */     // evaluate aggregate function for avg
/* 287 */     boolean agg_isNull_10 = true;
/* 288 */     double agg_value_12 = -1.0;
/* 289 */     boolean agg_isNull_11 = agg_unsafeRowAggBuffer_0.isNullAt(0);
/* 290 */     double agg_value_13 = agg_isNull_11 ?
/* 291 */     -1.0 : (agg_unsafeRowAggBuffer_0.getDouble(0));
/* 292 */     if (!agg_isNull_11) {
/* 293 */       agg_agg_isNull_12_0 = true;
/* 294 */       double agg_value_14 = -1.0;
/* 295 */       do {
/* 296 */         if (!agg_exprIsNull_2_0) {
/* 297 */           agg_agg_isNull_12_0 = false;
/* 298 */           agg_value_14 = agg_expr_2_0;
/* 299 */           continue;
/* 300 */         }
/* 301 */
/* 302 */         if (!false) {
/* 303 */           agg_agg_isNull_12_0 = false;
/* 304 */           agg_value_14 = 0.0D;
/* 305 */           continue;
/* 306 */         }
/* 307 */
/* 308 */       } while (false);
/* 309 */
/* 310 */       agg_isNull_10 = false; // resultCode could change nullability.
/* 311 */
/* 312 */       agg_value_12 = agg_value_13 + agg_value_14;
/* 313 */
/* 314 */     }
/* 315 */     boolean agg_isNull_15 = false;
/* 316 */     long agg_value_17 = -1L;
/* 317 */     if (!false && agg_exprIsNull_2_0) {
/* 318 */       boolean agg_isNull_18 = agg_unsafeRowAggBuffer_0.isNullAt(1);
/* 319 */       long agg_value_20 = agg_isNull_18 ?
/* 320 */       -1L : (agg_unsafeRowAggBuffer_0.getLong(1));
/* 321 */       agg_isNull_15 = agg_isNull_18;
/* 322 */       agg_value_17 = agg_value_20;
/* 323 */     } else {
/* 324 */       boolean agg_isNull_19 = true;
/* 325 */       long agg_value_21 = -1L;
/* 326 */       boolean agg_isNull_20 = agg_unsafeRowAggBuffer_0.isNullAt(1);
/* 327 */       long agg_value_22 = agg_isNull_20 ?
/* 328 */       -1L : (agg_unsafeRowAggBuffer_0.getLong(1));
/* 329 */       if (!agg_isNull_20) {
/* 330 */         agg_isNull_19 = false; // resultCode could change nullability.
/* 331 */
/* 332 */         agg_value_21 = agg_value_22 + 1L;
/* 333 */
/* 334 */       }
/* 335 */       agg_isNull_15 = agg_isNull_19;
/* 336 */       agg_value_17 = agg_value_21;
/* 337 */     }
/* 338 */     // update unsafe row buffer
/* 339 */     if (!agg_isNull_10) {
/* 340 */       agg_unsafeRowAggBuffer_0.setDouble(0, agg_value_12);
/* 341 */     } else {
/* 342 */       agg_unsafeRowAggBuffer_0.setNullAt(0);
/* 343 */     }
/* 344 */
/* 345 */     if (!agg_isNull_15) {
/* 346 */       agg_unsafeRowAggBuffer_0.setLong(1, agg_value_17);
/* 347 */     } else {
/* 348 */       agg_unsafeRowAggBuffer_0.setNullAt(1);
/* 349 */     }
/* 350 */   }
...
```

### Why are the changes needed?

For high performance.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #27019 from maropu/AggregateFilterCodegen.

Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 65a9ac2)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/group-by-filter.sql (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/explain.sql.out (diff)
Commit 10b6466e91d2e954386c74bf6ab7d94f23dd6810 by yamamuro
[SPARK-33084][CORE][SQL] Add jar support ivy path

### What changes were proposed in this pull request?
Support `ADD JAR` with an ivy path.

### Why are the changes needed?
Since submitting an application already supports ivy, `ADD JAR` can also support ivy now.

### Does this PR introduce _any_ user-facing change?
Users can add a jar with SQL like:
```
ADD JAR ivy://group:artifact:version?exclude=xxx,xxx&transitive=true
ADD JAR ivy://group:artifact:version?exclude=xxx,xxx&transitive=false
```

Core API:
```
sparkContext.addJar("ivy://group:artifact:version?exclude=xxx,xxx&transitive=true")
sparkContext.addJar("ivy://group:artifact:version?exclude=xxx,xxx&transitive=false")
```

#### Doc Update snapshot
![image](https://user-images.githubusercontent.com/46485123/101227738-de451200-36d3-11eb-813d-78a8b879da4f.png)

### How was this patch tested?
Added UT

Closes #29966 from AngersZhuuuu/support-add-jar-ivy.

Lead-authored-by: angerszhu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 10b6466)
The file was added core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/SparkContextSuite.scala (diff)
The file was removed core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala
The file was modified docs/sql-ref-syntax-aux-resource-mgmt-add-jar.md (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala (diff)
The file was added core/src/test/scala/org/apache/spark/util/DependencyUtils.scala
The file was modified core/src/main/scala/org/apache/spark/SparkContext.scala (diff)
The file was added sql/core/src/test/resources/SPARK-33084.jar