Changes

Summary

  1. [SPARK-33525][SQL] Update hive-service-rpc to 3.1.2 (commit: 1de3fc4) (details)
  2. [SPARK-33565][BUILD][PYTHON] remove python3.8 and fix breakage (commit: c529426) (details)
  3. [SPARK-33523][SQL][TEST][FOLLOWUP] Fix benchmark case name in (commit: fb7b870) (details)
  4. [SPARK-33562][UI] Improve the style of the checkbox in executor page (commit: 919ea45) (details)
  5. [SPARK-33565][INFRA][FOLLOW-UP] Keep the test coverage with Python 3.8 (commit: ed9e6fc) (details)
  6. [SPARK-33551][SQL] Do not use custom shuffle reader for repartition (commit: dfa3978) (details)
Commit 1de3fc42829187c54334df1fb2149dc4aeb78ed9 by dongjoon
[SPARK-33525][SQL] Update hive-service-rpc to 3.1.2

### What changes were proposed in this pull request?

We supported Hive metastore are 0.12.0 through 3.1.2, but we supported hive-jdbc are 0.12.0 through 2.3.7. It will throw `TProtocolException` if we use hive-jdbc 3.x:

```
[rootspark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
Unexpected end of file when reading from HS2 server. The root cause might be too many concurrent connections. Please ask the administrator to check the number of active connections, and adjust hive.server2.thrift.max.worker.threads if applicable.
Error: org.apache.thrift.transport.TTransportException (state=08S01,code=0)
```
```
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:234)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
at java.base/java.lang.Thread.run(Thread.java:832)
```

This pr upgrade hive-service-rpc to 3.1.2 to fix this issue.

### Why are the changes needed?

To support hive-jdbc 3.x.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test:
```
[rootspark-3267648 apache-hive-3.1.2-bin]# bin/beeline -u jdbc:hive2://localhost:10000/default
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Spark SQL (version 3.1.0-SNAPSHOT)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://localhost:10000/default> create table t1(id int) using parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.051 seconds)
0: jdbc:hive2://localhost:10000/default> insert into t1 values(1);
+---------+
| Result  |
+---------+
+---------+
No rows selected (2.08 seconds)
0: jdbc:hive2://localhost:10000/default> select * from t1;
+-----+
| id  |
+-----+
| 1   |
+-----+
1 row selected (0.605 seconds)
```

Closes #30478 from wangyum/SPARK-33525.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 1de3fc4)
The file was modifiedsql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIService.java (diff)
The file was modifiedpom.xml (diff)
The file was modifiedsql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIService.scala (diff)
The file was modifiedsql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java (diff)
The file was modifiedsql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/ICLIService.java (diff)
The file was modifiedsql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java (diff)
The file was modifieddev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modifiedsql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java (diff)
The file was modifieddev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modifiedsql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/GetInfoType.java (diff)
Commit c529426d872c6f09b05679ba76478e3b932e3696 by incomplete
[SPARK-33565][BUILD][PYTHON] remove python3.8 and fix breakage

### What changes were proposed in this pull request?
remove python 3.8 from python/run-tests.py and stop build breaks

### Why are the changes needed?
the python tests are running against the bare-bones system install of python3, rather than an anaconda environment.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
via jenkins

Closes #30506 from shaneknapp/remove-py38.

Authored-by: shane knapp <incomplete@gmail.com>
Signed-off-by: shane knapp <incomplete@gmail.com>
(commit: c529426)
The file was modifiedpython/run-tests.py (diff)
Commit fb7b87021437c52d72ad276f92c8d6f5443ebd78 by dongjoon
[SPARK-33523][SQL][TEST][FOLLOWUP] Fix benchmark case name in SubExprEliminationBenchmark

### What changes were proposed in this pull request?

Fix the wrong benchmark case name.

### Why are the changes needed?

The last commit to refactor the benchmark code missed a change of case name.

### Does this PR introduce _any_ user-facing change?

No, dev only.

### How was this patch tested?

Unit test.

Closes #30505 from viirya/SPARK-33523-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: fb7b870)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala (diff)
The file was modifiedsql/core/benchmarks/SubExprEliminationBenchmark-results.txt (diff)
The file was modifiedsql/core/benchmarks/SubExprEliminationBenchmark-jdk11-results.txt (diff)
Commit 919ea45e89b17d2f9b336dc4bfe6e15e8a083ed3 by gurwls223
[SPARK-33562][UI] Improve the style of the checkbox in executor page

### What changes were proposed in this pull request?

1. Remove the fixed width style of class `container-fluid-div`. So that the UI looks clean when the text is long.
2. Add one space between a checkbox and the text on the right side, which is consistent with the stage page.

### Why are the changes needed?

The width of class `container-fluid-div` is set as 200px after https://github.com/apache/spark/pull/21688 . This makes the checkbox in the executor page messy.
![image](https://user-images.githubusercontent.com/1097932/100242069-3bc5ab80-2ee9-11eb-8c7d-96c221398fee.png)

We should remove the width limit.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

Manual test.
After the changes:
![image](https://user-images.githubusercontent.com/1097932/100257802-2f4a4e80-2efb-11eb-9eb0-92d6988ad14b.png)

Closes #30500 from gengliangwang/reviseStyle.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 919ea45)
The file was modifiedcore/src/main/resources/org/apache/spark/ui/static/webui.css (diff)
The file was modifiedcore/src/main/resources/org/apache/spark/ui/static/executorspage.js (diff)
Commit ed9e6fc18236ef6994c7f24a4017cf43f77b7ca1 by gurwls223
[SPARK-33565][INFRA][FOLLOW-UP] Keep the test coverage with Python 3.8 in GitHub Actions

### What changes were proposed in this pull request?

This PR proposes to keep the test coverage with Python 3.8 in GitHub Actions. It is not tested for now in Jenkins due to an env issue.

**Before this change in GitHub Actions:**

```
========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.6', 'pypy3']
...
```

**After this change in GitHub Actions:**

```
========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /__w/spark/spark/python/unit-tests.log
Will test against the following Python executables: ['python3.6', 'python3.8', 'pypy3']
```

### Why are the changes needed?

To keep the test coverage with Python 3.8 in GitHub Actions.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

GitHub Actions in this build will test.

Closes #30510 from HyukjinKwon/SPARK-33565.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: ed9e6fc)
The file was modifieddev/run-tests.py (diff)
Commit dfa3978d9191e02eabf65d1829c970644d25d57e by gatorsmile
[SPARK-33551][SQL] Do not use custom shuffle reader for repartition

### What changes were proposed in this pull request?

This PR fixes an AQE issue where local shuffle reader, partition coalescing, or skew join optimization can be mistakenly applied to a shuffle introduced by repartition or a regular shuffle that logically replaces a repartition shuffle.
The proposed solution checks for the presence of any repartition shuffle and filters out not applicable optimization rules for the final stage in an AQE plan.

### Why are the changes needed?

Without the change, the output of a repartition query may not be correct.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added UT.

Closes #30494 from maryannxue/csr-repartition.

Authored-by: Maryann Xue <maryann.xue@gmail.com>
Signed-off-by: Xiao Li <gatorsmile@gmail.com>
(commit: dfa3978)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeLocalShuffleReader.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala (diff)
The file was addedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderRule.scala
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)