Changes

Summary

  1. Revert "[SPARK-35321][SQL] Don't register Hive permanent functions when (commit: e31bef1)
  2. [SPARK-35347][SQL] Use MethodUtils for looking up methods in Invoke and (commit: 5b65d8a)
  3. [SPARK-35231][SQL] logical.Range override maxRowsPerPartition (commit: 620f072)
  4. [SPARK-35354][SQL] Replace BaseJoinExec with ShuffledJoin in (commit: 38eb5a6)
  5. [SPARK-35111][SPARK-35112][SQL][FOLLOWUP] Rename ANSI interval patterns (commit: 2c8ced9)
  6. [SPARK-35261][SQL][TESTS][FOLLOW-UP] Change failOnError to false for (commit: 245dce1)
  7. [SPARK-35358][BUILD] Increase maximum Java heap used for release build (commit: 20d3224)
  8. [MINOR][INFRA] Add python/.idea into git ignore (commit: d808956)
  9. [SPARK-35360][SQL] RepairTableCommand respects (commit: 7182f8c)
  10. [SPARK-34246][FOLLOWUP] Change the definition of (commit: d2a535f)
  11. [SPARK-34736][K8S][TESTS] Kubernetes and Minikube version upgrade for (commit: 8b94eff)
  12. [SPARK-35088][SQL][FOLLOWUP] Improve the error message for Sequence (commit: 44bd0a8)
  13. [SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join (commit: c4ca232)
Commit e31bef1ed4744de9e83bac1887cbfaad7597d78f by dongjoon
Revert "[SPARK-35321][SQL] Don't register Hive permanent functions when creating Hive client"

This reverts commit b4ec9e230484db88c6220c27e43e3db11f3bdeef.
(commit: e31bef1)
The file was modified: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
The file was modified: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
Commit 5b65d8a129a63ac5c9ad482842901a1a0d1420ad by dongjoon
[SPARK-35347][SQL] Use MethodUtils for looking up methods in Invoke and StaticInvoke

### What changes were proposed in this pull request?

This patch proposes to use `MethodUtils` for looking up methods in the `Invoke` and `StaticInvoke` expressions.

### Why are the changes needed?

Currently, the `Invoke` and `StaticInvoke` expressions use our own hand-written logic to look up methods. It is tricky to cover all the cases, and there is already an existing utility package for this purpose, so we should reuse it.
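
To illustrate why exact-signature lookup is tricky, here is a minimal sketch, assuming the simplest case of a caller that only knows boxed argument classes. `MethodLookupSketch` and `findCompatibleMethod` are hypothetical names; this is neither Spark's code nor the `MethodUtils` implementation, which also handles cases such as primitive widening and varargs that this sketch ignores.

```scala
import java.lang.reflect.Method

object MethodLookupSketch {
  // Maps primitive classes to their boxed counterparts so that, e.g., a
  // java.lang.Integer argument can match an `int` parameter.
  private val boxed: Map[Class[_], Class[_]] = Map(
    java.lang.Integer.TYPE -> classOf[java.lang.Integer],
    java.lang.Long.TYPE -> classOf[java.lang.Long],
    java.lang.Double.TYPE -> classOf[java.lang.Double],
    java.lang.Boolean.TYPE -> classOf[java.lang.Boolean]
  )

  // True if a value of class `arg` can be passed to a parameter of class `param`,
  // treating primitives and their boxed forms as interchangeable.
  private def assignable(param: Class[_], arg: Class[_]): Boolean =
    boxed.getOrElse(param, param).isAssignableFrom(boxed.getOrElse(arg, arg))

  // Hypothetical helper: finds a public method by name whose parameters accept
  // the given argument classes, instead of requiring an exact signature match.
  def findCompatibleMethod(cls: Class[_], name: String, argTypes: Seq[Class[_]]): Option[Method] =
    cls.getMethods.find { m =>
      m.getName == name &&
        m.getParameterCount == argTypes.length &&
        m.getParameterTypes.zip(argTypes).forall { case (p, a) => assignable(p, a) }
    }
}
```

For example, `classOf[String].getMethod("substring", classOf[java.lang.Integer])` throws `NoSuchMethodException` because `substring` takes a primitive `int`, whereas the compatibility-based lookup above finds it. Handling every such rule by hand is exactly the kind of logic a shared utility is better placed to own.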

### Does this PR introduce _any_ user-facing change?

No, internal change only.

### How was this patch tested?

Existing tests.

Closes #32474 from viirya/invoke-util.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 5b65d8a)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
Commit 620f0727e357261c84c4f29b92f12261c9390182 by yamamuro
[SPARK-35231][SQL] logical.Range override maxRowsPerPartition

### What changes were proposed in this pull request?
When `numSlices` is available, `logical.Range` should compute an exact `maxRowsPerPartition`.

### Why are the changes needed?
`maxRowsPerPartition` is used by the optimizer, so we should provide an exact value when possible.
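
The arithmetic behind such a bound can be sketched as follows, assuming the range's rows are spread as evenly as possible across `numSlices` partitions, so that no partition holds more than `ceil(numElements / numSlices)` rows. `RangeStatsSketch` is a hypothetical name; this is an illustration, not the code in `basicLogicalOperators.scala`.

```scala
object RangeStatsSketch {
  // Number of elements in [start, end) for a non-zero step, computed with
  // BigInt so that extreme Long bounds cannot overflow.
  def numElements(start: Long, end: Long, step: Long): BigInt = {
    require(step != 0, "step must not be zero")
    val span = BigInt(end) - BigInt(start)
    val s = BigInt(step)
    if (span.signum == 0 || span.signum != s.signum) BigInt(0)
    else (span + s - s.signum) / s // ceil(span / step) when span and step share a sign
  }

  // Upper bound on the rows in any single partition, assuming an even split;
  // None if the bound does not fit in a Long.
  def maxRowsPerPartition(start: Long, end: Long, step: Long, numSlices: Int): Option[Long] = {
    require(numSlices > 0, "numSlices must be positive")
    val n = numElements(start, end, step)
    val m = (n + numSlices - 1) / numSlices // ceil(n / numSlices)
    if (m.isValidLong) Some(m.toLong) else None
  }
}
```

For `start = 0`, `end = 10`, `step = 1` and `numSlices = 3`, the range has 10 rows but no partition holds more than 4, a tighter bound than the total row count alone.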

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing test suites.

Closes #32350 from zhengruifeng/range_maxRowsPerPartition.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 620f072)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/LogicalPlanSuite.scala
Commit 38eb5a6936feae69037a9e6d5ca524a3722be066 by yamamuro
[SPARK-35354][SQL] Replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin

### What changes were proposed in this pull request?

As the title says: we should use the more restrictive interface `ShuffledJoin` rather than `BaseJoinExec` in `CoalesceBucketsInJoin`, because the rule only applies to sort merge join and shuffled hash join (i.e., `ShuffledJoin`).
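
The benefit of matching on the narrower interface can be sketched with a toy type hierarchy. All names below (`JoinRuleSketch`, `BaseJoin`, `ShuffledJoinLike`, and the join objects) are hypothetical stand-ins and are not Spark's classes; the sketch only shows how the narrower type makes the rule's scope explicit.

```scala
object JoinRuleSketch {
  // Hypothetical stand-ins for a physical join hierarchy; not Spark's classes.
  sealed trait BaseJoin
  sealed trait ShuffledJoinLike extends BaseJoin
  case object SortMergeJoin extends ShuffledJoinLike
  case object ShuffledHashJoin extends ShuffledJoinLike
  case object BroadcastHashJoin extends BaseJoin

  // Matching on the narrower trait states in the types themselves that the
  // rule only ever considers shuffle-based joins.
  def eligibleForBucketCoalescing(join: BaseJoin): Boolean = join match {
    case _: ShuffledJoinLike => true
    case _                   => false
  }
}
```

With the broader type, the restriction to shuffle-based joins would have to be enforced by extra checks inside the rule; with the narrower one, a broadcast join can never reach the rule's body.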

### Why are the changes needed?

Code cleanup.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test in `CoalesceBucketsInJoinSuite`.

Closes #32480 from c21/minor-cleanup.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 38eb5a6)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala
Commit 2c8ced95905b8a9f7b98c4913a385d25da455e7a by max.gekk
[SPARK-35111][SPARK-35112][SQL][FOLLOWUP] Rename ANSI interval patterns and regexps

### What changes were proposed in this pull request?
Rename pattern strings and regexps of year-month and day-time intervals.

### Why are the changes needed?
To improve code maintainability.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By existing test suites.

Closes #32444 from AngersZhuuuu/SPARK-35111-followup.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 2c8ced9)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/IntervalUtils.scala
Commit 245dce1ea16eb50c7a73f9abb9d12fec347c577d by wenchen
[SPARK-35261][SQL][TESTS][FOLLOW-UP] Change failOnError to false for NativeAdd in V2FunctionBenchmark

### What changes were proposed in this pull request?

Change `failOnError` to false for `NativeAdd` in `V2FunctionBenchmark`.

### Why are the changes needed?

Since `NativeAdd` simply adds two longs, it is better to set `failOnError` to false so that it uses native long addition instead of `Math.addExact`.
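
The behavioral difference between the two code paths can be sketched as below. `AddSketch.add` is a hypothetical helper, not the benchmark's `NativeAdd`; it only contrasts overflow-checked addition via `Math.addExact` with plain `+`, which is what the `failOnError` flag chooses between in the description above.

```scala
object AddSketch {
  // failOnError = true: Math.addExact throws ArithmeticException on overflow.
  // failOnError = false: plain `+` wraps around silently, with no overflow check.
  def add(a: Long, b: Long, failOnError: Boolean): Long =
    if (failOnError) Math.addExact(a, b) else a + b
}
```

Both paths return the same result when no overflow occurs, so for a benchmark that measures function-call overhead, skipping the overflow check keeps the measured work closer to a bare long addition.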

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

Closes #32481 from sunchao/SPARK-35261-follow-up.

Authored-by: Chao Sun <sunchao@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 245dce1)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/connector/functions/V2FunctionBenchmark.scala
The file was modified: sql/core/benchmarks/V2FunctionBenchmark-jdk11-results.txt
The file was modified: sql/core/benchmarks/V2FunctionBenchmark-results.txt
Commit 20d32242a2574637d18771c2f7de5c9878f85bad by dongjoon
[SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM

### What changes were proposed in this pull request?

This patch proposes to increase the maximum heap memory setting for the release build.

### Why are the changes needed?

When cutting RCs for 2.4.8, I frequently encountered OOM errors while building with mvn. This happened repeatedly until I increased the heap memory setting.

I am not sure whether other release managers encounter the same issue, so I propose increasing the heap memory setting and seeing whether it works well for others.

### Does this PR introduce _any_ user-facing change?

No, dev only.

### How was this patch tested?

Manually used it while cutting RCs for 2.4.8.

Closes #32487 from viirya/release-mvn-oom.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 20d3224)
The file was modified: dev/create-release/release-build.sh
Commit d808956be4aaded5afaa8ff34a2f7bae591c39bf by gurwls223
[MINOR][INFRA] Add python/.idea into git ignore

### What changes were proposed in this pull request?

This PR adds `python/.idea` to `.gitignore`. PyCharm is supposed to be opened against the `python` directory, which contains the `pyspark` package as its root package.

This was caused by https://github.com/apache/spark/pull/32337.

### Why are the changes needed?

To ignore PyCharm's `.idea` directory.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested with the `git` command.

Closes #32490 from HyukjinKwon/minor-python-gitignore.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: d808956)
The file was modified: .gitignore
Commit 7182f8cece0b18a199ccf448e0ff9ea42344cf62 by max.gekk
[SPARK-35360][SQL] RepairTableCommand respects `spark.sql.addPartitionInBatch.size` too

### What changes were proposed in this pull request?
`RepairTableCommand` now respects `spark.sql.addPartitionInBatch.size` as well.

### Why are the changes needed?
To make the batch size that `RepairTableCommand` uses when adding partitions configurable.

### Does this PR introduce _any_ user-facing change?
Users can set `spark.sql.addPartitionInBatch.size` to change the batch size used when repairing a table.
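
Conceptually, a batch-size setting like this caps how many partitions are sent to the catalog per call. The sketch below assumes a hypothetical `addBatch` callback standing in for the catalog request; `RepairTableSketch` and `addPartitionsInBatches` are illustrative names, not the code in `ddl.scala`.

```scala
object RepairTableSketch {
  // Hypothetical: splits discovered partitions into batches of at most
  // `batchSize` and passes each batch to `addBatch` (a stand-in for the
  // catalog call that adds partitions). Returns the number of batches sent.
  def addPartitionsInBatches[P](partitions: Seq[P], batchSize: Int)(addBatch: Seq[P] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    val batches = partitions.grouped(batchSize).toVector
    batches.foreach(addBatch)
    batches.size
  }
}
```

A smaller batch size means more, lighter requests; a larger one means fewer, heavier requests. Exposing it as a setting lets users pick the trade-off that suits their metastore.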

### How was this patch tested?
Not needed.

Closes #32489 from AngersZhuuuu/SPARK-35360.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 7182f8c)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Commit d2a535f85b14cd34174c8f3de5cb105964759fd6 by ltnwgl
[SPARK-34246][FOLLOWUP] Change the definition of `findTightestCommonType` for backward compatibility

### What changes were proposed in this pull request?

Change the definition of `findTightestCommonType` from
```
def findTightestCommonType(t1: DataType, t2: DataType): Option[DataType]
```
to
```
val findTightestCommonType: (DataType, DataType) => Option[DataType]
```

### Why are the changes needed?

For backward compatibility.
When running a MongoDB connector (built with Spark 3.1.1) against the latest master, the following error occurs:
```
java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.analysis.TypeCoercion$.findTightestCommonType()Lscala/Function2
```
It is thrown from the call at https://github.com/mongodb/mongo-spark/blob/master/src/main/scala/com/mongodb/spark/sql/MongoInferSchema.scala#L150

In the previous release, the compiled signature was:
```
static public  scala.Function2<org.apache.spark.sql.types.DataType, org.apache.spark.sql.types.DataType, scala.Option<org.apache.spark.sql.types.DataType>> findTightestCommonType ()
```
After https://github.com/apache/spark/pull/31349, it became:
```
static public  scala.Option<org.apache.spark.sql.types.DataType> findTightestCommonType (org.apache.spark.sql.types.DataType t1, org.apache.spark.sql.types.DataType t2)
```
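
The binary incompatibility can be reproduced in miniature. In this sketch, `CompatSketch`, `widerOf` and `widerOfDef` are hypothetical names, not `TypeCoercion` members. A `val` of function type compiles to a zero-argument accessor that returns `scala.Function2`, which matches the `findTightestCommonType()Lscala/Function2` descriptor in the error above, while the `def` form compiles to a two-argument method. Code compiled against one form therefore cannot link against the other.

```scala
object CompatSketch {
  // `val` of function type: compiled to a zero-argument accessor method
  // `widerOf()` whose return type is scala.Function2.
  val widerOf: (Int, Int) => Option[Int] = (a, b) => Some(math.max(a, b))

  // `def` form: compiled to a method taking two parameters, so callers
  // compiled against the `val` form fail with NoSuchMethodError.
  def widerOfDef(a: Int, b: Int): Option[Int] = Some(math.max(a, b))
}
```

At the source level both are called as `CompatSketch.widerOf(1, 2)` or `CompatSketch.widerOfDef(1, 2)`, which is why such a change looks harmless while still breaking already-compiled callers.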

This PR avoids the unnecessary API change.

### Does this PR introduce _any_ user-facing change?

Yes. The definition of `TypeCoercion.findTightestCommonType` is consistent with the previous release again.

### How was this patch tested?

Existing unit tests

Closes #32493 from gengliangwang/typecoercion.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Gengliang Wang <ltnwgl@gmail.com>
(commit: d2a535f)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AnsiTypeCoercion.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
Commit 8b94eff1ca30a99569738385495ed855c4eb44a0 by piros.attila.zsolt
[SPARK-34736][K8S][TESTS] Kubernetes and Minikube version upgrade for integration tests

### What changes were proposed in this pull request?

This PR upgrades the Kubernetes and Minikube versions used for integration tests and removes or updates the old code accordingly.

Details of these changes:

- As [discussed in the mailing list](http://apache-spark-developers-list.1001551.n3.nabble.com/minikube-and-kubernetes-cluster-versions-for-integration-testing-td30856.html): updating the Minikube version from v0.34.1 to v1.7.3 and the Kubernetes version from v1.15.12 to v1.17.3.
- checking the Minikube version and failing with an explanation when the tests are started on a version older than v1.7.3.
- removing Minikube status-checking code related to old Minikube versions.
- in the Minikube backend, using fabric8's `Config.autoConfigure()` method to configure the Kubernetes client to use the `minikube` k8s context (as in [one of the fabric8 kubernetes-client examples](https://github.com/fabric8io/kubernetes-client/blob/master/kubernetes-examples/src/main/java/io/fabric8/kubernetes/examples/kubectl/equivalents/ConfigUseContext.java#L36)).
- introducing a `persistentVolume` test tag: this is a temporary change to skip the PVC tests in the Kubernetes integration tests, as the PVC tests currently block the move to Docker as Minikube's driver (for details, see https://issues.apache.org/jira/browse/SPARK-34738).
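
A version check of this kind can be sketched as follows, assuming version output shaped like `minikube version: v1.7.2` (as in the log further down). `MinikubeVersionSketch`, `parse` and `isSupported` are hypothetical names; the actual check lives in `Minikube.scala` and may parse the output differently.

```scala
object MinikubeVersionSketch {
  private val VersionPattern = """v?(\d+)\.(\d+)\.(\d+)""".r

  // Extracts (major, minor, patch) from output such as "minikube version: v1.7.2".
  def parse(output: String): Option[(Int, Int, Int)] =
    VersionPattern.findFirstMatchIn(output).map { m =>
      (m.group(1).toInt, m.group(2).toInt, m.group(3).toInt)
    }

  // Tuple ordering compares major, then minor, then patch, numerically.
  def isSupported(output: String, minimum: (Int, Int, Int) = (1, 7, 3)): Boolean =
    parse(output).exists(v => Ordering[(Int, Int, Int)].gteq(v, minimum))
}
```

Comparing numeric components rather than raw strings matters here: as strings, `"1.17.0"` sorts before `"1.7.3"`, which would wrongly reject newer Minikube releases.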

### Why are the changes needed?

With the currently suggested versions, one can run into several problems without realizing that the Minikube/Kubernetes version is the cause.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

It was tested on Mac with [this script](https://gist.github.com/attilapiros/cd58a16bdde833c80c5803c337fffa94#file-check_minikube_versions-zsh), which installs each Minikube version starting from v1.7.2 (included to test the negative case of the version check) and runs the integration tests.

It was started with:
```
./check_minikube_versions.zsh > test_log 2>&1
```

There was only one build failure; the rest were successful:

```
$ grep "BUILD SUCCESS" test_log | wc -l
      26
$ grep "BUILD FAILURE" test_log | wc -l
       1
```

The failure was for Minikube v1.7.2, and the log is:

```
KubernetesSuite:
*** RUN ABORTED ***
  java.lang.AssertionError: assertion failed: Unsupported Minikube version is detected: minikube version: v1.7.2.For integration testing Minikube version 1.7.3 or greater is expected.
  at scala.Predef$.assert(Predef.scala:223)
  at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.getKubernetesClient(Minikube.scala:52)
  at org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend$.initialize(MinikubeTestBackend.scala:33)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:163)
  at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
  at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
  at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
  at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.org$scalatest$BeforeAndAfter$$super$run(KubernetesSuite.scala:43)
  at org.scalatest.BeforeAndAfter.run(BeforeAndAfter.scala:273)
  at org.scalatest.BeforeAndAfter.run$(BeforeAndAfter.scala:271)
  ...
```

I also tested with multiple k8s cluster contexts.

Closes #31829 from attilapiros/SPARK-34736.

Lead-authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Co-authored-by: attilapiros <piros.attila.zsolt@gmail.com>
Signed-off-by: attilapiros <piros.attila.zsolt@gmail.com>
(commit: 8b94eff)
The file was modified: resource-managers/kubernetes/integration-tests/dev/dev-run-integration-tests.sh
The file was modified: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala
The file was modified: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/PVTestsSuite.scala
The file was modified: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
The file was modified: resource-managers/kubernetes/integration-tests/README.md
Commit 44bd0a8bd36a7bc39d41cee690e242400d4a4869 by gurwls223
[SPARK-35088][SQL][FOLLOWUP] Improve the error message for Sequence expression

### What changes were proposed in this pull request?
The error message output by the `Sequence` expression is confusing.
This PR fixes it.

### Why are the changes needed?
To improve the error message of the `Sequence` expression.

### Does this PR introduce _any_ user-facing change?
Yes. This PR updates the error message of the `Sequence` expression.

### How was this patch tested?
Tests updated.

Closes #32492 from beliefer/SPARK-35088-followup.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 44bd0a8)
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
Commit c4ca23207b860f28e1828bd4945d4991b0a09049 by yamamuro
[SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join type

### What changes were proposed in this pull request?

This is a prerequisite of https://github.com/apache/spark/pull/32476, as discussed in https://github.com/apache/spark/pull/32476#issuecomment-836469779. It refactors the sort merge join code-gen to use streamed/buffered terminology instead. This makes the code-gen agnostic to the join type, so it can be extended to support join types other than inner join.
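
The streamed/buffered split can be illustrated with a plain, interpreted inner sort merge join over inputs that are already sorted by key. This is a minimal sketch, not Spark's generated code, and `SortMergeJoinSketch` is a hypothetical name. The streamed side is scanned once. For each new streamed key, the run of matching buffered rows is collected into a buffer and reused for every streamed row with that key.

```scala
object SortMergeJoinSketch {
  // Inner sort merge join; both inputs must already be sorted by key.
  def join[K, A, B](streamed: IndexedSeq[(K, A)], buffered: IndexedSeq[(K, B)])
                   (implicit ord: Ordering[K]): Seq[(K, A, B)] = {
    val out = Vector.newBuilder[(K, A, B)]
    var j = 0                              // cursor into the buffered side
    var bufferKey: Option[K] = None        // key of the currently buffered run
    var buffer: Vector[B] = Vector.empty   // buffered rows matching bufferKey
    for ((k, a) <- streamed) {
      if (!bufferKey.contains(k)) {
        // Skip buffered rows whose key is smaller than the streamed key.
        while (j < buffered.length && ord.lt(buffered(j)._1, k)) j += 1
        // Collect the run of buffered rows whose key equals the streamed key.
        val run = Vector.newBuilder[B]
        while (j < buffered.length && ord.equiv(buffered(j)._1, k)) { run += buffered(j)._2; j += 1 }
        buffer = run.result()
        bufferKey = Some(k)
      }
      // Emit one output row per matching buffered row.
      buffer.foreach(b => out += ((k, a, b)))
    }
    out.result()
  }
}
```

Because the loop refers only to the streamed and buffered sides rather than to left and right, a join type that needs the opposite orientation (for example, one where the right child is streamed) could map its children onto these two roles without changing the loop.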

### Why are the changes needed?

Pre-requisite of https://github.com/apache/spark/pull/32476.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test in `InnerJoinSuite.scala` for inner join code-gen.

Closes #32495 from c21/smj-refactor.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: c4ca232)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledJoin.scala
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala