Changes

Summary

  1. [SPARK-34676][SQL][TEST] TableCapabilityCheckSuite should not inherit (commit: 48377d5) (details)
  2. [SPARK-34681][SQL] Fix bug for full outer shuffled hash join when (commit: a916690) (details)
  3. [SPARK-34682][SQL] Fix regression in canonicalization error check in (commit: fd48438) (details)
  4. [SPARK-34682][SQL] Use PrivateMethodTester instead of reflection (commit: fc182f7) (details)
  5. [SPARK-34507][BUILD] Update scala.version in parent POM when changing (commit: 5e120e4) (details)
  6. [SPARK-34546][SQL] AlterViewAs.query should be analyzed during the (commit: 2a6e68e) (details)
  7. [SPARK-34457][SQL] DataSource V2: Add default null ordering to (commit: 7226379) (details)
  8. [SPARK-34711][SQL][TESTS] Exercise code-gen enable/disable code paths (commit: 9aa8f06) (details)
  9. [MINOR][SQL] Remove unnecessary extend from BroadcastHashJoinExec (commit: 14ad7af) (details)
  10. [SPARK-34696][SQL][TESTS] Fix CodegenInterpretedPlanTest to generate (commit: 5c4d8f9) (details)
  11. [SPARK-34677][SQL] Support the `+`/`-` operators over ANSI SQL intervals (commit: 9d3d25b) (details)
Commit 48377d5bd9544baf7df928aa315df2504c062ac2 by dhyun
[SPARK-34676][SQL][TEST] TableCapabilityCheckSuite should not inherit all tests from AnalysisSuite

### What changes were proposed in this pull request?

Fixes a mistake in `TableCapabilityCheckSuite`, which runs some tests repeatedly.

### Why are the changes needed?

code cleanup

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #31788 from cloud-fan/minor.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 48377d5)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/connector/TableCapabilityCheckSuite.scala (diff)
Commit a916690dd9aac40df38922dbea233785354a2f2a by dhyun
[SPARK-34681][SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition

### What changes were proposed in this pull request?

For full outer shuffled hash join with building hash map on left side, and having non-equal condition, the join can produce wrong result.

The root cause is `boundCondition` in `HashJoin.scala` always assumes the left side row is `streamedPlan` and right side row is `buildPlan` ([streamedPlan.output ++ buildPlan.output](https://github.com/apache/spark/blob/branch-3.1/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala#L141)). This is valid assumption, except for full outer + build left case.

The fix is to correct `boundCondition` in `HashJoin.scala` to handle full outer + build left case properly. See reproduce in https://issues.apache.org/jira/browse/SPARK-32399?focusedCommentId=17298414&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17298414 .

### Why are the changes needed?

Fix data correctness bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Changed the test in `OuterJoinSuite.scala` to cover full outer shuffled hash join.
Before this change, the unit test `basic full outer join using ShuffledHashJoin` in `OuterJoinSuite.scala` is failed.

Closes #31792 from c21/join-bugfix.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: a916690)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/joins/OuterJoinSuite.scala (diff)
Commit fd4843803c4670c656a94c1af652fb4b945bc82c by gurwls223
[SPARK-34682][SQL] Fix regression in canonicalization error check in CustomShuffleReaderExec

### What changes were proposed in this pull request?
There is a regression in 3.1.1 compared to 3.0.2 when checking for a canonicalized plan when executing CustomShuffleReaderExec.

The regression was caused by the call to `sendDriverMetrics` which happens before the check and will always fail if the plan is canonicalized.

### Why are the changes needed?
This is a regression in a useful error check.

### Does this PR introduce _any_ user-facing change?
No. This is not an error that a user would typically see, as far as I know.

### How was this patch tested?
I tested this change locally by making a distribution from this PR branch. Before fixing the regression I saw:

```
java.util.NoSuchElementException: key not found: numPartitions
```

After fixing this regression I saw:

```
java.lang.IllegalStateException: operating on canonicalized plan
```

Closes #31793 from andygrove/SPARK-34682.

Lead-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: Andy Grove <andygrove@nvidia.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: fd48438)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CustomShuffleReaderExec.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
Commit fc182f7e7f9ff55a6a005044ae0968340cf6f30d by dhyun
[SPARK-34682][SQL] Use PrivateMethodTester instead of reflection

### Why are the changes needed?
SPARK-34682 was merged prematurely. This PR implements feedback from the review. I wasn't sure whether I should create a new JIRA or not.

### Does this PR introduce _any_ user-facing change?
No. Just improves the test.

### How was this patch tested?
Updated test.

Closes #31798 from andygrove/SPARK-34682-follow-up.

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: fc182f7)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
Commit 5e120e4651687256b74f88ac9cfdbccd5dc16260 by gurwls223
[SPARK-34507][BUILD] Update scala.version in parent POM when changing Scala version for cross-build

### What changes were proposed in this pull request?

The `change-scala-version.sh` script updates Scala versions across the build for cross-build purposes. It manually changes `scala.binary.version` but not `scala.version`.

### Why are the changes needed?

It seems that this has always been an oversight, and the cross-built builds of Spark have an incorrect scala.version. See 2.4.5's 2.12 POM for example, which shows a Scala 2.11 version.
https://search.maven.org/artifact/org.apache.spark/spark-core_2.12/2.4.5/pom

More comments in the JIRA.

### Does this PR introduce _any_ user-facing change?

Should be a build-only bug fix.

### How was this patch tested?

Existing tests, but really N/A

Closes #31801 from srowen/SPARK-34507.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 5e120e4)
The file was modifieddev/change-scala-version.sh (diff)
Commit 2a6e68e1f7d458a82fb95f5bee49b0ca63e9270f by wenchen
[SPARK-34546][SQL] AlterViewAs.query should be analyzed during the analysis phase, and AlterViewAs should invalidate the cache

### What changes were proposed in this pull request?

This PR proposes the following:
   * `AlterViewAs.query` is currently analyzed in the physical operator `AlterViewAsCommand`, but it should be analyzed during the analysis phase.
   *  When `spark.sql.legacy.storeAnalyzedPlanForView` is set to true, store `TermporaryViewRelation` which wraps the analyzed plan, similar to #31273.
   *  Try to uncache the view you are altering.

### Why are the changes needed?

Analyzing a plan should be done in the analysis phase if possible.

Not uncaching the view (existing behavior) seems like a bug since the cache may not be used again.

### Does this PR introduce _any_ user-facing change?

Yes, now the view can be uncached if it's already cached.

### How was this patch tested?

Added new tests around uncaching.

The existing tests such as `SQLViewSuite` should cover the analysis changes.

Closes #31652 from imback82/alter_view_child.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 2a6e68e)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisTest.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
Commit 72263797bc4abcaa27247028ac373251d66a1d00 by wenchen
[SPARK-34457][SQL] DataSource V2: Add default null ordering to SortDirection

### What changes were proposed in this pull request?

This PR adds a default null ordering to public `SortDirection` to match the Catalyst behavior.

### Why are the changes needed?

The SQL standard does not define the default null ordering for a sort direction. That's why it is up to a query engine to assign one. We need to standardize this in our public connector expressions to avoid ambiguity. That's why I propose to match the behavior in our Catalyst expressions.

### Does this PR introduce _any_ user-facing change?

Yes, it affects unreleased connector expression API.

### How was this patch tested?

Existing tests.

Closes #31580 from aokolnychyi/spark-34457.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 7226379)
The file was modifiedsql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/SortDirection.java (diff)
The file was modifiedsql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/Expressions.java (diff)
Commit 9aa8f06313f53747fc30e7e6719e31693d2cd3f0 by dhyun
[SPARK-34711][SQL][TESTS] Exercise code-gen enable/disable code paths for SHJ in join test suites

### What changes were proposed in this pull request?

Per comment in https://github.com/apache/spark/pull/31802#discussion_r592068440 , we would like to exercise whole stage code-gen enabled and disabled code paths in join unit test suites. This is for better test coverage of shuffled hash join.

### Why are the changes needed?

Better test coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing and added unit tests here.

Closes #31806 from c21/test-minor.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 9aa8f06)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/joins/OuterJoinSuite.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/joins/ExistenceJoinSuite.scala (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/execution/joins/InnerJoinSuite.scala (diff)
Commit 14ad7afa1aa0f3bfd75f1bf076a27af792721190 by dhyun
[MINOR][SQL] Remove unnecessary extend from BroadcastHashJoinExec

### What changes were proposed in this pull request?

This is just a minor fix. `HashJoin` already extends `JoinCodegenSupport`. So we don't need `CodegenSupport` here for `BroadcastHashJoinExec`. Submitted separately as a PR here per https://github.com/apache/spark/pull/31802#discussion_r592066686 .

### Why are the changes needed?

Clean up code.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

Closes #31805 from c21/bhj-minor.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 14ad7af)
The file was modifiedsql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala (diff)
Commit 5c4d8f95385ac97a66e5b491b5883ec770ae85bd by dhyun
[SPARK-34696][SQL][TESTS] Fix CodegenInterpretedPlanTest to generate correct test cases

### What changes were proposed in this pull request?

SPARK-23596 added `CodegenInterpretedPlanTest` at Apache Spark 2.4.0 in a wrong way because `withSQLConf` depends on the execution time `SQLConf.get` instead of `test` function declaration time. So, the following code executes the test twice without controlling the `CodegenObjectFactoryMode`. This PR aims to fix it correct and introduce a new function `testFallback`.

```scala
trait CodegenInterpretedPlanTest extends PlanTest {

   override protected def test(
       testName: String,
       testTags: Tag*)(testFun: => Any)(implicit pos: source.Position): Unit = {
     val codegenMode = CodegenObjectFactoryMode.CODEGEN_ONLY.toString
     val interpretedMode = CodegenObjectFactoryMode.NO_CODEGEN.toString

     withSQLConf(SQLConf.CODEGEN_FACTORY_MODE.key -> codegenMode) {
       super.test(testName + " (codegen path)", testTags: _*)(testFun)(pos)
     }
     withSQLConf(SQLConf.CODEGEN_FACTORY_MODE.key -> interpretedMode) {
       super.test(testName + " (interpreted path)", testTags: _*)(testFun)(pos)
     }
   }
}
```

### Why are the changes needed?

1. We need to use like the following.
```scala
super.test(testName + " (codegen path)", testTags: _*)(
   withSQLConf(SQLConf.CODEGEN_FACTORY_MODE.key -> codegenMode) { testFun })(pos)
super.test(testName + " (interpreted path)", testTags: _*)(
   withSQLConf(SQLConf.CODEGEN_FACTORY_MODE.key -> interpretedMode) { testFun })(pos)
```

2. After we fix this behavior with the above code, several test cases including SPARK-34596 and SPARK-34607 fail because they didn't work at both `CODEGEN` and `INTERPRETED` mode. Those test cases only work at `FALLBACK` mode. So, inevitably, we need to introduce `testFallback`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #31766 from dongjoon-hyun/SPARK-34596-SPARK-34607.

Lead-authored-by: Dongjoon Hyun <dhyun@apple.com>
Co-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 5c4d8f9)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala (diff)
Commit 9d3d25bca4d087d4e509c4fc83cf960397aebe9b by wenchen
[SPARK-34677][SQL] Support the `+`/`-` operators over ANSI SQL intervals

### What changes were proposed in this pull request?
Extend the `Add`, `Subtract` and `UnaryMinus` expression to support `DayTimeIntervalType` and `YearMonthIntervalType` added by #31614.

Note: the expressions can throw the `overflow` exception independently from the SQL config `spark.sql.ansi.enabled`. In this way, the modified expressions always behave in the ANSI mode for the intervals.

### Why are the changes needed?
To conform to the ANSI SQL standard which defines `-/+` over intervals:
<img width="822" alt="Screenshot 2021-03-09 at 21 59 22" src="https://user-images.githubusercontent.com/1580697/110523128-bd50ea80-8122-11eb-9982-782da0088d27.png">

### Does this PR introduce _any_ user-facing change?
Should not since new types have not been released yet.

### How was this patch tested?
By running new tests in the test suites:
```
$ build/sbt "test:testOnly *ArithmeticExpressionSuite"
$ build/sbt "test:testOnly *ColumnExpressionSuite"
```

Closes #31789 from MaxGekk/add-subtruct-intervals.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 9d3d25b)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeTestUtils.scala (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/results/ansi/literals.sql.out (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/results/typeCoercion/native/windowFrameCoercion.sql.out (diff)
The file was modifiedsql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala (diff)
The file was modifiedsql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ExpressionTypeCheckingSuite.scala (diff)
The file was modifiedsql/core/src/test/resources/sql-tests/results/literals.sql.out (diff)
The file was modifiedsql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala (diff)