Changes

Summary

  1. [SPARK-33290][SQL] REFRESH TABLE should invalidate cache even though the table itself may not be cached (commit: 32b78d3)
  2. [SPARK-33293][SQL] Refactor WriteToDataSourceV2Exec and reduce code duplication (commit: c51e5fc)
Commit 32b78d3795d5c4fd533b0267647977ed4f02ee49 by dhyun
[SPARK-33290][SQL] REFRESH TABLE should invalidate cache even though the
table itself may not be cached
### What changes were proposed in this pull request?
In `CatalogImpl.refreshTable`, this moves the `uncacheQuery` call out of
the condition `if (cache.nonEmpty)` so that it will be called whether
the table itself is cached or not.
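
The effect of moving the call can be illustrated with a toy model. This is a minimal sketch and not Spark's code: `ToyCacheManager`, `uncache_cascade`, `refresh_table_old`, and `refresh_table_new` are hypothetical names invented for the illustration, and whatever else remains inside the `if (cache.nonEmpty)` block is left out.

```python
class ToyCacheManager:
    """Hypothetical stand-in for a cache manager; not Spark's real API."""

    def __init__(self):
        # cache entry name -> set of table names the cached plan reads from
        self.cached = {}

    def cache(self, name, depends_on):
        self.cached[name] = set(depends_on)

    def lookup(self, name):
        return name in self.cached

    def uncache_cascade(self, table):
        """Drop every cache entry whose plan reads `table`; return dropped names."""
        dropped = sorted(n for n, deps in self.cached.items() if table in deps)
        for n in dropped:
            del self.cached[n]
        return dropped


def refresh_table_old(mgr, table):
    # Old shape: the uncache call sits inside the "table is cached" condition,
    # so dependents survive when the table itself is not cached.
    if mgr.lookup(table):
        return mgr.uncache_cascade(table)
    return []


def refresh_table_new(mgr, table):
    # New shape: the uncache call runs whether or not the table is cached.
    return mgr.uncache_cascade(table)
```

With only a dependent entry cached (say `t1` reading from `t`), `refresh_table_old(mgr, "t")` drops nothing, while `refresh_table_new(mgr, "t")` drops `t1`.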
### Why are the changes needed?
In a case like the following:
```sql
CREATE TABLE t ...;
CREATE VIEW t1 AS SELECT * FROM t;
REFRESH TABLE t;
```
If the table `t` is refreshed, the view `t1`, which depends on `t`,
will not be invalidated. This could lead to incorrect results and is
similar to
[SPARK-19765](https://issues.apache.org/jira/browse/SPARK-19765).
On the other hand, if we have:
```sql
CREATE TABLE t ...;
CACHE TABLE t;
CREATE VIEW t1 AS SELECT * FROM t;
REFRESH TABLE t;
```
Then the view `t1` will be refreshed. The behavior is somewhat
inconsistent.
### Does this PR introduce _any_ user-facing change?
Yes. With this change, any cache that depends on the refreshed table
will be invalidated. Previously this only happened if the table itself
was cached.
### How was this patch tested?
Added a new UT for the case.
Closes #30187 from sunchao/SPARK-33290.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 32b78d3)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
Commit c51e5fc14b9d1d120afcf0e53714ccba5063b71e by dhyun
[SPARK-33293][SQL] Refactor WriteToDataSourceV2Exec and reduce code
duplication
### What changes were proposed in this pull request?
Refactor `WriteToDataSourceV2Exec` by removing code duplication around
the write-to-table logic:
- renamed `AtomicTableWriteExec` to `TableWriteExec` so that the table
write logic in this trait can be modified and shared with
`CreateTableAsSelectExec`, `ReplaceTableAsSelectExec`,
`AtomicCreateTableAsSelectExec` and `AtomicReplaceTableAsSelectExec`.
- similar to the above, renamed `writeToStagedTable` to `writeToTable`
in `TableWriteExec`.
- extended `writeToTable` so that it can handle both staged and
non-staged tables.
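
The idea of one write path serving both kinds of table can be sketched with a toy model. Everything here is hypothetical and is not the Spark implementation: the `PlainTable` and `StagedTable` classes, their `commit`/`abort` methods, and `write_to_table` are illustrative names, and the assumption that a staged table needs an explicit commit on success and an abort on failure belongs to this sketch rather than to the commit message.

```python
class PlainTable:
    """Hypothetical non-staged table: rows count as soon as they are written."""

    def __init__(self):
        self.rows = []

    def write(self, rows):
        self.rows.extend(rows)


class StagedTable(PlainTable):
    """Hypothetical staged table: writes only count after commit()."""

    def __init__(self):
        super().__init__()
        self.committed = False
        self.aborted = False

    def commit(self):
        self.committed = True

    def abort(self):
        self.rows.clear()
        self.aborted = True


def write_to_table(table, rows):
    """Single write path shared by staged and non-staged tables."""
    try:
        table.write(rows)
        if isinstance(table, StagedTable):
            table.commit()  # extra step needed only for staged tables
    except Exception:
        if isinstance(table, StagedTable):
            table.abort()  # discard the staged changes on failure
        raise
    return len(table.rows)
```

Callers no longer need separate helpers for the two cases: the staged-only steps are the only branch points inside the shared function.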
### Why are the changes needed?
Simplify the logic and remove duplication, to make this piece of code
easier to maintain.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass CIs with the existing test coverage.
Closes #30193 from sunchao/SPARK-33293.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: c51e5fc)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala