Changes

Summary

  1. [SPARK-33435][SQL] DSv2: REFRESH TABLE should invalidate caches referencing the table (commit: cf3b655)
  2. [SPARK-33259][SS] Disable streaming query with possible correctness issue by default (commit: 2c64b73)
Commit cf3b6551ce010a5503d6c624e313690cd2058855 by dongjoon
[SPARK-33435][SQL] DSv2: REFRESH TABLE should invalidate caches
referencing the table
### What changes were proposed in this pull request?
This changes `RefreshTableExec` in DSv2 to also invalidate caches that
reference the table being refreshed. The change itself is similar to
what was done in #30211. Note, though, that since we currently don't
support caching a DSv2 table directly, this doesn't add recache logic
as in the DSv1 implementation. I marked it as a TODO for now.
### Why are the changes needed?
Currently the behavior in DSv1 and DSv2 is inconsistent with respect to
refreshing a table: in DSv1 we invalidate both the metadata cache and
all table caches related to the table, but in DSv2 we only do the
former. This change addresses the issue and makes the behavior
consistent.
### Does this PR introduce _any_ user-facing change?
Yes, refreshing a v2 table now also invalidates all the related caches.
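
The difference between the old and new DSv2 behavior can be sketched with a toy model. This is not Spark's implementation: `ToySession`, `refresh_table`, the `invalidate_dependents` flag, and the table and query names are all hypothetical, chosen only to illustrate "drop the metadata entry" versus "also drop every cache that references the table". In line with the commit message, nothing is recached.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for a session's caches; not a Spark class."""

    # table name -> cached table metadata
    metadata_cache: dict = field(default_factory=dict)
    # cached query name -> set of table names its plan reads from
    cached_queries: dict = field(default_factory=dict)

    def refresh_table(self, table: str, invalidate_dependents: bool) -> None:
        # Both the old and the new behavior drop the cached metadata.
        self.metadata_cache.pop(table, None)
        if invalidate_dependents:
            # New behavior: also drop every cached query whose plan
            # references the refreshed table. Nothing is recached.
            self.cached_queries = {
                name: tables
                for name, tables in self.cached_queries.items()
                if table not in tables
            }


def make_session() -> ToySession:
    return ToySession(
        metadata_cache={"testcat.ns.t": "schema-v1"},
        cached_queries={
            "view_on_t": {"testcat.ns.t"},
            "view_on_u": {"testcat.ns.u"},
        },
    )


# Old DSv2 behavior: only the metadata cache entry is dropped.
before = make_session()
before.refresh_table("testcat.ns.t", invalidate_dependents=False)

# New DSv2 behavior (consistent with DSv1): dependent caches go too.
after = make_session()
after.refresh_table("testcat.ns.t", invalidate_dependents=True)

print(sorted(before.cached_queries))  # ['view_on_t', 'view_on_u']
print(sorted(after.cached_queries))   # ['view_on_u']
```

In the old behavior the stale `view_on_t` entry survives the refresh, which is the inconsistency this change removes.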
### How was this patch tested?
Added a new unit test.
Closes #30359 from sunchao/SPARK-33435.
Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: cf3b655)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/RefreshTableExec.scala
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
Commit 2c64b731ae6a976b0d75a95901db849b4a0e2393 by dongjoon
[SPARK-33259][SS] Disable streaming query with possible correctness
issue by default
### What changes were proposed in this pull request?
This patch proposes to disable, by default, streaming queries that have
a possible correctness issue in chained stateful operators. The
behavior is controlled by a SQL config, so users who understand the
risk and still want to run the query can disable the check.
### Why are the changes needed?
The possible correctness issue with chained stateful operators in a
streaming query is not obvious to users. From the user's perspective,
it looks like a Spark bug. Worse, users may not be aware of the
correctness issue at all and may rely on wrong results.
A better approach is to disable such queries and let users choose to
run them if they understand the risk, instead of implicitly running the
query and leaving users to discover the correctness issue themselves
and report this known issue to the Spark community.
### Does this PR introduce _any_ user-facing change?
Yes. Streaming queries with a possible correctness issue are blocked
from running unless users explicitly disable the SQL config.
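
The opt-out behavior can be sketched with a toy checker. This is a simplified illustration, not the logic in `UnsupportedOperationChecker`: the operator names, `ToyAnalysisError`, and `check_chained_stateful` are hypothetical, and the sketch treats any two stateful operators in one plan as a risky chain, whereas Spark's actual rule is likely more selective. The commit message does not name the SQL config; the key below is the one I believe the Structured Streaming migration guide documents, so verify it against the Spark version in use.

```python
# Assumed config key (not stated in the commit message); verify before use.
CHECK_KEY = "spark.sql.streaming.statefulOperator.checkCorrectness.enabled"

# Hypothetical operator names used only by this sketch.
STATEFUL = {"aggregate", "stream_stream_join", "flat_map_groups_with_state"}


class ToyAnalysisError(Exception):
    """Hypothetical stand-in for an analysis-time failure."""


def check_chained_stateful(operators, conf):
    """Check a plan given as operator names ordered upstream to downstream."""
    stateful_ops = [op for op in operators if op in STATEFUL]
    if len(stateful_ops) < 2:
        return "ok"
    # The check is on by default; users must opt out explicitly.
    if conf.get(CHECK_KEY, True):
        raise ToyAnalysisError(
            "chained stateful operators may produce incorrect results: "
            + " -> ".join(stateful_ops)
        )
    return "warning"


plan = ["source", "stream_stream_join", "aggregate", "sink"]

# Default configuration: the query is rejected.
try:
    check_chained_stateful(plan, {})
    blocked = False
except ToyAnalysisError:
    blocked = True

# Check explicitly disabled: the query runs, at the user's own risk.
allowed = check_chained_stateful(plan, {CHECK_KEY: False})

print(blocked)  # True
print(allowed)  # warning
```

Defaulting the check to enabled means a user has to make a deliberate choice before running a query whose results may be wrong.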
### How was this patch tested?
Unit test.
Closes #30210 from viirya/SPARK-33259.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 2c64b73)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala
The file was modified: docs/ss-migration-guide.md
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala