Changes

Summary

  1. [MINOR][SQL] Fix a typo at 'spark.sql.sources.fileCompressionFactor' (commit: fe2ab25)
  2. [SPARK-32853][SQL] Consecutive save/load calls in (commit: 9f4f49c)
  3. [SPARK-32730][SQL][FOLLOW-UP] Improve LeftAnti SortMergeJoin right side (commit: 94cac59)
  4. [SPARK-32180][PYTHON][DOCS] Installation page of Getting Started in (commit: f6322d1)
  5. [SPARK-32845][SS][TESTS] Add sinkParameter to check sink options (commit: b4be6a6)
  6. [SPARK-32851][SQL][TEST] Tests should fail if errors happen when (commit: 4269c2c)
  7. [SPARK-32180][FOLLOWUP] Fix .rst error in new Pyspark installation guide (commit: ce566be)
Commit fe2ab255d14bbccb72b95ed776b74e86cb9762b6 by srowen
[MINOR][SQL] Fix a typo at 'spark.sql.sources.fileCompressionFactor'
error message in SQLConf
### What changes were proposed in this pull request?
fix typo in SQLConf
### Why are the changes needed?
typo fix to increase readability
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
no test
Closes #29668 from Ted-Jiang/fix_annotate.
Authored-by: yangjiang <yangjiang@ebay.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: fe2ab25)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Commit 9f4f49cbaa3def9f7d8573629ff3b6cbd6833b2f by dongjoon
[SPARK-32853][SQL] Consecutive save/load calls in
DataFrame/StreamReader/Writer should not fail
### What changes were proposed in this pull request?
This is a follow-up to https://github.com/apache/spark/pull/29328.
In https://github.com/apache/spark/pull/29328, we forbade specifying both
the path option and the path parameter. However, that breaks some use
cases:
```
val dfr = spark.read.format(...).option(...)
dfr.load(path1).xxx
dfr.load(path2).xxx
```
The reason is that `load` has a side effect: it sets the path option on
the `DataFrameReader` instance. The next time you call `load`, Spark
fails because both the path option and the path parameter are specified.
This PR removes that side effect from `save`/`load`/`start`, so they no
longer set the path option.
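The failure mode can be illustrated with a toy model. This is only a sketch: `ToyReader` and its `leakPathIntoOptions` flag are hypothetical stand-ins invented for illustration, not Spark's `DataFrameReader`, and the check is a simplified version of the one described above.

```scala
// Toy model only: ToyReader is a hypothetical stand-in, not Spark's DataFrameReader.
final class ToyReader(leakPathIntoOptions: Boolean) {
  // Options configured by the caller through option(...).
  private var options: Map[String, String] = Map.empty

  def option(key: String, value: String): ToyReader = {
    options = options + (key -> value)
    this
  }

  // Returns the options a data source would see for this particular load.
  def load(path: String): Map[String, String] = {
    // Simplified check: a path option and a path parameter must not both be set.
    require(!options.contains("path"),
      "both the path option and the path parameter are specified")
    val effective = options + ("path" -> path)
    // Old behaviour in this model: remember the path on the reader (the side effect).
    if (leakPathIntoOptions) options = effective
    effective
  }
}
```

With `leakPathIntoOptions = true`, a second `load` on the same reader throws; with `false`, the reader can be reused for any number of paths.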
### Why are the changes needed?
recover some use cases
### Does this PR introduce _any_ user-facing change?
Yes, some use cases fail before this PR, and can run successfully after
this PR.
### How was this patch tested?
new tests
Closes #29723 from cloud-fan/df.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 9f4f49c)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamReaderWriterSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala
Commit 94cac5978cf33f99a9f28180c9c909d5c884c152 by wenchen
[SPARK-32730][SQL][FOLLOW-UP] Improve LeftAnti SortMergeJoin right side
buffering
### What changes were proposed in this pull request?
This is a follow-up to https://github.com/apache/spark/pull/29572.
LeftAnti SortMergeJoin should not buffer all matching right side rows
when the bound condition is empty. Buffering them is unnecessary and can
lead to performance degradation, especially when spilling happens.
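The idea can be sketched on plain sorted integer keys. This is an illustrative toy, not the `SortMergeJoinExec` code; `AntiJoinSketch` and `leftAntiSorted` are hypothetical names, and both inputs are assumed to be sorted in ascending order.

```scala
// Toy sketch: left anti join on sorted integer keys with no extra join condition.
// Hypothetical helper for illustration; not taken from SortMergeJoinExec.
object AntiJoinSketch {
  def leftAntiSorted(left: Seq[Int], right: Seq[Int]): Seq[Int] = {
    val out = Seq.newBuilder[Int]
    val r = right.iterator.buffered
    for (key <- left) {
      // Skip right rows with smaller keys; they cannot match this or any later left row.
      while (r.hasNext && r.head < key) r.next()
      // With no condition to evaluate, one equal key decides the outcome,
      // so matching right rows are never collected into a buffer.
      val matched = r.hasNext && r.head == key
      if (!matched) out += key
    }
    out.result()
  }
}
```

A left row is emitted only when no right row shares its key, and the right side is consumed through a single forward pass.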
### Why are the changes needed?
Performance improvement.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New UT.
Closes #29727 from peter-toth/SPARK-32730-improve-leftsemi-sortmergejoin-followup.
Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 94cac59)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
Commit f6322d1cb149983fbcd5b90a804eeda0fe4e8a49 by srowen
[SPARK-32180][PYTHON][DOCS] Installation page of Getting Started in
PySpark documentation
### What changes were proposed in this pull request?
This PR proposes to add the Getting Started installation page to the new
PySpark docs.
### Why are the changes needed?
Better documentation.
### Does this PR introduce _any_ user-facing change?
No. Documentation only.
### How was this patch tested?
Generating documents locally.
Closes #29640 from rohitmishr1484/SPARK-32180-Getting-Started-Installation.
Authored-by: Rohit.Mishra <rohit.mishra@utopusinsights.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: f6322d1)
The file was modified python/docs/source/getting_started/index.rst
The file was added python/docs/source/getting_started/installation.rst
Commit b4be6a6d12bf62f02cffe0bcc97ef32d27827d57 by dongjoon
[SPARK-32845][SS][TESTS] Add sinkParameter to check sink options
robustly in DataStreamReaderWriterSuite
### What changes were proposed in this pull request?
This PR aims to add `sinkParameter` to check sink options robustly and
independently in DataStreamReaderWriterSuite.
### Why are the changes needed?
`LastOptions.parameters` is designed to catch three cases:
`sourceSchema`, `createSource`, `createSink`. However,
`StreamQuery.stop` invokes `queryExecutionThread.join`, `runStream`,
`createSource` immediately and resets the options stored by `createSink`.
To catch `createSink` options, the test suite currently uses a
workaround pattern, but we have observed occasional flakiness with it.
If we store the `createSink` options separately, we don't need this
workaround and can eliminate the flakiness.
```scala
val query = df.writeStream.
  ...
  .start()
assert(LastOptions.parameters(..))
query.stop()
```
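Why a separate slot removes the race can be seen in a toy recorder. This is a sketch under assumptions: `ToyLastOptions` is a hypothetical object that only mimics the description above, and the field name `sinkParameter` follows the commit title rather than the actual test-suite code.

```scala
// Toy recorder, loosely modelled on the description above.
// ToyLastOptions is hypothetical; it is not the object used in the test suite.
object ToyLastOptions {
  // Shared slot: overwritten by whichever hook runs last.
  var parameters: Map[String, String] = Map.empty
  // Dedicated slot: written only by createSink, so later hooks cannot clobber it.
  var sinkParameter: Map[String, String] = Map.empty

  def createSource(opts: Map[String, String]): Unit = {
    parameters = opts
  }

  def createSink(opts: Map[String, String]): Unit = {
    parameters = opts
    sinkParameter = opts
  }
}
```

If `createSource` runs after `createSink`, the shared `parameters` slot no longer holds the sink options, while `sinkParameter` still does, so an assertion on it does not depend on timing.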
### Does this PR introduce _any_ user-facing change?
No. This is a test-only change.
### How was this patch tested?
Pass the newly updated test case.
Closes #29730 from dongjoon-hyun/SPARK-32845.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: b4be6a6)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataStreamReaderWriterSuite.scala
Commit 4269c2c252d5eecf6a861160556026ee399ad976 by yamamuro
[SPARK-32851][SQL][TEST] Tests should fail if errors happen when
generating projection code
### What changes were proposed in this pull request?
This PR intends to set `CODEGEN_ONLY` at `CODEGEN_FACTORY_MODE` in the
test Spark context so that tests fail if errors happen when generating
expression code.
### Why are the changes needed?
I noticed that the code generation of `SafeProjection` failed in the
existing test (https://issues.apache.org/jira/browse/SPARK-32828) but it
passed because `FALLBACK` was set at `CODEGEN_FACTORY_MODE` (by default)
in `SharedSparkSession`. To become aware of these failures quickly, I
think it's worth setting `CODEGEN_ONLY` at `CODEGEN_FACTORY_MODE`.
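The difference between the two modes can be modelled with a small factory. This is a minimal sketch, not Spark's implementation: `ToyFactory`, `Fallback`, `CodegenOnly`, and `create` are hypothetical names that only mirror the `FALLBACK` and `CODEGEN_ONLY` behaviour described above.

```scala
import scala.util.control.NonFatal

// Toy model of a factory with two modes; all names here are hypothetical.
object ToyFactory {
  sealed trait Mode
  case object Fallback extends Mode
  case object CodegenOnly extends Mode

  // `codegen` and `interpreted` are by-name, so each is evaluated only when needed.
  def create[A](mode: Mode)(codegen: => A)(interpreted: => A): A = mode match {
    // Failures on the generated path propagate to the caller (and fail the test).
    case CodegenOnly => codegen
    // Failures on the generated path are swallowed and the interpreted path is used.
    case Fallback =>
      try codegen
      catch { case NonFatal(_) => interpreted }
  }
}
```

Under `Fallback`, a broken generated path goes unnoticed because the interpreted result is returned; under `CodegenOnly`, the same breakage surfaces as an exception.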
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #29721 from maropu/ExprCodegenTest.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 4269c2c)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala
Commit ce566bed17f94ac3443ebed82ad406b43dbb13c2 by ueshin
[SPARK-32180][FOLLOWUP] Fix .rst error in new Pyspark installation guide
This simply fixes an .rst generation error in
https://github.com/apache/spark/pull/29640
Closes #29735 from srowen/SPARK-32180.2.
Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: ce566be)
The file was modified python/docs/source/getting_started/installation.rst