Changes

Summary

  1. [SPARK-32481][CORE][SQL] Support truncate table to move data to trash (commit: 065f173)
  2. [SPARK-32733][SQL] Add extended information - arguments/examples/since/notes of expressions to the remarks field of GetFunctionsOperation (commit: 6dacba7)
  3. [SPARK-32719][PYTHON] Add Flake8 check missing imports (commit: a1e459e)
  4. [SPARK-32740][SQL] Refactor common partitioning/distribution logic to BaseAggregateExec (commit: ce473b2)
  5. [SPARK-32138][FOLLOW-UP] Drop obsolete StringIO import branching (commit: 5574734)
  6. [SPARK-32747][R][TESTS] Deduplicate configuration set/unset in test_sparkSQL_arrow.R (commit: 2491cf1)
Commit 065f17386d1851d732b4c1badf1ce2e14d0de338 by dongjoon
[SPARK-32481][CORE][SQL] Support truncate table to move data to trash
### What changes were proposed in this pull request?
Instead of deleting the data, we can move it to the trash. Based on the configuration provided by the user, it will later be deleted permanently from the trash.
### Why are the changes needed?
Instead of directly deleting the data, we can provide the flexibility to move it to the trash first and delete it permanently later.
### Does this PR introduce _any_ user-facing change?
Yes. After truncating a table, the data is no longer deleted permanently right away; it is first moved to the trash and deleted permanently after the configured time.
### How was this patch tested?
New unit tests were added.
Closes #29552 from Udbhav30/truncate.
Authored-by: Udbhav30 <u.agrawal30@gmail.com> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: 065f173)
The file was modified core/src/main/scala/org/apache/spark/util/Utils.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
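The move-to-trash behavior described above can be sketched in plain Python. This is an illustrative analogy only: Spark's actual change is in Scala and goes through the Hadoop filesystem's trash facilities, and the names used here (`truncate_with_trash`, `trash_enabled`) are hypothetical.

```python
import shutil
from pathlib import Path


def truncate_with_trash(table_dir: Path, trash_dir: Path, trash_enabled: bool) -> None:
    """Truncate a table's data directory.

    If the trash is enabled, move the data into the trash directory so it
    can still be recovered (or purged later); otherwise fall back to the
    old behavior of deleting it permanently right away.
    """
    if trash_enabled:
        trash_dir.mkdir(parents=True, exist_ok=True)
        # Move instead of delete: the data survives until the trash is purged.
        shutil.move(str(table_dir), str(trash_dir / table_dir.name))
    else:
        # Old behavior: permanent deletion.
        shutil.rmtree(table_dir)
```

The same decision point is what the user-visible configuration controls: with the flag on, a truncate becomes a rename into the trash rather than a recursive delete.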
Commit 6dacba7fa044555512f974595ed52f7316bb97f9 by gurwls223
[SPARK-32733][SQL] Add extended information -
arguments/examples/since/notes of expressions to the remarks field of
GetFunctionsOperation
### What changes were proposed in this pull request?
This PR adds the extended information of a function, including its arguments, examples, notes and since field, to `SparkGetFunctionsOperation`.
### Why are the changes needed?
Better user experience: it helps JDBC users gain a better understanding of our built-in functions.
### Does this PR introduce _any_ user-facing change?
Yes, BI tools and JDBC users will get full information on a Spark function instead of only fragmentary usage info, e.g. for `date_part`:
#### before
```
date_part(field, source) - Extracts a part of the date/timestamp or interval source.
```
#### after
```
Usage:
  date_part(field, source) - Extracts a part of the date/timestamp or interval source.
Arguments:
  * field - selects which part of the source should be extracted, and supported string values are as same as the fields of the equivalent function `EXTRACT`.
  * source - a date/timestamp or interval column from where `field` should be extracted
Examples:
  > SELECT date_part('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
   2019
  > SELECT date_part('week', timestamp'2019-08-12 01:00:00.123456');
   33
  > SELECT date_part('doy', DATE'2019-08-12');
   224
  > SELECT date_part('SECONDS', timestamp'2019-10-01 00:00:01.000001');
   1.000001
  > SELECT date_part('days', interval 1 year 10 months 5 days);
   5
  > SELECT date_part('seconds', interval 5 hours 30 seconds 1 milliseconds 1 microseconds);
   30.001001
Note:
  The date_part function is equivalent to the SQL-standard function `EXTRACT(field FROM source)`
Since: 3.0.0
```
### How was this patch tested?
New tests
Closes #29577 from yaooqinn/SPARK-32733.
Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 6dacba7)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationSuite.scala
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
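The shape of the extended remarks shown above can be sketched with a small formatter. This is an illustrative Python sketch, not the actual Scala code; the field names (`usage`, `arguments`, `examples`, `note`, `since`) mirror the sections in the output above, but the function itself is hypothetical.

```python
def format_remarks(usage: str, arguments: str = "", examples: str = "",
                   note: str = "", since: str = "") -> str:
    """Assemble a multi-section remarks string from a function's
    documentation fields, in the spirit of the before/after example above.

    Only the sections that are actually populated are emitted, so a
    function with nothing but a usage string still renders sensibly.
    """
    parts = ["Usage:\n  " + usage]
    if arguments:
        parts.append("Arguments:\n  " + arguments)
    if examples:
        parts.append("Examples:\n  " + examples)
    if note:
        parts.append("Note:\n  " + note)
    if since:
        parts.append("Since: " + since)
    return "\n".join(parts)
```

The "before" state corresponds to calling this with only `usage`; the "after" state populates the remaining fields as well.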
Commit a1e459ed9f6777fb8d5a2d09fda666402f9230b9 by gurwls223
[SPARK-32719][PYTHON] Add Flake8 check missing imports
https://issues.apache.org/jira/browse/SPARK-32719
### What changes were proposed in this pull request?
Add a check to detect missing imports. This makes sure that when we use a specific class, it is explicitly imported rather than pulled in through a wildcard import.
### Why are the changes needed?
To make sure that the quality of the Python code is up to standard.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing unit tests and Flake8 static analysis
Closes #29563 from Fokko/fd-add-check-missing-imports.
Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: a1e459e)
The file was modified python/pyspark/ml/fpm.py
The file was modified python/pyspark/sql/pandas/conversion.py
The file was modified python/pyspark/ml/pipeline.py
The file was modified python/pyspark/sql/tests/test_readwriter.py
The file was modified python/pyspark/sql/tests/test_datasources.py
The file was modified python/pyspark/sql/tests/test_pandas_udf_grouped_agg.py
The file was modified python/pyspark/ml/tree.py
The file was modified dev/tox.ini
The file was modified python/pyspark/sql/tests/test_context.py
The file was modified python/pyspark/sql/tests/test_pandas_grouped_map.py
The file was modified python/pyspark/sql/readwriter.py
The file was modified python/pyspark/ml/regression.py
The file was modified python/pyspark/sql/tests/test_types.py
The file was modified python/pyspark/ml/classification.py
The file was modified python/pyspark/ml/recommendation.py
The file was modified python/pyspark/__init__.py
The file was modified python/pyspark/sql/tests/test_udf.py
The file was modified dev/create-release/translate-contributors.py
The file was modified python/pyspark/ml/feature.py
The file was modified python/pyspark/sql/tests/test_arrow.py
The file was modified python/pyspark/ml/base.py
The file was modified python/pyspark/sql/pandas/types.py
The file was modified python/pyspark/mllib/stat/__init__.py
The file was modified python/pyspark/sql/column.py
The file was modified python/pyspark/sql/tests/test_dataframe.py
The file was modified python/pyspark/sql/tests/test_pandas_udf.py
The file was modified python/pyspark/ml/clustering.py
The file was modified python/pyspark/sql/tests/test_serde.py
The file was modified examples/src/main/python/sql/basic.py
The file was modified python/pyspark/tests/test_serializers.py
The file was modified python/pyspark/sql/group.py
The file was modified python/pyspark/sql/tests/test_column.py
The file was modified python/pyspark/ml/tuning.py
The file was modified python/pyspark/sql/dataframe.py
The file was modified dev/create-release/generate-contributors.py
The file was modified python/pyspark/sql/streaming.py
The file was modified python/pyspark/sql/tests/test_streaming.py
The file was modified python/pyspark/sql/tests/test_pandas_udf_scalar.py
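The class of problem this check catches can be illustrated with a toy undefined-name detector: once wildcard imports are removed, any name that is used but never imported or defined stands out. Real tools such as pyflakes (which provides Flake8's F-codes, including F821 "undefined name") handle scoping far more carefully; this sketch is only a rough approximation.

```python
import ast
import builtins


def undefined_names(source: str) -> set:
    """Report names that are loaded but never bound (imported, assigned,
    or defined) in the given source, excluding builtins.

    A wildcard import silently binds unknown names, which is exactly why
    an explicit-import policy makes this kind of check possible.
    """
    tree = ast.parse(source)
    bound, loaded = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                bound.add(node.id)       # assignment, loop target, etc.
            else:
                loaded.add(node.id)      # a use of the name
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)          # function parameters
    return {n for n in loaded if n not in bound and not hasattr(builtins, n)}
```

For example, `r = Row(a=1)` without `from pyspark.sql import Row` would be flagged, while the same code with an explicit import passes.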
Commit ce473b223ac64b60662a2e1731891a89fa3d126b by yamamuro
[SPARK-32740][SQL] Refactor common partitioning/distribution logic to
BaseAggregateExec
### What changes were proposed in this pull request?
All three aggregate physical operators, `HashAggregateExec`, `ObjectHashAggregateExec` and `SortAggregateExec`, share the same `outputPartitioning` and `requiredChildDistribution` logic. Refactor this shared logic into their superclass `BaseAggregateExec` to avoid code duplication and future bugs (similar to `HashJoin` and `ShuffledJoin`).
### Why are the changes needed?
Reduce duplicated code across classes and prevent future bugs if we only
update one class but forget another. We already did similar refactoring
for join (`HashJoin` and `ShuffledJoin`).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing unit tests, as this is pure refactoring with no new logic added.
Closes #29583 from c21/aggregate-refactor.
Authored-by: Cheng Su <chengsu@fb.com> Signed-off-by: Takeshi Yamamuro
<yamamuro@apache.org>
(commit: ce473b2)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/BaseAggregateExec.scala
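The refactoring pattern here, pulling members that are identical across sibling classes up into their common superclass, can be sketched in Python. The real code is Scala, and the distribution logic below is a simplified stand-in, not Spark's actual rules.

```python
class BaseAggregateExec:
    """Shared superclass: logic that used to be copy-pasted into each
    concrete aggregate operator now lives here exactly once."""

    def __init__(self, grouping_keys):
        self.grouping_keys = list(grouping_keys)

    def required_child_distribution(self):
        # Defined once for every aggregate flavor: with no grouping keys
        # all rows must be co-located; otherwise cluster by the keys.
        # (Simplified stand-in for the real distribution requirements.)
        if not self.grouping_keys:
            return ("AllTuples",)
        return ("ClusteredDistribution", tuple(self.grouping_keys))


class HashAggregateExec(BaseAggregateExec):
    pass  # hash-based implementation details would go here


class ObjectHashAggregateExec(BaseAggregateExec):
    pass  # object-hash implementation details would go here


class SortAggregateExec(BaseAggregateExec):
    pass  # sort-based implementation details would go here
```

The payoff is the one named in the commit message: a future change to the distribution logic touches one class, so the three operators cannot silently drift apart.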
Commit 557473409317f2ed6554a0bc0b03f900b936dd7b by gurwls223
[SPARK-32138][FOLLOW-UP] Drop obsolete StringIO import branching
### What changes were proposed in this pull request?
Removal of branched `StringIO` import.
### Why are the changes needed?
The top-level `StringIO` module is no longer present in Python 3.x.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #29590 from zero323/SPARK-32138-FOLLOW-UP.
Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 5574734)
The file was modified python/pyspark/tests/test_serializers.py
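The kind of branching being dropped, and its Python 3-only replacement, look roughly like this (illustrative):

```python
# Before: Python 2/3 compatibility branching (the pattern being removed).
#
#     try:
#         from StringIO import StringIO  # Python 2 module; absent in 3.x
#     except ImportError:
#         from io import StringIO
#
# After: the codebase is Python 3 only, so the module lives in `io`
# and a direct import suffices.
from io import StringIO

buf = StringIO()
buf.write("hello")
assert buf.getvalue() == "hello"
```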
Commit 2491cf1ae1085832f21545ae563a789c34bdd098 by gurwls223
[SPARK-32747][R][TESTS] Deduplicate configuration set/unset in
test_sparkSQL_arrow.R
### What changes were proposed in this pull request?
This PR proposes to deduplicate the configuration set/unset calls in `test_sparkSQL_arrow.R`. Setting `spark.sql.execution.arrow.sparkr.enabled` can be done globally instead of in each test case.
### Why are the changes needed?
To deduplicate the code.
### Does this PR introduce _any_ user-facing change?
No, dev-only
### How was this patch tested?
Manually ran the tests.
Closes #29592 from HyukjinKwon/SPARK-32747.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 2491cf1)
The file was modified R/pkg/tests/fulltests/test_sparkSQL_arrow.R
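The same deduplication idea can be expressed as a Python analogy (the actual change is in R's testthat suite, and the plain dict below stands in for the SparkR session configuration): set the flag once for the whole suite rather than repeating the set/unset in every test case.

```python
import unittest


class ArrowTests(unittest.TestCase):
    """Python analogy of the R change: suite-level setup/teardown replaces
    per-test configuration set/unset. `conf` is a stand-in for the real
    SparkR session configuration."""

    conf = {}

    @classmethod
    def setUpClass(cls):
        # Set once for the whole suite instead of inside every test.
        cls.conf["spark.sql.execution.arrow.sparkr.enabled"] = "true"

    @classmethod
    def tearDownClass(cls):
        # Unset once when the suite finishes.
        cls.conf.pop("spark.sql.execution.arrow.sparkr.enabled", None)

    def test_arrow_enabled_for_collect(self):
        self.assertEqual(
            self.conf["spark.sql.execution.arrow.sparkr.enabled"], "true")

    def test_arrow_enabled_for_create(self):
        self.assertEqual(
            self.conf["spark.sql.execution.arrow.sparkr.enabled"], "true")
```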