Changes

Summary

  1. [SPARK-33319][SQL][TEST] Add all built-in SerDes to HiveSerDeReadWriteSuite (commit: 789d19c) (details)
  2. [SPARK-33306][SQL][FOLLOWUP] Group DateType and TimestampType together in needsTimeZone() (commit: eecebd0) (details)
  3. [SPARK-33299][SQL][DOCS] Don't mention schemas in JSON format in docs for from_json (commit: bdabf60) (details)
  4. [SPARK-33250][PYTHON][DOCS] Migration to NumPy documentation style in SQL (pyspark.sql.*) (commit: 3959f0d) (details)
  5. [SPARK-33324][K8S][BUILD] Upgrade kubernetes-client to 4.11.1 (commit: 27d8136) (details)
  6. [SPARK-33257][PYTHON][SQL] Support Column inputs in PySpark ordering functions (asc*, desc*) (commit: 4c8ee88) (details)
  7. [SPARK-33284][WEB-UI] In the Storage UI page, clicking any field to sort the table will cause the header content to be lost (commit: 56c623e) (details)
Commit 789d19cab5caa20d35dcdd700ed7fe53ca1893fe by dhyun
[SPARK-33319][SQL][TEST] Add all built-in SerDes to
HiveSerDeReadWriteSuite
### What changes were proposed in this pull request?
This PR adds all built-in SerDes to `HiveSerDeReadWriteSuite`.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe
### Why are the changes needed?
We will upgrade Parquet, ORC, and Avro, so we need to ensure compatibility.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
N/A
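For context, the suite itself is written in Scala, but the shape of the round-trip it exercises is easy to sketch in PySpark: create a Hive table with an explicit `ROW FORMAT SERDE`, write to it, and read it back. A minimal sketch, assuming a Hive-enabled session; the table name and the chosen SerDe are illustrative, not taken from the suite:
```python
from pyspark.sql import SparkSession

# Requires Hive support; `serde_demo` is a hypothetical table name.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Create a table backed by a specific built-in SerDe, then round-trip a row.
spark.sql("""
    CREATE TABLE serde_demo (id INT, name STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
""")
spark.sql("INSERT INTO serde_demo VALUES (1, 'a')")
spark.sql("SELECT * FROM serde_demo").show()
```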
Closes #30228 from wangyum/SPARK-33319.
Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 789d19c)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeReadWriteSuite.scala (diff)
Commit eecebd03023bdde5084b7f518d709e304eff7228 by dhyun
[SPARK-33306][SQL][FOLLOWUP] Group DateType and TimestampType together
in `needsTimeZone()`
### What changes were proposed in this pull request?
In the PR, I propose to group `DateType` and `TimestampType` together in checking time zone needs in the `Cast.needsTimeZone()` method.
### Why are the changes needed?
To improve code maintainability.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By the existing test `"SPARK-33306: Timezone is needed when cast Date to String"`.
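For context, `Cast.needsTimeZone()` decides which casts require the session time zone. A minimal PySpark illustration of why timestamp-to-string casts belong in that group (the rendered string depends on `spark.sql.session.timeZone`):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The same instant renders differently under different session time zones,
# which is why Cast.needsTimeZone() must return true for such casts.
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT CAST(current_timestamp() AS STRING)").show(truncate=False)

spark.conf.set("spark.sql.session.timeZone", "Asia/Seoul")
spark.sql("SELECT CAST(current_timestamp() AS STRING)").show(truncate=False)
```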
Closes #30223 from MaxGekk/WangGuangxin-SPARK-33306-followup.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: eecebd0)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
Commit bdabf60fb4a61b0eef95144f2c54477a10ea849f by dhyun
[SPARK-33299][SQL][DOCS] Don't mention schemas in JSON format in docs
for `from_json`
### What changes were proposed in this pull request?
Remove the JSON-formatted schema from comments for `from_json()` in Scala/Python APIs.
Closes #30201
### Why are the changes needed?
Schemas in JSON format are internal (not documented) and shouldn't be recommended for use.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By linters.
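For reference, the documented way to pass a schema to `from_json` is a DDL-formatted string (or a `StructType`), not a JSON-formatted schema. A small PySpark sketch with made-up data:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('{"a": 1, "b": "x"}',)], ["value"])

# Pass the schema as a DDL-formatted string, the documented format,
# rather than as an (internal) JSON-formatted schema.
df.select(from_json(col("value"), "a INT, b STRING").alias("parsed")).show()
```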
Closes #30226 from MaxGekk/from_json-common-schema-parsing-2.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: bdabf60)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified python/pyspark/sql/functions.py (diff)
Commit 3959f0d9879fa7fa9e8f2e8ed8c8b12003d21788 by gurwls223
[SPARK-33250][PYTHON][DOCS] Migration to NumPy documentation style in
SQL (pyspark.sql.*)
### What changes were proposed in this pull request?
This PR proposes to migrate to [NumPy documentation style](https://numpydoc.readthedocs.io/en/latest/format.html); see also SPARK-33243. While migrating, I also fixed some Python type hints accordingly.
### Why are the changes needed?
For better documentation, both as plain text and as generated HTML.
### Does this PR introduce _any_ user-facing change?
Yes, users will see better-formatted HTML and text. See SPARK-33243.
### How was this patch tested?
Manually tested via running `./dev/lint-python`.
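For illustration, the NumPy style uses labelled `Parameters`, `Returns`, and `Examples` sections instead of reST `:param:`/`:return:` fields. The function below is a made-up example of the format, not an actual PySpark API:
```python
def some_function(col, seed=None):
    """Apply a hypothetical transformation to a column.

    Parameters
    ----------
    col : :class:`Column` or str
        target column to work on.
    seed : int, optional
        random seed; nondeterministic if not set.

    Returns
    -------
    :class:`Column`
        the transformed column.

    Examples
    --------
    >>> some_function("value", seed=42)  # doctest: +SKIP
    """
```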
Closes #30181 from HyukjinKwon/SPARK-33250.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 3959f0d)
The file was modified python/pyspark/sql/group.py (diff)
The file was modified python/pyspark/sql/utils.py (diff)
The file was modified python/pyspark/sql/session.py (diff)
The file was modified python/pyspark/sql/streaming.pyi (diff)
The file was modified python/pyspark/sql/pandas/serializers.py (diff)
The file was modified python/pyspark/sql/context.py (diff)
The file was modified python/pyspark/sql/window.py (diff)
The file was modified python/pyspark/sql/window.pyi (diff)
The file was modified python/pyspark/sql/avro/functions.py (diff)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified python/pyspark/sql/dataframe.pyi (diff)
The file was modified python/pyspark/sql/dataframe.py (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
The file was modified python/pyspark/sql/column.py (diff)
The file was modified python/pyspark/sql/catalog.py (diff)
The file was modified python/pyspark/sql/functions.pyi (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified python/pyspark/sql/pandas/map_ops.py (diff)
The file was modified python/pyspark/sql/pandas/conversion.py (diff)
The file was modified python/pyspark/sql/types.py (diff)
The file was modified python/pyspark/sql/pandas/functions.py (diff)
The file was modified python/pyspark/sql/readwriter.pyi (diff)
The file was modified python/pyspark/sql/udf.py (diff)
The file was modified python/pyspark/sql/pandas/group_ops.py (diff)
The file was modified python/pyspark/sql/pandas/types.py (diff)
Commit 27d81369342c19bae558329ddd0e2542554433f9 by dhyun
[SPARK-33324][K8S][BUILD] Upgrade kubernetes-client to 4.11.1
### What changes were proposed in this pull request?
This PR aims to upgrade `kubernetes-client` from 4.10.3 to 4.11.1.
### Why are the changes needed?
This upgrades the dependency for Apache Spark 3.1.0. Since 4.12.0 is
still new and has breaking API changes, this PR chooses the latest
compatible version.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass all CIs, including the K8s IT.
Closes #30233 from dongjoon-hyun/SPARK-33324.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 27d8136)
The file was modified resource-managers/kubernetes/core/pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified resource-managers/kubernetes/integration-tests/pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
Commit 4c8ee8856cb9714d433456fb0ce44dfebb00d83f by gurwls223
[SPARK-33257][PYTHON][SQL] Support Column inputs in PySpark ordering
functions (asc*, desc*)
### What changes were proposed in this pull request?
This PR adds support for passing `Column`s as input to PySpark sorting
functions.
### Why are the changes needed?
According to SPARK-26979, PySpark functions should support both Column
and str arguments, when possible.
### Does this PR introduce _any_ user-facing change?
PySpark users can now provide both `Column` and `str` as an argument for
`asc*` and `desc*` functions.
### How was this patch tested?
New unit tests.
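A quick usage sketch of the new behavior (the DataFrame here is illustrative):
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import asc, desc, col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(2, "b"), (1, "a")], ["id", "name"])

# Both a column name string and a Column object are now accepted.
df.orderBy(asc("id")).show()
df.orderBy(desc(col("id"))).show()
```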
Closes #30227 from zero323/SPARK-33257.
Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 4c8ee88)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified python/pyspark/sql/functions.pyi (diff)
The file was modified python/pyspark/sql/tests/test_functions.py (diff)
Commit 56c623e98c54fdb4d47c9264ae1b282ecb2b7291 by srowen
[SPARK-33284][WEB-UI] In the Storage UI page, clicking any field to sort
the table will cause the header content to be lost
### What changes were proposed in this pull request?
In older versions of Spark, sorting on the Storage UI page worked correctly, but in the new version sorting causes the header content to be lost, so this PR fixes that bug.
### Why are the changes needed?
The table header on the page looks like the following; **note that each `th` contains a `span` element**:
```html
<thead>
   <tr>
       ....
       <th width="" class="">
             <span data-toggle="tooltip" title="" data-original-title="StorageLevel displays where the persisted RDD is stored, format of the persisted RDD (serialized or de-serialized) and replication factor of the persisted RDD">
               Storage Level
             </span>
       </th>
      .....
   </tr>
</thead>
```
Since [PR #26136](https://github.com/apache/spark/pull/26136), if a `th` in the table contains a `span` element, the `span` is deleted directly after clicking to sort, and the original header content is lost.
There are three problems in `sorttable.js`:
1. `sortrevind.class = "sorttable_sortrevind"` in [sorttable.js#L107](https://github.com/apache/spark/blob/9d5e48ea95d1c3017a51ff69584f32a18901b2b5/core/src/main/resources/org/apache/spark/ui/static/sorttable.js#L107) and `sortfwdind.class = "sorttable_sortfwdind"` in [sorttable.js#L125](https://github.com/apache/spark/blob/9d5e48ea95d1c3017a51ff69584f32a18901b2b5/core/src/main/resources/org/apache/spark/ui/static/sorttable.js#L125): the `sorttable_xx` flag should be assigned to `className` instead of `class`, because the script uses `rowlists[j].className.search` rather than `rowlists[j].class.search` to determine whether an element carries a sorting flag.
2. `rowlists[j].className.search(/\sorttable_sortrevind\b/)` in [sorttable.js#L120](https://github.com/apache/spark/blob/9d5e48ea95d1c3017a51ff69584f32a18901b2b5/core/src/main/resources/org/apache/spark/ui/static/sorttable.js#L120) is wrong. The intention is to check whether `className` contains the word `sorttable_sortrevind`, but the expression is incorrect: it should be `\bsorttable_sortrevind\b` instead of `\sorttable_sortrevind\b`.
3. The `if` check in the following code snippet ([sorttable.js#L141](https://github.com/apache/spark/blob/9d5e48ea95d1c3017a51ff69584f32a18901b2b5/core/src/main/resources/org/apache/spark/ui/static/sorttable.js#L141)) is wrong. **If the `search` function does not find the target, it returns -1, but `Boolean(-1)` evaluates to `true`**, so the `span` is deleted even when it contains neither `sorttable_sortfwdind` nor `sorttable_sortrevind`:
```javascript
rowlists = this.parentNode.getElementsByTagName("span");
for (var j = 0; j < rowlists.length; j++) {
    if (rowlists[j].className.search(/\bsorttable_sortfwdind\b/)
        || rowlists[j].className.search(/\sorttable_sortrevind\b/)) {
        rowlists[j].parentNode.removeChild(rowlists[j]);
    }
}
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The manual test result of the UI page is shown below:
![fix sorted](https://user-images.githubusercontent.com/52202080/97543194-daeaa680-1a02-11eb-8b11-8109c3e4e9a3.gif)
Closes #30182 from akiyamaneko/ui_storage_sort_error.
Authored-by: neko <echohlne@gmail.com> Signed-off-by: Sean Owen
<srowen@gmail.com>
(commit: 56c623e)
The file was modified core/src/main/resources/org/apache/spark/ui/static/sorttable.js (diff)