Changes

Summary

  1. [SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same with hive (commit: e43cd8c)
  2. [SPARK-32188][PYTHON][DOCS][FOLLOW-UP] Document Column APIs in API reference (commit: 7cdc921)
  3. [SPARK-32084][PYTHON][SQL] Expand dictionary functions (commit: 4e6a310)
  4. [SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency (commit: 9818f07)
Commit e43cd8ccef153ed504200c9f52966cb6a96e73bf by gurwls223
[SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same
with hive
### What changes were proposed in this pull request?
In Spark's current script transformation with Hive serde mode, the
result in the schema-less case differs from Hive's. This PR makes the
result match Hive's script transform with serde.
#### Hive script transform with serde in schema-less mode
```
hive> create table t (c0 int, c1 int, c2 int);
hive> INSERT INTO t VALUES (1, 1, 1);
hive> INSERT INTO t VALUES (2, 2, 2);
hive> CREATE VIEW v AS SELECT TRANSFORM(c0, c1, c2) USING 'cat' FROM t;

hive> DESCRIBE v;
key                 string
value               string

hive> SELECT * FROM v;
1 1 1
2 2 2

hive> SELECT key FROM v;
1
2

hive> SELECT value FROM v;
1 1
2 2
```
#### Spark script transform with Hive serde in schema-less mode
```
hive> create table t (c0 int, c1 int, c2 int);
hive> INSERT INTO t VALUES (1, 1, 1);
hive> INSERT INTO t VALUES (2, 2, 2);
hive> CREATE VIEW v AS SELECT TRANSFORM(c0, c1, c2) USING 'cat' FROM t;

hive> SELECT * FROM v;
1   1
2   2
```
**No serde mode in Hive (ROW FORMAT DELIMITED)**
![image](https://user-images.githubusercontent.com/46485123/90088770-55841e00-dd52-11ea-92dd-7fe52d93f0b3.png)
### Why are the changes needed?
To keep the same behavior as Hive's script transform.
### Does this PR introduce _any_ user-facing change?
Before this PR, with Hive serde script transform:
```
select transform(*)
USING 'cat'
from (
  select 1, 2, 3, 4
) tmp

key     value
1         2
```
After:
```
select transform(*)
USING 'cat'
from (
  select 1, 2, 3, 4
) tmp

key     value
1         2   3  4
```
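The before/after output above suggests that, in schema-less mode, the first field of each script output line becomes `key` and the remainder of the line becomes `value`. The following is a minimal plain-Python model of that mapping, not Spark's or Hive's actual implementation. The helper name `split_schemaless`, the tab delimiter default, and the handling of a line with no delimiter are assumptions made for illustration.

```python
from typing import Optional, Tuple


def split_schemaless(line: str, delimiter: str = "\t") -> Tuple[str, Optional[str]]:
    """Model the schema-less mapping: first field -> key, rest of line -> value."""
    # Split only once so everything after the first delimiter stays together.
    parts = line.split(delimiter, 1)
    key = parts[0]
    # Assumption: a line without any delimiter has no value (modeled as None).
    value = parts[1] if len(parts) > 1 else None
    return key, value


# Hypothetical script output for `select 1, 2, 3, 4` piped through 'cat'.
row = "\t".join(["1", "2", "3", "4"])
key, value = split_schemaless(row)
print(repr(key), repr(value))
```

Under this model, the row `1, 2, 3, 4` yields key `1` and a value holding the remaining three fields, matching the "After" output, whereas the "Before" behavior kept only the second field.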
### How was this patch tested?
Unit tests.
Closes #29421 from AngersZhuuuu/SPARK-32388.
Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: e43cd8c)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala
Commit 7cdc921bc07c3d627a8fcbc81cd9c320bda0b873 by gurwls223
[SPARK-32188][PYTHON][DOCS][FOLLOW-UP] Document Column APIs in API
reference
### What changes were proposed in this pull request?
This PR proposes to also document the `Column` APIs in the API
reference of the PySpark documentation.
### Why are the changes needed?
To document common APIs in PySpark.
### Does this PR introduce _any_ user-facing change?
Yes, `Column.*` will be shown in the API reference page.
### How was this patch tested?
Manually tested via `cd python` and `make clean html`.
Closes #30150 from HyukjinKwon/SPARK-32188.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 7cdc921)
The file was modified python/docs/source/reference/pyspark.sql.rst
Commit 4e6a310f8062102ea6a022fb21171f896c8296ae by gurwls223
[SPARK-32084][PYTHON][SQL] Expand dictionary functions
### What changes were proposed in this pull request?
- [x] Expand dictionary definitions into standalone functions.
- [x] Fix annotations for ordering functions.
### Why are the changes needed?
To simplify further maintenance of docstrings.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #30143 from zero323/SPARK-32084.
Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 4e6a310)
The file was modified python/pyspark/sql/functions.py
The file was modified python/pyspark/sql/functions.pyi
The file was modified python/pyspark/sql/tests/test_functions.py
Commit 9818f079aa00a390c1cbd267022f42e05db6d67b by gurwls223
[SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency
### What changes were proposed in this pull request?
This PR proposes to initiate the migration to NumPy documentation style
(from reST style) in PySpark docstrings. This PR also adds one migration
example of `SparkContext`.
- **Before:**
   ...
   ![Screen Shot 2020-10-26 at 7 02 05 PM](https://user-images.githubusercontent.com/6477701/97161090-a8ea0200-17c0-11eb-8204-0e70d18fc571.png)
   ...
   ![Screen Shot 2020-10-26 at 7 02 09 PM](https://user-images.githubusercontent.com/6477701/97161100-aab3c580-17c0-11eb-92ad-f5ad4441ce16.png)
   ...
- **After:**
   ...
   ![Screen Shot 2020-10-26 at 7 24 08 PM](https://user-images.githubusercontent.com/6477701/97161219-d636b000-17c0-11eb-80ab-d17a570ecb4b.png)
   ...
See also https://numpydoc.readthedocs.io/en/latest/format.html
### Why are the changes needed?
There are many reasons for switching to NumPy documentation style.
1. Arguably, reST style doesn't fit well when the docstring grows large
because it provides less structure and syntax.
2. NumPy documentation style provides a more human-readable docstring
format. For example, notebook users often just call `help(...)`, which
is rendered by `pydoc`.
3. NumPy documentation style is pretty commonly used in data science
libraries, for example, pandas, numpy, Dask, Koalas, matplotlib, ...
Using NumPy documentation style can give users a consistent
documentation style.
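To make the difference between the two styles concrete, here is a small sketch that documents the same function twice, once with reST field lists and once in NumPy documentation style. The functions `scale_rest` and `scale_numpy` are hypothetical and unrelated to the PySpark API; they exist only to show the docstring layout.

```python
import inspect


def scale_rest(value, factor=2):
    """
    Multiply a value by a factor (reST field-list style).

    :param value: the number to scale
    :param factor: multiplier applied to ``value``
    :return: the scaled number
    """
    return value * factor


def scale_numpy(value, factor=2):
    """
    Multiply a value by a factor (NumPy documentation style).

    Parameters
    ----------
    value : int or float
        The number to scale.
    factor : int or float, optional
        Multiplier applied to ``value``. Default is 2.

    Returns
    -------
    int or float
        The scaled number.
    """
    return value * factor


# Print the raw docstring roughly as a notebook user would read it.
print(inspect.getdoc(scale_numpy))
```

The NumPy-style version is longer, but its underlined sections stay readable as plain text, which is the situation described in reason 2.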
### Does this PR introduce _any_ user-facing change?
The dependency itself doesn't change anything user-facing. The
documentation change in `SparkContext` does, as shown above.
### How was this patch tested?
Manually tested via running `cd python` and `make clean html`.
Closes #30149 from HyukjinKwon/SPARK-33243.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 9818f07)
The file was modified dev/lint-python
The file was modified python/pyspark/context.py
The file was modified docs/README.md
The file was modified python/docs/source/conf.py
The file was removed python/docs/source/_templates/class_with_docs.rst
The file was modified dev/requirements.txt
The file was added python/docs/source/_templates/autosummary/class.rst
The file was added python/docs/source/_templates/autosummary/class_with_docs.rst
The file was modified dev/create-release/spark-rm/Dockerfile
The file was modified python/docs/source/reference/pyspark.ml.rst
The file was modified python/docs/source/reference/pyspark.mllib.rst
The file was modified .github/workflows/build_and_test.yml
The file was modified python/docs/source/reference/pyspark.sql.rst