Changes

Summary

  1. [SPARK-33292][SQL] Make Literal ArrayBasedMapData string representation (commit: 838791b) (details)
  2. [SPARK-33286][SQL] Improve the error message about schema parsing by (commit: 343e0bb) (details)
  3. [SPARK-33248][SQL] Add a configuration to control the legacy behavior of (commit: 0c943cd) (details)
Commit 838791bf0b8290143001fe8f94b1fbbd53a181d2 by dhyun
[SPARK-33292][SQL] Make Literal ArrayBasedMapData string representation
disambiguous
### What changes were proposed in this pull request?
This PR aims to wrap `ArrayBasedMapData` literal representation with
`map(...)`.
### Why are the changes needed?
Literal ArrayBasedMapData has inconsistent string representation from
`LogicalPlan` to `Optimized Logical Plan/Physical Plan`. Also, the
representation at `Optimized Logical Plan` and `Physical Plan` is
ambiguous like `[1 AS a#0, keys: [key1], values: [value1] AS b#1]`.
**BEFORE**
```scala
scala> spark.version
res0: String = 2.4.7
scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#0, 'map(key1, value1) AS b#1]
+- OneRowRelation
== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#0, map(key1, value1) AS b#1]
+- OneRowRelation
== Optimized Logical Plan ==
Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- OneRowRelation
== Physical Plan ==
*(1) Project [1 AS a#0, keys: [key1], values: [value1] AS b#1]
+- Scan OneRowRelation[]
```
**AFTER**
```scala
scala> spark.version
res0: String = 3.1.0-SNAPSHOT
scala> sql("SELECT 1 a, map('key1', 'value1') b").explain(true)
== Parsed Logical Plan ==
'Project [1 AS a#4, 'map(key1, value1) AS b#5]
+- OneRowRelation
== Analyzed Logical Plan ==
a: int, b: map<string,string>
Project [1 AS a#4, map(key1, value1) AS b#5]
+- OneRowRelation
== Optimized Logical Plan ==
Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- OneRowRelation
== Physical Plan ==
*(1) Project [1 AS a#4, map(keys: [key1], values: [value1]) AS b#5]
+- *(1) Scan OneRowRelation[]
```
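The difference between the two renderings can be reproduced without Spark. The following is a minimal sketch, not Spark's implementation: `MapLiteralSketch`, `legacyString`, `wrappedString`, and `project` are hypothetical names invented for illustration, and only the output strings are taken from the plans above.

```scala
// Hypothetical stand-in for a map literal; this is NOT Spark's ArrayBasedMapData.
final case class MapLiteralSketch(keys: Seq[String], values: Seq[String]) {
  // Old-style rendering: the keys and values arrays are printed bare,
  // so they blend into the surrounding projection list.
  def legacyString: String =
    s"keys: ${keys.mkString("[", ", ", "]")}, values: ${values.mkString("[", ", ", "]")}"

  // New-style rendering: the same text wrapped in map(...).
  def wrappedString: String = s"map($legacyString)"
}

object MapLiteralDemo {
  // Renders a projection list the way the plans above print it.
  def project(exprs: Seq[String]): String = exprs.mkString("Project [", ", ", "]")

  def main(args: Array[String]): Unit = {
    val m = MapLiteralSketch(Seq("key1"), Seq("value1"))
    // Ambiguous: a reader cannot tell where the map literal starts and ends.
    println(project(Seq("1 AS a#0", s"${m.legacyString} AS b#1")))
    // Unambiguous: the literal is delimited by map(...).
    println(project(Seq("1 AS a#4", s"${m.wrappedString} AS b#5")))
  }
}
```

In the first printed line the comma between `keys` and `values` looks the same as the comma separating projected columns, which is the ambiguity the wrapper removes.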
### Does this PR introduce _any_ user-facing change?
Yes. This changes the query plan's string representation in the `explain`
command and the UI. However, this is a bug fix.
### How was this patch tested?
Pass the CI with the newly added test case.
Closes #30190 from dongjoon-hyun/SPARK-33292.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 838791b)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralExpressionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala (diff)
Commit 343e0bb3adae465547e1423ea79f07d0e79adee7 by gurwls223
[SPARK-33286][SQL] Improve the error message about schema parsing by
`from_json/from_csv`
### What changes were proposed in this pull request?
In the PR, I propose to improve the error message from
`from_json`/`from_csv` by combining errors from all schema parsers:
- DataType.fromJson (except CSV)
- CatalystSqlParser.parseDataType
- CatalystSqlParser.parseTableSchema
Before this change, `from_json` did not show the error message from the
first parser in the chain, which could mislead users.
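The idea of combining errors across a parser chain can be sketched as follows. This is an illustrative sketch only: the three parsers are toy stand-ins (not `DataType.fromJson` or `CatalystSqlParser`), and `parseOrElse`, `parseSchema`, and the toy error texts are hypothetical names and strings. Only the message prefixes mirror the ones reported in this PR.

```scala
import scala.util.control.NonFatal

object SchemaParseChainSketch {
  type Parser = String => String

  // Tries `first`, then `fallback`. If both fail, the thrown message keeps
  // the first parser's error and appends the fallback's error after it.
  def parseOrElse(input: String, first: Parser, fallback: Parser, prefix: String): String =
    try first(input) catch {
      case NonFatal(e1) =>
        try fallback(input) catch {
          case NonFatal(e2) =>
            throw new IllegalArgumentException(
              s"$prefix${e1.getMessage}\nFailed fallback parsing: ${e2.getMessage}", e1)
        }
    }

  // Toy parsers (assumptions): each accepts a narrow input shape and throws otherwise.
  val jsonLike: Parser = s =>
    if (s.trim.startsWith("{") && s.contains("\"name\"")) s"json schema: $s"
    else throw new IllegalArgumentException("toy JSON parser rejected the input")

  val dataTypeLike: Parser = s =>
    if (Set("int", "string", "double").contains(s.trim)) s"data type: ${s.trim}"
    else throw new IllegalArgumentException("toy data type parser rejected the input")

  val tableSchemaLike: Parser = s =>
    if (s.matches("""\w+ \w+(, \w+ \w+)*""")) s"table schema: $s"
    else throw new IllegalArgumentException("toy table schema parser rejected the input")

  // JSON first, then a nested data type / table schema fallback chain.
  def parseSchema(input: String): String =
    parseOrElse(
      input,
      jsonLike,
      s => parseOrElse(s, dataTypeLike, tableSchemaLike, "Cannot parse the data type: "),
      "Cannot parse the schema in JSON format: ")

  def main(args: Array[String]): Unit = {
    println(parseSchema("a int, b string")) // succeeds in the last parser
    try parseSchema("""{"fields": [{"a":123}], "type": "struct"}""")
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

Because the fallback is itself a two-step chain, a total failure produces one message with the first parser's error followed by two `Failed fallback parsing:` sections, so the earliest and most relevant error is no longer hidden.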
### Why are the changes needed?
Currently, `from_json` outputs the error message from the fallback
schema parser which can confuse end-users. For example:
```scala
val invalidJsonSchema = """{"fields": [{"a":123}], "type": "struct"}"""
df.select(from_json($"json", invalidJsonSchema, Map.empty[String, String])).show()
```
The JSON schema has an issue in `{"a":123}` but the error message
doesn't point it out:
```
mismatched input '{' expecting {'ADD', 'AFTER', ...}(line 1, pos 0)
== SQL ==
{"fields": [{"a":123}], "type": "struct"}
^^^
org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '{' expecting {'ADD', 'AFTER',  ... }(line 1, pos 0)
== SQL ==
{"fields": [{"a":123}], "type": "struct"}
^^^
```
### Does this PR introduce _any_ user-facing change?
Yes, after the changes for the example above:
```
Cannot parse the schema in JSON format: Failed to convert the JSON string '{"a":123}' to a field.
Failed fallback parsing: Cannot parse the data type: mismatched input '{' expecting {'ADD', 'AFTER', ...}(line 1, pos 0)
== SQL ==
{"fields": [{"a":123}], "type": "struct"}
^^^
Failed fallback parsing: mismatched input '{' expecting {'ADD', 'AFTER', ...}(line 1, pos 0)
== SQL ==
{"fields": [{"a":123}], "type": "struct"}
^^^
```
### How was this patch tested?
- By existing test suites such as `JsonFunctionsSuite` and
  `JsonExpressionsSuite`.
- Added a new test to `JsonFunctionsSuite`.
- Regenerated results for `json-functions.sql`.
Closes #30183 from MaxGekk/fromDDL-error-msg.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 343e0bb)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/csv-functions.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/json-functions.sql.out (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
Commit 0c943cd2fbc6f2d25588991613abf469ace0153e by gurwls223
[SPARK-33248][SQL] Add a configuration to control the legacy behavior of
whether need to pad null value when value size less then schema size
### What changes were proposed in this pull request?
Add a configuration to control the legacy behavior of whether to pad
null values when the value size is less than the schema size. We can't
decide whether the current behavior is a bug, and some users need the
behavior to be the same as Hive's.
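The two behaviors that such a flag chooses between can be sketched in plain Scala. This is an illustration under stated assumptions, not Spark's code: `alignRow` and `padNull` are hypothetical names, and the choice to throw when padding is disabled is an assumption made for the sketch, since this description does not say what the non-padding behavior is.

```scala
import java.util.regex.Pattern

object PadNullSketch {
  // Aligns one delimited output line with a schema of `schemaSize` columns.
  // None stands in for a SQL NULL.
  def alignRow(
      line: String,
      schemaSize: Int,
      padNull: Boolean, // hypothetical flag, standing in for the new configuration
      delimiter: String = "\t"): Seq[Option[String]] = {
    // limit = -1 keeps trailing empty fields instead of dropping them.
    val values = line.split(Pattern.quote(delimiter), -1).toSeq
    if (values.length >= schemaSize) {
      values.take(schemaSize).map(Option(_))
    } else if (padNull) {
      // Hive-compatible choice: fill the missing trailing columns with NULL.
      values.map(Option(_)) ++ Seq.fill(schemaSize - values.length)(Option.empty[String])
    } else {
      // Assumption for this sketch: without padding, a short row is an error.
      throw new IllegalArgumentException(
        s"Row has ${values.length} values but the schema has $schemaSize columns")
    }
  }

  def main(args: Array[String]): Unit = {
    println(alignRow("a\tb", schemaSize = 3, padNull = true))
  }
}
```

Keeping the decision behind a single boolean lets users pick either the historical behavior or the Hive-compatible one without code changes.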
### Why are the changes needed?
Provides a compatible choice between the historical behavior and Hive's
behavior.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing UTs
Closes #30156 from AngersZhuuuu/SPARK-33284.
Lead-authored-by: angerszhu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 0c943cd)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)