Changes

Summary

  1. [SPARK-33396][SQL] Spark SQL CLI prints appliction id when process file (commit: 34a9a77) (details)
  2. [SPARK-33432][SQL] SQL parser should use active SQLConf (commit: 156704b) (details)
  3. [SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for (commit: eea846b) (details)
Commit 34a9a77ab5c5589330298ba8eb4a6435bb6b9cba by yumwang
[SPARK-33396][SQL] Spark SQL CLI prints appliction id when process file
### What changes were proposed in this pull request?
Modify SparkSQLCLIDriver.scala so that the cli.printMasterAndAppId method
is called before a file is processed.
### Why are the changes needed?
SPARK-25043 already added printing of the application id, but the case
where spark-sql processes a file does not seem to have been covered. This
small change makes spark-sql print the application id in that case as well.
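
The reordering can be pictured with a small self-contained sketch. Everything in it is hypothetical: `CliFlowSketch`, `Session`, `runBefore` and `runAfter` are illustrative names and do not mirror the real `SparkSQLCLIDriver` code; only the name `printMasterAndAppId` and the banner format are taken from this description. The shape of `runBefore` is an assumption inferred from the text above.

```scala
// Hypothetical, simplified model of the CLI control flow (not Spark code).
object CliFlowSketch {
  final case class Session(master: String, appId: String)

  // Emits the banner line in the same format as the execution log of this PR.
  def printMasterAndAppId(s: Session, out: String => Unit): Unit =
    out(s"Spark master: ${s.master}, Application Id: ${s.appId}")

  // Assumed shape before the change: the file branch never reaches the banner.
  def runBefore(s: Session, file: Option[String], out: String => Unit): Unit =
    file match {
      case Some(path) => out(s"processing $path")
      case None =>
        printMasterAndAppId(s, out)
        out("entering interactive prompt")
    }

  // After the change: the banner is printed first, so both paths report the id.
  def runAfter(s: Session, file: Option[String], out: String => Unit): Unit = {
    printMasterAndAppId(s, out)
    file match {
      case Some(path) => out(s"processing $path")
      case None       => out("entering interactive prompt")
    }
  }
}
```

Passing `println` as `out` and `Some("/tmp/tmp.sql")` as `file` shows the difference: only `runAfter` emits the banner line when a file is given.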
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
env
```
spark version: 3.0.1
os: centos 7
```
/tmp/tmp.sql
```sql
select 1;
```
submit command:
```sh
export HADOOP_USER_NAME=my-hadoop-user
bin/spark-sql \
--master yarn \
--deploy-mode client \
--queue my.queue.name \
--conf spark.driver.host=$(hostname -i) \
--conf spark.app.name=spark-test \
--name "spark-test" \
-f /tmp/tmp.sql
```
execution log:
```sh
20/11/09 23:18:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.spark.client.rpc.server.address.use.ip does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.spark.client.submit.timeout.interval does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.enforce.bucketing does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.run.timeout.seconds does not exist
20/11/09 23:18:40 WARN HiveConf: HiveConf of name hive.support.sql11.reserved.keywords does not exist
20/11/09 23:18:40 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
20/11/09 23:18:41 WARN SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
20/11/09 23:18:42 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
20/11/09 23:18:52 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Spark master: yarn, Application Id: application_1567136266901_27355775
1
1
Time taken: 4.974 seconds, Fetched 1 row(s)
```
Closes #30301 from artiship/SPARK-33396.
Authored-by: artiship <meilziner@gmail.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(commit: 34a9a77)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala (diff)
Commit 156704ba0dfcae39a80b8f0ce778b73913db03b2 by dongjoon
[SPARK-33432][SQL] SQL parser should use active SQLConf
### What changes were proposed in this pull request?
This PR makes the SQL parser use the active SQLConf instead of the one
passed in as a constructor parameter.
### Why are the changes needed?
In ANSI mode, schema string parsing should fail if the schema uses an ANSI
reserved keyword as an attribute name:
```scala
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));""").show
```
output:
> Cannot parse the data type:
> no viable alternative at input 'time'(line 1, pos 0)
>
> == SQL ==
> time Timestamp
> ^^^
But this query may accidentally succeed in certain cases, because the
DataType parser sticks to the configs of the first session created in
the current thread:
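```scala
import org.apache.spark.sql.types.DataType

// Parse once first, so the DataType parser picks up the original session's configs.
DataType.fromDDL("time Timestamp")
// Enable ANSI mode only in a new session.
val newSpark = spark.newSession()
newSpark.conf.set("spark.sql.ansi.enabled", "true")
newSpark.sql("""select from_json('{"time":"26/10/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'));""").show
```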
output:
> +--------------------------------+
> |from_json({"time":"26/10/2015"})|
> +--------------------------------+
> |            {2015-10-26 00:00...|
> +--------------------------------+
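
The difference between a parser that keeps the conf it was constructed with and one that looks up the active conf on every call can be sketched without Spark. The names below (`Conf`, `ActiveConf`, `SchemaRules`, `CapturingParser`, `ActiveConfParser`) are hypothetical stand-ins, not Spark classes, and the reserved-word set is only an illustrative sample.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark classes.
final case class Conf(ansiEnabled: Boolean)

object ActiveConf {
  // Per-thread "active" conf, defaulting to ANSI mode off.
  private val current = new ThreadLocal[Conf] {
    override def initialValue(): Conf = Conf(ansiEnabled = false)
  }
  def get: Conf = current.get()

  // Runs body with conf as the active conf, then restores the previous one.
  def withConf[T](conf: Conf)(body: => T): T = {
    val previous = current.get()
    current.set(conf)
    try body finally current.set(previous)
  }
}

object SchemaRules {
  // Illustrative sample of reserved words, not the real ANSI keyword list.
  private val reserved = Set("time", "select", "from")

  def check(name: String, conf: Conf): Either[String, String] =
    if (conf.ansiEnabled && reserved.contains(name.toLowerCase))
      Left(s"no viable alternative at input '$name'")
    else Right(name)
}

// Keeps whatever conf it was built with, even if the active conf changes later.
final class CapturingParser(conf: Conf) {
  def parseFieldName(name: String): Either[String, String] =
    SchemaRules.check(name, conf)
}

// Looks up the active conf on every call.
final class ActiveConfParser {
  def parseFieldName(name: String): Either[String, String] =
    SchemaRules.check(name, ActiveConf.get)
}
```

If a `CapturingParser` is built while ANSI mode is off and then used inside `ActiveConf.withConf(Conf(ansiEnabled = true)) { ... }`, it still accepts `time`, while an `ActiveConfParser` rejects it. That is the kind of stale-config behavior described above.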
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New and updated UTs.
Closes #30357 from luluorta/SPARK-33432.
Authored-by: luluorta <luluorta@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 156704b)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/expressions/expressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/datetime-legacy.sql.out (diff)
The file was added sql/core/src/test/resources/sql-tests/inputs/parse-schema-string.sql
The file was added sql/core/src/test/resources/sql-tests/results/ansi/parse-schema-string.sql.out
The file was added sql/core/src/test/resources/sql-tests/inputs/ansi/parse-schema-string.sql
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/datetime.sql (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala (diff)
The file was added sql/core/src/test/resources/sql-tests/results/parse-schema-string.sql.out
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLDriver.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/VariableSubstitution.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/VariableSubstitutionSuite.scala (diff)
Commit eea846b8957badf016752264816dce55916628b4 by dongjoon
[SPARK-33455][SQL][TEST] Add SubExprEliminationBenchmark for
benchmarking subexpression elimination
### What changes were proposed in this pull request?
This patch adds a benchmark `SubExprEliminationBenchmark` for
benchmarking the subexpression elimination feature.
### Why are the changes needed?
We need a benchmark for the subexpression elimination feature to evaluate
changes such as #30341.
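
For readers unfamiliar with the optimization, the toy sketch below shows the effect such a benchmark would measure: when several output expressions share an expensive subexpression, evaluating it once per row and reusing the result avoids repeated work. This is not the code of `SubExprEliminationBenchmark`; `SubExprSketch`, `Counter`, `expensive`, `naive` and `eliminated` are hypothetical names used only for illustration.

```scala
// Toy illustration of subexpression elimination; not Spark code.
object SubExprSketch {
  // Counts how often the shared subexpression is evaluated.
  final class Counter { var calls: Int = 0 }

  // Stand-in for an expensive expression shared by several output columns.
  def expensive(x: Int, c: Counter): Int = { c.calls += 1; x * x }

  // Without elimination: each output expression re-evaluates the shared part.
  def naive(rows: Seq[Int], c: Counter): Seq[(Int, Int)] =
    rows.map(x => (expensive(x, c) + 1, expensive(x, c) * 2))

  // With elimination: the shared part is evaluated once per row and reused.
  def eliminated(rows: Seq[Int], c: Counter): Seq[(Int, Int)] =
    rows.map { x =>
      val shared = expensive(x, c)
      (shared + 1, shared * 2)
    }
}
```

Both variants return the same rows; only the number of `expensive` calls differs (two per row versus one per row in this sketch).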
### Does this PR introduce _any_ user-facing change?
No, dev only.
### How was this patch tested?
Unit test.
Closes #30379 from viirya/SPARK-33455.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: eea846b)
The file was added sql/core/benchmarks/SubExprEliminationBenchmark-results.txt
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SqlBasedBenchmark.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonBenchmark.scala (diff)
The file was added sql/core/benchmarks/SubExprEliminationBenchmark-jdk11-results.txt