Changes

Summary

  1. [SPARK-33118][SQL] CREATE TEMPORARY TABLE fails with location (commit: 819f12e) (details)
  2. [SPARK-32455][ML][FOLLOW-UP] LogisticRegressionModel prediction optimization - fix incorrect initialization (commit: 86d26b4) (details)
  3. [SPARK-33119][SQL] ScalarSubquery should return the first two rows to avoid Driver OOM (commit: e34f2d8) (details)
  4. [SPARK-32295][SQL] Add not null and size > 0 filters before inner explode/inline to benefit from predicate pushdown (commit: 17eebd7) (details)
  5. [SPARK-33115][BUILD][DOCS] Fix javadoc errors in `kvstore` and `unsafe` modules (commit: 1b0875b) (details)
  6. [SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric types (commit: feee8da) (details)
  7. [SPARK-33081][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (DB2 dialect) (commit: af3e2f7) (details)
  8. [SPARK-33125][SQL] Improve the error when Lead and Lag are not allowed to specify window frame (commit: 2b7239e) (details)
  9. [SPARK-13860][SQL] Change statistical aggregate function to return null instead of Double.NaN when divideByZero (commit: dc697a8) (details)
  10. [SPARK-33129][BUILD][DOCS] Updating the build/sbt references to test-only with testOnly for SBT 1.3.x (commit: 304ca1e) (details)
  11. [SPARK-33132][WEBUI] Make `formatBytes` return `0.0 B` for negative input instead of `NaN` (commit: 1bfcb51) (details)
  12. [SPARK-33134][SQL] Return partial results only for root JSON objects (commit: 05a62dc) (details)
  13. [SPARK-33061][SQL] Expose inverse hyperbolic trig functions through sql.functions API (commit: d8c4a47) (details)
  14. [SPARK-33136][SQL] Fix mistakenly swapped parameter in V2WriteCommand.outputResolved (commit: 8e5cb1d) (details)
  15. [SPARK-33026][SQL][FOLLOWUP] metrics name should be numOutputRows (commit: f3ad32f) (details)
  16. [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS (commit: 9ab0ec4) (details)
  17. [SPARK-33153][SQL][TESTS] Ignore Spark 2.4 in HiveExternalCatalogVersionsSuite on Python 3.8/3.9 (commit: ec34a00) (details)
  18. [SPARK-32932][SQL] Do not use local shuffle reader at final stage on write command (commit: 77a8efb) (details)
  19. [SPARK-33155][K8S] spark.kubernetes.pyspark.pythonVersion allows only '3' (commit: 8e7c390) (details)
  20. [SPARK-33156][INFRA] Upgrade GithubAction image from 18.04 to 20.04 (commit: e85ed8a) (details)
  21. [SPARK-33079][TESTS] Replace the existing Maven job for Scala 2.13 in Github Actions with SBT job (commit: 513b6f5) (details)
  22. [SPARK-32402][SQL][FOLLOW-UP] Use quoted column name for JDBCTableCatalog.alterTable (commit: 31f7097) (details)
  23. [SPARK-32247][INFRA] Install and test scipy with PyPy in GitHub Actions (commit: b089fe5) (details)
  24. [SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to support push shuffle blocks (commit: 82eea13) (details)
  25. [SPARK-33078][SQL] Add config for json expression optimization (commit: 9e37464) (details)
  26. [SPARK-33080][BUILD] Replace fatal warnings snippet (commit: ba69d68) (details)
Commit 819f12ee2fe3cce0c59221c2b02831274c769b23 by dhyun
[SPARK-33118][SQL] CREATE TEMPORARY TABLE fails with location
### What changes were proposed in this pull request?
We have a problem when you use CREATE TEMPORARY TABLE with LOCATION:
```scala
spark.range(3).write.parquet("/tmp/testspark1")
sql("CREATE TEMPORARY TABLE t USING parquet OPTIONS (path '/tmp/testspark1')")
sql("CREATE TEMPORARY TABLE t USING parquet LOCATION '/tmp/testspark1'")
```
```
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
  at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$getOrInferFileFormatSchema$12(DataSource.scala:200)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:200)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:408)
  at org.apache.spark.sql.execution.datasources.CreateTempViewUsing.run(ddl.scala:94)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3618)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3616)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
  at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
```
This bug was introduced by SPARK-30507. The parser path SparkSqlParser -> visitCreateTable -> visitCreateTableClauses -> cleanTableOptions extracts the path from the options, but in this case CreateTempViewUsing needs the path to stay in the options map.
### Why are the changes needed?
To fix the problem
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit testing and manual testing
Closes #30014 from
planga82/bugfix/SPARK-33118_create_temp_table_location.
Authored-by: Pablo <pablo.langa@stratio.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 819f12e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (diff)
Commit 86d26b46a53acf52b85ac990059be9e5a3ec0318 by ruifengz
[SPARK-32455][ML][FOLLOW-UP] LogisticRegressionModel prediction
optimization - fix incorrect initialization
### What changes were proposed in this pull request? Use a `lazy array` instead of a `var` for the auxiliary variables in binary LOR.
### Why are the changes needed? In https://github.com/apache/spark/pull/29255, I made a mistake: `private var _threshold` and `_rawThreshold` are initialized from the default value of `threshold`. That is because: 1) param `threshold` is set to its default value first; 2) `_threshold` and `_rawThreshold` are initialized based on that default value; 3) param `threshold` is then updated with the value from the estimator by the `copyValues` method:
```
     if (map.contains(param) && to.hasParam(param.name)) {
       to.set(param.name, map(param))
     }
```
We can update `_threshold` and `_rawThreshold` in `setThreshold` and `setThresholds`, but we cannot update them in `set`/`copyValues`, so the stale values are kept until `setThreshold` or `setThresholds` is called.
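A minimal sketch of the general idea (not the actual MLlib code; names and the formula are illustrative): deriving the auxiliary value lazily means it is computed from whatever `threshold` holds at first use, rather than from the default set before `copyValues`.
```scala
// Hypothetical illustration of the initialization-order issue and the lazy fix.
class ModelSketch {
  var threshold: Double = 0.5 // default is set first; copyValues() may update it later

  // Eager init would capture the default and stay stale after copyValues():
  // private val _rawThreshold: Double = math.log(threshold / (1.0 - threshold))

  // Lazy init is computed on first use, after the estimator's value has been copied in.
  private lazy val _rawThreshold: Double = math.log(threshold / (1.0 - threshold))

  def predict(rawMargin: Double): Double = if (rawMargin > _rawThreshold) 1.0 else 0.0
}
```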
### Does this PR introduce _any_ user-facing change? No
### How was this patch tested? test in repl
Closes #30013 from zhengruifeng/lor_threshold_init.
Authored-by: zhengruifeng <ruifengz@foxmail.com> Signed-off-by:
zhengruifeng <ruifengz@foxmail.com>
(commit: 86d26b4)
The file was modified mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala (diff)
Commit e34f2d8df222056e9c2195dec6138fa1af9ca4e1 by gurwls223
[SPARK-33119][SQL] ScalarSubquery should return the first two rows to avoid Driver OOM
### What changes were proposed in this pull request?
`ScalarSubquery` should return only the first two rows.
### Why are the changes needed?
To avoid Driver OOM.
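A minimal sketch of the idea (simplified; not the exact code in `subquery.scala`): fetching at most two rows is enough to distinguish the empty, single-row, and illegal multi-row cases without materializing the whole subquery result on the driver.
```scala
// Simplified illustration: `rows` stands for the subquery's result.
def evalScalarSubquery(rows: Seq[Any]): Any = {
  // 0 rows -> null, 1 row -> the value, >1 rows -> error; two rows suffice to decide.
  val firstTwo = rows.take(2)
  if (firstTwo.length > 1) {
    throw new IllegalStateException(
      "more than one row returned by a subquery used as an expression")
  }
  firstTwo.headOption.orNull
}
```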
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing test:
https://github.com/apache/spark/blob/d6f3138352042e33a2291e11c325b8eadb8dd5f2/sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala#L147-L154
Closes #30016 from wangyum/SPARK-33119.
Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: e34f2d8)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala (diff)
Commit 17eebd72097ee65e22cdaddf375e868074251f5a by yamamuro
[SPARK-32295][SQL] Add not null and size > 0 filters before inner
explode/inline to benefit from predicate pushdown
### What changes were proposed in this pull request?
Add `And(IsNotNull(e), GreaterThan(Size(e), Literal(0)))` filter before
Explode, PosExplode and Inline, when `outer = false`. Removed unused
`InferFiltersFromConstraints` from `operatorOptimizationRuleSet` to
avoid confusion that happened during the review process.
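For illustration, a query shape that benefits (a sketch, not taken from the PR's tests): with this rule, a filter equivalent to `arr IS NOT NULL AND size(arr) > 0` can be placed below the inner explode, and predicate pushdown can then move it into the scan.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

val spark = SparkSession.builder().master("local[*]").appName("explode-filter").getOrCreate()
import spark.implicits._

// Rows with a null or empty array produce no output from an inner explode anyway,
// so filtering them out early is semantically safe and prunes data at the source.
val df = Seq((1, Seq("a", "b")), (2, Seq.empty[String]), (3, null)).toDF("id", "arr")
df.select($"id", explode($"arr")).explain(true) // optimized plan should show the inferred filter
```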
### Why are the changes needed?
Predicate pushdown will be able to move this new filter down through
joins and into data sources for performance improvement.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
Closes #29092 from tanelk/SPARK-32295.
Lead-authored-by: tanel.kiis@gmail.com <tanel.kiis@gmail.com>
Co-authored-by: Tanel Kiis <tanel.kiis@reach-u.com> Signed-off-by:
Takeshi Yamamuro <yamamuro@apache.org>
(commit: 17eebd7)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromGenerateSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
Commit 1b0875b6924b4f29aa3cdecc26f8103fcae3dc55 by gurwls223
[SPARK-33115][BUILD][DOCS] Fix javadoc errors in `kvstore` and `unsafe`
modules
### What changes were proposed in this pull request?
Fix Javadoc generation errors in `kvstore` and `unsafe` modules
according to error message hints.
### Why are the changes needed?
Fixes `doc` task failures that prevented other tasks from executing successfully (e.g. the `publishLocal` task depends on the `doc` task).
### Does this PR introduce _any_ user-facing change?
No. The meaning of the Javadoc text stays the same.
### How was this patch tested?
Run `build/sbt kvstore/Compile/doc`, `build/sbt unsafe/Compile/doc` and
`build/sbt doc` without errors.
Closes #30007 from gemelen/feature/doc-task-fix.
Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 1b0875b)
The file was modified common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java (diff)
The file was modified common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java (diff)
Commit feee8da14bf506cda30506780fbcf0b8723123f9 by wenchen
[SPARK-32858][SQL] UnwrapCastInBinaryComparison: support other numeric
types
### What changes were proposed in this pull request?
In SPARK-24994 we implemented unwrapping cast for **integral types**.
This extends it to support **numeric types** such as
float/double/decimal, so that filters involving these types can be
better pushed down to data sources.
Unlike the integral-type cases, conversions between numeric types can result in rounding up or down. Consider the following case:
```sql
cast(e as double) < 1.9
```
Assume the type of `e` is short. Since 1.9 is not representable in that type, the cast will either truncate or round it. Now suppose the literal is truncated; we cannot convert the expression to:
```sql
e < cast(1.9 as short)
```
as in the previous implementation, since if `e` is 1, the original expression evaluates to true, but the converted expression evaluates to false.
To resolve the above, this PR first finds out whether casting from the wider type to the narrower type will truncate or round, by comparing a _roundtrip value_, derived from **converting the literal first to the narrower type and then back to the wider type**, against the original literal value. For instance, in the example above, we first obtain a roundtrip value via the conversion (double) 1.9 -> (short) 1 -> (double) 1.0, and then compare it against 1.9.
<img width="1153" alt="Screen Shot 2020-09-28 at 3 30 27 PM"
src="https://user-images.githubusercontent.com/506679/94492719-bd29e780-019f-11eb-9111-71d6e3d157f7.png">
Now in the case of truncation, we'd convert the original expression to:
```sql
e <= cast(1.9 as short)
```
instead, so that the conversion is also valid when `e` is 1.
For more details, please check [this blog
post](https://prestosql.io/blog/2019/05/21/optimizing-the-casts-away.html)
by Presto which offers a very good explanation on how it works.
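A minimal sketch of the roundtrip check described above (plain Scala for illustration, not the optimizer code):
```scala
// Decide how to rewrite `cast(e as double) < 1.9` when `e` is a short.
val lit: Double = 1.9
val narrowed: Short = lit.toShort         // (short) 1  -- the value gets truncated
val roundtrip: Double = narrowed.toDouble // (double) 1.0

if (roundtrip < lit) {
  // Truncation happened: rewrite `e < 1.9` as `e <= cast(1.9 as short)`, i.e. `e <= 1`.
} else if (roundtrip > lit) {
  // Rounding up happened: `e < lit` can be rewritten as `e < cast(lit as short)`.
} else {
  // Exact representation: keep the comparison operator as-is.
}
```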
### Why are the changes needed?
For queries such as:
```sql
SELECT * FROM tbl WHERE short_col < 100.5
```
The predicate `short_col < 100.5` can't be pushed down to data sources because it involves a cast. This PR eliminates the cast so such queries can run more efficiently.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests
Closes #29792 from sunchao/SPARK-32858.
Lead-authored-by: Chao Sun <sunchao@apple.com> Co-authored-by: Chao Sun
<sunchao@apache.org> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: feee8da)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparison.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/UnwrapCastInBinaryComparisonSuite.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/UnwrapCastInComparisonEndToEndSuite.scala
Commit af3e2f7d58507a47e2d767552209c309637a3170 by wenchen
[SPARK-33081][SQL] Support ALTER TABLE in JDBC v2 Table Catalog: update
type and nullability of columns (DB2 dialect)
### What changes were proposed in this pull request?
- Override the default SQL strings in the DB2 dialect (see the sketch below) for:
  * ALTER TABLE UPDATE COLUMN TYPE
  * ALTER TABLE UPDATE COLUMN NULLABILITY
- Add a new docker integration test suite, jdbc/v2/DB2IntegrationSuite.scala
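A hedged sketch of what such a dialect override could produce (the helper names and the exact DB2 DDL here are assumptions for illustration, not necessarily the code in `DB2Dialect.scala`):
```scala
// Illustrative only: assumed helper names and DB2 DDL strings.
object DB2DialectSketch {
  def updateColumnType(table: String, column: String, newType: String): String =
    s"ALTER TABLE $table ALTER COLUMN $column SET DATA TYPE $newType"

  def updateColumnNullability(table: String, column: String, nullable: Boolean): String = {
    val clause = if (nullable) "DROP NOT NULL" else "SET NOT NULL"
    s"ALTER TABLE $table ALTER COLUMN $column $clause"
  }
}
```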
### Why are the changes needed? In SPARK-24907, we implemented JDBC v2
Table Catalog but it doesn't support some ALTER TABLE at the moment.
This PR supports DB2 specific ALTER TABLE.
### Does this PR introduce _any_ user-facing change? Yes
### How was this patch tested? By running the new integration test suite:
$ ./build/sbt -Pdocker-integration-tests "test-only *.DB2IntegrationSuite"
Closes #29972 from huaxingao/db2_docker.
Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: af3e2f7)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala (diff)
The file was added external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala
The file was added external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/DB2IntegrationSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala (diff)
Commit 2b7239edfb02dc74415f6c9e6a675e1ba46ac195 by wenchen
[SPARK-33125][SQL] Improve the error when Lead and Lag are not allowed
to specify window frame
### What changes were proposed in this pull request? Except for PostgreSQL, other data sources (for example: Vertica, Oracle, Redshift, MySQL, Presto) do not allow specifying a window frame for the Lead and Lag functions.
But the current error message is not clear enough:
`Window Frame $f must match the required frame`
This PR uses the following error message instead:
`Cannot specify window frame for lead function`
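For example, a query like the following (a sketch, assuming an active SparkSession `spark`; not taken from the PR) now fails with the clearer message:
```scala
// lead/lag must use their default frame; specifying one explicitly is rejected.
spark.sql(
  """
    |SELECT id, lead(id, 1) OVER (
    |  ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    |) AS next_id
    |FROM range(5)
  """.stripMargin).show()
// Expected analysis error (new message): Cannot specify window frame for lead function
```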
### Why are the changes needed? Make the error message clearer.
### Does this PR introduce _any_ user-facing change? Yes. Users will see the clearer error message.
### How was this patch tested? Jenkins test.
Closes #30021 from beliefer/SPARK-33125.
Lead-authored-by: gengjiaan <gengjiaan@360.cn> Co-authored-by: beliefer
<beliefer@163.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 2b7239e)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit dc697a8b598aea922ee6620d87f3ace2f7947231 by wenchen
[SPARK-13860][SQL] Change statistical aggregate function to return null
instead of Double.NaN when divideByZero
### What changes were proposed in this pull request?
As [SPARK-13860](https://issues.apache.org/jira/browse/SPARK-13860)
stated, TPCDS Query 39 returns wrong results using SparkSQL. The root cause is that when stddev_samp is applied to a single-element set, the TPCDS answer expects null, whereas SparkSQL returns Double.NaN, which causes the wrong result.
This PR returns null by default to align with the TPCDS standard, and adds a legacy config to fall back to the NaN behavior.
### Why are the changes needed?
SQL correctness issue.
### Does this PR introduce any user-facing change? Yes. See sql-migration-guide.
In Spark 3.1, the statistical aggregation functions, including `std`, `stddev`, `stddev_samp`, `variance`, `var_samp`, `skewness`, `kurtosis`, `covar_samp`, and `corr`, return `NULL` instead of `Double.NaN` when `DivideByZero` occurs during expression evaluation, for example, when `stddev_samp` is applied to a single-element set. In Spark version 3.0 and earlier, they return `Double.NaN` in such cases. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.statisticalAggregate` to `true`.
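A small illustration of the behavior change and the legacy flag (the config name is the one quoted above; assumes an active SparkSession `spark`):
```scala
// stddev_samp over a single-element group divides by zero internally.
spark.sql("SELECT stddev_samp(col) FROM VALUES (1.0) AS t(col)").show()
// Spark 3.1 default: NULL; Spark 3.0 and earlier: NaN

// Restore the old behavior:
spark.conf.set("spark.sql.legacy.statisticalAggregate", "true")
spark.sql("SELECT stddev_samp(col) FROM VALUES (1.0) AS t(col)").show()
// NaN
```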
### How was this patch tested? Updated DataFrameAggregateSuite/DataFrameWindowFunctionsSuite to test both the default and the legacy behavior. Adjusted DataFrameWindowFunctionsSuite/SQLQueryTestSuite and some R cases to the default return-null behavior.
Closes #29983 from leanken/leanken-SPARK-13860.
Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: dc697a8)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/window.sql.out (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-window.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Corr.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/aggregates_part1.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/typeCoercion/native/promoteStrings.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/window_part4.sql.out (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-aggregates_part1.sql.out (diff)
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala (diff)
Commit 304ca1ec93e299ebb32f961eafcaac249a45585c by dhyun
[SPARK-33129][BUILD][DOCS] Updating the build/sbt references to
test-only with testOnly for SBT 1.3.x
### What changes were proposed in this pull request?
`test-only` -> `testOnly` in docs across the project.
### Why are the changes needed?
Since the sbt version was updated, the old way of running tests, i.e. `test-only`, is no longer valid.
### Does this PR introduce _any_ user-facing change?
docs update.
### How was this patch tested?
Manually.
Closes #30028 from ScrapCodes/fix-build/sbt-sample.
Authored-by: Prashant Sharma <prashsh1@in.ibm.com> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
(commit: 304ca1e)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2KrbIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ExpressionsSchemaSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (diff)
Commit 1bfcb51eebf074588ce84cc2143113ab05f07392 by dhyun
[SPARK-33132][WEBUI] Make `formatBytes` return `0.0 B` for negative
input instead of `NaN`
### What changes were proposed in this pull request? When the bytesRead metric is negative, `formatBytes` in `utils.js` should just return `0.0 B` to avoid a `NaN Undefined` result.
### Why are the changes needed? Strengthen the parameter validation to improve the metric display in the Summary Metrics of the Spark Stage UI.
### Does this PR introduce _any_ user-facing change? No
### How was this patch tested? It's a small change; just manual testing.
Closes #30030 from akiyamaneko/formatBytes_NaN.
Authored-by: neko <echohlne@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 1bfcb51)
The file was modified core/src/main/resources/org/apache/spark/ui/static/utils.js (diff)
Commit 05a62dcada0176301307b0af194b50c383f496ff by gurwls223
[SPARK-33134][SQL] Return partial results only for root JSON objects
### What changes were proposed in this pull request? In the PR, I propose to restrict the partial result feature to root JSON objects only. The JSON datasource as well as `from_json()` will return `null` for malformed nested JSON objects.
### Why are the changes needed?
1. To not raise an exception to users in the PERMISSIVE mode.
2. To fix a regression and to have the same behavior as Spark 2.4.x.
3. The current implementation of partial results is supposed to work only for root (top-level) JSON objects, and is not tested for bad nested complex JSON fields.
### Does this PR introduce _any_ user-facing change? Yes. Before the
changes, the code below:
```scala
val pokerhand_raw = Seq("""[{"cards": [19], "playerId": 123456}]""").toDF("events")
val event = new StructType().add("playerId", LongType).add("cards",
  ArrayType(new StructType().add("id", LongType).add("rank", StringType)))
val pokerhand_events = pokerhand_raw.select(from_json($"events", ArrayType(event)).as("event"))
pokerhand_events.show
```
throws the exception even in the default **PERMISSIVE** mode:
```java
java.lang.ClassCastException: java.lang.Long cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
  at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray(rows.scala:48)
  at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getArray$(rows.scala:48)
  at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getArray(rows.scala:195)
```
After the changes:
```
+-----+
|event|
+-----+
| null|
+-----+
```
### How was this patch tested? Added a test to `JsonFunctionsSuite`.
Closes #30031 from MaxGekk/json-skip-row-wrong-schema.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 05a62dc)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala (diff)
Commit d8c4a47ea19d18b0aad22263d002267d663c2f66 by srowen
[SPARK-33061][SQL] Expose inverse hyperbolic trig functions through
sql.functions API
This patch is a small extension to change-request SPARK-28133, which
added inverse hyperbolic functions to the SQL interpreter, but did not
include those methods within the Scala `sql.functions._` API. This patch
makes `acosh`, `asinh` and `atanh` functions available through the Scala
API.
Unit-tests have been added to
`sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala`.
Manual testing has been done via `spark-shell`, using the following
recipe:
```
val df = spark.range(0, 11)
  .toDF("x")
  .withColumn("x", ($"x" - 5) / 2.0)
val hyps = df.withColumn("tanh", tanh($"x"))
  .withColumn("sinh", sinh($"x"))
  .withColumn("cosh", cosh($"x"))
val invhyps = hyps.withColumn("atanh", atanh($"tanh"))
  .withColumn("asinh", asinh($"sinh"))
  .withColumn("acosh", acosh($"cosh"))
invhyps.show
```
which produces the following output:
```
+----+--------------------+-------------------+------------------+-------------------+-------------------+------------------+
|   x|                tanh|               sinh|              cosh|              atanh|              asinh|             acosh|
+----+--------------------+-------------------+------------------+-------------------+-------------------+------------------+
|-2.5| -0.9866142981514303|-6.0502044810397875| 6.132289479663686| -2.500000000000001|-2.4999999999999956|               2.5|
|-2.0| -0.9640275800758169| -3.626860407847019|3.7621956910836314|-2.0000000000000004|-1.9999999999999991|               2.0|
|-1.5| -0.9051482536448664|-2.1292794550948173| 2.352409615243247|-1.4999999999999998|-1.4999999999999998|               1.5|
|-1.0| -0.7615941559557649|-1.1752011936438014| 1.543080634815244|               -1.0|               -1.0|               1.0|
|-0.5|-0.46211715726000974|-0.5210953054937474|1.1276259652063807|               -0.5|-0.5000000000000002|0.4999999999999998|
| 0.0|                 0.0|                0.0|               1.0|                0.0|                0.0|               0.0|
| 0.5| 0.46211715726000974| 0.5210953054937474|1.1276259652063807|                0.5|                0.5|0.4999999999999998|
| 1.0|  0.7615941559557649| 1.1752011936438014| 1.543080634815244|                1.0|                1.0|               1.0|
| 1.5|  0.9051482536448664| 2.1292794550948173| 2.352409615243247| 1.4999999999999998|                1.5|               1.5|
| 2.0|  0.9640275800758169|  3.626860407847019|3.7621956910836314| 2.0000000000000004|                2.0|               2.0|
| 2.5|  0.9866142981514303| 6.0502044810397875| 6.132289479663686|  2.500000000000001|                2.5|               2.5|
+----+--------------------+-------------------+------------------+-------------------+-------------------+------------------+
```
Closes #29938 from rwpenney/fix/inverse-hyperbolics.
Authored-by: Richard Penney <rwp@rwpenney.uk> Signed-off-by: Sean Owen
<srowen@gmail.com>
(commit: d8c4a47)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala (diff)
Commit 8e5cb1d276686ec428e4e6aa1c3cfd6bb99e4e9a by dhyun
[SPARK-33136][SQL] Fix mistakenly swapped parameter in
V2WriteCommand.outputResolved
### What changes were proposed in this pull request?
This PR proposes to fix a bug where `DataType.equalsIgnoreCompatibleNullability` is called with mistakenly swapped parameters in `V2WriteCommand.outputResolved`. The parameters of `DataType.equalsIgnoreCompatibleNullability` are `from` and `to`, which means the right order of the matched variables is `inAttr` followed by `outAttr`.
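A sketch of the semantics (a simplified stand-in for the internal helper, restricted to array types; attribute names follow the description above):
```scala
import org.apache.spark.sql.types._

// Simplified stand-in for DataType.equalsIgnoreCompatibleNullability(from, to):
// a write is fine when `to` is at least as nullable as `from`.
def compatible(from: DataType, to: DataType): Boolean = (from, to) match {
  case (ArrayType(f, fn), ArrayType(t, tn)) => (tn || !fn) && compatible(f, t)
  case (f, t) => f == t
}

val inAttr  = ArrayType(LongType, containsNull = false) // non-nullable elements from the query
val outAttr = ArrayType(LongType, containsNull = true)  // nullable elements in the table

println(compatible(inAttr, outAttr)) // true  -- correct argument order: (in, out)
println(compatible(outAttr, inAttr)) // false -- the swapped order wrongly rejects the write
```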
### Why are the changes needed?
Spark throws an AnalysisException due to an unresolved operator in v2 writes; the operator is unresolved because the parameters passed to `DataType.equalsIgnoreCompatibleNullability` in `outputResolved` were swapped.
### Does this PR introduce _any_ user-facing change?
Yes, end users no longer hit the unresolved-operator error in v2 writes when writing a dataframe containing non-nullable complex types into a table whose matching complex types are nullable.
### How was this patch tested?
New UT added.
Closes #30033 from HeartSaVioR/SPARK-33136.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 8e5cb1d)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
Commit f3ad32f4b6fc55e89e7fb222ed565ad3e32d47c6 by wenchen
[SPARK-33026][SQL][FOLLOWUP] metrics name should be numOutputRows
### What changes were proposed in this pull request?
Follow the convention and rename the metrics `numRows` to
`numOutputRows`
### Why are the changes needed?
`FilterExec`, `HashAggregateExec`, etc. all use `numOutputRows`
### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?
existing tests
Closes #30039 from cloud-fan/minor.
Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen
Fan <wenchen@databricks.com>
(commit: f3ad32f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala (diff)
Commit 9ab0ec4e38e5df0537b38cb0f89e004ad57bec90 by kabhwan.opensource
[SPARK-33146][CORE] Check for non-fatal errors when loading new
applications in SHS
### What changes were proposed in this pull request?
Adds an additional check for non-fatal errors when attempting to add a
new entry to the history server application listing.
### Why are the changes needed?
A bad rolling event log folder (missing appstatus file or no log files)
would cause no applications to be loaded by the Spark history server.
Figuring out why invalid event log folders are created in the first
place will be addressed in separate issues, this just lets the history
server skip the invalid folder and successfully load all the valid
applications.
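A sketch of the pattern (the surrounding method in `FsHistoryProvider` and the log message are illustrative assumptions):
```scala
import scala.util.control.NonFatal

// Illustrative: skip one bad event-log entry instead of aborting the whole listing scan.
def safelyAddListing(logPath: String)(addEntry: String => Unit): Unit = {
  try {
    addEntry(logPath)
  } catch {
    case NonFatal(e) =>
      println(s"Skipping invalid event log folder $logPath: ${e.getMessage}")
  }
}
```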
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
New UT
Closes #30037 from Kimahriman/bug/rolling-log-crashing-history.
Authored-by: Adam Binford <adam.binford@radiantsolutions.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
(commit: 9ab0ec4)
The file was modified core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (diff)
Commit ec34a001ad0ef57a496f29a6523d905128875b17 by dhyun
[SPARK-33153][SQL][TESTS] Ignore Spark 2.4 in
HiveExternalCatalogVersionsSuite on Python 3.8/3.9
### What changes were proposed in this pull request?
This PR aims to ignore Apache Spark 2.4.x distribution in
HiveExternalCatalogVersionsSuite if Python version is 3.8 or 3.9.
### Why are the changes needed?
Currently, `HiveExternalCatalogVersionsSuite` is broken on the latest OS
like `Ubuntu 20.04` because its default Python version is 3.8. PySpark
2.4.x doesn't work on Python 3.8 due to SPARK-29536.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually.
```
$ python3 --version
Python 3.8.5
$ build/sbt "hive/testOnly *.HiveExternalCatalogVersionsSuite"
...
[info] All tests passed.
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
```
Closes #30044 from dongjoon-hyun/SPARK-33153.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: ec34a00)
The file was modified core/src/main/scala/org/apache/spark/TestUtils.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala (diff)
Commit 77a8efbc05cb4ecc40dd050c363429e71a9f23c1 by wenchen
[SPARK-32932][SQL] Do not use local shuffle reader at final stage on
write command
### What changes were proposed in this pull request? Do not use the local shuffle reader at the final stage if the root node is a write command.
### Why are the changes needed? Users usually repartition by the partition column for dynamic partition overwrite. AQE could break this by replacing the physical shuffle with a local shuffle reader, which could lead to a large number of output files, even exceeding the file system limit.
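For illustration (a sketch of the scenario, not the PR's test; assumes an active SparkSession `spark` and a hypothetical output path): with dynamic partition overwrite, users repartition by the partition column so each partition is written by a single task; if AQE swapped that final shuffle for a local shuffle reader, every reducer-side task could write to every partition.
```scala
import org.apache.spark.sql.functions.col

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("spark.sql.adaptive.enabled", "true")

val df = spark.range(0, 1000000).selectExpr("id", "id % 10 AS part")

// Repartition by the partition column to bound the number of files per partition.
df.repartition(col("part"))
  .write
  .mode("overwrite")
  .partitionBy("part")
  .parquet("/tmp/spark-32932-demo") // hypothetical output path
```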
### Does this PR introduce _any_ user-facing change? Yes.
### How was this patch tested? Add test.
Closes #29797 from manuzhang/spark-32932.
Authored-by: manuzhang <owenzhang1990@gmail.com> Signed-off-by: Wenchen
Fan <wenchen@databricks.com>
(commit: 77a8efb)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala (diff)
Commit 8e7c39089f885413f5e5e1bdafc2d426291a8719 by dhyun
[SPARK-33155][K8S] spark.kubernetes.pyspark.pythonVersion allows only
'3'
### What changes were proposed in this pull request?
This PR makes `spark.kubernetes.pyspark.pythonVersion` allow only `3`.
In other words, it will reject `2` for `Python 2`.
- [x] Configuration description and check is updated.
- [x] Documentation is updated
- [x] Unit test cases are updated.
- [x] Docker image script is updated.
### Why are the changes needed?
After SPARK-32138, Apache Spark 3.1 dropped Python 2 support.
### Does this PR introduce _any_ user-facing change?
Yes, but Python 2 support is already dropped officially.
### How was this patch tested?
Pass the CI.
Closes #30049 from dongjoon-hyun/SPARK-DROP-PYTHON2.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 8e7c390)
The file was modified resource-managers/kubernetes/integration-tests/tests/pyfiles.py (diff)
The file was modified docs/running-on-kubernetes.md (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/PythonTestsSuite.scala (diff)
The file was modified resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/DriverCommandFeatureStepSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
Commit e85ed8a14c7766ea0fafc32fd9c6ac95c86c8c8f by dhyun
[SPARK-33156][INFRA] Upgrade GithubAction image from 18.04 to 20.04
### What changes were proposed in this pull request?
This PR aims to upgrade `Github Action` runner image from `Ubuntu 18.04
(LTS)` to `Ubuntu 20.04 (LTS)`.
### Why are the changes needed?
`ubuntu-latest` in `GitHub Action` is still `Ubuntu 18.04 (LTS)`.
- https://github.com/actions/virtual-environments#available-environments
This upgrade will help Apache Spark 3.1+ preparation for vote and
release on the latest OS.
This is tested here.
- https://github.com/dongjoon-hyun/spark/pull/36
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the `Github Action` in this PR.
Closes #30050 from dongjoon-hyun/ubuntu_20.04.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: e85ed8a)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 513b6f5af2b873ca8737fd7f0c42fdfd4fa24292 by gurwls223
[SPARK-33079][TESTS] Replace the existing Maven job for Scala 2.13 in
Github Actions with SBT job
### What changes were proposed in this pull request?
SPARK-32926 added a build test to GitHub Action for Scala 2.13 but it's
only with Maven. As SPARK-32873 reported, some compilation error happens
only with SBT so I think we need to add another build test to GitHub
Action for SBT. Unfortunately, we don't have abundant resources for
GitHub Actions so instead of just adding the new SBT job, let's replace
the existing Maven job with the new SBT job for Scala 2.13.
### Why are the changes needed?
To ensure build test passes even with SBT for Scala 2.13.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
GitHub Actions' job.
Closes #29958 from sarutak/add-sbt-job-for-scala-2.13.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 513b6f5)
The file was modified external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala (diff)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 31f7097ce0d7eade17a96fe01184e62a88fd2bbd by wenchen
[SPARK-32402][SQL][FOLLOW-UP] Use quoted column name for
JDBCTableCatalog.alterTable
### What changes were proposed in this pull request? We currently generate unquoted column names in ALTER TABLE, e.g. `ALTER TABLE "test"."alt_table" DROP COLUMN c1`; this should change to the quoted form `ALTER TABLE "test"."alt_table" DROP COLUMN "c1"`.
### Why are the changes needed? We should always use quoted identifiers
in JDBC SQLs, e.g. ```CREATE TABLE "test"."abc" ("col" INTEGER )  ``` or
```INSERT INTO "test"."abc" ("col") VALUES (?)```. Using unquoted column
name in alterTable causes problems, for example:
``` sql("CREATE TABLE h2.test.alt_table (c1 INTEGER, c2 INTEGER) USING
_") sql("ALTER TABLE h2.test.alt_table DROP COLUMN c1")
org.apache.spark.sql.AnalysisException: Failed table altering:
test.alt_table;
......
Caused by: org.h2.jdbc.JdbcSQLException: Column "C1" not found; SQL
statement: ALTER TABLE "test"."alt_table" DROP COLUMN c1 [42122-195]
```
### Does this PR introduce _any_ user-facing change? No
### How was this patch tested? Existing tests
Closes #30041 from huaxingao/alter_table_followup.
Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 31f7097)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/jdbc/DB2Dialect.scala (diff)
Commit b089fe5376d72ccd0a6724ac9aa2386c5a81b06b by dhyun
[SPARK-32247][INFRA] Install and test scipy with PyPy in GitHub Actions
### What changes were proposed in this pull request?
This PR proposes to install `scipy` as well in PyPy. It will test
several ML specific test cases in PyPy as well. For example,
https://github.com/apache/spark/blob/31a16fbb405a19dc3eb732347e0e1f873b16971d/python/pyspark/mllib/tests/test_linalg.py#L487
It was not installed when GitHub Actions build was added because it
failed to install for an unknown reason. Seems like it's fixed in the
latest scipy.
### Why are the changes needed?
To improve test coverage.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
GitHub Actions build in this PR will test it out.
Closes #30054 from HyukjinKwon/SPARK-32247.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: b089fe5)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 82eea13c7686fb4bfbe8fb4185db81438d2ea884 by mridulatgmail.com
[SPARK-32915][CORE] Network-layer and shuffle RPC layer changes to
support push shuffle blocks
### What changes were proposed in this pull request?
This is the first patch for SPIP SPARK-30602 for push-based shuffle.
Summary of changes:
* Introduce new API in ExternalBlockStoreClient to push blocks to a
remote shuffle service.
* Leveraging the streaming upload functionality in SPARK-6237, it also
enables the ExternalBlockHandler to delegate the handling of block push
requests to MergedShuffleFileManager.
* Propose the API for MergedShuffleFileManager, where the core logic on
the shuffle service side to handle block push requests is defined. The
actual implementation of this API is deferred into a later RB to
restrict the size of this PR.
* Introduce OneForOneBlockPusher to enable pushing blocks to remote
shuffle services in shuffle RPC layer.
* New protocols in shuffle RPC layer to support the functionalities.
### Why are the changes needed?
Refer to the SPIP in SPARK-30602
### Does this PR introduce _any_ user-facing change? No.
### How was this patch tested? Added unit tests. The reference PR with
the consolidated changes covering the complete implementation is also
provided in SPARK-30602. We have already verified the functionality and
the improved performance as documented in the SPIP doc.
Closes #29855 from Victsm/SPARK-32915.
Lead-authored-by: Min Shen <mshen@linkedin.com> Co-authored-by: Chandni
Singh <chsingh@linkedin.com> Co-authored-by: Ye Zhou
<yezhou@linkedin.com> Co-authored-by: Chandni Singh
<singh.chandni@gmail.com> Co-authored-by: Min Shen
<victor.nju@gmail.com> Signed-off-by: Mridul Muralidharan
<mridul<at>gmail.com>
(commit: 82eea13)
The file was modified common/network-common/pom.xml (diff)
The file was modified common/network-shuffle/pom.xml (diff)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java (diff)
The file was added common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ErrorHandlerSuite.java
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalBlockHandlerSuite.java (diff)
The file was modified common/network-common/src/main/java/org/apache/spark/network/protocol/Encoders.java (diff)
The file was added common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockFetcher.java (diff)
The file was added common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/MergedShuffleFileManager.java
The file was added common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockPusher.java
The file was added common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/FinalizeShuffleMerge.java
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/BlockTransferMessage.java (diff)
The file was added common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/MergeStatuses.java
The file was added common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/MergedBlockMeta.java
The file was added common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/OneForOneBlockPusherSuite.java
The file was modified core/src/test/scala/org/apache/spark/deploy/ExternalShuffleServiceMetricsSuite.scala (diff)
The file was added common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/PushBlockStream.java
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala (diff)
Commit 9e3746469c23fd88f6dacc5082a157ca6970414e by dongjoon
[SPARK-33078][SQL] Add config for json expression optimization
### What changes were proposed in this pull request?
This proposes to add a config for json expression optimization.
### Why are the changes needed?
For the new Json expression optimization rules, it is safer if we can
disable it using SQL config.
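A usage sketch (the config key below is an assumption based on this PR's description; check `SQLConf.scala` for the exact name):
```scala
// Hypothetical config key -- verify against SQLConf before relying on it.
spark.conf.set("spark.sql.optimizer.enableJsonExpressionOptimization", "false")
// Re-enable later if desired:
spark.conf.set("spark.sql.optimizer.enableJsonExpressionOptimization", "true")
```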
### Does this PR introduce _any_ user-facing change?
Yes, users can disable json expression optimization rule.
### How was this patch tested?
Unit test
Closes #30047 from viirya/SPARK-33078.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: 9e37464)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJsonExprsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeJsonExprs.scala (diff)
Commit ba69d68d91eed2773c56a1cd82043aba42cecea3 by srowen
[SPARK-33080][BUILD] Replace fatal warnings snippet
### What changes were proposed in this pull request?
The current build-file solution for failing the build on compilation warnings (excluding deprecation warnings) is not portable past SBT 1.3.13 (the build import fails with a compilation error on SBT 1.4) and can be replaced with something more robust and maintainable, especially since Scala 2.13.2 ships similar built-in functionality.
Additionally, warnings were fixed to keep the build passing, with as few changes as possible: warnings in the 2.12 compilation were fixed in code, while warnings in the 2.13 compilation are covered by configuration and will be addressed separately.
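For flavor, a hedged sketch of the kind of `-Wconf`-based setting that can replace a custom fatal-warnings reporter (illustrative, not necessarily the exact lines in `SparkBuild.scala`):
```scala
// In an sbt build definition: escalate warnings to errors, but keep deprecations as warnings.
// Scala 2.13.2+ supports -Wconf natively; older 2.12 releases need reporter or plugin support,
// which is why this is only a sketch.
scalacOptions ++= Seq(
  "-Wconf:cat=deprecation:wv,any:e"
)
```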
### Why are the changes needed?
Unblocks upgrade to SBT after 1.3.13. Enhances build file
maintainability. Allows fine tune of warnings configuration in scope of
Scala 2.13 compilation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
`build/sbt`'s `compile` and `Test/compile` for both Scala 2.12 and 2.13
profiles.
Closes #29995 from gemelen/feature/warnings-reporter.
Authored-by: Denis Pyshev <git@gemelen.net> Signed-off-by: Sean Owen
<srowen@gmail.com>
(commit: ba69d68)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala (diff)
The file was modified project/SparkBuild.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/UnsafeArraySuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/HostLocalShuffleReadingSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala (diff)