Changes

Summary

  1. [SPARK-33386][SQL] Accessing array elements in (commit: 6d31dae)
  2. [MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only (commit: 4335af0)
  3. [SPARK-32907][ML][PYTHON] adaptively blockify instances - LinearSVC (commit: a288716)
  4. [SPARK-33421][SQL] Support Greatest and Least in Expression Canonicalize (commit: a3d2954)
  5. [SPARK-33278][SQL] Improve the performance for FIRST_VALUE (commit: 2f07c56)
  6. [SPARK-33140][SQL][FOLLOW-UP] change val to def in object rule (commit: 1baf0d5)
Commit 6d31daeb6a2c5607ffe3b23ffb381626ad57f576 by wenchen
[SPARK-33386][SQL] Accessing array elements in
ElementAt/Elt/GetArrayItem should fail if index is out of bounds
### What changes were proposed in this pull request?
Instead of returning NULL, throw a runtime `ArrayIndexOutOfBoundsException`
from the `element_at`, `elt`, and `GetArrayItem` functions when ANSI mode
is enabled.
### Why are the changes needed?
To support ANSI mode.
### Does this PR introduce any user-facing change?
When `spark.sql.ansi.enabled` is true, Spark throws
`ArrayIndexOutOfBoundsException` when an array element is accessed with an
out-of-range index.
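The behavior change can be pictured with a small plain-Python model of `element_at`-style (1-based) access. This is only a sketch, not Spark code: the function name, the `ansi_enabled` parameter, and the error message are assumptions made for illustration, and Python's `IndexError` stands in for Java's `ArrayIndexOutOfBoundsException`.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def element_at(arr: Sequence[T], index: int, ansi_enabled: bool) -> Optional[T]:
    """Plain-Python model of 1-based array access (illustrative, not Spark code)."""
    if index == 0:
        # SQL array indices start at 1, so 0 is rejected in either mode.
        raise ValueError("SQL array indices start at 1")
    if abs(index) > len(arr):
        if ansi_enabled:
            # Models the new ANSI-mode behavior: fail instead of returning NULL.
            raise IndexError(f"index {index} is out of bounds for {len(arr)} elements")
        # Models the non-ANSI behavior: an out-of-range access yields NULL.
        return None
    # Positive indices count from the front, negative ones from the back.
    return arr[index - 1] if index > 0 else arr[index]
```

With ANSI mode off, `element_at([10, 20, 30], 4, False)` returns `None`; with it on, the same call raises.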
### How was this patch tested?
Added new unit tests and ran the existing ones.
Closes #30297 from leanken/leanken-SPARK-33386.
Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 6d31dae)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
The file was modified sql/core/src/test/resources/sql-tests/results/array.sql.out
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SelectedField.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ProjectionOverSchema.scala
The file was modified sql/core/src/test/resources/sql-tests/inputs/array.sql
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CollectionExpressionsSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
The file was added sql/core/src/test/resources/sql-tests/inputs/ansi/array.sql
The file was added sql/core/src/test/resources/sql-tests/results/ansi/array.sql.out
The file was modified docs/sql-ref-ansi-compliance.md
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
Commit 4335af075a8ad27c4906f03ae5f8cd8f9a754e5a by yamamuro
[MINOR][DOC] spark.executor.memoryOverhead is not cluster-mode only
### What changes were proposed in this pull request?
Remove "in cluster mode" from the description of
`spark.executor.memoryOverhead`.
### Why are the changes needed?
Fix a correctness issue in the documentation.
### Does this PR introduce _any_ user-facing change?
Yes, users should no longer be confused by the description of
`spark.executor.memoryOverhead`.
### How was this patch tested?
Passes the GitHub Actions (GA) doc generation.
Closes #30311 from yaooqinn/minordoc.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 4335af0)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala
The file was modified docs/configuration.md
Commit a2887164bcca152e2402169bf6991c7dfb3ac11c by weichen.xu
[SPARK-32907][ML][PYTHON] adaptively blockify instances - LinearSVC
### What changes were proposed in this pull request?
1. Use `maxBlockSizeInMB` instead of `blockSize` (number of rows) to control the stacking of vectors.
2. Infer an appropriate `maxBlockSizeInMB` if it is set to 0.
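As a rough illustration of stacking by memory budget rather than by row count, the sketch below groups consecutive items into blocks until an estimated byte budget would be exceeded. It is a minimal plain-Python model under stated assumptions: `blockify_by_bytes` and its per-item size estimates are hypothetical and do not reproduce Spark's actual blockify logic or its memory estimate.

```python
from typing import Iterable, Iterator, List


def blockify_by_bytes(sizes: Iterable[int], max_block_bytes: int) -> Iterator[List[int]]:
    """Group consecutive items, given by estimated byte size, into blocks.

    Hypothetical helper: each block holds as many items as fit in the budget,
    so small (sparse) items yield many rows per block and large ones yield few.
    """
    block: List[int] = []
    used = 0
    for size in sizes:
        # Close the current block if adding this item would exceed the budget.
        if block and used + size > max_block_bytes:
            yield block
            block, used = [], 0
        # An item larger than the budget still gets a block of its own.
        block.append(size)
        used += size
    if block:
        yield block
```

Because the limit is expressed in bytes, the number of rows per block adapts to how large each row is, which a fixed row count cannot do.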
### Why are the changes needed?
The performance gain is mainly related to the number of nonzeros (nnz) in each block.

f2jBLAS

Duration (milliseconds) | branch 3.0 Impl | blockSizeInMB=0.0625 | blockSizeInMB=0.125 | blockSizeInMB=0.25 | blockSizeInMB=0.5 | blockSizeInMB=1 | blockSizeInMB=2 | blockSizeInMB=4 | blockSizeInMB=8 | blockSizeInMB=16 | blockSizeInMB=32 | blockSizeInMB=64 | blockSizeInMB=128
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
epsilon(100%) | 326481 | 26143 | 25710 | 24726 | 25395 | 25840 | 26846 | 25927 | 27431 | 26190 | 26056 | 26347 | 27204
epsilon3000(67%) | 455247 | 35893 | 34366 | 34985 | 38387 | 38901 | 40426 | 40044 | 39161 | 38767 | 39965 | 39523 | 39108
epsilon4000(50%) | 306390 | 42256 | 41164 | 43748 | 48638 | 50892 | 50986 | 51091 | 51072 | 51289 | 51652 | 53312 | 52146
epsilon5000(40%) | 307619 | 43639 | 42992 | 44743 | 50800 | 51939 | 51871 | 52190 | 53850 | 52607 | 51062 | 52509 | 51570
epsilon10000(20%) | 310070 | 58371 | 55921 | 56317 | 56618 | 53694 | 52131 | 51768 | 51728 | 52233 | 51881 | 51653 | 52440
epsilon20000(10%) | 316565 | 109193 | 95121 | 82764 | 69653 | 60764 | 56066 | 53371 | 52822 | 52872 | 52769 | 52527 | 53508
epsilon200000(1%) | 336181 | 1569721 | 1069355 | 673718 | 375043 | 218230 | 145393 | 110926 | 94327 | 87039 | 83926 | 81890 | 81787

Speedup | branch 3.0 Impl | blockSizeInMB=0.0625 | blockSizeInMB=0.125 | blockSizeInMB=0.25 | blockSizeInMB=0.5 | blockSizeInMB=1 | blockSizeInMB=2 | blockSizeInMB=4 | blockSizeInMB=8 | blockSizeInMB=16 | blockSizeInMB=32 | blockSizeInMB=64 | blockSizeInMB=128
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
epsilon(100%) | 1 | 12.48827602 | 12.69859977 | **13.20395535** | 12.85611341 | 12.63471362 | 12.16125307 | 12.59231689 | 11.90189931 | 12.46586483 | 12.5299739 | 12.39158158 | 12.00121306
epsilon3000(67%) | 1 | 12.68344803 | **13.2470174** | 13.01263399 | 11.85940553 | 11.70270687 | 11.26124276 | 11.36866946 | 11.62500958 | 11.74315784 | 11.39114225 | 11.51853351 | 11.64076404
epsilon4000(50%) | 1 | 7.250804619 | **7.443154212** | 7.003520161 | 6.299395534 | 6.020396133 | 6.00929667 | 5.996946625 | 5.999177632 | 5.973795551 | 5.931812902 | 5.747111345 | 5.875618456
epsilon5000(40%) | 1 | 7.049176196 | **7.155261444** | 6.875243055 | 6.055492126 | 5.92269778 | 5.930462108 | 5.894213451 | 5.712516249 | 5.847491779 | 6.024421292 | 5.858405226 | 5.965076595
epsilon10000(20%) | 1 | 5.312055644 | 5.544786395 | 5.505797539 | 5.4765269 | 5.774760681 | 5.947900481 | 5.98960748 | 5.994239097 | 5.93628549 | 5.976561747 | **6.002942714** | 5.912852784
epsilon20000(10%) | 1 | 2.899132728 | 3.328024306 | 3.824911797 | 4.544886796 | 5.209745902 | 5.64629187 | 5.931404695 | 5.993052137 | 5.987384627 | 5.999071425 | **6.026710073** | 5.916218136
epsilon200000(1%) | 1 | 0.214166084 | 0.314377358 | 0.498993644 | 0.896379882 | 1.540489392 | 2.312222734 | 3.03067811 | 3.563995463 | 3.862417997 | 4.005683578 | 4.105275369 | **4.110445425**

OpenBLAS

Duration (milliseconds) | branch 3.0 Impl | blockSizeInMB=0.0625 | blockSizeInMB=0.125 | blockSizeInMB=0.25 | blockSizeInMB=0.5 | blockSizeInMB=1 | blockSizeInMB=2 | blockSizeInMB=4 | blockSizeInMB=8 | blockSizeInMB=16 | blockSizeInMB=32 | blockSizeInMB=64 | blockSizeInMB=128
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
epsilon(100%) | 299119 | 26047 | 25049 | 25239 | 28001 | 35138 | 36438 | 36279 | 36114 | 35111 | 35428 | 36295 | 35197
epsilon3000(67%) | 439798 | 33321 | 34423 | 34336 | 38906 | 51756 | 54138 | 54085 | 53412 | 54766 | 54425 | 54221 | 54842
epsilon4000(50%) | 302963 | 42960 | 40678 | 43483 | 48254 | 50888 | 54990 | 52647 | 51947 | 51843 | 52891 | 53410 | 52020
epsilon5000(40%) | 303569 | 44225 | 44961 | 45065 | 51768 | 52776 | 51930 | 53587 | 53104 | 51833 | 52138 | 52574 | 53756
epsilon10000(20%) | 307403 | 58447 | 55993 | 56757 | 56694 | 54038 | 52734 | 52073 | 52051 | 52150 | 51986 | 52407 | 52390
epsilon20000(10%) | 313344 | 107580 | 94679 | 83329 | 70226 | 60996 | 57130 | 55461 | 54641 | 52712 | 52541 | 53101 | 53312
epsilon200000(1%) | 334679 | 1642726 | 1073148 | 654481 | 364974 | 213881 | 140248 | 107579 | 91757 | 85090 | 81940 | 80492 | 80250

Speedup | branch 3.0 Impl | blockSizeInMB=0.0625 | blockSizeInMB=0.125 | blockSizeInMB=0.25 | blockSizeInMB=0.5 | blockSizeInMB=1 | blockSizeInMB=2 | blockSizeInMB=4 | blockSizeInMB=8 | blockSizeInMB=16 | blockSizeInMB=32 | blockSizeInMB=64 | blockSizeInMB=128
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
epsilon(100%) | 1 | 11.48381771 | **11.94135494** | 11.85146004 | 10.68243991 | 8.512692811 | 8.208985125 | 8.244962651 | 8.282632774 | 8.519238985 | 8.443011178 | 8.241328007 | 8.498423161
epsilon3000(67%) | 1 | 13.19882356 | 12.7762833 | **12.80865564** | 11.30411762 | 8.497526857 | 8.123646976 | 8.131607655 | 8.234067251 | 8.030493372 | 8.080808452 | 8.111211523 | 8.01936472
epsilon4000(50%) | 1 | 7.052211359 | **7.44783421** | 6.967389555 | 6.278505409 | 5.953525389 | 5.509419895 | 5.754610899 | 5.832155851 | 5.843855487 | 5.728063376 | 5.672402172 | 5.823971549
epsilon5000(40%) | 1 | **6.86419446** | 6.751829363 | 6.736247642 | 5.864027971 | 5.752027437 | 5.845734643 | 5.664974714 | 5.716499699 | 5.856674319 | 5.822413595 | 5.774127896 | 5.647164968
epsilon10000(20%) | 1 | 5.259517169 | 5.490025539 | 5.416124883 | 5.422143437 | 5.688645028 | 5.829313157 | 5.903308816 | 5.905803923 | 5.894592522 | **5.913188166** | 5.865685882 | 5.867589235
epsilon20000(10%) | 1 | 2.912660346 | 3.309540658 | 3.760323537 | 4.461937174 | 5.137123746 | 5.48475407 | 5.649807973 | 5.734594901 | 5.944452876 | **5.963799699** | 5.900905821 | 5.87755102
epsilon200000(1%) | 1 | 0.203733915 | 0.311866583 | 0.511365494 | 0.916994087 | 1.564790701 | 2.38633706 | 3.111006795 | 3.647449241 | 3.933235398 | 4.084439834 | 4.157916315 | **4.170454829**
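
The speedup figures appear to be the branch 3.0 duration divided by the duration at each block size; the sketch below checks that reading against two OpenBLAS cells. The `speedup` helper is a hypothetical name introduced here for illustration.

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio of baseline duration to candidate duration (> 1 means faster)."""
    return baseline_ms / candidate_ms


# OpenBLAS, epsilon(100%): branch 3.0 (299119 ms) vs blockSizeInMB=0.125 (25049 ms).
dense_gain = speedup(299119, 25049)

# OpenBLAS, epsilon200000(1%): branch 3.0 (334679 ms) vs blockSizeInMB=0.0625 (1642726 ms).
# A ratio below 1 means the small block size was slower than the baseline here.
sparse_small_block = speedup(334679, 1642726)
```

On this reading, values below 1 in the epsilon200000(1%) rows mark block sizes that were slower than branch 3.0 for very sparse input.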
### Does this PR introduce _any_ user-facing change?
Yes, the param `blockSize` is replaced by `blockSizeInMB` in master.
### How was this patch tested?
Added test suites and a performance test (results attached in the
[ticket](https://issues.apache.org/jira/browse/SPARK-32907)).
Closes #30009 from zhengruifeng/adaptively_blockify_linear_svc_II.
Lead-authored-by: zhengruifeng <ruifengz@foxmail.com>
Co-authored-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
(commit: a288716)
The file was modified python/pyspark/ml/param/_shared_params_code_gen.py
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/InstanceSuite.scala
The file was modified mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala
The file was modified python/pyspark/ml/classification.py
The file was modified mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/Instance.scala
The file was modified python/pyspark/ml/param/shared.py
The file was modified python/pyspark/ml/classification.pyi
The file was modified mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala
The file was modified python/pyspark/ml/param/shared.pyi
Commit a3d2954662831ca9fa6a2b886ca5bd8d81785974 by gurwls223
[SPARK-33421][SQL] Support Greatest and Least in Expression Canonicalize
### What changes were proposed in this pull request?
Add `Greatest` and `Least` check in `Canonicalize`.
### Why are the changes needed?
The children of both `Greatest` and `Least` are order-irrelevant.
For example, `greatest(1, 2)` and `greatest(2, 1)` can share the same
canonicalized expression.
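The idea can be sketched in plain Python by modeling an expression as a tuple and sorting the children of order-irrelevant functions into a deterministic order. This is an illustrative model only: the tuple encoding, `ORDER_IRRELEVANT`, and sorting by `repr` are assumptions for the example and are not how Spark's `Canonicalize` is implemented.

```python
# An expression is either a literal or a tuple: (function_name, child1, child2, ...).
ORDER_IRRELEVANT = {"greatest", "least"}  # hypothetical set for this sketch


def canonicalize(expr):
    """Return a canonical form in which order-irrelevant children are sorted."""
    if not isinstance(expr, tuple):
        return expr  # literals are already canonical
    name, *children = expr
    children = [canonicalize(child) for child in children]
    if name in ORDER_IRRELEVANT:
        # Any deterministic ordering works; repr() is used only for simplicity.
        children.sort(key=repr)
    return (name, *children)
```

Here `canonicalize(("greatest", 1, 2))` and `canonicalize(("greatest", 2, 1))` compare equal, while an order-sensitive function keeps its argument order.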
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Add test.
Closes #30330 from ulysses-you/SPARK-33421.
Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: a3d2954)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
Commit 2f07c568107b2e466a6d6e199eaff7068100bb3c by wenchen
[SPARK-33278][SQL] Improve the performance for FIRST_VALUE
### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/29800 provides a performance
improvement for `NTH_VALUE`.
`FIRST_VALUE` could also use the `UnboundedOffsetWindowFunctionFrame`
and `UnboundedPrecedingOffsetWindowFunctionFrame`.
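One way to see why an offset-based frame can help: when a frame always starts at the first row of the partition, the first value is the same for every row, so it can be looked up once instead of being recomputed per frame. The plain-Python sketch below models only that observation; both function names are hypothetical, and it ignores ordering, null handling, and everything else the real window frames do.

```python
def first_value_naive(partition):
    """Rebuild the frame rows[0..i] for every row, then take its first element."""
    return [partition[: i + 1][0] for i in range(len(partition))]


def first_value_by_offset(partition):
    """Look up the row at the fixed first offset once and reuse it for every row."""
    if not partition:
        return []
    first = partition[0]  # the frame always starts at the partition's first row
    return [first for _ in partition]
```

Both functions return the same result; the second avoids rebuilding a frame per row.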
### Why are the changes needed?
Improve the performance for `FIRST_VALUE`.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Jenkins test.
Closes #30178 from beliefer/SPARK-33278.
Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Co-authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 2f07c56)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeWindowFunctionsSuite.scala
The file was modified sql/core/src/test/resources/sql-tests/results/window.sql.out
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
The file was modified sql/core/src/test/resources/sql-tests/inputs/window.sql
Commit 1baf0d5c9b481622d5a811fd600f680b0cc3229f by gurwls223
[SPARK-33140][SQL][FOLLOW-UP] change val to def in object rule
### What changes were proposed in this pull request?
In #30097, many rules were changed from case classes to objects, but this
causes a problem if a rule is stateful. For example, if an object rule uses
a `val` to refer to a config, the value stays unchanged after initialization
even if another Spark session uses a different config value.
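The pitfall can be mimicked in plain Python as an analogy: a singleton that snapshots a config value when it is created behaves like a Scala `val`, while a property that re-reads the config on each access behaves like a `def`. Everything named here (`ACTIVE_CONF`, the `caseSensitive` key, both classes) is hypothetical and exists only for the illustration.

```python
# Stand-in for the active session's configuration (hypothetical).
ACTIVE_CONF = {"caseSensitive": False}


class StatefulRule:
    """Snapshots the config once at creation, like a Scala `val` in an object."""

    def __init__(self):
        self.case_sensitive = ACTIVE_CONF["caseSensitive"]


class StatelessRule:
    """Re-reads the config on every access, like a Scala `def`."""

    @property
    def case_sensitive(self):
        return ACTIVE_CONF["caseSensitive"]


stateful, stateless = StatefulRule(), StatelessRule()

# Another session later uses a different config value.
ACTIVE_CONF["caseSensitive"] = True
```

After the config changes, `stateful.case_sensitive` still holds the stale value, whereas `stateless.case_sensitive` reflects the new one.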
### Why are the changes needed?
Avoid a potential bug.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing unit tests.
Closes #30354 from linhongliu-db/SPARK-33140-followup-2.
Lead-authored-by: Linhong Liu <67896261+linhongliu-db@users.noreply.github.com>
Co-authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 1baf0d5)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/higherOrderFunctions.scala