Changes

Summary

  1. [SPARK-28704][SQL][TEST] Add back Skipped (commit: d7f4b2a) (details)
  2. [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when (commit: d338af3) (details)
  3. [SPARK-33469][SQL] Add current_timezone function (commit: 6d625cc) (details)
  4. [SPARK-33512][BUILD] Upgrade test libraries (commit: df4a1c2) (details)
  5. [MINOR][INFRA] Suppress warning in check-license (commit: a459238) (details)
  6. [SPARK-33427][SQL][FOLLOWUP] Put key and value into IdentityHashMap (commit: aa78c05) (details)
Commit d7f4b2ad50aa7acdb0392bb400fc0c87491c6e45 by dongjoon
[SPARK-28704][SQL][TEST] Add back Skipped
HiveExternalCatalogVersionsSuite in HiveSparkSubmitSuite at JDK9+
### What changes were proposed in this pull request?
We skipped the HiveExternalCatalogVersionsSuite test when running with
JAVA_9 or later because our previous versions did not support JAVA_9 or
later. We now add it back since we have a version that supports JAVA_9
or later.
### Why are the changes needed?
To recover test coverage.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Check CI logs.
Closes #30451 from AngersZhuuuu/SPARK-28704.
Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: d7f4b2a)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala (diff)
Commit d338af3101a4c986b5e979e8fdc63b8551e12d29 by kabhwan.opensource
[SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when
filtering from a batch-based file data source
### What changes were proposed in this pull request?
Two new options, _modifiedBefore_ and _modifiedAfter_, are provided,
expecting a value in 'YYYY-MM-DDTHH:mm:ss' format.
_PartitioningAwareFileIndex_ considers these options while listing
files, just before applying _PathFilters_ such as `pathGlobFilter`. A
new PathFilter class was derived to filter file results by modification
time. General housekeeping around classes extending PathFilter was
performed for neatness. It became apparent that support was needed to
handle multiple potential path filters, so logic was introduced for
this purpose and the associated tests written.
### Why are the changes needed?
When loading files from a data source, there can often be thousands of
files under a given path. In many cases we want to start loading from a
folder path and only pick up files whose modification dates fall past a
certain point. Out of thousands of potential files, only the ones with
modification dates greater than the specified timestamp would be
considered. This saves significant time and removes the complexity of
managing this in application code.
### Does this PR introduce _any_ user-facing change?
This PR introduces two options that can be used with batch-based Spark
file data sources. A documentation update was made to reflect an
example and usage of the new data source options.
**Example Usages**
_Load all CSV files modified after date:_
`spark.read.format("csv").option("modifiedAfter","2020-06-15T05:00:00").load()`
_Load all CSV files modified before date:_
`spark.read.format("csv").option("modifiedBefore","2020-06-15T05:00:00").load()`
_Load all CSV files modified between two dates:_
`spark.read.format("csv").option("modifiedAfter","2019-01-15T05:00:00").option("modifiedBefore","2020-06-15T05:00:00").load()`
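As a fuller sketch, the new options also compose with the existing `pathGlobFilter` option. Here, `spark` is assumed to be an existing SparkSession, and the directory path and timestamps are illustrative only:
```
// Sketch only: load CSV files from an illustrative directory, keeping
// files modified between two timestamps and matching a glob pattern.
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("modifiedAfter", "2019-01-15T05:00:00")   // keep files modified after this time
  .option("modifiedBefore", "2020-06-15T05:00:00")  // ...and modified before this time
  .option("pathGlobFilter", "*.csv")                // existing path filter, still applied
  .load("/data/events")                             // hypothetical input directory
df.show()
```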
### How was this patch tested?
A handful of unit tests were added to support the positive, negative,
and edge case code paths.
It's also live in a handful of our Databricks dev environments.  (quoted
from cchighman)
Closes #30411 from HeartSaVioR/SPARK-31962.
Lead-authored-by: CC Highman <christopher.highman@microsoft.com>
Co-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
(commit: d338af3)
The file was modified docs/sql-data-sources-generic-options.md (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PathFilterSuite.scala
The file was modified examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java (diff)
The file was modified examples/src/main/python/sql/datasource.py (diff)
The file was modified examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamOptions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PathFilterStrategySuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified examples/src/main/r/RSparkSQLExample.R (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/pathFilters.scala
Commit 6d625ccd5b5a76a149e2070df31984610629a295 by dongjoon
[SPARK-33469][SQL] Add current_timezone function
### What changes were proposed in this pull request?
Add a `CurrentTimeZone` function and replace its value on the
`Optimizer` side.
### Why are the changes needed?
Let users get the current timezone easily. Then a user can call
```
SELECT current_timezone()
```
Presto: https://prestodb.io/docs/current/functions/datetime.html
SQL Server: https://docs.microsoft.com/en-us/sql/t-sql/functions/current-timezone-transact-sql?view=sql-server-ver15
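A minimal usage sketch, assuming an existing SparkSession `spark` (the time zone value is illustrative):
```
// Sketch only: current_timezone() reflects the session time zone setting.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
spark.sql("SELECT current_timezone()").show()  // expected to show America/Los_Angeles
```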
### Does this PR introduce _any_ user-facing change?
Yes, a new function.
### How was this patch tested?
Add test.
Closes #30400 from ulysses-you/SPARK-33469.
Lead-authored-by: ulysses <youxiduo@weidian.com>
Co-authored-by: ulysses-you <youxiduo@weidian.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 6d625cc)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ComputeCurrentTimeSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-functions/sql-expression-schema.md (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/expressions/ExpressionInfoSuite.scala (diff)
Commit df4a1c2256b71c9a1bd2006819135f56c99a2f21 by dongjoon
[SPARK-33512][BUILD] Upgrade test libraries
### What changes were proposed in this pull request?
This PR aims to update the test libraries.
- ScalaTest: 3.2.0 -> 3.2.3
- JUnit: 4.12 -> 4.13.1
- Mockito: 3.1.0 -> 3.4.6
- JMock: 2.8.4 -> 2.12.0
- maven-surefire-plugin: 3.0.0-M3 -> 3.0.0-M5
- scala-maven-plugin: 4.3.0 -> 4.4.0
### Why are the changes needed?
This will make the test frameworks up-to-date for Apache Spark 3.1.0.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
Closes #30456 from dongjoon-hyun/SPARK-33512.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: df4a1c2)
The file was modified pom.xml (diff)
Commit a45923852342ce3f9454743a71740b09e6efe859 by gurwls223
[MINOR][INFRA] Suppress warning in check-license
### What changes were proposed in this pull request?
This PR aims to suppress the `File exists` warning in check-license.
### Why are the changes needed?
**BEFORE**
```
% dev/check-license
Attempting to fetch rat
RAT checks passed.
% dev/check-license
mkdir: target: File exists
RAT checks passed.
```
**AFTER**
```
% dev/check-license
Attempting to fetch rat
RAT checks passed.
% dev/check-license
RAT checks passed.
```
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually ran dev/check-license twice.
Closes #30460 from williamhyun/checklicense.
Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: a459238)
The file was modified dev/check-license (diff)
Commit aa78c05edc9cb910cca9fb14f7670559fe00c62d by gurwls223
[SPARK-33427][SQL][FOLLOWUP] Put key and value into IdentityHashMap
sequentially
### What changes were proposed in this pull request?
This follow-up fixes an issue when inserting key/value pairs into
`IdentityHashMap` in `SubExprEvaluationRuntime`.
### Why are the changes needed?
The last commit to #30341 followed a review comment to use
`IdentityHashMap`. Because we leverage `IdentityHashMap` to compare
keys by reference, we should not convert expression pairs to a Scala
map before inserting them. A Scala map compares keys by equality, so we
would lose keys that are equal but have different references.
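A minimal Scala sketch of the difference; the `Expr` case class below is purely illustrative, not the actual Spark expression type:
```
import java.util.IdentityHashMap

// Illustrative stand-in for an expression: equal by value, distinct by reference.
case class Expr(name: String)

val e1 = Expr("a + b")
val e2 = Expr("a + b")  // equal to e1, but a different instance

// Going through a Scala Map collapses equal keys: only one entry survives.
val viaScalaMap = Map(e1 -> 1, e2 -> 2)
println(viaScalaMap.size)  // 1

// Putting the pairs sequentially into an IdentityHashMap keeps both,
// because keys are compared by reference identity.
val identityMap = new IdentityHashMap[Expr, Int]()
identityMap.put(e1, 1)
identityMap.put(e2, 2)
println(identityMap.size)  // 2
```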
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Ran benchmarks to verify.
Closes #30459 from viirya/SPARK-33427-map.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: aa78c05)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntimeSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala (diff)