Changes

Summary

  1. [SPARK-33426][SQL][TESTS] Unify Hive SHOW TABLES tests (commit: 539c2de) (details)
  2. [SPARK-33439][INFRA] Use SERIAL_SBT_TESTS=1 for SQL modules (commit: a70a2b0) (details)
  3. [SPARK-33433][SQL] Change Aggregate max rows to 1 if grouping is empty (commit: 82a21d2) (details)
  4. [SPARK-33419][SQL] Unexpected behavior when using SET commands before a query in SparkSession.sql (commit: cdd8e51) (details)
  5. [SPARK-33166][DOC] Provide Search Function in Spark docs site (commit: f80fe21) (details)
  6. Revert "[SPARK-33139][SQL] protect setActionSession and clearActiveSession" (commit: 234711a) (details)
  7. [SPARK-33288][SPARK-32661][K8S] Stage level scheduling support for Kubernetes (commit: acfd846) (details)
  8. [SPARK-32916][SHUFFLE][TEST-MAVEN][TEST-HADOOP2.7] Remove the newly added YarnShuffleServiceSuite.java (commit: 423ba5a) (details)
  9. [SPARK-33337][SQL][FOLLOWUP] Prevent possible flakyness in SubexpressionEliminationSuite (commit: 0046222) (details)
Commit 539c2deb896d0adb9bbd63fc1ef48a31050a6538 by wenchen
[SPARK-33426][SQL][TESTS] Unify Hive SHOW TABLES tests
### What changes were proposed in this pull request?
1. Create a separate test suite `org.apache.spark.sql.hive.execution.command.ShowTablesSuite`.
2. Re-use the V1 SHOW TABLES tests added by https://github.com/apache/spark/pull/30287 in the Hive test suites.
3. Add a new test case for the pattern `'table_name_1*|table_name_2*'` in the common test suite (a minimal sketch of the shared-suite pattern is shown below).
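For illustration, a minimal sketch of the shared-suite pattern this unification follows; the concrete class names other than `ShowTablesSuiteBase` are hypothetical, and the real suites extend Spark's SQL test infrastructure rather than plain ScalaTest:
```
import org.scalatest.funsuite.AnyFunSuite

// Common SHOW TABLES tests live once in a base trait; each catalog-specific
// suite (v1, v2, Hive) only overrides what differs, e.g. the catalog name.
trait ShowTablesSuiteBase extends AnyFunSuite {
  def catalog: String
  test(s"pattern 'table_name_1*|table_name_2*' works in catalog $catalog") {
    // the real test would run SHOW TABLES ... LIKE 'table_name_1*|table_name_2*' against `catalog`
    assert(catalog.nonEmpty)
  }
}

// Hypothetical concrete suites reusing the shared tests:
class V1ShowTablesSuite extends ShowTablesSuiteBase { override def catalog: String = "spark_catalog" }
class HiveShowTablesSuite extends ShowTablesSuiteBase { override def catalog: String = "hive_catalog" }
```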
### Why are the changes needed?
To test V1 + common SHOW TABLES tests in Hive.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
By running v1/v2 and Hive v1 `ShowTablesSuite`:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *ShowTablesSuite"
```
Closes #30340 from MaxGekk/show-tables-hive-tests.
Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 539c2de)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesSuiteBase.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTablesSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala (diff)
The file was removed sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowTablesSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/ShowTablesSuite.scala (diff)
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowTablesSuite.scala
Commit a70a2b02ce7d18947778d37c8fffb3f1b1b5b154 by dongjoon
[SPARK-33439][INFRA] Use SERIAL_SBT_TESTS=1 for SQL modules
### What changes were proposed in this pull request?
This PR aims to decrease the parallelism of the `SQL` module, like the `Hive` module.
### Why are the changes needed?
The GitHub Action `sql - slow tests` job has become flaky.
- https://github.com/apache/spark/runs/1393670291
- https://github.com/apache/spark/runs/1393088031
### Does this PR introduce _any_ user-facing change?
No. This is a dev-only change. Although this will increase the running time, it's better than flakiness.
### How was this patch tested?
Pass the GitHub Action stably.
Closes #30365 from dongjoon-hyun/SPARK-33439.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: a70a2b0)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 82a21d2a3e3d4eafa43802b3034907a1f2725396 by yamamuro
[SPARK-33433][SQL] Change Aggregate max rows to 1 if grouping is empty
### What changes were proposed in this pull request?
Change `Aggregate` max rows to 1 if grouping is empty.
### Why are the changes needed?
If `Aggregate` grouping is empty, the result is always one row.
Then we don't need to push down the limit in `LimitPushDown` for a case such as
```
select count(*) from t1 union select count(*) from t2 limit 1
```
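As a quick illustration, one can inspect the optimized plan for such a query in spark-shell; because each global `count(*)` aggregate can produce at most one row, pushing the limit below the union adds nothing. This is a hedged sketch that assumes tables `t1` and `t2` already exist:
```
// Illustrative spark-shell check; `t1` and `t2` are assumed to exist in the current catalog.
spark.sql("select count(*) from t1 union select count(*) from t2 limit 1").explain(true)
```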
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Add test.
Closes #30356 from ulysses-you/SPARK-33433.
Authored-by: ulysses <youxiduo@weidian.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 82a21d2)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushdownSuite.scala (diff)
Commit cdd8e51742a59ab11ffd45b8f4e893128c43f8d7 by wenchen
[SPARK-33419][SQL] Unexpected behavior when using SET commands before a
query in SparkSession.sql
### What changes were proposed in this pull request?
SparkSession.sql converts a string value to a DataFrame, and the string value should be a single SQL statement, with or without one or more trailing semicolons, e.g.
```sql
scala> spark.sql(" select 2").show
+---+
|  2|
+---+
|  2|
+---+

scala> spark.sql(" select 2;").show
+---+
|  2|
+---+
|  2|
+---+

scala> spark.sql(" select 2;;;;").show
+---+
|  2|
+---+
|  2|
+---+
```
If we put 2 or more statements in, it fails in the parser as expected, e.g.
```sql
scala> spark.sql(" select 2; select 1;").show
org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 11)

== SQL ==
select 2; select 1;
-----------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
```
As a very common user scenario, users may want to change some settings before they execute their queries. They may pass a string value like `set spark.sql.abc=2; select 1;` into this API, which creates a confusing gap between the actual effect and the user's expectation. The user may want the query to be executed with `spark.sql.abc=2`, but Spark actually treats the whole of `2; select 1;` as the value of the property `spark.sql.abc`, e.g.
```
scala> spark.sql("set spark.sql.abc=2; select 1;").show
+-------------+------------+
|          key|       value|
+-------------+------------+
|spark.sql.abc|2; select 1;|
+-------------+------------+
```
What's more, the SET command swallows everything after it, which makes the behavior unstable from version to version, e.g.
#### 3.1
```sql
scala> spark.sql("set;").show
org.apache.spark.sql.catalyst.parser.ParseException: Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, please use quotes, e.g., SET `ke y`=value.(line 1, pos 0)

== SQL ==
set;
^^^
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.$anonfun$visitSetConfiguration$1(SparkSqlParser.scala:83)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:72)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:58)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$SetConfigurationContext.accept(SqlBaseParser.java:2161)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitSingleStatement$1(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$1(ParseDriver.scala:82)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:113)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
scala> spark.sql("set a;").show
org.apache.spark.sql.catalyst.parser.ParseException: Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, please use quotes, e.g., SET `ke y`=value.(line 1, pos 0)

== SQL ==
set a;
^^^
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.$anonfun$visitSetConfiguration$1(SparkSqlParser.scala:83)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:72)
  at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitSetConfiguration(SparkSqlParser.scala:58)
  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$SetConfigurationContext.accept(SqlBaseParser.java:2161)
  at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:18)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitSingleStatement$1(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:113)
  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:77)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.$anonfun$parsePlan$1(ParseDriver.scala:82)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:113)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided
```
#### 2.4
```sql
scala> spark.sql("set;").show
+---+-----------+
|key|      value|
+---+-----------+
|  ;|<undefined>|
+---+-----------+
scala> spark.sql("set a;").show
+---+-----------+
|key|      value|
+---+-----------+
| a;|<undefined>|
+---+-----------+
```
In this PR:
1. Make `set spark.sql.abc=2; select 1;` in `SparkSession.sql` fail directly; users should call `.sql` for each statement separately.
2. Make the semicolon the separator of statements; if users want to use it as part of a property value, they shall quote it too (see the sketch below).
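A hedged sketch of how a caller would adapt to this behavior in spark-shell; the commented-out line and the exact quoting syntax for values are assumptions based on the description above, not verified output:
```
spark.sql("set spark.sql.abc=2")               // one statement per .sql() call
spark.sql("select 1;;").show()                 // trailing semicolons are trimmed
// spark.sql("set spark.sql.abc=2; select 1;") // would now fail: ';' separates statements
```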
### Why are the changes needed?
1. Disambiguate the behavior of `SparkSession.sql`.
2. Make the semicolon work the same way with `SET` as with other statements.
### Does this PR introduce _any_ user-facing change?
Yes. The semicolon now works as a statement separator: it is trimmed if it appears at the end of the statement, and the statement fails if it appears in the middle. You need to use quotes if you want it to be part of the property value.
### How was this patch tested?
New tests.
Closes #30332 from yaooqinn/SPARK-33419.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: cdd8e51)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (diff)
Commit f80fe213bd4c5e065d5723816c42302a532be75c by gengliang.wang
[SPARK-33166][DOC] Provide Search Function in Spark docs site
### What changes were proposed in this pull request?
In the last few releases, our Spark documentation at https://spark.apache.org/docs/latest/ has become richer. It would be nice to provide a search function so our users can find content faster.
[DocSearch](https://docsearch.algolia.com/) is entirely free and automated. This PR uses it to provide the search function.
The screenshot is shown below:
![overview](https://user-images.githubusercontent.com/8486025/98756802-30d82a80-23c3-11eb-9ca2-73bb20fb54c4.png)
### Why are the changes needed?
Let users of the Spark documentation find the information they need effectively.
### Does this PR introduce _any_ user-facing change?
Yes
### How was this patch tested?
Built the docs on my machine and checked them in the browser.
Closes #30292 from beliefer/SPARK-33166.
Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(commit: f80fe21)
The file was added docs/css/docsearch.css
The file was modified docs/_layouts/global.html (diff)
Commit 234711a328dd4cd888d6a73145984987eabc483b by wenchen
Revert "[SPARK-33139][SQL] protect setActionSession and
clearActiveSession"
### What changes were proposed in this pull request?
In [SPARK-33139] we marked `setActiveSession` and `clearActiveSession` as deprecated APIs. It turns out they are widely used, and after discussion, even without that change the unified view feature should work; it is only a risk if users really abuse these two APIs. So reverting the PR is needed.
[SPARK-33139] has two commits, including a follow-up. Revert them both.
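For context, a minimal spark-shell-style sketch of the two APIs that stay non-deprecated after this revert (`spark` is the usual shell binding):
```
import org.apache.spark.sql.SparkSession

SparkSession.setActiveSession(spark)   // remains a supported, non-deprecated API after this revert
SparkSession.getActiveSession          // Option[SparkSession]
SparkSession.clearActiveSession()
```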
### Why are the changes needed?
Revert.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing UT.
Closes #30367 from leanken/leanken-revert-SPARK-33139.
Authored-by: xuewei.linxuewei <xuewei.linxuewei@alibaba-inc.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 234711a)
The file was modified mllib/src/test/scala/org/apache/spark/mllib/util/MLlibTestSparkContext.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DeprecatedAPISuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreCoordinatorSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala (diff)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SparkSessionBuilderSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManagerSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SessionStateSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/SharedSparkSession.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSharedStateSuite.scala (diff)
The file was modified python/pyspark/sql/session.py (diff)
The file was modified mllib/src/test/java/org/apache/spark/SharedSparkSession.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/V1WriteFallbackSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/TestSQLContext.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkOperation.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/LocalSparkSession.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
Commit acfd8467534fbf58c12e9f2d993b7d135fb8d32b by tgraves
[SPARK-33288][SPARK-32661][K8S] Stage level scheduling support for
Kubernetes
### What changes were proposed in this pull request?
This adds support for stage level scheduling to Kubernetes. Kubernetes can support dynamic allocation via the shuffle tracking option, which means we can support stage level scheduling by getting new executors. The main changes are having the K8s cluster manager pass the resource profile id to the executors, and making the ExecutorPodsAllocator request executors based on the individual resource profiles. I tried to keep code changes here to a minimum. I specifically chose to leave the ExecutorPodsSnapshot the way it was and construct the resource profile to pod states on the fly, with a fast path when not using other resource profiles, to keep the impact to a minimum. As a result, the main change required is just wrapping the allocation logic in a for loop over each profile (sketched below). The other main change is that in the basic feature step we have to look at the resources in the ResourceProfile to request pods with the correct resources. Much of the other logic, like the executor lifecycle manager, doesn't need to be resource profile aware.
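A rough, self-contained sketch of that per-profile allocation loop; the names and fields are illustrative, not the actual `ExecutorPodsAllocator` code:
```
// Illustrative only: allocation targets are tracked per resource profile, and the
// existing request logic is wrapped in a loop over each profile.
case class ProfileSpec(id: Int, cores: Int, memoryMiB: Long)

def requestExecutors(targetsPerProfile: Map[ProfileSpec, Int]): Unit = {
  for ((profile, target) <- targetsPerProfile) {
    // in the real allocator this builds executor pod specs sized from `profile`
    println(s"profile ${profile.id}: requesting $target pods " +
      s"(${profile.cores} cores, ${profile.memoryMiB} MiB each)")
  }
}

requestExecutors(Map(ProfileSpec(0, 2, 4096) -> 3, ProfileSpec(1, 8, 16384) -> 1))
```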
This also adds support for [SPARK-32661] (Spark executors on K8S should request extra memory for off-heap allocations), because the stage level scheduling API has support for this and it made sense to be consistent with YARN. This was started with PR https://github.com/apache/spark/pull/29477 but never updated, so I just did it here. To do this I moved around a few functions that are now used by both YARN and Kubernetes, so you will see some changes in Utils.
### Why are the changes needed?
Add the feature to Kubernetes based on customer feedback.
### Does this PR introduce _any_ user-facing change?
Yes, the feature now works with K8s, but there are no underlying API changes.
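For reference, a hedged example of the existing stage level scheduling API that can now be used on K8s; the RDD and the requested sizes are illustrative, and dynamic allocation with shuffle tracking is required as noted above:
```
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder}

// Ask for differently sized executors for a specific stage.
val execReqs = new ExecutorResourceRequests().cores(4).memory("6g")
val profile = new ResourceProfileBuilder().require(execReqs).build()

val data = spark.sparkContext.parallelize(1 to 100)
data.withResources(profile).map(_ * 2).collect()
```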
### How was this patch tested?
Tested manually on a Kubernetes cluster and with unit tests.
Closes #30204 from tgravescs/stagek8sOrigSnapshotsRebase.
Lead-authored-by: Thomas Graves <tgraves@apache.org>
Co-authored-by: Thomas Graves <tgraves@nvidia.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
(commit: acfd846)
The file was modified core/src/test/scala/org/apache/spark/util/UtilsSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilderSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/resource/ResourceProfileManagerSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala (diff)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtilSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocatorSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBuilder.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/resource/ResourceProfile.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/resource/ResourceProfileSuite.scala (diff)
The file was modified docs/running-on-kubernetes.md (diff)
The file was modified docs/configuration.md (diff)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnAllocatorSuite.scala (diff)
The file was modified resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/Utils.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorLifecycleTestUtils.scala (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala (diff)
The file was modified docs/running-on-yarn.md (diff)
Commit 423ba5a16038c1cb28d0973e18518645e69d5ff1 by tgraves
[SPARK-32916][SHUFFLE][TEST-MAVEN][TEST-HADOOP2.7] Remove the newly
added YarnShuffleServiceSuite.java
### What changes were proposed in this pull request?
This is a follow-up fix for the failing tests in `YarnShuffleServiceSuite.java`. This Java class was introduced in https://github.com/apache/spark/pull/30062. The tests in the class fail when run with the hadoop-2.7 profile:
```
[ERROR] testCreateDefaultMergedShuffleFileManagerInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0.627 s  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
  at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateDefaultMergedShuffleFileManagerInstance(YarnShuffleServiceSuite.java:37)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
  at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateDefaultMergedShuffleFileManagerInstance(YarnShuffleServiceSuite.java:37)
[ERROR] testCreateRemoteBlockPushResolverInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.network.yarn.YarnShuffleService
  at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testCreateRemoteBlockPushResolverInstance(YarnShuffleServiceSuite.java:47)
[ERROR] testInvalidClassNameOfMergeManagerWillUseNoOpInstance(org.apache.spark.network.yarn.YarnShuffleServiceSuite)  Time elapsed: 0.001 s  <<< ERROR!
java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.network.yarn.YarnShuffleService
  at org.apache.spark.network.yarn.YarnShuffleServiceSuite.testInvalidClassNameOfMergeManagerWillUseNoOpInstance(YarnShuffleServiceSuite.java:57)
```
A test suite for `YarnShuffleService` already existed here:
`resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala`
I missed it when I created
`common/network-yarn/src/test/java/org/apache/spark/network/yarn/YarnShuffleServiceSuite.java`.
Moving all the new tests to the existing test suite fixes the failures with hadoop-2.7, even though it is not clear why they happened.
### Why are the changes needed?
The newly added tests are failing when run with the hadoop-2.7 profile.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Ran the unit tests with the default profile as well as the hadoop-2.7 profile.
`build/mvn test -Dtest=none -DwildcardSuites=org.apache.spark.network.yarn.YarnShuffleServiceSuite -Phadoop-2.7 -Pyarn`
```
Run starting. Expected test count is: 11
YarnShuffleServiceSuite:
- executor state kept across NM restart
- removed applications should not be in registered executor file
- shuffle service should be robust to corrupt registered executor file
- get correct recovery path
- moving recovery file from NM local dir to recovery path
- service throws error if cannot start
- recovery db should not be created if NM recovery is not enabled
- SPARK-31646: metrics should be registered into Node Manager's metrics system
- create default merged shuffle file manager instance
- create remote block push resolver instance
- invalid class name of merge manager will use noop instance
Run completed in 2 seconds, 572 milliseconds.
Total number of tests run: 11
Suites: completed 2, aborted 0
Tests: succeeded 11, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```
Closes #30349 from otterc/SPARK-32916-followup.
Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
(commit: 423ba5a)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala (diff)
The file was removed common/network-yarn/src/test/java/org/apache/spark/network/yarn/YarnShuffleServiceSuite.java
Commit 0046222a758fda2aead4a77bd19b19b913276693 by dongjoon
[SPARK-33337][SQL][FOLLOWUP] Prevent possible flakyness in
SubexpressionEliminationSuite
### What changes were proposed in this pull request?
This is a simple follow-up to prevent test flakiness in SubexpressionEliminationSuite. If `getAllEquivalentExprs` returns more than one sequence, then due to the underlying HashMap we should use `contains` instead of assuming the order of the results.
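A hedged sketch of the assertion pattern being described; the expression strings are illustrative values, not the suite's actual expressions:
```
// When groups come out of a HashMap, their order is unspecified, so assert
// membership rather than position.
val equivalentGroups: Seq[Seq[String]] = Seq(Seq("a + b", "a + b"), Seq("c * d", "c * d"))

// Fragile: assumes a particular ordering of the returned sequences.
// assert(equivalentGroups.head == Seq("c * d", "c * d"))

// Robust: only checks that the expected group is present.
assert(equivalentGroups.contains(Seq("c * d", "c * d")))
```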
### Why are the changes needed?
Prevent test flakiness in SubexpressionEliminationSuite.
### Does this PR introduce _any_ user-facing change?
No, dev only.
### How was this patch tested?
Unit test.
Closes #30371 from viirya/SPARK-33337-followup.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 0046222)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala (diff)