Changes

Summary

  1. [SPARK-21040][CORE][FOLLOW-UP] Only calculate executorKillTime when speculation is enabled (commit: 70964e7) (details)
  2. [SPARK-32645][INFRA] Upload unit-tests.log as an artifact (commit: bfd8c34) (details)
  3. [SPARK-32018][FOLLOWUP][DOC] Add migration guide for decimal value overflow in sum aggregation (commit: 1b39215) (details)
  4. [SPARK-32652][SQL] ObjectSerializerPruning fails for RowEncoder (commit: f33b64a) (details)
  5. [MINOR][DOCS] Add KMeansSummary and InheritableThread to documentation (commit: 891c5e6) (details)
  6. [SPARK-32550][SQL] Make SpecificInternalRow constructors faster by using while loops instead of maps (commit: e15ae60) (details)
  7. [SPARK-32651][CORE] Decommission switch configuration should have the highest hierarchy (commit: 3092527) (details)
  8. [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value (commit: 03e2de9) (details)
  9. [SPARK-32600][CORE] Unify task name in some logs between driver and executor (commit: a1a32d2) (details)
  10. [SPARK-32624][SQL] Use getCanonicalName to fix byte[] compile issue (commit: 409fea3) (details)
  11. [SPARK-32621][SQL] 'path' option can cause issues while inferring schema in CSV/JSON datasources (commit: 3d1dce7) (details)
  12. [SPARK-28863][SQL] Introduce AlreadyPlanned to prevent reanalysis of V1FallbackWriters (commit: 278d0dd) (details)
  13. [SPARK-32655][K8S] Support appId/execId placeholder in K8s SPARK_EXECUTOR_DIRS (commit: 3722ed4) (details)
Commit 70964e741a715945052df4122b5e66fc39431b1a by wenchen
[SPARK-21040][CORE][FOLLOW-UP] Only calculate executorKillTime when
speculation is enabled
### What changes were proposed in this pull request?
Only calculate `executorKillTime` in
`TaskSetManager.executorDecommission()` when speculation is enabled.
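A minimal, self-contained sketch of the idea (the names `speculationEnabled`, `killInterval`, and `TaskInfo` below are illustrative, not the actual `TaskSetManager` fields):
```scala
// Hedged sketch: gate the kill-time bookkeeping behind the speculation flag so
// that no clock reads or map updates happen when speculation is disabled.
object DecommissionSketch {
  final case class TaskInfo(taskId: Long, executorId: String)

  val speculationEnabled: Boolean = false        // would come from spark.speculation
  val killInterval: Long = 60 * 1000L            // illustrative decommission kill interval
  var taskToKillTime: Map[Long, Long] = Map.empty

  def executorDecommission(execId: String, runningTasks: Seq[TaskInfo]): Unit = {
    if (speculationEnabled) {                    // the guard this follow-up adds
      val killTime = System.currentTimeMillis() + killInterval
      taskToKillTime ++= runningTasks.collect {
        case t if t.executorId == execId => t.taskId -> killTime
      }
    }
  }
}
```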
### Why are the changes needed?
Avoid unnecessary operations to save time/memory.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Passed existing tests.
Closes #29464 from Ngone51/followup-SPARK-21040.
Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 70964e7)
The file was modified core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala (diff)
Commit bfd8c341545c1d591763d7cee6cc52cdf5672633 by gurwls223
[SPARK-32645][INFRA] Upload unit-tests.log as an artifact
### What changes were proposed in this pull request?
This PR proposes to upload `target/unit-tests.log` as an artifact so that it can be downloaded, as shown here:
![Screen Shot 2020-08-18 at 2 23 18 PM](https://user-images.githubusercontent.com/6477701/90474095-789e3b80-e15f-11ea-87f8-e7da3df3c03e.png)
### Why are the changes needed?
Jenkins has this feature. It is best to have the same dev functionality here as well. Also note that this was pointed out in
https://github.com/apache/spark/pull/29225#discussion_r471485011.
### Does this PR introduce _any_ user-facing change?
No, dev-only
### How was this patch tested?
https://github.com/apache/spark/actions/runs/213000777 should
demonstrate it
Closes #29454 from HyukjinKwon/SPARK-32645.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: bfd8c34)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit 1b39215a6590835860697d8649b3041967b74784 by gengliang.wang
[SPARK-32018][FOLLOWUP][DOC] Add migration guide for decimal value
overflow in sum aggregation
### What changes were proposed in this pull request?
Add migration guide for decimal value overflow behavior in sum
aggregation, introduced in https://github.com/apache/spark/pull/29026
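As a hedged illustration of the scenario the new guide entry covers (assuming an existing `SparkSession` named `spark`; the exact 3.0 vs. 3.1 semantics are what the guide itself documents):
```scala
// Summing decimals whose total overflows the result precision; how this is
// handled on overflow is the behavior change described in the migration guide
// and depends on spark.sql.ansi.enabled.
val overflowSum = spark.sql(
  """SELECT sum(v) FROM VALUES
    |  (CAST('99999999999999999999999999999999999999' AS DECIMAL(38, 0))),
    |  (CAST('99999999999999999999999999999999999999' AS DECIMAL(38, 0))) AS t(v)
    |""".stripMargin)
overflowSum.show()
```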
### Why are the changes needed?
Add migration guide for the behavior changes from 3.0 to 3.1. See also:
https://github.com/apache/spark/pull/29450#issuecomment-675222779
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Build docs and preview:
![image](https://user-images.githubusercontent.com/1097932/90589256-8b7e3380-e192-11ea-8ff1-05a447c20722.png)
Closes #29458 from gengliangwang/migrationGuideDecimalOverflow.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
(commit: 1b39215)
The file was modified docs/sql-migration-guide.md (diff)
Commit f33b64a6567f93e5515521b8b1e7761e16f667d0 by gurwls223
[SPARK-32652][SQL] ObjectSerializerPruning fails for RowEncoder
### What changes were proposed in this pull request?
Update `ObjectSerializerPruning.alignNullTypeInIf`, to consider the
isNull check generated in `RowEncoder`, which is `Invoke(inputObject,
"isNullAt", BooleanType, Literal(index) :: Nil)`.
### Why are the changes needed?
Query fails if we don't fix this bug, due to type mismatch in `If`.
### Does this PR introduce _any_ user-facing change?
Yes, the failed query can run after this fix.
### How was this patch tested?
new tests
Closes #29467 from cloud-fan/bug.
Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: f33b64a)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ObjectSerializerPruningSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/objects.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetOptimizationSuite.scala (diff)
Commit 891c5e661a77c713d06b102b7aba3be0667add33 by gurwls223
[MINOR][DOCS] Add KMeansSummary and InheritableThread to documentation
### What changes were proposed in this pull request?
The class `KMeansSummary` in pyspark is not included in
`clustering.py`'s `__all__` declaration. It isn't included in the docs
as a result.
`InheritableThread` and `KMeansSummary` should be added to the corresponding RST
files for documentation.
### Why are the changes needed?
It seems like an oversight not to include this, as all similar "summary"
classes are included.
`InheritableThread` should also be documented.
### Does this PR introduce _any_ user-facing change?
I don't believe there are functional changes. It should make this public
class appear in docs.
### How was this patch tested?
Existing tests / N/A.
Closes #29470 from srowen/KMeansSummary.
Lead-authored-by: Sean Owen <srowen@gmail.com> Co-authored-by:
HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: 891c5e6)
The file was modified python/pyspark/ml/clustering.py (diff)
The file was modified python/docs/source/reference/pyspark.rst (diff)
The file was modified python/docs/source/reference/pyspark.ml.rst (diff)
Commit e15ae60a530c9866e3a166cec66cffdcaa664ffb by gurwls223
[SPARK-32550][SQL] Make SpecificInternalRow constructors faster by using
while loops instead of maps
### What changes were proposed in this pull request?
Change maps in two constructors of SpecificInternalRow to while loops.
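A hedged, self-contained sketch of the kind of rewrite involved (a made-up example, not the actual SpecificInternalRow fields):
```scala
object WhileLoopSketch {
  // Before-style: .map allocates a builder plus a closure invocation per element.
  def withMap(sizes: Seq[Int]): Array[Array[Byte]] =
    sizes.map(n => new Array[Byte](n)).toArray

  // After-style: a pre-sized array filled by a while loop avoids that overhead,
  // which matters on a hot construction path like SpecificInternalRow's.
  def withWhile(sizes: Seq[Int]): Array[Array[Byte]] = {
    val out = new Array[Array[Byte]](sizes.length)
    var i = 0
    while (i < out.length) {
      out(i) = new Array[Byte](sizes(i))
      i += 1
    }
    out
  }
}
```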
### Why are the changes needed?
This was originally noticed with
https://github.com/apache/spark/pull/29353 and
https://github.com/apache/spark/pull/29354 and will have impacts on
performance of reading ORC and Avro files. Ran AvroReadBenchmarks with
the new cases of nested and array'd structs in
https://github.com/apache/spark/pull/29352. Haven't run benchmarks for
ORC but can do that if needed.
**Before:**
```
Nested Struct Scan:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
Nested Struct                                     74674          75319         912          0.0      142429.1       1.0X

Array of Struct Scan:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
Array of Structs                                  34193          34339         206          0.0       65217.9       1.0X
```
**After:**
```
Nested Struct Scan:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
Nested Struct                                     48451          48619         237          0.0       92413.2       1.0X

Array of Struct Scan:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
Array of Structs                                  18518          18683         234          0.0       35319.6       1.0X
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Ran AvroReadBenchmarks with the new cases of nested and array'd structs in
https://github.com/apache/spark/pull/29352.
Closes #29366 from msamirkhan/spark-32550.
Lead-authored-by: Samir Khan <muhammad.samir.khan@gmail.com>
Co-authored-by: skhan04 <samirkhan@verizonmedia.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: e15ae60)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificInternalRow.scala (diff)
Commit 3092527f7557b64ff9a5bedadfac8bb2f189a9b4 by wenchen
[SPARK-32651][CORE] Decommission switch configuration should have the
highest hierarchy
### What changes were proposed in this pull request?
Rename `spark.worker.decommission.enabled` to
`spark.decommission.enabled` and move it from
`org.apache.spark.internal.config.Worker` to
`org.apache.spark.internal.config.package`.
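A hedged usage sketch showing the renamed key from the application side (the default value here is illustrative):
```scala
import org.apache.spark.SparkConf

// Read the cluster-manager-independent switch via the public SparkConf API.
val conf = new SparkConf()
val decommissionEnabled = conf.getBoolean("spark.decommission.enabled", false)
```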
### Why are the changes needed?
Decommissioning is already supported in Standalone and K8s, and may be
supported in YARN (https://github.com/apache/spark/pull/27636) in the
future. Therefore, the switch configuration should sit at the highest
hierarchy rather than belong to Standalone's Worker. In other words, it
should be independent of the cluster manager.
### Does this PR introduce _any_ user-facing change?
No, as the decommission feature hasn't been released.
### How was this patch tested?
Passed existing tests.
Closes #29466 from Ngone51/fix-decom-conf.
Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 3092527)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/client/AppClientSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/DecommissionWorkerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionExtendedSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/Worker.scala (diff)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManagerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala (diff)
Commit 03e2de99ab05bccd2f3f83a78073ca4fbf04caa6 by wenchen
[SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should
format value
### What changes were proposed in this pull request?
For SQL
```
SELECT TRANSFORM(a, b, c)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
NULL DEFINED AS 'null'
USING 'cat' AS (a, b, c)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
NULL DEFINED AS 'NULL'
FROM testData
```
the correct TOK_TABLEROWFORMATFIELD should be `,` but is actually `','`, and
TOK_TABLEROWFORMATLINES should be `\n` but is actually `'\n'`.
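A hedged, stand-alone illustration of the formatting problem (the helper below is made up, not the actual `ParserUtils` code):
```scala
// The delimiter arrives as the raw SQL token "','"; the surrounding quotes
// must be stripped (and escapes such as "\\n" resolved) before the value is
// handed to the script transformation.
def stripQuotes(token: String): String =
  if (token.length >= 2 && token.head == '\'' && token.last == '\'')
    token.substring(1, token.length - 1)
  else token

assert(stripQuotes("','") == ",")
assert(stripQuotes("'\\n'") == "\\n") // still needs escape handling to become "\n"
```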
### Why are the changes needed?
Fix string value format.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added UT
Closes #29428 from AngersZhuuuu/SPARK-32608.
Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 03e2de9)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (diff)
Commit a1a32d2eb5dbb879ec190b0af30f2296768b317b by wenchen
[SPARK-32600][CORE] Unify task name in some logs between driver and
executor
### What changes were proposed in this pull request?
This PR replaces some arbitrary task names in logs with the widely used
task name (e.g. "task 0.0 in stage 1.0 (TID 1)") among driver and
executor. This will change the task name in `TaskDescription` by
appending TID.
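A hedged sketch of the unified name format referenced above (the parameter names are illustrative, not the actual `TaskDescription` fields):
```scala
// Builds the widely used task name, e.g. taskName(1, 0, 0, 1, 0) returns
// "task 0.0 in stage 1.0 (TID 1)".
def taskName(tid: Long, index: Int, attempt: Int, stageId: Int, stageAttempt: Int): String =
  s"task $index.$attempt in stage $stageId.$stageAttempt (TID $tid)"
```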
### Why are the changes needed?
Some logs still use only the TID (a.k.a. `taskId`) as the task name, e.g.,
https://github.com/apache/spark/blob/7f275ee5978e00ac514e25f5ef1d4e3331f8031b/core/src/main/scala/org/apache/spark/executor/Executor.scala#L786
https://github.com/apache/spark/blob/7f275ee5978e00ac514e25f5ef1d4e3331f8031b/core/src/main/scala/org/apache/spark/executor/Executor.scala#L632-L635
And the task thread name also only has the `taskId`:
https://github.com/apache/spark/blob/7f275ee5978e00ac514e25f5ef1d4e3331f8031b/core/src/main/scala/org/apache/spark/executor/Executor.scala#L325
As mentioned in https://github.com/apache/spark/pull/1259, TID itself
does not capture stage or retries, making it harder to correlate with
the application. It's inconvenient when debugging applications.
Actually, the task name format (e.g., "task 0.0 in stage 1.0 (TID 1)") has
already been widely used since
https://github.com/apache/spark/pull/1259. We'd better follow that naming
convention.
### Does this PR introduce _any_ user-facing change?
Yes. Users will see the more consistent task names in the log.
### How was this patch tested?
Manually checked.
Closes #29418 from Ngone51/unify-task-name.
Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: a1a32d2)
The file was modified core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/TaskDescription.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/executor/Executor.scala (diff)
Commit 409fea30cc40ce24a17325ec63d2f847ce49f5a6 by wgyumg
[SPARK-32624][SQL] Use getCanonicalName to fix byte[] compile issue
### What changes were proposed in this pull request?
```scala
scala> Array[Byte](1, 2).getClass.getName
res13: String = [B

scala> Array[Byte](1, 2).getClass.getCanonicalName
res14: String = byte[]
```
This pr replace `getClass.getName` with `getClass.getCanonicalName` in
`CodegenContext.addReferenceObj` to fix `byte[]` compile issue:
```
...
/* 030 */       value_1 = org.apache.spark.sql.catalyst.util.TypeUtils.compareBinary(value_2, (([B) references[0] /* min */)) >= 0 && org.apache.spark.sql.catalyst.util.TypeUtils.compareBinary(value_2, (([B) references[1] /* max */)) <= 0;
/* 031 */     }
/* 032 */     return !isNull_1 && value_1;
/* 033 */   }
/* 034 */
/* 035 */
/* 036 */ }
20:49:54.886 WARN org.apache.spark.sql.catalyst.expressions.Predicate: Expr codegen error and falling back to interpreter mode
java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 30, Column 81: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 30, Column 81: Unexpected token "[" in primary
...
```
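A hedged, plain-Scala restatement of the difference (runnable anywhere on the JVM):
```scala
// "[B" is the JVM-internal name and is not valid Java source for a cast, while
// the canonical name "byte[]" is, which is why addReferenceObj must use it.
val cls = Array[Byte](1, 2).getClass
assert(cls.getName == "[B")
assert(cls.getCanonicalName == "byte[]")
```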
### Why are the changes needed?
Fix compile issue when compiling generated code.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #29439 from wangyum/SPARK-32624.
Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Yuming Wang
<wgyumg@gmail.com>
(commit: 409fea3)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
Commit 3d1dce75d96373130e27b3809c73d3796b5b77be by wenchen
[SPARK-32621][SQL] 'path' option can cause issues while inferring schema
in CSV/JSON datasources
### What changes were proposed in this pull request?
When the CSV/JSON datasources infer schema (e.g., `def inferSchema(files:
Seq[FileStatus])`), they use the `files` along with the original options.
`files` in `inferSchema` could have been deduced from the "path" option
if the option was present, so this can cause issues (e.g., reading more
data, listing the path again) since the "path" option is **added** to
the `files`.
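A hedged sketch of the fix's general idea (the helper name is made up; the real change lives in the individual datasources):
```scala
// If the file list handed to schema inference was already resolved from the
// "path" option, drop that option so the same path is not picked up and
// listed a second time during inference.
def optionsForSchemaInference(options: Map[String, String]): Map[String, String] =
  options - "path"
```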
### Why are the changes needed?
The current behavior can cause the following issue:
```scala
class TestFileFilter extends PathFilter {
  override def accept(path: Path): Boolean = path.getParent.getName != "p=2"
}

val path = "/tmp"
val df = spark.range(2)
df.write.json(path + "/p=1")
df.write.json(path + "/p=2")

val extraOptions = Map(
  "mapred.input.pathFilter.class" -> classOf[TestFileFilter].getName,
  "mapreduce.input.pathFilter.class" -> classOf[TestFileFilter].getName
)

// This works fine.
assert(spark.read.options(extraOptions).json(path).count == 2)

// The following with the "path" option fails with:
//   assertion failed: Conflicting directory structures detected. Suspicious paths
//     file:/tmp
//     file:/tmp/p=1
assert(spark.read.options(extraOptions).format("json").option("path", path).load.count() === 2)
```
### Does this PR introduce _any_ user-facing change?
Yes, the above failure no longer happens, and you get a consistent
experience whether you use `spark.read.csv(path)` or
`spark.read.format("csv").option("path", path).load`.
### How was this patch tested?
Updated existing tests.
Closes #29437 from imback82/path_bug.
Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 3d1dce7)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVDataSourceV2.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/text/TextDataSourceV2.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcDataSourceV2.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetDataSourceV2.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/v2/avro/AvroDataSourceV2.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileDataSourceV2.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/json/JsonDataSourceV2.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala (diff)
Commit 278d0dd25bc1479ecda42d6f722106e4763edfae by wenchen
[SPARK-28863][SQL] Introduce AlreadyPlanned to prevent reanalysis of
V1FallbackWriters
### What changes were proposed in this pull request?
This PR introduces a LogicalNode AlreadyPlanned, and related physical
plan and preparation rule.
With the DataSourceV2 write operations, we have a way to fall back to the
V1 writer APIs using InsertableRelation. The gross part is that we're in
physical land, but the InsertableRelation takes a logical plan, so we
have to pass the logical plans to these physical nodes, and then
potentially go through re-planning. This re-planning can cause issues
for an already optimized plan.
A useful primitive could be specifying that a plan is ready for
execution through a logical node AlreadyPlanned. This would wrap a
physical plan, and then we can go straight to execution.
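A hedged sketch of the shape of such a wrapper (simplified; not necessarily the actual `AlreadyPlanned` definition):
```scala
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LeafNode
import org.apache.spark.sql.execution.SparkPlan

// A leaf logical node that simply carries an already-built physical plan, so
// "planning" it amounts to unwrapping and executing that plan directly.
case class AlreadyPlannedSketch(physicalPlan: SparkPlan) extends LeafNode {
  override def output: Seq[Attribute] = physicalPlan.output
}
```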
### Why are the changes needed?
To avoid having a physical plan that is disconnected from the physical
plan that is being executed in V1WriteFallback execution. When a
physical plan node executes a logical plan, the inner query is not
connected to the running physical plan. The physical plan that actually
runs is not visible through the Spark UI and its metrics are not
exposed. In some cases, the EXPLAIN plan doesn't show it.
### Does this PR introduce _any_ user-facing change?
Nope
### How was this patch tested?
V1FallbackWriterSuite tests that writes still work
Closes #29469 from brkyvz/alreadyAnalyzed2.
Lead-authored-by: Burak Yavuz <brkyvz@gmail.com> Co-authored-by: Burak
Yavuz <burak@databricks.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 278d0dd)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/V1WriteFallbackSuite.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/AlreadyPlanned.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/AlreadyPlannedSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V1FallbackWriters.scala (diff)
Commit 3722ed430ddef4a4c73f8bae73b21f2097fdc33a by dongjoon
[SPARK-32655][K8S] Support appId/execId placeholder in K8s
SPARK_EXECUTOR_DIRS
### What changes were proposed in this pull request?
This PR aims to support replacement of the
`SPARK_APPLICATION_ID`/`SPARK_EXECUTOR_ID` placeholders in the
`SPARK_EXECUTOR_DIRS` executor environment variable.
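A hedged sketch of the substitution this enables (the function below is illustrative and simplified):
```scala
// Replace the literal placeholder strings in the configured value with the
// actual application and executor IDs when building the executor environment.
def resolveExecutorDirs(raw: String, appId: String, execId: String): String =
  raw.replace("SPARK_APPLICATION_ID", appId).replace("SPARK_EXECUTOR_ID", execId)

// e.g. resolveExecutorDirs("/efs/SPARK_APPLICATION_ID/SPARK_EXECUTOR_ID",
//   "spark-f45039b13b0b4fd4baf80fed561a2228", "1")
//   == "/efs/spark-f45039b13b0b4fd4baf80fed561a2228/1"
```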
### Why are the changes needed?
This PR provides users additional controllability.
**HOW TO RUN**
```
bin/spark-submit --master k8s://https://kubernetes.docker.internal:6443 --deploy-mode cluster \
  -c spark.kubernetes.container.image=spark:SPARK-32655 \
  -c spark.kubernetes.driver.pod.name=pi \
  -c spark.kubernetes.executor.podNamePrefix=pi \
  -c spark.kubernetes.executor.volumes.nfs.data.mount.path=/efs \
  -c spark.kubernetes.executor.volumes.nfs.data.mount.readOnly=false \
  -c spark.kubernetes.executor.volumes.nfs.data.options.server=efs-server-ip \
  -c spark.kubernetes.executor.volumes.nfs.data.options.path=/ \
  -c spark.executorEnv.SPARK_EXECUTOR_DIRS=/efs/SPARK_APPLICATION_ID/SPARK_EXECUTOR_ID \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.0-SNAPSHOT.jar 20000
```
**EFS Layout**
```
/efs
├── spark-f45039b13b0b4fd4baf80fed561a2228
│   ├── 1
│   │   ├── blockmgr-bbe76578-8ff2-4c2d-ab4f-37671d886f56
│   │   │   ├── 0e
│   │   │   └── 11
│   │   └── spark-e41aeb41-00fc-49e1-a77d-093b6df5958a
│   │       ├── -18375678081597852666997_cache
│   │       └── -18375678081597852666997_lock
│   └── 2
│       ├── blockmgr-765bfb50-ab13-4b2b-9350-356fed0169e3
│       │   ├── 0e
│       │   └── 11
│       └── spark-737671fc-1697-4367-9daf-2b1575f92aba
│           ├── -18375678081597852666997_cache
│           └── -18375678081597852666997_lock
```
### Does this PR introduce _any_ user-facing change?
- Yes, because this is a new feature.
- This will not affect existing jobs because users do not currently use the
string patterns `SPARK_APPLICATION_ID` or `SPARK_EXECUTOR_ID` inside the
`SPARK_EXECUTOR_DIRS` environment variable.
### How was this patch tested?
Pass the newly added test case.
Closes #29472 from dongjoon-hyun/SPARK-32655.
Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: 3722ed4)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala (diff)