Changes

Summary

  1. [SPARK-36239][PYTHON][DOCS] Remove some APIs from documentation (commit: 86471ad) (details)
  2. [SPARK-36256][BUILD] Upgrade lz4-java to 1.8.0 (commit: 13aefd6) (details)
  3. [SPARK-36257][SQL] Updated the version of TimestampNTZ related changes as 3.3.0 (commit: ae9f612) (details)
  4. [SPARK-36209][PYTHON][DOCS] Fix link to pyspark Dataframe documentation (commit: 3a1db2d) (details)
  5. [SPARK-35815][SQL] Allow delayThreshold for watermark to be represented as ANSI interval literals (commit: 07fa38e) (details)
  6. [SPARK-35310][MLLIB] Update to breeze 1.2 (commit: 518f00f) (details)
  7. [SPARK-35848][MLLIB] Optimize some treeAggregates in MLlib by delaying allocations (commit: b69c268) (details)
  8. [SPARK-36262][BUILD] Upgrade ZSTD-JNI to 1.5.0-4 (commit: a1a1974) (details)
  9. [SPARK-36265][PYTHON] Use __getitem__ instead of getItem to suppress warnings (commit: a76a087) (details)
  10. [SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins (commit: 75fd1f5) (details)
  11. [SPARK-36248][PYTHON] Add rename_categories to CategoricalAccessor and CategoricalIndex (commit: 8b3d84b) (details)
  12. [SPARK-36268][PYTHON] Set the lowerbound of mypy version to 0.910 (commit: d6bc8cd) (details)
  13. [SPARK-36261][PYTHON] Add remove_unused_categories to CategoricalAccessor and CategoricalIndex (commit: 2fe12a7) (details)
  14. [SPARK-36270][BUILD] Change memory settings for enabling GA (commit: fd36ed4) (details)
  15. [SPARK-36258][PYTHON] Exposing functionExists in pyspark sql catalog (commit: 382fe44) (details)
  16. [SPARK-36226][PYTHON][DOCS] Improve python docstring links to other classes (commit: 701756a) (details)
  17. [SPARK-34399][SQL] Add commit duration to SQL tab's graph node (commit: 3ff8c9f) (details)
  18. [SPARK-36242][CORE] Ensure spill file closed before set `success = true` (commit: f61d599) (details)
  19. [SPARK-35561][SQL] Remove leading zeros from empty static number type partition (commit: fc29c91) (details)
  20. [SPARK-36273][SHUFFLE] Fix identical values comparison (commit: 530c8ad) (details)
  21. [SPARK-36276][BUILD][TESTS] Update maven-checkstyle-plugin to 3.1.2 and checkstyle to 8.43 (commit: 32f3e21) (details)
  22. [SPARK-36270][BUILD][FOLLOWUP] Reduce metaspace size for pyspark (commit: c2de111) (details)
  23. [SPARK-35956][K8S] Support auto assigning labels to decommissioning pods (commit: bee2799) (details)
  24. [MINOR][INFRA] Add enabled_merge_buttons to .asf.yaml explicitly (commit: bad7a92) (details)
  25. [SPARK-36264][PYTHON] Add reorder_categories to CategoricalAccessor and CategoricalIndex (commit: e12bc4d) (details)
  26. [SPARK-36274][PYTHON] Fix equality comparison of unordered Categoricals (commit: 85adc2f) (details)
  27. [SPARK-36279][INFRA][PYTHON] Fix lint-python to work with Python 3.9 (commit: 663cbdf) (details)
  28. [SPARK-36225][PYTHON][DOCS] Use DataFrame in python docstrings (commit: ae1c20e) (details)
  29. [SPARK-36255][SHUFFLE][CORE] Stop pushing and retrying on FileNotFound exceptions (commit: 09e1c61) (details)
  30. [SPARK-35259][SHUFFLE] Update ExternalBlockHandler Timer variables to expose correct units (commit: 70a1586) (details)
Commit 86471ad668d6eb1423c6c02eb59992fd608fd581 by gurwls223
[SPARK-36239][PYTHON][DOCS] Remove some APIs from documentation

### What changes were proposed in this pull request?

This PR proposes removing some APIs from pandas-on-Spark documentation.

They can easily be worked around via Spark DataFrame or Column functions, and they might be removed in the future.
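
For instance, a minimal sketch of the kind of workaround meant here (hedged: it assumes converting to a Spark DataFrame via `to_spark()`, which is illustrative rather than an exhaustive mapping of the removed APIs):

```py
import pyspark.pandas as ps

psdf = ps.DataFrame({"a": [1, 2, 3]})

# Instead of the removed psdf.spark.print_schema() / psdf.spark.schema(),
# convert to a Spark DataFrame and use its schema APIs directly.
sdf = psdf.to_spark()
sdf.printSchema()
schema = sdf.schema

# Similarly, a column's Spark data type can be read off the converted frame
# rather than via (Series|Index).spark.data_type.
a_data_type = sdf.schema["a"].dataType
```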

### Why are the changes needed?

Because we don't want to expose some functions as a public API.

### Does this PR introduce _any_ user-facing change?

APIs such as `(Series|Index).spark.data_type`, `(Series|Index).spark.nullable`, `DataFrame.spark.schema`, `DataFrame.spark.print_schema`, `DataFrame.pandas_on_spark.attach_id_column`, `DataFrame.spark.checkpoint`, `DataFrame.spark.localcheckpoint` and `DataFrame.spark.explain` are removed from the documentation.

### How was this patch tested?

Manually build the documents.

Closes #33458 from itholic/SPARK-36239.

Authored-by: itholic <haejoon.lee@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 86471ad)
The file was modified python/docs/source/reference/pyspark.pandas/frame.rst (diff)
The file was modified python/docs/source/reference/pyspark.pandas/series.rst (diff)
The file was modified python/docs/source/reference/pyspark.pandas/indexing.rst (diff)
Commit 13aefd6a6612550b108f15e8a9cb8d5bc9a5ff7a by gengliang
[SPARK-36256][BUILD] Upgrade lz4-java to 1.8.0

### What changes were proposed in this pull request?

This PR upgrades `lz4-java` to `1.8.0`, which includes not only performance improvements but also Darwin aarch64 support.
https://github.com/lz4/lz4-java/releases/tag/1.8.0
https://github.com/lz4/lz4-java/blob/1.8.0/CHANGES.md

### Why are the changes needed?

For providing better performance and platform support.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

Closes #33476 from sarutak/upgrade-lz4-java-1.8.0.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: 13aefd6)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified pom.xml (diff)
Commit ae9f6126fbbf0cf5fe5a7ece8d074d7c0e11fd93 by gengliang
[SPARK-36257][SQL] Updated the version of TimestampNTZ related changes as 3.3.0

### What changes were proposed in this pull request?

As we decided to release the TimestampNTZ type in Spark 3.3, we should update the versions of the TimestampNTZ-related changes to 3.3.0.

### Why are the changes needed?

Correct the versions in documentation/code comment.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UT

Closes #33478 from gengliangwang/updateVersion.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: ae9f612)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 3a1db2ddd439a6df2a1dd896aab8420a9b45286b by srowen
[SPARK-36209][PYTHON][DOCS] Fix link to pyspark Dataframe documentation

### What changes were proposed in this pull request?
Bugfix: link to the correct location of the PySpark DataFrame documentation

### Why are the changes needed?
The current link returns "Not found"

### Does this PR introduce _any_ user-facing change?
Website fix

### How was this patch tested?
Documentation change

Closes #33420 from dominikgehl/feature/SPARK-36209.

Authored-by: Dominik Gehl <dog@open.ch>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 3a1db2d)
The file was modified docs/structured-streaming-programming-guide.md (diff)
The file was modified docs/ml-migration-guide.md (diff)
The file was modified docs/sql-programming-guide.md (diff)
The file was modified docs/streaming-programming-guide.md (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified docs/ml-pipeline.md (diff)
The file was modified docs/streaming-kinesis-integration.md (diff)
The file was modified docs/rdd-programming-guide.md (diff)
Commit 07fa38e2c1082c2b69b3bf9489cee4dfe4db2c26 by max.gekk
[SPARK-35815][SQL] Allow delayThreshold for watermark to be represented as ANSI interval literals

### What changes were proposed in this pull request?

This PR extends the way to represent `delayThreshold` with ANSI interval literals for watermark.

### Why are the changes needed?

A `delayThreshold` is semantically an interval value, so it should be representable as ANSI interval literals as well as in the conventional `1 second` form.
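
As a rough illustration (a hedged sketch: it assumes the `rate` streaming source and that PySpark's `withWatermark` passes the delayThreshold string through unchanged; the exact ANSI literal forms accepted are defined by this PR):

```py
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
events = spark.readStream.format("rate").load()  # provides a 'timestamp' column

# Conventional form, unchanged by this PR.
watermarked_old = events.withWatermark("timestamp", "10 seconds")

# ANSI interval literal form that this PR is meant to allow.
watermarked_ansi = events.withWatermark("timestamp", "INTERVAL '10' SECOND")
```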

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #33456 from sarutak/delayThreshold-interval.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: 07fa38e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala (diff)
Commit 518f00fd78ed550ac37fb3b076572454a87e9853 by srowen
[SPARK-35310][MLLIB] Update to breeze 1.2

### What changes were proposed in this pull request?

Update to the latest breeze 1.2

### Why are the changes needed?

Minor bug fixes

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests

Closes #33449 from srowen/SPARK-35310.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 518f00f)
The file was modified mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala (diff)
The file was modified pom.xml (diff)
The file was modified mllib/src/test/scala/org/apache/spark/mllib/util/MLUtilsSuite.scala (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
Commit b69c26833c99337bb17922f21dd72ee3a12e0c0a by srowen
[SPARK-35848][MLLIB] Optimize some treeAggregates in MLlib by delaying allocations

### What changes were proposed in this pull request?

Optimize some treeAggregates in MLlib by delaying the allocation of (and thus not sending around) large arrays of zeroes.
This uses the same idea as in https://github.com/apache/spark/pull/23600/files

### Why are the changes needed?

Allocating huge arrays of zeroes takes additional memory and network I/O which is unnecessary in some cases. It can cause operations to run out of memory that might otherwise succeed. Specifically, this should prevent the 'zero' value from having to be (pointlessly) checked for serializability, which can fail when passing through the default JavaSerializer; it would also prevent allocating and sending large 'zero' values for an empty partition in the aggregate.
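
The general idea, sketched at the RDD API level (hedged: this is not the MLlib code; `num_features` is a hypothetical size, and the sketch only illustrates allocating the buffer lazily inside each partition instead of shipping it as the zero value):

```py
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
num_features = 1000  # hypothetical; stands in for a large gradient/summary vector
rdd = spark.sparkContext.parallelize(
    [np.random.rand(num_features) for _ in range(100)])

def seq_op(buf, row):
    # Allocate the buffer only when the partition actually has data.
    if buf is None:
        buf = np.zeros(num_features)
    buf += row
    return buf

def comb_op(a, b):
    # Empty partitions contribute None, so nothing large was ever shipped for them.
    if a is None:
        return b
    if b is None:
        return a
    a += b
    return a

# None as the zero value: no large array of zeroes is serialized and sent to tasks.
total = rdd.treeAggregate(None, seq_op, comb_op)
```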

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #33443 from srowen/SPARK-35848.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: b69c268)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/clustering/StreamingKMeans.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringMetrics.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala (diff)
Commit a1a197403b2e88f11523c872f1789b3e94fa97d9 by dongjoon
[SPARK-36262][BUILD] Upgrade ZSTD-JNI to 1.5.0-4

### What changes were proposed in this pull request?

This PR aims to upgrade ZSTD-JNI to 1.5.0-4.

### Why are the changes needed?

ZSTD-JNI 1.5.0-3 has a packaging issue. 1.5.0-4 is recommended to be used instead.
- https://github.com/luben/zstd-jni/issues/181#issuecomment-885138495

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #33483 from dongjoon-hyun/SPARK-36262.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: a1a1974)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
Commit a76a087f7f3ed734426a8842b6f2e4d13d080399 by gurwls223
[SPARK-36265][PYTHON] Use __getitem__ instead of getItem to suppress warnings

### What changes were proposed in this pull request?

Use `Column.__getitem__` instead of `Column.getItem` to suppress warnings.

### Why are the changes needed?

In the pandas API on Spark code base, there are some places that use `Column.getItem` with a `Column` object, which shows a deprecation warning.
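
A minimal sketch of the replacement (hedged: the data and column names here are made up; the point is only that `[]` indexing goes through `Column.__getitem__` and avoids the deprecated `getItem(Column)` call):

```py
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([({"x": 1}, "x")], ["m", "k"])

# Deprecated when the key is a Column: emits the FutureWarning shown below.
old = df.select(df["m"].getItem(df["k"]))

# Equivalent lookup through Column.__getitem__, i.e. plain indexing syntax.
new = df.select(df["m"][df["k"]])
```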

### Does this PR introduce _any_ user-facing change?

Yes, users won't see the warnings anymore.

- before

```py
>>> s = ps.Series(list("abbccc"), dtype="category")
>>> s.astype(str)
/path/to/spark/python/pyspark/sql/column.py:322: FutureWarning: A column as 'key' in getItem is deprecated as of Spark 3.0, and will not be supported in the future release. Use `column[key]` or `column.key` syntax instead.
  warnings.warn(
0    a
1    b
2    b
3    c
4    c
5    c
dtype: object
```

- after

```py
>>> s = ps.Series(list("abbccc"), dtype="category")
>>> s.astype(str)
0    a
1    b
2    b
3    c
4    c
5    c
dtype: object
```

### How was this patch tested?

Existing tests.

Closes #33486 from ueshin/issues/SPARK-36265/getitem.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: a76a087)
The file was modified python/pyspark/pandas/base.py (diff)
The file was modified python/pyspark/pandas/frame.py (diff)
The file was modified python/pyspark/pandas/data_type_ops/categorical_ops.py (diff)
The file was modified python/pyspark/pandas/data_type_ops/base.py (diff)
Commit 75fd1f5b826562d5d377dd6c4c64bf3c64524a1f by gurwls223
[SPARK-36189][PYTHON] Improve bool, string, numeric DataTypeOps tests by avoiding joins

### What changes were proposed in this pull request?
Improve bool, string, numeric DataTypeOps tests by avoiding joins.

Previously, bool, string, numeric DataTypeOps tests were conducted between two different Series.
After the PR, bool, string, numeric DataTypeOps tests operate on a single DataFrame.

### Why are the changes needed?
A considerable number of DataTypeOps tests perform operations on different Series, so joining is needed, which takes a long time.
We should avoid joins to shorten the test duration.

The majority of joins happen in bool, string, numeric DataTypeOps tests, so we improve them first.
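
Roughly, the difference looks like this (a hedged sketch, not the actual test code; it only shows that operating on columns of one DataFrame avoids the join that two independent Series would need):

```py
import pyspark.pandas as ps

# Two Series created independently: combining them has to align (join) their
# underlying Spark DataFrames, which also requires opting in to the option below.
s1 = ps.Series([1, 2, 3])
s2 = ps.Series([4, 5, 6])
with ps.option_context("compute.ops_on_diff_frames", True):
    joined = s1 + s2

# The same data as two columns of a single DataFrame: no join is needed.
psdf = ps.DataFrame({"left": [1, 2, 3], "right": [4, 5, 6]})
not_joined = psdf["left"] + psdf["right"]
```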

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests.

Closes #33402 from xinrong-databricks/datatypeops_diffframe.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 75fd1f5)
The file was modified python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py (diff)
The file was modified python/pyspark/pandas/tests/data_type_ops/test_num_ops.py (diff)
The file was modified python/pyspark/pandas/tests/data_type_ops/test_string_ops.py (diff)
The file was modified python/pyspark/pandas/tests/data_type_ops/testing_utils.py (diff)
Commit 8b3d84bb7eeb798337f63c266686f2efeeaf9ea3 by gurwls223
[SPARK-36248][PYTHON] Add rename_categories to CategoricalAccessor and CategoricalIndex

### What changes were proposed in this pull request?
Add rename_categories to CategoricalAccessor and CategoricalIndex.

### Why are the changes needed?
rename_categories is supported in pandas CategoricalAccessor and CategoricalIndex. We ought to follow pandas.

### Does this PR introduce _any_ user-facing change?
Yes. `rename_categories` is supported in pandas API on Spark now.

```py
# CategoricalIndex
>>> psser = ps.CategoricalIndex(["a", "a", "b"])
>>> psser.rename_categories([0, 1])
CategoricalIndex([0, 0, 1], categories=[0, 1], ordered=False, dtype='category')
>>> psser.rename_categories({'a': 'A', 'c': 'C'})
CategoricalIndex(['A', 'A', 'b'], categories=['A', 'b'], ordered=False, dtype='category')
>>> psser.rename_categories(lambda x: x.upper())
CategoricalIndex(['A', 'A', 'B'], categories=['A', 'B'], ordered=False, dtype='category')

# CategoricalAccessor
>>> s = ps.Series(["a", "a", "b"], dtype="category")
>>> s.cat.rename_categories([0, 1])
0    0
1    0
2    1
dtype: category
Categories (2, int64): [0, 1]
>>> s.cat.rename_categories({'a': 'A', 'c': 'C'})
0    A
1    A
2    b
dtype: category
Categories (2, object): ['A', 'b']
>>> s.cat.rename_categories(lambda x: x.upper())
0    A
1    A
2    B
dtype: category
Categories (2, object): ['A', 'B']
```

### How was this patch tested?
Unit tests.

Closes #33471 from xinrong-databricks/category_rename_categories.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 8b3d84b)
The file was modified python/docs/source/reference/pyspark.pandas/series.rst (diff)
The file was modified python/pyspark/pandas/tests/indexes/test_category.py (diff)
The file was modified python/pyspark/pandas/tests/test_categorical.py (diff)
The file was modified python/pyspark/pandas/indexes/category.py (diff)
The file was modified python/docs/source/reference/pyspark.pandas/indexing.rst (diff)
The file was modified python/pyspark/pandas/categorical.py (diff)
The file was modified python/pyspark/pandas/missing/indexes.py (diff)
Commit d6bc8cd6816c54a73dcc6a6eb13b87dc4ed0482d by gurwls223
[SPARK-36268][PYTHON] Set the lowerbound of mypy version to 0.910

### What changes were proposed in this pull request?

This PR proposes to set the lowerbound of mypy version to use in the testing script.

### Why are the changes needed?

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141519/console

```
python/pyspark/mllib/tree.pyi:29: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/tree.pyi:38: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:34: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:42: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:48: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:54: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:76: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:124: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/feature.pyi:165: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/clustering.pyi:45: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/clustering.pyi:72: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/classification.pyi:39: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
python/pyspark/mllib/classification.pyi:52: error: Overloaded function signatures 1 and 2 overlap with incompatible return types
Found 13 errors in 4 files (checked 314 source files)
1
```

Jenkins installed mypy in SPARK-32797, but it seems the installed version is not the same as the one used in GitHub Actions.

It seems difficult to make the codebase compatible with multiple mypy versions. Therefore, this PR sets the lowerbound.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins job in this PR should test it out.

Also manually tested:

Without mypy:

```
...
flake8 checks passed.

The mypy command was not found. Skipping for now.
```

With mypy 0.812:

```
...
flake8 checks passed.

The minimum mypy version needs to be 0.910. Your current version is mypy 0.812. Skipping for now.
```

With mypy 0.910:

```
...
flake8 checks passed.

starting mypy test...
mypy checks passed.

all lint-python tests passed!
```

Closes #33487 from HyukjinKwon/SPARK-36268.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: d6bc8cd)
The file was modified dev/lint-python (diff)
Commit 2fe12a75206d4dbef6d7678b876c16876136cdd0 by gurwls223
[SPARK-36261][PYTHON] Add remove_unused_categories to CategoricalAccessor and CategoricalIndex

### What changes were proposed in this pull request?

Add `remove_unused_categories` to `CategoricalAccessor` and `CategoricalIndex`.

### Why are the changes needed?

We should implement `remove_unused_categories` in `CategoricalAccessor` and `CategoricalIndex`.

### Does this PR introduce _any_ user-facing change?

Yes, users will be able to use `remove_unused_categories`.
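
A small usage sketch (hedged: the output rendering is assumed to mirror pandas, which pandas-on-Spark follows):

```py
>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> s = ps.Series(pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"]))
>>> s.cat.remove_unused_categories()
0    a
1    b
2    a
dtype: category
Categories (2, object): ['a', 'b']
```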

### How was this patch tested?

Added some tests.

Closes #33485 from ueshin/issues/SPARK-36261/remove_unused_categories.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 2fe12a7)
The file was modified python/pyspark/pandas/missing/indexes.py (diff)
The file was modified python/docs/source/reference/pyspark.pandas/series.rst (diff)
The file was modified python/docs/source/reference/pyspark.pandas/indexing.rst (diff)
The file was modified python/pyspark/pandas/indexes/category.py (diff)
The file was modified python/pyspark/pandas/categorical.py (diff)
The file was modified python/pyspark/pandas/tests/test_categorical.py (diff)
The file was modified python/pyspark/pandas/tests/indexes/test_category.py (diff)
Commit fd36ed4550c6451f69b696bc57645eeba6aca69b by gurwls223
[SPARK-36270][BUILD] Change memory settings for enabling GA

### What changes were proposed in this pull request?

Trying to adjust build memory settings and serial execution to re-enable GA.

### Why are the changes needed?

GA tests have been failing recently due to return code 137. We need to adjust the build settings to make GA work.

### Does this PR introduce _any_ user-facing change?

No, dev only.

### How was this patch tested?

GA

Closes #33447 from viirya/test-ga.

Lead-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Co-authored-by: Hyukjin Kwon <gurwls223@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: fd36ed4)
The file was modified .github/workflows/build_and_test.yml (diff)
The file was modified dev/run-tests.py (diff)
The file was modified pom.xml (diff)
The file was modified project/SparkBuild.scala (diff)
The file was modified build/sbt-launch-lib.bash (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala (diff)
Commit 382fe44b55b2404093b1b2403f62eaee7e032813 by gurwls223
[SPARK-36258][PYTHON] Exposing functionExists in pyspark sql catalog

### What changes were proposed in this pull request?
Exposing functionExists in pyspark sql catalog

### Why are the changes needed?
The method was available in Scala but not in PySpark.

### Does this PR introduce _any_ user-facing change?
Additional method
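
A minimal doctest-style sketch of the new method (hedged: it assumes an active `spark` session and mirrors the existing Scala `Catalog.functionExists`; the function names are just examples):

```py
>>> spark.catalog.functionExists("abs")
True
>>> spark.catalog.functionExists("definitely_not_a_function")
False
```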

### How was this patch tested?
Unit tests

Closes #33481 from dominikgehl/SPARK-36258.

Authored-by: Dominik Gehl <dog@open.ch>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 382fe44)
The file was modified python/docs/source/reference/pyspark.sql.rst (diff)
The file was modified python/pyspark/sql/catalog.py (diff)
The file was modified python/pyspark/sql/catalog.pyi (diff)
The file was modified python/pyspark/sql/tests/test_catalog.py (diff)
Commit 701756ac957b517464cecbea3aa0799404c4b159 by gurwls223
[SPARK-36226][PYTHON][DOCS] Improve python docstring links to other classes

### What changes were proposed in this pull request?
Additional links to other classes in the Python documentation.

### Why are the changes needed?
Python docstring link syntax wasn't fully used everywhere.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Documentation change only

Closes #33440 from dominikgehl/feature/python-docstrings.

Authored-by: Dominik Gehl <dog@open.ch>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 701756a)
The file was modified python/pyspark/sql/session.py (diff)
The file was modified python/pyspark/sql/types.py (diff)
The file was modified python/pyspark/sql/group.py (diff)
Commit 3ff8c9f9d651779f1eb7926eaa79271f7d3fecee by wenchen
[SPARK-34399][SQL] Add commit duration to SQL tab's graph node

### What changes were proposed in this pull request?
Since we already log the commit time, it is useful to show it to users directly in the SQL tab's UI.

![image](https://user-images.githubusercontent.com/46485123/126647754-dc3ba83a-5391-427c-8a67-e6af46e82290.png)

### Why are the changes needed?
Lets users see the commit duration directly.

### Does this PR introduce _any_ user-facing change?
Users can see the file commit duration in the SQL tab's SQL plan graph.

### How was this patch tested?
Manually tested

Closes #31522 from AngersZhuuuu/SPARK-34399.

Lead-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3ff8c9f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileBatchWrite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriteStatsTracker.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/BasicWriteTaskStatsTrackerSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/CustomWriteTaskStatsTrackerSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatDataWriter.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala (diff)
Commit f61d5993eafe024effd3e0c4c17bd9779c704073 by yi.wu
[SPARK-36242][CORE] Ensure spill file closed before set `success = true` in `ExternalSorter.spillMemoryIteratorToDisk` method

### What changes were proposed in this pull request?
The main change of this PR is to move `writer.close()` before `success = true`, ensuring the spill file is closed before `success` is set to `true` in the `ExternalSorter.spillMemoryIteratorToDisk` method.

### Why are the changes needed?
Avoid setting `success = true` first and then having the close of the spill file fail afterwards.
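
The ordering being enforced, sketched here in Python rather than the actual Scala code (hedged: `revert_and_delete` is a hypothetical stand-in for the real cleanup):

```py
def spill_to_disk(writer, records):
    """Sketch of the ordering fix: mark success only after the writer is closed."""
    success = False
    try:
        for record in records:
            writer.write(record)
        writer.close()   # close BEFORE setting success, so a failure during
        success = True   # close() still takes the cleanup path below
    finally:
        if not success:
            writer.revert_and_delete()  # hypothetical cleanup helper
```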

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass the Jenkins or GitHub Action
- Add a new test case to check `The spill file should not exists if writer close fails`

Closes #33460 from LuciferYang/external-sorter-spill-close.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yi.wu <yi.wu@databricks.com>
(commit: f61d599)
The file was modified core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala (diff)
The file was added core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSpillSuite.scala
Commit fc29c91f27d866502f5b6cc4261d4943b5cccc7e by srowen
[SPARK-35561][SQL] Remove leading zeros from empty static number type partition

### What changes were proposed in this pull request?

This PR removes leading zeros from static number-type partition values when we insert into a partitioned table with empty partitions.

create table

    CREATE TABLE `table_int` ( `id` INT, `c_string` STRING, `p_int` int)
    USING parquet PARTITIONED BY (p_int);

insert

    insert overwrite table table_int partition (p_int='00011')
    select 1, 'c string'
    where true ;

|partition|
|---------|
|p_int=11|

    insert overwrite table table_int partition (p_int='00012')
    select 1, 'c string'
    where false ;

|partition|
|---------|
|p_int=00012|

### Why are the changes needed?

This PR produces consistent results when inserting into an empty or non-empty partition.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add Unit test

Closes #33291 from dgd-contributor/35561_insert_integer_partition_fail_when_empty.

Authored-by: dgd-contributor <dgd_contributor@viettel.com.vn>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: fc29c91)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala (diff)
Commit 530c8addbb69d64370e6b27a8f19e77a951de25f by srowen
[SPARK-36273][SHUFFLE] Fix identical values comparison

This commit fixes the comparison to use the "o.appAttemptId" variable instead of the mistaken "appAttemptId" variable; previously the comparison was between identical values. Jira issue: SPARK-36273.
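
In Python terms, the bug pattern looks roughly like this (a hedged illustration only; it is not the actual Java class, and the field names are simplified):

```py
class FinalizeShuffleMergeLike:
    """Illustrative stand-in for the protocol message and its equals() method."""

    def __init__(self, app_id, app_attempt_id):
        self.app_id = app_id
        self.app_attempt_id = app_attempt_id

    def __eq__(self, o):
        return (self.app_id == o.app_id
                # Before the fix, the second clause compared the field with itself
                # (always True); it must reference the other object's field.
                and self.app_attempt_id == o.app_attempt_id)
```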

### What changes were proposed in this pull request?
This is a patch for SPARK-35546 which is needed for push-based shuffle.

### Why are the changes needed?
A very minor fix that adds the reference to the other "FinalizeShuffleMerge" object.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
No unit tests were added. It's a pretty logical change.

Closes #33493 from almogtavor/patch-1.

Authored-by: Almog Tavor <70065337+almogtavor@users.noreply.github.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 530c8ad)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/FinalizeShuffleMerge.java (diff)
Commit 32f3e217f2c0b064e3a550b5f34747884ebed890 by dongjoon
[SPARK-36276][BUILD][TESTS] Update maven-checkstyle-plugin to 3.1.2 and checkstyle to 8.43

### What changes were proposed in this pull request?
This PR aims to update maven-checkstyle-plugin to 3.1.2 and checkstyle to 8.43.
### Why are the changes needed?
This will bring the latest bug fixes and improvements from 8.40 to 8.43.
- https://checkstyle.sourceforge.io/releasenotes.html#Release_8.43

Note that 8.44 has a false-positive bug in the ArrayType checker.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the GHA.

Closes #33500 from williamhyun/SPARK-36276.

Authored-by: William Hyun <william@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 32f3e21)
The file was modified pom.xml (diff)
Commit c2de111ec53a3e42812ce5a7b5268ca73faa39c1 by dongjoon
[SPARK-36270][BUILD][FOLLOWUP] Reduce metaspace size for pyspark

### What changes were proposed in this pull request?

We noticed that the pyspark GA module `pyspark-pandas-slow` sometimes still fails with return code 137, so this tries to reduce its metaspace size further.

### Why are the changes needed?

Fix return code 137 for pyspark GA module.

### Does this PR introduce _any_ user-facing change?

No, dev only.

### How was this patch tested?

GA

Closes #33496 from viirya/test-ga-followup.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: c2de111)
The file was modified .github/workflows/build_and_test.yml (diff)
Commit bee279997f2115af6b15e3dbb7433dccef7f14af by hkarau
[SPARK-35956][K8S] Support auto assigning labels to decommissioning pods

### What changes were proposed in this pull request?

Add a new configuration flag that allows Spark to hint to the scheduler, when decommissioning or exiting a pod, that this pod will have the least impact in a pre-emption event.

### Why are the changes needed?

Kubernetes added the concepts of pod disruption budgets (which can have selectors based on labels) as well as pod deletion, for providing hints to the scheduler as to what we would prefer to have pre-empted.

### Does this PR introduce _any_ user-facing change?

New configuration flag

### How was this patch tested?

The deletion unit test was extended.

Closes #33270 from holdenk/SPARK-35956-support-auto-assigning-labels-to-decommissioning-pods.

Lead-authored-by: Holden Karau <hkarau@netflix.com>
Co-authored-by: Holden Karau <holden@pigscanfly.ca>
Co-authored-by: Holden Karau <hkarau@apple.com>
Signed-off-by: Holden Karau <hkarau@netflix.com>
(commit: bee2799)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DecommissionSuite.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala (diff)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala (diff)
The file was modified docs/running-on-kubernetes.md (diff)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackendSuite.scala (diff)
Commit bad7a929aff40756445b7eea7119cc8c7b302969 by dongjoon
[MINOR][INFRA] Add enabled_merge_buttons to .asf.yaml explicitly

### What changes were proposed in this pull request?

This PR aims to add the AS-IS `enabled_merge_buttons` policy explicitly. The AS-IS policy was introduced via https://issues.apache.org/jira/browse/INFRA-18656.

### Why are the changes needed?

Currently, this policy is maintained in a self-serve manner. Here is the official documentation. It would be great to have this stated explicitly for newcomers.
- https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features#Git.asf.yamlfeatures-Mergebuttons

### Does this PR introduce _any_ user-facing change?

No. This is a committer-only feature and there is no change in terms of the policy.

### How was this patch tested?

N/A

Closes #33505 from dongjoon-hyun/minor.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: bad7a92)
The file was modified .asf.yaml (diff)
Commit e12bc4d31df7f32f4ecf079d8ace6fd34df770d7 by ueshin
[SPARK-36264][PYTHON] Add reorder_categories to CategoricalAccessor and CategoricalIndex

### What changes were proposed in this pull request?

Add `reorder_categories` to `CategoricalAccessor` and `CategoricalIndex`.

### Why are the changes needed?

We should implement `reorder_categories` in `CategoricalAccessor` and `CategoricalIndex`.

### Does this PR introduce _any_ user-facing change?

Yes, users will be able to use `reorder_categories`.

### How was this patch tested?

Added some tests.

Closes #33499 from ueshin/issues/SPARK-36264/reorder_categories.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: e12bc4d)
The file was modified python/docs/source/reference/pyspark.pandas/indexing.rst (diff)
The file was modified python/docs/source/reference/pyspark.pandas/series.rst (diff)
The file was modified python/pyspark/pandas/tests/test_categorical.py (diff)
The file was modified python/pyspark/pandas/missing/indexes.py (diff)
The file was modified python/pyspark/pandas/categorical.py (diff)
The file was modified python/pyspark/pandas/tests/indexes/test_category.py (diff)
The file was modified python/pyspark/pandas/indexes/category.py (diff)
Commit 85adc2ff60812f4af7befe0e8791d868a23359ae by ueshin
[SPARK-36274][PYTHON] Fix equality comparison of unordered Categoricals

### What changes were proposed in this pull request?
Fix equality comparison of unordered Categoricals.

### Why are the changes needed?
The codes of a Categorical Series are used for Series equality comparison. However, that doesn't work for unordered Categoricals, where the same value can have different codes when the same categories appear in a different order.

So we should map the codes back to their values and then compare the values for equality.
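
The underlying issue can be seen with plain pandas codes (a hedged illustration reusing the same data as the example below):

```py
>>> import pandas as pd
>>> pd.Categorical(list("abca")).codes
array([0, 1, 2, 0], dtype=int8)
>>> pd.Categorical(list("bcaa"), categories=list("bca")).codes
array([0, 1, 2, 2], dtype=int8)
```

Comparing these codes elementwise yields `[True, True, True, False]`, which is exactly the wrong result shown in the "From" output below.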

### Does this PR introduce _any_ user-facing change?
Yes.
From:
```py
>>> psser1 = ps.Series(pd.Categorical(list("abca")))
>>> psser2 = ps.Series(pd.Categorical(list("bcaa"), categories=list("bca")))
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     (psser1 == psser2).sort_index()
...
0     True
1     True
2     True
3    False
dtype: bool
```

To:
```py
>>> psser1 = ps.Series(pd.Categorical(list("abca")))
>>> psser2 = ps.Series(pd.Categorical(list("bcaa"), categories=list("bca")))
>>> with ps.option_context("compute.ops_on_diff_frames", True):
...     (psser1 == psser2).sort_index()
...
0    False
1    False
2    False
3     True
dtype: bool
```

### How was this patch tested?
Unit tests.

Closes #33497 from xinrong-databricks/cat_bug.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: 85adc2f)
The file was modified python/pyspark/pandas/data_type_ops/categorical_ops.py (diff)
The file was modified python/pyspark/pandas/tests/data_type_ops/test_categorical_ops.py (diff)
Commit 663cbdfbe5da6fa4af969344a6281c90b711207e by gurwls223
[SPARK-36279][INFRA][PYTHON] Fix lint-python to work with Python 3.9

### What changes were proposed in this pull request?

Fix `lint-python` to pick up `PYTHON_EXECUTABLE` from the environment first, so the Python version can be switched, and explicitly set `PYTHON_EXECUTABLE` to `python3.9` in CI.

### Why are the changes needed?

Currently `lint-python` uses `python3`, but it's not the one we expect in CI.
As a result, the `black` check is not working.

```
The python3 -m black command was not found. Skipping black checks for now.
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The `black` check in `lint-python` should work.

Closes #33507 from ueshin/issues/SPARK-36279/lint-python.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 663cbdf)
The file was modified dev/lint-python (diff)
The file was modified .github/workflows/build_and_test.yml (diff)
The file was modified python/pyspark/pandas/tests/data_type_ops/test_num_ops.py (diff)
Commit ae1c20ee0dc24bd35cd15380e814f06e07314af2 by gurwls223
[SPARK-36225][PYTHON][DOCS] Use DataFrame in python docstrings

### What changes were proposed in this pull request?
Change references to Dataset in Python docstrings to DataFrame.

### Why are the changes needed?
There is no Dataset class in PySpark.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Doc change only

Closes #33438 from dominikgehl/feature/SPARK-36225.

Lead-authored-by: Dominik Gehl <dog@open.ch>
Co-authored-by: Dominik Gehl <gehl@fastmail.fm>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: ae1c20e)
The file was modified python/pyspark/sql/dataframe.py (diff)
The file was modified python/pyspark/ml/util.py (diff)
Commit 09e1c612729292a9a000a2d64267ab874d71a1e5 by yi.wu
[SPARK-36255][SHUFFLE][CORE] Stop pushing and retrying on FileNotFound exceptions

### What changes were proposed in this pull request?
Once the shuffle is cleaned up by the `ContextCleaner`, the shuffle files are deleted by the executors. In this case, the push of the shuffle data by the executors can throw `FileNotFoundException`s because the shuffle files are deleted. When this exception is thrown from the `shuffle-block-push-thread`, it causes the executor to exit. Both the `shuffle-block-push` threads and the netty event-loops will encounter `FileNotFoundException`s in this case.  The fix here stops these threads from pushing more blocks when they encounter `FileNotFoundException`. When the exception is from the `shuffle-block-push-thread`, it will get handled and logged as warning instead of failing the executor.

### Why are the changes needed?
This fixes the bug which causes executors to exit when they are instructed to clean up shuffle data.
Below is the stacktrace of this exception:
```
21/06/17 16:03:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[block-push-thread-1,5,main]
java.lang.Error: java.io.IOException: Error in opening FileSegmentManagedBuffer

{file=********/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data, offset=10640, length=190}
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1155)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Error in opening FileSegmentManagedBuffer\{file=*******/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data, offset=10640, length=190}

at org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:89)
at org.apache.spark.shuffle.ShuffleWriter.sliceReqBufferIntoBlockBuffers(ShuffleWriter.scala:294)
at org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$sendRequest(ShuffleWriter.scala:270)
at org.apache.spark.shuffle.ShuffleWriter.org$apache$spark$shuffle$ShuffleWriter$$pushUpToMax(ShuffleWriter.scala:191)
at org.apache.spark.shuffle.ShuffleWriter$$anon$2$$anon$4.run(ShuffleWriter.scala:244)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
... 2 more
Caused by: java.io.FileNotFoundException: ******/application_1619720975011_11057757/blockmgr-560cb4cf-9918-4ea7-a007-a16c5e3a35fe/0a/shuffle_1_690_0.data (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at org.apache.spark.network.buffer.FileSegmentManagedBuffer.nioByteBuffer(FileSegmentManagedBuffer.java:62)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a unit test to verify that no more data is pushed when `FileNotFoundException` is encountered. Also verified in our environment.

Closes #33477 from otterc/SPARK-36255.

Authored-by: Chandni Singh <singh.chandni@gmail.com>
Signed-off-by: yi.wu <yi.wu@databricks.com>
(commit: 09e1c61)
The file was modified core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/ShuffleBlockPusherSuite.scala (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java (diff)
Commit 70a15868fc97e2b86c5ecc7bcf812bfdb05d98ea by yi.wu
[SPARK-35259][SHUFFLE] Update ExternalBlockHandler Timer variables to expose correct units

### What changes were proposed in this pull request?
`ExternalBlockHandler` exposes 4 metrics which are Dropwizard `Timer` metrics, and are named with a `millis` suffix:
```
    private final Timer openBlockRequestLatencyMillis = new Timer();
    private final Timer registerExecutorRequestLatencyMillis = new Timer();
    private final Timer fetchMergedBlocksMetaLatencyMillis = new Timer();
    private final Timer finalizeShuffleMergeLatencyMillis = new Timer();
```
However these Dropwizard Timers by default use nanoseconds ([documentation](https://metrics.dropwizard.io/3.2.3/getting-started.html#timers)).

This causes `YarnShuffleServiceMetrics` to expose confusingly-named metrics like `openBlockRequestLatencyMillis_nanos_max` (the actual values are currently in nanos).

This PR adds a new `Timer` subclass, `TimerWithCustomTimeUnit`, which accepts a `TimeUnit` at creation time and exposes timing information using this time unit when values are read. Internally, values are still stored with nanosecond-level precision. The `Timer` metrics within `ExternalBlockHandler` are updated to use the new class with milliseconds as the unit. The logic to include the `nanos` suffix in the metric name within `YarnShuffleServiceMetrics` has also been removed, with the assumption that the metric name itself includes the units.

### Does this PR introduce _any_ user-facing change?
Yes, there are two changes.
First, the names for metrics exposed by `ExternalBlockHandler` via `YarnShuffleServiceMetrics` such as `openBlockRequestLatencyMillis_nanos_max` and `openBlockRequestLatencyMillis_nanos_50thPercentile` have been changed to remove the `_nanos` suffix. This would be considered a breaking change, but these names were only exposed as part of #32388, which has not yet been released (slated for 3.2.0). New names are like `openBlockRequestLatencyMillis_max` and `openBlockRequestLatencyMillis_50thPercentile`
Second, the values of the metrics themselves have changed, to expose milliseconds instead of nanoseconds. Note that this does not affect metrics such as `openBlockRequestLatencyMillis_count` or `openBlockRequestLatencyMillis_rate1`, only the `Snapshot`-related metrics (`max`, `median`, percentiles, etc.). For the YARN case, these metrics were also introduced by #32388, and thus also have not yet been released. It was possible for the nanosecond values to be consumed by some other metrics reporter reading the Dropwizard metrics directly, but I'm not aware of any such usages.

### How was this patch tested?
Unit tests have been updated.

Closes #33116 from xkrogen/xkrogen-SPARK-35259-ess-fix-metric-unit-prefix.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: yi.wu <yi.wu@databricks.com>
(commit: 70a1586)
The file was added common/network-common/src/main/java/org/apache/spark/network/util/TimerWithCustomTimeUnit.java
The file was added common/network-common/src/test/java/org/apache/spark/network/util/TimerWithCustomUnitSuite.java
The file was modified common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java (diff)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceMetricsSuite.scala (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java (diff)