Changes

Summary

  1. [SPARK-36365][PYTHON] Remove old workarounds related to null ordering (commit: 90d31df) (details)
  2. [SPARK-35976][PYTHON] Adjust astype method for ExtensionDtype in pandas API on Spark (commit: f04e991) (details)
  3. [SPARK-36319][SQL][PYTHON] Make Observation return Map instead of Row (commit: a65eb36) (details)
  4. [SPARK-36362][CORE][SQL][TESTS] Omnibus Java code static analyzer warning fixes (commit: 72615bc) (details)
  5. [SPARK-36092][INFRA][BUILD][PYTHON] Migrate to GitHub Actions with Codecov from Jenkins (commit: c0d1860) (details)
  6. [SPARK-36362][CORE][SQL][FOLLOWUP] Fix java linter errors (commit: 22c4922) (details)
  7. [SPARK-32919][FOLLOW-UP] Filter out driver in the merger locations and fix the return type of RemoveShufflePushMergerLocations (commit: 2a18f82) (details)
  8. [SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle (commit: c039d99) (details)
  9. [SPARK-36372][SQL] v2 ALTER TABLE ADD COLUMNS should check duplicates for the user specified columns (commit: 3b713e7) (details)
  10. [SPARK-36237][UI][SQL] Attach and start handler after application started in UI (commit: 951efb8) (details)
Commit 90d31dfcb70d880229cc2feccf7fb7bd6b7a451a by gurwls223
[SPARK-36365][PYTHON] Remove old workarounds related to null ordering

### What changes were proposed in this pull request?

Remove old workarounds related to null ordering.

### Why are the changes needed?

In pandas-on-Spark, some places still call `Column._jc.(asc|desc)_nulls_(first|last)` as a workaround carried over from Koalas to support Spark 2.3; these workarounds are no longer needed.
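
For context, the orderings these workarounds emulated are now available directly on `Column` in the Spark versions pandas-on-Spark supports. Their semantics can be sketched in plain Python (no Spark required; function name is illustrative):

```python
# Pure-Python sketch of "nulls first/last" ordering semantics, the behavior
# that Column.asc_nulls_first() / desc_nulls_last() provide natively.
def sort_with_nulls(values, ascending=True, nulls_first=True):
    # Partition nulls out so that reversing the sort order does not also
    # reverse where the nulls land.
    nulls = [v for v in values if v is None]
    rest = sorted((v for v in values if v is not None), reverse=not ascending)
    return nulls + rest if nulls_first else rest + nulls

print(sort_with_nulls([3, None, 1]))                                      # [None, 1, 3]
print(sort_with_nulls([3, None, 1], ascending=False, nulls_first=False))  # [3, 1, None]
```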

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Modified a couple of tests; otherwise covered by existing tests.

Closes #33597 from ueshin/issues/SPARK-36365/nulls_first_last.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 90d31df)
The file was modified python/pyspark/pandas/frame.py (diff)
The file was modified python/pyspark/pandas/series.py (diff)
The file was modified python/pyspark/pandas/groupby.py (diff)
The file was modified python/pyspark/pandas/tests/test_dataframe.py (diff)
Commit f04e991e6a3bc70e92a18a9de7a1d9d6427df4ab by gurwls223
[SPARK-35976][PYTHON] Adjust astype method for ExtensionDtype in pandas API on Spark

### What changes were proposed in this pull request?
This patch sets the value to `<NA>` (`pd.NA`) in BooleanExtensionOps and StringExtensionOps.

### Why are the changes needed?
The pandas behavior:
```python
>>> pd.Series([True, False, None], dtype="boolean").astype(str).tolist()
['True', 'False', '<NA>']
>>> pd.Series(['s1', 's2', None], dtype="string").astype(str).tolist()
['s1', 's2', '<NA>']
```

The pandas-on-Spark behavior:
```python
>>> import pandas as pd
>>> from pyspark import pandas as ps

# Before
>>> ps.from_pandas(pd.Series([True, False, None], dtype="boolean")).astype(str).tolist()
['True', 'False', 'None']
>>> ps.from_pandas(pd.Series(['s1', 's2', None], dtype="string")).astype(str).tolist()
['s1', 's2', 'None']

# After
>>> ps.from_pandas(pd.Series([True, False, None], dtype="boolean")).astype(str).tolist()
['True', 'False', '<NA>']
>>> ps.from_pandas(pd.Series(['s1', 's2', None], dtype="string")).astype(str).tolist()
['s1', 's2', '<NA>']
```

See more in [SPARK-35976](https://issues.apache.org/jira/browse/SPARK-35976)
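
A minimal pure-Python sketch (no Spark or pandas required; illustrative only, not the actual implementation) of the casting rule this change implements: when a nullable extension-dtype column is cast to `str`, missing values become the string `"<NA>"` rather than `"None"`.

```python
# Sketch of the astype(str) rule for nullable (extension-dtype) columns:
# missing values render as "<NA>", matching pandas, instead of "None".
def cast_nullable_to_str(values):
    return ["<NA>" if v is None else str(v) for v in values]

print(cast_nullable_to_str([True, False, None]))  # ['True', 'False', '<NA>']
print(cast_nullable_to_str(["s1", "s2", None]))   # ['s1', 's2', '<NA>']
```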

### Does this PR introduce _any_ user-facing change?
Yes, `<NA>` is now returned for missing values, to follow the pandas behavior.

### How was this patch tested?
Changed the unit tests to cover this scenario.

Closes #33585 from Yikun/SPARK-35976.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: f04e991)
The file was modified python/pyspark/pandas/tests/data_type_ops/test_string_ops.py (diff)
The file was modified python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py (diff)
The file was modified python/pyspark/pandas/data_type_ops/string_ops.py (diff)
The file was modified python/pyspark/pandas/data_type_ops/boolean_ops.py (diff)
Commit a65eb36baea9f618225f89924d3a519fb9644c4a by gurwls223
[SPARK-36319][SQL][PYTHON] Make Observation return Map instead of Row

### What changes were proposed in this pull request?
The Observation API (Scala, Java, PySpark) now returns a `Map` / `Dict`. Before, it returned `Row` simply because the metrics are (internal to Observation) retrieved from the listener as rows. Since that is hidden from the user by the Observation API, there is no need to return `Row`.

While touching this code, this moves the unit tests from `DataFrameSuite.scala` to `DatasetSuite.scala` and from `JavaDataFrameSuite.java` to `JavaDatasetSuite.java`, which is a better place for them.

### Why are the changes needed?
This simplifies the API and accessing the metrics, especially in Java. There is no need for the concept `Row` when retrieving the observation result.

### Does this PR introduce _any_ user-facing change?
Yes, it changes the return type of `get` from `Row` to `Map` (Scala) / `Dict` (Python) and introduces `getAsJavaMap` (Java).
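
Why a plain mapping is simpler than `Row` can be sketched without Spark; here a `namedtuple` stands in for `Row` (names below are illustrative, not Spark's actual API):

```python
from collections import namedtuple

# A Spark Row behaves much like a named tuple. The Observation API previously
# handed such a row to the user; it now performs the dict conversion itself,
# so callers index metrics by name without knowing about Row at all.
MetricsRow = namedtuple("MetricsRow", ["cnt", "max_id"])  # stands in for Row

row = MetricsRow(cnt=3, max_id=42)
metrics = dict(row._asdict())  # Row.asDict() plays this role in Spark

print(metrics["cnt"])     # 3
print(metrics["max_id"])  # 42
```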

### How was this patch tested?
This is tested in `DatasetSuite.SPARK-34806: observation on datasets`, `JavaDatasetSuite.testObservation` and `test_dataframe.test_observe`.

Closes #33545 from EnricoMi/branch-observation-returns-map.

Authored-by: Enrico Minack <github@enrico.minack.dev>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: a65eb36)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
The file was modified python/pyspark/sql/observation.py (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Observation.scala (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java (diff)
The file was modified python/pyspark/sql/observation.pyi (diff)
The file was modified python/pyspark/sql/tests/test_dataframe.py (diff)
The file was modified python/pyspark/sql/dataframe.py (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java (diff)
Commit 72615bc551adaa238d15a8b43a8f99aaf741c30f by dongjoon
[SPARK-36362][CORE][SQL][TESTS] Omnibus Java code static analyzer warning fixes

### What changes were proposed in this pull request?

Fix up some minor Java issues:

- Some int*int multiplications that widen to long maybe could overflow
- Unnecessarily non-static inner classes
- Some tests "catch (AssertionError)" and do nothing
- Manual array iteration vs very slightly faster/simpler foreach
- Incorrect generic types that just happen to not cause a runtime error
- Missed opportunities for try-close
- Mutable enums
- .. and a few other minor things

### Why are the changes needed?

Some are minor but clear fixes; some may have a marginal perf impact or avoid a bug later. Also: maybe avoid future PRs to address these one by one.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests

Closes #33594 from srowen/SPARK-36362.

Authored-by: Sean Owen <srowen@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 72615bc)
The file was modified common/network-common/src/test/java/org/apache/spark/network/crypto/AuthIntegrationSuite.java (diff)
The file was modified common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/crypto/AuthEngineSuite.java (diff)
The file was modified common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java (diff)
The file was modified streaming/src/test/java/org/apache/spark/streaming/JavaMapWithStateSuite.java (diff)
The file was modified common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/connector/JavaColumnarDataSourceV2.java (diff)
The file was modified common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/StreamSuite.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java (diff)
The file was modified core/src/test/java/org/apache/spark/shuffle/sort/PackedRecordPointerSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/crypto/AuthMessagesSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/util/NettyMemoryMetricsSuite.java (diff)
The file was modified common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/connector/JavaPartitionAwareDataSource.java (diff)
The file was modified common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/sasl/SparkSaslSuite.java (diff)
The file was modified core/src/test/java/org/apache/spark/io/GenericFileInputStreamSuite.java (diff)
The file was modified core/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java (diff)
The file was modified streaming/src/test/java/test/org/apache/spark/streaming/Java8APISuite.java (diff)
The file was modified launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java (diff)
The file was modified core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/OneForOneBlockFetcherSuite.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java (diff)
The file was modified sql/catalyst/src/test/java/org/apache/spark/sql/catalyst/expressions/RowBasedKeyValueBatchSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/StreamTestHelper.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/protocol/MergedBlockMetaSuccessSuite.java (diff)
The file was modified core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/protocol/MessageWithHeaderSuite.java (diff)
The file was modified external/kafka-0-10/src/test/java/org/apache/spark/streaming/kafka010/JavaConsumerStrategySuite.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/connector/JavaSchemaRequiredDataSource.java (diff)
The file was modified common/unsafe/src/main/java/org/apache/spark/unsafe/array/ByteArrayMethods.java (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/connector/JavaSimpleDataSourceV2.java (diff)
The file was modified core/src/test/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorterSuite.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/connector/JavaReportStatisticsDataSource.java (diff)
The file was modified common/kvstore/src/test/java/org/apache/spark/util/kvstore/LevelDBBenchmark.java (diff)
Commit c0d1860f256eccd09a07584b8a77e6c60cc74c46 by gurwls223
[SPARK-36092][INFRA][BUILD][PYTHON] Migrate to GitHub Actions with Codecov from Jenkins

### What changes were proposed in this pull request?

This PR proposes to migrate the coverage report from Jenkins to GitHub Actions by setting up a daily cron job.

### Why are the changes needed?

For some background, currently PySpark code coverage is being reported in this specific Jenkins job: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/

Because of the security issue between the [Codecov service](https://app.codecov.io/gh/) and the Jenkins machines, we had to work around it by manually hosting a coverage site via GitHub Pages, see also https://spark-test.github.io/pyspark-coverage-site/ by the spark-test account (which is shared with only a subset of PMC members).

Since we now run the build via GitHub Actions, we can leverage [Codecov plugin](https://github.com/codecov/codecov-action), and remove the workaround we used.

### Does this PR introduce _any_ user-facing change?

Virtually no. The coverage site (UI) might change, but the information it holds should be the same.

### How was this patch tested?

I manually tested:
- Scheduled run: https://github.com/HyukjinKwon/spark/actions/runs/1082261484
- Coverage report: https://codecov.io/gh/HyukjinKwon/spark/tree/73f0291a7df1eda98045cd759303aac1c2a9c929/python/pyspark
- Run against a PR: https://github.com/HyukjinKwon/spark/actions/runs/1082367175

Closes #33591 from HyukjinKwon/SPARK-36092.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: c0d1860)
The file was modified README.md (diff)
The file was modified python/pyspark/mllib/tests/test_streaming_algorithms.py (diff)
The file was modified python/pyspark/tests/test_context.py (diff)
The file was modified python/pyspark/tests/test_worker.py (diff)
The file was modified python/run-tests-with-coverage (diff)
The file was modified .github/workflows/build_and_test.yml (diff)
The file was modified python/pyspark/streaming/tests/test_dstream.py (diff)
The file was modified python/test_coverage/coverage_daemon.py (diff)
The file was modified dev/requirements.txt (diff)
The file was modified dev/run-tests.py (diff)
The file was modified .gitignore (diff)
The file was modified python/test_coverage/sitecustomize.py (diff)
Commit 22c49226f70f76caa268602d78f6c23e40aeae09 by gurwls223
[SPARK-36362][CORE][SQL][FOLLOWUP] Fix java linter errors

### What changes were proposed in this pull request?

This is a follow-up of #33594 to fix the Java linter error.

### Why are the changes needed?

To recover GitHub Action.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GitHub Action.

Closes #33601 from dongjoon-hyun/SPARK-36362.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 22c4922)
The file was modified common/kvstore/src/test/java/org/apache/spark/util/kvstore/DBIteratorSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/StreamSuite.java (diff)
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaColumnExpressionSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/RpcIntegrationSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/util/NettyMemoryMetricsSuite.java (diff)
The file was modified common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/crypto/AuthEngineSuite.java (diff)
The file was modified common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java (diff)
Commit 2a18f829409426ce29dc553bd35a27d98979ea1c by mridulatgmail.com
[SPARK-32919][FOLLOW-UP] Filter out driver in the merger locations and fix the return type of RemoveShufflePushMergerLocations

### What changes were proposed in this pull request?

SPARK-32919 added support for fetching shuffle push merger locations with push-based shuffle. This change filters out the driver host from the shuffle push merger locations, since the driver does not participate in the shuffle merge, and also fixes a ClassCastException in RemoveShufflePushMergerLocations.
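
The first half of the fix can be sketched in a few lines (illustrative names, not Spark's actual code): the driver's host is simply excluded from the candidate merger locations, since the driver never hosts merged shuffle data.

```python
# Sketch: exclude the driver host when selecting shuffle push merger locations.
def filter_merger_locations(candidate_hosts, driver_host):
    return [h for h in candidate_hosts if h != driver_host]

print(filter_merger_locations(["exec-1", "driver-host", "exec-2"], "driver-host"))
# ['exec-1', 'exec-2']
```
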
### Why are the changes needed?

These are bug fixes: the driver should never be selected as a merger location, and RemoveShufflePushMergerLocations returned the wrong type, causing a ClassCastException.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added unit tests.

Closes #33425 from venkata91/SPARK-32919-follow-up.

Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(commit: 2a18f82)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala (diff)
Commit c039d998128dd0dab27f43e7de083a71b9d1cfcf by mridulatgmail.com
[SPARK-32923][CORE][SHUFFLE] Handle indeterminate stage retries for push-based shuffle

### What changes were proposed in this pull request?
[SPARK-23243](https://issues.apache.org/jira/browse/SPARK-23243) and [SPARK-25341](https://issues.apache.org/jira/browse/SPARK-25341) addressed stage retries for indeterminate stages involving operations like repartition. This PR addresses the same issues in the context of push-based shuffle. Currently there is no way to distinguish the current execution of a stage for a shuffle ID, hence the changes explained below are necessary.

Core changes are summarized as follows:

1. Introduce a new variable `shuffleMergeId` in `ShuffleDependency`, a monotonically increasing value tracking the temporal ordering of executions of <stage-id, stage-attempt-id> for a shuffle ID.
2. Correspondingly, change the push-based shuffle protocol layer in `MergedShuffleFileManager` and `BlockStoreClient` to pass the `shuffleMergeId`, in order to keep track of the shuffle output in separate files on the shuffle service side.
3. `DAGScheduler` increments the `shuffleMergeId` tracked in `ShuffleDependency` when an indeterminate stage is re-executed.
4. A deterministic stage has `shuffleMergeId` set to 0, as no special handling is needed in this case; an indeterminate stage has `shuffleMergeId` starting from 1.
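
The `shuffleMergeId` bookkeeping described above can be sketched as follows (class and method names are illustrative, not Spark's actual fields): a deterministic stage keeps id 0 forever, while each re-execution of an indeterminate stage bumps the id so the shuffle service stores its merged output separately.

```python
# Sketch of shuffleMergeId tracking for push-based shuffle stage retries.
class ShuffleDependencySketch:
    def __init__(self, deterministic):
        self.deterministic = deterministic
        self.shuffle_merge_id = 0  # deterministic stages stay at 0

    def new_attempt(self):
        # DAGScheduler-style bookkeeping: only indeterminate stages get a new id.
        if not self.deterministic:
            self.shuffle_merge_id += 1
        return self.shuffle_merge_id

indeterminate = ShuffleDependencySketch(deterministic=False)
print(indeterminate.new_attempt())  # 1  (first execution of an indeterminate stage)
print(indeterminate.new_attempt())  # 2  (a retry invalidates the earlier merge)

deterministic = ShuffleDependencySketch(deterministic=True)
print(deterministic.new_attempt())  # 0  (no special handling needed)
```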

### Why are the changes needed?

New protocol changes are needed due to the reasons explained above.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?
Added new unit tests in `RemoteBlockPushResolverSuite, DAGSchedulerSuite, BlockIdSuite, ErrorHandlerSuite`

Closes #33034 from venkata91/SPARK-32923.

Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(commit: c039d99)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ExternalBlockHandlerSuite.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/MergeStatuses.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/MergedBlocksMetaListener.java (diff)
The file was modified core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockPusher.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/FetchShuffleBlockChunks.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/FinalizeShuffleMerge.java (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/OneForOneBlockPusherSuite.java (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/ShuffleBlockPusherSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/MapOutputTracker.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/MergeStatus.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/Dependency.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/sort/IndexShuffleBlockResolverSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/BlockStoreClient.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/ErrorHandlerSuite.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/PushBlockStream.java (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/protocol/FetchShuffleBlockChunksSuite.java (diff)
The file was modified common/network-common/src/test/java/org/apache/spark/network/TransportRequestHandlerSuite.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java (diff)
The file was modified common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockIdSuite.scala (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/OneForOneBlockFetcherSuite.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/MergedShuffleFileManager.java (diff)
The file was modified common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockPusher.java (diff)
The file was modified core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockResolver.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManager.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala (diff)
The file was modified common/network-common/src/main/java/org/apache/spark/network/protocol/MergedBlockMetaRequest.java (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/PushBasedFetchHelper.scala (diff)
The file was modified common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RemoteBlockPushResolverSuite.java (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockId.scala (diff)
Commit 3b713e7f6189dfe1c5bbb1a527bf1266bde69f69 by wenchen
[SPARK-36372][SQL] v2 ALTER TABLE ADD COLUMNS should check duplicates for the user specified columns

### What changes were proposed in this pull request?

Currently, v2 ALTER TABLE ADD COLUMNS does not check duplicates for the user specified columns. For example,
```
spark.sql(s"CREATE TABLE $t (id int) USING $v2Format")
spark.sql(s"ALTER TABLE $t ADD COLUMNS (data string, data string)")
```
doesn't fail the analysis; it's up to the catalog implementation to handle the duplication. For the v1 command, the duplication is checked before invoking the catalog.
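
The check being added can be sketched in a few lines (not Spark's actual implementation): detect repeated names among the user-specified columns, honoring the analyzer's case-sensitivity setting.

```python
# Sketch of a duplicate-column check over user-specified ADD COLUMNS names.
def find_duplicates(columns, case_sensitive=False):
    seen, dups = set(), []
    for name in columns:
        key = name if case_sensitive else name.lower()
        if key in seen:
            dups.append(name)
        seen.add(key)
    return dups

print(find_duplicates(["data", "data"]))                       # ['data']
print(find_duplicates(["data", "DATA"]))                       # ['DATA']
print(find_duplicates(["data", "DATA"], case_sensitive=True))  # []
```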

### Why are the changes needed?

To check the duplicate columns during analysis and be consistent with v1 command.

### Does this PR introduce _any_ user-facing change?

Yes, the above command now fails with the following:
```
org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the user specified columns: `data`
```

### How was this patch tested?

Added new unit tests

Closes #33600 from imback82/alter_add_duplicate_columns.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3b713e7)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/V2CommandsCaseSensitivitySuite.scala (diff)
Commit 951efb80856e2a92ba3690886c95643567dae9d0 by gengliang
[SPARK-36237][UI][SQL] Attach and start handler after application started in UI

### What changes were proposed in this pull request?
When using Prometheus to fetch metrics at a defined interval, data is always pulled through the REST API.
If a pull happens after the driver's SparkUI port is bound but before the application is fully started, the Spark driver throws many NoSuchElementException errors such as:
```
21/07/19 04:53:37 INFO Client: Preparing resources for our AM container
21/07/19 04:53:37 INFO Client: Uploading resource hdfs://tl3/packages/jars/spark-2.4-archive.tar.gz -> hdfs://R2/user/xiaoke.zhou/.sparkStaging/application_1624456325569_7143920/spark-2.4-archive.tar.gz
21/07/19 04:53:37 WARN JettyUtils: GET /jobs/ failed: java.util.NoSuchElementException: Failed to get the application information. If you are starting up Spark, please wait a while until it's ready.
java.util.NoSuchElementException: Failed to get the application information. If you are starting up Spark, please wait a while until it's ready.
at org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:43)
at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:90)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:513)
at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.spark_project.jetty.server.Server.handle(Server.java:539)
at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
```

Checking the original PR: the server must start and bind its port before the taskScheduler starts in client mode, because the web URL is needed to register the application master. But because the handlers are attached and started at that point, the REST API is exposed to users while the application is not yet started, so such errors are always returned.

In this PR, to start the SparkUI, Spark first starts the Jetty server to bind the address.
After the Spark application is fully started, `attachAllHandlers` is called to attach and start all existing handlers on the Jetty server.
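
The new startup ordering can be sketched without Jetty (hypothetical names; `attach_all_handlers` mirrors the `attachAllHandlers` mentioned above): bind early so the web URL exists, attach handlers only once the application has started, and answer anything in between with a placeholder.

```python
# Sketch of the bind-first, attach-handlers-later UI startup ordering.
class WebUISketch:
    def __init__(self):
        self.bound = False
        self.handlers_attached = False

    def bind(self):
        # Happens early: client mode needs the URL to register the AM.
        self.bound = True

    def attach_all_handlers(self):
        # Called only after the Spark application has fully started.
        self.handlers_attached = True

    def handle(self, path):
        if not self.handlers_attached:
            return "Spark is starting up. Please wait a while until it's ready."
        return "200 OK: " + path

ui = WebUISketch()
ui.bind()
print(ui.handle("/jobs/"))  # Spark is starting up. Please wait a while until it's ready.
ui.attach_all_handlers()
print(ui.handle("/jobs/"))  # 200 OK: /jobs/
```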

### Why are the changes needed?
Improve the SparkUI startup logic.

### Does this PR introduce _any_ user-facing change?
Before the Spark application is fully started, all URL requests return
```
Spark is starting up. Please wait a while until it's ready.
```
on the page.

### How was this patch tested?
Existing tests.

Between binding the address and the Spark application finishing startup, all requests show:
![image](https://user-images.githubusercontent.com/46485123/127124316-0ec637c5-eeab-4e5e-973b-8fec4f928a3c.png)

Closes #33457 from AngersZhuuuu/SPARK-36237.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: 951efb8)
The file was modified core/src/main/scala/org/apache/spark/SparkContext.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ui/SparkUI.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ui/UISuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/TestUtils.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ui/WebUI.scala (diff)