Changes

Summary

  1. [SPARK-34622][SQL] Push down limit through Project with Join (commit: fcef237) (details)
  2. [SPARK-34894][CORE] Use 'io.connectionTimeout' as a hint instead of 'spark.network.timeout' for lost connections (commit: 5692aa0) (details)
  3. [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide (commit: 7158e7f) (details)
  4. [SPARK-34463][PYSPARK][DOCS] Document caveats of Arrow selfDestruct (commit: 1237124) (details)
  5. [SPARK-32985][SQL][FOLLOWUP] Rename createNonBucketedReadRDD and minor change in FileSourceScanExec (commit: 935aa8c) (details)
  6. [SPARK-33308][SQL] Refactor current grouping analytics (commit: a98dc60) (details)
  7. [SPARK-34884][SQL] Improve DPP evaluation to make filtering side must can broadcast by size or broadcast by hint (commit: de66fa6) (details)
  8. [SPARK-34899][SQL] Use origin plan if we can not coalesce shuffle partition (commit: 24d39a5) (details)
  9. [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on server side (commit: 9f065ff) (details)
  10. [SPARK-34906] Refactor TreeNode's children handling methods into specialized traits (commit: bd0990e) (details)
  11. [SPARK-34856][FOLLOWUP][SQL] Remove dead code from AnsiCast.typeCheckFailureMessage (commit: c902f77) (details)
Commit fcef2375a352d0172109f9877082e7a44407522b by yamamuro
[SPARK-34622][SQL] Push down limit through Project with Join

### What changes were proposed in this pull request?

There is a `Project` between `LocalLimit` and `Join` if the `Join`'s output does not match the `LocalLimit`'s output. This PR adds support for pushing the limit down through that `Project`. For example:
   ```scala
   spark.sql("create table t1(a int, b int, c int) using parquet")
   spark.sql("create table t2(x int, y int, z int) using parquet")
   spark.sql("select a from t1 left join t2 on a = x and b = y limit 5").explain("extended")
   ```

   Before this PR:
   ```
   == Optimized Logical Plan ==
   GlobalLimit 5
   +- LocalLimit 5
      +- Project [a#0]
         +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))
            :- Project [a#0, b#1]
            :  +- Relation default.t1[a#0,b#1,c#2] parquet
            +- Project [x#3, y#4]
               +- Filter (isnotnull(x#3) AND isnotnull(y#4))
                  +- Relation default.t2[x#3,y#4,z#5] parquet
   ```

   After this PR:
   ```
   == Optimized Logical Plan ==
   GlobalLimit 5
   +- LocalLimit 5
      +- Project [a#0]
         +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))
            :- LocalLimit 5
            :  +- Project [a#0, b#1]
            :     +- Relation default.t1[a#0,b#1,c#2] parquet
            +- Project [x#3, y#4]
               +- Filter (isnotnull(x#3) AND isnotnull(y#4))
                  +- Relation default.t2[x#3,y#4,z#5] parquet
   ```

### Why are the changes needed?

Improve limit push down to improve query performance.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #31739 from wangyum/SPARK-34622.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: fcef237)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushdownSuite.scala (diff)
Commit 5692aa0c2c5f1e042d3493ab20ba2efa63dadfe1 by yao
[SPARK-34894][CORE] Use 'io.connectionTimeout' as a hint instead of 'spark.network.timeout' for lost connections

### What changes were proposed in this pull request?

Currently, when a connection for a TransportClient is marked as idle and closed, we suggest that users adjust `spark.network.timeout` for all transport modules. Since a lot of timeout configs fall back to `spark.network.timeout`, this advice can be overkill; we should give a more targeted hint with `spark.${moduleName}.io.connectionTimeout`.
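
As an illustration, a minimal sketch (the module name `shuffle` is just an example; the per-module key pattern is the one described above) of setting the targeted timeout rather than the global fallback:

```scala
import org.apache.spark.sql.SparkSession

// Tune only the shuffle module's transport connections, instead of the
// global spark.network.timeout that many modules fall back to.
val spark = SparkSession.builder()
  .appName("connection-timeout-example")
  .config("spark.shuffle.io.connectionTimeout", "240s")
  .getOrCreate()
```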

### Why are the changes needed?

Better advice for overloaded network traffic cases.

### Does this PR introduce _any_ user-facing change?

Yes. When a connection is detected as a zombie and closed by Spark internally, users get a more targeted config hint for tuning their jobs.

### How was this patch tested?

Log and doc changes only. Passes Jenkins and GitHub Actions.

Closes #31990 from yaooqinn/SPARK-34894.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
(commit: 5692aa0)
The file was modified docs/configuration.md (diff)
The file was modified common/network-common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java (diff)
Commit 7158e7f986630d4f67fb49a206d408c5f4384991 by gurwls223
[SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

### What changes were proposed in this pull request?
Some of the `spark-submit` commands for running benchmarks in the user's guide are wrong; we can't use them to run the benchmarks successfully.

So the major change of this PR is to correct these commands. For example, to run a benchmark that inherits from `SqlBasedBenchmark`, we must specify `--jars <spark core test jar>,<spark catalyst test jar>`, because a `SqlBasedBenchmark`-based benchmark extends both `BenchmarkBase` (defined in the spark-core test jar) and `SQLHelper` (defined in the spark-catalyst test jar).

Another change in this PR removes the scalatest `Assertions` dependency from the benchmarks, because `scalatest-*.jar` is not in the distribution package and would be troublesome to use.

### Why are the changes needed?
Make sure benchmarks can run using the `spark-submit` commands described in the guide.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Used the corrected `spark-submit` commands to run the benchmarks successfully.

Closes #31995 from LuciferYang/fix-benchmark-guide.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 7158e7f)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/MiscBenchmark.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/UnsafeProjectionBenchmark.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/ObjectHashAggregateExecBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/OrcNestedSchemaPruningBenchmark.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertTableWithDynamicPartitionsBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ExtractBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertIntoHiveTableBenchmark.scala (diff)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/GenerateExecBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BloomFilterBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SortBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/PrimitiveArrayBenchmark.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/io/ZStandardBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BuiltInDataSourceWriteBenchmark.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/serializer/KryoBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/UpdateFieldsBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/ui/MetricsAggregationBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeBenchmark.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/rdd/CoalescedRDDBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/JoinBenchmark.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcReadBenchmark.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/serializer/KryoSerializerBenchmark.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/util/PropertiesCloneBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ParquetNestedSchemaPruningBenchmark.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/util/random/XORShiftRandomBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/CharVarcharBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RangeBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ParquetNestedPredicatePushDownBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InExpressionBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/MakeDateTimeBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/HashedRelationMetricsBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UDFBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/compression/CompressionSchemeBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/OrcV2NestedSchemaPruningBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala (diff)
Commit 1237124062a541082922c650d1f1766b691f47bb by gurwls223
[SPARK-34463][PYSPARK][DOCS] Document caveats of Arrow selfDestruct

### What changes were proposed in this pull request?

As a followup for #29818, document caveats of using the Arrow selfDestruct option in toPandas, which include:
- toPandas() may be slower;
- the resulting dataframe may not support some Pandas operations due to immutable backing arrays.
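
For reference, a minimal sketch of toggling the experimental flag this PR documents; the entry lives in `SQLConf` (hence settable from Scala) but is consumed by PySpark's `toPandas()`:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("arrow-self-destruct").getOrCreate()
// Experimental: lets toPandas() free Arrow memory eagerly, subject to the
// caveats above (possible slowdown, immutable backing arrays in pandas).
spark.conf.set("spark.sql.execution.arrow.pyspark.selfDestruct.enabled", "true")
```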

### Why are the changes needed?

This will hopefully reduce user confusion as with SPARK-34463.

### Does this PR introduce _any_ user-facing change?

Yes - documentation is updated and a config setting description is updated to clearly indicate the config is experimental.

### How was this patch tested?
This is a documentation-only change.

Closes #31738 from lidavidm/spark-34463.

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 1237124)
The file was modified python/docs/source/user_guide/arrow_pandas.rst (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit 935aa8c8db6824648483f26c2889c33030985259 by gurwls223
[SPARK-32985][SQL][FOLLOWUP] Rename createNonBucketedReadRDD and minor change in FileSourceScanExec

### What changes were proposed in this pull request?

This PR is a followup change to address comments in https://github.com/apache/spark/pull/31413#discussion_r603280965 and https://github.com/apache/spark/pull/31413#discussion_r603296475 . Minor change in `FileSourceScanExec`. No actual logic change here.

### Why are the changes needed?

Better readability.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

Closes #32000 from c21/bucket-scan.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 935aa8c)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala (diff)
Commit a98dc604087011b1035454250d397bd4b69c7507 by wenchen
[SPARK-33308][SQL] Refactor current grouping analytics

### What changes were proposed in this pull request?
As discussed in
https://github.com/apache/spark/pull/30145#discussion_r514728642
https://github.com/apache/spark/pull/30145#discussion_r514734648

We need to rewrite the current grouping analytics grammar to be as flexible as PostgreSQL's, to support subsequent development.
In PostgreSQL, the following are all supported:
```
select a, b, c, count(1) from t group by cube (a, b, c);
select a, b, c, count(1) from t group by cube(a, b, c);
select a, b, c, count(1) from t group by cube (a, b, c, (a, b), (a, b, c));
select a, b, c, count(1) from t group by rollup(a, b, c);
select a, b, c, count(1) from t group by rollup (a, b, c);
select a, b, c, count(1) from t group by rollup (a, b, c, (a, b), (a, b, c));
```
In this PR, we have done three things, as listed below; they will be split into different PRs:

- Refactor CUBE/ROLLUP (regarding them as ANTLR tokens in a parser)
- Refactor GROUPING SETS (the logical node -> a new expr)
- Support new syntax for CUBE/ROLLUP (e.g., GROUP BY CUBE ((a, b), (a, c))), as sketched below
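
A minimal sketch of that new syntax (table `t` is illustrative), runnable after this PR:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("grouping-analytics").getOrCreate()
spark.sql("CREATE TABLE t(a INT, b INT, c INT) USING parquet")
// CUBE over composite columns expands to the grouping sets
// (a, b, c), (a, b), (a, c) and the empty grouping set ().
spark.sql("SELECT a, b, c, count(1) FROM t GROUP BY CUBE ((a, b), (a, c))").show()
```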

### Why are the changes needed?
Rewrite the current grouping analytics grammar to be as flexible as PostgreSQL's, supporting subsequent development.

### Does this PR introduce _any_ user-facing change?
Yes. Users can write grouping analytics clauses as flexibly as in PostgreSQL.

### How was this patch tested?
Added unit tests.

Closes #30212 from AngersZhuuuu/refact-grouping-analytics.

Lead-authored-by: angerszhu <angers.zhu@gmail.com>
Co-authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: a98dc60)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveGroupingAnalyticsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-analytics.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/core/src/test/resources/sql-functions/sql-expression-schema.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/group-analytics.sql (diff)
The file was modified docs/sql-ref-syntax-qry-select-groupby.md (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
Commit de66fa63f988069435fd5564d216bb0f438b3aed by wenchen
[SPARK-34884][SQL] Improve DPP evaluation to make filtering side must can broadcast by size or broadcast by hint

### What changes were proposed in this pull request?

Improve dynamic partition pruning (DPP) evaluation to require that the filtering side can be broadcast, either by size or by hint.

### Why are the changes needed?

1. Fail fast if the filtering side cannot be broadcast by size or by hint.
2. We can safely disable `spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly`, as sketched below.
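
A minimal sketch of the second point (the config key is quoted from the item above):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dpp-config").getOrCreate()
// With the filtering side's broadcastability checked up front, this
// restriction can be lifted so DPP is not limited to reusing an
// existing broadcast exchange.
spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly", "false")
```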

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit test.

Closes #31984 from wangyum/SPARK-34884.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: de66fa6)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q72.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q72.sf100/explain.txt (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala (diff)
Commit 24d39a5ee234a92e3966062732f776c6d7dcd920 by wenchen
[SPARK-34899][SQL] Use origin plan if we can not coalesce shuffle partition

### What changes were proposed in this pull request?

Add a check of whether `CoalesceShufflePartitions` really coalesces the shuffle partition number.

### Why are the changes needed?

`CoalesceShufflePartitions` cannot coalesce in cases where the total size of the mappers' shuffle partitions is big enough. It is then confusing to use a `CustomShuffleReaderExec` that is marked as `coalesced` but has no effect on the partition number.
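
For context, a minimal sketch of the AQE settings involved (values are illustrative). When every mapper output partition already exceeds the advisory size, nothing merges, and after this PR the original shuffle read is kept instead of a no-op `coalesced` reader:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("aqe-coalesce").getOrCreate()
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
// Target size per coalesced partition; large shuffles may exceed it
// everywhere, in which case no coalescing is possible.
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64MB")
```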

### Does this PR introduce _any_ user-facing change?

Probably yes: the plan may change.

### How was this patch tested?

Added a test.

Closes #31994 from ulysses-you/SPARK-34899.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 24d39a5)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/CoalesceShufflePartitionsSuite.scala (diff)
Commit 9f065ff375bb2146f54dc1583b52beaf24064284 by tgraves
[SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on server side

### What changes were proposed in this pull request?
Add a new config, `spark.shuffle.service.name`, which allows for Spark applications to look for a YARN shuffle service which is defined at a name other than the default `spark_shuffle`.

Add a new config, `spark.yarn.shuffle.service.metrics.namespace`, which allows for configuring the namespace used when emitting metrics from the shuffle service into the NodeManager's `metrics2` system.

Add a new mechanism by which to override shuffle service configurations independently of the configurations in the NodeManager. When a resource `spark-shuffle-site.xml` is present on the classpath of the shuffle service, the configs present within it will be used to override the configs coming from `yarn-site.xml` (via the NodeManager).
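
As a client-side sketch (the service name reuses the `spark_shuffle_3.2.0` example below), pointing an application at a non-default shuffle service:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("alt-shuffle-service")
  .config("spark.shuffle.service.enabled", "true")
  // Must match the auxiliary service name registered in the NodeManager.
  .config("spark.shuffle.service.name", "spark_shuffle_3.2.0")
  .getOrCreate()
```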

### Why are the changes needed?
There are two use cases which can benefit from these changes.

One use case is to run multiple instances of the shuffle service side-by-side in the same NodeManager. This can be helpful, for example, when running a YARN cluster with a mixed workload of applications running multiple Spark versions, since a given version of the shuffle service is not always compatible with other versions of Spark (e.g. see SPARK-27780). With this PR, it is possible to run two shuffle services like `spark_shuffle` and `spark_shuffle_3.2.0`, one of which is "legacy" and one of which is for new applications. This is possible because YARN versions since 2.9.0 support the ability to run shuffle services within an isolated classloader (see YARN-4577), meaning multiple Spark versions can coexist.

Besides this, the separation of shuffle service configs into `spark-shuffle-site.xml` can be useful for administrators who want to change and/or deploy Spark shuffle service configurations independently of the configurations for the NodeManager (e.g., perhaps they are owned by two different teams).

### Does this PR introduce _any_ user-facing change?
Yes. There are two new configurations related to the external shuffle service, and a new mechanism which can optionally be used to configure the shuffle service. `docs/running-on-yarn.md` has been updated to provide user instructions; please see this guide for more details.

### How was this patch tested?
In addition to the new unit tests added, I have deployed this to a live YARN cluster and successfully deployed two Spark shuffle services simultaneously, one running a modified version of Spark 2.3.0 (which supports some of the newer shuffle protocols) and one running Spark 3.1.1. Spark applications of both versions are able to communicate with their respective shuffle services without issue.

Closes #31936 from xkrogen/xkrogen-SPARK-34828-shufflecompat-config-from-classpath.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: Thomas Graves <tgraves@apache.org>
(commit: 9f065ff)
The file was modified docs/running-on-yarn.md (diff)
The file was added resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnShuffleAlternateNameConfigSuite.scala
The file was modified common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala (diff)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnShuffleServiceSuite.scala (diff)
The file was modified resource-managers/yarn/src/test/scala/org/apache/spark/network/yarn/YarnTestAccessor.scala (diff)
The file was modified common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleServiceMetrics.java (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
Commit bd0990e3e813d17065c593fc74f383b494fe8146 by herman
[SPARK-34906] Refactor TreeNode's children handling methods into specialized traits

### What changes were proposed in this pull request?
Spark query plan node hierarchy has specialized traits (or abstract classes) for handling nodes with fixed number of children, for example `UnaryExpression`, `UnaryNode` and `UnaryExec` for representing an expression, a logical plan and a physical plan with only one child, respectively. This PR refactors the `TreeNode` hierarchy by extracting the children handling functionality into the following traits. `UnaryExpression` and other similar classes now extend the corresponding new trait:
```
trait LeafLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  override final def children: Seq[T] = Nil
}

trait UnaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  def child: T
  @transient override final lazy val children: Seq[T] = child :: Nil
}

trait BinaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  def left: T
  def right: T
  @transient override final lazy val children: Seq[T] = left :: right :: Nil
}

trait TernaryLike[T <: TreeNode[T]] { self: TreeNode[T] =>
  def first: T
  def second: T
  def third: T
  @transient override final lazy val children: Seq[T] = first :: second :: third :: Nil
}
```
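
To show the intended usage pattern, a minimal self-contained sketch (simplified stand-ins, not Spark's actual `TreeNode` hierarchy: no F-bounded type parameter, no self-types):

```scala
// Simplified stand-in for TreeNode: every node exposes its children.
abstract class Node { def children: Seq[Node] }

// Child handling lives once in a trait, not in every concrete class.
trait LeafLike extends Node {
  override final def children: Seq[Node] = Nil
}

trait UnaryLike extends Node {
  def child: Node
  override final lazy val children: Seq[Node] = child :: Nil
}

case class Lit(value: Int) extends Node with LeafLike
case class Negate(child: Node) extends Node with UnaryLike

object Demo extends App {
  println(Negate(Lit(3)).children) // List(Lit(3))
}
```

In the real traits, `children` is a `@transient lazy val`, so it is computed once per node and recomputed after deserialization rather than being serialized with the plan.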

This refactoring, which is part of a bigger effort to make tree transformations in Spark more efficient, has two benefits:
- It moves the children handling methods to a single place instead of spreading them across specific subclasses, which will help future optimizations of tree traversals.
- It allows mixing these traits into concrete node types that could not extend the previous classes. For example, expressions with one child that extend `AggregateFunction` cannot extend `UnaryExpression`, because `AggregateFunction` defines the `foldable` method as final while `UnaryExpression` defines it as non-final. With the new traits, we can mix `UnaryLike` directly into such concrete classes. Classes with more specific child handling will make tree traversal methods faster.

In this PR we have also updated many concrete node types to extend these traits to benefit from more specific child handling.

### Why are the changes needed?

This refactoring is part of a bigger effort to make tree transformations in Spark more efficient, as described above.

### Does this PR introduce _any_ user-facing change?

No, this is an internal refactoring.

### How was this patch tested?

This is a refactoring, passes existing tests.

Closes #31932 from dbaliafroozeh/FactorOutChildHandlnigIntoSeparateTraits.

Authored-by: Ali Afroozeh <ali.afroozeh@databricks.com>
Signed-off-by: herman <herman@databricks.com>
(commit: bd0990e)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PartitionTransforms.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/DynamicPruning.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Exec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/WriteToContinuousDataSource.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubExprEvaluationRuntime.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Corr.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/streaming/WriteToStreamStatement.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Max.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PivotFirst.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Product.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountIf.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Last.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Min.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/First.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/MaxByAndMinBy.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/bitwiseAggregates.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/streaming/WriteToStream.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/WriteToMicroBatchDataSource.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala (diff)
Commit c902f77b42d8d24d967c6208980734840535797d by srowen
[SPARK-34856][FOLLOWUP][SQL] Remove dead code from AnsiCast.typeCheckFailureMessage

### What changes were proposed in this pull request?

After https://github.com/apache/spark/pull/31954/, the Array type is allowed to be cast as String type, so the customized conversion failure message branch in `AnsiCast.typeCheckFailureMessage` can no longer be reached.
This PR removes that dead code.
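
As a small illustration of why the branch is dead, a hedged sketch: under ANSI mode this cast now succeeds instead of hitting the customized failure message:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ansi-cast").getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")
// After #31954, array-to-string casts are valid even under ANSI cast rules.
spark.sql("SELECT CAST(array(1, 2) AS STRING)").show()
```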

### Why are the changes needed?

Code clean up.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Just removing dead code.

Closes #32004 from gengliangwang/SPARK-34856-followup.

Authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: c902f77)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)