Changes

Summary

  1. Revert "[SPARK-32850][CORE] Simplify the RPC message flow of decommission" (commit: 0c66813) (details)
  2. [SPARK-32946][R][SQL] Add withColumn to SparkR (commit: 1ad1f71) (details)
  3. [SPARK-32867][SQL] When explain, HiveTableRelation show limited message (commit: c336ddf) (details)
  4. [SPARK-32886][WEBUI] fix 'undefined' link in event timeline view (commit: d01594e) (details)
  5. [SPARK-32312][DOC][FOLLOWUP] Fix the minimum version of PyArrow in the installation guide (commit: 5440ea8) (details)
  6. [SPARK-32951][SQL] Foldable propagation from Aggregate (commit: f03c035) (details)
  7. [SPARK-32949][R][SQL] Add timestamp_seconds to SparkR (commit: 3118c22) (details)
  8. [SPARK-32955][DOCS] An item in the navigation bar in the WebUI has a wrong link (commit: 790d9ef) (details)
Commit 0c66813ad9867e366689b47c81bdd8a94ac17828 by wenchen
Revert "[SPARK-32850][CORE] Simplify the RPC message flow of
decommission"
This reverts commit 56ae95053df4afa9764df3f1d88f300896ca0183.
(commit: 0c66813)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/master/Master.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala (diff)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManagerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/DecommissionWorkerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerStorageEndpoint.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/client/AppClientSuite.scala (diff)
Commit 1ad1f7153592344d3b2adc1196ffe8cc921e0292 by gurwls223
[SPARK-32946][R][SQL] Add withColumn to SparkR
### What changes were proposed in this pull request?
This PR adds the `withColumn` function to SparkR.
### Why are the changes needed?
### Does this PR introduce _any_ user-facing change?
Yes, a new function, equivalent to its Scala and PySpark counterparts, is
exposed to the end user.
### How was this patch tested?
New unit tests added.
Closes #29814 from zero323/SPARK-32946.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 1ad1f71)
The file was modified R/pkg/R/generics.R (diff)
The file was modified R/pkg/R/column.R (diff)
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R (diff)
The file was modified R/pkg/NAMESPACE (diff)
Commit c336ddfdb81dd5c27fd109d62138dc129a02c30b by wenchen
[SPARK-32867][SQL] When explain, HiveTableRelation show limited message
### What changes were proposed in this pull request?
Currently, when explaining a SQL plan that contains a HiveTableRelation,
so much information about the relation's pruned partitions is shown that
the plan becomes hard to read. This PR makes that information simpler.
Before:
![image](https://user-images.githubusercontent.com/46485123/93012078-aeeca080-f5cf-11ea-9286-f5c15eadbee3.png)
For the following unit test:
```
test("Make HiveTableScanExec message simple") {
  withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") {
    withTable("df") {
      spark.range(30)
        .select(col("id"), col("id").as("k"))
        .write
        .partitionBy("k")
        .format("hive")
        .mode("overwrite")
        .saveAsTable("df")
      val df = sql("SELECT df.id, df.k FROM df WHERE df.k < 2")
      df.explain(true)
    }
  }
}
```
After this PR, the explain output shows:
```
== Parsed Logical Plan ==
'Project ['df.id, 'df.k]
+- 'Filter ('df.k < 2)
   +- 'UnresolvedRelation [df], []

== Analyzed Logical Plan ==
id: bigint, k: bigint
Project [id#11L, k#12L]
+- Filter (k#12L < cast(2 as bigint))
   +- SubqueryAlias spark_catalog.default.df
      +- HiveTableRelation [`default`.`df`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#11L], Partition Cols: [k#12L]]

== Optimized Logical Plan ==
Filter (isnotnull(k#12L) AND (k#12L < 2))
+- HiveTableRelation [`default`.`df`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#11L], Partition Cols: [k#12L], Pruned Partitions: [(k=0), (k=1)]]

== Physical Plan ==
Scan hive default.df [id#11L, k#12L], HiveTableRelation [`default`.`df`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [id#11L], Partition Cols: [k#12L], Pruned Partitions: [(k=0), (k=1)]], [isnotnull(k#12L), (k#12L < 2)]
```
This PR adjusts `HiveTableRelation`'s `simpleString` method to avoid
showing too much unnecessary information in the explain plan. Compared to
what we had before, the detailed metadata of each partition is dropped and
only the partition spec is retained, which still shows that each partition
was pruned. The detailed information is rarely needed in the plan; it can
be obtained with a DESC EXTENDED statement instead.
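The gist of the change can be illustrated with a small, self-contained Scala sketch (the `PartitionMeta` case class and `render` helper below are hypothetical simplifications, not Spark's actual classes): only the partition spec, e.g. `(k=0)`, is kept, and the rest of each partition's metadata is dropped.

```scala
// Hypothetical model of a partition's metadata (not Spark's real API).
case class PartitionMeta(spec: Map[String, String], location: String, serde: String)

object PrunedPartitionString {
  // Render only each partition's spec, dropping location/serde details.
  def render(parts: Seq[PartitionMeta]): String =
    parts
      .map(p => p.spec.map { case (k, v) => s"$k=$v" }.mkString("(", ", ", ")"))
      .mkString("Pruned Partitions: [", ", ", "]")
}

object Demo extends App {
  val parts = Seq(
    PartitionMeta(Map("k" -> "0"), "hdfs://warehouse/df/k=0", "LazySimpleSerDe"),
    PartitionMeta(Map("k" -> "1"), "hdfs://warehouse/df/k=1", "LazySimpleSerDe")
  )
  println(PrunedPartitionString.render(parts))
  // Pruned Partitions: [(k=0), (k=1)]
}
```

This mirrors the `Pruned Partitions: [(k=0), (k=1)]` fragment in the explain output above while omitting per-partition storage details.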
### Why are the changes needed?
Make plans containing HiveTableRelation more readable.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added the unit test shown above.
Closes #29739 from AngersZhuuuu/HiveTableScan-meta-location-info.
Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: c336ddf)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala (diff)
Commit d01594e8d186e63a6c3ce361e756565e830d5237 by srowen
[SPARK-32886][WEBUI] fix 'undefined' link in event timeline view
### What changes were proposed in this pull request?
Fix the ".../jobs/undefined" link from "Event Timeline" in the jobs page.
The job page link in the "Event Timeline" view is constructed by fetching
the job page link defined in the job list below it. When the job count
exceeds the page size of the job table, only links of jobs on the current
page can be fetched, so the other jobs' links are 'undefined' and their
entries in "Event Timeline" are broken: they redirect to a weird URL like
".../jobs/undefined". This PR fixes this wrong-link issue; with it, job
links in the "Event Timeline" view always redirect to the correct job
page.
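The idea of the fix can be sketched as follows (the actual change is JavaScript in `timeline-view.js`; the Scala below, including the `JobLink` object and its method names, is purely a hypothetical illustration): build the job page link from the job id itself instead of looking it up in the paginated job table, which only contains the jobs on the current page.

```scala
// Hypothetical illustration: a link derived from the id is always defined,
// while a lookup in the paginated table fails for jobs off the current page.
object JobLink {
  // Old approach: scrape the link from the (paginated) job table.
  def fromTable(visibleJobs: Map[Int, String], jobId: Int): String =
    visibleJobs.getOrElse(jobId, "undefined") // yields ".../jobs/undefined"

  // Fixed approach: derive the link from the job id directly.
  def fromId(baseUri: String, jobId: Int): String =
    s"$baseUri/jobs/job/?id=$jobId"
}
```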
### Why are the changes needed?
Wrong link (".../jobs/undefined") in "Event Timeline" of the jobs page.
For example, the first job in the page below is not in the table beneath
it, as the job count (116) exceeds the page size (100). When clicking its
item in "Event Timeline", the page is redirected to ".../jobs/undefined",
which is wrong. Links in "Event Timeline" should always be correct.
![undefinedlink](https://user-images.githubusercontent.com/10524738/93184779-83fa6d80-f6f1-11ea-8a80-1a304ca9cbb2.JPG)
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Manually tested.
Closes #29757 from zhli1142015/fix-link-event-timeline-view.
Authored-by: Zhen Li <zhli@microsoft.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: d01594e)
The file was modified core/src/main/resources/org/apache/spark/ui/static/timeline-view.js (diff)
Commit 5440ea84eeb2008d70cf890f0e3765167c2b6a62 by gurwls223
[SPARK-32312][DOC][FOLLOWUP] Fix the minimum version of PyArrow in the
installation guide
### What changes were proposed in this pull request?
Now that the minimum version of PyArrow is `1.0.0`, we should update the
version in the installation guide.
### Why are the changes needed?
The minimum version of PyArrow was upgraded to `1.0.0`.
### Does this PR introduce _any_ user-facing change?
Users see the correct minimum version in the installation guide.
### How was this patch tested?
N/A
Closes #29829 from ueshin/issues/SPARK-32312/doc.
Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 5440ea8)
The file was modified python/docs/source/getting_started/install.rst (diff)
Commit f03c03576a34e6888da6eeb870dae1f6189b62c1 by dhyun
[SPARK-32951][SQL] Foldable propagation from Aggregate
### What changes were proposed in this pull request?
This PR adds foldable propagation from `Aggregate` as per:
https://github.com/apache/spark/pull/29771#discussion_r490412031
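The idea behind the rule can be sketched with a toy expression tree (the ADT and `FoldablePropagation` object below are hypothetical simplifications, not Catalyst's real classes): literal aliases in an aggregate's output list are collected and substituted for attribute references higher up in the plan.

```scala
// Toy stand-in for Catalyst expressions (hypothetical, heavily simplified).
sealed trait Expr
case class Literal(value: Int) extends Expr
case class Attr(name: String) extends Expr
case class Alias(child: Expr, name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object FoldablePropagation {
  // Collect output attributes that an Aggregate binds to foldable (literal) values.
  def foldables(aggregateExpressions: Seq[Expr]): Map[String, Literal] =
    aggregateExpressions.collect { case Alias(lit: Literal, name) => name -> lit }.toMap

  // Replace references to those attributes in a parent expression with the literals.
  def propagate(expr: Expr, env: Map[String, Literal]): Expr = expr match {
    case Attr(name) if env.contains(name) => env(name)
    case Alias(child, name)               => Alias(propagate(child, env), name)
    case Add(l, r)                        => Add(propagate(l, env), propagate(r, env))
    case other                            => other
  }
}
```

For instance, if the aggregate outputs `Alias(Literal(1), "c")`, a parent expression `Add(Attr("c"), Attr("x"))` can be rewritten to `Add(Literal(1), Attr("x"))`, which later rules can simplify further.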
### Why are the changes needed?
This is an improvement, as `Aggregate`'s `aggregateExpressions` can
contain foldables that can be propagated up.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New UT.
Closes #29816 from
peter-toth/SPARK-32951-foldable-propagation-from-aggregate.
Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: f03c035)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14b/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q41/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a/simplified.txt (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FoldablePropagationSuite.scala (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14a/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q41.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q41.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14b.sf100/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14b.sf100/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q14b/simplified.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q41/explain.txt (diff)
The file was modified sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q14a.sf100/simplified.txt (diff)
Commit 3118c220f919ba185a57abfbe55eac1822c89a52 by dhyun
[SPARK-32949][R][SQL] Add timestamp_seconds to SparkR
### What changes were proposed in this pull request?
This PR adds an R wrapper for the `timestamp_seconds` function.
### Why are the changes needed?
Feature parity.
### Does this PR introduce _any_ user-facing change?
Yes, it adds a new R function.
### How was this patch tested?
New unit tests.
Closes #29822 from zero323/SPARK-32949.
Authored-by: zero323 <mszymkiewicz@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 3118c22)
The file was modified R/pkg/R/functions.R (diff)
The file was modified R/pkg/R/generics.R (diff)
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R (diff)
The file was modified R/pkg/NAMESPACE (diff)
Commit 790d9ef2d3a90388ef3c36d5ae47b2fe369a83ba by gurwls223
[SPARK-32955][DOCS] An item in the navigation bar in the WebUI has a
wrong link
### What changes were proposed in this pull request?
This PR fixes a link in `_layouts/global.html`. The `More` item in the
navigation bar in the WebUI links to `api.html`, but that link seems to be
wrong. This PR also removes `api.md`, because neither it nor the
`api.html` generated from it is referred to from anywhere.
### Why are the changes needed?
Fix the wrong link.
### Does this PR introduce _any_ user-facing change?
Yes. "More" item no longer links to `api.html`.
### How was this patch tested?
Ran `SKIP_API=1 jekyll build` and confirmed that the item no longer links
to `api.html`. I also confirmed that `api.md` and `api.html` are no longer
referred to from anywhere, using the following command:
```
$ grep -Erl "api\.(html|md)" docs
```
Closes #29821 from sarutak/fix-api-doc-link.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 790d9ef)
The file was modified docs/_layouts/global.html (diff)
The file was removed docs/api.md