Changes

Summary

  1. [SPARK-36978][SQL] InferConstraints rule should create IsNotNull constraints on the accessed nested field instead of the root nested type
  2. [SPARK-36965][PYTHON] Extend python test runner by logging out the temp output files
  3. [SPARK-35925][SQL] Support DayTimeIntervalType in width-bucket function
Commit 0bba90b8f3276c8a8fedc7ef5d523eb1ce2246a7 by wenchen
[SPARK-36978][SQL] InferConstraints rule should create IsNotNull constraints on the accessed nested field instead of the root nested type

### What changes were proposed in this pull request?
The PR modifies `IsNotNull` constraint generation to generate constraints on the referenced nested field instead of generating a constraint on the top level nested type. See the following section for an example.

### Why are the changes needed?
The [InferFiltersFromConstraints](https://github.com/apache/spark/blob/05c0fa573881b49d8ead9a5e16071190e5841e1b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L1206) optimization rule generates `IsNotNull` constraints corresponding to null-intolerant predicates. The `IsNotNull` constraints are generated on the attribute inside the corresponding predicate.
For example, a predicate `a > 0` on an integer column `a` results in the constraint `IsNotNull(a)`. On the other hand, a predicate on a nested int column `structCol.b`, where `structCol` is a struct column, results in the constraint `IsNotNull(structCol)`.

Generating constraints on the root-level nested type is extremely conservative, as it could lead to materialization of the entire struct. The constraint should instead be generated on the nested field referenced by the predicate. In the above example, the constraint should be `IsNotNull(structCol.b)` instead of `IsNotNull(structCol)`.

The new constraints also create opportunities for nested pruning. Currently, the `IsNotNull(structCol)` constraint precludes pruning of `structCol`, whereas the constraint `IsNotNull(structCol.b)` could allow the unreferenced fields of `structCol` to be pruned.
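
The difference between the two behaviours can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's Catalyst API: the `GreaterThan` class and the `infer_is_not_null` function are hypothetical names introduced for illustration, and the real rule operates on Catalyst expressions.

```python
# Toy model only: this is not Spark's Catalyst API.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GreaterThan:
    """A null-intolerant predicate `column_path > literal` (hypothetical class)."""
    column_path: Tuple[str, ...]  # ("structCol", "b") stands for structCol.b
    literal: int


def infer_is_not_null(pred: GreaterThan, nested: bool = True) -> str:
    """Return the IsNotNull constraint implied by `pred`, rendered as a string.

    nested=False mimics the old behaviour (constraint on the root column);
    nested=True mimics the new behaviour (constraint on the accessed field).
    """
    path = pred.column_path if nested else pred.column_path[:1]
    return f"IsNotNull({'.'.join(path)})"


flat = GreaterThan(("a",), 0)
nested_pred = GreaterThan(("structCol", "b"), 0)

print(infer_is_not_null(flat))                       # IsNotNull(a)
print(infer_is_not_null(nested_pred, nested=False))  # IsNotNull(structCol)
print(infer_is_not_null(nested_pred))                # IsNotNull(structCol.b)
```

For a top-level column both modes agree; only predicates on nested fields produce a narrower constraint.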

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added test to `InferFiltersFromConstraintsSuite`.

Closes #34263 from utkarsh39/infer-nested-constraints.

Authored-by: Utkarsh <utkarsh.agarwal@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala
Commit c29bb0207754c2018856bda842c6bf7a34d4e93b by piros.attila.zsolt
[SPARK-36965][PYTHON] Extend python test runner by logging out the temp output files

### What changes were proposed in this pull request?

Extending the python test runner by logging out the temp output files.

### Why are the changes needed?

I was running a Python test that was extremely slow, and I was surprised that `unit-tests.log` had not even been created. Looking into the code, I found that the tests can be executed in parallel and that each one has its own temporary output file, which is appended to `unit-tests.log` only when the test finishes with a failure (after acquiring a lock to avoid parallel writes to `unit-tests.log`).

To avoid this confusion, it makes sense to log the paths of those temporary output files. This way, when a test gets stuck, we can peek into its log file.
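
A minimal sketch of the idea, assuming a helper that creates the per-test temporary file and prints its path when the test starts. The function name `start_test` and the prefix scheme are hypothetical and are not taken from `python/run-tests.py`; only the standard-library `tempfile` and `os` calls are real.

```python
import os
import tempfile


def start_test(pyspark_python: str, test_name: str) -> str:
    """Create a per-test temp log file and announce its path up front.

    Hypothetical helper: the real runner's internals may differ.
    """
    # Build a readable file name prefix from the interpreter path and test name.
    prefix = f"{pyspark_python}__{test_name}__".replace("/", "_").lstrip("_")
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".log")
    os.close(fd)  # the test process would reopen the file for writing
    print(f"Starting test({pyspark_python}): {test_name} (temp output: {path})")
    return path


log_path = start_test("/usr/local/bin/python3", "pyspark.ml.tests.test_base")
# While the test runs, its output can be followed with e.g. `tail -f <log_path>`.
os.remove(log_path)  # clean up the file created by this demonstration
```

Printing the path before the test runs is the key point: the location is known even if the test never finishes.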

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I ran the Python tests:
```
./python/run-tests
Running PySpark tests. Output is in /Users/attilazsoltpiros/git/attilapiros/spark/python/unit-tests.log
Will test against the following Python executables: ['/usr/local/bin/python3']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-pandas', 'pyspark-pandas-slow', 'pyspark-resource', 'pyspark-sql', 'pyspark-streaming']
/usr/local/bin/python3 python_implementation is CPython
/usr/local/bin/python3 version is: Python 3.9.7
Starting test(/usr/local/bin/python3): pyspark.ml.tests.test_feature (temp output: /tmp/usr_local_bin_python3__pyspark.ml.tests.test_feature__yc5_5mjk.log)
Starting test(/usr/local/bin/python3): pyspark.ml.tests.test_algorithms (temp output: /tmp/usr_local_bin_python3__pyspark.ml.tests.test_algorithms__icc6xxai.log)
Starting test(/usr/local/bin/python3): pyspark.ml.tests.test_base (temp output: /tmp/usr_local_bin_python3__pyspark.ml.tests.test_base__4m6xyiv5.log)
Starting test(/usr/local/bin/python3): pyspark.ml.tests.test_evaluation (temp output: /tmp/usr_local_bin_python3__pyspark.ml.tests.test_evaluation__fkzjlfmm.log)
Finished test(/usr/local/bin/python3): pyspark.ml.tests.test_base (16s)
Starting test(/usr/local/bin/python3): pyspark.ml.tests.test_image (temp output: /tmp/usr_local_bin_python3__pyspark.ml.tests.test_image__iuckk_c0.log)
Finished test(/usr/local/bin/python3): pyspark.ml.tests.test_evaluation (20s)
Starting test(/usr/local/bin/python3): pyspark.ml.tests.test_linalg (temp output: /tmp/usr_local_bin_python3__pyspark.ml.tests.test_linalg__3tncana4.log)
...
```

Closes #34233 from attilapiros/temp_output_at_py_tests.

Authored-by: attilapiros <piros.attila.zsolt@gmail.com>
Signed-off-by: attilapiros <piros.attila.zsolt@gmail.com>
The file was modified: python/run-tests.py
Commit 21fa3ce1650543d5a087266be1925eb495bf2ad7 by max.gekk
[SPARK-35925][SQL] Support DayTimeIntervalType in width-bucket function

### What changes were proposed in this pull request?
Add support for `DayTimeIntervalType` in the `width_bucket` function.

### Why are the changes needed?
[SPARK-35925](https://issues.apache.org/jira/browse/SPARK-35925)
1. The `WIDTH_BUCKET` function assigns values to buckets (individual segments) in an equiwidth histogram.
2. `DayTimeIntervalType` is necessary as an input data type for `WIDTH_BUCKET`.
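
The bucketing in an equiwidth histogram can be sketched in plain Python, using `datetime.timedelta` as a stand-in for a day-time interval. This is an illustrative model under stated assumptions, not Spark's implementation: it handles only the ascending case (`lo < hi`) and returns `None` for any other bounds or a non-positive bucket count, which is a simplification made here.

```python
import math
from datetime import timedelta
from typing import Optional


def width_bucket(value: timedelta, lo: timedelta, hi: timedelta,
                 num_buckets: int) -> Optional[int]:
    """Assign `value` to one of `num_buckets` equal-width buckets over [lo, hi).

    Returns 0 below the range and num_buckets + 1 at or above it.
    Simplified sketch: only lo < hi is supported.
    """
    if num_buckets <= 0 or not lo < hi:
        return None
    if value < lo:
        return 0
    if value >= hi:
        return num_buckets + 1
    # timedelta / timedelta yields a float fraction of the range covered.
    fraction = (value - lo) / (hi - lo)
    return math.floor(fraction * num_buckets) + 1


zero, ten_days = timedelta(0), timedelta(days=10)
print(width_bucket(timedelta(days=5), zero, ten_days, 5))    # 3
print(width_bucket(timedelta(hours=12), zero, ten_days, 5))  # 1
print(width_bucket(timedelta(days=-1), zero, ten_days, 5))   # 0
print(width_bucket(ten_days, zero, ten_days, 5))             # 6
```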

### Does this PR introduce _any_ user-facing change?
Yes. The user can use `width_bucket` with DayTimeIntervalType.

### How was this patch tested?
Added unit tests.

Closes #34309 from Peng-Lei/SPARK-35925.

Authored-by: PengLei <peng.8lei@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
The file was modified: sql/core/src/test/resources/sql-tests/results/ansi/interval.sql.out
The file was modified: sql/core/src/test/resources/sql-tests/inputs/interval.sql
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MathExpressionsSuite.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala
The file was modified: sql/core/src/test/resources/sql-tests/results/interval.sql.out