Pulse · apache/iceberg · GitHub

October 18, 2022 – October 25, 2022

Overview

49 Active pull requests

17 Active issues

31 Pull requests merged by 17 people

add Aggregate Expressions
#5961 merged Oct 25, 2022
Spark 3.2: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly
#6041 merged Oct 25, 2022
Spark 3.3: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly
#6026 merged Oct 24, 2022
API,Core: Move ScanReport to core module / extract TimerResult/CounterResult/ScanMetricsResult into own classes
#6037 merged Oct 24, 2022
Build: Bump mkdocs from 1.3.1 to 1.4.1 in /python
#6033 merged Oct 24, 2022
Core: Increase inferred column metrics limit to 100
#5916 merged Oct 23, 2022
Add section on semantic versioning and deprecations
#6032 merged Oct 23, 2022
Python: Implement S3V4RestSigner
#5969 merged Oct 21, 2022
Python: Implement select
#5966 merged Oct 21, 2022
Python: Visitor to convert Iceberg to PyArrow schema
#5949 merged Oct 21, 2022
Core: Rename TableTestBase.Assertions to not conflict with AssertJ Assertions
#6022 merged Oct 21, 2022
API: Update expression sanitization for relative dates and times
#5944 merged Oct 21, 2022
Replace and ban hamcrest usage
#6030 merged Oct 21, 2022
Replace Assert.fail usage with AssertJ fluent testing
#6029 merged Oct 21, 2022
Hive: Set the Table owner on table creation
#5763 merged Oct 21, 2022
docs:Add an example of CTAS with PARTITIONED BY (rebased, fix #3854)
#6020 merged Oct 21, 2022
Python: Split expressions base
#5987 merged Oct 21, 2022
Closes #5988 - Allow configuration of Hive MetastoreClient using Catalog properties
#5989 merged Oct 21, 2022
Core: Deprecate HTTPClientFactory / Allow configuring ObjectMapper for HTTPClient
#5998 merged Oct 21, 2022
Nessie: no longer push whole metadata JSON to Nessie
#5999 merged Oct 21, 2022
Core: Don't fail scan planning if REST metric reporting fails
#6023 merged Oct 20, 2022
[python_legacy] BOTO_STS_CLIENT lazy initialization
#5930 merged Oct 20, 2022
Core,Spark: Refactor to move "copy-on-write" and "merge-on-read" literals to constants
#6006 merged Oct 20, 2022
Python: Add support for providing SSL config for REST Catalog client.
#6019 merged Oct 20, 2022
Orc: Support row group bloom filters
#5313 merged Oct 20, 2022
Core: Parallelize the determining of reachable manifests during file cleanup
#5981 merged Oct 19, 2022
Spark 3.2: Split SparkScan and SparkBatch
#6014 merged Oct 19, 2022
Core: Fix TestSnapshotUtil time random disorder
#6015 merged Oct 19, 2022
Spark 3.2: Remove redundant imports in SparkScan
#6016 merged Oct 19, 2022
Spark 3.2: Add SparkChangelogTable
#6013 merged Oct 18, 2022
Spark 3.2: Use ScanTaskGroup methods when computing stats
#6011 merged Oct 18, 2022

18 Pull requests opened by 13 people

Core, API: Field metadata support
#6017 opened Oct 19, 2022
[Docs] Update migrate behaviour with respect to drop_table in spark-procedures docs.
#6025 opened Oct 20, 2022
Python: GlueCatalog Full Implementation
#6034 opened Oct 23, 2022
Build: Add gaborkaszab as a collaborator
#6036 opened Oct 24, 2022
Python: Fix Github pages
#6038 opened Oct 24, 2022
AWS: Add AwsKmsClient implementation
#6040 opened Oct 24, 2022
Core: Partial Update
#6043 opened Oct 25, 2022
[iceberg-hive-metastore] Add support for group ownership
#6045 opened Oct 25, 2022
Spark 3.1: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly
#6046 opened Oct 25, 2022
Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport
#6047 opened Oct 25, 2022
Docs: Fix broken link for puffin in Spec
#6048 opened Oct 25, 2022
Flink: Add Sink options to override the compression properties of the Table
#6049 opened Oct 25, 2022
Core: Improve collection handling in JsonUtil
#6051 opened Oct 25, 2022
Infra: Update slack invite link
#6052 opened Oct 25, 2022
Build: Let revapi compare API compatibility against apache-iceberg-1.0.0
#6053 opened Oct 25, 2022
Infra: Publish nightly build for Spark-3.3_2.13
#6054 opened Oct 25, 2022
Spark 3.3: Use separate scan during file filtering in copy-on-write operations
#6055 opened Oct 25, 2022
Parquet: Remove the row position since parquet row group has it natively
#6056 opened Oct 25, 2022

8 Issues closed by 3 people

Schema Evolution exception: too many data columns
#4542 closed Oct 25, 2022
HIVE_METASTORE_ERROR: Table storage descriptor is missing SerDe info - when query a view using an Iceberg table on Athena
#4549 closed Oct 25, 2022
Add min sequence number of referenced data files in a position-delete file's manifest entry
#3789 closed Oct 22, 2022
Docs: Add an example of CTAS with PARTITIONED BY
#3854 closed Oct 21, 2022
Allow configuration of HiveMetastoreClient using Catalog Properties
#5988 closed Oct 21, 2022
Cannot delete while Iceberg SQL extension is set
#6024 closed Oct 20, 2022
iceberg support branch/tag, what is the difference between nessie and iceberg?
#4476 closed Oct 20, 2022
Drop managed table after drop data
#3792 closed Oct 19, 2022

9 Issues opened by 9 people

Spark3.2.2 rewriteDataFiles task yarn driver Stuck
#6050 opened Oct 25, 2022
Column pruning/projection is not happening in correlated queries (e.g Q94, Q16)
#6044 opened Oct 25, 2022
Add delete file information to partitions table
#6042 opened Oct 24, 2022
Spark : Perf enhancement by leveraging Dynamic Partition Pruning rule of spark for non partition columns used as join condition
#6039 opened Oct 24, 2022
Nessie: Switch to Nessie API v2
#6031 opened Oct 21, 2022
rewrite datafile OOM bug
#6028 opened Oct 21, 2022
metadata location wrong with hadoop HA on multi clusters
#6027 opened Oct 21, 2022
When I use Flink SQL to synchronize MySQL data to Iceberg (Hive catalog), an error is reported.
#6021 opened Oct 20, 2022
What is the expected behavior of expireOlderThan for a table with a tag that has not reached max age?
#6018 opened Oct 19, 2022

41 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Core, API: Support incremental scanning with branch
#5984 commented on Oct 25, 2022 • 24 new comments
Spark: Fix a separate table cache being created for each rewriteFiles
#5392 commented on Oct 21, 2022 • 17 new comments
Flink: Support read options in flink source
#5967 commented on Oct 25, 2022 • 17 new comments
Core: Add option to combine tasks by partition
#2276 commented on Oct 20, 2022 • 12 new comments
API: Add view interfaces
#4925 commented on Oct 25, 2022 • 12 new comments
Spark 3.3: Add a procedure to generate table changes
#6012 commented on Oct 21, 2022 • 10 new comments
Core: Add file seq number to ManifestEntry
#6002 commented on Oct 23, 2022 • 8 new comments
Data: Support reading default values from generic Avro readers
#6004 commented on Oct 23, 2022 • 7 new comments
Doc: add assume role session name doc and remove redundant spark shell examples
#5994 commented on Oct 25, 2022 • 5 new comments
Python: Fix caching of the PyArrowFileIO
#6010 commented on Oct 24, 2022 • 5 new comments
Spark: Iceberg: java.io.InvalidClassException: org.apache.iceberg.Schema; local class incompatible: stream classdesc serialVersionUID = 3320367012418887609, local class serialVersionUID = -8857144469361102787
#5970 commented on Oct 19, 2022 • 2 new comments
API/Core: Add metadata field to NestedField
#5631 commented on Oct 24, 2022 • 2 new comments
Iceberg table maintenance/compaction within AWS
#5997 commented on Oct 25, 2022 • 2 new comments
API: Add default value API
#4732 commented on Oct 21, 2022 • 2 new comments
Spark Integration to read from Snapshot ref
#5150 commented on Oct 23, 2022 • 2 new comments
Core: Use explicit JSON Parser for namespace creation request
#5968 commented on Oct 24, 2022 • 2 new comments
Orc : Bug when adding a inner struct field as partition field
#4604 commented on Oct 19, 2022 • 1 new comment
Implement rate limiting while reading stream from Iceberg table as Spark3 DSv2 source
#2789 commented on Oct 19, 2022 • 1 new comment
Quick start docker-compose demo doesn't work
#5993 commented on Oct 19, 2022 • 1 new comment
Add Checkstyle Rule to prevent Map<StructLike, ...> and Set<StructLike>
#4616 commented on Oct 20, 2022 • 1 new comment
Is there a full example for Iceberg+Flink+Minio
#3968 commented on Oct 21, 2022 • 1 new comment
Delete files not eventually removed if RewriteDataFile run right after delete (when using 'use-starting-sequence-number' default)
#4127 commented on Oct 21, 2022 • 1 new comment
IcebergGenerics.read(table) not work for most kinds of metadata tables
#4523 commented on Oct 22, 2022 • 1 new comment
pip install pyiceberg on windows require C++ to be installed
#5901 commented on Oct 22, 2022 • 1 new comment
Pyflink+Iceberg+Kinesis
#4633 commented on Oct 24, 2022 • 1 new comment
missing SetWriteDistributionAndOrdering class for spark sql plan
#4628 commented on Oct 25, 2022 • 1 new comment
Nessie: Use unique path for different table with same name
#4826 commented on Oct 24, 2022 • 1 new comment
[Core]Add EncryptionManagerFactory to configure encryption via catalog properties and table metadata.
#5539 commented on Oct 24, 2022 • 1 new comment
[Flink] Avoid submitting too many empty snapshots
#5561 commented on Oct 25, 2022 • 1 new comment
Spark: Check for hive support when using SparkSessionCatalog
#5693 commented on Oct 20, 2022 • 1 new comment
Cache dropStats result for ManifestReader iterator
#5836 commented on Oct 20, 2022 • 1 new comment
API,Core: Introduce metrics for data files by file format
#5837 commented on Oct 24, 2022 • 1 new comment
Spark: Iceberg bug 5935 fix where some methods of Spark3Util do not set current session in spark's threadlocal
#5959 commented on Oct 23, 2022 • 1 new comment
Core: Optimize the TableScanContext
#5982 commented on Oct 21, 2022 • 1 new comment
Structured Streaming writes to iceberg table with non-identity partition spec breaks with spark extensions enabled
#5625 commented on Oct 20, 2022 • 0 new comments
Spark : Spark3Util is not setting the spark session being used as active session when executing sensitive functions
#5935 commented on Oct 24, 2022 • 0 new comments
Parquet: Support parquet modular encryption
#2639 commented on Oct 24, 2022 • 0 new comments
API: Optionally ignore position deletes in rewrite validation
#4703 commented on Oct 21, 2022 • 0 new comments
Encryption integration and test
#5544 commented on Oct 24, 2022 • 0 new comments
AWS: Fix catalog names in LakeFormationTestBase
#5767 commented on Oct 20, 2022 • 0 new comments
Core: Make TableScanContext immutable
#5985 commented on Oct 19, 2022 • 0 new comments