[HUDI-7384] Secondary index support #10625
Conversation
Force-pushed from 264059f to 50f2165
Force-pushed from 804d739 to 1a69c6a
Force-pushed from aceed21 to 2656b3e
Rebase and resolve conflicts. Fix a bug related to MOR tables with secondary index.
Force-pushed from 2656b3e to a65a927
Moved away from using …
"doc": "Refers to the record key that this secondary key maps to" | ||
}, | ||
{ | ||
"name": "isDeleted", |
This field acts as a tombstone marker.
@@ -240,6 +255,11 @@ public static HoodieMergedLogRecordScanner.Builder newBuilder() {

@Override
public <T> void processNextRecord(HoodieRecord<T> newRecord) throws IOException {
  if (logContainsNonUniqueKeys) {
If the log files are for partitions that can have non-unique keys, then this logic makes use of the new map to buffer the scanned records.
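A rough, self-contained model of that buffering (illustrative names only; per a later diff in this PR, the real scanner buffers HoodieRecords in an ExternalSpillableMap):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the non-unique-key path: instead of replacing the previous
// entry for a key (the unique-key behavior), each scanned record is appended to a
// per-key list so that every record sharing a key survives the scan for later merging.
class NonUniqueKeyBufferSketch {
  private final Map<String, List<String>> nonUniqueKeyRecords = new HashMap<>();

  void processNextNonUniqueKeyRecord(String key, String payload) {
    nonUniqueKeyRecords.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
  }

  List<String> bufferedRecordsFor(String key) {
    return nonUniqueKeyRecords.getOrDefault(key, new ArrayList<>());
  }
}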
String key = newRecord.getRecordKey();
HoodieMetadataPayload newPayload = (HoodieMetadataPayload) newRecord.getData();

// The rules for merging the prevRecord and the latestRecord is noted below. Note that this only applies for SecondaryIndex
The crux of the merging logic is here. The main issue with using the existing preCombine(...) method is that it is 'either-or', i.e. it chooses only one record. Changing the API was a little tedious, hence this approach of moving the merge logic directly into the scanner. @vinothchandar @codope
hmmm. we should be using the combineAndGet... method of the MetadataPayload?
combineAndGet() (which internally calls preCombine()) is an either-or operation, i.e. the caller should ensure that the prevRecord and newRecord are similar. For secondary-index, this similarity depends on the payload of the record (HoodieMetadataPayload). Maybe I can add a new API in HoodieRecordPayload which is implemented by HoodieMetadataPayload, and avoid exposing HoodieMetadataPayload to this layer?
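To make the 'either-or' limitation concrete, here is a minimal sketch under assumed secondary-index semantics (the Entry shape and the rules are illustrative, based on the schema fragment above, not lifted from the PR). The merge result is a set of mappings, not a single winner, which is why a pick-one preCombine() does not fit:

import java.util.ArrayList;
import java.util.List;

class SecondaryIndexMergeSketch {
  // Illustrative entry shape: one (secondaryKey -> recordKey) mapping plus tombstone flag.
  record Entry(String secondaryKey, String recordKey, boolean isDeleted) {}

  // A tombstone cancels the matching live mapping; a live entry for a different
  // record key is retained alongside the existing ones rather than replacing them.
  static List<Entry> merge(List<Entry> previous, Entry latest) {
    List<Entry> merged = new ArrayList<>(previous);
    if (latest.isDeleted()) {
      merged.removeIf(e -> e.recordKey().equals(latest.recordKey()));
    } else {
      merged.add(latest);
    }
    return merged;
  }
}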
// TODO: Merger need not be called here as the merging logic is handled explicitly in this function.
// Retain until Secondary Index feature is tested and stabilized
HoodieRecord<T> combinedRecord = (HoodieRecord<T>) recordMerger.merge(prevRecord, readerSchema,
Will remove this later. The merging logic does not really rely on the merger (because of the comment made earlier).
@@ -33,6 +33,7 @@
import org.apache.hadoop.fs.FileSystem;
Please ignore the changes in this file. Will revert in the next upload.
@@ -54,6 +54,13 @@ public class KeyGeneratorOptions extends HoodieConfig {
    + "Actual value will be obtained by invoking .toString() on the field value. Nested fields can be specified using\n"
    + "the dot notation eg: `a.b.c`");

public static final ConfigProperty<String> SECONDARYKEY_FIELD_NAME = ConfigProperty
One of the current limitations: only one secondary index (per table) is supported. Will remove the limitation once the functionality is working end-to-end. Thinking of using the same approach as functional index (a different partition for each secondary index, based on a config file/JSON).
+1
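For illustration, the new option presumably follows the same ConfigProperty builder pattern as the surrounding KeyGeneratorOptions entries; the key string and documentation text below are assumptions, not the PR's actual values:

// Assumed shape only: the real key name, default handling and docs may differ.
public static final ConfigProperty<String> SECONDARYKEY_FIELD_NAME = ConfigProperty
    .key("hoodie.datasource.write.secondarykey.column")
    .noDefaultValue()
    .withDocumentation("Column to be used as the secondary key. The current version "
        + "supports a single secondary index per table.");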
@@ -402,6 +422,31 @@ public HoodieMetadataPayload preCombine(HoodieMetadataPayload previousRecord) {
    // 2. A key moved to a different file due to clustering

    // No need to merge with previous record index, always pick the latest payload.
    return this;
  case METADATA_TYPE_SECONDARY_INDEX:
    // TODO: This block and checks are just for validation and to detect all callers.
This is here mainly for asserting and will be removed later. Ideally all merging logic of secondary (or non-unique-key) index records needs to be handled at upper layers directly (when the scanner is running the scan).
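Given that intent, the branch presumably reduces to a defensive check along these lines (a hedged sketch; the field access and message are assumptions, not the PR's code):

case METADATA_TYPE_SECONDARY_INDEX:
  // Assumed sketch: once merging lives in the scanner, preCombine should only ever
  // see a matching payload type here; anything else flags a caller still relying on
  // payload-level merging that needs to be migrated.
  ValidationUtils.checkArgument(previousRecord.type == METADATA_TYPE_SECONDARY_INDEX,
      "preCombine invoked with mismatched metadata payload types");
  return this;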
.withFileSystem(metaClient.getFs())
.withBasePath(basePath)
.withLogFilePaths(logFilePaths)
.withReaderSchema(tableSchema)
Will change to filter only the required columns. Could not find an easy way/API to get that yet.
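One possible direction (hedged; this assumes HoodieAvroUtils.generateProjectionSchema is applicable here and that only the record key and secondary key columns are needed):

// Assumed sketch: project the reader schema down to the columns the index needs
// before handing it to the scanner, instead of reading the full table schema.
Schema projectedSchema = HoodieAvroUtils.generateProjectionSchema(
    tableSchema, Arrays.asList(recordKeyField, secondaryKeyField));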
@@ -0,0 +1,222 @@
/*
Please ignore the repetitive/duplicated code blocks. Will change once the functionality is ready and subsequently add more tests.
// the secondary index partition for each of these keys. For a commit which is deleting/updating a lot of records, this
// operation is going to be expensive (in CPU, memory and IO)
List<String> keysToRemove = new ArrayList<>();
writeStatus.collectAsList().forEach(status -> {
This needs to be fixed. If the design is changed so that WriteStatus includes the (record-key, old-secondary-key, new-secondary-key), then this needs to change anyway.
this collect can OOM right?
Yes, the logic here is probably going to change if one uses the WriteStatus to hold the (old-secondary-key, new-secondary-key) pair. Hence did not think of optimising here yet.
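A hedged sketch of that direction: if WriteStatus carried the key material, the extraction could stay distributed instead of collecting on the driver (getDeletedRecordKeys() is a hypothetical accessor, not an existing WriteStatus API):

import org.apache.spark.api.java.JavaRDD;

// Each executor emits only the keys from the statuses it owns, so no single node
// ever materializes the full list the way collectAsList() does.
JavaRDD<String> keysToRemove =
    writeStatuses.flatMap(status -> status.getDeletedRecordKeys().iterator());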
// Reuse record index parallelism config to build secondary index
int parallelism = Math.min(partitionFilePairs.size(), dataWriteConfig.getMetadataConfig().getRecordIndexMaxParallelism());

return deletedRecords.union(readSecondaryKeysFromFiles(
This is convoluted. deletedRecords are the tombstone records. For correctness, these tombstone records should be emitted before the regular records. Could not find anywhere in the docs whether this ordering is preserved, but noticed in the tests that it is.
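If that ordering guarantee turns out not to hold, one way to make it explicit is sketched below, assuming both inputs are JavaRDDs of index records (liveRecords stands in for the readSecondaryKeysFromFiles(...) output; sortByKey() adds a shuffle, so this is a correctness-first illustration, not a tuned one):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// Tag tombstones with a lower sort key than live records so that, after sortByKey(),
// every tombstone is emitted before the regular records regardless of union() internals.
JavaPairRDD<Integer, HoodieRecord> tagged =
    deletedRecords.mapToPair(r -> new Tuple2<>(0, r))
        .union(liveRecords.mapToPair(r -> new Tuple2<>(1, r)));
JavaRDD<HoodieRecord> ordered = tagged.sortByKey().values();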
Force-pushed from a65a927 to 890599a
Initial comments; only halfway through the review.
@@ -2493,6 +2493,10 @@ public boolean isLogCompactionEnabledOnMetadata() {
  return getBoolean(HoodieMetadataConfig.ENABLE_LOG_COMPACTION_ON_METADATA_TABLE);
}

public boolean isSecondaryIndexEnabled() {
nts: revisit whether this is needed, or whether we should consolidate with the record-level index flag
// Enable secondary index only iff record index is enabled
if (dataWriteConfig.isSecondaryIndexEnabled() || dataMetaClient.getTableConfig().isMetadataPartitionAvailable(SECONDARY_INDEX)) {
  this.enabledPartitionTypes.add(SECONDARY_INDEX);
nts: rename SECONDARY_INDEX? technically the record index itself is a unique secondary index. This is just a non-unique secondary index.
dataMetaClient,
EngineType.SPARK);

// Initialize the file groups - using the same estimation logic as that of record index
scope for code reuse?
fsView.loadAllPartitions();

List<Pair<String, FileSlice>> partitionFileSlicePairs = new ArrayList<>();
partitions.forEach(partition -> {
wouldn't this take a long time, if done on the driver?
This is actually borrowed from how the existing indexes/partitions are built in the metadata table (like the RLI partition in initializeRecordIndexPartition()).
EngineType.SPARK);

// Initialize the file groups - using the same estimation logic as that of record index
final int fileGroupCount = HoodieTableMetadataUtil.estimateFileGroupCount(RECORD_INDEX, records.count(),
this is going to compute records twice? no?
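If so, a standard remedy is to persist before counting, sketched here under the assumption that records is a Spark-backed collection exposing RDD-style caching:

import org.apache.spark.storage.StorageLevel;

// Cache once so that records.count() (feeding the file group estimate) and the
// subsequent write of the same records compute the lineage a single time, not twice.
records.persist(StorageLevel.MEMORY_AND_DISK());
long totalRecords = records.count();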
this.nonUniqueKeyRecords = new ExternalSpillableMap<>(maxMemorySizeInBytes, spillableMapBasePath, new DefaultSizeEstimator(),
    new HoodieRecordSizeEstimator(readerSchema), diskMapType, isBitCaskDiskMapCompressionEnabled);

if (logFilePaths.size() > 0 && HoodieTableMetadata.isMetadataTableSecondaryIndexPartition(basePath, partitionName)) {
this layer cannot be aware of the metadata table.
Can this all be hidden inside a method (still in this layer)? There needs to be some way to determine if the logs can have non-unique keys. The initial implementation had it one layer above (i.e. the callers instantiating HoodieMergedLogRecordScanner passing in the flag), but having it here looked cleaner.
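For example, a builder flag along these lines would keep the decision with the (metadata-aware) callers while the scanner itself stays metadata-agnostic; withLogContainsNonUniqueKeys is hypothetical, not part of the current builder:

// Hypothetical API sketch: the caller sets the flag; the scanner only learns
// 'keys may repeat' and never inspects the partition path itself.
HoodieMergedLogRecordScanner scanner = HoodieMergedLogRecordScanner.newBuilder()
    .withBasePath(basePath)
    .withLogFilePaths(logFilePaths)
    .withReaderSchema(readerSchema)
    .withLogContainsNonUniqueKeys(true) // hypothetical builder method
    .build();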
@@ -269,6 +350,68 @@ public <T> void processNextRecord(HoodieRecord<T> newRecord) throws IOException
  }
}

private <T> void processNextNonUniqueKeyRecord(HoodieRecord<T> newRecord) throws IOException {
I think we should probably subclass, or find some other way to avoid having this code right inside this class?
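A generic model of that suggestion (names simplified, constructor plumbing elided): pick the scanner variant up front so processNextRecord() itself never branches:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

abstract class ScannerSketch {
  abstract void processNextRecord(String key, String payload);

  // Factory keeps the unique/non-unique decision out of the record-processing path.
  static ScannerSketch create(boolean logContainsNonUniqueKeys) {
    return logContainsNonUniqueKeys ? new NonUniqueKeyScannerSketch() : new UniqueKeyScannerSketch();
  }
}

class UniqueKeyScannerSketch extends ScannerSketch {
  private final Map<String, String> records = new HashMap<>();

  @Override
  void processNextRecord(String key, String payload) {
    records.put(key, payload); // unique keys: latest record for a key wins
  }
}

class NonUniqueKeyScannerSketch extends ScannerSketch {
  private final Map<String, List<String>> records = new HashMap<>();

  @Override
  void processNextRecord(String key, String payload) {
    records.computeIfAbsent(key, k -> new ArrayList<>()).add(payload); // buffer all
  }
}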
hudi-common/src/main/java/org/apache/hudi/common/table/log/LogFileIterator.java (review comment outdated and resolved)
Initial commit. Supports the following features:
1. Modify the schema to add secondary index support to metadata
2. New partition type in the metadata table to store the secondary_keys-to-record_keys mapping
3. Various options to support secondary index enablement, column mappings (for secondary keys), etc.
4. Initialization of secondary keys
5. Update secondary keys on inserts/upserts/deletes
6. Add hooks in HoodieFileIndex to prune candidate files (to scan) based on secondary key column filters
7. Add ability in HoodieMergedLogRecordScanner to buffer non-unique-key (i.e. secondary key) records and merge 'similar' records
8. Add support for merging secondary index records (from delta log files and base files)
9. Ability to merge secondary index records across a group of log files and across log-file/base-file

Limitations:
1. Supports only one secondary index at the moment.
2. Scanning of the secondary index partition is done sequentially (both on the query side and the index-maintenance side)

Pending items:
1. Integrate with compaction
2. Handle rollback
3. Clean up existing tests and add more
Force-pushed from 890599a to 32d4469
Hi @bhat-vinay! Is this design of secondary index through MDT the only one to be implemented, or are there plans for other index types? As I remember, there was an RFC for a Lucene index, and maybe some other types in the future?
Please get in touch with @codope for the latest update on this. AFAIK, a Lucene-based secondary index is not planned at this time; the MDT-based secondary index is the one being developed.
Hi @skyshineb, we do plan to add more index types. If you are interested in contributing a Lucene-based secondary index, I can help you get started with the multi-modal indexing framework.
Hi @codope! I planned to test this MDT implementation against the Lucene one (which I took from the previous SI attempt and finished myself) and figure out whether using Lucene is profitable. But why was this PR closed?
Change Logs
Initial commit. Supports the features listed in the commit message above.

Limitations: one secondary index per table; sequential scan of the secondary index partition.

Pending items: compaction integration, rollback handling, test cleanup and additional tests.
Impact
Support secondary index on columns (similar to record index, but for non-unique columns)
Risk level (write none, low medium or high below)
Medium. New and existing tests
Documentation Update
NA. Will be done later
Contributor's checklist