Remap heap graph ordinals on serialization by ashkrisk · Pull Request #618 · datastax/jvector

ashkrisk · 2026-02-10T09:03:34Z

There is currently no way to alter the ordinals of the vectors in an OnHeapGraphIndex. This limitation extends to the save and load methods used for serializing and de-serializing the heap graph.

In particular, this means that any ordinal "holes" created by deleting nodes from the OnHeapGraphIndex can never be compacted away. This is different from the behavior of the disk graph writer, which not only provides a way to remap ordinals, it automatically compacts ordinals by default.

This PR adds the option to remap ordinals while serializing the graph, preventing the accumulation of ordinal holes.

Note that this is only useful for those systems which rely on repeatedly loading, mutating, and then saving the OnHeapGraphIndex, which is arguably an anti-pattern.

Add the option to avoid ordinal holes in the serialized heap graph.

github-actions · 2026-02-10T09:03:46Z

Before you submit for review:

Does your PR follow guidelines from CONTRIBUTIONS.md?
Did you summarize what this PR does clearly and concisely?
Did you include performance data for changes which may be performance impacting?
Did you include useful docs for any user-facing changes or features?
Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
Did you trigger and review regression testing results against the base branch via Run Bench Main?
Did you adhere to the code formatting guidelines (TBD)
Did you group your changes for easy review, providing meaningful descriptions for each commit?
Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.

jshook · 2026-02-17T19:27:46Z

Does this pose a risk for stable ordinal mapping with external system which retain their own ordinal maps? We need to identify how the through line from non-vector data remains connected to the associated vectors/embeddings in jvector. If this is already handled effectively here, we should make it very clear in documentation with examples.

ashkrisk · 2026-02-19T03:50:49Z

Short answer: no, since the client application needs to opt-in to this by calling either saveWithCompactOrdinals or the save overload which takes an OrdinalMapper as input.

Longer answer:
OnDiskGraphIndex actually does this already. If you take a graph index with missing ordinals (due to deletions) and pass them to the GraphIndexWriter, the output ordinals will be automatically re-numbered to avoid any ordinal "holes" in the saved graph.

Say the application creates a graph with some deleted ordinals. Then it does something like this:

OnHeapGraphIndex graph = ...;
var writer = new GraphIndexWriter.Builder(graph, ...) (...) .build();  // no OrdinalMapper supplied
writer.write(...);
graph.save(...);

This saves two files. One corresponding to the OnDiskGraphIndex, and the other corresponding to the serialized OnHeapGraphIndex. Since the disk graph ordinals are automatically re-numbered, but the heap graph's ordinals are not, these two sets of ordinals are not consistent with each other. A bug related to this was recently fixed in Opensearch.

Additionally, since the heap graph serialization process does not provide an option to pass in a mapper, the application has no choice but to live with the ordinal mismatch and compensate for it by introducing additional code complexity.

This PR doesn't attempt to change the disk graph's existing philosophy of ordinal re-numbering during write, it simply adds the same functionality to the heap graph serialization. The key difference is that the user has to "opt-in" to ordinal compaction to prevent nasty surprises in existing code. While I agree that the whole ordinal mapping system could do with more thorough documentation, that's not within the scope of this PR and should be handled separately.

Remap heap graph ordinals on serialization

449369e

Add the option to avoid ordinal holes in the serialized heap graph.

ashkrisk requested review from MarkWolters, jshook and tlwillke as code owners February 10, 2026 09:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Remap heap graph ordinals on serialization#618

Remap heap graph ordinals on serialization#618
ashkrisk wants to merge 1 commit intomainfrom
ohg-serde-ord-map

ashkrisk commented Feb 10, 2026

Uh oh!

github-actions bot commented Feb 10, 2026

Uh oh!

jshook commented Feb 17, 2026

Uh oh!

ashkrisk commented Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Conversation

ashkrisk commented Feb 10, 2026

Uh oh!

github-actions bot commented Feb 10, 2026

Uh oh!

jshook commented Feb 17, 2026

Uh oh!

ashkrisk commented Feb 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants