Conversation
Add the option to avoid ordinal holes in the serialized heap graph.
|
Before you submit for review:
If you did not complete any of these, then please explain below. |
|
Does this pose a risk for stable ordinal mapping with external system which retain their own ordinal maps? We need to identify how the through line from non-vector data remains connected to the associated vectors/embeddings in jvector. If this is already handled effectively here, we should make it very clear in documentation with examples. |
|
Short answer: no, since the client application needs to opt-in to this by calling either Longer answer: Say the application creates a graph with some deleted ordinals. Then it does something like this: OnHeapGraphIndex graph = ...;
var writer = new GraphIndexWriter.Builder(graph, ...) (...) .build(); // no OrdinalMapper supplied
writer.write(...);
graph.save(...);This saves two files. One corresponding to the Additionally, since the heap graph serialization process does not provide an option to pass in a mapper, the application has no choice but to live with the ordinal mismatch and compensate for it by introducing additional code complexity. This PR doesn't attempt to change the disk graph's existing philosophy of ordinal re-numbering during write, it simply adds the same functionality to the heap graph serialization. The key difference is that the user has to "opt-in" to ordinal compaction to prevent nasty surprises in existing code. While I agree that the whole ordinal mapping system could do with more thorough documentation, that's not within the scope of this PR and should be handled separately. |
There is currently no way to alter the ordinals of the vectors in an
OnHeapGraphIndex. This limitation extends to thesaveandloadmethods used for serializing and de-serializing the heap graph.In particular, this means that any ordinal "holes" created by deleting nodes from the
OnHeapGraphIndexcan never be compacted away. This is different from the behavior of the disk graph writer, which not only provides a way to remap ordinals, it automatically compacts ordinals by default.This PR adds the option to remap ordinals while serializing the graph, preventing the accumulation of ordinal holes.
Note that this is only useful for those systems which rely on repeatedly loading, mutating, and then saving the
OnHeapGraphIndex, which is arguably an anti-pattern.