Skip to content

More ESE cursor and NTDS query improvements#39

Open
Schamper wants to merge 9 commits intomainfrom
ese-index
Open

More ESE cursor and NTDS query improvements#39
Schamper wants to merge 9 commits intomainfrom
ese-index

Conversation

@Schamper
Copy link
Member

Model the cursor implementation a little bit more like the one in dissect.apfs. Should give a little bit more flexibility and hopefully make the implementation a little bit easier to understand.

Also improve NTDS querying to allow for wildcard prefixes and missing indexes.

@Schamper
Copy link
Member Author

@Matthijsy this should address at least one of the issues you raised in #38. Let me know if this improves things for you.

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 19, 2026

Merging this PR will not alter performance

✅ 6 untouched benchmarks


Comparing ese-index (3899847) with main (dee2182)

Open in CodSpeed

@codecov
Copy link

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 0% with 171 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (dee2182) to head (a2899f1).

Files with missing lines Patch % Lines
dissect/database/ese/cursor.py 0.00% 123 Missing ⚠️
dissect/database/ese/ntds/query.py 0.00% 36 Missing ⚠️
dissect/database/ese/table.py 0.00% 6 Missing ⚠️
dissect/database/ese/ntds/database.py 0.00% 3 Missing ⚠️
dissect/database/ese/page.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main     #39   +/-   ##
=====================================
  Coverage   0.00%   0.00%           
=====================================
  Files        151     150    -1     
  Lines       4163    4186   +23     
=====================================
- Misses      4163    4186   +23     
Flag Coverage Δ
unittests 0.00% <0.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Matthijsy
Copy link
Contributor

@Matthijsy this should address at least one of the issues you raised in #38. Let me know if this improves things for you.

Yes the Index for attribute 'ATTc0' not found in the NTDS database problem has been resolved after this PR indeed

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the ESE database cursor implementation to be more similar to dissect.apfs, replacing the BTree class with a new RawCursor class that provides better control over B+Tree navigation. It also enhances NTDS query capabilities to handle wildcard patterns (including prefix wildcards like *value and complex patterns like h*d*r) and gracefully handles missing indexes by falling back to table scans instead of raising errors.

Changes:

  • Replaced BTree class with RawCursor class for more flexible B+Tree navigation
  • Enhanced NTDS query support for complex wildcard patterns and missing indexes
  • Improved error messages and debugging information in page representations

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
dissect/database/ese/btree.py Removed entire BTree class implementation
dissect/database/ese/cursor.py Added new RawCursor class with stack-based B+Tree navigation; updated Cursor to use RawCursor
dissect/database/ese/table.py Migrated from BTree to RawCursor for long value retrieval
dissect/database/ese/page.py Enhanced repr with more debug info; improved error messages
dissect/database/ese/ntds/query.py Added support for complex wildcards and missing indexes; improved wildcard matching with regex
tests/ese/ntds/test_query.py Updated tests for new wildcard patterns and missing index behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Schamper
Copy link
Member Author

Schamper commented Feb 24, 2026

This PR now also incorporates an important fix where if an object had exactly one child, it would not return that when calling .children().

Copy link
Contributor

@twiggler twiggler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First wave, perhaps more comments later today or monday.

break


class RawCursor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving to rawcursor.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered it but I think for the time being I'd like them to be together.

exact: Whether to only return successfully on an exact match.

Returns:
The node number of the first node that's greater than or equal to the key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When exact=False and a key which is strictly greater than every node, we return lo, which is strictly less.

Not a bug, but an inaccuracy in the docstring

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate a little more?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that clears it up!

Copy link
Contributor

@twiggler twiggler Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is just a tiny mismatch in the docstring for one specific boundary condition.

Here is the scenario where the description doesn't quite match the output:
The Edge Case: Searching for an Oversized Key

(ai-generated example)
Imagine we have a page with four node keys: [10, 20, 30, 40].
Now, let's say the search key is 50, and exact=False.

The Loop: The while lo < hi: loop runs. Because 50 is greater than every mid value, the code repeatedly executes lo = mid + 1.

The Exit: This continues until lo reaches the very last index of the page (index 3, where the key is 40). At this point, lo == hi, and the loop exits.

The Return: The code does its final comparison. Since exact is False, the function simply returns lo (which is 3).

The Disconnect

The function returned the node at index 3 (key 40).

The docstring states the function returns: "The node number of the first node that's greater than or equal to the key."

However, 40 is strictly less than the search key of 50.

The logic does exactly what it needs to do (landing on the highest possible bound to route the search down the correct right-most branch). It's a perfectly valid fallback, but the docstring currently doesn't mention this right-side boundary outcome.
Suggested Docstring Tweak

Updating the Returns: section to something like this would clear it right up:

The node number of the first node that is greater than or equal to the key. If the search key is greater than all nodes on the page, it returns the index of the last node.

@Schamper
Copy link
Member Author

This PR now also incorporates an important fix where if an object had exactly one child, it would not return that when calling .children().

Note that this causes a regression in ADAM parsing (infinite loop) but that's because the way we resolve the root domain is shoddy. #43 fixes this, I'll move the child fix to that PR.

@Schamper Schamper requested a review from twiggler February 27, 2026 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants