gh-140009: Optimize dict.items() symmetric difference via PyTuple_FromArray
#144771
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the
@andrewloux Your own benchmarks show this change is performance neutral. Unless there are additional results that show why this change is a significant improvement, I suggest we close this. (I do believe this is a tiny net improvement, but in general we avoid making such small changes to reduce churn and potential unforeseen issues.)

Yup, totally makes sense - let's close this 👍🏽 Thanks @eendebakpt
Summary
This PR replaces `PyTuple_Pack` with `PyTuple_FromArray` in `Objects/dictobject.c` within the `dictitems_xor_lock_held` function.

By avoiding the variadic argument (`va_args`) processing overhead of `PyTuple_Pack`, we reduce the per-item cost of symmetric difference operations (`dict.items() ^ dict.items()`) that involve value mismatches. The change uses a stack-allocated array to pass arguments directly to the tuple constructor.

Benchmarks (PGO+LTO)
Validated using `pyperf` in `--rigorous` mode on a full production build.

- Build flags: `--enable-optimizations --with-lto`
- Compared: `upstream/main` vs. `pytuple-dictitems-xor-fromarray` (714fb11)
- Workloads: `dict_items_xor_overlap_neq`, `dict_items_xor_disjoint`, `dict_items_xor_overlap_equal_control`

Geometric mean: 1.00x faster (1.01x on target path)
Repro commands
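As a rough illustration only, the three workload shapes could be timed with the standard-library `timeit` module along the lines below. This is a minimal sketch, not the `pyperf` setup used for the numbers above: the helper `time_items_xor`, the dictionary size, and the exact construction of each shape are assumptions chosen to match the workload names.

```python
import timeit


def time_items_xor(left, right, number=50, repeat=3):
    """Best observed per-call time, in seconds, for left.items() ^ right.items()."""
    timer = timeit.Timer(lambda: left.items() ^ right.items())
    return min(timer.repeat(repeat=repeat, number=number)) / number


n = 1000  # hypothetical size; the size used in the PR is not stated here
base = {i: i for i in range(n)}

# Hypothetical reconstructions of the three shapes, keyed by workload name.
shapes = {
    "dict_items_xor_overlap_neq": {i: i + 1 for i in range(n)},  # same keys, unequal values
    "dict_items_xor_disjoint": {i + n: i for i in range(n)},     # no shared keys
    "dict_items_xor_overlap_equal_control": dict(base),          # same keys, equal values
}

for name, other in shapes.items():
    per_call = time_items_xor(base, other)
    print(f"{name}: {per_call * 1e6:.1f} us per call")
```

Timings from a sketch like this are noisy and only meaningful when both interpreters are built with the same flags; `pyperf` remains the better tool for a difference this small.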
Analysis
The `dict_items_xor_overlap_neq` workload specifically exercises the modified path by comparing dictionaries with overlapping keys but unequal values, triggering a tuple creation for every mismatched entry.

While the aggregate effect is a micro-optimization, the results show a consistent improvement on the target path with a reduction in variance (±4.6 ms → ±2.5 ms) across multiple runs. Control workloads (`equal` and `disjoint`) remain neutral, confirming no regressions in non-target dictionary shapes.
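To make the "overlapping keys but unequal values" shape concrete, here is a small Python-level sketch of what the symmetric difference returns in that case. The dictionaries are made-up examples, not the benchmark inputs; the point is only that each mismatched key contributes a `(key, value)` tuple from each side, while matching entries drop out.

```python
left = {"a": 1, "b": 2, "c": 3}
right = {"a": 1, "b": 20, "c": 30}  # "b" and "c" overlap with left but differ in value

# Symmetric difference of the two items views; the result is a plain set of tuples.
diff = left.items() ^ right.items()

print(sorted(diff))
# [('b', 2), ('b', 20), ('c', 3), ('c', 30)]
```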