gh-109329: Support for basic pystats for Tier 2 #109913
brandtbucher merged 17 commits into python:main
Conversation
brandtbucher
left a comment
This is so great to have, thanks! We can tweak the collection and output further if need be, but this is already going to be so much more insightful!
Just a couple of suggestions, but otherwise I'll plan on merging tomorrow.
Include/internal/pycore_code.h
Outdated
#define OPT_STAT_INC(name) do { if (_Py_stats) _Py_stats->optimization_stats.name++; } while (0)
#define UOP_EXE_INC(opname) do { if (_Py_stats) _Py_stats->optimization_stats.opcode[opname].execution_count++; } while (0)
#define OPT_UNSUPPORTED_OPCODE(opname) do { if (_Py_stats) _Py_stats->optimization_stats.unsupported_opcode[opname]++; } while (0)
#define OPT_HIST(length, name) do { if (_Py_stats) _Py_stats->optimization_stats.name[_Py_bit_length((length - 1) >> 2)]++; } while (0)
If I understand correctly, there is a minimum trace size of 2, so the first two power-of-two buckets (for 0 and 1) would always be empty, so might as well just not support them (that's the >> 2). Then I also think it's easier to reason about buckets that have an inclusive maximum of a round number (that's the - 1).
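The bucketing scheme described above can be sketched in Python (a sketch only, using `int.bit_length` as a stand-in for `_Py_bit_length`; the bias of 2 matches the `>> 2` in the macro):

```python
# Sketch of the power-of-two bucketing discussed above (not the CPython
# implementation itself). With a bias of 2, bucket i has an inclusive
# maximum of 2**(i + 2): lengths 1-4 land in bucket 0, 5-8 in bucket 1,
# 9-16 in bucket 2, and so on. The "- 1" makes the round number the
# inclusive maximum of its bucket rather than the start of the next one.
_Py_UOP_HIST_BIAS = 2

def bucket_for(length):
    # Equivalent to _Py_bit_length((length - 1) >> _Py_UOP_HIST_BIAS)
    return ((length - 1) >> _Py_UOP_HIST_BIAS).bit_length()
```

For example, `bucket_for(8)` and `bucket_for(16)` land in buckets 1 and 2 respectively, while `bucket_for(17)` spills into bucket 3.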
Include/cpython/pystats.h
Outdated
    uint64_t execution_count;
} UOpStats;

#define _Py_UOP_HIST_SIZE 5
We'll probably want more buckets than this (since execution lengths for closed loops will get pretty big), but it's probably fine for now. We can just bump it whenever.
I thought the max trace length was 64 (_Py_UOP_MAX_TRACE_LENGTH). Can they be longer? (It's quite possible I don't understand the code). In any event, I should probably add a comment here that this should be log2 of whatever the expected maximum length is.
Can they be longer? (It's quite possible I don't understand the code).
Not statically, but dynamically they can be quite long (imagine executing a trace for a closed loop hundreds of times).
So we should probably modify the code that adds to the histogram to be Py_MIN(_Py_UOP_HIST_SIZE - 1, _Py_bit_length((length - 1) >> _Py_UOP_HIST_BIAS)), so that anything overflowing the histogram ends up in the largest bucket instead of writing past the end of the array.
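The proposed range check amounts to clamping the bucket index to the last bucket, i.e. taking the minimum of the computed index and `_Py_UOP_HIST_SIZE - 1`. A Python sketch of that clamping:

```python
# Sketch of the range-checked bucketing discussed above; the constants
# are the values from this PR (5 buckets, bias of 2).
_Py_UOP_HIST_SIZE = 5
_Py_UOP_HIST_BIAS = 2

def clamped_bucket(length):
    # Any run length too large for the histogram falls into the last
    # bucket instead of indexing past the end of the array.
    bucket = ((length - 1) >> _Py_UOP_HIST_BIAS).bit_length()
    return min(bucket, _Py_UOP_HIST_SIZE - 1)
```

With these constants, a length of 64 fits exactly in the last bucket, and a dynamically long run (say, a closed loop executed a million times) clamps to that same bucket rather than overflowing.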
Yeah, that's a good idea to add the range-checking.
Also, this made me realize I was counting the wrong thing -- I was recording the "instruction pointer at exit" to be the execution length. But (after discussing in person) it seems what we want is the number of uops executed per execution of the trace. I've updated this to report that instead.
Python/specialize.c
Outdated
print_histogram(FILE *out, const char *name, uint64_t hist[_Py_UOP_HIST_SIZE])
{
    for (int i = 0; i < _Py_UOP_HIST_SIZE; i++) {
        fprintf(out, "%s[%d]: %" PRIu64 "\n", name, (1 << (i + 2)), hist[i]);
Why i + 2? Is it to correct for the >> 2 when recording the stats?
Oh, I think I see. We're starting with the "third" bucket, since tiny traces are basically nonexistent. Maybe add another constant next to _Py_UOP_HIST_SIZE (like #define _Py_UOP_HIST_BIAS 2 or something) so it's clearer why we're doing this and where.
Yes, that's right. Adding the #define for bias makes sense.
Tools/scripts/summarize_stats.py
Outdated
if not key.startswith(prefix):
    continue
-    name, _, rest = key[7:].partition("]")
+    name, _, rest = key[len(prefix) + 2:].partition("]")
Is this correct? The old value was 7, but len("opcode") + 2 is 8.
Good catch. It seems to not matter, but might as well revert it just in case.
Tools/scripts/summarize_stats.py
Outdated
def _emit_comparative_execution_counts(base_rows, head_rows):
    base_data = dict((x[0], x[1:]) for x in base_rows)
    head_data = dict((x[0], x[1:]) for x in head_rows)
    opcodes = set(base_data.keys()) | set(head_data.keys())
Dictionary keys support set operations:
-    opcodes = set(base_data.keys()) | set(head_data.keys())
+    opcodes = base_data.keys() | head_data.keys()
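As the suggestion notes, `dict.keys()` returns a view object that already supports set operations, so the extra `set(...)` conversions are redundant. A small self-contained example (with made-up opcode data):

```python
# dict key views implement the set protocol; "|" on two views yields a
# plain set containing the union of the keys.
base_data = {"LOAD_FAST": (10,), "STORE_FAST": (5,)}
head_data = {"LOAD_FAST": (12,), "LOAD_CONST": (3,)}

opcodes = base_data.keys() | head_data.keys()
assert opcodes == {"LOAD_FAST", "STORE_FAST", "LOAD_CONST"}
```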
    rows = []
    default = [0, "0.0%", "0.0%", 0]
    for opcode in opcodes:
Maybe give these a deterministic (and somewhat meaningful) order?
-    for opcode in opcodes:
+    for opcode in sorted(opcodes):
We could, but we end up sorting by the counts a few lines below anyway...
Tools/scripts/summarize_stats.py
Outdated
            (opcode, base_entry[0], head_entry[0],
             f"{100*change:0.1f}%"))

    rows.sort(key=lambda x: -abs(percentage_to_float(x[-1])))
-    rows.sort(key=lambda x: -abs(percentage_to_float(x[-1])))
+    rows.sort(key=lambda x: abs(percentage_to_float(x[-1])), reverse=True)
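The two spellings sort identically; `reverse=True` just avoids negating the key. A runnable sketch with hypothetical rows (opcode, base count, head count, percentage change), and a minimal stand-in for the script's `percentage_to_float` helper:

```python
def percentage_to_float(s):
    # Minimal stand-in for the helper used in summarize_stats.py:
    # strip the trailing "%" and parse the number.
    return float(s.rstrip("%"))

# Hypothetical comparison rows, for illustration only.
rows = [("LOAD_FAST", 10, 12, "20.0%"),
        ("STORE_FAST", 5, 4, "-25.0%"),
        ("LOAD_CONST", 8, 8, "0.0%")]

# Sort by magnitude of the change, largest first.
rows.sort(key=lambda x: abs(percentage_to_float(x[-1])), reverse=True)
```

After sorting, the rows are ordered STORE_FAST (25.0%), LOAD_FAST (20.0%), LOAD_CONST (0.0%).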
(Also, it looks like CI caught some trailing whitespace.)
This implements a set of Tier 2 pystats mentioned in #109329.
This implements:
It also fixes a "bug" where specialization "hits" in the running Tier 2 interpreter would count against the Tier 1 stats.
Not implemented (since it will require reworking of DEOPT_IF calls):
Example output (for nbody benchmark):
Optimization (Tier 2) stats
  statistics about the Tier 2 optimizer
  Overall stats
  Trace length histogram
  Optimized trace length histogram
  Trace run length histogram
  Uop stats
  Unsupported opcodes