Qualcomm AI Engine Direct - SeqMSE coarse-to-fine grid search #18082

abhinaykukkadapu wants to merge 1 commit into pytorch:main
Conversation
See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18082. As of commit c800be6 with merge base 103deb6: one unrelated broken-trunk failure (the job also failed on the merge base), so the PR can be merged normally; rebasing onto the `viable/strict` branch avoids the failure. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from fc3a8fb to dadb5d0.
Replace brute-force linear grid search in SeqMSE with a two-phase coarse-to-fine approach. The first phase sweeps 100 evenly-spaced steps over [0.01, 1.0]; the second phase refines with the remaining budget in a ±0.02 window around the coarse best.

The `num_candidates` parameter now acts as a total evaluation budget: the first min(num_candidates, 100) evaluations are coarse, the rest are fine. Model configs are updated from 1000 to 150 (100 coarse + 50 fine), giving ~6.5x fewer evaluations with no accuracy loss.

Loss-curve validation on Llama3.2-1B (113 nodes) and Qwen3-0.6B confirms the curves are smooth and unimodal with optima in [0.87, 1.0], so the coarse 0.01 resolution reliably lands in the basin.

This PR was authored with the assistance of Claude.
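A minimal sketch of the budget split described above. The constants (100 coarse steps over [0.01, 1.0], a ±0.02 fine window) follow the PR text, but the function name and the plain-Python `linspace` helper are illustrative stand-ins, not the actual SeqMSE implementation (which uses `torch.linspace`):

```python
# Sketch of the two-phase coarse-to-fine candidate schedule.
# Hypothetical names; constants mirror the PR description.
COARSE_STEPS = 100            # fixed size of the coarse sweep
COARSE_LO, COARSE_HI = 0.01, 1.0
FINE_WINDOW = 0.02            # +/- window around the coarse optimum

def linspace(lo, hi, steps):
    """Plain-Python stand-in for torch.linspace(...).tolist()."""
    if steps == 1:
        return [lo]
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]

def candidate_schedule(num_candidates, coarse_best=None):
    """num_candidates is a total evaluation budget: the first
    min(num_candidates, 100) candidates are coarse, the rest fine.

    First call (coarse_best=None) returns the coarse grid; the second
    call returns the fine grid around the best coarse candidate.
    """
    num_coarse = min(num_candidates, COARSE_STEPS)
    if coarse_best is None:
        return linspace(COARSE_LO, COARSE_HI, num_coarse)
    num_fine = num_candidates - num_coarse
    if num_fine <= 0:
        return []
    lo = max(COARSE_LO, coarse_best - FINE_WINDOW)
    hi = min(COARSE_HI, coarse_best + FINE_WINDOW)
    return linspace(lo, hi, num_fine)
```

With the default budget of 150, this yields a 100-point coarse grid followed by a 50-point fine grid, matching the "100 coarse + 50 fine" allocation in the description.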
Force-pushed from dadb5d0 to c800be6.
```python
self.steps = torch.linspace(
    1 / num_candidates, 1, steps=num_candidates
).tolist()
num_coarse = min(num_candidates, 100)
```
I wonder if we could introduce extra arguments like lower_bound and upper_bound, and iterate num_candidates steps within that range. I think it would be more straightforward and would avoid hardcoded constants.
num_candidates is still the right terminology, as it represents the overall number of MSE steps (150, for example). We consistently use the first 100 as the coarse sweep and the remaining steps as a fine sweep around the selected candidate.
We could expose these as num_coarse_mse_candidates and num_fine_mse_candidates, but I'm not sure anyone would use them, as this is deep in MSE algorithmic territory.
I see. How about using num_candidates for both coarse_steps and fine_steps, for simplicity?
And maybe have the interval value 0.02 as an argument in the constructor?
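The reviewer's suggestion could look something like the sketch below: the fine-search window becomes a constructor argument instead of a hardcoded 0.02. The class and parameter names here are hypothetical, not the actual SeqMSE API:

```python
# Hypothetical sketch of exposing the fine-search interval as a
# constructor argument, per the review suggestion above.
class SeqMseConfig:
    def __init__(self, num_candidates=150, fine_window=0.02):
        # the first min(num_candidates, 100) evaluations are coarse;
        # the remainder refine within +/- fine_window of the coarse best
        self.num_coarse = min(num_candidates, 100)
        self.num_fine = num_candidates - self.num_coarse
        self.fine_window = fine_window
```

Callers who never touch `fine_window` keep today's behavior, while the constant stops being buried in the search routine.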
The `num_candidates` parameter now acts as a total evaluation budget: the first min(num_candidates, 100) evaluations are coarse, the rest are fine. With the default `seq_mse_candidates=150`, this gives 100 coarse + 50 fine steps: the same total number of evaluations, with compute allocated better toward the optimum. See the GitHub issue #18065 for more info.
cc @cccclai @cbilgin