
Qualcomm AI Engine Direct - SeqMSE coarse-to-fine grid search#18082

Open
abhinaykukkadapu wants to merge 1 commit into pytorch:main from abhinaykukkadapu:seqmse-coarse-fine

Conversation


@abhinaykukkadapu abhinaykukkadapu commented Mar 11, 2026

Replace brute-force linear grid search in SeqMSE with a two-phase coarse-to-fine approach. The first phase sweeps 100 evenly-spaced steps over [0.01, 1.0], the second phase refines with remaining budget in a ±0.02 window around the coarse best.

The num_candidates parameter now acts as a total evaluation budget: the first min(num_candidates, 100) evaluations are coarse, the rest are fine. With the default seq_mse_candidates=150, this gives 100 coarse + 50 fine steps — same total evaluations, better allocation of compute toward the optimum.
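
The two-phase schedule described above can be sketched as follows. This is a minimal standalone sketch, not the PR's actual implementation: the `coarse_steps`/`fine_steps` helpers and the pure-Python `linspace` stand-in for `torch.linspace` are illustrative names, and the exact clamping behavior at the range edges is an assumption.

```python
def linspace(lo: float, hi: float, steps: int) -> list[float]:
    # Pure-Python stand-in for torch.linspace: `steps` points, endpoints inclusive.
    if steps == 1:
        return [lo]
    h = (hi - lo) / (steps - 1)
    return [lo + i * h for i in range(steps)]

def coarse_steps(num_candidates: int) -> list[float]:
    # Phase 1: at most 100 evenly spaced candidates over [0.01, 1.0].
    num_coarse = min(num_candidates, 100)
    return linspace(0.01, 1.0, num_coarse)

def fine_steps(best: float, num_fine: int, window: float = 0.02) -> list[float]:
    # Phase 2: spend the remaining budget in a +/-window band around the
    # coarse optimum, clamped (an assumption) to the overall [0.01, 1.0] range.
    lo = max(0.01, best - window)
    hi = min(1.0, best + window)
    return linspace(lo, hi, num_fine)

budget = 150  # default seq_mse_candidates
coarse = coarse_steps(budget)
fine = fine_steps(best=0.95, num_fine=budget - len(coarse))
print(len(coarse), len(fine))  # 100 50
```

With the default budget of 150 this yields exactly the 100 coarse + 50 fine split described above; a budget below 100 spends everything on the coarse sweep.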

See GitHub issue #18065 for more details.

cc @cccclai @cbilgin

@abhinaykukkadapu abhinaykukkadapu added the module: qnn Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/ label Mar 11, 2026

pytorch-bot bot commented Mar 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18082

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c800be6 with merge base 103deb6 (image):

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 11, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Replace brute-force linear grid search in SeqMSE with a two-phase
coarse-to-fine approach. The first phase sweeps 100 evenly-spaced
steps over [0.01, 1.0], the second phase refines with remaining
budget in a ±0.02 window around the coarse best.

The `num_candidates` parameter now acts as a total evaluation budget:
the first min(num_candidates, 100) evaluations are coarse, the rest
are fine. Model configs updated from 1000 to 150 (100 coarse + 50
fine), giving ~6.5x fewer evaluations with no accuracy loss.

Loss curve validation on Llama3.2-1B (113 nodes) and Qwen3-0.6B
confirms curves are smooth and unimodal with optima in [0.87, 1.0],
so coarse 0.01-resolution reliably lands in the basin.

This PR was authored with the assistance of Claude.
self.steps = torch.linspace(
    1 / num_candidates, 1, steps=num_candidates
).tolist()
num_coarse = min(num_candidates, 100)
Collaborator
I wonder if we could introduce extra arguments like the lower_bound, upper_bound and iterate num_candidates steps within the range. I think it would be more straightforward and prevent hardcoded constants.

Contributor Author

@abhinaykukkadapu abhinaykukkadapu commented Mar 11, 2026

num_candidates is still the right terminology, as it represents the overall number of MSE steps (150, for example). We consistently use the first 100 as the coarse sweep and the remainder as a fine sweep around the selected candidate.

We could expose these as num_coarse_mse_candidates and num_fine_mse_candidates, but I'm not sure anyone would use them, as they sit deep in MSE algorithmic territory.

Collaborator

I see. How about using num_candidates for both coarse_steps and fine_steps, for simplicity?
And maybe take the 0.02 interval value as a constructor argument?
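
A constructor along the lines of this suggestion might look like the following. This is purely a hypothetical sketch: the class name `SeqMseSearch` and the parameter names `lower_bound`, `upper_bound`, and `fine_window` are illustrative, not part of the PR.

```python
class SeqMseSearch:
    """Hypothetical sketch of the suggested API: expose the search range
    and the refinement window instead of hardcoding 0.01, 1.0, and 0.02."""

    def __init__(
        self,
        num_candidates: int = 150,
        lower_bound: float = 0.01,
        upper_bound: float = 1.0,
        fine_window: float = 0.02,
    ):
        # Budget split as in the PR: the first min(num_candidates, 100)
        # evaluations are coarse, the remainder refine around the optimum.
        self.num_coarse = min(num_candidates, 100)
        self.num_fine = num_candidates - self.num_coarse
        self.lower_bound = lower_bound
        self.upper_bound = upper_bound
        self.fine_window = fine_window

search = SeqMseSearch()
print(search.num_coarse, search.num_fine)  # 100 50
```

Keeping a single num_candidates budget preserves backward compatibility, while the extra keyword arguments remove the hardcoded constants the reviewer flagged.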
