remove redundant empty_cache in parallel forward #161
Summary of Changes
Hello @akaitsuki-ii, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
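This pull request aims to significantly improve inference speed by streamlining memory-management operations in parallel processing. The core change removes the unnecessary empty_cache() and torch.cuda.synchronize() calls that ran after every forward pass, which reduces wasted compute and synchronization latency. Memory cleanup now runs only when a module is unloaded, preserving both efficiency and correctness.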
Highlights
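- Faster inference: Removed the redundant empty_cache() call from the parallel forward pass and moved it to module unloading, avoiding unnecessary performance overhead.
- Memory management: Removed the torch.cuda.synchronize() call after each iteration, further reducing synchronization overhead.
- Module loading logic: Adjusted argument handling during module loading so that to_device is no longer called on init_fn and kwargs, simplifying data passing.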
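The pattern described in the highlights can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `ModuleWorker`, `release_cached_memory`, and the `cache_clears` counter are hypothetical names introduced here, and only `unload_module`, `init_fn`, `kwargs`, `torch.cuda.empty_cache()`, and `torch.cuda.synchronize()` come from the summary above. The sketch assumes a worker that holds one module at a time and falls back to a no-op when PyTorch or CUDA is unavailable.

```python
import gc

try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None


def release_cached_memory() -> bool:
    """Hypothetical helper: return cached CUDA blocks to the driver.

    Returns True only if a CUDA cache was actually cleared.
    """
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


class ModuleWorker:
    """Hypothetical worker that owns one module and runs forward passes."""

    def __init__(self):
        self.module = None
        self.cache_clears = 0  # counts cleanup calls, for illustration only

    def load_module(self, init_fn, **kwargs):
        # Assumption: init_fn builds the module from plain keyword arguments.
        self.module = init_fn(**kwargs)

    def forward(self, *args, **kwargs):
        # Hot path: no empty_cache() and no torch.cuda.synchronize() here,
        # so the caching allocator can reuse blocks across iterations.
        return self.module(*args, **kwargs)

    def unload_module(self):
        # Cold path: drop the reference first, then release cached memory once.
        self.module = None
        release_cached_memory()
        self.cache_clears += 1
```

With this layout, any number of `forward` calls triggers no cache cleanup, and a single cleanup happens when `unload_module` is called.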
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Calling empty_cache during every parallel forward pass slows down inference; it only needs to be called in unload_module.