Summary of Changes
Hello @Glaceon-Hyy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the capabilities of the Qwen image generation model by integrating advanced image control features, specifically 'in-context' and 'eligen' control types. These additions allow users to guide image generation with greater precision, leveraging input images or entity-specific prompts and masks. The changes also include enhancements to the noise scheduling and LoRA application mechanisms, enabling dynamic configuration adjustments to support these new control modalities. A new example and comprehensive tests have been added to demonstrate and ensure the robustness of these new features.
Code Review
This pull request introduces a powerful image control feature for the Qwen model, supporting both "in-context" and "eligen" control types. This is achieved by adding new configuration classes, extending the LoRA loading mechanism to handle scheduler configurations, and significantly updating the Qwen DiT model and pipeline to process control inputs like context latents and entity masks. The changes are well-structured, and a new example for the eligen feature is included. My review focuses on improving type safety, fixing a potential bug in the noise scheduler, ensuring code consistency, and enhancing test coverage for the new features.
@dataclass
class QwenImageControlNetParams:
    image: ImageType
The image attribute of QwenImageControlNetParams is typed as ImageType, but the new eligen control type allows it to be None (as seen in examples/qwen_image_eligen.py). This is a type hint violation and can lead to confusion or errors.
To reflect its actual usage, the type hint should mark image as optional (importing Optional from typing if the module does not already).
Suggested change:
-    image: ImageType
+    image: Optional[ImageType]
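A minimal sketch of the suggested signature in isolation. ImageType and QwenImageControlType below are stand-ins for the real diffsynth_engine definitions, the field order and defaults are assumptions, and the __post_init__ guard (only eligen may omit the image) is a hypothetical addition, not something this PR contains.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical stand-ins; the real types live in diffsynth_engine.
ImageType = object


class QwenImageControlType(Enum):
    in_context = "in_context"
    eligen = "eligen"


@dataclass
class QwenImageControlNetParams:
    control_type: QwenImageControlType
    model: str                           # path returned by fetch_model(...) in the example
    image: Optional[ImageType] = None    # eligen passes image=None
    scale: float = 1.0

    def __post_init__(self):
        # Assumed rule: only eligen is allowed to run without a control image.
        if self.image is None and self.control_type is not QwenImageControlType.eligen:
            raise ValueError(f"{self.control_type.name} control requires an image")
```

With the Optional hint, a type checker accepts image=None, and a guard like the one above catches a missing image for control types that actually need one.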
 image = self.proj_out(image)
-if edit is not None:
-    image = image[:, :image_seq_len]
+image = image[:, :image_seq_len]
The logic for slicing the image tensor was previously conditional on edit is not None. Making it unconditional is a good simplification: image_seq_len is the sequence length of the original image latents before anything is concatenated, so slicing to it drops whatever was appended, whether that is context_latents, edit, or both. This holds as long as every extra sequence, including context_latents, is appended after the image tokens, which is what the current implementation does.
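A toy sketch of why the unconditional slice is safe, using nested Python lists in place of [batch, seq_len, dim] tensors. dit_forward and concat_seq are hypothetical names, and the append order of context_latents and edit is an assumption; the argument only needs both to come after the image tokens.

```python
from itertools import chain
from typing import List, Optional

# Stand-in for a [batch, seq_len, dim] tensor: one list of token labels per batch item.
Tokens = List[List[str]]


def concat_seq(*parts: Tokens) -> Tokens:
    """Concatenate along the sequence axis, like torch.cat(parts, dim=1)."""
    return [list(chain.from_iterable(p[b] for p in parts)) for b in range(len(parts[0]))]


def dit_forward(image: Tokens, context_latents: Optional[Tokens] = None,
                edit: Optional[Tokens] = None) -> Tokens:
    image_seq_len = len(image[0])  # measured before anything is appended
    extras = [x for x in (context_latents, edit) if x is not None]
    hidden = concat_seq(image, *extras)  # image tokens always come first
    # ... transformer blocks and proj_out would run here, keeping token positions ...
    # Unconditional slice, the list equivalent of hidden[:, :image_seq_len]
    return [row[:image_seq_len] for row in hidden]
```

Whatever combination of extras is present, the slice returns exactly the positions that held image tokens, so no edit is not None branch is needed.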
lora_args["alpha"] = alpha

key = key.replace(f".{lora_a_suffix}", "")
key = key.replace("base_model.model.", "")
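A sketch of what the two replace calls achieve: turning PEFT-style LoRA keys into bare module names. group_lora_keys is a hypothetical helper, and the default suffixes "lora_A.weight" / "lora_B.weight" are assumptions about what lora_a_suffix holds.

```python
from typing import Dict, Iterable, Tuple


def group_lora_keys(
    keys: Iterable[str],
    lora_a_suffix: str = "lora_A.weight",  # assumed PEFT-style suffixes
    lora_b_suffix: str = "lora_B.weight",
) -> Dict[str, Tuple[str, str]]:
    """Map each target module name to its (down, up) LoRA weight keys."""
    groups = {}
    for key in keys:
        if not key.endswith(f".{lora_a_suffix}"):
            continue  # visit each module once, via its lora_A key
        up_key = key[: -len(lora_a_suffix)] + lora_b_suffix
        # Same normalization as the diff: drop the suffix, then the PEFT wrapper prefix.
        name = key.replace(f".{lora_a_suffix}", "")
        name = name.replace("base_model.model.", "")
        groups[name] = (key, up_key)
    return groups
```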
# FastSafetensors不直接支持metadata,需要用标准safetensors获取
from safetensors import safe_open
The comment here is in Chinese, which is inconsistent with the rest of the codebase being in English. For better maintainability and to ensure all developers can understand the code, please write comments in English.
Suggested change:
-# FastSafetensors不直接支持metadata,需要用标准safetensors获取
-from safetensors import safe_open
+# FastSafetensors does not directly support metadata, so we need to use standard safetensors to get it.
+from safetensors import safe_open
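For context on why a second reader is needed: the metadata that safe_open exposes through its metadata() method is stored in the file's JSON header. A stdlib-only sketch that reads it directly, illustrating the file layout; read_safetensors_metadata is a hypothetical helper, and in the PR itself safe_open remains the right call.

```python
import json
import struct
from typing import Dict


def read_safetensors_metadata(path: str) -> Dict[str, str]:
    """Return the optional "__metadata__" map from a .safetensors header.

    Layout: an 8-byte little-endian unsigned header length N, then N bytes of
    UTF-8 JSON describing each tensor, plus an optional "__metadata__" entry.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})  # {} when the file carries no metadata
```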
from PIL import Image
from diffsynth_engine import (
    fetch_model,
    QwenImagePipeline,
    QwenImagePipelineConfig,
    QwenImageControlNetParams,
    QwenImageControlType,
)
if __name__ == "__main__":
    config = QwenImagePipelineConfig.basic_config(
        model_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="transformer/*.safetensors"),
        encoder_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="text_encoder/*.safetensors"),
        vae_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="vae/*.safetensors"),
    )
    pipe = QwenImagePipeline.from_pretrained(config)
    param = QwenImageControlNetParams(
        control_type=QwenImageControlType.eligen,
        image=None,
        scale=1.0,
        model=fetch_model("DiffSynth-Studio/Qwen-Image-EliGen-V2", path="model.safetensors"),
    )
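    # English: "Realistic photography style, richly detailed. A beautiful girl on the street, wearing a shirt and
    # shorts, holding a sign that reads 'Entity Control'; behind her a busy city street, bright sun, pedestrians hurrying by."
    prompt = "写实摄影风格, 细节丰富。街头一位漂亮的女孩,穿着衬衫和短裤,手持写有“实体控制”的标牌,背景是繁忙的城市街道,阳光明媚,行人匆匆。"
    # English: "gridded, regular grid, blurry, low resolution, low quality, deformed, malformed, wrong anatomy,
    # deformed hands, deformed body, deformed face, deformed hair, deformed eyes, deformed mouth"
    negative_prompt = "网格化,规则的网格,模糊, 低分辨率, 低质量, 变形, 畸形, 错误的解剖学, 变形的手, 变形的身体, 变形的脸, 变形的头发, 变形的眼睛, 变形的嘴巴"
    # English: ["a beautiful girl", "sign 'Entity Control'", "shorts", "shirt"]; entity_masks[i] pairs with entity_prompts[i]
    entity_prompts = ["一个漂亮的女孩", "标牌 '实体控制'", "短裤", "衬衫"]
    entity_masks = [Image.open(f"input/qwen_image_eligen/{i}.png").convert("RGB") for i in range(4)]
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        entity_prompts=entity_prompts,
        entity_masks=entity_masks,
        cfg_scale=4.0,
        width=1024,
        height=1024,
        num_inference_steps=40,
        seed=42,
        controlnet_params=param,
    )
    image.save("qwen_image_eligen.png")
    del pipe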
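If the input/qwen_image_eligen/*.png files are not at hand, entity masks can be drawn programmatically. This is a sketch assuming that white marks an entity's region and black is background, and that masks match the 1024x1024 output size; box_mask and the box coordinates are hypothetical.

```python
from PIL import Image, ImageDraw


def box_mask(width, height, box):
    """White rectangle on a black RGB canvas; box is (x0, y0, x1, y1), inclusive."""
    mask = Image.new("RGB", (width, height), (0, 0, 0))
    ImageDraw.Draw(mask).rectangle(box, fill=(255, 255, 255))
    return mask


# Hypothetical layout, one box per entity prompt, in the same order as entity_prompts.
entity_boxes = [
    (256, 64, 767, 1023),   # the girl
    (320, 384, 703, 639),   # the sign
    (352, 640, 671, 895),   # shorts
    (320, 256, 703, 639),   # shirt
]
entity_masks = [box_mask(1024, 1024, b) for b in entity_boxes]
```

The resulting list can be passed as entity_masks in place of the loaded PNGs.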
@@ -0,0 +1,61 @@
import unittest
This new test file is a good start for verifying the ControlNet functionality. However, it currently only includes tests for the in_context control type. To ensure full coverage of the new features, please consider adding a test case for the eligen control type as well. This will help prevent future regressions.
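A sketch of what an eligen case could look like, reusing the pipeline calls from examples/qwen_image_eligen.py. The class name, the English prompts, the skip conditions and the image.size check are assumptions; it also assumes the pipeline returns a PIL image (the example calls image.save on it). It skips unless diffsynth_engine, Pillow and the example's mask files are available.

```python
import importlib.util
import os
import unittest

MASK_DIR = "input/qwen_image_eligen"  # same mask files as the example
HAS_DEPS = all(importlib.util.find_spec(m) is not None for m in ("diffsynth_engine", "PIL"))
HAS_MASKS = all(os.path.exists(os.path.join(MASK_DIR, f"{i}.png")) for i in range(4))


@unittest.skipUnless(HAS_DEPS and HAS_MASKS, "needs diffsynth_engine, Pillow and the eligen masks")
class TestQwenImageEligen(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        from diffsynth_engine import fetch_model, QwenImagePipeline, QwenImagePipelineConfig

        config = QwenImagePipelineConfig.basic_config(
            model_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="transformer/*.safetensors"),
            encoder_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="text_encoder/*.safetensors"),
            vae_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="vae/*.safetensors"),
        )
        cls.pipe = QwenImagePipeline.from_pretrained(config)

    @classmethod
    def tearDownClass(cls):
        del cls.pipe

    def test_eligen(self):
        from PIL import Image
        from diffsynth_engine import fetch_model, QwenImageControlNetParams, QwenImageControlType

        param = QwenImageControlNetParams(
            control_type=QwenImageControlType.eligen,
            image=None,
            scale=1.0,
            model=fetch_model("DiffSynth-Studio/Qwen-Image-EliGen-V2", path="model.safetensors"),
        )
        masks = [Image.open(os.path.join(MASK_DIR, f"{i}.png")).convert("RGB") for i in range(4)]
        image = self.pipe(
            prompt="A girl on a busy city street holding a sign that reads 'Entity Control'",
            negative_prompt="blurry, low quality, deformed",
            entity_prompts=["a girl", "a sign reading 'Entity Control'", "shorts", "shirt"],
            entity_masks=masks,
            cfg_scale=4.0,
            width=1024,
            height=1024,
            num_inference_steps=40,
            seed=42,
            controlnet_params=param,
        )
        # Smoke check only; comparing against a stored reference image would be stricter.
        self.assertEqual(image.size, (1024, 1024))
```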
self._initial_params = {}

def store_initial_config(self):
    for attr_name in dir(self):
I'd suggest using vars(self) here instead; that way you don't need to check whether each attribute is a function.
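A sketch of the vars(self) version the comment suggests. ConfigLike, its shift / num_steps fields and restore_initial_config are hypothetical; only _initial_params and store_initial_config come from the diff.

```python
import copy


class ConfigLike:
    def __init__(self, shift=1.0, num_steps=40):
        self.shift = shift            # hypothetical scheduler-style fields
        self.num_steps = num_steps
        self._initial_params = {}

    def store_initial_config(self):
        # vars(self) is the instance __dict__: data attributes only, so methods
        # (and dunder names that dir() would list) never show up.
        self._initial_params = {
            k: copy.deepcopy(v) for k, v in vars(self).items() if k != "_initial_params"
        }

    def restore_initial_config(self):
        for k, v in self._initial_params.items():
            setattr(self, k, copy.deepcopy(v))
```

One caveat with the switch: vars(self) does not see class-level attributes or properties, which dir(self) does, so this only works if every value to snapshot is set on the instance (as dataclass fields are, unless slots are used).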
No description provided.