
Feature/qwen image control #176

Merged
akaitsuki-ii merged 7 commits into main from feature/qwen_image_control
Sep 28, 2025
Conversation

@Glaceon-Hyy
Member

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Glaceon-Hyy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the capabilities of the Qwen image generation model by integrating advanced image control features, specifically 'in-context' and 'eligen' control types. These additions allow users to guide image generation with greater precision, leveraging input images or entity-specific prompts and masks. The changes also include enhancements to the noise scheduling and LoRA application mechanisms, enabling dynamic configuration adjustments to support these new control modalities. A new example and comprehensive tests have been added to demonstrate and ensure the robustness of these new features.

Highlights

  • Qwen Image Control Integration: Introduced 'in-context' and 'eligen' image control types for the Qwen image generation pipeline, allowing for more precise guidance during image synthesis.
  • Flexible LoRA Configuration: Enhanced LoRA loading to support LoraConfig objects, enabling dynamic adjustment of noise scheduler parameters when applying LoRA models.
  • Noise Scheduler Enhancements: Added methods to BaseScheduler and RecifitedFlowScheduler for storing, updating, and restoring scheduler configurations, providing greater control over the diffusion process.
  • Fine-grained Attention Control: Modified the Qwen image DIT model to accept and process entity-specific attention masks and context latents, which are crucial for the new image control features.
  • New Examples and Tests: Included a new example script for 'eligen' control and dedicated unit tests for 'in-context' control to validate the new functionalities.
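The scheduler store/update/restore mechanism highlighted above can be sketched roughly as follows. This is a minimal illustration of the pattern: the method names mirror the PR description, but the attribute names and implementation details are assumptions, not the actual DiffSynth-Engine code.

```python
class BaseSchedulerSketch:
    """Minimal sketch of a scheduler that can snapshot and restore its config."""

    def __init__(self, shift=3.0, num_train_timesteps=1000):
        self.shift = shift
        self.num_train_timesteps = num_train_timesteps
        self._initial_params = {}

    def store_initial_config(self):
        # Snapshot current public data attributes before any overrides.
        self._initial_params = {
            k: v for k, v in vars(self).items() if not k.startswith("_")
        }

    def update_config(self, **kwargs):
        # Apply overrides, e.g. when a LoRA ships its own scheduler settings.
        for k, v in kwargs.items():
            setattr(self, k, v)

    def restore_initial_config(self):
        # Revert to the snapshot taken before the LoRA was applied.
        for k, v in self._initial_params.items():
            setattr(self, k, v)


scheduler = BaseSchedulerSketch()
scheduler.store_initial_config()
scheduler.update_config(shift=1.0)
scheduler.restore_initial_config()
print(scheduler.shift)  # back to 3.0
```

The design choice here is that a LoRA can temporarily reconfigure the diffusion schedule and be cleanly unloaded without leaking its settings into subsequent runs.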
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.


Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a powerful image control feature for the Qwen model, supporting both "in-context" and "eligen" control types. This is achieved by adding new configuration classes, extending the LoRA loading mechanism to handle scheduler configurations, and significantly updating the Qwen DiT model and pipeline to process control inputs like context latents and entity masks. The changes are well-structured, and a new example for the eligen feature is included. My review focuses on improving type safety, fixing a potential bug in the noise scheduler, ensuring code consistency, and enhancing test coverage for the new features.


@dataclass
class QwenImageControlNetParams:
image: ImageType

medium

The image attribute of QwenImageControlNetParams is typed as ImageType, but the new eligen control type allows it to be None (as seen in examples/qwen_image_eligen.py). This is a type hint violation and can lead to confusion or errors.

To accurately reflect its usage, the type hint should be updated to indicate that image is optional.

Suggested change
image: ImageType
image: Optional[ImageType]
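For illustration, the suggested change amounts to the following pattern. This is a hedged sketch: `ImageType` is stood in by a placeholder, and the field names are modeled on the PR discussion rather than copied from the actual class.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder standing in for the real ImageType (e.g. a PIL.Image.Image).
ImageType = object


@dataclass
class ControlNetParamsSketch:
    # Optional[...] signals that eligen control may legitimately pass image=None,
    # while in-context control supplies a real image.
    image: Optional[ImageType] = None
    scale: float = 1.0


params = ControlNetParamsSketch(image=None, scale=1.0)
print(params.image is None)  # True
```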

image = self.proj_out(image)
if edit is not None:
image = image[:, :image_seq_len]
image = image[:, :image_seq_len]

medium

The slicing of the image tensor was previously conditional on edit is not None. Making it unconditional is a sound simplification: image_seq_len is the sequence length of the original image latents before any concatenation, so after context_latents and edit are concatenated onto image, the unconditional slice always extracts exactly the original image part.
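The concatenate-then-slice idea can be illustrated with plain Python sequences (a sketch of the principle, not the actual tensor code, which operates on the sequence dimension of batched tensors):

```python
# Original image tokens and extra control/context tokens.
image_latents = ["img0", "img1", "img2"]
context_latents = ["ctx0", "ctx1"]
image_seq_len = len(image_latents)  # recorded BEFORE concatenation

# Concatenate for joint attention (stands in for torch.cat along the
# sequence dimension).
combined = image_latents + context_latents

# The unconditional slice recovers exactly the original image tokens,
# whether or not context/edit latents were appended.
image_out = combined[:image_seq_len]
print(image_out)  # ['img0', 'img1', 'img2']
```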

lora_args["alpha"] = alpha

key = key.replace(f".{lora_a_suffix}", "")
key = key.replace("base_model.model.", "")

medium

Adding key = key.replace("base_model.model.", "") is a good way to handle variations in LoRA state dictionary keys, improving compatibility with models from different sources. This makes the LoRA loading more robust.
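The key-normalization idea can be sketched as below. The suffix name and example key are illustrative; the real loader derives lora_a_suffix from the checkpoint format.

```python
def normalize_lora_key(key: str, lora_a_suffix: str = "lora_A.weight") -> str:
    """Strip LoRA-specific affixes so keys match the base model's state dict."""
    key = key.replace(f".{lora_a_suffix}", "")
    # PEFT-exported checkpoints often prefix every key with "base_model.model.".
    key = key.replace("base_model.model.", "")
    return key


print(normalize_lora_key("base_model.model.blocks.0.attn.q.lora_A.weight"))
# blocks.0.attn.q
```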

Comment on lines +30 to +31
# FastSafetensors不直接支持metadata,需要用标准safetensors获取
from safetensors import safe_open

medium

The comment here is in Chinese, which is inconsistent with the rest of the codebase being in English. For better maintainability and to ensure all developers can understand the code, please write comments in English.

Suggested change
# FastSafetensors不直接支持metadata,需要用标准safetensors获取
from safetensors import safe_open
# FastSafetensors does not directly support metadata, so we need to use standard safetensors to get it.
from safetensors import safe_open

Comment on lines +1 to +42
from PIL import Image
from diffsynth_engine import (
fetch_model,
QwenImagePipeline,
QwenImagePipelineConfig,
QwenImageControlNetParams,
QwenImageControlType,
)


if __name__ == "__main__":
config = QwenImagePipelineConfig.basic_config(
model_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="transformer/*.safetensors"),
encoder_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="text_encoder/*.safetensors"),
vae_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="vae/*.safetensors"),
)
pipe = QwenImagePipeline.from_pretrained(config)
param = QwenImageControlNetParams(
control_type=QwenImageControlType.eligen,
image=None,
scale=1.0,
model=fetch_model("DiffSynth-Studio/Qwen-Image-EliGen-V2", path="model.safetensors"),
)

prompt = "写实摄影风格, 细节丰富。街头一位漂亮的女孩,穿着衬衫和短裤,手持写有“实体控制”的标牌,背景是繁忙的城市街道,阳光明媚,行人匆匆。"
negative_prompt = "网格化,规则的网格,模糊, 低分辨率, 低质量, 变形, 畸形, 错误的解剖学, 变形的手, 变形的身体, 变形的脸, 变形的头发, 变形的眼睛, 变形的嘴巴"
entity_prompts = ["一个漂亮的女孩", "标牌 '实体控制'", "短裤", "衬衫"]
entity_masks = [Image.open(f"input/qwen_image_eligen/{i}.png").convert("RGB") for i in range(4)]
image = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
entity_prompts=entity_prompts,
entity_masks=entity_masks,
cfg_scale=4.0,
width=1024,
height=1024,
num_inference_steps=40,
seed=42,
controlnet_params=param,
)
image.save("qwen_image_eligen.png")
del pipe

medium

This is a great example file that clearly demonstrates how to use the new eligen feature. It will be very helpful for users looking to adopt this functionality.

@@ -0,0 +1,61 @@
import unittest

medium

This new test file is a good start for verifying the ControlNet functionality. However, it currently only includes tests for the in_context control type. To ensure full coverage of the new features, please consider adding a test case for the eligen control type as well. This will help prevent future regressions.
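One possible shape for such a test, sketched with a stubbed pipeline. Everything below is a hypothetical skeleton: a real test would construct QwenImagePipeline and QwenImageControlNetParams with the eligen control type as in the example script, rather than a mock.

```python
import unittest
from unittest.mock import MagicMock


class TestQwenImageEligen(unittest.TestCase):
    def test_eligen_generates_image(self):
        # Stub standing in for a real QwenImagePipeline instance.
        pipe = MagicMock(return_value="generated-image")
        result = pipe(
            prompt="a girl holding a sign",
            entity_prompts=["a girl", "a sign"],
            entity_masks=[object(), object()],
        )
        # The pipeline should be invoked once with entity inputs
        # and return a non-None image.
        pipe.assert_called_once()
        self.assertIsNotNone(result)
```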

self._initial_params = {}

def store_initial_config(self):
for attr_name in dir(self):

Suggestion: vars(self) would be better here; then there is no need to check whether each attribute is a function.
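The reviewer's point can be seen in a small sketch: vars(self) yields only instance data attributes, so no callable filtering is needed, whereas dir(self) also lists methods and dunders. The class below is illustrative, not the PR's implementation.

```python
class SchedulerSketch:
    def __init__(self):
        self.shift = 3.0
        self.sigma_min = 0.0

    def step(self):
        pass


s = SchedulerSketch()

# dir() mixes data attributes with methods and dunders; filtering is needed.
via_dir = [n for n in dir(s)
           if not n.startswith("_") and not callable(getattr(s, n))]

# vars() returns the instance __dict__: data attributes only, no filtering.
via_vars = sorted(vars(s))

print(via_dir)   # ['shift', 'sigma_min']
print(via_vars)  # ['shift', 'sigma_min']
```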

@akaitsuki-ii akaitsuki-ii merged commit 0acf4cb into main Sep 28, 2025
@akaitsuki-ii akaitsuki-ii deleted the feature/qwen_image_control branch September 28, 2025 09:55
