
Feature/qwen image control #176

Merged
akaitsuki-ii merged 7 commits into main from feature/qwen_image_control
Sep 28, 2025
Conversation

@Glaceon-Hyy
Member

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Glaceon-Hyy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the capabilities of the Qwen image generation model by integrating advanced image control features, specifically 'in-context' and 'eligen' control types. These additions allow users to guide image generation with greater precision, leveraging input images or entity-specific prompts and masks. The changes also include enhancements to the noise scheduling and LoRA application mechanisms, enabling dynamic configuration adjustments to support these new control modalities. A new example and comprehensive tests have been added to demonstrate and ensure the robustness of these new features.

Highlights

  • Qwen Image Control Integration: Introduced 'in-context' and 'eligen' image control types for the Qwen image generation pipeline, allowing for more precise guidance during image synthesis.
  • Flexible LoRA Configuration: Enhanced LoRA loading to support LoraConfig objects, enabling dynamic adjustment of noise scheduler parameters when applying LoRA models.
  • Noise Scheduler Enhancements: Added methods to BaseScheduler and RecifitedFlowScheduler for storing, updating, and restoring scheduler configurations, providing greater control over the diffusion process.
  • Fine-grained Attention Control: Modified the Qwen image DIT model to accept and process entity-specific attention masks and context latents, which are crucial for the new image control features.
  • New Examples and Tests: Included a new example script for 'eligen' control and dedicated unit tests for 'in-context' control to validate the new functionalities.
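The scheduler store/update/restore mechanism highlighted above can be sketched roughly as follows. This is a minimal illustration of the pattern: the method names mirror the PR description, but the attribute names and implementation details are assumptions, not the actual DiffSynth-Engine code.

```python
class BaseSchedulerSketch:
    """Minimal sketch of a scheduler that can snapshot and restore its config."""

    def __init__(self, shift=3.0, num_train_timesteps=1000):
        self.shift = shift
        self.num_train_timesteps = num_train_timesteps
        self._initial_params = {}

    def store_initial_config(self):
        # Snapshot current public data attributes before any overrides.
        self._initial_params = {
            k: v for k, v in vars(self).items() if not k.startswith("_")
        }

    def update_config(self, **kwargs):
        # Apply overrides, e.g. when a LoRA ships its own scheduler settings.
        for k, v in kwargs.items():
            setattr(self, k, v)

    def restore_initial_config(self):
        # Revert to the snapshot taken before the LoRA was applied.
        for k, v in self._initial_params.items():
            setattr(self, k, v)


scheduler = BaseSchedulerSketch()
scheduler.store_initial_config()
scheduler.update_config(shift=1.0)
scheduler.restore_initial_config()
print(scheduler.shift)  # back to 3.0
```

The design choice here is that a LoRA can temporarily reconfigure the diffusion schedule and be cleanly unloaded without leaking its settings into subsequent runs.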
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.


Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a powerful image control feature for the Qwen model, supporting both "in-context" and "eligen" control types. This is achieved by adding new configuration classes, extending the LoRA loading mechanism to handle scheduler configurations, and significantly updating the Qwen DiT model and pipeline to process control inputs like context latents and entity masks. The changes are well-structured, and a new example for the eligen feature is included. My review focuses on improving type safety, fixing a potential bug in the noise scheduler, ensuring code consistency, and enhancing test coverage for the new features.


@dataclass
class QwenImageControlNetParams:
image: ImageType

medium

The image attribute of QwenImageControlNetParams is typed as ImageType, but the new eligen control type allows it to be None (as seen in examples/qwen_image_eligen.py). This is a type hint violation and can lead to confusion or errors.

To accurately reflect its usage, the type hint should be updated to indicate that image is optional.

Suggested change
image: ImageType
image: Optional[ImageType]
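For illustration, the suggested change amounts to the following pattern. This is a hedged sketch: `ImageType` is stood in by a placeholder, and the field names are modeled on the PR discussion rather than copied from the actual class.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder standing in for the real ImageType (e.g. a PIL.Image.Image).
ImageType = object


@dataclass
class ControlNetParamsSketch:
    # Optional[...] signals that eligen control may legitimately pass image=None,
    # while in-context control supplies a real image.
    image: Optional[ImageType] = None
    scale: float = 1.0


params = ControlNetParamsSketch(image=None, scale=1.0)
print(params.image is None)  # True
```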

image = self.proj_out(image)
if edit is not None:
image = image[:, :image_seq_len]
image = image[:, :image_seq_len]

medium

The slicing of the image tensor was previously conditional on edit is not None. Making it unconditional is a sound simplification: image_seq_len is the sequence length of the original image latents before any concatenation, so after context_latents and edit are concatenated onto image, the unconditional slice always extracts exactly the original image part.
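The concatenate-then-slice idea can be illustrated with plain Python sequences (a sketch of the principle, not the actual tensor code, which operates on the sequence dimension of batched tensors):

```python
# Original image tokens and extra control/context tokens.
image_latents = ["img0", "img1", "img2"]
context_latents = ["ctx0", "ctx1"]
image_seq_len = len(image_latents)  # recorded BEFORE concatenation

# Concatenate for joint attention (stands in for torch.cat along the
# sequence dimension).
combined = image_latents + context_latents

# The unconditional slice recovers exactly the original image tokens,
# whether or not context/edit latents were appended.
image_out = combined[:image_seq_len]
print(image_out)  # ['img0', 'img1', 'img2']
```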

lora_args["alpha"] = alpha

key = key.replace(f".{lora_a_suffix}", "")
key = key.replace("base_model.model.", "")

medium

Adding key = key.replace("base_model.model.", "") is a good way to handle variations in LoRA state dictionary keys, improving compatibility with models from different sources. This makes the LoRA loading more robust.
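The key-normalization idea can be sketched as below. The suffix name and example key are illustrative; the real loader derives lora_a_suffix from the checkpoint format.

```python
def normalize_lora_key(key: str, lora_a_suffix: str = "lora_A.weight") -> str:
    """Strip LoRA-specific affixes so keys match the base model's state dict."""
    key = key.replace(f".{lora_a_suffix}", "")
    # PEFT-exported checkpoints often prefix every key with "base_model.model.".
    key = key.replace("base_model.model.", "")
    return key


print(normalize_lora_key("base_model.model.blocks.0.attn.q.lora_A.weight"))
# blocks.0.attn.q
```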

Comment on lines +30 to +31
# FastSafetensors不直接支持metadata,需要用标准safetensors获取
from safetensors import safe_open

medium

The comment here is in Chinese, which is inconsistent with the rest of the codebase being in English. For better maintainability and to ensure all developers can understand the code, please write comments in English.

Suggested change
# FastSafetensors不直接支持metadata,需要用标准safetensors获取
from safetensors import safe_open
# FastSafetensors does not directly support metadata, so we need to use standard safetensors to get it.
from safetensors import safe_open

Comment on lines +1 to +42
from PIL import Image
from diffsynth_engine import (
fetch_model,
QwenImagePipeline,
QwenImagePipelineConfig,
QwenImageControlNetParams,
QwenImageControlType,
)


if __name__ == "__main__":
config = QwenImagePipelineConfig.basic_config(
model_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="transformer/*.safetensors"),
encoder_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="text_encoder/*.safetensors"),
vae_path=fetch_model("MusePublic/Qwen-image", revision="v1", path="vae/*.safetensors"),
)
pipe = QwenImagePipeline.from_pretrained(config)
param = QwenImageControlNetParams(
control_type=QwenImageControlType.eligen,
image=None,
scale=1.0,
model=fetch_model("DiffSynth-Studio/Qwen-Image-EliGen-V2", path="model.safetensors"),
)

prompt = "写实摄影风格, 细节丰富。街头一位漂亮的女孩,穿着衬衫和短裤,手持写有“实体控制”的标牌,背景是繁忙的城市街道,阳光明媚,行人匆匆。"
negative_prompt = "网格化,规则的网格,模糊, 低分辨率, 低质量, 变形, 畸形, 错误的解剖学, 变形的手, 变形的身体, 变形的脸, 变形的头发, 变形的眼睛, 变形的嘴巴"
entity_prompts = ["一个漂亮的女孩", "标牌 '实体控制'", "短裤", "衬衫"]
entity_masks = [Image.open(f"input/qwen_image_eligen/{i}.png").convert("RGB") for i in range(4)]
image = pipe(
prompt=prompt,
negative_prompt=negative_prompt,
entity_prompts=entity_prompts,
entity_masks=entity_masks,
cfg_scale=4.0,
width=1024,
height=1024,
num_inference_steps=40,
seed=42,
controlnet_params=param,
)
image.save("qwen_image_eligen.png")
del pipe

medium

This is a great example file that clearly demonstrates how to use the new eligen feature. It will be very helpful for users looking to adopt this functionality.

@@ -0,0 +1,61 @@
import unittest

medium

This new test file is a good start for verifying the ControlNet functionality. However, it currently only includes tests for the in_context control type. To ensure full coverage of the new features, please consider adding a test case for the eligen control type as well. This will help prevent future regressions.
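One possible shape for such a test, sketched with a stubbed pipeline. Everything below is a hypothetical skeleton: a real test would construct QwenImagePipeline and QwenImageControlNetParams with the eligen control type as in the example script, rather than a mock.

```python
import unittest
from unittest.mock import MagicMock


class TestQwenImageEligen(unittest.TestCase):
    def test_eligen_generates_image(self):
        # Stub standing in for a real QwenImagePipeline instance.
        pipe = MagicMock(return_value="generated-image")
        result = pipe(
            prompt="a girl holding a sign",
            entity_prompts=["a girl", "a sign"],
            entity_masks=[object(), object()],
        )
        # The pipeline should be invoked once with entity inputs
        # and return a non-None image.
        pipe.assert_called_once()
        self.assertIsNotNone(result)
```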

self._initial_params = {}

def store_initial_config(self):
for attr_name in dir(self):

Suggestion: vars(self) would be better here; then there is no need to check whether each attribute is a function.
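The reviewer's point can be seen in a small sketch: vars(self) yields only instance data attributes, so no callable filtering is needed, whereas dir(self) also lists methods and dunders. The class below is illustrative, not the PR's implementation.

```python
class SchedulerSketch:
    def __init__(self):
        self.shift = 3.0
        self.sigma_min = 0.0

    def step(self):
        pass


s = SchedulerSketch()

# dir() mixes data attributes with methods and dunders; filtering is needed.
via_dir = [n for n in dir(s)
           if not n.startswith("_") and not callable(getattr(s, n))]

# vars() returns the instance __dict__: data attributes only, no filtering.
via_vars = sorted(vars(s))

print(via_dir)   # ['shift', 'sigma_min']
print(via_vars)  # ['shift', 'sigma_min']
```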

@akaitsuki-ii akaitsuki-ii merged commit 0acf4cb into main Sep 28, 2025
@akaitsuki-ii akaitsuki-ii deleted the feature/qwen_image_control branch September 28, 2025 09:55
