03. Guided Generation

Basic Text-to-Image generation creates images from prompts alone. However, when you need structural control like “in this pose,” “with this composition,” or “in this style,” prompts alone have their limitations.

Guided Generation is a technique that controls the generation output by injecting additional conditions into the model, such as reference images, structural information, and style weights. In ComfyUI, there are three main approaches:

| Approach | Control Target | Representative Node |
| --- | --- | --- |
| ControlNet | Spatial structure, layout | ControlNetApplyAdvanced |
| LoRA | Style, character, concept | LoraLoader |
| Reference/Redux | Overall style, mood | Model-specific dedicated nodes |

ControlNet controls the structure of generated images by injecting structural conditions extracted from an input image into the model. It can generate entirely different images while preserving the contours, depth, and human poses from the original image.

| Type | Extracted Information | Use Case | Preprocessing Node |
| --- | --- | --- | --- |
| Canny | Edge lines | Maintaining shape/silhouette | Canny |
| Depth | Depth map | Maintaining perspective/spatial arrangement | DepthAnything V2 |
| OpenPose | Body joint positions | Maintaining pose | DWPosePreprocessor |

| Parameter | Description | Recommended Range |
| --- | --- | --- |
| strength | Control intensity. Higher values follow the structure more strictly | 0.5 ~ 1.0 |
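As a rough illustration of where strength is set, the fragment below builds a ControlNetApplyAdvanced node in the shape of ComfyUI's API-format workflow JSON. The node IDs ("6", "7", "10", "11") and the helper name `controlnet_apply_node` are hypothetical placeholders, and the input names reflect the core node as commonly documented, so check them against your ComfyUI version. The sketch deliberately rejects values outside 0.0 to 1.0; that limit belongs to this example, not to ComfyUI.

```python
import json


def controlnet_apply_node(strength: float = 0.8) -> dict:
    """Build an API-format ControlNetApplyAdvanced node (illustrative only)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("this sketch only accepts strength in [0.0, 1.0]")
    return {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            # Links are [source_node_id, output_index]; the IDs are placeholders.
            "positive": ["6", 0],      # positive conditioning
            "negative": ["7", 0],      # negative conditioning
            "control_net": ["10", 0],  # output of a ControlNet loader node
            "image": ["11", 0],        # preprocessed structure map
            "strength": strength,
            "start_percent": 0.0,      # apply from the first sampling step
            "end_percent": 1.0,        # ... through the last one
        },
    }


node = controlnet_apply_node(0.7)
print(json.dumps(node, indent=2))
```

Starting near the middle of the recommended range and adjusting from there is usually easier than starting at 1.0, because it shows how much the structure map constrains the result.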

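A LoRA (Low-Rank Adaptation) is a lightweight set of additional weights, trained on a specific style, character, or concept, that is applied on top of a base model rather than replacing it.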

| Parameter | Description | Recommended Range |
| --- | --- | --- |
| strength_model | LoRA influence on the model | 0.6 ~ 1.0 |
| strength_clip | LoRA influence on the text encoder | 0.6 ~ 1.0 |
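To give an intuition for what a strength value does, here is a conceptual sketch of a low-rank update, assuming the usual LoRA formulation in which a scaled product of two small matrices is added to a base weight matrix. It is not ComfyUI's implementation: the function name `apply_lora`, the tiny matrices, and the omission of any rank or alpha scaling are simplifications made for this example.

```python
def apply_lora(weight, down, up, strength):
    """Return weight + strength * (up @ down), using plain nested lists."""
    rows, cols, rank = len(weight), len(weight[0]), len(down)
    return [
        [
            weight[i][j]
            + strength * sum(up[i][r] * down[r][j] for r in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# A 2x2 base weight with a rank-1 update (all values are made up).
base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]    # shape 2x1
down = [[0.5, 0.5]]    # shape 1x2

print(apply_lora(base, down, up, 0.0))  # strength 0 leaves the base unchanged
print(apply_lora(base, down, up, 1.0))  # full-strength update
```

In this picture, strength_model and strength_clip play the role of `strength` for the image model and the text encoder respectively, which is why lowering them softens the LoRA's effect without removing it.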

Reference/Redux - Style Transfer Based on Reference Images

The Reference approach transfers style, color palette, and mood by feeding the reference image itself into the model. While ControlNet is a command to “follow this shape,” Reference is a command to “create with this feel.”

Dedicated models like Flux.1 Redux extract visual characteristics from the reference image and apply them to new image generation.


Canny ControlNet is the most basic and intuitive ControlNet type. It extracts edge lines from the input image to control the shape of the generated image.

Model used: flux1-dev-fp8 + flux-canny-controlnet-v3.safetensors

Canny Workflow
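The preprocessing half of this workflow can be sketched in API-format JSON as a LoadImage node feeding a Canny node. The node IDs, the file name `input_example.png`, and the threshold values are assumptions chosen for illustration; the input names follow the core Canny node as commonly documented and may differ in your installation.

```python
import json

# Hypothetical fragment: load an image, then extract its edge map.
canny_fragment = {
    "1": {
        "class_type": "LoadImage",
        "inputs": {"image": "input_example.png"},  # placeholder file name
    },
    "2": {
        "class_type": "Canny",
        "inputs": {
            "image": ["1", 0],      # IMAGE output of the LoadImage node
            "low_threshold": 0.4,   # weaker edges below this are dropped
            "high_threshold": 0.8,  # edges above this are always kept
        },
    },
}

print(json.dumps(canny_fragment, indent=2))
```

The edge map produced by node "2" is what would be connected to the image input of the ControlNet apply node. Lower thresholds keep more fine detail in the edge map, which constrains the generated image more tightly.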


Depth ControlNet extracts the depth map from the input image to control perspective and spatial arrangement. Rather than preserving the shape of subjects, it is effective at maintaining “what is in front and what is behind.”

Model used: flux1-dev-fp8 + flux-depth-controlnet-v3.safetensors

Depth Workflow


OpenPose ControlNet extracts the pose of people in the input image, making it effective for generating different characters in the same pose.

Model used: flux1-dev-fp8 + FLUX-1-dev-Controlnet-union-Pro.safetensors

OpenPose Workflow


The basic structure of a workflow using LoRA is as follows:

LoRA Basic Structure

To apply multiple LoRAs simultaneously, chain LoraLoader nodes in sequence:

Applying Multiple LoRAs
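The chaining pattern can be sketched as a small helper that emits one LoraLoader node per LoRA and wires each node's MODEL and CLIP outputs into the next. The helper name `chain_loras`, the node IDs, and the LoRA file names are hypothetical; the input names follow the LoraLoader node as commonly documented.

```python
def chain_loras(model_link, clip_link, loras, first_id=100):
    """Return (nodes, model_link, clip_link) with one LoraLoader per entry."""
    nodes = {}
    for offset, (name, s_model, s_clip) in enumerate(loras):
        node_id = str(first_id + offset)
        nodes[node_id] = {
            "class_type": "LoraLoader",
            "inputs": {
                "model": model_link,
                "clip": clip_link,
                "lora_name": name,
                "strength_model": s_model,
                "strength_clip": s_clip,
            },
        }
        # LoraLoader outputs MODEL at index 0 and CLIP at index 1;
        # the next loader in the chain consumes both.
        model_link, clip_link = [node_id, 0], [node_id, 1]
    return nodes, model_link, clip_link


# "4" stands in for a checkpoint loader node; file names are placeholders.
nodes, model_out, clip_out = chain_loras(
    ["4", 0],
    ["4", 1],
    [
        ("style_example.safetensors", 0.8, 0.8),
        ("character_example.safetensors", 0.6, 0.6),
    ],
)
print(model_out, clip_out)
```

The returned `model_out` and `clip_out` links are what the sampler and the text-encoding nodes would consume, so every LoRA in the chain affects the final result.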


Flux.1 Dev USO Reference - Generation Based on Reference Images

Reference Workflow

This workflow generates new images while maintaining the style and subject consistency of the reference image.

Example use cases:

  • Generate the same character in various poses/backgrounds
  • Create a consistent series of product images
  • Generate diverse variations of the same subject

  • Need to maintain precise contours -> Canny ControlNet
  • Want to preserve perspective/spatial arrangement -> Depth ControlNet
  • Want to generate in a specific style/character -> LoRA
  • Want to transfer the mood of a reference image -> Reference
  • Need to maintain a specific pose -> OpenPose ControlNet
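The selection guide above can also be written as a simple lookup, which may be handy when scripting workflow choices. The short keys ("contours", "depth", and so on) and the function name `pick_approach` are invented for this sketch; only the recommended approaches come from the list.

```python
# Keys are hypothetical shorthand for the needs listed in the guide.
GUIDE = {
    "contours": "Canny ControlNet",
    "depth": "Depth ControlNet",
    "style": "LoRA",
    "mood": "Reference",
    "pose": "OpenPose ControlNet",
}


def pick_approach(need: str) -> str:
    """Return the recommended approach for a given need."""
    try:
        return GUIDE[need]
    except KeyError:
        options = ", ".join(sorted(GUIDE))
        raise ValueError(f"unknown need {need!r}; expected one of: {options}") from None


print(pick_approach("pose"))
```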

| Topic Covered | Key Points |
| --- | --- |
| ControlNet | Controls spatial structure through structure maps (edges/depth/pose). Three architectures exist: dedicated models, ControlNet modules, and model patches |
| LoRA | Customizes style/character/concept through lightweight weight files. Can be chained via LoraLoader |
| Reference | Transfers style/mood from reference images. Better suited for mood transfer than structural control |