Auxiliary Visual Model

The auxiliary visual model automatically enables image support for text-only models by generating detailed visual descriptions of attached images.

Setting

You can configure an auxiliary visual model in Settings > Extensions > Auxiliary Visual Model. Only models that support image input are available for selection:

[Screenshot: the Auxiliary Visual Model setting]

How to use it?

Simply upload images to your chat as usual. When your current model doesn't support images natively, the auxiliary visual model will automatically process them:

[Upload an image and ask any question about it]
What do you see in this image?

How does this work?

When you upload images to a chat:

1. Automatic Detection: The system checks whether your current model supports image input.
2. Visual Processing: If it does not, and an auxiliary visual model is configured, that model processes each image and generates a detailed description.
3. Seamless Integration: The descriptions are sent to the text-only model in place of the raw images.
4. Caching: Generated descriptions are cached so the same image is not processed twice.

The auxiliary visual model generates comprehensive descriptions focusing on visual elements, objects, people, text content, scenes, colors, and composition.
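
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual implementation: the names `Model`, `DescriptionCache`, `prepare_message`, and the `describe` callback are all hypothetical, the cache key (a SHA-256 hash of the image bytes) is an assumption, and so is the behavior when no auxiliary model is configured, which this page does not specify.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Model:
    """Hypothetical model descriptor; only the image capability matters here."""
    name: str
    supports_images: bool


class DescriptionCache:
    """In-memory cache keyed by a hash of the image bytes (assumed key scheme)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get_or_create(self, image: bytes, make: Callable[[bytes], str]) -> str:
        key = hashlib.sha256(image).hexdigest()
        if key not in self._store:
            # Only call the auxiliary model for images not seen before.
            self._store[key] = make(image)
        return self._store[key]


def prepare_message(
    text: str,
    images: List[bytes],
    current: Model,
    auxiliary: Optional[Model],
    describe: Callable[[Model, bytes], str],
    cache: DescriptionCache,
) -> dict:
    """Return the payload that would be sent to the current model."""
    # 1. Automatic detection: models with image support get the raw images.
    if current.supports_images or not images:
        return {"text": text, "images": list(images)}

    # Assumption: without an auxiliary model the message is left unchanged.
    if auxiliary is None:
        return {"text": text, "images": list(images)}

    # 2. Visual processing (with 4. caching): describe each image once.
    descriptions = [
        cache.get_or_create(img, lambda data: describe(auxiliary, data))
        for img in images
    ]

    # 3. Seamless integration: descriptions replace the raw images.
    described = "\n".join(
        f"[Image {i + 1}: {d}]" for i, d in enumerate(descriptions)
    )
    return {"text": f"{described}\n\n{text}", "images": []}
```

In this sketch, `describe` stands in for whatever call sends an image to the auxiliary visual model and returns its description. Hashing the image content, rather than using a file name, means that re-uploading the same image under a different name would still hit the cache.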
