Auxiliary Visual Model

The auxiliary visual model automatically enables image support for text-only models by generating detailed visual descriptions of attached images.

Setting

You can configure an auxiliary visual model in Settings > Extensions > Auxiliary Visual Model. Only models that support image input are available for selection:

[Screenshot: the Auxiliary Visual Model setting]

How do I use it?

Simply upload images to your chat as usual. When your current model doesn't support images natively, the auxiliary visual model will automatically process them:

[Upload an image and ask any question about it]
What do you see in this image?

How does this work?

When you upload images to a chat:

  1. Automatic Detection: The system checks whether your current model supports image input.
  2. Visual Processing: If it does not, and an auxiliary visual model is configured, the auxiliary model processes each image and generates a detailed description.
  3. Seamless Integration: The text-only model receives these descriptions in place of the raw images.
  4. Caching: Generated descriptions are cached, so the same image is not processed again.
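
The four steps above can be sketched roughly as follows. This is an illustrative outline only, not the application's actual implementation: the names `Model`, `Image`, `AuxiliaryVisualPipeline`, and `describe` are hypothetical, the cache key (a SHA-256 hash of the image bytes) is an assumption, and so is the `[Image description: ...]` wrapper around each description. The sketch also assumes an auxiliary visual model is already configured.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Image:
    data: bytes  # raw image bytes as uploaded to the chat


@dataclass
class Model:
    name: str
    supports_images: bool


# Hypothetical stand-in for a call to the auxiliary visual model:
# takes raw image bytes and returns a text description.
Describer = Callable[[bytes], str]


@dataclass
class AuxiliaryVisualPipeline:
    describe: Describer
    cache: Dict[str, str] = field(default_factory=dict)

    def _description_for(self, image: Image) -> str:
        # Step 4: cache by content hash (an assumed keying scheme) so an
        # identical image is described only once.
        key = hashlib.sha256(image.data).hexdigest()
        if key not in self.cache:
            # Step 2: ask the auxiliary visual model for a description.
            self.cache[key] = self.describe(image.data)
        return self.cache[key]

    def prepare(
        self, model: Model, text: str, images: List[Image]
    ) -> List[Union[str, Image]]:
        # Step 1: models with native image support get the raw images.
        if model.supports_images:
            return [text, *images]
        # Step 3: text-only models get descriptions instead of images.
        described = [
            f"[Image description: {self._description_for(img)}]"
            for img in images
        ]
        return [text, *described]
```

In this sketch, calling `prepare` with a text-only `Model` and the same `Image` twice invokes `describe` only once, because the second lookup is served from `cache`.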

The auxiliary visual model generates comprehensive descriptions focusing on visual elements, objects, people, text content, scenes, colors, and composition.