imgutils.generic.siglip

SigLIP (Sigmoid Loss for Language-Image Pre-training) model implementation module.

This module provides functionality for working with SigLIP models, which are designed for image-text matching and classification tasks. It includes components for:

  • Loading and managing SigLIP models from Hugging Face repositories

  • Image and text encoding using ONNX models

  • Prediction and classification of image-text pairs

  • Web interface creation using Gradio

  • Caching and thread-safe model operations

The module supports multiple model variants and provides both high-level and low-level APIs for model interaction.
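
For instance, a single high-level call can score an image against a set of candidate texts. This is a sketch only; the repository ID and model name below are hypothetical placeholders, not real artifacts:

    from imgutils.generic.siglip import siglip_predict

    # 'your/siglip-onnx' and 'SigLIP-base' are hypothetical names;
    # substitute the repository and model your project actually uses.
    result = siglip_predict(
        images='photo.jpg',
        texts=['a photo of a cat', 'a photo of a dog'],
        repo_id='your/siglip-onnx',
        model_name='SigLIP-base',
    )
    print(result)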

SigLIPModel

class imgutils.generic.siglip.SigLIPModel(repo_id: str, hf_token: str | None = None)[source]

Main class for managing and using SigLIP models.

This class handles model loading, caching, and inference operations for SigLIP models. It provides thread-safe access to model components and supports multiple model variants.

Parameters:
  • repo_id (str) – Hugging Face repository ID containing the SigLIP models

  • hf_token (Optional[str]) – Optional Hugging Face authentication token
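
A minimal construction sketch (the repository ID is a hypothetical placeholder; hf_token is only needed for private repositories):

    from imgutils.generic.siglip import SigLIPModel

    # Hypothetical repository ID, for illustration only.
    model = SigLIPModel(repo_id='your/siglip-onnx')

    # For a private repository, pass an authentication token:
    # model = SigLIPModel(repo_id='your/siglip-onnx', hf_token='hf_...')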

__init__(repo_id: str, hf_token: str | None = None)[source]
clear()[source]

Clear all cached encoders, preprocessors, tokenizers, and scales.

This method resets the internal state of the SigLIP model by clearing all cached components, including image encoders, image preprocessors, text encoders, text tokenizers, and logit scales.

image_encode(images: str | PathLike | bytes | bytearray | BinaryIO | Image | List[str | PathLike | bytes | bytearray | BinaryIO | Image] | Tuple[str | PathLike | bytes | bytearray | BinaryIO | Image, ...], model_name: str, fmt: Any = 'embeddings')[source]

Generate embeddings for input images using the SigLIP model.

Parameters:
  • images (MultiImagesTyping) – Input images in various supported formats

  • model_name (str) – Name of the SigLIP model variant to use

  • fmt (Any) – Output format, either ‘encodings’ or ‘embeddings’

Returns:

Image embeddings or encodings, depending on the fmt parameter

Raises:

ValueError – If the model name is invalid
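
A usage sketch with placeholder repository and model names; 'embeddings' is the default output format:

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID

    # Single image; 'SigLIP-base' is a hypothetical model name.
    emb = model.image_encode('photo.jpg', model_name='SigLIP-base')

    # A batch of images, returning raw encodings instead of embeddings.
    encs = model.image_encode(['a.jpg', 'b.jpg'], model_name='SigLIP-base',
                              fmt='encodings')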

launch_demo(default_model_name: str | None = None, server_name: str | None = None, server_port: int | None = None, **kwargs)[source]

Launch a web demo for the SigLIP model.

Creates and launches a Gradio web interface for interacting with the model. The demo includes the model UI and descriptive information about the model repository.

Parameters:
  • default_model_name (Optional[str]) – Name of the model to select by default

  • server_name (Optional[str]) – Server hostname to use for the demo

  • server_port (Optional[int]) – Port number to use for the demo

  • kwargs – Additional keyword arguments passed to gr.Blocks.launch()

Raises:

RuntimeError – If Gradio is not properly installed
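
A launch sketch; extra keyword arguments are forwarded to gr.Blocks.launch(). The repository ID is a hypothetical placeholder:

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID
    model.launch_demo(server_name='0.0.0.0', server_port=7860)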

make_ui(default_model_name: str | None = None)[source]

Create an interactive Gradio UI for the SigLIP model.

This method creates a user interface with image input, text labels input, model selection, and prediction display. If no default model is specified, it automatically selects the most recently updated model.

Parameters:

default_model_name (Optional[str]) – Name of the model to select by default

Raises:

RuntimeError – If Gradio is not properly installed
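
Because make_ui() only builds the interface components, it can be embedded in a larger Gradio layout. This sketch assumes the method is called inside an active gr.Blocks() context, as launch_demo() does when building the full demo; the repository ID is a hypothetical placeholder:

    import gradio as gr

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID

    with gr.Blocks() as demo:
        gr.Markdown('Custom surrounding layout')
        model.make_ui()  # most recently updated model selected by default
    demo.launch()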

predict(images: str | PathLike | bytes | bytearray | BinaryIO | Image | List[str | PathLike | bytes | bytearray | BinaryIO | Image] | Tuple[str | PathLike | bytes | bytearray | BinaryIO | Image, ...] | ndarray, texts: List[str] | str | ndarray, model_name: str, fmt: Any = 'predictions')[source]

Perform image-text classification using the SigLIP model.

Parameters:
  • images (Union[MultiImagesTyping, numpy.ndarray]) – Input images or pre-computed image embeddings

  • texts (Union[List[str], str, numpy.ndarray]) – Input texts or pre-computed text embeddings

  • model_name (str) – Name of the SigLIP model variant to use

  • fmt (Any) – Output format, one of ‘similarities’, ‘logits’, or ‘predictions’

Returns:

Classification results in the specified format

Raises:

ValueError – If the model name is invalid
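
A sketch of both call styles, with placeholder names; the precise structure of the 'predictions' output is not documented here, so inspect the return value:

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID
    texts = ['a photo of a cat', 'a photo of a dog']

    # Raw inputs.
    preds = model.predict(images='photo.jpg', texts=texts,
                          model_name='SigLIP-base')

    # Pre-computed embeddings are also accepted, which avoids re-encoding
    # when the same images or texts are scored repeatedly.
    img_emb = model.image_encode('photo.jpg', model_name='SigLIP-base')
    txt_emb = model.text_encode(texts, model_name='SigLIP-base')
    preds = model.predict(images=img_emb, texts=txt_emb,
                          model_name='SigLIP-base')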

text_encode(texts: str | List[str], model_name: str, fmt: Any = 'embeddings')[source]

Generate embeddings for input texts using the SigLIP model.

Parameters:
  • texts (Union[str, List[str]]) – Input text or list of texts

  • model_name (str) – Name of the SigLIP model variant to use

  • fmt (Any) – Output format, either ‘encodings’ or ‘embeddings’

Returns:

Text embeddings or encodings, depending on the fmt parameter

Raises:

ValueError – If the model name is invalid
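
A usage sketch mirroring image_encode(), again with hypothetical names:

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID

    # Single text or a list of texts.
    emb = model.text_encode('a photo of a cat', model_name='SigLIP-base')
    embs = model.text_encode(['a cat', 'a dog'], model_name='SigLIP-base')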

siglip_image_encode

imgutils.generic.siglip.siglip_image_encode(images: str | PathLike | bytes | bytearray | BinaryIO | Image | List[str | PathLike | bytes | bytearray | BinaryIO | Image] | Tuple[str | PathLike | bytes | bytearray | BinaryIO | Image, ...], repo_id: str, model_name: str, fmt: Any = 'embeddings', hf_token: str | None = None)[source]

Encode images using a SigLIP model.

Parameters:
  • images (MultiImagesTyping) – One or more images to encode

  • repo_id (str) – Hugging Face repository ID for the model

  • model_name (str) – Name of the specific model to use

  • fmt (Any) – Output format (‘embeddings’ or a custom format)

  • hf_token (Optional[str]) – Optional Hugging Face API token for private repositories

Returns:

Encoded image features in the specified format

Return type:

Any
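
The function-level counterpart of SigLIPModel.image_encode(), convenient when no explicit model object is needed. A sketch with placeholder names:

    from imgutils.generic.siglip import siglip_image_encode

    emb = siglip_image_encode(
        'photo.jpg',
        repo_id='your/siglip-onnx',  # hypothetical
        model_name='SigLIP-base',    # hypothetical
    )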

siglip_text_encode

imgutils.generic.siglip.siglip_text_encode(texts: str | List[str], repo_id: str, model_name: str, fmt: Any = 'embeddings', hf_token: str | None = None)[source]

Encode texts using a SigLIP model.

Parameters:
  • texts (Union[str, List[str]]) – Single text or list of texts to encode

  • repo_id (str) – Hugging Face repository ID for the model

  • model_name (str) – Name of the specific model to use

  • fmt (Any) – Output format (‘embeddings’ or a custom format)

  • hf_token (Optional[str]) – Optional Hugging Face API token for private repositories

Returns:

Encoded text features in the specified format

Return type:

Any
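
The function-level counterpart of SigLIPModel.text_encode(). A sketch with placeholder names:

    from imgutils.generic.siglip import siglip_text_encode

    embs = siglip_text_encode(
        ['a photo of a cat', 'a photo of a dog'],
        repo_id='your/siglip-onnx',  # hypothetical
        model_name='SigLIP-base',    # hypothetical
    )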

siglip_predict

imgutils.generic.siglip.siglip_predict(images: str | PathLike | bytes | bytearray | BinaryIO | Image | List[str | PathLike | bytes | bytearray | BinaryIO | Image] | Tuple[str | PathLike | bytes | bytearray | BinaryIO | Image, ...] | ndarray, texts: List[str] | str | ndarray, repo_id: str, model_name: str, fmt: Any = 'predictions', hf_token: str | None = None)[source]

Predict similarity scores between images and texts using a SigLIP model.

This function computes similarity scores between the given images and texts using the specified SigLIP model. It can handle both raw inputs and pre-computed embeddings.

Parameters:
  • images (Union[MultiImagesTyping, np.ndarray]) – Images or image embeddings to compare

  • texts (Union[List[str], str, np.ndarray]) – Texts or text embeddings to compare

  • repo_id (str) – Hugging Face repository ID for the model

  • model_name (str) – Name of the specific model to use

  • fmt (Any) – Output format (‘predictions’ or a custom format)

  • hf_token (Optional[str]) – Optional Hugging Face API token for private repositories

Returns:

Similarity scores in the specified format

Return type:

Any
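
A sketch combining the three function-level calls: encoding once and passing the pre-computed embeddings to siglip_predict() avoids re-running the encoders when scoring many image-text pairs. All names are hypothetical placeholders:

    from imgutils.generic.siglip import (
        siglip_image_encode, siglip_predict, siglip_text_encode,
    )

    repo = 'your/siglip-onnx'  # hypothetical repository ID
    name = 'SigLIP-base'       # hypothetical model name

    img_embs = siglip_image_encode(['a.jpg', 'b.jpg'],
                                   repo_id=repo, model_name=name)
    txt_embs = siglip_text_encode(['a cat', 'a dog'],
                                  repo_id=repo, model_name=name)
    scores = siglip_predict(img_embs, txt_embs,
                            repo_id=repo, model_name=name)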