imgutils.generic.siglip

SigLIP (Sigmoid Loss for Language-Image Pre-training) model implementation module.

This module provides functionality for working with SigLIP models, which are designed for image-text matching and classification tasks. It includes components for:

  • Loading and managing SigLIP models from Hugging Face repositories

  • Image and text encoding using ONNX models

  • Prediction and classification of image-text pairs

  • Web interface creation using Gradio

  • Caching and thread-safe model operations

The module supports multiple model variants and provides both high-level and low-level APIs for model interaction.
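
For instance, a single high-level call can score an image against a set of candidate texts. This is a sketch only; the repository ID and model name below are hypothetical placeholders, not real artifacts:

    from imgutils.generic.siglip import siglip_predict

    # 'your/siglip-onnx' and 'SigLIP-base' are hypothetical names;
    # substitute the repository and model your project actually uses.
    result = siglip_predict(
        images='photo.jpg',
        texts=['a photo of a cat', 'a photo of a dog'],
        repo_id='your/siglip-onnx',
        model_name='SigLIP-base',
    )
    print(result)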

SigLIPModel

class imgutils.generic.siglip.SigLIPModel(repo_id: str, hf_token: str | None = None)[source]

Main class for managing and using SigLIP models.

This class handles model loading, caching, and inference operations for SigLIP models. It provides thread-safe access to model components and supports multiple model variants.

Parameters:
  • repo_id (str) – Hugging Face repository ID containing the SigLIP models

  • hf_token (Optional[str]) – Optional Hugging Face authentication token
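
A minimal construction sketch (the repository ID is a hypothetical placeholder; hf_token is only needed for private repositories):

    from imgutils.generic.siglip import SigLIPModel

    # Hypothetical repository ID, for illustration only.
    model = SigLIPModel(repo_id='your/siglip-onnx')

    # For a private repository, pass an authentication token:
    # model = SigLIPModel(repo_id='your/siglip-onnx', hf_token='hf_...')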

__init__(repo_id: str, hf_token: str | None = None)[source]
clear()[source]

Clear all cached encoders, preprocessors, tokenizers, and scales.

This method resets the internal state of the SigLIP model by clearing all cached components, including image encoders, image preprocessors, text encoders, text tokenizers, and logit scales.

image_encode(images: str | PathLike | bytes | bytearray | BinaryIO | Image | List[str | PathLike | bytes | bytearray | BinaryIO | Image] | Tuple[str | PathLike | bytes | bytearray | BinaryIO | Image, ...], model_name: str, fmt: Any = 'embeddings')[source]

Generate embeddings for input images using the SigLIP model.

Parameters:
  • images (MultiImagesTyping) – Input images in various supported formats

  • model_name (str) – Name of the SigLIP model variant to use

  • fmt (Any) – Output format, either ‘encodings’ or ‘embeddings’

Returns:

Image embeddings or encodings, depending on the fmt parameter

Raises:

ValueError – If the model name is invalid
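
A usage sketch with placeholder repository and model names; 'embeddings' is the default output format:

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID

    # Single image; 'SigLIP-base' is a hypothetical model name.
    emb = model.image_encode('photo.jpg', model_name='SigLIP-base')

    # A batch of images, returning raw encodings instead of embeddings.
    encs = model.image_encode(['a.jpg', 'b.jpg'], model_name='SigLIP-base',
                              fmt='encodings')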

launch_demo(default_model_name: str | None = None, server_name: str | None = None, server_port: int | None = None, **kwargs)[source]

Launch a web demo for the SigLIP model.

Creates and launches a Gradio web interface for interacting with the model. The demo includes the model UI and descriptive information about the model repository.

Parameters:
  • default_model_name (Optional[str]) – Name of the model to select by default

  • server_name (Optional[str]) – Server hostname to use for the demo

  • server_port (Optional[int]) – Port number to use for the demo

  • kwargs – Additional keyword arguments passed to gr.Blocks.launch()

Raises:

RuntimeError – If Gradio is not properly installed
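
A launch sketch; extra keyword arguments are forwarded to gr.Blocks.launch(). The repository ID is a hypothetical placeholder:

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID
    model.launch_demo(server_name='0.0.0.0', server_port=7860)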

make_ui(default_model_name: str | None = None)[source]

Create an interactive Gradio UI for the SigLIP model.

This method creates a user interface with image input, text labels input, model selection, and prediction display. If no default model is specified, it automatically selects the most recently updated model.

Parameters:

default_model_name (Optional[str]) – Name of the model to select by default

Raises:

RuntimeError – If Gradio is not properly installed
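
Because make_ui() only builds the interface components, it can be embedded in a larger Gradio layout. This sketch assumes the method is called inside an active gr.Blocks() context, as launch_demo() does when building the full demo; the repository ID is a hypothetical placeholder:

    import gradio as gr

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID

    with gr.Blocks() as demo:
        gr.Markdown('Custom surrounding layout')
        model.make_ui()  # most recently updated model selected by default
    demo.launch()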

predict(images: str | PathLike | bytes | bytearray | BinaryIO | Image | List[str | PathLike | bytes | bytearray | BinaryIO | Image] | Tuple[str | PathLike | bytes | bytearray | BinaryIO | Image, ...] | ndarray, texts: List[str] | str | ndarray, model_name: str, fmt: Any = 'predictions')[source]

Perform image-text classification using the SigLIP model.

Parameters:
  • images (Union[MultiImagesTyping, numpy.ndarray]) – Input images or pre-computed image embeddings

  • texts (Union[List[str], str, numpy.ndarray]) – Input texts or pre-computed text embeddings

  • model_name (str) – Name of the SigLIP model variant to use

  • fmt (Any) – Output format, one of ‘similarities’, ‘logits’, or ‘predictions’

Returns:

Classification results in the specified format

Raises:

ValueError – If the model name is invalid
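
A sketch of both call styles, with placeholder names; the precise structure of the 'predictions' output is not documented here, so inspect the return value:

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID
    texts = ['a photo of a cat', 'a photo of a dog']

    # Raw inputs.
    preds = model.predict(images='photo.jpg', texts=texts,
                          model_name='SigLIP-base')

    # Pre-computed embeddings are also accepted, which avoids re-encoding
    # when the same images or texts are scored repeatedly.
    img_emb = model.image_encode('photo.jpg', model_name='SigLIP-base')
    txt_emb = model.text_encode(texts, model_name='SigLIP-base')
    preds = model.predict(images=img_emb, texts=txt_emb,
                          model_name='SigLIP-base')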

text_encode(texts: str | List[str], model_name: str, fmt: Any = 'embeddings')[source]

Generate embeddings for input texts using the SigLIP model.

Parameters:
  • texts (Union[str, List[str]]) – Input text or list of texts

  • model_name (str) – Name of the SigLIP model variant to use

  • fmt (Any) – Output format, either ‘encodings’ or ‘embeddings’

Returns:

Text embeddings or encodings, depending on the fmt parameter

Raises:

ValueError – If the model name is invalid
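
A usage sketch mirroring image_encode(), again with hypothetical names:

    from imgutils.generic.siglip import SigLIPModel

    model = SigLIPModel(repo_id='your/siglip-onnx')  # hypothetical repo ID

    # Single text or a list of texts.
    emb = model.text_encode('a photo of a cat', model_name='SigLIP-base')
    embs = model.text_encode(['a cat', 'a dog'], model_name='SigLIP-base')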

siglip_image_encode

imgutils.generic.siglip.siglip_image_encode(images: str | PathLike | bytes | bytearray | BinaryIO | Image | List[str | PathLike | bytes | bytearray | BinaryIO | Image] | Tuple[str | PathLike | bytes | bytearray | BinaryIO | Image, ...], repo_id: str, model_name: str, fmt: Any = 'embeddings', hf_token: str | None = None)[source]

Encode images using a SigLIP model.

Parameters:
  • images (MultiImagesTyping) – One or more images to encode

  • repo_id (str) – Hugging Face repository ID for the model

  • model_name (str) – Name of the specific model to use

  • fmt (Any) – Output format (‘embeddings’ or a custom format)

  • hf_token (Optional[str]) – Optional Hugging Face API token for private repositories

Returns:

Encoded image features in the specified format

Return type:

Any
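
The function-level counterpart of SigLIPModel.image_encode(), convenient when no explicit model object is needed. A sketch with placeholder names:

    from imgutils.generic.siglip import siglip_image_encode

    emb = siglip_image_encode(
        'photo.jpg',
        repo_id='your/siglip-onnx',  # hypothetical
        model_name='SigLIP-base',    # hypothetical
    )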

siglip_text_encode

imgutils.generic.siglip.siglip_text_encode(texts: str | List[str], repo_id: str, model_name: str, fmt: Any = 'embeddings', hf_token: str | None = None)[source]

Encode texts using a SigLIP model.

Parameters:
  • texts (Union[str, List[str]]) – Single text or list of texts to encode

  • repo_id (str) – Hugging Face repository ID for the model

  • model_name (str) – Name of the specific model to use

  • fmt (Any) – Output format (‘embeddings’ or a custom format)

  • hf_token (Optional[str]) – Optional Hugging Face API token for private repositories

Returns:

Encoded text features in the specified format

Return type:

Any
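
The function-level counterpart of SigLIPModel.text_encode(). A sketch with placeholder names:

    from imgutils.generic.siglip import siglip_text_encode

    embs = siglip_text_encode(
        ['a photo of a cat', 'a photo of a dog'],
        repo_id='your/siglip-onnx',  # hypothetical
        model_name='SigLIP-base',    # hypothetical
    )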

siglip_predict

imgutils.generic.siglip.siglip_predict(images: str | PathLike | bytes | bytearray | BinaryIO | Image | List[str | PathLike | bytes | bytearray | BinaryIO | Image] | Tuple[str | PathLike | bytes | bytearray | BinaryIO | Image, ...] | ndarray, texts: List[str] | str | ndarray, repo_id: str, model_name: str, fmt: Any = 'predictions', hf_token: str | None = None)[source]

Predict similarity scores between images and texts using a SigLIP model.

This function computes similarity scores between the given images and texts using the specified SigLIP model. It can handle both raw inputs and pre-computed embeddings.

Parameters:
  • images (Union[MultiImagesTyping, np.ndarray]) – Images or image embeddings to compare

  • texts (Union[List[str], str, np.ndarray]) – Texts or text embeddings to compare

  • repo_id (str) – Hugging Face repository ID for the model

  • model_name (str) – Name of the specific model to use

  • fmt (Any) – Output format (‘predictions’ or a custom format)

  • hf_token (Optional[str]) – Optional Hugging Face API token for private repositories

Returns:

Similarity scores in the specified format

Return type:

Any
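
A sketch combining the three function-level calls: encoding once and passing the pre-computed embeddings to siglip_predict() avoids re-running the encoders when scoring many image-text pairs. All names are hypothetical placeholders:

    from imgutils.generic.siglip import (
        siglip_image_encode, siglip_predict, siglip_text_encode,
    )

    repo = 'your/siglip-onnx'  # hypothetical repository ID
    name = 'SigLIP-base'       # hypothetical model name

    img_embs = siglip_image_encode(['a.jpg', 'b.jpg'],
                                   repo_id=repo, model_name=name)
    txt_embs = siglip_text_encode(['a cat', 'a dog'],
                                  repo_id=repo, model_name=name)
    scores = siglip_predict(img_embs, txt_embs,
                            repo_id=repo, model_name=name)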