imgutils.tagging.wd14
- Overview:
This module provides utilities for image tagging using WD14 taggers. It includes functions for loading models, processing images, and extracting tags.
The module is inspired by the SmilingWolf/wd-v1-4-tags project on Hugging Face.
convert_wd14_emb_to_prediction
- imgutils.tagging.wd14.convert_wd14_emb_to_prediction(emb: ndarray, model_name: str = 'SwinV2_v3', general_threshold: float = 0.35, general_mcut_enabled: bool = False, character_threshold: float = 0.85, character_mcut_enabled: bool = False, no_underline: bool = False, drop_overlap: bool = False, fmt=('rating', 'general', 'character'), denormalize: bool = False, denormalizer_name: str = 'mnum2_all')
Convert a WD14 embedding into a human-readable prediction result. This function can process both single embeddings (1-dimensional array) and batches of embeddings (2-dimensional array).
- Parameters:
emb (numpy.ndarray) – The extracted embedding(s). Either a 1-dimensional array for a single image or a 2-dimensional array for a batch
model_name (str) – Name of the WD14 model to use for prediction
general_threshold (float) – Confidence threshold for general tags (0.0 to 1.0)
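general_mcut_enabled (bool) – Enable MCut (Maximum Cut) thresholding for general tags, which derives the threshold from the largest gap in the sorted scores instead of using the fixed general_threshold
character_threshold (float) – Confidence threshold for character tags (0.0 to 1.0)
character_mcut_enabled (bool) – Enable MCut (Maximum Cut) thresholding for character tags, which derives the threshold from the largest gap in the sorted scores instead of using the fixed character_threshold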
no_underline (bool) – Replace underscores with spaces in tag names for better readability
drop_overlap (bool) – Remove overlapping tags to reduce redundancy
fmt (tuple) – Specify return format structure for predictions, default is ('rating', 'general', 'character')
denormalize (bool) – Whether to denormalize the embedding before prediction
denormalizer_name (str) – Name of the denormalizer to use if denormalization is enabled
- Returns:
For single embeddings: prediction result based on fmt. For batches: list of prediction results.
Note
Only embeddings that have not been normalized can be converted into an understandable prediction result. If normalized embeddings are provided, set denormalize=True to convert them back.
For batch processing (2-dim input), the function returns a list in which each element holds one embedding's predictions, in the same format as the single-embedding output.
- Example:
>>> import os
>>> import numpy as np
>>> from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction
>>>
>>> # extract the feature embedding, shape: (W, )
>>> embedding = get_wd14_tags('skadi.jpg', fmt='embedding')
>>>
>>> # convert to understandable result
>>> rating, general, character = convert_wd14_emb_to_prediction(embedding)
>>> # these 3 dicts will be the same as that returned by `get_wd14_tags('skadi.jpg')`
>>>
>>> # Batch processing, shape: (B, W)
>>> embeddings = np.stack([
...     get_wd14_tags('img1.jpg', fmt='embedding'),
...     get_wd14_tags('img2.jpg', fmt='embedding'),
... ])
>>> # results will be a list of (rating, general, character) tuples
>>> results = convert_wd14_emb_to_prediction(embeddings)
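The threshold-related parameters can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the helper names mcut_threshold and select_tags are hypothetical, the scores are made up, and the strict ">" comparison is an assumption. It only shows the general idea of a fixed threshold versus an MCut threshold (the midpoint of the largest gap between consecutive sorted scores), plus the underscore replacement described for no_underline.

```python
from typing import Dict


def mcut_threshold(scores: Dict[str, float]) -> float:
    """Hypothetical helper: midpoint of the largest gap between sorted scores."""
    ordered = sorted(scores.values(), reverse=True)
    if len(ordered) < 2:
        raise ValueError("MCut needs at least two scores")
    # Gap between each score and the next lower one.
    gaps = [hi - lo for hi, lo in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps))
    return (ordered[cut] + ordered[cut + 1]) / 2


def select_tags(
    scores: Dict[str, float],
    threshold: float = 0.35,
    mcut_enabled: bool = False,
    no_underline: bool = False,
) -> Dict[str, float]:
    """Hypothetical helper: keep tags whose score exceeds the threshold."""
    if mcut_enabled:
        # Assumption: MCut replaces the fixed threshold entirely.
        threshold = mcut_threshold(scores)
    kept = {tag: s for tag, s in scores.items() if s > threshold}
    if no_underline:
        kept = {tag.replace("_", " "): s for tag, s in kept.items()}
    return kept


# Made-up scores for illustration only.
scores = {
    "1girl": 0.98,
    "long_hair": 0.91,
    "red_eyes": 0.52,
    "hat": 0.40,
    "outdoors": 0.08,
}

fixed = select_tags(scores, threshold=0.35, no_underline=True)
mcut = select_tags(scores, mcut_enabled=True)
print(sorted(fixed))
print(sorted(mcut))
```

With these numbers the largest gap lies between 0.91 and 0.52, so the MCut variant keeps only the two highest-scoring tags, while the fixed threshold of 0.35 keeps four.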
denormalize_wd14_emb
- imgutils.tagging.wd14.denormalize_wd14_emb(emb: ndarray, model_name: str = 'SwinV2_v3', denormalizer_name: str = 'mnum2_all') -> ndarray
Denormalize WD14 embeddings.
- Parameters:
emb (numpy.ndarray) – The embedding to denormalize.
model_name (str) – Name of the model.
denormalizer_name (str) – Name of the denormalizer.
- Returns:
The denormalized embedding.
- Return type:
numpy.ndarray
- Examples:
>>> import numpy as np
>>> from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction, denormalize_wd14_emb
>>>
>>> embedding, (r, g, c) = get_wd14_tags(
...     'image.png',
...     fmt=('embedding', ('rating', 'general', 'character')),
... )
>>>
>>> # normalize embedding
>>> embedding = embedding / np.linalg.norm(embedding)
>>>
>>> # denormalize this embedding
>>> output = denormalize_wd14_emb(embedding)
>>>
>>> # should be similar to r, g, c, approx 1e-3 error
>>> rating, general, character = convert_wd14_emb_to_prediction(output)
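The example above normalizes the embedding by dividing it by its L2 norm. The following plain-Python sketch, using made-up three-element vectors and a hypothetical helper l2_normalize, illustrates why a separate denormalization step is needed at all: normalization discards the vector's magnitude, so two embeddings that differ only in scale become indistinguishable. It does not show how denormalize_wd14_emb recovers the scale.

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Hypothetical helper: scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


# Stand-ins for raw embeddings (illustrative values, not real model output).
raw = [3.0, -4.0, 12.0]          # L2 norm is 13
scaled = [2 * x for x in raw]    # same direction, twice the magnitude

unit_raw = l2_normalize(raw)
unit_scaled = l2_normalize(scaled)

# Both inputs collapse onto the same unit vector: the magnitude is lost.
same = all(math.isclose(a, b) for a, b in zip(unit_raw, unit_scaled))
print(same)
```

Because the original magnitude cannot be read back from the unit vector alone, any denormalizer has to estimate it, which is consistent with the small reconstruction error mentioned in the example above.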