imgutils.tagging.wd14

Overview:

This module provides utilities for image tagging using WD14 taggers. It includes functions for loading models, processing images, and extracting tags.

The module is inspired by the SmilingWolf/wd-v1-4-tags project on Hugging Face.
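
For a quick start, tagging an image takes a single call. A minimal sketch, assuming a local file 'example.jpg':

>>> from imgutils.tagging import get_wd14_tags
>>>
>>> # default fmt yields a (rating, general, character) tuple of dicts
>>> rating, general, character = get_wd14_tags('example.jpg')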

get_wd14_tags

imgutils.tagging.wd14.get_wd14_tags(image: str | PathLike | bytes | bytearray | BinaryIO | Image, model_name: str = 'SwinV2_v3', general_threshold: float = 0.35, general_mcut_enabled: bool = False, character_threshold: float = 0.85, character_mcut_enabled: bool = False, no_underline: bool = False, drop_overlap: bool = False, fmt=('rating', 'general', 'character'))[source]

Get tags for an image using WD14 taggers.

This function is similar to the SmilingWolf/wd-v1-4-tags project on Hugging Face.

Parameters:
  • image (ImageTyping) – The input image.

  • model_name (str) – The name of the model to use.

  • general_threshold (float) – The threshold for general tags.

  • general_mcut_enabled (bool) – If True, applies MCut thresholding to general tags (see the MCut sketch below).

  • character_threshold (float) – The threshold for character tags.

  • character_mcut_enabled (bool) – If True, applies MCut thresholding to character tags.

  • no_underline (bool) – If True, replaces underscores in tag names with spaces.

  • drop_overlap (bool) – If True, drops overlapping tags.

  • fmt – Return format, default is ('rating', 'general', 'character'). 'embedding' is also supported for feature extraction.

Returns:

Prediction result based on the provided fmt.
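
The general_mcut_enabled and character_mcut_enabled options enable adaptive thresholding: instead of a fixed cutoff, MCut places the threshold at the midpoint of the largest gap between consecutive sorted confidences. A minimal sketch of the idea (illustrative only, not the library's internal implementation):

>>> import numpy as np
>>>
>>> def mcut_threshold(probs):
...     # sort confidences in descending order, find the largest gap
...     # between adjacent scores, and cut at its midpoint
...     sorted_probs = np.sort(probs)[::-1]
...     gaps = sorted_probs[:-1] - sorted_probs[1:]
...     t = gaps.argmax()
...     return (sorted_probs[t] + sorted_probs[t + 1]) / 2
>>>
>>> float(mcut_threshold(np.array([0.9, 0.85, 0.4, 0.1])))
0.625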

Note

The fmt argument can include the following keys:

  • rating: a dict containing ratings and their confidences

  • general: a dict containing general tags and their confidences

  • character: a dict containing character tags and their confidences

  • tag: a dict containing all tags (including general and character, not including rating) and their confidences

  • embedding: a 1-dim embedding of the image; L2 normalization is recommended before building an index

  • prediction: a 1-dim raw prediction result of the image
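
A single key returns its value directly, and a tuple of keys returns a matching tuple of values. A small sketch using the combined tag key (the image name is hypothetical):

>>> from imgutils.tagging import get_wd14_tags
>>>
>>> # all general and character tags merged into one dict
>>> tags = get_wd14_tags('example.jpg', fmt='tag')
>>>
>>> # any combination of keys can be requested at once
>>> rating, tags = get_wd14_tags('example.jpg', fmt=('rating', 'tag'))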

You can extract the embedding of a given image with the following code:

>>> from imgutils.tagging import get_wd14_tags
>>>
>>> embedding = get_wd14_tags('skadi.jpg', fmt='embedding')
>>> embedding.shape
(1024,)

This embedding is valuable for constructing indices that enable rapid querying of images based on visual features within large-scale datasets.
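
For instance, a minimal nearest-neighbour lookup over L2-normalized embeddings might look like this (all file names are hypothetical):

>>> import numpy as np
>>> from imgutils.tagging import get_wd14_tags
>>>
>>> files = ['img1.jpg', 'img2.jpg', 'img3.jpg']
>>> embs = np.stack([get_wd14_tags(f, fmt='embedding') for f in files])
>>> embs /= np.linalg.norm(embs, axis=-1, keepdims=True)  # L2 normalization
>>>
>>> query = get_wd14_tags('query.jpg', fmt='embedding')
>>> query = query / np.linalg.norm(query)
>>> sims = embs @ query  # cosine similarities, shape: (3,)
>>> best = files[int(sims.argmax())]  # most visually similar indexed image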

Example:

Here are some example images:

[figure: tagging demo images (tagging_demo.plot.py.svg)]
>>> from imgutils.tagging import get_wd14_tags
>>>
>>> rating, features, chars = get_wd14_tags('skadi.jpg')
>>> rating
{'general': 0.0011444687843322754, 'sensitive': 0.8876402974128723, 'questionable': 0.106781005859375, 'explicit': 0.000277101993560791}
>>> features
{'1girl': 0.997527003288269, 'solo': 0.9797663688659668, 'long_hair': 0.9905703663825989, 'breasts': 0.9761719703674316, 'looking_at_viewer': 0.8981098532676697, 'bangs': 0.8810765743255615, 'large_breasts': 0.9498510360717773, 'shirt': 0.8377365469932556, 'red_eyes': 0.945058286190033, 'gloves': 0.9457170367240906, 'navel': 0.969594419002533, 'holding': 0.7881088852882385, 'hair_between_eyes': 0.7687551379203796, 'very_long_hair': 0.9301245212554932, 'standing': 0.6703325510025024, 'white_hair': 0.5292627811431885, 'short_sleeves': 0.8677047491073608, 'grey_hair': 0.5859264731407166, 'thighs': 0.9536856412887573, 'cowboy_shot': 0.8056888580322266, 'sweat': 0.8394746780395508, 'outdoors': 0.9473626613616943, 'parted_lips': 0.8986269235610962, 'sky': 0.9385137557983398, 'shorts': 0.8408567905426025, 'alternate_costume': 0.4245271384716034, 'day': 0.931140661239624, 'black_gloves': 0.8830795884132385, 'midriff': 0.7279844284057617, 'artist_name': 0.5333830714225769, 'cloud': 0.64717698097229, 'stomach': 0.9516432285308838, 'blue_sky': 0.9655293226242065, 'crop_top': 0.9485014081001282, 'black_shirt': 0.7366660833358765, 'short_shorts': 0.7161656618118286, 'ass_visible_through_thighs': 0.5858667492866516, 'black_shorts': 0.6186309456825256, 'thigh_gap': 0.41193312406539917, 'no_headwear': 0.467605859041214, 'low-tied_long_hair': 0.36282333731651306, 'sportswear': 0.3756745457649231, 'motion_blur': 0.5091936588287354, 'baseball_bat': 0.951993465423584, 'baseball': 0.5634750723838806, 'holding_baseball_bat': 0.8232709169387817}
>>> chars
{'skadi_(arknights)': 0.9869340658187866}
>>>
>>> rating, features, chars = get_wd14_tags('hutao.jpg')
>>> rating
{'general': 0.49491602182388306, 'sensitive': 0.5193622708320618, 'questionable': 0.003406703472137451, 'explicit': 0.0007208287715911865}
>>> features
{'1girl': 0.9798132181167603, 'solo': 0.8046203851699829, 'long_hair': 0.7596215009689331, 'looking_at_viewer': 0.7620116472244263, 'blush': 0.46084529161453247, 'smile': 0.48454540967941284, 'bangs': 0.5152207016944885, 'skirt': 0.8023070096969604, 'brown_hair': 0.8653596639633179, 'hair_ornament': 0.7201820611953735, 'red_eyes': 0.7816740870475769, 'long_sleeves': 0.697688639163971, 'twintails': 0.8974947333335876, 'school_uniform': 0.7491052746772766, 'jacket': 0.5015512704849243, 'flower': 0.6401398181915283, 'ahoge': 0.43420469760894775, 'pleated_skirt': 0.4528769850730896, 'outdoors': 0.5730487704277039, 'tongue': 0.6739872694015503, 'hair_flower': 0.5545973181724548, 'tongue_out': 0.6946243047714233, 'bag': 0.5487751364707947, 'symbol-shaped_pupils': 0.7439308166503906, 'blazer': 0.4186026453971863, 'backpack': 0.47378358244895935, ':p': 0.4690653085708618, 'ghost': 0.7565015554428101}
>>> chars
{'hu_tao_(genshin_impact)': 0.9262397289276123, 'boo_tao_(genshin_impact)': 0.942080020904541}
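
The post-processing flags can be combined freely; for instance, to get space-separated tag names with overlapping tags removed (output omitted):

>>> from imgutils.tagging import get_wd14_tags
>>>
>>> rating, general, chars = get_wd14_tags(
...     'skadi.jpg',
...     no_underline=True,   # 'long_hair' becomes 'long hair'
...     drop_overlap=True,   # e.g. drops 'long_hair' when 'very_long_hair' is kept
... )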

convert_wd14_emb_to_prediction

imgutils.tagging.wd14.convert_wd14_emb_to_prediction(emb: ndarray, model_name: str = 'SwinV2_v3', general_threshold: float = 0.35, general_mcut_enabled: bool = False, character_threshold: float = 0.85, character_mcut_enabled: bool = False, no_underline: bool = False, drop_overlap: bool = False, fmt=('rating', 'general', 'character'), denormalize: bool = False, denormalizer_name: str = 'mnum2_all')[source]

Convert a WD14 embedding into an understandable prediction result. This function can process both single embeddings (1-dimensional arrays) and batches of embeddings (2-dimensional arrays).

Parameters:
  • emb (numpy.ndarray) – The extracted embedding(s): a 1-dim array for a single image, or a 2-dim array for batch processing

  • model_name (str) – Name of the WD14 model to use for prediction

  • general_threshold (float) – Confidence threshold for general tags (0.0 to 1.0)

  • general_mcut_enabled (bool) – Enable MCut thresholding for general tags to improve prediction quality

  • character_threshold (float) – Confidence threshold for character tags (0.0 to 1.0)

  • character_mcut_enabled (bool) – Enable MCut thresholding for character tags to improve prediction quality

  • no_underline (bool) – Replace underscores with spaces in tag names for better readability

  • drop_overlap (bool) – Remove overlapping tags to reduce redundancy

  • fmt (tuple) – Specify return format structure for predictions, default is ('rating', 'general', 'character').

  • denormalize (bool) – Whether to denormalize the embedding before prediction

  • denormalizer_name (str) – Name of the denormalizer to use if denormalization is enabled

Returns:

For single embeddings: prediction result based on fmt. For batches: list of prediction results.

Note

Only embeddings that have not been normalized can be converted directly into an understandable prediction result. If normalized embeddings are provided, set denormalize=True to convert them back first.

For batch processing (2-dim input), returns a list where each element corresponds to one embedding’s predictions in the same format as single embedding output.

Example:
>>> import numpy as np
>>> from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction
>>>
>>> # extract the feature embedding, shape: (W, )
>>> embedding = get_wd14_tags('skadi.jpg', fmt='embedding')
>>>
>>> # convert to understandable result
>>> rating, general, character = convert_wd14_emb_to_prediction(embedding)
>>> # these 3 dicts will be the same as that returned by `get_wd14_tags('skadi.jpg')`
>>>
>>> # Batch processing, shape: (B, W)
>>> embeddings = np.stack([
...     get_wd14_tags('img1.jpg', fmt='embedding'),
...     get_wd14_tags('img2.jpg', fmt='embedding'),
... ])
>>> # results will be a list of (rating, general, character) tuples
>>> results = convert_wd14_emb_to_prediction(embeddings)
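
If the stored embeddings have already been L2-normalized (for example, when loading them back from a similarity index), pass denormalize=True; a minimal sketch:

>>> import numpy as np
>>> from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction
>>>
>>> emb = get_wd14_tags('skadi.jpg', fmt='embedding')
>>> emb = emb / np.linalg.norm(emb)  # now L2-normalized
>>>
>>> # denormalize=True restores the embedding before running the prediction
>>> rating, general, character = convert_wd14_emb_to_prediction(emb, denormalize=True)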

denormalize_wd14_emb

imgutils.tagging.wd14.denormalize_wd14_emb(emb: ndarray, model_name: str = 'SwinV2_v3', denormalizer_name: str = 'mnum2_all') ndarray[source]

Denormalize WD14 embeddings.

Parameters:
  • emb (numpy.ndarray) – The embedding to denormalize.

  • model_name (str) – Name of the model.

  • denormalizer_name (str) – Name of the denormalizer.

Returns:

The denormalized embedding.

Return type:

numpy.ndarray

Examples:
>>> import numpy as np
>>> from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction, denormalize_wd14_emb
>>>
>>> embedding, (r, g, c) = get_wd14_tags(
...     'image.png',
...     fmt=('embedding', ('rating', 'general', 'character')),
... )
>>>
>>> # normalize embedding
>>> embedding = embedding / np.linalg.norm(embedding)
>>>
>>> # denormalize this embedding
>>> output = denormalize_wd14_emb(embedding)
>>>
>>> # should closely match r, g, c, with error around 1e-3
>>> rating, general, character = convert_wd14_emb_to_prediction(output)