imgutils.tagging.wd14

Overview:

This module provides utilities for image tagging using WD14 taggers. It includes functions for loading models, processing images, and extracting tags.

The module is inspired by the SmilingWolf/wd-v1-4-tags project on Hugging Face.

get_wd14_tags

imgutils.tagging.wd14.get_wd14_tags(image: str | PathLike | bytes | bytearray | BinaryIO | Image, model_name: str = 'SwinV2_v3', general_threshold: float = 0.35, general_mcut_enabled: bool = False, character_threshold: float = 0.85, character_mcut_enabled: bool = False, no_underline: bool = False, drop_overlap: bool = False, fmt: Any = ('rating', 'general', 'character'), attachments: Mapping[str, Tuple[str, str] | Tuple[str, str, dict]] | None = None)[source]

Get tags for an image using WD14 taggers.

This function is similar to the SmilingWolf/wd-v1-4-tags project on Hugging Face.

Parameters:
  • image (ImageTyping) – The input image.

  • model_name (str) – The name of the model to use.

  • general_threshold (float) – The threshold for general tags.

  • general_mcut_enabled (bool) – If True, applies MCut thresholding to general tags.

  • character_threshold (float) – The threshold for character tags.

  • character_mcut_enabled (bool) – If True, applies MCut thresholding to character tags.

  • no_underline (bool) – If True, replaces underscores in tag names with spaces.

  • drop_overlap (bool) – If True, drops overlapping tags.

  • fmt (Any) – Return format, default is ('rating', 'general', 'character'). embedding is also supported for feature extraction.

  • attachments (Optional[Mapping[str, Union[Tuple[str, str], Tuple[str, str, dict]]]]) – Additional model attachments for extended tagging capabilities

Returns:

Prediction result based on the provided fmt.

Return type:

Any

Raises:

ValueError – If attachment configuration is invalid or incompatible

Note

The fmt argument can include the following keys:

  • rating: a dict containing ratings and their confidences

  • general: a dict containing general tags and their confidences

  • character: a dict containing character tags and their confidences

  • tag: a dict containing all tags (including general and character, not including rating) and their confidences

  • embedding: a 1-dim embedding of image, recommended for index building after L2 normalization

  • prediction: a 1-dim prediction result of image

You can extract the embedding of a given image with the following code:

>>> from imgutils.tagging import get_wd14_tags
>>>
>>> embedding = get_wd14_tags('skadi.jpg', fmt='embedding')
>>> embedding.shape
(1024,)

This embedding is valuable for constructing indices that enable rapid querying of images based on visual features within large-scale datasets.
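As a sketch of that index-building workflow, the snippet below L2-normalizes a batch of embeddings and ranks them by cosine similarity against a query. Random vectors stand in for real outputs of `get_wd14_tags(..., fmt='embedding')`, which would each be a `(1024,)` array:

```python
import numpy as np

# Placeholder for real WD14 embeddings; in practice each row would come from
# get_wd14_tags(path, fmt='embedding'), which yields a (1024,) vector.
rng = np.random.default_rng(0)
index_embs = rng.normal(size=(100, 1024)).astype(np.float32)

# L2-normalize so that a plain dot product equals cosine similarity.
index_embs /= np.linalg.norm(index_embs, axis=1, keepdims=True)

query = rng.normal(size=(1024,)).astype(np.float32)
query /= np.linalg.norm(query)

# Rank indexed images by cosine similarity to the query.
scores = index_embs @ query
top5 = np.argsort(scores)[::-1][:5]
```

For large datasets, the same normalized vectors can be handed to a dedicated ANN index (e.g. FAISS) instead of the brute-force matrix product shown here.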

Note

The attachment system allows integration of additional tagging models to extend the base WD14 tagger’s capabilities. Attachments are specified using a dictionary with the following format:

attachments = {
    'name1': ('repo_id', 'model_name'),  # basic format
    'name2': ('repo_id', 'model_name', {'threshold': 0.35}),  # with additional parameters used when predicting
}

The fmt argument can include attachment results using the format ‘name/key’, where:

  • name: The name specified in the attachments dictionary

  • key: The specific output type requested from the attachment

For example:

>>> from imgutils.tagging import get_wd14_tags
>>>
>>> # Using an attachment for additional style tagging
>>> results = get_wd14_tags(
...     'image.jpg',
...     attachments={'monochrome': ('deepghs/eattach_monochrome_experiments', 'mlp_layer1_seed1')},
...     fmt=('general', 'monochrome/scores')
... )
>>>
>>> # Results will include both base tags and attachment outputs
>>> print(results)
(
    {'1girl': 0.99, ...},
    {'monochrome': 0.999, 'normal': 0.001},
)

Multiple attachments can be used simultaneously, and each attachment can provide multiple output types through its fmt specification. Ensure that attachment models are compatible with the base WD14 model’s embedding format.
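To make the multi-attachment case concrete, here is a configuration sketch combining two attachments, each queried through its own `'name/key'` entry in `fmt`. The `monochrome` entry reuses the repository from the example above; the `style` entry is a purely hypothetical placeholder illustrating the tuple-with-options form:

```python
# 'monochrome' reuses the attachment from the example above; 'style' is a
# hypothetical placeholder showing the (repo_id, model_name, params) form.
attachments = {
    'monochrome': ('deepghs/eattach_monochrome_experiments', 'mlp_layer1_seed1'),
    'style': ('some-user/style-attachment-repo', 'model_name', {'threshold': 0.5}),
}

# Request base general tags plus one output key from each attachment.
fmt = ('general', 'monochrome/scores', 'style/scores')
```

Passing these to `get_wd14_tags(image, attachments=attachments, fmt=fmt)` would yield a 3-tuple: the general tag dict followed by one result per attachment key.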

Example:

Here are some example images:

../../_images/tagging_demo.plot.py.svg
>>> from imgutils.tagging import get_wd14_tags
>>>
>>> rating, features, chars = get_wd14_tags('skadi.jpg')
>>> rating
{'general': 0.0011444687843322754, 'sensitive': 0.8876402974128723, 'questionable': 0.106781005859375, 'explicit': 0.000277101993560791}
>>> features
{'1girl': 0.997527003288269, 'solo': 0.9797663688659668, 'long_hair': 0.9905703663825989, 'breasts': 0.9761719703674316, 'looking_at_viewer': 0.8981098532676697, 'bangs': 0.8810765743255615, 'large_breasts': 0.9498510360717773, 'shirt': 0.8377365469932556, 'red_eyes': 0.945058286190033, 'gloves': 0.9457170367240906, 'navel': 0.969594419002533, 'holding': 0.7881088852882385, 'hair_between_eyes': 0.7687551379203796, 'very_long_hair': 0.9301245212554932, 'standing': 0.6703325510025024, 'white_hair': 0.5292627811431885, 'short_sleeves': 0.8677047491073608, 'grey_hair': 0.5859264731407166, 'thighs': 0.9536856412887573, 'cowboy_shot': 0.8056888580322266, 'sweat': 0.8394746780395508, 'outdoors': 0.9473626613616943, 'parted_lips': 0.8986269235610962, 'sky': 0.9385137557983398, 'shorts': 0.8408567905426025, 'alternate_costume': 0.4245271384716034, 'day': 0.931140661239624, 'black_gloves': 0.8830795884132385, 'midriff': 0.7279844284057617, 'artist_name': 0.5333830714225769, 'cloud': 0.64717698097229, 'stomach': 0.9516432285308838, 'blue_sky': 0.9655293226242065, 'crop_top': 0.9485014081001282, 'black_shirt': 0.7366660833358765, 'short_shorts': 0.7161656618118286, 'ass_visible_through_thighs': 0.5858667492866516, 'black_shorts': 0.6186309456825256, 'thigh_gap': 0.41193312406539917, 'no_headwear': 0.467605859041214, 'low-tied_long_hair': 0.36282333731651306, 'sportswear': 0.3756745457649231, 'motion_blur': 0.5091936588287354, 'baseball_bat': 0.951993465423584, 'baseball': 0.5634750723838806, 'holding_baseball_bat': 0.8232709169387817}
>>> chars
{'skadi_(arknights)': 0.9869340658187866}
>>>
>>> rating, features, chars = get_wd14_tags('hutao.jpg')
>>> rating
{'general': 0.49491602182388306, 'sensitive': 0.5193622708320618, 'questionable': 0.003406703472137451, 'explicit': 0.0007208287715911865}
>>> features
{'1girl': 0.9798132181167603, 'solo': 0.8046203851699829, 'long_hair': 0.7596215009689331, 'looking_at_viewer': 0.7620116472244263, 'blush': 0.46084529161453247, 'smile': 0.48454540967941284, 'bangs': 0.5152207016944885, 'skirt': 0.8023070096969604, 'brown_hair': 0.8653596639633179, 'hair_ornament': 0.7201820611953735, 'red_eyes': 0.7816740870475769, 'long_sleeves': 0.697688639163971, 'twintails': 0.8974947333335876, 'school_uniform': 0.7491052746772766, 'jacket': 0.5015512704849243, 'flower': 0.6401398181915283, 'ahoge': 0.43420469760894775, 'pleated_skirt': 0.4528769850730896, 'outdoors': 0.5730487704277039, 'tongue': 0.6739872694015503, 'hair_flower': 0.5545973181724548, 'tongue_out': 0.6946243047714233, 'bag': 0.5487751364707947, 'symbol-shaped_pupils': 0.7439308166503906, 'blazer': 0.4186026453971863, 'backpack': 0.47378358244895935, ':p': 0.4690653085708618, 'ghost': 0.7565015554428101}
>>> chars
{'hu_tao_(genshin_impact)': 0.9262397289276123, 'boo_tao_(genshin_impact)': 0.942080020904541}

convert_wd14_emb_to_prediction

imgutils.tagging.wd14.convert_wd14_emb_to_prediction(emb: ndarray, model_name: str = 'SwinV2_v3', general_threshold: float = 0.35, general_mcut_enabled: bool = False, character_threshold: float = 0.85, character_mcut_enabled: bool = False, no_underline: bool = False, drop_overlap: bool = False, fmt: Any = ('rating', 'general', 'character'), attachments: Mapping[str, Tuple[str, str] | Tuple[str, str, dict]] | None = None, denormalize: bool = False, denormalizer_name: str = 'mnum2_all')[source]

Convert WD14 embedding to understandable prediction result. This function can process both single embeddings (1-dimensional array) and batches of embeddings (2-dimensional array).

Parameters:
  • emb (numpy.ndarray) – The extracted embedding(s). Can be either a 1-dim array for single image or 2-dim array for batch processing

  • model_name (str) – Name of the WD14 model to use for prediction

  • general_threshold (float) – Confidence threshold for general tags (0.0 to 1.0)

  • general_mcut_enabled (bool) – Enable MCut thresholding for general tags to improve prediction quality

  • character_threshold (float) – Confidence threshold for character tags (0.0 to 1.0)

  • character_mcut_enabled (bool) – Enable MCut thresholding for character tags to improve prediction quality

  • no_underline (bool) – Replace underscores with spaces in tag names for better readability

  • drop_overlap (bool) – Remove overlapping tags to reduce redundancy

  • fmt (Any) – Specify return format structure for predictions, default is ('rating', 'general', 'character').

  • attachments (Optional[Mapping[str, Union[Tuple[str, str], Tuple[str, str, dict]]]]) – Additional model attachments for extended tagging capabilities

  • denormalize (bool) – Whether to denormalize the embedding before prediction

  • denormalizer_name (str) – Name of the denormalizer to use if denormalization is enabled

Returns:

For single embeddings: prediction result based on fmt. For batches: list of prediction results.

Return type:

Any

Raises:

ValueError – If attachment configuration is invalid or incompatible

Note

Only embeddings that have not been normalized can be converted directly into an understandable prediction result. If L2-normalized embeddings are provided, set denormalize=True to map them back first.

For batch processing (2-dim input), returns a list where each element corresponds to one embedding’s predictions in the same format as single embedding output.

Example:
>>> import os
>>> import numpy as np
>>> from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction
>>>
>>> # extract the feature embedding, shape: (W, )
>>> embedding = get_wd14_tags('skadi.jpg', fmt='embedding')
>>>
>>> # convert to understandable result
>>> rating, general, character = convert_wd14_emb_to_prediction(embedding)
>>> # these 3 dicts will be the same as that returned by `get_wd14_tags('skadi.jpg')`
>>>
>>> # Batch processing, shape: (B, W)
>>> embeddings = np.stack([
...     get_wd14_tags('img1.jpg', fmt='embedding'),
...     get_wd14_tags('img2.jpg', fmt='embedding'),
... ])
>>> # results will be a list of (rating, general, character) tuples
>>> results = convert_wd14_emb_to_prediction(embeddings)

denormalize_wd14_emb

imgutils.tagging.wd14.denormalize_wd14_emb(emb: ndarray, model_name: str = 'SwinV2_v3', denormalizer_name: str = 'mnum2_all') ndarray[source]

Denormalize WD14 embeddings.

Parameters:
  • emb (numpy.ndarray) – The embedding to denormalize.

  • model_name (str) – Name of the model.

  • denormalizer_name (str) – Name of the denormalizer.

Returns:

The denormalized embedding.

Return type:

numpy.ndarray

Examples:
>>> import numpy as np
>>> from imgutils.tagging import get_wd14_tags, convert_wd14_emb_to_prediction, denormalize_wd14_emb
...
>>> embedding, (r, g, c) = get_wd14_tags(
...     'image.png',
...     fmt=('embedding', ('rating', 'general', 'character')),
... )
...
>>> # normalize embedding
>>> embedding = embedding / np.linalg.norm(embedding)
...
>>> # denormalize this embedding
>>> output = denormalize_wd14_emb(embedding)
...
>>> # should be similar to r, g, c, approx 1e-3 error
>>> rating, general, character = convert_wd14_emb_to_prediction(output)
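The normalization step that denormalize_wd14_emb has to invert is plain L2 normalization; a quick NumPy illustration (a random vector stands in for a real embedding):

```python
import numpy as np

rng = np.random.default_rng(42)
emb = rng.normal(size=(1024,))          # stand-in for a raw WD14 embedding

normalized = emb / np.linalg.norm(emb)  # what you would store in an index

# Normalization preserves direction and discards only the scale (the original
# norm); recovering that lost scale is exactly what the trained denormalizer
# has to estimate before the embedding can be fed back through the tag head.
```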