imgutils.preprocess.transformers

Overview:

Convert image processors from the Hugging Face transformers library into equivalent PillowCompose transform pipelines.

Supported Processors:

Name                           Supported Repos    Function
ViTImageProcessor              5906 (33.24%)      create_transforms_from_vit_processor()
DonutImageProcessor            1901 (10.70%)      N/A
DetrImageProcessor             1575 (8.86%)       N/A
CLIPImageProcessor             1374 (7.73%)       create_transforms_from_clip_processor()
VideoMAEImageProcessor         1093 (6.15%)       N/A
ConvNextImageProcessor         648 (3.65%)        create_transforms_from_convnext_processor()
SegformerImageProcessor        533 (3.00%)        N/A
BeitImageProcessor             468 (2.63%)        N/A
SiglipImageProcessor           440 (2.48%)        create_transforms_from_siglip_processor()
LayoutLMv3ImageProcessor       403 (2.27%)        N/A
LayoutLMv2ImageProcessor       332 (1.87%)        N/A
MllamaImageProcessor           332 (1.87%)        N/A
Qwen2VLImageProcessor          314 (1.77%)        N/A
BlipImageProcessor             276 (1.55%)        create_transforms_from_blip_processor()
Idefics2ImageProcessor         226 (1.27%)        N/A
LlavaNextImageProcessor        215 (1.21%)        N/A
BitImageProcessor              210 (1.18%)        create_transforms_from_bit_processor()
Pix2StructImageProcessor       113 (0.64%)        N/A
ConditionalDetrImageProcessor  95 (0.53%)         N/A
SamImageProcessor              92 (0.52%)         N/A
DeiTImageProcessor             91 (0.51%)         N/A
Mask2FormerImageProcessor      89 (0.50%)         N/A
VivitImageProcessor            88 (0.50%)         N/A
YolosImageProcessor            84 (0.47%)         N/A
ViltImageProcessor             73 (0.41%)         N/A
DetaImageProcessor             68 (0.38%)         N/A
PixtralImageProcessor          68 (0.38%)         N/A
MobileNetV2ImageProcessor      63 (0.35%)         create_transforms_from_mobilenetv2_processor()
MobileViTImageProcessor        61 (0.34%)         N/A
DPTImageProcessor              51 (0.29%)         N/A
MaskFormerImageProcessor       49 (0.28%)         N/A
NougatImageProcessor           48 (0.27%)         N/A
IdeficsImageProcessor          47 (0.26%)         N/A
RTDetrImageProcessor           45 (0.25%)         N/A
EfficientNetImageProcessor     40 (0.23%)         N/A
DeformableDetrImageProcessor   36 (0.20%)         N/A
Idefics3ImageProcessor         32 (0.18%)         N/A
FuyuImageProcessor             22 (0.12%)         N/A
VideoLlavaImageProcessor       17 (0.10%)         N/A
PvtImageProcessor              16 (0.09%)         N/A
OneFormerImageProcessor        14 (0.08%)         N/A
MobileNetV1ImageProcessor      12 (0.07%)         N/A
Owlv2ImageProcessor            12 (0.07%)         N/A
ChineseCLIPImageProcessor      9 (0.05%)          N/A
EfficientFormerImageProcessor  8 (0.05%)          N/A
LlavaOnevisionImageProcessor   8 (0.05%)          N/A
Swin2SRImageProcessor          8 (0.05%)          N/A
ViTHybridImageProcessor        8 (0.05%)          N/A
OwlViTImageProcessor           7 (0.04%)          N/A
GroundingDinoImageProcessor    6 (0.03%)          N/A
PerceiverImageProcessor        6 (0.03%)          N/A
ChameleonImageProcessor        5 (0.03%)          N/A
LevitImageProcessor            5 (0.03%)          N/A
VitMatteImageProcessor         5 (0.03%)          N/A

register_creators_for_transformers

imgutils.preprocess.transformers.register_creators_for_transformers()[source]

Decorator that registers functions as transform creators for transformers processors.

This decorator system allows for extensible support of different processor types. When a function is decorated with this decorator, it is added to the list of available transform creators that will be tried when creating transforms from a transformers processor.

Returns:

Decorator function that registers the decorated function

Return type:

callable

Example:
>>> @register_creators_for_transformers()
... def create_clip_transforms(processor):
...     if not hasattr(processor, 'feature_extractor'):
...         raise NotProcessorTypeError()
...     # Create and return transforms for CLIP
...     return transforms

NotProcessorTypeError

class imgutils.preprocess.transformers.NotProcessorTypeError[source]

Exception raised when an unsupported processor type is encountered.

This custom exception is used when the system cannot create transforms from a given transformers processor, either because the processor type is not recognized or is not supported by any registered transform creators.

Inherits:

TypeError

create_transforms_from_transformers

imgutils.preprocess.transformers.create_transforms_from_transformers(processor)[source]

Create appropriate image transforms from a given transformers processor.

This function attempts to create image transforms by iterating through registered creator functions until one successfully creates transforms for the given processor type.

Parameters:

processor (transformers.ImageProcessor or similar) – A processor instance from the transformers library

Returns:

A composition of image transforms suitable for the given processor

Return type:

PillowCompose or similar transform object

Raises:

NotProcessorTypeError – If no registered creator can handle the processor type

Example:
>>> from transformers import AutoImageProcessor
>>> from imgutils.preprocess.transformers import create_transforms_from_transformers
>>>
>>> processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> transforms = create_transforms_from_transformers(processor)
>>> transforms
PillowCompose(
    PillowConvertRGB(force_background='white')
    PillowResize(size=224, interpolation=bicubic, max_size=None, antialias=True)
    PillowCenterCrop(size=(224, 224))
    PillowToTensor()
    PillowNormalize(mean=[0.48145467 0.4578275  0.40821072], std=[0.26862955 0.2613026  0.2757771 ])
)

is_valid_size_dict

imgutils.preprocess.transformers.is_valid_size_dict(size_dict)[source]

Validate if a dictionary contains valid image size specifications.

Parameters:

size_dict (dict) – Dictionary to validate

Returns:

True if the dictionary contains valid size specifications, False otherwise

Return type:

bool

Examples:
>>> is_valid_size_dict({"height": 100, "width": 200})
True
>>> is_valid_size_dict({"shortest_edge": 100})
True
>>> is_valid_size_dict({"invalid_key": 100})
False

convert_to_size_dict

imgutils.preprocess.transformers.convert_to_size_dict(size, max_size=None, default_to_square=True, height_width_order=True)[source]

Convert various size input formats to a standardized size dictionary.

Parameters:
  • size (int or tuple or list or None) – Size specification as integer, tuple/list, or None

  • max_size (int or None) – Optional maximum size constraint

  • default_to_square (bool) – If True, single integer creates square dimensions

  • height_width_order (bool) – If True, tuple values are (height, width), else (width, height)

Returns:

Dictionary with standardized size format

Return type:

dict

Raises:

ValueError – If size specification is invalid or incompatible with other parameters

Examples:
>>> convert_to_size_dict(100)
{'height': 100, 'width': 100}
>>> convert_to_size_dict((200, 300), height_width_order=True)
{'height': 200, 'width': 300}
>>> convert_to_size_dict(100, max_size=200, default_to_square=False)
{'shortest_edge': 100, 'longest_edge': 200}

get_size_dict

imgutils.preprocess.transformers.get_size_dict(size=None, max_size=None, height_width_order=True, default_to_square=True, param_name='size') → dict[source]

Convert and validate size parameters into a standardized dictionary format.

This function serves as the main entry point for size processing, handling various input formats and ensuring they conform to valid size specifications.

Parameters:
  • size (int or tuple or list or dict or None) – Size specification as integer, tuple/list, dictionary, or None

  • max_size (int or None) – Optional maximum size constraint

  • height_width_order (bool) – If True, tuple values are (height, width), else (width, height)

  • default_to_square (bool) – If True, single integer creates square dimensions

  • param_name (str) – Parameter name for error messages

Returns:

Dictionary with standardized size format

Return type:

dict

Raises:

ValueError – If size specification is invalid or incompatible with other parameters

Examples:
>>> get_size_dict(100)
{'height': 100, 'width': 100}
>>> get_size_dict({'shortest_edge': 100})
{'shortest_edge': 100}
>>> get_size_dict((200, 300), height_width_order=True)
{'height': 200, 'width': 300}

create_clip_transforms

imgutils.preprocess.transformers.create_clip_transforms(do_resize: bool = True, size=<object object>, resample=3, do_center_crop=True, crop_size=<object object>, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)[source]

Creates a composition of image transforms typically used for CLIP models.

Parameters:
  • do_resize (bool) – Whether to resize the image.

  • size (dict) – Target size for resizing. Can be {“shortest_edge”: int} or {“height”: int, “width”: int}.

  • resample (int) – PIL resampling filter to use for resizing.

  • do_center_crop (bool) – Whether to center crop the image.

  • crop_size (dict) – Size for center cropping in {“height”: int, “width”: int} format.

  • do_rescale (bool) – Whether to rescale pixel values.

  • rescale_factor (float) – Factor to use for rescaling pixels.

  • do_normalize (bool) – Whether to normalize the image.

  • image_mean (list or tuple) – Mean values for normalization.

  • image_std (list or tuple) – Standard deviation values for normalization.

  • do_convert_rgb (bool) – Whether to convert image to RGB.

Returns:

A composed transformation pipeline.

Return type:

PillowCompose

create_transforms_from_clip_processor

imgutils.preprocess.transformers.create_transforms_from_clip_processor(processor)[source]

Creates image transforms from a CLIP processor configuration.

Parameters:

processor (Union[CLIPProcessor, CLIPImageProcessor]) – A CLIP processor or image processor instance from transformers library.

Returns:

A composed transformation pipeline matching the processor’s configuration.

Return type:

PillowCompose

Raises:

NotProcessorTypeError – If the provided processor is not a CLIP processor.

create_convnext_transforms

imgutils.preprocess.transformers.create_convnext_transforms(do_resize: bool = True, size=<object object>, crop_pct: float = <object object>, resample=2, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>)[source]

Create a composition of image transforms specifically tailored for ConvNext models.

This function creates a transformation pipeline that can include resizing, rescaling, and normalization operations. The transforms are applied in the following order:

  1. Resize (optional)

  2. Convert to tensor

  3. Rescale (optional)

  4. Normalize (optional)

Parameters:
  • do_resize (bool) – Whether to resize the image

  • size (dict) – Target size dictionary with ‘shortest_edge’ key

  • crop_pct (float) – Center crop percentage, used to compute resize size

  • resample (int) – PIL resampling filter to use for resizing

  • do_rescale (bool) – Whether to rescale pixel values

  • rescale_factor (float) – Factor to use for rescaling pixels

  • do_normalize (bool) – Whether to normalize the image

  • image_mean (tuple or list) – Mean values for normalization

  • image_std (tuple or list) – Standard deviation values for normalization

Returns:

A composed transformation pipeline

Return type:

PillowCompose

create_transforms_from_convnext_processor

imgutils.preprocess.transformers.create_transforms_from_convnext_processor(processor)[source]

Create image transforms from a ConvNext processor configuration.

This function takes a Hugging Face ConvNextImageProcessor and creates a corresponding transformation pipeline that matches its configuration settings.

Parameters:

processor (ConvNextImageProcessor) – The ConvNext image processor to create transforms from

Returns:

A composed transformation pipeline matching the processor’s configuration

Return type:

PillowCompose

Raises:

NotProcessorTypeError – If the provided processor is not a ConvNextImageProcessor

create_vit_transforms

imgutils.preprocess.transformers.create_vit_transforms(do_resize: bool = True, size=<object object>, resample: int = 2, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>)[source]

Create a composition of image transforms typically used for ViT models.

This function creates a transform pipeline that can include resizing, tensor conversion, rescaling, and normalization operations. The transforms are applied in sequence to prepare images for ViT model input.

Parameters:
  • do_resize (bool) – Whether to resize the input images

  • size (dict) – Target size for resizing, should be dict with ‘height’ and ‘width’ keys

  • resample (int) – PIL resampling filter to use for resizing

  • do_rescale (bool) – Whether to rescale pixel values

  • rescale_factor (float) – Factor to use for rescaling pixel values

  • do_normalize (bool) – Whether to normalize the image

  • image_mean (tuple or list) – Mean values for normalization

  • image_std (tuple or list) – Standard deviation values for normalization

Returns:

A composition of image transforms

Return type:

PillowCompose

create_transforms_from_vit_processor

imgutils.preprocess.transformers.create_transforms_from_vit_processor(processor)[source]

Create image transforms from a Hugging Face ViT processor configuration.

This function takes a ViT image processor from the transformers library and creates a matching transform pipeline that replicates the processor’s preprocessing steps.

Parameters:

processor (ViTImageProcessor) – A ViT image processor from Hugging Face transformers

Returns:

A composition of image transforms matching the processor’s configuration

Return type:

PillowCompose

Raises:

NotProcessorTypeError – If the provided processor is not a ViTImageProcessor

create_siglip_transforms

imgutils.preprocess.transformers.create_siglip_transforms(do_resize: bool = True, size=<object object>, resample: int = 3, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)[source]

Creates a composition of image transformations for SigLIP model input processing.

This function builds a pipeline of image transformations that can include:

  • RGB conversion

  • Image resizing

  • Tensor conversion

  • Image rescaling

  • Normalization

Parameters:
  • do_resize (bool) – Whether to resize the image

  • size (dict) – Target size dictionary with ‘height’ and ‘width’ keys

  • resample (int) – PIL image resampling filter to use for resizing

  • do_rescale (bool) – Whether to rescale pixel values

  • rescale_factor (float) – Factor to use for pixel value rescaling

  • do_normalize (bool) – Whether to normalize the image

  • image_mean (tuple or list) – Mean values for normalization

  • image_std (tuple or list) – Standard deviation values for normalization

  • do_convert_rgb (bool) – Whether to convert image to RGB

Returns:

A composed transformation pipeline

Return type:

PillowCompose

create_transforms_from_siglip_processor

imgutils.preprocess.transformers.create_transforms_from_siglip_processor(processor)[source]

Creates image transformations from a SigLIP processor configuration.

This function extracts transformation parameters from a HuggingFace SigLIP image processor and creates a corresponding transformation pipeline.

Parameters:

processor (SiglipImageProcessor) – A HuggingFace SigLIP image processor instance

Returns:

A composed transformation pipeline

Return type:

PillowCompose

Raises:

NotProcessorTypeError – If the processor is not a SiglipImageProcessor

create_bit_transforms

imgutils.preprocess.transformers.create_bit_transforms(do_resize: bool = True, size=<object object>, resample=3, do_center_crop: bool = True, crop_size=<object object>, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)[source]

Create an image transformation pipeline for BiT models.

This function creates a composition of image transformations including RGB conversion, resizing, center cropping, tensor conversion, rescaling and normalization.

Parameters:
  • do_resize (bool) – Whether to resize the image.

  • size (dict) – Target size for resizing. Can be {“shortest_edge”: int} or {“height”: int, “width”: int}.

  • resample (int) – PIL interpolation method for resizing.

  • do_center_crop (bool) – Whether to perform center cropping.

  • crop_size (dict) – Size for center cropping, in format {“height”: int, “width”: int}.

  • do_rescale (bool) – Whether to rescale pixel values.

  • rescale_factor (float) – Factor to rescale pixel values.

  • do_normalize (bool) – Whether to normalize the image.

  • image_mean (list or tuple) – Mean values for normalization.

  • image_std (list or tuple) – Standard deviation values for normalization.

  • do_convert_rgb (bool) – Whether to convert image to RGB.

Returns:

A composition of image transformations.

Return type:

PillowCompose

Raises:

ValueError – If size configuration is invalid.

create_transforms_from_bit_processor

imgutils.preprocess.transformers.create_transforms_from_bit_processor(processor)[source]

Create image transformations from a BiT image processor.

This function creates a transformation pipeline based on the configuration of a Hugging Face BitImageProcessor.

Parameters:

processor (BitImageProcessor) – The BiT image processor to create transforms from.

Returns:

A composition of image transformations.

Return type:

PillowCompose

Raises:

NotProcessorTypeError – If the processor is not a BitImageProcessor.

create_blip_transforms

imgutils.preprocess.transformers.create_blip_transforms(do_resize: bool = True, size=<object object>, resample=3, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)[source]

Create a transformation pipeline for BLIP image processing.

This function builds a sequence of image transformations commonly used in BLIP models, including RGB conversion, resizing, tensor conversion, rescaling, and normalization.

Parameters:
  • do_resize (bool) – Whether to resize the image.

  • size (dict) – Target size for resizing, expects dict with ‘height’ and ‘width’ keys. Defaults to {‘height’: 384, ‘width’: 384}.

  • resample (int) – Resampling filter for resize operation. Defaults to PIL.Image.BICUBIC.

  • do_rescale (bool) – Whether to rescale pixel values.

  • rescale_factor (float) – Factor to rescale pixel values. Defaults to 1/255.

  • do_normalize (bool) – Whether to normalize the image.

  • image_mean (tuple or list) – Mean values for normalization. Defaults to OPENAI_CLIP_MEAN.

  • image_std (tuple or list) – Standard deviation values for normalization. Defaults to OPENAI_CLIP_STD.

  • do_convert_rgb (bool) – Whether to convert image to RGB.

Returns:

A composed transformation pipeline.

Return type:

PillowCompose

create_transforms_from_blip_processor

imgutils.preprocess.transformers.create_transforms_from_blip_processor(processor)[source]

Create image transformations from a HuggingFace BLIP processor.

This function extracts configuration from a HuggingFace BLIP processor and creates a corresponding transformation pipeline using create_blip_transforms.

Parameters:

processor (transformers.BlipImageProcessor) – A HuggingFace BLIP image processor instance.

Returns:

A composed transformation pipeline configured according to the processor’s settings.

Return type:

PillowCompose

Raises:

NotProcessorTypeError – If the provided processor is not a BlipImageProcessor.

create_mobilenetv2_transforms

imgutils.preprocess.transformers.create_mobilenetv2_transforms(do_resize: bool = True, size: Dict[str, int] | None = <object object>, resample=2, do_center_crop: bool = True, crop_size: Dict[str, int] = <object object>, do_rescale: bool = True, rescale_factor: int | float = 0.00392156862745098, do_normalize: bool = True, image_mean: float | List[float] | None = <object object>, image_std: float | List[float] | None = <object object>)[source]

Creates a composition of transforms that replicates the behavior of MobileNetV2ImageProcessor.

This function builds a pipeline of image transformations typically used for MobileNetV2 models, including resizing, center cropping, tensor conversion, rescaling, and normalization.

Parameters:
  • do_resize (bool) – Whether to resize the image.

  • size (Optional[Dict[str, int]]) – Size dictionary specifying resize parameters. Can include keys like ‘shortest_edge’, ‘height’, ‘width’, etc.

  • resample (PIL.Image.Resampling) – Resampling filter to use for resizing operations.

  • do_center_crop (bool) – Whether to apply center cropping to the image.

  • crop_size (Dict[str, int]) – Dictionary specifying the height and width for center cropping.

  • do_rescale (bool) – Whether to rescale pixel values after tensor conversion.

  • rescale_factor (Union[int, float]) – Factor by which to rescale the image pixel values.

  • do_normalize (bool) – Whether to normalize the image with mean and std.

  • image_mean (Optional[Union[float, List[float]]]) – Mean values for normalization, per channel.

  • image_std (Optional[Union[float, List[float]]]) – Standard deviation values for normalization, per channel.

Returns:

A composition of transforms matching MobileNetV2ImageProcessor behavior.

Return type:

PillowCompose

create_transforms_from_mobilenetv2_processor

imgutils.preprocess.transformers.create_transforms_from_mobilenetv2_processor(processor)[source]

Creates transform composition from a MobileNetV2ImageProcessor instance.

This function extracts configuration from a transformers MobileNetV2ImageProcessor and creates an equivalent transform pipeline using the create_mobilenetv2_transforms function.

Parameters:

processor (transformers.MobileNetV2ImageProcessor) – A MobileNetV2ImageProcessor instance from the transformers library.

Returns:

A composition of transforms matching the processor’s configuration.

Return type:

PillowCompose

Raises:

NotProcessorTypeError – If the provided processor is not a MobileNetV2ImageProcessor.