imgutils.preprocess.transformers
- Overview:
Convert transformers image processors to PillowCompose objects.
Supported Processors:

Name                            Supported   Repos            Function
ViTImageProcessor               ✅          5906 (33.24%)    create_transforms_from_vit_processor
DonutImageProcessor             ❌          1901 (10.70%)    N/A
DetrImageProcessor              ❌          1575 (8.86%)     N/A
CLIPImageProcessor              ✅          1374 (7.73%)     create_transforms_from_clip_processor
VideoMAEImageProcessor          ❌          1093 (6.15%)     N/A
ConvNextImageProcessor          ✅          648 (3.65%)      create_transforms_from_convnext_processor
SegformerImageProcessor         ❌          533 (3.00%)      N/A
BeitImageProcessor              ❌          468 (2.63%)      N/A
SiglipImageProcessor            ✅          440 (2.48%)      create_transforms_from_siglip_processor
LayoutLMv3ImageProcessor        ❌          403 (2.27%)      N/A
LayoutLMv2ImageProcessor        ❌          332 (1.87%)      N/A
MllamaImageProcessor            ❌          332 (1.87%)      N/A
Qwen2VLImageProcessor           ❌          314 (1.77%)      N/A
BlipImageProcessor              ✅          276 (1.55%)      create_transforms_from_blip_processor
Idefics2ImageProcessor          ❌          226 (1.27%)      N/A
LlavaNextImageProcessor         ❌          215 (1.21%)      N/A
BitImageProcessor               ✅          210 (1.18%)      create_transforms_from_bit_processor
Pix2StructImageProcessor        ❌          113 (0.64%)      N/A
ConditionalDetrImageProcessor   ❌          95 (0.53%)       N/A
SamImageProcessor               ❌          92 (0.52%)       N/A
DeiTImageProcessor              ❌          91 (0.51%)       N/A
Mask2FormerImageProcessor       ❌          89 (0.50%)       N/A
VivitImageProcessor             ❌          88 (0.50%)       N/A
YolosImageProcessor             ❌          84 (0.47%)       N/A
ViltImageProcessor              ❌          73 (0.41%)       N/A
DetaImageProcessor              ❌          68 (0.38%)       N/A
PixtralImageProcessor           ❌          68 (0.38%)       N/A
MobileNetV2ImageProcessor       ✅          63 (0.35%)       create_transforms_from_mobilenetv2_processor
MobileViTImageProcessor         ❌          61 (0.34%)       N/A
DPTImageProcessor               ❌          51 (0.29%)       N/A
MaskFormerImageProcessor        ❌          49 (0.28%)       N/A
NougatImageProcessor            ❌          48 (0.27%)       N/A
IdeficsImageProcessor           ❌          47 (0.26%)       N/A
RTDetrImageProcessor            ❌          45 (0.25%)       N/A
EfficientNetImageProcessor      ❌          40 (0.23%)       N/A
DeformableDetrImageProcessor    ❌          36 (0.20%)       N/A
Idefics3ImageProcessor          ❌          32 (0.18%)       N/A
FuyuImageProcessor              ❌          22 (0.12%)       N/A
VideoLlavaImageProcessor        ❌          17 (0.10%)       N/A
PvtImageProcessor               ❌          16 (0.09%)       N/A
OneFormerImageProcessor         ❌          14 (0.08%)       N/A
MobileNetV1ImageProcessor       ❌          12 (0.07%)       N/A
Owlv2ImageProcessor             ❌          12 (0.07%)       N/A
ChineseCLIPImageProcessor       ❌          9 (0.05%)        N/A
EfficientFormerImageProcessor   ❌          8 (0.05%)        N/A
LlavaOnevisionImageProcessor    ❌          8 (0.05%)        N/A
Swin2SRImageProcessor           ❌          8 (0.05%)        N/A
ViTHybridImageProcessor         ❌          8 (0.05%)        N/A
OwlViTImageProcessor            ❌          7 (0.04%)        N/A
GroundingDinoImageProcessor     ❌          6 (0.03%)        N/A
PerceiverImageProcessor         ❌          6 (0.03%)        N/A
ChameleonImageProcessor         ❌          5 (0.03%)        N/A
LevitImageProcessor             ❌          5 (0.03%)        N/A
VitMatteImageProcessor          ❌          5 (0.03%)        N/A
register_creators_for_transformers
- imgutils.preprocess.transformers.register_creators_for_transformers()[source]
Decorator that registers functions as transform creators for transformers processors.
This decorator system allows for extensible support of different processor types. Each decorated function is added to the list of available transform creators, which are tried in turn when creating transforms from a transformers processor.
- Returns:
Decorator function that registers the decorated function
- Return type:
callable
- Example:
>>> @register_creators_for_transformers()
... def create_clip_transforms(processor):
...     if not hasattr(processor, 'feature_extractor'):
...         raise NotProcessorTypeError()
...     # Create and return transforms for CLIP
...     return transforms
NotProcessorTypeError
- class imgutils.preprocess.transformers.NotProcessorTypeError[source]
Exception raised when an unsupported processor type is encountered.
This custom exception is used when the system cannot create transforms from a given transformers processor, either because the processor type is not recognized or is not supported by any registered transform creators.
- Inherits:
TypeError
create_transforms_from_transformers
- imgutils.preprocess.transformers.create_transforms_from_transformers(processor)[source]
Create appropriate image transforms from a given transformers processor.
This function attempts to create image transforms by iterating through registered creator functions until one successfully creates transforms for the given processor type.
- Parameters:
processor (transformers.ImageProcessor or similar) – A processor instance from the transformers library
- Returns:
A composition of image transforms suitable for the given processor
- Return type:
PillowCompose or similar transform object
- Raises:
NotProcessorTypeError – If no registered creator can handle the processor type
- Example:
>>> from transformers import AutoImageProcessor
>>> from imgutils.preprocess.transformers import create_transforms_from_transformers
>>>
>>> processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
>>> transforms = create_transforms_from_transformers(processor)
>>> transforms
PillowCompose(
    PillowConvertRGB(force_background='white')
    PillowResize(size=224, interpolation=bicubic, max_size=None, antialias=True)
    PillowCenterCrop(size=(224, 224))
    PillowToTensor()
    PillowNormalize(mean=[0.48145467 0.4578275  0.40821072], std=[0.26862955 0.2613026  0.2757771 ])
)
is_valid_size_dict
- imgutils.preprocess.transformers.is_valid_size_dict(size_dict)[source]
Validate if a dictionary contains valid image size specifications.
- Parameters:
size_dict (dict) – Dictionary to validate
- Returns:
True if the dictionary contains valid size specifications, False otherwise
- Return type:
bool
- Examples:
>>> is_valid_size_dict({"height": 100, "width": 200})
True
>>> is_valid_size_dict({"shortest_edge": 100})
True
>>> is_valid_size_dict({"invalid_key": 100})
False
convert_to_size_dict
- imgutils.preprocess.transformers.convert_to_size_dict(size, max_size=None, default_to_square=True, height_width_order=True)[source]
Convert various size input formats to a standardized size dictionary.
- Parameters:
size (int or tuple or list or None) – Size specification as integer, tuple/list, or None
max_size (int or None) – Optional maximum size constraint
default_to_square (bool) – If True, single integer creates square dimensions
height_width_order (bool) – If True, tuple values are (height, width), else (width, height)
- Returns:
Dictionary with standardized size format
- Return type:
dict
- Raises:
ValueError – If size specification is invalid or incompatible with other parameters
- Examples:
>>> convert_to_size_dict(100)
{'height': 100, 'width': 100}
>>> convert_to_size_dict((200, 300), height_width_order=True)
{'height': 200, 'width': 300}
>>> convert_to_size_dict(100, max_size=200, default_to_square=False)
{'shortest_edge': 100, 'longest_edge': 200}
get_size_dict
- imgutils.preprocess.transformers.get_size_dict(size=None, max_size=None, height_width_order=True, default_to_square=True, param_name='size') → dict[source]
Convert and validate size parameters into a standardized dictionary format.
This function serves as the main entry point for size processing, handling various input formats and ensuring they conform to valid size specifications.
- Parameters:
size (int or tuple or list or dict or None) – Size specification as integer, tuple/list, dictionary, or None
max_size (int or None) – Optional maximum size constraint
height_width_order (bool) – If True, tuple values are (height, width), else (width, height)
default_to_square (bool) – If True, single integer creates square dimensions
param_name (str) – Parameter name for error messages
- Returns:
Dictionary with standardized size format
- Return type:
dict
- Raises:
ValueError – If size specification is invalid or incompatible with other parameters
- Examples:
>>> get_size_dict(100)
{'height': 100, 'width': 100}
>>> get_size_dict({'shortest_edge': 100})
{'shortest_edge': 100}
>>> get_size_dict((200, 300), height_width_order=True)
{'height': 200, 'width': 300}
create_clip_transforms
- imgutils.preprocess.transformers.create_clip_transforms(do_resize: bool = True, size=<object object>, resample=3, do_center_crop=True, crop_size=<object object>, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)[source]
Creates a composition of image transforms typically used for CLIP models.
- Parameters:
do_resize (bool) – Whether to resize the image.
size (dict) – Target size for resizing. Can be {“shortest_edge”: int} or {“height”: int, “width”: int}.
resample (int) – PIL resampling filter to use for resizing.
do_center_crop (bool) – Whether to center crop the image.
crop_size (dict) – Size for center cropping in {“height”: int, “width”: int} format.
do_rescale (bool) – Whether to rescale pixel values.
rescale_factor (float) – Factor to use for rescaling pixels.
do_normalize (bool) – Whether to normalize the image.
image_mean (list or tuple) – Mean values for normalization.
image_std (list or tuple) – Standard deviation values for normalization.
do_convert_rgb (bool) – Whether to convert image to RGB.
- Returns:
A composed transformation pipeline.
- Return type:
PillowCompose
create_transforms_from_clip_processor
- imgutils.preprocess.transformers.create_transforms_from_clip_processor(processor)[source]
Creates image transforms from a CLIP processor configuration.
- Parameters:
processor (Union[CLIPProcessor, CLIPImageProcessor]) – A CLIP processor or image processor instance from transformers library.
- Returns:
A composed transformation pipeline matching the processor’s configuration.
- Return type:
PillowCompose
- Raises:
NotProcessorTypeError – If the provided processor is not a CLIP processor.
create_convnext_transforms
- imgutils.preprocess.transformers.create_convnext_transforms(do_resize: bool = True, size=<object object>, crop_pct: float = <object object>, resample=2, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>)[source]
Create a composition of image transforms specifically tailored for ConvNext models.
This function creates a transformation pipeline that can include resizing, rescaling, and normalization operations. The transforms are applied in the following order:
Resize (optional)
Convert to tensor
Rescale (optional)
Normalize (optional)
- Parameters:
do_resize (bool) – Whether to resize the image
size (dict) – Target size dictionary with ‘shortest_edge’ key
crop_pct (float) – Center crop percentage, used to compute resize size
resample (int) – PIL resampling filter to use for resizing
do_rescale (bool) – Whether to rescale pixel values
rescale_factor (float) – Factor to use for rescaling pixels
do_normalize (bool) – Whether to normalize the image
image_mean (tuple or list) – Mean values for normalization
image_std (tuple or list) – Standard deviation values for normalization
- Returns:
A composed transformation pipeline
- Return type:
PillowCompose
create_transforms_from_convnext_processor
- imgutils.preprocess.transformers.create_transforms_from_convnext_processor(processor)[source]
Create image transforms from a ConvNext processor configuration.
This function takes a Hugging Face ConvNextImageProcessor and creates a corresponding transformation pipeline that matches its configuration settings.
- Parameters:
processor (ConvNextImageProcessor) – The ConvNext image processor to create transforms from
- Returns:
A composed transformation pipeline matching the processor’s configuration
- Return type:
PillowCompose
- Raises:
NotProcessorTypeError – If the provided processor is not a ConvNextImageProcessor
create_vit_transforms
- imgutils.preprocess.transformers.create_vit_transforms(do_resize: bool = True, size=<object object>, resample: int = 2, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>)[source]
Create a composition of image transforms typically used for ViT models.
This function creates a transform pipeline that can include resizing, tensor conversion, rescaling, and normalization operations. The transforms are applied in sequence to prepare images for ViT model input.
- Parameters:
do_resize (bool) – Whether to resize the input images
size (dict) – Target size for resizing, should be dict with ‘height’ and ‘width’ keys
resample (int) – PIL resampling filter to use for resizing
do_rescale (bool) – Whether to rescale pixel values
rescale_factor (float) – Factor to use for rescaling pixel values
do_normalize (bool) – Whether to normalize the image
image_mean (tuple or list) – Mean values for normalization
image_std (tuple or list) – Standard deviation values for normalization
- Returns:
A composition of image transforms
- Return type:
PillowCompose
create_transforms_from_vit_processor
- imgutils.preprocess.transformers.create_transforms_from_vit_processor(processor)[source]
Create image transforms from a Hugging Face ViT processor configuration.
This function takes a ViT image processor from the transformers library and creates a matching transform pipeline that replicates the processor’s preprocessing steps.
- Parameters:
processor (ViTImageProcessor) – A ViT image processor from Hugging Face transformers
- Returns:
A composition of image transforms matching the processor’s configuration
- Return type:
PillowCompose
- Raises:
NotProcessorTypeError – If the provided processor is not a ViTImageProcessor
create_siglip_transforms
- imgutils.preprocess.transformers.create_siglip_transforms(do_resize: bool = True, size=<object object>, resample: int = 3, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)[source]
Creates a composition of image transformations for SigLIP model input processing.
This function builds a pipeline of image transformations that can include:
RGB conversion
Image resizing
Tensor conversion
Image rescaling
Normalization
- Parameters:
do_resize (bool) – Whether to resize the image
size (dict) – Target size dictionary with ‘height’ and ‘width’ keys
resample (int) – PIL image resampling filter to use for resizing
do_rescale (bool) – Whether to rescale pixel values
rescale_factor (float) – Factor to use for pixel value rescaling
do_normalize (bool) – Whether to normalize the image
image_mean (tuple or list) – Mean values for normalization
image_std (tuple or list) – Standard deviation values for normalization
do_convert_rgb (bool) – Whether to convert image to RGB
- Returns:
A composed transformation pipeline
- Return type:
PillowCompose
create_transforms_from_siglip_processor
- imgutils.preprocess.transformers.create_transforms_from_siglip_processor(processor)[source]
Creates image transformations from a SigLIP processor configuration.
This function extracts transformation parameters from a HuggingFace SigLIP image processor and creates a corresponding transformation pipeline.
- Parameters:
processor (SiglipImageProcessor) – A HuggingFace SigLIP image processor instance
- Returns:
A composed transformation pipeline
- Return type:
PillowCompose
- Raises:
NotProcessorTypeError – If the processor is not a SiglipImageProcessor
create_bit_transforms
- imgutils.preprocess.transformers.create_bit_transforms(do_resize: bool = True, size=<object object>, resample=3, do_center_crop: bool = True, crop_size=<object object>, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)[source]
Create an image transformation pipeline for BiT models.
This function creates a composition of image transformations including RGB conversion, resizing, center cropping, tensor conversion, rescaling and normalization.
- Parameters:
do_resize (bool) – Whether to resize the image.
size (dict) – Target size for resizing. Can be {“shortest_edge”: int} or {“height”: int, “width”: int}.
resample (int) – PIL interpolation method for resizing.
do_center_crop (bool) – Whether to perform center cropping.
crop_size (dict) – Size for center cropping, in format {“height”: int, “width”: int}.
do_rescale (bool) – Whether to rescale pixel values.
rescale_factor (float) – Factor to rescale pixel values.
do_normalize (bool) – Whether to normalize the image.
image_mean (list or tuple) – Mean values for normalization.
image_std (list or tuple) – Standard deviation values for normalization.
do_convert_rgb (bool) – Whether to convert image to RGB.
- Returns:
A composition of image transformations.
- Return type:
PillowCompose
- Raises:
ValueError – If size configuration is invalid.
create_transforms_from_bit_processor
- imgutils.preprocess.transformers.create_transforms_from_bit_processor(processor)[source]
Create image transformations from a BiT image processor.
This function creates a transformation pipeline based on the configuration of a Hugging Face BitImageProcessor.
- Parameters:
processor (BitImageProcessor) – The BiT image processor to create transforms from.
- Returns:
A composition of image transformations.
- Return type:
PillowCompose
- Raises:
NotProcessorTypeError – If the processor is not a BitImageProcessor.
create_blip_transforms
- imgutils.preprocess.transformers.create_blip_transforms(do_resize: bool = True, size=<object object>, resample=3, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)[source]
Create a transformation pipeline for BLIP image processing.
This function builds a sequence of image transformations commonly used in BLIP models, including RGB conversion, resizing, tensor conversion, rescaling, and normalization.
- Parameters:
do_resize (bool) – Whether to resize the image.
size (dict) – Target size for resizing, expects dict with ‘height’ and ‘width’ keys. Defaults to {‘height’: 384, ‘width’: 384}.
resample (int) – Resampling filter for resize operation. Defaults to PIL.Image.BICUBIC.
do_rescale (bool) – Whether to rescale pixel values.
rescale_factor (float) – Factor to rescale pixel values. Defaults to 1/255.
do_normalize (bool) – Whether to normalize the image.
image_mean (tuple or list) – Mean values for normalization. Defaults to OPENAI_CLIP_MEAN.
image_std (tuple or list) – Standard deviation values for normalization. Defaults to OPENAI_CLIP_STD.
do_convert_rgb (bool) – Whether to convert image to RGB.
- Returns:
A composed transformation pipeline.
- Return type:
PillowCompose
create_transforms_from_blip_processor
- imgutils.preprocess.transformers.create_transforms_from_blip_processor(processor)[source]
Create image transformations from a HuggingFace BLIP processor.
This function extracts configuration from a HuggingFace BLIP processor and creates a corresponding transformation pipeline using create_blip_transforms.
- Parameters:
processor (transformers.BlipImageProcessor) – A HuggingFace BLIP image processor instance.
- Returns:
A composed transformation pipeline configured according to the processor’s settings.
- Return type:
PillowCompose
- Raises:
NotProcessorTypeError – If the provided processor is not a BlipImageProcessor.
create_mobilenetv2_transforms
- imgutils.preprocess.transformers.create_mobilenetv2_transforms(do_resize: bool = True, size: Dict[str, int] | None = <object object>, resample=2, do_center_crop: bool = True, crop_size: Dict[str, int] = <object object>, do_rescale: bool = True, rescale_factor: int | float = 0.00392156862745098, do_normalize: bool = True, image_mean: float | List[float] | None = <object object>, image_std: float | List[float] | None = <object object>)[source]
Creates a composition of transforms that replicates the behavior of MobileNetV2ImageProcessor.
This function builds a pipeline of image transformations typically used for MobileNetV2 models, including resizing, center cropping, tensor conversion, rescaling, and normalization.
- Parameters:
do_resize (bool) – Whether to resize the image.
size (Optional[Dict[str, int]]) – Size dictionary specifying resize parameters. Can include keys like ‘shortest_edge’, ‘height’, ‘width’, etc.
resample (PIL.Image.Resampling) – Resampling filter to use for resizing operations.
do_center_crop (bool) – Whether to apply center cropping to the image.
crop_size (Dict[str, int]) – Dictionary specifying the height and width for center cropping.
do_rescale (bool) – Whether to rescale pixel values after tensor conversion.
rescale_factor (Union[int, float]) – Factor by which to rescale the image pixel values.
do_normalize (bool) – Whether to normalize the image with mean and std.
image_mean (Optional[Union[float, List[float]]]) – Mean values for normalization, per channel.
image_std (Optional[Union[float, List[float]]]) – Standard deviation values for normalization, per channel.
- Returns:
A composition of transforms matching MobileNetV2ImageProcessor behavior.
- Return type:
PillowCompose
create_transforms_from_mobilenetv2_processor
- imgutils.preprocess.transformers.create_transforms_from_mobilenetv2_processor(processor)[source]
Creates transform composition from a MobileNetV2ImageProcessor instance.
This function extracts configuration from a transformers MobileNetV2ImageProcessor and creates an equivalent transform pipeline using the create_mobilenetv2_transforms function.
- Parameters:
processor (transformers.MobileNetV2ImageProcessor) – A MobileNetV2ImageProcessor instance from the transformers library.
- Returns:
A composition of transforms matching the processor’s configuration.
- Return type:
PillowCompose
- Raises:
NotProcessorTypeError – If the provided processor is not a MobileNetV2ImageProcessor.