imgutils.preprocess.transformers
- Overview:
- Convert transformers image processors to PillowCompose objects. 
Supported Processors:
| Name | Supported | Repos | Function |
|---|---|---|---|
| ViTImageProcessor | ✅ | 5906 (33.24%) | create_transforms_from_vit_processor |
| DonutImageProcessor | ❌ | 1901 (10.70%) | N/A |
| DetrImageProcessor | ❌ | 1575 (8.86%) | N/A |
| CLIPImageProcessor | ✅ | 1374 (7.73%) | create_transforms_from_clip_processor |
| VideoMAEImageProcessor | ❌ | 1093 (6.15%) | N/A |
| ConvNextImageProcessor | ✅ | 648 (3.65%) | create_transforms_from_convnext_processor |
| SegformerImageProcessor | ❌ | 533 (3.00%) | N/A |
| BeitImageProcessor | ❌ | 468 (2.63%) | N/A |
| SiglipImageProcessor | ✅ | 440 (2.48%) | create_transforms_from_siglip_processor |
| LayoutLMv3ImageProcessor | ❌ | 403 (2.27%) | N/A |
| LayoutLMv2ImageProcessor | ❌ | 332 (1.87%) | N/A |
| MllamaImageProcessor | ❌ | 332 (1.87%) | N/A |
| Qwen2VLImageProcessor | ❌ | 314 (1.77%) | N/A |
| BlipImageProcessor | ✅ | 276 (1.55%) | create_transforms_from_blip_processor |
| Idefics2ImageProcessor | ❌ | 226 (1.27%) | N/A |
| LlavaNextImageProcessor | ❌ | 215 (1.21%) | N/A |
| BitImageProcessor | ✅ | 210 (1.18%) | create_transforms_from_bit_processor |
| Pix2StructImageProcessor | ❌ | 113 (0.64%) | N/A |
| ConditionalDetrImageProcessor | ❌ | 95 (0.53%) | N/A |
| SamImageProcessor | ❌ | 92 (0.52%) | N/A |
| DeiTImageProcessor | ❌ | 91 (0.51%) | N/A |
| Mask2FormerImageProcessor | ❌ | 89 (0.50%) | N/A |
| VivitImageProcessor | ❌ | 88 (0.50%) | N/A |
| YolosImageProcessor | ❌ | 84 (0.47%) | N/A |
| ViltImageProcessor | ❌ | 73 (0.41%) | N/A |
| DetaImageProcessor | ❌ | 68 (0.38%) | N/A |
| PixtralImageProcessor | ❌ | 68 (0.38%) | N/A |
| MobileNetV2ImageProcessor | ✅ | 63 (0.35%) | create_transforms_from_mobilenetv2_processor |
| MobileViTImageProcessor | ❌ | 61 (0.34%) | N/A |
| DPTImageProcessor | ❌ | 51 (0.29%) | N/A |
| MaskFormerImageProcessor | ❌ | 49 (0.28%) | N/A |
| NougatImageProcessor | ❌ | 48 (0.27%) | N/A |
| IdeficsImageProcessor | ❌ | 47 (0.26%) | N/A |
| RTDetrImageProcessor | ❌ | 45 (0.25%) | N/A |
| EfficientNetImageProcessor | ❌ | 40 (0.23%) | N/A |
| DeformableDetrImageProcessor | ❌ | 36 (0.20%) | N/A |
| Idefics3ImageProcessor | ❌ | 32 (0.18%) | N/A |
| FuyuImageProcessor | ❌ | 22 (0.12%) | N/A |
| VideoLlavaImageProcessor | ❌ | 17 (0.10%) | N/A |
| PvtImageProcessor | ❌ | 16 (0.09%) | N/A |
| OneFormerImageProcessor | ❌ | 14 (0.08%) | N/A |
| MobileNetV1ImageProcessor | ❌ | 12 (0.07%) | N/A |
| Owlv2ImageProcessor | ❌ | 12 (0.07%) | N/A |
| ChineseCLIPImageProcessor | ❌ | 9 (0.05%) | N/A |
| EfficientFormerImageProcessor | ❌ | 8 (0.05%) | N/A |
| LlavaOnevisionImageProcessor | ❌ | 8 (0.05%) | N/A |
| Swin2SRImageProcessor | ❌ | 8 (0.05%) | N/A |
| ViTHybridImageProcessor | ❌ | 8 (0.05%) | N/A |
| OwlViTImageProcessor | ❌ | 7 (0.04%) | N/A |
| GroundingDinoImageProcessor | ❌ | 6 (0.03%) | N/A |
| PerceiverImageProcessor | ❌ | 6 (0.03%) | N/A |
| ChameleonImageProcessor | ❌ | 5 (0.03%) | N/A |
| LevitImageProcessor | ❌ | 5 (0.03%) | N/A |
| VitMatteImageProcessor | ❌ | 5 (0.03%) | N/A |
register_creators_for_transformers
- imgutils.preprocess.transformers.register_creators_for_transformers()
- Decorator that registers functions as transform creators for transformers processors.
- This decorator system allows for extensible support of different processor types. A decorated function is added to the list of available transform creators that are tried when creating transforms from a transformers processor.
- Returns:
- Decorator function that registers the decorated function 
- Return type:
- callable 
- Example:
    >>> @register_creators_for_transformers()
    ... def create_clip_transforms(processor):
    ...     if not hasattr(processor, 'feature_extractor'):
    ...         raise NotProcessorTypeError()
    ...     # Create and return transforms for CLIP
    ...     return transforms
 
NotProcessorTypeError
- class imgutils.preprocess.transformers.NotProcessorTypeError
- Exception raised when an unsupported processor type is encountered.
- This custom exception is used when the system cannot create transforms from a given transformers processor, either because the processor type is not recognized or because no registered transform creator supports it.
- Inherits:
- TypeError 
 
create_transforms_from_transformers
- imgutils.preprocess.transformers.create_transforms_from_transformers(processor)
- Create appropriate image transforms from a given transformers processor.
- This function iterates through the registered creator functions until one successfully creates transforms for the given processor type.
- Parameters:
- processor (transformers.ImageProcessor or similar) – A processor instance from the transformers library 
- Returns:
- A composition of image transforms suitable for the given processor 
- Return type:
- PillowCompose or similar transform object 
- Raises:
- NotProcessorTypeError – If no registered creator can handle the processor type 
- Example:
    >>> from transformers import AutoImageProcessor
    >>> from imgutils.preprocess.transformers import create_transforms_from_transformers
    >>>
    >>> processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
    >>> transforms = create_transforms_from_transformers(processor)
    >>> transforms
    PillowCompose(
        PillowConvertRGB(force_background='white')
        PillowResize(size=224, interpolation=bicubic, max_size=None, antialias=True)
        PillowCenterCrop(size=(224, 224))
        PillowToTensor()
        PillowNormalize(mean=[0.48145467 0.4578275 0.40821072], std=[0.26862955 0.2613026 0.2757771 ])
    )
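The registration and dispatch mechanism described in the two entries above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the registry list `_CREATORS`, the helper names `register_creator` and `create_transforms`, the stand-in processor classes, and the string return values are all hypothetical, and the local `NotProcessorTypeError` only mirrors the documented exception.

```python
from typing import Callable, List


class NotProcessorTypeError(TypeError):
    """Local stand-in mirroring the documented exception."""


# Hypothetical registry; the real module keeps its own internal list.
_CREATORS: List[Callable] = []


def register_creator():
    """Return a decorator that appends the decorated function to the registry."""
    def decorator(func: Callable) -> Callable:
        _CREATORS.append(func)
        return func
    return decorator


def create_transforms(processor):
    """Try each registered creator until one accepts the processor."""
    for creator in _CREATORS:
        try:
            return creator(processor)
        except NotProcessorTypeError:
            continue  # this creator does not handle the processor; try the next
    raise NotProcessorTypeError(f"Unknown processor type: {type(processor).__name__}")


# Stand-in processor classes used only for this illustration.
class FakeViTProcessor:
    pass


class FakeClipProcessor:
    pass


@register_creator()
def create_from_vit(processor):
    if not isinstance(processor, FakeViTProcessor):
        raise NotProcessorTypeError()
    return "vit-transforms"


@register_creator()
def create_from_clip(processor):
    if not isinstance(processor, FakeClipProcessor):
        raise NotProcessorTypeError()
    return "clip-transforms"
```

Each creator signals "not my processor type" by raising the exception, so adding support for a new processor only requires registering one more function.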
 
is_valid_size_dict
- imgutils.preprocess.transformers.is_valid_size_dict(size_dict)
- Validate whether a dictionary contains valid image size specifications.
- Parameters:
- size_dict (dict) – Dictionary to validate 
- Returns:
- True if the dictionary contains valid size specifications, False otherwise 
- Return type:
- bool 
- Examples:
    >>> is_valid_size_dict({"height": 100, "width": 200})
    True
    >>> is_valid_size_dict({"shortest_edge": 100})
    True
    >>> is_valid_size_dict({"invalid_key": 100})
    False
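One plausible way to implement such a check is to compare the dictionary's keys against a fixed set of accepted key combinations. The sketch below is an assumption about the approach, not the library's code: the name `sketch_is_valid_size_dict` is hypothetical, and `VALID_KEY_SETS` lists only the combinations that appear in the examples on this page, so the real function may accept more.

```python
# Key combinations assumed valid for this sketch; the real function may accept more.
VALID_KEY_SETS = (
    {"height", "width"},
    {"shortest_edge"},
    {"shortest_edge", "longest_edge"},
)


def sketch_is_valid_size_dict(size_dict) -> bool:
    """Return True when the dict's keys exactly match one accepted combination."""
    if not isinstance(size_dict, dict):
        return False
    return any(set(size_dict.keys()) == keys for keys in VALID_KEY_SETS)
```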
 
convert_to_size_dict
- imgutils.preprocess.transformers.convert_to_size_dict(size, max_size=None, default_to_square=True, height_width_order=True)
- Convert various size input formats to a standardized size dictionary.
- Parameters:
- size (int or tuple or list or None) – Size specification as integer, tuple/list, or None 
- max_size (int or None) – Optional maximum size constraint 
- default_to_square (bool) – If True, single integer creates square dimensions 
- height_width_order (bool) – If True, tuple values are (height, width), else (width, height) 
 
- Returns:
- Dictionary with standardized size format 
- Return type:
- dict 
- Raises:
- ValueError – If size specification is invalid or incompatible with other parameters 
- Examples:
    >>> convert_to_size_dict(100)
    {'height': 100, 'width': 100}
    >>> convert_to_size_dict((200, 300), height_width_order=True)
    {'height': 200, 'width': 300}
    >>> convert_to_size_dict(100, max_size=200, default_to_square=False)
    {'shortest_edge': 100, 'longest_edge': 200}
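The conversion rules implied by the parameters and examples above can be sketched as follows. This is an illustrative reimplementation under assumptions, not the library's source: the name `sketch_convert_to_size_dict` is hypothetical, and the exact conditions that raise `ValueError` (for instance, combining `max_size` with a square integer size) are guesses.

```python
def sketch_convert_to_size_dict(size, max_size=None, default_to_square=True,
                                height_width_order=True):
    """Normalize an int or 2-sequence size into a size dictionary."""
    if isinstance(size, int):
        if default_to_square:
            if max_size is not None:
                # Assumed restriction: a square size leaves no room for max_size.
                raise ValueError("max_size cannot be combined with a square int size")
            return {"height": size, "width": size}
        result = {"shortest_edge": size}
        if max_size is not None:
            result["longest_edge"] = max_size
        return result
    if isinstance(size, (tuple, list)) and len(size) == 2:
        first, second = size
        if height_width_order:
            return {"height": first, "width": second}
        return {"height": second, "width": first}
    raise ValueError(f"Unsupported size specification: {size!r}")
```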
 
get_size_dict
- imgutils.preprocess.transformers.get_size_dict(size=None, max_size=None, height_width_order=True, default_to_square=True, param_name='size') -> dict
- Convert and validate size parameters into a standardized dictionary format.
- This function serves as the main entry point for size processing, handling various input formats and ensuring they conform to valid size specifications.
- Parameters:
- size (int or tuple or list or dict or None) – Size specification as integer, tuple/list, dictionary, or None 
- max_size (int or None) – Optional maximum size constraint 
- height_width_order (bool) – If True, tuple values are (height, width), else (width, height) 
- default_to_square (bool) – If True, single integer creates square dimensions 
- param_name (str) – Parameter name for error messages 
 
- Returns:
- Dictionary with standardized size format 
- Return type:
- dict 
- Raises:
- ValueError – If size specification is invalid or incompatible with other parameters 
- Examples:
    >>> get_size_dict(100)
    {'height': 100, 'width': 100}
    >>> get_size_dict({'shortest_edge': 100})
    {'shortest_edge': 100}
    >>> get_size_dict((200, 300), height_width_order=True)
    {'height': 200, 'width': 300}
 
create_clip_transforms
- imgutils.preprocess.transformers.create_clip_transforms(do_resize: bool = True, size=<object object>, resample=3, do_center_crop=True, crop_size=<object object>, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)
- Create a composition of image transforms typically used for CLIP models.
- Parameters:
- do_resize (bool) – Whether to resize the image. 
- size (dict) – Target size for resizing. Can be {"shortest_edge": int} or {"height": int, "width": int}. 
- resample (int) – PIL resampling filter to use for resizing. 
- do_center_crop (bool) – Whether to center crop the image. 
- crop_size (dict) – Size for center cropping in {"height": int, "width": int} format. 
- do_rescale (bool) – Whether to rescale pixel values. 
- rescale_factor (float) – Factor to use for rescaling pixels. 
- do_normalize (bool) – Whether to normalize the image. 
- image_mean (list or tuple) – Mean values for normalization. 
- image_std (list or tuple) – Standard deviation values for normalization. 
- do_convert_rgb (bool) – Whether to convert image to RGB. 
 
- Returns:
- A composed transformation pipeline. 
- Return type:
- PillowCompose 
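When `size` is given as {"shortest_edge": int} and center cropping is enabled, the geometry of the two steps can be worked out by hand. The sketch below is a hedged illustration only: the helper names are hypothetical, and the rounding conventions (truncating the scaled long side, rounding the crop offsets) are assumptions that may differ from what the library does.

```python
def shortest_edge_resize(width: int, height: int, shortest_edge: int):
    """Scale so the shorter side equals shortest_edge, keeping the aspect ratio."""
    short, long = (width, height) if width <= height else (height, width)
    new_long = int(shortest_edge * long / short)  # truncation is an assumption
    return (shortest_edge, new_long) if width <= height else (new_long, shortest_edge)


def center_crop_box(width: int, height: int, crop_width: int, crop_height: int):
    """Return the (left, top, right, bottom) box of a centered crop."""
    left = round((width - crop_width) / 2)
    top = round((height - crop_height) / 2)
    return left, top, left + crop_width, top + crop_height


resized = shortest_edge_resize(640, 480, 224)  # landscape input -> (298, 224)
box = center_crop_box(*resized, 224, 224)      # -> (37, 0, 261, 224)
```

For a 640x480 input, the shorter side (480) is scaled to 224, the longer side becomes 298, and the crop then removes 37 pixels from each horizontal margin.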
 
create_transforms_from_clip_processor
- imgutils.preprocess.transformers.create_transforms_from_clip_processor(processor)
- Create image transforms from a CLIP processor configuration.
- Parameters:
- processor (Union[CLIPProcessor, CLIPImageProcessor]) – A CLIP processor or image processor instance from transformers library. 
- Returns:
- A composed transformation pipeline matching the processor’s configuration. 
- Return type:
- PillowCompose 
- Raises:
- NotProcessorTypeError – If the provided processor is not a CLIP processor. 
 
create_convnext_transforms
- imgutils.preprocess.transformers.create_convnext_transforms(do_resize: bool = True, size=<object object>, crop_pct: float = <object object>, resample=2, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>)
- Create a composition of image transforms specifically tailored for ConvNext models.
- This function creates a transformation pipeline that can include resizing, rescaling, and normalization operations. The transforms are applied in the following order:
    1. Resize (optional)
    2. Convert to tensor
    3. Rescale (optional)
    4. Normalize (optional)
- Parameters:
- do_resize (bool) – Whether to resize the image 
- size (dict) – Target size dictionary with ‘shortest_edge’ key 
- crop_pct (float) – Center crop percentage, used to compute resize size 
- resample (int) – PIL resampling filter to use for resizing 
- do_rescale (bool) – Whether to rescale pixel values 
- rescale_factor (float) – Factor to use for rescaling pixels 
- do_normalize (bool) – Whether to normalize the image 
- image_mean (tuple or list) – Mean values for normalization 
- image_std (tuple or list) – Standard deviation values for normalization 
 
- Returns:
- A composed transformation pipeline 
- Return type:
- PillowCompose 
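The role of `crop_pct` in computing the resize size can be illustrated with a small calculation. The sketch assumes the function follows the convention of Hugging Face's ConvNextImageProcessor, where a `shortest_edge` below 384 is first enlarged to `int(shortest_edge / crop_pct)` and then center-cropped back, while larger sizes are resized directly to a square. The step list above does not mention cropping, so treat the crop step, the 384 threshold, and the helper name `convnext_resize_plan` as assumptions.

```python
def convnext_resize_plan(shortest_edge: int, crop_pct: float):
    """Return (resize_target, crop_size) under the assumed ConvNext convention.

    resize_target is an int (shortest-edge resize) or a (height, width) tuple;
    crop_size is a (height, width) tuple, or None when no crop is applied.
    """
    if shortest_edge < 384:
        resize_shortest_edge = int(shortest_edge / crop_pct)
        return resize_shortest_edge, (shortest_edge, shortest_edge)
    # Large inputs are assumed to be resized straight to a square, without cropping.
    return (shortest_edge, shortest_edge), None


plan_small = convnext_resize_plan(224, 0.875)  # -> (256, (224, 224))
plan_large = convnext_resize_plan(384, 0.875)  # -> ((384, 384), None)
```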
 
create_transforms_from_convnext_processor
- imgutils.preprocess.transformers.create_transforms_from_convnext_processor(processor)
- Create image transforms from a ConvNext processor configuration.
- This function takes a Hugging Face ConvNextImageProcessor and creates a corresponding transformation pipeline that matches its configuration settings.
- Parameters:
- processor (ConvNextImageProcessor) – The ConvNext image processor to create transforms from 
- Returns:
- A composed transformation pipeline matching the processor’s configuration 
- Return type:
- PillowCompose 
- Raises:
- NotProcessorTypeError – If the provided processor is not a ConvNextImageProcessor 
 
create_vit_transforms
- imgutils.preprocess.transformers.create_vit_transforms(do_resize: bool = True, size=<object object>, resample: int = 2, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>)
- Create a composition of image transforms typically used for ViT models.
- This function creates a transform pipeline that can include resizing, tensor conversion, rescaling, and normalization operations. The transforms are applied in sequence to prepare images for ViT model input.
- Parameters:
- do_resize (bool) – Whether to resize the input images 
- size (dict) – Target size for resizing, should be dict with ‘height’ and ‘width’ keys 
- resample (int) – PIL resampling filter to use for resizing 
- do_rescale (bool) – Whether to rescale pixel values 
- rescale_factor (float) – Factor to use for rescaling pixel values 
- do_normalize (bool) – Whether to normalize the image 
- image_mean (tuple or list) – Mean values for normalization 
- image_std (tuple or list) – Standard deviation values for normalization 
 
- Returns:
- A composition of image transforms 
- Return type:
- PillowCompose 
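The rescaling and normalization steps reduce to simple per-channel arithmetic. The sketch below works through it for single 8-bit values; the default `rescale_factor` of 1/255 matches the signature above, while the mean and standard deviation of 0.5 are illustrative values chosen for this example, not defaults confirmed by this page. The helper name `rescale_and_normalize` is hypothetical.

```python
def rescale_and_normalize(pixel, rescale_factor=1 / 255, mean=0.5, std=0.5):
    """Map one 8-bit channel value to its normalized float value."""
    scaled = pixel * rescale_factor  # 0..255 -> 0.0..1.0
    return (scaled - mean) / std     # center on the mean, scale by the std


# With mean = std = 0.5, the 8-bit range maps onto roughly [-1.0, 1.0].
values = [rescale_and_normalize(p) for p in (0, 128, 255)]
```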
 
create_transforms_from_vit_processor
- imgutils.preprocess.transformers.create_transforms_from_vit_processor(processor)
- Create image transforms from a Hugging Face ViT processor configuration.
- This function takes a ViT image processor from the transformers library and creates a matching transform pipeline that replicates the processor’s preprocessing steps.
- Parameters:
- processor (ViTImageProcessor) – A ViT image processor from Hugging Face transformers 
- Returns:
- A composition of image transforms matching the processor’s configuration 
- Return type:
- PillowCompose 
- Raises:
- NotProcessorTypeError – If the provided processor is not a ViTImageProcessor 
 
create_siglip_transforms
- imgutils.preprocess.transformers.create_siglip_transforms(do_resize: bool = True, size=<object object>, resample: int = 3, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)
- Create a composition of image transformations for SigLIP model input processing.
- This function builds a pipeline of image transformations that can include:
    - RGB conversion
    - Image resizing
    - Tensor conversion
    - Image rescaling
    - Normalization
- Parameters:
- do_resize (bool) – Whether to resize the image 
- size (dict) – Target size dictionary with ‘height’ and ‘width’ keys 
- resample (int) – PIL image resampling filter to use for resizing 
- do_rescale (bool) – Whether to rescale pixel values 
- rescale_factor (float) – Factor to use for pixel value rescaling 
- do_normalize (bool) – Whether to normalize the image 
- image_mean (tuple or list) – Mean values for normalization 
- image_std (tuple or list) – Standard deviation values for normalization 
- do_convert_rgb (bool) – Whether to convert image to RGB 
 
- Returns:
- A composed transformation pipeline 
- Return type:
- PillowCompose 
 
create_transforms_from_siglip_processor
- imgutils.preprocess.transformers.create_transforms_from_siglip_processor(processor)
- Create image transformations from a SigLIP processor configuration.
- This function extracts transformation parameters from a Hugging Face SigLIP image processor and creates a corresponding transformation pipeline.
- Parameters:
- processor (SiglipImageProcessor) – A HuggingFace SigLIP image processor instance 
- Returns:
- A composed transformation pipeline 
- Return type:
- PillowCompose 
- Raises:
- NotProcessorTypeError – If the processor is not a SiglipImageProcessor 
 
create_bit_transforms
- imgutils.preprocess.transformers.create_bit_transforms(do_resize: bool = True, size=<object object>, resample=3, do_center_crop: bool = True, crop_size=<object object>, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)
- Create an image transformation pipeline for BiT models.
- This function creates a composition of image transformations including RGB conversion, resizing, center cropping, tensor conversion, rescaling, and normalization.
- Parameters:
- do_resize (bool) – Whether to resize the image. 
- size (dict) – Target size for resizing. Can be {"shortest_edge": int} or {"height": int, "width": int}. 
- resample (int) – PIL interpolation method for resizing. 
- do_center_crop (bool) – Whether to perform center cropping. 
- crop_size (dict) – Size for center cropping, in format {"height": int, "width": int}. 
- do_rescale (bool) – Whether to rescale pixel values. 
- rescale_factor (float) – Factor to rescale pixel values. 
- do_normalize (bool) – Whether to normalize the image. 
- image_mean (list or tuple) – Mean values for normalization. 
- image_std (list or tuple) – Standard deviation values for normalization. 
- do_convert_rgb (bool) – Whether to convert image to RGB. 
 
- Returns:
- A composition of image transformations. 
- Return type:
- PillowCompose 
- Raises:
- ValueError – If size configuration is invalid. 
 
create_transforms_from_bit_processor
- imgutils.preprocess.transformers.create_transforms_from_bit_processor(processor)
- Create image transformations from a BiT image processor.
- This function creates a transformation pipeline based on the configuration of a Hugging Face BitImageProcessor.
- Parameters:
- processor (BitImageProcessor) – The BiT image processor to create transforms from. 
- Returns:
- A composition of image transformations. 
- Return type:
- PillowCompose 
- Raises:
- NotProcessorTypeError – If the processor is not a BitImageProcessor. 
 
create_blip_transforms
- imgutils.preprocess.transformers.create_blip_transforms(do_resize: bool = True, size=<object object>, resample=3, do_rescale: bool = True, rescale_factor: float = 0.00392156862745098, do_normalize: bool = True, image_mean=<object object>, image_std=<object object>, do_convert_rgb: bool = True)
- Create a transformation pipeline for BLIP image processing.
- This function builds a sequence of image transformations commonly used in BLIP models, including RGB conversion, resizing, tensor conversion, rescaling, and normalization.
- Parameters:
- do_resize (bool) – Whether to resize the image. 
- size (dict) – Target size for resizing, expects dict with 'height' and 'width' keys. Defaults to {'height': 384, 'width': 384}. 
- resample (int) – Resampling filter for resize operation. Defaults to PIL.Image.BICUBIC. 
- do_rescale (bool) – Whether to rescale pixel values. 
- rescale_factor (float) – Factor to rescale pixel values. Defaults to 1/255. 
- do_normalize (bool) – Whether to normalize the image. 
- image_mean (tuple or list) – Mean values for normalization. Defaults to OPENAI_CLIP_MEAN. 
- image_std (tuple or list) – Standard deviation values for normalization. Defaults to OPENAI_CLIP_STD. 
- do_convert_rgb (bool) – Whether to convert image to RGB. 
 
- Returns:
- A composed transformation pipeline. 
- Return type:
- PillowCompose 
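The integer `resample` defaults in these signatures (3 here, 2 for some other creators) correspond to Pillow's resampling filter codes. The sketch below mirrors the values of `PIL.Image.Resampling` in a standalone enum so that it runs without Pillow installed; the local `Resampling` class and the helper `describe_resample` are illustrative stand-ins, not part of either library.

```python
from enum import IntEnum


class Resampling(IntEnum):
    """Integer codes mirroring Pillow's PIL.Image.Resampling members."""
    NEAREST = 0
    LANCZOS = 1
    BILINEAR = 2
    BICUBIC = 3
    BOX = 4
    HAMMING = 5


def describe_resample(code: int) -> str:
    """Translate an integer resample code into a lowercase filter name."""
    return Resampling(code).name.lower()


name_for_3 = describe_resample(3)  # -> "bicubic"
name_for_2 = describe_resample(2)  # -> "bilinear"
```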
 
create_transforms_from_blip_processor
- imgutils.preprocess.transformers.create_transforms_from_blip_processor(processor)
- Create image transformations from a Hugging Face BLIP processor.
- This function extracts configuration from a Hugging Face BLIP processor and creates a corresponding transformation pipeline using create_blip_transforms.
- Parameters:
- processor (transformers.BlipImageProcessor) – A HuggingFace BLIP image processor instance. 
- Returns:
- A composed transformation pipeline configured according to the processor’s settings. 
- Return type:
- PillowCompose 
- Raises:
- NotProcessorTypeError – If the provided processor is not a BlipImageProcessor. 
 
create_mobilenetv2_transforms
- imgutils.preprocess.transformers.create_mobilenetv2_transforms(do_resize: bool = True, size: Dict[str, int] | None = <object object>, resample=2, do_center_crop: bool = True, crop_size: Dict[str, int] = <object object>, do_rescale: bool = True, rescale_factor: int | float = 0.00392156862745098, do_normalize: bool = True, image_mean: float | List[float] | None = <object object>, image_std: float | List[float] | None = <object object>)
- Create a composition of transforms that replicates the behavior of MobileNetV2ImageProcessor.
- This function builds a pipeline of image transformations typically used for MobileNetV2 models, including resizing, center cropping, tensor conversion, rescaling, and normalization.
- Parameters:
- do_resize (bool) – Whether to resize the image. 
- size (Optional[Dict[str, int]]) – Size dictionary specifying resize parameters. Can include keys like ‘shortest_edge’, ‘height’, ‘width’, etc. 
- resample (PIL.Image.Resampling) – Resampling filter to use for resizing operations. 
- do_center_crop (bool) – Whether to apply center cropping to the image. 
- crop_size (Dict[str, int]) – Dictionary specifying the height and width for center cropping. 
- do_rescale (bool) – Whether to rescale pixel values after tensor conversion. 
- rescale_factor (Union[int, float]) – Factor by which to rescale the image pixel values. 
- do_normalize (bool) – Whether to normalize the image with mean and std. 
- image_mean (Optional[Union[float, List[float]]]) – Mean values for normalization, per channel. 
- image_std (Optional[Union[float, List[float]]]) – Standard deviation values for normalization, per channel. 
 
- Returns:
- A composition of transforms matching MobileNetV2ImageProcessor behavior. 
- Return type:
- PillowCompose 
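The `<object object>` defaults shown in this and the other creator signatures are how a sentinel object renders in generated documentation. A likely reason for using one (an assumption, since the page does not say) is to distinguish "argument not supplied" from an explicit `None`, which is itself an allowed value for parameters such as `size`. The sketch below shows the general idiom; `_DEFAULT`, `resolve_size`, and the fallback value are hypothetical.

```python
_DEFAULT = object()  # unique marker meaning "argument not supplied"


def resolve_size(size=_DEFAULT):
    """Fall back to a default only when the caller passed nothing at all."""
    if size is _DEFAULT:
        return {"shortest_edge": 256}  # illustrative fallback, not the library's value
    return size  # an explicit None is passed through unchanged


omitted = resolve_size()        # -> {"shortest_edge": 256}
explicit = resolve_size(None)   # -> None
```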
 
create_transforms_from_mobilenetv2_processor
- imgutils.preprocess.transformers.create_transforms_from_mobilenetv2_processor(processor)
- Create a transform composition from a MobileNetV2ImageProcessor instance.
- This function extracts configuration from a transformers MobileNetV2ImageProcessor and creates an equivalent transform pipeline using the create_mobilenetv2_transforms function.
- Parameters:
- processor (transformers.MobileNetV2ImageProcessor) – A MobileNetV2ImageProcessor instance from the transformers library. 
- Returns:
- A composition of transforms matching the processor’s configuration. 
- Return type:
- PillowCompose 
- Raises:
- NotProcessorTypeError – If the provided processor is not a MobileNetV2ImageProcessor.