Cloudinary Adds Generative AI to Its Programmable Media Image and Video APIs

Powerful New Features Extend Cloudinary’s AI-powered Platform, Allowing Creative and Technical Teams to Innovate Faster and Deliver Customized Images and Video with Greater Ease

SANTA CLARA, Calif.--(BUSINESS WIRE)--Cloudinary, the image and video platform that powers many of the world’s top brands, today announced the availability of several new generative AI, large language model (LLM) and GPT-based features within its Programmable Media image and video APIs, including Generative Fill, Generative Remove, Generative Replace, AI-powered Image Captioning, and a ChatGPT-backed natural language interface. The new Cloudinary features, available now, allow users to create customized, personalized assets in seconds and help technical teams scale quickly by intelligently automating workflows and eliminating repetitive and time-consuming image manipulation tasks. Learn more about these new features in today’s blog posts.

Advanced creative work is time-consuming, expensive and, for some brands, simply out of reach. More than 10,000 customers and 1.5 million users have long benefited from the power of AI via Cloudinary’s award-winning image and video APIs. Today’s new generative AI capabilities extend these benefits further, making it easier for users to create, edit and deliver dynamic visual experiences at scale. For example, instead of re-shooting an entire campaign, developers and digital marketers can remove unwanted objects from existing images through Cloudinary’s APIs. Likewise, AI-powered image captioning instantly produces captions for images, improving accessibility, asset searchability and SEO while reducing production time.

“Generative AI has completely transformed the way we work, and the most recent advancements are just scratching the surface of what’s possible,” said Nadav Soferman, co-founder and chief product officer, Cloudinary. “Our founding mission was to revolutionize the way in which brands manage and deliver images and video at scale, and building solutions that harness the most advanced technologies has been central to delivering on that promise.”

Soferman continued, “Since launching our flagship image management product, which utilized AI for face-detection-based cropping, we’ve led advancements across media management, leveraging the power of AI, machine and deep learning, and raising the bar for what’s possible in media creation and delivery. It’s always been about letting advanced technology streamline or eliminate tedious tasks so brands can focus their limited resources on creating high impact, highly visual sites, apps and campaigns that connect, engage, inspire and convert. We are very excited to make these powerful new generative AI capabilities a reality for the technical and non-technical teams committed to bringing their best visual stories to life – and we’re just getting started.”

New features bring ease and automation to visual media workflows

Generative Fill: Enables users to extend an image beyond its original borders, which is especially useful when converting an image from vertical to horizontal. The new AI-generated background blends seamlessly with the original image.
Generative Remove: Via natural language prompts, users can remove unwanted elements from images and automatically add a matching background. The new feature uses state-of-the-art technology such as open-set object detection models and powerful AI capabilities through Stable Diffusion.
Generative Replace: Allows users to easily detect, change and replace unwanted elements and colors all via natural-language prompts. This capability is especially useful for users looking to more easily create color-based variations of products or to improve web accessibility for those with color blindness.
AI-powered Image Captioning: Intelligently creates image captions for galleries, user-generated content and product descriptions at scale. Image-to-text features strengthen image SEO, improve accessibility, and automate image classification for better findability. It also helps e-commerce users save time by automatically creating smart product descriptions.
Conversational Transformations Builder: This intuitive feature provides a natural language interface through ChatGPT, allowing users to effortlessly communicate desired image transformations and optimizations. For example, a simple command such as “please blur this image and crop to a 1:1 aspect ratio” would deliver a complete and correct transformation.
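
As a rough illustration of how features like these might be requested through a URL-based image API, the sketch below assembles delivery URLs from chained transformation steps. It is not taken from the announcement: the helper `delivery_url`, the cloud name `my-cloud`, the asset `sample.jpg`, and the specific parameter strings (`b_gen_fill`, `e_gen_remove:prompt_...`, `e_gen_replace:from_...;to_...`, `e_blur`, `c_crop`) are assumptions modeled on Cloudinary's public URL conventions and should be checked against the current documentation before use.

```python
from urllib.parse import quote

BASE = "https://res.cloudinary.com"  # assumed delivery host


def delivery_url(cloud_name: str, public_id: str, *steps: str) -> str:
    """Build an image delivery URL from chained transformation steps.

    Each step is a comma-separated parameter string; steps are joined
    with "/" so they apply in order. Spaces in prompts are percent-encoded.
    """
    if not steps:
        return f"{BASE}/{cloud_name}/image/upload/{public_id}"
    chain = "/".join(quote(step, safe=",:;_.") for step in steps)
    return f"{BASE}/{cloud_name}/image/upload/{chain}/{public_id}"


# Hypothetical account and asset names, used only for illustration.
CLOUD, ASSET = "my-cloud", "sample.jpg"

# Generative Fill: pad a vertical image out to 16:9 with a generated background.
fill_url = delivery_url(CLOUD, ASSET, "ar_16:9,b_gen_fill,c_pad,w_1200")

# Generative Remove: describe the unwanted element in a natural-language prompt.
remove_url = delivery_url(CLOUD, ASSET, "e_gen_remove:prompt_the stick")

# Generative Replace: swap one described element for another.
replace_url = delivery_url(CLOUD, ASSET, "e_gen_replace:from_sweater;to_leather jacket")

# The "blur this image and crop to a 1:1 aspect ratio" request, written by hand.
blur_square_url = delivery_url(CLOUD, ASSET, "e_blur:300", "ar_1:1,c_crop,w_400")

if __name__ == "__main__":
    for url in (fill_url, remove_url, replace_url, blur_square_url):
        print(url)
```

Image captioning is not shown because, unlike the transformations above, it would typically run when an asset is uploaded or analyzed rather than as part of a delivery URL.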

AI at its core from the start

Cloudinary has a long history of delivering powerful AI capabilities to its customers via trusted industry-leading AI technologies such as OpenAI, Google Vision, and Amazon Rekognition, as well as its own domain expertise and content-aware machine learning models including those for background removal, smart image tagging, video cropping and domain-specific models for industries such as fashion and furniture. For more than a decade, Cloudinary’s image and video solutions have leveraged AI and ML, offering the most advanced image and video transformations including face detection, contextual cropping and auto-tagging, and now generative AI capabilities and ChatGPT integrations for intelligent image captioning. With Cloudinary Assets, digital asset management is effortless through UI-based auto-tagging, AI-powered visual search, and content moderation, enabling seamless workflows for user-generated content. What’s more, the Cloudinary Assets Studio feature harnesses generative AI power to make editing bulk assets simple and powerful.

About Cloudinary

Cloudinary is the image and video technology platform that enables the world’s most engaging brands to deliver transformative visual experiences at global scale. More than 1.5 million users and 10,000 customers, including Apartment Therapy, Bleacher Report, Bombas, Grubhub, Hinge, NBC, Mediavine, Minted, Paul Smith and Peloton, rely on Cloudinary to bring their campaigns, apps and sites to life. With the world’s most powerful image and video APIs backed by industry-leading artificial intelligence and patented technology, Cloudinary offers a single source of truth for brands to manage, transform, optimize, and deliver visual experiences at scale. As a result, the most engaging brands across all industries are seeing up to a 203% ROI using Cloudinary with benefits including faster time to market, higher user satisfaction and increased engagement and conversions. For more information, visit www.cloudinary.com.

Contacts

Juli Greenwood

Head of Global Communications and Customer Marketing

juli@cloudinary.com
