Launched at NVIDIA GTC: Photoroom Open-Sources PRX, a 1024px Text-to-Image Model Trained on NVIDIA Hopper GPUs

Pixel-Space Diffusion, $1,500 Megapixel Runs, Optimized Inference and Multilingual Prompting

  • 1024px text-to-image training completed in around 15 days on 32 NVIDIA Hopper GPUs, with the full training process released publicly
  • 1-megapixel capability demonstrated in 24 hours at roughly $1,500 in compute, lowering the experimentation barrier for high-resolution diffusion
  • Direct pixel-space experiments challenge heavy reliance on latent compression in modern diffusion systems
  • Inference latency and cost reduced through NVIDIA TensorRT and NVIDIA Dynamo-Triton optimisation

Photoroom, the AI-powered photo editing platform processing more than seven billion images annually, has announced the open release of PRX, its internally developed text-to-image model trained from scratch on NVIDIA Hopper GPUs. Unlike most model launches that release only final weights, Photoroom is publishing the full training process, including architecture decisions, acceleration methods, performance trade-offs and post-training techniques, making high-resolution diffusion training more transparent and reproducible for engineering teams working at scale. 

“This is about lowering the barrier to building and understanding high-quality text-to-image models,” said Matt Rouif, CEO and co-founder of Photoroom. “By open-sourcing PRX and publishing the full training process, we’re giving engineers a practical reference they can learn from and build on, from architecture choices and training efficiency through to inference cost on NVIDIA AI infrastructure and software. Too often, teams get the final weights but not the decisions that shaped them. We’re making those decisions visible so PRX can serve as both a strong open model and a practical playbook for training and deploying high-resolution text-to-image systems.”

PRX was trained from scratch rather than fine-tuned from an existing foundation model. Photoroom, a member of the NVIDIA Inception program for startups, trained the current 1.3B-parameter checkpoint for 1.7 million steps in around 15 days on 32 NVIDIA Hopper GPUs, with documentation detailing the architecture, optimisation and scaling decisions behind the release.

Alongside full-scale training, Photoroom ran 24-hour experiments reaching 1-megapixel output at approximately $1,500 in compute cost, demonstrating how high-resolution diffusion experiments can be structured to lower infrastructure barriers for research teams.

Because most modern diffusion systems rely on variational auto-encoders to compress images before training, PRX includes experiments that reduce reliance on this compression stage by testing more direct pixel-level prediction approaches, enabling analysis of how stability, visual fidelity and compute efficiency shift when compression is minimised.
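To make the distinction concrete, the sketch below illustrates where the two approaches diverge. It is a hypothetical, numpy-only stand-in (not Photoroom's released code): the forward noising process is the standard DDPM-style corruption, and the comment marks the point where a latent-space system would insert a VAE encoding step that a pixel-space system skips.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x0, t, num_steps=1000):
    """DDPM-style forward process: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps

# Pixel-space: corrupt the raw image directly (a tiny stand-in array here;
# PRX operates on full 1024px images). A latent-space system would instead
# replace x0 with a VAE encoding first, shrinking each spatial side ~8x.
x0 = rng.standard_normal((3, 8, 8))
xt, eps = add_noise(x0, t=500)

# Training target in both cases: a network predicts eps from (x_t, t),
# minimising mean((eps_pred - eps)**2). In pixel space the network must
# handle the full-resolution tensor, trading compute for fidelity analysis.
```

The trade-off the release examines follows directly from this: removing the compression stage changes the input size the denoiser sees, which is where the stability and compute-efficiency shifts come from.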

One early example of PRX’s accessibility comes from Steve Anderson, founder of Lighthouse Software, who used the open-source model to build an interactive visual explainer showing how diffusion models generate images step by step, including denoising progression, prompt blending and guidance scale adjustments.

“Being able to run PRX locally on my MacBook and generate all of the examples without relying on cloud GPUs made it possible to experiment directly and demonstrate how the model navigates from noise to image in a transparent, hands-on way,” said Anderson. “Access to an open model like PRX allowed me to break down how these systems behave under the hood without requiring specialised infrastructure.”
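The "guidance scale" adjustment the explainer visualises typically corresponds to classifier-free guidance. The sketch below is a hypothetical illustration, not PRX code: the denoiser is stood in by a trivial function, whereas a real model is a large neural network returning noise predictions for conditional and unconditional prompts.

```python
import numpy as np

def predict_noise(x, prompt_embedding):
    # Stand-in denoiser for illustration only; real diffusion models
    # predict noise with a transformer or U-Net conditioned on the prompt.
    return x * 0.1 + prompt_embedding

def guided_noise(x, cond_emb, uncond_emb, guidance_scale=7.5):
    """Classifier-free guidance: eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    eps_cond = predict_noise(x, cond_emb)
    eps_uncond = predict_noise(x, uncond_emb)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

x = np.ones(4)
eps = guided_noise(x, cond_emb=np.full(4, 0.5), uncond_emb=np.zeros(4))
```

At a scale of 1.0 the formula reduces to the plain conditional prediction; raising the scale pushes each denoising step harder toward the prompt at the cost of sample diversity, which is exactly the behaviour the explainer lets users observe step by step.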

Inference performance was also evaluated on NVIDIA AI infrastructure and optimised using NVIDIA TensorRT and NVIDIA Dynamo-Triton, with latency and throughput improvements directly reducing per-image generation cost at scale.

The model supports multilingual prompting, embedding language capability at the text encoding stage so image generation is not limited to English-only workflows. Beyond releasing weights under an Apache 2.0 license, Photoroom is publishing architecture comparisons across diffusion transformer variants, training acceleration techniques, hyperparameter experiments and post-training methods as part of an ongoing research series positioning PRX as a continuing open research track rather than a single checkpoint release.

About Photoroom

Founded in 2019, Photoroom has become one of the most widely used AI-powered photo editing and design platforms, specialising in e-commerce imagery. With over 300 million downloads across 180+ countries, Photoroom ranks among the most-used generative AI products globally.

Available across mobile, web and API, Photoroom supports SMBs, enterprise teams and prosumers by enabling fast, accurate and consistent visual production, including background removal, batch editing and generative AI tools such as AI Backgrounds, AI Images and AI Shadows.

Processing over seven billion images per year, Photoroom provides a scalable solution for creating product imagery, helping businesses improve visibility, operate more efficiently and convert demand more effectively.
