Introducing Stable Audio: The Next Step in Text-to-Audio Generation

Stability AI has made its name in generative AI with image and code generation, and it has now set its sights on a new frontier: text-to-audio generation. Today, Stability AI announced the initial public release of Stable Audio, a technology that generates short audio clips from simple text prompts.

Driving Force Behind Stable Audio: An Evolutionary Leap

Stability AI is best known for its groundbreaking image generation work with Stable Diffusion. After launching the SDXL base model for Stable Diffusion, which improves image composition, the company expanded into code generation with the release of StableCode. With Stable Audio, Stability AI now moves into music and audio generation.

The Brainchild of Harmonai: Blending Ideas from Image and Audio Generation

Stable Audio does not grow out of Jukedeck, an earlier venture in computer-generated music. Instead, the technology behind Stable Audio comes from Harmonai, a research studio founded by Zach Evans. Harmonai is an open community effort for generative audio research that applies core AI techniques from image generation to audio.
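
Diffusion, the technique behind Stable Diffusion, is one example of an image-generation method that carries over to audio, because its math only needs an array of numbers, whether pixels or audio samples. The sketch below shows the standard forward (noising) step of a denoising diffusion model applied to a toy 1-D waveform instead of an image. The function name `noise_waveform` and the parameter `alpha_bar` are illustrative assumptions, not Harmonai's or Stability AI's code, and noising raw samples directly is a simplification that makes no claim about how Stable Audio represents audio internally.

```python
import math
import random

def noise_waveform(x0, alpha_bar, rng):
    """Illustrative forward diffusion step q(x_t | x_0) on a 1-D signal:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1).
    alpha_bar near 1 keeps the signal; alpha_bar near 0 leaves mostly noise."""
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # one Gaussian noise value per sample
    xt = [a * x + s * e for x, e in zip(x0, eps)]
    return xt, eps

# A toy "audio clip": 100 samples of a 440 Hz sine wave at a 44.1 kHz sample rate.
sample_rate = 44_100
clip = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(100)]

rng = random.Random(0)
noisy, eps = noise_waveform(clip, alpha_bar=0.5, rng=rng)
```

A diffusion model is trained to predict `eps` from `noisy`; generation then runs the process in reverse, starting from pure noise. The same update works unchanged whether the array holds pixels or audio samples.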

Generating basic audio tracks with technology is not new, but Stable Audio takes a different approach. Earlier systems mostly relied on symbolic generation, which works with MIDI files. A MIDI file describes musical events, such as the individual hits of a drum roll, rather than the sound itself, which limits what symbolic models can generate. Stable Audio instead uses generative AI to produce audio directly, creating new music that goes beyond the repetitive note patterns associated with MIDI and symbolic generation.
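
To make the symbolic-versus-audio distinction concrete, the toy sketch below describes a one-second snare drum roll as a handful of MIDI-style note events, then renders it to raw samples with a deliberately crude synthesizer. The `NoteEvent` class and `render` function are hypothetical illustrations (not the MIDI file format and not part of Stable Audio); note 38 is the General MIDI acoustic snare. A symbolic model only has to produce the short event list, while a raw-audio model like Stable Audio must produce every sample of the output.

```python
from dataclasses import dataclass
import math

@dataclass
class NoteEvent:
    """A MIDI-style symbolic event: which note, when, how long, how hard."""
    pitch: int         # MIDI note number (38 is the General MIDI acoustic snare)
    start_s: float     # onset time in seconds
    duration_s: float  # length of the hit in seconds
    velocity: int      # loudness, 0-127

# A one-second drum roll: 16 evenly spaced snare hits, described by 16 events.
drum_roll = [NoteEvent(38, i / 16, 1 / 32, 100) for i in range(16)]

def render(events, sample_rate=44_100, length_s=1.0):
    """Crude synthesizer: each hit becomes a short, decaying 200 Hz tone burst."""
    n = int(sample_rate * length_s)
    out = [0.0] * n
    for ev in events:
        start = int(ev.start_s * sample_rate)
        stop = min(n, start + int(ev.duration_s * sample_rate))
        amp = ev.velocity / 127
        for i in range(start, stop):
            t = (i - start) / sample_rate
            out[i] += amp * math.exp(-60 * t) * math.sin(2 * math.pi * 200 * t)
    return out

samples = render(drum_roll)
print(len(drum_roll), "symbolic events ->", len(samples), "raw samples")
# prints: 16 symbolic events -> 44100 raw samples
```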

Redefining Audio Generation: Quality and Metadata

What sets Stable Audio apart is that it works directly with raw audio samples, which yields higher-quality output. The model was trained on more than 800,000 licensed music pieces from the audio library AudioSparx. Besides the audio itself, the dataset comes with complete metadata, which matters because good descriptive metadata is hard to obtain and is a significant challenge for text-based models: those descriptions are what let the model connect a text prompt to the sound it should produce.
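
One way metadata can feed a text-conditioned model is by turning each track's fields into a caption that is paired with the audio during training. The sketch below is hypothetical: the field names, the "key: value" caption format, and the random dropping and shuffling of fields are all assumptions for illustration, not Stability AI's documented pipeline.

```python
import random

def metadata_to_prompt(meta, rng, keep_prob=0.8):
    """Hypothetical: build a training caption from a track's metadata.
    Each non-empty field is kept with probability keep_prob and the kept
    fields are shuffled, so the model sees both sparse and detailed captions."""
    parts = [f"{key}: {value}" for key, value in meta.items()
             if value and rng.random() < keep_prob]
    rng.shuffle(parts)
    return ", ".join(parts)

# Hypothetical metadata for one track in the training set.
track = {
    "genre": "Ambient",
    "mood": "Calm",
    "instruments": "Piano, Strings",
    "bpm": 70,
}
print(metadata_to_prompt(track, random.Random(42)))
```

Randomly dropping fields is one possible design choice: it would let a user's short prompt ("Ambient, Calm") and a detailed one both resemble captions seen in training.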

Unleashing Creativity: Going Beyond Replication

With image generation models, it’s common for users to create images in the style of a particular artist. However, Stable Audio takes a different approach. Users cannot simply ask the AI model to generate music that sounds like a classic Beatles tune or any other specific artist. Rather, the focus is on empowering musicians and creators to explore their unique creativity, avoiding replication and allowing for truly original compositions.

Stable Audio is a diffusion model with 1.2 billion parameters, similar to the original release of Stable Diffusion for image generation. The text model that encodes prompts was designed and trained by Stability AI using Contrastive Language Audio Pretraining (CLAP), and its output conditions the diffusion model on what the user asks for. To help users get the audio they want, Stability AI is releasing a prompt guide.
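
CLAP trains a text encoder and an audio encoder together so that matching text and audio land close to each other in a shared embedding space. Below is a minimal, pure-Python sketch of the symmetric contrastive (InfoNCE) loss used in CLIP-style training, which CLAP adapts from image-text pairs to audio-text pairs. The embeddings are hand-written toy vectors, the function names are illustrative, and the temperature of 0.07 is an assumed value borrowed from common CLIP setups; this is not Stability AI's training code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def clap_loss(text_emb, audio_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.
    text_emb[i] and audio_emb[i] describe the same clip; every other
    pairing in the batch is treated as a negative example."""
    n = len(text_emb)
    logits = [[cosine(t, a) / temperature for a in audio_emb] for t in text_emb]
    # Text -> audio: row i should assign the highest score to audio i.
    t2a = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Audio -> text: column j should assign the highest score to text j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    a2t = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (t2a + a2t) / 2

# Toy 3-D embeddings for a batch of three (prompt, clip) pairs.
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled = [matched[1], matched[2], matched[0]]
print(clap_loss(texts, matched) < clap_loss(texts, shuffled))  # True
```

Correctly paired embeddings give a lower loss than mismatched ones, and minimizing this loss is what pulls a prompt such as "calm ambient piano" toward the audio it describes.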

Accessible for All: Free and Pro Versions

Stable Audio is designed to be accessible to everyone. The free version allows 20 generations per month of tracks up to 20 seconds long, while the Pro version, at $12/month, allows 500 generations per month of tracks up to 90 seconds long. The free tier lets users explore and experiment with Stable Audio before deciding whether they need the Pro version's higher limits.
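
The monthly quotas translate into a simple upper bound on how much audio each tier can produce. The sketch below uses the figures quoted above and assumes every generation uses the full maximum track length; the `Plan` class is an illustrative helper, and actual usage rules may differ.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    price_usd_per_month: float
    generations_per_month: int
    max_track_seconds: int

    def max_audio_seconds(self) -> int:
        # Upper bound: assumes every generation uses the full maximum track length.
        return self.generations_per_month * self.max_track_seconds

plans = [Plan("Free", 0.0, 20, 20), Plan("Pro", 12.0, 500, 90)]
for p in plans:
    secs = p.max_audio_seconds()
    print(f"{p.name}: up to {secs} s ({secs / 3600:.2f} h) of audio per month")
# Free: up to 400 s (0.11 h) of audio per month
# Pro: up to 45000 s (12.50 h) of audio per month
```

Under this assumption the free tier tops out at under 7 minutes of audio per month, while Pro allows up to 12.5 hours, or about $0.024 per generation.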

Stability AI's Stable Audio marks a significant advance in text-to-audio generation. By using a diffusion model trained on raw audio samples, Stability AI offers a new way to create music and audio. With its free tier and its focus on original composition rather than imitation of specific artists, Stable Audio gives musicians and creators a new tool for exploring what generative AI can do. Step into the future of audio generation with Stable Audio.
