Generative AI can produce all types of content, including text, art, images, and even speech.
AI startup ElevenLabs has supported text-to-speech generation and voice cloning since its beta launch in January and has accumulated over one million registered users.
On Tuesday, ElevenLabs announced the closing of a $19 million Series A round, as well as some major updates to the platform, including ones that address its biggest controversy.
Since its launch, ElevenLabs' voice-generating technology has had both positive and negative implications.
Some of the positive uses, as delineated by ElevenLabs, include "independent authors creating audiobooks, developers voicing characters in video games, supporting the visually impaired to access online written content, and powering the world's first AI radio channel."
Although these use cases are positive and advance the business processes of many different industries, there have been equally detrimental applications.
The voice-cloning tool, which takes snippets of a person's voice to generate new audio, has been used for nefarious means, making public figures seem like they are saying horrible, discriminatory statements.
Weeks after releasing the beta, ElevenLabs took to Twitter to address the "voice cloning misuse cases." The company suggested potential ways to combat the issue, such as additional account verification, verifying copyright to the voice, moving voice cloning to a paid tier, and even manually verifying each request.
Today, it released to the public what appears to be the company's solution to the issue: an AI Speech Classifier. The tool determines whether uploaded audio was generated by ElevenLabs' technology.
"The release of the AI Speech Classifier is the latest step in the company's push for transparency, and it is a cornerstone of their commitment to creating a safe generative media landscape," said ElevenLabs in the release.
According to a previous post announcing the tool, it maintains >99% accuracy in identifying audio that has not been modified.
However, if the audio has undergone codec or reverb transformations, accuracy drops to above 90%, and the more the content has been processed, the further accuracy falls, according to the release.
This tool won't prevent misuse and may simply help clear up the confusion after the initial harm is done. Its effectiveness in solving the issue is questionable, but it's a small step.
This isn't the first time AI-generation technology has been misused to target public figures. For example, an AI music generator was able to produce a Drake and The Weeknd collaboration that sounded real, although neither artist was actually on the track.
AI art and image generators have also been used to create fake but realistic images of public figures. Some of these images have been deployed as political propaganda, while others were made purely for entertainment, such as the meme of Pope Francis in a puffer coat.
In addition to the AI Speech Classifier, ElevenLabs also announced the arrival of "Projects" to its suite of products.
"Projects" is a workflow for editing and creating long-form spoken content available for early access now. It is meant to serve as a one-stop shop for audio-editing needs and provide a "Google Docs level of simplicity" to audio creation, according to the release.
The addition of the "Projects" feature is similar to moves we have seen from other creativity platforms, such as Vimeo, TikTok, and Adobe Express. All of these platforms aim to implement AI in ways that streamline user workflows and make content creation easier.