Generative artificial intelligence (AI) is a technology that uses computer algorithms to create new things like text, images, audio and video.
This technology can be used in different industries, and it’s still being studied to understand all its possibilities.
Some examples of what generative AI can create include:
- Generating written text such as news articles, product descriptions, or even whole books
- Creating images, such as photographs or digital artwork
- Generating audio, such as music or speech
- Producing videos, such as animation or even live-action footage
In fact, everything you just read was written by AI — but don’t worry, this article is now back in human hands.
Let’s take a look at some examples of what generative AI technology can already create, and where things are heading in 2023.
AI can make videos of almost anything
While AI image generators have become popular in recent years (more on them later), AI systems are increasingly being used to turn text prompts into videos.
Google says its text-to-image model Imagen, which it has since extended to video with Imagen Video, can create images with an "unprecedented degree of photorealism and a deep level of language understanding".
Videos created by Google’s Imagen AI
(Videos have been sped up to reduce loading time)
‘Coffee pouring into a cup’
‘Campfire at night in a snowy forest with starry sky in the background’
Jeff Dean from Google Research wrote earlier this month that one of the company’s research challenges was creating systems which “produce high resolution, high quality, temporally consistent videos with a high level of controllability”.
“This is a very challenging area because unlike images, where the challenge was to match the desired properties of the image with the generated pixels, with video there is the added dimension of time,” he said.
“Not only must all the pixels in each frame match what should be happening in the video at the moment, they must also be consistent with other frames.”
In September 2022, Facebook’s parent company Meta revealed Make-A-Video, which it said could create “whimsical, one-of-a-kind videos with just a few words or lines of text”.
“The system learns what the world looks like from paired text-image data and how the world moves from video footage with no associated text,” its creators said.
Videos created by Meta’s Make-A-Video AI
(Videos have been sped up to reduce loading time)
‘A young couple walking in a heavy rain’
Make-A-Video can also create new video using an existing video or image as its starting point.
Take this Rembrandt painting, for example …
… which is transformed into a moving image after the AI has worked its magic.
AI can hear a snippet of a voice (or music) and keep it going
A Google research project called AudioLM can take a brief audio prompt and generate its own continuation of that audio, whether speech or piano music.
Its creators say the software’s creations “preserve speaker identity, prosody, accent and recording conditions”, while also having coherent syntax and semantics.
Tap or click to play these Google AudioLM examples
Voice continuation example 1:
Voice continuation example 2:
AudioLM's creators said the program could also "learn to generate coherent piano music continuations" despite being trained only on piano audio, with no musical notation.
Piano continuation example 1:
Piano continuation example 2:
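AudioLM's "keep it going" behaviour is an example of autoregressive generation: repeatedly predicting what comes next from everything that came before, then appending the prediction and repeating. As a toy sketch of that idea (nothing like the real model, which predicts learned audio tokens with a neural network; the note names and functions here are purely illustrative), a tiny bigram model can "continue" a melody in the same spirit:

```python
# Toy illustration of autoregressive continuation, NOT AudioLM itself:
# learn which note tends to follow which, then extend a short prompt
# one predicted note at a time.
from collections import defaultdict

def train_bigrams(sequence):
    """Record, for each note, the notes observed to follow it."""
    following = defaultdict(list)
    for a, b in zip(sequence, sequence[1:]):
        following[a].append(b)
    return following

def continue_sequence(model, prompt, length):
    """Extend the prompt by repeatedly predicting the next note."""
    out = list(prompt)
    for _ in range(length):
        options = model.get(out[-1])
        if not options:
            break
        out.append(options[0])  # deterministic: take the first observed follower
    return out

melody = ["C", "E", "G", "E", "C", "E", "G", "C"]
model = train_bigrams(melody)
print(continue_sequence(model, ["C", "E"], 4))  # prints ['C', 'E', 'G', 'E', 'G', 'E']
```

The real system works on compressed audio tokens rather than note names, but the loop is the same shape: predict, append, repeat.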
Microsoft’s VALL-E research project can also synthesise speech while maintaining “the speaker’s emotion and acoustic environment”, its creators say.
Here are some examples of the AI using a short audio prompt to create speech for a different piece of text, while maintaining a certain emotion or environment.
Tap or click to play these Microsoft AI audio examples
A voice in an acoustic environment:
The AI uses a three-second prompt to create a voice, before reading the following statement:
“Everything is run by computer but you got to know how to think before you can do a computer.”
A voice with a particular emotion:
The AI uses a three-second prompt to create a voice, before reading the following statement with an angry tone:
“We have to reduce the number of plastic bags.”
AI can turn text prompts into music
Announced only this week, Google’s MusicLM system generates “high-fidelity music from text descriptions”, according to its creators.
The research project was trained on 280,000 hours of music, and its creators say the music it generates has "significant complexity".
Tap or click to play these Google MusicLM examples
Music generation example 1
Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched bongos with ringing tones. Vocals are relaxed with a laid-back feel, very expressive.
Music generation example 2
This is an r&b/hip-hop music piece. There is a male vocal rapping and a female vocal singing in a rap-like manner. The beat is comprised of a piano playing the chords of the tune with an electronic drum backing.
The atmosphere of the piece is playful and energetic. This piece could be used in the soundtrack of a high school drama movie/TV show. It could also be played at birthday parties or beach parties.
MusicLM’s creators say they don’t plan to release the software yet, as there remains a “risk of potential misappropriation of creative content”.
AI systems have controversially been used in recent years to create deepfakes of singers' voices and songs, whether the artists are living or dead.
AI can create ‘Infinite Nature’ from just one image
Late last year, Google research scientists Noah Snavely and Zhengqi Li introduced a project called Infinite Nature.
“We live in a world of great natural beauty — of majestic mountains, dramatic seascapes, and serene forests,” they wrote.
“Imagine seeing this beauty as a bird does, flying past richly detailed, three-dimensional landscapes.
“Can computers learn to synthesise this kind of visual experience? Such a capability would allow for new kinds of content for games and virtual reality experiences: for instance, relaxing within an immersive fly-through of an infinite nature scene.”
The pair said their work only used systems that were trained using still images, which they claimed was a breakthrough.
They have since worked on generating “complete, photorealistic, and consistent 3D worlds”.
Type a prompt, and OpenAI’s DALL·E 2 creates images
Text-to-image generator DALL·E 2, created by San Francisco company OpenAI, has received a lot of attention since launching publicly last year.
Its creations can sometimes be difficult to differentiate from those of a human illustrator.
Here are some images we created using DALL·E 2, along with their text prompts.
‘An elderly white male farmer holding a very large onion and smiling’
‘An 18th century oil painting of Rome’s Colosseum, with people sitting in the foreground’
‘A photorealistic image of a robot using a laptop computer, while sitting in a warm cafe’
‘A watercolour painting of a rooster standing and crowing, with a colourful background, all in pastel colours’
Text-to-image generators such as DALL·E 2 have caused controversy in recent months because they are often trained on images scraped from the internet, including copyrighted works that are usually used without the artists' permission.
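Under the hood, generators like DALL·E 2 are driven by nothing more than a plain-text prompt sent to a service. As a hypothetical sketch of that interaction (the function and field names below mirror OpenAI's image-generation API at the time of writing, but this is illustrative, not an official client, and no network request is made), a prompt can be packaged into a JSON request body like this:

```python
# Illustrative sketch only: build the JSON body a text-to-image request
# might carry. Field names (prompt, n, size) follow OpenAI's image API
# conventions, but treat this as an assumption, not official code.
import json

def build_image_request(prompt: str, n: int = 1, size: str = "1024x1024") -> str:
    """Package a text prompt as the JSON body of an image-generation request."""
    return json.dumps({"prompt": prompt, "n": n, "size": size})

body = build_image_request(
    "A watercolour painting of a rooster standing and crowing"
)
print(body)
```

Everything the model needs — subject, style, composition — travels in that one prompt string, which is why prompt wording matters so much to the result.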
OpenAI’s viral chatbot ChatGPT instantly writes essays, software code and more
ChatGPT is the AI which wrote most of the introduction to this article.
Its popularity has spiked in recent months, thanks to its ability to almost instantly create everything from essays to film scripts, software code, spreadsheet formulas, and terrifying short stories.
The software has passed medical exams and been banned by some schools and universities, but it’s already being used by workers in some industries to quickly create useful content.
It doesn’t always get things right, though.
ChatGPT's creator, OpenAI, has received a multibillion-dollar investment from Microsoft "to accelerate AI breakthroughs to ensure these benefits are broadly shared with the world", the companies said this week.
The partnership is reportedly developing a ChatGPT-powered version of Microsoft’s Bing search engine, while Google is also said to be looking to launch an AI search chatbot of its own.
AI researchers predict ‘great benefits’ and ‘scary moments’ as technology improves
Generative AI models took huge leaps in 2022, but some of their creators say we could run into some issues as things improve further in 2023 and beyond.
Google’s Jeff Dean said we would see “advances in the quality and speed of media generation itself” this year, as well as opportunities for users of AI to have more creative expression.
But he was wary that more powerful AI models may also “introduce a number of concerns”.
“They could potentially generate harmful content of various kinds, or generate fake imagery or audio content that is difficult to distinguish from reality,” he said.
“These are all issues we consider carefully when deciding when and how to deploy these models responsibly.”
OpenAI has spent time researching how language models such as ChatGPT might be misused for disinformation campaigns.
“For malicious actors, these language models bring the promise of automating the creation of convincing and misleading text for use in influence operations,” the company said in a recent report.
OpenAI CEO Sam Altman said there would be "scary moments" and "significant disruptions" as we get closer to what's known as artificial general intelligence (AGI): the point at which computer systems can understand or learn intellectual tasks the way humans can.
“But the upsides can be so amazing that it’s well worth overcoming the great challenges to get there,” he said on Twitter.
“In particular, there are going to be significant problems with the use of OpenAI tech over time; we will do our best but will not successfully anticipate every issue.”