How AI is enhancing entertainment experiences on voice assistants, explains Ashlesha Kadam

Ashlesha shares how companies can leverage AI, in particular LLMs, in voice technology to transform music-based experiences

By Ammar Tarique


Published: Wed 14 Jun 2023, 6:02 PM

From notifying you of your favourite artist’s new release to getting you up to speed with the podcasts you follow to playing lullabies for your kids at bedtime, voice assistants (VAs) like Siri, Alexa and Google Assistant help you with all of these tasks, and more.

Artificial Intelligence (AI) is at the core of this revolution, enabling these voice assistants to understand users’ commands and fulfil them in the most relevant, accurate and personalised way. In the domain of entertainment — especially music — recent advancements in AI, particularly Large Language Models (LLMs) like OpenAI’s GPT-3.5, are transforming the user experience of voice assistants.


Ashlesha Kadam leads the global technical product team for AI-based music experiences on voice assistants at a top music streaming service. She has worked extensively on developing one of the most widely used AI-based voice assistants and has filed two patents in the field of voice assistants and AI. She shares her perspective with us.

Current State of AI-driven Voice Assistants for Entertainment

Talking about how voice assistants use AI today for music experiences, Kadam shares, “Today, voice assistants already extensively use AI to craft a useful, entertaining and delightful experience for their users.”

Voice assistants rely on natural language processing to understand user commands and fulfil them correctly. Kadam elaborates, “For example, a user listening to a song might just ask the voice assistant to ‘play the acoustic version of this’, and the voice assistant figures out what ‘this’ means. Or when a user says ‘play Divide’, the voice assistant knows that Divide is the name of a popular music album by Ed Sheeran and not a mathematical reference.”

AI has also long been used to make music recommendations more personalised and engaging. Kadam shared an example of how AI powers recommendations on voice assistants: “Sometimes, users make vague requests like ‘play some music’. Knowing what to play for every user based on their musical taste is possible using AI. For example, a user might want to listen to something more upbeat in the morning versus something a bit more laid-back towards the end of the day. Finding just the right recommendation for every individual at the right time is possible using AI,” she added.
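The time-of-day personalisation Kadam mentions can be sketched as a simple scoring function. The track data, weights and "energy" feature below are invented assumptions — real recommenders learn these from listening history at scale:

```python
# Minimal sketch of context-aware recommendation: score tracks by how well
# they match a user's genre taste and a time-of-day energy preference.
TRACKS = [
    {"title": "Morning Run", "genre": "pop", "energy": 0.9},
    {"title": "Evening Chill", "genre": "lofi", "energy": 0.2},
    {"title": "Midday Groove", "genre": "pop", "energy": 0.6},
]

def recommend(taste: dict[str, float], hour: int) -> str:
    """Pick the track whose genre and energy best fit the user and the hour."""
    # Assumption: listeners prefer upbeat music in the morning, mellow later.
    target_energy = 0.9 if hour < 12 else 0.3

    def score(track: dict) -> float:
        genre_affinity = taste.get(track["genre"], 0.0)
        energy_fit = 1.0 - abs(track["energy"] - target_energy)
        return 0.6 * genre_affinity + 0.4 * energy_fit

    return max(TRACKS, key=score)["title"]
```

For a pop-leaning listener, the same vague "play some music" request yields a high-energy pick at 8am and a different track in the evening, which is the behaviour described above.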

Impact of Advanced AI and LLMs on users

Large Language Models (LLMs) have an unprecedented understanding of natural language and context that is helping create exciting new applications for entertainment via voice assistants.

Kadam highlighted what she sees as one of the biggest positive changes with LLMs: “LLMs, coupled with existing use of AI, can make conversations with voice assistants a lot more natural and human-like. For example, you could ask a voice assistant to create a playlist for your daily morning run that has the latest upbeat pop songs, with more music like Drake’s but no Ed Sheeran, has English and Hindi music, and lasts at least 90 minutes.”
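One way to picture this is an LLM translating the natural-language request into a structured specification, which a conventional service then fulfils. The spec format, catalog fields and greedy filter below are hypothetical, and only the artist, language and duration constraints are enforced in this sketch:

```python
# Hypothetical structured spec an LLM might extract from the playlist
# request quoted above.
SPEC = {
    "mood": "upbeat",
    "genres": ["pop"],
    "similar_to": ["Drake"],
    "exclude_artists": ["Ed Sheeran"],
    "languages": ["English", "Hindi"],
    "min_duration_min": 90,
}

def build_playlist(catalog: list[dict], spec: dict) -> list[dict]:
    """Greedily add allowed tracks until the minimum duration is met."""
    playlist, total = [], 0
    for track in catalog:
        if track["artist"] in spec["exclude_artists"]:
            continue  # "no Ed Sheeran"
        if track["language"] not in spec["languages"]:
            continue  # "English and Hindi music"
        playlist.append(track)
        total += track["minutes"]
        if total >= spec["min_duration_min"]:
            break  # "lasts at least 90 minutes"
    return playlist
```

The point of the split is that the hard part — turning free-form speech into `SPEC` — is exactly where LLMs add value, while the assembly step remains ordinary retrieval logic.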

Today, none of these requests is supported smoothly on any of the mainstream voice assistants in the US, but the latest LLM advancements are bringing such possibilities within reach.

The latest AI developments could allow voice assistants to understand a user’s specific mood and emotional state, the activity they are performing, and their musical likes and dislikes, to create a perfectly tailored musical experience for them.

Further, by coupling high-quality music recommendations with multi-modal experiences (via touch screens, for example) and Augmented Reality (AR), voice assistants can create rich and immersive musical experiences. “Think about being able to attend a virtual concert by your favourite artist, complete with its theatrics, from the comfort of your living room,” the expert added.

A few artists, such as Travis Scott and Ariana Grande, held virtual concerts during the Covid-19 lockdowns, leveraging AR to stay connected with their fans.

Shortcomings and Risks

Like any other machine learning-based offering, LLMs are also prone to biases depending on the data they are trained on.

Kadam highlights a few more shortcomings of LLMs for music use cases: “Like any other models, LLMs rely on the text data they are trained on. Out of the box, they can’t analyse musical melodies and harmonies, nor understand music theory and composition.”

Another important drawback to keep in mind is that the music industry evolves fast; what’s trending today might be stale in a few days. “LLMs are usually trained up to a certain cut-off date, and it is expensive to keep retraining them frequently. As a result, LLMs might not help users discover what’s hot and trending in the music landscape, which is something a lot of users care about,” Kadam commented.

Voice assistants have already been transforming the way users interact with technology for entertainment purposes like listening to music and podcasts. With the recent strides in AI, especially LLMs, we can expect voice technology to become the key interface through which users seek unique, ground-breaking and immersive experiences. However, LLMs are not a silver bullet; gaps remain that must be addressed before they can be used effectively for music use cases on voice assistants.

(Note: all opinions and points of view are strictly of the individual and not representative of their employer).

— Ammar Tarique is the content strategist at Teamology Softech and Media Private Limited.

