Multimodal skills have a plethora of benefits, such as adding detail and depth to your customer journey and delivering a richer media experience. By giving the customer different ways to engage sensorially with your product or service, you make the shopping experience more versatile and accessible.
“Multimodal interfaces give you the ability to provide videos, images, and animations in conjunction with voice. Consider a skill which displays user photos, videos, and photo metadata from a user’s hosted account. On a device without a screen, you’d only have it read out the metadata or play the audio portion of a video file. However, on a device with a screen you’d be able to display, play, and search for visual content easily. Multimodal devices give you the opportunity to do this which does not exist otherwise.”
– Alexa Developer Site, How to Build a Multimodal Alexa Skill
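To make that branching concrete, here is a minimal sketch of a handler written with the ASK SDK for Node.js (ask-sdk-core). The intent name, speech text, and photo details are illustrative placeholders, not from a real skill:

```typescript
import * as Alexa from 'ask-sdk-core';

// Hypothetical intent handler for a photo skill. It always speaks the
// metadata; on devices that support APL it could additionally render
// the photo itself (see the APL sketches later in this post).
const PhotoDetailsHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'PhotoDetailsIntent';
  },
  handle(handlerInput) {
    // getSupportedInterfaces reports what the requesting device can do;
    // screenless devices lack the 'Alexa.Presentation.APL' interface.
    const hasScreen = Alexa.getSupportedInterfaces(
      handlerInput.requestEnvelope)['Alexa.Presentation.APL'] !== undefined;

    const speech = hasScreen
      ? 'Here is your photo. The details are on screen.'
      : 'This photo was taken on June 5th in Seattle.';

    // On a screened device, an Alexa.Presentation.APL.RenderDocument
    // directive would be added here to display the photo itself.
    return handlerInput.responseBuilder.speak(speech).getResponse();
  },
};
```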
For customers, choice and simplicity are key
Multimodal technology, such as Alexa Presentation Language (APL), enables skill developers to create rich, interactive displays for Alexa skills and tailor experiences for a variety of Alexa-enabled devices. This gives users more choices for how they want to give and receive information.
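To give a sense of what APL looks like: an APL document is a JSON structure describing a layout, which the skill binds to data at render time. The sketch below, written as a TypeScript object, lays an image out beside descriptive text; the field names under payload are assumptions for illustration:

```typescript
// A minimal APL document, expressed as a TypeScript object (APL itself
// is JSON). It places a product image beside a block of text and binds
// both to a 'payload' datasource supplied by the skill at render time.
const productDocument = {
  type: 'APL',
  version: '1.8',
  mainTemplate: {
    parameters: ['payload'],
    items: [
      {
        type: 'Container',
        direction: 'row',
        width: '100vw',
        height: '100vh',
        items: [
          // '${...}' expressions are APL data binding, resolved on-device.
          { type: 'Image', source: '${payload.product.imageUrl}', width: '40vw', height: '100vh' },
          { type: 'Text', text: '${payload.product.description}', fontSize: '40dp' },
        ],
      },
    ],
  },
};
```

Because the dimensions use relative units (vw/vh), the same document adapts across screen sizes, which is one of the ways APL tailors an experience to the device it runs on.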
Greater implementation of multimodal technology is a response to consumer needs and broader market trends, like the desire for personalization and interactive screens. Alexa is available on a variety of devices with multiple modalities, including smart displays like the Echo Show. Consumer adoption of smart display devices has been steadily increasing over the past year, and customers expect to be able to use the full capabilities of these devices. With multimodal technology, you can leverage these capabilities to build customized, robust displays that go beyond simple commands and voice interactions.
Multimodal technology lets merchants address pain points in the voice shopping experience by expanding what the experience can do and increasing customer confidence. For example, visual and touch elements enable customers to easily compare products, view product details, and review order summaries before completing a purchase. As demands and habits change, offering different modes of interaction also lets you pivot more easily and reach a wider audience.
But how does it work? And how do merchants implement multimodal technology in a way that makes sense for them?
How we use multimodal skills: Alexa Bill Planner
Alexa Bill Planner has implemented Alexa voice commands for bill paying, turning a tedious task into a simple, natural activity. With Bill Planner, customers can get details on their utility bills, compare month-over-month charges, get bill due date notifications, and even pay their bills, all with just their voice. The new experience adds visual components that supplement the voice experience, letting customers talk to Alexa while seeing their information at a glance on their smart display devices.
Multimodal functionality allowed us to expand beyond our original concepts and transform not just the program but the bill-paying experience for our users. The visual component upgrades the overall Alexa Bill Planner experience with increased clarity, seamless connectivity, and more efficient capabilities.
1. Less confusion on bill details:
It's naturally challenging to remember every detail of a bill just by hearing it, so a visual representation helps customers by displaying the most important information on screen. Payment history and historical bill trends, the amount due now, and details about previous payments are all just an Alexa command away, with simple on-screen visualizations to avoid confusion (see the code sketch after this list).
2. Easier account linking:
One of the most tedious parts of switching how customers manage their bills is linking their Alexa account to their utility account. With the new visual Alexa experience and Bill Planner, the process is made crystal clear on screen. Any potential hiccups that come from relying solely on Alexa voice control are smoothed over by the visual display. A good example is complex information, like account numbers, that is difficult to recite aloud. Instead of speaking it, customers can use the on-device keyboard to enter this information. This gives users more confidence that Alexa is getting the correct information, and adds privacy in case they don't feel like saying that info out loud.
3. Optimized and efficient CX:
Consumer use of smart displays skyrocketed by 50% in 2020 alone (via Voicebot.ai), showing strong customer desire for visual interaction. In response, we combined Alexa Bill Planner with the capabilities of other Amazon devices to keep up with ever-evolving consumer habits.
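To illustrate the pattern behind these three points, here is a simplified sketch of how a bill summary might be rendered on screen alongside the spoken response. The document, intent name, and datasource fields are hypothetical placeholders, not Alexa Bill Planner's actual implementation:

```typescript
import * as Alexa from 'ask-sdk-core';

// Hypothetical bill-summary screen: three lines of text bound to a
// 'billData' datasource. Not the actual Bill Planner document.
const billSummaryDocument = {
  type: 'APL',
  version: '1.8',
  mainTemplate: {
    parameters: ['billData'],
    items: [{
      type: 'Container',
      items: [
        { type: 'Text', text: 'Amount due: ${billData.amountDue}', fontSize: '48dp' },
        { type: 'Text', text: 'Due date: ${billData.dueDate}', fontSize: '32dp' },
        { type: 'Text', text: 'Last payment: ${billData.lastPayment}', fontSize: '32dp' },
      ],
    }],
  },
};

const BillSummaryHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'BillSummaryIntent';
  },
  handle(handlerInput) {
    const responseBuilder = handlerInput.responseBuilder;

    // Only add the visual directive when the device can render it;
    // the spoken summary works on every device.
    if (Alexa.getSupportedInterfaces(handlerInput.requestEnvelope)['Alexa.Presentation.APL']) {
      responseBuilder.addDirective({
        type: 'Alexa.Presentation.APL.RenderDocument',
        token: 'billSummary',
        document: billSummaryDocument,
        datasources: {
          billData: { // placeholder values for illustration
            amountDue: '$84.20',
            dueDate: 'July 15',
            lastPayment: '$79.10 on June 12',
          },
        },
      });
    }
    return responseBuilder
      .speak('Your bill of 84 dollars and 20 cents is due July 15th.')
      .getResponse();
  },
};
```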
How to get started building multimodal skills
Taking advantage of multimodal technology is not as intimidating as it might sound. APL is a design language that lets you create versatile, adaptable skills using Alexa's high-quality audio and visual capabilities. You can implement these across Amazon devices to supercharge your relationship with your customers through an easy-to-use process.
The APL developer portal walks you through how and why to implement multimodal skills. With detailed explanations of multimodal skill flows and functions, the portal helps you use the design language across Amazon devices for seamless integration of multimodal skills.
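If you want a starting point before diving into the portal, the sketch below is a complete, minimal multimodal endpoint: one launch handler that greets the user everywhere and renders a one-line APL document on screened devices, wired up as an AWS Lambda handler. The names and text are placeholders:

```typescript
import * as Alexa from 'ask-sdk-core';

// One-line APL document: plain text, no data binding.
const helloDocument = {
  type: 'APL',
  version: '1.8',
  mainTemplate: {
    items: [{ type: 'Text', text: 'Hello, multimodal world!', fontSize: '50dp' }],
  },
};

const LaunchRequestHandler: Alexa.RequestHandler = {
  canHandle(handlerInput) {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'LaunchRequest';
  },
  handle(handlerInput) {
    // Render the document only on devices that support APL.
    if (Alexa.getSupportedInterfaces(handlerInput.requestEnvelope)['Alexa.Presentation.APL']) {
      handlerInput.responseBuilder.addDirective({
        type: 'Alexa.Presentation.APL.RenderDocument',
        token: 'hello',
        document: helloDocument,
      });
    }
    return handlerInput.responseBuilder.speak('Hello, multimodal world!').getResponse();
  },
};

// Build the skill and expose it as an AWS Lambda entry point.
export const handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(LaunchRequestHandler)
  .lambda();
```

Note that your skill's manifest must also declare support for the Alexa.Presentation.APL interface before these directives will render on a device.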
For more info on what multimodal capabilities are and how best to use them, check out the Alexa Skills Kit Blog.