Apparently seven years ago Jeff Bezos, inspired by a love of Star Trek, decided that Amazon should build something you could talk to, something that would turn your commands into actions. Now Amazon Alexa is competing with a growing number of intelligent personal assistants[1] from almost every corporate behemoth around. Alphabet (the company that owns Google) has Google Home, Microsoft has Cortana, and Apple, of course, has Siri, which was perhaps the best known early on. Viv and Facebook M are also interesting contenders, but the state of play for each of those is different enough that we’ll cover them at the end. If you ask any of these companies about investment in this endeavour, they’ll say that digital assistants are the next big thing.

While each of the big four has their physical product, the battle isn’t between Google Home and Echo Dot – what they need to succeed is their operating system. Microsoft is in this race by merit of owning the most popular way we interact with computers by mouse and keyboard. Owning the most popular assistant OS could help any of these companies define the next decade.

Regardless of the millions of pounds and work hours that each of these organisations has put into their respective offerings, one thing they have in common is this: it’s not enough. Not yet, not for the lofty goal of having a program that can understand anything you say and do just what you want it to.

That’s not to say that these programs aren’t incredibly impressive technological forward steps. However, reviews like this Business Insider comparison and this one from CNN make it pretty clear that asking too much of these programs will quickly reveal how much they cannot do.

The problem is that we’ve come to expect a lot from computers. I would be personally outraged if my mobile phone refused to update my social media, download video, send and receive emails from multiple accounts, display every photo I took over the past few years (regardless of what device I took it on) and tell me the top speed of a grizzly bear[2]. This functionality has become synonymous with the device but mobile phone manufacturers are responsible for a relatively small proportion of that. We’re used to platforms that use years of established protocols to support all kinds of software. Now, every company trying to build the world’s AI is coming up against two core problems:

1. They are building a brand new breed of platform. Intelligent assistants are different enough from existing operating systems that they need to put in a lot more work rebuilding existing connections.

2. Previously there was a certain amount of leeway for programs to be pedantic. To a certain extent, we accept that it’s our fault for not pushing the right buttons. We don’t have the same patience when speaking, so these programs have to be able to respond to pretty much anything a person might say.

Apple once described the iPhone 6’s multi-pressure touch as “trying to read minds” but really, technology has been about reading trends and teaching minds – a series of incremental tweaks with the onus on us as consumers to adapt. The challenge here is to recreate decades of program integrations and, as a small side project, codify the entire spoken human language.

It can’t be done. Certainly not by one team and, let’s face it, if you were racing Apple to build Weird Science would you bet on yourself to do it alone?

What makes personal assistants interesting is that Alphabet, Amazon, Microsoft, and Apple are vulnerable and need you to get on board. They have bet a lot on this, and none of them wants to be Betamax. Or Zune.

Your chance

This is where you come in. In order for any one of these companies to win this race, they need individuals and companies to develop a lot of the programs, or at least the program-specific integrations, for them.

Your choice now is whether you invest the time to get a stake in the ground, knowing that most will welcome it but that you’re also betting on their success.

Compared to designing a standard app, the time and training investment for many simple functions is hugely reduced. There’s almost no design involved and competition is far lower than you’ll find in any of the respective app stores. As a proof of concept, with no coding knowledge before February, I’m building an interactive program that could integrate with a bunch of messaging platforms as well as Google Home and Alexa (more on that later).

Amazon are offering free bootcamps to learn more about building Alexa skills. Image source: Guillermo Fernandes via Flickr

The companies at play are also far more open here than in other arenas. Amazon is running free half-day bootcamps to teach the principles of building Alexa skills, and is giving out a plethora of prizes and incentives for successful attempts. Alphabet is offering to suggest your program if a user asks for something it could fulfil – the kind of relevant, single-result search ownership that companies would kill for in a browser. Companies that are taking advantage of these platforms are already reaping the rewards: for instance, the JustEat skill has been preinstalled on Amazon Echo from the first shipment thanks to their chatbot strategy – a huge advantage over competitor programs, which users have to download manually. What’s more, a lot of these new ecosystems use engagement metrics as a way of ranking programs, so by starting now and building up those numbers before competitors cotton on, companies can vastly improve their chances when things get far more crowded.

How to build a chatbot

Unsurprisingly, the biggest change you need to make to capitalise on AI is replacing button clicks with phrases. Each of the big four has started advocating platforms that take the burden of recognising a sentence (spoken or written), breaking it up, and sending you the important information in digestible chunks. You just have to tell them what is important and when (I’ve included a list of these platforms at the bottom of this post).

By and large, the following intentionally broad instructions will serve you in creating a conversational application on any of these platforms, as they all have a few things in common. This will give you an idea of the way you need to think about interacting with them. In the coming months, I’ll be writing a more in-depth post about how I created my bot using the api.ai platform, which Alphabet acquired last year.

Plan your interactions

This will be easier once you’ve got a feel for the platform, but essentially you need a flowchart for the conversation, with markers for the moments when your program is doing things behind the scenes. David Low, Product Evangelist for Amazon, says that the lowest-rated apps often have too many options. He recommends starting very small and adding options later.

Always plan your interactions to get an idea of how conversations will play out.

Decide what you want people to call your program

This is the part of the process that is the most ‘SEO’ and applies most specifically to spoken interactions. Essentially this is what people need to say to wake up your program. Think “OK Google, I want to talk to Superdry Online” or “Alexa, ask Domino’s to order me a 12-inch pizza”. It’s a bit clumsier than might be ideal, but it means you know what you’re getting, rather than accidentally posting your Spotify password on Facebook.

Usually, once you publish your program it’s too late to change your invocation, so you need to think in advance about something short, memorable, and descriptive. It helps if your brand name already ticks those boxes, but you’re likely to run into problems if you have a Web 2.0 name like ‘Pinkr’ or ‘seetbk’. The platforms are prone to confusing homophones and you may need to get in touch with the companies directly to overcome that confusion. The fact that they are willing to work with individual brands to manage proper brand recognition is one sign of the opportunity at this point.

Create the phrases you want your program to respond to and highlight the variable information

On all of these platforms you create phrases with parts that won’t change, and you can also add parts that will. For instance, the phrase “My name is Slim Shady” is of the format “My name is {name}”. This means that the platforms handle the heavy lifting of variations in speech, taking much of the burden off any external code.
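To illustrate the idea (this is a toy sketch of the concept, not any platform’s actual implementation, and the templates are made up), you can think of each phrase as a pattern with named slots:

```python
import re

# Hypothetical phrase templates: fixed text plus {slot} placeholders.
TEMPLATES = [
    "My name is {name}",
    "Order me a {size} pizza",
]

def template_to_regex(template):
    # "My name is {name}" becomes "^My name is (?P<name>.+)$"
    return "^" + re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", template) + "$"

def match(utterance):
    # Return the first template that matches, plus the extracted slot values.
    for template in TEMPLATES:
        m = re.match(template_to_regex(template), utterance, re.IGNORECASE)
        if m:
            return template, m.groupdict()
    return None, {}

print(match("My name is Slim Shady"))
# → ('My name is {name}', {'name': 'Slim Shady'})
```

The real platforms do far more than this (synonyms, typed entities, fuzzy matching), which is exactly the heavy lifting you’re offloading to them.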

Deal with the JSON it sends you

First things first – there are scenarios where you won’t need to code at all; skipping code just limits what your bot can do. I created the simple back and forth you can see in this gif in about ten minutes using no external code. If you have coding experience, or are comfortable with learning, you can integrate pretty much any of these services so long as you can securely receive and respond to a JSON POST request within about 5-8 seconds.
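As a rough sketch of that round trip: your endpoint receives a JSON body, reads the extracted slot values, and sends back a reply. The request shape below follows the api.ai v1 sample at the end of this post; the reply fields (“speech”, “displayText”) are from that same v1 format, and the greeting logic is entirely made up:

```python
import json

def handle_webhook(body):
    """Take the parsed JSON body of a webhook POST and build a reply.

    api.ai (v1) puts the extracted slots under result.parameters; the
    reply's "speech" field is what the assistant says back to the user.
    """
    params = body.get("result", {}).get("parameters", {})
    name = params.get("name", "stranger")
    reply = "Hi, {}!".format(name)
    return {"speech": reply, "displayText": reply}

# A trimmed-down example request, like the full sample at the end of this post.
request_body = json.loads(
    '{"result": {"resolvedQuery": "My name is Slim Shady",'
    ' "parameters": {"name": "Slim Shady"}}}'
)
print(handle_webhook(request_body)["speech"])  # → Hi, Slim Shady!
```

In production this function would sit behind an HTTPS endpoint (the 5-8 second budget includes your network round trip), but the parse-then-respond shape is the same.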

Test and go live

Most of the services offer some kind of easy integration out of the box. They’ll often walk you through it and, if all you need is a relatively standard setup, this will probably take you all of twenty minutes.

You’ll then usually need to go through a slightly separate process to actually publish, mainly medium-specific quality checks.

Fortunately, platforms like api.ai and converse.ai allow integration to multiple mediums at a time. So, having built for Google Home, you can roll out to Facebook, Slack, Telegram etc. with relatively little overhead.

The next five years

If you can only build for one platform and you’re trying to prioritise, you can’t go terribly far wrong with Microsoft. Microsoft’s linguistic processing platform, LUIS, is integrated with the popular Microsoft Bot Framework, which has almost tripled developer usage in the last six months and stretches far further than Cortana. This is the framework that JustEat and Three are using to build across multiple mediums, including website integrations. It’s worth noting that consumer usage figures for Cortana may be heavily inflated, depending on whether Microsoft is including any use of the Windows 10 search bar. However, Microsoft is also using those search bar inputs to refine its back-end machine learning platform, which should help improve accuracy across all applications.

Alphabet’s recommended platform, api.ai, is easy to pick up and can launch on a number of mainstream chat mediums with just a few clicks. Alphabet can also rely heavily on Google search to help make its assistant more fully featured and attractive to users from the off. Unlike with Alexa, users don’t need to manually select your bot to be installed on their device. This helps users access your service, but means that individual requests become more like web searches rather than uses of a specifically chosen app. Instead of competing once for the install, you’re competing every time a user says “Hey Google”, and getting in early to be the program that Google Assistant suggests will be a huge win.

Apple seems to be the furthest behind, with their developer kit, SiriKit, pretty much limited to things that Siri can already do. That being said, Apple’s dominance in smartphone hardware and OS is a strong foothold. Apple’s laser focus on its own ecosystem could hamper long-term plans to be everyone’s HAL 9000, but in the short term, people who have committed to Apple’s vision are the casual consumers closest to having an omniscient machine that follows them from room to room.

Apple’s focus on its own ecosystem could cause it to lose out in the personal assistants arms race. Image source: Kārlis Dambrāns via Flickr.

Facebook M, Facebook’s intelligent personal assistant, is an interesting departure from the norm. Rather than trying to create a program that can do everything, Facebook’s offering is more like partial automation. Facebook M is designed to deal with as many queries as possible, like the other IPAs, but when it gets stuck it sends the request on to human customer service reps, who go as far as calling the DMV. The idea is that everything these reps do is recorded, so that Facebook M can eventually do it alone. While this is currently only available in limited geographies, and could run into some serious scalability issues, Facebook M has the potential to deliver the customer experience they’re all striving for within far shorter timelines.

Viv is another IPA worthy of mention at this point. Viv was created by the team which originally built Siri. In a launch video, co-founder Dag Kittlaus explains that Viv receives a request, checks all the integrations it has at its disposal, and then writes the code it needs to fulfil the request itself. While their developer centre isn’t yet open to wider use, you can email them about a partnership, and this different setup should mean the platform is far easier to build services for.

For my money, Amazon is making the most interesting strategic decisions. They are actively courting programmers and brands, and are expressly separating Alexa, the program, from the Echo devices that run it. Amazon’s laissez-faire attitude to how Alexa is used meant that CES 2017 included Alexa on devices from cars and washing machines to direct Echo competitors. They’ve even managed to sneak Alexa onto iPhones by adding it as a feature to the Amazon app, which many users already have installed. This can’t compete with the ease of summoning Siri at the hold of a button, but it’s a shot across the bow for Apple’s own assistant. It’s particularly interesting that Amazon has said they think digital assistants should be able to use each other – a nice ideal and a fantastic way to break out of platform silos if one service becomes dominant.

Chances are that all of these players are too big to be stamped out of the race entirely, but if one of them can reach a critical mass of developers and users and become the de facto disembodied voice, things are going to get very interesting indeed. And particularly valuable for those businesses that have the foresight or agility to keep up.

Platform specific resources

Microsoft is pushing LUIS in conjunction with the Microsoft Bot Framework, Google has invested in api.ai, and Amazon recommends building Alexa skills using the purpose-built section of developer.amazon.com. Amazon is also offering free (up to a point) hosting for your external code on aws.amazon.com – the downside is that the Amazon platform is a bit more dependent on code, but they make linking into your code easier. Apple gives information about SiriKit, their SDK built specifically for Siri, here.

Sample JSON from API.AI

This isn’t identical to the messages that all of the platforms will send, but it’s the kind of thing you can expect:

{
  "id": "9962fb04-3808-472e-9fe0-f34de1f029b7",
  "timestamp": "2017-06-26T17:27:48.156Z",
  "lang": "en",
  "result": {
    "source": "agent",
    "resolvedQuery": "My name is Slim Shady",
    "action": "",
    "actionIncomplete": false,
    "parameters": {"name": "Slim Shady"},
    "contexts": [],
    "metadata": {
      "intentId": "2c7ba931-5ea7-4693-b384-eea23a661c68",
      "webhookUsed": "false",
      "webhookForSlotFillingUsed": "false",
      "intentName": "My name is name"
    },
    "fulfillment": {
      "speech": "",
      "messages": [{"type": 0, "speech": ""}]
    },
    "score": 1
  },
  "status": {"code": 200, "errorType": "success"},
  "sessionId": "1b0e0d9a-0efb-4d48-9dfc-9a1d5ebf1364"
}
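The fields you’ll usually care about are the matched intent, the extracted slots, and the confidence score. A quick sketch of pulling those out of a payload like the one above, using Python’s standard json module (the payload here is trimmed for brevity):

```python
import json

# `raw` stands in for the POST body your endpoint receives.
raw = """
{
  "result": {
    "resolvedQuery": "My name is Slim Shady",
    "parameters": {"name": "Slim Shady"},
    "metadata": {"intentName": "My name is name"},
    "score": 1
  }
}
"""

result = json.loads(raw)["result"]
print(result["metadata"]["intentName"])  # which phrase template matched
print(result["parameters"]["name"])      # the variable part, already extracted
print(result["score"])                   # the platform's confidence in the match
```

Other platforms name these fields differently, but each sends some equivalent of intent, slots, and score.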

[1] Or interactive personal assistants, or digital assistants, or AI, or bots, or any of the other host of names that have sprung up.

[2] In case you’re interested, apparently it is almost 35mph according to speedofanimals.com although I have not one clue what the “feels like” column means.