Can smart devices really understand us?

Virtual helpers have introduced interactive artificial intelligence into our everyday lives. The next step is for computers to learn how to relate to us as individuals.
Illustration: sound waves, a human ear and a microchip inside a head. Illustrator: Ida-Maria Wikström.

Digital smart helpers have leaped from sci-fi literature into our pockets and onto our tables. Amazon's Alexa, Apple's Siri and Google Assistant are taking us away from computer and mobile handset screens, making the spoken word our new user interface.

It is already easy to check the news, choose music, order a cab and command household smart devices using voice control.

But how exactly are such AI-utilising smart helpers able to understand us?

Phase 1: speech recognition

In order to function, smart helpers must always be on. They hibernate, listening to their environment until they recognise a key word uttered within range.

Amazon's virtual assistant, for example, wakes up when it hears its name, Alexa. An LED ring on the smart speaker turns blue to indicate that it has awoken.

Apple's Siri works on the same principle. When it hears the prompt Hey, Siri, it starts recording the user's speech and uploading it to recognition software running on a cloud service.

The digitised speech is first sliced into short segments only fractions of a second in length.

"Everything starts with spectrum analysis, i.e. examining the frequencies found there. Patterns, which describe different sounds, are created in the frequency space," says Associate Professor Mikko Kurimo from Aalto University's Department of Signal Processing and Acoustics.
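
As a rough sketch of this step, the short-time spectrum analysis Kurimo describes can be approximated in a few lines of Python; the frame and hop sizes below are common textbook values, not those of any particular assistant.

```python
# A minimal sketch of spectrum analysis, assuming a mono signal
# sampled at 16 kHz; real assistants use carefully tuned filterbanks.
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Slice the signal into ~25 ms frames (400 samples at 16 kHz),
    window each frame and take the magnitude of its Fourier transform."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        frames.append(np.abs(np.fft.rfft(frame)))  # frequencies present
    return np.array(frames)

# Toy input: a 440 Hz tone stands in for one second of recorded speech.
t = np.arange(16000) / 16000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (98, 201): 98 time slices x 201 frequency bins
```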

All material that is redundant from the perspective of speech recognition, such as the pitch of the speaker's voice and background sounds, is removed during this slicing.

"In other words, it tries to find patterns that indicate which speech sounds have been uttered," Kurimo says.

Speech recognition is made more difficult by the fact that we speak incoherently, swallow words and use gestures and filler utterances. The words we speak can also sound alike, as is the case with, for example, ate and eight.

"These days, speech recognition is more and more often performed with deep neural networks," Kurimo says.

Deep neural networks mimic the way the brain operates and consist of simple calculators known as artificial neurons. A neural network becomes powerful when its interconnected neuron layers communicate, each neuron passing signals to neurons in the same layer and the next.
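
A minimal sketch of such a network's forward pass, with invented layer sizes and random weights purely for illustration, could look like this:

```python
# Each artificial neuron weighs its inputs, adds a bias and applies a
# non-linearity; stacking layers gives a deep network. Illustrative only.
import numpy as np

def layer(x, weights, biases):
    # Every neuron in the layer sees the whole input vector x.
    return np.maximum(0.0, weights @ x + biases)  # ReLU non-linearity

rng = np.random.default_rng(0)
x = rng.normal(size=201)                                  # one spectrogram frame
h1 = layer(x, rng.normal(size=(64, 201)), np.zeros(64))   # hidden layer 1
h2 = layer(h1, rng.normal(size=(32, 64)), np.zeros(32))   # hidden layer 2
scores = rng.normal(size=(40, 32)) @ h2                   # scores for ~40 speech sounds
print(scores.argmax())                                    # most likely sound
```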

In addition to statistical sound models, neural network speech recognition search algorithms utilise language models built from extensive text materials. A language model predicts the probability that a word will occur after another word, as well as the likely way in which it will be pronounced. This helps weed out unlikely words and speeds up recognition.
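
As a toy illustration of the statistics such a language model rests on (real systems use vastly larger corpora and neural models), word-pair probabilities can be estimated from simple counts:

```python
# A minimal word-pair (bigram) language model built from raw counts.
from collections import Counter

corpus = "i ate the cake i ate the apple i saw eight apples".split()
pairs = Counter(zip(corpus, corpus[1:]))   # count adjacent word pairs
unigrams = Counter(corpus)

def next_word_prob(prev, word):
    """P(word | prev), estimated from the counts."""
    return pairs[(prev, word)] / unigrams[prev]

# "ate" is far more likely after "i" than "eight" is, which lets the
# recogniser prefer "i ate" over the identically sounding "i eight".
print(next_word_prob("i", "ate"), next_word_prob("i", "eight"))
```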

"A speech recognition application thus performs the task of finding the sentence the user most probably spoke," Kurimo says.

Phase 2: processing natural language

The aim of natural language processing is to decipher the meaning of the text, i.e. to identify what the user wants from their digital helper.

Neural networks are also utilised in natural language processing. The speech data is scoured automatically for key words and phrases in order to ascertain what the user's words might relate to.
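
A bare-bones sketch of such keyword-based intent spotting might look like the following; the intent names and keyword lists are invented, and commercial assistants use trained classifiers rather than fixed lists.

```python
# A minimal keyword-overlap intent detector. Illustrative only.
INTENT_KEYWORDS = {
    "weather": {"weather", "rain", "temperature", "forecast"},
    "music":   {"play", "song", "music"},
    "taxi":    {"cab", "taxi", "ride"},
}

def detect_intent(utterance):
    words = set(utterance.lower().split())
    # Pick the intent whose keyword set overlaps the utterance the most.
    best = max(INTENT_KEYWORDS, key=lambda i: len(words & INTENT_KEYWORDS[i]))
    return best if words & INTENT_KEYWORDS[best] else None

print(detect_intent("Will it rain tomorrow?"))  # -> "weather"
```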

Neural networks are trained for their tasks by feeding them a large volume of data for processing and then comparing their output values to known correct values. Corrections are made until the result no longer improves. After this, the system is capable of operating independently.
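
The same feed-compare-correct loop can be shown on a toy scale: below, gradient descent fits a single weight to data whose correct answers are known, the same principle networks with millions of weights follow.

```python
# A minimal sketch of the train-compare-correct loop on invented data.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # inputs with known answers
w = 0.0                                       # the weight being learned

for step in range(100):
    # Compare the model's outputs with the known correct values...
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.01 * grad                          # ...and correct the weight
print(round(w, 2))  # converges towards 3.0, the rule hidden in the data
```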

One project headed by Kurimo has researched the automatic description of audiovisual material. Among other things, archived Yle videos were chosen as source material. The method developed is able to simultaneously interpret both the speech recorded on a video and the moving image, and can generate a text description of them. The system was taught using human-written descriptions of the same videos as points of reference.

The size of the databases used to teach deep neural networks is a central factor. This is why commercial digital helpers are being produced by giant corporations like Amazon, Apple, Google and Microsoft.

"Major companies have access to extensive databases, and they can perform automation quite easily. It is arduous to start making a chatbot from scratch. You have to accumulate a database somehow."

Phase 3: fulfilling the request

The last phase is to fulfil the user's request. In addition to information retrieved from the net, digital helpers take advantage of, for example, the contact details, location information and calendar on the user's phone in order to form a better idea of what the user wants.
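
A schematic sketch of this fulfilment step, with hypothetical handler logic and a stub phone context standing in for real platform APIs:

```python
# A minimal fulfilment dispatcher. The context values and handler
# names are invented; real assistants call platform and web APIs here.
PHONE_CONTEXT = {
    "location": "Espoo",
    "contacts": {"mum": "+358401234567"},
}

def fulfil(intent, slots):
    if intent == "weather":
        # Fall back to the phone's location when none was spoken.
        place = slots.get("place", PHONE_CONTEXT["location"])
        return f"Fetching the weather for {place}..."
    if intent == "call":
        number = PHONE_CONTEXT["contacts"][slots["name"]]
        return f"Calling {slots['name']} at {number}..."
    return "Sorry, I can't help with that yet."

print(fulfil("weather", {}))            # uses location from the phone
print(fulfil("call", {"name": "mum"}))
```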

This is why a digital helper can appear surprisingly smart when fulfilling simple requests like connecting a call, looking up weather information or ordering a pizza.

But ask a helper like this to tell you what's going on in Silicon Valley, and it will provide a clumsy answer containing random search results related to the term Silicon Valley. A digital helper would be unable to deduce whether it is being asked about the history, weather or companies active in the area.

"They run out of smarts the moment you go beyond their design space," Mikko Kurimo says.

There has also been a shift to employing deep neural networks in generating voices for digital helpers. Speech sounds are always interconnected in natural speech, and incompatible sounds were precisely what made early smart helpers sound so robotic. Today, neural networks perform calculations on the fly to enable the correct pronunciation of the phrases spoken in reply.

"A synthetic speech generator is fed the syllables and words to be emphasised as input, and these make the speech sound natural. The generated signal is then transmitted to the user's terminal device for playback."
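
A toy numerical illustration of why badly joined sounds come across as robotic: hard-cutting between two tones (stand-ins for adjacent speech sounds) leaves a jump in the waveform that is heard as a click, while a short crossfade smooths the seam. This is a simplification, not the neural synthesis the article describes.

```python
import numpy as np

sr = 16000
t = np.arange(sr // 10) / sr                  # 100 ms per "sound"
a = np.sin(2 * np.pi * 200 * t)               # stand-ins for two
b = np.sin(2 * np.pi * 300 * t + 2.0)         # adjacent speech sounds

hard_cut = np.concatenate([a, b])             # audible click at the seam
fade = np.linspace(0, 1, 160)                 # 10 ms crossfade
smooth = np.concatenate(
    [a[:-160], a[-160:] * (1 - fade) + b[:160] * fade, b[160:]])

print(abs(hard_cut[len(a)] - hard_cut[len(a) - 1]))       # large jump
print(abs(smooth[len(a) - 160] - smooth[len(a) - 161]))   # much smaller
```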

Towards individualised user interfaces

Even the most conversational AI will not, for quite some time, be able to serve as the kind of worthy debate partner we've seen in so many science fiction movies.

Professor Antti Oulasvirta from the Department of Communications and Networking considers it problematic that voice user interfaces are unable to actually understand language.

"AI doesn't learn language by engaging in physical and social interaction. It cannot learn the linguistic frame of reference to which words or gestures refer."

Research on the interaction between humans and AI-employing systems is nevertheless progressing all the time, and the area of possible application is expanding in tandem. One such application area is using computational models to improve user interfaces, a subject that Oulasvirta鈥檚 research group has been studying.

For example, a user's browsing history can be used to reformat a website to a layout that feels immediately familiar to the user.

"It is possible to create a more pleasant browsing experience in this way. Headers, for example, could almost always be found in the same spot."
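
In its simplest form, the idea can be sketched as reordering a page's sections by how often this user has visited them; the section names and history below are invented.

```python
# Reorder page sections so the ones this user visits most come first.
browsing_history = ["news", "sports", "news", "weather", "news", "sports"]

sections = ["weather", "culture", "sports", "news"]
visits = {s: browsing_history.count(s) for s in sections}

# Most-visited sections first: a layout that feels immediately familiar.
layout = sorted(sections, key=lambda s: -visits[s])
print(layout)  # ['news', 'sports', 'weather', 'culture']
```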

Fresh research subjects have also been discovered within an activity as mundane as inputting text. Coupling cognitive science, a field that studies phenomena related to perception, learning and memory, with AI enables the building of models that accurately predict how a person's individual characteristics affect, for example, writing on a smartphone display.

When such models are connected to a machine optimiser that simulates alternatives, the user interface can be tailored to suit a specific user. This process has, for example, identified smartphone solutions for older people who suffer from shaky hands.
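
A toy version of this model-plus-optimiser loop: a simple Gaussian pointing model predicts how often a tap misses a key of a given size under hand tremor, and the optimiser simulates the candidate sizes and keeps the cheapest trade-off. The tremor figure, cost weights and candidate sizes are all invented for illustration.

```python
import math

TREMOR_MM = 2.5            # assumed hand-tremor spread for this user (mm)

def miss_rate(key_mm, tremor_mm):
    """Chance a Gaussian-tremor tap lands outside a key of this width."""
    z = (key_mm / 2) / tremor_mm
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def typing_cost(key_mm):
    # Bigger keys crowd the screen; smaller keys cause more mistypes.
    # The weights are invented for this sketch.
    return miss_rate(key_mm, TREMOR_MM) * 10.0 + key_mm * 0.1

# The optimiser: simulate the alternatives, keep the best one.
candidates = [4, 6, 8, 10, 12]                 # key widths in mm
best = min(candidates, key=typing_cost)
print(best, round(miss_rate(best, TREMOR_MM), 3))
```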

Oulasvirta's team has also created a new layout for French computer keyboards, which was recently approved by the standardisation authority of France.

"All of France will be typing special characters in a way determined with the aid of our optimiser," he says.

A research project dealing with the modelling of emotions is also ongoing.

"In the final analysis, the field of AI deals with presenting human matters computationally," Oulasvirta says.

He points out that the familiar journey planner found on smartphones is also based on AI, even though most users would not think of it as an AI application. Oulasvirta's view on the matter is, however, clear.

"Whenever some intellectual capacity can be realised computationally, it, in my opinion, represents AI."

Text: Panu Räty. Illustration: Ida-Maria Wikström.

This article was published in the  (issuu.com) issue of October 2018.

Illustration: phone microchips, neurons and brain tissue. Illustrator: Ida-Maria Wikström.
