Want a medal? Microsoft 7.2% less bad at speech recognition than IBM

In a machine learning tug-of-war, Microsoft may have just barely slipped ahead of IBM for speech transcription accuracy. Researchers are studying how to recognise human speech in a variety of settings – from real-time interactions to offline, pre-recorded voicemails. Boffins tell us that one application, particularly of offline …

  1. Anonymous Coward
    Boffin

    She said what?

    My company has been trialling putting corporate videos on Microsoft Stream. The accuracy of automatic transcription is astonishing, making video content searchable by word as well as metadata.

  2. imanidiot Silver badge

    I'm not holding my breath

    Error rates have been improving, but that doesn't really mean speech recognition will get GOOD any time soon. Humans tend to be able to gloss over a lot of misheard words and auto-correct from context. Automated recognition systems too often get this wrong.

  3. Pen-y-gors

    What sort of errors?

    It would be interesting to know what sort of errors they're getting, and what the humans get.

    It's very different whether the software interprets 'Trump' as 'rumpy-pumpy' or merely confuses heel, heal, he'll etc.

    Accents are a big problem - think strong Derry accent, where 'now' sounds like 'nigh'. Heaven knows how you train software to recognise that speaker A is from Glasgow and speaker B is from New Jersey.

    1. Mage Silver badge
      Coat

      Re: speaker A is from Glasgow and speaker B is from New Jersey.

      Context and engaging in conversation work best. Speech recognition / "AI" is poor at context and rubbish at "real" conversation.

      It's more brute force using a giant database.

      "Kin yer mammy sew?"

      Very different meaning in parts of Glasgow. Or it was 40 years ago.

    2. Teiwaz

      Re: What sort of errors?

      strong Derry accent 'now' sounds like 'nigh'.

      It's more a 'nai[gh]' (the strength of the 'gh' is dependent on emotion, alcohol intake and how long it's been raining).

  4. The Jon

    Outlook Voice Recognition

    Here's the last line (personally transcribed) from a voicemail received recently:

    I'll drop you an email in this respect as well and hopefully we'll catch up before Thursday. Take care, bye bye.

    And this is what Outlook automatically transcribed it as (see if you can spot the difference):

    I'll drop you an email in this respect as well and hopefully will touch before fisted take care bye bye.

    1. ratfox
      Trollface

      Re: Outlook Voice Recognition

      I heard that those automatic systems take into consideration the interests of the user, as extracted from their browser history, etc.

      1. Solarflare

        Re: Outlook Voice Recognition

        Personally, I would want you to buy me a coffee before any touchy-fisty business, but then I'm old-fashioned like that, I suppose.

    2. kain preacher

      Re: Outlook Voice Recognition

      So it knows what you really do on Thursdays. Man, you should clean out your emails from your domme.

  5. Anonymous Coward
    Thumb Down

    Every time IBM crows about something good they've done, I just remember WebSphere and Lotus Notes.

    1. Lord Elpuss Silver badge

      TBH WebSphere was (is?) pretty good. The issues arose due to the sheer spellbinding complexity - often requiring a dedicated team to spend days installing it. For me, the killer was the fact that every WS installation was a custom job, and only ever worked well for one specific environment. Migration, upgrade, patching or even just breathing hard in the server room could often necessitate a custom re-install.

      1. kain preacher

        WebSphere is used to torture the support team. Lotus Notes is to torture everyone. Combine the two and you get SharePoint. Wonder what happens if you combine SharePoint with Clippy, or better yet SharePoint running on WebSphere with Lotus Notes?

  6. PNGuinn
    Trollface

    Obligatory

    I'm so depressed ... I've got this pain ....

  7. Tony W

    Accents

    Humans learn to adjust for accents, which mostly make consistent changes to vowels, so I expect AI will eventually do the same. Dialect is another matter; it took me a year to understand broad Potteries.

    But I wish some of this expertise could be applied to the systems used on the phone by organisations like BT and British Gas. Both these systems (made by the same company, I'm sure) consistently fail to recognise when I say "yes", even on repeated attempts, and my accent is pretty ordinary London. Maybe I should say "Yep", "Yup" or "Yeah".

    1. Paul Herber Silver badge

      Re: Accents

      @TonyW

      Absolutely

  8. Nick Kew

    Nothing new here

    All the subjects mentioned here in the comments - unclear speech, stumbles, hesitation, um, ah, accents, intoxication, sloppy figures of speech, the cocktail-party problem - are precisely what makes speech recognition hard.

    And they were what made it hard when I worked in the field, back in the early 1990s. Commentards are identifying issues the researchers have been wrestling with for decades. Something has clearly improved since then, and I don't think it's *just* the march of hardware (Moore's law, etc), though certainly my project was eventually privileged to have use of a supercomputer with tens of megabytes of RAM.

    Sadly, performance measurement - the accuracy of speech recognition - is still based on some very suspect measures. The figures MS or IBM, or Apple or Google, report will be for very specific tasks that have limited value in measuring the real world. Accuracy measures are quite pernicious, in that they naturally favour a system that uses the same unit of classification as the test set.

    So in my day, our system, which worked with syllables as its primary unit, couldn't be meaningfully compared with the majority that used phonemes. And among the latter, different teams used different phoneme sets, thus setting themselves widely different tasks. A system that just classifies vowel-vs-consonant has a much easier task than one that classifies 80 different sounds, but you have to dig deeper into the results than this article does to tell that the former's 90% might be less rather than more impressive than the latter's 80%.

    I did try to push the information-theoretic measure of entropy as offering better cross-system comparisons, but they seem instead to have gone for defined tasks. Like, erm, testing diesel emissions. Or kids working to an exam. Or ...
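
    For the curious, here's a rough Python sketch of what I mean by an information-theoretic comparison: a Fano-style lower bound on the bits each decision conveys, assuming uniform class priors. The two example systems are the hypothetical 2-class-at-90% and 80-class-at-80% classifiers mentioned above; the numbers are illustrative, not anyone's published results.

    import math

    def bits_per_decision(num_classes: int, accuracy: float) -> float:
        """Fano-style lower bound on the information (in bits) conveyed per
        classification decision, assuming uniform priors over the classes."""
        p_err = 1.0 - accuracy
        h_prior = math.log2(num_classes)          # entropy of the task itself
        h_err = 0.0                               # binary entropy of the error event
        if 0.0 < p_err < 1.0:
            h_err = -p_err * math.log2(p_err) - (1.0 - p_err) * math.log2(1.0 - p_err)
        spread = p_err * math.log2(num_classes - 1) if num_classes > 1 else 0.0
        return max(0.0, h_prior - h_err - spread)

    # Hypothetical systems from the paragraph above:
    print(bits_per_decision(2, 0.90))   # vowel-vs-consonant, 90% correct
    print(bits_per_decision(80, 0.80))  # 80-phoneme classifier, 80% correct
    # The 80-class system conveys several times more information per decision,
    # despite its lower headline accuracy.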

  9. Anonymous Coward
    Anonymous Coward

    It isn't just machines that have trouble

    You don't need 100% accuracy, but you need 100% accuracy in the right places. My iPhone does voicemail-to-text, and even though it isn't always 100%, I generally don't have to listen to the message because a 95% accurate (or whatever) transcription is good enough.

    For meeting notes you'd want near-perfect accuracy, especially if you want them searchable. If I'm looking for the meeting where we discussed the outcome of "project athena", but its name was transcribed as "project tina" in the meeting where we were informed it was canceled, then I'm not going to be able to get that critical information via search. Whereas if I were reading the meeting notes, or received notice of the cancellation via voicemail, I could easily infer what "project tina" really was.
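
    (To make that concrete, here's a quick, purely illustrative Python sketch; the transcript string, query and threshold are all made up. Exact search misses the mangled name, while even a crude fuzzy match over the transcript can still surface it.)

    from difflib import SequenceMatcher

    transcript = "we were informed project tina has been canceled"
    query = "project athena"

    # Exact keyword search fails: the recogniser never wrote "athena".
    print(query in transcript)                  # False

    def best_similarity(query: str, text: str) -> float:
        """Best similarity ratio between the query and any same-length
        window of words in the transcript."""
        words = text.split()
        width = len(query.split())
        windows = (" ".join(words[i:i + width])
                   for i in range(len(words) - width + 1))
        return max(SequenceMatcher(None, query, w).ratio() for w in windows)

    # The best window ("project tina") scores well above a modest
    # threshold such as 0.7, so a fuzzy search would still flag the meeting.
    print(best_similarity(query, transcript))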

    1. Nick Kew

      Re: It isn't just machines that have trouble

      If I'm looking for the meeting where we discussed the outcome of "project athena", but its name was transcribed as "project tina"

      Now that's the kind of error that's very common in human-produced notes.

      Indeed, that applies to any name (other than one clearly expected in the context), because you don't have the reference point to correct what you heard slightly ambiguously. Cold-callers know this well, hence "This is Athena from [mumble]" obscures her affiliation without her failing to tell you it - and if you hear it as Tina that's an extra bonus.

      1. Anonymous Coward
        Anonymous Coward

        Re: It isn't just machines that have trouble

        Only if the notes are taken by a different person each time, one who isn't involved in any of the work. Otherwise there's no reason why a person should make that error if project athena has been discussed at previous meetings, is mentioned in presentations given in the meeting, etc.

  10. Griffo

    Will it ever be possible?

    Subject context is another item that must make speech recognition very difficult.

    A while back, I was reviewing the official police transcript of an interview with a man who was accused of killing his wife during a scuba accident.

    Luckily the video of the same interview was available, as the number of critical mis-transcriptions was amazing. Here was something a human had transcribed from a very clear AV feed, and had seriously gotten wrong on several occasions. Why? Because the transcriber had no knowledge of scuba terms, and had transcribed what he or she thought they had heard.

    But what they thought they had heard was not shaped by any knowledge of the subject, so it was wrongly transcribed.

    I suppose a computer may one day be able to work out the subject and then apply a specific set of industry / subject terms to it (which is why, I guess, they make medical-specific transcription software), but as a human I can never follow along when my wife switches topic 13 times in one conversation, so how will a computer?
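
    (For illustration, a hypothetical Python sketch of that "apply a subject-specific term list" idea: re-rank a recogniser's top guesses by rewarding the ones that contain known domain terms. The lexicon, the scores and the bonus weighting are all invented.)

    # Tiny, made-up scuba lexicon used to bias the choice between hypotheses.
    SCUBA_TERMS = {"regulator", "octopus", "ascent", "nitrox", "weight belt", "bcd"}

    def rescore(nbest, lexicon=SCUBA_TERMS, bonus=2.0):
        """nbest is a list of (hypothesis_text, recogniser_score) pairs, higher
        scores being better; return them re-ranked with a bonus for each
        domain term a hypothesis contains."""
        def adjusted(item):
            text, score = item
            hits = sum(term in text.lower() for term in lexicon)
            return score + bonus * hits
        return sorted(nbest, key=adjusted, reverse=True)

    # Invented example: the acoustically likelier guess loses to the one
    # that makes sense in a diving interview.
    guesses = [("she dropped her octopus on the accent", 4.3),
               ("she dropped her octopus on the ascent", 4.1)]
    print(rescore(guesses)[0][0])   # -> "she dropped her octopus on the ascent"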

  11. mako23

    Not a surprise

    The Microsoft speech translator understands words like "honesty", "dignity" and, most of all, "paying a decent redundancy".

    The IBM product doesn't
