Brit neural net pioneer just revolutionised speech recognition all over again

One of the pioneers of making what's called "machine learning" work in the real world is on the comeback trail. At Cambridge University's Computer Science department in the 1990s, Dr Tony Robinson taught a generation of students who subsequently turned the Fens into the world speech recognition centre. (Microsoft, Amazon and …

  1. Alister
    Coat

    I preferred his work on Blackadder...

    1. Pen-y-gors

      I bet he's sick of people saying "What a cunning plan my Lord" - but he does seem to have dreamt up some plans so cunning you could stick a tail on them etc.

  2. Mage Silver badge
    Paris Hilton

    Good

    Basically Amazon, Apple, Google, Microsoft are using 10 to 20 year old technology (envisaged 30 years ago) made accessible via gadgets.

    It's great to read about some progress.

    There is more work to do in terms of parsing of phrases and context so as to not just have fairly dumb (but speaker independent) Speech to text on an existing search engine. However that starts to move into the edge of real AI.

    Compare using Google translate (essentially Rosetta stone + brute force), to your OWN language when it's a subject you are familiar with compared to trying to explain something to another language user (especially NOT English) who doesn't understand the subject. You'll realise that current speech based real time translation (needs voice recognition and then text translate) is mostly hype.

    There is decent speech synthesis, but 2010 Kindle DXG is barely better than 1980s, and decent speech synthesis, like recognition, ultimately needs phrase / sentence parsing though for a different reason (intonation and timing which is not in written dialogue or narration and lead pipe vs lead on a dog, or polish wax vs Polish person etc). Spoken languages are not identical to written, certainly in English, where even written dialogue is nothing like real speech.

    Icon of someone trying to understand.

    1. Anonymous Coward
      Anonymous Coward

      Re: Good

      The deep neural nets in use today stretch back to the deep dark history of 2012; Krizhevsky et al, as envisaged by Hinton in 2006. They're not *that* old.

  3. David Roberts

    Sounds similar to the way we work.

    Context and hidden cues are important.

    One fine example is Peter Kay who does part of his act telling you an alternative lyric to a song then lip syncing to the track. You hear the alternative lyric because your mind and eyes have been given misleading additional information.

    Which leads me to conclude that a fine test for this kind of software would be the accurate transcription of pop song lyrics.

    1. Anonymous Coward
      Anonymous Coward

      Re: Sounds similar to the way we work.

      Ah. 'Bunny's too tight to mention' ?

      https://www.youtube.com/watch?v=f95vD0EdyXg

      1. Anonymous Coward
        Anonymous Coward

        Re: Sounds similar to the way we work.

        "I know what I want, and I want it now -

        I want you, 'cos I'm Mr Bean."

    2. Lotaresco

      Re: Sounds similar to the way we work.

      "You had a temper like my jellied eels:

      Too hot, too greasy."

      1. CustardGannet

        Re: Sounds similar to the way we work.

        "Beelzebub has a devil for a sideboard..."

  4. John Smith 19 Gold badge
    Unhappy

    All human languages are understandable by neural networks, as that's what humans use.

    Logical when you think about it, but something a lot of researchers have forgotten.

    I think the clever parts are a) Leveraging phonemes (i.e. the building blocks of all words in all languages) in deep learning and b) Making it run on a phone's processor (I suspect a lot of chewing on existing language samples).

    Now the bad news.

    1) Connected speaker independent voice recognition in near real time is very interesting to groups who want to spy on large numbers of people simultaneously. As a UK national, Dr Robinson can be "deputized".

    2) Experience with spinvox is concerning. It sounded too good to be true. It was.

    1. Mage Silver badge
      Coffee/keyboard

      Re: All human languages are understandable by neural networks, as that's what humans use.

      Rubbish.

      Because "Computer Neural Networks" have little in common with biological neural "Networks" in brains. It's terminology inspired by biology.

      1. John Smith 19 Gold badge
        Unhappy

        It's terminology inspired by biology.

        Well it's also architecture inspired by biology.

        It's true single layer NN were very limited, allowing Marvin Minsky to write a book killing funding for them for decades, but the fact remains you're reading this entirely through multiple layers of cellular processing.

        It's the only architecture we know can produce behaviour that humans class as "intelligent" over the full spectrum of what humans think of as "intelligent."

    2. Lotaresco

      Re: All human languages are understandable by neural networks, as that's what humans use.

      "Leveraging phonemes"

      Go wash your mouth out. There's no need for the bullshit bingo here.

  5. Tronald Dump

    Here's to you Mr Robinson

  6. Craig 2
    Trollface

    "Icelandic has only about 400,000 speakers, and they're worried that their language will die out. "

    I sympathize with small nations losing their lingual identity (Except Wales*) but wouldn't it be great if everyone in the world spoke the same language? A naive thought I know, it seems difficult in my house just to get teenagers to speak the same language..

    *Just printing all those dual-language road signs must cost, and are they REALLY needed?

    1. Gordon JC Pearce

      Absolutely not! It's the same in the north of Scotland too with signs printed in English and Gaelic.

      We could just leave the English (mis-)translations off the signs. Tough luck for the English-speakers who want to move here but can't even be bothered to learn to speak the language.

    2. Hollerithevo

      It's the complexity we need

      We've lost hundreds of thousands of languages and can't therefore measure the ways of thinking they represent. I remember learning that Gaelic didn't have words for the same colours as English, they had ones for blue-greens and grey-blues that we don't have. Even English has lost many precise words -- a good half-dozen simply for parts of a wheel. As we simplify and homogenise we end up with 'stuff' and 'thing' and other generalisations that lessen exactitude and the joy of language.

      So I don't want one world language, just as I don't want one form of music and one kind of car.

      1. Peter X

        Re: It's the complexity we need

        <quote>I remember learning that Gaelic didn't have words for the same colours as English, they had ones for blue-greens and grey-blues that we don't have.</quote>

        I seem to recall seeing a programme on TV in the last... year or three... about somewhere foreign* (even more exotic than Scotland), where they also had names for colours that we* would consider mere shades. To their way of thinking, those colours were utterly distinct. The opposite was also true, so (I can't remember the colours in question) there was this funny thing where they'd ask them to spot the difference between one colour and another, and they honestly struggled.

        So it's interesting how language affects how individuals perceive the world. It's also probably a reason why I *should* learn at least one other language... I won't though! ;-)

        * For context, I'm from England, don't speak anything but English, and anywhere outside the British Isles *is* both foreign, and probably exotic to my mind! :D

        1. John Brown (no body) Silver badge

          Re: It's the complexity we need

          "I seem to recall seeing a programme on TV in the last... year or three... about somewhere foreign* (even more exotic that Scotland), where they also had names for colours that we* would consider mere shades. To their way of thinking, those colours were utterly distinct."

          I have that same vague memory. Aussie Aboriginals, or maybe a tribe somewhere in Africa. Many words for different colours of blue but none for green, or something like that.

      2. Anonymous Coward
        Anonymous Coward

        Re: It's the complexity we need

        It's quite scary just how quickly words are being lost due to homogenisation. There was an article on the BBC News website a few weeks ago and even something as simple as 'splinter' had 10 different words regionally throughout the UK in the 1950s and now we're down to just two with 'splinter' in most of the UK and 'spelk' hanging on in the North East of England.

        http://www.bbc.co.uk/news/uk-england-cambridgeshire-36388364

        To paraphrase: All those words will be lost in time, like tears in rain. It's been happening since language first evolved and will continue to happen, it's a natural thing but at least we have the technology and knowledge to archive as much as possible now.

    3. John Smith 19 Gold badge
      Unhappy

      "I sympathize with small nations losing their lingual identity (Except Wales*) "

      Interesting question would be if this processing is so (relatively) cheap have they done Welsh and Gaelic?

      They are both a lot closer to home than Iceland, and not quite as cold.

  7. DagD

    The new bio metric?

    By the time they work out all the finite identifiers that make each voice unique, the bad guys will have adopted the technology to "fake" other people's voices by sampling a voice pattern then generating a speech algorithm.

    1. Count Ludwig

      Re: The new bio metric?

      But that doesn't matter because... repeat after me...

      A voice-print is a username, not a password.

      A voice-print is a username, not a password.

      A voice-print is a username, not a password.

      ...

      and no-one would be silly enough to use it to authenticate anything, would they?

  8. Lotaresco

    Faster speech recognition

    It's all about Time, team.

  9. allthecoolshortnamesweretaken
  10. Anonymous Coward
    Anonymous Coward

    Markov models

    FTA:

    "Hidden Markov models have this really weird assumption in them that all of that history didn't matter."

    Umm, Markov models work based on the probability of B (or C or D) following A. Using the history of the particular data in question to work out a probability tree is exactly how they function. Or am I missing something here?

    1. The Stormcrow

      Re: Markov models

      Hidden Markov models assume the system being modeled is a Markov process - i.e., it is memoryless: the next state depends only on the current state.

      They use collected data to figure out how likely you are to go from A to B, C, or D but ignore the path you took to reach A. The probability of transitioning from A to B is the same regardless of whether you transitioned to A from C or from D. That is the "history" being referenced.
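For anyone who wants to see that "ignore the path you took to reach A" point in code, here is a minimal sketch of a plain first-order Markov chain (not a full hidden Markov model, and not anything taken from the article). The function name `fit_first_order` and the toy sequence are made up for illustration.

```python
from collections import Counter, defaultdict


def fit_first_order(sequence):
    """Estimate P(next | current) by counting adjacent pairs of states."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    # Normalise each row of counts into a probability distribution.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical toy data: A is reached sometimes from C and sometimes from D.
observed = list("CABDABCABDAC")
model = fit_first_order(observed)

# One distribution per current state; how we arrived at A plays no part.
print(model["A"])
```

The model keeps a single row for A, so the estimate for A to B is the same whether the previous state was C or D. That is the "history" the quote says gets thrown away.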

      1. Anonymous Coward
        Anonymous Coward

        Re: Markov models

        "They use collected data to figure out how likely you are to go from A to B, C, or D but ignore the path you took to reach A"

        There's nothing preventing someone writing a markov model algorithm that takes account of more than one hop: eg probability of going from A to D via B or C. Sure, the tree would start to expand exponentially but it is possible and you wouldn't need to store that many hops to approach decent prediction.

        1. ivan_llaisdy

          Re: Markov models

          "There's nothing preventing someone writing a markov model algorithm that takes account of more than one hop" People do do this, but such a model is no longer a Markov model. It wouldn't be a model of a Markov process. In a Markov process, future states of the process depend only on the current state.

          Models that take account of more than one hop would be more general graphical models like Bayesian networks, or neural networks.
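A rough sketch of the "more than one hop" idea discussed above, assuming we simply condition on the last `k` states instead of one. The name `fit_order_k` and the toy sequence are hypothetical. On terminology: this is commonly described as a higher-order Markov chain, and it can equally be read as a first-order chain whose states are tuples of the last `k` symbols, so whether it "still counts" as Markov depends on what you call the state.

```python
from collections import Counter, defaultdict


def fit_order_k(sequence, k=2):
    """Estimate P(next | last k states) by counting (context, next) pairs."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - k):
        context = tuple(sequence[i:i + k])  # the last k states, oldest first
        counts[context][sequence[i + k]] += 1
    return {
        context: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for context, followers in counts.items()
    }


# Same hypothetical toy data as a first-order example would use.
observed = list("CABDABCABDAC")
model2 = fit_order_k(observed, k=2)

# Now the route into A matters: compare arriving from C with arriving from D.
print(model2[("C", "A")])
print(model2[("D", "A")])
```

The number of possible contexts grows as the number of states raised to the power `k`, which is the exponential blow-up mentioned in the thread.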

  11. Stjalodbaer

    More history

    Stephen Grossberg

  12. TheElder

    Might work with some US Americans...

    I very much doubt it can work with me (Kanuck). I happen to speak Danish since I was born as well as some German, Swedish, Norsk, French, and a bit of Ruski as well as a small amount of Tagalog and some others. Not bad with tonal languages since I have true perfect pitch. When I go into the European languages I immediately go into a pan European dialect, depending on which language(s) I dream in. I can also drop my voice to Basso Profondo when I like to do some sing along.

    Also, Danish has 52 phonemes such as the swallow your tongue Glottal Stops. I can also roll my "R's" much the same as Rammstein industrial metal, one of my favorite groups.

    I wonder what the speech detector would make of a sentence like this:

    Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

  13. MT Field

    Tremendous

    But also disconcerting. It's taken years and much evolution of technology to make speech recognition with neural nets usable in a commercial sense.

    I get the feeling that this is just the start of a very big thing. Perhaps equivalent to the first steam engines of the industrial revolution. But will it really take two centuries to become ubiquitous?

  14. rshpount

    Twice as expensive as google speech API

    Google charges $0.006 per 15 sec. Speechmatics is 4p per minute. Are they really twice as good, to justify charging twice as much?
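The "twice as much" figure can be checked with a few lines of arithmetic. This is only a sketch: the two prices are the ones quoted in the comment, while the exchange rate `USD_PER_GBP` is an assumed placeholder (not from the thread), and any minimum charge or rounding of audio length is ignored.

```python
GOOGLE_USD_PER_15_SEC = 0.006      # price quoted in the comment
SPEECHMATICS_GBP_PER_MIN = 0.04    # 4p per minute, as quoted in the comment
USD_PER_GBP = 1.30                 # ASSUMPTION: illustrative exchange rate only


def google_usd_per_minute():
    """Four 15-second blocks make one minute."""
    return GOOGLE_USD_PER_15_SEC * 4


def speechmatics_usd_per_minute(usd_per_gbp=USD_PER_GBP):
    """Convert the per-minute sterling price to dollars."""
    return SPEECHMATICS_GBP_PER_MIN * usd_per_gbp


def price_ratio(usd_per_gbp=USD_PER_GBP):
    """How many times more expensive Speechmatics is per minute of audio."""
    return speechmatics_usd_per_minute(usd_per_gbp) / google_usd_per_minute()


print(f"Google:       ${google_usd_per_minute():.3f} per minute")
print(f"Speechmatics: ${speechmatics_usd_per_minute():.3f} per minute")
print(f"Ratio:        {price_ratio():.2f}x")
```

At the assumed rate the ratio comes out a little over two, so "twice as expensive" is about right; plug in a different rate to see how far it moves.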
