Mozilla releases voice dataset and transcription engine

Mozilla has revealed an open speech dataset and a TensorFlow-based transcription engine. Mozilla floated "Project Common Voice" back in July 2017, when it called for volunteers to either submit samples of their speech or check machine transcriptions of others' utterances. The project has since collected 500 hours of samples (in …

  1. John Smith 19 Gold badge
    Coat

    If it can do transcription of more than 1 sec in 1 sec of processing it'd be doing alright.

    But can it do 1 sec in 1 second of 2 people from opposite sides of the NI/IR border talking English at > 200 WPM?*

    *Very lightly inflected and the words seem to run together.

    1. Sebastian Brosig

      Re: If it can do transcription of more than 1 sec in 1 sec of processing it'd be doing alright.

      I say it's doing all right if _that_ doesn't make _any_ sense to the AI.

  2. CertMan
    Unhappy

    Real world data set with accents?

    As local accents die out, we're becoming, as a population, more gutter 'Essex'. Then, thanks to that uniformity, the data set can shrink and the accuracy of transcription should go up. I hope.

    "Yent nevr gunna git Rowl - Yent!" trans. "I don't think that you will ever understand the Rothwell based accent". Source: Rothwell & Desborough, Northamptonshire very local accent. (So local that even BBC Radio Northampton, based 15 miles away, took the p*ss out of it!)

    In the 1970s and 80s, when I went to the local school, it would take newly imported teachers about a month to understand us properly. The only careers advice I was ever given was "get some elocution lessons, if you want to work more than 5 miles away!"

    Sadly now a very diluted accent as the town has grown with outsiders and the youth has learnt to speak from the TV and Internet rather than talking with their parents.

  3. colinb

    6.5 per cent error rate?

    That's very good.

    Given that Microsoft was crowing about a 6.3% rate last year and the human error rate is 5.8%, with improved contextual parsing the tech is on track to better human transcription (for English).

    Why people on previous threads think neural networks have no use baffles me; they're core to all visual and audio pattern matching today.
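
    For context, the 6.5, 6.3 and 5.8 per cent figures above are word error rates. A minimal sketch of how such a number is usually computed (a standard Levenshtein-based WER in Python, purely illustrative; not Mozilla's or Microsoft's actual scoring code):

    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: (substitutions + deletions + insertions) / reference length."""
        ref = reference.split()
        hyp = hypothesis.split()
        # Word-level Levenshtein distance via dynamic programming.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,           # deletion
                              d[i][j - 1] + 1,           # insertion
                              d[i - 1][j - 1] + cost)    # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    # One substituted word in a four-word reference gives a 25% WER.
    print(wer("the cat sat down", "the cat sat damn"))  # 0.25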
