MY NEW, LOWER, REVISED, EXCELLENT RATES FOR 2012: Bronze, Gold, or Platinum--Choose Your Level!
The accuracy of computer speech recognition flat-lined in 2001, before reaching human levels. The funding plug was pulled, but no funeral, no text-to-speech eulogy followed. Words never meant very much to computers—which made them ten times more error-prone than humans. Humans expected that computer understanding of language would lead to artificially intelligent machines, inevitably and quickly. But the mispredicted words of speech recognition have rewritten that narrative. We just haven’t recognized it yet.
After a long gestation period in academia, speech recognition bore twins in 1982: the suggestively-named Kurzweil Applied Intelligence and sibling rival Dragon Systems. Kurzweil’s software, by age three, could understand all of a thousand words—but only when spoken one painstakingly-articulated word at a time. Two years later, in 1987, the computer’s lexicon reached 20,000 words, entering the realm of human vocabularies which range from 10,000 to 150,000 words. But recognition accuracy was horrific: 90% wrong in 1993. Another two years, however, and the error rate pushed below 50%. More importantly, Dragon Systems unveiled its Naturally Speaking software in 1997 which recognized normal human speech. Years of talking to the computer like a speech therapist seemingly paid off.
However, the core language machinery that crushed sounds into words actually dated to the 1950s and ‘60s and had not changed. Progress mainly came from freakishly faster computers and a burgeoning profusion of digital text.
Speech recognizers make educated guesses at what is being said. They play the odds. For example, the phrase “serve as the inspiration,” is ten times more likely than “serve as the installation,” which sounds similar. Such statistical models become more precise given more data. Helpfully, the digital word supply leapt from essentially zero to about a million words in the 1980s when a body of literary text called the Brown Corpus became available. Millions turned to billions as the Internet grew in the 1990s. Inevitably, Google published a trillion-word corpus in 2006. Speech recognition accuracy, borne aloft by exponential trends in text and transistors, rose skyward. But it couldn’t reach human heights.
Source: National Institute of Standards and Technology Benchmark Test History (borrowed from: http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition)
Quick Summary: Don't record more than 30 minutes at a time--it will be too cumbersome a document to read otherwise, unless you plan to create several chapters in an e-book or it is part of a larger body of work.
Take a Moment to Sound off:
Online Surveys powered by SurveyGizmo