by Lance Winslow
Experiencing the modern paradigm shifts
in technology will require humans to become one with the technologies
they create and to interface with them in real time. Perhaps the
greatest step towards this goal is Voice Recognition Technology, which
lets humans communicate in the way that evolution has made natural to
them: with vocal cords and speech. Thus, they are able
to interface with the tools they have created.
Today, when you buy a new computer with the
Microsoft Windows Vista operating system pre-loaded, it comes with Windows
Speech Recognition. Even if you have an older computer, there are
many good Voice Recognition products available, such as Nuance's Dragon
NaturallySpeaking. This article is being written using
version 8.1; soon I will upgrade to version 9.1, with accuracy said to
reach 99%. This is a huge improvement over my first
try at speech recognition software, IBM VoiceType Dictation 3.0, which I
purchased back in 1995, twelve years ago.
Over the past couple of years, I have
indeed worn off the letters on three different laptop keyboards. Perhaps
I do not trim my fingernails as often as I should, or perhaps it has
something to do with the fact that I write 4,000 to 14,800 words a day,
pounding out articles on those plastic keys. Either way, for me Voice
Recognition Software ranks among the greatest inventions of
mankind. Technologies that increase productivity and efficiency are the
most significant, and at this point I give thanks to Ray Kurzweil for
his contributions to Voice Recognition.
Voice Recognition Software has enjoyed good
R and D expenditures, and the number of applications that entrepreneurs
are finding for this technology has also fueled that fire going
forward. You only need to read through a few issues of Speech Technology
Magazine to get an idea of how fast things are moving right now. There
are rapid advances and new uses of this software in nearly every major
industry:
· Transportation
· Communication
· Energy
· Education
· Military
· Mining
· Manufacturing
· Policing
· Prisons
· Courts
· Construction
· Disaster Relief
· Space
These industries use Voice
Recognition Tools for Corporate Customer Relations, Training, Design,
Project Management, Public Relations, Advertising, Word Processing, Data
Mining, Writing, Translation, Recording and Machine Interface, to
name just a few. These applications of Voice Recognition have
improved efficiency and saved time, and that translates, at least for the
corporations that employ the technology, into quarterly profits and
improved shareholders' equity. It has all been very well received.
No, it has not been perfect; yes, there are
still kinks to work out. There are many different dialects, regional
variations of accents, and countless languages, some rather obscure.
Indeed, there are shortages of top-notch specialists in the field.
But Voice Recognition is crossing the digital divide and preventing
unnecessary political impasse. That means fewer conflicts, fewer wars
and a safer world as well. Can one technology really do all that? Yes,
the United Nations is also employing these tools for that very important
common cause of humanity - peace.
Corporations are now using CRM Voice
Software that can pick up customer intent, emotions and feelings just
by studying the patterns in the voice, the pauses between words, and the
inflection of the words spoken. This means that someone talking
to a call center computer is more completely understood, and the company can
derive more information about satisfaction levels and customer service and
improve the quality of its products and services faster. Politicians
are also starting to use this for constituents who call in. Of course,
these technologies have great value to the military for threat
assessment as well, protecting human civilizations from terrorism.
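To make the "pauses between words" idea concrete, here is a minimal sketch in Python. It is not how any commercial CRM product works. It assumes a recognizer has already produced a start and end time for each word, and the function names and the half-second threshold are hypothetical choices made only for illustration.

```python
def pauses_between_words(words):
    """Return the silent gap, in seconds, between each pair of adjacent words.

    `words` is a list of (text, start_seconds, end_seconds) tuples in spoken
    order. This timestamp format is an assumption, not any product's output.
    """
    return [round(nxt[1] - cur[2], 3) for cur, nxt in zip(words, words[1:])]


def hesitation_score(words, long_pause=0.5):
    """Fraction of gaps at least `long_pause` seconds long (hypothetical metric)."""
    gaps = pauses_between_words(words)
    if not gaps:
        return 0.0
    return sum(1 for gap in gaps if gap >= long_pause) / len(gaps)


# Made-up timings for a caller saying "I want a refund" with one long pause.
call = [("I", 0.0, 0.2), ("want", 0.3, 0.6), ("a", 1.4, 1.5), ("refund", 1.6, 2.1)]
gaps = pauses_between_words(call)
score = hesitation_score(call)
```

A real system would combine a measure like this with inflection and many other signals; a single pause ratio says very little on its own.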
The voice recognition software programmers
and engineers have devised ways to give each word, phrase and sentence
values of intensity, volume, and variation, but obviously, you can see
how quickly that can complicate itself; it is not an easy task. However, in
doing so voice recognition can also get very close to identifying the
unique speaker. This has applications as well for training the software
for accuracy amongst users, switching users without being instructed and
bettering accuracy with unknown users. Each year voice recognition has
improved - the future of voice recognition is here.
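The idea of giving each word values of intensity, volume and variation, and then using those values to guess the speaker, can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the method any real product uses: the three measures (RMS energy, peak amplitude, standard deviation), the function names and the sample numbers are all invented for the example.

```python
import math
import statistics


def word_features(samples):
    """Return (intensity, volume, variation) for one word's audio samples."""
    intensity = math.sqrt(sum(s * s for s in samples) / len(samples))  # RMS energy
    volume = max(abs(s) for s in samples)                              # peak amplitude
    variation = statistics.pstdev(samples)                             # spread of the signal
    return (intensity, volume, variation)


def speaker_profile(words):
    """Average the per-word feature tuples into one profile for a speaker."""
    features = [word_features(w) for w in words]
    return tuple(statistics.fmean(column) for column in zip(*features))


def closest_speaker(profiles, words):
    """Name of the enrolled profile nearest (Euclidean distance) to new words."""
    probe = speaker_profile(words)
    return min(profiles, key=lambda name: math.dist(profiles[name], probe))


# Made-up samples: each inner list stands in for one spoken word.
quiet = [[0.1, -0.1, 0.1, -0.1], [0.2, -0.2, 0.1, -0.1]]
loud = [[0.8, -0.9, 0.7, -0.8], [0.9, -0.7, 0.8, -0.9]]
profiles = {
    "quiet_speaker": speaker_profile(quiet),
    "loud_speaker": speaker_profile(loud),
}
guess = closest_speaker(profiles, [[0.85, -0.8, 0.75, -0.9]])
```

Three numbers per word are far too few to tell real voices apart, which is part of why the task "complicates itself" so quickly, but the same enroll-then-compare pattern is what would let software switch users without being instructed.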
Voice Recognition software is slowly
replacing court reporters, a profession with huge staffing shortages in
many areas; some courts are now using Voice Recognition instead. Police
Departments and US Troops in Iraq are using Tablet PCs and PDAs that can
interpret what is said and deliver that sentence in voice to the other
party in their language. When the other party replies, their words are
translated back into English and written on the screen.
Voice Recognition lends itself well to
One-on-One instruction for students, using Avatars to help with teacher
shortages, help special kids and assist in computer training. Corporate
Training is a big application as well, allowing complete interface
with the trainee or student, which is perfect for educational purposes.
And these are only some of the applications that are currently in use or
in development right now. The question is: what lies in store for the
future? Let's look five years out, see the road ahead and ask ourselves
what else we see just beyond the horizon.
It is time to start discussing some of the
potential future applications or 'killer apps' that are now possible or
will be shortly, along with the issues of funding the research going forward. We
must consider how other complementary Artificial Intelligence advances
will hyper-advance Voice Recognition performance. If you have ideas or
concepts for the future of Voice Recognition, we should talk. Below are
just a few of the concepts that come to my mind while thinking on this
subject today; perhaps you have some too. Indeed, there really is no end
to the potential, and we are only limited by our imaginations.
Future Advances for Voice Recognition:
1. Body Language + Facial Expression + Voice Recognition:
Currently, there are robotic android
projects in the works in Japan and in the US; facial expression, or
mirroring, is very popular. The goal is for the human who interfaces
with the system to create an emotional bond with the machine. Voice
Recognition systems that also read body language and facial expression
can be used for threat assessment at, let's say, airports and border
crossings, and could replace human workers at those locations or choke points.
If you smile at a robotic android and it
smiles back at you while you are having a conversation, this ups the
emotional value of the conversation to the human. Perhaps the system
might start complimenting you. If you are persnickety with the system,
maybe it will mirror those responses, reciprocate with an angry response or
work to defuse the situation. Of course, it all depends on its
programming, but you can see the advances, potential applications and
the trends going forward.
If you will recall HAL, the famous science
fiction computer, it said: "I sense hostility in your voice Dave."
Perhaps because this once appeared in a work of science fiction, human scientists
today are trying to make it so. Right now, we are there; we have this
technology. CRM Voice Recognition Software can sense emotion,
hesitation, aggression, hostility, anger, etc. So, within five years we
will see these features in more and more applications.
Haptics is another field of science that
lends itself well to the merging of Voice Recognition and Facial Feature
emotional recognition. Perhaps robots of the future will look like
humans and mimic their characteristics. A robot that feels a strong
handshake and firm grip, along with the self-confident voice of an
individual with an earned ego, might elevate the trust factor a notch or
two.
Consider sizing up confidence in an individual's
ability to perform: will voice recognition software combined with these
other technologies replace corporate Human Resource Directors, Project
Managers, Middle Managers and CEOs too? Folks are already thinking here,
10-15 years out, but not without ruffling a few feathers. Being replaced
by a computer, robot or system has caused many a conflict in the past,
so the plot thickens and more barriers are foreseen.
2. Emulation of Emotion and Empathy:
Emulating emotion and empathy is on its way
right now. Currently, most consultants on artificial intelligence
customer response systems for 'call centers' advise that if the voice on
the other end is coming from a machine, it should be easily identifiable by
the human calling in as a computer system with voice recognition
features, because humans do not like to be tricked; when they find out,
it makes them upset. Of course, with the advent of emotion emulation or
empathy, such a deception is possible, and we have the ability to do this now.
Indeed, artificial intelligence computers
have been used to go online and participate in forums, and can
participate for 15 threads or more without detection. In voice
recognition, if the voice sounds legitimate, a full conversation can go
on for a while without the human realizing it is talking to a machine.
With a call center system handling a
complaint, a computer system might side with the customer, listen to
them and even say:
"I know how you feel, I am so sorry this
has happened, let me see what I can do" or
"Yes, I understand, this is very urgent,
let me have you talk to my supervisor,"
then pass the customer off to a real human
or perhaps another voice system with a more authoritative voice. The
customer on the other line may never know that they are talking to a
computer or computers. Indeed, this does not sit too well with many in
the industry, but it is a place where software professionals in voice
recognition are thinking and discussing now. Surely you can see
applications for this.
For instance, think of how well such
applications lend themselves to Crisis Hotlines, Online Self-Help
Websites or even Computer Systems to assist At Risk Kids. What about the
Catholic Church: confess to an AI Priest, and keep your secret safe (just
kidding)? But who knows what applications folks might come up with for
these advances in emulation of emotion and empathy?
3. Understanding a Joke and Responding with Another One:
Artificial Intelligence is getting better
all the time. Soon, AI software engineers will create joke recognition
systems, where the computer will understand irony and know when the
human is telling a joke, then reciprocate with a joke of its own,
perhaps creating a joke from scratch. The system would be pre-loaded
with all the jokes common to human interaction in all cultures. It will
be able to pick one that has most likely not been heard by the human
it is working with at the time, and also put in memory that the joke has been
told to that individual so it does not repeat it.
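The bookkeeping half of that idea, picking a joke the listener has not heard and remembering that it was told, is the easy part, and can be sketched in a few lines of Python. The class name, method name and jokes below are hypothetical, and the sketch deliberately leaves out the hard part, which is understanding irony or inventing a joke from scratch.

```python
class JokeTeller:
    """Toy joke picker that never repeats a joke to the same listener."""

    def __init__(self, jokes):
        self.jokes = list(jokes)  # the "pre-loaded" joke collection
        self.told = {}            # listener name -> set of joke indices already told

    def next_joke(self, listener):
        """Return a joke this listener has not heard yet, or None if none are left."""
        heard = self.told.setdefault(listener, set())
        for index, joke in enumerate(self.jokes):
            if index not in heard:
                heard.add(index)  # put in memory that this listener has heard it
                return joke
        return None


teller = JokeTeller([
    "I told my computer a joke, but it did not get it.",
    "My speech software and I are no longer on speaking terms.",
])
first = teller.next_joke("Dave")
second = teller.next_joke("Dave")
third = teller.next_joke("Dave")      # Dave has now heard everything
for_frank = teller.next_joke("Frank")  # a new listener starts from the top
```

A real system would also have to estimate which jokes the listener is likely to have heard elsewhere, which a simple per-listener memory like this cannot do.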
Wow, this is getting complicated fast, isn't
it? This is exactly why it has not been fully achieved. Humor has
been a huge stumbling block for human voice recognition and artificial
intelligence systems, yet it is something that humans have a knack for.
Still, they are working on this challenge, and within 5 to
10 years the AI software folks will have that problem licked.
This will mean advances in companions
for humans on long-term space flights, help with rehabilitation, and easing
the tension of humans working alongside robotic partners or assistants
as the transition to robot and human workers takes place. Since robots
will be working with and assisting humans, it will be necessary to keep
the peace to foster cooperation.
4. Vocal Cord Vibration Recognition + Current Voice Recognition:
Currently, there is advanced research in
the US Military that allows vocal cords to be read without actual
speech or voice; these systems are working now. This is done with a
device near the larynx that picks up subtle vibrations and is
coupled to a transmitter for sending. The receiver, another special
forces member, has a tiny earpiece so they can hear that speech, which is
silent to anyone nearby, even within six inches of those using the system.
This is getting pretty close to mimicking thought transfer, but in
essence it is a form of voice recognition hooked to a communication
device.
These systems will get much better, and soon
secret service members, special forces and SWAT teams will no longer
have little cords coming out of their ears; they will communicate
without notice. The larynx vibration speech recognition device
might be mounted inside a "clip tie" and
no one will be the wiser. There are many applications for this if you
think on it.
Applications:
Today, in considering this question, I
wrote down some potential industries and uses where these technologies
will be needed and desired, which would warrant R and D expenditures.
Some of these ideas are borrowed from general knowledge, articles,
papers and/or think tank conversations; still others are off the top of my
head. These are merely the natural progression and evolution of
voice recognition. The bigger question is where YOU see the future of voice
recognition - What say you?
CAD Design Assistant
Cell Phone w/Interactive Features
Communication with Dolphins
Death of TV Remote Control
eGovernment Interactivity with the eCitizen
FAA Control
Flight Controls
Interactive Internet Searching
Interactive Online Books
Interactive Shopping Carts
Intercepting Terrorist Communications
Rehabilitation Companion
Robotic Space Arm Control
Telephone and Kiosk Ordering Systems
UAV Voice Control
Video Game Interface
Virtual Reality Voice Recognition Entertainment
Wrist Watch All-in-one PDA, Cell Phone, Video Phone, Music System w/ no buttons
Self-Driving Car Interface
Sources:
1. Microsoft Vista Operating System w/Voice Recognition:
http://www.microsoft.com/enable/products/windowsvista/speech.aspx
2. Dragon Natural Speaking 9.1 Voice Recognition Software:
http://www.nuance.com/naturallyspeaking/
3. IBM VoiceType Dictation 3.0 for Windows 95 and the New IBM ViaVoice:
http://www2.edc.org/NCIP/vr/VR_VoiceType.html
http://www-306.ibm.com/software/pervasive/embedded_viavoice/
4. Other Competing Voice Recognition Software for Word Processing:
http://www.consumersearch.com/www/software/voice-recognition-software/comparison.html
5. Honoring Kurzweil's contributions to Voice Recognition:
http://www.nfb.org/Images/nfb/Publications/bm/bm00/bm0003/bm000311.htm
6. Speech Technology Magazine, articles, white papers, research links:
http://www.speechtechmag.com/
http://www.speechtechmag.com/Archives/Default.aspx?ContextSubtypeID=133
7. Challenges of Voice Recognition:
http://users.ece.gatech.edu/~chl/ngasr03/chair-rabiner.pdf
http://www.msri.org/publications/ln/hosted/nas/2002/rabiner/1/index.html
http://www.nist.gov/speech/test_beds/mr_proj/
http://www.clsp.jhu.edu/seminars/abstracts/F1999/juang.html
http://www.ewh.ieee.org/r10/bangalore/sps/html/spl/2007spl02.htm
http://ling.uta.edu/~laurel/NYTmachine-prose.pdf
8. Voice Recognition for Military:
http://www.usatoday.com/tech/news/techinnovations/2007-04-02-ibm-donation_N.htm
http://www.stormingmedia.us/11/1170/A117034.html
9. Voice Recognition PDA Translation Device:
http://www.cs.cmu.edu/~awb/papers/eurospeech2003/speechalator.pdf
http://www.wired.com/science/discoveries/news/2005/11/69537
10. Challenges Voice Recognition in Court Reporting Open Dialogue:
http://www.robson.org/gary/writing/cr-speechrecognition.html