Vista Speech Recognition Screencast Demo

Eugenia Loli 2006-08-09 Windows 52 Comments

“Surprise, surprise. Windows Vista speech recognition actually works. Contrary to what MSNBC criticized as a ‘wreck’, the speech recognition technology is well developed and highly usable,” says Long Zheng.

About The Author

Eugenia Loli

Ex-programmer, ex-editor in chief at OSAlert.com, now a visual artist/filmmaker.

52 Comments

2006-08-09 5:57 am

ValiantSoul
I typically don’t say this about Microsoft products, but this is pretty cool. I like the ideas about when multiple options are available and the screen coordinate system.

Anyone know if OS X has something similar? (Never looked into it – just curious)

2006-08-09 8:23 am

REM2000
yeah the mac has the same functionality in Tiger, not sure about previous versions.

Good demo, glad to see Vista is starting to pull together, i hope Microsoft release another public beta (i know they said they will) soon, i also hope it’s a lot better than Beta 2 as i wasn’t impressed.

2006-08-09 10:28 am

Alleister
The catch with Tiger’s speech recognition is that it is only available in English. Of course that doesn’t matter if you are a native English speaker, but I’m happy that I’ll get German speech recognition in Vista, which is something Apple has no plans for. There will also be a Chinese version. I don’t know, though, what other languages are planned.

I have not tried it myself, since I won’t install beta software on my work box, but a friend of mine tried it and claims that the German speech recognition works quite well.

2006-08-09 6:24 am

The Lone OSer
Alas, I find Speech Recognition a rather annoying technology. Having first started with it with IBM ViaVoice that was built in to OS/2 Warp, and then ViaVoice via Windows, and now Vista – My experiences alas, are exactly the same, even 10 years on with the technology.

I stem from a little country known as New Zealand.. Population – 98 million (94 million of them being sheep), and I find that, although we speak the Queen’s English (apparently) – the technology fails miserably with our accent, and you can get a hilarious response with the dictation systems… With Vista – when trying to train it, the software would lock solid every single attempt.

I wait with anticipation of a build of Vista I can actually train – and then we will see if this 10 year old problem persists

2006-08-09 10:05 am

Alleister
Sorry, but when I see a film from New Zealand even my biological speech recognition fails miserably. There would have to be a specialized version for New Zealand.
2006-08-09 11:31 am

RawMustard
Perhaps when kiwis learn to pronounce english correctly, speech recognition software may have a slight chance at working properly. Being married to one, I’m constantly asking her to repeat what she just said. I’m afraid that Fushun chups or suxdi sux don’t count as proper english

Edited 2006-08-09 11:38

2006-08-09 3:33 pm

iangibson
So what is the ‘correct’ pronunciation of English? Let me suggest that it is certainly not the way Americans speak.

Not that there is a ‘way’ – there are many regional accents across the USA just as there are in Britain and elsewhere across the English-speaking world.

For speech recognition software to be remotely useful, it surely must be able to accommodate all of the many accents that are out there.

2006-08-09 3:43 pm

Alleister
Well, that is only possible to a certain degree. On the other hand, you can’t expect a speech recognition system to understand spoken language better than a human.

I don’t know how bad the dialects are inside of America, but there are German dialects even I, as a German, do not understand.

So people who want to be understood by their software will have to try to speak with as little dialect as possible, because a computer won’t do better work than a human on this.

2006-08-09 11:39 am

martinus
“open the bloody start menu, mate!”

2006-08-09 6:33 am

postmodern
Well it DID crash as we all saw in the video clip from MSNBC, and it was certainly a train wreck (as indicated by the audience’s chuckling). Glad to see they got the bugs worked out now. Although, Mr. Zheng never tried to write a letter to his aunt…

let’s set so double select killer delete all!
2006-08-09 6:37 am

audun
Impressive
2006-08-09 6:46 am

Lambda
Well I could only get the audio from my QT plugin, but listening to it reminded me of Blade Runner, when he’s sitting at home, studying the pictures he got from Leon’s apartment, and giving the picture viewer voice commands.

The future is here;)!
2006-08-09 6:58 am

Lambda
I’m impressed. I haven’t really been following the progress of speech recognition, but if the video is any indication, this could be the “killer app” for Vista.

2006-08-09 7:47 am

sbenitezb
“this could be the “killer app” for Vista.”

I doubt it. Speech recognition is a pain in the ass. It’s mostly for impaired people, not for everyone’s day-to-day use.

We are really far from being able to tell the computer:

* Computer, read my mail; discard all spam.

* Computer, reply to Joe Sixpack: Dear Joe, this voice interface like Star Trek’s ship computers is awesome. You should check it out.

* Computer, send the composed mail.

* Computer, preview the picture of my last birthday, the one where I’m with my girlfriend, the new one. Also send a copy to her and print one in high resolution. Make sure Mom gets a copy.

*That*, would be nice.

2006-08-09 8:09 am

Lambda
We are really far from being able to tell the computer:

* Computer, read my mail; discard all spam.

* Computer, reply to Joe Sixpack: Dear Joe, this voice interface like Star Trek’s ship computers is awesome. You should check it out.

* Computer, send the composed mail.

* Computer, preview the picture of my last birthday, the one where I’m with my girlfriend, the new one. Also send a copy to her and print one in high resolution. Make sure Mom gets a copy.

Did you watch the screencast? It looks like 1-3 of your list is completely doable in the Vista system as of now. #4 is more of an AI/Image recognition problem.

2006-08-09 3:49 pm

sbenitezb
“Did you watch the screencast? It looks like 1-3 of your list is completely doable in the Vista system as of now. #4 is more of an AI/Image recognition problem.”

Yes. You must be kidding. You cannot use natural language normally to speak to your PC without repeating or making sure it understands exactly what you mean. It’s a waste of time. It’s just a toy.

2006-08-09 4:05 pm

Lambda
It’s a waste of time. It’s just a toy.

Or maybe you wish it was a waste of time and just a toy.
2006-08-09 6:54 pm

sbenitezb
“Or maybe you wish it was a waste of time and just a toy.”

If you read my previous post you will find the answer.
2006-08-09 7:38 pm

Lambda
“Or maybe you wish it was a waste of time and just a toy.”

If you read my previous post you will find the answer.

Your “answer” doesn’t coincide with reality and I doubt you even watched the video. Don’t worry, Linux will get it eventually and you can then proclaim it’s the greatest thing since sliced bread.
2006-08-10 5:22 am

sbenitezb
“Your “answer” doesn’t coincide with reality and I doubt you even watched the video. Don’t worry, Linux will get it eventually and you can then proclaim it’s the greatest thing since sliced bread.”

You like cheap talk, don’t you? I watched the video. Even if it wasn’t made up, it’s not that interesting technology for what it can actually do. And as I’m not interested in useless things (at least useless to me) I really don’t care if Linux ever has it. It works for me now, without crap.

2006-08-09 4:43 pm

47ronin
As long as the mail application is scriptable, almost all of these items have been doable as speech recognition commands for years in Mac OS 9 (we’re talking at least seven years). I was able to log in using a vocal phrase key, open and close applications, scroll windows and select items, get information and run menu items. It was a combination of the built-in AppleScript and Speakable Items features of Mac OS back in the day.

One thing I am curious about… does Vista require you to voice-train your computer BEFORE the system can recognize you properly or does it work out of the box with any Joe’s accent and drawl?

2006-08-09 7:07 am

marcos89
http://perlbox.org/Download.html

Any future with this?? Or maybe another solution??
2006-08-09 8:06 am

ActiveMan
This demo was prepared in a special environment without any noise, wasn’t it?

You can see what happens in the real world here:

http://www.youtube.com/watch?v=fV1kqthZf2g&NR

2006-08-09 10:16 am

Alleister
Yes, because in the real world you always sit with a couple of thousand guests in your home cathedral.

Don’t get me wrong, this surely isn’t anything spectacularly new, but it reflects my experiences with other speech recognition software. Usually it was enough to use a different microphone than the one you had trained it with and it stopped working.

You can be sure that no MS employee would present that software if it had not worked when they tried it. You know, you can think of Microsoft what you want, but there is no policy that requires MS employees to have down-syndrome.
2006-08-09 12:24 pm

apoc
lol, ActiveMan, if you don’t believe it, try it for yourself, it’s the best thing to do.

I had already tried Vista’s Speech Recognition before that sad “dear aunt etc etc” video, so I know that there was a problem at the demo. If you watch it carefully you’ll notice the volume is at the max, and the guy giving the commands didn’t know how to use speech recognition. When a recognition error occurs the speech button turns orange and you have to wait for it to turn blue again. If you keep talking while it is orange you’ll only make things worse, because Speech Recog will think that you’re still finishing the command when you’re actually trying to correct the recognition error.

Try it for yourselves.

I also don’t agree that the speech recognition is only for impaired people, it’s much faster saying “start AppName” than typing the name of the app in the start menu, you don’t even have to open it! Also, simple things like clicking buttons are much faster, imagine you’re composing an email, you change your mind and just say “close that, don’t save”,

minimize

maximize

switch applications

open apps/common places(documents|pics|videos|computer|other start menu buttons)

start a search(“start search”)

shutdown/hibernate/sleep/log off

menubar commands(“tools options”)

these are all tasks that i believe speech recognition will help to speed up.

btw, the new explorer is also pretty nice in terms of usability: the new address bar, organization options, visual feedback of folder contents, preview pane with multimedia support, metadata below, integrated search. It’s nice.
2006-08-09 1:10 pm

Janus
And you can see the whole clip here:

http://www.youtube.com/watch?v=kX8oYoYy2Gc

Unlike in the clip you posted, he successfully does application launching and switching between windows. Also, after the part where the voice recognition fouls up, he removes the text and starts anew, dictating a letter without any problems. But of course, these parts don’t help when you’re out to bash Microsoft.

Anyhow, as described in some detail in the YouTube description of that clip, it was a known bug and it has been resolved.

If you want to know exactly what caused it, you can read all about it here:

http://blogs.msdn.com/larryosterman/archive/2006/07/31/684327.aspx

2006-08-09 9:19 am

Babi Asu
I hope Apple brings that to Leopard. I don’t care if Apple gets called a copycat, that’s Apple’s problem.
2006-08-09 9:25 am

RandomGuy
but does anybody know how it handles different languages?

There are times when I need to write German (my friends would give me very confused looks if I wrote all my mails in English) as well as times when I need to write English. Is there a simple command like “switch language to X”?

I think it’s really good for common tasks, although I would not want to program this way.

2006-08-09 6:18 pm

n4cer
but does anybody know how it handles different languages?

It currently supports 8 languages/dialects:

U.S. English, U.K. English, traditional Chinese, simplified Chinese, Japanese, German, French and Spanish.

2006-08-09 9:48 am

netpython
I think voice recognition is a major step forward for physically challenged computer users. Really impressive stuff. Like a helping dog that turns off the light switch. A direct increase in quality of life.

But with all respect, for the majority of us it’s just bling bling at the current stage. It starts to become interesting when voice recognition reaches the sophistication of the computers in the sci-fi series Star Trek.

Then again, however nice, it’s still a far, far away MS show, and nobody knows when, how and *if* it will be delivered.

Edited 2006-08-09 09:49
2006-08-09 10:42 am

segedunum
Am I going to believe a blog entry or an on stage demonstration that went belly up and where the recognition could recognise the most basic of words – with no background noise?

2006-08-09 11:42 am

segedunum
where the recognition could recognise the most basic of words

Whoops. That should of course read ‘where the recognition couldn’t recognise the most basic of words’.

2006-08-09 10:51 am

bolomkxxviii
I have tried ViaVoice and Dragon Naturally Speaking. Went back to typing. It is faster and more accurate.

2006-08-09 11:00 am

netpython
Went back to typing. It is faster and more accurate.

Exactly!

And i think a real graphics artist can’t get his/her “fingerspitzengefuehl” substituted by voice recognition that easily anyway.

2006-08-09 11:10 am

Isolationist
I am Vista’s delete using leech recognition as oui peak help
2006-08-09 11:24 am

Sabz
wreckignicion or recignicion ? lmao
2006-08-09 11:32 am

doubleUb
Microsoft speech recognition is available for free download, with SDK.

I was programming in Delphi under Windows Me at the time, and it worked well for speech commands. I remember having a small Merlin on the top left of my screen, waiting for commands like “switch delphi” to bring up Delphi when it was not on top.

The only problem for me was that it was English only (even if I could have free French speech synthesis).

Since then… Nothing happened. I am pretty sure the speech recognition engine is still the same. I have been waiting for that for years.

Now, I have a graphic tablet with shortcuts. Definitely the best HID device I ever had.
2006-08-09 11:39 am

Yagami
The video really impressed me!!! I must say, the computer understood what he was saying much better than I did! (I always thought he was saying closeup.)
2006-08-09 12:31 pm

makfu
This is an impressive demo. Alas, no matter how impressive Vista might be, the ABM’ers will say anything to justify their irrational vitriol of Microsoft.

2006-08-09 4:15 pm

Lambda
This is an impressive demo. Alas, no matter how impressive Vista might be, the ABM’ers will say anything to justify their irrational vitriol of Microsoft.

Exactly. I was actually surprised at how well it worked and how integrated it was into the environment. If this worked as well in a Gnome or KDE desktop it would be the greatest thing since sliced bread.

These people are so transparent. They’re bitter it did work so well.

2006-08-09 12:36 pm

brother bloat
The problem with voice recognition, in my opinion/experience, is that, while it may be reliable at times, it’s almost never reliable 100% of the time. To me, as a user, I’m used to communicating with other human beings, who can understand what I say even when I mumble, or when I’m standing in a crowded room with the music blaring.

When the computer fails to reach the level of reliability I’m used to with other humans, I find myself unable to rely on the computer for voice recognition at all as a tool. Instead, voice recognition becomes slower than simply clicking my mouse and typing on my keyboard — i.e. what I’ve always done (and gotten used to) in order to interact with the computer.

Speaking to the computer is like wading through thick oatmeal; while it should be easy and effortless, I find myself becoming annoyed when simple tasks take just a little bit longer.

Until voice recognition reaches (or comes very close) to the level of human voice interactions (perhaps through image recognition via the more and more popular webcam for lip reading??), I’m staying away. To me, until then, voice recognition will be a neat gimmick or a toy, but it’s not going to cut it for “real” work.
2006-08-09 1:36 pm

Sphinx
I had speech recognition working on my 8086/CPM powered apricot portable in the 80’s, blew a few minds but that’s about all, not a great time saver, just a neat toy.
2006-08-09 2:03 pm

ozonehole
I realize that it could be of use to the physically impaired, but for the vast majority speech recognition as it’s currently used is a disaster. I absolutely dread calling Qwest (a US phone company) or my credit card company, or numerous airline companies, because I’ve got to wade through menus where a computer asks me questions that I’m supposed to answer. Only it misinterprets the answers half the time and directs my call to the wrong place. I irrationally find myself yelling at the computer.

I would say that, at least in America, the quality of life has deteriorated somewhat thanks to speech recognition, and it continually gets worse as more companies and government offices adopt this “labor saving technology”. I wonder when the Diebold election rigging machines are going to start using speech recognition.

Remember the old days when actual humans answered the phones? I’m almost grateful for those call centers in India – just about the only chance you’ve got to reach a real person (though usually only after you’ve spent 10 minutes yelling at a computer).

Edited 2006-08-09 14:07
2006-08-09 2:21 pm

michael135
Speech recognition works fine, when the application compares the input to a limited number of entries, like in this video. Speech recognition becomes much harder when the user is allowed to speak freely, like when dictating an email.
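
A rough sketch of why the limited-entries case is the easier one, assuming the recognizer hands back a noisy text guess: with only a handful of valid commands, the guess can be snapped to the nearest entry, while free dictation has no such list to fall back on. The command list and the `match_command` helper are made up for illustration and say nothing about how Vista actually does it.

```python
import difflib

# Hypothetical command vocabulary; a real system would have its own list.
COMMANDS = [
    "minimize",
    "maximize",
    "switch applications",
    "start search",
    "shutdown",
]


def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the closest known command for a noisy guess, or None.

    `cutoff` is the minimum similarity (0..1) required to accept a match.
    """
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # A slightly wrong guess still lands on a valid command.
    print(match_command("minimise"))
    # Free-form dictation has no entry to snap to.
    print(match_command("write a letter to my aunt"))
```

With dictation every word is a possible answer, so there is no small list to correct against, which fits the point about free speech being much harder.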

2006-08-09 3:54 pm

MollyC
He does dictation at the end of the video, and it worked fine.

2006-08-09 2:33 pm

agentj
Shout “open cmd.exe – enter – format /autotest /u c:” ]:-D

2006-08-09 3:22 pm

IMesh
Oh I get it, like you’re going to format the Windows partition! Aren’t you clever. That’s almost as cool and clever as typing MS.

2006-08-09 3:42 pm

REM2000
Another side point to make is that Exchange 2007 also has speech recognition. Microsoft demo’d this a few months ago at a technet UK event.

It was one of the most impressive things I’ve ever seen (or should that be heard). The Microsoft employee was able to call up the Exchange server via a mobile phone, tell it to open her inbox, calendar etc., change appointments, send voice mails etc., all via her voice without a single key-press on the mobile.

It’s nice to see this technology finally start lifting off the ground.
2006-08-09 4:42 pm

ThawkTH
I’m usually quite anti-Microsoft. I’ve, on more than one occasion, accused Vista of being a prettified XP. Nothing more.

I’ve begun to see the error in my thinking. Do I think Vista’s as different and groundbreaking as it should be after 5 years (+)?

Not at all.

This I do find impressive. Even if most people don’t use it. Even if it is a toy once in a while. It’s different and appears to work extremely well.

Now someone needs to come up with something total nerds would totally love – a tablet pc with a decently designed and integrated LCars interface running Linux with awesome speech recognition and voice responses.

“Computer, status.”

“Fetching updates. 1 minute remaining.”

Hey, I’ve been dreaming of it since I was 10!

Haha, seriously though, I think it would be cool to have some verbal responses – I’d just choose the trek lady voice I learned to love watching ST:Voyager. It would go a long way to integrating the computer into the home rather than having it be “just another appliance” that someone opens for e-mail once in a while.

/end ramble.

2006-08-09 4:52 pm

47ronin
Already done.

http://en.wikipedia.org/wiki/Speakable_items

Mac OS X has Speakable Items, which has some preset commands AND opens up a world of possibilities because the speech recognition engine can tie into the Terminal. For example, if you wanted the uptime of your system you could speak a command which would launch an AppleScript that parses the value of the “uptime” command and makes it phonetically speakable by the Mac.

http://www.xvsxp.com/misc/speech.php
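
The "make it speakable" step can be sketched roughly like this. It is in Python rather than AppleScript, and it assumes the uptime is already available as a number of seconds; the `speakable_uptime` name is hypothetical and the parsing of the actual `uptime` output is left out.

```python
def speakable_uptime(seconds):
    """Turn an uptime in seconds into a phrase a speech synthesizer can read."""
    days, rest = divmod(int(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60

    parts = []
    for value, unit in ((days, "day"), (hours, "hour"), (minutes, "minute")):
        if value:
            # Add a plural "s" unless the value is exactly 1.
            parts.append(f"{value} {unit}{'' if value == 1 else 's'}")

    if not parts:
        return "less than a minute"
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + " and " + parts[-1]


if __name__ == "__main__":
    print(speakable_uptime(93784))
```

The resulting string is what would be handed to the text-to-speech side instead of the raw terminal output.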

2006-08-09 9:15 pm

Kancept
I’ve been using the voice recognition in OS/2 for years. My voice profile is quite large, and I get a VERY good recognition rate (above 90%, typically higher). I admit, I do have years invested into my speech profile, but I have carried it over each time so I do not need to start fresh. The profile I’d say has about 10 years of patterns to go on. I also have my custom assignments in it, like “send” and things that were mentioned above.

I would like to see a more fluid system instead of the boxiness when dictating, but I have grown accustomed to it after all these years. While I don’t find it a necessity, I do like the option of input methods. I don’t see voice input as a waste at all.
2006-08-09 10:36 pm

proforma
oh my gosh. Did we get through this thread without a “Microsoft and Vista blow” or “Linux rul3z for life!” or “Linux already has this feature and Microsoft did it horribly!”

I can’t believe it!

Praise God!

Maybe the community here is actually buying a clue and moving on with their lives.

Imagine that!