18 December 2010
You'll no doubt have seen this demonstration on the internet by now:
Only uploaded a day ago, it's set the world buzzing with talk of Star Trek technology and a future with no language lessons. The video has already been viewed over a million times.
This is a massively impressive piece of technology, but it's not the technology people think it is.
This is a demonstration of real-time image manipulation, not of language technology.
The fact that they can extract text from an image, manipulate it and put it back in with virtually no lag is phenomenally impressive from the point of view of the image processing involved, but the "translation" really is nothing of the sort.
In their sales blurb, the makers are careful to say that it only translates individual words and is only designed to give you the "gist" of what is written. I wouldn't class this as "translation", but I could forgive them for doing so. The problem is that their target market is really people who don't know anything about languages, and these people aren't likely to realise how little use a "translation" of individual words is.
But it gets worse. The video purports to be a demonstration of the software in action. The casual viewer sees a piece of software translating such complicated Spanish sentences as:
It translates the text instantly
You need only the phone and Word Lens, without connection to internet or costs of network.
Unfortunately, the casual observer isn't likely to know any Spanish. In fact, their target market is people who don't speak Spanish (with Spanish speakers who don't speak English as a sideline). This starts to look like very sharp practice once you examine the original "Spanish" and see that it is not natural Spanish: it appears to have been doctored to produce the results shown.
The text translated as "It translates the text instantly" actually means "the text translates it literally". The second example isn't an incorrect translation in the same way, but the original just strikes me as a very odd way of phrasing the sentence.
Word-for-word translation is inherently limited, and the demonstration specifically avoids any of the problem cases. For example, English "not" and "no" both translate to Spanish "no"; the gender system means that, when translating from Spanish, it's impossible to decide between "he" and "it" or between "she" and "it"; and, translating the other way, "it" could actually be any of six different words in Spanish.
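To make the problem concrete, here is a minimal Python sketch of a naive word-for-word translator. The tiny glossary and the "take the first reading" policy are hypothetical illustrations, not Word Lens's actual data or method; the point is only that a lookup which treats each word in isolation has nothing to go on when choosing between "no" and "not", or between "he" and "it".

```python
# Hypothetical toy Spanish-to-English glossary: each entry lists every
# plausible English rendering of the word when taken out of context.
GLOSSARY: dict[str, list[str]] = {
    "no": ["no", "not"],
    "él": ["he", "it"],
    "ella": ["she", "it"],
    "es": ["is"],
    "rojo": ["red"],
}


def word_for_word(sentence: str) -> list[list[str]]:
    """Look up each word on its own and return all candidate translations.

    Unknown words are passed through unchanged, as a gist-only tool might do.
    """
    return [GLOSSARY.get(word, [word]) for word in sentence.lower().split()]


def ambiguous_words(sentence: str) -> list[str]:
    """Return the words for which the lookup offers more than one reading."""
    words = sentence.lower().split()
    return [w for w, options in zip(words, word_for_word(sentence)) if len(options) > 1]


def naive_gist(sentence: str) -> str:
    """Hypothetical policy: blindly take the first reading of every word."""
    return " ".join(options[0] for options in word_for_word(sentence))


if __name__ == "__main__":
    sentence = "Él no es rojo"
    for word, options in zip(sentence.split(), word_for_word(sentence)):
        print(f"{word}: {' / '.join(options)}")
    print("ambiguous:", ambiguous_words(sentence))
    print("gist:", naive_gist(sentence))
```

For "Él no es rojo" ("He/It is not red"), the sketch flags "él" and "no" as ambiguous, and the first-reading policy produces "he no is red": wrong word choice and wrong word order, with no context available to fix either.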
As I say, it's a massively impressive piece of software; it's just a shame they couldn't be more honest about what it is and what it isn't.
I hope the tech gets bought by one of the big translation companies, because with an appropriate translation engine behind it, it will be truly awesome. Right now, it's just a shiny little tech demo.