Our lifestyle correspondent Lillian Luo reports back after participating in IBM and Sony Ericsson’s beta program for the release of ‘Myvoice’, their universal translation service.
“My husband is Chinese, I am English and we have just had a splendid family reunion. The occasion was our Chinese grandmother’s 80th birthday. The extended families got together around the dinner table and the conversation never stopped. What is so special about this? Well, other than my husband, our kids and myself, none of the extended family are multi-lingual. We had dreamed of getting the family together like this for years but had to wait for technology to come to our aid.
“The best part is that we were hardly aware of the technology at all. Using the tiniest mobile phone on the planet, the SonyEricsson E1, small enough to be tucked away unobtrusively in the ear, we were blissfully unaware of the smart technology required to accomplish all of this. The translations were natural, fluent and fast, and our voices sounded, well, exactly like our own voices. At the press of a button we crossed the chasm of communication.
“What we experienced in our family demonstrates the potential of removing one of the toughest barriers to international trade and tourism, that of language. The impact on communication and customer service will be far-reaching in almost any organisation operating in a multi-cultural environment.”
ANALYSIS >> SYNTHESIS: How this scenario came to be
As the Internet boomed and connected the world almost overnight, language proved to be either a barrier to participating in this brave new world or an enabler and competitive advantage for those who spoke the language of the Internet – English.
Over 200 million future Internet users from China, India, Japan and Taiwan were expected to be online by 2003, creating what International Data Corp. (IDC) estimated in 2001 to be a $378 billion market for the translation of online text. This certainly became a growth driver in the universal translation market.
Well-known names such as IBM put energy and money into this sector, which highlighted its size and lent it instant credibility. The benefits were clear: more than 80% of online content was in English, whilst almost 70% of all PC users did not have English as a first language.
The goal was to translate everything ranging from website content, documents, forms and archives to words, phrases and email messages in real-time or delayed formats.
Globalisation and open markets also meant that huge numbers of people were interacting, and vast amounts of trade were taking place, across different regions and languages.
The challenge is far beyond just the translation of words and phrases for specific contexts like travelling. To make it widely applicable requires the contextual translation of the written word, as well as the real-time translation of the spoken word, in any situation. This is much more difficult. It requires semantic analysis, extracting the most likely meaning of text or speech and then expressing the same idea in the other language.
A simple example to illustrate the perils of double translation:
I asked for a translation of “Out of sight, out of mind”. After the double translation it came back as “Invisible idiot”. (Please refer to the end of this section for a few more examples.)
2004: IBM sets benchmark
MIT Technology Review profiles computer scientist Yuqing Gao as part of its feature ‘10 Emerging Technologies That Will Change Your World’. Working out of IBM’s Watson Research Center in Yorktown Heights, NY, Yuqing Gao is bilingual, and so is her computer. She demonstrates the current capabilities of the device to translate Mandarin Chinese to American English via a personal digital assistant (PDA). The context is a medical interview between doctor and patient and the translation is delivered in a pleasant female voice. The ultimate goal, she says, is to develop universal translation software that gleans meaning from phrases in one language and conveys it in any other language, enabling people from different cultures to communicate.
Work in this field is accelerating and Gao is at the forefront. The focus is on the use of mathematical models and natural-language processing techniques to make computerized translation more contextually accurate and more efficient.
This is different from speech recognition and synthesis. The research is currently driven mainly by global business and security needs.
Alex Waibel, associate director of Carnegie Mellon University’s Language Technologies Institute, which supports several parallel efforts in the field, attributes much of the maturing of universal translation to the advances in automatic learning, computing power and available data for translation.
The challenges are to convert speech to text, to make sense of that text in the context of the conversation, and then to use speech synthesis technology to output the translation. This is enormously complex. In addition, the system has to be adaptable to new situations, since no conversation happens in exact phrases.
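The three stages described above can be sketched as a simple pipeline. This is only an illustrative sketch, not how IBM's system works: the class `SpeechTranslator`, its fields and the toy stand-in functions are all hypothetical names invented for this example, and the "audio" is just UTF-8 text so that the code runs without any speech library. The point it shows is that the middle stage receives the conversation history, so the same word can be translated differently depending on context.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SpeechTranslator:
    """Hypothetical three-stage pipeline: recognise -> translate -> synthesise."""
    recognise: Callable[[bytes], str]            # speech to text
    translate: Callable[[str, List[str]], str]   # text + history to translated text
    synthesise: Callable[[str], bytes]           # translated text to speech
    history: List[str] = field(default_factory=list)

    def handle(self, audio: bytes) -> bytes:
        text = self.recognise(audio)
        translated = self.translate(text, self.history)
        self.history.append(text)  # remember the utterance for later context
        return self.synthesise(translated)


# Toy stand-ins (assumptions for illustration only).
def toy_recognise(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_translate(text: str, history: List[str]) -> str:
    # English "bank" is ambiguous: a riverbank ("rive") or a financial
    # institution ("banque"). Earlier utterances decide which one is meant.
    if text == "bank":
        return "rive" if any("river" in h for h in history) else "banque"
    return text  # unknown phrases pass through untranslated


def toy_synthesise(text: str) -> bytes:
    return text.encode("utf-8")


translator = SpeechTranslator(toy_recognise, toy_translate, toy_synthesise)
first = translator.handle(b"bank")        # no context yet
translator.handle(b"the river")           # adds context to the history
second = translator.handle(b"bank")       # same word, different translation
```

A real system would replace each stand-in with a statistical model, but the shape of the problem is the same: the translation step cannot work on isolated phrases alone.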
IBM is acknowledged as setting the bar in this field. In the above application it achieves 90% accuracy and has around 2000 words in its vocabulary.
The ultimate goal is to support ‘Any language to Any language’ translation
2005: Pivotal partnership
IBM and Sony Ericsson announce their advanced earphone/voice and microphone/speech technology, which is delivered as part of the latest mobile phone devices that also carry the translation software. Anyone familiar with The Hitchhiker’s Guide to the Galaxy will remember the Babel fish, a little fish that you put in your ear and that translated everything. This new gizmo is getting there. Although the complete device does not yet fit into the ear, it is very discreet and contains some clever directional and programmable sound technology, which makes it a boon in noisy environments such as restaurants, and the voice is very soothing and humanlike. Many future improvements are planned for better speech recognition and for really good matching of the speaker’s voice as it delivers the conversation to the receiver’s earpiece.
The ultimate aim of SonyEricsson is to make a mobile phone small enough in its entirety to fit into the ear, in order to make it unobtrusive, completely hands-free and natural, using a voice command interface.
A pivotal outcome from the collaboration between these two companies is a commitment to open standards to ensure that as many devices as possible can be used to collaborate.
A significant degree of sophistication has been reached in text translation and several online services are available. The most accurate is the service from IBM.
Another key deliverable to turn speech translation into a reality is to perfect the voice interface between humans and appliances, machines and systems. To eventually provide accurate input to the translator and a natural, flowing conversation, an accurate voice interface is essential. Huge advances have been made in this field.
2006: Business wins in this phase
There have been significant advances in semantic analysis and the pool of semantic concepts has increased dramatically, particularly in areas of high focus such as healthcare, travel and security. It is becoming easier to hook up a new language to the network. For example, instead of having to program separate Chinese-Arabic and English-Arabic translators, you need only map Arabic to the existing conceptual representations.
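The saving from mapping each language to shared concepts, rather than building a translator for every language pair, can be sketched as follows. This is a minimal illustration under stated assumptions: `ConceptHub`, the concept labels and the tiny word lists are hypothetical, it handles single words only, and it uses English, French and Spanish rather than the languages in the example above purely to keep the sample short.

```python
def pairwise_translators(n: int) -> int:
    """Directed translators needed if every language pair is built separately."""
    return n * (n - 1)


def pivot_mappings(n: int) -> int:
    """Mappings needed if each language maps to and from shared concepts."""
    return 2 * n


class ConceptHub:
    """Hypothetical hub that translates via language-neutral concept labels."""

    def __init__(self):
        self.to_concept = {}    # language -> {word: concept}
        self.from_concept = {}  # language -> {concept: word}

    def add_language(self, lang, lexicon):
        # Adding a language means supplying one word-to-concept mapping;
        # no existing language has to be touched.
        self.to_concept[lang] = dict(lexicon)
        self.from_concept[lang] = {c: w for w, c in lexicon.items()}

    def translate(self, word, src, dst):
        concept = self.to_concept[src][word]
        return self.from_concept[dst][concept]


hub = ConceptHub()
hub.add_language("en", {"water": "WATER", "doctor": "DOCTOR"})
hub.add_language("fr", {"eau": "WATER", "médecin": "DOCTOR"})

# Hooking up a third language immediately connects it to both others.
hub.add_language("es", {"agua": "WATER", "médico": "DOCTOR"})
spanish_to_french = hub.translate("agua", "es", "fr")
english_to_spanish = hub.translate("doctor", "en", "es")
```

With ten languages the pairwise approach needs 90 directed translators, while the concept-based approach needs 20 mappings, and the gap widens with every language added.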
A high degree of accuracy and sophistication has been achieved in universal text translation and many specialised application suppliers have emerged, based on industries and purpose. So, for example, legal, business negotiations and international trade and healthcare are very well represented.
Business wins, in particular the multinationals that have been battling with corporate knowledge dissemination. It has been a real issue for cross-language teams who have to work in languages that are not their first language. The prohibitive cost of competent translators made this option impossible for most organisations. Translation of text (email, documents etc.) is becoming very usable. The general work in expanding the semantic concepts pool has helped enormously, but large companies are also finding that by adding their own semantic concepts they can move faster and create more clarity for their employees. The breakthrough this year has been in supporting speech translation for an increasing number of these semantic concept databases to facilitate business communication.
The priority given to the speech applications being deployed is driven by a number of factors such as ease of translation, commercial opportunity and, very importantly, national interest. Security has become a top priority, with specialised applications being released in this field. In airport security, for example, each immigration and security official at every international American airport is now equipped with a PDA-based interview set. The PDA also runs other applications that are integrated with this interview set: a camera to capture and match the individual to a real-time database, demographic data, travel history, stress indicators based on culture and nationality, and so on. The languages are still limited but expanding month on month.
2007: Niche markets benefit
Many applications are available for text translation, but speech continues to develop in its niche markets. This is still hugely important as one looks at the benefits being derived in areas like healthcare. Very few language barriers now exist in this field. Most healthcare professionals in multi-cultural environments have access to very smart universal translation systems, running on PCs, mobile phones and PDAs.
Distance learning is another area where translation plays a key role and text translation has developed to the point where it is applied quite effectively, extending the reach of many of the distance learning organisations.
Great progress has been made in developing application-specific dialog engines, conversing in natural language over multiple channels and doing this in the context of previous conversations. Some of these systems are already at work in distance learning, managing much of the voice interaction with students. This year has also seen the mainstream adoption of systems for customer service. This has become a real, low cost and very effective alternative to ‘offshore outsourcing of call-centres’. Several major Indian outsourcing companies have commented on the impact this is having on their businesses.
2009: Niche to mass market
Universal Translation applications are proliferating in specialised application areas. Mostly these applications have been deployed in structured environments and aimed at one to one interaction.
A vast expansion of the available semantic concepts and great advances in speech recognition and voice technologies have the potential to deliver popular, mass-market applications. Business is once again the big winner. Because of a more manageable semantic concepts pool, businesses are able to start using speech translation on a regular basis.
Many embedded technology deals have been signed ensuring universal translation solutions will be supported on your device of choice.
New market and marketing opportunities are opening: customers are starting to expect and enjoy a cultural sensitivity that was difficult to achieve in a world mostly serviced in the English language.
2010: Translation and Mobile converge
SonyEricsson has relentlessly been driving down the size of their mobile phone devices and they have just released the SonyEricsson E1, to great market acclaim. The E1 tucks neatly into the ear and is voice command driven.
In universal translation technology the real breakthrough this year has been to optimise the technologies to facilitate real, complex conversational applications. Paradoxically, it is social interaction that has proven to be the most complex, because it is so variable and diverse.
The IBM and SonyEricsson universal translation service ‘Onevoice’ is launched globally. This highly sophisticated service supports many different areas of specialisation, but its biggest differentiator is its mass-market potential. There are a number of mega-deals mooted between ‘Onevoice’ and the mobile and fixed-line operators.
It is now possible for everyone to be multi-lingual, to a very real degree. The technology is discreet, contextual semantic translation is very good and a broad base of social semantic concepts has been mapped. A wide variety of languages and good voice matching make person-to-person conversations very real and very personal. It is truly very much like having a conversation in the same language with your nearest and dearest.
The world has just become a friendlier place. It is at last possible to enjoy the culture of other countries and their people. Having a real conversation with a Frenchman in his own language – J’aime la France!
Some translation examples from 2004
Herewith a few examples of how things can go wrong (please bear in mind that these results were obtained using a free online translator; they do, however, illustrate the complexities of even simple translations).
“We’re Sergeant Pepper’s Lonely Hearts Club band, We hope you will enjoy the show”
“Nous sommes des Poivre de Sergents Coeurs Solitaires Frappent la bande, Nous espérons que vous apprécierez le spectacle”
The above French phrase was then handed back to the translator with a request to translate it back to English. Here is the result, enjoy!
“We are Peppers of Sergents Hearts Solitary Hit the band, We hope that you will appreciate the spectacle”
And finally, an example to show the way in which semantics can confuse meaning when translated. This is taken from an email exchange within our workgroup, translated into Spanish and then back again:
There are two really powerful drivers for this scenario. The one is the development of technology and the other is the readiness of customers to use it.
Hay dos conductores realmente de gran alcance para este panorama. El es el desarrollo de la tecnología y la otra es la preparación de los clientes para utilizarla
Back to English
There are two really long-range conductors for this panorama. It is the development of the technology and the other is the preparation of the clients to use it