How Does Siri Learn to Speak Shanghainese?

Apple, Amazon, Microsoft, and Google all offer voice assistants. Which one comes out ahead?
REUTERS/Suzanne Plunkett

Originally published in the June 2017 issue of 英語島 magazine; reprinted by INSIDE with permission.

For more details, see the 世界公民文化中心 fan page.

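Apple, Amazon, Microsoft, and Google all offer voice assistants. Which one comes out ahead? According to a Reuters report, Apple's voice assistant Siri may no longer have the edge in recognizing speech and answering questions, but one major advantage remains: it speaks the most languages. It is now set to learn Shanghainese, so let's look at how it does that. Before reading on, think about how you would say the following words in English: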

(A) 量身訂做 (B) (說話)含糊的 (C) 規模化

The voice-assistant wars are (1) in full swing, with Apple, Amazon, Microsoft and now Google all offering electronic assistants to take your commands.


Many researchers believe that Apple has squandered its lead when it comes to understanding speech and answering questions. However, there is at least one thing Siri can do that the other assistants cannot: speak 21 languages localized for 36 countries, a very important capability in a smartphone market where most sales come from outside the United States.

許多研究人員認為在語音辨識和回答問題方面,蘋果的領先優勢已消耗殆盡,不過有件事目前只有 Siri 做到:說 36 個國家的 21 種語言。此功能在智慧手機市場極為重要,因為大部分智慧手機都銷往美國以外地區。

Microsoft Cortana, by contrast, has eight languages (A) tailored for 13 countries. Google's Assistant, which began in its Pixel phone but has since moved to other Android devices, speaks four languages. Amazon's Alexa features only English and German. Siri will even soon start to learn Shanghainese, a special dialect of Wu Chinese spoken only around Shanghai. 

微軟 Cortana 為 13 個國家制定了 8 種語言。Google 助理會說 4 種語言,這項服務出自 Google 自家手機 Pixel,現已開放其他 Android 系統手機使用。亞馬遜的 Alexa 只會說英語和德語。而 Siri 馬上要開始學上海話了,這是一種只在上海及其周邊地區使用的吳語方言。

At Apple, the company starts working on a new language by bringing in humans to read passages in a range of accents and dialects, which are then transcribed by hand so the computer has an exact representation of the spoken text to learn from, said Alex Acero, head of the speech team at Apple. Apple also captures a range of sounds in a variety of voices. From there, an acoustic model is built that tries to predict word sequences.

蘋果語音團隊負責人 Alex Acero 說,要發展新語言功能時,會讓有各種方言和口音的真人唸出文字段落,然後再手動轉錄,這樣電腦就可以擁有準確的學習樣本。蘋果還會從不同的聲音中捕捉各種語音,接著建立一個聲學模型,以嘗試預測詞語序列。

Apple then deploys "dictation mode," its speech-to-text translator, in the new language, Acero said. When customers use dictation mode, Apple captures a small percentage of the audio recordings and makes them anonymous. The recordings, complete with background noise and (B) mumbled words, are transcribed by humans, a process that helps cut the speech recognition error rate in half.

Acero 說,接著蘋果會在新語言中部署「聽寫模式」,也就是將語音轉為文字的工具。當使用者使用聽寫模式時,蘋果會抓取音訊錄音中的一小部分,然後對其匿名處理。這些錄音包含了背景雜音和含糊的詞語,由真人轉錄,這個過程有助於將語音辨識的錯誤率降低一半。
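The article does not say which metric Apple uses for its "speech recognition error rate." One common choice is word error rate (WER): the word-level edit distance between a human transcript and the recognizer's output, divided by the number of words in the human transcript. The sketch below assumes that metric. The function name and the sample sentences are hypothetical and serve only as illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    `reference` is the human transcript; `hypothesis` is the recognizer output.
    Both are assumed to be whitespace-separated words (an illustrative
    simplification that does not hold for unsegmented Chinese text).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a human transcript and two recognizer outputs.
human = "turn on the kitchen lights"
before = "turn on kitchen light"      # one word dropped, one word wrong
after = "turn on the kitchen light"   # one word wrong

print(word_error_rate(human, before))  # 0.4
print(word_error_rate(human, after))   # 0.2
```

In this made-up case the error rate drops from 0.4 to 0.2, which is what "cut in half" would look like under the WER assumption.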

After enough data has been gathered and a voice actor has been recorded to play Siri in a new language, Siri is released with answers to what Apple expects will be the most common questions, Acero said. Once released, Siri learns more about what real-world users ask and is updated every two weeks with more tweaks. 

收集了足夠資料、配音員為 Siri 錄製講新語言的聲音後,Siri 即可發佈。發佈時,Siri 能回答出蘋果預期最常見的問題。發佈後,Siri 也能從用戶的實際問題學習,每兩周作調整並更新。

However, script-writing does not (C) scale, said Charles Jolley, creator of an intelligent assistant named Ozlo. "You can't hire enough writers to come up with the system you'd need in every language. You have to synthesize the answers," he said.

不過,智慧助理 Ozlo 的創造者 Charles Jolley 說,撰寫腳本無法規模化,「不可能聘僱夠多的作者,來打造每種語言所需的系統,必須人工合成回答。」

Viv, a startup founded by Siri's original creators that Samsung acquired last year, is working on just that. "Viv was built to specifically address the scaling issue for intelligent assistants," said Dag Kittlaus, the CEO and co-founder of Viv. "The only way to leapfrog today's limited functionality versions is to open the system up and let the world teach them."

「Siri 之父」的新創公司 Viv,正著手解決這個問題。這間公司去年由三星收購。Viv 的聯合創始人兼 CEO Dag Kittlaus 說:「Viv 想解決智慧助理的規模化問題,想要讓當今功能侷限的版本升級,唯一的方法就是開放系統,讓世界來教它們。」

1. In full swing 如火如荼;全力進行

By ten o'clock, the party was in full swing.



