Constructing a Speech Translation System using Simultaneous Interpretation Data


Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
Nara Institute of Science and Technology (NAIST), Japan
December 6th, 2013

Background
Speech translation compared with human interpreters
In what respects is speech translation inferior?
- accuracy
- delay
We focus on the delay problem.
2013 Hiroaki Shimizu AHC-Lab, IS, NAIST

What is the problem with delay?
Speech translation: "last year I went to Japan" -> (long delay!) -> "kyonen nihon ni itta"
When simultaneous interpreters interpret lectures in real time, they use a variety of techniques to shorten the delay.

Techniques of simultaneous interpreters
Salami technique [Jones 02] [Fügen+ 07] [Bangalore+ 12] [Fujita+ 13]
- Divide longer sentences into a number of shorter ones
- Example: "last year" -> "kyonen", then "I went to Japan" -> "nihon ni itta"
Adjusting lexical choice
- Reduce word reordering
- Example, English "A because B":
  Translator: "B dakara A"
  Simultaneous interpreter: "A nazenaraba B"

Purpose
Research purpose
- Figure out what speech translation can learn from simultaneous interpreters
Overall view of the ST system
- Diagram: source sentence -> MT system -> target sentence like a simultaneous interpreter's
- The MT system learns from simultaneous interpretation data (proposed) and translation data
- Related work: [Paulik+ 09] [Sridhar+ 13]

Overview
1) Collecting simultaneous interpretation data
2) Difference between simultaneous interpretation and translation data
3) Using the simultaneous interpretation data
4) Experiment and result
Diagram: simultaneous interpretation data and translation data -> learning -> MT system; source sentence -> MT system -> target sentence like a simultaneous interpreter's

Simultaneous interpretation data
Materials
- TED talks (English to Japanese)
- Translated subtitles can be compared with the simultaneous interpretation data
Interpreters
- Three simultaneous interpreters
- Different experience levels

Experience    Rank
15 years      S rank
4 years       A rank
1 year        B rank

This allows us to compare the characteristics of interpreters at different levels.


Difference between translation data and simultaneous interpretation data
Motivation
- Translation: time-unconstrained
- Simultaneous interpretation: time-constrained, includes tricks
We compare the translation data with the simultaneous interpretation data to find the differences.

Preliminary experiment design
Source: TED
Translation data
- Translator (T1)
- TED subtitle (T2)
Simultaneous interpretation data
- S rank interpreter (I1)
- A rank interpreter (I2)
Calculate similarity (BLEU, RIBES) for all 6 pairwise combinations.
We hypothesize that the similarities of T1-T2 and I1-I2 are higher than those of any other combination.
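A minimal sketch of how the 6 pairwise comparisons could be enumerated. The token lists are hypothetical toy data, and `overlap_f1` is a simple token-overlap score used here only as a stand-in for BLEU and RIBES, which the slides use but do not spell out.

```python
from collections import Counter
from itertools import combinations


def overlap_f1(a, b):
    """Symmetric token-overlap F1 between two token lists.

    Stand-in metric for illustration only; it is not BLEU or RIBES.
    """
    if not a or not b:
        return 0.0
    matched = sum((Counter(a) & Counter(b)).values())
    if matched == 0:
        return 0.0
    precision = matched / len(a)
    recall = matched / len(b)
    return 2 * precision * recall / (precision + recall)


def pairwise_similarity(outputs, metric):
    """Score every unordered pair of data sets with the given metric."""
    return {
        (x, y): metric(outputs[x], outputs[y])
        for x, y in combinations(sorted(outputs), 2)
    }


# Hypothetical romanized token lists for one sentence (not from the paper).
outputs = {
    "T1": "kyonen watashi wa nihon ni ikimashita".split(),
    "T2": "kyonen watashi wa nihon e ikimashita".split(),
    "I1": "kyonen nihon ni ikimashita".split(),
    "I2": "kyonen nihon ni itta".split(),
}

scores = pairwise_similarity(outputs, overlap_f1)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Four data sets give 4 choose 2 = 6 pairs, which matches the 6 combinations on the slide.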

Result: difference between simultaneous interpretation data and translation data
- The translation data pair (T1-T2) scores highest among all combinations.
- Pairs of translation data and simultaneous interpretation data score lower than the translation data pair.

Result: difference between simultaneous interpretation data and translation data (cont'd)
- The similarity of the simultaneous interpretation data pair (I1-I2) is unexpectedly low.

Discussion
Why the similarity of the simultaneous interpretation data pair is unexpectedly low

Data                                        Words (Ja)
Translation: Translator                     4.58k
Translation: TED subtitle                   4.64k
Simultaneous interpretation: S rank         4.44k
Simultaneous interpretation: A rank         3.67k

The S rank interpreter can interpret content that the A rank interpreter cannot.
- A rank is more similar to S rank than to any other data
Translation data and simultaneous interpretation data differ according to the similarity measures.


Learning of the MT system
We use the simultaneous interpretation data in three steps:
Tuning (Tu)
- Tune parameters such as the reordering probabilities and word penalty to learn the style of simultaneous interpreters.
Language model (LM): linear interpolation
- Makes the word order and lexical choice of the translation similar to simultaneous interpretation.
Translation model (TM): fill-up [Bisazza+ 11]
- As with the LM, makes the lexical choice similar to simultaneous interpretation.
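The two model-combination steps above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the interpolation weight `lam`, the dictionary-based phrase tables, and the 0/1 provenance feature are all hypothetical choices, and the fill-up logic follows our reading of [Bisazza+ 11] (keep every entry from the simultaneous interpretation table, add entries from the translation table only for phrase pairs the first table lacks).

```python
def interpolate_lm(p_si, p_trans, lam):
    """Linear interpolation of two language model probabilities.

    P(w | h) = lam * P_si(w | h) + (1 - lam) * P_trans(w | h)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_si + (1.0 - lam) * p_trans


def fill_up(si_table, trans_table):
    """Merge two phrase tables in fill-up style.

    Each table maps (source phrase, target phrase) -> tuple of feature scores.
    A provenance feature is appended: 0.0 for entries kept from si_table,
    1.0 for entries filled in from trans_table (assumed encoding).
    """
    merged = {pair: scores + (0.0,) for pair, scores in si_table.items()}
    for pair, scores in trans_table.items():
        if pair not in merged:
            merged[pair] = scores + (1.0,)
    return merged


# Hypothetical toy probabilities and phrase tables.
p = interpolate_lm(p_si=0.2, p_trans=0.1, lam=0.5)

si_table = {("because", "nazenaraba"): (0.6,)}
trans_table = {
    ("because", "nazenaraba"): (0.1,),
    ("because", "dakara"): (0.7,),
}
merged = fill_up(si_table, trans_table)
```

With these toy tables, the pair ("because", "nazenaraba") keeps its simultaneous interpretation score, while ("because", "dakara") is filled in from the translation table and marked by the provenance feature.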


Data
Task
- TED talks (English to Japanese)

                   Translation data    Simultaneous interpretation data
TM, LM (en/ja)     1.57M / 2.24M       29.7k / 33.9k
Tune (en/ja)       12.9k / 19.1k       12.9k / 16.1k
Test (en/ja)       -                   11.5k / 14.9k

1) Only the data from the S rank interpreter is used.
2) Simultaneous interpretation data, NOT translation data, is used as the reference.

Setup
Automatic sentence segmentation method
- Dividing method using right probability [Fujita+ 13]
Evaluation method
1) Translation accuracy
- BLEU, RIBES
2) Delay
- Time from the start of input to the completion of translation (assuming 100% accurate ASR and not considering speech synthesis)
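As a rough illustration of threshold-based segmentation, the sketch below cuts the input after any word whose score reaches a threshold. It is not the exact method of [Fujita+ 13]: the per-word `right_prob` table, its values, and the threshold of 0.8 are hypothetical stand-ins for a right probability estimated from a trained model.

```python
def segment_by_right_probability(words, right_prob, threshold=0.8):
    """Greedily split a word sequence into translation units.

    A unit is closed after a word whose (assumed) right probability is at
    least `threshold`; words missing from the table never trigger a split.
    """
    segments, current = [], []
    for word in words:
        current.append(word)
        if right_prob.get(word, 0.0) >= threshold:
            segments.append(current)
            current = []
    if current:  # flush whatever remains at the end of the sentence
        segments.append(current)
    return segments


# Hypothetical scores; real values would come from a trained model.
right_prob = {"year": 0.9, "went": 0.3, "Japan": 0.95}
words = "last year I went to Japan".split()
segments = segment_by_right_probability(words, right_prob)
print(segments)
```

A lower threshold yields shorter units and less waiting, at the risk of splitting where reordering is needed; a threshold above every score leaves the sentence as a single unit.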

Result: learning of the MT system (BLEU)
[Figure: BLEU for segmentation ranging from phrase units to sentence units. Plot annotations: better performance; similar to simultaneous interpreter; shorten the delay.]

Result: learning of the MT system (BLEU)
Points highlighted in the plot:
- Delay: 2.08, BLEU: 8.39
- Delay: 5.23, BLEU: 7.81
More similar to simultaneous interpreters

Result: learning of the MT system (RIBES)
The proposed system does not show an improvement in terms of RIBES, because tuning is optimized for BLEU.

Example of translation results
Src: If you look at in the context of the history you can see what this is doing
Ref: 過去から / 流れを見てみますと / 災害は / このように / 増えています
     (from the past / look at the context and / disasters are / like this / increasing)
Baseline: 見てみると / 歴史の中で / 見ることができます / これがやっていること
     (looking at / in the history / you can see / what this is doing)
Proposed: では / 歴史の中で / 見ることができます / これがやっていること
     (ok / in the history / you can see / what this is doing)
- Choose a shorter phrase to reduce the number of words
- Start a sentence with the word "and" (over 25% of sentences)

Setup: comparing the system with human simultaneous interpreters
We compare our proposed system with human simultaneous interpreters:
- A rank (4 years)
- B rank (1 year)
We use ASR results as input to the translation system.
- WER is 19.36%
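For reference, word error rate is conventionally the word-level edit distance between the ASR output and the reference transcript, divided by the number of reference words. The sketch below assumes whitespace tokenization and uses hypothetical example sentences; it does not reproduce the 19.36% figure.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of six.
wer = word_error_rate("last year i went to japan", "last year i want to japan")
print(f"WER: {wer:.2%}")
```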

Result: comparing the system with human simultaneous interpreters (BLEU)
[Figure: BLEU of the system compared with the B rank and A rank interpreters.]
In terms of BLEU, the system achieves results slightly lower than those of the human simultaneous interpreters.

Result: comparing the system with human simultaneous interpreters (RIBES)
- A rank: D: 2.17, RIBES: 45.59
- B rank: D: 2.06, RIBES: 44.59
In terms of RIBES, the system and the B rank (1 year) interpreter achieve similar results.

Conclusion
Purpose
- Generate translations similar to those of a simultaneous interpreter
Proposed
- Use simultaneous interpretation data for learning
Result
- Output is more similar to that of a simultaneous interpreter
Future work
- Subjective evaluation

Thank you! Questions?

Appendix

Question list

Pre-experiment discussion
Why the similarity of the simultaneous interpretation data pair is unexpectedly low

Data                                        Words (Ja)
Translation: Translator                     4.58k
Translation: TED subtitle                   4.64k
Simultaneous interpretation: S rank         4.44k
Simultaneous interpretation: A rank         3.67k

The S rank interpreter can interpret content that the A rank interpreter cannot.
- A rank is more similar to S rank than to any other data
Translation data and simultaneous interpretation data differ according to the similarity measures.

Result: learning of translation timing (BLEU)
Using the simultaneous interpretation data to learn the right probability makes no difference.

Result: learning of translation timing (RIBES)
Using the simultaneous interpretation data to learn the right probability makes no difference.

Why is English-Japanese difficult?
French: en 25 ans on est passé de çà à çà
English: In 25 years it is gone from this to this
Japanese: 25年でこのような形からこのような形になりました
For English-Japanese, it is more difficult to divide the sentence while keeping the accuracy.

Evaluation method
Delay: D = U + T
- U: waiting time before we can start translating
- T: time required for MT decoding
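A small worked example of the delay definition. The per-segment times are hypothetical, the unit (seconds) is an assumption, and averaging over segments is one plausible way to report a single figure; the slides do not state how D is aggregated.

```python
def delay(wait_time, decode_time):
    """D = U + T for one translation unit."""
    return wait_time + decode_time


def mean_delay(units):
    """Average D over a list of (U, T) pairs."""
    if not units:
        raise ValueError("need at least one translation unit")
    return sum(delay(u, t) for u, t in units) / len(units)


# Hypothetical (U, T) pairs for two translation units, in seconds.
units = [(1.5, 0.5), (2.0, 1.0)]
avg = mean_delay(units)
print(avg)
```

Shorter translation units reduce U because translation can start sooner, which is how segmentation shortens the overall delay.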

Right probability [Fujita+ 13]

Why is BLEU quite low?

RIBES