Cross-Language Question Answering by Multiple Automatic Translations
Sabatino Larosa, Stefano Rovetta
Dip. Informatica e Scienze dell'Informazione, Università di Genova, Italy
2000s036@educ.disi.unige.it, ste@disi.unige.it
Paolo Rosso
Dpto. Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain
prosso@dsic.upv.es
Manuel Montes-y-Gómez
Laboratorio de Tecnologías de Lenguaje, Inst. Nac. de Astrofísica, Óptica y Electrónica, Mexico
mmontesg@inaoep.mx
Multilingual QA Systems (MQAS) allow the user to get the answer by searching documents written in a language different from the one used in the query, in order to exploit the redundancy of documents on the Web. An important step for an MQAS is the translation of the question from the source language into the target one. At the moment, the majority of QA systems use online translators. The quality of these translations is often not very good, and this has a negative impact on the effectiveness of the QA system.
Objectives
We focus on the problem of selecting the best translation when more than one translator is used. The two methods we propose (Word-Count and Double Translation) are entirely statistical and therefore language independent. We concentrate on translation from Italian to Spanish, because there are more Spanish documents than Italian ones on the Web. Each of the two methods was implemented with two similarity formulas: Dice and cosine.
Word-Count with Dice Formula
This method exploits the redundancy of terms across all the translations: the translation with the highest number of words in common with the others is chosen. To find the number of common words, the intersection between the Spanish translations is taken into account.
Example of a question translated by four different translators:
Che cosa significa la sigla CEE? (What does the abbreviation EEC mean?)
1. Qué significa la sigla CEE?
2. Qué cosa significa siglas el EEC?
3. Qué significa la CEE de la abreviación?
4. Qué cosa significa la pone la sigla CEE?
The Dice formula is used to establish the degree of similarity among the translations and to rank them by exploiting the information they have in common:

$$\mathrm{Sim}(t_i, t_j) = \frac{2 \cdot \mathrm{len}(t_i \cap t_j)}{\mathrm{len}(t_i) + \mathrm{len}(t_j)}$$

where t_i and t_j are the translations being compared; len(t_i ∩ t_j) is the size of their intersection (the number of words in common); and len(t_i) and len(t_j) are the number of words in each translation. For instance, the similarity score of the first translation is Sim_12 + Sim_13 + Sim_14.
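A minimal Python sketch of Word-Count selection with the Dice formula, applied to the four translations of the CEE example. It assumes that words are lowercased, punctuation is ignored, and each translation is treated as a set of distinct words; the helper names (`tokenize`, `dice`, `word_count_scores`, `best_translation`) are hypothetical and not taken from the authors' prototype.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep word characters only
    # (in Python 3, \w also matches accented letters such as é and ó).
    return re.findall(r"\w+", text.lower())


def dice(a: set[str], b: set[str]) -> float:
    # Sim(t_i, t_j) = 2 * len(t_i ∩ t_j) / (len(t_i) + len(t_j))
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def word_count_scores(translations: list[str]) -> list[float]:
    # Assumption: each translation is the set of its distinct words.
    word_sets = [set(tokenize(t)) for t in translations]
    scores = []
    for i, wi in enumerate(word_sets):
        # Score of translation i: sum of its Dice similarity with every other one,
        # e.g. Sim_12 + Sim_13 + Sim_14 for the first translation.
        scores.append(sum(dice(wi, wj) for j, wj in enumerate(word_sets) if j != i))
    return scores


def best_translation(translations: list[str]) -> int:
    # Index of the translation with the highest total similarity.
    scores = word_count_scores(translations)
    return max(range(len(scores)), key=lambda i: scores[i])


translations = [
    "Qué significa la sigla CEE?",
    "Qué cosa significa siglas el EEC?",
    "Qué significa la CEE de la abreviación?",
    "Qué cosa significa la pone la sigla CEE?",
]
scores = word_count_scores(translations)
print([round(s, 3) for s in scores])
print(translations[best_translation(translations)])
```

Under these assumptions, translation 1 (Qué significa la sigla CEE?) obtains the highest total, about 1.924, narrowly ahead of translation 4 at about 1.910.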
To increase the accuracy in the choice of the best translation, n-grams up to 3-grams are used. For example, the 2-grams of the phrase Qué significa la sigla CEE? (What does the abbreviation EEC mean?) are: Qué significa, significa la, la sigla, sigla CEE. N-grams are very useful when translations consist of the same words in a different order, since single-word overlap cannot distinguish word order whereas n-grams can.
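The following sketch, a hypothetical illustration rather than the authors' code, extracts word n-grams and plugs them into the same Dice formula. How the different n-gram orders are combined is not stated here; the sketch assumes that a translation is represented by the set of all its n-grams of order 1 up to `max_n`.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and drop punctuation.
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # Contiguous word sequences of length n (empty if the phrase is shorter than n).
    return [tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1)]


def ngram_units(text: str, max_n: int) -> set[tuple[str, ...]]:
    # Assumption: a translation is represented by all its n-grams of order 1..max_n.
    tokens = tokenize(text)
    units: set[tuple[str, ...]] = set()
    for n in range(1, max_n + 1):
        units.update(ngrams(tokens, n))
    return units


def dice(a: set, b: set) -> float:
    # Same Dice formula, applied to sets of n-grams instead of words.
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


bigrams = ngrams(tokenize("Qué significa la sigla CEE?"), 2)
print(bigrams)

# Same words in a different order: unigrams cannot tell them apart, bigrams can.
a, b = "la sigla CEE", "CEE la sigla"
uni = dice(ngram_units(a, 1), ngram_units(b, 1))
uni_bi = dice(ngram_units(a, 2), ngram_units(b, 2))
print(uni, uni_bi)
```

For the two reordered phrases, unigram Dice is 1.0 while the 1-to-2-gram Dice drops to 0.8, which is the kind of order sensitivity the n-grams are meant to add.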
Word-Count with Cosine Formula
The cosine formula is used to calculate the degree of similarity. The translations are represented as vectors in a t-dimensional space (one dimension per keyword), and the keyword weights are calculated with the Term Frequency-Inverse Document Frequency (tf-idf) scheme.
Example:
Qual è la capitale della Repubblica del Sud Africa? (What is the capital of the Republic of South Africa?)
1. Cuál es la capital de la República de la Sur África?
2. Cuál es entendido ellos de la república de la África del sur?
3. Cuál es la capital de la República del Sur una Africa?
4. Cuál es el capital de la república del sur Africa?
All words that appear in the translations are considered keywords (k_i), each taken only once, without repetition. List of keywords: cuál, es, la, capital, de, república, sur, áfrica, entendido, ellos, del, una, africa, el.
The weight of keyword k_i in translation t_j is calculated with the following formula:

$$w_{i,j} = f(i,j) \cdot \log\left(1 + \frac{N}{n_i}\right), \qquad f(i,j) = \frac{\mathrm{freq}(i,j)}{\max_l \mathrm{freq}(l,j)}$$

where N is the total number of translations and n_i is the number of documents (translations) that contain k_i. f(i,j) is the frequency of keyword k_i in translation t_j, normalized with respect to the maximum frequency computed over all the keywords of that translation.
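A possible Python sketch of the keyword list and tf-idf weights for the South Africa example. The logarithm base is not specified, so natural logarithms are assumed (changing the base rescales all weights by the same factor, which leaves cosine similarities unchanged); tokenization is assumed to lowercase words and drop punctuation, and the function names are hypothetical. Its weights are not guaranteed to reproduce the T1 vector reported for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and drop punctuation.
    return re.findall(r"\w+", text.lower())


def keyword_list(token_lists: list[list[str]]) -> list[str]:
    # Every word of every translation, once, in order of first appearance.
    seen: dict[str, None] = {}
    for tokens in token_lists:
        for tok in tokens:
            seen.setdefault(tok, None)
    return list(seen)


def weight_vectors(translations: list[str]) -> tuple[list[str], list[list[float]]]:
    token_lists = [tokenize(t) for t in translations]
    keywords = keyword_list(token_lists)
    N = len(translations)
    counts = [Counter(tokens) for tokens in token_lists]
    # n_i: number of translations that contain keyword k_i.
    n = {k: sum(1 for c in counts if k in c) for k in keywords}
    vectors = []
    for c in counts:
        max_freq = max(c.values(), default=1)
        # w_ij = f(i,j) * log(1 + N / n_i), with f(i,j) = freq(i,j) / max_l freq(l,j);
        # a Counter returns 0 for keywords absent from this translation.
        vectors.append([(c[k] / max_freq) * math.log(1 + N / n[k]) for k in keywords])
    return keywords, vectors


translations = [
    "Cuál es la capital de la República de la Sur África?",
    "Cuál es entendido ellos de la república de la África del sur?",
    "Cuál es la capital de la República del Sur una Africa?",
    "Cuál es el capital de la república del sur Africa?",
]
keywords, vectors = weight_vectors(translations)
print(keywords)
print([round(w, 2) for w in vectors[0]])
```

With this tokenization the sketch recovers the same 14 keywords, in the same order, as the list given above.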
This yields, for each translation, the vector of weights associated with every keyword; for the first translation:
T1: [1.33, 4.0, 0.62, 1.33, 0.35, 0.93, 0.50, …, 0.30]
Once the vectors have been found, the next step is to calculate the degree of similarity among translations with the following formula:

$$\mathrm{Sim}(t_j, t_q) = \frac{\sum_i w_{i,j}\, w_{i,q}}{\sqrt{\sum_i w_{i,j}^2}\,\sqrt{\sum_i w_{i,q}^2}}$$

The final calculation is performed as follows:
Tran1 = Sim(t_1, t_2) + Sim(t_1, t_3) + Sim(t_1, t_4)
Tran2 = Sim(t_2, t_1) + Sim(t_2, t_3) + Sim(t_2, t_4)
Tran3 = Sim(t_3, t_1) + Sim(t_3, t_2) + Sim(t_3, t_4)
Tran4 = Sim(t_4, t_1) + Sim(t_4, t_2) + Sim(t_4, t_3)
The translation with the highest value is chosen.
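The cosine similarity and the final Tran_j sums can be sketched as follows. The function names and the small toy vectors are hypothetical; in practice the input would be the tf-idf weight vectors of the four translations over the same keyword list.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    # Sim(t_j, t_q) = sum_i w_ij * w_iq / (sqrt(sum_i w_ij^2) * sqrt(sum_i w_iq^2))
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_by_cosine(vectors: list[list[float]]) -> tuple[int, list[float]]:
    # Tran_j = sum over q != j of Sim(t_j, t_q); the highest total is chosen.
    totals = [
        sum(cosine(vj, vq) for q, vq in enumerate(vectors) if q != j)
        for j, vj in enumerate(vectors)
    ]
    best = max(range(len(totals)), key=lambda j: totals[j])
    return best, totals


# Toy weight vectors over three keywords (hypothetical values).
vectors = [
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
best, totals = select_by_cosine(vectors)
print(best, [round(t, 3) for t in totals])
```

Here the first vector shares the most weight with the others and is selected, while the last one, orthogonal to all the others, totals 0.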
Double Translation Method
Every question in Italian is translated into Spanish and then translated back into Italian. Four translators are used, and the translation whose back-translation is most similar to the original question is chosen. With the Dice formula, this method differs from the previous one in the intersection: instead of intersecting the translations with each other, we intersect the original question with each back-translated question. With the cosine formula, the difference from the previous method is that the list of keywords also includes the original question q, whose weights are calculated with this formula:

$$w_{i,q} = \left(0.5 + 0.5\, f(i,q)\right) \cdot \log\left(1 + \frac{N}{n_i}\right)$$
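A minimal sketch of the Double Translation method with the Dice formula, together with the weighting formula for the original question used in the cosine variant. The back-translations below are hypothetical, invented only to exercise the code; the function names are also hypothetical, and giving weight 0 to keywords absent from the question is an assumption not stated in the source.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and drop punctuation (so "dell'abbreviazione" -> "dell", "abbreviazione").
    return re.findall(r"\w+", text.lower())


def dice(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def double_translation_dice(original: str, back_translations: list[str]) -> tuple[int, list[float]]:
    # Each Italian back-translation is intersected with the original Italian
    # question, rather than with the other translations.
    q = set(tokenize(original))
    scores = [dice(q, set(tokenize(b))) for b in back_translations]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


def query_weight(freq: int, max_freq: int, N: int, n_i: int) -> float:
    # w_iq = (0.5 + 0.5 * f(i,q)) * log(1 + N / n_i), for the original question only.
    if freq == 0:
        return 0.0  # Assumption: keywords absent from the question get weight 0.
    f = freq / max_freq
    return (0.5 + 0.5 * f) * math.log(1 + N / n_i)


original = "Che cosa significa la sigla CEE?"
# Hypothetical back-translations, for illustration only.
back = [
    "Che cosa significa la sigla CEE?",
    "Che cosa significa sigle il EEC?",
    "Che cosa significa la CEE dell'abbreviazione?",
    "Che cosa significa la mette la sigla CEE?",
]
best, scores = double_translation_dice(original, back)
print(best, [round(s, 3) for s in scores])
```

The 0.5 + 0.5·f term keeps every keyword that occurs in the original question at no less than half of its idf weight, however rarely it occurs there.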
Results
We translated 450 factual questions from the CLEF 2003 competition with 4 different translators (WC = Word-Count, DT = Double Translation, Cos = cosine).

            WC with Dice   DT with Dice   WC with Cos   DT with Cos
1-Gram      51.33%         46.66%         48.66%        45.77%
2-Grams     51.11%         49.11%         49.33%        48.44%
3-Grams     51.55%         50.22%         50.00%        49.11%

The table shows the percentage of success obtained by applying the techniques previously explained to the outputs of the different translators, for each n-gram size up to 3-grams.
Results by question category (Date, Person, Organization, Location, Measure):
WcDice1-G: 46% 59% 58%
WcDice2-G: 58%
DDice2-G: 61%
DDice3-G: 61% 64%
DCos3-G: 61%
Baseline: 70% 64% 42% 72% 40%
The table shows the percentage of success for each question category.
Conclusions and Further Work
A prototype for translation in a Multilingual QA System was proposed. We observed that some translators produce bad translations, probably because two of the translators need an intermediate translation into English to obtain the final Spanish translation. In some cases bad redundancy penalizes the selection of the best translation. The machine translator that obtained the best results is PowerTranslatorPro (55.33%). This baseline was better than our best result (51.55%), which was obtained with the Word-Count method. Nevertheless, the preliminary results seem promising: an optimal combination of the Word-Count and Double Translation methods could increase the percentage of success, and we estimate that an increase of up to approximately 20% should be possible, since the two methods do not make the same choices. Further experiments are needed to improve the quality of the translations, and the use of other translators is foreseen. We also need further experiments with other sets of factual questions to compare against these preliminary results.