Friday, June 12, 2009

Machine translation may have a way to go...

For some reason the Japanese children's song about goat pen-pals came to mind. I couldn't completely recall the lyrics, but I got a few of the words: black goat, white goat, letter, ate... and Google helped me out. If your display looks like mine, quite a few of the search results show a "Translate this page" link on the right, and just for kicks I tried one.

The original page is here:; what you see basically in the middle is the title "やぎさんゆうびん" -- yagi-san (goat) yuubin (post, as in postal service). Then come the lines of the song:
♪ しろやぎさんから おてがみついた
♪ くろやぎさんたら よまずにたべた
♪ しかたがないので おてがみかいた
♪ さっきのてがみの ごようじなぁに
Basically, a letter from the white goat came to the black goat, who ate it without reading it. Nothing for it but to write a letter back: "What was your last letter about?" You can see where this is going, right? The white goat gets the letter from the black goat, eats it rather than reading, and infinite loop....

Now check out the translation. Japanese is usually written without spaces between words; in the song above, the only breaks fall between phrases and at the ends of lines. I think this is what makes it really hard for the translation software to parse.
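To get a feel for why missing word breaks hurt, here is a toy sketch in Python. It is not how Google's translator works (I have no idea what it does internally); it just lists every way to split one phrase from the song into words, given a tiny lexicon I made up for illustration. The function name `segmentations` and the contents of `LEXICON` are my own assumptions.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to split the empty string: no words at all
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and split whatever is left, recursively.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical mini-lexicon, invented for this example only.
LEXICON = {
    "お",        # honorific prefix
    "てがみ",    # letter
    "おてがみ",  # letter (polite form)
    "ついた",    # arrived
    "つい",      # unintentionally
    "た",        # past-tense ending
}

phrase = "おてがみついた"  # "a letter arrived", from the first line of the song
candidates = segmentations(phrase, LEXICON)
for words in candidates:
    print(" | ".join(words))
```

Even with six dictionary entries, this seven-character phrase splits four different ways. A real system has to pick one split using a far bigger dictionary and some notion of which reading is likely, and with everything written in hiragana there is no kanji to help it decide.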

Still, the translation has its farcical side:
♪ MITSUITA to us from the white goat TABETA to YOMAZU GISANTARA
♪ ♪ KAITA there are black and you can not stand it so I stick ♪ The earlier you have it
I'd guess that English, French, or Chinese is easier for translation software to handle than Japanese. Just a guess though.

