Jack Halpern

The Wired article "Me Translate Pretty One Day" said:
Spanish to English? French to Russian? Computers haven’t been up to the task. But a New York firm with an ingenious algorithm and a really big dictionary is finally cracking the code.
Making this work effectively requires a dictionary that includes all of the possible conjugations and variations for every word. Declaramos, for example, offers up “declare”, “declared”, “declaring”, “stating”, and “testifying”, among others. Meaningful Machines’ Spanish-to-English dictionary, a database with about 2 million entries (20 times more than a standard Merriam-Webster’s), is a lexical feat in and of itself. The company outsourced the task to an institute run by Jack Halpern, a prominent lexicographer. The result is one of the largest bilingual dictionaries in the world.
The options spit out by the dictionary for each chunk of text can number in the thousands, many of which are gibberish. To determine the most coherent candidates, the system scans the 150 Gbytes of English text, ranking candidates by how many times they appear. The more often they’ve actually been used by an English speaker, the more likely they are to be a correct translation. “We declare our responsibility for what has occurred” is more likely to appear than, say, “responsibility of which it has happened.”
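The candidate-generation and frequency-ranking idea in the excerpt can be illustrated with a minimal Python sketch. This is not Meaningful Machines' actual implementation. The toy dictionary entries, the three-sentence corpus, and the function names `candidate_translations`, `count_ngrams` and `rank_candidates` are all hypothetical. The sketch assumes each Spanish word maps independently to a few English options, and it scores a candidate only by exact n-gram matches in a small in-memory corpus rather than in 150 GB of text.

```python
from collections import Counter
from itertools import product

# Hypothetical toy bilingual dictionary: each Spanish word form, including
# conjugated forms such as "declaramos", maps to several English options.
TOY_DICTIONARY = {
    "declaramos": ["we declare", "we state", "we testify"],
    "nuestra": ["our"],
    "responsabilidad": ["responsibility", "liability"],
}


def candidate_translations(chunk, dictionary):
    """Build every combination of the English options for each word.

    Raises KeyError if a word is missing from the (toy) dictionary.
    """
    options = [dictionary[word] for word in chunk.split()]
    return [" ".join(combo) for combo in product(*options)]


def count_ngrams(corpus_sentences, max_n):
    """Count every lowercase word n-gram (1 <= n <= max_n) in the corpus."""
    counts = Counter()
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def rank_candidates(candidates, ngram_counts):
    """Sort candidates by how often they occur verbatim in the corpus.

    Unseen candidates count as 0; ties keep their original order.
    """
    return sorted(candidates, key=lambda c: ngram_counts[c], reverse=True)


if __name__ == "__main__":
    # Hypothetical stand-in for a large English corpus.
    corpus = [
        "We declare our responsibility for what has occurred",
        "Today we declare our responsibility to the public",
        "We state our position clearly",
    ]
    candidates = candidate_translations(
        "declaramos nuestra responsabilidad", TOY_DICTIONARY
    )
    counts = count_ngrams(corpus, max_n=4)
    for candidate in rank_candidates(candidates, counts):
        print(counts[candidate], candidate)
```

Because every combination of options is generated, the number of candidates is the product of the per-word option counts (3 × 1 × 2 = 6 here). That multiplication is why real inputs can yield thousands of candidates. Ranking by corpus frequency then moves the phrasing that English speakers actually use to the top.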
Jack Halpern is the CEO of The CJK Dictionary Institute (CJKI) in Japan. The institute, one of the world's prime sources of CJK (Chinese, Japanese and Korean) dictionaries, specializes in compiling comprehensive CJK and Arabic lexical databases for the NLP industry. A lexicographer by profession, he spent sixteen years compiling the Dictionary and, as a research fellow at Showa Women's University (Tokyo), was editor-in-chief of several kanji dictionaries for learners that have become standard reference works.
Jack, who has lived in Japan for over 30 years, was born in Germany and has lived in six countries, including France, Brazil and the United States. An avid polyglot who specializes in Japanese and Chinese lexicography, he has studied 14 languages (and is fluent in nine) and has devoted several decades to the study of languages and lexicography. Founder and director of the Japan Yiddish Club, he has taught his native Yiddish to Japanese students. His favorite language is Brazilian Portuguese, which he speaks fluently and considers "God's mother tongue".
He has published over twenty books and dozens of articles and academic papers, mostly on the Japanese writing system and CJK information processing, has given over 600 public lectures on Japanese language and culture, and presented several dozen papers at international conferences. His books include The Kodansha Kanji Learners Dictionary (Japanese for Busy People) and NTC’s New Japanese–English Character Dictionary.
On a lighter note, Jack loves the sport of unicycling. Founder and long-time president of the International Unicycling Federation, he has promoted the sport worldwide for some 30 years, introduced unicycling as a sport in Japan, and is a director of the Japan Unicycling Association. Currently, his passions are playing the quena and improving his Chinese, Esperanto and Arabic.