Jack Halpern
The Wired article “Me Translate Pretty One Day” said:
Spanish to English? French to Russian? Computers haven’t been up to the task. But a New York firm with an ingenious algorithm and a really big dictionary is finally cracking the code.
Making this work effectively requires a dictionary that includes all of the possible conjugations and variations for every word. Declaramos, for example, offers up “declare”, “declared”, “declaring”, “stating”, and “testifying”, among others. Meaningful Machines’ Spanish-to-English dictionary, a database with about 2 million entries (20 times more than a standard Merriam-Webster’s), is a lexical feat in and of itself. The company outsourced the task to an institute run by Jack Halpern, a prominent lexicographer. The result is one of the largest bilingual dictionaries in the world.
The options spit out by the dictionary for each chunk of text can number in the thousands, many of which are gibberish. To determine the most coherent candidates, the system scans the 150 Gbytes of English text, ranking candidates by how many times they appear. The more often they’ve actually been used by an English speaker, the more likely they are to be a correct translation. “We declare our responsibility for what has occurred” is more likely to appear than, say, “responsibility of which it has happened.”
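The frequency-ranking step in the excerpt can be illustrated with a minimal sketch: count how often each candidate phrase occurs verbatim in an English corpus, then sort the candidates by that count. This assumes simple lowercase word tokenization and exact contiguous phrase matching over a tiny in-memory corpus. The helper names `tokenize`, `count_phrase` and `rank_candidates` are hypothetical, and nothing here reflects Meaningful Machines’ actual implementation, which the article does not describe in detail.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def count_phrase(corpus_tokens, phrase_tokens):
    # Count exact, contiguous (possibly overlapping) occurrences of the phrase.
    n = len(phrase_tokens)
    if n == 0:
        return 0
    return sum(
        1
        for i in range(len(corpus_tokens) - n + 1)
        if corpus_tokens[i:i + n] == phrase_tokens
    )


def rank_candidates(candidates, corpus_text):
    # Score each candidate by its raw frequency in the corpus, most frequent first.
    corpus_tokens = tokenize(corpus_text)
    scored = [(c, count_phrase(corpus_tokens, tokenize(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy corpus standing in for the much larger body of English text.
    corpus = ("We declare our responsibility for what has occurred. "
              "Later, we declare our responsibility again.")
    candidates = [
        "responsibility of which it has happened",
        "we declare our responsibility",
    ]
    for phrase, count in rank_candidates(candidates, corpus):
        print(count, phrase)
```

Raw counts naturally favor shorter and more common phrases, so a practical system would likely need some normalization or a comparison among candidates of similar length; the excerpt does not say how this is handled.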
Jack Halpern is the CEO of The CJK Dictionary Institute (CJKI) in
Japan. The institute, one of the world’s prime sources of CJK
(Chinese, Japanese and Korean) dictionaries, specializes in the
compilation of comprehensive CJK and Arabic lexical databases for the
NLP industry. A lexicographer by profession, he spent sixteen years
compiling the New Japanese–English Character Dictionary and, as a
research fellow at Showa Women’s University (Tokyo), was
editor-in-chief of several kanji dictionaries for learners that have
become standard reference works.
Jack, who has lived in Japan for over 30 years, was born in Germany
and has lived in six countries, including France, Brazil and the
United States. An avid polyglot who specializes in Japanese and
Chinese lexicography, he has studied 14 languages (and is fluent in
nine) and has devoted several decades to the study of languages and
lexicography.
Founder and director of the Japan Yiddish Club, he has taught his
native Yiddish to Japanese students. His favorite language is
Brazilian Portuguese, which he speaks fluently and considers “God’s
mother tongue”.
He has published over twenty books and dozens of articles and
academic papers, mostly on the Japanese writing system and CJK
information processing. He has also given over 600 public lectures on
Japanese language and culture and has presented several dozen papers
at international conferences.
His books include The Kodansha Kanji Learner’s Dictionary (Japanese
for Busy People) and NTC’s New Japanese–English Character Dictionary.
On a lighter note, Jack loves the sport of unicycling. Founder and
long-time president of the International Unicycling Federation, he
has promoted the sport worldwide for some 30 years, introduced
unicycling to Japan, and is a director of the Japan Unicycling
Association. Currently, his passion is playing the quena and
improving his Chinese, Esperanto and Arabic.