Toan Q. Nguyen, David Chiang

We present a simple method to improve neural translation of a low-resource language pair using parallel data from a related, also low-resource, language pair. The method is based on the transfer method of Zoph et al., but whereas their method ignores any source vocabulary overlap, ours exploits it. First, we split words using Byte Pair Encoding (BPE) to increase vocabulary overlap. Then, we train a model on the first language pair and transfer its parameters, including its source word embeddings, to another model and continue training on the second language pair. Our experiments show that transfer learning helps word-based translation only slightly, but when used on top of a much stronger BPE baseline, it yields larger improvements of up to 4.3 BLEU.
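The BPE step referred to above follows the standard merge-learning algorithm of Sennrich et al.: starting from characters, repeatedly merge the most frequent adjacent symbol pair. A minimal sketch (illustrative only; the function names and the toy frequency dictionary are our own, not from the paper):

```python
import re
from collections import Counter

def get_pair_stats(vocab):
    # Count adjacent symbol pairs, weighted by word frequency.
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for i in range(len(symbols) - 1):
            pairs[(symbols[i], symbols[i + 1])] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace every occurrence of the pair with its concatenation.
    bigram = re.escape(' '.join(pair))
    pattern = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
    return {pattern.sub(''.join(pair), word): freq
            for word, freq in vocab.items()}

def learn_bpe(word_freqs, num_merges):
    # word_freqs: word -> corpus frequency; words start as character sequences
    # with an end-of-word marker </w>.
    vocab = {' '.join(w) + ' </w>': f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_stats(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

merges, vocab = learn_bpe({'low': 5, 'lower': 2}, num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
```

Because related languages share cognates, learning BPE merges like these tends to produce subword units common to both source languages, which is what lets the transferred source embeddings carry over.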