In Machine Learning, the traditional assumption is that the data our model is applied to is the same as the data we used for training. This assumption is proven false as soon as we move into the real world: many of the data sources we encounter will be very different than our original training data (same meaning here that it comes from the same distribution). In practice, this causes the performance of our model to deteriorate significantly.

Domain adaptation is a prominent approach to transfer learning that can help to bridge this discrepancy between the training and test data. Domain adaptation methods typically seek to identify features that are shared between the domains or learn representations that are general enough to be useful for both domains. In this blog post, I will discuss the motivation for, and the findings of the recent paper that I published with Barbara Planck. In it, we outline a complementary approach to domain adaptation – rather than learning a model that can adapt between the domains, we will learn to select data that is useful for training our model.