Workshop
Domain Adaptation Workshop: Theory and Application
John Blitzer · Corinna Cortes · Afshin Rostamizadeh
Melia Sierra Nevada: Monachil
A common assumption in theoretical models of learning, such as the standard PAC model [20], as well as in the design of learning algorithms, is that training instances are drawn from the same distribution as the unseen test examples. In practice, however, there are many cases where this assumption does not hold. There can be no hope for generalization, of course, when the training and test distributions differ vastly, but when they are less dissimilar, learning can succeed. The main theme of this workshop is the theoretical, algorithmic, and empirical analysis of such cases, where there is a mismatch between the training and test distributions.

This includes the crucial scenario of domain adaptation, where the training examples are drawn from a source domain distinct from the target domain from which the test examples are extracted, and the more general scenario of multiple-source adaptation, where training instances may have been collected from several source domains, all distinct from the target [13]. The topic of the workshop also covers other important problems, such as that of sample bias correction, and has tight connections with problems such as active learning, where the distribution corresponding to the learner's labeling requests differs from the target distribution. Many other intermediate problems and scenarios appear in practice, all of which fall within the scope of this workshop.

These problems are critical and arise in almost all real-world applications of machine learning, including natural language processing [7, 2, 10, 4, 6], speech processing [11, 8, 17, 19, 9, 18], computer vision [15], and many other areas; ignoring them can lead to dramatically poor results. Straightforward existing solutions based on importance weighting are not always successful [5]. Which algorithms should be used for domain adaptation? Under what theoretical conditions will they succeed? How do these algorithms scale to large domain adaptation problems? These are some of the questions the workshop aims to address.
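To make the importance-weighting baseline mentioned above concrete, below is a minimal sketch of one standard instantiation: the ratio p_target(x) / p_source(x) is estimated with a probabilistic classifier trained to distinguish source from target inputs, and the source sample is reweighted accordingly before training. The toy Gaussian data, the scikit-learn API, and the logistic-regression ratio estimator are illustrative assumptions on our part, not prescriptions from the works cited above.

# Minimal sketch of importance weighting for covariate shift
# (illustrative only; assumes scikit-learn and toy Gaussian data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Labeled source sample and unlabeled target sample from shifted Gaussians.
X_src = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(loc=1.0, scale=1.0, size=(1000, 2))

# 1) Train a domain classifier: label 0 = source, 1 = target.
X_dom = np.vstack([X_src, X_tgt])
d_dom = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
domain_clf = LogisticRegression().fit(X_dom, d_dom)

# 2) With equal sample sizes, P(target | x) / P(source | x) estimates
#    the density ratio p_target(x) / p_source(x).
p_tgt = domain_clf.predict_proba(X_src)[:, 1]
weights = p_tgt / (1.0 - p_tgt)

# 3) Fit the task classifier on source data, reweighted toward the target.
task_clf = LogisticRegression().fit(X_src, y_src, sample_weight=weights)

As the analysis in [5] makes precise, the reliability of this estimator degrades when the weights have a large second moment, as happens when the two domains differ substantially; this is one reason such straightforward solutions are not always successful.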
The empirical performance of domain adaptation in these applications, the design of new and effective algorithms, and the creation of a solid theoretical framework for domain adaptation, as initiated by recent work [1, 13, 12, 14, 5], are all challenging objectives for this workshop. By bringing together experts on all aspects of this problem, we aim to foster collaboration and progress in this field.
Goals:
Despite recent advances in domain adaptation, many of the most successful practical approaches [3, 16, 21] have not proven robust, in part because the formal assumptions under which they can be expected to perform well have not been identified. At the same time, some of the most influential theoretical work guarantees near-optimal performance in new domains, but under assumptions that may not hold in practice [1, 12, 13].
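One concrete example of such an assumption is that the source and target domains be close under a suitable divergence: the bounds of [1] are stated in terms of a distance between the two unlabeled distributions that can be approximated from data. The sketch below estimates a proxy for the A-distance of [1] with a held-out domain classifier; the choice of scikit-learn, logistic regression, and toy data are our own illustrative assumptions.

# Hedged sketch: proxy A-distance between source and target samples,
# in the spirit of [1]. A held-out domain classifier that cannot
# separate the domains yields a distance near 0; perfect separation
# yields a distance near 2.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(X_src, X_tgt):
    X = np.vstack([X_src, X_tgt])
    d = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
    X_tr, X_te, d_tr, d_te = train_test_split(X, d, test_size=0.5, random_state=0)
    err = 1.0 - LogisticRegression().fit(X_tr, d_tr).score(X_te, d_te)
    return 2.0 * (1.0 - 2.0 * err)

rng = np.random.default_rng(0)
print(proxy_a_distance(rng.normal(0, 1, (500, 2)), rng.normal(2, 1, (500, 2))))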
Our workshop will bridge theory and practice in the following ways:
1. We will have one applied and two theoretical invited talks.
2. We will advertise the workshop to both the applied and theoretical communities.
3. We will hold discussion sessions that emphasize both the formal assumptions underlying successful practical algorithms and new algorithms grounded in theoretical foundations.
Workshop attendees should come away with an understanding of the domain adaptation problem, how it appears in practical applications, and the theoretical guarantees that can be provided in this more general setting. More importantly, attendees will be exposed to the important open problems of the field, which we expect to spur new collaborations and results.
References:
[1] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems (NIPS 2006), 2007.
[2] J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In ACL 2007, 2007.
[3] J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. In Conference on Empirical Methods in Natural Language Processing, Sydney, Australia, 2006.
[4] C. Chelba and A. Acero. Adaptation of maximum entropy capitalizer: Little data can help a lot. Computer Speech & Language, 20(4):382-399, 2006.
[5] C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In Advances in Neural Information Processing Systems (NIPS 2010), Vancouver, Canada, 2010. MIT Press.
[6] H. Daumé III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101-126, 2006.
[7] M. Dredze, J. Blitzer, P. P. Talukdar, K. Ganchev, J. Graca, and F. Pereira. Frustratingly Hard Domain Adaptation for Parsing. In CoNLL 2007, 2007.
[8] J.-L. Gauvain and C.-H. Lee. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing, 2(2):291-298, 1994.
[9] F. Jelinek. Statistical Methods for Speech Recognition. The MIT Press, 1998.
[10] J. Jiang and C. Zhai. Instance Weighting for Domain Adaptation in NLP. In Proceedings of ACL 2007, pages 264-271. Association for Computational Linguistics, 2007.
[11] C. J. Leggetter and P. C. Woodland. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, pages 171-185, 1995.
[12] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation: Learning bounds and algorithms. Conference on Learning Theory, 2009.
[13] Y. Mansour, M. Mohri, and A. Rostamizadeh. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems (NIPS 2008), pages 1041-1048, Vancouver, Canada, 2009. MIT Press.
[14] Y. Mansour, M. Mohri, and A. Rostamizadeh. Multiple source adaptation and the Rényi divergence. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI 2009), Montréal, Canada, June 2009.
[15] A. M. Martínez. Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class. IEEE Trans. Pattern Anal. Mach. Intell., 24(6):748-763, 2002.
[16] D. McClosky, E. Charniak, and M. Johnson. Reranking and self-training for parser adaptation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 337-344. Association for Computational Linguistics, 2006.
[17] S. Della Pietra, V. Della Pietra, R. L. Mercer, and S. Roukos. Adaptive language modeling using minimum discriminant estimation. In HLT '91: Proceedings of the Workshop on Speech and Natural Language, pages 103-106, 1992.
[18] B. Roark and M. Bacchiani. Supervised and unsupervised PCFG adaptation to novel domains. In Proceedings of HLT-NAACL, 2003.
[19] R. Rosenfeld. A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer Speech and Language, 10:187-228, 1996.
[20] L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, 1984.
[21] G. Xue, W. Dai, Q. Yang, and Y. Yu. Topic-bridged PLSA for cross-domain text classification. In SIGIR, 2008.