Scheme: University Research Fellowship
Organisation: University of Cambridge
Dates: Oct 2010-Jul 2014
Summary: I conduct research on Natural Language Processing (NLP), a growing field of computer science which develops (i) key technologies for analysing, understanding and generating human language and (ii) robust text-based applications for human use (e.g. text mining, text summarisation, question answering, machine translation). NLP is a particularly interesting and timely field to work in. Due to the growing problem of information overload, there is great demand for NLP-based applications in many areas of society (e.g. communication, healthcare, science). After decades of research, basic techniques are now sufficiently developed to be integrated into practical applications. Yet improving them further for demanding real-world applications remains a major challenge.
My work focuses on the development of novel ideas, techniques and tools for NLP, as well as on applying them to benefit important application tasks. My main area of interest is automatic lexical acquisition. Because successful language engineering requires accurate knowledge about words, high-quality lexical resources (e.g. dictionaries) are essential. Currently, most lexical resources are developed manually. This is costly, and the resulting resources require extensive labour-intensive porting to new tasks. Automatic acquisition of lexical information from repositories of text (corpora) is a more promising avenue to pursue. It is now viable, cost-effective and can significantly improve the performance and portability of systems. While advances have recently been made in many areas of automatic lexical acquisition, few current techniques are ready for real-world use. Most techniques rely heavily on supervision and the availability of expensive, manually annotated datasets, or are unsupervised but suffer from low accuracy or limited scope.
My research focuses on improving the accuracy, portability and scalability of lexical acquisition techniques so that they can be used to support practical application tasks. My aim is to develop unsupervised or lightly supervised approaches that integrate relevant insights from theoretical linguistics and are capable of inducing large-scale lexical data from general, domain-specific and multilingual texts. I demonstrate the usefulness of these techniques in the context of practical application tasks and use them to support relevant research in the cognitive sciences, e.g. child language acquisition and human language processing.
Dates: Oct 2005-Sep 2010
Summary: This project summary is not available for publication.