This site contains information and resources related to Andrew Lampert's email text segmentation and classification research.
We have built systems that classify Requests and Commitments in Email at a range of granularities: the message, paragraph and sentence levels. The corpus we make available here can be used to create corpora at each of these levels, based on a variety of agreement metrics. The approach is discussed in detail in Andrew Lampert's thesis, which is due for submission at the end of July 2013.
More detail will be available once the thesis has been examined.
In the meantime, we have made available the Annotated Dataset, which contains 1000 email messages from the Enron email corpus, each marked up independently by two annotators.
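As a rough illustration of how two annotators' labels might be compared and filtered by agreement, the following Python sketch computes Cohen's kappa and keeps only the items on which both annotators agree. It is not the method used in our systems or in the thesis: the function names, the boolean "contains a request" labels, and the rule that a message counts as a request when any of its sentences does are all assumptions made for the example, and the sketch does not read the actual dataset files.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Proportion of items on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


def agreed_items(labels_a, labels_b):
    """Return (index, label) pairs for items where both annotators agree."""
    return [(i, a) for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a == b]


def message_level(sentence_labels):
    """Hypothetical roll-up: a message is positive if any sentence is."""
    return any(sentence_labels)


# Made-up sentence-level labels (True = sentence contains a request).
annotator_a = [True, False, True, False]
annotator_b = [True, False, False, False]

kappa = cohen_kappa(annotator_a, annotator_b)
gold = agreed_items(annotator_a, annotator_b)
print(kappa, gold, message_level(annotator_a))
```

Other agreement criteria, such as keeping only unanimous messages or adjudicating disagreements, could be substituted for `agreed_items` in the same way.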
Our annotated data is licensed under a Creative Commons Attribution-NonCommercial 2.0 Generic License.
If you make use of any of these resources, please cite one of the following papers:
Andrew Lampert, Robert Dale and Cécile Paris (2010) - Detecting emails containing requests for action, In Proceedings of Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL 2010), pp. 984-992, Los Angeles, USA.
Andrew Lampert, Robert Dale and Cécile Paris (2008) - Requests and Commitments in Email are More Complex Than You Think: Eight Reasons to be Cautious, In Proceedings of the Australasian Language Technology Association Workshop, pp. 64-72, Hobart, Australia.
Andrew Lampert, Robert Dale and Cécile Paris (2008) - The Nature of Requests and Commitments in Email Messages, In Proceedings of EMAIL-08: the AAAI Workshop on Enhanced Messaging, pp. 42-47, Chicago, USA.