
Overview
This analysis relies on a process known as entity extraction, entity identification, or named entity recognition, which locates small elements in text and classifies them into predefined categories; in this case, dates. The OpenNLP library runs the text through a series of steps: tokenization, part-of-speech tagging, and entity extraction. The extraction is automated, and each date entity is kept together with the sentence in which it occurs. The date entities and their sentences are then displayed in Simile Timeline, so a researcher can review the sentences that mention dates by browsing a timeline.
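The flow described above (split the text into sentences, find date entities, keep each date with its sentence) can be illustrated with a minimal Java sketch. It does not use OpenNLP: to stay runnable without model files, the standard library's BreakIterator and a small regular expression stand in for OpenNLP's sentence detection and entity extraction, and they recognize far fewer date forms than a trained model would. The names DateTimelineSketch, Event, extract, and DATE, as well as the sample text, are made up for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateTimelineSketch {

    /** One timeline entry: a date string and the sentence it was found in. */
    static final class Event {
        final String date;
        final String sentence;

        Event(String date, String sentence) {
            this.date = date;
            this.sentence = sentence;
        }
    }

    // Assumption: a regex stand-in for a trained date model. It only matches
    // dates written like "March 4, 1865" or like "1865-04-09".
    private static final Pattern DATE = Pattern.compile(
        "\\b(?:January|February|March|April|May|June|July|August|September"
        + "|October|November|December) \\d{1,2}, \\d{4}\\b"
        + "|\\b\\d{4}-\\d{2}-\\d{2}\\b");

    /** Splits the text into sentences and collects every date with its sentence. */
    static List<Event> extract(String text) {
        List<Event> events = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.US);
        sentences.setText(text);
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            String sentence = text.substring(start, end).trim();
            Matcher m = DATE.matcher(sentence);
            while (m.find()) {
                events.add(new Event(m.group(), sentence));
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Made-up sample text; the middle sentence has no date and is skipped.
        String text = "On March 4, 1865 the committee met in the hall. "
            + "No minutes were kept. "
            + "A follow-up meeting was held on 1865-04-09 in the same room.";
        for (Event e : extract(text)) {
            System.out.println(e.date + " | " + e.sentence);
        }
    }
}
```

Each Event pairs a date with its full sentence, which is the information a timeline entry needs; turning these pairs into the input that Simile Timeline expects is left out of the sketch.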
References
- OpenNLP – http://opennlp.sourceforge.net
- Simile Timeline – http://code.google.com/p/simile-widgets/
