
Tag Cloud Viewer With Stemming
Posted by admin on May 31 2010 04:22:42pm
This flow is similar to the "Tag Cloud Viewer" flow but adds the components needed to perform stemming. It displays a tag cloud for a URL that points to a text document. For PDF, the text is extracted from the PDF file and a tag cloud is created from its words. For HTML, the tags are removed and a tag cloud is created from the remaining words.
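
The HTML case can be illustrated with a short Python sketch. This is not the flow's own component; it only assumes that "removing the tags" means keeping the text nodes and discarding markup, and it also skips script and style content, which is an extra assumption. The names TextExtractor and html_to_text are made up for this example. PDF extraction is not shown because it needs a third-party library.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Assumption: script and style content is not document text.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse runs of whitespace.
        # Note: this splits a word that is broken up by inline tags.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(markup):
    """Return the visible text of an HTML string with tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(html_to_text("<p>Tag <b>clouds</b> rock</p>"))
```

The resulting plain text is what the later word-counting steps would work on.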

This flow applies stemming to the words to build a dictionary of the terms used, then reverse-stems each stem to the shortest word in the document that maps to it, and uses that word in the tag cloud visualization. For example, "run", "runs", and "running" all map to the stem "run", so the word shown in the tag cloud is "run". If the word "run" did not appear in the document, the shortest word in the dictionary with that stem would be used instead, in this case "runs".
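
The stem-then-reverse-stem idea can be sketched in Python as follows. The stemmer here, toy_stem, is a deliberately crude stand-in written for this example; the stemming algorithm the flow actually uses is not stated above, so treat it as an assumption. The function name reverse_stem_counts and the alphabetical tie-break between equally short words are also illustrative choices.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo a doubled final letter, e.g. "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def reverse_stem_counts(words):
    """Count words per stem, labelling each stem with its shortest seen form."""
    forms_by_stem = defaultdict(list)
    for word in words:
        forms_by_stem[toy_stem(word)].append(word.lower())

    counts = {}
    for forms in forms_by_stem.values():
        # Shortest word that actually occurred; ties broken alphabetically.
        label = min(set(forms), key=lambda form: (len(form), form))
        counts[label] = len(forms)
    return counts


if __name__ == "__main__":
    print(reverse_stem_counts(["run", "runs", "running"]))
    print(reverse_stem_counts(["runs", "running"]))
```

With all three forms present, the single entry is labelled "run"; when "run" is missing, the label falls back to "runs", matching the behaviour described above.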
 
Flow URI: http://seasr.org/flows/tag-cloud-viewer-with-stemming/
Location URI: http://repository.seasr.org/Meandre/Locations/1.4.8/Flows/tag-cloud-viewer-with-stemming/repository.rdf

Overview

Tag clouds have become a popular way to summarize text documents. This flow retrieves a document from a URL and creates a tag cloud from its words. It filters out common words and displays the top 100 remaining words.
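
As a rough illustration of the filtering and top-100 step, here is a minimal Python sketch. It assumes that "top" means most frequent and that common words are removed with a stop-word list; the short STOP_WORDS set and the top_terms function are invented for this example and are not the list or component the flow uses.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def top_terms(text, limit=100, stop_words=STOP_WORDS):
    """Return up to `limit` (word, count) pairs, most frequent first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token not in stop_words)
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = "The cat and the hat and the cat"
    print(top_terms(sample, limit=2))
```

A tag cloud renderer would then scale each word's font size by its count.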

