| Source: | http://repository.seasr.org/Meandre/Repositories/Demo-Flows/Clustering/repository.rdf |
| URI: | http://seasr.org/flows/clustering/ |
| Name: | Clustering |
| Creator: | admin |
| Date: | 2008-10-31 (14:07:37) |
| Rights: | UofI/NCSA |
| Tags: | agglomerative, discovery, dendrogram, hierarchical, cluster |
| Description: | This flow loads a delimited data set into a table. The first row has attribute labels and the second row has attribute types. This flow does a bottom-up clustering of a simple data set and displays the ‘tree’ representation of how the clusters were nested together (called a Dendrogram). The default data set is the ‘iris’ data set, which relates basic measurements of flowers to their type. When the flow is run, you will be prompted for ‘input’ and ‘output’ attributes. Select all measurements (everything but the last attribute) as ‘inputs’ and the last attribute, ‘class’, as an ‘output’. After submitting the Attributes, you should see a red and green visualization of how the clustering model algorithm grouped the data. Clicking on one of these segments will bring up the raw data values of records that were put into that cluster. In general, the smaller clusters lower in the tree should have less and less diversity of flower class, while the top cluster will have the entire data set inside (and therefore the most diversity). Any numeric data set in a csv format can be processed by this flow. Simply put the data file’s location in the ‘Input URL or Path’ prior to executing the flow. For data files on the machine running the Meandre Infrastructure server, use the syntax: ‘file:///myDir/myFile.csv’, where the data file is ‘/myDir/myFile.csv’. |
Overview
Clustering is an unsupervised learning approach that seeks to discover groups of records with similar attribute values. This flow performs a bottom-up (agglomerative) clustering of a simple data set and displays a tree, called a dendrogram, that shows how the clusters were nested together.
The default data set is the “iris” data set, which relates basic measurements of flowers to their type. When the flow is run, you will be prompted for “input” and “output” attributes. Select all measurements (everything but the last attribute) as “input” attributes and the last attribute, “class”, as the “output” attribute.
After submitting the attributes, you should see a red and green visualization of how the clustering algorithm grouped the data. Clicking on one of the segments brings up the raw data values of the records that were placed in that cluster. In general, the smaller clusters lower in the tree should show progressively less diversity of flower class, while the top cluster contains the entire data set and therefore the most diversity.
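The bottom-up idea described above can be illustrated with a minimal sketch. It is not the flow's own implementation: the function name `agglomerate`, the choice of average linkage with Euclidean distance, and the four sample records are all assumptions made for illustration. Each record starts as its own cluster, and the two closest clusters are merged repeatedly until one cluster holding the entire data set remains.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Merge the two closest clusters repeatedly and return (tree, members).

    Leaves of the tree are row indices; internal nodes are (left, right) tuples.
    Average linkage with Euclidean distance is an assumption of this sketch.
    """
    # Each cluster is (tree, member_indices); start with one cluster per record.
    clusters = [(i, [i]) for i in range(len(points))]

    def avg_distance(a, b):
        # Mean distance over every pair of records drawn from the two clusters.
        pairs = [(i, j) for i in a[1] for j in b[1]]
        return sum(dist(points[i], points[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and replace them with their union.
        a, b = min(combinations(clusters, 2), key=lambda pair: avg_distance(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0]


# Four hypothetical records with two measurements each (not real iris rows).
rows = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9)]
tree, members = agglomerate(rows)
print(tree)             # nested merge order, i.e. the dendrogram structure
print(sorted(members))  # the top cluster contains every record
```

The nested tuples mirror the dendrogram: inner tuples are the small clusters formed early, and the outermost tuple is the top cluster.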
Application
Resulting Visualization
Other Applications
Detailed Description
View of Flow
Data Type Restrictions
Data Handling
Any numeric data set in CSV format can be processed by this flow. Put the data file’s location in the “Input URL or Path” field before executing the flow. For data files on the machine running the Meandre Infrastructure server, use the syntax “file:///myDir/myFile.csv”, where the data file is “/myDir/myFile.csv”.
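As a rough illustration of the expected file layout and URL syntax, the sketch below reads a small delimited text whose first row holds attribute labels and whose second row holds attribute types, then builds a “file://” URL from an absolute path. The helper names `read_typed_table` and `to_file_url`, the sample column names, and the type names (“double”, “string”) are placeholders assumed for this example, not values confirmed by the flow.

```python
import csv
import io
from urllib.parse import quote


def read_typed_table(text):
    """Split delimited text into (labels, types, data_rows)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    # Row 1: attribute labels. Row 2: attribute types. Remaining rows: records.
    return rows[0], rows[1], rows[2:]


def to_file_url(absolute_path):
    """Turn an absolute POSIX path into a file:// URL."""
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute path such as /myDir/myFile.csv")
    return "file://" + quote(absolute_path)


# Hypothetical file content; the type names are placeholders.
sample = (
    "sepal_length,sepal_width,class\n"
    "double,double,string\n"
    "5.1,3.5,setosa\n"
    "4.9,3.0,setosa\n"
)
labels, types, data = read_typed_table(sample)

# Everything but the last attribute is an input; the last one is the output.
inputs, output = labels[:-1], labels[-1]

file_url = to_file_url("/myDir/myFile.csv")
print(inputs, output, file_url)
```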


