<HTML>
<HEAD>
<TITLE>Clustering</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<A href="index.html">Home</A> ·
<A href="classes.html">Classes</A> ·
<A href="annotated.html">Annotated Classes</A> ·
<A href="modules.html">Modules</A> ·
<A href="functions_func.html">Members</A> ·
<A href="namespaces.html">Namespaces</A> ·
<A href="pages.html">Related Pages</A>
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<!-- Generated by Doxygen 1.8.5 -->
<div class="header">
<div class="headertitle">
<div class="title">Clustering </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>In <a class="el" href="namespaceOpenMS.html" title="Main OpenMS namespace. ">OpenMS</a>, generic hierarchical clustering is available. The example (Tutorial_Clustering.C) shows how to build a rudimentary clustering pipeline.</p>
<h1><a class="anchor" id="Inputdata"></a>
Input data</h1>
<p>Any type of data can be clustered, as long as a SimilarityComparator for that type is provided. The comparator's <em>operator()</em> has to return a similarity value in the range <em>[0,1]</em> for any two elements of this type, so that it can be transformed into a distance. Some SimilarityComparators are already implemented; for example, the base class of the SimilarityComparators for the PeakSpectrum type is <a class="el" href="classOpenMS_1_1PeakSpectrumCompareFunctor.html" title="Base class for compare functors of spectra, that return a similiarity value for two spectra...">OpenMS::PeakSpectrumCompareFunctor</a>.</p>
<div class="fragment"><div class="line"><span class="keyword">class </span>LowLevelComparator</div>
<div class="line">{</div>
<div class="line"><span class="keyword">public</span>:</div>
<div class="line"> <span class="comment">// similarity of two values in [0,1]: 1 minus their absolute difference</span></div>
<div class="line"> <span class="keywordtype">double</span> operator()(<span class="keyword">const</span> <span class="keywordtype">double</span> first, <span class="keyword">const</span> <span class="keywordtype">double</span> second)<span class="keyword"> const</span></div>
<div class="line"><span class="keyword"> </span>{</div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">double</span> x = std::min(first, second);</div>
<div class="line"> <span class="keyword">const</span> <span class="keywordtype">double</span> y = std::max(first, second);</div>
<div class="line"> <span class="comment">// a difference above 1 is only possible if an input lies outside [0,1]</span></div>
<div class="line"> <span class="keywordflow">if</span> ((y - x) > 1)</div>
<div class="line"> {</div>
<div class="line"> <span class="keywordflow">throw</span> Exception::InvalidRange(__FILE__, __LINE__, __PRETTY_FUNCTION__);</div>
<div class="line"> }</div>
<div class="line"> <span class="keywordflow">return</span> 1 - (y - x);</div>
<div class="line"> }</div>
<div class="line"></div>
<div class="line">}; <span class="comment">// end of LowLevelComparator</span></div>
</div><!-- fragment -->
<p>This example of a SimilarityComparator is very basic and takes one-dimensional input of <em>doubles</em> in the range <em>[0,1]</em>. Real input will generally be more complex, and so will the corresponding SimilarityComparator. Note that in this example the similarity is calculated as <em>1 - distance</em>, whereas in general the distance is obtained from the similarity and not the other way round.</p>
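<p>The transformation from pairwise similarities to a distance matrix can be sketched in plain C++ without OpenMS. This is a minimal sketch, assuming the distance is taken as <em>1 - similarity</em>; <em>LowLevelSimilarity</em> (a stand-alone copy of the comparator logic without the OpenMS exception) and <em>distanceMatrix</em> are hypothetical names, and a dense <em>std::vector</em> matrix stands in for <em>OpenMS::DistanceMatrix</em>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of LowLevelComparator:
// similarity of two values in [0,1] is 1 minus their absolute difference.
struct LowLevelSimilarity
{
  double operator()(double first, double second) const
  {
    const double diff = first > second ? first - second : second - first;
    return 1.0 - diff;
  }
};

// Builds a full symmetric distance matrix, assuming distance = 1 - similarity.
// Works with any comparator whose const operator() returns a value in [0,1]
// (hypothetical helper, not the OpenMS API).
template <typename T, typename Comparator>
std::vector<std::vector<double>> distanceMatrix(const std::vector<T>& data,
                                                const Comparator& compare)
{
  const std::size_t n = data.size();
  std::vector<std::vector<double>> dist(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
  {
    for (std::size_t j = i + 1; j < n; ++j)
    {
      const double similarity = compare(data[i], data[j]);
      assert(similarity >= 0.0 && similarity <= 1.0); // comparator contract
      dist[i][j] = dist[j][i] = 1.0 - similarity;     // matrix is symmetric
    }
  }
  return dist;
}
```

<p>For the values 0, 0.25 and 1 this yields the distances 0.25, 1 and 0.75 for the pairs (0,1), (0,2) and (1,2); the diagonal stays 0.</p>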
<h1><a class="anchor" id="Clustering"></a>
Clustering</h1>
<p>Clustering is performed by the <em><a class="el" href="classOpenMS_1_1ClusterHierarchical.html" title="Hierarchical clustering with generic clustering functions. ">OpenMS::ClusterHierarchical</a></em> class, which offers an easy way to run the whole clustering.</p>
<p><div class="fragment"><div class="line"><a class="code" href="classInt.html">Int</a> <a class="code" href="RNPxl_8C.html#a217dbf8b442f20279ea00b898af96f52">main</a>()</div>
<div class="line">{</div>
<div class="line"> <span class="comment">// data</span></div>
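<div class="line"> vector<double> data; <span class="comment">// input values in [0,1], must be filled</span></div>
<div class="line"> LowLevelComparator llc; <span class="comment">// similarity measure</span></div>
<div class="line"> CompleteLinkage sl; <span class="comment">// linkage used to merge clusters</span></div>
<div class="line"> vector<BinaryTreeNode> tree; <span class="comment">// will hold the merge steps</span></div>
<div class="line"> DistanceMatrix<Real> dist; <span class="comment">// will be filled with the pairwise distances</span></div>
<div class="line"> ClusterHierarchical ch;</div>
<div class="line"> ch.setThreshold(0.15); <span class="comment">// optional, default is 1.0</span></div>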
</div><!-- fragment --></p>
<p>The <em>ClusterHierarchical</em> clustering function needs at least these arguments; setting the threshold is optional (the default is 1.0). The template arguments have to be set to the type of the clustered data and the type of the SimilarityComparator used, in this example <em>double</em> and <em>LowLevelComparator</em>.</p>
<p><div class="fragment"><div class="line"> <span class="comment">// clustering</span></div>
<div class="line"> ch.cluster<<a class="code" href="classdouble.html">double</a>, LowLevelComparator>(data, llc, sl, tree, dist);</div>
</div><!-- fragment --></p>
<p>This call builds a hierarchical clustering up to the threshold. See <a class="el" href="tutorial_clustering.html#Output">Output</a>.</p>
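<p>To make the stepwise merging concrete, here is a naive complete-linkage agglomerative clustering in plain C++. It is a sketch under stated assumptions, not the OpenMS implementation: it works on a dense distance matrix, names each cluster by its smallest member index (loosely in the spirit of <em>OpenMS::BinaryTreeNode</em>), and simply stops once the closest pair of clusters is farther apart than the threshold instead of appending dummy nodes. <em>Merge</em> and <em>completeLinkage</em> are hypothetical names.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// One merge step (hypothetical): the two clusters, each named by its
// smallest member index, and the complete-linkage distance of the merge.
struct Merge
{
  std::size_t left;
  std::size_t right;
  double distance;
};

// Naive complete-linkage clustering (cubic time) on a square, symmetric
// distance matrix. Stops when the closest clusters exceed `threshold`.
std::vector<Merge> completeLinkage(const std::vector<std::vector<double>>& dist,
                                   double threshold)
{
  // every point starts as its own cluster; members are kept sorted
  std::vector<std::vector<std::size_t>> clusters;
  for (std::size_t i = 0; i < dist.size(); ++i)
  {
    assert(dist[i].size() == dist.size()); // matrix must be square
    clusters.push_back({i});
  }

  std::vector<Merge> merges;
  while (clusters.size() > 1)
  {
    double best = std::numeric_limits<double>::infinity();
    std::size_t bi = 0, bj = 1;
    for (std::size_t i = 0; i < clusters.size(); ++i)
    {
      for (std::size_t j = i + 1; j < clusters.size(); ++j)
      {
        // complete linkage: distance between the two farthest members
        double linkage = 0.0;
        for (std::size_t a : clusters[i])
          for (std::size_t b : clusters[j])
            linkage = std::max(linkage, dist[a][b]);
        if (linkage < best)
        {
          best = linkage;
          bi = i;
          bj = j;
        }
      }
    }
    if (best > threshold) break; // no pair is close enough any more

    merges.push_back({clusters[bi].front(), clusters[bj].front(), best});
    // fold cluster bj into bi; sorting keeps the smallest index in front
    clusters[bi].insert(clusters[bi].end(), clusters[bj].begin(), clusters[bj].end());
    std::sort(clusters[bi].begin(), clusters[bi].end());
    clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(bj));
  }
  return merges;
}
```

<p>For the points 0, 0.125, 0.5 and 0.5625 (distance = absolute difference) and a threshold of 0.15, only the merges {2,3} at 0.0625 and {0,1} at 0.125 are performed; with a threshold of 1.0 a third merge {0,2} at 0.5625 completes the tree.</p>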
<h1><a class="anchor" id="Output"></a>
Output</h1>
<p>If it is known at which threshold (see <em><a class="el" href="classOpenMS_1_1ClusterHierarchical.html#ae2f71a927933400ed5ce71513352e9ae" title="Clustering function. ">OpenMS::ClusterHierarchical::cluster</a></em>) a reasonable clustering is produced, setting that threshold can speed up the clustering process. Once the threshold is exceeded, the rest of the resulting tree (a std::vector of <a class="el" href="classOpenMS_1_1BinaryTreeNode.html" title="Elements of a binary tree used to represent a hierarchical clustering process. ">OpenMS::BinaryTreeNode</a>) is filled with dummy nodes. The tree represents the hierarchy of clusters by storing the stepwise merging process. It can be transformed into a tree representation in Newick format and/or be analyzed with the other methods provided by the <a class="el" href="classOpenMS_1_1ClusterAnalyzer.html" title="Bundles analyzing tools for a clustering (given as sequence of BinaryTreeNode's) ">OpenMS::ClusterAnalyzer</a> class.</p>
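<p>A stepwise merge history can be turned into a Newick-style string by repeatedly nesting the two merged subtrees. The following sketch assumes a complete history (n - 1 merges over n leaves) and mimics the spacing of the sample output in this tutorial; it is not <em>ClusterAnalyzer::newickTree</em>, and <em>Merge</em> and <em>toNewick</em> are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One merge step (hypothetical, loosely modelled on OpenMS::BinaryTreeNode):
// the representative indices of the two merged clusters and their distance.
struct Merge
{
  std::size_t left;
  std::size_t right;
  double distance;
};

// Nests the merged subtrees step by step to obtain a Newick-style string.
// Expects a complete history, i.e. n - 1 merges over n leaves.
std::string toNewick(std::size_t n, const std::vector<Merge>& merges)
{
  assert(n > 0 && merges.size() + 1 == n);
  std::vector<std::string> subtree(n);
  for (std::size_t i = 0; i < n; ++i) subtree[i] = std::to_string(i);

  for (const Merge& m : merges)
  {
    // store the merged subtree under the left index and retire the right one
    subtree[m.left] = "( " + subtree[m.left] + " , " + subtree[m.right] + " )";
    subtree[m.right].clear();
  }
  // after n - 1 merges exactly one non-empty subtree is left: the root
  for (const std::string& s : subtree)
    if (!s.empty()) return s;
  return std::string();
}
```

<p>For the merges {2,3}, {0,1} and {0,2} over four points it returns <em>( ( 0 , 1 ) , ( 2 , 3 ) )</em>. Branch lengths (the merge distances) and the terminating semicolon of standard Newick are left out in this sketch.</p>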
<p><div class="fragment"><div class="line"> ClusterAnalyzer ca;</div>
<div class="line"> std::cout << ca.newickTree(tree) << std::endl;</div>
<div class="line"></div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">} <span class="comment">//end of main</span></div>
</div><!-- fragment --></p>
<p>The output will look something like this (the actual output may vary, since this example uses random numbers): </p>
<div class="fragment"><div class="line">( ( ( ( ( 0 , 1 ) , ( 2 , ( 7 , 8 ) ) ) , ( ( 3 , 10 ) , ( 4 , 5 ) ) ) , ( 6 , 9 ) ) , 11 )</div>
</div><!-- fragment --><p>For a closer look at the clustering process, the whole hierarchy can be inspected by opening the Newick-format tree in a tree viewer such as TreeViewX. A particular clustering step (which yields a certain partition of the clustered data) can be visualized with heatmaps, for example with gnuplot 4.3 heatmaps of the corresponding distance matrix. </p>
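<p>Stopping the merge history after a given number of steps yields one partition of the data, which is what a heatmap of a single clustering step shows. A minimal sketch using a small union-find structure; the merge representation and the names <em>Merge</em> and <em>partitionAfter</em> are hypothetical and not part of OpenMS.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// One merge step (hypothetical): representative indices and merge distance.
struct Merge
{
  std::size_t left;
  std::size_t right;
  double distance;
};

// Labels every data point with its cluster after the first `steps` merges.
// Uses a small union-find; each cluster is labelled by its smallest index.
std::vector<std::size_t> partitionAfter(std::size_t n, const std::vector<Merge>& merges,
                                        std::size_t steps)
{
  assert(steps <= merges.size());
  std::vector<std::size_t> parent(n);
  std::iota(parent.begin(), parent.end(), std::size_t{0}); // each point is its own root

  // follow parent links to the root, halving the path on the way
  auto findRoot = [&parent](std::size_t x) {
    while (parent[x] != x)
    {
      parent[x] = parent[parent[x]];
      x = parent[x];
    }
    return x;
  };

  for (std::size_t s = 0; s < steps; ++s)
  {
    const std::size_t a = findRoot(merges[s].left);
    const std::size_t b = findRoot(merges[s].right);
    // hang the larger root below the smaller one, so roots stay minimal
    if (a < b) parent[b] = a;
    else if (b < a) parent[a] = b;
  }

  std::vector<std::size_t> label(n);
  for (std::size_t i = 0; i < n; ++i) label[i] = findRoot(i);
  return label;
}
```

<p>With the merges {2,3}, {0,1}, {0,2} and <em>steps = 2</em>, the four points are labelled 0, 0, 2, 2, i.e. the partition {0,1}, {2,3}.</p>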
</div></div><!-- contents -->
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>