File: tutorial_clustering.html

<HTML>
<HEAD>
<TITLE>Clustering</TITLE>
<LINK HREF="doxygen.css" REL="stylesheet" TYPE="text/css">
<LINK HREF="style_ini.css" REL="stylesheet" TYPE="text/css">
</HEAD>
<BODY BGCOLOR="#FFFFFF">
<A href="index.html">Home</A> &nbsp;&middot;
<A href="classes.html">Classes</A> &nbsp;&middot;
<A href="annotated.html">Annotated Classes</A> &nbsp;&middot;
<A href="modules.html">Modules</A> &nbsp;&middot;
<A href="functions_func.html">Members</A> &nbsp;&middot;
<A href="namespaces.html">Namespaces</A> &nbsp;&middot;
<A href="pages.html">Related Pages</A>
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<!-- Generated by Doxygen 1.8.5 -->
</div><!-- top -->
<div class="header">
  <div class="headertitle">
<div class="title">Clustering </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p><a class="el" href="namespaceOpenMS.html" title="Main OpenMS namespace. ">OpenMS</a> provides generic hierarchical clustering. The example (Tutorial_Clustering.C) shows how to build a rudimentary clustering pipeline.</p>
<h1><a class="anchor" id="Inputdata"></a>
Input data</h1>
<p>Any type of data can be clustered, as long as a SimilarityComparator for that type is provided. The comparator's <em>operator()</em> has to return a similarity measurement in the range <em>[0,1]</em> for any two elements of the type, so that it can be transformed into a distance. Some SimilarityComparators are already implemented; for example, the base class for SimilarityComparators of the PeakSpectrum type is <a class="el" href="classOpenMS_1_1PeakSpectrumCompareFunctor.html" title="Base class for compare functors of spectra, that return a similiarity value for two spectra...">OpenMS::PeakSpectrumCompareFunctor</a>.</p>
 <div class="fragment"><div class="line"><span class="keyword">class </span>LowLevelComparator</div>
<div class="line">{</div>
<div class="line"><span class="keyword">public</span>:</div>
<div class="line">  <span class="keywordtype">double</span> operator()(<span class="keyword">const</span> <span class="keywordtype">double</span> first, <span class="keyword">const</span> <span class="keywordtype">double</span> second)<span class="keyword"> const</span></div>
<div class="line"><span class="keyword">  </span>{</div>
<div class="line">    <span class="keywordtype">double</span> x, y;</div>
<div class="line">    x = min(second, first);</div>
<div class="line">    y = max(first, second);</div>
<div class="line">    <span class="keywordflow">if</span> ((y - x) &gt; 1)</div>
<div class="line">    {</div>
<div class="line">      <span class="keywordflow">throw</span> Exception::InvalidRange(__FILE__, __LINE__, __PRETTY_FUNCTION__);</div>
<div class="line">    }</div>
<div class="line">    <span class="keywordflow">return</span> 1 - (y - x);</div>
<div class="line">  }</div>
<div class="line"></div>
<div class="line">}; <span class="comment">// end of LowLevelComparator</span></div>
</div><!-- fragment --></p>
<p>This example of a SimilarityComparator is very basic and takes one-dimensional input of <em>doubles</em> in the range <em>[0,1]</em>. Real input will generally be more complex, and so will the corresponding SimilarityComparator. Note that the example computes similarity as <em>1-distance</em>, whereas in general the distance is derived from the similarity, not the other way round.</p>
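<p>The relation between similarity and distance can be sketched without any OpenMS dependency. The names <em>UnitIntervalSimilarity</em> and <em>pairwiseDistances</em> below are hypothetical and not part of OpenMS; the sketch only assumes that the distance is taken as <em>1-similarity</em>, as described above, and stores it in a plain nested vector instead of an OpenMS matrix type.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the tutorial's comparator (not an OpenMS class):
// similarity of two values from [0,1], defined as 1 - |first - second|.
struct UnitIntervalSimilarity
{
  double operator()(double first, double second) const
  {
    const double diff = std::max(first, second) - std::min(first, second);
    if (diff > 1.0)
    {
      throw std::range_error("values differ by more than 1");
    }
    return 1.0 - diff;
  }
};

// Builds a full symmetric distance matrix using distance = 1 - similarity.
template <typename T, typename Similarity>
std::vector<std::vector<double>> pairwiseDistances(const std::vector<T>& data,
                                                   const Similarity& sim)
{
  const std::size_t n = data.size();
  std::vector<std::vector<double>> dist(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i)
  {
    for (std::size_t j = i + 1; j < n; ++j)
    {
      const double d = 1.0 - sim(data[i], data[j]);
      dist[i][j] = d;
      dist[j][i] = d;
    }
  }
  return dist;
}
```

<p>Because the comparator is a template parameter, the same helper would work for any element type for which a functor returning values in <em>[0,1]</em> exists.</p>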
<h1><a class="anchor" id="Clustering"></a>
Clustering</h1>
<p>Clustering is performed by the <em><a class="el" href="classOpenMS_1_1ClusterHierarchical.html" title="Hierarchical clustering with generic clustering functions. ">OpenMS::ClusterHierarchical</a></em> class.</p>
<p><div class="fragment"><div class="line"><a class="code" href="classInt.html">Int</a> <a class="code" href="RNPxl_8C.html#a217dbf8b442f20279ea00b898af96f52">main</a>()</div>
<div class="line">{</div>
<div class="line">  <span class="comment">// data</span></div>
<div class="line">  vector&lt;double&gt; data; <span class="comment">// must be filled</span></div>
<div class="line">  LowLevelComparator llc;</div>
<div class="line">  CompleteLinkage sl;</div>
<div class="line">  vector&lt;BinaryTreeNode&gt; tree;</div>
<div class="line">  DistanceMatrix&lt;Real&gt; dist; <span class="comment">// will be filled</span></div>
<div class="line">  ClusterHierarchical ch;</div>
<div class="line">  ch.setThreshold(0.15);</div>
</div><!-- fragment --></p>
<p>The <em>ClusterHierarchical</em> functions need at least these arguments. Setting the threshold is optional (the default is 1.0). The template arguments have to be set to the type of the clustered data and the type of the CompareFunctor used, in this example double and LowLevelComparator.</p>
<p><div class="fragment"><div class="line">  <span class="comment">// clustering</span></div>
<div class="line">  ch.cluster&lt;<a class="code" href="classdouble.html">double</a>, LowLevelComparator&gt;(data, llc, sl, tree, dist);</div>
</div><!-- fragment --></p>
<p>This function creates a hierarchical clustering up to the threshold. See <a class="el" href="tutorial_clustering.html#Output">Output</a>.</p>
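<p>To illustrate what complete linkage with a stopping threshold means, here is a naive standalone sketch. It is not the OpenMS implementation: <em>MergeStep</em> and <em>completeLinkage</em> are hypothetical names, the distance matrix is a plain nested vector, and the sketch assumes that merging simply stops once the closest pair of clusters is farther apart than the threshold.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical record of one merge step (not the OpenMS BinaryTreeNode type).
struct MergeStep
{
  std::size_t left;   // smallest element index in the first cluster
  std::size_t right;  // smallest element index in the second cluster
  double distance;    // complete-linkage distance at which they were merged
};

// Naive complete-linkage clustering on a symmetric distance matrix.
// The matrix is taken by value because it is updated during merging.
std::vector<MergeStep> completeLinkage(std::vector<std::vector<double>> dist,
                                       double threshold)
{
  const std::size_t n = dist.size();
  std::vector<bool> active(n, true);
  std::vector<MergeStep> steps;
  for (std::size_t merged = 0; merged + 1 < n; ++merged)
  {
    // Find the closest pair of clusters that are still active.
    double best = std::numeric_limits<double>::infinity();
    std::size_t bi = 0, bj = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
      if (!active[i]) continue;
      for (std::size_t j = i + 1; j < n; ++j)
      {
        if (active[j] && dist[i][j] < best)
        {
          best = dist[i][j];
          bi = i;
          bj = j;
        }
      }
    }
    if (best > threshold) break;  // stop early: remaining clusters are too far apart
    steps.push_back({bi, bj, best});
    // Complete linkage: the distance to the merged cluster is the larger
    // of the distances to its two parts. Row bi now represents the merge.
    for (std::size_t k = 0; k < n; ++k)
    {
      if (active[k] && k != bi && k != bj)
      {
        dist[bi][k] = dist[k][bi] = std::max(dist[bi][k], dist[bj][k]);
      }
    }
    active[bj] = false;
  }
  return steps;
}
```

<p>With a low threshold the function returns fewer than n-1 merge steps, which shows why a well-chosen threshold can save work. Unlike the behaviour described in this tutorial, the sketch does not pad the result with dummy nodes.</p>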
<h1><a class="anchor" id="Output"></a>
Output</h1>
<p>If it is known at what threshold (see <em><a class="el" href="classOpenMS_1_1ClusterHierarchical.html#ae2f71a927933400ed5ce71513352e9ae" title="Clustering function. ">OpenMS::ClusterHierarchical::cluster</a></em>) a reasonable clustering is produced, setting that threshold can potentially speed up the clustering process. Once the threshold is exceeded, the remainder of the resulting tree (std::vector of <a class="el" href="classOpenMS_1_1BinaryTreeNode.html" title="Elements of a binary tree used to represent a hierarchical clustering process. ">OpenMS::BinaryTreeNode</a>) is filled with dummy nodes. The tree represents the hierarchy of clusters by storing the stepwise merging process. It can subsequently be transformed into a tree representation in Newick format and/or be analyzed with the other methods that the <a class="el" href="classOpenMS_1_1ClusterAnalyzer.html" title="Bundles analyzing tools for a clustering (given as sequence of BinaryTreeNode&#39;s) ">OpenMS::ClusterAnalyzer</a> class provides.</p>
<p><div class="fragment"><div class="line">  ClusterAnalyzer ca;</div>
<div class="line">  std::cout &lt;&lt; ca.newickTree(tree) &lt;&lt; std::endl;</div>
<div class="line"></div>
<div class="line">  <span class="keywordflow">return</span> 0;</div>
<div class="line">} <span class="comment">//end of main</span></div>
</div><!-- fragment --></p>
<p>The output will look something like this (it may vary, since random numbers are used in this example): </p>
<div class="fragment"><div class="line">( ( ( ( ( 0 , 1 ) , ( 2 , ( 7 , 8 ) ) ) , ( ( 3 , 10 ) , ( 4 , 5 ) ) ) , ( 6 , 9 ) ) , 11 )</div>
</div><!-- fragment --><p>For a closer look at the clustering process, one can also view the whole hierarchy by opening the tree in Newick format with a tree viewer such as TreeViewX. A particular cluster step (which gives rise to a certain partition of the clustered data) can be visualized with heatmaps (for example with gnuplot 4.3 heatmaps and the corresponding distance matrix). </p>
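<p>How a stored merge sequence turns into a Newick-style string like the one shown above can be sketched as follows. This is an illustration only, not the code behind <em>newickTree</em>: <em>MergeStep</em> and <em>toNewick</em> are hypothetical names, and the sketch assumes that every cluster is referred to by the smallest element index it contains and that the merge sequence is complete (one step fewer than there are leaves).</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one merge step (not the OpenMS BinaryTreeNode type).
struct MergeStep
{
  std::size_t left;   // index representing the first cluster
  std::size_t right;  // index representing the second cluster
  double distance;    // merge distance (not printed in this sketch)
};

// Replays the merge steps and nests the labels of the merged clusters.
// Assumes a complete merge sequence; at() throws on out-of-range indices.
std::string toNewick(const std::vector<MergeStep>& steps, std::size_t leafCount)
{
  if (leafCount == 0) return "";
  std::vector<std::string> label(leafCount);
  for (std::size_t i = 0; i < leafCount; ++i)
  {
    label[i] = std::to_string(i);  // every leaf starts as its own cluster
  }
  std::size_t root = 0;
  for (const MergeStep& s : steps)
  {
    // The merged cluster keeps the label slot of its left part.
    label.at(s.left) = "( " + label.at(s.left) + " , " + label.at(s.right) + " )";
    label.at(s.right).clear();
    root = s.left;
  }
  return label[root];
}
```

<p>The spacing mimics the sample output above; a tree viewer should generally not depend on it.</p>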
</div></div><!-- contents -->
<HR style="height:1px; border:none; border-top:1px solid #c0c0c0;">
<TABLE width="100%" border="0">
<TR>
<TD><font color="#c0c0c0">OpenMS / TOPP release 1.11.1</font></TD>
<TD align="right"><font color="#c0c0c0">Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5</font></TD>
</TR>
</TABLE>
</BODY>
</HTML>