Statistical Clustering Methods Validation
1. Overview
The purpose of this validation study is to determine the
factors that affect the performance and accuracy of the clustering
methods. The clustering methods under study are k-d tree based
k-means (k-means) and maximum likelihood expectation
maximization (MLEM). We used a set of real T1-weighted scans and a set
of synthesized T1 and PD scans of brain tissues. Both clustering
methods were tested with the two datasets. The first dataset provides
sensitivity and specificity measures for univariate cases, and the
second dataset provides the same measures for bivariate cases.
2. How to run the experiments
1) Change directory to the "Code" subdirectory.
2) To run the univariate experiment, edit the "IBSR.experiment.R" file
using any text editor. You have to change "validationRoot",
"imagePath", and "binPath". "validationRoot" is the directory where this
file exists, "imagePath" is where your IBSR images are, and
"binPath" is the "bin" directory of your ITK build. After this
change, type "R CMD BATCH IBSR.experiment.R &" on the command line, or if
you don't have the GNU R software, run "IBSR.experiment.sh", which has
a sequence of program calls. If you are not using the bash shell or are
using Windows, modify the file into a batch file or shell script for your
system. The R script skips a program call if the
"Results" subdirectory already has the result data file (.dat file)
for that call. So, if you want to rerun a specific program call,
remove its output data file from the "Results" directory before
running the R script.
3) To run the bivariate experiment, edit the "BrainWeb.experiment.R"
file in the same way as "IBSR.experiment.R", then run it the same way.
* You can get more information on GNU R and the software at
< http://www.r-project.org/ >
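
The skip rule described in step 2) can be sketched as follows. This is
a minimal Python sketch under stated assumptions, not the logic of the
shipped R script: the function name "pending_outputs" and the file
names in the usage comment are hypothetical.

```python
from pathlib import Path


def pending_outputs(results_dir, expected_names):
    """Return the expected .dat file names that are missing from results_dir.

    Only program calls whose output file is missing would need to run;
    deleting an output file therefore forces that call to be repeated.
    """
    results = Path(results_dir)
    return [name for name in expected_names if not (results / name).exists()]


# Hypothetical usage: list the calls that would still run.
# pending_outputs("Results", ["IBSR.EM.202_3.1ext.dat", "IBSR.EM.202_3.2ext.dat"])
```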
3. How to get the datasets and prepare them for experiments
3-1. "Internet Brain Segmentation Repository (IBSR)" dataset
IBSR has multiple sets of images and segmentation results from
synthetic sources or real brain scans. Among them, we used the "20
Normal Subjects, T1-Weighted Scans with Segmentations" dataset. It
comes with segmentation results for 20 normal subjects.
1) Go to the "Internet Brain Segmentation Repository" website at
< http://www.cma.mgh.harvard.edu/ibsr/ >. Click the "Data
Exchange" link. Click the "Application for Data" link, fill in the
form, and submit it. You will get a notification e-mail with your user
ID and password.
2) Click the "Real Data" link in the "2.Download
Segmentation and MR Image Data" section on the "Data Exchange" page.
3) Click the "Real Data" and type in your ID and password when a prompt
box appears. Read the "README" file for the "20 Normal Subjects,
T1-Weighted Scans with Segmentations" dataset. Follow the download
instructions in the "README" file. You only need the "brain-only MR
data files" and "manual segmentation files."
4) After extracting the compressed files to your image directory, copy
the ".mhd" files and the "offset.dat" file from "this
directory"/Inputs/ibsr/20_Normal_T1_{brain|seg} to "your image
directory"/20_Normal_T1_{brain|seg} accordingly.
"MR brain data set 1320_2_max and its manual segmentation
was provided by the Center for Morphometric Analysis at Massachusetts
General Hospital and is available at
http://neuro-www.mgh.harvard.edu/cma/ibsr."
3-2. "BrainWeb: Simulated Brain Database" dataset
The BrainWeb database has several simulated brain scans for
different modalities, such as T1, T2, and PD, of normal and MS lesion
brains. We use only the normal T1 and PD scans.
1) Go to the website at < http://www.bic.mni.mcgill.ca/brainweb/ >.
2) Click the "Normal Brain Database" link.
3) Select T1 for "Modality", 1mm for "Slice thickness", 0% for
"Noise", and 0% for "Intensity non-uniformity ("RF")". Then, click the
"Download" button.
4) Select raw short for the file format and whichever option you want
for "Compression". Fill in the personal information fields and
click the "[Start download]" button.
5) Repeat steps 2) - 4) with the 3% noise and 9% noise options selected.
6) To get the discrete class labels image, click the "Normal Brain
Database" link on the first page. Click the "anatomical model of
normal brain" link. Click the "[Download]" link next to the "Discrete
Model".
7) Copy the ".mhd" files from "this directory"/Inputs/BrainWeb/ to
"your image directory".
8) Edit each copied .mhd file so that it points to the proper raw data
file generated by decompressing the downloaded BrainWeb datasets. For
example, the brainweb.PD.1mm.9.0.mhd file is for the raw data with
Modality - PD, thickness - 1mm, and noise - 9% settings. If your raw
file name is bwPD9pnoise.raw, then change the "ElementDataFile" field
value from "brainweb.PD.1mm.9.0.raw" to "bwPD9pnoise.raw".
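
The edit in step 8) can also be scripted. The following is a minimal
Python sketch, assuming the header uses the usual MetaImage
"Key = Value" line layout; the function name "set_element_data_file"
is hypothetical, and the sample header is illustrative, not copied from
the shipped .mhd files.

```python
import re


def set_element_data_file(header_text, raw_name):
    """Return header_text with the ElementDataFile value replaced by raw_name."""
    pattern = re.compile(r"^(ElementDataFile\s*=\s*).*$", flags=re.MULTILINE)
    # A function replacement keeps raw_name literal (no backslash escapes).
    new_text, count = pattern.subn(lambda m: m.group(1) + raw_name, header_text)
    if count != 1:
        raise ValueError(f"expected one ElementDataFile line, found {count}")
    return new_text


# Illustrative header only; the real files may contain other fields.
SAMPLE_HEADER = (
    "NDims = 3\n"
    "ElementType = MET_SHORT\n"
    "ElementDataFile = brainweb.PD.1mm.9.0.raw\n"
)

UPDATED_HEADER = set_element_data_file(SAMPLE_HEADER, "bwPD9pnoise.raw")
```

To apply it to a real file, read the .mhd text, pass it through the
function, and write the result back.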
4. How to interpret the clustering output files (.dat files under the
"Results" directory)
Every .dat file name starts with either BrainWeb or IBSR, which
indicates whether the file is the output of experiments using BrainWeb
data or IBSR data. The second part of the file name indicates the
clustering method used (EM for MLEM and Kmeans for k-d tree based
k-means). The third part has a different meaning for BrainWeb and IBSR
results. In the case of IBSR, that part looks like 111_2 and identifies
the scan that was used to generate the results. In the BrainWeb case, it
is one of 0pn, 3pn, and 9pn, indicating that the results were generated
by experiments using scans with 0% noise, 3% noise, or 9% noise. The last
part is common to BrainWeb and IBSR experiments. "1ext" means the
initial parameters (centroids) are generated within the range of
plus-minus one standard deviation from the "true" means. "2ext" means
the range is plus-minus two standard deviations. You can see the one
hundred sets of initial parameters for CSF, GM, and WM for each output
file in the "Inputs/{BrainWeb|ibsr}" directory. The names of the initial
parameter files include "params". The "true" means and
other class statistics are in the files with "classes" in their file
names.
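
The naming scheme above can be decoded mechanically. This is a minimal
Python sketch, assuming the four parts are separated by "." as in
"IBSR.EM.202_3.1ext.dat"; the function name "parse_result_name" and
the dictionary keys are hypothetical.

```python
METHODS = {"EM": "MLEM", "Kmeans": "k-d tree based k-means"}


def parse_result_name(file_name):
    """Split a result file name into dataset, method, source, and init range."""
    parts = file_name.split(".")
    if len(parts) != 5 or parts[-1] != "dat":
        raise ValueError(f"unexpected result file name: {file_name}")
    dataset, method, source, init_range, _ = parts
    if dataset not in ("BrainWeb", "IBSR") or method not in METHODS:
        raise ValueError(f"unexpected result file name: {file_name}")
    if not init_range.endswith("ext"):
        raise ValueError(f"unexpected range part: {init_range}")
    info = {
        "dataset": dataset,
        "method": METHODS[method],
        # Number of standard deviations around the "true" means (1 or 2).
        "sd_range": int(init_range[: -len("ext")]),
    }
    if dataset == "BrainWeb":
        # 0pn, 3pn, or 9pn: noise level in percent.
        info["noise_percent"] = int(source[: -len("pn")])
    else:
        # IBSR: identifier of the scan, such as 202_3.
        info["scan"] = source
    return info
```

For example, parse_result_name("IBSR.EM.202_3.1ext.dat") reports the
IBSR dataset, the MLEM method, scan 202_3, and a range of one standard
deviation.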
The "class statistics files", "initial parameters files", and
"result files" are basically tables. You can find out what each column
means by looking at the column headers in each file.
For example, 202_3.classes.dat has three lines. The first line
contains the class statistics for CSF (labelled as 128 in the
segmentation image), and the second and third are for gray matter (192)
and white matter (254). The "sigma.1" column is the variance.
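
Because "sigma.1" holds a variance, the "1ext" and "2ext" ranges are
expressed with its square root. The following Python sketch shows the
arithmetic; the function names are hypothetical, the numbers in the
usage note are made up for illustration, and drawing uniformly within
the range is an assumption (the text only states that the initial
parameters lie within the range).

```python
import math
import random


def initial_mean_range(true_mean, variance, n_sd):
    """Return (low, high) = true_mean -/+ n_sd standard deviations."""
    sd = math.sqrt(variance)  # "sigma.1" is a variance, so take the root
    return true_mean - n_sd * sd, true_mean + n_sd * sd


def draw_initial_mean(true_mean, variance, n_sd, rng):
    """Draw one initial mean inside the range (uniform draw is an assumption)."""
    low, high = initial_mean_range(true_mean, variance, n_sd)
    return rng.uniform(low, high)
```

With a made-up mean of 100.0 and variance of 400.0, the "1ext" range is
80.0 to 120.0 and the "2ext" range is 60.0 to 140.0.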
The "result file" is more complex. Let's look at several lines
in the "IBSR.EM.202_3.1ext.dat" file.
The header of the file is:
"case" "class" "mapped class" "mean.1" "sigma.1" "proportion" "128" "192" "254" "iterations" "time estimation" "time total"
Lines 14 - 16 are as follows (extra spacing is inserted around the
columns "128", "192", and "254" for presentation purposes; in the actual
file the columns are separated by space characters):
5 128 128 78.8675 8187.62 0.0750448    7473   13148     659    270 0.25 0.64
5 192 192 147.706 1934.05 0.590045     1665  235810   24717    270 0.25 0.64
5 254 254 195.985 3202.48 0.33491        71   16606  128313    270 0.25 0.64
The "case" column identifies a run with one set of initial parameters
for the three classes (128, 192, 254 - the second column). The "mapped
class" column is the class label that each cluster was assigned to after
estimation of the cluster parameters. The "mean.1" and "sigma.1" columns
are simply the mean and variance. The "proportion" column appears
only in result files for MLEM experiments. It is the estimated proportion
of a class. As you might have noticed, the "128", "192", and "254"
columns are class labels. You can think of those three columns within
the same "case" (or run) as a 3x3 classification matrix. The cell at
[1, 1] (7473) shows the number of measurement vectors (instances in a
sample) that were classified correctly as members of class "128". The cell
at [2, 3] (24717) shows the number of measurement vectors that actually
belong to class "254" but were classified as members of class
"192". The last three columns are the same for every row of a single run.
They are measurements of performance: the number of iterations taken
before convergence, the time (in seconds) to estimate the final class
parameters, and the total time (in seconds) to estimate the parameters and
assign class labels to every measurement.
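
The classification matrix of a run can be turned into per-class
sensitivity and specificity. The following Python sketch uses the three
example rows quoted above. It assumes that rows are the assigned
("mapped") classes and columns are the true labels, as the [2, 3]
example describes, that each cluster maps to a distinct class, and that
the measures are computed one class against the rest; the function
names are hypothetical and this is not necessarily how the study
computed its figures.

```python
import shlex

HEADER = ('"case" "class" "mapped class" "mean.1" "sigma.1" "proportion" '
          '"128" "192" "254" "iterations" "time estimation" "time total"')
ROWS = """\
5 128 128 78.8675 8187.62 0.0750448 7473 13148 659 270 0.25 0.64
5 192 192 147.706 1934.05 0.590045 1665 235810 24717 270 0.25 0.64
5 254 254 195.985 3202.48 0.33491 71 16606 128313 270 0.25 0.64
"""
LABELS = ("128", "192", "254")


def classification_matrix(header, rows):
    """Map each assigned class to its counts per true label for one run."""
    names = shlex.split(header)  # handles quoted names such as "mapped class"
    matrix = {}
    for line in rows.splitlines():
        record = dict(zip(names, line.split()))
        matrix[record["mapped class"]] = {
            label: int(record[label]) for label in LABELS
        }
    return matrix


def sensitivity_specificity(matrix):
    """Return {label: (sensitivity, specificity)}, one class against the rest."""
    total = sum(sum(row.values()) for row in matrix.values())
    scores = {}
    for label in LABELS:
        tp = matrix[label][label]
        fn = sum(matrix[row][label] for row in LABELS) - tp  # column sum - tp
        fp = sum(matrix[label].values()) - tp                # row sum - tp
        tn = total - tp - fn - fp
        scores[label] = (tp / (tp + fn), tn / (tn + fp))
    return scores


SCORES = sensitivity_specificity(classification_matrix(HEADER, ROWS))
```

For class "128" in this run, the sensitivity is 7473 / 9209 because
9209 measurement vectors carry the true label 128 (7473 + 1665 + 71).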