File: iris_copypaste.md

---
title: An analysis of iris
---


## Introduction

From the help page of the iris data set:

> This famous (Fisher's or Anderson's) iris data set gives the
> measurements in centimeters of the variables sepal length and width
> and petal length and width, respectively, for 50 flowers from each of
> 3 species of iris. The species are *Iris setosa*, *versicolor*, and
> *virginica*.

## Descriptives

The table below shows, for each iris species, the mean value of each
numeric column in the data set.

: Mean values for each of the properties for each of the iris species.

|Species   |Sepal.Length|Sepal.Width|Petal.Length|Petal.Width|
|----------|------------|-----------|------------|-----------|
|setosa    |5.006       |3.428      |1.462       |0.246      |
|versicolor|5.936       |2.770      |4.260       |1.326      |
|virginica |6.588       |2.974      |5.552       |2.026      |
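
The document does not show how these means were computed. A minimal sketch in base R, assuming the built-in `iris` data set and `aggregate()` as the grouping tool (the name `species_means` is only illustrative), could look like this:

```R
# Assumption: this is one possible way to obtain the table, not
# necessarily the code the author used.
# Mean of every numeric column, grouped by species.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Round the numeric columns to three decimals, as in the table.
species_means[-1] <- round(species_means[-1], 3)

print(species_means)
```

The formula `. ~ Species` groups on `Species` and applies `mean` to all remaining columns, which gives one row per species.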

```{.R}
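# One colour per species, taken from the "Dark2" HCL palette
pal <- hcl.colors(3, "Dark2")
# Scatter plot of sepal length against sepal width, coloured by species
plot(iris$Sepal.Width, iris$Sepal.Length, pch = 20,
  col = pal[iris$Species], xlab = "Sepal Width",
  ylab = "Sepal Length", bty = 'n', las = 1)
# Legend mapping each colour to its species name
legend("topright", legend = levels(iris$Species),
  fill = pal, bty = 'n', border = NA)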
```

![Relation between sepal length and width for the different iris species.](./figures/iris.png){#figure}

## Species prediction

``` R
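library(MASS)
# Linear discriminant analysis using only the two sepal measurements
m <- lda(Species ~ Sepal.Width + Sepal.Length, data = iris)
# Predict on the training data itself (no hold-out set)
p <- predict(m)
predicted_species <- p$class
# Confusion table: predicted species (rows) against actual species (columns)
table(predicted_species, iris$Species)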
##                  
## predicted_species setosa versicolor virginica
##        setosa         49          0         0
##        versicolor      1         36        15
##        virginica       0         14        35
```

This model predicts the correct species in 80% of the cases (120 of 150
records). However, this is mainly due to *setosa*; for the other two
species the model predicts the correct species for only 71% of the
records (71 of 100).
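
The accuracy figures follow directly from the confusion table. As a sketch, the counts printed above can be entered by hand and the fractions recomputed; the object names (`tab`, `overall`, `others`, `acc_others`, `per_species`) are illustrative and not part of the original analysis:

```R
species <- c("setosa", "versicolor", "virginica")

# Counts copied from the confusion table above.
# matrix() fills column by column, so each group of three values
# is one actual species (one column of the printed table).
tab <- matrix(c(49,  1,  0,
                 0, 36, 14,
                 0, 15, 35),
              nrow = 3,
              dimnames = list(predicted = species, actual = species))

# Overall accuracy: correctly classified records over all records.
overall <- sum(diag(tab)) / sum(tab)

# Accuracy restricted to records that are actually versicolor or virginica.
others <- c("versicolor", "virginica")
acc_others <- sum(diag(tab)[others]) / sum(tab[, others])

# Fraction of each actual species that is predicted correctly.
per_species <- diag(tab) / colSums(tab)

print(overall)
print(acc_others)
print(per_species)
```

The per-species fractions show where the errors sit: nearly all *setosa* records are classified correctly, while *versicolor* and *virginica* are frequently confused with each other.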