File: iris_copypaste.md

---
title: An analysis of iris
---

```{.R results=FALSE echo=FALSE}
options(md_formatter = format_copypaste)
```

Introduction
-------------------------

From the help page of the iris data set:

> This famous (Fisher's or Anderson's) iris data set gives the
> measurements in centimeters of the variables sepal length and
> width and petal length and width, respectively, for 50 flowers
> from each of 3 species of iris.  The species are _Iris setosa_,
> _versicolor_, and _virginica_.


Descriptives
--------------------------

The table below shows, for each iris species, the mean value of each measurement column in the data set.

```{.R #table fun=output_table caption="Mean values for each of the properties for each of the iris species."}
aggregate(iris[1:4], iris["Species"], mean)
```


```{.R #figure fun=output_figure 
  caption="Relation between sepal length and width for the different iris species." 
  name="iris" height=6 width=8 units="in" res=150 echo=TRUE}
pal <- hcl.colors(3, "Dark2")
plot(iris$Sepal.Width, iris$Sepal.Length, pch = 20, 
  col = pal[iris$Species], xlab = "Sepal Width", 
  ylab = "Sepal Length", bty = 'n', las = 1)
legend("topright", legend = levels(iris$Species), 
  fill = pal, bty = 'n', border = NA)
```



Species prediction
---------------------------------

```{.R}
library(MASS)
m <- lda(Species ~ Sepal.Width + Sepal.Length, data = iris)
p <- predict(m)
predicted_species <- p$class
table(predicted_species, iris$Species)
```

This model predicts the correct species in `round(mean(predicted_species==iris$Species)*100)`{.R}% of the
cases. However, this is mainly due to *setosa*; for the other two species the
model predicts the correct species in only
`sel<-iris$Species!="setosa";round(100*mean(predicted_species[sel] == iris$Species[sel]))`{.R}% of
the records.
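
The split between *setosa* and the other species can also be read directly from the confusion matrix. The block below is a minimal sketch of that calculation; it refits the same model so that it runs on its own, and the names `confusion` and `per_species` are illustrative and not part of the original analysis.

```{.R}
library(MASS)

# Refit the model from the section above so this block is self-contained
m <- lda(Species ~ Sepal.Width + Sepal.Length, data = iris)
predicted_species <- predict(m)$class

# Confusion matrix: rows are predicted species, columns are actual species
confusion <- table(predicted = predicted_species, actual = iris$Species)

# Percentage of records of each actual species that were predicted correctly
per_species <- round(100 * diag(confusion) / colSums(confusion))
per_species
```

Dividing the diagonal by the column sums gives the share of correct predictions within each actual species, which is the per-species counterpart of the overall percentage quoted in the text.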



```{.R results=FALSE echo=FALSE}
options(md_formatter = NULL)
```