File: data_partition.md

# data_partition works as expected

    Code
      data_partition(letters, seed = 123)
    Output
      $p_0.7
         data .row_id
      1     c       3
      2     e       5
      3     h       8
      4     i       9
      5     j      10
      6     k      11
      7     l      12
      8     m      13
      9     n      14
      10    o      15
      11    p      16
      12    r      18
      13    s      19
      14    t      20
      15    u      21
      16    w      23
      17    x      24
      18    y      25
      
      $test
        data .row_id
      1    a       1
      2    b       2
      3    d       4
      4    f       6
      5    g       7
      6    q      17
      7    v      22
      8    z      26
      

---

    Code
      str(data_partition(iris, proportion = 0.7, seed = 123))
    Output
      List of 2
       $ p_0.7:'data.frame':	105 obs. of  6 variables:
        ..$ Sepal.Length: num [1:105] 4.6 5.4 4.6 5 4.4 4.9 4.8 4.8 4.3 5.8 ...
        ..$ Sepal.Width : num [1:105] 3.1 3.9 3.4 3.4 2.9 3.1 3.4 3 3 4 ...
        ..$ Petal.Length: num [1:105] 1.5 1.7 1.4 1.5 1.4 1.5 1.6 1.4 1.1 1.2 ...
        ..$ Petal.Width : num [1:105] 0.2 0.4 0.3 0.2 0.2 0.1 0.2 0.1 0.1 0.2 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:105] 4 6 7 8 9 10 12 13 14 15 ...
       $ test :'data.frame':	45 obs. of  6 variables:
        ..$ Sepal.Length: num [1:45] 5.1 4.9 4.7 5 5.4 5.1 5.7 5.2 5.2 5.2 ...
        ..$ Sepal.Width : num [1:45] 3.5 3 3.2 3.6 3.7 3.5 3.8 3.5 3.4 4.1 ...
        ..$ Petal.Length: num [1:45] 1.4 1.4 1.3 1.4 1.5 1.4 1.7 1.5 1.4 1.5 ...
        ..$ Petal.Width : num [1:45] 0.2 0.2 0.2 0.2 0.2 0.3 0.3 0.2 0.2 0.1 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:45] 1 2 3 5 11 18 19 28 29 33 ...

---

    Code
      str(data_partition(iris, proportion = c(0.2, 0.5), seed = 123))
    Output
      List of 3
       $ p_0.2:'data.frame':	30 obs. of  6 variables:
        ..$ Sepal.Length: num [1:30] 4.6 4.4 4.3 4.6 5 5 5.4 5 4.4 5 ...
        ..$ Sepal.Width : num [1:30] 3.4 2.9 3 3.6 3 3.4 3.4 3.5 3.2 3.3 ...
        ..$ Petal.Length: num [1:30] 1.4 1.4 1.1 1 1.6 1.6 1.5 1.3 1.3 1.4 ...
        ..$ Petal.Width : num [1:30] 0.3 0.2 0.1 0.2 0.2 0.4 0.4 0.3 0.2 0.2 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:30] 7 9 14 23 26 27 32 41 43 50 ...
       $ p_0.5:'data.frame':	75 obs. of  6 variables:
        ..$ Sepal.Length: num [1:75] 4.6 5.4 5 4.9 4.8 5.8 5.7 5.4 5.1 5.7 ...
        ..$ Sepal.Width : num [1:75] 3.1 3.9 3.4 3.1 3.4 4 4.4 3.9 3.5 3.8 ...
        ..$ Petal.Length: num [1:75] 1.5 1.7 1.5 1.5 1.6 1.2 1.5 1.3 1.4 1.7 ...
        ..$ Petal.Width : num [1:75] 0.2 0.4 0.2 0.1 0.2 0.2 0.4 0.4 0.3 0.3 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:75] 4 6 8 10 12 15 16 17 18 19 ...
       $ test :'data.frame':	45 obs. of  6 variables:
        ..$ Sepal.Length: num [1:45] 5.1 4.9 4.7 5 5.4 4.8 5.4 5.1 5.2 4.9 ...
        ..$ Sepal.Width : num [1:45] 3.5 3 3.2 3.6 3.7 3 3.4 3.7 4.1 3.1 ...
        ..$ Petal.Length: num [1:45] 1.4 1.4 1.3 1.4 1.5 1.4 1.7 1.5 1.5 1.5 ...
        ..$ Petal.Width : num [1:45] 0.2 0.2 0.2 0.2 0.2 0.1 0.2 0.4 0.1 0.2 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:45] 1 2 3 5 11 13 21 22 33 35 ...

---

    Code
      str(data_partition(iris, proportion = 0.7, by = "Species", seed = 123))
    Output
      List of 2
       $ p_0.7:'data.frame':	105 obs. of  6 variables:
        ..$ Sepal.Length: num [1:105] 4.7 4.6 5 4.6 5 4.4 4.9 5.4 4.8 4.8 ...
        ..$ Sepal.Width : num [1:105] 3.2 3.1 3.6 3.4 3.4 2.9 3.1 3.7 3.4 3 ...
        ..$ Petal.Length: num [1:105] 1.3 1.5 1.4 1.4 1.5 1.4 1.5 1.5 1.6 1.4 ...
        ..$ Petal.Width : num [1:105] 0.2 0.2 0.2 0.3 0.2 0.2 0.1 0.2 0.2 0.1 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:105] 3 4 5 7 8 9 10 11 12 13 ...
       $ test :'data.frame':	45 obs. of  6 variables:
        ..$ Sepal.Length: num [1:45] 5.1 4.9 5.4 5.7 5.1 5.1 5.1 4.6 5.5 4.9 ...
        ..$ Sepal.Width : num [1:45] 3.5 3 3.9 4.4 3.5 3.8 3.7 3.6 4.2 3.1 ...
        ..$ Petal.Length: num [1:45] 1.4 1.4 1.7 1.5 1.4 1.5 1.5 1 1.4 1.5 ...
        ..$ Petal.Width : num [1:45] 0.2 0.2 0.4 0.4 0.3 0.3 0.4 0.2 0.2 0.2 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:45] 1 2 6 16 18 20 22 23 34 35 ...

---

    Code
      str(data_partition(iris, proportion = c(0.2, 0.5), by = "Species", seed = 123))
    Output
      List of 3
       $ p_0.2:'data.frame':	30 obs. of  6 variables:
        ..$ Sepal.Length: num [1:30] 4.7 4.3 5.8 4.8 5 4.8 5.5 4.5 4.4 4.6 ...
        ..$ Sepal.Width : num [1:30] 3.2 3 4 3.4 3 3.1 3.5 2.3 3.2 3.2 ...
        ..$ Petal.Length: num [1:30] 1.3 1.1 1.2 1.9 1.6 1.6 1.3 1.3 1.3 1.4 ...
        ..$ Petal.Width : num [1:30] 0.2 0.1 0.2 0.2 0.2 0.2 0.2 0.3 0.2 0.2 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:30] 3 14 15 25 26 31 37 42 43 48 ...
       $ p_0.5:'data.frame':	75 obs. of  6 variables:
        ..$ Sepal.Length: num [1:75] 5 5.4 5 4.4 4.9 5.4 4.8 4.8 5.7 5.4 ...
        ..$ Sepal.Width : num [1:75] 3.6 3.9 3.4 2.9 3.1 3.7 3.4 3 4.4 3.9 ...
        ..$ Petal.Length: num [1:75] 1.4 1.7 1.5 1.4 1.5 1.5 1.6 1.4 1.5 1.3 ...
        ..$ Petal.Width : num [1:75] 0.2 0.4 0.2 0.2 0.1 0.2 0.2 0.1 0.4 0.4 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:75] 5 6 8 9 10 11 12 13 16 17 ...
       $ test :'data.frame':	45 obs. of  6 variables:
        ..$ Sepal.Length: num [1:45] 5.1 4.9 4.6 4.6 5.7 5.4 4.6 5 5.2 4.7 ...
        ..$ Sepal.Width : num [1:45] 3.5 3 3.1 3.4 3.8 3.4 3.6 3.4 3.5 3.2 ...
        ..$ Petal.Length: num [1:45] 1.4 1.4 1.5 1.4 1.7 1.7 1 1.6 1.5 1.6 ...
        ..$ Petal.Width : num [1:45] 0.2 0.2 0.2 0.3 0.3 0.2 0.2 0.4 0.2 0.2 ...
        ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
        ..$ .row_id     : int [1:45] 1 2 4 7 19 21 23 27 28 30 ...