.. The content of this file was auto-generated by your mad uncle from Germany
   using the cast2rst script and a recorded asciicast as input.

   Do not edit this file!



DataLad provides seamless management of nested Git repositories...

Let's create a dataset

.. code-block:: ansi-color

   ~ % datalad create demo
   [INFO   ] Creating a new annex repo at /demo/demo 
   create(ok): /demo/demo (dataset)
   ~ % cd demo


A DataLad dataset is just a Git repo with some initial configuration

.. code-block:: ansi-color

   ~/demo % git log --oneline
   472e34b (HEAD -> master) [DATALAD] new dataset
   f968257 [DATALAD] Set default backend for all files to be MD5E


We can create nested datasets by telling DataLad to register a
new dataset in a parent dataset (``-d .`` names the current
directory as the parent)

.. code-block:: ansi-color

   ~/demo % datalad create -d . sub1
   [INFO   ] Creating a new annex repo at /demo/demo/sub1 
   add(ok): sub1 (dataset) [added new subdataset]
   add(notneeded): sub1 (dataset) [nothing to add from /demo/demo/sub1]
   add(notneeded): .gitmodules (file) [already included in the dataset]
   save(ok): /demo/demo (dataset)
   create(ok): sub1 (dataset)
   action summary:
     add (notneeded: 2, ok: 1)
     create (ok: 1)
     save (ok: 1)


A subdataset is nothing more than a regular Git submodule

.. code-block:: ansi-color

   ~/demo % git submodule
    5f0cddf2026e3fb4864139f27e7415fd72c7d4d0 sub1 (heads/master)
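
As a rough illustration of what such a registration consists of, the
following sketch reproduces it with plain Git in a throwaway directory.
The names ``super`` and ``sub1``, the demo identity, and the use of
``git submodule add`` in place of DataLad are assumptions made for this
example; no git-annex setup is performed.

.. code-block:: shell

   set -eu
   # Hypothetical identity so that commits work in a clean environment
   export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
   export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

   # Work in a throwaway directory
   sandbox=$(mktemp -d)
   cd "$sandbox"

   # Create a parent repository and a repository inside its working tree
   git init -q super
   git init -q super/sub1
   git -C super/sub1 commit -q --allow-empty -m "initial commit"

   # Register the existing inner repository as a submodule of the parent
   cd super
   git submodule add ./sub1 sub1
   git commit -q -m "register sub1"

   # The registration has two parts: an entry in .gitmodules ...
   git config --file .gitmodules --get submodule.sub1.path
   # ... and a gitlink (mode 160000) in the parent's index
   git ls-files --stage sub1

The ``.gitmodules`` entry and the gitlink are the information that
``git submodule`` reports on.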


Of course subdatasets can be nested

.. code-block:: ansi-color

   ~/demo % datalad create -d . sub1/justadir/sub2
   [INFO   ] Creating a new annex repo at /demo/demo/sub1/justadir/sub2 
   add(ok): sub1/justadir/sub2 (dataset) [added new subdataset]
   add(notneeded): sub1/justadir/sub2 (dataset) [nothing to add from /demo/demo/sub1/justadir/sub2]
   add(notneeded): sub1/.gitmodules (file) [already included in the dataset]
   add(notneeded): sub1 (dataset) [already known subdataset]
   save(ok): /demo/demo/sub1 (dataset)
   save(ok): /demo/demo (dataset)
   create(ok): sub1/justadir/sub2 (dataset)
   action summary:
     add (notneeded: 3, ok: 1)
     create (ok: 1)
     save (ok: 2)


Unlike Git, DataLad automatically takes care of committing all
changes associated with the added subdataset up to the given
parent dataset

.. code-block:: ansi-color

   ~/demo % git status
   On branch master
   nothing to commit, working tree clean


Let's create some content in the deepest subdataset

.. code-block:: ansi-color

   ~/demo % mkdir sub1/justadir/sub2/anotherdir
   ~/demo % touch sub1/justadir/sub2/anotherdir/afile


Git can only tell us that something underneath the top-most
subdataset was modified

.. code-block:: ansi-color

   ~/demo % git status
   On branch master
   Changes not staged for commit:
     (use "git add <file>..." to update what will be committed)
     (use "git checkout -- <file>..." to discard changes in working directory)
     (commit or discard the untracked or modified content in submodules)
   
   	modified:   sub1 (untracked content)
   
   no changes added to commit (use "git add" and/or "git commit -a")


DataLad's recursive ``diff`` saves us from further investigation

.. code-block:: ansi-color

   ~/demo % datalad diff -r
      modified(dataset): sub1
      modified(dataset): sub1/justadir/sub2
   untracked(directory): sub1/justadir/sub2/anotherdir


Like Git, it can report individual untracked files, and it does so
across repository boundaries

.. code-block:: ansi-color

   ~/demo % datalad diff -r --report-untracked all
      modified(dataset): sub1
      modified(dataset): sub1/justadir/sub2
        untracked(file): sub1/justadir/sub2/anotherdir/afile


Adding this new content with plain Git or git-annex would be tedious:
from the top-level repository, Git refuses paths that live in a submodule

.. code-block:: ansi-color

   ~/demo % git add sub1/justadir/sub2/anotherdir/afile
   fatal: Pathspec 'sub1/justadir/sub2/anotherdir/afile' is in submodule 'sub1'
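
To make the manual alternative concrete, here is a minimal sketch that
uses only plain Git. It rebuilds a comparable three-level hierarchy in a
throwaway directory and then commits the new file bottom-up, one
repository at a time. The top-level name ``top``, the commit messages,
the demo identity, and the use of ``git submodule add`` in place of
DataLad are assumptions made for this example; git-annex is not involved.

.. code-block:: shell

   set -eu
   # Hypothetical identity so that commits work in a clean environment
   export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
   export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

   # Build a throwaway hierarchy: top -> sub1 -> justadir/sub2
   sandbox=$(mktemp -d)
   cd "$sandbox"
   git init -q top
   git init -q top/sub1
   mkdir -p top/sub1/justadir
   git init -q top/sub1/justadir/sub2
   git -C top/sub1/justadir/sub2 commit -q --allow-empty -m "initial commit"
   (cd top/sub1 && git submodule add ./justadir/sub2 justadir/sub2 \
       && git commit -q -m "register sub2")
   (cd top && git submodule add ./sub1 sub1 \
       && git commit -q -m "register sub1")

   # Create new content in the deepest repository
   cd top
   mkdir -p sub1/justadir/sub2/anotherdir
   touch sub1/justadir/sub2/anotherdir/afile

   # Step 1: commit the file in the repository that actually contains it
   git -C sub1/justadir/sub2 add anotherdir/afile
   git -C sub1/justadir/sub2 commit -q -m "add afile"

   # Step 2: record the new state of sub2 in sub1
   git -C sub1 add justadir/sub2
   git -C sub1 commit -q -m "update sub2"

   # Step 3: record the new state of sub1 in the top-level repository
   git add sub1
   git commit -q -m "update sub1"

   # The whole tree is clean again, so this prints nothing
   git status --porcelain

Each level needs its own ``git add`` and ``git commit``, and the order
matters: a parent can only record a child's state after the child has
been committed.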


DataLad does not require users to determine the correct repository
in the tree

.. code-block:: ansi-color

   ~/demo % datalad add -d . sub1/justadir/sub2/anotherdir/afile
   add(ok): sub1/justadir/sub2/anotherdir/afile (file)
   save(ok): /demo/demo/sub1/justadir/sub2 (dataset)
   save(ok): /demo/demo/sub1 (dataset)
   save(ok): /demo/demo (dataset)
   action summary:
     add (ok: 1)
     save (ok: 3)


Again, all associated changes in the entire dataset tree, up to
the given parent dataset, were committed

.. code-block:: ansi-color

   ~/demo % git status
   On branch master
   nothing to commit, working tree clean


DataLad's 'diff' is able to report the changes from these related
commits throughout the repository tree

.. code-block:: ansi-color

   ~/demo % datalad diff --revision @~1 -r
      modified(dataset): sub1
      modified(dataset): sub1/justadir/sub2
            added(file): sub1/justadir/sub2/anotherdir/afile


 _____________________________________
/ Demo was using datalad 0.9.2.dev1.  \
\ Discover more at http://datalad.org /
 -------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||


.. code-block:: ansi-color

   ~/demo % exit