File: cli.rst

python-internetarchive 1.8.1-1+deb10u1
Command-Line Interface
======================

The ``ia`` command-line tool is installed with ``internetarchive``, or `available as a binary <installation.html#binaries>`_. ``ia`` allows you to interact with various archive.org services from the command line.

Getting Started
---------------

The easiest way to start using ``ia`` is to download a binary. The binary requires only a Unix-like environment with Python installed. To download the latest binary and make it executable, run:

.. code:: bash

    $ curl -LOs https://archive.org/download/ia-pex/ia
    $ chmod +x ia
    $ ./ia help
    A command line interface to archive.org.

    usage:
        ia [--help | --version]
        ia [--config-file FILE] [--log | --debug] [--insecure] <command> [<args>]...

    options:
        -h, --help
        -v, --version
        -c, --config-file FILE  Use FILE as config file.
        -l, --log               Turn on logging [default: False].
        -d, --debug             Turn on verbose logging [default: False].
        -i, --insecure          Use HTTP for all requests instead of HTTPS [default: false]

    commands:
        help      Retrieve help for subcommands.
        configure Configure `ia`.
        metadata  Retrieve and modify metadata for items on archive.org.
        upload    Upload items to archive.org.
        download  Download files from archive.org.
        delete    Delete files from archive.org.
        search    Search archive.org.
        tasks     Retrieve information about your archive.org catalog tasks.
        list      List files in a given item.

    See 'ia help <command>' for more information on a specific command.


Metadata
--------

Reading Metadata
^^^^^^^^^^^^^^^^

You can use ``ia`` to read and write metadata on archive.org. To retrieve all of an item's metadata as JSON, run:

.. code:: bash

    $ ia metadata TripDown1905

A particularly useful tool to use alongside ``ia`` is `jq <https://stedolan.github.io/jq/>`_. ``jq`` is a command-line tool for parsing JSON. For example:

.. code:: bash

    $ ia metadata TripDown1905 | jq '.metadata.date'
    "1906"


Modifying Metadata
^^^^^^^^^^^^^^^^^^

Once ``ia`` has been `configured <quickstart.html#configuring>`_, you can modify metadata:

.. code:: bash

    $ ia metadata <identifier> --modify="foo:bar" --modify="baz:foooo"

You can remove a metadata field by setting the value of the given field to ``REMOVE_TAG``.
For example, to remove the metadata field ``foo`` from the item ``<identifier>``:

.. code:: bash

    $ ia metadata <identifier> --modify="foo:REMOVE_TAG"

Note that some metadata fields (e.g. ``mediatype``) cannot be modified and must instead be set at upload time.

The default target to write to is ``metadata``. To write to another target, such as ``files``, use the ``--target`` parameter. For example, to add a metadata field to the file ``foo.txt`` within the item ``my_identifier``:

.. code:: bash

    $ ia metadata my_identifier --target="files/foo.txt" --modify="title:My File"

You can also create new targets if they don't exist:

.. code:: bash

    $ ia metadata <identifier> --target="extra_metadata" --modify="foo:bar"

There is also an ``--append`` option, which appends a string to an existing metadata string (note: use ``--append-list`` to append elements to a list).
For example, if your item's title is ``Foo`` and you want it to be ``Foo Bar``, you can run:

.. code:: bash

    $ ia metadata <identifier> --append="title: Bar"

If you would like to add a new value to an existing field that is an array (like ``subject`` or ``collection``), you can use the ``--append-list`` option:

.. code:: bash

    $ ia metadata <identifier> --append-list="subject:another subject"

This command appends ``another subject`` to the item's list of subjects if it is not already present (i.e. no duplicate elements are added).

Metadata fields or elements can be removed with the ``--remove`` option:

.. code:: bash

    $ ia metadata <identifier> --remove="subject:another subject"

This removes ``another subject`` from the item's ``subject`` field, whether the field is single- or multi-valued.


Refer to `Internet Archive Metadata <metadata.html>`_ for more specific details regarding metadata and archive.org.


Modifying Metadata in Bulk
^^^^^^^^^^^^^^^^^^^^^^^^^^

If you have a lot of metadata changes to submit, you can use a CSV spreadsheet to submit many changes with a single command.
Your CSV must contain an ``identifier`` column, with one item per row. Every other column is treated as a metadata field to modify. If a row has no value in a given column, no change is submitted for that field. To specify multiple values for a field, add an index to the column name: ``subject[0]``, ``subject[1]``. Your CSV file should be UTF-8 encoded. See `metadata.csv <https://archive.org/download/ia-pex/metadata.csv>`_ for an example CSV file.
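
A minimal sketch of such a file, written with ``printf`` so it can be pasted into any POSIX shell. The identifiers, titles, and subjects are hypothetical placeholders. Following the rule above, the empty ``title`` cell in the second row should leave that item's title unchanged.

.. code:: bash

    # Hypothetical identifiers and values; replace them with your own.
    # Header row: identifier, then one column per metadata field.
    printf '%s\n' \
      'identifier,title,subject[0],subject[1]' \
      'my-item-one,New Title,dogs,cats' \
      'my-item-two,,birds,' > metadata.csv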

Once you're ready to submit your changes, you can submit them like so:

.. code:: bash

    $ ia metadata --spreadsheet=metadata.csv

See ``ia help metadata`` for more details.


Upload
------

``ia`` can also be used to upload items to archive.org. After `configuring ia <quickstart.html#configuring>`__, you can upload files like so:

.. code:: bash

    $ ia upload <identifier> file1 file2 --metadata="mediatype:texts" --metadata="blah:arg"

Please note that, unless specified otherwise, items are uploaded with the ``data`` mediatype. **This cannot be changed afterwards.** Therefore, you should specify a mediatype when uploading, e.g. ``--metadata="mediatype:movies"``.

You can upload files from ``stdin``:

.. code:: bash

    $ curl http://dumps.wikimedia.org/kywiki/20130927/kywiki-20130927-pages-logging.xml.gz \
      | ia upload <identifier> - --remote-name=kywiki-20130927-pages-logging.xml.gz --metadata="title:Uploaded from stdin."

You can use the ``--retries`` parameter to retry on errors (e.g. when IA-S3 is overloaded):

.. code:: bash
    
    $ ia upload <identifier> file1 --retries 10

Note that ``ia upload`` makes a backup of any files that are clobbered.
They are saved to a directory in the item named ``history/files/``.
The files are named in the format ``$key.~N~``.
These files can be deleted like normal files.
You can also prevent the backup from happening on clobbers by adding ``-H x-archive-keep-old-version:0`` to your command.

Refer to `archive.org Identifiers <metadata.html#archive-org-identifiers>`_ for more information on creating valid archive.org identifiers.
Please also read the `Internet Archive Items <items.html>`_ page before getting started.

Bulk Uploading
^^^^^^^^^^^^^^

Uploading in bulk can be done similarly to `Modifying Metadata in Bulk`_. The only difference is that you must provide a ``file`` column which contains a relative or absolute path to your file. Please see `uploading.csv <https://archive.org/download/ia-pex/uploading.csv>`_ for an example.
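
A minimal sketch of a bulk upload sheet, with two small local files created so the example is self-contained. The identifiers, file names, and titles are hypothetical. Each row sets ``mediatype`` because, as noted above, it cannot be changed after upload. The loop at the end is an optional pre-flight check, not a feature of ``ia``; its naive comma split assumes no field contains a comma or quote.

.. code:: bash

    # Two small local files to upload (hypothetical content).
    mkdir -p scans
    printf 'page one\n' > scans/page1.txt
    printf 'page two\n' > scans/page2.txt

    # One row per item: identifier, path to the file, then metadata columns.
    printf '%s\n' \
      'identifier,file,mediatype,title' \
      'my-scans-item-1,scans/page1.txt,texts,Page One' \
      'my-scans-item-2,scans/page2.txt,texts,Page Two' > uploading.csv

    # Pre-flight check: report any row whose file path does not exist.
    tail -n +2 uploading.csv | while IFS=, read -r identifier file _rest; do
      [ -f "$file" ] || echo "missing file for $identifier: $file" >&2
    done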

Once you are ready to start your upload, simply run:

.. code:: bash

    $ ia upload --spreadsheet=uploading.csv


See ``ia help upload`` for more details.


Download
--------


Download an entire item:

.. code:: bash

    $ ia download TripDown1905

Download specific files from an item:

.. code:: bash

    $ ia download TripDown1905 TripDown1905_512kb.mp4 TripDown1905.ogv

Download specific files matching a glob pattern:

.. code:: bash

    $ ia download TripDown1905 --glob="*.mp4"

Note that you may have to escape the ``*`` differently depending on your shell (e.g. ``\*.mp4``, ``'*.mp4'``, etc.).
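
Quoting matters because the shell, not ``ia``, expands an unquoted pattern. A minimal POSIX ``sh`` sketch, using ``set --`` to capture the words a command would receive; the scratch directory and ``local.mp4`` are hypothetical. In bash, an unquoted pattern that matches nothing is passed through literally by default, while zsh stops with a "no matches found" error, so quoting is the portable choice.

.. code:: bash

    # Work in an empty scratch directory holding one local .mp4 file.
    cd "$(mktemp -d)"
    touch local.mp4

    # Unquoted: the shell replaces *.mp4 with matching local file names
    # before the command runs.
    set -- --glob *.mp4
    unquoted=$2

    # Quoted: the literal pattern reaches the command unchanged.
    set -- --glob '*.mp4'
    quoted=$2

    echo "$unquoted"   # prints: local.mp4
    echo "$quoted"     # prints: *.mp4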

Download only files of a specific format:

.. code:: bash

    $ ia download TripDown1905 --format='512Kb MPEG4'

Note that ``--format`` cannot be used with ``--glob``.
You can get a list of the formats of a given item like so:

.. code:: bash

    $ ia metadata --formats TripDown1905

Download an entire collection:

.. code:: bash

    $ ia download --search 'collection:glasgowschoolofart'

Download from an itemlist:

.. code:: bash

    $ ia download --itemlist itemlist.txt
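
An itemlist is a plain text file with one identifier per line, the same format ``ia search --itemlist`` writes. A minimal sketch that merges two lists into one, dropping blank lines and duplicates; the file names and the ``another-item`` identifier are hypothetical.

.. code:: bash

    # Two hypothetical itemlists with an overlapping identifier and a blank line.
    printf '%s\n' TripDown1905 goodytwoshoes00newyiala > list-a.txt
    printf '%s\n' goodytwoshoes00newyiala '' another-item > list-b.txt

    # Merge, drop empty lines, and remove duplicates (byte-order sort).
    cat list-a.txt list-b.txt | grep -v '^$' | LC_ALL=C sort -u > itemlist.txt

    cat itemlist.txt
    # TripDown1905
    # another-item
    # goodytwoshoes00newyiala

The merged file can then be passed to ``ia download --itemlist itemlist.txt``.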

See ``ia help download`` for more details.


Downloading On-The-Fly Files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some files on archive.org are generated on the fly when requested. This currently includes non-original files in the EPUB, MOBI, DAISY, and archive.org's own MARC XML formats. These files can be downloaded using the ``--on-the-fly`` parameter:

.. code:: bash

    $ ia download goodytwoshoes00newyiala --on-the-fly


Delete
------

You can use ``ia`` to delete files from archive.org items:

.. code:: bash

    $ ia delete <identifier> <file>

Delete a file *and* all files derived from the specified file:

.. code:: bash

    $ ia delete <identifier> <file> --cascade

Delete all files in an item:

.. code:: bash

    $ ia delete <identifier> --all

Note that ``ia delete`` makes a backup of any files that are deleted.
They are saved to a directory in the item named ``history/files/``.
The files are named in the format ``$key.~N~``.
These files can be deleted like normal files.
You can also prevent the backup from happening on deletes by adding ``-H x-archive-keep-old-version:0`` to your command.

See ``ia help delete`` for more details.


Search
------

``ia`` can also be used to retrieve archive.org search results as JSON:

.. code:: bash

    $ ia search 'subject:"market street" collection:prelinger'
    
By default, ``ia search`` attempts to return all items matching the search criteria,
and the results are sorted by item identifier. If you want to select only the top ``n``
items, you can specify the ``page`` and ``rows`` parameters. For example, to get the
top 20 items matching the search ``dogs``:

.. code:: bash

    $ ia search --parameters="page=1&rows=20" "dogs"
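
Because ``page`` counts from 1 and each page holds ``rows`` results, the page containing a given result position is ``(position - 1) / rows + 1`` in integer arithmetic. A minimal sketch, assuming 1-based positions; the ``start`` value is a hypothetical example.

.. code:: bash

    rows=20
    start=41   # 1-based position of the first result you want

    # Integer division: results 41-60 fall on page 3 when rows=20.
    page=$(( (start - 1) / rows + 1 ))

    echo "page=$page&rows=$rows"   # prints: page=3&rows=20

The printed string can be passed to ``--parameters``, e.g. ``ia search --parameters="page=$page&rows=$rows" "dogs"``.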

You can use ``ia search`` to create an itemlist:

.. code:: bash

    $ ia search 'collection:glasgowschoolofart' --itemlist > itemlist.txt

You can pipe your itemlist into a GNU Parallel command to download items concurrently:

.. code:: bash

    $ ia search 'collection:glasgowschoolofart' --itemlist | parallel 'ia download {}'
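
If GNU Parallel is not installed, ``xargs -P`` (an extension supported by GNU and BSD ``xargs``, not required by POSIX) gives a similar fan-out. The sketch below is a dry run: ``echo`` prints each command instead of running it, and ``itemlist.txt`` holds hypothetical identifiers. In practice, replace the ``printf`` line with the ``ia search ... --itemlist`` pipeline and remove ``echo`` to run the downloads.

.. code:: bash

    # Hypothetical identifiers standing in for `ia search ... --itemlist` output.
    printf '%s\n' item-a item-b item-c > itemlist.txt

    # -n 1: one identifier per command; -P 4: up to four commands at once.
    # Dry run: echo prints each command instead of executing it.
    xargs -n 1 -P 4 echo ia download < itemlist.txt > commands.txt

    # Output order may vary because the commands run concurrently.
    sort commands.txt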

See ``ia help search`` for more details.


Tasks
-----

After `configuring ia <quickstart.html#configuring>`__, you can also use ``ia`` to retrieve information about your catalog tasks.
To retrieve the task history for an item, simply run:

.. code:: bash

    $ ia tasks <identifier>

View all of your queued and running archive.org tasks:

.. code:: bash

    $ ia tasks

See ``ia help tasks`` for more details.


List
----

You can list files in an item like so:

.. code:: bash

    $ ia list goodytwoshoes00newyiala

See ``ia help list`` for more details.


Copy
----

You can copy files in archive.org items like so:

.. code:: bash

    $ ia copy <src-identifier>/<src-filename> <dest-identifier>/<dest-filename>

If you're copying your file to a new item, you can provide metadata as well:

.. code:: bash

    $ ia copy <src-identifier>/<src-filename> <dest-identifier>/<dest-filename> --metadata 'title:My New Item' --metadata collection:test_collection

Note that ``ia copy`` makes a backup of any files that are clobbered.
They are saved to a directory in the item named ``history/files/``.
The files are named in the format ``$key.~N~``.
These files can be deleted like normal files.
You can also prevent the backup from happening on clobbers by adding ``-H x-archive-keep-old-version:0`` to your command.

Move
----

``ia move`` works just like ``ia copy`` except the source file is deleted after the file has been successfully copied.

Note that ``ia move`` makes a backup of any files that are clobbered or deleted.
They are saved to a directory in the item named ``history/files/``.
The files are named in the format ``$key.~N~``.
These files can be deleted like normal files.
You can also prevent the backup from happening on clobbers or deletes by adding ``-H x-archive-keep-old-version:0`` to your command.