Using GNU Parallel with ia
==========================
`GNU Parallel <https://www.gnu.org/software/parallel/>`_ is a shell tool for executing jobs in parallel.
It is particularly useful with ``ia`` for bulk jobs.
It can be installed via many OS package managers.
For example, it can be installed via `Homebrew <https://brew.sh/>`_ on macOS::

    brew install parallel

Refer to the `GNU Parallel homepage <https://www.gnu.org/software/parallel/>`_ for more details on available packages, source code, installation, and other documentation and tutorials.
Basic Usage
-----------
You can use ``parallel`` to retrieve metadata from archive.org items concurrently:

.. code:: bash

    $ cat itemlist.txt
    jj-test-2020-09-17-1
    jj-test-2020-09-17-2
    jj-test-2020-09-17-3
    $ cat itemlist.txt | parallel 'ia metadata {}' | jq .metadata.date
    "1999"
    "1999"
    "1999"

You can run ``parallel`` with ``--dry-run`` to check your commands before running them:

.. code:: bash

    $ cat itemlist.txt | parallel --dry-run 'ia metadata {}'
    ia metadata jj-test-2020-09-17-2
    ia metadata jj-test-2020-09-17-1
    ia metadata jj-test-2020-09-17-3

Logging and retrying with Parallel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Parallel also offers an easy way to log and retry failed commands.
The following job retrieves metadata for every item listed in ``itemlist.txt`` and writes it to ``output.jsonl``.
It uses the ``--joblog`` option to log every command and its exit value to ``/tmp/my_ia_job.log``:

.. code:: bash

    $ cat itemlist.txt | parallel --joblog /tmp/my_ia_job.log 'ia metadata {}' > output.jsonl

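To see which commands failed before retrying them, you can inspect the job log directly. The sketch below assumes the log is a tab-separated file with a header row and the columns ``Seq``, ``Host``, ``Starttime``, ``JobRuntime``, ``Send``, ``Receive``, ``Exitval``, ``Signal``, and ``Command``; check this against your own log before relying on it. The ``list_failed`` helper and the sample log contents are hypothetical and exist only for illustration:

```shell
# Build a small sample job log for illustration only (made-up values).
# A real log is written by `parallel --joblog`.
joblog="$(mktemp)"
printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n' > "$joblog"
printf '1\t:\t1600000000.000\t0.512\t0\t1024\t0\t0\tia metadata jj-test-2020-09-17-1\n' >> "$joblog"
printf '2\t:\t1600000000.100\t0.498\t0\t0\t1\t0\tia metadata jj-test-2020-09-17-2\n' >> "$joblog"
printf '3\t:\t1600000000.200\t0.530\t0\t1024\t0\t0\tia metadata jj-test-2020-09-17-3\n' >> "$joblog"

# list_failed JOBLOG
# Hypothetical helper: skip the header row, then print the Command column
# (field 9) of every row whose Exitval column (field 7) is non-zero.
list_failed() {
    awk -F '\t' 'NR > 1 && $7 != 0 { print $9 }' "$1"
}

list_failed "$joblog"
# prints: ia metadata jj-test-2020-09-17-2
```

Against a real run you would call ``list_failed /tmp/my_ia_job.log`` instead of using the sample file.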
You can now retry any commands that failed by using the ``--retry-failed`` option.
Remember to switch ``>`` to ``>>`` in this example: ``>>`` appends to ``output.jsonl`` rather than overwriting it.

.. code:: bash

    $ parallel --retry-failed --joblog /tmp/my_ia_job.log 'ia metadata {}' >> output.jsonl

If there were no failed commands, nothing will be rerun.
You can rerun this command until it exits with ``0``.
You can check the exit code by running ``echo $?`` directly after the ``parallel`` command finishes.
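The rerun-until-zero step can be automated with a small loop. This is a minimal sketch and not part of ``ia`` or GNU Parallel: ``retry_until_success`` is a hypothetical helper that runs any command up to a fixed number of times and stops at the first zero exit code. The cap of five attempts in the usage comment is an arbitrary choice:

```shell
# retry_until_success MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it exits with 0, giving up after
# MAX_ATTEMPTS tries. Returns 0 on success and 1 if every attempt failed.
# Note: the variables are not function-local in plain POSIX sh.
retry_until_success() {
    max_attempts="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Progress goes to stderr so it never mixes with redirected output.
        echo "attempt $attempt of $max_attempts exited non-zero" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Usage with the retry command from this section (left commented out so the
# sketch can be sourced without parallel installed):
#   retry_until_success 5 parallel --retry-failed --joblog /tmp/my_ia_job.log >> output.jsonl
```

Capping the attempts keeps the loop from spinning forever on items that fail for a permanent reason, such as an identifier that does not exist.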
Resources
---------
- Intro videos: `https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1 <https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1>`_
- Cheat sheet: `https://www.gnu.org/software/parallel/parallel_cheat.pdf <https://www.gnu.org/software/parallel/parallel_cheat.pdf>`_
- Examples from the man page: `https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Working-as-xargs--n1.-Argument-appending <https://www.gnu.org/software/parallel/man.html#EXAMPLE:-Working-as-xargs--n1.-Argument-appending>`_