.. _python-lib:

Python Library Usage
====================

The ``internetarchive`` Python library provides two main ways to interact with archive.org:

1. **Simple functional interface** via :mod:`internetarchive.api` - Easy to use for common tasks
2. **Flexible object-oriented interface** via :class:`~internetarchive.session.ArchiveSession` - More control for complex applications

Quick Start
-----------

The easiest way to get started is with the :mod:`internetarchive.api` module, which provides
simple functions for common operations:

.. code-block:: python

    from internetarchive import download, search_items, get_item

    # Download files from an item
    download('TripDown1905', glob_pattern='*.mp4')

    # Search for items
    search = search_items('collection:opensource')
    for result in search:
        print(result['identifier'])

    # Get an item and work with it
    item = get_item('TripDown1905')
    print(item.metadata['title'])

For more control and to persist configuration across operations, use a :class:`~internetarchive.session.ArchiveSession`:

.. code-block:: python

    from internetarchive import get_session

    # Create a session with your configuration
    session = get_session(config_file='~/.config/ia.ini')

    # Use the session for all operations
    item = session.get_item('TripDown1905')
    item.download()
    search = session.search_items('subject:science')

Simple Functional Interface
---------------------------

The :mod:`internetarchive.api` module provides these convenient functions for common tasks:

.. automodule:: internetarchive.api
   :members:
   :exclude-members: get_username, get_user_info, configure
   :noindex:

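These functions suit scripts and simple applications. Each call creates an
:class:`~internetarchive.session.ArchiveSession` internally, and many of them also accept an
existing session through the ``archive_session`` keyword argument. For complete documentation
including all parameters, see :ref:`api-module` in the reference.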

Using Sessions
--------------

For more complex applications or when you need to perform multiple operations, use
the :class:`~internetarchive.session.ArchiveSession` class:

.. autoclass:: internetarchive.session.ArchiveSession
   :members:
   :exclude-members: set_file_logger, set_stream_logger, rebuild_auth,
                     mount_http_adapter, send, _get_user_agent_string,
                     s3_is_overloaded, get_tasks_api_rate_limit
   :noindex:

Creating a session:

.. code-block:: python

    from internetarchive import get_session

    # From config file
    session = get_session(config_file='~/.config/ia.ini')

    # From dictionary
    config = {
        's3': {
            'access': 'your_access_key',
            'secret': 'your_secret_key'
        }
    }
    session = get_session(config=config)

For complete session documentation, see :ref:`session-module`.
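
If your keys are already held elsewhere, such as in the ``IA_ACCESS_KEY_ID`` and
``IA_SECRET_ACCESS_KEY`` environment variables described under Configuration, you can assemble
the dictionary form above yourself. This is a minimal sketch: ``config_from_env`` is a
hypothetical helper, not part of the library, and it only reproduces the ``s3`` shape shown above.

.. code-block:: python

    import os


    def config_from_env(environ=None):
        """Build a session config dict with the same shape as the example above.

        Hypothetical helper: reads IA_ACCESS_KEY_ID and IA_SECRET_ACCESS_KEY and
        raises KeyError naming every variable that is missing or empty.
        """
        env = os.environ if environ is None else environ
        names = ('IA_ACCESS_KEY_ID', 'IA_SECRET_ACCESS_KEY')
        missing = [name for name in names if not env.get(name)]
        if missing:
            raise KeyError(f"missing environment variable(s): {', '.join(missing)}")
        return {
            's3': {
                'access': env['IA_ACCESS_KEY_ID'],
                'secret': env['IA_SECRET_ACCESS_KEY'],
            }
        }

You would then pass the result along as ``get_session(config=config_from_env())``.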

Working with Items
------------------

Once you have an item (from :func:`~internetarchive.api.get_item` or :meth:`~internetarchive.session.ArchiveSession.get_item`), you can read its metadata, download or upload files, and modify metadata:

.. code-block:: python

    from internetarchive import get_item

    item = get_item('TripDown1905')

    # Access metadata
    print(item.metadata['title'])
    print(item.metadata['creator'])

    # Download files
    item.download(glob_pattern='*.mp4')

    # Upload new files (requires credentials and write access to the item)
    item.upload(['file1.txt', 'file2.jpg'],
                metadata={'title': 'My New Files'})

    # Modify metadata (also requires credentials and write access)
    item.modify_metadata({'subject': ['history', 'film']})

    # List files: get_files() yields File objects; item.files holds plain dicts
    for file in item.get_files():
        print(file.name, file.format)

For complete item documentation, see :ref:`item-module`.
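
``item.files`` holds the item's file metadata as a list of plain dictionaries, which you can
summarize without further requests. A minimal sketch, assuming each dictionary may carry
``'format'`` and ``'size'`` keys with the size stored as a decimal string (entries without a
size, such as some derived metadata files, are skipped); ``total_size_by_format`` is a
hypothetical helper:

.. code-block:: python

    from collections import defaultdict


    def total_size_by_format(files):
        """Sum file sizes in bytes per format from a list of file-metadata dicts.

        Hypothetical helper. Assumes 'size' is a decimal string when present;
        entries with no 'size' are skipped, and a missing 'format' is
        grouped under 'unknown'.
        """
        totals = defaultdict(int)
        for entry in files:
            size = entry.get('size')
            if size is None:
                continue
            totals[entry.get('format', 'unknown')] += int(size)
        return dict(totals)

Calling ``total_size_by_format(item.files)`` gives a per-format byte count, which can help
decide on a ``glob_pattern`` before downloading.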

Searching for Items
-------------------

.. code-block:: python

    from internetarchive import search_items

    # Basic search
    search = search_items('collection:opensource movies')

    # Iterate through results
    for result in search:
        print(f"{result['identifier']}: {result.get('title', 'No title')}")

    # Get specific fields
    search = search_items('subject:science',
                          fields=['identifier', 'title', 'date'])
    for result in search:
        print(result)

For complete search documentation, see :ref:`search-module`.
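
Queries use the Lucene-style ``field:value`` syntax seen in the examples above. When you combine
several constraints in code, a small helper keeps quoting consistent. This is a sketch:
``build_query`` is hypothetical, and it assumes that values containing whitespace or double
quotes should be wrapped in double quotes with inner quotes backslash-escaped.

.. code-block:: python

    def build_query(terms, operator='AND'):
        """Join field/value pairs into one Lucene-style query string.

        Hypothetical helper: values containing whitespace or '"' are
        double-quoted, and embedded double quotes are backslash-escaped.
        """
        parts = []
        for field, value in terms.items():
            value = str(value)
            if any(ch.isspace() for ch in value) or '"' in value:
                value = '"' + value.replace('"', '\\"') + '"'
            parts.append(f'{field}:{value}')
        return f' {operator} '.join(parts)

For example, ``search_items(build_query({'collection': 'opensource', 'mediatype': 'texts'}))``
searches with ``collection:opensource AND mediatype:texts``.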

Common Patterns
---------------

**Download all files from multiple items:**

.. code-block:: python

    from internetarchive import get_item

    identifiers = ['TripDown1905', 'goodytwoshoes00newyiala']
    for identifier in identifiers:
        item = get_item(identifier)
        item.download()
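
If an exception escapes ``item.download()``, the loop above stops at that identifier. To attempt
every identifier and report failures at the end, you can wrap each step. A minimal sketch:
``run_for_each`` is a hypothetical helper, and ``action`` stands in for whatever per-item call
you make, for example ``lambda i: get_item(i).download()``.

.. code-block:: python

    def run_for_each(identifiers, action):
        """Call action(identifier) for each identifier, collecting failures.

        Hypothetical helper. Returns a dict mapping each failed identifier to
        the exception it raised, so one bad item does not abort the run.
        """
        failures = {}
        for identifier in identifiers:
            try:
                action(identifier)
            except Exception as exc:  # broad on purpose: record it and move on
                failures[identifier] = exc
        return failures

Passing the action in as a function keeps the retry and reporting logic separate from the
library call, and makes the loop easy to exercise with a stub.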

**Upload with custom metadata:**

.. code-block:: python

    from internetarchive import upload

    upload(
        'my-new-item-001',
        files=['document.pdf', 'cover.jpg'],
        metadata={
            'title': 'My Document',
            'mediatype': 'texts',
            'collection': 'opensource',
            'subject': ['documentation', 'tutorial']
        }
    )
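
Because an upload creates a public item, you may want a local sanity check first. The sketch
below is hypothetical (``preflight`` is not a library function) and rests on two labeled
assumptions: an identifier rule of a leading letter or digit followed only by letters, digits,
``_``, ``-`` or ``.``, at most 100 characters, which you should verify against archive.org's
current documentation; and a required-key list that is a local policy matching this example,
not an archive.org rule.

.. code-block:: python

    import re

    # ASSUMED identifier rule; verify against archive.org's documentation.
    IDENTIFIER_RE = re.compile(r'[A-Za-z0-9][A-Za-z0-9_.-]{0,99}')

    # Keys this example treats as mandatory: a local policy, not an API rule.
    REQUIRED_KEYS = ('title', 'mediatype', 'collection')


    def preflight(identifier, metadata, required=REQUIRED_KEYS):
        """Return a list of problems to fix before calling upload(); empty if none."""
        problems = []
        if not IDENTIFIER_RE.fullmatch(identifier):
            problems.append(f'invalid identifier: {identifier!r}')
        for key in required:
            if not metadata.get(key):
                problems.append(f'missing metadata: {key}')
        return problems

Run it with the same identifier and ``metadata`` dict you intend to pass to ``upload`` and
proceed only when it returns an empty list.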

**Search and process results:**

.. code-block:: python

    from internetarchive import search_items

    # Request a single page of 50 results
    search = search_items(
        'collection:prelinger',
        params={'rows': 50, 'page': 1}
    )

    # Collect identifiers
    identifiers = [result['identifier'] for result in search]

    # Process only the first 10 identifiers
    for identifier in identifiers[:10]:
        print(f"Processing {identifier}")
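
The loop above only handles the first ten identifiers. To process every result in fixed-size
batches, a small generator built on :func:`itertools.islice` works with any iterable, including
a search object consumed lazily. This is a sketch with a hypothetical name, ``batched_ids``;
on Python 3.12 and later, :func:`itertools.batched` does the same job but yields tuples.

.. code-block:: python

    from itertools import islice


    def batched_ids(iterable, size):
        """Yield lists of up to `size` items from `iterable`, preserving order."""
        if size < 1:
            raise ValueError('size must be at least 1')
        iterator = iter(iterable)
        while True:
            batch = list(islice(iterator, size))
            if not batch:
                return
            yield batch

For example, ``for batch in batched_ids((r['identifier'] for r in search), 10):`` processes ten
identifiers at a time without first building the full list.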

Configuration
-------------

The library needs your archive.org credentials for certain operations (uploading,
modifying metadata, etc.). You can configure it in several ways:

1. **Config file** (recommended): Use ``ia configure`` from the CLI or :func:`~internetarchive.api.configure` from Python
2. **Environment variables**: Set ``IA_ACCESS_KEY_ID`` and ``IA_SECRET_ACCESS_KEY``
3. **Python dictionary**: Pass credentials directly when creating a session

See :ref:`configuration` for complete configuration details.
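
The config file is an INI file. Assuming the credentials live in an ``[s3]`` section with
``access`` and ``secret`` keys (mirroring the dictionary form shown earlier), a standard-library
check can tell you whether a given file is usable before you start a long job. This is a
sketch; ``has_s3_credentials`` is hypothetical.

.. code-block:: python

    import configparser


    def has_s3_credentials(path):
        """Return True if the INI file at `path` has non-empty s3 access/secret keys.

        Hypothetical helper assuming the [s3] layout described above. A missing
        file yields False because ConfigParser.read() skips unreadable paths.
        Interpolation is disabled so secrets containing '%' are read verbatim.
        """
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path)
        if not parser.has_section('s3'):
            return False
        access = parser.get('s3', 'access', fallback='')
        secret = parser.get('s3', 'secret', fallback='')
        return bool(access) and bool(secret)

The helper does not expand ``~``, so call it as
``has_s3_credentials(os.path.expanduser('~/.config/ia.ini'))``.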

Next Steps
----------

For complete documentation of all modules, classes, and methods, see :ref:`modules`.

For troubleshooting and advanced usage, check the examples in the
`GitHub repository <https://github.com/jjjake/internetarchive>`_.