.. _python-lib:

Python Library Usage
====================

The ``internetarchive`` Python library provides two main ways to interact with archive.org:

1. **Simple functional interface** via :mod:`internetarchive.api` - Easy to use for common tasks
2. **Flexible object-oriented interface** via :class:`~internetarchive.session.ArchiveSession` - More control for complex applications

Quick Start
-----------

The easiest way to get started is with the :mod:`internetarchive.api` module, which provides
simple functions for common operations:

.. code-block:: python

    from internetarchive import download, upload, search_items, get_item

    # Download files from an item
    download('TripDown1905', glob_pattern='*.mp4')

    # Search for items
    search = search_items('collection:opensource')
    for result in search:
        print(result['identifier'])

    # Get an item and work with it
    item = get_item('TripDown1905')
    print(item.metadata['title'])

For more control and to persist configuration across operations, use a :class:`~internetarchive.session.ArchiveSession`:

.. code-block:: python

    from internetarchive import get_session

    # Create a session with your configuration
    session = get_session(config_file='~/.config/ia.ini')

    # Use the session for all operations
    item = session.get_item('TripDown1905')
    item.download()
    search = session.search_items('subject:science')

Simple Functional Interface
---------------------------

The :mod:`internetarchive.api` module provides these convenient functions for common tasks:

.. automodule:: internetarchive.api
    :members:
    :exclude-members: get_username, get_user_info, configure
    :noindex:

These functions work well for scripts and simple applications. They automatically create
a session in the background for you. For complete documentation including all parameters,
see :ref:`api-module` in the reference.

Using Sessions
--------------

For more complex applications or when you need to perform multiple operations, use
the :class:`~internetarchive.session.ArchiveSession` class:

.. autoclass:: internetarchive.session.ArchiveSession
    :members:
    :exclude-members: set_file_logger, set_stream_logger, rebuild_auth,
                      mount_http_adapter, send, _get_user_agent_string,
                      s3_is_overloaded, get_tasks_api_rate_limit
    :noindex:

Creating a session:

.. code-block:: python

    from internetarchive import get_session

    # From config file
    session = get_session(config_file='~/.config/ia.ini')

    # From dictionary
    config = {
        's3': {
            'access': 'your_access_key',
            'secret': 'your_secret_key'
        }
    }
    session = get_session(config=config)

For complete session documentation, see :ref:`session-module`.

Working with Items
------------------

Once you have an item (from :func:`get_item` or :meth:`~internetarchive.session.ArchiveSession.get_item`), you can
read its metadata, download and upload files, and modify its metadata:

.. code-block:: python

    from internetarchive import get_item

    item = get_item('TripDown1905')

    # Access metadata
    print(item.metadata['title'])
    print(item.metadata['creator'])

    # Download files
    item.download(glob_pattern='*.mp4')

    # Upload new files
    item.upload(['file1.txt', 'file2.jpg'],
                metadata={'title': 'My New Files'})

    # Modify metadata
    item.modify_metadata({'subject': ['history', 'film']})

    # List files (get_files() yields File objects; item.files holds plain dictionaries)
    for f in item.get_files():
        print(f.name, f.format)

For complete item documentation, see :ref:`item-module`.

Searching for Items
-------------------

.. code-block:: python

    from internetarchive import search_items

    # Basic search
    search = search_items('collection:opensource movies')

    # Iterate through results
    for result in search:
        print(f"{result['identifier']}: {result.get('title', 'No title')}")

    # Get specific fields
    search = search_items('subject:science',
                          fields=['identifier', 'title', 'date'])
    for result in search:
        print(result)

For complete search documentation, see :ref:`search-module`.
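
Because each search result is a plain dictionary, the requested fields can be written straight to a tabular format. The following is a minimal sketch rather than part of the library: ``results_to_csv`` is a hypothetical helper built only on the standard library, and the sample rows are stand-ins for what :func:`search_items` would return.

```python
import csv
import io


def results_to_csv(results, fields):
    """Render an iterable of result dictionaries as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        restval="",             # blank cell when a result lacks a field
        extrasaction="ignore",  # drop keys that were not requested
        lineterminator="\n",
    )
    writer.writeheader()
    for result in results:
        writer.writerow(result)
    return buffer.getvalue()


# Stand-in data shaped like search results; real results come from search_items().
sample = [
    {"identifier": "example-item-001", "title": "First Example", "date": "1905"},
    {"identifier": "example-item-002", "title": "Second Example"},
]
print(results_to_csv(sample, ["identifier", "title", "date"]))
```

Results that lack a requested field (``date`` in the second row here) produce an empty cell instead of raising an error.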

Common Patterns
---------------

**Download all files from multiple items:**

.. code-block:: python

    from internetarchive import get_item

    identifiers = ['TripDown1905', 'goodytwoshoes00newyiala']

    for identifier in identifiers:
        item = get_item(identifier)
        item.download()
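
When downloading many items, one failure should usually not abort the whole run. The sketch below shows one way to collect failures and continue. ``download_many`` and ``fetch_with_library`` are hypothetical names, not library functions, and which exceptions a download actually raises is an assumption that depends on your setup.

```python
def download_many(identifiers, fetch):
    """Call fetch(identifier) for each identifier and collect failures.

    Returns a dictionary mapping each failed identifier to its error message.
    """
    failures = {}
    for identifier in identifiers:
        try:
            fetch(identifier)
        except Exception as exc:  # broad on purpose, so the run keeps going
            failures[identifier] = str(exc)
    return failures


def fetch_with_library(identifier):
    """Download one item with the library (imported lazily, only when called)."""
    from internetarchive import download

    download(identifier)
```

A call such as ``download_many(['TripDown1905', 'goodytwoshoes00newyiala'], fetch_with_library)`` returns an empty dictionary when every download succeeds. Passing the fetch function as an argument also makes the loop easy to test without network access.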

**Upload with custom metadata:**

.. code-block:: python

    from internetarchive import upload

    upload(
        'my-new-item-001',
        files=['document.pdf', 'cover.jpg'],
        metadata={
            'title': 'My Document',
            'mediatype': 'texts',
            'collection': 'opensource',
            'subject': ['documentation', 'tutorial']
        }
    )

**Search and process results:**

.. code-block:: python

    from internetarchive import search_items

    # Search with pagination
    search = search_items(
        'collection:prelinger',
        params={'rows': 50, 'page': 1}
    )

    # Collect identifiers
    identifiers = [result['identifier'] for result in search]

    # Process only the first 10 identifiers
    for identifier in identifiers[:10]:
        print(f"Processing {identifier}")
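
To process results in fixed-size batches rather than only a leading slice, a small generator is enough. This is a sketch using only the standard library: ``in_batches`` is a hypothetical helper, and the identifiers below are placeholders for values collected from a search.

```python
from itertools import islice


def in_batches(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Placeholder identifiers; in practice these would come from search_items().
identifiers = [f"example-item-{n:03d}" for n in range(7)]

for number, batch in enumerate(in_batches(identifiers, 3), start=1):
    print(f"Batch {number}: {batch}")
```

The helper consumes its input lazily, so it works with any iterable and does not require the full list of identifiers to be built first.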

Configuration
-------------

The library needs your archive.org credentials for certain operations (uploading,
modifying metadata, etc.). You can configure it in several ways:

1. **Config file** (recommended): Use ``ia configure`` from the CLI or :func:`~internetarchive.api.configure` from Python
2. **Environment variables**: Set ``IA_ACCESS_KEY_ID`` and ``IA_SECRET_ACCESS_KEY``
3. **Python dictionary**: Pass credentials directly when creating a session

See :ref:`configuration` for complete configuration details.
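
If you prefer to read the environment variables yourself and pass the credentials explicitly, options 2 and 3 can be combined. The following is a minimal sketch under that assumption. ``s3_config_from_env`` and ``session_from_env`` are hypothetical helpers, and the dictionary layout mirrors the ``s3`` example shown under "Using Sessions".

```python
import os


def s3_config_from_env(environ=None):
    """Build a session config dictionary from environment variables.

    Returns None when either credential is missing or empty.
    """
    environ = os.environ if environ is None else environ
    access = environ.get("IA_ACCESS_KEY_ID")
    secret = environ.get("IA_SECRET_ACCESS_KEY")
    if not access or not secret:
        return None
    return {"s3": {"access": access, "secret": secret}}


def session_from_env():
    """Create a session from environment credentials (library imported lazily)."""
    from internetarchive import get_session

    config = s3_config_from_env()
    if config is None:
        raise RuntimeError("Set IA_ACCESS_KEY_ID and IA_SECRET_ACCESS_KEY first")
    return get_session(config=config)
```

Failing early with a clear message when a variable is missing avoids a less obvious authentication error later, for example partway through an upload.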

Next Steps
----------

For complete documentation of all modules, classes, and methods, see :ref:`modules`.

For troubleshooting and advanced usage, check the examples in the
`GitHub repository <https://github.com/jjjake/internetarchive>`_.