# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

A Python library and `ia` CLI for interacting with archive.org. It supports uploading, downloading, searching, and managing items and their metadata, and also provides catalog task management and account administration utilities. Each item is identified by a unique identifier and contains files and metadata.

## Common Commands

```bash
# Install for development
pip install -e .

# Run tests
pytest

# Lint, then run tests (pytest runs only if ruff passes)
ruff check && pytest

# Run a single test file
pytest tests/test_api.py

# Run a specific test
pytest tests/test_api.py::test_get_item

# Multi-version testing (requires Python 3.9-3.14 installed)
tox

# Lint only
ruff check

# Build docs
pip install -r docs/requirements.txt
cd docs && make html
```

## Architecture

The library has a three-layer architecture:

**Layer 1 - Public API (`internetarchive/api.py`)**
Convenience functions that wrap the core classes: `get_item()`, `search_items()`, `upload()`, `download()`, `modify_metadata()`, `delete()`, `configure()`, `get_session()`.
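
As a rough illustration of Layer 1, the sketch below combines `search_items()` and `get_item()`. It assumes the package is installed and archive.org is reachable. The helper names `first_n` and `titles_for_query` are hypothetical and not part of the library. The import is deferred into the function so the pure helper can be used offline.

```python
from itertools import islice
from typing import Iterable, Iterator


def first_n(results: Iterable[dict], n: int) -> Iterator[str]:
    """Yield the identifiers of the first n search results."""
    for result in islice(results, n):
        yield result["identifier"]


def titles_for_query(query: str, limit: int = 3) -> dict:
    """Map identifier -> title for the first few items matching a query.

    Performs network requests against archive.org.
    """
    # Deferred import: keeps first_n() usable without the package installed.
    from internetarchive import get_item, search_items

    titles = {}
    for identifier in first_n(search_items(query), limit):
        item = get_item(identifier)
        titles[identifier] = item.metadata.get("title")
    return titles
```

A call such as `titles_for_query("collection:nasa")` would issue one search plus one metadata request per result; the query string is only an example.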

**Layer 2 - Core Classes**
- `ArchiveSession` (`session.py`) - Extends `requests.Session`. Manages config, credentials, HTTP headers, connection pooling.
- `Item` (`item.py`) - Represents an Archive.org item. Contains files, metadata, and methods for download/upload/modify.
- `File` (`files.py`) - Represents a single file within an item. Handles download, delete, checksum verification.
- `Search` (`search.py`) - Query interface with pagination and field selection.
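
The relationship between these classes can be sketched as follows: a session produces an `Item`, whose `files` attribute carries the raw per-file metadata. The helpers `total_bytes` and `item_size` are hypothetical names for illustration. The sketch assumes file sizes arrive as strings and may be absent for some entries.

```python
def total_bytes(file_entries):
    """Sum the sizes of raw file-metadata dicts; a missing size counts as 0."""
    return sum(int(entry.get("size", 0)) for entry in file_entries)


def item_size(identifier):
    """Return the combined size in bytes of an item's files (network call)."""
    # Deferred import so total_bytes() works without the package installed.
    from internetarchive import get_session

    session = get_session()              # ArchiveSession
    item = session.get_item(identifier)  # Item bound to that session
    return total_bytes(item.files)       # item.files: list of metadata dicts
```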

**Layer 3 - Supporting Modules**
- `config.py` - INI-based configuration (credentials at `~/.config/internetarchive/ia.ini` or `~/.ia`)
- `iarequest.py` - HTTP request builders (`MetadataRequest`, `S3Request`)
- `auth.py` - S3 authentication handlers
- `catalog.py` - Catalog task management
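
For orientation, here is a minimal standard-library sketch of reading credentials from an INI file at the locations listed above. It assumes an `[s3]` section with `access` and `secret` keys. That layout, the lookup order, and the names `parse_s3_keys` and `read_s3_keys` are assumptions for illustration, and `config.py` may behave differently.

```python
import configparser
from pathlib import Path
from typing import Optional, Tuple

# Candidate locations from the list above, checked in this (assumed) order.
CANDIDATES = (
    Path.home() / ".config" / "internetarchive" / "ia.ini",
    Path.home() / ".ia",
)


def parse_s3_keys(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract (access, secret) from INI text; missing values become None."""
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    return (
        parser.get("s3", "access", fallback=None),
        parser.get("s3", "secret", fallback=None),
    )


def read_s3_keys() -> Tuple[Optional[str], Optional[str]]:
    """Return keys from the first existing candidate file, else (None, None)."""
    for path in CANDIDATES:
        if path.is_file():
            return parse_s3_keys(path.read_text())
    return (None, None)
```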

**CLI (`internetarchive/cli/`)**
- Entry point: `ia.py:main()` → registered as `ia` console script
- Subcommands: `ia_download.py`, `ia_upload.py`, `ia_metadata.py`, `ia_search.py`, `ia_list.py`, `ia_delete.py`, `ia_copy.py`, `ia_move.py`, `ia_tasks.py`, `ia_configure.py`, etc.
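
The entry-point-plus-subcommands layout can be pictured with a generic `argparse` sketch. This is not the project's actual implementation: the arguments, the placeholder handlers, and the assumption that each `ia_*.py` module contributes one subcommand are illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a top-level parser with one sub-parser per subcommand."""
    parser = argparse.ArgumentParser(prog="ia")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each subcommand registers its own arguments and a handler function.
    download = subparsers.add_parser("download", help="download files from an item")
    download.add_argument("identifier")
    download.set_defaults(func=lambda args: f"would download {args.identifier}")

    search = subparsers.add_parser("search", help="search for items")
    search.add_argument("query")
    search.set_defaults(func=lambda args: f"would search for {args.query}")
    return parser


def main(argv=None) -> str:
    """Parse argv and dispatch to the selected subcommand's handler."""
    args = build_parser().parse_args(argv)
    return args.func(args)
```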

## Code Style

- Line length: 90 characters
- Linter: ruff (configured in `pyproject.toml`)
- Formatter: black
- Type checking: mypy (type stubs in `options.extras_require` under `types`)

## Key Dependencies

- `requests` - HTTP client
- `jsonpatch` - JSON patching for metadata updates
- `tqdm` - Progress bars
- `responses` - HTTP mocking for tests
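
To make the role of JSON patching concrete, the sketch below builds RFC 6902 operations that turn one flat metadata dict into another. It is a simplified standard-library stand-in, not the `jsonpatch` API: it handles only top-level keys, and `flat_patch` is a hypothetical name.

```python
def _pointer(key: str) -> str:
    """Encode a key as a JSON Pointer path segment (RFC 6901 escaping)."""
    return "/" + key.replace("~", "~0").replace("/", "~1")


def flat_patch(source: dict, target: dict) -> list:
    """Build RFC 6902 operations turning source into target (top-level keys only)."""
    ops = []
    # Keys present only in the source are removed.
    for key in sorted(source.keys() - target.keys()):
        ops.append({"op": "remove", "path": _pointer(key)})
    # Keys new to the target are added; changed values are replaced.
    for key in sorted(target):
        if key not in source:
            ops.append({"op": "add", "path": _pointer(key), "value": target[key]})
        elif source[key] != target[key]:
            ops.append({"op": "replace", "path": _pointer(key), "value": target[key]})
    return ops
```

Sending only the operations, rather than the full metadata document, keeps an update limited to the fields that actually changed.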

## Contributing Notes

- All new features should be developed on a feature branch, not directly on master
- PRs require tests and must pass ruff linting
- Avoid introducing new dependencies
- Support Python 3.9+

## Releasing

To release a new version (you must be on master with a clean working directory):

```bash
# 1. Prepare release (updates __version__.py and HISTORY.rst date)
make prepare-release RELEASE=X.Y.Z

# 2. Review and commit version changes
git diff
git add -A && git commit -m "Bump version to X.Y.Z"

# 3. Publish to PyPI + archive.org + GitHub
make publish-all
```

Individual release targets:
- `make publish` - PyPI + GitHub release (no binary)
- `make publish-all` - PyPI + pex binary + GitHub release
- `make publish-binary` - pex binary only (after PyPI release)

The release process will:
- Run tests and linting
- Build the package
- Build and test the pex binary
- Create and push a git tag
- Upload to PyPI
- Upload binary to archive.org
- Create a GitHub release with changelog from HISTORY.rst