File: README.md

package info (click to toggle)
url-normalize 2.2.1-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 268 kB
  • sloc: python: 935; makefile: 16; sh: 8
file content (151 lines) | stat: -rw-r--r-- 4,708 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
# url-normalize

[![tests](https://github.com/niksite/url-normalize/actions/workflows/ci.yml/badge.svg)](https://github.com/niksite/url-normalize/actions/workflows/ci.yml)
[![Coveralls](https://img.shields.io/coveralls/github/niksite/url-normalize/master.svg)](https://coveralls.io/r/niksite/url-normalize)
[![PyPI](https://img.shields.io/pypi/v/url-normalize.svg)](https://pypi.org/project/url-normalize/)

A Python library for standardizing and normalizing URLs with support for internationalized domain names (IDN).

## Table of Contents

- [Introduction](#introduction)
- [Features](#features)
- [Installation](#installation)
- [Usage](#usage)
  - [Python API](#python-api)
  - [Command Line](#command-line-usage)
- [Documentation](#documentation)
- [Contributing](#contributing)
- [License](#license)

## Introduction

url-normalize provides a robust URI normalization function that:

- Takes care of IDN domains.
- Always provides the URI scheme in lowercase characters.
- Always provides the host, if any, in lowercase characters.
- Only performs percent-encoding where it is essential.
- Always uses uppercase A-through-F characters when percent-encoding.
- Prevents dot-segments appearing in non-relative URI paths.
- For schemes that define a default authority, uses an empty authority if the
  default is desired.
- For schemes that define an empty path to be equivalent to a path of "/",
  uses "/".
- For schemes that define a port, uses an empty port if the default is desired
- Ensures all portions of the URI are utf-8 encoded NFC from Unicode strings

Inspired by Sam Ruby's [urlnorm.py](http://intertwingly.net/blog/2004/08/04/Urlnorm)

## Features

- **IDN Support**: Full internationalized domain name handling
- **Configurable Defaults**:
  - Customizable default scheme (https by default)
  - Configurable default domain for absolute paths
- **Query Parameter Control**:
  - Parameter filtering with allowlists
  - Support for domain-specific parameter rules
- **Versatile URL Handling**:
  - Empty string URLs
  - Double slash URLs (//domain.tld)
  - Shebang (#!) URLs
- **Developer Friendly**:
  - Cross-version Python compatibility (3.8+)
  - 100% test coverage
  - Modern type hints and string handling

## Installation

```sh
pip install url-normalize
```

## Usage

### Python API

```python
from url_normalize import url_normalize

# Basic normalization (uses https by default)
print(url_normalize("www.foo.com:80/foo"))
# Output: https://www.foo.com/foo

# With custom default scheme
print(url_normalize("www.foo.com/foo", default_scheme="http"))
# Output: http://www.foo.com/foo

# With query parameter filtering enabled
print(url_normalize("www.google.com/search?q=test&utm_source=test", filter_params=True))
# Output: https://www.google.com/search?q=test

# With custom parameter allowlist as a dict
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist={"example.com": ["page", "id"]}
))
# Output: https://example.com?page=1&id=123

# With custom parameter allowlist as a list
print(url_normalize(
    "example.com?page=1&id=123&ref=test",
    filter_params=True,
    param_allowlist=["page", "id"]
))
# Output: https://example.com?page=1&id=123

# With default domain for absolute paths
print(url_normalize("/images/logo.png", default_domain="example.com"))
# Output: https://example.com/images/logo.png

# With default domain and custom scheme
print(url_normalize("/images/logo.png", default_scheme="http", default_domain="example.com"))
# Output: http://example.com/images/logo.png
```

### Command-line Usage

You can also use `url-normalize` from the command line:

```bash
$ url-normalize "www.foo.com:80/foo"
# Output: https://www.foo.com/foo

# With custom default scheme
$ url-normalize -s http "www.foo.com/foo"
# Output: http://www.foo.com/foo

# With query parameter filtering
$ url-normalize -f "www.google.com/search?q=test&utm_source=test"
# Output: https://www.google.com/search?q=test

# With custom allowlist
$ url-normalize -f -p page,id "example.com?page=1&id=123&ref=test"
# Output: https://example.com/?page=1&id=123

# With default domain for absolute paths
$ url-normalize -d example.com "/images/logo.png"
# Output: https://example.com/images/logo.png

# With default domain and custom scheme
$ url-normalize -d example.com -s http "/images/logo.png"
# Output: http://example.com/images/logo.png

# Via uv tool/uvx
$ uvx url-normalize www.foo.com:80/foo
# Output: https://www.foo.com:80/foo
```

## Documentation

For a complete history of changes, see [CHANGELOG.md](CHANGELOG.md).

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License