# URL Parsing

This guide explains how to use `Protocol::HTTP::URL` for parsing and manipulating URL components, particularly query strings and parameters.

## Overview

{ruby Protocol::HTTP::URL} decodes query strings into Ruby hashes and arrays, encodes such structures back into query strings, and percent-escapes individual URL components.

While basic query parameter encoding follows the `application/x-www-form-urlencoded` standard, there is no universal standard for serializing complex nested structures (arrays, nested objects) in URLs. Different frameworks use varying conventions for these cases, and this implementation follows common patterns where possible.
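
To make that distinction concrete, the following sketch uses Ruby's standard library `URI` module rather than `Protocol::HTTP::URL`. It illustrates that the form-urlencoded rules only cover flat key/value pairs: bracketed keys such as `tags[]` are returned as literal strings, and any nesting convention has to be layered on top.

```ruby
require 'uri'

# Flat key/value pairs are covered by the form-urlencoded rules:
flat = URI.encode_www_form({"q" => "ruby gems", "page" => 2})
# flat is "q=ruby+gems&page=2"

# Bracketed keys are not interpreted; they come back as literal strings:
pairs = URI.decode_www_form("tags[]=ruby&tags[]=web")
# pairs is [["tags[]", "ruby"], ["tags[]", "web"]]
```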

## Basic Query Parameter Parsing

``` ruby
require 'protocol/http/url'
require 'protocol/http/reference'

# Parse query parameters from a URL:
reference = Protocol::HTTP::Reference.parse("/search?q=ruby&category=programming&page=2")
parameters = Protocol::HTTP::URL.decode(reference.query)
# => {"q" => "ruby", "category" => "programming", "page" => "2"}

# Symbolize keys for easier access:
parameters = Protocol::HTTP::URL.decode(reference.query, symbolize_keys: true)
# => {:q => "ruby", :category => "programming", :page => "2"}
```

## Complex Parameter Structures

Bracket notation in keys is decoded into arrays (`key[]`) and nested hashes (`key[name]`), and the two can be combined:

``` ruby
# Array parameters:
query = "tags[]=ruby&tags[]=programming&tags[]=web"
parameters = Protocol::HTTP::URL.decode(query)
# => {"tags" => ["ruby", "programming", "web"]}

# Nested hash parameters:
query = "user[name]=John&user[email]=john@example.com&user[preferences][theme]=dark"
parameters = Protocol::HTTP::URL.decode(query)
# => {"user" => {"name" => "John", "email" => "john@example.com", "preferences" => {"theme" => "dark"}}}

# Mixed structures:
query = "filters[categories][]=books&filters[categories][]=movies&filters[price][min]=10&filters[price][max]=100"
parameters = Protocol::HTTP::URL.decode(query)
# => {"filters" => {"categories" => ["books", "movies"], "price" => {"min" => "10", "max" => "100"}}}
```
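
As a rough illustration of how bracket notation can map onto nested structures, here is a simplified, standard-library-only sketch. It is not the library's implementation: `split_key`, `assign`, and `decode_nested` are hypothetical helpers, and the sketch only handles `[]` as the final component of a key.

```ruby
require 'uri'

# Hypothetical helper: split "user[preferences][theme]" into
# ["user", "preferences", "theme"]. A trailing "[]" yields an empty component.
def split_key(name)
	name.scan(/([^\[]+)|\[(.*?)\]/).flatten.compact
end

# Hypothetical helper: walk the key path, creating hashes as needed.
def assign(parameters, keys, value)
	node = parameters
	keys.each_with_index do |key, index|
		following = keys[index + 1]
		if following.nil?
			node[key] = value # Last component: store the value.
		elsif following.empty?
			(node[key] ||= []) << value # Trailing "[]": append to an array.
			break
		else
			node = (node[key] ||= {}) # Descend into a nested hash.
		end
	end
	parameters
end

# Hypothetical helper: decode a query string into nested hashes and arrays.
def decode_nested(query)
	URI.decode_www_form(query).each_with_object({}) do |(name, value), parameters|
		assign(parameters, split_key(name), value)
	end
end

result = decode_nested("filters[categories][]=books&filters[categories][]=movies&filters[price][min]=10")
# result is {"filters" => {"categories" => ["books", "movies"], "price" => {"min" => "10"}}}
```

The sketch keeps all values as strings, which matches the decoded examples above; converting `"10"` to an integer is left to the caller.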

## Encoding Parameters to Query Strings

``` ruby
# Simple parameters:
parameters = {"search" => "protocol-http", "limit" => "20"}
query = Protocol::HTTP::URL.encode(parameters)
# => "search=protocol-http&limit=20"

# Array parameters:
parameters = {"tags" => ["ruby", "http", "protocol"]}
query = Protocol::HTTP::URL.encode(parameters)
# => "tags[]=ruby&tags[]=http&tags[]=protocol"

# Nested parameters:
parameters = {
	user: {
		profile: {
			name: "Alice",
			settings: {
				notifications: true,
				theme: "light"
			}
		}
	}
}
query = Protocol::HTTP::URL.encode(parameters)
# => "user[profile][name]=Alice&user[profile][settings][notifications]=true&user[profile][settings][theme]=light"
```

## URL Escaping and Unescaping

``` ruby
# Escape special characters:
Protocol::HTTP::URL.escape("hello world!")
# => "hello%20world%21"

# Escape path components (preserves path separators):
Protocol::HTTP::URL.escape_path("/path/with spaces/file.html")
# => "/path/with%20spaces/file.html"

# Unescape percent-encoded strings:
Protocol::HTTP::URL.unescape("hello%20world%21")
# => "hello world!"

# Handle Unicode characters:
Protocol::HTTP::URL.escape("café")
# => "caf%C3%A9"

Protocol::HTTP::URL.unescape("caf%C3%A9")
# => "café"
```
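
The Unicode result above follows from percent-encoding operating on bytes: `é` is two bytes in UTF-8, so it becomes two escapes. The sketch below shows the idea with plain Ruby. `percent_encode` and `percent_decode` are hypothetical helpers, and the set of characters left unescaped is an assumption chosen for illustration, not necessarily the set the library uses.

```ruby
# Hypothetical helper: escape every byte outside a small unreserved set.
def percent_encode(string)
	string.b.gsub(/[^a-zA-Z0-9_.\-]/) {|byte| format("%%%02X", byte.ord)}
end

# Hypothetical helper: turn each %XX escape back into a byte, then
# reinterpret the bytes as UTF-8.
def percent_decode(string)
	string.b.gsub(/%(\h\h)/) {$1.hex.chr}.force_encoding(Encoding::UTF_8)
end

utf8_bytes = "café".bytes.map {|byte| format("%02X", byte)}
# utf8_bytes is ["63", "61", "66", "C3", "A9"]

encoded = percent_encode("café")
# encoded is "caf%C3%A9"

decoded = percent_decode(encoded)
# decoded is "café"
```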

## Scanning and Processing Query Strings

For custom processing, `scan` yields each key and value of a query string in order, so you can handle the pairs yourself:

``` ruby
query = "name=John&age=30&active=true"

Protocol::HTTP::URL.scan(query) do |key, value|
	puts "#{key}: #{value}"
end
# Output:
# name: John
# age: 30
# active: true
```
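
One use for pair-by-pair scanning is aggregating over repeated keys without building a full parameter hash. The sketch below imitates that pattern using only the standard library, so that it runs on its own. `each_query_pair` is a hypothetical stand-in, not the library's `scan`.

```ruby
require 'uri'

# Hypothetical stand-in for a scanner: yield each decoded key and value.
# A key without "=" yields a nil value.
def each_query_pair(query)
	query.split("&").each do |assignment|
		next if assignment.empty?
		
		key, value = assignment.split("=", 2)
		yield URI.decode_www_form_component(key), value && URI.decode_www_form_component(value)
	end
end

# Count how often each key occurs:
counts = Hash.new(0)
each_query_pair("tag=ruby&tag=http&page=2") {|key, _value| counts[key] += 1}
# counts is {"tag" => 2, "page" => 1}
```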

## Security and Limits

To guard against maliciously deep nesting, `decode` limits the number of nested key levels and raises `ArgumentError` when a key exceeds that limit:

``` ruby
# This will raise an error to prevent excessive nesting:
begin
	Protocol::HTTP::URL.decode("a[b][c][d][e][f][g][h][i]=value")
rescue ArgumentError => error
	puts error.message
	# Output:
	# Key length exceeded limit!
end

# You can adjust the maximum nesting level:
Protocol::HTTP::URL.decode("a[b][c]=value", 5)  # Allow up to 5 levels of nesting
```
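
The reason for such a limit is that every additional key level makes the decoder allocate another nested container, so an attacker-controlled key could otherwise force arbitrarily deep structures. A minimal sketch of this kind of guard follows. `check_depth!` is a hypothetical helper, and its default of 8 is an assumption chosen so that the nine-component key from the example above is rejected.

```ruby
# Hypothetical helper: count the components of a bracketed key and reject
# keys that have more than `maximum` of them.
def check_depth!(name, maximum = 8)
	depth = name.scan(/[^\[\]]+|\[\]/).size
	
	if depth > maximum
		raise ArgumentError, "Key #{name.inspect} has #{depth} components (limit: #{maximum})"
	end
	
	depth
end

depth = check_depth!("a[b][c]", 5)
# depth is 3

begin
	check_depth!("a[b][c][d][e][f][g][h][i]")
rescue ArgumentError => error
	rejected = error.message
end
# rejected holds the error message, because the key has nine components.
```

Checking the depth before any assignment means a rejected key never allocates nested containers.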