# HTTP Client
<!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
**Table of Contents**
- [HTTP Client](#http-client)
- [Problems](#problems)
- [REST Client](#rest-client)
- [Persistent connections](#persistent-connections)
- [Routing](#routing)
- [Inconsistencies](#inconsistencies)
- [Error Handling](#error-handling)
- [SSL Trust Stores](#ssl-trust-stores)
- [Proposal](#proposal)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Design](#design)
- [Classes](#classes)
- [Client](#client)
- [Connection Pool](#connection-pool)
- [Route](#route)
- [Service](#service)
- [Resolvers](#resolvers)
- [Session](#session)
- [Routing](#routing-1)
- [DNS SRV](#dns-srv)
- [Server List](#server-list)
- [Default Puppet Settings](#default-puppet-settings)
- [Generic HTTP(S) Requests](#generic-https-requests)
- [Puppetserver](#puppetserver)
<!-- markdown-toc end -->
## Problems
This is a proposal to restructure the HTTP client code in puppet to solve the
following problems.
### REST Client
It's difficult to use puppet as a library to call our own REST APIs because
puppet's HTTP code is coupled to the indirector. As a result, users have written
their own REST clients, but those clients don't behave the way our agent does:
they lack serialization and deserialization of rich data, `server_list` for high
availability, JSON to PSON content negotiation, and so on.
It would benefit the puppet ecosystem to have a REST client that's reusable by
more than the agent.
### Persistent connections
Persistent HTTP connections allow puppet to establish an HTTP(S) connection once
and reuse it for multiple HTTP requests. This avoids making a new TCP connection
and SSL handshake for each request. This is important for pluginsync, due to the
large number of individual GET requests. However, persistent connections are not
enabled by default, and must be opted into, as was recently done for `puppet
device` and `puppet plugin download`. More than likely, other applications
should be using persistent connections, but aren't.
### Routing
Puppet supports 3 ways of routing connections: DNS SRV records, server list, and
static puppet settings. However, some routing methods are not consistently
applied. For example, `puppet plugin download` and `puppet report upload` don't
observe server list.
Once a route has been determined, puppet stores the last used server and port in
Puppet's context system, but it's more of a hack than anything. As a result,
it's difficult to know how the last used server and port were set and when to
invalidate them.
### Inconsistencies
`Puppet::Network::HTTP::Connection` supports two ways of making GET and POST
requests, but they don't behave consistently when handling HTTP redirects, the
`Retry-After` header, server and proxy authentication, and exception handling.
### Error Handling
The `Puppet::Network::HttpPool` and related classes don't specify which
exceptions can be raised. Instead, they pass through whatever exceptions ruby
raises: everything from `SocketError` to `SystemCallError` to
`OpenSSL::SSL::SSLError` to `Net::ProtocolError` and `TimeoutError`. As a
result, it's hard for clients to build higher level abstractions.
### SSL Trust Stores
Puppet only trusts the puppet PKI when connecting to puppet infrastructure, but
needs to additionally trust the system cert store for requests like PMT and
downloading files from https sources. However, the current API doesn't allow the
caller to do that, which is why `Puppet::Util::HttpProxy#request_with_redirects`
duplicates the logic from `Puppet::Network::HTTP::Connection#request_with_redirects`.
## Proposal
In order to solve these problems, I propose creating an HTTP client in puppet
with the following goals:
### Goals
* Implement a REST client in puppet capable of serializing/deserializing puppet
objects like Catalog, Report, etc.
* Reuse the existing networking code as much as possible, such as
`Puppet::Network::HTTP::Pool`, but restructure it with a clear API.
* Always use persistent connections unless the caller explicitly opts out.
* Handle server resolution (via DNS SRV, etc) in a consistent way.
* Define an exception hierarchy for the API so that `Net::HTTP` specific
exceptions don't leak out.
* Make it possible to use the system trust store for a single HTTPS request.
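As an illustration of the exception-hierarchy goal, here is a minimal sketch of how low-level errors might be translated at the API boundary. The `SketchHTTP` module, its error classes, and the `wrap_errors` helper are hypothetical names chosen for this example, not part of the proposal.

```ruby
require 'net/http'
require 'openssl'
require 'socket'
require 'timeout'

# Hypothetical namespace used only for this sketch.
module SketchHTTP
  class Error < StandardError; end
  class ConnectionError < Error; end
  class ProtocolError < Error; end

  # Runs the block and re-raises low-level errors as SketchHTTP errors.
  # Ruby records the original exception as `cause` automatically.
  def self.wrap_errors
    yield
  rescue OpenSSL::SSL::SSLError, SocketError, SystemCallError, Timeout::Error => e
    raise ConnectionError, "connection failed: #{e.message}"
  rescue Net::ProtocolError => e
    raise ProtocolError, "protocol error: #{e.message}"
  end
end
```

With something like this in place, a caller would only need to rescue `SketchHTTP::Error`, while the original exception stays available through `cause` for logging.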
### Non-Goals
* Ruby's builtin `Net::HTTP` library is fairly buggy. However, we're not switching
away from it right now. We may in the future, but it's out of scope.
* Serialization of puppet domain objects requires pops, rich data and loaders.
As a result, creating a standalone puppet-http gem is out of scope.
## Design
### Classes

#### Client
Has a pool of persistent HTTP connections and creates HTTP sessions. Closes
persistent connections when its close method is called.
Has low-level HTTP methods, such as `get` and `post`, which take the path,
headers, and options, and allow the caller to stream the request and response
body. Returns a `Puppet::HTTP::Response` with the response code, etc.
#### Connection Pool
Maintains the pool of persistent `Net::HTTP` connections, keeping track of when
idle connections expire. The `with_connection` method takes a block, which
ensures borrowed connections are always returned to the pool.
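The borrow-and-return behavior described above could look roughly like the following sketch. `SketchPool`, its `keepalive:` option, and the factory block are assumptions made for illustration; a real pool would hold `Net::HTTP` objects and close expired ones rather than just dropping them.

```ruby
# Hypothetical pool used only for this sketch.
class SketchPool
  # keepalive: seconds an idle connection may be reused (assumed default).
  # factory: block that creates a new connection for a site.
  def initialize(keepalive: 4, &factory)
    @keepalive = keepalive
    @factory = factory
    @idle = Hash.new { |hash, site| hash[site] = [] }
  end

  # Borrows a connection, yields it, and returns it to the pool even if
  # the block raises.
  def with_connection(site)
    conn = borrow(site)
    begin
      yield conn
    ensure
      release(site, conn)
    end
  end

  def idle_count(site)
    @idle[site].size
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Drops expired idle connections, then reuses one or creates a new one.
  def borrow(site)
    @idle[site].reject! { |_conn, expires_at| expires_at <= now }
    entry = @idle[site].pop
    entry ? entry.first : @factory.call(site)
  end

  def release(site, conn)
    @idle[site].push([conn, now + @keepalive])
  end
end
```

The `ensure` clause is what guarantees the connection goes back to the pool. A production pool would likely also decline to reuse a connection after the block raised.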
#### Route
Defines a route to a REST service. Includes the API prefix, DNS SRV service name,
and puppet server and port settings for that service.
#### Service
Represents an instance of a puppet web service. Includes the URL used to connect
to the service, such as `https://puppet:8140/puppet/v3`. There are four
services: `ca`, `report`, `fileserver`, and the default `puppet`.
The `ca` and `report` services handle certs and reports, respectively. The
`fileserver` service handles puppet file metadata and content requests, such as
pluginsync and file resources with `source => 'puppet://'`. The `puppet` service
handles nodes, facts, and catalogs, and is also the fallback for the other three
services.
Each service is responsible for serializing/deserializing the HTTP entity into a
domain object. It uses the existing `Puppet::Network::Format` code to do so.
#### Resolvers
Each resolver represents a different strategy for resolving a service name into
a list of candidate servers and ports.
#### Session
Represents an HTTP session through which the caller connects to and accesses
services. Has a `Session#route_to` method to route to a web service based on the
requested service name and client configuration:
```ruby
client = Puppet::HTTP::Client.new
session = client.create_session
service = session.route_to(:ca)
cert = service.get_certificate('foo')
puts "Retrieved cert #{cert.subject.to_utf8} from #{service.url}"
```
The `Session#route_to(:ca)` method (above) returns an instance of
`Puppet::HTTP::Service::Ca`, which has methods appropriate for that type of
service. All services extend `Puppet::HTTP::Service`.
### Routing
If an explicit server and port are specified on the command line or
configuration, such as `puppet agent -t --server foo.example.com`, then the
`Session#route_to` method will always return a `Service` with that host and port.
Otherwise, the session will walk the list of resolvers in priority order:
* DNS SRV
* Server list
* Puppet server/port settings
If the `route_to` method attempts to connect to a service and the attempt raises
an exception, such as "connection refused", then the session will try the next
candidate service.
If the caller successfully uses a service, then the session will return the same
service the next time `route_to` is called.
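The fallback behavior described above can be sketched as follows. `SketchSession`, the resolver lambdas, and the probe block are hypothetical stand-ins. The sketch also simplifies one point: it remembers a service as soon as the probe succeeds, rather than after the caller has used it.

```ruby
# Hypothetical session used only for this sketch.
class SketchSession
  class NoRouteError < StandardError; end

  # resolvers: callables in priority order, each mapping a service name to
  #            an array of [host, port] candidates.
  # probe:     block that raises if a candidate cannot be reached.
  def initialize(resolvers, &probe)
    @resolvers = resolvers
    @probe = probe
    @resolved = {}
  end

  # Returns the URL of the first reachable candidate and remembers it.
  def route_to(name)
    @resolved[name] ||= resolve(name)
  end

  private

  def resolve(name)
    @resolvers.each do |resolver|
      resolver.call(name).each do |host, port|
        begin
          @probe.call(host, port)
          return "https://#{host}:#{port}"
        rescue SystemCallError, SocketError
          next # try the next candidate, then the next resolver
        end
      end
    end
    raise NoRouteError, "no reachable server for #{name}"
  end
end
```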
#### DNS SRV
The DNS SRV resolver performs an SRV lookup, and randomly selects one of the
targets based on the weight of each entry in the SRV record. A target with
weight 2 would be twice as likely to be chosen as a target with weight 1.
```ruby
client = Puppet::HTTP::Client.new(use_srv: true, srv_domain: 'puppet.example.com')
session = client.create_session
service = session.route_to(:ca)
# service.url is "https://compiler1.puppet.example.com:8140"
```
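For readers unfamiliar with weighted selection, here is a minimal sketch of the idea. The `pick_weighted` helper and the hash-based target format are assumptions for this example, and the sketch ignores SRV priority, which the text above does not cover.

```ruby
# Picks one target at random, with probability proportional to its weight.
# targets: array of hashes with a :weight key (hypothetical format).
def pick_weighted(targets, rng: Random.new)
  total = targets.sum { |t| t[:weight] }
  # With all weights at zero, fall back to a uniform choice.
  return targets.sample(random: rng) if total.zero?

  point = rng.rand(total) # integer in 0...total
  targets.each do |t|
    return t if point < t[:weight]
    point -= t[:weight]
  end
end
```

Each target owns a slice of the range `0...total` as wide as its weight, so a target with weight 2 covers twice as many values as a target with weight 1.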
#### Server List
The server list resolver selects the first available server using puppetserver's
simple status endpoint. This applies when routing requests to the `:puppet`
service, as well as to any service whose server and port are the same as those
of the `:puppet` service, for example when `:ca_server` and `:report_server`
have not been overridden.
```ruby
client = Puppet::HTTP::Client.new(server_list: ['compiler1', 'compiler2'])
session = client.create_session
service = session.route_to(:puppet)
# service.url is "https://compiler1:8140"
```
#### Puppet Settings
The resolver selects a route based on the puppet settings for that service:
| service | server setting | port setting |
|------------|----------------|--------------|
| ca | ca_server | ca_port |
| fileserver | server | serverport |
| report | report_server | report_port |
| puppet | server | serverport |
For example, `route_to(:report)` would use `Puppet[:report_server]` and
`Puppet[:report_port]`.
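The table above amounts to a simple lookup. In the sketch below, `SERVICE_SETTINGS` and `settings_route` are hypothetical names, and a plain Hash stands in for `Puppet[...]` so that the example runs on its own.

```ruby
# Maps each service to its [server setting, port setting] pair.
SERVICE_SETTINGS = {
  ca: [:ca_server, :ca_port],
  fileserver: [:server, :serverport],
  report: [:report_server, :report_port],
  puppet: [:server, :serverport]
}.freeze

# Returns [host, port] for a service, reading from a settings hash.
def settings_route(service, settings)
  server_key, port_key = SERVICE_SETTINGS.fetch(service)
  [settings.fetch(server_key), settings.fetch(port_key)]
end
```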
#### Example: CA Service Routing
There are some variations in how the different services are routed. Here is a
visual of how the CA service is routed. We have to preserve some [interesting behavior](https://github.com/puppetlabs/puppet/blob/master/lib/puppet/http/client.rb#L243-L249)
with this service, but otherwise the flow is similar to that of other services.

### Generic HTTP(S) Requests
Puppet agents support downloading file content from 3rd party file servers,
which reduces load on the compiler. The `Client` will provide a low-level API
for making `GET` requests for an arbitrary URL, and streaming the response body.
Puppet only trusts the puppet PKI for its REST requests. However, it should be
possible to additionally trust the system store when making HTTPS requests:
```ruby
client = Puppet::HTTP::Client.new
response = client.get("https://artifactory.example.com/java.tar.gz", options: { include_system_store: true })
response.read_body do |data|
  # bytesize is the chunk length; String#bytes would return an array of bytes
  puts "Read #{data.bytesize} bytes"
end
```
### Puppetserver
Puppet ruby code running in puppetserver sometimes makes outbound connections,
such as those made by the [puppetdb
terminus](https://github.com/puppetlabs/puppetdb/blob/6.5.0/puppet/lib/puppet/util/puppetdb/http.rb#L138),
PE classifier terminus, and ['http' report
processor](https://github.com/puppetlabs/puppet/blob/6.7.0/lib/puppet/reports/http.rb#L32).
Currently, puppetserver registers its own http client class, so that it can
perform the HTTP request using Apache HttpClient.
In order to preserve this capability, puppetserver should have a way of
overriding the `get` and `post` methods of `Puppet::HTTP::Client` to call the
Apache HttpClient instead.
One way might be to create an adapter that overrides Puppet's implementation and
delegates to [puppetserver's
client](https://github.com/puppetlabs/puppetserver/blob/f718994c0f32f8c697daa662ec4074e4596350fc/src/ruby/puppetserver-lib/puppet/server/http_client.rb#L23):
```ruby
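class Puppet::Server::HttpClientAdapter < Puppet::HTTP::Client
  def initialize(http_client)
    # Explicit empty parentheses keep http_client from being forwarded to
    # Puppet::HTTP::Client#initialize.
    super()
    @http_client = http_client
  end

  # Delegate to puppetserver's client instead of Net::HTTP.
  def get(url, headers={}, options={})
    @http_client.get(url, headers, options)
  end
  # etc
end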
```
And register it with puppet:
```ruby
Puppet.push_context(http_client: Puppet::Server::HttpClientAdapter.new(Puppet::Server::HttpClient.new))
```