File: collapsed_forwarding.en.rst

package info (click to toggle)
trafficserver 9.2.5%2Bds-0%2Bdeb12u3
  • links: PTS, VCS
  • area: main
  • in suites: bookworm-proposed-updates
  • size: 64,964 kB
  • sloc: cpp: 345,958; ansic: 31,184; python: 25,297; sh: 7,023; makefile: 3,045; perl: 2,255; java: 277; pascal: 119; sql: 94; xml: 2
file content (188 lines) | stat: -rw-r--r-- 9,202 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

.. include:: ../../common.defs

.. _admin-plugins-collapsed-forwarding:

Collapsed Forwarding Plugin
***************************

This is a plugin for Apache Traffic Server that allows to achieve
effective connection collapse by blocking all but one of the multiple
concurrent requests for the same object from going to the Origin.

Installation
------------

To make this plugin available, you must either enable experimental plugins
when building |TS|::

    ./configure --enable-experimental-plugins

Or use :program:`tsxs` to compile the plugin against your current |TS| build.
To do this, you must ensure that:

#. Development packages for |TS| are installed.

#. The :program:`tsxs` binary is in your path.

#. The version of this plugin you are building, and the version of |TS| against
   which you are building it are compatible.

Once those conditions are satisfied, enter the source directory for the plugin
and perform the following::

    make -f Makefile.tsxs
    make -f Makefile.tsxs install

Using the plugin
----------------

This plugin can function as a per remap plugin or a global plugin, and it takes two optional
arguments for specifying the delay between successive retries and a max
number of retries.

To activate the plugin in per remap mode, in :file:`remap.config`, simply append the
below to the specific remap line::

  @plugin=collapsed_forwarding.so @pparam=--delay=<delay> @pparam=--retries=<retries>

To activate the plugin globally, in :file:`plugin.config`, add the following line::

  collapsed_forwarding.so --delay=<delay> --retries=<retries>

If the plugin is enabled both globally and per remap, Traffic Server will issue an error on startup.

Functionality
-------------

Traffic Server plugin to allow collapsed forwarding of concurrent requests for
the same object. This plugin is based on open_write_fail_action feature, which
detects cache open write failure on a cache miss and returns a 502 error along
with a special @-header indicating the reason for 502 error. The plugin acts
on the error by using an internal redirect follow back to itself, essentially
blocking the request until a response arrives, at which point, relies on
read-while-writer feature to start downloading the object to all waiting
clients. The following config parameters are assumed to be set for this
plugin to work:

- :ts:cv:`proxy.config.http.cache.open_write_fail_action`        ``1``
- :ts:cv:`proxy.config.cache.enable_read_while_writer`           ``1``
- :ts:cv:`proxy.config.http.number_of_redirections`             ``10``
- :ts:cv:`proxy.config.http.redirect_use_orig_cache_key`         ``1``
- :ts:cv:`proxy.config.http.background_fill_active_timeout`      ``0``
- :ts:cv:`proxy.config.http.background_fill_completed_threshold` ``0``

Additionally, given that collapsed forwarding works based on cache write
lock failure detection, the plugin requires cache to be enabled and ready.
On a restart, Traffic Server typically takes a few seconds to initialize
the cache depending on the cache size and number of dirents. While the
cache is not ready yet, collapsed forwarding can not detect the write lock
contention and so can not work. The setting :ts:cv:`proxy.config.http.wait_for_cache`
may be enabled which allows blocking incoming connections from being
accepted until cache is ready.

Description
-----------
Traffic Server has been affected severely by the Thundering Herd problem caused
by its inability to do effective connection collapse of multiple concurrent
requests for the same segment. This is especially critical when Traffic Server
is used as a solution to use cases such as delivering a large scale video
live streaming. This problem results in a specific behavior where multiple
number of requests for the same file are leaked upstream to the Origin layer
choking the upstream bandwidth due to the duplicated large file downloads or
process intensive file at the Origin layer. This ultimately can cause
stability problems on the origin layer disrupting the overall network
performance.

Traffic Server supports several kind of connection collapse mechanisms including
Read-While-Writer (RWW), Stale-While-Revalidate (SWR) etc each very effective
dealing with a majority of the use cases that can result in the
Thundering herd problem.

For a large scale Video Streaming scenario, there's a combination of a
large number of revalidations (e.g. media playlists) and cache misses
(e.g. media segments) that occur for the same file. Traffic Server's
RWW works great in collapsing the concurrent requests in such a scenario,
however, as described in :ref:`admin-configuration-reducing-origin-requests`,
Traffic Server's implementation of RWW has a significant limitation, which
restricts its ability to invoke RWW only when the response headers are
already received. This means that any number of concurrent requests for
the same file that are received before the response headers arrive are
leaked upstream, which can result in a severe Thundering herd problem,
depending on the network latencies (which impact the TTFB for the
response headers) at a given instant of time.

To address this limitation, Traffic Server supports a few Cache tuning
solutions, such as Open Read Retry, and a new feature called
Open Write Fail action from 6.0. To understand how these approaches work,
it is important to understand the high level flow of how Traffic Server
handles a GET request.

On receiving a HTTP GET request, Traffic Server generates the cache key
(basically, a hash of the request URL) and looks up for the directory
entry (dirent) using the generated index. On a cache miss, the lookup
fails and Traffic Server then tries to just get a write lock for the
cache object and proceeds to the origin to download the object. On
the Other hand, if the lookup is successful, meaning, the dirent
exists for the generated cache key, Traffic Server tries to obtain
a read lock on the cache object to be able to serve it from the cache.
If the read lock is not successful (possibly, due to the fact that
the object's being written to at that same instant and the response
headers are not in the cache yet), Traffic Server then moves to the
next step of trying to obtain an exclusive write lock. If the write
lock is already held exclusively by another request (transaction), the
attempt fails and at this point Traffic Server simply disables the
cache on that transaction and downloads the object in a proxy-only
mode::

  1). Cache Lookup (lookup for the dirent using the request URL as cache key).
    1.1). If lookup fails (cache miss), goto (3).
    1.2). If lookup succeeds, try to obtain a read lock, goto (2).
  2). Open Cache Read (try to obtain read lock)
    2.1). If read lock succeeds, serve from cache, goto (4).
    2.2). If read lock fails, goto (3).
  3). Open Cache Write (try to obtain write lock).
    3.1). If write lock succeeds, download the object into cache and to the client in parallel
    3.2). If write lock fails, disable cache, and download to the client in a proxy-only mode.
  4). Done

As can be seen above, if a majority of concurrent requests arrive before
response headers are received, they hit (2.2) and (3.2) above. Open Read
Retry can help to repeat (2) after a configured delay on 2.2, thereby
increasing the chances for obtaining a read lock and being able to serve
from the cache.

However, the Open Read Retry can not help with the concurrent requests
that hit (1.1) above, jumping to (3) directly. Only one such request will
be able to obtain the exclusive write lock and all other requests are
leaked upstream. This is where, the recently developed Traffic Server
feature Open Write Fail Action will help. The feature detects the write
lock failure and can return a stale copy for a Cache Revalidation or a
5xx status code for a Cache Miss with a special internal header
<@Ats-Internal> that allows a TS plugin to take other special actions
depending on the use-case.

``collapsed_forwarding`` plugin catches that error in SEND_RESPONSE_HDR_HOOK
and performs an internal 3xx Redirect back to the same host, the configured
number of times with the configured amount of delay between consecutive
retries, allowing to be able to initiate RWW, whenever the response headers
are received for the request that was allowed to go to the Origin.


More details are available at :ref:`admin-configuration-reducing-origin-requests`