<pre>Internet Engineering Task Force (IETF)                      J. Falk, Ed.
Request for Comments: 6590                                   Return Path
Category: Standards Track                              M. Kucherawy, Ed.
ISSN: 2070-1721                                                Cloudmark
                                                              April 2012
<span class="h1">Redaction of Potentially Sensitive Data from Mail Abuse Reports</span>
Abstract
Email messages often contain information that might be considered
private or sensitive, per either regulation or social norms. When
such a message becomes the subject of a report intended to be shared
with other entities, the report generator may wish to redact or elide
the sensitive portions of the message. This memo suggests one method
for doing so effectively.
Status of This Memo
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in <a href="./rfc5741#section-2">Section 2 of RFC 5741</a>.
Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
<a href="http://www.rfc-editor.org/info/rfc6590">http://www.rfc-editor.org/info/rfc6590</a>.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to <a href="https://www.rfc-editor.org/bcp/bcp78">BCP 78</a> and the IETF Trust's Legal
Provisions Relating to IETF Documents
(<a href="http://trustee.ietf.org/license-info">http://trustee.ietf.org/license-info</a>) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
<span class="grey">Falk & Kucherawy Standards Track [Page 1]</span></pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-2" ></span>
<span class="grey"><a href="./rfc6590">RFC 6590</a> Redaction April 2012</span>
Table of Contents
<a href="#section-1">1</a>. Introduction ....................................................<a href="#page-2">2</a>
<a href="#section-2">2</a>. Key Words .......................................................<a href="#page-3">3</a>
<a href="#section-3">3</a>. Recommended Practice ............................................<a href="#page-3">3</a>
<a href="#section-4">4</a>. Transformation Mechanisms .......................................<a href="#page-4">4</a>
<a href="#section-5">5</a>. Security Considerations .........................................<a href="#page-5">5</a>
<a href="#section-5.1">5.1</a>. General ....................................................<a href="#page-5">5</a>
<a href="#section-5.2">5.2</a>. Digest Collisions ..........................................<a href="#page-5">5</a>
<a href="#section-5.3">5.3</a>. Information Not Redacted ...................................<a href="#page-5">5</a>
<a href="#section-6">6</a>. Privacy Considerations ..........................................<a href="#page-6">6</a>
<a href="#section-7">7</a>. References ......................................................<a href="#page-6">6</a>
<a href="#section-7.1">7.1</a>. Normative References .......................................<a href="#page-6">6</a>
<a href="#section-7.2">7.2</a>. Informative References .....................................<a href="#page-6">6</a>
<a href="#appendix-A">Appendix A</a>. Example ................................................<a href="#page-7">7</a>
<a href="#appendix-B">Appendix B</a>. Acknowledgements .......................................<a href="#page-8">8</a>
<span class="h2"><a class="selflink" id="section-1" href="#section-1">1</a>. Introduction</span>
The Abuse Reporting Format [<a href="#ref-ARF" title="&quot;An Extensible Format for Email Feedback Reports&quot;">ARF</a>] defines a message format for sending
reports of abuse in the messaging infrastructure, with an eye toward
automating both the generation and consumption of those reports.
For privacy reasons, it might be the policy of a report
generator to anonymize, or obscure, portions of the report that might
identify an end user who caused the report to be generated. This has
come to be known in feedback loop parlance as "redaction". Precisely
how this is done is unspecified in [<a href="#ref-ARF" title="&quot;An Extensible Format for Email Feedback Reports&quot;">ARF</a>], as it will generally be a
matter of local policy. That specification does admonish generators
against being overzealous with this practice, as obscuring too
much data makes the report non-actionable.
Previous redaction practices, such as replacing local-parts of
addresses with a uniform string like "xxxxxxxx", frustrated any kind
of prioritizing or grouping of reports. This memo presents a
practice for conducting redaction in a manner that allows a report
receiver to detect that two reports were caused by the same end user,
without revealing the identity of that user. That is, the report
receiver can use the redacted strings, such as obscured email
addresses, to determine that the corresponding unredacted strings were
identical, meaning the reports originally contained the same address.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-3" ></span>
Generally, it is assumed that the recipient-identifying fields of a
message, when copied into a report, are to be obscured to protect the
identity of the end user who submitted the complaint about the
message. However, it is also presumed that other data will be left
intact, and those data could be correlated against log files or other
resources to determine the intended recipient of the original
message.
<span class="h2"><a class="selflink" id="section-2" href="#section-2">2</a>. Key Words</span>
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [<a href="#ref-KEYWORDS" title="&quot;Key words for use in RFCs to Indicate Requirement Levels&quot;">KEYWORDS</a>].
<span class="h2"><a class="selflink" id="section-3" href="#section-3">3</a>. Recommended Practice</span>
When redaction of reports is desired, in order to enable a report
receiver to correlate reports that might refer to a common but
anonymous source, the report generator SHOULD use the following
practice:
1. Select a transformation mechanism (see <a href="#section-4">Section 4</a>) that is
consistent (i.e., the same input string produces the same output
each time) and reasonably collision-resistant (i.e., two
different inputs are unlikely to produce the same output).
2. Identify string(s) (such as local-parts of email addresses) in a
message that need to be redacted. Call these strings the
"private data".
3. For each piece of private data, apply the selected transformation
mechanism.
4. If the output of the transformation can contain bytes that are
not printable ASCII, or if the output can include characters not
appropriate to replace the private data directly, encode the
output with the base64 algorithm as defined in Section 4 of
[<a href="#ref-BASE64" title="&quot;The Base16, Base32, and Base64 Data Encodings&quot;">BASE64</a>], or some similar translation, to form a valid
replacement in the original context. For example, replacing a
local-part in an email address with transformation output
containing an "@" character (ASCII 0x40) or a space character
(ASCII 0x20) is not permitted by the specification for local-part
[<a href="#ref-SMTP" title="&quot;Simple Mail Transfer Protocol&quot;">SMTP</a>], so the transformation output needs to be encoded as
described.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-4" ></span>
5. Replace each instance of private data with the corresponding
(possibly encoded) transformation when generating the report.
Note that the replaced text could also be in a context that has
constraints, such as length limits that need to be observed.
This has the effect of obscuring the data (in a potentially
irreversible way) while still allowing the report recipient to
observe that numerous reports are about one particular end user.
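The five steps above could be sketched as follows. This is a minimal
illustration rather than a required implementation: it assumes HMAC
with SHA-256 as the transformation (this memo does not mandate one),
and the key value and the names redact_token and redact_address are
hypothetical.

```python
import base64
import hashlib
import hmac

# Hypothetical key; a real deployment would load a secret from local
# configuration rather than embed it in source code.
REDACTION_KEY = b"example-redaction-key"

# Length limit for a local-part, in octets, per the SMTP specification.
MAX_LOCAL_PART = 64


def redact_token(private_data, key=REDACTION_KEY):
    """Steps 3 and 4: transform the private data, then encode it."""
    digest = hmac.new(key, private_data.encode("utf-8"), hashlib.sha256).digest()
    # The raw digest is arbitrary bytes, so it is base64-encoded to make
    # it usable as a replacement string.
    return base64.b64encode(digest).decode("ascii")


def redact_address(address, key=REDACTION_KEY):
    """Step 5: replace the local-part of an address with its token."""
    local_part, separator, domain = address.rpartition("@")
    if not separator:
        raise ValueError("address has no '@' separator")
    token = redact_token(local_part, key)
    if len(token) > MAX_LOCAL_PART:
        raise ValueError("redacted local-part exceeds the length limit")
    return token + "@" + domain
```

Calling redact_address("bob@example.net") twice yields the same
string, which is the consistency property that step 1 asks for.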
Such detection enables the receiver to prioritize its reactions based
on problems that appear to be focused on specific end users that may
be under attack.
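On the receiving side, the grouping described above might look like
the following sketch. The function name and the sample addresses are
hypothetical; the tokens are placeholders, not real transformation
output.

```python
from collections import Counter


def count_reports_by_recipient(redacted_recipients):
    """Count reports per redacted address, most frequent first."""
    return Counter(redacted_recipients).most_common()


# Hypothetical redacted To: values taken from three received reports.
reports = [
    "tokenA@example.net",
    "tokenB@example.net",
    "tokenA@example.net",
]
ranking = count_reports_by_recipient(reports)
```

Here the first entry of ranking identifies the redacted address with
the most reports, without the receiver learning who that user is.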
<span class="h2"><a class="selflink" id="section-4" href="#section-4">4</a>. Transformation Mechanisms</span>
This memo does not specify a particular transformation mechanism as a
requirement. The interoperability that this memo seeks to provide is
enabled by the consistency of the transformation.
Dealing with the issue of the security of the transformation (i.e.,
frustrating attempts to reverse the transformation) is a matter of
local policy. A continuum of possible transformations exists, from
trivial ones such as rot13, CRC32, and base64, through strong
cryptographic encodings such as the Hashed Message Authentication
Code [<a href="#ref-HMAC" title="&quot;HMAC: Keyed-Hashing for Message Authentication&quot;">HMAC</a>] and even full encryption, or private transformations such
as mapping an email address to an internal customer number. An
operator wishing to perform report redaction needs to select a
consistent transformation that obscures the private data and is
resilient to attempts to extract the original data to the extent
required by local policy, keeping in mind that the environment in
which the transformation is operating is not a highly secure one.
See <a href="#section-5.3">Section 5.3</a> for further details of this issue.
An implementation MAY choose any transformation that has a reasonably
low likelihood of collision.
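To make the continuum concrete, the sketch below applies several of
the transformations named above to the same input. All of them are
consistent, but the trivial ones can be reversed by any report
receiver. The key and variable names are hypothetical, and SHA-256 is
an assumed choice of hash for the HMAC case.

```python
import base64
import codecs
import hashlib
import hmac
import zlib

private_data = "bob"
key = b"example-redaction-key"  # hypothetical secret, illustration only

# Trivial transformations: consistent, but reversible or easy to guess.
rot13 = codecs.encode(private_data, "rot_13")
b64 = base64.b64encode(private_data.encode("ascii")).decode("ascii")
crc = format(zlib.crc32(private_data.encode("ascii")), "08x")

# Keyed hash: consistent, and not directly reversible without the key.
mac = hmac.new(key, private_data.encode("ascii"), hashlib.sha256).hexdigest()

# rot13 and base64 can be undone by anyone who receives the report.
recovered_from_rot13 = codecs.decode(rot13, "rot_13")
recovered_from_b64 = base64.b64decode(b64).decode("ascii")
```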
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-5" ></span>
<span class="h2"><a class="selflink" id="section-5" href="#section-5">5</a>. Security Considerations</span>
<span class="h3"><a class="selflink" id="section-5.1" href="#section-5.1">5.1</a>. General</span>
General security issues with respect to these reports are found
in [<a href="#ref-ARF" title="&quot;An Extensible Format for Email Feedback Reports&quot;">ARF</a>].
<span class="h3"><a class="selflink" id="section-5.2" href="#section-5.2">5.2</a>. Digest Collisions</span>
Message digest collisions are a well-understood issue. Their
application here involves a report receiver improperly concluding
that two pieces of redacted information were originally the same when
in fact they are not. This can lead to a denial of service, where
the inadvertently improper application of complaint data causes
unjustified corrective action. Such cases are sufficiently unlikely
as to be of little concern.
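As a rough illustration of how unlikely such collisions are, the
birthday bound can be evaluated for a given output size. The sketch
assumes that transformation outputs behave like uniformly random
values, which is only an approximation; the function name and the
figure of 100,000 distinct inputs are hypothetical.

```python
import math


def collision_probability(distinct_inputs, output_bits):
    """Birthday-bound estimate that at least two inputs share an output."""
    pairs = distinct_inputs * (distinct_inputs - 1) / 2
    # 1 - exp(-pairs / 2**bits); expm1 keeps very small results accurate.
    return -math.expm1(-pairs / 2.0 ** output_bits)


p_32_bit = collision_probability(100_000, 32)    # e.g., a 32-bit checksum
p_160_bit = collision_probability(100_000, 160)  # e.g., a 160-bit digest
```

Under these assumptions the estimate is negligible for digest-sized
outputs, while a 32-bit output leaves far less margin at the same
volume.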
<span class="h3"><a class="selflink" id="section-5.3" href="#section-5.3">5.3</a>. Information Not Redacted</span>
Although the identity of the user causing a report to be generated
can be obscured using this mechanism, other properties of a message
(such as the Message-ID field) that are not redacted could be used to
recover the original data by locating them in the message logs of the
originating system or via other data correlation techniques. It is
incumbent on the report generator to anticipate and redact or
otherwise obscure such data, or accept that such recovery is possible
even from the very simplest kinds of feedback.
It is for this reason that the normative portions of this memo do not
include stronger assertions about cryptography used in the
transformation. Given the ultimate recoverability of the redacted
information, the cryptographic strength of the transformation is not
a critical security measure.
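The recovery described above requires nothing more than a log lookup.
The following sketch assumes a hypothetical sender-side delivery log
keyed by Message-ID; all names and values are illustrative, and the
angle brackets around each Message-ID are omitted for brevity.

```python
# Hypothetical sender-side log: Message-ID mapped to envelope recipient.
delivery_log = {
    "123456789@mailer.example.com": "bob@example.net",
    "987654321@mailer.example.com": "carol@example.net",
}

# Fields as they might appear in a redacted report. The To: value is a
# placeholder, not real transformation output.
report_fields = {
    "To": "REDACTED-TOKEN@example.net",
    "Message-ID": "123456789@mailer.example.com",
}


def recover_recipient(fields, log):
    """Return the logged recipient for the report's Message-ID, if any."""
    return log.get(fields["Message-ID"])


recovered = recover_recipient(report_fields, delivery_log)
```

No attack on the transformation is involved, which is why its
cryptographic strength matters little here.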
The process of redacting a feedback report satisfies a privacy
requirement established by local policy, and is not meant to provide
strong security properties.
[<a href="#ref-FBL-BCP" title="&quot;Complaint Feedback Loop Operational Recommendations&quot;">FBL-BCP</a>] and Section 8 of [<a href="#ref-ARF" title="&quot;An Extensible Format for Email Feedback Reports&quot;">ARF</a>] discuss topics related to
establishment of bilateral agreements between report producers and
consumers. The issues raised here are also things to be considered
when establishing such agreements.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-6" ></span>
<span class="h2"><a class="selflink" id="section-6" href="#section-6">6</a>. Privacy Considerations</span>
While the method of redaction described in this document may reduce
the likelihood of some types of private data leaking between
ADministrative Management Domains (ADMDs), it is extremely unlikely
that report generation software could ever be created to recognize
all of the different ways that private information could be expressed
through human written language. If further protections are required,
implementers may wish to consider establishing some sort of out-of-
band arrangements between the relevant entities, to contain private
data as much as possible.
<span class="h2"><a class="selflink" id="section-7" href="#section-7">7</a>. References</span>
<span class="h3"><a class="selflink" id="section-7.1" href="#section-7.1">7.1</a>. Normative References</span>
[<a id="ref-ARF">ARF</a>] Shafranovich, Y., Levine, J., and M. Kucherawy, "An
Extensible Format for Email Feedback Reports", <a href="./rfc5965">RFC 5965</a>,
August 2010.
[<a id="ref-BASE64">BASE64</a>] Josefsson, S., "The Base16, Base32, and Base64 Data
Encodings", <a href="./rfc4648">RFC 4648</a>, October 2006.
[<a id="ref-KEYWORDS">KEYWORDS</a>] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", <a href="https://www.rfc-editor.org/bcp/bcp14">BCP 14</a>, <a href="./rfc2119">RFC 2119</a>, March 1997.
<span class="h3"><a class="selflink" id="section-7.2" href="#section-7.2">7.2</a>. Informative References</span>
[<a id="ref-FBL-BCP">FBL-BCP</a>] Falk, J., Ed., "Complaint Feedback Loop Operational
Recommendations", <a href="./rfc6449">RFC 6449</a>, November 2011.
[<a id="ref-HMAC">HMAC</a>] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed-
Hashing for Message Authentication", <a href="./rfc2104">RFC 2104</a>,
February 1997.
[<a id="ref-SMTP">SMTP</a>] Klensin, J., "Simple Mail Transfer Protocol", <a href="./rfc5321">RFC 5321</a>,
October 2008.
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-7" ></span>
<span class="h2"><a class="selflink" id="appendix-A" href="#appendix-A">Appendix A</a>. Example</span>
Assume the following input message:
From: alice@example.com
To: bob@example.net
Subject: Make money fast!
Message-ID: &lt;123456789@mailer.example.com&gt;
Date: Thu, 17 Nov 2011 22:19:40 -0500
Want to make a lot of money really fast? Check it out!
http://www.example.com/scam/0xd0d0cafe
On receipt, bob@example.net reports this message as abusive through
whatever mechanism his mailbox provider has established. This causes
an [<a href="#ref-ARF" title="&quot;An Extensible Format for Email Feedback Reports&quot;">ARF</a>] message to be generated. However, example.net wishes to
obscure Bob's email address lest it be relayed to the offending
agent, which could lead to more trouble for Bob.
Thus, example.net plans to redact the local-part of the recipient
address in the To: field. Local policy and security requirements
suggest that the algorithm known as "H" (a hash of a key concatenated
with the data to be obscured) using SHA1 is adequate. The operator has
thus selected a redaction key of "potatoes", and the private data in
this case is the string "bob". The concatenation "potatoesbob" is
digested with SHA1 and then base64-encoded to the string
"rZ8cqXWGiKHzhz1MsFRGTysHia4=".
Therefore, when constructing the ARF message in response to Bob's
complaint, the following form of the received message is used in the
third part of the ARF report:
From: alice@example.com
To: rZ8cqXWGiKHzhz1MsFRGTysHia4=@example.net
Subject: Make money fast!
Message-ID: &lt;123456789@mailer.example.com&gt;
Date: Thu, 17 Nov 2011 22:19:40 -0500
Want to make a lot of money really fast? Check it out!
http://www.example.com/scam/0xd0d0cafe
Note, however, that the redacted information could still be recovered
by agents at example.com who search their logs for the original
envelope associated with the message, by correlating with the
Message-ID contents, which were not redacted here. It is
expected that feedback loops generating such reports involve senders
that have been vetted against such information leakage.
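The "H" construction used in this example could be sketched as
follows. The function name h_redact is hypothetical, and the sketch
assumes the key and data are ASCII strings concatenated directly, as
the description above suggests. The output is not reproduced here;
running the sketch allows it to be compared with the string quoted
above.

```python
import base64
import hashlib


def h_redact(key, private_data):
    """SHA-1 of the key concatenated with the data, base64-encoded."""
    digest = hashlib.sha1((key + private_data).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


token = h_redact("potatoes", "bob")
redacted_to = "To: " + token + "@example.net"
```

A plain hash of key plus data is the construction this example
describes; an operator wanting a keyed construction with stronger
properties might prefer HMAC instead.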
</pre>
<hr class='noprint'/><!--NewPage--><pre class='newpage'><span id="page-8" ></span>
<span class="h2"><a class="selflink" id="appendix-B" href="#appendix-B">Appendix B</a>. Acknowledgements</span>
Much of the text in this document was initially moved from other MARF
working group documents, with contributions from Monica Chew, Tim
Draegen, Michael Adkins, and other members of the Messaging Anti-
Abuse Working Group. Additional feedback was provided by John
Levine, S. Moonesamy, Alessandro Vesely, and Mykyta Yevstifeyev.
Authors' Addresses
J.D. Falk (editor)
Return Path
100 Mathilda Place, Suite 100
Sunnyvale, CA 94086
US
EMail: ietf@cybernothing.org
URI: <a href="http://www.returnpath.net/">http://www.returnpath.net/</a>
M. Kucherawy (editor)
Cloudmark
128 King St., 2nd Floor
San Francisco, CA 94107
US
EMail: msk@cloudmark.com
</pre>