<h2>Simple substitutions</h2>
<p>Here I interrupt myself with an en dash – no, now it’s with—an em dash.</p>
<p>And finally…wait for it, and again with spaces…I’ve tested ellipses…and also with even more spaces.</p>
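<p>A minimal Python sketch of how substitutions like these could be produced. The input shorthands (<code>--</code>, <code>---</code>, <code>...</code>) and the function name <code>simple_substitutions</code> are assumptions for illustration; the text above shows only the converted output.</p>

```python
import re


def simple_substitutions(text: str) -> str:
    """Replace ASCII dash and ellipsis shorthands with typographic characters.

    Hypothetical helper: the shorthands handled here are assumed, not taken
    from the document.
    """
    # Longest shorthand first, so "---" is not consumed as "--" plus "-".
    text = text.replace("---", "\u2014")  # em dash
    text = text.replace("--", "\u2013")   # en dash
    # Three dots, either adjacent or separated by single spaces (". . .").
    text = re.sub(r"\.\.\.|\. \. \.", "\u2026", text)
    return text


print(simple_substitutions("an en dash -- no, now it's with---an em dash"))
print(simple_substitutions("wait for it...and again . . . with spaces"))
```

<p>Replacing the longer dash shorthand first is the one design choice that matters here: in the opposite order, three hyphens would become an en dash followed by a stray hyphen.</p>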
<h2>Escapes</h2>
<p>Before getting to the hard stuff, I’ll run through all the escape sequences — they shouldn’t need to become HTML entities.</p>
<pre><code>\\ \" \' \` \- \. \>
</code></pre>
<p>The “smarty-pants” extra adds escapes for 'single quotes' and "double
quotes" in case you want to force dumb quotes.</p>
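<p>One way to honor such escapes, sketched in Python under stated assumptions: escaped quotes are hidden behind placeholder characters before any curling happens, then restored as plain quotes. The function name <code>convert_with_escapes</code>, the placeholder code points, and the deliberately naive curling rule are all hypothetical.</p>

```python
import re


def convert_with_escapes(text: str) -> str:
    """Curl straight quotes, except those preceded by a backslash."""
    # Hide escaped quotes behind private-use placeholders (an arbitrary,
    # assumed choice) so the curling rules below never see them.
    text = text.replace("\\'", "\ue000").replace('\\"', "\ue001")
    # Naive curling: a quote at the start of a word opens, any other closes.
    text = re.sub(r'(?<!\S)"(?=\w)', "\u201c", text)
    text = text.replace('"', "\u201d")
    text = re.sub(r"(?<!\S)'(?=\w)", "\u2018", text)
    text = text.replace("'", "\u2019")
    # Restore the escaped quotes as plain characters, without the backslash.
    return text.replace("\ue000", "'").replace("\ue001", '"')


print(convert_with_escapes('\\"dumb\\" and "smart"'))
```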
<h2>Quotation marks</h2>
<p>You’ll notice that I began this document with a quotation to test a potential error: $ is zero-width and \s is one-width, and you can’t have both in a lookbehind. Meanwhile, I’ve used this paragraph to test contractions four times; ’tis close, but this last apostrophe should fool the regex.</p>
<p>“This text” tests to see whether an adjacent &lt;p&gt; tag messes up detection of quotation marks.</p>
<p>The docs say, “You can open and close quotations with quotation marks, and they don’t both have to be single or double.” So ‘this” works. And “this.’ And finally, ‘this.’</p>
<p>Most of the corrections are consistent with what a word processor might do when autoformatting:</p>
<ul>
<li>When a single- or double-prime falls between text and whitespace, it opens facing the text.</li>
<li>Edge case: in “British grammar”, quotations are closed just before punctuation, so a closing quotation mark may be followed not by whitespace but by one of ,;.?!</li>
<li>Other edge cases: nested quotation marks, or perhaps an apostrophe (see directly above) neighboring a quotation mark. The only “easy” solution is to have such quotation marks adjust to actual text, or if they’re only neighbored by whitespace and/or quotation marks, wait for those quotation marks to pick a direction, and then match it. Ick!</li>
<li>Other edge cases: opening or closing quotations just within parentheses or brackets of some kind, generally in code, etc. Transformations here are <strong>not</strong> supported because said transformations are only meant to apply to plain English or other natural language; trying to satisfy such edge cases would lead to a slippery slope and bloat.</li>
</ul>
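<p>The first two rules in the list above can be sketched as a pair of regular expressions in Python. This is an illustration only: the names <code>OPEN</code>, <code>CLOSE</code>, and <code>curl_quotes</code> are hypothetical, and nested quotation marks and quotes just inside brackets are left untouched, as in the list.</p>

```python
import re

# Straight quote -> curly replacement, by direction.
OPEN = {"'": "\u2018", '"': "\u201c"}
CLOSE = {"'": "\u2019", '"': "\u201d"}


def curl_quotes(text: str) -> str:
    """Curl quotes that sit between whitespace and text."""
    # Opening: whitespace (or start of text) on the left, text on the right.
    text = re.sub(r"""(?<!\S)(['"])(?=\S)""",
                  lambda m: OPEN[m.group(1)], text)
    # Closing: text on the left; whitespace, end of text, or one of ,;.?!
    # on the right (the "British grammar" case).
    text = re.sub(r"""(?<=\S)(['"])(?=\s|$|[,;.?!])""",
                  lambda m: CLOSE[m.group(1)], text)
    return text


print(curl_quotes('She said, "hello".'))
print(curl_quotes("So 'this\" works."))
```

<p>Because direction is decided per mark rather than per pair, a quotation may open with a single mark and close with a double one, which matches the mixed examples earlier in this section.</p>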
<h3>Edge case: contractions</h3>
<p>A single-prime can be surrounded by text, in which case it becomes an apostrophe and opens left.</p>
<p>For common contractions, a space single-prime non-space combination should produce an apostrophe (&#8217;) instead of an opening scare quote (&#8216;).</p>
<p>Here is the full list: ’tis, ’twas, ’twer, ’neath, ’o, ’n, ’round, ’bout, ’twixt, ’nuff, ’fraid, ’sup <br />
The full list, capitalized: ’Tis, ’Twas, ’Twer, ’Neath, ’O, ’N, ’Round, ’Bout, ’Twixt, ’Nuff, ’Fraid, ’Sup <br />
And normal text: ‘random ‘stuff ‘that ‘shouldn’t ‘be ‘detected ‘as ‘contractions <br />
And years: ’29 ’91 ‘1942 ‘2001 ‘2010</p>
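<p>A minimal Python sketch of the contraction rule, using the word list above. The names <code>CONTRACTIONS</code> and <code>curl_leading_contractions</code> are hypothetical, and the trailing word boundary is an assumption of this sketch rather than something the text specifies.</p>

```python
import re

# Word list copied from the paragraph above.
CONTRACTIONS = ["tis", "twas", "twer", "neath", "o", "n", "round",
                "bout", "twixt", "nuff", "fraid", "sup"]

# A straight single quote after whitespace (or at the start of the text),
# directly followed by one of the listed words. Case is ignored so the
# capitalized forms match too. The \b is this sketch's own assumption.
_LEADING = re.compile(
    r"(?<!\S)'(?=(?:%s)\b)" % "|".join(CONTRACTIONS), re.IGNORECASE)


def curl_leading_contractions(text: str) -> str:
    """Turn the leading quote of a listed contraction into an apostrophe."""
    return _LEADING.sub("\u2019", text)


print(curl_leading_contractions("'tis 'Twas 'random"))
```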
<p>Like quotation marks, the year shorthand expects a year, e.g. '29, to be followed by whitespace or sentence-ending punctuation. Numbers like '456.7 will throw it off, but those aren’t entered very often.</p>
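<p>The year rule could be sketched in Python as follows. The two-digit restriction is inferred from the examples above, where only the two-digit years receive an apostrophe; the set of accepted trailing punctuation and the name <code>curl_year_shorthand</code> are assumptions.</p>

```python
import re

# A straight single quote after whitespace (or at the start of the text),
# followed by exactly two digits and then whitespace, end of text, or
# sentence-ending punctuation (assumed here to be . ? or !).
_YEAR = re.compile(r"(?<!\S)'(?=\d\d(?:\s|$|[.?!]))")


def curl_year_shorthand(text: str) -> str:
    """Turn the quote in front of a two-digit year into an apostrophe."""
    return _YEAR.sub("\u2019", text)


print(curl_year_shorthand("the crash of '29 and the summer of '91."))
```

<p>In this sketch a decimal such as '45.7 would be misread as a year followed by a period, which is the kind of input the paragraph above warns about.</p>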
<p>These transformations don’t consider whether or not the contraction was preceded by whitespace. If it was preceded by text, then it would have been converted by the standard contraction rule (see the first line of this section).</p>