Here I interrupt myself with an en dash – no, now it’s with—an em dash.
And finally…wait for it, and again with spaces…I’ve tested ellipses…and also with even more spaces.
Before getting to the hard stuff, I’ll run through all the escape sequences — they shouldn’t need to become HTML entities.
\\ \" \' \` \- \. \>
The “smarty-pants” extra adds escapes for 'single quotes' and "double quotes" in case you want to force dumb quotes.
You’ll notice that I began this document with a quotation to test a potential error: $ is zero-width and \s is one-width, and you can’t have both in a backreference. Meanwhile, I’ve this paragraph has tested contractions four times; ’tis close, but this last apostrophe should fool the regex.
“This text” tests to see whether an adjacent <p> tag messes up detection of quotation marks.
The docs say, “You can open and close quotations with quotation marks, and they don’t both have to be single or double.” So ‘this” works. And “this.’ And finally, ‘this.’
Most of the corrections are consistent with what a word processor might do when autoformatting:
A single-prime can be surrounded by text, in which case it becomes an apostrophe and opens left.
For common contractions, a space single-prime non-space combination should produce an apostrophe (’) instead of an opening scare quote (‘).
Here is the full list: ’tis, ’twas, ’twer, ’neath, ’o, ’n, ’round, ’bout, ’twixt, ’nuff, ’fraid, ’sup
The full list, capitalized: ’Tis, ’Twas, ’Twer, ’Neath, ’O, ’N, ’Round, ’Bout, ’Twixt, ’Nuff, ’Fraid, ’Sup
And normal text: ‘random ‘stuff ‘that ‘shouldn’t ‘be ‘detected ‘as ‘contractions
And years: ’29 ’91 ‘1942 ‘2001 ‘2010
Like quotation marks, the year shorthand expects a year, e.g. '29, to be followed by whitespace or sentence-ending punctuation. Numbers like '456.7 will throw it off, but those aren’t entered very often.
These transformations don’t consider whether or not the contraction was preceded by whitespace. If it was preceded by text, then it would have been converted by the standard contraction rule (see the first line of this section).