1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202
|
Variant Conversion Info (VARCON)
Revision 4.1
August 10, 2004
Copyright 2000-2004 by Kevin Atkinson (kevina@gnu.org)
This package contains tables to convert between American, British, and
Canadian spellings and vocabulary as well as well as a table listing the
equivalent forms of other variants.
The latest version can be found at http://wordlist.sourceforge.net/.
The abbc.tab contains mappings between American, British with "ise"
spelling, British with "ize" spelling, and Canadian spellings. The
fields are separated by a tab character and have the Unix EOL
character. The first four columns are the spellings respectively.
The last column is used to mark words where the American or British
spelling is also used in the British or American spelling but only
when the word has a certain meaning. American words that are also
used in British / Canadian spellings are marked with a "A" while
British words that are also used in American / Canadian spellings are
marked with a "B".
The file voc.tab is like abbc.tab except that it converts between
vocabulary instead of spelling. If more than one word if often uses to
describe the same thing the words are separated with commas. The last
column contains additional notes on when the word is used. Unlike
abbc.tab it is generally not suitable for automatic conversion.
The file variant-also.tab contains additional mappings between various
spellings of a word which are not necessarily region dependent. Only
mappings that are not listed in abbc.tab are included in this mapping.
No attempt is made to distinguish the primary form of a word. The
file variant-infl.tab is like variant-also.tab except that it is
created automatically from the AGID inflection database. The file
variant-wroot.tab is like variant-infl.tab except that it also
included the root form of the word.
Both the translation array and variant table includes a lot of strange
affixations of words which I have no intention of removing as some
people might find them useful.
The "make-variant" Perl script will combine abbc.tab, variant-also.tab,
and variant-infl.tab into one huge mapping and will output the result
to "variant.tab". If the "no-infl" option is given than
variant-infl.tab will not be included.
The "split" script will create 5 words lists from abbc.tab:
american.lst, british.lst, british_z.lst, canadian.lst, and
common.lst. The common.lst file contains words that were marked in
the last column as explained above and the other four contain all the
possible words that might be used by that particular country, included
some of the words marked in the last column.
The "translate" Perl script will translate a text file from one
spelling to another. Its usage is:
translate <options> [<translation array>] <from> <to>
<options> is any of
-?,-h,--help this screen
-m,--mark mark words where the translation is questionable
-i,--include include words where the translation is questionable
<translation array> is the file name of the translation array,
defaults to "abbc.tab".
<from> and <to> are one of: american, british, british_z, or canadian.
british-ise and british-ize can also be used.
Text is read in from standard input and is outputted to standard out.
Words are marked with a '?' before and after the questionable word
when the option is enabled.
If you discover any errors in these mappings, besides the strange
affixations, or have suggestions for additions be sure and let me know
at kevina@gnu.org.
SOURCE:
These mappings were compiled from numerous sources.
The abc.tab was originally created from the American and British word
lists found in the Ispell distribution and the Canadian word list
created by Garst R. Reese <reese@isn.net>:
What I have discovered is that Canadian is a modification of British.
Canadians use ize ization, izing izable like Americans, and gram instead
of gramme. The one exception I found was practise. It does not go to
practize. Otherwise they use British spelling. So, what I am currently
checking books with is a an edited version of British, where I have
changed all occurrences of ise to ize, isab to izab, isation to ization,
ising to izing, and gramme to gram except I allow programme, which is
sometimes proper unless you are talking about a computer program. I did
bunches of greps to be sure these substitutions would work as expected.
Many other words have been added to abc.tab which were not in the
original Ispell word lists.
Many different web sources were consuled when crating the tables. They
include:
The American-British British-American Dictionary
http://www.peak.org/~jeremy/dictionary/dictionary.html
American and British Spelling Differences
http://www.peak.org/~jeremy/dictionary/spellcat.html
Dave (VE7CNV)'s Truly Canadian Dictionary of Canadian Spelling
http://www.luther.bc.ca/~dave7cnv/cdnspelling/cdnspelling.html
Canadian Spelling Convention
http://imej.wfu.edu/articles/1999/1/02/demo/tutorial/canas.html
Cornerstone's Canadian English Page
http://www.web.net/cornerstone/cdneng.htm
Inter-Play Translation: British/Canadian/American Spelling
http://www.inter-play.com/translation/spel-ukus.htm
Inter-Play Translation: British/Canadian/American Vocabulary
http://www.inter-play.com/translation/voc-ukus.htm
As well as several online dicionaries:
Marriam-Webster: http://www.m-w.com/
American Heritage: http://www.bartleby.com/61/
Cambridge (ESL): http://dictionary.cambridge.org/
CHANGELOG:
From Revision 4 to Revision 4.1 (August 10, 2004)
- Fixed various errors ib abbc.tab
- Removed clause 4 from the Ispell copyright with permission of Geoff
Kuenning.
From Revision 3 to Revision 4 (August 7, 2004)
- Added a column to "abc.tab" for the British "ize" spelling and
renamed the file to abbc.tab.
- Added verb forms of prize/prise to abbc.tab, removed from
variant-also.tab
From Revision 2 to Revision 3 (January 2, 2003)
- Added an option for not including variant-infl.tab for the
make-variant perl script
- Added the file variant-wroot.tab
- Added a few entries given to me by Francis Bond and Edward Betts
From Revision 1 to Revision 2 (January 27, 2001)
- Removed all "B" markers because I could not find any evidence for
them
- Corrected a few Canadian entries, especially those with the "B"
markers
- Added some more entries by trying fixed changes (such as ize to
ise) to words in SCOWL and hand-checking over the ones with semi-common
words in them.
- Added variant-infl.tab
COPYRIGHT:
Copyright 2000-2004 by Kevin Atkinson
Permission to use, copy, modify, distribute and sell this array, the
associated software, and its documentation for any purpose is hereby
granted without fee, provided that the above copyright notice appears
in all copies and that both that copyright notice and this permission
notice appear in supporting documentation. Kevin Atkinson makes no
representations about the suitability of this array for any
purpose. It is provided "as is" without express or implied warranty.
Since the original words lists come from the Ispell distribution:
Copyright 1993, Geoff Kuenning, Granada Hills, CA
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. All modifications to the source code must be clearly marked as
such. Binary redistributions based on modified source code
must be clearly marked as modified versions in the documentation
and/or other materials provided with the distribution.
(clause 4 removed with permission from Geoff Kuenning)
5. The name of Geoff Kuenning may not be used to endorse or promote
products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
|