UNOFFICIAL HTML-MODE PATCH FOR ISPELL 3.1.18/3.1.20
What does the patch do?
This patch adds an html-mode to ispell. In html-mode, a patched
copy of ispell ignores any mark-up tags and html entities in an
html document when spell-checking it. Text inside an 'alt'
attribute is, however, still checked.
What is Ispell anyway?
Ispell is a fast screen-oriented spelling checker that shows
you your errors in the context of the original file, and
suggests possible corrections when it can figure them out.
Compared to UNIX spell, it is faster and much easier to use.
Ispell can also handle languages other than English. [taken
from the ispell README file]
Where can I get the Ispell package?
Ispell 3.1 kits are available for anonymous ftp from a number
of sites. Check Archie for the string "ispell-3.1". The URL
for the master archive site (with IP numbers following) is:
ftp://ftp.cs.ucla.edu/pub/ispell-3.1/ispell-3.1.20.tar.gz
(131.179.240.10)
ftp://ftp.math.orst.edu/pub/ispell-3.1/ispell-3.1.20.tar.gz
(128.193.80.161)
The following European sites mirror ispell. If you can't find
the latest version there, it probably just hasn't been mirrored
yet:
ftp://ftp.th-darmstadt.de/pub/dicts/ispell/ispell-3.1.20.tar.gz
(130.83.55.75)
ftp://ftp.nl.net:/pub/textproc/ispell/ispell-3.1.20.tar.gz
(193.78.240.13)
ftp://ftp.ibp.fr:/pub/ispell/ispell-3.1.20.tar.gz
You can also locate ispell archive sites via the ispell home
page:
http://www.cs.ucla.edu/ficus-members/geoff/ispell.html
How do I install the patch?
You first need to get the source to ispell (the patch should
work with versions 3.1.18 and 3.1.20). Uncompress and untar
the ispell distribution, then cd into the ispell-3.1 directory
that it creates. Apply the patch with the following command:
patch < path_to_directory_containing_this_file/this_filename
You can then install ispell as normal (see the README file
included in the ispell distribution for details).
How do I use it?
The patched version of ispell should automatically enter
html-mode whenever checking a file with a .htm or .html
extension. You can also enter html-mode explicitly by using
the -h command line option (see the man page). To spell-check
a file with a .htm or .html extension without treating it as
html, use either the -t or the -n command line option.
Examples:
    ispell index.html      # html tags will be ignored
    ispell -h README       # html tags will be ignored
    ispell -n index.html   # html tags will be spell-checked
What do I do if I find any bugs?
If you find a bug that you believe is caused by the
html-mode patch, please send an email to
gtierney@nova.ucd.ie explaining what you think is wrong.
General ispell bug reports should be sent to
ispell-bugs@itcorp.com, unless they are related to Emacs, in
which case they should be sent to
ispell-el-bugs@itcorp.com.
Is there any warranty?
In a word: NO.
As the developer of ispell, Geoff Kuenning, states: THIS
SOFTWARE IS PROVIDED BY GEOFF KUENNING AND CONTRIBUTORS ``AS
IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
SHALL GEOFF KUENNING OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
Please note however that Geoff Kuenning has no responsibility
for this patch (except for writing much of the ispell code on
which it is based) so if anything goes wrong, you are not even
justified in bearing a grudge against him or any of the other
contributors. Geoff Kuenning should not be seen as endorsing
this patch in any way, shape or form.
Can I redistribute or change the code?
As far as I am concerned, you can do whatever you like with the
code. However, since the code is based on Geoff Kuenning's
code, you are bound by his redistribution and use
restrictions. These can be found at the start of any of the
source code files in the ispell distribution.
---------------C-U-T------H-E-R-E-----------------
-- (actually there's no need to cut, patch will --
-- ignore the above text anyway. --
--------------------------------------------------
*** correct.c.orig Thu Oct 12 22:04:06 1995
--- correct.c Tue Dec 16 22:50:00 1997
***************
*** 233,238 ****
--- 233,241 ----
int bufsize;
int ch;
+ /* line added by Gerry Tierney */
+ insidehtml = 0;
+
for (bufno = 0; bufno < contextsize; bufno++)
contextbufs[bufno][0] = '\0';
*** defmt.c.orig Thu Oct 12 22:04:06 1995
--- defmt.c Tue Dec 16 22:50:03 1997
***************
*** 160,165 ****
--- 160,166 ----
static int save_math_mode;
static char save_LaTeX_Mode;
+ /* parameters changed by Gerry Tierney to include the output file */
static char * skiptoword (bufp) /* Skip to beginning of a word */
char * bufp;
{
***************
*** 170,175 ****
--- 171,223 ----
|| (tflag && (math_mode & 1)))
)
{
+ /* Start of modifications by Gerry Tierney */
+ /* We first check for an end-quote character if we are checking
+ inside of an alt attribute. If we find one we ignore the
+ rest of the tag */
+ if (insidehtml == -1 && *bufp == '\"')
+ {
+ insidehtml = 0;
+ while (*bufp != '>' && *bufp != '\0')
+ bufp++;
+ if (*bufp == '\0')
+ insidehtml = 1;
+ }
+
+ /* If we are checking a html file we want to ignore any
+ HTML tags. These should start with a '<'
+ and end with a '>' so we simply skip over anything
+ between these two symbols. If we reach the end of the line
+ before finding a matching '>' we set a flag 'insidehtml' */
+ if (htmlflag == 1 && *bufp == '<')
+ {
+ /* Found start of html tag - Skip to end of tag or EOL */
+ while (*bufp != '>' && *bufp != '\0' &&
+ strncasecmp(bufp,"alt=\"",5) != 0)
+ bufp++;
+ /* If we find an alt attribute, we want to check its text */
+ if (strncasecmp(bufp,"alt=\"",5) == 0)
+ {
+ insidehtml=-1;
+ bufp = bufp + 4;
+ }
+ else if (*bufp == '\0')
+ /* we've reached EOL without closing the tag */
+ insidehtml = 1;
+ }
+
+ /* Skip over character entities such as &quot;
+ These all start with an ampersand and
+ end with a semicolon. We do not need
+ to worry about them extending over more than one line */
+ if (htmlflag == 1 && *bufp == '&')
+ {
+ while (*bufp != ';' && *bufp != '\0')
+ bufp++;
+ }
+ /* End of modifications by Gerry Tierney */
+
+
/* check paren necessity... */
if (tflag) /* TeX or LaTeX stuff */
{
***************
*** 389,395 ****
if (hadlf)
contextbufs[0][len] = 0;
! if (!tflag)
{
/* skip over .if */
if (*currentchar == NRDOT
--- 437,444 ----
if (hadlf)
contextbufs[0][len] = 0;
! /* Conditions modified by Gerry Tierney to handle html-mode */
! if (!tflag && htmlflag != 1)
{
/* skip over .if */
if (*currentchar == NRDOT
***************
*** 426,432 ****
/* if this is a formatter command, skip over it */
! if (!tflag && *currentchar == NRDOT)
{
while (*currentchar && !myspace (chartoichar (*currentchar)))
{
--- 475,482 ----
/* if this is a formatter command, skip over it */
! /* Conditions modified by Gerry Tierney to handle html-mode */
! if (!tflag && htmlflag != 1 && *currentchar == NRDOT)
{
while (*currentchar && !myspace (chartoichar (*currentchar)))
{
***************
*** 441,447 ****
--- 491,531 ----
return;
}
}
+
+ /* Start of modifications by Gerry Tierney */
+
+ /* If we are checking an html file and have been left with
+ an open tag from a previous line, then we ignore everything
+ from the start of the line until we either reach the end of
+ the line or we close the tag */
+
+ if (htmlflag == 1 && insidehtml == 1)
+ {
+ while (*currentchar != '>' && *currentchar != '\0')
+ {
+ /* We check for an alt attribute (found inside img
+ tags). We want to spell-check its text, so if
+ we find one, we switch out of html-mode until we
+ find the next quote character. We signal this
+ state by setting the insidehtml flag to -1 */
+ if (strncasecmp(currentchar,"alt=\"",5) == 0)
+ {
+ copyout(&currentchar,5);
+ insidehtml = -1;
+ break;
+ }
+
+ (void) putc (*currentchar, ofile);
+ currentchar++;
+ }
+ if (*currentchar == '>')
+ /* We've closed the tag so we reset the flag */
+ insidehtml = 0;
+
+ }
+ /* End of modifications by Gerry Tierney */
+
for ( ; ; )
{
p = skiptoword (currentchar);
*** ispell.1X.orig Mon Jan 23 21:28:25 1995
--- ispell.1X Tue Dec 16 22:50:02 1997
***************
*** 110,115 ****
--- 113,119 ----
.IP \fIcommon-flags\fP:
.RB [ \-t ]
.RB [ \-n ]
+ .RB [ \-h ]
.RB [ \-b ]
.RB [ \-x ]
.RB [ \-B ]
***************
*** 296,301 ****
--- 300,307 ----
The input file is in TeX or LaTeX format.
.IP \fB\-n\fR
The input file is in nroff/troff format.
+ .IP \fB\-h\fR
+ The input file is in html format.
.IP \fB\-b\fR
Create a backup file by appending ".bak"
to the name of the input file.
***************
*** 337,344 ****
.RB ( \-n )
or TeX/LaTeX
.RB ( \-t )
! input mode.
! (The default is controlled by the DEFTEXFLAG installation option.)
TeX/LaTeX mode is also automatically selected if an input file has
the extension ".tex", unless overridden by the
.B \-n
--- 343,354 ----
.RB ( \-n )
or TeX/LaTeX
.RB ( \-t )
! input mode (This does not work for html
! .RB ( \-h )
! mode. However, html-mode is assumed for any file with a ".html"
! or ".htm" extension unless nroff/troff or TeX/LaTeX mode has
! been explicitly selected).
! (The default mode is controlled by the DEFTEXFLAG installation option.)
TeX/LaTeX mode is also automatically selected if an input file has
the extension ".tex", unless overridden by the
.B \-n
*** ispell.c.orig Thu Oct 12 22:04:07 1995
--- ispell.c Tue Dec 16 22:50:02 1997
***************
*** 298,304 ****
* ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
* ^^^^ ^^^ ^ ^^ ^^
* abcdefghijklmnopqrstuvwxyz
! * ^^^^^^ ^^^ ^ ^^ ^^^
*/
arglen = strlen (*argv);
switch ((*argv)[1])
--- 298,306 ----
* ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789
* ^^^^ ^^^ ^ ^^ ^^
* abcdefghijklmnopqrstuvwxyz
! * ^^^^^^ ^ ^^^ ^ ^^ ^^^
! *
! * -h flag used by Gerry Tierney for html-mode
*/
arglen = strlen (*argv);
switch ((*argv)[1])
***************
*** 488,493 ****
--- 490,496 ----
if (arglen > 2)
usage ();
tflag = 0; /* nroff/troff mode */
+ htmlflag = -1; /* non-html mode */
deftflag = 0;
if (preftype == NULL)
preftype = "nroff";
***************
*** 496,505 ****
--- 499,519 ----
if (arglen > 2)
usage ();
tflag = 1;
+ htmlflag = -1; /* non-html mode */
deftflag = 1;
if (preftype == NULL)
preftype = "tex";
break;
+
+ /* -h option to enable HTML-mode added by Gerry Tierney */
+ case 'h':
+ if (arglen > 2)
+ usage ();
+ tflag = 0; /* non-TeX mode */
+ deftflag = 0;
+ htmlflag = 1; /* Html-Mode */
+ break;
+
case 'T': /* Set preferred file type */
p = (*argv)+2;
if (*p == '\0')
***************
*** 810,816 ****
if (tflag < 0)
tflag =
(cp = rindex (filename, '.')) != NULL && strcmp (cp, ".tex") == 0;
!
if (prefstringchar < 0)
{
defdupchar =
--- 824,830 ----
if (tflag < 0)
tflag =
(cp = rindex (filename, '.')) != NULL && strcmp (cp, ".tex") == 0;
!
if (prefstringchar < 0)
{
defdupchar =
***************
*** 818,823 ****
--- 832,845 ----
if (defdupchar < 0)
defdupchar = 0;
}
+
+ /* Modification by Gerry Tierney to set html-mode
+ * based on file extension */
+ if (htmlflag == 0)
+ htmlflag =
+ (cp = rindex (filename, '.')) != NULL &&
+ ( strcmp (cp, ".html") == 0 ||
+ strcmp (cp, ".htm") == 0 );
if ((infile = fopen (filename, "r")) == NULL)
{
*** ispell.h.orig Thu Oct 12 22:04:08 1995
--- ispell.h Tue Dec 16 22:50:00 1997
***************
*** 624,629 ****
--- 624,641 ----
INIT (int tflag, DEFTEXFLAG); /* NZ for TeX mode in current file */
INIT (int prefstringchar, -1); /* Preferred string character type */
+ /* The following two definitions added by
+ * Gerry Tierney <gtierney@nova.ucd.ie>
+ * 14th Oct 95
+ */
+ INIT (int htmlflag, 0); /* HTML-checking state.
+ * 1=enable html-mode,
+ * 0=enable html-mode based on filename,
+ * -1=disable html-mode */
+ INIT (int insidehtml, 0); /* Flag to indicate that the current html
+ * tag has spanned more than one line */
+ /* End of Gerry's Interference */
+
INIT (int terse, 0); /* NZ for "terse" mode */
INIT (char tempfile[MAXPATHLEN], ""); /* Name of file we're spelling into */
---------------C-U-T------H-E-R-E-----------------
+---------------------------------+------------------------------------+
| Gerry Tierney, | You know there ain't no devil |
| Computer Science Dept, | There's just God when he's drunk! |
| University College Dublin. | ... Tom Waits |
+---------------------------------+------------------------------------+