
<html lang="en"><head>
<title>Textchk</title>
<meta http-equiv="Content-Type" content="text/html">
<meta name=description content="Textchk">
<meta name=generator content="makeinfo 4.0">
<link href="http://texinfo.org/" rel=generator-home>
</head><body>
<h1>Table of Contents</h1>
<ul>
<li><a href="#Introduction">Introduction</a>
<ul>
<li><a href="#Introduction">License</a>
<li><a href="#Introduction">Obtain Textchk</a>
<li><a href="#Introduction">How to contact the author</a>
</ul>
<li><a href="#The%20problem%20to%20solve">The problem to solve</a>
<li><a href="#Configuration">Configuration</a>
<ul>
<li><a href="#Configuration">Configuration hierarchy</a>
<li><a href="#Configuration">Special cases</a>
</ul>
<li><a href="#Input">Input for the analysis</a>
<li><a href="#How%20to%20use">How to use Textchk</a>
<ul>
<li><a href="#How%20to%20use">How errors are shown</a>
</ul>
<li><a href="#How%20to%20install">How to install Textchk</a>
<ul>
<li><a href="#How%20to%20install">Gettext</a>
<li><a href="#How%20to%20install">Dependencies</a>
</ul>
<li><a href="#Index">Index</a>
</ul>
<p><hr>
Node:<a name="Introduction">Introduction</a>
<br>
<h1>Introduction</h1>
<p>This is the documentation for Textchk. I decided to write this simple
program to help me find my usual mistakes while I was writing an
Italian book about GNU/Linux and free software:
<a href="http://www.pluto.linux.it/ildp/appuntilinux/">Appunti Linux</a>.
<p>I was persuaded to translate this program into English and to make it as
general as possible, because originally it was made only for my own
formatting system (ALtools).
<p>I am sorry, but my English is very poor. Any comments on and
language corrections to this manual are appreciated.
<h2>License</h2>
<p>Textchk is released under the GNU General Public License.
<p>This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
<p>This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
<p>You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
<h2>Obtain Textchk</h2>
<p>At the moment, the main distribution source for Textchk is
the following URI:
<a href="http://master.swlibero.org/~daniele/software/textchk/">http://master.swlibero.org/~daniele/software/textchk/</a>
<h2>How to contact the author</h2>
<pre>Daniele Giacomini
Via Turati, 15
I-31100 Treviso
Italy
daniele @ swlibero.org
</pre>
<p><hr>
Node:<a name="The%20problem%20to%20solve">The problem to solve</a>
<br>
<h1>The problem to solve</h1>
<p>Human writers make mistakes. A spell checker can find
only misspelled words, but nothing more. Everyone
has their own typical mistakes, which can often be found using simple
regular expressions.
<p>Mistakes are not absolute, as languages are dynamic and every author may
decide on a style. Textchk helps with the definition of rules that describe
a kind of mistake. For example, <code>\b[Tt]his *this\b</code> is a regular
expression that catches the word "this" used twice in a row (the
first occurrence can be capitalized), which is presumably an error.
<p>Errors like these may be typical for one person and very unusual for
another. Textchk lets you create personalized rules, according to your
needs. These rules are mainly meant to be part of a particular
documentation project, but you can also define personal rules (valid for
all your documentation projects) and general rules that
apply system-wide.
<p><hr>
Node:<a name="Configuration">Configuration</a>
<br>
<h1>Configuration</h1>
<p>The configuration of Textchk consists of a file that defines error rules (with
exceptions) and a file that lists special cases that, for some reason, are
not to be considered mistakes. The file that contains error and exception rules
is organized in records like these:
<p><code>DBL____<var>error-rule</var>[____<var>explanation-text</var>]</code>
<p><code>ERR____<var>error-rule</var>[____<var>explanation-text</var>]</code>
<p><code>EXC____<var>exception-rule</var></code>
<p>Empty lines and lines that start with a <code>#</code> are ignored.
<p>Four <code>_</code> characters in a row separate the fields. The first field defines the
type of record: <code>DBL</code> means that the record describes a word
repeated for no reason; <code>ERR</code> means that the record describes an
error; <code>EXC</code> means that the record describes an exception to the
previous error. The second field is a regular expression that describes
an error or an exception, depending on the first field. The optional third field
explains the error. An example may help:
<pre>ERR____\bI'm\b____I'm --> I am
EXC____\bI'm going\b
EXC____\bI'm very proud\b
</pre>
<p>In this case, using <code>I'm</code> is considered an error, because the
author prefers to expand it to <code>I am</code>. The description of the
error is very simple, <code>I'm --> I am</code>, but it could also be more explicit
(something like <code>I do not want things like "I'm"</code>). This error
has two exceptions: <code>I'm going</code> and <code>I'm very proud</code> are
allowed.
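<p>The record layout described above can be read with a few lines of code.
The following is only a minimal sketch in Python, not the way Textchk itself
(a Perl program) does it: the function name <code>parse_rules</code> and the
dictionary layout are assumptions made for illustration, and the explanation
text in the sample is arbitrary.

```python
SEPARATOR = "____"  # four underscores between fields


def parse_rules(text):
    """Parse rule records into a list of dicts (hypothetical layout)."""
    rules = []
    for line in text.splitlines():
        if line.strip() == "" or line.startswith("#"):
            continue  # empty lines and comment lines are ignored
        fields = line.split(SEPARATOR, 2)
        if len(fields) == 1:
            continue  # no separator found: not a valid record
        kind, pattern = fields[0], fields[1]
        if kind in ("ERR", "DBL"):
            # The third field (explanation) is optional.
            explanation = fields[2] if len(fields) == 3 else ""
            rules.append({"kind": kind, "pattern": pattern,
                          "explanation": explanation, "exceptions": []})
        elif kind == "EXC" and rules:
            # An exception belongs to the most recent error rule.
            rules[-1]["exceptions"].append(pattern)
    return rules


sample = r"""
# Prefer the expanded form.
ERR____\bI'm\b____use "I am" instead
EXC____\bI'm going\b
EXC____\bI'm very proud\b
"""

rules = parse_rules(sample)
print(rules[0]["pattern"], rules[0]["exceptions"])
```

<p>Attaching each <code>EXC</code> record to the error rule that precedes it
mirrors the statement that an exception refers to the previous error.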
<p>When Textchk finds a match for an error rule, it isolates the
text around the error: up to three words before and three words after.
Of course, fewer than three words may be available. The
exceptions are then compared against this extracted text. This means
that the following exception can never match, because it needs
four words after the text that is identified as an error.
<pre>ERR____\bI'm\b____I'm --> I am
# The following exception cannot be verified.
EXC____\bI'm very very very proud\b
</pre>
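<p>The effect of the three-word window can be reproduced with a short
experiment. This is a sketch in Python under stated assumptions, not the
actual Textchk code: the names <code>context_window</code> and
<code>remaining_errors</code> are hypothetical, words are taken to be
runs of non-blank characters, and spacing inside the window is normalized
to single blanks.

```python
import re


def context_window(line, match, words=3):
    """Return the matched text with up to `words` words on each side."""
    before = line[:match.start()].split()[-words:]
    after = line[match.end():].split()[:words]
    return " ".join(before + [match.group(0)] + after)


def remaining_errors(line, error_pattern, exception_patterns):
    """Return the windows that no exception turns into valid text."""
    found = []
    for match in re.finditer(error_pattern, line):
        window = context_window(line, match)
        # Exceptions are tested against the window, not the whole line.
        if not any(re.search(exc, window) for exc in exception_patterns):
            found.append(window)
    return found


error = r"\bI'm\b"
# The exception fits inside the window, so nothing is reported.
print(remaining_errors("I know, I'm going to be late", error,
                       [r"\bI'm going\b"]))
# The exception needs four words after the error, so it never matches.
print(remaining_errors("Well, I'm very very very proud of it", error,
                       [r"\bI'm very very very proud\b"]))
```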
<p>Regular expressions that describe errors and exceptions should not
include anchors for the beginning or the end of a text line. That is,
regular expressions like <code>^...$</code> are not allowed.
<p>The <code>DBL</code> record describes a word that might appear twice in a row
by mistake. For example:
<pre>DBL____\w\w+____Doubles
EXC____\b[bB]ye\s+bye\b
</pre>
<p>In this case, any word made of two or more alphanumeric characters
is reported if it is written twice in a row, as in "I need need money":
the word "need" is written twice, and it is a mistake. The exception
shown in the example means that the sequence
"bye bye" or "Bye bye" must be allowed.
<h2>Configuration hierarchy</h2>
<p>Textchk is meant to be used with a configuration specific to each
documentation project that an author handles. However, it is also
possible to define a personal configuration and a system-wide
configuration. These are the configuration files for errors and
exceptions; at least one of them is required:
<ol type=1 start=1>
</p><li><code>./.textchk.rules</code> is the current configuration, which
is read before the others;
<li><code>~/.textchk.rules</code> is the personal configuration, which is
read after the current one and before the system-wide configuration;
<li><code>/etc/textchk.rules</code> is the system-wide configuration, which is
read after the others.
</ol>
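<p>The reading order in the list above can be expressed as a small lookup.
This is a sketch in Python, not the Textchk implementation: the function
name <code>rule_files</code> and its parameters are hypothetical, and
raising an exception when no file exists is only one possible way to
enforce that at least one file is required.

```python
import os


def rule_files(current_dir=".", home_dir=None,
               system_file="/etc/textchk.rules"):
    """Return the existing rule files in the order they would be read."""
    if home_dir is None:
        home_dir = os.path.expanduser("~")
    candidates = [
        os.path.join(current_dir, ".textchk.rules"),  # current configuration
        os.path.join(home_dir, ".textchk.rules"),     # personal configuration
        system_file,                                  # system-wide configuration
    ]
    found = [path for path in candidates if os.path.isfile(path)]
    if not found:
        # At least one of the three files is required.
        raise FileNotFoundError("no textchk rules file found")
    return found
```

<p>Calling <code>rule_files()</code> inside a project directory returns the
project file first, if it exists, followed by the personal and system-wide
files.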
<p>Generally it is better to avoid using a system-wide configuration.
However, if you need to override a system-wide rule, the same
rule can be inserted into the personal or current configuration file,
followed by an exception with the same regular expression.
For example, suppose that a system-wide rule is as follows:
<pre>ERR____\bI'm\b____I'm --> I am
</pre>
<p>If you don't want to be bothered by that, you can add this to your
personal or current configuration:
<pre># Override system-wide rule.
ERR____\bI'm\b
EXC____\bI'm\b
</pre>
<h2>Special cases</h2>
<p>Sometimes it is not convenient to define an exception rule for a
particular error. Textchk generates a file containing the pieces of text
in which errors were found. If some of these pieces of text are not
mistakes, but you don't want to write an exception rule to suppress the
warning, you can copy them into <code>./.textchk.special</code> (there is no
personal or system-wide equivalent).
<p>Suppose that you run Textchk and you obtain a report made of the
following lines, because you decided that "I'm" is a mistake:
<pre>this is because I'm over the big
I'm out of control
I'm not going anywhere
</pre>
<p>Suppose that you don't want to be warned when the piece of text is
<code>I'm not going anywhere</code>. Just put that line into the file
<code>./.textchk.special</code>, and you will not see this warning anymore.
<pre>I'm not going anywhere
</pre>
<p>It should now be clear that the file <code>./.textchk.special</code> is only for
special exceptions: no regular expressions, only plain text.
Empty lines are ignored, but comments are not allowed.
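<p>Because the special file holds plain text only, filtering a report against
it is a matter of exact comparison. The following Python sketch illustrates
the idea; the names <code>load_special</code> and <code>drop_special</code>
are hypothetical, and comparing whole lines character by character is an
assumption about how the match is made.

```python
def load_special(text):
    """Return the set of plain-text lines; empty lines are ignored."""
    return {line for line in text.splitlines() if line.strip() != ""}


def drop_special(report_lines, special):
    """Keep only the report lines that are not listed as special cases."""
    return [line for line in report_lines if line not in special]


report = [
    "this is because I'm over the big",
    "I'm out of control",
    "I'm not going anywhere",
]
special = load_special("I'm not going anywhere\n\n")
print(drop_special(report, special))
```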
<p><hr>
Node:<a name="Input">Input</a>
<br>
<h1>Input for the analysis</h1>
<p>Textchk reads the input file line by line, and the comparison with error
rules is made within a single line. Therefore, the text file
that is used as input should be transformed so that paragraphs are
joined together; that is, every paragraph should be on a single line.
<p>This job is done by a front-end for man pages, HTML pages and Texinfo
sources. For other sources, the text must be normalized into a plain text
file with very long lines.
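<p>For sources that have no front-end, the normalization can be done by hand
or with a small script. The following Python sketch assumes that paragraphs
in the source are separated by one or more blank lines; the function name
<code>join_paragraphs</code> is hypothetical and is not part of Textchk.

```python
import re


def join_paragraphs(text):
    """Put every paragraph on a single line, one paragraph per line."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse line breaks and repeated blanks inside each paragraph.
    return "\n".join(" ".join(p.split()) for p in paragraphs)


source = "Human writers\nmake mistakes.\n\nMistakes are\nnot absolute.\n"
print(join_paragraphs(source))
```

<p>The result can be saved to a file and analyzed with the
<code>standard</code> input type.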
<p><hr>
Node:<a name="How%20to%20use">How to use</a>
<br>
<h1>How to use Textchk</h1>
<p>Textchk consists of a single executable: <code>textchk</code>.
<p>
<table width="100%">
<tr>
<td align="left"><b>textchk</b><i> <var>option</var> <var>file-to-be-analyzed</var> [<var>report-file</var> [<var>diag-file</var>]]
</i></td>
<td align="right">Command</td>
</tr>
</table>
<p>The option defines the type of the file,
<code>--input-type=<var>type</var></code>, so that it can be transformed before
the actual scan. These keywords are available:
<ul>
<li><code>man</code> means that this is a man page;
<li><code>html</code> means that this is an HTML page;
<li><code>texinfo</code>, <code>texi</code> means that this is a Texinfo
source;
<li><code>standard</code> means that this is a normalized text file.
</ul>
<p>The second argument is the name of the file. The third argument can be
the name of the report file (the one that stores the pieces of text
considered mistakes); if not given, it defaults to
<code><var>file-to-be-analyzed</var>.err</code>. The fourth argument is the name
of a diagnostic file, which contains all the information about the scan,
useful for understanding where rules don't do what is expected. If
this name is not given, it defaults to <code><var>report-file</var>.diag</code> or
<code><var>file-to-be-analyzed</var>.diag</code>.
<p>For example,
<p><code>textchk --input-type=man bash.1</code>
<p>gives two files: <code>bash.1.err</code> and <code>bash.1.diag</code>.
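<p>The defaults for the output names can be summarized in a few lines. This
Python sketch reflects one reading of the description above, namely that the
diagnostic name is derived from the report name when a report name is given,
and from the analyzed file otherwise; the function name
<code>output_names</code> is hypothetical.

```python
def output_names(input_file, report_file=None, diag_file=None):
    """Return (report_file, diag_file) with the defaults filled in."""
    if report_file is None:
        base = input_file                  # no report name given
        report_file = input_file + ".err"
    else:
        base = report_file                 # derive from the report name
    if diag_file is None:
        diag_file = base + ".diag"
    return report_file, diag_file


print(output_names("bash.1"))
```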
<h2>How errors are shown</h2>
<p>While it works, Textchk shows on screen what it finds, delimiting
errors with <code>>></code> and <code><<</code>. For example, given the same
error rule as before:
<pre>ERR____\bI'm\b____I'm --> I am
EXC____\bI'm going\b
</pre>
<p>we can obtain warnings like these:
<pre>I'm --> I am
to be here. >>I'm<< here today and
I'm --> I am
>>I'm<< not mad.
</pre>
<p>The diagnostic report shows the whole process:
<pre>??? to be here. >>I'm<< here today and
ERR \bI'm\b
!!! to be here. >>I'm<< here today and
??? I know, >>I'm<< going to be
ERR \bI'm\b
EXC \bI'm going\b
??? >>I'm<< not mad.
ERR \bI'm\b
!!! >>I'm<< not mad.
??? Now >>I'm<< here to stay
ERR \bI'm\b
SPC Now I'm here to stay
</pre>
<p>Records starting with <code>???</code> show the problem; records starting with
<code>ERR</code> show the error rule that is responsible; records starting with
<code>EXC</code> show an exception rule that turns the error into a valid
string; records starting with <code>SPC</code> show a special string that is to
be considered valid; records starting with <code>!!!</code> show an error that
persists.
<p><hr>
Node:<a name="How%20to%20install">How to install</a>
<br>
<h1>How to install Textchk</h1>
<p>Textchk consists essentially of one executable: <code>textchk</code>. This
file can be placed anywhere you can run it from without giving the path;
that is, inside a directory listed in the environment variable
<code>PATH</code>.
<p>Perl is required as <code>/usr/bin/perl</code>. If your system is organized
differently, you should modify the first line of this executable:
<pre>#!/usr/bin/perl
#...
</pre>
<p>After that, you need only a suitable <code>./.textchk.rules</code> and maybe
also <code>./.textchk.special</code>.
<h2>Gettext</h2>
<p>The messages that Textchk shows can be translated. To install the already
translated PO files, you must compile them like this:
<pre>msgfmt -o textchk.mo it.po
</pre>
<p>In this example the file <code>it.po</code> is compiled and
the file <code>textchk.mo</code> is generated. This generated file must be copied into
the right directory; in this case, that may be
<code>/usr/share/locale/it/LC_MESSAGES/</code>.
<p>If you don't have the Perl-gettext module installed and you don't want
to worry about it, you can comment out the following instructions:
<pre># We *don't* want to use gettext.
#use POSIX;
#use Locale::gettext;
#setlocale (LC_MESSAGES, "");
#textdomain ("textchk");
</pre>
<p>Then you have to introduce a dummy <code>gettext()</code> function:
<pre>sub gettext
{
return $_[0];
}
</pre>
<h2>Dependencies</h2>
<p>Textchk depends on other software to transform manual pages, HTML pages
and Texinfo sources into normalized text: Groff, Lynx and
Texinfo. Because Textchk uses Gettext, the Perl-gettext module
must also be installed.
<p><hr>
Node:<a name="Index">Index</a>
<br>
<h1>Index</h1>
<ul compact>
<li><code>./.textchk.rules</code>: <a href="#Configuration">Configuration</a>
<li><code>./.textchk.special</code>: <a href="#Configuration">Configuration</a>
<li><code>/etc/textchk.rules</code>: <a href="#Configuration">Configuration</a>
<li>configuration: <a href="#Configuration">Configuration</a>
<li>dependencies: <a href="#How%20to%20install">How to install</a>
<li>Gettext: <a href="#How%20to%20install">How to install</a>
<li>input text: <a href="#Input">Input</a>
<li>installation: <a href="#How%20to%20install">How to install</a>
<li>normalized text: <a href="#Input">Input</a>
<li><code>PATH</code>: <a href="#How%20to%20install">How to install</a>
<li><code>textchk</code>: <a href="#How%20to%20use">How to use</a>
<li><code>~/.textchk.rules</code>: <a href="#Configuration">Configuration</a>
</ul>
</body></html>