File: translit.c

/*

------------------------------------------------------------------------------

A license is hereby granted to reproduce this software source code and
to create executable versions from this source code for personal,
non-commercial use.  The copyright notice included with the software
must be maintained in all copies produced.

THIS PROGRAM IS PROVIDED "AS IS". THE AUTHOR PROVIDES NO WARRANTIES
WHATSOEVER, EXPRESSED OR IMPLIED, INCLUDING WARRANTIES OF
MERCHANTABILITY, TITLE, OR FITNESS FOR ANY PARTICULAR PURPOSE.  THE
AUTHOR DOES NOT WARRANT THAT USE OF THIS PROGRAM DOES NOT INFRINGE THE
INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY IN ANY COUNTRY.

Copyright (c) 1995, 1996, John Conover, All Rights Reserved.

Comments and/or bug reports should be addressed to:

    john@johncon.com (John Conover)

------------------------------------------------------------------------------

translit.c, transliterate a page

ssize_t transliterate (unsigned char *page, ssize_t count);

    translate count many characters in page using uppercase as a
    translation table

    note: the sole reason for breaking this module out of searchfile()
    is to provide a means of manipulating the content of a file that
    is being searched; the rules are:

        1) the search area must start at page[0], (but can constitute
        a smaller area of the page data space,) and the search area
        must end with a ' ' character; it is a requirement of
        bmhsearch(), in bmhsearch.c, that the '\0' character is
        reserved as an end of search sentinel in the pattern; failure
        to observe this rule will result in a program that is erratic
        and either hangs forever, or perhaps does a core dump of a
        very involved data structure that is very difficult to
        analyze; see also uppercase.c and bmhsearch.c

        2) the return value, count, must be the size of the data
        space in page to be searched, *_NOT_* including the last ' '
        character

In conjunction with uppercase.c, hyphenation, backspace and
underlining, and phrase searching are addressed:

    1) hyphenation could be implemented by omitting a '-' followed by
    any number of white space characters

    2) if the program is used primarily for searching catman pages,
    the backspace and underlining features that are incorporated in
    the man page system can be defeated by deleting the "backspace
    character" sequences from the documents.

    3) phrase searching could be enhanced by translating any number of
    whitespace characters into a single ' ' character; the "\ " search
    phrase would then be interpreted as any number of white space
    characters. See uppercase.c for comments concerning whitespace,
    and locale specific issues.

    Note that main() in rel.c calls transliterate() in translit.c to
    transliterate the query/search criteria; if an exact match is
    specified, both the pattern and the data would be altered in
    exactly the same manner, and appropriate matches found even though
    both were translated, (although additional patterns could
    conceivably be matched, the originals will be found,
    regardless,) for example, the data:

        ... re-engineering ...

    or hyphenated:

        ... re-
        engineering ...

    would become reengineering, which could be found by any of the
    query patterns:

        re
        engineering
        reengineering
        re-engineering

    Likewise for multiple space compression in phrase query patterns.
    Quite probably, such scenarios should be controlled by command
    line options, perhaps via a language selection to avoid
    localization and portability conflicts.

The algorithm is as follows:

    for each character in the page

        replace the character with its equivalent in uppercase[]

Usage is a call with page referencing the first character to be
translated, and count the number of characters to be translated,
for example:

    count = transliterate (page, count + 2);

There are no errors, and the number of characters remaining in page
after transliteration is returned.
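
Note: as a rough illustration of the table driven translation
described above, the following sketch builds a translation table and
applies it to a page in place. The names table[], build_table(), and
translate_page() are hypothetical; the table is only an assumed
stand-in for uppercase[], which is built by make_uppercase() in
uppercase.c and may treat whitespace and locale specific characters
differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

unsigned char table[256]; // hypothetical stand-in for uppercase[]

// fill the table: whitespace maps to ' ', everything else maps to
// its uppercase form (an assumption, not the rules in uppercase.c)
void build_table (void)
{
    int c; // character value

    for (c = 0; c < 256; c++) // for each possible character value
    {
        table[c] = (unsigned char) (isspace (c) ? ' ' : toupper (c));
    }
}

// translate count characters of page in place through the table,
// and return the number of characters translated
size_t translate_page (unsigned char *page, size_t count)
{
    size_t i; // character counter

    assert (page != NULL);

    for (i = 0; i < count; i++) // for each character in the page
    {
        page[i] = table[page[i]]; // replace it with its table entry
    }

    return (count);
}
```

With this assumed table, the 18 characters "Re-engineering\tnow" become
"RE-ENGINEERING NOW"; hyphens and backspaces are left alone, since
the sketch covers the table lookup only.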

To test this module, compile the module source with -DTEST_TRANSLIT

$Revision: 1.2 $
$Date: 1996/09/13 13:47:23 $
$Id: translit.c,v 1.2 1996/09/13 13:47:23 john Exp $
$Log: translit.c,v $
Revision 1.2  1996/09/13 13:47:23  john
Added handling of circularly linked directories and subdirectories in searchpath.c
Cosmetic changes to bmhsearch.c, postfix.c, rel.c, searchfile.c, translit.c, uppercase.c, version.c.

Revision 1.1  1996/02/08 02:55:10  john
Added hyphenation, backspace, and multiple whitespace capability.
Changes to files: uppercase.c translit.c searchfile.c rel.c and version.c; required for hyphenation, backspace, and multiple whitespace capability.

 * Revision 1.0  1995/04/22  05:13:18  john
 * Initial revision
 *

*/

#include "rel.h"

#ifndef LINT /* include rcsid only if not running lint */

static char rcsid[] = "$Id: translit.c,v 1.2 1996/09/13 13:47:23 john Exp $"; /* module version */
static char rcsid_h[] = TRANSLIT_H_ID; /* module include version */

#endif

/*

Note: the heuristics for addressing hyphenation issues are as follows:

    if a hyphen is found while transliterating the page:

        skip the hyphen, and any following whitespace or other
        hyphens, to the first character that is not whitespace or a
        hyphen, which will collapse consecutive instances of
        whitespace and hyphens into nothing.
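
Note: the hyphenation heuristic can be sketched in isolation, as
follows. The name drop_hyphens() is hypothetical, and the sketch
assumes that all whitespace has already been mapped to ' ' (as
transliterate() relies on uppercase[] to do); it is an illustration,
not the code used by this module.

```c
#include <assert.h>
#include <stddef.h>

// remove each hyphen from page, together with any run of spaces or
// further hyphens that follows it; works in place, and returns the
// number of characters kept
size_t drop_hyphens (unsigned char *page, size_t count)
{
    size_t i = 0, // read position
           j = 0; // write position, never ahead of the read position

    assert (page != NULL);

    while (i < count) // for each character in the page
    {

        if (page[i] == '-') // hyphen?
        {
            i++; // yes, skip the hyphen

            while (i < count && (page[i] == ' ' || page[i] == '-'))
            {
                i++; // and skip the spaces and hyphens after it
            }

        }

        else
        {
            page[j++] = page[i++]; // no, keep the character
        }

    }

    return (j);
}
```

Applied to the 16 characters "re-  engineering", the sketch keeps
the 13 characters "reengineering".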

Note: the heuristics for addressing the backspace character are as
follows:

    if a backspace character is found while transliterating the page:

        skip the backspace, and overwrite the character before the
        backspace with the character after the backspace, which will
        instantiate the character of the last instance of
        consecutive backspace/character combinations. This is
        specifically for catman pages which utilize
        underscore/backspace/character combinations for underlining,
        in addition to backspace/character combinations for bold
        representation; note that for this process to be successful,
        the underscore must precede the character in the sequence.
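
Note: the backspace heuristic can be sketched in isolation, as
follows. The name apply_backspaces() is hypothetical, and dropping a
stray backspace at either end of the page is an assumption of the
sketch, made so that it never reads or writes outside of the page.

```c
#include <assert.h>
#include <stddef.h>

// resolve character/backspace/character sequences in place by
// overwriting the character before each backspace with the
// character after it; returns the number of characters kept
size_t apply_backspaces (unsigned char *page, size_t count)
{
    size_t i, // read position
           j = 0; // write position

    assert (page != NULL);

    for (i = 0; i < count; i++) // for each character in the page
    {

        if (page[i] != '\b') // backspace?
        {
            page[j++] = page[i]; // no, keep the character
        }

        else if (j > 0 && i + 1 < count) // yes, a character on each side?
        {
            i++; // yes, skip the backspace
            page[j - 1] = page[i]; // overwrite the previous character
        }

        // otherwise a stray backspace, which is dropped

    }

    return (j);
}
```

Applied to the 9 characters "_\bf_\bo_\bo", the underlined form of
"foo" on a catman page, the sketch keeps the 3 characters "foo".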

Note: the heuristics for addressing phrase issues are as follows:

    if a whitespace character is found while transliterating the page:

        and if the previous character found while transliterating the
        page is also whitespace, skip the second instance of the
        whitespace character, which will collapse consecutive
        instances of whitespace characters into a single space.
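
Note: the phrase heuristic can be sketched in isolation, as follows.
The name squeeze_spaces() is hypothetical, and the sketch assumes
that all whitespace has already been mapped to ' '; it is an
illustration, not the code used by this module.

```c
#include <assert.h>
#include <stddef.h>

// collapse each run of consecutive spaces in page into a single
// space; works in place, and returns the number of characters kept
size_t squeeze_spaces (unsigned char *page, size_t count)
{
    size_t i, // read position
           j = 0; // write position

    assert (page != NULL);

    for (i = 0; i < count; i++) // for each character in the page
    {

        if (page[i] == ' ' && j > 0 && page[j - 1] == ' ')
        {
            continue; // the last character kept is also a space, skip
        }

        page[j++] = page[i]; // keep the character
    }

    return (j);
}
```

Applied to the 8 characters "a   b  c", the sketch keeps the 5
characters "a b c".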

*/

#ifdef __STDC__

ssize_t transliterate (unsigned char *page, ssize_t count)

#else

ssize_t transliterate (page, count)
    unsigned char *page;
    ssize_t count;

#endif

{
    unsigned char last_char = (unsigned char) '\0', /* last character in memory page */
                  current_char, /* current character in memory page */
                  *char_ref = page; /* reference to character in memory page */

    int i, /* character counter */
        j = 0; /* character count */

    for (i = 0; i < (int) count; i++) /* for each character in the page */
    {
        current_char = *char_ref = (unsigned char) uppercase[(int) page[i]]; /* convert the character to uppercase */

        switch ((int) current_char) /* what is the current character in the memory page? */
        {

            case (int) '-': /* hyphenation? */

                i++; /* yes, skip the hyphen; next character in the page */

                for (; i < (int) count; i++) /* for each character following the hyphen */
                {
                    current_char = *char_ref = (unsigned char) uppercase[(int) page[i]]; /* convert the character to uppercase */

                    if (current_char != (unsigned char) ' ') /* character not whitespace? */
                    {

                        if (current_char != (unsigned char) '-') /* yes, character not a hyphen? */
                        {
                            char_ref++; /* yes, next character */
                            j++; /* yes, increment the character count */
                            break;
                        }

                    }

                }

                break;

            case (int) '\b': /* backspace? */

                if (char_ref > page && i + 1 < (int) count) /* yes, is there a character on each side of the backspace? */
                {
                    i++; /* yes, skip the backspace; next character in the page */
                    char_ref--; /* previous character */
                    current_char = *char_ref = (unsigned char) uppercase[(int) page[i]]; /* convert the character to uppercase */
                    char_ref++; /* next character */
                }

                break;

            case (int) ' ': /* space? */

                if (last_char != (unsigned char) ' ') /* yes, last character in memory page not a space? */
                {
                    char_ref++; /* yes, next character */
                    j++; /* increment the character count */
                }

                break;

            default:

                char_ref++; /* next character */
                j++; /* increment the character count */
                break;

        }

        last_char = current_char; /* last character in memory page is current character in memory page */
    }

    return (j); /* return the size of the page */
}

#ifdef TEST_TRANSLIT

/*

simple exerciser for testing transliterate (); get a string from
stdin, transliterate it, and print it to stdout; ignore the:

declared global, could be static
    transliterate       translit.c(xx)

from lint

*/

#ifdef __STDC__

int main (void)

#else

int main ()

#endif

{
    unsigned char buffer[BUFSIZ]; /* buffer to be parsed */

    ssize_t i; /* length of transliterated buffer */

    if (make_uppercase () != (unsigned char *) 0) /* setup the uppercase array */
    {

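        while (fgets ((char *) buffer, (int) sizeof (buffer), stdin) != (char *) 0) /* input the string to be transliterated, bounded by the buffer size */
        {
            buffer[strcspn ((char *) buffer, "\n")] = '\0'; /* remove the trailing newline that fgets () retains */
            i = transliterate (buffer, (ssize_t) strlen ((char *) buffer)); /* transliterate the buffer */
            buffer[i] = '\0'; /* terminate the transliterated buffer with an EOS for printing */
            (void) printf ("%s\n", buffer); /* print the transliterated buffer */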
        }

    }

    else
    {
        (void) fprintf (stderr, "error making uppercase array\n"); /* couldn't setup the uppercase array, print the error */
        exit (1); /* and exit */
    }

    exit (0); /* return success */

#ifdef LINT /* include only if running lint */

    return (0); /* for LINT formality */

#endif

}

#endif