/*b
* Copyright (C) 2001,2002 Rick Richardson
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*
* Author: Rick Richardson <rickr@mn.rr.com>
b*/
/*
* This is a simple byte-stream lexical analyzer for SGML. It produces only
* two tokens: tag or text, which are returned via a callback function.
*
* It is entirely up to the caller to meaningfully parse the tags. The
* caller is also responsible for postprocessing text to remove &code;
* escapes and the like.
*
*/
typedef struct sgml_lexer SGML_LEXER; /* Opaque handle to internal data */
/*
* Create a new lexical analyzer.
*
* bufsize is the desired size of the tag or text buffer. If a tag or
* text doesn't fit in the buffer, then the user will have to handle
* the _TRUNC lexer codes.
*/
SGML_LEXER *sgml_lexer_new(int bufsize);
/*
* Destroy a lexical analyzer
*/
void sgml_lexer_destroy(SGML_LEXER *lp);
/*
* Lexical analyzer callback routine
*/
typedef enum {
    SGML_LEXER_TEXT       =  1,
    SGML_LEXER_TAG        =  2,
    SGML_LEXER_HTTP       =  3,
    SGML_LEXER_HTTP_END   =  4,
    SGML_LEXER_TEXT_TRUNC = -1,
    SGML_LEXER_TAG_TRUNC  = -2,
    SGML_LEXER_HTTP_TRUNC = -3,
} SGML_LEXER_CODE;
typedef void (*SGML_LEXER_CB)(
    void *cbarg,            /* user supplied context */
    SGML_LEXER_CODE code,   /* token type, one of SGML_LEXER_CODE */
    char *data              /* tag or text data */
);
/*
* Lexical analyzer input routine.
*
* The lexical analyzer is byte-stream based. You hand it the byte
* stream character-by-character with this routine. When it detects
* a complete token (tag or text), it executes the callback routine.
*/
typedef enum {
    SGML_LEXER_CONT =  0,
    SGML_LEXER_EOF  = -1
} SGML_LEXER_RC;
SGML_LEXER_RC sgml_lexer_putc(SGML_LEXER *lp,
    int c,                  /* Input character */
    SGML_LEXER_CB cb,       /* callback routine */
    void *cbarg             /* user supplied context */
);
/*
* Reset lexer to scan a new page
*/
void sgml_lexer_http(SGML_LEXER *lp);
void sgml_lexer_reset(SGML_LEXER *lp);
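/*
* Usage sketch (disabled with #if 0, never compiled as part of this header).
*
* It shows one plausible way to drive the lexer from stdin and may not
* match the implementation. It assumes details this header does not state:
* that the file is installed as "sgml_lexer.h" (hypothetical name), that
* sgml_lexer_new() returns NULL on failure, that callback data is a
* NUL-terminated string, and that handing EOF to sgml_lexer_putc() is how
* the end of input is signalled. Check the implementation before relying
* on any of these.
*/
#if 0
#include <stdio.h>
#include "sgml_lexer.h"     /* hypothetical file name */

/* Print each token with a label; cbarg carries the output stream. */
static void
print_token(void *cbarg, SGML_LEXER_CODE code, char *data)
{
    FILE *out = cbarg;

    switch (code) {
    case SGML_LEXER_TAG:
        fprintf(out, "TAG:  %s\n", data);
        break;
    case SGML_LEXER_TEXT:
        fprintf(out, "TEXT: %s\n", data);
        break;
    case SGML_LEXER_TAG_TRUNC:
    case SGML_LEXER_TEXT_TRUNC:
        /* The token did not fit in the buffer given to sgml_lexer_new() */
        fprintf(out, "TRUNCATED: %s\n", data);
        break;
    default:
        break;              /* HTTP codes are ignored in this sketch */
    }
}

int
main(void)
{
    SGML_LEXER *lp = sgml_lexer_new(4096);
    int c;

    if (lp == NULL)
        return 1;
    for (;;) {
        c = getchar();
        /* Stop when the lexer reports EOF or the input is exhausted */
        if (sgml_lexer_putc(lp, c, print_token, stdout) != SGML_LEXER_CONT
            || c == EOF)
            break;
    }
    sgml_lexer_destroy(lp);
    return 0;
}
#endif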