MediaWiki Markup Translator
===========================
This package provides a Python framework for translating MediaWiki
articles to various formats. The present version supports
conversions to plain text, HTML, and Texinfo formats.
A command line converter utility is included.
Classes
=======
class ``WikiMarkup``
--------------------
A base class for all translator classes. Unless you plan to extend
wikitrans, you will never have to create objects of this
class. Instead, you will be using one of its derived classes.
Constructor arguments common to all derived classes:

filename = *name*
  The file *name* is opened and used for input.

file = *fd*
  An already opened file *fd* is used for input.

text = *string*
  Input is taken from *string*, line by line.

lang = *code*
  Specifies the language version. Default is ``en``. This variable can be
  referred to as ``%(lang)s`` in the keyword arguments below.

html_base = *url*
  Base URL for cross-references. Default is
  ``http://%(lang)s.wikipedia.org/wiki/``.

image_base = *url*
  Base URL for images. Default is
  ``http://upload.wikimedia.org/wikipedia/commons/thumb/a/bf``

media_base = *url*
  Base URL for media files. Default is
  ``http://www.mediawiki.org/xml/export-0.3``

debug_level = *int*
  Debug verbosity level (0 - no debug info, 100 - excessively
  copious debug messages). Default is 0.

strict = *bool*
  Strict parsing mode. Raise exceptions on syntax errors. Default is
  ``False``.
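The ``%(lang)s`` placeholder follows Python's dictionary-based string
formatting. The following is a minimal sketch of how such a template
could be expanded. The helper name ``expand_base`` is hypothetical and
is not part of the package.

```python
# Default taken from the html_base description above.
DEFAULT_HTML_BASE = 'http://%(lang)s.wikipedia.org/wiki/'


def expand_base(template, lang='en'):
    """Substitute the language code into a base URL template.

    Hypothetical helper: it only illustrates %-style formatting
    with a mapping, not the package's internal code.
    """
    return template % {'lang': lang}


print(expand_base(DEFAULT_HTML_BASE))
print(expand_base(DEFAULT_HTML_BASE, lang='de'))
```

A template without the placeholder is returned unchanged, so a fixed
URL can be passed in the same way.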
class ``TextWikiMarkup``
------------------------
Translates material in Wiki markup language to plain text. Usage::

  from WikiTrans.wiki2text import TextWikiMarkup
  markup = TextWikiMarkup(filename='input.txt')
  markup.parse()
  print(str(markup))

Specific constructor arguments:

width = *N*
  Limit output width to *N* columns. Default is 78.

show_urls = *bool*
  Whether to show the URLs that links refer to. If *bool* is
  ``True`` (the default), a URL will be displayed in parentheses next
  to the link text. If ``False``, only the link text will be displayed.
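To illustrate the effect of these two options, here is a standalone
sketch that does not use the package itself. It relies on the standard
``textwrap`` module; the functions ``render_link`` and
``render_paragraph`` and the sample data are hypothetical, and the
actual output layout of ``TextWikiMarkup`` may differ.

```python
import textwrap


def render_link(text, url, show_urls=True):
    """Render a link as plain text, optionally followed by its URL."""
    if show_urls:
        return '%s (%s)' % (text, url)
    return text


def render_paragraph(pieces, width=78):
    """Join the pieces and wrap the result to at most `width` columns."""
    return textwrap.fill(' '.join(pieces), width=width)


link = render_link('door', 'http://en.wiktionary.org/wiki/door')
print(render_paragraph(['See', link, 'for', 'details.'], width=40))

# With show_urls=False only the link text remains.
plain = render_link('door', 'http://en.wiktionary.org/wiki/door',
                    show_urls=False)
print(render_paragraph(['See', plain, 'for', 'details.'], width=40))
```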
class ``TextWiktionaryMarkup``
------------------------------
Translates material from Wiktionary to plain text. This is
supposed to provide a Wiktionary-specific form of
``TextWikiMarkup``. Currently, this class differs from
``TextWikiMarkup`` only in that the default value for ``html_base``
is ``http://%(lang)s.wiktionary.org/wiki/``.
class ``TexiWikiMarkup``
------------------------
Translates Wiki markup to Texinfo source. Usage::

  from WikiTrans.wiki2texi import TexiWikiMarkup
  markup = TexiWikiMarkup(filename='input.txt')
  markup.parse()
  print(str(markup))

Two markup-specific keywords control the sectioning model used.

sectioning_model = *model*
  Selects the Texinfo sectioning model for the output
  document. Possible values are:

  ``numbered``
    Top of document is marked with ``@top``. Headings (``=``, ``==``,
    ``===``, etc.) produce ``@chapter``,
    ``@section``, ``@subsection``, etc.

  ``unnumbered``
    Unnumbered sectioning: ``@top``, ``@unnumbered``, ``@unnumberedsec``,
    ``@unnumberedsubsec``.

  ``appendix``
    Sectioning suitable for appendix entries: ``@top``, ``@appendix``,
    ``@appendixsec``, ``@appendixsubsec``, etc.

  ``heading``
    Use heading directives to reflect sectioning: ``@majorheading``,
    ``@chapheading``, ``@heading``, ``@subheading``, etc.

sectioning_start = *n*
  Shift resulting heading level by *n* positions. For example, supposing
  ``sectioning_model=numbered``, ``== A ==`` will produce
  ``@section A`` on output. If ``sectioning_start=1`` is also given, this
  directive will produce ``@subsection A`` instead.
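The interaction between ``sectioning_model`` and ``sectioning_start``
can be sketched as a table lookup. This is an illustration under
assumptions, not the package's code: the function ``texinfo_command``
is hypothetical, only the commands listed above are included, the
``heading`` model is left out, and clamping at the deepest listed
command is a choice made for this sketch.

```python
# Command lists copied from the model descriptions above.
SECTIONING = {
    'numbered': ['@top', '@chapter', '@section', '@subsection'],
    'unnumbered': ['@top', '@unnumbered', '@unnumberedsec',
                   '@unnumberedsubsec'],
    'appendix': ['@top', '@appendix', '@appendixsec', '@appendixsubsec'],
}


def texinfo_command(level, model='numbered', start=0):
    """Return the command for a heading written with `level` equal signs.

    `start` plays the role of sectioning_start: it shifts the heading
    level by that many positions.
    """
    commands = SECTIONING[model]
    # Assumption: levels deeper than the listed commands are clamped.
    index = min(level + start, len(commands) - 1)
    return commands[index]


print(texinfo_command(2))            # == A == with the numbered model
print(texinfo_command(2, start=1))   # same heading, shifted by one
```

With the ``numbered`` model this reproduces the example from the
text: ``@section`` without a shift and ``@subsection`` with
``sectioning_start=1``.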
class ``HtmlWikiMarkup``
------------------------
Translates Wiki markup to HTML. Usage::

  from WikiTrans.wiki2html import HtmlWikiMarkup
  markup = HtmlWikiMarkup(filename='input.txt')
  markup.parse()
  print(str(markup))

Supported keywords are the same as for the ``WikiMarkup`` class.
class ``HtmlWiktionaryMarkup``
------------------------------
Translates material from Wiktionary to HTML. This is
supposed to provide a Wiktionary-specific form of
``HtmlWikiMarkup``. Currently both classes are equivalent, except that
the default value for ``html_base`` in ``HtmlWiktionaryMarkup``
is ``http://%(lang)s.wiktionary.org/wiki/``.
The ``wikitrans`` utility
=========================
This command line utility converts the supplied text to the selected
output format. The usage syntax is::

  wikitrans [OPTIONS] ARG

If ARG looks like a URL, the wiki text to be converted will be
downloaded from that URL.

Otherwise, if the ``--base-url=URL`` option is given, ARG is treated as
the name of the page to get from the MediaWiki installation at ``URL``.

Otherwise, ARG is treated as the name of the file to read wiki
material from.

Examples::

  wikitrans text.wiki
  wikitrans --base-url http://en.wiktionary.org door
  wikitrans https://en.wiktionary.org/wiki/Special:Export/door
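The order in which ARG is interpreted can be sketched as follows. The
function ``classify_arg`` is hypothetical, and the test for "looks
like a URL" (an ``http`` or ``https`` scheme plus a host name) is an
assumption; the utility may use a different check.

```python
from urllib.parse import urlparse


def classify_arg(arg, base_url=None):
    """Return 'url', 'page', or 'file' in the order described above."""
    parsed = urlparse(arg)
    # Assumption: a URL is anything with an http(s) scheme and a host.
    if parsed.scheme in ('http', 'https') and parsed.netloc:
        return 'url'
    # No URL given: a base URL turns ARG into a page name.
    if base_url is not None:
        return 'page'
    # Otherwise ARG names a local file.
    return 'file'


print(classify_arg('text.wiki'))
print(classify_arg('door', base_url='http://en.wiktionary.org'))
print(classify_arg('https://en.wiktionary.org/wiki/Special:Export/door'))
```

The three calls correspond to the three example invocations, in the
same order.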
Options are:

``--version``
  Show the program's version number and exit.

``-h``, ``--help``
  Show a short usage summary and exit.

``-v``, ``--verbose``
  Verbose operation.

``-I ITYPE``, ``--input-type=ITYPE``
  Set input document type. *ITYPE* is one of: ``default`` or ``wiktionary``.

``-t OTYPE``, ``--to=OTYPE``, ``--type=OTYPE``
  Set output document type (``html`` (the default), ``texi``,
  ``text``, or ``dump``).

``-l LANG``, ``--lang=LANG``
  Set input document language.

``-o KW=VAL``, ``--option=KW=VAL``
  Pass the keyword argument ``KW=VAL`` to the parser class constructor.

``-d DEBUG``, ``--debug=DEBUG``
  Set debug level (0..100).

``-D``, ``--dump``
  Dump parse tree and exit; same as ``--type=dump``.

``-b URL``, ``--base-url=URL``
  Set the base URL.
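As an illustration of the ``KW=VAL`` form accepted by ``-o``, the
sketch below collects repeated options with the standard ``argparse``
module and turns them into a dictionary of keyword arguments. The
parser setup and the function ``to_kwargs`` are hypothetical. How the
real utility converts values to other types is not described here, so
this sketch leaves every value as a string.

```python
import argparse

# Hypothetical stand-in parser: only the -o/--option flag is modelled.
parser = argparse.ArgumentParser(prog='wikitrans-sketch')
parser.add_argument('-o', '--option', action='append', default=[],
                    metavar='KW=VAL')


def to_kwargs(pairs):
    """Split each KW=VAL string at the first '=' into a dict entry."""
    kwargs = {}
    for pair in pairs:
        key, sep, value = pair.partition('=')
        if not sep:
            raise ValueError('expected KW=VAL, got %r' % pair)
        kwargs[key] = value  # values stay strings in this sketch
    return kwargs


args = parser.parse_args(['-o', 'width=60', '--option=show_urls=False'])
print(to_kwargs(args.option))
```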
Note: when using ``--base-url`` or passing a URL as an argument (2nd and 3rd
use cases above), if the URL is in the ``wikipedia.org`` or ``wiktionary.org``
domain, the options ``--input-type`` and ``--lang`` are set automatically.
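One way such automatic detection could work is sketched below. The
function ``guess_settings`` is hypothetical. It assumes host names of
the form ``<lang>.wikipedia.org`` or ``<lang>.wiktionary.org`` and
assumes that ``wikipedia.org`` maps to the ``default`` input type;
the utility's own logic may differ.

```python
from urllib.parse import urlparse


def guess_settings(url):
    """Return (input_type, lang) guessed from the host, or (None, None)."""
    host = urlparse(url).hostname or ''
    parts = host.split('.')
    # Assumption: the language code is the leftmost host label.
    if len(parts) == 3 and parts[2] == 'org':
        if parts[1] == 'wiktionary':
            return 'wiktionary', parts[0]
        if parts[1] == 'wikipedia':
            return 'default', parts[0]
    # Any other host: leave both settings to the user.
    return None, None


print(guess_settings('http://en.wiktionary.org'))
print(guess_settings('https://de.wikipedia.org/wiki/Special:Export/Haus'))
print(guess_settings('http://example.org/wiki/door'))
```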