1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155
|
=================================
OCRFeeder - A Complete OCR Suite
=================================
:Author: Joaquim Rocha
:Contact: joaquimrocha1 {AT} gmail {DOT} com
:Date: 2009-05-10
:Web site: http://code.google.com/p/ocrfeeder/
:Version: 0.3
:License: OCRFeeder is published under the GNU GPL v3 license.
.. contents::
Introduction
==============
OCRFeeder is a complete Optical Character Recognition and Document Analysis
and Recognition program.
This document explains what's needed to and how to install and run OCRFeeder.
ODFPy is developed by OpenDocument Fellowship:
http://opendocumentfellowship.com
System Requirements
====================
The following list presents the system requirements for this project to run
with all its functionalities. Each dependence has the minimum version that
should be used. While newer versions are likely to keep working as well,
older versions may or may not work as they were not tested.
* Python (version 2.5);
* PyGTK (version 2.13);
* Python Enchant (version 1.3)
* Python Imaging Sane (version 1.1.7)
* Python Image Library (version 1.1);
* PyGoocanvas (version 0.12);
* Ghostscript (version 8.63);
* Unpaper (version 0.3);
The module ODFPy is included as part of this project in order to make the
installation and execution of the project easier.
Since this project doesn't use a particular OCR engine, no engine was listed
as a dependence above. Nevertheless, the project is not usable without an
OCR engine. The configuration XML file for the engine Ocrad is already
included with the project so only what's needed to be installed for a first
test is the Ocrad engine itself. In case the user doesn't want to use Ocrad,
the configuration file that is placed in the OCRFeeder's configuration folder
(~/.ocrfeeder) must be deleted.
Other engines that might also be considered are Gocr and Tesseract.
Installation on Ubuntu
=======================
The only packages needed to be installed on Ubuntu 8.10 is PyGoocanvas and
Unpaper, the rest of the dependences are already installed in a fresh install
of this version of Ubuntu. The engine Ocrad is also installed for the reasons
explained in the previous section.
To install PyGoocanvas, Ocrad and Unpaper, the following command should be
executed as superuser:
# apt-get install python-pygoocanvas ocrad unpaper
After all of the packages finish the installation, OCRFeeder is ready to be
installed. To install it, all that is needed is to run setup.py script as
superuser:
# setup.py install
OCRFeeder can now be run by calling it from a desktop menu or by running the
*ocrfeeder* command.
When using the GNOME desktop, if the desktop menu entry is not showing the
OCRFeeder's icon, the following command must be used to update the icon cache
(as superuser):
# gtk-update-icon-cache -f -t /usr/share/icons/hicolor
Command Line Usage
===================
This section explains how to use OCRFeeder from the command line.
The command line interface part of OCRFeeder aims at users who want to perform
quick and unattended conversions of document images to editable formats. It
also makes this project usable from other applications.
Two parameters are mandatory:
1) the path to each document image to be processed is given after the parameter
--images;
2) the name of the document to be generated is given after the parameter
--o.
For example:
$ ocrfeeder-cli --images ~/image1.png ~/image2.jpeg
--o converted_document
The pages of the generated documents honor the order of the given paths.
It is also possible to specify the format of the document to be generated
(HTML or ODT) with the option --format. In case no format is specified,
the images will be exported to ODT. Continuing with the example above:
$ ocrfeeder-cli --images ~/image1.png ~/image2.jpeg --format HTML
--o converted_document
OCRFeeder Studio (the graphical user interface part) can also be launched
from the command line. Two options can be used to load images right after
the program initiates. Those are --images which will add the images given
as the option's arguments and --dir that will add all the images under a
given directory path. The options can be used individually or combined,
for example:
$ ocrfeeder --images ~/image1.png ~/image2.jpeg
--dir ~/Desktop
For any usage, the options and parameters can be given in any order.
Copyright
==========
OCRFeeder is (c) 2008-2010 Joaquim Rocha
OCRFeeder is (c) 2009-2010 Igalia, S.L.
http://live.gnome.org/OCRFeeder
The *odf* module is (c) Søren Roug, European Environment Agency
http://odfpy.forge.osor.eu/
See AUTHORS file for details on authors
OCRFeeder is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
OCRFeeder is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
See the COPYING file for more details.
|