File: CHANGES

fuzzyocr 3.6.0-15
  • area: main
  • in suites: bookworm, bullseye, sid
  • size: 804 kB
  • sloc: perl: 3,127; sh: 45; makefile: 2
version 3.4.0
    • Initial development release from SVN (based on 2.3j by Jorge Valdes)
    • Heavily refactored (see http://www.joval.info/proj/FuzzyOcr-2.3j/CHANGES for the changelog from 2.3b to 2.3j)
    • Improved support for animated GIFs (requires the new dependency gifsicle)
    • Removed ImageMagick dependency
version 3.4.1
    • Fixed logging facility
        ◦ Now logs to file only if specified
        ◦ SA output does not go to logfile anymore
        ◦ Running SA in debug mode always outputs FuzzyOcr debug messages
    • Some documentation parts and configuration file updated
version 3.4.2
    • Fixed configuration facility to work properly alongside other custom plugins (like RelayChecker)
        ◦ Thanks to John Rudd for reporting this
    • fuzzy-find.pl utility fixed
        ◦ now outputs usage description
        ◦ uses FuzzyOcr.cf for file paths
        ◦ removed ImageMagick dependency
version 3.5.0
    • First fully modularized release, for easier maintenance
    • Completely rewrote handling of external applications
        ◦ Process timeouts are enforced now
        ◦ No zombie processes
        ◦ Decide between global or per-application timeout
        ◦ Flexible way to add further helper applications in the config
    • Completely new scanset and preprocessor interface
        ◦ Scansets and preprocessors are now in two separate files
        ◦ New, easy syntax for both
        ◦ Flexible enough to support almost any application without wrapper scripts
        ◦ Preprocessors can easily be chained into one pipe per scanset
        ◦ Plugin emulates STDIN and STDOUT automatically if necessary
        ◦ Plugin handles all input and output files as well as pipes automatically
        ◦ Plugin automatically substitutes all $macros with the registered helper applications
        ◦ Allows the use of Tesseract and practically any other command-line OCR engine
        ◦ Zombie-safe system: each external application is terminated properly
    • Hash database system extended
        ◦ Experimental MySQL interface for the hashing system
        ◦ MLDBM databases are now properly locked to prevent corruption and failure
            ▪ Requires the new dependency MLDBM::Sync as listed in the installation manual
            ▪ Requires Tie::Cache (MLDBM::Sync dependency)
    • More resource saving features
        ◦ Option to skip files based on
            ▪ File size (defined per format)
            ▪ File type
            ▪ Image dimensions
        ◦ Negative auto disable value (Messages below a given score are skipped)
        ◦ Minimal scanset option (The first scanset producing enough hits is taken, others are skipped)
        ◦ Automatic scanset resorting option
            ▪ When running in memory, the plugin tracks the efficiency of each scanset over the last X messages
            ▪ After each scan, the scansets are reordered so that the most efficient one runs first
        ◦ Autodisable score is now rechecked between the initial FuzzyOcr tests (content type, etc.) and the OCR tests
    • More features against false positives
        ◦ Auto threshold adjusting for smaller words
        ◦ OCR Results are analyzed twice
            ▪ First pass without stripping spaces; hits scored here weigh more than second-pass hits
            ▪ Second pass with spaces stripped, only performed if the first pass does not produce enough hits
        ◦ New option to allow each word to match only once per image
    • Improved tools
        ◦ fuzzy-find greatly improved
            ▪ Works together with configuration file now
            ▪ Added switches to learn spam or ham from command line as hashes or image files
            ▪ MySQL support
            ▪ Some bugs fixed
    • Better extraction of images from the message
        ◦ Content-Type application/octet-stream is now accepted for all images
        ◦ New rule which scores if file extension and file format don't match
        ◦ Attachments with Content-Type application/octet-stream are now always checked for magic bytes
    • Better logging
        ◦ Running spamassassin with -D will always output all debug messages to stderr
        ◦ New logfile log levels (0-3)
        ◦ Message ID, sender and recipient are now logged in debug mode
        ◦ Debug mode logging now shows the execution time of external applications to help find bottlenecks
    • Bugfixes and minor changes
        ◦ Configuration parser rewritten, now accepts 0 (instead of 0.0)
        ◦ Zombie bugs fixed
        ◦ Fixed bug in locking of logfile and plain hash database
        ◦ Temporary directories now cleaned up on global timeout and if multiple images were in the message
        ◦ Only create temporary directory if an image was found (speeds up processing)
        ◦ Skip Ocrad scansets for images smaller than 16x16 (these produce errors in ocrad)
        ◦ New option to allow matching of numbers in the OCR output
        ◦ Personal wordlist option now uses __userstate__ by default, so that SA automatically substitutes the user's home directory
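The "Minimal scanset" and "Automatic scanset resorting" options in the 3.5.0 entry can be illustrated with a short Python sketch. FuzzyOcr itself is written in Perl and this is not its code: the class name `ScansetOrder`, the `window` parameter (standing in for "the last X messages"), and the definition of efficiency as the mean hit count per run are all assumptions made for illustration.

```python
from collections import deque


class ScansetOrder:
    """Hypothetical model of scanset ordering, not FuzzyOcr's Perl code.

    Efficiency is assumed to be the mean number of word hits a scanset
    produced over its last `window` runs (the "last X messages").
    """

    def __init__(self, names, window=10):
        self.order = list(names)
        # One bounded history per scanset; old results fall off automatically.
        self.history = {name: deque(maxlen=window) for name in names}

    def efficiency(self, name):
        runs = self.history[name]
        return sum(runs) / len(runs) if runs else 0.0

    def scan(self, run_scanset, min_hits):
        """Run scansets in the current order and stop at the first one that
        reaches min_hits (minimal scanset option); then resort once."""
        result = (None, 0)
        for name in self.order:
            hits = run_scanset(name)
            self.history[name].append(hits)
            if hits >= min_hits:
                result = (name, hits)
                break
        # Most efficient first; sort() is stable, so ties keep their old order.
        self.order.sort(key=self.efficiency, reverse=True)
        return result
```

For example, with `ScansetOrder(["ocrad", "gocr"], window=3)` (example names) and a scan in which only `gocr` reaches `min_hits=3`, `gocr` is moved to the front for the next message. One consequence of this model is that a scanset skipped because an earlier one already hit enough is not re-measured, so its average only changes when it runs again.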
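The two-pass analysis of OCR results described in the 3.5.0 entry ("OCR Results are analyzed twice") can be sketched in Python as follows. This is an illustration, not FuzzyOcr's Perl implementation: `difflib.SequenceMatcher` stands in for whatever fuzzy comparison the plugin actually uses, and the similarity threshold, the weights of 1.0 and 0.5, and the `min_hits` default are hypothetical. Each word is counted at most once, in the spirit of the option that lets a word match only once per image.

```python
import re
from difflib import SequenceMatcher

# Hypothetical weights: the changelog only says first-pass hits weigh more.
FIRST_PASS_WEIGHT = 1.0
SECOND_PASS_WEIGHT = 0.5


def similar(a, b, threshold):
    return SequenceMatcher(None, a, b).ratio() >= threshold


def first_pass(text, words, threshold):
    """Match each (lowercase) word against whitespace-separated OCR tokens."""
    tokens = re.findall(r"\S+", text.lower())
    return [w for w in words if any(similar(w, t, threshold) for t in tokens)]


def second_pass(text, words, threshold):
    """Strip all whitespace, then slide a window of len(word) over the result."""
    joined = re.sub(r"\s+", "", text.lower())
    found = []
    for w in words:
        n = len(w)
        windows = (joined[i:i + n] for i in range(max(1, len(joined) - n + 1)))
        if any(similar(w, win, threshold) for win in windows):
            found.append(w)
    return found


def score(text, words, min_hits=2, threshold=0.8):
    hits = first_pass(text, words, threshold)
    total = FIRST_PASS_WEIGHT * len(hits)
    if len(hits) < min_hits:
        # Only words not matched in the first pass are retried, so each word scores once.
        remaining = [w for w in words if w not in hits]
        total += SECOND_PASS_WEIGHT * len(second_pass(text, remaining, threshold))
    return total
```

The sliding window in the second pass is what lets a word that was spaced out letter by letter, such as "V I A G R A", still match once the spaces are gone.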
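The magic-byte checks and the extension-mismatch rule from the "Better extraction of images from the message" part of the 3.5.0 entry come down to comparing a file's leading bytes with known image signatures. The Python sketch below illustrates the idea under stated assumptions: the signatures are the standard ones for these formats, while the extension table, the function names and the `should_scan` policy are hypothetical and not FuzzyOcr's own code.

```python
import os

# Standard leading-byte signatures for common image formats.
SIGNATURES = [
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]

# Hypothetical mapping from file extension to the format it claims.
EXTENSIONS = {
    ".gif": "gif", ".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg",
    ".bmp": "bmp", ".tif": "tiff", ".tiff": "tiff",
}


def sniff(data):
    """Return the image format indicated by the magic bytes, or None."""
    for signature, fmt in SIGNATURES:
        if data.startswith(signature):
            return fmt
    return None


def extension_mismatch(filename, data):
    """True when both the extension and the magic bytes name a known
    format and the two disagree (e.g. a GIF sent as .jpg)."""
    claimed = EXTENSIONS.get(os.path.splitext(filename)[1].lower())
    actual = sniff(data)
    return claimed is not None and actual is not None and claimed != actual


def should_scan(content_type, data):
    """Accept image/* parts, and application/octet-stream parts only if
    their magic bytes identify an image."""
    content_type = content_type.lower()
    if content_type.startswith("image/"):
        return True
    return content_type == "application/octet-stream" and sniff(data) is not None
```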
version 3.5.1
    • Several bugfixes (see the Patchset list for 3.5.0 for details)
    • Added maximum height/width for images to scan
    • Bugfix in kill_pid()
version 3.6.0
    • Tagged from SVN revision 137
    • Compatibility with SpamAssassin 3.2.x
    • Breaks compatibility with SpamAssassin 3.1.x
    • Several minor bugfixes that had already been in use from SVN for some time