version 3.4.0
• Initial development release from SVN (based on 2.3j by Jorge Valdes)
• Extensively refactored (see http://www.joval.info/proj/FuzzyOcr-2.3j/CHANGES for a changelog from 2.3b to 2.3j)
• Improved support for animated GIFs (requires the new dependency gifsicle)
• Removed ImageMagick dependency
version 3.4.1
• Fixed logging facility
◦ Now logs to a file only if one is specified
◦ SA output no longer goes to the logfile
◦ Running SA in debug mode always outputs FuzzyOcr debug messages
• Updated parts of the documentation and the configuration file
version 3.4.2
• Fixed the configuration facility to work properly alongside other custom plugins (such as RelayChecker)
◦ Thanks to John Rudd for reporting this
• Fixed the fuzzy-find.pl utility
◦ Now outputs a usage description
◦ Uses FuzzyOcr.cf for file paths
◦ Removed the ImageMagick dependency
version 3.5.0
• First release that is fully modularized for easier maintenance
• Completely rewrote the handling of external applications
◦ Process timeouts are now enforced
◦ No more zombie processes
◦ Choice between a global or a per-application timeout
◦ Flexible way to add additional helper applications in the config
• Completely new scanset and preprocessor interface
◦ Scansets and preprocessors are now in two separate files
◦ New, easy syntax for both
◦ Flexible enough to support almost any application without the need for scripts
◦ Preprocessors can easily be chained into one pipe for a scanset
◦ The plugin automatically emulates STDIN and STDOUT if necessary
◦ The plugin automatically handles all input and output files as well as pipes
◦ The plugin automatically substitutes all $macros with the registered helper applications
◦ Allows the use of TesserAct and practically every other command-line OCR engine
◦ Zombie-safe system: each external application is terminated properly
• Hash database system extended
◦ Experimental MySQL interface for the hashing system
◦ MLDBM databases are now properly locked to prevent corruption and failure
▪ Requires the new dependency MLDBM::Sync as listed in the installation manual
▪ Requires Tie::Cache (MLDBM::Sync dependency)
• More resource-saving features
◦ Option to skip files based on
▪ File size (defined per format)
▪ File type
▪ Image dimensions
◦ Negative autodisable value (messages below a given score are skipped)
◦ Minimal scanset option (the first scanset producing enough hits is used and the others are skipped)
◦ Automatic scanset resorting option
▪ When running in memory, plugin keeps track of the efficiency for each scanset on the last X messages
▪ After each scan, the scansets are reordered so that the most efficient scanset comes first
◦ The autodisable score is now rechecked between the initial FuzzyOcr tests (content type, etc.) and the OCR tests
• More features against false positives
◦ Automatic threshold adjustment for shorter words
◦ OCR results are analyzed twice
▪ First pass without stripping spaces; hits scored here weigh more than second-pass hits
▪ Second pass with spaces stripped; only done if the first pass does not produce enough hits
◦ New option to allow each word to match only once per image
• Improved tools
◦ fuzzy-find greatly improved
▪ Now works with the configuration file
▪ Added switches to learn spam or ham from command line as hashes or image files
▪ MySQL support
▪ Some bugs fixed
• Better extraction of images from the message
◦ Content-Type application/octet-stream is now accepted for all images
◦ New rule that scores if the file extension and the file format don't match
◦ Attachments with application/octet-stream are now always checked for magic bytes
• Better logging
◦ Running spamassassin with -D will always output all debug messages to stderr
◦ New logfile log levels (0-3)
◦ Message ID, sender and recipient are now logged in debug mode
◦ Debug mode logging now shows the execution time of external applications to help find bottlenecks
• Bugfixes and minor changes
◦ Configuration parser rewritten; it now accepts 0 (instead of requiring 0.0)
◦ Zombie bugs fixed
◦ Fixed bug in locking of logfile and plain hash database
◦ Temporary directories are now cleaned up on a global timeout and when the message contains multiple images
◦ Only create a temporary directory if an image was found (speeds up processing)
◦ Skip Ocrad scansets for images smaller than 16x16 (these cause an error in ocrad)
◦ New option to allow matching of numbers in the OCR output
◦ The personal wordlist option now uses userstate by default, so that SA can automatically substitute the user's home directory
version 3.5.1
• Several bugfixes; see the patchset list for 3.5.0 for details
• Added a maximum height/width for images to be scanned
• Bugfix in kill_pid()
version 3.6.0
• Tagged from SVN revision 137
• Compatibility with SpamAssassin 3.2.x
• Breaks compatibility with SpamAssassin 3.1.x
• Several smaller bugfixes that had already been in use from SVN for some time