File: README (libextractor 0.4.2-2sarge6; Debian sarge, main)

libextractor
============

libextractor is a simple library for keyword extraction.  It does not
support every file format, but it provides a simple plugin mechanism
that lets you quickly add extractors for additional formats, even
without recompiling libextractor.  libextractor typically ships with a
dozen helper libraries that can be used to obtain keywords from common
file types.



extract
=======

extract is a simple command-line interface to libextractor.



Writing plugins
===============


If you want to write your own extractor for some file type, all you
need to do is write a small library that implements a single function
with this signature:


KeywordList * <libraryname>_extract(const char * filename,
                                    char * data,
                                    size_t size,
                                    KeywordList * prev);

where <libraryname> is the name of the library file that you will tell
libextractor to load, minus the suffix.  For example, if you link your
extractor into a file called 'myextractor.so', the function above must
be called 'myextractor_extract'.
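
The naming rule can be illustrated with a small sketch.  This is not
libextractor's loader code; the helper name extract_symbol_name is
hypothetical and only shows how the entry-point name follows from the
library file name (directory and suffix stripped, '_extract' appended).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of libextractor): build the name
   "<libraryname>_extract" from a library file name such as
   "myextractor.so".  Returns 0 on success, -1 if 'out' is too small. */
int extract_symbol_name(const char *libfile, char *out, size_t outsize) {
  /* Skip any leading directory components. */
  const char *base = strrchr(libfile, '/');
  base = (base != NULL) ? base + 1 : libfile;

  /* Drop the suffix: everything from the last '.' onwards. */
  const char *dot = strrchr(base, '.');
  size_t stem = (dot != NULL) ? (size_t)(dot - base) : strlen(base);

  /* Append "_extract" to the stem; snprintf reports truncation. */
  int n = snprintf(out, outsize, "%.*s_extract", (int) stem, base);
  return (n >= 0 && (size_t) n < outsize) ? 0 : -1;
}
```

For 'myextractor.so' this yields 'myextractor_extract', the name used
in the example above.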

filename is the name of the file, data is a pointer to the contents
of the file, and size is the size of the file in bytes.  The extract
function must prepend the keywords that it finds to the linked list
'prev' and return the new head.  The library must allocate (malloc)
each entry in the keyword list and the memory for the keyword string,
since both will be freed by libextractor once the application calls
freeKeywords.

An example implementation can be found in mp3extractor.c.
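
As a rough illustration of the contract described above, here is a
minimal sketch of a plugin for a library named 'myextractor.so'.
Several things in it are assumptions made for the example: the
KeywordList layout is a stand-in (the real definition comes from
libextractor's header and may have other fields), prepend_keyword is a
hypothetical helper, and the 'MYFMT' magic bytes and the 'myformat'
keyword are invented.

```c
#include <stdlib.h>
#include <string.h>

/* ASSUMPTION: stand-in for the library's keyword list node.  Use the
   definition from libextractor's header in a real plugin. */
typedef struct KeywordList {
  char *keyword;             /* malloc'ed keyword string */
  struct KeywordList *next;  /* next entry, or NULL at the end */
} KeywordList;

/* Hypothetical helper: prepend a malloc'ed copy of 'keyword' to
   'prev'.  On allocation failure the list is returned unchanged. */
static KeywordList *prepend_keyword(const char *keyword, KeywordList *prev) {
  KeywordList *node = malloc(sizeof(KeywordList));
  if (node == NULL)
    return prev;
  size_t len = strlen(keyword) + 1;
  node->keyword = malloc(len);
  if (node->keyword == NULL) {
    free(node);
    return prev;
  }
  memcpy(node->keyword, keyword, len);
  node->next = prev;
  return node;  /* new head of the list */
}

/* Entry point: named after the library file, minus the suffix. */
KeywordList *myextractor_extract(const char *filename,
                                 char *data,
                                 size_t size,
                                 KeywordList *prev) {
  (void) filename;  /* not needed by this toy extractor */

  /* Invented format check: only files starting with "MYFMT". */
  if (data == NULL || size < 5 || memcmp(data, "MYFMT", 5) != 0)
    return prev;  /* not our format: leave the list untouched */

  return prepend_keyword("myformat", prev);
}
```

With gcc, such a file could be built as a shared library with
something like 'gcc -shared -fPIC -o myextractor.so myextractor.c'.
Returning 'prev' unchanged for unrecognised input keeps the plugin
harmless when it is run on files of other types.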



Notes
=====

libextractor contains some very large C files.  gcc can easily use
over (!) 100 MB of memory to compile them.  If you have that much
memory, libextractor will compile in about a minute.  If you do not,
you may want to consider using the binaries instead.

On Mac OS X, libextractor will avoid using GCC 3.1, because of
problems compiling one of the extractors.  GCC 3.3 and 2.95.2 are
known to work well; as such, libextractor will first look for 3.3 (by
attempting to run gcc-3.3, cpp-3.3, and g++-3.3) and then 2.95.2 (by
attempting to run gcc2 and g++2).