File: README

mat 0.3.2-1
METADATA:
    Metadata consist of information that characterizes data.
    Metadata are used to provide documentation for data products.
    In essence, metadata answer who, what, when, where, why, and how about
    every facet of the data that are being documented.


METADATA AND PRIVACY:
    Metadata within a file can tell a lot about you.
    Cameras record data about when a picture was taken and what
    camera was used. Office applications and PDF tools automatically add
    author and company information to documents and spreadsheets.
    You may not want to disclose this information on the web.


WARNING:
    MAT only removes metadata from your files; it does not anonymise their
    content, nor can it handle watermarking, steganography, or overly custom
    metadata fields or systems.

    If you really want to be anonymous, use a format that does not contain
    any metadata, or better : use plain text.


DEPENDENCIES:
    python2.6 (at least)
    python-hachoir-core and python-hachoir-parser
    python-pdfrw, python-cairo and python-poppler for full PDF support
    shred (should be already installed)


OPTIONAL DEPENDENCIES:
    python-mutagen : for massive audio format support
    exiftool : for _massive_ image format support


USAGE:
        mat --help
    or
        mat-gui


SUPPORTED FORMATS:
    Portable Network Graphics (.png)
        support : full
        metadata : textual metadata + date
        method : removal of harmful fields is done with hachoir


    Jpeg (.jpeg, .jpg)
        support : full
        metadata : comment + exif/photoshop/adobe
        method : removal of harmful fields is done with hachoir


    Open Document (.odt, .odx, .ods, ...)
        support : full
        metadata : a meta.xml file
        method : removal of the meta.xml file
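
        The method above can be sketched with the standard zipfile module.
        This is only an illustration, not MAT's own code: the function name
        strip_members and its parameters are hypothetical, and a real cleaner
        may also need to update other members that refer to meta.xml.

```python
import io
import zipfile


def strip_members(src_bytes, unwanted=("meta.xml",)):
    """Return a copy of a zip container without the unwanted members.

    Hypothetical helper: copies every member except those named in
    `unwanted`, keeping the original order and compression type.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as zin, \
            zipfile.ZipFile(out, "w") as zout:
        for item in zin.infolist():
            if item.filename in unwanted:
                continue  # drop the metadata member
            zout.writestr(item.filename, zin.read(item.filename),
                          compress_type=item.compress_type)
    return out.getvalue()
```

        Passing the member name (rather than the original ZipInfo) to
        writestr() means the copied members get a fresh timestamp instead of
        the one stored in the source document.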


    Office Open XML (.docx, .pptx, .xlsx, ...)
        support : full
        metadata : a docProps folder containing XML metadata files
        method : removal of the docProps folder


    Portable Document Format (.pdf)
        support : full
        metadata : a lot
        method : rendering of the PDF file on a cairo surface with the help of
                poppler in order to remove all the internal metadata.
                For now, cairo creates some metadata of its own.
                It can be removed if you install either exiftool or python-pdfrw.
                The next version of python-cairo will support PDF metadata.


    Tape ARchive (.tar, .tar.bz2, .tar.gz)
        support : full
        metadata : metadata from the file itself, metadata from the files
                contained in the archive, and metadata added by tar to each file
                at the creation of the archive
        method : extraction of each file, treatment of the file, addition of the
            treated file to a new archive; right before the addition, the metadata
            added by tar itself is removed. When the new archive is complete, all
            its metadata is removed.
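
        A minimal sketch of the repacking step, using the standard tarfile
        module. The names scrub and repack are hypothetical, and the choice
        of header fields to reset is an assumption rather than the exact list
        MAT uses; the per-file cleaning step is left out.

```python
import io
import tarfile


def scrub(info):
    """Reset the header fields tar records about the creating machine."""
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    info.mtime = 0
    info.pax_headers = {}  # extended headers can repeat the same fields
    return info


def repack(src_bytes):
    """Copy every member into a new tar archive with scrubbed headers."""
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(src_bytes)) as tin, \
            tarfile.open(fileobj=out, mode="w") as tout:
        for member in tin.getmembers():
            data = tin.extractfile(member) if member.isfile() else None
            # A real tool would clean the file's own metadata here.
            tout.addfile(scrub(member), data)
    return out.getvalue()
```

        The new archive is written uncompressed; a caller wanting .tar.gz or
        .tar.bz2 output would open the destination with mode "w:gz" or
        "w:bz2" instead.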


    Zip (.zip)
        support : partial
        metadata : metadata from the file itself, metadata from the files
                contained in the archive, and metadata added by zip to each file
                when it is added to the archive.
        method : extraction of each file, treatment of the file, addition of the
            treated file to a new archive. When the new archive is complete, all
            its metadata is removed.
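
        The per-entry metadata that zip adds can be illustrated with the
        standard zipfile module. This is a sketch under assumptions, not
        MAT's implementation: repack_zip and FIXED_DATE are hypothetical
        names, and the per-file cleaning step is omitted. Some header fields
        (for instance the "created on" system byte) are still filled in by
        zipfile itself, so the result is not fully neutral.

```python
import io
import zipfile

# Earliest timestamp the zip format can store.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def repack_zip(src_bytes):
    """Copy every entry into a new zip with fresh, neutral headers."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as zin, \
            zipfile.ZipFile(out, "w") as zout:
        for item in zin.infolist():
            # A new ZipInfo starts with an empty comment and extra field.
            clean = zipfile.ZipInfo(item.filename, date_time=FIXED_DATE)
            clean.compress_type = zipfile.ZIP_DEFLATED
            zout.writestr(clean, zin.read(item.filename))
        # The archive-level comment is simply not copied.
    return out.getvalue()
```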


    MPEG Audio (.mp3, .mp2, .mp1)
        support : full
        metadata : id3
        method : removal of harmful fields is done with hachoir


    Ogg Vorbis (.ogg)
        support : full
        metadata : Vorbis
        method : removal of harmful fields is done with mutagen


    Free Lossless Audio Codec (.flac)
        support : full
        metadata : Flac, Vorbis
        method : removal of harmful fields is done with mutagen


    Torrent (.torrent)
        support : full
        metadata : torrent
        method : using the nice bencode lib by Petru Paler,
            heavily tuned/rewritten.


HOW TO IMPLEMENT NEW FORMATS:
    1. add the format's mimetype to the STRIPPER list in mat.py
    2. inherit from the GenericParser class (parser.py)
    3. read the parser.py module
    4. implement at least these three methods:
        - is_clean(self)
        - remove_all(self)
        - get_meta(self)
    5. don't forget to call the do_backup() method when necessary
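
    The steps above could look roughly like the sketch below. Everything in
    it is an assumption made for illustration: the GenericParser shown here
    is a stand-in (the real class in parser.py may take other constructor
    arguments and back up files differently), and CommentStripper is a
    hypothetical parser for a made-up format in which lines starting with
    '#' count as metadata. The return types are guesses as well.

```python
import shutil


class GenericParser(object):
    """Stand-in for the class in parser.py; read the real module first."""

    def __init__(self, filename, backup=True):
        self.filename = filename
        self.backup = backup

    def do_backup(self):
        # Assumed behaviour: keep a copy of the original next to it.
        if self.backup:
            shutil.copy2(self.filename, self.filename + ".bak")


class CommentStripper(GenericParser):
    """Hypothetical parser: lines starting with '#' are the metadata."""

    def _lines(self):
        with open(self.filename) as handle:
            return handle.read().splitlines(True)

    def get_meta(self):
        """Return the metadata found in the file as a dict."""
        found = [l for l in self._lines() if l.startswith("#")]
        return dict(("comment%d" % i, l.strip()) for i, l in enumerate(found))

    def is_clean(self):
        """Return True when no metadata is left in the file."""
        return not self.get_meta()

    def remove_all(self):
        """Back up the file, then rewrite it without its metadata."""
        self.do_backup()
        kept = [l for l in self._lines() if not l.startswith("#")]
        with open(self.filename, "w") as handle:
            handle.writelines(kept)
        return True
```

    The new format's mimetype would still have to be registered in the
    STRIPPER list in mat.py (step 1), which this sketch does not show.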


HOW TO LAUNCH THE TESTSUITE:
    1. cd ./test
    2. python2.6 test.py    : launch all testsuites
    3. python2.6 clitest.py : launch the testsuite for the CLI
    4. python2.6 libtest.py : launch the testsuite for the mat internal library


ALTERNATIVES AND COMPLEMENTS:
for images:
    exiftool (perl) : metadata manipulation
    exiv2 (C++) : metadata manipulation
    graphicsmagick (a fork from imagemagick) : cli image manipulation

for PDF:
    pdfminer (python) : PDF manipulation

other tools:
    a hexadecimal editor


NOTES:
    Formats that are not in the test suite are not well tested;
    please do not trust MAT with them!


LICENSE:
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License version 2 as
    published by the Free Software Foundation.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
    MA 02110-1301, USA.


THANKS:
    Mat would not exist without :
    - the Google Summer of Code,
    - the Python language
    - the amazing (and messy) hachoir library,
    - poppler and cairo's python bindings,
    - and the mutagen library
    - people on #tails@freenode
    many thanks to them !


KNOWN BUGS:
    Zipfiles are not totally cleaned, I know.
    I am working on a patch for zipfile.py