METADATA:
Metadata consist of information that characterizes data.
Metadata are used to provide documentation for data products.
In essence, metadata answer who, what, when, where, why, and how about
every facet of the data that are being documented.
METADATA AND PRIVACY:
Metadata within a file can tell a lot about you.
Cameras record data about when a picture was taken and what
camera was used. Office suites and PDF tools automatically add
author and company information to documents and spreadsheets.
Maybe you don't want to disclose that information on the web.
WARNING:
MAT only removes metadata from your files; it does not anonymise their
content, nor can it handle watermarking, steganography, or overly custom
metadata fields or systems.
If you really want to be anonymous, use a format that does not contain any
metadata, or better: use plain text.
DEPENDENCIES:
python2.6 (at least)
python-hachoir-core and python-hachoir-parser
python-pdfrw, python-cairo and python-poppler for full PDF support
shred (should already be installed)
OPTIONAL DEPENDENCIES:
python-mutagen : for massive audio format support
exiftool : for _massive_ image format support
USAGE:
mat --help
or
mat-gui
SUPPORTED FORMATS:
Portable Network Graphics (.png)
support : full
metadata : textual metadata + date
method : removal of harmful fields is done with hachoir
JPEG (.jpeg, .jpg)
support : full
metadata : comment + exif/photoshop/adobe
method : removal of harmful fields is done with hachoir
Open Document (.odt, .odx, .ods, ...)
support : full
metadata : a meta.xml file
method : removal of the meta.xml file
Office Open XML (.docx, .pptx, .xlsx, ...)
support : full
metadata : a docProps folder containing XML metadata files
method : removal of the docProps folder
Portable Document Format (.pdf)
support : full
metadata : a lot
method : rendering of the PDF file on a cairo surface with the help of
poppler in order to remove all the internal metadata.
For now, cairo creates some metadata of its own.
It can be removed if you install either exiftool or python-pdfrw.
The next version of python-cairo will support PDF metadata.
Tape ARchive (.tar, .tar.bz2, .tar.gz)
support : full
metadata : metadata from the archive file itself, metadata from the files
contained in the archive, and metadata added by tar to each file at the
creation of the archive
method : extraction of each file, treatment of the file, addition of the
treated file to a new archive; right before the addition, the metadata added
by tar itself is removed. When the new archive is complete, all its metadata
is removed.
Zip (.zip)
support : partial
metadata : metadata from the archive file itself, metadata from the files
contained in the archive, and metadata added by zip to each file when it is
added to the archive.
method : extraction of each file, treatment of the file, addition of the
treated file to a new archive. When the new archive is complete, all its
metadata is removed.
MPEG Audio (.mp3, .mp2, .mp1)
support : full
metadata : id3
method : removal of harmful fields is done with hachoir
Ogg Vorbis (.ogg)
support : full
metadata : Vorbis
method : removal of harmful fields is done with mutagen
Free Lossless Audio Codec (.flac)
support : full
metadata : Flac, Vorbis
method : removal of harmful fields is done with mutagen
Torrent (.torrent)
support : full
metadata : torrent
method : using the nice bencode lib by Petru Paler,
heavily tuned/rewritten.
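As an illustration of the tar method described above, here is a minimal
sketch using only the tarfile module from the Python standard library.
It is not MAT's own code: the function name scrub_tar is hypothetical, the
choice of fields to reset is an assumption about what "metadata added by tar"
covers, and the per-file treatment step is left as a comment. It is written
for a current Python (the "with" form on tarfile objects is not available
in python2.6).

```python
import io
import tarfile


def scrub_tar(src_bytes):
    """Return a new, uncompressed tar archive with tar-added fields reset.

    Hypothetical helper for illustration; not part of MAT.
    """
    out = io.BytesIO()
    # "r:*" lets tarfile detect .tar, .tar.gz and .tar.bz2 transparently.
    with tarfile.open(fileobj=io.BytesIO(src_bytes), mode="r:*") as tin:
        with tarfile.open(fileobj=out, mode="w") as tout:
            for member in tin.getmembers():
                # Fields that tar records about the file and its owner.
                member.mtime = 0
                member.uid = member.gid = 0
                member.uname = member.gname = ""
                # Drop extended (pax) headers, which may repeat those values.
                member.pax_headers = {}
                # Only regular files carry data; directories and links do not.
                data = tin.extractfile(member) if member.isfile() else None
                # A real cleaner would treat the file content itself here.
                tout.addfile(member, data)
    return out.getvalue()
```

The sketch works on bytes in memory so that it does not touch the original
archive; writing the result to disk and shredding the original are left out.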
HOW TO IMPLEMENT NEW FORMATS:
1. add the format's mimetype to the STRIPPER list in mat.py
2. inherit from the GenericParser class (parser.py)
3. read the parser.py module
4. implement at least these three methods:
- is_clean(self)
- remove_all(self)
- get_meta(self)
5. don't forget to call the do_backup() method when necessary
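The steps above could look roughly like the sketch below. Only the names
GenericParser, is_clean, remove_all, get_meta and do_backup come from this
README; everything else is an assumption made for illustration. In
particular, the GenericParser defined here is a stand-in (the real class
lives in parser.py and its constructor and do_backup behaviour may differ),
and the "#meta:" text format is invented. Registering the mimetype in the
STRIPPER list is not shown, since its layout is not described here.

```python
import shutil


class GenericParser(object):
    """Stand-in for MAT's base class; read parser.py for the real one."""

    def __init__(self, filename):
        self.filename = filename

    def do_backup(self):
        # Assumed behaviour: keep a copy of the file before modifying it.
        shutil.copy2(self.filename, self.filename + ".bak")


class MetaLineStripper(GenericParser):
    """Toy format: text files whose '#meta:key=value' lines are metadata."""

    PREFIX = "#meta:"

    def _lines(self):
        with open(self.filename) as handle:
            return handle.read().splitlines()

    def get_meta(self):
        """Return the metadata found in the file as a dict."""
        meta = {}
        for line in self._lines():
            if line.startswith(self.PREFIX):
                key, _, value = line[len(self.PREFIX):].partition("=")
                meta[key.strip()] = value.strip()
        return meta

    def is_clean(self):
        """Return True when no metadata is left in the file."""
        return not self.get_meta()

    def remove_all(self):
        """Back the file up, then rewrite it without metadata lines."""
        self.do_backup()
        kept = [l for l in self._lines() if not l.startswith(self.PREFIX)]
        with open(self.filename, "w") as handle:
            handle.write("\n".join(kept) + "\n")
        return True
```

Building is_clean on top of get_meta keeps the two methods consistent: a
file is reported clean exactly when get_meta finds nothing.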
HOW TO LAUNCH THE TESTSUITE:
1. cd ./test
2. python2.6 test.py : launch all testsuites
3. python2.6 clitest.py : launch the testsuite for the CLI
4. python2.6 libtest.py : launch the testsuite for the mat internal library
ALTERNATIVES AND COMPLEMENTS:
for images:
exiftool (perl) : metadata manipulation
exiv2 (C++) : metadata manipulation
graphicsmagick (a fork from imagemagick) : cli image manipulation
for PDF:
pdfminer (python) : PDF manipulation
other tools:
a hexadecimal editor
NOTES:
Formats that are not in the test suite are not well tested;
please don't trust MAT with them!
LICENSE:
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
MA 02110-1301, USA.
THANKS:
MAT would not exist without:
- the Google Summer of Code,
- the Python language,
- the amazing (and messy) hachoir library,
- poppler and cairo's python bindings,
- the mutagen library,
- and people on #tails@freenode.
Many thanks to them!
KNOWN BUGS:
Zip files are not totally cleaned, I know.
I am working on a patch for zipfile.py.