File: WORK-IN-PROGRESS

exiv2 0.27.3-3
T A B L E   o f   C O N T E N T S
---------------------------------

1   Building Adobe XMPsdk and Samples in Terminal with the ./Generate_XXX_mac.sh scripts
1.1 Amazing Discovery 1    DumpFile is linked to libstdc++.6.dylib
1.2 Amazing Discovery 2    Millions of "weak symbol/visibility" messages

4   Build design for v0.26.1
4.8 Support for MinGW

5   Refactoring the Tiff Code
5.1 Background
5.2 How does Exiv2 decode the ExifData in a JPEG?
5.3 How is metadata organized in Exiv2?
5.4 Where are the tags defined?
5.5 How do the MakerNotes get decoded?
5.6 How do the encoders work?

6   Using external XMP SDK via Conan

==========================================================================

4   Build design for v0.26.1

Added   : 2017-08-18
Modified: 2017-08-23

    The purpose of v0.26.1 is to release bug fixes and
    experimental new features which may become defaults with v0.27.

4.8 Support for MinGW
    MinGW msys/1.0 was deprecated when v0.26 was released.
    No support for MinGW msys/1.0 will be provided.
    It's very likely that Exiv2 will still build with MinGW msys/1.0.
    I will not provide any user support for MinGW msys/1.0 in future.

    MinGW msys/2.0 might be supported as "experimental" in Exiv2 v0.26.2


==========================================================================

5   Refactoring the Tiff Code

Added   : 2017-09-24
Modified: 2017-09-24

5.1 Background
    Tiff parsing is the root code of a metadata engine.

    The Tiff parsing code in Exiv2 is very difficult to understand and has major architectural shortcomings:

    1) It requires the Tiff file to be totally in memory
    2) It cannot handle BigTiff
    3) The parser doesn't know the source of the in-memory tiff image
    4) It uses memory mapping on the tiff file
       - if the network connection is lost, horrible things happen
       - it requires a lot of VM to map the complete file
       - BigTiff files can be 100GB+
       - The memory mapping causes problems with Virus Detection software on Windows
    5) The parser cannot deal with multi-page tiff files
    6) It requires the total file to be in contiguous memory and defeats 'webready'.
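
    As an illustration of points 1), 2) and 6), here is a minimal sketch (not Exiv2 code) showing
    that a parser can tell classic Tiff from BigTiff by looking at only the first 4 bytes of the
    stream: the byte-order mark ("II" or "MM") followed by the magic number 42 (Tiff) or 43
    (BigTiff).  The names TiffKind and classifyTiffHeader are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Result of inspecting the first bytes of a stream (hypothetical type).
enum class TiffKind { NotTiff, ClassicTiff, BigTiff };

// Read a 16-bit value using the byte order declared in the header.
static std::uint16_t readU16(const unsigned char* p, bool littleEndian) {
    return littleEndian ? static_cast<std::uint16_t>(p[0] | (p[1] << 8))
                        : static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Classify a header using only its first 4 bytes.
TiffKind classifyTiffHeader(const unsigned char* data, std::size_t size) {
    if (data == nullptr || size < 4) return TiffKind::NotTiff;

    bool littleEndian;
    if (data[0] == 'I' && data[1] == 'I') {
        littleEndian = true;       // Intel byte order
    } else if (data[0] == 'M' && data[1] == 'M') {
        littleEndian = false;      // Motorola byte order
    } else {
        return TiffKind::NotTiff;
    }

    const std::uint16_t magic = readU16(data + 2, littleEndian);
    if (magic == 42) return TiffKind::ClassicTiff;  // 32-bit offsets
    if (magic == 43) return TiffKind::BigTiff;      // 64-bit offsets
    return TiffKind::NotTiff;
}
```

    A reader built this way could fetch the header, then each directory, in small pieces
    instead of mapping the complete file.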

    The Tiff parsing code in Exiv2 is ingenious.  It's also very robust.  It works well.  It can:

    1) Handle 32-bit Tiff and many Raw formats (which are derived from Tiff)
    2) Read and write Manufacturer's MakerNotes which are (mostly) in Tiff format
    3) It probably has other great features that I haven't discovered
       - because the code is so hard to understand, I can't simply browse and read it.
    4) It separates file navigation from data analysis.

    The code in image::printStructure was originally written to understand "what is a tiff?"
    It has problems:
    1) It was intended to be a single threaded debugging function and has security issues.
    2) It doesn't handle BigTiff
    3) It's messy.  It's reading and processing metadata simultaneously.

    The aim of this project is to
    1) Reconsider the Tiff Code.
    2) Keep everything good in the code and address known deficiencies
    3) Establish a Team Exiv2 "Tiff Expert" who knows the code intimately.

5.2 How does Exiv2 decode the ExifData in a JPEG?
    You can get my test file from http://clanmills.com/Stonehenge.jpg

    808 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/build $ exiv2 -pS ~/Stonehenge.jpg
        STRUCTURE OF JPEG FILE: /Users/rmills/Stonehenge.jpg
         address | marker       |  length | data
               0 | 0xffd8 SOI
               2 | 0xffe1 APP1  |   15288 | Exif..II*......................
           15292 | 0xffe1 APP1  |    2610 | http://ns.adobe.com/xap/1.0/.<?x
           17904 | 0xffed APP13 |      96 | Photoshop 3.0.8BIM.......'.....
           18002 | 0xffe2 APP2  |    4094 | MPF.II*...............0100.....
           22098 | 0xffdb DQT   |     132
           22232 | 0xffc0 SOF0  |      17
           22251 | 0xffc4 DHT   |     418
           22671 | 0xffda SOS
        809 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/build $

    Exiv2 calls JpegBase::readMetadata which locates the APP1/Exif segment.
    It invokes the ExifParser:
       ExifParser::decode(exifData_, rawExif.pData_, rawExif.size_);
    This is a thin wrapper over:
       TiffParserWorker::decode(....) in tiffimage.cpp
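
    The "locate the APP1/Exif segment" step can be sketched as follows.  This is not the code in
    JpegBase::readMetadata; it is a simplified walk over an in-memory buffer which assumes every
    segment before SOS carries a 2-byte big-endian length (as in the -pS listing above).
    ExifLocation and findExifSegment are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Where the embedded TIFF data lives inside the JPEG (hypothetical type).
struct ExifLocation {
    bool found;
    std::size_t offset;  // offset of the TIFF header ("II" or "MM")
    std::size_t size;    // number of TIFF bytes
};

ExifLocation findExifSegment(const std::vector<unsigned char>& jpeg) {
    const ExifLocation none = {false, 0, 0};
    // A JPEG starts with the SOI marker 0xffd8.
    if (jpeg.size() < 4 || jpeg[0] != 0xff || jpeg[1] != 0xd8) return none;

    std::size_t pos = 2;
    while (pos + 4 <= jpeg.size()) {
        if (jpeg[pos] != 0xff) return none;         // lost sync
        const unsigned char marker = jpeg[pos + 1];
        if (marker == 0xda) return none;            // SOS: image data follows
        // The length counts its own 2 bytes but not the 2 marker bytes.
        const std::size_t length =
            (static_cast<std::size_t>(jpeg[pos + 2]) << 8) | jpeg[pos + 3];
        if (length < 2 || pos + 2 + length > jpeg.size()) return none;

        const unsigned char* payload = jpeg.data() + pos + 4;
        if (marker == 0xe1 && length >= 8 &&
            std::memcmp(payload, "Exif\0\0", 6) == 0) {
            // Skip marker (2), length (2) and the "Exif\0\0" identifier (6).
            const ExifLocation hit = {true, pos + 10, length - 8};
            return hit;
        }
        pos += 2 + length;                          // next segment
    }
    return none;
}
```

    The offset and size found this way correspond to the pData_/size_ pair handed to
    ExifParser::decode above.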

    What happens then?  I don't know.  The metadata is decoded in:
       tiffvisitor.cpp TiffDecoder::visitEntry()

    The design of the TiffMumble classes is the "Visitor" pattern
    described in "Design Patterns" by Gamma, Helm, Johnson and Vlissides
    (Addison-Wesley).  The aim of the pattern is to separate parsing from
    dealing with the data.

    The data is being stored in ExifData which is a vector.
    Order is important and preserved.
    As the data values are recovered they are stored as Exifdatum in the vector.

    How does the tiff visitor work?  I think the reader and processor
    are connected by this line in TiffParserWorker::parse():
        rootDir->accept(reader);

    The class tree for the decoder is:

    class TiffDecoder : public TiffFinder {
      class TiffReader ,
      class TiffFinder : public TiffVisitor {
        class TiffVisitor {
          public:
          //! Events for the stop/go flag. See setGo().
          enum GoEvent {
              geTraverse       = 0,
              geKnownMakernote = 1
          };

          void setGo(GoEvent event, bool go);
          virtual void visitEntry(TiffEntry* object) =0;
          virtual void visitDataEntry(TiffDataEntry* object) =0;
          virtual void visitImageEntry(TiffImageEntry* object) =0;
          virtual void visitSizeEntry(TiffSizeEntry* object) =0;
          virtual void visitDirectory(TiffDirectory* object) =0;
          virtual void visitSubIfd(TiffSubIfd* object) =0;
          virtual void visitMnEntry(TiffMnEntry* object) =0;
          virtual void visitIfdMakernote(TiffIfdMakernote* object) =0;
          virtual void visitIfdMakernoteEnd(TiffIfdMakernote* object);
          virtual void visitBinaryArray(TiffBinaryArray* object) =0;
          virtual void visitBinaryArrayEnd(TiffBinaryArray* object);
          //! Operation to perform for an element of a binary array
          virtual void visitBinaryElement(TiffBinaryElement* object) =0;

          //! Check if stop flag for \em event is clear, return true if it's clear.
          bool go(GoEvent event) const;
        }
      }
    }

    The reader works by stepping along the Tiff directory, calling the visitor's
    "callbacks" as it reads.
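
    To make the accept()/visitXXX() mechanism concrete, here is a stripped-down sketch of the
    Visitor pattern with just two node types.  It is not the Exiv2 class hierarchy: Component,
    Entry, Directory, Visitor and NameCollector are hypothetical stand-ins for TiffComponent,
    TiffEntry, TiffDirectory, TiffVisitor and a concrete visitor such as TiffDecoder.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Entry;
class Directory;

// The visitor interface: one callback per node type.
class Visitor {
public:
    virtual ~Visitor() {}
    virtual void visitEntry(Entry& entry) = 0;
    virtual void visitDirectory(Directory& dir) = 0;
};

// Base class of every node in the tree.
class Component {
public:
    virtual ~Component() {}
    virtual void accept(Visitor& visitor) = 0;
};

// A leaf: calls back into the visitor with its own type.
class Entry : public Component {
public:
    explicit Entry(const std::string& name) : name_(name) {}
    void accept(Visitor& visitor) override { visitor.visitEntry(*this); }
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// A composite: visits itself, then forwards the visitor to its children.
class Directory : public Component {
public:
    void add(std::unique_ptr<Component> child) { children_.push_back(std::move(child)); }
    void accept(Visitor& visitor) override {
        visitor.visitDirectory(*this);
        for (auto& child : children_) child->accept(visitor);
    }
private:
    std::vector<std::unique_ptr<Component>> children_;
};

// A concrete visitor that records entry names in traversal order.
class NameCollector : public Visitor {
public:
    void visitEntry(Entry& entry) override { names.push_back(entry.name()); }
    void visitDirectory(Directory&) override {}
    std::vector<std::string> names;
};
```

    The tree knows how to navigate itself; the visitor decides what to do at each node.  A
    different visitor (a printer, an encoder) can reuse the same traversal unchanged.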

    There are 2000 lines of code in tiffcomposite.cpp and, to be honest,
    I don't know what most of it does!

    Set a breakpoint in src/exif.cpp#571.
    That’s where he adds the key/value to the exifData vector.
    Exactly how did he get here?  That’s a puzzle.

    void ExifData::add(const ExifKey& key, const Value* pValue)
    {
        add(Exifdatum(key, pValue));
    }

5.3 How is metadata organized in Exiv2?
    section.group.tag

    section: Exif | Iptc | Xmp
    group:   Photo | Image | MakerNote | Nikon3 ....
    tag: YResolution etc ...
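
    The section.group.tag convention can be illustrated with a small helper that splits a key
    such as "Exif.Image.Make" at its first two dots.  This is only a sketch and not how ExifKey
    parses keys; KeyParts and splitKey are hypothetical names, and anything after the second dot
    is treated as the tag.

```cpp
#include <string>

// The three components of a metadata key (hypothetical type).
struct KeyParts {
    bool ok;
    std::string section;  // Exif, Iptc or Xmp
    std::string group;    // Image, Photo, Nikon3, ...
    std::string tag;      // Make, YResolution, ...
};

KeyParts splitKey(const std::string& key) {
    KeyParts parts = {false, "", "", ""};

    const std::string::size_type first = key.find('.');
    if (first == std::string::npos) return parts;
    const std::string::size_type second = key.find('.', first + 1);
    if (second == std::string::npos) return parts;

    parts.section = key.substr(0, first);
    parts.group = key.substr(first + 1, second - first - 1);
    parts.tag = key.substr(second + 1);  // remainder of the key
    parts.ok = !parts.section.empty() && !parts.group.empty() && !parts.tag.empty();
    return parts;
}
```

    This mirrors what the cut -d. pipelines below do from the shell.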

    820 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $ exiv2 -pa ~/Stonehenge.jpg | cut -d' ' -f 1 | cut -d. -f 1 | sort | uniq
    Exif
    Iptc
    Xmp

    821 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $ exiv2 -pa --grep Exif ~/Stonehenge.jpg  | cut -d'.' -f 2 | sort | uniq
    GPSInfo
    Image
    Iop
    MakerNote
    Nikon3
    NikonAf2
    NikonCb2b
    NikonFi
    NikonIi
    NikonLd3
    NikonMe
    NikonPc
    NikonVr
    NikonWt
    Photo
    Thumbnail

    822 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $ exiv2 -pa --grep Exif ~/Stonehenge.jpg  | cut -d'.' -f 3 | cut -d' ' -f 1 | sort | uniq
    AFAperture
    AFAreaHeight
    AFAreaMode
    ...
    XResolution
    YCbCrPositioning
    YResolution
    823 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $

    The data in IFD0 is Exif.Image:

    826 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $ exiv2 -pR ~/Stonehenge.jpg  | head -20
    STRUCTURE OF JPEG FILE: /Users/rmills/Stonehenge.jpg
     address | marker       |  length | data
           0 | 0xffd8 SOI
           2 | 0xffe1 APP1  |   15288 | Exif..II*......................
      STRUCTURE OF TIFF FILE (II): MemIo
       address |    tag                              |      type |    count |    offset | value
            10 | 0x010f Make                         |     ASCII |       18 |       146 | NIKON CORPORATION
            22 | 0x0110 Model                        |     ASCII |       12 |       164 | NIKON D5300
            34 | 0x0112 Orientation                  |     SHORT |        1 |           | 1
            46 | 0x011a XResolution                  |  RATIONAL |        1 |       176 | 300/1
            58 | 0x011b YResolution                  |  RATIONAL |        1 |       184 | 300/1
            70 | 0x0128 ResolutionUnit               |     SHORT |        1 |           | 2
            82 | 0x0131 Software                     |     ASCII |       10 |       192 | Ver.1.00
            94 | 0x0132 DateTime                     |     ASCII |       20 |       202 | 2015:07:16 20:25:28
           106 | 0x0213 YCbCrPositioning             |     SHORT |        1 |           | 1
           118 | 0x8769 ExifTag                      |      LONG |        1 |           | 222
        STRUCTURE OF TIFF FILE (II): MemIo
         address |    tag                              |      type |    count |    offset | value
             224 | 0x829a ExposureTime                 |  RATIONAL |        1 |       732 | 10/4000
             236 | 0x829d FNumber                      |  RATIONAL |        1 |       740 | 100/10
    827 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $ exiv2 -pa --grep Image ~/Stonehenge.jpg
    Exif.Image.Make                              Ascii      18  NIKON CORPORATION
    Exif.Image.Model                             Ascii      12  NIKON D5300
    Exif.Image.Orientation                       Short       1  top, left
    Exif.Image.XResolution                       Rational    1  300
    Exif.Image.YResolution                       Rational    1  300
    Exif.Image.ResolutionUnit                    Short       1  inch
    Exif.Image.Software                          Ascii      10  Ver.1.00
    Exif.Image.DateTime                          Ascii      20  2015:07:16 20:25:28
    Exif.Image.YCbCrPositioning                  Short       1  Centered
    Exif.Image.ExifTag                           Long        1  222
    Exif.Nikon3.ImageBoundary                    Short       4  0 0 6000 4000
    Exif.Nikon3.ImageDataSize                    Long        1  6173648
    Exif.NikonAf2.AFImageWidth                   Short       1  0
    Exif.NikonAf2.AFImageHeight                  Short       1  0
    Exif.Photo.ImageUniqueID                     Ascii      33  090caaf2c085f3e102513b24750041aa
    Exif.Image.GPSTag                            Long        1  4060
    828 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $
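
    Each row of the -pR listing above corresponds to one 12-byte directory entry in the Tiff
    structure: tag (2 bytes), type (2 bytes), count (4 bytes) and a 4-byte field that holds
    either the value itself (when it fits in 4 bytes) or the offset of the value.  The sketch
    below decodes one such entry; it is not Exiv2 code, and IfdEntry, readIfdEntry, typeSize and
    isInline are hypothetical names.

```cpp
#include <cstdint>

// One 12-byte entry of a classic (32-bit) Tiff directory.
struct IfdEntry {
    std::uint16_t tag;
    std::uint16_t type;
    std::uint32_t count;
    std::uint32_t valueOrOffset;  // raw 4-byte field read as one 32-bit number
};

static std::uint16_t getU16(const unsigned char* p, bool le) {
    return le ? static_cast<std::uint16_t>(p[0] | (p[1] << 8))
              : static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

static std::uint32_t getU32(const unsigned char* p, bool le) {
    const std::uint32_t b0 = p[0], b1 = p[1], b2 = p[2], b3 = p[3];
    return le ? (b0 | (b1 << 8) | (b2 << 16) | (b3 << 24))
              : ((b0 << 24) | (b1 << 16) | (b2 << 8) | b3);
}

// Decode the 12 bytes starting at p.
IfdEntry readIfdEntry(const unsigned char* p, bool littleEndian) {
    IfdEntry e;
    e.tag = getU16(p, littleEndian);
    e.type = getU16(p + 2, littleEndian);
    e.count = getU32(p + 4, littleEndian);
    e.valueOrOffset = getU32(p + 8, littleEndian);
    return e;
}

// Size in bytes of one element of each Tiff type; 0 for unknown types.
std::uint32_t typeSize(std::uint16_t type) {
    switch (type) {
        case 1: case 2: case 6: case 7: return 1;  // BYTE, ASCII, SBYTE, UNDEFINED
        case 3: case 8:                 return 2;  // SHORT, SSHORT
        case 4: case 9: case 11:        return 4;  // LONG, SLONG, FLOAT
        case 5: case 10: case 12:       return 8;  // RATIONAL, SRATIONAL, DOUBLE
        default:                        return 0;
    }
}

// True when the value is stored in the entry itself rather than at an offset.
bool isInline(const IfdEntry& e) {
    const std::uint32_t size = typeSize(e.type);
    return size != 0 && static_cast<std::uint64_t>(size) * e.count <= 4;
}
```

    This is why Make (ASCII, count 18) shows an offset of 146 in the listing while Orientation
    (SHORT, count 1) shows none.  Note that for big-endian files an inline SHORT occupies the
    first two bytes of the field, so valueOrOffset would need to be re-read as a 16-bit value.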

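    The data in the Exif IFD (the nested TIFF structure that ExifTag 0x8769 points to) is Exif.Photo.
    IFD1, when present, is Exif.Thumbnail.

    The data in the MakerNote is another embedded TIFF (which contains more embedded TIFFs)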

    829 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $ exiv2 -pa --grep MakerNote ~/Stonehenge.jpg
    Exif.Photo.MakerNote                         Undefined 3152  (Binary value suppressed)
    Exif.MakerNote.Offset                        Long        1  914
    Exif.MakerNote.ByteOrder                     Ascii       3  II
    830 rmills@rmillsmbp:~/gnu/github/exiv2/exiv2/src $

    The MakerNote is decoded into:

    Exif.Nikon3, Exif.NikonAf2 and so on.  I don't know exactly how it achieves this.
    However it means that tag-numbers can be reused in different IFDs.
    Tag 0x0016 = Nikon GPSSpeed and can mean something different elsewhere.

5.4 Where are the tags defined?

    There's an array of "TagInfo" data structures in each of the makernote decoders.
    These define the tag (a number) and the tag name, the groupID (eg canonId) and the default type.
    There's also a callback to print the value of the tag.  This does the "interpretation"
    that is performed by the -pt option of the exiv2 command-line program.

    TagInfo(0x4001, "ColorData", N_("Color Data"), N_("Color data"), canonId, makerTags, unsignedShort, -1, printValue),
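
    The idea of a tag table with a per-tag print callback, looked up by group and tag number,
    can be sketched like this.  It is a simplified illustration and not the real TagInfo
    structure: DemoTagInfo, findTag and describe are hypothetical, and the "DemoMaker" row is
    invented to show the same tag number meaning something else in another group.  The
    interpreted strings for Orientation and ResolutionUnit are taken from the -pa output in 5.3.

```cpp
#include <cstdint>
#include <string>

// Callback that turns a raw value into a human-readable string.
typedef std::string (*PrintFn)(std::uint32_t value);

// Simplified stand-in for a tag table row (hypothetical type).
struct DemoTagInfo {
    const char* group;
    std::uint16_t tag;
    const char* name;
    PrintFn print;
};

std::string printRaw(std::uint32_t v) { return std::to_string(v); }
std::string printOrientation(std::uint32_t v) { return v == 1 ? "top, left" : printRaw(v); }
std::string printResolutionUnit(std::uint32_t v) { return v == 2 ? "inch" : printRaw(v); }

static const DemoTagInfo demoTags[] = {
    {"Image", 0x0112, "Orientation", printOrientation},
    {"Image", 0x0128, "ResolutionUnit", printResolutionUnit},
    {"DemoMaker", 0x0112, "DemoSetting", printRaw},  // hypothetical: same number, other group
};

// A tag number is only meaningful together with its group.
const DemoTagInfo* findTag(const std::string& group, std::uint16_t tag) {
    for (const DemoTagInfo& info : demoTags) {
        if (group == info.group && tag == info.tag) return &info;
    }
    return nullptr;
}

// Interpret a raw value through the tag's print callback.
std::string describe(const std::string& group, std::uint16_t tag, std::uint32_t value) {
    const DemoTagInfo* info = findTag(group, tag);
    if (info == nullptr) return "unknown";
    return std::string(info->name) + ": " + info->print(value);
}
```

    Keeping the callback in the table means the decoder never needs to know how a particular
    tag should be presented.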

5.5 How do the MakerNotes get decoded?

    I don't know.  It has something to do with this code in tiffcomposite.cpp#936

    TiffMnEntry::doAccept(TiffVisitor& visitor) { ... }

    Most makernotes are TIFF structures.  So the TiffXXX classes are invoked recursively to decode the maker note.

#0	0x000000010058b4b0 in Exiv2::Internal::TiffDirectory::doAccept(Exiv2::Internal::TiffVisitor&) at /Users/rmills/gnu/github/exiv2/exiv2/src/tiffcomposite.cpp:916
    This function iterates over the array of entries

#1	0x000000010058b3c6 in Exiv2::Internal::TiffComponent::accept(Exiv2::Internal::TiffVisitor&) at /Users/rmills/gnu/github/exiv2/exiv2/src/tiffcomposite.cpp:891
#2	0x00000001005b5357 in Exiv2::Internal::TiffParserWorker::parse(unsigned char const*, unsigned int, unsigned int, Exiv2::Internal::TiffHeaderBase*) at /Users/rmills/gnu/github/exiv2/exiv2/src/tiffimage.cpp:2006
    This function creates an array of TiffEntries

#3	0x00000001005a2a60 in Exiv2::Internal::TiffParserWorker::decode(Exiv2::ExifData&, Exiv2::IptcData&, Exiv2::XmpData&, unsigned char const*, unsigned int, unsigned int, void (Exiv2::Internal::TiffDecoder::* (*)(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned int, Exiv2::Internal::IfdId))(Exiv2::Internal::TiffEntryBase const*), Exiv2::Internal::TiffHeaderBase*) at /Users/rmills/gnu/github/exiv2/exiv2/src/tiffimage.cpp:1900
#4	0x00000001005a1ae9 in Exiv2::TiffParser::decode(Exiv2::ExifData&, Exiv2::IptcData&, Exiv2::XmpData&, unsigned char const*, unsigned int) at /Users/rmills/gnu/github/exiv2/exiv2/src/tiffimage.cpp:260
#5	0x000000010044d956 in Exiv2::ExifParser::decode(Exiv2::ExifData&, unsigned char const*, unsigned int) at /Users/rmills/gnu/github/exiv2/exiv2/src/exif.cpp:625
#6	0x0000000100498fd7 in Exiv2::JpegBase::readMetadata() at /Users/rmills/gnu/github/exiv2/exiv2/src/jpgimage.cpp:386
#7	0x000000010000bc59 in Action::Print::printList() at /Users/rmills/gnu/github/exiv2/exiv2/src/actions.cpp:530
#8	0x0000000100005835 in Action::Print::run(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) at /Users/rmills/gnu/github/exiv2/exiv2/src/actions.cpp:245


5.6 How do the encoders work?

    I understand writeMetadata() and will document that soon.
    I still have to study how the TiffVisitor writes metadata.


6   Using external XMP SDK via Conan

Section 1 describes how to compile the newer versions of XMP SDK with a bash script. This
approach had a few limitations:

    1) We had to include sources from other projects into the Exiv2 repository: Check the folder
    xmpsdk/third-party.
    2) Different scripts for compiling XMP SDK on Linux, Mac OSX and Windows.
    3) Lots of configuration/compilation issues depending on the system configuration.

Given the big effort we have made over the last few months to migrate the handling of
3rd party dependencies to Conan, we have decided to do the same here. A Conan recipe
has been written for XmpSdk at:

https://github.com/piponazo/conan-xmpsdk

The recipe and package binaries can be found in piponazo's Bintray repository:

https://bintray.com/piponazo/piponazo

This conan recipe provides a custom CMake finder that will be used by our CMake code to properly
find XMP SDK in the conan cache and then be able to use the CMake variables: ${XMPSDK_LIBRARY} and
${XMPSDK_INCLUDE_DIR}.

These are the steps you will need to follow to configure the project with the external XMP support:

    # Add the conan-piponazo remote to your conan configuration (only once)
    conan remote add conan-piponazo https://api.bintray.com/conan/piponazo/piponazo

    mkdir build && cd build

    # Run conan to bring the dependencies. Note that the XMPSDK is not enabled by default and you will
    # need to enable the xmp option to bring it.
    conan install .. --options xmp=True

    # Configure the project with support for the external XMP version. Disable the normal XMP version
    cmake -DCMAKE_BUILD_TYPE=Release -DEXIV2_ENABLE_XMP=OFF -DEXIV2_ENABLE_EXTERNAL_XMP=ON -DBUILD_SHARED_LIBS=ON ..

Note that the usage of the newer versions of XMP is experimental and it was included in Exiv2
because a few users have requested it.