File: tutorial.rst

package info (click to toggle)
python-ebooklib 0.19-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid, trixie
  • size: 648 kB
  • sloc: python: 1,800; makefile: 132; sh: 52
file content (322 lines) | stat: -rw-r--r-- 10,121 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
Tutorial
========

Introduction
------------

Ebooklib is used to manage EPUB2/EPUB3 files.

Reading EPUB
------------
::

    import ebooklib
    from ebooklib import epub

    book = epub.read_epub('test.epub')

There is a :func:`ebooklib.epub.read_epub` function used for reading EPUB files. It accepts full file path to the
EPUB file as argument. It will return instance of :class:`ebooklib.epub.EpubBook` class.


Metadata
++++++++

Method :meth:`ebooklib.epub.EpubBook.get_metadata` is used for fetching metadata. It accepts 2 arguments. First argument
is name of the namespace ('DC' for Dublin Core and 'OPF' for custom metadata). Second argument is name of the key.
It always returns a list. List will be empty if nothing is defined for that key.

Minimal required metadata from Dublin Core set (for EPUB3) is:

* DC:identifier
* DC:title
* DC:language

'DC' namespace is used when accessing Dublin Core metadata.

::

    >>> book.get_metadata('DC', 'title')
    [('Ratio', {})]

    >>> book.get_metadata('DC', 'creator')
    [('Firstname Lastname ', {})]

    >>> book.get_metadata('DC', 'identifier')
    [('9781416566120', {'id': 'isbn_9781416566120'})]


Optional metadata from the Dublin Core set is:

* DC:creator
* DC:contributor
* DC:publisher
* DC:rights
* DC:coverage
* DC:date
* DC:description

This is how Dublin Core metadata is defined inside of content.opf file.

::

    <dc:language>en</dc:language>
    <dc:identifier id="isbn_9781416566120">9781416566120</dc:identifier>

You can also have custom metadata. For instance this is how custom metadata is defined in content.opf file.
You can define same key more then once.
2
::

    <meta content="my-cover-image" name="cover"/>
    <meta content="cover-image" name="cover"/>

When accessing custom metadata you will use namespace 'OPF'. Notice you will get more then one result now.

::

    >>> book.get_metadata('OPF', 'cover')
    [(None, {'content': 'my-cover-image', 'name': 'cover'}),
     (None, {'content': 'cover-image', 'name': 'cover'})]

Check the official documentation for more info:

* http://www.idpf.org/epub/30/spec/epub30-publications.html#sec-opf-dcmes-optional.
* http://www.idpf.org/epub/30/spec/epub30-publications.html
* http://dublincore.org/documents/dces/


Items
+++++

All of the resources (style sheets, images, videos, sounds, scripts and html files) are items.


::

    images = book.get_items_of_type(ebooklib.ITEM_IMAGE)

Fetch items by their type with :meth:`ebooklib.epub.EpubBook.get_items_of_type`.

Here is a list of current item types you can use:

* ITEM_UNKNOWN
* ITEM_IMAGE
* ITEM_STYLE
* ITEM_SCRIPT
* ITEM_NAVIGATION
* ITEM_VECTOR
* ITEM_FONT
* ITEM_VIDEO
* ITEM_AUDIO
* ITEM_DOCUMENT
* ITEM_COVER
* ITEM_SMIL

::

    cover_image = book.get_item_with_id('cover-image')

Fetch items by their id (if you know it) with :meth:`ebooklib.epub.EpubBook.get_item_with_id`.

::

    index = book.get_item_with_href('index.xhtml')

Fetch them by their filename with :meth:`ebooklib.epub.EpubBook.get_item_with_href`.

::

    items = book.get_items_of_media_type('image/png')

Fetch them by their media type with :meth:`ebooklib.epub.EpubBook.get_items_of_type`.

::

    all_items = book.get_items()

Return all of the items with :meth:`ebooklib.epub.EpubBook.get_items`. This is what you are going to use most
of the times when handling unknown EPUB files.

**Important to remember!** Methods *get_item_with_id*, *get_item_with_href* will
return item object. Methods *get_items_of_type*, *get_items_of_type* and *get_items* will return iterator (and not list).

To get a content from existing item (regarding if it is image, style sheet or html file) you use :meth:`ebooklib.epub.EpubItem.get_content`.
For HTML items you also have :meth:`ebooklib.epub.EpubHtml.get_body_content`. What is the difference? Get_content always
return entire content of the file while get_body_content only returns whatever is in the <body> part of the HTML document.

::

    for item in book.get_items():
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            print('==================================')
            print('NAME : ', item.get_name())
            print('----------------------------------')
            print(item.get_content())
            print('==================================')


Creating EPUB
-------------

::

    from ebooklib import epub

    book = epub.EpubBook()

EPUB has some minimal metadata requirements which you need to fulfil. You need to define unique identifier, title of the book
and language used inside. When it comes to language code recommended best practice is to use a controlled vocabulary such as RFC 4646
- http://www.ietf.org/rfc/rfc4646.txt.

::

    book.set_identifier('sample123456')
    book.set_title('Sample book')
    book.set_language('en')

    book.add_author('Aleksandar Erkalovic')

You can also add custom metadata. First one is from the Dublin Core namespace and second one is purely custom.

::

    book.add_metadata('DC', 'description', 'This is description for my book')
    book.add_metadata(None, 'meta', '', {'name': 'key', 'content': 'value'})

This is how our custom metadata will end up in the *content.opf* file.

::

    <dc:description>This is description for my book</dc:description>
    <meta content="value" name="key"></meta>

Chapters are represented by :class:`ebooklib.epub.EpubHtml`. You must define the *file_name* and *title*. In our case
title is going to be used when generating Table of Contents.

When defining content you can define it as valid HTML file or just parts of HTML elements you have as a content. It will
ignore whatever you have in <head> element.

::

    # intro chapter
    c1 = epub.EpubHtml(title='Introduction',
                       file_name='intro.xhtml',
                       lang='en')
    c1.set_content(u'<html><body><h1>Introduction</h1><p>Introduction paragraph.</p></body></html>')

    # about chapter
    c2 = epub.EpubHtml(title='About this book',
                       file_name='about.xhtml')
    c2.set_content('<h1>About this book</h1><p>This is a book.</p>')

Do some basic debugging to see what kind of content will end up in the book. In this case we have inserted title
of the chapter and language definition. It would also add links to the style sheet files if we have attached them
to this chapter.

::

    >>> print(c1.get_content())
    b'<?xml version=\'1.0\' encoding=\'utf-8\'?>\n<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/#"
    lang="en" xml:lang="en">\n  <head>\n    <title>Introduction</title>\n  </head>\n  <body>\n    <h1>Introduction</h1>\n
    <p>Introduction paragraph.</p>\n  </body>\n</html>\n'

Any kind of item (style sheet, image, HTML file) must be added to the book.

::

    book.add_item(c1)
    book.add_item(c2)


You can add any kind of file to the book. For instance, in this case we are adding style sheet file. We define
filename, unique id, media_type and content for it. Just like the chapter files you need to add it to the book.
Style sheet files could also be added to the chapter. In that case links would be automatically added to the
chapter HTML.

::

    style = 'body { font-family: Times, Times New Roman, serif; }'

    nav_css = epub.EpubItem(uid="style_nav",
                            file_name="style/nav.css",
                            media_type="text/css",
                            content=style)
    book.add_item(nav_css)


Table of the contents must be defined manually. ToC is a tuple/list of elements. You can either define link manually
with :class:`ebooklib.epub.Link` or just insert item object inside. When you manually insert you can define different
title in the ToC than in the chapter. If you just insert item object it will use whatever title you defined for
that item when creating it.

Sections are just tuple with two values. First one is title of the section and 2nd is tuple/list with subchapters.

::

    book.toc = (epub.Link('intro.xhtml', 'Introduction', 'intro'),
                  (
                    epub.Section('Languages'),
                    (c1, c2)
                  )
                )

So as the Spine. You can use unique id for the item or just add instance of it to the spine.

::

    book.spine = ['nav', c1, c2]


At the end we need to add NCX and Navigation tile. They will not be added automatically.

::

    book.add_item(epub.EpubNcx())
    book.add_item(epub.EpubNav())


At the end write down your book. You need to specify full path to the book, you can not write it down to the
File Object or something else. At the moment exception will not be raised if something goes wrong. In the future
version this will be changed. Right now you need to check the return value of the function or control 
it with the option *raise_exceptions*.

::

    epub.write_epub('test.epub', book)

It also accepts some of the options.

=================   ====================================
Option              Default value
=================   ====================================
epub2_guide         True
epub3_landmark      True
epub3_pages         True
ignore_ncx          False
landmark_title      "Guide"
pages_title         "Pages"
spine_direction     True
package_direction   False
play_order          {'enabled': False, 'start_from': 1}
raise_exceptions    False
compresslevel       6
=================   ====================================

In the future version default value for ignore_ncx will be changed. According to the documentation default
behaviour should be "EPUB 3 Reading Systems must ignore the NCX in favor of the EPUB Navigation Document".
Because we have been doing wrong this all time we will keep the default behavior to prepare for the change.

compresslevel goes from 0 to 9, where 0 is no compression.

Example when overriding default options:

::

    epub.write_epub('test.epub', book, {"epub3_pages": False})


Samples
-------
Further examples are available in https://github.com/aerkalov/ebooklib/tree/master/samples