Working with Text
=================

To work effectively with text, it's important to first understand a little
about block-level elements like paragraphs and inline-level objects like
runs.


Block-level vs. inline text objects
-----------------------------------

The paragraph is the primary block-level object in Word.

A block-level item flows the text it contains between its left and right
edges, adding an additional line each time the text extends beyond its right
boundary. For a paragraph, the boundaries are generally the page margins, but
they can also be column boundaries if the page is laid out in columns, or
cell boundaries if the paragraph occurs inside a table cell.

A table is also a block-level object.

An inline object is a portion of the content that occurs inside a block-level
item. An example would be a word that appears in bold or a sentence in
all-caps. The most common inline object is a *run*. All content within
a block container is inside an inline object. Typically, a paragraph
contains one or more runs, each of which contains some part of the
paragraph's text.

The attributes of a block-level item specify its placement on the page, such
as indentation and the space before and after a paragraph. The attributes
of an inline item generally specify the font in which the content appears,
such as typeface, font size, bold, and italic.
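
As a minimal sketch of this structure (the sample text is illustrative only),
the following builds a single paragraph from three runs, each carrying its own
character formatting::

    from docx import Document

    document = Document()

    # add_paragraph() places its text argument in an initial run
    paragraph = document.add_paragraph("Plain text, ")

    # each additional run holds another part of the paragraph's text
    bold_run = paragraph.add_run("bold text, ")
    bold_run.bold = True
    caps_run = paragraph.add_run("and all-caps text.")
    caps_run.font.all_caps = True

    # the block-level paragraph reports the combined text of its inline runs
    run_texts = [run.text for run in paragraph.runs]
    full_text = paragraph.text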


Paragraph properties
--------------------

A paragraph has a variety of properties that specify its placement within its
container (typically a page) and the way it divides its content into separate
lines.

In general, it's best to define a *paragraph style* collecting these
attributes into a meaningful group and apply the appropriate style to each
paragraph, rather than repeatedly applying those properties directly to each
paragraph. This is analogous to how Cascading Style Sheets (CSS) work with
HTML. All the paragraph properties described here can be set using a style as
well as applied directly to a paragraph.

The formatting properties of a paragraph are accessed through the
|ParagraphFormat| object available from the paragraph's
:attr:`~.Paragraph.paragraph_format` property.
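
A minimal sketch of the style-based approach follows. The style name
``Body Indented`` and the specific measurements are hypothetical, chosen only
for illustration. The formatting is defined once on the style and each
paragraph that uses the style inherits it::

    from docx import Document
    from docx.enum.style import WD_STYLE_TYPE
    from docx.shared import Pt

    document = Document()

    # define the formatting once, on a custom paragraph style
    style = document.styles.add_style("Body Indented", WD_STYLE_TYPE.PARAGRAPH)
    style.paragraph_format.left_indent = Pt(18)
    style.paragraph_format.space_after = Pt(6)

    # apply the style by name or by passing the style object
    first = document.add_paragraph("First paragraph.", style="Body Indented")
    second = document.add_paragraph("Second paragraph.", style=style)

    # nothing is applied directly; the paragraph inherits from its style
    direct_indent = first.paragraph_format.left_indent
    style_indent = style.paragraph_format.left_indent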


Horizontal alignment (justification)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Also known as *justification*, the horizontal alignment of a paragraph can be
set to left, centered, right, or fully justified (aligned on both the left
and right sides) using values from the enumeration
:ref:`WdParagraphAlignment`::

    >>> from docx import Document
    >>> from docx.enum.text import WD_ALIGN_PARAGRAPH
    >>> document = Document()
    >>> paragraph = document.add_paragraph()
    >>> paragraph_format = paragraph.paragraph_format

    >>> paragraph_format.alignment
    None  # indicating alignment is inherited from the style hierarchy
    >>> paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER
    >>> paragraph_format.alignment
    CENTER (1)


Indentation
~~~~~~~~~~~

Indentation is the horizontal space between a paragraph and the edge of its
container, typically the page margin. A paragraph can be indented separately
on the left and right sides. The first line can also have a different
indentation from the rest of the paragraph. A first line indented further
than the rest of the paragraph has a *first line indent*. A first line
indented less has a *hanging indent*.

Indentation is specified using a |Length| value, such as |Inches|, |Pt|, or
|Cm|. Negative values are valid and cause the paragraph to overlap the margin
by the specified amount. A value of |None| indicates the indentation value is
inherited from the style hierarchy. Assigning |None| to an indentation
property removes any directly-applied indentation setting and restores
inheritance from the style hierarchy::

    >>> from docx.shared import Inches
    >>> paragraph = document.add_paragraph()
    >>> paragraph_format = paragraph.paragraph_format

    >>> paragraph_format.left_indent
    None  # indicating indentation is inherited from the style hierarchy
    >>> paragraph_format.left_indent = Inches(0.5)
    >>> paragraph_format.left_indent
    457200
    >>> paragraph_format.left_indent.inches
    0.5


Right-side indent works in a similar way::

    >>> from docx.shared import Pt
    >>> paragraph_format.right_indent
    None
    >>> paragraph_format.right_indent = Pt(24)
    >>> paragraph_format.right_indent
    304800
    >>> paragraph_format.right_indent.pt
    24.0




First-line indent is specified using the
:attr:`~.ParagraphFormat.first_line_indent` property and is interpreted
relative to the left indent. A negative value indicates a hanging indent::

    >>> paragraph_format.first_line_indent
    None
    >>> paragraph_format.first_line_indent = Inches(-0.25)
    >>> paragraph_format.first_line_indent
    -228600
    >>> paragraph_format.first_line_indent.inches
    -0.25
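
Because first-line indent is relative to the left indent, the two values can
be combined to see where the first line starts. The following is a minimal
sketch of a hanging indent whose first line returns to the container edge; the
sample text and the 0.5 inch measurement are illustrative only::

    from docx import Document
    from docx.shared import Inches, Length

    document = Document()
    paragraph = document.add_paragraph(
        "A long item whose wrapped lines sit to the right of its first line."
    )
    paragraph_format = paragraph.paragraph_format

    # indent the whole paragraph, then pull the first line back by the same amount
    paragraph_format.left_indent = Inches(0.5)
    paragraph_format.first_line_indent = Inches(-0.5)

    # first-line position relative to the container edge is the sum of the two
    first_line_start = Length(
        paragraph_format.left_indent + paragraph_format.first_line_indent
    )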


Tab stops
~~~~~~~~~

A tab stop determines the rendering of a tab character in the text of
a paragraph. In particular, it specifies the position where the text
following the tab character will start, how it will be aligned to that
position, and an optional leader character that will fill the horizontal
space spanned by the tab.

The tab stops for a paragraph or style are contained in a |TabStops| object
accessed using the :attr:`~.ParagraphFormat.tab_stops` property on
|ParagraphFormat|::

    >>> tab_stops = paragraph_format.tab_stops
    >>> tab_stops
    <docx.text.tabstops.TabStops object at 0x106b802d8>

A new tab stop is added using the :meth:`~.TabStops.add_tab_stop` method::

    >>> tab_stop = tab_stops.add_tab_stop(Inches(1.5))
    >>> tab_stop.position
    1371600
    >>> tab_stop.position.inches
    1.5

Alignment defaults to left, but may be specified by providing a member of the
:ref:`WdTabAlignment` enumeration. The leader character defaults to spaces,
but may be specified by providing a member of the :ref:`WdTabLeader`
enumeration::

    >>> from docx.enum.text import WD_TAB_ALIGNMENT, WD_TAB_LEADER
    >>> tab_stop = tab_stops.add_tab_stop(Inches(1.5), WD_TAB_ALIGNMENT.RIGHT, WD_TAB_LEADER.DOTS)
    >>> print(tab_stop.alignment)
    RIGHT (2)
    >>> print(tab_stop.leader)
    DOTS (1)

Existing tab stops are accessed using sequence semantics on |TabStops|::

    >>> tab_stops[0]
    <docx.text.tabstops.TabStop object at 0x1105427e8>

More details are available in the |TabStops| and |TabStop| API documentation.
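
Putting these pieces together, the sketch below formats a line in the manner
of a table-of-contents entry. The entry text and the 6 inch position are
illustrative assumptions, not values required by the library::

    from docx import Document
    from docx.enum.text import WD_TAB_ALIGNMENT, WD_TAB_LEADER
    from docx.shared import Inches

    document = Document()

    # the tab character in the text is rendered according to the tab stop below
    paragraph = document.add_paragraph("Introduction\t3")
    tab_stops = paragraph.paragraph_format.tab_stops

    # right-align the page number at 6 inches and fill the gap with dots
    tab_stops.add_tab_stop(Inches(6.0), WD_TAB_ALIGNMENT.RIGHT, WD_TAB_LEADER.DOTS)

    # TabStops supports iteration as well as indexed access
    positions = [tab_stop.position.inches for tab_stop in tab_stops]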


Paragraph spacing
~~~~~~~~~~~~~~~~~

The :attr:`~.ParagraphFormat.space_before` and
:attr:`~.ParagraphFormat.space_after` properties control the spacing before
and after a paragraph, respectively. Inter-paragraph spacing is *collapsed*
during page layout, meaning the spacing between two paragraphs is the maximum
of the `space_after` of the first paragraph and the `space_before` of the
second paragraph. Paragraph spacing is specified as a |Length| value, often
using |Pt|::

    >>> paragraph_format.space_before, paragraph_format.space_after
    (None, None)  # inherited by default

    >>> paragraph_format.space_before = Pt(18)
    >>> paragraph_format.space_before.pt
    18.0

    >>> paragraph_format.space_after = Pt(12)
    >>> paragraph_format.space_after.pt
    12.0
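
The collapsing rule described above can be sketched in plain Python. The
helper ``collapsed_gap`` is hypothetical and not part of the library; it
assumes both arguments are effective values in points, with any inheritance
already resolved::

    def collapsed_gap(space_after_pt, space_before_pt):
        """Return the gap, in points, between two adjacent paragraphs."""
        # the two values are not added; the larger of the two is used
        return max(space_after_pt, space_before_pt)

    # 12 pt after the first paragraph, 18 pt before the second
    gap = collapsed_gap(12.0, 18.0)

With the values used in the example above, the resulting gap is 18 points
rather than 30.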


Line spacing
~~~~~~~~~~~~

Line spacing is the distance between the baselines of successive lines in
a paragraph. Line spacing can be specified either as an absolute distance or
relative to the line height (essentially the point size of the font used).
A typical absolute measure would be 18 points. A typical relative measure
would be double-spaced (2.0 line heights). The default line spacing is
single-spaced (1.0 line heights).

Line spacing is controlled by the interaction of the
:attr:`~.ParagraphFormat.line_spacing` and
:attr:`~.ParagraphFormat.line_spacing_rule` properties.
:attr:`~.ParagraphFormat.line_spacing` is either a |Length| value,
a (small-ish) |float|, or |None|. A |Length| value indicates an absolute
distance. A |float| indicates a number of line heights. |None| indicates line
spacing is inherited. :attr:`~.ParagraphFormat.line_spacing_rule` is a member
of the :ref:`WdLineSpacing` enumeration or |None|::

    >>> from docx.shared import Length
    >>> paragraph_format.line_spacing
    None
    >>> paragraph_format.line_spacing_rule
    None

    >>> paragraph_format.line_spacing = Pt(18)
    >>> isinstance(paragraph_format.line_spacing, Length)
    True
    >>> paragraph_format.line_spacing.pt
    18.0
    >>> paragraph_format.line_spacing_rule
    EXACTLY (4)

    >>> paragraph_format.line_spacing = 1.75
    >>> paragraph_format.line_spacing
    1.75
    >>> paragraph_format.line_spacing_rule
    MULTIPLE (5)
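
The interaction between the two properties can be sketched as follows. The
final step, switching an absolute spacing to a minimum by assigning
``WD_LINE_SPACING.AT_LEAST``, is based on the observed behavior of the
property setters and should be treated as an assumption to verify against the
|ParagraphFormat| API documentation::

    from docx import Document
    from docx.enum.text import WD_LINE_SPACING
    from docx.shared import Pt

    document = Document()
    paragraph_format = document.add_paragraph().paragraph_format

    # a float sets relative spacing, measured in line heights
    paragraph_format.line_spacing = 1.75
    relative_rule = paragraph_format.line_spacing_rule

    # a Length sets absolute spacing
    paragraph_format.line_spacing = Pt(18)
    absolute_rule = paragraph_format.line_spacing_rule

    # assumed: changing only the rule keeps the 18 pt distance as a minimum
    paragraph_format.line_spacing_rule = WD_LINE_SPACING.AT_LEAST
    minimum_pt = paragraph_format.line_spacing.pt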


Pagination properties
~~~~~~~~~~~~~~~~~~~~~

Four paragraph properties, :attr:`~.ParagraphFormat.keep_together`,
:attr:`~.ParagraphFormat.keep_with_next`,
:attr:`~.ParagraphFormat.page_break_before`, and
:attr:`~.ParagraphFormat.widow_control` control aspects of how the paragraph
behaves near page boundaries.

:attr:`~.ParagraphFormat.keep_together` causes the entire paragraph to appear
on the same page, issuing a page break before the paragraph if it would
otherwise be broken across two pages.

:attr:`~.ParagraphFormat.keep_with_next` keeps a paragraph on the same page
as the subsequent paragraph. This can be used, for example, to keep a section
heading on the same page as the first paragraph of the section.

:attr:`~.ParagraphFormat.page_break_before` causes a paragraph to be placed
at the top of a new page. This could be used on a chapter heading to ensure
chapters start on a new page.

:attr:`~.ParagraphFormat.widow_control` breaks a page to avoid placing the
first or last line of the paragraph on a separate page from the rest of the
paragraph.

All four of these properties are *tri-state*, meaning they can take the value
|True|, |False|, or |None|. |None| indicates the property value is inherited
from the style hierarchy. |True| means "on" and |False| means "off"::

    >>> paragraph_format.keep_together
    None  # all four inherit by default
    >>> paragraph_format.keep_with_next = True
    >>> paragraph_format.keep_with_next
    True
    >>> paragraph_format.page_break_before = False
    >>> paragraph_format.page_break_before
    False
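
As a minimal sketch of the use cases mentioned above (the heading and body
text are illustrative only), a chapter title can be forced onto a new page and
a section heading kept with the paragraph that follows it::

    from docx import Document

    document = Document()

    # start each chapter at the top of a new page
    chapter_title = document.add_paragraph("Chapter 2")
    chapter_title.paragraph_format.page_break_before = True

    # keep the section heading on the same page as its first paragraph
    section_title = document.add_paragraph("2.1 Overview")
    section_title.paragraph_format.keep_with_next = True

    # keep this paragraph whole; leave widow control to the style hierarchy
    body = document.add_paragraph("Body text of the section.")
    body.paragraph_format.keep_together = True
    body.paragraph_format.widow_control = None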


Apply character formatting
--------------------------

Character formatting is applied at the run level. Examples include font
typeface and size, bold, italic, and underline.

A |Run| object has a read-only :attr:`~.Run.font` property providing access
to a |Font| object. A run's |Font| object provides properties for getting
and setting the character formatting for that run.

Several examples are provided here. For a complete set of the available
properties, see the |Font| API documentation.

The font for a run can be accessed like this::

    >>> from docx import Document
    >>> document = Document()
    >>> run = document.add_paragraph().add_run()
    >>> font = run.font

Typeface and size are set like this::

    >>> from docx.shared import Pt
    >>> font.name = 'Calibri'
    >>> font.size = Pt(12)

Many font properties are *tri-state*, meaning they can take the values
|True|, |False|, and |None|. |True| means the property is "on", |False| means
it is "off". Conceptually, the |None| value means "inherit". A run exists in
the style inheritance hierarchy and by default inherits its character
formatting from that hierarchy. Any character formatting directly applied
using the |Font| object overrides the inherited values.

Bold and italic are tri-state properties, as are all-caps, strikethrough,
superscript, and many others. See the |Font| API documentation for a full
list::

    >>> font.bold, font.italic
    (None, None)
    >>> font.italic = True
    >>> font.italic
    True
    >>> font.italic = False
    >>> font.italic
    False
    >>> font.italic = None
    >>> font.italic
    None
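
The idea of "inherit" can be sketched in plain Python as a search through the
levels of the hierarchy. This is a deliberately simplified model, not the
library's or Word's actual resolution logic; the helper ``effective_value``
is hypothetical and the final default of "off" is an assumption::

    def effective_value(*levels):
        """Return the first explicit setting, searching from most specific.

        `levels` lists tri-state values in order, for example the run's
        direct formatting first, then each style in its hierarchy.
        """
        for value in levels:
            if value is not None:
                return value  # True or False overrides everything after it
        return False  # assumed default when no level specifies a value

    # direct formatting is None, so the first style that specifies a value wins
    inherited = effective_value(None, True, False)

    # direct formatting of False overrides a style that turns the property on
    overridden = effective_value(False, True)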

Underline is a bit of a special case. It is a hybrid of a tri-state property
and an enumerated value property. |True| means single underline, by far the
most common. |False| means no underline, but more often |None| is the right
choice if no underlining is wanted. The other forms of underlining, such as
double or dashed, are specified with a member of the :ref:`WdUnderline`
enumeration::

    >>> from docx.enum.text import WD_UNDERLINE
    >>> font.underline
    None
    >>> font.underline = True
    >>> # or perhaps
    >>> font.underline = WD_UNDERLINE.DOT_DASH

Font color
~~~~~~~~~~

Each |Font| object has a |ColorFormat| object that provides access to its
color, accessed via its read-only :attr:`~.Font.color` property.

Apply a specific RGB color to a font::

    >>> from docx.shared import RGBColor
    >>> font.color.rgb = RGBColor(0x42, 0x24, 0xE9)

A font can also be set to a theme color by assigning a member of the
:ref:`MsoThemeColorIndex` enumeration::

    >>> from docx.enum.dml import MSO_THEME_COLOR
    >>> font.color.theme_color = MSO_THEME_COLOR.ACCENT_1

A font's color can be restored to its default (inherited) value by assigning
|None| to either the :attr:`~.ColorFormat.rgb` or
:attr:`~.ColorFormat.theme_color` attribute of |ColorFormat|::

    >>> font.color.rgb = None

Determining the color of a font begins with determining its color type, shown
here for a font that has an RGB color applied::

    >>> font.color.type
    RGB (1)

The value of the :attr:`~.ColorFormat.type` property can be a member of the
:ref:`MsoColorType` enumeration or |None|. `MSO_COLOR_TYPE.RGB` indicates it
is an RGB color. `MSO_COLOR_TYPE.THEME` indicates a theme color.
`MSO_COLOR_TYPE.AUTO` indicates its value is determined automatically by the
application, usually set to black. (This value is relatively rare.) |None|
indicates no color is applied and the color is inherited from the style
hierarchy; this is the most common case.

When the color type is `MSO_COLOR_TYPE.RGB`, the :attr:`~.ColorFormat.rgb`
property will be an |RGBColor| value indicating the RGB color::

    >>> font.color.rgb
    RGBColor(0x42, 0x24, 0xe9)

When the color type is `MSO_COLOR_TYPE.THEME`, the
:attr:`~.ColorFormat.theme_color` property will be a member of
:ref:`MsoThemeColorIndex` indicating the theme color::

    >>> font.color.theme_color
    ACCENT_1 (5)
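
The type check described above can be wrapped in a small helper that branches
on the color type before reading a color value. The function
``describe_color`` and its return strings are hypothetical, intended only as
a sketch of the branching::

    from docx import Document
    from docx.enum.dml import MSO_COLOR_TYPE, MSO_THEME_COLOR
    from docx.shared import RGBColor

    def describe_color(font):
        """Return a short description of how the font's color is specified."""
        color = font.color
        if color.type == MSO_COLOR_TYPE.RGB:
            # str() of an RGBColor is its six-digit hexadecimal form
            return "RGB #" + str(color.rgb)
        if color.type == MSO_COLOR_TYPE.THEME:
            return "theme color"
        if color.type == MSO_COLOR_TYPE.AUTO:
            return "automatic"
        return "inherited"  # color.type is None

    document = Document()
    paragraph = document.add_paragraph()

    plain = paragraph.add_run("plain ")
    rgb_run = paragraph.add_run("rgb ")
    rgb_run.font.color.rgb = RGBColor(0x42, 0x24, 0xE9)
    theme_run = paragraph.add_run("theme")
    theme_run.font.color.theme_color = MSO_THEME_COLOR.ACCENT_1

    descriptions = [describe_color(run.font) for run in paragraph.runs]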