Working with Text
=================
To work effectively with text, it's important to first understand a little
about block-level elements like paragraphs and inline-level objects like
runs.
Block-level vs. inline text objects
-----------------------------------
The paragraph is the primary block-level object in Word.
A block-level item flows the text it contains between its left and right
edges, adding an additional line each time the text extends beyond its right
boundary. For a paragraph, the boundaries are generally the page margins, but
they can also be column boundaries if the page is laid out in columns, or
cell boundaries if the paragraph occurs inside a table cell.
A table is also a block-level object.
An inline object is a portion of the content that occurs inside a block-level
item. An example would be a word that appears in bold or a sentence in
all-caps. The most common inline object is a *run*. All content within
a block container is inside an inline object. Typically, a paragraph
contains one or more runs, each of which contains some part of the
paragraph's text.
The attributes of a block-level item specify its placement on the page, such
as indentation and the space before and after a paragraph. The attributes
of an inline item generally specify the font in which the content appears,
such as typeface, font size, bold, and italic.
Paragraph properties
--------------------
A paragraph has a variety of properties that specify its placement within its
container (typically a page) and the way it divides its content into separate
lines.
In general, it's best to define a *paragraph style* that collects these
attributes into a meaningful group and to apply the appropriate style to each
paragraph, rather than repeatedly applying those properties directly to each
paragraph. This is analogous to how Cascading Style Sheets (CSS) work with
HTML. All the paragraph properties described here can be set using a style as
well as applied directly to a paragraph.
The formatting properties of a paragraph are accessed using the
|ParagraphFormat| object available using the paragraph's
:attr:`~.Paragraph.paragraph_format` property.
Horizontal alignment (justification)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Also known as *justification*, the horizontal alignment of a paragraph can be
set to left, centered, right, or fully justified (aligned on both the left
and right sides) using values from the enumeration
:ref:`WdParagraphAlignment`::

    >>> from docx import Document
    >>> from docx.enum.text import WD_ALIGN_PARAGRAPH
    >>> document = Document()
    >>> paragraph = document.add_paragraph()
    >>> paragraph_format = paragraph.paragraph_format
    >>> paragraph_format.alignment
    None  # indicating alignment is inherited from the style hierarchy
    >>> paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER
    >>> paragraph_format.alignment
    CENTER (1)

Indentation
~~~~~~~~~~~
Indentation is the horizontal space between a paragraph and the edge of its
container, typically the page margin. A paragraph can be indented separately
on the left and right sides. The first line can also have a different
indentation than the rest of the paragraph. A first line indented further
than the rest of the paragraph has a *first-line indent*. A first line
indented less has a *hanging indent*.
Indentation is specified using a |Length| value, such as |Inches|, |Pt|, or
|Cm|. Negative values are valid and cause the paragraph to overlap the margin
by the specified amount. A value of |None| indicates the indentation value is
inherited from the style hierarchy. Assigning |None| to an indentation
property removes any directly-applied indentation setting and restores
inheritance from the style hierarchy::

    >>> from docx.shared import Inches
    >>> paragraph = document.add_paragraph()
    >>> paragraph_format = paragraph.paragraph_format
    >>> paragraph_format.left_indent
    None  # indicating indentation is inherited from the style hierarchy
    >>> paragraph_format.left_indent = Inches(0.5)
    >>> paragraph_format.left_indent
    457200
    >>> paragraph_format.left_indent.inches
    0.5

Right-side indent works in a similar way::

    >>> from docx.shared import Pt
    >>> paragraph_format.right_indent
    None
    >>> paragraph_format.right_indent = Pt(24)
    >>> paragraph_format.right_indent
    304800
    >>> paragraph_format.right_indent.pt
    24.0

First-line indent is specified using the
:attr:`~.ParagraphFormat.first_line_indent` property and is interpreted
relative to the left indent. A negative value indicates a hanging indent::

    >>> paragraph_format.first_line_indent
    None
    >>> paragraph_format.first_line_indent = Inches(-0.25)
    >>> paragraph_format.first_line_indent
    -228600
    >>> paragraph_format.first_line_indent.inches
    -0.25

Tab stops
~~~~~~~~~
A tab stop determines the rendering of a tab character in the text of
a paragraph. In particular, it specifies the position where the text
following the tab character will start, how it will be aligned to that
position, and an optional leader character that will fill the horizontal
space spanned by the tab.
The tab stops for a paragraph or style are contained in a |TabStops| object
accessed using the :attr:`~.ParagraphFormat.tab_stops` property on
|ParagraphFormat|::

    >>> tab_stops = paragraph_format.tab_stops
    >>> tab_stops
    <docx.text.tabstops.TabStops object at 0x106b802d8>

A new tab stop is added using the :meth:`~.TabStops.add_tab_stop` method::

    >>> tab_stop = tab_stops.add_tab_stop(Inches(1.5))
    >>> tab_stop.position
    1371600
    >>> tab_stop.position.inches
    1.5

Alignment defaults to left, but may be specified by providing a member of the
:ref:`WdTabAlignment` enumeration. The leader character defaults to spaces,
but may be specified by providing a member of the :ref:`WdTabLeader`
enumeration::

    >>> from docx.enum.text import WD_TAB_ALIGNMENT, WD_TAB_LEADER
    >>> tab_stop = tab_stops.add_tab_stop(Inches(1.5), WD_TAB_ALIGNMENT.RIGHT, WD_TAB_LEADER.DOTS)
    >>> print(tab_stop.alignment)
    RIGHT (2)
    >>> print(tab_stop.leader)
    DOTS (1)

Existing tab stops are accessed using sequence semantics on |TabStops|::

    >>> tab_stops[0]
    <docx.text.tabstops.TabStop object at 0x1105427e8>

More details are available in the |TabStops| and |TabStop| API documentation.
Paragraph spacing
~~~~~~~~~~~~~~~~~
The :attr:`~.ParagraphFormat.space_before` and
:attr:`~.ParagraphFormat.space_after` properties control the spacing before
and after a paragraph, respectively. Inter-paragraph spacing is *collapsed*
during page layout, meaning the spacing between two adjacent paragraphs is
the maximum of the `space_after` of the first paragraph and the
`space_before` of the second. Paragraph spacing is specified as a |Length|
value, often using |Pt|::

    >>> paragraph_format.space_before, paragraph_format.space_after
    (None, None)  # inherited by default
    >>> paragraph_format.space_before = Pt(18)
    >>> paragraph_format.space_before.pt
    18.0
    >>> paragraph_format.space_after = Pt(12)
    >>> paragraph_format.space_after.pt
    12.0

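The collapsing rule can be written out as a one-line calculation. This sketch models only the maximum rule as described in this section; `collapsed_gap` is a hypothetical helper, not part of python-docx, and the actual layout is performed by Word when the document is rendered.

```python
def collapsed_gap(space_after_pt, space_before_pt):
    """Gap in points between two adjacent paragraphs under the maximum rule."""
    return max(space_after_pt, space_before_pt)


# The first paragraph has 12 pt after it; the second has 18 pt before it.
gap = collapsed_gap(12.0, 18.0)

# Without collapsing, the two values would simply be added.
uncollapsed = 12.0 + 18.0
```

The gap is 18 points, not the 30 points a simple sum would give.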
Line spacing
~~~~~~~~~~~~
Line spacing is the distance between subsequent baselines in the lines of
a paragraph. Line spacing can be specified either as an absolute distance or
relative to the line height (essentially the point size of the font used).
A typical absolute measure would be 18 points. A typical relative measure
would be double-spaced (2.0 line heights). The default line spacing is
single-spaced (1.0 line heights).
Line spacing is controlled by the interaction of the
:attr:`~.ParagraphFormat.line_spacing` and
:attr:`~.ParagraphFormat.line_spacing_rule` properties.
:attr:`~.ParagraphFormat.line_spacing` is either a |Length| value,
a small |float| (such as 1.5 or 2.0), or |None|. A |Length| value indicates
an absolute distance. A |float| indicates a number of line heights. |None|
indicates line spacing is inherited.
:attr:`~.ParagraphFormat.line_spacing_rule` is a member of the
:ref:`WdLineSpacing` enumeration or |None|::

    >>> from docx.shared import Length
    >>> paragraph_format.line_spacing
    None
    >>> paragraph_format.line_spacing_rule
    None
    >>> paragraph_format.line_spacing = Pt(18)
    >>> isinstance(paragraph_format.line_spacing, Length)
    True
    >>> paragraph_format.line_spacing.pt
    18.0
    >>> paragraph_format.line_spacing_rule
    EXACTLY (4)
    >>> paragraph_format.line_spacing = 1.75
    >>> paragraph_format.line_spacing
    1.75
    >>> paragraph_format.line_spacing_rule
    MULTIPLE (5)

Pagination properties
~~~~~~~~~~~~~~~~~~~~~
Four paragraph properties, :attr:`~.ParagraphFormat.keep_together`,
:attr:`~.ParagraphFormat.keep_with_next`,
:attr:`~.ParagraphFormat.page_break_before`, and
:attr:`~.ParagraphFormat.widow_control` control aspects of how the paragraph
behaves near page boundaries.
:attr:`~.ParagraphFormat.keep_together` causes the entire paragraph to appear
on the same page, issuing a page break before the paragraph if it would
otherwise be broken across two pages.
:attr:`~.ParagraphFormat.keep_with_next` keeps a paragraph on the same page
as the subsequent paragraph. This can be used, for example, to keep a section
heading on the same page as the first paragraph of the section.
:attr:`~.ParagraphFormat.page_break_before` causes a paragraph to be placed
at the top of a new page. This could be used on a chapter heading to ensure
chapters start on a new page.
:attr:`~.ParagraphFormat.widow_control` breaks a page to avoid placing the
first or last line of the paragraph on a separate page from the rest of the
paragraph.
All four of these properties are *tri-state*, meaning they can take the value
|True|, |False|, or |None|. |None| indicates the property value is inherited
from the style hierarchy. |True| means "on" and |False| means "off"::

    >>> paragraph_format.keep_together
    None  # all four inherit by default
    >>> paragraph_format.keep_with_next = True
    >>> paragraph_format.keep_with_next
    True
    >>> paragraph_format.page_break_before = False
    >>> paragraph_format.page_break_before
    False

Apply character formatting
--------------------------
Character formatting is applied at the run level. Examples include font
typeface and size, bold, italic, and underline.
A |Run| object has a read-only :attr:`~.Run.font` property providing access
to a |Font| object. A run's |Font| object provides properties for getting
and setting the character formatting for that run.
Several examples are provided here. For a complete set of the available
properties, see the |Font| API documentation.
The font for a run can be accessed like this::

    >>> from docx import Document
    >>> document = Document()
    >>> run = document.add_paragraph().add_run()
    >>> font = run.font

Typeface and size are set like this::

    >>> from docx.shared import Pt
    >>> font.name = 'Calibri'
    >>> font.size = Pt(12)

Many font properties are *tri-state*, meaning they can take the values
|True|, |False|, and |None|. |True| means the property is "on", |False| means
it is "off". Conceptually, the |None| value means "inherit". A run exists in
the style inheritance hierarchy and by default inherits its character
formatting from that hierarchy. Any character formatting directly applied
using the |Font| object overrides the inherited values.
Bold and italic are tri-state properties, as are all-caps, strikethrough,
superscript, and many others. See the |Font| API documentation for a full
list::

    >>> font.bold, font.italic
    (None, None)
    >>> font.italic = True
    >>> font.italic
    True
    >>> font.italic = False
    >>> font.italic
    False
    >>> font.italic = None
    >>> font.italic
    None

Underline is a bit of a special case. It is a hybrid of a tri-state property
and an enumerated value property. |True| means single underline, by far the
most common. |False| means no underline, but more often |None| is the right
choice if no underlining is wanted. The other forms of underlining, such as
double or dashed, are specified with a member of the :ref:`WdUnderline`
enumeration::

    >>> from docx.enum.text import WD_UNDERLINE
    >>> font.underline
    None
    >>> font.underline = True
    >>> # or perhaps
    >>> font.underline = WD_UNDERLINE.DOT_DASH

Font color
~~~~~~~~~~
Each |Font| object has a |ColorFormat| object that provides access to its
color, accessed via its read-only :attr:`~.Font.color` property.
Apply a specific RGB color to a font::

    >>> from docx.shared import RGBColor
    >>> font.color.rgb = RGBColor(0x42, 0x24, 0xE9)

A font can also be set to a theme color by assigning a member of the
:ref:`MsoThemeColorIndex` enumeration::

    >>> from docx.enum.dml import MSO_THEME_COLOR
    >>> font.color.theme_color = MSO_THEME_COLOR.ACCENT_1

A font's color can be restored to its default (inherited) value by assigning
|None| to either the :attr:`~.ColorFormat.rgb` or
:attr:`~.ColorFormat.theme_color` attribute of |ColorFormat|::

    >>> font.color.rgb = None

Determining the color of a font begins with determining its color type::

    >>> font.color.rgb = RGBColor(0x42, 0x24, 0xE9)
    >>> font.color.type
    RGB (1)

The value of the :attr:`~.ColorFormat.type` property can be a member of the
:ref:`MsoColorType` enumeration or |None|. `MSO_COLOR_TYPE.RGB` indicates it
is an RGB color. `MSO_COLOR_TYPE.THEME` indicates a theme color.
`MSO_COLOR_TYPE.AUTO` indicates its value is determined automatically by the
application, usually black. (This value is relatively rare.) |None|
indicates no color is applied and the color is inherited from the style
hierarchy; this is the most common case.
When the color type is `MSO_COLOR_TYPE.RGB`, the :attr:`~.ColorFormat.rgb`
property will be an |RGBColor| value indicating the RGB color::

    >>> font.color.rgb
    RGBColor(0x42, 0x24, 0xe9)

When the color type is `MSO_COLOR_TYPE.THEME`, the
:attr:`~.ColorFormat.theme_color` property will be a member of
:ref:`MsoThemeColorIndex` indicating the theme color::

    >>> font.color.theme_color = MSO_THEME_COLOR.ACCENT_1
    >>> font.color.theme_color
    ACCENT_1 (5)
