File: storage-page-layout.html

package info (click to toggle)
pgadmin3 1.4.3-2
  • links: PTS
  • area: main
  • in suites: etch, etch-m68k
  • size: 29,796 kB
  • ctags: 10,758
  • sloc: cpp: 55,356; sh: 6,164; ansic: 1,520; makefile: 576; sql: 482; xml: 100; perl: 18
file content (326 lines) | stat: -rw-r--r-- 12,320 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>50.3.Database Page Layout</title>
<link rel="stylesheet" href="stylesheet.css" type="text/css">
<link rev="made" href="pgsql-docs@postgresql.org">
<meta name="generator" content="DocBook XSL Stylesheets V1.70.0">
<link rel="start" href="index.html" title="PostgreSQL 8.1.4 Documentation">
<link rel="up" href="storage.html" title="Chapter50.Database Physical Storage">
<link rel="prev" href="storage-toast.html" title="50.2.TOAST">
<link rel="next" href="bki.html" title="Chapter51.BKI Backend Interface">
<link rel="copyright" href="ln-legalnotice.html" title="Legal Notice">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="sect1" lang="en">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="storage-page-layout"></a>50.3.Database Page Layout</h2></div></div></div>
<p>This section provides an overview of the page format used within
<span class="productname">PostgreSQL</span> tables and indexes.<sup>[<a name="id845399" href="#ftn.id845399">10</a>]</sup>
Sequences and <acronym class="acronym">TOAST</acronym> tables are formatted just like a regular table.</p>
<p>In the following explanation, a
<em class="firstterm">byte</em>
is assumed to contain 8 bits.  In addition, the term
<em class="firstterm">item</em>
refers to an individual data value that is stored on a page.  In a table,
an item is a row; in an index, an item is an index entry.</p>
<p>Every table and index is stored as an array of <em class="firstterm">pages</em> of a
fixed size (usually 8Kb, although a different page size can be selected
when compiling the server).  In a table, all the pages are logically
equivalent, so a particular item (row) can be stored in any page.  In
indexes, the first page is generally reserved as a <em class="firstterm">metapage</em>
holding control information, and there may be different types of pages
within the index, depending on the index access method.</p>
<p><a href="storage-page-layout.html#page-table" title="Table50.2.Overall Page Layout">Table50.2, &#8220;Page Layout&#8221;</a> shows the overall layout of a page.
There are five parts to each page.</p>
<div class="table">
<a name="page-table"></a><p class="title"><b>Table50.2.Overall Page Layout</b></p>
<div class="table-contents"><table summary="Overall Page Layout" border="1">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>Item</th>
<th>Description</th>
</tr></thead>
<tbody>
<tr>
<td>PageHeaderData</td>
<td>20 bytes long. Contains general information about the page, including
free space pointers.</td>
</tr>
<tr>
<td>ItemPointerData</td>
<td>Array of (offset,length) pairs pointing to the actual items.
4 bytes per item.</td>
</tr>
<tr>
<td>Free space</td>
<td>The unallocated space. New item pointers are allocated from the start
of this area, new items from the end.</td>
</tr>
<tr>
<td>Items</td>
<td>The actual items themselves.</td>
</tr>
<tr>
<td>Special space</td>
<td>Index access method specific data. Different methods store different
data. Empty in ordinary tables.</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>
  The first 20 bytes of each page consists of a page header
  (PageHeaderData). Its format is detailed in <a href="storage-page-layout.html#pageheaderdata-table" title="Table50.3.PageHeaderData Layout">Table50.3, &#8220;PageHeaderData Layout&#8221;</a>. The first two fields track the most
  recent WAL entry related to this page. They are followed by three 2-byte
  integer fields
  (<code class="structfield">pd_lower</code>, <code class="structfield">pd_upper</code>,
  and <code class="structfield">pd_special</code>). These contain byte offsets
  from the page start to the start
  of unallocated space, to the end of unallocated space, and to the start of
  the special space. 
  The last 2 bytes of the page header,
  <code class="structfield">pd_pagesize_version</code>, store both the page size
  and a version indicator.  Beginning with
  <span class="productname">PostgreSQL</span> 8.1 the version number is 3;
  <span class="productname">PostgreSQL</span> 8.0 used version number 2;
  <span class="productname">PostgreSQL</span> 7.3 and 7.4 used version number 1;
  prior releases used version number 0.
  (The basic page layout and header format has not changed in these versions,
  but the layout of heap row headers has.)  The page size
  is basically only present as a cross-check; there is no support for having
  more than one page size in an installation.
  
 </p>
<div class="table">
<a name="pageheaderdata-table"></a><p class="title"><b>Table50.3.PageHeaderData Layout</b></p>
<div class="table-contents"><table summary="PageHeaderData Layout" border="1">
<colgroup>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>Field</th>
<th>Type</th>
<th>Length</th>
<th>Description</th>
</tr></thead>
<tbody>
<tr>
<td>pd_lsn</td>
<td>XLogRecPtr</td>
<td>8 bytes</td>
<td>LSN: next byte after last byte of xlog record for last change
   to this page</td>
</tr>
<tr>
<td>pd_tli</td>
<td>TimeLineID</td>
<td>4 bytes</td>
<td>TLI of last change</td>
</tr>
<tr>
<td>pd_lower</td>
<td>LocationIndex</td>
<td>2 bytes</td>
<td>Offset to start of free space</td>
</tr>
<tr>
<td>pd_upper</td>
<td>LocationIndex</td>
<td>2 bytes</td>
<td>Offset to end of free space</td>
</tr>
<tr>
<td>pd_special</td>
<td>LocationIndex</td>
<td>2 bytes</td>
<td>Offset to start of special space</td>
</tr>
<tr>
<td>pd_pagesize_version</td>
<td>uint16</td>
<td>2 bytes</td>
<td>Page size and layout version number information</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>  All the details may be found in
  <code class="filename">src/include/storage/bufpage.h</code>.
 </p>
<p>
  Following the page header are item identifiers
  (<code class="type">ItemIdData</code>), each requiring four bytes.
  An item identifier contains a byte-offset to
  the start of an item, its length in bytes, and a few attribute bits
  which affect its interpretation.
  New item identifiers are allocated
  as needed from the beginning of the unallocated space.
  The number of item identifiers present can be determined by looking at
  <code class="structfield">pd_lower</code>, which is increased to allocate a new identifier.
  Because an item
  identifier is never moved until it is freed, its index may be used on a
  long-term basis to reference an item, even when the item itself is moved
  around on the page to compact free space.  In fact, every pointer to an
  item (<code class="type">ItemPointer</code>, also known as
  <code class="type">CTID</code>) created by
  <span class="productname">PostgreSQL</span> consists of a page number and the
  index of an item identifier.

 </p>
<p> 
  The items themselves are stored in space allocated backwards from the end
  of unallocated space.  The exact structure varies depending on what the
  table is to contain. Tables and sequences both use a structure named
  <code class="type">HeapTupleHeaderData</code>, described below.

 </p>
<p> 
  The final section is the &#8220;<span class="quote">special section</span>&#8221; which may
  contain anything the access method wishes to store.  For example,
  b-tree indexes store links to the page's left and right siblings,
  as well as some other data relevant to the index structure.
  Ordinary tables do not use a special section at all (indicated by setting
  <code class="structfield">pd_special</code> to equal the page size).
  
 </p>
<p>
  All table rows are structured in the same way. There is a fixed-size
  header (occupying 27 bytes on most machines), followed by an optional null
  bitmap, an optional object ID field, and the user data. The header is
  detailed
  in <a href="storage-page-layout.html#heaptupleheaderdata-table" title="Table50.4.HeapTupleHeaderData Layout">Table50.4, &#8220;HeapTupleHeaderData Layout&#8221;</a>.  The actual user data
  (columns of the row) begins at the offset indicated by
  <code class="structfield">t_hoff</code>, which must always be a multiple of the MAXALIGN
  distance for the platform.
  The null bitmap is
  only present if the <em class="firstterm">HEAP_HASNULL</em> bit is set in
  <code class="structfield">t_infomask</code>. If it is present it begins just after
  the fixed header and occupies enough bytes to have one bit per data column
  (that is, <code class="structfield">t_natts</code> bits altogether). In this list of bits, a
  1 bit indicates not-null, a 0 bit is a null.  When the bitmap is not
  present, all columns are assumed not-null.
  The object ID is only present if the <em class="firstterm">HEAP_HASOID</em> bit
  is set in <code class="structfield">t_infomask</code>.  If present, it appears just
  before the <code class="structfield">t_hoff</code> boundary.  Any padding needed to make
  <code class="structfield">t_hoff</code> a MAXALIGN multiple will appear between the null
  bitmap and the object ID.  (This in turn ensures that the object ID is
  suitably aligned.)
  
 </p>
<div class="table">
<a name="heaptupleheaderdata-table"></a><p class="title"><b>Table50.4.HeapTupleHeaderData Layout</b></p>
<div class="table-contents"><table summary="HeapTupleHeaderData Layout" border="1">
<colgroup>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>Field</th>
<th>Type</th>
<th>Length</th>
<th>Description</th>
</tr></thead>
<tbody>
<tr>
<td>t_xmin</td>
<td>TransactionId</td>
<td>4 bytes</td>
<td>insert XID stamp</td>
</tr>
<tr>
<td>t_cmin</td>
<td>CommandId</td>
<td>4 bytes</td>
<td>insert CID stamp</td>
</tr>
<tr>
<td>t_xmax</td>
<td>TransactionId</td>
<td>4 bytes</td>
<td>delete XID stamp</td>
</tr>
<tr>
<td>t_cmax</td>
<td>CommandId</td>
<td>4 bytes</td>
<td>delete CID stamp (overlays with t_xvac)</td>
</tr>
<tr>
<td>t_xvac</td>
<td>TransactionId</td>
<td>4 bytes</td>
<td>XID for VACUUM operation moving a row version</td>
</tr>
<tr>
<td>t_ctid</td>
<td>ItemPointerData</td>
<td>6 bytes</td>
<td>current TID of this or newer row version</td>
</tr>
<tr>
<td>t_natts</td>
<td>int16</td>
<td>2 bytes</td>
<td>number of attributes</td>
</tr>
<tr>
<td>t_infomask</td>
<td>uint16</td>
<td>2 bytes</td>
<td>various flag bits</td>
</tr>
<tr>
<td>t_hoff</td>
<td>uint8</td>
<td>1 byte</td>
<td>offset to user data</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>   All the details may be found in
   <code class="filename">src/include/access/htup.h</code>.
 </p>
<p> 
  Interpreting the actual data can only be done with information obtained
  from other tables, mostly <code class="structname">pg_attribute</code>. The
  key values needed to identify field locations are
  <code class="structfield">attlen</code> and <code class="structfield">attalign</code>.
  There is no way to directly get a
  particular attribute, except when there are only fixed width fields and no
  null values. All this trickery is wrapped up in the functions
  <em class="firstterm">heap_getattr</em>, <em class="firstterm">fastgetattr</em>
  and <em class="firstterm">heap_getsysattr</em>.
  
 </p>
<p>
  To read the data you need to examine each attribute in turn. First check
  whether the field is NULL according to the null bitmap. If it is, go to
  the next. Then make sure you have the right alignment.  If the field is a
  fixed width field, then all the bytes are simply placed. If it's a
  variable length field (attlen = -1) then it's a bit more complicated.
  All variable-length datatypes share the common header structure
  <code class="type">varattrib</code>, which includes the total length of the stored
  value and some flag bits.  Depending on the flags, the data may be either
  inline or in a <acronym class="acronym">TOAST</acronym> table;
  it might be compressed, too (see <a href="storage-toast.html" title="50.2.TOAST">Section50.2, &#8220;TOAST&#8221;</a>).
  
 </p>
<div class="footnotes">
<br><hr width="100" align="left">
<div class="footnote"><p><sup>[<a name="ftn.id845399" href="#id845399">10</a>] </sup>    Actually, index access methods need not use this page format.
    All the existing index methods do use this basic format,
    but the data kept on index metapages usually doesn't follow
    the item layout rules.
  </p></div>
</div>
</div></body>
</html>