<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>50.3. Database Page Layout</title>
<link rel="stylesheet" href="stylesheet.css" type="text/css">
<link rev="made" href="pgsql-docs@postgresql.org">
<meta name="generator" content="DocBook XSL Stylesheets V1.70.0">
<link rel="start" href="index.html" title="PostgreSQL 8.1.4 Documentation">
<link rel="up" href="storage.html" title="Chapter 50. Database Physical Storage">
<link rel="prev" href="storage-toast.html" title="50.2. TOAST">
<link rel="next" href="bki.html" title="Chapter 51. BKI Backend Interface">
<link rel="copyright" href="ln-legalnotice.html" title="Legal Notice">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="sect1" lang="en">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="storage-page-layout"></a>50.3. Database Page Layout</h2></div></div></div>
<p>This section provides an overview of the page format used within
<span class="productname">PostgreSQL</span> tables and indexes.<sup>[<a name="id845399" href="#ftn.id845399">10</a>]</sup>
Sequences and <acronym class="acronym">TOAST</acronym> tables are formatted just like a regular table.</p>
<p>In the following explanation, a
<em class="firstterm">byte</em>
is assumed to contain 8 bits. In addition, the term
<em class="firstterm">item</em>
refers to an individual data value that is stored on a page. In a table,
an item is a row; in an index, an item is an index entry.</p>
<p>Every table and index is stored as an array of <em class="firstterm">pages</em> of a
fixed size (usually 8 kB, although a different page size can be selected
when compiling the server). In a table, all the pages are logically
equivalent, so a particular item (row) can be stored in any page. In
indexes, the first page is generally reserved as a <em class="firstterm">metapage</em>
holding control information, and there may be different types of pages
within the index, depending on the index access method.</p>
<p><a href="storage-page-layout.html#page-table" title="Table 50.2. Overall Page Layout">Table 50.2, “Overall Page Layout”</a> shows the overall layout of a page.
There are five parts to each page.</p>
<div class="table">
<a name="page-table"></a><p class="title"><b>Table 50.2. Overall Page Layout</b></p>
<div class="table-contents"><table summary="Overall Page Layout" border="1">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>Item</th>
<th>Description</th>
</tr></thead>
<tbody>
<tr>
<td>PageHeaderData</td>
<td>20 bytes long. Contains general information about the page, including
free space pointers.</td>
</tr>
<tr>
<td>ItemIdData</td>
<td>Array of (offset,length) pairs pointing to the actual items.
4 bytes per item.</td>
</tr>
<tr>
<td>Free space</td>
<td>The unallocated space. New item pointers are allocated from the start
of this area, new items from the end.</td>
</tr>
<tr>
<td>Items</td>
<td>The actual items themselves.</td>
</tr>
<tr>
<td>Special space</td>
<td>Index access method specific data. Different methods store different
data. Empty in ordinary tables.</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p>
The first 20 bytes of each page consist of a page header
(PageHeaderData). Its format is detailed in <a href="storage-page-layout.html#pageheaderdata-table" title="Table 50.3. PageHeaderData Layout">Table 50.3, “PageHeaderData Layout”</a>. The first two fields track the most
recent WAL entry related to this page. They are followed by three 2-byte
integer fields
(<code class="structfield">pd_lower</code>, <code class="structfield">pd_upper</code>,
and <code class="structfield">pd_special</code>). These contain byte offsets
from the page start to the start
of unallocated space, to the end of unallocated space, and to the start of
the special space, respectively.
The last 2 bytes of the page header,
<code class="structfield">pd_pagesize_version</code>, store both the page size
and a version indicator. Beginning with
<span class="productname">PostgreSQL</span> 8.1 the version number is 3;
<span class="productname">PostgreSQL</span> 8.0 used version number 2;
<span class="productname">PostgreSQL</span> 7.3 and 7.4 used version number 1;
prior releases used version number 0.
(The basic page layout and header format have not changed in these versions,
but the layout of heap row headers has.) The page size
is present only as a cross-check; there is no support for having
more than one page size in an installation.
</p>
<div class="table">
<a name="pageheaderdata-table"></a><p class="title"><b>Table 50.3. PageHeaderData Layout</b></p>
<div class="table-contents"><table summary="PageHeaderData Layout" border="1">
<colgroup>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>Field</th>
<th>Type</th>
<th>Length</th>
<th>Description</th>
</tr></thead>
<tbody>
<tr>
<td>pd_lsn</td>
<td>XLogRecPtr</td>
<td>8 bytes</td>
<td>LSN: next byte after the last byte of the WAL record for the last change
to this page</td>
</tr>
<tr>
<td>pd_tli</td>
<td>TimeLineID</td>
<td>4 bytes</td>
<td>Timeline ID (TLI) of the last change</td>
</tr>
<tr>
<td>pd_lower</td>
<td>LocationIndex</td>
<td>2 bytes</td>
<td>Offset to start of free space</td>
</tr>
<tr>
<td>pd_upper</td>
<td>LocationIndex</td>
<td>2 bytes</td>
<td>Offset to end of free space</td>
</tr>
<tr>
<td>pd_special</td>
<td>LocationIndex</td>
<td>2 bytes</td>
<td>Offset to start of special space</td>
</tr>
<tr>
<td>pd_pagesize_version</td>
<td>uint16</td>
<td>2 bytes</td>
<td>Page size and layout version number information</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p> All the details may be found in
<code class="filename">src/include/storage/bufpage.h</code>.
</p>
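<p>
As an illustration of the header layout, the following Python sketch decodes
the first 20 bytes of a page image that has already been read into memory.
It is not part of <span class="productname">PostgreSQL</span>; the function name
<code class="function">parse_page_header</code> is hypothetical. The sketch assumes that
<code class="structfield">pd_lsn</code> is stored as two 32-bit halves, that the page was
written by a machine with the same byte order as the one running the script,
and that the version indicator occupies the low byte of
<code class="structfield">pd_pagesize_version</code>.
</p>

```python
import struct

# "=" means native byte order, standard sizes, no padding.
# Fields: pd_lsn (assumed to be two 32-bit halves), pd_tli, pd_lower,
# pd_upper, pd_special, pd_pagesize_version.
PAGE_HEADER = struct.Struct("=IIIHHHH")


def parse_page_header(page):
    """Decode the 20-byte page header at the start of a page image."""
    (lsn_hi, lsn_lo, tli, lower, upper, special,
     size_version) = PAGE_HEADER.unpack_from(page, 0)
    version = size_version % 256        # assumed: low byte holds the version
    page_size = size_version - version  # assumed: remaining bits hold the size
    return {
        "pd_lsn": (lsn_hi, lsn_lo),
        "pd_tli": tli,
        "pd_lower": lower,
        "pd_upper": upper,
        "pd_special": special,
        "page_size": page_size,
        "version": version,
    }
```

<p>
For a page taken from a default build of
<span class="productname">PostgreSQL</span> 8.1, the result would be expected to report a
page size of 8192 and a version of 3.
</p>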
<p>
Following the page header are item identifiers
(<code class="type">ItemIdData</code>), each requiring four bytes.
An item identifier contains a byte-offset to
the start of an item, its length in bytes, and a few attribute bits
which affect its interpretation.
New item identifiers are allocated
as needed from the beginning of the unallocated space.
The number of item identifiers present can be determined by looking at
<code class="structfield">pd_lower</code>, which is increased to allocate a new identifier.
Because an item
identifier is never moved until it is freed, its index may be used on a
long-term basis to reference an item, even when the item itself is moved
around on the page to compact free space. In fact, every pointer to an
item (<code class="type">ItemPointer</code>, also known as
<code class="type">CTID</code>) created by
<span class="productname">PostgreSQL</span> consists of a page number and the
index of an item identifier.
</p>
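<p>
The relationship between <code class="structfield">pd_lower</code> and the item
identifier array can be sketched in Python as follows. The function names are
hypothetical. The bit layout used here (15 bits of offset, 2 attribute bits,
then 15 bits of length, counted from the least significant bit) is an
assumption that matches common little-endian compilers; check the header
files for the platform in question before relying on it.
</p>

```python
import struct

PAGE_HEADER_SIZE = 20   # size of the page header given in the text
ITEM_ID_SIZE = 4        # size of one item identifier
OFFSET_RANGE = 32768    # 2 ** 15, span of the assumed 15-bit offset field
FLAG_RANGE = 4          # 2 ** 2, span of the assumed 2-bit flag field


def item_id_count(pd_lower):
    """Number of item identifiers implied by pd_lower."""
    return (pd_lower - PAGE_HEADER_SIZE) // ITEM_ID_SIZE


def decode_item_id(page, index):
    """Decode item identifier `index` (1-based, as in a CTID).

    Returns (offset, flags, length) under the assumed bit layout.
    """
    start = PAGE_HEADER_SIZE + (index - 1) * ITEM_ID_SIZE
    (word,) = struct.unpack_from("=I", page, start)
    offset = word % OFFSET_RANGE                  # lowest 15 bits
    flags = (word // OFFSET_RANGE) % FLAG_RANGE   # next 2 bits
    length = word // (OFFSET_RANGE * FLAG_RANGE)  # remaining 15 bits
    return offset, flags, length
```

<p>
Because the index passed to <code class="function">decode_item_id</code> stays valid
while the item itself moves, a caller would always go through the identifier
to find the current offset rather than caching the offset.
</p>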
<p>
The items themselves are stored in space allocated backwards from the end
of unallocated space. The exact structure varies depending on what the
table is to contain. Tables and sequences both use a structure named
<code class="type">HeapTupleHeaderData</code>, described below.
</p>
<p>
The final section is the “<span class="quote">special section</span>”, which may
contain anything the access method wishes to store. For example,
B-tree indexes store links to the page's left and right siblings,
as well as some other data relevant to the index structure.
Ordinary tables do not use a special section at all (indicated by setting
<code class="structfield">pd_special</code> to equal the page size).
</p>
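<p>
Taken together, the three offsets in the page header are enough to divide a
page into its five parts. The Python sketch below shows the arithmetic; the
function name <code class="function">page_regions</code> and the default page size of
8192 bytes are assumptions made for the example.
</p>

```python
def page_regions(pd_lower, pd_upper, pd_special, page_size=8192, header_size=20):
    """Return (name, start, end) byte ranges for the five parts of a page.

    Each end offset is exclusive, so end - start is the size of the part.
    """
    return [
        ("page header", 0, header_size),
        ("item identifiers", header_size, pd_lower),
        ("free space", pd_lower, pd_upper),
        ("items", pd_upper, pd_special),
        ("special space", pd_special, page_size),
    ]
```

<p>
For an ordinary table page, where <code class="structfield">pd_special</code> equals the
page size, the last range is empty.
</p>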
<p>
All table rows are structured in the same way. There is a fixed-size
header (occupying 27 bytes on most machines), followed by an optional null
bitmap, an optional object ID field, and the user data. The header is
detailed
in <a href="storage-page-layout.html#heaptupleheaderdata-table" title="Table 50.4. HeapTupleHeaderData Layout">Table 50.4, “HeapTupleHeaderData Layout”</a>. The actual user data
(columns of the row) begins at the offset indicated by
<code class="structfield">t_hoff</code>, which must always be a multiple of the MAXALIGN
distance for the platform.
The null bitmap is
only present if the <em class="firstterm">HEAP_HASNULL</em> bit is set in
<code class="structfield">t_infomask</code>. If it is present it begins just after
the fixed header and occupies enough bytes to have one bit per data column
(that is, <code class="structfield">t_natts</code> bits altogether). In this list of bits, a
1 bit indicates a not-null value and a 0 bit indicates a null. When the bitmap is not
present, all columns are assumed to be not-null.
The object ID is only present if the <em class="firstterm">HEAP_HASOID</em> bit
is set in <code class="structfield">t_infomask</code>. If present, it appears just
before the <code class="structfield">t_hoff</code> boundary. Any padding needed to make
<code class="structfield">t_hoff</code> a MAXALIGN multiple will appear between the null
bitmap and the object ID. (This in turn ensures that the object ID is
suitably aligned.)
</p>
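<p>
The following Python sketch works through the arithmetic behind
<code class="structfield">t_hoff</code> and the null bitmap. All function names are
hypothetical. It assumes a 4-byte object ID, a default MAXALIGN distance of 8
(4 is also common), and that the first column is represented by the least
significant bit of the first bitmap byte.
</p>

```python
FIXED_HEADER_SIZE = 27   # size of the fixed header given in the text
OID_SIZE = 4             # assumed size of an object ID


def align_up(length, maxalign):
    """Round length up to the next multiple of maxalign."""
    return (length + maxalign - 1) // maxalign * maxalign


def compute_t_hoff(natts, has_nulls, has_oid, maxalign=8):
    """Offset of the user data: header, optional bitmap, padding, optional OID."""
    length = FIXED_HEADER_SIZE
    if has_nulls:
        length += (natts + 7) // 8   # one bit per column, rounded up to bytes
    if has_oid:
        length += OID_SIZE           # the OID ends exactly at t_hoff
    return align_up(length, maxalign)


def column_is_null(null_bitmap, column):
    """True if the 0-based column is null; a 1 bit means not-null."""
    byte = null_bitmap[column // 8]
    bit = (byte // 2 ** (column % 8)) % 2
    return bit == 0
```

<p>
Under these assumptions, a four-column row with no nulls and no object ID has
its user data at offset 32 when MAXALIGN is 8, and at offset 28 when MAXALIGN
is 4.
</p>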
<div class="table">
<a name="heaptupleheaderdata-table"></a><p class="title"><b>Table 50.4. HeapTupleHeaderData Layout</b></p>
<div class="table-contents"><table summary="HeapTupleHeaderData Layout" border="1">
<colgroup>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>Field</th>
<th>Type</th>
<th>Length</th>
<th>Description</th>
</tr></thead>
<tbody>
<tr>
<td>t_xmin</td>
<td>TransactionId</td>
<td>4 bytes</td>
<td>insert XID stamp</td>
</tr>
<tr>
<td>t_cmin</td>
<td>CommandId</td>
<td>4 bytes</td>
<td>insert CID stamp</td>
</tr>
<tr>
<td>t_xmax</td>
<td>TransactionId</td>
<td>4 bytes</td>
<td>delete XID stamp</td>
</tr>
<tr>
<td>t_cmax</td>
<td>CommandId</td>
<td>4 bytes</td>
<td>delete CID stamp (shares its 4 bytes with t_xvac)</td>
</tr>
<tr>
<td>t_xvac</td>
<td>TransactionId</td>
<td>4 bytes</td>
<td>XID for VACUUM operation moving a row version (shares its 4 bytes with t_cmax)</td>
</tr>
<tr>
<td>t_ctid</td>
<td>ItemPointerData</td>
<td>6 bytes</td>
<td>current TID of this or newer row version</td>
</tr>
<tr>
<td>t_natts</td>
<td>int16</td>
<td>2 bytes</td>
<td>number of attributes</td>
</tr>
<tr>
<td>t_infomask</td>
<td>uint16</td>
<td>2 bytes</td>
<td>various flag bits</td>
</tr>
<tr>
<td>t_hoff</td>
<td>uint8</td>
<td>1 byte</td>
<td>offset to user data</td>
</tr>
</tbody>
</table></div>
</div>
<br class="table-break"><p> All the details may be found in
<code class="filename">src/include/access/htup.h</code>.
</p>
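<p>
As a rough illustration of the table above, the fixed part of a row header
can be unpacked in Python as shown below. The function name
<code class="function">parse_tuple_header</code> is hypothetical. The sketch assumes
native byte order, and it assumes that the 6-byte
<code class="structfield">t_ctid</code> consists of three 16-bit values: the high and low
halves of the page number followed by the item identifier index.
</p>

```python
import struct

# Field order follows the table; t_cmax and t_xvac occupy the same 4 bytes,
# so only one slot appears for them. "=" disables padding, which gives the
# 27-byte size quoted in the text.
TUPLE_HEADER = struct.Struct("=IIIIHHHhHB")


def parse_tuple_header(item):
    """Decode the fixed-size header at the start of a table row."""
    (xmin, cmin, xmax, cmax_or_xvac,
     block_hi, block_lo, ctid_index,
     natts, infomask, hoff) = TUPLE_HEADER.unpack_from(item, 0)
    return {
        "t_xmin": xmin,
        "t_cmin": cmin,
        "t_xmax": xmax,
        "t_cmax_or_xvac": cmax_or_xvac,
        # assumed split of the page number into two 16-bit halves
        "t_ctid": (block_hi * 65536 + block_lo, ctid_index),
        "t_natts": natts,
        "t_infomask": infomask,
        "t_hoff": hoff,
    }
```

<p>
The user data of the row then starts at the returned
<code class="structfield">t_hoff</code> value, counted from the start of the item.
</p>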
<p>
Interpreting the actual data can only be done with information obtained
from other tables, mostly <code class="structname">pg_attribute</code>. The
key values needed to identify field locations are
<code class="structfield">attlen</code> and <code class="structfield">attalign</code>.
There is no way to locate a
particular attribute directly, except when there are only fixed-width fields and no
null values. This logic is wrapped up in the functions
<em class="firstterm">heap_getattr</em>, <em class="firstterm">fastgetattr</em>
and <em class="firstterm">heap_getsysattr</em>.
</p>
<p>
To read the data, you need to examine each attribute in turn. First check
whether the field is NULL according to the null bitmap. If it is, go to
the next field. Otherwise, advance to the alignment required by the field. If the field is a
fixed-width field, its bytes are stored directly at that position. If it is a
variable-length field (attlen = -1), a little more work is needed.
All variable-length datatypes share the common header structure
<code class="type">varattrib</code>, which includes the total length of the stored
value and some flag bits. Depending on the flags, the data may be either
inline or in a <acronym class="acronym">TOAST</acronym> table;
it might be compressed, too (see <a href="storage-toast.html" title="50.2. TOAST">Section 50.2, “TOAST”</a>).
</p>
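<p>
The walk described above can be sketched in Python as follows. This is a
simplified illustration, not a replacement for
<code class="function">heap_getattr</code>: the function name
<code class="function">attribute_spans</code> is hypothetical, the mapping from
<code class="structfield">attalign</code> codes to byte counts is an assumption (the value
for <code class="literal">d</code> varies by platform), and the variable-length header is
assumed to be a 4-byte word whose low 30 bits hold the total length. The
sketch only locates the bytes of each column; it does not follow
<acronym class="acronym">TOAST</acronym> pointers or decompress anything.
</p>

```python
import struct

ALIGN_BYTES = {"c": 1, "s": 2, "i": 4, "d": 8}   # assumed attalign mapping
VARLENA_SIZE_RANGE = 2 ** 30                     # assumed: low 30 bits = length


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def attribute_spans(data, columns, is_null):
    """Locate each column's bytes in the user data of one row.

    data     bytes of the row starting at t_hoff
    columns  list of (attlen, attalign) pairs in column order
    is_null  list of booleans taken from the null bitmap
    Returns a list of (start, end) pairs, with None for null columns.
    """
    spans = []
    offset = 0
    for (attlen, attalign), null in zip(columns, is_null):
        if null:
            spans.append(None)       # a null column stores no bytes
            continue
        offset = align_up(offset, ALIGN_BYTES[attalign])
        if attlen > 0:
            length = attlen          # fixed-width field
        elif attlen == -1:
            (header,) = struct.unpack_from("=I", data, offset)
            length = header % VARLENA_SIZE_RANGE   # includes the header itself
        else:
            raise ValueError("unsupported attlen %d" % attlen)
        spans.append((offset, offset + length))
        offset += length
    return spans
```

<p>
The example also shows why a column cannot be located directly once a
variable-length or null column precedes it: its offset depends on the
lengths found while walking the earlier columns.
</p>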
<div class="footnotes">
<br><hr width="100" align="left">
<div class="footnote"><p><sup>[<a name="ftn.id845399" href="#id845399">10</a>] </sup> Actually, index access methods need not use this page format.
All the existing index methods do use this basic format,
but the data kept on index metapages usually doesn't follow
the item layout rules.
</p></div>
</div>
</div></body>
</html>