1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
|
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>26.4.WAL Internals</title>
<link rel="stylesheet" href="stylesheet.css" type="text/css">
<link rev="made" href="pgsql-docs@postgresql.org">
<meta name="generator" content="DocBook XSL Stylesheets V1.70.0">
<link rel="start" href="index.html" title="PostgreSQL 8.1.4 Documentation">
<link rel="up" href="wal.html" title="Chapter26.Reliability and the Write-Ahead Log">
<link rel="prev" href="wal-configuration.html" title="26.3.WAL Configuration">
<link rel="next" href="regress.html" title="Chapter27.Regression Tests">
<link rel="copyright" href="ln-legalnotice.html" title="Legal Notice">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="sect1" lang="en">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="wal-internals"></a>26.4.WAL Internals</h2></div></div></div>
<p> <acronym class="acronym">WAL</acronym> is automatically enabled; no action is
required from the administrator except ensuring that the
disk-space requirements for the <acronym class="acronym">WAL</acronym> logs are met,
and that any necessary tuning is done (see <a href="wal-configuration.html" title="26.3.WAL Configuration">Section26.3, “<acronym class="acronym">WAL</acronym> Configuration”</a>).
</p>
<p> <acronym class="acronym">WAL</acronym> logs are stored in the directory
<code class="filename">pg_xlog</code> under the data directory, as a set of
segment files, normally each 16 MB in size. Each segment is divided into
pages, normally 8 KB each. The log record headers are described in
<code class="filename">access/xlog.h</code>; the record content is dependent
on the type of event that is being logged. Segment files are given
ever-increasing numbers as names, starting at
<code class="filename">000000010000000000000000</code>. The numbers do not wrap, at
present, but it should take a very very long time to exhaust the
available stock of numbers.
</p>
<p> It is of advantage if the log is located on another disk than the
main database files. This may be achieved by moving the directory
<code class="filename">pg_xlog</code> to another location (while the server
is shut down, of course) and creating a symbolic link from the
original location in the main data directory to the new location.
</p>
<p> The aim of <acronym class="acronym">WAL</acronym>, to ensure that the log is
written before database records are altered, may be subverted by
disk drives<a name="id674272"></a> that falsely report a
successful write to the kernel,
when in fact they have only cached the data and not yet stored it
on the disk. A power failure in such a situation may still lead to
irrecoverable data corruption. Administrators should try to ensure
that disks holding <span class="productname">PostgreSQL</span>'s
<acronym class="acronym">WAL</acronym> log files do not make such false reports.
</p>
<p> After a checkpoint has been made and the log flushed, the
checkpoint's position is saved in the file
<code class="filename">pg_control</code>. Therefore, when recovery is to be
done, the server first reads <code class="filename">pg_control</code> and
then the checkpoint record; then it performs the REDO operation by
scanning forward from the log position indicated in the checkpoint
record. Because the entire content of data pages is saved in the
log on the first page modification after a checkpoint, all pages
changed since the checkpoint will be restored to a consistent
state.
</p>
<p> To deal with the case where <code class="filename">pg_control</code> is
corrupted, we should support the possibility of scanning existing log
segments in reverse order [mdash ] newest to oldest [mdash ] in order to find the
latest checkpoint. This has not been implemented yet.
<code class="filename">pg_control</code> is small enough (less than one disk page)
that it is not subject to partial-write problems, and as of this writing
there have been no reports of database failures due solely to inability
to read <code class="filename">pg_control</code> itself. So while it is
theoretically a weak spot, <code class="filename">pg_control</code> does not
seem to be a problem in practice.
</p>
</div></body>
</html>
|