To Save: Michael Goldberg: Write-Ahead Logging (WAL)

Michael Goldberg: Write-Ahead Logging (WAL)

via Planet PostgreSQL

Introduction

Write-Ahead Logging (WAL) is a standard method for ensuring data integrity. Briefly, WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage.

The Problem and the Solution

The buffer cache (and other buffers in RAM) increases performance, but decreases reliability. In the event of a failure in the DBMS, the contents of the buffer cache is lost. If a crash occurs in the operating system or at the hardware level, the contents of the OS buffers is also lost (but the operating system itself copes with this).

PostgreSQL uses write-ahead logging to ensure reliability. When performing any operation, a record is created containing the minimum necessary information so that the operation can be performed again.

Such a record is necessarily saved to the disk (or other non-volatile device) before the modified page gets there (that is why it is called the write- ahead log).

Protection

WAL protects all objects that are handled in RAM buffers: tables, indexes and other objects, transactions status. The log does not contain data on temporary tables (such tables are accessible only to the user who created it and only for the duration of the session or transaction) and unlogged tables (such tables are no different from regular tables, except that they are not protected by the log). In case of a failure, such tables are simply cleared. The purpose of their existence is that work with them is much faster.

Record Structure

Logically, a log can be thought of as a sequence of records of various lengths. Each record contains data on a certain operation, preceded by a header. The header, among other things, indicates:

– number of the transaction to which the record belongs;
– resource manager – the system component responsible for this record;
– checksum (CRC)

The data itself can have different meanings. The resource manager "understands" how to interpret the data in its record. There are separate managers for the tables, for each type of indexes, for the status of transactions, etc. For example, the data can be some fragment of a page that needs to be written over its contents with a certain offset.

In order to refer to a specific record, the data type pg_lsn (LSN = log sequence number) is used – a 64-bit number representing the byte offset before the record relative to the beginning of the log:

On disk, the log is stored as files in the $ PGDATA / pg_wal directory. Each file takes 16 MB. The size can only be changed when compiling the source codes (starting with PostgreSQL 11, the size can be changed when the cluster is initialized).

Storage

Log entries saved into the current used file; when it ends, the next file begins to be used. Special buffers are allocated for the log in RAM. The cache size is set by the wal_buffers parameter (the default value means automatic configuration: 1/32 of the buffer cache is allocated).

WAL cache is arranged like a buffer cache, but it works primarily in the ring buffer mode: entries are added to the "head" of the buffer and written to the disk from the "tail".

PostgreSQL WAL Official documentation

The post Write-Ahead Logging (WAL) first appeared on Awide.