Monday, August 29, 2022

[UPDATE] WizTree v4.10

WizTree is a very fast disk space analyzer and search tool. It scans a single folder or your entire hard drive quickly and shows which files and folders use the most disk space. The program reads the NTFS Master File Table (MFT) in Windows, which is what makes the scan so fast.

Note that this high-speed mode is limited to local, NTFS-formatted drives. The program must do a (much slower) scan of FAT-formatted or network drives for analysis.


How to calculate the size of a database?

Often even DBAs are not aligned on what is meant by "database size". As data and databases grow at a rather high rate, it is important to understand the size of the database estate. Worldwide data is expected to hit 175 zettabytes by 2025, representing a 61% CAGR, with 51% of that data in data centers and 49% in the public cloud. So, when moving to the cloud, it is important to know how much storage is needed for the databases.

Ever wondered which is the biggest database in the world? Here is an interesting link: 10 Largest Databases in the World. "Really? Sure about that? How do you actually know if it's true? Who is the source of information, and can we trust it?" I was wondering exactly the same when checking the list.

And also: how should all the data be stored? According to Data: A Lenovo Solutions Perspective, the answer is that within the core (i.e. data center or cloud) the classic relational database will still be the dominant approach for many years. Thus, one more reason to properly estimate and calculate the size of your databases.

So, how do we calculate the size of, say, an Oracle database (check the links at the end for other database brands)? The most common way is to simply calculate the space which the database files physically consume on disk:

select sum(bytes)/1024/1024/1024 size_in_GB from dba_data_files;

But how about the other files? Like temp, control and redolog files? Are the undo files really part of the database? Do we always want to include the files from SYSTEM and SYSAUX too?

Also, not all of the space in the files listed in dba_data_files is necessarily allocated to segments; sections of these files may be unused. In that case, we can go with this method:

select sum(bytes)/1024/1024/1024 size_in_GB from dba_segments;

I often prefer using the query which ignores the system, temp and undo data:

select nvl(ceil(sum(bytes)/(1024*1024*1024)),0) size_in_GB
  from dba_extents
 where tablespace_name not in ('SYSTEM','TEMP','SYSAUX')
   and tablespace_name not like '%UNDO%';

Another approach is to calculate only the size of the real data in each schema – the methods above include the indexes too. This requires recently gathered statistics for all tables (computed, i.e. a 100% sample, not estimated).

select owner, nvl(ceil(sum(num_rows*avg_row_len)/(1024*1024*1024)),0) size_in_GB
  from dba_tables
 where owner not in ('SYSTEM','OUTLN','SYS')
 group by owner;

However, checking the overall size including TEMP and REDO is perhaps the best approach. There is a MOS note, How to Calculate the Size of the Database (Doc ID 1360446.1) which is also worth reading. Here is the suggested method to calculate the total database size:

select a.data_size + b.temp_size + c.redo_size + d.cont_size "total_size"
  from ( select sum(bytes) data_size from dba_data_files ) a,
       ( select nvl(sum(bytes),0) temp_size from dba_temp_files ) b,
       ( select sum(bytes) redo_size
           from sys.v_$logfile lf, sys.v_$log l
          where lf.group# = l.group# ) c,
       ( select sum(block_size*file_size_blks) cont_size from v$controlfile ) d;

If you have enabled block change tracking and want to be really pedantic with the database size, you should also add the BCT file.
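For example, the size of that file can be checked with a quick query (a small sketch; the view returns a single row, and BYTES is NULL when block change tracking is disabled):

select status, filename, bytes/1024/1024 size_in_MB from v$block_change_tracking;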

Here are some additional links and a query which will help you find indexes which are bigger than tables.

Database Size in Oracle by Bijay Kumar Sahoo

How to calculate current DB size from asktom

Estimate the Size of a Database in SQL Server

How to calculate total size of the database

How to calculate the size of a MySQL database

select owner "TABLE_OWNER", tablename "TABLE_NAME", tablesize "TABLE SIZE (GB)", indexname "INDEX_NAME", indexsize "INDEX SIZE (GB)", indexsize/tablesize "INDEX/TABLE" from     (     with     tabs as (select owner, segment_name tablename,sum(bytes/1024/1024/1024) tablesize from dba_segments where segment_type='TABLE' group by owner, segment_name),     inds as (select i.owner, i.index_name indexname, i.table_name tablename, sum(s.bytes/1024/1024/1024) indexsize from dba_indexes i join dba_segments s on (i.owner=s.owner and i.index_name=s.segment_name) group by i.owner, i.index_name, i.table_name)     select * from tabs natural join inds where indexsize > tablesize and indexsize>1     )     order by indexsize/tablesize desc;

Finally, there is a view in Oracle called dba_hist_tbspc_space_usage, which displays historical tablespace usage statistics. With it you can retrieve the historical growth of the tablespaces, and thus of the database as a whole. Check this blog post: How to retrieve growth history for Oracle tablespaces. Note that the history reaches back only as far as AWR data is retained. It is considered good practice to keep track of the database size on, say, a weekly basis: you will then always be able to answer questions about capacity needs, trends, growth, size, etc.
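A minimal sketch of such a growth query (it assumes AWR retention covers the period of interest and a default 8K block size; adjust the block size if yours differs):

select t.name tablespace_name,
       trunc(to_date(h.rtime,'MM/DD/YYYY HH24:MI:SS')) snap_day,
       round(max(h.tablespace_usedsize)*8192/1024/1024/1024,2) used_GB
  from dba_hist_tbspc_space_usage h
  join v$tablespace t on t.ts# = h.tablespace_id
 group by t.name, trunc(to_date(h.rtime,'MM/DD/YYYY HH24:MI:SS'))
 order by tablespace_name, snap_day;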


Monday, August 22, 2022

Michael Goldberg: Write-Ahead Logging (WAL)

Introduction

Write-Ahead Logging (WAL) is a standard method for ensuring data integrity. Briefly, WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage.

The Problem and the Solution

The buffer cache (and other buffers in RAM) increases performance, but decreases reliability. In the event of a failure in the DBMS, the contents of the buffer cache are lost. If a crash occurs in the operating system or at the hardware level, the contents of the OS buffers are also lost (but the operating system itself copes with this).

PostgreSQL uses write-ahead logging to ensure reliability. When performing any operation, a record is created containing the minimum necessary information so that the operation can be performed again. 

Such a record must be saved to disk (or another non-volatile device) before the modified page gets there – that is why it is called the write-ahead log.

Protection

WAL protects all objects that are handled in RAM buffers: tables, indexes and other objects, and transaction status. The log does not contain data on temporary tables (such tables are accessible only to the user who created them, and only for the duration of the session or transaction) or on unlogged tables (which are no different from regular tables, except that they are not protected by the log). In case of a failure, such tables are simply cleared. The purpose of their existence is that working with them is much faster.

Record Structure

Logically, a log can be thought of as a sequence of records of various lengths. Each record contains data on a certain operation, preceded by a header. The header, among other things, indicates:

– number of the transaction to which the record belongs;
– resource manager – the system component responsible for this record;
– checksum (CRC).

The data itself can have different meanings. The resource manager "understands" how to interpret the data in its record. There are separate managers for tables, for each type of index, for the status of transactions, etc. For example, the data can be a fragment of a page that needs to be written over the page contents at a certain offset.

In order to refer to a specific record, the data type pg_lsn (LSN = log sequence number) is used – a 64-bit number representing the byte offset of the record relative to the beginning of the log.
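For example (a small sketch using standard functions, available since PostgreSQL 10), the current WAL position can be read as a pg_lsn value, and two positions can be subtracted to get a distance in bytes:

SELECT pg_current_wal_lsn();                                        -- current insert position, of type pg_lsn
SELECT pg_walfile_name(pg_current_wal_lsn());                       -- WAL segment file this position falls into
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') AS bytes_total; -- byte offset from the start of the log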

 

On disk, the log is stored as files in the $PGDATA/pg_wal directory. Each file takes up 16 MB. The size can only be changed when compiling from source (starting with PostgreSQL 11, it can also be changed when the cluster is initialized).

Storage

Log entries are saved into the currently used file; when it fills up, the next file starts being used. Special buffers are allocated for the log in RAM. The cache size is set by the wal_buffers parameter (the default value means automatic configuration: 1/32 of the buffer cache is allocated).

The WAL cache is arranged like the buffer cache, but it works primarily in ring-buffer mode: entries are added at the "head" of the buffer and written to disk from the "tail".
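For illustration, the segment size, the WAL buffer setting and the files currently in pg_wal can be inspected like this (a small sketch; pg_ls_waldir() requires superuser or the pg_monitor role):

SHOW wal_segment_size;
SHOW wal_buffers;
SELECT name, size, modification FROM pg_ls_waldir() ORDER BY name LIMIT 5;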

 

PostgreSQL WAL Official documentation

The post Write-Ahead Logging (WAL) first appeared on Awide.


Saturday, August 20, 2022

The blog is back

via Expedition Data by Marcel-Jan Krijgsman

Well, that was scary. Just before I went on holiday I switched providers for my marcel-jan.eu domain. And while I had some time built in before going on vacation, there were problems with the transfer code not working. Because apparently the .eu domain is different from the regular .nl domain.

In the end I managed to get my marcel-jan.eu mail working just the evening before leaving. But I saw no way to migrate the blog while packing my bags. So the blog was down for more than 2 weeks. Did anybody miss it?

After getting back home I had to piece the WordPress blog back together from a .zip backup and a backup of the filesystem. I had never done such a thing before. And the original WordPress blog on my old provider's site was already gone, so there were no alternatives left for doing a better export.

Importing did not go as planned

I started by installing WordPress at my new provider's site. And I went to PHPMyAdmin, which is the tool to work with the database behind WordPress. I imported the .zip (with a .sql file in it). And.. no blogposts. A further look with PHPMyAdmin in the database showed that there were several xxx_posts tables. The one the WordPress site was looking at was wplx_posts. My imported tables were called wp_posts and 4a2vK12BOL_posts. wp_posts contained old stuff. The 4a2vK12BOL_posts table turned out to have all my posts.

Time to play dirty with SQL

So how do I point WordPress to the right data? It's good to have some SQL skills. What if.. hear me out.. I read the .sql file I got from the export, pick out the SQL that imports the 4a2vK12BOL_posts table, search and replace the term "4a2vK12BOL_posts" with "wplx_posts" in a text editor, and then import that? It's dirty, I grant you that.

But it turns out, it works. As long as you don't create any new posts beforehand that use the same ID as the ones you try to import. A quick removal of the Hello World post made sure of that.
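Roughly, the edit boils down to this (a sketch; the column list is abbreviated and the prefixes are the ones from my situation – yours will differ):

-- Line in the export as delivered:
--   INSERT INTO `4a2vK12BOL_posts` (`ID`, `post_title`, `post_content`, ...) VALUES (...);
-- After search-and-replace it targets the prefix the new site actually uses:
--   INSERT INTO `wplx_posts` (`ID`, `post_title`, `post_content`, ...) VALUES (...);

-- Get rid of the default "Hello world!" post first, so its ID can't collide with an imported post:
DELETE FROM wplx_posts WHERE post_title = 'Hello world!';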

And it worked. I got my posts back. Okay, that's something. I don't have to type all my writings from 2017 to now again.

I did something similar for the comments. Make sure you do that before the first comment spam arrives, because its ID in the comments table will overlap with the ones you try to import.

Now I need some images

I was not really surprised that restoring table contents did nothing for my images. Pretty sure that had to come from the filesystem. Luckily I had made a backup of all that. But where to get the image files and where to put them?

Well, looking over the sql for the posts table, I found references to image files like this one: https://marcel-jan.eu/datablog/wp-content/uploads/2017/11/Heart-Reanimation-65992.gif. So somewhere there should be a path with something like wp-content/uploads in the name and a lot of gifs and jpgs in it. I found that, uploaded the directories to the new site and now I had my images back.

That one time I used TablePress

My article about Lion's Mane is one of the most popular blogposts for some reason. Lots of people who want to gain cognitive enhancement. (I wished my post about becoming a skeptic was just as popular. Oh well.) In that post was my one use of a TablePress table. How to get that back?

It turns out the data can be found in the options table. But I had some doubts whether importing it would mess other things up and whether TablePress would find it. So I dug in the Internet Archive to find the contents of the table, and used Excel to create a csv file of that table. Imported that in TablePress and hey presto: we got ourselves our table back.

Tags and categories

One thing I noticed was that my categories and tags were gone. The categories were a big mess after 5 years of blogging, so actually that wasn't a big loss. More like a good moment to rethink them. As for tags: it would be nice to retrieve them somehow.

Fortunately there is documentation on the data model of WordPress' database. Like this site: https://wp-staging.com/docs/the-wordpress-database-structure/

From this I learned what tables I needed to import to get my tags back. It turns out it's wplx_term_taxonomy and wplx_term_relationships. In wplx_term_taxonomy there were already 3 IDs taken. ID 2 and 3 were now a wp_theme, where in my old table they were categories.

I decided to remove ID 1, 2 and 3 from my insert statement and import that. If I'm missing 2 categories, that won't hurt me a lot.

Anything else?

From the wp-staging article I learned I probably won't be needing much more from the import. Maybe I will be missing some stuff from the options table, because there's all kinds of stuff that plugins put there. But I'm not going to open that can of worms.

I certainly learned a lot about WordPress and its database.. forcefully. Glad the blog is back on the road at my new provider.

Coverart by DALL-E 2


Fraser angry about fuss following remarks about co-captain's paternity day

Mark van der Maarel became a father for the second time last week and decided to skip his team FC Utrecht's match in Emmen on Saturday evening to take a paternity day. The captain of the Utrecht side received a lot of criticism, but coach Henk Fraser defended his player: knowing what he knows now, he would have made the same choice.


Friday, August 19, 2022

Time to rebuild indexes in the Oracle database?

One of the most controversial topics among DBAs is whether to rebuild indexes or not.

But it is so generic that the answer to the question depends on the database version, the index type and the reasons behind doing it.

In most cases, we consider b-tree indexes for rebuild. Note that b stands not for "binary" but for "balanced". We create indexes mostly for performance reasons. If performance is the real concern, we need to first understand, at least at a high level, how those indexes are organized.

The MOS note Lists All Indexes that Benefit from a Rebuild (Doc ID 122008.1) suggests that indexes considered for rebuild are those for which:

– deleted entries represent 20% or more of the current entries
– the index depth is more than 4 levels

And a possible candidate for a bitmap index was flagged when more than 99% of the index entries were non-distinct (i.e. very low cardinality):

if ( height > 5 ) OR ( (del_lf_rows/lf_rows) > 0.2 ) then
    dbms_output.put_line (rpad(r_indx.owner,16,' ') || rpad(r_indx.index_name,40,' ') ||
                          lpad(round((del_lf_rows/lf_rows)*100,3),17,' ') ||
                          lpad(height-1,7,' ') ||
                          lpad(round((lf_rows-distinct_keys)*100/lf_rows,3),16,' '));
end if;
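The figures used above (height, lf_rows, del_lf_rows, distinct_keys) come from the session-level INDEX_STATS view, which is populated by validating the index structure. A minimal sketch (the index name is just an example, and keep in mind that VALIDATE STRUCTURE takes a lock on the underlying table):

analyze index scott.emp_pk validate structure;

select name, height, lf_rows, del_lf_rows, distinct_keys,
       round(del_lf_rows*100/nullif(lf_rows,0),2) pct_deleted
  from index_stats;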

However, it is a good question whether this still applies to version 11g and above. I have personally seen good benefits from rebuilding indexes that satisfy those conditions in lower versions of the database.

Another MOS article, Script to investigate a b-tree index structure (Doc ID 989186.1), provides a script which is quite handy as it verifies the structure of a b-tree index based on the existing table and index statistics.  The script calculates the following items:

– An estimate of the size the index should be, as optimal packing can be specified

– The index layout

This script keeps a history of the data gathered in the INDEX_HIST table. This can be useful for spotting a pattern of repeated index rebuilds. The history log can be user defined.

Yet another MOS article, Index Rebuild, the Need vs the Implications (Doc ID 989093.1), specifies that the most common justifications given for rebuilding an index are:
– index becomes fragmented
– index grows and grows – deleted space is not re-used
– index clustering factor becomes out of sync

However, as noted, the impact of rebuilding the index can be quite significant:

  • Most scripts around depend on the index_stats dynamic table
  • Redo activity may increase and general performance might be impacted as a direct result of rebuilding an index

An index coalesce is often preferred to an index rebuild (see the sketch after this list). It has the following advantages:

  • does not require approximately 2 times the disk storage
  • always online
  • does not restructure the index, but combines index leaf blocks as much as possible, avoiding system overhead
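A one-line illustration of the difference (the index name is made up):

alter index app_user.orders_status_idx coalesce;
-- versus a full rebuild, which needs roughly double the space while it runs:
alter index app_user.orders_status_idx rebuild online;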

Due to the reasons listed above, it is strongly advised not to rebuild indexes on a regular basis but instead use proper diagnostics.

Next, here is my personal experience from rebuilding indexes: either drop and create or simply rebuild. I have seen 3 main use cases:

  1. For performance reasons: because of too many deleted rows or because of split index nodes. With every new release of Oracle, it seems we need to spend less and less time on such rebuild operations.
  2. Indexes get fragmented over time and occupy too much space. There was a mission-critical production database where the data was about 500MB and the indexes all together were about 4TB. You can sometimes release quite a lot of space.
  3. Indexes should, in my opinion, be in a separate tablespace, not where the data is. If something goes wrong with the index tablespace – logical or physical corruption, or files deleted unintentionally – then it is just a matter of time to recreate the indexes.

In Oracle 21c, there is a new feature called Automatic Index Optimization. The optimization process includes 3 actions:

• Compress: Compresses portions of the key values in an index segment (~3 times)

• Shrink: Merges the contents of index blocks where possible to free blocks for reuse

• Rebuild: Rebuilds an index to improve space usage and access speed

For a very long time, both DBAs and developers have been struggling (really struggling) with which indexes should be created, what type of indexes they should be, and which indexes should be dropped from the database. By far the most interesting new feature of Oracle Database 19c is Automatic Index creation (AI Creation). In the long run, this is likely to be one of the most important features in the Oracle database. Note that you cannot rebuild an auto index! A nice and useful capability of AI is that Oracle automatically rebuilds indexes that are marked as "Unusable".
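For reference, automatic indexing is driven by the DBMS_AUTO_INDEX package; a minimal sketch of switching it on and pulling an activity report (the parameter values shown are the documented ones):

exec dbms_auto_index.configure('AUTO_INDEX_MODE','IMPLEMENT');

select dbms_auto_index.report_activity() from dual;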

For more, check:

Automatic Indexes: Automatically Rebuild Unusable Indexes Part I by Richard Foote

Oracle B-Tree Index Internals: Rebuilding The Truth:

"Generally rebuild index when the clustering factor exceeds eight times the number of dirty blocks in the base table, when the levels exceed two or when there are excessive brown nodes in the index".

When an index should be rebuilt? by Gouranga

SQL Server: Reorganize and Rebuild Indexes in the Database

"Microsoft recommends fixing index fragmentation issues by rebuilding the index if the fragmentation percentage of the index exceeds 30%, where it recommends fixing the index fragmentation issue by reorganizing the index if the index fragmentation percentage exceeds 5% and less than 30%"

How Often Do You Rebuild Indexes?

Rebuilding Indexes by Jonathan Lewis


Wednesday, August 17, 2022

How to corrupt your PostgreSQL database

Of course most people do not want to corrupt their databases. These people will profit from avoiding the techniques used in this article. But for some, it might be useful to corrupt a database, for example if you want to test a tool or procedure that will be used to detect or fix data corruption.

Prerequisites

We need a database with some data in it, and for some of our experiments, we will need to have some ongoing activity. For that, we can use the built-in PostgreSQL benchmark pgbench. We use scale factor 100, so that the largest table contains 10 million rows:

    $ pgbench -q -i -s 100
    dropping old tables...
    creating tables...
    generating data (client-side)...
    10000000 of 10000000 tuples (100%) done (elapsed 7.44 s, remaining 0.00 s)
    vacuuming...
    creating primary keys...
    done in 10.12 s (drop tables 0.18 s, create tables 0.01 s, client-side generate 7.52 s, vacuum 0.14 s, primary keys 2.28 s).

Load will be generated with 5 concurrent client sessions:

     $ pgbench -c 5 -T 3600     

Creating a corrupt database by setting fsync = off

Let's set fsync = off in postgresql.conf and power off the server while it is under load.
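One way to do that without editing postgresql.conf by hand (a quick sketch; ALTER SYSTEM writes to postgresql.auto.conf, and fsync can be changed with a plain reload):

    postgres=# ALTER SYSTEM SET fsync = off;
    postgres=# SELECT pg_reload_conf();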

After a few attempts, we can detect data corruption with the amcheck extension:

    postgres=# CREATE EXTENSION amcheck;
    CREATE EXTENSION
    postgres=# SELECT bt_index_parent_check('pgbench_accounts_pkey', TRUE, TRUE);
    WARNING:  concurrent delete in progress within table "pgbench_accounts"
    ERROR:  could not access status of transaction 1949706
    DETAIL:  Could not read from file "pg_subtrans/001D" at offset 196608: read too few bytes.
    CONTEXT:  while checking uniqueness of tuple (131074,45) in relation "pgbench_accounts"

What happened? Data were no longer flushed to disk in the correct order, so that data modifications could hit the disk before the WAL did. This leads to data corruption during crash recovery.

Creating a corrupt database from a backup

While pgbench is running, we create a base backup:

    $ psql
    postgres=# SELECT pg_backup_start('test');
     pg_backup_start
    ═════════════════
     1/47F8A130
    (1 row)

Note that since I am using PostgreSQL v15, the function to start backup mode is pg_backup_start() rather than pg_start_backup(). This is because the exclusive backup API, which had been deprecated since PostgreSQL 9.6, was finally removed in v15. To find out more, read my updated post in the link.

Let's figure out the object IDs of the database and of the primary key index of pgbench_accounts:

    postgres=# SELECT relfilenode FROM pg_class
               WHERE relname = 'pgbench_accounts_pkey';
     relfilenode
    ═════════════
           16430
    (1 row)

    postgres=# SELECT oid FROM pg_database
               WHERE datname = 'postgres';
     oid
    ═════
       5
    (1 row)

We create a backup by copying the data directory. Afterwards, we copy the primary key index of pgbench_accounts and the commit log again to make sure that they are more recent than the rest:

    $ cp -r data backup
    $ cp data/base/5/16430* backup/base/5
    $ cp data/pg_xact/* backup/pg_xact/
    $ rm backup/postmaster.pid

The crucial part: do not create backup_label

Now we exit backup mode, but ignore the contents of the backup_label file returned from pg_backup_stop():

    postgres=# SELECT labelfile FROM pg_backup_stop();
    NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
                              labelfile
    ════════════════════════════════════════════════════════════════
     START WAL LOCATION: 1/47F8A130 (file 000000010000000100000047)↵
     CHECKPOINT LOCATION: 1/65CD24F0                               ↵
     BACKUP METHOD: streamed                                       ↵
     BACKUP FROM: primary                                          ↵
     START TIME: 2022-07-05 08:32:47 CEST                          ↵
     LABEL: test                                                   ↵
     START TIMELINE: 1                                             ↵

    (1 row)

Then, let's make sure that the last checkpoint in the control file is different:

    $ pg_controldata -D backup | grep REDO
    Latest checkpoint's REDO location:    1/890077D0
    Latest checkpoint's REDO WAL file:    000000010000000100000089

Great! Let's start the server:

    $ echo 'port = 5555' >> backup/postgresql.auto.conf
    $ pg_ctl -D backup start
    waiting for server to start..... done
    server started

Now an index scan on pgbench_accounts fails, because the index contains more recent data than the table:

    postgres=# SELECT * FROM pgbench_accounts ORDER BY aid;
    ERROR:  could not read block 166818 in file "base/5/16422.1": read only 0 of 8192 bytes

What happened? By omitting the backup_label file from the backup, we recovered from the wrong checkpoint, so the data in the table and its index were no longer consistent. Note that we can get the same effect without pg_backup_start() and pg_backup_stop(), I only wanted to emphasize the importance of backup_label.

Creating a corrupt database with pg_resetwal

While the database is under load from pgbench, we crash it with

     pg_ctl stop -m immediate -D data     

Then we run pg_resetwal:

    pg_resetwal -D data
    The database server was not shut down cleanly.
    Resetting the write-ahead log might cause data to be lost.
    If you want to proceed anyway, use -f to force reset.
    $ pg_resetwal -f -D data
    Write-ahead log reset

Then we start the server and use amcheck like before to check the index for integrity:

    postgres=# CREATE EXTENSION amcheck;
    CREATE EXTENSION
    postgres=# SELECT bt_index_parent_check('pgbench_accounts_pkey', TRUE, TRUE);
    WARNING:  concurrent delete in progress within table "pgbench_accounts"
    ERROR:  could not access status of transaction 51959
    DETAIL:  Could not read from file "pg_subtrans/0000" at offset 204800: read too few bytes.
    CONTEXT:  while checking uniqueness of tuple (1,1) in relation "pgbench_accounts"

What happened? pg_resetwal is only safe to use on a cluster that was shut down cleanly. The option -f is intended as a last-ditch effort to get a corrupted server to start and salvage some data. Only experts should use it.

Creating a corrupt database with pg_upgrade --link

We create a second cluster with initdb:

     $ initdb -E UTF8 --locale=C -U postgres data2     

Then we edit postgresql.conf and choose a different port number. After shutting down the original cluster, we run an "upgrade" in link mode:

    $ pg_upgrade -d /home/laurenz/data -D /home/laurenz/data2 \
    > -b /usr/pgsql-15/bin -B /usr/pgsql-15/bin -U postgres --link
    Performing Consistency Checks
    ..
    Performing Upgrade
    ..
    Adding ".old" suffix to old global/pg_control               ok

    If you want to start the old cluster, you will need to remove
    the ".old" suffix from /home/laurenz/data/global/pg_control.old.
    Because "link" mode was used, the old cluster cannot be safely
    started once the new cluster has been started.
    ..
    Upgrade Complete
    ----------------
    Optimizer statistics are not transferred by pg_upgrade.
    Once you start the new server, consider running:
        /usr/pgsql-15/bin/vacuumdb -U postgres --all --analyze-in-stages

    Running this script will delete the old cluster's data files:
        ./delete_old_cluster.sh

pg_upgrade renamed the control file of the old cluster, so that it cannot get started by accident. We'll undo that:

    mv /home/laurenz/data/global/pg_control.old \
    >  /home/laurenz/data/global/pg_control

Now we can start both clusters and run pgbench on both. Soon we will see error messages like

    ERROR:  unexpected data beyond EOF in block 1 of relation base/5/16397
    HINT:  This has been seen to occur with buggy kernels; consider updating your system.

    ERROR:  duplicate key value violates unique constraint "pgbench_accounts_pkey"
    DETAIL:  Key (aid)=(8040446) already exists.

    WARNING:  could not write block 13 of base/5/16404
    DETAIL:  Multiple failures --- write error might be permanent.

    ERROR:  xlog flush request 0/98AEE3E0 is not satisfied --- flushed only to 0/648CDC58
    CONTEXT:  writing block 13 of relation base/5/16404

    ERROR:  could not access status of transaction 39798
    DETAIL:  Could not read from file "pg_subtrans/0000" at offset 155648: read too few bytes.

What happened? Since both clusters share the same data files, we managed to start two servers on the same data files. This leads to data corruption.

Creating a corrupt database by manipulating data files

For that, we figure out the file name that belongs to the table pgbench_accounts:

    postgres=# SELECT relfilenode FROM pg_class
               WHERE relname = 'pgbench_accounts';
     relfilenode
    ═════════════
           16396
    (1 row)

Now we stop the server and write some garbage into the first data block:

    yes 'this is garbage' | dd of=data/base/5/16396 bs=1024 seek=2 count=1 conv=notrunc
    0+1 records in
    0+1 records out
    1024 bytes (1.0 kB, 1.0 KiB) copied, 0.00031255 s, 3.3 MB/s

Then we start the server and try to select from the table:

    postgres=# TABLE pgbench_accounts ;
    ERROR:  compressed pglz data is corrupt

What happened? We tampered with the data files, so it's unsurprising that the table is corrupted.

Creating a corrupt database with catalog modifications

Who needs ALTER TABLE to drop a table column? We can simply run

    DELETE FROM pg_attribute
    WHERE attrelid = 'pgbench_accounts'::regclass
      AND attname = 'bid';

After that, an attempt to query the table will result in an error:

     ERROR:  pg_attribute catalog is missing 1 attribute(s) for relation OID 16396     

What happened? We ignored that dropping a column sets attisdropped to TRUE in pg_attribute rather than actually removing the entry. Moreover, we didn't check for dependencies in pg_depend, nor did we properly lock the table against concurrent access. Modifying catalog tables is unsupported, and if it breaks the database, you get to keep both pieces.
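For contrast, the supported path keeps the pg_attribute row and merely marks it as dropped; a small sketch to see that (the column choice is arbitrary):

    postgres=# ALTER TABLE pgbench_accounts DROP COLUMN filler;
    postgres=# SELECT attname, attisdropped FROM pg_attribute
               WHERE attrelid = 'pgbench_accounts'::regclass AND attnum > 0;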

Conclusion

We have seen a number of ways you can corrupt a PostgreSQL database. Some of these were obvious, some might surprise the beginner. If you don't want a corrupted database,

  • don't mess with the system catalogs
  • never modify anything in the data directory (with the exception of configuration files)
  • don't run with fsync = off
  • don't call pg_resetwal -f on a crashed server
  • remove the old cluster after an upgrade with pg_upgrade --link
  • don't delete or omit backup_label
  • run a supported version of PostgreSQL to avoid known software bugs
  • run on reliable hardware

I hope you can save some databases with this information! If you'd like to know more about troubleshooting PostgreSQL performance, read my post on join strategies.

The post How to corrupt your PostgreSQL database appeared first on CYBERTEC.
