Maintaining a PostgreSQL database health with ANALYZE, REINDEX, and VACUUM commands

April 01, 2016

Large and heavily updated database tables in PostgreSQL often suffer from two issues:

  • table and index bloat, which means they occupy way more disk space and memory than actually required;
  • corrupted indexes, which means query planner can't generate efficient query execution plans for them and as a result DB performance degrades over time.

To help developers and database administrators deal with these issues, PostgreSQL has a few commands: ANALYZE, VACUUM, REINDEX.

Let's take a quick look at each of PostgreSQL database maintenance commands to understand what they do and when to use them.

ANALYZE

PostgreSQL ANALYZE command collects statistics about specific table columns, entire table, or entire database. The PostgreSQL query planner then uses that data to generate efficient execution plans for queries.

Examples:

  • ANALYZE users; collects statistics for users table.
  • ANALYZE VERBOSE users; does exactly the same plus prints progress messages.
  • ANALYZE users (id, display_name); collects statistics for id and display_name columns of users table.
  • ANALYZE; collects statistics for all table in the current database.

Note that, in PostgreSQL, ANALYZE command doesn't read or update indexes. It deals only with table/column contents. It also doesn't block the table - other queries may read from the table while ANALYZE is running.

You may want to run ANALYZE in these situations:

  • when the contents of a table has changed significantly. For example, when a few percents of records in a table have been added, updated or deleted.
  • just before or after adding an index to a table. That may help query planner to generate optimal query plans that will efficiently the new index.

VACUUM

PostgreSQL VACUUM command reclaims storage occupied by dead records in database tables. VACUUM removes deleted records and previous versions of updated records. The reason those records have to be removed is that they are only marked as unavailable but still occupy disk space, making table scans slower.

VACUUM also has the ANALYZE option that tells it to analyze the table during vacuuming.

There are two different VACUUM versions.

  1. VACUUM reorganises live records in a table and makes the allocated storage space available for new records. However, that storage space doesn't become available to the operating system. This command can be run in parallel with other read and write operations on the table.
  2. VACUUM FULL exclusively locks the table, creates a new table file, copies only live records to this file, then deletes the old table file. The reclaimed storage space is returned to the operating system in this case. This command doesn't allow any other read and write operations on the vacuumed table.

Examples:

  • VACUUM users; vacuums users table.
  • VACUUM VERBOSE users; vacuums users table and prints progress messages.
  • VACUUM FULL users; fully vacuums users table. Other queries cannot access users table while vacuuming is running.
  • VACUUM ANALYZE users; vacuums and analyses users table.
  • VACUUM FULL VERBOSE ANALYZE users; fully vacuums users table and displays progress messages.
  • VACUUM; vacuums all the tables in the database the current user has access to.

Even though PostgreSQL can autovacuum tables after a certain percentage of rows gets marked as deleted, some developers and DB admins prefer to run VACUUM ANALYZE on tables with a lot of read/write operations on a custom schedule. For example, daily, at the least busy time of the day.

You may want to run VACUUM when the contents of a table has changed significantly: a significant percentage of records in a table has been added, updated or deleted.

REINDEX

PostgreSQL REINDEX command rebuilds an existing index or all indexes for a table or entire database. Use this command to fix corrupted and unusable indexes, or when an index gets bloated after significant change in the table contents.

Examples:

  • REINDEX INDEX idx_users_on_display_name; reindexes index idx_users_on_display_name.
  • REINDEX TABLE users; reindexes users table.
  • REINDEX VERBOSE TABLE users; reindexes users table and prints progress messages.
  • REINDEX DATABASE app_production; reindexes all tables in app_production database.

In my experience, running VACUUM ANALYZE and then REINDEX commands on the largest by the number of rows and/or size tables in the application database helps to improve the performance of SQL queries.

Copyright © 2014-2019 Andrei Gridnev