How to Log in PostgreSQL – Tips & Best Practices
What is PostgreSQL?
PostgreSQL is an open-source relational database management system. With the proliferation of cloud-hosted web, mobile, IoT and big data applications in recent years, its popularity has gained significant momentum. Startups, small to medium enterprises, and even larger organizations now prefer Postgres over expensive vendor databases for its ease of use, cross-platform compatibility and support from cloud vendors like AWS.
Like any other database, PostgreSQL generates event logs. These logs are invaluable for tuning performance, troubleshooting errors and auditing security. Fortunately, the Postgres configuration file has a number of parameters that can be fine-tuned to capture useful events. In this article, we’ll discuss logging best practices: how to configure PostgreSQL logs and how to analyze them properly.
PostgreSQL Logging Tips & Best Practices
Tip 1: Consider the impact of any changes you make
The postgresql.conf is the main configuration file for Postgres. When you want to change logging behaviour, you make changes to the log-related parameters in this file. Sometimes these changes may be temporary, sometimes they are intended to be permanent.
If the Postgres server is controlled by configuration management tools like Puppet or Ansible, you need to ensure they are not reverting any changes you made to the logging parameters. Any active configuration management daemon will overwrite manual changes to any file it controls.
There are a few ways you can address this:
- If your change is temporary, temporarily disable the config management agent in the server. Enable it when changes are rolled back.
- Ensure the configuration management process is aware of new logging parameters (e.g. update in Puppet hiera).
- Exclude the postgresql.conf file from configuration management (this option should not be chosen unless there is a strong case for it).
The changed parameters should also be reflected in Postgres infrastructure provisioning scripts written in say, CloudFormation or Terraform. This will ensure the next time a stack is created, the updated logging parameters are used.
Be cautious when you are making changes to a parameter file that’s used in more than one Postgres instance. An example can be a parameter group used in multiple RDS PostgreSQL or Aurora PostgreSQL instances (typically it’s the default parameter group). If you want only specific instances to have changed logging parameters, create a separate parameter group or file for those instances only.
Finally, changes to some parameters may require restarting the PostgreSQL service. You would want to test this in non-production nodes first. If the PostgreSQL node is part of a replication setup, you also need to consider any failover implications and how you will fail back.
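Before planning a change, you can check which logging parameters only need a configuration reload and which require a full restart by querying the pg_settings catalog. A quick sketch of such a query (run it via psql):

```sql
-- List logging-related parameters and when changes take effect:
-- context 'postmaster' means a restart is required,
-- context 'sighup' means a reload (pg_ctl reload) is enough.
SELECT name, setting, context
FROM pg_settings
WHERE name LIKE 'log%'
ORDER BY name;
```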
Tip 2: Use native PostgreSQL logging
Unless there is a good reason, use separate log files for Postgres instead of sending events to syslog (or Windows event log). We recommend this because:
- Syslog and the Windows event log belong to the operating system. Besides Postgres, they contain log events from the OS and other applications. Although modern log management tools can easily extract Postgres messages from syslog, it’s still extra effort and the result may include “noise” messages. It’s simpler to point the log management tool at a dedicated log folder.
- In a very busy system, the OS may drop Postgres messages to make room for other messages. Native PostgreSQL logging does not work that way: logging is done by a separate daemon, and when the system gets busy, that process holds off writing to the log files to let other Postgres daemons finish. This is a better outcome than dropping messages.
To send Postgres events to separate log files, two configuration parameters need to be set:
- log_destination: This parameter can be set to “stderr”, “syslog”, “csvlog” or “eventlog” (for Windows systems). For Amazon RDS PostgreSQL or Aurora PostgreSQL, the values can be set to “csvlog” or “stderr” only.
- We recommend setting this parameter to “stderr”. If you want to download the log files and load them easily into common analysis tools like Excel, set it to “csvlog”. However, when using CSV-format log files, you can’t use a custom log line prefix.
- logging_collector: this has to be set to “1” for the Postgres logging daemon to run. Again, this parameter is already restricted to 1 in parameter groups for RDS or Aurora PostgreSQL.
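Putting the two parameters together, the relevant postgresql.conf lines would look like this (a sketch, using the values recommended above):

```ini
# postgresql.conf -- use native logging with plain-text log files
log_destination = 'stderr'   # or 'csvlog' for Excel-friendly output
logging_collector = on       # start the dedicated logging process
```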
Tip 3: Use a custom directory for log files
If the logging collector is on, the log_directory parameter tells PostgreSQL where to save its log files. This parameter can’t be changed for RDS/Aurora PostgreSQL, but for non-managed installations, set this parameter to a value other than pg_log or log (from version 10) under the installation directory. Here is why:
If you need to recreate the PostgreSQL installation for some reason (for example when you are re-creating replication), the logging folder will be deleted from the installation directory and you will lose all events from the old instance. It will be impossible to go back to track any events or errors from that instance.
Security and legal compliance may require retaining logs for a long period of time (in some cases it can be up to ten years) and the volume hosting the Postgres directory may not have enough space for that. You can create the logging directory say, under /var/log and mount it to a separate file system.
Even if you decide to keep the logs under the Postgres installation directory, make sure they are copied to a different location (EFS volume or S3 bucket) regularly.
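For a self-managed installation, the directory setting from this tip might look like the following (the path is an example; pick a volume with enough space for your retention needs):

```ini
# postgresql.conf -- keep logs outside the installation directory
log_directory = '/var/log/postgresql'
```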
Tip 4: Keep one day’s worth of events in each log file
Set the log_rotation_age parameter to 1440 (the unit is minutes, so 1440 means 24 hours) and the log_rotation_size parameter to 0. That way, a new log file is generated once every day. This helps you search through a much smaller number of entries when you are looking for events from a particular date, and it makes it easier for log management solutions to index and search the files.
Where this does not fit is when you have events spanning across multiple days: for example, a slow running batch job taking more than one day to complete. In such cases you have to search individual log files but that’s still better than searching through one huge file.
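The daily-rotation recommendation translates to two lines in postgresql.conf (remember that the rotation age is expressed in minutes):

```ini
# postgresql.conf -- rotate by time only, once per day
log_rotation_age = 1440   # minutes, i.e. 24 hours
log_rotation_size = 0     # disable size-based rotation
```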
Tip 5: Use meaningful log file names
To easily identify individual log files, use a naming standard that shows:
- The server where it’s generated
- The year, month and date it was created
- The hour, minute and second it was created
The log file name can be set with the log_filename parameter. The date and time components in the file name are specified with “%” escape patterns. So the default value, “postgresql-%Y-%m-%d_%H%M%S.log”, means:
- the file name will start with “postgresql”
- then a 4-digit year, month and day of the month, separated by hyphens
- after that, the hour, minute and second, separated from the date with an underscore. The hour is in 24-hour format.
We recommend prefixing the default pattern with the host name, for example “dbserver1-postgresql-%Y-%m-%d_%H%M%S.log”. Adding the host name is useful when you have a number of log files from different systems copied to a central location.
If you want to be really verbose in file naming, you can use a pattern like “dbserver1-postgresql_%A-%d-%B_%H%M%S.csv”. The file name will then be something like “dbserver1-postgresql_Monday-24-October_142033.csv”.
In RDS or Aurora PostgreSQL, log file names are restricted: log_filename can only be set to “postgresql.log.%Y-%m-%d” (daily) or “postgresql.log.%Y-%m-%d-%H” (hourly).
If you decide to name your log files without any date timestamp and they are not unique, set the log_truncate_on_rotation parameter to 0. This will ensure older log files with the same name are not overwritten when log files are rotated.
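As a sketch, the file-naming settings from this tip could be set as follows (the host name “dbserver1” is an example):

```ini
# postgresql.conf -- host name plus date and time in each log file name
log_filename = 'dbserver1-postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = off   # append to, rather than overwrite, same-named files
```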
Tip 6: Set the time zone to local
The log_timezone parameter controls which time zone is used for timestamps in the log files. We recommend setting it to the server’s time zone, which in turn should be your local time zone.
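For example, for a server located in the US Eastern time zone (the zone name here is just an illustration; use your own):

```ini
# postgresql.conf -- log timestamps in the server's local time zone
log_timezone = 'America/New_York'
```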
Tip 7: Use a meaningful prefix for each log event
To set the prefix for every log event line, use a non-empty string for the log_line_prefix parameter. We recommend the prefix should show the following:
- The time of the event – shown up to milliseconds (%m)
- User name (%u)
- Remote host name or its IP address (%h)
- Application name (%a)
- Database name (%d)
- Process ID (%p)
So you can have a log_line_prefix of '[%m] - %p %q- %u@%h:%d - %a '
This will expand to different values depending on where the event is coming from. That’s because the log file will contain messages from both user processes (sessions) and background processes. Some placeholder wildcards mentioned above are not available for background processes, like username (%u) or application name (%a). The %q parameter tells Postgres to print nothing after it if the event is logged by a background process. For session events, this parameter is ignored.
Also, note how we included a single space after the log line prefix. This space separates the prefix from the actual event. It does not have to be a single space – anything like double colon (::) or hyphen (-) or other meaningful separators can be used.
To log the host names of clients connecting to the server, set the log_hostname parameter to 1. Without it, only IP addresses are logged. When you are troubleshooting connections from hundreds of hosts, this saves you the time of resolving host names yourself with nslookup.
However, the DNS lookup adds extra time to each log write, and depending on how busy your system is, the delay can cause a significant performance penalty. We suggest turning this feature on during a representative production workload and checking for performance issues. If there is no significant delay, you can safely keep it on.
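The prefix recommendation above, plus host name logging, would look like this in postgresql.conf:

```ini
# postgresql.conf -- prefix each event with time, PID, user, host, db, app
log_line_prefix = '[%m] - %p %q- %u@%h:%d - %a '
log_hostname = on   # resolve client IPs to host names (adds DNS lookups)
```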
Tip 8: Log only database structure changes
You may hear management wants to log “everything” and wants to be able to see “who’s looking at the data”. This may tempt you to log any and every type of SQL statement running on your Postgres server, but in reality this may not be a good idea.
Capturing all types of SQL statements can mean a huge number of log entries per file, particularly in a busy system, and a long time to search through those files. Instead, set the value of log_statement to “ddl”. This captures only statements that change database objects. Unwanted (or unauthorized) database structure changes tend to cause the most serious application issues, and you want to track those events first.
The only time you want this parameter to be set to “all” is when you are using a tool like pgBadger for performance testing and want to get a full workload picture, but this should be temporary only.
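In postgresql.conf, this tip comes down to a single line:

```ini
# postgresql.conf -- log DDL (structure changes) only
log_statement = 'ddl'   # use 'all' only temporarily, e.g. for pgBadger analysis
```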
Tip 9: Log only warnings, errors, fatal or panic level messages
The log_min_messages parameter decides which message levels are logged. The default is “warning” and we recommend keeping it; ideally, you look at log files when something is going wrong. Setting it to “warning” captures all messages at that level and above: warning, error, fatal and panic.
Another parameter, log_min_error_statement, is similar to log_min_messages, except it shows the SQL statement that caused an error condition. Like log_min_messages it has a few levels and we recommend setting this to “warning”.
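Both recommendations together look like this in postgresql.conf:

```ini
# postgresql.conf -- capture warning level and above
log_min_messages = warning
log_min_error_statement = warning   # also log the SQL statement that failed
```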
Tip 10: Log all connections, but not disconnections
From a security perspective, you want to see when someone connects or attempts to connect to your Postgres server, not when someone disconnects. Typically disconnect events are of lower value unless an application is facing frequent disconnections. We recommend setting the log_connections parameter to 1 and log_disconnections to 0.
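The corresponding postgresql.conf fragment is:

```ini
# postgresql.conf -- audit connections, skip disconnections
log_connections = on
log_disconnections = off
```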
Tip 11: Don’t log command durations
There are two parameters that control whether statement durations are logged: log_duration and log_min_duration_statement.
We recommend setting log_duration to 0 and log_min_duration_statement to -1. This effectively stops logging how long a SQL statement or command took to complete. Unless you are capturing log events for performance analysis, there is no reason to record this value.
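In postgresql.conf, the two settings read:

```ini
# postgresql.conf -- don't log statement durations
log_duration = off
log_min_duration_statement = -1   # -1 disables duration-based statement logging
```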
Tip 12: Don’t use verbose logging
Unless you are troubleshooting application errors or performance issues, don’t set the log_error_verbosity parameter to “verbose”; leave it at “default”. Verbose output makes the files unnecessarily large and increases disk I/O, which means the logging daemon has to work harder and can slow down overall PostgreSQL performance.
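This is a one-line setting as well:

```ini
# postgresql.conf -- keep error messages concise
log_error_verbosity = default   # 'verbose' adds source-code detail and bloats logs
```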
Tip 13: Create a set of search patterns
Don’t spend time writing the same search commands for your logs over and over. If you frequently use cat and grep to search through log files, automate the process: write a script that takes search strings as parameters and searches through the logs for you.
For example, if you are searching for “access denied” type of message in your logs, the typical message you would be looking for is “permission denied for database” and “user does not have CONNECT privilege“. You can create a simple script that searches for these texts from all your log files over a particular date range. If you are using a log management solution, save such queries for reuse.
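As a sketch, a small shell helper like the following automates that search (the function name, log location and patterns are made up for illustration; adapt them to your environment):

```shell
#!/bin/sh
# Hypothetical helper: search every PostgreSQL log file in a directory
# for common "access denied" messages. Adjust the patterns as needed.
search_pg_denied() {
    logdir="$1"
    # -h: omit file names; -i: case-insensitive; -e: one flag per pattern
    grep -h -i \
        -e 'permission denied for database' \
        -e 'does not have CONNECT privilege' \
        "$logdir"/*.log
}

# Example usage: search_pg_denied /var/log/postgresql
```

You can extend the function with a date-range filter by selecting only the log files whose names (see Tip 5) fall inside the range before grepping them.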
Although PostgreSQL has a wide variety of parameters to control logging, it can take some time to fine-tune them to get the type of logs you want. We recommend tweaking the parameters based on your primary focus of logging. For example, if you are capturing logs for security and compliance purposes, check if all connections and every SQL statement is logged.
Also, you may want to keep more than one version of the postgresql.conf file handy, each with a different set of logging parameters. For example, when debugging an application timeout issue, the active version of postgresql.conf may have log_min_duration_statement set to zero. Once the troubleshooting is over, you would put back the original postgresql.conf.
Depending on how busy your system is, log files can be quite large. Eyeballing hundreds of lines of events or searching with “grep” can take a long time. That’s when you should start considering a modern log management solution like LogDNA.
To learn how LogDNA’s powerful log management and analysis system can instantly collect, parse, and search PostgreSQL logs in real time, visit the web site, sign up for a free 2-week trial (no credit card needed!) or contact us for technical help.