dnsworkshop.de
16 Jul 2012

Supervising BIND 9

BIND 9 is a mature piece of software. Compared with its predecessors BIND 4 and BIND 8, it is more stable and has fewer security problems. One reason for this is the "Design by contract" programming style used by the BIND 9 team. BIND 9 is very paranoid about the data it consumes and about its internal data structures. Once BIND 9 finds an unexpected state in its internal data structures, it terminates the DNS server process instead of continuing to run with wrong data (and risking a security vulnerability).

While this behavior is good for security, it is bad for service uptime: when the DNS server process terminates, the DNS service goes down with it. Users (and customers) do not care much about security if they cannot reach Facebook. In the past years there have been several incidents in which BIND 9 terminated because of issues in its code or data structures, such as "BIND 9 Resolver crashes after logging an error in query.c".

BIND 10 aims to solve this, as project lead Shane Kerr writes in "Software Robustness and BIND 10". Until BIND 10 arrives, however, BIND 9 needs a workaround.

The real issue for the DNS service is not that BIND 9 terminates on bad data, but that BIND 9 cannot restart itself afterwards: there is no "supervisor" process in BIND 9.

Some operating systems have a built-in solution: Mac OS X has launchd, and the BIND 9 version Apple ships with the OS is restarted automatically should it terminate unexpectedly. Solaris has SMF (Service Management Facility), and BIND 9 can be integrated into SMF. Ubuntu Linux now has upstart, and Fedora has systemd; both can also monitor processes and restart them if needed.

For Unix and Linux operating systems that do not ship with a process supervisor, supervisord is a nice, easy-to-set-up solution. Supervisord is available as a package in many Linux distributions, and it also works on the BSD Unixes. The configuration below is for OpenBSD, but it should need only minor tweaks to run on other Unix systems as well.

Installation

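Supervisord is written in Python (versions 2.4 to 2.7). It can be installed from source, in which case all dependencies have to be downloaded and installed by hand, or with setuptools, which downloads and installs the dependencies automatically. Below, I use setuptools: the first command installs setuptools itself from its egg file, the second uses its easy_install tool to install supervisor: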

bash# sh setuptools-0.6c11-py2.7.egg
bash# easy_install supervisor

A basic configuration file for BIND 9 "named"

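Below is my basic /etc/supervisord.conf configuration file for one service, the BIND 9 DNS server. Note the -f option on the named command line: it keeps named in the foreground, because supervisord can only supervise programs that do not daemonize themselves.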

[unix_http_server]
file = /tmp/supervisor.sock
chmod = 0777
chown= nobody:nogroup

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock

[supervisord]
logfile = /var/log/supervisord.log
logfile_maxbytes = 10MB
logfile_backups=10
loglevel = info
pidfile = /var/run/supervisord.pid
identifier = supervisor
directory = /tmp

[program:named]
command=/usr/sbin/named -f
process_name=%(program_name)s
numprocs=1
directory=/var/named
priority=100
autostart=true
autorestart=unexpected
startsecs=5
startretries=3
exitcodes=0,2
stopsignal=TERM
stopwaitsecs=10
redirect_stderr=false
stdout_logfile=/var/log/named_supervisord.log
stdout_logfile_maxbytes=1MB
stdout_logfile_backups=10
stdout_capture_maxbytes=1MB
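The [program:named] section relies on autorestart=unexpected together with exitcodes=0,2. According to the supervisord documentation, the "unexpected" policy restarts a program only when it exits with a code that is not listed in exitcodes, and the log line "terminated by SIGKILL; not expected" further down shows that a kill by signal counts as unexpected. The shell sketch below models just that decision so it can be tried out. It is not supervisord's own code, the function names is_expected_exit and should_restart are hypothetical, and exits that happen within startsecs (which supervisord handles separately through its startretries logic) are deliberately left out.

```shell
#!/usr/bin/env bash
# Hypothetical model of the restart decision for [program:named].
# It mirrors autorestart=unexpected and exitcodes=0,2 from the
# configuration above; it is not taken from supervisord itself.

EXITCODES="0 2"   # same values as exitcodes=0,2

# Succeeds if the numeric exit code is listed in EXITCODES.
is_expected_exit() {
  local code=$1 c
  for c in $EXITCODES; do
    if [[ $code == "$c" ]]; then
      return 0
    fi
  done
  return 1
}

# Usage: should_restart exit <code>  |  should_restart signal <name>
# Succeeds if autorestart=unexpected would bring the program back.
should_restart() {
  local how=$1 value=$2
  case $how in
    signal) return 0 ;;                      # killed by a signal: never expected
    exit)   ! is_expected_exit "$value" ;;   # restart only on unlisted exit codes
    *)      echo "unknown termination type: $how" >&2; return 2 ;;
  esac
}
```

For example, should_restart signal KILL succeeds (the kill -9 used later in this post), while should_restart exit 0 fails because 0 is listed in exitcodes.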

Starting "supervisord"

Once the configuration file is in place, we can start supervisord. Make sure that BIND 9 is not already running, otherwise you will end up with two instances of the BIND 9 server, which is not a good idea. Also make sure that supervisord is started when the server reboots, either through a start script or by other means. The supervisord packages that come with Linux distributions install a start script.

bash# supervisord
bash# tail /var/log/supervisord.log
2012-06-16 16:59:48,812 INFO Increased RLIMIT_NOFILE limit to 1024
2012-06-16 16:59:48,949 INFO RPC interface 'supervisor' initialized
2012-06-16 16:59:48,953 INFO RPC interface 'supervisor' initialized
2012-06-16 16:59:48,963 INFO daemonizing the supervisord process
2012-06-16 16:59:48,964 INFO set current directory: '/tmp'
2012-06-16 16:59:48,967 INFO supervisord started with pid 14724
2012-06-16 16:59:49,976 INFO spawned: 'named' with pid 16701
2012-06-16 16:59:55,020 INFO success: named entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)

Great, supervisord has started, and it also started the BIND 9 process "named". DNS is working now.
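For the reboot case mentioned above, one possible approach on OpenBSD is a stanza in /etc/rc.local. This is a sketch under assumptions: that easy_install placed the supervisord script in /usr/local/bin (check with "which supervisord"), that the configuration lives in /etc/supervisord.conf, and that the system's own named start-up stays disabled (on OpenBSD releases that include BIND in the base system, this means leaving named_flags at NO), so that only supervisord starts named.

```shell
# /etc/rc.local fragment (sketch): start supervisord at boot.
# Assumed paths: /usr/local/bin/supervisord and /etc/supervisord.conf.
if [ -x /usr/local/bin/supervisord ]; then
        echo -n ' supervisord'
        /usr/local/bin/supervisord -c /etc/supervisord.conf
fi
```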

Simulating a BIND 9 crash

To simulate a BIND 9 crash, we kill the named process with SIGKILL:

bash# ps aux | grep named
_syslogd 32633  0.0  0.1   512   648 ??  I     17Apr12    2:28.76 syslogd -a /var/named/dev/log -a /var/empty/dev/log
root     16701  0.0  0.8  5684  6500 ??  I      4:59PM    0:00.50 /usr/sbin/named -f
bash# kill -9 16701
bash# tail /var/log/supervisord.log
2012-06-16 17:03:29,192 INFO exited: named (terminated by SIGKILL; not expected)
2012-06-16 17:03:30,201 INFO spawned: 'named' with pid 9832
bash#

Works as expected. Supervisord has detected that the BIND 9 process terminated and has started a new one. DNS is still up and running.
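The manual steps above can be wrapped into a small test script. The sketch below asks supervisord for the PID with "supervisorctl pid named", sends SIGKILL, and polls until supervisord reports a different, non-zero PID. It assumes bash, that supervisorctl is in the PATH and reaches supervisord through the serverurl from the configuration, and that a stopped program is reported with PID 0; the function names current_pid and simulate_crash are hypothetical. Only run it where a short DNS interruption is acceptable.

```shell
#!/usr/bin/env bash
# Sketch: simulate a named crash and wait for supervisord to restart it.
# Assumes supervisorctl is in PATH and reports PID 0 for a stopped program.

# Prints the PID supervisord reports for program $1; fails on odd output.
current_pid() {
  local pid
  pid=$(supervisorctl pid "$1") || return 1
  [[ $pid =~ ^[0-9]+$ ]] || return 1
  printf '%s\n' "$pid"
}

# Usage: simulate_crash [program] [seconds to wait]
simulate_crash() {
  local name=${1:-named} wait_secs=${2:-30} old new i
  old=$(current_pid "$name") || { echo "cannot get PID of $name" >&2; return 1; }
  if (( old == 0 )); then
    echo "$name is not running" >&2
    return 1
  fi
  kill -9 "$old" || return 1            # same as the manual "kill -9" above
  for (( i = 0; i < wait_secs; i++ )); do
    sleep 1
    new=$(current_pid "$name") || continue
    if (( new != 0 && new != old )); then
      echo "$name restarted: pid $old -> $new"
      return 0
    fi
  done
  echo "$name was not restarted within ${wait_secs}s" >&2
  return 1
}
```

After sourcing the file, "simulate_crash named 30" prints the old and new PID and returns 0, or returns 1 if supervisord has not reported a new PID within 30 seconds.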

Controlling supervisord

Supervisord can be controlled from the command line with the supervisorctl command. "help" lists all control commands, and "help <command>" describes a single command:

bash# supervisorctl help

default commands (type help <topic>):
=====================================
add    clear  fg        open  quit    remove  restart   start   stop  update 
avail  exit   maintail  pid   reload  reread  shutdown  status  tail  version

bash# supervisorctl help status
status                  Get all process status info.
status <name>           Get status on a single process by name.
status <name> <name>    Get status on multiple named processes.

bash# supervisorctl status
named                            RUNNING    pid 25770, uptime 0:00:12

bash# supervisorctl stop named
named: stopped

bash# supervisorctl start named
named: started
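For monitoring from cron or an existing monitoring system, the status output can also be checked by a script. The sketch below parses the text output of "supervisorctl status" (program name in the first column, state in the second, as in the output above) instead of relying on supervisorctl's exit status. The function names program_state and check_program are hypothetical, and UNKNOWN is this script's own label for the case where supervisord cannot be reached or does not know the program.

```shell
#!/usr/bin/env bash
# Sketch: check that a program managed by supervisord is RUNNING.

# Reads "supervisorctl status" output on stdin and prints the state
# (second column) of the program named in $1.
program_state() {
  local name=$1 prog state rest
  while read -r prog state rest; do
    if [[ $prog == "$name" ]]; then
      printf '%s\n' "$state"
      return 0
    fi
  done
  return 1
}

# Succeeds if the program is RUNNING; otherwise prints a message to
# stderr and fails, so a cron job or monitoring wrapper can react.
check_program() {
  local name=${1:-named} state
  state=$(supervisorctl status "$name" 2>/dev/null | program_state "$name") || state=UNKNOWN
  if [[ $state == RUNNING ]]; then
    return 0
  fi
  echo "$name is in state $state" >&2
  return 1
}
```

For example, "check_program named || mail -s 'named down' root" in a cron job would send a message whenever named is not RUNNING (the mail recipient is a placeholder).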

Now, whenever an assertion failure is triggered in the code, BIND 9 will terminate, but supervisord will bring it back from the dead. Your DNS service stays up, and your users and customers stay happy.

Read the supervisord documentation on how to set up event notifications, so that you receive an e-mail whenever BIND 9 has been restarted. A crash might still point to a security vulnerability, which you would want to report to bind9-bugs@isc.org.
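As a starting point for such notifications, here is a sketch of an event listener in bash. It follows the listener protocol described in the supervisord documentation: the listener writes READY on stdout, reads one header line of space-separated key:value tokens from stdin, reads exactly len bytes of payload, and acknowledges with "RESULT 2" followed by "OK". Assumptions: the listener is registered in supervisord.conf with an [eventlistener:...] section subscribed to PROCESS_STATE_EXITED (shown in the comments, with a hypothetical script path), a working mail(1) setup, and root as the recipient. The loop only starts when SUPERVISOR_ENABLED is set, which supervisord exports to its child processes, so the file can also be sourced for testing.

```shell
#!/usr/bin/env bash
# Sketch of a supervisord event listener that mails a note whenever
# named exits. Hypothetical registration in /etc/supervisord.conf:
#
#   [eventlistener:named_notify]
#   command=/usr/local/bin/named-notify.sh
#   events=PROCESS_STATE_EXITED
#
# stdout belongs to the listener protocol; other output goes to stderr.

# Prints the value of key $1 from a line of "key:value" tokens in $2.
get_token() {
  local key=$1 tok
  local -a toks
  read -r -a toks <<< "$2"
  for tok in "${toks[@]}"; do
    if [[ $tok == "$key":* ]]; then
      printf '%s\n' "${tok#"$key":}"
      return 0
    fi
  done
  return 1
}

# Sends the notification; the recipient "root" is a placeholder.
notify() {
  printf '%s\n' "$1" | mail -s "supervisord: named exited" root >&2
}

handle_events() {
  local header len payload
  while true; do
    printf 'READY\n'                               # ask for the next event
    IFS= read -r header || break                   # header line with eventname and len
    len=$(get_token len "$header") || break
    [[ $len =~ ^[0-9]+$ ]] || break
    payload=$(dd bs=1 count="$len" 2>/dev/null)    # exactly len bytes of payload
    if [[ $(get_token processname "$payload") == named ]]; then
      notify "$(get_token eventname "$header"): $payload"
    fi
    printf 'RESULT 2\nOK'                          # acknowledge the event
  done
}

if [[ -n ${SUPERVISOR_ENABLED:-} ]]; then
  handle_events
fi
```

Subscribing only to PROCESS_STATE_EXITED keeps the listener quiet during normal operation; the payload's expected field (0 after a crash) is passed on in the mail unchanged.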

Of course, supervisord can be used to restart other processes as well, including other DNS servers (NSD, Unbound, dnsmasq …).

strotmann.de by Carsten Strotmann is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.