Monday, February 8, 2010

Automatically Generate Configuration Files (Part 1)

0 comments

This article begins a series on using m4 to generate configurations for MySQL but applies to other configurations as well. Why would I want to generate configurations versus just managing each configuration separately? How often do we feel we need to make a change that applies to several instances all at once? Wouldn't it be nice if there was an easy way to change something in one place so that wold be reflected in all the configuration files under our care? What about when we want to see the same change in all the files except one or two? What about when we want to see that change happen in only one or two files?

There are a plethora of templating, macro, and text processing systems available that can help administrators deal with this type of challenge. I am using m4 for my own needs because it is very simple to understand and maintain while retaining the flexibility I need to go far beyond standard search and replace. I use m4 to help me generate configurations for mysql, httpd, bind(DNS), and other software. Generating configurations makes it much easier for me to maintain, test, and version my changes across several systems.

Enough of the lead-in, let's get started using m4 in a meaningful way.

m4 is available on most Unix-based systems automatically. If it's not available on yours, check your repositories. On Windows, m4 is available in Cygwin. If you don't already have Cygwin, it is a way to get many of the tools that run in Unix-based environments without having to run Unix free of charge.

m4 is a macro processor that has its roots in the C language processors. It takes a stream of text, looks for certain keywords and uses those as macros that are expanded based on the needs presented to it. m4 can handle loops, conditionals, and other common programming constructs. m4 is a tool that helps users by keeping repetition down to a minimum.

m4 uses processing instructions to determine when it needs to do something. I've told it only to read processing instructions that start with m4_ because I don't want it confusing my instructions with real information that needs to be left alone. It's just a precaution, but a good one and I encourage you to use it as well. To get m4 to require m4_, simply add a -P to your m4 command (as shown below).

The first instruction I'll show you is m4_define - the root of all that's m4. What good would a macro processor be if we couldn't define macros to be expanded? Isn't that like having a sail boat without a sail? Anyway, I digress. The syntax for m4_define looks like this: m4_define(`defined_label', `macro expanded') with the defined_label being the text that m4 will substitue and macro expanded being what it will replace defined_label with. Notice that I've wrapped these items with a back-tick and a regular tick. This is important and I'll explain more about that in upcoming articles. For now, let's get with the program! ;-)

$ cat my.cnf.m4
m4_define(`DEF_SERVER_ID', 1)

[mysqld]
server-id = DEF_SERVER_ID
$ 

If we run this file through m4, we will get the following:

$ m4 -P < my.cnf.m4


[mysqld]
server-id = 1
$ 
Notice that the definition of the DEF_SERVER_ID variable seems to have disappeared. That's true - m4 "ate" the definition macro. The new line is still there but that's all that's left of the first line of the file. What's cool, however, is we now have one place where the server ID was specified and another where it was used. No big deal there - what value does that give me, you ask? None at all unless you really like repeating yourself a lot. Of course there is more. Let's make this a little more meaningful.
$ cat my.cnf.m4
m4_define(`DEF_MYSQLDIR', `/var/lib/mysql')
m4_define(`DEF_SERVER_ID', 1)

[mysqld]
server-id = DEF_SERVER_ID
log = DEF_MYSQLDIR/logs/mysql.log
socket = DEF_MYSQLDIR/mysql.sock
pid-file = DEF_MYSQLDIR/run/mysql.pid
datadir = DEF_MYSQLDIR/data
$ 
Did you think through what m4 will do for us? Think about your answer before you read further.
$ m4 -P < my.cnf.m4



[mysqld]
server-id = 1
log = /var/lib/mysql/logs/mysql.log
socket = /var/lib/mysql/mysql.sock
pid-file = /var/lib/mysql/run/mysql.pid
datadir = /var/lib/mysql/data
$ 
Pretty cool, eh? Well - maybe not yet. Did you get it right? What if I decided I needed to move my MySQL instance to another directory like /opt/mysqlInstance1 ? Rather than having to search for and replace all the text of "/var/lib/mysql" changing it to "/opt/mysqlInstance1", all I need to do is change the definition of DEF_MYSQLDIR at the top of the m4 template, then re-run m4 on my template and I get the new ouptut file. That's what I call slick. :-)
$ cat my.cnf.m4
m4_define(`DEF_MYSQLDIR', `/opt/mysqlInstance1')
m4_define(`DEF_SERVER_ID', 1)

[mysqld]
server-id = DEF_SERVER_ID
log = DEF_MYSQLDIR/logs/mysql.log
socket = DEF_MYSQLDIR/mysql.sock
pid-file = DEF_MYSQLDIR/run/mysql.pid
datadir = DEF_MYSQLDIR/data
$ m4 -P < my.cnf.m4



[mysqld]
server-id = 1
log = /opt/mysqlInstance1/mysql.log
socket = /opt/mysqlInstance1/mysql.sockm4 
pid-file = /opt/mysqlInstance1/run/mysql.pid
datadir = /opt/mysqlInstance1/data
$ 
So - that's nice, but there's a lot more to m4 than just being able to define macros. One of the most powerful tools that m4 offers is the ability to pull in other files. I'll explain by example:
$ cat my.cnf.m4
[mysqld]
server-id = DEF_SERVER_ID
log = DEF_MYSQLDIR/logs/mysql.log
socket = DEF_MYSQLDIR/mysql.sock
pid-file = DEF_MYSQLDIR/run/mysql.pid
datadir = DEF_MYSQLDIR/data

$ cat instance1.cnf.m4
m4_define(`DEF_MYSQLDIR', `/opt/mysqlInstance1')m4_dnl
m4_define(`DEF_SERVER_ID', 1)m4_dnl
m4_include(`my.cnf.m4')
$ cat instance2.cnf.m4
m4_define(`DEF_MYSQLDIR', `/opt/mysqlInstance2')m4_dnl
m4_define(`DEF_SERVER_ID', 2)m4_dnl
m4_include(`my.cnf.m4')

$ m4 -P < instance1.cnf.m4
[mysqld]
server-id = 1
log = /opt/mysqlInstance1/mysql.log
socket = /opt/mysqlInstance1/mysql.sock
pid-file = /opt/mysqlInstance1/run/mysql.pid
datadir = /opt/mysqlInstance1/data

$ m4 -P < instance2.cnf.m4
[mysqld]
server-id = 2
log = /opt/mysqlInstance2/mysql.log
socket = /opt/mysqlInstance2/mysql.sock
pid-file = /opt/mysqlInstance2/run/mysql.pid
datadir = /opt/mysqlInstance2/data

$ 

So - I added two things: m4_include and m4_dnl. m4_include says basically go read this file and continue processing as if the text were part of this file. m4_dnl says discard to next line. It's a form of comment but I often use it to mean "don't new line."

You notice in the example above, I created two very small files with the m4_define statements I needed (defining my macros) and then included the main template where those definitions were expanded as explained in the smaller files. So - rather than creating one my.cnf file with the definitions for one instance then copying and hand-modifying that second file for the second instance, I am able to write one template and a couple of small definition files to get the same result. If I need ot make a change to both files at the same time, I can simply change my my.cnf.m4 then re-generate my my.cnf files for each instance.

In today's tutorial, you saw how to create and get m4 to expand basic macros, how to include other files into an m4 macro file, and how to create comments. In our next episode, I'll explain how to make m4 smarter by showing you how to make it ask questions and do things differently depending on the answer. I'll also explain how to get the make command to automatically handle creating all your configuration files for you as well as getting the files into source control.

Do you like this series? Please post a comment! :-)

How to COUNT in SQL

0 comments
Often, I see users attempting to count rows of a table with COUNT(*) as a part of their query. Regardless of what database you're using, this probably not an optimal way to count.

COUNT(*) is likely to cause the system to do a table scan to find out exactly how many rows are available. Table scans are a very bad thing because it causes the system to go to disk to read the table. COUNT(columnName) on a column that is indexed will allow the database to use the index to compute the count (meaning it reads less data and may even be able to do it entirely from RAM if the index is already loaded there). The difference between COUNT(*) and COUNT(columnName) is a simple one - COUNT(*) counts every row where COUNT(columnName) counts the rows where columnName has a non-NULL value.

My favorite way to do counts (when I don't already have a summary table telling me how many rows are in the result set I'm looking for) is to count the non-contextual primary key in the table since I make it a habit of putting a non-contextual primary key in every table I can. This takes advantage of every possible optimization I can and keeps the amount of data the system must read down to a minimum.

This is a great practice to be in because it helps not only in MySQL, but in other databases as well.