Tuesday, September 14, 2010

Dude's Law: Meaningful Thinking

0 comments
I had never really thought much about how coaches get to be really good at what they do. I (like many, I assume) think that being a really good player makes someone a good coach of players. Then I thought about the adage - "Practice doesn't make perfect, practice makes permanent. Right practice makes perfect." The problem is - without the right feedback, there is little chance of knowing that practice is right.

The article I'm pointing to here inspired me to think more about "why" I do things. Why do I go to work each day besides earning a paycheck? Why do I blog? Why do I play disc golf? Why did I get married? Why do I stay married? Why did I make so many of the decisions I've made in my life?

The more I ask why, the more I have to answer "so that." By being conscious of the "so that's," I think living life can become much easier.

Friday, May 14, 2010

Timestamps in tables please

0 comments
As MySQL DBA's, it helps us to understand when rows were added in a table and when those rows were last updated. As a result, we're requesting that all tables have the following two columns in them to help us track those changes:

, created TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00'
, updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

This should help when it comes time to maintain rows in all your tables because the database knows how to handle these two columns by itself given the right scenario:

mysql> CREATE TABLE foo (
->     foo_id SMALLINT NOT NULL AUTO_INCREMENT PRIMARY KEY
->   , bar varchar(24) NOT NULL DEFAULT ''
->   , created TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00'
->   , updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
-> ) ;
Query OK, 0 rows affected (0.04 sec)

mysql> INSERT foo ( bar, created, updated )
-> VALUES ( 'hello', NULL, NULL )
->      , ( 'world', NULL, NULL )
->      ;
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

Notice here that we intentionally insert NULL values into the created and updated columns. These will be rows 1 and 2 below.

mysql> INSERT foo ( bar ) VALUES ( 'hello again' ) ;
Query OK, 1 row affected (0.00 sec)

This time, I'm not setting the created field at all so you can see what happens when we skip it. I'm not crazy about this inconsistency, but it's what we have to work with for now.

mysql> UPDATE foo SET bar = 'from here' WHERE foo_id = 2 ;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM foo ;
+--------+-------------+---------------------+---------------------+
| foo_id | bar         | created             | updated             |
+--------+-------------+---------------------+---------------------+
|      1 | hello       | 2010-05-13 15:23:01 | 2010-05-13 15:23:01 |
|      2 | from here   | 2010-05-13 15:23:01 | 2010-05-13 15:23:19 |
|      3 | hello again | 0000-00-00 00:00:00 | 2010-05-13 15:23:03 |
+--------+-------------+---------------------+---------------------+
3 rows in set (0.00 sec)

Notice that row 3's created column has the 0000-... default value in it. Rows 1 and 2 don't because we actually inserted the NULL value into the NOT NULL column. That demonstrates that inserting NULL into a TIMESTAMP column created with NOT NULL will cause the column to use the current time when an action is performed (INSERT/UPDATE/REPLACE). This does not work with DATE, TIME, and DATETIME fields.

Want to learn more? Check out this link.

Sunday, March 28, 2010

Two of my favorite "toys"

0 comments


Doing work all day long can get rather boring.  So, here, I am showing off one of my hobbies.  The magazine is there for perspective in size.  The small helicopter is a Blade CP+ and the larger is a Bergen Gasser though not fully assembled.  As you can see, the Gasser is about 2.5 times the size of the Blade even with the training gear on.


Monday, February 8, 2010

Automatically Generate Configuration Files (Part 1)

0 comments

This article begins a series on using m4 to generate configurations for MySQL but applies to other configurations as well. Why would I want to generate configurations versus just managing each configuration separately? How often do we feel we need to make a change that applies to several instances all at once? Wouldn't it be nice if there was an easy way to change something in one place so that wold be reflected in all the configuration files under our care? What about when we want to see the same change in all the files except one or two? What about when we want to see that change happen in only one or two files?

There are a plethora of templating, macro, and text processing systems available that can help administrators deal with this type of challenge. I am using m4 for my own needs because it is very simple to understand and maintain while retaining the flexibility I need to go far beyond standard search and replace. I use m4 to help me generate configurations for mysql, httpd, bind(DNS), and other software. Generating configurations makes it much easier for me to maintain, test, and version my changes across several systems.

Enough of the lead-in, let's get started using m4 in a meaningful way.

m4 is available on most Unix-based systems automatically. If it's not available on yours, check your repositories. On Windows, m4 is available in Cygwin. If you don't already have Cygwin, it is a way to get many of the tools that run in Unix-based environments without having to run Unix free of charge.

m4 is a macro processor that has its roots in the C language processors. It takes a stream of text, looks for certain keywords and uses those as macros that are expanded based on the needs presented to it. m4 can handle loops, conditionals, and other common programming constructs. m4 is a tool that helps users by keeping repetition down to a minimum.

m4 uses processing instructions to determine when it needs to do something. I've told it only to read processing instructions that start with m4_ because I don't want it confusing my instructions with real information that needs to be left alone. It's just a precaution, but a good one and I encourage you to use it as well. To get m4 to require m4_, simply add a -P to your m4 command (as shown below).

The first instruction I'll show you is m4_define - the root of all that's m4. What good would a macro processor be if we couldn't define macros to be expanded? Isn't that like having a sail boat without a sail? Anyway, I digress. The syntax for m4_define looks like this: m4_define(`defined_label', `macro expanded') with the defined_label being the text that m4 will substitue and macro expanded being what it will replace defined_label with. Notice that I've wrapped these items with a back-tick and a regular tick. This is important and I'll explain more about that in upcoming articles. For now, let's get with the program! ;-)

$ cat my.cnf.m4
m4_define(`DEF_SERVER_ID', 1)

[mysqld]
server-id = DEF_SERVER_ID
$ 

If we run this file through m4, we will get the following:

$ m4 -P < my.cnf.m4


[mysqld]
server-id = 1
$ 
Notice that the definition of the DEF_SERVER_ID variable seems to have disappeared. That's true - m4 "ate" the definition macro. The new line is still there but that's all that's left of the first line of the file. What's cool, however, is we now have one place where the server ID was specified and another where it was used. No big deal there - what value does that give me, you ask? None at all unless you really like repeating yourself a lot. Of course there is more. Let's make this a little more meaningful.
$ cat my.cnf.m4
m4_define(`DEF_MYSQLDIR', `/var/lib/mysql')
m4_define(`DEF_SERVER_ID', 1)

[mysqld]
server-id = DEF_SERVER_ID
log = DEF_MYSQLDIR/logs/mysql.log
socket = DEF_MYSQLDIR/mysql.sock
pid-file = DEF_MYSQLDIR/run/mysql.pid
datadir = DEF_MYSQLDIR/data
$ 
Did you think through what m4 will do for us? Think about your answer before you read further.
$ m4 -P < my.cnf.m4



[mysqld]
server-id = 1
log = /var/lib/mysql/logs/mysql.log
socket = /var/lib/mysql/mysql.sock
pid-file = /var/lib/mysql/run/mysql.pid
datadir = /var/lib/mysql/data
$ 
Pretty cool, eh? Well - maybe not yet. Did you get it right? What if I decided I needed to move my MySQL instance to another directory like /opt/mysqlInstance1 ? Rather than having to search for and replace all the text of "/var/lib/mysql" changing it to "/opt/mysqlInstance1", all I need to do is change the definition of DEF_MYSQLDIR at the top of the m4 template, then re-run m4 on my template and I get the new ouptut file. That's what I call slick. :-)
$ cat my.cnf.m4
m4_define(`DEF_MYSQLDIR', `/opt/mysqlInstance1')
m4_define(`DEF_SERVER_ID', 1)

[mysqld]
server-id = DEF_SERVER_ID
log = DEF_MYSQLDIR/logs/mysql.log
socket = DEF_MYSQLDIR/mysql.sock
pid-file = DEF_MYSQLDIR/run/mysql.pid
datadir = DEF_MYSQLDIR/data
$ m4 -P < my.cnf.m4



[mysqld]
server-id = 1
log = /opt/mysqlInstance1/mysql.log
socket = /opt/mysqlInstance1/mysql.sockm4 
pid-file = /opt/mysqlInstance1/run/mysql.pid
datadir = /opt/mysqlInstance1/data
$ 
So - that's nice, but there's a lot more to m4 than just being able to define macros. One of the most powerful tools that m4 offers is the ability to pull in other files. I'll explain by example:
$ cat my.cnf.m4
[mysqld]
server-id = DEF_SERVER_ID
log = DEF_MYSQLDIR/logs/mysql.log
socket = DEF_MYSQLDIR/mysql.sock
pid-file = DEF_MYSQLDIR/run/mysql.pid
datadir = DEF_MYSQLDIR/data

$ cat instance1.cnf.m4
m4_define(`DEF_MYSQLDIR', `/opt/mysqlInstance1')m4_dnl
m4_define(`DEF_SERVER_ID', 1)m4_dnl
m4_include(`my.cnf.m4')
$ cat instance2.cnf.m4
m4_define(`DEF_MYSQLDIR', `/opt/mysqlInstance2')m4_dnl
m4_define(`DEF_SERVER_ID', 2)m4_dnl
m4_include(`my.cnf.m4')

$ m4 -P < instance1.cnf.m4
[mysqld]
server-id = 1
log = /opt/mysqlInstance1/mysql.log
socket = /opt/mysqlInstance1/mysql.sock
pid-file = /opt/mysqlInstance1/run/mysql.pid
datadir = /opt/mysqlInstance1/data

$ m4 -P < instance2.cnf.m4
[mysqld]
server-id = 2
log = /opt/mysqlInstance2/mysql.log
socket = /opt/mysqlInstance2/mysql.sock
pid-file = /opt/mysqlInstance2/run/mysql.pid
datadir = /opt/mysqlInstance2/data

$ 

So - I added two things: m4_include and m4_dnl. m4_include says basically go read this file and continue processing as if the text were part of this file. m4_dnl says discard to next line. It's a form of comment but I often use it to mean "don't new line."

You notice in the example above, I created two very small files with the m4_define statements I needed (defining my macros) and then included the main template where those definitions were expanded as explained in the smaller files. So - rather than creating one my.cnf file with the definitions for one instance then copying and hand-modifying that second file for the second instance, I am able to write one template and a couple of small definition files to get the same result. If I need ot make a change to both files at the same time, I can simply change my my.cnf.m4 then re-generate my my.cnf files for each instance.

In today's tutorial, you saw how to create and get m4 to expand basic macros, how to include other files into an m4 macro file, and how to create comments. In our next episode, I'll explain how to make m4 smarter by showing you how to make it ask questions and do things differently depending on the answer. I'll also explain how to get the make command to automatically handle creating all your configuration files for you as well as getting the files into source control.

Do you like this series? Please post a comment! :-)

How to COUNT in SQL

0 comments
Often, I see users attempting to count rows of a table with COUNT(*) as a part of their query. Regardless of what database you're using, this probably not an optimal way to count.

COUNT(*) is likely to cause the system to do a table scan to find out exactly how many rows are available. Table scans are a very bad thing because it causes the system to go to disk to read the table. COUNT(columnName) on a column that is indexed will allow the database to use the index to compute the count (meaning it reads less data and may even be able to do it entirely from RAM if the index is already loaded there). The difference between COUNT(*) and COUNT(columnName) is a simple one - COUNT(*) counts every row where COUNT(columnName) counts the rows where columnName has a non-NULL value.

My favorite way to do counts (when I don't already have a summary table telling me how many rows are in the result set I'm looking for) is to count the non-contextual primary key in the table since I make it a habit of putting a non-contextual primary key in every table I can. This takes advantage of every possible optimization I can and keeps the amount of data the system must read down to a minimum.

This is a great practice to be in because it helps not only in MySQL, but in other databases as well.

Thursday, January 7, 2010

MySQL Styles

0 comments
There are nearly as many ways of writing SQL code as there are people writing it. Each person has his or her own style. While that may make it easy at one time or another for the writer, I suggest that using a well-recognized style makes it easier to read and maintain code (SQL or otherwise). Good style also makes it easier to spot problems with code. Whether you like the style I prefer or not, simply thinking about why others use one coding style over another can help us all communicate through our code better.

Before I launch into the methods I'm using, I suggest you check out http://sqlinform.com/ (I'm not associated with the site, however, I use the free on-line SQL formatting tool there rather frequently). When I'm trying to decipher someone else's code formatted in some other way (often just mashed together on a single or a few lines), this tool gives me a quick/easy way to re-format my code in a way that helps me read through the code. It clearly delineates the pieces of the code structure and helps guide my eye through it. Given the proper settings, I am able to read code so much easier that I am encouraging all the DBD's I work with to utilize this tool whether or not they're writing SQL for MySQL.

Imagine you were handed a file that looks like this:


USE DBFOO; START TRANSACTION;
DELIMITER ;;
// SP_GET_FOO_DATA CREATED BY SOME BEGINNER TO MAKE IT EASIER TO MAINTAIN HOW FOO DATA IS RETRIEVED
CREATE PROCEDURE SP_GET_FOO_DATA(V_FOO VARCHAR(2048)) SQL SECURITY INVOKER BEGIN SELECT SPLATTER.FOO, BLIP.BAR, BLOP.BAZ FROM BLIP LEFT JOIN SPLAT SPLATTER USING(SPLAT_ID), BLOP WHERE FOO = V_FOO AND BLIP.BLIP_ID = BLOP.BLIP_ID; END;; COMMIT;;
DELIMITER ;


Don't run for the hills... That's some funky code but let's see what we can do to clean it up:

1) Using START TRANSACTION and COMMIT are both unnecessary here because DDL statements (like CREATE PROCEDURE, ALTER TABLE, CREATE TABLE, etc) implicitly call COMMIT for you whether or not you're in an active transaction. MySQL silently ignores COMMITs when dealing with non-transactional storage engines but the point is, adding a COMMIT here doesn't add anything, so let's get rid of the excess transactional code.

2) Adding a USE statement may or may not be appropriate. At a minimum it makes deploying the code to a different database that much more difficult because the person applying it will need to change the USE statement accordingly.

3) Am I the only one who finds it hard to read text in all caps all the time? Okay - don't answer that. Obviously, I'm not the only one who would rather see text mostly lower case and all caps only when it's time to draw the reader's attention to something special.

4) The comment that is placed right before the CREATE PROCEDURE statement makes it hard to identify where the comment begins and ends versus where the code begins and ends. These lines need to be a) split apart at a minimum. It makes sense to me that the code would look and read better if the comment became a part of the DDL so it will show up in the SHOW PROCEDURE STATUS statement when called.

5) Long lines are no fun to read when the line is longer than your screen width setting. To get the whole picture, one must either decrease the font size to a sometimes unreadable value, or manually edit the file to break things up. When I do reviews, I want to spend time reading code, not reformatting it.

6) As I mentioned above, all caps makes it harder to follow text. It's a very common standard used by SQL writers that SQL clauses (like SELECT, WHERE, AND, VARCHAR, etc) are typically expressed in capital letters while database, table, column and variable names are expressed in lower case. This makes it much easier to find the clause or parameters to a clause much easier when utilized in conjunction with the other methods below.

7) Breaking up lines into logical groups helps the reader get the big picture fast. When I am writing SQL code, I like to see my column listings in a group, each JOIN in a line group, the WHERE clause in a line group, etc.

8) Combining implicit and explicit joins in MySQL 5 with a sql_mode of STRICT will cause MySQL to complain loudly. Even if we hadn't used the explicit join in the middle, adding the implicit join after the explicit join throws many for a loop. By writing explicit joins, we lead the reader through the code.

9) When aliasing columns or tables, use the word AS even though it's not required. This helps the reader quickly see that the table or column is being aliased. Without it, the alias isn't as easy to find.

10) Combining JOIN USING with implicit joins is typically a problem for MySQL and will likely cause a syntax error. Consider being more explicit when using the USING clause.

11) Positional sensitivity creates fragile code. It's better to use order and group by with the names of the columns than their positional counterparts because the positional order can change without negatively impacting the functionality.

I have some other rules I use for my own code that make it easier for me to read SQL. Can you find them below?


USE dbFoo ;
DELIMITER ;;
CREATE
PROCEDURE sp_get_foo_data(
v_foo VARCHAR(2048)
)
SQL SECURITY INVOKER
COMMENT 'Make it easier to maintain how foo data is retrieved'
BEGIN
SELECT s.foo
, b1.bar
, b2.baz
FROM blip
AS b1

LEFT
JOIN splat
AS s
ON s.SPLAT_ID = b1.SPLAT_ID

INNER
JOIN blop
AS b2
ON b2.blip_id = b1.blip_id

WHERE s.foo = v_foo
;
END ;;
DELIMITER ;


You can see I like to keep operators and SQL clauses to the left of a visual vertical column (right justified) and indent according to the level of code (new blocks like the begin/end block are indented). I also prefer to be explicit with table aliases for each table in-use. I go through this extra step when tables are being joined even if those names don't conflict. This makes it easier to quickly indicate which table the column comes from. When I have long lists of comma-separated lists of items (columns in this case), I like to treat the comma as an operator and start a new line right before the comma then indent according to the list. I put the comma first because it makes adding and removing lines from the list much easier that way. When it makes sense to do so, I also like to vertically align arithmetic, comparison, and assignment operators.

I could go on, I think you might be able to see some of the why behind my own formatting preferences. What preferences do you have and why?