Thursday, January 7, 2010

MySQL Styles

There are nearly as many ways of writing SQL code as there are people writing it. Each person has his or her own style. While that may make it easy at one time or another for the writer, I suggest that using a well-recognized style makes it easier to read and maintain code (SQL or otherwise). Good style also makes it easier to spot problems with code. Whether you like the style I prefer or not, simply thinking about why others use one coding style over another can help us all communicate through our code better.

Before I launch into the methods I'm using, I suggest you check out http://sqlinform.com/ (I'm not associated with the site, however, I use the free on-line SQL formatting tool there rather frequently). When I'm trying to decipher someone else's code formatted in some other way (often just mashed together on a single or a few lines), this tool gives me a quick/easy way to re-format my code in a way that helps me read through the code. It clearly delineates the pieces of the code structure and helps guide my eye through it. Given the proper settings, I am able to read code so much easier that I am encouraging all the DBD's I work with to utilize this tool whether or not they're writing SQL for MySQL.

Imagine you were handed a file that looks like this:


USE DBFOO; START TRANSACTION;
DELIMITER ;;
// SP_GET_FOO_DATA CREATED BY SOME BEGINNER TO MAKE IT EASIER TO MAINTAIN HOW FOO DATA IS RETRIEVED
CREATE PROCEDURE SP_GET_FOO_DATA(V_FOO VARCHAR(2048)) SQL SECURITY INVOKER BEGIN SELECT SPLATTER.FOO, BLIP.BAR, BLOP.BAZ FROM BLIP LEFT JOIN SPLAT SPLATTER USING(SPLAT_ID), BLOP WHERE FOO = V_FOO AND BLIP.BLIP_ID = BLOP.BLIP_ID; END;; COMMIT;;
DELIMITER ;


Don't run for the hills... That's some funky code but let's see what we can do to clean it up:

1) Using START TRANSACTION and COMMIT are both unnecessary here because DDL statements (like CREATE PROCEDURE, ALTER TABLE, CREATE TABLE, etc) implicitly call COMMIT for you whether or not you're in an active transaction. MySQL silently ignores COMMITs when dealing with non-transactional storage engines but the point is, adding a COMMIT here doesn't add anything, so let's get rid of the excess transactional code.

2) Adding a USE statement may or may not be appropriate. At a minimum it makes deploying the code to a different database that much more difficult because the person applying it will need to change the USE statement accordingly.

3) Am I the only one who finds it hard to read text in all caps all the time? Okay - don't answer that. Obviously, I'm not the only one who would rather see text mostly lower case and all caps only when it's time to draw the reader's attention to something special.

4) The comment that is placed right before the CREATE PROCEDURE statement makes it hard to identify where the comment begins and ends versus where the code begins and ends. These lines need to be a) split apart at a minimum. It makes sense to me that the code would look and read better if the comment became a part of the DDL so it will show up in the SHOW PROCEDURE STATUS statement when called.

5) Long lines are no fun to read when the line is longer than your screen width setting. To get the whole picture, one must either decrease the font size to a sometimes unreadable value, or manually edit the file to break things up. When I do reviews, I want to spend time reading code, not reformatting it.

6) As I mentioned above, all caps makes it harder to follow text. It's a very common standard used by SQL writers that SQL clauses (like SELECT, WHERE, AND, VARCHAR, etc) are typically expressed in capital letters while database, table, column and variable names are expressed in lower case. This makes it much easier to find the clause or parameters to a clause much easier when utilized in conjunction with the other methods below.

7) Breaking up lines into logical groups helps the reader get the big picture fast. When I am writing SQL code, I like to see my column listings in a group, each JOIN in a line group, the WHERE clause in a line group, etc.

8) Combining implicit and explicit joins in MySQL 5 with a sql_mode of STRICT will cause MySQL to complain loudly. Even if we hadn't used the explicit join in the middle, adding the implicit join after the explicit join throws many for a loop. By writing explicit joins, we lead the reader through the code.

9) When aliasing columns or tables, use the word AS even though it's not required. This helps the reader quickly see that the table or column is being aliased. Without it, the alias isn't as easy to find.

10) Combining JOIN USING with implicit joins is typically a problem for MySQL and will likely cause a syntax error. Consider being more explicit when using the USING clause.

11) Positional sensitivity creates fragile code. It's better to use order and group by with the names of the columns than their positional counterparts because the positional order can change without negatively impacting the functionality.

I have some other rules I use for my own code that make it easier for me to read SQL. Can you find them below?


USE dbFoo ;
DELIMITER ;;
CREATE
PROCEDURE sp_get_foo_data(
v_foo VARCHAR(2048)
)
SQL SECURITY INVOKER
COMMENT 'Make it easier to maintain how foo data is retrieved'
BEGIN
SELECT s.foo
, b1.bar
, b2.baz
FROM blip
AS b1

LEFT
JOIN splat
AS s
ON s.SPLAT_ID = b1.SPLAT_ID

INNER
JOIN blop
AS b2
ON b2.blip_id = b1.blip_id

WHERE s.foo = v_foo
;
END ;;
DELIMITER ;


You can see I like to keep operators and SQL clauses to the left of a visual vertical column (right justified) and indent according to the level of code (new blocks like the begin/end block are indented). I also prefer to be explicit with table aliases for each table in-use. I go through this extra step when tables are being joined even if those names don't conflict. This makes it easier to quickly indicate which table the column comes from. When I have long lists of comma-separated lists of items (columns in this case), I like to treat the comma as an operator and start a new line right before the comma then indent according to the list. I put the comma first because it makes adding and removing lines from the list much easier that way. When it makes sense to do so, I also like to vertically align arithmetic, comparison, and assignment operators.

I could go on, I think you might be able to see some of the why behind my own formatting preferences. What preferences do you have and why?

No comments:

Post a Comment