Monday, February 8, 2010

How to COUNT in SQL

Often, I see users attempting to count rows of a table with COUNT(*) as a part of their query. Regardless of what database you're using, this probably not an optimal way to count.

COUNT(*) is likely to cause the system to do a table scan to find out exactly how many rows are available. Table scans are a very bad thing because it causes the system to go to disk to read the table. COUNT(columnName) on a column that is indexed will allow the database to use the index to compute the count (meaning it reads less data and may even be able to do it entirely from RAM if the index is already loaded there). The difference between COUNT(*) and COUNT(columnName) is a simple one - COUNT(*) counts every row where COUNT(columnName) counts the rows where columnName has a non-NULL value.

My favorite way to do counts (when I don't already have a summary table telling me how many rows are in the result set I'm looking for) is to count the non-contextual primary key in the table since I make it a habit of putting a non-contextual primary key in every table I can. This takes advantage of every possible optimization I can and keeps the amount of data the system must read down to a minimum.

This is a great practice to be in because it helps not only in MySQL, but in other databases as well.

No comments:

Post a Comment