Wednesday, December 2, 2009

Motorola Droid versus Blackberry 8830 World Edition

As a DBA, I often need to be available to perform work on the databases I am responsible for when I am away from the office. I have two phones - a corporate Blackberry 8830 World Edition and my personal Motorola Droid. Both phones are fully featured and support a large number of applications, and both are impressive in their own right. The thing I miss most on the Blackberry is turn-by-turn spoken directions (which I had via VZ Navigator on my old Glyde). The ssh client on the Droid (ConnectBot) seems far more capable than the ssh client I use on my Blackberry. I had difficulty learning the Blackberry development environment, and while I didn't try terribly hard, I was not able to find a simulator to help me test my applications before deploying them to a Blackberry. I did not have that problem at all with my Droid. The Android emulator was very easy to find and works wonderfully for me, and its integration into Eclipse makes it very easy to use.

With what I know about the Droid versus the Blackberry after a few months of use, for my needs, the Droid beats my 8830 hands down.

Tuesday, November 3, 2009

Dave's Stuff: Cardinality

Here's some good stuff on cardinality I read recently: Dave's Stuff: Cardinality. It made me want to review my own understanding of how the cardinality of a table is computed.

I didn't know that the cardinality statistic isn't an exact computation; according to his description, it is estimated from a random sampling of values during an ANALYZE TABLE.

Dave - The way I have been explaining cardinality to users is to discuss uniqueness of a value. If a bin has n items in it, the highest cardinality possible is n (every item in the bin is unique), and the lowest possible cardinality is 1 (every item is the same). What makes cardinality useful is the relationship between the total number of items in the bin and the number of distinct items (key values, in this case). One formula that captures this relationship is the average number of items per distinct value:

avg_items_per_value = total_items / distinct_items
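To see how the sampled statistic compares with the exact numbers, you can compute them directly. A sketch against a hypothetical table `orders` with an indexed column `customer_id` (the names are mine, purely illustrative):

```sql
-- Exact counts behind the formula above.
SELECT COUNT(*)                    AS total_items
     , COUNT(DISTINCT customer_id) AS distinct_items
     , COUNT(*) / COUNT(DISTINCT customer_id) AS avg_items_per_value
  FROM orders;

-- Compare with the sampled estimate the optimizer keeps
-- (see the Cardinality column in the output):
SHOW INDEX FROM orders;
```

On a big table, expect the two to disagree somewhat - that's the sampling at work.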

Friday, June 19, 2009

Dynamic Memory Allocation in MySQL

Did you know that MySQL can dynamically allocate memory for queries? I didn't until I found this web page...

The connection buffer and result buffer both begin with a size given by net_buffer_length but are dynamically enlarged up to max_allowed_packet bytes as needed. The result buffer shrinks to net_buffer_length after each SQL statement. While a statement is running, a copy of the current statement string is also allocated.

What's the big deal about this? mysqld needs enough memory to transmit the entire result of a query. If a query or result requires more than max_allowed_packet bytes, mysqld is unable to serve up the results. So while the table structure may support very large amounts of data, it's possible to design a table where a single row asks mysqld to send back more data than it can, leaving it unable to respond.

I was surprised to learn that even if you can concat data into a text field, the query will fail if the size of a retrieved row exceeds max_allowed_packet bytes! That sucks. I was certain I had filed a bug on this years ago, but wasn't able to find it. At any rate, one has been filed recently. :-)
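If you want to see where these limits sit on your own server, the variables are easy to inspect and, within reason, raise. A quick sketch:

```sql
-- Inspect the current limits (values are in bytes).
SHOW VARIABLES LIKE 'net_buffer_length';
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise the server-side limit for new connections (SUPER privilege
-- required); set it in my.cnf too so it survives a restart.
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;
```

Keep in mind the client programs (mysql, mysqldump) have their own max_allowed_packet setting, so you may need to raise the limit on both ends.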

Sunday, May 31, 2009

What do you think? Test automation...

What do you think? What are the advantages of the approach that works for you? Some say write tests after the other code is done. Some prefer to write tests, then code. Some want tests at the unit level. Some only care about testing APIs. What are your thoughts?

Saturday, April 25, 2009

What should I use - stored procedures or application code?


I've often thought about using a stored procedure versus using application code to accomplish tasks. The rules I use are pretty simple:
  1. Let the database generate data for itself (from data it already has).
  2. Let the database maintain data security and integrity where appropriate.
  3. Let the database help me administer it.
I've chosen to shy away from asking the database to handle business logic internally for a number of reasons, largely because databases are typically very expensive resources in comparison to application servers. It can also be difficult to version business logic stored in the database, making it harder to track how that code has changed.
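To make the first two rules concrete, here's the sort of work I mean letting the database keep for itself. The schema is a hypothetical sketch, not from any real system:

```sql
-- Rule 1: let the database generate data it can derive itself.
CREATE TABLE orders (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
, total      DECIMAL(10,2) NOT NULL DEFAULT 0.00
) ENGINE=InnoDB;

-- Rule 2: let the database enforce integrity with keys and constraints
-- instead of trusting every application code path to get it right.
CREATE TABLE line_item (
  order_id INT UNSIGNED NOT NULL
, sku      VARCHAR(32) NOT NULL
, qty      INT UNSIGNED NOT NULL
, PRIMARY KEY (order_id, sku)
, FOREIGN KEY (order_id) REFERENCES orders (id)
) ENGINE=InnoDB;
```

None of that is business logic - it's the database doing what it's best at, regardless of which application touches it.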

I do understand the value of stored procedures in reducing network traffic for certain types of queries, as well as in guaranteeing effective use of the query cache. However, I find that applications can be designed to use libraries of SQL in such a way that the query cache is used effectively and network traffic is reduced through the proper use of joins, cursors, and temporary tables.

Talking with other DBAs, some agree with me, some don't. What are your thoughts about when to use stored procedures versus application code?

Creating a MySQL reference database in 3 easy steps

What is a reference database?

A reference database is nothing more than a set of data that developers and test systems use to mirror what production data looks like without actually exposing sensitive data from the production system. This is critical for systems that handle HIPAA and PCI DSS data, because developers need to know what production looks and performs like, but they aren't allowed to see protected data like credit card numbers or medical records for specific individuals.

There are a number of different goals a reference database typically covers:
  • Provide a realistic snapshot of what developers need in order to be effective when maintaining most of the production code (i.e. the DDL and realistic data).
  • Provide a way to test edge cases without affecting production.
  • Provide a data set that realistically models production for performance testing of queries.
How do I set up a basic reference database?

The process of creating the reference database is pretty simple using replication. I'll assume you're already using replication to provide a backup of your master data set (at minimum). I'm also going to assume you are on MySQL 5 or newer.
  1. Create a replication slave that receives all data from your production master.
  2. Create triggers and corresponding stored procedures on this slave that obfuscate sensitive data.
  3. Utilize a cron or similar automated job to do a mysqldump of the resulting data set from time to time so that developers and testers can download it to their environments.
This accomplishes the goal of providing a production database snapshot in isolation without exposing sensitive data to others that aren't authorized to see it. It also accomplishes the goal of providing a basis for testing performance of the database as well as a method for verifying edge cases.
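Step 2 is the interesting one. As a sketch, an obfuscation trigger on the slave might look like this - the table, columns, and masking scheme here are all hypothetical:

```sql
DELIMITER //

-- Scramble card data as rows land on the slave, keeping the last four
-- digits so the data still "looks" real to developers.
CREATE TRIGGER obfuscate_payment
BEFORE INSERT ON payment
FOR EACH ROW
BEGIN
  SET NEW.card_number = CONCAT('############', RIGHT(NEW.card_number, 4));
  SET NEW.card_holder = CONCAT('Customer-', NEW.customer_id);
END//

DELIMITER ;
```

One caveat: whether slave-side triggers fire against replicated changes depends on your replication format, so verify the obfuscation actually happened before handing a dump to anyone.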

How do I cover edge cases?

The problem is, this method probably won't cover edge cases by itself, but as I said above, it does provide the basis for it. To cover edge cases that the production data doesn't cover, you'll need to write a script that will update a deployed copy of the reference database. What happens if certain data is missing? What if it's duplicated inappropriately? How does the application react when a transaction table is full - does it automatically add another transaction table to the merge? ... You know your application and the edge cases it needs. Creating scripts to simulate these conditions should be relatively straightforward.
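The scripts themselves can stay simple. A sketch of the sort of thing I mean, run against a deployed copy of the reference database (table and key names are hypothetical):

```sql
-- Edge case: required parent data is missing entirely.
DELETE FROM customers WHERE customer_id = 12345;

-- Edge case: a record is duplicated inappropriately.
INSERT INTO orders ( customer_id, total )
SELECT customer_id, total
  FROM orders
 WHERE order_id = 67890;
```

Each script sets up exactly one bad condition, so a failing application test points straight at the scenario that caused it.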

Once you have a reference database and scripts to test performance and edge cases, it becomes terribly simple to add your database to your continuous integration system. Deciding on how often to run those tests largely depends on how often you can afford to run the tests with the resources you have available to you. Typically, I want to run a minimum set of tests for every code commit related to it, then run a full set of tests on a timed schedule so I don't overwhelm my build system.

Has this helped you? Do you have suggestions or comments that might help someone else? Please feel free to add your comments here. :-)

Letting RSS help you with searches

Until recently, I never really thought much of using RSS to help me in this economy with job searches, but someone else did because sites like Dice, Monster, Indeed and Craigslist all provide ways to let your RSS reader keep you up-to-date with job search data. I know - I'm giving away one of my search secrets, but if I don't tell you, someone else will. Let's stick together and help each other out as we weather this recession.

Tuesday, January 20, 2009

When do I get a certification and why?

I see questions like this on LinkedIn very frequently.

Will it help me make more money? Will it help me get my foot in the door? Are certifications just a waste of time?

The answer to all these questions is: it depends. I've been on both sides of job searching (looking for work and looking for workers), and I can tell you that I've seen lots of people who had certifications or were pursuing them. My Certified MySQL 5 Database Administrator #5739 "ticket" has helped open doors for me. I have also seen people come to me looking for work with certifications obtained by cramming for exams, without ever attempting to understand what they were memorizing.

When I've interviewed (on both sides of the table), it's often been helpful to be able to explain how I used my certification in a real-world business scenario. For example, I've been able to say things like this (honestly, of course) to an interviewer. "I had been developing code against MySQL databases and administering those databases many years before I decided to obtain a MySQL certification of my own. The additional learning I got from studying for the certification helped me be better at my job and exposed me to new concepts that helped me think in new ways about how to use MySQL." Flipping that around, I've asked questions like - "Why did you get your certification?" and "What has the certification process done for you?"

Are you just out of college and deciding whether to get certified to gain an edge? Go ahead, but better yet - if you can find an organization that can use those skills, even as a volunteer, that experience will likely carry more credibility with an employer than a degree and certifications alone.

Looking for ways to use your MySQL certification? Look for local clubs, non-profits, and other similar groups that have a website, keep membership lists, or do other things where a database could help them. If they already have someone maintaining their database, you might ask if you could shadow that person to help build skills. If you end up getting some work out of it, great. If not, at least you'll build contacts that could potentially help you find work or provide a professional reference for you.

Testing MySQL Query Results

Developers often talk about testing code, but few spend the time to adequately test data changes made in SQL. I've found that using code similar to this has been a real help in isolating potential bugs in how I've coded my own inserts, updates, and deletes. Many would argue that there are better ways to do this, often from an external language, but when all you have is plain old SQL, there are still effective ways to perform tests.

Breaking it down, there are three basic steps I use:

  1. Drop and re-create a table to store results in

  2. Write inserts to that table from selects that verify key parts of expected results

  3. Inspect the outcome of the tests

Drop and re-create a table to store results in

SQL Code:

DROP TABLE IF EXISTS test_resultset;
CREATE TABLE test_resultset (
  test_name VARCHAR(255) NOT NULL
, passfail ENUM('Pass', 'Fail') NOT NULL DEFAULT 'Fail'
, result_count INT NOT NULL DEFAULT 0
, last_run TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

Write inserts to that table from selects that verify key parts of expected results

INSERT INTO test_resultset ( test_name, passfail, result_count )
SELECT '01 first_test_table.first_test_column' AS test_name
, IF( SOME CONDITION, 'Pass', 'Fail' ) AS passfail
, COUNT(*) AS result_count
FROM first_test_table
GROUP BY passfail;

Inspect the outcome of the tests

SELECT *
FROM test_resultset
ORDER BY test_name;

I've found that when I write my tests first, it helps me know when I've written "enough" code to be able to say a particular step is complete. When someone asks me what my code is supposed to do, I use my test cases to explain it to them.

Recently, I made modifications to some very intricate reports. To make things easier on myself, I wrote SQL code to verify my modifications before I made them. I used tests that looked something like this (the real tables are named descriptively, unlike these):

INSERT INTO test_resultset ( test_name, passfail, result_count )
SELECT '41 mlr.withheld_amount = SUM(mlrd.withheld_amount)' AS test_name
, IF( ABS( mlr.withheld_amount - mlrdj.withheld_amount ) < 0.001
    , 'Pass'
    , 'Fail' ) AS passfail
, COUNT(*) AS result_count
FROM mlr
JOIN ( SELECT mlrd_id
       , SUM(withheld_amount) AS withheld_amount
       FROM mlrd
       GROUP BY mlrd_id ) AS mlrdj
  ON mlr.mlrd_id = mlrdj.mlrd_id
GROUP BY passfail;

When I looked at my result set, I saw my test passed, what the name of the test was, the number of records that matched, and the time when the test was last run. I know to look for test 41 in my result set so that if the test doesn't show up, I know it didn't return any rows so it's likely that either my test is not valid or there is no data to run my test against.

By storing the test results in a table, it also provides reference to help you know when you ran your last test set and gives you an easy way to find out what the results of each test were.

If you wanted to keep results from multiple runs of tests, you may want to create a separate table that will give you a test run number, then add that to your test_resultset table as a cross-reference. Then, removing data (like dropping the test_resultset table) between runs becomes not only unnecessary but undesirable.
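That cross-reference might look something like this sketch (the naming here is mine, purely illustrative):

```sql
-- One row per test run; each test result points back at it.
CREATE TABLE test_run (
  run_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, started TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

ALTER TABLE test_resultset
  ADD COLUMN run_id INT UNSIGNED NOT NULL;

-- At the top of each test script:
INSERT INTO test_run () VALUES ();
SET @run := LAST_INSERT_ID();
-- ...then include @run in every INSERT INTO test_resultset.
```

With this in place, comparing two runs is just a join on run_id.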

While this is a very rudimentary test method, it can save lots of work by giving you an easy way to run a test set quickly. As you develop code, keep a test set like this around (whether it's in a SQL file or driven by some other language); you'll find real benefits from having it to rely on as your code becomes more robust and complex.

I could go on, but I'm sure you get the idea and can see how useful this could be when verifying SQL code works as expected. Want to learn more about writing tests before code? Check out "Test Driven Development" in your favorite search engine.

Friday, January 16, 2009

Using top, mytop, mysql command-line all inside screen

This posting contains sample .screenrc and .mytop files.

If you've never used screen before and you like command-line tools, check it out. What I like most about screen is that you can start a screen session, detach from it (intentionally or otherwise), then reattach, picking up right where you left off. The easiest way to explain what screen does is simply to state that screen allows you to open (within reason) as many independent terminal sessions as your system will allow. From each of these sessions, you can do pretty much anything you'd do from your regular terminal session. Some of these sessions can be displayed back to you at the same time; some may be left running in the background. In each session, however, screen tracks the session display just like a regular terminal would.

I use screen personally to help me stay aware of what's happening on my system, and to provide a constant point of reference so I can get back to where I left off when I'm able.

The top command is widely used by system administrators to see which processes are taking up the most resources on a given system. Like top, mytop helps DBAs quickly see which threads are utilizing resources. What many don't know is that there are a number of things that can be done with mytop, like killing active processes, turning off the header, setting up color, etc.

When you combine these features with screen, it can be a very nice tool when doing administration and development of a database host.

For example, over a telnet (very insecure) or ssh session, screen can present multiple session windows back to the user, all without needing a GUI at all.

My own .screenrc automatically sets up my windows for me like this:

At the top, I automatically start top in a window that's 10 lines high. This lets me see the most process-intensive commands running at any given time.

The window below that is a mytop session to the database that shows me a list of the active queries at any given time (I filter out sleeping sessions, and I don't generally display the headers, giving me more space to see what queries are running). I also have it updating once per second.

In the third (bottom) window, I leave the rest of the lines to interact with MySQL, the shell, and other systems.

You'll probably find these files interesting:

$HOME/.mytop: (note: chmod 600 to protect your password from prying eyes)
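A minimal .mytop matching the behavior described above (one-second refresh, headers off, sleeping sessions filtered out) might look like this; the hostname, user, and password are placeholders:

```
host=dbdev001
user=me
pass=CHANGEME
db=mydb
delay=1
header=0
idle=0
color=1
```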

$HOME/.screenrc:
# Make the escape key the backtick rather than ctrl-a since I use ctrl-a so much
escape ``
termcapinfo linux|xterm|rxvt|xterm-color ti@:te@:XT
autodetach on
autonuke on
defflow off
defscrollback 3000 # yes, that's lines
startup_message off
vbell on
vbell_msg "bell on %t (%n)"

caption splitonly "%{= bw}%-Lw%{= wb}%50> %n%f* %t %{-}%+Lw"

hardstatus alwayslastline "%< %= %{= bw} %H %{= rW} %l %{= bw} %m/%d %0c:%s %{-}"

activity "Activity in %t(%n)"

shelltitle "shell"

# Toggle 'fullscreen' or not.
bind f eval "caption splitonly" "hardstatus ignore"
bind F eval "caption always" "hardstatus alwayslastline"

# escape Control-R will reset my windows back to start-up sizes and locations.
# It's nice for when I disconnect from screen then reconnect because screen
# doesn't split windows by
# default.
bind ^R eval "only" "select 0" "split" "split" "focus top" "resize 10" "focus down" "select 1" "resize 12" "focus bottom" "select 2"

# If a window goes unresponsive, don't block the whole session waiting for it.
nonblock on

bufferfile /home/me/screenPasteBuffer.txt

msgwait 0
screen -t top 0
stuff "nice top \012"
screen -t mytop@dbdev 1
stuff "nice mytop \012"
screen -t webdev001 2
focus top
select 0
resize 10
focus down
select 1
resize 12
focus bottom
screen -t dbdev001 3
stuff "ssh dbdev001"
screen -t dbreport001 4
stuff "ssh dbreport001"
screen -t webdev001 5
stuff "ssh webdev001"
screen -t webdev001 6
stuff "ssh webdev001"
select 2
msgwait 2

pow_detach_msg "Screen session of \$LOGNAME \$:cr:\$:nl:ended."
shell -$SHELL


I can't take credit for all these enhancements, though I have tweaked them to suit my own needs.


Friday, January 9, 2009

Making the mysql command-line client work better

Baron Schwartz posted a number of very nice tips on how to use the mysql command-line client. Many of these pertain specifically to Unix-based clients, but some are rather useful on Windows as well. :-)

If you're doing MySQL solo, I've often found to be rather useful.

Thursday, January 1, 2009

MySQL Clustering vs. Replication (single & multi-master)

There is a really cool method of implementing replication in MySQL (5.0.24+, 5.1.12+) that allows you to have more than one replication master server for a data set without clustering. This gives you some really nice capabilities; however, there are some gotchas too:

Advantages:
  • Having dual replication masters allows an administrator to take one of the masters off-line during an upgrade allowing the other master(s) to handle requests during that upgrade. This provides a cheap form of high availability (HA).

  • Hardware failures in a multi-master situation are less of an issue as long as the software is configured to use more than one of the masters or a load balancer deals with sensing which host to use.

  • Given a large enough pipe, multi-mastering makes it possible to have a physically remote master. You can't do that reliably with MySQL cluster (not replicated) because it requires extremely fast connections between the nodes of the cluster.

  • No storage engine changes are required for MyISAM (as opposed to Cluster, which requires changing to NDB - a transactional storage engine).

Disadvantages:
  • Locking/Updating Race: It's possible for a lock to succeed in multiple masters simultaneously then fail on the "other" master(s). For example, a user double-clicks a submit causing the application to start two different threads of the same type. It's possible that neither will know about the other until it's "too late" if each thread gets a different DB server connection.

  • Each DB still needs to have a full copy of the entire data set unlike cluster which can distribute copies of data across multiple servers.

  • Suffers from all the problems of replication and the fragility it introduces.

  • Requires modifications to every table using auto_increment primary keys.

  • Tables that don't use an auto-increment field as the primary key or tables that have a UNIQUE key constraint on a column or set of columns may allow inserts to succeed on separate masters then fail in replication.
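The auto_increment accommodation mentioned above is normally handled with two server variables rather than by altering each table: give every master the same increment and a unique offset so generated keys never collide. A sketch for two masters (my.cnf fragments):

```
# Master 1 my.cnf
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 1

# Master 2 my.cnf
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 2
```

With this, master 1 generates keys 1, 3, 5, ... and master 2 generates 2, 4, 6, ... - no overlap even when both accept inserts.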

MySQL multi-master replication can be a cheap alternative to MySQL cluster without some of the issues introduced by changing storage engines and incurring the expense of setting up the number of servers (and associated RAM, disk, networking, etc.) required to make MySQL cluster work efficiently.

To learn more about MySQL Multi-Master Replication, see: