Saturday, April 25, 2009

What should I use - stored procedures or application code?


I've often thought about using a stored procedure versus using application code to accomplish tasks. The rules I use are pretty simple:
  1. Let the database generate data for itself (from data it already has).
  2. Let the database maintain data security and integrity where appropriate.
  3. Let the database help me administer it.
I've chosen to shy away from asking the database to handle business logic internally for a number of reasons, largely because databases are typically very expensive resources in comparison to application servers. It may also be difficult to revision business logic stored in the database, therefore making it more difficult to track how that code has changed.

I do understand the value of stored procedures reducing network traffic for certain types of queries as well as guaranteeing the effective use of the query cache, however, I find that applications can be designed to use libraries of SQL in such a way that the query cache is used effectively and network traffic is reduced through the proper use of joins, cursors and temporary tables.

Talking with other DBA's, some agree with me, some don't. What are your thoughts about when to use stored procedures versus application code?

Creating a MySQL reference database in 3 easy steps

What is a reference database?

A reference database is nothing more than a set of data that developers and test systems use to mirror what production data looks like without actually exposing sensitive data from the production system. This is critical for systems that handle HIPPA and PCI/DSS data because developers need to know what production looks and performs like, but they aren't allowed to see protected data like credit card numbers or medical data about specific individuals.

There are a number of different goals a reference database typically covers:
  • Provide a realistic snapshot of what developers need in order to be effective when maintaining most of the production code (i.e. the DDL and realistic data).
  • Provide a way to test edge cases without affecting production.
  • Provide a data set that realistically models production for performance testing of queries.
How do I set up a basic reference database?

So, the process of creating the reference database is pretty simple using replication. I'll assume you're already using replication to provide back-up of your master data set (at minimum). I'm also going to assume you are on MySQL 5 or newer.
  1. Create a replication slave that receives all data from your production master.
  2. Create triggers and corresponding stored procedures on this slave that obfuscate sensitive data.
  3. Utilize a cron or similar automated job to do a mysqldump of the resulting data set from time to time so that developers and testers can download it to their environments.
This accomplishes the goal of providing a production database snapshot in isolation without exposing sensitive data to others that aren't authorized to see it. It also accomplishes the goal of providing a basis for testing performance of the database as well as a method for verifying edge cases.

How do I cover edge cases?

The problem is, this method probably won't cover edge cases by itself, but as I said above, it does provide the basis for it. To cover edge cases that the production data doesn't cover, you'll need to write a script that will update a deployed copy of the reference database. What happens if certain data is missing? What if it's duplicated inappropriately? How does the application react when a transaction table is full - does it automatically add another transaction table to the merge? ... You know your application and the edge cases it needs. Creating scripts to simulate these conditions should be relatively straightforward.

Once you have a reference database and scripts to test performance and edge cases, it becomes terribly simple to add your database to your continuous integration system. Deciding on how often to run those tests largely depends on how often you can afford to run the tests with the resources you have available to you. Typically, I want to run a minimum set of tests for every code commit related to it, then run a full set of tests on a timed schedule so I don't overwhelm my build system.

Has this helped you? Do you have suggestions or comments that might help someone else? Please feel free to add your comments here. :-)

Letting RSS help you with searches

Until recently, I never really thought much of using RSS to help me in this economy with job searches, but someone else did because sites like Dice, Monster, Indeed and Craigslist all provide ways to let your RSS reader keep you up-to-date with job search data. I know - I'm giving away one of my search secrets, but if I don't tell you, someone else will. Let's stick together and help each other out as we weather this recession.