Tuesday, January 20, 2009

When do I get a certification and why?

0 comments
I see questions like this on LinkedIn - http://www.linkedin.com/ very frequently.

Will it help me make more money? Will it help me get my foot in the door? Are certifications just a waste of time?

The answer to all these questions is: it depends. I've been on both sides of job searching (looking for work and looking for workers) and I can tell you that I've seen lots of people who had or were looking for certifications. My Certified MySQL 5 Database Administrator #5739 "ticket" has helped open doors for me. I have seen people come to me looking for work with certifications obtained by cramming for exams to pass certifications without attempting to understand what they were memorizing.

When I've interviewed (on both sides of the table), it's often been helpful to be able to explain how I used my certification in a real-world business scenario. For example, I've been able to say things like this (honestly, of course) to an interviewer. "I had been developing code against MySQL databases and administering those databases many years before I decided to obtain a MySQL certification of my own. The additional learning I got from studying for the certification helped me be better at my job and exposed me to new concepts that helped me think in new ways about how to use MySQL." Flipping that around, I've asked questions like - "Why did you get your certification?" and "What has the certification process done for you?"

Are you just out of college and deciding on getting certified to gain an edge? Go ahead, but better yet - if you can find an organization who can use those skills, even as a volunteer, it will likely have more credibility with an employer over someone who has a degree and certifications without experience.

Looking for ways to use your MySQL certification? Look for local clubs, non-profit's, and other similar groups that have a website, keep membership lists, or do other things where a database could help them. If they already have someone that's maintaining their database, you might ask if you could shadow that person to help build skills. If you end up getting some work out of it, great. If not, at least you'll build contacts that could potentially help you find work or provide a professional reference for you.

Testing MySQL Query Results

0 comments
Often times, developers talk about testing code, but few spend the time to test data changes made in SQL adequately. I've found that using code similar to this has been a real help to me in isolating potential bugs in how I've coded my own inserts, updates, and deletes. Many would argue that there are better ways to do this, often from an external language, but when all you have is plain old SQL, there are effective ways to perform tests.

Breaking it down, there are three basic steps I use:


  1. Drop and re-create a table to store results in

  2. Write inserts to that table from selects that verify key parts of expected results

  3. Inspect the outcome of the tests



Drop and re-create a table to store results in

SQL Code:


DROP TABLE IF EXISTS test_resultset;
CREATE TABLE test_resultset (
test_name VARCHAR(255) NOT NULL
, created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
, passfail ENUM('Pass', 'Fail') NOT NULL DEFAULT 'Fail'
, result_count INT UNSIGNED NOT NULL DEFAULT 0
);


Write inserts to that table from selects that verify key parts of expected results


INSERT INTO test_resultset ( test_name, passfail, result_count )
SELECT '01 first_test_table.first_test_column' AS test_name
, IF( SOME CONDITION , 'Pass', 'Fail' ) AS passfail
, COUNT(*) AS result_count
...
GROUP
BY passfail
;


Inspect the outcome of the tests


SELECT *
FROM test_resultset
;


I've found that when I write my tests first, it helps me know when I've written "enough" code to be able to say a particular step is complete. When someone asks me what my code is supposed to do, I use my test cases to explain it to them.

Recently, I made modifications to reports that were very intricate. To make it easier on me, I wrote SQL code that verified my modifications before I made them. I used tests that looked something like this (the real tables are named descriptively, unlike these):


INSERT
INTO test_resultset
SELECT '41 mlr.withheld_amount = SUM(mlrd.withheld_amount)' AS test_name
, IF( ABS( mlr.withheld_amount - mlrdj.withheld_amount ) < 0.001
, 'Pass'
, 'Fail' ) AS passfail
, COUNT(*)
FROM mlr

JOIN
(
SELECT id
, SUM(withheld_amount) AS withheld_amount
FROM mlrd
)
AS mlrdj
ON mlr.mlrd_id = mlrdj.id

GROUP
BY passfail
;


When I looked at my result set, I saw my test passed, what the name of the test was, the number of records that matched, and the time when the test was last run. I know to look for test 41 in my result set so that if the test doesn't show up, I know it didn't return any rows so it's likely that either my test is not valid or there is no data to run my test against.

By storing the test results in a table, it also provides reference to help you know when you ran your last test set and gives you an easy way to find out what the results of each test were.

If you wanted to keep results from multiple runs of tests, you may want to create a separate table that will give you a test run number, then add that to your test_resultset table as a cross-reference. Then, removing data (like dropping the test_resultset table) between runs becomes not only unnecessary but undesirable.

While this is a very rudimentary test method, it can save lots of work by giving yourself an easy way to run a test set quickly. As you develop code, keeping a test set like this around (whether it's in a SQL file or driven by some other language), you'll find benefits from having test sets like this to rely on as your code becomes more robust and complex.

I could go on, but I'm sure you get the idea and can see how useful this could be when verifying SQL code works as expected. Want to learn more about writing tests before code? Check out "Test Driven Development" in your favorite search engine.

Friday, January 16, 2009

Using top, mytop, mysql command-line all inside screen

3 comments
This posting contains sample .screenrc and .mytop files.

If you've never used screen before and you like using command-line things, check it out. What I like most about screen is you can start a screen session, detach from it intentionally or otherwise, then reattach picking up right where you left off last time. The easiest way to explain what screen does is simply stating that screen allows you to open (within reason) as many independent terminal sessions as your system will allow. From each of these sessions, you can do pretty much anything you'd do from your regular terminal session. Some of these sessions can be displayed back to you at the same time, some may be left running in the background. In each session, however, screen tracks the session display just like a regular terminal would.

I use screen personally to help me be aware of what's happening critically on my system, and to provide a constant point of reference I can get back to where I left off when I'm able.

The top command is widely used by system administrators to see what processes are taking up the most resources on a given system. Like top, mytop also helps DBA's to see what threads are utilizing resources quickly. What many don't know is there are a number of things that can be done with mytop like killing active processes, turning off the header, setting up color, etc.

When you combine these features with screen, it can be a very nice tool when doing administration and development of a database host.

For example, over a telnet (very insecure) or ssh session, screen can present multiple session windows back to the user all, without needing a gui at all.

My own .screenrc automatically sets up my windows for me like this:

At the top, I automatically start top in a window that's 10 lines high. This lets me see the most process-intensive commands running at any given time.

The window below that is a mytop session to the database and shows me a list of the active queries at any given time (I filter out sleeping sessions and I don't generally display the headers giving me more space to see what queries are running. I also have it updating once per second.

In the third (bottom) window, I leave the rest of the lines to interact with MySQL, the shell, and other systems.

You'll probably find these files interesting:

$HOME/.mytop: (note: chmod 600 to protect your password from prying eyes)
----------------------------------------------------------------------------

host=dbdev:3306
user=me
db=sys_trunk_db
pass=my_pass
delay=1
port=3306
socket=
batchmode=0
header=0
color=1
idle=0

----------------------------------------------------------------------------

$HOME/.screenrc
----------------------------------------------------------------------------

# Make the escape key the backtick rather than ctrl-a since I use ctrl-a so much
escape ``
termcapinfo linux|xterm|rxvt|xterm-color ti@:te@:XT
autodetach on
autonuke on
defflow off
defscrollback 3000 # yes, that's lines
startup_message off
chdir
vbell on
vbell_msg "bell on %t (%n)"

caption splitonly "%{= bw}%-Lw%{= wb}%50> %n%f* %t %{-}%+Lw"

hardstatus alwayslastline "%< %= %{= bw} %H %{= rW} %l %{= bw} %m/%d %0c:%s %{-}"

activity "Activity in %t(%n)"

shelltitle "shell"

#
# Toggle 'fullscreen' or not.
#
bind f eval "caption splitonly" "hardstatus ignore"
bind F eval "caption always" "hardstatus alwayslastline"

# escape Control-R will reset my windows back to start-up sizes and locations.
# It's nice for when I disconnect from screen then reconnect because screen
# doesn't split windows by
# default.
bind ^R eval "only" "select 0" "split" "split" "focus top" "resize 10" "focus down" "select 1" "resize 12" "focus bottom" "select 2"

# If a window goes unresponsive, don't block the whole session waiting for it.
nonblock on

bufferfile /home/me/screenPasteBuffer.txt

msgwait 0
screen -t top 0
stuff "nice top \012"
screen -t mytop@dbdev 1
stuff "nice mytop \012"
screen -t webdev001 2
split
split
focus top
select 0
resize 10
focus down
select 1
resize 12
focus bottom
screen -t dbdev001 3
stuff "ssh dbdev001"
screen -t dbreport001 4
stuff "ssh dbdev001"
screen -t webdev001 5
stuff "ssh dbdev001"
screen -t webdev001 6
stuff "ssh dbdev001"
select 2
msgwait 2

pow_detach_msg "Screen session of \$LOGNAME \$:cr:\$:nl:ended."
shell -$SHELL

----------------------------------------------------------------------------

I can't take credit for all these enhancements. On the other hand, I have tweaked them to suit my own needs.

Enjoy!

Friday, January 9, 2009

Making the mysql command-line client work better

0 comments
Baron Schwartz on www.mysqlperformanceblog.com posted a number of very nice tips on how to use the mysql command-line client. Many of these pertain specifically to Unix-based clients, but some are rather useful in Windows as well. :-)

If you're doing MySQL solo, I've often found mysqlperformanceblog.com to be rather useful.

Thursday, January 1, 2009

MySQL Clustering vs. Replication (single & multi-master)

0 comments
There is a really cool method of implementing replication in MySQL (5.0.24+, 5.1.12+) that allows you to have more than one replication master server for a data set without clustering. There are some really nice capabilities this gives you, however, there are some gotcha's with it too:

Pro's:


  • Having dual replication masters allows an administrator to take one of the masters off-line during an upgrade allowing the other master(s) to handle requests during that upgrade. This provides a cheap form of high availability (HA).

  • Hardware failures in a multi-master situation are less of an issue a s long as the software is configured to use more than one of the masters or a load balancer deals with sensing which host to use.

  • Given a large enough pipe, multi-mastering makes it possible to have a physically remote master. You can't do that reliably with MySQL cluster (not replicated) because it requires extremely fast connections between the nodes of the cluster.

  • No storage engine changes are required to implement for MyISAM (as opposed to Cluster requiring changing to NDB - a transactional storage engine).



Con's:


  • Locking/Updating Race: It's possible for a lock to succeed in multiple masters simultaneously then fail on the "other" master(s). For example, a user double-clicks a submit causing the application to start two different threads of the same type. It's possible that neither will know about the other until it's "too late" if each thread gets a different DB server connection.

  • Each DB still needs to have a full copy of the entire data set unlike cluster which can distribute copies of data across multiple servers.

  • Suffers from al the problems of replication and the fragility it introduces.

  • requires modifications to every table using auto_increment primary keys.

  • Tables that don't use an auto-increment field as the primary key or tables that have a UNIQUE key constraint on a column or set of columns may allow inserts to succeed on separate masters then fail in replication.



MySQL multi-master replication can be a cheap alternative to MySQL cluster without some of the issues introduced by changing storage engines and incurring the expense of setting up the number of servers (and associated RAM, disk, networking, etc.) required to make MySQL cluster work efficiently.

To learn more about MySQL Multi-Master Replication, see: