Wednesday, June 16, 2021

Sharing SSH session variables across multiple sessions

0 comments
On Unix-like systems, we're used to using ssh-agent to keep track of our private keys, making it easier to log into remote systems. When using tmux or screen, it can be difficult to keep using just one agent across sessions unless you store certain environment values for later retrieval. Here is a script that makes that much easier.

To use this script, save it as $HOME/bin/grabssh and set its permissions to 0755 if you want to share it with other users on the system, or 0700 if you don't.
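For example, to keep it to yourself:

$ chmod 0700 $HOME/bin/grabssh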

#!/bin/sh
#
# Save SSH environment variables for later
# retrieval to keep from having to start
# multiple ssh-agent executables.
#
# Use the following alias to automatically
# reconnect to the "old" session after using
# grabssh. Note that it assumes that your
# private keys all end in '.pem'.
#
# alias fixssh='source $HOME/bin/fixssh_helper 2>/dev/null ; temp="`ssh-add -l >/dev/null 2>/dev/null`" ; if [ $? -ne 0 ] ; then eval "`ssh-agent -s` ; $HOME/bin/grabssh ; ssh-add $HOME/.ssh/*.pem" ; else echo "Reconnected to ssh-agent" ; ssh-add -l ; fi'
#
SSHVARS="SSH_AUTH_SOCK SSH_AGENT_PID DISPLAY"

for x in ${SSHVARS} ; do
    (eval echo $x=\$$x) | sed  's/=/="/
                                s/$/"/
                                s/^/export /'
done 1>$HOME/bin/fixssh_helper

chmod 600 $HOME/bin/fixssh_helper

echo "Saved SSH auth information for later retrieval"

Make sure that $HOME/bin is in your PATH environment variable so no matter where you are, it'll find this script.
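For reference, after grabssh runs, $HOME/bin/fixssh_helper contains one export line per variable, something like this (the socket path, agent PID, and display values here are made-up examples):

export SSH_AUTH_SOCK="/tmp/ssh-XXXXXXXXXX/agent.32976"
export SSH_AGENT_PID="32977"
export DISPLAY="localhost:10.0"

That file is what the fixssh alias sources to reconnect a new shell to the existing agent.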

The script itself suggests (in its comments) an alias to add to your shell's rc file. In my case, that's .zshrc, but you may be using .bashrc, .kshrc, or some other default file in your home directory. Your mileage may vary, but this has been thoroughly tested with zsh and bash.

Example usage (running for the first time):

$ fixssh
command not found: fixssh

This is a good thing. We're not going to mess with any system commands named fixssh. :-)

$ alias fixssh='source $HOME/bin/fixssh_helper 2>/dev/null ; temp="`ssh-add -l >/dev/null 2>/dev/null`" ; if [ $? -ne 0 ] ; then eval "`ssh-agent -s` ; $HOME/bin/grabssh ; ssh-add $HOME/.ssh/*.pem" ; else echo "Reconnected to ssh-agent" ; ssh-add -l ; fi'

This defines the fixssh alias suggested in the script's comments for our current shell. To make it stick, add the same alias line to your shell's rc file.

If you don't have any SSH keys, there are lots of articles out there on how to generate SSH keys. I won't duplicate their efforts here.

Now, let's make sure we have a .pem file for it to use, since the alias only adds keys that end in .pem. I'll assume that you have id_rsa as your primary SSH private key. If there are others, you'll want to follow this same process with each private key file. There is no need to do this with public keys (.pub files).

$ mv $HOME/.ssh/id_rsa $HOME/.ssh/id_rsa.pem

Now we have an SSH key we can use with this system.

$ fixssh
Agent pid 32977
Saved SSH auth information for later retrieval
Identity added: *****.pem (*****.pem)

What happens if you close your ssh session? You can use fixssh again to reconnect to your ssh-agent. What if you open another session in parallel? As above, use fixssh to reconnect to your existing ssh-agent.
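For example, reconnecting from a brand-new shell (or a new tmux/screen window) on the same host looks something like this, assuming the alias is in your rc file by now and with the key listing redacted the same way as above:

$ fixssh
Reconnected to ssh-agent
2048 SHA256:***** *****.pem (RSA)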

This is a great tool for folks that need to log into a jumpbox without having to set up their ssh-agent each time.

Note: I found the grabssh and fixssh methods on the web *many* years ago and while I have long since forgotten where that came from, my goal is not to plagiarize that method. This method of using fixssh has evolved greatly from the original. Hats off to the original poster.

Tuesday, August 2, 2016

Efficient MySQL Date Verification in Javascript?

0 comments
I'm not the best person I know at determining what is efficient in JavaScript (ECMAScript), though I would like to think this could help someone.

/**
 * Make sure that the passed value is valid for the proposed condition. If
 * isRequired is true, dateString must not be blank or null as well as being
 * a valid date string. If isRequired is false, dateString may be blank or null,
 * but when it's not, it must be a valid date string. A valid date string looks
 * like YYYY-MM-DD
 *
 * @param dateString {String}
 * @param isRequired {Boolean}
 * @returns {Boolean}
 */
function isDateValid( dateString, isRequired ) {
    var regex = /^\d\d\d\d-\d\d-\d\d$/ ;
    var retVal = true ;

    if ( ! isRequired ) {
        if ( ( null == dateString ) || ( '' == dateString ) ) {
            return true ;
        }
    }
    else {
        retVal = ( ( null !== dateString ) && ( '' !== dateString ) ) ;
    }
    retVal = ( retVal && ( null !== dateString.match( regex ) ) ) ;
    if ( retVal ) {
        var daysInMonths = [ 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 ] ;
        var yr = parseInt( dateString.substring( 0, 4 ), 10 ) ;
        var mo = parseInt( dateString.substring( 5, 7 ), 10 ) ;
        var da = parseInt( dateString.substring( 8, 10 ), 10 ) ;
        // Leap years are divisible by 4, except century years not divisible by 400.
        if ( ( 0 === ( yr % 4 ) ) && ( ( 0 !== ( yr % 100 ) ) || ( 0 === ( yr % 400 ) ) ) ) {
            daysInMonths[ 1 ]++ ; // Leap day!
        }
        if ( ( yr < 2000 ) || ( yr > 2038 )
           || ( mo < 1 ) || ( mo > 12 )
           || ( da < 1 ) || ( da > daysInMonths[ mo - 1 ] )
            ) {
            retVal = false ;
        }
    }
    return ( retVal ) ;
}
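For what it's worth, here are a few illustrative calls; the expected results assume the corrected leap-year and month-index handling above:

isDateValid( '2016-02-29', true ) ;  // true  - 2016 is a leap year
isDateValid( '2015-02-29', true ) ;  // false - 2015 is not
isDateValid( '', false ) ;           // true  - blank is allowed when not required
isDateValid( '', true ) ;            // false - blank is not allowed when required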
If you know of a more efficient way to handle a MySQL (YYYY-MM-DD) date validation, please reply to this post. :-)

Thursday, March 26, 2015

Estimating MySQL rollback time in InnoDB

0 comments
Estimating rollback time in MySQL can be a bit of an art, but it can be done. MySQL provides data about in-progress rollbacks in InnoDB through SHOW ENGINE INNODB STATUS output. Here's a sample:

---TRANSACTION 2920ACF08, ACTIVE 12568 sec rollback
ROLLING BACK 2426027 lock struct(s), heap size 216037816, 52624206 row lock(s), undo log entries 3226386
MySQL thread id 5669944, OS thread handle 0x2b126bd21940, query id 2028903424 10.2.3.4 user41
# Query_time: 8736.352709  Lock_time: 0.000151 Rows_sent: 0  Rows_examined: 52624206
SET timestamp=1427378149; 
(query being rolled back)


For estimating rollback time, the important bits here are:
ROLLING BACK 2426027 lock struct(s), heap size 216037816, 52624206 row lock(s), undo log entries 3226386
The "undo log entries" value is the number of undo logs remaining to be rolled back. This value decreases over time. To get to the time remaining, we need at least two samples of this ROLLING BACK line along with timestamps of when those were taken. Put those together and you'll get the rollback rate. Here's rollback rate per second:
rbr = (starting log entries - ending log entries) / (end time in seconds - start time in seconds)
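For example, here's one quick way to take those two samples 60 seconds apart from a shell prompt, reusing the same HOSTNAME placeholder and field extraction as the monitoring loop below:

$ t0=`date +%s` ; u0=`mysql -h HOSTNAME -e 'show engine innodb status \G' | grep 'ROLLING BACK' | cut -d, -f4- | cut -d' ' -f5-`
$ sleep 60
$ t1=`date +%s` ; u1=`mysql -h HOSTNAME -e 'show engine innodb status \G' | grep 'ROLLING BACK' | cut -d, -f4- | cut -d' ' -f5-`
$ expr \( $u0 - $u1 \) / \( $t1 - $t0 \)

The result is the rollback rate per second; plug it in as rbr in the loop below.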
My experience has been that as the undo log entry count approaches zero, the rollback rate tends to increase. Having said that, if you have your $HOME/.my.cnf set up with your credentials in it, you can use something like this from a Unix (/bin/sh) shell prompt to see a continuously updated prediction of when the rollback will complete:

$ rbr=161 # Rollback rate per second 
$ while true ; do clear ; x="`mysql -h HOSTNAME -e 'show engine innodb status \G' | grep ROLLING\ BACK`" ; echo "$x" ; echo -n "Minutes to go: " ; expr `echo "$x" | cut -d, -f4- | cut -d' ' -f5-` / $rbr / 60 ; sleep 5 ; done

It's technically possible to dynamically adjust the value of $rbr, but I've found that it tends to lead to more frustration.

Saturday, March 2, 2013

GTD Done Wrong?

0 comments
So I just saw this in the GTD blog and wondered if I agree. I'm pretty sure I don't agree.

Priorities and Goals... The GTD Achilles Heel? David Allen Company Forums by commmmodo on 3/2/2013 1:24 PM
After using GTD since 2007, I have found priorities and goals to be the system's Achilles heel.
I think there's a fair chance I'm doing something wrong, so I want to give the community a chance to correct me and defend GTD.
Time in life is short and finite. I have reached a conclusion that if you want to achieve big career and life goals, you have to cut out all of the unnecessary projects/tasks and focus exclusively on the absolutely best project that will advance you to that goal. There are lots of tasks and projects we could do, but 80% of our energy should be put into the 20% most important projects.
So I ran a little GTD experiment recently. At my weekly review, I started setting top priority projects for a 3-10 day span and timeboxing it. The idea is to find the project that is holding me back from the next level of success in life, and get it completed in a set number of days. So for example: until March 10th I am working on our fundraising documents for people to invest in our company, and after March 10th it's being marked DONE.
During my experiment, I replied to as few emails that don't deal with this project as possible, put off meetings on other projects, and anything that isn't directly achieving the goal I set. I went in my office and closed the door, metaphorically and literally. Because, really, I can do all of the medium-priority tasks I want... and they're not bad things to be working on... but if I really want to advance my career and my company to the next level as quickly as possible, this top-priority project is all I should be focusing on. It's a harsh reality. I guess an analogy would be, as Warren Buffet says, "Putting all of your eggs in 1 basket and watching it carefully." Instead of watering a thousand roses with my finite water bucket of time, I am watering 1 flower with a lot of water until it's bloomed big and strong.
I was a little upset at how well this experiment went, since I have trusted David Allen and GTD to tell me the best thing to do for 6+ years. The results? I got what would have taken 20 days done in about 4. I achieved my goal, and it moved the company and my life forward in a really big way.
GTD's answer to this, as I understand it, is pretty simple: set 50,000ft, 30,000ft, and 20,000ft altitudes (areas of responsibility and major goals) and review them at your weekly review. Then, as you go through your day, pick out next actions based on context, time, energy, and priority.
The problem with this GTD goal and priority system is: you're never picking out 1 30,000ft goal that should be done next, and systematizing it into your daily routine. There's context lists and project lists... but there's no "Do This Project and Nothing Else if You Want to Advance your Life And Career" list. There's no part of GTD that focuses you on that next most important goal. Instead, you're assessing goals and priorities every 5 minutes, and that creates a mental fatigue of sorts. That 3-10 day goal is never written down, making it easy to lose sight of what you really should be doing, even though you may identify this important project during those precious moments of weekly review zen.
Out of practicality, I've started doing a new activity during my weekly review: "What is the next most important project to complete that will advance my life and career more than anything else?" I write it down, open up Omnifocus, and hide all other projects except that one.
Therefore, I've started to see GTD as a sort of hamster on a wheel, a way to spend time on a lot of stuff that doesn't matter and avoid the harsh reality that I should be focused on the one project that actually matters, and saying "f*** everything else."
My question is: why aren't priorities and goals a part of GTD? Is GTD just that? Getting THINGS done. Don't we really want GTMITD? Getting THE MOST IMPORTANT THINGS done? Okay, okay, the acronym isn't as sexy. But life is short, time is finite, and priorities (as defined by your larger goals) need to be systematized. I need something where I can go on autopilot during the work day. That's the whole point of mind like water, is I don't need to be thinking about my task system all day long. I need a better answer than, "Set up your 20,000ft review, and then reanalyze your priorities every time you complete a task." It's not working for me.
I hope this explains the problem clearly. It's a complex situation, therefore I may not have explained everything you need to know to render a reply. Please feel free to ask followup questions and I'll respond to them promptly. Thank you.

Does anyone watching this blog have any comments on why this GTD practitioner should feel he wasn't using GTD while working on the fundraising documents, given the information provided?

I don't consider myself a GTD expert, but I do think "commmmodo" was actually using GTD properly during the "experiment" because he/she elected to prioritize the fundraising document above nearly everything else for a limited time. Maybe I'm missing something.

Thoughts anyone?

Tuesday, November 27, 2012

NoSQL vs. SomeSQL

0 comments
Linux Journal had a fantastic article (SQL vs. NoSQL) some time back. While I know this is a bit of hopping on the bandwagon, I like the point this video is trying to make: http://www.xtranormal.com/watch/6995033/mongo-db-is-web-scale. Caution: The language used in this "video" may not be appropriate for some viewers.

There are lots of folks out there who like to tout performance numbers showing how fast things can be under "ideal" conditions, but as both the article and the video point out, the trick is knowing how to balance performance/scalability, reliability, and availability. /dev/null is extremely scalable and available, but it's completely unreliable. In MySQL, the Blackhole storage engine has similar performance characteristics, but used properly it can be a great way to "pass through" data in a replication ring.

Sunday, February 12, 2012

A basic shared-nothing data sharding system

0 comments
There's a lot of buzz about sharding data. Today, I'll provide a very brief overview of how sharding helps systems I manage run more efficiently and how we're addressing keeping individual shards balanced.

Our goals for sharding data in our environment are: 1) keeping the structure of the data consistent across all the shards, 2) dividing the data up so it can be found easily, 3) automatically and continuously re-balancing the shards, and 4) allowing for changes in scale (like adding a new shard or supporting different shard sizes).

Item 1 is a snap - all we do there is to deploy the same data structures in each of the shards with all the supporting data required to answer questions related to a user. Some of this data is user-specific, some is globally replicated. In any case, this goal makes it easy to use one set of code to access data in any of the shards without having to cross to another shard or database to get the answer for a question. This reduces workload in the application and on other database servers.

Item 2 is done by hashing our key data. Let's say that we have a set of widgets that users are concerned with. Some users have a few widgets, some have a lot, but each user is very different from another. Widgets are pretty common and well defined. Each user has a user ID and any question we ask the system always involves a specific user ID. So - our key data we hash against in this case would be the user ID. Data about the widgets is replicated to all the shards, but data about each user is only kept on the shard where that user's data lives.
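Just to make the hashing idea concrete, here's a toy sketch; the shard count, the use of md5, and the output format are all made up for illustration, and the real system hides this placement decision behind the API:

#!/bin/sh
# Toy example only: map a user ID to one of N shards by hashing it.
# The API and balancer described here layer lookups, locking, and user
# moves on top of whatever placement scheme is actually used.
user_id="$1"
num_shards=4
h=`printf '%s' "$user_id" | md5sum | cut -c1-8`   # first 8 hex digits of the hash
echo "user ${user_id} maps to shard$(( 0x$h % num_shards ))"

A plain modulo like this makes moving individual users awkward, which is exactly the kind of detail the API and balancer are there to absorb.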

Item 3 is handled by a separate process that utilizes the same API the application uses. Balancing the data between shards is simple - the balancer asks the API if there are any users that need to move. If yes, the balancer tells the API to lock that user temporarily, moves the data, then unlocks the user for use on the new shard. What this means for applications is that each time a location is returned for a specific user, that location is only guaranteed for a given window of time (30 seconds, for example). So - when the balancer tells the API it's moving a user's records, any requests for that user's records are held up until the user's data is moved. The API is smart enough to only let the balancer move data that has not been accessed recently. This doesn't prevent all lock collisions, but it handles most of them.

Item 4 is handled through the configuration of the API. Because we use an API to tell the application where the data is for a given user, we've abstracted away where data actually lives. This makes it easy to add and remove servers from the sharding pool. We've extended this to include allowing a shard to be marked as in a draining state. When a shard is draining, the API will ask the balancer to move rows from the draining shard and redistribute that information onto other members of the sharding pool. This makes it possible to take a shard out of rotation for routine maintenance without the loss of data.

Notice that I didn't mention any specific software here. I didn't tell you what language the application is written in, what language the API is written in, or what the actual data store was. The technique of sharding data is pretty simple and can be done with nearly any persistence layer using any programming language.

The beauty of this system is that once the API is written, the balancer can be a complete "black box" to the application. This type of system could be implemented with a single data store when just starting out and expanded to multiple stores as the need grows. Also - if the sharding key needs to change, again, the application doesn't need to change - just the API and the balancer.

One other big benefit to sharding data like this - it's often a lot cheaper to buy several smaller systems than to buy and maintain one very large system. If one of the systems in the sharding pool goes off-line, the worst possible exposure in a shared-nothing sharding system is the data stored on the member that went down. In a monolithic system, you stand to lose a lot more.

While I wouldn't suggest trying to do this type of work on top of every data set out there, I do see that there is a lot of benefit when the types of questions being asked of a data set can be divided up easily while still making it relatively easy to answer the "question at hand" from a single source. The secret in the sauce is making sure that any common data is shared among all the systems in the pool.

Sunday, January 15, 2012

Managing incoming emails

0 comments
Reading emails all day long tends to be very counter-productive for me. I usually end up responding faster than anyone else, which generally gets me a lot more work than I need. At the same time, I have a responsibility during my times as primary and secondary on-call to respond within our service level agreement. So - how do I find balance? My team and I use mailing lists to help us separate truly urgent issues from those that can be handled as time allows. We have three lists:

group_primary@foo.com
group_secondary@foo.com
group_admin@foo.com

We've published these three lists to our operations center. Everyone else just gets the admin list. We don't tell others about the primary and secondary lists because anything we'd get on primary or secondary would need to come via the operations center anyway. We also don't want our over 600 co-workers (not on our team and not in the NOC) to email us willy-nilly using our on-call emails.

Next, we've set up each of our team's smart phones to recognize emails going specifically to the primary and secondary addresses, so our phones will either go off like a pager or (in my case) read out the sender and destination address (think "Inbound Primary email from the NOC"). That keeps me from having to look at my phone every time a new message comes in but lets me know when there's something that requires my attention.

The other thing we do is make it easy to change the destination of the primary address so that only the current primary gets notified. Secondary is notified the same way, but on my two-man team there are only two of us, so secondary always goes to the whole team (for now).

Finally, to help us keep our sanity, I do what I can to check the "other" email only twice a day.

The net result of this process is that I am able to focus on getting project work done between routine email readings, and it lets others figure things out for themselves or wait a bit for an answer. If something is truly urgent, the sender can simply ask the NOC to reach out to the on-call person to get a faster response.

How do you deal with your on-call processes and email?