Tuesday, May 17, 2016

Confession: taking one for the team

Last week, I sent out a confession at work. I'd read an encouragement to do things like that, and…well, it's pretty self-explanatory. Here's a lightly edited version:
   From: collin <email.address@here>
     To: <recipient-list here>
   Date: Last Tuesday
Subject: Confession

Short version: I did something dumb and confess it.
Busy people can stop reading here, though I hope you read this at some point--maybe while waiting for one of your tests to complete. Details follow.
I picked up a free copy of The Soft Edge when they were being handed out at the cafeteria some weeks back. In it was an encouragement to celebrate successes and also to confess mistakes--especially big successes and big mistakes (cf. “Asoh Defense”).

I saw the power of this a while back when a colleague told me about a mistake, looking somewhat sheepish. “We’ve all done that,” I said. Just to make sure, I added, “I’ve done it myself.”

Well, I’m not the most empathic guy but I felt the weight lift off their shoulders. “You’ve done it?” they asked, incredulous. Yep. It was probably about 20 years ago, but I’ve done stupider things before. And since.

Fast forward to the present. There have been lots of failures in <test case name here>. Some of them happen only when nobody is watching, and have defied analysis. But one of them should have been fixed (by me) right away: burt987303.

It happened in February, then again at 8am on May 2. The symptom was a timeout on a “d-volume-create” zapi. “Hey,” I thought, “if the simulator (in this case) can’t come back within 100 seconds, then maybe it died. I could look more, but since it won’t happen again for another 2 months, how much time should I spend on this?”

The answer was: Just a few minutes more--long enough to RTFM and adjust the timeout. You see, it happened again May 4th. You can read <internal document name here>, but the short version is that zsmcli has a “timeout” parameter: I coulda just set that in the command to extend the default 100s timeout.

I’m happy to tell you that I checked in a fix, and that said fix prevented another failure on 5/7 (in <log file name here>, there’s a 118-second d-volume-create execution).

To be clear, the issue was in a test script; the issue wasn’t in the product. I’ve introduced, or incorrectly fixed, product defects before, but this particular issue is in a test case, not in any product sold by my employer.

Is there a moral to the story? Well, the obvious one is to RTFM. Or the help message, as the case may be.

The second is, if you’ve done something like this (it could be more or less dumb; we’re not being precise), don’t feel too bad about it. Performance reviews are over; you could tell somebody. If you’re more senior--or just old--you could tell someone younger; it’ll probably help them feel better. And that in turn might help them think more clearly, too.


Why did I say that telling a younger person about your mistake might help them think more clearly? Because when we reduce the pressure they feel to appear perfect (even if the pressure originated inside their own head), they’ll have less anxiety—less stress. Less anxiety, clearer thinking.

It also makes you appear more human, more real. And we need more of that: more live, human connections (as distinct from mere contractual, transactional connections) in the workplace. Out of the workplace too.
