Thursday, June 27, 2019
Those deployment blues
My department at The Corporation had a deployment this morning (2:00 am). These deployments don't happen that often (the last one happened in January of this year; last year we had a total of four deployments) but usually there are no problems afterwards.
This time we weren't so lucky.
It wasn't a problem with our code, but with a vendor our customer, The Monopolistic Phone Company, uses. The vendor in question wasn't forwarding some critical information we were sending back to The Monopolistic Phone Company. We didn't notice this initially since our testing just happened to use the other vendor The Monopolistic Phone Company uses. So while it technically wasn't our problem, getting that particular vendor to even look at a problem, much less solve it, is a multi-month and multi-money problem, so practically, it is our problem.
The base problem is that one vendor who shall remain nameless is supposed to forward all SIP headers that start with a common prefix, but they have a limit on the number of non-standard SIP headers they'll forward and we've exceeded said limit. Apparently, a new feature we added, plus moving some existing data to its own header, bumped the number of headers past this limit. The fix was easy (just put the existing data we moved back in the old header while keeping it in the new header) but there was a bit of concern about installing it into production.
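For the curious, the kind of check that would have flagged this before deployment is small. What follows is only a sketch in Python, not our actual code: the prefix `X-Example-`, the limit of 3, and the header names are all made up for illustration, since I'm not naming the real prefix and the vendor's real limit isn't something I'll state here.

```python
# Sketch only: PREFIX, LIMIT and the header names below are hypothetical.
PREFIX = "X-Example-"
LIMIT = 3


def count_prefixed_headers(headers, prefix):
    """Count headers whose name starts with prefix.

    SIP header names are case-insensitive, so compare in lower case.
    headers is a list of (name, value) pairs.
    """
    wanted = prefix.lower()
    return sum(1 for name, _ in headers if name.lower().startswith(wanted))


def exceeds_forwarding_limit(headers, prefix, limit):
    """True if there are more prefixed headers than the vendor will forward."""
    return count_prefixed_headers(headers, prefix) > limit


# A message with three prefixed headers fits under the made-up limit ...
old_message = [
    ("Via", "SIP/2.0/UDP host.example.com"),
    ("X-Example-Alpha", "1"),
    ("X-Example-Beta", "2"),
    ("X-Example-Gamma", "3"),
]

# ... but adding one more prefixed header pushes it over.
new_message = old_message + [("x-example-delta", "4")]
```

Running `exceeds_forwarding_limit()` over the messages a test suite generates, with the limit set to the lowest value any vendor in the path enforces, would catch this class of problem regardless of which vendor the test traffic happens to go through.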
You see, because our customer is The Monopolistic Phone Company, and they have regulatory issues with respect to reliability to contend with, there's a whole process involved with deployment. Just for starters, we have to give them a 10-business day notice of any changes, which they can veto …
Oh, and have I mentioned the very scary SLAs we have with them? Where vast amounts of money start flowing to The Monopolistic Phone Company for violations of said SLAs? So you can see why it takes a significant amount of time to get deployed, and why we have so few.
Fortunately, we're given a number of emergency deployments we can use and thus, we used one of them today.
All told, it took a total of three hours to go from initial bug fix to re-deployment. That is the fastest deployment I've seen of our department's code.