Writing About Things Going Wrong

In this post, I want to talk about one very important part of dealing with an outage or some other type of undesirable event - doing a write-up afterwards. As well as being a good exercise in introspecting behaviours in your team, a well-written report can have a huge impact on how your work, as an ops/sysadmin/SRE/whatever person, is perceived outside of your team. Whether you think it is or not, giving other people in your organisation a view into some of the details of what you do is a part of your job.

Controlled Server Demolition

Today I was tasked with one of those jobs which don’t come up very often - shutting down a service for good. This particular retirement posed an interesting challenge though. As per usual, I wanted to make sure that all traces of customer data were wiped from the systems, and, indeed, any of our own keys/users/passwords and so on. The challenge was, though, that I only had ssh access to the systems.

Latency And Mobile Sites

I wrote an article for MobiForge entitled Less Is More - Why You Should Care About Latency For Your Mobile Site.

An Empirical Study of SNI Support in Different HTTP Clients

There’s a bit of a chicken and egg situation inherent in the way that SSL works. This can make hosting multiple SSL domains/certs on the same host problematic. Before an SSL client can make a request it must handshake with the server to set up a secure connection. Part of the handshake process is the server presenting its SSL cert to the client. Only after the connection is set up does the client send the actual HTTP request.
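To make the ordering concrete, here is a minimal sketch (not taken from the study itself) that uses Python's standard `ssl` module to produce the first handshake message a client would send, without touching the network. With SNI, the requested hostname travels in that first message, before the server has presented any cert. The function name `client_hello_bytes` and the hostname `example.com` are made up for illustration, and the sketch assumes a Python/OpenSSL build that sends the SNI extension in plaintext.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the raw bytes a TLS client would send first for `hostname`.

    Hypothetical helper for illustration only. Nothing is sent over the
    network: the handshake runs against in-memory buffers.
    """
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # data "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()  # data the client wants to send
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: the client has written its ClientHello and is now
        # waiting for a server reply that never arrives.
        pass
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_bytes("example.com")
    # The hostname is visible in the very first client message.
    print(b"example.com" in hello)
```

A server that understands SNI can read that hostname and pick the matching cert. A client that doesn't send it leaves the server guessing, which is the chicken and egg problem described above.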

65 Mustang

About 6 months ago, I bought myself a present in the form of a midnight blue, 1965 Ford Mustang. It was something that I had been thinking about for a long time. I wanted a project car. Initially, I considered starting with something very bare-bones and building it up from scratch. But, in the end I decided it might be wise to take it easier on myself and get something (almost) running.

Automatically Signing Apt Repos with gpg-agent

This is an enormous pain in the ass to get working correctly. To hopefully save you some butt pain, here’s how you actually do this, from end-to-end. So, the story is, you want to build an apt repo of your own stuff. In order to stop apt-get complaining every time you install something from this repo, you need to set it up as a signed repo. To keep things safe, you’ll need to set a passphrase on the key that you use to sign the repo.

Zabbix Agent Over an ssh Tunnel

Today I set up Zabbix monitoring of a bunch of boxes. A couple have public IP addresses - the load-balancers - so they were pretty standard. However, most of them are sitting behind a NAT, so are a little trickier. I played around for a while with Zabbix proxy, whose purpose is to solve this exact problem. In the end though, just to be different/awkward, I opted to set up ssh tunnels and just pass the Zabbix traffic through the load-balancer boxes.

Failover Squid via HAProxy

At the moment I’m using Squid quite a bit as a forward proxy. The application in question pulls content from remote sites and does some processing on it. It’s handy to have a copy of the site ‘nearby’ in case further processing is needed. So, the content is pulled through Squid for later use. Obviously, a single Squid instance is no good. If it goes down, everything grinds to a halt.

root-kit-a-rama

Last week, I got a complaint that one of our webservers, hosted in EC2, was responding very slowly. After some fiddling around, I could eventually get ssh access; the box was just dragging along a bit. So, I check the uptime and the load average is hovering around 6. I check top and there are a few perl processes chewing up the CPU. At first I think this is some backend web app stuff - some DB processing or something.