On Reputation

A brief Friday follow-up to the Barbarians at the Gate note from a few days ago. Recall that we had a WordPress website under continuous, distributed password-guessing attack. In the previous post, I listed a long set of US-based addresses and their owners (via whois) that appeared in the logs as participants. DigitalOcean was very well-represented in that list.

Incident Response

Random scanners and attackers were, of course, part of the threat model. The asset itself is considered low-risk: it contains no useful data and is not particularly useful as a beachhead. Since the website is

  • largely a latest-release, routinely patched placeholder with only one account and four posts,
  • sitting behind an nginx reverse-proxy with some rules at the front edge,
  • running on apache in the back-end with some different protections in place,
  • running on a single-use server on a restricted network segment,
  • running tripwire to check for server configuration changes,

I wasn't too concerned… Still, an attack is an attack, so we shouldn't make assumptions. If anything, it's an opportunity to challenge assumptions, re-evaluate strategies, update tactics, and so forth. In this case, although I had used a particular WordPress plugin to offer an OpenID-Connect login flow and had hidden the ordinary username/password login form from the user, that functionality was still reachable via a POST to the wp-login endpoint. That was an opportunity to revisit both the apache configuration in the back-end and the nginx configuration at the front-end. The attackers were also after the xmlrpc endpoint; that was already being handled by apache, but it was a chance to consider whether it would be best to kill that at the front edge as well.
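
For illustration, if both endpoints were to be cut off at the front edge, the nginx side might look something like the sketch below. The upstream name wp_backend is a placeholder, and the exact rules would depend on how the OpenID-Connect flow redirects users.

    # Hypothetical front-edge rules on the nginx reverse-proxy.
    location = /xmlrpc.php {
        # Nothing legitimate needs XML-RPC here; refuse it at the edge.
        return 403;
    }

    location = /wp-login.php {
        # Allow GET so the login page can still redirect into the OIDC flow,
        # but refuse the direct username/password POSTs the attackers rely on.
        limit_except GET {
            deny all;
        }
        proxy_pass http://wp_backend;
    }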

The password guessing aspect was expected — but admittedly not to this extent. It was an opportunity to reevaluate an earlier decision not to implement a package like fail2ban to reactively block password guessers by source IP address — after all, login was going to be handled by a different service. Since a default fail2ban installation tends to take over and restructure your existing iptables firewall rules, I decided that for now it was sufficient to add a new iptables blacklist chain, checked from the INPUT and FORWARD chains, and once or twice a day to filter the logs and add the obvious offenders.
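
As a rough sketch of that approach (the chain name, log path, and threshold below are illustrative, not a definitive recipe):

    # Create a dedicated blacklist chain and hook it into INPUT and FORWARD.
    iptables -N BLACKLIST
    iptables -I INPUT   -j BLACKLIST
    iptables -I FORWARD -j BLACKLIST

    # Once or twice a day: pull the obvious offenders out of the web logs
    # and drop them, skipping addresses that are already listed.
    grep 'wp-login.php' /var/log/nginx/access.log \
      | awk '{print $1}' | sort | uniq -c | sort -rn \
      | awk '$1 > 50 {print $2}' \
      | while read ip; do
          iptables -C BLACKLIST -s "$ip" -j DROP 2>/dev/null \
            || iptables -A BLACKLIST -s "$ip" -j DROP
        done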

Something got under my skin about DigitalOcean though… addresses registered to their service kept appearing. Yes, there were a few from Amazon AWS, a few from Google, a few from Vultr, … — the different cloud providers were all represented at some level — but DigitalOcean? That was crazy.

Risk Mitigation

This led to an obvious question: How often do we really expect legitimate website visitors to come via a DigitalOcean address? When considering Availability — one of the core tenets of Information Assurance — we have to do our part to understand who our clients are and make sure that they have paths to our service. In this particular case, our clients may be people operating in hostile environments. Those people may be operating through Tor nodes, VPNs, and other proxies. Cloud providers are ideal places for those endpoints to appear, and multiple users — some legitimate and some not — may be using those types of endpoints.

Additionally — and interestingly — some otherwise legitimate services may be piping their traffic through similar endpoints unbeknownst to their users. Case in point: I've noticed that when I'm operating on the wifi at a particular Starbucks coffee shop a mile or so from home, my public IP address appears as an Amazon AWS cloud IP address. That means the wifi traffic is being tunneled past the local ISPs to the cloud, where Starbucks or whoever else is handling it. Strange, yes? Anyway, blocking Amazon cloud addresses would block that Starbucks' wifi users, and who knows who else?

Ultimately, we don’t want to block them.

But we did.

For now.

At least all of the DigitalOcean addresses. We tracked down every DigitalOcean CIDR block we could find and threw them all into the blacklist. As new addresses attributed to them appear, we find the containing CIDR block and throw it onto the blacklist too.
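
Mechanically it's nothing fancy: the provider's published ranges just get appended to the same illustrative BLACKLIST chain from the earlier sketch. The CIDR blocks below are RFC 5737 documentation placeholders, not DigitalOcean's actual ranges.

    # Append whole provider ranges to the existing blacklist chain.
    for net in 192.0.2.0/24 198.51.100.0/24; do
        iptables -A BLACKLIST -s "$net" -j DROP
    done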

Suddenly, the logs look “normal” again.

We took a similar approach for known foreign IP addresses. As soon as I flipped the switch to make the site available, IP addresses from Russia, Ukraine, France, the Seychelles, China, and others were all probing and attacking the little "Hello, World!" presence. For now — in the initial development phases — excluding these sources of potential future customers seems like a fair risk mitigation.
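
One way to keep that kind of country-level blocking manageable, assuming ipset is available and you have an aggregated list of ranges you trust, is to load the ranges into a set and match the whole set with a single rule. The file path below is a placeholder.

    # Create the set and a single matching rule.
    ipset create geo_block hash:net
    iptables -I INPUT -m set --match-set geo_block src -j DROP

    # Load ranges from a file containing one CIDR per line.
    while read net; do
        ipset add geo_block "$net"
    done < /etc/blacklists/blocked-countries.txt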

… which brings us back to Reputation

I may have first heard about DigitalOcean droplets as an alternative to Amazon EC2 instances from one of the various YouTube tech vlogs, with a fellow — maybe an affiliate? — suggesting hosting a cloud-based wifi controller there. For a service only you'd be using, it's a fine solution — just don't block yourself and you're good to go. As soon as you want to provide a public service, though, there can be issues. I spotted plenty of those issues while googling around for those DigitalOcean CIDR blocks — the most prominent being that DigitalOcean customers standing up email servers were having real trouble getting their systems to work. Root cause? Groups like Spamhaus had already blacklisted the entire organization — every last IP address they owned. Any SMTP server consulting Spamhaus — which is likely most of them — will deny your message at the gate. Why? According to my reading, (1) DigitalOcean-listed addresses were a tremendous source of spam email, and (2) DigitalOcean was not responsive to complaints.

Reputation: shot.

We normally think of our own reputation, our family’s reputation, our school’s reputation, our company’s reputation, etc. It’s easy to overlook or forget that things like phone numbers and IP addresses have reputations too. If by luck of the draw you find yourself or your business operating with an IP address that was used in a spam campaign or a phone number that was used by a telemarketer, you’re screwed.

Similarly, have you considered what happens if someone is up to no good on your guest network, and your guest network uses the same public IP address as your business network? Or maybe other teams or organizations are on your network as well? You may find yourself answering the abuse complaints, or being quietly blacklisted, or even under attack.

There are, of course, mitigations to several of these things — but first you have to think them through in a fairly disciplined way. If you’ve not considered those types of issues before, well, you may be missing a few other things as well. It might be time to visit our contact form and introduce yourself 🙂

Power Outage! #ohshit!

They never happen here! Well, almost never… and for around three hours last night one certainly did.

Here at the home office, the power lines are buried. There's a stretch of high-tension power lines cutting through a mile or so away, but otherwise there are no telephone poles or power lines to be found. What that means for us is that when storms sweep through and the branches are falling everywhere, we can tweet about it in real time!

… well, that and all systems stay green on the dashboard.

But last night, not only was the power down, but there were also brief efforts to bring it back up. Each one of those flickers of life was enough to start the spin-up of various devices, only to have them come crashing down again. That can’t be good.

Risk Planning, Network Style (Simplified)

We have two fundamental concerns at the home-office, namely the "home" and the "office" — go figure 🙂 The home can begrudgingly switch to books and board games by candlelight like the cavemen did once their battery-powered video game consoles run out of juice, and — assuming the local cellular network stays up — they can commiserate with friends on social media so long as their cellphone batteries hold up. In general, that's a low-probability event in this area, and it carries very low impact.

More for business purposes, I keep a portable, cellular hot-spot device charged and ready with a few extra gigabytes on the plan. It has a significantly longer battery life than the cellphones, even more so when tethering a laptop. Also, we keep at least one deep-cycle, 12V lead-acid battery topped off with a solar panel. That plus a small power inverter and a power strip can have phones and laptops recharged as necessary. Less for business and more for hobby, I also keep ham radio gear and spare batteries charged, so even if the cellular network is down we can check in with others and assess whether there's actually a bigger problem.

So, what about the business-specific aspects? Here there are several cost-benefit trade-offs, the "risk mitigations" and the "risk acceptances." As a small consultancy where most clients have our cellphone numbers, it doesn't really matter if VoIP telephony is off-line. It also doesn't really matter if the website is not available. Neither is mission critical, so the costs associated with moving those services to the cloud are not worth it. Similarly, in-house we maintain a lab environment for testing and experimenting; nothing critical to operations lives there, so it's okay if the lab is offline. Overall, there is no real need for a generator, redundant network paths, fail-over equipment, and so forth. As a mitigation, we keep a higher rate of data backups (including all virtual machines) with offline and offsite storage. Further, we generally insist that client data remains on client networks, so there is nothing here that they should need from us. Finally, in case of disaster from corrupted data or fried boxes, we are fairly skilled at quickly rebuilding key functions with commodity equipment and those stashed images.
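
The virtual machine side of those backups can stay very simple. As a rough sketch, assuming libvirt/qcow2 guests and a reachable offsite host (the path and host name are placeholders), with the offline copies handled separately; for a consistent image, a guest should be shut down or snapshotted before the sync:

    # Nightly image sync to the offsite box; dated directories keep a few
    # generations around for the "fried boxes" scenario.
    rsync -a /var/lib/libvirt/images/ backup-host:/srv/vm-backups/$(date +%F)/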

On the flip side, there are three basic services that we have pushed to the cloud because we need them to survive a home-office outage:

  1. VPN-related services. It makes sense to keep a VPN server in the cloud with all of the routine availability guarantees that come with it. When the home-office network is up, it connects to the VPN server along with other permitted systems; when it is not, designated users and devices can reach the server directly. Separating the system and user VPNs, with enhanced access controls, makes for a nice partition. Assuming clients are still online and we can hit the network from somewhere if not here, we're good to go — core business functions continue as expected.
  2. Email. In many cases — particularly with data privacy experiments — we prefer to host our own email services. Naturally, we want email to survive as a critical business function. In case of catastrophe there, though, we are prepared to throw the switch to one of the major email services. Check!
  3. Realm-related services. This is more of a "break glass in case of emergency" situation. We keep a FreeIPA server turned off in the cloud. The primary servers run in the office, with this one designated as a (generally unavailable) secondary across the system. The FreeIPA server is tied to VPN, DNS, and other access functions. While it's not essential to operations, it is more than convenient to have it when needed. The service is routinely brought on-line, where it joins the VPN and syncs with the primary servers before having system updates applied and being put back to sleep; a rough outline of that routine follows this list. Check!
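
Here is that wake, sync, patch, sleep routine in outline, assuming a RHEL-family FreeIPA install; the power on/off steps depend on the cloud provider and are left as comments:

    # 1. Power on the cloud instance (provider console or API, not shown).

    # 2. Bring up the IPA services once the VPN link is established.
    systemctl start ipa      # starts the directory server, KDC, DNS, etc.
    ipactl status            # confirm all components came up

    # 3. Replication catches up with the office primaries on its own once
    #    the replica is reachable; then apply system updates.
    dnf -y update

    # 4. Put it back to sleep until the next cycle (or the next emergency).
    ipactl stop
    # ...and power the instance off again via the provider.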

In particular, the combination of the VPN services and the email services provides a simple alert mechanism if the office is not communicating with the cloud. (The home-office is in some respects treated like any client site, with availability reports from inside the site as well as from outside.)
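
The cloud-side half of that check can be as simple as a cron job on the VPN server: if the office end of the tunnel stops answering, send a note through the (still-up) mail service. The addresses below are placeholders.

    OFFICE_VPN_IP="10.8.0.2"   # office end of the tunnel (placeholder)

    if ! ping -c 3 -W 2 "$OFFICE_VPN_IP" > /dev/null 2>&1; then
        echo "Office tunnel unreachable at $(date)" \
          | mail -s "ALERT: home-office offline?" ops@example.com
    fi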

In the end, we're confident that the security program foundations were all considered. "Who needs access to what to keep the operation running?" was answered. We considered the threat — in this case, a power outage — and the potential impacts. We considered alternative plans, including the cost-benefits of each, for different levels of availability and protection to mitigate the risk. We have systems in place to detect the occurrence, and we have a plan to respond that takes us right through recovery.

So, How Did We Do? (After-Action Reporting & Lessons Learned)

Interestingly enough, the first problem encountered was just potential embarrassment: the power outage occurred near the end of day Friday, not so very long after inviting a client to review something on the website. (Because when else would an event like this occur, right?) While the website content is hosted inside the lab, there is actually an nginx reverse-proxy in the cloud that forwards the requests back. Anyone visiting the page then would have seen a basic nginx 50x error. It's now on the to-do list to throw some basic content onto the nginx server that can be served quickly when the back-end is unavailable.
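
A likely fix is a handful of lines on the cloud proxy, sketched below with wp_backend and the paths as placeholders, so that visitors get a simple static page instead of the raw 50x error:

    location / {
        proxy_pass http://wp_backend;
        proxy_intercept_errors on;
        error_page 502 503 504 /offline.html;
    }

    location = /offline.html {
        # A small static "back shortly" page stored on the proxy itself.
        root /var/www/fallback;
        internal;
    }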

Some virtual machines associated with more recent experiments failed to come up when the host servers restarted. That was a simple matter: the development services were not critical, so they were not toggled to fire up on host restart. Unfortunately, one of the services added recently was a keycloak authentication service that had been hooked into a monitoring system — that is to say, the former LDAP-based authentication was transitioned to OpenID-Connect-based authentication — and we didn't set the auto-power-up flag for it. That's been corrected. However, the service is still considered non-essential, so for the time being it will remain locally hosted.
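
For what it's worth, on a libvirt/KVM host (an assumption; the post doesn't name the platform, and the guest name below is a placeholder) the flag is a one-liner:

    virsh autostart keycloak-auth                      # start the guest with the host
    virsh dominfo keycloak-auth | grep -i autostart    # verify the flag took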

DNS had some trouble on restart, which naturally cascaded into other issues as services came back online. The "break glass" FreeIPA in the cloud was meant to handle that concern, but there were some hiccups. In particular, the VPN server was configured to push the other DNS servers to machines that joined, but we had left out the emergency server. That was corrected and tested, and as an extra layer, each of the cloud servers was updated with additional /etc/hosts entries pointing to essential services.
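
Assuming an OpenVPN-style server (the post doesn't specify), the DNS push and the hosts-file belt-and-braces look roughly like this, with all addresses and names as placeholders:

    # In the VPN server configuration: push the emergency resolver too.
    #   push "dhcp-option DNS 10.8.0.10"    # primary (office) resolver
    #   push "dhcp-option DNS 10.8.0.53"    # break-glass cloud FreeIPA resolver

    # On each cloud server: static entries for the essential services.
    echo '10.8.0.10  ipa1.office.internal'            >> /etc/hosts
    echo '10.8.0.53  ipa-emergency.cloud.internal'    >> /etc/hosts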

Interestingly, one of the virtualization servers in the home-office needed to find an active display attached before it would complete its boot sequence. That display is currently one HDMI connection to a monitor that happened to be switched to a different source. The fix will probably be one of: attach a proper KVM; attach a dummy-monitor (headless display) dongle; or find the BIOS setting to allow a headless start. TBD.

While our typical (low probability) power event involves a single outage, pause, power-up cycle, this one was particularly disconcerting as we went through a few up/down cycles, some in quick succession. While much of the infrastructure gear is solid state and doesn't particularly care, I had strong concerns about the integrity of the virtualization hosts. We'll have to consider adding a UPS for those boxes in particular to allow more graceful cycling — or at least to smooth out any rapid cycles. TBD.

All in all, it was a little harrowing, but the systems all came through fine and — perhaps as importantly — according to plan.

So, How Would Your Office Fare?

The above is a real-life example affecting a moderately complex network. Even for a small concern like this, it was essential that we had a full understanding of the network and the services running on it. We knew the priorities for verifying and troubleshooting different elements, and we knew what to put on hold. We had a method to break the glass where necessary. We worked through the steps to confirm all systems green, and we had the discipline to catalogue the event and consider changes to our standard operating procedures where it might be worthwhile.

So, what if this had been a hacker event instead? What if we were handling sensitive client data, health data, financial transaction data, …? How about local business data, from finances to inventories, to emails and records? What if this router fails or that server crashes? Do you have a plan for continuity of operations? An incident response plan? Communications plans, both internal for key personnel and external for clients, vendors, contractors, partners, …? Have you tested the plans?

The thought process is the same, but the complexity may grow quickly. With continual improvement it is feasible, and it doesn't have to be a monolithic effort. Consider the risks, rank them, and get to work!

As usual, for that outside assessment of your current state and help in getting you to where you need to be, drop us a note via our contact page.