It happens to the best of us: One morning, you’re interrupted by automated alerts that you’ve lost access to a collection of test servers sitting in the cloud. Very strange. Oh, I bet that’s it: Looks like the VPN link to that group is down. Naturally, you have backdoor access to that server that doesn’t rely on the VPN connection, right? (Don’t lie to me; I know you do. You can verify to me later that the secondary access is documented, secure, and audited…) The box is running an OpenVPN server, so we check the logs: No connections from anywhere? … and there’s why:
Certificates: They are a Brilliant Solution
Seriously, they are.
- Certificates are public, unencrypted statements about an identity. That identity can be a user, like you or me; a device, like a laptop, cellphone, or IoT device; a service, like a web server (think “green padlock”), a mail server, your organization’s LDAP server, or, in this case, a VPN server; or just about anything else you can imagine.
- The identity typically has some statement of lineage, such as “John Doe of the SysAdmins Group of the Seattle Office of E-Corp” or “Email Service of the Cloud Services Group of E-Corp” or “cad0ab4f-23e3-4289-9eb4-89d9bf9bb35f of Marketing of Laptops of New York Office of E-Corp.”
- If the identity represents a network service, the certificate will undoubtedly contain some DNS information: FQDNs, aliases, IP addresses, etc.
- Each certificate can contain additional information about how the certificate can be used or how its use should be limited. A certificate may be designated for identifying a TLS-protected server or client, for signing (but not encrypting) email or vice versa, for identifying an end user, for signing code, and so forth.
- The certificate contains a public key for the identity together with the associated cryptographic parameters for using the key.
- The certificate contains time stamps indicating the range of dates for which the information is valid.
- Finally, the certificate is sealed with information about the entity that issued this certificate as well as its cryptographic signature of the certificate, verifying that the certificate is authentic and has not been tampered with.
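To make those fields concrete, here is a minimal sketch that pulls them out of the dictionary Python’s standard `ssl` module returns from `getpeercert()`. The sample values (host names, the “E-Corp Server ICA” issuer, the dates) are made up for illustration, and note that this dictionary does not expose the public key or the issuer’s signature.

```python
import ssl
from datetime import datetime, timezone


def summarize_cert(cert: dict) -> dict:
    """Pull identity, DNS names, validity, and issuer out of a getpeercert()-style dict."""

    def flatten(name):
        # Names arrive as a tuple of RDNs, each a tuple of (key, value) pairs.
        return {key: value for rdn in name for key, value in rdn}

    def to_utc(stamp):
        # cert_time_to_seconds parses stamps like "Jan  5 09:34:43 2018 GMT".
        return datetime.fromtimestamp(ssl.cert_time_to_seconds(stamp), tz=timezone.utc)

    return {
        "subject": flatten(cert.get("subject", ())),
        "issuer": flatten(cert.get("issuer", ())),
        "dns_names": [v for kind, v in cert.get("subjectAltName", ()) if kind == "DNS"],
        "not_before": to_utc(cert["notBefore"]),
        "not_after": to_utc(cert["notAfter"]),
    }


# Hypothetical sample in the shape ssl.SSLSocket.getpeercert() returns.
SAMPLE = {
    "subject": ((("organizationName", "E-Corp"),), (("commonName", "vpn.example.com"),)),
    "issuer": ((("organizationName", "E-Corp"),), (("commonName", "E-Corp Server ICA"),)),
    "subjectAltName": (("DNS", "vpn.example.com"), ("IP Address", "192.0.2.10")),
    "notBefore": "Jan  1 00:00:00 2024 GMT",
    "notAfter": "Jan  1 00:00:00 2025 GMT",
}

summary = summarize_cert(SAMPLE)
print(summary["subject"]["commonName"], "issued by", summary["issuer"]["commonName"])
print("valid until", summary["not_after"].date())
```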
As long as each certificate in the chain is properly signed by its issuer and the lineage of issuing parties eventually traces back to some authority that you trust, you’re good to go. Typically, you have the top-level certificate authorities and those authorities you encounter frequently on hand, though to make things convenient the client may present you not only with its own certificate but with the entire lineage. Again, if they’re all validly linked and the chain connects to an authority you trust, you trust the new one too. Simple enough.
Typically, each entity in the chain generates and maintains its own public/private key pair, and that private key is never revealed to anyone else in the lineage. It’s enough that an entity prove its identity to the authority and present its public key in a document signed with the matching private key: the Certificate Signing Request (CSR) stage. The authority can then verify the identity and, in checking the signature, verify that this identity does possess the private key matched with the public key presented. If so, done! The authority signs off and generates the certificate according to policy.
Long story short? It’s a fairly durable solution across networks that actually generates very little network traffic, in the sense that there doesn’t have to be a lot of back-and-forth between servers to verify the necessary information. I need only check that the certificate has valid lineage (which may not even require me to go check out-of-band), and that you have the private key that matches the public one in your certificate. No “passwords” or similar are ever shared that could be compromised or “replayed”; instead, I ask you to cryptographically sign something fairly random using your private key. If your response checks out against the public key, you’re in.
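The “sign something fairly random” step can be sketched with textbook RSA. Everything here is a toy under loud assumptions: the key (61 × 53 with exponent 17) is the classic classroom example and far too small to be secure, there is no padding, and real TLS or VPN handshakes use a vetted library rather than anything like this. It only shows why a fresh challenge and a private-key signature replace a shared password.

```python
import hashlib
import secrets

# TOY key material for illustration only (assumption, not a real key).
N, E = 3233, 17   # public key: modulus 61 * 53 and public exponent
D = 2753          # private exponent; never leaves the client


def digest(message: bytes) -> int:
    # Reduce a SHA-256 digest into the toy modulus range.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    # Only the holder of the private exponent can compute this.
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    # Anyone holding the public key from the certificate can check it.
    return pow(signature, E, N) == digest(message)


challenge = secrets.token_bytes(16)   # server: fresh random nonce per login
response = sign(challenge)            # client: sign it with the private key
print(verify(challenge, response))    # server: check against the public key
```

Because the challenge changes on every attempt, a captured response is of no use for a later login, which is the replay protection described above.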
Certificates: They’re a Pain in the Ass
Seriously, they are.
For all the good they do, there’s quite a bit of overhead associated with using them. There’s an initial security issue in ensuring that Certificate Authorities (“CAs”) are not compromised lest they issue false certificates that will be trusted. There’s quite a bit of bookkeeping at the CA level in certificate life cycle management. There’s a distribution issue to ensure that the keys are securely delivered and installed, and that services are properly configured to use…
… and then there’s the time issue. Certificates have built-in expiration dates. The last section counted that as a benefit, since a validity window can be checked without any extra network traffic, but it does lead to two administrative issues:
- If you want to ensure access continues, you have to ensure the certificates are replaced with valid ones before they expire.
- If you want to ensure that access stops before the certificate expires — for instance, if a user is terminated, a service is decommissioned, a certificate is compromised, a CA is compromised, etc. — you’ve got some extra work to do to enable client systems to check.
Those two points are not always easy to balance. If we generate short-lived certificates to favor expiration and minimize cancellations, we have more issuing and tracking to do together with all the overhead. If we favor longer-lived certificates, we have to consider the probability of compromise and enforce cancellation checks. Typically, cancelling certificates implies maintaining a Certificate Revocation List (CRL), distributing the CRL to all services that might encounter a revoked certificate, and configuring the services to check presented certificates against the CRL. That’s non-trivial when multiple services are in play.
And guess what? CRLs have validity dates too. Given a presented certificate and an expired CRL, the default, secure action is to reject the certificate. The novice’s preference is to generate long-lived CRLs to avoid unnecessary updates, replacing the CRL as necessary in between. However, that CRL is an authentic, signed document, valid between the stated dates; if I have a copy of an old CRL that is not expired and I can get the server to consult my copy instead of your updated one, … yeah, the service is potentially vulnerable if otherwise valid CRLs are floating around — and, no, there’s no revoking a CRL.
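The “reject by default” behavior is easy to model. The sketch below is a simplified assumption of how a service might reason, not how any particular server implements it: the `Crl` class and `accept_certificate` function are hypothetical names, and signature and chain checks are left out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Crl:
    """Hypothetical, stripped-down view of a CRL: validity window plus revoked serials."""
    this_update: datetime
    next_update: datetime
    revoked_serials: set = field(default_factory=set)


def accept_certificate(serial: int, crl: Crl, now: datetime) -> tuple:
    """Return (accepted, reason). An out-of-date CRL fails closed."""
    if not (crl.this_update <= now <= crl.next_update):
        return False, "CRL not currently valid; rejecting by default"
    if serial in crl.revoked_serials:
        return False, "certificate revoked"
    return True, "not revoked"


UTC = timezone.utc
crl = Crl(datetime(2024, 1, 1, tzinfo=UTC), datetime(2024, 2, 1, tzinfo=UTC), {0x1001})

print(accept_certificate(0x2002, crl, datetime(2024, 1, 15, tzinfo=UTC)))  # fresh CRL
print(accept_certificate(0x1001, crl, datetime(2024, 1, 15, tzinfo=UTC)))  # revoked serial
print(accept_certificate(0x2002, crl, datetime(2024, 3, 1, tzinfo=UTC)))   # stale CRL
```

Note that the third call rejects a perfectly good certificate purely because the CRL aged out.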
To overcome the CRL distribution overhead and time concerns, the Online Certificate Status Protocol (OCSP) was introduced, allowing a networked client to ask, “Is this certificate valid right now?” Well, that’s pretty cool! Except now we have to introduce a trusted and available OCSP Responder Service on the network and configure the capable services to use it.
Now we have increased network traffic and complexity — and we still haven’t addressed handling the original expiration dates… I mean, if we have to check each certificate anyway, … Suddenly we’re moving closer to protocols such as SAML which move trust from the CA to one or more trusted servers.
So, what about that VPN connection?
We have a special Intermediate Certificate Authority (ICA) to handle servers authorized to access a server VPN using certificates only; there is another ICA handling users, which requires both a certificate to get to the front door with TLS client verification as well as username & password authentication for the user to get in. The user certificates can be longer-lived since the username & password check can be rejected if we deactivate the user centrally; the servers, however, have no such second check, so they get shorter-lived certificates and a CRL. We missed the calendar tickler saying the server VPN CRL was about to expire. Since inbound certificates could not be checked against the expired CRL, they were rejected by default. The remedy involved bringing up a protected device with the server CA information, generating an updated CRL, moving that CRL to the network, pushing the CRL to the OpenVPN server, and restarting the service.
Tedious. Trivial? Well, yes — but definitely tedious.
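A calendar tickler is easy to miss, so one option is to let a scheduled job watch the CRL itself. This is a minimal sketch, assuming the `openssl` command-line tool is on the PATH and that the CRL is a PEM file; the 14-day warning window and the function names are arbitrary choices for illustration. The parsing is split out so it can be checked against a sample line without touching a real CRL.

```python
import ssl
import subprocess
from datetime import datetime, timedelta, timezone


def parse_next_update(openssl_output: str) -> datetime:
    """Parse the 'nextUpdate=Feb  1 00:00:00 2024 GMT' line printed by openssl crl."""
    for line in openssl_output.splitlines():
        if line.startswith("nextUpdate="):
            stamp = line.split("=", 1)[1].strip()
            return datetime.fromtimestamp(ssl.cert_time_to_seconds(stamp), tz=timezone.utc)
    raise ValueError("no nextUpdate line found")


def crl_next_update(crl_path: str) -> datetime:
    # Shells out to the openssl CLI (assumed installed); not run in this sketch.
    result = subprocess.run(
        ["openssl", "crl", "-in", crl_path, "-noout", "-nextupdate"],
        check=True, capture_output=True, text=True,
    )
    return parse_next_update(result.stdout)


def needs_refresh(next_update: datetime, now: datetime, warn_days: int = 14) -> bool:
    # True once the CRL is inside the warning window (or already expired).
    return next_update - now <= timedelta(days=warn_days)


sample = "nextUpdate=Feb  1 00:00:00 2024 GMT\n"
due = parse_next_update(sample)
print(due.date(), needs_refresh(due, datetime(2024, 1, 25, tzinfo=timezone.utc)))
```

In practice, something like `needs_refresh(crl_next_update(path), datetime.now(timezone.utc))` would run from cron or a monitoring check and raise an alert well before the VPN starts refusing connections.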
And what about maintaining your PKI / Certificates?
If you have an operation in place, O&M consists mostly of staying ahead of expiration dates, generating new certificates, and cancelling unnecessary or compromised certificates as required.
Here are some basic questions to kick-start thinking:
- As a first step, forget all the fancy business above for a moment. Instead, ask if you know when your public website’s commercial certificate will expire. After all, it can certainly be embarrassing when your website throws warnings and errors to your clients expecting that green browser lock.
- Do your users or devices have certificates for special accesses — used for verifying internal servers and internal users? Do you know when they come due?
- How about the Certificate Authorities and Intermediate Certificate Authorities themselves — the entities that sign off on your certificates? When are they coming due?
- Are your CRLs up to date, distributed, and used everywhere they should be?
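For the first question, a few lines of standard-library Python are enough to answer “when does it expire?” for any TLS endpoint. This is a sketch under assumptions: the host you pass in is yours to choose, the function names are made up here, and the network call is defined but not exercised in the example, which uses a fixed date instead.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_remaining(not_after: str, now: datetime) -> int:
    """Whole days between now and a 'Jan  1 00:00:00 2025 GMT' style expiry stamp."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port over TLS and return the certificate's notAfter stamp."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Offline example with a fixed expiry stamp and a fixed "today".
today = datetime(2024, 12, 2, tzinfo=timezone.utc)
print(days_remaining("Jan  1 00:00:00 2025 GMT", today), "days left")
```

If the certificate has already expired, the default context refuses the handshake with a verification error, which is an alert in its own right.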
That was just the routine business. Now how about the security program checks?
- How are your own internal CAs / ICAs maintained? Who has access? Who has authority to issue and revoke?
- Does your incident response process trigger processes for handling compromised certificates? How about compromised CAs?
- Does your disaster recovery planning include rebuilding CAs? How about rekeying users, devices, and services?
When was the last time you reviewed your general PKI plans and procedures? When was the last time you verified your implementation matches the plan?
Like I said, there’s quite a bit of overhead. Stay ahead of the daily expirations. Schedule your security sanity checks. Today’s as good a day as any.
… and as always, if you need help with your PKI / Certificate strategies and implementations, feel free to contact us.