PHP Package Signing: My Current Thoughts
We figured out how to write good code. We figured out how to write good code in a reusable way…for the most part. We figured out how to distribute and mix all that good reusable code in a sensible fashion. Can we now figure out how to do it all securely?
Package signing is a simple enough idea, and I’ve been spending time trying to fit it, Composer and Packagist together as a model in my head. The concept is to have parties create a cryptographic signature of the code to be distributed at a specific point in time using a private key. If anyone were to change that code or its metadata (think composer.json) with malevolent intent, the end user would then notice that the cryptographic signature cannot be verified using the package author’s known public key. It’s a familiar topic from all those cryptography books you’ve doubtlessly read ;).
Alright, it’s actually a horrendously complicated topic that boggles the minds of many a programmer. We’re a practical bunch, and we just want the damn code. NOW!
Practical considerations and security are locked in a continuous battle for primacy. Look at SSL/TLS – it is a security nightmare but we keep it around because, until someone comes up with a decent replacement, the alternative is no encrypted HTTPS with a verifiable host for anyone. We continue to support old versions of SSL/TLS out of practical concerns despite knowing their weaknesses. They are old versions for a reason!
Those same concerns have been at war in my own head since last week, when I made the mistake of contemplating package signing. Eventually, my practical side won out and my security persona has been sulking in a corner ever since refusing to talk to me.
The problem with package signing from my perspective is tied up in a phrase most of you would know: The needs of the many outweigh the needs of the few. Thank you, Spock.
PKI vs GPG (Some Context!)
I won’t go into too much detail here…
Right off the bat, we have two contenders for signing packages: Public Key Infrastructure (PKI) backed by a Certificate Authority (CA), and Pretty Good Privacy (PGP), also commonly referred to by its GNU implementation, GNU Privacy Guard (GPG). You’d be most familiar with PKI in the form of the SSL certificates used online. Both have the notion of private keys and public keys. Data encrypted by one key can only be decrypted by the other key. If you keep one private, then holders of the public key can always verify that data sent by you was really sent by you. If you lose control of the private key, you’ll need to revoke it and get a new one.
Assuming they trust it’s you to start with!
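To get a feel for the mechanics, here’s a minimal sketch using the openssl command line (the file names are purely illustrative): generate a keypair, sign some data with the private key, and verify it with the public key.

```shell
# Generate an RSA keypair; private.pem stays secret, public.pem is shared.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out private.pem
openssl pkey -in private.pem -pubout -out public.pem

# Sign some data with the private key.
echo 'some package data' > package.txt
openssl dgst -sha256 -sign private.pem -out package.sig package.txt

# Anyone holding public.pem can now verify the signature; prints "Verified OK".
openssl dgst -sha256 -verify public.pem -signature package.sig package.txt
```

If even one byte of package.txt changes after signing, the verification step fails, which is the entire point.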
Trust is the core difference between PKI and GPG. How do you know, with certainty, that any given public key is firmly associated with the person you know it should be associated with? Maybe it’s a hacker posing as that person? Maybe it’s the local friendly NSA office masquerading as Google? Establishing trust takes diverging paths for PKI and GPG. PKI keys (in the form of certificates) are either self-signed or signed by a trusted Certificate Authority. Generally, we put zero faith in self-signed certificates because anyone can claim to be anyone else using them. We instead trust a select number of CAs to sign certs because they’ll hopefully do stuff like asking for passports, addresses, and other person- or company-specific information to verify an entity’s real identity before doing so. GPG avoids centralised authorities like the plague and instead uses a “web of trust” model where everyone can sign everyone else’s public key, i.e. the more of these endorsements a GPG public key gets, the more likely it is that the key represents the expected identity (based on the number of endorsers you already trust). Webs of trust require time, care, and effort, but they have been extremely successful and certainly do work.
Excellent, your brain is still not smeared over the monitor. That’s a good sign. Now, what the heck has this got to do with PHP and package signing?
People Are Lazy And Cheap
These are the two things you can count on with rational people, though perhaps phrased in more charitable terms. People don’t want to do any more work than they need to, and they generally don’t want to spend any more money than they need to. Add one more trait to those – they generally don’t think about security when downloading code – and you have something of a problem.
PKI dies an immediate death in this worldview, because obtaining a CA-issued code signing certificate costs money. That guy who wants to put 100 lines of code onto GitHub? That’ll be $100, please. Package signing using a CA will never work for PHP because it imposes a monetary cost on open source contributions. Many of us might not blink at spending $100 to indulge our willingness to write free code, but we’d probably all prefer not to.
GPG is utterly free. Surely it’s a winner then? That guy who wants to put 100 lines of code onto GitHub? He now has to generate a new GPG keypair with a resulting public key that nobody has signed and which nobody will trust. He’s way past caring now. And so are the potentially tens of thousands of people who can’t verify his code. They want the damn code. NOW! Not months down the line when he’s begged, pleaded and bled his willpower dry trying to get enough signatures from others in the community for a widely trusted public key. I’m exaggerating a wee bit, but the barriers to entry for GPG were designed to prevent weakness. There are shortcuts, but shortcuts undermine the purpose of a web of trust. Meeting in person is one of the most often quoted ways of getting your GPG public key signed properly. You can find me somewhere in the Wicklow mountains of Ireland. I’ll wait ;).
From a security perspective, both of these options can and do work. Practically? They sort of stink if you want global adoption. One costs money, and the other requires time and effort for both package authors and users. In the real world, these problems are not uncommon. Throw GPG signatures at everything, and a tiny minority will actually bother checking them. Require CA code signing certs, and you will be ignored by programmers. These options, particularly given the distributed nature of PHP packages (i.e. github repos), will only ever serve a minority of users. Would you oblige us again, Spock?
The needs of the many outweigh the needs of the few.
Thank you, gravelly-voiced Spock. Sorry about Abrams, we didn’t think he’d actually blow up Vulcan, replace phasers with blasters, and substitute you with an emotional basket case because he thought he was making a Star Wars movie.
I just killed PKI and GPG as the basis for any proposal I’d make for package signing in PHP. Making <1% of the community extremely safe while leaving >99% of it extremely vulnerable leaves me with a bad taste in my mouth because it conflicts with my desire to spread security as much as possible to protect as many people as possible. I could blame the ignorance of users for the need to give up some notional perfect security, but they just want their code and that is the reality we live in. One might as well stand on a beach battling the tides if they are going to deny that Humans are, well, only Human. So, it’s time to come up with something else that can be easily implemented in PHP.
The Double Signature Alternative
The double signature approach to package signing relies on having a minimum of two parties verify the integrity of a package independently of each other. Luckily, we already have two parties: the package author and the Packagist repository. The basis of its effectiveness is probably obvious. Each independent signature requires a separate private key which is ultimately held offline. In order for an attacker to compromise a package, they would also need to compromise both private keys. This assumption rests on the notion that each party’s public keys are known by the user in advance and maintained in some permanent keystore, e.g. included in composer.phar or accepted upon first download of a public package and stored in a simple file that you can copy between systems (in effect quite like GPG’s keyring).
Let’s say that I tag a release for LibraryX 1.0.0 tomorrow. As the package author, I would sign it using a private key that I generated at home. I’d then advertise the public key widely for users to download and cache (our permanent keyring). Packagist, when building the metadata for the new tagged release, would also generate its own signature for the package. Our download client, which we’ll assume is Composer, will check both signatures and only accept a package for use when both signatures can be verified.
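Sketching that flow with openssl (all file names hypothetical): two independent keypairs, two independent signatures over the same manifest, and a client that accepts the package only when both verify.

```shell
# Two independent keypairs: the author's, generated at home, and Packagist's.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out author.pem
openssl pkey -in author.pem -pubout -out author.pub
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out packagist.pem
openssl pkey -in packagist.pem -pubout -out packagist.pub

# Both parties sign the same release manifest independently.
echo 'libraryx 1.0.0' > manifest
openssl dgst -sha256 -sign author.pem -out manifest.author.sig manifest
openssl dgst -sha256 -sign packagist.pem -out manifest.packagist.sig manifest

# The client (Composer) accepts the package only if BOTH signatures verify,
# so an attacker must compromise both private keys, not just one.
openssl dgst -sha256 -verify author.pub -signature manifest.author.sig manifest &&
  openssl dgst -sha256 -verify packagist.pub -signature manifest.packagist.sig manifest &&
  echo "package accepted"
```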
While I refer to this as a “double signature” approach, there remains scope for adding a third independent party. Nor does it necessarily impose digital signing on package authors. A package author can simply not sign anything, and Packagist would still do its own signing. This significantly lessens security, but it trumps the current situation where Packagist signs nothing at all. Composer should also differentiate between the two, marking packages to be installed as verified or partially verified, while offering options for users to impose a minimum acceptable level of verification.
Happily, this line of thinking is exactly what’s at the core of a proposed solution for both RubyGems and Python’s PyPI called The Update Framework (TUF). A little external validation doesn’t hurt, and my security persona might stop sulking soon. It can also be implemented entirely using openssl.
Signing A Package
At its core, signing a package for PHP is straightforward. For every single file in the package, you calculate its checksum, e.g. a SHA256 hash. You store the list of files and their hashes in a single file called the manifest. You then sign the manifest. Upon download, the user can verify the manifest’s signature and check that the list of files and hashes it contains actually matches the files in the package received.
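A rough sketch of that process with openssl and sha256sum, using a toy package (the file names and layout are made up for illustration):

```shell
# A toy package with two files.
mkdir -p libraryx/src
echo '<?php // library code' > libraryx/src/LibraryX.php
echo '{"name": "example/libraryx"}' > libraryx/composer.json

# The manifest: one SHA256 hash per file in the package.
(cd libraryx && find . -type f | sort | xargs sha256sum > ../manifest)

# Sign the manifest (one signature covers every file's hash).
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out key.pem
openssl pkey -in key.pem -pubout -out key.pub
openssl dgst -sha256 -sign key.pem -out manifest.sig manifest

# The user verifies the manifest's signature, then re-checks every hash
# against the files actually received.
openssl dgst -sha256 -verify key.pub -signature manifest.sig manifest
(cd libraryx && sha256sum -c ../manifest)
```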
If only it were that simple…
Metadata Is Dangerous
This part is somewhat technical, but hopefully it makes sense!
While we might suspect that files are important, securing something like Packagist requires an obsession with package metadata. Whenever you use Composer, it doesn’t run off to GitHub; it downloads a file called packages.json from packagist.org. Composer quite literally relies on Packagist to tell it about available packages, versions, URLs to remote downloads, and VCS URLs. In the absence of a secure signature-based process, this creates a gigantic single point of failure for all Composer users.
If Packagist is ever hacked, an attacker can now respond to every single Composer request unchallenged. And Composer implicitly trusts everything that Packagist tells it. Basically, it would allow an attacker to poison the entire population of Composer users. That is simply intolerable.
Metadata is the core problem we need to solve. We need to prevent attackers from redirecting package downloads, informing us of incorrect available versions, replaying old copies of the metadata, and so on. There must also then be a way to recover from the attack. In other words, we need a metadata architecture that can survive Packagist’s downfall and allow for its restoration.
If we implement not just package signing, but more specifically metadata signing, then we immediately alleviate the problem. It’s still based on the primary goal of there being at least two private keys which are kept offline. The missing features that it also requires are fourfold:
1. Role Delegation
2. Timestamping
3. Signing Thresholds
4. Key Revocation
Role delegation is a simple mechanism whereby Packagist maintains private key(s) to sign all of its automatically generated metadata files. These signing keys are in turn delegated to by one or more master root keys, which are kept offline. If the server is ever compromised, the root key is not. This would allow the Packagist maintainers, upon restoring control, to revoke the online delegated keys and replace them.
Timestamping prevents attackers from replaying older, correctly signed metadata by imposing an expiry date and a version number. Old data will expire; new data will have a limited, predetermined validity period. Versioning ensures the Composer client can check that metadata it downloads is at least as fresh as any copy it has already cached, i.e. it can’t be silently rolled back onto older metadata.
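A toy illustration of those two checks; the metadata fields here (an expiry timestamp and a version counter) are assumptions for the sketch, not anything Packagist actually publishes today.

```shell
# Hypothetical freshness check a client could apply to downloaded metadata.
check_metadata() {
  # $1 = expiry (unix time), $2 = downloaded version, $3 = cached version
  now=$(date +%s)
  if [ "$now" -ge "$1" ]; then
    echo "rejected: metadata expired"
    return 1
  fi
  if [ "$2" -lt "$3" ]; then
    echo "rejected: rollback to older metadata"
    return 1
  fi
  echo "accepted"
}

# Valid for another day and newer than our cache: accepted.
check_metadata "$(( $(date +%s) + 86400 ))" 42 41
# Same validity window but an older version number: a replay attempt.
check_metadata "$(( $(date +%s) + 86400 ))" 40 41 || true
```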
Signing thresholds are an opt-in measure where you can require certain metadata to be co-signed. For example, Packagist could require that automated key delegations are signed by at least two private keys (increasing the difficulty for attackers since they now need both!). You could do the same at the developer level for newly tagged releases.
Key revocation is another recovery mechanism. It allows the holder(s) of root keys to revoke delegated keys. After a compromise, you’ll want to replace the online private keys. This can all be done by simply root signing a new file which details the delegated keys (I’m trying to avoid too many implementation details so hit me up in the comments if you’re confused).
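To make the delegation/revocation idea concrete, here’s a sketch where the “delegation file” is simply a list of trusted delegated public keys, root-signed with openssl (the file format is invented for illustration):

```shell
# The root key lives offline; the delegated key lives on the server.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out root.pem
openssl pkey -in root.pem -pubout -out root.pub
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out delegated.pem
openssl pkey -in delegated.pem -pubout -out delegated.pub

# The delegation file lists the currently trusted delegated keys
# and is itself signed by root.
sha256sum delegated.pub > delegations.txt
openssl dgst -sha256 -sign root.pem -out delegations.sig delegations.txt

# After a server compromise: generate a replacement key and root-sign a new
# delegation file that no longer lists the old one. Clients holding root.pub
# pick up the revocation simply by fetching the fresh, root-signed file.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out delegated2.pem
openssl pkey -in delegated2.pem -pubout -out delegated2.pub
sha256sum delegated2.pub > delegations.txt
openssl dgst -sha256 -sign root.pem -out delegations.sig delegations.txt
openssl dgst -sha256 -verify root.pub -signature delegations.sig delegations.txt
```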
All of this works in concert, but with one crucial additional element I need to look into. We’re basically saying that blind trust in Packagist is a bad thing, even with signing, so we can’t rely on Packagist alone or we’d be vulnerable to replay/rollback attacks. We should have a means of independently cross-verifying versions back to their origin, the actual git repository for the package – say, using “git ls-remote --tags” to compare release tags against Packagist’s metadata. I have no idea how that would work outside of git, but it’s an obvious validation method made possible by the distributed nature of the repositories Packagist indexes.
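As a rough sketch of that cross-check (a throwaway local repository stands in for the real remote so the example is self-contained; in practice the URL would be the package’s actual GitHub repository):

```shell
# Stand-in for the package's upstream repository, with two tagged releases.
git init -q upstream
(cd upstream \
  && git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m 'init' \
  && git tag 1.0.0 && git tag 1.1.0)

# What the origin repository says exists...
git ls-remote --tags ./upstream | sed 's|.*refs/tags/||' | sort > upstream-tags.txt

# ...versus what Packagist's metadata claims exists.
printf '1.0.0\n1.1.0\n' | sort > packagist-tags.txt

# Any difference is a red flag: Packagist may be advertising versions
# the upstream repository never tagged.
diff upstream-tags.txt packagist-tags.txt && echo 'metadata matches upstream'
```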
This article was me thinking aloud but, if you follow the logic, it demonstrates that “package signing” is an understatement. We don’t just want to secure packages; we also want to secure the metadata that describes what packages exist. It’s a far more complex problem than first glances suggest. Luckily, it’s not an overwhelming one. It is entirely possible to implement basic defences using openssl, public/private keys, and some independent validation separate from Packagist itself. Though your brain will appreciate it more when it’s automated ;).