Is Facebook’s HHVM Building PHP’s Coffin?

With HHVM 3.0 now released, it's probably time to start talking about HHVM and the new Hack Language. It's becoming hard to ignore some of the fantastical notions spreading on the grapevine about HHVM. There is talk of significant performance improvements, of a multitude of new features courtesy of Hack, and even claims that PHP Internals is now outnumbered by HHVM contributors. There is also treasonous talk of PHP's Zend Engine being put out to pasture.

Perhaps worse, HHVM 3.0 is following in 2.0's footsteps: it is steadily eroding the notion that Facebook is going to abandon HHVM tomorrow (specifically at tea time, once Mark has finished his crumpets) and reinforcing the notion that Facebook actually wants people to adopt HHVM…as if running framework test suites and blogging about PHP parity every other day didn't clue you in. Before 2.0, you'd swear the whole project was either top secret or had been disavowed.

This Was Inevitable…

PHP is not the only language to have faced multiple implementations. Ruby MRI (Matz’s Ruby Interpreter) is the “original” Ruby. It now puts up with JRuby, Rubinius, IronRuby, MagLev and MacRuby (where would we be without an Objective-C option, right?). Python is also not alone. It has alternatives (a lot of them) but you’d most likely recognise IronPython, PyPy, and Jython.

PHP is merely doing what it does best – creeping up on other languages and stealthily "borrowing" the best of their advantages. This time it's not actually PHP doing the creeping, however, and beating PHP at the catch-up game is a big deal. Just in case you haven't heard – HHVM is hardly the only alternative for PHP either. You can use Zephir to write PHP-like code which compiles to a PHP extension. Recently, I heard about HippyVM, which appears to be in its infancy, though those kids can grow up very fast (and its association with PyPy means it can't be dismissed out of hand). There's also the penultimate "opt-out" option of converting PHP out to Java, for example. The ultimate option is to just maintain the output and dump PHP altogether. Never overestimate your PHP code's permanence ;) .

I haven’t really discussed HHVM anywhere because the equation Facebook presented us with just didn’t add up for my particular circumstances. Now it does. HHVM is becoming ever more compelling as the weeks roll by. The PHP parity quest, the Hack Language, the shift to FastCGI and, most importantly, HHVM’s rapid improvement over time are creating something extremely attractive. Yes, it performs really well, but that’s not always the most relevant factor to programmers on the ground churning out everyday applications.

To PHP or not to PHP?

I’m seriously thinking about using the damn thing! This poses one small troubling question which keeps gnawing away at my poor brain: What is PHP?

PHP is a language without a specification to define all of its behaviour. It exists as one implementation that other alternatives must struggle to copy. It's a hard slog, I have no doubt, and the fact that people undertake that slog is a testament to PHP's place among programming languages. It's also a serious problem. If the slog is hard enough, and it definitely must be without a specification, then the motivation is there to depart from PHP entirely. Which is exactly what is happening with the Hack Language. It is not PHP. And that PHP parity quest? That will not result in PHP either. That is a problem only for now, until people get around to figuring out that something that's almost PHP is actually good enough. It's already good enough to write applications on using the most popular frameworks we have.

That one implementation of PHP is managed by PHP Internals, a group which (while I don’t want to tar all its members with the same brush!) is commonly described in terms that are not flattering.

The group has, at various times, been accused of being dysfunctional, of being disconnected from programmers on the ground, of being hostile and obnoxious, and of having members who appear to vote against RFCs frequently and for no discernible cause. In other words, the reputation of PHP Internals doth suck eggs. Anthony Ferrara referred to it as "a toxic kindergarten" and Phil Sturgeon described it more poetically as "a cold, harsh, unwelcome land".

I remember when Composer arrived. You may too. Here was this tool that installed stuff automatically from Github while resolving dependencies. It was revolutionary. It was so revolutionary that I barely believed it was real, because it was too much to hope for. Even as Composer took PHP by storm, there were still people stubbornly clinging to the notion that this upstart could not possibly replace PEAR. They were wrong. PEAR is dead. Some readers may still use it daily, some may even still be clinging to the final strands of hope that it will rise again, and I know it still releases code. I'll amend my erroneous prognosis to "being in the grips of an irreversible terminal illness". Better? I feel sorry for the PEAR group, and I am grateful for their years of work, but programmers never wanted PEAR. That was PEAR's undoing – everything programmers at large knew about PEAR was not what they really wanted.

I wonder if PHP Internals has realised that it is facing the exact same problem.

Do I actually want to spend the next decade waiting for PHP to add Generics as a feature? No. Do I want to wait even 2 years for scalar typehinting? No. What about Collections? Surely I can wait 6 months and rely on SPL until then? No. Can’t I at least be reasonable and wait for PHP 6 to level the performance playing field? NO!

Bear in mind that HHVM+Hack doesn't just add those. It packs what I would consider years' worth of PHP's potential headline features, otherwise drip-fed to us, into just the existing version. The very first official version. What will get added in the next version? I don't know, but I'm excited to find out.

Did Facebook Just Kill PHP?

PHP, as we know it, is starting to smell. It has gone from being the only PHP in town to being the slowest, the one with the fewest features, and the one subject to dysfunctional governance. The new PHP is called Hack, a new language with only the briefest of documentation, since you can learn the other 99.9% of this language over on the PHP manual. The name is perfectly chosen. It's exactly what it appears to be – a hacked version of the PHP language. Something that a population of programmers, perhaps irked by having to use PHP and impatient with PHP Internals, might put together by adding borrowed parts from Java and other programming languages.

The best is yet to come. What happens when PHP Internals decides to implement, say, Generics but with a syntax that differs from Hack's? Will Facebook rewrite all the lines of Hack code and update its manual? I'm sure they will, in their imaginations, while laughing very loudly at the idea. They use <?hh instead of <?php for a reason. The whole language supports easy migration away from PHP (the original, the Zend Engine one, whatever we're calling it now) – it's not designed to allow migration from Hack back to PHP by giving up features. Migration is, in this case, a one-way street. And that migration certainly, obviously, has value.

No, Hack is something else. It might spark a few arguments over not playing nice by copying PHP's new syntax, but he who builds first doesn't have any reason to go along with him who came late…

PHP isn’t really dead, of course. It will exist for years to come. Even I will likely still be using it in 2020. Without my crystal ball, I can only guess at the future. However, looking at HHVM+Hack and then at PHP, I am scratching my head trying to figure out how PHP can continue on its current trajectory. The HHVM guys are not stupid people – they took a language that is easy to learn, used by countless programmers, and simply made it faster and better. If you ignore Hack, they just made it faster. A lot faster.

Perhaps I have a few neurons misfiring and I've arrived at the wrong conclusion due to their misbehaviour. I suppose that's possible. Maybe.

I guess we’ll just have to wait and see how PHP Internals ends up evolving PHP to meet this new challenge. Just don’t take too long, people – I’m an impatient programmer and I like shiny new things that equate to cheaper hosting bills and improved source code.

PHP Package Signing: My Current Thoughts

We figured out how to write good code. We figured out how to write good code in a reusable way…for the most part. We figured out how to distribute and mix all that good reusable code in a sensible fashion. Can we now figure out how to do it all securely?

Package signing is a simple enough idea, and I’ve been spending time trying to fit it, Composer and Packagist together as a model in my head. The concept is to have parties create a cryptographic signature of the code to be distributed at a specific point in time using a private key. If anyone were to change that code or its metadata (think composer.json) with malevolent intent, the end user would then notice that the cryptographic signature cannot be verified using the package author’s known public key. It’s a familiar topic from all those cryptography books you’ve doubtlessly read ;) .
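
To make that concrete, here's a minimal sketch of the sign-then-verify idea using PHP's openssl extension (the key and archive file names are purely illustrative):

<?php
// Author side: sign the package archive at release time with a private
// key that never leaves the author's machine. File names are illustrative.
$code = file_get_contents('package.zip');
$privateKey = openssl_pkey_get_private('file://author-private.pem');
openssl_sign($code, $signature, $privateKey, OPENSSL_ALGO_SHA256);

// User side: verify against the author's known public key. Any change
// to the code (or a forged signature) makes this check fail.
$publicKey = openssl_pkey_get_public('file://author-public.pem');
if (openssl_verify($code, $signature, $publicKey, OPENSSL_ALGO_SHA256) !== 1) {
    throw new RuntimeException('Signature mismatch: do not trust this package!');
}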

Alright, it’s actually a horrendously complicated topic that boggles the minds of many a programmer. We’re a practical bunch, and we just want the damn code. NOW!

Practical considerations and security are locked in a continuous battle for primacy. Look at SSL/TLS – it is a security nightmare but we keep it around because, until someone comes up with a decent replacement, the alternative is no encrypted HTTPS with a verifiable host for anyone. We continue to support old versions of SSL/TLS out of practical concerns despite knowing their weaknesses. They are old versions for a reason!

Those same concerns have been at war in my own head since last week, when I made the mistake of contemplating package signing. Eventually, my practical side won out and my security persona has been sulking in a corner ever since, refusing to talk to me.

The problem with package signing from my perspective is tied up in a phrase most of you would know: The needs of the many outweigh the needs of the few. Thank you, Spock.

PKI vs GPG (Some Context!)

I won’t go into too much detail here…

Right off the bat, we have two contenders for signing packages: Public Key Infrastructure (PKI) backed by a Certificate Authority (CA), and Pretty Good Privacy (PGP), also commonly referred to by its GNU implementation, GNU Privacy Guard (GPG). You'd be most familiar with PKI in the form of the SSL certificates used online. Both have the notion of private keys and public keys. Data encrypted by one key can only be decrypted by the other key. If you keep one private, then holders of the public key can always verify that data sent by you was really sent by you. If you lose the private key, you'll need to revoke it and get a new one.

Assuming they trust it is you to start with!

Trust is the core difference between PKI and GPG. How do you know, with certainty, that any given public key is firmly associated with the person you know it should be associated with? Maybe it's a hacker posing as that person? Maybe it's the local friendly NSA office masquerading as Google? Establishing trust takes diverging paths for PKI and GPG. PKI keys (in the form of certificates) are either self-signed or signed by a trusted Certificate Authority. Generally, we put zero faith in self-signed certificates because anyone can claim to be anyone else using them. We instead trust a select number of CAs to sign certs because they'll hopefully do stuff like asking for passports, addresses, and other person- or company-specific information to verify any entity's real identity before doing so. GPG avoids centralised authorities like the plague and instead uses a "web of trust" model where everyone can sign everyone else's public key, i.e. the more of these endorsements a GPG public key gets, the more likely it is that the key represents the expected identity (based on the number of endorsers you already trust). Webs of trust require time, care, and effort, but they have been extremely successful and certainly do work.

Excellent, your brain is still not smeared over the monitor. That’s a good sign. Now, what the heck has this got to do with PHP and package signing?

People Are Lazy And Cheap

These are the two things you can count on with rational people, though perhaps phrased in more charitable terms. People don't want to do any more work than they need to, and they generally don't want to spend any more money than they need to. Add one more trait – they generally don't think about security when downloading code – and you have something of a problem.

PKI dies an immediate death in this worldview, because obtaining a CA-issued code signing certificate costs money. That guy who wants to put 100 lines of code onto Github? That'll be $100, please. Package signing using a CA will never work for PHP because it imposes a monetary cost on open source contributions. Many of us might not blink at spending $100 to indulge our willingness to write free code, but we'd probably all prefer not to.

GPG is utterly free. Surely it’s a winner then? That guy who wants to put 100 lines of code onto Github? He probably has to now generate a new GPG keypair with a resulting public key that nobody has signed and which nobody will trust. He’s way past caring now. And so are the potentially tens of thousands of people who can’t verify his code. They want the damn code. NOW! Not months down the line when he’s begged, pleaded and bled his willpower dry trying to get sufficient signatures from others in the community to get a widely trusted public key. I’m exaggerating a wee bit, but the barriers to entry for GPG were intended to prevent weakness. There are shortcuts, but shortcuts undermine the purpose of a web of trust. Meeting in person is one of the most often quoted means of getting your GPG public key signed properly. You can find me somewhere in the Wicklow mountains of Ireland. I’ll wait ;) .

From a security perspective, both of these options can and do work. Practically? They sort of stink if you want global adoption. One costs money, and the other requires time and effort for both package authors and users. In the real world, these problems are not uncommon. Throw GPG signatures at everything, and a tiny minority will actually bother checking them. Require CA code signing certs, and you will be ignored by programmers. These options, particularly given the distributed nature of PHP packages (i.e. github repos), will only ever serve a minority of users. Would you oblige us again, Spock?

The needs of the many outweigh the needs of the few.

Thank you, gravelly voiced Spock. Sorry about Abrams, we didn't think he'd actually blow up Vulcan, replace phasers with blasters, and substitute you with an emotional basket case because he thought he was making a Star Wars movie.

I just killed PKI and GPG as the basis for any proposal I’d make for package signing in PHP. Making <1% of the community extremely safe while leaving >99% of it extremely vulnerable leaves me with a bad taste in my mouth because it conflicts with my desire to spread security as much as possible to protect as many people as possible. I could blame the ignorance of users for the need to give up some notional perfect security, but they just want their code and that is the reality we live in. One might as well stand on a beach battling the tides if they are going to deny that Humans are, well, only Human. So, it’s time to come up with something else that can be easily implemented in PHP.

The Double Signature Alternative

The double signature approach to package signing relies on having a minimum of two parties verify the integrity of a package independently of each other. Luckily, we already have two parties: the package author and the Packagist repository. The basis of its effectiveness is probably obvious. Each independent signature requires a separate private key which is ultimately held offline. In order for an attacker to compromise a package, they would need to compromise both private keys. This assumption rests on the notion that each party's public keys are known by the user in advance and maintained in some permanent keystore, e.g. included in composer.phar or accepted upon first download of a public package and stored in a simple file that you can copy between systems (in effect, quite like GPG's keyring).

Let’s say that I tag a release for LibraryX 1.0.0 tomorrow. As the package author, I would sign it using a private key that I generated at home. I’d then advertise the public key widely for users to download and cache (our permanent keyring). Packagist, when building the metadata for the new tagged release, would also generate its own signature for the package. Our download client, which we’ll assume is Composer, will check both signatures and only accept a package for use when both signatures can be verified.

While I refer to this as a "double signature" approach, there remains scope for adding a third independent party. It also doesn't necessarily impose digital signing on package authors. Package authors can simply not sign anything, but Packagist would still do its own signing. This significantly lessens security, but it trumps the current situation where Packagist signs nothing at all. Composer should then differentiate by marking packages to be installed as verified or partially verified, while allowing users to impose a minimum acceptable level of verification.
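
As a rough sketch of what that client-side check might look like (verifySignature() here is a hypothetical helper wrapping openssl_verify(); nothing below is actual Composer code):

<?php
// Require the Packagist signature; treat the author signature as
// optional but preferred. Both public keys come from the user's
// permanent keystore. verifySignature() is a hypothetical helper
// wrapping openssl_verify().
function verificationLevel(string $package, array $sigs, array $keyring): string
{
    if (empty($sigs['packagist'])
        || !verifySignature($package, $sigs['packagist'], $keyring['packagist'])) {
        throw new RuntimeException('Packagist signature invalid: rejecting package.');
    }
    if (isset($sigs['author'])
        && verifySignature($package, $sigs['author'], $keyring['author'])) {
        return 'verified';          // both independent signatures check out
    }
    return 'partially verified';    // Packagist only; the author did not sign
}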

Happily, this line of thinking is exactly what is at the core of a proposed solution for both RubyGems and Python's PyPI called theupdateframework (The Update Framework). A little external validation doesn't hurt, and my security persona might stop sulking soon. It can also be implemented entirely using openssl.

Signing A Package

At its core, signing a package for PHP is straightforward. For every single file in the package, you calculate its checksum, e.g. a SHA256 hash. You store the list of files and their hashes in a single file called the manifest. You then sign the manifest. Upon download, the user can verify the manifest's signature and check that the list of files and hashes it contains actually matches the files in the package received.
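
In PHP terms, that might look something like this minimal sketch (the directory and key paths are illustrative):

<?php
// Build a manifest of SHA256 hashes for every file in the package.
$manifest = [];
$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('library-x/', FilesystemIterator::SKIP_DOTS)
);
foreach ($files as $file) {
    $manifest[$file->getPathname()] = hash_file('sha256', $file->getPathname());
}

// Sign the manifest, not each file: verifying this one signature then
// vouches for every hash listed inside it.
$json = json_encode($manifest);
file_put_contents('manifest.json', $json);

$privateKey = openssl_pkey_get_private('file://author-private.pem');
openssl_sign($json, $signature, $privateKey, OPENSSL_ALGO_SHA256);
file_put_contents('manifest.sig', base64_encode($signature));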

If only it were that simple…

Metadata Is Dangerous

This part is somewhat technical, but hopefully it makes sense!

While we might suspect that files are important, securing something like Packagist requires an obsession with package metadata. Whenever you use Composer, it doesn't run off to Github; it downloads a file called packages.json from packagist.org. Composer quite literally relies on Packagist to tell it about available packages, versions, remote download URLs, and VCS URLs. In the absence of a secure signature-based process, this creates a gigantic single point of failure for all Composer users.

If Packagist is ever hacked, an attacker can now respond to every single Composer request unchallenged. And Composer implicitly trusts everything that Packagist tells it. Basically, it would allow an attacker to poison the entire population of Composer users. That is simply intolerable.

Metadata is the core problem we need to solve. We need to prevent attackers from redirecting package downloads, informing us of incorrect available versions, replaying old copies of the metadata, and so on. There must also then be a way to recover from the attack. In other words, we need a metadata architecture that can survive Packagist’s downfall and allow for its restoration.

If we implement not just package signing, but more specifically metadata signing, then we immediately alleviate the problem. It’s still based on the primary goal of there being at least two private keys which are kept offline. The missing features that it also requires are fourfold:

1. Role Delegation
2. Timestamping
3. Signing Thresholds
4. Key Revocation

Role delegation is a simple mechanism where Packagist maintains private key(s) to sign all of its automatically generated metadata files. These keys are delegated to by one or more master root keys which are kept offline. If the server is ever compromised, the root key is not. This would allow the Packagist maintainers to, upon restoring control, revoke the online delegated keys and replace them.
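
A sketch of how that delegation might be expressed and checked (the document fields and file names are invented for illustration):

<?php
// Packagist's maintainers sign a small delegation document, offline,
// with the root key. It lists the online keys currently in use.
// All field names here are invented for illustration.
$delegation = json_encode([
    'version'     => 7,
    'expires'     => '2014-06-01T00:00:00Z',
    'online-keys' => ['...PEM public key #1...', '...PEM public key #2...'],
]);
$rootKey = openssl_pkey_get_private('file://root-private.pem');
openssl_sign($delegation, $rootSignature, $rootKey, OPENSSL_ALGO_SHA256);

// Clients hold only the long-known root public key, and trust metadata
// signed by an online key only if the delegation document verifies.
$rootPublicKey = openssl_pkey_get_public('file://root-public.pem');
if (openssl_verify($delegation, $rootSignature, $rootPublicKey, OPENSSL_ALGO_SHA256) !== 1) {
    throw new RuntimeException('Delegation file not signed by the root key.');
}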

Timestamping prevents attackers from reusing older, correctly signed metadata by imposing an expiry date and a version number. Old data will expire; new data will have a limited, predetermined validity period. Versioning merely ensures the Composer client can check that metadata it downloads is fresher than an older copy, i.e. that it can safely continue running off cached copies in the meantime.
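
The client-side check is cheap; a sketch (field names mirror the hypothetical delegation document above):

<?php
// Reject metadata that has expired or is older than what we last saw.
$meta = json_decode(file_get_contents('packages-meta.json'), true);
$cachedVersion = 6; // highest version this client has already accepted

if (strtotime($meta['expires']) < time()) {
    throw new RuntimeException('Metadata has expired: possible replay attack.');
}
if ($meta['version'] < $cachedVersion) {
    throw new RuntimeException('Metadata older than our cache: rollback attack?');
}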

Signing thresholds are an opt-in measure where you can require certain metadata to be co-signed. For example, Packagist could require that automated key delegations are signed by at least two private keys (increasing the difficulty for attackers since they now need both!). You could do the same at the developer level for newly tagged releases.
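
A threshold check might be sketched like so (assuming a layout where signatures and public keys line up by index):

<?php
// Count how many of the known co-signing keys produced a valid
// signature over the same data; require at least $threshold of them.
function meetsThreshold(string $data, array $signatures, array $publicKeyPems, int $threshold): bool
{
    $valid = 0;
    foreach ($publicKeyPems as $i => $pem) {
        $key = openssl_pkey_get_public($pem);
        if (isset($signatures[$i])
            && openssl_verify($data, $signatures[$i], $key, OPENSSL_ALGO_SHA256) === 1) {
            $valid++;
        }
    }
    return $valid >= $threshold;
}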

Key revocation is another recovery mechanism. It allows the holder(s) of root keys to revoke delegated keys. After a compromise, you’ll want to replace the online private keys. This can all be done by simply root signing a new file which details the delegated keys (I’m trying to avoid too many implementation details so hit me up in the comments if you’re confused).

All of this works in concert, but with one crucial additional element I need to look into. We're basically saying that blind trust in Packagist is a bad thing, even with signing, so we can't rely only on Packagist or we'd be vulnerable to replay/rollback attacks. For example, we should have a means of independently cross-verifying versions back to their origin, the actual git repository for the package – say, using "git ls-remote --tags" to compare release tags as a means of validating the correctness of Packagist metadata. I have no idea how that would work outside of git, but it's an obvious validation method made possible by the distributed nature of the packages themselves.
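
A rough sketch of that cross-check for git-hosted packages (the repository URL and the claimed versions are illustrative):

<?php
// Ask the origin repository for its tags, independently of Packagist.
$repo = 'https://github.com/vendor/library-x.git';
exec('git ls-remote --tags ' . escapeshellarg($repo), $lines);

$gitTags = [];
foreach ($lines as $line) {
    // Lines look like "<sha>\trefs/tags/1.0.0"; the pattern skips the
    // peeled "^{}" entries that annotated tags produce.
    if (preg_match('#refs/tags/([^^]+)$#', $line, $matches)) {
        $gitTags[] = $matches[1];
    }
}

// Compare against the versions Packagist claims to exist.
$claimedVersions = ['1.0.0', '1.0.1'];
$phantoms = array_diff($claimedVersions, $gitTags);
if ($phantoms) {
    throw new RuntimeException(
        'Packagist lists versions absent from git: ' . implode(', ', $phantoms)
    );
}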

Conclusion

This article was me thinking aloud but, if you follow the logic, it demonstrates that "package signing" is an understatement. We don't want to just secure packages; we also want to secure the metadata that describes what packages exist. It's a far more complex problem than a first glance suggests. Luckily, it's not an overwhelming problem. It is entirely possible to implement basic defences using openssl, public/private keys, and some independent validation separate from Packagist itself. Though your brain will appreciate it more when it's automated ;) .
