The press has greatly under-reported the two security holes, called Meltdown and Spectre, that can without exaggeration be characterized as affecting just about every computing device in use today (with very rare exceptions, like the Apple Watch). And because the media has so badly dropped the ball, your humble blogger will start with a high-level introductory piece, in the hopes that the IT and security experts in our readership will chime in, ideally in comments, with more information and ideas. Lambert has more posts planned, and they will be more technical in nature.
One of the most obvious points, that cannot be made often enough, is that these security holes exist at the most foundational hardware level, the processors. Initial reports were that they could be fixed only via Very Extreme Measures, like getting hardware without the dodgy Intel chips. That was quickly scaled back to “oh, patches are being launched.”
The wee problem is that with a flaw this fundamental and widespread, these patches aren’t just any patches. Given the severity of the flaws (and Spectre is more recalcitrant than Meltdown), the industry’s incentive is to declare whatever it can throw at the problem adequate, whether the fixes really address the flaws or not. These fixes are also said to slow down performance by 5% to 30% per process. That is a massive haircut, particularly in a high volume setting. Perhaps later optimizations can cut the performance cost, but the flip side is that later patches that do a better job could just as well increase the performance hit.
Moreover, it isn’t just that virtually everyone who has a computer (and that means smartphones too) is faced with what will feel like a big hardware downgrade in remedying these vulnerabilities. Even more important, it isn’t clear that any device with these flawed chips can ever be made secure again. While there was already reason to assume that the NSA had managed to get back doors installed in every device, it’s one thing to have the NSA snooping on you; we now face the possibility of a much larger range of actors getting at your data. As our Clive put it:
And certainly for me, I’ve moved from a position of being fairly sure that most data I have either in the cloud or locally on my devices is secure and confidential to being totally convinced it’s been compromised already or could easily be by anyone who wants to.
We’ll provide links to some good overviews on Meltdown and Spectre and then give some initial examples, from the financial services arena, of their implications.
Some Primers on Meltdown and Spectre
We’ll quote at length from the outlet that broke the story and gave a good overview:
The fix is to separate the kernel’s memory completely from user processes using what’s called Kernel Page Table Isolation, or KPTI. At one point, Forcefully Unmap Complete Kernel With Interrupt Trampolines, aka FUCKWIT, was mulled by the Linux kernel team, giving you an idea of how annoying this has been for the developers.
Whenever a running program needs to do anything useful – such as write to a file or open a network connection – it has to temporarily hand control of the processor to the kernel to carry out the job. To make the transition from user mode to kernel mode and back to user mode as fast and efficient as possible, the kernel is present in all processes’ virtual memory address spaces, although it is invisible to these programs. When the kernel is needed, the program makes a system call, the processor switches to kernel mode and enters the kernel. When it is done, the CPU is told to switch back to user mode, and reenter the process. While in user mode, the kernel’s code and data remains out of sight but present in the process’s page tables.
Think of the kernel as God sitting on a cloud, looking down on Earth. It’s there, and no normal being can see it, yet they can pray to it.
These KPTI patches move the kernel into a completely separate address space, so it’s not just invisible to a running process, it’s not even there at all. Really, this shouldn’t be needed, but clearly there is a flaw in Intel’s silicon that allows kernel access protections to be bypassed in some way…
It appears, from what AMD software engineer Tom Lendacky was suggesting above, that Intel’s CPUs speculatively execute code potentially without performing security checks. It seems it may be possible to craft software in such a way that the processor starts executing an instruction that would normally be blocked – such as reading kernel memory from user mode – and completes that instruction before the privilege level check occurs.
That would allow ring-3-level user code to read ring-0-level kernel data. And that is not good.
The specifics of the vulnerability have yet to be confirmed, but consider this: the changes to Linux and Windows are significant and are being pushed out at high speed. That suggests it’s more serious than a KASLR bypass.
Richard Smith provided this simplification:
It is perhaps like a scam, with the processor as the uninformed front man, and the various user processes that share the processor as the ultimate victims.
The processor is hardwired to make *good* guesses [“speculative execution”] about what to do next, and makes a good job of it. However, the quality of the guess is based on the fundamental and inevitable assumption that the instruction stream is, as it were, honest, about what it is trying to do. CPUs can’t help making this assumption; they have no insight into what smells wrong and what doesn’t.
So: subvert these good intentions by presenting the processor with a dishonest and, to insightful human eyes, wildly improbable instruction stream, that is cunningly engineered to extract information about the carefully hidden inner workings of the host. The processor will now haplessly & obliviously leak info about stuff it’s meant to keep secret, thus compromising the security of all the other user processes.
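Richard’s scam analogy can be sketched in code. The following is a toy simulation, not exploit code: there is no real speculative execution or hardware cache here, and all the names and numbers are illustrative. It models the Flush+Reload-style side channel that Meltdown and Spectre exploit, where a transient, “dishonest” read of a secret leaves a footprint in the cache that an attacker can recover purely by timing memory accesses.

```python
# Toy model of a Flush+Reload-style cache side channel, the mechanism
# behind the Meltdown/Spectre leaks. Everything is simulated; the point
# is the information flow, not a working exploit.

SECRET = 42  # a byte the "kernel" holds; user code may not read it directly

class SimulatedCache:
    """Tracks which of 256 probe-array lines are 'hot' (cached)."""
    def __init__(self):
        self.hot_lines = set()

    def flush(self):
        self.hot_lines.clear()

    def touch(self, line):
        self.hot_lines.add(line)

    def access_time(self, line):
        # Cached lines are fast; uncached ones are slow. The attacker
        # never sees the secret directly, only these timings.
        return 10 if line in self.hot_lines else 200

def speculative_leak(cache):
    """Models the CPU racing ahead of the privilege check: it reads the
    secret and uses it to index a probe array. The architectural result
    is discarded, but the cache footprint survives."""
    cache.touch(SECRET)  # i.e. probe_array[SECRET * 4096] gets cached

def recover_secret(cache):
    """RELOAD step: time access to each probe line; the one fast line
    reveals which index the transient instructions touched."""
    timings = [cache.access_time(i) for i in range(256)]
    return min(range(256), key=lambda i: timings[i])

cache = SimulatedCache()
cache.flush()            # FLUSH: evict all probe lines
speculative_leak(cache)  # the "wildly improbable instruction stream" runs
print(recover_secret(cache))  # prints 42 -- the protected byte, via timing alone
```

The processor, in Richard’s terms, is the uninformed front man: it faithfully performs the flush, the transient read, and the reloads, with no insight that the sequence as a whole is engineered to move a secret from privileged memory into observable timing.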
Needless to say, cryptocurrency owners, this includes your holdings.
This post provides another good layperson-accessible description of the flaws (hat tip EM, who it turns out is such a hard core geek that he runs something Raspberry Pi-like).
Lambert liked the description in this tweet, because it provides a layperson-friendly discussion of how to make an instruction stream “dishonest,” as Richard puts it, but I was less taken with it because I couldn’t relate it to actual computer operations. If you are more computer-savvy, the analogy may seem more obvious:
Here’s my layman’s not-totally-accurate-but-gets-the-point-across story about how Meltdown & Spectre type attacks work:
Let’s say you go to a library that has a ‘special collection’ you’re not allowed access to, but you want to read one of the books. 1/10
— Joe Fitz (@securelyfitz)
Some Examples of Why This Is a Big Deal From Banking
Recall that the financial services industry is one of the most demanding IT environments: extremely high transaction volumes, many of which are mission critical, and very low tolerance for errors. The industry has made this bad situation worse by regularly under-investing in IT, so it is running with little headroom in many activities.
Consider some possible ramifications of a 5% to 30% increase in processing time:
Many large international banks run their big batch processes overnight, in New York time terms. Those need to complete execution before the start of the trading day in the US. What happens if the Meltdown and Spectre patches slow processing time so much that they can’t complete the overnight jobs by the opening of the trading day? As our Clive noted:
My TBTF — as per standard industry practices — rolls out security patches without much, if any, testing (and such testing that is performed is to deploy to a test PC and test server in the company’s test domain that, so long as it doesn’t fall over in a heap after perfunctory checks, is deemed to be a pass; this is acceptable because security patches shouldn’t touch functionality and shouldn’t make substantial and fundamental changes to wide ranging components like the kernel as this fix does).
But this fix inevitably kills some machine benchmarks by 30% or so. For some services that are already running “hot” (limited headroom available at peak processing times, for example) due to sweating assets and starvation-level budgets for upgrades which are now the norm, this will be more than sufficient to push them over the edge and into outages. No performance and capacity testing — which is among the most long winded and resource-intensive to do — will be scheduled because, realistically, it would take months to do properly and this fix needs to be rolled out now because it is exploitable and if successfully exploited can compromise all other security measures.
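Clive’s point about systems already running “hot” is a matter of simple arithmetic. This back-of-the-envelope sketch uses illustrative numbers (an 8-hour batch window and a job that already takes 7.5 hours, leaving under 7% headroom) to show how even the low end of the reported 5% to 30% slowdown range can blow an overnight batch window:

```python
# Back-of-the-envelope check: does an overnight batch job still fit in
# its window after a patch-induced slowdown? All numbers are illustrative.

def finishes_in_window(runtime_hours, window_hours, slowdown_pct):
    """Return True if the job, slowed by slowdown_pct percent,
    still completes within the batch window."""
    patched_runtime = runtime_hours * (1 + slowdown_pct / 100)
    return patched_runtime <= window_hours

# A job that takes 7.5 hours in an 8-hour window (tight but workable):
print(finishes_in_window(7.5, 8.0, 5))   # True  -- barely survives a 5% hit
print(finishes_in_window(7.5, 8.0, 10))  # False -- a 10% hit blows the window
print(finishes_in_window(7.5, 8.0, 30))  # False -- the worst case is hopeless
```

With that little slack, anything beyond roughly a 6% slowdown pushes the job past the opening of the trading day, which is exactly the outage scenario Clive describes.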
And recall, at the top of the post, we expressed our doubts that anything other than getting on entirely new chips would fully remedy the bugs. Clive highlights the implication of system security being in doubt:
A big problem for financial services is that, when a customer’s on-line banking facility (and usually then their account) is compromised and the customer suffers a loss — via money transmission (wire transfer) to a fraudster’s bank account — the bank will invariably, at least initially, try to claim their systems are infallible and if a customer user ID and password was used alongside whatever two-factor authentication is adopted then the customer has been negligent in some way.
I don’t see how with this, alongside other similar security flaws, any bank can now claim their security systems are, well, secure.
I expect this will be increasingly tested both in regulator-run dispute resolution, mandatory arbitration or, eventually, the courts. That will be a fairly seismic shock — financial services are used to showing up and simply saying “hey, we’re a bank, *of course* our systems are secure and fool-proof” without needing to supply serious evidence to show they’ve not fallen foul of the myriad of glitches that are now out there, in the public records.
One of the few upsides is that the increased processing time amounts to a mini transaction tax on high-frequency trading, which, as we have discussed, is an entirely parasitic activity that should have been regulated or taxed out of existence long ago (among other things, it creates the worst possible market structure and drains market liquidity when it is most needed).
Needless to say, even the experts are only beginning to get their arms around what it will take to remedy these epic security flaws and what the costs will be. And the incumbents have incentives to minimize how bad things are. So reader sightings, both in the trade press and from their own experience, will be very helpful in making a more accurate diagnosis.