Time and again, we’ve heard the myth that Software-Defined Networks (SDN) and OpenFlow aren’t ready for prime time. Late last year and again earlier this year, we heard Cisco's take on OpenFlow (including from a VP of Data Center Switching): “We don’t think it’s production-ready.”
A couple of weeks ago, Urs Hoelzle’s keynote at the Open Networking Summit (ONS) busted this myth completely. He revealed that Google’s biggest production network, the backbone that interconnects its datacenters, is a Software-Defined Network running 100% OpenFlow.
This revelation was simultaneously electrifying, clarifying, and empowering. Just as the last Interop was a coming-out party for OpenFlow, I think we’ll look back at this announcement as the point where the whole networking industry realized that SDN/OpenFlow is serious business. I expect SDN-related evaluations and deployments to start taking off from here (we'll check at the end of the year whether we've reached my 20% prediction, and Google's announcement will certainly help!). Indeed, we’ve already seen a surge of interest since the ONS, which was an amazing show for Big Switch Networks across all of our talks, tutorials, and partner demos.
So, let’s go mythbusting! (All credit to Adam and Jamie of MythBusters – a show I love watching with my kids…)
Think about all the critical applications and services that Google offers: search, maps, videos, ad networks, social networking, apps, … You’d naturally expect the backend, inter-datacenter network behind all those services to be big. But you may not know just how *Big* it is. Their Internet-facing network was already the second-largest network in the world at the end of 2010, carrying somewhere between 6% and 10% of all Internet traffic. That’s BIG, but their backend network is even BIGGER. So it’s very plausible that Google’s backend, inter-datacenter network is the BIGGEST network in the world, and it’s 100% SDN/OpenFlow!
There are many variants of the “OpenFlow is not production-ready” myth, usually following the form, “OpenFlow is not <blank> enough to run in production.” Let’s examine three of those variants and see how Google’s talk busted them.
The first variant is about completeness. Many folks in networking look at the OpenFlow 1.0 spec and lament that its feature set can’t implement everything people want to do with their networks.
It’s true that OpenFlow 1.0 has a basic set of instructions (we’ve likened it to an early version of the x86 instruction set), and the ONF working groups are doing some great work to expand that instruction set and make it more powerful and capable.
However, many production use cases can already be implemented with OpenFlow 1.0. We’ve talked about network virtualization as a very compelling one, and Google presented another: running a global WAN backbone that interconnects multiple datacenters. Sounds complete enough to me!
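To make the “instruction set” analogy a little more concrete, here is a toy Python model of an OpenFlow 1.0-style flow table. It is a sketch under simplifying assumptions, not the spec’s wire format or any controller’s API: the FlowEntry and FlowTable classes are hypothetical, and only exact-match fields are modeled (real OpenFlow 1.0 also supports IP prefix wildcards). The field names (in_port, dl_type, nw_dst, tp_dst) follow OpenFlow 1.0’s match structure. Each entry has a priority, match fields (anything left unspecified acts as a wildcard), and a list of actions; an empty action list drops the packet, and a packet that matches nothing is sent to the controller.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    """Hypothetical flow entry: priority + match fields + actions."""
    priority: int
    match: dict            # e.g. {"dl_type": 0x0800, "nw_dst": "10.0.0.2"}; omitted fields are wildcards
    actions: list          # e.g. [("output", 2)]; an empty list means drop
    packet_count: int = 0  # per-flow counter, like the counters OpenFlow switches keep


class FlowTable:
    """Hypothetical single flow table with highest-priority-wins lookup."""

    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)
        # Keep the highest-priority entries first so lookup returns the best match.
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet):
        """Return the actions for a packet, given as a dict of header fields."""
        for entry in self.entries:
            # Every field the entry specifies must match exactly; others are wildcards.
            if all(packet.get(k) == v for k, v in entry.match.items()):
                entry.packet_count += 1
                return entry.actions
        # Table miss: hand the packet to the controller (a packet-in).
        return [("controller",)]


table = FlowTable()
# Forward IPv4 traffic for 10.0.0.2 out of port 2.
table.add(FlowEntry(priority=100, match={"dl_type": 0x0800, "nw_dst": "10.0.0.2"}, actions=[("output", 2)]))
# Higher-priority rule: drop IPv4 traffic to TCP/UDP port 23 (telnet).
table.add(FlowEntry(priority=200, match={"dl_type": 0x0800, "tp_dst": 23}, actions=[]))

print(table.lookup({"in_port": 1, "dl_type": 0x0800, "nw_dst": "10.0.0.2", "tp_dst": 80}))  # [('output', 2)]
print(table.lookup({"in_port": 1, "dl_type": 0x0800, "nw_dst": "10.0.0.2", "tp_dst": 23}))  # []
print(table.lookup({"in_port": 1, "dl_type": 0x0806}))  # [('controller',)]
```

The point of the sketch is how little machinery is involved: match on header fields, pick the highest-priority entry, apply its actions. Everything else, such as which entries to install where, is decided by the controller software.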
The second variant is about speed. Again, after reading the OpenFlow 1.0 spec, people sometimes assume that a centralized Software-Defined Network must be too slow to handle production traffic at the speeds traditional networks run at.
Besides the obvious argument (“if it’s running production traffic on one of the biggest networks in the world, it must be fast enough”), Hoelzle highlighted another important measure of speed: convergence time after a failure. Getting convergence to behave the way you want after a link failure in a traditional network takes a lot of difficult tuning, and distributed protocols can only reduce convergence time so far.
We in the SDN/OpenFlow community have seen that this is where centralized control planes really shine. The control plane runs on server hardware, which has an order of magnitude more memory and compute power than a switch’s control processor, and that hardware can be scaled out in a cluster. So the reconvergence computation is easy to handle, and the resulting forwarding updates can be propagated quickly throughout the network. Indeed, Google saw nearly an order of magnitude faster convergence on their network, and had much higher confidence that the centrally computed result was correct.
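Here is a toy illustration of why centralized reconvergence is straightforward. It is a minimal sketch under stated assumptions, not Google’s implementation or any real controller’s API: the controller is assumed to hold the whole topology as a weighted graph, to route on plain shortest paths (Dijkstra), and to push per-switch next-hop entries. The helper names (next_hops, compute_tables, fail_link, diff) and the four-switch topology are hypothetical. When a link fails, the controller recomputes every table from its global view and pushes only the entries that changed.

```python
import heapq


def next_hops(graph, dst):
    """Return {switch: next_hop} for every switch that can reach dst.

    Runs Dijkstra outward from dst over an undirected graph with symmetric
    costs, so each switch's next hop is the neighbor it was reached from.
    """
    dist = {dst: 0}
    hop = {}
    heap = [(0, dst)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                hop[nbr] = node  # nbr forwards traffic for dst via node
                heapq.heappush(heap, (nd, nbr))
    return hop


def compute_tables(graph):
    """Forwarding table for every switch: {switch: {dst: next_hop}}."""
    tables = {sw: {} for sw in graph}
    for dst in graph:
        for sw, nh in next_hops(graph, dst).items():
            tables[sw][dst] = nh
    return tables


def fail_link(graph, a, b):
    """Remove the link a<->b from the controller's view of the topology."""
    graph[a].pop(b, None)
    graph[b].pop(a, None)


def diff(old, new):
    """Flow updates to push: (switch, dst, new_next_hop), or None to delete."""
    updates = []
    for sw in new:
        for dst in set(old[sw]) | set(new[sw]):
            if old[sw].get(dst) != new[sw].get(dst):
                updates.append((sw, dst, new[sw].get(dst)))
    return sorted(updates, key=lambda u: (u[0], u[1]))


# Hypothetical four-switch ring; unequal link costs make the paths deterministic.
topo = {
    "A": {"B": 1, "D": 2},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"A": 2, "C": 1},
}
before = compute_tables(topo)
fail_link(topo, "A", "B")
after = compute_tables(topo)
updates = diff(before, after)
for sw, dst, nh in updates:
    print(f"push to {sw}: traffic for {dst} now goes via {nh}")
```

In this example only four of the twelve table entries change after the A–B link fails, so the controller pushes just those four updates. A distributed protocol would instead have to propagate the failure hop by hop before every router converged on the same answer.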
The third variant is about security, and it actually came up during Hoelzle’s Q&A session; I was so glad someone asked. Someone asked me the same question at the Ethernet Summit back in February, and I made an argument very similar to Hoelzle’s: a centralized, decoupled control plane can actually be made more secure than a traditional distributed network.
Hoelzle reasoned that a single compromised router in a backbone is already disastrous (there have been highly publicized examples). The problem is that a traditional distributed network has so many “doors” to secure: so many different protocols, so many different devices, so many different layers, all of which have to be coordinated carefully to ensure nothing leaks.
Indeed, a number of former colleagues of mine are at a company (RedSeal) whose very premise is that traditional networks have become too difficult to secure unless you use specialized analysis tools and professional services.
So many doors to secure in a traditional network...
On the other hand, SDN and OpenFlow offer a decoupled control plane with far fewer nodes to secure. Those nodes are not in the data plane, and they’re effectively servers, which we have a lot of expertise in hardening and securing. The switches themselves now have only one protocol, or “door,” to secure: the OpenFlow protocol. So in the end, an SDN/OpenFlow network can be made, and kept, more secure than a traditional network.
In many conversations with customers over the last 12 months, the question of SDN/OpenFlow’s production-readiness has always reared its head. Now I believe CIOs, network architects and designers, and IT professionals everywhere can be confident that SDN/OpenFlow is real, and they should start seriously evaluating it ASAP.
This deserves its own post, but I want to fully acknowledge the world-class talent and expertise Google has in networking and software. I was sitting with Jim Wanderer during the keynote (his group at Google is responsible for much of their datacenter networking software), and I told him how excited I was that Google is finally telling the rest of the world about the challenging and ground-breaking work they've been doing. Starting 18-24 months ago, they did something that only a couple of companies in the world could have done. Awesome job by the Google crew! The good news is that solutions based on this technology are now coming out from Big Switch Networks and other companies.
So, as Adam and Jamie would say, “What’s the status of the myth that SDN/OpenFlow isn’t production-ready? Confirmed, busted, or plausible?”
SDN/OpenFlow is in production at Google, on one of the biggest networks in the world, so this myth is Busted!