In the past few months there has been a flurry of activity and announcements around network verification and assurance technologies, which promise to tackle various obstacles that stand in the way of network administrators. The purpose of these technologies is to eliminate human errors that can cause major network outages, especially in the complex and increasingly heterogeneous networks of today. Human errors include anything from loading the wrong software, to mis-cabling, to misconfiguration. Without a doubt, this is a critical concern for private and hybrid cloud operators, as well as those operating other types of networks, such as campus or WAN.
Lack of innovation from traditional networking vendors, who did not deliver on the promise of simplification with software-defined networking (SDN), has given more prominence to these new tools and technologies. However, having such a tool does not eliminate the need to improve all aspects of IT, including the network. If an IT organization investigates and deploys the best possible networking solution, then these tools can be used to solve more complex cross-domain network assurance challenges instead of doing simple configuration validations within one network domain, such as a data center fabric.
As we enter 2017, SDN has matured into a powerful technology that gives network engineers the ability to address many of the network assurance challenges that stem from the manual, box-by-box approach of legacy networking.
There are five commonly mentioned workflow scenarios that are prone to human error. Big Switch’s Big Cloud Fabric introduces unprecedented operational simplicity to networking and addresses each of them.
First, a quick overview of Big Cloud Fabric (BCF). BCF is a next-generation data center fabric that provides the intelligence, agility, and flexibility needed to architect a Software-Defined Data Center (SDDC). It is an integrated networking solution based on an SDN controller, open networking hardware (white-box / brite-box), and a design that allows a multi-rack network to be represented as one logical (big) switch.
Now, back to the five workflow scenarios:
1. “All my switches are running software version X that fixes vulnerability Y. However, yesterday I inserted a new switch running an older software version, forgot to upgrade and now that switch has been compromised.”
As you may have guessed, this scenario would never happen with Big Cloud Fabric because its Zero Touch Networking ensures that all switches are running the same software version. A fabric-wide upgrade that is completely coordinated by our SDN controller moves the entire fabric to a new software image in minutes and without service disruption.
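The version-consistency guarantee described above can be illustrated with a small sketch. The check below is a hypothetical illustration of the idea, not BCF's actual implementation: a controller that never lets a switch with a mismatched image carry traffic, and instead pushes the fabric's target image before admitting it.

```python
# Hypothetical sketch of a controller-side admission check that keeps every
# switch on the fabric's target software image. Names and logic are
# illustrative assumptions, not Big Cloud Fabric's actual implementation.

FABRIC_TARGET_VERSION = "4.2.0"  # assumed target image for the whole fabric

def admit_switch(switch: dict) -> str:
    """Return the action the controller would take for a joining switch."""
    if switch["version"] == FABRIC_TARGET_VERSION:
        return "admit"
    # A switch on an older (possibly vulnerable) image is never admitted
    # as-is: the controller pushes the target image first, then admits it.
    return "upgrade-then-admit"

print(admit_switch({"mac": "70:72:cf:aa:bb:01", "version": "4.2.0"}))
print(admit_switch({"mac": "70:72:cf:aa:bb:02", "version": "4.1.3"}))
```

The key point is that the policy lives in one place (the controller), so a forgotten per-box upgrade is structurally impossible rather than merely discouraged.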
2. “I’ve found a crude way to automate by loading templates, specific to network tiers. I have a configuration template for spines and another one for leafs, and have to keep track of template versions. My networking vendor also offered me an additional template lifecycle management tool, but somehow this whole thing feels brittle, clunky and still has a lot of room for admin error.”
Brittle and clunky? We could not agree more, and unfortunately that is what you get in a box-by-box network without an integrated controller. It is also why deploying additional network assurance and validation tools seems like the only way to solve this problem.
With Big Cloud Fabric, there are no complex configurations or templates to maintain. All that is required to admit a switch into the fabric is its MAC address, its fabric role (leaf or spine), and the rack it belongs to. The MAC address is easily found on the pull-out asset tag that comes with each open networking switch, much like the tag on a modern server. There is no need to dig through packing slips or files of serial numbers. Additionally, BCF ensures that every new switch conforms to the defined fabric-wide storm control setting via a storm control profile.
BCF takes care of the rest: any existing fabric configuration, such as QoS, multicast, or security policies, is applied to the switch depending on the workloads that come online. Switch IP addresses are assigned from a pool defined on the controller.
Figure 1: Admitting a new switch to Big Cloud Fabric
Figure 2: Big Cloud Fabric complete setup with controllers and switches
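Because admission needs only three facts per switch, the whole workflow reduces to a tiny declarative payload. As a rough sketch, a scripted admission through a REST API could look like the snippet below; the field names and endpoint path are hypothetical illustrations, not the documented BCF API.

```python
import json

# Hypothetical payload for admitting a new leaf switch. The field names and
# the endpoint path mentioned below are illustrative assumptions, not the
# documented Big Cloud Fabric REST API.
new_switch = {
    "mac-address": "70:72:cf:aa:bb:03",  # from the switch's pull-out asset tag
    "fabric-role": "leaf",               # "leaf" or "spine"
    "leaf-group": "rack-12",             # which rack the switch belongs to
}

# In practice this body would be POSTed to the controller, e.g. with
# requests.post(controller_url + "/api/v1/switch", json=new_switch).
print(json.dumps(new_switch, indent=2))
```

Everything else (QoS, multicast, security policy, IP addressing) is derived by the controller, so there is nothing vendor- or tier-specific for the operator to template.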
3. “Back to configuration management, I wanted to deploy a set of leaf switches from vendor X in one area (Pod-1) of the network and from vendor Y in another area (Pod-2). It was hard enough to maintain templates for the same vendor, now doing this across vendors is even harder. What if they make changes to their BGP configuration? What if they change defaults? What if I forget to configure jumbo MTU?”
When open networking switches (white-box / brite-box) are used as the foundation for Big Cloud Fabric, this configuration management problem becomes a thing of the past. Adding a rack of switches is as easy as configuring the switch MAC addresses via the controller GUI, CLI, or REST API. Jumbo MTU is always pre-configured and cannot be disabled anywhere in the fabric, which eliminates what previously could have been a source of problems. The practicality of deploying switches from different vendors within the same fabric pod can be debated. While these switches have similar capabilities, we know that customers do not deploy random hardware; they validate a specific fabric pod configuration as part of a PoC. For a multi-vendor data center, Pod-1 can be built with Vendor X switches and, when more capacity is needed, a new pod can be built with Vendor X's or even a different Vendor Y's switches. There are no vendor-specific templates to maintain, as the BCF controller takes care of this automatically. Also, the controller will never admit a switch that is not on the hardware compatibility list -- so there is no room for human error in this instance.
4. “I mis-cabled a spine-leaf link and made it run between leafs in two different racks. This caused a major outage.”
As surprising as it is to hear at the end of 2016, this comes up quite often in our conversations with customers. Again, such a scenario is simply impossible with Big Cloud Fabric. I mentioned Zero Touch Networking above and the information we need to admit a switch into the fabric. Once we identify two switches as belonging to the same rack, to which a server would be dual-connected, we can automatically detect a leaf switch mis-cabled to a different rack, or the same ESXi server mistakenly connected to two different racks. BCF expects all leafs to be connected to all spines for a predictable network, so if one of those links never comes up, or fails later on, the fabric flags that error. There is nothing additional to configure, prevent, or validate here.
Figure 3: Big Cloud Fabric error view
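The expectation that every leaf connects to every spine is what makes mis-cabling mechanically detectable: discovered links can simply be compared against the intended topology. A minimal sketch of that idea (an illustration, not BCF's actual algorithm):

```python
# Minimal sketch of mis-cabling detection against an expected leaf-spine
# topology: every leaf should have exactly one link to every spine.
# This illustrates the idea; it is not Big Cloud Fabric's actual algorithm.

def find_cabling_errors(spines, leafs, discovered_links):
    """Return (missing, unexpected) links given discovered (a, b) pairs."""
    expected = {(leaf, spine) for leaf in leafs for spine in spines}
    discovered = set(discovered_links)
    missing = expected - discovered       # a leaf-spine link never came up
    unexpected = discovered - expected    # e.g. a leaf cabled to another leaf
    return missing, unexpected

spines = ["spine-1", "spine-2"]
leafs = ["leaf-1a", "leaf-1b"]
links = [("leaf-1a", "spine-1"), ("leaf-1a", "spine-2"),
         ("leaf-1b", "spine-1"), ("leaf-1b", "leaf-1a")]  # mis-cabled link

missing, unexpected = find_cabling_errors(spines, leafs, links)
print("missing:", missing)        # leaf-1b has no link to spine-2
print("unexpected:", unexpected)  # leaf-1b is cabled to leaf-1a
```

Because the controller already knows the intended role and rack of every switch, both the missing spine link and the stray leaf-to-leaf cable surface as fabric errors without any operator-written validation.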
5. “I made a change and now I have application connectivity issues. The only way I can figure out the problem is with good old ping and traceroute, which I run on every box.”
Big Cloud Fabric offers something much better than that. Using the Test Path feature, the exact traffic path, or the reason for the lack of connectivity, between any two L2-L4 endpoints in the fabric can be found easily. This can be done from the BCF controller GUI, CLI, or REST API. Additionally, a test can be executed from the OpenStack Horizon GUI or the VMware vSphere Web Client GUI with the help of our plug-in.
Now, let’s say we realize that a network policy is preventing communication. Instead of fixing it box by box, it can be fixed once on the controller, which fixes it across the entire fabric.
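The Test Path lookup described above could also be driven from a script, since a query is just a pair of L2-L4 endpoint descriptions. A rough sketch follows; the field names and endpoint path are hypothetical illustrations, not the documented BCF API.

```python
import json

# Hypothetical Test Path query between two endpoints, identified by L3/L4
# fields. Field names and the endpoint path are illustrative assumptions,
# not the documented Big Cloud Fabric REST API.
test_path_request = {
    "source":      {"ip": "10.0.1.10", "port": 49152, "protocol": "tcp"},
    "destination": {"ip": "10.0.2.20", "port": 443,   "protocol": "tcp"},
}

# In practice this would be sent to the controller, e.g. with
# requests.post(controller_url + "/api/v1/test-path", json=test_path_request),
# and the response would describe the hop-by-hop path, or name the policy
# that is dropping the traffic.
print(json.dumps(test_path_request, indent=2))
```

Contrast this with per-box ping and traceroute: one query against the controller replaces a login to every switch along the suspected path.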
Deploying network verification and assurance solutions does not mean you can ignore what lies underneath. To make the best use of these tools, one has to break free from the complexities of a traditional box-by-box network.
Feel free to reach out to us at email@example.com for more information on Big Cloud Fabric, and join us on this journey as we continue to innovate and disrupt the status quo of networking. To learn more: Big Cloud Fabric Architecture.
Principal Technical Marketing Engineer, Big Switch Networks