100Gb network adapters are coming, said Jesper Brouer in his talk at the LCA 2015 kernel miniconference. Driving such adapters at their full wire speed is going to be a significant challenge for the Linux kernel; meeting that challenge is the subject of his current and future work. The good news is that Linux networking has gotten quite a bit faster as a result – even if there are still problems to be solved.
I agree that work needs to be done to reduce Linux kernel overhead for high-interrupt workloads. However, I'm wondering if now wouldn't be a good time to officially push to standardize on jumbo packets throughout the Internet. 1500-byte packets have been holding us back for a while now, under an Ethernet standard that was set decades ago, and they create a lot of unnecessary load on modern CPUs, routers, and switching equipment.
Consider a 100MB video call or data stream that takes about 69k 1500B packets to transmit after factoring in packet overhead. This same 100MB stream would require only 11k 9000B jumbo packets. That frees about 84% of the switching/interrupt load on each and every router in the path just by switching to jumbo packets. Even simple webpages these days are significantly larger than 1500 bytes and would see tremendous benefit.
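For what it's worth, here's the rough arithmetic I'm assuming (about 40 bytes of TCP/IP headers per packet; exact overhead varies):

```python
# Rough packet-count comparison for a 100 MB transfer, assuming ~40 bytes
# of TCP/IP header overhead per packet (real overhead varies).
PAYLOAD = 100 * 1000 * 1000  # 100 MB transfer

def packets_needed(mtu: int, overhead: int = 40) -> int:
    per_packet = mtu - overhead          # usable payload per packet
    return -(-PAYLOAD // per_packet)     # ceiling division

std = packets_needed(1500)    # ~68,500 packets
jumbo = packets_needed(9000)  # ~11,200 packets
print(std, jumbo, f"{1 - jumbo / std:.0%} fewer packets")  # roughly 84% fewer
```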
Routers and switches aren’t generally affected by small frame sizes, as they don’t experience “interrupts” like end hosts do. Even at the very smallest packet sizes, most data center switches and Internet routers can handle line rate forwarding. CPU-based routers do have interrupts, but they’re generally used for things like branch offices and don’t experience nearly the high bandwidth that would benefit from larger frame sizes.
End-host interfaces that interact with the Internet are pretty much stuck at 1500 bytes, as there are too many paths between point A and point B that would squeeze the MTU back down to 1500 (and lots of networks break pMTU).
So that leaves back end connections like iSCSI, NFS, backups, etc. These are networks that are controlled and known, so MTU of the end points isn’t a problem. On modern, multi-core CPUs and NICs with checksum offloading, vMDQs, etc., there hasn’t been much of an advantage to moving to jumbo frames. A lot of places don’t even bother any more.
That could change with the migration to 25 Gbit, 40 Gbit, 50 Gbit, and 100 Gbit networks. But for a while at least, we’re still going to be stuck with 1500 MTU for Internet-facing end-host interfaces.
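As an aside, on Linux you can ask the kernel what path MTU it currently believes applies to a destination; a minimal sketch (socket option values are the Linux ones from <linux/in.h>, and the destination is an arbitrary documentation address):

```python
# Query the kernel's cached path-MTU estimate for a destination (Linux only).
import socket

IP_MTU_DISCOVER, IP_PMTUDISC_DO, IP_MTU = 10, 2, 14  # from <linux/in.h>

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)  # set the DF bit
s.connect(("192.0.2.1", 9))  # TEST-NET-1 address; connect() on UDP sends nothing
print(s.getsockopt(socket.IPPROTO_IP, IP_MTU))  # usually 1500 on Internet paths
```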
tony,
That’s not really accurate though. The routing table has to be consulted for every packet, which means the router has a lot more work to do when packets are small.
At 1 Gbps with 55-byte packets, a router would have to do 2.3M lookups per second. At 10 Gbps, a router would have to do 22.7M lookups per second. At 100 Gbps, a router would have to do 227.3M lookups per second.
A brand new Cisco 7600 router, which will run you about $5K, can handle 16 Gb/s and 6.5 Mp/s of throughput per module. So without dividing the traffic among additional units, it would not even be able to handle a 10Gb/s stream full of small packets today, much less the 16Gb/s stream it’s rated for. Bandwidth isn’t the bottleneck, it’s packet routing speed. The bigger the packets, the more bandwidth a router can handle. By upping packet size to 9KB, this router could theoretically handle up to 468Gbps without touching its route lookup speed.
So while we could make routers that could handle arbitrarily small packets by over-provisioning the packet rate, that comes at exponentially greater cost. Conversely if increasing packet size can save costs, then maybe that’s what we should be doing.
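To be concrete, here's the sort of sketch I'm basing those numbers on (figures taken from the specs quoted above):

```python
# Packet rate needed to fill a link, and the pps-limited bandwidth ceiling
# of a forwarding engine, as a function of packet size.
def mpps_to_fill(gbps: float, pkt_bytes: int) -> float:
    return gbps * 1e9 / (pkt_bytes * 8) / 1e6

def gbps_ceiling(mpps: float, pkt_bytes: int) -> float:
    return mpps * 1e6 * pkt_bytes * 8 / 1e9

print(mpps_to_fill(100, 55))    # ~227 Mpps of lookups to fill 100 Gbps with 55 B packets
print(gbps_ceiling(6.5, 1500))  # ~78 Gbps pps-limited ceiling at 6.5 Mpps, 1500 B packets
print(gbps_ceiling(6.5, 9000))  # ~468 Gbps pps-limited ceiling at 6.5 Mpps, 9000 B packets
```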
I have to disagree. Firewalls, NAS drives, tablets, desktops, etc, these are all negatively affected by having to reassemble data fragments <1500B at a time. Yes we can add more transistors to offload this overhead, but if jumbo packets can actually help eliminate the overhead without throwing more processing power at the problem, then it’s a good thing IMHO.
Actually it is accurate. Routers/switches tend to use CAM/TCAMs or something similar. (T)CAM is a type of memory that is special because it can do a lookup of a destination MAC, IP, or network prefix in a single clock cycle, no matter how big the forwarding table (Forwarding Information Base, FIB) is. Most TCAM ASICs are built so they can sustain forwarding even at the smallest packet sizes.
TCAMs are great for that, but they’re expensive and power hungry, which is why they’re only used in specialized networking equipment and typically only have a couple hundred thousand entries (instead of like, 2 GBytes, which could hold a lot more entries).
http://en.wikipedia.org/wiki/Content-addressable_memory
Hardware routers and switches use distributed forwarding in the various line cards. The supervisor module is responsible for learning routes and Layer 2 adjacencies via various protocols (OSPF, BGP, ARP) and then building a RIB (routing information base). The RIB is then compiled into a FIB (forwarding information base) that is downloaded into the line cards. For efficiency, a forwarding entry for a given destination IP, MAC, or network prefix is only installed on line cards that are connected to those particular networks.
The line cards then forward packets at line rate because every lookup checks the entire forwarding table in a single cycle, again because of the TCAM memory.
http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-se…
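For anyone unfamiliar with FIB lookups, here's a toy software illustration of the longest-prefix match that a TCAM does in a single cycle (made-up prefixes and next hops, purely for illustration; real FIBs live in hardware):

```python
import ipaddress

# Toy FIB: (prefix, next hop). A TCAM returns the longest matching
# prefix in one clock cycle; software has to search for it.
FIB = [
    (ipaddress.ip_network("0.0.0.0/0"),   "upstream"),
    (ipaddress.ip_network("10.0.0.0/8"),  "core-1"),
    (ipaddress.ip_network("10.1.0.0/16"), "edge-7"),
]

def lookup(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in FIB if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(lookup("10.1.2.3"))    # edge-7  (most specific prefix wins)
print(lookup("192.0.2.55"))  # upstream (falls through to the default route)
```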
Exactly, which is why MTU is kept at 1500. Anything higher is either dropped or fragmented.
tony,
You are still overlooking the actual specs of these routers.
Here’s a 20 Gbps model; it does 30 Mpps.
http://andovercg.com/datasheets/cisco-20G-ethernet-services-cards-7…
If you think it can saturate its link with 55-byte packets, then you’d be wrong: 30 Mpps * 55 B = 13.2 Gbps, and 13.2 Gbps < 20 Gbps. Q.E.D.
So this statement was not really accurate: “Routers and switches aren’t generally affected by small frame sizes”.
Not that it really matters, because everything that’s bandwidth intensive should be using larger packets anyways. That’s the point: larger packets can dramatically decrease the load on these routers. Or put another way, larger packets let us multiply the bandwidth per core. So larger packets are an easy way to scale bandwidth without increasing a router’s processing power.
That’s precisely what needs to be fixed in order to support jumbo packets. It seems logical to deploy jumbo frames at the same time as IPv6. IPv6 takes up more header overhead, and so do tunnels like VPNs or GRE. Without a bump in packet size, we’re actually losing room for the actual payload over time.
Fair point, but at 1500 bytes, that card can forward full line rate. It only runs into trouble at about the 650-byte packet size, assuming my math is correct. Jumbo frames wouldn’t help on that hardware. There’s no forwarding performance benefit to that router by moving to frames larger than 1500 bytes. And that router is fairly old technology as well.
I think some of this disagreement might come with the terminology we’re using here. Typically when we talk about “processing power”, it’s in reference to a CPU. Modern switches and routers have CPUs (control plane) of course, but the CPUs don’t actually forward packets. Some smaller-end routers do of course (like a home router), but for the most part routers and switches have long moved to distributed forwarding.
Much older switches/routers did actually forward via a main CPU. Something like an older Catalyst switch with a “route processor” is an example.
Packet processing is something different; it’s usually referred to as forwarding rate, and it’s completely independent of a router/switch’s CPU. At 1500 bytes, most (modern) routers and switches are typically well within their line rate for forwarding rates. Jumbo frames wouldn’t help there.
Instead, jumbo frames are meant to help the end hosts, but again with the advent of checksum offloading (one of the larger CPU hits) and multi-core, jumbo frames haven’t typically provided a huge benefit at 1 or 10 Gbit like it might have for single-core, non-offloading 100 mbit and 1 Gigabit systems 10 or more years ago.
The performance implications for a CPU-based forwarder (such as a Linux host) are very, very different than performance implications for a router or switch that does distributed ASIC forwarding.
At this point for end hosts talking to the Internet, the rate of traffic isn’t really affected by the smaller MTU. Servers today are barely saturating 10 Gbit links, and often much of that is storage traffic (which can be jumbo, whereas Internet-facing usually cannot). My guess is a modern home system with a 300 Mbit Internet connection would have about zero benefit in terms of throughput or CPU overhead if it could communicate with the entire Internet at 9000 bytes versus the current ~1500 byte limit.
I haven’t run any tests, but tests like this from Chris Wahl show zero difference (and even performance drawbacks sometimes) to jumbo frames on 1 Gbit in a server environment. http://wahlnetwork.com/2013/03/25/do-jumbo-frames-improve-vmotion-p…
So for endpoints that communicate at even slower speeds, jumbo probably isn’t much help there.
That could change with speeds higher than 10 Gbit, however. But only for endpoints; switches forwarding those frames still wouldn’t see a meaningful difference.
tony,
I’m glad you are seeing my point. Going back to what I was saying before, increasing the packet size allows us to get more bandwidth for the given processing power of a router core. In other words, it should make bandwidth (at 100Gbps and beyond) much cheaper.
On my own network, jumbo packets do make a noticeable difference with file transfers, etc. My desktop NIC does hardware offload, but I don’t think my NAS drives or laptops do. It’s not just physical hardware that would benefit; even VM infrastructure (e.g. Amazon EC2) could benefit greatly by sending fewer, larger packets instead of more, smaller ones.
IMHO everything’s pointing to 1500B being too small for most of today’s payloads; it just seems like we ought to be moving in a direction that corrects this limitation rather than just compensating for it with hardware that can process more and more 1500B packets.
You say this, but the packets per second bottlenecks show up again when you aggregate the small packets from many peers. In order to support higher bandwidths, there are two options: continue investing in ever faster router cores to forward small packets, or just increase the packet size to something more appropriate for today’s large payloads.
The only benefit of 1500B is legacy compatibility. That is significant, but since network operators need to upgrade to IPv6 routers anyways, it’s a perfect time to finally overcome the 1500B limitation as well.
Larger packet sizes are not what will make networking cheaper for routers and switches. The primary cost is the optics, signaling, cabling, etc., and pushing signaling to higher and higher rates. Actual processing of packets is extremely cheap thanks to advances in merchant silicon like Broadcom’s Trident II (which can handle line rate at the smallest packet sizes).
Again, scaling and performance issues for routers/switches are vastly different than for servers/CPU-based routers.
So there is zero benefit to routers and switches to have larger frames (especially since a larger MTU doesn’t mean they won’t deal with much smaller frames anyway).
The 1500 byte limit isn’t a hardware limitation, as most hardware deployed now can do >1500 bytes; it’s a limitation of both convention and MTU discovery. For larger-than-1500-byte frames to work, every device in the path would need to be configured to handle larger frames, from core routers and switches, to the DSL and cable modems, to the server and client end points (the latter of which are almost always set to a 1500-byte MTU). Plus, you’ve got far too many places that break pMTU, which would cause lots of traffic black holes.
For backhaul private networks, it’s possible to do jumbo. I think Internet2 does it. For the regular public Internet? There’s just little to no benefit for the end hosts, and no benefit for the network devices in between.
tony,
You’ve already acknowledged that smaller packets need higher powered routers for the same bandwidth. One of the reasons that higher bandwidths are difficult to achieve is because of the 1500B packet limit. I truly don’t understand why you are in denial about the benefits that jumbo packets offer.
You can read about what the US Department of Energy says about it:
http://fasterdata.es.net/network-tuning/mtu-issues/
Internet2 connects institutions of advanced learning, here’s what they say:
http://noc.net.internet2.edu/i2network/jumbo-frames.html
NASA makes the same points as I have:
http://www.nren.nasa.gov/jumboframes.html
We could stick with 1500B packets indefinitely on the Internet for legacy reasons, but it will come at the cost of more expensive, more complex, less efficient networking hardware. And don’t pretend it’s insignificant: a 1500B router core has to support about 6.1 times the packet load of a 9KB jumbo-packet router to sustain the exact same payload bandwidth. Even if that is feasible, it doesn’t make much engineering sense if we don’t have to do it.
I’m tired of repeating myself, so if you don’t want to accept this is the truth, then so be it.
Yes, which is why I think we should increase the MTU at the same time as switching to IPv6, it would be a lost opportunity to upgrade one and not the other. Of course, the main reason we don’t have either is because commercial ISPs don’t want to upgrade anything – but if they do upgrade one, I should hope they’d upgrade the other too.
The point I conceded is actually irrelevant to our discussion. Because it’s not smaller packets (1500 byte) that stress older hardware, it’s tiny packets. 1500 byte packets are large, and are no problem for any modern (or even older) router/switch to push full line rate.
Enabling jumbo frames doesn’t eliminate smaller frames, either. It only allows packets to be larger than 1500. There will still be plenty of smaller than 1500 byte frames on a given network, even if the limit is set to 9000 bytes.
1500 vs 9000 byte frames are not an issue for modern or even semi-modern network routers and switches that use distributed forwarding. I don’t know why you’re in denial about this.
In the network world, it’s just not something we consider for gear performance. Most of the time when we raise MTUs it’s because of tunneling, like OTV, VXLAN, VPNs, etc. Often we don’t like to do jumbo frames because it adds configuration complexity, because it makes it more likely an end host will have problems with an improperly configured MTU, and because the performance benefit is modest at best.
The only benefit at this point (and it’s not near what you’re asserting) is for the endpoints on high-speed connections.
Speaking of benchmarking, here’s what we consider in the world of networking (unlike vague claims without benchmarks or methodology):
For 1 Gbit NICs, there was not much of a difference:
http://www.boche.net/blog/index.php/2011/01/24/jumbo-frames-compari…
http://wahlnetwork.com/2013/03/25/do-jumbo-frames-improve-vmotion-p…
Here’s Purdue (1 Gbit) http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2770&context…
Results are a bit better, but still mixed (there’s also a section that shows how LRO and other optimizations help reduce CPU workload)
10 Gbit NICs with jumbo frames: Mostly single digit benefits: https://vstorage.wordpress.com/2013/12/09/jumbo-frames-performance-w…
This shows a bit better, but as the message size gets larger the benefit becomes smaller: http://longwhiteclouds.com/2013/09/10/the-great-jumbo-frames-debate…
You had mentioned sites like Amazon EC2 could “greatly” (your words) benefit from jumbo frames. Many sites do use jumbo frames on the backend, just not to the Internet. And as I’ve shown with actual benchmarks from various sources, the benefits range from little to modest. Certainly not “greatly”.
As am I. To sum up:
1) Network routers and switches don’t care between regular (1500) and jumbo (9000) byte frames. No performance difference for distributed forwarding devices (unless you count the increased latency you get with larger frames because of serialization delays).
2) Performance benefits for 1 Gbit are little to none, and for 10 Gbit tepid to modest. Certainly not great. And this is for when an organization controls all end-points and the transit network (back-end data center connections).
Here’s more on why that is: http://codingrelic.geekhold.com/2011/12/requiem-for-jumbo-frames.ht…
3) Internet-wide large MTU would require every single device to change their MTU (to gain the benefits), and/or all of the sites that break pMTU to fix their stuff (to not have massive, massive Internet outages all over the place). That’s a spectacularly massive undertaking, for little performance benefit, and would be a huge disaster.
Here’s a Nanog discussion on it: https://www.nanog.org/sites/default/files/wednesday_general_steenber…
tony,
I get your rebuttal that any additional load arising from the use of small packets can be made to perform well by using beefier hardware. Yes, I agree that taking load off the CPU is good, but whether it’s in silicon or software, it’s still significantly more load that needs to be handled. Moving the load into silicon doesn’t mean we should not also fix the MTU bottleneck. It’s not irrelevant that handling six times more load adds to the expense, complexity, heat output, and energy consumption of every piece of equipment in the network path as well as the endpoints (i.e. tablets/phones); it’s all part of the equation that points to 1500B packets not being sufficient.
Of course not all applications need jumbo packets. Even bulk transfers will use odd-sized packets at the end, and the TCP ACK responses will be tiny. Nevertheless, take a look at what consumes the bulk of bandwidth on the Internet; it should be no surprise that video services are on top.
http://variety.com/2014/digital/news/netflix-youtube-bandwidth-usag…
1500B is absolutely minuscule relative to the payloads we are transferring. Even payloads as small as the osnews.com HTML homepage would benefit.
You do realize that your own links show jumbo packets increasing performance by 3-8% with an offload engine, right? That difference is likely to grow with faster links. As for the vMotion exceptions, I don’t use it, so it’s impossible for me to say, but I can speculate that maybe their drivers only support the offload engine when 1500B packets are used and disable it for jumbo packets.
Absolutely they benefit too. I’m looking at my virtual machines right now (which are KVM based), and the virtual adapters automatically use 64K packets for intra-VPS communication, so far fewer packets are needed. Sites that use jumbo frames internally are also proponents of using them externally; they just aren’t supported.
Granted, legacy devices are the biggest impediment to full adoption. But given that end points stuck at 1500B still work just fine on jumbo-packet-capable networks, we ought to be deploying jumbo-capable equipment for all new installs. A lot of new consumer equipment sold in the past few years is already jumbo-packet capable; hopefully over time the Internet and ISPs will catch up.
That vMotion issue might have been a local problem, or maybe it was a vMotion bug that’s since been fixed. VMware itself recommended jumbo packets on ESXi 4.x for performance.
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_admin_guide.pdf
A later ESX benchmark shows jumbo packets do not hurt performance; they only help it. Maybe a more advanced offload engine can close the performance gap again, but it will invariably require more expense and complexity to build that silicon.
http://longwhiteclouds.com/2013/09/10/the-great-jumbo-frames-debate…
Even if we could assume the performance gap were fully closed, the next point is that hardware offload engines aren’t without their own problems:
http://www.symantec.com/business/support/index?page=content&id=TECH…
http://www.peerwisdom.org/2013/04/03/large-send-offload-and-network…
When we delegate parsing of layer 3 and layer 4 packet structure to silicon, it creates problems for traffic using TCP/IP extensions. The widespread deployment of hardware offloading results in what the next link calls “TCP Calcification”. TCP extensions that could ordinarily be negotiated between Internet peers are now avoided because of these offload engines:
http://codingrelic.geekhold.com/2011/12/requiem-for-jumbo-frames.ht…
So, with all this in mind, I’m hoping you will reconsider the assertion that jumbo frames offer “zero benefit”. Likewise, while hardware offload engines may be useful in closing performance gaps caused by handling many small packets, they have their own faults which should be considered.
Certain types of (D)DoS attacks use lots of small packets. That can be helpful for an attacker because it needs less bandwidth. But usually they’ll just use an amplification attack, whereby they send a lot of data.
In IPv6, Path MTU Discovery is handled by the endpoints and is thus supposed to be mandatory.
So when enough IPv6 has been deployed, in theory we can increase the standard MTU pretty easily.
But then the fun starts, convincing manufacturers to change the default on their products.
Which won’t work well when you change the NIC before the switch.
So you probably want to start increasing the default MTU on switches first.
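For what it's worth, checking an interface's MTU programmatically on Linux is straightforward; a small sketch (SIOCGIFMTU is from <linux/sockios.h>, and "eth0" is just a placeholder interface name):

```python
# Read an interface MTU via ioctl on Linux. The same value is also exposed
# at /sys/class/net/<ifname>/mtu; raising it (e.g. "ip link set dev eth0
# mtu 9000") requires root and a jumbo-capable NIC and switch.
import fcntl, socket, struct

SIOCGIFMTU = 0x8921  # from <linux/sockios.h>

def get_mtu(ifname: str) -> int:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ifr = struct.pack("16si", ifname.encode()[:15], 0)   # struct ifreq
    ifr = fcntl.ioctl(s.fileno(), SIOCGIFMTU, ifr)
    s.close()
    return struct.unpack("16si", ifr)[1]

print(get_mtu("eth0"))  # typically 1500; 9000 on jumbo-enabled links
```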
My understanding is that the issue has always been about error rates going up significantly with jumbo frames. TCP/UDP checksums become less effective as the frame size increases, so the undetected error rate goes up fast. To counteract that you now need to do some other form of error checking (CRC or something), and that is generally more expensive for the CPU-only case than the per-packet overhead was… catch-22.
TSO (segmentation offload in the NIC) takes care of the vast majority of the interrupt overhead (at the endpoints at least), and routers and whatnot have hardware ICs to throw at the problem for the most part.
Not to say it isn’t still an issue, it is, but it isn’t as big of an issue as it once was and has been partially solved through other mechanisms.
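For reference, the TCP/UDP checksum in question is just a 16-bit one's-complement sum (RFC 1071), which is why its detection strength doesn't grow with the amount of data it covers; a minimal sketch:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum used by IP/TCP/UDP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                           # pad to an even length
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

# Always a 16-bit result, no matter how large the frame it covers.
print(hex(internet_checksum(b"\x45\x00\x00\x3c")))
```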
galvanash,
That’s worth considering; however, the computers I oversee are often left on for weeks at a time with zero Ethernet errors. I can’t say errors are impossible, but in my experience with proper cabling they’re extremely rare. Granted, the story may be different across a WAN. Anyway, it’s discussed here:
http://noahdavids.org/self_published/CRC_and_checksum.html
For what it’s worth, my cable modem shows this:
Total Unerrored Codewords 2113042807
Total Correctable Codewords 3437
Total Uncorrectable Codewords 16
So it looks like there were 3453 errors in 19 days, 99.5% of which were correctable.
It’s true that offload engines can help reduce the CPU overhead but we have to use the same sort of tricks in every component on the network to compensate for unnecessarily high packet rates. It just doesn’t seem ideal to me.
Consider that out of 1500 bytes only about 65 are overhead; jumbo frames are not all that useful. Jumbo frames can’t speed up the traffic. They can only cut down on the amount of overhead, which is at most about 5%. Add in the problems of error checking and equipment support, and you are doing a lot of work to pass a few more bytes.
In reality, you can go from about 100 MB/s to maybe 105 MB/s. Not really much of a gain. I would be much happier seeing 10 Gb getting cheap.
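Rough numbers, assuming standard Ethernet framing plus IPv4/TCP headers (my own overhead figures, slightly different from the 65 bytes quoted above):

```python
# Wire efficiency of standard vs jumbo frames, assuming 38 bytes of
# Ethernet framing (preamble/SFD + header + FCS + IFG) plus 20 B IPv4
# and 20 B TCP headers per packet.
def efficiency(mtu: int, l2_overhead: int = 38, l3l4_overhead: int = 40) -> float:
    return (mtu - l3l4_overhead) / (mtu + l2_overhead)

print(f"{efficiency(1500):.1%}")  # ~94.9% goodput at 1500 B
print(f"{efficiency(9000):.1%}")  # ~99.1% goodput at 9000 B
```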
The idea behind jumbo frames isn’t to make the physical interconnect faster. (You can’t go faster than 100Mb/s on a 100Mb/s network)
But to reduce the impact of _cpu_, the network stack, the software processing layers, etc as a bottleneck.
That said, you’re right. Modern equipment (2 – 4 years old) won’t see much of a boost. But if you’re in a situation where a 1-5% gain will let you limp along on existing hardware for another year or two without violating QoS constraints, then it may be worth going jumbo.
https://www.youtube.com/watch?v=3XG9-X777Jo
It’s about time Linux tried to catch up to BSD. It would be great to see Linux catch up to FreeBSD network performance, although I’m sure it will be hard work to get there.
I’d be happy to see 10G at the consumer level. One of many axes I have to grind with Intel is that their chipsets still don’t support 10G natively. And the cost of 10G adapters is still silly high, and the cost of 10G switches even more so.