mitigating bufferbloat

April 26, 2020

TL;DR

When I upload something large from my laptop my network latency increases from 10 milliseconds up to 2-4 seconds. That makes remote ssh, audio, video and even browsing quite challenging. I avoid the latency increase and network congestion by aggressively throttling all of my device’s traffic with these simple commands from the CAKE howto:

# tc qdisc del dev wl0 root
# tc qdisc add dev wl0 root cake bandwidth 1mbit conservative

Done! Now I almost never notice a latency increase even if I upload a lot of stuff.
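
If you ever want the full upload bandwidth back, deleting the root qdisc should be enough; the kernel then falls back to its default queueing discipline:

# tc qdisc del dev wl0 root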

More words

If you are experiencing severe networking lags (a 10x latency increase) when the network is congested then you are probably seeing bufferbloat: networking devices keeping your packets in their large software/hardware queues.

These queues can hide anywhere: in your own network card(!), in your Wi-Fi router, or somewhere in the middle of The Internet. It is a bug in the implementation of those devices.

Excessive buffering usually happens on the last mile, where your device connects to the ISP. ISPs usually have a lot more bandwidth and smarter devices that avoid this effect.
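
The queue on your own machine is the easiest one to inspect. While an upload is running you can watch the qdisc statistics (assuming wl0 is your interface, as in the examples below); a consistently large backlog value means packets are piling up locally:

#### backlog shows bytes/packets currently waiting in the queue, drops shows packets it had to throw away
$ tc -s qdisc show dev wl0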

Observing bufferbloat

Before doing anything about it let’s check if we have a problem:

$ ping 8.8.8.8
64 bytes from 8.8.8.8: icmp_seq=1 ttl=54 time=8.53 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=54 time=18.1 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=54 time=9.47 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=54 time=14.1 ms
...
# in a second terminal start uploader
$ scp /boot/vmlinuz slyfox@guppy.ia64.dev.gentoo.org:/tmp/my-kernel
vmlinuz 99% 7968KB 119.1KB/s   00:00 ETA
...
# in the first terminal:
64 bytes from 8.8.8.8: icmp_seq=100 ttl=54 time=116 ms
64 bytes from 8.8.8.8: icmp_seq=101 ttl=54 time=141 ms
64 bytes from 8.8.8.8: icmp_seq=102 ttl=54 time=192 ms
64 bytes from 8.8.8.8: icmp_seq=103 ttl=54 time=188 ms
64 bytes from 8.8.8.8: icmp_seq=104 ttl=54 time=182 ms
64 bytes from 8.8.8.8: icmp_seq=105 ttl=54 time=218 ms
64 bytes from 8.8.8.8: icmp_seq=166 ttl=54 time=223 ms
64 bytes from 8.8.8.8: icmp_seq=167 ttl=54 time=230 ms
64 bytes from 8.8.8.8: icmp_seq=168 ttl=54 time=232 ms
64 bytes from 8.8.8.8: icmp_seq=169 ttl=54 time=8.66 ms <--- recovered
64 bytes from 8.8.8.8: icmp_seq=170 ttl=54 time=9.21 ms
64 bytes from 8.8.8.8: icmp_seq=171 ttl=54 time=8.79 ms
64 bytes from 8.8.8.8: icmp_seq=172 ttl=54 time=9.02 ms
64 bytes from 8.8.8.8: icmp_seq=173 ttl=54 time=10.2 ms

You can notice where the upload started and finished just by looking at the ping latency output.
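
If you want to line the latency spikes up with the upload more precisely, one option (assuming the iputils ping) is to log a timestamp with every reply:

#### -D prefixes each reply with a UNIX timestamp, so the log can be matched against the upload window
$ ping -D 8.8.8.8 | tee ping-under-load.log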

There is a nice interactive website where you can profile your network in a slightly more rigorous way: https://www.dslreports.com/speedtest

My profile on default system: https://www.dslreports.com/speedtest/62838367

We can see a bunch of things outright from that result.

These were my defaults during the first test:

$ tc qdisc show dev wl0
qdisc mq 0: root
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn

Now let’s artificially limit egress (outgoing) bandwidth:

# tc qdisc del dev wl0 root
# tc qdisc add dev wl0 root cake bandwidth 1mbit conservative

and retest again: https://www.dslreports.com/speedtest/62838672

Things changed compared to the first run.

Now let’s also try artificially limiting ingress:

#### egress as before
# tc qdisc del dev wl0 root
# tc qdisc add dev wl0 root cake bandwidth 1mbit conservative

#### ingress, a bit kludgy: qdiscs only shape outgoing traffic, so incoming
#### packets are redirected to a virtual ifb device and shaped on its egress
# ip link add name ifb4wl0 type ifb
# tc qdisc del dev wl0 ingress
# tc qdisc add dev wl0 handle ffff: ingress
# tc qdisc del dev ifb4wl0 root
# tc qdisc add dev ifb4wl0 root cake bandwidth 16mbit conservative
# ip link set ifb4wl0 up
# tc filter add dev wl0 parent ffff: protocol all prio 10 u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb4wl0

and retest again: https://www.dslreports.com/speedtest/62838933

If you rerun the test you will notice that the numbers change from run to run on the order of 10-20% (or more). That is not precise enough to fine-tune the setup, but it is good enough to observe the effect.
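
To undo all of the above and go back to the defaults it should be enough to remove the qdiscs and the helper ifb device (assuming the same wl0/ifb4wl0 names as above):

#### drop egress shaping
# tc qdisc del dev wl0 root
#### drop the ingress redirection (and its filter) and the helper device
# tc qdisc del dev wl0 ingress
# ip link del ifb4wl0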

More fine-grained testing

To get better numbers the bufferbloat.net project suggests using the flent command-line tool.

For that you will need a peer host to talk to. I’m picking guppy.ia64.dev.gentoo.org, 140ms (half a globe) away from me.
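
flent is a Python tool and its rrul test drives netperf under the hood, so both ends need a bit of setup. A rough sketch (package names and managers differ per distribution):

#### locally: flent itself (also packaged by many distributions, here via PyPI)
$ pip install --user flent
#### on both ends: netperf; flent runs the netperf client locally and needs netserver running remotely
# emerge net-analyzer/netperf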

Here is the run on default qdisc:

#### remotely
guppy / # netserver -D

#### locally
localhost # tc qdisc show dev wl0
qdisc mq 0: root
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn

localhost # flent rrul -p all_scaled -l 60 -H guppy.ia64.dev.gentoo.org -t def-to-guppy -o def-filename.png
Started Flent 1.0.1 using Python 3.6.10.
Starting rrul test. Expected run time: 70 seconds.

[image: def-filename.png, flent rrul run against the default qdisc]

There are 3 graphs: download throughput, upload throughput and ping latency.

Here is the run with cake both on ingress and egress:

#### remotely
guppy / # netserver -D

#### locally
localhost # tc qdisc show dev wl0
qdisc cake 800f: root refcnt 5 bandwidth 1Mbit diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms atm overhead 48
qdisc ingress ffff: parent ffff:fff1 ----------------

localhost # flent rrul -p all_scaled -l 60 -H guppy.ia64.dev.gentoo.org -t cake-to-guppy -o cake-filename.png
Started Flent 1.0.1 using Python 3.6.10.
Starting rrul test. Expected run time: 70 seconds.

[image: cake-filename.png, flent rrul run with cake on both egress and ingress]

The same 3 graphs again: download throughput, upload throughput and ping latency.

Even though the data is still quite noisy we can see that latency is lower in the second case.

Picking a closer host and running the test for a longer time might yield more precise data.
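
For example, a longer 5-minute run against a nearby peer (close-peer.example.org is a placeholder here) would look like this:

localhost # flent rrul -p all_scaled -l 300 -H close-peer.example.org -t long-run -o long-run.png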

Now you can tweak many more parameters to see what changes for better or worse: TCP congestion control algorithms (I use bbr), the many knobs described in man tc-cake, and more.
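
For instance, switching the congestion control algorithm to bbr is a sysctl away (a sketch; tcp_bbr ships as a module with modern kernels):

#### load the BBR module and make it the default congestion control
# modprobe tcp_bbr
# sysctl -w net.ipv4.tcp_congestion_control=bbr
#### verify
$ sysctl net.ipv4.tcp_congestion_control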

I suggest skimming through more resources at https://www.bufferbloat.net/projects/ if you are interested in the topic.

Final test

Let’s get back to our original test (with ingress/egress cake enabled this time):

$ ping 8.8.8.8
64 bytes from 8.8.8.8: icmp_seq=1 ttl=54 time=9.31 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=54 time=9.20 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=54 time=8.37 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=54 time=10.6 ms
...
# in a second terminal start uploader
$ scp /boot/vmlinuz slyfox@guppy.ia64.dev.gentoo.org:/tmp/my-kernel
vmlinuz 99% 7968KB 104.6KB/s   00:00 ETA
...
# in the first terminal:
64 bytes from 8.8.8.8: icmp_seq=178 ttl=54 time=16.5 ms
64 bytes from 8.8.8.8: icmp_seq=179 ttl=54 time=14.8 ms
64 bytes from 8.8.8.8: icmp_seq=180 ttl=54 time=15.4 ms
64 bytes from 8.8.8.8: icmp_seq=181 ttl=54 time=10.0 ms
64 bytes from 8.8.8.8: icmp_seq=182 ttl=54 time=20.3 ms
64 bytes from 8.8.8.8: icmp_seq=183 ttl=54 time=9.16 ms
64 bytes from 8.8.8.8: icmp_seq=184 ttl=54 time=9.71 ms
64 bytes from 8.8.8.8: icmp_seq=185 ttl=54 time=17.5 ms
64 bytes from 8.8.8.8: icmp_seq=186 ttl=54 time=13.0 ms
64 bytes from 8.8.8.8: icmp_seq=224 ttl=54 time=19.9 ms
64 bytes from 8.8.8.8: icmp_seq=225 ttl=54 time=42.3 ms
64 bytes from 8.8.8.8: icmp_seq=226 ttl=54 time=13.3 ms
64 bytes from 8.8.8.8: icmp_seq=227 ttl=54 time=8.39 ms
64 bytes from 8.8.8.8: icmp_seq=230 ttl=54 time=11.4 ms
64 bytes from 8.8.8.8: icmp_seq=231 ttl=54 time=8.84 ms <--- recovered
64 bytes from 8.8.8.8: icmp_seq=232 ttl=54 time=8.63 ms
64 bytes from 8.8.8.8: icmp_seq=233 ttl=54 time=8.49 ms

Latency increased a bit: from 10ms to ~20ms. But it does not get anywhere near 300ms. Yay!
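
Note that these tc settings do not survive a reboot or an interface restart. A minimal sketch of making them persistent, assuming you hook it into whatever brings wl0 up (a NetworkManager dispatcher script, a systemd unit, your distribution’s net scripts):

#!/bin/sh
# re-apply cake on egress whenever wl0 comes up; adjust bandwidth to your uplink
tc qdisc del dev wl0 root 2>/dev/null
tc qdisc add dev wl0 root cake bandwidth 1mbit conservative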

Have fun!