mitigating bufferbloat

April 26, 2020

TL;DR

When I upload something large from my laptop my network latency increases from 10 milliseconds up to 2-4 seconds. That makes remote ssh, audio, video and even browsing quite challenging. I avoid the latency increase and network congestion by aggressively throttling all of my device’s traffic with these simple commands from the CAKE howto:

# tc qdisc del dev wl0 root
# tc qdisc add dev wl0 root cake bandwidth 1mbit conservative

Done! Now I almost never notice a latency increase even if I upload a lot of stuff.
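
If you ever want the full upload bandwidth back, deleting the root qdisc should be enough; the kernel then falls back to its default queueing discipline:

# tc qdisc del dev wl0 root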

More words

If you are experiencing severe networking lags (a 10x latency increase) when the network is congested then you are probably seeing bufferbloat: networking devices keeping your packets in their large software/hardware queues.

These queues can hide anywhere: in your own network card(!), in your Wi-Fi router, or somewhere in the middle of The Internet. It is a bug in the implementation of those devices.

Excessive buffering usually happens on the last mile, where your device connects to the ISP. ISPs usually have a lot more bandwidth and smarter devices that avoid this effect.
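
The queue on your own machine is the easiest one to inspect. While an upload is running you can watch the qdisc statistics (assuming wl0 is your interface, as in the examples below); a consistently large backlog value means packets are piling up locally:

#### backlog shows bytes/packets currently waiting in the queue, drops shows packets it had to throw away
$ tc -s qdisc show dev wl0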

Observing bufferbloat

Before doing anything about it let’s check if we have a problem:

$ ping 8.8.8.8
64 bytes from 8.8.8.8: icmp_seq=1 ttl=54 time=8.53 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=54 time=18.1 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=54 time=9.47 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=54 time=14.1 ms
...
# in a second terminal start uploader
$ scp /boot/vmlinuz slyfox@guppy.ia64.dev.gentoo.org:/tmp/my-kernel
vmlinuz 99% 7968KB 119.1KB/s   00:00 ETA
...
# in the first terminal:
64 bytes from 8.8.8.8: icmp_seq=100 ttl=54 time=116 ms
64 bytes from 8.8.8.8: icmp_seq=101 ttl=54 time=141 ms
64 bytes from 8.8.8.8: icmp_seq=102 ttl=54 time=192 ms
64 bytes from 8.8.8.8: icmp_seq=103 ttl=54 time=188 ms
64 bytes from 8.8.8.8: icmp_seq=104 ttl=54 time=182 ms
64 bytes from 8.8.8.8: icmp_seq=105 ttl=54 time=218 ms
64 bytes from 8.8.8.8: icmp_seq=166 ttl=54 time=223 ms
64 bytes from 8.8.8.8: icmp_seq=167 ttl=54 time=230 ms
64 bytes from 8.8.8.8: icmp_seq=168 ttl=54 time=232 ms
64 bytes from 8.8.8.8: icmp_seq=169 ttl=54 time=8.66 ms <--- recovered
64 bytes from 8.8.8.8: icmp_seq=170 ttl=54 time=9.21 ms
64 bytes from 8.8.8.8: icmp_seq=171 ttl=54 time=8.79 ms
64 bytes from 8.8.8.8: icmp_seq=172 ttl=54 time=9.02 ms
64 bytes from 8.8.8.8: icmp_seq=173 ttl=54 time=10.2 ms

You can notice where the upload started and finished just by looking at the ping latency output.
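
If you want to line the latency spikes up with the upload more precisely, one option (assuming the iputils ping) is to log a timestamp with every reply:

#### -D prefixes each reply with a UNIX timestamp, so the log can be matched against the upload window
$ ping -D 8.8.8.8 | tee ping-under-load.log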

There is a nice interactive website where you can profile your network in a slightly more rigorous way: https://www.dslreports.com/speedtest

My profile on default system: https://www.dslreports.com/speedtest/62838367

We can see a bunch of things outright from that result.

These were my defaults during the first test:

$ tc qdisc show dev wl0
qdisc mq 0: root
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn

Now let’s artificially limit egress (outgoing) bandwidth:

# tc qdisc del dev wl0 root
# tc qdisc add dev wl0 root cake bandwidth 1mbit conservative

and retest again: https://www.dslreports.com/speedtest/62838672

Things changed compared to the first run.

Now let’s also try artificially limiting ingress:

#### egress as before
# tc qdisc del dev wl0 root
# tc qdisc add dev wl0 root cake bandwidth 1mbit conservative

#### ingress, a bit kludgy: qdiscs only shape outgoing traffic, so incoming
#### packets are redirected to a virtual ifb device and shaped on its egress
# ip link add name ifb4wl0 type ifb
# tc qdisc del dev wl0 ingress
# tc qdisc add dev wl0 handle ffff: ingress
# tc qdisc del dev ifb4wl0 root
# tc qdisc add dev ifb4wl0 root cake bandwidth 16mbit conservative
# ip link set ifb4wl0 up
# tc filter add dev wl0 parent ffff: protocol all prio 10 u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb4wl0

and retest again: https://www.dslreports.com/speedtest/62838933

If you rerun the test you will notice that the numbers change from run to run on the order of 10-20% (or more). That is not precise enough to fine-tune the setup, but it is good enough to observe the effect.
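
To undo all of the above and go back to the defaults it should be enough to remove the qdiscs and the helper ifb device (assuming the same wl0/ifb4wl0 names as above):

#### drop egress shaping
# tc qdisc del dev wl0 root
#### drop the ingress redirection (and its filter) and the helper device
# tc qdisc del dev wl0 ingress
# ip link del ifb4wl0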

More fine-grained testing

To get better numbers the bufferbloat.net project suggests using the flent command-line tool.

For that you will need a peer host to talk to. I’m picking guppy.ia64.dev.gentoo.org, 140ms (half a globe) away from me.
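
flent is a Python tool and its rrul test drives netperf under the hood, so both ends need a bit of setup. A rough sketch (package names and managers differ per distribution):

#### locally: flent itself (also packaged by many distributions, here via PyPI)
$ pip install --user flent
#### on both ends: netperf; flent runs the netperf client locally and needs netserver running remotely
# emerge net-analyzer/netperf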

Here is the run on default qdisc:

#### remotely
guppy / # netserver -D

#### locally
localhost # tc qdisc show dev wl0
qdisc mq 0: root
qdisc fq_codel 0: parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn

localhost # flent rrul -p all_scaled -l 60 -H guppy.ia64.dev.gentoo.org -t def-to-guppy -o def-filename.png
Started Flent 1.0.1 using Python 3.6.10.
Starting rrul test. Expected run time: 70 seconds.

[image: def-filename.png, flent rrul run against the default qdisc]

There are 3 graphs: download throughput, upload throughput and ping latency.

Here is the run with cake both on ingress and egress:

#### remotely
guppy / # netserver -D

#### locally
localhost # tc qdisc show dev wl0
qdisc cake 800f: root refcnt 5 bandwidth 1Mbit diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms atm overhead 48
qdisc ingress ffff: parent ffff:fff1 ----------------

localhost # flent rrul -p all_scaled -l 60 -H guppy.ia64.dev.gentoo.org -t cake-to-guppy -o cake-filename.png
Started Flent 1.0.1 using Python 3.6.10.
Starting rrul test. Expected run time: 70 seconds.

[image: cake-filename.png, flent rrul run with cake on both egress and ingress]

The same 3 graphs again: download throughput, upload throughput and ping latency.

Even though the data is still quite noisy we can see that latency is lower in the second case.

Picking a closer host and running the test for a longer time might yield more precise data.
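
For example, a longer 5-minute run against a nearby peer (close-peer.example.org is a placeholder here) would look like this:

localhost # flent rrul -p all_scaled -l 300 -H close-peer.example.org -t long-run -o long-run.png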

Now you can tweak many more parameters to see what changes for better or worse: TCP congestion control algorithms (I use bbr), the many knobs described in man tc-cake, and more.
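
For instance, switching the congestion control algorithm to bbr is a sysctl away (a sketch; tcp_bbr ships as a module with modern kernels):

#### load the BBR module and make it the default congestion control
# modprobe tcp_bbr
# sysctl -w net.ipv4.tcp_congestion_control=bbr
#### verify
$ sysctl net.ipv4.tcp_congestion_control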

I suggest skimming through more resources at https://www.bufferbloat.net/projects/ if you are interested in the topic.

Final test

Let’s get back to our original test (with ingress/egress cake enabled this time):

$ ping 8.8.8.8
64 bytes from 8.8.8.8: icmp_seq=1 ttl=54 time=9.31 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=54 time=9.20 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=54 time=8.37 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=54 time=10.6 ms
...
# in a second terminal start uploader
$ scp /boot/vmlinuz slyfox@guppy.ia64.dev.gentoo.org:/tmp/my-kernel
vmlinuz 99% 7968KB 104.6KB/s   00:00 ETA
...
# in the first terminal:
64 bytes from 8.8.8.8: icmp_seq=178 ttl=54 time=16.5 ms
64 bytes from 8.8.8.8: icmp_seq=179 ttl=54 time=14.8 ms
64 bytes from 8.8.8.8: icmp_seq=180 ttl=54 time=15.4 ms
64 bytes from 8.8.8.8: icmp_seq=181 ttl=54 time=10.0 ms
64 bytes from 8.8.8.8: icmp_seq=182 ttl=54 time=20.3 ms
64 bytes from 8.8.8.8: icmp_seq=183 ttl=54 time=9.16 ms
64 bytes from 8.8.8.8: icmp_seq=184 ttl=54 time=9.71 ms
64 bytes from 8.8.8.8: icmp_seq=185 ttl=54 time=17.5 ms
64 bytes from 8.8.8.8: icmp_seq=186 ttl=54 time=13.0 ms
64 bytes from 8.8.8.8: icmp_seq=224 ttl=54 time=19.9 ms
64 bytes from 8.8.8.8: icmp_seq=225 ttl=54 time=42.3 ms
64 bytes from 8.8.8.8: icmp_seq=226 ttl=54 time=13.3 ms
64 bytes from 8.8.8.8: icmp_seq=227 ttl=54 time=8.39 ms
64 bytes from 8.8.8.8: icmp_seq=230 ttl=54 time=11.4 ms
64 bytes from 8.8.8.8: icmp_seq=231 ttl=54 time=8.84 ms <--- recovered
64 bytes from 8.8.8.8: icmp_seq=232 ttl=54 time=8.63 ms
64 bytes from 8.8.8.8: icmp_seq=233 ttl=54 time=8.49 ms

Latency increased a bit: from 10ms to ~20ms. But it does not get anywhere near 300ms. Yay!
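
Note that these tc settings do not survive a reboot or an interface restart. A minimal sketch of making them persistent, assuming you hook it into whatever brings wl0 up (a NetworkManager dispatcher script, a systemd unit, your distribution’s net scripts):

#!/bin/sh
# re-apply cake on egress whenever wl0 comes up; adjust bandwidth to your uplink
tc qdisc del dev wl0 root 2>/dev/null
tc qdisc add dev wl0 root cake bandwidth 1mbit conservative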

Have fun!