The Problem
I have been using an Asus RT-AC68U, followed by an RT-AC87U, running Merlin’s firmware with customised firewall scripts for the longest time. However, both units had a persistent issue with some (not all) sites being inaccessible, even after total resets and re-configuration from scratch.
Having confirmed that the issue lay with the router(s) and not with the firmware, the firewall rules, or server-side blocks, and not being able to find a solution, I decided to just use a software firewall. One that I knew well and trusted was/is pfSense.
The Other Problem
At the very same time, I finally discovered that the boot failures of my server were actually due to the PSU (read other Amazon reviews citing similar fan-spin-up-then-dies failures). Not having had time to look at the frequently (and randomly) rebooting server, I finally purchased whatever SFX module was in stock at the local “IT complex” – another Silverstone SST-SX600-G unit… Crossing my fingers that the PSU was the culprit…
2018/06/04 Update: Nope, false hope again… Server is still rebooting rather “randomly” despite using a brand new Corsair SF600…
The Solution
Ignoring the irritating reboots, and before I could do anything else, I wanted to minimise any (other) SPoFs as much as possible within a “reasonable” cost. I already had a UPS and an IEEE 802.3ad (link aggregation) capable router, so I decided to spring for a 4-port GbE PCIe 2.0 x4 NIC to squeeze into the only PCIe slot on the ITX motherboard.
Bringing Up The Network
There were some initial issues with attempting to reliably bring up or down the interfaces, including the dreaded “A start job is running for raise network interfaces (x minutes of 5 mins n sec)”. A partial solution was to lower the DHCP timeout, but at the same time, I knew my “hackish” network configuration was also not good.
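For reference, one way to lower the DHCP timeout is shown below. It is only a sketch: it assumes dhclient is the DHCP client in use (as it normally is with ifupdown), and the 15-second value is purely illustrative, not the value actually used on this server.

```
# /etc/dhcp/dhclient.conf (fragment)
# Stop waiting for a lease after 15 seconds (illustrative value).
timeout 15;
```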
After much experimentation, my final, usable network configuration is as follows:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

##############################################
# Channel bonding eno1 and enp3s0 interfaces #
##############################################

# set up bond0
auto bond0
iface bond0 inet manual
    hwaddress <HW MAC>
    bond-slaves enp3s0 enp0s31f6
    bond-mode 802.3ad
    bond-lacp-rate fast
    bond-miimon 100
    bond-xmit_hash_policy layer2+3

allow-hotplug enp3s0
iface enp3s0 inet manual
    bond-master bond0

allow-hotplug enp0s31f6
iface enp0s31f6 inet manual
    bond-master bond0

####################################################
# Channel bonding enp1s0f0 and enp1s0f1 interfaces #
####################################################

# set up bond1
auto bond1
iface bond1 inet manual
    hwaddress <HW MAC>
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode 802.3ad
    bond-lacp-rate fast
    bond-miimon 100
    bond-xmit_hash_policy layer2+3

allow-hotplug enp1s0f0
iface enp1s0f0 inet manual
    bond-master bond1

allow-hotplug enp1s0f1
iface enp1s0f1 inet manual
    bond-master bond1

####################################################
# Channel bonding enp1s0f2 and enp1s0f3 interfaces #
####################################################

# set up bond2
auto bond2
iface bond2 inet manual
    hwaddress <HW MAC>
    bond-slaves enp1s0f2 enp1s0f3
    bond-mode 802.3ad
    bond-lacp-rate fast
    bond-miimon 100
    bond-xmit_hash_policy layer2+3

allow-hotplug enp1s0f2
iface enp1s0f2 inet manual
    bond-master bond2

allow-hotplug enp1s0f3
iface enp1s0f3 inet manual
    bond-master bond2

#########################################
# Temporary fix for specific interfaces #
#########################################

auto br0
iface br0 inet static
    bridge_ports bond0
    address <IP ADDRESS>
    netmask <SUBNET>
    gateway <GW ADDRESS>
    dns-nameservers <DNS ADDRESS>

auto br1
iface br1 inet manual
    bridge_ports bond1

auto br2
iface br2 inet manual
    bridge_ports bond2
Once this was done, Ubuntu no longer “hung” for some time while booting, particularly when one or more cables were left unconnected.
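To check that a bond like the ones above actually came up, the kernel exposes its state under /proc/net/bonding/. The helper below is a minimal sketch, not part of the original setup: the function name bond_summary is made up for illustration, and it only assumes the usual "Bonding Mode", "Slave Interface" and "MII Status" lines in that file.

```shell
# Hypothetical helper: print the bonding mode, then each slave with its
# MII link status. The status file defaults to bond0; pass another path
# (e.g. /proc/net/bonding/bond1) as the first argument.
bond_summary() {
    status_file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '
        /^Bonding Mode:/    { print "mode: " $2 }
        /^Slave Interface:/ { slave = $2 }
        # The first "MII Status" line belongs to the bond itself, so it is
        # skipped; later ones follow a "Slave Interface" line.
        /^MII Status:/      { if (slave != "") { print slave ": " $2; slave = "" } }
    ' "$status_file"
}
```

Running bond_summary /proc/net/bonding/bond1 after boot should list both slaves; a slave reported as down while its cable is plugged in usually points at the switch or router side of the LAG.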
Making Sense of pfSense…
Downloading and installing pfSense on a new VM (using KVM as the hypervisor) went pretty smoothly…
Obviously, if this was the only problem, then my life would have been much, much easier…
pfSense Not Making Much Sense…
Because I wanted to “test” things first, I set up a LAN-only configuration (i.e. 1 NIC)… pfSense started giving me problems immediately after the initial installation. It would literally hang when booting, stuck at “Starting DNS Resolver”.
I tried using virtio devices. I tried e1000 NIC emulation. I tried disabling all hardware offloading (as per my NetGate Forum post), with both virtio and e1000 emulation. I tried praying and pleading…
Time was wasted attempting to find out what was wrong, and I finally narrowed it down to one thing: unbound.
My “work-around” was to disable the unbound service immediately after installation, then switch to the “old school” DNS forwarder instead. Once everything was set up correctly, I could finally switch back to using unbound.
So far, I have not discovered the cause of this issue, but since pfSense was finally set up and unbound no longer caused any more issues, I left it at that.
With virtio devices finally in use in the pfSense KVM guest, I had to disable all hardware offloading, and my final /etc/network/interfaces configuration on the Ubuntu KVM host ended up reflecting that.
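The exact offload settings are not reproduced in this post. As a rough sketch of what disabling offloads on a Linux host can look like, assuming ethtool is installed, the helper below turns off checksum and segmentation offloads per interface. The function name disable_offloads, the ETHTOOL override and the particular feature list are all assumptions for illustration, not the configuration actually used here.

```shell
# Hypothetical helper: switch off TX/RX checksumming and TSO/GSO/GRO on
# every interface named on the command line. Set ETHTOOL=echo to preview
# the arguments without changing anything.
disable_offloads() {
    for iface in "$@"; do
        "${ETHTOOL:-ethtool}" -K "$iface" tx off rx off tso off gso off gro off
    done
}
```

For example, ETHTOOL=echo disable_offloads enp1s0f0 enp1s0f1 prints what would be passed to ethtool. Some NICs treat certain features as fixed, in which case ethtool may complain about that feature.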
2019/03/03 Update: Due to a botched Ubuntu 18.04 LTS update that somehow nuked Internet access through the pfSense VM, I was forced to add a backup link via a USB LTE modem, and also took the chance to clean up the /etc/network/interfaces file, with the following main changes:
- added bond-updelay and bond-downdelay
- set bond-slaves as none in the bondN interfaces to prevent race conditions
- changed bond2 from 802.3ad (LAG/LACP) to active-backup mode (since the fibre modem does not support LACP)
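As an illustration only, the three changes listed above might look like this for bond2. The 200 ms delay values are assumptions (chosen as multiples of bond-miimon), not the values from the actual file, and only one slave stanza is shown.

```
auto bond2
iface bond2 inet manual
    hwaddress <HW MAC>
    # No slaves listed here; each slave attaches itself via bond-master.
    bond-slaves none
    bond-mode active-backup
    bond-miimon 100
    # Assumed values, in milliseconds.
    bond-updelay 200
    bond-downdelay 200

allow-hotplug enp1s0f2
iface enp1s0f2 inet manual
    bond-master bond2
```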
2019/03/04 Update: Whilst troubleshooting the on-again, off-again WAN problem, I finally came across a clue as to what may have caused all this: Docker… In short, while Docker is installed, certain firewall (iptables) rules may have been touched, which may have caused this mess… The fix is to add the following to /etc/sysctl.conf to disable netfilter for bridge traffic:
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-arptables = 0
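After editing the file, sudo sysctl -p reloads /etc/sysctl.conf. To confirm the settings took effect, something like the following can be used. It is a minimal sketch: the function name check_bridge_nf is hypothetical, and the directory argument exists only so the check can be pointed somewhere other than the live /proc tree.

```shell
# Hypothetical helper: report any bridge-netfilter key that is missing or
# not set to 0. Returns 0 only when all three keys read 0.
check_bridge_nf() {
    dir="${1:-/proc/sys/net/bridge}"
    rc=0
    for key in bridge-nf-call-iptables bridge-nf-call-ip6tables bridge-nf-call-arptables; do
        if [ ! -r "$dir/$key" ]; then
            echo "$key: missing (is br_netfilter loaded?)"
            rc=1
        elif [ "$(cat "$dir/$key")" != "0" ]; then
            echo "$key: $(cat "$dir/$key") (expected 0)"
            rc=1
        fi
    done
    return "$rc"
}
```

Note that on recent kernels these keys only exist once the br_netfilter module is loaded, so a "missing" result does not necessarily mean the sysctl.conf entries are wrong.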
So now, the only problem was the reboots…