Software Firewall…

The Problem

I had been using an Asus RT-AC68U, followed by an RT-AC87U, running Merlin’s firmware with customised firewall scripts for the longest time. However, both units had a persistent issue with some (not all) sites being inaccessible, regardless of total resets and re-configuration from scratch.

Having confirmed it was an issue with the router(s) and not the firmware, the firewall rules, or any server-side blocks, and not being able to find a solution, I decided to just utilise a software firewall. One that I knew well and trusted was (and is) pfSense.

The Other Problem

At the very same time, I finally discovered that the boot failures of my server were actually due to the PSU (I read other Amazon reviews citing similar fan-spins-up-then-dies failures). Not having had time to look at the frequently (and randomly) rebooting server, I finally purchased whatever SFX unit was in stock at the local “IT complex” – another Silverstone SST-SX600-G – crossing my fingers that the PSU was indeed the culprit…

2018/06/04 Update: Nope, false hope again… The server is still rebooting rather “randomly” despite a brand new Corsair SF600.

The Solution

Ignoring the irritating reboots for now, before I could do anything else I wanted to minimise any (other) SPoFs as much as possible at a “reasonable” cost. I already had a UPS and an IEEE 802.3ad (link aggregation) capable router, so I decided to spring for a 4-port GbE PCIe 2.0 x4 NIC to squeeze into the only PCIe slot on the ITX motherboard.

Bringing Up The Network

There were some initial issues with reliably bringing the interfaces up or down, including the dreaded “A start job is running for raise network interfaces (x minutes of 5 mins n sec)”. A partial solution was to lower the DHCP timeout, but at the same time I knew my “hackish” network configuration was also not good.
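For reference, the DHCP half of that partial fix was just a shorter client timeout; a minimal sketch follows (the exact values here are placeholders rather than what I necessarily settled on):

# /etc/dhcp/dhclient.conf (excerpt)
# give up on DHCP far sooner than the default, so an unplugged
# port does not hold the boot process hostage
timeout 15;
retry 30;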

After much experimentation, my final, usable network configuration is as follows:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

####################################################
# Channel bonding enp3s0 and enp0s31f6 interfaces  #
####################################################

# set up bond0
auto bond0
iface bond0 inet manual
hwaddress <HW MAC>
bond-slaves enp3s0 enp0s31f6
bond-mode 802.3ad
bond-lacp-rate fast
bond-miimon 100
bond-xmit_hash_policy layer2+3

allow-hotplug enp3s0
iface enp3s0 inet manual
bond-master bond0

allow-hotplug enp0s31f6
iface enp0s31f6 inet manual
bond-master bond0

####################################################
# Channel bonding enp1s0f0 and enp1s0f1 interfaces #
####################################################

# set up bond1
auto bond1
iface bond1 inet manual
hwaddress <HW MAC>
bond-slaves enp1s0f0 enp1s0f1
bond-mode 802.3ad
bond-lacp-rate fast
bond-miimon 100
bond-xmit_hash_policy layer2+3

allow-hotplug enp1s0f0
iface enp1s0f0 inet manual
bond-master bond1

allow-hotplug enp1s0f1
iface enp1s0f1 inet manual
bond-master bond1

####################################################
# Channel bonding enp1s0f2 and enp1s0f3 interfaces #
####################################################

# set up bond2
auto bond2
iface bond2 inet manual
hwaddress <HW MAC>
bond-slaves enp1s0f2 enp1s0f3
bond-mode 802.3ad
bond-lacp-rate fast
bond-miimon 100
bond-xmit_hash_policy layer2+3

allow-hotplug enp1s0f2
iface enp1s0f2 inet manual
bond-master bond2

allow-hotplug enp1s0f3
iface enp1s0f3 inet manual
bond-master bond2

#########################################
# Temporary fix for specific interfaces #
#########################################

auto br0
iface br0 inet static
bridge_ports bond0
address <IP ADDRESS>
netmask <SUBNET>
gateway <GW ADDRESS>
dns-nameservers <DNS ADDRESS>

auto br1
iface br1 inet manual
bridge_ports bond1

auto br2
iface br2 inet manual
bridge_ports bond2

Once this was done, Ubuntu no longer “hung” for long stretches while booting, particularly when one or more cables were left unconnected.
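As a quick sanity check after a reboot, the bonds and bridges can be inspected with the usual tools (nothing below is specific to my setup):

# LACP aggregator and per-slave status for the first bond
cat /proc/net/bonding/bond0

# which bond each bridge is attached to
brctl show

# one-line link state summary for every interface
ip -br link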

Making Sense of pfSense…

Downloading and installing pfSense on a new VM (using KVM as the hypervisor) went pretty smoothly…
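I will spare you the full VM definition, but the guest was created roughly along these lines – treat this purely as a hedged sketch, since the name, sizes, ISO path and OS variant below are placeholders and not my exact values; additional NICs (e.g. for WAN) can be attached to br1/br2 in the same way:

virt-install \
  --name pfsense \
  --memory 2048 --vcpus 2 \
  --cdrom /var/lib/libvirt/images/pfSense-CE-amd64.iso \
  --disk size=16,bus=virtio \
  --network bridge=br0,model=virtio \
  --os-variant freebsd11.1 \
  --graphics vnc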

Obviously, if this was the only problem, then my life would have been much, much easier…

pfSense Not Making Much Sense…

Since I wanted to “test” things first, I set up a LAN-only configuration (i.e. 1 NIC)… pfSense immediately started giving me problems after the initial installation: it would literally hang when booting, stuck at “Starting DNS Resolver”.

I tried using virtio devices. I tried e1000 NIC emulation. I tried disabling all hardware offloading (as per my Netgate Forum post), with both virtio and e1000 emulation. I tried praying and pleading…

Time was wasted attempting to find out what was wrong, and I finally narrowed it down to one thing: unbound.

My “work-around” was to disable the unbound service immediately after installation and switch to the “old school” DNS Forwarder instead. Once everything else was set up correctly, I could finally switch back to using unbound.

So far, I have not discovered the cause of this issue, but since pfSense was finally set up and unbound no longer caused any more issues, I left it at that.

Having settled on virtio devices for the pfSense KVM guest, I had to disable all hardware offloading, and my final /etc/network/interfaces configuration on the Ubuntu KVM host ended up as shown above.
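On the pfSense side this is just the relevant checkboxes (if I recall correctly, under System > Advanced > Networking in the webGUI); on the Ubuntu host side, the belt-and-braces equivalent – which may or may not be strictly necessary, depending on your NIC and driver – is to switch the offloads off on the physical slaves with ethtool, along these lines:

# turn off checksum/segmentation/receive offloads on the physical
# NICs feeding the bonds; adjust the interface list to your hardware
for nic in enp3s0 enp0s31f6 enp1s0f0 enp1s0f1 enp1s0f2 enp1s0f3; do
    ethtool -K "$nic" rx off tx off tso off gso off gro off
done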

2019/03/03 Update: Due to a botched Ubuntu 18.04 LTS update that somehow nuked Internet access through the pfSense VM, I was forced to add a backup link via a USB LTE modem, and also took the chance to clean up the /etc/network/interfaces file, with the following main changes (a rough sketch of the revised stanzas follows the list):

  • added bond-updelay and bond-downdelay
  • set bond-slaves as none in the bondN interfaces to prevent race conditions
  • changed bond2 from 802.3ad (LAG/LACP) to active-backup mode (since the fibre modem does not support LACP)
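Roughly speaking, the revised stanzas now look like this – only the parts relevant to the three changes above are shown, and the updelay/downdelay values are placeholders rather than a recommendation:

# bond0/bond1 (LACP towards the switch): slaves are now declared only via
# bond-master on the slave stanzas, hence bond-slaves none here
auto bond0
iface bond0 inet manual
bond-slaves none
bond-mode 802.3ad
bond-lacp-rate fast
bond-miimon 100
bond-updelay 200
bond-downdelay 200
bond-xmit_hash_policy layer2+3

# bond2 (towards the fibre modem, which cannot do LACP): simple failover
auto bond2
iface bond2 inet manual
bond-slaves none
bond-mode active-backup
bond-miimon 100
bond-updelay 200
bond-downdelay 200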

2019/03/04 Update: Whilst troubleshooting the on-again, off-again WAN problem, I finally came across a clue as to what may have caused all this: Docker… In short, with Docker installed, certain firewall (iptables) rules may have been touched, which may have caused this mess… The fix is to add the following to /etc/sysctl.conf to disable netfilter for bridge traffic (applied as shown after the list):

  • net.bridge.bridge-nf-call-iptables = 0
  • net.bridge.bridge-nf-call-ip6tables = 0
  • net.bridge.bridge-nf-call-arptables = 0
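To apply the settings without a reboot (and to confirm they stuck), something like the following does the trick; note that these net.bridge.* keys only exist once the br_netfilter module is loaded, which Docker normally takes care of by itself:

# re-read /etc/sysctl.conf
sysctl -p

# verify that bridged traffic now bypasses iptables/ip6tables/arptables
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.bridge.bridge-nf-call-arptables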

So now, the only problem was the reboots…
