Spanning tree is a system developed to prevent loops in layer 2 (L2) networks. Unlike L3 packets, L2 frames do not have a TTL that decrements with each hop, so a looping frame is never discarded on its own. To further complicate loop prevention on L2, broadcast frames are used very frequently. This keeps the hosts on the L2 network simple, but it does make loop prevention more complicated.
Why? 🤷🏻♂️
As the need for redundancy in networking grew, the desire to run multiple links between switches needed to be addressed. Immediately there is the issue of frames looping through the network. Now, switches don’t forward frames out of the port they ingressed on. However, frames that are flooded (either because they have a broadcast destination MAC or because the destination MAC is unknown to the switch) will keep circulating and steadily increase the amount of useless traffic in the network. This will continue until the network breaks. As is the knowledge of the networking cult: Thou shalt not loop thy network.
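To get a feel for how quickly this escalates, here is a minimal sketch that counts copies of a single broadcast frame as it is flooded hop by hop. The function name `flood_counts` and both topologies are made up for illustration, and the model assumes every switch floods out of all ports except the ingress port, with no MAC learning and no STP.

```python
def flood_counts(adjacency, start, hops):
    """Return how many copies of one broadcast frame are in flight after each hop."""
    # The first switch floods the frame to every neighbour.
    frames = [(start, neighbour) for neighbour in adjacency[start]]
    counts = [len(frames)]
    for _ in range(hops - 1):
        # Every receiving switch floods out of all ports except the ingress port.
        frames = [
            (dst, neighbour)
            for src, dst in frames
            for neighbour in adjacency[dst]
            if neighbour != src
        ]
        counts.append(len(frames))
    return counts


# Hypothetical topologies: four fully meshed switches vs. a loop-free chain.
full_mesh = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(flood_counts(full_mesh, 0, 4))  # [3, 6, 12, 24]
print(flood_counts(chain, 0, 4))      # [1, 1, 1, 0]
```

In the looped topology the number of copies doubles with every hop, while in the loop-free chain the frame simply dies out at the edge.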
Fix? 🔨
The solution to this issue is to implement a loop detection and prevention system. Enter: Spanning Tree Protocol (STP).
STP uses Bridge Protocol Data Units (BPDUs) to communicate bridge information between switches. The system is distributed: each switch decides individually which of its ports are up and which should be blocked to prevent loops, based on the information it gathers from the surrounding switches. Because each switch makes this decision by itself, it is, in practice, often a headache to combine switches from different vendors, since their implementations of the standard can vary.
A side note: This issue was something that we ran into at Dreamhack when the core switches were swapped from Cisco to Juniper, but the distribution switches were still Cisco. There was a difference in how they did STP, which would cause issues in such large L2 networks.
How does it work 🤔
The first step of the process is to determine which switch is to be the root switch. The root switch will be just that, the root of the tree that will be built together with the other switches. A good general rule of thumb is to use the core switch as your root, since this will, most likely, give you the least trouble with STP. Which switch you select as your root switch is really only important when you want to do STP-based redundant links between switches. Nonetheless, it is good practice to select the root correctly to help your future self should you ever wish to implement STP redundant links.
The root election is done like this:
- The switch is powered on and/or connected to a STP domain.
- The switch assumes it is the root of the domain and starts sending and receiving BPDUs with other switches.
- If the switch receives a BPDU indicating that a different switch has a better (numerically lower) STP priority, then the switch will mark that switch as the root of the tree and advertise that downstream to other switches.
- If there are two switches with the same STP priority in a STP-domain, then the switch with the lowest MAC-address will become the root switch.
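The comparison behind the election can be sketched in a few lines. This is a simplified, centralized illustration, not how the switches actually exchange BPDUs: the `Bridge` class, the switch names and the MAC addresses are all made up, and the only rule modeled is "lowest priority wins, lowest MAC address breaks ties".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int
    mac: str  # written as "aa:bb:cc:dd:ee:ff"

    def bridge_id(self):
        # Lower priority wins; the MAC address is only used to break ties.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges):
    """Pick the bridge with the lowest (priority, MAC) pair."""
    return min(bridges, key=Bridge.bridge_id)


# Hypothetical switches for illustration only.
switches = [
    Bridge("access-1", 32768, "00:1b:54:aa:00:10"),
    Bridge("core-1", 4096, "00:1b:54:aa:00:30"),
    Bridge("access-2", 32768, "00:1b:54:aa:00:05"),
]

print(elect_root(switches).name)                    # core-1
print(elect_root([switches[0], switches[2]]).name)  # access-2
```

The second call shows the tie-break: both access switches share the same priority, so the one with the lower MAC address wins.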
For Cisco switches you can run the following command to view general STP information:
show spanning-tree
The switches can send two types of BPDUs to one another:
- 📝 Configuration BPDUs - These are used to elect the root bridge in the L2 domain and to inform neighboring switches of path cost, root bridge identifier, local bridge identifier and some other important information.
- 🚨 Topology change notification (TCN) BPDUs - These frames are used to signal that there has been a change in the network. For example, if a switchport goes from up to down, the switch sends a TCN BPDU toward the root bridge, which then makes sure all other switches are notified.
Stages of STP
The job of STP is to determine if a port should be used or not, in order to prevent loops from forming in the network. There are 6 port states that regulate what a port can do.
- X Disabled - Port is admin disabled.
- 🛑 Blocking - The port has been blocked by the STP process. This is usually to prevent a loop from forming in your network.
- 👂🏻 Listening - When a switchport comes up, this is the first state the port will be in. The switch listens for and sends BPDUs on the port, which allows it to block the port before any data has been received or forwarded. This stage is 15 seconds by default, but can be modified using the forward delay timer.
- 👩🏼🏫 Learning - After the listening stage is done the switch continues with the learning stage, during which it updates the MAC address table based on the source addresses of incoming frames. The switch also continues to communicate using BPDUs. This stage is also 15 seconds by default, and can be modified using the same forward delay timer.
- ➡️ Forwarding - The port is UP/UP and forwarding traffic. Everything is going swimmingly! :)
- ⛓️💥 Broken - There is a major issue, so the switch will discard all traffic, incoming and outgoing.
Using the default timers, the time from the moment you connect a port to when it starts being functional is 30 seconds. This is a rather long time, which is one of the reasons why you would want to tune your STP deployment.
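The arithmetic behind that number is simply the forward delay applied twice, once for listening and once for learning. A tiny sketch, where the function name is made up and the 4 to 30 second bounds are the configurable range for the forward delay:

```python
def seconds_until_forwarding(forward_delay=15):
    """Time a port spends in listening plus learning before it forwards traffic."""
    if not 4 <= forward_delay <= 30:
        raise ValueError("forward delay must be between 4 and 30 seconds")
    listening = forward_delay  # listening stage lasts one forward delay
    learning = forward_delay   # learning stage lasts one forward delay
    return listening + learning


print(seconds_until_forwarding())   # 30
print(seconds_until_forwarding(4))  # 8
```

Lowering the forward delay shortens both stages at once, which is why it is the usual knob to reach for, but setting it too low for the size of the network is risky.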
Port types
There are three types of ports in the STP domain:
- Root port (RP) - The port on the switch that uplinks to the root switch.
- Designated port (DP) - A port on the switch that connects to downlink switches.
- Blocking port - A port that is set to the blocking state. 😋
STP will set every port to its type based on what is connected to that port. There can only ever be one root port on a switch. On the root switch, all ports are designated ports.
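As a rough illustration of the "only one root port" rule, the sketch below picks the root port as the port with the lowest path cost toward the root. This is a deliberate simplification: real STP also compares the sending bridge ID and port ID when costs tie, whereas here ties are broken by port name. The function name, port names and cost values are only examples.

```python
def pick_root_port(path_costs):
    """Return the port with the lowest path cost to the root, or None on the root switch."""
    if not path_costs:
        # The root switch has no uplink to itself, so it has no root port.
        return None
    # Simplified tie-break: lowest cost first, then lowest port name.
    return min(path_costs, key=lambda port: (path_costs[port], port))


# Hypothetical switch with three ports that all lead toward the root.
costs = {"Gi0/1": 19, "Gi0/2": 4, "Gi0/3": 19}

print(pick_root_port(costs))  # Gi0/2
print(pick_root_port({}))     # None
```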
Here is an example of viewing the port status and port type as far as STP is concerned:
show spanning-tree vlan 77
Different flavors 🍰
STP has evolved throughout the years, mostly to match the scale of the ever-growing L2 environments at campuses. The original STP standard (802.1D) does work well, as long as your network is small.
The next iteration to talk about is Per-VLAN Spanning Tree (PVST), which basically builds a tree for each VLAN. This gives you some unique capabilities where you can have different root switches for different VLANs, allowing you to, for example, have a redundant STP link forward half of your VLANs across link A and the other half across link B, thus balancing the load across the two links. This may also be relevant if you have different gateways for the VLANs connected to different switches. PVST and its later version PVST+ are Cisco proprietary. The per-VLAN approach was later combined with Rapid Spanning Tree Protocol (RSTP, IEEE 802.1w) in Cisco's Rapid PVST+.
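The load-balancing idea boils down to deciding which switch should be root for which VLAN. A minimal planning sketch, where the function name, the VLAN IDs and the switch names `switch-a` and `switch-b` are all placeholders:

```python
def split_vlans(vlans, roots=("switch-a", "switch-b")):
    """Alternate VLANs between root switches so each one roots about half of them."""
    ordered = sorted(vlans)
    return {vlan: roots[i % len(roots)] for i, vlan in enumerate(ordered)}


plan = split_vlans([10, 20, 30, 40])
print(plan)  # {10: 'switch-a', 20: 'switch-b', 30: 'switch-a', 40: 'switch-b'}
```

Each VLAN's tree then converges on its own root, so traffic for half the VLANs prefers the link toward one switch and the rest prefers the other. In practice you would likely weigh the split by traffic volume rather than by VLAN count.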
RSTP
RSTP differs a bit from the original STP standard in a few ways. First off, in the per-VLAN form found on Cisco switches (Rapid PVST+), it builds a single tree for each VLAN, much like PVST. It is also designed to be faster than regular old boring STP, hence the Rapid in Rapid Spanning Tree Protocol. This is done by omitting one of the stages of STP, namely the “listening” step, and instead going straight into the learning stage. This takes the default minimum time needed for a switchport to come up from 30 seconds down to just 15 seconds.
Of course, you can modify the RSTP timers. In this case it would be the “forward-time” that sets the length of the learning stage. The default value is 15 seconds, but it can be set to anything from 4 to 30 seconds.
MSTP
Because of the nature of the per-VLAN spanning tree protocols, scaling issues can occur once the STP domain reaches a certain size. This is because each time a topology change is triggered, all switches need to recalculate the entire STP tree for each VLAN. The issue is exacerbated if there is a large number of VLANs in addition to a large switch count.
Multiple Spanning Tree Protocol (MSTP) solves this by mapping the VLANs in the L2 domain onto a smaller number of Multiple Spanning Tree Instances (MSTIs), where the tree is calculated for each instance, instead of for each VLAN.
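The saving is easy to see with a bit of counting. In this sketch the VLAN range, the two instance numbers and the function name are invented for illustration. It compares how many trees need to be calculated per VLAN versus per instance:

```python
def trees_to_compute(vlan_to_instance):
    """Return (trees with one per VLAN, trees with one per MST instance)."""
    per_vlan = len(vlan_to_instance)
    per_instance = len(set(vlan_to_instance.values()))
    return per_vlan, per_instance


# Hypothetical mapping: VLANs 100-199 go to instance 1, VLANs 200-299 to instance 2.
mapping = {vlan: (1 if vlan < 200 else 2) for vlan in range(100, 300)}

print(trees_to_compute(mapping))  # (200, 2)
```

With 200 VLANs spread over two instances, a topology change means recalculating two trees instead of two hundred.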