BGP Best Path Selection Algorithm with examples
BGP is the protocol used to announce prefixes throughout the Internet. It’s a very robust protocol, well suited to carrying a large number of prefixes, such as the Internet prefixes or the internal client prefixes of an ISP.
When a prefix is received in BGP, the path passes through two steps before being chosen as a candidate to populate the RIB.
The first step consists of checking whether the path is valid. If it is, the prefix gets into the BGP table, and later the second step, the selection, starts.
In order to pass this first check, the path must meet the following requirements:
- The prefix must not be marked as “not-synchronized”
- There must be a route in the RIB to reach the next-hop
- For prefixes learned through eBGP sessions, the local ASN must not be in the AS_PATH of the prefix
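To make this first step more concrete, here is a minimal Python sketch of the three checks. It is only an illustration and not how IOS implements it: the `Path` fields, the `is_valid` function and the idea of modelling the RIB as a plain set of reachable next-hop addresses are my own assumptions (a real router does a longest-prefix lookup).

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass
class Path:
    next_hop: str
    as_path: Tuple[int, ...]
    external: bool             # True if learned over an eBGP session
    synchronized: bool = True  # False models the "not-synchronized" mark


def is_valid(path: Path, local_asn: int, reachable: Set[str]) -> bool:
    """Apply the three validity checks listed above, in order."""
    if not path.synchronized:
        return False
    # Assumption: the RIB is reduced to a set of reachable next hops.
    if path.next_hop not in reachable:
        return False
    # Loop prevention: an eBGP path must not contain our own ASN.
    if path.external and local_asn in path.as_path:
        return False
    return True


rib = {"4.4.4.4", "6.6.6.6"}
print(is_valid(Path("4.4.4.4", (65002,), external=True), 65001, rib))
```

Only the paths that pass this filter take part in the comparison described next.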
In the second step, the best path to reach the prefix is selected. If there is only one path, no comparison is needed. If there are several paths to reach the prefix, BGP uses a special algorithm to select the best one, and this is what I want to talk about.
This algorithm dictates the following:
- Prefer the path with the highest WEIGHT
- Prefer the path with the highest LOCAL PREFERENCE
- Prefer the path that was locally originated via a network or redistribute command over one originated via the aggregate-address command
- Prefer the path with the shortest AS_PATH
- Prefer the path with the lowest ORIGIN type
- Prefer the path with the lowest MULTI-EXIT DISCRIMINATOR (MED)
- Prefer eBGP over iBGP
- Prefer the path with the lowest IGP metric to the BGP next-hop
- When both paths are external, prefer the one that was received first
- Prefer the route that comes from the BGP router with the lowest router ID
- If the originator or router ID is the same for multiple paths, prefer the path with the minimum cluster list length
- Prefer the path that comes from the lowest neighbor address
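The first eight rules of the list can be sketched as a small Python comparator. This is a simplified model under my own assumptions, not the IOS implementation: the `Path` fields, `better` and `best_path` are hypothetical names, `local_rank` encodes rule 3 (0 for network/redistribute, 1 for aggregate-address, 2 for a path learned from a peer), the neighboring AS is taken as the first entry of the AS_PATH, and rules 9 to 12 are left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    name: str
    as_path: Tuple[int, ...] = ()
    weight: int = 0
    local_pref: int = 100
    local_rank: int = 2   # 0 network/redistribute, 1 aggregate, 2 learned
    origin: str = "igp"
    med: int = 0
    external: bool = False
    igp_metric: int = 0


def better(a: Path, b: Path) -> Path:
    """Return the preferred of two paths using rules 1-8 (a on a full tie)."""
    keys = [
        lambda p: -p.weight,              # 1. highest weight
        lambda p: -p.local_pref,          # 2. highest local preference
        lambda p: p.local_rank,           # 3. locally originated
        lambda p: len(p.as_path),         # 4. shortest AS_PATH
        lambda p: ORIGIN_RANK[p.origin],  # 5. lowest origin type
    ]
    for key in keys:
        if key(a) != key(b):
            return a if key(a) < key(b) else b
    # 6. MED is only compared when the neighboring AS is the same.
    if a.as_path[:1] == b.as_path[:1] and a.med != b.med:
        return a if a.med < b.med else b
    # 7. eBGP over iBGP.
    if a.external != b.external:
        return a if a.external else b
    # 8. lowest IGP metric to the next hop.
    if a.igp_metric != b.igp_metric:
        return a if a.igp_metric < b.igp_metric else b
    return a


def best_path(paths: List[Path]) -> Path:
    best = paths[0]
    for candidate in paths[1:]:
        best = better(best, candidate)
    return best


r4 = Path("R4", as_path=(65002,), external=True)
r6 = Path("R6")
print(best_path([r4, r6]).name)
```

Because MED is only comparable between some pairs of paths, reducing the list pairwise like this is a simplification; it is good enough to follow the examples in this post.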
As you can see, the selection process is quite long, although in most cases the selection doesn’t go further than point 8.
Let’s study points 1 through 8 and how we can influence them within the following lab. The prefix we are going to be working with is 100.100.100.0/24, announced by R4 and R6:
1.- PATH WITH HIGHEST WEIGHT
Weight is a Cisco-specific attribute, which means it’s not standard. This attribute is local to the router on which it’s configured, so it’s not advertised with the prefix to other peers. It is used to tell the router which path to use to reach the prefix. The highest value wins.
It’s the first attribute checked by BGP, so if there are two different paths for the same prefix but with different Weight values, the path with the highest value wins.
In the lab scenario, R4 and R6 both announce the prefix 100.100.100.0/24, one through an eBGP session and the other through an iBGP session. Let’s check how R2 and R1 see this prefix without changing anything:
R2#show ip bgp
BGP table version is 3, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*  100.100.100.0/24 4.4.4.4                  0             0 65002 i
*>i                 6.6.6.6                  0    100      0 i
R2#show ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 3
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     13         16
  65002
    4.4.4.4 (metric 11) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, external
  Local
    6.6.6.6 (metric 11) from 6.6.6.6 (6.6.6.6)
      Origin IGP, metric 0, localpref 100, valid, internal, best
R2 gets two paths for the prefix 100.100.100.0/24: one from an eBGP peer and the other from an iBGP peer. We might initially expect R2 to choose the path through the eBGP peer, since the Administrative Distance for eBGP is lower than for iBGP, but that’s not what really happens.
R2 picks the one from the iBGP peer as the best one because, as we will see later, it’s the one with the shortest AS_PATH. Both paths (through R4 and through R6) have the same weight and local-preference, and neither is locally originated on R2. So the tie-breaker is the shorter AS_PATH, which is the path through R6.
Let’s see what happens when the weight parameter is configured on R2:
R2#conf term
R2(config)#router bgp 65001
R2(config-router)#neig 4.4.4.4 weight 200
R2(config-router)#end
R2#clear ip bgp 4.4.4.4
R2#sh ip bgp
BGP table version is 4, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 100.100.100.0/24 4.4.4.4                  0           200 65002 i
* i                 6.6.6.6                  0    100      0 i
Now R2 takes the path through R4 and announces it to R1 as its own choice. But as we said, the weight attribute is not attached to the prefix, so if R1 had a BGP session with R6, it would prefer the path through R6, as R2 did at the beginning.
Let’s build this BGP session between R1 and R6, and let’s see which path R1 chooses:
R1#sh ip bgp sum
BGP router identifier 1.1.1.1, local AS number 65001
....
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4 65001      30      30       14    0    0 00:24:37        1
6.6.6.6         4 65001       4       3       14    0    0 00:00:31        1
R1#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 14
Paths: (2 available, best #1, table default)
  Not advertised to any peer
  Local
    6.6.6.6 (metric 21) from 6.6.6.6 (6.6.6.6)
      Origin IGP, metric 0, localpref 100, valid, internal, best
  65002
    4.4.4.4 (metric 21) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 100, valid, internal
R1#
Although R2 prefers the path through R4, R1 prefers the path through R6 because it has a shorter AS_PATH.
So as I said before, the weight attribute only has local significance, and it’s not attached to the prefix when announced via BGP.
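The local significance of weight can be illustrated with a few lines of Python. This is a toy model under my own assumptions: `Path`, `received_over_ibgp` and `pick` are hypothetical names, and only the first attributes of the algorithm (weight, local-preference, AS_PATH length) are compared.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Path:
    via: str
    as_path: Tuple[int, ...]
    local_pref: int = 100
    weight: int = 0


def received_over_ibgp(path: Path) -> Path:
    """Model what the iBGP peer sees: weight is not carried in the update,
    so the receiver starts again from the default of 0."""
    return replace(path, weight=0)


def pick(paths: List[Path]) -> Path:
    # Highest weight, then highest local-preference, then shortest AS_PATH.
    return min(paths, key=lambda p: (-p.weight, -p.local_pref, len(p.as_path)))


# R2 prefers the path through R4 because of the weight of 200 ...
r2_view = [Path("R4", (65002,), weight=200), Path("R6", ())]
r2_best = pick(r2_view)

# ... but R1 receives that path without the weight and compares AS_PATH.
r1_view = [received_over_ibgp(r2_best), Path("R6", ())]
print(r2_best.via, pick(r1_view).via)
```

The AS_PATH and local-preference survive the advertisement, the weight does not, which is why R1 and R2 end up with different best paths.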
2.- PATH WITH HIGHEST LOCAL-PREFERENCE
When all the paths to the destination have the same weight value, the next attribute to be checked is Local-Preference.
Local-preference is a standard attribute, and it’s transmitted only between iBGP peers.
This parameter is set on outgoing or incoming prefixes by applying a route-map to the peer. If the route-map has no statement matching specific prefixes, the local-preference is set for all the prefixes sent to or received from that peer. The highest value wins.
Let’s get back to the original scenario. R4, R3, and R6 are announcing the same 100.100.100.0/24 prefix, but R3 is announcing it with a local-preference of 150:
R2#sh ip bgp
BGP table version is 7, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*>i100.100.100.0/24 3.3.3.3                  0    150      0 i
*                   4.4.4.4                  0             0 65002 i
* i                 6.6.6.6                  0    100      0 i
R2#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 7
Paths: (3 available, best #1, table default)
Flag: 0x800
  Advertised to update-groups:
     13         18
  Local, (Received from a RR-client)
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 150, valid, internal, best
  65002
    4.4.4.4 (metric 11) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, external
  Local, (Received from a RR-client)
    6.6.6.6 (metric 11) from 6.6.6.6 (6.6.6.6)
      Origin IGP, metric 0, localpref 100, valid, internal
This makes R2 select the path through R3 as the best choice and announce it to its other iBGP neighbors, as we can see on R1:
R1#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 17
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Local
    3.3.3.3 (metric 11) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 150, valid, internal, best
      Originator: 3.3.3.3, Cluster list: 2.2.2.2
As we can see, the value of Local-Preference is attached to the prefix.
In order to change this decision, we can configure a route-map on R2 with a higher local-preference value and apply it to the session with R6. After resetting the session with R6 on R2, the prefix announced by R6 will have the highest local-preference value, so R2 will choose this new path. R2 will also announce it with that value to its clients:
R2#configure t
R2(config)#route-map LP-200
R2(config-route-map)#set local-preference 200
R2(config-route-map)#exit
R2(config)#router bgp 65001
R2(config-router)#neig 6.6.6.6 route-map LP-200 in
R2(config-router)#end
R2#clear ip bgp 6.6.6.6
R2#sh ip bgp
BGP table version is 8, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*>i100.100.100.0/24 6.6.6.6                  0    200      0 i
* i                 3.3.3.3                  0    150      0 i
*                   4.4.4.4                  0             0 65002 i
R1#show ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 18
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Local
    6.6.6.6 (metric 21) from 2.2.2.2 (2.2.2.2)
      Origin IGP, metric 0, localpref 200, valid, internal, best
      Originator: 6.6.6.6, Cluster list: 2.2.2.2
A path without LOCAL_PREF is considered to have the value that is set with the bgp default local-preference command or, if this is not configured, 100 by default.
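That fallback rule can be written as a tiny Python function. It is just a sketch of the rule as stated above; `effective_local_pref` and its parameter names are my own, with `None` standing for "not received" or "not configured".

```python
from typing import Optional

BUILTIN_DEFAULT = 100  # used when nothing else is configured


def effective_local_pref(received: Optional[int],
                         configured_default: Optional[int] = None) -> int:
    """Return the local-preference used in the comparison for one path."""
    if received is not None:
        return received            # value carried by the iBGP update
    if configured_default is not None:
        return configured_default  # value set by bgp default local-preference
    return BUILTIN_DEFAULT


# The eBGP path from R4 arrives without LOCAL_PREF and is treated as 100.
print(effective_local_pref(None))
```

This is why the path from R4 shows an empty LocPrf column in the table but localpref 100 in the detailed output.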
3.- PATH LOCALLY ORIGINATED
This point is reached if all of the above attributes have the same value for all the feasible paths.
Local paths that are sourced by the network or redistribute commands are preferred over local aggregates that are sourced by the aggregate-address command.
Let’s get back to the original scenario.
Now R5 announces the prefix 100.100.100.0/30 to R3 over an iBGP session. R3 generates the BGP aggregate 100.100.100.0/24 with the aggregate-address command, and also originates the same /24 by redistributing its Loopback100 interface:
R3#show ip bgp
BGP table version is 4, local router ID is 3.3.3.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
s>i100.100.100.0/30 5.5.5.5                  0    100      0 i
*  100.100.100.0/24 0.0.0.0                            32768 i
*>                  0.0.0.0                  0         32768 ?
R3#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 3
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     16         17
  Local, (aggregated by 65001 3.3.3.3)
    0.0.0.0 from 0.0.0.0 (3.3.3.3)
      Origin IGP, localpref 100, weight 32768, valid, aggregated, local, atomic-aggregate
  Local
    0.0.0.0 from 0.0.0.0 (3.3.3.3)
      Origin incomplete, metric 0, localpref 100, weight 32768, valid, sourced, best
R3 prefers the path originated via the redistribute command over the one from the aggregate-address command, and that path is the one announced to R2.
4.- PATH WITH SHORTEST AS_PATH
If none of the above attributes break the tie and the router doesn’t have the prefix locally generated, the next parameter to check is the AS_PATH attribute.
The AS_PATH is a well-known mandatory attribute. It means every prefix has this attribute attached, and every router must understand it. The shorter this attribute is, the more preferable the path.
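Let’s get back again to the original scenario, with all the attributes already seen left at their defaults.
In this scenario, the prefix received from R4 has the longest AS_PATH because it arrives over an eBGP session: it carries AS 65002 in its path, while the iBGP path from R6 has an empty AS_PATH.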
R2#sh ip bgp
BGP table version is 61, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*>i100.100.100.0/24 6.6.6.6                  0    100      0 i
*                   4.4.4.4                  0             0 65002 i
That’s why R2 prefers the iBGP path over the eBGP path.
The manipulation of the AS_PATH attribute must be done on an eBGP session. Among iBGP peers it is not possible to manipulate the AS_PATH (although you could hide it with the aggregate-address command, or manipulate it with confederations).
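A short Python sketch shows why the path from R4 is one AS longer than the path from R6. It assumes plain eBGP and iBGP sessions without confederations; `advertised_as_path` and the `extra_prepends` parameter are hypothetical names I use to model a router adding its own ASN more than once.

```python
from typing import Tuple


def advertised_as_path(as_path: Tuple[int, ...], local_asn: int,
                       ebgp: bool, extra_prepends: int = 0) -> Tuple[int, ...]:
    """Return the AS_PATH as the neighbor receives it."""
    if not ebgp:
        # Over iBGP the AS_PATH is passed along unchanged.
        return as_path
    # Over eBGP the sender puts its own ASN in front, optionally several times.
    return (local_asn,) * (1 + extra_prepends) + as_path


from_r4 = advertised_as_path((), 65002, ebgp=True)   # R4 is in AS 65002
from_r6 = advertised_as_path((), 65001, ebgp=False)  # R6 is in R2's own AS
print(len(from_r4), len(from_r6))
```

With both prefixes originated locally on R4 and R6, R2 sees a length of 1 through R4 and a length of 0 through R6, so R6 wins this step.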
5.- PATH WITH LOWEST ORIGIN
Origin is also a well-known mandatory attribute, like NEXT_HOP and AS_PATH, so every BGP prefix has this attribute.
There are 3 origin types: IGP, EGP and INCOMPLETE.
IGP is preferred over Exterior Gateway Protocol (EGP), and EGP is preferred over INCOMPLETE.
Typically, when a prefix is generated by the network command, it gets the type IGP, and when it’s redistributed from another protocol, it gets the type INCOMPLETE.
In our scenario, R6 is generating the prefix 100.100.100.0/24 by redistributing its Loopback100 interface:
R6#show route-map
route-map CONN, permit, sequence 10
  Match clauses:
    interface Loopback100
  Set clauses:
  Policy routing matches: 0 packets, 0 bytes
R6#conf term
R6(config)#router bgp 65001
R6(config-router)#redistribute connected route-map CONN
R6(config-router)#end
R6#clear ip bgp
R2#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 76
Paths: (3 available, best #1, table default)
  Advertised to update-groups:
     13         18
  Local, (Received from a RR-client)
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal, best
  Local, (Received from a RR-client)
    6.6.6.6 (metric 11) from 6.6.6.6 (6.6.6.6)
      Origin incomplete, metric 0, localpref 100, valid, internal
  65002
    4.4.4.4 (metric 11) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, external
R2 prefers
the path through R3 because of the origin type.
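The ordering of the three origin types is easy to express in Python. In this sketch the numeric ranks and the `prefer_by_origin` helper are my own illustration, and each path is reduced to a (router, origin) pair.

```python
from typing import List, Tuple

# Lower rank is better: IGP < EGP < INCOMPLETE.
ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


def prefer_by_origin(paths: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return the (router, origin) pair with the lowest origin type."""
    return min(paths, key=lambda p: ORIGIN_RANK[p[1]])


# R3 originates the prefix with origin IGP, R6 redistributes it.
candidates = [("R6", "INCOMPLETE"), ("R3", "IGP")]
print(prefer_by_origin(candidates))
```

Here the two iBGP paths tie on everything checked before origin, so the IGP origin of R3 decides.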
In order to change the origin type, a route-map must be used:
R6#conf term
Enter configuration commands, one per line.  End with CNTL/Z.
R6(config)#route-map CONN
R6(config-route-map)#set origin igp
R6(config-route-map)#end
R6#clear ip bgp 2.2.2.2
R2#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 76
Paths: (3 available, best #1, table default)
  Advertised to update-groups:
     13         18
  Local, (Received from a RR-client)
    6.6.6.6 (metric 11) from 6.6.6.6 (6.6.6.6)
      Origin IGP, metric 0, localpref 100, valid, internal, best
  Local, (Received from a RR-client)
    3.3.3.3 (metric 11) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal
  65002
    4.4.4.4 (metric 11) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, external
6.- PATH WITH THE LOWEST MED
MED comparison only occurs if the first (the neighboring) AS is the same in the two paths being compared. There are other implications (check this Cisco reference to know more about this parameter).
It’s an optional non-transitive attribute, so it is not passed on to other ASes, and its use as a tie-breaker between several paths depends on each AS’s policy. The lowest MED is the most preferable.
MED can be manipulated using a route-map:
R3#conf term
R3(config)#route-map MED
R3(config-route-map)#set metric 20000
R3(config-route-map)#router bgp 65001
R3(config-router)#neig 2.2.2.2 route-map MED out
R3(config-router)#end
R3#clear ip bgp 2.2.2.2
R6#conf term
R6(config)#route-map MED
R6(config-route-map)#set metric 1000
R6(config-route-map)#exit
R6(config)#router bgp 65001
R6(config-router)#neig 2.2.2.2 route-map MED out
R6(config-router)#end
R6#clear ip bgp 2.2.2.2
R2#sh ip bgp
BGP table version is 81, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i100.100.100.0/24 3.3.3.3               2000    100      0 i
*>i                 6.6.6.6               1000    100      0 i
*                   4.4.4.4                  0             0 65002 i
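The condition on the neighboring AS is the part of the MED rule that is easiest to forget, so here is a small Python sketch of it. It is a simplification under my own assumptions: a path is a (router, AS_PATH, MED) tuple, the neighboring AS is the first entry of the AS_PATH (empty for paths from our own AS), and `med_winner` returns `None` when MED does not decide.

```python
from typing import Optional, Tuple

Path = Tuple[str, Tuple[int, ...], int]  # (router, as_path, med)


def med_winner(a: Path, b: Path) -> Optional[Path]:
    """Return the path preferred by MED, or None if MED is not comparable
    or both values are equal."""
    if a[1][:1] != b[1][:1]:
        return None  # different neighboring AS: MED is skipped
    if a[2] == b[2]:
        return None  # same MED: move on to the next rule
    return a if a[2] < b[2] else b


r3 = ("R3", (), 2000)
r6 = ("R6", (), 1000)
r4 = ("R4", (65002,), 0)

print(med_winner(r3, r6))  # both internal, MED is compared
print(med_winner(r4, r6))  # different neighboring AS, MED is ignored
```

Note that the path through R4 has the lowest metric of the three in the R2 table, yet it gains nothing from it: its MED is never compared with the two internal paths, and it already lost on AS_PATH length.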
7.- PREFER EBGP OVER IBGP
We have reached the most interesting point. In the first part of the post, we saw that the path through R6, which is an iBGP peer, was preferred over the path through R4, which is an eBGP peer.
This is because whether the route was learned via iBGP or eBGP is not considered until all the above attributes are equal. Only in that case is the prefix learned through an eBGP session preferred over the one learned through an iBGP session.
In order to try this, I have changed the scenario a little bit. Now R5 keeps an eBGP session with R3, and it announces the prefix 100.100.100.0/24.
R4 has an eBGP session with R2, and it also announces the prefix 100.100.100.0/24. Between R2 and R3 there is an iBGP session, but R2 filters everything towards R3.
In this situation, we see that R2 gets two paths for the prefix 100.100.100.0/24. Both paths have the same attributes, but one of them is through an iBGP peer, and the other one through an eBGP peer:
R2#sh ip bgp
BGP table version is 84, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i100.100.100.0/24 5.5.5.5                  0    100      0 65003 i
*>                  4.4.4.4                  0             0 65002 i
R2#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 84
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     13
  65003, (Received from a RR-client)
    5.5.5.5 (metric 21) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal
  65002
    4.4.4.4 (metric 11) from 4.4.4.4 (4.4.4.4)
      Origin IGP, metric 0, localpref 100, valid, external, best
R2 prefers the path through the eBGP peer, although it has another path through an iBGP peer.
8.- PATH WITH LOWEST IGP METRIC
If all the above attributes are equal and no path has been chosen yet, the next parameter to check is the IGP cost to reach the different next-hops of the prefix.
Getting back to the original scenario, I changed the OSPF cost of R3’s loopback. Now only R6 and R3 are announcing the prefix 100.100.100.0/24:
R2#sh ip bgp
BGP table version is 88, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
* i100.100.100.0/24 3.3.3.3                  0    100      0 i
*>i                 6.6.6.6                  0    100      0 i
R2#sh ip bgp 100.100.100.0/24
BGP routing table entry for 100.100.100.0/24, version 88
Paths: (2 available, best #2, table default)
  Advertised to update-groups:
     13
  Local, (Received from a RR-client)
    3.3.3.3 (metric 1010) from 3.3.3.3 (3.3.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal
  Local, (Received from a RR-client)
    6.6.6.6 (metric 11) from 6.6.6.6 (6.6.6.6)
      Origin IGP, metric 0, localpref 100, valid, internal, best
R2#sh ip route 3.3.3.3
Routing entry for 3.3.3.3/32
  Known via "ospf 1", distance 110, metric 1010, type intra area
  Last update from 10.10.23.3 on Ethernet0/2, 00:00:47 ago
  Routing Descriptor Blocks:
  * 10.10.23.3, from 3.3.3.3, 00:00:47 ago, via Ethernet0/2
      Route metric is 1010, traffic share count is 1
R2#sh ip route 6.6.6.6
Routing entry for 6.6.6.6/32
  Known via "ospf 1", distance 110, metric 11, type intra area
  Last update from 10.10.26.6 on Ethernet0/3, 05:23:31 ago
  Routing Descriptor Blocks:
  * 10.10.26.6, from 6.6.6.6, 05:23:31 ago, via Ethernet0/3
      Route metric is 11, traffic share count is 1
R2 prefers the path through R6 because the OSPF metric to reach that next-hop is smaller. All the other parameters are exactly the same for both paths.
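Points 7 and 8 can be sketched together in Python as a two-part sort key. This is only an illustration under my own assumptions: the `Path` tuple and the `steps_7_and_8` function are hypothetical, and the paths passed in are supposed to be already tied on points 1 through 6.

```python
from typing import List, NamedTuple


class Path(NamedTuple):
    via: str
    external: bool   # True if learned over eBGP
    igp_metric: int  # IGP cost to reach the BGP next-hop


def steps_7_and_8(paths: List[Path]) -> Path:
    """Prefer eBGP over iBGP first, then the lowest IGP metric."""
    # False sorts before True, so "not external" puts eBGP paths first.
    return min(paths, key=lambda p: (not p.external, p.igp_metric))


# Point 8: two iBGP paths, with the metrics seen on R2 above.
print(steps_7_and_8([Path("R3", False, 1010), Path("R6", False, 11)]).via)

# Point 7 is checked first: an eBGP path wins even with a worse metric
# (the values 500 and 5 are made up for the illustration).
print(steps_7_and_8([Path("iBGP peer", False, 5), Path("eBGP peer", True, 500)]).via)
```

The order of the tuple matters: the IGP metric is only looked at between paths of the same type.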