If you're preparing for an exam right now, you probably need to know the full version of this - located here.
However, in the real world there are (in Cisco land anyway) 4 parameters that you typically use. They fall into two separate 'sets' - attributes which govern how traffic leaves your AS (i.e. routes which belong to other ASs - outbound traffic); and attributes which govern how traffic enters your AS (i.e. routes which belong to your AS - inbound traffic).
Let's look at outbound traffic first. The two parameters we can change to alter this are weight and local preference. Weight is Cisco proprietary, and is local to the router (it is not exchanged with other devices). Local preference on the other hand is exchanged by routers within an AS (subject to normal iBGP rules on split horizon). For both, the highest value is best, and weight is checked first - so the path with the highest weight will be selected over the path with the highest local preference.
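To make the ordering concrete, here's a minimal Python sketch of just those two steps. It's an illustration, not how IOS implements it - the `Path` class, the `best_outbound` helper and the next hop labels are all made up for the example. The defaults (weight 0 for a learned route, local preference 100) are the usual Cisco ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str          # hypothetical label for the exit point
    weight: int = 0        # local to this router, higher wins
    local_pref: int = 100  # shared within the AS, higher wins


def best_outbound(paths):
    """Pick the exit path: compare weight first, then local preference."""
    # Tuples compare left to right, so local_pref only matters on a weight tie.
    return max(paths, key=lambda p: (p.weight, p.local_pref))


paths = [
    Path("via-ISP-A", weight=0, local_pref=200),
    Path("via-ISP-B", weight=500, local_pref=100),
]
print(best_outbound(paths).next_hop)  # via-ISP-B: weight beats local preference
```

Set both weights to the same value and the local preference of 200 wins instead, which is why local preference is the one to use when you want the whole AS to agree on an exit.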
For inbound traffic you also have two options, MED (AKA metric), and path. MED has a few limitations - firstly - and most importantly - this attribute can be passed to (and within) a directly neighboring AS, but will be passed no further.
Let's take an example - you are connecting to the internet via two separate ISPs, and you want to control which ISP your inbound traffic is to come down, using the MED attribute. You set a MED of 100 on ISP A, and 200 on ISP B (lowest wins), so all the traffic will come down ISP A right? Wrong. ISP A will have a path with a MED of 100, and ISP B will have a path with a MED of 200, but they won't pass the MED to each other to compare. When the paths get advertised out to the rest of the internet, the MED will be stripped off, and the rest of the world will pay no attention to it.
What will actually determine* how this traffic enters your network is the AS path. Go back to our example above - both ISP A and B will have two paths to get to your network. One will be your peering with them. It will have a path length (i.e. the number of ASs in the path) of 1 (your AS). Let's pretend ISP A and ISP B peer with each other directly and exchange all paths - so they will have a second path to get to you via the other ISP. This will have a path length of 2. Path length works on shortest path wins, so they will always send traffic down the direct peering. Their peers (and backbones) will see both paths, and depending on how they connect, will favour one over the other. Typically, let's say ISP A is a well peered tier 1 ISP - it will probably have a shorter path to most destinations than ISP B, who let's say is a small local ISP, and which is often going to have longer paths to most destinations. This by itself will bring most (but not all) traffic down ISP A.
If you want to force a particular path, you can perform what's called 'AS path prepending' (or sometimes path 'stuffing'). Simply put, we repeatedly prepend our own AS to the path that we want to make 'worst' - let's say that's ISP B. If our AS is 1234, then we send a path of <1234,1234,1234,1234,1234,1234> to ISP B, and a path of <1234> to ISP A. This will generally make A preferable for all but the most crazily connected peers. Even traffic generated from within ISP B will now travel over the peering to A and down that link (path length of 2 rather than 6).
MED does have a use though - let's say you have two connections to the same ISP, and want to choose one link (and we'll assume the ISP does nothing clever at their end) over the other. MED will work well, as it propagates the metric to the ISP's backbone, and within that backbone, controls which path is seen as best.
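The arithmetic behind that last sentence can be sketched in a few lines of Python. This is only a toy model of 'shortest AS path wins' as seen from inside ISP B - the `prepend` and `shortest` helpers are made up for the example, and the AS numbers for the two ISPs are placeholders from the documentation range, not anything from a real network.

```python
def prepend(as_path, own_as, times):
    """Return as_path with own_as added to the front `times` more times."""
    return [own_as] * times + list(as_path)


def shortest(candidates):
    """Pick the (name, as_path) pair with the fewest ASs in the path."""
    return min(candidates, key=lambda item: len(item[1]))


OWN_AS = 1234
ISP_A = 64500  # hypothetical AS number for ISP A

to_isp_a = [OWN_AS]                      # <1234>
to_isp_b = prepend([OWN_AS], OWN_AS, 5)  # <1234> six times over

# What a router inside ISP B has to choose between:
candidates = [
    ("direct link to customer", to_isp_b),
    ("via peering with ISP A", [ISP_A] + to_isp_a),
]
name, path = shortest(candidates)
print(name, len(path))  # via peering with ISP A 2
```

Real routers look at other attributes before path length (an ISP will often set local preference on customer routes, for example), which is one reason prepending works 'generally' rather than always.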
For enterprise networks, this is a common use, although it can often be a waste of time, as the ISP will often assume that everything you tell them is wrong, strip all attributes from your advertisements, and control this themselves. Having done the job, and knowing how many customers screw this up, it's not as crazy as it may seem. If you want to do this, make sure you have the conversation with the ISP about it first, and make sure they're not going to strip your attributes. In such a scenario I'd generally still use AS path - it's harder to overwrite, and given the choice, the shortest path will take preference over the best MED.
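Here's a rough Python sketch of that ordering from the ISP's side, for the two-links-to-one-ISP case. It only models two steps (AS path length, then MED), it assumes MED is only compared between paths from the same neighbouring AS (the usual default), and the `Route` class, `prefer` function and link names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    link: str        # hypothetical name of the customer-facing link
    as_path: tuple   # AS numbers, nearest neighbour first
    med: int = 0     # lower wins


def prefer(a, b):
    """Return the preferred route: shorter AS path first, then lower MED."""
    if len(a.as_path) != len(b.as_path):
        return a if len(a.as_path) < len(b.as_path) else b
    # MED is only compared when both paths come from the same neighbour AS.
    if a.as_path[0] == b.as_path[0] and a.med != b.med:
        return a if a.med < b.med else b
    return a  # tie: the later tie-breakers are not modelled here


# Same path length on both links, so MED decides.
print(prefer(Route("link-1", (1234,), med=100),
             Route("link-2", (1234,), med=200)).link)  # link-1

# Prepend on link-1 and the better MED no longer helps it.
print(prefer(Route("link-1", (1234, 1234, 1234), med=100),
             Route("link-2", (1234,), med=200)).link)  # link-2
```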
Comments and corrections welcome!
* It can be a lot more complicated than this - ISP backbone teams spend half their time making sure traffic enters/leaves their network the way they want it to - i.e. the cheapest way. It's a form of warfare. Let's pretend for the sake of this example that it's simple.
The thoughts and ramblings of a roving network engineer, as he goes from network to network, finding and fixing pain...
Monday, March 29, 2010
Thursday, March 25, 2010
F5 LTM and tcp timeouts
One of the dangers of being from a pure Cisco background is assumption. You treat all devices as if they have the same defaults as 'normal' Cisco devices. I think I'm pretty good at avoiding this, but it gets us all sometimes.
As we all know, when you run long lived TCP connections through application aware devices, you need to ensure the connection is used. The classic problem is Oracle SQLnet through firewalls, where Oracle sets up pools of TCP connections for later use, so that when it gets a burst of traffic it has the connections set up and doesn't need to waste precious milliseconds on TCP handshakes.
The problem comes if they remain unused for longer than the TCP session timeout value of the firewall (typically 60 minutes). The firewall silently drops the connections as being dead, but the client and server think they're still up. Next time one side or the other decides to send a little traffic on one, you get a 'broken socket' error.
This is normal behaviour, and needs to be taken into consideration whenever you're building systems/applications which connect through firewalls or load balancers.
Now, let's be clear. The answer to this issue is always setting a TCP keepalive. Send a packet every minute, and you will never have a problem. Ever. Really, do this. In fact, given you work with such a sensible bunch who will immediately implement this, there's no need to read on.
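If the person writing the app wants to know what 'setting a keepalive' looks like, here's a minimal Python sketch. The `enable_keepalive` helper and its defaults are my own invention for illustration; the one minute values simply mirror the advice above. `SO_KEEPALIVE` is standard, but the per-socket tuning options are platform specific, so the sketch only sets the ones the platform exposes.

```python
import socket


def enable_keepalive(sock, idle=60, interval=60, count=5):
    """Turn on TCP keepalives, tuning the timers where the OS allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These three are not available on every platform, so look them up
    # by name and skip any that are missing.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds idle before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
# Non-zero means keepalives are enabled on this socket.
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Where the timers can't be set per socket, the OS-wide defaults apply, and those are often far longer than a minute, so check them against the shortest idle timeout in the path.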
Back to F5. Interesting fact of the day: when you use the F5 LTM for load balancing TCP connections, the default timeout is only 5 minutes - i.e. a TCP connection which does not send a packet for 301 seconds gets dropped. That's not that long, unlike the 60 minutes (3600 seconds) I have in my head from Cisco land. So the whole 'using a TCP keepalive' becomes even more important. Really, you are a crazy person if you don't use a keepalive in this situation. There's really no point reading on. You're not a crazy person, nor is your co-worker who's setting up the app, right?
Still here? OK - there are two ways to know you have this issue. You'll see RST packets being sent back to the client from the F5 (which do not come from the 'real' server) when it sends traffic on a timed out connection, and you can also see the current connection timers using the 'b conn client' command :
[dan@ltm01:Active] ~ # b conn client | grep tcp
CONN client 10.1.4.90:1873 server 10.31.8.10:https any protocol tcp age 216 - client: 10.1.4.90:1873
CONN client 10.1.4.90:1876 server 10.31.8.10:https any protocol tcp age 217 - client: 10.1.4.90:1876
CONN client 10.1.4.90:1877 server 10.31.8.10:https any protocol tcp age 238 - client: 10.1.4.90:1877
You can see even more detail using 'b conn client <client IP> show all' :
[dan@ltm01:Active] ~ # b conn client 10.1.4.90 show all
VIRTUAL 10.31.8.10:https <-> NODE any6:any TYPE any
CLIENTSIDE 10.1.4.90:1927 <-> 10.31.8.10:https
(pkts,bits) in = (71, 16867) out = (122, 139083)
SERVERSIDE any6:any <-> any6:any
(pkts,bits) in = (0, 0) out = (0, 0)
PROTOCOL tcp UNIT 1 IDLE 83 (300) LASTHOP 8 00:19:a9:f7:c0:00
So you can now see what the actual timeout value is for this connection (83 seconds used from a 300 second timer in this case). This is particularly handy as it shows you if your 'fix' has actually taken.
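If you have a lot of connections to check, that IDLE field is easy enough to pull out with a script. A small Python sketch, assuming the output looks like the sample above (the regex and the `idle_timers` helper are mine, and other software versions may format the line differently):

```python
import re

# Matches e.g. "IDLE 83 (300)" -> idle seconds, then the timeout in brackets.
IDLE_RE = re.compile(r"IDLE\s+(\d+)\s+\((\d+)\)")


def idle_timers(output):
    """Return (idle_seconds, timeout_seconds) pairs found in the text."""
    return [(int(idle), int(timeout)) for idle, timeout in IDLE_RE.findall(output)]


sample = "PROTOCOL tcp UNIT 1 IDLE 83 (300) LASTHOP 8 00:19:a9:f7:c0:00"
for idle, timeout in idle_timers(sample):
    print(f"idle {idle}s of {timeout}s, {timeout - idle}s left")
# prints: idle 83s of 300s, 217s left
```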
If you do have a crazy person setting up your application, here's how you can be a network hero and 'fix the F5'. Write an iRule with the following content :
when SERVER_CONNECTED {
IP::idle_timeout 3600
}
and apply it to the virtual server. This simply ups the timeout to 1 hr (obviously you can adjust the time to suit your environment). You can actually be quite granular, and set different values for different protocols - check the always useful http://devcentral.f5.com site for more detail, or see this excellent post.
To see the change :
[dan@ltm01:Active] ~ # b conn client 10.1.4.90 show all
VIRTUAL 10.31.8.10:https <-> NODE any6:any TYPE any
CLIENTSIDE 10.1.4.90:1943 <-> 10.31.8.10:https
(pkts,bits) in = (83, 18157) out = (165, 188758)
SERVERSIDE any6:any <-> any6:any
(pkts,bits) in = (0, 0) out = (0, 0)
PROTOCOL tcp UNIT 1 IDLE 4 (3600) LASTHOP 8 00:19:a9:f7:c0:00
Last point on this: as with most iRules, simply applying it to the virtual server doesn't immediately affect current connections. Because the rule starts with 'when SERVER_CONNECTED', it'll be invoked when a new TCP connection is set up and the F5 makes the backend connection to the server. You could probably fiddle with this to find other ways to tune when it's started.
Monday, March 22, 2010
Peer Neighbor route
This is a useful feature for PPP links where the two ends are not on the same subnet. The most common scenario would be in the use of IP Unnumbered links (although the feature comes from the dial environment). In the scenario of two routers connected via a PPP link with nothing but a loopback (/32) and unnumbered link between them :
R4# sh ip int brief
Serial1/1 155.1.254.4 YES TFTP up up
Loopback10 155.1.254.4 YES NVRAM up up
R5#sh ip int brief
Serial1/1 155.1.254.5 YES TFTP up up
Loopback10 155.1.254.5 YES NVRAM up up
you would expect only one /32 route in each routing table, but :
R5# show ip route
155.1.0.0/32 is subnetted, 2 subnets
C 155.1.254.4 is directly connected, Serial1/1
C 155.1.254.5 is directly connected, Loopback10
R4#sh ip route
155.1.0.0/32 is subnetted, 2 subnets
C 155.1.254.4 is directly connected, Loopback10
C 155.1.254.5 is directly connected, Serial1/1
we’ve actually learnt the route from the other end via PPP. It only learns the route of the other end of the link – not remote connected networks. This behaviour is enabled by default, but can be disabled using R5(config-if)#no peer neighbor-route.
This can be debugged using deb ppp negotiation.
Tuesday, March 16, 2010
Back to the grind
I need to re-certify my CCIE by November. Back to the study again.. Plan so far is to read up on my old notes, learn the new subjects, and give it a go and see how I get on. Apparently the new 4.0 exam is a bit of a toughie, but hey, I've 6 months..
Let you know how I get on..
F5 Virtual Appliance
I've been slowly coming around to the F5 way of thinking for a while now. It took a while - mainly because I'm not a coder in any way - and while you slowly start to realize how amazing iRules are the more you use them, to a non-coder it often seems like an awfully complicated way of doing some simple things. But you get used to it.
But F5 has two really good things which Cisco could learn from. Firstly the devcentral.f5.com site. It's a community support forum - and while you can get help for all kinds of things F5 related, it's really about iRules. While most vendors have some equivalent, the great thing with devcentral is F5 encourage their finest geeks to actively post on it. You get really good answers, really quickly. I've been really impressed by it.
Second (and new) is they've done exactly what everyone is asking Cisco and Juniper to do - and released a performance crippled VM of their product. So you can lab it, play with it, learn the product well. I'm over the moon about this. Now the real test is what they do with the licensing. Currently it's a 90 day trial, which is renewable. You don't need special contracts or be a customer. You can just have one. I understand they intend to release a commercial version for long term dev/staging environments.
I love this. It means I can learn the product, mess with it, get to know it backwards. That is a good thing for me, for my employer, and for F5.
This is the opposite of what Cisco are doing. With the new restrictions on IOS licensing, dynamips/GNS3 will slowly become less useful, and I'm going to find it harder to learn the intricacies of their product.
I really do wonder at the complete disconnect that Cisco seems to have with their user base. It's kinda scary. They need to halve the number of marketing people they have, and start talking to their community. Learn a little from F5. They have the right idea on some things..