Wednesday, July 28, 2010

EIGRP Feasible Successor routes

I always forget what these mean in between exams (I simply don't use EIGRP anywhere), but it is quite important to understand EIGRP's computation logic. The logic is this - when looking at possible 'backup' routes to hold ready for use, it's important to avoid considering a path which has come from a neighbor which is going to send the packet back to you in the end - i.e. a routing loop. Without the 'global view' of link state protocols, you need a way to determine which inbound advertisements are 'safe' and which ones are 'maybe not safe'.

If you've a backup route which you know is safe, you can switch over really quickly in the event of the loss of your primary link. If you don't - you want to take a little time and be sure that you're not going to introduce a loop. You need to wait for the network to re-converge. If you've your head around that - the rest is just terminology.

First key term is 'Feasible Distance'. This is basically the metric of the current 'best' route (i.e. the live one), including any metric elements that were added by the local router. The 'whole cost' if you like. Let's make up a number and say it's 10,000.   'Reported Distance' of an inbound route advertisement is the metric that the neighbor is passing to us, before we add on any metric ourselves.

There are two possibilities for the Reported Distance: it's either more or less than our current 'best' metric - the Feasible Distance. If it's more, say 20,000, then it's possible that that neighbor is actually advertising our own advertisement back to us - i.e. a loop. Maybe it's not, but until the network has re-converged, we have to consider the possibility. If the RD is less, then it is not possible that it's going back via us. This cannot be a loop, and therefore it's safe to quickly switch to this link.

Feasible Successors are quite simply alternative paths to the live route (the 'Successor') which have a lower reported distance than the current feasible distance - i.e. backup routes which we know are not a loop. If a router loses its main route, and has a feasible successor, it simply promotes it to the 'live' route, and sends out updates to its neighbors to tell them the metric has changed.
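To make the comparison concrete, here is a minimal Python sketch of the check described above. It is only an illustration: the neighbor names and metrics are made up, and it assumes the total metric is simply the reported distance plus a local link cost, which is a simplification of how EIGRP really builds its composite metric.

```python
from dataclasses import dataclass


@dataclass
class Path:
    neighbor: str           # hypothetical neighbor name
    reported_distance: int  # metric the neighbor advertises to us (RD)
    link_cost: int          # metric we add locally (simplified as a plain sum)

    @property
    def total(self) -> int:
        return self.reported_distance + self.link_cost


def classify(paths):
    """Return (successor, feasible_successors) for one destination."""
    # The successor is the path with the lowest total metric.
    successor = min(paths, key=lambda p: p.total)
    # Its total metric is the Feasible Distance (FD).
    feasible_distance = successor.total
    # A backup is only 'known safe' if its RD is below our FD.
    feasible = [
        p for p in paths
        if p is not successor and p.reported_distance < feasible_distance
    ]
    return successor, feasible


paths = [
    Path("A", reported_distance=8_000, link_cost=2_000),   # total 10,000
    Path("B", reported_distance=9_000, link_cost=5_000),   # RD below FD
    Path("C", reported_distance=20_000, link_cost=1_000),  # RD above FD
]
successor, feasible = classify(paths)
print(successor.neighbor, [p.neighbor for p in feasible])  # A ['B']
```

Neighbor C might well be a perfectly good path, but with an RD of 20,000 against an FD of 10,000 the router can't prove it, so C doesn't get to be a feasible successor.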

If it doesn't have a feasible successor, then the router has no quick backup path, marks the route as 'active' (a counterintuitive term which means the router is actively trying to work out how to get to the route in question), and starts querying its neighbors to see if any of them know a way to get to the destination. Either it'll get an answer back with a loop free route, which will be installed as the new successor (i.e. live route), or it'll give up and remove the route from the routing table.
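The decision a router makes when it loses its successor can be sketched in a few lines of Python. This is a rough model under my own assumptions, not how IOS implements it: the tuple layout and the function name are hypothetical, and the query/reply exchange that follows an 'active' result is left out entirely.

```python
def on_successor_loss(feasible_distance, backups):
    """Decide what happens when the successor for a route disappears.

    backups is a list of (neighbor, reported_distance, total_metric)
    tuples for the remaining paths (hypothetical layout).
    """
    # Only paths whose RD is below the old FD are known to be loop free.
    feasible = [b for b in backups if b[1] < feasible_distance]
    if feasible:
        # Promote the best feasible successor; the route stays passive.
        best = min(feasible, key=lambda b: b[2])
        return ("passive", best[0])
    # No safe backup: the route goes active and neighbors get queried.
    return ("active", None)


print(on_successor_loss(10_000, [("B", 9_000, 14_000), ("C", 20_000, 21_000)]))
print(on_successor_loss(10_000, [("C", 20_000, 21_000)]))
```

The first call promotes B straight away, while the second has nothing safe to fall back on and returns 'active'.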

Why EIGRP hello and hold timers don't have to match

Let's take a simple topology. Two routers, R1 and R2, on a common LAN. Assume basic EIGRP is up. One morning you decide to start 'optimizing' the protocol by adjusting timers, but get halfway through it, get bored and go for tea:

(both)

router eigrp 100
 network 150.150.0.0




R1#sh run int f0/0
!
interface FastEthernet0/0
 ip address 150.150.36.3 255.255.255.128
!! default EIGRP timers of hello (5) and holdtime (15) apply
!
end




R2#sh run int f0/0
!
interface FastEthernet0/0
 ip address 150.150.36.126 255.255.255.0
 ip hello-interval eigrp 100 1
 ip hold-time eigrp 100 3
end

R1's hellos are only going to be sent out every 5 seconds, which is longer than R2's hold time of 3 seconds, so it's going to break, right? No, actually it isn't. In EIGRP, each hello carries the sender's hold time, and the router receiving it uses that advertised value to time the neighbor out, rather than its own configured timer.

So in the scenario above, R1 applies R2's advertised hold time of 3 seconds to R2 (which R2's one second hellos comfortably satisfy), even though it's sending its own hellos out every 5 seconds. Vice versa, even though R2's interface is configured with a hold-time of only 3 seconds, it applies R1's advertised hold time of 15 seconds to R1, whose hellos arrive every 5.
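Here is a small Python sketch of that behavior, using the timers from the two interfaces above. The class and function names are my own invention for illustration, and the model ignores packet loss and jitter; it only compares which hold time gets applied to which neighbor.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    hello_interval: int  # seconds between the hellos this router sends
    hold_time: int       # hold time this router advertises in its hellos


def stays_up_eigrp(receiver: Router, sender: Router) -> bool:
    """Receiver times the sender out using the sender's advertised hold time."""
    return sender.hello_interval < sender.hold_time


def stays_up_if_local_timer(receiver: Router, sender: Router) -> bool:
    """What you might expect: receiver applies its own configured hold time."""
    return sender.hello_interval < receiver.hold_time


r1 = Router("R1", hello_interval=5, hold_time=15)  # defaults
r2 = Router("R2", hello_interval=1, hold_time=3)   # 'optimized'

print(stays_up_eigrp(r1, r2), stays_up_eigrp(r2, r1))  # True True
print(stays_up_if_local_timer(r2, r1))                 # False
```

The last line is the failure you would get if R2 applied its own 3 second hold time to R1's 5 second hellos, which is exactly what doesn't happen.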

I don't know if this is a good thing, as it basically makes EIGRP very forgiving of bad configuration, but I've no doubt that if you look hard enough, you'll find neighbor relationships relying on this behavior. Fun eh!

Monday, July 26, 2010

How resilient does your DC network need to be?

We had a great conversation this weekend on packetpushers about how resilient you should make your datacentre switching. The guts of the conversation was this: on your core chassis devices, do you need to put in multiple supervisors, PSUs, etc.?

Now, the real answer to these kinds of questions is always a variant on 'it depends on the requirements'.
Ethan Banks was making the point that there are networks where a two second loss is a big deal. I guess we're primarily talking about trading houses/banks/call centres etc here, and in those places it's a fair point.

For pretty much anyone else, the first question I'd ask is 'How much money is it worth to avoid a 3 second loss once a year?' Why those numbers? Assuming you're doing something vaguely sensible with Rapid Spanning Tree and/or your L3 routing protocol of choice, you should expect an unplanned device loss to be recovered in that time. Why once a year? If you're losing devices more often than that, you probably have an environmental issue you need to fix first.

So what's the significance of the 3 second drop? Well, of course you need to test in your environment, but you should expect TCP connections to stay up (or you can tweak your TCP stacks to ensure they will). SMB transfers may drop, VOIP calls will drop. Storage calls will fail. These things should all recover*. I'm sure you can think of a few more things.

* Yeah I know the VOIP call won't 'recover', but you can call them back. As long as it's not a regular event, this is usually an acceptable risk. As for the rest, test your applications, and see what will recover, and what dies horribly. The 'dies horribly' list is a set of applications which are the drivers for 'hitless redundancy'. Make sure it's made clear to the business that these are the apps which are responsible for the extra cost.

So once we have our understanding of consequence, we can take our techie hat off, and we need to start thinking like a business person. What is avoiding the occasional drop worth to the business? Throw it out to your internal customers - probably some of them will jump up and down and tell you they can't tolerate any loss - it must be up all the time. That's fine. Make them come up with a number - what will this failure scenario cost? Compare this cost to the price from Cisco/Juniper/Whoever for the extra kit, and this is your business case, one way or the other.
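If it helps to see the comparison written down, here is a rough Python sketch. Everything in it is an assumption for illustration: the function names are mine, spreading the kit price evenly over its lifetime is a simplification, and the figures in the example calls are placeholders, not real prices or real outage costs.

```python
def annual_outage_cost(outages_per_year, cost_per_outage):
    """Expected yearly cost of tolerating the occasional drop."""
    return outages_per_year * cost_per_outage


def redundancy_pays_off(outages_per_year, cost_per_outage,
                        extra_kit_price, kit_lifetime_years):
    """True if the extra kit costs less per year than the outages it avoids."""
    # Spread the purchase price evenly over the kit's expected lifetime.
    yearly_kit_cost = extra_kit_price / kit_lifetime_years
    return annual_outage_cost(outages_per_year, cost_per_outage) > yearly_kit_cost


# Placeholder figures only: one outage a year, kit kept for five years.
print(redundancy_pays_off(1, 5_000, 60_000, 5))   # False
print(redundancy_pays_off(1, 50_000, 60_000, 5))  # True
```

The useful part is not the arithmetic, it's that the application owner has to supply the cost per outage figure themselves.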

One useful trick (and it depends on how internal financing works in your company) is to budget to build your network to a certain 'reasonable' level of resiliency - and if any particular application owner/customer needs more, then they pay for the extra. It's not just about being a smartypants, but making people understand that these extra uptime percentage points get expensive. There is a consequence to the company's finances for demanding them. Often, it might turn out to be a lot cheaper to re-engineer the application to learn to recover from a network failure.

Monday, July 19, 2010

The problem with net neutrality..

I've always considered myself in the 'yaa boo ISP opposition to net neutrality is bad' camp, but over the weekend I was speaking to a friend of mine who is a network engineer at a medium sized UK ISP, who all but hissed when I mentioned it.

Why all the fuss? The best way to look at it (and the thing that ISPs are worried about) is not so much the stuff that people are doing today, but the changes that are on the horizon to how media gets delivered. Do any of you remember the old days, when kids were respectful, policemen kept order, the summers were always hot, and music and books were things you bought in shops? God knows how people got them into their Kindles, but hey.

To the ISPs these things are not huge deals. An eBook is a few hundred kilobytes, an album is say 18-25MB. And the key point I guess is that they are one-off transactions. A typical user is only going to download them occasionally.

TV and video are next. At the moment we go to the video store, or maybe occasionally rent videos from iTunes or whoever, but primarily we still watch live TV via traditional methods. The ISP business in the UK got a real wakeup call when the BBC launched their iPlayer application, which allowed people to ignore schedules, and watch their TV whenever they wanted. Unlike PVR type systems, the content is delivered on demand from the cloud, rather than the user storing the media locally.

This hurt their networks really badly. But what really scares them is the question of what happens when this becomes the mainstream. Fast forward to a few weeks ago when Google announced Google TV. A set top box which allows people to scour the Internet for TV content (no doubt with the help of YouTube acting as a CDN). Moderately scary. Then the fact that they've tied up with Sony to build it into their TVs.. Scary. Also the story around the campfire is that Sony can hit the switch on however many million PS3s are out there and turn them into GoogleTV set top boxes. Oh and Microsoft can do something similar with the Xbox. Terrifying.

The Internet as it's designed today (a unicast based, far away datacentre centric network) simply couldn't cope. It would die horribly. It simply can't do that huge sustained volume. And this is why they want to be able to limit it. They have to.

Now in reality, this is a when not if scenario. So someone (IETF?) needs to start planning for this, because it's perfectly possible to do, it just takes a different design than we have right now for how traffic gets from A to B across the Internet. Unlike when I say 'someone should do it' at work, I doubt it'll end up being me. However I do suspect CDNs will become extremely important.

The main conclusion I came to over the weekend was to buy some Akamai stock! I know this will end up with big nasty corporate telephone companies controlling what we get access to, rather than the current system where Rupert Murdoch does, but hey, whoever said speech was free, right...

Location:Dublin, Ireland

Wednesday, June 16, 2010

Which is the world's best firewall? Windows Firewall of course..

Over the years I've often been asked 'Which is the best firewall?' My answer is usually the same - it's the one you know best.. Customers often think I'm just being a smart arse, but I'm not. So I thought I'd lay out the argument here..

How do Companies buy firewalls?


There have been lots of religious wars over firewall types. Who can forget the proxy vs stateful inspection wars of the nineties and early noughties? Vendors still put forward the whole 'ours is best because of feature X' arguments.

As a result of these wars, many organisations now use a formulaic approach to vendor and product selection. You write down a list of the features you want, compare the products, and bingo, there is the selection.

It sounds logical and sensible as a method, although you could argue that its main purpose is to assist in the justification of why the purchase was made after the fact.. Most people (subconsciously maybe) tilt the requirements towards the product they actually want, and vendors have become expert in planting requirements into people's heads. But still you can say 'No, we didn't buy from them because my brother is the salesman, it's because it's the only firewall to protect against Mosaic vulnerabilities'..

How well do we know our firewalls?



Firewalls are complex beasts. Most people learn quickly enough how to make a policy change, or set up a simple NAT, regardless of brand. However, really understanding how a firewall is processing the packet, the order and types of checks that are going on, and how to troubleshoot each aspect of the inspection logic is a rarer thing. For example, I would say that I know the PIX/ASA firewalls pretty well. I can set up complex features and troubleshoot them well. However give me a Check Point box, and I can do a lot, but quickly get lost on advanced troubleshooting tasks.

This is hugely important, as the moment things start to go wrong with a firewall deployment, people quickly start to compromise their lofty security ideals to keep their traffic working. If we don't really fully understand exactly what that knob does, other than make the traffic work when we turn it off, we end up in a position where the firewall configuration is suboptimal - and we might not even know!

That's not to say we can make every feature work on our firewall of choice, but we probably have the experience and product knowledge to understand the consequence of turning off a feature. If I'm working on an ASA, I'm pretty confident that I can stand over my work and say 'Yes, this will protect exactly what you've asked me to protect'. Do it with a product I don't know so well, and I'm not so sure how it will perform.

So which is best, Cisco ASA or XP firewall?



I tweeted about this recently, and someone sent me (in jest) a comment 'so XP firewall is as good as a PIX then?'.. Well, this could be the logical conclusion of the argument.

Let's say as an organisation you have a Windows admin who is so good he can change the computer case colour by group policy, and is a passable Cisco admin. A really good Windows admin can close down Windows so tight that you can be plugged into the same LAN as it and you will get nowhere (and no, I don't mean by powering it off).

So what's the best use of his time? If he spends his time making Windows bulletproof, using all the tools he knows, he will have more effect than poking at a PIX he doesn't really understand..

Am I seriously recommending Windows Firewall over a PIX? No, I'm not, because I'm a Cisco guy. But my Windows admin buddy probably will, and that's the point. The tools we know how to use are the tools that we will have the most effect with.

Of course, as any security person will tell you, a layered approach is essential, but when you select your products for each layer, make sure it's the product which you can make fly. Otherwise, all you've bought is a bunch of problems and caveats you haven't seen before..

Location:Dublin, Ireland

Saturday, June 12, 2010

ASA new features in 8.3

Cisco quietly released the latest version of ASA code (8.3, which I must have missed the release of) and while checking through the new features I noticed two key changes.

The first one is the option to use globally applied access lists. This is a big deal, as Cisco firewall policies have always been interface based. It's one of the barriers that administrators of other devices find when starting to use PIX/ASA/IOS firewalls. The last couple of years have seen a couple of changes that indicated a move in that direction. We had zone based firewall (now the recommended way to firewall on IOS), where controls are applied between zones rather than interfaces. Now ASA (not PIX btw) moves to global access lists.

Is it better? That's going to turn into a religious argument I suspect. Some people like the idea that you have one policy where rules are configured, and one place only. You can argue that this simplifies configuration, especially on devices that have multiple occasional administrators, as you don't need to look in multiple places to find where rules are configured.

Others (probably people like me who are very used to the PIX way of doing things) would argue that per-interface configuration means shorter, more specific access lists per interface, meaning you've less to look through to find a rule - you just need to know what you're doing to make sure you look in the right place.. Time will tell which becomes the prevalent method, and it'll be interesting to see whether Cisco optimize performance towards one method or another (like ZBF being the 'optimised' method on ISR routers)..

The second big change is the 'simplification' of NAT.. I've not finished reading up on this yet, but the bit I don't like is the bit in the release notes where it says that legacy configuration will be automatically upgraded. That sounds like something which will need a lot of testing before we're all confident in making live upgrades.. For me, I'd rather see an option to use either method, just for a version or two.

Location:Fingrean Rd,,United Kingdom

Friday, May 21, 2010

Will the CCIE lab become more accessible - and is this a good thing?

Rumours are flying that CCIE lab prices are about to drop. We've already seen the lab move out of their static locations, and the interface used by candidates means there is no need to be in the next room to the kit.

Taking the lab



In my day (an unimaginable 19 months ago) taking the CCIE lab meant checking the lab booking tool daily to try to get lab slots, then waiting months for test day, and the best part of €2k of spend (including travel/hotel/etc). You can look at this in two ways. The fact it was hard and expensive to get a slot meant you made very sure you were ready before going. Conversely, you couldn't just study until you felt ready, then go..

So what's changed



In reality the change is more approach than technology. Over the last year, Cisco have had 'mobile' test centers which meant they could expand the amount of testing they could do, without the limitation of the space in their offices. Accessing routers remotely is of course not a new thing. It makes no difference if you're in the next room or 1000 miles from your rack. Most CCIE candidates use remote labs as part of their training, so there is nothing new to them.

How far could they scale this



Well, they could really increase the capacity enormously. There are four things they need to scale up to deliver more tests :

1) Racks - This one is pretty easy.
2) Seats and screens - either using their relationship with Prometric, or by opening up in Cisco offices around the world, this would not be a major problem.
3) Marking - this would need a little time to get trusted people with the right skills up to speed. Still, I'm sure they can do it easily enough with a little planning.
4) Proctors - now this is the challenge.


Proctors



For those who haven't done it, the CCIE lab is different to the rest of their tests (in many ways) in that a proctor is there to help you understand what's being asked. Wording can often be unclear - I know when I was in the lab I was up with the proctor constantly clarifying things. This is probably the hardest thing to scale. Options that jump to mind would include :

1) On-site proctors - if you've a permanent site, why not just hire another proctor?
2) Access via telepresence - you could have a 'farm' of proctors all available remotely. This would have a big advantage that if you have people doing voice, security, R&S in the same room - each can speak to a proctor who knows their track. That can be a big problem with the current system.. This is probably more cost effective as a proctor covering a small test site might not be overworked.
3) Mobile proctors - covering a number of sites making sure that each site is covered when it's running tests. More appropriate to part time sites, or sites which run different tracks on different days..

Either way, the issue is quite solvable.

Where does this leave us?



Well, if Cisco scaled up the ability to deliver tests, then they would probably be more cost effective. Darby Weaver is predicting the $1000 lab this year, and as well as the poor economy, this more efficient delivery method may be a driver for that.

Is it a good thing? One thing comes to mind. If it's easy and quick to get a lab (as it is with a 'normal' Cisco exam), is this going to encourage people to 'go and give it a go' rather than putting time/money/effort into training/studying? Would more cheap attendances lead to increased braindumping? Is it going to mean candidates aren't going to put the same study effort in before going to the lab, and just hope to get lucky?

Or will it just mean the well prepared student can get a lab date when it suits them? Hopefully, it's the latter.