Friday, May 21, 2010

Will the CCIE lab become more accessible - and is this a good thing?

Rumours are flying that CCIE lab prices are about to drop. We've already seen the lab move out of its static locations, and the interface used by candidates means there is no need to be in the room next to the kit.

Taking the lab



In my day (an unimaginable 19 months ago) taking the CCIE lab meant checking the lab booking tool daily to try to get lab slots, then waiting months for test day, and spending the best part of €2k (including travel/hotel/etc). You can look at this in two ways. The fact it was hard and expensive to get a slot meant you made very sure you were ready before going. Conversely, you couldn't just study until you felt ready, then go.

So what's changed?



In reality the change is more approach than technology. Over the last year, Cisco have had 'mobile' test centers which meant they could expand the amount of testing they could do, without the limitation of the space in their offices. Accessing routers remotely is of course not a new thing. It makes no difference if you're in the next room or 1000 miles from your rack. Most CCIE candidates use remote labs as part of their training, so there is nothing new to them.

How far could they scale this?



Well, they could really increase the capacity enormously. There are four things they need to scale up to deliver more tests :

1) Racks - This one is pretty easy.
2) Seats and screens - either using their relationship with Prometric, or by opening up in Cisco offices around the world, this would not be a major problem.
3) Marking - this would need a little time to get trusted people with the right skills up to speed. Still, I'm sure they can do it easily enough with a little planning.
4) Proctors - now this is the challenge.


Proctors



For those who haven't done it, the CCIE lab is different to the rest of their tests (in many ways) in that a proctor is there to help you understand what's being asked. Wording can often be unclear - I know when I was in the lab I was up with the proctor constantly clarifying things. This is probably the hardest thing to scale. Options that jump to mind would include :

1) On-site proctors - if you've a permanent site, why not just hire another proctor?
2) Access via telepresence - you could have a 'farm' of proctors all available remotely. This would have a big advantage that if you have people doing voice, security, R&S in the same room - each can speak to a proctor who knows their track. That can be a big problem with the current system. This is probably more cost effective too, as a proctor dedicated to a small test site might not have enough to do.
3) Mobile proctors - covering a number of sites making sure that each site is covered when it's running tests. More appropriate to part time sites, or sites which run different tracks on different days.

Either way, the issue is quite solvable.

Where does this leave us?



Well, if Cisco scaled up the ability to deliver tests, then they would probably be more cost effective. Darby Weaver is predicting the $1000 lab this year, and as well as the poor economy, this more efficient delivery method may be a driver for that.

Is it a good thing? One thing comes to mind. If it's easy and quick to get a lab (as it is with a 'normal' Cisco exam), is this going to encourage people to 'go and give it a go' rather than putting time/money/effort into training/studying? Would more cheap attendances lead to increased braindumping? Is it going to mean candidates aren't going to put the same study effort in before going to the lab, and just hope to get lucky?

Or will it just mean the well prepared student can get a lab date when it suits them? Hopefully, it's the latter.

Thursday, May 20, 2010

Has IT certification become nothing more than a racket?

I've a funny relationship with 'authorized training' and certification. In one way, it is great that I know that within a reasonable period of time, I'll be able to attend a course which will give me a good grounding on 'how to drive' a particular technology. In Cisco land, the breadth of subjects that they cover is very impressive, they've made a real effort here. The certification process (especially the CCIE level exams in my opinion) drives us to better knowledge and to be better engineers.

On the other side - it is generally just an intro. My personal experience of these courses has generally been underwhelming, and I've hoped to come out with a lot better understanding than I actually did.

Who goes down the route of certification?

Speaking to people on these courses, there are typically three kinds of people who attend them :

1) End users of the equipment, who are buying kit and as part of the budget are going on the training. They generally don't care about doing the exam.
2) People doing certifications for their own career (more applicable to the CCNA/CCNP/CCIE specific courses). Often very dedicated to get as much out of the experience as they can - although you can get braindump-bunnies too.
3) Partner employees who have to get the certification ASAP as their employer is going to be audited by Cisco soon.

Audit time



Let's ignore the first two for the moment. Anyone who's ever worked for a Cisco partner knows audit time. It's when senior management work out that if they get two people with certification X, they'll get an extra percentage point or two from Cisco on everything they buy. It's big money actually, and can be the difference between winning and losing business, so is important. So if you make the mistake of walking near them at the time they have this revelation, you will volunteer to go and do the certification.

This leads to a lot of cheating. You see, nobody cares how well the engineer knows the technology. That's another day's problem. The key is to make sure that the certifications are in place by audit day. Quick is better than good. I don't like this (and let's be clear, it's not the way I do things!), but it's the way it is. It's a rotten part of the certification business, always has been and probably always will be.

How bad is it?



However this week I heard a story that made me realize how bad things have actually got. And I have to say, after all these years, it still shocked me. A friend of mine has recently attended a Cisco course, provided by one of the big UK authorized providers. The instructor started by handing out the official course guide, and a photocopy of the Testking for the exam, and told the attendees to start reading through the questions during the evenings and come to him with any questions. Then he started going through the course notes.

Now as I've said before, I don't approve of the whole braindump thing. It's pointless, it devalues the work of those who make the effort to actually go and learn the technology, but mostly it turns the whole certification business into nothing more than a racket. But I didn't think we'd got to the point where there was such acceptance of the fact that most people on the course are then just going to braindump the exam. Surely most of us want to do it properly, right?

Summary



We have to decide what we want from a certification process. The problem is the partners who need certified staff, the training company, the testing firms, and Cisco themselves all do just fine out of the current system. It's us as engineers who are devalued and cheapened by a process which is meant to make us better at what we do, but actually just makes us rats in a process. If we don't approve, then expose the cheats, do it properly ourselves, and let's take the high ground and be better people as well as better engineers.

Wednesday, May 19, 2010

Troubleshooting: Dan's mistake of the week..

I've spent time over the last two weeks pulling my hair out as to why new traffic on an existing site to site VPN didn't work. Finally got it today, and reminded myself of an important lesson in the process.

Scenario



Take a simple site-site internet VPN between two sites (well there are lots, but this is about a VPN between two sites), and two networks in each site. London has the networks 10.15.1.0/24 and 10.15.2.0/24. Dublin has 10.35.130.0/24 and 10.35.139.0/24. Each site has a pair of ASAs (up to date code).

Summary of the configuration here (it's not the real one for obvious reasons, and cut down to the key bits, but is accurate for the sake of the article) :

Dublin-PIX :

interface e0/0
nameif outside
security-level 0
ip address 35.35.35.35 255.255.255.0
!
interface e0/1
nameif inside
security-level 100
ip address 10.35.139.254 255.255.255.0
!
interface e0/2
nameif dmz
security-level 50
ip address 10.35.130.254 255.255.255.0
!
crypto map Dublin 24 match address Dublin-London
crypto map Dublin 24 set peer 15.15.15.15
crypto map Dublin 24 set transform-set AES-256VPN
crypto map Dublin interface outside
!
access-list Dublin-London extended permit ip 10.35.130.0 255.255.252.0 10.15.0.0 255.255.0.0
access-list Dublin-London extended permit ip 10.35.139.0 255.255.255.0 10.15.0.0 255.255.0.0
!
access-list NONAT extended permit ip 10.35.130.0 255.255.252.0 10.0.0.0 255.0.0.0
access-list NONAT extended permit ip 10.35.139.0 255.255.255.0 10.0.0.0 255.0.0.0
!
nat (inside) 0 access-list NONAT
nat (inside) 1 10.35.139.0 255.255.255.0
nat (dmz) 0 access-list NONAT
nat (dmz) 1 10.35.130.0 255.255.255.0
!
global (outside) 1 interface



London-PIX :

interface e0/0
nameif outside
security-level 0
ip address 15.15.15.15 255.255.255.0
!
interface e0/1
nameif inside
security-level 100
ip address 10.15.1.254 255.255.255.0
!
interface e0/2
nameif dmz
security-level 50
ip address 10.15.2.254 255.255.255.0
!
crypto map London 14 match address London-Dublin
crypto map London 14 set peer 35.35.35.35
crypto map London 14 set transform-set AES-256VPN
crypto map London interface outside
!
access-list London-Dublin extended permit ip 10.15.0.0 255.255.0.0 10.35.130.0 255.255.252.0
access-list London-Dublin extended permit ip 10.15.0.0 255.255.0.0 10.35.139.0 255.255.255.0
!
access-list NONAT extended permit ip 10.15.0.0 255.255.0.0 10.0.0.0 255.0.0.0
!
nat (inside) 0 access-list NONAT
nat (inside) 1 10.15.1.0 255.255.255.0
nat (dmz) 0 access-list NONAT
nat (dmz) 1 10.15.2.0 255.255.255.0
!
global (outside) 1 interface


Symptoms



The problem is that while traffic from 10.35.130.0/24 could get to machines in 10.15.1.0/24, traffic from 10.35.139.0/24 consistently could not. Running a packet-tracer (at both ends) showed that it should work, and packets were leaving the 10.35.0.0 site and arriving at the 10.15.0.0 site, but the response packets never made it back. A capture on the inside interface in London showed the server did respond. Rule access-lists are all correct.

To keep it simple - it's nothing to do with rules or the servers. It has probably never worked, but this is the first traffic to go between these two networks.

Simply - the packets from Dublin->London pass, but packets from London->Dublin don't.

What it could be



Once I've a clear definition of the problem, the next thing is to rule in or out the most likely causes.

The first thing that jumped to mind (and a common cause of issues with these symptoms) was that the NAT exemption wasn't set up correctly. Unless traffic is NAT exempted, it will NAT behind the interface IP, which will put it outside the VPN interesting traffic, and we would get these symptoms. However the packet-tracer I ran showed that the traffic was 'Allowed' and NAT exempted, so it wasn't that. I was actually still fairly convinced it could be, but eventually moved on.
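To make that first suspicion concrete, here is a rough Python sketch of the check being reasoned about. It is only a toy model of the London side, not how the ASA works internally: the function names and the example host addresses are made up for illustration, and the two ACLs are cut down to a single line each.

```python
from ipaddress import ip_address, ip_network

# Cut-down copies of the London ACLs from the article: (source net, destination net).
NONAT = [("10.15.0.0/16", "10.0.0.0/8")]
VPN_ACL = [("10.15.0.0/16", "10.35.139.0/24")]  # one line of London-Dublin
OUTSIDE_IP = "15.15.15.15"                      # London outside interface


def matches(acl, src, dst):
    """True if any (source net, destination net) pair in the ACL covers the packet."""
    s, d = ip_address(src), ip_address(dst)
    return any(s in ip_network(a) and d in ip_network(b) for a, b in acl)


def source_after_nat(src, dst, nonat=NONAT):
    """Exempt traffic keeps its source; everything else hides behind the interface IP."""
    return src if matches(nonat, src, dst) else OUTSIDE_IP


def enters_tunnel(src, dst, nonat=NONAT):
    """The VPN ACL is checked against the source address as it looks after NAT."""
    return matches(VPN_ACL, source_after_nat(src, dst, nonat), dst)


print(enters_tunnel("10.15.1.10", "10.35.139.20"))            # exemption in place
print(enters_tunnel("10.15.1.10", "10.35.139.20", nonat=[]))  # exemption missing
```

With the exemption in place the reply still looks like interesting traffic. Without it the source becomes the outside address and the VPN ACL no longer matches, which is exactly the symptom I was chasing.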

Second thought was 'could it be badly written ACLs for the VPN definition', or even some incorrect routing. After lots of staring, answer was nope, none of them.

As Sherlock Holmes (he's a British policeman) once said, 'when you've ruled out the probable, then whatever remains, however improbable, must be the answer'. So we're into the unlikely stuff. I spent hours looking for odd traffic handling quirks of the ASA. Tried a few things. Rebooted them. I was getting nowhere.


So what was it?



Have you worked it out yet (I'll be impressed with you if you have)?

I think you cross a line when you decide it isn't a probable cause. You decide it's going to be something odd, then common sense leaves you and you start looking for crazy stuff. Don't get me wrong, sometimes it can be a bug, or a feature you don't understand properly, or something else crazy. But usually, it's something simple.

I showed you the same parts of the configuration that I focused on - the sections relevant to the connection in question. The sharp eyed amongst you will have noticed the sequence numbers in the crypto map may point to other entries prior to this one. Such as :


London :

crypto map London 9 match address London-Cork
crypto map London 9 set peer 9.9.9.9
crypto map London 9 set transform-set AES-256VPN
!
access-list London-Cork extended permit ip 10.15.0.0 255.255.0.0 10.35.136.0 255.255.252.0

You see, the second octet in the IP scheme denotes country code (353 is Ireland - shortened to 35), and for some reason in the distant past someone decided to use non contiguous addresses (I can only assume Satan was whispering in their ear). When the (older) Cork connection was set up (which has 136,137,138 ranges) they simplified it by using 10.35.136.0 255.255.252.0. That mask covers 136 through 139, so it also swallows Dublin's 10.35.139.0/24. As this is a lower sequence number in the crypto map, it gets hit first, and the traffic never gets down to the ACL we want it to hit.
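If you want to see the overlap rather than take my word for it, here is a small Python sketch of the lookup. It is a simplified model, not ASA code: the table layout, the select_peer name and the host addresses are all invented for the example, and the Dublin networks are written as the /24s from the scenario rather than the masks in the config.

```python
from ipaddress import ip_address, ip_network

# Simplified London crypto map: (sequence, peer, local net, remote nets).
CRYPTO_MAP = [
    (14, "Dublin", "10.15.0.0/16", ["10.35.130.0/24", "10.35.139.0/24"]),
    (9, "Cork", "10.15.0.0/16", ["10.35.136.0/22"]),
]


def select_peer(src, dst, crypto_map=CRYPTO_MAP):
    """Return (sequence, peer) for the lowest-sequence entry whose ACL matches."""
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    for seq, peer, local, remotes in sorted(crypto_map, key=lambda e: e[0]):
        if src_ip in ip_network(local) and any(
            dst_ip in ip_network(remote) for remote in remotes
        ):
            return seq, peer
    return None  # not interesting traffic for any tunnel


# The Cork /22 really does contain the Dublin /24.
print(ip_network("10.35.139.0/24").subnet_of(ip_network("10.35.136.0/22")))

# A reply from a London server to a host in 10.35.139.0/24.
print(select_peer("10.15.1.10", "10.35.139.20"))
# A reply to a host in 10.35.130.0/24, the network that worked.
print(select_peer("10.15.1.10", "10.35.130.20"))
```

The reply to 10.35.139.20 is claimed by sequence 9 and sent towards Cork, while the reply to 10.35.130.20 falls through to sequence 14 as intended.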

As with so many of these things, once you work it out, it's obvious and you're dumb. So where did I go wrong? My biggest mistake was immediately deciding (see the second sentence of this article) that the issue could only be to do with the configuration of this particular site-site VPN (i.e. sequence 14 of the crypto map), or something crazy and probably global. I didn't look at the rest of the crypto map at all. Why would I?

Dan's approach to troubleshooting

When you've spent many years troubleshooting networks, you learn not to beat yourself up over silly mistakes like this. You learn from them and move on. I'll certainly not make this mistake again. However to help you with the issues you've never encountered before, troubleshooting depends on being logical and methodical. It's so incredibly important, and normally something I take very seriously.

In this case, I wasn't as methodical or as logical as I should have been, and that's the real lesson I need to take away with me.


Friday, May 14, 2010

How do you interview for technical people?

Having been through a rather grilling interview process myself this week, I'm curious how people out there interview for technical staff, and how successful or not they find certain techniques to be?

The classical method of interviewing (either in person or over the phone), I've always found to be quite useless. It does have a 'weeding' effect of removing the completely clueless, but that's all. The problem is when you ask people to describe protocols/systems/methods etc you're testing their booksmart skills, not their real world ones.

Practical Tests



I've always loved the idea of a practical test, where some kit is set up in a 'mini lab' type setup, and a number of trouble tickets are given to the candidate. This will answer the 'can they troubleshoot' question! However it's quite time consuming to set up and grade - if you've ever tried to write these kinds of tests they actually take quite a lot of time to get clear and fair.

As an alternative - there is whiteboarding a problem, where the candidate is given a theoretical issue, which they need to talk through their troubleshooting techniques for. It's simpler to do for the interviewer, but can be quite daunting for people who are not used to public presentations.

Whiteboarding



In general, making someone answer an open question (e.g. show me on the board how OSPF works) will really test the depth of their knowledge - beyond being able to regurgitate the books they've read. However, again you have the problems of nerves and people unused to standing up and presenting. If you're not looking for 'public facing' staff, are you ruling out people based on a skill you don't need? Speaking to recruiters, this kind of method can rule out a lot of people!

Comments



I'm really interested to hear your comments on this one. What have you done (or had done to you!) over the years, and what do you think worked and didn't work?

Friday, May 7, 2010

Why ACS replication fails through a firewall..

Ever tried putting a Cisco ACS server on each side of an ASA, and getting them to replicate? By default it doesn't work, and it's bugged me for ages as to why. I finally discovered why today. It's one of those 'once it's explained it's flippin obvious' ones..

I am assuming of course that NAT and ACL entries are correct!


Why you get a problem



ACS uses TCP port 2000 for replication traffic. Can anyone think what else uses TCP 2000? SCCP (Skinny) of course! And guess what is a default inspection on the ASA :


policy-map global_policy
 class inspection_default
  inspect dns migrated_dns_map_1
  inspect ftp
  inspect h323 h225
  inspect h323 ras
  inspect http
  inspect netbios
  inspect rsh
  inspect rtsp
  inspect skinny  
  inspect esmtp
  inspect sqlnet
  inspect sunrpc
  inspect tftp
  inspect sip
  inspect xdmcp
!


How to resolve the issue




By default, the ASA will inspect the traffic as if it's SCCP, will see that it's not valid SCCP traffic, and quietly drop it. You can stop this behaviour in two ways:

1) Disable the inspection completely if you're not using Cisco IPT.
2) Remove it from the default inspection list, and set up a separate class to match the traffic you DO want to inspect for SCCP, and inspect only it.


Example



Let's say two ACS servers, 1.1.1.1 and 2.2.2.2 need to replicate, and you do use SCCP on the network :


access-list NOT-ACS extended deny tcp host 1.1.1.1 host 2.2.2.2
access-list NOT-ACS extended permit ip any any
!

class-map NOT-ACS
 match access-list NOT-ACS
!

policy-map global_policy
 class inspection_default
  inspect dns migrated_dns_map_1
  inspect ftp
  inspect h323 h225
  inspect h323 ras
  inspect http
  inspect netbios
  inspect rsh
  inspect rtsp
  inspect esmtp
  inspect sqlnet
  inspect sunrpc
  inspect tftp
  inspect sip
  inspect xdmcp
 class NOT-ACS
  inspect skinny
!

Your ACS devices will replicate, and your SCCP still gets inspected.

Thanks to my old buddy Frank Gannon for spotting this one!

(updated to correct config error spotted by Robert)