Wednesday, April 14, 2010

Making changes to remote systems

Every time we change something on the network, we take a risk. We may well have tested it in the lab/pre-prod environment, but there's always some risk. The nature of the change determines how much risk. It's why networks are most stable when the admin is on holiday. 

If it's on a device that you are standing next to, you know that if it goes wrong, you can always recover it manually. But what do you do if it's remote? With Cisco routers and switches, as long as the problem is something in the config, you can usually get back by rebooting*, because the startup-config will be reloaded. But it's a little bit embarrassing to have to call the office and get someone to do this for you. Also, if you're following a sensible change control process, you've made the change out of hours, when there's no-one there to do it.

A really useful command here is 'reload in x', where x is a number of minutes. If you follow this procedure, you'll always know you can get back to where you were:

1) 'write mem' to save the old, known-working config
2) enter the command 'reload in 20'       (or whatever interval you choose)
3) make the change.


console#telnet 192.168.1.250
Trying 192.168.1.250 ... Open


User Access Verification

Username: cisco
Password: 

ADMIN-SW#wr mem
Building configuration...
[OK]
ADMIN-SW#reload in 5
Reload scheduled in 5 minutes by cisco on vty0 (192.168.1.150)
Proceed with reload? [confirm]
ADMIN-SW#


***
*** --- SHUTDOWN in 0:05:00 ---
***
ADMIN-SW#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
ADMIN-SW(config)#int f0/4
ADMIN-SW(config-if)#sw acc vlan 2


Now, either the change works and you keep your access, or it doesn't. If it works, reconnect with a new session to be sure (depending on the nature of the change). Then, when you're happy that it's not going to break again, enter 'reload cancel', and the scheduled reboot will be cancelled. If for some reason the change fails, the device will reboot when the timer expires (20 minutes in the procedure above, 5 in this example) and come back up on the saved config, and you can go back to it and work out why.

In this case, the change worked and I can reconnect :

console#telnet 192.168.1.250
Trying 192.168.1.250 ... Open


User Access Verification

Username: cisco
Password: 

Reload scheduled in 4 minutes and 30 seconds by cisco on vty0 (192.168.1.150)
ADMIN-SW#relo
ADMIN-SW#reload cancel
ADMIN-SW#


***
*** --- SHUTDOWN ABORTED ---
***

ADMIN-SW#
ADMIN-SW#


If that hadn't worked (or if I hadn't gone back in to cancel it!)  then after the five minutes the switch would reboot, and come back in its previous state. It's still an outage of course, and you still should have tested your change better, but at least you're able to recover, regroup, and try again. 
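If you'd rather script this than type it, the procedure can be sketched roughly as below. This is only a sketch under assumptions: `send` and `still_reachable` are hypothetical callables you would supply yourself (for example, wrappers around whatever telnet/SSH session you use), and `send` is assumed to answer the "Proceed with reload? [confirm]" prompt on its own. The function name `guarded_change` is made up for illustration.

```python
from typing import Callable, Iterable


def guarded_change(
    send: Callable[[str], str],
    change_commands: Iterable[str],
    still_reachable: Callable[[], bool],
    minutes: int = 20,
) -> bool:
    """Apply a change under a scheduled reload.

    Returns True if the change was kept (reload cancelled), False if the
    scheduled reload was left armed so the device falls back to its saved config.
    `send` and `still_reachable` are hypothetical helpers supplied by the caller.
    """
    if minutes < 1:
        raise ValueError("minutes must be at least 1")

    send("write memory")          # 1) save the old, known-working config
    send(f"reload in {minutes}")  # 2) schedule the fallback reboot

    try:
        send("configure terminal")  # 3) make the change
        for command in change_commands:
            send(command)
        send("end")
    except OSError:
        # Session dropped mid-change: leave the reload armed and give up.
        return False

    # Only cancel the reboot once a fresh check confirms we still have access.
    if still_reachable():
        send("reload cancel")
        return True
    return False
```

With the change from this post, a call would look like `guarded_change(send, ["int f0/4", "sw acc vlan 2"], still_reachable, minutes=5)`. The important design choice is that 'reload cancel' is only ever sent after the reachability check passes; every other path leaves the timer running.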

I ALWAYS do this before a change on a remote device, even if I can't imagine any way that the change could break anything. Why? Because I've been bitten too many times by devices far, far away.



*ASA/PIX in failover (FO) mode is different. I'll post on it another day.
