Campus Network Upgrade: Phase I
by Dick Wong
      
    The campus network upgrade began with Phase I, the replacement and upgrade of the network switches at the core and distribution layers, which was completed in 2015.
The existing network design follows a traditional 3-tier hierarchical model comprising the core, distribution and access layers. The Phase I network upgrade immediately increased the campus core network switching capacity by 2.8 times, with ample upgrade potential to meet future demand. Full network redundancy was provided at the core and distribution layers, so that switches at these layers could be replaced one by one with no service interruption. The migration steps for these layers are briefly described as follows:
Upgrade one of the core switches
 
- Prepare to remove Core Switch 2. Adjust the Layer 3 (L3) link cost so that all network traffic is diverted to Core Switch 1.
- Power down and dismantle Core Switch 2.
- Install new Core Switch 2. Verify that it resumes fully and becomes operational again.
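
The effect of adjusting the L3 link cost can be illustrated with a small lowest-cost path calculation. This is only a sketch under assumptions: the topology, the node names (`dist`, `core1`, `core2`, `border`), the cost values and the helper `divert_from` are all hypothetical and are not taken from the actual campus configuration.

```python
import heapq

def lowest_cost_path(links, src, dst):
    """Return (total_cost, path) for the cheapest route from src to dst (Dijkstra)."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, link_cost in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbour, path + [neighbour]))
    return None

# Hypothetical topology: one distribution switch dual-homed to both core switches.
links = {
    ("dist", "core1"): 10,
    ("dist", "core2"): 10,
    ("core1", "border"): 10,
    ("core2", "border"): 10,
}

def divert_from(links, switch, high_cost=1000):
    """Return a copy of links with every link touching `switch` made expensive."""
    return {pair: (high_cost if switch in pair else cost) for pair, cost in links.items()}

# Raise the cost of every link on core2 so that routing prefers core1.
drained = divert_from(links, "core2")
cost, path = lowest_cost_path(drained, "dist", "border")
print(cost, path)  # 20 ['dist', 'core1', 'border']
```

Once no lowest-cost path crosses the switch being replaced, it can be powered down without carrying live traffic; restoring the original costs after installation brings it back into use.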
 
Upgrade distribution layer switches
Distribution switches were virtualized by using a technology called VSS (Virtual Switching System), with which a pair of physical switches could operate as a single logical virtual switch.   
- Prepare to remove Distribution Switch Y. Adjust the L3 link cost and change the Hot Standby Router Protocol (HSRP) priority per Virtual LAN (VLAN), e.g. Staff LAN and Student LAN, so that all L2/L3 network traffic is diverted to Distribution Switch X.
- Power down and dismantle Distribution Switch Y.
- Install new Distribution Switch Y, connected to the new L2 core (VSS). Verify that it resumes fully and becomes operational again.
- Prepare to remove Distribution Switch X. Adjust the L3 link cost and change the HSRP priority per VLAN so that all L2/L3 network traffic is diverted to Distribution Switch Y.
- Power down and dismantle Distribution Switch X.
- Install new Distribution Switch X and form VSS with Distribution Switch Y (renamed as Distribution Switch VSS-X). Verify that it resumes fully and becomes operational again.
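
The HSRP part of the steps above can be sketched as follows. In HSRP the router with the highest priority in a group becomes active, with ties broken by the highest IP address. Everything else here is an assumption for illustration: the addresses, the priority values, the initial split of active roles between X and Y, and the helper names `hsrp_active` and `drain`. The sketch also assumes preemption is enabled, so that a priority change actually causes the active role to move.

```python
from ipaddress import ip_address

def hsrp_active(routers):
    """Pick the HSRP active router: highest priority, ties broken by highest IP."""
    return max(routers, key=lambda r: (r["priority"], ip_address(r["ip"])))["name"]

# Hypothetical per-VLAN HSRP groups; addresses and priorities are invented.
vlans = {
    "Staff LAN": [
        {"name": "X", "ip": "10.1.0.2", "priority": 100},
        {"name": "Y", "ip": "10.1.0.3", "priority": 110},
    ],
    "Student LAN": [
        {"name": "X", "ip": "10.2.0.2", "priority": 110},
        {"name": "Y", "ip": "10.2.0.3", "priority": 100},
    ],
}

def drain(vlans, switch, low_priority=50):
    """Return a copy of vlans in which `switch` has a low priority in every group."""
    return {
        vlan: [dict(r, priority=low_priority) if r["name"] == switch else dict(r) for r in group]
        for vlan, group in vlans.items()
    }

before = {vlan: hsrp_active(group) for vlan, group in vlans.items()}
after = {vlan: hsrp_active(group) for vlan, group in drain(vlans, "Y").items()}
print(before)  # {'Staff LAN': 'Y', 'Student LAN': 'X'}
print(after)   # {'Staff LAN': 'X', 'Student LAN': 'X'}
```

After the priority change, Distribution Switch X is the active gateway for every VLAN, so Distribution Switch Y can be removed.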
 
Upgrade distribution layer switches serving the departmental router that supports the Student LAN
- Set up dual L3 uplinks for the departmental Student LAN. Adjust the L3 link cost so as to divert all network traffic to one of the L3 uplinks.
- Power down and dismantle Distribution Switch Y.
- Upgrade Distribution Switch Y and verify that both L3 uplinks resume fully and become operational again. Then adjust the L3 link cost so as to divert all network traffic to the other L3 uplink.
- Power down and dismantle Distribution Switch X.
- Install and form new Distribution Switch VSS-X. Verify that both L3 uplinks resume fully and become operational again.
 
Upgrade the remaining core switch
Replacement of the remaining L3 core switch was initiated after the installation of all VSS distribution switches had been completed and tested.
- Prepare to remove Core Switch 1. Adjust the L3 link cost so that all network traffic is diverted to Core Switch 2.
- Power down and dismantle Core Switch 1.
- Install new Core Switch 1.  Verify that it resumes fully and becomes operational again. 
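
The overall ordering of the migration follows one rule: a switch is removed only while its redundant partner is in service. A minimal sketch of that rule is given below, assuming a simplified model with one core pair and one distribution pair; the `pairs` mapping, the `plan` list and the `run_plan` function are hypothetical and only mirror the sequence described above.

```python
# Hypothetical redundant pairs; the names mirror the article but the data is illustrative.
pairs = {
    "Core Switch 1": "Core Switch 2",
    "Core Switch 2": "Core Switch 1",
    "Distribution Switch X": "Distribution Switch Y",
    "Distribution Switch Y": "Distribution Switch X",
}

# Order from the article: one core, both distribution switches, then the other core.
plan = ["Core Switch 2", "Distribution Switch Y", "Distribution Switch X", "Core Switch 1"]

def run_plan(plan, pairs):
    """Replace switches one at a time, refusing any step that would break redundancy."""
    in_service = set(pairs)
    log = []
    for switch in plan:
        partner = pairs[switch]
        if partner not in in_service:
            raise RuntimeError(f"{partner} is down; cannot remove {switch}")
        in_service.remove(switch)  # divert traffic, power down, dismantle
        log.append((switch, sorted(in_service)))
        in_service.add(switch)     # install the new switch and verify it
    return log

for removed, carrying in run_plan(plan, pairs):
    print(f"{removed} out; traffic carried by {carrying}")
```

Because each new switch is verified and returned to service before the next one is removed, every step leaves at least one member of each pair carrying traffic.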
 
   
Lessons learnt from Phase I 
Many valuable lessons were learnt and much experience was gained during the deployment, as some of the problems we faced were unexpected.
- Tight schedule
Owing to the prolonged tendering process, the main part of the Phase I campus network upgrade had to be completed in a relatively short time (during the Summer Term of 2015) by a small core team, who carried out most of the design and installation work. Before replacement began, the health of each new core and distribution switch was closely monitored on the testing network. If the results were satisfactory, the new switch was installed in place of the corresponding old switch and tested again in the production network. Each additional switch installed in the production network increased the complexity and duration of the testing. With such a tedious and repetitive process, only 2 switches could be replaced per week on average, and we managed to replace all 16 switches (excluding the distribution layer for Wireless LAN and gateway) within two months.
- Zero service downtime
To minimize the interruption, delay, suspension of access, or unavailability of network services, neither the core nor the distribution layer could be shut down entirely for replacement. Careful planning for diverting and resuming network traffic was critical to making zero downtime possible. We started with a comprehensive calculation of link costs and roles for the different layers, so that network traffic could be diverted away from the old switches to be replaced. Although the upgrade unavoidably affected network traffic, we managed to control the changes in traffic flow so that they had no significant impact on network services.
- Cabling work
Because the network interface adaptors of the distribution layer switches changed (from SC to LC connectors), over 300 fibre patch cables were replaced during the upgrade. We managed to mobilize the extra manpower required for fibre cable re-testing.
- Software bugs
The Cisco IOS (Internetwork Operating System) software for the new core and distribution switches had bugs, such as high CPU utilization and an inability to collect resource information or configure UTP interfaces. After we reported these to the vendor, a new IOS version was officially released in Oct 2015, and fortunately it resolved all the major issues that CityU had encountered thus far during the deployment of the new switches.
Current status and future work 
The completion of Phase I of the network upgrade has not only enhanced the overall switching throughput of the backbone network, but also improved its resiliency, availability and stability. The next phase of the network upgrade will begin in the coming summer, when around 50% of the switches in the access layer will be replaced and selected new features will be implemented for this layer.