Slices and JUNOS

In the land of JUNOS, in the fires of Mount Doom. No, I’ll resist, but I did feel like I had the Eye of Sauron on me. On a UNIX OS the concept of slices exists. JUNOS uses slices as part of its resilient boot architecture: with dual boot partitions you have a designated backup copy that lets the device boot when something goes wrong with the primary.

JUNOS slices up your internal flash into separate partitions to ensure resiliency and stability. It also means you won’t lose everything in the event of an issue. By default the EX flash is divided into four slices. Two identical copies of JUNOS are stored on slice 1 and slice 2. Slice 3 contains the contents of /var, with slice 4 holding /config. Because /var sees a high level of reads and writes, and with that a greater chance of corruption, isolating it works a treat to avoid corrupting the entire flash.

While testing some failover mechanisms I was cutting the power to the EX. It generally cannot hurt, and in this particular test I was pulling the plug out. I booted it back up and was greeted by this prompt.

** **
** ** 
** It is possible that the primary copy of JUNOS failed to boot up **
** properly, and so this device has booted from the backup copy. **
** **
** Please re-install JUNOS to recover the primary copy in case **
** it has been corrupted. **
** **

I kind of filled my pants. Before researching what had occurred I thought I had broken my new device. Luckily this wasn’t the case. What I had done was corrupt the primary boot slice. The secondary was undamaged, and the switch had booted from it. I powered off and restarted, hoping it would work, and was met by the same screen. Not to worry. I noticed there was a red light on the chassis, so I checked the system alarms.

[email protected]> show chassis alarms
1 alarms currently active
Alarm time               Class  Description
2013-02-18 09:34:21 PST  Minor  Host 0 Boot from backup root
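
If you collect this output from many switches, a small script can flag the ones that came up on the backup copy. The following is a minimal Python sketch, assuming the alarm text has already been captured as a string and follows the layout shown above. The names `parse_alarms` and `booted_from_backup` are made up for this example and are not part of any Juniper tool.

```python
import re

# Sample text taken from the alarm output shown in this post.
SAMPLE = """\
1 alarms currently active
Alarm time Class Description
2013-02-18 09:34:21 PST Minor Host 0 Boot from backup root
"""

# Assumed line layout: date, time, time zone, alarm class, free-text description.
ALARM_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+)\s+"
    r"(?P<cls>\S+)\s+(?P<description>.+)$"
)


def parse_alarms(output):
    """Return one dict per alarm line; header lines are skipped."""
    alarms = []
    for line in output.splitlines():
        match = ALARM_LINE.match(line.strip())
        if match:
            alarms.append(match.groupdict())
    return alarms


def booted_from_backup(output):
    """True if any alarm description mentions a boot from the backup root."""
    return any(
        "Boot from backup root" in alarm["description"]
        for alarm in parse_alarms(output)
    )


if __name__ == "__main__":
    for alarm in parse_alarms(SAMPLE):
        print(alarm["cls"], "-", alarm["description"])
    print("booted from backup:", booted_from_backup(SAMPLE))
```

The regular expression tolerates either single or multiple spaces between columns, so it should cope with the output whether or not the column alignment survived a copy and paste.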

Again it had booted from the backup. Time to discover what the onboard help had to offer.

After using help apropos I discovered the command that might save me.

request system snapshot media internal slice alternate

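This command copies the slice you are currently running from onto the alternate slice. Because the switch has booted from the backup, that means the good backup image overwrites the damaged primary. A reboot is then needed to make sure your EX boots off the primary partition.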

request system reboot slice alternate media internal

Now it should be happy days once you reboot. You have tested (unintentionally) your backup partition. After the reboot you can confirm JUNOS is installed correctly on each slice by issuing the following.

show system snapshot media internal slice 1
show system snapshot media internal slice 2
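
To see why the order of the two commands matters, here is a toy Python model of the sequence above. It is only a sketch of the behaviour described in this post, not a Juniper API: the class `DualRootFlash` and its method names are invented for illustration, and "alternate" simply means whichever slice is not currently active.

```python
class DualRootFlash:
    """Toy model of the two JUNOS root slices (illustrative only)."""

    def __init__(self):
        self.healthy = {"primary": True, "backup": True}
        self.active = "primary"

    def alternate(self):
        # "alternate" is relative: it is the slice we are NOT running from.
        return "backup" if self.active == "primary" else "primary"

    def corrupt(self, slice_name):
        self.healthy[slice_name] = False

    def power_cycle(self):
        # Modelled on the boot banner: fall back to backup if primary is bad.
        self.active = "primary" if self.healthy["primary"] else "backup"

    def snapshot_to_alternate(self):
        # Models: request system snapshot media internal slice alternate
        self.healthy[self.alternate()] = self.healthy[self.active]

    def reboot_alternate(self):
        # Models: request system reboot slice alternate media internal
        self.active = self.alternate()


if __name__ == "__main__":
    flash = DualRootFlash()
    flash.corrupt("primary")       # pulling the plug at a bad moment
    flash.power_cycle()
    print("after power loss:", flash.active)

    flash.snapshot_to_alternate()  # repair primary from the running backup
    flash.reboot_alternate()       # switch back to the repaired primary
    print("after repair:", flash.active, flash.healthy)
```

In this model, running the reboot step on a switch that is already on its primary slice would move it onto the backup, so it is worth checking which slice is active before issuing it.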

At the time I did honestly contemplate zeroing the device. This restores it to a factory state, and I had no issue with copying my config back on. I thought, though, that there would have to be a fix for production devices, and I am glad I found it. Now the worry is gone, because I know I have a way to fix it. On a single chassis it will require a restart to clear the alarm, but if you are running dual REs or a virtual chassis then you could shift the workload and active gateways first. +1 to my neck beard skills.

Further reading: Understanding Resilient Dual-Root Partitions on Switches

3 thoughts on “Slices and JUNOS”

  1. I have this same prob.. two of the FPC’s have this error, but if I use request system reboot slice alternate media internal, it ‘fixes’ two but breaks FPC 0 he he any thoughts

    1. This is already three years old, but here goes. This is the way I understand it.

      Assuming you’ve actually fixed the damaged primary partitions as described in Pandom’s post (by copying the backup partition over the primary partition), you need to tell your VC which FPC to reboot on an alternate slice. If fpc0 is already booted from the primary partition, and you issue ‘request system reboot slice alternate media internal’, it’s going to switch to the backup slice. Note that in this case, the word ‘alternate’ simply indicates the slice it’s not currently using. If you’re on your primary slice, ‘alternate’ indicates the backup slice. If you’re on the backup slice, ‘alternate’ indicates your primary slice.

      When you have fpc1 and fpc2 booted from the primary partition, and fpc0 booted from the backup partition, you can issue:

      request system reboot slice alternate media internal member 0
