Several areas of the Detective are useful for troubleshooting a faulty system. Troubleshooting information can be found in the following locations:
• Server > (Selected Server) > Troubleshooting
• Workstation > (Selected Workstation) > Troubleshooting
• Training > Personal Trainer > (Selected Series) > (Selected system)
• Administration > Troubleshooting
• Administration > Hardware Administration
When troubleshooting a faulty workstation or server, go to the chosen system’s landing page (Server or Workstation > (Chosen System)) and click the Troubleshooting link in the lower left corner. This opens the Troubleshooting Landing Page, which is generally divided into the following five subsections.
Note: Content for each system in the Detective may differ depending on firmware and options. The specific information below is drawn from the Sun Fire V880. Content for other systems may vary slightly.
Main System Troubleshooting
This subsection contains troubleshooting techniques and tips for the chosen system, including “System Status LED” interpretation and troubleshooting information for each of the system’s major components. The list of links changes depending on the system selected. When troubleshooting a system, use this subsection first. The other troubleshooting subsections are also helpful, but this one typically contains the most relevant information; the others often go into greater detail but are more generic in nature.
The next subsection contains information specific to the firmware used on the chosen system. Firmware commands and tests can be very useful in diagnosing and isolating a system fault. Firmware commands can often disable and enable hardware components, which is particularly useful for isolating intermittent faults. If improper settings have compromised the firmware configuration, this subsection guides you through the procedure for resetting the firmware to its default settings. Features such as ASR (Automatic System Recovery), Device Identifiers, and the OBP (OpenBoot PROM) are discussed in this subsection.
The next subsection contains information about some of the more generic tools that are useful in troubleshooting the system. This subsection is generic throughout the Detective, so some items may not apply to the chosen system, and others may not be installed during a generic OS installation. SunVTS, a test and verification suite included with the installation disks, is discussed here. Its tests are generally not installed during a standard installation, so this subsection describes how to install and use SunVTS. Although the information in this subsection is generic, it can be extremely useful in isolating and repairing a faulty system.
The next subsection contains detailed, generic troubleshooting information about several major components. It covers the difference between correctable and uncorrectable memory and disk errors and how to approach each. You will find techniques such as how to troubleshoot a SCSI hard drive and isolate the drive from the SCSI bus, as well as several methods of recovering data from a faulty disk drive. The information in this subsection provides a strong troubleshooting foundation, and we suggest reading it before you begin troubleshooting the system.
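To make the correctable/uncorrectable distinction concrete for memory, the sketch below uses a textbook extended Hamming (8,4) code, a simple SEC-DED (single-error-correcting, double-error-detecting) scheme. It illustrates the principle only, assuming 4-bit data words; the ECC used in Sun memory subsystems works on much wider words and is not reproduced here, and the `encode`/`decode` helpers are hypothetical. A single flipped bit yields a syndrome that points at the bad position, so it can be corrected; two flipped bits leave the overall parity even with a non-zero syndrome, so the error is detected but cannot be corrected.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming codeword.

    Index 0 holds an overall parity bit; indices 1-7 use the classic
    Hamming(7,4) layout p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    c = [0] * 8
    c[3], c[5], c[6], c[7] = d1, d2, d3, d4
    c[1] = c[3] ^ c[5] ^ c[7]  # p1 checks positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]  # p2 checks positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]  # p3 checks positions 4, 5, 6, 7
    c[0] = sum(c[1:]) % 2      # makes the parity of all 8 bits even
    return c


def decode(word: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(word)
    syndrome = ((c[1] ^ c[3] ^ c[5] ^ c[7])
                | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
                | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2)
    odd_parity = sum(c) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # Odd parity: assume a single flipped bit (SEC-DED guarantees nothing
        # beyond two). The syndrome is its position; 0 means the overall
        # parity bit itself. Flip it back.
        c[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but a non-zero syndrome: two bits flipped.
        # The error is detected, but its location cannot be determined.
        return "uncorrectable", []
    return status, [c[3], c[5], c[6], c[7]]


word = encode([1, 0, 1, 1])
one_bit = list(word)
one_bit[5] ^= 1          # single-bit error
two_bit = list(word)
two_bit[2] ^= 1          # double-bit error
two_bit[6] ^= 1
print(decode(one_bit))   # ('corrected', [1, 0, 1, 1])
print(decode(two_bit))   # ('uncorrectable', [])
```

The same logic explains why a system can keep running through correctable errors but must react to uncorrectable ones: only in the first case does the hardware know which bit to repair.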
The fifth subsection contains advanced troubleshooting techniques that should be used as a last resort, including system panics. A panic occurs when the operating system detects an error it cannot recover from and halts, effectively crashing the system. The system can be configured to save a dump of memory and registers after a crash, and this information can help identify the root cause of the failure. The information in this subsection is highly technical and advanced. If you really want to dig into the architecture of the system, this subsection will keep you entertained.
Personal Trainer Troubleshooting
The Personal Trainer is a second source of troubleshooting information. It is organized for e-learning rather than following the standard reference organization used throughout the Detective. Its troubleshooting information is identical to that in the main troubleshooting section described above. If you wish to learn about a system in a relaxed e-learning environment, use this section.
After clicking the Administration tab, you will see a “Troubleshooting” section. The troubleshooting links in this section are aimed at system administrators and anyone who does not want to physically handle the hardware. Some of the more useful subsections include the following:
The LOM subsection covers the different versions of LOM (Lights Out Management) that Sun has used across its hardware platforms. LOM can be very helpful in isolating faults both locally and remotely, and a number of LOM commands display the system’s status and configuration parameters. The list below describes some of the more commonly used LOM commands, as implemented in ALOM CMT, and their functions.
||Controls the way the system boots through the OpenBoot PROM.
||Drops the host system from Solaris into OpenBoot PROM or kadb.
||Clears the blacklist, re-enabling every component and device regardless of whether it was disabled manually or automatically during a Power-On Self-Test.
||Clears a fault identified by the showfaults -v command so that the repair can be done manually.
||Displays the host System Console output buffers.
||Disables the specified components by adding them to the blacklist. Blacklisted components cannot be re-enabled until the enablecomponent command is issued and a power cycle or reset occurs.
||Re-enables a specified component by removing it from the blacklist. The component is not actually re-enabled until a power cycle or reset occurs.
||Powers the system down and powers it back up.
||Removes the main power from the host system.
||Applies main power to the host system or to a Field Replaceable Unit (FRU).
||Prepares a FRU for removal and illuminates the host system’s “OK-to-Remove” Status LED.
||Resets all ALOM CMT configuration parameters to their default values.
||Controls the status of the Virtual Keyswitch.
||Turns a system’s Locator LED on or off. This only works on host systems that have Locator LEDs.
||Displays system components and their respective test status. Every Device Identifier is displayed when this command is used.
||Displays the environmental status of the host system: system temperatures, power supplies, Locator LED, cooling fans, voltage and current sensors, and Virtual Keyswitch position.
||Displays current system faults. The output includes the fault ID, the faulted FRU ID, and the fault message.
||Displays information about FRUs within a host system.
||Displays the status of the Virtual Keyswitch.
||Displays the history of all events logged in the ALOM CMT event buffer.
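The blacklist behavior described in the table, where a change to the blacklist only takes effect after a power cycle or reset, can be modeled as a small state machine. The Python below is a hypothetical sketch, not ALOM CMT code: the class and method names are invented for illustration, and it assumes that disabling a component is likewise deferred until the next reset, which the table only states explicitly for re-enabling.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class BlacklistModel:
    """Hypothetical model of deferred blacklist changes (not ALOM CMT code)."""
    components: Set[str]                               # every component in the host
    blacklist: Set[str] = field(default_factory=set)   # persistent list edited by commands
    disabled: Set[str] = field(default_factory=set)    # what the running host actually excludes

    def disable(self, name: str) -> None:
        """Add a component to the blacklist; assumed to take effect at the next reset."""
        if name not in self.components:
            raise KeyError(f"unknown component: {name}")
        self.blacklist.add(name)

    def enable(self, name: str) -> None:
        """Remove a component from the blacklist; takes effect at the next reset."""
        self.blacklist.discard(name)

    def clear_all(self) -> None:
        """Empty the blacklist, like the clear-blacklist command in the table."""
        self.blacklist.clear()

    def reset(self) -> None:
        """A power cycle or reset is when the host picks up blacklist changes."""
        self.disabled = set(self.blacklist)

    def active(self) -> Set[str]:
        """Components currently in use by the host."""
        return self.components - self.disabled


host = BlacklistModel({"cpu0", "cpu1", "dimm0"})
host.disable("cpu1")
print(sorted(host.active()))   # ['cpu0', 'cpu1', 'dimm0']  (no reset yet)
host.reset()
print(sorted(host.active()))   # ['cpu0', 'dimm0']
host.enable("cpu1")
print(sorted(host.active()))   # ['cpu0', 'dimm0']  (still disabled until reset)
host.reset()
print(sorted(host.active()))   # ['cpu0', 'cpu1', 'dimm0']
```

Keeping the stored blacklist separate from the set the host is actually using is what makes the "issue the command, then reset" sequence in the table necessary.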
Detective TIP: When you’re on a Menu page and you see a list of commands or terms, simply hover your mouse over the command and a popup will appear with a short description of what the command does.
The Hardware Administration section is generic and contains all of the available hardware administration tasks. If you know that you will be servicing a system with an RSC (Remote System Control) board, you can quickly find the RSC command options and examples in this section.
This subsection also contains modules describing Sun’s diagnostic features, built into the firmware and Solaris, that maximize system uptime. CHS (Component Health Status) is an example of this type of feature. When CHS determines that an error has occurred, it sets a fault bit in certain major FRUs. The fault bit resides in the faulty FRU itself and causes the FRU to be disabled during POST (Power-On Self-Test). The CHS module explains this procedure in full.
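Because the CHS fault bit is stored in the FRU itself rather than in a per-slot table, a faulty FRU should stay disabled by POST even if it is moved to another slot. The hypothetical Python model below illustrates that idea under exactly that assumption; the `Fru` class, the `post` function, and the serial numbers are invented for illustration and do not represent Sun's firmware.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fru:
    """Hypothetical FRU whose health status is stored on the FRU itself."""
    serial: str
    faulted: bool = False  # stands in for the CHS fault bit held by the FRU


def post(slots: Dict[str, Fru]) -> List[str]:
    """Return the slots brought online by POST, skipping any FRU with its fault bit set."""
    return sorted(slot for slot, fru in slots.items() if not fru.faulted)


board = Fru("SB-0042")   # hypothetical serial numbers
spare = Fru("SB-0043")

board.faulted = True     # an error is detected, so the bit is set on the board itself
print(post({"slot0": board, "slot1": spare}))   # ['slot1']

# Swapping slots does not help: the fault bit travels with the board.
print(post({"slot0": spare, "slot1": board}))   # ['slot0']
```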
This subsection can be used as a reference as well as training material. A firm understanding of it will greatly enhance your troubleshooting ability and efficiency.
This subsection covers generic OBP (OpenBoot PROM) commands that are useful in isolating system faults. OBP commands are low-level firmware commands that can help isolate faults in a system that is unable to boot. SunVTS, a test and verification suite included with the installation disks, is also discussed here. Its tests are generally not installed during a standard installation, so this subsection describes how to install and use SunVTS. Although the information in this subsection is generic, it can be extremely useful in isolating and repairing a faulty system.
System Tuning and Capacity Planning
Reading, understanding, and implementing the ideas in this subsection can make you a hero in your customer’s or boss’s eyes. This subsection has more to do with maximizing system efficiency than with troubleshooting. It is important to understand why a system is running slowly and what can be done to improve its performance. System tuning and capacity planning can save a company a lot of money by making better use of existing hardware and delaying the purchase of new hardware.
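As a simple illustration of capacity planning arithmetic, the sketch below fits a least-squares straight line to monthly utilization samples and estimates how many months remain before the trend crosses a chosen threshold. The linear-growth assumption, the 80 percent threshold, and the `months_until_threshold` function are all hypothetical; real workloads rarely grow linearly, so treat the result as a rough planning signal rather than a forecast.

```python
from typing import Optional, Sequence


def months_until_threshold(samples: Sequence[float], threshold: float) -> Optional[float]:
    """Estimate months until a linear utilization trend reaches `threshold`.

    `samples` are monthly utilization percentages, oldest first. A least-squares
    line is fitted; the result counts months after the last sample, or None if
    the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope  # month index where the line meets threshold
    return max(0.0, crossing - (n - 1))         # 0.0 if the trend is already past it


# Hypothetical CPU utilization for four consecutive months.
print(months_until_threshold([50, 55, 60, 65], 80))  # 3.0
```

A result like this indicates roughly how long tuning and better use of the existing hardware can postpone a purchase before new capacity is actually needed.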