System Troubleshooting

There are a number of different areas in the Detective that are useful for troubleshooting a faulty system. Troubleshooting information can be found in the following locations:

• Server > (Selected Server) > Troubleshooting
• Workstation > (Selected Workstation) > Troubleshooting
• Training > Personal Trainer > (Selected Series) > (Selected system)
• Administration > Troubleshooting
• Administration > Hardware Administration

When troubleshooting a faulty workstation or server, go to the chosen system’s landing page (Server/Workstation > (Chosen System) > Troubleshooting) and click the Troubleshooting link in the lower left corner. This brings up the Troubleshooting Landing Page, which is generally divided into the following five subsections.

Note: Content for each system in the Detective may differ depending on firmware and options. The specific information below is drawn from the Sun Fire V880. Content for other systems may vary slightly.

Main System Troubleshooting

System Troubleshooting

This subsection has a number of troubleshooting techniques and tips for the chosen system. You will find “System Status LED” interpretation and troubleshooting information for each of the system’s major components. The list of links will change depending on the system selected. When troubleshooting a system, use this section first. While the other troubleshooting sections are helpful, this section typically contains the most relevant information. The other sections often contain more detailed information, but it is of a more generic nature.

Firmware

This subsection contains specific information about the firmware used on the chosen system. Firmware commands and tests can be very useful in diagnosing and isolating a system fault. Firmware commands can often be used to disable and enable hardware components, which is especially useful in isolating intermittent faults. If the integrity of the firmware is compromised by improper settings, this section will guide you through the proper procedure to reset the firmware to its default settings. Features such as ASR (Automatic System Recovery), Device Identifiers, and OBP are discussed in this section.
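The idea of isolating a fault by disabling components one at a time can be sketched as a simple loop. This is only an illustration, not a procedure taken from the Detective: the function names, the component names, and the `fault_reproduces` callback are all hypothetical, and the callback stands in for the manual steps of disabling a component in firmware, resetting, and re-running the failing test.

```python
from typing import Callable, FrozenSet, Iterable, Optional


def isolate_faulty_component(
    components: Iterable[str],
    fault_reproduces: Callable[[FrozenSet[str]], bool],
) -> Optional[str]:
    """Return the first component whose removal makes the fault disappear.

    fault_reproduces(disabled) is a hypothetical callback meaning:
    "with these components disabled, does the fault still show up?"
    """
    for name in components:
        if not fault_reproduces(frozenset([name])):
            return name  # fault vanished with this component disabled
    return None  # no single component explains the fault


# Simulated system in which the made-up component "dimm-b1" is bad:
# the fault shows up unless "dimm-b1" is among the disabled components.
def simulated_test(disabled: FrozenSet[str]) -> bool:
    return "dimm-b1" not in disabled


suspect = isolate_faulty_component(["cpu-0", "dimm-a0", "dimm-b1"], simulated_test)
print(suspect)
```

For an intermittent fault, a single clean test run proves little, so in practice each step would likely need to run long enough, or often enough, to be convincing.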

Troubleshooting Tools

This subsection contains information about some of the more generic tools that are useful in troubleshooting the system. This section is generic throughout the Detective, so some items may not apply to the chosen system, and others may not be installed during a generic OS installation. SunVTS is discussed in this section. SunVTS is a test and verification suite that is included with the installation disks. These tests are generally not installed during a standard installation. The procedure for installing and using SunVTS is described in this section. While the information in this subsection is generic in nature, it can be extremely useful in isolating and repairing a faulty system.

Testing Components

This section contains detailed generic troubleshooting information about several major components. It covers the difference between correctable and uncorrectable memory and disk errors, and how to approach them. You will find techniques such as how to troubleshoot a SCSI hard drive and isolate the drive from the SCSI bus. Different methods of recovering data from a faulty disk drive are documented. The information in this section will provide you with a strong troubleshooting foundation. We suggest reading it before you actually begin troubleshooting the system.

Advanced Troubleshooting

This subsection contains advanced troubleshooting techniques that should be used as a last resort. System panics are discussed in this section. System panics occur when the system crashes. The system can be configured to create a memory and register dump after a system crash. This information can be used to help identify the root cause of the system failure. Information in this section is highly technical and advanced. If you really want to dig into the architecture of the system, the information in this subsection will keep you entertained.

 

Personal Trainer Troubleshooting

The second way to obtain troubleshooting information is through the “Personal Trainer”. The “Personal Trainer” is organized in a manner that is conducive to e-learning, as opposed to the standard reference organization found throughout the Detective. The information in this section is identical to the troubleshooting information in the main troubleshooting section previously explained. If you wish to learn about a system in a relaxed e-learning environment, use this section.

 

Administration Troubleshooting

After clicking on the Administration tab you will see a “Troubleshooting” subsection. The troubleshooting links in this section are aimed at the system administrator or anyone who does not want to physically handle the hardware. Some of the more useful subsections include the following:

Hardware Administration

This subsection covers the different LOM (Lights Out Management) iterations used by Sun in its different hardware platforms. LOM can be very helpful in isolating faults both locally and remotely. There are a number of different LOM commands that display the system’s status and configuration parameters. Below is a short list of some of the more commonly used LOM commands and their functions.

Command           Function
bootmode          Controls the way the system boots through the OpenBoot PROM.
break             Drops the host system from Solaris into OpenBoot PROM or kadb.
clearasrdb        Clears the blacklist and re-enables every component, whether it was disabled manually or automatically during a Power-On Self-Test.
clearfault        Manually clears a fault within the system. Use the showfaults -v command to identify the fault to clear.
consolehistory    Displays the host System Console output buffers.
disablecomponent  Disables defined components by adding them to a blacklist. Blacklisted components cannot be re-enabled until the enablecomponent command is issued and a power cycle or reset occurs.
enablecomponent   Enables a defined component by removing it from the blacklist. Blacklisted components cannot be re-enabled until the enablecomponent command is issued and a power cycle or reset occurs.
powercycle        Powers the system down and powers it back up.
poweroff          Removes the main power from the host system.
poweron           Applies the main power to the host system or Field Replaceable Units (FRUs).
removefru         Prepares a FRU for removal and illuminates the host system’s “OK-to-Remove” Status LED.
setdefaults       Resets all ALOM CMT configuration parameters to their default values.
setkeyswitch      Controls the status of the Virtual Keyswitch.
setlocator        Turns a system’s Locator LED on or off. This only works on host systems that have Locator LEDs.
showcomponent     Displays system components and their respective test status. Every Device Identifier is displayed when this command is used.
showenvironment   Displays the environmental status of the host system, as follows: System Temperatures, Power Supplies, Locator LED, Cooling / Fans, Voltage and Current Sensors, and Virtual Keyswitch Position.
showfaults        Displays current system faults. The output includes the fault ID, the faulted FRU ID, and the fault message.
showfru           Displays information about FRUs within a host system.
showkeyswitch     Displays the status of the Virtual Keyswitch.
showlogs          Displays the history of all events logged in the ALOM CMT event buffer.
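
When working on a live system, it can help to separate the commands listed above into those that only display information and those that change the state of the host or the controller. The sketch below is one way to encode that split. The grouping is an assumption based solely on the descriptions above, and the helper name `needs_confirmation` is hypothetical; it is not part of any Sun tool.

```python
# Commands from the list above whose descriptions say they only display information.
READ_ONLY = {
    "consolehistory", "showcomponent", "showenvironment", "showfaults",
    "showfru", "showkeyswitch", "showlogs",
}

# Commands from the list above whose descriptions say they change state.
STATE_CHANGING = {
    "bootmode", "break", "clearasrdb", "clearfault", "disablecomponent",
    "enablecomponent", "powercycle", "poweroff", "poweron", "removefru",
    "setdefaults", "setkeyswitch", "setlocator",
}


def needs_confirmation(command_line: str) -> bool:
    """Return True unless the first word is a known display-only command.

    Unknown commands are treated as state-changing, to stay on the safe side.
    """
    words = command_line.split()
    if not words:
        raise ValueError("empty command line")
    return words[0] not in READ_ONLY


print(needs_confirmation("showfaults -v"))
print(needs_confirmation("powercycle"))
```

A wrapper script or checklist could use such a check to let the display commands run freely during diagnosis while pausing before anything that resets, powers off, or reconfigures the system.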


Detective TIP: When you’re on a Menu page and you see a list of commands or terms, simply hover your mouse over the command and a popup will appear with a short description of what the command does.

This section is generic and contains all of the available hardware administration tasks. If you know that you will be servicing a system with an RSC (Remote System Control) board, you can quickly find the RSC command options and examples in this section.

This subsection also contains modules describing Sun’s diagnostic features that are built into the firmware and Solaris to maximize system uptime. CHS (Component Health Status) is an example of this type of function. CHS will set a fault bit in some major FRUs when it determines that an error has occurred. This fault bit will reside in the faulty FRU and will disable the FRU during the POST (Power-On Self-Test). The CHS module fully explains this procedure.
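
The fault-bit behavior described above can be pictured with a toy model. This is a conceptual sketch only: the class, the function names, and the FRU names are made up for illustration and do not reflect how the firmware is actually implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fru:
    name: str
    fault_bit: bool = False  # in this model the bit is stored with the FRU itself


def record_error(fru: Fru) -> None:
    """Model of CHS marking a FRU once it decides an error has occurred."""
    fru.fault_bit = True


def post(frus: List[Fru]) -> List[str]:
    """Model of POST: return the names of FRUs left enabled (fault bit clear)."""
    return [f.name for f in frus if not f.fault_bit]


frus = [Fru("board-0"), Fru("board-1")]
record_error(frus[1])
enabled = post(frus)
print(enabled)
```

Because the bit lives in the FRU rather than in a separate table, the model keeps the part disabled on every later POST until the bit is cleared.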

This subsection can be used as a reference as well as training material. Having a firm understanding of this subsection will greatly enhance your troubleshooting ability and efficiency.

Troubleshooting

This subsection covers generic Solaris OBP (OpenBoot PROM) commands that are useful in isolating system faults. OBP commands are low-level firmware commands that can help isolate faults in a system that is unable to boot. SunVTS is also discussed in this section. SunVTS is a test and verification suite that is included with the installation disks. These tests are generally not installed during a standard installation. The procedure for installing and using SunVTS is described in this section. While the information in this subsection is generic in nature, it can be extremely useful in isolating and repairing a faulty system.

System Tuning and Capacity Planning

Reading, understanding, and implementing the ideas in this subsection can make you a hero in your customer’s or boss’s eyes. This subsection has more to do with maximizing efficiency within the system than with troubleshooting. It is important to understand why a system is running slowly and what can be done to improve performance. System tuning and capacity planning can save a company a lot of money by making better use of existing hardware and delaying the purchase of new hardware.