I often see device paths in the system logs that refer to a faulty peripheral. How can I interpret these device paths and isolate the faulty component?
Sun's device mapping strategies for the different workstation and server architectures allow the Operating System and firmware to communicate with the different components. Each component has a unique Physical Device Path, and the device driver names are included in the path. Error messages, system logs, and system status commands often display devices by their device paths. It is important to be able to relate the Physical Device Path to an actual physical slot location so the faulty device can be correctly identified.
This document will help explain how to identify slot numbers from a device's fully qualified physical path name. Identifying and managing I/O devices and disks can be very difficult, especially when system configurations are complex.
The physical device name is a fully qualified device name that allows every hardware device in a system to be uniquely identified. The physical address represents a physical characteristic that is unique to the device. Examples of physical addresses include the bus address and the slot number. The slot number indicates where the device is installed. You reference a physical device by the Node Identifier and Agent ID (AID). The AID range varies depending on the system. The AID is usually in hexadecimal notation (0 to 1f). In a device path beginning with ssm@0,0 the first number, 0, is the Node ID.
The full device pathname identifies a device in terms of its location in the device tree by identifying a series of node names separated by slashes, with the root (top level of the path) indicated by a leading slash. The physical device files can be found in the "/devices" directory. Each node name in the device pathname has the following form:
driver-name - identifies the device name
@unit-address - is the physical address of the device in the address space of the parent
device arguments - defines additional information regarding the device software
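The node-name form described above can be illustrated with a short Python sketch that splits a full device path into its nodes. It assumes the usual OpenBoot layout in which the unit address follows "@" and the device arguments follow ":". The names Node and parse_device_path are hypothetical, chosen only for this illustration.

```python
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    """One slash-separated node of a full device path (illustrative type)."""
    driver_name: str
    unit_address: Optional[str]
    arguments: Optional[str]


def parse_device_path(path: str) -> List[Node]:
    """Split a full device path into (driver-name, unit-address, arguments) nodes.

    Assumption: each node looks like driver-name@unit-address:arguments,
    where the @ and : parts are optional.
    """
    if not path.startswith("/"):
        raise ValueError("a full device path starts with a leading slash")
    nodes = []
    for part in path.strip("/").split("/"):
        if not part:
            continue  # the root alone ("/") has no child nodes
        name_and_addr, _, args = part.partition(":")
        name, _, addr = name_and_addr.partition("@")
        nodes.append(Node(name, addr or None, args or None))
    return nodes


if __name__ == "__main__":
    for node in parse_device_path("/pci@6,4000/scsi@2"):
        print(node)
```

The driver name of the last node (scsi in this path) is the part to compare against the driver prefix table.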
Matching the driver names in the physical path against known device drivers helps identify the type of device. For example, if you see "hme" in the device path you can conclude that the device in question is a hundred megabit Ethernet controller (hme). The table below describes some commonly used device driver prefixes.
|Device Driver Prefix|Description
||Fast/Wide SCSI Controller
||Fast (10/100 Mb/sec) Ethernet
||Differential SCSI controllers and the SunSwift card
||Small Computer System Interface (SCSI) devices
||soc+ or socal Fibre Channel Arbitrated Loop (FC-AL)
||SPARC Storage Array (SSA) controllers
||soc SPARC Storage Array (SSA) controllers
Below is an example of a device path seen in an Enterprise 450 Server.
The simplest way to decode this path is to navigate to the Main System Landing page in the Detective. In this example we will click on the "Server" tab and then select the Enterprise 450 Server. On the left-hand menu under Related Items click on "Device Mapping" then click on "Controller Mapping" and lastly click on Enterprise 450 Slot Specifications and Notes. This will bring you to the following table:
|Full Device Path Name
||External SCSI Port
Each of the workstations and servers in the Detective has a device mapping table. Using the example "/pci@6,4000/scsi@2" we can match the path from the system logs to the full device path in the table. In this example we can conclude that there is a problem with the PCI board in slot 3. From the device driver name we can also conclude that the PCI board in slot 3 is a SCSI controller (see the Device Driver Prefix table above).
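As a rough illustration of the matching step, the Python sketch below looks up a logged path in a small mapping table. The table holds only the single Enterprise 450 entry named in this answer. The names SLOT_TABLE and locate are hypothetical, and a real table would be copied from the Detective's mapping page for the system in question.

```python
from typing import Dict, Optional

# Hypothetical one-row excerpt of a slot mapping table; only this entry
# comes from the example in the text.
SLOT_TABLE: Dict[str, str] = {
    "/pci@6,4000/scsi@2": "PCI slot 3",
}


def locate(logged_path: str, table: Dict[str, str] = SLOT_TABLE) -> Optional[str]:
    """Return the location of the longest table path that the logged path
    equals or extends, or None when nothing matches."""
    best_prefix = None
    for prefix in table:
        is_match = (
            logged_path == prefix
            or logged_path.startswith(prefix + "/")  # a child device of the controller
            or logged_path.startswith(prefix + ":")  # same node with device arguments
        )
        if is_match and (best_prefix is None or len(prefix) > len(best_prefix)):
            best_prefix = prefix
    return table[best_prefix] if best_prefix is not None else None


if __name__ == "__main__":
    print(locate("/pci@6,4000/scsi@2"))
```

Matching on whole nodes rather than raw string prefixes keeps a path such as /pci@6,4000/scsi@20 from being mistaken for the controller at scsi@2.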
I'm having problems with the fibre hard disk drives in my Sun Fire V490. I cannot isolate the problem to a specific component. Can you give me some general troubleshooting tips?
The Sun Fire V490 contains a standard Fibre Channel (FC) Backplane that accommodates up to two FC-AL Hard Disk Drives. An on-board FC-AL controller integrated into the System Board controls the loop.
A Hard Disk Drive failure typically manifests itself as a disk drive read, write, or parity error. The Hard Disk Drive that is indicated by the fault message and/or fault LED should be replaced.
If an internal FC-AL Hard Disk Drive does not respond to commands or fails to boot, or if the FC-AL loop fails to initialize or fails OpenBoot Diagnostics tests that pertain to the Hard Disk storage subsystem, follow the procedure outlined below to help isolate the fault.
STEP BY STEP: FC-AL Troubleshooting
1. Bring the system to the OpenBoot PROM "ok" prompt.
2. At the "ok" prompt, type:
ok setenv auto-boot? false
3. At the "ok" prompt, type:
4. At the "ok" prompt, type:
ok setenv diag-level max
5. At the "ok" prompt, type:
ok setenv diag-switch? true
6. At the "ok" prompt, type:
ok setenv test-args verbose
7. Verify all cables attached to the FC-AL backplanes are properly connected.
8. Power on the system and observe the POST status messages.
9. If POST reports a problem, replace the component indicated by the failure message and repeat POST diagnostics until the problem is resolved. If no error is detected in POST, continue with the next step.
10. At the "ok" prompt, type obdiag. The OpenBoot Diagnostics menu is displayed, followed by the "obdiag>" prompt.
11. Test segment 5 of the I2C bus i2c@1,30 (obdiag test 14) to verify that it is operating correctly. The test must pass in order to properly test the FC-AL subsystem. If test 14 fails, run the obdiag tests on the remaining I2C segments and replace the component or components indicated by the failure messages. Segment 5 test failures can also result from a faulty I2C cable.
After verifying that the I2C segments are operating correctly, test the Hard Disk controllers in the following order:
1. Test 4 controller@0,16 - base backplane Loop A
2. controller@0,1c - expansion backplane Loop A (if installed)
3. Test 5 controller@0,1a - base backplane Loop B
4. controller@0,1e - expansion backplane Loop B (if installed)
If the tests indicate a problem with the DPM, CRC, SSC-100, SSC-050, or LM75, the source of the problem is the FC-AL backplane. Replace the backplane and repeat the test.
If a loop-empty sub-test fails in a single backplane configuration, replace the backplane and repeat the test. If a loop-empty sub-test fails in a dual-backplane configuration, remove the FC-AL data cables between backplanes and repeat the test. If the failure persists, replace the backplane under test. Otherwise, the failure may be traced to the other backplane or the FC-AL cables between the two. If a failure message identifies one or more specific disks, replace the disks with known good disks and repeat the test.
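The backplane rules in the two paragraphs above can be written down as a small decision helper. This Python sketch is only a memory aid under stated assumptions: the function name suggest_action, its parameters, and the returned wording are hypothetical, and the failing item is assumed to be typed in by hand from the obdiag failure message.

```python
from typing import Optional

# Items that, per the text, point to the FC-AL backplane when a test flags them.
BACKPLANE_ITEMS = {"DPM", "CRC", "SSC-100", "SSC-050", "LM75"}


def suggest_action(
    failed_item: str,
    dual_backplane: bool = False,
    persists_without_cables: Optional[bool] = None,
) -> str:
    """Map a failing item from the backplane controller tests to a next step.

    persists_without_cables is None until the test has been repeated with the
    FC-AL data cables between the backplanes removed (dual-backplane case only).
    """
    item = failed_item.strip()
    if item.upper() in BACKPLANE_ITEMS:
        return "replace the FC-AL backplane and repeat the test"
    if item.lower() == "loop-empty":
        if not dual_backplane:
            return "replace the backplane and repeat the test"
        if persists_without_cables is None:
            return "remove the FC-AL data cables between backplanes and repeat the test"
        if persists_without_cables:
            return "replace the backplane under test"
        return "suspect the other backplane or the FC-AL cables between the two"
    return "no rule in this sketch; follow the failure message"


if __name__ == "__main__":
    print(suggest_action("LM75"))
    print(suggest_action("loop-empty", dual_backplane=True))
```

Failure messages that name specific disks are left out on purpose, since the only rule there is to swap in known good disks and repeat the test.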
If the preceding tests did not fail, run the FC-AL controller tests in the following order:
1. Test 1 SUNW,qlc@2 - on-board FC-AL controller (Loop A)
2. Test 2 SUNW,qlc@4 - PCI FC-AL controller (Loop B, if installed)
Other types of failures during the on-board controller test usually indicate a problem with the System Board or the System Board FC-AL cable. When testing the PCI controller, these types of failure messages point to the PCI Card or the FC-AL cable between the card and the base backplane. In a dual-backplane configuration, removing the FC-AL cables and repeating the test can help to isolate the problem.
Do you have a question you'd like to see answered in a future issue of eKnowledge? Email Allen at: email@example.com