
Ask Al

Question #1
I'm trying to upgrade my Sun Fire V1280 by adding a 1500MHz UltraSPARC IV+ CPU/Memory Board. But the system always hangs when it engages the Power-On Self-Test (POST) sequence during the booting process. My original configuration included a single 1350 MHz UltraSPARC IV CPU/Memory Board. I've tried a couple other 1500MHz UltraSPARC IV+ CPU/Memory Boards that we have around the lab and they also failed during POST. How do I troubleshoot this problem?

Whenever adding a new component - especially an upgraded item - ensure you have followed all the required configuration rules and guidelines for the system. You can find these installation rules and guidelines in the On-Line! Detective. For the scenario you're experiencing, click on the "Servers" tab within the top menu and then click on "Sun Fire V1280" in the left-hand menu.

Scroll down to the very bottom of the system's landing page. Under the "Additional Information" heading there are several links that can be used to obtain the component's configuration rules and guidelines. You should always verify that you meet all of these configuration rules and guidelines BEFORE conducting any type of upgrade.

Under the "Minimum Operating System Notes" heading within "Additional Information," you will see that 1500MHz UltraSPARC IV+ CPU/Memory Boards were not supported until Sun released Solaris 9 9/05. Ensure you are running Solaris 9 9/05 at a minimum. If you're not, halt the CPU/Memory Board upgrade.
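If you want to script this check, the comparison can be sketched as follows. This is a minimal sketch, not a Sun tool: the function names are hypothetical, and it assumes release strings in the plain "Solaris 9 9/05" form (major release, then month/two-digit year) with a 20xx release date. It deliberately refuses to compare across major releases, since the note above only states the Solaris 9 minimum.

```python
import re

REQUIRED = "Solaris 9 9/05"  # minimum release quoted in the note above


def parse_release(name):
    """Parse a string like 'Solaris 9 9/05' into (major, year, month)."""
    match = re.fullmatch(r"Solaris (\d+) (\d{1,2})/(\d{2})", name.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {name!r}")
    major, month, year = (int(group) for group in match.groups())
    return major, 2000 + year, month  # assumption: two-digit years are 20xx


def meets_minimum(installed, required=REQUIRED):
    """Return True if 'installed' is the same major release and not older."""
    inst = parse_release(installed)
    req = parse_release(required)
    if inst[0] != req[0]:
        # Assumption: a different major release has its own minimum; look it up.
        raise ValueError("different major release; check that release's notes")
    return inst[1:] >= req[1:]


if __name__ == "__main__":
    for release in ("Solaris 9 9/04", "Solaris 9 9/05"):
        print(release, "->", meets_minimum(release))
```

Comparing (year, month) rather than the raw text avoids ranking "9/05" below "12/03" by accident.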

This is just one factor in ensuring your system can handle the newer 1500MHz UltraSPARC IV+ CPU/Memory Boards. From the situation you described, however, the problem is most likely a system firmware compatibility issue, because the system is failing during POST, before Solaris has even been loaded.

To determine which firmware versions are compatible with different hardware configurations, click on "CPU" within the left-hand menu. This takes you to a page describing all the different CPU/Memory Board options. Scroll to the bottom of the page to see the CPU/Memory Board configuration rules and guidelines under the "Notes" heading. Make sure that you adhere to all of these rules.

For example, these rules and guidelines state that 1500MHz UltraSPARC IV+ CPU/Memory Boards require a minimum system controller firmware of 5.19. Another note indicates that system controller firmware 5.19.1 may cause a panic (BugID 6319704) and that firmware revision 5.19.2 or later should be used. Both guidelines must be followed to ensure full compatibility within the system.
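The two firmware notes can be combined into one check, sketched below. The helper names and status strings are hypothetical. The sketch also assumes that a bare "5.19" means 5.19.0, and that 5.19.0 meets the minimum but falls below the recommended revision. That reading of the notes is an interpretation, not something the notes state outright.

```python
MIN_SUPPORTED = (5, 19, 0)  # minimum for 1500MHz UltraSPARC IV+ boards
RECOMMENDED = (5, 19, 2)    # 5.19.1 may cause a panic (BugID 6319704)


def parse_version(text):
    """Turn '5.19.2' into (5, 19, 2); pad short forms like '5.19' with zeros."""
    parts = [int(piece) for piece in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_firmware(text):
    """Classify a system controller firmware version against both notes."""
    version = parse_version(text)
    if version < MIN_SUPPORTED:
        return "unsupported"  # below the 5.19 minimum
    if version < RECOMMENDED:
        return "upgrade"      # meets the minimum, but below 5.19.2
    return "ok"


if __name__ == "__main__":
    for candidate in ("5.19", "5.19.1", "5.19.2"):
        print(candidate, "->", check_firmware(candidate))
```

Versions are compared as integer tuples because plain string comparison would rank "5.9" above "5.19".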

Not all notes are generic. If you know a part's 7-digit part number, scroll up and click on the part number to see notes specific to that component.

Determining Configuration Compatibility

Most problems that occur after a component upgrade are a result of software, firmware or hardware incompatibility. It is imperative that you follow ALL configuration rules and guidelines when upgrading a system. A summary of the steps required to locate the configuration rules and guidelines for any major component is outlined below.

1. Go to your system's landing page by clicking either the "workstation" or "server" tab, then choose the system that you're working on.

2. Scroll down to the very bottom of the page and read all the configuration rules and guidelines under the heading "Additional Information." Make sure you meet or exceed all of the rules and guidelines specified in this section. If needed, click on some of the links for more information.

3. Under the "Primary Components" heading within the left-hand menu, click on the component you are upgrading.

4. Scroll down to the bottom of this page and read all the configuration rules and guidelines in the "Notes" table. Make sure you also meet or exceed all of these configuration rules and guidelines.

5. Scroll up until you see the specific part number you are working with. Click on the part number. This takes you to a page with detailed information about that part's configuration rules and guidelines.

6. Scroll down to the bottom of this page and read the configuration rules and guidelines under the "Notes" heading. Make sure you also meet or exceed all of these configuration rules and guidelines.

If you follow these six steps every time, you can be reasonably assured you won't confront an operating system, firmware, or hardware compatibility issue.

Always follow the FRU removal and replacement procedures in the On-Line! Detective for Sun. You will find hardware configuration rules and guidelines as well as installation tips and slot information.

Other Possible Problems

If you have followed all of the steps above and are still having hardware issues, remove the component from the system and check for bent pins or damaged connectors. Reseat the component and inspect the connection. If possible, try a different slot and try to isolate the component by going down to a minimum configuration. If you still are having problems, you can make a reasonable assumption that you are working with a faulty component.


Question #2
I have a SCSI hard disk drive that is inoperative. Unfortunately, we don't have a data backup system and we need to retrieve some critical information that is still on the drive. We looked at sending the drive out to have the data retrieved; however, the cost is prohibitive. Do you have any "last ditch" recommendations for retrieving data off a defective disk drive?

There are two methods for retrieving information off a failed hard disk drive. The first is non-destructive, and the second is destructive. Try method one first. If that does not produce satisfactory results, try method two as a “last ditch” effort.

Method One: Non-Destructive
The following procedures are successful only 30% to 50% of the time. It's important to follow the procedures in sequence because they increase in risk until you reach the point where the disk is effectively destroyed. Keep in mind that even though this method is "non-destructive," further damage could occur by following this process.

1. If "mechanical" noises are coming from inside the disk, don't run the disk until the source of the noise is identified (or until you have verified that the disk can be run without destroying it). Go to Method Two, step 1.

2. Double-check to see whether the disk drive is actually faulty. If possible, move the disk drive to a different system. Make the disk drive the only device on the SCSI chain with all new cables and terminators. If the disk still fails, go to step 7.

3. If step two is not possible, change all external cables and terminators. Components such as cables, SCSI IDs and terminations are a major source of disk problems.

4. Remove all other devices from the SCSI chain to isolate the faulty drive on the SCSI Bus. It is not uncommon for a different device on the SCSI chain to corrupt the SCSI chain and make other devices on the chain appear defective.

5. Install the disk on a different power supply by swapping it into another enclosure. Do not assume that if you measure the correct voltages with a volt/ohm meter that the power supply is good. Poor DC voltage regulation (fluctuating voltages) can appear good on the volt/ohm meter and still cause drive problems.

6. A failed fan is a common problem. Verify that there is proper air circulation around the disk drive. An overheated drive will generate errors.

7. If the drive makes a high-pitched squeal, the noise may be originating from a bad fan bearing or a bad disk spindle bearing. This is commonly misdiagnosed as a crashed drive. This condition is very common after the system/drive has been powered off for an extended time. If the system is left on, the bearing will usually seat and the noise will generally go away.

8. Verify that the SCSI termination voltage is +5 V at both the computer and the disk. If the voltage is missing, try to determine why and fix the problem. Termination power is usually protected by a fuse or PTC on the disk drive logic board. If the voltage is too low, try enabling the termination voltage at both ends of the cable. This is not normally recommended, since it can lead to a ground loop, which increases noise levels. If you are a hardware hacker and the voltage is too low, you can even try replacing the termination regulator with a higher-drive component.

9. If the drive does not appear to spin up and the drive has been powered off for an extended time, the lubricant used on the rails may be sticking, thus preventing the drive from loading. This can usually be fixed by jarring the disk drive. With no power applied to the disk, drop the disk drive flat on its back from a height of about a foot onto a wooden surface.

10. Try replacing the drive logic board with the logic board from another identical drive. Depending on the drive, this may be fairly easy or virtually impossible. Also, note that with drives that store various parameters in EEPROM, it may be difficult to recover data even if the drive is usable after the swap. Around 30% to 50% of drive failures are caused by a bad circuit board on the drive. Do a comprehensive visual examination of the electronics and look for blackened or discolored components. Also examine the traces on the board, looking for signs of heat (melting, balled-up solder, curled traces, etc.).

Method Two: Destructive
The procedure outlined below is destructive. Use this procedure only if you are desperate and you have no other options. The steps outlined below will most likely destroy the disk drive and will definitely void the warranty on the drive. They should only be performed if you have ruled out using a professional service and you are willing to LOSE ALL DATA ON THE DRIVE. There is only a small chance that you will be able to retrieve the data. If you want to make every possible effort to retrieve the data, then continue with the following steps.

1. Physically remove the disk from all power sources, then, observing proper anti-static precautions, carefully open up the disk enclosure. DO NOT TOUCH THE PLATTER SURFACES (or allow anything to come in contact with them)! Determine whether the disk can be run with the case open (i.e., everything is screwed in tight). Some disks can run with the case open. If the disk was making noises, try to determine the source.

2. Assuming everything is screwed in tight, turn the disk over and shake it to see if anything falls out. If it does, try to determine what it is and where it came from. If it's an electrical component, chances are you may have to replace the electronics (with electronics from an identical drive).

3. Examine the platter surfaces for scratches (a bench light or flashlight is required). If a surface is scratched by one of the heads (or arms), then that head is probably bad. Data on that surface is probably unrecoverable unless you can replace the head assembly and the scratching isn't too bad. If the scratching is severe enough to impact the rotation of the disk, then that arm may have to be bent or cut away to allow the disk to rotate properly.

4. While touching only the edge of the platters, rotate the disk to check for sticky or rough bearings. Use something extremely clean for this. Hands are not a particularly good choice. Plastic (such as plastic wrap) is generally better, but it is a major source of static, so don't let it touch the electronics.

5. If the disk appears okay mechanically, then hook it up to power and turn it on. If it doesn't start spinning, try giving it a push. If that makes it spin, the problem may be that the motor is no longer powerful enough to start the disk spinning.

6. The problem could be a broken power lead or other wire. Visually check all wires and connectors.

7. If the problem is the disk bearing (if it's sticky or rough), then you could try an infinitesimally small drop of ultra-fine machine oil on the bearing. This may smooth it out enough to run the drive long enough to remove data.

8. If the problem is the head stepper motor assembly, then it is theoretically possible to replace the head and stepper with an identical unit from another drive. At a minimum, this will require a good assortment of miniature tools. This procedure has a pretty low chance of success.

9. If there are no signs of mechanical problems then the problem may be with power. Perform a continuity check between power and ground. This should read low resistance but not zero resistance.

10. Connect the drive to power and check voltages (meter required). Check the electronics board, the stepper motor, and power to the disk motor. Check the fuses or PTCs (if you can find them). If there is no power, you may be able to solder in a jumper wire to bring in power (or ground).

11. In all cases you probably want a low-level disk read/write diagnostic program to determine the effect of the changes you make.

Take all of this advice with a grain of salt and a pound of caution. And, of course, if you aren't a certified technician, you shouldn't ever open up an electrical device anyway--there is a reason that data recovery companies charge so much!


Do you have a question you'd like to see answered in a future issue of eKnowledge? Email Allen at: