Troubleshooting Intermittent Hardware Problems

What is an Intermittent Problem?

An intermittent problem is one that occurs occasionally or unpredictably. This differs from a predictable problem, e.g. the PC always fails to power up, or always fails when a certain program or function is run. With intermittent problems, a good Problem Log is essential to narrowing down the root cause.

Typical causes of intermittent problems are:

  • Firmware Problem or Configuration: the Operating System or device drivers do not correctly support a component

  • Heat problems: a component (-often the CPU or a graphics card) is getting too hot

  • Power problems: the PSU is supplying too little or too much power, or cannot maintain a constant supply

  • Motherboard problems: a motherboard component (-e.g. Northbridge / MCH) that communicates with the other components is damaged or misbehaving

  • Manufacturing Defect: a particular sub-part of a component is faulty but the PC functions normally until that area is accessed

Hardware is usually very reliable and seldom fails after it has been working correctly for some time (-the exception here is hard drives, which become more likely to fail with age or rough usage).


Firmware Problems

The most common cause of intermittent faults is firmware. Firmware comprises the device drivers required to support each component and their configuration within the Operating System (O/S). As companies strive to produce innovative ways of delivering components, with increased capacity and speed, the components often differ in their interaction with the O/S. This means that a new driver may be required for reliable communication between the device and the O/S, or that different configuration settings must be made before the device will operate consistently.

Firmware problems should be suspected if any of the following are true:

If you suspect driver/configuration problems, then the best course of action is to search the device manufacturer's website for updated drivers or FAQs: manufacturers generally issue updated drivers or fixes once they are made aware of problems. If there are no updates available, you can also try contacting their support department (-giving them as much of your evidence as possible) and asking whether they are aware of the problem: if nothing else, they can generally give you valuable advice on how to troubleshoot their devices further.

Forums are another useful resource: these are sites where questions can be posted and knowledgeable people answer (-if they can). Generally, the best way to proceed is to type a description of the problem (-remove any specifics of your own computer from the message) into a Search Engine and see if it returns any matches. If you do not find the answer (-or even the question), you can sign up to one (-or more) of the technical forums best associated with your configuration and post your question there.


Overheating Problems

Overheating is particularly common in high end gaming PCs, where there are a lot of high power components in a confined space, and in overclocked systems, where the user is running the CPU at a faster clock speed than is recommended. The key to diagnosing overheating is to keep an eye on the temperature of your system over time: if it is inexorably climbing (-even when the fan is running flat out) then this is likely to be the source of the problem.
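
One way to make "inexorably climbing" concrete is to fit a straight line through a series of readings and look at its slope. The following is only a sketch: the function names, the (hours, °C) sample format and the 1.0 °C per hour threshold are assumptions made for illustration, not recommended values.

```python
def slope_per_hour(samples):
    """Least-squares slope of temperature against time.

    samples: list of (hours_elapsed, degrees_celsius) pairs.
    Returns the fitted rise in degrees Celsius per hour.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    numerator = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    if denominator == 0:
        raise ValueError("samples must cover more than one point in time")
    return numerator / denominator


def is_climbing(samples, threshold=1.0):
    """True if the fitted rise exceeds threshold (an assumed, arbitrary limit)."""
    return slope_per_hour(samples) > threshold
```

A steadily positive slope over several hours of similar workload is the pattern to look for; a brief spike while the PC is busy does not mean much on its own.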

You can normally monitor temperatures in the BIOS, but a more convenient way is from within Linux. You can check the temperature of various components using the sensors command, which can be installed from the command line by typing:

sudo apt-get install lm-sensors

Here is an example of using the sensors command:

$ sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1:       +16.6°C  (high = +70.0°C, crit = +99.5°C)

atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:     +1.04 V    (min =  +0.85 V,  max =  +1.60 V)
 +3.3 Voltage:      +3.31 V    (min =  +2.97 V,  max =  +3.63 V)
 +5 Voltage:         +4.86 V    (min =  +4.50 V,  max =  +5.50 V)
 +12 Voltage:       +11.73 V  (min = +10.20 V, max = +13.80 V)
CPU FAN Speed:        2537 RPM  (min =  600 RPM)
CHASSIS FAN Speed:2537 RPM  (min =  600 RPM)
CPU Temperature:   +31.0°C  (high = +60.0°C, crit = +95.0°C)  
MB Temperature:     +27.0°C  (high = +45.0°C, crit = +75.0°C)

The useful thing about this is that it can be scheduled with cron to run periodically, with the output redirected to disc. The output can then be imported into a spreadsheet and graphed to identify any trends.
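
As a rough sketch of the logging idea, the Python script below pulls the temperature lines out of the sensors output and appends them to a CSV file that a spreadsheet can open. It assumes the output looks like the example above (the format varies between chips and drivers); the function names and the sensors_log.csv file name are made up for illustration.

```python
import csv
import re
import subprocess
from datetime import datetime

# Matches lines such as "CPU Temperature:   +31.0°C  (high = +60.0°C, ...)".
TEMP_LINE = re.compile(r"^\s*(?P<label>[^:]+):\s*\+?(?P<value>-?\d+(?:\.\d+)?)\s*°C")


def parse_temperatures(sensors_text):
    """Return {label: degrees Celsius} for each temperature line.

    If two chips use the same label (e.g. temp1), the later one wins.
    """
    readings = {}
    for line in sensors_text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            readings[match.group("label").strip()] = float(match.group("value"))
    return readings


def append_readings(csv_file, timestamp, readings):
    """Write one 'timestamp,label,value' row per reading."""
    writer = csv.writer(csv_file)
    for label, value in sorted(readings.items()):
        writer.writerow([timestamp, label, value])


def log_once(path="sensors_log.csv"):
    """Run sensors once and append its temperatures to the CSV file at path."""
    result = subprocess.run(["sensors"], capture_output=True, text=True, check=True)
    timestamp = datetime.now().isoformat(timespec="seconds")
    with open(path, "a", newline="") as log_file:
        append_readings(log_file, timestamp, parse_temperatures(result.stdout))
```

To collect readings over time, add a call to log_once() at the end of the script and schedule the script with cron, for example every five minutes.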

There are only two possible cures for overheating:

  • Reduce the heat generated:

    1. Remove components that are no longer required (-e.g. old cards)

    2. Switch to cooler running components (-e.g. CPU and/or graphics card)

  • Increase Cooling:

    1. Make sure the air supply to your PC case is not restricted: PCs are often placed in room corners, with the fan up against a wall: this will restrict the amount of air available for cooling and allow the hot air to accumulate inside the case. Place your case to maximise airflow - and keep it away from other heat sources such as radiators

    2. Add additional fans to your case: a typical case comes with a single fan in the rear but you can also fit a second (-and sometimes a third) fan to suck air into the side/front for the rear fan to expel: thus improving the flow of cool air through the case interior and then out of the rear panel

    3. Think about the placement of components within the case: if possible, try to spread them out, leaving the maximum amount of space between drives and cards to facilitate cooling. Try to place the hotter components on top, as the heat will travel upwards

    4. You may be able to reduce the CPU temperature by reducing the clock speed in the BIOS setup program (-especially in overclocked PCs)

    5. Fit an uprated fan to your CPU: as the CPU generates the most heat inside a PC, you can buy better coolers that sit atop the CPU (-even liquid cooled ones) to lower the processor temperature


Power Problems

The components in your PC are not going to be happy if your PSU cannot supply the right voltage consistently. This is a problem with the PSU, and it can be diagnosed using the sensors command in the same way as overheating problems. Here is the same example output; this time the voltage lines are the ones to look at:

$ sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1:       +16.6°C  (high = +70.0°C, crit = +99.5°C)

atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:     +1.04 V    (min =  +0.85 V,  max =  +1.60 V)
 +3.3 Voltage:       +3.31 V    (min =  +2.97 V,  max =  +3.63 V)
 +5 Voltage:           +4.86 V    (min =  +4.50 V,  max =  +5.50 V)
 +12 Voltage:         +11.73 V (min = +10.20 V, max = +13.80 V)
CPU FAN Speed:        2537 RPM  (min =  600 RPM)
CHASSIS FAN Speed:2537 RPM  (min =  600 RPM)
CPU Temperature:   +31.0°C  (high = +60.0°C, crit = +95.0°C)  
MB Temperature:     +27.0°C  (high = +45.0°C, crit = +75.0°C)

Check that each voltage reading lies between the min and max values shown, and that those limits are within the tolerances for your CPU; a voltage outside these ranges could lead to problems.
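
The range check can be automated along these lines. This is a sketch that assumes voltage lines in the format shown above, with min and max in brackets; the function name is hypothetical and other chips may report voltages differently.

```python
import re

NUMBER = r"[+-]?\d+(?:\.\d+)?"

# Matches lines such as "+12 Voltage:  +11.73 V  (min = +10.20 V, max = +13.80 V)".
VOLT_LINE = re.compile(
    r"^\s*(?P<label>[^:]+):\s*(?P<value>" + NUMBER + r")\s*V\s*"
    r"\(min\s*=\s*(?P<min>" + NUMBER + r")\s*V,\s*"
    r"max\s*=\s*(?P<max>" + NUMBER + r")\s*V\)"
)


def out_of_range_voltages(sensors_text):
    """Return (label, value, min, max) for each voltage outside its limits."""
    problems = []
    for line in sensors_text.splitlines():
        match = VOLT_LINE.match(line)
        if not match:
            continue  # not a voltage line (temperature, fan speed, header...)
        value = float(match.group("value"))
        low = float(match.group("min"))
        high = float(match.group("max"))
        if not low <= value <= high:
            problems.append((match.group("label").strip(), value, low, high))
    return problems
```

An empty list means every voltage that was found is inside its reported limits at that moment. Since the problem is intermittent, a single clean reading proves little, so it is worth running the check repeatedly over time.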

If you find that a constant voltage is not being maintained, you can try removing some of the other components (-if they are not essential) to lighten the load. If that does not help, a new PSU is required.


Motherboard Problems

Motherboard failure is very rare: if it does occur it is often due to either static damage during installation or a power surge (-which is why it is important to protect your PC with a surge protector or - better still - an Uninterruptible Power Supply).

If a component on the motherboard has truly failed, there is only one course of action: a motherboard replacement. This is a major step - and an expensive one - so you need to be absolutely sure that the motherboard is at fault. Be sure to first check that the setup is correct in the BIOS setup program (-they normally have a "reset to factory defaults" option that you can try if all else fails) and also check the motherboard manufacturer's website to see if there is a later version of the BIOS which might fix your problem.

Unfortunately, it is unlikely that you will gain definitive proof that the root cause of the problem lies with the motherboard (-short of swapping it for a new one); it is normally a case of ruling out everything else before being left with the motherboard hypothesis. Here are some guidelines:

  • Use your Problem Log and the Linux Log Files to see if the failure occurs during a function controlled by the motherboard

  • Strip out all but the essential components from your PC and see if the problem persists: for example, remove any cards and all drives, then try booting from a USB stick to see if the problem persists. If not, try adding back components one at a time (-beginning with the drive containing the Operating System) to see if the problems come back: if they do, it could be a component and not the motherboard that is faulty. In these cases, follow the troubleshooting section for that component within this guide to rule it out as the root cause

  • Sometimes, you may find that a particular port or slot is damaged: try moving cards/cables to a different slot or port to see if this fixes things: if so, then the problem lies with the motherboard

