Troubleshooting New Devices
When you install a new device, you take the chance that a stable machine will suddenly cease to function or become erratic. Basic diagnostic procedures always apply. If possible, undo what you just did and see if the problem goes away. If not, begin analyzing what changes were made and correlate them to the symptoms you're seeing. This section contains some common tools to help you diagnose problems caused by adding components.
Device Driver Update and Rollback
The easiest way to update a driver is by using the Device Manager console. Obtain the Windows Server 2003/XP or Windows 2000 driver and the associated INF setup script and proceed as directed in Procedure 3.3.
Procedure 3.3 Device Driver Update
Right-click the device icon and select Update Driver. This launches the Hardware Update Wizard.
Select the Install from a List or Specific Location option.
Click Next. The Please Choose Your Search window opens. Select the Don't Search option.
Click Next. The Select Network Adapter window opens. (This will vary, of course, depending on the device you are updating.)
Click Have Disk. A browse window opens. Navigate to the location of the driver and the associated INF setup script.
Select the INF script corresponding to the device. The system displays the strings in the INF. Verify that they match the device you are updating.
Click Next to install the drivers.
Click Finish to close the wizard.
At this point, the Properties window for the device should show the new version information. If the machine becomes unstable or crashes after the update, you can boot to Safe mode and then roll back the device to the old driver until you figure out what happened. Follow Procedure 3.4.
Procedure 3.4 Device Driver Rollback
Open the Properties window for the device.
Select the Driver tab (see Figure 3.19).
Figure 3.19. Device Driver properties showing Roll Back Driver option.
Click Roll Back Driver.
The system asks if you're sure. Click OK to complete the job.
The system obtains the old driver out of the driver library and reinstalls it. The Properties window refreshes with the old version number.
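The INF check in Procedure 3.3 (verify that the strings match the device you are updating) can also be done before you ever launch the wizard. The following is a minimal sketch, assuming the INF is plain text with a [Strings] section of key = "value" lines. The function name and the sample INF contents are hypothetical, and the comment handling is simplified (a semicolon inside a quoted value would be cut short).

```python
def read_inf_strings(inf_text):
    """Return the key/value pairs from the [Strings] section of INF text."""
    strings = {}
    in_strings = False
    for raw in inf_text.splitlines():
        # Simplification: everything after a semicolon is treated as a comment.
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            in_strings = line[1:-1].strip().lower() == "strings"
            continue
        if in_strings and "=" in line:
            key, value = line.split("=", 1)
            strings[key.strip()] = value.strip().strip('"')
    return strings


# Hypothetical INF fragment, for illustration only.
SAMPLE_INF = """\
[Version]
Signature="$Windows NT$"
Class=Net

[Strings]
Mfg = "Example Corp"          ; hypothetical vendor
Nic.DeviceDesc = "Example Fast Ethernet Adapter"
"""

if __name__ == "__main__":
    for key, value in read_inf_strings(SAMPLE_INF).items():
        print(f"{key}: {value}")
```

If the description strings do not name the device you are holding, you probably have the wrong INF.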
Building a Windows Server 2003 Boot Disk
A machine often refuses to boot after you add a new mass storage device or interface. This generally happens because the additional drive or interface changed the ARC path to the Windows Server 2003 boot partition. Other reasons for failure to boot include the following:
The Windows Server 2003 system files at the root of the boot partition have been corrupted or deleted.
The boot sector has been corrupted by a virus.
A user ran SYS against the C drive in an attempt to make the machine dual-boot.
The primary drive in a mirrored set has failed (software RAID only).
The Ntbootdd.sys driver has been deleted or corrupted on a system with a SCSI interface that has no BIOS.
In all these situations, the core Windows Server 2003 files in the \Windows directory are probably just fine; all you need to do is bypass the corrupted files and disk structures at the beginning of the boot drive. You can do this on IA32 machines by booting to a floppy that contains the same Windows Server 2003 system files found at the root of the boot disk. This is called a fault tolerant boot disk.
It is not necessary, nor is it possible, to use a fault tolerant boot disk on an IA64 machine. Use the EFI shell to select a boot partition. See Chapter 1 for details.
When building a fault tolerant boot disk, it's important to use a floppy that has been formatted on a machine running Windows Server 2003 or XP. The Windows Server 2003 system files are capable of booting earlier versions of NT-based operating systems, but the opposite is not true. You cannot boot Windows Server 2003 using a fault tolerant disk built on a Windows 2000 server.
To configure a fault tolerant boot disk in Windows Server 2003, follow Procedure 3.5.
Procedure 3.5 Creating a Fault Tolerant Boot Floppy Disk
Use the ATTRIB utility to remove the read-only and hidden attributes from the following system files at the root of the boot drive:
Ntldr
Ntdetect.com
Boot.ini
Ntbootdd.sys (if present)
Insert a blank floppy disk into the floppy disk drive and format it. You can use the Quick Format option if you're sure that there are no disk defects.
When format is complete, copy the files listed in Step 1 to the floppy.
Open Notepad and use it to edit the Boot.ini file on the floppy disk.
Change the timeout entry in the [boot loader] section to -1. This disables the countdown timer. (You do not have to do this, but I find it helpful to keep the counter from ticking down when I'm troubleshooting.)
Save the changes and close Notepad.
Restart the computer and boot from the fault tolerant boot floppy disk.
When the BOOT menu appears, highlight the entry representing the partition containing the Windows Server 2003 system files and press Enter to finish the boot.
The ARC path in Boot.ini tells NTLDR where to find the Windows Server 2003 boot files on the hard drive. The remainder of the boot process should proceed normally. At this point, the floppy disk is no longer needed. Remove it from the drive.
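If you build boot floppies regularly, the Notepad edit from Procedure 3.5 is easy to script. The following is a minimal sketch, assuming the Boot.ini text has already been read into a string. The function name and the sample Boot.ini contents are hypothetical; only the timeout entry in the [boot loader] section is touched.

```python
def set_boot_timeout(boot_ini_text, seconds):
    """Return Boot.ini text with the [boot loader] timeout set to `seconds`."""
    lines = []
    in_boot_loader = False
    replaced = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_boot_loader = stripped.lower() == "[boot loader]"
        elif in_boot_loader and "=" in stripped:
            key = stripped.split("=", 1)[0].strip().lower()
            if key == "timeout":
                line = f"timeout={seconds}"
                replaced = True
        lines.append(line)
    if not replaced:
        raise ValueError("no timeout entry found in [boot loader]")
    return "\n".join(lines) + "\n"


# Hypothetical Boot.ini contents, for illustration only.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect
"""

if __name__ == "__main__":
    print(set_boot_timeout(SAMPLE_BOOT_INI, -1))
```

The sketch raises an error rather than adding a missing timeout line, so a malformed Boot.ini gets noticed instead of silently patched.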
Keep a copy of the fault tolerant boot disk handy for booting workstations and servers in the field. If the boot partition is larger than 7.8GB, or has a non-standard geometry, or is on a SCSI drive with an interface that has no BIOS, the ARC path will have a signature() entry. The number in the parentheses is a unique ID written to the Master Boot Record (MBR) of the boot drive. The Boot.ini for this machine will not work on another machine. Clearly label the boot floppy disk and keep it in a safe place.
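To decide whether a given floppy is portable, it helps to break the ARC path into its parts and check for a signature() entry. The following is a minimal sketch, not a complete ARC grammar. The names ArcPath, parse_arc_path, and is_machine_specific are hypothetical, and the signature value in the usage lines is made up.

```python
import re
from typing import NamedTuple


class ArcPath(NamedTuple):
    controller: str      # "multi", "scsi", or "signature"
    controller_arg: str  # kept as text; signature() values are hex IDs
    disk: int
    rdisk: int
    partition: int
    directory: str       # for example "\\WINDOWS"


_ARC_RE = re.compile(
    r"^(multi|scsi|signature)\(([0-9a-f]+)\)"
    r"disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)"
    r"(\\.*)?$",
    re.IGNORECASE,
)


def parse_arc_path(text):
    """Split an ARC path from Boot.ini into its components."""
    match = _ARC_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized ARC path: {text!r}")
    kind, arg, disk, rdisk, partition, directory = match.groups()
    return ArcPath(kind.lower(), arg, int(disk), int(rdisk),
                   int(partition), directory or "")


def is_machine_specific(path):
    """A signature() path is tied to the MBR of one particular drive."""
    return path.controller == "signature"


if __name__ == "__main__":
    # The signature value below is a made-up example.
    for text in (r"multi(0)disk(0)rdisk(0)partition(1)\WINDOWS",
                 r"signature(8b467c12)disk(0)rdisk(0)partition(1)\WINDOWS"):
        parsed = parse_arc_path(text)
        print(parsed, "machine-specific:", is_machine_specific(parsed))
```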
Resolving SCSI Problems
If Plug and Play Manager does not recognize the controller or the disks on the controller, or you get data corruption or a significant number of errors in the Event log relating to the SCSI devices on that interface, the most likely causes are improper termination or excessive cable length. The following can cause improper termination:
Mixing active and passive terminations.
Mixing cable types, which can cause impedance matching and timing problems.
Having too many terminations, such as having the SCSI controller in the middle of the bus with active termination enabled on the controller and terminators at either end of the cable.
Forgetting to attach the resistor pack.
Attaching a resistor pack when active termination is enabled.
Thinking that you have enabled active termination on a device but actually putting the jumper on the wrong pins.
Thinking that you have disabled active termination on a device but actually removing the jumper from the wrong pins.
The cable and terminator configuration might have worked in NT4, but Windows Server 2003 puts much greater demand on the hard disk interface to boost performance. Weaklings break down quickly. Replace the interface with one on the HCL. If it is built into the motherboard, get a PCI adapter and disable the motherboard interface.
If you already have a SCSI adapter and you add a second one of the exact make and model, you may need to disable the BIOS on the second adapter to keep it from squabbling with the first adapter. This does not affect Windows Server 2003 functionality; if you put an installation of Windows Server 2003 on a disk connected to the second adapter, however, Setup will copy the SCSI miniport driver to the root of the partition and name it Ntbootdd.sys so that NTLDR can use the driver to scan the SCSI bus.
If you have mirrored drives and the primary drive fails, you can use the fault tolerant boot floppy disk to boot from the mirrored partition. If the mirrored drive is the second disk on a SCSI chain, for example, the ARC path would be multi(0)disk(0)rdisk(1)partition(1)\WINNT.
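Pointing an existing entry at the mirror amounts to changing the rdisk() value. The following is a minimal sketch, assuming a multi()-style path like the one in the example above. The function name retarget_rdisk is hypothetical.

```python
import re


def retarget_rdisk(arc_path, rdisk):
    """Return the ARC path with its rdisk() value replaced."""
    new_path, count = re.subn(r"rdisk\(\d+\)", f"rdisk({rdisk})",
                              arc_path, count=1)
    if count == 0:
        raise ValueError(f"no rdisk() entry in {arc_path!r}")
    return new_path


if __name__ == "__main__":
    primary = r"multi(0)disk(0)rdisk(0)partition(1)\WINNT"
    print(retarget_rdisk(primary, 1))
```

Add the resulting path as a second entry in the Boot.ini on the floppy rather than overwriting the first, so the same disk can boot either drive.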
Setup will also use a signature() controller ID in the ARC path in Boot.ini to identify the drive. The parameter in the parentheses of signature() is a special signature in the Master Boot Record placed there by Setup.
You may get an Unable to Locate Operating System error after adding a second SCSI controller. You can thank PCI for that. In most cases, the SCSI adapter with the lowest IOBase address is assumed to be the boot host adapter. In PnP systems, the PCI slot closest to the CPU normally gets the lowest IOBase address. Therefore, if you installed the second adapter in a slot closer to the CPU (lower number), you changed your active drive designation. Try swapping PCI slots. Plug and Play Manager assigns component identifiers based on their PCI slot; therefore, by swapping slots, you'll force a new PnP enumeration.
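The selection rule described above can be pictured as taking the minimum over IOBase addresses. The following is only an illustration of that rule of thumb, not of any real BIOS logic. The adapter names, slot numbers, and addresses are all invented.

```python
from typing import NamedTuple


class ScsiAdapter(NamedTuple):
    name: str
    pci_slot: int
    io_base: int


def likely_boot_adapter(adapters):
    """Pick the adapter with the lowest IOBase address."""
    return min(adapters, key=lambda adapter: adapter.io_base)


if __name__ == "__main__":
    # Hypothetical values: the new adapter landed in the slot nearer the CPU.
    adapters = [
        ScsiAdapter("original adapter", pci_slot=3, io_base=0xE400),
        ScsiAdapter("new adapter", pci_slot=1, io_base=0xE000),
    ]
    print(likely_boot_adapter(adapters).name)
```

In this made-up case the new adapter wins, which is exactly the situation that swapping PCI slots is meant to undo.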
Correcting Non-PnP System Hangs
If the system hangs while NTDETECT is running, you probably have a problem with the motherboard or memory or hard drive interface. The function of NTDETECT is to recognize hardware. If it cannot do its job, it stalls.
You cannot resolve this error until you know what component is causing the hang. There are a couple of ways to find this out. One way is to press F8 at the BOOT menu and select the Boot Logging option. This writes a Boot log to the hard drive; if you can't get booted, however, you cannot read the log.
The second alternative is to use a debug version of Ntdetect.com called Ntdetect.chk. The Ntdetect.chk file is located on the Windows Server 2003 CD in the \Support\Debug\I386 directory. Use Ntdetect.chk as instructed in Procedure 3.6.
Procedure 3.6 Using Ntdetect.chk
Make a fault tolerant boot floppy using Windows Server 2003.
Copy Ntdetect.chk onto the boot floppy.
Rename Ntdetect.com to Ntdetect.old.
Rename Ntdetect.chk to Ntdetect.com.
Reboot using the disk with the renamed Ntdetect.chk.
The debug version of NTDETECT works just like the regular version except that it displays what it detects as each component is encountered. When the system hangs, the last component on the screen is the one causing the problem. Look for IRQ or IOBase conflicts.
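The copy and rename steps in Procedure 3.6 can be scripted if you prepare debug floppies often. The following is a minimal sketch, assuming the floppy already holds a writable Ntdetect.com. The function name and the example paths are hypothetical.

```python
import shutil
from pathlib import Path


def install_debug_ntdetect(floppy_root, chk_source):
    """Swap the debug Ntdetect.chk in as Ntdetect.com, keeping a backup."""
    floppy = Path(floppy_root)
    original = floppy / "Ntdetect.com"
    backup = floppy / "Ntdetect.old"
    staged = floppy / "Ntdetect.chk"

    if not original.is_file():
        raise FileNotFoundError(f"{original} not found; is this a boot floppy?")
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; refusing to overwrite")

    shutil.copyfile(chk_source, staged)  # copy Ntdetect.chk onto the floppy
    original.rename(backup)              # Ntdetect.com -> Ntdetect.old
    staged.rename(original)              # Ntdetect.chk -> Ntdetect.com
    return original


if __name__ == "__main__":
    # Hypothetical drive letters: A: is the floppy, D: is the product CD.
    install_debug_ntdetect("A:\\", "D:\\Support\\Debug\\I386\\Ntdetect.chk")
```

To go back to the regular detector, delete the debug Ntdetect.com and rename Ntdetect.old to Ntdetect.com.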
Tracking Kernel Memory Use
You should try to keep long-term statistics for kernel memory use on your servers so that you can spot abnormal trends. The most convenient tool for doing this is Performance Monitor. Open the Performance Monitor console using START | PROGRAMS | ADMINISTRATIVE TOOLS | PERFORMANCE. Figure 3.20 shows an example.
Figure 3.20. Performance Monitor console.
The Performance Monitor console contains two snap-ins. The System Monitor snap-in is an ActiveX control designed to display performance counters in graphical format. The Performance Logs and Alerts snap-in is designed to collect performance statistics and write them to a log or send alerts to a console or Event log. Logs are the best way to collect long-term performance statistics. Configure a log to collect kernel memory statistics by following the steps shown in Procedure 3.7.
Procedure 3.7 Configuring Performance Monitor to Collect Kernel Memory Statistics
Expand the tree under Performance Logs and Alerts and highlight Counter Logs.
Right-click a blank area in the right pane and select NEW LOG SETTINGS from the flyout menu. The New Log Settings window opens.
Enter a name for the log, such as Long-Term Kernel Memory Use.
Click OK. A management window opens for the log. The window name matches the log name you assigned in Step 3.
Click Add. The Select Counters window opens.
Under Performance Object, select Memory from the drop-down box.
Select the All Counters radio button. Long-term performance data collection involves taking snapshots at infrequent intervals, such as once an hour, so collecting all available counters will not be too much of a burden on the server.
Click Add to add the counters to the log, and then click Close to return to the main log management window.
Set the Sample Data Every value to 1 Hour.
Select the Log Files tab. The default location of the log is a folder called \PerfLogs at the root of the system partition. You can change this location using the Browse button.
The default filename uses the name you assigned to the log plus a six-digit number. If you stipulate a Log File Size Limit at the bottom of the window, a log fills up, then closes, and another begins filling.
Click OK to save the selections and return to the Performance Monitor console.
Collect statistics for a few days, then view the contents of the log using the System Monitor snap-in. To do this, select the log as the source for the chart by following the steps shown in Procedure 3.8.
Procedure 3.8 Charting Performance Monitor Logs
Highlight the System Monitor icon. An empty chart appears in the right pane.
Right-click the chart and select PROPERTIES from the flyout menu. The Properties window opens.
Select the Source tab.
Select the Log File radio button.
Click Browse to open the Select Log File navigation tool. The focus is set automatically to the \PerfLogs folder.
Double-click the name of the counter log you configured to select it and return to the System Monitor Properties window.
Click OK to save the selections, close the window, and return to the main Performance window. Nothing happens quite yet.
Right-click the right pane again and this time select ADD COUNTERS from the flyout menu. The Add Counters window opens.
Select the All Counters radio button, and then click Add followed by Close. This adds all the counters to the chart. If that makes the chart too busy, you can delete counter entries.
The chart shows the statistics you collected in the log. Up to 100 data points can be displayed. Press Ctrl+H to turn on highlighting so that any counter you select in the lower part of the window turns into a white line in the chart.
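A chart is fine for eyeballing, but a number makes it easier to compare servers. The following is a minimal sketch of one way to quantify a trend: a least-squares slope over evenly spaced samples of a single counter, such as Pool Nonpaged Bytes from the Memory object. The function name is hypothetical and the sample values are invented. It assumes you have already pulled the counter values out of the log into a list.

```python
def growth_per_sample(values):
    """Least-squares slope of evenly spaced samples (units per interval)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to measure a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


if __name__ == "__main__":
    # Invented hourly samples of nonpaged pool size, in bytes.
    samples = [20_000_000, 20_400_000, 20_900_000, 21_300_000, 21_800_000]
    print(f"{growth_per_sample(samples):,.0f} bytes per hour")
```

A slope that stays positive over days of hourly samples is the kind of abnormal trend worth chasing. A slope near zero with occasional spikes usually is not.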