
    Recovering from Blue Screen Stops

    There are two varieties of executables in Windows Server 2003:

    • Kernel services that run in a privileged memory space inside the Windows Executive

    • User applications that run in unprivileged memory

    For the most part, it should be nearly impossible for a user-side application to crash a system. Oh, a user app can load the system outrageously or cause it to become unresponsive, but it should not cause a complete loss of system services.

    Kernel services, on the other hand, are fully capable of causing drastic malfunctions. Rather than risk widespread memory and file corruption, when a kernel service misbehaves, the system is brought to a stop and information about the crash is displayed. This kernel-mode stop is commonly called a Blue Screen Of Death, or BSOD, due to the background color of the informational display. This blue screen display is handled by a kernel-mode routine called KeBugCheckEx, so it is often called a bugcheck.

    Online Event Tracking

    If a server crashes due to a bugcheck, or an application hangs and must be killed from Task Manager or by Dr. Watson, the system assembles a set of XML files. These files contain the names of the processes that were running at the time of the crash or hang, along with system information about memory and CPU register contents, similar to what you see at the blue screen.

    These XML files are sent to Microsoft where they are added to a repository of failure information used to determine causes and help identify solutions for crashes. For example, if thousands of error reports flood into Microsoft that identify a particular driver as the culprit in a crash, Microsoft will work with the vendor to determine the cause of the instabilities.

    You may not want this information to be transmitted to Microsoft. A set of group policies under Computer Configuration | Administrative Templates | System | Error Reporting controls the online error report settings. You can elect to block reporting completely, to report on selected applications, or to report only unplanned shutdown events.

    Bugcheck Codes

    The top lines of the blue screen contain bugcheck codes that identify the source of the stop, information about the stop that differs depending on the stop code, and oftentimes the name of the culprit. The information looks like this:

    *** STOP: 0x0000001E (0xC0000005, 0x8041E9FB, 0x00000000, 0x00000030)
    KMODE_EXCEPTION_NOT_HANDLED
    *** Address 8041E9FB base at 80400000, DateStamp 377509d0 - ntoskrnl.exe
    

    The bugcheck codes are your best bet for quickly finding the cause of the crash. The rest of the stop screen usually (but not always) contains stack dump information listing the processes that were in memory at the time of the crash and what they were doing. Here's a brief explanation of the bugcheck information:

    • The first entry after STOP is the hex ID of the stop code. This corresponds to the name on the second line. If there is no name, the exception was so severe that the system was not able to refer to the lookup table to generate the name.

    • The next four entries are parameters that were passed to KeBugCheckEx when the STOP error was issued. The meaning and origin of these parameters vary depending on the type of error.

    • The line following the bugcheck code specifies the address where the exception occurred, the base address of the image that caused the exception, a hex representation of the date stamp on the image, and the image name. In this case, the exception was thrown from inside the kernel image itself, Ntoskrnl.exe.

    The fact that a particular executable is implicated by bugcheck does not necessarily mean that it was the actual perpetrator. In this game of blue screen Clue, you have to search through all the rooms to find out who killed Mr. Server. The name at the top of the bugcheck list might have just been a dupe used by the real culprit.
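    If you collect stop screens from several servers, it can help to pull the stop code and its four parameters out of the first line mechanically. Here is a minimal sketch that assumes the line has the same layout as the example above. The function name parse_stop_line is made up for illustration and is not part of any Windows tool.

```python
import re

# Matches "*** STOP: 0x0000001E (0xC0000005, 0x8041E9FB, 0x00000000, 0x00000030)".
# The layout is taken from the sample stop screen; other layouts will not match.
STOP_LINE = re.compile(
    r"\*\*\*\s*STOP:\s*(0x[0-9A-Fa-f]+)\s*"
    r"\(\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*,"
    r"\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*\)"
)


def parse_stop_line(line):
    """Return (stop_code, [p1, p2, p3, p4]) as integers, or None if no match."""
    match = STOP_LINE.search(line)
    if match is None:
        return None
    code, *params = (int(group, 16) for group in match.groups())
    return code, params


example = "*** STOP: 0x0000001E (0xC0000005, 0x8041E9FB, 0x00000000, 0x00000030)"
code, params = parse_stop_line(example)
print(f"stop code:  0x{code:08X}")
print("parameters:", ", ".join(f"0x{p:08X}" for p in params))
```

    The parser only separates the fields. What each parameter means still depends on the stop code, so you need the Knowledge Base or bugcodes.h to interpret them.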

    The Microsoft Knowledge Base is your best source for information about bugcheck codes. Start by searching for Q103059, which lists the stop codes and their names. Then check out Q192463 for ways to collect information without doing full-blown kernel debugging. For a full list of stop codes, download the Windows DDK from msdn.microsoft.com and take a look at the include file, bugcodes.h.

    Online Error Reporting

    In an effort to find and correct common sources of system hangs and bugchecks, Windows Server 2003 has an Error Reporting service, Ersvc, that collects kernel information from bugcheck and application information from Dr. Watson and sends them to Microsoft where they are cataloged and analyzed. Chapter 3, "Adding Hardware," discusses this feature in detail.

    Common Stop Errors

    Of the more than 200 kernel-mode stop codes, only a few are especially common. Here they are:

    • Kmode_Exception_Not_Handled (0x0000001E). This error says that an exception occurred in the kernel for which there was no error handler. In most cases, bugcheck can tell you the name of the misbehaving driver. This will be listed in the third line of the display.

    • Irql_Not_Less_Or_Equal (0x0000000A). When a thread issues a software interrupt, it does so at a particular interrupt request level (IRQL). There are 32 IRQLs, with higher numbers having higher priority. An 0A error occurs when a driver tries to access pageable or invalid memory while running at an IRQL that is too high to permit the access.

    • Unexpected_Kernel_Mode_Trap (0x0000007F). This is generally a hardware problem. Refer to Knowledge Base article Q137539 for a list of common culprits.

    • Ntfs_File_System (0x00000024). This is commonly caused by a virus, or sometimes an overly aggressive virus checker. It is also commonly caused by file system utilities that attempt to reach around the APIs to access the file system directly. It can also be caused by file system corruption.

    • Page_Fault_In_Nonpaged_Area (0x00000050). This is also commonly caused by virus checkers. It has been tied to many TCP/IP problems as well. Some fairly notorious denial-of-service attacks result in a 0x50 stop error, so if you start getting this on your DMZ machines, you might try enabling auditing and applying a packet sniffer to see if you can capture the source of the problem.

    • Inaccessible_Boot_Device (0x0000007B). If this occurs when starting a system that has been in operation a while, it almost always indicates a failed drive, drive controller, or a boot sector virus. If it occurs on a new installation, you may have drive sector translation problems or an improper host adapter driver. This error also occurs if you restart a system following a failure of the primary drive in a mirrored set.
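    The list above fits comfortably in a small lookup table, which is handy if you log stop codes from many servers and want names next to them. This is only a sketch covering the six codes in this section. The function name stop_code_name is hypothetical, and anything not in the table falls through to a reminder to check bugcodes.h.

```python
# The six common stop codes described in this section, keyed by hex ID.
COMMON_STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
}


def stop_code_name(code):
    """Return the name for a common stop code, or a pointer to the full list."""
    return COMMON_STOP_CODES.get(code, "not in this table (see bugcodes.h)")


for sample in (0x1E, 0x7B, 0xC2):
    print(f"0x{sample:08X}  {stop_code_name(sample)}")
```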

    Memory Dumps

    There are a variety of steps you can take to assess the cause of a bugcheck and try to prevent another like it:

    • If you get a stop error after installing a new piece of hardware, a driver upgrade, or a new application, your first step should be to restart and select the Last Known Good Configuration option. See "Restoring Functionality with the Last Known Good Configuration" for details.

    • If that doesn't work, boot to the Recovery Console and delete or rename the offending driver.

    • If you can't physically get to the server, you can set it up for out-of-band (OOB) access to see the bugcheck codes and restart. See "Using Emergency Management Services."

    If you try all of these and are still unable to restore normal operation, you can capture the contents of memory at the time of the stop and send it off to Microsoft Product Support Services (PSS) to analyze. PSS charges a few hundred dollars per incident for this service, but when you compare that against the losses incurred from server downtime, it's often worth the expense.

    Configuring Memory Dumps

    By default, as part of the bugcheck, the contents of RAM are dumped to the paging file. After restart, the paging file is copied to a file called Memory.dmp in the \Windows folder.

    For this full memory dump to succeed, the paging file must be at least the size of RAM plus 1MB for header information. The paging file must be in the root of the boot drive. (Microsoft calls this the System partition.) This is because the bugcheck routine cannot mount a file system, so it is limited to using bare INT13 calls. You can have other paging files on other drives, but they will not be used for the memory dump.
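    The sizing rule is simple arithmetic: RAM plus 1MB. The sketch below just restates that rule so you can check a planned paging file size against installed memory. The function names are invented for the example.

```python
MB = 1024 * 1024
GB = 1024 * MB


def min_paging_file_for_full_dump(ram_bytes):
    """Smallest paging file that can hold a full dump: RAM plus 1MB of header."""
    return ram_bytes + MB


def paging_file_is_large_enough(paging_file_bytes, ram_bytes):
    """True if the paging file on the boot drive can hold a full memory dump."""
    return paging_file_bytes >= min_paging_file_for_full_dump(ram_bytes)


ram = 4 * GB
needed = min_paging_file_for_full_dump(ram)
print(f"{ram // MB} MB of RAM needs a paging file of at least {needed // MB} MB")
```

    Keep in mind that only the paging file in the root of the boot drive counts. Paging files on other drives do not add to the total.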

    If you have a fire-breathing server with many, many gigabytes of RAM, you probably don't want to give up gigabytes and gigabytes of real estate in your system partition for the paging file. Also, a multi-gigabyte dump file is not likely to have useful content unless the misbehaving driver leaves a known footprint. To avoid large memory dumps, you have two options:

    • Kernel memory dumps. This dumps just that portion of RAM owned by the operating system. This is rarely over 1GB and cannot be more than 2GB, at least on IA32 systems. IA64 systems can have a larger footprint. You'll have to check Task Manager to find out how much dump space to set aside.

    • Small memory dumps. This dumps just the stack space. This can be useful only if the offending driver leaves a very clear indication. Otherwise, it does not include sufficient information for a full diagnosis.

    Memory dump options are controlled by System properties. Right-click the My Computer icon and select PROPERTIES from the flyout menu. The Advanced tab has a Startup and Recovery button that opens a window to access the memory dump settings. Figure 21.16 shows an example.

    Figure 21.16. Startup and Recovery window showing default Recovery settings for handling system stop errors.


    Several recovery options in this window are worth your attention. They are as follows:

    • Send an Administrative Alert uses the Alerter service, if it is still functioning after the crash, to put out a network broadcast to members of the Administrators domain local group to notify them of the stop error. If you have a trap management console of some sort (HP Openview, for example), you should also load an SNMP agent on the server so it can trap when the failure occurs.

    • Write Debugging Information To specifies the name of a file that will hold the memory contents after the system reboots. The dump stays in the paging file until the system restarts successfully.

    • Write Kernel Information Only saves hard drive space that would go to waste if you have lots of RAM in a server. You can size the paging file by opening Task Manager, selecting the Performance tab, and looking at the total memory value under Kernel Memory.

    • Automatically Reboot. This option restarts the system after the memory dump has completed. It has the potential of causing a continuous loop if the cause of the blue screen stop doesn't go away after restart. For this reason, it is a good idea to monitor your servers with some sort of SNMP tool that will notify you when the server crashes.
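    The options in this window are stored in the Registry, so you can audit them across servers without opening the dialog on each one. The sketch below rests on an assumption that does not come from this chapter: that the settings live under HKLM | System | CurrentControlSet | Control | CrashControl in values named CrashDumpEnabled, DumpFile, AutoReboot, and SendAlert, with CrashDumpEnabled data of 0 through 3. Verify those names on your own system before relying on them. The function names are hypothetical.

```python
import sys

# Assumed meaning of the CrashDumpEnabled data (not taken from this chapter).
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
}

# Assumed location of the Startup and Recovery settings.
CRASH_CONTROL_PATH = r"SYSTEM\CurrentControlSet\Control\CrashControl"
VALUE_NAMES = ("CrashDumpEnabled", "DumpFile", "AutoReboot", "SendAlert")


def describe_crash_control(values):
    """Translate raw Registry values into readable recovery settings."""
    return {
        "dump_type": DUMP_TYPES.get(values.get("CrashDumpEnabled"), "unknown"),
        "dump_file": values.get("DumpFile", "(not set)"),
        "auto_reboot": bool(values.get("AutoReboot", 0)),
        "send_alert": bool(values.get("SendAlert", 0)),
    }


def read_crash_control():
    """Read the live settings. This works only on Windows."""
    import winreg  # imported here so the script still loads on other systems

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_PATH) as key:
        for name in VALUE_NAMES:
            try:
                values[name] = winreg.QueryValueEx(key, name)[0]
            except FileNotFoundError:
                pass  # value is absent, so leave it out of the result
    return values


if __name__ == "__main__":
    if sys.platform == "win32":
        settings = read_crash_control()
    else:
        # Sample data so the sketch can be tried on a non-Windows machine.
        settings = {"CrashDumpEnabled": 1, "AutoReboot": 1, "SendAlert": 0}
    for name, value in describe_crash_control(settings).items():
        print(f"{name}: {value}")
```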

    Examining Memory Dumps

    In production, if you have a server that is crashing regularly and you cannot figure out the problem, it's probably a good idea to spend the money to call Microsoft's Product Support Services. They may want a copy of the memory dump file. Before you burn a huge dump onto a CD for the overnight pouch to PSS, it's a good idea to make sure there is useful information in the dump file. The Windows Server 2003 CD has a utility called DUMPCHK that verifies the integrity of the file contents.
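    DUMPCHK is the right tool for a real integrity check. If you just want a fast sanity test before copying a multi-gigabyte file, you can look at the first few bytes. The sketch below assumes that a crash dump begins with the eight-byte signature PAGEDUMP on 32-bit systems or PAGEDU64 on 64-bit systems. That detail does not come from this chapter, so treat it as an assumption. The function name is made up for the example.

```python
import io

# Assumed dump file signatures (first eight bytes of the file).
SIGNATURES = {
    b"PAGEDUMP": "32-bit crash dump",
    b"PAGEDU64": "64-bit crash dump",
}


def dump_signature(stream):
    """Return a description of the dump type, or None if the header is unknown."""
    header = stream.read(8)
    return SIGNATURES.get(header)


# In practice you would open the real file in binary mode, for example
# open(r"C:\Windows\Memory.dmp", "rb"). In-memory samples are used here.
good = io.BytesIO(b"PAGEDUMP" + b"\x00" * 24)
bad = io.BytesIO(b"not a dump file")
print(dump_signature(good))
print(dump_signature(bad))
```

    A matching signature tells you only that the file starts like a dump. It says nothing about whether the rest of the contents are intact, which is what DUMPCHK verifies.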

    If you'd like to do your own poking around in the dump file, you'll need a tool. The simplest and most flexible dump analysis tool is the Windows Debugger, Windbg. You can get this tool from the Windows Server 2003 CD that comes with TechNet, or you can download it from the MSDN web site.

    The Windows Server 2003 CD also has a fairly hefty symbols file that you need to install. These symbols help the debugger interpret what it sees in the dump file. There are two sets of symbol files, one for the retail version of the product and one for the debug version. Unless you are running the debug version of Windows Server 2003 from the MSDN library, use the retail symbols.

    When you install Windbg, it will look for the symbols in the \Windows\Symbols folder. If you did not install the symbols into that default location, you must configure Windbg with the actual location. This is done in VIEW | OPTIONS | SYMBOLS.

    Windbg has a huge number of uses and switches, all of which lie outside the scope of this book. The feature that can help with crash dump analysis, though, is pretty straightforward. Simply point the program at the crash dump file, \Windows\Memory.dmp, using FILE | OPEN CRASH DUMP. Figure 21.17 shows an example following a crash. I generated the crash in this example using a diagnostic feature in Windows. (If you sneered just now, you're a cynic.) To enable this feature, make the following Registry change to the keyboard driver:

    Key:    HKLM | System | CurrentControlSet | Services | i8042prt | Parameters
    Value:  CrashOnCtrlScroll
    Data:   1 (REG_DWORD)
    
    Figure 21.17. Windbg Debugger window showing results of loading a crash dump file.


    After restarting with this setting, you can crash the system by holding down the right Ctrl key (not the left) and pressing the Scroll Lock key twice.
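    If you want to apply the CrashOnCtrlScroll change to several test machines, you can write it out as a .reg file rather than editing each Registry by hand. The sketch below generates the file text for the key and value shown earlier. The function name is hypothetical, and the "Windows Registry Editor Version 5.00" header is the standard .reg file format rather than something covered in this chapter.

```python
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters"


def crash_on_ctrl_scroll_reg(enabled=True):
    """Build .reg file text that sets CrashOnCtrlScroll to 1 (on) or 0 (off)."""
    data = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"CrashOnCtrlScroll"=dword:{data:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Print the text; redirect it to a file such as crashkey.reg to import it.
print(crash_on_ctrl_scroll_reg())
```

    Keep this setting off production servers. Anyone with access to the keyboard can bring the system down with three keystrokes.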

    The debugger will point right at the source of the crash if it can get a clear picture from the dump file about the events that led up to the bugcheck.

    Following the restart, the Error Reporting Service will want to send Microsoft information about the crash. A notification window gives you the opportunity to decide whether or not to send the information. A hyperlink takes you to the details of what will be sent. Figure 21.18 shows an example.

    Figure 21.18. Error Reporting window following a system crash.

