Loss of Key Replication Components
When a domain controller fails, its replication partners realize that it has stopped responding to replication requests. Each partner notifies its KCC service, which sets to work creating connection objects that bypass the failed DC, much like Department of Transportation workers directing traffic around an accident.
When the failed domain controller returns to service, its replication partners realize that it is available once again and they inform their KCC service. The KCC rebuilds the connection objects to the domain controller and tears down the bypass.
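To make the bypass concrete, here is a deliberately simplified model in Python. It assumes the domain controllers in a site form a ring in which each DC pulls changes from the DC before it. The real KCC applies more rules than this, and the function and DC names are hypothetical. Dropping a failed DC from the ring joins its two neighbors directly (the bypass), and restoring it rebuilds the original connections.

```python
def ring_connections(dcs, failed=frozenset()):
    """Return (source, destination) pull connections for a ring of DCs.

    Hypothetical simplification: each live DC pulls from the live DC
    immediately before it in the list, wrapping around at the start.
    """
    alive = [dc for dc in dcs if dc not in failed]
    if len(alive) < 2:
        return []  # a lone DC has no partner to replicate with
    # alive[i - 1] wraps to the last DC when i == 0, closing the ring.
    return [(alive[i - 1], dc) for i, dc in enumerate(alive)]


if __name__ == "__main__":
    site = ["DC1", "DC2", "DC3", "DC4"]
    # DC2 fails: DC3 now pulls directly from DC1, bypassing DC2.
    print(ring_connections(site, failed={"DC2"}))
    # [('DC4', 'DC1'), ('DC1', 'DC3'), ('DC3', 'DC4')]
    # DC2 returns: the bypass is torn down and the full ring is rebuilt.
    print(ring_connections(site))
    # [('DC4', 'DC1'), ('DC1', 'DC2'), ('DC2', 'DC3'), ('DC3', 'DC4')]
```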
This work can take a little time, so be patient. You should eventually see a note in the Event log that the KCC was able to create the connection objects and the DRA was able to sync up the naming contexts.
The situation gets a little more complicated if the failed domain controller has special inter-site replication duties, as bridgehead servers and Inter-Site Topology Generator (ISTG) servers do.
Selecting a New Preferred Bridgehead Server
Replication between sites is performed by selected domain controllers called bridgeheads. Under normal circumstances, the KCC handles the loss of a bridgehead server with no administrator intervention. It sorts through the available domain controllers in order of their Globally Unique Identifier (GUID) and selects the server with the highest GUID.
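The highest-GUID rule amounts to a simple maximum over the available domain controllers. The sketch below models each DC's GUID as a Python `uuid.UUID`, which compares by its 128-bit integer value. The exact byte ordering Windows uses when it ranks GUIDs is not covered here, so treating "highest" as "largest 128-bit value" is an assumption, and the DC names and GUIDs are made up.

```python
import uuid


def select_by_highest_guid(candidates):
    """Return the name of the DC with the highest GUID, or None if empty.

    candidates maps DC name -> uuid.UUID. uuid.UUID objects compare by
    their 128-bit integer value; treating that as "highest" is an assumption.
    """
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    dcs = {  # hypothetical DCs and GUIDs
        "DC1": uuid.UUID("0f4c1d2e-0000-0000-0000-000000000001"),
        "DC2": uuid.UUID("a1b2c3d4-0000-0000-0000-000000000002"),
        "DC3": uuid.UUID("5e6f7a8b-0000-0000-0000-000000000003"),
    }
    print(select_by_highest_guid(dcs))  # DC2
    del dcs["DC2"]  # DC2 fails and drops out of the available list
    print(select_by_highest_guid(dcs))  # DC3
```

Because every KCC ranks the same list the same way, each domain controller reaches the same answer independently without having to negotiate.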
You may want the KCC to select bridgeheads from a certain group of servers. For instance, it is best to have a Global Catalog server acting as a bridgehead so the partial naming contexts can be replicated in a single hop to the next site. You might want to limit the candidate list to GC servers.
Designate a domain controller as a preferred bridgehead server using the server properties in AD Sites and Services. Chapter 7, "Managing Active Directory Replication," has detailed steps for this operation. Figure 10.2 shows the bridgehead selection window from Active Directory Sites and Services.
Figure 10.2. AD Sites and Services console showing server Properties window with preferred bridgehead selection for the IP transport.
Always designate at least two preferred bridgehead servers. If you designate only one and it fails, replication between sites stops until you designate another. If the last available preferred bridgehead goes down, you must designate a new one quickly so replication can resume.
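The reason two preferred bridgeheads matter can be shown with a small model. In this sketch, once any preferred bridgehead is designated, only preferred servers are eligible, so if all of them are offline the function returns `None` even though other DCs in the site are up. Choosing among the eligible servers by highest GUID is an assumption carried over from the automatic case, and all names are hypothetical.

```python
import uuid


def choose_bridgehead(site_dcs, online, preferred=()):
    """Pick a bridgehead under a simplified preferred-bridgehead rule.

    site_dcs:  dict of DC name -> uuid.UUID (hypothetical inventory)
    online:    set of DC names currently responding
    preferred: DC names an administrator marked as preferred bridgeheads
    Returns a DC name, or None when no eligible DC is online.
    """
    if preferred:
        # Once any preferred bridgehead is designated, only preferred DCs qualify.
        pool = [dc for dc in preferred if dc in online and dc in site_dcs]
    else:
        pool = [dc for dc in site_dcs if dc in online]
    if not pool:
        return None  # no bridgehead: replication between sites halts
    # Choose among eligible DCs by highest GUID (an assumption for this sketch).
    return max(pool, key=site_dcs.get)


if __name__ == "__main__":
    guids = {name: uuid.UUID(int=n) for n, name in enumerate(["DC1", "DC2", "DC3"], 1)}
    preferred = ["DC1", "DC2"]
    print(choose_bridgehead(guids, {"DC1", "DC2", "DC3"}, preferred))  # DC2
    print(choose_bridgehead(guids, {"DC1", "DC3"}, preferred))  # DC1
    print(choose_bridgehead(guids, {"DC3"}, preferred))  # None, although DC3 is up
    print(choose_bridgehead(guids, {"DC3"}))  # DC3 (no preferred list)
```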
Selecting a New Inter-Site Topology Generator
The loss of an ISTG does not present an immediate problem. The ISTG's only real chore is to create connection objects between bridgehead servers, and it does not need to do this very often. Still, you don't want the failure to go unrepaired.
The ISTG announces its presence to the other domain controllers in the site by updating an attribute on the site's NTDS Site Settings object every thirty minutes. If an hour passes without an update, the KCCs on the other domain controllers in the site conclude that the ISTG is no longer available and set to work selecting another.
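The timing rule works like a heartbeat check: the one-hour threshold equals two thirty-minute refresh intervals, so a single missed refresh does not trigger a failover. The sketch below models only that comparison; the function and constant names are hypothetical, and it does not read the actual attribute from Active Directory.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(minutes=30)   # ISTG refreshes its attribute this often
FAILOVER_THRESHOLD = 2 * UPDATE_INTERVAL  # an hour with no refresh triggers failover


def istg_is_stale(last_update, now, threshold=FAILOVER_THRESHOLD):
    """Return True when the ISTG has not refreshed within the threshold."""
    return now - last_update >= threshold


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 9, 0)
    print(istg_is_stale(last, datetime(2024, 1, 1, 9, 45)))  # False: one missed refresh is tolerated
    print(istg_is_stale(last, datetime(2024, 1, 1, 10, 5)))  # True: other DCs start selecting a new ISTG
```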
The ISTG can be identified using the Properties window for the NTDS Site Settings object in AD Sites and Services. Figure 10.3 shows an example.
Figure 10.3. NTDS Site Settings object properties showing the ISTG server for the selected site.
The KCC uses the same algorithm to select a new ISTG as it does to select a new bridgehead. It selects the domain controller in the site with the highest GUID. In the case of the ISTG, however, there is no "preferred" domain controller setting.
After you get replication working, it's a good idea to confirm that the replicas on the various domain controllers match. The simplest way to do this is with the DSASTAT utility included in the Support Tools. Open a command console and run dsastat -loglevel:info. Depending on the speed of the links to the replication partners, a run can take five to fifteen minutes. When it finishes, it lists the objects on each replica along with their sizes, so you can compare object counts and sizes.
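If you are checking several domain controllers, a short script can flag the odd one out once you have the figures in hand. This sketch assumes you have transcribed each replica's object count and total size from the DSASTAT report into a dictionary; it does not run or parse DSASTAT, and the structure and numbers are hypothetical. It treats the most common (count, size) pair as the baseline. A mismatch may simply reflect changes still in transit, so rerun the comparison after replication settles before treating it as a problem.

```python
from collections import Counter


def find_mismatched_replicas(stats):
    """Return replicas whose (object_count, total_size) differs from the majority.

    stats maps DC name -> (object_count, total_size_bytes). This dict is a
    hypothetical structure filled in by hand from DSASTAT's report; the
    sketch does not parse DSASTAT output.
    """
    if not stats:
        return {}
    baseline, _ = Counter(stats.values()).most_common(1)[0]
    return {dc: value for dc, value in stats.items() if value != baseline}


if __name__ == "__main__":
    # Illustrative numbers only.
    stats = {
        "DC1": (1200, 5_000_000),
        "DC2": (1200, 5_000_000),
        "DC3": (1198, 4_990_000),
    }
    print(find_mismatched_replicas(stats))  # {'DC3': (1198, 4990000)}
```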
Loss of a WAN Link
If a WAN link goes down, the bridgeheads on either side discover that they cannot pull replication from each other. If an alternate (but higher-cost) connection to another site exists, the Directory Replication Agent (DRA) on each bridgehead uses that connection automatically. The DRA also informs the ISTG, which tries to create new connections to other bridgeheads.
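The fallback to a higher-cost route can be pictured as a least-cost path search over site links with the failed link removed. The sketch below uses Dijkstra's algorithm as a stand-in; it is a simplified model rather than the DRA's or ISTG's actual logic, and the site names and link costs are hypothetical.

```python
import heapq


def cheapest_path(links, start, goal, down=frozenset()):
    """Dijkstra's least-cost path over site links, skipping links that are down.

    links: dict mapping frozenset({site_a, site_b}) -> cost (hypothetical)
    down:  set of frozenset({site_a, site_b}) links currently unavailable
    Returns (total_cost, [sites...]) or None if the goal is unreachable.
    """
    graph = {}
    for link, cost in links.items():
        if link in down:
            continue  # a failed WAN link contributes no edge
        a, b = tuple(link)
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    queue = [(0, start, [start])]
    best = {}
    while queue:
        cost, site, path = heapq.heappop(queue)
        if site == goal:
            return cost, path  # first time the goal is popped, its cost is minimal
        if site in best and best[site] <= cost:
            continue  # already reached this site more cheaply
        best[site] = cost
        for neighbor, link_cost in graph.get(site, []):
            heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None


if __name__ == "__main__":
    links = {
        frozenset({"HQ", "Branch"}): 100,
        frozenset({"HQ", "Hub"}): 150,
        frozenset({"Hub", "Branch"}): 150,
    }
    print(cheapest_path(links, "HQ", "Branch"))  # (100, ['HQ', 'Branch'])
    wan_down = {frozenset({"HQ", "Branch"})}
    print(cheapest_path(links, "HQ", "Branch", down=wan_down))  # (300, ['HQ', 'Hub', 'Branch'])
```

When no path survives (for example, if both links out of Branch are down), the function returns `None`, which corresponds to the case where replication stays interrupted until the link is restored.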
You should not need to intervene in the operation of the DRA and ISTG. If they cannot find a suitable replication path because of the way your sites are configured and connected, you may have to live with interrupted replication until you re-establish communications.