As a SQL Server DBA, there will be times when you cross over into the system administrator's realm of responsibilities, or at a minimum have to explain to the system administrators how to fix errors so that your SQL environment runs better. A Windows Server Failover Cluster (WSFC) is set up for an AlwaysOn (AO) Availability Group (AG), but with no shared disk resources. If the WSFC is having issues, your Availability Group will not function properly, or you will face a lot of heartache trying to figure out the root cause of the issues.

Most of the time, WSFC errors will not occur until AO is set up; however, you should make sure no errors exist in the WSFC logs before setting up AO. You can look in the Event Viewer or within Failover Cluster Manager for errors. Fix the errors before setting up AO, or have the system administrator (SA) fix them.

Only add nodes within the Failover Cluster Manager that are part of the AlwaysOn Availability Group failover. Adding other servers that will not be part of the AG will cause issues if those nodes have problems. If other servers are already part of the WSFC, check whether those servers have a separate AG that is part of the WSFC. If they do, that AG will have to be deleted (verify the AG name is no longer under Roles in the Failover Cluster Manager for the cluster) and the nodes evicted from the WSFC. After that is done, a new WSFC will have to be created and the AG recreated. If those servers do not have an AG created, they should simply be evicted from the WSFC. Do this during a maintenance window in case something goes wrong.
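As a rough illustration of the membership check above, the following Python sketch compares the nodes in the WSFC against the replicas in the AG and lists the nodes that are candidates for eviction. The function name and the server names are hypothetical, and both lists are assumed to have been collected by hand (for example, from Failover Cluster Manager and from the AG properties); nothing here queries the cluster itself.

```python
def nodes_outside_ag(cluster_nodes, ag_replicas):
    """Return the cluster nodes that are not replicas in the AG, sorted by name.

    Windows computer names are case-insensitive, so names are compared in
    upper case. A replica given as SERVER\\INSTANCE is reduced to SERVER.
    """
    replica_hosts = {name.split("\\")[0].upper() for name in ag_replicas}
    return sorted(n for n in cluster_nodes if n.upper() not in replica_hosts)


# Hypothetical names, typed in by hand for this example.
cluster_nodes = ["SQLNODE1", "SQLNODE2", "APPSERVER1"]
ag_replicas = ["sqlnode1", "sqlnode2\\INST1"]

print(nodes_outside_ag(cluster_nodes, ag_replicas))  # ['APPSERVER1']
```

Any name this prints still has to be checked for a separate AG before it is evicted, as described above.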

Here are some common errors and how to fix them.

Error: The file share witness resource ‘’ failed to arbitrate for the file share ‘\\servername\share’. Please ensure that file share ‘\\servername\share’ exists and is accessible by the cluster.

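Fix: To fix the error, an admin needs to give Everyone Full Control on the share \\servername\share. The cluster uses this share as its file share witness and needs access to it. Nothing else is stored in this share. If granting Everyone is too broad for your environment, granting Full Control to the cluster's computer account (the cluster name followed by $) is the narrower option.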

Error: The cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster or a failover of the witness disk…

Fix: This can be fixed by changing the cluster's heartbeat threshold and delay settings (for example, the SameSubnetThreshold and SameSubnetDelay cluster properties). More details on how to change them can be found here: https://virtual-dba.com/alwayson-changing-cluster-configuration/

Error(s): Cluster is offline.

Clustered role ‘Cluster Group’ has exceeded its failover threshold. It has exhausted the configured number of failover attempts within the failover period of time allotted to it and will be left in a failed state. No additional attempts will be made to bring the role online or fail it over to another node in the cluster…

The Cluster service failed to bring clustered role ‘Cluster Group’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Cluster resource ‘Cluster IP Address XXX.XXX.XXX.XXX’ of type ‘IP Address’ in clustered role ‘Cluster Group’ failed…

Encountered a failure when attempting to create new NetBIOS interface while bringing resource ‘Cluster IP Address XXX.XXX.XXX.XXX’ online (error code ‘1450’). The maximum number of NetBIOS names may have been exceeded.

Fix: In this case, after validating that the WSFC itself had no other errors, the cause turned out to be a duplicate IP address conflict. The SA needs to fix this. Verify that DNS has the correct IP address for the cluster node, and if the IP address has changed, make sure DNS is updated. If the IP address does not respond to ping, flush the ARP cache to remove stale information, or remove just the one bad entry.

How to flush the whole ARP cache or just remove one bad entry: http://www.techrepublic.com/blog/windows-and-office/quick-tips-flush-the-arp-cache-in-windows-7/
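A minimal sketch of those two options, written as a Python helper that builds the Windows command to run. The function names and the sample IP address are hypothetical; `arp -d` and `netsh interface ip delete arpcache` are standard Windows commands, and both need an elevated prompt.

```python
import ipaddress
import subprocess


def arp_cleanup_command(bad_ip=None):
    """Build the Windows command that clears ARP entries.

    With bad_ip set, only that one entry is removed; otherwise the
    whole ARP cache is flushed.
    """
    if bad_ip is not None:
        # ARP is IPv4 only; this raises ValueError for anything else.
        ipaddress.IPv4Address(bad_ip)
        return ["arp", "-d", bad_ip]
    return ["netsh", "interface", "ip", "delete", "arpcache"]


def run_arp_cleanup(bad_ip=None):
    """Run the command. Call this only from an elevated prompt on Windows."""
    subprocess.run(arp_cleanup_command(bad_ip), check=True)


print(arp_cleanup_command("10.0.1.50"))  # ['arp', '-d', '10.0.1.50']
print(arp_cleanup_command())  # ['netsh', 'interface', 'ip', 'delete', 'arpcache']
```

Removing the single bad entry is the less disruptive choice when you already know which address is stale.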

Error: The computer object associated with cluster network name resource ‘’ could not be updated. The cluster identity ‘Name$’ may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.

Fix: Follow the steps in this Microsoft support article: https://support.microsoft.com/en-us/help/2770582/event-id-1222-when-you-create-a-windows-server-2012-failover-cluster

Error: Cluster network name resource ‘SQL Network Name (SQLClusterName)’ failed registration of one or more associated DNS name(s) for the following reason: DNS operation refused. Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.

Fix:

  1. Open DNS Manager and find the Host (A) record for the SQLClusterName resource.
  2. Go to Properties for that record.
  3. In the Security tab, make sure the WindowsClusterName is listed; if not, add it.
  4. Make sure the WindowsClusterName (it will have a $ after the name) has Write, Read and Special permissions checked under Allow.
  5. Click Advanced, locate WindowsClusterName, and click Edit.
  6. Make sure that Write all properties, Read permissions, and All Validated Writes are selected.
  7. Click OK three times to exit.

Error: Cluster network name resource ‘name’ cannot be brought online. The computer object associated with the resource could not be updated in ‘domainname’ for the following reason: Unable to update password for computer account… The cluster identity ‘windowsclustername$’ may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.

Fix:

  1. Within Active Directory (AD), look for the listener name under Computers.
  2. Go to Properties of the computer object (listener name), then click on the Security tab.
    a. If you do not see the Security tab, close the properties window for the listener, click on View, then check Advanced Features. This will allow you to see the Security tab of the listener within Computers.
  3. Within the Security tab, give the WindowsClusterName (it will have a $ after the name) Full Control permissions.

Error: No matching network interface found for resource ‘AGName_XXX.XXX.XXX.XXX’ IP address ‘XXX.XXX.XXX.XXX’ (return code was ‘5035’). If your cluster nodes span different subnets, this may be normal.

The Cluster service failed to bring clustered role ‘AGName’ completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.

Cluster resource ‘AGName_XXX.XXX.XXX.XXX’ of type ‘IP Address’ in clustered role ‘AGName’ failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Fix: You can see these errors when you try to configure your listener, whether your servers are spread across different subnets or are all on the same subnet.

  • If the servers that you want to be part of the AG are on the same subnet, make sure the subnet mask on the primary NIC is set the same on all servers (look at the settings of every NIC). Once you fix that, you will be able to create a listener.
  • If you have multiple servers on different subnets, make sure the listener has an IP address for every subnet your servers are attached to.
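Both checks can be sketched with Python's standard `ipaddress` module. This is only an illustration under stated assumptions: the node names, addresses, masks, and function names are hypothetical, and the NIC settings are assumed to have been copied by hand from each server.

```python
import ipaddress


def node_networks(nodes):
    """Map each node name to the network its primary NIC sits on."""
    return {
        name: ipaddress.ip_interface(f"{ip}/{mask}").network
        for name, (ip, mask) in nodes.items()
    }


def uncovered_networks(nodes, listener_ips):
    """Return the node networks (as strings) that contain no listener IP."""
    addresses = [ipaddress.ip_address(ip) for ip in listener_ips]
    networks = set(node_networks(nodes).values())
    return sorted(
        str(net) for net in networks
        if not any(addr in net for addr in addresses)
    )


# Hypothetical nodes: name -> (primary NIC address, subnet mask).
nodes = {
    "SQLNODE1": ("10.0.1.11", "255.255.255.0"),
    "SQLNODE2": ("10.0.1.12", "255.255.0.0"),  # mask differs from SQLNODE1
    "SQLNODE3": ("10.0.2.13", "255.255.255.0"),
}

for name, network in node_networks(nodes).items():
    print(name, network)

# The listener has no address in SQLNODE3's subnet.
print(uncovered_networks(nodes, ["10.0.1.50"]))  # ['10.0.2.0/24']
```

In this example SQLNODE1 and SQLNODE2 are meant to share a subnet, but the mismatched mask puts SQLNODE2 on 10.0.0.0/16 instead of 10.0.1.0/24, which is the kind of inconsistency the first bullet describes.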

Error: Cluster network name resource failed registration of one or more associated DNS name(s) because the access to update the secure DNS Zone was denied.

Cluster Network name: ‘AGName_ListenerName’
DNS Zone: ‘domain.com’

Ensure that cluster name object (CNO) is granted permissions to the Secure DNS Zone.

Fix:

  1. Edit the NIC. Open Control Panel\Network and Internet\Network Connections. Go to Properties for your NIC. Select Internet Protocol Version 4 (TCP/IPv4) and/or Internet Protocol Version 6 (TCP/IPv6) and click Properties.
  2. Click Advanced, then click on the DNS tab. Uncheck “Register this connection’s addresses in DNS”.
  3. Repeat this on all nodes that are part of the cluster.

Error: The computer object associated with cluster network name resource ‘AGName_ListenerName’ could not be updated.

The text for the associated error code is: Unable to protect the Virtual Computer Object (VCO) from accidental deletion

The cluster identity ‘Clustername$’ may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.

Fix:

  1. Edit the NIC. Open Control Panel\Network and Internet\Network Connections. Go to Properties for your NIC. Select Internet Protocol Version 4 (TCP/IPv4) and/or Internet Protocol Version 6 (TCP/IPv6) and click Properties.
  2. Click Advanced, then click on the DNS tab. Uncheck “Register this connection’s addresses in DNS”.
  3. Repeat this on all nodes that are part of the cluster.