Cluster Quorum Registries pointing to the wrong SQL Server Disk

This situation happened in a Windows Server 2003 SP2 environment on a Clustered Microsoft SQL Server 2005 SP4 instance.

First off, this article helps to understand a little bit more about what is the Quorum resource on a Failover Clustering environment and it’s primary functions.

http://msdn.microsoft.com/en-us/library/windows/desktop/aa371819%28v=vs.85%29.aspx

The SQL Server Instance was not starting and the following error was recorded at the windows application log:

Could not open error log file ‘K:\i2_SystemDb\Microsoft SQL Server\MSSQL.2\MSSQL\LOG\ERRORLOG’. Operating system error = 32(The process cannot access the file because it is being used by another process.).

However, it was normally starting through the Services console (services.msc) from outside the Cluster Administrator. This was a strong indicator that something was definetly wrong with some of the cluster settings.

Since the ERRORLOG file was in use by another process, it was obviously another SQL instance on the cluster. In this case, it was the named one. From SQL Server Configuration Manager it was determined that both  existing instances, the default (MSSQLSERVER) and the named, were pointing to the exact same data drive and files.

Image

-dR:\Microsoft SQL Server\MSSQL.1\MSSQL\Data\master.mdf;-eR:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG;-lR:\Microsoft SQL Server\MSSQL.1\MSSQL\Data\mastlog.ldf

-d (Master data file path), -e (ERRORLOG file path), -l (Master log data file path)

The registries also showed this same setting:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\Parameters

Image

But, for this scenario, changing the registry directly on this above screen would not work, as the Cluster configurations for the SQL paths resides also in another registry – for the Quorum disk.

This KB article from Microsoft has the solution. The method 1 was used.

http://support.microsoft.com/kb/953504

Image

Image

Note that there is one GUID(unique identifier)  entry for each Cluster resource, such as SQL Fulltext, Volume manager,DTC,Network name, to mention some.

Image

The 000000004 is the checkpoint file name. With these details in hand, the KB steps 5 to 9 (for method 1) must be followed so you can update the registries accordingly (SQLArg keys) with the correct drives path and SQL will be able to start with success.

How can I track what could have caused this?

To reach out the reason when this situation occurs, your starting point should be looking at recent changes to all related resources from the Cluster environment, specially to drives and quorum configuration.

Thank you for viewing this post. Should you have any concern, question or suggestions, feel free to comment.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s