This situation happened in a Windows Server 2003 SP2 environment on a Clustered Microsoft SQL Server 2005 SP4 instance.
First off, this article helps to understand a little bit more about what is the Quorum resource on a Failover Clustering environment and it’s primary functions.
The SQL Server Instance was not starting and the following error was recorded at the windows application log:
Could not open error log file ‘K:\i2_SystemDb\Microsoft SQL Server\MSSQL.2\MSSQL\LOG\ERRORLOG’. Operating system error = 32(The process cannot access the file because it is being used by another process.).
However, it was normally starting through the Services console (services.msc) from outside the Cluster Administrator. This was a strong indicator that something was definetly wrong with some of the cluster settings.
Since the ERRORLOG file was in use by another process, it was obviously another SQL instance on the cluster. In this case, it was the named one. From SQL Server Configuration Manager it was determined that both existing instances, the default (MSSQLSERVER) and the named, were pointing to the exact same data drive and files.
-dR:\Microsoft SQL Server\MSSQL.1\MSSQL\Data\master.mdf;-eR:\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG;-lR:\Microsoft SQL Server\MSSQL.1\MSSQL\Data\mastlog.ldf
-d (Master data file path), -e (ERRORLOG file path), -l (Master log data file path)
The registries also showed this same setting:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\Parameters
But, for this scenario, changing the registry directly on this above screen would not work, as the Cluster configurations for the SQL paths resides also in another registry – for the Quorum disk.
This KB article from Microsoft has the solution. The method 1 was used.
Note that there is one GUID(unique identifier) entry for each Cluster resource, such as SQL Fulltext, Volume manager,DTC,Network name, to mention some.
The 000000004 is the checkpoint file name. With these details in hand, the KB steps 5 to 9 (for method 1) must be followed so you can update the registries accordingly (SQLArg keys) with the correct drives path and SQL will be able to start with success.
How can I track what could have caused this?
To reach out the reason when this situation occurs, your starting point should be looking at recent changes to all related resources from the Cluster environment, specially to drives and quorum configuration.
Thank you for viewing this post. Should you have any concern, question or suggestions, feel free to comment.