Monday, June 30, 2008

Isolation Level

The phenomena that are allowed at each isolation level in SQL Server 2000

Friday, June 27, 2008

Getting Detail of SQL Server from Registry

From a Command prompt you can run the following REG instructions

To Get remote server details:
- Add \\servername\... before HKLM to get remote machine details


reg query HKLM\Software\Microsoft\MSSQLServer\MSSQLServer\CurrentVersion /v CurrentVersion


For SQL Server 2000
-------------------
reg query "HKLM\Software\Microsoft\Microsoft SQL Server" /v InstalledInstances



reg query "HKLM\Software\Microsoft\Microsoft SQL Server\80\MSSQLLicenseInfo\MSSQL8.00" /v ConcurrentLimit

-- For each instance, replace MSSQLServer with 80\InstanceName from in the following
reg query HKLM\Software\Microsoft\MSSQLServer\Setup /v Edition
reg query HKLM\Software\Microsoft\MSSQLServer\Cluster /v ClusterName
reg query HKLM\Software\Microsoft\MSSQLServer\MSSQLServer\CurrentVersion /v CSDVersion
reg query HKLM\Software\Microsoft\MSSQLServer\MSSQLServer\SuperSocketNetLib\Tcp /v TcpPort


----- Using Registy Server Listing ---
reg query "HKLM\Software\Microsoft\Microsoft SQL Server\80\Tools\Client\PrefServers" ServerN (where N represents the server number )

reg query "HKLM\Software\Microsoft\Microsoft SQL Server\80\Tools\Client\PrefServers" UserN (where N represents the default login corresponding the ServerN )


For SQL Server 2005
-------------------
reg query "HKLM\Software\Microsoft\Microsoft SQL Server\Instance Names"
- Subkeys represent a different SQL Server Product (SQL=SQL Server, RS=Reporting Server, etc)


- Replace MSSQL.1 with each of the keys from "Inteance Names" on below
reg query "HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\Setup" /v Edition
reg query "HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\CurrentVersion" /v CurrentVersion
reg query "HKLM\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib\Tcp\IPAll" /v TcpPort

Tuesday, June 24, 2008

Troubleshooting SQL Server 2005/2008 Performance and Scalability Flowchart

Flowchart on Troubleshooting SQL Server 2005/2008 Performance and Scalability Flowchart is available on SQL Server Customer Advisory Team ( SQL CAT ) site.

http://sqlcat.com/files/folders/280/download.aspx

Monday, June 23, 2008

5 Rules of Data Normalization - poster

This is the place to obtain a copy of the aging but still popular "5 Rules of Data Normalization" poster.

The poster was written in 1989 by Marc Rettig as a subscription premium for Database Programming and Design magazine.

http://www.marcrettig.com/poster/

Sunday, June 15, 2008

What not to do as a DBA

Here are some tips for what not to do as DBA – Newbie or not!

1. Rebuild an index in working hours
This will hit disk I/O very heavily. It is rarely useful to do this during normal working hours, so always schedule it for the evening or overnight - that is, during the period of lowest user activity. Difficult if you work in a 24/7 environment like myself. Look for maintenance window !

2. Stopping the database engine without warning
Why? Lots of frustrated users and a telephone (helpdesk or yours) that won't stop ringing. Trust me I’ve been here – inadvertently shutting down database servers without informing the application users is a cardinal sin!

3. Performing a service pack upgrade during working hours
Usually this involves re-starting the core database engine. Don't do it, again it'll annoy many people. And besides that’s what overtime is for. Follow the Change control !

4. Running test queries against live servers
Do you really know how long they'll run for or how much disk I/O they will demand? I thought not! Again, past experience has led me to be extremely cautious in this department. Test, test out on Development/ Test environment before hitting on Production always !

5. Being arrogant towards other teams in the department
Why? A few can be a complete pain, but explaining to them the issues and trying to work towards a good working compromise is usually more productive than treating them unsympathetically. Likewise the helpdesk guys - you need each other. Foster good relationships with your work colleagues, in the long term it pays dividends.

6. Backing up during working hours
Again it's all about disk I/O. It serves the backup or it serves your users. If you have to do it, look at differentials or transaction log backups: they take less time and reduce dropped connections as a result. I’ve recently had to change from full to differential backups because of space restrictions too. Alternatively, monitor the server and talk to your users/tape guys about the best time to perform one - but only if you really have to.

7. Executing patches/updates against live data
You are kidding, right? At the very least, test it against a live copy, and backup the live database before you apply it. And if possible have a regression script which will allow you to back out the updates if reverting to a backup is not possible. In my experience when users say they have confirmed and tested everything is ok – they haven’t really, give it 3/4 days later – they’ll spot something else!

8. Not securing your database servers
Get clued up on security. Your organisation may have its own Information Security Officers like mine; define user roles and password retention policies etc

To find the DBCC execution progress

To find the DBCC execution progress;

select percent_complete, estimated_completion_time, cpu_time, total_elapsed_time, * from sys.dm_exec_requests

Wednesday, June 11, 2008

SLA - Service Level Agreement for SQL Server 2005

Do you know what an SLA is?

SLA = Service Level Agreement. SLAs are agreements between you and your customers. If you're a DBA, then your customer is typically the company for whom you work. Examples of SLAs are:

1. In the event of a corruption, or other disaster, the maximum amount of data loss is the last 15 minutes of transactions.
2. In the event of a corruption, or other disaster, the maximum amount of downtime the application can tolerate is 20 minutes.

Usually, it's a combination of SLAs such as those above.

Do you know why SLAs are important?

Here's the catch - an SLA is really more than just an agreement between you and your customers - it's more like a contract that you're obligated to meet. This means that if you're a DBA with zero-downtime and zero-data loss SLAs, you need to make sure that in the event of a corruption you can actually meet those SLAs. The obvious thing is that if the SLAs cannot be met then the business will suffer downtime and data loss. The not so obvious thing is that if you're the one who agreed to the SLAs in the first place, and when the disaster strikes, the capabilities of the system are far below the SLA's requirements, then you could lose your job - resume/CV time - I've heard of it happening...

Do you know your SLAs?

You have to know what your SLAs are so you can make sure the system can meet them. Several DBAs I discussed this with don't know what their business' SLAs are, even though they are responsible for making sure they are met. I find this astounding - how can you sign up for meeting an SLA when you don't know that the SLA is? Especially if failing to meet the SLA could lead to resume/CV time...

Do you think you can meet your SLAs?

The other reason to know your SLAs, of course, is so that you can correctly architect your system to meet them. There are a bunch of technologies you can use and strategies you can employ to work towards meeting your SLAs (well beyond the scope of this blog post but will be covered through the year). If you find that you can't meet your SLAs, you need to push-back on your management - otherwise you're setting yourself up for trouble when a disaster occurs and you can't meet the SLAs - you'll be held responsible.

Do you know you can meet your SLAs?

Your disaster recovery plan looks great on paper - but have you actually tried it? I know of one company that has a 15 minute downtime SLA for a 300+GB database but the DBA is relying on clusters to provide that for him. That won't work if the database is corrupt (remember a failover cluster has a single point of failure in its shared-nothing configuration - the disks) and needs to be restored from the last full backup... Another company I know of relies on database mirroring to failover in the event of a disaster but has never tried it to see if their application fails over gracefully... You have to make sure you've practiced recovering from a disaster before the first real disaster happens - you'll be amazed at the little things that are discovered (e.g. if the on-site backups are bad, how long will it take to get the offsite copies brought in-house from the off-site location 100 miles away? Can you still meet your 15 minute downtime SLA in that case?)

SQL Server Specific on SLA

Hours of Operation

1. Hours that the partition/table/database is available to users. The SLAs may be different for various parts of the database, depending on what applications need access to them. Why is it important to differentiate at various granularities? For example, depending on how the unit of data is used, it may require different maintenance than other data and so knowing the availability SLA allows maintenance downtime to be planned.
2. Hours reserved for planned downtime. Again, this may differ at various granularities of data.
3. Amount of advance notice for extended downtime or other changes that affect users. For instance, when my bank upgraded its computer system last year, they gave a series of warnings over the preceeding few months so that people weren't surprised.

Service Availability

1. Percentage of time SQL Server service is running and able to service connections.
2. Percentage of time a particular partition/table/database is available for use (i.e. not exclusively locked for maintenance or restore).

System Performance

1. Number of concurrent users the system supports.
2. Number of transactions supported per unit of time.
3. Acceptable level of performance, such as latency experienced by users for a variety of operations.
4. Minimum time for an update to be replicated to various remote sites.

Disaster Recovery

1. Time allowed for recovery from each type of failure (e.g. accidental data deletion, database corruption, SQL Server crash, OS crash, server failure, site failure).
2. Time it takes to bring critical data online (e.g. the read/write partitions of a sales database) such that operations can continue and less critical data can be recovered later.
3. Time taken to recover data to the point of failure.
4. Maximum acceptable data/transaction/work loss for various kinds of failures.
5. Maximum time for application failover to a remote server/site.

Support

1. Methods available for application users to get help.
2. Maximum response/resolution time from a DBA to respond to various types of problems.

Other

1. Maximum amount of space for user tables/databases.
2. Amount of users in specific roles.

Some other things to consider are how you define an SLA - for example, in transactions per second or in commit latency that users experience - and the interplay between SLAs - for example, the commit latency SLA may be affected if the acceptable data loss SLA is zero and a solution such as synchronous database mirroring or remote SAN mirroring is used.

Bottom line is that although it can be simple to quickly define and announce a set of SLAs for a given application, its very difficult to make sure that each is palatable to all involved, guarantee that each can be met, and allow easy diagnosis of the system to work out which component is failing when an SLA is not met. SLAs really need to be defined while a sytem is being designed as retro-fitting SLAs after-the-fact can be very time-consuming and costly.

Tuesday, June 10, 2008

SQL Server 2005 Performance Statistics Script

PSS SQL Server Engineers : SQL Server 2005 Performance Statistics Script

"Perf Stats Script", since it's useful for troubleshooting lots of issues in addition to blocking. The script takes advantage of the extremely rich set of data exposed by the new Dynamic Management Views in SQL Server 2005.

Download the script from http://blogs.msdn.com/psssql/attachment/1736376.ashx

Monday, June 09, 2008

SAN SRDF based Replication

The replication mode is "Synchronous". This means that when the host server commits a write to a SAN disk, that bit of data will be written to the DR array in Norwest first, the DR array then sends an acknowledgement back to the production array and that bit of data will be written to the prod array when it receives that acknowledgement from the DR side. Meaning that the data between the R1 (prod) and R2 (DR) devices are always in-sync when the SRDF mirrors are established.
• When the SRDF mirror/replication is established, the process write-disables the target devices on the DR array in Norwest, so your DR server will not be able to access the DR SAN devices while that is on. If you want to make the DR LUNs available to the DR host, we will have to split the SRDF mirror for that set of servers.
• Zero data loss? As long as the data has been committed to the SAN disks, it should be covered. As for data that might still be sitting in the server cache but have NOT been flushed out/committed to SAN disk, well, the SAN storage has no control over data that has not been written to it.