Friday, February 13, 2009

SQL Memory Paged Out During Large File Copy

We currently use SQL server maintenance plan to generate full database backup files (nightly or weekly, depending on the application), differential and transaction log backup files throughout the day. We then run a script to robocopy (from Windows Server 2003 Resource Kit Tools) the generated backup files offsite.

Ever since we migrated our databases from SQL server 32-bit to 64-bit, we started encountering a problem where SQL server process memory was being paged out. As a result, our applications became momentarily unresponsive and unavailable to our customers. This happened consistently around the same time every night, which coincided with the time when the offsite copying of SQL database full backup files was running. Error message in the SQL server log was:
A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): xxxxxx, committed (KB): xxxxxx, memory utilization: xx%.
According to this Microsoft Article: How to reduce paging of buffer pool memory in the 64-bit version of SQL Server 2005, only SQL Server 2005 Enterprise Edition can use the Lock pages in memory to reduce paging. We are running SQL Server 2005 Standard Edition. We may need to upgrade/ migrate to Enterprise Edition. This means additional costs of licensing and migration to consider.

In the mean time, our applications are failing momentarily every night. I need to come up with a better approach now.

Our requirements are pretty simple (listed in the order of importance):
  • No paging of SQL server memory.
  • Can be automated via GUI, or script.
  • Still maintain a copy of database backup files locally. (This is preferable because in the event of a problem that requires a full database restore, the process would be much faster)
  • Fast.
Is that too much to ask for?

Google helped me locate this article Technet's Ask the Performance Team Blog: Slow Large File Copy Issue. It has a pretty good explanation of what is going on (the use of buffered I/O) during the copy process which leads to the paging problem we encountered.

I followed the recommendation in the article and tested using eseutil to perform offsite copying of our database backup files. Unfortunately, it did not work as expected. SQL server process memory was still being paged out.

I continued to investigate options. Here is a list of what I have looked into:
  • cat (from Cygwin)
  • split to remote share and cat to combine (from Cygwin)
  • split locally, copy to remote share, and cat to combine (from Cygwin)
  • rsync (from Cygwin)
  • Symantec Backup Exec backup agent
  • Symantec Backup Exec SQL backup agent
  • Windows NTBackup
  • SQL backup to remote share
  • SQL backup using MIRROR TO clause (Eliminated since it is available in SQL Server 2005 Enterpise Edition only)
  • SQL database mirroring and backup from mirrored databases (Eliminated since backup option is not available in the read-only mirrored databases)
  • SQL backup to a file server cluster volume. After database backup files are generated, failover to the other file server node for the offsite backup to run against (Not recommended due to additional complexity. We may better off migrate to SQL Server 2005 Enterprise Edition)
After some testings, the winner (or the best compromise) is rsync.

We continue to use SQL server maintenance plan to generate all database backup files. The files are kept locally on the SQL server. We then run a script to rsync the files offsite. It never pages out our SQL server process memory. It still takes a long time to run, but compare to other approaches, this is the fastest one.

Our largest database is the webctdatabase of our Blackboard LMS. It is now about 400GB, and it takes about 15 hours for rsync to copy it offsite.

6 comments:

Corinna Lo said...

I subsequently posted some tips on Using rsync on Windows.

Mark said...

Great write up.

What speed connection do you have offsite? 15 hours is a while.

if you were doing a local copy on the same lan to a different server, which method do you think you would use?

Thanks,
Mark

Unknown said...

Lock server pages in memory is no longer only for Enterprise Edition. In standard edition you can achieve this if you update your SQL Server to atleast
SQL Server 2005 SP3 CU4 or SQL Server 2008 SP1 CU2

Corinna Lo said...

hi, Mark

Thanks for your comment. The bottleneck does not seem to be network related. ie. It takes about the same time to copy a large file locally, or through our network backbone.

I notice many copying utilities perform much faster if I have SQL server service stopped. So it would seem to be a resource in memory issue on the originating SQL server itself.

We are running a full database backup once a week, differential backups every 6 hours, and transaction log backups hourly. The differential, and transaction log backup files are significantly (relative to full backup file) smaller. So it is not too bad to take 15 hours to perform the offsite copy once a week.

Although I know we have that 15 hours exposure when the production server could go down, and we do not have a proper backup offsite. I have scheduled a transaction log backup just before the full backup starts since it is very fast to copy the transaction log backup offsite. We also have the database mirrored. These should helped mitigate some of the risks.

Corinna Lo said...

hi, Michael

Thanks for the information you provided.

I'm adding a link to the Cumulative update package 4 for SQL Server 2005 Service Pack 3 article for easy reference.

Access Arm said...

It's important the structure and content is spot on to make the most of it. information management systems