Friday, February 20, 2009

Web Slides on Harvesting Grade Book

Diigo has a feature where you can create a "List" of bookmarks. Inside the list, you can order the bookmarks in any sequence you want, and create a slide show from it.

I just created a series called The Evolution of Harvesting Grade Book. It tells the story behind some of the transformative assessment projects our department has been working on - visualized through what is currently called the "Harvesting Grade Book".

Friday, February 13, 2009

SQL Memory Paged Out During Large File Copy

We currently use a SQL Server maintenance plan to generate full database backups (nightly or weekly, depending on the application), plus differential and transaction log backups throughout the day. We then run a script that uses robocopy (from the Windows Server 2003 Resource Kit Tools) to copy the generated backup files offsite.
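For reference, the offsite copy step boils down to a single robocopy call per backup folder. This is a minimal sketch; the source folder, destination share, and log path are hypothetical placeholders, not our actual paths:
rem Hypothetical paths; adjust to the real backup folder and offsite share.
robocopy D:\SQLBackups \\offsite-server\SQLBackups *.bak *.trn /Z /NP /R:3 /W:10 /LOG+:D:\Logs\offsite-copy.log
The /Z switch enables restartable mode so an interrupted copy of a large backup file can resume, and /LOG+ appends each run to a log file for later review.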

Ever since we migrated our databases from 32-bit SQL Server to 64-bit, we started encountering a problem where SQL Server process memory was being paged out. As a result, our applications became momentarily unresponsive and unavailable to our customers. This happened consistently around the same time every night, which coincided with the time when the offsite copying of the full database backup files was running. The error message in the SQL Server log was:
A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): xxxxxx, committed (KB): xxxxxx, memory utilization: xx%.
According to the Microsoft article How to reduce paging of buffer pool memory in the 64-bit version of SQL Server 2005, only SQL Server 2005 Enterprise Edition can use the Lock Pages in Memory option to reduce paging. We are running SQL Server 2005 Standard Edition, so we may need to upgrade/migrate to Enterprise Edition, which means additional licensing and migration costs to consider.

In the meantime, our applications are failing momentarily every night, so I need to come up with a better approach now.

Our requirements are pretty simple (listed in the order of importance):
  • No paging of SQL Server memory.
  • Can be automated via GUI or script.
  • Still maintain a local copy of the database backup files. (This is preferable because, in the event of a problem that requires a full database restore, restoring from a local copy is much faster.)
  • Fast.
Is that too much to ask for?

Google helped me locate this article on TechNet's Ask the Performance Team blog: Slow Large File Copy Issue. It has a pretty good explanation of how the use of buffered I/O during the copy process leads to the paging problem we encountered.

I followed the recommendation in the article and tested using eseutil to perform the offsite copying of our database backup files. Unfortunately, it did not work as expected: SQL Server process memory was still being paged out.
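For anyone who wants to reproduce the test, eseutil's copy mode takes a source file and a destination. A minimal sketch, with hypothetical paths:
rem Hypothetical paths; eseutil's /y mode copies a file using non-buffered I/O.
eseutil /y D:\SQLBackups\mydb.bak /d \\offsite-server\SQLBackups\mydb.bak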

I continued to investigate options. Here is a list of what I have looked into:
  • cat (from Cygwin)
  • split to remote share and cat to combine (from Cygwin)
  • split locally, copy to remote share, and cat to combine (from Cygwin)
  • rsync (from Cygwin)
  • Symantec Backup Exec backup agent
  • Symantec Backup Exec SQL backup agent
  • Windows NTBackup
  • SQL backup to remote share
  • SQL backup using the MIRROR TO clause (Eliminated since it is available in SQL Server 2005 Enterprise Edition only)
  • SQL database mirroring and backup from the mirrored databases (Eliminated since backups cannot be taken from the read-only mirrored databases)
  • SQL backup to a file server cluster volume; after the database backup files are generated, fail over to the other file server node and run the offsite backup against it (Not recommended due to the additional complexity; we may be better off migrating to SQL Server 2005 Enterprise Edition)
After some testing, the winner (or the best compromise) is rsync.

We continue to use the SQL Server maintenance plan to generate all database backup files, which are kept locally on the SQL server. We then run a script to rsync the files offsite. It never pages out our SQL Server process memory. It still takes a long time to run, but compared to the other approaches, this is the fastest one.
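The script essentially comes down to one rsync call per backup folder. Here is a minimal sketch, assuming Cygwin on the SQL server, local backups under D:\SQLBackups, and a hypothetical offsite host reachable over SSH (our actual script adds logging and error handling):
# Hypothetical host and paths; run from Cygwin on the SQL server.
rsync -av --partial /cygdrive/d/SQLBackups/ backupuser@offsite-server:/sqlbackups/
The --partial flag keeps partially transferred files, so an interrupted copy of a large backup file can resume instead of starting over; and when a previous version of a file already exists offsite, rsync's delta-transfer algorithm sends only the blocks that changed.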

Our largest database is the webctdatabase of our Blackboard LMS. It is now about 400 GB, and it takes about 15 hours for rsync to copy it offsite.

Wednesday, February 11, 2009

SQL Injection Exploit... of a Mom

[Embedded comic: xkcd's "Exploits of a Mom"]
It's brilliant.

SQL Script to Generate a List of Sections (& Instructors) from Blackboard Learning System CE8

We need to generate a list of all courses/sections hosted in our Learning Management System (powered by Blackboard Learning System CE8, formerly known as WebCT).

In Blackboard's terminology, all courses and sections are learning contexts. They are hierarchical, and their parent-child relationships, along with other information, are stored in the database. So the challenge here is to figure out how to write a recursive query.

With a little help from an MSDN article, Recursive Queries Using Common Table Expressions, I figured out how to do that today.

The following SQL query will generate a list of all sections, ordered by course name and then by section name:
with temp1 (parent_learning_context_id, learning_context_id, name, level)
as
(
    -- Anchor member: the root learning contexts (no parent)
    select parent_learning_context_id, learning_context_id,
        name, 0 as level
    from rpt_learning_context
    where parent_learning_context_id IS NULL
    union all
    -- Recursive member: each child sits one level below its parent
    select LC.parent_learning_context_id,
        LC.learning_context_id, LC.name, level + 1
    from rpt_learning_context as LC
    inner join temp1 as T
        on LC.parent_learning_context_id = T.learning_context_id
)
-- Sections live at level 4; join back to pick up the parent course name
select T.parent_learning_context_id, LC.name,
    T.learning_context_id, T.name
from temp1 as T
inner join learning_context as LC
    on T.parent_learning_context_id = LC.id
where T.level = 4
order by LC.name, T.name
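One caveat: SQL Server limits recursive CTEs to 100 recursion levels by default. Our hierarchy is shallow, so the default is fine here, but on a deeper tree the limit can be raised (or removed) by appending a query hint to the final select:
order by LC.name, T.name
option (maxrecursion 0)  -- 0 removes the default 100-level recursion cap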
With a little twist, I can easily modify it to generate a list of all instructors teaching this Spring semester:
with temp1 (parent_learning_context_id, learning_context_id, name, level)
as
(
    -- Anchor member: the root learning contexts (no parent)
    select parent_learning_context_id, learning_context_id,
        name, 0 as level
    from rpt_learning_context
    where parent_learning_context_id IS NULL
    union all
    -- Recursive member: each child sits one level below its parent
    select LC.parent_learning_context_id,
        LC.learning_context_id, LC.name, level + 1
    from rpt_learning_context as LC
    inner join temp1 as T
        on LC.parent_learning_context_id = T.learning_context_id
)
-- Join member and person records to list each section's instructors
select LC.name, T.name, P.webct_id_lowercase,
    P.name_n_given, P.name_n_family, P.email
from temp1 as T
inner join learning_context as LC
    on T.parent_learning_context_id = LC.id
inner join rpt_member as M
    on M.learning_context_id = T.learning_context_id
inner join person as P
    on M.person_id = P.id
where T.level = 4
and LC.name like '%2009_spring%'   -- Spring 2009 offerings only
and M.role = 'SINS'                -- section instructor role
and M.active = '1'
and M.denied_access = '0'
order by T.name
SQL queries - I'm loving it.

Sunday, February 8, 2009

Using Google Chart API to Implement Harvesting Grade Book

Our department has been working on a transformative assessment approach to teaching and learning. One of the challenges in implementing the idea is providing rich renderings to help conceptualize and communicate the assessment data; one of these renderings takes the form of what we currently call the Harvesting Grade Book (see "Rich Assessment From A Harvesting Grade Book" for more information).

Currently, the graphs in the Harvesting Grade Book example are generated manually: the data is downloaded into Excel and then graphed. This manual approach, though, has its limitations: it does not scale, and the results cannot be updated in real time. We need to find a way to automate this process.

I looked at options. The Microsoft LogParser utility was a potential solution I investigated to address the scaling issue. For those who are not familiar with it, LogParser is a powerful command-line utility. It takes input from a variety of sources, such as CSV files, SQL databases, and even Active Directory via LDAP, and it can produce output in a variety of formats, such as XML, charts, etc. I have used it to generate a number of monitoring and usage graphs for the Blackboard Learning Management System. I believe my familiarity with LogParser would enable me to apply the utility, in conjunction with a wrapper script, to automate graph generation for the Harvesting Grade Book in batch.
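As a rough illustration of what such a wrapper would drive, a single LogParser call can turn a CSV export of grades into a chart image. The input file and column names below are hypothetical stand-ins, not our actual data, and the CHART output format requires the Microsoft Office Web Components to be installed:
rem Hypothetical CSV input and column names, for illustration only.
LogParser "SELECT column_name, AVG(TO_INT(final_value)) AS avg_grade INTO grades.gif FROM grades.csv GROUP BY column_name" -i:CSV -o:CHART -chartType:ColumnClustered -groupSize:410x270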

To overcome the second limitation, real-time updates, I decided to experiment with the Google Chart API. For those who are not familiar with it, the Google Chart API lets you dynamically generate a chart on the fly by sending it a REST web service call (basically a URL containing the chart properties and data). The service then returns the chart as an image in PNG format. What makes this solution elegant and easy to apply is that the URL itself contains the data. This kind of web-based charting is far easier and more efficient than the manual compilations and complications of using Excel.
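As a minimal illustration of the mechanics (the data points are toy values of my own, not grade book data), the following URL requests a 200x100 line chart of three values; cht picks the chart type, chs the pixel size, and chd=t: carries the data:
http://chart.apis.google.com/chart?cht=lc&chs=200x100&chd=t:10,40,25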

For example, a radar graph like those in the Harvesting Grade Book example can be created with a URL:
http://chart.apis.google.com/chart?
cht=r
&chs=410x270
&chtt=Self%20Assessment%20compared%20with%20Industry%20Ratings
&chdl=Competency|Team%201|Industry
&chxt=x,y&chxr=|1,0,6|
&chxs=0,000000,10|1,000000,10,,l
&chxl=0:|Problem|Context|Own%20Perspective|Data|Other%20Perspectives|Conclusions|Communication|1:|0|2|4|6
&chls=2.0,6.0,2.0|1.0,4.0,0.0|1.0,4.0,0.0
&chco=000000,888888,FE5900&chm=B,88888860,1,1.0,5.0|B,FE590060,2,1.0,5.0
&chd=t:67,67,67,67,67,67,67,67|73,77,73,80,67,73,90,73|33,33,50,50,33,33,33,33
&chdlp=r
Google will then return the following graph:
[Radar chart: "Self Assessment compared with Industry Ratings"]
Perfect! And as for any concerns about usage limits on the Google Chart API service: "There's no limit to the number of calls per day you can make to the Google Chart API... If you think your service will make more than 250,000 API calls per day, please let us know by mailing an estimate to ...". With a number like 250,000, I consider the scalability issue solved.

Thursday, February 5, 2009

SQL Script to Download Grades from Blackboard Learning System CE8

I just finished a SQL query to download all grade book data from Blackboard Learning System CE8.
select
    LC2.name as Semester,
    GB.learning_context_name as Section,
    P.source_id as StudentID,
    GB.user_login_name,
    GB.given_name,
    GB.family_name,
    GB.column_name,
    GB.column_type,
    GB.original_value,
    GB.override,
    GB.max_points,
    GB.final_value
from rpt_ext_gradebook as GB
inner join rpt_person as P
    on P.person_id = GB.person_id
-- Walk up one level from the section to its parent learning context (the semester)
inner join rpt_learning_context as LC1
    on GB.learning_context_id = LC1.learning_context_id
inner join rpt_learning_context as LC2
    on LC1.parent_learning_context_id = LC2.learning_context_id
where P.demo_user = 0   -- exclude demo users
order by LC2.name, GB.learning_context_name, GB.column_name, GB.user_login_name
This depends on the RPT_EXT_GRADEBOOK background job having run successfully, so that the data is populated properly.
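Before trusting a download, a quick sanity check (using only columns already referenced in the query above) is to confirm the table is populated, for example by counting grade rows per section:
-- Quick check that the background job populated the table
select GB.learning_context_name, count(*) as grade_rows
from rpt_ext_gradebook as GB
group by GB.learning_context_name
order by GB.learning_context_name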

I ran it against a clone of our production database. It took 1.5 minutes to retrieve 2.2 million grade records. I spot-checked a number of courses (sections). The results included the computed grade values even when they were calculated by a formula, and gave me the original grades, any overrides made, and the final grades. Very nice!