Saturday, July 21, 2007

BbWorld 07 Conference Presentation

Blackboard acquired WebCT in 2006. A lot of changes were made soon after the merger. The product WebCT Campus Edition was renamed Blackboard Learning System CE. A number of senior engineers from WebCT left. The support model was changed. But the main issue is the lack of good and timely technical support, which has made it a struggle for us to support the product day to day.

But our spirits remain strong. We, the same group who presented last year, met again and co-presented "A Year Later : Looking Back and Moving Forward" at the BbWorld '07 Conference. Here are the slides from our university:



Tuesday, June 12, 2007

SharePoint MOSS 2007 with SSL termination on Load Balancer

We want to enable SSL for our SharePoint (MOSS 2007) installation. Since we already have a pair of load balancers (F5 Networks' BigIP load balancers) for our Blackboard Learning Management System, we would like to use them for SSL termination for SharePoint as well. The advantage is that it offloads all encryption and decryption work from our SharePoint servers onto the load balancer (which is designed to do that work, and more).

The network path is as follows:

Browser ---https---> load balancer ---http---> SharePoint servers

However, it turned out not to be an easy task. We found that the URLs embedded in http responses from SharePoint (such as form action links) use http. Since SharePoint never knows that the traffic was originally https (as you can see from the network path above), it embeds URLs in http. It kind of makes sense.

I searched all over to see if someone had already found a solution.

One suggestion was to use the stream profile of the load balancer as workaround:
  • On the BigIP load balancer, under Local Traffic | Virtual Servers | Profiles, choose Others | Stream.
  • Create a Stream profile with Settings:
    Source http://sp.domain.com
    Target https://sp.domain.com
It does work. All "http://sp.domain.com" in the http responses from SharePoint would be replaced by "https://sp.domain.com". If you decide to pursue this approach, you must read AskF5 knowledge base article SOL6422: Using the Stream profile with HTTP traffic may prevent the client from displaying all of the data. It documents a known issue with the Stream profile, and the workaround.
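To make the effect of the Stream profile concrete, here is a minimal sketch in Python of the same blanket substitution. It is only an illustration of the idea, not how BigIP implements it; the function name and the sample form path are made up.

```python
def apply_stream_rewrite(body: str, source: str, target: str) -> str:
    """Replace every occurrence of source with target, like a simple stream rewrite."""
    return body.replace(source, target)


# Hypothetical response fragment; only the host name comes from the post.
html = '<form action="http://sp.domain.com/_layouts/example.aspx">'
rewritten = apply_stream_rewrite(html, "http://sp.domain.com", "https://sp.domain.com")
print(rewritten)
```

Note that the substitution is blind: it rewrites the string wherever it appears in the response, including in page text, which is one reason to prefer a fix inside SharePoint itself.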

But I am persistent, and kept looking for the real fix in SharePoint. Two articles were very useful in helping me derive my own solution using BigIP load balancers.
It took me a day, and I think I figured it out:
  • First, create a SharePoint site in the Default zone, specifying its host name and port:
    spsite port 8888
  • SharePoint will create the web application and content database accordingly.
  • Then, extend this web application to a new SharePoint web site with your internal host name, port, and no SSL:
    http://sp.domain.com port 80
  • In the Load Balanced URL field, use https://sp.domain.com (yes, https here!).
  • Put this site in the Internet zone.
  • Then, go to Operations | Alternate Access Mapping. You will see the following entries:

    Internal URL            Zone      Public URL for Zone
    http://spsite:8888      Default   http://spsite:8888
    https://sp.domain.com   Internet  https://sp.domain.com

  • Now, click on Add Internal URLs. Add your internal non-SSL URL to the Internet zone:
    http://sp.domain.com Internet
  • Then, go back to the Operations | Alternate Access Mapping screen. You will see the following entries:

    Internal URL            Zone      Public URL for Zone
    http://spsite:8888      Default   http://spsite:8888
    https://sp.domain.com   Internet  https://sp.domain.com
    http://sp.domain.com    Internet  https://sp.domain.com
Only then will SharePoint know that the incoming URL http://sp.domain.com is associated with the Internet zone, and that it should embed https://sp.domain.com in form action links and the like when sending responses back to users.
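The lookup that the mapping table describes can be sketched as a toy model in Python. This is only my mental model of the behavior, not SharePoint's actual implementation; the function names and the sample page path are hypothetical, while the URLs and zones are the ones from the final table.

```python
from typing import Optional

# Toy copy of the final Alternate Access Mapping table:
# internal URL -> (zone, public URL for that zone).
AAM_TABLE = {
    "http://spsite:8888": ("Default", "http://spsite:8888"),
    "https://sp.domain.com": ("Internet", "https://sp.domain.com"),
    "http://sp.domain.com": ("Internet", "https://sp.domain.com"),
}


def public_url_for(incoming_url: str) -> Optional[str]:
    """Return the public URL of the zone the incoming URL belongs to, or None."""
    entry = AAM_TABLE.get(incoming_url.rstrip("/").lower())
    if entry is None:
        return None
    _zone, public_url = entry
    return public_url


def embedded_link(incoming_url: str, path: str) -> str:
    """Build the link a response would embed, falling back to the incoming URL."""
    base = public_url_for(incoming_url) or incoming_url
    return base.rstrip("/") + path


# The load balancer forwards plain http, but the embedded link comes out as https.
print(embedded_link("http://sp.domain.com", "/Pages/default.aspx"))
```

Without the third row, the http request from the load balancer would match no zone, and in this model the link would fall back to http.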

Monday, May 14, 2007

Using Gawk for Log Analysis

Gawk is a very powerful text processing and pattern matching utility. It is the GNU version of awk. I use it to search the logs in ways that grep cannot.

For example, in our Blackboard CE/Vista 8, the webserver.log files contain the following fields:
date time time-taken c-ip x-weblogic.servlet.logging.ELFWebCTSession sc-status cs-method cs-uri-stem cs-uri-query bytes x-weblogic.servlet.logging.ELFWebCTExtras cs(User-Agent)
I can use the gawk command to easily find all http requests that took longer than 60 seconds to process.
gawk -F\t "$3 > 60" webserver.log
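In gawk commands, the fields are preceded by a $ sign, i.e. $1 refers to date, $2 refers to time and so forth. Use the -F switch to specify the delimiter, which is tab in this case. The double quotes shown here suit the Windows command prompt. In a Unix shell such as bash, use single quotes so that the shell does not expand $3 itself: gawk -F'\t' '$3 > 60' webserver.log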

To find all http requests that had taken a long time to process (larger than 60 seconds), but do not involve downloading of large files (smaller than 50MB):
gawk -F\t "($3 > 60) && ($10 < 52428800)" webserver.log
To find all http requests that got the error 500:
gawk -F\t "$6 == 500" webserver.log
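For comparison, the same filters can be sketched in Python. This is a minimal sketch assuming the field order listed above (time-taken is field 3, sc-status is field 6, bytes is field 10) and tab-delimited lines; the function names and the sample log lines are made up for illustration.

```python
def parse_line(line):
    """Split one tab-delimited log line; return None for comment or short lines."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    return fields if len(fields) >= 10 else None


def slow_requests(lines, min_seconds=60.0, max_bytes=None):
    """Yield lines whose time-taken exceeds min_seconds (and bytes < max_bytes, if given)."""
    for line in lines:
        fields = parse_line(line)
        if fields is None:
            continue
        try:
            if float(fields[2]) <= min_seconds:
                continue
            if max_bytes is not None and int(fields[9]) >= max_bytes:
                continue
        except ValueError:
            continue  # skip lines with non-numeric values such as "-"
        yield line.rstrip("\n")


def requests_with_status(lines, status="500"):
    """Yield lines whose sc-status field equals the given status."""
    for line in lines:
        fields = parse_line(line)
        if fields is not None and fields[5] == status:
            yield line.rstrip("\n")


# Made-up sample lines in the field order described above.
sample = [
    "#Fields: date time time-taken ...",
    "2007-05-14\t10:00:00\t75.2\t10.0.0.1\tsess1\t200\tGET\t/a\t-\t1024\t-\tMozilla",
    "2007-05-14\t10:00:05\t3.1\t10.0.0.2\tsess2\t500\tGET\t/b\t-\t2048\t-\tMozilla",
    "2007-05-14\t10:01:00\t120\t10.0.0.3\tsess3\t200\tGET\t/big.zip\t-\t104857600\t-\tMozilla",
]
for hit in slow_requests(sample, 60, 50 * 1024 * 1024):
    print(hit)
```

To run it against a real file, pass an open file object instead of the sample list, e.g. slow_requests(open("webserver.log")).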
Windows users can get gawk by installing Cygwin. Remember to add c:\cygwin\bin (or wherever your installation directory is) to your PATH environment variable. This way, you can run gawk directly from any command prompt or inside a script.

Tuesday, August 1, 2006

WebCT Impact 2006 Conference Presentation

We were among the first universities to migrate from WebCT CE4 to CE6/Vista4, moving as soon as the product was released. Together, we presented "Lessons Learned : Migrating From CE4 To CE6/ Vista4" at the WebCT Impact 2006 Conference. Here are the slides from our university:



Wednesday, May 3, 2006

Signs That Your Machine May Be Compromised

Here are some signs to look for that may indicate your machine has been compromised.
  • Your web site has been defaced, or has JavaScript inserted that sends users to another site.
  • Your machine is listening on some new or unknown ports.
  • The logs suddenly become much larger than they usually are.
  • The logs are not logging anything.
  • Disk space utilization of your machine suddenly increases.
  • Network utilization of your machine suddenly increases.
  • Your machine runs unusually slowly.
  • Someone reports that your machine is attacking theirs, sending spam, hosting copyrighted movies, etc.
  • A Google search for "viagra site:yourwebsite.com" (or another spam keyword) returns pages that you did not post.
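Two of these checks are easy to automate once you keep a baseline. Below is a minimal sketch in Python, assuming you have already collected the listening ports and recent log sizes by some other means; the function names, the threshold factor of 3, and the sample numbers are all my own assumptions.

```python
def new_listening_ports(baseline, current):
    """Return ports that are listening now but were not in the known-good baseline."""
    return sorted(set(current) - set(baseline))


def log_size_is_unusual(previous_sizes, current_size, factor=3.0):
    """Flag a log that is empty, or much larger than its recent average size."""
    if current_size == 0:
        return True  # the log has stopped logging anything
    if not previous_sizes:
        return False  # no history to compare against
    average = sum(previous_sizes) / len(previous_sizes)
    return current_size > factor * average


# Hypothetical numbers for illustration only.
print(new_listening_ports({22, 80, 443}, {22, 80, 443, 31337}))
print(log_size_is_unusual([100, 120, 110], 1000))
```

Neither check proves a compromise on its own; they only tell you where to look first.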

Monday, May 1, 2006

Virtual World, Real Money - Her Second Life is Good!

The front page of the current issue (May 1, 2006) of Business Week caught my attention.



It is about a Chinese entrepreneur who is making real money out of the Second Life virtual world. The entire article can be found here:
http://www.businessweek.com/magazine/content/06_18/b3982001.htm

This virtual reality thing is not just a game anymore. There is land up for auction, with bidding starting at USD 1000. There are all kinds of products for sale inside the virtual world. There is even currency fluctuation (Linden dollar to USD).

And you wonder... why would someone pay over a thousand dollars for some pixels on a computer screen?

The days of the one-way internet are gone. This generation and the next don't just log on and read something. They want to engage in doing something. The games, tools or web applications (whatever you want to call them) allow them to do so. They empower people, engaging them, enabling them, allowing them to be creative and to interact. Just like the Wikipedians, the "players" are all very passionate about what they are doing. And the value of the application is the community.

Sunday, November 20, 2005

WSUWiki Initial Log Analysis Result - Month of November 2005

I'm running a log analysis tool called AWStats against the logs of some of our applications.

I downloaded the results into Excel and did a little massaging of the data. Basically, I separated all the "action=" URLs from the Pages-URL entries. Among the "action=" URLs, I added the counts of "action=edit" and "action=submit" together. They are both generated by the process of editing (Edit, Preview, Save). Among the individual pages, I added /index.php and /index.php/Main_Page together. They are basically the same page.

Here are the top 10 most viewed Pages-URL entries:
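I did this by hand in Excel, but the same massaging could be sketched in Python roughly as follows. The input format (pairs of URL and count), the function name, and the sample rows are assumptions for illustration, not AWStats output.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs


def aggregate(rows):
    """Split (url, count) rows into page views and action counts.

    action=edit and action=submit are summed under "editing";
    /index.php is folded into /index.php/Main_Page.
    """
    pages = Counter()
    actions = Counter()
    for url, count in rows:
        parts = urlsplit(url)
        query = parse_qs(parts.query)
        action = query.get("action", [None])[0]
        if action is not None:
            key = "editing" if action in ("edit", "submit") else action
            actions[key] += count
            continue
        page = parts.path
        if "title" in query:
            page = "/index.php/" + query["title"][0]
        elif page in ("/index.php", "/index.php/"):
            page = "/index.php/Main_Page"
        pages[page] += count
    return pages, actions


# Made-up rows for illustration only.
rows = [
    ("/index.php", 10),
    ("/index.php/Main_Page", 5),
    ("/index.php/Foo", 4),
    ("/index.php?title=Foo&action=edit", 3),
    ("/index.php?title=Foo&action=submit", 2),
    ("/index.php?title=Foo&action=history", 1),
]
pages, actions = aggregate(rows)
print(pages.most_common(10))
print(actions)
```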

Pages-URL                                            Viewed  Entry  Exit
Main_Page                                              1447    442   246
Chinese_Calendar_2006                                   622    405   122
Cultural_Politics_of_Sport:_Annotated_Bibliography      301    104    42
Fix_Outlook_2003_Phonebook_Issue                        236    170    51
Chinese_Calendar_2005                                   197    125    43
Special:Search                                          120      4    16
Category:CES_308_RKing                                   98      3     0
User:Krussell07/CES_308                                  91      3     0
Category:CoursePagelist                                  89      1     0
History_Of_Sport                                         83      2     4

Notice that both Chinese Calendars are high on the list. I am pleasantly surprised.

But then, as I looked more... I was even more surprised. I have only promoted the URL of the AAPI page to the Asian groups, and that was over a month ago. Being too caught up in my main duties, I admit I haven't done much since then.
Yet the two Chinese Calendars (linked from the AAPI page) are on the top 10 most visited list, while the AAPI page itself is not.
They have very high "Entry" counts. Apparently, once people found those calendar pages, they bookmarked them.
The 2006 version has a much higher "Viewed" count than the 2005 version. (Due to student groups planning events and gatherings for next year... e.g. looking at when the Chinese New Year is, maybe?)

Among the top 5, the "Entry" counts are several times higher than the "Exit" counts. This implies that once people visit a page (and find it useful), they also wander off, looking at what else could be interesting to them on the site.

Finally, I added up the "Viewed" counts of all individual pages. I found that in this month of November, so far there are 6865 views and 967 edits. That is about 7 times as many views as edits.

Ok, maybe I should exclude "/index.php/Main_Page" in the numbers.
This gives 5418 views and 967 edits. Still over 5 times more views than edits.

People are visiting (and re-visiting) pages for "reference". They generally poke around further on the site if the information on a particular page is useful to them. Information here flows in one direction only.

Among the 967 edits, there are 789 clicks on the edit button and 178 actual submits. (Submit includes both preview and save. This is just the way the URL is presented to the server.) Intuitively, I would think there should be more submits than edits, since people may click edit only once on a page but preview many times before they save their work. The log analysis result shows that this is not the case. There are over 4 times as many clicks on edit as there are submits.
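The arithmetic behind these ratios can be checked with a few lines of Python. All numbers are the ones quoted in this post; the Main_Page count of 1447 is simply the difference between the two view totals (6865 and 5418).

```python
total_views = 6865
main_page_views = 1447
edit_clicks = 789
submits = 178

edits = edit_clicks + submits                     # all editing activity
views_excl_main = total_views - main_page_views   # views without the main page

print(edits)                                  # 967
print(views_excl_main)                        # 5418
print(round(total_views / edits, 1))          # 7.1 views per edit
print(round(views_excl_main / edits, 1))      # 5.6 views per edit
print(round(edit_clicks / submits, 1))        # 4.4 edit clicks per submit
```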

Could it be that the wiki's text editor is hard for end users to use?
What can we do to encourage more contributions?