Monday, May 14, 2007

Using Gawk for Log Analysis

Gawk is a very powerful text-processing and pattern-matching utility. It is the GNU version of awk. I use it to search logs in ways that grep cannot.

For example, in our Blackboard CE/Vista 8, the webserver.log files contain the following fields:
date time time-taken c-ip x-weblogic.servlet.logging.ELFWebCTSession sc-status cs-method cs-uri-stem cs-uri-query bytes x-weblogic.servlet.logging.ELFWebCTExtras cs(User-Agent)
I can use the gawk command to easily find all HTTP requests that took longer than 60 seconds to process.
gawk -F\t "$3 > 60" webserver.log
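In gawk commands, fields are preceded by a $ sign, i.e. $1 refers to date, $2 refers to time, and so forth. Use the -F switch to specify the delimiter, which is a tab in this case. The commands here are quoted for the Windows command prompt; in a Unix shell, put both the delimiter and the program in single quotes (gawk -F'\t' '$3 > 60' webserver.log) so that the shell does not expand $3 or strip the backslash.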
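As a rough sketch of how the field references work, the filter can be followed by an action that prints only selected fields. The three sample records are made up for illustration and follow the field order listed above. The example is written for a Unix-style shell (including Cygwin's bash), so the program is in single quotes, and the AWK variable is only a convenience that falls back to plain awk if gawk is not installed.

```shell
# Use gawk if available; this one-liner needs only standard awk features.
AWK=$(command -v gawk || command -v awk)

# Made-up sample data (not real log output): three tab-separated records
# with the 12 fields in the order listed above. printf repeats the format
# for each group of five arguments.
log=$(mktemp)
printf '2007-05-14\t%s\t%s\t10.0.0.1\t-\t%s\tGET\t%s\t-\t%s\t-\tMozilla/4.0\n' \
  10:00:01 0.5  200 /index.html  1024 \
  10:00:05 75.2 200 /report.pdf  73400320 \
  10:00:09 90.0 500 /quiz/submit 2048 > "$log"

# For requests slower than 60 seconds, print time ($2), time-taken ($3)
# and cs-uri-stem ($8).
slow=$("$AWK" -F'\t' '$3 > 60 { print $2, $3, $8 }' "$log")
echo "$slow"
# 10:00:05 75.2 /report.pdf
# 10:00:09 90.0 /quiz/submit

rm -f "$log"
```

When a pattern has no action, as in the commands in this post, gawk prints the whole matching line.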

To find all HTTP requests that took a long time to process (longer than 60 seconds) but did not involve downloading large files (smaller than 50 MB):
gawk -F\t "($3 > 60) && ($10 < 52428800)" webserver.log
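The byte limit is easier to read if the shell computes it and passes it in with the -v option. This is a minimal sketch for a Unix-style shell, using made-up sample records in the field order listed above; the variable name max_bytes is only illustrative.

```shell
# Use gawk if available; this one-liner needs only standard awk features.
AWK=$(command -v gawk || command -v awk)

# Made-up sample data (not real log output), tab-separated, 12 fields.
log=$(mktemp)
printf '2007-05-14\t%s\t%s\t10.0.0.1\t-\t%s\tGET\t%s\t-\t%s\t-\tMozilla/4.0\n' \
  10:00:01 0.5  200 /index.html  1024 \
  10:00:05 75.2 200 /report.pdf  73400320 \
  10:00:09 90.0 500 /quiz/submit 2048 > "$log"

# 50 MB expressed in bytes: 50 * 1024 * 1024 = 52428800
max_bytes=$((50 * 1024 * 1024))

# Slow requests (time-taken > 60) that sent fewer than max_bytes bytes.
slow_small=$("$AWK" -F'\t' -v max_bytes="$max_bytes" \
  '($3 > 60) && ($10 < max_bytes)' "$log")
echo "$slow_small"

rm -f "$log"
```

With this sample data, only the /quiz/submit record passes both tests; the /report.pdf record is slow but exceeds the size limit.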
To find all HTTP requests that returned a 500 error:
gawk -F\t "$6 == 500" webserver.log
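Going one step beyond filtering, a short awk program can also tally requests by status code, which shows at a glance how common the 500 errors are. This is only a sketch for a Unix-style shell with made-up sample records in the field order listed above; the array name count is illustrative, and the guard against "#" lines assumes the log may begin with header lines.

```shell
# Use gawk if available; this program needs only standard awk features.
AWK=$(command -v gawk || command -v awk)

# Made-up sample data (not real log output), tab-separated, 12 fields.
log=$(mktemp)
printf '2007-05-14\t%s\t%s\t10.0.0.1\t-\t%s\tGET\t%s\t-\t%s\t-\tMozilla/4.0\n' \
  10:00:01 0.5  200 /index.html  1024 \
  10:00:05 75.2 200 /report.pdf  73400320 \
  10:00:09 90.0 500 /quiz/submit 2048 > "$log"

# Count records per sc-status ($6), skipping any lines that start with "#".
# The order of "for (s in count)" is unspecified, so sort the output.
summary=$("$AWK" -F'\t' '
  !/^#/ { count[$6]++ }
  END   { for (s in count) print s, count[s] }
' "$log" | sort)
echo "$summary"
# 200 2
# 500 1

rm -f "$log"
```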
Windows users can get gawk by installing Cygwin. Remember to add c:\cygwin\bin (or wherever your installation directory is) to your PATH environment variable. This way, you can run gawk directly from any command prompt or inside a script.