Wednesday, April 10, 2013

Reporting object version errors in application servers

The problem

We have an occasional problem with what I assume is corrupt cache on our PeopleSoft app servers. As a result, application users get data integrity errors. The error states "The data structure from the page does not match the data structure in the database. This can be caused by a change in the record definition while you were viewing the page."

Sometimes it is just that -- a PeopleTools object is modified while it's in use. PeopleSoft recognizes it and aborts the transaction, as it should.

More often, though, I've found that the error is bogus, and no such change happened. Clearing the cache on the app server(s) in question will avoid further errors. This has a performance impact, but that's a trade-off I'll make any day. Call me crazy, but I like having transactions actually complete, if a little more slowly than normal, rather than having them abort. The app server rebuilds its cache, and soon all is back to normal.

These data integrity errors show up in the app server logs. Here's an example:

PSAPPSRV.3460 (3110) [04/10/13 12:53:11 JSMITH@WXP (IE 7.0; WINXP) ICPanel](0) Deserialization integrity failure:  record SOME_TABLE version was 6514, now 6526
PSAPPSRV.5092 (1570) [04/10/13 12:53:18 JSMITH@WXP (IE 7.0; WINXP) ICPanel](0) Deserialization integrity failure:  record SOME_TABLE version was 6526, now 6514

Notice that the version number changed from 6514 to 6526, then back again 7 seconds later! This is not normal behavior.

Once these errors show up in the app server log, that user has already seen the error, and has had a transaction abort. However, it's not too late to save other users from the same fate, if I know a particular app server has a cache problem.

Our PeopleSoft app servers are all running Windows. PowerShell seems like a good way to monitor the log files and alert me when we have these errors.

This script demonstrates the following features of PowerShell:

  • hash tables
  • arrays
  • looping with ForEach-Object
  • searching for text with Select-String
  • reading and writing an XML file
  • sending email 


As always, I'm sure I'm doing some of this the hard way, and I invite PeopleSoft and PowerShell folks to suggest improvements. Also, if anybody knows how to make Alex Gorbatchev's Syntax Highlighter do horizontal scrolling on Blogger, please let me know.

I'll go through the interesting sections of the script, then show the whole thing at the end of this post.

The plan

Our PS_HOME folders already have network shares set up on all of the servers, so I decided to run just one monitoring script on another server to check all of them.

In order to avoid a flood of email alerts, I wanted a way to keep track of when each app server had its last error, and only send out email when new errors show up. PS app server logs start new each day, and I don't want to be pestered all day about an error that happened 1 minute after midnight.

I chose to use a hash table to keep track of each server's most recent error. This is a key-value or dictionary data structure. The name of each server will be my key, and the timestamp will be the value.

For persistent storage, I chose XML. PowerShell has cmdlets to import and export data to XML files, which makes saving and retrieving my hash table simple.

First, I'll create the hash table with sufficiently old timestamps, and write it to a file. I only need to do this once, unless I need to add more servers to my list.

$foo = get-date
$foo = $foo.AddDays(-1000)
$app_servers = @{"Pete" = $foo; "Roger" = $foo; `
    "Keith" = $foo; "John" = $foo}
$app_servers | Export-CliXML app_servers.xml

This creates a short list of my servers, each with a timestamp of today's date minus 1,000 days. Any new error I find in the logs will update the timestamp.
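Here is a minimal sketch of that round trip, in case you want to convince yourself that the timestamps survive the trip to disk. The two server names and the temp-folder file name are made up for the demonstration; they are not part of my real setup.

```powershell
# Assumption: two made-up entries and a throwaway file in the temp folder
$stamp = (Get-Date).AddDays(-1000)
$servers = @{ "Pete" = $stamp; "Roger" = $stamp }
$xml_file = Join-Path ([System.IO.Path]::GetTempPath()) "app_servers_demo.xml"

# Write the hash table out, then read it back in
$servers | Export-Clixml $xml_file
$restored = Import-Clixml $xml_file

# The values come back as [datetime] objects, not strings,
# so they can be compared against other dates directly
$restored["Pete"] -is [datetime]
$restored["Pete"] -lt (Get-Date)

# Clean up the throwaway file
Remove-Item $xml_file
```

Because the values are restored as real dates, the later "is this error newer?" test can be a plain -gt comparison.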

The script

Now on to the actual monitoring. After initializing a few variables, I read my hash table in from my XML file:

$app_servers = Import-Clixml app_servers.xml

Now, it turns out that you can't modify a hash table while iterating through it, so I make a copy of it:

$copy_of_app_servers = $app_servers.clone()
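A small sketch of the pattern, with made-up dates: I enumerate the original and write only to the copy. Clone() makes a shallow copy, which is enough here because the values are simple dates.

```powershell
# Assumption: sample dates chosen only for this demonstration
$old = [datetime]"2013-01-01"
$new = [datetime]"2013-04-10 12:53:11"
$app_servers = @{ "Pete" = $old; "Roger" = $old }
$copy_of_app_servers = $app_servers.Clone()

# Enumerate the original, but write only to the copy
$app_servers.GetEnumerator() | ForEach-Object {
    if ($_.Key -eq "Pete") {
        $copy_of_app_servers[$_.Key] = $new
    }
}

# The original is untouched; the copy carries the update
$app_servers["Pete"]
$copy_of_app_servers["Pete"]
```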

The location of a PeopleSoft app server log on Windows is $PS_HOME\appserv\DBNAME\LOGS\APPSRV_MMDD.LOG, with MMDD being today's month and day. I derive the MMDD portion of that file name:

$log_date = get-date -format "MMdd"
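To see what that format string produces without waiting for a particular day, here is a sketch that feeds Get-Date a fixed date. The sample date is just the date of the log excerpt above.

```powershell
# Assumption: a fixed sample date instead of today's date
$sample_date = [datetime]"2013-04-10"

# -Format returns a string; "MMdd" is two-digit month, then two-digit day
$log_date = Get-Date -Date $sample_date -Format "MMdd"
$log_name = "APPSRV_" + $log_date + ".LOG"
$log_name
```

For April 10 this gives APPSRV_0410.LOG. Note that the capital MM matters: lowercase mm means minutes.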

Now I'm ready to loop through each server, using PowerShell's ForEach-Object syntax. To do this with a hash table, you need to use the .GetEnumerator() method. Once I'm looping through, the way to refer to the current object is $_. Each component of the object is addressed with dot notation; $_.Key for my server names, and $_.Value for my date stamps. I copy these values to string variables, and I build the complete path to the server log. $log_path has previously been initialized to the standard path for our particular version of PeopleTools. I also set up an array, $error_report, that will hold the lines of the log that contain the version errors.

$app_servers.GetEnumerator() | ForEach-Object {
    $current_server = $_.Key
    $previous_error_date = $_.Value
    $error_report = $null
    $error_report = @()
    $filespec = "\\" + $current_server + $log_path `
        + "APPSRV_" + $log_date + ".LOG"

Now I'm ready to read through the log file, looking for the word "version" using the cmdlet select-string. This is roughly the equivalent of grep. By default select-string ignores case and matches anywhere in a line, so "version" also shows up harmlessly in other places in the log, e.g., inside "BrowserVersion" or "UOMConversion." I ignore these by repeatedly piping through select-string with the -NotMatch option.

select-string -path $filespec -pattern "version" `
    | select-string -NotMatch "UOMConversion" `
    | select-string -NotMatch "BrowserVersion" `
    | select-string -NotMatch "CONVERSION_RATE" `
    | select-string -NotMatch "conversionQty"

By the way, the backtick is PowerShell's line continuation character. I can thus make a very long pipeline of select-string cmdlets much more readable.
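The filtering is easy to try on a few strings instead of a real log. In this sketch the first sample line is the error shown earlier (shortened), and the other two are made-up stand-ins for the harmless matches. As a variation on my chain of select-string calls, it folds the exclusions into one regular expression, and it ends each line with a pipe, which also continues a line without needing a backtick.

```powershell
# Assumption: sample lines; only the first resembles a real log entry
$sample_lines = @(
    "[04/10/13 12:53:11 JSMITH@WXP] Deserialization integrity failure:  record SOME_TABLE version was 6514, now 6526",
    "[04/10/13 12:54:00 JSMITH@WXP] made-up line mentioning BrowserVersion",
    "[04/10/13 12:55:00 JSMITH@WXP] made-up line mentioning UOMConversion"
)

# All three lines match "version" (case-insensitive, anywhere in the line);
# the second select-string drops the known harmless ones
$hits = @($sample_lines |
    Select-String -Pattern "version" |
    Select-String -NotMatch -Pattern "UOMConversion|BrowserVersion|CONVERSION_RATE|conversionQty")

$hits | ForEach-Object { $_.Line }
```

Only the SOME_TABLE line survives. One regular expression is more compact, but the one-exclusion-per-line chain is easier to extend when a new false positive turns up, so take your pick.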

Select-string, unlike grep, returns an object, not just text. The actual text is a property of that object, named "line." It has more text in it than I need, so I discard much of it. If you look back at the example errors I listed above, you'll see that the timestamp is preceded by a square bracket. I use the -split operator to chop each line in two at that bracket, and then use select-string with a regular expression to keep only the piece that starts with a date.

$_.line -split "\[" | select-string "^[0-9][0-9]"

This leaves me with just the portions of the lines starting with the timestamp, e.g., "04/10/13 12:53:11 JSMITH..." I process each of these lines, extracting the timestamp for my copy of the hash table. If the timestamp is newer than anything I've seen for this server, I write the hash table to my XML file, and add the offending log line (with a newline `n at the end for readability) to my array of errors.

$_.line -split "\[" | select-string "^[0-9][0-9]" `
    | ForEach-Object {
    $error_line = $_.line
    $error_date = `
        [datetime]::ParseExact($error_line.substring(0,17),`
            'MM/dd/yy HH:mm:ss',$null)
    if ($error_date -gt $previous_error_date) {
        $copy_of_app_servers.Set_Item($current_server, `
            $error_date)
        $copy_of_app_servers | Export-Clixml app_servers.xml
        $error_report += ($error_line + "`n")
    }
}
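Here is the split-and-parse step run on the first example error from the top of this post, as a sketch you can paste into a console. Two things differ from my script and are assumptions for the demo: I use Where-Object with -match instead of select-string to pick the piece that starts with a date, and I pass the invariant culture to ParseExact rather than $null. With $null, the "/" in the format string follows the current culture's date separator, so the parse could fail on a machine whose regional settings use a different separator.

```powershell
$sample = 'PSAPPSRV.3460 (3110) [04/10/13 12:53:11 JSMITH@WXP (IE 7.0; WINXP) ICPanel](0) Deserialization integrity failure:  record SOME_TABLE version was 6514, now 6526'

# Assumption: a made-up "last error seen" date for the comparison
$previous_error_date = [datetime]"2013-04-09"

# Split at the "[" and keep the piece that begins with two digits
$error_line = $sample -split "\[" | Where-Object { $_ -match "^[0-9][0-9]" }

# The first 17 characters are the timestamp: MM/dd/yy HH:mm:ss
$error_date = [datetime]::ParseExact($error_line.Substring(0, 17),
    'MM/dd/yy HH:mm:ss',
    [System.Globalization.CultureInfo]::InvariantCulture)

# A newer timestamp means this is an error I have not reported yet
$is_new = $error_date -gt $previous_error_date
$error_date
$is_new
```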

Once I'm done with the log for each server, I send an email if there were any errors, i.e., if $error_report contains anything:

if ($error_report.Count -gt 0) {
    Send-MailMessage -To $mail_to `
        -Subject ($current_server + ": cache corruption? ") `
        -Body ("App server " + $current_server + `
            " may have corrupt cache.`n" `
            + $error_report) `
        -From $mail_from `
        -SmtpServer $mail_server}
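One detail worth knowing: when a string is added to an array, PowerShell joins the array elements with a space, so every line after the first in the email body starts with a stray blank. A sketch of a tidier way to build the body follows, using -join and a parameter hash table (splatting) in place of the backticks. The server name, addresses, and sample lines are placeholders, and the actual send is commented out so the sketch runs without a mail server.

```powershell
# Assumption: placeholder server name and two made-up report lines
$current_server = "Pete"
$error_report = @(
    "04/10/13 12:53:11 first sample line",
    "04/10/13 12:53:18 second sample line"
)

# Join the lines explicitly instead of relying on string + array
$body = "App server " + $current_server + " may have corrupt cache.`n" `
    + ($error_report -join "`n")

# Collect the parameters in a hash table, then splat them
$mail_params = @{
    To         = "computerhelpdesk@example.com"
    From       = "peoplesoft@example.com"
    Subject    = $current_server + ": cache corruption?"
    Body       = $body
    SmtpServer = "mail.example.com"
}
# Send-MailMessage @mail_params    # uncomment to actually send

$body
```

If you build the body this way, the lines in $error_report do not need their own trailing `n.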

I've used the Windows scheduler to run this script every five minutes. It should spare our application users the aggravation of at least some of those data integrity errors.

The full script


# psversion.ps1
# find PeopleSoft object version warnings in app server logs
# 8 Apr 2013 jthvedt
#
# email parameters
$mail_server = "mail.example.com"
$mail_to = `
    "Help Desk <computerhelpdesk@example.com>"
$mail_from = "peoplesoft@example.com"
#
# read hash table of app servers from file
$app_servers = Import-Clixml app_servers.xml
#
# you can't change a hashtable while iterating through it,
#   so I'll make a copy
$copy_of_app_servers = $app_servers.clone()
#
# path to app server log:
$log_path = "\fscm90\appserv\DBNAME\LOGS\"
#
# today's date in MMdd format
$log_date = get-date -format "MMdd"
#
# loop through app servers, searching for "version"
#     loop through each matching line
#
# notes:
#   $_.line is the line portion of the MatchInfo object 
#       returned by select-string
#
$app_servers.GetEnumerator() | ForEach-Object {
    $current_server = $_.Key
    $previous_error_date = $_.Value
    $error_report = $null
    $error_report = @()
    $filespec = "\\" + $current_server + $log_path + "APPSRV_" `
        + $log_date + ".LOG"
    select-string -path $filespec -pattern "version" `
        | select-string -NotMatch "UOMConversion" `
        | select-string -NotMatch "BrowserVersion" `
        | select-string -NotMatch "CONVERSION_RATE" `
        | select-string -NotMatch "conversionQty" `
            | ForEach-Object {
        $_.line -split "\[" | select-string "^[0-9][0-9]" `
            | ForEach-Object {
            $error_line = $_.line
            $error_date = `
                [datetime]::ParseExact($error_line.substring(0,17),`
                'MM/dd/yy HH:mm:ss',$null)
            if ($error_date -gt $previous_error_date) {
                $copy_of_app_servers.Set_Item($current_server, `
                    $error_date)
                $copy_of_app_servers | Export-Clixml app_servers.xml
                $error_report += ($error_line + "`n")
            }
        }
    }
    if ($error_report.Count -gt 0) {
        Send-MailMessage -To $mail_to `
            -Subject ($current_server + ": cache corruption? ") `
            -Body ("App server " + $current_server + `
                " may have corrupt cache.`n" + $error_report) `
            -From $mail_from `
            -SmtpServer $mail_server
    }
}
