Thursday, October 10, 2013

RecoverPoint Management Application login woes

Here's another off-topic post, written in the hopes that it can save at least one person from hours of frustration.

I was unable to log in to my RecoverPoint management application. I don't know what precipitated it. It worked for many months, until it didn't. It's certainly something to do with Java, but I don't know exactly what.

The application would appear to launch normally (i.e., slowly!), until I got to the point of logging in. I'd enter my credentials, and it would reply with "Failed to connect to RPA (ip address of RPA). Connection Failed".

I was able to SSH into the RPA, so I knew it wasn't a locked account or bad password.

I run mostly in a VM, so I tried my host OS and got the same result. I tried another PC, and was able to get in, so it clearly was something local to my PC.

I futzed around with Java -- checking my version (I was already on the latest), clearing the cache, uninstalling/reinstalling, disabling version 7 in the hopes that 6 would work. Nothing helped. It actually got worse -- on Chrome, my browser of choice, it started to ask every time to upgrade to the latest version of Java, even though I already had it. I couldn't even launch the app. On Firefox, I could at least launch it, but I still couldn't log in.

I was so desperate, I even tried Internet Explorer. Same result.

I started poking around in the Java Control Panel to see if I could find anything. On a whim, I switched my proxy setting from "Use browser settings" to "Use proxy server" and filled in my proxy server details. Mind you, these are the same settings my browser has. So it shouldn't have fixed a thing. But it did. As long as I have specific proxy settings for Java, it works; when I take them out, it doesn't.

Here are the versions of the various parts: Windows XP SP3; RecoverPoint 3.5 SP1; Java 7 update 40.

Wednesday, April 10, 2013

Reporting object version errors in application servers

The problem

We have an occasional problem with what I assume is corrupt cache on our PeopleSoft app servers. As a result, application users get data integrity errors. The error states "The data structure from the page does not match the data structure in the database. This can be caused by a change in the record definition while you were viewing the page."

Sometimes it is just that -- a PeopleTools object is modified while it's in use. PeopleSoft recognizes it and aborts the transaction, as it should.

More often, though, I've found that the error is bogus, and no such change happened. Clearing the cache on the app server(s) in question will avoid further errors. This has a performance impact. Call me crazy, but I like having transactions actually complete, if a little more slowly than normal, rather than having them abort. The app server rebuilds its cache, and soon all is back to normal.

These data integrity errors show up in the app server logs. Here's an example:

PSAPPSRV.3460 (3110) [04/10/13 12:53:11 JSMITH@WXP (IE 7.0; WINXP) ICPanel](0) Deserialization integrity failure:  record SOME_TABLE version was 6514, now 6526
PSAPPSRV.5092 (1570) [04/10/13 12:53:18 JSMITH@WXP (IE 7.0; WINXP) ICPanel](0) Deserialization integrity failure:  record SOME_TABLE version was 6526, now 6514

Notice that the version number changed from 6514 to 6526, then back again 7 seconds later! This is not normal behavior.

Once these errors show up in the app server log, that user has already seen the error, and has had a transaction abort. However, it's not too late to save other users from the same fate, if I know a particular app server has a cache problem.

Our PeopleSoft app servers are all running Windows. PowerShell seems like a good way to monitor the log files and alert me when we have these errors.

This script demonstrates the following features of PowerShell:

  • hash tables
  • arrays
  • looping with ForEach-Object
  • searching for text with Select-String
  • reading and writing an XML file
  • sending email 


As always, I'm sure I'm doing some of this the hard way, and I invite PeopleSoft and PowerShell folks to suggest improvements. Also, if anybody knows how to make Alex Gorbatchev's Syntax Highlighter do horizontal scrolling on Blogger, please let me know.

I'll go through the interesting sections of the script, then show the whole thing at the end of this post.

The plan

Our PS_HOME folders already have network shares set up on all of the servers, so I decided to run just one monitoring script on another server to check all of them.

In order to avoid a flood of email alerts, I wanted a way to keep track of when each app server had its last error, and only send out email when new errors show up. PeopleSoft app server logs start fresh each day, and I don't want to be pestered all day about an error that happened one minute after midnight.

I chose to use a hash table to keep track of each server's most recent error. This is a key-value or dictionary data structure. The name of each server will be my key, and the timestamp will be the value.

For persistent storage, I chose XML. PowerShell has cmdlets to import and export data to XML files, which makes saving and retrieving my hash table simple.

First, I'll create the hash table with sufficiently old timestamps, and write it to a file. I only need to do this once, unless I need to add more servers to my list.

$foo = get-date
$foo = $foo.AddDays(-1000)
$app_servers = @{"Pete" = $foo; "Roger" = $foo; `
    "Keith" = $foo; "John" = $foo}
$app_servers | Export-CliXML app_servers.xml

This creates a short list of my servers, each with a timestamp of today's date minus 1,000 days. Any new error I find in the logs will update the timestamp.
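As a quick sanity check, you can read the file right back and confirm the round trip worked (a hypothetical interactive session; "Pete" is one of the example server names from the snippet above):

```powershell
# Read the saved hash table back and spot-check it.
$check = Import-CliXml app_servers.xml
$check.Keys        # lists the four server names
$check["Pete"]     # a [datetime] roughly 1,000 days in the past
```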

The script

Now on to the actual monitoring. After initializing a few variables, I read my hash table in from my XML file:

$app_servers = Import-Clixml app_servers.xml

Now, it turns out that you can't modify a hash table while iterating through it, so I make a copy of it:

$copy_of_app_servers = $app_servers.clone()

The location of a PeopleSoft app server log on Windows is $PS_HOME\appserv\DBNAME\LOGS\APPSRV_MMDD.LOG, with MMDD being today's month and day. I derive the MMDD portion of that file name:

$log_date = get-date -format "MMdd"

Now I'm ready to loop through each server, using PowerShell's ForEach-Object syntax. To do this with a hash table, you need to use the .GetEnumerator() method. Once I'm looping through, the way to refer to the current object is $_. Each component of the object is addressed with dot notation; $_.Key for my server names, and $_.Value for my date stamps. I copy these values to string variables, and I build the complete path to the server log. $log_path has previously been initialized to the standard path for our particular version of PeopleTools. I also set up an array, $error_report, that will hold the lines of the log that contain the version errors.

$app_servers.GetEnumerator() | ForEach-Object {
    $current_server = $_.Key
    $previous_error_date = $_.Value
    $error_report = $null
    $error_report = @()
    $filespec = "\\" + $current_server + $log_path `
        + "APPSRV_" + $log_date + ".LOG"

Now I'm ready to read through the log file, looking for the word "version" using the cmdlet select-string. This is roughly the equivalent of grep.  "Version" also shows up harmlessly in other places in the log, e.g., as "BrowserVersion." I ignore these by repeatedly piping through select-string with the -NotMatch option.

select-string -path $filespec -pattern "version" `
    | select-string -NotMatch "UOMConversion" `
    | select-string -NotMatch "BrowserVersion" `
    | select-string -NotMatch "CONVERSION_RATE" `
    | select-string -NotMatch "conversionQty"

By the way, the backtick is PowerShell's line continuation character. I can thus make a very long pipeline of select-string cmdlets much more readable.
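As an aside, the chain of -NotMatch calls could probably be collapsed into a single Where-Object with an alternation regex. This is an untested sketch, not what the script below actually does:

```powershell
# One case-insensitive regex with alternation, instead of four piped
# select-string -NotMatch calls. -notmatch, like select-string, ignores case.
select-string -path $filespec -pattern "version" |
    Where-Object { $_.Line -notmatch "UOMConversion|BrowserVersion|CONVERSION_RATE|conversionQty" }
```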

Select-string, unlike grep, returns an object, not just text. The actual text is a component of that object, named "line." It has more text in it than I need, so I discard much of it. If you look back at the example errors I listed above, you'll see that the timestamp is preceded by a square bracket. I use the -split operator to chop each line in two, and then use select-string with a regular expression to keep only the piece that starts with a date.

$_.line -split "\[" | select-string "^[0-9][0-9]"

This leaves me with just the portions of the lines starting with the timestamp, e.g., "04/10/13 12:53:11 JSMITH..." I process each of these lines, extracting the timestamp for my copy of the hash table. If the timestamp is newer than anything I've seen for this server, I write the hash table to my XML file, and add the offending log line (with a newline `n at the end for readability) to my array of errors.

$_.line -split "\[" | select-string "^[0-9][0-9]" `
    | ForEach-Object {
    $error_line = $_.line
    $error_date = `
        [datetime]::ParseExact($error_line.substring(0,17),`
            'MM/dd/yy HH:mm:ss',$null)
    if ($error_date -gt $previous_error_date) {
        $copy_of_app_servers.Set_Item($current_server, `
            $error_date)
        $copy_of_app_servers | Export-Clixml app_servers.xml
        $error_report += ($error_line + "`n")

Once I'm done with the log for each server, I send an email if there were any errors, i.e., if $error_report contains anything:

if ($error_report.Count -gt 0) {
    Send-MailMessage -To $mail_to `
        -Subject ($current_server + ": cache corruption? ") `
        -Body ("App server " + $current_server + `
            " may have corrupt cache.`n" `
            + $error_report) `
        -From $mail_from `
        -SmtpServer $mail_server}

I've used the Windows scheduler to run this script every five minutes. It should shield our application users from the aggravation of at least some of those data integrity errors.
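For reference, a scheduled task along these lines can be created from the command line with schtasks (the task name and script path here are placeholders, not my actual setup):

```
schtasks /create /tn "PS version monitor" /sc minute /mo 5 /tr "powershell.exe -NoProfile -File C:\scripts\psversion.ps1"
```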

The full script


# psversion.ps1
# find PeopleSoft object version warnings in app server logs
# 8 Apr 2013 jthvedt
#
# email parameters
$mail_server = "mail.example.com"
$mail_to = `
    "Help Desk <computerhelpdesk@example.com>"
$mail_from = "peoplesoft@example.com"
#
# read hash table of app servers from file
$app_servers = Import-Clixml app_servers.xml
#
# you can't change a hashtable while iterating through it,
#   so I'll make a copy
$copy_of_app_servers = $app_servers.clone()
#
# path to app server log:
$log_path = "\fscm90\appserv\DBNAME\LOGS\"
#
# today's date in MMdd format
$log_date = get-date -format "MMdd"
#
# loop through app servers, searching for "version"
#     loop through each matching line
#
# notes:
#   $_.line is the line portion of the MatchInfo object 
#       returned by select-string
#
$app_servers.GetEnumerator() | ForEach-Object {
    $current_server = $_.Key
    $previous_error_date = $_.Value
    $error_report = $null
    $error_report = @()
    $filespec = "\\" + $current_server + $log_path + "APPSRV_" `
        + $log_date + ".LOG"
    select-string -path $filespec -pattern "version" `
        | select-string -NotMatch "UOMConversion" `
        | select-string -NotMatch "BrowserVersion" `
        | select-string -NotMatch "CONVERSION_RATE" `
        | select-string -NotMatch "conversionQty" `
            | ForEach-Object {
        $_.line -split "\[" | select-string "^[0-9][0-9]" `
            | ForEach-Object {
            $error_line = $_.line
            $error_date = `
                [datetime]::ParseExact($error_line.substring(0,17),`
                'MM/dd/yy HH:mm:ss',$null)
            if ($error_date -gt $previous_error_date) {
                $copy_of_app_servers.Set_Item($current_server, `
                    $error_date)
                $copy_of_app_servers | Export-Clixml app_servers.xml
                $error_report += ($error_line + "`n")
            }
        }
    }
    if ($error_report.Count -gt 0) {
        Send-MailMessage -To $mail_to `
            -Subject ($current_server + ": cache corruption? ") `
            -Body ("App server " + $current_server + `
                " may have corrupt cache.`n" + $error_report) `
            -From $mail_from `
            -SmtpServer $mail_server
    }
}

Thursday, March 3, 2011

PowerShell 2.0, finally

Today on my main work PC I finally moved to Windows XP Service Pack 3, which means I can run PowerShell 2.0. Will that mean I'll get moving on this PS4PS thing? Alas, probably not. But at least I've knocked down one small roadblock.

Friday, December 4, 2009

PowerShell version 2.0

PowerShell version 2.0 has been out for a while. Microsoft doesn't make it all that obvious, but it seems you can't just get PowerShell by itself.

Also, for Windows XP, you have to be on Service Pack 3, which, alas, I am not. My PeopleSoft servers are running Windows Server 2003, so I can get PowerShell v2 for them.

Anyway, you can download it here. It wasn't all that easy to find. I looked at the MSFT PowerShell Blog, and saw a "click here to download PowerShell" link right at the top. Good! So I follow the link, and does it get me to the download page? Nope. It's a list of links, none of which says anything about the final release of PowerShell 2.0. There are earlier versions, and some test releases of 2.0. There's one link labeled "latest release," but that's to Windows Management Framework, not PowerShell. Little did I know, WMF includes PowerShell. On a whim, I followed that link, and finally found myself on a download page.

Even then, you have to scroll about two-thirds of the way down. Hey, Microsoft! If you want people to use this stuff, make it less confusing. Granted, PowerShell is not exactly aimed at the easily confused, but still.

Oracle index compression and PeopleTools

[blog note: PowerShell has dropped to the bottom of my priority list for now. I hope to get back to it eventually. In the meantime, here's a PeopleSoft-related post that has nothing to do with PowerShell.]

I've been having great success with using Oracle's index compression with our PeopleSoft database. It saves a lot of space, and speeds up index range scans. The problem is that it's not supported by PeopleTools.

It's not appropriate for every index, but that's beyond the scope of this discussion. For some background on Oracle index compression, Jonathan Lewis describes the basics here, and Richard Foote goes into detail here, here, here, and here.

When an index is built or rebuilt with the PeopleTools App Designer, it uses the DDL [data definition language] model for your particular platform. In the case of Oracle, you can have default parameters like initial and max extents that you can override per index. Compression isn't one of those parameters, so I've been compressing selected indexes outside of PeopleTools. Usually, I'll just do an "alter index ps_some_huge_table rebuild compress X" with X being some number of columns.

This is not at all optimal. It's not reflected in the PeopleTools metadata, so if anybody does a build with PeopleTools, the compression is lost. In the case of an application upgrade, this is likely to undo compression on a large number of indexes.

What I'd like to do is modify the DDL model SQL, have the default be no compression, and override the compression for those indexes that I've already compressed. For future index compression, I can just use PeopleTools from the get-go.

You can change the DDL model SQL in PeopleTools/Utilities/Administration/DDL Model Defaults.

Here's the problem I've run into: the delivered DDL Model SQL for a create index command for Oracle looks like this, with anything in double-asterisks being a parameter that's either defaulted or overridden:

CREATE [UNIQUE] **BITMAP** INDEX [IDXNAME] ON [TBNAME] ([IDXCOLLIST]) TABLESPACE **INDEXSPC** STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PARALLEL NOLOGGING;


I've modified it to look like this:

CREATE [UNIQUE] **BITMAP** INDEX [IDXNAME] ON [TBNAME] ([IDXCOLLIST]) TABLESPACE **INDEXSPC** STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PARALLEL NOLOGGING COMPRESS **COMPRESS**;


and set up an additional parameter COMPRESS with a default value of 0. I then go into a record definition for some table with an index I want to compress and override that default value with 3. I generate a CREATE INDEX script, and everything looks good -- the index with the overridden COMPRESS parameter says COMPRESS 3, and the other indexes say COMPRESS 0.

Alas, COMPRESS 0 is not legal syntax, at least in Oracle 10.2. The syntax for no compression is either NOCOMPRESS or just no compress clause at all. I can't have the default be blank, since COMPRESS on its own with no integer means compress every column in a non-unique index and compress all but the last in a unique index.

As I'm typing this, another idea occurs to me. Back in a bit... [type type type test test test...]

OK! That's better. Instead of using the parameter just as the number of columns, with the default being 0, I made the default value "NOCOMPRESS" and the override now has to be "COMPRESS X" with X being some positive integer. I no longer include the keyword "COMPRESS" in the model DDL; I just include the parameter value itself. So now the model SQL is this:

CREATE [UNIQUE] **BITMAP** INDEX [IDXNAME] ON [TBNAME] ([IDXCOLLIST]) TABLESPACE **INDEXSPC** STORAGE (INITIAL **INIT** NEXT **NEXT** MAXEXTENTS **MAXEXT** PCTINCREASE **PCT**) PCTFREE **PCTFREE** PARALLEL NOLOGGING **COMPRESS**;


One oddity: I had to exit and restart the app designer to get it to use the new default value.

I have it only in a development database. I'll have to think, read, and tinker a bit before I put it in production. Upgrades are by far my biggest worry.

UPDATE: I heard from a couple of PeopleSoft experts, by email and in the comments, and there's an even simpler way to achieve this. It's based on the fact that PeopleTools does no validation on the model SQL or parameters. One can just piggyback the compress syntax onto one of the existing parameters. Says David Kurtz:
I tend to put the COMPRESS keyword and value into the PCTFREE DDL override. These are just strings that are used to build up the DDL command. The PCTFREE override defaults to "10". I might, for example, set it to "10 COMPRESS 1".
The advantage is that when upgrade time comes, and the DDL script runs (in this case, DDLORA.DMS), nothing is lost. I don't have to remember to redo the custom model SQL.
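To make that concrete, with the PCTFREE override set to "10 COMPRESS 1" the delivered model SQL would expand to something like this (the index name, column list, and storage values are made up for illustration):

```sql
-- Generated from the delivered Oracle DDL model, with the PCTFREE
-- parameter overridden to "10 COMPRESS 1" for this one index.
CREATE UNIQUE INDEX PS_SOME_HUGE_TABLE ON PS_SOME_HUGE_TABLE (EMPLID, EFFDT)
    TABLESPACE PSINDEX
    STORAGE (INITIAL 40000 NEXT 100000 MAXEXTENTS UNLIMITED PCTINCREASE 0)
    PCTFREE 10 COMPRESS 1 PARALLEL NOLOGGING;
```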

Thursday, May 28, 2009

Zombie Crystal Reports, part 2

See part 1.

I am hunting for long-running Crystal Reports on a PeopleSoft report server. Obviously there are a number of ways to see these, e.g., the PeopleSoft process monitor or Oracle data dictionary v$ views. I need a way to see these at the Windows process level, so I can kill them.

The PowerShell way to get process information is get-process. How did I learn this? Well, I googled it. That's a limitation of the command line: it's not all that discoverable. In contrast to a GUI, it's often hard to just poke around the tool itself to find stuff out. This page at Microsoft shows a number of things you can do with get-process.

Aside: get-process is a cmdlet (pronounced "command-let"); cmdlets are the basic building blocks of PowerShell.

By default, it shows all processes, and eight columns of attributes, none of which has anything to do with start time:


PS C:\temp\ps1> get-process

Handles NPM(K) PM(K)  WS(K) VM(M)  CPU(s)    Id ProcessName
------- ------ -----  ----- -----  ------    -- -----------
88       3     2124    4300    39    5.39   308 Avconsol
86       3     2248    4096    44   25.66  1280 Avsynmgr
89       3     1916    3552   421    1.58  3468 bash
65       4    19216   18972    74    4.23   200 chrome
713     15    42024   56864   138  481.77  1372 chrome
61       3     8948    2448    60    0.80  1700 chrome
61       5    17168    3148    73    4.50  1916 chrome
61       3    11548    4684    65   10.64  2056 chrome
62       3    54004   15736   121   13.83  2228 chrome
...

Yes, I have a lot of Chrome tabs open. You don't know the half of it...

To find out what other attributes get-process can report, pipe it to cmdlet get-member:

PS C:\temp\ps1> get-process | get-member

TypeName: System.Diagnostics.Process

Name          MemberType     Definition
----          ----------     ----------
Handles       AliasProperty  Handles = Handlecount
Name          AliasProperty  Name = ProcessName
NPM           AliasProperty  NPM = NonpagedSystemMemorySize
PM            AliasProperty  PM = PagedMemorySize
VM            AliasProperty  VM = VirtualMemorySize
WS            AliasProperty  WS = WorkingSet
add_Disposed  Method         System.Void add_Disposed(EventHandler value)
...

It goes on and on and on from there, eventually showing us this:

StartTime  Property  System.DateTime StartTime {get;}


Get used to "get-member." I know I'll be using it all the time.

To see that property, pipe it to the cmdlet Select-Object. We'll also send it a process name with a wildcard to just see PeopleSoft-related processes:


PS C:\temp\ps1> get-process ps* | select-object Id, ProcessName, StartTime

Id   ProcessName   StartTime
--   -----------   ---------
3484 psadmin       3/26/2009 8:48:36 AM
2024 psaesrv       5/27/2009 5:41:20 PM
2328 psaesrv       5/26/2009 5:58:13 PM
3116 psaesrv       5/19/2009 12:23:18 PM
3344 psdstsrv      5/28/2009 3:32:49 AM
3472 psdstsrv      4/2/2009 6:05:18 AM
2344 psmonitorsrv  5/28/2009 6:05:17 AM
3132 psprcsrv      4/2/2009 6:05:19 AM
3388 pssqr         5/28/2009 1:13:50 PM

Here we can see some App Engine servers, various report server processes like distribution servers, and an SQR. No Crystal Reports at all. I was hoping for a zombie so we could kill it. Another day.

So we can report on the start time, but how do we filter on it? Let's say we want to see everything started in the last two hours:


PS C:\temp\ps1> $now = Get-Date
PS C:\temp\ps1> $then = $now.AddSeconds(-7200)
PS C:\temp\ps1> get-process | where-object {$_.StartTime -gt $then}

Handles  NPM(K) PM(K)  WS(K) VM(M)  CPU(s)    Id ProcessName
-------  ------ -----  ----- -----  ------    -- -----------
31       1     1888     2608    12    0.03  2480 pssqr
79       4     5220    10056    44    0.06  1624 sqlplus
78       5    12816    16828    66    0.20  2576 sqrw



Grab the current time with Get-Date, calculate a time two hours ago (7,200 seconds = 2 hours), and, via the cmdlet Where-Object, show all processes started after that. Here we see that the report server has recently launched an SQR and a SQL*Plus script.

I could easily have asked for just the Crystal Reports, though since there weren't any running, it's a good thing I didn't.
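Looking ahead, the filtering above points straight at a kill command. Here's a sketch of what that might look like; pscrrun is the Crystal process name mentioned elsewhere in this series, and the -WhatIf switch previews what would be stopped without actually stopping anything:

```powershell
# Find Crystal (pscrrun) processes older than one hour and preview killing them.
$then = (Get-Date).AddSeconds(-3600)
get-process pscrrun -ErrorAction SilentlyContinue |
    where-object { $_.StartTime -lt $then } |
    stop-process -WhatIf    # drop -WhatIf to actually kill them
```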

In part three I may finally get around to killing the zombies.

Wednesday, May 27, 2009

Zombie Crystal Reports, part 1

I've struggled with Crystal Reports and the PeopleSoft report server over the years. Back when we were on PeopleTools 8.42, it took me several weeks to get them running smoothly. I had to carefully limit the number that would run simultaneously on any server, or they'd lock up on us. It took a lot of experimentation to find the right limit that would minimize zombie reports, yet still give us the throughput we required. I ended up adding another server just to handle the load.

Another thing I did, which was an ugly but effective hack, was to run a VBScript that would find and kill any Crystal that had been running for more than an hour. I know nothing about VBScript, but I found a generic process-killing script somewhere and modified it for my purposes. I scheduled it to run periodically on each report server, and it did what it was supposed to do.

On our last PeopleSoft upgrade, we deployed all new Windows servers, including the report servers. I didn't put those VBScripts on the new servers, and so far on PeopleTools 8.49, I haven't seen many Crystal zombies.

On Tuesday of this week, I found a zombie that had been running over the three-day holiday weekend. I'd received complaints about Crystals sitting in queue for a long time. Part of it was just a tremendous number of report requests, but that one stuck report was apparently taking up a slot in the report server queue, so we were running one fewer report at a time than we normally do.

The PeopleSoft process request table (PSPRCSRQST) and the Oracle V$SESSION view showed me the thing had been running for a few days, but the Windows task manager showed three pscrrun processes. The task manager doesn't show a start time. Which was the zombie?

Well, one clue was that one of them was using a crazy amount of memory -- I believe it was close to 1 GB. I was fairly sure that was the culprit, but to be extra sure, I put that report server in "suspend" mode, which completes any current processes, but doesn't start any new ones. Sure enough, the other two Crystals finished within a minute. I killed the zombie using the Windows task manager, and un-suspended the report server.

Had I scheduled that VBScript, that Crystal would have been long gone. I considered doing just that, but it seemed like a good opportunity to learn a little PowerShell.

Next, in part 2: finding long-running reports.