OSEC

From: AIX Service Mail Server (aixserv_at_austin.ibm.com)
Date: Tue Aug 06 2002 - 02:43:10 CDT

    APAR: IY24116 COMPID: 5765E6900 REL: 310
    ABSTRACT: FUTURE AVAILABILITY OF LOADLEVELER CHECKPOINT/RESTART FOR 32 BIT

    PROBLEM DESCRIPTION:
    future availability of loadleveler checkpoint/restart for
    32 bit application information.

    PROBLEM SUMMARY:
    availability of loadleveler checkpoint/restart
    for 32 bit application information.

    ------

    APAR: IY24117 COMPID: 5765E6900 REL: 310
    ABSTRACT: FUTURE AVAILABILITY OF LOADLEVELER CHECKPOINT/RESTART FOR

    PROBLEM DESCRIPTION:
    future availability of loadleveler checkpoint/restart for 64 bit
    application information.

    PROBLEM SUMMARY:
    availability of loadleveler checkpoint/
    restart for 64 bit application information.

    ------

    APAR: IY28091 COMPID: 5765D5100 REL: 320
    ABSTRACT: SPGETDESC SUPPORT OF WINTERHAWK2 450MHZ

    PROBLEM DESCRIPTION:
    The WinterHawk2 450MHz needs to be added to the spgetdesc
    command.

    PROBLEM SUMMARY:
    /usr/lpp/ssp/bin/spgetdesc did not describe the
    processor speed of the Winterhawk II nodes.
    For Winterhawk II nodes, both thin and wide,
    currently the description that is returned is:
    spgetdesc: Node 5 (c183n05.ppd.pok.ibm.com)
    is a 375_MHz_POWER3_SMP_Thin
    The description has been updated to return:
    375/450_MHz_POWER3_SMP_Thin - for thin nodes
    375/450_MHz_POWER3_SMP_Wide - for wide nodes

    PROBLEM CONCLUSION:
    In the spgetdesc script, the definition table for
    a Winterhawk II node was changed to read
    375/450_MHz_POWER3_SMP_Thin
    OR
    375/450_MHz_POWER3_SMP_Wide

    ------

    APAR: IY29066 COMPID: 5765E6800 REL: 300
    ABSTRACT: README.PERFAGENT FOR THE SERVER NEEDS TO BE UPDATED

    PROBLEM DESCRIPTION:
    bos.perf.tools README needed.

    PROBLEM SUMMARY:
    The readme text for Solaris & HP support
    needs to be removed from the perfagent.server README file.

    PROBLEM CONCLUSION:
    Removed the readme text that describes xmservd's
    Solaris & HP support, and added a note that HP & Solaris
    are no longer supported.

    ------

    APAR: IY29412 COMPID: 5765E7400 REL: 300
    ABSTRACT: AZIZO DATA COLLECTION 256 METRIC LIMIT

    PROBLEM DESCRIPTION:
    ptxtab provides an option to generate a single ASCII file.

    PROBLEM CONCLUSION:
    The -u flag causes ptxtab to ignore the maximum number of metrics
    in each statset and format a single output file as comma-separated
    ASCII.

    ------

    APAR: IY29453 COMPID: 5765E6800 REL: 300
    ABSTRACT: PTXSPLIT -F OPTION PROCESSES NO MORE THAN 256 METRICS

    PROBLEM DESCRIPTION:
    ptxsplit -f option processes no more than 256 metrics

    PROBLEM SUMMARY:
    ptxsplit -f option can process more than 256 metrics.

    PROBLEM CONCLUSION:
    ptxsplit -u option reconstructs statset, so it can process
    more than 256 metrics.

    ------

    APAR: IY29544 COMPID: 5765E6900 REL: 310
    ABSTRACT: GANG PREEMPTION PROBLEM

    PROBLEM DESCRIPTION:
    Problem:
    When Schedd goes down, jobs in preempted state remain in the
    queue in 'EP' state for indefinite time, even after the Schedd
    has restarted.

    PROBLEM SUMMARY:
    Jobs remain in EP (preempt pending) indefinitely.

    PROBLEM CONCLUSION:
    Jobs need to be running before being preempted.

    ------

    APAR: IY29802 COMPID: 5765C3403 REL: 430
    ABSTRACT: ENSCRIPT DOESN'T SUPPORT HEXA CHARMETRICS IN .AFM FILES

    PROBLEM DESCRIPTION:
    enscript fails with the following error:
    enscript: 1007-603 The AFM file /usr/lib/ps/<font>.afm has a
    line which is not formatted correctly.
    This occurs when the AFM file contains hexadecimal character
    metrics in its StartCharMetrics/EndCharMetrics section.
    A hexadecimal character metric typically looks like this:
    CH 816D ; WX 250 ; N space ; B 0 0 0 0 ;

    PROBLEM SUMMARY:
    enscript and afmdit.awk do not support hexadecimal
    character metrics in AFM files.

    PROBLEM CONCLUSION:
    enscript and afmdit.awk enhanced to support
    CharMetrics of type:
    CH 814E ; WX 1000 ; N chars ; B 334 703 665 805 ;
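
    The two line formats can be told apart by their first key. The
    following is a minimal sketch, not the enscript or afmdit.awk code:
    it assumes that a C key carries a decimal character code and a CH
    key a hexadecimal one, and the function name parse_char_metric and
    the dictionary keys are invented for illustration.

```python
def parse_char_metric(line):
    """Parse one AFM character-metric line into a dict."""
    metric = {}
    for field in line.split(";"):
        tokens = field.split()
        if not tokens:
            continue  # skip the empty piece after the trailing ';'
        key, args = tokens[0], tokens[1:]
        if key == "C":
            metric["code"] = int(args[0])  # decimal character code
        elif key == "CH":
            # hexadecimal character code; tolerate optional <...> brackets
            metric["code"] = int(args[0].strip("<>"), 16)
        elif key == "WX":
            metric["width"] = float(args[0])
        elif key == "N":
            metric["name"] = args[0]
        elif key == "B":
            metric["bbox"] = tuple(float(a) for a in args)
    return metric


hexa = parse_char_metric("CH 816D ; WX 250 ; N space ; B 0 0 0 0 ;")
print(hex(hexa["code"]), hexa["name"])
```

    A parser that only accepts the C key would reject the CH line above,
    which is consistent with the 1007-603 message in the description.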

    ------

    APAR: IY29901 COMPID: 5765B8100 REL: 220
    ABSTRACT: SAVEDT USES UNINITIALISED VARIABLE IN SOME CLEANUP ROUTINES.

    PROBLEM DESCRIPTION:
    The saveDT script can report an error when attempting to
    clean up after a failure.
    The error is reported only if you select to tar to tape.

    LOCAL FIX:
    The problem will occur when some other failure has occurred on
    the system. Check and correct the previous failure.

    PROBLEM SUMMARY:
    The saveDT script can report an error when
    attempting to clean up after a failure.
    The error is reported only if you select to tar to tape.

    PROBLEM CONCLUSION:
    The script now checks that the appropriate environment
    variables are set before trying to use them.
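
    The idea of checking a variable before using it can be sketched as
    follows. saveDT itself is a script whose variables are not listed
    in this APAR, so SAVEDT_TMPDIR and SAVEDT_TAPE below are
    hypothetical names chosen only for illustration.

```python
import os


def cleanup_targets(env=os.environ):
    """Return the paths that are safe to clean up.

    SAVEDT_TMPDIR and SAVEDT_TAPE are hypothetical names used only
    for illustration; the real saveDT variables are not given here.
    """
    targets = []
    for name in ("SAVEDT_TMPDIR", "SAVEDT_TAPE"):
        value = env.get(name)
        if value:  # skip variables that are unset or empty
            targets.append(value)
    return targets


print(cleanup_targets({"SAVEDT_TMPDIR": "/tmp/savedt"}))
```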

    ------

    APAR: IY30258 COMPID: 5765E6110 REL: 220
    ABSTRACT: REQUIRED MAINTENANCE UPGRADE

    PROBLEM DESCRIPTION:
    required maintenance upgrade

    ------

    APAR: IY30437 COMPID: 5765D5100 REL: 320
    ABSTRACT: BAD DMA WRITE FOR KLAPI 0-COPY MSG

    PROBLEM DESCRIPTION:
    This problem is caused by cleaning up a HAL DMA handle while
    a post of the message is still possible.

    PROBLEM SUMMARY:
    There is a time hole where a DMA buffer may
    remain posted after a message is marked as complete to the user.
    This leaves the possibility of data corruption or, in the case
    of a Regatta, a system check stop.

    PROBLEM CONCLUSION:
    All outstanding DMA buffers are cancelled
    before a buffer is marked as complete to the user.

    ------

    APAR: IY30443 COMPID: 5765E7400 REL: 300
    ABSTRACT: GREY OUT USELESS VIEWS

    PROBLEM DESCRIPTION:
    views are greyed-out based on the amount of data recording
    files have.

    PROBLEM CONCLUSION:
    Views are greyed out based on the amount of data the recording
    files have. For example, if only 1 day's worth of data is
    available, there is no need to offer the year-by-month view.

    ------

    APAR: IY30645 COMPID: 5765C3403 REL: 430
    ABSTRACT: PPS FAIL TO SYNC ON DISK WITH > 2^15 PPS

    PROBLEM DESCRIPTION:
    Customer may be unable to sync all partitions on logical
    volumes on a disk with greater than 2^15 physical partitions.

    PROBLEM SUMMARY:
    syncvg fails to sync all partitions on disk
    with greater than 2^15 pps.

    PROBLEM CONCLUSION:
    change variable types in LVM kernel config
    routine to ensure proper handling of large
    pp numbers.

    TEMPORARY FIX:
    Reduce factor of VG, thereby decreasing the
    number of PPs per PV.
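
    The APAR does not say which variable types were changed. A
    threshold of 2^15 is what one would expect if a partition number
    was held in a signed 16-bit integer, and the sketch below
    illustrates that assumption only. The helper name store_pp_number
    is invented, and ctypes is used here just to mimic fixed-width C
    integers.

```python
import ctypes

PP_COUNT = 2 ** 15  # 32768 physical partitions, the threshold named in the APAR


def store_pp_number(pp, ctype):
    """Store pp in a fixed-width C integer and return what is read back."""
    return ctype(pp).value


narrow = store_pp_number(PP_COUNT, ctypes.c_int16)  # assumed old width
wide = store_pp_number(PP_COUNT, ctypes.c_int32)    # assumed wider type
print(narrow, wide)
```

    With the 16-bit type the value wraps to a negative number, while the
    32-bit type keeps it intact, so any loop or comparison driven by the
    narrow value would stop short of the last partitions.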

    ------

    APAR: IY30864 COMPID: 5765E5400 REL: 440
    ABSTRACT: HAS,HAES: EXTREMELY LONG FALLOVER TIMES DUE TO IMFS COMMANDS

    PROBLEM DESCRIPTION:
    The customer was testing some fallovers and found that they
    were taking an extremely long time. It was found that the
    long time was due to imfs commands issued during the takeover
    of the vg's. The customer was using bigvgs.

    PROBLEM SUMMARY:
    HACMP will always run imfs after varying on a volume group.
    This is unnecessary for bigvgs.

    PROBLEM CONCLUSION:
    Test to see if the volume group is a bigvg, and, if so, skip
    the imfs.

    ------

    APAR: IY31246 COMPID: 5765D5100 REL: 340
    ABSTRACT: RC.SP SETS THE WRONG BOOTLIST IF TOTAL BOOTDISKS NOT

    PROBLEM DESCRIPTION:
    rc.sp sets the wrong bootlist if total bootdisks
    not equivalent to total install disks

    PROBLEM SUMMARY:
    On the reboot of a node, the bootlist was being reset to
    include all of the physical volumes listed for the selected
    volume group of the node. Even the physical volumes that
    did not contain boot logical volumes were included in
    the bootlist. If there was a high number of physical
    volumes it could cause a subsequent reboot to fail.

    PROBLEM CONCLUSION:
    spboot, which is called by /etc/rc.sp, was modified to only
    set the bootlist to physical volumes that contain boot
    logical volumes.
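
    The selection rule in the conclusion amounts to a filter over the
    physical volumes of the volume group. This is a minimal sketch, not
    the spboot code: the dictionary layout, the "boot" label, and the
    hdisk names are assumptions made for illustration.

```python
def bootable_pvs(volume_group):
    """Return only the physical volumes that hold a boot logical volume.

    volume_group maps a physical volume name to the list of logical
    volume types on it. The layout and the "boot" label are
    illustrative assumptions, not spboot's actual data structures.
    """
    return [pv for pv, lv_types in volume_group.items() if "boot" in lv_types]


rootvg = {
    "hdisk0": ["boot", "jfs", "paging"],
    "hdisk1": ["jfs"],
    "hdisk2": ["boot", "jfs"],
}
print(bootable_pvs(rootvg))
```

    Before the fix, the equivalent of list(rootvg) would have been used,
    so hdisk1 would have ended up in the bootlist as well.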

    ------

    APAR: IY31307 COMPID: 5765B9500 REL: 150
    ABSTRACT: PROBLEM IN CONVERTING FROM MMSDRFS TO MMSDRFS2 WHEN MIGRATING

    PROBLEM DESCRIPTION:
    After migration from gpfs-1.2 to gpfs-1.5, customer was unable
    to start daemon because of the following errors ...
    mmcommon: Invalid keyword: getNodeSDRdata
    mmremote: Unexpected error from getLock: convertFwd. Return code
    mmremote: 6027-1571 Unexpected failure executing sysctl -h cws1
    Check the preceding messages if any. Check /etc/sysctl.mmcmd.ac
    restart sysctl on cws1, or run kinit on this node.
    From the error messages this appeared to be sysctl related.
    However, the actual problem was that the mmcommon routine was
    being called with a function name that had changed in the
    gpfs-1.5 version of the file, hence the "Invalid keyword".

    PROBLEM SUMMARY:
    Allow getNodeSDRdata to be called directly. Needed when
    migrating from rel 1.1 or 1.2.

    PROBLEM CONCLUSION:
    Fixed migration path from release 1.1 and 1.2

    ------

    APAR: IY31350 COMPID: 5765D5100 REL: 320
    ABSTRACT: SPCW_DEFER_NTP SHOULD CALL /USR/SBIN/NTPDATE, NOT $SSP_BIN/NTPDA

    PROBLEM DESCRIPTION:
    The PSSP Version of xntpd is no longer used in PSSP 3.2. PSSP
    systems, 3.2 and higher, must use the AIX version of ntpdate,
    which is /usr/sbin/ntpdate.
    The script /usr/sbin/hacws/spcw_defer_ntp uses $SSP_BIN/ntpdate,
    which is incorrect. See line 101. Result is that xntpd does
    not start.

    LOCAL FIX:
    As a workaround, customer can use a symbolic link:
        ln -s /usr/sbin/ntpdate /usr/lpp/ssp/bin/ntpdate
    OR, edit line 101 in /usr/sbin/hacws/spcw_defer_ntp
    and replace $SSP_BIN/ntpdate with /usr/sbin/ntpdate.

    PROBLEM SUMMARY:
    Effective with PSSP 3.2, ntp is no longer shipped with PSSP.
    The AIX version of ntp should be used. spcw_defer_ntp
    is calling /usr/lpp/ssp/bin/ntpdate when it should be
    calling /usr/sbin/ntpdate.

    PROBLEM CONCLUSION:
    spcw_defer_ntp has been modified to call /usr/sbin/ntpdate
    instead of /usr/lpp/ssp/bin/ntpdate.

    ------

    APAR: IY31381 COMPID: 5765B9500 REL: 150
    ABSTRACT: GPFS:6027-848 CONFIG MANAGER 35 FAILED UPDATING NEW NODE STATUS

    PROBLEM DESCRIPTION:
    gpfs:6027-848 config manager 35 failed updating new node status

    PROBLEM SUMMARY:
    Fixed sysctl locking condition with mmconfig.

    PROBLEM CONCLUSION:
    In the sp environment, do not use the output of hostname as
    a lock identifier. If hostname on a node is set to be the
    same as the switch adapter name, locks cannot be reclaimed
    (sysctl cannot talk to the node).

    ------

    APAR: IY31445 COMPID: 5765D9300 REL: 320
    ABSTRACT: ADDING PROFILE PROBES TO A LARGE APP CAUSED SESMGR CORE DUMP

    PROBLEM DESCRIPTION:
    I am not getting profile output from my [large] job

    PROBLEM SUMMARY:
    Using pct to add profile probes to a large application
    causes a sesmgr core dump.

    PROBLEM CONCLUSION:
    The problem is a memory overlay when adding large numbers of
    probes. This causes a seg fault of the profile module which
    is loaded by sesmgr. An array of objects in the profile module
    was incorrectly being statically allocated. The fix is to dynamically
    allocate the array.

    ------

    APAR: IY31450 COMPID: 5765D6100 REL: 220
    ABSTRACT: THE LIMIT OF 4095 OF LOADL WAS EXCEEDED WITH LLQ -S

    PROBLEM DESCRIPTION:
    The limit of 4095 of LoadL was exceeded with llq -s
    command. The customer requests that this limit be made
    larger.

    PROBLEM SUMMARY:
    In LoadLeveler 2.2, the command llq -s would core dump when
    the internal class list array expands the CLASS statement
    to more than 4095 characters.

    ------

    APAR: IY31475 COMPID: 5765B9501 REL: 340
    ABSTRACT: FSCK DOES NOT FIX CORRUPTED ALLOC MAP CHAINS

    PROBLEM DESCRIPTION:
    mmfsck does not fix FSSTRUCT errors of type 114 (corrupted
    allocation maps).

    PROBLEM SUMMARY:
    Fixed mmfsck to repair FSSTRUCT errors of type 114
    (corrupted allocation maps)

    PROBLEM CONCLUSION:
    Fix relinkAllChunks which computed an incorrect allocation
    map magic number for a disk. Provide new functionality to
    verify allocation map chunk list head bitmap chain.
    Recognize chunk list head loops and unlinked chunks.

    ------

    APAR: IY31577 COMPID: 5765B9501 REL: 340
    ABSTRACT: MMREPQUOTA SHOWS NEGATIVE USAGE AFTER MMRESTRIPEFS

    PROBLEM DESCRIPTION:
    mmrepquota shows negative usage after mmrestripefs

    PROBLEM SUMMARY:
    Fixed mmrestripefs causing mmrepquota to show incorrect
    usage.

    PROBLEM CONCLUSION:
    During restripe and defrag, when deallocating unused blocks
    do not decrement quota usage count if these blocks were not
    allocated with allocBlock.

    ------

    APAR: IY31578 COMPID: 5765B9501 REL: 320
    ABSTRACT: NODE PANICKED BY RUNNING GPFS_STAT()

    PROBLEM DESCRIPTION:
    node panicked by running gpfs_stat()

    PROBLEM SUMMARY:
    kpathname being traced after it is freed in kernel.

    PROBLEM CONCLUSION:
    Fixed trace path which could cause gpfs_stat() to panic
    node.

    ------

    APAR: IY31580 COMPID: 5765B9501 REL: 340
    ABSTRACT: NODE PANICKED BY RUNNING GPFS_STAT()

    PROBLEM DESCRIPTION:
    node panicked by running gpfs_stat()

    PROBLEM SUMMARY:
    kpathname being traced after it is freed in kernel.

    PROBLEM CONCLUSION:
    Fixed trace path which could cause gpfs_stat() to panic
    node.

    ------

    APAR: IY31601 COMPID: 5765D9300 REL: 320
    ABSTRACT: MAN PAGE FOR MPCC_R SHOULD DOCUMENT: C++ BINDINGS ARE SUPPORTED

    PROBLEM DESCRIPTION:
    mpCC_r documentation is incorrect. There is a -cpp option that
    is documented in the mpcc_r script. However, the -cpp really
    applies to the mpCC_r script. The -cpp option in mpCC_r enables use
    of full C++ bindings in MPI. The man page for mpCC_r also does
    not include the -cpp option.
    The man pages for mpCC_r and mpcc_r need to be changed. The man page
    for mpCC_r should document that C++ bindings are supported via
    the -cpp flag. The mpcc_r man page should remove the -cpp flag
    from its description.

    PROBLEM SUMMARY:
    The documentation for mpCC_r is incorrect. There is a -cpp
    option that is documented in the mpcc_r script. However,
    the -cpp really applies to the mpCC_r script. The -cpp
    option in mpCC_r enables use of C++ bindings in MPI.

    PROBLEM CONCLUSION:
    The -cpp option was added to the man page for mpCC_r and the
    -cpp option was also removed from mpcc_r. The poe.README
    was also changed to note the documentation change to mpCC_r
    and mpcc_r.

    ------

    APAR: IY31698 COMPID: 5765D5100 REL: 340
    ABSTRACT: ERROR MESSAGE WHEN APPLYING SSP.PMAN 3.4.0.1

    PROBLEM DESCRIPTION:
    error message when applying ssp.pman 3.4.0.1

    PROBLEM SUMMARY:
    On the apply of ssp.pman 3.4.0.1 in PTF Set 9 installp
    prints the message:
    touch: 0652-046 Cannot create
    /usr/lpp/ssp/README/pman3.2_save_me1.
    and fails to create the marker file. This will result in
    pmand being unnecessarily recycled in future PTFs and not
    being recycled on PTF reject.
    The cause of the problem is that the directory README should
    be READMES.

    PROBLEM CONCLUSION:
    The README in the directory path of the marker file
    (/usr/lpp/ssp/README) has been changed to READMES in the
    installp script file.

    ------

    APAR: IY31699 COMPID: 5765D5100 REL: 320
    ABSTRACT: S1TERM PROCESSING ENHANCEMENTS

    PROBLEM DESCRIPTION:
    s1term processing enhancements

    PROBLEM SUMMARY:
    Enhancements were required to s1term processing used by
    a node to obtain a srvtab and the supman password.

    PROBLEM CONCLUSION:
    Enhancements were made to s1term processing used by
    a node to obtain a srvtab and the supman password.
    The affected scripts were kfserver and pssfb_script
    for srvtab processing and srvsuppwd and getsuppwd
    for the processing of the supman password.

    ------

    APAR: IY31700 COMPID: 5765B9500 REL: 140
    ABSTRACT: PROBLEM IN CONVERTING FROM MMSDRFS TO MMSDRFS2 WHEN MIGRATING

    PROBLEM DESCRIPTION:
    After migration from gpfs-1.2 to gpfs-1.5, customer was unable
    to start daemon because of the following errors ...
    mmcommon: Invalid keyword: getNodeSDRdata
    mmremote: Unexpected error from getLock: convertFwd. Return code
    mmremote: 6027-1571 Unexpected failure executing sysctl -h cws1
    Check the preceding messages if any. Check /etc/sysctl.mmcmd.ac
    restart sysctl on cws1, or run kinit on this node.
    From the error messages this appeared to be sysctl related.
    However, the actual problem was that the mmcommon routine was
    being called with a function name that had changed in the
    gpfs-1.5 version of the file, hence the "Invalid keyword".

    PROBLEM SUMMARY:
    Allow getnodesdrdata to be called directly.
    Needed when migrating from rel 1.1 or 1.2.

    PROBLEM CONCLUSION:
    fixed migration path from release 1.1 and 1.2

    ------

    APAR: IY31780 COMPID: 5765D5100 REL: 340
    ABSTRACT: SETUP_SERVER SHOULD IGNORE PPP CONNECTIONS

    PROBLEM DESCRIPTION:
    If the pp0 adapter is present, setup_server fails.
    setup_server : host: 0827-803 Cannot find address 0.0.0.0.
    setup_CWS: 0016-338 Kerberos setup was bypassed for network
    interfaces that could not be resolved
    setup_server ends with rc = 0, but the node you are installing
    does not receive a Kerberos ticket.
    Circumventing this problem by detaching pp0 means that
    svcagent cannot be active and running during the setup_server
    action.

    LOCAL FIX:
    A good workaround is to add an entry to /etc/hosts like:
    zero 0.0.0.0 # dummy ppp entry to prevent setup_server problems

    PROBLEM SUMMARY:
    When the Point-to-Point Protocol (PPP) is being used on
    a Control Workstation, setup_CWS will terminate processing
    with the messages:
    host: 0827-803 Cannot find address 0.0.0.0.
    setup_CWS: 0016-338 Kerberos setup was bypassed for
               network interfaces that could not be resolved.
    Since the Point-to-Point Protocol is being displayed in
    the netstat -in data, setup_CWS tries to determine the
    IP addresses for these interfaces and fails. The data
    from the Point-to-Point Protocol should be ignored
    by setup_CWS.

    PROBLEM CONCLUSION:
    setup_CWS has been modified to skip lines of data from
    netstat -in which refer to the Point-to-Point Protocol.
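
    Skipping the PPP lines can be pictured as a filter on the interface
    name in the first column. The following is a sketch under stated
    assumptions, not the setup_CWS code: the sample netstat -in text is
    invented, and PPP interfaces are assumed to be recognizable by a
    "pp" name prefix, as with the pp0 adapter in the description.

```python
SAMPLE_NETSTAT_IN = """\
Name  Mtu    Network    Address          Ipkts Ierrs Opkts Oerrs Coll
en0   1500   192.168.4  192.168.4.10      1200     0   950     0    0
pp0   1500   0          0.0.0.0              0     0     0     0    0
lo0   16896  127        127.0.0.1           80     0    80     0    0
"""


def non_ppp_lines(netstat_output, ppp_prefix="pp"):
    """Drop the lines whose interface name marks a PPP interface."""
    kept = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(ppp_prefix):
            continue  # PPP line: no resolvable address, so skip it
        kept.append(line)
    return kept


for line in non_ppp_lines(SAMPLE_NETSTAT_IN):
    print(line)
```

    With the pp0 line gone, no lookup of 0.0.0.0 is attempted, which is
    the lookup that produced the 0827-803 message.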

    ------

    APAR: IY31795 COMPID: 5765B9500 REL: 150
    ABSTRACT: MMSETRCMD COMMAND MISSING IN GPFS 1.5 FOR AIX DISTRIBUTION

    PROBLEM DESCRIPTION:
    mmchcluster command fails in aix environment because the
    mmsetrcmd command is missing.

    PROBLEM CONCLUSION:
    The mmsetrcmd is not in the set of scripts that are
    shipped with AIX. However, the command is documented
    in the command reference for AIX.
    GPFS will now include this script in the inventory of
    scripts.

    ------

    APAR: IY31801 COMPID: 5765B9501 REL: 320
    ABSTRACT: ASSERT AFTER METANODE RELINQUISH

    PROBLEM DESCRIPTION:
    assert after metanode relinquish

    PROBLEM SUMMARY:
    Fixed an Assert after metanode relinquish

    PROBLEM CONCLUSION:
    Test for turning off the newMnode flag was in the wrong
    place

    ------

    APAR: IY31807 COMPID: 5765E5400 REL: 441
    ABSTRACT: HACMP/HAES - ICMP PING CAUSES DELAY IN CLGETADDR, CLGETACTIVENOD

    PROBLEM DESCRIPTION:
    This APAR corrects the condition where a node is pinging an
    adapter on another node while clgetaddr, clgetactivenodes,
    clfindres, etc. are executing simultaneously, which causes the
    commands to take several minutes to execute.

    PROBLEM SUMMARY:
    If the customer is executing a ping command on the local node
    to any responding remote node and then simultaneously
    executes a clgetaddr or clgetactivenodes, then a long
    delay of approximately 10 minutes or more will result.

    PROBLEM CONCLUSION:
    Modification of clgetaddr, and other utilities to send
    multiple ping requests.

    ------

    APAR: IY31820 COMPID: 5765D9300 REL: 310
    ABSTRACT: C++ NON-THREADED PROGRAMS MAY ABORT WHEN RUN WHEN COMPILED WITH

    PROBLEM DESCRIPTION:
    When a C++ program is compiled with mpCC (the non-threaded
    compile script) and then run, jobs may abort.
    The workaround listed in the poe.README for C++ programs needs
    to be altered so that the workaround states it also applies to
    VAC 5.0 . IBM will recommend that customers use the mpCC_r
    compile script which is the threaded comi

    PROBLEM SUMMARY:
    C++ non-threaded programs may abort when run with
    the mpCC compile script.

    PROBLEM CONCLUSION:
    The poe.README is being changed to document that C++
    executables built with the non-threaded MPI library may
    abort when run. IBM recommends that the threaded compile
    script such as mpCC_r be used.
    There is also a workaround documented in the
    poe.README for creating an alternate mpCC
    script that provides for an alternative
    initialization routine bound in the executable
    that prevents the job abort problem. Threaded
    applications compiled with the mpCC_r script are
    not affected.

    ------

    APAR: IY31861 COMPID: 5765E5400 REL: 440
    ABSTRACT: HAS,HAES: CLVERIFY ERROR WHEN CONCURRENT RG DOES NOT INCLUDE

    PROBLEM DESCRIPTION:
    The customer was attempting to sync resources in a 5 node cluster
    which had a concurrent resource group with only 2 of the
    cluster nodes participating in the group and no disk fencing
    specified. The sync attempt failed with error msg from
    clverify indicating "Not all nodes of cluster were included in
    the concurrent resource group".

    PROBLEM SUMMARY:
    The customer was attempting to sync resources in a 5 node
    cluster which had a concurrent resource group with only 2 of
    the cluster nodes participating in the group and no disk
    fencing specified. The sync attempt failed with error msg from
    clverify indicating "Not all nodes of cluster were included in
    the concurrent resource group".

    PROBLEM CONCLUSION:
    The code was changed so that all nodes of the cluster have
    to participate in a concurrent RG only if fencing is
    specified as TRUE for the RG.
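
    The changed rule can be written as a small predicate. This is a
    sketch of the rule as stated in the conclusion, not the clverify
    code. The function name and arguments are invented, and the
    subset check in the non-fencing branch is an added assumption.

```python
def concurrent_rg_is_valid(cluster_nodes, rg_nodes, fencing):
    """Apply the rule from the conclusion to one concurrent resource group."""
    if fencing:
        # With fencing enabled, every cluster node must participate.
        return set(cluster_nodes) <= set(rg_nodes)
    # Without fencing, a subset of the cluster nodes is acceptable
    # (assumed sanity check: the group may only name cluster nodes).
    return set(rg_nodes) <= set(cluster_nodes)


nodes = ["n1", "n2", "n3", "n4", "n5"]
print(concurrent_rg_is_valid(nodes, ["n1", "n2"], fencing=False))
print(concurrent_rg_is_valid(nodes, ["n1", "n2"], fencing=True))
```

    The first call models the customer's configuration (2 of 5 nodes,
    no fencing), which the changed code accepts.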

    ------

    APAR: IY31876 COMPID: 5765E5400 REL: 440
    ABSTRACT: HAES: ADD SNAPSHOT CONVERSION PATHS

    PROBLEM DESCRIPTION:
    It is currently not possible to migrate snapshots from versions
    HACMP 4.3.1
    HACMP 4.4.0
    to
    HACMP/ES 4.4.1
    Support for these conversions should be added.

    PROBLEM CONCLUSION:
    Conversion paths for updating snapshots from HAS 4.3.1 and
    4.4.0 are added. These will not allow a migration install
    of HAES 4.4.1 over these versions, but it will allow converting
    existing snapshots for use with HAES 4.4.1.

    ------

    APAR: IY31900 COMPID: 5765B9501 REL: 340
    ABSTRACT: ASSERT AFTER METANODE RELINQUISH

    PROBLEM DESCRIPTION:
    assert after metanode relinquish

    PROBLEM SUMMARY:
    Fixed an Assert after metanode relinquish

    PROBLEM CONCLUSION:
    Test for turning off the newMnode flag was in the wrong
    place

    ------

    APAR: IY31915 COMPID: 5765D5100 REL: 340
    ABSTRACT: PROBLEMS MOUNTING GPFS FS AFTER DELETING DISKS. DISK DESCRIPTOR

    PROBLEM DESCRIPTION:
    Problems mounting gpfs fs after deleting disks. The error
    6027-711 was received, which indicated that the disk or fs
    does not exist. It mentioned the deleted disks. The mmsdrfs2
    file in the SDR and /var/mmfs/gen were updated and did not show
    the disks. The problem is that the disk descriptor areas on
    some vsd's are not updated. By chance, the ones that are not
    updated are the first ones gpfs uses in attempting to mount the
    fs, causing the failure.

    PROBLEM SUMMARY:
    After the mmdeldisk command, some filesystems could not be
    remounted due to old replica data.

    PROBLEM CONCLUSION:
    When migrating the stripe group descriptor to a new replica
    set, update the copy of the descriptor on all other disks in
    the stripe group as well. This is necessary to prevent
    future attempts to read from disks in the old replica set in
    case these disks have since been deleted from the stripe
    group.

    ------

    APAR: IY31916 COMPID: 5765B9501 REL: 340
    ABSTRACT: PROBLEMS MOUNTING GPFS FS AFTER DELETING DISKS. DISK DESCRIPTOR

    PROBLEM DESCRIPTION:
    Problems mounting gpfs fs after deleting disks. The error
    6027-711 was received, which indicated that the disk or fs
    does not exist. It mentioned the deleted disks. The mmsdrfs2
    file in the SDR and /var/mmfs/gen were updated and did not show
    the disks. The problem is that the disk descriptor areas on
    some vsd's are not updated. By chance, the ones that are not
    updated are the first ones gpfs uses in attempting to mount the
    fs, causing the failure.

    PROBLEM SUMMARY:
    After the mmdeldisk command, some filesystems could not be
    remounted due to old replica data.

    PROBLEM CONCLUSION:
    When migrating the stripe group descriptor to a new replica
    set, update the copy of the descriptor on all other disks in
    the stripe group as well. This is necessary to prevent
    future attempts to read from disks in the old replica set in
    case these disks have since been deleted from the stripe
    group.

    ------

    APAR: IY31966 COMPID: 5765D6100 REL: 220
    ABSTRACT: LOADLEVELER WRITES TO SOCKET HANG, POSSIBLY CAUSING CORE DUMP

    PROBLEM DESCRIPTION:
    If a LoadLeveler daemon is writing to a socket and the socket
    window fills up, the write can hang until the window drains. If
    the hang is long enough (e.g. if the client is suspended the
    window will never drain) and a LoadLeveler daemon is holding
    locks over the write, this can eventually cause the LoadLeveler
    daemon to core dump.

    PROBLEM SUMMARY:
    LoadLeveler daemons (such as the LoadL_negotiator) can
    hang if they are writing to a socket, and the process
    reading from the socket is suspended. If a LoadLeveler
    daemon hangs writing to a socket this could result in a
    core dump.

    PROBLEM CONCLUSION:
    The LoadLeveler library code has been changed to prevent
    socket writes from hanging when the socket window fills.
    LoadLeveler will set the socket in non-blocking mode
    and allow write operations to time-out.
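
    The combination of a non-blocking socket and a write time-out can
    be sketched as follows. This is an illustration of the general
    technique, not the LoadLeveler library code. The function name
    send_with_timeout, the 0.2 second limit, and the 8 MB payload are
    all assumptions made for the example.

```python
import select
import socket
import time


def send_with_timeout(sock, data, timeout):
    """Send all of data, or raise TimeoutError if the peer stops draining."""
    sock.setblocking(False)
    deadline = time.monotonic() + timeout
    view = memoryview(data)
    while len(view) > 0:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("socket window stayed full")
        # Wait until the socket is writable, but never past the deadline.
        _, writable, _ = select.select([], [sock], [], remaining)
        if not writable:
            continue  # the loop re-checks the deadline
        try:
            sent = sock.send(view)
        except BlockingIOError:
            continue  # buffer filled between select() and send()
        view = view[sent:]


writer, reader = socket.socketpair()
try:
    send_with_timeout(writer, b"x" * (8 * 1024 * 1024), timeout=0.2)
    outcome = "sent"
except TimeoutError:
    outcome = "timed out"  # the reader never reads, so the window fills
finally:
    writer.close()
    reader.close()
print(outcome)
```

    Because the reader end is never read, it stands in for the suspended
    client from the description. The writer gives up after the time-out
    instead of hanging while it may still hold locks.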

    ------

    APAR: IY31991 COMPID: 5765C3403 REL: 430
    ABSTRACT: REDUCEVG FAILS TO REMOVE DISK PREVIOUSLY CONTAINING DUMPLV

    PROBLEM DESCRIPTION:
    reducevg fails to remove disk from volume group when disk
    formerly held dump device.

    PROBLEM SUMMARY:
    reducevg fails when customer attempts to remove a disk
    from the volume group which formerly held a copy of the
    dump logical volume (the dump_inited flag is still set).

    PROBLEM CONCLUSION:
    The rmlvcopy command needs to correctly change the status
    of the dump_inited flag when removing a dumplv from a
    disk.

    ------

    APAR: IY31994 COMPID: 5765B9501 REL: 330
    ABSTRACT: READDIR() MISSES MOVED BLOCKS

    PROBLEM DESCRIPTION:
    The ls -l command on GPFS may fail to list some files. This may
    happen when the directory grows.
    Example: A directory consists of 3 blocks (0,1,2).
    An ls -l command is run, using readdir() to read the
    directory entries. While the ls command is statting
    the first files, another file is created by another
    process. If the directory needs to grow to
    hold the newly created file, another block (3) is
    allocated and half of the entries of block 1 are
    copied into block 3. The last readdir run by the ls
    command is supposed to see that this has happened and
    return results from both blocks, but it returns only
    the entries left in block 1, so it leaves out
    the ones moved to block 3.
    This problem may also be seen with other commands that use
    readdir().

    PROBLEM SUMMARY:
    ls -l may not show all new entries, readdir()
    misses moved blocks.

    PROBLEM CONCLUSION:
    The readdir scan was stopping too soon in some
    cases if the directory block was split after the scan started.
    Also, fix code to work if a directory block merge occurs in
    the middle of a readdir scan. This won't happen with current
    code because merge gets a lock that conflicts with readdir
    (also, merge always fails with E_MULTI_RANGE_LOCK()), but fix it
    anyway in case this changes.
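
    The 3-block example from the description can be replayed with a toy
    model. It is only an illustration of a scan that stops too soon
    after a split. It does not reproduce the GPFS directory layout or
    the actual fix, and the recheck flag and all names are invented.

```python
def scan(blocks, recheck, split_after=0):
    """Return the entries seen by one pass over a toy directory.

    After reading block `split_after`, half of block 1 is moved into a
    newly appended block, imitating a split caused by another process.
    """
    seen = []
    limit = len(blocks)  # block count captured when the scan starts
    index = 0
    while index < (len(blocks) if recheck else limit):
        seen.extend(blocks[index])
        if index == split_after and len(blocks) == limit:
            half = len(blocks[1]) // 2
            blocks.append(blocks[1][half:])  # new block receives the moved half
            del blocks[1][half:]
        index += 1
    return seen


def fresh_directory():
    return [["a", "b"], ["c", "d", "e", "f"], ["g"]]


stale = scan(fresh_directory(), recheck=False)  # trusts the old block count
fixed = scan(fresh_directory(), recheck=True)   # notices the new block
print(stale)
print(fixed)
```

    The scan that trusts the block count taken at the start never visits
    the new block, so the entries moved out of block 1 are missed. The
    scan that re-checks the count returns every entry.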

    ------

    APAR: IY31997 COMPID: 5765C3403 REL: 430
    ABSTRACT: SECURITY: BUFFER OVERFLOW IN ERRPT

    PROBLEM DESCRIPTION:
    Security problem with errpt.

    PROBLEM CONCLUSION:
    Lengthen a fixed-length buffer beyond the max argument
    list length.

    ------

    APAR: IY32009 COMPID: 5765D5100 REL: 340
    ABSTRACT: SERVICES_CONFIG FAILING TO CALL ACCT_CONFIG

    PROBLEM DESCRIPTION:
    services_config failing to call acct_config

    PROBLEM SUMMARY:
    Under certain conditions, services_config was not calling
    acct_config when it should have. As a result, accounting
    was not set up correctly on the node.

    PROBLEM CONCLUSION:
    services_config was modified to correctly call acct_config.

    ------

    APAR: IY32016 COMPID: 5765C3403 REL: 430
    ABSTRACT: IMPLEMENT AIX PCI EEH MULTIFUNCTION ADAPTER SUPPORT

    PROBLEM DESCRIPTION:
    Multifunction adapters may cause a machine check.

    PROBLEM CONCLUSION:
    Implement EEH kernel services for multifunction adapters
    that will allow device drivers to recover from fatal errors
    on hardware assisted systems.

    ------

    APAR: IY32027 COMPID: 5765D9300 REL: 320
    ABSTRACT: C++ NON-THREADED PROGRAMS MAY ABORT WHEN RUN WHEN COMPILED WITH

    PROBLEM DESCRIPTION:
    When a C++ program is compiled with mpCC (the non-threaded
    compile script) and then run, jobs may abort.
    The workaround listed in the poe.README for C++ programs needs
    to be altered so that the workaround states it also applies to
    VAC 5.0 . IBM will recommend that customers use the mpCC_r
    compile script which is the threaded comi

    PROBLEM SUMMARY:
    C++ non-threaded programs may abort when run with
    the mpCC compile script.

    PROBLEM CONCLUSION:
    The poe.README is being changed to document that C++
    executables built with the non-threaded MPI library may
    abort when run. IBM recommends that the threaded compile
    script such as mpCC_r be used.
    There is also a workaround documented in the
    poe.README for creating an alternate mpCC
    script that provides for an alternative
    initialization routine bound in the executable
    that prevents the job abort problem. Threaded
    applications compiled with the mpCC_r script are
    not affected.

    ------

    APAR: IY32046 COMPID: 5765C3403 REL: 430
    ABSTRACT: ONLINE MIRROR BACKUP FAILS WITH SEQUENTIAL SCHEDULING

    PROBLEM DESCRIPTION:
    chfs -a splitcopy fails on LV with sequential scheduling
    policy

    PROBLEM CONCLUSION:
    Add logic to sequential mirroring code to check for online
    mirror backups

    ------

    APAR: IY32047 COMPID: 5765C3403 REL: 430
    ABSTRACT: XLATE IOCTL FAILING ON STRIPED LVS

    PROBLEM DESCRIPTION:
    The XLATE ioctl is randomly returning failures on
    striped logical volumes.

    ------

    APAR: IY32069 COMPID: 5765B9501 REL: 340
    ABSTRACT: ASSRT FAILED:OPENFILE.C, LINE 4448

    PROBLEM DESCRIPTION:
    assert failed; openfile.c, line 4448

    PROBLEM SUMMARY:
    When UpdateDataBlockDiskAddrs() returns an error other than
    E_NOT_METANODE, the version field is updated with an
    uninitialized stack value. As a result,
    cleanIndirectUpdates() never resets dirtyIndirectUpdates,
    which causes the assert.

    PROBLEM CONCLUSION:
    mnUpdateSomeDataBlockDiskAddrs() updates version only when
    there are no errors from UpdateDataBlockDiskAddrs()

    ------

    APAR: IY32071 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSRT FAILED:OPENFILE.C, LINE 4448

    PROBLEM DESCRIPTION:
    assert failed; openfile.c, line 4448

    PROBLEM SUMMARY:
    When UpdateDataBlockDiskAddrs() returns an error other than
    E_NOT_METANODE, the version field is updated with an
    uninitialized stack value. As a result,
    cleanIndirectUpdates() never resets dirtyIndirectUpdates,
    which causes the assert.

    PROBLEM CONCLUSION:
    mnUpdateSomeDataBlockDiskAddrs() updates version only when
    there are no errors from UpdateDataBlockDiskAddrs()

    ------

    APAR: IY32072 COMPID: 5765B9501 REL: 320
    ABSTRACT: ASSRT FAILED:OPENFILE.C, LINE 4448

    PROBLEM DESCRIPTION:
    assert failed; openfile.c, line 4448

    PROBLEM SUMMARY:
    When UpdateDataBlockDiskAddrs() returns an error other than
    E_NOT_METANODE, the version field is updated with an
    uninitialized stack value. As a result,
    cleanIndirectUpdates() never resets dirtyIndirectUpdates,
    which causes the assert.

    PROBLEM CONCLUSION:
    mnUpdateSomeDataBlockDiskAddrs() updates version only when
    there are no errors from UpdateDataBlockDiskAddrs()

    ------

    APAR: IY32075 COMPID: 5765E5400 REL: 440
    ABSTRACT: CL_LSVG AND CL_LSFS ERRORS (HACMP441 HAES441)

    PROBLEM DESCRIPTION:
    cl_lsvg and cl_lsfs generate trash output
    and error messages: can't locate VG/FS

    PROBLEM SUMMARY:
    Running smitty cl_admin and then selecting:
            Cluster Logical Volume Manager
            Shared Logical Volumes
            List All Shared Logical Volumes by Volume Group
    generates the error:
    cl_lsvg: can't locate VG
    Running smitty cl_admin and then selecting:
            Cluster Logical Volume Manager
            Shared File Systems
            Journaled File Systems
            List All Shared File Systems
    generates the error:
    cllsfs:can't locate FS
    cl_lsvg.cel and cl_lsfs.cel both had errors
    that caused the script to attempt to process all
    lines of a temporary file instead of just the
    lines that contained the name of the resource
    group currently being processed.

    PROBLEM CONCLUSION:
    The cl_lsfs.cel and cl_lsvg.cel scripts
    were modified to grep the temporary file
    for the resource group being processed
    rather than reading all lines from the
    file.
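
    The change above can be sketched in Python as a filter over the
    temporary file's lines. The file layout (resource group name in
    the first field) and the names used here are assumptions for
    illustration only; the real .cel scripts are not shown in this
    APAR.

```python
def lines_for_resource_group(lines, resource_group):
    """Return only the lines whose first field names the given resource group."""
    selected = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and lines that belong to another resource group.
        if fields and fields[0] == resource_group:
            selected.append(line)
    return selected


# Hypothetical temporary-file contents: "<resource group> <volume group>".
temp_file_lines = [
    "rg_db sharedvg1",
    "rg_web sharedvg2",
    "rg_db sharedvg3",
]

if __name__ == "__main__":
    for line in lines_for_resource_group(temp_file_lines, "rg_db"):
        print(line)
```

    Matching on a whole field rather than a substring avoids picking
    up a resource group whose name merely contains the one being
    processed.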

    ------

    APAR: IY32102 COMPID: 5765D5100 REL: 320
    ABSTRACT: ESTART FAILED ON AN SP SWITCH2 SYSTEM WITH WRAPPED SWITCH PORTS

    PROBLEM DESCRIPTION:
    estart failed on an sp switch2 system with swapped switch por

    PROBLEM SUMMARY:
    On a SP Switch 2 system, if the switch ports have
    wrap plugs, Estart may fail. The following message
    will be in the flt file on the primary node:
    CSswitchInit: 2510-712 generate_service_routes() failed
    with rc=103. If the wrap plugs are removed, Estart will
    succeed.
    This has been seen at service levels ssp.css 3.4.0.6
    or 3.4.0.7; it may also occur at service level
    ssp.css 3.2.0.18.

    PROBLEM CONCLUSION:
    The switch fault service daemon code has been corrected.

    ------

    APAR: IY32140 COMPID: 5765B9501 REL: 340
    ABSTRACT: ASSERT: MMQUOTAON MAIN PROCESS 2564178 KILLED BY SIGNAL 11

    PROBLEM DESCRIPTION:
    assert: mmquotaon main process 2564178 killed by signal 11

    PROBLEM SUMMARY:
    QuotaOn() zeroes the storage area at quSharesPP without first
    dereferencing the pointer, which caused a segmentation
    fault.

    PROBLEM CONCLUSION:
    Added the missing dereference so that the right area of memory
    is zeroed out.

    ------

    APAR: IY32141 COMPID: 5765B9501 REL: 320
    ABSTRACT: ASSERT: MMQUOTAON MAIN PROCESS 2564178 KILLED BY SIGNAL 11

    PROBLEM DESCRIPTION:
    assert: mmquotaon main process 2564178 killed by signal 11

    PROBLEM SUMMARY:
    QuotaOn() zeroes the storage area at quSharesPP without first
    dereferencing the pointer, which caused a segmentation
    fault.

    PROBLEM CONCLUSION:
    Added the missing dereference so that the right area of memory
    is zeroed out.

    ------

    APAR: IY32142 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSERT: MMQUOTAON MAIN PROCESS 2564178 KILLED BY SIGNAL 11

    PROBLEM DESCRIPTION:
    assert: mmquotaon main process 2564178 killed by signal 11

    PROBLEM SUMMARY:
    QuotaOn() zeroes the storage area at quSharesPP without first
    dereferencing the pointer, which caused a segmentation
    fault.

    PROBLEM CONCLUSION:
    Added the missing dereference so that the right area of memory
    is zeroed out.

    ------

    APAR: IY32182 COMPID: 5765E5400 REL: 440
    ABSTRACT: HAES: NODEXNODE LEAVES HACMPPAGER WITH HAS PATH TO SAMPLE.TXT

    PROBLEM DESCRIPTION:
    After a node by node migration from HAS 450 to HAES 450 with
    pager configured the odm HACMPpager has the path to the sample.txt
    file in the format for HAS (or /usr/sbin/cluster...).

    PROBLEM CONCLUSION:
    The default pager file will be remapped to its HAES location.
    User created files will be unmodified.

    ------

    APAR: IY32184 COMPID: 5765B9501 REL: 340
    ABSTRACT: SIGNAL 11 DURING LOG RECOVERY

    PROBLEM DESCRIPTION:
    A bad record in the recovery logs overwrites some other stack
    data causing a SIGSEGV.

    PROBLEM SUMMARY:
    SIGSEGV caused by bad record in the recovery logs

    PROBLEM CONCLUSION:
    When importing a replicated disk address from the log, check
    that nValidDiskAddrs fits in a RepDiskAddr structure. A bad
    log record caused stack corruption and later a SIGSEGV
    occurred.
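
    A minimal Python sketch of the kind of bounds check described
    above. The record layout (a 32-bit count followed by 64-bit
    addresses), MAX_REPLICAS and the function name are assumptions
    made for illustration; the actual GPFS log format and the
    RepDiskAddr structure are not given in this APAR.

```python
import struct

MAX_REPLICAS = 3     # assumed capacity of the fixed-size address array
ADDR_FORMAT = "<q"   # assumed: one signed 64-bit value per disk address
COUNT_FORMAT = "<i"  # assumed: signed 32-bit count at the start of the record


def import_rep_disk_addr(record):
    """Parse a count followed by that many addresses, rejecting bad counts."""
    count_size = struct.calcsize(COUNT_FORMAT)
    if len(record) < count_size:
        raise ValueError("record too short to hold a count")
    (n_valid,) = struct.unpack_from(COUNT_FORMAT, record, 0)
    # The check itself: the count must fit in the destination structure.
    if not 0 <= n_valid <= MAX_REPLICAS:
        raise ValueError(f"bad nValidDiskAddrs: {n_valid}")
    addr_size = struct.calcsize(ADDR_FORMAT)
    if len(record) < count_size + n_valid * addr_size:
        raise ValueError("record truncated")
    return [
        struct.unpack_from(ADDR_FORMAT, record, count_size + i * addr_size)[0]
        for i in range(n_valid)
    ]


if __name__ == "__main__":
    good = struct.pack("<iqq", 2, 100, 200)
    print(import_rep_disk_addr(good))
```

    Rejecting the record before copying anything is what keeps a bad
    count from writing past the end of a fixed-size structure in the
    C code this sketch stands in for.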

    ------

    APAR: IY32186 COMPID: 5765E6900 REL: 310
    ABSTRACT: BATCH TASK GEOMETRY JOB GETS ORPHAN. LL CAN'T DISPATCH NEW

    PROBLEM DESCRIPTION:
    A customer's task geometry job would leave orphan
    processes on LPAR nodes. When LL then considered the job
    gone, no new jobs could be scheduled. A recycle
    was needed after the orphan processes were killed.

    PROBLEM SUMMARY:
    In LoadLeveler, task_geometry job vectors are not
    created correctly and memory errors can occur,
    or the Negotiator could core dump with a segmentation
    fault.

    PROBLEM CONCLUSION:
    In LoadLeveler, task_geometry jobs vectors
    are now created correctly.
    Memory leaks in the Accumulator, Backfill,
    and Gang dispatching calls are fixed.

    ------

    APAR: IY32189 COMPID: 5765B9501 REL: 320
    ABSTRACT: READDIR() MISSES MOVED BLOCKS

    PROBLEM DESCRIPTION:
    The ls -l command on GPFS may fail to list some files. This may
    happen in cases where the directory grows.
    Example: A directory consists of 3 blocks (0,1,2).
    An ls -l command is run, using readdir() to read the
    directory entries. While the ls command is running stat on
    the first files, another file is created by another
    process. If the directory needs to grow to
    hold this newly created file, another block (3) is
    allocated and half of the entries of block 1 are
    copied into block 3. The last readdir run by the ls
    command is supposed to see that this has happened and
    return results from both blocks, but it only
    returns the entries left in block 1, so it leaves out
    the ones moved to block 3.
    This problem may also be seen with other commands that use
    readdir().

    PROBLEM SUMMARY:
    ls -l may not show all new entries, readdir()
    misses moved blocks.

    PROBLEM CONCLUSION:
    The readdir scan was stopping too soon in some
    cases if the directory block was split after the scan started.
    Also, fix the code to work if a directory block merge occurs in
    the middle of a readdir scan. This won't happen with the current
    code because merge gets a lock that conflicts with readdir
    (also, merge always fails with E_MULTI_RANGE_LOCK), but fix it
    anyway in case this changes.

    ------

    APAR: IY32192 COMPID: 5765E5400 REL: 440
    ABSTRACT: HACMP/HAES:ERROR FOUND IN CSPOC.LOG DURING CLVM CSPOC OPERATION

    PROBLEM DESCRIPTION:
    While creating a concurrent volume group in SMIT using "Create
    a Concurrent Volume Group" in CSPOC, one line in cspoc.log
    showed "FAILED" and another line shortly afterward showed
    "RETURN CODE=1". However, the intended operation worked
    properly and there were no errors in hacmp.out or AIX errlog.

    PROBLEM CONCLUSION:
    Concurrent volume groups should not be varied off after
    creation. A special check should be added for concurrent VGs.

    ------

    APAR: IY32224 COMPID: 5765E5400 REL: 440
    ABSTRACT: HACMP/HAES: CLVERIFY INTERPRETS SOME WARNINGS AS ERRORS

    PROBLEM DESCRIPTION:
    clverify terminates or returns with a non-zero error count
    even if only warnings have been issued.

    PROBLEM CONCLUSION:
    Scan the messages before sending them, to distinguish warning
    from error messages, based on the header inserted by clverify.
    Skip updating the error count for warning messages.
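
    The counting rule above can be sketched in Python as follows.
    The header strings "WARNING:" and "ERROR:" are hypothetical
    stand-ins; the APAR does not show the header that clverify
    actually inserts.

```python
WARNING_HEADER = "WARNING:"  # hypothetical header text
ERROR_HEADER = "ERROR:"      # hypothetical header text


def count_errors(messages):
    """Count only messages carrying the error header; warnings are skipped."""
    error_count = 0
    for message in messages:
        if message.startswith(WARNING_HEADER):
            continue  # warnings must not raise the error count
        if message.startswith(ERROR_HEADER):
            error_count += 1
    return error_count


if __name__ == "__main__":
    sample = [
        "WARNING: a hypothetical warning message",
        "ERROR: a hypothetical error message",
        "WARNING: another hypothetical warning",
    ]
    print(count_errors(sample))
```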

    ------

    APAR: IY32226 COMPID: 5765E5400 REL: 440
    ABSTRACT: HACMP/HAES: DO NOT MODIFY /ETC/RC.SHUTDOWN IF OFFICIAL CALLOUT

    PROBLEM DESCRIPTION:
    HACMP will rename any user /etc/rc.shutdown, and insert its
    own.

    PROBLEM CONCLUSION:
    If /etc/shutdown contains a callout for HACMP, do not replace
    any user version of /etc/rc.shutdown.

    ------

    APAR: IY32260 COMPID: 5765E5400 REL: 440
    ABSTRACT: HACMP/HAES: GET_ADDRS CAN RETURN INCORRECT IP ADDRESS

    PROBLEM DESCRIPTION:
    A call to clgetaddr for a node that is down returns the
    service label, which has been taken over by another node.

    PROBLEM CONCLUSION:
    Modify the checking performed by the library functions.

    ------

    APAR: IY32266 COMPID: 5765D5100 REL: 340
    ABSTRACT: SOMETIMES LAPI SHARED MEMORY PROGRAM HANGS.

    PROBLEM DESCRIPTION:
    Sometimes a LAPI shared memory program hangs. The user
    application was a GAMESS program running a 32-way LAPI
    shared memory job.

    PROBLEM SUMMARY:
    Fixed a problem in lapi that causes the application to hang
    sometimes.

    PROBLEM CONCLUSION:
    Sometimes a LAPI shared memory application will hang.

    ------

    APAR: IY32268 COMPID: 5765E6900 REL: 310
    ABSTRACT: LOADL_CONFIG ENV VARIABLE ERROR MSG FIX AND DOCUMENT FORMAT

    PROBLEM DESCRIPTION:
    The LOADL_CONFIG env variable format isn't documented. Also,
    the error message that it outputs is misleading.

    LOCAL FIX:
    Set LOADL_CONFIG in the correct format, either
    LOADL_CONFIG=/etc/LoadL.cfg or
    LOADL_CONFIG=LoadL

    PROBLEM SUMMARY:
    There isn't any correct input format documented
    for the LoadLeveler environment variable LOADL_CONFIG.
    And when the incorrect format was used, the
    error message outputted for the incorrect filename
    was misleading.

    PROBLEM CONCLUSION:
    The error message for the LOADL_CONFIG environment
    variable will now include the name of the file
    that LoadLeveler is trying to open.
    The following would be added to the LoadLeveler
    documentation:
    In Using and Administering LoadLeveler,
    Chapter 5. Submitting and managing jobs,
    Subparagraph "Querying multiple LoadLeveler clusters",
    The format for the LOADL_CONFIG environment variable:
    LOADL_CONFIG="fully qualified path and filename"
    e.g.
       LOADL_CONFIG=/etc/LoadL.cfg
    or
    LOADL_CONFIG="Name of the file without any suffix extension"
       This is because internally the prefix "/etc"
    and the suffix ".cfg" would be appended to the
    beginning and to the ending of the filename specified.
    e.g.
       LOADL_CONFIG=LoadL
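
    The two accepted formats can be illustrated with a small Python
    sketch. The function name is hypothetical, and the rule used to
    tell the formats apart (a leading "/" means a fully qualified
    path) is an assumption; the APAR only states that "/etc" and
    ".cfg" are added around a bare name.

```python
def resolve_loadl_config(value):
    """Map a LOADL_CONFIG value to the file name that would be opened."""
    if value.startswith("/"):
        # Assumed: a fully qualified path and filename is used as given.
        return value
    # Bare name: prefix with /etc and append the .cfg suffix.
    return "/etc/" + value + ".cfg"


if __name__ == "__main__":
    for setting in ("/etc/LoadL.cfg", "LoadL"):
        print(setting, "->", resolve_loadl_config(setting))
```

    Under these assumptions both documented examples resolve to the
    same file, /etc/LoadL.cfg.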

    ------

    APAR: IY32331 COMPID: 5765D9300 REL: 320
    ABSTRACT: WORKAROUNDS RELATED TO THE USE OF TECHNICAL LARGE PAGE FOR POE

    PROBLEM DESCRIPTION:
    workarounds related to the use of technical large page
    for POE jobs.

    PROBLEM SUMMARY:
    workarounds related to the use of
    technical large page for POE jobs.

    PROBLEM CONCLUSION:
    workarounds related to the use of technical
    large page for POE jobs.

    ------

    APAR: IY32353 COMPID: 5765D5100 REL: 320
    ABSTRACT: TCE LEAK

    PROBLEM DESCRIPTION:
    tce leak

    PROBLEM SUMMARY:
    KHAL buffers supporting KLAPI zero copy were
    released under the condition that KHAL port status is clean.
    This prevented buffers from being released when port status had
    certain flags, reflecting the internal KHAL status, set.
    The only condition that should have been checked in the KHAL
    port status is whether the port is closed.

    PROBLEM CONCLUSION:
    To resolve the problem, a check has been
    added to the KHAL function that releases KHAL buffers allocated
    in support of KLAPI zero copy operation. The check verifies
    that the buffers are released under any circumstances except
    when the KHAL port is closed.

    ------

    APAR: IY32361 COMPID: 5765D5100 REL: 320
    ABSTRACT: NODECOND_MCA NEEDS TO HANDLE 10/100 ADAPTERS

    PROBLEM DESCRIPTION:
    nodecond_mca does not currently recognize the
    10/100 Mbs Ethernet TX MC Adapter. It terminates with the msg:
    the first ethernet adapter detected is not a supported
    installation adapter.
    The code needs to be modified to recognize this supported
    adapter.

    LOCAL FIX:
    Manual node conditioning can be used to select this adapter.

    PROBLEM SUMMARY:
    nodecond_mca does not currently recognize the
    10/100 Mbs Ethernet TX MC Adapter. It terminates with the
    message that the first ethernet adapter detected is not a
    supported installation adapter.
    The code needs to be modified to recognize this supported
    adapter.

    PROBLEM CONCLUSION:
    nodecond_mca has been modified to recognize the
    10/100 Mbs Ethernet TX MC Adapter.

    ------

    APAR: IY32362 COMPID: 5765D5100 REL: 340
    ABSTRACT: NODECOND_MCA NEEDS TO HANDLE 10/100 ADAPTERS

    PROBLEM DESCRIPTION:
    nodecond_mca does not currently recognize the
    10/100 Mbs Ethernet TX MC Adapter. It terminates with the msg:
    the first ethernet adapter detected is not a supported
    installation adapter.
    The code needs to be modified to recognize this supported
    adapter.

    LOCAL FIX:
    Manual node conditioning can be used to select this adapter.

    PROBLEM SUMMARY:
    nodecond_mca does not currently recognize the
    10/100 Mbs Ethernet TX MC Adapter. It terminates with the
    message that the first ethernet adapter detected is not a
    supported installation adapter.
    The code needs to be modified to recognize this supported
    adapter.

    PROBLEM CONCLUSION:
    nodecond_mca has been modified to recognize the
    10/100 Mbs Ethernet TX MC Adapter.

    ------

    APAR: IY32365 COMPID: 5765B9501 REL: 340
    ABSTRACT: READDIR() MISSES MOVED BLOCKS

    PROBLEM DESCRIPTION:
    The ls -l command on GPFS may fail to list some files. This may
    happen in cases where the directory grows.
    Example: A directory consists of 3 blocks (0,1,2).
    An ls -l command is run, using readdir() to read the
    directory entries. While the ls command is running stat on
    the first files, another file is created by another
    process. If the directory needs to grow to
    hold this newly created file, another block (3) is
    allocated and half of the entries of block 1 are
    copied into block 3. The last readdir run by the ls
    command is supposed to see that this has happened and
    return results from both blocks, but it only
    returns the entries left in block 1, so it leaves out
    the ones moved to block 3.
    This problem may also be seen with other commands that use
    readdir().

    PROBLEM SUMMARY:
    ls -l may not show all new entries, readdir()
    misses moved blocks.

    PROBLEM CONCLUSION:
    The readdir scan was stopping too soon in some
    cases if the directory block was split after the scan started.
    Also, fix the code to work if a directory block merge occurs in
    the middle of a readdir scan. This won't happen with the current
    code because merge gets a lock that conflicts with readdir
    (also, merge always fails with E_MULTI_RANGE_LOCK), but fix it
    anyway in case this changes.

    ------

    APAR: IY32415 COMPID: 5765E6900 REL: 310
    ABSTRACT: NEW CONFIGURATION KEYWORD: ENFORCE_RESOURCE_POLICY = HARD | SOFT

    PROBLEM DESCRIPTION:
    New configuration keyword:
    ENFORCE_RESOURCE_POLICY = hard | soft | shares

    PROBLEM SUMMARY:
    A new keyword that allows the administrator to define the
    type of enforcement policy that LoadLeveler will use when
    creating WLM classes.

    PROBLEM CONCLUSION:
    LoadLeveler by default will create WLM shares based on a job
    step's resource requirements when creating a WLM class. The
    new keyword will let the administrator decide whether
    shares, soft limits or hard limits should be defined.
    Soft and hard limits will represent the percentage of step
    requested resources divided by total machine resources.
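
    As a worked example of the percentage described above, a short
    Python sketch. The function name and the sample figures are
    illustrative assumptions, and any rounding that LoadLeveler
    applies before passing the limit to WLM is not stated here.

```python
def limit_percentage(step_requested, machine_total):
    """Percentage of the machine's resource that the step requested."""
    if machine_total <= 0:
        raise ValueError("machine total must be positive")
    return 100.0 * step_requested / machine_total


if __name__ == "__main__":
    # Hypothetical figures: a step asking for 4 CPUs on a 16-CPU machine.
    print(limit_percentage(4, 16))
```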

    ------

    APAR: IY32417 COMPID: 5765D5100 REL: 340
    ABSTRACT: NGRESOLVE -D FOR IP ADDRESS RETURNS BLANK LINES

    PROBLEM DESCRIPTION:
    ngresolve -d for ip address returns blank lines

    PROBLEM SUMMARY:
    Issuing ngresolve with the -d flag should display the IP
    address of each node in the node group. Currently a
    blank line is being displayed for each node. The adapter
    information for the node was not being passed correctly.

    PROBLEM CONCLUSION:
    Changed the format to pass "node number and adapter type"
    from SpNode.C to the constructor of adapter SpAdapter.C.
    ngresolve with the -d flag will now display the IP
    address of each node in the node group.

    ------

    APAR: IY32429 COMPID: 5765D5100 REL: 340
    ABSTRACT: DOUBLE FREE OF SERVICE PACKET STORAGE IN HAL_RECV_HNDLR()

    PROBLEM DESCRIPTION:
    In hal_recv_hndlr(), there are cases where the storage for a
    service packet is freed after the packet has been placed on a
    port's virtual receive FIFO. The port thread will also free the
    storage after reading the service packet from the FIFO. The
    free in hal_recv_hndlr() is erroneous. The results are
    indeterminate, because it depends on if/when the doubly-freed
    storage is reused. One possible result is a fault-service daemon
    core dump.

    LOCAL FIX:
    Restart the fault-service daemon after a core dump with
    /usr/lpp/ssp/css/rc.switch.

    PROBLEM SUMMARY:
    Under some conditions, the hal_recv_hndlr() function will
    free the storage used for service packets; the port thread
    may later free this same storage. The results are
    indeterminate; there may be data corruption or the fault
    service daemon may core dump.

    PROBLEM CONCLUSION:
    Once a service packet is placed on the port's virtual
    receive FIFO queue, the hal_recv_hndlr() function
    will no longer try to free it.

    ------

    APAR: IY32496 COMPID: 5765B9501 REL: 340
    ABSTRACT: MULTIPLE PROCESSES UPDATES TO GPFS RESULT IN CORRUPTED/MISSING

    PROBLEM DESCRIPTION:
    A write/append test demonstrates a serious problem with GPFS
    when multiple processes append to the same file. The
    resulting file contains corrupted/missing data. A test PGM opens
    a file with flags O_RDWR | O_CREAT | O_APPEND and then does a
    series of identically sized writes to the file. If multiple
    copies of the program are run at the same time on the same LPAR
    then the resulting file contains missing or corrupted records.
    The same test was run repeatedly on NFS and locally mounted file
    systems with no problems.

    PROBLEM SUMMARY:
    Writes in append mode overwrite previous records

    PROBLEM CONCLUSION:
    When multiple processes write to the same file in append
    mode some records are being overwritten. The fast write path
    was not getting an Append (wa) lock on the inode to
    serialize the writes.
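
    A test along the lines described in this APAR could be sketched
    in Python as below. This is not the original test program: the
    record size, record count, tag layout and function names are all
    assumptions, and os.fork() limits it to POSIX systems. Each
    process opens the file with O_RDWR | O_CREAT | O_APPEND, writes
    identically sized records, and the file is then checked for
    missing or damaged records.

```python
import os
import tempfile

RECORD_SIZE = 64          # assumed record size in bytes
RECORDS_PER_WRITER = 200  # assumed number of writes per process
TAG_LENGTH = 13           # length of the "WWWW:SSSSSSSS" tag


def append_records(path, writer_id):
    """Open the file in append mode and write fixed-size tagged records."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(RECORDS_PER_WRITER):
            tag = f"{writer_id:04d}:{seq:08d}".encode()
            os.write(fd, tag.ljust(RECORD_SIZE - 1, b".") + b"\n")
    finally:
        os.close(fd)


def run_writers(path, writer_count):
    """Fork one child per writer and wait for all of them."""
    pids = []
    for writer_id in range(writer_count):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                append_records(path, writer_id)
                status = 0
            finally:
                os._exit(status)  # never return into the parent's code
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)


def missing_records(path, writer_count):
    """Return the (writer, sequence) pairs that are absent from the file."""
    with open(path, "rb") as handle:
        data = handle.read()
    seen = set()
    for offset in range(0, len(data), RECORD_SIZE):
        tag = data[offset:offset + TAG_LENGTH]
        try:
            writer, seq = tag.decode("ascii").split(":")
            seen.add((int(writer), int(seq)))
        except ValueError:
            continue  # damaged record
    expected = {
        (w, s) for w in range(writer_count) for s in range(RECORDS_PER_WRITER)
    }
    return sorted(expected - seen)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "append.dat")
        run_writers(target, 4)
        print("missing records:", len(missing_records(target, 4)))
```

    On a file system that serializes append-mode writes, the file
    size equals the number of records times the record size and no
    record is missing.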

    ------

    APAR: IY32524 COMPID: 5765D5100 REL: 340
    ABSTRACT: K4DESTROY ERROR MESSAGE FROM CLEANUP.LOGS.NODES SCRIPT

    PROBLEM DESCRIPTION:
    When cron runs the cleanup.logs.nodes script it emails root
    with the error message from the k4destroy command:
         2502-000 k4destroy: No tickets to destroy.
    if kerberos is not active.
    This error message should not be emailed if kerberos is not
    active. kerberos is not active if it is configured but has been
    deactivated using the chauthent and chauthts commands. The test
    for kerberos not active is that the lsauthent command output
    does not include "Kerberos 4" and the lsauthts command output
    does not include "Compatibility".

    LOCAL FIX:
    As a workaround, add '2>/dev/null' to either the
    script or the crontab entry.

    PROBLEM SUMMARY:
    cleanup.logs.nodes issues k4destroy to destroy any Kerberos
    Version 4 authentication tickets. If any messages are
    written to stderr, such as:
    k4destroy: 2502-000 No tickets to destroy.
    it results in an email being sent to root, since it is
    usually run as a cron job. Customers would prefer to not
    see this message and to not receive the email.

    PROBLEM CONCLUSION:
    cleanup.logs.nodes has been modified so that any output to
    stderr from k4destroy will be redirected to stdout,
    which is already being redirected to /dev/null. As a
    result no error messages will be issued from the call
    to k4destroy from cleanup.logs.nodes.
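
    The redirection described above (stderr sent to stdout, stdout
    sent to /dev/null) can be illustrated with Python's subprocess
    module. The real script is not Python, and the command below is
    a stand-in that only imitates a program writing to stderr; it is
    not k4destroy.

```python
import subprocess
import sys


def run_quietly(argv):
    """Run a command with stderr merged into stdout and stdout discarded."""
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,  # stdout goes to the null device
        stderr=subprocess.STDOUT,   # stderr follows stdout
        check=False,
    )
    return completed.returncode


# Stand-in command: writes a message to stderr and exits with status 1.
NOISY_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('No tickets to destroy.\\n'); sys.exit(1)",
]

if __name__ == "__main__":
    print("exit status:", run_quietly(NOISY_COMMAND))
```

    Because nothing reaches stderr or stdout, a cron job running
    such a command has no output to mail, while the exit status is
    still available to the caller.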

    ------

    APAR: IY32698 COMPID: 5765B9501 REL: 330
    ABSTRACT: SIGNAL 11 DURING LOG RECOVERY

    PROBLEM DESCRIPTION:
    A bad record in the recovery logs overwrites some other stack
    data causing a SIGSEGV.

    PROBLEM SUMMARY:
    SIGSEGV caused by bad record in the recovery logs

    PROBLEM CONCLUSION:
    When importing a replicated disk address from the log, check
    that nValidDiskAddrs fits in a RepDiskAddr structure. A bad
    log record caused stack corruption and later a SIGSEGV
    occurred.

    ------

    APAR: IY32699 COMPID: 5765B9501 REL: 320
    ABSTRACT: SIGNAL 11 DURING LOG RECOVERY

    PROBLEM DESCRIPTION:
    A bad record in the recovery logs overwrites some other stack
    data causing a SIGSEGV.

    PROBLEM SUMMARY:
    SIGSEGV caused by bad record in the recovery logs

    PROBLEM CONCLUSION:
    When importing a replicated disk address from the log, check
    that nValidDiskAddrs fits in a RepDiskAddr structure. A bad
    log record caused stack corruption and later a SIGSEGV
    occurred.

    ------

    APAR: IY32702 COMPID: 5765D5100 REL: 340
    ABSTRACT: S70D DAEMON GETS ERRORS "CANNOT COMMUNICATE WITH REMOTE NODE" IF

    PROBLEM DESCRIPTION:
    On SP systems using a 128-port RAN for the S7A tty connection,
    the following entries occur in the system errlog of the CWS:
    "cannot communicate with remote node". The problem is definitely
    associated with a timing issue between the s70d daemon and the
    H/W. With an 8-port adapter the connection is direct between the
    8-port box and the CWS. With a 16-port box there is only a direct
    connection between the initial box and the CWS (all other boxes
    making up the 128-way adapter do not have a direct connection,
    which is where the problem lies).

    PROBLEM SUMMARY:
    When an SP-attached S70 or S7A is connected using more than an
    8-port RAN, SPMON_EMSG101_ER entries may be made in the
    errpt. This indicates a communication problem, which
    is not the case. The s70d daemon needs to be more
    tolerant of non-responses from the SAMI.

    PROBLEM CONCLUSION:
    The s70d daemon has been modified to be more tolerant of
    non-responses of any kind from SAMI. The allowable number
    of non-responses has been increased to prevent
    SPMON_EMSG101_ER entries being made in the errpt indicating
    the Supervisor is not responding.
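
    One way to picture this kind of tolerance is a counter that only
    reports a problem after more than a set number of non-responses.
    The Python sketch below is illustrative only: the class name and
    the threshold are hypothetical, and counting consecutive misses
    (reset by any response) is an assumption, since the APAR does
    not say how s70d counts them.

```python
class ResponseMonitor:
    """Report a problem only after more than max_misses misses in a row."""

    def __init__(self, max_misses):
        self.max_misses = max_misses
        self.consecutive_misses = 0

    def record(self, responded):
        """Record one poll result; return True if an error should be logged."""
        if responded:
            self.consecutive_misses = 0
            return False
        self.consecutive_misses += 1
        return self.consecutive_misses > self.max_misses


if __name__ == "__main__":
    monitor = ResponseMonitor(max_misses=3)  # hypothetical threshold
    for responded in (False, False, True, False, False, False, False):
        print(responded, monitor.record(responded))
```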

    ------

    APAR: IY32752 COMPID: 5765D5100 REL: 340
    ABSTRACT: SRVSUPPWD NEEDS UNIQUE TMP PW FILENAME

    PROBLEM DESCRIPTION:
    srvsuppwd needs unique tmp pw filename

    PROBLEM SUMMARY:
    The srvsuppwd process creates a temporary file that is
    not unique to the process. Since there may be multiple
    srvsuppwd processes running at the same time, this could
    result in an updsuppwd process on a node having to try
    to obtain the supman password file multiple times.

    PROBLEM CONCLUSION:
    srvsuppwd has been modified to create a temporary file that
    is unique to the process that creates it.
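
    The idea of a per-process temporary file can be sketched in
    Python with tempfile.mkstemp(), which creates a new file with a
    name that does not collide with existing files. The prefix and
    function name are hypothetical, and the APAR does not say how
    srvsuppwd builds its file name.

```python
import os
import tempfile


def write_private_copy(data, directory=None):
    """Write data to a newly created, uniquely named temporary file."""
    fd, path = tempfile.mkstemp(prefix="suppwd.", dir=directory)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return path


if __name__ == "__main__":
    first = write_private_copy(b"example\n")
    second = write_private_copy(b"example\n")
    print(first != second)
    os.remove(first)
    os.remove(second)
```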

    ------

    APAR: IY32800 COMPID: 5765B9501 REL: 320
    ABSTRACT: ASSERT: !HASTOKENS() (CACHEOBJFLAGS & 0X02)

    PROBLEM DESCRIPTION:
    assert: !has tokens() || (cacheobjflags & 0x02)

    PROBLEM SUMMARY:
    gpfs assert: !hasTokens() || (cacheObjFlags & 0x02)

    PROBLEM CONCLUSION:
    If the file system panics while tokens are being relinquished
    in releaseLastHold, do not assert that the
    tokencount is zero.

    ------

    APAR: IY32801 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSERT: !HASTOKENS() (CACHEOBJFLAGS & 0X02)

    PROBLEM DESCRIPTION:
    assert: !has tokens() || (cacheobjflags & 0x02)

    PROBLEM SUMMARY:
    gpfs assert: !hasTokens() || (cacheObjFlags & 0x02)

    PROBLEM CONCLUSION:
    If the file system panics while tokens are being relinquished
    in releaseLastHold, do not assert that the
    tokencount is zero.

    ------

    APAR: IY32803 COMPID: 5765B9501 REL: 340
    ABSTRACT: ASSERT: !HASTOKENS() (CACHEOBJFLAGS & 0X02)

    PROBLEM DESCRIPTION:
    assert: !has tokens() || (cacheobjflags & 0x02)

    PROBLEM SUMMARY:
    gpfs assert: !hasTokens() || (cacheObjFlags & 0x02)

    PROBLEM CONCLUSION:
    If the file system panics while tokens are being relinquished
    in releaseLastHold, do not assert that the
    tokencount is zero.

    ------

    APAR: IY32923 COMPID: 5765E6900 REL: 310
    ABSTRACT: LL'S WLM INTEGRATION. LL JOBS WONT RUN. MISLEADING MESSAGE:

    PROBLEM DESCRIPTION:
    LL used with WLM integration. Periodically some parallel jobs
    are in the queue in the "Idle" state although there are clearly
    resources available. "llq -s" run against the job shows "The
    maximum number of steps (27) that can be running when consumable
    resources are enforced has been exceeded".
    This is definitely NOT the case - there are NO jobs running at
    that time.
    note: "blocking" keyword is used in LL job command file !!

    PROBLEM SUMMARY:
    LoadL quit scheduling jobs because it ran out of WLM
    classes even though there were no jobs running on the
    machine.

    PROBLEM CONCLUSION:
    When loadl is using WLM and the blocking keyword is used in
    the user's job cmd file, LoadL is over incrementing the
    number of classes being used and then decrementing
    appropriately, leaving the difference in the class counter.

    ------

    APAR: IY32955 COMPID: 5724C3505 REL: 310
    ABSTRACT: DTMF DIGITS SOMETIMES DETECTED TWICE IN VOICEXML APPLICATION

    PROBLEM DESCRIPTION:
    DTMF digits are sometimes detected twice when entering data
    into a voiceXML application when reco is active also.

    PROBLEM SUMMARY:
    DTMF digits are sometimes detected twice when
    entering data into a voiceXML application when reco is
    active also.

    PROBLEM CONCLUSION:
    Harness changed to correct problem

    ------

    APAR: IY32966 COMPID: 5765E6900 REL: 310
    ABSTRACT: LOADLEVELER WRITES TO SOCKET HANG, POSSIBLY CAUSING CORE DUMP

    PROBLEM DESCRIPTION:
    If a LoadLeveler daemon is writing to a socket and the socket
    window fills up, the write can hang until the window drains. If
    the hang is long enough (e.g. if the client is suspended the
    window will never drain) and a LoadLeveler daemon is holding
    locks over the write, this can eventually cause the LoadLeveler
    daemon to core dump.

    PROBLEM SUMMARY:
    LoadLeveler daemons (such as the LoadL_negotiator) can
    hang if they are writing to a socket, and the process
    reading from the socket is suspended. If a LoadLeveler
    daemon hangs writing to a socket this could result in a
    core dump.

    PROBLEM CONCLUSION:
    The LoadLeveler library code has been changed to prevent
    socket writes from hanging when the socket window fills.
    LoadLeveler will set the socket in non-blocking mode
    and allow write operations to time-out.
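
    The behavior described above can be demonstrated with a Python
    sketch using a local socket pair. This is not LoadLeveler code;
    the function name and timeout values are assumptions. In Python,
    settimeout() places the socket in non-blocking mode internally
    and makes a stalled write raise a timeout instead of hanging.

```python
import socket


def send_with_timeout(sock, payload, timeout_seconds):
    """Return True if the payload was sent, False if the write timed out."""
    sock.settimeout(timeout_seconds)
    try:
        sock.sendall(payload)
        return True
    except socket.timeout:
        # The peer is not draining the socket; give up instead of hanging.
        return False


if __name__ == "__main__":
    writer, reader = socket.socketpair()
    try:
        print(send_with_timeout(writer, b"small message", 0.2))
        # The reader never reads, so a large write fills the socket buffer.
        print(send_with_timeout(writer, b"\0" * (16 * 1024 * 1024), 0.2))
    finally:
        writer.close()
        reader.close()
```

    A writer that holds locks can release them once the write
    reports a timeout, rather than waiting on a suspended reader.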

    ------

    APAR: IY32970 COMPID: 5765D5100 REL: 340
    ABSTRACT: SP SWITCH2 WORM RUNS SLOW UNDER HEAVY PAGING LOAD

    PROBLEM DESCRIPTION:
    The current SP Switch2 Worm uses popen() to invoke the sum
    command on the current compressed topology file. The result of
    the sum command is used to determine if an updated copy of the
    topology file needs to be sent to the node. Under heavy paging
    load, the time necessary for popen() to do a fork to invoke ksh,
    and then ksh to do a fork to invoke sum, can be excessive. If
    the Worm does not report back fast enough, when it receives
    a NODE_INIT packet, the primary will drop the node off of the
    switch.

    LOCAL FIX:
    None, really. The node is normally still okay. There shouldn't
    be any problem bringing it back on the switch, via Eunfence.
    But, the damage is already done.

    PROBLEM SUMMARY:
    Nodes can drop off the SP Switch 2 when they are under
    a heavy load (e.g. high levels of paging). The time
    taken to call the AIX sum command to calculate the
    switch topology file checksum may be too long under high
    load conditions, causing the primary node to drop the
    slow responding node off the switch.

    PROBLEM CONCLUSION:
    The fault_service_Worm_RTG_CS code has been changed to
    calculate the checksum of the switch topology file
    directly instead of calling the AIX sum command.
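
    Computing a checksum in-process, rather than forking a shell to
    run an external command, might look like the Python sketch
    below. The algorithm shown is a 16-bit rotating checksum in the
    style of the BSD sum command; it is used here as a stand-in, and
    no claim is made that it matches the output of the AIX sum
    command or the algorithm used by the Worm code.

```python
def update_checksum(checksum, chunk):
    """Fold a chunk of bytes into a 16-bit rotating checksum."""
    for byte in chunk:
        # Rotate right by one bit within 16 bits, then add the byte.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


def file_checksum(path, chunk_size=65536):
    """Checksum a file by reading it directly, with no child process."""
    checksum = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            checksum = update_checksum(checksum, chunk)
    return checksum


if __name__ == "__main__":
    print(update_checksum(0, b"ab"))
```

    The cost of this approach is a single pass over the file in the
    calling process, with no fork or exec on the time-critical path.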

    ------

    APAR: IY32979 COMPID: 5765D5100 REL: 340
    ABSTRACT: SWITCH STOPS RESPONDING TO 8K PACKETS

    PROBLEM DESCRIPTION:
    When there are bursts of in-bound IP traffic, if the number of
    outstanding pending IP datagrams exceeds the limit of the receive
    queue, IP datagrams will be dropped, and receive cluster buffers
    will be recovered and recycled back to the Corsair adapter.
    Cluster buffers are not recovered correctly, and there is an rpool
    cluster buffer leakage. When the leakage becomes severe, no
    more large IP datagrams can run through the switch network.

    PROBLEM SUMMARY:
    When IP traffic overflows the receive side, the incoming IP
    datagrams are dropped but the IP receive cluster buffers are
    not reclaimed. Over the long run, the system may run out
    of receive cluster buffers and can no longer ping large IP
    datagrams over the SP switch.

    PROBLEM CONCLUSION:
    Reclaim the IP receive cluster buffer appropriately.
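
    The leak and its fix can be modeled with a small Python sketch
    of a buffer pool feeding a bounded receive queue. The class,
    method names and sizes are hypothetical and greatly simplified;
    the point is only that the drop path must hand its buffers back
    to the pool.

```python
from collections import deque


class ReceivePath:
    """Toy model: a fixed buffer pool feeding a bounded receive queue."""

    def __init__(self, pool_size, queue_limit):
        self.free_buffers = deque(range(pool_size))
        self.receive_queue = deque()
        self.queue_limit = queue_limit

    def deliver(self, buffers_needed):
        """Take buffers for one datagram; return True if it was queued."""
        if len(self.free_buffers) < buffers_needed:
            return False
        buffers = [self.free_buffers.popleft() for _ in range(buffers_needed)]
        if len(self.receive_queue) >= self.queue_limit:
            # Dropped datagram: every buffer goes back to the pool.
            self.free_buffers.extend(buffers)
            return False
        self.receive_queue.append(buffers)
        return True

    def consume(self):
        """Hand the oldest datagram to the reader and recycle its buffers."""
        buffers = self.receive_queue.popleft()
        self.free_buffers.extend(buffers)


if __name__ == "__main__":
    path = ReceivePath(pool_size=8, queue_limit=2)
    results = [path.deliver(2) for _ in range(100)]
    print(results.count(True), len(path.free_buffers))
```

    Without the extend() call on the drop path, each dropped
    datagram would shrink the pool until no large datagram could be
    received.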

    ------

    APAR: IY32982 COMPID: 5765E7400 REL: 300
    ABSTRACT: SAVE METRICS DATA TO FILES

    PROBLEM DESCRIPTION:
    save metrics data to files in spreadsheet format and
    HTML format.

    ------

    APAR: IY32985 COMPID: 5765B9501 REL: 320
    ABSTRACT: GPFS FREED ACTIVE STORAGE ON WHEN HANDLING NFS DUPLICATE FCNTL

    PROBLEM DESCRIPTION:
    KERNEL PANIC IN KXCLEANUPACQUIRES
    GPFS freed active storage when handling retries of duplicate
    NFS fcntl requests.

    LOCAL FIX:
    There are no known workarounds for this problem.

    PROBLEM SUMMARY:
    NFS using GPFS caused kernel panic in kxcleanupacquires

    PROBLEM CONCLUSION:
    Clear the local sleepElement pointer returned from
    kxDupCheckAcquires before returning from gpfsFcntl
    (otherwise it will be freed when the routine exits).
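
    The pattern described above can be sketched as follows. This is
    not the GPFS source: the structure, the table, and the functions
    dup_check_acquire and handle_fcntl_retry are hypothetical
    stand-ins. The sketch only shows a routine that frees its local
    pointer on exit and therefore has to clear that pointer when the
    storage is still in use elsewhere.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 4

/* Hypothetical wait element that a shared table continues to own. */
struct sleep_element {
    int in_use;
};

static struct sleep_element *active_table[TABLE_SIZE];

/* Hypothetical duplicate check: returns the element already queued
 * for this slot, creating it on first use. */
struct sleep_element *dup_check_acquire(int slot)
{
    assert(slot >= 0 && slot < TABLE_SIZE);
    if (active_table[slot] == NULL) {
        active_table[slot] = calloc(1, sizeof *active_table[slot]);
        if (active_table[slot] != NULL)
            active_table[slot]->in_use = 1;
    }
    return active_table[slot];
}

/* Handler whose exit path frees whatever the local pointer holds. */
int handle_fcntl_retry(int slot)
{
    struct sleep_element *elem = dup_check_acquire(slot);
    int rc = (elem != NULL) ? 0 : -1;

    if (elem != NULL && elem->in_use) {
        /* The table still owns this element, so drop the local
         * reference before the cleanup below can free it. */
        elem = NULL;
    }

    free(elem); /* free(NULL) is a no-op */
    return rc;
}
```

    Without the assignment to NULL, the final free() would release
    storage that the table still references, and a later retry would
    touch freed memory.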

    ------

    APAR: IY33005 COMPID: 5765B9501 REL: 340
    ABSTRACT: GPFS FREED ACTIVE STORAGE ON WHEN HANDLING NFS DUPLICATE FCNTL

    PROBLEM DESCRIPTION:
    KERNEL PANIC IN KXCLEANUPACQUIRES
    GPFS freed active storage when handling NFS duplicate fcntl
    request retries.

    LOCAL FIX:
    There are no known workarounds for this problem.

    PROBLEM SUMMARY:
    NFS using GPFS caused kernel panic in kxcleanupacquires

    PROBLEM CONCLUSION:
    Clear the local sleepElement pointer returned from
    kxDupCheckAcquires before returning from gpfsFcntl
    (otherwise it will be freed when the routine exits).

    ------

    APAR: IY33010 COMPID: 5765D5100 REL: 330
    ABSTRACT: GPFS FREED ACTIVE STORAGE ON WHEN HANDLING NFS DUPLICATE FCNTL

    PROBLEM DESCRIPTION:
    KERNEL PANIC IN KXCLEANUPACQUIRES
    GPFS freed active storage when handling NFS duplicate fcntl
    request retries.

    LOCAL FIX:
    There are no known workarounds for this problem.

    PROBLEM SUMMARY:
    NFS using GPFS caused kernel panic in kxcleanupacquires

    PROBLEM CONCLUSION:
    Clear the local sleepElement pointer returned from
    kxDupCheckAcquires before returning from gpfsFcntl
    (otherwise it will be freed when the routine exits).

    ------

    APAR: IY33064 COMPID: 5765D5100 REL: 340
    ABSTRACT: SP SWITCH 2 WINDOW SUSPEND FAILURE AFTER LOADLEVELER HAS TRIED

    PROBLEM DESCRIPTION:
    On the SP Switch 2, a failure may occur suspending windows if
    a job fails to respond to the suspend request that is issued
    during switch recovery. This problem can happen if a job has
    a SIGKILL pending (having been killed by LoadLeveler) but has
    not yet fully processed the SIGKILL because a thread is in a
    system call with signals blocked. When the windows fail to
    suspend because of a non-responsive job, switch recovery will
    fail on the affected node, and switch responds will be lost on
    the affected switch plane.

    PROBLEM SUMMARY:
    Nodes can drop off the SP Switch 2 when the switch device
    driver fails to suspend jobs that are running. The
    adapter.log will show the following error:
    QUERY SUSPEND WINDOW_COMPLETION ioctl failed

    PROBLEM CONCLUSION:
    The device driver for the SP Switch 2 has been changed
    to allow suspend requests to be properly handled for
    jobs that are starting or stopping.

    ------

    APAR: IY33091 COMPID: 5697E3000 REL: 220
    ABSTRACT: JAVA ON-THE-SPOT PROBLEM

    PROBLEM DESCRIPTION:
    Japan Extension Kit V2.2 is upgraded.

    PROBLEM CONCLUSION:
    All problems we found were fixed.

    ------

    APAR: IY33093 COMPID: 5697E3000 REL: 230
    ABSTRACT: WNN7 UPDATE WITH NEW README

    PROBLEM DESCRIPTION:
    Japan Extension Kit V2.3 is upgraded

    PROBLEM CONCLUSION:
    All problems we found were fixed.

    ------

    APAR: IY33109 COMPID: 5765D9300 REL: 320
    ABSTRACT: POE MAY NOT HANDLE MULTIPLE GMON.OUT FILES ON A NODE CORRECTLY

    PROBLEM DESCRIPTION:
    When customers compile a program for profiling using the -pg
    option using POE, executing the program will sometimes not
    create all or full gmon.out files when there are more than
    a few tasks on a node. Sometimes the customer will see
    messages like the following :
    ATTENTION: 0031-662 Node 1 did not send PROFILE_DONE, sent
    msgtype 15
    ATTENTION: 0031-679 Profiling may not have completed on node 1

    PROBLEM SUMMARY:
    When customers compile a program for profiling using the -pg
    option using POE, executing the program will sometimes not
    create all or full gmon.out files when there are more than
    a few tasks on a node. Sometimes the customer will see
    messages like the following :
    ATTENTION: 0031-662 Node 1 did not send PROFILE_DONE, sent
    msgtype 15
    ATTENTION: 0031-679 Profiling may not have completed on node
    1

    PROBLEM CONCLUSION:
    POE will change the behavior when a SIGCHLD signal
    comes in during the processing of gmon.out files.

    ------

    APAR: IY33110 COMPID: 5765B9501 REL: 340
    ABSTRACT: NEGATIVE IN_DOUBT VALUES ARE NOT DOCUMENTED VERY WELL

    PROBLEM DESCRIPTION:
    The output of mmcheckquota might show the user negative
    values in the in_doubt column.
    The documentation does not mention negative values at all,
    so their occurrence is quite confusing for customers and
    should be documented.

    PROBLEM SUMMARY:
    negative in-doubt values for GPFS quotas were not
    documented

    PROBLEM CONCLUSION:
    In the Administration and Programming Reference in the
    chapter "Performing GPFS Administration tasks" under the
    heading "Checking quotas" and in the chapter
    "GPFS commands" under the description section of the
    mmcheckquota command, add the paragraph:
    When issuing the mmcheckquota command on a mounted file
    system, negative in-doubt values may be reported if the
    quota server processes a combination of up-to-date and
    back-level information. This is a transient situation
    and may be ignored.
    In the Administration and Programming Reference in the
    chapter "Performing GPFS Administration tasks" under the
    heading "Listing quotas" and in the chapter
    "GPFS commands" under the description section of the
    mmlsquota command, add the paragraph:
    When issuing the mmlsquota command on a mounted file
    system, negative in-doubt values may be reported if the
    quota server processes a combination of up-to-date and
    back-level information. This is a transient situation
    and may be ignored.
    In the Administration and Programming Reference in the
    chapter "Performing GPFS Administration tasks" under the
    heading "Creating file system quota reports" and in the
    chapter "GPFS commands" under the description section of
    the mmrepquota command, add the paragraph:
    When issuing the mmrepquota command on a mounted file
    system, negative in-doubt values may be reported if the
    quota server processes a combination of up-to-date and
    back-level information. This is a transient situation
    and may be ignored.
    V3 The man pages for mmcheckquota, mmlsquota and
    mmrepquota were updated with this information.

    ------

    APAR: IY33111 COMPID: 5765B9501 REL: 330
    ABSTRACT: MMFSCK DESTROYED THE ALLOC MAP FILES

    PROBLEM DESCRIPTION:
    mmfsck destroyed the alloc map files

    PROBLEM SUMMARY:
    mmfsck corrupted the alloc map file

    PROBLEM CONCLUSION:
    Fix alloc map segment compare. Map header pointer movement
    was wrong. Alloc segment data length computation was wrong.
    Number of segments to compare was wrong when number of
    segments didn't end on a block boundary. Comparison of
    lastblocksubblocks should allow 0 or 32 for full last block.
    LastSubblocks modulo must use maxSubblocksPerBlock not 32
    for directories and other things that have 'small'
    fullblocks.
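
    A minimal sketch of the last two points, using hypothetical
    helper names (last_subblocks, last_subblocks_match) rather than
    the mmfsck source. It assumes that a full last block may be
    recorded either as 0 or as the per-block maximum, and that the
    maximum is taken from the file's own block geometry instead of
    a hard-coded 32.

```c
#include <assert.h>
#include <stdbool.h>

/* Subblocks used in the final block; 0 means the final block is full. */
unsigned last_subblocks(unsigned long total_subblocks,
                        unsigned max_subblocks_per_block)
{
    assert(max_subblocks_per_block > 0);
    return (unsigned)(total_subblocks % max_subblocks_per_block);
}

/* Compare two recorded values, treating 0 and the per-block maximum
 * as the same "full last block" state. */
bool last_subblocks_match(unsigned a, unsigned b,
                          unsigned max_subblocks_per_block)
{
    assert(max_subblocks_per_block > 0);
    assert(a <= max_subblocks_per_block && b <= max_subblocks_per_block);
    return (a % max_subblocks_per_block) == (b % max_subblocks_per_block);
}
```

    For an object whose full block holds only 8 subblocks, 24
    subblocks end exactly on a block boundary, so
    last_subblocks(24, 8) is 0, whereas a fixed modulo of 32 would
    give 24.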

    ------

    APAR: IY33113 COMPID: 5765B9501 REL: 340
    ABSTRACT: MMFSCK DESTROYED THE ALLOC MAP FILES

    PROBLEM DESCRIPTION:
    mmfsck destroyed the alloc map files

    PROBLEM SUMMARY:
    mmfsck corrupted the alloc map file

    PROBLEM CONCLUSION:
    Fix alloc map segment compare. Map header pointer movement
    was wrong. Alloc segment data length computation was wrong.
    Number of segments to compare was wrong when number of
    segments didn't end on a block boundary. Comparison of
    lastblocksubblocks should allow 0 or 32 for full last block.
    LastSubblocks modulo must use maxSubblocksPerBlock not 32
    for directories and other things that have 'small'
    fullblocks.

    ------

    APAR: IY33116 COMPID: 5765D5100 REL: 340
    ABSTRACT: 0509-036 AND 0509-130 IN PMANRMD.LOG FILE WHEN LIBDCE.A IS ON

    PROBLEM DESCRIPTION:
    The pmanrmd.log file shows a repeatable pattern of the
    following entries.
    0509-036 Cannot load program spsec_ldmod because of the
              following errors
    0509-130 Symbol resolution failed for
              /usr/lpp/ssp/bin/spsec_ldmod because:
    0509-136 Symbol GSS_MECH_MIT_KRB5 is not exported from dependent
              module /usr/lib/libdce.a(shr.o).
    /usr/lpp/ssp/bin/SDRGetObjects: 0025-004 Item specified for
           query, insertion or deletion was not found.
    The problem is triggered by the pman daemon logic finding that
    the libdce.a file is on this system before checking to see if
    DCE authentication is in use. DCE is not in use on this system
    and the file remains for other reasons.
    Other apars with similar symptoms IY17070, IY21195, IY23021 and
    IY22203 either have the fix on or do not apply. There does not
    seem to be any impact to the system other than the error entries
    in the log.

    LOCAL FIX:
    The only impact is the messages and they can be ignored. If the
    libdce.a file is removed the messages stop.

    PROBLEM SUMMARY:
    dsrvtgt was calling spsec_start before it was determining if
    dce authentication was being used. If there is an older
    /usr/lib/libdce.a you will get the load errors seen in the
    /var/adm/SPlogs/pman/pmanrmd.log.
    The -m in the SDRGetObjects call has been changed to -q.

    PROBLEM CONCLUSION:
    dsrvtgt has been modified to determine if dce authentication
    is being used before calling spsec_start. If it determines
    dce authentication is not being used it just exits without
    calling spsec_start.
    The SDRGetObjects option list has been fixed.

    ------

    APAR: IY33197 COMPID: 5765E6900 REL: 310
    ABSTRACT: TIMING EXPOSURE IN LOADL_NEGOTIATOR CAUSES DEADLOCK

    PROBLEM DESCRIPTION:
    Timing exposures between a job completion and a Negotiator
    cycle can cause a deadlock condition in the Negotiator.

    PROBLEM SUMMARY:
    A timing exposure in the LoadLeveler Negotiator could make
    it think that there were jobs running, on a machine, when
    they had already finished. That wrong assumption could
    cause the Negotiator to try to get the same lock, for
    write, a second time. The Negotiator would hang, after
    that.

    PROBLEM CONCLUSION:
    The LoadLeveler Negotiator added a second verification
    that there were jobs running, on a machine, before trying
    to use certain data about the jobs on that machine.

    ------

    APAR: IY33208 COMPID: 5724C3505 REL: 310
    ABSTRACT: CACHEM IMPROVEMENTS FOR WEBSPHERE VOICE RESPONSE AIX

    PROBLEM DESCRIPTION:
    Cachem improvements to be made

    PROBLEM CONCLUSION:
    Changing cache expiry & file error logging

    ------

    APAR: IY33214 COMPID: 5765E6900 REL: 310
    ABSTRACT: A CANCELED INTERACTIVE JOB MIGHT CAUSE THE NEGOTIATOR TO QUIT

    PROBLEM DESCRIPTION:
    If an Interactive job is Ctrl-C'd at the same time that the
    Negotiator decides that it cannot schedule the job to run, the
    LoadL_negotiator daemon may fail to handle the job correctly
    and will intentionally terminate itself.

    PROBLEM SUMMARY:
    An interactive poe job can cause the LoadLeveler Negotiator
    to get confused, and decide to terminate itself, if the
    interactive job is terminated at just the right time during
    the negotiation cycle.

    PROBLEM CONCLUSION:
    The LoadLeveler Negotiator was modified to keep better
    track of interactive jobs, during the negotiation cycle.

    ------

    APAR: IY33227 COMPID: 5765C3403 REL: 430
    ABSTRACT: MISSING RESOURCE ERROR FOR TMSSAR WHEN CFGMGR IS RUN

    PROBLEM DESCRIPTION:
    The error message...
      MISSING RESOURCE
      801020
      The following resources were detected previously, but are
      not detected now:
      - tmssar Target Mode SSA Router
      These resources do not have Diagnostic support and cannot
      be resolved by the Missing Option Resolution Procedure.
    ...when cfgmgr is run.

    PROBLEM CONCLUSION:
    Edit tmssa.ssa.usr.add file to change chgstatus value to 1

    ------

    APAR: IY33251 COMPID: 5765D5100 REL: 340
    ABSTRACT: KLAPI SUPPORT FOR REGATTAH SP SWITCH 2 AND SP SWITCH 2 2-PLANE

    PROBLEM DESCRIPTION:
    KLAPI support for regattaH SP Switch 2 and sp switch 2 2-plane

    PROBLEM SUMMARY:
    klapi support for regattah sp switch and
    sp switch2 2-plane

    ------

    APAR: IY33256 COMPID: 5765C3403 REL: 430
    ABSTRACT: CRITICAL FIXES FOR AIX 4.3 AS OF JULY 2002

    PROBLEM DESCRIPTION:
    This APAR delivers security related and other critical fixes for
    AIX 4.3.3 made available after the 4330-10 Recommended
    Maintenance package. This package is delta to the latest
    Recommended Maintenance package and assumes that it is already
    installed. This package also assumes that the Critical Fixes
    for April 2002 (APAR IY30431) are installed.
    This APAR should be ordered with a service level of 433010.
    Security issues resolved:
      IY30357 SECURITY: Buffer overflow vulnerability in traceroute
      IY31997 SECURITY: Buffer overflow in errpt
    * = Potential for remote exploitation
    Other critical issues resolved:
      IY30626 System crash while deleting arp entry
      IY31050 Application fails to load with error 0509-036
      IY31059 su gives error and exits from a local user to other
              users
    This is a packaging APAR only. It will not appear in the list
    of APARs on the SMIT "Update Software by Fix (APAR)" panel, nor
    will the 'instfix' command show this APAR as being installed
    after the updates delivered by this package are installed.
    To install selected updates from this package, use the command:
      smit update_by_fix
    To install all updates from this package that apply to installed
    filesets on your system, use the command:
      smit update_all
    A system reboot is not required after installation for the fixes
    in this package to take effect.

    PROBLEM SUMMARY:
    Packaging only.

    ------

    APAR: IY33265 COMPID: 5765D5100 REL: 340
    ABSTRACT: SP-SWITCH2: ENTIRE SWITCH PLANE BROUGHT DOWN BECAUSE OF A SINGLE

    PROBLEM DESCRIPTION:
    SP-Switch2. Entire switch plane brought down by a single
    bad switch adapter. Problem is already described by defect
    84881.

    PROBLEM SUMMARY:
    When the adapter processor takes an exception
    it does not generate a PCI interrupt to inform the DD and it
    hangs itself afterwards. This causes the switch to back up
    causing the whole network to go down.

    PROBLEM CONCLUSION:
    Added a new function to the bootcode to
    invoke error recovery in the error exception path.

    ------

    APAR: IY33266 COMPID: 5765B8100 REL: 220
    ABSTRACT: DTMF DIGITS SOMETIMES DETECTED TWICE IN VOICEXML APPLICATION

    PROBLEM DESCRIPTION:
    DTMF digits are sometimes detected twice when entering data
    into a voiceXML application when reco is active also.

    PROBLEM SUMMARY:
    DTMF digits are sometimes detected twice when
    entering data into a voiceXML application when reco is
    active also.

    PROBLEM CONCLUSION:
    Harness changed to correct problem

    ------

    APAR: IY33350 COMPID: 5765B8100 REL: 220
    ABSTRACT: LANGUAGE CREATION TIMES OUT

    PROBLEM DESCRIPTION:
    Sometimes the creation of a new language will fail if one
    of the voice directories contains a large amount of data.

    PROBLEM SUMMARY:
    Sometimes the creation of a new language will
    fail if one of the voice directories contains a large
    amount of data.

    PROBLEM CONCLUSION:
    A longer timeout is now set so that the
    operations are much less likely to time out during a voice
    directory copy operation.

    ------

    APAR: IY33378 COMPID: 5724C3505 REL: 310
    ABSTRACT: IMPROVED ACCESSIBILITY FOR WEBSPHERE VOICE RESPONSE

    PROBLEM DESCRIPTION:
     Improved accessibility for WebSphere Voice Response

    PROBLEM SUMMARY:
     Improved accessibility for WebSphere Voice
    Response

    ------

    APAR: IY33420 COMPID: 5765D5100 REL: 340
    ABSTRACT: SCALING SUPPORT

    PROBLEM DESCRIPTION:
    The IBM eServer Cluster 1600 scaling limit has been increased
    to support clusters with as many as 32 IBM eServer pSeries
    690/670 servers with a maximum of 128 Logical Partitions
    (LPARs). A single cluster can now employ 1024 POWER4
    processors.

    PROBLEM CONCLUSION:
    The scaling limit for Hardware Management Console (HMC) is
    increased. The HMC can now control up to 8 IBM eServer
    pSeries 690/670 servers in a cluster with a maximum of
    32 LPARs per HMC.
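
    The limits quoted above fit together arithmetically. The helper
    below, hmcs_needed, is a hypothetical name used only to show the
    rounding; the figure of 32 processors per server is an
    assumption (the largest pSeries 690 configuration) that is
    consistent with the stated total of 1024 processors.

```c
#include <assert.h>

/* HMCs required when each HMC handles at most per_hmc items
 * (servers or LPARs); the division rounds up. */
unsigned hmcs_needed(unsigned items, unsigned per_hmc)
{
    assert(per_hmc > 0);
    return (items + per_hmc - 1) / per_hmc;
}
```

    With these numbers, hmcs_needed(32, 8) and hmcs_needed(128, 32)
    both evaluate to 4, and 32 servers with 32 processors each give
    1024 processors.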

    ------

    APAR: IY33423 COMPID: 5765B9500 REL: 150
    ABSTRACT: MMFSCK DESTROYED THE ALLOC MAP FILES

    PROBLEM DESCRIPTION:
    mmfsck destroyed the alloc map files

    PROBLEM CONCLUSION:
    Fix alloc map segment compare. Map header pointer movement
    was wrong. Alloc segment data length computation was wrong.
    Number of segments to compare was wrong when number of
    segments didn't end on a block boundary. Comparison of
    lastblocksubblocks should allow 0 or 32 for full last block.
    LastSubblocks modulo must use maxSubblocksPerBlock not 32
    for directories and other things that have 'small'
    fullblocks.

    ------

    APAR: IY33448 COMPID: 5765D5100 REL: 340
    ABSTRACT: LATEST PSSP 3.4.0 FIXES AS OF JULY 2002

    PROBLEM DESCRIPTION:
    This is the latest PSSP ptf as of June 2002.
    Order this apar to get all of the ptfs as of May 2002.

    PROBLEM SUMMARY:
    This is a packaging apar for PSSP 3.4.0 fixes
    as of July 2002.

    ------

    APAR: PQ62946 COMPID: 5765C4200 REL: 330
    ABSTRACT: ESSL _COPY PERFORMANCE FOR N NEAR POWER OF 2 DEGRADED

    PROBLEM DESCRIPTION:
    Customer reported that performance of SCOPY was poor for
    N values which were powers of 2.

    LOCAL FIX:
    Use N values that are not near powers of 2 if possible.

    PROBLEM SUMMARY:
    Performance of ESSL _COPY routines when N
    is near a multiple of 128 is poor.

    PROBLEM CONCLUSION:
    ESSL COPY routines were modified to use
    fewer streams or make adjustments to the lengths of the
    problem in order to avoid rolling the L1 and L2 caches on the
    POWER4.

    TEMPORARY FIX:
    Avoid N values near multiples of 128.
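
    A caller that wants to apply this workaround could test its
    vector length before calling the library. The sketch below is
    an illustration only: the APAR does not define how close "near"
    is, so the window parameter and the function name
    near_multiple_of_128 are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STRIDE 128 /* the multiple named in the APAR */

/* True when n is within `window` elements of a nonzero multiple of
 * 128. `window` is a hypothetical tuning value. */
bool near_multiple_of_128(size_t n, size_t window)
{
    assert(window < STRIDE / 2);
    size_t below = n % STRIDE;     /* distance above the previous multiple */
    size_t above = STRIDE - below; /* distance up to the next multiple */
    size_t dist = (below < above) ? below : above;
    return n + window >= STRIDE && dist <= window;
}
```

    With a window of 4, lengths such as 126, 128, and 1027 are
    flagged, while 200 is not.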

    ------

    APAR: PQ63390 COMPID: 5765C4200 REL: 330
    ABSTRACT: IMPROVE PERFORMANCE ON POWER4

    PROBLEM DESCRIPTION:
    Performance improvements for p690 needed.

    PROBLEM SUMMARY:
    Performance improvements were needed for
    multiple routines for the p690.

    PROBLEM CONCLUSION:
    Multiple routines in the L1 BLAS, L3 BLAS,
    Eigensystems and Linear Algebraic Equations were improved.

    ------

    APAR: PQ63391 COMPID: 5765C4200 REL: 330
    ABSTRACT: ZSCAL PREFETCHING PAST END OF ARRAY ON POWER4

    PROBLEM DESCRIPTION:
    On the Power4, ZSCAL can prefetch past the end of an array for
    N >= 30.

    LOCAL FIX:
    Allocate the vector to be longer than necessary to avoid
    possible seg fault.

    PROBLEM SUMMARY:
    On Power4, for N >= 30, ZSCAL may prefetch
    past the end of the array.

    PROBLEM CONCLUSION:
    ZSCAL was corrected to prevent reading past
    the end of the array.
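
    The local fix above (allocating the vector longer than
    necessary) might look like the following for a caller written
    in C. The amount of slack is not given in the APAR, so PAD_ELEMS
    is a hypothetical value, and alloc_padded_zvector is an
    illustrative name rather than an ESSL routine.

```c
#include <assert.h>
#include <complex.h>
#include <stdint.h>
#include <stdlib.h>

#define PAD_ELEMS 16 /* hypothetical slack; the APAR gives no figure */

/* Allocate n complex doubles plus zeroed slack past the logical end,
 * so that a read slightly beyond element n-1 stays inside the
 * allocation. Returns NULL on failure; release with free(). */
double complex *alloc_padded_zvector(size_t n)
{
    assert(n <= SIZE_MAX / sizeof(double complex) - PAD_ELEMS);
    return calloc(n + PAD_ELEMS, sizeof(double complex));
}
```

    The caller still passes the original N to the library; only the
    allocation is larger.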

    ------

    APAR: PQ63394 COMPID: 5765C4200 REL: 330
    ABSTRACT: ERRSET CALLS INCORRECT INTERNALLY FOR MULTIPLE LINEAR ALGEBRAIC

    PROBLEM DESCRIPTION:
    The IUSADR arguments for internal calls to ERRSET in some
    subroutines were incorrect in 64-bit mode.

    PROBLEM SUMMARY:
    The following ESSL routines had the IUSADR
    argument in an internal call to ERRSET incorrectly typed as
    a 32-bit integer instead of a 64-bit integer in 64-bit mode:
    _POTRF, _POICD, _POTRI, _TRTRI, _TPTRI, _GETRI

    PROBLEM CONCLUSION:
    Internal calls to ERRSET were corrected.
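
    The effect of typing an address-sized argument as a 32-bit
    integer can be illustrated in C. This is a generic sketch of
    the truncation, not ESSL code; both function names are
    hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) > sizeof(uint32_t),
              "the example relies on fixed-width integer types");

/* Value seen by a callee when the argument slot is only 32 bits
 * wide: the upper 32 bits are discarded. */
uint64_t through_32bit_slot(uint64_t addr)
{
    uint32_t narrow = (uint32_t)addr;
    return (uint64_t)narrow;
}

/* Value seen when the slot is as wide as the address. */
uint64_t through_64bit_slot(uint64_t addr)
{
    uint64_t wide = addr;
    return wide;
}
```

    An address such as 0x0000000110000040 survives the 64-bit slot
    unchanged but comes back as 0x10000040 through the 32-bit one,
    which then refers to a different location.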

    ------

    APAR: PQ63401 COMPID: 5765C4200 REL: 330
    ABSTRACT: IMPROVE FFT PERFORMANCE FOR SMALL LENGTHS

    PROBLEM DESCRIPTION:
    SciComp report indicated that small length FFT performance for
    ESSL was not as good as some public domain packages.

    PROBLEM SUMMARY:
    Performance for small length FFTs was not
    as good as some public domain packages.

    PROBLEM CONCLUSION:
    ESSL 1-D FFTs were improved for lengths
    which are powers of 2 and less than 64.

    ------

    APAR: PQ63403 COMPID: 5765C4200 REL: 330
    ABSTRACT: IMPROVE DAXPY FULL CACHE PERFORMANCE ON POWER4

    PROBLEM DESCRIPTION:
    Through the Scholars program, it was reported that DAXPY
    performance on Power4 for very small sizes was less than that
    of NETLIB, inlined code and AIX libblas.a. For larger sizes
    which are still in the cache, around N=1024, it was reported
    that ESSL DAXPY performance was less than that of AIX libblas.a.

    PROBLEM SUMMARY:
    When the data is in the cache, DAXPY was not
    performing well on POWER4. For very small sizes, the loop
    was not a good choice and an internal call to a routine which
    determines the machine type was unnecessary overhead. For
    larger sizes(around 1K), the loop was still not scheduled well.

    PROBLEM CONCLUSION:
    Prefetch can cause jumpy behavior from size
    to size. However, when the data is not in the cache, this
    technique provides a significant boost. In the interest of not
    degrading the performance for those customers whose data is not
    in the cache, this technique was retained.

    ------

    APAR: PQ63407 COMPID: 5765C4200 REL: 330
    ABSTRACT: L1 BLAS PERFORMANCE FOR N NEAR MULTIPLES OF 128 DEGRADED ON

    PROBLEM DESCRIPTION:
    When N is near a multiple of 128, the performance of some
    ESSL L1 BLAS routines is poor.

    LOCAL FIX:
    Avoid N lengths near multiples of 128

    PROBLEM SUMMARY:
    Performance of some L1 BLAS routines for N
    near a multiple of 128 was not good on Power4 as the
    technique for hardware prefetching can cause the L1 cache
    to be rolled more frequently.

    PROBLEM CONCLUSION:
    Multiple L1 BLAS codes were updated to use
    fewer streams or to adjust the problem size to avoid the
    cache problem.

    TEMPORARY FIX:
    Avoid lengths near multiples of 128.

    ------