From: AIX Service Mail Server (aixservaustin.ibm.com)
Date: Tue Jun 12 2001 - 02:21:11 CDT

    APAR: IY15379 COMPID: 5765D5100 REL: 320
    ABSTRACT: DOUBLE FREE() IN FAULT_SERVICE_WORM

    PROBLEM DESCRIPTION:
     Problem:
     While running the fault_service_Worm with DEBUG_MALLOC
     set, the worm cored due to freeing the same space twice.

    PROBLEM SUMMARY:
    In the check_compat routine, a pointer was used to traverse
    a list of integers. The pointer was incremented as each
    integer was inspected. After the traversal the pointer was
    passed to free(). However at this point the pointer had been
    incremented and was not pointing to the head of the
    allocated storage anymore.

    PROBLEM CONCLUSION:
    The solution to the coding error is to use a separate
    pointer to traverse the list, keeping the original pointer
    value to use in the free() call.

    ------

    APAR: IY15774 COMPID: 5765D5100 REL: 320
    ABSTRACT: SMITTY AND PERSPECTIVES DO NOT SUPPORT PARTITION SIZE MORE THAN

    PROBLEM DESCRIPTION:
    smitty and Perspectives do not support partition sizes larger
    than 256 MB when a user tries to create a VSD.

    PROBLEM SUMMARY:
    The createvsd command supports physical partition sizes of
    512 and 1024 MB, but the smit interface was never updated
    to accept these values. Perspectives, which invokes smit
    config_data, has the same problem.

    PROBLEM CONCLUSION:
    The smit panel "Create a Virtual Shared Disk" has been
    expanded to allow physical partition sizes of 512 and 1024
    megabytes.

    ------

    APAR: IY16126 COMPID: 5765D5101 REL: 120
    ABSTRACT: ADAPTER IS BEING DISABLED FASTER THAN IT SHOULD WHEN A NEW

    PROBLEM DESCRIPTION:
    Adapter being disabled because of issues with adapter_may_
    _down field not being reset when a new group is committed. Also
    possible where the incoming_bcasts_cnt and incoming_unicasts_cnt
    fields are reset.

    PROBLEM SUMMARY:
    Topology Services has some logic that considers
    as down an adapter that is apparently only able
    to receive broadcast messages. Receiving only
    broadcast messages may point to a problem with
    either the network or with routing at the local (or
    maybe remote) adapter. Once Topology Services
    detects the adapter is unable to join any
    adapter membership group with its peers, and
    is only receiving broadcast messages, then it
    notifies other subsystems (like Group Services
    or indirectly HACMP) that the local adapter is
    down.
    A problem in the logic above sometimes causes
    an adapter that is receiving only broadcast
    messages to be flagged as down too soon --
    before Topology Services can be really sure
    that only broadcast messages are being received.

    PROBLEM CONCLUSION:
    The logic used to flag as down an adapter that
    is receiving only broadcast messages has been
    modified. With the change, Topology Services will
    ensure that the adapter is indeed only receiving
    broadcast messages for an appropriate time period
    before notifying other subsystems that the
    adapter is down.

    ------

    APAR: IY16244 COMPID: 5765D2800 REL: 430
    ABSTRACT: CSPOC ADD USER VERY SLOW

    PROBLEM DESCRIPTION:
    C-SPOC add user is very slow: with many users in /etc/passwd
    (>7000), "lsuser -a id ALL" uses 2 minutes of CPU time
    on an RS/6000 model F50.

    PROBLEM SUMMARY:
    If the system has >5000 users, CSPOC mkuser can take two minutes.

    PROBLEM CONCLUSION:
    Replace lsuser with awk on /etc/passwd.
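
    The replacement described above can be sketched in shell. This is a
    minimal illustration, not the shipped C-SPOC code: the function name
    list_user_ids is hypothetical, and the "name id=N" output shape is an
    assumption about what "lsuser -a id ALL" prints.

```shell
#!/bin/sh
# Hypothetical helper: print "name id=UID" for each local account by
# reading the passwd file directly instead of calling lsuser.
# Usage: list_user_ids [passwd_file]   (defaults to /etc/passwd)
list_user_ids() {
    # Field 1 is the login name and field 3 the numeric user ID.
    # Skip blank names and NIS-style "+"/"-" entries.
    awk -F: '$1 != "" && $1 !~ /^[+-]/ && NF >= 3 { print $1 " id=" $3 }' \
        "${1:-/etc/passwd}"
}
```

    A single pass over the file keeps the cost roughly linear in the
    number of accounts, which is the point of the change.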

    ------

    APAR: IY16308 COMPID: 5765D5100 REL: 320
    ABSTRACT: SETUP_SERVER FAILS WITH ERROR 0016-014 NOT GETTING RELIABLE

    PROBLEM DESCRIPTION:
    setup_server fails with error 0016-014 not getting a reliable
    hostname. When installing any SP system and adding nodes,
    frames or attached servers there are many situations where
    SDR-config has run and created the node object but the customer
    has not entered all the data for the node object. In this case
    fields like reliable_hostname are set to "". When setup_server
    runs it fails with the 0016-014 error and sets the fatal
    setup_server processing incomplete (rc=2) message. setup_server
    should just continue and set an informational rc=1 message.

    PROBLEM SUMMARY:
    When setup_server is executed with incomplete data
    information located in the node object
    (ie: reliable_hostname="") the following error message
    is displayed:
    setup_server: 0016-014 Problem found while querying SDR
    for reliable hostnames. SDR Return Code 2.
    setup_server: Processing incomplete (rc= 2).
    After the message is redirected to STDOUT the command
    immediately exits.

    PROBLEM CONCLUSION:
    setup_server has been modified to no longer terminate when
    it encounters a node with incomplete data. For nodes that
    do not have a reliable hostname entered, the following
    messages will be issued:
    setup_server: There is no reliable hostname
                    assigned to node <node_number>
    setup_server: No NIM resources will be allocated for
                    node <node_number>
    setup_server will then continue processing and exit with
    a return code of 1.

    ------

    APAR: IY16502 COMPID: 5765D5100 REL: 320
    ABSTRACT: HMADM 0026-614 ERROR IS SOMETIMES REPORTED ON THE CONSOLE.

    PROBLEM DESCRIPTION:
    hmadm 0026-614 error is sometimes reported on the console.

    PROBLEM SUMMARY:
    As part of the function of cleanup.logs.ws, the hardware
    monitor daemon log is changed. The current log is closed
    and a new log is opened. In PSSP 3.2, the following
    error message was being issued to the console:
    Cannot change hmlogfile for cleanup, hmadm error:
    hmadm: 0026-614 You do not have authorization to access
          the Hardware Monitor.
    Prior to the call to hmadm clog, ksrvtgt was being issued
    to obtain a hardmon ticket, but it needed to be called
    for root SPbgAdm.

    PROBLEM CONCLUSION:
    cleanup.logs.ws was modified to issue ksrvtgt for
    root SPbgAdm prior to invoking hmadm clog. This allows
    hmadm to complete successfully.

    ------

    APAR: IY16688 COMPID: 5765D5100 REL: 320
    ABSTRACT: SDR_CONFIG - INCORRECT SETTING OF ISPARTITIONABLE ATTRIBUTE

    PROBLEM DESCRIPTION:
    IsPartitionable attribute of SP class should only be set to false
    after all necessary SDR updates are successfully completed.
    Otherwise, subsequent invocations of SDR_config will not
    complete setup for the SP Switch 2.
    SDR_config -u should also update the Switch_adapter_port class.
    If the SDR updates fail when invoked from hmreinit, verify
    that this is made clear to the user.

    LOCAL FIX:
    The IsPartitionable attribute in the SP class needed to be reset
    to "true" before rerunning SDR_config.

    PROBLEM SUMMARY:
    Three problems:
    1. Execution of SDR_config doesn't retry SDR writes if an
      error 80 (Fail to Connect) is encountered.
    2. If an SP Switch2 is present, the SPS2_CleanUp subroutine
      is sometimes skipped, leaving defunct values in the SDR.
    3. The "SDR_config -u" option (update) should not be
      permitted when a SP Switch2 is present. It can cause
      major SDR corruption.

    PROBLEM CONCLUSION:
    When SDR_config writes to the SDR, if an error 80 status
    is encountered, up to five retries will be attempted, with
    a one-second wait in between.
    If an SP Switch2 is present, SPS2_CleanUp will always be
    run.
    The update option of SDR_config will no longer be allowed
    if the SP Switch2 is present. This change is reflected in
    the man page for the SDR_config command, and will be shown
    in the next edition of the PSSP Commands Reference manual.
    A new diagnostic message, 0016-742, will be issued if -u is
    attempted and the task will be aborted.
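
    The retry behaviour described above (up to five retries, one second
    apart, only for error 80) can be sketched as a shell loop. This is an
    illustration under assumptions, not the SDR_config code: retry_on_rc
    and write_to_sdr are hypothetical names, and the sketch assumes the
    error surfaces as the command's exit status.

```shell
#!/bin/sh
# Hypothetical helper: run a command and, while it exits with one specific
# status, retry it up to max_retries more times, sleeping between attempts.
# Usage: retry_on_rc max_retries wait_seconds retry_rc command [args...]
retry_on_rc() {
    max_retries=$1
    wait_seconds=$2
    retry_rc=$3
    shift 3

    retries_done=0
    while :; do
        rc=0
        "$@" || rc=$?
        # Success, or a failure other than the retryable one: stop here.
        [ "$rc" -ne "$retry_rc" ] && return "$rc"
        # Retry budget exhausted: report the last status.
        [ "$retries_done" -ge "$max_retries" ] && return "$rc"
        retries_done=$((retries_done + 1))
        sleep "$wait_seconds"
    done
}

# Example (write_to_sdr is a placeholder, not a real PSSP command):
#   retry_on_rc 5 1 80 write_to_sdr
```

    With these arguments the command runs at most six times: the first
    attempt plus five retries.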

    ------

    APAR: IY16803 COMPID: 5765D5100 REL: 320
    ABSTRACT: CSTARTUP -S FLAG DOES NOT IGNORE EXISTING SEQUENCING VIOLATIONS.

    PROBLEM DESCRIPTION:
    The cstartup -S flag is supposed to ignore existing
    sequencing violations; some trailing target_nodes are
    already up and running. The target_nodes that are already
    up are left alone. The other target_nodes are started in
    sequence. Running the command returns error 0035-161.

    PROBLEM SUMMARY:
    When a /etc/cstartSeq file is being
    used, and the option -S is used, cstartup
    should not be checking for sequence
    violations.

    PROBLEM CONCLUSION:
    Under the section where we're checking for
    whether a node is a target node...code was
    added to check if one of the command line options
    was -S. If so, do not check for sequence
    violations.
    Also, code was added under the section for a
    target node so that the target node is reset only
    if the -z or -Z option was issued. Previously,
    the target node was always reset, whether
    or not -z or -Z was used.

    ------

    APAR: IY16864 COMPID: 5765B8100 REL: 220
    ABSTRACT: DTRA CANNOT LOAD VOCABULARIES

    PROBLEM DESCRIPTION:
    The fileset "devices.artic960add" Version 1.4.2 shipped in
    fix level 3004 does not work correctly with the DTRA adapter
    in some system units.

    LOCAL FIX:
    This can be temporarily worked around by using
    /usr/lib/drivers/s960add from version 1.4.1 of
    devices.artic960add.

    PROBLEM SUMMARY:
    DTRA CANNOT LOAD VOCABULARIES

    PROBLEM CONCLUSION:
    Microcode device drivers updated to
    correct problem

    ------

    APAR: IY16931 COMPID: 5765D5100 REL: 320
    ABSTRACT: PSSP_SCRIPT FAILURE IF NIM MASTER OR BIS ADAPTER NAME ENDS WITH

    PROBLEM DESCRIPTION:
    pssp_script will incorrectly parse a hostname ending with "is"
    because the following lines of code:
    pssp_script:581: nim_master_ip=$ nim_master_ip *is
    pssp_script:765: bis_adap_addr=$ bis_adap_addr *is
    pssp_script:790: nim_master_ip=$ temp *is
    are missing a space character after "*" and before "is", so
    when the host command responds "artemis is 1.1.1.1" the resulting
    nim_master_ip value will be "is" instead of "1.1.1.1".

    LOCAL FIX:
    Use a hostname not ending with "is"
    or change the pssp_script from :
    $(nim_master_ip *is) to :
    $(nim_master_ip *is)

    PROBLEM SUMMARY:
    When the hostname of the B/I Server ends in "is",
    pssp_script fails to determine its ip address correctly.
    As a result the node will be unable to ftp files from
    the B/I Server during a customization, which will result
    in the customization hanging with an led of a03. The
    log from the customization will show an error on the
    tftpfile of the install_info file. There will be a message
    from tftp that the host is unknown.

    PROBLEM CONCLUSION:
    pssp_script was modified to handle the parsing of the
    output of the host command, when the hostname ends
    in "is". This allows nodes to customize when their
    B/I Server hostname ends in "is".
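
    The parsing pitfall can be reproduced with plain shell parameter
    expansion. The exact expressions in pssp_script are not legible in
    this notice, so both functions below are hypothetical: the "naive"
    one is an assumption chosen because it reproduces the reported
    symptom, and the second shows the effect of requiring spaces around
    "is". The input shape "<hostname> is <address>" is taken from the
    example quoted above.

```shell
#!/bin/sh
# Assumed input shape: "<hostname> is <ip-address>", as quoted in the notice.

# Naive variant (an assumption that reproduces the reported symptom):
# strip through the first "is", then take the first remaining word.
naive_ip_from_host() {
    set -- ${1#*is}      # unquoted on purpose so the remainder is word-split
    printf '%s\n' "$1"
}

# Space-delimited variant: strip through the first " is " separator.
ip_from_host() {
    printf '%s\n' "${1#* is }"
}
```

    For "node1 is 1.1.1.1" both variants yield the address. For
    "artemis is 1.1.1.1" the naive variant stops at the "is" inside the
    hostname and returns "is", while the space-delimited variant still
    returns "1.1.1.1".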

    ------

    APAR: IY16994 COMPID: 5765D5100 REL: 320
    ABSTRACT: VSD/RVSD ENHANCEMENTS

    PROBLEM DESCRIPTION:
    VSD/RVSD Enhancements

    PROBLEM SUMMARY:
    Add the README file for VSD support of the Subsystem
    Device Driver (SDD). SDD is a device driver shipped
    with the Enterprise Storage Server (ESS).

    PROBLEM CONCLUSION:
    Add the README file for VSD support of the Subsystem
    Device Driver (SDD). SDD is a device driver shipped
    with the Enterprise Storage Server (ESS).

    ------

    APAR: IY16995 COMPID: 5765D5100 REL: 311
    ABSTRACT: VSD/RVSD ENHANCEMENTS

    PROBLEM DESCRIPTION:
    vsd/rvsd enhancements

    PROBLEM SUMMARY:
    Add the README file for VSD support of the Subsystem
    Device Driver (SDD). SDD is a device driver shipped
    with the Enterprise Storage Server (ESS).

    PROBLEM CONCLUSION:
    Add the README file for VSD support of the Subsystem
    Device Driver (SDD). SDD is a device driver shipped
    with the Enterprise Storage Server (ESS).

    ------

    APAR: IY17070 COMPID: 5765D5100 REL: 320
    ABSTRACT: NON DCE SITUATION CAUSES DCE AUTH TEST TO INTERFERE W/ SDR CMMDS

    PROBLEM DESCRIPTION:
    If a customer has inadvertently left a copy of /usr/lib/libdce.a
    on his system, but elects NOT to use DCE as an authentication
    method (i.e., splstdata -p shows "auth_methods k4:std"), it
    has been noted the system attempts to use a symbol exported from
    libdce.a. For example, the use of SDRChangeAttrValues results
    in the following error messages:
    exec(): 0509-036 Cannot load program spsec_ldmod because
     of the following errors:
         0509-130 Symbol resolution failed for
                  /usr/lpp/ssp/bin/spsec_ldmod because:
         0509-136 Symbol GSS_MECH_MIT_KRB5 (number 7) is
                  not exported from dependent module
                  /usr/lib/libdce.a(shr.o).
    Removal of the file /usr/lib/libdce.a is a workaround, causing
    the messages to go away.

    LOCAL FIX:
    Workaround is to remove /usr/lib/libdce.a, because it is not
    used in a non-DCE authentication (Kerberos 5) environment.
    Problem can be fixed by changing the logic in an authentication
    test in the module that initiates the SDR session.

    PROBLEM SUMMARY:
    Routines that were trying to write to the SDR in a non-DCE
    environment would fail if a level of DCE prior to 3.1
    existed on the system. The following messages would be
    issued:
    exec(): 0509-036 Cannot load program spsec_ldmod because
                    of the following errors:
           0509-130 Symbol resolution failed for
                    /usr/lpp/ssp/bin/spsec_ldmod because:
           0509-136 Symbol GSS_MECH_MIT_KRB5 (number 7) is
                    not exported from dependent module
                    /usr/lib/libdce.a(shr.o).
           0509-192 Examine .loader section symbols with the
                    'dump -Tv' command.
    The routines were trying to access a variable that does
    not exist in /usr/lib/libdce.a in the earlier version
    of DCE.

    PROBLEM CONCLUSION:
    Modified the routine that grants write access to the SDR to
    first determine if DCE is being used as an authentication
    method, prior to accessing the DCE shared library.

    ------

    APAR: IY17126 COMPID: 5765D5100 REL: 320
    ABSTRACT: VSD IN SUSPENDED STATE, VOL GROUPS ARE VARIED OFF. IN ERRPT YOU

    PROBLEM DESCRIPTION:
    GPFS file system is unmounted, VSD is in suspended state,
    volume groups are varied off. Problem is because customer
    had a GPFS file system in the beginning of their PATH statement
    in /etc/environment. This causes the VSD scripts to call out to
    ksh and use ksh's PATH list which will hang the VSD scripts
    because the GPFS path is unmounted and unavailable. If
    VSD scripts used the full path name then the VSD Internal Error
    of klapi timeout would not occur because ksh's PATH list would
    not be used. VSD Scripts should contain full path names.

    LOCAL FIX:
    Fix /etc/environment to contain PATHs that are always
    available or put the user path at the end of the PATH
    statement.

    PROBLEM SUMMARY:
    If a user's directory name appears in the $PATH environment
    variable ahead of the system directories /usr/bin, /usr/sbin
    and /etc, RVSD can be impacted. If, for example, the user's
    directory wasn't mounted, and RVSD tries to execute a
    command that must be resolved from $PATH, it will hang
    waiting for the mount.

    PROBLEM CONCLUSION:
    Five RVSD command modules which rely on $PATH to resolve
    system calls have been changed so that /usr/bin, /usr/sbin
    and /etc are scanned first.
    This will make them more robust, and not be affected by
    local modifications.
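
    One way a script can protect itself in this manner is to rebuild its
    search path before running anything. The sketch below is illustrative
    only: system_first_path is a hypothetical helper, not one of the five
    RVSD modules, and the directory order is the one named above.

```shell
#!/bin/sh
# Hypothetical helper: return a search path with the system directories
# named in the notice placed ahead of whatever the caller inherited.
system_first_path() {
    printf '%s\n' "/usr/bin:/usr/sbin:/etc:$1"
}

# Typical use near the top of a script:
#   PATH=$(system_first_path "$PATH"); export PATH
```

    Commands found in the system directories are then resolved before the
    shell ever touches a user directory that may sit on an unmounted
    file system.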

    ------

    APAR: IY17129 COMPID: 5765D5100 REL: 320
    ABSTRACT: NO DEFAULT VALUES WHEN SETTING ENT SPEED AND DUPLEX

    PROBLEM DESCRIPTION:
    Step 30 of Migrating to the latest level of PSSP in the
    Installation and Migration guide states that default values will
    be assigned for ethernet speed and duplex. In practise no
    default values are assigned, they are left as null.

    LOCAL FIX:
    Values are entered manually

    PROBLEM SUMMARY:
    After a migration from PSSP-2.4 to PSSP-3.2 the
    new ethernet adapter values, enet_rate and duplex,
    residing in the SDR are not given valid default
    definitions. The null SDR attributes lead to the
    following nodecond failure found in the nodecond
    log:
    Nodecond Status: network type not selected
    return code -1 from boot_network
    Nodecond Status: Finished

    PROBLEM CONCLUSION:
    The new code within /usr/lpp/ssp/install/bin/SDR_init
    does a check for the ethernet type and any blank
    values currently defined in the SDR for the enet_rate
    and duplex attributes. If any blank values are found
    the code will now assign 10/half for those 'bnc' and
    'dix' adapters and 100/half for those 'tp' adapter(s)
    affected.
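
    The default assignment can be summarised as a small lookup. This is a
    sketch of the rule as stated, not the SDR_init code; the function
    name default_enet_settings and its "rate duplex" output format are
    hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the default "rate duplex" pair for an
# ethernet type, following the rule stated in the notice.
# Usage: default_enet_settings bnc|dix|tp
default_enet_settings() {
    case "$1" in
        bnc|dix) echo "10 half" ;;
        tp)      echo "100 half" ;;
        *)       echo "unknown ethernet type: $1" >&2; return 1 ;;
    esac
}
```

    A caller would apply the result only where the enet_rate or duplex
    value is blank, leaving values that were already set untouched.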

    ------

    APAR: IY17143 COMPID: 5765D5100 REL: 320
    ABSTRACT: PSSP_SCRIPT NEEDS TO INSTALL DEVICES.CHRP.BASE.RTE IS

    PROBLEM DESCRIPTION:
    The PSSP 3.x pssp_script fails to install (migrate to)
    PSSP 2.4 on MCA nodes because devices.chrp.base.rte
    gets installed only on PCI nodes. But this fileset is
    a prereq of PSSP 2.4, so it is needed on MCA nodes too.
    The part of pssp_script to install devices.chrp.base.rte
    currently is (I have to wrap lines to fit into SSF)
    # Defect 46958: remove -c (commit) flag for AIX filesets
    oslvl=$($oslevel)
    if $oslvl = $os415 && $oslvl = $os414 ; then
    if -z $($lslpp -qh devices.chrp.base.rte 2>/dev/null)
       && $platform = "chrp" ; then #-
       $installp -abgXd/mnt devices.chrp.base.rte
    ==> only on oslevel >=4.2.0.0 and on platform==chrp
        devices.chrp.base.rte will be installed
    This needs to be changed to
    if $oslvl = $os415 && $oslvl = $os414 ; then
    if -z $($lslpp -qh devices.chrp.base.rte 2>/dev/null)
       && $platform = "chrp" || "$code_version" = "PSSP-2.4"
       then #-
       $installp -abgXd/mnt devices.chrp.base.rte
    ==> devices.chrp.base.rte will be installed if
        oslevel>=4.2.0.0 and (platform==chrp or
        code_version==PSSP-2.4)

    PROBLEM SUMMARY:
    pssp_script only installs the fileset devices.chrp.base.rte
    on chrp nodes. However the ssp.basic fileset in PSSP 2.4
    requires devices.chrp.base.rte, regardless of the type of
    node. pssp_script needs to be modified to install the
    fileset devices.chrp.base.rte when either the node
    platform is chrp, or the PSSP level of the node is 2.4.

    PROBLEM CONCLUSION:
    pssp_script has been modified to install the
    fileset devices.chrp.base.rte when either the node
    platform is chrp, or the PSSP level of the node is 2.4.
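
    The decision described above can be written as a small predicate.
    This is a sketch under assumptions, not the pssp_script code:
    should_install_chrp_base is a hypothetical name, the caller is
    assumed to have already determined whether the fileset is present,
    and the operating system level check quoted in the description is
    left out.

```shell
#!/bin/sh
# Hypothetical predicate: should devices.chrp.base.rte be installed?
# Usage: should_install_chrp_base already_installed platform code_version
#   already_installed: "yes" or "no" (determined by the caller)
should_install_chrp_base() {
    already_installed=$1
    platform=$2
    code_version=$3

    # Nothing to do if the fileset is already present.
    [ "$already_installed" = "yes" ] && return 1

    # Install on chrp nodes, or on any node being set up at PSSP 2.4.
    [ "$platform" = "chrp" ] || [ "$code_version" = "PSSP-2.4" ]
}
```

    Keeping the "already installed" test separate from the two-way "or"
    avoids any ambiguity about how the conditions group.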

    ------

    APAR: IY17160 COMPID: 5765D5100 REL: 320
    ABSTRACT: DISABLE MONITORING INDIVIDUAL VSDS IN CFGVSD COMMAND.

    PROBLEM DESCRIPTION:
     Problem:
     When I use /usr/lpp/mmfs/bin/mmcrvsd to create a large
     filesystem (over 300 vsd disks) it does create the vsds
     but does not activate them on the nodes. Each node in the
     cluster returns:
     monitorvsd: 0034-020 only 300 vsds can be monitored at the
     time.
     I have to manually start the vsds on each node before I
     can format the filesystem.
     This is not a big impact, but can cause confusion about if
     the system is in a correct state or not.

    LOCAL FIX:
    Disable monitoring individual vsds in cfgvsd.

    PROBLEM SUMMARY:
    When more than 300 VSD names are specified on a cfgvsd
    command, the execution aborts with error 0034-020, "Only 300
    vsds can be monitored at a time."
    The execution of monitorvsd on all the VSDs in the list is
    not a required feature. In fact, it is undocumented. The
    call should be removed. The monitorvsd function is
    available, and may be invoked by the user to monitor
    individual VSDs.

    PROBLEM CONCLUSION:
    The cfgvsd command will no longer execute a monitorvsd
    command when given a list of VSD names. The monitorvsd
    command will still be available for those who need the
    feature.

    ------

    APAR: IY17200 COMPID: 5765D5100 REL: 320
    ABSTRACT: OUT.TOP INCORRECT LINK STATUS

    PROBLEM DESCRIPTION:
    Uninitialized links are reporting "initialized" status in
    out.top. Example:
    2 L: initialized (wrap plug is installed)
    Should be -2 L

    PROBLEM SUMMARY:
    The out.top file reports conflicting status for
    uninitialized switch-to-switch links. For example, the
    comments report the link as "initialized" when in fact
    it may be removed from the network as faulty. This
    is what the customer received:
    s 10104 1 s 180 3 E154-S04-BH-J33 to E69-S17-BH-J19
    2 L: initialized (link has been removed from network -
    no AUTOJOIN)
    Note, this problem affects only the comments portion of
    the out.top entry; the reported switch connections are
    correct.

    PROBLEM CONCLUSION:
    In the above example, the text "2 L; initialized" represents
    the device status; the text in parentheses "(link has been
    removed from network - no AUTOJOIN)" represents the link
    status. For an uninitialized switch-to-switch link, the
    device status is meaningless and will not be displayed.
    Given the above example, what you will now see in the
    comments portion of the out.top entry is:
    -6 R; link has been removed from network - no AUTOJOIN
    or
    -6 L; link has been removed from network - no AUTOJOIN
    if the left side of the link is faulty.

    ------

    APAR: IY17211 COMPID: 5765D2800 REL: 430
    ABSTRACT: CLVER DISPLAYS ERROR WITH 3.2.0.6 SSP.BASIC: ERROR: SERVICE

    PROBLEM DESCRIPTION:
    clver incorrectly displays the following error when the
    cluster includes an HPS network and another tcpip network
    such as ethernet:
    Service adapter <adapter name> is improperly configured on
    node <node name>.
    The problem appears to be caused by a naming convention
    change in PSSP regarding the css adapter/interface.

    PROBLEM SUMMARY:
    Cluster verification fails with the following error:
    ERROR: Service adapter <adapter name> is improperly configured
    on node <node name>.

    PROBLEM CONCLUSION:
    Modify clver so that it is not so picky about the css name
    it is looking for in the CuAt and CuDv ODM classes.

    ------

    APAR: IY17226 COMPID: 5765D6100 REL: 220
    ABSTRACT: LLCANCEL POE INTERACTIVE RUNNING JOB W/EXTERNAL SCHEDULER

    PROBLEM DESCRIPTION:
    When an interactive POE job is running under an external
    scheduler, the llcancel command cannot stop the POE job.

    LOCAL FIX:
    For now, only Ctrl-C can kill a running interactive POE job
    under an external scheduler; llcancel cannot.

    PROBLEM SUMMARY:
    The LoadLeveler command llcancel was unable
    to cancel a running POE interactive job with
    external scheduler set.

    PROBLEM CONCLUSION:
    The LoadLeveler command llcancel can now
    cancel a running POE interactive job with
    external scheduler set.

    ------

    APAR: IY17233 COMPID: 5765D2800 REL: 430
    ABSTRACT: INCORRECT LIST OF SHARED FILESYSTEMS UNDER SMIT CHANGE / SHOW

    PROBLEM DESCRIPTION:
    When a Resource Group only contains one Node name in node list
    and no filesystems nor Volume Groups are specified for this
    Resource Group, the smit menu: Change / Show
    Characteristics of a Shared File System under the cspoc
    options, displays an incorrect list of filesystems, including
    all filesystems in rootvg.

    PROBLEM SUMMARY:
    When a Resource Group only contains one Node name in node list
    and no filesystems nor Volume Groups are specified for this
    Resource Group, the smit menu: Change / Show
    Characteristics of a Shared File System under the cspoc
    options, displays an incorrect list of filesystems, including
    all filesystems in rootvg.

    PROBLEM CONCLUSION:
    Modify HACMP smit cspoc odm sm_cmd_opt and sm_cmd_hdr entries
    to use correct flags.

    ------

    APAR: IY17241 COMPID: 5765D2800 REL: 430
    ABSTRACT: CLSMUXPD FAILS TO OPERATE PROPERLY IF NOFILES IS GREATER THAN

    PROBLEM DESCRIPTION:
    When nofiles is set to a value greater than 2000,
    clsmuxpd does not operate properly.

    LOCAL FIX:
    Set nofiles in /etc/security/limits to 2000 or less.

    PROBLEM CONCLUSION:
    Modify clsmuxpd so that it will correctly handle file
    descriptors.

    ------

    APAR: IY17252 COMPID: 5765D2800 REL: 430
    ABSTRACT: CLCONVERT CORE DUMPS WHEN CONVERTING FROM HACMP 4.2.2 TO

    PROBLEM DESCRIPTION:
    clconvert.43 core dumps when converting from a hacmp 422
    snapshot.

    PROBLEM CONCLUSION:
    Modify clconvert.43 so that it correctly converts resources
    from hacmp 4.2.2 to hacmp/es 4.3.1

    ------

    APAR: IY17253 COMPID: 5765B9501 REL: 320
    ABSTRACT: MISSING MMFS ENTRIES IN /ETC/FILESYSTEMS AFTER MKSYSB INSTALL

    PROBLEM DESCRIPTION:
    This APAR was opened with regard to the steps required following
    reinstall of a GPFS server since the image doesn't back up the
    /dev/ entries for mountpoints. You can either mv cluster.nodes
    .nodes or mmfs.cfg and restart GPFS to rebuild the /dev entries,
    but the documentation doesn't say anything about that.
    The documentation should be modified/enhanced to reflect the
    above steps.

    LOCAL FIX:
    Modify/enhance the documentation to show the steps required
    following the reinstall of a GPFS server.

    PROBLEM SUMMARY:
    Using network install of a node, the gpfs configuration
    files are refreshed; but the entries in the aix
    configuration files are not. This results in an inability to
    mount file systems until the AIX /dev and /etc/filesystems
    entries are refreshed.

    PROBLEM CONCLUSION:
    On GPFS startup; check to see that the AIX configuration
    files contain the needed data.

    ------

    APAR: IY17405 COMPID: 5765D5100 REL: 320
    ABSTRACT: RVSD NOT TERMINATING WHEN CSS ADAPTER GROUP IS DISSOLVED

    PROBLEM DESCRIPTION:
    RVSD is not terminating and restarting KLAPI in all cases where
    CSS adapter group is dissolved.

    LOCAL FIX:
    efix can be found in
    /afs/aix/u/bdherr/efix/nersc/adapter_group_disolved
    contact Brian Herr for more details

    PROBLEM SUMMARY:
    RVSD is not terminating and restarting KLAPI in all
    cases when the CSS adapter group is dissolved. This
    will cause RVSD/VSD to hang when communication is
    re-attempted to these nodes.

    PROBLEM CONCLUSION:
    RVSD will make sure that KLAPI gets terminated and
    restarted when the switch adapter group dissolves.

    ------

    APAR: IY17409 COMPID: 5765D5100 REL: 320
    ABSTRACT: CLEANUP.LOGS.WS: K4DESTROY: 2502-000 NO TICKETS TO DESTROY

    PROBLEM DESCRIPTION:
    /usr/lpp/ssp/bin/cleanup.logs.ws script run
    thru cron will create unexpected stderr output
    if the file /var/adm/SPlogs/SPdaemon.log does
    not exist.
    We only get the k4 ticket if the file exists:
    if [ -f $LOG_DIR/SPdaemon.log ]
    then
       WORKSTATION_NAME=`hostname`
       # Get K4 creds if required
       /bin/ksrvtgt root SPbgAdm
     ...
    fi
    but later on we destroy it regardless of whether we
    got it.
    # Get rid of K4 creds if any
    /bin/k4destroy >/dev/null
    unset KRBTKFILE
    at this point the message
    k4destroy: 2502-000 No tickets to destroy.
    is written to stderr. As the script is run from
    cron and no stderr redirection is set there
    (errors should go back to root as email),
    root now will get one email a day because
    of k4destroy failing.
    We should redirect stderr on k4destroy to
    /dev/null too.
    Work around:
    edit /usr/lpp/ssp/bin/cleanup.logs.ws and
    change
    /bin/k4destroy >/dev/null
    to
    /bin/k4destroy >/dev/null 2>/dev/null

    PROBLEM SUMMARY:
    cleanup.logs.ws issues k4destroy to destroy any Kerberos
    Version 4 authentication tickets. If any messages are
    written to stderr, such as:
    k4destroy: 2502-000 No tickets to destroy.
    it results in an email being sent to root, since it is
    usually run as a cron job. Customers would prefer to not
    see this message and to not receive the email.

    PROBLEM CONCLUSION:
    cleanup.logs.ws has been modified so that any output to
    stderr from k4destroy will be redirected to stdout,
    which is already being redirected to /dev/null. As a
    result no error messages will be issued from the call
    to k4destroy from cleanup.logs.ws. This is consistent
    with the call to kdestroy.
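
    The effect of the redirection can be demonstrated without Kerberos.
    In this sketch fake_k4destroy is a hypothetical stand-in that only
    imitates the message quoted above, and run_silently is a hypothetical
    wrapper; neither is part of cleanup.logs.ws.

```shell
#!/bin/sh
# Stand-in for k4destroy when there is no ticket: it complains on stderr.
fake_k4destroy() {
    echo "k4destroy: 2502-000 No tickets to destroy." >&2
    return 1
}

# Discard stdout, then point stderr at the same place. Order matters:
# "2>&1 >/dev/null" would leave stderr attached to the original stdout.
run_silently() {
    "$@" >/dev/null 2>&1
}
```

    Running "fake_k4destroy >/dev/null" still prints the message, which
    is what cron would mail to root, whereas "run_silently fake_k4destroy"
    prints nothing and still returns the command's exit status.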

    ------

    APAR: IY17438 COMPID: 5765D5100 REL: 320
    ABSTRACT: SWITCH WENT TO DOWN

    PROBLEM DESCRIPTION:
    Switch went to down.
    flt file shows the errors:
    2510-898 unable to access SDR to get the list of auto-join nodes
     rc= -1.
    2510-195 The fault service daemon got a SIGTERM signal.

    PROBLEM SUMMARY:
    Closed the window so the child process will exit on SIGTERM
    without resetting the adapter.

    PROBLEM CONCLUSION:
    There is a small window where the primary forks a child
    process and an SDR test is run where a SIGTERM to the child
    will result in the child calling the standard SIGTERM handler
    and reset the adapter.

    ------

    APAR: IY17453 COMPID: 5765D5100 REL: 320
    ABSTRACT: BACKUP ID NOT UPDATED IN SDR WHEN BACKUP TIMES OUT DURING

    PROBLEM DESCRIPTION:
    This apar addresses the aftermath of a backup node not
    responding during fence/unfence processing. The backup
    node fails to ACK to a DEVICE_DB_UPDATES command;
    the backup is fenced and a new backup is chosen. However,
    the SDR is not updated to reflect the new backup id.

    PROBLEM SUMMARY:
    When a primary backup node times out during an Efence or
    Eunfence operation, it is fenced off the switch and a new
    backup node is chosen, but the SDR is not updated to
    reflect the new backup's id.

    PROBLEM CONCLUSION:
    The error recovery Estart that gets invoked to handle
    errors found during Efence or Eunfence will call the
    function which will update the backup's name in the SDR.

    ------

    APAR: IY17467 COMPID: 5765D6100 REL: 220
    ABSTRACT: LLCTL -Q <COMMAND> DOES NOT PREVENT STDOUT.

    PROBLEM DESCRIPTION:
    The llctl -q (the quiet mode) <command> permits non error output
    as if the -q option was not used. For example, llctl -q reconfig
    produces std output exactly like "llctl reconfig" does.

    PROBLEM SUMMARY:
    In LL 2.2, the llctl -q option is not suppressing
    the informational messages.

    PROBLEM CONCLUSION:
    For LoadLeveler version 2.2,
    the llctl -q option will now suppress informational
    messages.

    ------

    APAR: IY17483 COMPID: 5765B9501 REL: 320
    ABSTRACT: MMLSQUOTA -G <GROUP> RUN AS NON-ROOT USER RETURNS THE ERROR

    PROBLEM DESCRIPTION:
    mmlsquota -g <group> run as a non-root user returns the error
    "operation not permitted" (in this case user loadl).
    GPFS documentation has no limits listed on mmlsquota like 'to
    run this you need system group permission'.

    LOCAL FIX:
    The code internally checks the uid of the process issuing the
    command and allows non-root users to see only their own user
    quotas. This restriction should be mentioned in the man page
    in the next level of documents.

    PROBLEM SUMMARY:
    mmlsquota requires root access

    PROBLEM CONCLUSION:
    Allow mmlsquota to display the quotas for a group which the
    issuing user is a member of.

    ------

    APAR: IY17491 COMPID: 5765D5101 REL: 120
    ABSTRACT: ASCIW: GROUP SERVICES UNABLE TO ESTABLISH DOMAIN AFTER RECYCLE

    PROBLEM DESCRIPTION:
    ASCIW: Group Services unable to establish domain after recycle

    PROBLEM SUMMARY:
    In a system with multiple networks, a node may not have a
    connection on all networks. To receive connectivity
    information on networks that are not directly connected, a
    Topology Services daemon relies on connectivity messages
    forwarded from those networks. In some situations,
    forwarded connectivity messages may be unnecessarily
    ignored or lost, resulting in missed node downs.

    PROBLEM CONCLUSION:
    Topology Services has been modified to reduce the
    possibility that forwarded connectivity messages may be
    ignored or lost. It has more nodes doing the forwarding of
    connectivity messages, and it accepts forwarded
    connectivity messages when it is busy committing a group.

    ------

    APAR: IY17504 COMPID: 5765D5101 REL: 120
    ABSTRACT: PHOENIX.SNAP DOESN'T WAIT LONG ENOUGH ON LARGE SYSTEMS

    PROBLEM DESCRIPTION:
    phoenix.snap will wait a certain amount of time for commands it
    runs to complete before declaring them hung and killing them.
    On very large systems some of these commands take longer to run
    and this wait time is not long enough. The wait time needs to
    be increased for large systems.

    LOCAL FIX:
    Manually modify the amount of time phoenix.snap waits by
    changing the values in the function waitforit()

    PROBLEM SUMMARY:
    Some of the commands phoenix.snap runs take time
    proportional to the number of nodes in the system.
    The amount of time it waits for these commands
    before declaring them hung and terminating them
    needs to be adjusted accordingly.
    There were also 3 other bugs noted in phoenix.snap
    that are being fixed here:
    1. On 3.2 systems the format of the lssrc output
    for hags has changed slightly causing phoenix.snap
    to fail to determine the hags nameserver.
    2. /etc/netsvc.conf was not being collected
    because of a typo.
    3. vmstat was collecting the wrong data. Instead
    of collecting interval statistics as was intended
    it was getting statistics since system startup
    repeatedly.

    PROBLEM CONCLUSION:
    phoenix.snap now increases the wait time
    proportionally for every 256 nodes in the system.
    Thus, for 257-512 nodes it doubles the wait time.
    As for the other 3 bugs:
    The lssrc parsing has been modified to not be
    dependent on the spacing of the output.
    The /etc/netsvc.conf typo has been corrected.
    The vmstat collection has been modified to run
    "vmstat 5 5" instead of 5 instances of "vmstat".
    The result is interval data will be collected
    instead of numbers since system startup 5 times
    over.
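
    The wait-time scaling described above can be sketched as
    follows. This is a minimal illustration, not the phoenix.snap
    code: the function name scaled_wait and the 60-second base wait
    are hypothetical. The only behavior taken from the text is that
    the wait grows with every 256 nodes, so 257-512 nodes doubles
    it; tripling for 513-768 nodes is an assumed continuation of
    that rule.

```python
import math

NODES_PER_STEP = 256  # from the APAR text: wait grows for every 256 nodes


def scaled_wait(base_wait_seconds: int, node_count: int) -> int:
    """Return the base wait multiplied by the number of 256-node blocks.

    Hypothetical helper: 1-256 nodes keep the base wait, 257-512 double it,
    and (by assumption) each further block of 256 adds one more multiple.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    multiplier = math.ceil(node_count / NODES_PER_STEP)
    return base_wait_seconds * multiplier


if __name__ == "__main__":
    for nodes in (16, 256, 257, 512, 513):
        print(nodes, scaled_wait(60, nodes))
```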

    ------

    APAR: IY17518 COMPID: 5765D5100 REL: 320
    ABSTRACT: VSD/RVSD ENHANCEMENTS

    PROBLEM DESCRIPTION:
    VSD/RVSD support for the Subsystem
    Device Driver (SDD) for the Enterprise Storage Server (ESS)

    PROBLEM SUMMARY:
    VSD/RVSD enhancements

    PROBLEM CONCLUSION:
    VSD/RVSD enhancements

    ------

    APAR: IY17541 COMPID: 5765D5100 REL: 320
    ABSTRACT: NODES NEED BOOTED TWICE TO UPDATE TUNING.CUST

    PROBLEM DESCRIPTION:
    Because boot procedure always runs tuning.cust locally, *then*
    checks for customize and ftp's tuning.cust from CWS, the node
    must be rebooted a second time for changes in tuning.cust to
    actually be applied to the node.

    LOCAL FIX:
    Reboot nodes a second time after tuning.cust has been ftp'd
    through customize reboot.

    PROBLEM SUMMARY:
    During a node's customization, tuning.cust is ftp'd from
    the node's Boot/Install Server. However, tuning.cust will
    not be executed until the next reboot of the node.
    pssp_script should be modified so that during a node's
    customization, tuning.cust will be executed.

    PROBLEM CONCLUSION:
    pssp_script has been modified so that during a node's
    customization, tuning.cust will be executed.

    ------

    APAR: IY17579 COMPID: 5765D5100 REL: 320
    ABSTRACT: FSD RESPONSE TIME

    PROBLEM DESCRIPTION:
    fsd response time

    PROBLEM SUMMARY:
    When a node is heavily loaded, the switch daemon can be
    delayed processing certain packets. In some cases, the node
    is dropped from the switch.

    PROBLEM CONCLUSION:
    The packet processing code in the fault service daemon has
    been changed to improve processing time.

    ------

    APAR: IY17583 COMPID: 5765D5100 REL: 320
    ABSTRACT: MSSR DOES NOT RELEASE ADAPTER AFTER TEST EXITS.(D/S)

    PROBLEM DESCRIPTION:
    MSSR does not release adapter after test exits.(D/S)

    PROBLEM SUMMARY:
    The interface between the fault service daemon, Connectivity
    Matrix, and the protocols was changed without changing the
    SP Switch Diagnostics.

    PROBLEM CONCLUSION:
    Added call to FSD::download_processor_route_table() so
    that the Connectivity Matrix (CM) will be updated, and the
    protocols will be restarted.

    TEMPORARY FIX:
    After the test completes, run an Estart to restore the
    correct state to the adapter.

    ------

    APAR: IY17584 COMPID: 5765D5100 REL: 320
    ABSTRACT: INSUFFICIENT STACK FOR KICKPIPES() IN MPCI, CAUSES A PROBLEM

    PROBLEM DESCRIPTION:
    PSSP 3.1.1 introduced the local variables 'shoveq' and 'frq' in
    kickpipe( in MPCI. They need a stack frame of 8192 bytes, but
    the stack frame is 4096 bytes. This causes Informix to go down.

    PROBLEM SUMMARY:
    Running Informix, on an SP system, can fail if a query with
    2000 or's is done. Informix may detect a corruption of the
    header of its stack block pool, and quit.

    PROBLEM CONCLUSION:
    MPCI, which is used by Informix, added a couple of large
    stack variables for shared memory support. This causes a
    problem, for Informix, because Informix both uses MPCI and
    manages its own threading and stacks. Informix's current
    management does not account for the addition of 8K of
    additional stack space for the MPCI routines that Informix
    calls. MPCI changed the declaration of these new large
    variables so that they are now located in the heap, instead
    of the stack. The new MPCI implementation solves the
    problem that Informix had working with our MPCI environment
    and is probably the better way for MPCI to handle these
    large variables.

    ------

    APAR: IY17605 COMPID: 5765B9501 REL: 320
    ABSTRACT: ASSERT FAILED: IN_CPY->COPYSET...

    PROBLEM DESCRIPTION:
    assert failed:in_cpy->...

    PROBLEM SUMMARY:
    GPFS self check logic failed in HandleReq.C

    PROBLEM CONCLUSION:
    Correct logic error in the token manager

    ------

    APAR: IY17638 COMPID: 5765B9501 REL: 320
    ABSTRACT: GPFS QUOTA OPERATION LOSES TOKEN AND ASSERTS

    PROBLEM DESCRIPTION:
    The problem seems to be that the filesystem manager wanted to
    do a quota operation, but had lost the token for the quota file
    to a particular node. It seems the quota code forgot about
    reacquiring the token before doing the operation and therefore
    asserted when some of its data was not in a "valid" state.

    PROBLEM SUMMARY:
    GPFS self check logic failed in a stress load with quotas
    enabled

    PROBLEM CONCLUSION:
    Correct locking error on quota file

    ------

    APAR: IY17653 COMPID: 5765D5100 REL: 320
    ABSTRACT: SPADAPTR ERROR WITH '-S YES' AND LARGE NODE NUMBER

    PROBLEM DESCRIPTION:
    Using spadaptr with the '-s yes' option and a large number of
    nodes can cause incorrect ip addresses to be calculated,
    resulting in error message 0022-047.

    PROBLEM SUMMARY:
    A customer was using spadaptrs to enter data for a large
    number of css adapters using the switch node numbers.
    When a node with a high node number, but a low switch
    number was encountered, the third octet of the IP address
    was calculated incorrectly.
    During the processing of all the nodes, the third octet
    of the IP address had been incremented because the fourth
    octet had exceeded 255. When the node with a high node
    number was being processed it used the incremented third
    octet number instead of the original value. Since this
    calculated IP address was not a valid IP address, an error
    message was issued stating that the IP address could not
    be resolved and spadaptrs terminated.

    PROBLEM CONCLUSION:
    spadaptrs was modified to correct the generation of IP
    addresses for css adapters, when the switch node numbers
    are used. Certain values were not being reset, which
    caused the third octet of the generated IP address to be
    incorrect, which could cause spadaptrs to fail.
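
    The class of error described above, carried state leaking from
    one node's calculation into the next, can be avoided by always
    deriving each address from the original starting address. The
    sketch below is illustrative only and is not the spadaptrs
    code: the function name css_address and the idea that an
    address is the starting address plus the switch node number
    are assumptions made for the example.

```python
import ipaddress


def css_address(start_address: str, switch_node_number: int) -> str:
    """Derive an address by offsetting from the ORIGINAL start address.

    Hypothetical helper. Integer addition on an IPv4Address carries from
    the fourth octet into the third automatically, and nothing is kept
    between calls, so an earlier carry cannot affect a later node.
    """
    start = ipaddress.IPv4Address(start_address)
    return str(start + switch_node_number)


if __name__ == "__main__":
    # A high switch node number overflows the fourth octet ...
    print(css_address("10.0.1.250", 10))
    # ... and a later node with a low switch number is unaffected.
    print(css_address("10.0.1.250", 3))
```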

    ------

    APAR: IY17683 COMPID: 5765D2800 REL: 430
    ABSTRACT: UNABLE TO SYNC CLUSTER TOPOLOGY DUE TO CLLOG ERROR MESSAGE

    PROBLEM DESCRIPTION:
    During verification of log files, cllog will fail, issuing the
    erroneous message that cluster.log has already been redirected.

    PROBLEM SUMMARY:
    During verification of log files, cllog will fail, issuing the
    erroneous message that cluster.log has already been redirected.
    The real cause of the problem is an incorrect number of '\'
    characters in the following awk calls:
    awk '/local0.info/ { print \$2 }' /etc/syslog.conf
    awk '/user.notice/ { print \$2 }' /etc/syslog.conf
    There should be three '\' characters before the '$', not one.

    PROBLEM CONCLUSION:
    The code was changed to pass a grep command, which does not
    require any escaped characters, to cl_rsh so that the entire
    line is returned to the local machine. Once returned the
    line is then echoed into awk to print just the $2 field.
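
    The design choice in this fix is to keep the quoting-sensitive
    step on the local machine: the remote side returns the whole
    matching line, and the second field is extracted locally. A
    minimal sketch of that local step follows. The function name
    second_field and the sample log path are hypothetical, and the
    code stands in for the awk call rather than reproducing cllog.

```python
from typing import Optional


def second_field(line: str) -> Optional[str]:
    """Return the second whitespace-separated field, like awk's $2.

    Hypothetical helper; returns None when the line has fewer than
    two fields.
    """
    fields = line.split()
    return fields[1] if len(fields) >= 2 else None


if __name__ == "__main__":
    # Sample line shaped like a syslog.conf entry; the path is made up.
    print(second_field("local0.info\t/tmp/example.log"))
```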

    ------

    APAR: IY17717 COMPID: 5765D2800 REL: 430
    ABSTRACT: THE HOME DIRECTORY FOR ROOT COULD BE DIFFERENT THEN "/".

    PROBLEM DESCRIPTION:
    When .rhosts file is used by HACMP it assumes that the directory
    is "/". The file should be accessed using "~root/.rhosts".

    PROBLEM SUMMARY:
    error /.rhosts file does not exist

    PROBLEM CONCLUSION:
    Replace the /.rhosts reference with ~root/.rhosts.

    ------

    APAR: IY17723 COMPID: 5765B9501 REL: 320
    ABSTRACT: GETMSG LOSES ERRNO SETTING

    PROBLEM DESCRIPTION:
    getmsg loses errno setting

    PROBLEM SUMMARY:
    Errno gets lost if the message file becomes inaccessible

    PROBLEM CONCLUSION:
    Correct handling of errno.

    ------

    APAR: IY17726 COMPID: 5765B9501 REL: 320
    ABSTRACT: LOGS NOT MIGRATED IN DELDISK

    PROBLEM DESCRIPTION:
    logs not migrated in deldisk

    PROBLEM SUMMARY:
    Log migration failed after deleting all of the original
    disks in a file system and then trying immediately to
    restripe the file system

    PROBLEM CONCLUSION:
    Correct an error in the creation of spare logs.

    ------

    APAR: IY17753 COMPID: 5765D5100 REL: 320
    ABSTRACT: COLONY:EDC ERRORS ON SW LINKS CAUSE NODES TO FALL OFF THE SW

    PROBLEM DESCRIPTION:
    colony:edc errors on sw links cause nodes to fall off the sw.

    PROBLEM SUMMARY:
    This Defect introduces the cable_test tool in response to
    customer requests to expedite debugging problems with
    loose cables or interposer cards.

    PROBLEM CONCLUSION:
    The cable_test tool was written to help the user isolate
    problems with loose cables and improperly seated interposer
    cards.

    ------

    APAR: IY17770 COMPID: 5765D6100 REL: 220
    ABSTRACT: LOADL LOST DRAIN ON CLASS AFTER LLCTL RECONFIG

    PROBLEM DESCRIPTION:
    LOADL LOST DRAIN ON CLASS AFTER LLCTL RECONFIG

    PROBLEM SUMMARY:
    LoadLeveler resets the startd drain on class after a
    reconfig.

    PROBLEM CONCLUSION:
    LoadLeveler now maintains the startd drain on class after a
    reconfig.

    ------

    APAR: IY17787 COMPID: 5765D6100 REL: 210
    ABSTRACT: STARTER CRASHES WHEN JOB_USER_PROLOG SPECIFIED

    PROBLEM DESCRIPTION:
    LoadL_starter will sometimes take a Segmentation Violation when
    a Job User Prolog is being used.

    LOCAL FIX:
    The job might run, if it is tried again. Otherwise, you must
    find a way to run it without using the Job User Prolog.

    PROBLEM SUMMARY:
    When JOB_USER_PROLOG is specified in the Config file, the
    LoadL_starter will occasionally take a segmentation
    violation.

    PROBLEM CONCLUSION:
    The handling of the JOB_USER_PROLOG was corrected to avoid
    the timing condition that could lead to the segmentation
    violation.

    ------

    APAR: IY17845 COMPID: 5765D5100 REL: 320
    ABSTRACT: WE ARE GETTING THE FOLLOWING ERROR SPKNKEYMAN_ERROR104 WHEN DCE

    PROBLEM DESCRIPTION:
    In errpt -a, customer sees various entries for the following
    error : SPKNKEYMAN_ERROR104. Customer sees error when DCE is
    installed but not configured for SP.

    LOCAL FIX:
    Comment out the following in /etc/rc.sp:
    SPNKEYMAN_START=/usr/lpp/ssp/bin/spnkeyman_start
    if [ -x ${SPNKEYMAN_START} ]; then
            $SPNKEYMAN_START &
    fi

    PROBLEM SUMMARY:
    The startup script checks for /usr/lib/libdce.a but not
    for whether the actual fileset is installed. It fails to
    behave as expected in scenarios where libdce.a exists in
    /usr/lib not because the customer installed it but because
    one of the customer's applications needs it.
    The daemon code does not detect the completion of the dce
    config for trusted services, so it keeps running and making
    errpt entries when dce is only partially configured.

    PROBLEM CONCLUSION:
    startup script now checks for the fileset
    dce.client.rte instead of checking for
    /usr/lib/libdce.a before starting the daemon.
    daemon code detects the partial configuration
    of dce for trusted services, makes an errpt
    entry and exits.

    ------

    APAR: IY17889 COMPID: 5765B9501 REL: 320
    ABSTRACT: FORMAT SEGMENTATION FAULT

    PROBLEM DESCRIPTION:
    format segmentation fault

    PROBLEM SUMMARY:
    Fix potential segmentation fault during file system
    creation.

    PROBLEM CONCLUSION:
    Fix bug in serializing multiple worker threads

    ------

    APAR: IY17926 COMPID: 5765D6100 REL: 220
    ABSTRACT: LL TO SUPPORT NEW INTERFACE TO VMGETINFO

    PROBLEM DESCRIPTION:
    LL to support new interface to vmgetinfo

    PROBLEM SUMMARY:
    Needed support for future changes in AIX
    for LoadLeveler.

    PROBLEM CONCLUSION:
    Code changes for future support of
    AIX for LoadLeveler.

    ------

    APAR: IY18011 COMPID: 5765D5100 REL: 320
    ABSTRACT: PRIMARY DAEMON CORE DUMPS IN CSRECOVERY

    PROBLEM DESCRIPTION:
    primary daemon core dumps in csrecovery

    PROBLEM SUMMARY:
    This problem was caused by an array in switch recovery
    overflowing and wiping out pointers and other variables. The
    array now has one element assigned to each chip and node in
    the system, eliminating the overflow condition.

    PROBLEM CONCLUSION:
    Switch recovery was changed to have a fixed array containing
    error reset information. The previous array was coded to 100
    entries. If more than 100 resets were pending then the
    array would overflow, and pointers to other structures would
    be wiped out. This would cause segmentation faults. The new
    arrays have one element for each switch chip/node in the
    system, so the overflow will not occur.

    ------

    APAR: IY18013 COMPID: 5765D2800 REL: 430
    ABSTRACT: ON FALLOVER, STANDBY ADAPTER MARKED DOWN IS SOMETIMES SELECTED

    PROBLEM DESCRIPTION:
    A node has two standbys. A service address fails (unplugged).
    Swap_adapter completes successfully and the standby is marked
    down. Fallover occurs. It fails even though there is a
    second standby marked up because the standby which is down
    is selected for the service address of the failed node.

    PROBLEM SUMMARY:
    A node has two standbys. A service address fails (unplugged).
    Swap_adapter completes successfully and the standby is marked
    down. Fallover occurs. It fails even though there is a
    second standby marked up because the standby which is down
    is selected for the service address of the failed node.

    PROBLEM CONCLUSION:
    Modify clstrmgr so that it exports DOWN for standby adapters
    which are down instead of doing nothing thus making it
    compatible with HAS.

    ------

    APAR: IY18023 COMPID: 5765B9501 REL: 320
    ABSTRACT: INDIRECT BLOCKS LEFT AFTER TRUNCATING TO SMALL FILE

    PROBLEM DESCRIPTION:
    indirect blocks left after truncating to small file

    PROBLEM SUMMARY:
    Correct an error in maintaining the indirect level of a file
    when truncated from a very large file to a very small
    non-zero length

    PROBLEM CONCLUSION:
    Correctly handle the indirection level when truncating files
    to a non-zero small size.

    ------

    APAR: IY18025 COMPID: 5765B9501 REL: 320
    ABSTRACT: DEAMON ASSET REPDISKADDR::GETFROMARRAY + 0X9C

    PROBLEM DESCRIPTION:
    daemon assert repdiskaddr::getfromarray + 0x9c

    PROBLEM SUMMARY:
    Service tool caused a node panic when used.

    PROBLEM CONCLUSION:
    Correct a logic error in a data collection service tool.

    ------

    APAR: IY18078 COMPID: 5765D5100 REL: 320
    ABSTRACT: VSDVGTS -A TOO SLOW WITH MANY HDISKS AND VPATHS

    PROBLEM DESCRIPTION:
    vsdvgts -a too slow with many hdisks and vpaths

    PROBLEM SUMMARY:
    Several scalability performance problems have been
    observed when a node has a large number of disks
    and/or volume groups being managed by RVSD.

    PROBLEM CONCLUSION:
    The following changes will be made in RVSD to address
    the performance problems observed when a node has
    a large number of disks and/or volume groups.
     - The vsdvgts command has been changed to not use
       the lspv command to determine volume group membership.
     - The RVSD recovery scripts will limit the number of
       varyonvg/varyoffvg that can occur in parallel in order
       to reduce ODM lock contention.

    ------

    APAR: IY18108 COMPID: 5765D6100 REL: 220
    ABSTRACT: LLSUBMIT API NOT CLOSING JOB COMMAND FILE

    PROBLEM DESCRIPTION:
    A user's program using the submit API runs out of file handles to
    run jobs.

    LOCAL FIX:
    The user's code can be reorganized so that it does not try to
    do more than 4 job submissions without shutting down.

    PROBLEM SUMMARY:
    The llsubmit API does not always close the job command file

    PROBLEM CONCLUSION:
    The llsubmit API needs to close the job command file after
    successfully obtaining the data from the file.

    ------

    APAR: IY18125 COMPID: 5765B9501 REL: 320
    ABSTRACT: ONLINE MMCHECKQUOTA: DEALLOC ASSERTS IN FIXSHADOWTABLEBLOCKCOUNT

    PROBLEM DESCRIPTION:
    online mmcheckquota: dealloc asserts in FixShadowTableBlockCount

    PROBLEM SUMMARY:
    GPFS self check logic terminated while running mmcheckquota

    PROBLEM CONCLUSION:
    Fix serialization error in mmcheckquota

    ------

    APAR: IY18165 COMPID: 5765B9501 REL: 320
    ABSTRACT: GPFS 1.5 SP /ASSERT IN SYNC.C DIRTYINDBUFS > 0 && IBDP != NULL

    PROBLEM DESCRIPTION:
    gpfs 1.5 sp /assert in sync.C dirtyindbufs >0 && ibdp != null

    PROBLEM SUMMARY:
    GPFS self check logic failed in sync.C line 2683

    PROBLEM CONCLUSION:
     In writeIndirect, wait until updateLogger mutex is held
    before checking whether indirect block is dirty.

    ------

    APAR: IY18168 COMPID: 5765B9501 REL: 320
    ABSTRACT: ASSERT FAILED HANDLEREQ.C LINE 2598

    PROBLEM DESCRIPTION:
    assert failed handlereq.c line 2598

    PROBLEM SUMMARY:
    GPFS self check logic failed at
    HandleReq.C line 2598

    PROBLEM CONCLUSION:
    Token reclaiming flag needs to be turned off when the
    token is put back in STABLE state.

    ------

    APAR: IY18170 COMPID: 5765B9501 REL: 320
    ABSTRACT: REMOVE ID AND LOG RECORDS FROM SHIPPED FILES

    PROBLEM DESCRIPTION:
    remove id and log records from shipped files

    PROBLEM SUMMARY:
    Minor packaging changes

    ------

    APAR: IY18206 COMPID: 5765D9300 REL: 310
    ABSTRACT: CHANGE LIGHT WEIGHT CORE FILE DESIGN TO BE ABLE TO NOT PRODUCE

    PROBLEM DESCRIPTION:
    When a user does an llcancel and is using light weight core
    files then they get a LWCF. APAR IY15826 was taken as a DCR
    to change the design in a future release. This apar will
    retrofit this fix into Parallel Environment 3.1.

    PROBLEM SUMMARY:
    When a user does an llcancel and is using light weight
    core files then a light weight core file is produced.
    The user would like a way to not get the light weight
    core files produced on an llcancel.

    PROBLEM CONCLUSION:
    A design change was made that will add a new environment
    variable to control the generation of a light weight
    core file when a SIGTERM is received by POE.
    llcancel generates a SIGTERM to POE on an
    interactive POE job.
    The following POE environment variable is now recognized:
    export MP_COREFILE_SIGTERM={YES|NO}. The default is YES.
    Set the environment variable to NO (case insensitive)
    and if light weight core files are being specified, then
    on a SIGTERM no light core files will be produced.
    There is also a new command line argument to POE
    -corefile_sigterm that can be used. The default is YES.
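
    The setting described above (YES or NO, case insensitive,
    defaulting to YES) can be read as in the following sketch. It
    illustrates the documented semantics only and is not POE code:
    the function name corefile_on_sigterm is hypothetical, and
    treating any value other than NO as YES is an assumption.

```python
import os
from typing import Mapping, Optional


def corefile_on_sigterm(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if a core file should be produced on SIGTERM.

    Hypothetical helper. MP_COREFILE_SIGTERM defaults to YES; only a
    case-insensitive NO turns the core file off (values other than
    YES/NO are treated as YES here, which is an assumption).
    """
    env = os.environ if environ is None else environ
    value = env.get("MP_COREFILE_SIGTERM", "YES")
    return value.strip().upper() != "NO"


if __name__ == "__main__":
    print(corefile_on_sigterm({"MP_COREFILE_SIGTERM": "no"}))
```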

    ------

    APAR: IY18233 COMPID: 5765D5100 REL: 320
    ABSTRACT: PSSP_SCRIPT SHOULD SUPPORT NON-ENGLISH LOCALES

    PROBLEM DESCRIPTION:
    pssp_script (PSSP 3.2) should support non-English locales.
    +765 bis_adap_addr=${bis_adap_addr#*is }   #- Strip leading stuff
    +766 bis_adap_addr=${bis_adap_addr%%,*}    #- Strip following
    the line does include a bias for "bis_adap_add**is ".
    config.log file contains the following error:
    + bis_adap_addr=wg101682 ist 164.17.10.10, Aliases: wg101682
    get_eff_addr[25]: wg101682: 0403-009 Die angegebene Nummer ...

    LOCAL FIX:
    As circumvention:
    edit pssp_script lines having "**is " for THIS SPECIFIC problem.
    bis_adap_addr=$ bis_adap_addr#*is
    change to:
    bis_adap_addr=$ bis_adap_addr#**is

    PROBLEM SUMMARY:
    When customizing a node, if LC_ALL is set to something
    other than en_US (e.g. de_DE), the customization will fail.
    pssp_script will issue a message from its get_eff_addr
    routine stating that an invalid number was supplied for an
    IP address.
    The problem is caused by pssp_script issuing a "host"
    command, and then doing a Korn shell pattern match based
    on the word "is". That will fail in non-English locales.
    ************************************************************
    * USERS AFFECTED: Users whose installation default *
    * language locale is not English. *
    ************************************************************
    * PROBLEM DESCRIPTION: Node customization will fail in *
    * get_eff_addr complaining about bad data in a hostname. *
    ************************************************************
    * RECOMMENDATION: pssp_script must be changed so all *
    * its internal calls generate output in the "C" locale. *
    ************************************************************

    PROBLEM CONCLUSION:
    pssp_script has been modified to use LC_ALL=C on its calls
    to functions that return character strings so that the
    output is in English.
    This allows the pattern-matching operators that follow
    these commands to find the English word(s) they are looking
    for.
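
    The same technique, forcing the "C" locale on a command whose
    output is then pattern-matched, can be sketched as follows.
    This is an illustration in Python rather than the Korn shell
    used by pssp_script. The function names are hypothetical, and
    the "NAME is ADDRESS, Aliases: ..." output shape is taken from
    the config.log excerpt in this APAR; the host command on other
    systems may print something different.

```python
import os
import subprocess


def c_locale_env() -> dict:
    """Copy the current environment and force LC_ALL=C."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"
    return env


def address_from_host_output(output: str) -> str:
    """Extract ADDRESS from 'NAME is ADDRESS, Aliases: ...'.

    Raises ValueError when the English word "is" is missing, which is
    what happens with localized output such as German "ist".
    """
    _, separator, rest = output.partition(" is ")
    if not separator:
        raise ValueError("unexpected host output: %r" % output)
    return rest.split(",", 1)[0].strip()


def lookup(hostname: str) -> str:
    """Run the host command in the C locale and parse its output."""
    result = subprocess.run(
        ["host", hostname],
        env=c_locale_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return address_from_host_output(result.stdout)
```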

    ------

    APAR: IY18320 COMPID: 5765D5100 REL: 320
    ABSTRACT: RESET CLEARS DATA THAT MAY BE NEEDED FOR CHECKSTOP ANALYSIS

    PROBLEM DESCRIPTION:
    During adapter reset certain registers are cleared that may be
    needed for debug.
    SP-Switch2 only.

    PROBLEM SUMMARY:
    A node may checkstop when reading SRAM on a
    snap after a critical adapter MIC error has occurred and the
    adapter has been reset.

    PROBLEM CONCLUSION:
    Don't read SRAM on critical MIC adapter
    errors.

    ------

    APAR: IY18322 COMPID: 5765D5100 REL: 320
    ABSTRACT: BITS INCORRECTLY TURNED ON DURING ADAPTER RESET

    PROBLEM DESCRIPTION:
    Bits are incorrectly set on the SP-Switch2 adapter during
    critical adapter recovery.

    PROBLEM SUMMARY:
    MIC adapter error bit 23 caused by MIC chip
    reset can override true adapter errors.

    PROBLEM CONCLUSION:
    We will ignore MIC bit 23 when identified
    along with any other critical adapter error.

    ------

    APAR: IY18326 COMPID: 5765D5100 REL: 320
    ABSTRACT: EMASTERD TAKES A LONG TIME ON LARGE SYSTEMS.

    PROBLEM DESCRIPTION:
    emasterd takes a long time (up to 20 minutes) to establish the
    emaster on a large system.

    PROBLEM SUMMARY:
    On large SP Switch2 systems, the
    assignment of a new MSS node can take several minutes;
    this is too long a time for the system to be deprived of a
    synchronized switch clock.

    PROBLEM CONCLUSION:
    Changes were made to emasterd to
    speed-up MSS failover on large systems.

    ------

    APAR: IY18337 COMPID: 5765D5100 REL: 320
    ABSTRACT: REENABLE DIAGS EXECUTION AT CONFIG

    PROBLEM DESCRIPTION:
    reenable diags execution at config

    PROBLEM SUMMARY:
    diags execution disabled in SP Switch2 adapter
    configuration
     method to prevent popping another problem (since solved)
     which checkstopped node.

    PROBLEM CONCLUSION:
    diags execution reenabled in SP Switch2
    adapter configuration method.

    ------

    APAR: IY18351 COMPID: 5765D5100 REL: 320
    ABSTRACT: INCORRECT MESSAGE RETURNED WHEN POWERING ON CONDOR NODES

    PROBLEM DESCRIPTION:
    Incorrect message returned when powering on Condor Nodes

    PROBLEM SUMMARY:
    Condor nodes require more time to power on
    than most other nodes. The hmcmds command needs to be
    sensitive to this requirement and wait a longer period of time
    for a power on to occur before it reports that the power on
    failed.

    PROBLEM CONCLUSION:
    Modify hmcmds to wait a longer period
    of time during a power on sequence before it reports a
    failure.

    ------

    APAR: IY18354 COMPID: 5765B9501 REL: 320
    ABSTRACT: MMDELDISK STOPS WHEN IT FINDS BROKEN DISK ADDRESS

    PROBLEM DESCRIPTION:
    MMDELDISK STOPS WHEN IT FINDS BROKEN DISK ADDRESS

    PROBLEM SUMMARY:
    mmdeldisk terminates when it finds a disk
    block which cannot be read.

    PROBLEM CONCLUSION:
    mmdeldisk should continue when
    encountering a bad disk block and mark the indirect block
    pointing at the bad disk block as bad.

    ------

    APAR: IY18369 COMPID: 5765D6100 REL: 220
    ABSTRACT: CLASS STATEMENT DOES NO ALLOW ENOUGHT SPACE/NEW FORMAT

    PROBLEM DESCRIPTION:
    The LoadL Class statement in the LoadL_config.local file is
    limited to 1024 characters. With the new hardware and number of
    processors, this does not allow larger numbers of classes to be
    listed. Currently you need to list the class for each instance
    and will run out of room with large class names or a large
    number of classes. A new format is needed to correct this
    concern and allow a greater number of classes.

    PROBLEM SUMMARY:
    The LoadL Class statement in the LoadL_config.local file
    does not allow large numbers of classes to be listed.
    Currently you need to list the class for each instance and
    will run out of room for large class names or a large number
    of classes.
    The new format addresses this concern and allows a greater
    number of classes.

    PROBLEM CONCLUSION:
    Two formats for the CLASS statement will be accepted.
    old format:
        CLASS = { "Class_A" "Class_A" "Class_B" }
    new format:
        CLASS = Class_A(2) Class_B(1)
    The new format will allow more class instances to be
    specified and make it easier to specify them.
    If "{" or "}" is detected for the CLASS statement, it will
    be processed according to the old format; otherwise, the
    new format.
    The following applies to the new format:
    Each class can only have one entry.
    If a class has more than one entry or there is a syntax
    error, the whole CLASS statement will be ignored.
    No_Class(1) will be added if there is no good user input
    from the CLASS statement.
    White space may be used freely in the new format.
    The number of instances for a class specified inside ()
    should be an unsigned integer.
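
    The rules above can be sketched as a small parser. This is an
    illustration of the stated rules only, not LoadLeveler's
    implementation: the function name parse_class_statement is
    hypothetical, the input is assumed to be the text to the right
    of "CLASS =", the set of characters allowed in a class name is
    an assumption, and falling back to No_Class(1) for an empty
    old-format list is also an assumption.

```python
import re
from collections import Counter
from typing import Dict

DEFAULT_CLASSES = {"No_Class": 1}

# One new-format entry: NAME(COUNT), with white space allowed freely.
# The allowed name characters are an assumption for this sketch.
_ENTRY = re.compile(r"\s*([A-Za-z_]\w*)\s*\(\s*(\d+)\s*\)")


def parse_class_statement(value: str) -> Dict[str, int]:
    """Map each class name to its number of instances."""
    if "{" in value or "}" in value:
        # Old format: every quoted occurrence counts as one instance.
        names = re.findall(r'"([^"]+)"', value)
        return dict(Counter(names)) or dict(DEFAULT_CLASSES)

    text = value.strip()
    classes: Dict[str, int] = {}
    position = 0
    while position < len(text):
        match = _ENTRY.match(text, position)
        # A syntax error or a repeated class voids the whole statement.
        if match is None or match.group(1) in classes:
            return dict(DEFAULT_CLASSES)
        classes[match.group(1)] = int(match.group(2))
        position = match.end()
    return classes or dict(DEFAULT_CLASSES)


if __name__ == "__main__":
    print(parse_class_statement('{ "Class_A" "Class_A" "Class_B" }'))
    print(parse_class_statement("Class_A(2) Class_B(1)"))
```

    Both example statements describe the same configuration: two
    instances of Class_A and one of Class_B.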

    ------

    APAR: IY18378 COMPID: 5765B9501 REL: 320
    ABSTRACT: DEADLOCK RESTRIPING A METAFILE.

    PROBLEM DESCRIPTION:
    deadlock restriping metadata.

    PROBLEM SUMMARY:
    Deadlock when running restripe command.

    PROBLEM CONCLUSION:
    Correct locking error in GPFS

    ------

    APAR: IY18485 COMPID: 5765D9300 REL: 310
    ABSTRACT: BUG IN CONVERTING FILE OFFSET TO BYTE DISPLACEMENT

    PROBLEM DESCRIPTION:
    bug in converting file offset to byte displacement

    PROBLEM SUMMARY:
    The offset in MPI-IO interfaces is expressed in the number
    of etypes. This should always be converted into a byte
    displacement from file displacement before accessing data.
    In an optimization for certain data types, MPI library
    failed to handle the conversion correctly.

    PROBLEM CONCLUSION:
    The optimization code has been changed so offset in MPI-IO
    interfaces will be converted into byte offset correctly.
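
    For the simplest file view, the conversion described above is
    a single multiply and add. The sketch below assumes a view
    whose filetype is a contiguous run of etypes with no holes; a
    filetype with holes needs a more involved mapping that is not
    shown. The function name byte_position is hypothetical and the
    code is not taken from the MPI library.

```python
def byte_position(displacement: int, offset_in_etypes: int, etype_size: int) -> int:
    """Convert an MPI-IO style offset to an absolute byte position.

    Assumes a hole-free view: the offset counts etypes from the start of
    the view, and the view begins at the file displacement (in bytes).
    """
    if displacement < 0 or offset_in_etypes < 0:
        raise ValueError("displacement and offset must be non-negative")
    if etype_size <= 0:
        raise ValueError("etype_size must be positive")
    return displacement + offset_in_etypes * etype_size


if __name__ == "__main__":
    # 10 eight-byte etypes past a 1024-byte displacement.
    print(byte_position(1024, 10, 8))
```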

    ------

    APAR: IY18486 COMPID: 576554300 REL: 240
    ABSTRACT: BUG IN CONVERTING FILE OFFSET TO BYTE DISPLACEMENT

    PROBLEM DESCRIPTION:
    bug in converting file offset to byte displacement

    PROBLEM SUMMARY:
    The offset in MPI-IO interfaces is expressed in the number
    of etypes. This should always be converted into a byte
    displacement from file displacement before accessing data.
    In an optimization for certain data types, MPI library
    failed to handle the conversion correctly.

    PROBLEM CONCLUSION:
    The optimization code has been changed so offset in MPI-IO
    interfaces will be converted into byte offset correctly.

    ------

    APAR: IY18487 COMPID: 5765D5101 REL: 111
    ABSTRACT: HAEM DAEMON PRIORITY LIKE HATS_PRIORITY

    PROBLEM DESCRIPTION:
    To stop haem from core dumping, development has recommended
    implementing the same mechanism that hags uses, which is to set
    the priority of the daemon at hats_priority + 1.
    haem was not responding to the hags daemon within the given
    time (2 mins). When haem detects this, it terminates itself and
    respawns, and the problem goes away until it cannot respond
    again to hags for 2 mins.

    PROBLEM SUMMARY:
    If the system is heavily loaded, there might be occasions
    when the haem daemon is not getting enough resources to be able
    to respond to hags within two minutes. When this occurs,
    haemd terminates and respawns.

    PROBLEM CONCLUSION:
    The priority of the haemd process and its resource
    monitors are now equal to:
    (FixPri_Value attribute of the TS_Config class) + 1.
    In other words, haemd priority is set to hats_priority + 1
    (which is what hags is using). This will minimize the
    possibility of haem not being able to respond to hags within
    2 minutes since they are now both set to the same priority.

    ------

    APAR: IY18488 COMPID: 5765B9501 REL: 320
    ABSTRACT: INCREASEINDIRECTIONLEVEL LIVELOCK RESERVING LOGSPACE

    PROBLEM DESCRIPTION:
    increase indirection level livelock reserving logspace

    PROBLEM SUMMARY:
    fix potential deadlock discovered in
    development.

    ------

    APAR: IY18492 COMPID: 5765D5100 REL: 320
    ABSTRACT: HARDMON ERRORS AFTER APPLYING APAR IY16350

    PROBLEM DESCRIPTION:
    APAR IY16350 in PTF set 8 introduced new code for hardware
    support. The post_u script that is run during the PTF install
    adds entries to the /spdata/sys1/spmon/hmthresholds file
    only if the node number is 0. The CWS has a node number of 0
    only if install_cw has been run. On new CWS installs that
    apply the ssp code and then PTF 8 before running install_cw,
    post_u will not run and hmthresholds will not be updated,
    leading to errors 0026-612, 0026-409, and 0026-405.

    LOCAL FIX:
    run /usr/lpp/ssp/ssp.basic/3.2.0.9/inst_root/ssp.basic.post_u
    on cws after running install_cw

    PROBLEM SUMMARY:
    In cases where PTF 8 has been installed on a CWS, prior to
    install_cw being run, hmthresholds will not be updated with
    entries for the M80. This will result in the CWS not being
    able to communicate with hardmon.

    PROBLEM CONCLUSION:
    post_process has been modified to check hmthresholds for
    entries required for the M80 and to add the entries if they
    do not exist.
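
    The check-then-add behavior described above can be sketched in
    shell as follows. This is only an illustration, not the actual
    post_process code: the ensure_entry helper, the scratch file,
    and the placeholder entry strings are all assumptions, and the
    real hmthresholds entries for the M80 are not shown here.

```shell
# Hypothetical helper: append ENTRY to FILE only if an identical
# line is not already present.
ensure_entry() {
    file=$1
    entry=$2
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$entry" "$file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$file"
    fi
}

# Demonstration against a scratch file, not the real hmthresholds file.
demo_file=${TMPDIR:-/tmp}/hmthresholds.demo.$$
: > "$demo_file"
ensure_entry "$demo_file" "placeholder-entry-1"
ensure_entry "$demo_file" "placeholder-entry-1"   # second call adds nothing
ensure_entry "$demo_file" "placeholder-entry-2"
wc -l < "$demo_file"                              # the file holds 2 lines
rm -f "$demo_file"
```

    Because the helper tests for an exact whole-line match before
    appending, it can be run any number of times without
    duplicating entries.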

    ------

    APAR: IY18494 COMPID: 5765D5100 REL: 320
    ABSTRACT: S1 TTY VALUE NOT UPDATED WITH SPFRAME

    PROBLEM DESCRIPTION:
    s1 tty value not updated with spframe

    PROBLEM SUMMARY:
    This conditional was chosen to avoid a double negative, but
    it doesn't work as expected for double quotes ("").

    PROBLEM CONCLUSION:
    The conditional has been changed so that the s1tty code path
    is handled properly for both a defined tty and an undefined
    ("") tty.

    ------

    APAR: IY18515 COMPID: 5765D2800 REL: 430
    ABSTRACT: CLINFO EXITS WHEN NOFILES IS SET TO VALUE GREATER THAN 2000.

    PROBLEM DESCRIPTION:
    Clinfo exits when nofiles is set to a value greater than
    2000.

    LOCAL FIX:
    set nofiles to 2000 or less.

    PROBLEM CONCLUSION:
    Modify clinfo so that it is able to handle __NUM_ENTRIES
    number of open files.

    ------

    APAR: IY18527 COMPID: 5765B9501 REL: 320
    ABSTRACT: ASSERT SUBROUTINE FAILED: !UNPINSOMEBUFFER: NO BUFFERS FOUND

    PROBLEM DESCRIPTION:
    assert subroutine failed: !unpinSomeBuffer: no buffers found

    PROBLEM SUMMARY:
    Remove incorrect assert. When other threads are pinning
    and unpinning, the accounting cannot prevent this assertion
    from happening. Just wait a little while and try again.

    PROBLEM CONCLUSION:
    Remove incorrect assert. When other threads are pinning
    and unpinning, the accounting cannot prevent this assertion
    from happening. Just wait a little while and try again.

    ------

    APAR: IY18601 COMPID: 5765B9501 REL: 320
    ABSTRACT: INODE PREFETCH LOOPING

    PROBLEM DESCRIPTION:
    inode prefetch looping

    PROBLEM SUMMARY:
    Infinite loop under rare conditions found in
    development.

    PROBLEM CONCLUSION:
    In InodePretchInstance::WorkerThreadBody, the nIdle
    variable was not initialized, resulting in the prefetch
    thread spinning.

    ------

    APAR: IY18606 COMPID: 5765D5100 REL: 320
    ABSTRACT: RETRIEVE NODE DEVICE CONFIG INFO ON SNAPS

    PROBLEM DESCRIPTION:
    retrieve node device config info on snaps

    PROBLEM SUMMARY:
    The css.snap command was modified to include output
    from the "lsattr -E -l css0" and "lsattr -E -l css1"
    commands.

    PROBLEM CONCLUSION:
    The css.snap command needs to provide additional
    information.

    ------

    APAR: IY18628 COMPID: 5765D2800 REL: 430
    ABSTRACT: HAES: CLSWAPADDRESS CAUSES TWO SWAP_ADAPTER EVENTS TO BE RUN

    PROBLEM DESCRIPTION:
    When customer issued Swap Network Adapter from the smit panel,
    another swap_adapter event started while the initial one was
    still executing. The second one ended with a script error and
    went into config_too_long.

    PROBLEM SUMMARY:
    When customer issued Swap Network Adapter from the smit panel,
    another swap_adapter event started while the initial one was
    still executing. The second one ended with a script error and
    went into config_too_long.

    PROBLEM CONCLUSION:
    Corrected the calls to cl_hats_adapter in cl_swap_IP_address
    and cl_swap_ATM_IP_address such that clstrmgr recognizes the
    grace period for the swap currently running.

    ------

    APAR: IY18650 COMPID: 5765D2800 REL: 430
    ABSTRACT: HACMP.OUT AND CSPOC.LOG HAS TRANSLATED MESSAGE

    PROBLEM DESCRIPTION:
    If LANG is set to point to a non-English message catalog, the
    HA log files may contain non-English text. This can make it
    difficult to receive customer support in the event of problems.

    PROBLEM CONCLUSION:
    This is a change from the solution posted above:
    Add 'LANG=C' on the command line when clgetif is called.
    De-internationalize clsetenvgrp. This program is not intended
    to be run by users, and thus none of its output should be
    showing up anywhere other than the logs. The command does not
    have a man page or any other user documentation, which means
    that this solution should be appropriate.

    ------

    APAR: IY18651 COMPID: 5765D2800 REL: 430
    ABSTRACT: MOUNT POINT NOT REMOVED WHEN REMOVING FILESYSTEM WITH C-SPOC -

    PROBLEM DESCRIPTION:
    When the customer used C-SPOC to remove a filesystem and
    specified to also remove the mount point, the mount point was
    removed only on the node where the vg was varied on and not
    on any of the other cluster nodes participating in the volume
    group.

    PROBLEM CONCLUSION:
    If the -r flag is specified, do a try_parallel of rmdir on
    the mount point.

    ------

    APAR: IY18654 COMPID: 5765B9501 REL: 320
    ABSTRACT: MMDELDISK FAILS WITH NOT ENOUGH MEMORY

    PROBLEM DESCRIPTION:
    mmdeldisk fails with not enough memory

    PROBLEM SUMMARY:
     E_NOMEM error received when running mmdeldisk

    PROBLEM CONCLUSION:
    Correct memory leak in the token manager
    when running mmdeldisk.

    ------

    APAR: IY18673 COMPID: 5765B9501 REL: 320
    ABSTRACT: SLOW NFS FLUSH_RANGE CALLS

    PROBLEM DESCRIPTION:
    slow nfs flush_range calls

    PROBLEM SUMMARY:
    NFS performance improvement

    PROBLEM CONCLUSION:
    Streamline one piece of the data flush when
    running NFS writes.

    ------

    APAR: IY18695 COMPID: 5765D5100 REL: 320
    ABSTRACT: COLONY ADAPTER CONFIG PROBLEM

    PROBLEM DESCRIPTION:
    colony adapter config problem

    PROBLEM SUMMARY:
    There was a timing problem that only manifested itself on
    old hardware during the css0 adapter configuration.

    PROBLEM CONCLUSION:
    By removing 3 of 4 reads (put in place to confirm the
    previous write) during RDRAM reset, the timing issue was
    resolved.

    ------

    APAR: IY18697 COMPID: 5765D5100 REL: 320
    ABSTRACT: COLONY:FIXES/DEBUG ITEMS

    PROBLEM DESCRIPTION:
    colony; fixes/debug items

    PROBLEM SUMMARY:
    While working on some problems at a customer's shop, I
    discovered some paths through the code that shouldn't be
    executed, but were NOT explicitly prevented by the code. I
    also added some debug code which didn't negatively affect
    performance.

    PROBLEM CONCLUSION:
    I altered the code to explicitly prevent some of the paths
    in the code from being illegally executed. I added some of
    the debug code that I developed while working on some field
    problems.

    ------

    APAR: IY18701 COMPID: 5765D5100 REL: 320
    ABSTRACT: ERROR UNDEFINING VSD FOR GHOST VSDS

    PROBLEM DESCRIPTION:
    error undefining vsd for ghost vsds

    PROBLEM SUMMARY:
    The undefvsd command is not undefining vsds. A non-zero
    return code is returned, but no error message is issued.

    PROBLEM CONCLUSION:
    The code is missing a define that is necessary once
    the rvsdrestrict level is set to RVSD3.2.0.4.

    ------

    APAR: IY18742 COMPID: 5765D5100 REL: 311
    ABSTRACT: PSSP_SCRIPT SHOULD SUPPORT NON-ENGLISH LOCALES

    PROBLEM DESCRIPTION:
    pssp_script (PSSP 3.2) should support non-English locales.
    +765 bis_adap_addr=${bis_adap_addr#*is } #- Strip leading stuff
    +766 bis_adap_addr=${bis_adap_addr%%,*} #- Strip following
    the line does include a bias for "bis_adap_add**is ".
    config.log file contains the following error:
    + bis_adap_addr=wg101682 ist 164.17.10.10, Aliases: wg101682
    get_eff_addr[25]: wg101682: 0403-009 Die angegebene Nummer ...

    LOCAL FIX:
    As circumvention:
    edit pssp_script lines having "**is " for THIS SPECIFIC problem.
    bis_adap_addr=${bis_adap_addr#*is }
    change to:
    bis_adap_addr=${bis_adap_addr#**is }

    PROBLEM SUMMARY:
    When customizing a node, if LC_ALL is set to something
    other than en_US (e.g. de_DE), the customization will fail.
    pssp_script will issue a message from its get_eff_addr
    routine stating that an invalid number was supplied for an
    IP address.
    The problem is caused by pssp_script issuing a "host"
    command, and then doing a Korn shell pattern match based
    on the word "is". That will fail in non-English locales.
    ************************************************************
    * USERS AFFECTED: Users whose installation default *
    * language locale is not English. *
    ************************************************************
    * PROBLEM DESCRIPTION: Node customization will fail in *
    * get_eff_addr complaining about bad data in a hostname. *
    ************************************************************
    * RECOMMENDATION: pssp_script must be changed so all *
    * its internal calls generate output in the "C" locale. *
    ************************************************************

    PROBLEM CONCLUSION:
    pssp_script has been modified to use LC_ALL=C on its calls
    to functions that return character strings so that the
    output is in English.
    This allows the pattern-matching operators that follow
    these commands to find the English word(s) they are looking
    for.
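
    A minimal sketch of why the "C" locale matters here. The
    extract_addr helper is hypothetical (it is not the
    get_eff_addr routine), and the sample lines are modeled on the
    config.log output quoted above. In real use the line would
    come from a command run with LC_ALL=C.

```shell
# Hypothetical helper: pull the address out of one line of the form
# "<name> is <addr>, Aliases: ..." using shell pattern matching.
extract_addr() {
    line=$1
    addr=${line#*is }    # strip everything up to and including "is "
    addr=${addr%%,*}     # strip the first comma and everything after it
    printf '%s\n' "$addr"
}

# In real use, something like:  line=$(LC_ALL=C host "$name")

# English wording: the pattern finds "is " and the address is isolated.
extract_addr "wg101682 is 164.17.10.10, Aliases: wg101682"

# German wording ("ist"): "is " never matches, so the host name and
# the word "ist" are left in front of the address.
extract_addr "wg101682 ist 164.17.10.10, Aliases: wg101682"
```

    Forcing the "C" locale on the command whose output is parsed
    keeps the wording fixed, so the pattern does not need to know
    about every translated message catalog.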

    ------

    APAR: IY18773 COMPID: 5765D5100 REL: 311
    ABSTRACT: EM.DEFAULT LOG HAS WARNING MSG CONCERNING LIBSDR.A BEING LINKED

    PROBLEM DESCRIPTION:
    In /var/ha/log/em.default log the following warning msg appears
    Error!!! The SDR has detected that multiple copies of itself
    have been linked into this program. This happens if a library or
    the binary has been linked with a static copy of libSDR.a and
    shared libraries are used that reference the shared SDR library,
    libSDR.a. This program must be re-linked!
    The way EM uses the SDR library was changed in 3.1.1 causing
    this problem.

    PROBLEM SUMMARY:
    As a result of the process flow in haemd, when a call is
    made to SDROpenSession, the following messages are
    erroneously written to the em.default log in /var/ha/log/:
    Error!!! The SDR has detected that
    multiple copies of itself have been linked into this
    program.
    This happens if a library or the binary has been linked with
    a static copy of libSDR.a and shared libraries are used that
    reference the shared SDR library, libSDRs.a. This program
    must be re-linked!
    There is actually not an error condition in this situation,
    but SDRCloseSession has failed to clear an environment
    variable which causes the message to be written to the log.

    PROBLEM CONCLUSION:
    SDRCloseSession was modified to clear the environment
    variable which is used to track if multiple copies of the
    SDR have been linked into a program.
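
    The failure mode can be illustrated with a small shell
    analogy. It is a sketch under stated assumptions, not SDR
    code: the variable SDR_DEMO_LOADED and the three functions are
    invented for the example, and the real library is not a shell
    script.

```shell
# Hypothetical marker variable and functions, for illustration only.
open_session() {
    if [ -n "${SDR_DEMO_LOADED:-}" ]; then
        echo "warning: marker already set, assuming a second copy" >&2
        return 1
    fi
    SDR_DEMO_LOADED=1
    export SDR_DEMO_LOADED
}

# Closing without clearing the marker leaves stale state behind.
close_session_buggy() { :; }

# Closing and clearing the marker lets the next open succeed quietly.
close_session_fixed() { unset SDR_DEMO_LOADED; }

unset SDR_DEMO_LOADED
open_session
close_session_buggy
open_session || echo "second open saw the stale marker"
close_session_fixed
open_session && echo "open after a clean close succeeded"
```

    The spurious warning comes from state that outlives the
    session, which is why clearing the marker on close removes the
    message without changing how the check itself works.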

    ------

    APAR: IY18904 COMPID: 5765B9500 REL: 130
    ABSTRACT: SPBGADM STILL NOT BEING ADDED TO /ETC/SYSCTL.MMCMD.ACL FILE

    PROBLEM DESCRIPTION:
    The SPbgAdm entry is still not being added to the
    /etc/sysctl.mmcmd.acl file. The reason is that the root part
    of mmfs.gpfs.rte did not ship because there were no modified
    shippable parts. The fix for IY16838 involved changes to a PTF
    install processing script, which is not considered a shippable
    part. We need to force the root part of mmfs.gpfs.rte to ship
    even though no shippable parts have changed.

    LOCAL FIX:
    Add the following line to the /etc/sysctl.mmcmd.acl file

    PROBLEM SUMMARY:
    ***********************************************************
    * USERS AFFECTED: Users migrating to GPFS 1.3 *
    ***********************************************************
    * PROBLEM DESCRIPTION: *
    * The root.SPbgAdm line is not being added to the *
    * /etc/sysctl.mmcmd.acl *
    * file. This results in GPFS commands failing with *
    * insufficient authorization messages. *
    ***********************************************************
    * RECOMMENDATION: Install this APAR *
    ***********************************************************

    TEMPORARY FIX:
    *************************************************************
    * Manually add the SPbgAdm line to the /etc/sysctl.mmcmd.acl*
    * file. For example: *
    *_PRINCIPAL root.SPbgAdmj *
    *************************************************************

    ------

    APAR: IY18963 COMPID: 576554300 REL: 240
    ABSTRACT: INSUFFICIENT STACK FOR KICKPIPES() IN MPCI, CAUSES A PROBLEM

    PROBLEM DESCRIPTION:
    PSSP 3.1.1 introduced the local variables 'shoveq' and 'frq' in
    kickpipes() in MPCI. They need a stack frame of 8192 bytes, but
    only 4096 bytes are available. This causes a problem that
    brings Informix down.

    PROBLEM SUMMARY:
    Informix running on an SP system can fail if a query
    containing 2000 ORs is run. Informix may detect a
    corruption of the header of its stack block pool, and quit.

    PROBLEM CONCLUSION:
    MPCI, which is used by Informix, added a
    couple of large stack variables for shared memory support.
    This causes a problem for Informix, because Informix both uses
    MPCI and manages its own threading and stacks. Informix's
    current management does not account for the addition of 8K of
    additional stack space for the MPCI routines that Informix
    calls. MPCI changed the declaration of these new large
    variables so that they are now located in the heap, instead
    of the stack. The new MPCI implementation solves the problem
    that Informix had working with our MPCI environment and is
    probably the better way for MPCI to handle these large
    variables.

    ------

    APAR: IY18967 COMPID: 5765B9500 REL: 130
    ABSTRACT: README UPDATE RELATIVE TO IBM ESS STORAGE

    PROBLEM DESCRIPTION:
    README update relative to IBM ESS Storage

    ------

    APAR: IY19005 COMPID: 5765C3403 REL: 430
    ABSTRACT: BOS.RTE.INSTALL 4.3.3.51 SHOULD NOT REQUIRE 4.3.3.50

    PROBLEM DESCRIPTION:
    PTF U475744 (bos.rte.install 4.3.3.51) should install
    on any level 4.3 system without any requisite to
    previous level bos.rte.install updates. However, it
    has a requisite to the 4.3.3.50 level.

    PROBLEM CONCLUSION:
    Supersede the bos.rte.install 4.3.3.51 level update with
    the 4.3.3.52 level.

    ------

    APAR: IY19044 COMPID: 5765D5100 REL: 320
    ABSTRACT: PERF. TUNE PAAM BUG FIX AND ADD MX SUPPORT

    PROBLEM DESCRIPTION:
    Perf. Tune PAAM bug fix and add MX support

    PROBLEM SUMMARY:
    Performance degradation after PAAM fix - Colony
    Possible PAAM bug in TB3MX

    PROBLEM CONCLUSION:
    Performance tune KHAL - Colony and TB3MX
    Add full PAAM fix for TB3MX user-space

    ------

    APAR: IY19045 COMPID: 5765D5100 REL: 320
    ABSTRACT: COREDUMP WHEN USE -P OPTION IN THE IFCL_DUMP

    PROBLEM DESCRIPTION:
    coredump when use -p option in the ifcl_dump

    PROBLEM SUMMARY:
    The ifcl_dump command will core dump when it is run with
    the '-p' flag (/usr/lpp/ssp/css/ifcl_dump -p).

    PROBLEM CONCLUSION:
    The ifcl_dump command has been changed to prevent a core
    dump from occurring when ifcl_dump is run with the -p flag.

    ------

    APAR: IY19046 COMPID: 5765D5100 REL: 320
    ABSTRACT: 4M/2M SRAM SUPPORT

    PROBLEM DESCRIPTION:
    4M/2M SRAM support

    PROBLEM SUMMARY:
    The switch IP driver needs to be able to support both
    SP Switch 2 adapters that have 2 megabytes of SRAM, as
    well as adapters that have 4 megabytes of SRAM.

    PROBLEM CONCLUSION:
    The switch IP driver, if_cl, has been changed to support
    SP Switch-2 adapters that have 2 megabytes of SRAM, as
    well as adapters that have 4 megabytes of SRAM.

    ------

    APAR: IY19047 COMPID: 5765D5100 REL: 320
    ABSTRACT: COLONY: DUMP.S NEEDS TO FLUSH ALL 2M OF SRAM

    PROBLEM DESCRIPTION:
    COLONY: dump.s needs to flush all 2M of SRAM

    PROBLEM SUMMARY:
    The dump.s routine needs to flush all 2M of SRAM (or the
    higher 2M of SRAM, if you are working with a 4M adapter).
    Previously, it flushed only 512K. This meant that under
    certain conditions, the microcode logs would include old
    values of some of its variables, rather than what was
    actually being seen by the microcode which was still in
    cache.

    PROBLEM CONCLUSION:
    I altered dump.s to flush the entire 2M SRAM (or high order
    2M of SRAM in a 4M adapter).

    ------

    APAR: IY19048 COMPID: 5765D5100 REL: 320
    ABSTRACT: DELETE EXCEPTION HANDLING CODE FROM COLONY DRIVER SOURCE

    PROBLEM DESCRIPTION:
    delete exception handling code from colony driver source

    PROBLEM SUMMARY:
    The exception handling code for the SP Switch 2 adapter is
    redundant as the adapter will checkstop the node on errors.
    The redundant code and related branches were removed.

    PROBLEM CONCLUSION:
    The exception handling code for the SP-Switch 2 adapter is
    redundant as the adapter will checkstop the node on errors.
    The redundant code and related branches were removed.

    ------

    APAR: IY19300 COMPID: 5765D5100 REL: 320
    ABSTRACT: LAPI/KLAPI PERFORMANCE PROBLEM

    PROBLEM DESCRIPTION:
    LAPI/KLAPI performance problem after PTF9 on SP-Switch2.

    PROBLEM SUMMARY:
    KLAPI performance suffers with PTF9.
    LAPI performance is not on par with MPI.

    PROBLEM CONCLUSION:
    With this fix, LAPI US performance should be as good as MPI.
    KLAPI performance is fixed with this defect.

    ------

    APAR: IY19310 COMPID: 5765E6110 REL: 220
    ABSTRACT: REQUIRED UPDATES FOR RSCT VERSION 2.2

    PROBLEM DESCRIPTION:
    Required updates for RSCT Version 2.2

    PROBLEM SUMMARY:
    These updates must be applied if you are using
    WebSM or have the PSSP or HACMP/ES products installed.

    PROBLEM CONCLUSION:
    These updates must be applied if you are
    using WebSM or have the PSSP or HACMP/ES products installed.

    ------

    APAR: IY19314 COMPID: 5765D5100 REL: 320
    ABSTRACT: SPFRAME MANPAGE NEEDS UPDATE FOR CSP OPTION

    PROBLEM DESCRIPTION:
    spframe manpage needs to be updated for csp option

    PROBLEM SUMMARY:
    Support was added with APAR IY16350 in ssp.basic 3.2.0.9 for
    the RS/6000 H80 and M80. This support includes a new
    hardware protocol of csp on the spframe command. The man
    page for spframe needs to be updated with this information.

    PROBLEM CONCLUSION:
    Updated the spframe man page with the new syntax for
    the RS/6000 H80 and M80, which is:
     spframe -p CSP -n starting_switch_port -r yes | no
              start_frame frame_count starting_tty_port
    CSP was also added to the list of valid hardware protocols.

    ------

    APAR: IY19328 COMPID: 5765B9501 REL: 320
    ABSTRACT: UMASK NOT HONORED THROUGH NFS

    PROBLEM DESCRIPTION:
    umask not honored through nfs

    PROBLEM SUMMARY:
    umask not being honored from nfs.

    PROBLEM CONCLUSION:
    Let lfs and nfs set the file or directory mode
    masked with the umask.
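
    The masking itself follows the usual rule that the resulting
    permission bits are the requested mode with the umask bits
    cleared. A small sketch of that arithmetic, where the
    effective_mode helper is a hypothetical name used only for
    this example and has nothing to do with the GPFS source:

```shell
# effective_mode REQUESTED UMASK
# Both arguments are octal strings; prints REQUESTED & ~UMASK in octal.
effective_mode() {
    # A leading 0 makes the shell read each value as octal.
    printf '%03o\n' $(( 0$1 & ~0$2 & 0777 ))
}

effective_mode 666 022   # prints 644
effective_mode 777 027   # prints 750
```

    A file created with a requested mode of 666 under a umask of
    022 should therefore appear as 644 whether it was created
    locally or through NFS.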

    ------

    APAR: IY19897 COMPID: 5765C3403 REL: 430
    ABSTRACT: AIX 4.3 SECURITY RELATED UPDATES AS OF JUNE 2001

    PROBLEM DESCRIPTION:
    This APAR delivers security related updates for AIX 4.3
    available as of June 2001.
    This is a packaging APAR only. It will not appear in the list
    of APARs on the SMIT "Update Software by Fix (APAR)" panel, nor
    will the 'instfix' command show this APAR as being installed
    after the updates delivered by this package are installed.
    To install selected updates from this package, use the command:
      smit update_by_fix
    To install all updates from this package that apply to installed
    filesets on your system, use the command:
      smit update_all

    PROBLEM SUMMARY:
    Packaging only.

    ------

    APAR: IY19901 COMPID: 5765B8100 REL: 220
    ABSTRACT: ARTIC960 VERSION 1.4.3 UPDATE

    PROBLEM DESCRIPTION:
    Update to Artic drivers to version 1.4.3

    ------

    APAR: IY19908 COMPID: 576552900 REL: 240
    ABSTRACT: LATEST PSSP 2.4 FIXES AS OF JUNE 2001.

    PROBLEM DESCRIPTION:
    Latest PSSP 2.4 fixes as of June 2001.

    PROBLEM SUMMARY:
    This is a packaging apar for PSSP 2.4 fixes as
    of June 2001.

    PROBLEM CONCLUSION:
    This is a packaging apar for PSSP 2.4 fixes
    as of June 2001.

    ------

    APAR: IY19921 COMPID: 5765D5100 REL: 320
    ABSTRACT: LATEST PSSP 3.2.0 FIXES AS OF JUNE 2001

    PROBLEM DESCRIPTION:
    This is the latest PSSP ptf as of June 2001.
    Order this apar to get all of the ptfs as of June 2001.

    PROBLEM SUMMARY:
    This is a packaging apar for PSSP 3.2.0 fixes
    as of June 2001.

    PROBLEM CONCLUSION:
    This is a packaging apar for PSSP 3.2.0
    fixes as of June 2001.

    ------