From: AIX Service Mail Server (aixserv_at_austin.ibm.com)
Date: Tue Oct 08 2002 - 02:43:49 CDT

    has requested a copy or has subscribed to the document named "New_AIXV4_Fixes".
    If you would like to be removed from this mailing list, send e-mail to
    aixserv@austin.ibm.com with a subject of "unsubscribe New_AIXV4_Fixes", or
    send a note to owner-aixserv@austin.ibm.com with your request.

    APAR: IY32602 COMPID: 5765E6900 REL: 310
    ABSTRACT: LLQ AND LLSTATUS DID NOT RETURN A MESSAG

    PROBLEM DESCRIPTION:
    After installation of the PSSP software, the LL commands died
    silently with return code 1. llq and llstatus did not return a
    message, but llctl returned message 2539-510....

    LOCAL FIX:
    you may need to reinitialize trusted services

    PROBLEM SUMMARY:
    When trusted services are not configured,
    LoadLeveler llq and llstatus commands would
    return no data or error messages.

    PROBLEM CONCLUSION:
    LoadLeveler llq and llstatus would return
    correct data even if trusted services
    aren't configured.

    ------

    APAR: IY32609 COMPID: 5765D5100 REL: 340
    ABSTRACT: NODECOND_CHRP FAILS WITH AUTO/AUTO SETTINGS FOR

    PROBLEM DESCRIPTION:
    An install of a pseries 670/690 machine, with enet_rate and
    duplex settings set to auto/auto, fails with the following error
    in the nodecond log along with a led e1f4 node hang:
    Nodecond Status: network type not selected
    Nodecond Status: pSeries 670/690-released OF lock (network boot)
    return code -1 from boot_network
    The problem only seems to be isolated to the 670/690 machines.

    LOCAL FIX:
    Set the enet_rate and/or duplex setting to something other than
    auto.

    PROBLEM SUMMARY:
    netbooting a node with a 10/100 Mbps Ethernet PCI Adapter II
    as en0 will fail if the rate and duplex settings of the
    adapter are set to auto in the SDR. nodecond_chrp needs to
    be modified to handle this adapter.

    PROBLEM CONCLUSION:
    nodecond_chrp has been modified to be able to netboot a
    node with a 10/100 Mbps Ethernet PCI Adapter II as en0
    when the rate and duplex settings of the adapter are set
    to auto in the SDR.

    ------

    APAR: IY32786 COMPID: 5765D5100 REL: 340
    ABSTRACT: XNTPD INOPERATIVE WHEN BROADCASTCLIENT AND SERVER TIMEMASTER

    PROBLEM DESCRIPTION:
    This system has two nodes which had a new install of PSSP 3.4.
    A lssrc -a showed the XNTPD was inoperative.
    The errpt -a shows an entry for failing module xntpd, which had
    a SOFTWARE PROGRAM ERROR, Symptom code 256, Software error code
    -9017 and error code of 0.
        The root of the xntpd failure is that /etc/ntp.conf has
    conflicting lines in it. The file originated in the spmig that
    came with PSSP 3.4 and it already had a line of broadcastclient
    in it. This system required that timemaster be specified and
    that was done via smitty. That invoked spsitenv properly and
    this process added the server timemaster line to ntp.conf
    without removing the conflicting broadcastclient line.
        The customer manually edited the broadcastclient line to
    comment it out. Now xntpd is able to start and continue
    successfully.
    When setting a site environment variable to specify a server
    for NTP, the broadcastclient line should always be removed.

    LOCAL FIX:
    Edit ntp.conf to comment out the broadcastclient line when
    specifying a server for xntp.

    PROBLEM SUMMARY:
    PSSP began using AIX's default ntp.conf file, which has a
    "broadcastclient" line. This line is incompatible with
    server lines. When both are present, it can cause ntp to
    terminate.

    PROBLEM CONCLUSION:
    The broadcastclient line will be commented out by the ntp
    configuration routine when any server lines are added.
    There will also be a one-time commenting at PTF
    installation.
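
    The behavior described in the conclusion can be sketched as follows.
    This is only an illustration, not the PSSP configuration routine:
    the function name is hypothetical, and it assumes that an active
    directive is a line whose first word is "server" or
    "broadcastclient".

```python
def comment_out_broadcastclient(conf_text):
    """Comment out active 'broadcastclient' lines if any 'server' line exists."""
    lines = conf_text.splitlines()
    # An active directive is taken to be the first whitespace-separated word.
    has_server = any(line.split()[:1] == ["server"] for line in lines)
    if not has_server:
        return conf_text
    fixed = []
    for line in lines:
        if line.split()[:1] == ["broadcastclient"]:
            fixed.append("#" + line)   # disable the conflicting directive
        else:
            fixed.append(line)
    trailing = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(fixed) + trailing


sample = "broadcastclient\nserver timemaster\n"
result = comment_out_broadcastclient(sample)
```

    Running the function a second time leaves the text unchanged,
    because an already commented line no longer starts with the
    keyword.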

    ------

    APAR: IY33002 COMPID: 5765E6110 REL: 220
    ABSTRACT: REQUIRED MAINTENANCE UPGRADE

    PROBLEM DESCRIPTION:
    required maintenance upgrade

    ------

    APAR: IY33004 COMPID: 5765E6100 REL: 110
    ABSTRACT: REQUIRED MAINTENANCE UPGRADE

    PROBLEM DESCRIPTION:
    required maintenance upgrade

    ------

    APAR: IY33209 COMPID: 5765D5100 REL: 340
    ABSTRACT: HMREINIT FAILS TO ADD ENTRY FOR ROOT.ADMIN IN HMACLS FILE, IF

    PROBLEM DESCRIPTION:
    If a CWS hostname starts with a number (e.g. 2cws) hmreinit
    will fail to add the root.admin entry for a new frame to the
    hmacls file, if the frame number matches the number at the
    beginning of the hostname (frame 2 in this case).
    This is caused by the following statement in hmreinit:
    if [ -z `/bin/grep "^ *$frame_number[^0-9]" $HMACLS` ]

    LOCAL FIX:
    Add the missing line manually to /spdata/sys1/spmon/hmacls file:
    frame# root.admin vsm.
    stopsrc -s hardmon
    startsrc -s hardmon

    PROBLEM SUMMARY:
    If the hostname of the Control Workstation begins with
    a number that matches a frame number, hmreinit fails to add
    all the required entries to /spdata/sys1/spmon/hmacls for
    that frame.

    PROBLEM CONCLUSION:
    hmreinit has been modified to handle the case where the
    hostname of the Control Workstation begins with a number
    that matches a frame number. hmreinit will now add
    all the required entries to /spdata/sys1/spmon/hmacls for
    that frame.
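
    The false match can be illustrated with a small sketch. It assumes
    that the test in hmreinit looked for a line starting with the frame
    number followed by any non-digit character; that pattern is a
    reconstruction, and the hostname entry below is hypothetical. The
    stricter pattern is one possible way to avoid the collision, not
    necessarily the change made in the fix.

```python
import re


def loose_match(frame_number, line):
    # Assumed original test: frame number at line start, then any non-digit.
    return re.search(r"^ *%d[^0-9]" % frame_number, line) is not None


def strict_match(frame_number, line):
    # Stricter variant: the frame number must be a whole field of its own.
    return re.search(r"^ *%d\s" % frame_number, line) is not None


hostname_line = "2cws root.admin a"   # hypothetical entry for a CWS named 2cws
frame_line = "2 root.admin vsm"       # frame entry, as in the local fix

loose_hits_hostname = loose_match(2, hostname_line)
strict_hits_hostname = strict_match(2, hostname_line)
strict_hits_frame = strict_match(2, frame_line)
```

    With the loose pattern the hostname entry is mistaken for an
    existing frame 2 entry, so no new line would be added.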

    ------

    APAR: IY33223 COMPID: 5765D5100 REL: 340
    ABSTRACT: DURING SYSMAN_TEST MESSAGE 0037-014 IS ISSUED FOR PPP CONNECTION

    PROBLEM DESCRIPTION:
    While running SYSMAN_test the following is issued when the CWS
    has a pp0 adapter (used by Service Agent).
    SYSMAN_test: 0037-014 Control workstation IP addresses in SDR
      do not match netstat output
    # netstat -in
    ...
    pp0* 1500 link#4
    pp0* 1500 0 0.0.0.0 <==
    ...
    For SYSMAN_test these messages can be ignored but the logic
    should be changed to tolerate the PPP adapter.
    See APAR IY31780 for similar symptoms.

    PROBLEM SUMMARY:
    When the Point-to-Point Protocol (PPP) is being used on
    a Control Workstation, SYSMAN_test will issue the
    following message(s):
    SYSMAN_test: 0037-014 Control workstation IP addresses
                in SDR do not match netstat output
    Since the Point-to-Point Protocol is being displayed in
    the netstat -in data, SYSMAN_test tries to match it
    with data in the SDR and fails. The data from the
    Point-to-Point Protocol should be ignored by SYSMAN_test.

    PROBLEM CONCLUSION:
    SYSMAN_test has been modified to skip lines of data from
    netstat -in which refer to the Point-to-Point Protocol.
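
    A minimal sketch of skipping such lines, assuming that PPP
    interfaces can be recognized by an interface name beginning with
    "pp" (as in the pp0 rows above). The function name and the en0 row
    are hypothetical; this is not the SYSMAN_test code.

```python
def non_ppp_lines(netstat_lines):
    """Drop netstat -in rows whose interface name starts with 'pp'."""
    kept = []
    for line in netstat_lines:
        fields = line.split()
        if fields and fields[0].startswith("pp"):
            continue   # PPP interface: not expected to match SDR data
        kept.append(line)
    return kept


rows = [
    "en0   1500  192.168.1   192.168.1.10",   # hypothetical Ethernet row
    "pp0*  1500  link#4",
    "pp0*  1500  0           0.0.0.0",
]
remaining = non_ppp_lines(rows)
```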

    ------

    APAR: IY33247 COMPID: 5765D5100 REL: 340
    ABSTRACT: CSHUTDOWN NODE HANG ON CSCONTROL

    PROBLEM DESCRIPTION:
    cshutdown node hang on cscontrol

    PROBLEM SUMMARY:
    The cshutdown code was first obtaining DCE credentials. If
    it received the DCE credentials it did not proceed to
    obtaining the K4 credentials. When shutdown is run on the
    K4 nodes, the shutdown receives an error due to the lack of
    K4 credentials.

    PROBLEM CONCLUSION:
    The cshutdown code was changed to obtain both DCE
    credentials and K4 credentials if DCE and (k4) compat is
    configured on the SP.

    ------

    APAR: IY33264 COMPID: 5765D5100 REL: 340
    ABSTRACT: MICROCODE SHOULD TURN OFF TBIC PORT ON P750

    PROBLEM DESCRIPTION:
    Microcode should turn off TBIC port on P750

    PROBLEM SUMMARY:
    When the CEC is powered off the adapter continues to run
    until it runs out of receive buffers. Since the CEC is
    powered off one of the adapter DMAs fails and the microcode
    takes an exception. Since host side recovery cannot run the
    switch network backs up.

    PROBLEM CONCLUSION:
    In its exception handler routine the microcode turns off the
    adapter switch port, causing switch error recovery to be
    invoked, which in turn bit-buckets all packets destined for
    this adapter.

    ------

    APAR: IY33415 COMPID: 5765E6900 REL: 310
    ABSTRACT: LOADL CANNOT REMOVE A RP JOB

    PROBLEM DESCRIPTION:
    One machine in the LL pool crashed, which left the two jobs
    that had been running on that machine in the LL queue.
    LL on the machine came back after the reboot, but llstatus
    showed that resources were in use and no new jobs would start.
    An llcancel put the jobs in RP state, and the resources on the
    machine were not freed. One job had been submitted from this
    machine; it disappeared from the system after the job_queue
    files in spool/ were deleted and LL was recycled on this
    machine. The second job, submitted from another machine,
    persists in the queue as RP, and its resources remain blocked.

    PROBLEM SUMMARY:
    When LoadLeveler came back up after a crash,
    the job previously in suspended state was gone
    but llq still showed it as running.
    Running llcancel could only set the job
    state to RP without truly removing it.

    PROBLEM CONCLUSION:
    When LoadLeveler came back up after a crash,
    the job previously in suspended state is now
    able to run. And llcancel will be able to
    kill the job.

    ------

    APAR: IY33428 COMPID: 5765D5100 REL: 340
    ABSTRACT: RVSDS SOMETIMES DONT COME UP ON REBOOT DUE TO UNRELIABLE PARSE

    PROBLEM DESCRIPTION:
    rvsd startup on reboot sometimes fails with a strange
    error message from ha.vsd, pointing to a syntax error
    in line 1079 of the PSSP 3.4 PTF set 10 level.
    In this line the PID of srcmstr is computed via the
    ps command output piped to grep.
    This is unreliable, since it sometimes comes up with more
    than one PID.
    There is an open defect (82415) pokcmvc, which deals with
    exactly the same problem for another release.

    LOCAL FIX:
    Either reboot again,
    or fix that line to make sure only the srcmstr PID is
    grepped.

    PROBLEM SUMMARY:
    A line in ha.vsd that does a grep on srcmstr to determine
    if the rvsd daemon was started via srcmstr is not as
    robust as it could be. It is possible for it to pick up
    multiple process ids, which result in the rvsd daemon
    being unable to start. In this case a Syntax error from
    ha.vsd is written to the console log.

    PROBLEM CONCLUSION:
    ha.vsd has been modified to do a more efficient
    check to determine the process id of srcmstr.
    This check should prevent the syntax error from
    ha.vsd which prevents the rvsd daemon from starting.
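
    Why a plain grep can return several PIDs, and how an exact match on
    the command name avoids it, can be sketched as follows. The listing
    format (PID followed by the command) and the sample PIDs are
    assumptions for illustration; this is not the check used in ha.vsd.

```python
def substring_pids(ps_output, name):
    """Mimic 'ps | grep name': any line containing the name is a hit."""
    return [int(line.split()[0]) for line in ps_output.splitlines()
            if name in line and line.split()[0].isdigit()]


def exact_pids(ps_output, name):
    """Return only PIDs whose command is exactly the given name."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0].isdigit() and fields[1].strip() == name:
            pids.append(int(fields[0]))
    return pids


# Hypothetical listing: the grep process itself also mentions srcmstr.
sample = """  PID COMMAND
 4130 srcmstr
 9876 grep srcmstr
"""
loose = substring_pids(sample, "srcmstr")
exact = exact_pids(sample, "srcmstr")
```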

    ------

    APAR: IY33544 COMPID: 5765D5100 REL: 340
    ABSTRACT: SDR_CONFIG SETS PSSP LEVEL TO PSSP 3.2 FOR REG LPARS

    PROBLEM DESCRIPTION:
    sdr_config sets PSSP level to pssp 3.2 for reg lpars

    PROBLEM SUMMARY:
    If a CWS has been migrated to PSSP 3.4 from an earlier
    level, defining adapters for pSeries 670/690 nodes by
    specifying the physical location codes may fail.

    PROBLEM CONCLUSION:
    The fix will allow you to define adapters by using physical
    location codes on pSeries 670/690 nodes. A warning will
    be issued by spadaptrs if the PSSP level or code version
    specified for the nodes are not at 3.4. This may happen
    if the CWS has been migrated from an earlier level to
    3.4.

    ------

    APAR: IY33550 COMPID: 5765D5100 REL: 340
    ABSTRACT: S70D DAEMON DIES UNEXPECTED. HARDMON MUST BE STOPPED AND RESTART

    PROBLEM DESCRIPTION:
    SP attached server S80/S85. s70d dies unexpectedly. The following
    messages are in /var/adm/SPlogs/spmon/s70/s70d.3.log.xxx:
    s70d 3 : 0026-500I s70d daemon started on device "/dev/tty7" (Frame 3) at Sat May 18 09:59:29 2002
    s70d 3 : 0026-507I Entered main processing loop
             SAMI Firmware Level (mm/dd/yy): 8/31/99
    s70d 3 : 0026-522 ioctl() was unsuccessful: Resource temporarily unavailable (11)
    s70d 3 : 0026-502I s70d daemon ended (2) on device "/dev/tty7"

    PROBLEM SUMMARY:
    An ioctl failure is causing the s70d to terminate. In the
    log file /var/adm/SPlogs/spmon/s70/s70d.x.log.yyy will be
    the messages:
    0026-522 ioctl() was unsuccessful:
    0026-502I s70d daemon ended (x) on device "/dev/ttyx"
    The s70d should be modified to not terminate if there
    is an ioctl failure.

    PROBLEM CONCLUSION:
    The s70d has been modified to not issue message 0026-522
    when a call to ioctl is unsuccessful and to not
    terminate. The ioctl will either succeed on a subsequent
    retry, or will cause another terminating error to occur.
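
    The retry idea can be sketched in general terms. Error 11
    ("Resource temporarily unavailable") is EAGAIN; the helper below
    retries a callable when it fails with that error and re-raises any
    other error. The helper name, the attempt limit, and the delay are
    assumptions for illustration, not the s70d implementation.

```python
import errno
import time


def call_with_retry(operation, attempts=5, delay=0.0):
    """Call operation(); retry while it fails with EAGAIN."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            # Any other error, or the final failed attempt, is passed on.
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
            time.sleep(delay)


calls = {"n": 0}


def flaky_ioctl():
    # Stand-in for an ioctl that is temporarily unavailable twice.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
    return "ok"


outcome = call_with_retry(flaky_ioctl)
```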

    ------

    APAR: IY33616 COMPID: 5765D5100 REL: 340
    ABSTRACT: INCORRECT LPAR PARTITION STATE OF 'INITIALIZING' IN NODE STATUS

    PROBLEM DESCRIPTION:
    incorrect lpar partition state of 'initializing' in node status

    PROBLEM SUMMARY:
    The partition state for LPARs on the Regatta machine
    continues to display yellow with 'initializing' on
    the Node Status page of the Node notebook even after reboot
    is complete.

    PROBLEM CONCLUSION:
    The hmcd daemon was returning the Regatta states
    'initializing' and 'running' incorrectly. It was returning
    'initializing' when it should have been returning 'running'
    and vice versa. The hmcd daemon passes this information
    to the Hardware Monitor, which through Event Management,
    passes the information to the Perspectives GUI. The
    Hardware Monitor also passes this information to clients
    such as the hmmon command line interface.

    ------

    APAR: IY33670 COMPID: 5765D5100 REL: 340
    ABSTRACT: TASK_ID WRONG AND PORT DISABLE BEFORE INTERNAL WRAP SET

    PROBLEM DESCRIPTION:
    task_id wrong and port disable before internal wrap set

    PROBLEM SUMMARY:
    For Corsair Adapter Diagnostics:
    Running diagnostics fails the dma_test after the adapter
    has been unfenced or has been connected to the switch.
    The cause was that an incorrect parameter for the
    network table was being used as a destination id, and
    the microcode was throwing out packets because, after
    the adapter is unfenced, the network table is
    populated by real routes.
    The adapter also reports false link errors when
    the diagnostics are run after the adapter has been
    unfenced. The cause was that the port is still
    active when the TBIC internal wrap mode is turned on,
    which in turn generates the link interrupts.

    PROBLEM CONCLUSION:
    The correct parameter to the dma test is now being used
    so diagnostics will no longer fail even if the network
    table is populated (after an unfence). The port is now
    being disabled prior to enabling the TBIC internal
    wrap mode and the spurious link errors are no longer
    being generated.

    ------

    APAR: IY33671 COMPID: 5765D5100 REL: 340
    ABSTRACT: PROBLEM IN ZERO SDRAM

    PROBLEM DESCRIPTION:
    problem in zero sdram

    PROBLEM SUMMARY:
    The last segment of SDRAM is not initialized to zero.

    PROBLEM CONCLUSION:
    The last segment of SDRAM is now initialized to zero.

    ------

    APAR: IY33672 COMPID: 5765D5100 REL: 340
    ABSTRACT: RH3 MPV:MISSING PART NUMBER FOR TB3 PCI ADAPTER

    PROBLEM DESCRIPTION:
    rh3 mpv:missing part number for tb3 pci adapter

    PROBLEM SUMMARY:
    VPD for pci adapter is not available for diagnostic
    controller to pick up.

    PROBLEM CONCLUSION:
    Get vpd data from adapter and save it in CuVPD during card
    config. Now, when a problem is detected while running
    diagnostics, Diagnostic Controller will get FRU from CuVPD
    and report the problem.

    ------

    APAR: IY33789 COMPID: 5765B9500 REL: 150
    ABSTRACT: ASSERT IN SGMGR.C ON LINE 2018

    PROBLEM DESCRIPTION:
    assert in sgmgr.c on line 2018

    PROBLEM SUMMARY:
    assert in sgmgr.C on line 2018. tsdf -q called
    relPermissionToRun without first calling getPermissionToRun

    PROBLEM CONCLUSION:
    tsdf -q should not call relPermissionToRun
    since it did not call getPermissionToRun

    ------

    APAR: IY33790 COMPID: 5765B9501 REL: 340
    ABSTRACT: MODIFY TSDF AND TSCHDISK CONFLICTS

    PROBLEM DESCRIPTION:
    modify tsdf and tschdisk conflicts

    PROBLEM SUMMARY:
    Someone changed disks in Being-emptied state to Suspended
    state during a mmdeldisk command and the results may not be
    what was expected.

    PROBLEM CONCLUSION:
    Tweak conflict matrix to have the tsdf and chdisk commands
    fail instead of wait for restripe to finish. Allow tsdf with
    the -q option to run without permissions since it only
    displays in-memory tables.

    ------

    APAR: IY33791 COMPID: 5765B9501 REL: 340
    ABSTRACT: MSGS. GARBAGE IN GPFS LOG FILE

    PROBLEM DESCRIPTION:
    msgs. garbage in gpfs log file

    PROBLEM SUMMARY:
    Error messages printed thru RSCT contained garbage data

    PROBLEM CONCLUSION:
    Length of the message string passed to RSCT needs to include
    the trailing zero.
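
    The effect of leaving the terminator out of the length can be
    imitated with byte buffers. This is a sketch under assumptions: the
    helper names, the message text, and the leftover "?" bytes are
    hypothetical, and the real code passes a C string to RSCT rather
    than using Python buffers.

```python
def read_c_string(buffer):
    """Read bytes up to the first NUL, as a C string consumer would."""
    return bytes(buffer[:buffer.index(0)])


def copy_message(dest, message, length):
    """Copy `length` bytes of the NUL-terminated message into dest."""
    data = message + b"\x00"
    dest[:length] = data[:length]


msg = b"disk failed"

stale = bytearray(b"?" * 15 + b"\x00")   # buffer still holding old bytes
copy_message(stale, msg, len(msg))       # terminator not counted
short = read_c_string(stale)             # runs on into the old bytes

fresh = bytearray(b"?" * 15 + b"\x00")
copy_message(fresh, msg, len(msg) + 1)   # terminator counted
full = read_c_string(fresh)              # stops at the copied NUL
```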

    ------

    APAR: IY33792 COMPID: 5765B9501 REL: 340
    ABSTRACT: INODES MISSING AFTER INODE FILE EXPANSION

    PROBLEM DESCRIPTION:
    inodes missing after inode file expansion

    PROBLEM SUMMARY:
    Inodes missing after inode file expansion

    PROBLEM CONCLUSION:
    After inode file expansion, when marking new inodes as
    available in the inode map, must update segment hints
    accordingly. Otherwise, some of the new inodes may be
    unavailable until next remount.

    ------

    APAR: IY33793 COMPID: 5765B9501 REL: 340
    ABSTRACT: LONG WAITERS ON MMDELFS 'WAITING FOR SG CLEANUP'

    PROBLEM DESCRIPTION:
    long waiters on mmdelfs 'waiting for sg cleanup'

    PROBLEM SUMMARY:
    The RPC handler should let the syncClient call use the sg
    even if cleanupInProgress is set.

    PROBLEM CONCLUSION:
    sgmMsgSGUmount handler, having finished its work, calls
    EndUse, and useCount for fs2 goes to zero, triggering
    cleanup, where cleanupInProgress flag is set. During
    cleanup, syncFS is called, which calls
    QuotaClient::SyncQuotaClt, which does internal RPC handled
    by SGHandleQuotaMgrMsg. In this handler, useStripeGroup is
    called, and for all messages that are not quotaMsgEndClient
    it passes the USE_WAIT_FOR_CLEANUP flag, thus making
    useStripeGroup block because cleanupInProgress is set.

    ------

    APAR: IY33794 COMPID: 5765B9500 REL: 150
    ABSTRACT: MMSDRFS 0 LENGTH AFTER PANIC

    PROBLEM DESCRIPTION:
    mmsdrfs 0 length after panic

    PROBLEM SUMMARY:
    After node crash the mmsdrfs file was made zero
    length

    PROBLEM CONCLUSION:
    Need to do sync after updating key files

    ------

    APAR: IY33796 COMPID: 5765B9501 REL: 340
    ABSTRACT: DIO TRACING FOR TOOLS

    PROBLEM DESCRIPTION:
    dio tracing for tools

    PROBLEM SUMMARY:
    Need DIO tracing

    PROBLEM CONCLUSION:
    Add a DIOQIO trace so that tools for analyzing IO events can
    match QIO/FIO pairs.

    ------

    APAR: IY33797 COMPID: 5765B9501 REL: 340
    ABSTRACT: INVALIDATE DISKS WHEN MMADDISK FAILS

    PROBLEM DESCRIPTION:
    invalidate disks when mmaddisk fails

    PROBLEM SUMMARY:
    When new disks are added, new segments must be added to the
    allocation map. If there is no space left for the new
    segments the operation stops. But after space is freed, the
    retry of mmadddisk says "Are you sure, these disks appear to
    be in use", so the -v no option of mmadddisk must be used to
    override the check.

    PROBLEM CONCLUSION:
    When completeDeleteDisks finds disks that were being added
    (but add failed), it needs to have changeDiskStates
    invalidate the disk and SG descriptors when it changes the
    state to BeingDeletedFromAllocMap so that the disks do not
    look like they belong to a SG anymore.

    ------

    APAR: IY33798 COMPID: 5765B9501 REL: 340
    ABSTRACT: VALIDATE ALLOC SEG WRITES

    PROBLEM DESCRIPTION:
    validate alloc seg writes

    PROBLEM SUMMARY:
    Sometimes on a loaded system alloc segment data is getting
    corrupted.

    PROBLEM CONCLUSION:
    To catch alloc segment data corruption, if
    AssertOnStructureError is set, code is added to validate
    alloc segment buffer that was just written still has the
    correct checksum, etc. This would verify whether the data
    was corrupted in GPFS between the checksum calculation and
    the write to disk, or by the disk IO subsystem, et al.
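
    The validation idea can be sketched as: checksum the buffer, write
    it, then check that the buffer still has the same checksum. CRC32
    stands in here for whatever checksum GPFS actually uses, and the
    function names are hypothetical.

```python
import zlib


def write_with_validation(buffer, write):
    """Checksum the buffer, write it, then confirm it still matches."""
    expected = zlib.crc32(bytes(buffer))
    write(buffer)
    # A mismatch means the buffer changed between checksum and write completion.
    return zlib.crc32(bytes(buffer)) == expected


def clean_write(buf):
    pass                      # stand-in for a write that leaves the buffer alone


def corrupting_write(buf):
    buf[0] ^= 0xFF            # stand-in for corruption during the write


clean_ok = write_with_validation(bytearray(b"alloc segment"), clean_write)
corrupt_ok = write_with_validation(bytearray(b"alloc segment"), corrupting_write)
```

    A failed check only shows that the in-memory buffer changed; if it
    passes and the data on disk is still bad, the I/O subsystem becomes
    the more likely suspect.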

    ------

    APAR: IY33814 COMPID: 5765D5100 REL: 340
    ABSTRACT: SPLED FAILS

    PROBLEM DESCRIPTION:
    spled fails

    PROBLEM SUMMARY:
    The problem is caused by RegattaH (HMC) frames. It
    only happens when spled is started immediately after
    hardmon is started, because it takes longer for hardmon to
    connect to the HMC. In that case spled displays only the
    non-HMC frames at first and eventually fails when it
    finds the other frames, since there is no space for the
    newly found frames.

    PROBLEM CONCLUSION:
    Here is how the fix works:
    If you start spled immediately after starting hardmon, spled
    will display all frames (non-HMC and HMC). However, it only
    displays LEDs/LCDs for non-HMC frames, and HMC frames will be
    blank at the beginning. It takes about 30 seconds to
    display LEDs/LCDs for HMC frames, since it takes time for
    hardmon to connect to the HMC frames initially. Note: switch
    frames, if the system has any, will be blank, since there
    are no LCDs/LEDs for switch frames.

    ------

    APAR: IY33829 COMPID: 5765D5100 REL: 340
    ABSTRACT: DCR - SPMKVGOBJ SHOULD GIVE WARNING NOT FAIL IF IMAGE IS NOT IN

    PROBLEM DESCRIPTION:
    Currently spmkvgobj fails with a non-0 return code if the image
    specified by the -i flag cannot be found in
       /spdata/sys1/install/images
    This can be inconvenient or unrealistic for large customers that
    keep a separate install image for each node, but store them
    outside spdata when not working on them. Since all spmkvgobj is
    doing is entering data and not actually using the image, it
    could be made to issue a warning if the image is not there but
    still exit successfully (with rc 0).

    LOCAL FIX:
    Touch the needed filename in the spdata directory to fake
    spmkvgobj out (it just checks for filename existence, doesn't
    confirm that it is a valid mksysb).

    PROBLEM SUMMARY:
    spmkvgobj and spchvgobj currently issue an error and
    terminate if the install image specified does not exist in
    /spdata/sys1/install/images/.
    Since the install image is not being used at this point,
    the check should be modified to just issue an
    informational message if the install image does not exist
    and continue processing.

    PROBLEM CONCLUSION:
    spmkvgobj and spchvgobj have been modified to only issue
    an informational message if the install image specified
    does not exist. Later processing in spbootins and
    mknimres will continue to issue error messages if the
    install image is still not present when it is required.
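
    The two stages can be sketched as one check with a flag: at
    definition time a missing image only produces an informational
    message, while a later stage that needs the image reports an
    error. The function name, the image name, and the message wording
    are hypothetical.

```python
import os
import tempfile


def check_install_image(image_name, image_dir, required=False):
    """Return (rc, message) for the presence of an install image."""
    path = os.path.join(image_dir, image_name)
    if os.path.exists(path):
        return 0, ""
    if required:
        return 1, "error: install image %s not found" % path
    return 0, "informational: install image %s not found" % path


image_dir = tempfile.mkdtemp()   # stands in for /spdata/sys1/install/images
early = check_install_image("bos.obj.node1", image_dir)
late = check_install_image("bos.obj.node1", image_dir, required=True)
```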

    ------

    APAR: IY33844 COMPID: 5765D5100 REL: 340
    ABSTRACT: SPREBUILDSYSMAP LEAVES SYSTEM WITHOUT SYSPAR_MAP ENTRIES

    PROBLEM DESCRIPTION:
    The command /usr/lpp/ssp/bin/sprebuildsysmap is missing an
    absolute path for the command to be issued on line 60, which is
    as follows:
    $command = "SDR_config -l";
    This line should be:
    $command = "/usr/lpp/ssp/install/bin/SDR_config -l";
    The script is assuming /usr/lpp/ssp/install/bin is in the
    System Administrator's PATH and this is not always a valid
    assumption.
    Because of this, the sprebuildsysmap command is failing and
    leaving the system with no entries in the Syspar_map class.

    LOCAL FIX:
    The System Administrator should include /usr/lpp/ssp/install/bin
    in the PATH as a workaround.

    PROBLEM SUMMARY:
    When the sprebuildsysmap command invokes SDR_config it
    does not specify its full path. If the user's PATH does
    not contain /usr/lpp/ssp/install/bin/, the call to
    SDR_config will fail and the Syspar_map will not be
    recreated.

    PROBLEM CONCLUSION:
    sprebuildsysmap was modified to specify the full path
    for all commands that it calls.
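
    The same principle, sketched outside the script: build the command
    from an absolute path so that it does not depend on the caller's
    PATH. The helper name is hypothetical; the SDR_config path is the
    one given in the problem description.

```python
import os

SDR_CONFIG = "/usr/lpp/ssp/install/bin/SDR_config"   # path from the APAR text


def build_command(executable, *args):
    """Build an argument list, refusing executables that rely on PATH."""
    if not os.path.isabs(executable):
        raise ValueError("executable must be an absolute path: %r" % executable)
    return [executable, *args]


command = build_command(SDR_CONFIG, "-l")

try:
    build_command("SDR_config", "-l")
    relative_rejected = False
except ValueError:
    relative_rejected = True
```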

    ------

    APAR: IY33854 COMPID: 5765D5100 REL: 340
    ABSTRACT: XMEMPIN SERVICE FOR 64BIT U_CLIENT

    PROBLEM DESCRIPTION:
    xmempin service for 64bit u_client

    PROBLEM SUMMARY:
    64bit user clients pass in a 64bit shm_p to functions
    xmempin/xmemunpin. Since these functions only accept a
    32bit shm_p on 32bit kernel, shm_p might be corrupted.

    PROBLEM CONCLUSION:
    Before passing shm_p to xmempin/xmemunpin, use as_remap64()
    to remap the 64bit shm_p to 32bit.

    ------

    APAR: IY33906 COMPID: 5765D5100 REL: 340
    ABSTRACT: MODS NEEDED IN HACWS PRE/POST EVENTS

    PROBLEM DESCRIPTION:
    mods needed in hacws pre/post events

    PROBLEM SUMMARY:
    HACWS changes needed to support hacmp 4.5

    PROBLEM CONCLUSION:
    Modified the HACWS pre- and post-event
    scripts to correctly handle changes in HACMP 4.5.

    ------

    APAR: IY33925 COMPID: 5765D5100 REL: 340
    ABSTRACT: GET_FILE_CHECKSUM CAN LEAVE OPEN FILE DESCRIPTORS

    PROBLEM DESCRIPTION:
    A node may be fenced off the switch if there are too many open
    file descriptors. Symptoms include the following messages in
    the fs_daemon_print.file:
            get_file_checksum: fopen failed, errno = 24
            2547-677 topo did not rebuild correctly
            Turning off this nodes switchResponds bits in the SDR
    To recover, run rc.switch and Eunfence the node.

    PROBLEM SUMMARY:
    Nodes on the SP Switch-2 can drop off the switch after the
    switch Eprimary node has done many switch service operations
    (e.g. Efence/Eunfence/Estart).

    PROBLEM CONCLUSION:
    The fault service daemon has been changed to prevent files
    from being left open after topology file distribution on
    the SP Switch-2 occurs.
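
    Errno 24 is EMFILE ("Too many open files"), which is what a
    process sees once leaked descriptors exhaust its limit. A minimal
    sketch of a checksum helper that cannot leak its descriptor is
    shown below. It is not the fault service daemon's routine: the
    function name, the use of SHA-256, and the sample data are
    assumptions.

```python
import hashlib
import os
import tempfile


def file_checksum(path):
    """Return a hex digest of the file; the file is closed on every path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:          # closed even if read() raises
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Call it many times; a leaked descriptor per call would soon hit the limit.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"topology data")
os.close(fd)
checksums = [file_checksum(demo_path) for _ in range(2000)]
os.remove(demo_path)
```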

    ------

    APAR: IY34137 COMPID: 5765D9300 REL: 320
    ABSTRACT: _GETODMNN MAY CORRUPT MEMORY

    PROBLEM DESCRIPTION:
    _getodmnn may corrupt memory

    PROBLEM SUMMARY:
    During MPI_Init(), odm_set_path() is called for gathering
    info from ODM database. The returned memory pointer is kept
    in a variable for memory releasing. The variable was not
    initialized and the validity of its value was not checked
    before it being used in memory freeing. If ODM function
    calls fail for some reason, invalid memory pointer could be
    used in the memory freeing and cause memory corruption.

    PROBLEM CONCLUSION:
    Initialize the variable to NULL and check whether memory is
    allocated before it is freed.

    ------

    APAR: IY34141 COMPID: 5765D5100 REL: 340
    ABSTRACT: SPFRAME NOT GIVING ERROR WHEN IT SHOULD

    PROBLEM DESCRIPTION:
    spframe not giving error when it should

    PROBLEM SUMMARY:
    Running spframe to attach a CSP protocol server to a
    switchless SP system gives unexpected behavior. The '-n'
    flag is required in this situation, but leaving it out does
    not produce an error message as it should.
    The problem was caused by a flaw in the routine
    that determines if a system is partitionable or not.

    PROBLEM CONCLUSION:
    The routine that determines if a system is partitionable was
    fixed to resolve this defect.

    ------

    APAR: IY34143 COMPID: 5765D5100 REL: 340
    ABSTRACT: POWER ON COMMANDS TO NODE1 OF P690 OR P670 DO NOT SUCCEED

    PROBLEM DESCRIPTION:
    Power on commands to node 1 of a p690 or p670 do not succeed

    PROBLEM SUMMARY:
    Power on commands to "node 1" of a p690 or p670 server
    in SMP mode do not succeed. If you issue
    spmon -power on node1
    for an SMP mode p690 or p670 server, the following
    error message is issued:
    spmon: 0026-068 Frame 1 is powered off.
          Unable to power on devices in the frame.
    spmon: 0026-025 Power command ended in error.

    PROBLEM CONCLUSION:
    The library used by spmon has been modified to allow
    spmon -power on commands to "node 1" of a p690 or p670
    server in SMP mode to succeed.

    ------

    APAR: IY34151 COMPID: 5765D5100 REL: 340
    ABSTRACT: ADD CRUISER SUPPORT

    PROBLEM DESCRIPTION:
    add cruiser support

    PROBLEM SUMMARY:
    Support for the cruiser adapter is needed.

    PROBLEM CONCLUSION:
    PdDv and PdAt information for cruiser is added to
    the corsair.add file so that the cruiser adapter will
    be recognized.

    ------

    APAR: IY34152 COMPID: 5765D5100 REL: 340
    ABSTRACT: KLAPI/DMA FAILURE

    PROBLEM DESCRIPTION:
    klapi/dma failure

    PROBLEM SUMMARY:
    A timing hole in the Zero Copy retransmission path of KLAPI can
    cause garbage to be DMAed from system memory. The packet will
    be dropped on the other side, but the hole needs to be fixed so
    that bad data is not DMAed from system memory. This only
    happens when multiple retransmit requests are processed for a
    given message.

    PROBLEM CONCLUSION:
    The fix is to not send a duplicate zero copy packet in
    KLAPI retransmission logic if the originator of the zero
    copy packet sends an acknowledgement that it has completely
    received the packet.

    ------

    APAR: IY34181 COMPID: 5765D5100 REL: 340
    ABSTRACT: SPGETDESC: REGATTA SUPPORT ENHANCEMENTS

    PROBLEM DESCRIPTION:
    spgetdesc:regatta support enhancements

    PROBLEM SUMMARY:
    Some Regatta support enhancements are required, and are
    provided in this apar.

    PROBLEM CONCLUSION:
    The Regatta support enhancements have been provided.

    ------

    APAR: IY34248 COMPID: 5765E6100 REL: 510
    ABSTRACT: LIBPERFSTAT DISK ERROR/MEMORY LEAK

    PROBLEM DESCRIPTION:
    Users of the libperfstat.a disk API function will
    experience a memory leak when consumers run
    as root.

    PROBLEM SUMMARY:
    Users of libperfstat disk API may
    experience a memory leak when
    consumers run as root

    ------

    APAR: IY34339 COMPID: 5765B9501 REL: 340
    ABSTRACT: DEADLOCK HAPPENS WHEN A BYTE-RANGE LOCK IS TAKEN OUT ON A FILE.

    PROBLEM DESCRIPTION:

    PROBLEM SUMMARY:
    Deadlock occurred when a byte-range lock was taken out on a
    file.

    PROBLEM CONCLUSION:
    A kernel thread executing lockGetattr may deadlock with an
    InodePrefetchWorker thread. inodeUsed is called while
    holding InodeCacheObj mutex, where an attempt is made to
    acquire ipsMutex. The InodePrefetchWorker thread, while
    holding ipsMutex, calls relAllLocks, which may eventually
    lead to a brUnlock call where InodeCacheObj mutex is
    acquired. The fix is for InodePrefetchWorker to temporarily
    drop ipsMutex while calling relAllLocks.
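
    The lock ordering and the fix can be imitated with two ordinary
    locks. This is a sketch, not GPFS code: the two locks stand in for
    the InodeCacheObj mutex and ipsMutex, and the function names only
    echo the routines named above.

```python
import threading

inode_mutex = threading.Lock()   # stands in for the InodeCacheObj mutex
ips_mutex = threading.Lock()     # stands in for ipsMutex


def lock_getattr():
    # Takes the inode mutex first, then the ips mutex.
    with inode_mutex:
        with ips_mutex:
            pass


def rel_all_locks():
    # May eventually need the inode mutex (as brUnlock does).
    with inode_mutex:
        pass


def prefetch_worker():
    ips_mutex.acquire()
    try:
        ips_mutex.release()      # drop it so nothing is held across the call
        try:
            rel_all_locks()
        finally:
            ips_mutex.acquire()  # take it back for the remaining work
    finally:
        ips_mutex.release()


threads = [threading.Thread(target=func, daemon=True)
           for func in (lock_getattr, prefetch_worker) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
finished = not any(t.is_alive() for t in threads)
```

    Because the worker holds no lock while it waits for the inode
    mutex, the two code paths can no longer wait on each other in a
    cycle.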

    ------

    APAR: IY34343 COMPID: 5765D5100 REL: 340
    ABSTRACT: HANG:USER-SPACE SIGNAL LIBRARIES

    PROBLEM DESCRIPTION:
    hang:user-space signal libraries

    PROBLEM SUMMARY:
    This problem shows up when one process's send tail is out of
    sync with another process's send head. When the send fifo is
    full, under certain circumstances the former process thinks
    there is still space available in the send fifo. So it
    continues to write into the fifo so that the send tail passes
    the send head, which should never happen.

    PROBLEM CONCLUSION:
    To solve this problem, an extra check is
    performed under those circumstances so that the process
    stops writing to the fifo when it is full.
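
    The invariant (the tail must never pass the head) can be shown
    with a small ring buffer. This is a single-process sketch with
    hypothetical names; the real send fifo is shared between processes
    and the adapter, which this does not model.

```python
class SendFifo:
    """Fixed-size ring buffer: head is the drain index, tail the write index."""

    def __init__(self, slots):
        self.slots = [None] * slots
        self.head = 0      # next slot to be drained
        self.tail = 0      # next slot to be written
        self.count = 0     # number of occupied slots

    def is_full(self):
        return self.count == len(self.slots)

    def write(self, packet):
        # The extra check: refuse to write when full so tail never passes head.
        if self.is_full():
            return False
        self.slots[self.tail] = packet
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def drain(self):
        if self.count == 0:
            return None
        packet = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return packet


fifo = SendFifo(2)
results = [fifo.write("a"), fifo.write("b"), fifo.write("c")]
first = fifo.drain()
```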

    ------

    APAR: IY34397 COMPID: 5765B9501 REL: 340
    ABSTRACT: EXECUTABLE NOT REPLACED AFTER REBIND

    PROBLEM DESCRIPTION:
    A customer discovered that on a GPFS 1.5 filesystem (at PTF set
    11), compiling a C program does not replace any old executable,
    so if a program is changed and recompiled, the old executable
    must be deleted first.
    This has been identified as a problem with memory mapped pages
    (and the BRL::flushMappedPages routine) that could affect other
    things besides C compilers, although no other symptoms are known
    at this time.

    LOCAL FIX:
    For C compilers on GPFS 1.5 PTF set 11 (mmfs.base.rte 3.4.0.9),
    the old executable must be deleted before recompiling changed
    source code.

    PROBLEM SUMMARY:
    Old executables not replaced after rebind.

    PROBLEM CONCLUSION:
    Invalidate pages from VMM cache at truncate time rather than
    waiting to do it the next time a byte range lock is
    acquired.

    TEMPORARY FIX:
    For C compilers on GPFS 1.5 PTF set 11 (mmfs.base.rte
    3.4.0.9), the old executable must be deleted before
    recompiling changed source code.

    ------

    APAR: IY34408 COMPID: 5765D5100 REL: 340
    ABSTRACT: SWITCH PRIMARY NODE HUNG WHILE TRYING TO HANDLE SENDER HANGS

    PROBLEM DESCRIPTION:
    The primary node hung while trying to handle a sender hang
    condition causing the backup to take over. The SwitchScan()
    function is not handling sender hang recovery correctly. There
    are a couple of minor bugs in the SwitchScan() code that need to
    be fixed.

    PROBLEM SUMMARY:
    When handling sender hang conditions, it is possible for
    the switch primary node to send a reset service packet to
    an invalid or unknown device, causing the primary to hang
    and the backup node to take over. The fault service
    daemon on the original primary node may be terminated.

    PROBLEM CONCLUSION:
    The code to reset sender hangs was not processing
    its internal buffers correctly; this has been
    corrected.

    ------

    APAR: IY34469 COMPID: 5765D5100 REL: 340
    ABSTRACT: RAISE SPSWITCH2PCI ADAPTER RECOVERY BAD PACKET THRESHOLD

    PROBLEM DESCRIPTION:
    The SPSwitch2PCI adapter will be fenced from the switch when it
    receives more than a small number of corrupted switch packets.
    Generally speaking, the adapter is the victim, and not the cause
    of the bad packets. Therefore, there is little to gain by
    fencing the adapter. The focus of this APAR is to raise the
    SPSwitch2PCI adapter recovery bad pkt threshold significantly.

    PROBLEM SUMMARY:
    The switch bad packet threshold values for the SPSwitch2PCI
    adapter are set too low. This causes the adapter recovery
    logic to fence the node from the switch plane.

    PROBLEM CONCLUSION:
    The threshold values for bad switch packets for the
    SPSwitch2PCI adapter have been increased (in effect
    to infinity). This makes the adapter recovery
    logic more tolerant of corrupted packets.

    ------

    APAR: IY34658 COMPID: 5765B9500 REL: 150
    ABSTRACT: MMCHCLUSTER -R NOT REFLECTED IN MMLSCLUSTER

    PROBLEM DESCRIPTION:
    mmchcluster -R not reflected in mmlscluster

    PROBLEM CONCLUSION:
    Correct the test for changed rcp path;
    change the command to work for clusters based on RSCT peer
    domains.

    ------

    APAR: IY34661 COMPID: 5765B9501 REL: 340
    ABSTRACT: C209F6N16 AND C209F5N03 CRASHED WITH FLASHING "888"

    PROBLEM DESCRIPTION:
    c209f6n16 and c209f5n03 crashed with flashing "888"

    PROBLEM SUMMARY:
    gpfsMount asserted node because mount helper was not
    running.

    PROBLEM CONCLUSION:
    clear the root gpfsNode pointer in the VFS data when
    kSFSMount fails (in case it was set before the failure
    occurred).

    ------

    APAR: IY34744 COMPID: 5765B9501 REL: 340
    ABSTRACT: RUNNING ALT_DISK_INSTALL ON LIVE SYSTEM BRINGS DOWN GPFS

    PROBLEM DESCRIPTION:
    An update_all operation that updates GPFS filesets
    on an alt_disk_install will recycle the mmfs daemon.
    The install scripts (part of GPFS updates) shouldn't
    recycle the mmfs daemon if the update is performed on an
    alternate rootvg.

    PROBLEM SUMMARY:
    Added support for alt_disk_install migration

    PROBLEM CONCLUSION:
    Avoid modifying active system (unmount, process restart,
    etc.) when INUCLIENTS is set
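
    A minimal sketch of the guard described above follows. It is
    not the GPFS install script: the function names and the action
    list are hypothetical, and treating INUCLIENTS as "set" only
    when it has a non-empty value is an assumption.

```python
import os


def may_touch_running_system(env=None):
    # Assumption: INUCLIENTS counts as "set" when present with a non-empty value.
    env = os.environ if env is None else env
    return not env.get("INUCLIENTS")


def post_install_actions(env=None):
    # Hypothetical action names; files are always updated, but steps that
    # disturb the live system are skipped when INUCLIENTS is set.
    actions = ["update files"]
    if may_touch_running_system(env):
        actions += ["unmount filesystems", "restart daemon"]
    return actions
```

    With an empty environment the sketch returns all three actions;
    with INUCLIENTS set it returns only the file update.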

    ------

    APAR: IY34817 COMPID: 5765C3403 REL: 430
    ABSTRACT: EEH FOR SC2+ AIX ERROR LOG ENTRIES INCONSISTENT

    PROBLEM DESCRIPTION:
    SSA PingTimeout error log at the same time as EEH error log.

    PROBLEM CONCLUSION:
    Stop issuing the spurious timeout when an EEH error occurs.

    ------

    APAR: IY34829 COMPID: 5765B9501 REL: 340
    ABSTRACT: ASSERT: VP!= NULL IN BRV.C LINE 203

    PROBLEM DESCRIPTION:
    gpfs assert: VP!= NULL IN BRV.C LINE 203.

    PROBLEM SUMMARY:
    Fixed Assert: vp!=null in BRV.C

    PROBLEM CONCLUSION:
    Fix previous flushMappedPages change to also handle AIX case
    correctly for ref decrement

    ------

    APAR: IY34855 COMPID: 5765C3403 REL: 430
    ABSTRACT: MONITOR ERROR LOG FOR SYMPTOMS OF ARM VIBRATION

    PROBLEM DESCRIPTION:
    Poor performance from some IBM disk drives.

    PROBLEM CONCLUSION:
    Monitor error log for characteristic -- but non-fatal -- errors
    and perform analysis on suspect disk.

    ------

    APAR: IY34856 COMPID: 5765C3403 REL: 430
    ABSTRACT: POLICE VALID SERVICE WORD ON ADAPTER IOCTLS

    PROBLEM DESCRIPTION:
    Adapter resets with the cause -- Illegal Service number.

    PROBLEM CONCLUSION:
    Trap the illegal service number at the disk device driver
    interface and return the request with an error.

    ------

    APAR: IY34917 COMPID: 5765B9501 REL: 340
    ABSTRACT: DIRECT I/O NOT UPDATING FILESIZE

    PROBLEM DESCRIPTION:
    Direct IO in last block but after filesize was not updating the
    metadata filesize or the inode cache filesize.

    LOCAL FIX:
    There is no known work around for this problem.

    PROBLEM SUMMARY:
    Direct IO not updating the filesize.

    PROBLEM CONCLUSION:
    Direct IO in last block but after filesize was not updating
    the metadata filesize or the inode cache filesize.
    openedDirect flag not getting turned on if file created with
    O_DIRECT flag.

    ------

    APAR: IY35185 COMPID: 5765B8100 REL: 220
    ABSTRACT: ALL APPLICATION PROFILES LASTUPDATE FIELD GETS UPDATED WHEN

    PROBLEM DESCRIPTION:
    The lastupdate field on all the Application Profiles gets updated
    with the date of the latest startup of Voice Response.

    ------

    APAR: IY35196 COMPID: 5765B9501 REL: 340
    ABSTRACT: MMCHCLUSTER -R NOT REFLECTED IN MMLSCLUSTER

    PROBLEM DESCRIPTION:
    mmchcluster -R not reflected in mmlscluster

    PROBLEM CONCLUSION:
    Correct the test for changed rcp path;
    change the command to work for clusters based on RSCT peer
    domains.

    ------

    APAR: IY35197 COMPID: 5765B9500 REL: 150
    ABSTRACT: RUNNING ALT_DISK_INSTALL ON LIVE SYSTEM BRINGS DOWN GPFS

    PROBLEM DESCRIPTION:
    An update_all operation that updates GPFS filesets
    on an alt_disk_install will recycle the mmfs daemon.
    The install scripts (part of GPFS updates) shouldn't
    recycle the mmfs daemon if the update is performed on an
    alternate rootvg.

    PROBLEM SUMMARY:
    Added support for alt_disk_install migration

    PROBLEM CONCLUSION:
    Avoid modifying active system (unmount, process restart,
    etc.) when INUCLIENTS is set

    ------

    APAR: IY35225 COMPID: 5765D5100 REL: 340
    ABSTRACT: LATEST PSSP 3.4.0 FIXES AS OF SEPTEMBER 2002

    PROBLEM DESCRIPTION:
    This is the latest PSSP ptf as of September 2002.
    Order this apar to get all of the ptfs as of September 2002.

    PROBLEM SUMMARY:
    This is a packaging apar for PSSP 3.4.0 fixes
    as of September 2002

    PROBLEM CONCLUSION:
    This is a packaging apar for PSSP 3.4.0 fixes
    as of September 2002

    ------