From: AIX Service Mail Server (aixserv_at_austin.ibm.com)
Date: Wed Sep 04 2002 - 02:44:21 CDT

    has requested a copy or has subscribed to the document named "New_AIXV4_Fixes".
    If you would like to be removed from this mailing list, send e-mail to
    aixserv@austin.ibm.com with a subject of "unsubscribe New_AIXV4_Fixes", or
    send a note to owner-aixserv@austin.ibm.com with your request.

    APAR: IY21028 COMPID: 5765D9300 REL: 310
    ABSTRACT: POE NEEDS KERBEROS, THOUGH PSSP3.2 IS INSTALLED TO HAVE BASIC

    PROBLEM DESCRIPTION:
    POE on an SP needs at least KERBEROS running, although
    with PSSP 3.2 the SP could be installed without KERBEROS or
    DCE. The dependency on Kerberos should be removed from POE.

    PROBLEM SUMMARY:
    In ppe.poe 3.1, POE requires the "compatibility" security
    method defined by the PSSP SP Security services in order to
    run parallel jobs. This implies that the SP security
    services must be installed and configured with the "compatibility"
    method. As a result, in order to configure the SP security
    services, at least Kerberos Version 4 is required.
    If SP security services are not installed and configured,
    this implies a case where no security methods are defined
    (the value returned by lsauthts is blank).
    POE currently treats the case of no methods as a closed
    system and it will not allow parallel jobs to run.

    PROBLEM CONCLUSION:
    POE 3.1 will be changed to treat "no methods" the same
    as the case where the "compatibility" method was defined,
    which will allow parallel jobs to run with AIX .rhosts
    based user authentication. In this case, POE will not
    depend on the installation and configuration of PSSP SP
    security services, and in turn will not require
    Kerberos.
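
The changed decision can be sketched as follows. This is only an illustration, not POE's code: the function name is hypothetical, and it assumes the lsauthts output is a whitespace-separated list of method names.

```python
def parallel_jobs_allowed(lsauthts_output):
    """Return True if POE would run parallel jobs under the changed rule.

    Hypothetical helper. Assumes the lsauthts output is a
    whitespace-separated list of method names (an assumption).
    """
    methods = lsauthts_output.split()
    # After the change, an empty method list is treated the same as
    # having the "compatibility" method defined.
    return not methods or "compatibility" in methods
```

With this rule, a blank lsauthts result no longer closes the system to parallel jobs.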

    ------

    APAR: IY30030 COMPID: 5765D5100 REL: 320
    ABSTRACT: SOME NODES WILL HAVE WORM DIED AFTER REBOOT AND HENCE DO NOT

    PROBLEM DESCRIPTION:
    Error description: After rebooting all nodes, some of them will
    not have Worm process up (there will be
    /var/adm/SPlogs/css/rc.switch.log.extra
    file which complains about more than one rc.switch
    processes...).
    rc.switch failure is related to an extremely small timing
    hole that has to do with how ksh implements pipes.

    LOCAL FIX:
    Manually start the Worm with rc.switch; after a while (about 2 min),
    the switch response for those nodes will come up.

    PROBLEM SUMMARY:
    The rc.switch script can fail to start the fault service
    daemon because it may falsely detect that a daemon is
    already running.

    PROBLEM CONCLUSION:
    The rc.switch script has been changed to prevent it
    from falsely detecting a running fault service
    daemon.

    ------

    APAR: IY31661 COMPID: 5765D5100 REL: 320
    ABSTRACT: S70D DAEMON GETS ERRORS "CANNOT COMMUNICATE WITH REMOTE NODE" IF

    PROBLEM DESCRIPTION:
    On SP systems using a 128-port RAN for the S7A tty connection, the
    following entries occur in the system errlog of the CWS: "cannot
    communicate with remote node". The problem is definitely associated
    with a timing issue between the s70d daemon and the H/W. When using
    an 8-port adapter, the connection is direct between the 8-port
    box and the CWS. With a 16-port box there is only a direct
    connection between the initial box and the CWS (all other boxes
    making up the 128-way adapter do not have a direct connection,
    which is where the problem lies).

    PROBLEM SUMMARY:
    When an SP-attached S70 or S7A is connected using more than an
    8-port RAN, SPMON_EMSG101_ER entries may be made in the
    errpt. These entries report a communication problem when
    there is none. The s70d daemon needs to be more
    tolerant of non-responses from the SAMI.

    PROBLEM CONCLUSION:
    The s70d daemon has been modified to be more tolerant of
    non-responses of any kind from SAMI. The allowable number
    of non-responses has been increased to prevent
    SPMON_EMSG101_ER entries being made in the errpt indicating
    the Supervisor is not responding.
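
A minimal sketch of this kind of tolerance, assuming the daemon counts consecutive non-responses and logs only once an allowed number is exceeded. The class name, the reset-on-response behavior, and the threshold value are all assumptions; the APAR does not give them.

```python
class NonResponseCounter:
    """Tolerate up to `allowed` consecutive non-responses (hypothetical)."""

    def __init__(self, allowed):
        self.allowed = allowed
        self.misses = 0  # consecutive non-responses seen so far

    def record(self, responded):
        """Record one poll result; return True if an error should be logged."""
        if responded:
            self.misses = 0  # any response clears the count
            return False
        self.misses += 1
        return self.misses > self.allowed
```

Raising `allowed` corresponds to increasing the allowable number of non-responses.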

    ------

    APAR: IY31853 COMPID: 5765D5100 REL: 320
    ABSTRACT: EXCESSIVE MPI/MPCI REXMIT MESSAGE IN ERRORLOG

    PROBLEM DESCRIPTION:
    MPCI_REXMIT_STALL and MPCI_REXMIT_RECOVER informational messages
    are filling up the errpt making it impossible to see if any real
    errors are occurring on the system that need administrative
    attention. All programs seem to be running within expectations.

    PROBLEM SUMMARY:
    Excessive REXMIT STALL/RECOVER errors in the errpt.

    PROBLEM CONCLUSION:
    There are two things that will be looked at at this
    time. The first is extending a previous performance fix
    that was only done for US to help IP performance as well.
    The second is to institute a message relief timer so that
    each process can only log one MPCI_REXMIT_STALL error
    message per MP_TIMEOUT period.
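
The message relief timer can be pictured as a per-process rate limiter. The sketch below is an illustration under assumptions, not the MPCI implementation: the class name is hypothetical, and `period` stands in for the MP_TIMEOUT value in seconds.

```python
import time


class StallLogLimiter:
    """Let through at most one log entry per `period` seconds (hypothetical)."""

    def __init__(self, period, clock=time.monotonic):
        self.period = period      # stands in for MP_TIMEOUT (assumption)
        self.clock = clock        # injectable so the limiter can be tested
        self.last_logged = None   # time of the last entry that was let through

    def should_log(self):
        """Return True if a stall message may be logged now."""
        now = self.clock()
        if self.last_logged is None or now - self.last_logged >= self.period:
            self.last_logged = now
            return True
        return False
```

A caller would check `should_log()` before writing each MPCI_REXMIT_STALL entry and drop the entry when it returns False.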

    ------

    APAR: IY32038 COMPID: 5765D5100 REL: 320
    ABSTRACT: 0509-036 AND 0509-130 IN PMANRMD.LOG FILE WHEN LIBDCE.A IS ON

    PROBLEM DESCRIPTION:
    The pmanrmd.log file shows a repeatable pattern of the
    following entries.
    0509-036 Cannot load program spsec_ldmod because of the
              following errors
    0509-130 Symbol resolution failed for
              /usr/lpp/ssp/bin/spsec_ldmod because:
    0509-136 Symbol GSS_MECH_MIT_KRB5 is not exported from dependent
              module /usr/lib/libdce.a(shr.o).
    /usr/lpp/ssp/bin/SDRGetObjects: 0025-004 Item specified for
           query, insertion or deletion was not found.
    The problem is triggered by the pman daemon logic finding that
    the libdce.a file is on this system before checking to see if
    DCE authentication is in use. DCE is not in use on this system
    and the file remains for other reasons.
    Other apars with similar symptoms IY17070, IY21195, IY23021 and
    IY22203 either have the fix on or do not apply. There does not
    seem to be any impact to the system other than the error entries
    in the log.

    LOCAL FIX:
    The only impact is the messages and they can be ignored. If the
    libdce.a file is removed the messages stop.

    PROBLEM SUMMARY:
    dsrvtgt was calling spsec_start before it was determining if
    dce authentication was being used. If there is an older
    /usr/lib/libdce.a you will get the load errors seen in the
    /var/adm/SPlogs/pman/pmanrmd.log.
    The -m in the SDRGetObjects call has been changed to -q.

    PROBLEM CONCLUSION:
    dsrvtgt has been modified to determine if dce authentication
    is being used before calling spsec_start. If it determines
    dce authentication is not being used it just exits without
    calling spsec_start.
    The SDRGetObjects option list has been fixed.

    ------

    APAR: IY32351 COMPID: 5765D6100 REL: 220
    ABSTRACT: TIMING-HOLE IN LOADL PROCESS TRACKING EXTENSION MAY LEAD TO

    PROBLEM DESCRIPTION:
    There is a timing hole in the Loadl_pte kernel extension
    that might lead to a node crash.
    During llctl stop the extension gets unloaded, and it may
    still be accessed after that during the stop
    process.

    PROBLEM SUMMARY:
    The LoadLeveler process tracking kernel
    extension can cause a node crash if process tracking
    is enabled. Node crashes have been seen when
    reconfiguring or restarting LoadLeveler.

    PROBLEM CONCLUSION:
    The LoadLeveler process tracking kernel
    extension has been changed to prevent a node crash
    from occurring when restarting/reconfiguring
    LoadLeveler with process tracking enabled.

    ------

    APAR: IY32428 COMPID: 5765D5100 REL: 320
    ABSTRACT: DOUBLE FREE OF SERVICE PACKET STORAGE IN HAL_RECV_HNDLR()

    PROBLEM DESCRIPTION:
    In hal_recv_hndlr(), there are cases where the storage for a
    service packet is freed after the packet has been placed on a
    port's virtual receive FIFO. The port thread will also free the
    storage after reading the service packet from the FIFO. The
    free in hal_recv_hndlr() is erroneous. The results are
    indeterminate, because it depends on if/when the doubly-freed
    storage is reused. One possible result is a fault-service daemon
    core dump.

    LOCAL FIX:
    Restart the fault-service daemon after a core dump with
    /usr/lpp/ssp/css/rc.switch.

    PROBLEM SUMMARY:
    Under some conditions, the hal_recv_hndlr() function will
    free the storage used for service packets; the port thread
    may later free this same storage. The results are
    indeterminate; there may be data corruption or the fault
    service daemon may core dump.

    PROBLEM CONCLUSION:
    Once a service packet is placed on the port's virtual
    receive FIFO queue, the hal_recv_hndlr() function
    will no longer try to free it.

    ------

    APAR: IY32694 COMPID: 5765B8100 REL: 220
    ABSTRACT: NUMERIC OPS OUTPUT WRONG RESULT IN 3270 IF RESULT > 2147483647

    PROBLEM DESCRIPTION:
    DirectTalk 3270 script won't handle values over 2 billion.
    Bad result when adding or subtracting numbers over 2 billion.

    LOCAL FIX:
    Multiply any of the input parameters by 1.0.
    Output will then be correct, but will also have extra
    decimal places, which may require modifications to the
    3270 server.

    PROBLEM SUMMARY:
    DirectTalk 3270 script won't handle values over 2 billion.
    Bad result when adding or subtracting numbers over 2 billion.

    PROBLEM CONCLUSION:
    Changed sprintf output from signed int to
    float with no decimal places. Also added a test that aborts
    the script and generates error 24507 if the limits
    "1e15 > result > -1e15" are exceeded.
    Note: Exceeding these limits causes loss of precision in
    calculations, so such results are trapped.
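
The effect of the change can be illustrated with a short sketch. It is not the product's code: the function names are hypothetical, `ctypes.c_int32` is used only to imitate storing a result in a signed 32-bit int, and `OverflowError` stands in for error 24507.

```python
import ctypes


def add_as_int32(a, b):
    """Add two values and keep the result in a signed 32-bit integer."""
    # c_int32 does no overflow checking, so the value wraps around.
    return ctypes.c_int32(a + b).value


def format_result(a, b):
    """Format the sum as a float with no decimal places.

    Results outside the open range (-1e15, 1e15) are rejected;
    OverflowError is a stand-in for error 24507.
    """
    result = float(a) + float(b)
    if not (-1e15 < result < 1e15):
        raise OverflowError("result outside (-1e15, 1e15)")
    return "%.0f" % result
```

For example, `add_as_int32(2147483647, 1)` wraps to -2147483648, while `format_result(2147483647, 1)` returns "2147483648".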

    ------

    APAR: IY32751 COMPID: 5765D5100 REL: 320
    ABSTRACT: SRVSUPPWD NEEDS UNIQUE TMP PW FILENAME

    PROBLEM DESCRIPTION:
    srvsuppwd needs unique tmp pw filename

    PROBLEM SUMMARY:
    The srvsuppwd process creates a temporary file that is
    not unique to the process. Since there may be multiple
    srvsuppwd processes running at the same time, this could
    result in an updsuppwd process on a node having to try
    to obtain the supman password file multiple times.

    PROBLEM CONCLUSION:
    srvsuppwd has been modified to create a temporary file that
    is unique to the process that creates it.
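
One common way to get a temporary file that cannot collide with another process's file is to let the library pick the name. This is a sketch only; the function name and the "suppwd." prefix are hypothetical, and it does not show how srvsuppwd itself names its file.

```python
import os
import tempfile


def make_unique_tmp(prefix="suppwd."):
    """Create a uniquely named temporary file; return (fd, path).

    Hypothetical helper. mkstemp creates the file atomically and makes it
    readable and writable only by the creating user.
    """
    return tempfile.mkstemp(prefix=prefix)
```

Two concurrent callers each receive a different path, so neither overwrites the other's file.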

    ------

    APAR: IY32760 COMPID: 5765D5100 REL: 320
    ABSTRACT: CHANGE DEFAULT PERMS ON TAR FILE

    PROBLEM DESCRIPTION:
    change default perms on tar file

    PROBLEM SUMMARY:
    css.snap tar file was world readable.

    PROBLEM CONCLUSION:
    css.snap tar file is now readable only
    by root.
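
A minimal sketch of creating an output file that only its owner can read, assuming a POSIX system. The function name is hypothetical and this is not the css.snap code; when run as root, the result is a file readable only by root.

```python
import os
import stat


def create_owner_only(path):
    """Create `path` with mode 0600 and return its file descriptor."""
    # O_EXCL fails if the file already exists, so an existing file with
    # looser permissions is never reused.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    # Set the mode explicitly so the result does not depend on the umask.
    os.fchmod(fd, stat.S_IRUSR | stat.S_IWUSR)
    return fd
```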

    ------

    APAR: IY32788 COMPID: 5765D5100 REL: 320
    ABSTRACT: S70D DAEMON DIES UNEXPECTED. HARDMON MUST BE STOPPED AND RESTART

    PROBLEM DESCRIPTION:
    SP-attached server S80/S85. s70d dies unexpectedly. The following
    messages appear in /var/adm/SPlogs/spmon/s70/s70d.3.log.xxx:
    s70d 3 : 0026-500I s70d daemon started on device "/dev/tty7" (Frame 3) at Sat May 18 09:59:29 2002
    s70d 3 : 0026-507I Entered main processing loop
             SAMI Firmware Level (mm/dd/yy): 8/31/99
    s70d 3 : 0026-522 ioctl() was unsuccessful: Resource temporarily unavailable (11)
    s70d 3 : 0026-502I s70d daemon ended (2) on device "/dev/tty7"

    PROBLEM SUMMARY:
    An ioctl failure is causing the s70d to terminate. In the
    log file /var/adm/SPlogs/spmon/s70/s70d.x.log.yyy will be
    the messages:
    0026-522 ioctl() was unsuccessful:
    0026-502I s70d daemon ended (x) on device "/dev/ttyx"
    The s70d should be modified to not terminate if there
    is an ioctl failure.

    PROBLEM CONCLUSION:
    The s70d has been modified to not issue message 0026-522
    when a call to ioctl is unsuccessful and to not
    terminate. The ioctl will either succeed on a subsequent
    retry, or will cause another terminating error to occur.
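
The "Resource temporarily unavailable" failure in the log is a transient condition (EAGAIN), so retrying instead of exiting is a natural fit. The sketch below shows the general pattern only; the helper name and the attempt limit are assumptions, and `op` stands in for the ioctl call.

```python
import errno


def call_with_retry(op, attempts=3):
    """Call op(); retry while it fails with EAGAIN, re-raise anything else.

    Hypothetical helper: `op` stands in for the ioctl call and the
    attempt limit is an assumption.
    """
    for attempt in range(attempts):
        try:
            return op()
        except OSError as exc:
            # Give up on non-transient errors or when attempts run out.
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
```

Errors other than EAGAIN still propagate, which matches the idea that a persistent failure surfaces as a different terminating error.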

    ------

    APAR: IY32969 COMPID: 5765D5100 REL: 320
    ABSTRACT: SP SWITCH2 WORM RUNS SLOW UNDER HEAVY PAGING LOAD

    PROBLEM DESCRIPTION:
    The current SP Switch2 Worm uses popen() to invoke the sum
    command on the current compressed topology file. The result of
    the sum command is used to determine if an updated copy of the
    topology file needs to be sent to the node. Under heavy paging
    load, the time necessary for popen() to do a fork to invoke ksh,
    and then ksh to do a fork to invoke sum, can be excessive. If
    the Worm does not report back fast enough, when it receives
    a NODE_INIT packet, the primary will drop the node off of the
    switch.

    LOCAL FIX:
    None, really. The node is normally still okay. There shouldn't
    be any problem bringing it back on the switch, via Eunfence.
    But, the damage is already done.

    PROBLEM SUMMARY:
    Nodes can drop off the SP Switch 2 when they are under
    a heavy load (e.g. high levels of paging). The time
    taken to call the AIX sum command to calculate the
    switch topology file checksum may be too long under high
    load conditions, causing the primary node to drop the
    slow responding node off the switch.

    PROBLEM CONCLUSION:
    The fault_service_Worm_RTG_CS code has been changed to
    calculate the checksum of the switch topology file
    directly instead of calling the AIX sum command.
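
Computing a checksum in-process avoids the two forks (ksh, then sum) described above. The APAR does not say which algorithm the Worm uses; the sketch below assumes the well-known 16-bit rotating checksum of the BSD sum algorithm, purely as an illustration, and the function name is hypothetical.

```python
def rotating_sum16(data):
    """Return the 16-bit rotating (BSD-style) checksum of a bytes object."""
    checksum = 0
    for byte in data:
        # Rotate the 16-bit checksum right by one bit, then add the byte.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum
```

Because the loop runs inside the calling process, its cost does not depend on how long fork takes under heavy paging.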

    ------

    APAR: IY32972 COMPID: 5765D6100 REL: 220
    ABSTRACT: LOADL CANNOT REMOVE A RP JOB

    PROBLEM DESCRIPTION:
    One machine in the LL pool crashed, which left the two jobs
    that had been running on the machine in the LL queue.
    LL on the machine was back after reboot, but llstatus showed
    that resources were in use, and no new jobs would start. An llcancel
    put the jobs in RP, and the resources on the machine were not
    freed. One job had been issued from this machine; this job
    disappeared from the system after deleting the job_queue files
    in spool/ and recycling LL on this machine. The second job, issued
    from another machine, persists in the queue as RP, and the resources
    remain blocked.

    PROBLEM SUMMARY:
    When LoadLeveler came back up after a crash,
    the job previously in suspended state was gone,
    but llq still showed it as running.
    An llcancel could only set the job
    state to RP without truly removing it.

    PROBLEM CONCLUSION:
    When LoadLeveler came back up after a crash,
    the job previously in suspended state is now
    able to run. And llcancel will be able to
    kill the job.

    ------

    APAR: IY32973 COMPID: 5765D5100 REL: 320
    ABSTRACT: REGATTA_H:SP ATTACH, UCFGCOR WILL UNCONFIGURE A TBCPCI ADAPTER

    PROBLEM DESCRIPTION:
    regatta_h: spattach, ucfgcor will unconfigure a tb3pci adapter

    PROBLEM SUMMARY:
    The unconfig method for certain css0 devices would attempt
    and sometimes fail to unconfigure any css0 device.

    PROBLEM CONCLUSION:
    The css0 unconfig method will no longer attempt to
    unconfigure invalid device instances.

    ------

    APAR: IY32977 COMPID: 5765D5100 REL: 320
    ABSTRACT: SWITCH CLOCK 75MHZ 75.2MHZ PROBLEM

    PROBLEM DESCRIPTION:
    fsd calculation used to determine switch clock is incorrect
    75mhz 75.2Mhz
    MPI_WTIME MPI_CLOCK_SOURCE

    PROBLEM SUMMARY:
    It's possible for the switch fault service
    daemon to incorrectly detect the clock frequency
    of the switch, which can result in getting invalid
    results when making a call to read the switch clock
    from a program.

    PROBLEM CONCLUSION:
    The switch fault service daemon has been changed
    to accurately determine the switch clock frequency.

    ------

    APAR: IY33006 COMPID: 5765D5100 REL: 320
    ABSTRACT: EXCESSIVE US MPCI REXMIT MESSAGES IN ERRORLOG

    PROBLEM DESCRIPTION:
    excessive us mpci rexmit messages in errorlog

    PROBLEM SUMMARY:
    Customers are seeing an excessive amount of
    MPCI_REXMIT_STALL errors in the error report when using
    PSSP 3.2

    PROBLEM CONCLUSION:
    Two future release defects improved the processing of
    packets which will reduce the number of these messages.
    A test of the fix by one customer showed a good decrease in
    the messages with these changes.

    ------

    APAR: IY33582 COMPID: 5765D5100 REL: 320
    ABSTRACT: REGATTA_H:SPATTACH: AFTER CORSAIR HOT PLUG, SDR INCORRECT

    PROBLEM DESCRIPTION:
    regatta_H:SPattach:after corsair hot plug, sdr incorrect

    PROBLEM SUMMARY:
    After adapter configuration on the SP
    Switch2, the SDR adapter_config_status attribute was updated
    in the SDR switch_responds class but the SP Switch2 support
    code expects to find this attribute in the SDR Adapter class.

    PROBLEM CONCLUSION:
    Modify sdr_acs_update to change adapter
    status in SDR Adapter class vs. switch_responds, only for the
    SP Switch2.

    ------

    APAR: IY33935 COMPID: 5724C3505 REL: 310
    ABSTRACT: ALLOW JAVA APPS TO BE CALLED FROM STATE TABLE

    PROBLEM DESCRIPTION:
     Allow Java apps to be called from state table

    PROBLEM SUMMARY:
    Allow Java apps to be called from state table

    PROBLEM CONCLUSION:
    Control of incoming calls is passed to DTBE
    from action InvokeStateTable rather than at the end of the
    internal Action Ringing. This means state table Incoming_Call
    is now executed first

    ------

    APAR: IY34104 COMPID: 5765D5101 REL: 121
    ABSTRACT: HAGSGLSM HANGS AFTER GLOBAL REBOOT

    PROBLEM DESCRIPTION:
    hagsglsm hangs after global reboot

    PROBLEM SUMMARY:
    When HAGSGLSM fails to connect to Group Services (typically
    when Group Services is not ready to serve clients),
    it first cleans up the connection to make sure it is
    disconnected, and then it retries the connection to Group
    Services.
    However, due to a problem in the cleanup routine, HAGSGLSM may
    go into an indefinite wait (deadlock). Until the fix is applied,
    the only workaround is to restart the hagsglsm
    daemon (but not hags).

    PROBLEM CONCLUSION:
    By removing the double locking, the deadlock situation will
    be solved
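
Double locking of a non-reentrant lock is a classic self-deadlock: a thread that already holds the lock waits forever when it tries to take it again. The sketch below shows the usual remedy, with hypothetical names that do not come from hagsglsm: the cleanup routine assumes its caller holds the lock and never acquires it itself.

```python
import threading


class Connection:
    """Hypothetical connection object guarded by a non-reentrant lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.connected = True

    def _cleanup_locked(self):
        """Tear down the connection; the caller must already hold the lock."""
        # Taking self._lock again here would block forever.
        self.connected = False

    def disconnect(self):
        """Take the lock once and run the cleanup under it."""
        with self._lock:
            self._cleanup_locked()
```

After `disconnect()` returns, the lock is free again and a retry of the connection can proceed.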

    ------

    APAR: IY34144 COMPID: 5765D5100 REL: 320
    ABSTRACT: LATEST PSSP 3.2.0 FIXES AS OF AUGUST 2002

    PROBLEM DESCRIPTION:
    This is the latest PSSP ptf as of August 2002.
    Order this apar to get all of the ptfs as of August 2002.

    PROBLEM SUMMARY:
    This is a packaging apar for PSSP 3.2.0 fixes
    as of August 2002

    PROBLEM CONCLUSION:
    This is a packaging apar for PSSP 3.2.0
    fixes as of August 2002

    ------

    APAR: IY34232 COMPID: 5724C3505 REL: 310
    ABSTRACT: NUMERIC OPS OUTPUT WRONG RESULT IN 3270 IF RESULT > 2147483647

    PROBLEM DESCRIPTION:
    DirectTalk 3270 script won't handle values over 2 billion.
    Bad result when adding or subtracting numbers over 2 billion.

    LOCAL FIX:
    Multiply any of the input parameters by 1.0.
    Output will then be correct, but will also have extra
    decimal places, which may require modifications to the
    3270 server.

    PROBLEM SUMMARY:
    DirectTalk 3270 script won't handle values over 2 billion.
    Bad result when adding or subtracting numbers over 2 billion.

    PROBLEM CONCLUSION:
    Changed sprintf output from signed int to
    float with no decimal places. Also added a test that aborts
    the script and generates error 24507 if the limits
    "1e15 > result > -1e15" are exceeded.
    Note: Exceeding these limits causes loss of precision in
    calculations, so such results are trapped.

    ------

    APAR: IY34330 COMPID: 5724C3505 REL: 310
    ABSTRACT: SUPPORT FOR VIAVOICE CUSTOM SERVER DURING CLEAN UP

    PROBLEM DESCRIPTION:
    SUPPORT FOR VIAVOICE CUSTOM SERVER DURING CLEAN UP via state
    table exit

    PROBLEM SUMMARY:
    SUPPORT FOR VIAVOICE CUSTOM SERVER DURING CLEAN UP via state
    table exit

    PROBLEM CONCLUSION:
    Support added

    ------