OSEC

From: AIX Service Mail Server (aixservaustin.ibm.com)
Date: Tue Nov 06 2001 - 02:24:49 CST

    APAR: IR43957 COMPID: 5697F4800 REL: 410
    ABSTRACT: FWLOGMGMT: NEED NEW FEATURE TO FORCE ARCHIVING OF ALL LOG ENTRIE

    PROBLEM DESCRIPTION:
    Customer would like to be able to archive logs several times a
    day.

    PROBLEM SUMMARY:
    To enable archiving multiple times a day, set the
    "Days Until Archive" field to '0' and schedule "fwlogmgmt -l"
    to run several times a day with the AIX cron facility.

    PROBLEM CONCLUSION:
    Code has been modified to allow archiving
    multiple times a day.

    ------

    APAR: IR44032 COMPID: 5697F4800 REL: 410
    ABSTRACT: CONFIG CLIENT GETS DISCONNECTED (TCP RST) WHEN LISTING MORE THAN

    PROBLEM DESCRIPTION:
    When you try to list the active connections, the client
    gets a message saying it has been disconnected and that you
    need to log in again. This is caused by the server sending a
    TCP reset (RST) on that connection.

    LOCAL FIX:
    All the filters still work fine. You can list them from the
    command line with "fwfilter cmd=list".

    PROBLEM SUMMARY:
    The problem is in the List() member function of the
    CfgObj class in cfgobj.cpp. In List(), the buffer for trace
    information is allocated with a size of 256 bytes.

    PROBLEM CONCLUSION:
    The parmlist exceeds the size of the buffer
    allocated for traceinfo.
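    The APAR does not show the corrected code. As a minimal sketch of the
    general technique, assuming the fix bounds the formatted trace text to
    the 256-byte buffer the way C's snprintf does, the following Python
    truncates instead of writing past the end of the buffer. format_trace
    is a hypothetical name, not a Firewall function.

```python
TRACE_BUF_SIZE = 256  # buffer size stated in the APAR


def format_trace(parms, size=TRACE_BUF_SIZE):
    """Join trace parameters into at most `size` bytes, including a
    terminating NUL, truncating instead of writing past the buffer
    (the behaviour C's snprintf provides)."""
    text = " ".join(parms).encode("ascii", "replace")
    return text[: size - 1] + b"\x00"  # keep one byte for the NUL
```

    A parameter list longer than the buffer now yields a truncated trace
    line rather than a write past the end of the 256 bytes.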

    ------

    APAR: IR44162 COMPID: 5697F4800 REL: 410
    ABSTRACT: TUNNEL ID DISAPPEARS WHEN RULE IS DISPLAYED IN GUI

    PROBLEM DESCRIPTION:
    When a rule is displayed, the tunnel ID does not appear in the GUI;
    when the panel is saved, the tunnel ID is lost.

    PROBLEM SUMMARY:
    TUNNEL ID DISAPPEARS WHEN RULE IS DISPLAYED IN
    GUI

    PROBLEM CONCLUSION:
    Removed the string size limit in the tunnel configuration
    GUI dialog so that it reads the entire tunnel ID string.

    ------

    APAR: IR44238 COMPID: 5697F4800 REL: 410
    ABSTRACT: FW CRASH DUE TO STACK OVERFLOW

    PROBLEM DESCRIPTION:
    FW crash due to stack overflow caused by NAT

    PROBLEM SUMMARY:
    A core file is produced in some situations because of a stack
    overflow.

    PROBLEM CONCLUSION:
    Changed the storage model to eliminate the
    possibility of stack corruption.

    ------

    APAR: IR44297 COMPID: 5697F4800 REL: 410
    ABSTRACT: CFGMGR COMMAND ON FW MACHINE FAILED WITH THE ERROR

    PROBLEM DESCRIPTION:
    On the firewall machine, the cfgmgr command failed with
    the error:
    # cfgmgr
         sh: /usr/sbin/fwmktun : not found.
    The /usr/sbin/fwmktun command appears to be called by the
    /usr/lib/methods/cfgipsec command.

    PROBLEM SUMMARY:
    Firewall's cfgipsec was calling an AIX tunnel
    module that didn't ship with AIX.

    PROBLEM CONCLUSION:
    Removed the unsupported function call.

    ------

    APAR: IR44420 COMPID: 5697F4800 REL: 410
    ABSTRACT: FTP BYE COMMAND CAUSE FTP SESSION HANGING UP

    PROBLEM DESCRIPTION:
    The FTP BYE command causes the FTP session to hang when NAT is
    used between AIX and a Unisys 2000 host.

    PROBLEM SUMMARY:
    FTP sessions to Unisys and certain VM hosts
    hung at the BYE command.

    PROBLEM CONCLUSION:
    Fixed the NAT code to properly handle these special protocol
    situations with Unisys and VM hosts.

    ------

    APAR: IR44500 COMPID: 5697F4800 REL: 410
    ABSTRACT: FWLOGD DUMPED CORE.

    PROBLEM DESCRIPTION:
    fwlogd dumped core.
    DBX output is:
     (dbx) t
     strlen() at 0xd0169cf8
     _doprnt(??, ??, ??) at 0xd0185850
     vfprintf(??, ??, ??) at 0xd0183cd0
     __syslog_r(??, ??, ??, ??) at 0xd01f4860
     syslog(0x3, 0x200018e0, 0x2f5, 0x615f8, 0x0, 0x60014014,
     0x6000e1ce, 0x0) at 0xd01f4e34
     .() at 0x10000b14
    APARs IR43682 and IR44419 are open for Firewall V420, but the stack
    entries in the dbx output above are not the same as in IR44419.
    Also, the customer's system is at the v4.1.2 level, so a v412-level
    fix is needed instead of v414.

    PROBLEM SUMMARY:
    FWLOGD DUMPED CORE. The segmentation fault in strlen() was addressed
    on January 8th of this year and delivered as an eFix. The code
    patch has been integrated into all levels of the Firewall and
    will be available in future releases.

    ------

    APAR: IR44527 COMPID: 5697F4800 REL: 410
    ABSTRACT: LESS THAN OR EQUAL TO RULE GOES TO ANY WHEN ENTERING RULE

    PROBLEM DESCRIPTION:
    When a rule is created with the "less than or equal to" operation,
    the operation has changed to "any" when you go back into the rule
    after saving it. This happens on 4.1.x and 4.2, on both AIX
    and NT.

    LOCAL FIX:
    To get around this, we are currently just using less than.

    PROBLEM SUMMARY:
    An improperly terminated string caused the selection
    of the <= operation not to match.

    PROBLEM CONCLUSION:
    Fixed the comparator to properly terminate the comparison
    string.
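    The APAR does not say how the string lost its terminator. One common way
    this happens, offered here purely as an assumption, is a fixed-width
    table of operator strings in which a two-character operator such as
    "<=" fills its slot exactly and so gets no NUL; a C-style read then runs
    into the next entry. The Python sketch below models that memory layout.
    pack_table, read_c_string, gui_selection, and the operator list are
    hypothetical.

```python
# Hypothetical operator table; only the entries needed for the demo.
KNOWN_OPS = {b"<": "less than", b"<=": "less than or equal to", b"=": "equal to"}


def pack_table(ops, width):
    """Store operators back to back in fixed-width, NUL-padded slots, like a
    C array of char[width]. An operator as long as the slot gets no NUL."""
    return b"".join(op.ljust(width, b"\x00")[:width] for op in ops)


def read_c_string(memory, offset):
    """Read from `offset` up to the next NUL, as strcmp or strlen would."""
    end = memory.find(b"\x00", offset)
    return memory[offset:] if end == -1 else memory[offset:end]


def gui_selection(memory, offset):
    """Return the operation whose text matches; anything else shows as 'any'."""
    return KNOWN_OPS.get(read_c_string(memory, offset), "any")


ops = [b"<", b"<=", b"="]
```

    With two-byte slots, the "<=" entry reads back as "<==" and falls
    through to "any", matching the reported symptom, while "<" still works
    because it has room for its NUL. Widening the slot to three bytes
    leaves room for the terminator.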

    ------

    APAR: IR44558 COMPID: 5697F4800 REL: 410
    ABSTRACT: PING NO RESPONSE

    PROBLEM DESCRIPTION:
    The customer issues ping -s 8 <firewall hostname> from a
    remote system (Windows NT, AIX, etc.), but gets no response.
    The problem occurs whenever he specifies a size of 8, 7, 6, 5,
    4, 3, 2, or 1, and it is reproducible every time.
    As a result, he cannot load-balance the firewall, because
    the load-balancing product uses ping with a packet size of
    8 bytes or less.
    This problem occurs only on Firewall V4.2.

    PROBLEM SUMMARY:
    ICMP packets smaller than 20 bytes were not forwarded.

    PROBLEM CONCLUSION:
    The source for the module that performed stack switching was not
    MP safe, so once the ICMP header check was fixed, running this
    module on an MP machine exposed the stack-switching weakness.
    Both problems have now been rectified.
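    The arithmetic behind the symptom can be sketched as follows, assuming
    ping -s N sends an ICMP echo message of 8 header bytes plus N data
    bytes, and that the faulty check demanded at least 20 bytes of ICMP
    (the APAR says only "sub 20 byte ICMP packets"). Sizes 1 through 8 then
    produce 9- to 16-byte messages, all below 20. The function names are
    hypothetical.

```python
ICMP_ECHO_HEADER_LEN = 8  # type, code, checksum, identifier, sequence number


def icmp_echo_length(data_bytes):
    """Size of the ICMP echo message sent by `ping -s data_bytes`."""
    return ICMP_ECHO_HEADER_LEN + data_bytes


def faulty_check(icmp_len):
    """Assumed faulty rule: at least 20 bytes, an IP-header-sized minimum."""
    return icmp_len >= 20


def fixed_check(icmp_len):
    """An echo message is well formed once its 8-byte header is present."""
    return icmp_len >= ICMP_ECHO_HEADER_LEN
```

    Under these assumptions, every size the customer tried fails the faulty
    check and passes the fixed one, while ping -s 12 (a 20-byte message)
    passes both.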

    ------

    APAR: IR44627 COMPID: 5697F4800 REL: 410
    ABSTRACT: MISLEADING ERROR "UNABLE TO ACTIVATE FUNCTION %S."

    PROBLEM DESCRIPTION:
    When the SOCKS server failed to start, the SOCKS server name was not
    copied into the error message from the filters.

    PROBLEM SUMMARY:
    The meaningless error message displayed when the SOCKS
    server did not start delayed diagnosis of the actual problem,
    since it was unclear which module was failing to activate.

    PROBLEM CONCLUSION:
    Fixed the code path used to pass the function name.

    ------

    APAR: IR44963 COMPID: 5697F4800 REL: 410
    ABSTRACT: REDESIGN TO REMOVE MEMORY-LIMIT ON RULES PER CONNECTION

    PROBLEM DESCRIPTION:
    A message is displayed stating that the user has exceeded
    the 127-rule limit.

    PROBLEM SUMMARY:
    The memory allocated per connection limited the number of rules
    per connection to 127.

    PROBLEM CONCLUSION:
    Changed the filter implementation to grow the per-connection
    memory so that all requested rules are allowed.

    ------

    APAR: IR45250 COMPID: 5697F4800 REL: 410
    ABSTRACT: INSUFFICIENT ROUTES FOR SOCKS CONFIGS

    PROBLEM DESCRIPTION:
    Using SOCKS from a subnet on the secure side, the client was unable
    to get FTP connectivity even though his traffic was reaching
    the firewall and his filters were correct. The SOCKS configuration
    was missing routes for the client networks.

    PROBLEM SUMMARY:
    The SOCKS configuration does not contain routes to networks that
    are not contiguous to the firewall, so packets are not forwarded.

    PROBLEM CONCLUSION:
    Added support so that all routes on the machine are added
    to the SOCKS configuration.

    ------

    APAR: IR45519 COMPID: 5697F4800 REL: 410
    ABSTRACT: CRASH IN FIREWALL 4.1 FILTER4

    PROBLEM DESCRIPTION:
    System crash with 888-102-300-0c0.
    Dump analysis shows the following...
    > t -mk
    Skipping first MST
    MST STACK TRACE:
    0x0044beb0 (excpt=00000088:0a000000:00000000:00000088:00000106)
                                                 (intpri=0)
            IAR: .dispatch+76c (00025088): sth r0,0x88(r24)
            LR: .dispatch+5c0 (00024edc)
            0044be78: call_dispatch_point+4 (00028a68)
    0x0044ceb0 (excpt=00000000:00000000:00000000:00000000:00000000)
                 (intpri=0)
            IAR: .e_block_thread+27c (000343d4): addi r3,0xf8(r31)
            LR: .e_block_thread+27c (000343d4)
            31023e88: .e_sleep_thread+4c (00034af0)
            31023ed8: .[filter4:fltr_in_chk]+7c (01649584)
            31023f28: .[netinet:ipintr_noqueue2]+44c (05201198)
            31023fc8: .[netinet:in_newstack]+24 (0521659c)
            0044c9f0: .[netinet:in_flip_and_run]+54 (05201e18)
            0044ca40: .[netinet:dogisr]+70 (05200b9c)
            0044caa0: .dmx_8022_receive+380 (000baad4)
            0044cb20: .[tok_demux:tok_receive]+21c (015cc6c4)
            0044cbc0: .[stokdd:stok_rx]+5e8 (015c4658)
            0044cc60: .[stokdd:rw_intr]+298 (015c04a4)
            0044ccd0: .[stokdd:stok_intr]+444 (015c0fd0)
            0044cd80: .i_poll_soft+e0 (0001c954)
            0044cde0: .i_softmod+140 (0001c2c0)
            0044ce70: flih_603_patch+cc (000289d4)

    PROBLEM SUMMARY:
    crash in fbetbl_expand

    PROBLEM CONCLUSION:
    Corrected the table growth algorithm so that the
    table can expand quickly enough to handle connections.

    ------

    APAR: IR45851 COMPID: 5697F4800 REL: 410
    ABSTRACT: SOCKS SERVICES BEING OMITTED AFTER REGEN OF RULES

    PROBLEM DESCRIPTION:
    SOCKS rules that were identical up to the last digit were
    omitted as duplicates.

    PROBLEM SUMMARY:
    SOCKS rules that matched up to the end of the shorter
    rule were dropped as duplicates.

    PROBLEM CONCLUSION:
    Fixed the configuration code to compare rules through to the end of each rule.
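    A minimal Python sketch of the difference between the faulty and the
    fixed duplicate test, assuming rules are compared as strings. The rule
    text and function names are hypothetical and do not reflect the
    Firewall's actual SOCKS configuration syntax.

```python
def prefix_is_duplicate(rule_a, rule_b):
    """Faulty test: compares only up to the end of the shorter rule."""
    n = min(len(rule_a), len(rule_b))
    return rule_a[:n] == rule_b[:n]


def full_is_duplicate(rule_a, rule_b):
    """Fixed test: rules must match through the end of both strings."""
    return rule_a == rule_b


def dedupe(rules, is_duplicate):
    """Keep each rule unless an already-kept rule counts as its duplicate."""
    kept = []
    for rule in rules:
        if not any(is_duplicate(rule, k) for k in kept):
            kept.append(rule)
    return kept


rules = ["permit 10.1.1.1", "permit 10.1.1.12"]  # hypothetical rule text
```

    The prefix test drops the second rule because it matches the first up
    to the end of the shorter string; the full comparison keeps both.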

    ------

    APAR: IR45987 COMPID: 5697F4800 REL: 410
    ABSTRACT: CRASH IN FILTER4 AT E_SLEEP_THREAD

    PROBLEM DESCRIPTION:
    system crash on SMP machine at e_sleep_thread

    PROBLEM SUMMARY:
    Filter4 crashed in situations where separate processors were
    contending for a thread (this crash could occur only on an SMP
    machine). The stack trace showed this contention in
    e_sleep_thread.

    PROBLEM CONCLUSION:
    Changed the locking model to use thread-safe
    mechanisms.

    ------

    APAR: IR46076 COMPID: 5697F4800 REL: 410
    ABSTRACT: NAT LOG ENTRIES SHOW ICA0021W: LOG MONITOR - MISFORMATTED LOG DA

    PROBLEM DESCRIPTION:
    NAT logging entries are showing up as "misformatted log entries"

    PROBLEM SUMMARY:
    System calls left unexpected data in the buffers
    used to relay NAT events to the log.

    PROBLEM CONCLUSION:
    Changed the catalog generation path to bypass the
    code that was sampling unexpected data.

    ------

    APAR: IR46230 COMPID: 5697F4800 REL: 410
    ABSTRACT: FOPEN ON /DEV/NULL FAILED,ERRNO 24

    PROBLEM DESCRIPTION:
    fopen on /dev/null failed with errno 24, indicating that the
    open-files limit had been reached. The underlying problem was that
    file handles were opened even when they were unnecessary.

    PROBLEM SUMMARY:
    Handles were being left open, causing errors
    during rule generation.

    PROBLEM CONCLUSION:
    Changed the code logic to close unnecessary handles.

    TEMPORARY FIX:
    Set nofiles=-1 (unlimited) in /etc/security/limits.
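    errno 24 is EMFILE ("Too many open files"). A minimal sketch of the
    pattern the fix describes, assuming a rule-generation loop that only
    sometimes needs a handle to /dev/null: open the handle only when it is
    required and let a with block close it on every path. generate_rules
    and needs_log are hypothetical names.

```python
import os


def generate_rules(rules, needs_log):
    """Hypothetical rule-generation loop. A handle to /dev/null is opened
    only for rules that need one, and `with` closes it on every path,
    so descriptors cannot pile up until errno 24 (EMFILE) is hit."""
    written = 0
    for rule in rules:
        if not needs_log(rule):
            continue  # no handle is opened for rules that do not need it
        with open(os.devnull, "w") as sink:  # closed when the block exits
            sink.write(rule + "\n")
            written += 1
    return written
```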

    ------

    APAR: IR46301 COMPID: 5697F4800 REL: 410
    ABSTRACT: GWAUTH CORE DUMPS WITH ERRPT OUTPUT

    PROBLEM DESCRIPTION:
    errpt reports gwauth as SOFTWARE PROGRAM ABNORMALLY TERMINATED.

    PROBLEM SUMMARY:
    An improper parameter list caused a core dump
    in gwauth with strlen 80.

    PROBLEM CONCLUSION:
    fixed parameter list in code.

    ------

    APAR: IR46789 COMPID: 5697F4800 REL: 410
    ABSTRACT: FWFILTER CMD=UPDATE FAILS IF USING MORE THAN 9 DEFINED ADAPTERS

    PROBLEM DESCRIPTION:
    When more than 9 adapters are used, fwfilter cmd=update fails.
    The error message is "Setup for specific apdapter fails -
    specific(en5,en9". If the fwfilter command fails, the firewall
    does not work properly. The firewall is installed with the
    following fixes: fwaixutils420.tar, fwaixfilter420.tar,
    fwaixcfgcli420.tar.

    PROBLEM SUMMARY:
    Updating filters failed if one of the adapter names
    was longer than three characters (for example, en10).

    PROBLEM CONCLUSION:
    The buffer that holds adapter names was enlarged to allow
    longer names.

    ------

    APAR: IR46790 COMPID: 5697F4800 REL: 410
    ABSTRACT: FWUSER GIVES CORE WHEN NO PASSWORD SUPPLIED

    PROBLEM DESCRIPTION:
    Entering a command to add a user that requires a password,
    without supplying one, produces a core dump instead of an
    error message. Example:
    fwuser cmd=add secftp=password

    PROBLEM SUMMARY:
    Recent AIX maintenance changed the way AIX user structures
    are initialized, so a core dump occurs when an uninitialized
    structure is accessed.

    PROBLEM CONCLUSION:
    Initialized variables before setting them so that the code
    works with all known AIX maintenance levels.

    TEMPORARY FIX:
    Use the proper syntax (include the password parameter in the
    initial fwuser command).

    ------

    APAR: IR46793 COMPID: 5697F4800 REL: 410
    ABSTRACT: TIGHTENING BUFFER CHECKING OF TELNETD

    PROBLEM DESCRIPTION:
    In every BSD-derived telnet daemon on UNIX, telnet options are
    processed by the 'telrcv' function. This function parses the
    options according to the telnet protocol and its internal state.
    During parsing, the replies to be sent back to the client are
    stored in the 'netobuf' buffer. This is done without any bounds
    checking, on the assumption that the reply data is smaller than
    the buffer size (usually BUFSIZ bytes).

    PROBLEM SUMMARY:
    Possible exposure, as described in IY22029.

    PROBLEM CONCLUSION:
    Added the necessary limit checks on the "netobuf" buffer in the
    proxy telnet code to avoid the buffer overflow related to
    IY22029.
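    A minimal sketch of the bounds check the fix describes, written in
    Python rather than the daemon's C: every reply is checked against the
    remaining space before it is copied into the buffer. The class name,
    the 8192-byte default standing in for BUFSIZ, and the choice to refuse
    (rather than flush) an oversized reply are all assumptions.

```python
BUFSIZ = 8192  # assumption: BUFSIZ is platform-specific; 8192 is illustrative


class NetOutputBuffer:
    """Fixed-capacity reply buffer in the spirit of telnetd's netobuf,
    except that every append is bounds-checked before any copy happens."""

    def __init__(self, capacity=BUFSIZ):
        self.capacity = capacity
        self.data = bytearray()

    def room(self):
        """Bytes still free in the buffer."""
        return self.capacity - len(self.data)

    def append(self, reply):
        """Copy `reply` in only if it fits; report whether it was stored."""
        if len(reply) > self.room():
            return False  # refuse instead of overrunning the buffer
        self.data += reply
        return True
```

    For example, the three-byte telnet reply IAC WILL ECHO
    (b"\xff\xfb\x01") fits in a 4-byte buffer once and is refused the
    second time.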

    ------

    APAR: IR46935 COMPID: 5697F4800 REL: 410
    ABSTRACT: NAT MANYTO1 IS FAILING TO TRANSLATE MACHINES IN D'STREAM NETWORK

    PROBLEM DESCRIPTION:
    The customer has three networks behind the firewall: a 100.1.x.x
    network directly behind the firewall's secure adapter, a 222.2.x.x
    network downstream from the 100.1.x.x network, and a 10.x.x.x
    network downstream from the 222.2.x.x network.
    The customer has a NAT many-to-one setup, and all networks work
    except the 10.x.x.x network. If he uses a NAT MAP entry for any
    machine in this network, NAT translates the address, but according
    to the customer's trace, many-to-one translation of the 10.x.x.x
    network is not done correctly.

    PROBLEM SUMMARY:
    NAT many-to-one fails to translate.
    MAP entries DO work.

    PROBLEM CONCLUSION:
    The comparison failed when the difference between the
    downstream subnet and the contiguous subnet was too large
    (222 - 10 in this case). Fixed the comparison code to compare
    addresses properly.
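    The APAR does not show the comparison code. A minimal sketch of a
    membership test that cannot be confused by the numeric distance between
    networks uses Python's ipaddress module, assuming the x.x wildcards in
    the report mean /16 and /8 prefixes. should_translate and the network
    list are hypothetical.

```python
import ipaddress

# Secure-side networks from the customer's topology; the /16 and /8
# prefix lengths are assumptions standing in for the "x.x" wildcards.
SECURE_NETWORKS = [
    ipaddress.ip_network("100.1.0.0/16"),
    ipaddress.ip_network("222.2.0.0/16"),
    ipaddress.ip_network("10.0.0.0/8"),
]


def should_translate(source):
    """Return True if `source` is on any secure-side network. Each test is
    a prefix-mask comparison, so how far apart the networks are numerically
    (222 vs 10) has no effect on the result."""
    addr = ipaddress.ip_address(source)
    return any(addr in net for net in SECURE_NETWORKS)
```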

    ------

    APAR: IR46943 COMPID: 5697F4800 REL: 410
    ABSTRACT: MAIL IS RELAYED WHEN !,%, AND " ARE PART OF THE EMAIL ADDRESS

    PROBLEM DESCRIPTION:
    Mail is relayed when !, %, and " are part of the email address.
    For example, the following patterns are not blocked (confirmed by
    running tests):
     relay%orbz.org@publicdomain.com
     "relay@orbz.org"@publicdomain.com
     orbz.org!relay@publicdomain.com

    PROBLEM SUMMARY:
    Inbound mail was relayed when domain chaining was simulated
    using !, %, and ".

    PROBLEM CONCLUSION:
    Added checks for the other domain separators in the local part.
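    A simplified sketch of the added check, assuming it flags any address
    whose local part (the text before the last @) contains %, !, or a
    quoted @. This is not a full RFC 2822 address parser, and the function
    name is hypothetical.

```python
def local_part_hides_domain(address):
    """Return True if the local part of `address` (everything before the
    last @) contains %, !, or an @ inside quotes, i.e. it looks like a
    source-routed relay attempt."""
    local, at, _domain = address.rpartition("@")
    if not at:
        return False  # no @ at all: nothing to split into local/domain
    if len(local) >= 2 and local[0] == local[-1] == '"':
        local = local[1:-1]  # look inside a quoted local part
    return any(sep in local for sep in "%!@")
```

    All three test patterns from the report are flagged, while an ordinary
    address such as user@publicdomain.com is not.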

    ------

    APAR: IR46951 COMPID: 5697F4800 REL: 410
    ABSTRACT: LOGMONITOR THRESHOLDS FAIL FOR CERTAIN NAT ICA TAGS

    PROBLEM DESCRIPTION:
    LogMonitor thresholds fail for certain NAT ICA tags

    PROBLEM CONCLUSION:
    Changed to properly update thresholds for all NAT log entries.

    ------

    APAR: IR47129 COMPID: 5697F4800 REL: 410
    ABSTRACT: AFTER UPGRADE FROM R420 TO R421 FIREWALL WILL CRASH THE SYSTEM A

    PROBLEM DESCRIPTION:
    The system crashes during reboot with a DSI in
    netinet in_pcbhashlookup2.

    PROBLEM SUMMARY:
    The firewall crashes on reboot when IKE tunnels
    are present.

    ------

    APAR: IY15176 COMPID: 5765E5100 REL: 600
    ABSTRACT: WRONG IP_TOS FIELD IS SET BY AIX/CS V6 WHEN RUNNING EE

    PROBLEM DESCRIPTION:
    When an Enterprise Extender (EE) link station is started, the TCP/IP
    handshaking starts on port 12000. AIX/CS incorrectly sets the
    ip_tos field to 6 (ip_tos=6). The field is used for flow control
    and should be set according to the following table:
     APPN Transmission Priority   Type of Service   Destination UDP Port
     LOW                          X'20'             12004
     MEDIUM                       X'40'             12003
     HIGH                         X'80'             12002
     NETWORK                      X'C0'             12001
     XID, TEST, DM, DISC          X'C0'             12000
    The table is from the IBM Redbook 'Migrating Subarea Networks',
    SG24-5957, Chapter 5.1.2, page 67.

    PROBLEM SUMMARY:
    Cannot establish an HPR/IP (Enterprise Extender or EE) connection
    across a firewall that checks the ip_tos parameter in the IP header.

    PROBLEM CONCLUSION:
    Correctly set this parameter to hex 20, 40, 80, or C0.
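    The priority table can be expressed as a lookup that a sender or a
    filtering firewall might consult. This is a minimal Python sketch; the
    dictionary and function names are hypothetical, and the values are
    transcribed from the APAR's table.

```python
# (TOS byte, destination UDP port) per APPN transmission priority,
# transcribed from the table in this APAR.
EE_PRIORITIES = {
    "LOW": (0x20, 12004),
    "MEDIUM": (0x40, 12003),
    "HIGH": (0x80, 12002),
    "NETWORK": (0xC0, 12001),
    "XID/TEST/DM/DISC": (0xC0, 12000),
}


def expected_tos(udp_port):
    """Return the TOS byte that EE traffic to `udp_port` should carry,
    or None if the port is not one of the EE ports."""
    for tos, port in EE_PRIORITIES.values():
        if port == udp_port:
            return tos
    return None


def tos_is_valid(udp_port, tos):
    """Firewall-style check: does the packet's TOS match the table?"""
    return expected_tos(udp_port) == tos
```

    Under this table, the ip_tos=6 that AIX/CS sent on port 12000 fails
    the check, while X'C0' passes.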

    ------

    APAR: IY15384 COMPID: 5765E5100 REL: 600
    ABSTRACT: SYSTEM CRASH WITH SNA_V5ROUTER CODE. THE PROBLEM IS ENCOUNTERED

    PROBLEM DESCRIPTION:
    AIX system crash at
    .[sna_v5router:vba_account_buffer_out]+68 (053104cc): twllti r6,0x200
    From the xmalloc -u and svmon output taken before this crash, the
    crash occurs when more LU0 connections are being established after
    the AIX system's overflow heap allocation is nearly used up.

    PROBLEM SUMMARY:
    Crash in vba_mblk_to_buf_copy from nru_bind or vba_ips_getb from
     nru_bind after severe memory and buffer shortages.

    PROBLEM CONCLUSION:
    Correct code to handle this error case.

    ------

    APAR: IY15762 COMPID: 5765E5100 REL: 600
    ABSTRACT: BUFFER CONGESTION ERROR.

    PROBLEM DESCRIPTION:
    We have a server performance degradation problem.
    'sna.err' shows many 'buffer congestion' errors.
    There is a bug in CS/AIX when set_kernel_memory_limit = 0 and
    set_buffer_availability = 0 are explicitly configured in
    sna_node.cfg: CS/AIX incorrectly calculates available
    memory and falsely detects memory shortages.
    As a result, the problem occurs even when traffic is very
    light.

    PROBLEM SUMMARY:
    Buffer shortage after setting config parameters to 0 and modest
    memory usage.

    PROBLEM CONCLUSION:
    Treat 0 as if the set commands were not present; continue to allow
    non-zero values.

    ------

    APAR: IY15945 COMPID: 5765B7300 REL: 520
    ABSTRACT: CUMULATIVE MAINTENANCE #02 (CSD02) FOR MQSERIES FOR AIX V5.2

    PROBLEM DESCRIPTION:
    Service offering #02 (CSD02) for MQSeries for AIX V5.2 provides
    fixes for the problems described in this APAR.

    PROBLEM SUMMARY:
    Service offering #02 (CSD02) includes fixes to the following
    problems:
                 (to be added later)

    PROBLEM CONCLUSION:
    Cumulative maintenance #02.

    ------

    APAR: IY15990 COMPID: 5765E5100 REL: 600
    ABSTRACT: AIX/CS V6 GENERATES WRONG TCPIP OPTION FIELD

    PROBLEM DESCRIPTION:
    The IP packet generated by Enterprise Extender is wrong: the EE code
    puts 00000006 in the IP header options field, which is invalid
    according to RFC 791.
    Here is the relevant trace output of an EE link station setup:
    TOK: ==( 163 bytes transmitted on interface tr0 )== 11:07:52.03
    TOK: 00000000 00401000 5a4f9acf 0020357a b401aaaa
    TOK: 00000010 03000000 08004606 008db5c8 00001e11
    TOK: 00000020 cb120927 00730927 07b90000 00062ee0
    The IP header starts with 46.., indicating IPv4 and a header length
    of six 4-byte words; the last 4-byte word is the IP options field,
    00000006. Because there are no IP options in this case, this field
    should be x'00000000', or better, the IP header should start with
    45.., indicating a header of only five 4-byte words.
    The customer's firewall software does not forward an IP packet with
    such an invalid options field.

    PROBLEM SUMMARY:
    Cannot establish HPR/IP (Enterprise Extender) connection through
    a bridge because of an invalid options field in the IP header of
    the packets sent by CS/AIX.

    PROBLEM CONCLUSION:
    Removed the programming of the invalid TCP option (HPR/IP uses UDP).
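    The hex dump in the description can be decoded with a short Python
    sketch of RFC 791 header parsing (the function name is hypothetical).
    The low nibble of the first byte is the header length in 32-bit words,
    so 0x46 means a 24-byte header whose last 4 bytes are options. Decoding
    the IP portion of the trace also shows a TOS byte of 0x06, the
    ip_tos=6 value reported in IY15176.

```python
def parse_ipv4_header(packet):
    """Decode the fixed fields of an IPv4 header (RFC 791) plus any options."""
    version = packet[0] >> 4              # high nibble: IP version
    header_len = (packet[0] & 0x0F) * 4   # low nibble: IHL in 32-bit words
    if version != 4 or header_len < 20 or len(packet) < header_len:
        raise ValueError("not a well-formed IPv4 header")
    return {
        "version": version,
        "header_len": header_len,
        "tos": packet[1],
        "total_len": int.from_bytes(packet[2:4], "big"),
        "protocol": packet[9],            # 0x11 = UDP
        "options": bytes(packet[20:header_len]),
    }


# IP header bytes copied from the trace, starting at "4606" after the
# token-ring and SNAP headers.
trace_ip = bytes.fromhex("4606008d b5c80000 1e11cb12 09270073 092707b9 00000006")
hdr = parse_ipv4_header(trace_ip)
```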

    ------

    APAR: IY17118 COMPID: 5765B7300 REL: 520
    ABSTRACT: MISMATCH IN DATA SENT BY CICS MOVER TO MQSERIES 5.2

    PROBLEM DESCRIPTION:
    When the MQSeries for MVS CICS mover tries to send messages to
    MQSeries for AIX 5.2 after the channel times out, it gets a
    resync-rejected error.

    PROBLEM CONCLUSION:
    This problem has been fixed and the fix will be shipped in PTF
    U474779 for CSD02.

    ------

    APAR: IY17132 COMPID: 5765E5100 REL: 600
    ABSTRACT: REPEATED PF8 (SCROLL) KEY RESULTS IN DFH2001 MSG FROM THE MAINFR

    PROBLEM DESCRIPTION:
    Repeated PF8 key (scroll down) results in a DFH2001 message from
     the mainframe and user loses session.

    PROBLEM SUMMARY:
    tn3270 client received error message DFH2001 from CICS because
    the CS/AIX TN server sent data without CD. Session may be
    taken down.

    PROBLEM CONCLUSION:
    Corrected handling of a race condition in which multiple user AID
    sequences are queued at the TN server waiting for direction,
    following a keyboard restore without CD from the host program.

    ------

    APAR: IY17428 COMPID: 5765D5100 REL: 330
    ABSTRACT: DCR FOR BLOCKING MPI_SENDS AND RECEIVES TO DECREASE CPU USE.

    PROBLEM DESCRIPTION:
    In an application using blocking MPI sends and receives, a great
    deal of CPU is consumed while the send or receive waits for
    completion. The customer considers this unnecessary. POE could be
    modified so that the customer can choose whether MPI sends and
    receives work as currently designed or work in a new way that uses
    much less CPU. This could, of course, affect the latency or response
    time of the send or receive, and therefore it should be an option.
    In some cases, overall throughput or performance could be
    significantly improved by using the new option.

    PROBLEM SUMMARY:
    Support MP_WAIT_MODE=NOPOLL in PSSP 3.3, as requested by the
    customer in a DCR.

    PROBLEM CONCLUSION:
    Document the availability of MP_WAIT_MODE=NOPOLL.
    It is documented in ssp.css.README.
    POE APAR IY23338 is also required to enable the
    MP_WAIT_MODE=NOPOLL support. IY23338 is shipped
    in service level ppe.poe 3.1.0.13 or later.

    ------

    APAR: IY17482 COMPID: 5765E5100 REL: 600
    ABSTRACT: UNABLE TO RE-ESTABLISH LU2 SESSION.

    PROBLEM DESCRIPTION:
    The user can log in to an LU2 session and work. After logging off,
    some time later, the user is unable to establish another LU2 session.

    PROBLEM SUMMARY:
    LU1-3 application hangs up. Trace shows CONFIRMED not responded
    to (although no data received from mainframe).

    PROBLEM CONCLUSION:
    Reject CONFIRMED in this state with SNA_STATE.

    ------

    APAR: IY17599 COMPID: 5765D2000 REL: 500
    ABSTRACT: CRASH IN SNA_V5ROUTER WITH IX85570 APPLIED.

    PROBLEM DESCRIPTION:
    MST STACK TRACE:
     0x2ff402e0 (excpt=40404096:40000000:007fffff:40404096:00000106)
    (intpri=11)
    IAR: .[sna_v5router:nrm_rm_timer_deact_sess_proc]+90 (092da640):lhz r3,0x56(r4)
    LR: .[sna_v5router:nrm_rm_timer_deact_sess_proc]+28 (092da5d8)
    2efa2a08: .[sna_v5router:nrm_init_to_rm_rec]+88 (092df698)
    2efa2a48: .[sna_v5router:nrm_queue_handler]+b8 (092e03cc)
    2efa2a88: .[sna_v5router:nba_dispatch_input]+290 (09156af0)
    2efa2ae8: .[sna_v5router:nba_dispatch_process]+c8 (091571c4)
    2efa2b38: .[sna_v5router:nba_scheduler]+200 (091579b4)
    2efa2b98: .[sna_v5router:vpr_stream_uw_drive_scheduler]+2c (0914b590)

    PROBLEM SUMMARY:
    0x2ff402e0 (excpt=40404096:40000000:007fffff:40404096:00000106)
     (intpri=11)
    IAR: .[sna_v5router:nrm_rm_timer_deact_sess_proc]+90 (092da640):lhz r3,0x56(r4)
    LR: .[sna_v5router:nrm_rm_timer_deact_sess_proc]+28 (092da5d8)
    2efa2a08: .[sna_v5router:nrm_init_to_rm_rec]+88 (092df698)

    PROBLEM CONCLUSION:
    The crash occurs under stress, with a lot of APPC conversations
    running every second. An internal timer has expired to deactivate
    a session but has found the incorrect control block. I have
    corrected the code to prevent this from happening.

    ------

    APAR: IY17832 COMPID: 5765E5100 REL: 600
    ABSTRACT: SNAADMIN COMMAND APPEARS TO HANG

    PROBLEM DESCRIPTION:
    The customer is experiencing a hang on the snaadmin command
    when he issues the following command:
    snaadmin add_dlc_trace ,resource_type=LS ,resource_name=ELGARC5R
    The command executes but never returns to a prompt.

    PROBLEM SUMMARY:
    Snaadmin query commands fail after a period of time, when
    Anynet is in the configuration.

    PROBLEM CONCLUSION:
    Correctly discard indication buffer in Anynet code.

    ------

    APAR: IY17866 COMPID: 5765E5100 REL: 600
    ABSTRACT: SNAADMIN HANG

    PROBLEM DESCRIPTION:
    snaadmin commands hang if they are issued after node_init.

    PROBLEM SUMMARY:
    snaadmin status_all fails to complete; specifically,
    query_downstream_pu fails to complete when ACTPU has been rejected.

    PROBLEM CONCLUSION:
    Altered the handling of a negative response (-ve RSP) to ACTPU so
    that internal data structures are cleared correctly.

    ------

    APAR: IY17925 COMPID: 5765E5100 REL: 600
    ABSTRACT: CRASH IN V6 SNA_TRACE.

    PROBLEM DESCRIPTION:
    >trace -m
     Skipping first MST
    MST STACK TRACE:
    0x2ff3b400 (excpt=3549144b:42000000:00008811:3549144b:00000106)
    (intpri=0)
    IAR: .[sna_trace:sixt_end_trace]+90 (0361fcac): stb r3,0x8(r)
    LR: .[sna_trace:sixt_end_trace]+34 (0361fc50)
    2ff3a720: .[sna_v5router:vll_router_write_log_to_buf]+aa8
    (0365c1ac)
    2ff3a7c0: .[sna_v5router:vlm_write_log]+48 (0365b6d0)
    2ff3a800: .[sna_v5router:nba_pd_print_var]+1524 (03663cd4)
    2ff3ab20: .[sna_v5router:nba_pd_print]+638 (03662778)
    2ff3acc0: .[sna_v5router:nba_pd_print_problem]+58 (03663fd8)
    2ff3ad00: .[sna_v5router:vns_send_verb_to_cfg_daemon]+b0
    (03681238)

    PROBLEM SUMMARY:
    The crash is due to a timing window during termination: an error is
    being logged just as SNA is stopped. It may be caused by a
    sequence of repeated starts and stops.

    PROBLEM CONCLUSION:
    Protected the code to prevent logging or tracing after
    termination.

    ------

    APAR: IY18065 COMPID: 5765D2000 REL: 504
    ABSTRACT: SYSTEM HANGS COMPLETELY WITHOUT LED CODE

    PROBLEM DESCRIPTION:
    System hangs completely without LED Code.

    PROBLEM SUMMARY:
    System hang (single CPU); a manual dump shows looping in
      vannx_parse_locate from vannx_locate_reply.

    PROBLEM CONCLUSION:
    Corrected the handling of an unusually formatted locate.

    ------

    APAR: IY18201 COMPID: 5765B7300 REL: 520
    ABSTRACT: RUNMQCHL_ND CORE DUMPS AFTER CONVERSATION ENDS

    PROBLEM DESCRIPTION:
    With MQSeries for AIX 5.2, the runmqchl_nd process is started, with
    transmission through AIX SNA Communications Server 6.0. When the
    runmqchl_nd process ends after messages have been sent, it produces
    a core dump. This happens because the Communications Server library
    is released and a runmqchl_nd process thread then becomes unstable.

    PROBLEM CONCLUSION:
    This problem has been fixed and the fix will be shipped in PTF
    U474779 for CSD03.

    ------

    APAR: IY18202 COMPID: 5765D2000 REL: 504
    ABSTRACT: AFTP FROM MVS TO AIX FAIL IF THE FILE IS LARGER THAN 2GB

    PROBLEM DESCRIPTION:
    When I receive a large data set
    (more than 3GB) from MVS, the aftp client (AIX) displays the
    error message "The specified device or disk is full" and
    stops, although there is enough free space in that file system.
    When I inspect the received file size, aftp always stops at
    the same size (2147483647 bytes). bf=true is set.
    The code should be compiled with _LARGE_FILES defined.

    PROBLEM SUMMARY:
    The fix was to recompile with the large file option enabled.
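    The stopping point 2147483647 is 2**31 - 1, the largest offset a signed
    32-bit off_t can hold, which is why a program built without large-file
    support stops there. A minimal Python sketch of the arithmetic; the
    function name is hypothetical.

```python
import struct

OFF_T_32_MAX = 2**31 - 1  # 2147483647, the size at which aftp stopped


def fits_in_32bit_off_t(offset):
    """True if `offset` fits in a signed 32-bit file offset, as off_t is
    for a 32-bit program built without large-file support."""
    try:
        struct.pack("<i", offset)  # "<i": 4-byte signed int, standard size
    except struct.error:
        return False
    return True
```

    A 3GB data set, like the one in the report, is past this limit.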

    ------

    APAR: IY18318 COMPID: 5765D2000 REL: 500
    ABSTRACT: CRASH IN SNA_V5ROUTER STARTING A LINK.

    PROBLEM DESCRIPTION:
    MST STACK TRACE:
    0xf0000000 (excpt=40404220:40000000:007fffff:40404220:00000106)
    (intpri=11)
    IAR: .[sna_v5router:ncs_notify_active]+28c (05123110):
    lhz r4,0x1e0(r3)
    LR: .[sna_v5router:ncs_fsm_ls_int]+cc8 (05124d08)
    2ef62a18: .[sna_v5router:ncs_fsm_ls_int]+cc8 (05124d08)
    2ef62ab8: .[sna_v5router:ncs_fsm_ls_ext]+f18 (0511d4c0)
    2ef62b48: .[sna_v5router:ncs_dlc_to_cs_signals]+1d8 (0511ac50)

    PROBLEM SUMMARY:
    Crash in sna_v5router.

    PROBLEM CONCLUSION:
    Correct code to clear record of tg_number that is not used from
    a failed link activation.

    ------

    APAR: IY18511 COMPID: 5765D2000 REL: 500
    ABSTRACT: CRASH IN SNA_V5ROUTER

    PROBLEM DESCRIPTION:
    The customer is receiving a number of back-level APPC conversations;
    some of these are timing out (60 seconds), presumably because the
    application is not running or is very slow. Subsequently an application
    tries to receive a conversation (perhaps for this same TP) and
    fails; the application retries very quickly, and eventually CS/AIX
    [...] up a reused control block and crashes.

    LOCAL FIX:
    change config or application to prevent timeouts

    PROBLEM SUMMARY:
    Rare crash if a back-level application issues ALLOCATE (accept) a
    long time after a dynamic load for that application has already
    timed out.

    PROBLEM CONCLUSION:
    The problem can be avoided by correctly configuring the timeout for
    loading the application. A fix was made to close the window by
    checking the internal control block before trying to process the
    internal RECEIVE_ALLOCATE message.

    ------

    APAR: IY18712 COMPID: 5765E5100 REL: 600
    ABSTRACT: "DON'T CHECK REMOTE NODE NAME" WITH "DON'T SEND LOCAL NODE NAME"

    PROBLEM DESCRIPTION:
    When "don't check remote node name" and "don't send local node
    name" are both set while configuring a link station with the
    xsnaadmin tool, error messages 4097-7 and 4097-0 appear in sna.err.

    LOCAL FIX:
    Only selecting "don't check remote node name" without "don't
    send local node name" works.

    PROBLEM SUMMARY:
    Cannot modify ls_attributes (such as don't send local node name)
    on an existing ls. Pop-up in xsnaadmin and error log 4097-7.

    PROBLEM CONCLUSION:
    Corrected the parsing code to compare the correct-length string in
    the define_ls verb control block.

    ------

    APAR: IY18932 COMPID: 5765B7300 REL: 520
    ABSTRACT: MQCREATEBAG CAN FAIL WITH A SIGSEGV UNDER MQSERIES V5.2 ON UNIX

    PROBLEM DESCRIPTION:
    The mqCreateBag API call can fail under MQSeries v5.2 on
    UNIX systems with a segmentation fault when called in the
    context of a multithreaded process. When the parameters
    passed to mqCreateBag are invalid, it should instead return
    MQRC_HBAG_ERROR.

    LOCAL FIX:
      A code change is required to correct this problem.

    PROBLEM CONCLUSION:
    This problem has been fixed and the fix will be shipped in the
    following PTFs for CSD03:
        A) MQSeries for V5.2 CSD03
              AIX U478289
              HP-UX (V10) U478290
              HP-UX (V11) U478293
              Sun Solaris U478291
              Linux U478292

    ------

    APAR: IY18957 COMPID: 5765E5100 REL: 600
    ABSTRACT: MSG "FIELD PORT_NUMBER WAS SPECIFIED MORE THAN ONCE" WHEN

    PROBLEM DESCRIPTION:
    When trying to add an X.25 port, the message
    "Field port_number was specified more than once" is returned.
    This happens even with the first and only port.

    LOCAL FIX:
    Use xsnaadmin to define the port

    PROBLEM SUMMARY:
    Cannot define QLLC port from smit. Get an error saying duplicate
      port_number.

    PROBLEM CONCLUSION:
    Removed the prompt for port_number in CS/AIX (the value is
    generated automatically).

    ------

    APAR: IY19120 COMPID: 5765E5100 REL: 600
    ABSTRACT: 'SNA -D' PRODUCES INCORRECT OUTPUT UNDER JA_JP ENVIRONMENT.

    PROBLEM DESCRIPTION:
    After migrating to CS V6.0.1.0, the 'sna -d' command output
    is incorrect in Japanese, Spanish, Portuguese, Korean, Chinese
    and Taiwanese.

    PROBLEM SUMMARY:
    In Japanese, the sna -d output is badly laid out, with vertical
    bars all over the place and lines wrapping round at the end.

    PROBLEM CONCLUSION:
    Corrected the translation process to remove the vertical bars
    (which are delimiters in the catalog files).

    TEMPORARY FIX:
    Code update (ja_JP/sna.cat) supplied to customer

    ------

    APAR: IY19187 COMPID: 5765E5100 REL: 600
    ABSTRACT: SOME DEFUNCT PROCESSES WERE GENERATED.

    PROBLEM DESCRIPTION:
    Some defunct processes were generated when the customer issued
    the "sna stop" command. They did not disappear until AIX was
    rebooted.

    PROBLEM SUMMARY:
    Defunct processes seen when using Anynet and stopping node.

    PROBLEM CONCLUSION:
    setpinit is now used to allow kproc processes to exit correctly.

    ------

    APAR: IY19250 COMPID: 5765E5100 REL: 600
    ABSTRACT: ALLOCATE_LISTEN TPS FAIL WITH 08460000.

    PROBLEM DESCRIPTION:
    ALLOCATE_LISTEN TPs fail with 08460000. The problem occurs
    because the partner system is specifying TP names that
    include 4 null characters at the end.

    PROBLEM SUMMARY:
    Incoming APPC attaches are rejected with sense code 08460000
    when there are trailing nulls on the TP name.

    PROBLEM CONCLUSION:
    Ignore trailing nulls on the TP name in an incoming attach so
    that it matches the allocate_listen TP name, providing the same
    function as CS/AIX V4.
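    The matching rule can be illustrated with a minimal Python
    sketch. The helper name tp_names_match is hypothetical and is not
    CS/AIX code, and the ASCII literals stand in for what would be
    EBCDIC TP names on the wire; the sketch only shows comparing TP
    names after discarding trailing X'00' bytes, such as the 4
    trailing nulls described above.

```python
def tp_names_match(incoming: bytes, registered: bytes) -> bool:
    """Compare two TP names, ignoring trailing NUL (X'00') bytes.

    Hypothetical helper for illustration only. Interior bytes are still
    compared exactly; only NULs at the end of either name are dropped.
    """
    return incoming.rstrip(b"\x00") == registered.rstrip(b"\x00")


# A partner that pads the TP name with 4 nulls still matches.
print(tp_names_match(b"UNIRCV\x00\x00\x00\x00", b"UNIRCV"))  # True
```

    Only trailing nulls are ignored, so a name with an embedded null
    or extra characters is still rejected.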

    ------

    APAR: IY19436 COMPID: 5765D5100 REL: 320
    ABSTRACT: RC.SP NOW FAILS TO NOTIFY ERRPT OF DISABLED PROCESSORS W/ BOOTUP

    PROBLEM DESCRIPTION:
    The error can be reproduced as follows, using Hardware
    Perspectives, where the node notebook monitors processorsOffline
    and hostResponds. Shut down and power off a multiprocessor node,
    such as a Winterhawk-II or Nighthawk-II. Open a TTY, then use the
    SMS menu to disable some of the processors (select 3, then select
    8, then select the number of the processor to toggle between
    Disabled and Enabled in the configuration). Exit the SMS menu with
    selections 98 and then 99. Power on the node (and select auto
    unfence if the switch is used) and wait until the Host Responds
    light (and the Switch Responds light, if applicable) is green.
    During this time, upon power off, the Node Notebook Monitored
    condition (last page) shows hostResponds going into a "triggered"
    state, while processorsOffline goes from "not triggered" to
    "unknown". When power is on and Host Responds is back up,
    hostResponds goes back to "not triggered", and processorsOffline
    is ALSO BACK TO "not triggered". At NO time is processorsOffline
    triggered. This is because rc.sp did not detect the disabled
    processors, because $SSP_INSTALL/procs_installed did NOT detect
    them. As a consequence, the errpt was not updated with
    SYSMAN001_ER. Because this specific error was missed,
    Perspectives could not trigger processorsOffline.

    LOCAL FIX:
    NO FIX. The workaround is to use "lsdev -C | grep proc" to detect
    which procN devices are installed (some nodes have proc0, proc1,
    proc4, proc5), then use "lsattr -El proc<number identified>" and
    note whether it is in an "enable" or "disable" state.

    PROBLEM SUMMARY:
    Perspectives does not monitor the "processorsOffline"
    condition. When a multiprocessor node is rebooted after
    a processor has been powered down, the display shows a "not
    triggered" condition. It should say "triggered".

    PROBLEM CONCLUSION:
    The reboot script "rc.sp" has been changed to look for a
    "processor offline" status. It had only been checking for
    "processor installed", and missed the power down condition.
    The code to do this has been relocated to a point in the
    reboot process where the Event Manager has been activated.
    It can then intercept error log messages that relate to
    the node's processors.

    ------

    APAR: IY19989 COMPID: 5765D5100 REL: 320
    ABSTRACT: LARGE SYSTEM SSP 3.2 NODE INSTALLS YIELDS 10% CORRUPTED KEYFILES

    PROBLEM DESCRIPTION:
    There are two potential problems with regard to the Kerberos
    keyfile being transmitted to the node during customization.
    In the first case, get_keyfiles fails to transmit the
    keyfile and the customization hangs at LED a69.
    In the second case, the Kerberos keyfile is transmitted to
    the node, but a partial or corrupted keyfile
    is received on the node. In this situation, the node
    completes customization successfully, and it is not until
    some time later that the problem with the invalid Kerberos
    keyfile is discovered.

    LOCAL FIX:
    It is not necessary to redo the install. The correct keyfiles
    are already on the CWS. One can either re-customize the node
    or copy the <node_name>-new-srvtab file from the CWS over to
    the node (chmod 600).

    PROBLEM CONCLUSION:
    Code has been added to the customization process to verify
    that the Kerberos keyfile has been transmitted successfully.
    /spdata/sys1/k4srvtabs/<nodename>-new-srvtab-checksum
    is created on the Kerberos Server for nodes at PSSP 3.2
    or higher.
    During customization, a tftpfile of this new file is done.
    Then, after the srvtab has been transmitted over the s1term,
    a cksum of the srvtab is computed and verified against the
    value in <nodename>-new-srvtab-checksum.
    If the values match, customization continues. If not,
    a message is logged and the srvtab is retransmitted, up to
    5 times.
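    The verify-and-retransmit loop described above can be sketched in
    Python as follows. This is an assumption-laden illustration, not
    PSSP code: transmit stands in for one transfer over the s1term,
    and zlib.crc32 is used as a stand-in checksum; it is not the
    algorithm the cksum command uses. The sketch reads "retransmitted,
    up to 5 times" as one initial transfer plus at most 5
    retransmissions.

```python
import zlib
from typing import Callable

MAX_RETRANSMITS = 5  # assumption: initial transfer + up to 5 retransmissions


def checksum(data: bytes) -> int:
    # Stand-in checksum; zlib.crc32 is NOT the POSIX cksum algorithm.
    return zlib.crc32(data)


def receive_verified(transmit: Callable[[], bytes], expected: int,
                     max_retransmits: int = MAX_RETRANSMITS) -> bytes:
    """Call transmit() until the received data matches the expected checksum."""
    for attempt in range(1 + max_retransmits):
        data = transmit()
        if checksum(data) == expected:
            return data  # values match: customization can continue
        # Values differ: log a message; the loop then retransmits.
        print(f"srvtab checksum mismatch on attempt {attempt + 1}")
    raise RuntimeError("srvtab could not be verified after retransmission")
```

    In this sketch the expected value plays the role of the number
    stored in <nodename>-new-srvtab-checksum.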

    ------

    APAR: IY20076 COMPID: 5765D5100 REL: 320
    ABSTRACT: PRB HMREINIT ON HACWS

    PROBLEM DESCRIPTION:
    Consider a CWS that is connected to the SP frame supervisor
    card through the S2 connector of the serial Y-cable and holds the
    resource group. The SP supervisor card LEDs are OK (LED 3 green
    on, LED 7 amber flashing slowly, LEDs 4 and 8 off) and everything
    works fine as long as hmreinit is not run. If hmreinit is run on
    this CWS (our primary CWS), the SP supervisor card LEDs change to
    LED 4 green on, LED 7 amber flashing slowly, LEDs 3 and 8 off, and
    the serial connection is lost.

    LOCAL FIX:
    Restarting the hardmon daemon with stopsrc -s hardmon and
    startsrc -s hardmon corrects the problem.

    PROBLEM SUMMARY:
    When hmreinit is run on the BACKUP HACWS CWS, it
    loses the serial connection; therefore,
    the output of spmon -d is incorrect. It
    does not display frame or node information.
    hmreinit issues hmcmds -G runpost, which
    clears out the buffer area that stores
    the port to listen on.
    When hmreinit is run from the ACTIVE BACKUP CWS,
    it listens on port 2. However, after the
    buffer area is cleared, it tries to listen
    on port 1, which is connected to the INACTIVE
    PRIMARY CWS.

    PROBLEM CONCLUSION:
    The hmreinit code was enhanced to issue
    /usr/bin/lshacws before issuing the
    hmcmds -g runpost command.
    If the CWS is an ACTIVE BACKUP,
    lshacws returns 32, and the runpost
    command is not issued.

    ------

    APAR: IY20510 COMPID: 5765B7300 REL: 510
    ABSTRACT: ENDMQLSR DOES NOT STOP THE LISTENER IF NUMBER OF PROCESSES >1000

    PROBLEM DESCRIPTION:
    If you use "runmqlsr" for the TCP/IP listener instead of inetd,
    and shut down the queue manager while the number of processes
    exceeds 1000, "runmqlsr" stays active, and if you use "endmqlsr"
    to stop the listener, you get a message that it could not find
    any running listeners for the specified queue manager.

    PROBLEM CONCLUSION:
    This problem has been fixed and the fix will be shipped in the
    following PTFs for CSD03:
        A) MQSeries for V5.2 CSD03
              Windows NT U200148
              AIX U478289
              HP-UX (V10) U478290
              HP-UX (V11) U478293
              Sun Solaris U478291
              Linux U478292

    ------

    APAR: IY20577 COMPID: 5765D6100 REL: 220
    ABSTRACT: LOADL API/QUERY CALLS SLOW IN RETURNING

    PROBLEM DESCRIPTION:
    LoadL query requests, such as llstatus -l or llq, and API calls
    take a long time to return data. The more concurrent queries
    there are, the longer they take to return.

    PROBLEM SUMMARY:
    Query API performance degrades substantially when a large
    number of queries occur at the same time.

    PROBLEM CONCLUSION:
    Global freelists used for commonly used objects are highly
    referenced by the negotiator in separate threads responding to
    the various query APIs. Every reference requires obtaining a lock
    protecting the free list. Due to the high contention for this
    lock when many queries arrive at the same time, the queries tend
    to run serially, which negates the value of running in separate
    threads. The solution is to move the free lists to thread-specific
    memory, which does not require locking.
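    The move from a locked global free list to per-thread free lists
    can be sketched in Python with threading.local. This is a
    hypothetical illustration of the general technique, not
    LoadLeveler code; the class name FreeList and the factory argument
    are assumptions. Because each thread only ever touches its own
    list, get() and put() need no lock.

```python
import threading


class FreeList:
    """Per-thread free list of reusable objects (sketch, no locking)."""

    def __init__(self, factory):
        self._factory = factory           # builds a new object when the list is empty
        self._local = threading.local()   # each thread sees its own attributes

    def _items(self):
        items = getattr(self._local, "items", None)
        if items is None:                 # first use in this thread
            items = []
            self._local.items = items
        return items

    def get(self):
        items = self._items()
        return items.pop() if items else self._factory()

    def put(self, obj):
        # The object is only reused later by the same thread.
        self._items().append(obj)
```

    The trade-off is that an object released by one thread cannot be
    reused by another, which costs some memory in exchange for removing
    the contended lock.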

    ------

    APAR: IY20652 COMPID: 5765E5100 REL: 600
    ABSTRACT: SNA SENSE CODE 080F6051 WHEN ACCESS SECURITY SUBFIELDS ARE X'00'

    PROBLEM DESCRIPTION:
    AIX/CS generates SNA sense code X'080F6051' on an incoming
    ATTACH request. Here is the relevant part of the ATTACH header:
    RU: 2B0502FF 0003D100 4008E4D5 C9D9C3E5 .....J. .UNIRCV
          4040180B 02000000 00000000 0000000B
          01000000 00000000 00000001 5112FF30
    The ATTACH above shows that the access security subfields are
    present in the right structure but are filled with X'00' instead
    of a user ID or password. I assume this causes sense 080F6051.
    The customer uses the VSE attach manager and self-written APPC
    BATCH code at the host site, which seems to be responsible for
    the ATTACH header setup.
    Anyway, the very same ATTACH is accepted by AIX/CS V4.2.

    PROBLEM SUMMARY:
    Cannot establish an incoming APPC conversation (security failure)
    when the attach includes null userid and password parameters.

    PROBLEM CONCLUSION:
    Treat null parameters as if they were not present, bringing the
    code into line with CS/AIX V4.
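    A minimal Python sketch of the "treat null parameters as not
    present" rule follows. The function names are hypothetical, and
    the byte strings stand in for the userid and password subfields of
    the ATTACH header; this is not CS/AIX code.

```python
from typing import Optional


def normalize_subfield(field: Optional[bytes]) -> Optional[bytes]:
    """Return None if a security subfield is absent, empty, or all X'00'."""
    if field is None or not field.strip(b"\x00"):
        return None
    return field


def attach_carries_security(userid: Optional[bytes],
                            password: Optional[bytes]) -> bool:
    """True only if at least one subfield holds a real (non-null) value."""
    return (normalize_subfield(userid) is not None
            or normalize_subfield(password) is not None)


# Subfields present but filled with X'00', as in the ATTACH above.
print(attach_carries_security(b"\x00" * 8, b"\x00" * 8))  # False
```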

    ------

    APAR: IY20739 COMPID: 5765B9500 REL: 140
    ABSTRACT: PROBLEM CREATING A GPFS-1.4 NODESET WHEN THERE IS AN ACTIVE GPFS

    PROBLEM DESCRIPTION:
    If you create a nodeset from a node running GPFS-1.4 when
    there is a GPFS-1.2 nodeset on the system, the mmfs.cfg
    information for the new nodeset may be lost.

    PROBLEM SUMMARY:
    When converting from GPFS 1.2 to GPFS 1.4 by
    leaving one nodeset at 1.2 and one at 1.4, the configuration
    file for the 1.4 nodeset is incorrect and the file system
    does not mount.

    PROBLEM CONCLUSION:
    Do not embed the mmfs.cfg information in the
    mmsdrfs file if there are still nodesets running the old mm
    commands.

    ------

    APAR: IY20768 COMPID: 5765B7300 REL: 510
    ABSTRACT: AMQ9211 AND AMQ9500 MESSAGES WHEN MQPUT LEAKS CACHE MEMORY

    PROBLEM DESCRIPTION:
    When an MQPUT specifying MQPMO_SYNCPOINT is issued by an
    application, MQSeries must allocate some storage from the
    repository manager in order to record a queue manager
    registration. It appears that a new registration is allocated
    for every MQPUT under syncpoint, which causes the MQSeries
    cluster repository manager to fail with AMQ9211 ("Error
    allocating storage.") and AMQ9500 ("No Repository storage.")
    messages. The MQPUT which encounters the failure may then hang
    while MQSeries is clearing up.

    LOCAL FIX:
    A code fix is required for this problem. Note that this APAR
    and IC29908 record similar symptoms, but the cause of each
    problem is different.

    PROBLEM CONCLUSION:
    This problem has been fixed and the fix will be shipped in the
    following PTFs:
        A) MQSeries for V5.1 CSD08
              OS/2 U200141
              Windows NT U200142
              AIX U474841
              HP-UX (V10) U474877
              HP-UX (V11) U474879
              Sun Solaris U474878
        B) MQSeries for V5.2 CSD03
              Windows NT U200148
              AIX U478289
              HP-UX (V10) U478290
              HP-UX (V11) U478293
              Sun Solaris U478291
              Linux U478292

    ------

    APAR: IY20769 COMPID: 5765D5100 REL: 320
    ABSTRACT: SP SWITCH 2 SCALING- NODES THAT ARE 'UNDER LOADL' MAY DROP OFF

    PROBLEM DESCRIPTION:
    Nodes can drop off the switch during Estart, fence, or unfence
    on a very large SP Switch 2 system; this could occur on a
    heavily loaded node in which the fault service daemon takes too
    long to destroy its deviceDatabase.

    PROBLEM SUMMARY:
    The problem seen at LLNL was that standard nodes were
    taking excessive amounts of time in the
    destroyDeviceDatabase() routine. This function frees the
    allocated memory for the fault service daemon device
    database. This database contains all of the status and
    connection information for the entire switch network.
    The function was taking so long that Estarts were failing
    because the nodes were not finishing fast enough.

    PROBLEM CONCLUSION:
    The change made to the fault service daemon will move the
    freeing of memory (database structures) to a separate
    thread. This will allow the port thread to run faster,
    so that Estarts finish faster on very large systems.
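    Handing the teardown to a separate thread can be sketched in
    Python with a queue and a worker thread. This is only an
    illustration of the technique with hypothetical names; db.clear()
    stands in for the real work done by destroyDeviceDatabase(), and
    Python's memory management differs greatly from the daemon's.

```python
import queue
import threading


class BackgroundReclaimer:
    """Tear down large structures on a worker thread (sketch)."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            db = self._q.get()
            if db is self._STOP:
                return
            db.clear()  # stands in for the slow destroyDeviceDatabase() work

    def discard(self, db):
        """Queue db for teardown and return immediately to the caller."""
        self._q.put(db)

    def close(self):
        """Stop the worker after everything already queued has been freed."""
        self._q.put(self._STOP)
        self._worker.join()
```

    The caller (the port thread, in the APAR's terms) only pays for a
    queue insertion; the expensive teardown runs concurrently.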

    ------

    APAR: IY20851 COMPID: 5765D5100 REL: 320
    ABSTRACT: WORM MSG 2547-662 NEEDS EXPECTED (IN ADDITION TO ACTUAL) INFO

    PROBLEM DESCRIPTION:
    Worm message 2547-662 needs expected connection information in
    addition to the actual connection information that is already
    printed.

    PROBLEM SUMMARY:
    Message 2547-662 only shows actual switch connection
    information. The customer requested that it also show
    the expected connection.

    PROBLEM CONCLUSION:
    Message 2547-662 is produced during Estart phase 1
    exploration, when the expected switch connections are not
    yet known. This message is usually preceded by message
    2547-661, which suggests that the packet data received may
    have been corrupted:
      2547-661 Switch chip miswired, or the switch_plane
      and/or the switch_plane_seq in Chip Location Register
      corrupt or uninitialized
    The expected connection data is not available and so
    cannot be displayed.
    However, while reviewing the code, we discovered a bug in
    the handlePh1SwSvcResponse() and handlePh1NodeSvcResponse()
    functions in which, in some cases, a null pointer may be used
    to create data for the ERRID_CS_SW_HARDWARE_ER errpt. In
    this case, the target frame and slot number in the errpt
    will be incorrect. We used this APAR to fix that problem,
    so that now, if the target device pointer derived from the
    packet data is NULL, -1 is substituted for the frame
    and slot number in the errpt.
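    The NULL check added by this APAR amounts to the following, shown
    as a Python sketch with hypothetical names (Device, errpt_target)
    and None standing in for a NULL pointer. It is not the actual
    switch fault service code.

```python
from typing import NamedTuple, Optional, Tuple


class Device(NamedTuple):
    frame: int
    slot: int


def errpt_target(target: Optional[Device]) -> Tuple[int, int]:
    """Frame and slot to record in the errpt; -1 when the pointer is NULL."""
    if target is None:  # target derived from the packet data was NULL
        return (-1, -1)
    return (target.frame, target.slot)
```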

    ------

    APAR: IY20874 COMPID: 5765D5100 REL: 320
    ABSTRACT: CSS.SNAP SHOULDN'T DUMP SRAM BY DEFAULT

    PROBLEM DESCRIPTION:
    There is the potential for a node checkstop when css.snap
    dumps Col SRAM. css.snap (cust or internal) should be more
    selective in terms of dumping SRAM, to avoid the checkstop or
    to greatly reduce the possibility of a checkstop. Removing the
    SRAM dump altogether is not possible, because the SRAM contains
    RAS data for certain Col ucode problems.

    PROBLEM SUMMARY:
    The collection of SRAM data sometimes caused nodes to
    checkstop.

    PROBLEM CONCLUSION:
    The default behavior of css.snap has changed to not collect
    SRAM data. The data will now be collected only if a specific
    argument (-d) is passed in.

    ------

    APAR: IY20971 COMPID: 5765D6100 REL: 220
    ABSTRACT: LLCTL -G RECONFIG NOT RESETTING DEBUG FLAGS

    PROBLEM DESCRIPTION:
    When the debug flags are altered in the LoadL_config file and
    llctl -g reconfig is then issued, the debug flags are not reset
    if the NEW debug flag keyword is blank. If the NEW debug flag
    keyword has any value, the flags are altered as expected.

    PROBLEM SUMMARY:
    In LoadLeveler, llctl reconfig does not reset the
    negotiator debug flag if it is changed back to a blank line,
    which means the old debug flag messages are still written
    to the negotiator log.

    PROBLEM CONCLUSION:
    In LoadLeveler, llctl reconfig will now reset the
    negotiator debug flag if it is changed back to a blank line,
    which means no debug flag messages are written to the
    negotiator log.

    ------

    APAR: IY21030 COMPID: 5765D6100 REL: 220
    ABSTRACT: LLQ -L -X SHOULD STOP TO SHOW WRONG VALUES FOR Q_SYSPRIO

    PROBLEM DESCRIPTION:
    llq -l -x shows incorrect values for q_sysprio

    PROBLEM SUMMARY:
    LoadLeveler's llq -l -x output is not the same as the llq -l
    output for the q_sysprio value.

    PROBLEM CONCLUSION:
    LoadLeveler's llq -l -x output is obtained
    from the Schedd, while the llq -l output
    is obtained from the central manager.
    The q_sysprio and system priority values are used only by
    the central manager and have no meaning to the Schedd.
    Therefore, the q_sysprio and system priority output for
    llq -l -x is not set.
    This will be documented in the LoadLeveler manual and the
    llq man page.

    ------

    APAR: IY21039 COMPID: 5765D6100 REL: 220
    ABSTRACT: LLSUMMARY: THE LEADING ZERO DOES NOT SHOW ON DECIMAL PLACE.

    PROBLEM DESCRIPTION:
    In llsummary output:
    Time: 0+00:00:01.40000 <- PROBLEM
    Time: 0+00:00:01.040000 <- EXPECTATION
    All values should have 6 decimal places; when they do not, a
    leading 0 is not being shown because of the formatting used.

    PROBLEM SUMMARY:
    The output of llsummary, when the -l option is used, may
    show fractions of seconds for Starter User and System Time
    and Step User and System Time that don't appear to add up
    to the corresponding Starter and Step Total Times that are
    shown in the output.

    PROBLEM CONCLUSION:
    The total seconds and fractions of a second being printed
    come from two separate numbers. The fractional part
    represents microseconds. If the number of microseconds is
    less than 100000 (.1 seconds), the number is shown with a
    missing leading zero. For example, what should be .050000
    is shown as .50000. That makes a Total Time value look
    like it is not the correct sum of the User and System Time.
    The formatting of the microseconds value has been modified
    to make sure that a leading zero is included when it
    should be.
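    The formatting fix can be illustrated in Python. The function
    names are hypothetical and only the seconds.fraction part is
    shown, not the full days+hh:mm:ss layout of llsummary; in C the
    same zero padding is usually obtained with a "%06ld" conversion.

```python
def format_seconds_buggy(seconds: int, micros: int) -> str:
    # Reproduces the bug: the microsecond count is printed without padding.
    return f"{seconds}.{micros}"


def format_seconds(seconds: int, micros: int) -> str:
    # Zero-pad the microsecond count to 6 digits.
    return f"{seconds}.{micros:06d}"


print(format_seconds_buggy(1, 40000))  # 1.40000   (reads as 0.4 s)
print(format_seconds(1, 40000))        # 1.040000  (0.04 s, as expected)
```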

    ------

    APAR: IY21043 COMPID: 5765D5100 REL: 320
    ABSTRACT: RC.SWITCH FALSELY DETECTS ANOTHER INVOCATION OF RC.SWITCH

    PROBLEM DESCRIPTION:
    rc.switch falsely detects another instance of rc.switch is
    running, and then fails. The problem is due to the logic that
    looks for other procs named rc.switch: the logic correctly
    removes the PID of the current proc, but does not remove the
    PIDs of child procs. The child procs (forked by the shell to
    do the awk or grep) have the same name as the parent (i.e.,
    rc.switch) after they are forked, but before they exec.

    LOCAL FIX:
    Run rc.switch manually.

    PROBLEM SUMMARY:
    rc.switch falsely detects its own child process as another
    instance of rc.switch. The script issues a "ps" command
    to see if there are other processes called rc.switch
    running. Some of the time, the ps command may catch a
    child process (for example, grep or awk) while it still
    has the same name as its parent (i.e. "rc.switch"). The
    result is that the fault service daemon does not get
    started.

    PROBLEM CONCLUSION:
    By adding the "ppid" option to the ps command that looks
    for other instances of rc.switch, children of the current
    rc.switch process will be eliminated from consideration.
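    The ppid-based filtering can be sketched in Python. The Proc
    records are assumed to come from a ps listing that includes each
    process's pid and ppid; the sketch does not claim to reproduce the
    exact command rc.switch runs, and the function name is
    hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    name: str


def other_instances(procs: Iterable[Proc], my_pid: int,
                    name: str = "rc.switch") -> List[int]:
    """PIDs of other processes called `name`, excluding self and own children."""
    return [p.pid for p in procs
            if p.name == name          # same script name
            and p.pid != my_pid        # not this process
            and p.ppid != my_pid]      # not a child forked by this process


procs = [
    Proc(100, 1, "rc.switch"),    # this invocation
    Proc(101, 100, "rc.switch"),  # forked child, not yet exec'd into grep/awk
]
print(other_instances(procs, my_pid=100))  # []
```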

    ------

    APAR: IY21146 COMPID: 5765E5100 REL: 600
    ABSTRACT: CRASH IN SNA_V5ROUTER AT V6.0.1.0.

    PROBLEM DESCRIPTION:
    MST STACK TRACE:
     0xf0000000 (excpt=00000000:42000000:00022011:3f2d5000:00000106)
     (intpri=11)
    IAR: . sna_v5router:nbm_free_buffer +54 (046d9b68): twllti r4,0x200
    LR: . sna_v5router:nbm_free_buffer +4c (046d9b60)
    2ef629e8: . sna_v5router:nlm_route_actlu_rsp +28 (048b28e0)
    2ef62a28: . sna_v5router:nlm_send_actlu_rsp_pos +84 (048b1ff8)
    2ef62a68: . sna_v5router:nlm_action_sec_sscp_fsm +4b0 (048b4d88)
    2ef62ac8: . sna_v5router:nlm_sec_sscp_fsm +7bc (048b4814)

    PROBLEM SUMMARY:
    Crash showing nbm_free_buffer called from nlm_route_actlu_rsp.

    PROBLEM CONCLUSION:
    Protect the code so that it does not release a null pointer.

    ------

    APAR: IY21195 COMPID: 5765D5100 REL: 320
    ABSTRACT: PERSPECTIVES ERROR MSGS WITH V2.1 LIBDCE.A IN NON-DCE AUTHEN ENV

    PROBLEM DESCRIPTION:
    If a customer has V2.1 /usr/lib/libdce.a present,
    with PSSP 3.2 (PTF 10) and AIX 4.3.3, using non-DCE
    authentication, the invocation of either
    Perspectives or sphardware will result in the
    following error messages:
    0509-150 Dependent module /usr/lpp/ssp/bin
    could not be loaded.
    exec(): 0509-036 Cannot load program spsec_ldmod
    because of the following errors:
    0509-022 Cannot load module
                        /usr/lpp/ssp/bin/spsec_ldmod.
    0509-150 Dependent module libdcepthreads.a
             (dcepthreads_shr.o) could not be
             loaded.
    These are similar to, but not the same as, the errors
    observed before IY17070 was applied in PTF
    10 of PSSP 3.2. IY17070 corrected errors
    with SDRChangeAttrValues when V2.1 libdce.a
    was present in a non-DCE authentication
    environment. Although V2.1 libdce.a is not
    "supported" with AIX 4.3.3, we have already
    committed to its coexistence with PSSP 3.2
    in a non-DCE environment with APAR IY17070.
    The problem with Perspectives / sphardware
    is independent of the problem that was
    fixed in APAR IY17070.

    LOCAL FIX:
    The workaround is to remove or rename
    /usr/lib/libdce.a; however, there are
    two independent customers who insist
    that V2.1 libdce.a should be allowed
    to coexist with PSSP 3.2 / AIX 4.3.3
    in a non-DCE authentication environment.

    PROBLEM SUMMARY:
    Issuing sphardware on a CWS in a non-DCE environment will
    result in error messages being issued if a level of DCE
    prior to 3.1 exists on the system. The following messages
    will be issued several times:
    exec(): 0509-036 Cannot load program spsec_ldmod because
                   of the following errors:
           0509-022 Cannot load module
                   /usr/lpp/ssp/bin/spsec_ldmod.
           0509-150 Dependent module libdcepthreads.a
                   (dcepthreads_shr.o) could not be loaded.
           0509-022 Cannot load module libdcepthreads.a
                   (dcepthreads_shr.o).
           0509-026 System error: A file or directory in the
                   path name does not exist.
           0509-022 Cannot load module /usr/lpp/ssp/bin.
           0509-150 Dependent module /usr/lpp/ssp/bin could
                   not be loaded.
    The routines were trying to access a module that does
    not exist in /usr/lib/libdce.a in the earlier version
    of DCE.

    PROBLEM CONCLUSION:
    Modified code in Perspectives and Event Management to
    first verify that DCE is being used on the system,
    prior to attempting to load the DCE libraries.
    This will allow sphardware to be run in a non-DCE
    environment when a level of DCE prior to 3.1 exists
    on the system.
    APAR IY21195 only provides a partial solution to this
    problem. For a complete solution APAR IY22203, available in
    rsct.clients.rte 1.2.1.1 or greater, must also be installed.

    ------

    APAR: IY21212 COMPID: 5765D6100 REL: 220
    ABSTRACT: NEGOTIATOR MESSAGE IN LLQ -S NOT UPDATED/BACKFILL SCHEDD.

    PROBLEM DESCRIPTION:
    Using the Backfill Scheduler, if a job is submitted and the
    llq -s output says in the Negotiator Message that the user
    has hit their running-jobs limit at that time, the Negotiator
    message does not get updated when the limit is no longer reached.

    PROBLEM SUMMARY:
    Using the backfill scheduler in LoadLeveler,
    when a user hits the user max job limit,
    a message is set in the NEGOTIATOR MESSAGE seen in
    llq -l. However, this message is not reset even
    if the user is no longer at his max job limit.

    PROBLEM CONCLUSION:
    Using the backfill scheduler in LoadLeveler,
    when a user hits the user max job limit,
    a message is set in the NEGOTIATOR MESSAGE seen in
    llq -l. This message will be reset
    if the user is no longer at his max job limit.

    ------

    APAR: IY21458 COMPID: 5765D5100 REL: 320
    ABSTRACT: PSSPFB_SCRIPT NOT HONORING SOME DCE SETTINGS

    PROBLEM DESCRIPTION:
    In DCE there is an environment variable TRY_PE_SITE which, when
    set to 1, tells all DCE jobs to refer to the /etc/dce/security/
    pe_site file for the preferred Security server. This is done
    if a customer has more than one security server defined but
    some are not reliable and are only used as backups.
    When a node is installed, DCE is configured by psspfb_script
    when it calls spauthconfig. This is when TRY_PE_SITE=1 is
    added to /etc/environment. However, /etc/environment's new
    settings do not take effect at that time, and any commands that
    depend on DCE will not honor the TRY_PE_SITE setting. So
    during a new install, psspfb_script may take a very long time,
    and the configuration of some principals may fail.

    LOCAL FIX:
    Set TRY_PE_SITE variable within psspfb_script

    PROBLEM SUMMARY:
    During a node's installation or customization at PSSP 3.2
    or beyond, in a DCE environment, TRY_PE_SITE=1 is written
    to /etc/environment. However, this variable is not
    utilized until the next time a user logs into the node.
    Additional DCE processing will be executed during the
    node's installation or customization that should make
    use of this setting.

    PROBLEM CONCLUSION:
    psspfb_script will set TRY_PE_SITE=1 on its calls to
    spauthconfig so that DCE calls that may be made in
    spauthconfig will be able to take advantage of this setting.

    ------

    APAR: IY21486 COMPID: 5765D5100 REL: 320
    ABSTRACT: VSD SCRIPT READFENCESDR DOES NOT SET PATH AND ENCOUNTERS PROBLE

    PROBLEM DESCRIPTION:
    The problem specifically deals with a customer installation that
    sets its PATH so that the first entries are those of a GPFS
    file system. This file system may not be available. The issue
    is that the script /usr/lpp/csd/bin/readFenceSDR does not set
    its PATH and inherits the default.

    PROBLEM SUMMARY:
    RVSD recovery and hence GPFS recovery can hang
    if GPFS has been placed in the PATH before
    /usr/bin, /usr/sbin, and /etc.

    PROBLEM CONCLUSION:
    Several of the RVSD scripts will be modified to
    "export PATH=/usr/bin:/usr/sbin:/etc:$PATH" so
    that standard system commands do not get hung
    due to GPFS being hung.

    ------

    APAR: IY21600 COMPID: 5765D5100 REL: 330
    ABSTRACT: SPGETDESC SUPPORT OF 6H1

    PROBLEM DESCRIPTION:
    spgetdesc support of 6H1

    PROBLEM SUMMARY:
    spgetdesc now recognizes and translates the server type. The
    type of server you are on can be obtained by executing
    `uname -M` on the command line.
    Executing spgetdesc will place the appropriate value in the
    "description" attribute of the Node class for a node that is
    a 6H1 Condor SP-attached server.

    PROBLEM CONCLUSION:
    Updated spgetdesc to recognize 6H1 condors.

    ------

    APAR: IY21601 COMPID: 5765D5100 REL: 320
    ABSTRACT: SPGETDESC SUPPORT OF 6H1

    PROBLEM DESCRIPTION:
    spgetdesc support of 6H1

    PROBLEM SUMMARY:
    spgetdesc now recognizes and translates the server type. The
    type of server you are on can be obtained by executing
    `uname -M` on the command line.
    Executing spgetdesc will place the appropriate value in the
    "description" attribute of the Node class for a node that is
    a 6H1 Condor SP-attached server.

    PROBLEM CONCLUSION:
    Updated spgetdesc to recognize 6H1 condors.

    ------

    APAR: IY21612 COMPID: 5765D5100 REL: 320
    ABSTRACT: HA_VSD SCRIPT DOES NOT CHECK THE RETURN CODE OF CFGVSD. IF CFGVS

    PROBLEM DESCRIPTION:
    The ha_vsd script does not check the return code of cfgvsd. If
    cfgvsd is not successful, the rvsd daemon should not be started.

    PROBLEM SUMMARY:
    ha.vsd will continue to bring rvsd up even if cfgvsd detects
    critical configuration files are absent. This could take a
    lot of time needlessly. Also, cfgvsd only checks if the
    files exist. A processing error in readSDR can create the
    file and leave it 0 length. This should be treated as if
    the file was absent.

    PROBLEM CONCLUSION:
    ha_vsd will now exit without trying to bring rvsd up if
    cfgvsd reports a configuration file missing or empty.
    cfgvsd has also been modified to retry generating the
    files 5 times before giving up.
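    The two checks described above, treating a 0-length file as
    missing and regenerating up to 5 times, can be sketched in Python.
    The function names and the regenerate callback (which stands in
    for rerunning readSDR) are hypothetical; this is not the actual
    cfgvsd logic.

```python
import os
from typing import Callable, Iterable


def config_files_ok(paths: Iterable[str]) -> bool:
    """True if every file exists and is non-empty (0 bytes counts as missing)."""
    return all(os.path.isfile(p) and os.path.getsize(p) > 0 for p in paths)


def ensure_config(paths: Iterable[str], regenerate: Callable[[], None],
                  attempts: int = 5) -> bool:
    """Regenerate the files up to `attempts` times; False means give up."""
    paths = list(paths)
    for _ in range(attempts):
        if config_files_ok(paths):
            return True
        regenerate()
    return config_files_ok(paths)
```

    A caller in the role of ha_vsd would exit without starting rvsd
    when ensure_config returns False.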

    ------

    APAR: IY21712 COMPID: 5622DJX00 REL: 211
    ABSTRACT: APAR USED TO CREATE PTF FOR DB2 DATAJOINER V211 FOR AIX

    PROBLEM DESCRIPTION:
    APAR USED TO CREATE PTF FOR DB2 Datajoiner v211 for AIX

    LOCAL FIX:
    APAR USED TO CREATE PTF FOR DB2 Datajoiner v211 for AIX

    PROBLEM SUMMARY:
    APAR used to create PTF for datajoiner for AIX

    PROBLEM CONCLUSION:
    APAR used to create PTF for datajoiner for AIX

    ------

    APAR: IY21737 COMPID: 5765D6100 REL: 220
    ABSTRACT: TOTALVIEW ABORTS WITH MSG "PIPE ERROR"

    PROBLEM DESCRIPTION:
    TotalView aborts with msg "Pipe Error" in 24 out of 25 cases,
    with the stack trace indicating that this happens somewhere in
    LoadLeveler trying to get DCE credentials and write them out.
    Invocation is:
    "totalview poe -a executable -procs p -nodes n -infolevel 6"

    PROBLEM SUMMARY:
    When the authentication method is llgetdce, a SIGPIPE is
    generated, causing debuggers to stop.

    PROBLEM CONCLUSION:
    The addition of sending data to the authentication process
    for lldelegate caused a SIGPIPE to be generated when the
    authentication process is llgetdce. LL is not bothered
    by the signal, but debuggers stop processing on the signal.

    ------

    APAR: IY21740 COMPID: 5765D5100 REL: 320
    ABSTRACT: SPUNMIRRORVG FAILS TO REMOVE HDISK0

    PROBLEM DESCRIPTION:
    When trying to unmirror hdisk0 from rootvg using spunmirrorvg
    command, customer gets error:
    0016-687 Reducing the volume group rootvg by disk(s) hdisk0 had
    a problem with a return code of 2.

    LOCAL FIX:
    If spunmirrorvg has already been run, first rerun spmirrorvg.
    Then perform rmlvcopy, reducevg, bosboot, and bootlist on the
    node to reduce the copy of hdisk0 from rootvg.

    PROBLEM SUMMARY:
    spunmirrorvg was unable to both reduce the number of copies
    of a volume group and reduce the volume group by a
    physical volume that was not listed in the pv_list attribute
    of the Volume_Group object in the SDR.
    spunmirrorvg first calls unmirrorvg to reduce the number of
    copies of the volume group. The list of drives that are no
    longer to contain mirrors is not passed as input
    parameters. As a result, by default, unmirrorvg picks
    the set of mirrors to remove from the mirrored volume
    group. It is possible that unmirrorvg may remove the
    mirror from the physical volume that is listed in the
    pv_list attribute. Then when it tries to remove the
    physical volume that is not listed in the pv_list
    attribute via reducevg it fails.

    PROBLEM CONCLUSION:
    Modified spunmirrorvg to determine, earlier in its
    processing, whether any physical volumes in the volume
    group are not listed in the pv_list attribute of the
    Volume_Group object in the SDR. Physical volumes that
    will no longer contain mirrors are passed as input to
    unmirrorvg, so that unmirrorvg no longer picks which
    set of mirrors to remove from the volume group.

    ------

    APAR: IY21754 COMPID: 5765D5100 REL: 320
    ABSTRACT: ADAPTER DIAGS WILL FAIL WHEN SWITCH NOT POWERED ON

    PROBLEM DESCRIPTION:
    Adapter diags will fail when switch not powered on

    PROBLEM SUMMARY:
    The problem occurred when the switches and nodes were
    powered off and then the nodes were powered on. When a node
    boots, the configuration method calls adapter diags.
    Because the attached switch was NOT powered on, an error bit
    was turned on in the Interrupt Status Register (ISR). This
    bit does not indicate a problem with the adapter; it simply
    reports that the switch is not powered on. Adapter diags
    has been changed to ignore this bit.

    PROBLEM CONCLUSION:
    The adapter diags have been changed to ignore a bit set
    in the Interrupt Status Register. When the bit is ignored
    the diags will run properly.

    ------

    APAR: IY21755 COMPID: 5765D5100 REL: 320
    ABSTRACT: 128WAY COLONY DS: CABLE_TEST DOES NOT COMPLETE, HARDMON ERROR

    PROBLEM DESCRIPTION:
    128way COLONY DS: cable_test does not complete, hardmon error

    PROBLEM SUMMARY:
    The cable_test tool encountered problems with s1term when
    it repeatedly opened and closed sessions. The hardmon
    function that managed the s1term session with the switch
    supervisor card was getting out of sync with the numerous
    opens and closes. Now cable_test keeps the s1term session
    open until the test is done with a given switch board.

    PROBLEM CONCLUSION:
    The cable_test tool has been modified to hold s1term
    sessions open on ISB switches until testing is complete. The
    previous version of cable_test repeatedly opened and closed
    s1term connections.

    ------

    APAR: IY21756 COMPID: 5765D5100 REL: 320
    ABSTRACT: COLONY:CABLE_WRAP TESTS LEAVE SWITCH CHIPS UNINITIALIZED

    PROBLEM DESCRIPTION:
    Colony:Cable_wrap Tests leave Switch Chips Uninitialized

    PROBLEM SUMMARY:
    When using the cable_test tool on a large network, errors
    can be injected when switch chips are placed into
    differential line test mode. The change cleans up after
    the test has completed.

    PROBLEM CONCLUSION:
    The cable_test tool has been modified to power off and on
    the switches, thus clearing any switch errors.

    ------

    APAR: IY21758 COMPID: 5765D5100 REL: 320
    ABSTRACT: FREE LOCK STRUCTURES IN DEVICE UNCONFIG

    PROBLEM DESCRIPTION:
    free lock structures in device unconfig

    PROBLEM SUMMARY:
    Internal driver resources were not being released at
    unconfig time. If this happened repeatedly, it would
    contribute to resource exhaustion. (Unconfig almost never
    runs in the field, so it should not be a problem.)

    PROBLEM CONCLUSION:
    Resources are now released correctly at unconfig time.

    ------

    APAR: IY21760 COMPID: 5765D5100 REL: 320
    ABSTRACT: BCASTPING DOES NOT WORK COMPLETELY

    PROBLEM DESCRIPTION:
    bcastping does not work completely

    PROBLEM SUMMARY:
    With the network option "bcastping=0" (set with the no
    command), pinging the switch IP subnet broadcast address
    still got an "echo reply" from every node.

    PROBLEM CONCLUSION:
    In the receive path of the switch IP interface driver, the
    M_BCAST flag is now turned on in the IP datagram before it
    is delivered to the IP layer, so the ICMP layer discards the
    broadcast "echo request" correctly.
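
    The division of labor in this fix can be modeled with two
    small functions: the driver tags packets sent to the broadcast
    address, and the ICMP layer consults that tag together with
    the bcastping setting. This is a hypothetical sketch, not AIX
    kernel code; the M_BCAST value is an assumption (real mbuf flag
    values differ between kernels), and driver_rx_flags and
    should_answer_echo are invented names.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Assumed flag value for illustration only. */
    #define M_BCAST 0x0100

    /* Driver receive path: tag a packet sent to the broadcast address. */
    int driver_rx_flags(int m_flags, bool to_broadcast_addr)
    {
        return to_broadcast_addr ? (m_flags | M_BCAST) : m_flags;
    }

    /* ICMP layer: answer an echo request unless it is tagged as broadcast
       and broadcast ping replies are disabled (bcastping=0). */
    bool should_answer_echo(int m_flags, int bcastping)
    {
        assert(bcastping == 0 || bcastping == 1);
        if ((m_flags & M_BCAST) && bcastping == 0)
            return false;
        return true;
    }
    ```

    Before the fix the driver never set the flag, so in this model
    should_answer_echo would see an untagged packet and answer it
    regardless of bcastping.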

    ------

    APAR: IY21761 COMPID: 5765D5100 REL: 320
    ABSTRACT: SYSTEM HUNG OR CRASHED DURING RECOVERY

    PROBLEM DESCRIPTION:
    system hung or crashed during recovery

    PROBLEM SUMMARY:
    system deadlock or crash during adapter recovery.

    PROBLEM CONCLUSION:
    Changed the boolean flag "xmtReady" to volatile. This
    prevents the compiler from optimizing away reads of the
    flag, so the code path sees its current value and produces
    the correct result.
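
    Why volatile matters can be shown with a polling loop. In this
    hypothetical sketch (xmt_ready, xmt_interrupt, and
    wait_for_xmt_ready are illustrative names, not the driver's),
    the flag is set outside the loop, as an interrupt handler would
    set it; volatile obliges the compiler to reload it on every
    iteration instead of reusing a cached copy.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Set by the (simulated) interrupt handler, polled by recovery code. */
    static volatile bool xmt_ready = false;

    void xmt_interrupt(void)
    {
        xmt_ready = true;
    }

    /* Poll up to max_polls times; returns true once the flag is seen set.
       Each iteration performs a fresh load because xmt_ready is volatile. */
    bool wait_for_xmt_ready(long max_polls)
    {
        assert(max_polls >= 0);
        for (long i = 0; i < max_polls; i++) {
            if (xmt_ready)
                return true;
        }
        return false;
    }
    ```

    volatile only controls compiler caching of the variable; it does
    not make updates atomic or order them across processors, so a
    multiprocessor driver still needs its locks or memory barriers.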

    ------

    APAR: IY21765 COMPID: 5765D5100 REL: 320
    ABSTRACT: WORM TIMEOUT VALUES NEED ADJUSTING

    PROBLEM DESCRIPTION:
    Other nodes can fall off the switch while trying to Efence or
    Eunfence a different node.

    PROBLEM SUMMARY:
    Heavily loaded nodes fall off the switch during Estart,
    Efence or Eunfence. The cause is that the primary does not
    wait long enough for acknowledgements from nodes.

    PROBLEM CONCLUSION:
    The solution is to increase the amount of time that the primary
    waits for acknowledgements from nodes.

    ------

    APAR: IY21853 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSERT: ...||OLDDISKADDRFOUND.COMPADDR(*OLDDISKADDRP)

    PROBLEM DESCRIPTION:
    ASSERT: ...||OLDDISKADDRFOUND.COMPADDR(*OLDDISKADDRP)

    PROBLEM SUMMARY:
    GPFS self check logic declared an error at
    metadata.C, line 7425

    PROBLEM CONCLUSION:
    mnGetSubIndirectBlock does not need to fix
    lastBlockSubblocks when the filesize changes on another
    node now that RF and WF tokens both get the correct
    lastBlockSubblocks from the metanode before allowing
    read/write of the last block.

    ------

    APAR: IY21854 COMPID: 5765B7300 REL: 510
    ABSTRACT: MESSAGES STUCK ON THE XMITQ IN A CLUSTERING ENVIRONMENT

    PROBLEM DESCRIPTION:
    New messages arriving on a transmission queue do not start
    the cluster channel if it is inactive, so messages are stuck
    on the XMITQ.

    LOCAL FIX:
    Set the channels' DISCINT to 0 to keep the channels running
    permanently as a temporary workaround.

    PROBLEM CONCLUSION:
    This problem has been fixed and the fix will be shipped in the
    following PTFs:
        A) MQSeries for V5.1 CSD08
              OS/2 U200141
              Windows NT U200142
              AIX U474841
              HP-UX (V10) U474877
              HP-UX (V11) U474879
              Sun Solaris U474878
        B) MQSeries for V5.2 CSD02
              Windows NT U200140
              AIX U474779
              HP-UX (V10) U474785
              HP-UX (V11) U474837
              Sun Solaris U474789
              Linux U474836
        C) MQSeries for V5.2.1 CSD01
              Windows NT/2000 U200151

    ------

    APAR: IY21869 COMPID: 5765D5100 REL: 320
    ABSTRACT: RUN TIME OF SDR_CONFIG DOES NOT SCALE ON COLONY SYSTEM

    PROBLEM DESCRIPTION:
    A small section of new code in SDR_config for PSSP3.2 that deals
    with the switch node number for the colony switch does not scale
    well. Internal tests on a simulated 512-node system (with
    nothing else going on) result in run times of about 1hr 22min.
    Customers with that size system have reported run times of up to
    two hours. The section of code that is eating up all of this
    time is Get_SPS2_snn calling Find_SDR, which is done once for
    each node, and the time it takes to return increases as the
    number of nodes in the system increases.

    PROBLEM SUMMARY:
    When SDR_config is invoked on a large system, the performance
    is unacceptable: as the number of nodes in the system
    increases, the runtime of SDR_config grows much faster than
    the node count. The cause of this problem is in the function
    Get_SPS2_snn, which is called for each node and returns the
    next available switch node number. In each call, the
    subroutine goes through the entire list of nodes to determine
    the next available switch node number, which in most cases
    is not necessary. Because this full scan is repeated once per
    node, the total work grows roughly with the square of the
    number of nodes.

    PROBLEM CONCLUSION:
    The Get_SPS2_snn function in SDR_config has been modified
    to save the next available switch node number from
    previous calls, unless the type of switch in the system
    has changed. This dramatically improves the run time
    of SDR_config on large systems.
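
    The caching idea can be sketched as follows. This is a
    hypothetical C model, not the SDR_config code: the used array,
    MAX_SNN, and next_switch_node_number are invented, and it
    assumes numbers are only claimed, never released, during a run.
    Every number below the cached position has already been found
    taken, so a call can resume its scan there; the cache is ignored
    when the switch type changes, as the APAR describes.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define MAX_SNN 1024               /* hypothetical size of the number space */

    static bool used[MAX_SNN];         /* used[n] is true once number n is taken */
    static int cached_next = -1;       /* first slot not yet examined; -1 = none */
    static int cached_switch_type = -1;

    /* Return the next free switch node number and mark it taken, or -1 if
       none is left. With an unchanged switch type the scan resumes at
       cached_next instead of restarting from 0. */
    int next_switch_node_number(int switch_type)
    {
        assert(switch_type >= 0);
        int start = 0;
        if (cached_next >= 0 && cached_switch_type == switch_type)
            start = cached_next;
        for (int snn = start; snn < MAX_SNN; snn++) {
            if (!used[snn]) {
                used[snn] = true;
                cached_next = snn + 1;
                cached_switch_type = switch_type;
                return snn;
            }
        }
        return -1;
    }
    ```

    As long as the switch type does not change, the calls for all
    nodes together examine each slot roughly once, instead of once
    per node.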

    ------

    APAR: IY21912 COMPID: 5765D5100 REL: 320
    ABSTRACT: COLONY: ADAPTRECV MIC STATUS 6XX INTERRUPT BAD ON REC. SWITCH

    PROBLEM DESCRIPTION:
    colony: adaptrecv mic status 6xx interrupt bad on rec. switch

    PROBLEM SUMMARY:
    Unnecessary interrupts were generated because an error bit
    on the adapter was incorrectly enabled.

    PROBLEM CONCLUSION:
    That additional bit was removed from the MIC enables.

    ------

    APAR: IY21913 COMPID: 5765D5100 REL: 320
    ABSTRACT: SDRGETOBJECTS CALL IN CSS.SNAP NEEDS FULL PATH

    PROBLEM DESCRIPTION:
    sdrgetobjects call in css.snap needs a full path

    PROBLEM SUMMARY:
    This fix is for css.snap, which fails in certain cases when
    it is called from the fault service daemon, as opposed to
    being invoked from the command line with -a. The error
    messages were seen in daemon.stderr. The failure was caused
    by an incorrectly used path and is now fixed.

    PROBLEM CONCLUSION:
    This defect is fixed by adding the correct full path to the
    logical if-else code path so that css.snap no longer fails
    in this situation.

    ------

    APAR: IY21915 COMPID: 5765D6100 REL: 220
    ABSTRACT: USE OF LL PREFERENCES CAUSES NEGOTIATOR TO HANG

    PROBLEM DESCRIPTION:
    The use of preferences will sometimes result in a machine
    being locked twice. This hangs the negotiator.

    LOCAL FIX:
    Remove any use of preferences in LoadLeveler command files.

    PROBLEM SUMMARY:
    When preference is used, the negotiator will hang due to
    locking twice on the same machine.

    PROBLEM CONCLUSION:
    When preference is used, the negotiator will not hang
    and jobs will run.

    ------

    APAR: IY21946 COMPID: 5765D5100 REL: 320
    ABSTRACT: MOVE ASSIGNMENT OF USER_DMA_AVAIL OUT OF FIRST OPEN CODE

    PROBLEM DESCRIPTION:
    move assignment of user_dma_avail out of first open code

    PROBLEM SUMMARY:
    Modifications made to win_poolsize with chgcss are being
    lost: the change does not correctly propagate to any active
    user-space windows.

    PROBLEM CONCLUSION:
    Changes to window size attributes via chgcss will no longer
    be permitted if any user-space windows are active.

    ------

    APAR: IY21949 COMPID: 5765B9501 REL: 330
    ABSTRACT: NFS EXPORTED GPFS FILESYSTEM:LD: SEVERE ERROR:UNEXPECTED I/O

    PROBLEM DESCRIPTION:
    nfs exported gpfs filesystem:ld:severe error:unexpected I/O

    PROBLEM SUMMARY:
    Running compiles with the source in a GPFS file system
    exported via NFS produced an error in ftruncate system call.

    PROBLEM CONCLUSION:
    NFS does not necessarily pass in the FWRITE flag when it
    calls VNOP_FTRUNC, so FWRITE is now ORed into the flags
    passed to open/trunc/close, and ftruncInternal no longer
    returns E_BADF.

    ------

    APAR: IY21956 COMPID: 5765E5100 REL: 600
    ABSTRACT: CRASH IN CS/AIX 600 IN SNA_V5ROUTER

    PROBLEM DESCRIPTION:
    MST STACK TRACE:
     0xf00002e0 (excpt=00368000:42000000:00000000:00368000:00000106)
     (intpri=11)
    IAR: .[sna_v5router:nbm_free_buffer]+54 (0500ab68): twllti
     LR: .[sna_v5router:nbm_free_buffer]+4c (0500ab60)
    2efa29c8: .[sna_v5router:nlm_action_sec_lu_fsm]+338 (051e0330)
    2efa2a28: .[sna_v5router:nlm_sec_lu_fsm]+454 (051e12c0)
    2efa2a98: .[sna_v5router:nlm_sec_lu_signal]+370 (051e1668)

    PROBLEM SUMMARY:
    Problem occurs after hcon LU13 application tries to recover
    after an error. The sequence is that the application issues
    deallocate(SSCP) followed by allocate(SSCP) while bound and
    not BETB and then receives UNBIND, DACTLU, ACTLU and then
    logs on again. hcon then issues deallocate(SSCP) again when
    BOUND and CS/AIX gets confused trying to send an UNBIND RSP
    internally. I have added code to prevent this.

    PROBLEM CONCLUSION:
    Prevented the code from trying to send an UNBIND RSP following
    a session restart.

    ------

    APAR: IY21983 COMPID: 5765D5100 REL: 320
    ABSTRACT: SYNTAX ERRORS IN POST_PROCESS PREVENTING HARDMON FROM BEING

    PROBLEM DESCRIPTION:
    Up to ssp.basic 3.2.0.11 there are some syntax errors in
    /usr/lpp/ssp/install/bin/post_process,
    namely line 344:
    while $finis < 3 && $rc > 79 && $rc < 87
    should be changed to:
    while $finis -lt 3 && $rc -gt 79 && $rc -lt 87
    and line 400:
    while $rc != 0 && $subsys_checked < 20
    should be changed to:
    while $rc -ne 0 && $subsys_checked -lt 20
    and line 408:
    if $subsys_checked = 20
    to
    if $subsys_checked -eq 20
    as well as, probably, some:
    if $? != 0
    to
    if $? -ne 0
    and some:
    if $rc = 0
    to
    if $rc -eq 0
    depending on whether you interpret a return code as a
    numerical or an ASCII value ;-)
    The case in line 400 prevents hardmon from being stopped
    correctly before its subsystem is modified, which could
    cause hardmon not to start.

    LOCAL FIX:
    change the mentioned lines accordingly.

    PROBLEM SUMMARY:
    Some of the integer comparisons are performed incorrectly
    in post_process. The checks for integers being equal work
    correctly, but the checks for < or > compare the ASCII
    values instead of the integers. As a result, hardmon may
    not be started.

    PROBLEM CONCLUSION:
    Modified post_process so that comparisons of integers are
    done correctly. The code now uses -lt instead of < and
    -gt instead of >.
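
    The difference between the two kinds of comparison is easy to
    see in a small C model: a string comparison looks at character
    codes from left to right, while -lt and -gt compare numeric
    values. With $subsys_checked at 3, for example, "3" < "20" is
    false as a string comparison (the character 3 sorts after 2),
    so, assuming the counter starts at 0 or 1, a loop meant to wait
    up to 20 passes stops after only a few. The functions
    string_less and numeric_less are illustrative names, not part
    of post_process.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* String "<": compares character codes from left to right. */
    int string_less(const char *a, const char *b)
    {
        assert(a != NULL && b != NULL);
        return strcmp(a, b) < 0;
    }

    /* Numeric "-lt": converts both operands to integers first. */
    int numeric_less(const char *a, const char *b)
    {
        assert(a != NULL && b != NULL);
        return strtol(a, NULL, 10) < strtol(b, NULL, 10);
    }
    ```

    Equality tests agree in both modes for plain decimal numbers,
    which is why only the < and > checks misbehave.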

    ------

    APAR: IY22004 COMPID: 5765D9300 REL: 310
    ABSTRACT: MPI_CART_MAP CORE-DUMPS

    PROBLEM DESCRIPTION:
    MPI_Cart_map core-dumps

    PROBLEM SUMMARY:
    When a one-by-one Cartesian structure with periods set to
    true was passed to MPI_Cart_map, MPI failed to map the
    structure correctly and could cause a segmentation fault.
    The code treated the only task in the structure as both its
    own successor and its own predecessor, and therefore as two
    neighbors, when it should be only one.

    PROBLEM CONCLUSION:
    The code was changed so that a task can have itself as its
    only neighbor in a periodic dimension, and the structure is
    mapped correctly.
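
    The neighbor rule can be sketched for a single dimension: with
    wraparound, the predecessor and successor of a coordinate are
    computed modulo the dimension size, and when both land on the
    same task (the task itself when the size is 1) that task should
    be counted once. This is an illustrative helper with a
    hypothetical name, not code from the MPI library.

    ```c
    #include <assert.h>

    /* Count the distinct neighbors of position coord along one Cartesian
       dimension of dim tasks; periodic != 0 means the dimension wraps. */
    int distinct_neighbors(int dim, int coord, int periodic)
    {
        assert(dim > 0 && coord >= 0 && coord < dim);
        int prev = coord - 1;
        int next = coord + 1;
        if (periodic) {
            prev = (prev + dim) % dim;   /* wrap below 0 to the last position */
            next = next % dim;           /* wrap past the end to position 0 */
        }
        int has_prev = prev >= 0 && prev < dim;
        int has_next = next >= 0 && next < dim;
        int count = has_prev + has_next;
        if (has_prev && has_next && prev == next)
            count = 1;                   /* the same task is reached both ways */
        return count;
    }
    ```

    A periodic dimension of size 2 collapses the same way: both
    directions reach the single other task.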

    ------

    APAR: IY22005 COMPID: 5765D6100 REL: 220
    ABSTRACT: GETGRGID_R CALL FAILING UNDER NIS

    PROBLEM DESCRIPTION:
    llq -l | grep Unix does not show a group for some users when LL
    is running with NIS.

    PROBLEM SUMMARY:
    At unknown levels of AIX libc.a, getgrgid_r can return
    ENOENT instead of ERANGE when the buffer passed in is
    too small.

    PROBLEM CONCLUSION:
    LL is designed to start with a given size buffer and
    continually retry calling getgrgid_r with larger buffers
    until ERANGE is no longer returned. In recent discussions
    with AIX development, I was informed that we could not
    count on receiving ERANGE when the buffer passed in is too
    small (even though the man page says we can), and that
    getgrgid_r should always be called with a buffer that is
    large enough to hold GRPLINLEN (a grp.h define value)
    characters. For LL accounting to be correct, we must provide
    this workaround until getgrgid_r is fixed.
    We will modify the code so that a buffer of GRPLINLEN+1
    bytes is always used on the first call to getgrgid_r. This
    will eliminate the possibility that the buffer will be too
    small. The retry code will be left in place in case it
    is ever needed again.
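
    The lookup strategy described above can be sketched with the
    POSIX getgrgid_r interface. GRPLINLEN is an AIX grp.h constant
    that other systems may not provide, so this sketch assumes the
    POSIX sysconf(_SC_GETGR_R_SIZE_MAX) hint as the initial size,
    with 1024 bytes as an assumed fallback; lookup_group is a
    hypothetical name, not LoadLeveler's code.

    ```c
    #define _POSIX_C_SOURCE 200809L
    #include <assert.h>
    #include <errno.h>
    #include <grp.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Look up a group by gid. The first call uses a buffer sized from the
       sysconf hint; the ERANGE retry loop is kept as a fallback. On success
       the caller must free *bufp, which backs the strings in *grp.
       Returns 0 or an errno value. */
    int lookup_group(gid_t gid, struct group *grp, char **bufp)
    {
        assert(grp != NULL && bufp != NULL);
        long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
        size_t size = (hint > 0) ? (size_t)hint : 1024;
        for (;;) {
            struct group *result = NULL;
            char *buf = malloc(size);
            if (buf == NULL)
                return ENOMEM;
            int rc = getgrgid_r(gid, grp, buf, size, &result);
            if (rc == ERANGE) {              /* still too small: double, retry */
                free(buf);
                size *= 2;
                continue;
            }
            if (rc != 0 || result == NULL) {
                free(buf);
                return rc != 0 ? rc : ENOENT; /* lookup error, or no such group */
            }
            *bufp = buf;
            return 0;
        }
    }
    ```

    On success getgrgid_r returns 0 with a NULL result when no
    matching group exists, so the sketch checks result as well as
    the return code.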

    ------

    APAR: IY22076 COMPID: 5765E5100 REL: 600
    ABSTRACT: NODE HANG ON ANYNET STOPPING

    PROBLEM DESCRIPTION:
    With Anynet started, stopping the node results in an
    unresponsive SNA subsystem, apparently hanging on
    ANYNET_STOPPING. With no Anynet configured, or Anynet not
    running, the system responds as expected.

    PROBLEM SUMMARY:
    Problem as described above. The code was fixed to correct
    the logic.

    PROBLEM CONCLUSION:
    If you use Anynet and cannot stop the node without a hang,
    the updated module is needed.

    TEMPORARY FIX:
    Do not use Anynet, or reboot to free up the system.

    ------

    APAR: IY22078 COMPID: 5765D5100 REL: 320
    ABSTRACT: RVSD_VERSION NOT UPDATED FOR 3.2.0.4

    PROBLEM DESCRIPTION:
    At rvsd 3.2.0.4 (vsd.rvsd.rvsdd) there was a functional change
    to how RVSD deals with fenced VSDs. The RVSD_Fence Class is
    removed and a different mechanism is used. In order for this
    to be triggered, RVSD_version must be updated. Up to this
    point we were only concerned with release levels, and
    RVSD_version was only updated for new releases. But here is
    a case where we care about the PTF level. A packaging change
    will need to be made to vsd.rvsd.rvsdd to update the
    RVSD_version level for 3.2.

    LOCAL FIX:
    Manually update the RVSD_version fields (Node class) in the
    SDR to 3020004, but first verify that vsd.rvsd.rvsdd is at
    3.2.0.4 or later.

    PROBLEM SUMMARY:
    RVSD_version was not being updated at
    vsd.rvsd.rvsdd 3.2.0.4 (PTF Set 5) as required.

    PROBLEM CONCLUSION:
    RVSD_version will now be updated to reflect 3.2.0.4
    or later.

    ------

    APAR: IY22138 COMPID: 5765D5100 REL: 320
    ABSTRACT: CSS_COLONY:ADAPTRECV-DECREASE CA INTERVALS-AVOID THRESHOLDING

    PROBLEM DESCRIPTION:
    css_colony:adaptrecv-decrease ca intervals-avoid thresholding

    PROBLEM SUMMARY:
    SP Switch2 adapter recovery was taking the node off the
    switch if two critical errors were seen within a 24-hour
    interval. This resulted in the node falling off the
    switch too often.

    PROBLEM CONCLUSION:
    The CA threshold interval was decreased to four hours.
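
    The effect of shortening the interval can be modeled as a
    time-window check: a second critical error that arrives within
    the interval of the previous one trips the threshold. This is a
    hypothetical sketch (take_node_offline and its arguments are
    invented, and the APAR does not state the exact comparison that
    adapter recovery uses). With two errors six hours apart, the
    24-hour window takes the node offline, while the four-hour
    window does not.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <time.h>

    /* Take the node off the switch when a critical error follows the
       previous one by less than interval seconds. */
    bool take_node_offline(time_t prev_error, time_t now, time_t interval)
    {
        assert(now >= prev_error && interval > 0);
        return (now - prev_error) < interval;
    }
    ```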

    ------

    APAR: IY22139 COMPID: 5765D5100 REL: 320
    ABSTRACT: POOR PERFORMANCE OF COLONY DOUBLE/SINGLE

    PROBLEM DESCRIPTION:
    poor performance of colony double/single

    PROBLEM SUMMARY:
    The chgcss command does not dynamically change the
    poolsize for the css1 interface. The new value
    takes effect only after the node is rebooted.

    PROBLEM CONCLUSION:
    The chgcss command was changed to initialize the
    device number field in the poolsize data structure.

    ------

    APAR: IY22167 COMPID: 5765B9500 REL: 130
    ABSTRACT: MMMKVSD: GET ADAPTER FAILED FOR IPA: <IP ADDR>

    PROBLEM DESCRIPTION:
    The command mmmkvsd uses the commands
    /usr/lpp/mmfs/bin/mmcommon convin and
    /usr/lpp/mmfs/bin/mmcommon convnr
    (in section: # Figure out our partition.)
    However, mmcommon no longer has the functions
    convin and convnr implemented. Using mmmkvsd
    will fail with the message
    mmmkvsd: Get adapter failed for IPA: <IP Addr>

    PROBLEM SUMMARY:
    During migration from GPFS 1.2 to GPFS 1.3, the mmmkvsd and
    mmcrfs commands did not operate if issued from the control
    workstation (CWS).

    PROBLEM CONCLUSION:
    Fixed the mmmkvsd command in GPFS 1.3 to operate with
    GPFS 1.2 nodes.

    ------

    APAR: IY22190 COMPID: 5765D5100 REL: 320
    ABSTRACT: IY18700 BREAKS /USR/LPP/SSP/INSTALL/BIN/INSTALL_LIB.PL CODE.

    PROBLEM DESCRIPTION:
    setup_server results in the following errors:
        mknimint: 0016-202 The IP address tuple is not numeric.
        mknimint: 0016-201 There is an incorrect number of tuples in
              address.
        setup_server: 0016-279 Problem of internally called command:
              /usr/lpp/ssp/bin/mknimint; rc= 2.
    This only occurs when APAR IY18700 is applied on the system.
    The problem occurs when one of an IP address's tuples
    consists of a single zero (e.g., 192.0.12.12).

    LOCAL FIX:
    APAR IY18700 needs to be removed from the system. Back-level
    versions of the /usr/lpp/ssp/install/bin/install_lib.pl code
    have been tested successfully.

    PROBLEM SUMMARY:
    ***********************************************************
    * USERS AFFECTED: *
    * *
    * Users with IY18700 (available in ssp.basic 3.2.11) *
    * installed on their Control Workstation or B/I Server, *
    * which also have an Ethernet adapter with an IP *
    * address with at least one octet beginning with 0. *
    * *
    ***********************************************************
    * PROBLEM DESCRIPTION: *
    * *
    * mknimint will fail with the following message: *
    * 0016-202 The IP address tuple is not numeric. *
    * *
    ***********************************************************
    * RECOMMENDATION: *
    * *
    * Replace the current version of *
    * /usr/lpp/ssp/install/bin/install_lib.pl with a *
    * pre ssp.basic 3.2.11 version of the file, or install *
    * IY22190 when available. *
    * *
    ***********************************************************

    ------

    APAR: IY22192 COMPID: 5765D5100 REL: 330
    ABSTRACT: INTERNAL ESTART FOLLOWING SWITCH ERROR RECOVERY FAILED

    PROBLEM DESCRIPTION:
    Internal Estart following switch error recovery failed

    PROBLEM SUMMARY:
    Switch recovery had a flaw where a switch chip port would
    not be disabled if a second switch chip reported an error
    which required it to be disabled.

    PROBLEM CONCLUSION:
    Switch recovery was modified to properly disable both ports
    when handling the second error.

    ------

    APAR: IY22246 COMPID: 5765B9501 REL: 330
    ABSTRACT: MTIME CHANGES NOT PROPAGATED ON DIRS FOR DFS

    PROBLEM DESCRIPTION:
    mtime changes not propagated on dirs for DFS

    PROBLEM SUMMARY:
    Bug in DFS export handling of mtime found in code review.

    PROBLEM CONCLUSION:
    Make sure all directory updates get a write lock on the
    directory inode (wo or stronger). This is necessary for the
    exact mtime option to work correctly on directories.

    ------

    APAR: IY22249 COMPID: 5765B9501 REL: 330
    ABSTRACT: VSX ASSERT FAILED: SRCDIROP.LOCKMODE == LKOBJ::XW FILE DIRECT.C

    PROBLEM DESCRIPTION:
    VSX assert failed: srcDirOp.lockmode == LkObj::xw file direct.C

    PROBLEM SUMMARY:
    GPFS self check logic failed when renaming a directory to a
    directory name that already exists.

    PROBLEM CONCLUSION:
    When renaming a directory and another directory already
    exists with the same name, get an XW lock on the parent
    directory to change the link count to account for removal
    of one child directory.

    ------

    APAR: IY22251 COMPID: 5765B9501 REL: 330
    ABSTRACT: VSX ORDINARY USER VSX0 PANIC'S NODE IN GPFS: PANIC: VNODEOPS.C:

    PROBLEM DESCRIPTION:
    VSX ordinary user vsx0 panic's node in gpfs: panic: vnodeops.C:

    PROBLEM SUMMARY:
    Kernel panic in a soft mount environment using mmap.

    PROBLEM CONCLUSION:
    Mmap cannot assume that the first vnode pointer in the gnode
    is the one used for mapping. Changed the page fault code not
    to use the vnode at all; instead, the VFS data pointer is
    saved in gpfsNode_t at mapping time. Also, since the paging
    code can no longer put a hold on the vnode, fixed the
    synchronization between the pager and mmap termination to
    ensure that termination does not finish until all
    outstanding paging requests are complete.

    ------

    APAR: IY22253 COMPID: 5765B9501 REL: 330
    ABSTRACT: MISC FSCK PROBLEMS

    PROBLEM DESCRIPTION:
    misc fsck problems

    PROBLEM SUMMARY:
    Serviceability improvements for mmfsck.

    ------

    APAR: IY22254 COMPID: 5765B9501 REL: 330
    ABSTRACT: 384WAY, GPFS1.4: ASSERT METADATA.C, LINE 4878 DURING MMFSCK

    PROBLEM DESCRIPTION:
    384way, GPFS1.4: assert metadata.C, line 4878 during mmfsck

    PROBLEM SUMMARY:
    GPFS self check logic failed when running mmfsck in a
    two-node HACMP cluster with single-node quorum enabled.

    PROBLEM CONCLUSION:
    The two-node cluster case caused doRecovery to be called
    even when it was not needed, which cannot be done while
    fsck is running. Fsck is set up to avoid this by having the
    clients mount read-only and not use a log.

    ------

    APAR: IY22257 COMPID: 5765B9501 REL: 330
    ABSTRACT: SLOW MMFSCK

    PROBLEM DESCRIPTION:
    slow mmfsck

    PROBLEM SUMMARY:
    mmfsck performance change

    ------

    APAR: IY22263 COMPID: 5765B9501 REL: 330
    ABSTRACT: BAD EXAMPLE AND MISSING INFO IN MMLSQUOTA DOCUMENTATION

    PROBLEM DESCRIPTION:
    The command documentation for mmlsquota shows an example where
    usr + in_doubt exceeds the quota. This is not allowed by the
    code. Also, the quota documentation should explicitly state that
    user + in_doubt is not allowed to exceed the hard quota.

    PROBLEM SUMMARY:
    The mmlsquota command description did not explicitly state
    that the sum of usr and in-doubt could not exceed the hard
    limit. The example also showed a case where this had
    occurred and therefore also needed to be updated.

    PROBLEM CONCLUSION:
    The description was updated to include:
    For each file system in the nodeset, the mmlsquota command
    displays:
       Block limits:
            quota type (USR or GROUP)
            current usage in KB
            soft limit in KB
            hard limit in KB
            space in-doubt
            grace period
       File limits:
            current number of files
            soft limit
            hard limit
            files in-doubt
            grace period
    As the sum of the in-doubt value and the current usage may
    not exceed the hard limit, the actual block space and
    number of files available to the user or group may be
    constrained by the in-doubt value. Should the in-doubt
    value approach a significant percentage of the quota, run
    the mmcheckquota command to account for the loss of space
    and files.
    The example was updated as:
    User paul enters:
    mmlsquota
    The system displays information similar to:
                          Block Limits                    |            File Limits
    Filesystem  type   KB   quota   limit  in_doubt  grace | files  quota  limit  in_doubt  grace
    gpfsn       USR   728  100096  200192      4880   none |    35     30     40        10  6days
    This output shows the quotas for user paul in file system
    gpfsn set to a soft limit of 100096K and a hard limit of
    200192K. 728K is currently allocated to him. 4880K is also
    in-doubt, meaning that the quota system has not yet been
    updated as to whether this space has been used by the nodes
    or whether it is still available. No grace period appears
    because the user has not exceeded his quotas. If he had
    exceeded the soft limit, the grace period would be set and
    the user would have that amount of time to bring his usage
    below the quota value. If he failed to do so, he would not
    be allocated any more space.
    The soft limit for files (inodes) is set at 30 and the hard
    limit is 40. 35 files are currently allocated to this user,
    and the quota system does not yet know whether the 10
    in-doubt files have been used or are still available. A
    grace period of six days appears because the user has
    exceeded his quotas. The user has that amount of time to
    bring his usage below the quota value. If he fails to do
    so, he will not be allocated any more space.
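
    The rule stated above, that usage plus in-doubt may not exceed
    the hard limit, reduces to simple arithmetic. For user paul's
    block quota in the example, the space still usable is
    200192 - 728 - 4880 = 194584 KB. The helper below is a
    hypothetical illustration of that calculation, not part of
    mmlsquota; it clamps the result at zero rather than reporting
    negative headroom.

    ```c
    #include <assert.h>

    /* Space (KB) still usable under the hard limit, given that
       usage + in_doubt may not exceed hard_limit. */
    long quota_available_kb(long hard_limit, long usage, long in_doubt)
    {
        assert(hard_limit >= 0 && usage >= 0 && in_doubt >= 0);
        long left = hard_limit - usage - in_doubt;
        return left > 0 ? left : 0;
    }
    ```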

    ------

    APAR: IY22264 COMPID: 5765B9501 REL: 320
    ABSTRACT: BAD EXAMPLE AND MISSING INFO IN MMLSQUOTA DOCUMENTATION

    PROBLEM DESCRIPTION:
    The command documentation for mmlsquota shows an example where
    usr + in_doubt exceeds the quota. This is not allowed by the
    code. Also, the quota documentation should explicitly state that
    user + in_doubt is not allowed to exceed the hard quota.

    PROBLEM SUMMARY:
    The mmlsquota command description did not explicitly state
    that the sum of usr and in-doubt could not exceed the hard
    limit. The example also showed a case where this had
    occurred and therefore also needed to be updated.

    PROBLEM CONCLUSION:
    The description was updated to include:
    For each file system in the nodeset, the mmlsquota command
    displays:
       Block limits:
            quota type (USR or GROUP)
            current usage in KB
            soft limit in KB
            hard limit in KB
            space in-doubt
            grace period
       File limits:
            current number of files
            soft limit
            hard limit
            files in-doubt
            grace period
    As the sum of the in-doubt value and the current usage may
    not exceed the hard limit, the actual block space and
    number of files available to the user or group may be
    constrained by the in-doubt value. Should the in-doubt
    value approach a significant percentage of the quota, run
    the mmcheckquota command to account for the loss of space
    and files.
    The example was updated as:
    User paul enters:
    mmlsquota
    The system displays information similar to:
                          Block Limits                    |            File Limits
    Filesystem  type   KB   quota   limit  in_doubt  grace | files  quota  limit  in_doubt  grace
    gpfsn       USR   728  100096  200192      4880   none |    35     30     40        10  6days
    This output shows the quotas for user paul in file system
    gpfsn set to a soft limit of 100096K and a hard limit of
    200192K. 728K is currently allocated to him. 4880K is also
    in-doubt, meaning that the quota system has not yet been
    updated as to whether this space has been used by the nodes
    or whether it is still available. No grace period appears
    because the user has not exceeded his quotas. If he had
    exceeded the soft limit, the grace period would be set and
    the user would have that amount of time to bring his usage
    below the quota value. If he failed to do so, he would not
    be allocated any more space.
    The soft limit for files (inodes) is set at 30 and the hard
    limit is 40. 35 files are currently allocated to this user,
    and the quota system does not yet know whether the 10
    in-doubt files have been used or are still available. A
    grace period of six days appears because the user has
    exceeded his quotas. The user has that amount of time to
    bring his usage below the quota value. If he fails to do
    so, he will not be allocated any more space.

    ------

    APAR: IY22392 COMPID: 5765D5100 REL: 330
    ABSTRACT: COLONY CONFIG NOT REPORTING CORRECT ADAPTER DIAG

    PROBLEM DESCRIPTION:
    colony config not reporting correct adapter diag

    PROBLEM SUMMARY:
    Resource name field of error report contained 'diag_fail'
    message only.

    PROBLEM CONCLUSION:
    Adapter name string was added into resource_name field. Now
    output from errpt command looks like 'css0_diag_fail' or
    'css1_diag_fail'. This is test output from errpt:
    IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
    63F81C60 0813155301 P S css0_diag_fail CSS config failed

    ------

    APAR: IY22509 COMPID: 5765E5100 REL: 601
    ABSTRACT: LU62 'WHAT_DATA_RECEIVED' RETURNS PSH_COMPLETE_DATA_RECEIVED=6'

    PROBLEM DESCRIPTION:
    The customer gets the message "uncoded what_data_rcvd val 6"
    in his APPC application program. This problem happens very
    rarely, and a core dump is written. The AIX/CS API trace
    shows:
    output.what_data_rcvd = 6
    According to luxsna.h this indicates:
    'PSH_COMPLETE_DATA_RECEIVED=6'
    The very same APPC program was running fine on SNA Server V4.2.
    The customer's application cannot process the above
    what_data_rcvd value and abends.

    PROBLEM SUMMARY:
    Application fails because it gets unexpected
    what_data_received of PS_COMPLETE_DATA_RCVD.

    PROBLEM CONCLUSION:
    Corrected code to handle the BC fill=buffer
    case correctly, tracking current location within a GDS record
    between reads.

    ------

    APAR: IY22523 COMPID: 5765D5100 REL: 320
    ABSTRACT: COLONY CONFIG NOT REPORTING CORRECT ADAPTER DIAG

    PROBLEM DESCRIPTION:
    colony config not reporting correct adapter diag

    PROBLEM SUMMARY:
    Resource name field of error report contained 'diag_fail'
    message only.

    PROBLEM CONCLUSION:
    Adapter name string was added into resource_name field. Now
    output from errpt command looks like 'css0_diag_fail' or
    'css1_diag_fail'. This is test output from errpt:
    IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
    63F81C60 0813155301 P S css0_diag_fail CSS config failed

    ------

    APAR: IY22530 COMPID: 5765E7200 REL: 310
    ABSTRACT: CIFSUSERPROC IS STARTED WHEN CONNECTING WITH INVALID USER.

    PROBLEM DESCRIPTION:
    Enable the passthrough authentication server and try to connect
    as a user that is valid on both AIX and the passthrough server,
    but with an invalid password. A new cifsUserProc process is
    created with every attempt.

    PROBLEM CONCLUSION:
    cifsUserProc now exits if the request from the server is only
    to verify that the user exists.

    ------

    APAR: IY22535 COMPID: 5765E6800 REL: 300
    ABSTRACT: MULTILEVEL WILDCARDING

    PROBLEM DESCRIPTION:
    Multilevel wildcarding is not supported in the current design
    of the Agent.

    PROBLEM CONCLUSION:
    Enhanced the code to support multilevel wildcarding.

    ------

    APAR: IY22538 COMPID: 5765D5100 REL: 320
    ABSTRACT: USER SPACE JOB SOMETIMES GET KILLED WITH SWITCH CLOCK

    PROBLEM DESCRIPTION:
    User Space Job sometimes gets killed with switch clock

    PROBLEM SUMMARY:
    An MPI job could be killed because of switch clock management
    even when adapter recovery succeeded.

    PROBLEM CONCLUSION:
    With this fix, MPI job should continue if successful
    adapter recovery takes place.

    ------

    APAR: IY22540 COMPID: 5765D5100 REL: 320
    ABSTRACT: NEED TO ADD CALL TO THE MULTILINK DUMP UTILITIES IN CSS.SNAP.

    PROBLEM DESCRIPTION:
    Need to add call to the Multilink dump utilities in css.snap

    PROBLEM SUMMARY:
    The css.snap command needs to collect data on the ml0
    device.

    PROBLEM CONCLUSION:
    The css.snap will now collect data on the ml0 device if it
    is present.

    ------

    APAR: IY22541 COMPID: 5765D5100 REL: 320
    ABSTRACT: 128WAY COLONY D/S : CSS.SNAP RUN ON NODE FILLS UP /VAR

    PROBLEM DESCRIPTION:
    128Way Colony D/S : css.snap run on node fills up /var

    PROBLEM SUMMARY:
    The css.snap command was not checking in all the correct
    places for file system usage.

    PROBLEM CONCLUSION:
    The css.snap command now looks in /var/adm/SPlogs/css0 and
    /var/adm/SPlogs/css1 as well as /var/adm/SPlogs/css for file
    system usage before execution.
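
    A sketch of the corrected pre-check in Python: every log directory
    that exists (css, css0, css1) is examined, not only
    /var/adm/SPlogs/css. The 90% threshold and the function name are
    assumptions for illustration; the limit css.snap actually uses is
    not stated in this APAR.

```python
import os
import shutil

# Directories named in the APAR; missing ones (e.g. no css1 adapter) are skipped.
SNAP_DIRS = ("/var/adm/SPlogs/css", "/var/adm/SPlogs/css0", "/var/adm/SPlogs/css1")


def enough_space(dirs=SNAP_DIRS, max_used_pct=90.0):
    """Return True only if no existing directory's file system is over the limit.

    max_used_pct is a hypothetical threshold, not the value css.snap uses.
    """
    for path in dirs:
        if not os.path.isdir(path):
            continue
        usage = shutil.disk_usage(path)
        if usage.total == 0:
            continue  # pseudo file systems report no capacity
        used_pct = 100.0 * usage.used / usage.total
        if used_pct > max_used_pct:
            return False
    return True
```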

    ------

    APAR: IY22542 COMPID: 5765D5100 REL: 320
    ABSTRACT: FUNCTIONALIZE CSS.SNAP EXITS

    PROBLEM DESCRIPTION:
    Functionalize css.snap exits

    PROBLEM SUMMARY:
    The internal flow of css.snap changed.

    PROBLEM CONCLUSION:
    Added an exit function to css.snap.

    ------

    APAR: IY22557 COMPID: 5765E7400 REL: 300
    ABSTRACT: JAZIZO: CPU AND MEMORY ENHANCEMENT

    PROBLEM DESCRIPTION:
    Jazizo needs to be enhanced to use less CPU and memory.

    PROBLEM CONCLUSION:
    Modified the code to minimize data type conversions.

    ------

    APAR: IY22558 COMPID: 5765E7400 REL: 300
    ABSTRACT: PTX V3: AZIZO PRINTER ERROR

    PROBLEM DESCRIPTION:
    On PTX V3.0, when you try to print from azizo to a file
    or a printer, the following message pops up:
    Paper width -802024944.00 is out of range
    ........
    ........
    Please correct input and try again

    PROBLEM CONCLUSION:
    Included the prototype header file in azizo code.

    ------

    APAR: IY22559 COMPID: 5765E7400 REL: 300
    ABSTRACT: 3DMON MISINTERPRETS -D OPTION WITH -DISPLAY OPTION OF XMOTIF

    PROBLEM DESCRIPTION:
    3dmon takes the number following the -d option as the invitation
    delay in seconds, but the XtInitialize routine, which opens the
    X display interface, interprets it as a display identifier,
    tries to open that display, and fails.

    LOCAL FIX:
    Specify the invitation delay value with the -d option
    without leaving a space between -d and the value,
    e.g. use '-d15' instead of '-d 15'.

    PROBLEM SUMMARY:
    3dmon takes the number following the -d option as the invitation
    delay in seconds, but the XtInitialize routine, which opens the
    X display interface, interprets it as a display identifier,
    tries to open that display, and fails.

    PROBLEM CONCLUSION:
    Fixed the code so that the ambiguity of the -d option is
    removed.

    ------

    APAR: IY22560 COMPID: 5765E6800 REL: 300
    ABSTRACT: XMSERVD DOESN'T STOPS AFTER TIMETOLIVE MINUTES

    PROBLEM DESCRIPTION:
    xmservd has a -l option that can be used to specify
    time_to_live minutes. Even when time_to_live minutes
    is specified with the -l option, xmservd may not exit
    after the specified number of minutes.

    PROBLEM CONCLUSION:
    Fixed the problem in the calculation of the death time of
    xmservd.

    ------

    APAR: IY22576 COMPID: 5765B9501 REL: 320
    ABSTRACT: VSX CHOWN() DID NOT CLEAR S_ISUID BIT

    PROBLEM DESCRIPTION:
    vsx chown() did not clear S_ISUID bit

    PROBLEM SUMMARY:
    chown() did not clear S_ISUID bit

    PROBLEM CONCLUSION:
    The fast path taken when owner and group already
    match was not clearing the S_ISUID or S_ISGID bits.
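
    The Python model below illustrates the fix: the early return taken
    when the owner and group already match must still clear the
    set-user-ID and set-group-ID bits. The dictionary-based inode and
    the privileged flag are assumptions for illustration. POSIX requires
    clearing both bits on a regular file after a successful chown by an
    unprivileged caller and leaves privileged callers
    implementation-defined, so this sketch clears them only for
    unprivileged callers.

```python
import stat

SETID_BITS = stat.S_ISUID | stat.S_ISGID


def chown_inode(inode, new_uid, new_gid, privileged=False):
    """Hypothetical chown on an in-memory inode dict with 'mode', 'uid', 'gid'."""
    must_clear = stat.S_ISREG(inode["mode"]) and not privileged
    if inode["uid"] != new_uid or inode["gid"] != new_gid:
        inode["uid"], inode["gid"] = new_uid, new_gid
    # Applied on both paths: the bug was that the "IDs already match"
    # fast path returned before reaching this step.
    if must_clear:
        inode["mode"] &= ~SETID_BITS
    return inode
```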

    ------

    APAR: IY22578 COMPID: 5765B9501 REL: 330
    ABSTRACT: VSX CHOWN() DID NOT CLEAR S_ISUID BIT

    PROBLEM DESCRIPTION:
    vsx chown() did not clear S_ISUID bit

    PROBLEM SUMMARY:
    chown() did not clear S_ISUID bit

    PROBLEM CONCLUSION:
    The fast path taken when owner and group already
    match was not clearing the S_ISUID or S_ISGID bits.

    ------

    APAR: IY22591 COMPID: 5765D6100 REL: 220
    ABSTRACT: NEGOTIATOR CRASHING

    PROBLEM DESCRIPTION:
    llq commands begin to respond slowly, then the Negotiator
    stops responding and crashes.

    LOCAL FIX:
    Do not issue llstatus and llq repeatedly at the same time.

    PROBLEM SUMMARY:
    APAR IY20677 changed the locking of various BTree data
    structures from exclusive write locks to shared read locks
    for accesses where the data structures were not being
    changed. Since the mechanism for traversing these data
    structures actually changes the data structure, the shared
    read locks are insufficient for protecting the data structure
    in these cases. This inadequate locking leads to memory
    errors that cause the negotiator to core dump.

    PROBLEM CONCLUSION:
    In all cases, the code has been changed to hold exclusive
    write locks when accessing BTree data structures, regardless
    of whether data in the data structure is being changed.
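
    The point of this fix is that a traversal which reorganizes the
    structure is a write even when no user data changes, so a shared
    read lock is not enough. The Python sketch below shows the same
    situation with a hypothetical move-to-front cache, where every
    lookup reorders entries and therefore takes the exclusive lock. It
    is an analogy, not the LoadLeveler BTree code.

```python
import threading
from collections import OrderedDict


class MoveToFrontCache:
    """Lookups reorder entries, so they mutate the structure like the BTree traversal."""

    def __init__(self):
        self._lock = threading.Lock()  # one exclusive lock for lookups and updates
        self._items = OrderedDict()

    def put(self, key, value):
        with self._lock:
            self._items[key] = value
            self._items.move_to_end(key, last=False)

    def get(self, key, default=None):
        # A shared read lock would be insufficient: move_to_end relinks entries.
        with self._lock:
            if key not in self._items:
                return default
            self._items.move_to_end(key, last=False)
            return self._items[key]

    def keys(self):
        with self._lock:
            return list(self._items)
```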

    ------

    APAR: IY22592 COMPID: 5765B9501 REL: 330
    ABSTRACT: LOOPING ON RECLOCKRESET

    PROBLEM DESCRIPTION:
    looping on reclockreset

    PROBLEM SUMMARY:
    Development discovered potential loop.

    PROBLEM CONCLUSION:
    Fixed a loop in RecLockReset caused by the F_SETLK call using
    a different l_vfs value from the one returned by F_GETLK.

    ------

    APAR: IY22593 COMPID: 5765B9501 REL: 330
    ABSTRACT: WAITING BECAUSE OF LOCAL BYTE RANGE LOCK CONFLICT

    PROBLEM DESCRIPTION:
    waiting because of local byte range lock conflict

    PROBLEM SUMMARY:
    Deadlock when a disk error causes a forced unmount of the
    file system while mapped files are being purged.

    PROBLEM CONCLUSION:
    When a BR lock gets an error (e.g. SG panic) while flushing
    mapped buffers in the byte range, it must unlock the byte
    range because the caller will assume that the range was
    never acquired.
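
    The rule behind this fix, release what was acquired before reporting
    a failure because the caller will assume nothing is held, is
    sketched below in Python. The RangeLockTable class and the flush
    callback are hypothetical stand-ins for the GPFS byte-range lock and
    the mapped-buffer flush.

```python
class RangeLockTable:
    """Hypothetical byte-range lock table; ranges are (start, end) pairs."""

    def __init__(self):
        self.held = set()

    def acquire(self, start, end, flush):
        self.held.add((start, end))
        try:
            flush(start, end)  # may fail, e.g. during a forced unmount
        except OSError:
            # The caller treats a failure as "range never acquired", so undo it
            # here; otherwise later requests wait on a lock nobody will release.
            self.held.discard((start, end))
            raise

    def release(self, start, end):
        self.held.discard((start, end))
```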

    ------

    APAR: IY22594 COMPID: 5765B9501 REL: 330
    ABSTRACT: MMFSD STUCK AFTER FAILING

    PROBLEM DESCRIPTION:
    mmfsd stuck after failing

    PROBLEM SUMMARY:
    There have been cases where GPFS termination deadlocked in
    AIX services because the AIX service required a lock held by
    a GPFS thread which detected the failure. The termination
    sequence then hung which resulted in GPFS not restarting.

    PROBLEM CONCLUSION:
    Instead of waiting forever, SigUsr1Handler now waits up to
    5 minutes for the internal dump to complete and then exits.
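
    Replacing an unbounded wait with a bounded one can be sketched in
    Python as follows. The five-minute limit comes from the APAR; the
    function name and the use of a thread to represent the internal dump
    are assumptions.

```python
import threading

DUMP_TIMEOUT_SECONDS = 5 * 60  # limit stated in the APAR


def wait_for_dump(dump_thread, timeout=DUMP_TIMEOUT_SECONDS):
    """Wait at most `timeout` seconds for the dump; return True if it finished.

    Either way the caller goes on to exit instead of hanging forever.
    """
    dump_thread.join(timeout)
    return not dump_thread.is_alive()
```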

    ------

    APAR: IY22595 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSERT AT DSYNCH.C, LINE 1315

    PROBLEM DESCRIPTION:
    assert at dsynch.c, line 1315

    PROBLEM SUMMARY:
    GPFS self check logic detected an error at dsynch.C line
    1315

    PROBLEM CONCLUSION:
    Thread was left holding alloc server mutex
    during a config change.

    ------

    APAR: IY22620 COMPID: 5765D6100 REL: 220
    ABSTRACT: LLQ <JOBID> IN MULTI DOMAIN ENVIRONMENT DOES NOT WORK

    PROBLEM DESCRIPTION:
    llq <jobid> in multi domain environment does not work
    Please refer to defect 76010
    When issuing the llq jobid command from a node in a
    different domain from the one running the schedd
    which received the job, no information is reported.
    Example:
    On sp7tr12.hursley.ibm.com:
    $ llq
    Id Owner
    anubis.ssd.hursley-.65.0 loadl
    $ llq anubis.ssd.hursley.ibm.com.65.0
    llq: There is currently no job status to report.
    However, llq -l shows the long listing on all jobs,
    including anubis.ssd.hursley.ibm.com.65.0.

    PROBLEM SUMMARY:
    When issuing the llq jobid command from a node in a
    different domain from the one running the schedd
    which received the job, no information is reported.

    PROBLEM CONCLUSION:
    When issuing the llq jobid command from a node in a
    different domain from the one running the schedd
    which received the job, information is now being reported.

    ------

    APAR: IY22713 COMPID: 5765B9500 REL: 140
    ABSTRACT: MMMKVSD: GET ADAPTER FAILED FOR IPA: <IP ADDR>

    PROBLEM DESCRIPTION:
    The command mmmkvsd uses the commands
    /usr/lpp/mmfs/bin/mmcommon convin and
    /usr/lpp/mmfs/bin/mmcommon convnr
    (in section: # Figure out our partition.)
    However, mmcommon no longer has the functions
    convin and convnr implemented. Using mmmkvsd
    will fail with the message
    mmmkvsd: Get adapter failed for IPA: <IP Addr>

    PROBLEM SUMMARY:
    During migration from GPFS 1.2 to GPFS 1.3, the mmmkvsd and
    mmcrfs commands did not work when issued from the CWS.

    PROBLEM CONCLUSION:
    Fixed the mmmkvsd command in GPFS 1.3 to operate with
    GPFS 1.2 nodes.

    ------

    APAR: IY22714 COMPID: 5765B9501 REL: 330
    ABSTRACT: ERR 4 OF UNKNOWN ORIGIN IN QUOTA PREFETCH

    PROBLEM DESCRIPTION:
    ERR 4 OF UNKNOWN ORIGIN IN QUOTA PREFETCH

    PROBLEM SUMMARY:
    Quota pre-fetch returned RC 4 without explanation

    PROBLEM CONCLUSION:
    Initialize the secondary error variable so that the
    FlushRecord exit code does not return an incorrect value.

    ------

    APAR: IY22716 COMPID: 5765B9501 REL: 330
    ABSTRACT: NODE LEFT IN FAILED STATE AFTER RECOVERY

    PROBLEM DESCRIPTION:
    mmfsadm dump cfgmgr will show a node with state "fail" on some
    nodes while it is either "down" or "up" on the other nodes. This
    will block the socket communication to that node causing hung
    threads if revokes are needed.

    PROBLEM SUMMARY:
    During a node failure, a second failing node did not
    complete recovery.

    PROBLEM CONCLUSION:
    Serialize the joining of the single-phase group
    and the N-phase group. This prevents an observing node
    from seeing a leave event from the single-phase group
    after it has already seen the leave/recovery/down/join phases
    on the N-phase group, and assuming it must be a new failure.

    ------

    APAR: IY22741 COMPID: 5765B9501 REL: 330
    ABSTRACT: EIO ERROR FROM MMGETACL COMMAND

    PROBLEM DESCRIPTION:
    The problem occurs because the in-memory buffer of the ACL
    file is modified during permission checks on directories
    that have implied permissions.

    PROBLEM SUMMARY:
    Application received an incorrect rejection of access to a
    file because ACLs did not match.

    PROBLEM CONCLUSION:
    Error description: Do not modify the in-memory buffer of the
    ACL file when doing permission checks on directories that
    have implied permissions. After such a modification, the
    next access of the ACLs would return E_VALIDATE because the
    hashed value of the modified entry would not match.
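
    The failure mode is that the permission check edited the cached ACL
    buffer in place, so the stored hash no longer matched on the next
    read. The Python sketch below shows the corrected approach of
    combining implied permissions in a local value. The cache layout
    (first byte holding permission bits), the SHA-1 hash and all names
    are hypothetical and do not describe the GPFS ACL format.

```python
import hashlib


class CachedAcl:
    """Hypothetical cached ACL buffer that is validated by a hash on each access."""

    def __init__(self, entries):
        self._buffer = bytearray(entries)
        self._digest = hashlib.sha1(self._buffer).digest()

    def read(self):
        if hashlib.sha1(self._buffer).digest() != self._digest:
            raise ValueError("E_VALIDATE: cached ACL does not match its hash")
        return bytes(self._buffer)

    def permits(self, wanted, implied=0):
        # Fixed: combine implied permissions in a local value. The bug was the
        # equivalent of `self._buffer[0] |= implied`, which broke the hash check.
        perms = self.read()[0] | implied
        return (perms & wanted) == wanted
```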

    ------

    APAR: IY22764 COMPID: 5765E5400 REL: 440
    ABSTRACT: GEO_MOUNT_FS DOES NOT SET GROUPNAME

    PROBLEM DESCRIPTION:
    In HACMP (and HAES) 4.4.1, the cl_activate_fs
    script requires that the GROUPNAME is set.
    However, HaGeo calls cl_activate_fs without
    setting it.
    In the hacmp.out file, you'll see several
    errors including:
    odmget: Could not retrieve object for HACMPresource,
            odm errno 5904
    ERROR: Could not get the value of FSCHECK_TOOL
    ERROR: Could not get the value of RECOVERY_METHOD

    PROBLEM SUMMARY:
    In an HAGEO environment with HACMP, Geo calls cl_activate_fs
    from Geo_mount_fs to mount the Geo filesystems. Certain
    environment variables are not set, such as "group", which is
    an attribute of HACMPresource. When odmget is issued as
    follows: odmget -q name=FSCHECK_TOOL AND group= HACMPresource
    you get the following error: odmget: Could not retrieve object
    for HACMPresource, odm errno 5904. It looks as if FSCHECK_TOOL
    is not set, yet fsck still gets run.

    PROBLEM CONCLUSION:
    Change the code in cl_activate_fs to directly query ODM for
    any information that was not currently available in the
    environment, and suppress error messages for conditions
    handled by taking defaults.

    ------

    APAR: IY22869 COMPID: 5765E5400 REL: 440
    ABSTRACT: ATTEMPT TO START CLUSTER HANGS IN CLSTART - HAES

    PROBLEM DESCRIPTION:
    When attempting to start the cluster in HAES 4.4, the startup
    hung in clstart. It was hanging on a grep for clstop and
    a tail -1.

    PROBLEM SUMMARY:
    When attempting to start the cluster in HAES 4.4, the startup
    hung in clstart. It was hanging on a grep for clstop and
    a tail -1.

    PROBLEM CONCLUSION:
    The problem was in code that attempts to find the timestamp
    of the last stop of the cluster. Two problems existed:
    1. If the cluster.log entry in /etc/syslog.conf was missing,
       it still tried to grep a null filename, causing the hang.
    2. It was grepping the cluster.log file for clstop, which
       does not appear there.
    The code was changed to verify that a file location was
    returned before attempting the grep, and the grep was changed
    to look for "EVENT COMPLETED: node_down_complete".

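    Both corrections can be sketched in Python: find the cluster.log
    destination in /etc/syslog.conf, skip the search entirely when no
    entry or file exists (a grep given no filename reads standard input
    and waits, which is the hang described above), and search for the
    node_down_complete marker. The syslog.conf parsing here is
    simplified and the function names are hypothetical.

```python
import os

MARKER = "EVENT COMPLETED: node_down_complete"


def cluster_log_path(syslog_conf_text):
    """Return the file syslog.conf routes to cluster.log, or None if no entry exists."""
    for line in syslog_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for field in line.split()[1:]:
            if field.endswith("cluster.log"):
                return field
    return None


def last_stop_line(syslog_conf_text):
    """Return the last node_down_complete line, or None without searching anything."""
    path = cluster_log_path(syslog_conf_text)
    if path is None or not os.path.isfile(path):
        return None  # fix 1: never run the search without a real file
    last = None
    with open(path, errors="replace") as log:
        for line in log:
            if MARKER in line:  # fix 2: look for the completion marker, not "clstop"
                last = line.rstrip("\n")
    return last
```
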
    ------

    APAR: IY22905 COMPID: 5765E7200 REL: 310
    ABSTRACT: RENAME A FILE/FOLDER TO DIFFERENT CASE FAILS

    PROBLEM DESCRIPTION:
    Unable to rename a file or folder to the same name with
    different case on FastConnect.

    PROBLEM CONCLUSION:
    Fixed the file/folder move functionality.

    ------

    APAR: IY22906 COMPID: 5765E7200 REL: 310
    ABSTRACT: COREDUMP WITH 1024 CHARACTERS

    PROBLEM DESCRIPTION:
    Server core dumps

    PROBLEM CONCLUSION:
    Changed the assert statement to support more characters.

    ------

    APAR: IY22946 COMPID: 5765B9501 REL: 330
    ABSTRACT: MMDELDISK STOPS EARLY ON WARNING

    PROBLEM DESCRIPTION:
    mmdeldisk stops early on warning

    PROBLEM SUMMARY:
    In certain error cases, mmdeldisk stopped trying to move
    data due to recoverable errors which could have allowed
    additional data to be recovered.

    PROBLEM CONCLUSION:
    Only terminate the repair scan on fatal errors. The scan was
    being stopped for non-severe errors such as E_NOBALSPC, but
    tsdeldisk would still delete the disk, leaving files with
    disk addresses pointing to the deleted disk (i.e. data would
    be lost).

    ------

    APAR: IY22947 COMPID: 5765B9501 REL: 320
    ABSTRACT: EIO ERROR FROM MMGETACL COMMAND

    PROBLEM DESCRIPTION:
    The problem occurs because the in-memory buffer of the ACL
    file is modified during permission checks on directories
    that have implied permissions.

    PROBLEM SUMMARY:
    Application received an incorrect rejection of access to a
    file because ACLs did not match.

    PROBLEM CONCLUSION:
    Error description: Do not modify the in-memory buffer of the
    ACL file when doing permission checks on directories that
    have implied permissions. After such a modification, the
    next access of the ACLs would return E_VALIDATE because the
    hashed value of the modified entry would not match.

    ------

    APAR: IY23023 COMPID: 5765B9501 REL: 320
    ABSTRACT: MMLSQUOTA SHOWS INCONSISTENT GRACE PERIOD VALUES

    PROBLEM DESCRIPTION:
    mmlsquota shows inconsistent grace period values

    PROBLEM SUMMARY:
    mmlsquota shows incorrect grace period.

    PROBLEM CONCLUSION:
    mmlsquota shows inconsistent grace period values. Do not
    use the same buffer for two different strings in one print
    statement.
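
    The bug pattern here is a formatting helper that returns a pointer
    to one shared static buffer: when two calls appear in the same print
    statement, both arguments show whichever value was written last. The
    Python sketch below imitates that with a shared object whose text is
    read only at formatting time. The function names, the "block" and
    "files" labels and the days-only format are hypothetical.

```python
class _SharedBuffer:
    """Stands in for a static char buffer reused by every call."""

    text = ""

    def __str__(self):
        return self.text


_STATIC = _SharedBuffer()


def grace_shared(seconds):
    # Buggy pattern: every caller receives the same buffer object.
    _STATIC.text = f"{seconds // 86400}d"
    return _STATIC


def grace_fresh(seconds):
    # Fixed pattern: each call produces its own string.
    return f"{seconds // 86400}d"
```

    Formatting "block %s, files %s" with grace_shared(86400) and
    grace_shared(7 * 86400) prints "7d" twice, while grace_fresh gives
    "1d" and "7d".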

    ------

    APAR: IY23024 COMPID: 5765B9501 REL: 330
    ABSTRACT: MMLSQUOTA SHOWS INCONSISTENT GRACE PERIOD VALUES

    PROBLEM DESCRIPTION:
    mmlsquota shows inconsistent grace period values

    PROBLEM SUMMARY:
    mmlsquota shows incorrect grace period.

    PROBLEM CONCLUSION:
    mmlsquota shows inconsistent grace period values. Do not
    use the same buffer for two different strings in one print
    statement.

    ------

    APAR: IY23025 COMPID: 5765B9501 REL: 330
    ABSTRACT: LOOP IN UNMOUNT

    PROBLEM DESCRIPTION:
    loop in unmount

    PROBLEM SUMMARY:
    Loop in unmounting a file system under certain conditions

    PROBLEM CONCLUSION:
    cxiCanUncacheOSNode must return a vnode pointer only if the
    count field is zero; otherwise unmount loops forever calling
    unCache.

    ------

    APAR: IY23026 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSERT ACQUIRING ALMSERVER MUTEX ALREADY HELD

    PROBLEM DESCRIPTION:
    Assert acquiring almserver mutex already held

    PROBLEM SUMMARY:
    GPFS self check logic asserted in the disk allocation
    manager

    PROBLEM CONCLUSION:
    CHECK_CONFIG_CHANGE_RELALM should be calling
    releaseServerAlmMutex instead of acquireServerAlmMutex.

    ------

    APAR: IY23028 COMPID: 5765B9501 REL: 330
    ABSTRACT: FORCED UNMOUNTED FS WILL NOT REMOUNT

    PROBLEM DESCRIPTION:
    forced unmounted fs will not remount

    PROBLEM SUMMARY:
    A file system was unmounted by the system due to I/O errors
    while processing a mapped file. The unmount did not fully
    complete, which prevented a remount without shutting down
    GPFS on that node.

    PROBLEM CONCLUSION:
    Allow processing of mapped page buffers as long as the SG is
    still available, so that a forced unmount can get all the
    dirty page buffers flushed during the initial sync phase of
    the unmount.

    ------

    APAR: IY23030 COMPID: 5765B9501 REL: 320
    ABSTRACT: NODE LEFT IN FAILED STATE AFTER RECOVERY

    PROBLEM DESCRIPTION:
    mmfsadm dump cfgmgr will show a node with state "fail" on some
    nodes while it is either "down" or "up" on the other nodes. This
    will block the socket communication to that node causing hung
    threads if revokes are needed.

    PROBLEM SUMMARY:
    During a node failure, a second failing node did not
    complete recovery.

    PROBLEM CONCLUSION:
    Serialize the joining of the single-phase group
    and the N-phase group. This prevents an observing node
    from seeing a leave event from the single-phase group
    after it has already seen the leave/recovery/down/join phases
    on the N-phase group, and assuming it must be a new failure.

    ------

    APAR: IY23034 COMPID: 5765D5100 REL: 320
    ABSTRACT: HACWS NODE_UP_COMPLETE.POST_EVENT LOOPS FOREVER

    PROBLEM DESCRIPTION:
    ssp.hacws.usr.3.1.1: node_up_complete.post_event loops forever
    if configured with HACMP/ES.
    node_up_complete.post_event waits for the hardcoded file
    /usr/sbin/cluster/.telinit
    BUT
    /etc/inittab (if HAES is configured) contains:
    clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit

    LOCAL FIX:
    change node_up_complete.post_event manually from
    /usr/sbin/cluster/.telinit
    to
    /usr/es/sbin/cluster/.telinit

    PROBLEM SUMMARY:
    In an HACMP/ES environment, the clinit entry in /etc/inittab
    touches /usr/es/sbin/cluster/.telinit.
    However, the node_up_complete.post_event file is
    searching for /usr/sbin/cluster/.telinit. Hence it keeps
    looping for over 5 minutes looking for that file.
    The node_up_complete.post_event file needed to be
    modified to search for /usr/es/sbin/cluster/.telinit
    instead of /usr/sbin/cluster/.telinit.

    PROBLEM CONCLUSION:
    The node_up_complete.post_event script was changed, in an
    HACMP/ES environment, to look for the
    /usr/es/sbin/cluster/.telinit file INSTEAD of
    /usr/sbin/cluster/.telinit.

    ------

    APAR: IY23107 COMPID: 5765D5100 REL: 330
    ABSTRACT: PERSPECTIVES ERROR MSGS WITH V2.1 LIBDCE.A IN NON-DCE AUTHEN ENV

    PROBLEM DESCRIPTION:
    PERSPECTIVES ERROR MSGS WITH V2.1 LIBDCE.A IN NON-DCE AUTHEN Env

    PROBLEM SUMMARY:
    Issuing sphardware on a CWS in a non-DCE environment will
    result in error messages being issued if a level of DCE
    prior to 3.1 exists on the system. The following messages
    will be issued several times:
    exec(): 0509-036 Cannot load program spsec_ldmod because
                   of the following errors:
           0509-022 Cannot load module
                   /usr/lpp/ssp/bin/spsec_ldmod.
           0509-150 Dependent module libdcepthreads.a
                   (dcepthreads_shr.o) could not be loaded.
           0509-022 Cannot load module libdcepthreads.a
                   (dcepthreads_shr.o).
           0509-026 System error: A file or directory in the
                   path name does not exist.
           0509-022 Cannot load module /usr/lpp/ssp/bin.
           0509-150 Dependent module /usr/lpp/ssp/bin could
                   not be loaded.
    The routines were trying to access a module that does
    not exist in /usr/lib/libdce.a in the earlier version
    of DCE.

    PROBLEM CONCLUSION:
    Modified code in perspectives and Event Management to
    first verify that DCE is being used on the system,
    prior to attempting to load the DCE libraries.
    This will allow sphardware to be run in a non-DCE
    environment when a level of DCE prior to 3.1 exists
    on the system.
    APAR IY23107 only provides a partial solution to this
    problem. For a complete solution APAR IY22203, available in
    rsct.clients.rte 1.2.1.1 or greater, must also be installed.

    ------

    APAR: IY23162 COMPID: 5765E5400 REL: 440
    ABSTRACT: ONLY ONE SWAP_ADAPTER WHEN TWO SERVICE ADAPTERS FAIL - HAES

    PROBLEM DESCRIPTION:
    The customer was using 4-port ethernet adapters and was testing
    what would happen if one of these adapter cards failed
    completely. With 2 service adapters, one for each of two
    different networks, configured on the same 4-port adapter, the
    customer pulled both service cables and only one swap_adapter
    event occurred followed by its fail_standby event. No event
    occurred for the second network.

    PROBLEM SUMMARY:
    The customer was using 4-port ethernet adapters and was testing
    what would happen if one of these adapter cards failed
    completely. With 2 service adapters, one for each of two
    different networks, configured on the same 4-port adapter, the
    customer pulled both service cables and only one swap_adapter
    event occurred followed by its fail_standby event. No event
    occurred for the second network.

    PROBLEM CONCLUSION:
    The problem was that the second swap_adapter event on the
    queue was being found to be the same one that had just
    completed, because there was no test for the network being
    the same. This was corrected in the evque.C routine.
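
    The corrected duplicate test can be sketched in Python: an event
    taken from the queue counts as the one just completed only if its
    network also matches. The Event fields and the queue handling are
    assumptions for illustration, not the evque.C data structures.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    node: str
    network: str


def same_event(a, b):
    # The missing piece was the network comparison.
    return (a.name, a.node, a.network) == (b.name, b.node, b.network)


def run_events(queue):
    """Process queued events, skipping only true duplicates of the last one run."""
    processed = []
    last = None
    while queue:
        event = queue.popleft()
        if last is not None and same_event(event, last):
            continue
        processed.append(event)
        last = event
    return processed
```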

    ------

    APAR: IY23184 COMPID: 5765D5100 REL: 320
    ABSTRACT: COLONY CUTOVER

    PROBLEM DESCRIPTION:
    colony cutover

    PROBLEM SUMMARY:
    We restructured a few headers to aid future serviceability.

    ------

    APAR: IY23200 COMPID: 5765E7200 REL: 310
    ABSTRACT: OPLOCKS ALLOW DATA CORRUPTION BETWEEN 95 AND NT EXCEL USERS.

    PROBLEM DESCRIPTION:
    If oplockfiles = yes, the Fast Connect server allows users to
    modify a shared MS Office file (Excel, PowerPoint) concurrently,
    causing data corruption.

    PROBLEM CONCLUSION:
    Allow only one user to have full access to the file;
    others get read-only access.

    ------

    APAR: IY23217 COMPID: 5765E7200 REL: 310
    ABSTRACT: SELECT FILE PROPERTIES CHANGES DATE TO 2497

    PROBLEM DESCRIPTION:
    Timestamp is wrong

    PROBLEM CONCLUSION:
    Added a check for whether all the timestamp bits are set.

    ------

    APAR: IY23248 COMPID: 5765D5100 REL: 320
    ABSTRACT: USE IBM,7010-S90 FOR CONDORM 6M1

    PROBLEM DESCRIPTION:
    use ibm, 7010-s90 for condorM 6M1

    PROBLEM SUMMARY:
    The CondorM 6M1 needs to be added to the spgetdesc command
    as a valid node type with the description value of IBM,
    7026-6M1.

    PROBLEM CONCLUSION:
    spgetdesc is usually run by rc.sp on the node during
    installation. This command uses the output of uname -M to
    look up the proper name of the node, in this case IBM,
    7026-6M1.

    ------

    APAR: IY23253 COMPID: 5765E5100 REL: 600
    ABSTRACT: CRASH IN SNA_V5ROUTER: MEMORY THAT PSE ACCESSED HAD BEEN FREED

    PROBLEM DESCRIPTION:
    Crash in sn_v5router

    LOCAL FIX:
    Code update to be provided by development to address problem in
    freed memory.

    PROBLEM SUMMARY:
    Crash in putq indirectly from vpr_stream_output_msg.

    PROBLEM CONCLUSION:
    Correctly use locking to prevent closure of streams while
    routing a msg to that stream.

    ------

    APAR: IY23257 COMPID: 5765D5100 REL: 320
    ABSTRACT: 128WAY COLONY:CABLE_TEST DOES NOT RESTART FSD

    PROBLEM DESCRIPTION:
    128way colony: cable_test does not restart fsd

    PROBLEM SUMMARY:
    The cable_test tool would not be able to issue a 'dsh'
    command to any of its nodes, if the hostname on the nodes is
    set to the 'ml' interface but the 'reliable hostname' is
    still set to the SPlan interface. This is fixed by setting
    the HN_METHOD variable to reliable, thus forcing dsh to
    use the reliable_hostname.

    PROBLEM CONCLUSION:
    The cable_test tool was changed to set the HN_METHOD
    variable. This will cause the tool to use the reliable
    hostnames stored in the SDR.

    ------

    APAR: IY23258 COMPID: 5765D5100 REL: 320
    ABSTRACT: 128WAY COLONY:CABLE_TEST COMPLETES, GENERATES ERROR MSGS

    PROBLEM DESCRIPTION:
    128way colony:cable_test completes, generates error msgs

    PROBLEM SUMMARY:
    The cable_test tool was causing the following message to
    appear on the console when it executed:
    mknod: /tmp/pipe5.78662: Do not specify an existing file
    With this change, the message no longer appears.

    PROBLEM CONCLUSION:
    The cable_test tool was modified to remove 'old' pipe files
    before continuing, thus avoiding the mknod problem.
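
    Removing a leftover pipe before creating a new one can be sketched
    with Python's os.mkfifo, which fails with FileExistsError in the
    same situation mknod reported. The function name is hypothetical and
    the sketch is POSIX-only.

```python
import os


def fresh_fifo(path):
    """Create a FIFO at `path`, first removing any stale file from an earlier run."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # nothing left over; proceed normally
    os.mkfifo(path)
    return path
```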

    ------

    APAR: IY23260 COMPID: 5765B9501 REL: 320
    ABSTRACT: MMCHMGR RESULTS IN LONG WAITERS-WAITING FOR SG MGR MIGRATE

    PROBLEM DESCRIPTION:
    mmchmgr results in long waiters-waiting for sg mgr migrate

    PROBLEM SUMMARY:
    Deadlock in GPFS when running mmchmgr with a quota command
    active.

    PROBLEM CONCLUSION:
    If a quota file token is lost to another node, migrating
    the FS mgr to another node will hang after it has
    blocked TM activity and then tries to flush the quota file
    which needs to get the token back. Sync/End the quota
    manager before blocking TM activity.

    ------

    APAR: IY23262 COMPID: 5765B9501 REL: 320
    ABSTRACT: VNOP_LOOKUP WITH VATTR ERROR RETURNS RC=0

    PROBLEM DESCRIPTION:
    If lookup succeeds but getattr fails to fill in vattr struct,
    return error code instead of just setting va_flags=VA_NOTAVAIL.

    PROBLEM SUMMARY:
    An NFS server node panicked while NFS-exporting a GPFS file
    system.

    PROBLEM CONCLUSION:
    VNOP_LOOKUP with vattr was getting return code 0
    and NFS assumed that the vattr structure was filled in
    without looking at va_flags. If lookup succeeds
    but getattr fails to fill in vattr struct,
    return error code instead of just setting
    va_flags=VA_NOTAVAIL.
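
    The contract change can be sketched in Python: when lookup finds the
    name but the attribute fetch fails, return the error instead of
    success plus a VA_NOTAVAIL flag, because callers such as the NFS
    server do not check that flag. The function shape, the
    dictionary-based directory and the getattr_fn callback are
    hypothetical.

```python
import errno


def lookup_with_vattr(directory, name, getattr_fn):
    """Return (rc, vnode, vattr); rc == 0 now guarantees vattr is filled in."""
    vnode = directory.get(name)
    if vnode is None:
        return errno.ENOENT, None, None
    try:
        vattr = getattr_fn(vnode)
    except OSError as err:
        # Old behavior: rc 0 with va_flags=VA_NOTAVAIL, which callers ignored.
        return err.errno or errno.EIO, None, None
    return 0, vnode, vattr
```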

    ------

    APAR: IY23290 COMPID: 5765E5400 REL: 440
    ABSTRACT: NODE_UP_REMOTE NEEDS TO STOP APPS IN REVERSE ORDER - HAS,HAES

    PROBLEM DESCRIPTION:
    A customer noticed that during a node_up_remote event on a node
    that held resources for the remote node coming up, the
    applications were stopped in listed order rather than in
    reverse order. This caused a 'dependency' problem for
    applications that depended on the first listed applications
    being up in order to stop properly: because the first listed
    applications were stopped first, the dependent applications
    failed to halt properly.

    PROBLEM SUMMARY:
    The customer noticed that when the failed node returned to the
    cluster, the node that had taken over its application resources
    stopped the applications in listed order. This caused a
    'dependency' problem with applications that depended on the
    first listed applications being active so that they could
    halt properly.
    That is, applications started later in the application list
    depended on the applications started earlier in the
    application list to start, run, and halt properly.

    PROBLEM CONCLUSION:
    Code in HACMP/HAES 'node_up_remote.sh' event script changed
    such that the applications to stop are listed and stopped in
    reverse order.
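
    The ordering rule can be sketched in Python: start application
    servers in list order and stop them in reverse order, so each
    application stops before the ones it depends on. The application
    names and callbacks are hypothetical; the real change is in the
    node_up_remote event script.

```python
def start_all(apps, start):
    for app in apps:  # listed order: dependencies start first
        start(app)


def stop_all(apps, stop):
    for app in reversed(apps):  # reverse order: dependents stop first
        stop(app)
```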

    ------

    APAR: IY23338 COMPID: 5765D9300 REL: 310
    ABSTRACT: CHECK SPELLING OF MP_WAIT_MODE ENVIRONMENT VARIABLE

    PROBLEM DESCRIPTION:
    Check spelling of MP_WAIT_MODE environment variable

    PROBLEM SUMMARY:
    To support a no-poll wait option in PSSP V3.3, allow for
    MP_WAIT_MODE=NOPOLL spelling in POE V3.1 release.

    PROBLEM CONCLUSION:
    Change in PE V3.1 will allow a PSSP V3.3 PTF to support a
    no-poll wait option.

    ------

    APAR: IY23363 COMPID: 5765E5100 REL: 601
    ABSTRACT: CUMULATIVE APAR FOR 6.0.1.1 FOR CS/AIX

    PROBLEM DESCRIPTION:
    Cumulative apar for 6.0.1.1 for CS/AIX.

    ------

    APAR: IY23493 COMPID: 5765E7200 REL: 310
    ABSTRACT: CIFS FAILS TO START NETWORK LOGON

    PROBLEM DESCRIPTION:
    Fast Connect fails to start Network Logon because
    PCs take over the Domain name.

    PROBLEM CONCLUSION:
    Force Fast Connect to keep this Domain name.

    ------

    APAR: IY23496 COMPID: 5765B9501 REL: 330
    ABSTRACT: MMCHMGR RESULTS IN LONG WAITERS-WAITING FOR SG MGR MIGRATE

    PROBLEM DESCRIPTION:
    mmchmgr results in long waiters-waiting for sg mgr migrate

    PROBLEM SUMMARY:
    Deadlock in GPFS when running mmchmgr with a quota command
    active.

    PROBLEM CONCLUSION:
    If a quota file token is lost to another node, migrating
    the FS mgr to another node will hang after it has
    blocked TM activity and then tries to flush the quota file
    which needs to get the token back. Sync/End the quota
    manager before blocking TM activity.

    ------

    APAR: IY23557 COMPID: 5765B9501 REL: 320
    ABSTRACT: MULTI NODE MKDIR'S FAILURES

    PROBLEM DESCRIPTION:
    multi node mkdir's failures

    PROBLEM SUMMARY:
    running a large number of inserts into the same directory in
    parallel from multiple systems causes an incorrect rejection
    of a mkdir

    PROBLEM CONCLUSION:
    Correct locking error in parallel inserts into large
    directories.

    ------

    APAR: IY23558 COMPID: 5765B9501 REL: 330
    ABSTRACT: MULTI NODE MKDIR'S FAILURES

    PROBLEM DESCRIPTION:
    multi node mkdir's failures

    PROBLEM SUMMARY:
    running a large number of inserts into the same directory in
    parallel from multiple systems causes an incorrect rejection
    of a mkdir

    PROBLEM CONCLUSION:
    Correct locking error in parallel inserts into large
    directories.

    ------

    APAR: IY23610 COMPID: 5765B9501 REL: 320
    ABSTRACT: BUFFERDATABLOCKNUM <= OFP -> METADATA.GETLAST

    PROBLEM DESCRIPTION:
    bufferdatablocknum <= ofp -> metadata.getlast

    PROBLEM SUMMARY:
    GPFS self-check logic detected an error at bufdesc.C line
    4591.

    PROBLEM CONCLUSION:
    mergeInode cannot release the cacheObj mutex and the
    atomVarLock on the inode while it swaps the fileSize
    it currently has with the fileSize it reads from disk.

    ------

    APAR: IY23612 COMPID: 5765B9501 REL: 330
    ABSTRACT: BUFFERDATABLOCKNUM <= OFP -> METADATA.GETLAST

    PROBLEM DESCRIPTION:
    bufferdatablocknum <= ofp -> metadata.getlast

    PROBLEM SUMMARY:
    GPFS self-check logic detected an error at bufdesc.C line
    4591.

    PROBLEM CONCLUSION:
    mergeInode cannot release the cacheObj mutex and the
    atomVarLock on the inode while it swaps the fileSize
    it currently has with the fileSize it reads from disk.

    ------

    APAR: IY23613 COMPID: 5765B9501 REL: 330
    ABSTRACT: VNOP_LOOKUP WITH VATTR ERROR RETURNS RC=0

    PROBLEM DESCRIPTION:
    If lookup succeeds but getattr fails to fill in vattr struct,
    return error code instead of just setting va_flags=VA_NOTAVAIL.

    PROBLEM SUMMARY:
    An NFS server node panicked while NFS-exporting a GPFS file
    system.

    PROBLEM CONCLUSION:
    VNOP_LOOKUP with vattr was returning 0, and NFS assumed
    that the vattr structure had been filled in without
    checking va_flags. If lookup succeeds but getattr fails
    to fill in the vattr struct, an error code is now
    returned instead of just setting
    va_flags=VA_NOTAVAIL.

    ------

    APAR: IY23614 COMPID: 5765B9501 REL: 330
    ABSTRACT: INCORRECT ASSERTION CHECK ON SPARSE FILE

    PROBLEM DESCRIPTION:
    incorrect assertion check on sparse file

    PROBLEM SUMMARY:
    Correct an incorrect debug assertion discovered in
    development.

    PROBLEM CONCLUSION:
    Correct an incorrect debug assertion discovered in
    development.

    ------

    APAR: IY23706 COMPID: 5765E8500 REL: 200
    ABSTRACT: X25NPI CREATES A BAD MESSAGE SITUATION.

    PROBLEM DESCRIPTION:
    Panic in STREAMS freeb().
    > t -mk
    Skipping first MST
    MST STACK TRACE: 0x004a5eb0
    (excpt=00000000:00000000:00000000:00000000:00000000) (intpri=0)
            IAR: .panic_trap+0 (00012678): tweq r1,r1
            LR: . pse:freeb +38 (014c84c0) 34716a44:
            . pse:freemsg +20 (014bae18) 34716a84:
            . pse:flushq +1ec (014bb054) 34716ae4:
            . pse:sth_rput +5b0 (014d0460) 34716b44:
            . pse:csq_run +23c (014bde38) 34716ba4:
            . pse:csq_lateral +a4 (014bceac) 34716c04:
            . pse:putnext +1b4 (014ba05c) 34716c54:
            . npi:npirput +c0 (01624550) 34716ca4:
            . pse:csq_run +23c (014bde38) 34716d04:
            . pse:csq_lateral +a4 (014bceac) 34716d64:
            . pse:putnext +1b4 (014ba05c) 34716db4:
            . ldterm:ldtty_rput +248 (01555ec0) 34716e14:
            . pse:csq_run +23c (014bde38) 34716e74:
            . pse:csq_turnover +24c (014bd2e4) 34716ee4:
            . pse:csq_lateral +e0 (014bcee8) 34716f44:
            . pse:runq_run +c8 (014cd788) 34716fa4:
            . pse:flip_and_run +38 (014cd8e4) 34716ff4:
            .low+0 (00000000) 004a5d50:
            . pse:flip_and_run +18 (014cd8c4) 004a5d90:
            .i_offlevel+84 (0001c768) 004a5de0:
            .i_softmod+338 (0001c440) 004a5e70:
            flih_603_patch+cc (00028a0c)
    0x2ff3b400 (excpt=00000000:00000000:00000000:00000000:00000000)
    (intpri=11)
            IAR: .waitproc+74 (000258ec): beq cr1,0x25900
            LR: .waitproc+a0 (00025918) 2ff3b388:
            .procentry+14 (00097630) 2ff3b3c8:
            .low+0 (00000000)

    PROBLEM CONCLUSION:
    In npimod.c, change npi_prov_event to return "not successful"
    when there is nothing to process, and change npirsrv to push
    the message onto the queue rather than using the requeue command.

    ------

    APAR: IY23737 COMPID: 5765E7200 REL: 310
    ABSTRACT: ADDITIONAL FIXES TO DCE REGISTRY

    PROBLEM DESCRIPTION:
    Additional functionality needs to be added to this new feature.

    PROBLEM CONCLUSION:
    Added additional functionality to this feature.

    ------

    APAR: IY23760 COMPID: 5765D5100 REL: 330
    ABSTRACT: VSD ASSERTED IN SNDHDRCMPL DUE TO A KLAPI RETURN CODE INDICATIN

    PROBLEM DESCRIPTION:
    VSD asserted in SndHdrCmpl because KLAPI returned a code
    saying a request could not be sent when in fact the request
    had been sent, the I/O had completed and the response was back,
    all before KLAPI claimed it could not send the request.

    PROBLEM SUMMARY:
    In a dual adapter configuration, VSD may assert
    in the SndHdrCmpl routine when running
    the KLAPI protocol due to timing issues.

    PROBLEM CONCLUSION:
    KLAPI was returning an error code claiming
    it could not send a request, when in
    fact the request had been sent, the I/O had completed
    and the response had already been received. Under these
    conditions VSD will set the return code to 0 and
    continue processing the request normally.

    ------

    APAR: IY23784 COMPID: 5765B9501 REL: 320
    ABSTRACT: DEADLOCK DURING ABORT PREVENTS INTERNALDUMPS

    PROBLEM DESCRIPTION:
    deadlock during abort prevents internaldumps

    PROBLEM SUMMARY:
    Potential deadlock in termination of GPFS with heavy mmap
    activity.

    PROBLEM CONCLUSION:
    A deadlock occurs in mmap handling during shutdown if one of
    the kprocs is waiting for a mailbox. mmap purge processing
    during preclean should not wait for all the kprocs to
    finish.

    ------

    APAR: IY23785 COMPID: 5765B9501 REL: 320
    ABSTRACT: AFTER TURNING OFF MMAP, KERNEXT NEEDED RELOAD

    PROBLEM DESCRIPTION:
    After turning off mmap, kernext needed a reload.

    PROBLEM SUMMARY:
    Allow certain mmap debug cases without a reboot.

    PROBLEM CONCLUSION:
    Turn off mmapSupported when the daemon ends, so that it
    will be off if the daemon comes back up with mmap disabled.

    ------

    APAR: IY23786 COMPID: 5765B9501 REL: 330
    ABSTRACT: DEADLOCK DURING ABORT PREVENTS INTERNALDUMPS

    PROBLEM DESCRIPTION:
    deadlock during abort prevents internaldumps

    PROBLEM SUMMARY:
    Potential deadlock in termination of GPFS with heavy mmap
    activity.

    PROBLEM CONCLUSION:
    A deadlock occurs in mmap handling during shutdown if one of
    the kprocs is waiting for a mailbox. mmap purge processing
    during preclean should not wait for all the kprocs to
    finish.

    ------

    APAR: IY23788 COMPID: 5765B9501 REL: 330
    ABSTRACT: AFTER TURNING OFF MMAP, KERNEXT NEEDED RELOAD

    PROBLEM DESCRIPTION:
    After turning off mmap, kernext needed a reload.

    PROBLEM SUMMARY:
    Allow certain mmap debug cases without a reboot.

    PROBLEM CONCLUSION:
    Turn off mmapSupported when the daemon ends, so that it
    will be off if the daemon comes back up with mmap disabled.

    ------

    APAR: IY23792 COMPID: 5765E5400 REL: 440
    ABSTRACT: RECONFIG_RESOURCES DOES NOT ACTIVATE SWBOOT AFTER DARE

    PROBLEM DESCRIPTION:
    Attempts to add an HPS network to a running SP cluster fail,
    because the necessary boot adapter aliases are not created
    during the DARE operation or the reconfig_topology event.

    PROBLEM CONCLUSION:
    Add code to the reconfig_topology_complete event script that
    will initialize the switch and add any required alias labels.

    ------

    APAR: IY23793 COMPID: 5765E5400 REL: 440
    ABSTRACT: HAGEO RESTART CLUSTER TOO QUICK LEAVES RSCT CONFUSED HAES

    PROBLEM DESCRIPTION:
    In a 4-node HAES cluster, cluster services on one node are
    stopped (graceful with takeover). The takeover finishes
    correctly. Cluster services are restarted on the stopped node,
    but none of the cluster nodes respond to the node's request to
    join. On the joining node and on its neighbor, the following
    error may be recorded in the error log:
    LABEL: GS_DOM_NOT_FORM_WA
    IDENTIFIER: AA8DB7B3
    Type: INFO
    Resource Name: grpsvcs
    Description: Group Services daemon has not been established.
    If left to run, this error will be logged once every 2 hours.
    No errors are written to hacmp.out, and nothing is written to
    clstrmgr.debug on any of the nodes.

    PROBLEM CONCLUSION:
    If cluster services are restarted too quickly after stopping,
    the RSCT daemons may not properly recognize that the node has
    gone down and come back. This problem is aggravated by using
    HAGEO, where the default timeout values for the GEO networks are
    greater than 60 seconds.
    The solution is to update the check in rc.cluster (which
    prevents one from restarting cluster services too quickly) to
    account for the network tunables.

    ------

    APAR: IY23794 COMPID: 5765E5400 REL: 440
    ABSTRACT: FIX VARIABLE EXPANSION IN NAMESERVER.VERIFY HAES

    PROBLEM DESCRIPTION:
    The DNS plugin does not start correctly, causing cluster
    verification and cluster synchronization to fail. The
    following errors are logged in /tmp/hacmp.out:
    * "file db.* does not exist or is zero length", even though
      the file is present
    * the group id is incorrect, even though it is correct

    PROBLEM CONCLUSION:
    Change nameserver.verify.sh to use the correct value when
    comparing group id.
    Change nameserver.verify.sh to change to the correct
    directory before checking for existence of files.
    Change the comparison in nameserver.stop.sh and
    nameserver.cleanup.sh from (($? = 0)) to if (($? == 0)).

    ------

    APAR: IY23795 COMPID: 5765E5400 REL: 440
    ABSTRACT: CHANGE / SHOW TOPOLOGY AND GROUP SERVICES CONFIGURATION HAES

    PROBLEM DESCRIPTION:
    The following options can still be found on the
    Change / Show Topology and Group Services Configuration panel
    * Interval between Heartbeats (seconds) 1
    * Fibrillate Count 4
    These two entries are no longer used by Topology Services and
    should be removed from SMIT.

    PROBLEM CONCLUSION:
    Remove the unused smit panel options.

    ------

    APAR: IY23796 COMPID: 5765E5400 REL: 440
    ABSTRACT: EXTRANEOUS HPS NETWORK EVENTS AFTER MIGRATION HAES

    PROBLEM DESCRIPTION:
    During migration installation from a previous level of HAES,
    the customer will see incorrect network events for the SP
    switch: the network is up and functional, but HACMP generates
    network down events.

    PROBLEM CONCLUSION:
    Eliminate an out-of-date test in one of the cluster utilities.

    ------

    APAR: IY23797 COMPID: 5765E5400 REL: 440
    ABSTRACT: ARP USAGE INFO IN HACMP.OUT HACMP-HAES

    PROBLEM DESCRIPTION:
    Usage information for the ARP command is appearing in hacmp.out
    when arp commands are run by the swap_adapter event. This is
    caused by a change in the output of the arp command between
    AIX 4 and AIX 5.

    PROBLEM CONCLUSION:
    Add additional parsing of the arp output to filter out the
    additional information in the AIX 5 output.

    ------

    APAR: IY23799 COMPID: 5765E5400 REL: 440
    ABSTRACT: SYNCHRONIZING CLUSTER TOPOLOGY SHOULD NOT MAKE THE HANDLE

    PROBLEM DESCRIPTION:
    When upgrading from HAS 4.2.2 to HAS 4.3.1, each node has a
    unique handle, but if you synchronize topology they all become
    equal.

    PROBLEM SUMMARY:
    Migration to HAES will not complete and topology services will
    loop with the following error message:
    errorCb: Sredrive already scheduled, return

    PROBLEM CONCLUSION:
    Make the HACMPcluster ODM handle value unique when cluster
    topology is synchronized.

    ------

    APAR: IY23800 COMPID: 5765E5400 REL: 440
    ABSTRACT: CLHARVEST_VG: PROBLEM ACCESSING THE CONFIGURATION FILE HAES

    PROBLEM DESCRIPTION:
    clharvest_vg: Problem accessing the configuration file.

    PROBLEM CONCLUSION:
    Fix the package name so that the /usr/sbin/cluster/etc/config
    directory is added.

    ------

    APAR: IY23802 COMPID: 5765E5400 REL: 440
    ABSTRACT: CLCHPARAM: NODE NAME VERBOSE_LOGGING=HIGH DOES NOT EXIST

    PROBLEM DESCRIPTION:
    clchparam: Node Name VERBOSE_LOGGING=high does not exist.

    PROBLEM CONCLUSION:
    The object = "DEBUG_LEVEL" does not exist in the HAS HACMPnode
    ODM, so a ":" should not be returned.

    ------

    APAR: IY23805 COMPID: 5765E8200 REL: 230
    ABSTRACT: UCFGGMD CAUSES SYSTEM CRASH, DATA STORAGE INTERRUPT

    PROBLEM DESCRIPTION:
    During HAGEO graceful down with takeover of one node in a 4
    node cluster with active remote GMD I/O, ucfggmd is run (from
    geo_stop_gmds) which causes the system to crash with a data
    storage interrupt.

    PROBLEM CONCLUSION:
    A programming error in the krpc kernel extension was fixed.

    ------

    APAR: IY23806 COMPID: 5765E8200 REL: 230
    ABSTRACT: HAGEO UTILITIES OPTION ON GEORM MAIN PANEL GEORM

    PROBLEM DESCRIPTION:
    HAGEO messages still appear in the geoRM smit panels.

    PROBLEM CONCLUSION:
    Replace the occurrences of HAGEO with geoRM.

    ------

    APAR: IY23807 COMPID: 5765E8200 REL: 230
    ABSTRACT: GEORM: SMIT ADD NODE SCREEN SITE NAME IN WRONG PLACE GEORM

    PROBLEM DESCRIPTION:
    When adding a node via the SMIT Add Node screen
    (new_node.diaglog path) two fields are displayed in the
    incorrect order in relation to the values at the right:
    Site Name and Promote Failure Timeout. The command is
    built and executes properly.

    PROBLEM CONCLUSION:
    The entries in the catalog need to be reversed.

    ------

    APAR: IY23808 COMPID: 5765E7200 REL: 310
    ABSTRACT: FILES WITH .BMP CANNOT BE OPENED FROM CLIENTS

    PROBLEM DESCRIPTION:
    Editing a file with a .bmp extension from a client fails.

    PROBLEM CONCLUSION:
    Enable sharing for .bmp files.

    ------

    APAR: IY23816 COMPID: 5765B8100 REL: 220
    ABSTRACT: 12 SECOND SILENCE ON LINE CAUSES HANGUP ON FRENCH ISDN SYSTEMS

    PROBLEM DESCRIPTION:
    After the caller speaks a short utterance for recognition, there
    are 12 seconds of silence, then the platform hangs up. This is
    on a French ISDN system.

    PROBLEM SUMMARY:
    After the caller's utterance, there are 12 seconds of silence,
    then the platform hangs up. This is the case on French ISDN systems.

    PROBLEM CONCLUSION:
    Correct handling when dealing with languages other than En_US.

    ------

    APAR: IY23833 COMPID: 5765E8200 REL: 230
    ABSTRACT: GEO_REMOTE_PEER_DOWN 258 : /GEO_SHOW_CONFIG: NOT FOUND HAGEO

    PROBLEM DESCRIPTION:
    Errors occur during cluster event processing, and the
    /tmp/hacmp.out file contains:
    Geo_remote_peer_down 258 : /geo_show_config: not found

    PROBLEM CONCLUSION:
    Modified the Geo_remote_peer_down script to correct a
    programming error.

    ------

    APAR: IY23834 COMPID: 5765E8200 REL: 230
    ABSTRACT: CHANGE START_SERVER TO RUN CFGGMD IN PARALLEL HAGEO

    PROBLEM DESCRIPTION:
    The Geo_start_server scripts, together with other scripts
    used by HACMP to start GeoMirror devices, configure the
    GMDs serially. Although the startgmd command was modified
    to work in parallel, the scripts do not take advantage of
    this to run cfggmd in parallel.

    PROBLEM CONCLUSION:
    Replace the call to cfggmd in the scripts with a section that
    builds a command line, then call the cfggmd command with this
    command line to start the GMDs in parallel.

    ------

    APAR: IY23837 COMPID: 5765E8200 REL: 230
    ABSTRACT: GEO_VERIFY REQUIRES IP LABEL SAME AS HOSTNAME/NODENAME

    PROBLEM DESCRIPTION:
    If the user selects node names that are not the
    same as the hostnames of the machines, geo_verify
    fails because it cannot get the size of the remote devices.

    PROBLEM CONCLUSION:
    Change rpc.geod so that it uses an interface address to
    reach the remote host rather than the hostname itself.

    ------

    APAR: IY23858 COMPID: 5765E8500 REL: 200
    ABSTRACT: INVALID I_CLEAR() IN TWDINIT

    PROBLEM DESCRIPTION:
    PMR 23588,235,631
    MST STACK TRACE: 0x2ff3b400
    (excpt=00000000:42000000:60014468:20005030:00000106)
    (intpri=11)
            IAR: .i_clear+c4 (0001d35c): tweqi r3,0x0
            LR: .i_clear+48 (0001d2e0) 2ff3b070:
            . twd:twdinit +510 (0163ecc0) 2ff3b2d0:
            .config_dd+2c0 (001bb37c) 2ff3b370:
            .sysconfig+17c (001bb95c) 2ff3b3c0:
            .sys_call_ret+0 (00003a90) 00000001:
            .low+0 (00000000)

    PROBLEM SUMMARY:
    Assert in i_clear().

    PROBLEM CONCLUSION:
    Remove the call i_clear(&brd_ictx[dds.brd_num].sintr); from
    twdinit.

    ------

    APAR: IY23859 COMPID: 5765E7200 REL: 310
    ABSTRACT: CHANGE GROUP TYPE OF DOMAIN NAME.

    PROBLEM DESCRIPTION:
    Network logon support cannot be started on the Fast Connect
    server because a PC claims to have the domain name.

    PROBLEM CONCLUSION:
    Register the domain name as a group name type instead of a
    unique name type.

    ------

    APAR: IY23869 COMPID: 5765E7200 REL: 310
    ABSTRACT: RAS: OUTPUT CIFS-BUILD INFO TO CIFSLOG

    PROBLEM DESCRIPTION:
    CIFS customers needing service may have to perform extra
    data collection to determine the exact CIFS version.

    PROBLEM CONCLUSION:
    Change the cifsServer Makefile to timestamp every CIFS build,
    and output that build time to cifsLog when CIFS starts.

    ------

    APAR: IY23875 COMPID: 5765E5100 REL: 600
    ABSTRACT: LIMITED-RESOURCE SESSION INDICATOR IS SET IN NLP.

    PROBLEM DESCRIPTION:
    Limited-resource session indicator (BIND offset 25) is set
    in NLP unexpectedly.

    PROBLEM SUMMARY:
    HPR sessions unexpectedly set as limited-resource.

    PROBLEM CONCLUSION:
    Code changed so that the HPR layer no longer always changes
    the BIND to limited resource.

    ------

    APAR: IY23927 COMPID: 5765E8500 REL: 200
    ABSTRACT: MCA ARTIC960 PORTS GET INTO THE DEFINE STATE AFTER REBOOT

    PROBLEM DESCRIPTION:
    Cannot create ports for ARTIC960 MCA adapter after upgrading
    to sx25.rte 1.1.5.16.

    PROBLEM CONCLUSION:
    This recovery procedure was intended for the ARTIC960Hx. It was
    also implemented in the MCA adapter as a possible recovery. The
    procedure will be taken out of the MCA microcode.

    ------

    APAR: IY23961 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSERT SUBROUTINE FAILED: WASASSIGNED BUFMGR.C, LINE 5285

    PROBLEM DESCRIPTION:
    unassignBuffer has to wait if the buffer is being unpinned by
    some other thread.

    PROBLEM SUMMARY:
    GPFS self-check logic asserted at bufmgr.C line 5285.

    PROBLEM CONCLUSION:
    unassignBuffer has to wait if the buffer is being unpinned
    by some other thread.

    ------

    APAR: IY23966 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSERT: LOP->GET_OBJ_STATUS() == LKOBJ::VALID

    PROBLEM DESCRIPTION:
    Assert: loP->get_obj_status() == LkObj::valid
    A failure to allocate a new data block in modifyBuffer, e.g.,
    due to quota limit, was leaving the BufferDesc in a half-valid
    state, causing an assert when attempting to read the buffer later.

    PROBLEM SUMMARY:
    GPFS self-check logic asserted at: bufmgr.C, line 4081

    PROBLEM CONCLUSION:
    When looping through buffers to free off the clock list,
    reset the err variable each time around the loop, and ignore
    recently unpinned buffers.
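    The loop described above can be sketched in C as follows. This
    is a hypothetical illustration, not the GPFS bufmgr.C code:
    buf_sketch, try_free and free_clock_list are invented names.
    Declaring err inside the loop body resets it on every iteration,
    so an error from one buffer cannot carry over to the next, and
    recently unpinned buffers are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer descriptor; not the GPFS BufferDesc layout. */
struct buf_sketch {
    bool recently_unpinned;
    bool free_fails;   /* simulate an error while freeing */
    bool freed;
};

static int try_free(struct buf_sketch *b)
{
    if (b->free_fails)
        return -1;
    b->freed = true;
    return 0;
}

/* Walk the clock list, freeing what we can; returns how many freed. */
static size_t free_clock_list(struct buf_sketch *bufs, size_t n)
{
    size_t freed = 0;
    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int err = 0;                 /* reset each time around the loop */
        if (bufs[i].recently_unpinned)
            continue;                /* ignore recently unpinned buffers */
        err = try_free(&bufs[i]);
        if (err == 0)
            freed++;
    }
    return freed;
}
```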

    ------

    APAR: IY23968 COMPID: 5765B9501 REL: 330
    ABSTRACT: ASSERT Q->MEMSTATE == BUFFER::MEMPINNED BUFMGR.C,

    PROBLEM DESCRIPTION:
    Assert q->memState == Buffer::memPinned bufmgr.C, line
    When looping through buffers to free off the clock list,
    reset the err variable each time around the loop and ignore

    PROBLEM SUMMARY:
    GPFS self-check logic declared an error when
    trying to reopen a file where the last write had failed due
    to the user quota being exceeded.

    PROBLEM CONCLUSION:
    A failure to allocate a new data block
    in modifybuffer, e.g. due to quota limit, was leaving the
    bufferdesc in a half-valid state, causing assert when
    attempting to read the buffer later.

    ------

    APAR: IY23981 COMPID: 5765B9501 REL: 320
    ABSTRACT: ASSERT Q->MEMSTATE == BUFFER::MEMPINNED BUFMGR.C,

    PROBLEM DESCRIPTION:
    Assert q->memState == Buffer::memPinned bufmgr.C, line
    When looping through buffers to free off the clock list,
    reset the err variable each time around the loop and ignore

    PROBLEM SUMMARY:
    GPFS self-check logic declared an error when
    trying to reopen a file where the last write had failed due
    to the user quota being exceeded.

    PROBLEM CONCLUSION:
    A failure to allocate a new data block
    in modifybuffer, e.g. due to quota limit, was leaving the
    bufferdesc in a half-valid state, causing assert when
    attempting to read the buffer later.

    ------

    APAR: IY24250 COMPID: 5765B8100 REL: 220
    ABSTRACT: SYSTEM CRASH DURING DISABLE_CHANNEL

    PROBLEM DESCRIPTION:
    Disable_Channel causes a system crash from .kwakeup at
    .simple_lock+18.

    PROBLEM CONCLUSION:
    When a channel fd is closed, scan through each channel
    to find any channels referencing this fd and clean
    them up.
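    A minimal C sketch of the close-time scan, assuming a simple
    array-based channel table. channel_sketch, NO_FD and
    close_channel_fd are hypothetical names; the APAR does not
    describe the real data structures. Every channel is checked,
    and each one still referencing the closing fd is detached.

```c
#include <assert.h>
#include <stddef.h>

#define NO_FD (-1)

/* Hypothetical channel table entry; field names are illustrative. */
struct channel_sketch {
    int fd;        /* fd this channel references, or NO_FD */
    int active;
};

/* Scan every channel and clean up each one that references fd.
 * Returns the number of channels cleaned up. */
static size_t close_channel_fd(struct channel_sketch *chans, size_t n,
                               int fd)
{
    size_t cleaned = 0;
    assert(chans != NULL || n == 0);
    if (fd == NO_FD)
        return 0;                    /* nothing can reference "no fd" */
    for (size_t i = 0; i < n; i++) {
        if (chans[i].fd == fd) {
            chans[i].fd = NO_FD;     /* drop the stale reference */
            chans[i].active = 0;
            cleaned++;
        }
    }
    return cleaned;
}
```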

    ------

    APAR: IY24428 COMPID: 5765E8200 REL: 230
    ABSTRACT: SYSTEM CRASH DURING GMD MKDEV

    PROBLEM DESCRIPTION:
    When a geomirror device is made Available by mkdev, the system
    will frequently crash (888 102 700).

    PROBLEM CONCLUSION:
    The GMD device driver was not functioning correctly on
    AIX 5.1 while configuring a GMD device that uses mwc mode.
    The problem has been corrected.

    ------

    APAR: IY24564 COMPID: 5765D5100 REL: 330
    ABSTRACT: REQUIRED UPGRADES FOR R3.3.0

    PROBLEM DESCRIPTION:
    required upgrades for R3.3.0

    ------

    APAR: IY24671 COMPID: 5765C3403 REL: 430
    ABSTRACT: MEDIA ERRORS NOT PROPERLY REPORTED ON IDE CD-ROM

    PROBLEM DESCRIPTION:
    When the CD-ROM drive has a problem reading the media
    (due to scratches, incompatible data format, etc.), the error
    is not properly reported to the application. As a
    result, the user may unknowingly treat the bad data as good
    data from the CD.

    PROBLEM CONCLUSION:
    The problem is caused by the hardware DMA engine incorrectly
    reporting the status. To fix this, the device driver will
    look at both the DMA status and the device's (CD-ROM
    drive's) status, and report any error encountered.

    ------

    APAR: IY24926 COMPID: 5765D5100 REL: 320
    ABSTRACT: LATEST PSSP 3.2.0 FIXES AS OF OCTOBER 2001

    PROBLEM DESCRIPTION:
    This is the latest PSSP PTF as of October 2001.
    Order this APAR to get all of the PTFs as of October 2001.

    PROBLEM SUMMARY:
    This is a packaging apar for PSSP 3.2.0 fixes
    as of October 2001.

    PROBLEM CONCLUSION:
    This is a packaging apar for PSSP 3.2.0
    fixes as of October 2001.

    ------