From: AIX Service Mail Server (aixservaustin.ibm.com)
Date: Tue May 08 2001 - 02:18:10 CDT

    APAR: IY10868 COMPID: 576554801 REL: 110
    ABSTRACT: MSGIGYDS0220-S UNABLE TO LOCATE THE DB2 PRODUCT

    PROBLEM DESCRIPTION:
    MSGIGYDS0220-S Unable to locate the DB2 product.
    The cob2 command forces the LIBPATH to its own location
    thus overriding whatever the user specified.
    CMVC defect number is 20304

    LOCAL FIX:
    $ ln -s /usr/lpp/db2_05_00/lib/libdb2.a .

    PROBLEM SUMMARY:
    When multiple releases of DB2 are installed,
    the DB2 shared library libdb2.a cannot be found in /usr/lib.
    COB2 was coded to set LIBPATH to point to /usr/lib. There was
    no way to cause COB2 to include other paths, so the path to the
    libdb2.a to be used to compile a program that includes EXEC SQL
    statements could not be specified to the compiler.

    PROBLEM CONCLUSION:
    COB2 will be modified to append the paths
    needed by the compiler to the existing value of LIBPATH rather
    than replacing the value of LIBPATH.
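
    The difference between replacing and appending can be sketched as
    follows. This is only an illustration of the idea in the conclusion
    above, not the cob2 implementation; the function name and the sample
    paths are assumptions made for the example.

```python
def append_to_libpath(existing, needed):
    """Keep the caller's LIBPATH entries and add any missing compiler paths."""
    # Split the current value and drop empty fields (e.g. from an unset LIBPATH).
    parts = [p for p in existing.split(":") if p]
    for path in needed:
        # Append rather than replace, so user-specified directories stay first.
        if path not in parts:
            parts.append(path)
    return ":".join(parts)


# Hypothetical values: the user selects one DB2 release, the compiler needs /usr/lib.
print(append_to_libpath("/usr/lpp/db2_05_00/lib", ["/usr/lib"]))
```

    Because the user's directories stay ahead of the compiler's, the
    libdb2.a the user chose is the one the loader would find first.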

    ------

    APAR: IY12785 COMPID: 569692600 REL: 110
    ABSTRACT: X.25/ARTIC960HX PERFORMANCE PROBLEM WITH BOS.MP/UP 4.3.3.16

    PROBLEM DESCRIPTION:
    Only on the ARTIC960HX PCI Adapter,
    'mkdev -l sx25a0' intermittently returns the error "Method
    error (/usr/lib/methods/cfgsx25):", "0514-048 Error downloading
    microcode or software.", "cfgsx25: tw_get(PH_OK_ACK) failed" or
    "cfgsx25: tw_get(DL_OK_ACK) failed".
    After the port is made and gets connected, X.25 data transfer
    runs very slowly over the ARTIC960HX PCI Adapter.

    LOCAL FIX:
    Use a modified bos.mp/up module

    PROBLEM CONCLUSION:
    There are two drivers, ddriciop and twd, which work with the
    ARTIC960Hx (ricio) card. Each installed its own interrupt
    handler for the card. twd needs to be first in the chain.
    Due to a change made in interrupt handling in bos.(mp/up),
    we needed to have only one interrupt handler (from ddriciop)
    installed. Now twd's interrupt handler is registered with
    ddriciop.

    ------

    APAR: IY15383 COMPID: 5765D5100 REL: 311
    ABSTRACT: SWITCH WENT TO DOWN

    PROBLEM DESCRIPTION:
    Switch went to down.
    2510-898 unable to access SDR to get the list of auto-join nodes
      rc= -1.
    2510-195 The fault service daemon got a SIGTERM signal.

    PROBLEM SUMMARY:
    Closed the window so the child process will exit on SIGTERM
    without resetting the adapter.

    PROBLEM CONCLUSION:
    There is a small window, while the primary forks a child
    process and an SDR test is run, in which a SIGTERM sent to the
    child results in the child calling the standard SIGTERM handler
    and resetting the adapter.
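
    The APAR does not say how the window was closed. One common way to
    close this kind of fork/signal race is sketched below: block SIGTERM
    across the fork, and have the child restore the default action
    before unblocking. The function name spawn_child is hypothetical,
    and the sketch assumes a Unix system.

```python
import os
import signal
import time


def spawn_child(child_main):
    """Fork a child that dies on SIGTERM instead of running the inherited handler."""
    # Block SIGTERM so it cannot be delivered between fork() and the reset below.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    pid = os.fork()
    if pid == 0:
        # Child: discard the inherited handler first, then let SIGTERM through.
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGTERM})
        try:
            child_main()
        finally:
            os._exit(0)
    # Parent: resume normal SIGTERM delivery.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGTERM})
    return pid


# Demo: the child sleeps, the parent terminates it and inspects the wait status.
pid = spawn_child(lambda: time.sleep(30))
os.kill(pid, signal.SIGTERM)
_, status = os.waitpid(pid, 0)
killed_by_sigterm = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGTERM
print(killed_by_sigterm)
```

    A SIGTERM that arrives while the signal is blocked stays pending and
    is delivered only after the child has installed the default action,
    so the parent's handler never runs in the child.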

    ------

    APAR: IY16163 COMPID: 5765E5400 REL: 440
    ABSTRACT: CLUSTER.ES.SERVER.RTE 4.4.0.5 FAILS IF LINK EXISTS

    PROBLEM DESCRIPTION:
    Install of cluster.es.server.rte 4.4.0.5 fails with error:
    ln: 0653-421 /etc/objrepos/HACMPdisktype exists.

    PROBLEM CONCLUSION:
    Check if /etc/objrepos/HACMPdisktype exists before creating
    link.
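
    A minimal sketch of the "check before linking" logic, assuming the
    install script only needs the link to be present. The helper name
    and the target file name are hypothetical; the real post-install
    script is not shown in the APAR.

```python
import os
import tempfile


def ensure_symlink(target, link_path):
    """Create link_path -> target unless something already exists at link_path."""
    # lexists() is also true for a dangling symlink, which would still make ln fail.
    if os.path.lexists(link_path):
        return False
    os.symlink(target, link_path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "HACMPdisktype.real")  # hypothetical target
    link = os.path.join(tmp, "HACMPdisktype")
    open(target, "w").close()
    first = ensure_symlink(target, link)   # creates the link
    second = ensure_symlink(target, link)  # link exists, nothing to do
    print(first, second)
```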

    ------

    APAR: IY16416 COMPID: 5765D5100 REL: 311
    ABSTRACT: PSSP_SCRIPT NEEDS TO INSTALL DEVICES.CHRP.BASE.RTE IS

    PROBLEM DESCRIPTION:
    The PSSP 3.x pssp_script fails to install (migrate to)
    PSSP 2.4 on MCA nodes because devices.chrp.base.rte
    gets installed only on PCI nodes. But this fileset is
    a prereq of PSSP 2.4, so it is needed on MCA nodes too.
    The part of pssp_script to install devices.chrp.base.rte
    currently is (I have to wrap lines to fit into SSF)
    # Defect 46958: remove -c (commit) flag for AIX filesets
    oslvl=$($oslevel)
    if $oslvl = $os415 && $oslvl = $os414 ; then
    if -z $($lslpp -qh devices.chrp.base.rte 2>/dev/null)
       && $platform = "chrp" ; then #-
       $installp -abgXd/mnt devices.chrp.base.rte
    ==> only on oslevel >=4.2.0.0 and on platform==chrp
        devices.chrp.base.rte will be installed
    This needs to be changed to
    if $oslvl = $os415 && $oslvl = $os414 ; then
    if -z $($lslpp -qh devices.chrp.base.rte 2>/dev/null)
       && $platform = "chrp" || "$code_version" = "PSSP-2.4"
       then #-
       $installp -abgXd/mnt devices.chrp.base.rte
    ==> devices.chrp.base.rte will be installed if
        oslevel>=4.2.0.0 and (platform==chrp or
        code_version==PSSP-2.4)

    PROBLEM SUMMARY:
    pssp_script only installs the fileset devices.chrp.base.rte
    on chrp nodes. However, the ssp.basic fileset in PSSP 2.4
    requires devices.chrp.base.rte, regardless of the type of
    node. pssp_script needs to be modified to install the
    fileset devices.chrp.base.rte when either the node
    platform is chrp, or the PSSP level of the node is 2.4.

    PROBLEM CONCLUSION:
    pssp_script has been modified to install the
    fileset devices.chrp.base.rte when either the node
    platform is chrp, or the PSSP level of the node is 2.4.
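
    The corrected rule, as stated in the "==>" annotation of the problem
    description, can be written as a small predicate. This is a sketch
    of the decision only, not of pssp_script itself; the function name,
    the tuple form of the OS level and the "rs6k" platform string are
    assumptions made for the example.

```python
def should_install_chrp_base(os_level, platform, code_version):
    """Return True when devices.chrp.base.rte should be installed."""
    # The fileset is only considered on AIX 4.2.0.0 or later.
    if os_level < (4, 2, 0, 0):
        return False
    # Install on chrp nodes, and on any node being set to PSSP 2.4.
    return platform == "chrp" or code_version == "PSSP-2.4"


# An MCA node (platform string assumed) moving to PSSP 2.4 now qualifies.
print(should_install_chrp_base((4, 3, 3, 0), "rs6k", "PSSP-2.4"))
```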

    ------

    APAR: IY16816 COMPID: 5765E5400 REL: 440
    ABSTRACT: NIM_TOK CORE DUMP AND FAIL_STANDBY EVENT AFTER ADDING 64

    PROBLEM DESCRIPTION:
    When adding 64 aliases to the service adapter, the
    associated nim may core dump and generate a fail_standby
    event.

    PROBLEM CONCLUSION:
    Increase the buffer set aside for the nmAdapter list and
    skip logic that creates a socket for each alias.

    ------

    APAR: IY16985 COMPID: 5765D6100 REL: 210
    ABSTRACT: LLCANCEL POE INTERACTIVE RUNNING JOB W/EXTERNAL SCHEDULER

    PROBLEM DESCRIPTION:
    For interactive POE jobs running under an external scheduler,
    the llcancel command cannot stop the POE job.

    LOCAL FIX:
    For now, only Ctrl-C can kill an interactive POE job running
    under an external scheduler; llcancel cannot.

    PROBLEM SUMMARY:
    The LoadLeveler command llcancel could not cancel a
    running interactive POE job when an external scheduler
    was set.

    PROBLEM CONCLUSION:
    The LoadLeveler command llcancel can now cancel a
    running interactive POE job when an external scheduler
    is set.

    ------

    APAR: IY17133 COMPID: 5765D5100 REL: 311
    ABSTRACT: NODES NEED BOOTED TWICE TO UPDATE TUNING.CUST

    PROBLEM DESCRIPTION:
    Because boot procedure always runs tuning.cust locally, *then*
    checks for customize and ftp's tuning.cust from CWS, the node
    must be rebooted a second time for changes in tuning.cust to
    actually be applied to the node.

    LOCAL FIX:
    Reboot nodes a second time after tuning.cust has been ftp'd
    through customize reboot.

    PROBLEM SUMMARY:
    During a node's customization, tuning.cust is ftp'd from
    the node's Boot/Install Server. However, tuning.cust will
    not be executed until the next reboot of the node.
    pssp_script should be modified so that during a node's
    customization, tuning.cust will be executed.

    PROBLEM CONCLUSION:
    pssp_script has been modified so that during a node's
    customization, tuning.cust will be executed.

    ------

    APAR: IY17237 COMPID: 5765D5100 REL: 311
    ABSTRACT: SPADAPTR ERROR WITH '-S YES' AND LARGE NODE NUMBER

    PROBLEM DESCRIPTION:
    Using spadaptrs with the '-s yes' option and a large number of
    nodes can cause incorrect IP addresses to be calculated,
    resulting in error message 0022-047.

    PROBLEM SUMMARY:
    A customer was using spadaptrs to enter data for a large
    number of css adapters using the switch node numbers.
    When a node with a high node number, but a low switch
    number was encountered, the third octet of the IP address
    was calculated incorrectly.
    During the processing of all the nodes, the third octet
    of the IP address had been incremented because the fourth
    octet had exceeded 255. When the node with a high node
    number was being processed it used the incremented third
    octet number instead of the original value. Since this
    calculated IP address was not a valid IP address, an error
    message was issued stating that the IP address could not
    be resolved and spadaptrs terminated.

    PROBLEM CONCLUSION:
    spadaptrs was modified to correct the generation of IP
    addresses for css adapters, when the switch node numbers
    are used. Certain values were not being reset, which
    caused the third octet of the generated IP address to be
    incorrect, which could cause spadaptrs to fail.
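
    The failure mode described above (a carry into the third octet
    leaking from one node to the next) can be avoided by deriving each
    address from the original base address. The sketch below assumes,
    for illustration only, that an address is the base address plus the
    switch node number; spadaptrs' real algorithm is not given in the
    APAR, and the function name is hypothetical.

```python
def css_ip_for(base_ip, switch_node_number):
    """Derive an adapter address from the base address and a switch node number."""
    a, b, c, d = (int(x) for x in base_ip.split("."))
    # The carry is recomputed from the original octets for every node, so an
    # earlier overflow of the fourth octet cannot affect a later node.
    carry, fourth = divmod(d + switch_node_number, 256)
    third = c + carry
    if third > 255:
        raise ValueError("address range exhausted")
    return f"{a}.{b}.{third}.{fourth}"


# A high switch number overflows the fourth octet; a later low one is unaffected.
for n in (3, 10, 2):
    print(n, css_ip_for("10.0.1.250", n))
```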

    ------

    APAR: IY17321 COMPID: 5765D5101 REL: 111
    ABSTRACT: ORACLE CAN NOT CONNECT HAGSD WITH MANY CLIENTS

    PROBLEM DESCRIPTION:
    Oracle cannot connect to hagsd when there are many clients.

    PROBLEM SUMMARY:
    The Group Services daemon currently limits the maximum
    number of pending connections to 5. Therefore, if many
    connection requests (e.g., over 500) are made to Group
    Services within a short time, some of the requests may be
    refused with "ECONNREFUSED".
    Extending the maximum pending queue length (the backlog
    size) to the maximum configured queue length (i.e., the
    value shown by no -o somaxconn) resolves the unexpected
    connection refusals.

    PROBLEM CONCLUSION:
    This fix enables the Group Services subsystem
    to handle many concurrent connection requests
    even if the requests are made within a very short
    period.
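
    In socket terms, the change amounts to passing a larger backlog to
    listen(). The sketch below is a generic illustration and not the
    hagsd code; open_listener is a hypothetical helper. Note that
    socket.SOMAXCONN is a constant fixed when Python was built, whereas
    the limit that no -o somaxconn reports is a run-time kernel setting;
    the kernel applies its own cap to whatever value is requested.

```python
import socket


def open_listener(host="127.0.0.1", port=0, backlog=socket.SOMAXCONN):
    """Open a TCP listener whose pending-connection queue is not limited to 5."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    # A small fixed backlog such as listen(5) lets bursts of connects be refused.
    srv.listen(backlog)
    return srv


listener = open_listener()
bound_port = listener.getsockname()[1]
print("listening on port", bound_port, "with backlog", socket.SOMAXCONN)
listener.close()
```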

    ------

    APAR: IY17369 COMPID: 5765D5101 REL: 111
    ABSTRACT: HAGS BROADCAST METHOD CAN CAUSE STORMS ON INSTALLATIONS WITH

    PROBLEM DESCRIPTION:
    When HAGS needs a broadcast, it first tries to send the message
    out to everybody (burst broadcast). Rebroadcasting to the
    undelivered nodes is performed every 3 seconds. This may
    cause a big overhead if the number of nodes is large. This has
    been seen to cause HAGS voting issues.

    PROBLEM SUMMARY:
    Whenever Group Services needs a broadcast,
    it first tries to send the messages to
    all nodes and then rebroadcasts the
    messages every 3 seconds to the nodes
    that have not responded.
    This behavior may increase the load
    on the IP stack, particularly if the
    number of nodes is large, and thus
    it may increase the message drop rate
    and cause more retries, which may delay
    the completion of Group Services protocols.

    PROBLEM CONCLUSION:
    On a big system (with more than 64 nodes), this
    fix lessens the overhead of broadcasting
    Group Services messages by spreading out
    the message sends.
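
    "Spreading out the message sends" can be pictured as pacing the
    broadcast in batches instead of one burst. The APAR does not give
    the actual pacing scheme, so the batch size, the interval and the
    function name below are assumptions chosen only to show the idea.

```python
def schedule_sends(nodes, batch_size, interval):
    """Return (delay_seconds, node) pairs that send batch_size messages per interval."""
    # Node i goes out in batch i // batch_size; each batch waits one more interval.
    return [((i // batch_size) * interval, node) for i, node in enumerate(nodes)]


# Hypothetical pacing: 96 nodes, 32 sends per 0.1 second slot.
plan = schedule_sends(list(range(96)), batch_size=32, interval=0.1)
print(plan[0], plan[32], plan[95])
```

    A burst broadcast corresponds to batch_size equal to the number of
    nodes, which puts every send at delay 0.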

    ------

    APAR: IY17580 COMPID: 5765D5100 REL: 311
    ABSTRACT: INSUFFICIENT STACK FOR KICKPIPES() IN MPCI, CAUSES A PROBLEM

    PROBLEM DESCRIPTION:
    PSSP 3.1.1 introduced the local variables 'shoveq' and 'frq' in
    kickpipes() in MPCI. They need a stack frame of 8192 bytes, but
    only 4096 bytes are available. This causes Informix to go down.

    PROBLEM SUMMARY:
    Informix running on an SP system can fail if
    a query with 2000 ORs is run. Informix may detect
    corruption of the header of its stack block pool and quit.

    PROBLEM CONCLUSION:
    MPCI, which is used by Informix, added a
    couple of large stack variables for shared memory support.
    This causes a problem, for Informix, because Informix both uses
    MPCI and manages its own threading and stacks. Informix's
    current management does not account for the addition of 8K of
    additional stack space for the MPCI routines that Informix
    calls. MPCI changed the declaration of these new large
    variables so that they are now located in the heap, instead
    of the stack. The new MPCI implementation solves the problem
    that Informix had working with our MPCI environment and is
    probably the better way for MPCI to handle these large
    variables.

    ------

    APAR: IY17883 COMPID: 5765D5100 REL: 311
    ABSTRACT: VSD.RESERVE UNNECESSARILY OBTAINING DCE CREDENTIALS

    PROBLEM DESCRIPTION:
    vsd.reserve unnecessarily obtaining DCE credentials

    PROBLEM SUMMARY:
    The vsd.reserve executable, which is used to reserve a
    volume_group, is unnecessarily trying to obtain DCE
    credentials when it determines there is a mismatch in
    the timestamps for the last time the volume was varied on.
    The error can be seen in the console log:
    checkvg_timestamps 174 : /usr/lpp/ssp/bin/dsrvtgt: not found
    rvsd(recov) 03/30/01 11:46:42 vsd.reserve:
    checkvg_timestamps: Failed to obtain
    DCE credentials; rc=127. Continuing.

    PROBLEM CONCLUSION:
    The code to obtain the DCE credentials has been removed.

    ------

    APAR: IY17976 COMPID: 5765E5400 REL: 440
    ABSTRACT: AFTER NODEXNODE STOP OF NODE SHOWS HACMPRD NOT CLSOMETIMES THIS

    PROBLEM DESCRIPTION:
    Administration Guide SC23-4279-01, page 5-11 includes a Note
    that reads:
    Make sure that the concurrent access volume group is in a
    quiescent state (no I/O operations in progress) before
    executing these reconfiguration commands.
    This Note appears to imply that in order to reach a quiescent
    state all applications running in a cluster node should be
    stopped.

    PROBLEM SUMMARY:
    After a successful node-by-node migration, bringing a node
    down gracefully through smitty shows the string "hacmprd" and
    not "clstrmgr". "hacmprd" is the old recovery driver name.

    PROBLEM CONCLUSION:
    Change the name in the default message.

    ------

    APAR: IY18037 COMPID: 5765E5400 REL: 440
    ABSTRACT: TWO NODES HAVE SAME RESOURCE GROUP AFTER DARE WHILE ONE NODE WAS

    PROBLEM DESCRIPTION:
    The customer had a cascading mutual takeover configuration with
    inactive takeover set to false. One node was taken down with
    takeover and powered off for a maintenance problem. While that
    node was still down, the customer had a problem on the other
    node which required it to be rebooted. Thus when starting
    HACMP back up on that node, the resource group normally owned
    by the other node was not taken. The customer then ran a dare
    to move that resource group sticky (required) to the node that
    was up to get it active again. Though verification errors had
    to be ignored in order to get this to happen, and warnings were
    given to sync the config to the powered off node before bringing
    it into the cluster, we also stated that the node would not be
    allowed into the cluster until this sync was done. However,
    when that node was started back into the cluster, there was
    nothing that detected the out of sync condition, and so let
    the node join the cluster resulting in it taking the resources
    without the other node releasing them.

    PROBLEM CONCLUSION:
    Add a check in the rc.cluster script to compare the resource
    ODMs with the active nodes' resource ODMs.

    ------

    APAR: IY18046 COMPID: 5765E5400 REL: 440
    ABSTRACT: AFTER N-1 NODEXNODE W/REBOOT: LOCK MANAGER RETURNS CLM_NOLOCKMG

    PROBLEM DESCRIPTION:
    In a 4-node cluster at HAS 440 that is set for node-by-node
    migration, the first 3 nodes that are "migrated" return
    CLM_NOLOCKMGR when requesting a lock. However, after the last
    node is migrated node by node, the lock manager responds
    correctly on all nodes.

    PROBLEM CONCLUSION:
    Replace /usr/lib/libclm.a and /usr/lib/libclm_r.a with
    /usr/es/lib/libclm.a and /usr/es/lib/libclm_r.a

    ------

    APAR: IY18049 COMPID: 5765E5400 REL: 440
    ABSTRACT: HAES: FAILURE CYCLE = 32 FOR ATM: TOO LONG

    PROBLEM DESCRIPTION:
    The default failure cycle in the HACMPnim class is set to 32,
    resulting in a long wait for death detection.

    PROBLEM CONCLUSION:
    The default setting of the ATM failure cycle will be changed
    from 32 to 8.

    ------

    APAR: IY18059 COMPID: 5765E5400 REL: 440
    ABSTRACT: ON FALLOVER, STANDBY ADAPTER MARKED DOWN IS SOMETIMES SELECTED

    PROBLEM DESCRIPTION:
    A node has two standbys. A service address fails (unplugged).
    Swap_adapter completes successfully and the standby is marked
    down. Fallover occurs. It fails even though there is a
    second standby marked up, because the standby which is down
    is selected for the service address of the failed node.

    PROBLEM CONCLUSION:
    Modify clstrmgr so that it exports DOWN for standby adapters
    which are down instead of doing nothing, thus making it
    compatible with HAS.
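
    Once a DOWN state is exported for failed standbys, the selection
    step can skip them. The sketch below shows that selection in the
    abstract; the record layout, adapter names and function name are
    hypothetical and do not reflect the clstrmgr or event-script data
    structures.

```python
def pick_standby(adapters):
    """Return the name of the first standby adapter that is marked UP, or None."""
    for adapter in adapters:
        # A standby that was already swapped out and marked DOWN must be skipped.
        if adapter["role"] == "standby" and adapter["state"] == "UP":
            return adapter["name"]
    return None


# Hypothetical node: the first standby is down after swap_adapter, the second is up.
node_adapters = [
    {"name": "stby1", "role": "standby", "state": "DOWN"},
    {"name": "stby2", "role": "standby", "state": "UP"},
]
print(pick_standby(node_adapters))
```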

    ------

    APAR: IY18077 COMPID: 5765D5100 REL: 311
    ABSTRACT: VSDVGTS -A TOO SLOW WITH MANY HDISKS AND VPATHS

    PROBLEM DESCRIPTION:
    vsdvgts -a too slow with many hdisks and vpaths

    PROBLEM SUMMARY:
    Several scalability performance problems have been
    observed when a node has a large number of disks
    and/or volume groups being managed by RVSD.

    PROBLEM CONCLUSION:
    The following changes will be made in RVSD to address
    the performance problems observed when a node has
    a large number of disks and/or volume groups.
     - The vsdvgts command has been changed to not use
       the lspv command to determine volume group membership.
     - The RVSD recovery scripts will limit the number of
       varyonvg/varyoffvg that can occur in parallel in order
       to reduce ODM lock contention.
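
    Capping the number of operations that run in parallel is commonly
    done with a bounded worker pool. The sketch below illustrates that
    technique with placeholder tasks; it is not the RVSD recovery
    script, and run_limited and the limit of 2 are assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_limited(tasks, max_parallel):
    """Run callables with at most max_parallel active at once; return the peak seen."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def wrapped(task):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        try:
            task()
        finally:
            with lock:
                state["active"] -= 1

    # The pool size is the cap: extra tasks wait in the queue for a free worker.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        list(pool.map(wrapped, tasks))
    return state["peak"]


# Placeholder work standing in for eight varyonvg calls (hypothetical).
peak = run_limited([lambda: time.sleep(0.01)] * 8, max_parallel=2)
print("peak concurrency:", peak)
```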

    ------

    APAR: IY18128 COMPID: 5697F6400 REL: 640
    ABSTRACT: FIXES TO MINOR DEFECTS IN MESSAGECENTER 6.4

    PROBLEM DESCRIPTION:
    Fixes to minor defects in messagecenter 6.4

    ------

    APAR: IY18282 COMPID: 5765E5400 REL: 440
    ABSTRACT: NXN MIGRATION IS BROKEN FOR CLUSTERS WITH SERIAL NETWORKS

    PROBLEM DESCRIPTION:
    HACMP/ES and RSCT daemons are not started
    after installing HACMP/ES 4.4.0 + pmrs (ptf set 4)
    for node-by-node migration. The problem is that
    cllsif fails in clstart because the HACMPadapters
    ODM has been corrupted.

    PROBLEM CONCLUSION:
    Modify cluster.es.server.rte.post_u so that it correctly
    parses and changes the HACMPadapters file.

    ------

    APAR: IY18289 COMPID: 5765C3403 REL: 430
    ABSTRACT: PRINTING PROBLEM IN WIN95

    PROBLEM DESCRIPTION:
    Printing Problem in Win95

    PROBLEM SUMMARY:
    Printing Problem in Win95. When a file is printed to a network
    printer, only the first 18 characters of the filename appear in
    the Document name on

    PROBLEM CONCLUSION:
    Job number will be sent instead of filename, which will be
    helpful in differentiating different print jobs.

    ------

    APAR: IY18398 COMPID: 5765E2600 REL: 502
    ABSTRACT: OUTPUT OF COUT/CERR FROM DESTRUCTOR OF STATIC OBJECT DOES NOT AP

    PROBLEM DESCRIPTION:
    When using the C & C++ Compilers 3.6.6 compiler with the
    5.0.2.0 level of the C++ Runtime Library the output of
    cout and cerr statements in the destructors of static objects
    does not appear.
    For example in the following program:
    #include <stream.h>
    class bogus
    {
    public:
        bogus() { cout << "Initialize\n"; }
        ~bogus() { cout << "Clean up\n"; }
    };
    bogus y;
    main()
    {
        cout << "Hello, world\n";
        return 0; // DELETE
    }
    The expected output is:
    Initialize
    Hello, world
    Clean up
    but if the program is built with 3.6.6 and the 5.0.2.0 runtime
    the output "Clean up" from the destructor of the static object
    does not appear.

    PROBLEM CONCLUSION:
    Fixed in VisualAge C++ 5.0.2.1 runtime PTFs

    ------

    APAR: IY18538 COMPID: 5765B8100 REL: 220
    ABSTRACT: SUPPRESS CA_TDM_CONNECT ERROR IF RC=CA_HANGUP

    PROBLEM DESCRIPTION:
    Suppress CA_TDM_Connect error if RC=CA_HANGUP

    PROBLEM CONCLUSION:
    Error was suppressed if RC=CA_HANGUP

    ------

    APAR: IY18595 COMPID: 5765E6110 REL: 220
    ABSTRACT: REQUIRED UPDATES FOR RSCT VERSION 2.2

    PROBLEM DESCRIPTION:
    Required updates for RSCT Version 2.2

    PROBLEM SUMMARY:
    These updates must be applied if you are using
    WebSM or have the PSSP or HACMP/ES products installed.

    ------

    APAR: IY18632 COMPID: 5765C3403 REL: 430
    ABSTRACT: SET SO_KEEPALIVE ON ON CLIENT CONNECTION SOCKETS.

    PROBLEM DESCRIPTION:
    If a client system crashes and reconnects to the server, the old
    client's connection and session are not deleted.

    PROBLEM CONCLUSION:
    Set the socket option SO_KEEPALIVE on the connection socket so
    that recv() no longer blocks after tcp_keepidle expires.
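
    Enabling keepalive is a single setsockopt() call on the connected
    socket. The sketch below shows the call on a fresh socket; it is a
    generic illustration rather than the server's code, and
    enable_keepalive is a hypothetical helper name. The idle time before
    probing starts is governed by the system's tcp_keepidle setting.

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keepalive probing and return the option value read back."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # A non-zero value means the option is set (the exact value is platform-specific).
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)


conn = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enabled = enable_keepalive(conn)
print("keepalive enabled:", enabled != 0)
conn.close()
```

    With the option set, a peer that crashed without closing the
    connection is eventually detected by the probes, and a blocked
    recv() returns with an error instead of waiting indefinitely.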

    ------