From: AIX Service Mail Server (aixserv@austin.ibm.com)
Date: Tue Jun 12 2001 - 02:21:11 CDT
APAR: IY15379 COMPID: 5765D5100 REL: 320
ABSTRACT: DOUBLE FREE() IN FAULT_SERVICE_WORM
PROBLEM DESCRIPTION:
Problem:
While running the fault_service_Worm with DEBUG_MALLOC
set, the worm cored due to freeing the same space twice.
PROBLEM SUMMARY:
In the check_compat routine, a pointer was used to traverse
a list of integers. The pointer was incremented as each
integer was inspected. After the traversal the pointer was
passed to free(). However, at this point the pointer had been
incremented and no longer pointed to the head of the
allocated storage.
PROBLEM CONCLUSION:
The solution to the coding error is to use a separate
pointer to traverse the list, keeping the original pointer
value to use in the free() call.
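The fix can be sketched in C. The APAR does not give the signature of check_compat or say what it does with each integer, so the function below and its name are hypothetical. It only illustrates the pattern of keeping the head pointer for free() while a separate cursor walks the list.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for check_compat(): counts how many entries of a
 * malloc()ed integer list equal `wanted`, then releases the list. */
size_t count_matches_and_free(int *list, size_t count, int wanted)
{
    size_t matches = 0;
    const int *cursor = list;      /* separate traversal pointer */

    for (size_t i = 0; i < count; i++, cursor++) {
        if (*cursor == wanted)
            matches++;
    }

    /* `list` still holds the address malloc() returned. Passing the
     * advanced `cursor` to free() is the kind of error the APAR describes. */
    free(list);
    return matches;
}
```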
------
APAR: IY15774 COMPID: 5765D5100 REL: 320
ABSTRACT: SMITTY AND PERSPECTIVES DO NOT SUPPORT PARTITION SIZE MORE THAN
PROBLEM DESCRIPTION:
smitty and Perspectives do not support partition sizes larger than
256 MB when the user tries to create a VSD.
PROBLEM SUMMARY:
The createvsd command supports physical partition sizes of
512 and 1024 MB, but the smit interface was never updated
to accept these values. Perspectives, which invokes smit
config_data, has the same problem.
PROBLEM CONCLUSION:
The smit panel "Create a Virtual Shared Disk" has been
expanded to allow physical partition sizes of 512 and 1024
megabytes.
------
APAR: IY16126 COMPID: 5765D5101 REL: 120
ABSTRACT: ADAPTER IS BEING DISABLED FASTER THAN IT SHOULD WHEN A NEW
PROBLEM DESCRIPTION:
The adapter is being disabled because the adapter_may_
_down field is not reset when a new group is committed. The same
problem is possible where the incoming_bcasts_cnt and incoming_unicasts_cnt
fields are reset.
PROBLEM SUMMARY:
Topology Services has some logic that considers
as down an adapter that is apparently only able
to receive broadcast messages. Receiving only
broadcast messages may point to a problem with
either the network or with routing at the local (or
maybe remote) adapter. Once Topology Services
detects the adapter is unable to join any
adapter membership group with its peers, and
is only receiving broadcast messages, then it
notifies other subsystems (like Group Services
or indirectly HACMP) that the local adapter is
down.
A problem in the logic above sometimes causes
an adapter that is receiving only broadcast
messages to be flagged as down too soon --
before Topology Services can be really sure
that only broadcast messages are being received.
PROBLEM CONCLUSION:
The logic used to flag as down an adapter that
is receiving only broadcast messages has been
modified. With the change, Topology Services will
ensure that the adapter is indeed only receiving
broadcast messages for an appropriate time period
before notifying other subsystems that the
adapter is down.
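Here is a minimal sketch of the "only broadcasts for an appropriate time period" rule. It assumes the daemon samples per-interval broadcast and unicast counts, like the incoming_bcasts_cnt and incoming_unicasts_cnt fields mentioned above. The struct, the function, and the grace-period parameter are hypothetical. The actual Topology Services logic, including the group-membership condition, is not shown in the APAR.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-adapter state; field names are illustrative only. */
struct adapter_state {
    bool   bcast_only;        /* currently seeing only broadcasts */
    time_t bcast_only_since;  /* when that condition was first observed */
};

/* Called once per sampling interval with the counts seen in it.
 * Returns true only when broadcasts (and no unicasts) have been seen
 * continuously for at least grace_secs seconds. */
bool should_flag_down(struct adapter_state *st,
                      unsigned bcasts, unsigned unicasts,
                      time_t now, double grace_secs)
{
    if (unicasts > 0 || bcasts == 0) {
        st->bcast_only = false;          /* condition broken: reset the clock */
        return false;
    }
    if (!st->bcast_only) {
        st->bcast_only = true;           /* first broadcast-only sample */
        st->bcast_only_since = now;
        return false;
    }
    return difftime(now, st->bcast_only_since) >= grace_secs;
}
```

A single broadcast-only sample never flags the adapter. It must persist for the whole grace period, and any unicast traffic restarts the clock.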
------
APAR: IY16244 COMPID: 5765D2800 REL: 430
ABSTRACT: CSPOC ADD USER VERY SLOW
PROBLEM DESCRIPTION:
CSPOC add user is very slow when there are many users in /etc/passwd
(>7000): the "lsuser -a id ALL" command uses 2 minutes of CPU time
on an RS/6000 model F50.
PROBLEM SUMMARY:
If the system has more than 5000 users, CSPOC mkuser can take two minutes.
PROBLEM CONCLUSION:
Replace lsuser with awk on /etc/passwd.
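The fix replaces the lsuser query with one linear pass over /etc/passwd. The APAR does not show the awk program, so the C sketch below is only an analogue. It reads the third colon-separated field (the UID) of each line and reports whether a given UID is already in use. Both function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the UID (3rd colon-separated field) of an /etc/passwd-format
 * line, or -1 if the line is malformed. */
long passwd_line_uid(const char *line)
{
    const char *p = line;
    for (int field = 0; field < 2; field++) {  /* skip name and password */
        p = strchr(p, ':');
        if (p == NULL)
            return -1;
        p++;
    }
    char *end;
    long uid = strtol(p, &end, 10);
    if (end == p || *end != ':')               /* need digits followed by ':' */
        return -1;
    return uid;
}

/* One sequential scan of the file, in the spirit of the awk replacement. */
int uid_in_use(FILE *fp, long uid)
{
    char line[1024];
    while (fgets(line, sizeof line, fp) != NULL) {
        if (passwd_line_uid(line) == uid)
            return 1;
    }
    return 0;
}
```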
------
APAR: IY16308 COMPID: 5765D5100 REL: 320
ABSTRACT: SETUP_SERVER FAILS WITH ERROR 0016-014 NOT GETTING RELIABLE
PROBLEM DESCRIPTION:
setup_server fails with error 0016-014 when it cannot get a reliable
hostname. When installing any SP system and adding nodes,
frames or attached servers there are many situations where
SDR-config has run and created the node object but the customer
has not entered all the data for the node object. In this case
fields like reliable_hostname are set to "". When setup_server
runs it fails with the 0016-014 error and sets the fatal
setup_server processing incomplete (rc=2) message. setup_server
should just continue and set an informational rc=1 message.
PROBLEM SUMMARY:
When setup_server is executed with incomplete data
information located in the node object
(i.e., reliable_hostname="") the following error message
is displayed:
setup_server: 0016-014 Problem found while querying SDR
for reliable hostnames. SDR Return Code 2.
setup_server: Processing incomplete (rc= 2).
After the message is written to STDOUT, the command
immediately exits.
PROBLEM CONCLUSION:
setup_server has been modified to no longer terminate when
it encounters a node with incomplete data. For nodes that
do not have a reliable hostname entered, the following
messages will be issued:
setup_server: There is no reliable hostname
assigned to node <node_number>
setup_server: No NIM resources will be allocated for
node <node_number>
setup_server will then continue processing and exit with
a return code of 1.
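The new behaviour amounts to "warn, skip the node, and remember rc=1" instead of "abort with rc=2". Here is a C sketch under that reading. The node struct and process_nodes() are hypothetical, and the NIM allocation step itself is omitted.

```c
#include <stdio.h>

/* Hypothetical node record holding only the field the APAR mentions. */
struct node {
    int         node_number;
    const char *reliable_hostname;   /* "" when not yet entered */
};

/* Returns 0 if every node was processed, 1 if any node was skipped. */
int process_nodes(const struct node *nodes, size_t n)
{
    int rc = 0;
    for (size_t i = 0; i < n; i++) {
        const struct node *nd = &nodes[i];
        if (nd->reliable_hostname == NULL || nd->reliable_hostname[0] == '\0') {
            printf("setup_server: There is no reliable hostname assigned to node %d\n",
                   nd->node_number);
            printf("setup_server: No NIM resources will be allocated for node %d\n",
                   nd->node_number);
            rc = 1;          /* informational, not fatal */
            continue;        /* keep going instead of exiting with rc=2 */
        }
        /* ... allocate NIM resources for this node (omitted) ... */
    }
    return rc;
}
```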
------
APAR: IY16502 COMPID: 5765D5100 REL: 320
ABSTRACT: HMADM 0026-614 ERROR IS SOMETIMES REPORTED ON THE CONSOLE.
PROBLEM DESCRIPTION:
hmadm 0026-614 error is sometimes reported on the console.
PROBLEM SUMMARY:
As part of the function of cleanup.logs.ws, the hardware
monitor daemon log is changed. The current log is closed
and a new log is opened. In PSSP 3.2, the following
error message was being issued to the console:
Cannot change hmlogfile for cleanup, hmadm error:
hmadm: 0026-614 You do not have authorization to access
the Hardware Monitor.
Prior to the call to hmadm clog, ksrvtgt was being issued
to obtain a hardmon ticket, but it needed to be called
for root SPbgAdm.
PROBLEM CONCLUSION:
cleanup.logs.ws was modified to issue ksrvtgt for
root SPbgAdm prior to invoking hmadm clog. This allows
hmadm to complete successfully.
------
APAR: IY16688 COMPID: 5765D5100 REL: 320
ABSTRACT: SDR_CONFIG - INCORRECT SETTING OF ISPARTITIONABLE ATTRIBUTE
PROBLEM DESCRIPTION:
The IsPartitionable attribute of the SP class should only be set to false
after all necessary SDR updates are successfully completed.
Otherwise, subsequent invocations of SDR_config will not
complete setup for the SP Switch 2.
SDR_config -u should also update the Switch_adapter_port class.
If the SDR updates fail when invoked from hmreinit, verify
that this is made clear to the user.
LOCAL FIX:
The IsPartitionable attribute in the SP class must be reset
to "true" before rerunning SDR_config.
PROBLEM SUMMARY:
Three problems:
1. Execution of SDR_config doesn't retry SDR writes if an
error 80 (Fail to Connect) is encountered.
2. If an SP Switch2 is present, the SPS2_CleanUp subroutine
is sometimes skipped, leaving defunct values in the SDR.
3. The "SDR_config -u" option (update) should not be
permitted when an SP Switch2 is present. It can cause
major SDR corruption.
PROBLEM CONCLUSION:
When SDR_config writes to the SDR, if an error 80 status
is encountered, up to five retries will be attempted, with
a one-second wait in between.
If an SP Switch2 is present, SPS2_CleanUp will always be
run.
The update option of SDR_config will no longer be allowed
if the SP Switch2 is present. This change is reflected in
the man page for the SDR_config command, and will be shown
in the next edition of the PSSP Commands Reference manual.
A new diagnostic message, 0016-742, will be issued if -u is
attempted, and the task will be aborted.
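The retry rule can be sketched as a wrapper around a write callback: up to five retries on error 80, one second apart. sdr_write_with_retry(), the callback type, and the simulated flaky_write() are all hypothetical, because the APAR does not describe the real SDR client interface. The wait is a parameter so that the sketch can be exercised without sleeping.

```c
#include <unistd.h>

#define SDR_FAIL_TO_CONNECT 80   /* "error 80 (Fail to Connect)" per the APAR */
#define SDR_MAX_RETRIES     5

/* Hypothetical write callback standing in for the real SDR call. */
typedef int (*sdr_write_fn)(void *ctx);

/* One initial attempt plus up to SDR_MAX_RETRIES retries, but only while
 * the result is error 80; any other result is returned immediately. */
int sdr_write_with_retry(sdr_write_fn write_fn, void *ctx, unsigned wait_secs)
{
    int rc = write_fn(ctx);
    for (int retry = 0; retry < SDR_MAX_RETRIES && rc == SDR_FAIL_TO_CONNECT; retry++) {
        sleep(wait_secs);        /* the fix waits one second */
        rc = write_fn(ctx);
    }
    return rc;
}

/* Simulated writer for illustration: fails with error 80 until
 * *remaining_failures reaches zero, then succeeds. */
static int flaky_write(void *ctx)
{
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return SDR_FAIL_TO_CONNECT;
    }
    return 0;
}
```

For example, a writer that fails three times succeeds on the fourth call. A writer that keeps failing is called six times in total before error 80 is returned to the caller.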
------
APAR: IY16803 COMPID: 5765D5100 REL: 320
ABSTRACT: CSTARTUP -S FLAG DOES NOT IGNORE EXISTING SEQUENCING VIOLATIONS.
PROBLEM DESCRIPTION:
The cstartup -S flag is supposed to ignore existing
sequencing violations, such as when some trailing target_nodes are
already up and running: the target_nodes that are already
up are left alone, and the other target_nodes are started in
sequence. Instead, running the command returns error 0035-161.
PROBLEM SUMMARY:
When a /etc/cstartSeq file is being
used, and the option -S is used, cstartup
should not be checking for sequence
violations.
PROBLEM CONCLUSION:
In the section that checks whether a node is a target
node, code was added to check whether -S was one of the
command line options. If so, sequence violations are
not checked.
Also, code was added in the section for a target node
so that the target node is reset only if the -z or -Z
option was issued. Previously, the target node was
always reset, whether or not -z or -Z was used.
------
APAR: IY16864 COMPID: 5765B8100 REL: 220
ABSTRACT: DTRA CANNOT LOAD VOCABULARIES
PROBLEM DESCRIPTION:
The fileset "devices.artic960add" Version 1.4.2 shipped in
fix level 3004 does not work correctly with the DTRA adapter
in some system units.
LOCAL FIX:
This can be temporarily worked-around by using
/usr/lib/drivers/s960add from version 1.4.1 of
devices.artic960add.
PROBLEM SUMMARY:
DTRA CANNOT LOAD VOCABULARIES
PROBLEM CONCLUSION:
Microcode device drivers have been updated to
correct the problem.
------
APAR: IY16931 COMPID: 5765D5100 REL: 320
ABSTRACT: PSSP_SCRIPT FAILURE IF NIM MASTER OR BIS ADAPTER NAME ENDS WITH
PROBLEM DESCRIPTION:
pssp_script incorrectly parses hostnames ending with "is"
because of the following lines of code:
pssp_script:581: nim_master_ip=$ nim_master_ip *is
pssp_script:765: bis_adap_addr=$ bis_adap_addr *is
pssp_script:790: nim_master_ip=$ temp *is
These lines are missing a space character after "*" and before "is", so
when the host command responds "artemis is 1.1.1.1", the resulting
nim_master_ip value will be "is" instead of "1.1.1.1".
LOCAL FIX:
Use a hostname not ending with "is"
or change the pssp_script from :
$(nim_master_ip *is) to :
$(nim_master_ip *is)
PROBLEM SUMMARY:
When the hostname of the B/I Server ends in "is",
pssp_script fails to determine its ip address correctly.
As a result the node will be unable to ftp files from
the B/I Server during a customization, which will result
in the customization hanging with LED code a03. The
log from the customization will show an error on the
tftpfile of the install_info file. There will be a message
from tftp that the host is unknown.
PROBLEM CONCLUSION:
pssp_script was modified to handle the parsing of the
output of the host command, when the hostname ends
in "is". This allows nodes to customize when their
B/I Server hostname ends in "is".
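The parsing fix comes down to splitting on the whole word " is ", with a space on both sides, instead of on any occurrence of the letters "is". Here is a minimal C analogue of the ksh change. It assumes host output of the form "<hostname> is <address>" and does not handle any trailing text such as aliases. host_output_address() is hypothetical. Because hostnames cannot contain spaces, the first " is " always separates the name from the address.

```c
#include <string.h>

/* Returns a pointer to the address part of "<hostname> is <address>",
 * or NULL if the separator is missing. Note the spaces around "is". */
const char *host_output_address(const char *line)
{
    const char *sep = strstr(line, " is ");
    return sep ? sep + strlen(" is ") : NULL;
}
```

With "artemis is 1.1.1.1", the letters "is" at the end of "artemis" are not preceded by a space, so they are not matched, and the function returns "1.1.1.1".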
------
APAR: IY16994 COMPID: 5765D5100 REL: 320
ABSTRACT: VSD/RVSD ENHANCEMENTS
PROBLEM DESCRIPTION:
VSD/RVSD Enhancements
PROBLEM SUMMARY:
Add the README file for VSD support of the Subsystem
Device Driver (SDD). SDD is a device driver shipped
with the Enterprise Storage Server (ESS).
PROBLEM CONCLUSION:
Add the README file for VSD support of the Subsystem
Device Driver (SDD). SDD is a device driver shipped
with the Enterprise Storage Server (ESS).
------
APAR: IY16995 COMPID: 5765D5100 REL: 311
ABSTRACT: VSD/RVSD ENHANCEMENTS
PROBLEM DESCRIPTION:
vsd/rvsd enhancements
PROBLEM SUMMARY:
Add the README file for VSD support of the Subsystem
Device Driver (SDD). SDD is a device driver shipped
with the Enterprise Storage Server (ESS).
PROBLEM CONCLUSION:
Add the README file for VSD support of the Subsystem
Device Driver (SDD). SDD is a device driver shipped
with the Enterprise Storage Server (ESS).
------
APAR: IY17070 COMPID: 5765D5100 REL: 320
ABSTRACT: NON DCE SITUATION CAUSES DCE AUTH TEST TO INTERFERE W/ SDR CMMDS
PROBLEM DESCRIPTION:
If a customer has inadvertently left a copy of /usr/lib/libdce.a
on the system but elects NOT to use DCE as an authentication
method (i.e., splstdata -p shows "auth_methods k4:std"), the
system has been observed to attempt to use a symbol exported from
libdce.a. For example, the use of SDRChangeAttrValues results
in the following error messages:
exec(): 0509-036 Cannot load program spsec_ldmod because
of the following errors:
0509-130 Symbol resolution failed for
/usr/lpp/ssp/bin/spsec_ldmod because:
0509-136 Symbol GSS_MECH_MIT_KRB5 (number 7) is
not exported from dependent module
/usr/lib/libdce.a(shr.o).
Removing the file /usr/lib/libdce.a is a workaround; the
messages then go away.
LOCAL FIX:
Workaround is to remove /usr/lib/libdce.a, because it is not
used in a non-DCE authentication (Kerberos 5) environment.
Problem can be fixed by changing the logic in an authentication
test in the module that initiates the SDR session.
PROBLEM SUMMARY:
Routines that were trying to write to the SDR in a non-DCE
environment would fail if a level of DCE prior to 3.1
existed on the system. The following messages would be
issued:
exec(): 0509-036 Cannot load program spsec_ldmod because
of the following errors:
0509-130 Symbol resolution failed for
/usr/lpp/ssp/bin/spsec_ldmod because:
0509-136 Symbol GSS_MECH_MIT_KRB5 (number 7) is
not exported from dependent module
/usr/lib/libdce.a(shr.o).
0509-192 Examine .loader section symbols with the
'dump -Tv' command.
The routines were trying to access a variable that does
not exist in /usr/lib/libdce.a in the earlier version
of DCE.
PROBLEM CONCLUSION:
Modified the routine that grants write access to the SDR to
first determine if DCE is being used as an authentication
method, prior to accessing the DCE shared library.
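The fix amounts to checking the configured authentication methods before touching the DCE library. Here is a sketch. It assumes the methods form a colon-separated list like the "k4:std" value shown by splstdata -p, and that DCE appears as the token "dce". auth_method_enabled() is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the colon-separated list `methods` (e.g. "k4:std")
 * contains `method` as a whole token, not merely as a prefix. */
bool auth_method_enabled(const char *methods, const char *method)
{
    size_t len = strlen(method);
    const char *p = methods;
    while (*p != '\0') {
        const char *end = strchr(p, ':');
        size_t tok_len = end ? (size_t)(end - p) : strlen(p);
        if (tok_len == len && strncmp(p, method, len) == 0)
            return true;
        if (end == NULL)
            break;
        p = end + 1;             /* move past the ':' to the next token */
    }
    return false;
}
```

A caller would load or reference /usr/lib/libdce.a only when auth_method_enabled(methods, "dce") returns true. In a "k4:std" environment, a stale DCE library is then never touched.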
------
APAR: IY17126 COMPID: 5765D5100 REL: 320
ABSTRACT: VSD IN SUSPENDED STATE, VOL GROUPS ARE VARIED OFF. IN ERRPT YOU
PROBLEM DESCRIPTION:
The GPFS file system is unmounted, the VSD is in suspended state, and
volume groups are varied off. The problem occurs because the customer
had a GPFS file system at the beginning of the PATH statement
in /etc/environment. This causes the VSD scripts to call out to
ksh and use ksh's PATH list, which hangs the VSD scripts
because the GPFS path is unmounted and unavailable. If the
VSD scripts used full path names, the VSD internal error
of KLAPI timeout would not occur, because ksh's PATH list would
not be used. The VSD scripts should contain full path names.
LOCAL FIX:
Fix /etc/environment to contain PATHs that are always
available or put the user path at the end of the PATH
statement.
PROBLEM SUMMARY:
If a user's directory name appears in the $PATH environment
variable ahead of the system directories /usr/bin, /usr/sbin
and /etc, RVSD can be impacted. If, for example, the user's
directory wasn't mounted, and RVSD tries to execute a
command that must be resolved from $PATH, it will hang
waiting for the mount.
PROBLEM CONCLUSION:
Five RVSD command modules which rely on $PATH to resolve
system commands have been changed so that /usr/bin, /usr/sbin
and /etc are searched first.
This makes them more robust and unaffected by
local PATH modifications.
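The RVSD change was made in ksh scripts. Its effect can be sketched in C as building a PATH value that puts /usr/bin, /usr/sbin and /etc ahead of whatever the user configured. system_first_path() is a hypothetical helper, not part of PSSP.

```c
#include <stddef.h>
#include <string.h>

#define SYSTEM_DIRS "/usr/bin:/usr/sbin:/etc"

/* Writes "/usr/bin:/usr/sbin:/etc[:<user_path>]" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int system_first_path(char *buf, size_t size, const char *user_path)
{
    size_t sys_len  = strlen(SYSTEM_DIRS);
    size_t user_len = (user_path != NULL) ? strlen(user_path) : 0;
    size_t need = sys_len + (user_len ? 1 + user_len : 0) + 1;  /* +1 for NUL */

    if (need > size)
        return -1;
    memcpy(buf, SYSTEM_DIRS, sys_len);
    if (user_len) {
        buf[sys_len] = ':';
        memcpy(buf + sys_len + 1, user_path, user_len);
    }
    buf[need - 1] = '\0';
    return 0;
}
```

A caller could pass the result to setenv("PATH", buf, 1) before running external commands. A GPFS directory at the front of the user's PATH is then consulted only after the system directories. Using full path names, as the problem description suggests, avoids the lookup altogether.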
------
APAR: IY17129 COMPID: 5765D5100 REL: 320
ABSTRACT: NO DEFAULT VALUES WHEN SETTING ENT SPEED AND DUPLEX
PROBLEM DESCRIPTION:
Step 30 of "Migrating to the latest level of PSSP" in the
Installation and Migration guide states that default values will
be assigned for ethernet speed and duplex. In practice, no
default values are assigned; they are left as null.
LOCAL FIX:
Values are entered manually
PROBLEM SUMMARY:
After a migration from PSSP-2.4 to PSSP-3.2 the
new ethernet adapter values, enet_rate and duplex,
residing in the SDR are not given valid default
definitions. The null SDR attributes lead to the
following nodecond failure found in the nodecond
log:
Nodecond Status: network type not selected
return code -1 from boot_network
Nodecond Status: Finished
PROBLEM CONCLUSION:
The new code within /usr/lpp/ssp/install/bin/SDR_init
checks the ethernet type and any blank values currently
defined in the SDR for the enet_rate and duplex attributes.
If any blank values are found, the code now assigns
10/half to the affected 'bnc' and 'dix' adapters and
100/half to the affected 'tp' adapters.
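The defaulting rule can be written as a small function. This is a sketch only. The struct is hypothetical, and the SDR values are assumed to be the literal strings "10", "100" and "half". The sketch also assumes that when only one attribute is blank, only that one is filled, which the APAR does not state explicitly.

```c
#include <string.h>

/* Hypothetical view of a node's ethernet attributes from the SDR. */
struct enet_attrs {
    const char *type;       /* "bnc", "dix" or "tp" */
    const char *enet_rate;  /* "" when blank in the SDR */
    const char *duplex;     /* "" when blank in the SDR */
};

/* Fills blank enet_rate/duplex values: 10/half for bnc and dix,
 * 100/half for tp. Returns 1 if anything was changed, else 0. */
int apply_enet_defaults(struct enet_attrs *a)
{
    const char *rate;
    if (strcmp(a->type, "tp") == 0)
        rate = "100";
    else if (strcmp(a->type, "bnc") == 0 || strcmp(a->type, "dix") == 0)
        rate = "10";
    else
        return 0;                         /* unknown type: leave untouched */

    int changed = 0;
    if (a->enet_rate == NULL || a->enet_rate[0] == '\0') {
        a->enet_rate = rate;
        changed = 1;
    }
    if (a->duplex == NULL || a->duplex[0] == '\0') {
        a->duplex = "half";
        changed = 1;
    }
    return changed;
}
```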
------
APAR: IY17143 COMPID: 5765D5100 REL: 320
ABSTRACT: PSSP_SCRIPT NEEDS TO INSTALL DEVICES.CHRP.BASE.RTE IS
PROBLEM DESCRIPTION:
The PSSP 3.x pssp_script fails to install (migrate to)
PSSP 2.4 on MCA nodes because devices.chrp.base.rte
gets installed only on PCI nodes. But this fileset is
a prereq of PSSP 2.4, so it is needed on MCA nodes too.
The part of pssp_script to install devices.chrp.base.rte
currently is (I have to wrap lines to fit into SSF)
# Defect 46958: remove -c (commit) flag for AIX filesets
oslvl=$($oslevel)
if $oslvl = $os415 && $oslvl = $os414 ; then
if -z $($lslpp -qh devices.chrp.base.rte 2>/dev/null)
&& $platform = "chrp" ; then #-
$installp -abgXd/mnt devices.chrp.base.rte
==> only on oslevel >=4.2.0.0 and on platform==chrp
devices.chrp.base.rte will be installed
This needs to be changed to
if $oslvl = $os415 && $oslvl = $os414 ; then
if -z $($lslpp -qh devices.chrp.base.rte 2>/dev/null)
&& $platform = "chrp" || "$code_version" = "PSSP-2.4"
then #-
$installp -abgXd/mnt devices.chrp.base.rte
==> devices.chrp.base.rte will be installed if
oslevel>=4.2.0.0 and (platform==chrp or
code_version==PSSP-2.4)
PROBLEM SUMMARY:
pssp_script only installs the fileset devices.chrp.base.rte
on chrp nodes. However the ssp.basic fileset in PSSP 2.4
requires devices.chrp.base.rte, regardless of the type of
node. pssp_script needs to be modified to install the
fileset devices.chrp.base.rte when either the node
platform is chrp, or the PSSP level of the node is 2.4.
PROBLEM CONCLUSION:
pssp_script has been modified to install the
fileset devices.chrp.base.rte when either the node
platform is chrp, or the PSSP level of the node is 2.4.
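The corrected install condition from the problem description can be expressed as a single predicate: oslevel >= 4.2.0.0, fileset not yet installed, and either platform chrp or code_version PSSP-2.4. The function names are hypothetical. Parsing the oslevel string with sscanf() assumes the dotted format shown in the APAR.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a dotted AIX level such as "4.3.3.0" is at least 4.2.0.0. */
static bool oslevel_at_least_4_2(const char *level)
{
    int major = 0, minor = 0;
    if (sscanf(level, "%d.%d", &major, &minor) != 2)
        return false;                    /* unparseable: assume too old */
    return major > 4 || (major == 4 && minor >= 2);
}

/* Hypothetical predicate mirroring the corrected pssp_script test. */
bool need_chrp_base_rte(const char *oslevel, bool already_installed,
                        const char *platform, const char *code_version)
{
    if (already_installed || !oslevel_at_least_4_2(oslevel))
        return false;
    return strcmp(platform, "chrp") == 0 ||
           strcmp(code_version, "PSSP-2.4") == 0;
}
```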
------
APAR: IY17160 COMPID: 5765D5100 REL: 320
ABSTRACT: DISABLE MONITORING INDIVIDUAL VSDS IN CFGVSD COMMAND.
PROBLEM DESCRIPTION:
Problem:
When I use /usr/lpp/mmfs/bin/mmcrvsd to create a large
filesystem (over 300 VSD disks), it creates the VSDs
but does not activate them on the nodes. Each node in the
cluster returns:
monitorvsd: 0034-020 only 300 vsds can be monitored at the
time.
I have to manually start the VSDs on each node before I
can format the filesystem.
This is not a big impact, but it can cause confusion about whether
the system is in a correct state.
LOCAL FIX:
Disable monitoring individual vsds in cfgvsd.
PROBLEM SUMMARY:
When more than 300 VSD names are specified on a cfgvsd
command, the execution aborts with error 0034-020, "Only 300
vsds can be monitored at a time."
The execution of monitorvsd on all the VSDs in the list is
not a required feature. In fact, it is undocumented. The
call should be removed. The monitorvsd function is
available, and may be invoked by the user to monitor
individual VSDs.
PROBLEM CONCLUSION:
The cfgvsd command will no longer execute a monitorvsd
command when given a list of VSD names. The monitorvsd
command will still be available for those who need the
feature.
------
APAR: IY17200 COMPID: 5765D5100 REL: 320
ABSTRACT: OUT.TOP INCORRECT LINK STATUS
PROBLEM DESCRIPTION:
Uninitialized links are reporting "initialized" status in out.top.
Example:
2 L: initialized (wrap plug is installed)
Should be -2 L
PROBLEM SUMMARY:
The out.top file reports conflicting status for
uninitialized switch-to-switch links. For example, the
comments report the link as "initialized" when in fact
it may be removed from the network as faulty. This
is what the customer received:
s 10104 1 s 180 3 E154-S04-BH-J33 to E69-S17-BH-J19
2 L: initialized (link has been removed from network -
no AUTOJOIN)
Note, this problem affects only the comments portion of
the out.top entry; the reported switch connections are
correct.
PROBLEM CONCLUSION:
In the above example, the text "2 L: initialized" represents
the device status; the text in parentheses "(link has been
removed from network - no AUTOJOIN)" represents the link
status. For an uninitialized switch-to-switch link, the
device status is meaningless and will not be displayed.
Given the above example, what you will now see in the
comments portion of the out.top entry is:
-6 R; link has been removed from network - no AUTOJOIN
or
-6 L; link has been removed from network - no AUTOJOIN
if the left side of the link is faulty.
------
APAR: IY17211 COMPID: 5765D2800 REL: 430
ABSTRACT: CLVER DISPLAYS ERROR WITH 3.2.0.6 SSP.BASIC: ERROR: SERVICE
PROBLEM DESCRIPTION:
clver incorrectly displays the following error when the
cluster includes an HPS network and another tcpip network
such as ethernet:
Service adapter <adapter name> is improperly configured on
node <node name>.
The problem appears to be caused by a naming convention
change in PSSP regarding the css adapter/interface.
PROBLEM SUMMARY:
cluster verification fails with the following error:
ERROR: Service adapter <adapter name> is improperly configured
on node <node name>.
PROBLEM CONCLUSION:
Modify clver so that it is less strict about the css name
it looks for in the CuAt and CuDv ODM classes.
------
APAR: IY17226 COMPID: 5765D6100 REL: 220
ABSTRACT: LLCANCEL POE INTERACTIVE RUNNING JOB W/EXTERNAL SCHEDULER
PROBLEM DESCRIPTION:
For interactive POE jobs running under an external scheduler,
the llcancel command cannot stop the POE job.
LOCAL FIX:
To cancel a running interactive POE job under an external
scheduler, use Ctrl-C; for now, llcancel cannot kill the
POE job.
PROBLEM SUMMARY:
The LoadLeveler command llcancel could not
cancel a running POE interactive job with
an external scheduler set.
PROBLEM CONCLUSION:
The LoadLeveler command llcancel can now
cancel a running POE interactive job with
an external scheduler set.
------
APAR: IY17233 COMPID: 5765D2800 REL: 430
ABSTRACT: INCORRECT LIST OF SHARED FILESYSTEMS UNDER SMIT CHANGE / SHOW
PROBLEM DESCRIPTION:
When a Resource Group contains only one node name in its node list
and neither filesystems nor Volume Groups are specified for this
Resource Group, the smit menu "Change / Show
Characteristics of a Shared File System" under the cspoc
options displays an incorrect list of filesystems, including
all filesystems in rootvg.
PROBLEM SUMMARY:
When a Resource Group contains only one node name in its node list
and neither filesystems nor Volume Groups are specified for this
Resource Group, the smit menu "Change / Show
Characteristics of a Shared File System" under the cspoc
options displays an incorrect list of filesystems, including
all filesystems in rootvg.
PROBLEM CONCLUSION:
Modify HACMP smit cspoc odm sm_cmd_opt and sm_cmd_hdr entries
to use correct flags.
------
APAR: IY17241 COMPID: 5765D2800 REL: 430
ABSTRACT: CLSMUXPD FAILS TO OPERATE PROPERLY IF NOFILES IS GREATER THAN
PROBLEM DESCRIPTION:
When nofiles is set to a value greater than 2000,
clsmuxpd does not operate properly.
LOCAL FIX:
Set nofiles in /etc/security/limits to 2000 or less.
PROBLEM CONCLUSION:
Modify clsmuxpd so that it correctly handles file
descriptors.
------
APAR: IY17252 COMPID: 5765D2800 REL: 430
ABSTRACT: CLCONVERT CORE DUMPS WHEN CONVERTING FROM HACMP 4.2.2 TO
PROBLEM DESCRIPTION:
clconvert.43 core dumps when converting from an HACMP 4.2.2
snapshot.
PROBLEM CONCLUSION:
Modify clconvert.43 so that it correctly converts resources
from HACMP 4.2.2 to HACMP/ES 4.3.1.
------
APAR: IY17253 COMPID: 5765B9501 REL: 320
ABSTRACT: MISSING MMFS ENTRIES IN /ETC/FILESYSTEMS AFTER MKSYSB INSTALL
PROBLEM DESCRIPTION:
This APAR was opened regarding the steps required after
reinstalling a GPFS server, since the image doesn't back up the
/dev/ entries for mountpoints. You can either mv cluster.nodes
.nodes or mmfs.cfg and restart GPFS to rebuild the /dev entries,
but the documentation doesn't say anything about that.
The documentation should be modified or enhanced to reflect the
above steps.
LOCAL FIX:
Modify/enhance the documentation to show the steps required
following the reinstall of a GPFS server.
PROBLEM SUMMARY:
Using network install of a node, the gpfs configuration
files are refreshed, but the entries in the AIX
configuration files are not. This results in an inability to
mount file systems until the AIX /dev and /etc/filesystems
entries are refreshed.
PROBLEM CONCLUSION:
On GPFS startup, check that the AIX configuration
files contain the needed data.
------
APAR: IY17405 COMPID: 5765D5100 REL: 320
ABSTRACT: RVSD NOT TERMINATING WHEN CSS ADAPTER GROUP IS DISSOLVED
PROBLEM DESCRIPTION:
RVSD is not terminating and restarting KLAPI in all cases where
CSS adapter group is dissolved.
LOCAL FIX:
efix can be found in
/afs/aix/u/bdherr/efix/nersc/adapter_group_disolved
contact Brian Herr for more details
PROBLEM SUMMARY:
RVSD is not terminating and restarting KLAPI in all
cases when the CSS adapter group is dissolved. This
will cause RVSD/VSD to hang when communication is
re-attempted to these nodes.
PROBLEM CONCLUSION:
RVSD will make sure that KLAPI gets terminated and
restarted when the switch adapter group dissolves.
------
APAR: IY17409 COMPID: 5765D5100 REL: 320
ABSTRACT: CLEANUP.LOGS.WS: K4DESTROY: 2502-000 NO TICKETS TO DESTROY
PROBLEM DESCRIPTION:
The /usr/lpp/ssp/bin/cleanup.logs.ws script, when run
through cron, creates unexpected stderr output
if the file /var/adm/SPlogs/SPdaemon.log does
not exist.
We only get the k4 ticket if the file exists:
if [ -f $LOG_DIR/SPdaemon.log ]
then
WORKSTATION_NAME=`hostname`
# Get K4 creds if required
/bin/ksrvtgt root SPbgAdm
...
fi
but later on the tickets are destroyed regardless of
whether we got them:
# Get rid of K4 creds if any
/bin/k4destroy >/dev/null
unset KRBTKFILE
At this point the message
k4destroy: 2502-000 No tickets to destroy.
is written to stderr. As the script is run from
cron and no stderr redirection is set there
(errors are sent back to root as email),
root now gets one email a day because
of k4destroy failing.
We should redirect stderr on k4destroy to
/dev/null too.
Work around:
edit /usr/lpp/ssp/bin/cleanup.logs.ws and
change
/bin/k4destroy >/dev/null
to
/bin/k4destroy >/dev/null 2>/dev/null
PROBLEM SUMMARY:
cleanup.logs.ws issues k4destroy to destroy any Kerberos
Version 4 authentication tickets. If any messages are
written to stderr, such as:
k4destroy: 2502-000 No tickets to destroy.
it results in an email being sent to root, since it is
usually run as a cron job. Customers would prefer to not
see this message and to not receive the email.
PROBLEM CONCLUSION:
cleanup.logs.ws has been modified so that any output to
stderr from k4destroy will be redirected to stdout,
which is already being redirected to /dev/null. As a
result no error messages will be issued from the call
to k4destroy from cleanup.logs.ws. This is consistent
with the call to kdestroy.
------
APAR: IY17438 COMPID: 5765D5100 REL: 320
ABSTRACT: SWITCH WENT TO DOWN
PROBLEM DESCRIPTION:
Switch went to down.
flt file shows the errors:
2510-898 unable to access SDR to get the list of auto-join nodes
rc= -1.
2510-195 The fault service daemon got a SIGTERM signal.
PROBLEM SUMMARY:
There is a small window where the primary forks a child
process and an SDR test is run, during which a SIGTERM to the
child results in the child calling the standard SIGTERM handler
and resetting the adapter.
PROBLEM CONCLUSION:
Closed the window so the child process will exit on SIGTERM
without resetting the adapter.
------
APAR: IY17453 COMPID: 5765D5100 REL: 320
ABSTRACT: BACKUP ID NOT UPDATED IN SDR WHEN BACKUP TIMES OUT DURING
PROBLEM DESCRIPTION:
This APAR addresses the aftermath of a backup node not
responding during fence/unfence processing. The backup
node fails to ACK to a DEVICE_DB_UPDATES command;
the backup is fenced and a new backup is chosen. However,
the SDR is not updated to reflect the new backup id.
PROBLEM SUMMARY:
When a primary backup node times out during an Efence or
Eunfence operation, it is fenced off the switch and a new
backup node is chosen, but the SDR is not updated to
reflect the new backup's id.
PROBLEM CONCLUSION:
The error recovery Estart that gets invoked to handle
errors found during Efence or Eunfence will call the
function which will update the backup's name in the SDR.
------
APAR: IY17467 COMPID: 5765D6100 REL: 220
ABSTRACT: LLCTL -Q <COMMAND> DOES NOT PREVENT STDOUT.
PROBLEM DESCRIPTION:
llctl -q <command> (quiet mode) still produces non-error output
as if the -q option were not used. For example, llctl -q reconfig
produces the same standard output as "llctl reconfig" does.
PROBLEM SUMMARY:
In LL 2.2, the llctl -q option is not suppressing
the informational messages.
PROBLEM CONCLUSION:
For LoadLeveler version 2.2,
the llctl -q option will now suppress informational
messages.
------
APAR: IY17483 COMPID: 5765B9501 REL: 320
ABSTRACT: MMLSQUOTA -G <GROUP> RUN AS NON-ROOT USER RETURNS THE ERROR
PROBLEM DESCRIPTION:
mmlsquota -g <group> run as a non-root user returns the error
"operation not permitted" (in this case user loadl).
The GPFS documentation does not list any restriction on mmlsquota,
such as 'to run this you need system group permission'.
LOCAL FIX:
The code internally checks the uid of the process issuing the
command and allows non-root users to see only their own user
quotas. This restriction should be mentioned in the man page
in the next level of documents.
PROBLEM SUMMARY:
mmlsquota requires root access
PROBLEM CONCLUSION:
Allow mmlsquota to display the quotas for a group which the
issuing user is a member of.
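Here is one way the relaxed check could look, sketched in C with standard POSIX calls. Root may query any group, and other users may query only groups they belong to, either as primary or supplementary group. The function name is hypothetical. The APAR does not say whether GPFS checks the real or the effective IDs, and this sketch uses the real UID.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns true if the calling process may view quotas for group `gid`. */
bool may_view_group_quota(gid_t gid)
{
    if (getuid() == 0)
        return true;                          /* root: any group */
    if (getgid() == gid || getegid() == gid)
        return true;                          /* primary group */

    int n = getgroups(0, NULL);               /* number of supplementary groups */
    if (n <= 0)
        return false;
    gid_t *groups = malloc((size_t)n * sizeof *groups);
    if (groups == NULL)
        return false;
    n = getgroups(n, groups);                 /* -1 on error: loop is skipped */

    bool member = false;
    for (int i = 0; i < n; i++) {
        if (groups[i] == gid) {
            member = true;
            break;
        }
    }
    free(groups);
    return member;
}
```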
------
APAR: IY17491 COMPID: 5765D5101 REL: 120
ABSTRACT: ASCIW: GROUP SERVICES UNABLE TO ESTABLISH DOMAIN AFTER RECYCLE
PROBLEM DESCRIPTION:
ASCIW: Group Services unable to establish domain after recycle
PROBLEM SUMMARY:
In a system with multiple networks, a node may not have
connections on all networks. To receive connectivity
information on networks that are not directly connected, a
Topology Services daemon relies on connectivity messages
forwarded from those networks. In some situations,
forwarded connectivity messages may be unnecessarily
ignored or lost, resulting in missed node downs.
PROBLEM CONCLUSION:
Topology Services has been modified to reduce the
possibility that forwarded connectivity messages may be
ignored or lost. It has more nodes doing the forwarding of
connectivity messages, and it accepts forwarded
connectivity messages when it is busy committing a group.
------
APAR: IY17504 COMPID: 5765D5101 REL: 120
ABSTRACT: PHOENIX.SNAP DOESN'T WAIT LONG ENOUGH ON LARGE SYSTEMS
PROBLEM DESCRIPTION:
phoenix.snap will wait a certain amount of time for commands it
runs to complete before declaring them hung and killing them.
On very large systems some of these commands take longer to run
and this wait time is not long enough. The wait time needs to
be increased for large systems.
LOCAL FIX:
Manually modify the amount of time phoenix.snap waits by
changing the values in the function waitforit()
PROBLEM SUMMARY:
Some of the commands phoenix.snap runs take time
proportional to the number of nodes in the system.
The amount of time it waits for these commands
before declaring them hung and terminating them
needs to be adjusted accordingly.
There were also 3 other bugs noted in phoenix.snap
that are being fixed here:
1. On 3.2 systems the format of the lssrc output
for hags has changed slightly causing phoenix.snap
to fail to determine the hags nameserver.
2. /etc/netsvc.conf was not being collected
because of a typo.
3. vmstat was collecting the wrong data. Instead
of collecting interval statistics as was intended
it was getting statistics since system startup
repeatedly.
PROBLEM CONCLUSION:
phoenix.snap now increases the wait time
proportionally for every 256 nodes in the system.
Thus, for 257-512 nodes it doubles the wait time.
As for the other 3 bugs:
The lssrc parsing has been modified to not be
dependent on the spacing of the output.
The /etc/netsvc.conf typo has been corrected.
The vmstat collection has been modified to run
"vmstat 5 5" instead of 5 instances of "vmstat".
As a result, interval data is collected instead of
the since-startup statistics repeated five
times over.
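The new wait-time rule multiplies the base wait once for every 256 nodes or part thereof, so 257-512 nodes double it. That is a ceiling division. Here is a C sketch. scaled_wait_secs() and its base value are hypothetical, since the APAR does not give phoenix.snap's actual timeouts.

```c
/* Scales a base timeout by ceil(nodes / 256): 1-256 nodes -> x1,
 * 257-512 -> x2, 513-768 -> x3, and so on. */
unsigned scaled_wait_secs(unsigned base_secs, unsigned nodes)
{
    unsigned factor = (nodes + 255) / 256;   /* integer ceiling division */
    if (factor == 0)
        factor = 1;                          /* 0 nodes: keep the base wait */
    return base_secs * factor;
}
```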
------
APAR: IY17518 COMPID: 5765D5100 REL: 320
ABSTRACT: VSD/RVSD ENHANCEMENTS
PROBLEM DESCRIPTION:
VSD/RVSD support for the Subsystem
Device Driver (SDD) for the Enterprise Storage Server (ESS)
PROBLEM SUMMARY:
VSD/RVSD enhancements
PROBLEM CONCLUSION:
VSD/RVSD enhancements
------
APAR: IY17541 COMPID: 5765D5100 REL: 320
ABSTRACT: NODES NEED TO BE BOOTED TWICE TO UPDATE TUNING.CUST
PROBLEM DESCRIPTION:
Because boot procedure always runs tuning.cust locally, *then*
checks for customize and ftp's tuning.cust from CWS, the node
must be rebooted a second time for changes in tuning.cust to
actually be applied to the node.
LOCAL FIX:
Reboot the nodes a second time after tuning.cust has been ftp'd
during the customize reboot.
PROBLEM SUMMARY:
During a node's customization, tuning.cust is ftp'd from
the node's Boot/Install Server. However, tuning.cust will
not be executed until the next reboot of the node.
pssp_script should be modified so that during a node's
customization, tuning.cust will be executed.
PROBLEM CONCLUSION:
pssp_script has been modified so that during a node's
customization, tuning.cust will be executed.
------
APAR: IY17579 COMPID: 5765D5100 REL: 320
ABSTRACT: FSD RESPONSE TIME
PROBLEM DESCRIPTION:
fsd response time
PROBLEM SUMMARY:
When a node is heavily loaded, the switch daemon can be
delayed in processing certain packets. In some cases, the node
is dropped from the switch.
PROBLEM CONCLUSION:
The packet processing code in the fault service daemon has
been changed to improve processing time.
------
APAR: IY17583 COMPID: 5765D5100 REL: 320
ABSTRACT: MSSR DOES NOT RELEASE ADAPTER AFTER TEST EXITS.(D/S)
PROBLEM DESCRIPTION:
MSSR does not release adapter after test exits.(D/S)
PROBLEM SUMMARY:
The interface between the fault service daemon, Connectivity
Matrix, and the protocols was changed without changing the
SP Switch Diagnostics.
PROBLEM CONCLUSION:
Added a call to FSD::download_processor_route_table() so
that the Connectivity Matrix (CM) will be updated and the
protocols will be restarted.
TEMPORARY FIX:
After the test completes, run an Estart to restore the
correct state to the adapter.
------
APAR: IY17584 COMPID: 5765D5100 REL: 320
ABSTRACT: INSUFFICIENT STACK FOR KICKPIPES() IN MPCI, CAUSES A PROBLEM
PROBLEM DESCRIPTION:
PSSP 3.1.1 introduced the local variables 'shoveq' and 'frq' in
kickpipe() in MPCI. They need an 8192-byte stack frame, but only
4096 bytes are available. This causes Informix to go down.
PROBLEM SUMMARY:
Running Informix on an SP system can fail if a query with
2000 ORs is done. Informix may detect a corruption of the
header of its stack block pool and quit.
PROBLEM CONCLUSION:
MPCI, which is used by Informix, added a couple of large
stack variables for shared memory support. This causes a
problem for Informix, because Informix both uses MPCI and
manages its own threading and stacks. Informix's current
stack management does not account for the additional 8K of
stack space needed by the MPCI routines that Informix
calls. MPCI changed the declaration of these new large
variables so that they are now located in the heap instead
of on the stack. The new MPCI implementation solves the
problem that Informix had working with our MPCI environment
and is probably the better way for MPCI to handle these
large variables.
------
APAR: IY17605 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT FAILED: IN_CPY->COPYSET...
PROBLEM DESCRIPTION:
assert failed:in_cpy->...
PROBLEM SUMMARY:
GPFS self check logic failed in HandleReq.C
PROBLEM CONCLUSION:
Correct logic error in the token manager
------
APAR: IY17638 COMPID: 5765B9501 REL: 320
ABSTRACT: GPFS QUOTA OPERATION LOSES TOKEN AND ASSERTS
PROBLEM DESCRIPTION:
The problem seems to be that the filesystem manager wanted to
do a quota operation but had lost the token for the quota file
to a particular node. The quota code apparently forgot to
reacquire the token before doing the operation and therefore
asserted when some of its data was not in a "valid" state.
PROBLEM SUMMARY:
GPFS self check logic failed in a stress load with quotas
enabled
PROBLEM CONCLUSION:
Correct locking error on quota file
------
APAR: IY17653 COMPID: 5765D5100 REL: 320
ABSTRACT: SPADAPTR ERROR WITH '-S YES' AND LARGE NODE NUMBER
PROBLEM DESCRIPTION:
Using spadaptr with the '-s yes' option and a large number of
nodes can cause incorrect ip addresses to be calculated,
resulting in error message 0022-047.
PROBLEM SUMMARY:
A customer was using spadaptrs to enter data for a large
number of css adapters using the switch node numbers.
When a node with a high node number but a low switch node
number was encountered, the third octet of the IP address
was calculated incorrectly.
During the processing of all the nodes, the third octet
of the IP address had been incremented because the fourth
octet had exceeded 255. When the node with a high node
number was being processed it used the incremented third
octet number instead of the original value. Since this
calculated IP address was not a valid IP address, an error
message was issued stating that the IP address could not
be resolved and spadaptrs terminated.
PROBLEM CONCLUSION:
spadaptrs was modified to correct the generation of IP
addresses for css adapters when the switch node numbers
are used. Certain values were not being reset, which
caused the third octet of the generated IP address to be
incorrect, which could cause spadaptrs to fail.
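The fix amounts to deriving each address from the original starting address rather than from octet values left over from earlier nodes. A minimal Python sketch of that idea, assuming a hypothetical "base address plus switch node number" scheme (spadaptrs' real calculation may differ):

```python
import ipaddress


def css_address(base: str, switch_node_number: int) -> str:
    """Derive an adapter address as base + switch node number.

    Hypothetical scheme for illustration. The address is computed fresh from
    `base` on every call, so no incremented octet carries over between nodes.
    """
    # Treat the address as a 32-bit integer: when the fourth octet passes 255,
    # the carry goes into the third octet for this node only.
    return str(ipaddress.IPv4Address(base) + switch_node_number)
```

Because nothing is mutated between calls, processing a high-numbered node before a low-numbered one gives the same results as the reverse order.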
------
APAR: IY17683 COMPID: 5765D2800 REL: 430
ABSTRACT: UNABLE TO SYNC CLUSTER TOPOLOGY DUE TO CLLOG ERROR MESSAGE
PROBLEM DESCRIPTION:
During verification of log files, cllog will fail, issuing the
erroneous message that cluster.log has already been redirected.
PROBLEM SUMMARY:
During verification of log files, cllog will fail, issuing the
erroneous message that cluster.log has already been redirected.
The real cause of the problem is an incorrect number of '\'
characters in the following awk calls:
awk '/local0.info/ { print \$2 }' /etc/syslog.conf
awk '/user.notice/ { print \$2 }' /etc/syslog.conf
There should be three '\' characters before the '$', not one.
PROBLEM CONCLUSION:
The code was changed to pass cl_rsh a grep command, which
does not require any escaped characters, so that the entire
line is returned to the local machine. The returned line is
then echoed into awk to print just the $2 field.
------
APAR: IY17717 COMPID: 5765D2800 REL: 430
ABSTRACT: THE HOME DIRECTORY FOR ROOT COULD BE DIFFERENT THAN "/".
PROBLEM DESCRIPTION:
When HACMP uses the .rhosts file, it assumes that root's home
directory is "/". The file should be accessed as "~root/.rhosts".
PROBLEM SUMMARY:
Error: /.rhosts file does not exist.
PROBLEM CONCLUSION:
Replaced the /.rhosts reference with ~root/.rhosts.
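Resolving root's home directory from the password database, instead of assuming "/", can be sketched in Python as below. This is an illustration only (HACMP itself uses shell scripts), and root_rhosts_path is a hypothetical name.

```python
import os
import pwd


def root_rhosts_path() -> str:
    """Return the path of root's .rhosts file, honoring root's real home directory."""
    # pwd.getpwnam reads root's entry from the password database, so this
    # works even when root's home directory is not "/".
    root_home = pwd.getpwnam("root").pw_dir
    return os.path.join(root_home, ".rhosts")
```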
------
APAR: IY17723 COMPID: 5765B9501 REL: 320
ABSTRACT: GETMSG LOSES ERRNO SETTING
PROBLEM DESCRIPTION:
getmsg loses errno setting
PROBLEM SUMMARY:
Errno gets lost if the message file becomes inaccessible
PROBLEM CONCLUSION:
Correct the handling of errno.
------
APAR: IY17726 COMPID: 5765B9501 REL: 320
ABSTRACT: LOGS NOT MIGRATED IN DELDISK
PROBLEM DESCRIPTION:
logs not migrated in deldisk
PROBLEM SUMMARY:
Log migration failed after deleting all of the original
disks in a file system and then trying immediately to
restripe the file system
PROBLEM CONCLUSION:
Correct an error in the creation of spare logs.
------
APAR: IY17753 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY:EDC ERRORS ON SW LINKS CAUSE NODES TO FALL OFF THE SW
PROBLEM DESCRIPTION:
colony:edc errors on sw links cause nodes to fall off the sw.
PROBLEM SUMMARY:
This defect introduces the cable_test tool in response to
customer requests to expedite debugging problems with
loose cables or interposer cards.
PROBLEM CONCLUSION:
The cable_test tool was written to help the user isolate
problems with loose cables and improperly seated interposer
cards.
------
APAR: IY17770 COMPID: 5765D6100 REL: 220
ABSTRACT: LOADL LOST DRAIN ON CLASS AFTER LLCTL RECONFIG
PROBLEM DESCRIPTION:
LOADL LOST DRAIN ON CLASS AFTER LLCTL RECONFIG
PROBLEM SUMMARY:
LoadLeveler resets the startd drain on class after a
reconfig.
PROBLEM CONCLUSION:
LoadLeveler now maintains the startd drain on class after a
reconfig.
------
APAR: IY17787 COMPID: 5765D6100 REL: 210
ABSTRACT: STARTER CRASHES WHEN JOB_USER_PROLOG SPECIFIED
PROBLEM DESCRIPTION:
LoadL_starter will sometimes take a Segmentation Violation when
a Job User Prolog is being used.
LOCAL FIX:
The job might run, if it is tried again. Otherwise, you must
find a way to run it without using the Job User Prolog.
PROBLEM SUMMARY:
When JOB_USER_PROLOG is specified in the Config file, the
LoadL_starter will occasionally take a segmentation
violation.
PROBLEM CONCLUSION:
The handling of the JOB_USER_PROLOG was corrected to avoid
the timing condition that could lead to the segmentation
violation.
------
APAR: IY17845 COMPID: 5765D5100 REL: 320
ABSTRACT: WE ARE GETTING THE FOLLOWING ERROR SPKNKEYMAN_ERROR104 WHEN DCE
PROBLEM DESCRIPTION:
In errpt -a, the customer sees various entries for the following
error: SPKNKEYMAN_ERROR104. The customer sees the error when DCE is
installed but not configured for SP.
LOCAL FIX:
Comment out the following lines in /etc/rc.sp:
SPNKEYMAN_START=/usr/lpp/ssp/bin/spnkeyman_start
if [ -x ${SPNKEYMAN_START} ]; then
$SPNKEYMAN_START &
fi
PROBLEM SUMMARY:
The startup script checks for
/usr/lib/libdce.a but not for whether
the actual fileset is installed.
It therefore misbehaves in scenarios
where libdce.a exists in /usr/lib not
because the customer installed DCE but
because one of the customer's
applications needs it.
In addition, the daemon code does not
detect the completion of the DCE
configuration for trusted services, so
when DCE is only partially configured it
keeps running and making errpt entries.
PROBLEM CONCLUSION:
The startup script now checks for the fileset
dce.client.rte instead of checking for
/usr/lib/libdce.a before starting the daemon.
The daemon code detects a partial configuration
of DCE for trusted services, makes an errpt
entry, and exits.
------
APAR: IY17889 COMPID: 5765B9501 REL: 320
ABSTRACT: FORMAT SEGMENTATION FAULT
PROBLEM DESCRIPTION:
format segmentation fault
PROBLEM SUMMARY:
Fix potential segmentation fault during file system
creation.
PROBLEM CONCLUSION:
Fix bug in serializing multiple worker threads
------
APAR: IY17926 COMPID: 5765D6100 REL: 220
ABSTRACT: LL TO SUPPORT NEW INTERFACE TO VMGETINFO
PROBLEM DESCRIPTION:
LL to support new interface to vmgetinfo
PROBLEM SUMMARY:
Needed support for future changes in AIX
for LoadLeveler.
PROBLEM CONCLUSION:
Code changes for future support of
AIX for LoadLeveler.
------
APAR: IY18011 COMPID: 5765D5100 REL: 320
ABSTRACT: PRIMARY DAEMON CORE DUMPS IN CSRECOVERY
PROBLEM DESCRIPTION:
primary daemon core dumps in csrecovery
PROBLEM SUMMARY:
This problem was caused by an array in switch recovery
overflowing and wiping out pointers and other variables. The
array now has one element assigned to each chip and node in
the system, eliminating the overflow condition.
PROBLEM CONCLUSION:
Switch recovery was changed to have a fixed array containing
error reset information. The previous array was coded to 100
entries. If more than 100 resets were pending then the
array would overflow, and pointers to other structures would
be wiped out. This would cause segmentation faults. The new
arrays have one element for each switch chip/node in the
system, so the overflow will not occur.
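The sizing change can be illustrated with a small Python sketch, assuming hypothetical chip and node counts and 0-based indexes; the real switch recovery code is not shown in this APAR. The table gets one slot per switch chip and per node, so the number of pending resets can never exceed its capacity.

```python
class ResetTable:
    """Pending error-reset entries, one slot per switch chip and per node."""

    def __init__(self, num_chips: int, num_nodes: int) -> None:
        # Sized from the system configuration instead of a fixed 100 entries.
        self.num_chips = num_chips
        self.num_nodes = num_nodes
        self.slots = [None] * (num_chips + num_nodes)

    def mark_chip(self, chip: int, info: str) -> None:
        # Explicit bounds check: an invalid index is an error, never a silent overwrite.
        if not 0 <= chip < self.num_chips:
            raise IndexError(f"chip {chip} out of range")
        self.slots[chip] = info

    def mark_node(self, node: int, info: str) -> None:
        if not 0 <= node < self.num_nodes:
            raise IndexError(f"node {node} out of range")
        self.slots[self.num_chips + node] = info

    def pending(self) -> int:
        """Number of chips/nodes with a reset pending."""
        return sum(slot is not None for slot in self.slots)
```

A repeated reset for the same chip or node reuses its slot rather than consuming a new entry.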
------
APAR: IY18013 COMPID: 5765D2800 REL: 430
ABSTRACT: ON FALLOVER, STANDBY ADAPTER MARKED DOWN IS SOMETIMES SELECTED
PROBLEM DESCRIPTION:
The node has two standbys. A service address fails (unplugged).
Swap_adapter completes successfully and the standby is marked
down. Fallover occurs. It fails even though there is a
second standby marked up, because the standby that is down
is selected for the service address of the failed node.
PROBLEM SUMMARY:
The node has two standbys. A service address fails (unplugged).
Swap_adapter completes successfully and the standby is marked
down. Fallover occurs. It fails even though there is a
second standby marked up, because the standby that is down
is selected for the service address of the failed node.
PROBLEM CONCLUSION:
Modify clstrmgr so that it exports DOWN for standby adapters
that are down, instead of doing nothing, thus making it
compatible with HAS.
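Once clstrmgr exports DOWN for failed standbys, takeover logic can skip them. A minimal Python sketch of that selection rule, assuming a hypothetical Adapter record whose state field holds the exported "UP"/"DOWN" value (this is not HACMP's actual code):

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Adapter:
    name: str
    state: str  # "UP" or "DOWN", as exported by the cluster manager


def pick_standby(standbys: Sequence[Adapter]) -> Optional[Adapter]:
    """Return the first standby adapter whose exported state is UP, else None."""
    # A standby marked DOWN by an earlier swap_adapter is never chosen,
    # even if it appears first in the list.
    for adapter in standbys:
        if adapter.state == "UP":
            return adapter
    return None
```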
------
APAR: IY18023 COMPID: 5765B9501 REL: 320
ABSTRACT: INDIRECT BLOCKS LEFT AFTER TRUNCATING TO SMALL FILE
PROBLEM DESCRIPTION:
indirect blocks left after truncating to small file
PROBLEM SUMMARY:
Correct an error in maintaining the indirect level of a file
when truncated from a very large file to a very small
non-zero length
PROBLEM CONCLUSION:
Correctly handle the indirection level when truncating files
to a non-zero small size.
------
APAR: IY18025 COMPID: 5765B9501 REL: 320
ABSTRACT: DAEMON ASSERT REPDISKADDR::GETFROMARRAY + 0X9C
PROBLEM DESCRIPTION:
daemon assert repdiskaddr::getfromarray + 0x9c
PROBLEM SUMMARY:
Service tool caused a node panic when used.
PROBLEM CONCLUSION:
Correct a logic error in a data collection service tool.
------
APAR: IY18078 COMPID: 5765D5100 REL: 320
ABSTRACT: VSDVGTS -A TOO SLOW WITH MANY HDISKS AND VPATHS
PROBLEM DESCRIPTION:
vsdvgts -a too slow with many hdisks and vpaths
PROBLEM SUMMARY:
Several scalability performance problems have been
observed when a node has a large number of disks
and/or volume groups being managed by RVSD.
PROBLEM CONCLUSION:
The following changes have been made in RVSD to address
the performance problems observed when a node has
a large number of disks and/or volume groups.
- The vsdvgts command has been changed to not use
the lspv command to determine volume group membership.
- The RVSD recovery scripts will limit the number of
varyonvg/varyoffvg operations that can occur in parallel
in order to reduce ODM lock contention.
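Limiting how many volume-group operations run at once is a bounded-concurrency pattern. A minimal Python sketch, assuming a hypothetical run_limited helper and a default limit of 4 (the actual RVSD scripts and their limit are not given here):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_limited(volume_groups: Iterable[str], action: Callable[[str], str],
                max_parallel: int = 4) -> List[str]:
    """Apply `action` to each volume group, with at most `max_parallel` in flight.

    `action` would typically wrap a command such as
    subprocess.run(["varyonvg", vg], check=True); the pool size caps how many
    of those commands contend for ODM locks at the same time.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map returns results in input order.
        return list(pool.map(action, volume_groups))
```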
------
APAR: IY18108 COMPID: 5765D6100 REL: 220
ABSTRACT: LLSUBMIT API NOT CLOSING JOB COMMAND FILE
PROBLEM DESCRIPTION:
A user's program using the submit API runs out of file handles to
run jobs.
LOCAL FIX:
The user's code can be reorganized so that it does not try to
do more than 4 job submissions without shutting down.
PROBLEM SUMMARY:
The llsubmit API does not always close the job command file
PROBLEM CONCLUSION:
The llsubmit API needs to close the job command file after
successfully obtaining the data from the file.
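The fix is the usual "always release the descriptor" rule. In Python it can be sketched with a with block, which closes the file on both the success and the error path. read_job_command_file is a hypothetical name, not part of the LoadLeveler API.

```python
from typing import List


def read_job_command_file(path: str) -> List[str]:
    """Read a job command file and close it before returning."""
    # The `with` block closes the handle even if reading raises, so repeated
    # submissions in one process do not accumulate open file handles.
    with open(path, encoding="utf-8") as handle:
        return handle.read().splitlines()
```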
------
APAR: IY18125 COMPID: 5765B9501 REL: 320
ABSTRACT: ONLINE MMCHECKQUOTA: DEALLOC ASSERTS IN FIXSHADOWTABLEBLOCKCOUNT
PROBLEM DESCRIPTION:
online mmcheckquota: dealloc asserts in FixShadowTableBlockCount
PROBLEM SUMMARY:
GPFS self check logic terminated while running mmcheckquota
PROBLEM CONCLUSION:
Fix serialization error in mmcheckquota
------
APAR: IY18165 COMPID: 5765B9501 REL: 320
ABSTRACT: GPFS 1.5 SP /ASSERT IN SYNC.C DIRTYINDBUFS > 0 && IBDP != NULL
PROBLEM DESCRIPTION:
gpfs 1.5 sp /assert in sync.C dirtyindbufs >0 && ibdp != null
PROBLEM SUMMARY:
GPFS self check logic failed in sync.C line 2683
PROBLEM CONCLUSION:
In writeIndirect, wait until updateLogger mutex is held
before checking whether indirect block is dirty.
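The ordering rule (take the mutex first, then test the dirty flag) can be sketched in Python. IndirectBlock is a hypothetical class, and threading.Lock stands in for the updateLogger mutex; this is not GPFS code.

```python
import threading
from typing import Callable


class IndirectBlock:
    def __init__(self) -> None:
        self.dirty = False
        self.update_logger = threading.Lock()  # stands in for the updateLogger mutex

    def mark_dirty(self) -> None:
        with self.update_logger:
            self.dirty = True

    def write_indirect(self, write: Callable[["IndirectBlock"], None]) -> bool:
        """Write the block if dirty; return True if a write was done."""
        # Checking `dirty` before acquiring the mutex would let another thread
        # change it between the check and the write.
        with self.update_logger:
            if not self.dirty:
                return False
            write(self)
            self.dirty = False
            return True
```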
------
APAR: IY18168 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT FAILED HANDLEREQ.C LINE 2598
PROBLEM DESCRIPTION:
assert failed handlereq.c line 2598
PROBLEM SUMMARY:
GPFS self check logic failed at
HandleReq.C line 2598
PROBLEM CONCLUSION:
The token reclaiming flag needs to be turned off when the
token is put back in the STABLE state.
------
APAR: IY18170 COMPID: 5765B9501 REL: 320
ABSTRACT: REMOVE ID AND LOG RECORDS FROM SHIPPED FILES
PROBLEM DESCRIPTION:
remove id and log records from shipped files
PROBLEM SUMMARY:
Minor packaging changes
------
APAR: IY18206 COMPID: 5765D9300 REL: 310
ABSTRACT: CHANGE LIGHT WEIGHT CORE FILE DESIGN TO BE ABLE TO NOT PRODUCE
PROBLEM DESCRIPTION:
When a user does an llcancel and is using light weight core
files, a LWCF is produced. APAR IY15826 was taken as a DCR
to change the design in a future release. This APAR will
retrofit the fix into Parallel Environment 3.1.
PROBLEM SUMMARY:
When a user does an llcancel and is using light weight
core files then a light weight core file is produced.
The user would like a way to not get the light weight
core files produced on an llcancel.
PROBLEM CONCLUSION:
A design change was made that will add a new environment
variable to control the generation of a light weight
core file when a SIGTERM is received by POE.
llcancel generates a SIGTERM to POE on an
interactive POE job.
The following POE environment variable is now recognized:
export MP_COREFILE_SIGTERM={YES|NO}. The default is YES.
Set the environment variable to NO (case insensitive);
if light weight core files are being specified, then
no light weight core files will be produced on a SIGTERM.
There is also a new command line argument to POE
-corefile_sigterm that can be used. The default is YES.
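The environment-variable rule (default YES, case-insensitive NO disables the core file) can be sketched in Python. This is an illustration only; corefile_on_sigterm is hypothetical, and treating any value other than NO as YES is an assumption not stated in the APAR.

```python
import os
from typing import Mapping, Optional


def corefile_on_sigterm(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if a light weight core file should be written on SIGTERM."""
    env = os.environ if env is None else env
    # Default is YES; NO is matched case-insensitively. Other values are
    # treated as YES here (assumption).
    value = env.get("MP_COREFILE_SIGTERM", "YES").strip().upper()
    return value != "NO"
```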
------
APAR: IY18233 COMPID: 5765D5100 REL: 320
ABSTRACT: PSSP_SCRIPT SHOULD SUPPORT NON-ENGLISH LOCALES
PROBLEM DESCRIPTION:
pssp_script (PSSP 3.2) should support non-English locales.
+765 bis_adap_addr=${bis_adap_addr#*is } #- Strip leading stuff
+766 bis_adap_addr=${bis_adap_addr%%,*} #- Strip following
These lines assume that the host output contains the English word "is".
The config.log file contains the following error:
+ bis_adap_addr=wg101682 ist 164.17.10.10, Aliases: wg101682
get_eff_addr[25]: wg101682: 0403-009 Die angegebene Nummer ...
(In German, "ist" means "is" and "Die angegebene Nummer" means
"The specified number".)
LOCAL FIX:
As a circumvention,
edit the pssp_script lines containing "**is " for THIS SPECIFIC problem:
bis_adap_addr=$ bis_adap_addr#*is
change to:
bis_adap_addr=$ bis_adap_addr#**is
PROBLEM SUMMARY:
When customizing a node, if LC_ALL is set to something
other than en_US (e.g. de_DE), the customization will fail.
pssp_script will issue a message from its get_eff_addr
routine stating that an invalid number was supplied for an
IP address.
The problem is caused by pssp_script issuing a "host"
command, and then doing a Korn shell pattern match based
on the word "is". That will fail in non-English locales.
************************************************************
* USERS AFFECTED: Users whose installation default *
* language locale is not English. *
************************************************************
* PROBLEM DESCRIPTION: Node customization will fail in *
* get_eff_addr complaining about bad data in a hostname. *
************************************************************
* RECOMMENDATION: pssp_script must be changed so all *
* its internal calls generate output in the "C" locale. *
************************************************************
PROBLEM CONCLUSION:
pssp_script has been modified to use LC_ALL=C on its calls
to commands that return character strings, so that the
output is in English.
This allows the pattern-matching operators that follow
these commands to find the English word(s) they are looking
for.
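Why the pattern match breaks, and why forcing the C locale fixes it, can be sketched in Python. extract_address emulates the two Korn shell expansions quoted in the problem description, using the host output format seen in the config.log excerpt; c_locale_env is a hypothetical helper for running a command such as "host" with LC_ALL=C (for example via subprocess.run(..., env=c_locale_env())). Neither helper is taken from pssp_script.

```python
import os
from typing import Dict, Mapping, Optional


def extract_address(host_output: str) -> str:
    """Emulate ${var#*is } followed by ${var%%,*} on host command output."""
    # ${var#*is }: drop the shortest prefix ending in "is "; no match leaves
    # the string unchanged (which is what happens with German "ist").
    marker = "is "
    idx = host_output.find(marker)
    if idx != -1:
        host_output = host_output[idx + len(marker):]
    # ${var%%,*}: drop the longest suffix starting at a comma.
    return host_output.split(",", 1)[0]


def c_locale_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with LC_ALL=C, so command output is in English."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"
    return env
```

With English output the address is extracted; with German output the whole "wg101682 ist 164.17.10.10" string survives and is later rejected as an invalid number.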
------
APAR: IY18320 COMPID: 5765D5100 REL: 320
ABSTRACT: RESET CLEARS DATA THAT MAY BE NEEDED FOR CHECKSTOP ANALYSIS
PROBLEM DESCRIPTION:
During adapter reset certain registers are cleared that may be
needed for debug.
SP-Switch2 only.
PROBLEM SUMMARY:
A node may checkstop when reading SRAM on a
snap after a critical adapter MIC error has occurred and the
adapter has been reset.
PROBLEM CONCLUSION:
Don't read SRAM on critical MIC adapter
errors.
------
APAR: IY18322 COMPID: 5765D5100 REL: 320
ABSTRACT: BITS INCORRECTLY TURNED ON DURING ADAPTER RESET
PROBLEM DESCRIPTION:
Bits are incorrectly set on the SP-Switch2 adapter during
critical adapter recovery.
PROBLEM SUMMARY:
MIC adapter error bit 23 caused by MIC chip
reset can override true adapter errors.
PROBLEM CONCLUSION:
We will ignore MIC bit 23 when identified
along with any other critical adapter error.
------
APAR: IY18326 COMPID: 5765D5100 REL: 320
ABSTRACT: EMASTERD TAKES A LONG TIME ON LARGE SYSTEMS.
PROBLEM DESCRIPTION:
emasterd takes a long time (up to 20 minutes) to establish the
emaster on a large system.
PROBLEM SUMMARY:
On large SP Switch2 systems, the
assignment of a new MSS node can take several minutes;
this is too long a time for the system to be deprived of a
synchronized switch clock.
PROBLEM CONCLUSION:
Changes were made to emasterd to
speed up MSS failover on large systems.
------
APAR: IY18337 COMPID: 5765D5100 REL: 320
ABSTRACT: REENABLE DIAGS EXECUTION AT CONFIG
PROBLEM DESCRIPTION:
reenable diags execution at config
PROBLEM SUMMARY:
Diags execution was disabled in the SP Switch2 adapter
configuration method to avoid triggering another problem
(since solved) which checkstopped the node.
PROBLEM CONCLUSION:
diags execution reenabled in SP Switch2
adapter configuration method.
------
APAR: IY18351 COMPID: 5765D5100 REL: 320
ABSTRACT: INCORRECT MESSAGE RETURNED WHEN POWERING ON CONDOR NODES
PROBLEM DESCRIPTION:
Incorrect message returned when powering on Condor Nodes
PROBLEM SUMMARY:
condor nodes require more time to power on
than most other nodes. The hmcmds command needs to be
sensitive to this requirement and wait a longer period of time
for a power on to occur before it reports that the power on
failed.
PROBLEM CONCLUSION:
Modify hmcmds to wait a longer period
of time during a power on sequence before it reports a
failure.
------
APAR: IY18354 COMPID: 5765B9501 REL: 320
ABSTRACT: MMDELDISK STOPS WHEN IT FINDS BROKEN DISK ADDRESS
PROBLEM DESCRIPTION:
MMDELDISK STOPS WHEN IT FINDS BROKEN DISK ADDRESS
PROBLEM SUMMARY:
mmdeldisk terminates when it finds a disk
block which cannot be read.
PROBLEM CONCLUSION:
mmdeldisk should continue when
encountering a bad disk block and mark the indirect block
pointing at the bad disk block as bad.
------
APAR: IY18369 COMPID: 5765D6100 REL: 220
ABSTRACT: CLASS STATEMENT DOES NOT ALLOW ENOUGH SPACE/NEW FORMAT
PROBLEM DESCRIPTION:
The LoadL Class statement in the LoadL_config.local file is limited
to 1024 characters. With the new hardware and number of
processors, this does not allow larger numbers of classes to be listed.
Currently you need to list the class for each instance, and you will
run out of room with large class names or a large number of classes.
A new format corrects this concern and allows a greater number of
classes.
PROBLEM SUMMARY:
The LoadL Class statement in the LoadL_config.local file
does not allow large numbers of classes to be listed.
Currently you need to list the class for each instance and
will run out of room with large class names or a large
number of classes.
The new format addresses this concern and allows a greater
number of classes.
PROBLEM CONCLUSION:
Two formats for the CLASS statement will be accepted.
old format:
CLASS = { "Class_A" "Class_A" "Class_B" }
new format:
CLASS = Class_A(2) Class_B(1)
The new format will allow more class instances to be
specified and make them easier to specify.
If "{" or "}" is detected in the CLASS statement, it will
be processed according to the old format; otherwise, it
will be processed according to the new format.
The following applies to the new format:
Each class can have only one entry.
If a class has more than one entry or there is a syntax
error, the whole CLASS statement will be ignored.
No_Class(1) will be added if there is no valid user input
from the CLASS statement.
Whitespace is free-form in the new format.
The number of instances for a class, specified inside (),
must be an unsigned integer.
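The parsing rules above can be sketched as a small Python parser. It is an illustration, not LoadLeveler's implementation: the allowed class-name characters ([A-Za-z0-9_.-]) are an assumption, and the No_Class(1) fallback is applied only to the new format, as the rules state.

```python
import re
from collections import Counter
from typing import Dict

# One new-format entry: Name(count), with free-form whitespace around the parts.
_ENTRY = re.compile(r"\s*([A-Za-z0-9_.-]+)\s*\(\s*(\d+)\s*\)\s*")
_FALLBACK = {"No_Class": 1}


def parse_class_statement(value: str) -> Dict[str, int]:
    """Return {class name: instance count} for a CLASS statement value."""
    if "{" in value or "}" in value:
        # Old format: each quoted name is one instance of that class.
        return dict(Counter(re.findall(r'"([^"]*)"', value)))
    counts: Dict[str, int] = {}
    pos = 0
    while pos < len(value):
        match = _ENTRY.match(value, pos)
        if match is None:
            if value[pos:].strip() == "":
                break
            return dict(_FALLBACK)  # syntax error: ignore the whole statement
        name, number = match.group(1), int(match.group(2))
        if name in counts:
            return dict(_FALLBACK)  # duplicate entry: ignore the whole statement
        counts[name] = number
        pos = match.end()
    return counts or dict(_FALLBACK)
```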
------
APAR: IY18378 COMPID: 5765B9501 REL: 320
ABSTRACT: DEADLOCK RESTRIPING A METAFILE.
PROBLEM DESCRIPTION:
deadlock restriping metadata.
PROBLEM SUMMARY:
Deadlock when running restripe command.
PROBLEM CONCLUSION:
Correct locking error in GPFS
------
APAR: IY18485 COMPID: 5765D9300 REL: 310
ABSTRACT: BUG IN CONVERTING FILE OFFSET TO BYTE DISPLACEMENT
PROBLEM DESCRIPTION:
bug in converting file offset to byte displacement
PROBLEM SUMMARY:
The offset in MPI-IO interfaces is expressed in the number
of etypes. This should always be converted into a byte
displacement from file displacement before accessing data.
In an optimization for certain data types, the MPI library
failed to handle the conversion correctly.
PROBLEM CONCLUSION:
The optimization code has been changed so offset in MPI-IO
interfaces will be converted into byte offset correctly.
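The offset-to-byte conversion can be worked through with a simplified Python model. Assumption: the filetype is a repeating tile of tile_extent bytes whose first etypes_per_tile etypes are visible and contiguous, and the rest of the tile is a hole. Real MPI-IO filetypes can be arbitrary derived datatypes, and this is not the MPI library's code.

```python
def etype_offset_to_byte(disp: int, offset: int, etype_size: int,
                         etypes_per_tile: int, tile_extent: int) -> int:
    """Map an MPI-IO offset (counted in etypes) to an absolute byte position.

    Simplified view model: whole tiles are skipped by their full extent
    (holes included), then the remaining etypes are counted inside one tile.
    A contiguous view has etypes_per_tile * etype_size == tile_extent.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    whole_tiles, within_tile = divmod(offset, etypes_per_tile)
    return disp + whole_tiles * tile_extent + within_tile * etype_size
```

For a contiguous view this reduces to disp + offset * etype_size; the error the APAR describes corresponds to skipping that conversion in one code path.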
------
APAR: IY18486 COMPID: 576554300 REL: 240
ABSTRACT: BUG IN CONVERTING FILE OFFSET TO BYTE DISPLACEMENT
PROBLEM DESCRIPTION:
bug in converting file offset to byte displacement
PROBLEM SUMMARY:
The offset in MPI-IO interfaces is expressed in the number
of etypes. This should always be converted into a byte
displacement from file displacement before accessing data.
In an optimization for certain data types, the MPI library
failed to handle the conversion correctly.
PROBLEM CONCLUSION:
The optimization code has been changed so offset in MPI-IO
interfaces will be converted into byte offset correctly.
------
APAR: IY18487 COMPID: 5765D5101 REL: 111
ABSTRACT: HAEM DAEMON PRIORITY LIKE HATS_PRIORITY
PROBLEM DESCRIPTION:
To stop haem from core dumping, development has recommended
implementing the same mechanism that hags uses, which is to set
the priority of the daemon to hats_priority + 1.
haem was not responding to the hags daemon within the allowed
time (2 minutes). When haem detects this, it terminates itself
and respawns, and the problem goes away until it again fails
to respond to hags for 2 minutes.
haemd did not get enough CPU cycles because it runs with the
PROBLEM SUMMARY:
If the system is heavily loaded, there might be occasions
when the haem daemon does not get enough resources to be able
to respond to hags within two minutes. When this occurs,
haemd terminates and respawns.
PROBLEM CONCLUSION:
The priority of the haemd process and its resource
monitors are now equal to:
(FixPri_Value attribute of the TS_Config class) + 1.
In other words, haemd priority is set to hats_priority + 1
(which is what hags is using). This will minimize the
possibility of haem not being able to respond to hags within
2 minutes, since they are now both set to the same priority.
------
APAR: IY18488 COMPID: 5765B9501 REL: 320
ABSTRACT: INCREASEINDIRECTIONLEVEL LIVELOCK RESERVING LOGSPACE
PROBLEM DESCRIPTION:
increase indirection level livelock reserving logspace
PROBLEM SUMMARY:
fix potential deadlock discovered in
development.
------
APAR: IY18492 COMPID: 5765D5100 REL: 320
ABSTRACT: HARDMON ERRORS AFTER APPLYING APAR IY16350
PROBLEM DESCRIPTION:
APAR IY16350 in ptf set 8 introduced new code for hardware
support. The post_u script that is run during the ptf install
will add entries to the /spdata/sys1/spmon/hmthresholds file
only if the node number is 0. The cws will only have a node
number of 0 if install_cw has been run. On new CWS installs
that apply the ssp code and then PTF 8 before running install_cw,
the post_u script will not run and hmthresholds will not be
updated, leading to errors 0026-612, 0026-409, and 0026-405.
LOCAL FIX:
run /usr/lpp/ssp/ssp.basic/3.2.0.9/inst_root/ssp.basic.post_u
on cws after running install_cw
PROBLEM SUMMARY:
In cases where PTF 8 has been installed on a CWS, prior to
install_cw being run, hmthresholds will not be updated with
entries for the M80. This will result in the CWS not being
able to communicate with hardmon.
PROBLEM CONCLUSION:
post_process has been modified to check hmthresholds for
entries required for the M80 and to add the entries if they
do not exist.
------
APAR: IY18494 COMPID: 5765D5100 REL: 320
ABSTRACT: S1 TTY VALUE NOT UPDATED WITH SPFRAME
PROBLEM DESCRIPTION:
s1 tty value not updated with spframe
PROBLEM SUMMARY:
This conditional was chosen to avoid a double negative, but
it doesn't work as expected for double quotes ("").
PROBLEM CONCLUSION:
Changing the conditional now handles the s1tty code path
properly for a defined tty and for an undefined one ("").
------
APAR: IY18515 COMPID: 5765D2800 REL: 430
ABSTRACT: CLINFO EXITS WHEN NOFILES IS SET TO VALUE GREATER THAN 2000.
PROBLEM DESCRIPTION:
Clinfo exits when nofiles is set to a value greater than
2000.
LOCAL FIX:
set nofiles to 2000 or less.
PROBLEM CONCLUSION:
Modify clinfo so that it is able to handle __NUM_ENTRIES
number of open files.
------
APAR: IY18527 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT SUBROUTINE FAILED: !UNPINSOMEBUFFER: NO BUFFERS FOUND
PROBLEM DESCRIPTION:
assert subroutine failed: !unpinSomeBuffer: no buffers found
PROBLEM SUMMARY:
Remove incorrect assert. When other threads are pinning
and unpinning, the accounting cannot prevent this assertion
from happening. Just wait a little while and try again.
PROBLEM CONCLUSION:
Remove incorrect assert. When other threads are pinning
and unpinning, the accounting cannot prevent this assertion
from happening. Just wait a little while and try again.
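"Wait a little while and try again" in place of an assertion can be sketched in Python as a retry loop. retry_until_found is a hypothetical helper; the delay and the attempt limit are assumptions, since the APAR does not say whether GPFS bounds the retries.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_found(find: Callable[[], Optional[T]], delay: float = 0.01,
                      attempts: int = 100) -> Optional[T]:
    """Call `find` until it returns a value, sleeping between tries.

    A None result is treated as transient (other threads may be pinning and
    unpinning buffers) rather than as a fatal condition. Returns None if
    every attempt fails.
    """
    for _ in range(attempts):
        result = find()
        if result is not None:
            return result
        time.sleep(delay)
    return None
```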
------
APAR: IY18601 COMPID: 5765B9501 REL: 320
ABSTRACT: INODE PREFETCH LOOPING
PROBLEM DESCRIPTION:
inode prefetch looping
PROBLEM SUMMARY:
Infinite loop under rare conditions found in
development.
PROBLEM CONCLUSION:
In InodePretchInstance::WorkerThreadBody, the nIdle
variable was not initialized, resulting in the prefetch
thread spinning.
------
APAR: IY18606 COMPID: 5765D5100 REL: 320
ABSTRACT: RETRIEVE NODE DEVICE CONFIG INFO ON SNAPS
PROBLEM DESCRIPTION:
retrieve node device config info on snaps
PROBLEM SUMMARY:
The css.snap command was modified to include output
from the "lsattr -E -l css0" and "lsattr -E -l css1"
commands.
PROBLEM CONCLUSION:
The css.snap command needs to provide additional
information.
------
APAR: IY18628 COMPID: 5765D2800 REL: 430
ABSTRACT: HAES: CLSWAPADDRESS CAUSES TWO SWAP_ADAPTER EVENTS TO BE RUN
PROBLEM DESCRIPTION:
When customer issued Swap Network Adapter from the smit panel,
another swap_adapter event started while the initial one was
still executing. The second one ended with a script error and
went into config_too_long.
PROBLEM SUMMARY:
When customer issued Swap Network Adapter from the smit panel,
another swap_adapter event started while the initial one was
still executing. The second one ended with a script error and
went into config_too_long.
PROBLEM CONCLUSION:
Corrected the calls to cl_hats_adapter in cl_swap_IP_address
and cl_swap_ATM_IP_address such that clstrmgr recognizes the
grace period for the swap currently running.
------
APAR: IY18650 COMPID: 5765D2800 REL: 430
ABSTRACT: HACMP.OUT AND CSPOC.LOG HAS TRANSLATED MESSAGE
PROBLEM DESCRIPTION:
If LANG is set to point to a non-English message catalog, the
HA log files may contain non-English text. This can make it
difficult to receive customer support in the event of problems.
PROBLEM CONCLUSION:
This is a change from the solution posted above:
Add 'LANG=C' on the command line when clgetif is called.
De-internationalize clsetenvgrp. This program is not intended
to be run by users, and thus none of its output should be
showing up anywhere other than the logs. The command does not
have a man page or any other user documentation, which means
that this solution should be appropriate.
------
APAR: IY18651 COMPID: 5765D2800 REL: 430
ABSTRACT: MOUNT POINT NOT REMOVED WHEN REMOVING FILESYSTEM WITH C-SPOC -
PROBLEM DESCRIPTION:
When the customer used C-SPOC to remove a filesystem and
specified to also remove the mount point, the mount point was
removed only on the node where the vg was varied on and not
on any of the other cluster nodes participating in the volume
group.
PROBLEM CONCLUSION:
If the -r flag is specified, run rmdir of the mount point on the
cluster nodes with try_parallel.
------
APAR: IY18654 COMPID: 5765B9501 REL: 320
ABSTRACT: MMDELDISK FAILS WITH NOT ENOUGH MEMORY
PROBLEM DESCRIPTION:
mmdeldisk fails with not enough memory
PROBLEM SUMMARY:
E_NOMEM error received when running mmdeldisk
PROBLEM CONCLUSION:
Correct memory leak in the token manager
when running mmdeldisk.
------
APAR: IY18673 COMPID: 5765B9501 REL: 320
ABSTRACT: SLOW NFS FLUSH_RANGE CALLS
PROBLEM DESCRIPTION:
slow nfs flush_range calls
PROBLEM SUMMARY:
NFS performance improvement
PROBLEM CONCLUSION:
Streamline one piece of the data flush when
running NFS writes.
------
APAR: IY18695 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY ADAPTER CONFIG PROBLEM
PROBLEM DESCRIPTION:
colony adapter config problem
PROBLEM SUMMARY:
There was a timing problem that only manifested itself on
old hardware during the css0 adapter configuration.
PROBLEM CONCLUSION:
By removing 3 of 4 reads (put in place to confirm the
previous write) during RDRAM reset, the timing issue was
resolved.
------
APAR: IY18697 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY:FIXES/DEBUG ITEMS
PROBLEM DESCRIPTION:
colony; fixes/debug items
PROBLEM SUMMARY:
While working on some problems at a customer's shop, I
discovered some paths through the code that shouldn't be
executed, but were NOT explicitly prevented by the code. I
also added some debug code which didn't negatively affect
performance.
PROBLEM CONCLUSION:
I altered the code to explicitly prevent some of the paths
in the code from being illegally executed. I added some of
the debug code that I developed while working on some field
problems.
------
APAR: IY18701 COMPID: 5765D5100 REL: 320
ABSTRACT: ERROR UNDEFINING VSD FOR GHOST VSDS
PROBLEM DESCRIPTION:
error undefining vsd for ghost vsds
PROBLEM SUMMARY:
The undefvsd command is not undefining vsds. A non-zero
return code is returned, but no error message is issued.
PROBLEM CONCLUSION:
The code is missing a define that is necessary once
the rvsdrestrict level is set to RVSD3.2.0.4.
------
APAR: IY18742 COMPID: 5765D5100 REL: 311
ABSTRACT: PSSP_SCRIPT SHOULD SUPPORT NON-ENGLISH LOCALES
PROBLEM DESCRIPTION:
pssp_script (PSSP 3.2) should support non-English locales.
+765 bis_adap_addr=${bis_adap_addr#*is } #- Strip leading stuff
+766 bis_adap_addr=${bis_adap_addr%%,*} #- Strip following
Line 765 is biased toward the English word "is " (the "*is " pattern).
The config.log file contains the following error:
+ bis_adap_addr=wg101682 ist 164.17.10.10, Aliases: wg101682
get_eff_addr[25]: wg101682: 0403-009 Die angegebene Nummer ... (German: "The specified number ...")
LOCAL FIX:
As a circumvention for THIS SPECIFIC problem, edit the pssp_script
lines that contain "**is ":
bis_adap_addr=${bis_adap_addr#*is }
change to:
bis_adap_addr=${bis_adap_addr#**is}
PROBLEM SUMMARY:
When customizing a node, if LC_ALL is set to something
other than en_US (e.g. de_DE), the customization will fail.
pssp_script will issue a message from its get_eff_addr
routine stating that an invalid number was supplied for an
IP address.
The problem is caused by pssp_script issuing a "host"
command, and then doing a Korn shell pattern match based
on the word "is". That will fail in non-English locales.
************************************************************
* USERS AFFECTED: Users whose installation default *
* language locale is not English. *
************************************************************
* PROBLEM DESCRIPTION: Node customization will fail in *
* get_eff_addr complaining about bad data in a hostname. *
************************************************************
* RECOMMENDATION: pssp_script must be changed so all *
* its internal calls generate output in the "C" locale. *
************************************************************
PROBLEM CONCLUSION:
pssp_script has been modified to use LC_ALL=C on its calls
to functions that return character strings so that the
output is in English.
This allows the pattern-matching operators that follow
these commands to find the English word(s) they are looking
for.
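The failing pattern match can be reproduced with a small POSIX shell sketch. extract_addr and lookup_addr are hypothetical names, and the real pssp_script code may differ. extract_addr applies the `#*is ` and `%%,*` operators from the pssp_script lines quoted in the problem description to a host output line in the format shown in config.log; lookup_addr shows the approach described in the conclusion, running the command under LC_ALL=C before matching.

```shell
#!/bin/sh
# Sketch only: parse host output in the format seen in config.log, e.g.
#   "wg101682 is 164.17.10.10, Aliases: wg101682"
extract_addr() {
    addr=${1#*is }      # strip the shortest prefix ending in the English word "is "
    addr=${addr%%,*}    # strip the first comma and everything after it
    printf '%s\n' "$addr"
}

# Force the C locale for the command whose output is pattern-matched,
# so the English word "is" appears regardless of the user's LC_ALL.
lookup_addr() {
    out=$(LC_ALL=C host "$1") || return 1
    extract_addr "$out"
}
```

With the German output ("wg101682 ist 164.17.10.10, Aliases: wg101682") the `*is ` prefix never matches, so nothing is stripped and get_eff_addr receives "wg101682 ist 164.17.10.10" instead of an address, which is consistent with the 0403-009 error in config.log.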
------
APAR: IY18773 COMPID: 5765D5100 REL: 311
ABSTRACT: EM.DEFAULT LOG HAS WARNING MSG CONCERNING LIBSDR.A BEING LINKED
PROBLEM DESCRIPTION:
The following warning message appears in the /var/ha/log/em.default log:
Error!!! The SDR has detected that multiple copies of itself
have been linked into this program. This happens if a library or
the binary has been linked with a static copy of libSDR.a and
shared libraries are used that reference the shared SDR library,
libSDRs.a. This program must be re-linked!
The way EM uses the SDR library was changed in 3.1.1, causing
this problem.
PROBLEM SUMMARY:
As a result of the process flow in haemd, when a call is
made to SDROpenSession, the following messages are
erroneously written to the em.default log in /var/ha/log/:
Error!!! The SDR has detected that
multiple copies of itself have been linked into this
program.
This happens if a library or the binary has been linked with
a static copy of libSDR.a and shared libraries are used that
reference the shared SDR library, libSDRs.a. This program
must be re-linked!
There is actually no error condition in this situation,
but SDRCloseSession has failed to clear an environment
variable which causes the message to be written to the log.
PROBLEM CONCLUSION:
SDRCloseSession was modified to clear the environment
variable which is used to track if multiple copies of the
SDR have been linked into a program.
------
APAR: IY18904 COMPID: 5765B9500 REL: 130
ABSTRACT: SPBGADM STILL NOT BEING ADDED TO /ETC/SYSCTL.MMCMD.ACL FILE
PROBLEM DESCRIPTION:
The SPbgAdm entry is still not being added to the
/etc/sysctl.mmcmd.acl file. The reason is that the root part
of mmfs.gpfs.rte did not ship because there were no modified
shippable parts. The fix for IY16838 involved changes to a PTF
install processing script, which is not considered a shippable
part. We need to force the root part of mmfs.gpfs.rte to ship
even though no shippable parts have changed.
LOCAL FIX:
Add the following line to the /etc/sysctl.mmcmd.acl file
PROBLEM SUMMARY:
***********************************************************
* USERS AFFECTED: Users migrating to GPFS 1.3 *
***********************************************************
* PROBLEM DESCRIPTION: *
* The root.SPbgAdm line is not being added to the *
* /etc/sysctl.mmcmd.acl *
* file. This results in GPFS commands failing with *
* insufficient authorization messages. *
***********************************************************
* RECOMMENDATION: Install this APAR *
***********************************************************
TEMPORARY FIX:
*************************************************************
* Manually add the SPbgAdm line to the /etc/sysctl.mmcmd.acl*
* file. For example: *
*_PRINCIPAL root.SPbgAdmj *
*************************************************************
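The manual edit can be made repeatable with a small shell helper. add_acl_line is a hypothetical name, not a GPFS or PSSP command: it appends an entry to an ACL file only if that exact line is not already present, so running it twice does not duplicate the entry. No specific entry text is assumed here; take it from the TEMPORARY FIX for your installation.

```shell
#!/bin/sh
# Hypothetical helper (not a GPFS/PSSP command): append ENTRY to ACL_FILE
# unless an identical line is already there.
# Usage: add_acl_line ACL_FILE ENTRY
add_acl_line() {
    acl_file=$1
    entry=$2
    # -q: quiet, -x: match the whole line, -F: treat ENTRY as a fixed string
    if [ -f "$acl_file" ] && grep -qxF -- "$entry" "$acl_file"; then
        return 0        # entry already present; leave the file unchanged
    fi
    printf '%s\n' "$entry" >> "$acl_file"
}
```

For example: `add_acl_line /etc/sysctl.mmcmd.acl "<entry from the TEMPORARY FIX>"`.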
------
APAR: IY18963 COMPID: 576554300 REL: 240
ABSTRACT: INSUFFICIENT STACK FOR KICKPIPES() IN MPCI, CAUSES A PROBLEM
PROBLEM DESCRIPTION:
PSSP 3.1.1 introduced the local variables 'shoveq' and 'frq' in
kickpipe() in MPCI. They need an 8192-byte stack frame, but only
4096 bytes are available. This causes Informix to go down.
PROBLEM SUMMARY:
Informix running on an SP system can fail if a query with
2000 ORs is run. Informix may detect a corruption of the
header of its stack block pool, and quit.
PROBLEM CONCLUSION:
MPCI, which is used by Informix, added a
couple of large stack variables for shared memory support.
This causes a problem for Informix because Informix both uses
MPCI and manages its own threading and stacks. Informix's
current management does not account for the addition of 8K of
additional stack space for the MPCI routines that Informix
calls. MPCI changed the declaration of these new large
variables so that they are now located in the heap, instead
of the stack. The new MPCI implementation solves the problem
that Informix had working with our MPCI environment and is
probably the better way for MPCI to handle these large
variables.
------
APAR: IY18967 COMPID: 5765B9500 REL: 130
ABSTRACT: README UPDATE RELATIVE TO IBM ESS STORAGE
PROBLEM DESCRIPTION:
README update relative to IBM ESS Storage
------
APAR: IY19005 COMPID: 5765C3403 REL: 430
ABSTRACT: BOS.RTE.INSTALL 4.3.3.51 SHOULD NOT REQUIRE 4.3.3.50
PROBLEM DESCRIPTION:
PTF U475744 (bos.rte.install 4.3.3.51) should install
on any level 4.3 system without any requisite to
previous level bos.rte.install updates. However, it
has a requisite to the 4.3.3.50 level.
PROBLEM CONCLUSION:
Supersede the bos.rte.install 4.3.3.51 level update with
the 4.3.3.52 level.
------
APAR: IY19044 COMPID: 5765D5100 REL: 320
ABSTRACT: PERF. TUNE PAAM BUG FIX AND ADD MX SUPPORT
PROBLEM DESCRIPTION:
Perf. Tune PAAM bug fix and add MX support
PROBLEM SUMMARY:
Performance degradation after PAAM fix - Colony
Possible PAAM bug in TB3MX
PROBLEM CONCLUSION:
Performance tune KHAL - Colony and TB3MX
Add full PAAM fix for TB3MX user-space
------
APAR: IY19045 COMPID: 5765D5100 REL: 320
ABSTRACT: COREDUMP WHEN USE -P OPTION IN THE IFCL_DUMP
PROBLEM DESCRIPTION:
core dump when the -p option is used with ifcl_dump
PROBLEM SUMMARY:
The ifcl_dump command will core dump when it is run with
the '-p' flag (/usr/lpp/ssp/css/ifcl_dump -p).
PROBLEM CONCLUSION:
The ifcl_dump command has been changed to prevent a core
dump from occurring when ifcl_dump is run with the -p flag.
------
APAR: IY19046 COMPID: 5765D5100 REL: 320
ABSTRACT: 4M/2M SRAM SUPPORT
PROBLEM DESCRIPTION:
4M/2M SRAM support
PROBLEM SUMMARY:
The switch IP driver needs to be able to support both
SP Switch 2 adapters that have 2 megabytes of SRAM, as
well as adapters that have 4 megabytes of SRAM.
PROBLEM CONCLUSION:
The switch IP driver, if_cl, has been changed to support
SP Switch-2 adapters that have 2 megabytes of SRAM, as
well as adapters that have 4 megabytes of SRAM.
------
APAR: IY19047 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY: DUMP.S NEEDS TO FLUSH ALL 2M OF SRAM
PROBLEM DESCRIPTION:
COLONY: dump.s needs to flush all 2M of SRAM
PROBLEM SUMMARY:
The dump.s routine needs to flush all 2M of SRAM (or the
higher 2M of SRAM, if you are working with a 4M adapter).
Previously, it flushed only 512K. This meant that under
certain conditions, the microcode logs would include stale
values of some of its variables rather than the values the
microcode was actually using, which were still in the
cache.
PROBLEM CONCLUSION:
I altered dump.s to flush the entire 2M SRAM (or high order
2M of SRAM in a 4M adapter).
------
APAR: IY19048 COMPID: 5765D5100 REL: 320
ABSTRACT: DELETE EXCEPTION HANDLING CODE FROM COLONY DRIVER SOURCE
PROBLEM DESCRIPTION:
delete exception handling code from colony driver source
PROBLEM SUMMARY:
The exception handling code for the SP Switch 2 adapter is
redundant as the adapter will checkstop the node on errors.
The redundant code and related branches were removed.
PROBLEM CONCLUSION:
The exception handling code for the SP-Switch 2 adapter is
redundant as the adapter will checkstop the node on errors.
The redundant code and related branches were removed.
------
APAR: IY19300 COMPID: 5765D5100 REL: 320
ABSTRACT: LAPI/KLAPI PERFORMANCE PROBLEM
PROBLEM DESCRIPTION:
LAPI/KLAPI performance problem after PTF9 on SP-Switch2.
PROBLEM SUMMARY:
KLAPI performance suffers with PTF9.
LAPI performance is not on par with MPI.
PROBLEM CONCLUSION:
With this fix, LAPI US performance should be as good as MPI.
This fix also resolves the KLAPI performance problem.
------
APAR: IY19310 COMPID: 5765E6110 REL: 220
ABSTRACT: REQUIRED UPDATES FOR RSCT VERSION 2.2
PROBLEM DESCRIPTION:
Required updates for RSCT Version 2.2
PROBLEM SUMMARY:
These updates must be applied if you are using
WebSM or have the PSSP or HACMP/ES products installed.
PROBLEM CONCLUSION:
These updates must be applied if you are
using WebSM or have the PSSP or HACMP/ES products installed.
------
APAR: IY19314 COMPID: 5765D5100 REL: 320
ABSTRACT: SPFRAME MANPAGE NEEDS UPDATE FOR CSP OPTION
PROBLEM DESCRIPTION:
spframe manpage needs to be updated for csp option
PROBLEM SUMMARY:
Support was added with APAR IY16350 in ssp.basic 3.2.0.9 for
the RS/6000 H80 and M80. This support includes a new
hardware protocol of csp on the spframe command. The man
page for spframe needs to be updated with this information.
PROBLEM CONCLUSION:
Updated the spframe man page with the new syntax for
the RS/6000 H80 and M80, which is:
spframe -p CSP -n starting_switch_port -r yes | no
start_frame frame_count starting_tty_port
CSP was also added to the list of valid hardware protocols.
------
APAR: IY19328 COMPID: 5765B9501 REL: 320
ABSTRACT: UMASK NOT HONORED THROUGH NFS
PROBLEM DESCRIPTION:
umask not honored through nfs
PROBLEM SUMMARY:
The umask is not honored for files and directories created through NFS.
PROBLEM CONCLUSION:
Let lfs and nfs set the file or directory mode
masked with the umask.
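The masking rule in the conclusion, where the requested mode has every umask bit cleared, can be checked with shell arithmetic. apply_umask is a hypothetical name, and this shows only the arithmetic, not how lfs or nfs apply it internally. It assumes bash or a POSIX sh in which a leading 0 marks an octal constant; some Korn shells treat leading zeros as decimal.

```shell
#!/bin/sh
# Sketch: the mode given to a new file or directory is the requested
# mode with every bit that is set in the umask cleared (mode & ~umask).
# Usage: apply_umask REQUESTED_MODE UMASK   (both octal, with a leading 0)
apply_umask() {
    # $1 and $2 are substituted as text, so "0666" is parsed as octal
    printf '%04o\n' $(( $1 & ~$2 ))
}
```

For example, a file requested with mode 0666 under a umask of 0022 ends up 0644, and a directory requested with 0777 ends up 0755.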
------
APAR: IY19897 COMPID: 5765C3403 REL: 430
ABSTRACT: AIX 4.3 SECURITY RELATED UPDATES AS OF JUNE 2001
PROBLEM DESCRIPTION:
This APAR delivers security related updates for AIX 4.3
available as of June 2001.
This is a packaging APAR only. It will not appear in the list
of APARs on the SMIT "Update Software by Fix (APAR)" panel, nor
will the 'instfix' command show this APAR as being installed
after the updates delivered by this package are installed.
To install selected updates from this package, use the command:
smit update_by_fix
To install all updates from this package that apply to installed
filesets on your system, use the command:
smit update_all
PROBLEM SUMMARY:
Packaging only.
------
APAR: IY19901 COMPID: 5765B8100 REL: 220
ABSTRACT: ARTIC960 VERSION 1.4.3 UPDATE
PROBLEM DESCRIPTION:
Update of the Artic drivers to version 1.4.3
------
APAR: IY19908 COMPID: 576552900 REL: 240
ABSTRACT: LATEST PSSP 2.4 FIXES AS OF JUNE 2001.
PROBLEM DESCRIPTION:
Latest PSSP 2.4 fixes as of June 2001.
PROBLEM SUMMARY:
This is a packaging apar for PSSP 2.4 fixes as
of June 2001.
PROBLEM CONCLUSION:
This is a packaging apar for PSSP 2.4 fixes
as of June 2001.
------
APAR: IY19921 COMPID: 5765D5100 REL: 320
ABSTRACT: LATEST PSSP 3.2.0 FIXES AS OF JUNE 2001
PROBLEM DESCRIPTION:
This is the latest PSSP ptf as of June 2001.
Order this apar to get all of the ptfs as of June 2001.
PROBLEM SUMMARY:
This is a packaging apar for PSSP 3.2.0 fixes
as of June 2001.
PROBLEM CONCLUSION:
This is a packaging apar for PSSP 3.2.0
fixes as of June 2001.
------