From: AIX Service Mail Server (aixserv_at_austin.ibm.com)
Date: Tue Aug 06 2002 - 02:43:10 CDT
APAR: IY24116 COMPID: 5765E6900 REL: 310
ABSTRACT: FUTURE AVAILABILITY OF LOADLEVELER CHECKPOINT/RESTART FOR 32 BIT
PROBLEM DESCRIPTION:
future availability of loadleveler checkpoint/restart for
64 bit application information.
PROBLEM SUMMARY:
availability of loadleveler checkpoint/restart
for 32 bit application information.
------
APAR: IY24117 COMPID: 5765E6900 REL: 310
ABSTRACT: FUTURE AVAILABILITY OF LOADLEVELER CHECKPOINT/RESTART FOR
PROBLEM DESCRIPTION:
future availability of loadleveler checkpoint/restart for 64 bit
application information.
PROBLEM SUMMARY:
availability of loadleveler checkpoint/
restart for 64 bit application information.
------
APAR: IY28091 COMPID: 5765D5100 REL: 320
ABSTRACT: SPGETDESC SUPPORT OF WINTERHAWK2 450MHZ
PROBLEM DESCRIPTION:
The WinterHawk2 450MHz needs to be added to the spgetdesc
command.
PROBLEM SUMMARY:
/usr/lpp/ssp/bin/spgetdesc did not describe the
processor speed of the Winterhawk II nodes.
For Winterhawk II nodes, both thin and wide,
currently the description that is returned is:
spgetdesc: Node 5 (c183n05.ppd.pok.ibm.com)
is a 375_MHz_POWER3_SMP_Thin
The description has been updated to return:
375/450_MHz_POWER3_SMP_Thin - for thin nodes
375/450_MHz_POWER3_SMP_Wide - for wide nodes
PROBLEM CONCLUSION:
In the spgetdesc script, the definition table for
a Winterhawk II node was changed to read
375/450_MHz_POWER3_SMP_Thin
OR
375/450_MHz_POWER3_SMP_Wide
------
APAR: IY29066 COMPID: 5765E6800 REL: 300
ABSTRACT: README.PERFAGENT FOR THE SERVER NEEDS TO BE UPDATED
PROBLEM DESCRIPTION:
bos.perf.tools README needed.
PROBLEM SUMMARY:
The README text for Solaris and HP support
needs to be removed from the perfagent.server README file.
PROBLEM CONCLUSION:
Removed the README text describing xmservd's Solaris
and HP support, and added a note that HP and Solaris
are no longer supported.
------
APAR: IY29412 COMPID: 5765E7400 REL: 300
ABSTRACT: AZIZO DATA COLLECTION 256 METRIC LIMIT
PROBLEM DESCRIPTION:
ptxtab provides an option to generate a single ASCII file.
PROBLEM CONCLUSION:
The -u flag causes ptxtab to ignore the maximum number of metrics
in each statset and to format a single output file as
comma-separated ASCII.
------
APAR: IY29453 COMPID: 5765E6800 REL: 300
ABSTRACT: PTXSPLIT -F OPTION PROCESSES NO MORE THAN 256 METRICS
PROBLEM DESCRIPTION:
ptxsplit -f option processes no more than 256 metrics
PROBLEM SUMMARY:
ptxsplit -f option can process more than 256 metrics.
PROBLEM CONCLUSION:
The ptxsplit -u option reconstructs the statset, so it can process
more than 256 metrics.
------
APAR: IY29544 COMPID: 5765E6900 REL: 310
ABSTRACT: GANG PREEMPTION PROBLEM
PROBLEM DESCRIPTION:
Problem:
When Schedd goes down, jobs in preempted state remain in the
queue in 'EP' state for indefinite time, even after the Schedd
has restarted.
PROBLEM SUMMARY:
Jobs remain in EP (preempt pending) indefinitely.
PROBLEM CONCLUSION:
Jobs need to be running before being preempted.
------
APAR: IY29802 COMPID: 5765C3403 REL: 430
ABSTRACT: ENSCRIPT DOESN'T SUPPORT HEXA CHARMETRICS IN .AFM FILES
PROBLEM DESCRIPTION:
enscript fails with the following error:
enscript: 1007-603 The AFM file /usr/lib/ps/<font>.afm has a
line which is not formatted correctly.
This happens when there are hexadecimal CharMetrics entries in
the StartCharMetrics/EndCharMetrics section of the AFM file.
A hexadecimal CharMetrics entry typically looks like this:
CH 816D ; WX 250 ; N space ; B 0 0 0 0 ;
PROBLEM SUMMARY:
enscript and afmdit.awk do not support hexadecimal
CharMetrics entries in AFM files.
PROBLEM CONCLUSION:
enscript and afmdit.awk were enhanced to support
CharMetrics entries of the form:
CH 814E ; WX 1000 ; N chars ; B 334 703 665 805 ;
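As a rough sketch of the parsing change (not the actual enscript or afmdit.awk code), the Python below reads a CharMetrics line carrying either a decimal `C` code or a hexadecimal `CH` code. The helper name `parse_char_metrics` is hypothetical. The AFM specification writes the hex value in angle brackets (`CH <816D>`), so the sketch accepts both that form and the bare form shown above.

```python
def parse_char_metrics(line):
    """Parse one AFM CharMetrics line into a dict (hypothetical helper).

    Accepts a decimal code ("C 32") or a hexadecimal one, written
    either bare ("CH 816D") or in angle brackets ("CH <816D>").
    """
    result = {}
    # Fields are separated by semicolons, e.g. "CH 816D ; WX 250 ; N space ;".
    for field in line.split(";"):
        tokens = field.split()
        if not tokens:
            continue  # empty field after the trailing semicolon
        key, values = tokens[0], tokens[1:]
        if key == "C":
            result["code"] = int(values[0])
        elif key == "CH":
            # Drop optional angle brackets, then convert from base 16.
            result["code"] = int(values[0].strip("<>"), 16)
        elif key == "WX":
            result["width"] = int(values[0])
        elif key == "N":
            result["name"] = values[0]
        elif key == "B":
            result["bbox"] = tuple(int(v) for v in values)
    return result


m = parse_char_metrics("CH 814E ; WX 1000 ; N chars ; B 334 703 665 805 ;")
print(m["code"], m["width"], m["name"], m["bbox"])  # 33102 1000 chars (334, 703, 665, 805)
```

A parser that only expected `C` followed by a decimal integer would reject the `CH` line as badly formatted, which matches the 1007-603 symptom described above.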
------
APAR: IY29901 COMPID: 5765B8100 REL: 220
ABSTRACT: SAVEDT USES UNINITIALISED VARIABLE IN SOME CLEANUP ROUTINES.
PROBLEM DESCRIPTION:
The saveDT script can report an error when attempting to
clean up after a failure.
The error is reported only if you choose to tar to tape.
LOCAL FIX:
The problem occurs when some other failure has already occurred
on the system. Check and correct the previous failure.
PROBLEM SUMMARY:
The saveDT script can report an error when
attempting to clean up after a failure.
The error is reported only if you choose to tar to tape.
PROBLEM CONCLUSION:
The script now checks that the appropriate environment
variables are set before trying to use them.
------
APAR: IY30258 COMPID: 5765E6110 REL: 220
ABSTRACT: REQUIRED MAINTENANCE UPGRADE
PROBLEM DESCRIPTION:
required maintenance upgrade
------
APAR: IY30437 COMPID: 5765D5100 REL: 320
ABSTRACT: BAD DMA WRITE FOR KLAPI 0-COPY MSG
PROBLEM DESCRIPTION:
This problem is caused by cleaning up a HAL DMA handle while
there is still a possible post of the message.
PROBLEM SUMMARY:
There is a time hole where a DMA buffer may
remain posted after a message is marked as complete to the user.
This leaves the possibility of data corruption or, in the case
of a Regatta, a system check stop.
PROBLEM CONCLUSION:
All outstanding DMA buffers are cancelled
before a buffer is marked as complete to the user.
------
APAR: IY30443 COMPID: 5765E7400 REL: 300
ABSTRACT: GREY OUT USELESS VIEWS
PROBLEM DESCRIPTION:
Views are greyed out based on the amount of data the recording
files contain.
PROBLEM CONCLUSION:
Views are greyed out based on the amount of data the recording
files contain. For example, if only one day's worth of data is
available, there is no need to offer the year-by-month view.
------
APAR: IY30645 COMPID: 5765C3403 REL: 430
ABSTRACT: PPS FAIL TO SYNC ON DISK WITH > 2^15 PPS
PROBLEM DESCRIPTION:
Customer may be unable to sync all partitions on logical
volumes on a disk with more than 2^15 physical partitions.
PROBLEM SUMMARY:
syncvg fails to sync all partitions on a disk
with more than 2^15 PPs.
PROBLEM CONCLUSION:
Changed variable types in the LVM kernel config
routine to ensure proper handling of large
PP numbers.
TEMPORARY FIX:
Reduce the factor of the VG, thereby decreasing the
number of PPs per PV.
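The APAR does not say which variable types were changed. Purely as an illustration, assuming a PP number had been held in a signed 16-bit field, the sketch below shows why 2^15 is the threshold: PP number 32767 fits, but 32768 wraps to a negative value. `ctypes` performs no overflow checking, so the wraparound mirrors what typically happens in C on two's-complement machines. The function names are hypothetical.

```python
import ctypes


def store_int16(pp_number):
    """Store a PP number in a signed 16-bit field (assumed type, for illustration).

    ctypes does no overflow checking, so out-of-range values wrap silently.
    """
    return ctypes.c_int16(pp_number).value


def store_int32(pp_number):
    """Store the same PP number in a wider signed 32-bit field."""
    return ctypes.c_int32(pp_number).value


for pp in (2**15 - 1, 2**15, 40000):
    print(pp, store_int16(pp), store_int32(pp))
# 32767 32767 32767
# 32768 -32768 32768
# 40000 -25536 40000
```

Widening the field, as the conclusion describes in general terms, keeps every PP number on a large disk distinct and non-negative.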
------
APAR: IY30864 COMPID: 5765E5400 REL: 440
ABSTRACT: HAS,HAES: EXTREMELY LONG FALLOVER TIMES DUE TO IMFS COMMANDS
PROBLEM DESCRIPTION:
The customer was testing some fallovers and found that they
were taking an extremely long time. The delay was traced to
imfs commands issued during the takeover of the VGs. The
customer was using bigvgs.
PROBLEM SUMMARY:
HACMP will always run imfs after varying on a volume group.
This is unnecessary for bigvgs.
PROBLEM CONCLUSION:
Test to see if the volume group is a bigvg, and, if so, skip
the imfs.
------
APAR: IY31246 COMPID: 5765D5100 REL: 340
ABSTRACT: RC.SP SETS THE WRONG BOOTLIST IF TOTAL BOOTDISKS NOT
PROBLEM DESCRIPTION:
rc.sp sets the wrong bootlist if total bootdisks
not equivalent to total install disks
PROBLEM SUMMARY:
On the reboot of a node, the bootlist was being reset to
include all of the physical volumes listed for the selected
volume group of the node. Even the physical volumes that
did not contain boot logical volumes were included in
the bootlist. If there were a large number of physical
volumes, a subsequent reboot could fail.
PROBLEM CONCLUSION:
spboot, which is called by /etc/rc.sp, was modified to only
set the bootlist to physical volumes that contain boot
logical volumes.
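A minimal sketch of the selection rule described above, not the actual spboot code. It uses a hypothetical mapping from each physical volume in the node's volume group to the types of logical volumes it holds, and keeps only the PVs that contain a boot logical volume. The `"boot"` label and the hdisk names are assumptions for illustration.

```python
def bootlist_candidates(pv_to_lv_types):
    """Return, in volume-group order, the PVs that hold at least one boot LV."""
    return [pv for pv, lv_types in pv_to_lv_types.items() if "boot" in lv_types]


# Hypothetical volume group: three PVs, only two of which hold a boot LV.
node_vg = {
    "hdisk0": ["boot", "paging", "jfs"],
    "hdisk1": ["boot", "jfs"],
    "hdisk2": ["jfs"],  # in the VG but has no boot LV, so it is left out
}
print(bootlist_candidates(node_vg))  # ['hdisk0', 'hdisk1']
```

Before the fix, the equivalent of returning every key in the mapping was used, so PVs such as hdisk2 ended up in the bootlist as well.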
------
APAR: IY31307 COMPID: 5765B9500 REL: 150
ABSTRACT: PROBLEM IN CONVERTING FROM MMSDRFS TO MMSDRFS2 WHEN MIGRATING
PROBLEM DESCRIPTION:
After migration from gpfs-1.2 to gpfs-1.5, the customer was unable
to start the daemon because of the following errors:
mmcommon: Invalid keyword: getNodeSDRdata
mmremote: Unexpected error from getLock: convertFwd. Return code
mmremote: 6027-1571 Unexpected failure executing sysctl -h cws1
Check the preceding messages if any. Check /etc/sysctl.mmcmd.ac
restart sysctl on cws1, or run kinit on this node.
From the error messages, this appeared to be sysctl-related.
However, the actual problem was that the mmcommon routine was
being called with a function name that had changed in the
gpfs-1.5 version of the file, hence the "Invalid keyword".
PROBLEM SUMMARY:
Allow getNodeSDRdata to be called directly. This is needed when
migrating from release 1.1 or 1.2.
PROBLEM CONCLUSION:
Fixed the migration path from releases 1.1 and 1.2.
------
APAR: IY31350 COMPID: 5765D5100 REL: 320
ABSTRACT: SPCW_DEFER_NTP SHOULD CALL /USR/SBIN/NTPDATE, NOT $SSP_BIN/NTPDA
PROBLEM DESCRIPTION:
The PSSP Version of xntpd is no longer used in PSSP 3.2. PSSP
systems, 3.2 and higher, must use the AIX version of ntpdate,
which is /usr/sbin/ntpdate.
The script /usr/sbin/hacws/spcw_defer_ntp uses $SSP_BIN/ntpdate
(see line 101), which is incorrect. As a result, xntpd does
not start.
LOCAL FIX:
As a workaround, customer can use a symbolic link:
ln -s /usr/sbin/ntpdate /usr/lpp/ssp/bin/ntpdate
OR, edit line 101 in /usr/sbin/hacws/spcw_defer_ntp
and replace $SSP_BIN/ntpdate with /usr/sbin/ntpdate.
PROBLEM SUMMARY:
Effective with PSSP 3.2, ntp is no longer shipped with PSSP.
The AIX version of ntp should be used. spcw_defer_ntp
is calling /usr/lpp/ssp/bin/ntpdate when it should be
calling /usr/sbin/ntpdate.
PROBLEM CONCLUSION:
spcw_defer_ntp has been modified to call /usr/sbin/ntpdate
instead of /usr/lpp/ssp/bin/ntpdate.
------
APAR: IY31381 COMPID: 5765B9500 REL: 150
ABSTRACT: GPFS:6027-848 CONFIG MANAGER 35 FAILED UPDATING NEW NODE STATUS
PROBLEM DESCRIPTION:
gpfs:6027-848 config manager 35 failed updating new node status
PROBLEM SUMMARY:
Fixed sysctl locking condition with mmconfig.
PROBLEM CONCLUSION:
In the SP environment, do not use the output of hostname as
a lock identifier. If hostname on a node is set to be the
same as the switch adapter name, locks cannot be reclaimed
(sysctl cannot talk to the node).
------
APAR: IY31445 COMPID: 5765D9300 REL: 320
ABSTRACT: ADDING PROFILE PROBES TO A LARGE APP CAUSED SESMGR CORE DUMP
PROBLEM DESCRIPTION:
I am not getting profile output from my [large] job
PROBLEM SUMMARY:
Using pct to add profile probes to a large application
causes a sesmgr core dump.
PROBLEM CONCLUSION:
The problem is a memory overlay when adding large numbers of
probes. This causes a seg fault of the profile module which
is loaded by sesmgr. An array of objects in the profile module
was incorrectly being statically allocated. The fix is to dynamically
allocate the array.
------
APAR: IY31450 COMPID: 5765D6100 REL: 220
ABSTRACT: THE LIMIT OF 4095 OF LOADL WAS EXCEEDED WITH LLQ -S
PROBLEM DESCRIPTION:
The LoadL limit of 4095 was exceeded by the llq -s
command. The customer requested a change to make this limit
larger.
PROBLEM SUMMARY:
In LoadLeveler 2.2, the command llq -s would core dump when
the internal class list array expands the CLASS statement
to more than 4095 characters.
------
APAR: IY31475 COMPID: 5765B9501 REL: 340
ABSTRACT: FSCK DOES NOT FIX CORRUPTED ALLOC MAP CHAINS
PROBLEM DESCRIPTION:
mmfsck does not fix FSSTRUCT errors of type 114 (corrupted
allocation maps).
PROBLEM SUMMARY:
Fixed mmfsck to repair FSSTRUCT errors of type 114
(corrupted allocation maps)
PROBLEM CONCLUSION:
Fixed relinkAllChunks, which computed an incorrect allocation
map magic number for a disk. Provided new functionality to
verify the allocation map chunk list head bitmap chain and to
recognize chunk list head loops and unlinked chunks.
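The conclusion mentions recognizing loops and unlinked chunks in a chunk list. One common way to check a singly linked chain for both problems is to walk it while recording visited nodes. The sketch below does this over a hypothetical `next_chunk` mapping of chunk IDs. It illustrates the general technique only and does not reflect GPFS's actual on-disk structures.

```python
def check_chain(head, next_chunk, all_chunks):
    """Walk a chain of chunk IDs and report a loop and any unlinked chunks.

    head       -- ID of the first chunk in the list
    next_chunk -- dict mapping chunk ID to the next ID (None ends the list)
    all_chunks -- every chunk ID that should appear in the list
    Returns (visit order, ID where a loop was detected or None, unlinked IDs).
    """
    seen = set()
    order = []
    loop_at = None
    current = head
    while current is not None:
        if current in seen:
            loop_at = current  # revisited a chunk: the chain loops here
            break
        seen.add(current)
        order.append(current)
        current = next_chunk.get(current)
    unlinked = sorted(set(all_chunks) - seen)
    return order, loop_at, unlinked


links = {0: 1, 1: 2, 2: 1, 3: None}  # 2 points back to 1; 3 is never reached
print(check_chain(0, links, [0, 1, 2, 3]))  # ([0, 1, 2], 1, [3])
```

The visited set costs memory proportional to the chain length. Floyd's two-pointer method would detect the loop in constant space, but it does not by itself identify which chunks are unlinked.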
------
APAR: IY31577 COMPID: 5765B9501 REL: 340
ABSTRACT: MMREPQUOTA SHOWS NEGATIVE USAGE AFTER MMRESTRIPEFS
PROBLEM DESCRIPTION:
mmrepquota shows negative usage after mmrestripefs
PROBLEM SUMMARY:
Fixed mmrestripefs causing mmrepquota to show incorrect
usage.
PROBLEM CONCLUSION:
During restripe and defrag, when deallocating unused blocks
do not decrement quota usage count if these blocks were not
allocated with allocBlock.
------
APAR: IY31578 COMPID: 5765B9501 REL: 320
ABSTRACT: NODE PANICKED BY RUNNING GPFS_STAT()
PROBLEM DESCRIPTION:
node panicked by running gpfs_stat()
PROBLEM SUMMARY:
kpathname was being traced after it was freed in the kernel.
PROBLEM CONCLUSION:
Fixed trace path which could cause gpfs_stat() to panic
node.
------
APAR: IY31580 COMPID: 5765B9501 REL: 340
ABSTRACT: NODE PANICKED BY RUNNING GPFS_STAT()
PROBLEM DESCRIPTION:
node panicked by running gpfs_stat()
PROBLEM SUMMARY:
kpathname was being traced after it was freed in the kernel.
PROBLEM CONCLUSION:
Fixed trace path which could cause gpfs_stat() to panic
node.
------
APAR: IY31601 COMPID: 5765D9300 REL: 320
ABSTRACT: MAN PAGE FOR MPCC_R SHOULD DOCUMENT: C++ BINDINGS ARE SUPPORTED
PROBLEM DESCRIPTION:
The mpCC_r documentation is incorrect. There is a -cpp option that
is documented in the mpcc_r script. However, the -cpp option really
applies to the mpCC_r script. The -cpp option in mpCC_r enables use
of full C++ bindings in MPI. The man page for mpCC_r also does
not include the -cpp option.
The man pages for mpCC_r and mpcc_r need to be changed. The man page
for mpCC_r should document that C++ bindings are supported via
the -cpp flag. The mpcc_r man page should remove the -cpp flag
from its description.
PROBLEM SUMMARY:
The documentation for mpCC_r is incorrect. There is a -cpp
option that is documented in the mpcc_r script. However,
the -cpp option really applies to the mpCC_r script. The -cpp
option in mpCC_r enables use of C++ bindings in MPI.
PROBLEM CONCLUSION:
The -cpp option was added to the man page for mpCC_r and the
-cpp option was also removed from mpcc_r. The poe.README
was also changed to note the documentation change to mpCC_r
and mpcc_r.
------
APAR: IY31698 COMPID: 5765D5100 REL: 340
ABSTRACT: ERROR MESSAGE WHEN APPLYING SSP.PMAN 3.4.0.1
PROBLEM DESCRIPTION:
error message when applying ssp.pman 3.4.0.1
PROBLEM SUMMARY:
When ssp.pman 3.4.0.1 in PTF Set 9 is applied, installp
prints the message:
touch: 0652-046 Cannot create
/usr/lpp/ssp/README/pman3.2_save_me1.
and fails to create the marker file. This will result in
pmand being unnecessarily recycled in future PTFs and not
being recycled on PTF reject.
The cause of the problem is that the directory README should
be READMES.
PROBLEM CONCLUSION:
The README in the directory path of the marker file
(/usr/lpp/ssp/README) has been changed to READMES in the
installp script file.
------
APAR: IY31699 COMPID: 5765D5100 REL: 320
ABSTRACT: S1TERM PROCESSING ENHANCEMENTS
PROBLEM DESCRIPTION:
s1term processing enhancements
PROBLEM SUMMARY:
Enhancements were required to s1term processing used by
a node to obtain a srvtab and the supman password.
PROBLEM CONCLUSION:
Enhancements were made to s1term processing used by
a node to obtain a srvtab and the supman password.
The affected scripts were kfserver and pssfb_script
for srvtab processing and srvsuppwd and getsuppwd
for the processing of the supman password.
------
APAR: IY31700 COMPID: 5765B9500 REL: 140
ABSTRACT: PROBLEM IN CONVERTING FROM MMSDRFS TO MMSDRFS2 WHEN MIGRATING
PROBLEM DESCRIPTION:
After migration from gpfs-1.2 to gpfs-1.5, the customer was unable
to start the daemon because of the following errors:
mmcommon: Invalid keyword: getNodeSDRdata
mmremote: Unexpected error from getLock: convertFwd. Return code
mmremote: 6027-1571 Unexpected failure executing sysctl -h cws1
Check the preceding messages if any. Check /etc/sysctl.mmcmd.ac
restart sysctl on cws1, or run kinit on this node.
From the error messages, this appeared to be sysctl-related.
However, the actual problem was that the mmcommon routine was
being called with a function name that had changed in the
gpfs-1.5 version of the file, hence the "Invalid keyword".
PROBLEM SUMMARY:
Allow getnodesdrdata to be called directly.
This is needed when migrating from release 1.1 or 1.2.
PROBLEM CONCLUSION:
Fixed the migration path from releases 1.1 and 1.2.
------
APAR: IY31780 COMPID: 5765D5100 REL: 340
ABSTRACT: SETUP_SERVER SHOULD IGNORE PPP CONNECTIONS
PROBLEM DESCRIPTION:
If the pp0 adapter is present, setup_server fails:
setup_server : host: 0827-803 Cannot find address 0.0.0.0.
setup_CWS: 0016-338 Kerberos setup was bypassed for network
interfaces that could not be resolved
setup_server ends with rc = 0, but the node being installed
does not receive a Kerberos ticket.
Circumventing this problem by detaching pp0 prevents
svcagent from being activated and running during the
setup_server action.
LOCAL FIX:
A good workaround is to add an entry to /etc/hosts like:
zero 0.0.0.0 # dummy ppp entry to prevent setup_server problems
PROBLEM SUMMARY:
When the Point-to-Point Protocol (PPP) is being used on
a Control Workstation, setup_CWS will terminate processing
with the messages:
host: 0827-803 Cannot find address 0.0.0.0.
setup_CWS: 0016-338 Kerberos setup was bypassed for
network interfaces that could not be resolved.
Because the Point-to-Point Protocol interfaces appear in
the netstat -in data, setup_CWS tries to determine the
IP addresses for these interfaces and fails. The data
for the Point-to-Point Protocol interfaces should be ignored
by setup_CWS.
PROBLEM CONCLUSION:
setup_CWS has been modified to skip lines of data from
netstat -in which refer to the Point-to-Point Protocol.
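A sketch of the filtering step, written in Python rather than the shell used by setup_CWS. It skips netstat -in data lines whose interface name marks a PPP link. The rule that PPP interfaces are named `pp` followed by digits (as with pp0 above) and the abbreviated sample output are assumptions for illustration, not taken from setup_CWS.

```python
import re

# Interface names such as pp0 or pp1 are treated as PPP links (assumption).
PPP_NAME = re.compile(r"^pp\d+$")


def non_ppp_lines(netstat_output):
    """Return the netstat -in data lines that do not belong to a PPP interface."""
    lines = netstat_output.strip().splitlines()
    kept = []
    for line in lines[1:]:  # skip the column header line
        fields = line.split()
        if fields and not PPP_NAME.match(fields[0]):
            kept.append(line)
    return kept


# Hypothetical sample output, abbreviated to the first four columns.
sample = """Name Mtu Network Address
en0 1500 192.168.1 192.168.1.10
pp0 1500 0 0.0.0.0
lo0 16896 127 127.0.0.1"""

for line in non_ppp_lines(sample):
    print(line.split()[0])  # prints en0, then lo0
```

Matching on the interface name keeps the address lookup away from the 0.0.0.0 entry that produced the 0827-803 message.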
------
APAR: IY31795 COMPID: 5765B9500 REL: 150
ABSTRACT: MMSETRCMD COMMAND MISSING IN GPFS 1.5 FOR AIX DISTRIBUTION
PROBLEM DESCRIPTION:
The mmchcluster command fails in the AIX environment because the
mmsetrcmd command is missing.
PROBLEM CONCLUSION:
The mmsetrcmd is not in the set of scripts that are
shipped with AIX. However, the command is documented
in the command reference for AIX.
GPFS will now include this script in the inventory of
scripts.
------
APAR: IY31801 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT AFTER METANODE RELINQUISH
PROBLEM DESCRIPTION:
assert after metanode relinquish
PROBLEM SUMMARY:
Fixed an Assert after metanode relinquish
PROBLEM CONCLUSION:
Test for turning off the newMnode flag was in the wrong
place
------
APAR: IY31807 COMPID: 5765E5400 REL: 441
ABSTRACT: HACMP/HAES - ICMP PING CAUSES DELAY IN CLGETADDR, CLGETACTIVENOD
PROBLEM DESCRIPTION:
This APAR corrects the condition where a node is pinging an
adapter on another node while clgetaddr, clgetactivenodes,
clfindres, etc. are executing, which causes the
command to take several minutes to execute.
PROBLEM SUMMARY:
If the customer is executing a ping command on the local node
to any responding remote node and simultaneously
executes clgetaddr or clgetactivenodes, a long
delay of approximately 10 minutes or more will result.
PROBLEM CONCLUSION:
clgetaddr and other utilities were modified to send
multiple ping requests.
------
APAR: IY31820 COMPID: 5765D9300 REL: 310
ABSTRACT: C++ NON-THREADED PROGRAMS MAY ABORT WHEN RUN WHEN COMPILED WITH
PROBLEM DESCRIPTION:
When a C++ program is compiled with mpCC (the non-threaded
compile script) and then run, jobs may abort.
The workaround listed in the poe.README for C++ programs needs
to be altered so that the workaround states it also applies to
VAC 5.0. IBM will recommend that customers use the mpCC_r
compile script which is the threaded comi
PROBLEM SUMMARY:
C++ non-threaded programs may abort when run with
the mpCC compile script.
PROBLEM CONCLUSION:
The poe.README is being changed to document that C++
executables built with the non-threaded MPI library may
abort when run. IBM recommends that the threaded compile
script such as mpCC_r be used.
There is also a workaround documented in the
poe.README for creating an alternate mpCC
script that provides for an alternative
initialization routine bound in the executable
that prevents the job abort problem. Threaded
applications compiled with the mpCC_r script are
not affected.
------
APAR: IY31861 COMPID: 5765E5400 REL: 440
ABSTRACT: HAS,HAES: CLVERIFY ERROR WHEN CONCURRENT RG DOES NOT INCLUDE
PROBLEM DESCRIPTION:
The customer was attempting to sync resources in a 5-node cluster
which had a concurrent resource group with only 2 of the
cluster nodes participating in the group and no disk fencing
specified. The sync attempt failed with error msg from
clverify indicating "Not all nodes of cluster were included in
the concurrent resource group".
PROBLEM SUMMARY:
The customer was attempting to sync resources in a 5 node
cluster which had a concurrent resource group with only 2 of
the cluster nodes participating in the group and no disk
fencing specified. The sync attempt failed with error msg from
clverify indicating "Not all nodes of cluster were included in
the concurrent resource group".
PROBLEM CONCLUSION:
The code was changed so that all nodes of the cluster have
to participate in a concurrent RG only if fencing is
specified as TRUE for the RG.
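The revised clverify rule can be expressed as a small check. This is a sketch, not HACMP code; the function name, its arguments, and the node names are hypothetical.

```python
def validate_concurrent_rg(cluster_nodes, rg_nodes, fencing):
    """Return an error message, or None if the concurrent RG passes the check.

    Every cluster node must participate in the group only when disk
    fencing is set to TRUE for that group.
    """
    missing = set(cluster_nodes) - set(rg_nodes)
    if fencing and missing:
        return ("Not all nodes of cluster were included in "
                "the concurrent resource group")
    return None


cluster = ["n1", "n2", "n3", "n4", "n5"]
print(validate_concurrent_rg(cluster, ["n1", "n2"], fencing=False))  # None
print(validate_concurrent_rg(cluster, ["n1", "n2"], fencing=True))
```

With fencing off, the 2-of-5 group from the problem description now passes; with fencing on, the original error is still reported.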
------
APAR: IY31876 COMPID: 5765E5400 REL: 440
ABSTRACT: HAES: ADD SNAPSHOT CONVERSION PATHS
PROBLEM DESCRIPTION:
It is currently not possible to migrate snapshots from versions
HACMP 4.3.1
HACMP 4.4.0
to
HACMP/ES 4.4.1
Support for these conversions should be added.
PROBLEM CONCLUSION:
Conversion paths for updating snapshots from HAS 4.3.1 and
4.4.0 are added. These do not allow a migration install
of HAES 4.4.1 over these versions, but they do allow converting
existing snapshots for use with HAES 4.4.1.
------
APAR: IY31900 COMPID: 5765B9501 REL: 340
ABSTRACT: ASSERT AFTER METANODE RELINQUISH
PROBLEM DESCRIPTION:
assert after metanode relinquish
PROBLEM SUMMARY:
Fixed an Assert after metanode relinquish
PROBLEM CONCLUSION:
Test for turning off the newMnode flag was in the wrong
place
------
APAR: IY31915 COMPID: 5765D5100 REL: 340
ABSTRACT: PROBLEMS MOUNTING GPFS FS AFTER DELETING DISKS. DISK DESCRIPTOR
PROBLEM DESCRIPTION:
Problems mounting a GPFS fs after deleting disks. The error
6027-711 was received, which indicated that the disk or fs
does not exist. It mentioned the deleted disks. The mmsdrfs2
file in the SDR and /var/mmfs/gen were updated and did not show
the disks. The problem is that the disk descriptor areas on
some VSDs are not updated. By chance, the ones that are not
updated are the first ones GPFS uses in attempting to mount the
fs, causing the failure.
PROBLEM SUMMARY:
After the mmdeldisk command, some file systems could not be
remounted due to old replica data.
PROBLEM CONCLUSION:
When migrating the stripe group descriptor to a new replica
set, update the copy of the descriptor on all other disks in
the stripe group as well. This is necessary to prevent
future attempts to read from disks in the old replica set in
case these disks have since been deleted from the stripe
group.
------
APAR: IY31916 COMPID: 5765B9501 REL: 340
ABSTRACT: PROBLEMS MOUNTING GPFS FS AFTER DELETING DISKS. DISK DESCRIPTOR
PROBLEM DESCRIPTION:
Problems mounting a GPFS fs after deleting disks. The error
6027-711 was received, which indicated that the disk or fs
does not exist. It mentioned the deleted disks. The mmsdrfs2
file in the SDR and /var/mmfs/gen were updated and did not show
the disks. The problem is that the disk descriptor areas on
some VSDs are not updated. By chance, the ones that are not
updated are the first ones GPFS uses in attempting to mount the
fs, causing the failure.
PROBLEM SUMMARY:
After the mmdeldisk command, some file systems could not be
remounted due to old replica data.
PROBLEM CONCLUSION:
When migrating the stripe group descriptor to a new replica
set, update the copy of the descriptor on all other disks in
the stripe group as well. This is necessary to prevent
future attempts to read from disks in the old replica set in
case these disks have since been deleted from the stripe
group.
------
APAR: IY31966 COMPID: 5765D6100 REL: 220
ABSTRACT: LOADLEVELER WRITES TO SOCKET HANG, POSSIBLY CAUSING CORE DUMP
PROBLEM DESCRIPTION:
If a LoadLeveler daemon is writing to a socket and the socket
window fills up, the write can hang until the window drains. If
the hang is long enough (e.g. if the client is suspended the
window will never drain) and a LoadLeveler daemon is holding
locks over the write, this can eventually cause the LoadLeveler
daemon to core dump.
PROBLEM SUMMARY:
LoadLeveler daemons (such as the LoadL_negotiator) can
hang if they are writing to a socket, and the process
reading from the socket is suspended. If a LoadLeveler
daemon hangs writing to a socket this could result in a
core dump.
PROBLEM CONCLUSION:
The LoadLeveler library code has been changed to prevent
socket writes from hanging when the socket window fills.
LoadLeveler will set the socket in non-blocking mode
and allow write operations to time-out.
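The conclusion describes putting the socket in non-blocking mode and letting writes time out. A minimal Python sketch of that general pattern (not LoadLeveler's library code) waits for writability with `select.select()` and gives up after a deadline instead of blocking forever when the reader is suspended. The function name and timeout values are assumptions.

```python
import select
import socket
import time


def send_with_timeout(sock, data, timeout):
    """Write all of data, or raise TimeoutError if the peer stops draining it.

    The socket is switched to non-blocking mode so a full send window
    makes send() fail immediately instead of hanging the caller.
    """
    sock.setblocking(False)
    deadline = time.monotonic() + timeout
    sent = 0
    while sent < len(data):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"write timed out after {sent} bytes")
        # Wait for buffer space, but never longer than the time remaining.
        _, writable, _ = select.select([], [sock], [], remaining)
        if not writable:
            continue  # loop back; the deadline check above raises
        try:
            sent += sock.send(data[sent:])
        except BlockingIOError:
            continue  # buffer filled again between select() and send()
    return sent


# A reader that never reads stands in for a suspended client.
writer, reader = socket.socketpair()
try:
    send_with_timeout(writer, b"x" * 10_000_000, timeout=0.5)
except TimeoutError as exc:
    print("gave up:", exc)
finally:
    writer.close()
    reader.close()
```

Bounding the wait matters most when the writer holds locks during the write, as the problem description notes; a timed-out write lets the daemon release them.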
------
APAR: IY31991 COMPID: 5765C3403 REL: 430
ABSTRACT: REDUCEVG FAILS TO REMOVE DISK PREVIOUSLY CONTAINING DUMPLV
PROBLEM DESCRIPTION:
reducevg fails to remove disk from volume group when disk
formerly held dump device.
PROBLEM SUMMARY:
reducevg fails when customer attempts to remove a disk
from the volume group which formerly held a copy of the
dump logical volume (the dump_inited flag is still set).
PROBLEM CONCLUSION:
The rmlvcopy command needs to correctly change the status
of the dump_inited flag when removing a dumplv from a
disk.
------
APAR: IY31994 COMPID: 5765B9501 REL: 330
ABSTRACT: READDIR() MISSES MOVED BLOCKS
PROBLEM DESCRIPTION:
The ls -l command on GPFS may fail to list some files. This may
happen when the directory grows.
Example: A directory consists of 3 blocks (0,1,2).
An ls -l command is run, using readdir() to read the
directory entries. While the ls command is calling stat() on
the first files, another file is created by another
process. If the directory needs to grow to
hold this newly created file, another block (3) is
allocated and half of the entries of block 1 are
copied into block 3. The last readdir run by the ls
command is supposed to detect this and
return results from both blocks, but it returns only
the entries left in block 1, so it leaves out
the ones moved to block 3.
This problem may be seen by other commands using
readdir() too.
PROBLEM SUMMARY:
ls -l may not show all new entries, readdir()
misses moved blocks.
PROBLEM CONCLUSION:
The readdir scan was stopping too soon in some
cases if the directory block was split after the scan started.
Also, the code was fixed to work if a directory block merge occurs
in the middle of a readdir scan. This won't happen with the current
code because merge gets a lock that conflicts with readdir
(also, merge always fails with E_MULTI_RANGE_LOCK()), but it is
fixed anyway in case this changes.
------
APAR: IY31997 COMPID: 5765C3403 REL: 430
ABSTRACT: SECURITY: BUFFER OVERFLOW IN ERRPT
PROBLEM DESCRIPTION:
Security problem with errpt.
PROBLEM CONCLUSION:
Lengthened a fixed-length buffer beyond the maximum argument
list length.
------
APAR: IY32009 COMPID: 5765D5100 REL: 340
ABSTRACT: SERVICES_CONFIG FAILING TO CALL ACCT_CONFIG
PROBLEM DESCRIPTION:
services_config failing to call acct_config
PROBLEM SUMMARY:
Under certain conditions, services_config was not calling
acct_config when it should have. As a result, accounting
was not set up correctly on the node.
PROBLEM CONCLUSION:
services_config was modified to correctly call acct_config.
------
APAR: IY32016 COMPID: 5765C3403 REL: 430
ABSTRACT: IMPLEMENT AIX PCI EEH MULTIFUNCTION ADAPTER SUPPORT
PROBLEM DESCRIPTION:
Multifunction adapters may cause a machine check.
PROBLEM CONCLUSION:
Implement EEH kernel services for multifunction adapters
that will allow device drivers to recover from fatal errors
on hardware assisted systems.
------
APAR: IY32027 COMPID: 5765D9300 REL: 320
ABSTRACT: C++ NON-THREADED PROGRAMS MAY ABORT WHEN RUN WHEN COMPILED WITH
PROBLEM DESCRIPTION:
When a C++ program is compiled with mpCC (the non-threaded
compile script) and then run, jobs may abort.
The workaround listed in the poe.README for C++ programs needs
to be altered so that the workaround states it also applies to
VAC 5.0. IBM will recommend that customers use the mpCC_r
compile script which is the threaded comi
PROBLEM SUMMARY:
C++ non-threaded programs may abort when run with
the mpCC compile script.
PROBLEM CONCLUSION:
The poe.README is being changed to document that C++
executables built with the non-threaded MPI library may
abort when run. IBM recommends that the threaded compile
script such as mpCC_r be used.
There is also a workaround documented in the
poe.README for creating an alternate mpCC
script that provides for an alternative
initialization routine bound in the executable
that prevents the job abort problem. Threaded
applications compiled with the mpCC_r script are
not affected.
------
APAR: IY32046 COMPID: 5765C3403 REL: 430
ABSTRACT: ONLINE MIRROR BACKUP FAILS WITH SEQUENTIAL SCHEDULING
PROBLEM DESCRIPTION:
chfs -a splitcopy fails on LV with sequential scheduling
policy
PROBLEM CONCLUSION:
Add logic to sequential mirroring code to check for online
mirror backups
------
APAR: IY32047 COMPID: 5765C3403 REL: 430
ABSTRACT: XLATE IOCTL FAILING ON STRIPED LVS
PROBLEM DESCRIPTION:
The XLATE ioctl is randomly returning failures on
striped logical volumes.
------
APAR: IY32069 COMPID: 5765B9501 REL: 340
ABSTRACT: ASSERT FAILED: OPENFILE.C, LINE 4448
PROBLEM DESCRIPTION:
assert failed; openfile.c, line 4448
PROBLEM SUMMARY:
When UpdateDataBlockDiskAddrs() returns an error other than
E_NOT_METANODE, the version field is updated with an
uninitialized stack value. As a result,
cleanIndirectUpdates() never resets dirtyIndirectUpdates,
which causes the assert.
PROBLEM CONCLUSION:
mnUpdateSomeDataBlockDiskAddrs() updates version only when
there are no errors from UpdateDataBlockDiskAddrs()
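The fix follows a common pattern: commit a result only after the call that produces it has succeeded. The sketch below uses hypothetical names. `update_addrs` stands in for UpdateDataBlockDiskAddrs() and returns a (return code, version) pair, and the error-code values are invented for illustration. Python has no uninitialized stack values, so `None` plays the role of the meaningless version on failure.

```python
E_OK = 0
E_IO = 2  # hypothetical numeric value for a failure code


class FileState:
    """Minimal stand-in for the per-file state that carries a version field."""

    def __init__(self):
        self.version = 0


def update_some_addrs(state, update_addrs):
    """Call update_addrs() and record its version only if it reported success.

    On failure the returned version is meaningless, so it is never stored.
    """
    rc, new_version = update_addrs()
    if rc == E_OK:
        state.version = new_version
    return rc


state = FileState()
update_some_addrs(state, lambda: (E_IO, None))  # failure: version untouched
print(state.version)  # 0
update_some_addrs(state, lambda: (E_OK, 7))     # success: version recorded
print(state.version)  # 7
```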
------
APAR: IY32071 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT FAILED: OPENFILE.C, LINE 4448
PROBLEM DESCRIPTION:
assert failed; openfile.c, line 4448
PROBLEM SUMMARY:
When UpdateDataBlockDiskAddrs() returns an error other than
E_NOT_METANODE, the version field is updated with an
uninitialized stack value. As a result,
cleanIndirectUpdates() never resets dirtyIndirectUpdates,
which causes the assert.
PROBLEM CONCLUSION:
mnUpdateSomeDataBlockDiskAddrs() updates version only when
there are no errors from UpdateDataBlockDiskAddrs()
------
APAR: IY32072 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT FAILED: OPENFILE.C, LINE 4448
PROBLEM DESCRIPTION:
assert failed; openfile.c, line 4448
PROBLEM SUMMARY:
When UpdateDataBlockDiskAddrs() returns an error other than
E_NOT_METANODE, the version field is updated with an
uninitialized stack value. As a result,
cleanIndirectUpdates() never resets dirtyIndirectUpdates,
which causes the assert.
PROBLEM CONCLUSION:
mnUpdateSomeDataBlockDiskAddrs() updates version only when
there are no errors from UpdateDataBlockDiskAddrs()
------
APAR: IY32075 COMPID: 5765E5400 REL: 440
ABSTRACT: CL_LSVG AND CL_LSFS ERRORS (HACMP441 HAES441)
PROBLEM DESCRIPTION:
cl_lsvg and cl_lsfs generate garbage output
and the error messages: can't locate VG/FS
PROBLEM SUMMARY:
Running smitty cl_admin and then selecting:
Cluster Logical Volume Manager
Shared Logical Volumes
List All Shared Logical Volumes by Volume Group
generates the error:
cl_lsvg: can't locate VG
Running smitty cl_admin and then selecting:
Cluster Logical Volume Manager
Shared File Systems
Journaled File Systems
List All Shared File Systems
generates the error:
cllsfs:can't locate FS
cl_lsvg.cel and cl_lsfs.cel both had errors
that caused the scripts to attempt to process all
lines of a temporary file instead of just the
lines that contained the name of the resource
group currently being processed.
PROBLEM CONCLUSION:
The cl_lsfs.cel and cl_lsvg.cel scripts
were modified to grep the temporary file
for the resource group being processed
rather than reading all lines from the
file.
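A sketch of the corrected filtering in Python; the real scripts are .cel shell scripts that use grep. The temporary file layout, one `resource_group volume_group` pair per line, is an assumption for illustration. The sketch compares the first field exactly, which is stricter than a plain grep: grep would also match a group name that appears as a substring elsewhere on a line.

```python
def lines_for_group(lines, resource_group):
    """Return only the lines whose first field names the given resource group."""
    selected = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == resource_group:
            selected.append(line)
    return selected


# Hypothetical temporary file contents.
temp_file = [
    "rg_web sharedvg1",
    "rg_db sharedvg2",
    "rg_web sharedvg3",
]
print(lines_for_group(temp_file, "rg_web"))  # ['rg_web sharedvg1', 'rg_web sharedvg3']
```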
------
APAR: IY32102 COMPID: 5765D5100 REL: 320
ABSTRACT: ESTART FAILED ON AN SP SWITCH2 SYSTEM WITH WRAPPED SWITCH PORTS
PROBLEM DESCRIPTION:
estart failed on an sp switch2 system with swapped switch por
PROBLEM SUMMARY:
On an SP Switch 2 system, if the switch ports have
wrap plugs, Estart may fail. The following message
will be in the flt file on the primary node:
CSswitchInit: 2510-712 generate_service_routes() failed
with rc=103. If the wrap plugs are removed, Estart will
succeed.
This has been seen at service levels ssp.css 3.4.0.6
or 3.4.0.7; it may also occur at service level
ssp.css 3.2.0.18.
PROBLEM CONCLUSION:
The switch fault service daemon code has been corrected.
------
APAR: IY32140 COMPID: 5765B9501 REL: 340
ABSTRACT: ASSERT: MMQUOTAON MAIN PROCESS 2564178 KILLED BY SIGNAL 11
PROBLEM DESCRIPTION:
assert: mmquotaon main process 2564178 killed by signal 11
PROBLEM SUMMARY:
QuotaOn() zeroed the storage at quSharesPP without first
dereferencing the pointer, which caused a segmentation
fault.
PROBLEM CONCLUSION:
Added the missing dereference so that the correct area of
memory is zeroed out.
------
APAR: IY32141 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT: MMQUOTAON MAIN PROCESS 2564178 KILLED BY SIGNAL 11
PROBLEM DESCRIPTION:
assert: mmquotaon main process 2564178 killed by signal 11
PROBLEM SUMMARY:
QuotaOn() zeroed the storage at quSharesPP without first
dereferencing the pointer, which caused a segmentation
fault.
PROBLEM CONCLUSION:
Added the missing dereference so that the correct area of
memory is zeroed out.
------
APAR: IY32142 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT: MMQUOTAON MAIN PROCESS 2564178 KILLED BY SIGNAL 11
PROBLEM DESCRIPTION:
assert: mmquotaon main process 2564178 killed by signal 11
PROBLEM SUMMARY:
QuotaOn() zeroed the storage at quSharesPP without first
dereferencing the pointer, which caused a segmentation
fault.
PROBLEM CONCLUSION:
Added the missing dereference so that the correct area of
memory is zeroed out.
------
APAR: IY32182 COMPID: 5765E5400 REL: 440
ABSTRACT: HAES: NODEXNODE LEAVES HACMPPAGER WITH HAS PATH TO SAMPLE.TXT
PROBLEM DESCRIPTION:
After a node-by-node migration from HAS 450 to HAES 450 with
a pager configured, the HACMPpager ODM class still contains the
path to the sample.txt file in the HAS format
(i.e. /usr/sbin/cluster...).
PROBLEM CONCLUSION:
The default pager file is now remapped to its HAES location.
User-created files are left unmodified.
------
APAR: IY32184 COMPID: 5765B9501 REL: 340
ABSTRACT: SIGNAL 11 DURING LOG RECOVERY
PROBLEM DESCRIPTION:
A bad record in the recovery logs overwrites other data on
the stack, causing a SIGSEGV.
PROBLEM SUMMARY:
SIGSEGV caused by a bad record in the recovery logs.
PROBLEM CONCLUSION:
When importing a replicated disk address from the log, check
that nValidDiskAddrs fits in a RepDiskAddr structure. A bad
log record caused stack corruption, and a SIGSEGV occurred
later.
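The added validation can be sketched as a bounds check that runs before the log data is copied. The capacity MAX_REPLICAS, the record layout, and the helper names are assumptions; the APAR does not give the size of RepDiskAddr.

```python
# Hypothetical capacity; the real size of RepDiskAddr is not stated.
MAX_REPLICAS = 2


class BadLogRecord(ValueError):
    pass


def import_rep_disk_addr(record: dict) -> list:
    n_valid = record["nValidDiskAddrs"]
    addrs = record["diskAddrs"]
    # Validate the count taken from the log before trusting it. In C, an
    # oversized count lets the copy run past a fixed-size stack array.
    if not 0 <= n_valid <= MAX_REPLICAS or n_valid > len(addrs):
        raise BadLogRecord(f"nValidDiskAddrs={n_valid} does not fit")
    rep = [None] * MAX_REPLICAS
    rep[:n_valid] = addrs[:n_valid]
    return rep
```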
------
APAR: IY32186 COMPID: 5765E6900 REL: 310
ABSTRACT: BATCH TASK GEOMETRY JOB GETS ORPHAN. LL CAN'T DISPATCH NEW
PROBLEM DESCRIPTION:
A customer has a task geometry job that leaves orphaned
processes on LPAR nodes. After LoadLeveler considers the
job gone, no new jobs can be scheduled. A recycle was
needed after the orphaned processes were killed.
PROBLEM SUMMARY:
In LoadLeveler, the vectors for task_geometry jobs are not
created correctly, so memory errors can occur or the
Negotiator can core dump with a segmentation fault.
PROBLEM CONCLUSION:
In LoadLeveler, the vectors for task_geometry jobs
are now created correctly.
Memory leaks in the Accumulator, Backfill,
and Gang dispatching code have been fixed.
------
APAR: IY32189 COMPID: 5765B9501 REL: 320
ABSTRACT: READDIR() MISSES MOVED BLOCKS
PROBLEM DESCRIPTION:
The ls -l command on GPFS may fail to list some files. This
can happen when the directory grows while it is being read.
Example: a directory consists of 3 blocks (0, 1, 2).
An ls -l command is run, using readdir() to read the
directory entries. While ls is calling stat() on the
first files, another process creates a new file. If the
directory must grow to hold the new file, another
block (3) is allocated and half of the entries of
block 1 are moved into block 3. The next readdir() call
made by ls is supposed to detect this and return
entries from both blocks, but it returns only the
entries left in block 1, omitting those moved to
block 3.
Other commands that use readdir() may see the same
problem.
PROBLEM SUMMARY:
ls -l may not show all entries; readdir() misses
moved blocks.
PROBLEM CONCLUSION:
In some cases, the readdir scan stopped too soon if the
directory block was split after the scan started.
The code was also fixed to work if a directory block
merge occurs in the middle of a readdir scan. This
cannot happen with the current code, because a merge
acquires a lock that conflicts with readdir (and a
merge always fails with E_MULTI_RANGE_LOCK()), but it
was fixed anyway in case this changes.
------
APAR: IY32192 COMPID: 5765E5400 REL: 440
ABSTRACT: HACMP/HAES:ERROR FOUND IN CSPOC.LOG DURING CLVM CSPOC OPERATION
PROBLEM DESCRIPTION:
While creating a concurrent volume group in SMIT using "Create
a Concurrent Volume Group" in CSPOC, one line in cspoc.log
showed "FAILED" and another line shortly afterward showed
"RETURN CODE=1". However, the intended operation worked
properly and there were no errors in hacmp.out or AIX errlog.
PROBLEM CONCLUSION:
Concurrent volume groups should not be varied off after
creation. A special check should be added for concurrent VGs.
------
APAR: IY32224 COMPID: 5765E5400 REL: 440
ABSTRACT: HACMP/HAES: CLVERIFY INTERPRETS SOME WARNINGS AS ERRORS
PROBLEM DESCRIPTION:
clverify terminates or returns with a non-zero error count
even if only warnings have been issued.
PROBLEM CONCLUSION:
Scan the messages before sending them, to distinguish warning
from error messages, based on the header inserted by clverify.
Skip updating the error count for warning messages.
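Here is a sketch of the counting rule. It assumes each message begins with a header such as "WARNING:"; the exact header text that clverify inserts is not given in the APAR, and the helper names are hypothetical.

```python
from typing import Iterable

WARNING_HEADER = "WARNING:"  # hypothetical header text


def is_warning(message: str) -> bool:
    return message.lstrip().startswith(WARNING_HEADER)


def count_errors(messages: Iterable[str]) -> int:
    # Every message except a warning updates the error count.
    return sum(1 for m in messages if not is_warning(m))


def exit_status(messages: Iterable[str]) -> int:
    # Non-zero only when at least one real error was reported.
    return 1 if count_errors(messages) else 0
```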
------
APAR: IY32226 COMPID: 5765E5400 REL: 440
ABSTRACT: HACMP/HAES: DO NOT MODIFY /ETC/RC.SHUTDOWN IF OFFICIAL CALLOUT
PROBLEM DESCRIPTION:
HACMP renames any user-supplied /etc/rc.shutdown and inserts
its own.
PROBLEM CONCLUSION:
If /etc/shutdown contains a callout for HACMP, do not replace
any user version of /etc/rc.shutdown.
------
APAR: IY32260 COMPID: 5765E5400 REL: 440
ABSTRACT: HACMP/HAES: GET_ADDRS CAN RETURN INCORRECT IP ADDRESS
PROBLEM DESCRIPTION:
A call to clgetaddr for a node that is down returns the
service label that has been taken over by another node.
PROBLEM CONCLUSION:
Modified the checking performed by the library functions.
------
APAR: IY32266 COMPID: 5765D5100 REL: 340
ABSTRACT: SOMETIMES LAPI SHARED MEMORY PROGRAM HANGS.
PROBLEM DESCRIPTION:
A LAPI shared memory program sometimes hangs. In the
customer's case, the application was a GAMESS program
running a 32-way LAPI shared memory job.
PROBLEM SUMMARY:
Fixed a problem in LAPI that sometimes caused the
application to hang.
PROBLEM CONCLUSION:
Sometimes a LAPI shared memory application will hang.
------
APAR: IY32268 COMPID: 5765E6900 REL: 310
ABSTRACT: LOADL_CONFIG ENV VARIABLE ERROR MSG FIX AND DOCUMENT FORMAT
PROBLEM DESCRIPTION:
The format of the LOADL_CONFIG environment variable is not
documented. Also, the error message issued for an incorrect
value is misleading.
LOCAL FIX:
Set LOADL_CONFIG in the correct format, either
LOADL_CONFIG=/etc/LoadL.cfg or
LOADL_CONFIG=LoadL
PROBLEM SUMMARY:
No input format is documented for the LoadLeveler
environment variable LOADL_CONFIG, and when an
incorrect format was used, the error message issued
for the incorrect filename was misleading.
PROBLEM CONCLUSION:
The error message for the LOADL_CONFIG environment
variable now includes the name of the file that
LoadLeveler is trying to open.
The following will be added to the LoadLeveler
documentation:
In Using and Administering LoadLeveler,
Chapter 5. Submitting and managing jobs,
Subparagraph "Querying multiple LoadLeveler clusters",
the format for the LOADL_CONFIG environment variable is:
LOADL_CONFIG="fully qualified path and filename"
e.g.
LOADL_CONFIG=/etc/LoadL.cfg
or
LOADL_CONFIG="name of the file without any suffix extension"
This is because internally the prefix "/etc"
and the suffix ".cfg" are added to the
beginning and end of the specified filename.
e.g.
LOADL_CONFIG=LoadL
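The documented resolution rule can be expressed as a small helper. This is an illustrative Python reading of the documentation text, not LoadLeveler code. resolve_loadl_config and open_loadl_config are hypothetical names, and the error text only mimics the fix of naming the file that could not be opened.

```python
import os


def resolve_loadl_config(value: str) -> str:
    """Map a LOADL_CONFIG value to the file that would be opened.

    A fully qualified path is used unchanged; a bare name gets the
    "/etc" prefix and ".cfg" suffix described in the documentation.
    """
    if os.path.isabs(value):
        return value
    return os.path.join("/etc", value + ".cfg")


def open_loadl_config(value: str):
    path = resolve_loadl_config(value)
    try:
        return open(path)
    except OSError as exc:
        # Name the file actually tried, which is the point of the fix.
        raise SystemExit(f"LOADL_CONFIG: cannot open {path}: {exc.strerror}")
```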
------
APAR: IY32331 COMPID: 5765D9300 REL: 320
ABSTRACT: WORKAROUNDS RELATED TO THE USE OF TECHNICAL LARGE PAGE FOR POE
PROBLEM DESCRIPTION:
workarounds related to the use of technical large page
for POE jobs.
PROBLEM SUMMARY:
workarounds related to the use of
technical large page for POE jobs.
PROBLEM CONCLUSION:
workarounds related to the use of technical
large page for POE jobs.
------
APAR: IY32353 COMPID: 5765D5100 REL: 320
ABSTRACT: TCE LEAK
PROBLEM DESCRIPTION:
TCE leak
PROBLEM SUMMARY:
KHAL buffers supporting KLAPI zero copy were
released only if the KHAL port status was clean.
This prevented buffers from being released when certain
port status flags, reflecting the internal KHAL status,
were set. The only condition that should have been checked
in the KHAL port status is whether the port is closed.
PROBLEM CONCLUSION:
To resolve the problem, a check has been added to the
KHAL function that releases KHAL buffers allocated in
support of KLAPI zero copy operations. The check ensures
that the buffers are released in all circumstances except
when the KHAL port is closed.
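The old and corrected release conditions can be contrasted with a bit-flag model. The flag names and values are hypothetical; the APAR only says that certain internal KHAL status flags were set.

```python
from enum import IntFlag


class PortStatus(IntFlag):
    # Hypothetical flag set; the real KHAL status bits are not documented here.
    CLEAN = 0
    CLOSED = 0x1
    RECOVERING = 0x2
    SUSPENDED = 0x4


def may_release_buffers_old(status: PortStatus) -> bool:
    # Buggy rule: release only when no status bit at all is set.
    return status == PortStatus.CLEAN


def may_release_buffers_fixed(status: PortStatus) -> bool:
    # Fixed rule: internal bits are irrelevant; only a closed port blocks release.
    return not (status & PortStatus.CLOSED)
```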
------
APAR: IY32361 COMPID: 5765D5100 REL: 320
ABSTRACT: NODECOND_MCA NEEDS TO HANDLE 10/100 ADAPTERS
PROBLEM DESCRIPTION:
nodecond_mca does not currently recognize the
10/100 Mbps Ethernet TX MC Adapter. It terminates with the message:
the first ethernet adapter detected is not a supported
installation adapter.
The code needs to be modified to recognize this supported
adapter.
LOCAL FIX:
Manual node conditioning can be used to select this adapter.
PROBLEM SUMMARY:
nodecond_mca does not currently recognize the
10/100 Mbps Ethernet TX MC Adapter. It terminates with the
message that the first ethernet adapter detected is not a
supported installation adapter.
The code needs to be modified to recognize this supported
adapter.
PROBLEM CONCLUSION:
nodecond_mca has been modified to recognize the
10/100 Mbps Ethernet TX MC Adapter.
------
APAR: IY32362 COMPID: 5765D5100 REL: 340
ABSTRACT: NODECOND_MCA NEEDS TO HANDLE 10/100 ADAPTERS
PROBLEM DESCRIPTION:
nodecond_mca does not currently recognize the
10/100 Mbps Ethernet TX MC Adapter. It terminates with the message:
the first ethernet adapter detected is not a supported
installation adapter.
The code needs to be modified to recognize this supported
adapter.
LOCAL FIX:
Manual node conditioning can be used to select this adapter.
PROBLEM SUMMARY:
nodecond_mca does not currently recognize the
10/100 Mbps Ethernet TX MC Adapter. It terminates with the
message that the first ethernet adapter detected is not a
supported installation adapter.
The code needs to be modified to recognize this supported
adapter.
PROBLEM CONCLUSION:
nodecond_mca has been modified to recognize the
10/100 Mbps Ethernet TX MC Adapter.
------
APAR: IY32365 COMPID: 5765B9501 REL: 340
ABSTRACT: READDIR() MISSES MOVED BLOCKS
PROBLEM DESCRIPTION:
The ls -l command on GPFS may fail to list some files. This
can happen when the directory grows while it is being read.
Example: a directory consists of 3 blocks (0, 1, 2).
An ls -l command is run, using readdir() to read the
directory entries. While ls is calling stat() on the
first files, another process creates a new file. If the
directory must grow to hold the new file, another
block (3) is allocated and half of the entries of
block 1 are moved into block 3. The next readdir() call
made by ls is supposed to detect this and return
entries from both blocks, but it returns only the
entries left in block 1, omitting those moved to
block 3.
Other commands that use readdir() may see the same
problem.
PROBLEM SUMMARY:
ls -l may not show all entries; readdir() misses
moved blocks.
PROBLEM CONCLUSION:
In some cases, the readdir scan stopped too soon if the
directory block was split after the scan started.
The code was also fixed to work if a directory block
merge occurs in the middle of a readdir scan. This
cannot happen with the current code, because a merge
acquires a lock that conflicts with readdir (and a
merge always fails with E_MULTI_RANGE_LOCK()), but it
was fixed anyway in case this changes.
------
APAR: IY32415 COMPID: 5765E6900 REL: 310
ABSTRACT: NEW CONFIGURATION KEYWORD: ENFORCE_RESOURCE_POLICY = HARD | SOFT
PROBLEM DESCRIPTION:
New configuration keyword:
ENFORCE_RESOURCE_POLICY = hard | soft | shares
PROBLEM SUMMARY:
A new keyword allows the administrator to define the
type of enforcement policy that LoadLeveler uses when
creating WLM classes.
PROBLEM CONCLUSION:
By default, LoadLeveler creates WLM shares based on a job
step's resource requirements when it creates a WLM class.
The new keyword lets the administrator decide whether
shares, soft limits, or hard limits should be defined.
Soft and hard limits represent the step's requested
resources as a percentage of the total machine resources.
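As a worked example of the limit calculation, a step requesting 4 of a machine's 16 CPUs would get a 25% limit. The sketch below assumes rounding up and clamping to 1-100, neither of which is documented; wlm_limit_percent is a hypothetical name.

```python
import math


def wlm_limit_percent(step_requested: float, machine_total: float) -> int:
    """Step request as a percentage of the machine total.

    Rounding up and clamping to 1..100 are assumptions of this sketch.
    """
    if machine_total <= 0:
        raise ValueError("machine_total must be positive")
    pct = math.ceil(100 * step_requested / machine_total)
    return max(1, min(100, pct))
```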
------
APAR: IY32417 COMPID: 5765D5100 REL: 340
ABSTRACT: NGRESOLVE -D FOR IP ADDRESS RETURNS BLANK LINES
PROBLEM DESCRIPTION:
ngresolve -d for ip address returns blank lines
PROBLEM SUMMARY:
Issuing ngresolve with the -d flag should display the IP
address of each node in the node group. Currently a
blank line is being displayed for each node. The adapter
information for the node was not being passed correctly.
PROBLEM CONCLUSION:
Changed the format to pass "node number and adapter type"
from SpNode.C to the constructor of adapter SpAdapter.C.
ngresolve with the -d flag will now display the IP
address of each node in the node group.
------
APAR: IY32429 COMPID: 5765D5100 REL: 340
ABSTRACT: DOUBLE FREE OF SERVICE PACKET STORAGE IN HAL_RECV_HNDLR()
PROBLEM DESCRIPTION:
In hal_recv_hndlr(), there are cases where the storage for a
service packet is freed after the packet has been placed on a
port's virtual receive FIFO. The port thread also frees the
storage after reading the service packet from the FIFO. The
free in hal_recv_hndlr() is erroneous. The results are
indeterminate, because they depend on whether and when the
doubly freed storage is reused. One possible result is a
fault-service daemon core dump.
LOCAL FIX:
Restart the fault-service daemon after a core dump with
/usr/lpp/ssp/css/rc.switch.
PROBLEM SUMMARY:
Under some conditions, the hal_recv_hndlr() function will
free the storage used for service packets; the port thread
may later free this same storage. The results are
indeterminate; there may be data corruption or the fault
service daemon may core dump.
PROBLEM CONCLUSION:
Once a service packet is placed on the port's virtual
receive FIFO queue, the hal_recv_hndlr() function
will no longer try to free it.
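The ownership rule in the fix can be modeled this way: whoever dequeues a packet frees it, and the producer gives up ownership once the packet is queued. This is a hypothetical Python model of the pattern; recv_handler stands in for hal_recv_hndlr(), and the pool's double-free check plays the role of the undefined behavior in C.

```python
from collections import deque


class PacketPool:
    """Tracks live packets so that a second free is detected."""

    def __init__(self):
        self.live = set()
        self.next_id = 0

    def alloc(self) -> int:
        self.next_id += 1
        self.live.add(self.next_id)
        return self.next_id

    def free(self, pkt: int) -> None:
        if pkt not in self.live:
            raise RuntimeError(f"double free of packet {pkt}")
        self.live.remove(pkt)


def recv_handler(pool: PacketPool, fifo: deque, deliver: bool) -> None:
    """Hypothetical stand-in for hal_recv_hndlr()."""
    pkt = pool.alloc()
    if not deliver:
        pool.free(pkt)  # never queued, so the handler still owns it
        return
    fifo.append(pkt)    # ownership passes to the port thread: no free here


def port_thread_drain(pool: PacketPool, fifo: deque) -> None:
    while fifo:
        pool.free(fifo.popleft())  # the consumer frees what it dequeues
```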
------
APAR: IY32496 COMPID: 5765B9501 REL: 340
ABSTRACT: MULTIPLE PROCESSES UPDATES TO GPFS RESULT IN CORRUPTED/MISSING
PROBLEM DESCRIPTION:
A write/append test demonstrates a serious problem with GPFS
when multiple processes append to the same file. The
resulting file contains corrupted or missing data. A test
program opens a file with flags O_RDWR | O_CREAT | O_APPEND and
then does a series of identically sized writes to the file. If
multiple copies of the program are run at the same time on the
same LPAR, the resulting file contains missing or corrupted
records. The same test was run repeatedly against NFS and
locally mounted file systems with no problems.
PROBLEM SUMMARY:
Writes in append mode overwrite previous records.
PROBLEM CONCLUSION:
When multiple processes write to the same file in append
mode, some records are overwritten. The fast write path
was not acquiring an Append (wa) lock on the inode to
serialize the writes.
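The test program described above can be approximated in Python. This is a sketch, not the customer's program. The record size, writer count, and record count are arbitrary, and the fork-based multiprocessing context means it runs only on POSIX systems. Point it at a file that does not yet exist on the file system under test. On a correctly behaving file system, every record should then be present exactly once.

```python
import os
import tempfile
from multiprocessing import get_context

RECORD_SIZE = 64  # every write has the same size, as in the APAR's test


def make_record(writer_id: int, seq: int) -> bytes:
    text = f"writer={writer_id:04d} seq={seq:08d}"
    return text.ljust(RECORD_SIZE - 1).encode() + b"\n"


def writer(path: str, writer_id: int, count: int) -> None:
    # Same open flags as the test program in the APAR.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(count):
            rec = make_record(writer_id, seq)
            # One write() per record; with O_APPEND each record should be
            # placed at the current end of file without overlapping others.
            if os.write(fd, rec) != len(rec):
                raise OSError("short write")
    finally:
        os.close(fd)


def check(path: str, writers: int, count: int) -> bool:
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != writers * count * RECORD_SIZE:
        return False  # records lost or overwritten
    seen = {data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)}
    expected = {make_record(w, s) for w in range(writers) for s in range(count)}
    return seen == expected


def run(path: str, writers: int = 4, count: int = 200) -> bool:
    ctx = get_context("fork")  # POSIX only
    procs = [ctx.Process(target=writer, args=(path, w, count))
             for w in range(writers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return check(path, writers, count)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ok = run(os.path.join(d, "append.dat"))
        print("all records intact" if ok else "missing or corrupted records")
```

To exercise GPFS rather than a local disk, pass a path on the GPFS mount to run() instead of the temporary directory.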
------
APAR: IY32524 COMPID: 5765D5100 REL: 340
ABSTRACT: K4DESTROY ERROR MESSAGE FROM CLEANUP.LOGS.NODES SCRIPT
PROBLEM DESCRIPTION:
When cron runs the cleanup.logs.nodes script and Kerberos
is not active, root is emailed the error message from the
k4destroy command:
2502-000 k4destroy: No tickets to destroy.
This error message should not be emailed if Kerberos is not
active. Kerberos is not active if it is configured but has been
deactivated using the chauthent and chauthts commands. The test
for Kerberos not being active is that the lsauthent command
output does not include "Kerberos 4" and the lsauthts command
output does not include "Compatibility".
LOCAL FIX:
As a workaround, add '2>/dev/null' to either the
script or the crontab entry.
PROBLEM SUMMARY:
cleanup.logs.nodes issues k4destroy to destroy any Kerberos
Version 4 authentication tickets. If any messages are
written to stderr, such as:
k4destroy: 2502-000 No tickets to destroy.
it results in an email being sent to root, since it is
usually run as a cron job. Customers would prefer to not
see this message and to not receive the email.
PROBLEM CONCLUSION:
cleanup.logs.nodes has been modified so that any output to
stderr from k4destroy will be redirected to stdout,
which is already being redirected to /dev/null. As a
result no error messages will be issued from the call
to k4destroy from cleanup.logs.nodes.
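The redirection in the fix corresponds to the shell form `k4destroy >/dev/null 2>&1`. For consistency with the other examples here, a Python equivalent is sketched below. run_quietly is a hypothetical helper, and the test runs sh rather than k4destroy.

```python
import subprocess


def run_quietly(argv):
    # stdout goes to /dev/null and stderr is merged into stdout, so
    # nothing is left for cron to mail to root.
    result = subprocess.run(argv, stdout=subprocess.DEVNULL,
                            stderr=subprocess.STDOUT)
    return result.returncode
```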
------
APAR: IY32698 COMPID: 5765B9501 REL: 330
ABSTRACT: SIGNAL 11 DURING LOG RECOVERY
PROBLEM DESCRIPTION:
A bad record in the recovery logs overwrites other data on
the stack, causing a SIGSEGV.
PROBLEM SUMMARY:
SIGSEGV caused by a bad record in the recovery logs.
PROBLEM CONCLUSION:
When importing a replicated disk address from the log, check
that nValidDiskAddrs fits in a RepDiskAddr structure. A bad
log record caused stack corruption, and a SIGSEGV occurred
later.
------
APAR: IY32699 COMPID: 5765B9501 REL: 320
ABSTRACT: SIGNAL 11 DURING LOG RECOVERY
PROBLEM DESCRIPTION:
A bad record in the recovery logs overwrites other data on
the stack, causing a SIGSEGV.
PROBLEM SUMMARY:
SIGSEGV caused by a bad record in the recovery logs.
PROBLEM CONCLUSION:
When importing a replicated disk address from the log, check
that nValidDiskAddrs fits in a RepDiskAddr structure. A bad
log record caused stack corruption, and a SIGSEGV occurred
later.
------
APAR: IY32702 COMPID: 5765D5100 REL: 340
ABSTRACT: S70D DAEMON GETS ERRORS "CANNOT COMMUNICATE WITH REMOTE NODE" IF
PROBLEM DESCRIPTION:
On SP systems using a 128-port RAN for the S7A tty connection,
the following entry occurs in the system errlog of the CWS:
"cannot communicate with remote node". The problem is definitely
associated with a timing issue between the s70d daemon and the
hardware. With an 8-port adapter, the connection between the
8-port box and the CWS is direct. With 16-port boxes, only the
initial box has a direct connection to the CWS; the other boxes
making up the 128-port adapter do not, and that is where the
problem lies.
PROBLEM SUMMARY:
When an SP-attached S70 or S7A is connected using more than
an 8-port RAN, SPMON_EMSG101_ER entries may be made in the
errpt. These entries indicate a communication problem, which
is not actually the case. The s70d daemon needs to be more
tolerant of non-responses from the SAMI.
PROBLEM CONCLUSION:
The s70d daemon has been modified to be more tolerant of
non-responses of any kind from SAMI. The allowable number
of non-responses has been increased to prevent
SPMON_EMSG101_ER entries being made in the errpt indicating
the Supervisor is not responding.
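Here is a sketch of the tolerance logic. It assumes a simple count of consecutive non-responses that resets on any response. The threshold is an arbitrary illustrative value, because the APAR does not state the new allowable number, and SamiMonitor is a hypothetical name.

```python
class SamiMonitor:
    """Counts consecutive non-responses and reports only past a threshold."""

    def __init__(self, allowed_misses: int):
        self.allowed_misses = allowed_misses
        self.misses = 0

    def poll_result(self, responded: bool) -> bool:
        """Return True if an SPMON_EMSG101_ER-style entry should be logged."""
        if responded:
            self.misses = 0
            return False
        self.misses += 1
        return self.misses > self.allowed_misses
```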
------
APAR: IY32752 COMPID: 5765D5100 REL: 340
ABSTRACT: SRVSUPPWD NEEDS UNIQUE TMP PW FILENAME
PROBLEM DESCRIPTION:
srvsuppwd needs unique tmp pw filename
PROBLEM SUMMARY:
The srvsuppwd process creates a temporary file that is
not unique to the process. Since there may be multiple
srvsuppwd processes running at the same time, this could
result in an updsuppwd process on a node having to try
to obtain the supman password file multiple times.
PROBLEM CONCLUSION:
srvsuppwd has been modified to create a temporary file that
is unique to the process that creates it.
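A uniquely named temporary file can be created safely with Python's `tempfile.mkstemp`. This illustrates the technique rather than the srvsuppwd implementation; the "suppwd." prefix and the helper name are hypothetical.

```python
import os
import tempfile


def write_private_copy(data: bytes, directory=None) -> str:
    """Write data to a new, uniquely named file and return its path.

    mkstemp() picks a random name and creates the file exclusively with
    mode 0600, so concurrent processes never share or clobber one file.
    """
    fd, path = tempfile.mkstemp(prefix="suppwd.", dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```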
------
APAR: IY32800 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT: !HASTOKENS() (CACHEOBJFLAGS & 0X02)
PROBLEM DESCRIPTION:
assert: !hasTokens() || (cacheObjFlags & 0x02)
PROBLEM SUMMARY:
gpfs assert: !hasTokens() || (cacheObjFlags & 0x02)
PROBLEM CONCLUSION:
If the file system panics while a token relinquish is in
progress in releaseLastHold, do not assert that the
tokencount is zero.
------
APAR: IY32801 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT: !HASTOKENS() (CACHEOBJFLAGS & 0X02)
PROBLEM DESCRIPTION:
assert: !hasTokens() || (cacheObjFlags & 0x02)
PROBLEM SUMMARY:
gpfs assert: !hasTokens() || (cacheObjFlags & 0x02)
PROBLEM CONCLUSION:
If the file system panics while a token relinquish is in
progress in releaseLastHold, do not assert that the
tokencount is zero.
------
APAR: IY32803 COMPID: 5765B9501 REL: 340
ABSTRACT: ASSERT: !HASTOKENS() (CACHEOBJFLAGS & 0X02)
PROBLEM DESCRIPTION:
assert: !hasTokens() || (cacheObjFlags & 0x02)
PROBLEM SUMMARY:
gpfs assert: !hasTokens() || (cacheObjFlags & 0x02)
PROBLEM CONCLUSION:
If the file system panics while a token relinquish is in
progress in releaseLastHold, do not assert that the
tokencount is zero.
------
APAR: IY32923 COMPID: 5765E6900 REL: 310
ABSTRACT: LL'S WLM INTEGRATION. LL JOBS WONT RUN. MISLEADING MESSAGE:
PROBLEM DESCRIPTION:
LL is used with WLM integration. Periodically, some parallel
jobs remain in the queue in the "Idle" state although resources
are clearly available. "llq -s" run against the job shows "The
maximum number of steps (27) that can be running when consumable
resources are enforced has been exceeded".
This is definitely NOT the case: there are NO jobs running at
that time.
Note: the "blocking" keyword is used in the LL job command file.
PROBLEM SUMMARY:
LoadL stopped scheduling jobs because it ran out of WLM
classes even though there were no jobs running on the
machine.
PROBLEM CONCLUSION:
When LoadL is using WLM and the blocking keyword is used in
the user's job command file, LoadL over-increments the count
of classes in use and then decrements it by the correct
amount, leaving the difference in the class counter.
------
APAR: IY32955 COMPID: 5724C3505 REL: 310
ABSTRACT: DTMF DIGITS SOMETIMES DETECTED TWICE IN VOICEXML APPLICATION
PROBLEM DESCRIPTION:
DTMF digits are sometimes detected twice when entering data
into a voiceXML application when reco is active also.
PROBLEM SUMMARY:
DTMF digits are sometimes detected twice when
entering data into a voiceXML application when reco is
active also.
PROBLEM CONCLUSION:
Harness changed to correct problem
------
APAR: IY32966 COMPID: 5765E6900 REL: 310
ABSTRACT: LOADLEVELER WRITES TO SOCKET HANG, POSSIBLY CAUSING CORE DUMP
PROBLEM DESCRIPTION:
If a LoadLeveler daemon is writing to a socket and the socket
window fills up, the write can hang until the window drains. If
the hang is long enough (e.g. if the client is suspended the
window will never drain) and a LoadLeveler daemon is holding
locks over the write, this can eventually cause the LoadLeveler
daemon to core dump.
PROBLEM SUMMARY:
LoadLeveler daemons (such as the LoadL_negotiator) can
hang if they are writing to a socket, and the process
reading from the socket is suspended. If a LoadLeveler
daemon hangs writing to a socket this could result in a
core dump.
PROBLEM CONCLUSION:
The LoadLeveler library code has been changed to prevent
socket writes from hanging when the socket window fills.
LoadLeveler will set the socket in non-blocking mode
and allow write operations to time-out.
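The non-blocking write with a timeout can be sketched in Python. This illustrates the technique, not LoadLeveler's library code; send_all_with_timeout is a hypothetical helper.

```python
import select
import socket
import time


def send_all_with_timeout(sock: socket.socket, data: bytes, timeout: float) -> None:
    """Write all of data, giving up instead of hanging if the peer stops reading."""
    sock.setblocking(False)
    deadline = time.monotonic() + timeout
    view = memoryview(data)
    while view:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"write timed out with {len(view)} bytes unsent")
        # Wait until the send buffer has room, or until the deadline passes.
        _, writable, _ = select.select([], [sock], [], remaining)
        if not writable:
            continue  # the loop re-checks the deadline
        try:
            sent = sock.send(view)
        except BlockingIOError:
            continue
        view = view[sent:]
```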
------
APAR: IY32970 COMPID: 5765D5100 REL: 340
ABSTRACT: SP SWITCH2 WORM RUNS SLOW UNDER HEAVY PAGING LOAD
PROBLEM DESCRIPTION:
The current SP Switch2 Worm uses popen() to invoke the sum
command on the current compressed topology file. The result of
the sum command is used to determine if an updated copy of the
topology file needs to be sent to the node. Under heavy paging
load, the time necessary for popen() to do a fork to invoke ksh,
and then ksh to do a fork to invoke sum, can be excessive. If
the Worm does not report back quickly enough after it receives
a NODE_INIT packet, the primary node drops the node from the
switch.
LOCAL FIX:
None. The node is normally still healthy, and there should be
no problem bringing it back onto the switch with Eunfence,
but the damage has already been done.
PROBLEM SUMMARY:
Nodes can drop off the SP Switch 2 when they are under
a heavy load (e.g. high levels of paging). The time
taken to call the AIX sum command to calculate the
switch topology file checksum may be too long under high
load conditions, causing the primary node to drop the
slow responding node off the switch.
PROBLEM CONCLUSION:
The fault_service_Worm_RTG_CS code has been changed to
calculate the checksum of the switch topology file
directly instead of calling the AIX sum command.
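Computing the checksum in-process avoids both fork() calls entirely. The APAR does not say which algorithm the Worm uses. The sketch below uses the 16-bit BSD rotating checksum (the algorithm behind GNU `sum -r`) purely as an example. Whatever algorithm is chosen, the primary and the node must compute the same one for the comparison to be meaningful.

```python
def bsd_sum(data: bytes, checksum: int = 0) -> int:
    """16-bit BSD rotating checksum; pass the previous value to continue."""
    for byte in data:
        # Rotate right by one bit within 16 bits, then add the next byte.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


def file_checksum(path: str, chunk_size: int = 65536) -> int:
    # Read in chunks so large topology files need not fit in memory.
    checksum = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            checksum = bsd_sum(chunk, checksum)
    return checksum
```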
------
APAR: IY32979 COMPID: 5765D5100 REL: 340
ABSTRACT: SWITCH STOPS RESPONDING TO 8K PACKETS
PROBLEM DESCRIPTION:
During bursts of inbound IP traffic, if the number of
pending IP datagrams exceeds the receive queue limit, the IP
datagrams are dropped, and the receive cluster buffers should be
recovered and recycled back to the Corsair adapter. The cluster
buffers are not recovered correctly, so rpool cluster buffers
leak. When the leak becomes severe, large IP datagrams can no
longer pass through the switch network.
PROBLEM SUMMARY:
When IP traffic overflows the receive side, the incoming IP
datagrams are dropped, but the IP receive cluster buffers are
not reclaimed. Over time, the receive cluster buffers can run
out, after which large IP datagrams can no longer be pinged
over the SP Switch.
PROBLEM CONCLUSION:
Reclaim the IP receive cluster buffers appropriately.
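The receive-side fix can be modeled with a fixed buffer pool. In this hypothetical sketch (ReceivePath and its limits are invented for illustration), the overflow branch returns the buffer to the pool. Removing that line reproduces the leak.

```python
from collections import deque


class ReceivePath:
    def __init__(self, n_buffers: int, queue_limit: int):
        self.free_buffers = deque(range(n_buffers))  # pool handed to the adapter
        self.queue = deque()
        self.queue_limit = queue_limit

    def on_datagram(self, payload: bytes) -> bool:
        if not self.free_buffers:
            return False  # no buffer to receive into
        buf = self.free_buffers.popleft()
        if len(self.queue) >= self.queue_limit:
            # Overflow: drop the datagram AND give the buffer back.
            self.free_buffers.append(buf)
            return False
        self.queue.append((buf, payload))
        return True

    def consume(self) -> bytes:
        buf, payload = self.queue.popleft()
        self.free_buffers.append(buf)
        return payload
```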
------
APAR: IY32982 COMPID: 5765E7400 REL: 300
ABSTRACT: SAVE METRICS DATA TO FILES
PROBLEM DESCRIPTION:
save metrics data to files in spreadsheet format and
HTML format.
------
APAR: IY32985 COMPID: 5765B9501 REL: 320
ABSTRACT: GPFS FREED ACTIVE STORAGE ON WHEN HANDLING NFS DUPLICATE FCNTL
PROBLEM DESCRIPTION:
KERNEL PANIC IN KXCLEANUPACQUIRES
GPFS freed active storage when handling retries of duplicate
NFS fcntl requests.
LOCAL FIX:
There are no known workarounds for this problem.
PROBLEM SUMMARY:
NFS using GPFS caused a kernel panic in kxcleanupacquires.
PROBLEM CONCLUSION:
Clear the local sleepElement pointer returned from
kxDupCheckAcquires before returning from gpfsFcntl
(otherwise it will be freed when the routine exits).
------
APAR: IY33005 COMPID: 5765B9501 REL: 340
ABSTRACT: GPFS FREED ACTIVE STORAGE ON WHEN HANDLING NFS DUPLICATE FCNTL
PROBLEM DESCRIPTION:
KERNEL PANIC IN KXCLEANUPACQUIRES
GPFS freed active storage when handling retries of duplicate
NFS fcntl requests.
LOCAL FIX:
There are no known workarounds for this problem.
PROBLEM SUMMARY:
NFS using GPFS caused a kernel panic in kxcleanupacquires.
PROBLEM CONCLUSION:
Clear the local sleepElement pointer returned from
kxDupCheckAcquires before returning from gpfsFcntl
(otherwise it will be freed when the routine exits).
------
APAR: IY33010 COMPID: 5765D5100 REL: 330
ABSTRACT: GPFS FREED ACTIVE STORAGE ON WHEN HANDLING NFS DUPLICATE FCNTL
PROBLEM DESCRIPTION:
KERNEL PANIC IN KXCLEANUPACQUIRES
GPFS freed active storage when handling retries of duplicate
NFS fcntl requests.
LOCAL FIX:
There are no known workarounds for this problem.
PROBLEM SUMMARY:
NFS using GPFS caused a kernel panic in kxcleanupacquires.
PROBLEM CONCLUSION:
Clear the local sleepElement pointer returned from
kxDupCheckAcquires before returning from gpfsFcntl
(otherwise it will be freed when the routine exits).
------
APAR: IY33064 COMPID: 5765D5100 REL: 340
ABSTRACT: SP SWITCH 2 WINDOW SUSPEND FAILURE AFTER LOADLEVELER HAS TRIED
PROBLEM DESCRIPTION:
On the SP Switch 2, a failure may occur suspending windows if
a job fails to respond to the suspend request that is issued
during switch recovery. This problem can happen if a job has
a SIGKILL pending (having been killed by LoadLeveler) but has
not yet fully processed the SIGKILL because a thread is in a
system call with signals blocked. When the windows fail to
suspend because of a non-responsive job, switch recovery will
fail on the affected node, and switch responds will be lost on
the affected switch plane.
PROBLEM SUMMARY:
Nodes can drop off the SP Switch 2 when the switch device
driver fails to suspend jobs that are running. The
adapter.log will show the following error:
QUERY SUSPEND WINDOW_COMPLETION ioctl failed
PROBLEM CONCLUSION:
The device driver for the SP Switch 2 has been changed
to allow suspend requests to be properly handled for
jobs that are starting or stopping.
------
APAR: IY33091 COMPID: 5697E3000 REL: 220
ABSTRACT: JAVA ON-THE-SPOT PROBLEM
PROBLEM DESCRIPTION:
Japan Extension Kit V2.2 is upgraded.
PROBLEM CONCLUSION:
All problems we found were fixed.
------
APAR: IY33093 COMPID: 5697E3000 REL: 230
ABSTRACT: WNN7 UPDATE WITH NEW README
PROBLEM DESCRIPTION:
Japan Extension Kit V2.3 is upgraded
PROBLEM CONCLUSION:
All problems we found were fixed.
------
APAR: IY33109 COMPID: 5765D9300 REL: 320
ABSTRACT: POE MAY NOT HANDLE MULTIPLE GMON.OUT FILES ON A NODE CORRECTLY
PROBLEM DESCRIPTION:
When customers use POE to compile a program for profiling with
the -pg option, running the program sometimes does not create
all of the gmon.out files, or creates incomplete ones, when
there are more than a few tasks on a node. Sometimes the
customer sees messages like the following:
ATTENTION: 0031-662 Node 1 did not send PROFILE_DONE, sent
msgtype 15
ATTENTION: 0031-679 Profiling may not have completed on node 1
PROBLEM SUMMARY:
When customers use POE to compile a program for profiling with
the -pg option, running the program sometimes does not create
all of the gmon.out files, or creates incomplete ones, when
there are more than a few tasks on a node. Sometimes the
customer sees messages like the following:
ATTENTION: 0031-662 Node 1 did not send PROFILE_DONE, sent
msgtype 15
ATTENTION: 0031-679 Profiling may not have completed on node 1
PROBLEM CONCLUSION:
POE now changes its behavior when a SIGCHLD signal
arrives during the processing of gmon.out files.
------
APAR: IY33110 COMPID: 5765B9501 REL: 340
ABSTRACT: NEGATIVE IN_DOUBT VALUES ARE NOT DOCUMENTED VERY WELL
PROBLEM DESCRIPTION:
The output of mmcheckquota may show the user negative
values in the in_doubt column.
The documentation does not mention negative values at all,
so their occurrence is confusing for customers, and they
should be documented.
PROBLEM SUMMARY:
negative in-doubt values for GPFS quotas were not
documented
PROBLEM CONCLUSION:
In the Administration and Programming Reference in the
chapter "Performing GPFS Administration tasks" under the
heading "Checking quotas" and in the chapter
"GPFS commands" under the description section of the
mmcheckquota command, add the paragraph:
When issuing the mmcheckquota command on a mounted file
system, negative in-doubt values may be reported if the
quota server processes a combination of up-to-date and
back-level information. This is a transient situation
and may be ignored.
In the Administration and Programming Reference in the
chapter "Performing GPFS Administration tasks" under the
heading "Listing quotas" and in the chapter
"GPFS commands" under the description section of the
mmlsquota command, add the paragraph:
When issuing the mmlsquota command on a mounted file
system, negative in-doubt values may be reported if the
quota server processes a combination of up-to-date and
back-level information. This is a transient situation
and may be ignored.
In the Administration and Programming Reference in the
chapter "Performing GPFS Administration tasks" under the
heading "Creating file system quota reports" and in the
chapter "GPFS commands" under the description section of
the mmrepquota command, add the paragraph:
When issuing the mmrepquota command on a mounted file
system, negative in-doubt values may be reported if the
quota server processes a combination of up-to-date and
back-level information. This is a transient situation
and may be ignored.
V3 The man pages for mmcheckquota, mmlsquota and
mmrepquota were updated with this information.
------
APAR: IY33111 COMPID: 5765B9501 REL: 330
ABSTRACT: MMFSCK DESTROYED THE ALLOC MAP FILES
PROBLEM DESCRIPTION:
mmfsck destroyed the alloc map files
PROBLEM SUMMARY:
mmfsck corrupted the alloc map file
PROBLEM CONCLUSION:
Fix alloc map segment compare. Map header pointer movement
was wrong. Alloc segment data length computation was wrong.
Number of segments to compare was wrong when number of
segments didn't end on a block boundary. Comparison of
lastblocksubblocks should allow 0 or 32 for full last block.
LastSubblocks modulo must use maxSubblocksPerBlock not 32
for directories and other things that have 'small'
fullblocks.
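A minimal sketch of the corrected last-block computation. The function names and the value 8 for a 'small' full block are hypothetical. Only the rule comes from the APAR: take the modulo with maxSubblocksPerBlock rather than 32, and accept 0 or the maximum (32 in the APAR text, generalized here) as a full last block.

```python
# Hypothetical illustration of the LastSubblocks fix in APAR IY33111.
# Names and values are assumptions; this is not GPFS source code.

def last_block_subblocks(total_subblocks: int, max_subblocks_per_block: int) -> int:
    """Subblocks used in the last block; 0 means the last block is full."""
    return total_subblocks % max_subblocks_per_block


def last_block_is_full(last_subblocks: int, max_subblocks_per_block: int) -> bool:
    """Accept either encoding of a full last block: 0 or the per-block maximum."""
    return last_subblocks in (0, max_subblocks_per_block)


# A directory-like object whose full blocks hold only 8 subblocks (assumed value).
small_max = 8
total = 24  # exactly three full 'small' blocks

buggy = total % 32                              # hard-coded 32: reports 24 used subblocks
fixed = last_block_subblocks(total, small_max)  # 0, i.e. the last block is full
print(buggy, fixed, last_block_is_full(fixed, small_max))
```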
------
APAR: IY33113 COMPID: 5765B9501 REL: 340
ABSTRACT: MMFSCK DESTROYED THE ALLOC MAP FILES
PROBLEM DESCRIPTION:
mmfsck destroyed the alloc map files
PROBLEM SUMMARY:
mmfsck corrupted the alloc map file
PROBLEM CONCLUSION:
Fix alloc map segment compare. Map header pointer movement
was wrong. Alloc segment data length computation was wrong.
Number of segments to compare was wrong when number of
segments didn't end on a block boundary. Comparison of
lastblocksubblocks should allow 0 or 32 for full last block.
LastSubblocks modulo must use maxSubblocksPerBlock not 32
for directories and other things that have 'small'
fullblocks.
------
APAR: IY33116 COMPID: 5765D5100 REL: 340
ABSTRACT: 0509-036 AND 0509-130 IN PMANRMD.LOG FILE WHEN LIBDCE.A IS ON
PROBLEM DESCRIPTION:
The pmanrmd.log file shows a repeatable pattern of the
following entries.
0509-036 Cannot load program spsec_ldmod because of the
following errors
0509-130 Symbol resolution failed for
/usr/lpp/ssp/bin/spsec_ldmod because:
0509-136 Symbol GSS_MECH_MIT_KRB5 is not exported from dependent
module /usr/lib/libdce.a(shr.o).
/usr/lpp/ssp/bin/SDRGetObjects: 0025-004 Item specified for
query, insertion or deletion was not found.
The problem is triggered by the pman daemon logic finding that
the libdce.a file is on this system before checking to see if
DCE authentication is in use. DCE is not in use on this system
and the file remains for other reasons.
Other APARs with similar symptoms (IY17070, IY21195, IY23021 and
IY22203) either are already applied or do not apply. There does not
seem to be any impact on the system other than the error entries
in the log.
LOCAL FIX:
The only impact is the messages and they can be ignored. If the
libdce.a file is removed the messages stop.
PROBLEM SUMMARY:
dsrvtgt was calling spsec_start before determining whether
DCE authentication was being used. If an older
/usr/lib/libdce.a is present, the load errors seen in
/var/adm/SPlogs/pman/pmanrmd.log are produced.
The -m option in the SDRGetObjects call has been changed to -q.
PROBLEM CONCLUSION:
dsrvtgt has been modified to determine whether DCE authentication
is being used before calling spsec_start. If DCE authentication
is not being used, dsrvtgt exits without
calling spsec_start.
The SDRGetObjects option list has been fixed.
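The reordered check can be sketched as follows. dce_auth_in_use and spsec_start are hypothetical stand-ins passed in as callables, not the real PSSP interfaces; the point is only that the DCE decision now happens before any attempt to load the security module.

```python
# Hypothetical sketch of the dsrvtgt control flow described in APAR IY33116.
from typing import Callable


def dsrvtgt(dce_auth_in_use: Callable[[], bool],
            spsec_start: Callable[[], None]) -> str:
    # Decide whether DCE authentication is in use *before* touching the
    # security library, so a leftover /usr/lib/libdce.a is never loaded.
    if not dce_auth_in_use():
        return "exit: DCE authentication not in use"
    spsec_start()
    return "spsec_start called"


calls = []
print(dsrvtgt(lambda: False, lambda: calls.append("start")))  # exits, no load attempt
print(dsrvtgt(lambda: True, lambda: calls.append("start")))
print(calls)
```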
------
APAR: IY33197 COMPID: 5765E6900 REL: 310
ABSTRACT: TIMING EXPOSURE IN LOADL_NEGOTIATOR CAUSES DEADLOCK
PROBLEM DESCRIPTION:
Timing exposures between a job completion and a Negotiator
cycle can cause a deadlock condition in the Negotiator.
PROBLEM SUMMARY:
A timing exposure in the LoadLeveler Negotiator could make
it think that jobs were running on a machine when
they had already finished. That wrong assumption could
cause the Negotiator to try to acquire the same write lock
a second time, after which the Negotiator would hang.
PROBLEM CONCLUSION:
The LoadLeveler Negotiator now performs a second check
that jobs are running on a machine before trying
to use certain data about the jobs on that machine.
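The failure mode can be illustrated with Python's non-reentrant threading.Lock: a thread that already holds the lock and requests it again would block forever. The guard in update_machine mirrors the added verification; the lock, the running_jobs set, and the function names are assumptions, not LoadLeveler internals.

```python
# Hypothetical illustration for APAR IY33197; not LoadLeveler code.
import threading

machine_lock = threading.Lock()  # non-reentrant, like a plain write lock


def second_acquire_would_block() -> bool:
    with machine_lock:
        # A second acquisition by the same thread cannot succeed;
        # with blocking=True (the default) it would hang forever.
        got_it = machine_lock.acquire(blocking=False)
        if got_it:
            machine_lock.release()
        return not got_it


def update_machine(running_jobs: set) -> str:
    # The fix: re-verify that jobs are still running before using
    # per-job data (and before entering the path that takes the lock).
    if not running_jobs:
        return "skipped: no jobs running"
    with machine_lock:
        return f"updated {len(running_jobs)} job(s)"


print(second_acquire_would_block())  # True
print(update_machine(set()))
print(update_machine({"job1"}))
```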
------
APAR: IY33208 COMPID: 5724C3505 REL: 310
ABSTRACT: CACHEM IMPROVEMENTS FOR WEBSPHERE VOICE RESPONSE AIX
PROBLEM DESCRIPTION:
Cachem improvements to be made
PROBLEM CONCLUSION:
Cache expiry handling and file error logging were changed.
------
APAR: IY33214 COMPID: 5765E6900 REL: 310
ABSTRACT: A CANCELED INTERACTIVE JOB MIGHT CAUSE THE NEGOTIATOR TO QUIT
PROBLEM DESCRIPTION:
If an interactive job is cancelled with Ctrl-C at the same time that the
Negotiator decides that it cannot schedule the job to run, the
LoadL_negotiator daemon may fail to handle the job correctly
and will intentionally terminate itself.
PROBLEM SUMMARY:
An interactive POE job can cause the LoadLeveler Negotiator
to mishandle the job and terminate itself if the
interactive job is terminated at just the right time during
the negotiation cycle.
PROBLEM CONCLUSION:
The LoadLeveler Negotiator was modified to keep better
track of interactive jobs during the negotiation cycle.
------
APAR: IY33227 COMPID: 5765C3403 REL: 430
ABSTRACT: MISSING RESOURCE ERROR FOR TMSSAR WHEN CFGMGR IS RUN
PROBLEM DESCRIPTION:
The error message...
MISSING RESOURCE
801020
The following resources were detected previously, but are
not detected now:
- tmssar Target Mode SSA Router
These resources do not have Diagnostic support and cannot
be resolved by the Missing Option Resolution Procedure.
...when cfgmgr is run.
PROBLEM CONCLUSION:
The tmssa.ssa.usr.add file was edited to change the chgstatus value to 1.
------
APAR: IY33251 COMPID: 5765D5100 REL: 340
ABSTRACT: KLAPI SUPPORT FOR REGATTAH SP SWITCH 2 AND SP SWITCH 2 2-PLANE
PROBLEM DESCRIPTION:
KLAPI support for RegattaH SP Switch 2 and SP Switch 2 2-plane
PROBLEM SUMMARY:
KLAPI support for RegattaH SP Switch and
SP Switch2 2-plane.
------
APAR: IY33256 COMPID: 5765C3403 REL: 430
ABSTRACT: CRITICAL FIXES FOR AIX 4.3 AS OF JULY 2002
PROBLEM DESCRIPTION:
This APAR delivers security related and other critical fixes for
AIX 4.3.3 made available after the 4330-10 Recommended
Maintenance package. This package is a delta to the latest
Recommended Maintenance package and assumes that it is already
installed. This package also assumes that the Critical Fixes
for April 2002 (APAR IY30431) are installed.
This APAR should be ordered with a service level of 433010.
Security issues resolved:
IY30357 SECURITY: Buffer overflow vulnerability in traceroute
IY31997 SECURITY: Buffer overflow in errpt
* = Potential for remote exploitation
Other critical issues resolved:
IY30626 System crash while deleting arp entry
IY31050 Application fails to load with error 0509-036
IY31059 su gives an error and exits when switching from a local user
to another user
This is a packaging APAR only. It will not appear in the list
of APARs on the SMIT "Update Software by Fix (APAR)" panel, nor
will the 'instfix' command show this APAR as being installed
after the updates delivered by this package are installed.
To install selected updates from this package, use the command:
smit update_by_fix
To install all updates from this package that apply to installed
filesets on your system, use the command:
smit update_all
A system reboot is not required after installation for the fixes
in this package to take effect.
PROBLEM SUMMARY:
Packaging only.
------
APAR: IY33265 COMPID: 5765D5100 REL: 340
ABSTRACT: SP-SWITCH2: ENTIRE SWITCH PLANE BROUGHT DOWN BECAUSE OF A SINGLE
PROBLEM DESCRIPTION:
SP-Switch2. Entire switch plane brought down by a single
bad switch adapter. Problem is already described by defect
84881.
PROBLEM SUMMARY:
When the adapter processor takes an exception,
it does not generate a PCI interrupt to inform the device driver (DD),
and it then hangs. This causes the switch to back up,
which brings down the whole network.
PROBLEM CONCLUSION:
Added a new function to the bootcode to
invoke error recovery in the error exception path.
------
APAR: IY33266 COMPID: 5765B8100 REL: 220
ABSTRACT: DTMF DIGITS SOMETIMES DETECTED TWICE IN VOICEXML APPLICATION
PROBLEM DESCRIPTION:
DTMF digits are sometimes detected twice when entering data
into a VoiceXML application while speech recognition (reco) is also active.
PROBLEM SUMMARY:
DTMF digits are sometimes detected twice when
entering data into a VoiceXML application while speech
recognition (reco) is also active.
PROBLEM CONCLUSION:
The harness was changed to correct the problem.
------
APAR: IY33350 COMPID: 5765B8100 REL: 220
ABSTRACT: LANGUAGE CREATION TIMES OUT
PROBLEM DESCRIPTION:
Sometimes the creation of a new language will fail if one
of the voice directories contains a large amount of data.
PROBLEM SUMMARY:
Sometimes the creation of a new language will
fail if one of the voice directories contains a large
amount of data.
PROBLEM CONCLUSION:
A longer timeout is now set, so that the
operations are much less likely to time out during a voice
directory copy operation.
------
APAR: IY33378 COMPID: 5724C3505 REL: 310
ABSTRACT: IMPROVED ACCESSIBILITY FOR WEBSPHERE VOICE RESPONSE
PROBLEM DESCRIPTION:
Improved accessibility for WebSphere Voice Response
PROBLEM SUMMARY:
Improved accessibility for WebSphere Voice
Response
------
APAR: IY33420 COMPID: 5765D5100 REL: 340
ABSTRACT: SCALING SUPPORT
PROBLEM DESCRIPTION:
The IBM eServer Cluster 1600 scaling limit has been increased
to support clusters with as many as 32 IBM eServer pSeries
690/670 servers with a maximum of 128 Logical Partitions
(LPARs). A single cluster can now employ 1024 POWER4
processors.
PROBLEM CONCLUSION:
The scaling limit for Hardware Management Console (HMC) is
increased. The HMC can now control up to 8 IBM eServer
pSeries 690/670 servers in a cluster with a maximum of
32 LPARs per HMC.
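As a consistency check on the stated limits, the arithmetic below works out the minimum number of HMCs for a maximal cluster and the processors per server implied by the 1024-processor figure. It assumes servers and LPARs can be spread evenly across HMCs, which the APAR does not state.

```python
import math

# Limits as stated in APAR IY33420.
MAX_SERVERS = 32
MAX_LPARS = 128
MAX_PROCESSORS = 1024
SERVERS_PER_HMC = 8
LPARS_PER_HMC = 32


def min_hmcs(servers: int, lpars: int) -> int:
    """Smallest HMC count satisfying both per-HMC limits (assumes even distribution)."""
    return max(math.ceil(servers / SERVERS_PER_HMC),
               math.ceil(lpars / LPARS_PER_HMC))


print(min_hmcs(MAX_SERVERS, MAX_LPARS))  # 4
print(MAX_PROCESSORS // MAX_SERVERS)     # 32 processors per server
```

Both per-HMC limits give the same answer (32 / 8 = 4 and 128 / 32 = 4), so the server and LPAR limits are mutually consistent.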
------
APAR: IY33423 COMPID: 5765B9500 REL: 150
ABSTRACT: MMFSCK DESTROYED THE ALLOC MAP FILES
PROBLEM DESCRIPTION:
mmfsck destroyed the alloc map files
PROBLEM CONCLUSION:
Fix alloc map segment compare. Map header pointer movement
was wrong. Alloc segment data length computation was wrong.
Number of segments to compare was wrong when number of
segments didn't end on a block boundary. Comparison of
lastblocksubblocks should allow 0 or 32 for full last block.
LastSubblocks modulo must use maxSubblocksPerBlock not 32
for directories and other things that have 'small'
fullblocks.
------
APAR: IY33448 COMPID: 5765D5100 REL: 340
ABSTRACT: LATEST PSSP 3.4.0 FIXES AS OF JULY 2002
PROBLEM DESCRIPTION:
This is the latest PSSP PTF as of June 2002.
Order this APAR to get all of the PTFs as of May 2002.
PROBLEM SUMMARY:
This is a packaging APAR for PSSP 3.4.0 fixes
as of July 2002.
------
APAR: PQ62946 COMPID: 5765C4200 REL: 330
ABSTRACT: ESSL _COPY PERFORMANCE FOR N NEAR POWER OF 2 DEGRADED
PROBLEM DESCRIPTION:
Customer reported that performance of SCOPY was poor for
N values which were powers of 2.
LOCAL FIX:
Use N values that are not near powers of 2, if possible.
PROBLEM SUMMARY:
Performance of ESSL _COPY routines when N
is near a multiple of 128 is poor.
PROBLEM CONCLUSION:
ESSL COPY routines were modified to use
fewer streams or make adjustments to the lengths of the
problem in order to avoid rolling the L1 and L2 caches on the
POWER4.
TEMPORARY FIX:
Avoid N values near multiples of 128.
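Following the temporary fix, a caller could screen vector lengths before an ESSL call. This helper is hypothetical, and the tolerance of 8 elements is an assumption, since the APAR does not define how near to a multiple of 128 is too near.

```python
def distance_to_multiple(n: int, m: int = 128) -> int:
    """Distance from n to the nearest multiple of m."""
    r = n % m
    return min(r, m - r)


def is_near_multiple_of_128(n: int, tolerance: int = 8) -> bool:
    # tolerance is an assumed threshold; the APAR only says "near".
    return distance_to_multiple(n, 128) <= tolerance


print([n for n in (1000, 1024, 1030, 1088) if is_near_multiple_of_128(n)])  # [1024, 1030]
```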
------
APAR: PQ63390 COMPID: 5765C4200 REL: 330
ABSTRACT: IMPROVE PERFORMANCE ON POWER4
PROBLEM DESCRIPTION:
Performance improvements for p690 needed.
PROBLEM SUMMARY:
Performance improvements were needed for
multiple routines for the p690.
PROBLEM CONCLUSION:
Multiple routines in the L1 BLAS, L3 BLAS,
Eigensystems and Linear Algebraic Equations were improved.
------
APAR: PQ63391 COMPID: 5765C4200 REL: 330
ABSTRACT: ZSCAL PREFETCHING PAST END OF ARRAY ON POWER4
PROBLEM DESCRIPTION:
On the Power4, ZSCAL can prefetch past the end of an array for
N >= 30.
LOCAL FIX:
Allocate the vector to be longer than necessary to avoid
a possible segmentation fault.
PROBLEM SUMMARY:
On Power4, for N >= 30, ZSCAL may prefetch
past the end of the array.
PROBLEM CONCLUSION:
ZSCAL was corrected to prevent reading past
the end of the array.
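The reference semantics of ZSCAL (scale a complex vector by a complex scalar) and the local fix can be modeled together. Python lists cannot segfault, so this only shows the idea of owning storage past element n-1; the padding of 16 elements is an assumption, as the APAR does not say how far past the end the prefetch reaches.

```python
def zscal(n: int, alpha: complex, x: list, incx: int = 1) -> None:
    """Reference semantics of BLAS ZSCAL: x[i*incx] = alpha * x[i*incx]."""
    if n <= 0 or incx <= 0:
        return  # the reference BLAS does nothing in these cases
    for i in range(n):
        x[i * incx] *= alpha


def padded_vector(n: int, pad: int = 16) -> list:
    # Local-fix style allocation: extra trailing elements so that a read
    # a little past element n-1 still lands in owned storage (pad is assumed).
    return [0j] * (n + pad)


n = 30
x = padded_vector(n)
x[:n] = [complex(i, 1) for i in range(n)]
zscal(n, 2j, x)
```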
------
APAR: PQ63394 COMPID: 5765C4200 REL: 330
ABSTRACT: ERRSET CALLS INCORRECT INTERNALLY FOR MULTIPLE LINEAR ALGEBRAIC
PROBLEM DESCRIPTION:
The IUSADR arguments for internal calls to ERRSET in some
subroutines were incorrect in 64-bit mode.
PROBLEM SUMMARY:
The following ESSL routines had the IUSADR
argument in an internal call to ERRSET incorrectly typed as
a 32-bit integer instead of a 64-bit integer in 64-bit mode:
_POTRF, _POICD, _POTRI, _TRTRI, _TPTRI, _GETRI
PROBLEM CONCLUSION:
Internal calls to ERRSET were corrected.
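The effect of this kind of typing error can be modeled with ctypes: an address-sized value above 4 GB loses its high-order bits when stored in a 32-bit integer. The address below is hypothetical, and this models the mistype in general rather than ESSL's actual ERRSET internals.

```python
import ctypes


def as_int32(value: int) -> int:
    """What survives when a value is stored in a 32-bit signed integer."""
    return ctypes.c_int32(value).value


def as_int64(value: int) -> int:
    """What survives when a value is stored in a 64-bit signed integer."""
    return ctypes.c_int64(value).value


address = 0x0000000110001234  # hypothetical 64-bit address above 4 GB
print(hex(as_int64(address)))  # 0x110001234
print(hex(as_int32(address)))  # 0x10001234: high-order bits lost
```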
------
APAR: PQ63401 COMPID: 5765C4200 REL: 330
ABSTRACT: IMPROVE FFT PERFORMANCE FOR SMALL LENGTHS
PROBLEM DESCRIPTION:
SciComp report indicated that small length FFT performance for
ESSL was not as good as some public domain packages.
PROBLEM SUMMARY:
Performance for small-length FFTs was not
as good as that of some public domain packages.
PROBLEM CONCLUSION:
ESSL 1-D FFTs were improved for lengths
which are powers of 2 and less than 64.
------
APAR: PQ63403 COMPID: 5765C4200 REL: 330
ABSTRACT: IMPROVE DAXPY FULL CACHE PERFORMANCE ON POWER4
PROBLEM DESCRIPTION:
Through the Scholars program, it was reported that DAXPY
performance on POWER4 for very small sizes was lower than that
of NETLIB, inlined code and AIX libblas.a. For larger sizes
that still fit in the cache, around N=1024, it was reported
that ESSL DAXPY performance was lower than that of AIX libblas.a.
PROBLEM SUMMARY:
When the data is in the cache, DAXPY was not
performing well on POWER4. For very small sizes, the loop
was not a good choice and an internal call to a routine which
determines the machine type was unnecessary overhead. For
larger sizes (around 1K), the loop was still not scheduled well.
PROBLEM CONCLUSION:
Prefetch can cause jumpy performance from size
to size. However, when the data is not in the cache, this
technique provides a significant boost. In the interest of not
degrading performance for customers whose data is not
in the cache, this technique was retained.
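For reference, DAXPY computes y := a*x + y. The plain loop below shows the operation whose scheduling and prefetch behavior the APAR discusses; it is a semantic sketch that handles positive increments only and says nothing about ESSL's tuned POWER4 implementation.

```python
def daxpy(n: int, a: float, x: list, incx: int, y: list, incy: int) -> None:
    """Reference semantics of BLAS DAXPY: y := a*x + y."""
    if n <= 0 or a == 0.0:
        return  # nothing to do, as in the reference BLAS
    if incx <= 0 or incy <= 0:
        raise ValueError("this sketch handles positive increments only")
    for i in range(n):
        y[i * incy] += a * x[i * incx]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
daxpy(3, 2.0, x, 1, y, 1)
print(y)  # [12.0, 24.0, 36.0]
```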
------
APAR: PQ63407 COMPID: 5765C4200 REL: 330
ABSTRACT: L1 BLAS PERFORMANCE FOR N NEAR MULTIPLES OF 128 DEGRADED ON
PROBLEM DESCRIPTION:
When N is near a multiple of 128, the performance of some
ESSL L1 BLAS routines is poor.
LOCAL FIX:
Avoid N lengths near multiples of 128
PROBLEM SUMMARY:
Performance of some L1 BLAS routines for N
near a multiple of 128 was not good on Power4 as the
technique for hardware prefetching can cause the L1 cache
to be rolled more frequently.
PROBLEM CONCLUSION:
Multiple L1 BLAS codes were updated to use
fewer streams or to adjust the problem size to avoid the
cache problem.
TEMPORARY FIX:
Avoid lengths near multiples of 128.
------