From: AIX Service Mail Server (aixserv austin.ibm.com)
Date: Wed Jun 26 2002 - 02:39:53 CDT
APAR: IY28660 COMPID: 5765D6100 REL: 220
ABSTRACT: LLSUMMARY -R THROUGHPUT/MAXQUEUED AND REAL TIME MULTIPLE OF
PROBLEM DESCRIPTION:
llsummary -r throughput reports incorrect Queue and Real times.
For parallel jobs, the value is multiplied by the number of nodes used.
PROBLEM SUMMARY:
The throughput reports, produced by the LoadLeveler
llsummary command, can produce higher than appropriate
Queue Time and Real Time numbers for parallel jobs that ran
on multiple nodes.
PROBLEM CONCLUSION:
The llsummary command had been adding the Queue Time and
Real Time numbers from each node that was used to execute
a parallel job. That made the resulting numbers too high
by a factor equal to the number of nodes used for the job.
The command was changed to first determine whether the job was a
serial job or a parallel job, and then to do the correct
calculation.
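The difference between the old and new behaviour can be sketched as follows. This is a minimal illustration, not the llsummary implementation: the NodeRecord layout and both function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class NodeRecord:
    """One accounting record per node that ran part of a job (hypothetical layout)."""
    job_id: str
    queue_time: float  # seconds the job waited in the queue
    real_time: float   # wall-clock seconds the job ran


def buggy_job_times(records):
    """Pre-fix behaviour as described: add the times from every node."""
    totals = {}
    for rec in records:
        q, r = totals.get(rec.job_id, (0.0, 0.0))
        totals[rec.job_id] = (q + rec.queue_time, r + rec.real_time)
    return totals


def job_times(records):
    """Count each job once, however many nodes it ran on.

    Every node of a parallel job reports the same wait and the same
    wall-clock run, so the first record for a job is kept and the rest
    are ignored. A serial job has only one record and is unaffected.
    """
    totals = {}
    for rec in records:
        totals.setdefault(rec.job_id, (rec.queue_time, rec.real_time))
    return totals
```

With four node records for one parallel job, buggy_job_times reports four times the queue and real time, while job_times reports the job's own values.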
------
APAR: IY28683 COMPID: 5765D6100 REL: 220
ABSTRACT: MAXSLOTS AND FREESLOTS IN LLCLASS OUTPUT CALCULATE WRONG WHEN
PROBLEM DESCRIPTION:
llclass output calculates maxslots and freeslots
correctly, as long as maxjobs is not set.
If it is set, the number of maxjobs limits the slots;
however, if jobs are running, freeslots is reduced
by the number of tasks.
So a single job can occupy all the slots, though
there are still CPUs left over.
PROBLEM SUMMARY:
In LoadLeveler 2.2,
the MAXSLOTS and FREESLOTS values in the llclass output
are incorrect when the maxjobs value is set.
PROBLEM CONCLUSION:
In LoadLeveler 2.2,
the MAXSLOTS and FREESLOTS values from
the llclass output are
now calculated based on the number
of machines that have tasks running
as well as the max_starter and maxjobs
values if set.
------
APAR: IY29118 COMPID: 5765D6100 REL: 220
ABSTRACT: MACHPRIO DOES WRONG CALCULATION OVER TIME
PROBLEM DESCRIPTION:
MACHPRIO is very often based on a computation around
LoadAvg. LoadLeveler adjusts the LoadAvg by the
value of NEGOTIATOR_LOADAVG_INCREMENT when a job is
started on that node.
Unfortunately, this add-on can stay in place longer than intended,
and if multiple jobs are started on the node in question, the add-ons
accumulate to really strange values (I saw values of up to
-1240.00).
If the machine is idle for a certain amount of time,
the MACHPRIO value recovers on its own ...
LOCAL FIX:
Recycling the Negotiator or Startd on the problem node
recovers immediately.
Alternatively, a sequence of "llctl resume" to that node
can eventually recover, too.
PROBLEM SUMMARY:
If Loadavg is used in your calculation for MACHPRIO in the
LoadL_config file, the MACHPRIO value can sometimes get
values well out of the range of what it should be. The
problem can happen if a parallel job starts more than two
tasks on the same node. Newer hardware, with increased
numbers of CPUs, is most susceptible to this problem.
That is based on the assumption that the MAX_STARTERS value
is set equal to the number of CPUs on the machine.
PROBLEM CONCLUSION:
The Negotiator internally adjusts a machine's loadavg when
it starts a new job on that machine. As part of that
adjustment, the Negotiator could sometimes keep adjusting
the adjusted value instead of adjusting the real load value
that it received from the machine. The code that
determined which value to adjust was modified to correct
the problem.
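One way to keep such adjustments from compounding is to always derive the adjusted value from the last real load value. The sketch below illustrates that idea only; the MachineLoad class and its methods are hypothetical and do not reflect the Negotiator's actual code.

```python
class MachineLoad:
    """Tracks a machine's real load average and a derived, adjusted value."""

    def __init__(self, increment):
        # Stands in for the configured load average increment (assumption).
        self.increment = increment
        self.reported = 0.0  # last real load average received from the machine
        self.pending = 0     # jobs started since that report arrived

    def report(self, loadavg):
        """A fresh report from the machine replaces any earlier adjustment."""
        self.reported = loadavg
        self.pending = 0

    def job_started(self):
        """Record that a job was started; the real value is left untouched."""
        self.pending += 1

    def adjusted(self):
        """Always computed from the real value, never from an adjusted one."""
        return self.reported + self.pending * self.increment
```

Because adjusted() is recomputed from the reported value each time, starting several tasks on one node adds the increment once per task and a new report resets the adjustment.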
------
APAR: IY29205 COMPID: 5765D9300 REL: 310
ABSTRACT: FAILURE IN CREATING CORE DIRECTORIES AND FILES USING MP_COREDIR
PROBLEM DESCRIPTION:
(a) MP_COREDIR must be set to a directory where the user has
permission to write files in the parent of the specified
directory. For example, MP_COREDIR=/tmp will not work because
the user does not have permission to create files in "/", the
parent of /tmp.
(b) If MP_COREDIR=/tmp/abc then core directories called abc.0,
abc.1, etc., will be created in /tmp and these directories will
contain the core files (light weight or regular). Note that
the coredir name is not abc with subdirectories abc.0, etc., as
expected.
(c) If MP_COREDIR is unset then the core directories will be
created in the directory where the user's job is run. This is
as expected.
PROBLEM SUMMARY:
If the user sets MP_COREDIR=/tmp then the core file created
will be in a file called core and if there are multiple
tasks then only one core file is created called core.
The following error message is received by the user:
ERROR: 0031-144 error creating directory for core files,
reason: <The file access permissions do not allow the
specified action.>
PROBLEM CONCLUSION:
MP_COREDIR must be set to a directory where the user has
permission to write files in the parent of the specified
directory. For example, MP_COREDIR=/tmp will not work
because the user does not have permission to create files in
"/", the parent of /tmp.
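A job script could check this condition before launching. The following is a small sketch under the naming described above (coredir.0, coredir.1, and so on, created next to the given path); both helper names are hypothetical and are not part of POE.

```python
import os


def coredir_parent_writable(coredir):
    """True if the current user can create entries in the parent of coredir."""
    parent = os.path.dirname(os.path.abspath(coredir))
    # Creating a directory needs write and search permission on the parent.
    return os.access(parent, os.W_OK | os.X_OK)


def core_dir_names(coredir, ntasks):
    """Per-task core directory names: coredir.0, coredir.1, ..."""
    return [f"{coredir}.{task}" for task in range(ntasks)]
```

For MP_COREDIR=/tmp the parent is "/", which an ordinary user normally cannot write to; for MP_COREDIR=/tmp/abc the parent is /tmp and the directories would be /tmp/abc.0, /tmp/abc.1, and so on.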
------
APAR: IY29786 COMPID: 5765D5100 REL: 320
ABSTRACT: MEMORY EXHAUSTED W/ NON-CONTIGUOUS USER DEFINED DATA TYPES
PROBLEM DESCRIPTION:
Memory leak sending MPI non-contiguous user defined types with
MPI_GATHER. After a few thousand iterations, the job will end
with ERROR: 0032-171 Communication subsystem error: Memory is
exhausted. in MPI_Gather, task 0. With 32-bit US it runs out at
1,043,000 cycles, with 32-bit IP it runs out at 38,000 cycles, and
with 64-bit PRPQ over US it runs out at 521,000 cycles.
PROBLEM SUMMARY:
Memory leak sending MPI non-contiguous user defined types
with MPI_GATHER. After a few thousand iterations, the job
will end with ERROR: 0032-171 Communication subsystem error:
Memory is exhausted. The fix cleans up memory
allocations that are no longer in use.
PROBLEM CONCLUSION:
The fix is effective: unused memory is now freed.
------
APAR: IY29853 COMPID: 5765D5100 REL: 320
ABSTRACT: CHANGES FOR AIX51 HEADERS/LIBS
PROBLEM DESCRIPTION:
changes for aix51 headers/libs
PROBLEM SUMMARY:
Needed to define local storage for errno.
PROBLEM CONCLUSION:
Defined local storage for errno.
------
APAR: IY29924 COMPID: 5765D5100 REL: 320
ABSTRACT: RAS: IMPROVE CSS.SNAP DATA COLLECTION
PROBLEM DESCRIPTION:
The css.snap command needs to improve the data collected on some
'soft' snaps.
PROBLEM SUMMARY:
A 'soft' css.snap command needs to collect a switch adapter
microcode dump under certain conditions.
PROBLEM CONCLUSION:
The css.snap script has been changed to improve RAS.
The changes allow for a soft css.snap to collect a switch
adapter microcode dump in certain cases.
------
APAR: IY30011 COMPID: 5765D5100 REL: 320
ABSTRACT: UNINITIALIZED VARIABLE IN COPY MACRO
PROBLEM DESCRIPTION:
uninitialized variable in copy macro
PROBLEM SUMMARY:
A variable is used without being initialized. This may
cause unpredictable problems such as data corruption.
PROBLEM CONCLUSION:
Initialize the variable before using it.
------
APAR: IY30028 COMPID: 5765D5100 REL: 320
ABSTRACT: WITH RRA ON, THE .KLOGIN ON THE NODE SHOULD ONLY HAVE THE
PROBLEM DESCRIPTION:
The root.admin or K4 admin id is always added to
/.klogin, but it should only be added if RRA is not on, or if
RRA is on and the node is the CWS.
The /.klogin file on the nodes should contain no admin entry if RRA is on.
LOCAL FIX:
Workaround: change the updauthfiles script at line 880
from
if (defined($k4_admin)) { print KLOGIN_F "$k4_admin\n"; }
to
if ((defined($k4_admin)) &&
    (($local_node_number == 0) || ($restrict_root_rcmd ne "true"))) {
    print KLOGIN_F "$k4_admin\n";
}
PROBLEM SUMMARY:
***********************************************************
* USERS AFFECTED: Users with ssp.basic 3.2.0.15 *
* or greater, installed on a node with *
* the restrict_root_rcmd attribute of *
* the SP_Restricted class set to true, *
* that are using Kerberos 4. *
* *
***********************************************************
* PROBLEM DESCRIPTION: *
* *
* Running /usr/lpp/ssp/bin/updauthfiles on a node when *
* the restrict_root_rcmd attribute of the SP_Restricted *
* class is true and Kerberos 4 is being used as the *
* authentication method for a partition, an entry is *
* being made in .klogin for root.admin. *
* *
***********************************************************
* RECOMMENDATION: *
* *
* Install APAR IY30028, currently targeted for *
* ssp.basic 3.2.0.20 on PTF Set 20, when available. *
* *
* Until APAR IY30028 is available, after running *
* updauthfiles edit .klogin to remove the root.admin *
* entry if the restrict_root_rcmd attribute of the *
* SP_Restricted class is true. *
* *
***********************************************************
------
APAR: IY30039 COMPID: 5765D5100 REL: 320
ABSTRACT: CLEANUP.LOGS.WS FAILS AT PSSP-3.2 PTF18
PROBLEM DESCRIPTION:
The last line of cleanup.logs.ws is a tilde (~). This causes
error: /root: 0403-006 Execute permission denied.
LOCAL FIX:
Remove the line with the tilde.
PROBLEM SUMMARY:
An extra line was inadvertently added to the end of
cleanup.logs.ws. The extra line contained one character -
the tilde (~). This would cause an execution error like:
/usr/lpp/ssp/bin/cleanup.logs.ws 240 : /:
0403-006 Execute permission denied.
The command still works properly.
The error message can be ignored.
PROBLEM CONCLUSION:
The extra line has been removed.
TEMPORARY FIX:
Edit /usr/lpp/ssp/bin/cleanup.logs.ws and remove the last
line.
------
APAR: IY30050 COMPID: 5765D5100 REL: 320
ABSTRACT: SYSLOGD NOT RESTARTED BY CLEANUP.LOGS.NODES
PROBLEM DESCRIPTION:
The syslogm routine in logmgt.cmds, called by cleanup.logs.nodes
(via psyslclr) stops syslogd after trimming a log. If trimming
another log results in an error, the routine exits without
restarting syslogd.
PROBLEM SUMMARY:
When psyslclr is invoked to trim multiple logs and it
successfully trims the first log, but does not have enough
space to trim subsequent logs, syslogd is stopped
but not restarted.
PROBLEM CONCLUSION:
Modified code in logmgt.cmds so that if syslogd is stopped,
it is always restarted.
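The "always restarted" guarantee maps naturally onto a try/finally pattern. The sketch below is an illustration only, not the logmgt.cmds code; trim, stop_syslogd and start_syslogd are hypothetical callables supplied by the caller.

```python
def trim_logs(logs, trim, stop_syslogd, start_syslogd):
    """Trim each log with syslogd stopped, and restart syslogd no matter what.

    If trimming any log raises (for example because there is not enough
    space), the finally clause still restarts syslogd before the error
    propagates to the caller.
    """
    stop_syslogd()
    try:
        for log in logs:
            trim(log)
    finally:
        start_syslogd()
```

The error from the failed trim is still reported to the caller; only the side effect of leaving syslogd stopped is avoided.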
------
APAR: IY30184 COMPID: 5765D5100 REL: 320
ABSTRACT: IP_RESET(IP_INIT) ERRORS ON SP SWITCH
PROBLEM DESCRIPTION:
ip_reset(IP_INIT) errors on sp switch
PROBLEM SUMMARY:
When a new set of switch routes must be
downloaded to a node on the SP switch, it's possible for an
ip_reset(IP_INIT) error to occur. This error can only be
addressed by rebooting the affected node.
PROBLEM CONCLUSION:
The SP Switch IP driver and microcode have
been changed to prevent ip_reset (IP_INIT) errors from
occurring.
------
APAR: IY30196 COMPID: 5765B9500 REL: 140
ABSTRACT: GPFS MINOR NUMBERS NOT SYNCHRONIZED BETWEEN HACMP NODES CAN
PROBLEM DESCRIPTION:
GPFS is not picky about the minor numbers it assigns to its
filesystem entries in /dev. Basically it just starts at 100 and
increments until it finds a free number.
The problem that occurs on hacmp clusters when clients are NFS
mounting the GPFS filesystems, is that NFS receives a filehandle
based, in part, on the minor number of the filesystem.
If different clients are accessing the same filesystem from two
different gpfs nodes using differing device minor numbers (and
thus different filehandles), when a failover occurs, the node
now handling all the clients will not recognize the other node's
client requests.
LOCAL FIX:
Manually synchronize the device minor numbers when the file
systems are created, and monitor them periodically in case one
gets deleted (which will result in gpfs recreating it in the
original manner).
PROBLEM SUMMARY:
Fixed /dev minor numbers are needed for NFS
failover of gpfs server nodes.
PROBLEM CONCLUSION:
Start assigning permanent minor numbers
to all new file systems. The minor numbers will be in the
range 150-maxminornumber (65535 or 255).
TEMPORARY FIX:
Manually synchronize the device minor numbers
when the file-systems are created, and monitor them
periodically in case one gets deleted (which will result in
gpfs recreating it in the original manner).
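The periodic monitoring suggested here could be as simple as comparing per-node tables of device minor numbers. The sketch below assumes the tables have already been collected by some means (for example from a listing of /dev on each node); the function name and the dictionary layout are hypothetical.

```python
def minor_mismatches(node_a, node_b):
    """Return devices whose minor numbers differ between two nodes.

    node_a and node_b map a device name to the minor number that node
    assigned to it. Devices present on only one node are not reported.
    """
    shared = node_a.keys() & node_b.keys()
    return {
        dev: (node_a[dev], node_b[dev])
        for dev in sorted(shared)
        if node_a[dev] != node_b[dev]
    }
```

An empty result means the two nodes agree on every shared device, which is the condition NFS clients depend on after a failover.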
------
APAR: IY30224 COMPID: 5765B9501 REL: 320
ABSTRACT: MMCRLV FAILS YET RETURNS ERROR CODE OF 0
PROBLEM DESCRIPTION:
mmcrlv fails yet returns error code of 0
PROBLEM SUMMARY:
mmcrlv and mmcrvsd updated to ensure a nonzero return code on a
failure related to an hdisk already being part of a VG.
PROBLEM CONCLUSION:
mmcrlv and mmcrvsd: fix bug in which conditional call to
unlockSDR() clobbered rc
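The general shape of this bug, a cleanup call overwriting an earlier failure code, can be sketched as follows. This is an illustration of the pattern only, not the mmcrlv script; do_create and unlock_sdr are hypothetical callables that return 0 on success.

```python
def create_volume(do_create, unlock_sdr, holds_lock):
    """Run the create step, release the lock if held, and keep the first failure.

    The unlock result is stored in its own variable so that a successful
    unlock cannot overwrite a nonzero rc from the create step.
    """
    rc = do_create()
    if holds_lock:
        unlock_rc = unlock_sdr()
        if rc == 0:
            # Only report the unlock result when the create step succeeded.
            rc = unlock_rc
    return rc
```

A failed create followed by a successful unlock therefore still returns the create step's nonzero code.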
------
APAR: IY30247 COMPID: 5765D5100 REL: 320
ABSTRACT: KFSERVER_TIMEOUT DEFAULT SHOULD BE 200
PROBLEM DESCRIPTION:
kfserver_timeout default should be 200
PROBLEM SUMMARY:
The value of the kfserver_timeout attribute in the SP class
is currently set to 600. There is no longer a reason for
the value to be this high. It should be lowered to 200.
PROBLEM CONCLUSION:
The value of the kfserver_timeout attribute in the SP class
has been lowered to 200.
------
APAR: IY30383 COMPID: 5765B9501 REL: 330
ABSTRACT: MEMORY LEAK WHILE USING DMAPI
PROBLEM DESCRIPTION:
Memory leak while using DMAPI.
PROBLEM SUMMARY:
Fixed memory leak in DMAPI.
PROBLEM CONCLUSION:
sfsdmgetdirattrs: not freeing inode
buffer.
------
APAR: IY30392 COMPID: 5765D5100 REL: 320
ABSTRACT: UPDSUPPWD SHOULD ONLY UPDATE THE SUPMAN PASSWORD IF CHANGED
PROBLEM DESCRIPTION:
updsuppwd should only update the supman password if changed
PROBLEM SUMMARY:
The updsuppwd routine will compare the checksum files of the
current password and the previous password. If the checksum
files match then the password will not be transferred over
the s1term and updated on the node.
PROBLEM CONCLUSION:
The updsuppwd routine must be changed to not request the
password for the supman id over the s1term if the password
has not changed.
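The checksum comparison described above can be sketched like this. It is a minimal illustration, assuming the two checksum files already exist on disk in some form; the function name and file paths are hypothetical and the real updsuppwd logic may differ.

```python
from pathlib import Path


def needs_transfer(current_sum: Path, previous_sum: Path) -> bool:
    """True if the password should be sent again.

    The password counts as changed when there is no previous checksum
    file, or when the two checksum files differ byte for byte.
    """
    if not previous_sum.exists():
        return True
    return current_sum.read_bytes() != previous_sum.read_bytes()
```

Only when needs_transfer returns True would the password be requested over the s1term.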
------
APAR: IY30436 COMPID: 5765B9501 REL: 320
ABSTRACT: CATCH RUNAWAY QUOTA INDOUBT VALUES
PROBLEM DESCRIPTION:
catch runaway quota indoubt values
PROBLEM SUMMARY:
Debug code added to capture runaway quota condition.
PROBLEM CONCLUSION:
Add a trigger that asserts on "run-away" inDoubt values in
update() routine, so that the stripe group manager gets log
assert and trace data can be collected.
------
APAR: IY30595 COMPID: 5765B9501 REL: 320
ABSTRACT: MMDELDISK -C STOPS ON EMEDIA ERROR
PROBLEM DESCRIPTION:
mmdeldisk -c stops on emedia error
PROBLEM SUMMARY:
Fixed mmdeldisk stopping on EMEDIA error
PROBLEM CONCLUSION:
Check for both EIO and EMEDIA errors on reads only on
copyReplicas when deciding to 'break' disk addresses
pointing to bad stripes.
------
APAR: IY30596 COMPID: 5765B9501 REL: 330
ABSTRACT: MMDELDISK -C STOPS ON EMEDIA ERROR
PROBLEM DESCRIPTION:
mmdeldisk -c stops on emedia error
PROBLEM SUMMARY:
Fixed mmdeldisk stopping on EMEDIA error
PROBLEM CONCLUSION:
Check for both EIO and EMEDIA errors on reads only on
copyReplicas when deciding to 'break' disk addresses
pointing to bad stripes.
------
APAR: IY30597 COMPID: 5765B9501 REL: 320
ABSTRACT: MEMORY LEAK WHILE USING DMAPI
PROBLEM DESCRIPTION:
Memory leak while using DMAPI.
PROBLEM SUMMARY:
Fixed memory leak in DMAPI.
PROBLEM CONCLUSION:
sfsdmgetdirattrs: not freeing inode
buffer.
------
APAR: IY30600 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT --IBDP1->INDDIRTY && IDP2->INDDIRTY, METADATA.C, LINE 98
PROBLEM DESCRIPTION:
assert --ibdp1->inddirty && ibdp2->inddirty, metadata.c line 98
PROBLEM SUMMARY:
Fixed Assert condition: ibdP1->indDirty && ibdP2->indDirty
PROBLEM CONCLUSION:
In doubleUpdateDiskAddr, don't assert that the indirect
blocks are dirty. The update might have already been done by
another node which failed after logging the indirect block
changes. Also, don't deallocate the old addresses unless
they changed since the deallocation might have also happened
before the node crashed.
------
APAR: IY30603 COMPID: 5765B9501 REL: 320
ABSTRACT: PANIC--FETCH-VFS-KX.C
PROBLEM DESCRIPTION:
panic--fetch-vfs-kx.c
PROBLEM SUMMARY:
Fixed panic condition in fetch-vfs-kx.C::bdP->whichBufList
!= w
PROBLEM CONCLUSION:
Prefetch list mutex does not need to be dropped across the
call to cacheObjRele, since the hold count cannot go to
zero.
------
APAR: IY30604 COMPID: 5765B9501 REL: 330
ABSTRACT: PANIC--FETCH-VFS-KX.C
PROBLEM DESCRIPTION:
panic--fetch-vfs-kx.c
PROBLEM SUMMARY:
Fixed panic condition in fetch-vfs-kx.C::bdP->whichBufList
!= w
PROBLEM CONCLUSION:
Prefetch list mutex does not need to be dropped across the
call to cacheObjRele, since the hold count cannot go to
zero.
------
APAR: IY30610 COMPID: 5765B9501 REL: 320
ABSTRACT: MMCHECKQUOTA PRODUCES NEGATIVE NUMBERS
PROBLEM DESCRIPTION:
mmcheckquota sometimes produces negative numbers when GPFS is
under heavy load.
PROBLEM SUMMARY:
mmcheckquota sometimes produces negative numbers for disk
usage.
PROBLEM CONCLUSION:
Do not update server's shadow entries at ComputeShare and
Relinquish routines since the quota usage and quota share
accounting in this case is done through regular quota
entries.
------
APAR: IY30651 COMPID: 5765D5100 REL: 320
ABSTRACT: PMAN ARRAY LIMIT MEANS THAT WHEN AN EVENT HAPPENS, A MESSAGE MAY
PROBLEM DESCRIPTION:
Pman internal array default of 16 adapters per node may not be
enough and can overwrite the pman definitions, causing the
nodes not to see any pman definitions!
LOCAL FIX:
pmand uses an internal array to hold the SDR Adapter information it reads.
This array is hard coded to 16 members (for each node).
If you have more than 16, it writes beyond the end of the
array, stepping on the PMAN_Subscription variable.
The array size has been set to 32 in a new version of pmand.
Until this new pmand is used, try to reduce the amount of SDR information.
PROBLEM SUMMARY:
The code uses the PMAN_Subscription variable to remember if
the SDR file is the new PMAN_Subscription file or the old
pmandConfig file.
Because the variable got stepped on when the array
overflowed, the code was incorrectly looking for a
pmandConfig file. The result is it does not find
any events, because it is looking in the wrong
place for them.
PROBLEM CONCLUSION:
Increased the size of the array (number of adapters per
node) from 16 to 64. This will prevent the array from
being overrun and the PMAN_Subscription variable from
getting stepped on.
------
APAR: IY30692 COMPID: 5765D5100 REL: 320
ABSTRACT: DUPLICATE CALLS TO FREE() CAUSING CORE DUMP IN SDR
PROBLEM DESCRIPTION:
free() is being called twice on the same memory block,
causing sdrd to core dump.
PROBLEM SUMMARY:
A duplicate call to free() was causing sdrd to core dump.
PROBLEM CONCLUSION:
The duplicate call has been removed.
------
APAR: IY30693 COMPID: 5765D5100 REL: 320
ABSTRACT: CSS.SNAP.LOG FILE CAN BE OVERWRITTEN
PROBLEM DESCRIPTION:
css.snap.log file can be overwritten
PROBLEM SUMMARY:
If the contents of the css log directories in
/var/adm/SPlogs/css occupy more than 30% of /var, the
css.snap utility will try to free space by deleting old
css.snap files. If there are no files with names ending
in "....css.snap.tar.Z", the css.snap.log file will be
overwritten.
PROBLEM CONCLUSION:
The output of the "ls" command to list the css.snap tar
files is appended to the end of the css.snap.log file.
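The difference between overwriting and appending comes down to how the log is opened. The sketch below shows the append behaviour described in the conclusion; the function name and the idea of passing the listing as a string are assumptions for illustration, and the actual css.snap script is not reproduced here.

```python
def append_listing(log_path, listing):
    """Add the tar-file listing to the end of the log.

    Mode "a" appends, so an empty listing leaves the existing log
    contents unchanged. Mode "w" would truncate the file first.
    """
    with open(log_path, "a") as log:
        log.write(listing)
```

When no matching tar files exist the listing is empty, and with append mode the earlier log entries survive.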
------
APAR: IY31117 COMPID: 5765B9501 REL: 330
ABSTRACT: PROBLEMS MOUNTING GPFS FS AFTER DELETING DISKS. DISK DESCRIPTOR
PROBLEM DESCRIPTION:
Problems mounting gpfs fs after deleting disks. The error
6027-711 was received which indicated that the disk or fs
does not exist. It mentioned the deleted disks. The mmsdrfs2
file in the SDR and /var/mmfs/gen were updated and did not show
the disks. The problem is that the disk descriptor areas on
some vsd's are not updated. By chance, the ones that are not
updated are the first ones gpfs uses in attempting to mount the
fs, causing the failure.
PROBLEM SUMMARY:
After the mmdeldisk command, some file systems could not be
remounted due to old replica data.
PROBLEM CONCLUSION:
When migrating the stripe group descriptor to a new replica
set, update the copy of the descriptor on all other disks in
the stripe group as well. This is necessary to prevent
future attempts to read from disks in the old replica set in
case these disks have since been deleted from the stripe
group.
------
APAR: IY31130 COMPID: 5765B9501 REL: 320
ABSTRACT: PROBLEMS MOUNTING GPFS FS AFTER DELETING DISKS. DISK DESCRIPTOR
PROBLEM DESCRIPTION:
Problems mounting gpfs fs after deleting disks. The error
6027-711 was received which indicated that the disk or fs
does not exist. It mentioned the deleted disks. The mmsdrfs2
file in the SDR and /var/mmfs/gen were updated and did not show
the disks. The problem is that the disk descriptor areas on
some vsd's are not updated. By chance, the ones that are not
updated are the first ones gpfs uses in attempting to mount the
fs, causing the failure.
PROBLEM SUMMARY:
After the mmdeldisk command, some file systems
could not be remounted due to old replica data.
PROBLEM CONCLUSION:
When migrating the stripe group descriptor
to a new replica set, update the copy of the descriptor on all
other disks in the stripe group as well. This is necessary
to prevent future attempts to read from disks in the old
replica set in case these disks have since been deleted from
the stripe group.
------
APAR: IY31150 COMPID: 5765B9501 REL: 330
ABSTRACT: MMFS: FCNTL LOCK LOOPING ON A NODE
PROBLEM DESCRIPTION:
mmfs hanging in fcntl lock on one node while trying to revoke
from another node that had already relinquished that token, but
had forgotten to tell the token manager.
PROBLEM SUMMARY:
fixed multi-node fcntl token locking condition.
PROBLEM CONCLUSION:
always relinquish down to nl in revoke
handler when byte range tokens are unknown.
------
APAR: IY31172 COMPID: 5765D6100 REL: 220
ABSTRACT: TIMING EXPOSURE IN LOADL_NEGOTIATOR CAUSES DEADLOCK
PROBLEM DESCRIPTION:
Timing exposures, between a job completion and a Negotiator
Cycle can cause a deadlock condition in the Negotiator.
PROBLEM SUMMARY:
A timing exposure in the LoadLeveler Negotiator could make
it think that there were jobs running, on a machine, when
they had already finished. That wrong assumption could
cause the Negotiator to try to get the same lock, for
write, a second time. The Negotiator would hang, after
that.
PROBLEM CONCLUSION:
The LoadLeveler Negotiator added a second verification
that there were jobs running, on a machine, before trying
to use certain data about the jobs on that machine.
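Why a second request for the same lock hangs can be shown with a plain non-reentrant lock. This sketch only illustrates the failure mode, not the fix (which was an extra check that jobs were really running); Python's threading.Lock stands in for the Negotiator's write lock, and a timeout is used so the example returns instead of hanging.

```python
import threading


def second_acquire_succeeds(lock, timeout=0.1):
    """Take the lock, then try to take it again from the same thread.

    Returns True if the second acquire succeeded within the timeout.
    With a non-reentrant lock it cannot succeed, because the only
    thread that could release the lock is the one waiting for it.
    """
    lock.acquire()
    try:
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()
        return got_it
    finally:
        lock.release()
```

Without the timeout, the second acquire on a threading.Lock would block forever, which matches the hang described above; a reentrant lock (threading.RLock) would let the same thread through.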
------
APAR: IY31173 COMPID: 5765D6100 REL: 220
ABSTRACT: A CANCELED INTERACTIVE JOB MIGHT CAUSE THE NEGOTIATOR TO QUIT
PROBLEM DESCRIPTION:
If an Interactive job is Ctrl-C'd at the same time that the
Negotiator decides that it cannot schedule the job to run, the
LoadL_negotiator daemon may fail to handle the job correctly
and will intentionally terminate itself.
PROBLEM SUMMARY:
An interactive poe job can cause the LoadLeveler Negotiator
to get confused, and decide to terminate itself, if the
interactive job is terminated at just the right time during
the negotiation cycle.
PROBLEM CONCLUSION:
The LoadLeveler Negotiator was modified to keep better
track of interactive jobs, during the negotiation cycle.
------
APAR: IY31238 COMPID: 5765D5100 REL: 320
ABSTRACT: SETUP_SERVER SHOULD IGNORE PPP CONNECTIONS
PROBLEM DESCRIPTION:
If a pp0 adapter is present, setup_server fails.
setup_server : host: 0827-803 Cannot find address 0.0.0.0.
setup_CWS: 0016-338 Kerberos setup was bypassed for network
interfaces that could not be resolved
setup_server ends with rc = 0, but the node you are installing
does not receive a Kerberos ticket.
Circumventing this problem by detaching pp0 means that
svcagent cannot be active and running during the setup_server
action.
LOCAL FIX:
A good workaround is to add an entry to /etc/hosts like:
zero 0.0.0.0 # dummy ppp entry to prevent setup_server problems
PROBLEM SUMMARY:
When the Point-to-Point Protocol (PPP) is being used on
a Control Workstation, setup_CWS will terminate processing
with the messages:
host: 0827-803 Cannot find address 0.0.0.0.
setup_CWS: 0016-338 Kerberos setup was bypassed for
network interfaces that could not be resolved.
Since the Point-to-Point Protocol is being displayed in
the netstat -in data, setup_CWS tries to determine the
IP addresses for these interfaces and fails. The data
from the Point-to-Point Protocol should be ignored
by setup_CWS.
PROBLEM CONCLUSION:
setup_CWS has been modified to skip lines of data from
netstat -in which refer to the Point-to-Point Protocol.
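Skipping the PPP lines amounts to filtering on the interface name in the first column. The sketch below is an illustration, not the setup_CWS code: it assumes PPP interfaces are named with a "pp" prefix, as with the pp0 adapter mentioned in the problem description, and the sample input in the note is made up.

```python
def non_ppp_lines(netstat_lines, ppp_prefix="pp"):
    """Drop lines whose first field is a PPP interface name.

    netstat_lines is the output of a netstat -in style listing, one
    string per line. Header and blank lines are kept unchanged.
    """
    kept = []
    for line in netstat_lines:
        fields = line.split()
        if fields and fields[0].startswith(ppp_prefix):
            continue  # PPP interface: no address to resolve
        kept.append(line)
    return kept
```

Given lines for en0, pp0 and lo0, only the en0 and lo0 lines (and any header) are passed on for address resolution.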
------
APAR: IY31239 COMPID: 5765D5100 REL: 320
ABSTRACT: SP SWITCH 2 WINDOW SUSPEND FAILURE AFTER LOADLEVELER HAS TRIED
PROBLEM DESCRIPTION:
On the SP Switch 2, a failure may occur suspending windows if
a job fails to respond to the suspend request that is issued
during switch recovery. This problem can happen if a job has
a SIGKILL pending (having been killed by LoadLeveler) but has
not yet fully processed the SIGKILL because a thread is in a
system call with signals blocked. When the windows fail to
suspend because of a non-responsive job, switch recovery will
fail on the affected node, and switch responds will be lost on
the affected switch plane.
PROBLEM SUMMARY:
Nodes can drop off the SP Switch 2 when the switch device
driver fails to suspend jobs that are running. The
adapter.log will show the following error:
QUERY SUSPEND WINDOW_COMPLETION ioctl failed
PROBLEM CONCLUSION:
The device driver for the SP Switch 2 has been changed
to allow suspend requests to be properly handled for
jobs that are starting or stopping.
------
APAR: IY31245 COMPID: 5765D5100 REL: 320
ABSTRACT: RC.SP SETS THE WRONG BOOTLIST IF TOTAL BOOTDISKS NOT
PROBLEM DESCRIPTION:
rc.sp sets the wrong bootlist if total bootdisks
not equivalent to total install disks
PROBLEM SUMMARY:
On the reboot of a node, the bootlist was being reset to
include all of the physical volumes listed for the selected
volume group of the node. Even the physical volumes that
did not contain boot logical volumes were included in
the bootlist. If there was a high number of physical
volumes it could cause a subsequent reboot to fail.
PROBLEM CONCLUSION:
spboot, which is called by /etc/rc.sp, was modified to only
set the bootlist to physical volumes that contain boot
logical volumes.
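The corrected selection is a simple filter over the volume group's physical volumes. The sketch below is hypothetical: it assumes the caller already knows which physical volumes hold a boot logical volume, and it does not show how spboot determines that.

```python
def bootlist_for(volume_group_pvs, pvs_with_boot_lv):
    """Keep only the physical volumes that contain a boot logical volume.

    The order of volume_group_pvs is preserved so the resulting boot
    list follows the volume group's own ordering.
    """
    return [pv for pv in volume_group_pvs if pv in pvs_with_boot_lv]
```

For a volume group with many data-only disks, the resulting list stays short, which avoids the reboot failure described above.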
------
APAR: IY31249 COMPID: 5765B9501 REL: 320
ABSTRACT: MMFS: FCNTL LOCK LOOPING ON A NODE
PROBLEM DESCRIPTION:
mmfs hanging in fcntl lock on one node while trying to revoke
from another node that had already relinquished that token, but
had forgotten to tell the token manager.
PROBLEM SUMMARY:
fixed multi-node fcntl token locking condition.
PROBLEM CONCLUSION:
always relinquish down to nl in revoke
handler when byte range tokens are unknown.
------
APAR: IY31253 COMPID: 5765B9501 REL: 330
ABSTRACT: PROBLEMS WITH NOSUID FLAG IN GPFS FILESYSTEMS
PROBLEM DESCRIPTION:
problems with nosuid flag in gpfs filesystems
PROBLEM SUMMARY:
Security Problem.
PROBLEM CONCLUSION:
Security Problem Resolved.
------
APAR: IY31357 COMPID: 5697E3000 REL: 220
ABSTRACT: WNN6 HUNG-UP BY CTRL + Y
PROBLEM DESCRIPTION:
Wnn6 hangs when Ctrl + Y is pressed.
LOCAL FIX:
Update xwnmo.
------
APAR: IY31372 COMPID: 5765B9501 REL: 330
ABSTRACT: FSCK DOES NOT FIX CORRUPTED ALLOC MAP CHAINS
PROBLEM DESCRIPTION:
mmfsck does not fix FSSTRUCT errors of type 114 (corrupted
allocation maps).
PROBLEM SUMMARY:
Fixed mmfsck to repair FSSTRUCT errors of type 114
(corrupted allocation maps)
PROBLEM CONCLUSION:
Fix relinkAllChunks which computed an incorrect allocation
map magic number for a disk. Provide new functionality to
verify allocation map chunk list head bitmap chain.
Recognize chunk list head loops and unlinked chunks.
------
APAR: IY31376 COMPID: 5765D5100 REL: 320
ABSTRACT: DIAGS FAILING INVALIDLY
PROBLEM DESCRIPTION:
When cfgmgr runs diags against the SP-Switch2 adapter, the diags
routine may fail or not complete. This leaves the status of the
adapter in the " diag_fail " state. Later rc.switch will fail
and the adapter will not join the switch.
PROBLEM SUMMARY:
There is a potential for adapter reset to be run
concurrently during diagnostics.
PROBLEM CONCLUSION:
Added locking to device driver calls to adapter reset to
prevent simultaneous resets during diagnostics.
------
APAR: IY31379 COMPID: 5765B9500 REL: 130
ABSTRACT: GPFS:6027-848 CONFIG MANAGER 35 FAILED UPDATING NEW NODE STATUS
PROBLEM DESCRIPTION:
gpfs:6027-848 config manager 35 failed updating new node status
PROBLEM SUMMARY:
fixed sysctl locking condition with
mmconfig.
PROBLEM CONCLUSION:
in the sp environment, do not use the
output of hostname as a lock identifier. If hostname on a
node is set to be the same as the switch adapter name, locks
cannot be reclaimed (sysctl cannot talk to the node).
------
APAR: IY31380 COMPID: 5765B9501 REL: 330
ABSTRACT: GPFS:6027-848 CONFIG MANAGER 35 FAILED UPDATING NEW NODE STATUS
PROBLEM DESCRIPTION:
gpfs:6027-848 config manager 35 failed updating new node status
PROBLEM SUMMARY:
Fixed sysctl locking condition with mmconfig.
PROBLEM CONCLUSION:
In the sp environment, do not use the output of hostname as
a lock identifier. If hostname on a node is set to be the
same as the switch adapter name, locks cannot be reclaimed
(sysctl cannot talk to the node).
------
APAR: IY31382 COMPID: 5765B9501 REL: 330
ABSTRACT: FCNTL LOCKS NOT CLEANED UP ON MMFS DEATH
PROBLEM DESCRIPTION:
fcntl locks not cleaned up on mmfs death
PROBLEM SUMMARY:
Fixed GPFS recovery condition
PROBLEM CONCLUSION:
kxRecLockReset should process all filesystems even if they
have been marked unmounted previously during shutdown.
------
APAR: IY31386 COMPID: 5765B9501 REL: 320
ABSTRACT: TSCTL MAXFCNTLRANGESPERFILE DOES NOT CHANGE VALUE
PROBLEM DESCRIPTION:
tsctl maxfcntlrangesperfile does not change value
PROBLEM SUMMARY:
Fixed tsctl maxFcntlRangesPerFile does not change value
PROBLEM CONCLUSION:
Fix incorrect setting of maxMBpS when tsctl
maxFcntlRangesPerFile specified.
------
APAR: IY31387 COMPID: 5765B9501 REL: 330
ABSTRACT: TSCTL MAXFCNTLRANGESPERFILE DOES NOT CHANGE VALUE
PROBLEM DESCRIPTION:
tsctl maxfcntlrangesperfile does not change value
PROBLEM SUMMARY:
Fixed tsctl maxFcntlRangesPerFile does not change value
PROBLEM CONCLUSION:
Fix incorrect setting of maxMBpS when tsctl
maxFcntlRangesPerFile specified.
------
APAR: IY31388 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT !"NEW_DELETE_DEBUG", NEWDEBUG.C, LINE 176
PROBLEM DESCRIPTION:
assert !"new_delete_debug", newdebug.c, line 176
PROBLEM SUMMARY:
Fixed failure in mmrestripefs
PROBLEM CONCLUSION:
Realloc code needs to verify the configuration is correct
before updating the disk effort counters.
------
APAR: IY31572 COMPID: 5765B9501 REL: 320
ABSTRACT: MMREPQUOTA SHOWS NEGATIVE USAGE AFTER MMRESTRIPEFS
PROBLEM DESCRIPTION:
mmrepquota shows negative usage after mmrestripefs
PROBLEM SUMMARY:
Fixed mmrestripefs causing mmrepquota to show incorrect
usage.
PROBLEM CONCLUSION:
During restripe and defrag, when deallocating unused blocks
do not decrement quota usage count if these blocks were not
allocated with allocBlock.
------
APAR: IY31576 COMPID: 5765B9501 REL: 330
ABSTRACT: MMREPQUOTA SHOWS NEGATIVE USAGE AFTER MMRESTRIPEFS
PROBLEM DESCRIPTION:
mmrepquota shows negative usage after mmrestripefs
PROBLEM SUMMARY:
Fixed mmrestripefs causing mmrepquota to show incorrect
usage.
PROBLEM CONCLUSION:
During restripe and defrag, when deallocating unused blocks
do not decrement quota usage count if these blocks were not
allocated with allocBlock.
------
APAR: IY31579 COMPID: 5765B9501 REL: 330
ABSTRACT: NODE PANICKED BY RUNNING GPFS_STAT()
PROBLEM DESCRIPTION:
node panicked by running gpfs_stat()
PROBLEM SUMMARY:
kpathname being traced after it is freed in kernel.
PROBLEM CONCLUSION:
Fixed trace path which could cause gpfs_stat() to panic
node.
------
APAR: IY31637 COMPID: 5724C3505 REL: 310
ABSTRACT: UNEXPECTED BEHAVIOUR AFTER RETURNING FROM INVOKEAPPLICATION
PROBLEM DESCRIPTION:
DTBE Applications behave erratically after a call to
invokeApplication returns. The cause is that the internal
representation of the call status may not be accurate. This can
result in unexpected errors.
PROBLEM SUMMARY:
Unexpected behaviour of app after return
from invoke application call
PROBLEM CONCLUSION:
Code modified to handle the call status correctly.
------
APAR: IY31768 COMPID: 5765D5100 REL: 320
ABSTRACT: SDRCHANGEATTRVALUES KFSERVER_TIMEOUT FAILS ON REJECT
PROBLEM DESCRIPTION:
sdrchangeattrvalues kfserver_timeout fails on reject
PROBLEM SUMMARY:
Enhancements were required to packaging files for ssp.basic
for the setting of the kfserver_timeout attribute in the
SP class.
PROBLEM CONCLUSION:
Enhancements were made to packaging files for ssp.basic
for the setting of the kfserver_timeout attribute in the
SP class.
------
APAR: IY31802 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT AFTER METANODE RELINQUISH
PROBLEM DESCRIPTION:
assert after metanode relinquish
PROBLEM SUMMARY:
Fixed an Assert after metanode relinquish
PROBLEM CONCLUSION:
Test for turning off the newMnode flag was in the wrong
place
------
APAR: IY31993 COMPID: 5765B8100 REL: 220
ABSTRACT: 3270 SESSIONS DO NOT ALWAYS RECOVER WHEN HOST GOES DOWN
PROBLEM DESCRIPTION:
Sometimes if the Host goes down then the 3270 Sessions do
not always recover when the host comes back again.
This is more likely to be seen if some of the sessions are
on a host which stays up, but other sessions are on a host
which goes down.
PROBLEM SUMMARY:
Sometimes if the Host goes down then the 3270
Sessions do not always recover when the host comes back again.
This is more likely to be seen if some of the sessions are on a
host which stays up, but other sessions are on a host which
goes down.
PROBLEM CONCLUSION:
If scripts are running when DT is shut down,
then both CTRL3270 and EXEC3270 tried to deactivate sessions
using E32DACT and E32DACTA. This causes havoc with the TPS
library and can result in some of the sessions being broken in SNA
when DT is restarted. The fix was to streamline the shutdown so
that only E32DACT is used and only once.
------
APAR: IY32157 COMPID: 5765D5100 REL: 320
ABSTRACT: LATEST PSSP 3.2.0 FIXES AS OF JUNE 2002
PROBLEM DESCRIPTION:
This is the latest PSSP ptf as of June 2002.
Order this apar to get all of the ptfs as of June 2002.
PROBLEM SUMMARY:
This is a packaging apar for PSSP 3.2.0 fixes
as of June 2002
PROBLEM CONCLUSION:
This is a packaging apar for PSSP 3.2.0
fixes as of June 2002
------