From: AIX Service Mail Server (aixserv_at_austin.ibm.com)
Date: Tue Oct 08 2002 - 02:43:49 CDT
has requested a copy or has subscribed to the document named "New_AIXV4_Fixes".
If you would like to be removed from this mailing list, send e-mail to
aixserv_at_austin.ibm.com with a subject of "unsubscribe New_AIXV4_Fixes", or
send a note to owner-aixserv_at_austin.ibm.com with your request.
APAR: IY32602 COMPID: 5765E6900 REL: 310
ABSTRACT: LLQ AND LLSTATUS DID NOT RETURN A MESSAG
PROBLEM DESCRIPTION:
After installation of the PSSP software, ll commands died silently
with return code 1.
llq and llstatus did not return a message but llctl returns
message 2539-510....
LOCAL FIX:
you may need to reinitialize trusted services
PROBLEM SUMMARY:
When trusted services aren't configured,
the LoadLeveler llq and llstatus commands
return no data or error messages.
PROBLEM CONCLUSION:
LoadLeveler llq and llstatus now return
correct data even if trusted services
aren't configured.
------
APAR: IY32609 COMPID: 5765D5100 REL: 340
ABSTRACT: NODECOND_CHRP FAILS WITH AUTO/AUTO SETTINGS FOR
PROBLEM DESCRIPTION:
An install of a pseries 670/690 machine, with enet_rate and
duplex settings set to auto/auto, fails with the following error
in the nodecond log along with a led e1f4 node hang:
Nodecond Status: network type not selected
Nodecond Status: pSeries 670/690-released OF lock (network boot)
return code -1 from boot_network
The problem only seems to be isolated to the 670/690 machines.
LOCAL FIX:
Set the enet_rate and/or duplex setting to something other than
auto.
PROBLEM SUMMARY:
netbooting a node with a 10/100 Mbps Ethernet PCI Adapter II
as en0 will fail if the rate and duplex settings of the
adapter are set to auto in the SDR. nodecond_chrp needs to
be modified to handle this adapter.
PROBLEM CONCLUSION:
nodecond_chrp has been modified to be able to netboot a
node with a 10/100 Mbps Ethernet PCI Adapter II as en0
when the rate and duplex settings of the adapter are set
to auto in the SDR.
------
APAR: IY32786 COMPID: 5765D5100 REL: 340
ABSTRACT: XNTPD INOPERATIVE WHEN BROADCASTCLIENT AND SERVER TIMEMASTER
PROBLEM DESCRIPTION:
This system has two nodes which had a new install of PSSP 3.4.
A lssrc -a showed the XNTPD was inoperative.
The errpt -a shows an entry for failing module xntpd, which had
a SOFTWARE PROGRAM ERROR, Symptom code 256, Software error code
-9017 and error code of 0.
The root of the xntpd failure is that /etc/ntp.conf has
conflicting lines in it. The file originated in the spmig that
came with PSSP 3.4 and it already had a line of broadcastclient
in it. This system required that timemaster be specified and
that was done via smitty. That invoked spsitenv properly and
this process added the server timemaster line to ntp.conf
without removing the conflicting broadcastclient line.
The customer manually edited the broadcastclient line to
comment it out. Now xntpd is able to start and continue
successfully.
When setting a site environment variable to specify a server
for ntp, the broadcastclient line should always be removed.
LOCAL FIX:
Edit ntp.conf to comment out the broadcastclient line when
specifying a server for xntp.
PROBLEM SUMMARY:
PSSP began using AIX's default ntp.conf file, which has a
"broadcastclient" line. This line is incompatible with
server lines. When both are present, it can cause ntp to
terminate.
PROBLEM CONCLUSION:
The broadcastclient line will be commented out by the ntp
configuration routine when any server lines are added.
There will also be a one-time commenting at PTF
installation.
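The behaviour described in the conclusion can be sketched as follows. This is a minimal illustration in Python, not the PSSP configuration routine; the function name and the sample file contents are assumptions made for the example.

```python
def comment_out_broadcastclient(conf_text):
    """Comment out 'broadcastclient' lines when any 'server' line is present."""
    lines = conf_text.splitlines()
    # A server line is a line whose first word is exactly "server";
    # "#server ..." is already a comment and does not count.
    has_server = any(line.split()[:1] == ["server"] for line in lines)
    if not has_server:
        return conf_text
    result = []
    for line in lines:
        if line.split()[:1] == ["broadcastclient"]:
            result.append("#" + line)
        else:
            result.append(line)
    return "\n".join(result) + ("\n" if conf_text.endswith("\n") else "")
```

A file with only a broadcastclient line is returned unchanged; the line is commented out only when a server line is also present.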
------
APAR: IY33002 COMPID: 5765E6110 REL: 220
ABSTRACT: REQUIRED MAINTENANCE UPGRADE
PROBLEM DESCRIPTION:
required maintenance upgrade
------
APAR: IY33004 COMPID: 5765E6100 REL: 110
ABSTRACT: REQUIRED MAINTENANCE UPGRADE
PROBLEM DESCRIPTION:
required maintenance upgrade
------
APAR: IY33209 COMPID: 5765D5100 REL: 340
ABSTRACT: HMREINIT FAILS TO ADD ENTRY FOR ROOT.ADMIN IN HMACLS FILE, IF
PROBLEM DESCRIPTION:
If a CWS hostname starts with a number (e.g. 2cws) hmreinit
will fail to add the root.admin entry for a new frame to the
hmacls file, if the frame number matches the number at the
beginning of the hostname (frame 2 in this case).
This is caused by the following statement in hmreinit:
if -z `/bin/grep "ª *$frame_numberª 0-9 " $HMACLS`
LOCAL FIX:
Add the missing line manually to /spdata/sys1/spmon/hmacls file:
frame# root.admin vsm.
stopsrc -s hardmon
startsrc -s hardmon
PROBLEM SUMMARY:
If the hostname of the Control Workstation begins with
a number that matches a frame number, hmreinit fails to add
all the required entries to /spdata/sys1/spmon/hmacls for
that frame.
PROBLEM CONCLUSION:
hmreinit has been modified to handle the case where the
hostname of the Control Workstation begins with a number
that matches a frame number. hmreinit will now add
all the required entries to /spdata/sys1/spmon/hmacls for
that frame.
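The failure mode can be illustrated with a short Python sketch. It is not the hmreinit code: the sample hmacls lines and both patterns are assumptions, chosen so that a check for "frame number followed by a non-digit" is satisfied by a hostname line such as 2cws, while a check that requires whitespace after the frame number is not.

```python
import re

def loose_frame_has_entry(hmacls_lines, frame_number):
    """Assumed loose check: frame number followed by any non-digit."""
    pattern = re.compile(r"^\s*%d[^0-9]" % frame_number)
    return any(pattern.match(line) for line in hmacls_lines)

def frame_has_entry(hmacls_lines, frame_number):
    """Stricter check: the frame number must be a whole first field."""
    # Requiring whitespace after the number keeps a hostname such as
    # "2cws" from being mistaken for an entry for frame 2.
    pattern = re.compile(r"^\s*%d\s" % frame_number)
    return any(pattern.match(line) for line in hmacls_lines)
```

With the hypothetical lines "2cws root.admin a" and "1 root.admin vsm", the loose check reports that frame 2 already has an entry, so nothing would be added; the stricter check reports that it does not.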
------
APAR: IY33223 COMPID: 5765D5100 REL: 340
ABSTRACT: DURING SYSMAN_TEST MESSAGE 0037-014 IS ISSUED FOR PPP CONNECTION
PROBLEM DESCRIPTION:
While running SYSMAN_test the following is issued when the CWS
has a pp0 adapter (used by Service Agent).
SYSMAN_test: 0037-014 Control workstation IP addresses in SDR
do not match netstat output
# netstat -in
...
pp0* 1500 link#4
pp0* 1500 0 0.0.0.0 <==
...
For SYSMAN_test these messages can be ignored but the logic
should be changed to tolerate the PPP adapter.
See APAR IY31780 for similar symptoms.
PROBLEM SUMMARY:
When the Point-to-Point Protocol (PPP) is being used on
a Control Workstation, SYSMAN_test will issue the
following message(s):
SYSMAN_test: 0037-014 Control workstation IP addresses
in SDR do not match netstat output
Since the Point-to-Point Protocol is being displayed in
the netstat -in data, SYSMAN_test tries to match it
with data in the SDR and fails. The data from the
Point-to-Point Protocol should be ignored by SYSMAN_test.
PROBLEM CONCLUSION:
SYSMAN_test has been modified to skip lines of data from
netstat -in which refer to the Point-to-Point Protocol.
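A minimal sketch of the filtering idea, in Python rather than the language of SYSMAN_test. The function name, the sample column layout, and the assumption that PPP interfaces can be recognised by a name starting with "pp" are all illustrative.

```python
def non_ppp_lines(netstat_output):
    """Drop netstat -in lines whose interface name starts with 'pp'."""
    kept = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # The interface name is the first field and may carry a trailing '*'.
        if fields and fields[0].rstrip("*").startswith("pp"):
            continue
        kept.append(line)
    return kept
```

Only the remaining lines would then be compared against the addresses recorded in the SDR.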
------
APAR: IY33247 COMPID: 5765D5100 REL: 340
ABSTRACT: CSHUTDOWN NODE HANG ON CSCONTROL
PROBLEM DESCRIPTION:
cshutdown node hang on cscontrol
PROBLEM SUMMARY:
The cshutdown code was first obtaining DCE credentials. If
it received the DCE credentials it did not proceed to
obtaining the K4 credentials. When shutdown is run on the
K4 nodes, the shutdown receives an error due to the lack of
K4 credentials.
PROBLEM CONCLUSION:
The cshutdown code was changed to obtain both DCE
credentials and K4 credentials if DCE and (k4) compat is
configured on the SP.
------
APAR: IY33264 COMPID: 5765D5100 REL: 340
ABSTRACT: MICROCODE SHOULD TURN OFF TBIC PORT ON P750
PROBLEM DESCRIPTION:
Microcode should turn off TBIC port on P750
PROBLEM SUMMARY:
When the CEC is powered off the adapter continues to run
until it runs out of receive buffers. Since the CEC is
powered off one of the adapter DMAs fails and the microcode
takes an exception. Since host side recovery cannot run the
switch network backs up.
PROBLEM CONCLUSION:
In its exception handler routine the microcode turns off the
adapter switch port, causing switch error recovery to be
invoked, which in turn bit-buckets all packets destined for
this adapter.
------
APAR: IY33415 COMPID: 5765E6900 REL: 310
ABSTRACT: LOADL CANNOT REMOVE A RP JOB
PROBLEM DESCRIPTION:
One machine in the LL pool crashed, which left the two jobs
that had been running on the machine in the LL queue.
LL on the machine was back after reboot, but llstatus showed
that resources were in use, so no new jobs would start. An llcancel
put the jobs in RP, and the resources on the machine were not
freed. One job had been issued from this machine; this job
disappeared from the system after deleting the job_queue files
in spool/ and recycling LL on this machine. The second job, issued
from another machine, persists in the queue as RP, and its
resources remain blocked.
PROBLEM SUMMARY:
When LoadLeveler came back up after a crash,
the job previously in suspended state was gone
but llq still showed it as running.
Doing an llcancel could only set the job
state to RP without truly removing it.
PROBLEM CONCLUSION:
When LoadLeveler came back up after a crash,
the job previously in suspended state is now
able to run. And llcancel will be able to
kill the job.
------
APAR: IY33428 COMPID: 5765D5100 REL: 340
ABSTRACT: RVSDS SOMETIMES DONT COME UP ON REBOOT DUE TO UNRELIABLE PARSE
PROBLEM DESCRIPTION:
rvsd startup on reboot sometimes fails with a strange
error message from ha.vsd, pointing to a syntax error
in line 1079 of pssp3.4 ptfset10 level.
In this line the pid of srcmstr is computed via the
ps command output piped to grep.
This is unreliable, since it sometimes comes up with more
than one PID.
There is an open defect (82415) pokcmvc, which deals with
exactly the same problem for another release.
LOCAL FIX:
Either reboot again,
or fix that line to make sure that only the pid of srcmstr is
grepped.
PROBLEM SUMMARY:
A line in ha.vsd that does a grep on srcmstr to determine
if the rvsd daemon was started via srcmstr is not as
robust as it could be. It is possible for it to pick up
multiple process ids, which result in the rvsd daemon
being unable to start. In this case a Syntax error from
ha.vsd is written to the console log.
PROBLEM CONCLUSION:
ha.vsd has been modified to do a more efficient
check to determine the process id of srcmstr.
This check should prevent the syntax error from
ha.vsd which prevents the rvsd daemon from starting.
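The difference between a substring grep and an exact match on the command name can be sketched in Python. This is not the ha.vsd code; it assumes a process listing of the form "<pid> <command> [args...]", and the function name and sample listing are hypothetical.

```python
def srcmstr_pids(ps_output):
    """Return PIDs whose command name is exactly 'srcmstr'."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header line and anything without a numeric pid.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        # Compare the basename of the command, not a substring of the line,
        # so 'grep srcmstr' or 'vi srcmstr.log' does not match.
        if fields[1].rsplit("/", 1)[-1] == "srcmstr":
            pids.append(int(fields[0]))
    return pids
```

A caller can then treat any result that does not contain exactly one pid as an error, rather than passing several pids on to a later expression.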
------
APAR: IY33544 COMPID: 5765D5100 REL: 340
ABSTRACT: SDR_CONFIG SETS PSSP LEVEL TO PSSP 3.2 FOR REG LPARS
PROBLEM DESCRIPTION:
sdr_config sets PSSP level to pssp 3.2 for reg lpars
PROBLEM SUMMARY:
If a CWS has been migrated to PSSP 3.4 from an earlier
level, defining adapters for pSeries 670/690 nodes by
specifying the physical location codes may fail.
PROBLEM CONCLUSION:
The fix will allow you to define adapters by using physical
location codes on pSeries 670/690 nodes. A warning will
be issued by spadaptrs if the PSSP level or code version
specified for the nodes are not at 3.4. This may happen
if the CWS has been migrated from an earlier level to
3.4.
------
APAR: IY33550 COMPID: 5765D5100 REL: 340
ABSTRACT: S70D DAEMON DIES UNEXPECTED. HARDMON MUST BE STOPPED AND RESTART
PROBLEM DESCRIPTION:
SP attached server S80/S85. s70d dies unexpectedly. The following messages appear
in /var/adm/SPlogs/spmon/s70/s70d.3.log.xxx :
s70d 3 : 0026-500I s70d daemon started on device"/dev/tty7" (Frame 3) at Sat May 18 09:59:29 2002
s70d 3 : 0026-507I Entered main processing loop
SAMI Firmware Level (mm/dd/yy): 8/31/99
s70d 3 : 0026-522 ioctl() was unsuccessful: Resource temporarily
unavailable (11)
s70d 3 : 0026-502I s70d daemon ended (2) on device "/dev/tty7"
PROBLEM SUMMARY:
An ioctl failure is causing the s70d to terminate. In the
log file /var/adm/SPlogs/spmon/s70/s70d.x.log.yyy will be
the messages:
0026-522 ioctl() was unsuccessful:
0026-502I s70d daemon ended (x) on device "/dev/ttyx"
The s70d should be modified to not terminate if there
is an ioctl failure.
PROBLEM CONCLUSION:
The s70d has been modified to not issue message 0026-522
when a call to ioctl is unsuccessful and to not
terminate. The ioctl will either succeed on a subsequent
retry, or will cause another terminating error to occur.
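The retry behaviour can be sketched generically in Python. This is an illustration of retrying a call that fails with "Resource temporarily unavailable", not the s70d implementation; the helper name, the attempt limit, and the delay are assumptions.

```python
import errno
import time

def call_with_retry(operation, attempts=5, delay=0.0):
    """Call operation(); retry while it fails with EAGAIN."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            # "Resource temporarily unavailable" is the EAGAIN condition.
            # Any other error, or running out of attempts, goes to the caller.
            if exc.errno != errno.EAGAIN or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

A transient failure is absorbed by a later attempt, while a persistent or different error still surfaces to the caller.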
------
APAR: IY33616 COMPID: 5765D5100 REL: 340
ABSTRACT: INCORRECT LPAR PARTITION STATE OF 'INITIALIZING' IN NODE STATUS
PROBLEM DESCRIPTION:
incorrect lpar partition state of 'initializing' in node status
PROBLEM SUMMARY:
The partition state for LPARs on the Regatta machine
continues to display yellow with 'initializing' on
the Node Status page of the Node notebook even after reboot
is complete.
PROBLEM CONCLUSION:
The hmcd daemon was returning the Regatta states
'initializing' and 'running' incorrectly. It was returning
'initializing' when it should have been returning 'running'
and vice versa. The hmcd daemon passes this information
to the Hardware Monitor, which through Event Management,
passes the information to the Perspectives GUI. The
Hardware Monitor also passes this information to clients
such as the hmmon command line interface.
------
APAR: IY33670 COMPID: 5765D5100 REL: 340
ABSTRACT: TASK_ID WRONG AND PORT DISABLE BEFORE INTERNAL WRAP SET
PROBLEM DESCRIPTION:
task_id wrong and port disable before internal wrap set
PROBLEM SUMMARY:
For Corsair Adapter Diagnostics:
Running diagnostics fails the dma_test after the adapter
has been unfenced or has been connected to the switch.
The cause was that an incorrect parameter for the
network table was being used as a destination id, and
the microcode was throwing out packets because, after
becoming unfenced, the network table had become
populated by real routes.
The adapter also reports false link errors when
the diagnostics are run after the adapter has been
unfenced. The cause was that the port was still
active when the TBIC internal wrap mode was turned on,
which in turn generated the link interrupts.
PROBLEM CONCLUSION:
The correct parameter to the dma test is now being used
so diagnostics will no longer fail even if the network
table is populated (after an unfence). The port is now
being disabled prior to enabling the TBIC internal
wrap mode and the spurious link errors are no longer
being generated.
------
APAR: IY33671 COMPID: 5765D5100 REL: 340
ABSTRACT: PROBLEM IN ZERO SDRAM
PROBLEM DESCRIPTION:
problem in zero sdram
PROBLEM SUMMARY:
The last segment of SDRAM is not initialized to zero.
PROBLEM CONCLUSION:
The last segment of SDRAM is now initialized to zero.
------
APAR: IY33672 COMPID: 5765D5100 REL: 340
ABSTRACT: RH3 MPV:MISSING PART NUMBER FOR TB3 PCI ADAPTER
PROBLEM DESCRIPTION:
rh3 mpv:missing part number for tb3 pci adapter
PROBLEM SUMMARY:
VPD for pci adapter is not available for diagnostic
controller to pick up.
PROBLEM CONCLUSION:
Get vpd data from adapter and save it in CuVPD during card
config. Now, when a problem is detected while running
diagnostics, Diagnostic Controller will get FRU from CuVPD
and report the problem.
------
APAR: IY33789 COMPID: 5765B9500 REL: 150
ABSTRACT: ASSERT IN SGMGR.C ON LINE 2018
PROBLEM DESCRIPTION:
assert in sgmgr.c on line 2018
PROBLEM SUMMARY:
assert in sgmgr.C on line 2018. tsdf -q called
relPermissionToRun without first calling getPermissionToRun
PROBLEM CONCLUSION:
tsdf -q should not call relPermissionToRun
since it did not call getPermissionToRun
------
APAR: IY33790 COMPID: 5765B9501 REL: 340
ABSTRACT: MODIFY TSDF AND TSCHDISK CONFLICTS
PROBLEM DESCRIPTION:
modify tsdf and tschdisk conflicts
PROBLEM SUMMARY:
Someone changed disks in Being-emptied state to Suspended
state during a mmdeldisk command and the results may not be
what was expected.
PROBLEM CONCLUSION:
Tweak conflict matrix to have the tsdf and chdisk commands
fail instead of wait for restripe to finish. Allow tsdf with
the -q option to run without permissions since it only
displays in-memory tables.
------
APAR: IY33791 COMPID: 5765B9501 REL: 340
ABSTRACT: MSGS. GARBAGE IN GPFS LOG FILE
PROBLEM DESCRIPTION:
msgs. garbage in gpfs log file
PROBLEM SUMMARY:
Error messages printed thru RSCT contained garbage data
PROBLEM CONCLUSION:
Length of the message string passed to RSCT needs to include
the trailing zero.
------
APAR: IY33792 COMPID: 5765B9501 REL: 340
ABSTRACT: INODES MISSING AFTER INODE FILE EXPANSION
PROBLEM DESCRIPTION:
inodes missing after inode file expansion
PROBLEM SUMMARY:
Inodes missing after inode file expansion
PROBLEM CONCLUSION:
After inode file expansion, when marking new inodes as
available in the inode map, must update segment hints
accordingly. Otherwise, some of the new inodes may be
unavailable until next remount.
------
APAR: IY33793 COMPID: 5765B9501 REL: 340
ABSTRACT: LONG WAITERS ON MMDELFS 'WAITING FOR SG CLEANUP'
PROBLEM DESCRIPTION:
long waiters on mmdelfs 'waiting for sg cleanup'
PROBLEM SUMMARY:
the rpc handler should let the syncClient call use the sg
even if in cleanupInProgress.
PROBLEM CONCLUSION:
sgmMsgSGUmount handler, having finished its work, calls
EndUse, and useCount for fs2 goes to zero, triggering
cleanup, where cleanupInProgress flag is set. During
cleanup, syncFS is called, which calls
QuotaClient::SyncQuotaClt, which does internal RPC handled
by SGHandleQuotaMgrMsg. In this handler, useStripeGroup is
called, and for all messages that are not quotaMsgEndClient
it passes USE_WAIT_FOR_CLEANUP flag, thus making
useStripeGroup block because cleanupInProgress is set.
------
APAR: IY33794 COMPID: 5765B9500 REL: 150
ABSTRACT: MMSDRFS 0 LENGTH AFTER PANIC
PROBLEM DESCRIPTION:
mmsdrfs 0 length after panic
PROBLEM SUMMARY:
After node crash the mmsdrfs file was made zero
length
PROBLEM CONCLUSION:
Need to do sync after updating key files
------
APAR: IY33796 COMPID: 5765B9501 REL: 340
ABSTRACT: DIO TRACING FOR TOOLS
PROBLEM DESCRIPTION:
dio tracing for tools
PROBLEM SUMMARY:
Need DIO tracing
PROBLEM CONCLUSION:
Add a DIOQIO trace so that tools for analyzing IO events can
match QIO/FIO pairs.
------
APAR: IY33797 COMPID: 5765B9501 REL: 340
ABSTRACT: INVALIDATE DISKS WHEN MMADDDISK FAILS
PROBLEM DESCRIPTION:
invalidate disks when mmadddisk fails
PROBLEM SUMMARY:
When adding new disks, it needs to add new segments to
allocation map. If there is no space left for the new
segments it stops. But after freeing space, the retry of the
mmadddisk says "Are you sure, these disks appear to be in
use", so need to use the -v no option on mmadddisk to
override the check.
PROBLEM CONCLUSION:
When completeDeleteDisks finds disks that were being added
(but add failed), it needs to have changeDiskStates
invalidate the disk and SG descriptors when it changes the
state to BeingDeletedFromAllocMap so that the disks do not
look like they belong to a SG anymore.
------
APAR: IY33798 COMPID: 5765B9501 REL: 340
ABSTRACT: VALIDATE ALLOC SEG WRITES
PROBLEM DESCRIPTION:
validate alloc seg writes
PROBLEM SUMMARY:
Sometimes on a loaded system alloc segment data is getting
corrupted.
PROBLEM CONCLUSION:
To catch alloc segment data corruption, if
AssertOnStructureError is set, code is added to validate
alloc segment buffer that was just written still has the
correct checksum, etc. This would verify whether the data
was corrupted in GPFS between the checksum calculation and
the write to disk, or by the disk IO subsystem, et al.
------
APAR: IY33814 COMPID: 5765D5100 REL: 340
ABSTRACT: SPLED FAILS
PROBLEM DESCRIPTION:
spled fails
PROBLEM SUMMARY:
The problem is caused by RegattaH (HMC) frames. The problem
only happens when spled is started immediately after starting
hardmon, because it takes longer for hardmon to
connect to the HMC. Therefore, if spled is started immediately
after hardmon, spled will only display non-HMC frames
at the beginning and will eventually fail when it finds the
other frames, since there is no space for the newly found
frames.
PROBLEM CONCLUSION:
Here is how the fix works:
If you start spled immediately after starting hardmon, spled
will display all frames (non-HMC and HMC). However, it only
displays leds/lcds for non-HMC frames, and HMC frames will be
blank at the beginning. It takes about 30 seconds to
display leds/lcds for HMC frames since it takes time for
hardmon to connect to HMC frames initially. Note: The switch
frames will be blank if there are any switch frames in the
system since there are no lcds/leds for the switch frames.
------
APAR: IY33829 COMPID: 5765D5100 REL: 340
ABSTRACT: DCR - SPMKVGOBJ SHOULD GIVE WARNING NOT FAIL IF IMAGE IS NOT IN
PROBLEM DESCRIPTION:
Currently spmkvgobj fails with a non-0 return code if the image
specified by the -i flag cannot be found in
/spdata/sys1/install/images
This can be inconvenient or unrealistic for large customers that
keep a separate install image for each node, but store them
outside spdata when not working on them. Since all spmkvgobj is
doing is entering data and not actually using the image, it
could be made to issue a warning if the image is not there but
still exit successfully (with rc 0).
LOCAL FIX:
Touch the needed filename in the spdata directory to fake
spmkvgobj out (it just checks for filename existence, doesn't
confirm that it is a valid mksysb).
PROBLEM SUMMARY:
spmkvgobj and spchvgobj currently issue an error and
terminate if the install image specified does not exist in
/spdata/sys1/install/images/.
Since the install image is not being used at this point,
the check should be modified to just issue an
informational message if the install image does not exist
and continue processing.
PROBLEM CONCLUSION:
spmkvgobj and spchvgobj have been modified to only issue
an informational message if the install image specified
does not exist. Later processing in spbootins and
mknimres will continue to issue error messages if the
install image is still not present when it is required.
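The change from a hard failure to an informational message can be sketched as below. This is a Python illustration under assumptions, not the spmkvgobj code; the function name and the message text are hypothetical, and only the directory path is taken from the description above.

```python
import os
import sys

def check_install_image(image_name, image_dir="/spdata/sys1/install/images"):
    """Warn, but do not fail, when the install image is absent."""
    path = os.path.join(image_dir, image_name)
    if not os.path.exists(path):
        # Informational only: the image is not needed until a later step.
        print("warning: install image %s not found" % path, file=sys.stderr)
    return 0
```

The existence check is deferred in effect: steps that actually need the image remain responsible for reporting an error if it is still missing.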
------
APAR: IY33844 COMPID: 5765D5100 REL: 340
ABSTRACT: SPREBUILDSYSMAP LEAVES SYSTEM WITHOUT SYSPAR_MAP ENTRIES
PROBLEM DESCRIPTION:
The command /usr/lpp/ssp/bin/sprebuildsysmap is missing an
absolute path for the command to be issued on line 60, which is
as follows:
$command = "SDR_config -l";
This line should be:
$command = "/usr/lpp/ssp/install/bin/SDR_config -l";
The script is assuming /usr/lpp/ssp/install/bin is in the
System Administrator's PATH and this is not always a valid
assumption.
Because of this, the sprebuildsysmap command is failing and
leaving the system with no entries in the Syspar_map class.
LOCAL FIX:
System Administrator should include /usr/lpp/ssp/install/bin in the
PATH as a workaround.
PROBLEM SUMMARY:
When the sprebuildsysmap command invokes SDR_config it
does not specify its full path. If the user's PATH does
not contain /usr/lpp/ssp/install/bin/, the call to
SDR_config will fail and the Syspar_map will not be
recreated.
PROBLEM CONCLUSION:
sprebuildsysmap was modified to specify the full path
for all commands that it calls.
------
APAR: IY33854 COMPID: 5765D5100 REL: 340
ABSTRACT: XMEMPIN SERVICE FOR 64BIT U_CLIENT
PROBLEM DESCRIPTION:
xmempin service for 64bit u_client
PROBLEM SUMMARY:
64bit user clients pass in a 64bit shm_p to functions
xmempin/xmemunpin. Since these functions only accept a
32bit shm_p on 32bit kernel, shm_p might be corrupted.
PROBLEM CONCLUSION:
Before passing shm_p to xmempin/xmemunpin, use as_remap64()
to remap the 64bit shm_p to 32bit.
------
APAR: IY33906 COMPID: 5765D5100 REL: 340
ABSTRACT: MODS NEEDED IN HACWS PRE/POST EVENTS
PROBLEM DESCRIPTION:
mods needed in hacws pre/post events
PROBLEM SUMMARY:
HACWS changes needed to support hacmp 4.5
PROBLEM CONCLUSION:
Modified HACWS pre- and post-event
scripts to correctly handle changes in HACMP 4.5.
------
APAR: IY33925 COMPID: 5765D5100 REL: 340
ABSTRACT: GET_FILE_CHECKSUM CAN LEAVE OPEN FILE DESCRIPTORS
PROBLEM DESCRIPTION:
A node may be fenced off the switch if there are too many open
file descriptors. Symptoms include the following messages in
the fs_daemon_print.file:
get_file_checksum: fopen failed, errno = 24
2547-677 topo did not rebuild correctly
Turning off this nodes switchResponds bits in the SDR
To recover run rc.switch and Eunfence the node.
PROBLEM SUMMARY:
Nodes on the SP-Switch 2 can drop off the switch after the
switch Eprimary node has done many switch service operations
(e.g. Efence/Eunfence/Estart's).
PROBLEM CONCLUSION:
The fault service daemon has been changed to prevent files
from being left open after topology file distribution on
the SP Switch-2 occurs.
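Errno 24 in the log above is "too many open files". A checksum routine that always closes its file can be sketched in Python as follows. This is an illustration only: CRC-32 stands in for whatever checksum the fault service daemon computes, and the function name is an assumption.

```python
import zlib

def file_checksum(path):
    """Return a CRC-32 of the file; the descriptor is closed on every path."""
    checksum = 0
    # The with-statement closes the file even if read() raises, so repeated
    # calls cannot accumulate open descriptors.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            checksum = zlib.crc32(chunk, checksum)
    return checksum
```

Tying the close to scope exit, rather than to a call on the normal return path, is what prevents the leak on early returns and error paths.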
------
APAR: IY34137 COMPID: 5765D9300 REL: 320
ABSTRACT: _GETODMNN MAY CORRUPT MEMORY
PROBLEM DESCRIPTION:
_getodmnn may corrupt memory
PROBLEM SUMMARY:
During MPI_Init(), odm_set_path() is called for gathering
info from ODM database. The returned memory pointer is kept
in a variable for memory releasing. The variable was not
initialized and the validity of its value was not checked
before it was used in memory freeing. If ODM function
calls fail for some reason, an invalid memory pointer could be
used in the memory freeing and cause memory corruption.
PROBLEM CONCLUSION:
Initialize the variable to NULL and check whether memory is
allocated before it is freed.
------
APAR: IY34141 COMPID: 5765D5100 REL: 340
ABSTRACT: SPFRAME NOT GIVING ERROR WHEN IT SHOULD
PROBLEM DESCRIPTION:
spframe not giving error when it should
PROBLEM SUMMARY:
Running spframe to attach a CSP protocol server to a
switchless SP system gives unexpected behavior. The '-n'
flag is required in this situation, but leaving it out does
not produce an error message as it should.
The cause of this problem was due to a flaw in the routine
that determines if a system is partitionable or not.
PROBLEM CONCLUSION:
The routine that determines if a system is partitionable was
fixed to resolve this defect.
------
APAR: IY34143 COMPID: 5765D5100 REL: 340
ABSTRACT: POWER ON COMMANDS TO NODE1 OF P690 OR P670 DO NOT SUCCEED
PROBLEM DESCRIPTION:
power on commands to node 1 of a p690 or p670 do not succeed
PROBLEM SUMMARY:
Power on commands to "node 1" of a p690 or p670 server
in SMP mode do not succeed. If you issue
spmon -power on node1
for an SMP mode p690 or p670 server, the following
error message is issued:
spmon: 0026-068 Frame 1 is powered off.
Unable to power on devices in the frame.
spmon: 0026-025 Power command ended in error.
PROBLEM CONCLUSION:
The library used by spmon has been modified to allow
spmon -power on commands to "node 1" of a p690 or p670
server in SMP mode to succeed.
------
APAR: IY34151 COMPID: 5765D5100 REL: 340
ABSTRACT: ADD CRUISER SUPPORT
PROBLEM DESCRIPTION:
add cruiser support
PROBLEM SUMMARY:
Support for the cruiser adapter is needed.
PROBLEM CONCLUSION:
PdDv and PdAt information for cruiser is added to
the corsair.add file so that the cruiser adapter will
be recognized.
------
APAR: IY34152 COMPID: 5765D5100 REL: 340
ABSTRACT: KLAPI/DMA FAILURE
PROBLEM DESCRIPTION:
klapi/dma failure
PROBLEM SUMMARY:
A timing hole in the Zero Copy retransmission path of KLAPI can
cause DMA from system memory to be garbage. The packet will
be dropped on the other side, but this hole needs to be fixed so
that bad data is not DMAed from system memory. This will only
happen when multiple retransmit requests are processed for a
given message.
PROBLEM CONCLUSION:
The fix is to not send a duplicate zero copy packet in
KLAPI retransmission logic if the originator of the zero
copy packet sends an acknowledgement that it has completely
received the packet.
------
APAR: IY34181 COMPID: 5765D5100 REL: 340
ABSTRACT: SPGETDESC: REGATTA SUPPORT ENHANCEMENTS
PROBLEM DESCRIPTION:
spgetdesc:regatta support enhancements
PROBLEM SUMMARY:
Some Regatta support enhancements are required, and are
provided in this apar.
PROBLEM CONCLUSION:
The Regatta support enhancements have been provided.
------
APAR: IY34248 COMPID: 5765E6100 REL: 510
ABSTRACT: LIBPERFSTAT DISK ERROR/MEMORY LEAK
PROBLEM DESCRIPTION:
Users of libperfstat.a disk API function will
experience a memory leak when consumers run
as root.
PROBLEM SUMMARY:
Users of libperfstat disk API may
experience a memory leak when
consumers run as root
------
APAR: IY34339 COMPID: 5765B9501 REL: 340
ABSTRACT: DEADLOCK HAPPENS WHEN A BYTE-RANGE LOCK IS TAKEN OUT ON A FILE.
PROBLEM DESCRIPTION:
PROBLEM SUMMARY:
Deadlock occurred when a byte-range lock was taken out on a
file.
PROBLEM CONCLUSION:
a kernel thread executing lockGetattr may deadlock with an
InodePrefetchWorker thread. inodeUsed is called while
holding InodeCacheObj mutex, where an attempt is made to
acquire ipsMutex. The InodePrefetchWorker thread, while
holding ipsMutex, calls relAllLocks, which may eventually
lead to a brUnlock call where InodeCacheObj mutex is
acquired. The fix is for InodePrefetchWorker to temporarily
drop ipsMutex while calling relAllLocks.
------
APAR: IY34343 COMPID: 5765D5100 REL: 340
ABSTRACT: HANG:USER-SPACE SIGNAL LIBRARIES
PROBLEM DESCRIPTION:
hang:user-space signal libraries
PROBLEM SUMMARY:
This problem shows up when one process' send tail is out of
sync with another process' send head. When the send fifo is
full, under certain circumstances the former process thinks
there is still space available in the send fifo. So it
continues to write into the fifo so that the send tail passes
the send head, which should never happen.
PROBLEM CONCLUSION:
To solve this problem, an extra check needs to be
conducted under that circumstance so that the process can
stop writing to the fifo when it is full.
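The invariant that the send tail must never pass the send head can be sketched with a small ring buffer in Python. This is a single-process illustration under assumptions, not the user-space library code; the class and its explicit slot counter are hypothetical.

```python
class SendFifo:
    """Fixed-size ring buffer that refuses writes when full."""

    def __init__(self, slots):
        self.buf = [None] * slots
        self.head = 0   # next slot to read
        self.tail = 0   # next slot to write
        self.count = 0  # occupied slots; distinguishes full from empty

    def is_full(self):
        return self.count == len(self.buf)

    def put(self, packet):
        # Check before writing so the tail can never pass the head.
        if self.is_full():
            return False
        self.buf[self.tail] = packet
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
        return True

    def get(self):
        if self.count == 0:
            return None
        packet = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return packet
```

When head and tail are equal the buffer is either empty or full, which is why the sketch keeps a separate count instead of inferring fullness from the two indices alone.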
------
APAR: IY34397 COMPID: 5765B9501 REL: 340
ABSTRACT: EXECUTABLE NOT REPLACED AFTER REBIND
PROBLEM DESCRIPTION:
Customer discovered on GPFS 1.5 filesystem (at PTF set 11) when
compiling a C program, any old executables are not replaced --
so if a program is changed and recompiled, the old executable
must be deleted first.
This has been identified as a problem with memory mapped pages
(and the BRL::flushMappedPages routine) that could affect other
things besides C compilers, although no other symptoms are known
at this time.
LOCAL FIX:
For C compilers on GPFS 1.5 PTF set 11 (mmfs.base.rte 3.4.0.9),
old executable must be deleted before recompiling changed source
code.
PROBLEM SUMMARY:
Old executables not replaced after rebind.
PROBLEM CONCLUSION:
Invalidate pages from VMM cache at truncate time rather than
waiting to do it the next time a byte range lock is
acquired.
TEMPORARY FIX:
For C compilers on GPFS 1.5 PTF set 11 (mmfs.base.rte
3.4.0.9), old executable must be deleted before recompiling
changed source code
------
APAR: IY34408 COMPID: 5765D5100 REL: 340
ABSTRACT: SWITCH PRIMARY NODE HUNG WHILE TRYING TO HANDLE SENDER HANGS
PROBLEM DESCRIPTION:
The primary node hung while trying to handle a sender hang
condition causing the backup to take over. The SwitchScan()
function is not handling sender hang recovery correctly. There
are a couple of minor bugs in the SwitchScan() code that need to
be fixed.
PROBLEM SUMMARY:
When handling sender hang conditions, it is possible for
the switch primary node to send a reset service packet to
an invalid or unknown device, causing the primary to hang
and the backup node to take over. The fault service
daemon on the original primary node may be terminated.
PROBLEM CONCLUSION:
The code to reset sender hangs was not processing
its internal buffers correctly; this has been
corrected.
------
APAR: IY34469 COMPID: 5765D5100 REL: 340
ABSTRACT: RAISE SPSWITCH2PCI ADAPTER RECOVERY BAD PACKET THRESHOLD
PROBLEM DESCRIPTION:
The SPSwitch2PCI adapter will be fenced from the switch when it
receives more than a small number of corrupted switch packets.
Generally speaking, the adapter is the victim, and not the cause
of the bad packets. Therefore, there is little to gain by
fencing the adapter. The focus of this APAR is to raise the
SPSwitch2PCI adapter recovery bad pkt threshold significantly.
PROBLEM SUMMARY:
The switch bad packet threshold values for the SPSwitch2PCI
adapter are set too low. This causes the adapter recovery
logic to fence the node from the switch plane.
PROBLEM CONCLUSION:
The threshold values for bad switch packets for the
SPSwitch2PCI adapter have been increased (in effect
to infinity). This makes the adapter recovery
logic more tolerant of corrupted packets.
------
APAR: IY34658 COMPID: 5765B9500 REL: 150
ABSTRACT: MMCHCLUSTER -R NOT REFLECTED IN MMLSCLUSTER
PROBLEM DESCRIPTION:
mmchcluster -R not reflected in mmlscluster
PROBLEM CONCLUSION:
Correct the test for changed rcp path;
change the command to work for clusters based on RSCT peer
domains.
------
APAR: IY34661 COMPID: 5765B9501 REL: 340
ABSTRACT: C209F6N16 AND C209F5N03 CRASHED WITH FLASHING "888"
PROBLEM DESCRIPTION:
c209f6n16 and c209f5n03 crashed with flashing "888"
PROBLEM SUMMARY:
gpfsMount asserted node because mount helper was not
running.
PROBLEM CONCLUSION:
clear the root gpfsNode pointer in the VFS data when
kSFSMount fails (in case it was set before the failure
occurred).
------
APAR: IY34744 COMPID: 5765B9501 REL: 340
ABSTRACT: RUNNING ALT_DISK_INSTALL ON LIVE SYSTEM BRINGS DOWN GPFS
PROBLEM DESCRIPTION:
An update_all operation which updates GPFS filesets
on an alt_disk_install will recycle the mmfs daemon.
The install scripts (part of GPFS updates) shouldn't
recycle the mmfs daemon if the update is performed on an
alternate rootvg.
PROBLEM SUMMARY:
Added support for alt_disk_install migration
PROBLEM CONCLUSION:
Avoid modifying active system (unmount, process restart,
etc.) when INUCLIENTS is set
------
APAR: IY34817 COMPID: 5765C3403 REL: 430
ABSTRACT: EEH FOR SC2+ AIX ERROR LOG ENTRIES INCONSISTENT
PROBLEM DESCRIPTION:
SSA PingTimeout error log at the same time as EEH error log.
PROBLEM CONCLUSION:
Stop the issue of the spurious timeout when EEH error occurs.
------
APAR: IY34829 COMPID: 5765B9501 REL: 340
ABSTRACT: ASSERT: VP!= NULL IN BRV.C LINE 203
PROBLEM DESCRIPTION:
gpfs assert: VP!= NULL IN BRV.C LINE 203.
PROBLEM SUMMARY:
Fixed Assert: vp!=null in BRV.C
PROBLEM CONCLUSION:
Fix previous flushMappedPages change to also handle AIX case
correctly for ref decrement
------
APAR: IY34855 COMPID: 5765C3403 REL: 430
ABSTRACT: MONITOR ERROR LOG FOR SYMPTOMS OF ARM VIBRATION
PROBLEM DESCRIPTION:
Poor performance from some IBM disk drives.
PROBLEM CONCLUSION:
Monitor error log for characteristic -- but non-fatal -- errors
and perform analysis on suspect disk.
------
APAR: IY34856 COMPID: 5765C3403 REL: 430
ABSTRACT: POLICE VALID SERVICE WORD ON ADAPTER IOCTLS
PROBLEM DESCRIPTION:
Adapter resets with the cause -- Illegal Service number.
PROBLEM CONCLUSION:
Trap the illegal service number at the disk device driver
interface and return the request with an error.
------
APAR: IY34917 COMPID: 5765B9501 REL: 340
ABSTRACT: DIRECT I/O NOT UPDATING FILESIZE
PROBLEM DESCRIPTION:
Direct IO in last block but after filesize was not updating the
metadata filesize or the inode cache filesize.
LOCAL FIX:
There is no known work around for this problem.
PROBLEM SUMMARY:
Direct IO not updating the filesize.
PROBLEM CONCLUSION:
Direct IO in last block but after filesize was not updating
the metadata filesize or the inode cache filesize.
openedDirect flag not getting turned on if file created with
O_DIRECT flag.
------
APAR: IY35185 COMPID: 5765B8100 REL: 220
ABSTRACT: ALL APPLICATION PROFILES LASTUPDATE FIELD GETS UPDATED WHEN
PROBLEM DESCRIPTION:
The lastupdate field on all the Application Profiles get updated
with the date of the latest startup of Voice Response.
------
APAR: IY35196 COMPID: 5765B9501 REL: 340
ABSTRACT: MMCHCLUSTER -R NOT REFLECTED IN MMLSCLUSTER
PROBLEM DESCRIPTION:
mmchcluster -R not reflected in mmlscluster
PROBLEM CONCLUSION:
Correct the test for changed rcp path;
change the command to work for clusters based on RSCT peer
domains.
------
APAR: IY35197 COMPID: 5765B9500 REL: 150
ABSTRACT: RUNNING ALT_DISK_INSTALL ON LIVE SYSTEM BRINGS DOWN GPFS
PROBLEM DESCRIPTION:
An update_all operation which updates GPFS filesets
on an alt_disk_install will recycle the mmfs daemon.
The install scripts (part of GPFS updates) shouldn't
recycle the mmfs daemon if the update is performed on an
alternate rootvg.
PROBLEM SUMMARY:
Added support for alt_disk_install migration
PROBLEM CONCLUSION:
Avoid modifying active system (unmount, process restart,
etc.) when INUCLIENTS is set
------
APAR: IY35225 COMPID: 5765D5100 REL: 340
ABSTRACT: LATEST PSSP 3.4.0 FIXES AS OF SEPTEMBER 2002
PROBLEM DESCRIPTION:
This is the latest PSSP ptf as of September 2002
Order this apar to get all of the ptfs as of September 2002.
PROBLEM SUMMARY:
This is a packaging apar for PSSP 3.4.0 fixes
as of September 2002
PROBLEM CONCLUSION:
This is a packaging apar for PSSP 3.4.0 fixes
as of September 2002
------