From: AIX Service Mail Server (aixserv austin.ibm.com)
Date: Tue May 07 2002 - 02:37:08 CDT
APAR: IY12345 COMPID: 5698PDD00 REL: 360
ABSTRACT: MAKING SERVER SIDE SCRIPT-FILTERING CONFIGURABLE
PROBLEM DESCRIPTION:
making server side script-filtering configurable
PROBLEM CONCLUSION:
making server side script-filtering configurable for
junctions with scripting support (-j option in junctioncp).
To disable filtering within scripting support (filtering
is only one of the scripting support functions), the
user can add this line to the wand stanza in iv.conf:
wand
script-filtering = no
------
APAR: IY25153 COMPID: 5622DJX00 REL: 211
ABSTRACT: PTF USED FOR DATAJOINER 211 FOR AIX
PROBLEM DESCRIPTION:
PTF used for DataJoiner 211 for AIX
------
APAR: IY28176 COMPID: 5765D9300 REL: 320
ABSTRACT: ROUTINE MPI_WTIME RETURNS INCORRECT RESULT ON -Q64 -QREALSIZE=8
PROBLEM DESCRIPTION:
The routine MPI_Wtime produces incorrect results when the
program is compiled with the options -q64 and -qrealsize=8.
LOCAL FIX:
omit -q64 and -qrealsize=8
PROBLEM SUMMARY:
The problem was solved by changing the mpif.h file to
reflect the correct return value type for mpi_wtime and
mpi_wtick functions. Also the mpi.mod files were generated
from these new header files.
PROBLEM CONCLUSION:
The problem arose because, internally, the return value of
mpi_wtime was declared as double precision when it should
have been real*8. The same was true of mpi_wtick. Because of
this, when the -qrealsize=8 option was supplied, the two
representations of reals no longer matched and incorrect
data was returned.
------
APAR: IY28351 COMPID: 5765D5100 REL: 340
ABSTRACT: SP SWITCH 2 DAEMON HAS INVALID DEPENDENCY ON SWITCH NODE NUM 0.
PROBLEM DESCRIPTION:
The SP Switch 2 fault-service daemon has a dependency on the
existence of switch node number 0. (Switch node number 0 doesn't
have to be initialized on the switch, but it must exist in the
SDR.) This is an invalid dependency because a customer can
remove the node that is assigned switch node number 0.
LOCAL FIX:
Add the node that is assigned switch node number 0 back to
the system.
PROBLEM SUMMARY:
On an SP_Switch2, if a customer deletes a node that has
been assigned switch node number 0, Estart will fail.
You will see the following message in the flt file on
the primary node:
CSswitchInit: CSworm_bfs_phase1() failed with rc=-51
PROBLEM CONCLUSION:
During phase 1 switch initialization, an assumption was
made that there always was a node that was assigned
switch node number 0 (zero). This is not a valid
assumption. Phase 1 initialization has been changed to
use the primary node's switch node number, instead of
the value 0, as the starting point for exploring the
network.
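
The change described above can be illustrated with a small sketch.
This is not the fault-service daemon's code: the topology dictionary
and the function name explore_from are hypothetical. It only shows why
a breadth-first exploration seeded at a fixed switch node number 0
fails once that node is removed, while seeding from the primary node's
own switch node number does not.

```python
from collections import deque


def explore_from(start, links):
    """Breadth-first walk of a switch topology; returns reachable node numbers.

    `links` maps each switch node number to its neighbours (hypothetical layout).
    """
    if start not in links:
        raise KeyError(f"switch node number {start} does not exist")
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in links[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


# Switch node number 0 has been deleted; node 1 is the primary in this example.
links = {1: [2, 3], 2: [1], 3: [1]}
primary = 1
reachable = explore_from(primary, links)
```

Calling explore_from(0, links) on the same data raises KeyError, which
is the analogue of the hard-coded starting point failing.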
------
APAR: IY28505 COMPID: 5765D9300 REL: 320
ABSTRACT: FAULTY MPI_CART_SHIFT ROUTINE
PROBLEM DESCRIPTION:
A Fortran MPI program, when run on 4 processors
poe ./topol -procs 4 -hostfile ../psi.test/host.lis
produces the output
myid= 0 myidm,p= 3 1 MPI_PROC_NULL= -3
myid= 1 myidm,p= 0 2 MPI_PROC_NULL= -3
myid= 2 myidm,p= 1 3 MPI_PROC_NULL= -3
myid= 3 myidm,p= 2 -3 MPI_PROC_NULL= -3
The correct output should be:
myid= 0 myidm,p= 3, 1 MPI_PROC_NULL= -3
myid= 1 myidm,p= 0, 2 MPI_PROC_NULL= -3
myid= 2 myidm,p= 1, 3 MPI_PROC_NULL= -3
myid= 3 myidm,p= 2, 0 MPI_PROC_NULL= -3
PROBLEM SUMMARY:
Fixed the problem by changing the mpi_cart_sub routine so
that the information about whether a dimension is periodic
is preserved in the new topology from the old communicator.
With this change mpi_cart_shift produces the expected results.
PROBLEM CONCLUSION:
The problem in this case was that the mpi_cart_sub call was
not saving the information about whether the original
dimensions were periodic. As a result, a subsequent call to
the mpi_cart_shift routine did not produce the desired results.
When the new dimension was shifted, some tasks ended up being
pushed out because the dimension was mistakenly marked as
non-periodic in the mpi_cart_sub routine.
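
As a rough illustration of why the periodic flag matters, the sketch
below computes shift neighbours along a one-dimensional topology. It
does not model mpi_cart_sub or the library internals; the function
name cart_shift_1d is hypothetical, and MPI_PROC_NULL is set to -3
only because that is the value printed in the output above.

```python
MPI_PROC_NULL = -3  # value shown in the APAR output; implementation-specific


def cart_shift_1d(rank, size, disp, periodic):
    """Return (source, dest) for a shift along a 1-D Cartesian topology."""

    def neighbour(r):
        if periodic:
            return r % size  # wrap around the ends of the dimension
        return r if 0 <= r < size else MPI_PROC_NULL  # fall off the end

    return neighbour(rank - disp), neighbour(rank + disp)


periodic_result = [cart_shift_1d(r, 4, 1, True) for r in range(4)]
non_periodic_result = [cart_shift_1d(r, 4, 1, False) for r in range(4)]
```

With the flag preserved, rank 3 wraps around to rank 0; with the flag
lost, the same shift yields MPI_PROC_NULL.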
------
APAR: IY28533 COMPID: 5765D5100 REL: 340
ABSTRACT: NODE NUMBERS NOT BEING IDENTIFIED PROPERLY DURING SETUP_CWS
PROBLEM DESCRIPTION:
In PSSP 3.4, if a customer adds a node whose node number is part
of an already existing node number (such as 17 and 117),
setup_CWS may fail to properly create a new krb-srvtab.
A similar problem exists with mknimres, which is being handled
in IY27621.
LOCAL FIX:
Manually create the krb-srvtab file.
PROBLEM SUMMARY:
The mknimres part of this was addressed under IY27621.
We will only address the setup_CWS bug here.
Under the following conditions setup_CWS will fail to
generate a new-srvtab file for node A:
1) Node A's node number is a substring of node B's
(e.g. node 1 and node 11).
2) In the SDR Adapter Class, the entry for node B's en0
appears BEFORE node A's. This would normally only happen
if node B is added first.
The code is checking the list to see if the node number
is already in there. But if a node is already in the list
whose number is a superstring of the node it is looking for,
it will think it has a match when it shouldn't, and the node
won't get added to the list. The result is a new-srvtab file
won't get generated for the node.
PROBLEM CONCLUSION:
The code has been modified to search the list for the
node number surrounded by spaces. This will ensure
an exact match only. The list is built in such a
manner that there will always be spaces around each
node number.
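
The difference between the two checks can be sketched as follows.
This is an illustration only, not the setup_CWS code: the function
names and the sample list are hypothetical, and the list is assumed to
be a single string with a space on each side of every node number, as
the conclusion describes.

```python
def found_by_substring(node_number, node_list):
    """Old behaviour: '1' is wrongly found inside '11'."""
    return str(node_number) in node_list


def found_exactly(node_number, node_list):
    """Fixed behaviour: search for the number surrounded by spaces."""
    return f" {node_number} " in node_list


# Node 11 was added first, so node 1 is not yet in the list.
node_list = " 11 17 "
false_match = found_by_substring(1, node_list)
exact_match = found_exactly(1, node_list)
```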
------
APAR: IY28535 COMPID: 5765D5100 REL: 340
ABSTRACT: MKNIMRES DOES NOT HANDLE UMASK=077
PROBLEM DESCRIPTION:
If umask is 077, then mknimres creates pssplpp directory that
cannot be exported.
LOCAL FIX:
change umask or change perms on directory manually after failure
PROBLEM SUMMARY:
When a node is a BIS server for another node,
and a customer's umask setting is 077, the
/spdata/sys1/install/pssplpp/PSSP-3.4 directory
is created incorrectly on the BIS node.
The /spdata directory should be created with
permissions of 755, however, with the umask
setting of 077 it is created with permissions
of 2740. This subsequently makes customization
fail because the
/spdata/sys1/install/pssplpp/PSSP-3.4/pssp.installp
file cannot be seen or executed.
/usr/lpp/ssp/install/bin/makedir was changed to
add a umask setting of 022, which enables mknimres
to create the /spdata/sys1/install/pssplpp/PSSP-3.4
directory with permission of 755 as intended.
After completion of makedir, the customer's umask
setting will be returned to its original setting.
PROBLEM CONCLUSION:
/usr/lpp/ssp/install/bin/makedir was changed to
add umask(022). Since makedir is only called by
mknimres, this will correct all occurrences
of creating a directory with incorrect permissions
due to customer's umask setting differing from
what we expect. Subsequently, when the makedir script
ends, the umask setting is returned to the
customer's original setting.
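
The save, set, and restore pattern described above can be sketched in
a few lines. This is not the makedir script itself; the function name
and the temporary directory are hypothetical, and only the umask value
022 and the target permissions 755 come from the text.

```python
import os
import tempfile


def make_dir_with_mode_755(path):
    """Create `path` with permissions 755 regardless of the caller's umask."""
    old_umask = os.umask(0o022)  # set the umask the script expects
    try:
        os.makedirs(path, mode=0o777, exist_ok=True)  # 777 & ~022 = 755
    finally:
        os.umask(old_umask)  # give the caller's umask back


# Usage: simulate a restrictive caller umask of 077.
caller_umask = os.umask(0o077)
target = os.path.join(tempfile.mkdtemp(), "pssplpp", "PSSP-3.4")
make_dir_with_mode_755(target)
mode = os.stat(target).st_mode & 0o777
umask_after = os.umask(caller_umask)  # read back, then restore the real umask
```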
------
APAR: IY28578 COMPID: 5765D5100 REL: 340
ABSTRACT: THE PERSPECTIVE TABLE VIEW HAS A PROBLEM WITH SWITCH_RESPONDS
PROBLEM DESCRIPTION:
The table view has a problem with switch_responds on
switchless systems. The switch_responds column is not
displayed, and any columns after (to the right of) the one
where switch_responds WOULD be are blank.
LOCAL FIX:
Modified .sphardwaremonitorTableView and put it on
/usr/lpp/ssp/perspectives/profiles/en_US. (current $LANG=en_US)
It swaps the position of switch_responds and Environment LED,
making switch_responds last so there is nothing to the right
of it for it to mess up.
PROBLEM SUMMARY:
The code was not properly handling a non-existent attribute,
for example "switch responds" in a switchless system.
This was causing the attribute's column and all columns
to its right to not appear in the table.
Normally a non-existent attribute would not be present in
any profile. However, "switch responds" is a member of the
default table profile.
PROBLEM CONCLUSION:
The code has been modified to check for and skip
non-existent attributes when it creates the list
of columns for the table.
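
A minimal sketch of the "check for and skip" approach, assuming the
profile is an ordered list of attribute names and the system exposes a
set of attributes that actually exist. Both data structures and the
function name are hypothetical; the Perspectives code is not shown in
this APAR.

```python
def build_columns(profile_attributes, available_attributes):
    """Return the profile's columns in order, skipping missing attributes."""
    return [name for name in profile_attributes if name in available_attributes]


profile = ["hostname", "switch_responds", "Environment LED"]
switchless_system = {"hostname", "Environment LED"}
columns = build_columns(profile, switchless_system)
```

Columns to the right of the missing attribute are kept, which is the
behaviour the fix restores.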
------
APAR: IY28615 COMPID: 5765D5100 REL: 340
ABSTRACT: CSSADM CORE DUMP
PROBLEM DESCRIPTION:
In case of a clock problem on the switch the cssadm
recovery action may cause cssadm to dump core.
The srcmstr then starts a new cssadm which starts
over with the recovery (basically running Eclock+Estart).
The cssadm dumps core after the Eclock and before the
Estart. As each new cssadm does an Eclock the switch
never gets up again.
The only work around is to stop cssadm, run Eclock -d
followed by an Estart.
PROBLEM SUMMARY:
Core dump in the Switch Admin Daemon (cssadm)
when global switch recovery is turned on and Eclock is run.
The core dump is due to an overflow of a buffer that
is used to generate shell commands; the core dump may
occur when using long hostnames. Also, on some cssadm
messages, there is a mismatch between the format
specification (%s) and the corresponding parameter
(which is defined as int). The result is that the data
(node number, for example) is not displayed.
PROBLEM CONCLUSION:
The buffer for the generated commands was increased
from 120 to 256 bytes. The message text for information
messages 168 and 169 was changed to display the node's
hostname so as to match the format specification (%s).
------
APAR: IY28755 COMPID: 5765D5100 REL: 340
ABSTRACT: CSS_TYPE NOT UPDATED PROPERLY IN SDR ADAPTER CLASS WHEN
PROBLEM DESCRIPTION:
When migrating directly from PSSP 311 to PSSP 34, the css_type
attribute will not be updated in the Adapter class of the SDR.
LOCAL FIX:
1) Manually update the Adapter class with the correct css_type
using SDRChangeAttrValues.
2) Run SDR_config.
3) Customize all nodes to update their css ODM entries.
PROBLEM SUMMARY:
The old css_type list was taken out of the code for PSSP 3.4
and replaced with two new lists - one for SP-Switch and one
for the SP2-Switch. But the code that fills in the css_type
on migrations to PSSP 3.2 (and all later releases) was still
looking for the old list. This code is also run on new
installs.
PROBLEM CONCLUSION:
The code in SDR_init was modified to use the new tables to
determine the css_type. SDR_init will run
automatically on installation of this APAR.
------
APAR: IY28837 COMPID: 5765D5100 REL: 340
ABSTRACT: PSSP_SCRIPT VERIFY_QUORUM FUNCTION BREAKS WHEN LVSG IS EXECUTED
PROBLEM DESCRIPTION:
According to the PSSP admin guide, when the SDR is initialized, a
single volume group object is created with the name of "rootvg"
for each node. However, users are allowed to create additional
volume group objects to represent alternate volume groups for
a node. In this case, when pssp_script runs, the function
verify_quorum fails with error code "516-306: Unable to find
volume group <vg name> in the Device Configuration Database".
The reason is that the vg name is known only to the SDR and not
to the operating system. For example, a customer ran
spbootins -c rootvg51 ... to set a node to install. The
/tftpboot/<node_name>.config_info file gets built based on the
volume group object specified. When verify_quorum does an lsvg
rootvg51, the volume group does not really exist on the machine,
although the label is valid. As a result, pssp_script fails with
a return code of 1.
LOCAL FIX:
Offered the following as a possible circumvention:
>modify the node's config_info file and specify the name of
a volume group that's known to the OS:
-edit /tftpboot/<node_name>.config_info file
-instead of rootvg51, change it to rootvg.
-run /tmp/pssp_script on the node again.
PROBLEM SUMMARY:
***********************************************************
* USERS AFFECTED: *
* *
* Users at the following levels or higher, *
* ssp.basic 3.2.0.15 *
* ssp.basic 3.3.0.2 *
* ssp.basic 3.4.0.1 *
* who are installing or migrating a node for which the *
* selected volume group is named something other than *
* rootvg. *
* *
***********************************************************
* PROBLEM DESCRIPTION: *
* *
* When installing or migrating a node when the selected *
* volume group is not named rootvg, pssp_script will fail *
* with a message that the lsvg command failed. *
* *
***********************************************************
* RECOMMENDATION: *
* *
* Install the appropriate APAR for your release of PSSP, *
* when available. *
* *
* APAR IY29411, currently targeted for *
* ssp.basic 3.2.0.19 on PTF Set 19. *
* *
* APAR IY28837, currently targeted for *
* ssp.basic 3.3.0.6 on PTF Set 8. *
* *
* APAR IY29410, currently targeted for *
* ssp.basic 3.4.0.7 on PTF Set 8. *
* *
* Until the APAR for your release is available, prior to *
* issuing nodecond to begin the installation of the node, *
* edit the file /tftpboot/node_hostname.config_info and *
* change the name of the selected volume group to rootvg. *
* *
***********************************************************
------
APAR: IY28873 COMPID: 5765D5100 REL: 340
ABSTRACT: CFGCOL & CFGCOR DON'T UNREGISTER W/ CSS PDD IN SOME EXIT PATHS
PROBLEM DESCRIPTION:
cfgcol and cfgcor have exit paths where they do not de-register
with the CSS pseudo device driver (pdd) for Communication Matrix
(CM) updates. This leaves a stale entry in the pdd data struct
that maintains the list of clients registered for CM updates.
The results are indeterminate, but one possible result is a node
crash.
PROBLEM SUMMARY:
The Colony and Corsair configuration programs (cfgcol
and cfgcor) have error exit paths where they do not
unregister the adapter with the CSS pseudo device driver
(pdd) table. This situation could arise, for example,
if post diagnostics fail. This may leave a stale entry
in the pdd data structure that maintains the list of
clients registered for Communication Matrix (CM) updates.
The results are indeterminate, but one possible result
is a node crash.
PROBLEM CONCLUSION:
If there are errors in the configuration program
(cfgcol or cfgcor) of the Colony or Corsair adapter
after the device is registered with the CSS pseudo device
driver, the adapter will be unregistered before the
configuration program exits.
------
APAR: IY28877 COMPID: 5765E6900 REL: 310
ABSTRACT: SYNTAX ERR IN CONFIG CAUSES NEGOTIATOR/ALL DEAMONS DOWN
PROBLEM DESCRIPTION:
A syntax error in a user-defined keyword causes it to run into
the Start expression, corrupting it. The corrupted data is sent
to the Negotiator, causing it and all daemons to go down.
PROBLEM SUMMARY:
A missing closing parenthesis in an expression on one node
can bring down the negotiator.
PROBLEM CONCLUSION:
There are two routines that evaluate expressions. One of
them was fixed in the LoadL 2.2 GA code, and the other one
needed the same fix applied to it: report a clean
error on a missing closing parenthesis.
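
As an illustration of what a "clean error" on a missing closing
parenthesis might look like, the sketch below validates an expression
before it is evaluated or sent anywhere. It is not LoadLeveler's
expression evaluator; the function name and the sample expressions are
hypothetical.

```python
def check_parentheses(expression):
    """Raise ValueError if the parentheses in `expression` are unbalanced."""
    depth = 0
    for position, char in enumerate(expression):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unexpected ')' at position {position}")
    if depth > 0:
        raise ValueError("missing closing parenthesis")


check_parentheses("(a && (b || c))")  # balanced: returns without error

try:
    check_parentheses("(x < 2")
    error_text = None
except ValueError as exc:
    error_text = str(exc)
```

Rejecting the expression up front keeps a malformed value on one node
from being passed along in corrupted form.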
------
APAR: IY29067 COMPID: 5765D5100 REL: 340
ABSTRACT: ECLOCK -R DOES NOT CHANGE CLOCK SETTINGS
PROBLEM DESCRIPTION:
On PSSP 320 Eclock -r does not change clock settings on the
SP Switch.
LOCAL FIX:
Use Eclock -f <Clock topology file> or the -d option to clock
your system.
PROBLEM SUMMARY:
Eclock -r now makes the correct determination on the switch
type.
PROBLEM CONCLUSION:
When specifying the -r option on Eclock there was no check
to set the switch type before testing it. Eclock -r was not
running the full Eclock process.
------
APAR: IY29068 COMPID: 5765D5100 REL: 340
ABSTRACT: DIAG -C -D CSS0 BRINGS UP PROBLEM DETERMINATION SCREEN
PROBLEM DESCRIPTION:
Executing diag -c -d css0 should not take the user to ELA
screens; it appears the css0 diagnostic method does not use the
-c flag anymore.
I think we want to check DA_CONSOLE_TRUE before calling ela_run.
da mode bits when running diags with the -c flag:
DA_CONSOLE_FALSE 0x00080000
da mode bits when running diags without the -c flag:
DA_CONSOLE_TRUE 0x00040000
PROBLEM SUMMARY:
When running diags -c -d css0, the diag method was calling
diagrpt without first checking for a console causing the
user to be prompted.
PROBLEM CONCLUSION:
The diag method first checks for a console before running
diagrpt.
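
The console check can be pictured as a simple bit test on the da mode
word. The two constants below are the values quoted in the problem
description; the function name and the way the mode word is passed in
are assumptions made for this sketch, not the diag method's interface.

```python
DA_CONSOLE_TRUE = 0x00040000   # value quoted in the APAR text
DA_CONSOLE_FALSE = 0x00080000  # value quoted in the APAR text


def should_run_report(damode):
    """Run the interactive report only when a console is available."""
    return bool(damode & DA_CONSOLE_TRUE)


with_c_flag = should_run_report(DA_CONSOLE_FALSE)
without_c_flag = should_run_report(DA_CONSOLE_TRUE)
```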
------
APAR: IY29072 COMPID: 5765E7200 REL: 310
ABSTRACT: SMIT: CANNOT CHANGE OR REMOVE SHARES WITH LARGE # OF SHARES
PROBLEM DESCRIPTION:
Problem occurs on SMITTY during "Change Share" or "Delete
Share". If there are a large number of shares, smitty will
display
'1800-051 There are no items of this type.' instead of
showing the list of shares (to select from).
PROBLEM SUMMARY:
Problem occurs on SMIT during "Change Share" or "Delete Share".
If there are a large number of shares, smitty will display
'1800-051 There are no items of this type.'
instead of showing the list of shares (to select from).
(Problem is due to "net share /infolevel:99" showing different
output for more than 300 shares.)
PROBLEM CONCLUSION:
Fix "net share /infolevel:99" to show same output even when
number of shares exceeds 300.
------
APAR: IY29155 COMPID: 5765B9500 REL: 150
ABSTRACT: MMCHFS DOES NOT CHECK FOR EXISTANCE OF SPECIAL DEVICE FILE FOR
PROBLEM DESCRIPTION:
When using mmchfs to change nodeset for filesystem there is no
check to see if the device special file for the gpfs filesystem
already exists on the nodes in the target nodeset. This becomes
a problem when the customer has a jfs filesystem that has a
device special file with the same name. The mmchfs command
completes successfully; however, if the mmchfs command is used
to bring the gpfs filesystem back to the original nodeset the
special device file that belongs to the jfs filesystem is
removed.
PROBLEM SUMMARY:
When using mmchfs to change nodeset for
filesystem there is no check to see if the device special file
for the gpfs filesystem already exists on the nodes in the
target nodeset. This becomes a problem when the customer has a
jfs filesystem that has a device special file with the same
name. The mmchfs command completes successfully; however, if
the mmchfs command is used to bring the gpfs filesystem back
to the original nodeset, the special device file that belongs
to the jfs filesystem is removed.
PROBLEM CONCLUSION:
Do not remove /dev entries if they were not
created by GPFS.
------
APAR: IY29158 COMPID: 5765B9500 REL: 150
ABSTRACT: MMCHFS FAILS WITH _MOUNT_CHECK_ONLY ERROR
PROBLEM DESCRIPTION:
mmchfs fails with _MOUNT_CHECK_ONLY_error
PROBLEM SUMMARY:
Can not reassign a file system from a nodeset
where all nodes are unavailable.
PROBLEM CONCLUSION:
When moving a file system there is no need
for the daemon to be running anywhere in the source nodeset.
------
APAR: IY29208 COMPID: 5765E6900 REL: 310
ABSTRACT: LOADL_STARTER HANGING WITH DEFUNCT CHILD PROCESS
PROBLEM DESCRIPTION:
LoadL_starter hanging with defunct child process.
This hangs the whole job. Sending a SIGKILL to
the LoadL_starter puts the job into the VACATED
state and it will be restarted. Sometimes the
job may run then.
PROBLEM SUMMARY:
A defunct child process is
produced if the SA_RESETHAND flag is set
and LoadLeveler has inherited that environment.
PROBLEM CONCLUSION:
LoadLeveler now always disables the SA_RESETHAND
flag so that no defunct child process is
produced.
------
APAR: IY29210 COMPID: 5765D5100 REL: 340
ABSTRACT: LONG DEFAULT MSG TO FFDC_STACK_LOG MACRO WILL CORRUPT STACK
PROBLEM DESCRIPTION:
FFDC_STACK_LOG() macro doesn't accommodate a long default msg
string (argument). A default msg that is too long will cause
the stack to be corrupted. The results are indeterminate.
PROBLEM SUMMARY:
Messages longer than 80 chars sent from the fault service
daemon are corrupting the FFDC stack.
PROBLEM CONCLUSION:
The function that creates the stack record for the fault
service daemon needs to have the extraneous "Msg not
found: " text removed allowing 15 additional characters
to be in the message. Then msgs 2510-606, 2510-203,
2510-816, 2510-817, and 2510-820 need to be shortened
so that they do not exceed the new limit of 98 characters.
------
APAR: IY29241 COMPID: 5765D5100 REL: 340
ABSTRACT: SETUP_CWS IS TRYING TO UPDATE AFS SRVTABS WHEN IT SHOULD NOT
PROBLEM DESCRIPTION:
For afs authentication, the srvtab files are only generated when
the principals are initially created. However as of IY12653
(63733 in PTF 2) the code is trying to regenerate them every
time a node is set to customize, install or migrate.
PROBLEM SUMMARY:
A change was made to the code in this release such that the
srvtab files are regenerated every time a node's bootp
response is set to customize, install, or migrate.
However, when afs authentication is being used the srvtab
file(s) can only be generated when the principals are
originally created. The result is setup_server failing
with:
afs_add_principal:0016-301 Cannot access /tmp/addprin.27490
setup_CWS: 0016-052 The add_principal command could not
add service principals to the Kerberos V4 database.
setup_server: 0016-279 Problem of internally called
command: /usr/lpp/ssp/bin/setup_CWS; rc= 2.
setup_server: Processing incomplete (rc= 2).
PROBLEM CONCLUSION:
The code has been modified to not regenerate the
srvtab files when afs authentication is being used.
------
APAR: IY29372 COMPID: 5765D5100 REL: 340
ABSTRACT: SMITTY SPCHUSER SHOWS TOO MANY (ALL) SECONDARY GROUPS
PROBLEM DESCRIPTION:
Customer installed PSSP 3.4 (PTF set 5) and AIX 5.1 (ML 01) on
their control workstation. They have observed that the 'smitty
spchuser' command now displays ALL groups in the "Secondary
GROUPS" line. The /etc/group file is correct and the 'lsuser -a
groups <userid>' and 'smitty chuser' commands show the correct
outputs. The problem is only observed for the 'smitty spchuser'
command for changing SP users.
I have reproduced these observations. On our test system using
AIX 4.33 ML 9, PSSP 3.2 (ssp.basic 3.2.0.14), I created the user
"hughey" with primary group "staff" and secondary group "nobody"
After this, I tried smitty spchuser and observed:
PRIMARY group [1]
Secondary GROUPS [staff,nobody]
I repeated the same procedure on a test system with AIX 5.1.0.15
and PSSP 3.4.0.3 and observed the following incorrect output:
PRIMARY group [1]
Secondary GROUPS [system,staff,bin,sys,adm,..ALL GROUPS HERE]
LOCAL FIX:
None. This is a display feature of 'smitty spchuser' which is
not working correctly. Command 'smitty chuser' circumvents.
PROBLEM SUMMARY:
When splsuser is invoked for a userid, only those groups
which the userid is a member of should be displayed.
Currently all groups that have members are being displayed.
The cause of this is the change from perl 4 to perl 5 and
the way that strings are handled.
PROBLEM CONCLUSION:
group.pkg which is used by splsuser and spchuser has been
modified to check for certain strings not being null as
opposed to not being defined to handle differences between
perl 4 and perl 5.
------
APAR: IY29398 COMPID: 5765B9501 REL: 340
ABSTRACT: KERNEL PANIC DUE TO ASSERT IN SHHASHV.C LINE 1127:LM_HAVE != NL
PROBLEM DESCRIPTION:
If a file was changing from datashipping to non-datashipping
state in the middle of a read/write request, the file was left
in an unlocked state when returning from dsRdWr, and kSFSRead
was asserting when trying to upgrade the lock.
gpfsperf with the -ds option is just cleaning up the
datashipping state from all the nodes when the next job starts
reading the same file. This read hits a timing window where it
gets into the retry-after-DS-just-turned-off code, which forgot
to relock the file.
The result of all this is a kernel panic which causes the node
to crash with this type of traceback:
mmfs:DoPanic__FPcT1iN23T1+0000EC
mmfs:logAssertFailed+0000F0
mmfs:change_lock_vfs__5LkObjFP8CacheObjQ2_5LkObj12LockMode
EnumT2i+000C
mmfs:upgradeFileLock__FP15KernelOperationP8OpenFile7
FileUIDPQ2_5LkOb
PROBLEM SUMMARY:
GPFS self check logic detected an error at ShHashV.C, line 1127,
while running jobs using the MPI-IO library.
PROBLEM CONCLUSION:
If a file was changing from datashipping to non-datashipping
state in the middle of a read/write request, the file was
left in an unlocked state when returning from dsRdWr, and
kSFSRead was asserting when trying to upgrade the lock. The
fix is to relock the file before returning EAGAIN.
------
APAR: IY29417 COMPID: 5765B9501 REL: 340
ABSTRACT: FILESYSTEM HUNG AND LONG WAITERS
PROBLEM DESCRIPTION:
filesystem hung and long waiters
PROBLEM SUMMARY:
GPFS Deadlock on P690 with AIX 5.1.
PROBLEM CONCLUSION:
If a connection is dropped while an incomplete message was
in progress from the node, the hasData flag must be cleared
in receiveMsg, or it will loop forever.
------
APAR: IY29442 COMPID: 5765D5100 REL: 340
ABSTRACT: PARENT/CHILD PROCESSES TIMING PROBLEM FROM SPMON QUERY
PROBLEM DESCRIPTION:
Parent/Child processes timing problem from spmon query function
PROBLEM SUMMARY:
When spmon does a query it forks a process. But the parent
does not wait for the child to finish and exits.
If the parent is faster than the child then it will return
the user to the command line (print the prompt) before the
query's output is written. This gives the impression that
the shell prompt was never returned and that the command is
hung.
PROBLEM CONCLUSION:
The code has been modified so that on a query the parent
will now wait for the child process to finish before it
exits and returns the user to the command prompt.
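
The fork-then-wait pattern described above can be sketched as follows.
This is not the spmon source; the function name and the use of a pipe
to stand in for the query's output are assumptions for illustration.
The point is only that the parent blocks in waitpid until the child
has finished writing.

```python
import os


def run_query_and_wait(query):
    """Fork a child to run `query`, wait for it, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: perform the query, then exit without returning to the caller.
        try:
            query()
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: block until the child has finished before returning.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


# Usage: the child writes its "query output" into a pipe.
read_end, write_end = os.pipe()
exit_code = run_query_and_wait(lambda: os.write(write_end, b"query output\n"))
output = os.read(read_end, 1024)
```

Because the parent waits, the output is already available by the time
control returns, so a shell prompt would be printed after it rather
than before.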
------
APAR: IY29444 COMPID: 5765B9501 REL: 340
ABSTRACT: SIGNAL 11 IN FORCEDONERECORDS ON RO MOUNTED FILESYSTEM
PROBLEM DESCRIPTION:
Signal 11 in forceDoneRecords on RO mounted filesystem on token
revoke. token_revoke should not call forceDoneRecords if the
filesystem logfile pointer is null, which is the case when it is
mounted RO
PROBLEM SUMMARY:
Signal 11 in forceDoneRecords on RO mounted filesystem on
token revoke.
PROBLEM CONCLUSION:
token_revoke should not call forceDoneRecords if the
filesystem logfile pointer is null, which is the case when
it is mounted RO
------
APAR: IY29456 COMPID: 5765D5100 REL: 340
ABSTRACT: SETUP_SERVER FAILS IF MULTIPLE OTHER_ADDRS SPECIFIED FOR AN ADAP
PROBLEM DESCRIPTION:
setup_server dies when multiple IP addresses are put in
the other_addrs field in the Adapter SDR class.
The error message is: 0016-338 Kerberos V4 setup
was bypassed for network interfaces that could
not be resolved.
The other_addrs field is used in a HACMP
environment.
LOCAL FIX:
change setup_CWS as follows:
if ($other_addrs ne "\"\"" && $other_addrs ne "") {
    $netaddr=$other_addrs;
    unless (&check_new_name) {
        &exit_setup_CWS($return_code); # exit if errors
    }
}
}
to:
if ($other_addrs ne "\"\"" && $other_addrs ne "") {
    foreach $netaddr (split(/,/,$other_addrs)) {
        unless (&check_new_name) {
            &exit_setup_CWS($return_code); # exit if errors
        }
    }
}
}
PROBLEM SUMMARY:
setup_CWS was unable to process more than one IP address in
the other_addrs attribute of the Adapter class. It was
not parsing the addresses, so it would attempt to perform
host name resolution on the entire string which would fail.
PROBLEM CONCLUSION:
Modified setup_CWS to parse the values in the other_addrs
attribute of the Adapter class prior to performing host
name resolution.
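
The parsing step can be sketched outside of Perl as well. The
following is an illustration, not the setup_CWS code: the function
name and the sample addresses are hypothetical, the comma separator
and the empty-value checks mirror the local fix shown above, and the
trimming of surrounding white space is an extra assumption.

```python
def addresses_to_resolve(other_addrs):
    """Split a comma-separated other_addrs value into individual addresses."""
    if other_addrs in ("", '""'):  # same empty-value checks as the local fix
        return []
    # Trimming each entry is an assumption added for this sketch.
    return [addr.strip() for addr in other_addrs.split(",") if addr.strip()]


single = addresses_to_resolve("10.0.0.1")
several = addresses_to_resolve("10.0.0.1,10.0.0.2")
empty = addresses_to_resolve('""')
```

Each returned entry can then be resolved on its own instead of passing
the whole string to host name resolution.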
------
APAR: IY29472 COMPID: 5765B9501 REL: 340
ABSTRACT: GPFS WITH QUOTA ON, IN_DOUBT GROWS TOO QUICKLY ON LARGE SYSTEMS
PROBLEM DESCRIPTION:
On large systems where GPFS quota subsystem is turned on,
the per user in_doubt value grows unchecked.
The reclaiming of unused in_doubt does not bring down its value
to an acceptable level.
LOCAL FIX:
Running mmcheckquota on a GPFS file system that is offline will
reclaim all of the in_doubt. Running mmcheckquota on a mounted
GPFS file system will reduce the in_doubt size but will not
fully reclaim it.
PROBLEM SUMMARY:
The in-doubt value for GPFS file systems using quotas does
not get reclaimed after a period of inactivity.
PROBLEM CONCLUSION:
Provide a way to automatically reclaim unused client
shares after a period of allocation/deallocation inactivity
of the user on a client node, so that the overall inDoubt
for a user does not grow "unboundedly".
------
APAR: IY29570 COMPID: 5765E7200 REL: 310
ABSTRACT: FC COREDUMP WHEN FAILED TO SETUP NEW SESSION
PROBLEM DESCRIPTION:
FC can core dump when it fails to set up a new session for
a new user
PROBLEM CONCLUSION:
Revalidate the failed session.
------
APAR: IY29571 COMPID: 5765E7200 REL: 310
ABSTRACT: NETBIOS DATAGRAM SERVICE IS NOT WORKING
PROBLEM DESCRIPTION:
currently the NetBIOS DataGram service is not supported
PROBLEM CONCLUSION:
implement the NetBIOS DataGram service
------
APAR: IY29620 COMPID: 5765E7200 REL: 310
ABSTRACT: MAKE POSSIBLE TO CHANGE PERMISSION OF AIX FAST CONNECT FILE
PROBLEM DESCRIPTION:
A user on a PC client cannot change the permissions of a
file owned by a different user, even though the user has
write permission on the parent directory.
PROBLEM CONCLUSION:
Check for write access on the parent directory and based
on that allow or disallow to change the file permissions.
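
A simplified picture of the new rule, assuming the check is made for
the calling process rather than for a remote client identity as Fast
Connect would have to do. The function name and the sample file name
are hypothetical.

```python
import os
import tempfile


def may_change_permissions(path):
    """Allow a permission change only if the parent directory is writable."""
    parent = os.path.dirname(os.path.abspath(path))
    return os.access(parent, os.W_OK)


# Usage: a freshly created temporary directory is writable by its owner.
workdir = tempfile.mkdtemp()
allowed = may_change_permissions(os.path.join(workdir, "report.ppt"))
```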
------
APAR: IY29622 COMPID: 5765E6900 REL: 310
ABSTRACT: LL CHANGES TO SUPPORT AIX 5.1.D TECHNICAL LARGE PAGE
PROBLEM DESCRIPTION:
ll changes to support AIX 5.1.D Technical Large Page
------
APAR: IY29623 COMPID: 5765E7200 REL: 310
ABSTRACT: POWERPOINT FILE TIMESTAMP CHANGE WITH WINDOWS 2000
PROBLEM DESCRIPTION:
Modification times are not preserved and they change to the
current time
PROBLEM CONCLUSION:
Moving the code that saves the timestamp from FileInstance
to FileEntry class.
------
APAR: IY29683 COMPID: 5765B9501 REL: 340
ABSTRACT: RECOVERY FAILED AFTER NODE FAILURE DURING RESTRIPE
PROBLEM DESCRIPTION:
recovery failed after node failure during restripe
PROBLEM SUMMARY:
Recovery of the GPFS file system failed after a node failure
while running the mmrestripe command.
PROBLEM CONCLUSION:
When moving indirect blocks for restripe, must spool a done
record before changing the disk address. Otherwise, log
recovery might apply the updates after the indirect block
has been re-used
------
APAR: IY29694 COMPID: 5765D9300 REL: 320
ABSTRACT: MPI TASK DUMPS CORE IN COL_READPKT
PROBLEM DESCRIPTION:
Signal handling (non-threaded) allows an mpi write
to be re-entered by the same task causing tasks of
a POE job to hang or dump core. The stack trace
of a core dumping task is:
Segmentation fault in col_readpkt at 0xd0505d88
0xd0505d88 (col_readpkt+0x12fc) cbea0008 lfd fr31,0x8(r10)
(dbx) t
col_readpkt() at 0xd0505d88
kickpipes() at 0xd04d9870
mpci_recv(??, ??, ??, ??, ??, ??, ??, ??) at 0xd04f6a70
barrier_shft_b(??) at 0xd05920e8
_mpi_barrier(??, ??, ??) at 0xd0591e4c
MPI__Barrier(??) at 0xd059105c
mpi__barrier(??, ??) at 0xd011c3ec
gather_field() at 0x101d3d48
pp_output_slice() at 0x101d31f4
pp_output() at 0x101c9e24
pp_makegribs() at 0x101b97b4
pporg() at 0x101ab6a8
progorg() at 0x100c502c
gmeorg() at 0x1000079c
PROBLEM SUMMARY:
When running a non-threaded user space (mpi signal handling
library) program, a thread was re-entering writepkt driven
by a signal and causing the program to core dump with
various errors.
PROBLEM CONCLUSION:
When running a non-threaded (signal handling mpi) user space
program, the thread that is writing will no longer reenter the
write routine.
------
APAR: IY29838 COMPID: 5765B9501 REL: 340
ABSTRACT: MMCHECKQUOTA PRODUCES NEGATIVE NUMBERS
PROBLEM DESCRIPTION:
mmcheckquota sometimes produces negative numbers when GPFS is
under heavy load.
PROBLEM SUMMARY:
mmcheckquota sometimes produces negative numbers for disk
usage.
PROBLEM CONCLUSION:
Do not update server's shadow entries at ComputeShare and
Relinquish routines since the quota usage and quota share
accounting in this case is done through regular quota
entries.
------
APAR: IY29867 COMPID: 5765D5100 REL: 340
ABSTRACT: SETUP_SERVER RETURNS 1 WHEN SPNET_ENX EXISTS ALREADY
PROBLEM DESCRIPTION:
setup_server returns 1 when spnet_enx exists already
PROBLEM SUMMARY:
On a CWS, after changing the IP address of an external
ethernet adapter that is not part of the SP LAN,
setup_server exits with a return code of 1. Before failing
though, setup_server displays a message letting the user
know there was a problem defining the associated NIM
spnet_enX resource (where enX is the en number of the
external ethernet with the changed IP address). The
following is an error message put out by mknimint ... it
is these errors which cause setup_server to truly fail:
0042-001 nim: processing error encountered on "master":
0042-032 m_mknet: object name must be unique and
"spnet_enX" already exists
mknimint: 0016-286 The "nim -o define" command had a
problem defining spnet_enX on <master_hostname>
with a return code value of 1.
PROBLEM CONCLUSION:
The new code within /usr/lpp/ssp/bin/mknimint does a check
to ensure that the interface associated with the network
being changed is not referenced by any client other than the
master. If the master is the only machine which contains the
changed network then the new code will blank out the nim
'if' install interface and cabletype in order to remove the
spnet_enX network. Once the network associated with the
external ethernet adapter (not part of the SP LAN) is
removed, the code will drop down into the 'regular' routine
of defining/redefining the network to nim.
------
APAR: IY29914 COMPID: 5765B9500 REL: 150
ABSTRACT: MMSTARTUP -W FAILS WHEN THERE IS A SPACE AFTER THE NODE
PROBLEM DESCRIPTION:
mmstartup -w fails when there is a space after the node
PROBLEM SUMMARY:
GPFS does not handle a blank after the node name in
the node file associated with mmstartup -w.
PROBLEM CONCLUSION:
Remove leading and trailing white space
around hostnames.
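A minimal sketch of this kind of trimming, assuming the node
file holds one host name per line (the file format is not
described in the APAR, and read_node_names is a hypothetical
helper, not GPFS code):

```python
def read_node_names(text):
    """Return host names from node-file text, one per line, skipping blank lines."""
    names = []
    for line in text.splitlines():
        name = line.strip()  # drop leading and trailing white space
        if name:
            names.append(name)
    return names
```

With this, "node1 " and "  node2" are read as "node1" and
"node2", so a trailing blank no longer changes the name.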
------
APAR: IY29929 COMPID: 5765D5100 REL: 340
ABSTRACT: 64BIT:CSS0 DIAGS FAILS WITH TB3PCI
PROBLEM DESCRIPTION:
64bit:css0 diags fails with tb3pci
PROBLEM SUMMARY:
css0 diags may fail with TB3PCI on LPAR node.
The problem was caused when a previously undefined constant
was given a definition in system header files. A conditional
compilation was then reversed.
PROBLEM CONCLUSION:
The solution was to rename the constant so that it is again
undefined.
------
APAR: IY30003 COMPID: 5765B9501 REL: 340
ABSTRACT: EIO ERROR WHILE CREATING FILES
PROBLEM DESCRIPTION:
Since flushFile with FLUSH_FILESIZE_ONLY does not call
flushIndirects, it cannot skip sending the dirty inode to the
metanode.
PROBLEM SUMMARY:
EIO errors being returned on file creation
PROBLEM CONCLUSION:
Since flushFile with FLUSH_FILESIZE_ONLY does not call
flushIndirects, it cannot skip sending the dirty inode to
the metanode
------
APAR: IY30060 COMPID: 5765D5100 REL: 340
ABSTRACT: SWITCH CLOCK NEEDS TO BLOCK SIGNALS DURING INITIALIZATION
PROBLEM DESCRIPTION:
switch clock needs to block signals during initialization
PROBLEM SUMMARY:
The switch clock API listener thread was not blocking
signals, and was taking delivery of signals intended to
signal other threads in the MPI job.
PROBLEM CONCLUSION:
Block signals in the switch clock API listener thread.
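The general POSIX mechanism for this is a per-thread signal
mask. The sketch below is not the switch clock code; it only
illustrates a helper thread blocking signals for itself so
that delivery falls to other threads. listener and
run_listener are hypothetical names, and the example assumes
a Unix system with Python 3.8 or later.

```python
import signal
import threading

def listener(result):
    # Block all blockable signals in this thread only; the mask is per-thread.
    signal.pthread_sigmask(signal.SIG_BLOCK, signal.valid_signals())
    # Query the current mask without changing it (empty set to SIG_BLOCK).
    result["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    # ... listener work would run here ...

def run_listener():
    result = {}
    t = threading.Thread(target=listener, args=(result,))
    t.start()
    t.join()
    return result["mask"]
```

run_listener() returns the set of signals blocked in the
listener thread, which can be inspected to confirm that, for
example, SIGUSR1 is masked there.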
------
APAR: IY30156 COMPID: 5765B9501 REL: 340
ABSTRACT: FSSTRUCT 111 DIRECTORY ERROR
PROBLEM DESCRIPTION:
fsstruct 111 directory error
PROBLEM SUMMARY:
A cached data block of a deleted directory could cause an
FSSTRUCT error on a newly created file.
PROBLEM CONCLUSION:
Before using the new file data block as a directory block,
compare the generation number in gnode with the generation
number of the new file.
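A minimal sketch of such a generation check, assuming a cached
block records the inode number and generation it was read for.
CachedBlock and usable_as_directory_block are hypothetical
names, not GPFS structures.

```python
from dataclasses import dataclass

@dataclass
class CachedBlock:
    inode: int        # inode number the block was cached for
    generation: int   # generation of that inode at caching time
    data: bytes

def usable_as_directory_block(block, inode, generation):
    """Accept a cached block only if both inode number and generation match."""
    return block.inode == inode and block.generation == generation
```

A block cached for a deleted directory with generation 7 is
rejected when the same inode number is reused by a new file
with generation 8.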
------
APAR: IY30205 COMPID: 5765B8100 REL: 230
ABSTRACT: VXML RECORD TAG NOINPUT BLOCK EXECUTES EVEN AFTER DTMF ENTRY
PROBLEM DESCRIPTION:
When using a catch block to catch noinput within a record block,
the catch block will execute even if the caller ends the
recording with a DTMF. The catch block should only execute if
the recording is not stopped by a DTMF.
PROBLEM SUMMARY:
When using a catch block to catch noinput
within a record block, the catch block will execute even
if the caller ends the recording with a DTMF. The catch block
should only execute if the recording is not stopped by a DTMF.
PROBLEM CONCLUSION:
The record logic in DTInChannel was adding a
DTMF terminator key after the input DTMF key, or fake keys
indicating timeout or max length. Due to other changes in the
DTMF processing logic this terminator had become unnecessary
and was thus treated erroneously as input to a non-existent
field causing the raising of a noinput event.
------
APAR: IY30367 COMPID: 5765E4600 REL: 120
ABSTRACT: HANG OF VTT PROCESSES ON AIX 4.3.3
PROBLEM DESCRIPTION:
ViaVoice child hangs with error id 20503
PROBLEM SUMMARY:
Hang of VTT processes on AIX 4.3.3 due to a
problem with the logger process. The logger process hangs when
the /var/vtt/log/current.log file exceeds the size of 1MB.
Subsequently all other ViaVoice telephony processes hang when
trying to write logging messages.
PROBLEM CONCLUSION:
Fixed a deadlock in the logger process
apparent with AIX 4.3.3.
------
APAR: IY30372 COMPID: 5765B8100 REL: 230
ABSTRACT: EXCEPTION FROM VXML WHEN USING SUBMIT TAG.
PROBLEM DESCRIPTION:
The VXMLBrowser generates an Exception under certain conditions
with the use of submit.
PROBLEM SUMMARY:
EXCEPTION FROM VXML WHEN USING SUBMIT TAG
------
APAR: IY30373 COMPID: 5765E5300 REL: 120
ABSTRACT: CORE DUMP OF VTT PROCESSES ON PM EXIT
PROBLEM DESCRIPTION:
The ViaVoice process tsmp will occasionally core when shutting
down ViaVoice.
PROBLEM SUMMARY:
Core dump of VTT processes on pm exit. Symptom
is that a ViaVoice server system is busied out with the
'tsmcon -b all' command. After waiting for all active calls
to be completed and shutting down the system with the command
'pm exit', ViaVoice telephony processes intermittently abend.
PROBLEM CONCLUSION:
Fixed the faulty exit procedures.
------
APAR: IY30647 COMPID: 5765D5100 REL: 340
ABSTRACT: LATEST PSSP 3.4.0 FIXES AS OF APRIL 2002
PROBLEM DESCRIPTION:
This is the latest PSSP ptf as of April 2002.
Order this apar to get all of the ptfs as of April 2002.
PROBLEM SUMMARY:
This is a packaging apar for PSSP 3.4.0 fixes
as of April 2002.
PROBLEM CONCLUSION:
This is a packaging apar for PSSP 3.4.0
fixes as of April 2002.
------
APAR: IY30713 COMPID: 5765B8100 REL: 230
ABSTRACT: INCORRECT PAUSES IN <SAYAS> TAG
PROBLEM DESCRIPTION:
Incorrect pause in <sayas> tag. Telephone number 04 123 07 456
is said as 0, pause, 4, pause, one two three, etc.
PROBLEM SUMMARY:
Incorrect pause in sayas tag
For example 04 123 5678
said as 0, pause, 4, pause, 123, pause, etc.
------