From: AIX Service Mail Server (aixserv austin.ibm.com)
Date: Tue Apr 17 2001 - 02:18:57 CDT
APAR: IY03021 COMPID: 576553700 REL: 210
ABSTRACT: REPSERVER COREDUMPS AFTER RESTART
PROBLEM DESCRIPTION:
Under certain circumstances, the repserver
daemon will core dump after a bos restart.
The core dump is usually when the function
'ReplicaWantsAdvance' is running. Below is
a stack trace from the core dump:
signal.pthread_kill(??, ??) at 0xd02050b4
signal._p_raise(??) at 0xd020466c
raise.raise(??) at 0xd0013bac
abort() at 0xd000d330
osi_Free_r(), line 524 in "osi_misc.c"
ReplicaWantsAdvance(), line 4645 in "rep_init.c"
StartImporting(), line 4877 in "rep_init.c"
RepThread(), line 5946 in "rep_init.c"
pthread._pthread_body(??) at 0xd01fa300
PROBLEM SUMMARY:
The repserver may coredump shortly after it initializes
itself upon startup. This is more likely to occur on
systems managing hundreds of replicated filesets.
PROBLEM CONCLUSION:
Fixed list-management errors in the initialization paths
that, in some cases, referred to memory after it had been
deallocated.
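The class of error described here can be sketched in C. This is a hypothetical illustration, not the repserver source: the `node` type and the `push` and `free_list` functions are assumed names. The point it shows is that the next pointer has to be saved before a node is freed, so the loop never reads memory that has already been deallocated.

```c
#include <stdlib.h>

/* Hypothetical singly linked list node; not taken from rep_init.c. */
struct node {
    int id;
    struct node *next;
};

/* Push a new node on the front of the list. Returns the new head,
 * or the old head unchanged if allocation fails. */
struct node *push(struct node *head, int id)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return head;
    n->id = id;
    n->next = head;
    return n;
}

/* Free every node and return how many were released.
 * The buggy pattern is `free(p); p = p->next;`, which reads p->next
 * from memory that has already been deallocated. */
size_t free_list(struct node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        struct node *next = head->next;  /* save before freeing */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

A caller would build the list with repeated `push` calls and release it with a single `free_list(head)`.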
------
APAR: IY04038 COMPID: 576553700 REL: 210
ABSTRACT: DFS CLIENT CRASH-DFSCOREP.EXT:CM_PAGER+29C
PROBLEM DESCRIPTION:
(08b85f34) 2ff22f78: .procentry+14 (000456a8) 2ff22fb8:
.low+0 (00000000)00000:00000000:00000000:00000106)
Paged out routine
and the registers are:
MSTSAVE AREA AT ADDRESS 0x2ff3b400
...
cr flags: | > | |< | = | | = | = | =v|
...
General Purpose Regs
0:0x000c6000 1:0x2ff22ed8 2:0x08b8b734 3:0x0000000b
4:0x08b8c640 5:0x2ff22f10 6:0x00000004 7:0x00000001
8:0x00000000 9:0x00000001 10:0x00000001 11:0x00000000
12:0x20820220 13:0xdeadbeef 14:0xdeadbeef 15:0xdeadbeef
16:0x08b8c640 17:0x00000000 18:0x08b8bff0 19:0x08b8c078
20:0x08b8c134 21:0x00000000 22:0x08b8bf68 23:0x08b8c1e0
24:0x08b8c12c 25:0x00000000 26:0x00000000 27:0x08d42e58
28:0x0000000b 29:0x08b8c074 30:0x08b8c130 31:0x08b8c13c
The crash was because of a data storage interrupt, and this
would appear to be consistent since we are trying to reference
. 647| 000718 bc 40830018 0 BF cr0=gr6,0
646| 000714 bc 41960034 0 BT
646| 000708 cmpi 2E950000 2 C4 cr5=gr21,0
647| 00070C l 80D50000 1 L4A
for the code:
if (blocked_bp) {
if (ISSET_BP_WOULDBLOCK(blocked_bp)) { /* should not be set */
but in the assembly, cr5 is set on the comparison of gr21
(blocked_bp or buf) to 0, but the branch on this condition is
AFTER the register is used to load from that address!
PROBLEM SUMMARY:
Due to a compiler optimization, a load from address 0 was
attempted while a page fault was being processed.
This resulted in the data storage interrupt and the system
crash. The C code:
if (blocked_bp) {
if (ISSET_BP_WOULDBLOCK(blocked_bp)) { /* should not be set */
was compiled to:
. 647| 000718 bc 40830018 0 BF cr0=gr6,0
646| 000714 bc 41960034 0 BT
646| 000708 cmpi 2E950000 2 C4 cr5=gr21,0
647| 00070C l 80D50000 1 L4A
This shows that the address was being referenced (load at
00070C) after the comparison at 000708 but before the branch
on its result at 000714.
PROBLEM CONCLUSION:
The code was re-written in such a way that the above
optimization would not occur:
if (blocked_bp) {
#ifdef OT /* 54820 */
test_bp = blocked_bp; /* work around compiler bug */
if (ISSET_BP_WOULDBLOCK(test_bp)) {
#else /* OT */
if (ISSET_BP_WOULDBLOCK(blocked_bp)) {
/* should not be set */
#endif /* OT */
------
APAR: IY05652 COMPID: 576560100 REL: 210
ABSTRACT: MAINTAIN_MACHINE_CONTEXT:SEC_LOGIN_GET_EXPIRATION ST=0X171220EB
PROBLEM DESCRIPTION:
DCED looping consumes CPU with error message in log
PROBLEM SUMMARY:
DCED loops and consumes CPU. With debugging turned on in the
/var/dce/svc/routing file:
dhd:*.9:STDERR:-;FILE:/opt/dcelocal/var/svc/dced.log
the log filled up with:
maintain_machine_context:sec_login_get_expiration
st=0x171220eb
PROBLEM CONCLUSION:
In the maintain_machine_context() loop, if
sec_login_get_expiration() fails with any error status, then
it will attempt to refresh the identity; otherwise it will
recreate it.
------
APAR: IY06693 COMPID: 576553400 REL: 210
ABSTRACT: GDA CONFIG PROBLEMS WHEN CDS REPLICAS EXIST
PROBLEM DESCRIPTION:
The problem is a timing issue in mkdce. The mkdce script
calls rgy_edit and pipes input from stdin. The input
creates a principal, creates an account, adds an entry in
a keyfile for the account, and lastly randomizes the keytab
file password and synchs it with the registry password.
The error arises when rgy_edit tries to randomize the
password and synch it with the registry. It gives an error
that the registry object does not exist. This appears to be
a result of the security master not yet digesting the
previous commands completely. The solution was to
separate the creation commands and the keytab commands,
and put a sleep statement between them to give the secd
a breather for a few seconds. This allows the configuration
to complete successfully.
PROBLEM SUMMARY:
Configuring the GDA component of DCE in a cell that contains
security replicas fails.
PROBLEM CONCLUSION:
Modified mkdce so that GDA configuration works correctly.
------
APAR: IY06854 COMPID: 576553700 REL: 210
ABSTRACT: DFS CRASH IN CM_TRUNCATEALLSEGMENTS+1A0
PROBLEM DESCRIPTION:
Customer is experiencing system crashes in a scenario
where a filesystem was filling up, reaching quota
limitations. When a file was truncated back to a
known good length after a failed write or close, the
system would crash with the following stack:
> t -k
STACK TRACE:
LR: . dfscore.ext:cm_TruncateAllSegments +180
2ff3b100: . dfscore.ext:cm_TruncateFile +214 (05aee
2ff3b160: . dfscore.ext:cm_setattr +e0 (05add250)
2ff3b260: . dfscore.ext:naix_ftrunc +78 (05ad494c)
2ff3b320: .vnop_ftrunc+1c (00131840)
2ff3b360: .trunc_common+194 (0014ab54)
2ff3b3c0: .sys_call_ret+0 (00003980)
061d8000: start+11d (00000141)
0000d000: .nodev+0 (0004be7c)
PROBLEM SUMMARY:
An attempt to truncate down the size of a file after it
has encountered a quota error may result in a system
crash with the following stack trace in the dump image:
IAR: . dfscore.ext:cm_TruncateAllSegments +1a0
(05aeba88): teq r14,r14
LR: . dfscore.ext:cm_TruncateAllSegments +180
2ff3b100: . dfscore.ext:cm_TruncateFile +214 (05aee
2ff3b160: . dfscore.ext:cm_setattr +e0 (05add250)
2ff3b260: . dfscore.ext:naix_ftrunc +78 (05ad494c)
2ff3b320: .vnop_ftrunc+1c (00131840)
2ff3b360: .trunc_common+194 (0014ab54)
The probability of this is relatively small, but it is
possible.
PROBLEM CONCLUSION:
Protect against a potential race when creating a new
instance of a file object that may still be active handling
error activity from a previous instance of its use.
------
APAR: IY08501 COMPID: 5765E2820 REL: 430
ABSTRACT: CICSEXPORT DOES NOT COPY CONVERSION TEMPLATES
PROBLEM DESCRIPTION:
The cicsexport command does not copy the conversion templates
for the TD and PD stanzas. The documentation on the command
states that the conversion templates would be copied and
exported. When the output is imported the conversion templates
are not included.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
The cicsexport and cicsimport commands did not copy
conversion templates from one region to another.
PROBLEM CONCLUSION:
The commands are corrected and now copy conversion
templates.
------
APAR: IY08663 COMPID: 576560100 REL: 210
ABSTRACT: CDS CACHE CANNOT HANDLE CDS OBJECTS GREATER THAN 64KB
PROBLEM DESCRIPTION:
CDS cache corruption can occur on nsi lookups
for objects greater than 64Kb in the cds
namespace. The offset table (which contains
16-bit offsets from the beginning of the
data structure) wraps after the 64Kb boundary,
potentially returning garbage information
to the calling APIs. Depending on what
data was stored, the calling APIs may fail
with coredumps or other strange behavior.
PROBLEM SUMMARY:
The CDS server allows storage of very large objects, using
a 32-bit number to indicate the offset to each individual
element of the objects. Since the cds client and RPC
runtime do not request the entire object at once, this
is mapped down to a smaller data set with 16-bit offsets
to the individual elements.
Unfortunately, the cds cache will attempt to cache all of
a very large object. This causes offsets to members that
lie past the 64K boundary from the beginning of the
data set to wrap, corrupting the data set in the cache.
The results of this corruption may vary and often
result in coredumps from dce applications when they
attempt to use the corrupted data returned from the
cache.
If the rpc binding expiration age is set to 0 in DCE
applications, the CDS cache will be avoided and the
problem will not occur.
PROBLEM CONCLUSION:
A check was added in the InsertAtt function in the CDS
cache where data is inserted. If the data being inserted
will cause the data set in the cache to exceed 64K, the
data is not inserted. As a result, nearly 64K of a
large CDS object is inserted in the cache, with
lookups of the remaining portion being forced to go
to the server.
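The wraparound and the added bound check can be sketched in C. This is an illustration under stated assumptions, not the CDS cache code: the `cache_set` layout, the 64-entry offset table, and the `insert_att` signature are hypothetical (the name only echoes the InsertAtt function mentioned above). It shows how a 32-bit offset loses its high bits when stored in 16 bits, and how refusing an insert that would grow the set past 64K keeps every stored offset valid.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SET_MAX 65536u  /* 64K: one past the largest 16-bit offset */

/* Hypothetical cached data set whose element offsets are 16 bits wide. */
struct cache_set {
    uint8_t  data[CACHE_SET_MAX];
    uint16_t offsets[64];   /* start of each element within data[] */
    size_t   count;         /* elements stored so far */
    size_t   used;          /* bytes of data[] in use */
};

/* Storing a 32-bit offset in 16 bits: the failure mode described above. */
uint16_t truncate_offset(uint32_t offset)
{
    return (uint16_t)offset;  /* 0x10004 becomes 0x0004 */
}

/* Insert an element only if the set stays within 64K.
 * Returns 0 on success, -1 if the element must not be cached and the
 * lookup has to go to the server instead. */
int insert_att(struct cache_set *set, const void *elem, size_t len)
{
    if (set->count >= sizeof set->offsets / sizeof set->offsets[0])
        return -1;  /* offset table full */
    if (set->used > UINT16_MAX || len > CACHE_SET_MAX - set->used)
        return -1;  /* would cross the 64K boundary: do not cache */
    set->offsets[set->count++] = (uint16_t)set->used;
    memcpy(set->data + set->used, elem, len);
    set->used += len;
    return 0;
}
```

Rejecting the insert rather than truncating the object matches the behavior described in the conclusion: the cached part stays consistent and the remainder is fetched from the server.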
------
APAR: IY09593 COMPID: 5765E2820 REL: 430
ABSTRACT: SNA LISTENER CANNOT BE CREATED ON AIX THROUGH SMITTY
PROBLEM DESCRIPTION:
Using smitty to create a listener on TXSeries 4.3 the protocol
for SNA or IIOP shows up as SNAIIOP which is in error and
prevents one from building an sna listener.
This apar has been assigned to defect
202368.
LOCAL FIX:
use CICSADD to create the listener
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
When CICS was used on an AIX system, the Add Listener panel in
the System Management Interface Tool (SMIT) did not provide the
proper selections for the protocol type field. Two entries ran
together and were not able to be selected separately.
PROBLEM CONCLUSION:
The entries are separated and can be selected properly.
------
APAR: IY12288 COMPID: 5639I0920 REL: 430
ABSTRACT: IF IN A COMMAND PROMPT EXECUTES A CICSGET -C TDD XXXXYOU WILL G
PROBLEM DESCRIPTION:
- open the graphical administration utility
- create an intrapartition TD Queue, with a triggered
transaction associated to it.
- re-open the definition just created in the previous step
- you will see that the associated triggered transaction is
empty (at least in our two installations of TX 4.3)
If you go to a command prompt and execute a cicsget -c tdd XXXX
command, you will get correct results, but the graphical
administration utility will not show the triggered transaction
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
When CICS was used on Windows systems, the IBM TXSeries
Administration Tool did not display newly added entries, such
as a transient data queue entry. It was necessary to exit from
the utility and reenter to see the new information.
PROBLEM CONCLUSION:
The IBM TXSeries Administration Tool now refreshes the
screen image properly.
------
APAR: IY12451 COMPID: 5765C3407 REL: 210
ABSTRACT: PTF8
PROBLEM DESCRIPTION:
AIX V2R1 NSM PTF8 - Main
------
APAR: IY13076 COMPID: 5765C3403 REL: 430
ABSTRACT: TPC-C PERFORMANCE ENHANCEMENTS FOR S85
PROBLEM DESCRIPTION:
This APAR delivers TPC-C performance enhancements for the S85.
.
This is a packaging APAR only. It will not appear in the list
of APARs on the SMIT "Update Software by Fix (APAR)" panel, nor
will the 'instfix' command show this APAR as being installed
after the updates delivered by this package are installed.
.
To install all updates from this package that apply to installed
filesets on your system, use the command:
.
smit update_all
------
APAR: IY13161 COMPID: 5765E2920 REL: 430
ABSTRACT: UXA1.README V1.11 IN OPT/CICS/SRC/EXAMPLES/UXA1.README FOR SOLAR
PROBLEM DESCRIPTION:
The uxa1.readme v1.11 in opt/cics/src/examples/uxa1.readme on
the last page states that in the cheese demo region environment
file LD_PRELOAD=libdb2.a # static library, but it should be
LD_PRELOAD=libdb2.so # shared object library.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
When CICS was used on a Solaris system, the uxa1_README sample
file had a bad DB2 preload parameter designation. It improperly
reported that the LD_PRELOAD=libdb2.a environment variable
setting was required in the environment file; however, the
correct setting needs to point to libdb2.so.
PROBLEM CONCLUSION:
The uxa1_README sample file now correctly points to the
proper DB2 preload library.
------
APAR: IY13382 COMPID: 5765E2820 REL: 430
ABSTRACT: MISSING FUNCTION FOR MICRO FOCUS COBOL COMPILER DIRECTIVE
PROBLEM DESCRIPTION:
The Micro Focus compiler directive FOLD-CALL-NAME(LOWER) is not
supported with CICS 4.3. Failures indicate that this function
is not loaded when the application is run resulting in abend
A583 message ERZ058139E / 0091 Micro Focus COBOL run-time system
(RTS) is not recoverable by CICS. Defects 203292 & 203304
LOCAL FIX:
EFix for TX 4.3.0.3 (PTF3) is now available
on /afs/tr/prod/cics/Release43/PTF3/e-fixes/AIX/Efix3
Efix will be carried forward to next PTF (PTF4)
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
The cicsmkcobol command failed to support the Merant COBOL
runtime option FOLD_CALL_NAME(lower). The compile process
failed when customers used lower-case wrappers for their COBOL
calls.
PROBLEM CONCLUSION:
TXSeries CICS now includes the extended APIs to handle the
lower-case wrappers for the COBOL calls.
------
APAR: IY13756 COMPID: 5765E2820 REL: 430
ABSTRACT: CSTD PROGRAM STATISTICS ABEND ASRA WITH GREATER THAN 4000
PROBLEM DESCRIPTION:
TXSeries 4.3 - defect 203323
The statistics transaction for programs (CST7) failed with
ABEND ASRA when more than 4000 programs were installed.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
The CSTD transaction abnormally ended with an ASRA abnormal
termination when the internal array limit, which was set to
4000 entries, was reached.
PROBLEM CONCLUSION:
The value for the array limit has been increased, and the
abnormal terminations no longer occur.
------
APAR: IY13762 COMPID: 5765E2920 REL: 430
ABSTRACT: AFTER AN ABEND A27L IN CPMI CICS GETS AN UNSUCCESSFUL CONDITION
PROBLEM DESCRIPTION:
After a msgERZ014016E saying there has been an abenduA27L in
the CPMI transaction there is a msgERZ057002E saying that there
is an unsuccessful condition 'FALSE' for function
ComSU_XPWaitAny on line 2345 of comsu_xpfns.c of module30. This
is followed by msgERZ047015W saying there is a storage
inconsistency in CPMI. The CICS region then issues msgERZ057005E
and fails with abendU5701.
This occurs even with the region SafetyLevel set to normal.
The IBM Transarc Lab defect number is 202391.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
During the processing of the ComSU_WaitForAny function, the
region failed showing a U5701 abnormal termination code. The
problem occurred when the internal send thread picked up a
second instance of a pointer to the intercommunications buffer
before completing the processing on the first pointer
instance.
PROBLEM CONCLUSION:
The process now checks the internal send queue for multiple
pointer instances to the intercommunications buffer and waits
until the processing has completed on the first entry before
starting the second.
------
APAR: IY14063 COMPID: 5765E2920 REL: 430
ABSTRACT: ABEND A583 IN TXSERIES CICS 4.3 ON SUN SOLARIS WHEN USING THE
PROBLEM DESCRIPTION:
When using the MicroFocus COBOL animator to debug a program
with an XCTL there is an abenduA583. The console log shows
messages msgERZ058139E and msgERZ014016E for the abenduA583.
The symrecs show the error path begins with SupOS_ServerExit
and that the COBOL runtime cannot be cleaned up. This occurs
with or without PTF3 applied to the system.
Keywords: ERZ058139E ERZ014016E abendA583 A583
LOCAL FIX:
Efix for defect 203348 from change team fixed the problem.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
When CICS was used on the Solaris platform, the error code A583
was reported when the XCTL Application Programming Interface
(API) command was issued while the Micro Focus Animator utility
was active. Animator's unload function was not called.
PROBLEM CONCLUSION:
An XCTL command now works correctly when used with
Animator.
------
APAR: IY14276 COMPID: 5765D6100 REL: 220
ABSTRACT: EPILOG DOES NOT RUN WHEN STARTER ABNORMALLY TERMINATES DURING
PROBLEM DESCRIPTION:
During a job termination, LL found the switch table couldn't be
unloaded. It sent a switch table error to the startd which
drained the node. However it appears the starter abended before
running the epilog. There is no sign of the epilog being invoked
in the StarterLog. Immediately after the end of this job,
another job started which said "starter is running in recovery
mode." This job also encountered switch table errors, then sent
VACATED status to StartD and also terminated w/o invoking the
epilog.
PROBLEM SUMMARY:
LoadLeveler fails to run the Epilog for jobs that can not
unload the switch table.
PROBLEM CONCLUSION:
The root cause of this problem is that the starter
shutdown time needed to be increased when the system
went from being limited to 4 tasks per node to being
able to get 16 tasks per node. In the process of
determining the root cause, we found that the starter
code could handle recovery better and improved the
logic for this case.
------
APAR: IY15100 COMPID: 5765E2820 REL: 430
ABSTRACT: POST COMMAND PROCESSING RESULTS IN 'CALL LIST TABLE FULL'
PROBLEM DESCRIPTION:
SYMREC indicates:
SYMPTOMS = PIDS/5765E2820 LVLS/430 PTFS/
(#)tasta,
18:01:34, Jun 30 2000 RIDS/TasTA_CallOnExit LINE/109
MS/057003 MSN/4 SRC/11 PRCS/99 AB/A57A PID/86766
TID/1 TIME/001031165402 EST
SECONDARY SYMPTOMS = Call List Table full
Customer has a long running transaction which issues a number of
POSTs and WAITs. It is only ever waiting on one event, yet the
task call list table is filling up and not being decremented.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
During the processing of the POST and WAIT commands, an error
message indicated that the internal call list table was full.
Corresponding to every POST command, CICS added a callback
function to the internal call list table that was intended to
be used in the case of an abnormal failure. The problem
occurred because the process that clears the internal call list
table expected only a single entry, rather than an entry for
every POST command.
PROBLEM CONCLUSION:
The callback function now is added to the internal call list
table only a single time, rather than with each POST entry. The
table no longer overflows with excess entries.
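The register-once behavior can be sketched in C. This is a minimal illustration, not the CICS implementation: the `call_list` structure, its capacity of 8, and the function names are all hypothetical. It shows a registration routine that ignores a callback already present, so repeated POST commands leave exactly one entry in the table.

```c
#include <stddef.h>

#define CALL_LIST_MAX 8  /* hypothetical table capacity */

typedef void (*exit_fn)(void);

/* Hypothetical per-task list of functions to run on abnormal exit. */
struct call_list {
    exit_fn entries[CALL_LIST_MAX];
    size_t  count;
};

/* Add fn unless it is already present.
 * Returns 0 on success or if already registered, -1 if the table is full. */
int call_on_exit_once(struct call_list *list, exit_fn fn)
{
    size_t i;
    for (i = 0; i < list->count; i++)
        if (list->entries[i] == fn)
            return 0;            /* already registered: nothing to add */
    if (list->count == CALL_LIST_MAX)
        return -1;               /* the "Call List Table full" condition */
    list->entries[list->count++] = fn;
    return 0;
}

/* Placeholder cleanup routine used by the POST path in this sketch. */
void post_cleanup(void)
{
}

/* Simulate one POST command, which needs the cleanup callback registered. */
int do_post(struct call_list *list)
{
    return call_on_exit_once(list, post_cleanup);
}
```

With this shape, a long-running transaction can call `do_post` any number of times and the table still holds a single `post_cleanup` entry.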
------
APAR: IY15588 COMPID: 5765D6100 REL: 220
ABSTRACT: MISSING API FUNCTION TO EXTRACT "CONTROL EXPRESSIONS" (START,
PROBLEM DESCRIPTION:
Missing API function to retrieve the "control expressions"
(START, SUSPEND, CONTINUE, VACATE, and KILL) for each machine.
PROBLEM SUMMARY:
The LoadLeveler API needs to support the Machine
Expressions. These are Start, Suspend, Continue,
Vacate, and Kill.
The IBM LoadLeveler for AIX Using and Administering manual
needs to be updated to include the new Specifications that
ll_get_data supports to extract the machine expressions
listed above.
PROBLEM CONCLUSION:
The LoadLeveler API was enhanced to support the following
Machine Expressions: Start, Suspend, Continue, Vacate,
and Kill.
The table of "Specifications for ll_get_data Subroutine" in
Chapter 11 of the LoadLeveler Using and Administering
manual needs to be updated with the following new
Specifications:
LL_MachineStartExpr, LL_MachineSuspendExpr,
LL_MachineContinueExpr, LL_MachineVacateExpr, and
LL_MachineKillExpr.
All Specifications are for the Machine Object and have a
Resulting Data Type of char*. The Description would be a
pointer to a string containing the Machine's
start/suspend/continue/vacate/kill control expression. All
new Specifications will be grouped with the Machine Object
and will then be added in alphabetical order by
specification.
------
APAR: IY15742 COMPID: 5765D5100 REL: 320
ABSTRACT: UNABLE TO START RVSDD DUE TO KLAPI LOCKING
PROBLEM DESCRIPTION:
Customer is unable to start rvsdd due to klapi locking
PROBLEM SUMMARY:
When RVSD tried to use SIGKILL to kill its process group,
a SIGKILL signal was passed to the event threads created
by KHAL. KHAL event handler tried to call the application
(KLAPI) timer handler.
In the dump, it shows that the timer is invoked again and
again endlessly.
PROBLEM CONCLUSION:
When a SIGKILL interrupts the et_wait call and the driver
posts a SIG_EVENT event to KHAL, KHAL will first call
sig_chk() to clean the signal. KHAL will remove the timer
handler invocation and replace it with a call to the KLAPI
error handler with an error code that indicates the
condition CSS_HAL_SIG_EVENT.
In addition, RVSD will attempt to kill its process group
with a SIGTERM instead of SIGKILL. This should prevent KHAL
from getting an RVSD-generated SIGKILL.
------
APAR: IY15759 COMPID: 5765D5100 REL: 320
ABSTRACT: MISSING OR CORRUPT HWEVENTS CAUSES SPLOGD TO RESPAWN
PROBLEM DESCRIPTION:
If the /spdata/sys1/spmon/hwevents file is corrupted or missing
then the splogd will not stay running. It will respawn and fill
up errpt with SRC and splogd connected to hardmon.
A better error message is needed as well as the prevention of
filling up errpt.
LOCAL FIX:
fix hwevents file.
PROBLEM SUMMARY:
If splogd fails early enough, SRC will detect it and stop
retrying to start it after 3 attempts. It was determined
that some time-consuming processing was occurring in splogd
before the hwevents file was being processed. If splogd
exited because of a problem with the hwevents file, it was
too late for SRC to detect it.
PROBLEM CONCLUSION:
There was no reason the time-consuming processing needed to
happen before the hwevents file was processed. The ordering
has been changed. Now, if splogd fails because of a
problem with the hwevents file, it will only retry twice
before quitting (3 attempts total). Some additional error
logging and debug log messages were also added to make it
easier to detect and fix a problem with the hwevents file.
------
APAR: IY15789 COMPID: 5765D6100 REL: 220
ABSTRACT: NEED SYNTAX CHECK FOR LOADL_CONFIG ACCOUNTING OPTIONS
PROBLEM DESCRIPTION:
LoadLeveler runs without indicating that an invalid option
(of the 4 available) was entered in the Accounting section:
ACCT = _________ . When the syntax is invalid, accounting
does not run as the user may have expected.
PROBLEM SUMMARY:
If the accounting ACCT value is invalid, no error messages
were reported in the LoadLeveler logs.
PROBLEM CONCLUSION:
If the accounting ACCT value is incorrect,
an error message will be generated in the LoadLeveler
logs.
For example, if the ACCT value A_ON was misspelled in the
LoadL_config file:
ACCT = A_O
LoadL_config ERROR: LoadL Config File has an
invalid ACCT value of A_O.
Accounting parameters might not be set as intended.
NOTE: If A_ON is misspelled, then accounting
would have the default setting of A_OFF.
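A syntax check of this kind can be sketched in C. This is not LoadLeveler code: the function names are hypothetical, and the keyword list is an assumption. Only A_ON and A_OFF appear in the text above; A_VALIDATE and A_DETAIL are assumed to be the other two of the four options mentioned. The error text reuses the message quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Keywords assumed valid for ACCT. Only A_ON and A_OFF appear in the
 * APAR text; the other two are an assumption of this sketch. */
static const char *const acct_keywords[] = {
    "A_ON", "A_OFF", "A_VALIDATE", "A_DETAIL"
};

/* Return 1 if token is a known ACCT keyword, 0 otherwise. */
int acct_keyword_ok(const char *token)
{
    size_t i;
    for (i = 0; i < sizeof acct_keywords / sizeof acct_keywords[0]; i++)
        if (strcmp(token, acct_keywords[i]) == 0)
            return 1;
    return 0;
}

/* Check a whitespace-separated ACCT value. Returns the number of
 * invalid keywords found and reports each one on stderr. */
int check_acct_value(const char *value)
{
    char buf[256];
    char *token;
    int bad = 0;

    /* Work on a copy because strtok modifies its argument. */
    strncpy(buf, value, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (token = strtok(buf, " \t"); token != NULL;
         token = strtok(NULL, " \t")) {
        if (!acct_keyword_ok(token)) {
            fprintf(stderr, "LoadL_config ERROR: LoadL Config File has an "
                            "invalid ACCT value of %s.\n", token);
            bad++;
        }
    }
    return bad;
}
```

Calling `check_acct_value("A_O")` reports the misspelled keyword and returns 1, while a valid value returns 0.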
------
APAR: IY15871 COMPID: 5765E2820 REL: 430
ABSTRACT: CST7 DID NOT FUNCTION PROPERLY WHEN MORE THAN 4000 PROGRAMS WERE
PROBLEM DESCRIPTION:
In program statistics of CSTD, if we set 'Y' in 'Inactive progra
there are not any programs. When we have the setting set to "Y"
the display is totally blank. If we set 'N' in 'Inactive progr
it's O.K. (it's less than 4000). In TXSeries 4.2.0.8, it's the same.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
When the number of entries for the Display inactive files
attribute exceeded the value set in the internal array for the
CSTD (collect statistics) transaction, a blank screen was
displayed.
PROBLEM CONCLUSION:
The command no longer displays a blank screen if the array
is exceeded.
------
APAR: IY15904 COMPID: 5765D5100 REL: 320
ABSTRACT: HA.VSD OUTPUT A MESSAGE 2506-112 BACKLEVEL NO
PROBLEM DESCRIPTION:
When starting rvsd using ha_vsd on a system with all nodes
at vsd.rvsd.rvsdd 3.2.0.4 level, ha.vsd will output :
ha.vsd: 2506-112 Wed Dec 6 10:20:39 EST 2000 RVSD can not start.
Backlevel nodes were detected. See the rvsdrestrict command.
This is exactly the same as POK Defect 69791.
LOCAL FIX:
A bypass is to set the RVSD Restrict level using:
dsh /usr/lpp/csd/bin/rvsdrestrict -s RVSD3.2.0.4
dsh /usr/lpp/csd/bin/ha_vsd rese
PROBLEM SUMMARY:
Effective with fileset vsd.rvsd.rvsdd 3.2.0.4, it is
required that the RVSD_Restrict_Level class be set. If it
has not been set, ha.vsd will output:
2506-112 Wed Dec 6 10:20:39 EST 2000 RVSD can not start.
Backlevel nodes were detected. See the rvsdrestrict command.
This requirement needs to be relayed to the customer.
PROBLEM CONCLUSION:
During the installation of service to vsd.rvsd.rvsdd,
if it is determined that the RVSD_Restrict_Level class has
not been set, the class will be set to RVSD3.2.
In addition the following messages will be displayed:
The RVSD_Restrict_Level class in the SDR is not set.
Effective with fileset vsd.rvsd.rvsdd 3.2.0.4,
it is required that the RVSD_Restrict_Level class be set.
The RVSD_Restrict_Level class will be set to RVSD3.2
Please verify that this is your current RVSD runlevel.
If it is not, use /usr/lpp/csd/bin/rvsdrestrict to reset it.
To support VSD fencing for node numbers greater than 1999
up to the supported 2048, all nodes must be at fileset
vsd.rvsd.rvsdd 3.2.0.4 or above. Once all the nodes are at
or above the correct level then you must:
1) Stop RVSD/VSD on all nodes by issuing "ha.vsd stop".
2) Issue "rvsdrestrict -s RVSD3.2.0.4". This will
delete the SDR VSD_Fence class and will cause future
fencing information to be stored in an SDR file.
3) Restart RVSD/VSD on all the nodes by issuing ha_vsd.
The rvsdrestrict command has also been changed. If the
RVSD_Restrict_Level class is set to something other than
3.2.0.4, the VSD_Fence class will be recreated, if it does
not exist.
------
APAR: IY16117 COMPID: 5765D5100 REL: 320
ABSTRACT: PSSPFB_SCRIPT INCORRECTLY CONFIGURES FIBER TYPE ADAPTERS
PROBLEM DESCRIPTION:
The script /usr/lpp/ssp/install/bin/psspfb_script does not have
ANY logic constructs for type = "fiber". The code tests for
types "bnc", "dix", and "tp" ethernets, and "fiber" drops thru
this logic and gets the $media_speed taken from the previous
adapter. For example, with this customer, ent0 was configured first
at media_speed of "10_Half_Duplex". The next adapter, ent1, was
the fiber gigabit ethernet adapter. With type = "fiber" it fell
past all the logic to select media_speed and duplex or auto,
then on line 714 (ssp 3.2.0):
/usr/sbin/chdev -P -l ent1 -a media_speed=10_Half_Duplex
where ent1 received the previous media_speed of ent0. Given
the current situation, the configuration of the fiber ethernet
is at the mercy of whatever configuration its predecessor has.
LOCAL FIX:
The following workaround is inserted just BEFORE the chdev -P
line (referenced above) of psspfb_script, which is line 714 in
PSSP 3.2.0 and line 592 in PSSP 3.1.1:
if [ "$ci_enet_type" = "fiber" ]
then media_speed=$AUTONEGOTIATION
fi
$chdev -P -l ent$dev_num -a media_speed=$media_speed #line 714
PROBLEM SUMMARY:
psspfb_script is incorrectly configuring the media_speed
of fiber ethernet adapters. Instead of setting the
media_speed to Auto_Negotiation, which is the only valid
setting, it is set to the media_speed of the previously
processed ethernet adapter.
PROBLEM CONCLUSION:
psspfb_script has been modified to check if an
adapter has a type of fiber, and if so, the
media_speed of the adapter is set to Auto_Negotiation.
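The corrected selection rule can be sketched in C (the real psspfb_script is a shell script, so this is only a language-neutral illustration). The function and parameter names are hypothetical. Only the fiber rule comes from the text above. How the speed is chosen for the other adapter types is left to the caller.

```c
#include <string.h>

/* Pick the media_speed for an Ethernet adapter.
 * type:      adapter type string, e.g. "bnc", "dix", "tp" or "fiber"
 * requested: speed chosen by the existing logic for non-fiber types
 * The parameter names are hypothetical; only the fiber rule comes from
 * the APAR text. */
const char *select_media_speed(const char *type, const char *requested)
{
    if (strcmp(type, "fiber") == 0)
        return "Auto_Negotiation";  /* the only valid setting for fiber */
    return requested;
}
```

Because the result is computed per adapter, a fiber adapter no longer inherits whatever speed was left over from the adapter processed before it.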
------
APAR: IY16121 COMPID: 5765D5100 REL: 320
ABSTRACT: CUSTOMIZATION W/O REBOOT RESETS NON-BIS DEFAULT ROUTES TO BIS
PROBLEM DESCRIPTION:
First set the default internet route (inet0) to a host that is
not the Boot Install Server of that node. Next, set the node
to customize from the CWS. Finally, log into the node and run
pssp_script. Do NOT reboot or power off/on the node. If you
reexamine the default internet route of the node with the
"netstat -nr" command, you will see it has been changed to the
Boot Install Server of that node.
LOCAL FIX:
Create /etc/rc.local and put the default route correction into
this script. After customization, dsh -w <node> /etc/rc.local
to restore the non-BIS default route. Verify with the command
dsh -w <node> "netstat -nr".
PROBLEM SUMMARY:
The documentation in the Installation Guide and the
Command and Technical Reference for spethernt, on how to set
the default route on a node did not take into account the
customization path.
PROBLEM CONCLUSION:
The section on setting the default route for a node in the
Installation Guide and in the spethernt section of the
Command and Technical Reference have been updated to state
the following:
For FDDI, token ring, or other Ethernet adapters, create
the route in firstboot.cust. For the route to remain set
after customization, also set the route up in /etc/inittab
after the line that runs rc.sp. For the switch, set the
route up in /etc/inittab after the line that runs rc.switch.
------
APAR: IY16196 COMPID: 5765D5100 REL: 320
ABSTRACT: KLAPI FAILURE ON RETRANSMIT
PROBLEM DESCRIPTION:
KLAPI causing node crash due to retransmit errors
PROBLEM SUMMARY:
KLAPI ASSERTs in various ways and causes nodes to crash.
PROBLEM CONCLUSION:
KLAPI was receiving misdirected packets. These packets are
now logged and dropped.
------
APAR: IY16279 COMPID: 5765D6100 REL: 220
ABSTRACT: CRXLF90 GIVES: LD: 0711-317 ERROR: UNDEFINED SYMBOL: AIO_READ64
PROBLEM DESCRIPTION:
crxlf90 gives:
ld: 0711-317 ERROR: Undefined symbol: aio_read64
ld: 0711-317 ERROR: Undefined symbol: aio_write64
ld: 0711-317 ERROR: Undefined symbol: aio_suspend64
ld: 0711-317 ERROR: Undefined symbol: aio_cancel64
ld: 0711-317 ERROR: Undefined symbol: .einfo
LOCAL FIX:
at line 88 of the script crxlf90
add -l xlf90 before the \
PROBLEM SUMMARY:
The crxlf90 compilation script is missing a reference
to the xlf90 library.
PROBLEM CONCLUSION:
-lxlf90 can be added to the script so that customers do
not have to include this change as part of the
customization required for this script to run.
------
APAR: IY16295 COMPID: 5765D5100 REL: 320
ABSTRACT: DOCS UNCLEAR ON INITIAL_HOSTNAME AND NAME RESOLUTION
PROBLEM DESCRIPTION:
The docs need to be clarified in the description of how to
use long or short names in reference to the name resolution.
It should be clearly stated that the initial_hostname
should match the format of the host resolution on the CWS,
i.e. if you use short, host <ip> should return the short name.
Also, the smit help screen on 'smit hostname_dialog' should
be extended in the same way.
PROBLEM SUMMARY:
In the Installation Guide, the documentation on using short
host names instead of long host names when configuring the
initial host names for nodes is not detailed enough.
The documentation and the help for the smit panel for
Hostname Information should be improved.
PROBLEM CONCLUSION:
The following information will be added to the Installation
Guide sections on Configuring Initial Host Names for Nodes:
When determining whether you wish the nodes' host name to
be of the long or short form, you should be consistent with
the host name resolution on the control workstation. If
the host command returns the short form of a host name, you
should choose to use the short form for the node's initial
host name.
The help for the "Use Short or Long Hostnames" field on
the smit panel for Hostname Information has also been
updated. It states that when indicating whether the node's
hostnames should be of the long or short form, you must
be consistent with the host name resolution on the
control workstation.
------
APAR: IY16368 COMPID: 5765D5100 REL: 320
ABSTRACT: SSP.BASIC POST_U MAY RUN SDRGETOBJECTS BEFORE SDR IS READY
PROBLEM DESCRIPTION:
In the ssp.basic post_u script, it does an SDRGetObjects Frame
to get all the frame numbers to add the SPbgAdm line for each
frame to hmacls. Just prior to this the sdr is recycled. It
waits for SRC to report sdr as "active" before it moves on.
But the SDR may still not yet accept connections/queries at this
point. Some other logic needs to be used to determine if the sdr
is ready to accept connections/queries.
LOCAL FIX:
Add the line(s) to /spdata/sys1/spmon/hmacls manually. They
should look like:
# root.SPbgAdm vsm
where # is the frame number. There should be 1 line for each
frame.
PROBLEM SUMMARY:
The ssp.basic.post_u script gets run as part of the PTF
install process after the files have been installed. If
the sdr daemon has been modified, it will automatically
recycle the sdr. It was waiting for lssrc to report the
sdr as active before it proceeded. But it turns out the sdr
does some internal processing and may not yet be ready to
accept query requests (i.e. SDRGetObjects) when lssrc
reports it as 'active'. In this case, a subsequent sdr
query in the ssp.basic.post_u script failed, which caused
the hmacls file to not get updated with the SPbgAdm entries
as was intended.
PROBLEM CONCLUSION:
The script was modified to wait for the SDR to start
accepting queries before it moves on.
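The change above amounts to replacing a "subsystem reports active" check
with a "query actually succeeds" check. The actual post_u logic is not shown
in this APAR; the following is only a minimal sketch in Python, assuming a
hypothetical probe callable that returns True once a trivial query (for
example an SDRGetObjects call) succeeds:

```python
import time


def wait_until_ready(probe, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or timeout seconds elapse.

    probe is a hypothetical stand-in for a trivial query against the
    service; it is an assumption, not the actual post_u logic.
    """
    deadline = clock() + timeout
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # treat a refused connection as "not ready yet"
        if clock() >= deadline:
            return False
        sleep(interval)
```

The caller proceeds with its real queries only when the function returns
True, and reports an error instead of silently skipping the update when it
returns False.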
------
APAR: IY16468 COMPID: 5765E2820 REL: 430
ABSTRACT: CICS/AIX TERMINATES ABNORMALLY WITH ABEND U5701. ERZ057002E/0014
PROBLEM DESCRIPTION:
ERZ057002E/0014 : CICS internal error: Unsuccessful condition
'( (SUPPR_MUTEX_TYPE_VALUE(Mutex) == SUPPR_MUTEXTYPE_CRUCIAL)
|| (SUPPR_MUTEX_TYPE_VALUE(Mutex) == SUPPR_MUTEXTYPE_NONCRUCIAL)
|| (SUPPR_MUTEX_TYPE_VALUE(Mutex) == SUPPR_MUTEXTYPE_NAMED))'
for function 'SupPR_MutexTerminate' on
line 523 in file '/cics/ServLvl/src/sup/pr/src/suppr_mutx.c'
of module 56
ERZ057005E/0014 : CICS internal error: Abnormally
terminating region with exit code 'U5701'
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
An unexpected exit occurred when the region attempted to
perform a forced purge of a CICS application server (cicsas)
after the terminal was deleted. This was caused by an attempt
to free a mutex that had already been freed.
PROBLEM CONCLUSION:
The flow of the cleanup process has been adjusted to
ensure that no attempt is made to free the mutex twice.
------
APAR: IY16469 COMPID: 5765E2820 REL: 430
ABSTRACT: CONCO_ATTACH DOES NOT NEED TO ABEND WITH A U1002 WHEN TRYING TO
PROBLEM DESCRIPTION:
SYMPTOMS = RIDS/ConCO_Attach LINE/-1 MS/010023 MSN/1 SRC/11 PRCS
AB/U1002 PID/60770 TID/1
SECONDARY SYMPTOMS = PostMortem (Error Path is offset x'1f0' in
ConCO_Attach<SupPR_MemAttachUpcall<SupPR_ShareLockLockU<TasPR_Ca
llApplication<TasPR_RunProgram<TasPR_IRun<ComCL_CallDFHCCINX<
ComCL_CCINUninstall<ComCL_CCINBackEnd<ComCL_CCIN<TasTA_Exec<Tas
TA_Run<main) logging where error occurred
SYMPTOMS = conco, RIDS/ConCO_Attach LINE/328 MS/010023 MSN/1 SRC
/11 PRCS/22 AB/U1002 PID/60770 TID/1
SECONDARY SYMPTOMS = Unable to attach region pool ID 8203 at
RegionPoolBase address a0000000, rc=22
Receiving U1002 abend in ConCO_Attach with message ERZ010023E
when trying to re-attach shared memory.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
Receiving the error ERZ010023E Cannot attach shared memory
incorrectly caused the region to terminate with the U1002
abnormal termination code.
PROBLEM CONCLUSION:
------
APAR: IY16470 COMPID: 5765E2820 REL: 430
ABSTRACT: CICS/UNIX DYNAMIC TRANSACTION ROUTING WITH BLANK PROGNAME
PROBLEM DESCRIPTION:
An ATI transaction which is defined as dynamically transaction
routed whose local transaction definition has a blank ProgName
attribute will fail in TasTA_Exec, being mistaken for a CICS
privileged transaction. This is contrary to the docs, which state
that the ProgName attribute may indeed be blank for dynamically
transaction routed transaction definitions.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
Dynamic Transaction Routing (DTR) requests failed when the
local transaction had a blank program name definition. The
program name definition is an optional variable; therefore, the
DTR request needed to be able to handle a blank program name.
PROBLEM CONCLUSION:
Local transactions having a blank program name are
correctly handled by DTR requests now.
------
APAR: IY16471 COMPID: 5765E2820 REL: 430
ABSTRACT: CICSCTL HANGS UP OR DUMPS CORE IF LFLAGS IS LONGER THAN
PROBLEM DESCRIPTION:
If LFLAGS contains a libpath 1000 or 2000 characters long,
cicsctl hangs or dumps a SIGSEGV core file with the error:
RIDS/SupOS_SignalHandler LINE/194
MS/058916 MSN/1 SRC/11 PRCS/0 AB/ PID/44470 TID/0
SECONDARY SYMPTOMS = Unexpected signal 11 caught
ERZ058916E/0001: Unexpected signal 11 caught
make: 1254-004 The error code from the last command is 99.
Stop.
LOCAL FIX:
None. Will be fixed in ptfset 8.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
The error message ERZ058916E/0001: Unexpected signal 11 caught
was reported when the cicsctl command was run with more than
255 LDFLAGS settings.
PROBLEM CONCLUSION:
The problem was resolved by setting the value for the allowed
number of LDFLAGS to match the value given for the
POSIX_MAX_LINE attribute. Also, a more descriptive error
message is displayed when that value is exceeded.
KEYWORDS: PTF 4 CICS 4.3.0.4, defect # 202670
------
APAR: IY16472 COMPID: 5765E2820 REL: 430
ABSTRACT: DEFECT 202813 PROBLEM WITH TASK TERMINATION DURING FORCE
PROBLEM DESCRIPTION:
We have a problem with task termination during force purge of a
conversational task. In the problem scenario thread 1 is waiting
in TerEP_RConverse (rpc call) waiting for terminal input. The
ERT runs task termination on behalf of a Force Purge request.
Thread 1 is thread tid certified. So the ERT's tran abort does
commit or abort. We recommend decertification of thread 1 prior
to rpc calls.
Defect 202813
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
A FORCE PURGE operation performed on a conversational
task caused the region to hang indefinitely.
PROBLEM CONCLUSION:
This problem has been resolved.
------
APAR: IY16473 COMPID: 5765E2820 REL: 430
ABSTRACT: CICS/UNIX RPC ECI SYNCHRONOUS CALLS FAIL WITH ECI_TIMEOUT NEVER
PROBLEM DESCRIPTION:
A CICS/AIX ECI application that makes synchronous calls
times out (which is ok) after exceeding the eci_timeout parm
coded in the application. CICS never sets the LUW_STATE to
free for the timed-out request. If more than the CICSECIMAXLUW
number of requests time out (default = 16), then any new
ECI requests from the program will fail with ECI_ERR_NO_SESSIONS.
The defect is 202822.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
External Call Interface (ECI) clients that failed to attach to
a region were timing out with the message ECI_ERR_NO_SESSIONS.
This error occurred because the value for the CICSECIMAXLUW
environment variable had been reached. The value for the
CICSECIMAXLUW variable was being exceeded because the logical
unit of work (LUW) state was not set correctly to the
LUW_Timedout attribute when a client timed out.
PROBLEM CONCLUSION:
The LUW state is set correctly when the client times out, and
the CICSECIMAXLUW limit is not reached unless a critical
problem prevents the LUWs from being freed. In that case, it is
proper to receive the error message indicating no available
sessions.
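The effect of the fix can be pictured as a fixed pool of LUW slots in which
a timed-out request gives its slot back. The sketch below is a toy model in
Python, not the CICS implementation; the class and method names are
hypothetical, and only the default of 16 and the ECI_ERR_NO_SESSIONS name
come from the text above.

```python
class LuwTable:
    """Toy model of a fixed pool of logical-unit-of-work (LUW) slots."""

    def __init__(self, max_luw=16):  # 16 is the default quoted above
        self.max_luw = max_luw
        self.active = set()

    def begin(self, luw_id):
        """Claim a slot, failing when the pool is exhausted."""
        if len(self.active) >= self.max_luw:
            raise RuntimeError("ECI_ERR_NO_SESSIONS")
        self.active.add(luw_id)

    def timed_out(self, luw_id):
        """Release the slot of a request that timed out."""
        self.active.discard(luw_id)
```

Without the release in timed_out, every timeout would permanently consume a
slot, and the pool would fill after max_luw timeouts.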
------
APAR: IY16474 COMPID: 5765E2820 REL: 430
ABSTRACT: CICS APPLICATION SERVERS TERMINATING AFTER RECEIVING SIGNAL 30
PROBLEM DESCRIPTION:
After the application of 5.0.4.1 of the Communication Server
product, signal 30 (SIGUSR1) is sent to all SNA connections.
Because of changes made to CICS to provide support for JAVA we
no longer provide any signal handling function for this (added
at base level 4.2). Prior to that level, Communication Server
would only send the signal 30 to the activity processes. The
result is that when the signals are received CICS will
abnormally terminate the Application Server. The problem is
that since this is sent to all connected AS processes and not
just the active ones, all of the region's AS processes
terminate. This does not cause region termination and data
integrity is not a problem. As a side effect, if the region
uses DB2, the abnormal termination does not clean up the
connections and it is possible to exceed the connection limit
for the database.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
If CICS received a signal 30 response from SNA after opening a
SNA device, all of the CICS application servers abnormally
ended and all processing incorrectly shut down.
PROBLEM CONCLUSION:
The signal 30 response does not report an actual error signal,
and it can be ignored. CICS now suppresses the issuing of the
signal 30 and properly handles SNA processing.
------
APAR: IY16475 COMPID: 5765E2820 REL: 430
ABSTRACT: THE MAXTASKCPU NOT WORKING ON THE MIRROR TRANSACTION.
PROBLEM DESCRIPTION:
The MaxTaskCpu attribute on the transaction definition has no
effect for the mirror transaction. It can take effect if a
non-mirror transaction with the MaxTaskCpu attribute is executed first.
The CICS trace shows the function Tas_TA_CPUCheck is called for
non-mirror transaction but it isn't for mirror transaction.
This error is already determined as code defect through defect#
203047.
LOCAL FIX:
May be bypassed by executing a non-mirror transaction with
MaxTaskCpu through the PLT.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
The Region Definition attribute MaxTaskCPU was not in effect
for any transaction running under the mirror transaction CPMI.
As a result, the transaction exceeded the time allowed for CPU
utilization.
PROBLEM CONCLUSION:
The CPMI transaction now uses the value given in the
MaxTaskCPU attribute to limit the execution of any transaction
running under the mirror transaction.
------
APAR: IY16476 COMPID: 5765E2820 REL: 430
ABSTRACT: FMH43 HEADER CONTAINS A BYTE SETTING THAT WE REJECT AS AN ERROR
PROBLEM DESCRIPTION:
At mainframe CICS level 1.3 the FMH43 header was changed
to use a previously undefined hex "01" in the 5th byte of the
header for remote schedule task. The setting is to be meaningful
to the backend CRSR. In TX series we do not accept that
byte setting for a remote schedule and we put out an error
and end the transaction.
In module comrs_xfm2.c at line 254 we are trying to do a
switch command for case COMRS_FMH_SCHED where we are
expecting a setting of hex '00', but in this case we get the '01'
and fall through to the default failure. We should accept the
COMRS_FMH_TERMID as a valid setting for a remote scheduled
transaction and allow the process to continue.
This is the same as defect 203238 as written by Bob Kiska
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
In the mainframe version of CICS 1.3, the function management
header (FMH43) was changed to use a previously undefined value
in the fifth byte of a remote schedule task header. TXSeries
CICS did not accept this byte setting for a remote schedule
task, and ended a transaction whenever that byte setting was
encountered.
PROBLEM CONCLUSION:
TXSeries CICS now accepts the new FMH43 setting.
------
APAR: IY16477 COMPID: 5765E2820 REL: 430
ABSTRACT: ERZ042011E NETNAME DOES NOT EXIST IN THE WD RECEIVED IN EVENT
PROBLEM DESCRIPTION:
ERZ042011E Netname does not exist in the WD is received in the
NT event log and CSMT for all terminals autoinstalled with
unique termids and netnames, and no_netname_check is specified.
The customer knows they do not exist in the WD, has checking
turned off, and still sees these in the logs for every install.
He wishes the messages to be suppressed in this case.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
TXSeries CICS incorrectly reported the error ERZ042011E Netname
does not exist in the Terminal Definitions (WD). This message
occurred when a logon request that included only the netname
was submitted for use by an autoinstall process. If the netname
and the device type are included with the logon request,
suppression of the error message is expected.
PROBLEM CONCLUSION:
The autoinstall process now suppresses the error message when
the device type and netname are provided by the logon process.
------
APAR: IY16478 COMPID: 5765E2820 REL: 430
ABSTRACT: TXSERIES 4.2.0.7 A57A ABENDS - SYMRECS SHOWS ERROR CLOSING
PROBLEM DESCRIPTION:
Console would show transactions abending A57A. symrecs shows
Error Closing Handle - 6 out of either SupOS_MutexAccessHandle
or SupOS_MutexDropHandle. Sometimes the CICS region would come
down; sometimes just the application server cicsas would
terminate. defect 203336
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
CICS processes and regions sometimes ended unexpectedly and the
error message Error closing handle -6 was displayed. The fatal
error occurred because CICS attempted to use an invalid handle
for cleaning up the mutex.
PROBLEM CONCLUSION:
Mutex processing is changed to check on the validity of the
handle. If the handle is invalid, then the mutex is freed with
a default handle.
------
APAR: IY16479 COMPID: 5765E2820 REL: 430
ABSTRACT: FEPI CICSLU.EXE DOESN'T CHECK RETURN CODES PROPERLY
PROBLEM DESCRIPTION:
This apar has been opened against defect 203337.
There is a memory problem in the FEPI CICSLU.exe process.
The process was not checking the return code properly and when
the connection was not available it was not releasing the
allocated memory as it should.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
The cicslu.exe process leaked memory when the Front-End
Programming Interface (FEPI) LU0 connection was unavailable.
The problem occurred because the schedule work call was not
checking the return code on FEPI calls; as a result, memory was
not deallocated for calls that failed.
PROBLEM CONCLUSION:
The schedule work call now checks the return code on all
FEPI calls; if a call fails, memory is deallocated correctly.
------
APAR: IY16480 COMPID: 5765E2820 REL: 430
ABSTRACT: WHEN THE MAINFRAME GOES DOWN AND CICS LOSES IT'S ABILITY TO COMM
PROBLEM DESCRIPTION:
When the mainframe goes down and CICS NT loses its ability to
communicate to it via SNA, an exc_e_illaddr is received followed
by a U1001 for the transaction. The application server then
terminates. symrecs shows the stack as
ComSU_CloseConv<ComTR_Relay<ComTR_RelayMain<TasTA_Exec<ComCR_CRT
EBackEnd<ComCR_CRTE<TasTA_Exec<TasTA_Run
Defect 137429
New defect 203354 was opened for the problem. Defect 137429
pertains to another platform.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
When Systems Network Architecture (SNA) was used to connect to
a mainframe, CICS had problems if network errors caused the SNA
sessions to be dropped. When CICS attempted to reallocate the
sessions, an exc_e_illaddr exception, indicating an illegal
address, was displayed and the region was shut down.
PROBLEM CONCLUSION:
The correct address is now used on the reallocation process.
------
APAR: IY16484 COMPID: 5765D5101 REL: 120
ABSTRACT: TOPSVCS/GRPSVCS LOGS CREATED WITH WORLD WRITABLE
PROBLEM DESCRIPTION:
Some HACMP generated files created with world writeable
permissions
drwxr-xrwx /var/ha/run/ nmdiag.en1 nmdiag.0.1.0
-rw-rw-rw- root system /var/ha/log/topsvcs.02.11.llunit
-rw-rw-rw- /sbin/cluster/server.status "S7A llup_resource"
LOCAL FIX:
Change file permissions to 600.
This does not help for newly created logs.
PROBLEM SUMMARY:
The topsvcs/grpsvcs logs had world-writable permissions.
PROBLEM CONCLUSION:
The topsvcs and grpsvcs logs now have a permission
of 644.
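For reference, mode 644 means read/write for the owner and read-only for
group and others. The daemons' own code is not shown in this APAR; the
following is a minimal Python sketch of one way to create a log file with
that mode regardless of the process umask. The function name open_log is
hypothetical.

```python
import os


def open_log(path, mode=0o644):
    """Open a log file for appending and force its permissions to mode.

    The mode passed to os.open() is filtered by the process umask, so
    os.fchmod() is applied afterwards to set the exact permissions. Note
    that this also resets the permissions of an already existing file.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, mode)
    os.fchmod(fd, mode)
    return os.fdopen(fd, "a")
```

Setting the mode explicitly at creation time covers newly created logs,
which a one-time manual chmod does not.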
------
APAR: IY16486 COMPID: 5765D5100 REL: 320
ABSTRACT: SPDELFRAM FAILS TO DELETE INFORMATION
PROBLEM DESCRIPTION:
spdelfram fails to delete frame information because spdelnode
passes a non-zero return code.
LOCAL FIX:
The work around for this problem is to issue spdelfram a second
time.
PROBLEM SUMMARY:
When issuing spdelfram to delete a frame that has just been
created, spdelnode fails because it is trying to remove
node information that has not been entered yet.
Because the nodes have not yet been configured, spdelnode
issues error messages when it attempts to delete the NIM
resources for the node as well as the Adapter class
information for the node in the SDR. As a result, the
spdelfram command fails.
PROBLEM CONCLUSION:
spdelnode was modified to verify that a node has NIM
resources prior to attempting to remove them. It will also
verify that Adapter information exists in the SDR for a
node prior to attempting to delete it. As a result,
spdelnode will no longer issue error messages when invoked
for a node that has not been configured. This will allow
spdelfram to delete a frame that has just been created.
------
APAR: IY16492 COMPID: 5765D5100 REL: 320
ABSTRACT: ADAPTER.LOG CA_DECODE_IBITS TERMINATED DUE TO NO ACTIVE BITS
PROBLEM DESCRIPTION:
Message in /var/adm/SPlogs/css0/adapter.log
ca_decode_ibits() terminated due to no active error bits found
PROBLEM SUMMARY:
Some conditions that the SP Switch2 adapter recognizes but
which are not significant were being reported as errors.
PROBLEM CONCLUSION:
Handling of various error bits was modified to prevent
irrelevant or nonexistent conditions from being reported
as errors.
------
APAR: IY16523 COMPID: 5765E2820 REL: 430
ABSTRACT: CICS TERMINALS FROM A MVS SYSTEM DO NOT PASS THROUGH THE AIX
PROBLEM DESCRIPTION:
ISC connections over SNA can not handle duplicate terminal ids
on shipped terminals. The duplicate terminal id condition occurs
when a mainframe region acts as a gateway for many other
mainframe regions, and the CICS AIX region acts as the AOR to
the gateway region. Duplicate terminal ids come in when two of
the mainframe TORs generate the same terminal id and these are
passed on by the gateway to the AIX AOR. The gateway region
is not affected by the duplicate terminal ids because it assigns
alias names to terminals coming in through an ISC connection.
The real terminal ids are passed to the next region when
transactions are shipped to the next region, in this case the
AIX region. Distributed CICS does not assign alias names to
shipped terminals, which causes the problem.
A fix is needed to alias the terminal ids shipped in, or minimally
check to see if a shipped terminal is a duplicate and then change
it if it is.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
When a CICS region on an AIX machine acted as the Application
Owning Region (AOR) to the gateway region, it was possible for
duplicate terminal IDs to occur. When two of the Terminal
Owning Regions (TORs) generated the same terminal ID, the
gateway passed the duplicate IDs to the AOR on the AIX machine.
The gateway region was not affected by the duplicate terminal
IDs because it assigned alias names to terminals coming in
through an Intersystem communication (ISC) connection. The
original terminal IDs were passed when the transactions were
shipped to the next region, in this case the AIX AOR region.
TXSeries CICS did not assign alias names to shipped terminals.
PROBLEM CONCLUSION:
The code is changed to check the application terminal ID
against terminal IDs already in use. Duplicate application
terminal IDs are signed on with a numeric extension to ensure a
unique terminal ID for that application terminal ID.
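The check described above can be sketched as follows. This is an
illustration in Python only: the function name is hypothetical, and the
format of the numeric extension (a counter appended to the ID) is an
assumption, since the APAR does not specify it.

```python
def unique_terminal_id(term_id, in_use):
    """Return term_id, or term_id plus a numeric extension if it is taken.

    in_use is the set of terminal IDs already signed on; the chosen ID
    is added to it. Appending 1, 2, ... is an assumed naming scheme.
    """
    if term_id not in in_use:
        in_use.add(term_id)
        return term_id
    n = 1
    while f"{term_id}{n}" in in_use:
        n += 1
    candidate = f"{term_id}{n}"
    in_use.add(candidate)
    return candidate
```

With this scheme, two TORs that both ship a terminal with the same ID end
up with distinct IDs in the AOR.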
------
APAR: IY16545 COMPID: 5765B9501 REL: 320
ABSTRACT: FILE SYSTEM CANNOT BE REBALANCED
PROBLEM DESCRIPTION:
file system cannot be rebalanced
PROBLEM SUMMARY:
File System cannot be rebalanced
PROBLEM CONCLUSION:
ReuseBitmap caused a bad endMask to be generated for the last
word of the map when the number of bits was exactly a multiple
of 32. Since the disk allocation code used this function
extensively, if there was a multiple of 32 disks in the
filesystem strange things happened.
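The ReuseBitmap source is not shown in this APAR. The Python sketch below
illustrates one common way this class of bug arises, assuming 32-bit words
with valid bits counted from the low-order end; the function names are
hypothetical.

```python
WORD_BITS = 32
FULL_WORD = 0xFFFFFFFF


def end_mask_naive(nbits):
    """Mask of valid bits in the last word; wrong when nbits % 32 == 0."""
    rem = nbits % WORD_BITS
    return (1 << rem) - 1  # rem == 0 yields 0, i.e. no valid bits


def end_mask(nbits):
    """Mask of valid bits in the last word, handling a full last word."""
    rem = nbits % WORD_BITS
    return FULL_WORD if rem == 0 else (1 << rem) - 1
```

For a map of 64 bits, the naive version produces an empty mask for the last
word, so all 32 entries in that word are ignored, while the corrected
version treats the whole word as valid.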
------
APAR: IY16546 COMPID: 5765B9501 REL: 330
ABSTRACT: FILE SYSTEM CANNOT BE REBALANCED
PROBLEM DESCRIPTION:
file system cannot be rebalanced
PROBLEM SUMMARY:
File System cannot be rebalanced
PROBLEM CONCLUSION:
ReuseBitmap caused a bad endMask to be generated for the last
word of the map when the number of bits was exactly a multiple
of 32. Since the disk allocation code used this function
extensively, if there was a multiple of 32 disks in the
filesystem strange things happened.
------
APAR: IY16632 COMPID: 5765D5101 REL: 120
ABSTRACT: THE NEW DOMAIN TERMINATED THEIR HAGS BECAUSE THE PREVIOUS
PROBLEM DESCRIPTION:
The new domain terminated their hags because the previous
nameserver came back and they think it is the better domain.
This should be the sequence of events that should occur when
cws has a network problem and it is the nameserver.
hags terminated on all the nodes because of the inconsistency
of the node events in the hats layer. hats on the cws never knew
that it was detected as down by other nodes, and it still thought
it was the NS since it did not get an update from hats about the
nodes' status.
LOCAL FIX:
Re-establish the Domain by not node0
1>Kill subsystem on CWS hagsctrl -k, hatsctrl -k
2>Start subsystem on CWS hatsctrl -s, hagsctrl -s
PROBLEM SUMMARY:
If the Topology Services daemon is temporarily blocked,
or if a node's network adapter suffers a temporary glitch,
then this could result in a scenario where all other
nodes see this node as dead, while this node never sees
the remaining nodes die.
One case where this scenario could have adverse effects
occurs when the node in question is the Control Workstation
(which does not have a switch adapter) and the Group
Services's "Name Server" is located at that node.
In this case it is possible for this "node
event inconsistency" to result in the Group Services
daemon terminating in all the nodes, due to "domain
merging" or "non-stale proclaims".
There are several variants of the problem, according to
the location of the "Name Server" or Topology Services's
Group Leaders. In some instances, the Group Services
daemon terminates with a "Group services daemon received
a non-stale proclaim message" or "Group Services daemon
exit to re-join the domain" message in the AIX error log.
PROBLEM CONCLUSION:
The possibility of node event inconsistency caused
by daemon blockage or adapter/network glitch has been
eliminated with changes to the Topology Services
daemon. In case the Topology Services daemon is blocked
for a long period, the remote nodes will consider this
node dead, as before. With the changed code, this node
will also consider all the remote nodes as dead. This
will result in the Group Services daemon terminating
on this node (with a "merged domain"), but will avoid the
scenario where the Group Services daemon terminates
in a large number of nodes.
Eliminating the node event inconsistency also helps
prevent certain scenarios where the Group Services
domain becomes hung.
------
APAR: IY16741 COMPID: 5765B8100 REL: 220
ABSTRACT: DTBE DOES NOT PERFORM ECHO CANCELLATION CALIBRATION
PROBLEM DESCRIPTION:
The DTBE does not perform any Echo Cancellation calibration.
The CHP is not setting SV231 correctly, so calibration is never
performed.
PROBLEM SUMMARY:
DTBE DOES NOT PERFORM ECHO CANCELLATION
CALIBRATION
PROBLEM CONCLUSION:
changed variable echo_converge type from int
to short
------
APAR: IY16796 COMPID: 5765E2820 REL: 430
ABSTRACT: PROVIDE COMPATIBILITY FOR FMH CHANGES IN CTS2.1 TO TX SERIES
PROBLEM DESCRIPTION:
CTS 2.1 has made changes to the format of the FMH data used to
communicate with TX Series. TX Series must change the parsing
logic for the FMH data to maintain compatibility between the two
products.
TX Series defect 203397 will provide the changes necessary to
communicate with the mainframe CTS 2.1 system.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
TXSeries CICS needed modifications to accommodate changes made
in the mainframe product, CICS Transaction Server (CTS) 2.1.
PROBLEM CONCLUSION:
The parsing logic of the Function Management Handler (FMH)
process in TXSeries CICS was changed to adapt to changes made
in CTS 2.1.
------
APAR: IY16828 COMPID: 5765B9501 REL: 320
ABSTRACT: REVOKE HANDLER WAS RETURNING WITHOUT SENDING A REPLY MESSAGE.
PROBLEM DESCRIPTION:
revoke handler was returning without sending a reply message
when it was unable to find the file in the hash table.
PROBLEM SUMMARY:
Two GPFS nodes deadlocked
PROBLEM CONCLUSION:
Corrected a timing error in lock revoke if the file had
been removed from a hash table
------
APAR: IY16838 COMPID: 5765B9500 REL: 130
ABSTRACT: SPBGADM NOT ADDED TO /ETC/SYSCTL.MMCMD.ACL ON MIGRATION
PROBLEM DESCRIPTION:
As of GPFS 140 an entry for the root.SPbgAdm principal is
required in the /etc/sysctl.mmcmd.acl file. This entry gets
added on a new GPFS install but doesn't appear to get added on
a migration. It does appear in /usr/lpp/mmfs/sysctl.mmcmd.acl
file but nowhere does it tell you to add it to the active
sysctl.mmcmd.acl file. It either needs to get added
automatically or the migration instructions should tell you
how to add it manually.
LOCAL FIX:
Add the following line to the /etc/sysctl.mmcmd.acl file
_PRINCIPAL root.SPbgAdm
PROBLEM SUMMARY:
During migration from GPFS 1.2 to 1.4, the SPbgAdm entry was
not getting added to the mmfscmd.acl file, which is required
at this level of the code.
PROBLEM CONCLUSION:
Update the acl file in the post install script if needed.
------
APAR: IY16868 COMPID: 5765D5100 REL: 320
ABSTRACT: DIAG -C -D CSS0 BRINGS UP PROBLEM DETERMINATION SCREEN
PROBLEM DESCRIPTION:
Executing diag -c -d css0 should not take the user to ELA
screens; it appears the css0 diagnostic method does not use the
-c flag anymore.
I think we want to check DA_CONSOLE_TRUE before calling ela_run.
da mode bits when running diags -c:
DA_CONSOLE_FALSE 0x00080000
da mode bits when running diags without the -c flag:
DA_CONSOLE_TRUE 0x00040000
PROBLEM SUMMARY:
When running diags -c -d css0, the diag method was calling
diagrpt without first checking for a console causing the
user to be prompted.
PROBLEM CONCLUSION:
The diag method first checks for a console before running
diagrpt.
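The console check can be expressed as a simple bit test on the da mode
bits. The sketch below is in Python for illustration only; the two flag
values are the ones quoted in the problem description, while the function
name and the way the mode word is obtained are assumptions.

```python
DA_CONSOLE_TRUE = 0x00040000   # value quoted in the problem description
DA_CONSOLE_FALSE = 0x00080000  # value quoted in the problem description


def console_available(damode):
    """Return True only when the mode bits indicate a console is present."""
    return bool(damode & DA_CONSOLE_TRUE)
```

A diagnostic method would run its interactive reporting step only when
console_available() is true, so that a run with the -c flag never prompts
the user.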
------
APAR: IY16890 COMPID: 5765D6100 REL: 220
ABSTRACT: AFTER RESTARTING LL,(LLSTATUS -R) OUTPUT IS NOT SHOWING THE
PROBLEM DESCRIPTION:
After restarting LL 2.2 (possibly other releases) and releasing
one of the jobs, we found that llstatus -R was not reflecting
the correct consumable resources. For example, it was showing
16 CPUs available when 4 of them were already consumed by the
running job.
PROBLEM SUMMARY:
When jobs using consumable resources were running in
LoadLeveler and LoadLeveler was restarted, those jobs
continued to run but were no longer counted as using any
consumable resources. Therefore, llstatus -R did not show
any consumable resources used.
PROBLEM CONCLUSION:
When jobs using consumable resources are running in
LoadLeveler and LoadLeveler is restarted, the consumable
resources of those jobs are now saved. Therefore,
llstatus -R now shows the consumable resources used.
------
APAR: IY16940 COMPID: 5765D5100 REL: 320
ABSTRACT: PANIC: KLAPIEXT:_RETRANSMIT_PKT
PROBLEM DESCRIPTION:
While running a heavy I/O workload on the Colony 128way
system (single single), a node panic'd with klapi in its
dump.
IAR: .panic_trap+0 (00012568): tweq r1,r1
LR: . klapiext:_retransmit_pkt +e0 (0165139c)
2ff22d18: . klapiext:_handle_tmr_pop +288 (01663c0c)
PROBLEM SUMMARY:
Recursively calling a function caused retransmit_pkt to not
work correctly.
PROBLEM CONCLUSION:
Removed the function call to prevent recursive calling.
------
APAR: IY16959 COMPID: 5765E2820 REL: 430
ABSTRACT: ABENDA57A IN SUPOS_MUTEXWAITLOCK BECAUSE COMSU_XPSENDLOCK PASSED
PROBLEM DESCRIPTION:
AbendA57A occurred in SupOS_MutexWaitLock after receiving
numerous abend A147 due to ECI_TIMEOUT on the Universal
Client.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
The region ended abnormally and reported an A57A error code
during the processing of an intercommunication control element
(ICE) chain. The process was attempting to place a mutex lock
on an ICE entry which had already been freed.
PROBLEM CONCLUSION:
The processing logic now determines if the ICE entry has been
freed prior to attempting a mutex lock. The lock is not placed
if the entry has been freed. The abnormal termination no longer
occurs.
------
APAR: IY16981 COMPID: 5765D5100 REL: 320
ABSTRACT: ADD -P FLAG TO PTPECTRL USAGE STATEMENT
PROBLEM DESCRIPTION:
A new flag (-p Set Poll Count) was added to the ptpectrl command.
While the -p flag is documented in the ptpectrl man page, it
does not appear in the usage statement that is displayed when
you run ptpectrl -h. Customers would not know this feature is
available to them.
PROBLEM SUMMARY:
The -p flag (Set Poll Count) is documented in the ptpectrl
man page, but does not appear in the usage statement that
is displayed when you run ptpectrl -h.
PROBLEM CONCLUSION:
The -p flag (set poll count) will be displayed when you
run "ptpectrl -h", or when you enter an invalid flag,
for example "ptpectrl -a -x".
------
APAR: IY16986 COMPID: 5765D5100 REL: 320
ABSTRACT: NGCREATE (NGNEW) NO LONGER ACCEPTS A PERIOD IN THE NODE GROUP NA
PROBLEM DESCRIPTION:
Trying to create a node group with a period in the name yields
2530-004 could not create Node group name xxxx
2517-493 xxxx is an invalid node group name
The problem can be recreated by doing a: ngnew -G group.name
It looks like some changes that were done to
SpNodeGroup::ValidName in SpNodeGroup.C to accommodate NLS
character sets caused this - internal defect 46068 file versions
1.47 and 1.48
LOCAL FIX:
Do not use periods in node group names
PROBLEM SUMMARY:
ngnew is no longer allowing the name of a node group to
contain a ".". Attempts to create a node group with a name
containing a "." result in the following messages:
ngnew: 2530-004 Could not create Node Group xx.x.
2517-493 'xx.x' is an invalid node group name.
The function SpNodeGroup::ValidName was not allowing "."
as a valid character in a node group name.
PROBLEM CONCLUSION:
The function SpNodeGroup::ValidName was modified to
allow "." as a valid character in a node group name.
As a result the ngnew command will now allow "."
as a valid character in a node group name.
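The source of SpNodeGroup::ValidName is not shown here, and the full set of
characters it accepts is not given in this APAR. As an illustration only,
a validator that stays locale-aware while still allowing "." could look
like the Python sketch below; the function name and the exact character
rules (letters, digits, "_" and ".") are assumptions.

```python
def valid_node_group_name(name):
    """Accept non-empty names made of letters, digits, '_' and '.'.

    str.isalnum() is Unicode-aware, standing in for the NLS handling
    mentioned above; the real character rules are not given in the APAR.
    """
    return bool(name) and all(ch.isalnum() or ch in "_." for ch in name)
```

Under these assumed rules, a name such as group.name passes, while an
empty name or a name containing a space is rejected.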
------
APAR: IY17008 COMPID: 5765D6100 REL: 220
ABSTRACT: NON-FORWARDABLE CREDS NOT IDENTIFIED W/ NEW AUTHENT PAIR
PROBLEM DESCRIPTION:
non-forwardable creds not identified w/ new authent pair
PROBLEM SUMMARY:
Using the lldelegate/llimpersonate DCE_AUTHENTICATION_PAIR,
jobs submitted by a user without forwardable creds
(no -f option on dce_login) go to the hold state. A user can
release the jobs from hold by executing 'llhold -r', however
jobs will run without dce credentials. No notification is
given to the user as to why the job goes to hold. Obscure
messages are written to the SchedLog, but they are far from
clear as to the reason, and require an administrator to become
involved in a simple user error.
Also, the lldelegate/llimpersonate DCE_AUTHENTICATION_PAIR
requires that LoadLeveler be run with DCE_ENABLEMENT=TRUE.
Since this pair of programs runs with root suid permission,
it is not necessary for all of LoadLeveler to run with
DCE_ENABLEMENT=TRUE if all that is required is to delegate
DCE credentials.
PROBLEM CONCLUSION:
Changes are made to the SP Security Services library
(libspsec.a) and the DCE library (libdce.a) to enable
LoadLeveler to detect when a user has non-forwardable
credentials when submitting jobs. Normally, LoadLeveler expects
to delegate forwardable credentials along with a job.
Non-forwardable credentials are not fully functional and some
user applications may not work properly with them. LoadLeveler
is changed so that when a user, submitting a job, has
non-forwardable credentials, LoadLeveler will display a warning
message, and proceed to delegate the non-forwardable credentials
with the job.
Also, LoadLeveler is changed so that the
lldelegate/llimpersonate DCE_AUTHENTICATION_PAIR may be
specified without requiring that LoadLeveler be run with
DCE_ENABLEMENT=TRUE.
------
APAR: IY17010 COMPID: 5765D5100 REL: 320
ABSTRACT: NON-FORWARDABLE CREDS NOT IDENTIFIED W/ NEW AUTHENT PAIR
PROBLEM DESCRIPTION:
non-forwardable creds not identified w/ new authent pair
PROBLEM SUMMARY:
Using the lldelegate/llimpersonate DCE_AUTHENTICATION_PAIR,
jobs submitted by a user without forwardable creds
(no -f option on dce_login) go to the hold state. A user can
release the jobs from hold by executing 'llhold -r'; however,
the jobs will run without DCE credentials. No notification is
given to the user as to why the job goes to hold. Obscure
messages are written to the SchedLog, but they are far from
clear as to the reason, and require an administrator to become
involved in a simple user error.
Also, the lldelegate/llimpersonate DCE_AUTHENTICATION_PAIR
requires that LoadLeveler be run with DCE_ENABLEMENT=TRUE.
Since this pair of programs runs with root suid permission, it
is not necessary for all of LoadLeveler to run with
DCE_ENABLEMENT=TRUE if all that is required is to delegate DCE
credentials.
PROBLEM CONCLUSION:
Changes are made to the SP Security Services library
(libspsec.a) and the DCE library (libdce.a) to enable
LoadLeveler to detect when a user has non-forwardable
credentials when submitting jobs. Normally, LoadLeveler expects
to delegate forwardable credentials along with a job.
Non-forwardable credentials are not fully functional and some
user applications may not work properly with them. LoadLeveler
is changed so that when a user submitting a job has
non-forwardable credentials, LoadLeveler will display a warning
message and proceed to delegate the non-forwardable credentials
with the job.
Also, LoadLeveler is changed so that the
lldelegate/llimpersonate DCE_AUTHENTICATION_PAIR may be
specified without requiring that LoadLeveler be run with
DCE_ENABLEMENT=TRUE.
------
APAR: IY17027 COMPID: 5765D5100 REL: 320
ABSTRACT: REFERENCES TO HAL IN PIPE2INIT.C SHOULD BE REMOVED
PROBLEM DESCRIPTION:
references to hal in pipe2init.c should be removed
PROBLEM SUMMARY:
The word "HAL" should not be exposed in any MPI error messages.
PROBLEM CONCLUSION:
A change was made to ensure that if an error message is
generated by HAL, the customer sees "INTERNAL ERROR CODE"
instead of "HAL ERROR CODE".
------
APAR: IY17028 COMPID: 5765D5100 REL: 320
ABSTRACT: MODIFY MPCI_STATS_T FOR BINARY COMPATABILITY
PROBLEM DESCRIPTION:
modify mpci_stats_t for binary compatibility
PROBLEM SUMMARY:
This fix restores binary compatibility with PSSP 3.1.1.
PROBLEM CONCLUSION:
The PSSP 3.2 version of MPCI statistics (used only by Informix)
broke binary compatibility with PSSP 3.1.1.
------
APAR: IY17030 COMPID: 5765B9501 REL: 320
ABSTRACT: SELF DEADLOCK IF BUFFER STEAL IS LAST RELEASER
PROBLEM DESCRIPTION:
self deadlock if buffer steal is last releaser
PROBLEM SUMMARY:
A timing error, caught in development, might result in a deadlock.
PROBLEM CONCLUSION:
Correct locking error.
------
APAR: IY17034 COMPID: 5765B9501 REL: 330
ABSTRACT: SELF DEADLOCK IF BUFFER STEAL IS LAST RELEASER
PROBLEM DESCRIPTION:
self deadlock if buffer steal is last releaser
PROBLEM SUMMARY:
A timing error, caught in development, might result in a deadlock.
PROBLEM CONCLUSION:
Correct locking error.
------
APAR: IY17051 COMPID: 5765B9500 REL: 130
ABSTRACT: FIX POST INSTALL SCRIPTS
PROBLEM DESCRIPTION:
fix post install scripts
PROBLEM SUMMARY:
minor correction to post install script
PROBLEM CONCLUSION:
minor correction to post install script
------
APAR: IY17085 COMPID: 5765D6100 REL: 220
ABSTRACT: LL COMPATABILITY WITH FUTURE RELEASES OF AIX
PROBLEM DESCRIPTION:
LL compatibility with future releases of AIX
PROBLEM SUMMARY:
LoadLeveler code is needed for compatibility with a
future release of AIX.
PROBLEM CONCLUSION:
LoadLeveler code was added for compatibility with a
future release of AIX.
------
APAR: IY17100 COMPID: 5765B9501 REL: 330
ABSTRACT: REVOKE HANDLER WAS RETURNING WITHOUT SENDING A REPLY MESSAGE.
PROBLEM DESCRIPTION:
revoke handler was returning without sending a reply message
when it was unable to find the file in the hash table.
PROBLEM SUMMARY:
Two GPFS nodes deadlocked
PROBLEM CONCLUSION:
Corrected a timing error in lock revoke if the file had
been removed from a hash table
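The failure mode above (a handler that returns without replying when a lookup misses) can be sketched as follows. This is only an illustration, not GPFS code: `RevokeReply`, `handle_revoke`, the `table` dictionary, and the status strings are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RevokeReply:
    request_id: int
    status: str  # "revoked" or "not_found" (hypothetical status values)


def handle_revoke(table: dict, request_id: int, file_id: str) -> RevokeReply:
    """Return a reply on every path so the requester is never left waiting."""
    entry = table.get(file_id)
    if entry is None:
        # The file is no longer in the table; still answer the requester.
        return RevokeReply(request_id, "not_found")
    entry["lock_held"] = False  # give up the lock named in the request
    return RevokeReply(request_id, "revoked")
```

The point of the sketch is that the "not found" branch produces a reply just like the normal branch, so a node waiting on the revoke can always make progress.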
------
APAR: IY17105 COMPID: 5765D5100 REL: 320
ABSTRACT: ADD -A FLAG TO TO UNFENCEVSD WHICH WILL UNFENCE ALL FENCED VSDS
PROBLEM DESCRIPTION:
Due to the syntax of the unfencevsd command, you must
explicitly list all of the VSDs you want to unfence on
the command line:
unfencevsd -v vsd_name_list -n node_list
It is difficult to construct an unfencevsd command that lists
all the VSDs on a node when there is a large number of VSDs.
Add a -a flag to unfencevsd which will unfence all fenced VSDs
on the specified node(s).
LOCAL FIX:
An -a flag will be added to the unfencevsd command:
unfencevsd -a -n node_list
which will unfence all fenced VSDs on the specified
node(s).
PROBLEM SUMMARY:
There are failure cases where all vsds on a node
are left in a fenced state. Recovery from this
state is very cumbersome, due to the syntax
of the unfencevsd command where you must list
all of the vsds that you want to unfence.
PROBLEM CONCLUSION:
The fencevsd and unfencevsd commands add a "-a" option to
fence or unfence all VSDs from a particular node:
fencevsd {-a|-v vsd_name_list} -n node_list
unfencevsd {-a|-v vsd_name_list} {-n node_list -f | -r}
------
APAR: IY17106 COMPID: 5765D5100 REL: 320
ABSTRACT: GPFS FENCING IS SLOW
PROBLEM DESCRIPTION:
gpfs fencing is slow
PROBLEM SUMMARY:
GPFS/VSD fencing is taking an excessive amount of time.
PROBLEM CONCLUSION:
The FlushIO script is used on the servers to flush any
outstanding IO in the last phase of fencing. Several
performance enhancements have been made; such as calling
lower level routines and forking more work in parallel.
------
APAR: IY17140 COMPID: 5765D5100 REL: 320
ABSTRACT: WHEN RUNNING SETUP.SERVER NTP_CONFIG FAILS
PROBLEM DESCRIPTION:
when running setup.server ntp_config fails
PROBLEM SUMMARY:
If the /etc/ntp.conf configuration file is absent when
ntp_script is run, sysout shows "0403-057 Syntax error"
pointing to compose 15. That, in turn, leads to a
"0016-042 Problem found" message.
The /etc/ntp.conf file that is created is, however, correct.
It is (and should be) a copy of the sample configuration
file, "/usr/lpp/ssp/config/ntp.conf.base".
PROBLEM CONCLUSION:
A variable name in the ntp_config script should have been
declared integer. The problem only shows up when the /etc
directory is lacking an ntp.conf file. It has been
corrected.
------
APAR: IY17152 COMPID: 5765D5100 REL: 320
ABSTRACT: PSSPFP_SCRIPT USES MKDIR WITH INVALID SYNTAX, SO THAT
PROBLEM DESCRIPTION:
Incorrect syntax in psspfb_script:
$mkdir -p 755 $K4FILES 2>/dev/null
If the umask for the root user is not equal to 022, the above
command will create /spdata/sys1/k4srvtabs with wrong permissions.
Correct syntax:
$mkdir -p -m 755 $K4FILES 2>/dev/null
LOCAL FIX:
Change mkdir statement in psspfb_script using correct syntax
(see error description).
PROBLEM SUMMARY:
During a node's installation or customization, if the
directory /spdata/sys1/k4srvtabs does not exist, it will
be created. It should be created with permissions of 755,
but if the umask on the system is not 022, the permissions
of the directory may not be 755. psspfb_script had the
incorrect syntax for the mkdir of /spdata/sys1/k4srvtabs.
PROBLEM CONCLUSION:
Modified psspfb_script, so that during a node's installation
or customization, if the directory /spdata/sys1/k4srvtabs
does not exist, it will be created with permissions of 755,
regardless of the umask setting on the node.
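The umask interaction described above can be reproduced outside the shell. The following is a minimal sketch, assuming a POSIX system; `make_dir_with_mode` and `demo_umask_independent` are hypothetical helper names, and the temporary directory stands in for /spdata/sys1. It sets the permission bits explicitly after creation, which has the same effect on the final directory as `mkdir -m 755`.

```python
import os
import stat
import tempfile


def make_dir_with_mode(path: str, mode: int = 0o755) -> int:
    """Create path (and any parents) and force its permission bits to mode."""
    os.makedirs(path, exist_ok=True)
    # Directory creation is filtered through the process umask, so set the
    # bits explicitly afterwards instead of relying on the umask being 022.
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


def demo_umask_independent() -> int:
    """Create a directory under a restrictive umask and report its mode."""
    old_umask = os.umask(0o077)  # deliberately not 022
    try:
        with tempfile.TemporaryDirectory() as base:
            return make_dir_with_mode(os.path.join(base, "k4srvtabs"))
    finally:
        os.umask(old_umask)  # restore the caller's umask
```

Even with the umask set to 077, the directory ends up with mode 755 because the mode is applied explicitly rather than derived from the umask.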
------
APAR: IY17169 COMPID: 5765D5100 REL: 320
ABSTRACT: THE DEFAULT IN THE SPADAPTRS SHOULD BE -N
PROBLEM DESCRIPTION:
The default in the spadaptrs should be -n
PROBLEM SUMMARY:
Effective with APAR IY15889, available in ssp.basic 3.2.0.8,
the default for the -n flag of spadaptrs has changed to
"no" on systems with an SP_Switch2 switch. The smit
panel for adding additional adapter database information
should also be updated to show "no" for the -n flag
on systems with an SP_Switch2 switch.
PROBLEM CONCLUSION:
The smit panel for adding additional adapter database
information has been updated to show "no" for the
-n flag on systems with an SP_Switch2 switch.
------
APAR: IY17187 COMPID: 5639I0920 REL: 430
ABSTRACT: CICSLU GROWTH
PROBLEM DESCRIPTION:
The cicslu process grows in a short period of time to 100 MB
and crashes the system when using FEPI LU0 calls.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
The Front End Processing Interface (FEPI) start process did not
release and detach threads when it finished using them. This
caused a memory leak in the CICS logical Unit (CICSLU) process
when FEPI LU0 calls were made, and resulted in the abnormal
termination of the region.
PROBLEM CONCLUSION:
The FEPI start process now releases and detaches the threads
which are no longer in use. The region no longer terminates
abnormally.
------
APAR: IY17190 COMPID: 5765E2820 REL: 430
ABSTRACT: CCIN CAUSING HANG WHEN CICS ATTEMPTS TO UNINSTALL
PROBLEM DESCRIPTION:
Merrill reported a cicsip abnormal termination. On further
investigation, it was concluded that the problem was with the
hard install of the terminal. The client was trying to hard
install a terminal with a netname that was already in use. This
caused the hard install to fail, and CICS tried to do an
autoinstall because devType was also specified. During
autoinstall, the conflict was again detected. In order to make
sure that the old entry was not an orphan, autoinstall tried to
ping the client associated with the old entry (which happens to
be the same as the new entry) and did not receive a reply. As a
result, autoinstall decided to delete all the terminals that
were associated with the client (which included the new entry).
It succeeded in deleting the old entries but got stuck at the
new entry, as it was in use. It tried 12 times and then force
purged the task that was associated with the new entry, i.e.
itself. We believe that this could have caused the AS to go down
while holding a mutex, and as a result the region hung later.
We should not try to ping the client if the client associated
with the new entry is the same as the one associated with the
old.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
When the CICS Communications Definition Install (CCIN)
transaction attempted to logon using a terminal ID whose
netname was already in use, the reuse of the netname caused the
hard install of the terminal to fail. The CCIN transaction
continued with the autoinstall process using the existing ID
while it attempted to ping the client to ensure that the
duplicate ID was not an orphan process. The client never
replied to the ping because it was busy attempting to install
the new terminal ID. When the CCIN process failed to receive a
reply to its ping, it called a processing loop that attempted
to shut down all the terminal entries for the client. When the
loop reached the duplicate terminal ID, it failed to complete
because the ID was in use. As a result, the autoinstall process
never completed.
PROBLEM CONCLUSION:
When it is determined that the new terminal install ID and the
old terminal install ID are from the same client, the ping is
not issued to the client. The call to the processing loop that
attempts to shut down all terminal entries for the client is
avoided, and the process is able to complete.
------
APAR: IY17195 COMPID: 5765E2820 REL: 430
ABSTRACT: U5701 ABEND IN COMSU_XPOPEN
PROBLEM DESCRIPTION:
Abend U5701 is being raised in the ComSU_XPOpen process. This is
caused by a duplicate entry on the internal processing queue.
The queue is then processed and the first entry is handled
correctly, but when the second entry is attempted the internal
sequence number is wrong and the abend occurs.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
A U5701 abnormal termination error occurred when the
ComSU_XPOpen function attempted to free an intercommunication
control block (ICB) entry that was on the send queue in a
wait-sleep state. During the wait-sleep state, the thread
processing the send queue returned the ICB entry to the free
list. When the ComSU_ClearAndFreeICB function woke, it also
freed the first ICB entry, which caused duplicate IDs for the
same ICB entry.
PROBLEM CONCLUSION:
The code has been changed so that the send thread ignores the
ICB entry when it has a ComSU_ClearAndFreeICB function waiting
for it.
------
APAR: IY17202 COMPID: 5765D5100 REL: 320
ABSTRACT: ADAPTREVC FSD CUSTOMIZABLE SHELL NEEDS CLASS TYPE
PROBLEM DESCRIPTION:
adaptrevc fsd customizable shell needs class type
PROBLEM SUMMARY:
The /usr/lpp/ssp/adapt_recv_debug shell script may be used
by development to help debug adapter errors caused by css
software. The adapter id and error classification are needed
to assist in future debug efforts.
PROBLEM CONCLUSION:
Pass the error classification of the adapter error and the
adapter id to the customizable shell script
/usr/lpp/ssp/adapt_recov_debug. Used for debug purposes
only.
------
APAR: IY17207 COMPID: 5765B9501 REL: 320
ABSTRACT: DEADLOCK IN RENAME ON MULTIPLE NODES
PROBLEM DESCRIPTION:
Deadlock in rename on multiple nodes
PROBLEM SUMMARY:
GPFS deadlock when running multiple renames.
PROBLEM CONCLUSION:
Corrected a GPFS locking error in rename.
------
APAR: IY17223 COMPID: 5765B9500 REL: 140
ABSTRACT: SPBGADM NOT ADDED TO /ETC/SYSCTL.MMCMD.ACL ON MIGRATION
PROBLEM DESCRIPTION:
As of GPFS 140 an entry for the root.SPbgAdm principal is
required in the /etc/sysctl.mmcmd.acl file. This entry gets
added on a new GPFS install but doesn't appear to get added on
a migration. It does appear in /usr/lpp/mmfs/sysctl.mmcmd.acl
file but nowhere does it tell you to add it to the active
sysctl.mmcmd.acl file. It either needs to get added
automatically or the migration instructions should tell you
how to add it manually.
LOCAL FIX:
Add the following line to the /etc/sysctl.mmcmd.acl file
_PRINCIPAL root.SPbgAdm
PROBLEM SUMMARY:
During migration from GPFS 1.2 to 1.4, the SPbgAdm entry was
not getting added to the sysctl.mmcmd.acl file, which is
required at this level of code.
PROBLEM CONCLUSION:
An update of the acl file in the post install script is needed.
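The manual step from the LOCAL FIX (add the `_PRINCIPAL root.SPbgAdm` line if it is missing) can be sketched as an idempotent update. This is an illustrative sketch only, not the actual post install script; `ensure_acl_entry` and `demo_acl_update` are hypothetical names, and the demo writes to a temporary file rather than /etc/sysctl.mmcmd.acl.

```python
import os
import tempfile

ACL_LINE = "_PRINCIPAL root.SPbgAdm"  # the line given in the LOCAL FIX


def ensure_acl_entry(path: str, entry: str = ACL_LINE) -> bool:
    """Append entry to the file unless an identical line is already there.

    Returns True if the file was changed, False if the entry was present.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            content = handle.read()
    except FileNotFoundError:
        content = ""
    if entry in (line.strip() for line in content.splitlines()):
        return False
    with open(path, "a", encoding="utf-8") as handle:
        if content and not content.endswith("\n"):
            handle.write("\n")  # start the new entry on its own line
        handle.write(entry + "\n")
    return True


def demo_acl_update() -> list:
    """Run the update twice on a scratch file and show it is idempotent."""
    with tempfile.TemporaryDirectory() as base:
        path = os.path.join(base, "sysctl.mmcmd.acl")
        first = ensure_acl_entry(path)
        second = ensure_acl_entry(path)
        with open(path, encoding="utf-8") as handle:
            return [first, second, handle.read()]
```

Running the update a second time leaves the file unchanged, which is the property a migration script needs when it may be run on systems that already have the entry.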
------
APAR: IY17225 COMPID: 5765B9501 REL: 330
ABSTRACT: DEADLOCK IN RENAME ON MULTIPLE NODES
PROBLEM DESCRIPTION:
Deadlock in rename on multiple nodes
PROBLEM SUMMARY:
GPFS deadlock when running multiple renames.
PROBLEM CONCLUSION:
Corrected a GPFS locking error in rename.
------
APAR: IY17227 COMPID: 5765D5100 REL: 320
ABSTRACT: CORRECTIONS TO USER ERROR PATHS
PROBLEM DESCRIPTION:
corrections to user error paths
PROBLEM SUMMARY:
These fixes only impact the ce running the cediags. They
are important for field use.
PROBLEM CONCLUSION:
These are corrections to user error paths and to support
some new hardware for the cediag program.
------
APAR: IY17231 COMPID: 5765D5100 REL: 320
ABSTRACT: SP EXCLUSIVE ACCOUNTING CREATES NEGATIVE NUMBERS IN THE
PROBLEM DESCRIPTION:
When using SP exclusive accounting, the /var/adm/fee file
sometimes has negative numbers, which makes no sense.
This is caused when the AIX fstat of an accounting file in
acctexcl.c returns an error, which cannot be duplicated. A
suggested solution is for us to handle the fstat error in
acctexcl.c.
PROBLEM SUMMARY:
When customers use excluse_node accounting,
there are instances when the /var/adm/fee
file has a job charge for a userid, that
is negative.
For example:
4077 jet 0 0 0 0 0 0 0 0 0 0 0 0 -961019740.000000 0 0 0
4077 jet 0 0 0 0 0 0 0 0 0 0 0 0 961020006.000000 0 0 0
This only occurs when:
a job is running using exclusive node accounting
AND
nrunacct (SP accounting) program is run on the node
before the user's job is finished.
We attempt to calculate how much time userids
have had exclusive node usage thus far, even though
their job has not completed yet.
Acctexcl is called from nrunacct and tries to
access the input file statistical buffer (stat_buf).
However, we were calling the function to access this
structure, fstat, with incorrect parameters, resulting
in 0 values in the stat_buf fields. Acctjob (also run
from nrunacct) uses this value, which is 0, to calculate
the total job charge fee for each user. Since we started
with 0, the algorithm returns a negative number, which
is incorrect. The code was changed to point to the
correct address for the stat_buf structure.
PROBLEM CONCLUSION:
nrunacct (SP accounting) program calls
executable acctexcl when SP exclusive
accounting is set to true.
If a job is still running when nrunacct
is run, we calculate how much time so far,
userid's have had exclusive node usage.
acctexcl is called from /usr/lpp/ssp/bin
nrunacct to keep track of the start and
end times of a job, per user.
In acctexcl we were using fstat to get
the address of the file statistical buffer
(stat_buf), passing it a file descriptor
pointer and the address for stat_buf.
Within the structure stat_buf, we were
using the field stat_buf.st_mtime, which
is the last data modification for a
specific userid. This field was always
0. This 0 value was then passed to
acctjob, which is also run in nrunacct.
Acctjob calculates the fee that should be
charged a userid for exclusive node usage.
This algorithm is based on subtracting
a value from the stat_buf.st_mtime value,
resulting in a negative number.
The code was changed to point correctly to
the real address of the stat_buf structure,
in which st_mtime value is correct.
OLD:
--> fstat(fd_acct, &stat_buf);
stb.ac_btime = stat_buf.st_mtime;
NEW:
--> if (stat(argv[1], &stat_buf) == 0) {
stb.ac_btime = stat_buf.st_mtime;
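
The arithmetic behind the negative fee can be illustrated with a small sketch. It is not the acctexcl or acctjob code: `file_mtime` and `charge_seconds` are hypothetical helpers, the formula (modification time minus a reference time) is an assumption based on the description above, and the timestamps are borrowed from the sample fee lines purely for illustration.

```python
import os
from typing import Optional


def file_mtime(path: str) -> Optional[float]:
    """Return the file's last-modification time, or None if stat fails."""
    try:
        return os.stat(path).st_mtime
    except OSError:
        # Python raises instead of returning -1; either way, do not go on
        # to read a buffer that the call never filled in.
        return None


def charge_seconds(mtime: Optional[float], reference: float) -> Optional[float]:
    """Hypothetical charge: mtime minus a reference time, both in seconds."""
    if mtime is None:
        return None  # no valid timestamp, so no charge is computed
    return mtime - reference
```

With a zeroed timestamp, `charge_seconds(0.0, 961019740.0)` is `-961019740.0`, while a valid timestamp such as `961020006.0` gives `266.0`. Checking the result of the stat call before using `st_mtime`, as the NEW code does, avoids the first case.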
------
APAR: IY17238 COMPID: 5765D5100 REL: 320
ABSTRACT: LAPI PROGRAM HANGS WITH RT_GRQ=OFF IN SOMECASES
PROBLEM DESCRIPTION:
Colony: LAPI program hangs with RT_GRQ=off in some cases.
With RT_GRQ=off we have a situation where our interrupt handler
is trying to acquire a lock that another thread held and
released, but due to the large dispatch time of the thread that
wants the lock versus the thread releasing the lock, we can get
into a race condition which looks like a hang.
With RT_GRQ=on we do not see this problem.
PROBLEM SUMMARY:
With RT_GRQ=off set, programs that use LAPI as the transport
layer can hang.
PROBLEM CONCLUSION:
Change our locking from GET lock to TRY lock.
TEMPORARY FIX:
Run with RT_GRQ=on.
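
The change from a blocking "GET" lock to a "TRY" lock can be sketched in general terms. This is not LAPI code; `handle_interrupt` and the `work` callback are hypothetical names, and Python's `threading.Lock` stands in for whatever lock primitive the handler really uses.

```python
import threading


def handle_interrupt(lock: threading.Lock, work) -> bool:
    """Run work() only if the lock is free right now; never block on it."""
    if not lock.acquire(blocking=False):
        return False  # lock is busy: give up and let the caller retry later
    try:
        work()
        return True
    finally:
        lock.release()
```

A handler written this way returns immediately when the lock is held elsewhere, so it cannot sit waiting on a thread that is slow to be dispatched; the cost is that the caller must be prepared to retry.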
------
APAR: IY17240 COMPID: 5765B9501 REL: 320
ABSTRACT: DISK ALLOC INFO NOT UPDATED IN TIMELY MANNER
PROBLEM DESCRIPTION:
disk alloc info not updated in timely manner
PROBLEM SUMMARY:
Performance optimization when dealing with larger file
systems
PROBLEM CONCLUSION:
Refresh the bit map of disks available for allocation when new
per-disk free information is received.
------
APAR: IY17242 COMPID: 5765B9501 REL: 330
ABSTRACT: DISK ALLOC INFO NOT UPDATED IN TIMELY MANNER
PROBLEM DESCRIPTION:
disk alloc info not updated in timely manner
PROBLEM SUMMARY:
Performance optimization when dealing with larger file
systems
PROBLEM CONCLUSION:
Refresh the bit map of disks available for allocation when new
per-disk free information is received.
------
APAR: IY17244 COMPID: 5765B9501 REL: 330
ABSTRACT: BETTER REVOKE MSG ERROR HANDLING
PROBLEM DESCRIPTION:
better revoke msg error handling
PROBLEM SUMMARY:
Optimize recovery handling of a node failing while
processing a token revoke.
PROBLEM CONCLUSION:
Code optimization
------
APAR: IY17245 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT IN FSCK
PROBLEM DESCRIPTION:
assert in fsck
PROBLEM SUMMARY:
Fix an assert that occurred in development of a future
release.
PROBLEM CONCLUSION:
Correct error in creation of log files in fsck.
------
APAR: IY17246 COMPID: 5765B9500 REL: 140
ABSTRACT: QUORUM FAILURE ADDING SECOND NODE WITH SINGLENODEQUORUM
PROBLEM DESCRIPTION:
quorum failure adding second node with singlenode quorum
PROBLEM SUMMARY:
If operating with a two-node quorum, consistency problems
will occur if you do not restart the daemons after adding a
node.
PROBLEM CONCLUSION:
Require a restart of all nodes if adding a node to a system
specified as two-node quorum.
------
APAR: IY17248 COMPID: 5765B9501 REL: 330
ABSTRACT: INCREASE MAX DISKS PER FILESYSTEM TO 2048
PROBLEM DESCRIPTION:
increase max disks per filesystem to 2048
PROBLEM SUMMARY:
internal limit optimization
PROBLEM CONCLUSION:
internal limit optimization
------
APAR: IY17250 COMPID: 5765B9501 REL: 320
ABSTRACT: INCREASE MAX DISKS PER FILESYSTEM TO 2048
PROBLEM DESCRIPTION:
increase max disks per filesystem to 2048
PROBLEM SUMMARY:
internal limit optimization
PROBLEM CONCLUSION:
internal limit optimization
------
APAR: IY17258 COMPID: 5765B9501 REL: 330
ABSTRACT: NODESETID INITIALIZATION
PROBLEM DESCRIPTION:
nodesetid initialization
PROBLEM SUMMARY:
Do not allow GPFS to start on a node which
is not properly configured.
PROBLEM CONCLUSION:
Put a check into GPFS startup that ensures
that GPFS has been properly configured into a nodeset.
------
APAR: IY17259 COMPID: 5765D5100 REL: 320
ABSTRACT: EIEIO IS NOT ENOUGH
PROBLEM DESCRIPTION:
eieio is not enough
PROBLEM SUMMARY:
Because of the relatively long signal path inside the adapter,
some writes into adapter memory may not complete in the same
order as they were issued. An additional formal read will
guarantee this ordering.
PROBLEM CONCLUSION:
Added a formal read to guarantee completion of the previous
write into a register.
------
APAR: IY17280 COMPID: 5765B9501 REL: 330
ABSTRACT: FSCK LOSTBLOCK HANDLING, SG DESC CHECK
PROBLEM DESCRIPTION:
Fsck lostblock handling, SG desc check
PROBLEM SUMMARY:
Correct timing window in fsck after multiple failures
PROBLEM CONCLUSION:
Correct timing window in fsck after multiple failures
------
APAR: IY17305 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT IN OPENFILE.H LINE 1245
PROBLEM DESCRIPTION:
assert in OpenFile.h line 1245
logAssert
releaseOwnedBuffersM
closeFile
mmfs_close_internal
mmfs_close
PROBLEM SUMMARY:
GPFS self-check logic stopped GPFS, reporting an
error at OpenFile.h line 1245.
PROBLEM CONCLUSION:
Remove incorrect self check logic.
------
APAR: IY17329 COMPID: 5765D5100 REL: 320
ABSTRACT: FLUSH LOGFILES BEFORE EXITING FAULT SERVICE DAEMON
PROBLEM DESCRIPTION:
flush logfiles before exiting fault service daemon
PROBLEM SUMMARY:
The fault service daemon exited without any indication why.
Inspection of the code found a few places where the daemon
called exit() without ensuring that log files had been
flushed. So any log messages that pertained to a failure
may have been lost.
PROBLEM CONCLUSION:
Code that periodically flushed the log files had been
accidentally deleted from the fault service daemon main
loop. This fix restored the log flushes.
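The general rule (flush the log before every exit path) can be sketched as below. This is not the fault service daemon's code; `fatal_exit` and `demo_fatal_exit` are hypothetical names, and the in-memory log and recording `exit_fn` exist only so the sketch can be run without ending the process.

```python
import io
import sys


def fatal_exit(log, message, code=1, exit_fn=sys.exit):
    """Write message, flush the log, and only then terminate."""
    log.write(message + "\n")
    log.flush()  # force buffered text out before the process ends
    exit_fn(code)


def demo_fatal_exit():
    """Exercise fatal_exit with an in-memory log and a recording exit_fn."""
    log = io.StringIO()
    exit_codes = []
    fatal_exit(log, "adapter error: shutting down", 3, exit_codes.append)
    return log.getvalue(), exit_codes
```

Routing every termination through one helper like this means the reason for the exit reaches the log even when the periodic flush in the main loop has not run yet.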
------
APAR: IY17331 COMPID: 5765D5100 REL: 320
ABSTRACT: REMOVE DEREFERENCE FROM SETTING OF UCODE PTR BOOT UCODE
PROBLEM DESCRIPTION:
remove dereference from setting of ucode ptr for boot ucode
PROBLEM SUMMARY:
Minor excess dereferences in the configuration method device
driver code caused the configuration mechanism to load an
undefined microcode file instead of the correct microcode
file.
PROBLEM CONCLUSION:
Excess dereference symbols were removed from the software
code in order to properly reference the correct microcode
file within the colony configuration method.
------
APAR: IY17332 COMPID: 5765D5100 REL: 320
ABSTRACT: DOUBLE SINGLE WITH A MIX OF 2MB AND 4MB CARDS WILL NOT CONFIGURE
PROBLEM DESCRIPTION:
double single with a mix of 2mb and 4mb cards will not configure
PROBLEM SUMMARY:
This fix is only for nodes with a mix of 2MB and 4MB colony
adapters in a double single configuration. Problems will
only occur with nodes that have a mix of 2MB and 4MB sram
equipped colony adapters. The links in /etc/microcode had
to be changed in order to properly load the correct
microcode file(s) for each adapter.
PROBLEM CONCLUSION:
Links in /etc/microcode had to be modified and in some cases
new links were created and the odm entries for each adapter
modified in order for both cards to load the correct
microcode file(s) properly.
------
APAR: IY17333 COMPID: 5765D5100 REL: 320
ABSTRACT: MISSING LINE FOR 2 PART NUMBERS OF 2MB SRAM CARDS
PROBLEM DESCRIPTION:
missing line for 2 part numbers of 2mb sram cards
PROBLEM SUMMARY:
There were extra part numbers for 2MB colony adapters that
were not included in the configuration method before this
fix. These new part numbers were introduced into the
configuration method in order to properly configure these
cards.
PROBLEM CONCLUSION:
There were extra part numbers for 2MB colony adapters that
were not included in the configuration method before this
fix. These new part numbers were introduced into the
configuration method in order to properly configure these
cards.
------
APAR: IY17334 COMPID: 5765E2820 REL: 430
ABSTRACT: WEBSPHERE 4.3 CICS ON AIX 4.2.1 AND AIX 4.3.1 PTF4 REFERENCE
PROBLEM DESCRIPTION:
Websphere 4.3 CICS on AIX 4.2.1 and AIX 4.3.1 PTF4 reference
apar.
This apar should be used when ordering the latest Websphere
CICS maintenance on the AIX 4.2.1 and AIX 4.3.1 operating
systems.
The PTF associated with this reference apar contains the fixes
for the following list of apars which were opened against
component id. 5765E2820:
IY16468 IY09593 IY13762 IY16469 IY16470 IY16471 IY16523 IY16472
IY16473 IY09478 IY16474 IY16475 IY12288 IY16476 IY16477 IY13161
IY13382 IY13756 IY16478 IY16479 IY14063 IY16480 IY15100 IY08501
IY15871 IY16796 IY17190 IY16959 IY17187 IY17195
LOCAL FIX:
PTF4 reference apar used for ordering.
PROBLEM SUMMARY:
USERS AFFECTED: CICS users, Release 430
PTF 4 CICS 4.3.0.4 for AIX
PROBLEM CONCLUSION:
IY16468 IY09593 IY13762 IY16469 IY16470 IY16471 IY16523 IY16472
IY16473 IY09478 IY16474 IY16475 IY12288 IY16476 IY16477 IY13161
IY13382 IY13756 IY16478 IY16479 IY14063 IY16480 IY15100 IY08501
IY15871 IY16796 IY17190 IY16959 IY17187 IY17195
------
APAR: IY17338 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT IN OPENFILE.H LINE 1245
PROBLEM DESCRIPTION:
assert in OpenFile.h line 1245
logAssert
releaseOwnedBuffersM
closeFile
mmfs_close_internal
mmfs_close
PROBLEM SUMMARY:
GPFS self-check logic stopped GPFS, reporting an
error at OpenFile.h line 1245.
PROBLEM CONCLUSION:
Remove incorrect self check logic.
------
APAR: IY17371 COMPID: 5765D5100 REL: 320
ABSTRACT: SPADAPTRS USING INCORRECT IP ADDRESS IN MSG 0022-395
PROBLEM DESCRIPTION:
spadaptrs using incorrect IP address in msg 0022-395
PROBLEM SUMMARY:
When spadaptrs issues message 0022-395 to indicate that
an IP address is already in use, the IP address listed
in the message is the starting IP address, instead of the
IP address that is already in use. The IP address that
is already in use should be included in the message.
PROBLEM CONCLUSION:
Modified spadaptrs, so that if message 0022-395 is issued,
the IP address in the message is the actual IP address that
is already in use and not the starting IP address from the
spadaptrs invocation.
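The corrected behavior (report the address that actually conflicts, not the starting address) can be sketched as follows. This is not spadaptrs code: `assign_addresses` is a hypothetical helper, the assumption that addresses are assigned consecutively from the starting address is made only for the example, and the error text is not the wording of message 0022-395.

```python
import ipaddress


def assign_addresses(start_ip: str, count: int, in_use: set) -> list:
    """Assign count consecutive IPv4 addresses beginning at start_ip."""
    first = ipaddress.IPv4Address(start_ip)
    assigned = []
    for offset in range(count):
        candidate = first + offset
        if str(candidate) in in_use:
            # Name the conflicting address, not the starting address.
            raise ValueError(f"IP address {candidate} is already in use")
        assigned.append(str(candidate))
    return assigned
```

For example, starting at 10.0.0.1 with 10.0.0.3 already taken, the error names 10.0.0.3, which is the address an administrator needs in order to resolve the conflict.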
------
APAR: IY17387 COMPID: 5765D5100 REL: 320
ABSTRACT: SPADAPTRS ERROR ON COLONY SWITCH AFTER IY15889
PROBLEM DESCRIPTION:
spadaptrs error on Colony Switch after IY15889
PROBLEM SUMMARY:
***********************************************************
* USERS AFFECTED: Users with IY15889 (available in *
* ssp.basic 3.2.0.8 ) installed on their *
* Control Workstation, who have a *
* Colony Switch and are using the *
* spadaptrs command to enter data for *
* css adapters to multiple nodes and *
* do not specify the n flag. *
* *
***********************************************************
* PROBLEM DESCRIPTION: Issuing spadaptrs on a system with *
* a Colony Switch, that has IY15889 *
* installed, to enter data for css *
* adapters on multiple nodes, without*
* specifying the n flag will result *
* in the same IP address being *
* assigned to all of the adapters. *
* *
***********************************************************
* RECOMMENDATION: If you have a Colony Switch on your *
* system and are issuing spadaptrs for *
* css adapters, specify the n flag. *
* *
***********************************************************
------
APAR: IY17388 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY2: WRAP TEST FAILS TO RECOGNIZE WRAP ASSEMBLY(D/S)
PROBLEM DESCRIPTION:
COLONY2: Wrap test fails to recognize wrap assembly(D/S)
PROBLEM SUMMARY:
This problem was caused when the fault service daemon
changed the way it set the chip id on the SP Switch2 chips.
The fault service daemon developer failed to change
the SP Switch Diagnostic Wrap Test, to keep the interface
change consistent. When the id changed the wrap test no
longer recognized the service packet, so the test failed.
PROBLEM CONCLUSION:
The wrap_test code was changed to be consistent with the
fault_service daemon, and will now recognize the
error/status packet.
------
APAR: IY17444 COMPID: 5765B9501 REL: 320
ABSTRACT: INVALID INDIRECT BLOCK IN-MEMORY DATA
PROBLEM DESCRIPTION:
Invalid indirect block in-memory data
PROBLEM SUMMARY:
Under an unusual situation, the in memory (not disk) copy of
the indirect block for a very large file became corrupted.
PROBLEM CONCLUSION:
Correct locking for a situation involving an access to three
indirect blocks worth of data.
------
APAR: IY17457 COMPID: 5765B9501 REL: 330
ABSTRACT: INVALID INDIRECT BLOCK IN-MEMORY DATA
PROBLEM DESCRIPTION:
Invalid indirect block in-memory data
PROBLEM SUMMARY:
Under an unusual situation, the in memory (not disk) copy of
the indirect block for a very large file became corrupted.
PROBLEM CONCLUSION:
Correct locking for a situation involving an access to three
indirect blocks worth of data.
------
APAR: IY17470 COMPID: 5765D6100 REL: 220
ABSTRACT: SLOW STARTING INTERACTIVE POE JOB CAN CAUSE SCHEDD TO DIE
PROBLEM DESCRIPTION:
If the LoadLeveler Central Manager does not respond within one
minute to an interactive POE job that is submitted, the Sched
Daemon will try to do Daemon communication using incomplete
data. That will cause the Sched Daemon to die. This is most
likely to happen if an external scheduler is being used.
LOCAL FIX:
None that is very practical.
PROBLEM SUMMARY:
***********************************************************
* USERS AFFECTED: Users with IY16023 (available in *
* LoadL.full 2.2.0.7) installed and they *
* use LoadLeveler to schedule *
* interactive POE jobs (either by using *
* resd option or because they are using *
* user-space) might experience this *
* problem. Using an external scheduler *
* probably increases the probability of *
* having this problem. *
* *
***********************************************************
* PROBLEM DESCRIPTION: If the Negotiator does not *
* respond within 1 minute of *
* submitting an interactive POE *
* job, the Scheduler Daemon *
* (schedd) will attempt to do *
* socket communication using *
* incomplete data. This will *
* usually cause a SIGBUS (signal *
* 10) error in the Scheduler *
* Daemon. The slow response from *
* the Negotiator is usually because *
* there currently is insufficient *
* resources to run the job. *
* *
***********************************************************
* RECOMMENDATION: If the Scheduler Daemon reaches the *
* point of doing the SIGBUS (signal 10), *
* it tends to get stuck in a loop of *
* restarting, and then generating *
* another SIGBUS (signal 10) about two *
* minutes later. If that happens, *
* shutting down LoadLeveler, on that *
* node, may not be successful, because *
* the Scheduler Daemon (schedd) may keep *
* running. In that case, a kill -9, of *
* the schedd, will be necessary. *
* Stopping the schedd, with kill -9, *
* might even stop the restart/SIGBUS *
* loop, until another interactive POE *
* job is slow to start. The long term *
* solution is to install APAR IY17470. *
* An Efix is also available. *
* *
***********************************************************
------
APAR: IY17566 COMPID: 5765D5100 REL: 320
ABSTRACT: SPMON -D -G ON COLONY WITH ISB HAS SWITCH LED 888
PROBLEM DESCRIPTION:
spmon -d -G on colony with ISB has switch LED 888
PROBLEM SUMMARY:
The spmon command section for switch only frames had not
been updated to include a SPSwitch2, which resulted in an
LED of 888 being displayed.
PROBLEM CONCLUSION:
The spmon command section for switch only frames was updated
to include a SPSwitch2.
------
APAR: IY17597 COMPID: 5765D9300 REL: 310
ABSTRACT: MODIFY MPCI_STATS_T FOR BINARY COMPATIBILITY
PROBLEM DESCRIPTION:
modify mpci_stats_t for binary compatibility
PROBLEM SUMMARY:
This fix restores binary compatibility with 3.1.1.
PROBLEM CONCLUSION:
The PSSP 3.2 version of the MPCI statistics (used only by
Informix) broke binary compatibility with PSSP 3.1.1.
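The APAR does not say how the mpci_stats_t layout changed, so the following is only a generic illustration of how a statistics structure can lose binary compatibility. All structure and field names here are hypothetical. Using Python's ctypes to model C layouts, it shows that inserting a field in the middle moves the offsets that already-compiled callers rely on, while appending a field leaves them in place.

```python
import ctypes


class StatsV1(ctypes.Structure):
    # Hypothetical original layout that old binaries were compiled against.
    _fields_ = [("sends", ctypes.c_uint32), ("recvs", ctypes.c_uint32)]


class StatsInserted(ctypes.Structure):
    # New field placed in the middle: "recvs" moves to a different offset.
    _fields_ = [
        ("sends", ctypes.c_uint32),
        ("retries", ctypes.c_uint32),
        ("recvs", ctypes.c_uint32),
    ]


class StatsAppended(ctypes.Structure):
    # New field placed at the end: the old fields keep their offsets.
    _fields_ = [
        ("sends", ctypes.c_uint32),
        ("recvs", ctypes.c_uint32),
        ("retries", ctypes.c_uint32),
    ]


def is_layout_compatible(old, new):
    """True if every field of `old` has the same offset and size in `new`."""
    for name, _ctype in old._fields_:
        new_field = getattr(new, name, None)
        if new_field is None:
            return False
        old_field = getattr(old, name)
        if (old_field.offset, old_field.size) != (new_field.offset, new_field.size):
            return False
    return True


print(is_layout_compatible(StatsV1, StatsAppended))  # old offsets preserved
print(is_layout_compatible(StatsV1, StatsInserted))  # "recvs" has moved
```

A check of this kind only covers field offsets and sizes; a caller that allocates the structure itself also depends on the total size, which grows in both variants.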
------
APAR: IY17598 COMPID: 5765D9300 REL: 310
ABSTRACT: REFERENCES TO HAL IN PIPE2INIT.C SHOULD BE REMOVED
PROBLEM DESCRIPTION:
references to hal in pipe2init.c should be removed
PROBLEM SUMMARY:
The word "HAL" should not be exposed in any MPI error messages.
PROBLEM CONCLUSION:
A change was made to ensure that if an error message is
generated by HAL, the customer sees "INTERNAL ERROR CODE"
instead of "HAL ERROR CODE".
------
APAR: IY17624 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT: SUBBLOCKS== 0!!SUBBLOCKS==OFP->GETSUBBLOCKSPERFILEBLOCK
PROBLEM DESCRIPTION:
assert: subblocks==0!!subblocks==ofp->getsubblocksperfileblock
PROBLEM SUMMARY:
GPFS self-check logic terminated GPFS when running
applications using datashipping.
PROBLEM CONCLUSION:
Corrected logic involving datashipping buffer management.
------
APAR: IY17632 COMPID: 5648C9802 REL: 430
ABSTRACT: SDK 1.3.0 PTF 5 : CA130-20010330
PROBLEM DESCRIPTION:
Fixes since PTF 4 (ca130-20010207) :
(Note: The descriptions here have been truncated.)
+--------+------+-------+------------------------------------+
|20010210|28296 | |Drag and Drop not working |
+--------+------+-------+------------------------------------+
|20010216|24401 | |jdb next command behaves like run |
+--------+------+-------+------------------------------------+
|20010216|27581 | |JAVA IMF Problem |
+--------+------+-------+------------------------------------+
|20010216|27583 | |JAVA IMF Problem? |
+--------+------+-------+------------------------------------+
|20010216|27831 | |AWT: Peer not created |
+--------+------+-------+------------------------------------+
|20010216|27930 | |loop in initializeAlloc when unable |
+--------+------+-------+------------------------------------+
|20010216|28173 | |JIT: incorrect code for SHIFT with l|
+--------+------+-------+------------------------------------+
|20010216|28413 | |Hang in UNIXProcess.waitFor() |
+--------+------+-------+------------------------------------+
|20010217|28248 | |ca130 PTFs + Wnn6 coredump on JP loc|
+--------+------+-------+------------------------------------+
|20010217|28429 |IY16689|The AIX MMI overwrites breakpoint op|
+--------+------+-------+------------------------------------+
|20010219|28561 | |JPDA stepping runs away on exception|
+--------+------+-------+------------------------------------+
|20010222|28478 | |Cursor Problem |
+--------+------+-------+------------------------------------+
|20010223|25769 | |Looping whilst in GC |
+--------+------+-------+------------------------------------+
|20010223|27916 | |JTextField highlights text incorrect|
+--------+------+-------+------------------------------------+
|20010227|25708 | |AIX window resize problem |
+--------+------+-------+------------------------------------+
|20010307|28042 | |Beans: Introspector does not allow t|
+--------+------+-------+------------------------------------+
|20010309|28215 | |Java IMF Problem |
+--------+------+-------+------------------------------------+
|20010313|27600 | |JAVA IMF Problem |
+--------+------+-------+------------------------------------+
|20010313|28885 | |SUNBUG: 4359598 |
+--------+------+-------+------------------------------------+
|20010315|29039 | |NPE in javax.swing.JTree.AccessibleJ|
+--------+------+-------+------------------------------------+
|20010316|25666 | |Garbage collection / thread dispatch|
+--------+------+-------+------------------------------------+
|20010317|28928 | |Strings clipped in TextField in JP |
+--------+------+-------+------------------------------------+
|20010317|28970 | |Multi-Select true-Only in JFilechoos|
+--------+------+-------+------------------------------------+
|20010317|29016 | |JFrame: select Input Method in Engli|
+--------+------+-------+------------------------------------+
|20010317|29320 | |Garbage collection / thread dispatch|
+--------+------+-------+------------------------------------+
|20010320|28934 | |MMI doesn't check for breakpoints af|
+--------+------+-------+------------------------------------+
|20010320|29278 |IY17559|AIX MMI can notify JVMDI of exceptio|
+--------+------+-------+------------------------------------+
|20010329|28930 | |Javac does not follow symbolic links|
+--------+------+-------+------------------------------------+
|20010329|28978 | |System menu 'Select input method' te|
+--------+------+-------+------------------------------------+
|20010329|29386 | |jni_GetPrimitiveArrayElements does n|
+--------+------+-------+------------------------------------+
|20010329|29476 | |Sockets should not use SO_REUSEADDR |
+--------+------+-------+------------------------------------+
------
APAR: IY17700 COMPID: 5765B9501 REL: 320
ABSTRACT: SIGSEGV IN MHWRITEDATA PROLOG WHEN OUT OF DATASHIPPING MODE
PROBLEM DESCRIPTION:
SIGSEGV in mhWriteData prolog when out of datashipping mode
Thread trace back ...
0xD028977C compare_and_swap() + 0x30
0x101E4BBC mhWriteData(TscMsgHeader*,void*,const NodeIncarnation
0x100075C8 tscHandleMsg(TscMsgHeader*,void*,const NodeIncarnatio
0x10023AEC RcvWorker::main() + 0x114
0x1002393C RcvWorker::thread(int) + 0x84
0x1000E99C Thread::callBody(Thread*) + 0x9C
0x10175328 Thread::callBodyWrapper(Thread*) + 0x98
0xD0131358 _pthread_body() + 0xD0
0xFFFFFFFC
PROBLEM SUMMARY:
Incorrect handling of a message received after data shipping
was turned off caused a segmentation violation.
PROBLEM CONCLUSION:
Corrected the test in data shipping.
------
APAR: IY17918 COMPID: 5765B8100 REL: 220
ABSTRACT: CHP/DTBE PLAYING NON INTERRUPTABLE PROMPT ALWAYS CLEARS DTMF
PROBLEM DESCRIPTION:
When DTBE plays a non-interruptible prompt, the DTMF buffer is
cleared before the prompt is played, irrespective of whether the
user has requested this.
To make DTBE a cross-platform implementation, this functionality
should be removed in order to match the NT behaviour.
PROBLEM SUMMARY:
CHP/DTBE PLAYING NON INTERRUPTABLE PROMPT
ALWAYS CLEARS DTMF
PROBLEM CONCLUSION:
Removed the unwanted call to clear the DTMF buffer
on a force-played prompt.
------
APAR: IY18081 COMPID: 5765C1101 REL: 134
ABSTRACT: XL SMP RUNTIME 1.3.4.0 MAINTENANCE LEVEL
PROBLEM DESCRIPTION:
XL SMP RUNTIME 1.3.4.0 MAINTENANCE LEVEL
------
APAR: IY18110 COMPID: 5648C9802 REL: 430
ABSTRACT: FIX PROBLEMS WITH APAR IY17632
PROBLEM DESCRIPTION:
Problems with APAR IY17632 (SDK 1.3.0 PTF 5:
CA130-20010330).
PROBLEM CONCLUSION:
Ship new PTFs that resolve the packaging problems with
IY17632.
------
APAR: IY18158 COMPID: 5765C6401 REL: 442
ABSTRACT: HEAP/MEMORY DEBUG TOOLKIT 4.4.2.0 MAINT. LEVEL
PROBLEM DESCRIPTION:
HEAP/MEMORY DEBUG TOOLKIT 4.4.2.0 MAINT. LEVEL
PROBLEM CONCLUSION:
HEAP/MEMORY DEBUG TOOLKIT 4.4.2.0 MAINT.
LEVEL
------
APAR: IY18172 COMPID: 5765D5100 REL: 320
ABSTRACT: LATEST PSSP 3.2.0 FIXES AS OF MARCH 2001
PROBLEM DESCRIPTION:
This is the latest PSSP PTF as of March 2001.
Order this APAR to get all of the PTFs as of March 2001.
------