From: AIX Service Mail Server (aixserv@austin.ibm.com)
Date: Tue Nov 06 2001 - 02:24:49 CST
APAR: IR43957 COMPID: 5697F4800 REL: 410
ABSTRACT: FWLOGMGMT: NEED NEW FEATURE TO FORCE ARCHIVING OF ALL LOG ENTRIES
PROBLEM DESCRIPTION:
Customer would like to be able to archive logs several times a
day.
PROBLEM SUMMARY:
To enable archiving multiple times a day, set the
"Days Until Archive" field to '0' and schedule "fwlogmgmt -l"
to run several times a day using the AIX cron facility.
PROBLEM CONCLUSION:
Code has been modified to allow archiving
multiple times a day.
------
APAR: IR44032 COMPID: 5697F4800 REL: 410
ABSTRACT: CONFIG CLIENT GETS DISCONNECTED (TCP RST) WHEN LISTING MORE THAN
PROBLEM DESCRIPTION:
When you try to list the active connections, the client
receives a message saying it has been disconnected and that you
need to log in again. This is caused by the server sending a TCP
reset (RST) on that connection.
LOCAL FIX:
All the filters still work. You can list them from the
command line with "fwfilter cmd=list".
PROBLEM SUMMARY:
The problem is in the List() member function of the
CfgObj class in cfgobj.cpp. In List(), the buffer for trace
information is created with a size of 256 bytes.
PROBLEM CONCLUSION:
The parmlist exceeds the size of the buffer
allocated for traceinfo.
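A minimal sketch of the kind of guard that prevents this overflow, assuming the trace text is built with C string formatting: the parameter list is written into the 256-byte buffer with a length-bounded snprintf(), and truncation is reported instead of memory past the buffer being overwritten. The helper format_trace() and the "List(...)" text are hypothetical; the actual cfgobj.cpp code is not shown in this APAR.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRACE_BUF_SIZE 256 /* same size as the trace buffer described above */

/* Hypothetical helper: formats "List(<parmlist>)" into buf without
 * writing past bufsize bytes. Returns 1 if the whole text fit,
 * 0 if it had to be truncated. */
static int format_trace(char *buf, size_t bufsize, const char *parmlist)
{
    int needed = snprintf(buf, bufsize, "List(%s)", parmlist);
    /* snprintf always NUL-terminates when bufsize > 0; a return value
     * >= bufsize means the output was cut short. */
    return needed >= 0 && (size_t)needed < bufsize;
}
```

Truncating diagnostic text is one design choice; the alternative is to size the buffer from the length of the parameter list before formatting.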
------
APAR: IR44162 COMPID: 5697F4800 REL: 410
ABSTRACT: TUNNEL ID DISAPPEARS WHEN RULE IS DISPLAYED IN GUI
PROBLEM DESCRIPTION:
When a rule is displayed, the tunnel ID does not appear in the GUI,
and when the panel is saved, the tunnel ID is lost.
PROBLEM SUMMARY:
TUNNEL ID DISAPPEARS WHEN RULE IS DISPLAYED IN
GUI
PROBLEM CONCLUSION:
Removed the string size limit in the tunnel configuration
GUI dialog so that it reads the entire tunnel ID string.
------
APAR: IR44238 COMPID: 5697F4800 REL: 410
ABSTRACT: FW CRASH DUE TO STACK OVERFLOW
PROBLEM DESCRIPTION:
FW crash due to stack overflow caused by NAT
PROBLEM SUMMARY:
A core file was produced in some situations because of a stack
overflow.
PROBLEM CONCLUSION:
changed storage model to eliminate
possibility of stack corruption.
------
APAR: IR44297 COMPID: 5697F4800 REL: 410
ABSTRACT: CFGMGR COMMAND ON FW MACHINE FAILED WITH THE ERROR
PROBLEM DESCRIPTION:
On the firewall machine, the cfgmgr command failed with
the error:
# cfgmgr
sh: /usr/sbin/fwmktun : not found.
The /usr/sbin/fwmktun command appears to be called by the
/usr/lib/methods/cfgipsec command.
PROBLEM SUMMARY:
Firewall's cfgipsec was calling an AIX tunnel
module that didn't ship with AIX.
PROBLEM CONCLUSION:
Removed the unsupported function call.
------
APAR: IR44420 COMPID: 5697F4800 REL: 410
ABSTRACT: FTP BYE COMMAND CAUSE FTP SESSION HANGING UP
PROBLEM DESCRIPTION:
The FTP BYE command causes the FTP session to hang when NAT is
used between AIX and a Unisys 2000.
PROBLEM SUMMARY:
FTP sessions to Unisys and certain VM hosts
hung at the BYE command.
PROBLEM CONCLUSION:
Fixed the NAT code to properly handle these special situations
in the protocol with Unisys and VM hosts.
------
APAR: IR44500 COMPID: 5697F4800 REL: 410
ABSTRACT: FWLOGD DUMPED CORE.
PROBLEM DESCRIPTION:
fwlogd dumped core.
DBX output is:
(dbx) t
strlen() at 0xd0169cf8
_doprnt(??, ??, ??) at 0xd0185850
vfprintf(??, ??, ??) at 0xd0183cd0
__syslog_r(??, ??, ??, ??) at 0xd01f4860
syslog(0x3, 0x200018e0, 0x2f5, 0x615f8, 0x0, 0x60014014,
0x6000e1ce, 0x0) at 0xd01f4e34
.() at 0x10000b14
APARs IR43682 and IR44419 were opened for the V420 Firewall, but the
stack entries in the dbx output above are not the same as in IR44419.
Also, the customer's system is at the v4.1.2 level, so a v412-level
fix is needed instead of v414.
PROBLEM SUMMARY:
fwlogd dumped core with a segmentation fault in strlen(). This was
addressed on January 8th of this year and delivered as an eFix. The
code patch has been integrated into all levels of the Firewall and
will be available in future releases.
------
APAR: IR44527 COMPID: 5697F4800 REL: 410
ABSTRACT: LESS THAN OR EQUAL TO RULE GOES TO ANY WHEN ENTERING RULE
PROBLEM DESCRIPTION:
When a rule is created with the "less than or equal to" operation
and then reopened after saving, the operation has changed to
"any". This happens on 4.1.X and 4.2, on both AIX
and NT.
LOCAL FIX:
To get around this, we are currently using just "less than".
PROBLEM SUMMARY:
An improperly terminated string caused the selection
of the <= operation not to match.
PROBLEM CONCLUSION:
Fixed the comparator to properly terminate the comparison
string.
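An illustrative sketch (not the firewall's code) of why termination matters here: the operator text is copied out of a larger field and explicitly NUL-terminated before strcmp(), so "<=" still matches when other bytes follow it in the field. Without the terminator, strcmp() would read the leftover bytes and the match would fail, falling back to "any". The enum values and parse_op() are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical operator codes; the product's real names are not known. */
enum rule_op { OP_ANY, OP_LT, OP_LE };

/* Copies the first len bytes of an operator token from a larger buffer
 * (for example a GUI field) and terminates the copy before comparing. */
static enum rule_op parse_op(const char *field, size_t len)
{
    char op[3];                               /* room for "<=" plus '\0' */
    size_t n = len < sizeof op - 1 ? len : sizeof op - 1;
    memcpy(op, field, n);
    op[n] = '\0';                             /* the termination the fix adds */

    if (strcmp(op, "<=") == 0) return OP_LE;
    if (strcmp(op, "<") == 0)  return OP_LT;
    return OP_ANY;                            /* unmatched text falls back to "any" */
}
```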
------
APAR: IR44558 COMPID: 5697F4800 REL: 410
ABSTRACT: PING NO RESPONSE
PROBLEM DESCRIPTION:
The customer issues ping -s 8 <firewall hostname> from a
remote system (Windows NT, AIX, etc.), but gets no response.
The problem occurs whenever the size specified is 8, 7, 6, 5, 4,
3, 2, or 1, and it is reproducible every time.
As a result, he cannot load-balance the firewall, because
the load-balancing product uses ping with a packet size
smaller than 8 bytes.
This problem occurs only on Firewall V4.2.
PROBLEM SUMMARY:
ICMP packets smaller than 20 bytes were not forwarded.
PROBLEM CONCLUSION:
The source for the module that performed stack switching was not
MP-safe, so once the ICMP header check was fixed, running this
module on an MP machine exposed the stack-switching weakness.
Both problems have now been rectified.
------
APAR: IR44627 COMPID: 5697F4800 REL: 410
ABSTRACT: MISLEADING ERROR "UNABLE TO ACTIVATE FUNTION %S."
PROBLEM DESCRIPTION:
When the socks server failed to start, the socks server name was
not copied into the error message from filters.
PROBLEM SUMMARY:
A meaningless error message was displayed when the socks server
did not start. This delayed addressing the actual problem, since it
was unclear which module was failing to activate.
PROBLEM CONCLUSION:
Fixed the path used to pass the function name.
------
APAR: IR44963 COMPID: 5697F4800 REL: 410
ABSTRACT: REDESIGN TO REMOVE MEMORY-LIMIT ON RULES PER CONNECTION
PROBLEM DESCRIPTION:
A message is displayed stating that the user has exceeded the
127-rule limit.
PROBLEM SUMMARY:
The memory allocated per connection limited the number of rules
per connection to 127.
PROBLEM CONCLUSION:
Changed the filter implementation to grow the memory per
connection so that all requested rules are allowed.
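A minimal sketch of growing per-connection storage on demand, assuming the rules can be represented as an array of IDs: the array is reallocated with doubled capacity whenever it is full, so there is no fixed ceiling such as 127. struct rule_list and rule_list_add() are hypothetical; the APAR does not describe the filter's actual data structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-connection rule list with no fixed upper bound. */
struct rule_list {
    int    *rule_ids;
    size_t  count;
    size_t  capacity;
};

/* Appends a rule ID, doubling the allocation when it is full.
 * Returns 0 on success, -1 if memory could not be obtained
 * (the existing array is left intact in that case). */
static int rule_list_add(struct rule_list *rl, int rule_id)
{
    if (rl->count == rl->capacity) {
        size_t new_cap = rl->capacity ? rl->capacity * 2 : 16;
        int *p = realloc(rl->rule_ids, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        rl->rule_ids = p;
        rl->capacity = new_cap;
    }
    rl->rule_ids[rl->count++] = rule_id;
    return 0;
}
```

Doubling keeps the average cost of an append constant; the trade-off is that up to half of the allocation may be unused.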
------
APAR: IR45250 COMPID: 5697F4800 REL: 410
ABSTRACT: INSUFFICIENT ROUTES FOR SOCKS CONFIGS
PROBLEM DESCRIPTION:
Using socks from a subnet on the secure side, the client was unable
to get FTP connectivity even though his traffic was reaching
the firewall and his filters were correct. The socks configuration
was missing routes for the client networks.
PROBLEM SUMMARY:
The socks configuration did not contain routes to networks that
are not contiguous to the firewall, so packets were not forwarded.
PROBLEM CONCLUSION:
Added support so that all routes on the machine are added to
the socks configurations.
------
APAR: IR45519 COMPID: 5697F4800 REL: 410
ABSTRACT: CRASH IN FIREWALL 4.1 FILTER4
PROBLEM DESCRIPTION:
System crash with 888-102-300-0c0.
Dump analysis shows the following...
> t -mk
Skipping first MST
MST STACK TRACE:
0x0044beb0 (excpt=00000088:0a000000:00000000:00000088:00000106)
(intpri=0)
IAR: .dispatch+76c (00025088): sth r0,0x88(r24)
LR: .dispatch+5c0 (00024edc)
0044be78: call_dispatch_point+4 (00028a68)
0x0044ceb0 (excpt=00000000:00000000:00000000:00000000:00000000)
(intpri=0)
IAR: .e_block_thread+27c (000343d4): addi r3,0xf8(r31)
LR: .e_block_thread+27c (000343d4)
31023e88: .e_sleep_thread+4c (00034af0)
31023ed8: .[filter4:fltr_in_chk]+7c (01649584)
31023f28: .[netinet:ipintr_noqueue2]+44c (05201198)
31023fc8: .[netinet:in_newstack]+24 (0521659c)
0044c9f0: .[netinet:in_flip_and_run]+54 (05201e18)
0044ca40: .[netinet:dogisr]+70 (05200b9c)
0044caa0: .dmx_8022_receive+380 (000baad4)
0044cb20: .[tok_demux:tok_receive]+21c (015cc6c4)
0044cbc0: .[stokdd:stok_rx]+5e8 (015c4658)
0044cc60: .[stokdd:rw_intr]+298 (015c04a4)
0044ccd0: .[stokdd:stok_intr]+444 (015c0fd0)
0044cd80: .i_poll_soft+e0 (0001c954)
0044cde0: .i_softmod+140 (0001c2c0)
0044ce70: flih_603_patch+cc (000289d4)
PROBLEM SUMMARY:
crash in fbetbl_expand
PROBLEM CONCLUSION:
corrected table growth algorithm to allow
table to expand quickly to handle connections.
------
APAR: IR45851 COMPID: 5697F4800 REL: 410
ABSTRACT: SOCKS SERVICES BEING OMITTED AFTER REGEN OF RULES
PROBLEM DESCRIPTION:
socks rules that were identical up to the last digit were being
omitted as duplicates.
PROBLEM SUMMARY:
socks rules that matched up to end of short
rule were dropped as duplicates.
PROBLEM CONCLUSION:
Fixed the configuration code to compare rules all the way to the end.
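A sketch of the bug pattern described above, using hypothetical rule strings: comparing only up to the length of the shorter rule treats two rules that differ only in a trailing digit as duplicates, whereas comparing both strings to the end does not. The function names are illustrative, not taken from the firewall code.

```c
#include <assert.h>
#include <string.h>

/* Buggy pattern: compares only as many characters as the shorter rule
 * has, so "... 10.1.1.1" and "... 10.1.1.12" look identical. */
static int is_duplicate_prefix_only(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = la < lb ? la : lb;
    return strncmp(a, b, n) == 0;
}

/* Fixed pattern: two rules are duplicates only if they match to the end. */
static int is_duplicate(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```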
------
APAR: IR45987 COMPID: 5697F4800 REL: 410
ABSTRACT: CRASH IN FILTER4 AT E_SLEEP_THREAD
PROBLEM DESCRIPTION:
system crash on SMP machine at e_sleep_thread
PROBLEM SUMMARY:
Filter4 crashed in situations where separate processors were
contending for a thread (this crash could occur only on an SMP
machine). The stack trace showed this contention in
e_sleep_thread.
PROBLEM CONCLUSION:
changed locking model to use thread safe
mechanisms.
------
APAR: IR46076 COMPID: 5697F4800 REL: 410
ABSTRACT: NAT LOG ENTRIES SHOW ICA0021W: LOG MONITOR - MISFORMATTED LOG DA
PROBLEM DESCRIPTION:
NAT logging entries are showing up as "misformatted log entries"
PROBLEM SUMMARY:
system calls left unexpected data in buffers
that were used to relay NAT events to log.
PROBLEM CONCLUSION:
Changed the catalog generation path to bypass the
code that was sampling unexpected data.
------
APAR: IR46230 COMPID: 5697F4800 REL: 410
ABSTRACT: FOPEN ON /DEV/NULL FAILED,ERRNO 24
PROBLEM DESCRIPTION:
fopen on /dev/null failed with errno 24 (EMFILE).
This indicated that the open-files limit had been reached, but the
real problem was that file handles were opened even when they were
unnecessary.
PROBLEM SUMMARY:
handles were being left open and causing errors
at rule generation
PROBLEM CONCLUSION:
Changed the code logic to close unnecessary handles.
TEMPORARY FIX:
setting nofiles=-1 in /etc/security/limits.
------
APAR: IR46301 COMPID: 5697F4800 REL: 410
ABSTRACT: GWAUTH CORE DUMPS WITH ERRPT OUTPUT
PROBLEM DESCRIPTION:
gwauth indicated as SOFTWARE PROGRAM ABNORMALLY TERMINATED
PROBLEM SUMMARY:
improper list of parameters caused core
in gwauth with strlen 80
PROBLEM CONCLUSION:
fixed parameter list in code.
------
APAR: IR46789 COMPID: 5697F4800 REL: 410
ABSTRACT: FWFILTER CMD=UPDATE FAILS IF USING MORE THAN 9 DEFINED ADAPTERS
PROBLEM DESCRIPTION:
If more than 9 adapters are used, fwfilter cmd=update fails.
The error message is "Setup for specific apdapter fails -
specific(en5,en9". If the fwfilter command fails, the firewall does
not work properly. The firewall is installed with the following fixes:
fwaixutils420.tar, fwaixfilter420.tar, fwaixcfgcli420.tar.
PROBLEM SUMMARY:
updating filters was failing if one of the
adapter names was longer than three characters.
PROBLEM CONCLUSION:
buffer to hold names was enlarged to allow
longer names.
------
APAR: IR46790 COMPID: 5697F4800 REL: 410
ABSTRACT: FWUSER GIVES CORE WHEN NO PASSWORD SUPPLIED
PROBLEM DESCRIPTION:
Entering a command to add a user that requires a password
produces a core dump instead of an error message.
Example:
fwuser cmd=add secftp=password
PROBLEM SUMMARY:
Recent AIX maintenance has changed the way AIX user structures
are initialized, so a core dump occurs when an uninitialized
structure is accessed.
PROBLEM CONCLUSION:
initialized variables before setting so that
code will work with all known maintenance for AIX.
TEMPORARY FIX:
use proper syntax (include password parameter in
initial fwuser command).
------
APAR: IR46793 COMPID: 5697F4800 REL: 410
ABSTRACT: TIGHTENING BUFFER CHECKING OF TELNETD
PROBLEM DESCRIPTION:
Within every BSD derived telnet daemon under UNIX the telnet
options are processed by the 'telrcv' function. This function
parses the options according to the telnet protocol and its
internal state. During this parsing, the results that should
be sent back to the client are stored in the 'netobuf'
buffer. This is done without any bounds checking, since it
is assumed that the reply data is smaller than the buffer size
(which is BUFSIZ bytes, usually).
PROBLEM SUMMARY:
possible exposure described in IY22029
PROBLEM CONCLUSION:
Added the necessary limit checks on the "netobuf" buffer to the
proxy telnet code to avoid the buffer overflow related to
IY22029.
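A minimal sketch of the bounds check described above: before reply bytes are appended to netobuf, the remaining space is compared with the length to be written, and data that would not fit is refused. The BUFSIZ size matches the description; the helper netobuf_append() and the counter netobuf_used are illustrative names, not the actual telnetd or proxy code.

```c
#include <assert.h>
#include <stdio.h>    /* BUFSIZ */
#include <string.h>

static char netobuf[BUFSIZ];      /* reply buffer, BUFSIZ bytes as described */
static size_t netobuf_used = 0;   /* bytes currently queued for the client */

/* Appends len bytes of reply data only if they fit.
 * Returns 0 on success, -1 if the data would overflow netobuf. */
static int netobuf_append(const unsigned char *data, size_t len)
{
    if (len > sizeof netobuf - netobuf_used)
        return -1;                /* the check that was missing */
    memcpy(netobuf + netobuf_used, data, len);
    netobuf_used += len;
    return 0;
}
```

Usage: a three-byte reply such as IAC WILL ECHO (255, 251, 1) is queued normally, while an oversized reply is rejected without touching memory beyond the buffer.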
------
APAR: IR46935 COMPID: 5697F4800 REL: 410
ABSTRACT: NAT MANYTO1 IS FAILING TO TRANSLATE MACHINES IN D'STREAM NETWORK
PROBLEM DESCRIPTION:
The customer has three networks behind the firewall: a 100.1.x.x
network directly behind the firewall secure adapter, a 222.2.x.x
network downstream from the 100.1.x.x network, and a 10.x.x.x
network downstream from the 222.2.x.x network.
The customer has a NAT many-to-one setup, and all networks work
except the 10.x.x.x network. If he uses a NAT map for any machine
in this network, NAT translates the address, but the customer's
trace shows that the 10.x.x.x network is not being translated
correctly.
PROBLEM SUMMARY:
NAT many-to-one is failing to translate.
MAP entries WILL work.
PROBLEM CONCLUSION:
The comparison failed when the difference between the
downstream subnet and the contiguous subnet was too large
(222 - 10 in this case). Fixed the comparison code to compare
addresses properly.
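The APAR does not show the faulty comparison, so the following is only a sketch of a comparison that cannot fail that way: subnet membership is tested by masking unsigned 32-bit addresses, which does not depend on how far apart the network numbers are (222 versus 10). It assumes host-byte-order addresses and CIDR prefix lengths; ipv4() and in_subnet() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Builds a host-order IPv4 address from its four dotted-quad parts. */
static uint32_t ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Returns 1 if addr lies in net/prefix_len, comparing masked unsigned
 * values rather than computing a difference between addresses. */
static int in_subnet(uint32_t addr, uint32_t net, unsigned prefix_len)
{
    /* A prefix of 0 is special-cased: shifting a 32-bit value by 32 is undefined. */
    uint32_t mask = prefix_len == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix_len);
    return (addr & mask) == (net & mask);
}
```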
------
APAR: IR46943 COMPID: 5697F4800 REL: 410
ABSTRACT: MAIL IS RELAYED WHEN !,%, AND " ARE PART OF THE EMAIL ADDRESS
PROBLEM DESCRIPTION:
Mail is relayed when !, %, and " are part of the email address.
For example, the following patterns are not blocked (confirmed by
running tests):
relay%orbz.org@publicdomain.com
"relay@orbz.org"@publicdomain.com
orbz.org!relay@publicdomain.com
PROBLEM SUMMARY:
Inbound mail was relayed when domain chaining was simulated
using !, %, and ".
PROBLEM CONCLUSION:
Added checks for other domain separators in the local part.
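A sketch of the check described in the conclusion, assuming the local part is everything before the last '@': the address is flagged if its local part contains '!' (bang path), '%' (percent hack), or an '@', which can appear there only inside a quoted string. local_part_has_route() is a hypothetical name, and a real mail filter would need complete address parsing.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the local part of addr (the text before the last '@')
 * contains a character that can chain to another domain, else 0. */
static int local_part_has_route(const char *addr)
{
    const char *at = strrchr(addr, '@');
    size_t len = at ? (size_t)(at - addr) : strlen(addr);
    for (size_t i = 0; i < len; i++) {
        if (addr[i] == '!' || addr[i] == '%' || addr[i] == '@')
            return 1;
    }
    return 0;
}
```

The three patterns from the problem description are all flagged, while an ordinary address is not.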
------
APAR: IR46951 COMPID: 5697F4800 REL: 410
ABSTRACT: LOGMONITOR THRESHOLDS FAIL FOR CERTAIN NAT ICA TAGS
PROBLEM DESCRIPTION:
LogMonitor thresholds fail for certain NAT ICA tags
PROBLEM CONCLUSION:
Changed to properly update thresholds for all NAT log entries.
------
APAR: IR47129 COMPID: 5697F4800 REL: 410
ABSTRACT: AFTER UPGRADE FROM R420 TO R421 FIREWALL WILL CRASH THE SYSTEM A
PROBLEM DESCRIPTION:
The system crashes during reboot with a DSI in
netinet in_pcbhashlookup2.
PROBLEM SUMMARY:
firewall crashes on reboot when IKE tunnels
present
------
APAR: IY15176 COMPID: 5765E5100 REL: 600
ABSTRACT: WRONG IP_TOS FIELD IS SET BY AIX/CS V6 WHEN RUNNING EE
PROBLEM DESCRIPTION:
When an Enterprise Extender (EE) link station is started, the TCP/IP
handshaking starts via port 12000. AIX/CS incorrectly sets the
ip_tos field to 6 (ip_tos=6). The field is used for flow control
and should be set according to the following table:
APPN Transmission Priority | Type of Service | Destination UDP Port
LOW                        | X'20'           | 12004
MEDIUM                     | X'40'           | 12003
HIGH                       | X'80'           | 12002
NETWORK                    | X'C0'           | 12001
XID, TEST, DM, DISC        | X'C0'           | 12000
According to the Redbook 'Migrating Subarea Networks'
(SG24-5957), Chapter 5.1.2, page 67.
PROBLEM SUMMARY:
Cannot establish an HPR/IP (Enterprise Extender, or EE) connection
across a firewall that checks the ip_tos parameter in the IP header.
PROBLEM CONCLUSION:
Correctly set this parameter to hex 20,40,80 or C0.
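The mapping in the table above can be expressed as a lookup table. This C sketch only restates the table's values; the enum and struct names are hypothetical, and PRI_SIGNALLING is an invented label for the XID, TEST, DM, DISC row.

```c
#include <assert.h>
#include <stdint.h>

/* APPN transmission priorities used by Enterprise Extender traffic. */
enum appn_priority { PRI_LOW, PRI_MEDIUM, PRI_HIGH, PRI_NETWORK, PRI_SIGNALLING };

struct ee_mapping {
    uint8_t  tos;       /* value for the IP Type of Service byte */
    uint16_t udp_port;  /* destination UDP port */
};

/* Values copied from the table; none of them is the incorrect 6. */
static const struct ee_mapping ee_table[] = {
    [PRI_LOW]        = { 0x20, 12004 },
    [PRI_MEDIUM]     = { 0x40, 12003 },
    [PRI_HIGH]       = { 0x80, 12002 },
    [PRI_NETWORK]    = { 0xC0, 12001 },
    [PRI_SIGNALLING] = { 0xC0, 12000 },
};
```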
------
APAR: IY15384 COMPID: 5765E5100 REL: 600
ABSTRACT: SYSTEM CRASH WITH SNA_V5ROUTER CODE. THE PROBLEM IS ENCOUNTERED
PROBLEM DESCRIPTION:
AIX system crash at
. sna_v5router:vba_account_buffer_out +68 (053104cc): twllti r6,0x200
The xmalloc -u and svmon output prior to this crash shows that
the crash occurred when more LU0 connections were being
established after the overflow heap allocation of the AIX system
was nearly used up.
PROBLEM SUMMARY:
Crash in vba_mblk_to_buf_copy from nru_bind or vba_ips_getb from
nru_bind after severe memory and buffer shortages.
PROBLEM CONCLUSION:
Correct code to handle this error case.
------
APAR: IY15762 COMPID: 5765E5100 REL: 600
ABSTRACT: BUFFER CONGESTION ERROR.
PROBLEM DESCRIPTION:
The server's performance degraded, and
'sna.err' shows many 'buffer congestion error' entries.
There is a bug in CS/AIX if set_kernel_memory_limit and
set_buffer_availability are explicitly configured as 0 in
sna_node.cfg: CS/AIX incorrectly calculates the available
memory and falsely detects memory shortages.
As a result, the problem occurs even when traffic is very light.
PROBLEM SUMMARY:
Buffer shortage after setting config parameters to 0 and modest
memory usage.
PROBLEM CONCLUSION:
Treat 0 as if the set commands were not present; continue to allow
non-zero values.
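A sketch of the corrected rule, with hypothetical names: a parameter configured as 0 is treated as if it had not been set, so the built-in default applies, while non-zero values are used as configured. The actual CS/AIX defaults for set_kernel_memory_limit and set_buffer_availability are not given here, so the default is passed in as a parameter.

```c
#include <assert.h>

/* Resolves a configured limit: an absent parameter or an explicit 0
 * both yield default_value; any other value is honoured unchanged. */
static unsigned long effective_limit(int is_set, unsigned long configured,
                                     unsigned long default_value)
{
    if (!is_set || configured == 0)
        return default_value;
    return configured;
}
```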
------
APAR: IY15945 COMPID: 5765B7300 REL: 520
ABSTRACT: CUMULATIVE MAINTENANCE #02 (CSD02) FOR MQSERIES FOR AIX V5.2
PROBLEM DESCRIPTION:
Service offering #02 (CSD02) for MQSeries for AIX V5.2 provides
fixes for the problems described in this APAR.
PROBLEM SUMMARY:
Service offering #02 (CSD02) includes fixes to the following
problems:
(to be added later)
PROBLEM CONCLUSION:
Cumulative maintenance #02.
------
APAR: IY15990 COMPID: 5765E5100 REL: 600
ABSTRACT: AIX/CS V6 GENERATES WRONG TCPIP OPTION FIELD
PROBLEM DESCRIPTION:
The IP packet generated by Enterprise Extender is wrong: the EE code
puts 00000006 in the IP header options field, which is invalid
according to RFC 791.
Here is the relevant trace output of an EE link station setup:
TOK: ==( 163 bytes transmitted on interface tr0 )== 11:07:52.03
TOK: 00000000 00401000 5a4f9acf 0020357a b401aaaa
TOK: 00000010 03000000 08004606 008db5c8 00001e11
TOK: 00000020 cb120927 00730927 07b90000 00062ee0
The IP header starts with 4606.., indicating IPv4 and a header length
of six 4-byte words. The last 4-byte word holds the IP options,
00000006. Because there are no IP options in this case, this field
should be x'00000000', or better, the IP header should start
with 4506.., indicating a header of only five 4-byte words.
The customer's firewall software does not forward an IP packet with
such an invalid options field.
PROBLEM SUMMARY:
Cannot establish HPR/IP (Enterprise Extender) connection through
a bridge because of an invalid options field in the IP header of
the packets sent by CS/AIX.
PROBLEM CONCLUSION:
Removed the programming of the invalid TCP option (HPR/IP uses UDP).
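The header in the trace above can be decoded byte by byte to confirm this analysis. The sketch parses the IP datagram starting at the 46 06 that follows the SNAP header: the low nibble of the first byte gives a header length of 6 words (24 bytes), the second byte (06) is the Type of Service value, consistent with the ip_tos=6 reported in IY15176, the protocol is 17 (UDP), the extra word holds 00000006, and the UDP source port is 12000. The struct and function names are illustrative; only the byte values come from the trace.

```c
#include <assert.h>
#include <stdint.h>

/* First 26 bytes of the IP datagram in the trace: a 24-byte IP header
 * followed by the 2-byte UDP source port. */
static const uint8_t pkt[] = {
    0x46, 0x06, 0x00, 0x8d, 0xb5, 0xc8, 0x00, 0x00, 0x1e, 0x11, 0xcb, 0x12,
    0x09, 0x27, 0x00, 0x73, 0x09, 0x27, 0x07, 0xb9, 0x00, 0x00, 0x00, 0x06,
    0x2e, 0xe0
};

struct ip_summary {
    unsigned version, ihl, header_bytes, tos, protocol;
    uint32_t options;            /* first options word, 0 if IHL is 5 */
    unsigned payload_first_u16;  /* first 16 bits after the header */
};

static struct ip_summary summarize(const uint8_t *p)
{
    struct ip_summary s;
    s.version = p[0] >> 4;            /* high nibble: IP version */
    s.ihl = p[0] & 0x0f;              /* low nibble: header length in 32-bit words */
    s.header_bytes = s.ihl * 4;
    s.tos = p[1];
    s.protocol = p[9];                /* 17 = UDP */
    s.options = s.ihl > 5
        ? ((uint32_t)p[20] << 24 | (uint32_t)p[21] << 16 | (uint32_t)p[22] << 8 | p[23])
        : 0;
    s.payload_first_u16 = (unsigned)(p[s.header_bytes] << 8 | p[s.header_bytes + 1]);
    return s;
}
```

With a correct header starting 45.., the same code would report a 20-byte header, no options word, and the UDP port immediately after byte 19.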
------
APAR: IY17118 COMPID: 5765B7300 REL: 520
ABSTRACT: MISMATCH IN DATA SENT BY CICS MOVER TO MQSERIES 5.2
PROBLEM DESCRIPTION:
When the MQSeries MVS CICS mover tries to send messages to MQSeries
for AIX 5.2 after the channel times out, it gets a resync-rejected error.
PROBLEM CONCLUSION:
This problem has been fixed and the fix will be shipped in PTF
U474779 for CSD02.
------
APAR: IY17132 COMPID: 5765E5100 REL: 600
ABSTRACT: REPEATED PF8 (SCROLL) KEY RESULTS IN DFH2001 MSG FROM THE MAINFR
PROBLEM DESCRIPTION:
Repeated PF8 key (scroll down) results in a DFH2001 message from
the mainframe and user loses session.
PROBLEM SUMMARY:
tn3270 client received error message DFH2001 from CICS because
the CS/AIX TN server sent data without CD. Session may be
taken down.
PROBLEM CONCLUSION:
Corrected the handling of a race condition in which multiple user AID
sequences are queued at the TN server waiting for direction following
a keyboard restore without CD from the host program.
------
APAR: IY17428 COMPID: 5765D5100 REL: 330
ABSTRACT: DCR FOR BLOCKING MPI_SENDS AND RECEIVES TO DECREASE CPU USE.
PROBLEM DESCRIPTION:
In an application using blocking MPI sends and receives, there
is a great deal of CPU use associated with the send or receive
waiting for completion. The customer complains that this is
not necessary. POE could be modified so that the customer can
choose whether MPI sends and receives work as currently designed
or in a new way that uses much less CPU. Of course, this could
affect the latency or response time of the send or receive, and
therefore it should be an option. In some cases, overall throughput
or performance could be significantly improved by using the new
option.
PROBLEM SUMMARY:
Support MP_WAIT_MODE=NOPOLL in PSSP 3.3,
as requested by the customer in a DCR.
PROBLEM CONCLUSION:
Document the availability of MP_WAIT_MODE=NOPOLL.
It is documented in the ssp.css.README.
POE APAR IY23338 is also required to enable the
MP_WAIT_MODE=NOPOLL support. IY23338 is shipped
in service level ppe.poe 3.1.0.13 or later.
------
APAR: IY17482 COMPID: 5765E5100 REL: 600
ABSTRACT: UNABLE TO RE-ESTABLISH LU2 SESSION.
PROBLEM DESCRIPTION:
The user is able to log in to an LU2 session and work. After logging
off, the user is later unable to establish another LU2 session.
PROBLEM SUMMARY:
LU1-3 application hangs up. Trace shows CONFIRMED not responded
to (although no data received from mainframe).
PROBLEM CONCLUSION:
Reject CONFIRMED in this state with SNA_STATE.
------
APAR: IY17599 COMPID: 5765D2000 REL: 500
ABSTRACT: CRASH IN SNA_V5ROUTER WITH IX85570 APPLIED.
PROBLEM DESCRIPTION:
MST STACK TRACE:
0x2ff402e0 (excpt=40404096:40000000:007fffff:40404096:00000106)
(intpri=11)
IAR: .[sna_v5router:nrm_rm_timer_deact_sess_proc]+90 (092da640): lhz r3,0x56(r4)
LR: .[sna_v5router:nrm_rm_timer_deact_sess_proc]+28 (092da5d8)
2efa2a08: .[sna_v5router:nrm_init_to_rm_rec]+88 (092df698)
2efa2a48: .[sna_v5router:nrm_queue_handler]+b8 (092e03cc)
2efa2a88: .[sna_v5router:nba_dispatch_input]+290 (09156af0)
2efa2ae8: .[sna_v5router:nba_dispatch_process]+c8 (091571c4)
2efa2b38: .[sna_v5router:nba_scheduler]+200 (091579b4)
2efa2b98: .[sna_v5router:vpr_stream_uw_drive_scheduler]+2c (0914b590)
PROBLEM SUMMARY:
0x2ff402e0 (excpt=40404096:40000000:007fffff:40404096:00000106)
(intpri=11)
IAR: . sna_v5router:nrm_rm_timer_deact_sess_proc +90 (092da640): lhz r3,0x56(r4)
LR: . sna_v5router:nrm_rm_timer_deact_sess_proc +28 (092da5d8)
2efa2a08: . sna_v5router:nrm_init_to_rm_rec +88 (092df698)
PROBLEM CONCLUSION:
The crash occurs under stress, with a lot of APPC conversations
running every second. An internal timer expired to deactivate a
session but found the incorrect control block. I have corrected
the code to prevent this from happening.
------
APAR: IY17832 COMPID: 5765E5100 REL: 600
ABSTRACT: SNAADMIN COMMAND APPEARS TO HANG
PROBLEM DESCRIPTION:
The customer is experiencing a hang condition on the snaadmin
command when he issues the following command:
snaadmin add_dlc_trace ,resource_type=LS ,resource_name=ELGARC5R
The command executes but never returns to a prompt.
PROBLEM SUMMARY:
Snaadmin query commands fail after a period of time, when
Anynet is in the configuration.
PROBLEM CONCLUSION:
Correctly discard indication buffer in Anynet code.
------
APAR: IY17866 COMPID: 5765E5100 REL: 600
ABSTRACT: SNAADMIN HANG
PROBLEM DESCRIPTION:
snaadmin commands hang if they are issued after node_init.
PROBLEM SUMMARY:
snaadmin status_all fails to complete; in fact, query_downstream_pu
fails to complete when ACTPU has been rejected.
PROBLEM CONCLUSION:
Alter handling of ACTPU -ve RSP to clear internal data
structures correctly.
------
APAR: IY17925 COMPID: 5765E5100 REL: 600
ABSTRACT: CRASH IN V6 SNA_TRACE.
PROBLEM DESCRIPTION:
>trace -m
Skipping first MST
MST STACK TRACE:
0x2ff3b400 (excpt=3549144b:42000000:00008811:3549144b:00000106)
(intpri=0)
IAR: .[sna_trace:sixt_end_trace]+90 (0361fcac): stb r3,0x8(r)
LR: .[sna_trace:sixt_end_trace]+34 (0361fc50)
2ff3a720: .[sna_v5router:vll_router_write_log_to_buf]+aa8 (0365c1ac)
2ff3a7c0: .[sna_v5router:vlm_write_log]+48 (0365b6d0)
2ff3a800: .[sna_v5router:nba_pd_print_var]+1524 (03663cd4)
2ff3ab20: .[sna_v5router:nba_pd_print]+638 (03662778)
2ff3acc0: .[sna_v5router:nba_pd_print_problem]+58 (03663fd8)
2ff3ad00: .[sna_v5router:vns_send_verb_to_cfg_daemon]+b0 (03681238)
PROBLEM SUMMARY:
The crash is due to a timing window during termination: we try to
log an error just as SNA is being stopped. It is possibly caused by
a sequence of repeated starts and stops.
PROBLEM CONCLUSION:
Protect code to prevent trying to make log / trace after
termination.
------
APAR: IY18065 COMPID: 5765D2000 REL: 504
ABSTRACT: SYSTEM HANGS COMPLETELY WITHOUT LED CODE
PROBLEM DESCRIPTION:
System hangs completely without LED Code.
PROBLEM SUMMARY:
System hang (single CPU); a manual dump shows looping in
vannx_parse_locate from vannx_locate_reply.
PROBLEM CONCLUSION:
Corrected handling of unusual format locate.
------
APAR: IY18201 COMPID: 5765B7300 REL: 520
ABSTRACT: RUNMQCHL_ND CORE DUMPS AFTER CONVERSATION ENDS
PROBLEM DESCRIPTION:
With MQSeries for AIX 5.2, the runmqchl_nd process is started, and
transmission goes through AIX SNA Comms Server 6.0. When the
runmqchl_nd process ends after messages have been sent, it produces
a core dump. This is because the Comms Server library is released
and a runmqchl_nd process thread becomes unstable.
PROBLEM CONCLUSION:
This problem has been fixed and the fix will be shipped in PTF
U474779 for CSD03.
------
APAR: IY18202 COMPID: 5765D2000 REL: 504
ABSTRACT: AFTP FROM MVS TO AIX FAIL IF THE FILE IS LARGER THAN 2GB
PROBLEM DESCRIPTION:
When I receive a large data set
(more than 3GB in size) from MVS, the aftp client (AIX) displays the
error message "The specified device or disk is full" and
stops, although there is enough free space in that file system.
When I inspect the received file, aftp always stops at the
same size (2147483647 bytes). bf=true is set.
The code should be compiled with the _LARGE_FILE definition.
PROBLEM SUMMARY:
The fix was to recompile with the large file option enabled.
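The size at which aftp stops, 2147483647 bytes, is 2^31 - 1, the largest value a signed 32-bit file offset can hold. A 32-bit AIX program built without large-file support uses a 32-bit off_t, so it cannot write past this point; on AIX, large-file support for such programs is normally enabled by defining the _LARGE_FILES macro. The sketch below only checks that arithmetic; the helper fits_32bit_offset() is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a file of file_size bytes can be addressed entirely
 * with a signed 32-bit offset (maximum 2147483647), else 0. */
static int fits_32bit_offset(int64_t file_size)
{
    return file_size <= INT32_MAX;
}
```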
------
APAR: IY18318 COMPID: 5765D2000 REL: 500
ABSTRACT: CRASH IN SNA_V5ROUTER STARTING A LINK.
PROBLEM DESCRIPTION:
MST STACK TRACE:
0xf0000000 (excpt=40404220:40000000:007fffff:40404220:00000106)
(intpri=11)
IAR: .[sna_v5router:ncs_notify_active]+28c (05123110): lhz r4,0x1e0(r3)
LR: .[sna_v5router:ncs_fsm_ls_int]+cc8 (05124d08)
2ef62a18: .[sna_v5router:ncs_fsm_ls_int]+cc8 (05124d08)
2ef62ab8: .[sna_v5router:ncs_fsm_ls_ext]+f18 (0511d4c0)
2ef62b48: .[sna_v5router:ncs_dlc_to_cs_signals]+1d8 (0511ac50)
PROBLEM SUMMARY:
Crash in sna_v5router.
PROBLEM CONCLUSION:
Correct code to clear record of tg_number that is not used from
a failed link activation.
------
APAR: IY18511 COMPID: 5765D2000 REL: 500
ABSTRACT: CRASH IN SNA_V5ROUTER
PROBLEM DESCRIPTION:
The customer is receiving a number of back-level APPC conversations;
some of these are timing out (60 seconds), presumably because the
application is not running or is very slow. Subsequently, an
application tries to receive a conversation (perhaps for this same
TP) and fails; the application retries very quickly, and eventually
CS/AIX picks up a reused control block and crashes.
LOCAL FIX:
change config or application to prevent timeouts
PROBLEM SUMMARY:
A rare crash occurs if a back-level application issues ALLOCATE
(accept) a long time after a dynamic load for that application has
already timed out.
PROBLEM CONCLUSION:
The problem can be avoided by correctly configuring the timeout for
loading the application. A fix was made to close the window by
checking the internal control block before trying to process the
internal RECEIVE_ALLOCATE message.
------
APAR: IY18712 COMPID: 5765E5100 REL: 600
ABSTRACT: "DON'T CHECK REMOTE NODE NAME" WITH "DON'T SEND LOCAL NODE NAME"
PROBLEM DESCRIPTION:
When "don't check remote node name" and "don't send local node
name" are both set while configuring a link station with the
xsnaadmin tool, error messages 4097-7 and 4097-0 appear in sna.err.
LOCAL FIX:
Only selecting "don't check remote node name" without "don't
send local node name" works.
PROBLEM SUMMARY:
Cannot modify ls_attributes (such as don't send local node name)
on an existing ls. Pop-up in xsnaadmin and error log 4097-7.
PROBLEM CONCLUSION:
Corrected the parsing code to compare the correct-length string in
the define_ls verb control block.
------
APAR: IY18932 COMPID: 5765B7300 REL: 520
ABSTRACT: MQCREATEBAG CAN FAIL WITH A SIGSEGV UNDER MQSERIES V5.2 ON UNIX
PROBLEM DESCRIPTION:
The mqCreateBag API call can fail under MQSeries v5.2 on
UNIX systems with a segmentation fault when called in the
context of a multithreaded process. When the parameters
passed to mqCreateBag are invalid it should instead return
MQRC_HBAG_ERROR.
LOCAL FIX:
A code change is required to correct this problem.
PROBLEM CONCLUSION:
This problem has been fixed and the fix will be shipped in the
following PTFs for CSD03:
A) MQSeries for V5.2 CSD03
AIX U478289
HP-UX (V10) U478290
HP-UX (V11) U478293
Sun Solaris U478291
Linux U478292
------
APAR: IY18957 COMPID: 5765E5100 REL: 600
ABSTRACT: MSG "FIELD PORT_NUMBER WAS SPECIFIED MORE THAN ONCE" WHEN
PROBLEM DESCRIPTION:
When trying to add a X.25 port, the msg
"Field port_number was specified more than once" is returned.
This happens even with the first and only port.
LOCAL FIX:
Use xsnaadmin to define the port
PROBLEM SUMMARY:
Cannot define QLLC port from smit. Get an error saying duplicate
port_number.
PROBLEM CONCLUSION:
Remove the prompt for port_number in CS/AIX (it is generated
automatically).
------
APAR: IY19120 COMPID: 5765E5100 REL: 600
ABSTRACT: 'SNA -D' PRODUCES INCORRECT OUTPUT UNDER JA_JP ENVIRONMENT.
PROBLEM DESCRIPTION:
After migrating to CS V6.0.1.0, the 'sna -d' command output
is incorrect in Japanese, Spanish, Portuguese, Korean, Chinese,
and Taiwanese.
PROBLEM SUMMARY:
In Japanese, the sna -d output is badly laid out, with vertical bars
all over the place and text wrapping round the end of lines.
PROBLEM CONCLUSION:
Corrected the translation process to remove the vertical bars
(which are delimiters in the catalog files).
TEMPORARY FIX:
Code update (ja_JP/sna.cat) supplied to customer
------
APAR: IY19187 COMPID: 5765E5100 REL: 600
ABSTRACT: SOME DEFUNCT PROCESSES WERE GENERATED.
PROBLEM DESCRIPTION:
Some defunct processes were generated when he issued the
"sna stop" command.
They did not disappear until he rebooted AIX.
PROBLEM SUMMARY:
Defunct processes seen when using Anynet and stopping node.
PROBLEM CONCLUSION:
Used setpinit to allow kproc processes to exit correctly.
------
APAR: IY19250 COMPID: 5765E5100 REL: 600
ABSTRACT: ALLOCATE_LISTEN TPS FAIL WITH 08460000.
PROBLEM DESCRIPTION:
ALLOCATE_LISTEN TPs fail with 08460000. The problem occurs
because the partner system is specifying TP names that
include 4 null characters at the end.
PROBLEM SUMMARY:
Incoming APPC attaches rejected 0864 when there are trailing
nulls on the TP name.
PROBLEM CONCLUSION:
Ignore trailing nulls on the TP name in an incoming attach, to allow a
match with allocate_listen and provide the same function as CS/AIX V4.
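A sketch of the matching rule described above, with hypothetical names: trailing NUL bytes in the incoming TP name are ignored before it is compared with the name registered by allocate_listen. The TP name PAYROLL is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a TP name field once trailing NUL bytes are ignored. */
static size_t tp_name_len(const char *name, size_t len)
{
    while (len > 0 && name[len - 1] == '\0')
        len--;
    return len;
}

/* Compares an incoming TP name of len bytes (possibly padded with NULs)
 * against a registered name, ignoring the trailing padding. */
static int tp_name_matches(const char *incoming, size_t len, const char *registered)
{
    size_t n = tp_name_len(incoming, len);
    return n == strlen(registered) && memcmp(incoming, registered, n) == 0;
}
```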
------
APAR: IY19436 COMPID: 5765D5100 REL: 320
ABSTRACT: RC.SP NOW FAILS TO NOTIFY ERRPT OF DISABLED PROCESSORS W/ BOOTUP
PROBLEM DESCRIPTION:
The error can be reproduced as follows, using Hardware Perspectives,
where the node notebook monitors processorsOffline and hostResponds.
Shut down and power off a multiprocessor node, such as a
Winterhawk-II or Nighthawk-II. Open a TTY, then use the SMS menu to
disable some of the processors (select 3, then select 8, then select
the number of the processor to toggle between Disabled and Enabled in
the configuration). Exit the SMS menu with selections 98 and then 99.
Power on the node (and select auto-unfence if the switch is used) and
wait until the Host Responds light (and the Switch Responds light, if
applicable) is green.
During this time, upon power off, the node notebook's Monitored
condition page (the last page) shows hostResponds going into a
"triggered" state, while processorsOffline goes from "not triggered"
to "unknown". When power is on and Host Responds is back up,
hostResponds goes back to a "not triggered" state, and
processorsOffline is ALSO BACK TO a "not triggered" state. At NO time
is processorsOffline triggered. This is because rc.sp did not detect
the disabled processors, since $SSP_INSTALL/procs_installed did NOT
detect them. As a consequence, the errpt was not updated with
SYSMAN001_ER. Because this specific error was missed, Perspectives
could not trigger processorsOffline.
LOCAL FIX:
NO FIX. The workaround is to use "lsdev -C | grep proc" to detect
which procN devices are installed (some systems have proc0, proc1,
proc4, proc5), then use "lsattr -El proc<specific # identified>" and
note whether it is in an "enable" or "disable" state.
PROBLEM SUMMARY:
Perspectives doesn't monitor the "processoroffline"
condition. When a multiprocessor node is rebooted after
a processor has been powered down, the display shows a "not
triggered" condition. It should say "triggered".
PROBLEM CONCLUSION:
The reboot script "rc.sp" has been changed to look for a
"processor offline" status. It had only been checking for
"processor installed", and missed the power down condition.
The code to do this has been relocated to a point in the
reboot process where the Event Manager has been activated.
It can then intercept error log messages that relate to
the node's processors.
------
APAR: IY19989 COMPID: 5765D5100 REL: 320
ABSTRACT: LARGE SYSTEM SSP 3.2 NODE INSTALLS YIELDS 10% CORRUPTED KEYFILES
PROBLEM DESCRIPTION:
There are two potential problems with regard to the Kerberos
keyfile being transmitted to the node during customization.
In the first case, get_keyfiles fails to transmit the
keyfile and the customization hangs at LED a69.
In the second case, the Kerberos keyfile is transmitted to
the node, but either a partial or a corrupted keyfile
is received on the node. In this situation, the node
completes customization successfully and it is not until
sometime later that the problem with the invalid Kerberos
keyfile is discovered.
LOCAL FIX:
It is not necessary to redo the install. The correct keyfiles
are already on the CWS. One can either re-customize the node
or copy the <node_name>-new-srvtab file from the CWS over to
the node (chmod 600).
PROBLEM CONCLUSION:
Code has been added to the customization process to verify
that the Kerberos keyfile has been transmitted successfully.
/spdata/sys1/k4srvtabs/<nodename>-new-srvtab-checksum
is created on the Kerberos Server for nodes at PSSP 3.2
or higher.
During customization, a tftpfile of this new file is done.
Then, after the srvtab has been transmitted over the s1term,
a cksum of the srvtab is computed and verified against the
value in <nodename>-new-srvtab-checksum.
If the values match, customization continues. If not,
a message is logged and the srvtab is retransmitted, up to
5 times.
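The verify-and-retransmit loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the PSSP code: SHA-256 from hashlib stands in for the AIX cksum value, the callable transmit stands in for the s1term transfer, and receive_srvtab is a hypothetical name. Only the attempt limit of 5 comes from the APAR text.

```python
import hashlib
import logging
from typing import Callable

MAX_ATTEMPTS = 5  # retry limit stated in the APAR text


def receive_srvtab(transmit: Callable[[], bytes], expected_digest: str,
                   max_attempts: int = MAX_ATTEMPTS) -> bytes:
    """Re-fetch the keyfile until its digest matches the reference value.

    `transmit` is a hypothetical stand-in for the s1term transfer, and
    SHA-256 stands in for the value kept in <nodename>-new-srvtab-checksum.
    """
    for attempt in range(1, max_attempts + 1):
        data = transmit()
        if hashlib.sha256(data).hexdigest() == expected_digest:
            return data  # intact copy: customization can continue
        logging.warning("srvtab checksum mismatch (attempt %d of %d)",
                        attempt, max_attempts)
    raise RuntimeError("srvtab was not received intact after retries")
```

A partial or corrupted transfer fails the comparison and triggers another attempt, instead of letting customization complete with a bad keyfile.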
------
APAR: IY20076 COMPID: 5765D5100 REL: 320
ABSTRACT: PRB HMREINIT ON HACWS
PROBLEM DESCRIPTION:
When the CWS that is connected to the SP frame supervisor
card through the S2 connector of the serial Y-cable holds the
resource group, the SP supervisor card LEDs are OK (LED 3 green
on, 7 amber flashing slowly, 4 and 8 off) and everything works
fine, as long as hmreinit is not run. If hmreinit is run on this
CWS (our primary CWS), the SP supervisor card LEDs change to LED
4 green on, 7 amber flashing slowly, 3 and 8 off, and the serial
connection is lost.
LOCAL FIX:
A restart of the hardmon daemon doing stopsrc -s hardmon and
startsrc -s hardmon will correct the problem.
PROBLEM SUMMARY:
When hmreinit is run on the BACKUP HACWS CWS,
it loses the serial connection; therefore,
the output of spmon -d is incorrect and
does not display frame or node information.
hmreinit issues hmcmds -G runpost, which
clears the buffer area that stores the
information on which port to listen on.
When hmreinit is run from the ACTIVE BACKUP CWS,
it listens on port 2. However, after the
buffer area is cleared, it tries to listen
on port 1, which is connected to the INACTIVE
PRIMARY CWS.
PROBLEM CONCLUSION:
The hmreinit code was enhanced to issue
/usr/bin/lshacws before issuing the
hmcmds -g runpost command.
If the CWS is an ACTIVE BACKUP,
lshacws returns 32, and the runpost
command is not issued.
------
APAR: IY20510 COMPID: 5765B7300 REL: 510
ABSTRACT: ENDMQLSR DOES NOT STOP THE LISTENER IF NUMBER OF PROCESSES >1000
PROBLEM DESCRIPTION:
If you use "runmqlsr" for the TCP/IP listener instead of inetd,
and shut down the queue manager while the number of processes
exceeds 1000, "runmqlsr" stays active. If you then use "endmqlsr"
to stop the listener, you get a message that it could not find
any running listeners for the specified queue manager.
PROBLEM CONCLUSION:
This problem has been fixed and the fix will be shipped in the
following PTFs for CSD03:
A) MQSeries for V5.2 CSD03
Windows NT U200148
AIX U478289
HP-UX (V10) U478290
HP-UX (V11) U478293
Sun Solaris U478291
Linux U478292
------
APAR: IY20577 COMPID: 5765D6100 REL: 220
ABSTRACT: LOADL API/QUERY CALLS SLOW IN RETURNING
PROBLEM DESCRIPTION:
LoadL query requests, such as llstatus -l or llq, and API calls
take a long time to return data. The more concurrent queries
there are, the longer they take to return.
PROBLEM SUMMARY:
Query API performance degrades substantially when a large
number of queries occur at the same time.
PROBLEM CONCLUSION:
Global free lists for commonly used objects are heavily
referenced by the negotiator in the separate threads that respond
to the various query APIs. Every reference requires obtaining the
lock that protects the free list. Because of the high contention
for this lock when many queries arrive at the same time, the
queries tend to run serially, which negates the value of running
them in separate threads. The solution is to move the free lists
to thread-specific memory, which does not require locking.
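The thread-specific free list idea can be sketched in Python as below, assuming a simple pool of reusable objects. ThreadLocalFreeList and its methods are hypothetical names, not LoadLeveler's. Each thread keeps its own pool in threading.local storage, so get and put never take a shared lock; in Python this shows the structure only, since the performance argument applies to the native negotiator.

```python
import threading
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ThreadLocalFreeList(Generic[T]):
    """Per-thread pool of reusable objects; get/put take no shared lock."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._local = threading.local()  # each thread sees its own attributes

    def _pool(self) -> List[T]:
        pool = getattr(self._local, "pool", None)
        if pool is None:
            pool = self._local.pool = []  # first use in this thread
        return pool

    def get(self) -> T:
        pool = self._pool()
        # Reuse a freed object from this thread's pool, else build a new one.
        return pool.pop() if pool else self._factory()

    def put(self, obj: T) -> None:
        self._pool().append(obj)
```

One consequence of this design is that an object released by a thread is reused only by that same thread, so pools can grow unevenly across threads.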
------
APAR: IY20652 COMPID: 5765E5100 REL: 600
ABSTRACT: SNA SENSE CODE 080F6051 WHEN ACCESS SECURITY SUBFIELDS ARE X'00'
PROBLEM DESCRIPTION:
AIX/CS generates SNA sense code 0x'080F6051' on an incoming
ATTACH request. Here is the important part of the ATTACH header:
RU: 2B0502FF 0003D100 4008E4D5 C9D9C3E5 ...
..J. .UNIRCV
4040180B 02000000 00000000 0000000B
01000000 00000000 00000001 5112FF30
Above ATTACH shows the access security subfields are present in
the right structure but are filled with x'00' instead of a UID
or PW. I assume this causes sense 080F6051.
The customer uses the VSE attach manager and self-written APPC
BATCH code at the host site, which seems to be responsible for
the ATTACH header setup.
However, the very same ATTACH is accepted by AIX/CS V4.2.
PROBLEM SUMMARY:
An incoming APPC conversation cannot be established (security
failure) when the attach includes null userid and password
parameters.
PROBLEM CONCLUSION:
Treat null parameters as if they were not present, bringing the
code into line with CS/AIX V4.
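The rule in the conclusion (null parameters are treated as absent) can be sketched in Python, assuming the userid and password subfields have already been extracted from the ATTACH header as byte strings. normalize_subfield and attach_credentials are hypothetical helpers, not CS/AIX functions.

```python
from typing import Optional, Tuple


def normalize_subfield(field: Optional[bytes]) -> Optional[bytes]:
    """Treat a missing, empty, or all-X'00' access security subfield as absent."""
    if field is None or not any(field):  # any() is False when every byte is 0
        return None
    return field


def attach_credentials(userid: Optional[bytes],
                       password: Optional[bytes]) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Return the credentials a security check would see after normalization."""
    return normalize_subfield(userid), normalize_subfield(password)
```

With both subfields normalized to None, the ATTACH is handled as if it carried no access security information, rather than being rejected with sense 080F6051.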
------
APAR: IY20739 COMPID: 5765B9500 REL: 140
ABSTRACT: PROBLEM CREATING A GPFS-1.4 NODESET WHEN THERE IS AN ACTIVE GPFS
PROBLEM DESCRIPTION:
If you create a nodeset from a node running GPFS-1.4 when
there is a GPFS-1.2 nodeset on the system the mmfs.cfg
information for the new nodeset may be lost.
PROBLEM SUMMARY:
When converting from GPFS 1.2 to GPFS 1.4 by
leaving one nodeset at 1.2 and one at 1.4, the configuration
file for the 1.4 nodeset is incorrect and the file system
doesn't mount.
PROBLEM CONCLUSION:
Do not embed the mmfs.cfg information in the
mmsdrfs file if there are still nodesets running the old mm
commands.
------
APAR: IY20768 COMPID: 5765B7300 REL: 510
ABSTRACT: AMQ9211 AND AMQ9500 MESSAGES WHEN MQPUT LEAKS CACHE MEMORY
PROBLEM DESCRIPTION:
When an MQPUT specifying MQPMO_SYNCPOINT is issued by an
application, MQSeries must allocate some storage from the
repository manager in order to record a queue manager
registration. It appears that a new registration is allocated
for every MQPUT under syncpoint, which causes the MQSeries
cluster repository manager to fail with AMQ9211 ("Error
allocating storage.") and AMQ9500 ("No Repository storage.")
messages. The MQPUT which encounters the failure may then hang
while MQSeries is clearing up.
LOCAL FIX:
A code fix is required for this problem. Note that this APAR
and IC29908 record similar symptoms, but the cause of each
problem is different.
PROBLEM CONCLUSION:
This problem has been fixed and the fix will be shipped in the
following PTFs:
A) MQSeries for V5.1 CSD08
OS/2 U200141
Windows NT U200142
AIX U474841
HP-UX (V10) U474877
HP-UX (V11) U474879
Sun Solaris U474878
B) MQSeries for V5.2 CSD03
Windows NT U200148
AIX U478289
HP-UX (V10) U478290
HP-UX (V11) U478293
Sun Solaris U478291
Linux U478292
------
APAR: IY20769 COMPID: 5765D5100 REL: 320
ABSTRACT: SP SWITCH 2 SCALING- NODES THAT ARE 'UNDER LOADL' MAY DROP OFF
PROBLEM DESCRIPTION:
Nodes can drop off the switch during Estart, fence, or unfence
on a very large SP Switch 2 system; this could occur on a
heavily loaded node in which the fault service daemon takes too
long to destroy its deviceDatabase.
PROBLEM SUMMARY:
The problem seen at LLNL was that standard nodes were
taking excessive amounts of time in the
destroyDeviceDatabase() routine. This function frees the
allocated memory for the fault service daemon device
database. This database contains all of the status and
connection information for the entire switch network.
The function was taking so long that Estarts were failing,
because the nodes were not finishing fast enough.
PROBLEM CONCLUSION:
The fault service daemon has been changed to move the
freeing of memory (database structures) to a separate
thread. This allows the port thread to run faster,
so that Estarts finish faster on very large systems.
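Moving teardown work off the time-critical thread can be sketched in Python with a queue and a helper thread. This is a generic illustration, not the fault service daemon's code: BackgroundReclaimer, release, and drain are hypothetical names, and destroy stands in for whatever frees the device database.

```python
import queue
import threading
from typing import Any, Callable


class BackgroundReclaimer:
    """Hands teardown work to a helper thread so the caller returns at once."""

    def __init__(self, destroy: Callable[[Any], None]) -> None:
        self._destroy = destroy
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            try:
                self._destroy(item)  # the slow part runs here, off the port thread
            finally:
                self._queue.task_done()

    def release(self, database: Any) -> None:
        # Called from the time-critical thread: enqueue and return immediately.
        self._queue.put(database)

    def drain(self) -> None:
        # Block until every queued item has been destroyed (useful in tests/shutdown).
        self._queue.join()
```

The caller's latency becomes the cost of a queue insert, while the total freeing work is unchanged and simply runs concurrently.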
------
APAR: IY20851 COMPID: 5765D5100 REL: 320
ABSTRACT: WORM MSG 2547-662 NEEDS EXPECTED (IN ADDITION TO ACTUAL) INFO
PROBLEM DESCRIPTION:
Worm message 2547-662 needs expected connection information in
addition to the actual connection information that is already
printed.
PROBLEM SUMMARY:
Message 2547-662 only shows actual switch connection
information. The customer requested that it also show
the expected connection.
PROBLEM CONCLUSION:
Message 2547-662 is produced during Estart phase 1
exploration when the expected switch connections are not
yet known. This message is usually preceded by message
2547-661 which suggests that the packet data received may
have been corrupted:
2547-661 Switch chip miswired, or the switch_plane
and/or the switch_plane_seq in Chip Location Register
corrupt or uninitialized
The expected connection data is not available and so
cannot be displayed.
However, while reviewing the code, we discovered a bug in
the handlePh1SwSvcResponse() and handlePh1NodeSvcResponse()
functions, where in some cases a null pointer may be used
to create data for the ERRID_CS_SW_HARDWARE_ER errpt. In
this case the target frame and slot number in the errpt
will be incorrect. We used this APAR to fix that problem,
so that now if the target device pointer derived from the
packet data is NULL, -1 will be substituted for frame
and slot number in the errpt.
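The NULL guard can be sketched in Python with None in place of a NULL pointer. Device and errpt_location are hypothetical names; only the -1 substitution for frame and slot comes from the APAR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Device:
    frame: int
    slot: int


def errpt_location(target: Optional[Device]) -> Tuple[int, int]:
    """Frame and slot to record in the errpt entry; (-1, -1) if the target is unknown."""
    if target is None:
        return (-1, -1)  # never dereference a missing target device
    return (target.frame, target.slot)
```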
------
APAR: IY20874 COMPID: 5765D5100 REL: 320
ABSTRACT: CSS.SNAP SHOULDN'T DUMP SRAM BY DEFAULT
PROBLEM DESCRIPTION:
There is the potential for a node checkstop when css.snap
dumps Col SRAM. css.snap (cust or internal) should be more
selective in terms of dumping SRAM, to avoid the checkstop or
to greatly reduce the possibility of a checkstop. Removing the
SRAM dump altogether is not possible, because the SRAM contains
RAS data for certain Col ucode problems.
PROBLEM SUMMARY:
The collection of SRAM data sometimes caused nodes to
checkstop.
PROBLEM CONCLUSION:
The default behavior of css.snap has changed to not collect
SRAM data. The data is now collected only if a specific
argument (-d) is passed in.
------
APAR: IY20971 COMPID: 5765D6100 REL: 220
ABSTRACT: LLCTL -G RECONFIG NOT RESETTING DEBUG FLAGS
PROBLEM DESCRIPTION:
When the debug flags in the LoadL_config file are altered and
llctl -g reconfig is then issued, the debug flags are not reset
if the NEW debug flag keyword is blank. If the NEW debug keyword
has a value of any type, the flags are altered as expected.
PROBLEM SUMMARY:
In LoadLeveler, llctl reconfig does not reset the negotiator
debug flags if the keyword is changed back to a blank line,
which means the old debug flag messages are still written to
the negotiator log.
PROBLEM CONCLUSION:
In LoadLeveler, llctl reconfig now resets the negotiator debug
flags if the keyword is changed back to a blank line, which
means no debug flag messages are written to the negotiator log.
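The intended reconfig behavior can be sketched in Python, treating a blank keyword value as "no flags" rather than "leave unchanged". The keyword NEGOTIATOR_DEBUG and the flag names in the usage are illustrative assumptions, and parse_debug_flags and reconfig are hypothetical helpers, not LoadLeveler code.

```python
from typing import Dict, FrozenSet, Mapping


def parse_debug_flags(value: str) -> FrozenSet[str]:
    """Split a debug keyword value into flags; a blank value yields no flags."""
    return frozenset(value.split())


def reconfig(current: Mapping[str, FrozenSet[str]],
             new_config: Mapping[str, str]) -> Dict[str, FrozenSet[str]]:
    """Apply re-read debug keywords to the running daemon's settings."""
    updated = dict(current)
    for keyword, value in new_config.items():
        # Skipping blank values here would reproduce the reported symptom
        # (old flags stay in effect); instead a blank value resets to no flags.
        updated[keyword] = parse_debug_flags(value)
    return updated
```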
------
APAR: IY21030 COMPID: 5765D6100 REL: 220
ABSTRACT: LLQ -L -X SHOULD STOP TO SHOW WRONG VALUES FOR Q_SYSPRIO
PROBLEM DESCRIPTION:
llq -l -x shows incorrect values for q_sysprio
PROBLEM SUMMARY:
LoadLeveler's llq -l -x output is not the same as the llq -l
output for the q_sysprio value.
PROBLEM CONCLUSION:
LoadLeveler's llq -l -x output is obtained from the Schedd,
while the llq -l output is obtained from the central manager.
The q_sysprio and system priority values are used only by the
central manager and have no meaning to the Schedd. Therefore,
the q_sysprio and system priority values in the llq -l -x output
are not set. This will be documented in the LoadLeveler manual
and the llq man page.
------
APAR: IY21039 COMPID: 5765D6100 REL: 220
ABSTRACT: LLSUMMARY: THE LEADING ZERO DOES NOT SHOW ON DECIMAL PLACE.
PROBLEM DESCRIPTION:
In llsummary output:
Time: 0+00:00:01.40000 <- PROBLEM
Time: 0+00:00:01.040000 <- EXPECTATION
All values should have 6 decimal places; when the fractional
value has fewer than 6 digits, its leading 0 is not shown.
PROBLEM SUMMARY:
The output of llsummary, when the -l option is used, may
show fractions of seconds for Starter User and System Time
and Step User and System Time that don't appear to add up
to the corresponding Starter and Step Total Times that are
shown in the output.
PROBLEM CONCLUSION:
The whole-second and fractional-second parts of the printed
time values come from two separate numbers. The fractional
part represents microseconds. If the number of microseconds is
less than 100000 (.1 seconds), the number is shown with its
leading zero missing. For example, what should be .050000 is
shown as .50000. That makes a Total Time value look as though
it is not the correct sum of the User and System Time. The
formatting of the microseconds value has been modified to make
sure that a leading zero is included when it should be.
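The fix amounts to zero-filling the microseconds field to six digits. A Python sketch, assuming the leading field in values such as 0+00:00:01 is a day count; format_llsummary_time is a hypothetical name, not a LoadLeveler routine.

```python
def format_llsummary_time(total_seconds: int, microseconds: int) -> str:
    """Render d+hh:mm:ss.ffffff with the fraction zero-padded to 6 digits."""
    days, rem = divmod(total_seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    # %06d keeps leading zeros: 40000 us -> "040000"; plain %d would give "40000".
    return "%d+%02d:%02d:%02d.%06d" % (days, hours, minutes, seconds, microseconds)
```

Formatting the same value with plain %d produces the ".40000" shown in the problem description.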
------
APAR: IY21043 COMPID: 5765D5100 REL: 320
ABSTRACT: RC.SWITCH FALSELY DETECTS ANOTHER INVOCATION OF RC.SWITCH
PROBLEM DESCRIPTION:
rc.switch falsely detects another instance of rc.switch is
running, and then fails. The problem is due to the logic that
looks for other procs named rc.switch: the logic correctly
removes the PID of the current proc, but does not remove the
PIDs of child procs. The child procs (forked by the shell to
do the awk or grep) have the same name as the parent (i.e.,
rc.switch) after they are forked, but before they exec.
LOCAL FIX:
Run rc.switch manually.
PROBLEM SUMMARY:
rc.switch falsely detects its own child process as another
instance of rc.switch. The script issues a "ps" command
to see if there are other processes called rc.switch
running. Some of the time, the ps command may catch a
child process (for example, grep or awk) while it still
has the same name as its parent (i.e. "rc.switch"). The
result is that the fault service daemon does not get
started.
PROBLEM CONCLUSION:
By adding the "ppid" option to the ps command that looks
for other instances of rc.switch, children of the current
rc.switch process will be eliminated from consideration.
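The parent-PID filter can be sketched in Python over a process table that has already been parsed into (pid, ppid, command) tuples; that layout and the name other_instances are assumptions, and this is not the ps invocation rc.switch actually uses.

```python
from typing import Iterable, List, Tuple

# (pid, ppid, command name), e.g. parsed from a ps listing; this layout is assumed.
Proc = Tuple[int, int, str]


def other_instances(procs: Iterable[Proc], my_pid: int,
                    name: str = "rc.switch") -> List[int]:
    """PIDs of other processes called `name`, ignoring this process and its children."""
    return [pid for pid, ppid, cmd in procs
            if cmd == name and pid != my_pid and ppid != my_pid]
```

A forked child that still carries the name rc.switch has the current PID as its parent, so it is excluded, while a genuinely separate rc.switch is still reported.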
------
APAR: IY21146 COMPID: 5765E5100 REL: 600
ABSTRACT: CRASH IN SNA_V5ROUTER AT V6.0.1.0.
PROBLEM DESCRIPTION:
MST STACK TRACE:
0xf0000000 (excpt=00000000:42000000:00022011:3f2d5000:00000106) (intpri=11)
IAR: . sna_v5router:nbm_free_buffer +54 (046d9b68): twllti r4,0x200
LR: . sna_v5router:nbm_free_buffer +4c (046d9b60)
2ef629e8: . sna_v5router:nlm_route_actlu_rsp +28 (048b28e0)
2ef62a28: . sna_v5router:nlm_send_actlu_rsp_pos +84 (048b1ff8)
2ef62a68: . sna_v5router:nlm_action_sec_sscp_fsm +4b0 (048b4d88)
2ef62ac8: . sna_v5router:nlm_sec_sscp_fsm +7bc (048b4814)
PROBLEM SUMMARY:
Crash showing nbm_free_buffer called from nlm_route_actlu_rsp.
PROBLEM CONCLUSION:
The code now guards against releasing a null pointer.
------
APAR: IY21195 COMPID: 5765D5100 REL: 320
ABSTRACT: PERSPECTIVES ERROR MSGS WITH V2.1 LIBDCE.A IN NON-DCE AUTHEN ENV
PROBLEM DESCRIPTION:
If a customer has V2.1 /usr/lib/libdce.a present,
with PSSP 3.2 (PTF 10) and AIX 4.33, using
non-DCE authentication, the invocation of either
Perspectives or Sphardware will result in the
following error messages:
0509-150 Dependent module /usr/lpp/ssp/bin
could not be loaded.
exec(): 0509-036 Cannot load program spsec_ldmod
because of the following errors:
0509-022 Cannot load module
/usr/lpp/ssp/bin/spsec_ldmod.
0509-150 Dependent module libdcepthreads.a
(dcepthreads_shr.o) could not be
loaded.
These errors are similar to, but not the same as,
the errors observed before IY17070 was applied in
PTF 10 of PSSP 3.2. IY17070 corrected errors
with SDRChangeAttrValues when V2.1 libdce.a
was present in a non-DCE authentication
environment. Although V2.1 libdce.a is not
"supported" with AIX 4.33, we have already
committed to its coexistence with PSSP 3.2
in a non-DCE environment with APAR IY17070.
The problem with Perspectives / Sphardware
is independent of the problem that was
fixed in APAR IY17070.
LOCAL FIX:
The workaround is to remove or rename
/usr/lib/libdce.a; however, there are
two independent customers who insist
that V2.1 libdce.a should be allowed
to coexist with PSSP 3.2 / AIX 4.33
in a non-DCE authentication environment.
PROBLEM SUMMARY:
Issuing sphardware on a CWS in a non-DCE environment will
result in error messages being issued if a level of DCE
prior to 3.1 exists on the system. The following messages
will be issued several times:
exec(): 0509-036 Cannot load program spsec_ldmod because
of the following errors:
0509-022 Cannot load module
/usr/lpp/ssp/bin/spsec_ldmod.
0509-150 Dependent module libdcepthreads.a
(dcepthreads_shr.o) could not be loaded.
0509-022 Cannot load module libdcepthreads.a
(dcepthreads_shr.o).
0509-026 System error: A file or directory in the
path name does not exist.
0509-022 Cannot load module /usr/lpp/ssp/bin.
0509-150 Dependent module /usr/lpp/ssp/bin
could not be loaded.
The routines were trying to access a module that does
not exist in /usr/lib/libdce.a in the earlier version
of DCE.
PROBLEM CONCLUSION:
Modified code in Perspectives and Event Management to
first verify that DCE is being used on the system,
prior to attempting to load the DCE libraries.
This will allow sphardware to be run in a non-DCE
environment when a level of DCE prior to 3.1 exists
on the system.
APAR IY21195 only provides a partial solution to this
problem. For a complete solution APAR IY22203, available in
rsct.clients.rte 1.2.1.1 or greater, must also be installed.
------
APAR: IY21212 COMPID: 5765D6100 REL: 220
ABSTRACT: NEGOTIATOR MESSAGE IN LLQ -S NOT UPDATED/BACKFILL SCHEDD.
PROBLEM DESCRIPTION:
Using the Backfill Scheduler, if a job is submitted and the
llq -s output says in the Negotiator Message that the user
has hit their running-jobs limit at that time, the Negotiator
Message is not updated when the user is no longer at the limit.
PROBLEM SUMMARY:
Using the backfill scheduler in LoadLeveler,
when a user hits the user max job limit,
a message is set in the NEGOTIATOR MESSAGE seen in
llq -l. However, this message is not reset even
when the user is no longer at the max job limit.
PROBLEM CONCLUSION:
Using the backfill scheduler in LoadLeveler,
when a user hits the user max job limit,
a message is set in the NEGOTIATOR MESSAGE seen in
llq -l. This message is now reset
when the user is no longer at the max job limit.
------
APAR: IY21458 COMPID: 5765D5100 REL: 320
ABSTRACT: PSSPFB_SCRIPT NOT HONORING SOME DCE SETTINGS
PROBLEM DESCRIPTION:
In DCE there is an environment variable TRY_PE_SITE which, when
set to 1, tells all DCE jobs to refer to the /etc/dce/security/
pe_site file for the preferred Security server. This is done
if a customer has more than one security server defined but
some are not reliable and are only used as backups.
When a node is installed, DCE is configured by psspfb_script
when it calls spauthconfig. This is when TRY_PE_SITE=1 is
added to /etc/environment. However, /etc/environment's new
settings do not take effect at that time, and any commands that
depend on DCE will not honor the TRY_PE_SITE setting. As a
result, during a new install psspfb_script may run for a very
long time and the configuration of some principals may fail.
LOCAL FIX:
Set TRY_PE_SITE variable within psspfb_script
PROBLEM SUMMARY:
During a node's installation or customization at PSSP 3.2
or beyond, in a DCE environment, TRY_PE_SITE=1 is written
to /etc/environment. However, this variable is not
utilized until the next time a user logs into the node.
Additional DCE processing will be executed during the
node's installation or customization that should make
use of this setting.
PROBLEM CONCLUSION:
psspfb_script will set TRY_PE_SITE=1 on its calls to
spauthconfig so that DCE calls that may be made in
spauthconfig will be able to take advantage of this setting.
------
APAR: IY21486 COMPID: 5765D5100 REL: 320
ABSTRACT: VSD SCRIPT READFENCESDR DOES NOT SET PATH AND ENCOUNTERS PROBLE
PROBLEM DESCRIPTION:
The problem specifically affects a customer installation that
sets its PATH so that the first directories are in a GPFS
file system. This file system may not be available. The issue
is that the script /usr/lpp/csd/bin/readFenceSDR does not set
its own PATH and inherits the default.
PROBLEM SUMMARY:
RVSD recovery and hence GPFS recovery can hang
if GPFS has been placed in the PATH before
/usr/bin, /usr/sbin, and /etc.
PROBLEM CONCLUSION:
Several of the RVSD scripts will be modified to
"export PATH=/usr/bin:/usr/sbin:/etc:$PATH" so
that standard system commands do not get hung
due to GPFS being hung.
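The same idea, putting the standard system directories ahead of an inherited PATH before running commands, can be sketched in Python for a child process environment. standard_path_env is a hypothetical helper; it is not part of RVSD.

```python
import os
from typing import Dict, Mapping, Optional

STANDARD_DIRS = ("/usr/bin", "/usr/sbin", "/etc")


def standard_path_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of an environment with the standard system directories first in PATH."""
    env = dict(os.environ if base is None else base)
    inherited = env.get("PATH", "")
    parts = list(STANDARD_DIRS)
    if inherited:
        parts.append(inherited)  # keep the caller's PATH, but after the system dirs
    env["PATH"] = ":".join(parts)
    return env
```

The result can be passed as the env argument of subprocess.run so that the child resolves standard commands from /usr/bin before any GPFS directory. Unlike the shell form, an empty inherited PATH does not leave a trailing ":" (which the shell would treat as the current directory).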
------
APAR: IY21600 COMPID: 5765D5100 REL: 330
ABSTRACT: SPGETDESC SUPPORT OF 6H1
PROBLEM DESCRIPTION:
spgetdesc support of 6H1
PROBLEM SUMMARY:
spgetdesc now recognizes and translates the 6H1 server type.
The type of server you are on can be obtained by
executing `uname -M` on the command line.
Executing spgetdesc will place the appropriate value in the
"description" attribute of the Node class for a node that is
a 6H1 Condor SP-attached server.
PROBLEM CONCLUSION:
Updated spgetdesc to recognize 6H1 Condor servers.
------
APAR: IY21601 COMPID: 5765D5100 REL: 320
ABSTRACT: SPGETDESC SUPPORT OF 6H1
PROBLEM DESCRIPTION:
spgetdesc support of 6H1
PROBLEM SUMMARY:
spgetdesc now recognizes and translates the 6H1 server type.
The type of server you are on can be obtained by
executing `uname -M` on the command line.
Executing spgetdesc will place the appropriate value in the
"description" attribute of the Node class for a node that is
a 6H1 Condor SP-attached server.
PROBLEM CONCLUSION:
Updated spgetdesc to recognize 6H1 Condor servers.
------
APAR: IY21612 COMPID: 5765D5100 REL: 320
ABSTRACT: HA_VSD SCRIPT DOES NOT CHECK THE RETURN CODE OF CFGVSD. IF CFGVS
PROBLEM DESCRIPTION:
The ha_vsd script does not check the return code of cfgvsd. If
cfgvsd is not successful, the rvsd daemon should not be started.
PROBLEM SUMMARY:
ha.vsd will continue to bring rvsd up even if cfgvsd detects
critical configuration files are absent. This can waste a
lot of time. Also, cfgvsd only checks whether the
files exist. A processing error in readSDR can create the
file and leave it 0 length. This should be treated as if
the file was absent.
PROBLEM CONCLUSION:
ha_vsd will now exit without trying to bring rvsd up if
cfgvsd reports a configuration file missing or empty.
cfgvsd has also been modified to retry generating the
files 5 times before giving up.
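The new cfgvsd/ha_vsd behavior (an empty file counts as missing, regeneration is retried 5 times, and rvsd is not started on failure) can be sketched in Python. missing_or_empty and ensure_config_files are hypothetical helpers, and regenerate stands in for the readSDR step that writes the files.

```python
import os
from typing import Callable, Iterable, List


def missing_or_empty(paths: Iterable[str]) -> List[str]:
    """Paths that do not exist as regular files or have length 0."""
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]


def ensure_config_files(paths: List[str], regenerate: Callable[[], None],
                        attempts: int = 5,
                        check: Callable[[List[str]], List[str]] = missing_or_empty) -> bool:
    """Regenerate the files up to `attempts` times; False means do not start rvsd."""
    for _ in range(attempts):
        regenerate()
        if not check(paths):
            return True
    return False
```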
------
APAR: IY21712 COMPID: 5622DJX00 REL: 211
ABSTRACT: APAR USED TO CREATE PTF FOR DB2 DATAJOINER V211 FOR AIX
PROBLEM DESCRIPTION:
APAR USED TO CREATE PTF FOR DB2 Datajoiner v211 for AIX
LOCAL FIX:
APAR USED TO CREATE PTF FOR DB2 Datajoiner v211 for AIX
PROBLEM SUMMARY:
APAR used to create PTF for datajoiner for AIX
PROBLEM CONCLUSION:
APAR used to create PTF for datajoiner for AIX
------
APAR: IY21737 COMPID: 5765D6100 REL: 220
ABSTRACT: TOTALVIEW ABORTS WITH MSG "PIPE ERROR"
PROBLEM DESCRIPTION:
TotalView aborts with msg "Pipe Error" in 24 out of 25 cases,
with the stack trace indicating that this happens somewhere in
LoadLeveler trying to get DCE credentials and write them out.
Invocation is:
"totalview poe -a executable -procs p -nodes n -infolevel 6"
PROBLEM SUMMARY:
When the authentication method is llgetdce, a SIGPIPE is
generated, causing debuggers to stop.
PROBLEM CONCLUSION:
The addition of sending data to the authentication process
for lldelegate caused a SIGPIPE to be generated when the
authentication process is llgetdce. LL is not bothered
by the signal, but debuggers stop processing on the signal.
------
APAR: IY21740 COMPID: 5765D5100 REL: 320
ABSTRACT: SPUNMIRRORVG FAILS TO REMOVE HDISK0
PROBLEM DESCRIPTION:
When trying to unmirror hdisk0 from rootvg using spunmirrorvg
command, customer gets error:
0016-687 Reducing the volume group rootvg by disk(s) hdisk0 had
a problem with a return code of 2.
LOCAL FIX:
If spunmirrorvg has already been run, first rerun spmirrorvg.
Then perform rmlvcopy, reducevg, bosboot, and bootlist on the
node to reduce the copy of hdisk0 from rootvg.
PROBLEM SUMMARY:
spunmirrorvg was unable to both reduce the number of copies
of a volume group and reduce the volume group by a
physical volume that was not listed in the pv_list attribute
of the Volume_Group object in the SDR.
spunmirrorvg first calls unmirrorvg to reduce the number of
copies of the volume group. The list of drives that are
no longer to contain mirrors is not passed as an input
parameter. As a result, by default, unmirrorvg picks
the set of mirrors to remove from the mirrored volume
group. It is possible that unmirrorvg may remove the
mirror from the physical volume that is listed in the
pv_list attribute. Then when it tries to remove the
physical volume that is not listed in the pv_list
attribute via reducevg it fails.
PROBLEM CONCLUSION:
Modified spunmirrorvg to determine if there are any
physical volumes in the volume group that are not
listed in the pv_list attribute of the Volume_Group
in the SDR earlier in its processing. If there
are physical volumes which will no longer contain
mirrors, they are passed as input to unmirrorvg,
so that unmirrorvg does not pick which set of
mirrors to remove from the volume group.
------
APAR: IY21754 COMPID: 5765D5100 REL: 320
ABSTRACT: ADAPTER DIAGS WILL FAIL WHEN SWITCH NOT POWERED ON
PROBLEM DESCRIPTION:
Adapter diags will fail when switch not powered on
PROBLEM SUMMARY:
The problem occurred when switches and nodes were powered
off, and the nodes were powered on. When the node boots up,
adapter diags gets called by the configuration method.
Since the attached switch was NOT powered on, an error bit
was turned on in the Interrupt Status Register (ISR). This
bit does not indicate a problem with the adapter, but simply
reports that the switch is not powered on. Adapter diags
has been changed to ignore this bit.
PROBLEM CONCLUSION:
The adapter diags have been changed to ignore a bit set
in the Interrupt Status Register. When the bit is ignored
the diags will run properly.
------
APAR: IY21755 COMPID: 5765D5100 REL: 320
ABSTRACT: 128WAY COLONY DS: CABLE_TEST DOES NOT COMPLETE, HARDMON ERROR
PROBLEM DESCRIPTION:
128way COLONY DS: cable_test does not complete, hardmon error
PROBLEM SUMMARY:
The cable_test tool encountered problems with s1term when
it repeatedly opened and closed sessions. The hardmon
function, which manages the s1term connection to the switch
supervisor card, was getting out of sync with the numerous opens
and closes. Now cable_test keeps the s1term session open until
the test is done with a given switch board.
PROBLEM CONCLUSION:
The cable_test tool has been modified to hold s1term
sessions open on ISB switches until testing is complete. The
previous version of cable_test repeatedly opened and closed
s1term connections.
------
APAR: IY21756 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY:CABLE_WRAP TESTS LEAVE SWITCH CHIPS UNINITIALIZED
PROBLEM DESCRIPTION:
Colony:Cable_wrap Tests leave Switch Chips Uninitialized
PROBLEM SUMMARY:
When using the cable_test tool on a large network, errors
can be injected when placing switch chips into differential
line test mode. The changes clean up after the test has
completed.
PROBLEM CONCLUSION:
The cable_test tool has been modified to power off and on
the switches, thus clearing any switch errors.
------
APAR: IY21758 COMPID: 5765D5100 REL: 320
ABSTRACT: FREE LOCK STRUCTURES IN DEVICE UNCONFIG
PROBLEM DESCRIPTION:
free lock structures in device unconfig
PROBLEM SUMMARY:
Internal driver resources were not being released at
unconfig time. If this happened repeatedly, it would
contribute to resource exhaustion. (Unconfig almost never
runs in the field, so it should not be a problem.)
PROBLEM CONCLUSION:
Resources are now released correctly at unconfig time.
------
APAR: IY21760 COMPID: 5765D5100 REL: 320
ABSTRACT: BCASTPING DOES NOT WORK COMPLETELY
PROBLEM DESCRIPTION:
bcastping does not work completely
PROBLEM SUMMARY:
With the network option bcastping=0 (set with the "no" command),
pinging the switch IP subnet broadcast address still gets an
"echo reply" from every node.
PROBLEM CONCLUSION:
In the receive path of the switch IP interface driver, the
M_BCAST flag is now turned on in the IP datagram before it is
delivered to the IP layer, so the ICMP layer correctly discards
the "echo request".
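The interaction between the driver and the ICMP layer can be modeled in Python. This is a simplified sketch, not the driver code: the M_BCAST value is a placeholder (the real bit comes from the kernel headers), Datagram, driver_receive, and icmp_should_reply are hypothetical names, and the broadcast test (destination equals the subnet broadcast address) is a simplification. ICMP type 8 is the standard echo request type.

```python
from dataclasses import dataclass

M_BCAST = 0x0100        # placeholder bit value, not the kernel's definition
ICMP_ECHO_REQUEST = 8   # ICMP type code for echo request


@dataclass
class Datagram:
    dst: str
    icmp_type: int
    flags: int = 0


def driver_receive(pkt: Datagram, subnet_broadcast: str) -> Datagram:
    """Receive path: mark broadcast datagrams before handing them to IP."""
    if pkt.dst == subnet_broadcast:
        pkt.flags |= M_BCAST
    return pkt


def icmp_should_reply(pkt: Datagram, bcastping: int) -> bool:
    """ICMP layer: ignore broadcast echo requests unless bcastping is enabled."""
    if pkt.icmp_type != ICMP_ECHO_REQUEST:
        return False
    if pkt.flags & M_BCAST and not bcastping:
        return False
    return True
```

An unmarked datagram reaching icmp_should_reply with bcastping=0 still gets a reply, which corresponds to the behavior reported before the fix.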
------
APAR: IY21761 COMPID: 5765D5100 REL: 320
ABSTRACT: SYSTEM HUNG OR CRASHED DURING RECOVERY
PROBLEM DESCRIPTION:
system hung or crashed during recovery
PROBLEM SUMMARY:
System deadlock or crash during adapter recovery.
PROBLEM CONCLUSION:
The boolean flag "xmtReady" has been declared volatile. This
prevents the compiler from optimizing away reads of the flag,
so the code path sees its current value and behaves correctly.
------
APAR: IY21765 COMPID: 5765D5100 REL: 320
ABSTRACT: WORM TIMEOUT VALUES NEED ADJUSTING
PROBLEM DESCRIPTION:
Other nodes can fall off the switch while trying to Efence or
Eunfence a different node.
PROBLEM SUMMARY:
Heavily loaded nodes fall off the switch during Estart,
Efence or Eunfence. The cause is that the primary does not
wait long enough for acknowledgements from nodes.
PROBLEM CONCLUSION:
Solution is to increase the amount of time that the primary
waits for acknowledgements from nodes.
------
APAR: IY21853 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT: ...||OLDDISKADDRFOUND.COMPADDR(*OLDDISKADDRP)
PROBLEM DESCRIPTION:
ASSERT: ...||OLDDISKADDRFOUND.COMPADDR(*OLDDISKADDRP)
PROBLEM SUMMARY:
GPFS self check logic declared an error at
metadata.C, line 7425
PROBLEM CONCLUSION:
mnGetSubIndirectBlock does not need to fix
lastBlockSubblocks when the filesize changes on another
node now that RF and WF tokens both get the correct
lastBlockSubblocks from the metanode before allowing
read/write of the last block.
------
APAR: IY21854 COMPID: 5765B7300 REL: 510
ABSTRACT: MESSAGES STUCK ON THE XMITQ IN A CLUSTERING ENVIRONMENT
PROBLEM DESCRIPTION:
New messages arriving on a transmission queue do not start the
cluster channel if it is inactive, so messages are stuck on the
XMITQ.
LOCAL FIX:
Set the channels' DISCINT to 0 to keep the channels running
permanently as a temporary workaround.
PROBLEM CONCLUSION:
This problem has been fixed and the fix will be shipped in the
following PTFs:
A) MQSeries for V5.1 CSD08
OS/2 U200141
Windows NT U200142
AIX U474841
HP-UX (V10) U474877
HP-UX (V11) U474879
Sun Solaris U474878
B) MQSeries for V5.2 CSD02
Windows NT U200140
AIX U474779
HP-UX (V10) U474785
HP-UX (V11) U474837
Sun Solaris U474789
Linux U474836
C) MQSeries for V5.2.1 CSD01
Windows NT/2000 U200151
------
APAR: IY21869 COMPID: 5765D5100 REL: 320
ABSTRACT: RUN TIME OF SDR_CONFIG DOES NOT SCALE ON COLONY SYSTEM
PROBLEM DESCRIPTION:
A small section of new code in SDR_config for PSSP3.2 that deals
with the switch node number for the colony switch does not scale
well. Internal tests on a simulated 512-node system (with
nothing else going on) result in run times of about 1hr 22min.
Customers with that size system have reported run times of up to
two hours. The section of code that is eating up all of this
time is Get_SPS2_snn calling Find_SDR, which is done once for
each node, and the time it takes to return increases as the
number of nodes in the system increases.
PROBLEM SUMMARY:
When SDR_config is invoked on a large system, the performance
is unacceptable. As the number of nodes in the system
increases, the run time of SDR_config grows roughly quadratically.
The cause of this problem is in the function Get_SPS2_snn
which is called for each node and returns the next
available switch node number. In each call, the subroutine
goes through the entire list of nodes to determine the
next available switch node number, which in most cases
is not necessary.
PROBLEM CONCLUSION:
The Get_SPS2_snn function in SDR_config has been modified
to save the next available switch node number from
previous calls, unless the type of switch in the system
has changed. This dramatically improves the run time
of SDR_config on large systems.
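A minimal C sketch of the caching idea described above follows. It assumes the assigned switch node numbers can be modeled as a boolean array, and that numbers are only handed out (never released) during one SDR_config run. The names `snn_cache`, `next_available_snn` and `MAX_SNN` are hypothetical and are not taken from SDR_config. The point is only that the scan resumes where the previous call stopped instead of restarting at 0 for every node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper bound on switch node numbers. */
#define MAX_SNN 1024

/* Hypothetical cache: the next candidate number and the switch type it
 * was computed for. */
struct snn_cache {
    int  next;         /* next candidate switch node number */
    int  switch_type;  /* switch type the cached value belongs to */
    bool valid;
};

/* Return the next unused switch node number and mark it used.
 * used[snn] is true when snn is already assigned (assumed model of the
 * SDR node list). The scan resumes from the value saved by the previous
 * call; the saved value is discarded if the switch type changes.
 * Returns -1 when no number is left. */
static int next_available_snn(struct snn_cache *c, bool used[MAX_SNN],
                              int switch_type)
{
    assert(c != NULL && used != NULL);
    if (!c->valid || c->switch_type != switch_type) {
        c->next = 0;
        c->switch_type = switch_type;
        c->valid = true;
    }
    while (c->next < MAX_SNN && used[c->next])
        c->next++;
    if (c->next >= MAX_SNN)
        return -1;
    used[c->next] = true;
    return c->next++;
}
```

With the saved position, assigning numbers to N nodes walks the array about once in total rather than once per node.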
------
APAR: IY21912 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY: ADAPTRECV MIC STATUS 6XX INTERRUPT BAD ON REC. SWITCH
PROBLEM DESCRIPTION:
colony: adaptrecv mic status 6xx interrupt bad on rec. switch
PROBLEM SUMMARY:
Unnecessary interrupts were generated because an error bit on
the adapter was incorrectly enabled.
PROBLEM CONCLUSION:
That additional bit was removed from the MIC enables.
------
APAR: IY21913 COMPID: 5765D5100 REL: 320
ABSTRACT: SDRGETOBJECTS CALL IN CSS.SNAP NEEDS FULL PATH
PROBLEM DESCRIPTION:
sdrgetobjects call in css.snap needs a full path
PROBLEM SUMMARY:
This fix is for css.snap, which fails in certain cases when it
is called from the fault service daemon, as opposed to a
command-line invocation with -a. The error messages
were seen in daemon.stderr. The failure was caused by a path
being used incorrectly. It is now fixed.
PROBLEM CONCLUSION:
This defect is fixed by adding the correct full path to the
logical if-else code path so that css.snap no longer fails
in this situation.
------
APAR: IY21915 COMPID: 5765D6100 REL: 220
ABSTRACT: USE OF LL PREFERENCES CAUSES NEGOTIATOR TO HANG
PROBLEM DESCRIPTION:
The use of preferences will sometimes result in a machine
being locked twice. This hangs the negotiator.
LOCAL FIX:
Remove any use of preferences in LoadLeveler command files.
PROBLEM SUMMARY:
When preference is used, the negotiator will hang due to
locking twice on the same machine.
PROBLEM CONCLUSION:
When preference is used, the negotiator will not hang
and jobs will run.
------
APAR: IY21946 COMPID: 5765D5100 REL: 320
ABSTRACT: MOVE ASSIGNMENT OF USER_DMA_AVAIL OUT OF FIRST OPEN CODE
PROBLEM DESCRIPTION:
move assignment of user_dma_avail out of first open code
PROBLEM SUMMARY:
Modifications made to win_poolsize with chgcss are being lost.
The change does not correctly propagate to any active user-space windows.
PROBLEM CONCLUSION:
Changes to window size attributes via chgcss will no longer
be permitted if any user-space windows are active.
------
APAR: IY21949 COMPID: 5765B9501 REL: 330
ABSTRACT: NFS EXPORTED GPFS FILESYSTEM:LD: SEVERE ERROR:UNEXPECTED I/O
PROBLEM DESCRIPTION:
nfs exported gpfs filesystem:ld:severe error:unexpected I/O
PROBLEM SUMMARY:
Running compiles with the source in a GPFS file system
exported via NFS produced an error in ftruncate system call.
PROBLEM CONCLUSION:
NFS does not necessarily pass in the FWRITE flag when it
calls VNOP_FTRUNC, so FWRITE is now ORed into the flags passed to
open/trunc/close so that ftruncInternal does not
return E_BADF.
------
APAR: IY21956 COMPID: 5765E5100 REL: 600
ABSTRACT: CRASH IN CS/AIX 600 IN SNA_V5ROUTER
PROBLEM DESCRIPTION:
MST STACK TRACE:
0xf00002e0 (excpt=00368000:42000000:00000000:00368000:00000106)
(intpri=11)
IAR: .[sna_v5router:nbm_free_buffer]+54 (0500ab68): twllti
LR: .[sna_v5router:nbm_free_buffer]+4c (0500ab60)
2efa29c8: .[sna_v5router:nlm_action_sec_lu_fsm]+338 (051e0330)
2efa2a28: .[sna_v5router:nlm_sec_lu_fsm]+454 (051e12c0)
2efa2a98: .[sna_v5router:nlm_sec_lu_signal]+370 (051e1668)
PROBLEM SUMMARY:
Problem occurs after hcon LU13 application tries to recover
after an error. The sequence is that the application issues
deallocate(SSCP) followed by allocate(SSCP) while bound and
not BETB and then receives UNBIND, DACTLU, ACTLU and then
logs on again. hcon then issues deallocate(SSCP) again when
BOUND and CS/AIX gets confused trying to send an UNBIND RSP
internally. I have added code to prevent this.
PROBLEM CONCLUSION:
Prevent code trying to send UNBIND RSP following session restart
------
APAR: IY21983 COMPID: 5765D5100 REL: 320
ABSTRACT: SYNTAX ERRORS IN POST_PROCESS PREVENTING HARDMON FROM BEING
PROBLEM DESCRIPTION:
Up to ssp.basic 3.2.0.11, there are some syntax errors in
/usr/lpp/ssp/install/bin/post_process,
namely line 344:
while $finis < 3 && $rc > 79 && $rc < 87
should be changed to:
while $finis -lt 3 && $rc -gt 79 && $rc -lt 87
and line 400:
while $rc != 0 && $subsys_checked < 20
should be changed to:
while $rc -ne 0 && $subsys_checked -lt 20
line 408:
if $subsys_checked = 20
to
if $subsys_checked -eq 20
as well as probably some:
if $? != 0
to
if $? -ne 0
and some:
if $rc = 0
to
if $rc -eq 0
depending on whether you interpret a return code as a
numerical or an ASCII value ;-)
The case in line 400 prevents hardmon from being stopped
correctly before its subsystem is modified, which could cause
it not to start.
LOCAL FIX:
change the mentioned lines accordingly.
PROBLEM SUMMARY:
Some of the integer comparisons are being performed
incorrectly in post_process. The checks for integers being
equal will work correctly, but the checks for < or > are
comparing the ASCII values instead of the integers. As a
result, hardmon may not be started.
PROBLEM CONCLUSION:
Modified post_process so that comparisons of integers are
done correctly. The code now uses -lt instead of < and
-gt instead of >.
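The difference between the two kinds of comparison can be shown in C. Compared character by character, "9" sorts after "80" because '9' is greater than '8'. Compared as integers, 9 is smaller. The helper names below are illustrative only. The fixed post_process performs the numeric form with -lt and -gt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Character-by-character (ASCII) comparison, the behavior the APAR
 * attributes to the < and > tests in post_process. */
static int compare_ascii(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b);
}

/* Numeric comparison, the behavior of -lt / -gt: convert first, then
 * return -1, 0 or 1. */
static int compare_numeric(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    long x = strtol(a, NULL, 10);
    long y = strtol(b, NULL, 10);
    return (x > y) - (x < y);
}
```

With return codes in the 79 to 87 range, such as those tested at line 344, a single-digit value would be misordered by the ASCII form.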
------
APAR: IY22004 COMPID: 5765D9300 REL: 310
ABSTRACT: MPI_CART_MAP CORE-DUMPS
PROBLEM DESCRIPTION:
MPI_Cart_map core-dumps
PROBLEM SUMMARY:
When a one-by-one Cartesian structure was used in MPI_Cart_map
with periods set to true, MPI failed to map the structure
correctly, which could cause a segmentation fault. The code
treated the only task in the structure as both its own
successor and its own predecessor, and therefore as two
neighbors, when it should have only one.
PROBLEM CONCLUSION:
The code was changed so that a task can have itself as its only
neighbor in a periodic dimension, and the structure is now
mapped correctly.
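A small C sketch of neighbor computation along one periodic dimension follows. It is a generic illustration, not the MPI library's code, and the function name is hypothetical. The predecessor and successor wrap around. When they are the same task, as for the single task in a one-element periodic dimension, that task is recorded once.

```c
#include <assert.h>

/* Neighbors of coordinate i along one periodic dimension of extent n.
 * Writes the distinct neighbor coordinates to out and returns how many
 * there are (1 or 2). When predecessor and successor coincide, the
 * neighbor is counted only once. */
static int periodic_neighbors(int i, int n, int out[2])
{
    assert(n > 0 && i >= 0 && i < n);
    int pred = (i - 1 + n) % n;   /* wraps from 0 to n - 1 */
    int succ = (i + 1) % n;       /* wraps from n - 1 to 0 */
    out[0] = pred;
    if (succ == pred)
        return 1;
    out[1] = succ;
    return 2;
}
```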
------
APAR: IY22005 COMPID: 5765D6100 REL: 220
ABSTRACT: GETGRGID_R CALL FAILING UNDER NIS
PROBLEM DESCRIPTION:
llq -l | grep Unix does not show a group for some users when LL
is running with NIS.
PROBLEM SUMMARY:
At unknown levels of AIX libc.a, getgrgid_r can return
ENOENT instead of ERANGE when the buffer passed in is
too small.
PROBLEM CONCLUSION:
LL is designed to start with a given size buffer and
continually retry calling getgrgid_r with larger buffers
until ERANGE is no longer returned. In recent discussions
with AIX development, I was informed that we could not
count on receiving ERANGE when the buffer passed in is too
small (even though the man page says we can), and that
getgrgid_r should always be called with a buffer that is
large enough to hold GRPLINLEN (a grp.h define value)
characters. For LL accounting to be correct, we must provide
this work around until such time as getgrgid_r is fixed.
We will modify the code so that a buffer of GRPLINLEN+1
bytes is always used on the first call to getgrgid_r. This
will eliminate the possibility that the buffer will be too
small. The retry code will be left in place in case it
is ever needed again.
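The following sketch shows the buffer strategy described above, using the POSIX getgrgid_r() interface. GRPLINLEN is taken from grp.h where it is defined, as on AIX. The fallback value of 1024 for other systems is an assumption, and so is the helper name `group_name_for_gid`. This is not LoadLeveler's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#ifndef GRPLINLEN
#define GRPLINLEN 1024   /* assumption: fallback where grp.h lacks it */
#endif

/* Return a malloc'd copy of the name of group gid, or NULL if the group
 * is not found or an error occurs. The first call uses GRPLINLEN + 1
 * bytes; the ERANGE retry with a doubled buffer is kept as a fallback. */
static char *group_name_for_gid(gid_t gid)
{
    size_t len = GRPLINLEN + 1;
    for (;;) {
        char *buf = malloc(len);
        if (buf == NULL)
            return NULL;
        struct group grp, *result = NULL;
        int rc = getgrgid_r(gid, &grp, buf, len, &result);
        if (rc == ERANGE) {          /* buffer too small: retry larger */
            free(buf);
            len *= 2;
            continue;
        }
        char *name = NULL;
        if (rc == 0 && result != NULL)
            name = strdup(grp.gr_name);
        free(buf);
        return name;
    }
}
```

Because getgrgid_r() returns its error number directly, a return value other than 0 or ERANGE (for example the ENOENT mentioned above) ends the lookup rather than triggering another retry.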
------
APAR: IY22076 COMPID: 5765E5100 REL: 600
ABSTRACT: NODE HANG ON ANYNET STOPPING
PROBLEM DESCRIPTION:
With Anynet started, stopping the node results in an unresponsive
SNA subsystem, apparently hanging on ANYNET_STOPPING. With
no Anynet configured, or with Anynet not running, the system
responds as expected.
PROBLEM SUMMARY:
Problem as described above. The code was fixed to correct the
logic.
PROBLEM CONCLUSION:
If you use Anynet and cannot stop the node without a hang,
the updated module is needed.
TEMPORARY FIX:
Do not use Anynet, or reboot to free up the system.
------
APAR: IY22078 COMPID: 5765D5100 REL: 320
ABSTRACT: RVSD_VERSION NOT UPDATED FOR 3.2.0.4
PROBLEM DESCRIPTION:
At rvsd 3.2.0.4 (vsd.rvsd.rvsdd) there was a functional change
to how RVSD deals with fenced VSDs. The RVSD_Fence Class is
removed and a different mechanism is used. In order for this
to be triggered RVSD_version must be updated. Up to this point
we were only concerned with release levels, and RVSD_version was
only being updated for new releases. But here is a case where
we care about the PTF level. A packaging change will need to be
made to vsd.rvsd.rvsdd to update the RVSD_version level for 3.2.
LOCAL FIX:
Manually update the RVSD_version fields (Node class) in the SDR to
3020004. But be sure to verify that vsd.rvsd.rvsdd is at
3.2.0.4 or later first.
PROBLEM SUMMARY:
RVSD_version was not being updated at
vsd.rvsd.rvsdd 3.2.0.4 (PTF Set 5) as required.
PROBLEM CONCLUSION:
RVSD_version will now be updated to reflect 3.2.0.4
or later.
------
APAR: IY22138 COMPID: 5765D5100 REL: 320
ABSTRACT: CSS_COLONY:ADAPTRECV-DECREASE CA INTERVALS-AVOID THRESHOLDING
PROBLEM DESCRIPTION:
css_colony:adaptrecv-decrease ca intervals-avoid thresholding
PROBLEM SUMMARY:
SP Switch2 adapter recovery was taking the node off the
switch if two critical errors were seen within a 24-hour
interval. This resulted in the node falling off the
switch too often.
PROBLEM CONCLUSION:
The CA threshold interval was decreased to four hours.
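A hypothetical model of the threshold follows. The node is taken off the switch when two critical errors occur within the configured interval of each other. The structure, function and macro names are illustrative, and the real adapter recovery code may track errors differently. The sketch only shows why shortening the window from 24 hours to 4 hours lets errors spaced, for example, 6 hours apart no longer trip the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define CA_WINDOW_OLD (24 * 60 * 60)   /* 24 hours, before the fix */
#define CA_WINDOW_NEW (4 * 60 * 60)    /* four hours, after the fix */

/* Time of the most recent critical error, if any (hypothetical model). */
struct ca_state {
    bool   have_prev;
    time_t prev;
};

/* Record a critical error at time now (seconds). Return true if it falls
 * within window_secs of the previous critical error, i.e. the node would
 * be taken off the switch. */
static bool ca_record_error(struct ca_state *s, time_t now, time_t window_secs)
{
    assert(s != NULL);
    bool hit = s->have_prev && (now - s->prev) <= window_secs;
    s->prev = now;
    s->have_prev = true;
    return hit;
}
```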
------
APAR: IY22139 COMPID: 5765D5100 REL: 320
ABSTRACT: POOR PERFORMANCE OF COLONY DOUBLE/SINGLE
PROBLEM DESCRIPTION:
poor performance of colony double/single
PROBLEM SUMMARY:
The chgcss command does not dynamically change the
poolsize for the css1 interface. The new value
takes effect only after the node is rebooted.
PROBLEM CONCLUSION:
The chgcss command was changed to initialize the
device number field in the poolsize data structure.
------
APAR: IY22167 COMPID: 5765B9500 REL: 130
ABSTRACT: MMMKVSD: GET ADAPTER FAILED FOR IPA: <IP ADDR>
PROBLEM DESCRIPTION:
The command mmmkvsd uses the commands
/usr/lpp/mmfs/bin/mmcommon convin and
/usr/lpp/mmfs/bin/mmcommon convnr
(in section: # Figure out our partition.)
However, mmcommon no longer has the functions
convin and convnr implemented. Using mmmkvsd
will fail with the message
mmmkvsd: Get adapter failed for IPA: <IP Addr>
PROBLEM SUMMARY:
During migration from GPFS 1.2 to GPFS 1.3,
the mmmkvsd and mmcrfs commands did not operate if issued
from the CWS.
PROBLEM CONCLUSION:
Fixed the mmmkvsd command in GPFS 1.3 to operate
with GPFS 1.2 nodes.
------
APAR: IY22190 COMPID: 5765D5100 REL: 320
ABSTRACT: IY18700 BREAKS /USR/LPP/SSP/INSTALL/BIN/INSTALL_LIB.PL CODE.
PROBLEM DESCRIPTION:
setup_server results in the following errors:
mknimint: 0016-202 The IP address tuple is not numeric.
mknimint: 0016-201 There is an incorrect number of tuples in
address.
setup_server: 0016-279 Problem of internally called command:
/usr/lpp/ssp/bin/mknimint; rc= 2.
This only occurs when APAR IY18700 is applied on the system.
The problem occurs when a tuple of the IP address consists
of a single zero (e.g., 192.0.12.12).
LOCAL FIX:
APAR IY18700 needs to be removed from the system. Back levels
of the /usr/lpp/ssp/install/bin/install_lib.pl code have been
successfully tested.
PROBLEM SUMMARY:
***********************************************************
* USERS AFFECTED: *
* *
* Users with IY18700 (available in ssp.basic 3.2.11) *
* installed on their Control Workstation or B/I Server, *
* which also have an ethernet adapter with an IP *
* address with at least one octet beginning with 0. *
* *
***********************************************************
* PROBLEM DESCRIPTION: *
* *
* mknimint will fail with the following message: *
* 0016-202 The IP address tuple is not numeric. *
* *
***********************************************************
* RECOMMENDATION: *
* *
* Replace the current version of *
* /usr/lpp/ssp/install/bin/install_lib.pl with a *
* pre ssp.basic 3.2.11 version of the file, or install *
* IY22190 when available. *
* *
***********************************************************
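The faulty check in install_lib.pl (a Perl file) is not shown in this APAR. The following C sketch only illustrates an address check that treats a lone "0" tuple as numeric, like any other value from 0 to 255. The function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Return true if s is a dotted-quad IPv4 address: exactly four tuples of
 * 1 to 3 decimal digits, each with a value of 0..255, separated by dots.
 * A tuple of "0" is accepted like any other number. */
static bool valid_dotted_quad(const char *s)
{
    int tuples = 0;
    while (tuples < 4) {
        int digits = 0, value = 0;
        while (isdigit((unsigned char)*s) && digits < 3) {
            value = value * 10 + (*s - '0');
            s++;
            digits++;
        }
        if (digits == 0 || value > 255)
            return false;
        tuples++;
        if (tuples < 4) {
            if (*s != '.')
                return false;
            s++;
        }
    }
    return *s == '\0';
}
```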
------
APAR: IY22192 COMPID: 5765D5100 REL: 330
ABSTRACT: INTERNAL ESTART FOLLOWING SWITCH ERROR RECOVERY FAILED
PROBLEM DESCRIPTION:
Internal Estart following switch error recovery failed
PROBLEM SUMMARY:
Switch recovery had a flaw where a switch chip port would
not be disabled if a second switch chip reported an error
which required it to be disabled.
PROBLEM CONCLUSION:
Switch recovery was modified to properly disable both ports
when handling the second error.
------
APAR: IY22246 COMPID: 5765B9501 REL: 330
ABSTRACT: MTIME CHANGES NOT PROPAGATED ON DIRS FOR DFS
PROBLEM DESCRIPTION:
mtime changes not propagated on dirs for DFS
PROBLEM SUMMARY:
Bug in DFS export handling of mtime found in code review.
PROBLEM CONCLUSION:
Make sure all directory updates get a write lock on the
directory inode (wo or stronger). This is necessary for the
exact mtime option to work correctly on directories.
------
APAR: IY22249 COMPID: 5765B9501 REL: 330
ABSTRACT: VSX ASSERT FAILED: SRCDIROP.LOCKMODE == LKOBJ::XW FILE DIRECT.C
PROBLEM DESCRIPTION:
VSX assert failed: srcDirOp.lockmode == LkObj::xw file direct.C
PROBLEM SUMMARY:
GPFS self check logic failed when renaming a directory to a
directory name that already exists.
PROBLEM CONCLUSION:
When renaming a directory and another directory already
exists with the same name, get an XW lock on the parent
directory to change the link count to account for the
removal of one child directory.
------
APAR: IY22251 COMPID: 5765B9501 REL: 330
ABSTRACT: VSX ORDINARY USER VSX0 PANICS NODE IN GPFS: PANIC: VNODEOPS.C:
PROBLEM DESCRIPTION:
VSX ordinary user vsx0 panics node in GPFS: panic: vnodeops.C:
PROBLEM SUMMARY:
Kernel panic in a soft mount environment using mmap.
PROBLEM CONCLUSION:
Mmap cannot assume that the first vnode pointer in the gnode is
the one used for mapping. The page fault code was changed not to
use the vnode at all. Instead, the VFS data pointer is saved in
gpfsNode_t at mapping time. Also, since the paging code can no
longer put a hold on the vnode, the synchronization between the
pager and mmap termination was fixed to ensure that termination
won't finish until all outstanding paging requests are complete.
------
APAR: IY22253 COMPID: 5765B9501 REL: 330
ABSTRACT: MISC FSCK PROBLEMS
PROBLEM DESCRIPTION:
misc fsck problems
PROBLEM SUMMARY:
Serviceability improvements for mmfsck.
------
APAR: IY22254 COMPID: 5765B9501 REL: 330
ABSTRACT: 384WAY, GPFS1.4: ASSERT METADATA.C, LINE 4878 DURING MMFSCK
PROBLEM DESCRIPTION:
384way, GPFS1.4: assert metadata.C, line 4878 during mmfsck
PROBLEM SUMMARY:
GPFS self check logic failed when running mmfsck in a two-node
HACMP cluster with single-node quorum enabled.
PROBLEM CONCLUSION:
The two-node cluster case caused doRecovery to be called
even when it wasn't needed. This cannot be done in the face
of fsck. Fsck is set up to avoid this by having the clients
mount read-only and not use a log.
------
APAR: IY22257 COMPID: 5765B9501 REL: 330
ABSTRACT: SLOW MMFSCK
PROBLEM DESCRIPTION:
slow mmfsck
PROBLEM SUMMARY:
mmfsck performance change
------
APAR: IY22263 COMPID: 5765B9501 REL: 330
ABSTRACT: BAD EXAMPLE AND MISSING INFO IN MMLSQUOTA DOCUMENTATION
PROBLEM DESCRIPTION:
The command documentation for mmlsquota shows an example where
usr + in_doubt exceeds the quota. This is not allowed by the
code. Also, the quota documentation should explicitly state that
user + in_doubt is not allowed to exceed the hard quota.
PROBLEM SUMMARY:
The mmlsquota command description did not explicitly state that
the sum of usr and in-doubt could not exceed the hard limit.
The example also showed a case where this had occurred and
therefore also needed to be updated.
PROBLEM CONCLUSION:
The description was updated to include:
For each file system in the nodeset, the mmlsquota command
displays:
Block limits:
quota type (USR or GROUP)
current usage in KB
soft limit in KB
hard limit in KB
space in-doubt
grace period
File limits:
current number of files
soft limit
hard limit
files in-doubt
grace period
As the sum of the in-doubt value and the current usage may
not exceed the hard limit, the actual block space and
number of files available to the user or group may be
constrained by the in-doubt value. Should the in-doubt
value approach a significant percentage of the quota, run
the mmcheckquota command to account for the loss of space
and files.
The example was updated as:
User paul enters:
mmlsquota
The system displays information similar to:
                          Block Limits              |           File Limits
Filesystem type   KB   quota   limit in_doubt grace | files quota limit in_doubt grace
gpfsn      USR   728  100096  200192     4880  none |    35    30    40       10 6days
This output shows the quotas for user paul in file system
gpfsn set to a soft limit of 100096K and a hard limit of
200192K. 728K is currently allocated to him. 4880K is also
in-doubt, meaning that the quota system has not yet been
updated as to whether this space has been used by the nodes,
or whether it is still available. No grace period appears
because the user has not exceeded his quotas. If he had
exceeded the soft limit the grace period would be set and
the user would have that amount of time to bring his usage
below the quota value. If he failed to do so, he would not
be allocated any more space.
The soft limit for files (inodes) is set at 30 and the hard
limit is 40. 35 files are currently allocated to this user,
and the quota system does not yet know whether the 10
in-doubt files have been used or are still available. A grace
period of six days appears because the user has exceeded
his quotas. The user has that amount of time to bring his
usage below the quota value. If he fails to do so, he will
not be allocated any more space.
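The rule stated in the updated description can be written as a one-line calculation: current usage plus the in-doubt amount may not exceed the hard limit, so what is actually left is the hard limit minus both. The function below is a hypothetical illustration, not GPFS code. It assumes a hard limit is set. Applied to the block figures for user paul in the example (728K used, 4880K in doubt, 200192K hard limit), it leaves 194584K available.

```c
#include <assert.h>

/* Amount (KB or files) still available under the hard limit once the
 * in-doubt amount is counted against the user or group.
 * Returns 0 when usage + in_doubt has reached the hard limit. */
static long quota_available(long usage, long in_doubt, long hard_limit)
{
    assert(usage >= 0 && in_doubt >= 0 && hard_limit >= 0);
    long committed = usage + in_doubt;
    return committed >= hard_limit ? 0 : hard_limit - committed;
}
```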
------
APAR: IY22264 COMPID: 5765B9501 REL: 320
ABSTRACT: BAD EXAMPLE AND MISSING INFO IN MMLSQUOTA DOCUMENTATION
PROBLEM DESCRIPTION:
The command documentation for mmlsquota shows an example where
usr + in_doubt exceeds the quota. This is not allowed by the
code. Also, the quota documentation should explicitly state that
user + in_doubt is not allowed to exceed the hard quota.
PROBLEM SUMMARY:
The mmlsquota command description did not explicitly state that
the sum of usr and in-doubt could not exceed the hard limit.
The example also showed a case where this had occurred and
therefore also needed to be updated.
PROBLEM CONCLUSION:
The description was updated to include:
For each file system in the nodeset, the mmlsquota command
displays:
Block limits:
quota type (USR or GROUP)
current usage in KB
soft limit in KB
hard limit in KB
space in-doubt
grace period
File limits:
current number of files
soft limit
hard limit
files in-doubt
grace period
As the sum of the in-doubt value and the current usage may
not exceed the hard limit, the actual block space and
number of files available to the user or group may be
constrained by the in-doubt value. Should the in-doubt
value approach a significant percentage of the quota, run
the mmcheckquota command to account for the loss of space
and files.
The example was updated as:
User paul enters:
mmlsquota
The system displays information similar to:
                          Block Limits              |           File Limits
Filesystem type   KB   quota   limit in_doubt grace | files quota limit in_doubt grace
gpfsn      USR   728  100096  200192     4880  none |    35    30    40       10 6days
This output shows the quotas for user paul in file system
gpfsn set to a soft limit of 100096K and a hard limit of
200192K. 728K is currently allocated to him. 4880K is also
in-doubt, meaning that the quota system has not yet been
updated as to whether this space has been used by the nodes,
or whether it is still available. No grace period appears
because the user has not exceeded his quotas. If he had
exceeded the soft limit the grace period would be set and
the user would have that amount of time to bring his usage
below the quota value. If he failed to do so, he would not
be allocated any more space.
The soft limit for files (inodes) is set at 30 and the hard
limit is 40. 35 files are currently allocated to this user,
and the quota system does not yet know whether the 10
in-doubt files have been used or are still available. A grace
period of six days appears because the user has exceeded
his quotas. The user has that amount of time to bring his
usage below the quota value. If he fails to do so, he will
not be allocated any more space.
------
APAR: IY22392 COMPID: 5765D5100 REL: 330
ABSTRACT: COLONY CONFIG NOT REPORTING CORRECT ADAPTER DIAG
PROBLEM DESCRIPTION:
colony config not reporting correct adapter diag
PROBLEM SUMMARY:
Resource name field of error report contained 'diag_fail'
message only.
PROBLEM CONCLUSION:
Adapter name string was added into resource_name field. Now
output from errpt command looks like 'css0_diag_fail' or
'css1_diag_fail'. This is test output from errpt:
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
63F81C60 0813155301 P S css0_diag_fail CSS config failed
------
APAR: IY22509 COMPID: 5765E5100 REL: 601
ABSTRACT: LU62 'WHAT_DATA_RECEIVED' RETURNS PSH_COMPLETE_DATA_RECEIVED=6'
PROBLEM DESCRIPTION:
Customer gets message "uncoded what_data_rcvd val 6" in his
APPC application program. This problem happens very seldom
and a core dump is written. AIX/CS API trace shows:
output.what_data_rcvd = 6
According to luxsna.h this indicates:
'PSH_COMPLETE_DATA_RECEIVED=6'
The very same APPC program was running fine on SNA Server V4.2.
The customer's application is not able to process the above
'what_data_rcvd' value and abends.
PROBLEM SUMMARY:
Application fails because it gets unexpected
what_data_received of PS_COMPLETE_DATA_RCVD.
PROBLEM CONCLUSION:
Corrected code to handle the BC fill=buffer
case correctly, tracking current location within a GDS record
between reads.
------
APAR: IY22523 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY CONFIG NOT REPORTING CORRECT ADAPTER DIAG
PROBLEM DESCRIPTION:
colony config not reporting correct adapter diag
PROBLEM SUMMARY:
Resource name field of error report contained 'diag_fail'
message only.
PROBLEM CONCLUSION:
Adapter name string was added into resource_name field. Now
output from errpt command looks like 'css0_diag_fail' or
'css1_diag_fail'. This is test output from errpt:
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
63F81C60 0813155301 P S css0_diag_fail CSS config failed
------
APAR: IY22530 COMPID: 5765E7200 REL: 310
ABSTRACT: CIFSUSERPROC IS STARTED WHEN CONNECTING WITH INVALID USER.
PROBLEM DESCRIPTION:
Enable the passthrough authentication server and try to connect
with a user that is valid on both AIX and the passthrough server,
but with an invalid password. A new cifsUserProc process is
created with every attempt.
PROBLEM CONCLUSION:
Exit from cifsUserProc if the request from the server is only
to verify that the user exists.
------
APAR: IY22535 COMPID: 5765E6800 REL: 300
ABSTRACT: MULTILEVEL WILDCARDING
PROBLEM DESCRIPTION:
Multilevel wildcarding is not supported in the current design
of the Agent.
PROBLEM CONCLUSION:
Enhanced the code to support multilevel wildcarding.
------
APAR: IY22538 COMPID: 5765D5100 REL: 320
ABSTRACT: USER SPACE JOB SOMETIMES GET KILLED WITH SWITCH CLOCK
PROBLEM DESCRIPTION:
User Space job sometimes gets killed with switch clock
PROBLEM SUMMARY:
Because of switch clock management, an MPI job could be
killed even when adapter recovery succeeded.
PROBLEM CONCLUSION:
With this fix, the MPI job should continue if adapter
recovery succeeds.
------
APAR: IY22540 COMPID: 5765D5100 REL: 320
ABSTRACT: NEED TO ADD CALL TO THE MULTILINK DUMP UTILITIES IN CSS.SNAP.
PROBLEM DESCRIPTION:
Need to add call to the Multilink dump utilities in css.snap
PROBLEM SUMMARY:
The css.snap command needs to collect data on the ml0
device.
PROBLEM CONCLUSION:
The css.snap will now collect data on the ml0 device if it
is present.
------
APAR: IY22541 COMPID: 5765D5100 REL: 320
ABSTRACT: 128WAY COLONY D/S : CSS.SNAP RUN ON NODE FILLS UP /VAR
PROBLEM DESCRIPTION:
128Way Colony D/S : css.snap run on node fills up /var
PROBLEM SUMMARY:
The css.snap command was not checking in all the correct
places for file system usage.
PROBLEM CONCLUSION:
The css.snap command now looks in /var/adm/SPlogs/css0 and
/var/adm/SPlogs/css1 as well as /var/adm/SPlogs/css for file
system usage before execution.
------
APAR: IY22542 COMPID: 5765D5100 REL: 320
ABSTRACT: FUNCTIONALIZE CSS.SNAP EXITS
PROBLEM DESCRIPTION:
Functionalize css.snap exits
PROBLEM SUMMARY:
The internal flow of css.snap changed.
PROBLEM CONCLUSION:
Added an exit function to css.snap.
------
APAR: IY22557 COMPID: 5765E7400 REL: 300
ABSTRACT: JAZIZO: CPU AND MEMORY ENHANCEMENT
PROBLEM DESCRIPTION:
Jazizo needs to be enhanced to use less CPU and memory.
PROBLEM CONCLUSION:
Modified the code to minimize data type conversions.
------
APAR: IY22558 COMPID: 5765E7400 REL: 300
ABSTRACT: PTX V3: AZIZO PRINTER ERROR
PROBLEM DESCRIPTION:
On PTX V3.0, when you try to print from azizo to a file
or a printer, the following message pops up:
Paper width -802024944.00 is out of range
........
........
Please correct input and try again
PROBLEM CONCLUSION:
Included the prototype header file in azizo code.
------
APAR: IY22559 COMPID: 5765E7400 REL: 300
ABSTRACT: 3DMON MISINTERPRETS -D OPTION WITH -DISPLAY OPTION OF XMOTIF
PROBLEM DESCRIPTION:
3dmon takes a number along with the -d option as invitation
delay seconds but the XtInitialize routine which is supposed
to open the X display interface interprets it as the display
identifier and tries to open the display, and fails.
LOCAL FIX:
Specify the invitation delay value with -d option
without leaving space between -d & the value.
e.g. instead of '-d 15' use '-d15'.
PROBLEM SUMMARY:
3dmon takes a number along with the -d option as invitation
delay seconds but the XtInitialize routine which is supposed
to open the X display interface interprets it as the display
identifier and tries to open the display, and fails.
PROBLEM CONCLUSION:
Fixed the code so that the ambiguity of the -d option
is removed.
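The APAR does not say how the ambiguity was removed. One possible approach, sketched here under that assumption, is to take "-d <n>" (or "-d<n>") out of argv before argv is passed to XtInitialize. The toolkit then never sees an argument it could read as -display. The function name and the default-value handling are hypothetical and not taken from 3dmon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Remove "-d <n>" or "-d<n>" from argv, updating *argc and keeping the
 * NULL terminator. Returns the delay found, or def if none was given.
 * Options such as "-display" are left in place for the toolkit. */
static int extract_delay(int *argc, char **argv, int def)
{
    assert(argc != NULL && argv != NULL);
    int delay = def;
    int out = 1;                          /* argv[0] is kept as-is */
    for (int in = 1; in < *argc; in++) {
        const char *a = argv[in];
        if (strcmp(a, "-d") == 0 && in + 1 < *argc) {
            delay = atoi(argv[++in]);     /* "-d 15": consume the value */
        } else if (strncmp(a, "-d", 2) == 0 && a[2] != '\0' &&
                   strspn(a + 2, "0123456789") == strlen(a + 2)) {
            delay = atoi(a + 2);          /* "-d15" */
        } else {
            argv[out++] = argv[in];       /* keep everything else */
        }
    }
    argv[out] = NULL;
    *argc = out;
    return delay;
}
```

After the call, the shortened argc and argv would be the ones handed to the X toolkit.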
------
APAR: IY22560 COMPID: 5765E6800 REL: 300
ABSTRACT: XMSERVD DOESN'T STOP AFTER TIMETOLIVE MINUTES
PROBLEM DESCRIPTION:
xmservd has a -l option which can be used to specify
time_to_live minutes. Even when time_to_live minutes
is specified with the -l option, xmservd may not die
after the specified time_to_live minutes value.
PROBLEM CONCLUSION:
Fixed the problem in the calculation of the death time of
xmservd.
------
APAR: IY22576 COMPID: 5765B9501 REL: 320
ABSTRACT: VSX CHOWN() DID NOT CLEAR S_ISUID BIT
PROBLEM DESCRIPTION:
vsx chown() did not clear s_isuid bit
PROBLEM SUMMARY:
chown() did not clear S_ISUID bit
PROBLEM CONCLUSION:
The fast path taken when the owner and group already
match was not clearing the S_ISUID or S_ISGID bits.
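The following sketch follows the POSIX chown() rule as I read it. For a regular file with at least one execute bit set, a successful chown() by a caller without appropriate privileges clears S_ISUID and S_ISGID. This holds even when the owner and group do not change, which is the case the GPFS fast path missed. For privileged callers the behavior is implementation-defined, and this sketch leaves the bits alone. The function is a hypothetical illustration, not GPFS code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Mode a file should have after a successful chown() by a caller whose
 * privilege is given by `privileged`, regardless of whether the owner
 * or group actually changed. */
static mode_t mode_after_chown(mode_t mode, int privileged)
{
    if (S_ISREG(mode) && (mode & (S_IXUSR | S_IXGRP | S_IXOTH)) && !privileged)
        mode &= ~(mode_t)(S_ISUID | S_ISGID);
    return mode;
}
```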
------
APAR: IY22578 COMPID: 5765B9501 REL: 330
ABSTRACT: VSX CHOWN() DID NOT CLEAR S_ISUID BIT
PROBLEM DESCRIPTION:
vsx chown() did not clear s_isuid bit
PROBLEM SUMMARY:
chown() did not clear S_ISUID bit
PROBLEM CONCLUSION:
The fast path taken when the owner and group already
match was not clearing the S_ISUID or S_ISGID bits.
------
APAR: IY22591 COMPID: 5765D6100 REL: 220
ABSTRACT: NEGOTIATOR CRASHING
PROBLEM DESCRIPTION:
llq commands begin to respond slowly, then the negotiator stops
responding, and then the negotiator crashes.
LOCAL FIX:
Do not issue llstatus and llq repeatedly at the same time.
PROBLEM SUMMARY:
APAR IY20677 changed the locking of various BTree data
structures from exclusive write locks to shared read locks
for accesses where the data structures were not being changed.
Since the mechanism for traversing these data structures
actually changes the data structure, shared read locks are
insufficient to protect the data structure in these cases.
This inadequate locking leads to memory errors, which cause
the negotiator to core dump.
PROBLEM CONCLUSION:
In all cases, the code has been changed to hold exclusive write
locks when accessing BTree data structures, regardless of
whether data in the data structure is being changed.
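The point of the fix can be shown with a hypothetical structure whose lookup records its position in the structure itself, as the APAR says BTree traversal does. Because even a lookup writes to shared state, it must hold the lock exclusively. Under a shared read lock, two concurrent readers could both update the cursor. The structure and names below are illustrative, not LoadLeveler's BTree code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical container whose lookup remembers the position of the last
 * hit, so traversal modifies the structure. */
struct cached_table {
    pthread_rwlock_t lock;
    const int       *keys;
    size_t           n;
    size_t           cursor;   /* last hit; written by lookups */
};

/* Return the index of key, or -1. Takes the write lock even though the
 * keys are not changed, because the traversal updates `cursor`. */
static long table_find(struct cached_table *t, int key)
{
    assert(t != NULL);
    long found = -1;
    pthread_rwlock_wrlock(&t->lock);
    for (size_t step = 0; step < t->n; step++) {
        size_t i = (t->cursor + step) % t->n;   /* resume from last hit */
        if (t->keys[i] == key) {
            t->cursor = i;                      /* traversal writes state */
            found = (long)i;
            break;
        }
    }
    pthread_rwlock_unlock(&t->lock);
    return found;
}
```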
------
APAR: IY22592 COMPID: 5765B9501 REL: 330
ABSTRACT: LOOPING ON RECLOCKRESET
PROBLEM DESCRIPTION:
looping on reclockreset
PROBLEM SUMMARY:
Development discovered potential loop.
PROBLEM CONCLUSION:
Fixed a loop in RecLockReset caused by an F_SETLK call using a
different l_vfs value from the one returned by F_GETLK.
------
APAR: IY22593 COMPID: 5765B9501 REL: 330
ABSTRACT: WAITING BECAUSE OF LOCAL BYTE RANGE LOCK CONFLICT
PROBLEM DESCRIPTION:
waiting because of local byte range lock conflict
PROBLEM SUMMARY:
Deadlock when a disk error causes a forced unmount of the file
system while mapped files are being purged.
PROBLEM CONCLUSION:
When the BR lock gets an error (e.g. SG panic) while flushing
mapped buffers in the byte range, it must unlock the byte range,
because the caller will assume that the range was never acquired.
------
APAR: IY22594 COMPID: 5765B9501 REL: 330
ABSTRACT: MMFSD STUCK AFTER FAILING
PROBLEM DESCRIPTION:
mmfsd stuck after failing
PROBLEM SUMMARY:
There have been cases where GPFS termination deadlocked in
AIX services because the AIX service required a lock held by
a GPFS thread which detected the failure. The termination
sequence then hung, which resulted in GPFS not restarting.
PROBLEM CONCLUSION:
Instead of waiting forever, SigUsr1Handler now waits up to
5 minutes for the internal dump to complete and then exits.
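A bounded wait of this kind can be sketched with pthread_cond_timedwait(). GPFS's actual SigUsr1Handler is not shown in the APAR, so the function and parameter names here are hypothetical. Because pthread calls are not async-signal-safe, this sketch is meant to run in an ordinary thread rather than inside a signal handler. A caller following the APAR would pass a timeout of 5 * 60 seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Wait until *done becomes true (another thread sets it under mu and
 * signals cv), but give up after timeout_secs instead of waiting forever.
 * Returns true if the dump finished, false on timeout or error. */
static bool wait_for_dump(pthread_mutex_t *mu, pthread_cond_t *cv,
                          const bool *done, time_t timeout_secs)
{
    assert(mu != NULL && cv != NULL && done != NULL);
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* default clock for cond */
    deadline.tv_sec += timeout_secs;

    pthread_mutex_lock(mu);
    int rc = 0;
    while (!*done && rc == 0)                   /* loop on spurious wakeups */
        rc = pthread_cond_timedwait(cv, mu, &deadline);
    bool finished = *done;
    pthread_mutex_unlock(mu);
    return finished;
}
```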
------
APAR: IY22595 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT AT DSYNCH.C, LINE 1315
PROBLEM DESCRIPTION:
assert at dsynch.c, line 1315
PROBLEM SUMMARY:
GPFS self check logic detected an error at dsynch.C line
1315.
PROBLEM CONCLUSION:
A thread was left holding the alloc server mutex
during a config change.
------
APAR: IY22620 COMPID: 5765D6100 REL: 220
ABSTRACT: LLQ <JOBID> IN MULTI DOMAIN ENVIRONMENT DOES NOT WORK
PROBLEM DESCRIPTION:
llq <jobid> in multi domain environment does not work
Please refer to defect 76010
When issuing the llq jobid command from a node in a
different domain from the one running the schedd
which received the job, no information is reported.
Example:
On sp7tr12.hursley.ibm.com:
$ llq
Id Owner
anubis.ssd.hursley-.65.0 loadl
$ llq anubis.ssd.hursley.ibm.com.65.0
llq: There is currently no job status to report.
However, llq -l shows the long listing on all jobs,
including anubis.ssd.hursley.ibm.com.65.0.
PROBLEM SUMMARY:
When issuing the llq jobid command from a node in a
different domain from the one running the schedd
which received the job, no information is reported.
PROBLEM CONCLUSION:
When issuing the llq jobid command from a node in a
different domain from the one running the schedd
which received the job, information is now being reported.
------
APAR: IY22713 COMPID: 5765B9500 REL: 140
ABSTRACT: MMMKVSD: GET ADAPTER FAILED FOR IPA: <IP ADDR>
PROBLEM DESCRIPTION:
The command mmmkvsd uses the commands
/usr/lpp/mmfs/bin/mmcommon convin and
/usr/lpp/mmfs/bin/mmcommon convnr
(in section: # Figure out our partition.)
However, mmcommon no longer has the functions
convin and convnr implemented. Using mmmkvsd
will fail with the message
mmmkvsd: Get adapter failed for IPA: <IP Addr>
PROBLEM SUMMARY:
During migration from GPFS 1.2 to GPFS 1.3, the
mmmkvsd and mmcrfs commands did not operate if issued from the
CWS.
PROBLEM CONCLUSION:
Fixed the mmmkvsd command in GPFS 1.3 to operate
with GPFS 1.2 nodes.
------
APAR: IY22714 COMPID: 5765B9501 REL: 330
ABSTRACT: ERR 4 OF UNKNOWN ORIGIN IN QUOTA PREFETCH
PROBLEM DESCRIPTION:
ERR 4 OF UNKNOWN ORIGIN IN QUOTA PREFETCH
PROBLEM SUMMARY:
Quota prefetch returned RC 4 without explanation.
PROBLEM CONCLUSION:
Initialize the secondary error variable so that the FlushRecord
exit code does not return an incorrect value.
------
APAR: IY22716 COMPID: 5765B9501 REL: 330
ABSTRACT: NODE LEFT IN FAILED STATE AFTER RECOVERY
PROBLEM DESCRIPTION:
mmfsadm dump cfgmgr will show a node with state "fail" on some
nodes while it is either "down" or "up" on the other nodes. This
will block the socket communication to that node causing hung
threads if revokes are needed.
PROBLEM SUMMARY:
During a node failure, a second failing node did not
complete recovery.
PROBLEM CONCLUSION:
Serialize the joining of the single-phase group
and the N-phase group. This will prevent an observing node
from seeing a leave event from the single-phase group
after it has already seen leave/recovery/down/join phases
on the N-phase group, and assume it must be a new failure.
------
APAR: IY22741 COMPID: 5765B9501 REL: 330
ABSTRACT: EIO ERROR FROM MMGETACL COMMAND
PROBLEM DESCRIPTION:
The problem occurs because the in-memory buffers of the ACL
file are modified when doing permission checks on directories
which have implied permission(s).
PROBLEM SUMMARY:
Application received an incorrect rejection of access to a
file because ACLs did not match.
PROBLEM CONCLUSION:
Error description: Do not modify the in-memory buffer of the
ACL file when doing permission checks on directories which
have implied permissions. After the modification, the next
access of the ACLs would return E_VALIDATE because the hashed
value of the modified entry would not match.
------
APAR: IY22764 COMPID: 5765E5400 REL: 440
ABSTRACT: GEO_MOUNT_FS DOES NOT SET GROUPNAME
PROBLEM DESCRIPTION:
In HACMP (and HAES) 4.4.1, the cl_activate_fs
script requires that the GROUPNAME is set.
However, HaGeo calls cl_activate_fs without
setting it.
In the hacmp.out file, you'll see several
errors including:
odmget: Could not retrieve object for HACMPresource,
odm errno 5904
ERROR: Could not get the value of FSCHECK_TOOL
ERROR: Could not get the value of RECOVERY_METHOD
PROBLEM SUMMARY:
In an HAGEO environment with HACMP, Geo is calling
cl_activate_fs from Geo_mount_fs to mount the Geo filesystems.
Certain environment variables, such as "group" (an attribute
of HACMPresource), are not set. When odmget is then issued as
odmget -q "name=FSCHECK_TOOL AND group=" HACMPresource
it fails with: odmget: Could not retrieve object for
HACMPresource, odm errno 5904. It looks like FSCHECK_TOOL is
not set, but fsck somehow still gets run.
PROBLEM CONCLUSION:
Change the code in cl_activate_fs to directly query ODM for
any information that was not currently available in the
environment, and suppress error messages for conditions
handled by taking defaults.
------
APAR: IY22869 COMPID: 5765E5400 REL: 440
ABSTRACT: ATTEMPT TO START CLUSTER HANGS IN CLSTART - HAES
PROBLEM DESCRIPTION:
When attempting to start the cluster in HAES 4.4, the startup
hung in clstart. It was hanging on a grep for clstop and
a tail -1.
PROBLEM SUMMARY:
When attempting to start the cluster in HAES 4.4, the startup
hung in clstart. It was hanging on a grep for clstop and
a tail -1.
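One plausible mechanism for a grep hang like this is a missing file operand: in a pipeline such as grep clstop $logfile | tail -1, an empty $logfile leaves grep with only a pattern, so it reads standard input, which may never reach end-of-file. The C sketch below guards against that case before searching. last_node_down() is a hypothetical helper, not the clstart code, and the search string is the one this APAR's fix uses.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper. Returns 1 and copies the last matching line into out
   if found, 0 if no log path is configured or nothing matches, and -1 if
   the file cannot be opened. */
int last_node_down(const char *log_path, char *out, size_t outlen)
{
    if (log_path == NULL || log_path[0] == '\0')
        return 0;  /* no log location: do not search at all */

    FILE *f = fopen(log_path, "r");
    if (f == NULL)
        return -1;

    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, "EVENT COMPLETED: node_down_complete") != NULL) {
            snprintf(out, outlen, "%s", line);  /* keep the latest match */
            found = 1;
        }
    }
    fclose(f);
    return found;
}
```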
PROBLEM CONCLUSION:
The problem was in code which attempts to find the timestamp
of the last stop of the cluster. Two problems existed:
1. If the cluster.log entry in /etc/syslog.conf was missing,
the code still tried to grep a null filename, causing the hang.
2. It was grepping for clstop in the cluster.log file, where
no clstop entry exists.
The code was changed to verify that a file location was
returned before attempting the grep, and the grep was changed
to look for "EVENT COMPLETED: node_down_complete".
------
APAR: IY22905 COMPID: 5765E7200 REL: 310
ABSTRACT: RENAME A FILE/FOLDER TO DIFFERENT CASE FAILS
PROBLEM DESCRIPTION:
Unable to move (rename) a file or folder to the same name with
different case on FastConnect.
PROBLEM CONCLUSION:
Fixed the file/folder move functionality.
------
APAR: IY22906 COMPID: 5765E7200 REL: 310
ABSTRACT: COREDUMP WITH 1024 CHARACTERS
PROBLEM DESCRIPTION:
Server core dumps
PROBLEM CONCLUSION:
Changed the assert statement to support more characters.
------
APAR: IY22946 COMPID: 5765B9501 REL: 330
ABSTRACT: MMDELDISK STOPS EARLY ON WARNING
PROBLEM DESCRIPTION:
mmdeldisk stops early on warning
PROBLEM SUMMARY:
In certain error cases, mmdeldisk stopped trying to move
data due to recoverable errors which could have allowed
additional data to be recovered.
PROBLEM CONCLUSION:
Only terminate the repair scan on fatal errors. The scan was
being stopped for non-severe errors such as E_NOBALSPC, but
tsdeldisk would still delete the disk, leaving files with disk
addresses pointing to the deleted disk (i.e. data would be lost).
------
APAR: IY22947 COMPID: 5765B9501 REL: 320
ABSTRACT: EIO ERROR FROM MMGETACL COMMAND
PROBLEM DESCRIPTION:
The problem occurs because the in-memory buffer of the ACL
file is modified when doing permission checks on directories
which have implied permission(s).
PROBLEM SUMMARY:
Application received an incorrect rejection of access to a
file because ACLs did not match.
PROBLEM CONCLUSION:
Error description: Do not modify the in-memory buffer of the
ACL file when doing permission checks on directories which
have implied permissions. After the modification, the next
access of the ACLs would return E_VALIDATE because the hashed
value of the modified entry would not match.
------
APAR: IY23023 COMPID: 5765B9501 REL: 320
ABSTRACT: MMLSQUOTA SHOWS INCONSISTENT GRACE PERIOD VALUES
PROBLEM DESCRIPTION:
mmlsquota shows inconsistent grace period values
PROBLEM SUMMARY:
mmlsquota shows incorrect grace period.
PROBLEM CONCLUSION:
mmlsquota shows inconsistent grace period values. Do not
use the same buffer for two different strings in one print
statement.
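The bug class described above can be shown with a short C sketch, assuming a hypothetical formatter that returns a pointer to a single static buffer. fmt_grace(), fmt_grace_r() and the two grace_line functions are illustrative names, not mmlsquota internals. Because every argument is evaluated before the print call runs, both %s fields end up reading the same buffer.

```c
#include <stdio.h>

/* Hypothetical formatter returning a pointer to one static buffer. */
static const char *fmt_grace(long seconds)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "%ld days", seconds / 86400);
    return buf;
}

/* Reentrant variant: the caller supplies the buffer. */
static const char *fmt_grace_r(long seconds, char *buf, size_t len)
{
    snprintf(buf, len, "%ld days", seconds / 86400);
    return buf;
}

/* Buggy: both %s arguments point at the same static buffer, so both
   fields show whichever value was formatted last. */
void grace_line_buggy(char *out, size_t len, long block_s, long file_s)
{
    snprintf(out, len, "block %s, file %s",
             fmt_grace(block_s), fmt_grace(file_s));
}

/* Fixed: each value is formatted into its own buffer. */
void grace_line_fixed(char *out, size_t len, long block_s, long file_s)
{
    char b1[32], b2[32];
    snprintf(out, len, "block %s, file %s",
             fmt_grace_r(block_s, b1, sizeof b1),
             fmt_grace_r(file_s, b2, sizeof b2));
}
```

Which value appears twice in the buggy output depends on the unspecified order in which the compiler evaluates the arguments; the fixed version does not depend on that order.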
------
APAR: IY23024 COMPID: 5765B9501 REL: 330
ABSTRACT: MMLSQUOTA SHOWS INCONSISTENT GRACE PERIOD VALUES
PROBLEM DESCRIPTION:
mmlsquota shows inconsistent grace period values
PROBLEM SUMMARY:
mmlsquota shows incorrect grace period.
PROBLEM CONCLUSION:
mmlsquota shows inconsistent grace period values. Do not
use the same buffer for two different strings in one print
statement.
------
APAR: IY23025 COMPID: 5765B9501 REL: 330
ABSTRACT: LOOP IN UNMOUNT
PROBLEM DESCRIPTION:
loop in unmount
PROBLEM SUMMARY:
Loop in unmounting a file system under certain conditions
PROBLEM CONCLUSION:
cxiCanUncacheOSNode must only return a vnode pointer if the
count field is zero. Otherwise unmount loops forever calling
unCache.
APAR: IY23026 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT ACQUIRING ALMSERVER MUTEX ALREADY HELD
PROBLEM DESCRIPTION:
Assert acquiring almserver mutex already held
PROBLEM SUMMARY:
GPFS self check logic asserted in the disk allocation manager.
PROBLEM CONCLUSION:
CHECK_CONFIG_CHANGE_RELALM should be calling
releaseServerAlmMutex instead of acquireServerAlmMutex.
------
APAR: IY23028 COMPID: 5765B9501 REL: 330
ABSTRACT: FORCED UNMOUNTED FS WILL NOT REMOUNT
PROBLEM DESCRIPTION:
forced unmounted fs will not remount
PROBLEM SUMMARY:
A file system was unmounted by the system due to I/O errors
while processing a mapped file. The unmount did not fully
complete, which prevented a remount without shutting down
GPFS on that node.
PROBLEM CONCLUSION:
Allow processing of mapped page buffers as long as the SG is
still available, so that a forced unmount can get all the
dirty page buffers flushed during the initial sync phase of
the unmount.
------
APAR: IY23030 COMPID: 5765B9501 REL: 320
ABSTRACT: NODE LEFT IN FAILED STATE AFTER RECOVERY
PROBLEM DESCRIPTION:
mmfsadm dump cfgmgr will show a node with state "fail" on some
nodes while it is either "down" or "up" on the other nodes. This
will block the socket communication to that node causing hung
threads if revokes are needed.
PROBLEM SUMMARY:
During a node failure, a second failing node did not
complete recovery.
PROBLEM CONCLUSION:
Serialize the joining of the single-phase group
and the N-phase group. This will prevent an observing node
from seeing a leave event from the single-phase group
after it has already seen leave/recovery/down/join phases
on the N-phase group, and assuming it must be a new failure.
------
APAR: IY23034 COMPID: 5765D5100 REL: 320
ABSTRACT: HACWS NODE_UP_COMPLETE.POST_EVENT LOOPS FOREVER
PROBLEM DESCRIPTION:
ssp.hacws.usr.3.1.1, node_up_complete.post_event loops forever
if configured with hacmp/es.
node_up_complete.post_event waits for the hardcoded path
/usr/sbin/cluster/.telinit,
but when HAES is configured, /etc/inittab contains:
clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit
LOCAL FIX:
change node_up_complete.post_event manually from
/usr/sbin/cluster/.telinit
to
/usr/es/sbin/cluster/.telinit
PROBLEM SUMMARY:
In an HACMP/ES environment, the clinit entry in /etc/inittab
touches /usr/es/sbin/cluster/.telinit.
However, the node_up_complete.post_event file
searches for /usr/sbin/cluster/.telinit, so it keeps
looping for over 5 minutes looking for that file.
The node_up_complete.post_event file needed to be
modified to search for /usr/es/sbin/cluster/.telinit
instead of /usr/sbin/cluster/.telinit.
PROBLEM CONCLUSION:
The node_up_complete.post_event script was changed so that,
in an HACMP/ES environment, it looks for the
/usr/es/sbin/cluster/.telinit file instead of
/usr/sbin/cluster/.telinit.
------
APAR: IY23107 COMPID: 5765D5100 REL: 330
ABSTRACT: PERSPECTIVES ERROR MSGS WITH V2.1 LIBDCE.A IN NON-DCE AUTHEN ENV
PROBLEM DESCRIPTION:
PERSPECTIVES ERROR MSGS WITH V2.1 LIBDCE.A IN NON-DCE AUTHEN Env
PROBLEM SUMMARY:
Issuing sphardware on a CWS in a non-DCE environment will
result in error messages being issued if a level of DCE
prior to 3.1 exists on the system. The following messages
will be issued several times:
exec(): 0509-036 Cannot load program spsec_ldmod because
of the following errors:
0509-022 Cannot load module
/usr/lpp/ssp/bin/spsec_ldmod.
0509-150 Dependent module libdcepthreads.a
(dcepthreads_shr.o) could not be loaded.
0509-022 Cannot load module libdcepthreads.a
(dcepthreads_shr.o).
0509-026 System error: A file or directory in the
path name does not exist.
0509-022 Cannot load module /usr/lpp/ssp/bin.
0509-150 Dependent module /usr/lpp/ssp/bin could
not be loaded.
The routines were trying to access a module that does
not exist in /usr/lib/libdce.a in the earlier version
of DCE.
PROBLEM CONCLUSION:
Modified code in perspectives and Event Management to
first verify that DCE is being used on the system,
prior to attempting to load the DCE libraries.
This will allow sphardware to be run in a non-DCE
environment when a level of DCE prior to 3.1 exists
on the system.
APAR IY23107 only provides a partial solution to this
problem. For a complete solution APAR IY22203, available in
rsct.clients.rte 1.2.1.1 or greater, must also be installed.
------
APAR: IY23162 COMPID: 5765E5400 REL: 440
ABSTRACT: ONLY ONE SWAP_ADAPTER WHEN TWO SERVICE ADAPTERS FAIL - HAES
PROBLEM DESCRIPTION:
The customer was using 4-port ethernet adapters and was testing
what would happen if one of these adapter cards failed
completely. With 2 service adapters, one for each of two
different networks, configured on the same 4-port adapter, the
customer pulled both service cables and only one swap_adapter
event occurred followed by its fail_standby event. No event
occurred for the second network.
PROBLEM SUMMARY:
The customer was using 4-port ethernet adapters and was testing
what would happen if one of these adapter cards failed
completely. With 2 service adapters, one for each of two
different networks, configured on the same 4-port adapter, the
customer pulled both service cables and only one swap_adapter
event occurred followed by its fail_standby event. No event
occurred for the second network.
PROBLEM CONCLUSION:
The problem was that the second swap_adapter event on the
queue was being found to be the same one that had just
completed, because there was no test for the network being
the same. This was corrected in the evque.C routine.
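A C sketch of the corrected duplicate test, assuming a hypothetical event record with name, node and network fields (the actual evque.C types are not shown in this APAR). With the network comparison in place, the completed swap_adapter event on one network no longer matches the pending swap_adapter event on the other.

```c
#include <string.h>

/* Hypothetical queued event. */
struct cluster_event {
    const char *name;     /* e.g. "swap_adapter" */
    const char *node;
    const char *network;
};

/* Two events are the same only if name, node and network all match. */
int same_event(const struct cluster_event *a, const struct cluster_event *b)
{
    return strcmp(a->name, b->name) == 0 &&
           strcmp(a->node, b->node) == 0 &&
           strcmp(a->network, b->network) == 0;  /* the missing test */
}

/* Drop queued copies of the event that just completed; returns new length. */
int drop_completed(struct cluster_event *q, int n,
                   const struct cluster_event *done)
{
    int k = 0;
    for (int i = 0; i < n; i++)
        if (!same_event(&q[i], done))
            q[k++] = q[i];
    return k;
}
```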
------
APAR: IY23184 COMPID: 5765D5100 REL: 320
ABSTRACT: COLONY CUTOVER
PROBLEM DESCRIPTION:
colony cutover
PROBLEM SUMMARY:
We restructured a few headers to aid future serviceability.
------
APAR: IY23200 COMPID: 5765E7200 REL: 310
ABSTRACT: OPLOCKS ALLOW DATA CORRUPTION BETWEEN 95 AND NT EXCEL USERS.
PROBLEM DESCRIPTION:
If oplockfiles = yes, the Fast Connect server allows multiple
users to modify a shared MS Office file (Excel, PowerPoint),
causing data corruption.
PROBLEM CONCLUSION:
Allow only one user to have full access to the file;
other users get read-only access.
------
APAR: IY23217 COMPID: 5765E7200 REL: 310
ABSTRACT: SELECT FILE PROPERTIES CHANGES DATE TO 2497
PROBLEM DESCRIPTION:
Timestamp is wrong
PROBLEM CONCLUSION:
Check whether all the timestamp bits are set.
------
APAR: IY23248 COMPID: 5765D5100 REL: 320
ABSTRACT: USE IBM,7010-S90 FOR CONDORM 6M1
PROBLEM DESCRIPTION:
use ibm, 7010-s90 for condorM 6M1
PROBLEM SUMMARY:
The CondorM 6M1 needs to be added to the spgetdesc command
as a valid node type with the description value of
IBM,7026-6M1.
PROBLEM CONCLUSION:
spgetdesc is usually run by rc.sp on the node during
installation. This command uses the output of uname -M to
look up the proper name of the node, in this case,
IBM,7026-6M1.
------
APAR: IY23253 COMPID: 5765E5100 REL: 600
ABSTRACT: CRASH IN SNA_V5ROUTER: MEMORY THAT PSE ACCESSED HAD BEEN FREED
PROBLEM DESCRIPTION:
Crash in sna_v5router
LOCAL FIX:
Code update to be provided by development to address problem in
freed memory.
PROBLEM SUMMARY:
Crash in putq indirectly from vpr_stream_output_msg.
PROBLEM CONCLUSION:
Use locking correctly to prevent a stream from being closed
while a message is being routed to it.
------
APAR: IY23257 COMPID: 5765D5100 REL: 320
ABSTRACT: 128WAY COLONY:CABLE_TEST DOES NOT RESTART FSD
PROBLEM DESCRIPTION:
128way colony: cable_test does not restart fsd
PROBLEM SUMMARY:
The cable_test tool would not be able to issue a 'dsh'
command to any of its nodes, if the hostname on the nodes is
set to the 'ml' interface but the 'reliable hostname' is
still set to the SPlan interface. This is fixed by setting
the HN_METHOD variable to reliable, thus forcing dsh to
use the reliable_hostname.
PROBLEM CONCLUSION:
The cable_test tool was changed to set the HN_METHOD
variable. This will cause the tool to use the reliable
hostnames stored in the SDR.
------
APAR: IY23258 COMPID: 5765D5100 REL: 320
ABSTRACT: 128WAY COLONY:CABLE_TEST COMPLETES, GENERATES ERROR MSGS
PROBLEM DESCRIPTION:
128way colony:cable_test completes, generates error msgs
PROBLEM SUMMARY:
The cable_test tool was causing the following message to
appear on the console when it executed:
mknod: /tmp/pipe5.78662: Do not specify an existing file
With this change, the message no longer appears.
PROBLEM CONCLUSION:
The cable_test tool was modified to remove 'old' pipe files
before continuing, thus avoiding the mknod problem.
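The cable_test tool is a script, but the same remove-then-create pattern can be sketched in C with the POSIX unlink() and mkfifo() calls. make_fresh_fifo() is a hypothetical helper; a missing file (ENOENT) is treated as the normal case.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <errno.h>

/* Hypothetical helper: create a FIFO at path, first removing any stale
   file left behind by an earlier run. Returns 0 on success, -1 on error. */
int make_fresh_fifo(const char *path)
{
    if (unlink(path) != 0 && errno != ENOENT)
        return -1;              /* a stale file exists but cannot be removed */
    return mkfifo(path, 0600);  /* no "existing file" failure after the unlink */
}
```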
------
APAR: IY23260 COMPID: 5765B9501 REL: 320
ABSTRACT: MMCHMGR RESULTS IN LONG WAITERS-WAITING FOR SG MGR MIGRATE
PROBLEM DESCRIPTION:
mmchmgr results in long waiters-waiting for sg mgr migrate
PROBLEM SUMMARY:
Deadlock in GPFS when running mmchmgr with a quota command
active.
PROBLEM CONCLUSION:
If a quota file token is lost to another node, migrating
the FS mgr to another node will hang after it has
blocked TM activity and then tries to flush the quota file
which needs to get the token back. Sync/End the quota
manager before blocking TM activity.
------
APAR: IY23262 COMPID: 5765B9501 REL: 320
ABSTRACT: VNOP_LOOKUP WITH VATTR ERROR RETURNS RC=0
PROBLEM DESCRIPTION:
If lookup succeeds but getattr fails to fill in vattr struct,
return error code instead of just setting va_flags=VA_NOTAVAIL.
PROBLEM SUMMARY:
An NFS server node panicked while NFS-exporting a GPFS file
system.
PROBLEM CONCLUSION:
VNOP_LOOKUP with vattr was getting return code 0
and NFS assumed that the vattr structure was filled in
without looking at va_flags. If lookup succeeds
but getattr fails to fill in vattr struct,
return error code instead of just setting
va_flags=VA_NOTAVAIL.
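A C sketch of the two behaviors, assuming a hypothetical struct vattr_sketch and getattr_sketch(); the numeric value given to VA_NOTAVAIL is made up, and only the flag name comes from this APAR. The old version returns 0 and relies on the caller checking va_flags, which NFS did not do.

```c
#include <errno.h>

#define VA_NOTAVAIL 0x1  /* hypothetical value; only the name is from the APAR */

struct vattr_sketch {    /* hypothetical stand-in for the real vattr */
    int  va_flags;
    long va_size;
};

/* Hypothetical getattr that fails on request. */
static int getattr_sketch(int fail, struct vattr_sketch *va)
{
    if (fail)
        return EIO;
    va->va_flags = 0;
    va->va_size = 4096;
    return 0;
}

/* Old behavior: report success and only flag the attributes as missing. */
int lookup_old(int getattr_fails, struct vattr_sketch *va)
{
    if (getattr_sketch(getattr_fails, va) != 0)
        va->va_flags = VA_NOTAVAIL;  /* a caller ignoring va_flags sees rc 0 */
    return 0;
}

/* Fixed behavior: propagate the getattr error to the caller. */
int lookup_fixed(int getattr_fails, struct vattr_sketch *va)
{
    return getattr_sketch(getattr_fails, va);  /* nonzero: do not use *va */
}
```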
------
APAR: IY23290 COMPID: 5765E5400 REL: 440
ABSTRACT: NODE_UP_REMOTE NEEDS TO STOP APPS IN REVERSE ORDER - HAS,HAES
PROBLEM DESCRIPTION:
A customer noticed that during a node_up_remote event, a node
holding resources for the remote node that was coming up
stopped the applications in listed order, rather than in reverse
order. This caused a 'dependency' problem for applications
that depended on the first-listed applications being up in order
to stop properly: once the first-listed applications were
stopped, the dependent applications failed to halt properly.
PROBLEM SUMMARY:
The customer noticed that when the failed node returned to the
cluster, the node that had taken over its application resources
stopped the applications in listed order. This caused a
'dependency' problem with applications that depended on the
first-listed applications being active so that they could
halt properly.
That is, applications started later in the application list
depended on the applications started earlier in the
application list to start, run, and halt properly.
PROBLEM CONCLUSION:
Code in HACMP/HAES 'node_up_remote.sh' event script changed
such that the applications to stop are listed and stopped in
reverse order.
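node_up_remote.sh is a shell script; the ordering rule it now follows can be sketched in C with a hypothetical stop_order() helper that reverses the application list, so that dependents are stopped before the applications they rely on.

```c
#include <stddef.h>

/* Hypothetical helper. Applications are started in list order, so a later
   entry may depend on an earlier one; stopping in reverse order halts
   each dependent before the application it relies on. */
void stop_order(const char *const apps[], size_t n, const char *out[])
{
    for (size_t i = 0; i < n; i++)
        out[i] = apps[n - 1 - i];
}
```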
------
APAR: IY23338 COMPID: 5765D9300 REL: 310
ABSTRACT: CHECK SPELLING OF MP_WAIT_MODE ENVIRONMENT VARIABLE
PROBLEM DESCRIPTION:
Check spelling of MP_WAIT_MODE environment variable
PROBLEM SUMMARY:
To support a no-poll wait option in PSSP V3.3, allow for
MP_WAIT_MODE=NOPOLL spelling in POE V3.1 release.
PROBLEM CONCLUSION:
Change in PE V3.1 will allow a PSSP V3.3 PTF to support a
no-poll wait option.
------
APAR: IY23363 COMPID: 5765E5100 REL: 601
ABSTRACT: CUMULATIVE APAR FOR 6.0.1.1 FOR CS/AIX
PROBLEM DESCRIPTION:
Cumulative apar for 6.0.1.1 for CS/AIX.
------
APAR: IY23493 COMPID: 5765E7200 REL: 310
ABSTRACT: CIFS FAILS TO START NETWORK LOGON
PROBLEM DESCRIPTION:
Fast Connect fails to start Network Logon because
PCs take over the domain name.
PROBLEM CONCLUSION:
Force the server to keep this domain name.
------
APAR: IY23496 COMPID: 5765B9501 REL: 330
ABSTRACT: MMCHMGR RESULTS IN LONG WAITERS-WAITING FOR SG MGR MIGRATE
PROBLEM DESCRIPTION:
mmchmgr results in long waiters-waiting for sg mgr migrate
PROBLEM SUMMARY:
Deadlock in GPFS when running mmchmgr with a quota command
active.
PROBLEM CONCLUSION:
If a quota file token is lost to another node, migrating
the FS mgr to another node will hang after it has
blocked TM activity and then tries to flush the quota file
which needs to get the token back. Sync/End the quota
manager before blocking TM activity.
------
APAR: IY23557 COMPID: 5765B9501 REL: 320
ABSTRACT: MULTI NODE MKDIR'S FAILURES
PROBLEM DESCRIPTION:
multi node mkdir's failures
PROBLEM SUMMARY:
running a large number of inserts into the same directory in
parallel from multiple systems causes an incorrect rejection
of a mkdir
PROBLEM CONCLUSION:
Correct locking error in parallel inserts into large
directories.
------
APAR: IY23558 COMPID: 5765B9501 REL: 330
ABSTRACT: MULTI NODE MKDIR'S FAILURES
PROBLEM DESCRIPTION:
multi node mkdir's failures
PROBLEM SUMMARY:
running a large number of inserts into the same directory in
parallel from multiple systems causes an incorrect rejection
of a mkdir
PROBLEM CONCLUSION:
Correct locking error in parallel inserts into large
directories.
------
APAR: IY23610 COMPID: 5765B9501 REL: 320
ABSTRACT: BUFFERDATABLOCKNUM <= OFP -> METADATA.GETLAST
PROBLEM DESCRIPTION:
bufferdatablocknum <= ofp -> metadata.getlast
PROBLEM SUMMARY:
Gpfs self check logic detected an error at bufdesc.C line
4591.
PROBLEM CONCLUSION:
mergeInode cannot release the cacheObj mutex and the
atomVarLock on the inode while it swaps the fileSize
it currently has with the fileSize it reads from disk.
------
APAR: IY23612 COMPID: 5765B9501 REL: 330
ABSTRACT: BUFFERDATABLOCKNUM <= OFP -> METADATA.GETLAST
PROBLEM DESCRIPTION:
bufferdatablocknum <= ofp -> metadata.getlast
PROBLEM SUMMARY:
Gpfs self check logic detected an error at bufdesc.C line
4591.
PROBLEM CONCLUSION:
mergeInode cannot release the cacheObj mutex and the
atomVarLock on the inode while it swaps the fileSize
it currently has with the fileSize it reads from disk.
------
APAR: IY23613 COMPID: 5765B9501 REL: 330
ABSTRACT: VNOP_LOOKUP WITH VATTR ERROR RETURNS RC=0
PROBLEM DESCRIPTION:
If lookup succeeds but getattr fails to fill in vattr struct,
return error code instead of just setting va_flags=VA_NOTAVAIL.
PROBLEM SUMMARY:
An NFS server node panicked while NFS-exporting a GPFS file
system.
PROBLEM CONCLUSION:
VNOP_LOOKUP with vattr was getting return code 0
and NFS assumed that the vattr structure was filled in
without looking at va_flags. If lookup succeeds
but getattr fails to fill in vattr struct,
return error code instead of just setting
va_flags=VA_NOTAVAIL.
------
APAR: IY23614 COMPID: 5765B9501 REL: 330
ABSTRACT: INCORRECT ASSERTION CHECK ON SPARSE FILE
PROBLEM DESCRIPTION:
incorrect assertion check on sparse file
PROBLEM SUMMARY:
Correct an incorrect debug assertion discovered in
development.
PROBLEM CONCLUSION:
Correct an incorrect debug assertion discovered in
development.
------
APAR: IY23706 COMPID: 5765E8500 REL: 200
ABSTRACT: X25NPI CREATES A BAD MESSAGE SITUATION.
PROBLEM DESCRIPTION:
panic in streams freeb()
> t -mk
Skipping first MST
MST STACK TRACE: 0x004a5eb0
(excpt=00000000:00000000:00000000:00000000:00000000) (intpri=0)
IAR: .panic_trap+0 (00012678): tweq r1,r1
LR: . pse:freeb +38 (014c84c0)
34716a44: . pse:freemsg +20 (014bae18)
34716a84: . pse:flushq +1ec (014bb054)
34716ae4: . pse:sth_rput +5b0 (014d0460)
34716b44: . pse:csq_run +23c (014bde38)
34716ba4: . pse:csq_lateral +a4 (014bceac)
34716c04: . pse:putnext +1b4 (014ba05c)
34716c54: . npi:npirput +c0 (01624550)
34716ca4: . pse:csq_run +23c (014bde38)
34716d04: . pse:csq_lateral +a4 (014bceac)
34716d64: . pse:putnext +1b4 (014ba05c)
34716db4: . ldterm:ldtty_rput +248 (01555ec0)
34716e14: . pse:csq_run +23c (014bde38)
34716e74: . pse:csq_turnover +24c (014bd2e4)
34716ee4: . pse:csq_lateral +e0 (014bcee8)
34716f44: . pse:runq_run +c8 (014cd788)
34716fa4: . pse:flip_and_run +38 (014cd8e4)
34716ff4: .low+0 (00000000)
004a5d50: . pse:flip_and_run +18 (014cd8c4)
004a5d90: .i_offlevel+84 (0001c768)
004a5de0: .i_softmod+338 (0001c440)
004a5e70: flih_603_patch+cc (00028a0c)
0x2ff3b400 (excpt=00000000:00000000:00000000:00000000:00000000) (intpri=11)
IAR: .waitproc+74 (000258ec): beq cr1,0x25900
LR: .waitproc+a0 (00025918)
2ff3b388: .procentry+14 (00097630)
2ff3b3c8: .low+0 (00000000)
PROBLEM CONCLUSION:
In npimod.c, change npi_prov_event to return an unsuccessful
status if there is nothing to process, and change npirsrv to
push the message onto the queue rather than using the reque
command.
------
APAR: IY23737 COMPID: 5765E7200 REL: 310
ABSTRACT: ADDITIONAL FIXES TO DCE REGISTRY
PROBLEM DESCRIPTION:
Additional functionality needed to be added to this new feature.
PROBLEM CONCLUSION:
Added additional functionality to this feature.
------
APAR: IY23760 COMPID: 5765D5100 REL: 330
ABSTRACT: VSD ASSERTED IN SNDHDRCMPL DUE TO A KLAPI RETURN CODE INDICATIN
PROBLEM DESCRIPTION:
VSD asserted in SndHdrCmpl because we got a KLAPI return code
saying a request could not be sent when in fact the request
had been sent, the I/O was completed and the response was back
all before KLAPI claimed they could not send the request.
PROBLEM SUMMARY:
In a dual adapter configuration, VSD may assert
in the SndHdrCmpl routine when running
the KLAPI protocol due to timing issues.
PROBLEM CONCLUSION:
KLAPI was returning an error code claiming
it could not send a request, when in
fact the request had been sent, the I/O had completed
and the response was already received. Under these
conditions VSD will set the return code to 0 and
continue processing the request normally.
------
APAR: IY23784 COMPID: 5765B9501 REL: 320
ABSTRACT: DEADLOCK DURING ABORT PREVENTS INTERNALDUMPS
PROBLEM DESCRIPTION:
deadlock during abort prevents internaldumps
PROBLEM SUMMARY:
Potential deadlock in termination of GPFS with heavy mmap
activity.
PROBLEM CONCLUSION:
Deadlock in mmap handling during shutdown if one of the
kprocs is waiting for a mailbox. mmap purge processing
during preclean should not wait for all the kprocs to
finish.
------
APAR: IY23785 COMPID: 5765B9501 REL: 320
ABSTRACT: AFTER TURNING OFF MMAP, KERNEXT NEEDED RELOAD
PROBLEM DESCRIPTION:
after turning off mmap, kernext needed reload
PROBLEM SUMMARY:
Allow certain mmap debug cases without a reboot.
PROBLEM CONCLUSION:
Turn off mmapSupported when the daemon ends, so that it
will be off if the daemon comes back up with mmap disabled.
------
APAR: IY23786 COMPID: 5765B9501 REL: 330
ABSTRACT: DEADLOCK DURING ABORT PREVENTS INTERNALDUMPS
PROBLEM DESCRIPTION:
deadlock during abort prevents internaldumps
PROBLEM SUMMARY:
Potential deadlock in termination of GPFS with heavy mmap
activity.
PROBLEM CONCLUSION:
Deadlock in mmap handling during shutdown if one of the
kprocs is waiting for a mailbox. mmap purge processing
during preclean should not wait for all the kprocs to
finish.
------
APAR: IY23788 COMPID: 5765B9501 REL: 330
ABSTRACT: AFTER TURNING OFF MMAP, KERNEXT NEEDED RELOAD
PROBLEM DESCRIPTION:
after turning off mmap, kernext needed reload
PROBLEM SUMMARY:
Allow certain mmap debug cases without a reboot.
PROBLEM CONCLUSION:
Turn off mmapSupported when the daemon ends, so that it
will be off if the daemon comes back up with mmap disabled.
------
APAR: IY23792 COMPID: 5765E5400 REL: 440
ABSTRACT: RECONFIG_RESOURCES DOES NOT ACTIVATE SWBOOT AFTER DARE
PROBLEM DESCRIPTION:
Attempts to add an HPS network to a running SP cluster fail,
because the necessary boot adapter aliases are not created
during the DARE operation or the reconfig_topology event.
PROBLEM CONCLUSION:
Add code to the reconfig_topology_complete event script that
will initialize the switch and add any required alias labels.
------
APAR: IY23793 COMPID: 5765E5400 REL: 440
ABSTRACT: HAGEO RESTART CLUSTER TOO QUICK LEAVES RSCT CONFUSED HAES
PROBLEM DESCRIPTION:
In a 4 node HAES cluster, cluster services on one node are
stopped (graceful with takeover). The takeover finishes
correctly. Cluster services are restarted on the stopped node,
but none of the cluster nodes respond to the node's request to
join. On the joining node and on its neighbor, the following
error may be recorded in the error log:
LABEL: GS_DOM_NOT_FORM_WA
IDENTIFIER: AA8DB7B3
Type: INFO
Resource Name: grpsvcs
Description: Group Services daemon has not been established.
If left to run, this error will be logged once every 2 hours.
No errors are written to hacmp.out, and nothing is written to
clstrmgr.debug on any of the nodes.
PROBLEM CONCLUSION:
If cluster services are restarted too quickly after stopping,
the rsct daemons may not properly recognize the node has gone
down and come back. This problem is aggravated by using HAGEO
where the default timeout values for the GEO networks are
greater than 60 seconds.
The solution is to update the check in rc.cluster (which
prevents one from restarting cluster services too quickly) to
account for the network tunables.
------
APAR: IY23794 COMPID: 5765E5400 REL: 440
ABSTRACT: FIX VARIABLE EXPANSION IN NAMESERVER.VERIFY HAES
PROBLEM DESCRIPTION:
The DNS plugin does not start correctly, causing cluster
verification and cluster synchronization to fail. The
following errors are logged in /tmp/hacmp.out:
"file db.* does not exist or is zero length", even though the
file is present;
the group id is incorrect, even though it is correct.
PROBLEM CONCLUSION:
Change nameserver.verify.sh to use the correct value when
comparing group id.
Change nameserver.verify.sh to change to the correct
directory before checking for existence of files.
Change the comparison in nameserver.stop.sh and
nameserver.cleanup.sh from (($? = 0)) to (($? == 0)).
------
APAR: IY23795 COMPID: 5765E5400 REL: 440
ABSTRACT: CHANGE / SHOW TOPOLOGY AND GROUP SERVICES CONFIGURATION HAES
PROBLEM DESCRIPTION:
The following options can still be found on the
Change / Show Topology and Group Services Configuration panel
* Interval between Heartbeats (seconds) 1
* Fibrillate Count 4
These two entries are no longer used by topology services and
should be removed from smit.
PROBLEM CONCLUSION:
Remove the unused smit panel options.
------
APAR: IY23796 COMPID: 5765E5400 REL: 440
ABSTRACT: EXTRANEOUS HPS NETWORK EVENTS AFTER MIGRATION HAES
PROBLEM DESCRIPTION:
During migration installation from a previous level of HAES,
the customer will see incorrect network
events for the sp switch - the network is up and functional
but hacmp will generate network down events.
PROBLEM CONCLUSION:
Eliminate an out of date test in one of the cluster utilities.
------
APAR: IY23797 COMPID: 5765E5400 REL: 440
ABSTRACT: ARP USAGE INFO IN HACMP.OUT HACMP-HAES
PROBLEM DESCRIPTION:
Usage information for the ARP command is appearing in hacmp.out
when arp commands are run by the swap_adapter event. This is
caused by a change in output of the arp command between aix 4
and aix 5.
PROBLEM CONCLUSION:
Add additional parsing of the arp output to filter the
additional information in the aix 5 output.
------
APAR: IY23799 COMPID: 5765E5400 REL: 440
ABSTRACT: SYNCHRONIZING CLUSTER TOPOLOGY SHOULD NOT MAKE THE HANDLE
PROBLEM DESCRIPTION:
When upgrading from HAS 4.2.2 to HAS 4.3.1, each node has a
unique handle, but if you synchronize topology they all become
equal.
PROBLEM SUMMARY:
Migration to HAES will not complete and topology services will
loop with the following error message,
errorCb: Sredrive already scheduled, return
PROBLEM CONCLUSION:
Make the HACMPcluster ODM handle value unique when cluster
topology is synchronized.
------
APAR: IY23800 COMPID: 5765E5400 REL: 440
ABSTRACT: CLHARVEST_VG: PROBLEM ACCESSING THE CONFIGURATION FILE HAES
PROBLEM DESCRIPTION:
clharvest_vg: Problem accessing the configuration file.
PROBLEM CONCLUSION:
Fix the package name so that the /usr/sbin/cluster/etc/config
directory is added.
------
APAR: IY23802 COMPID: 5765E5400 REL: 440
ABSTRACT: CLCHPARAM: NODE NAME VERBOSE_LOGGING=HIGH DOES NOT EXIST
PROBLEM DESCRIPTION:
clchparam: Node Name VERBOSE_LOGGING=high does not exist.
PROBLEM CONCLUSION:
The object = "DEBUG_LEVEL" does not exist in the HAS HACMPnode
ODM so a : should not be returned.
------
APAR: IY23805 COMPID: 5765E8200 REL: 230
ABSTRACT: UCFGGMD CAUSES SYSTEM CRASH, DATA STORAGE INTERRUPT
PROBLEM DESCRIPTION:
During HAGEO graceful down with takeover of one node in a 4
node cluster with active remote GMD I/O, ucfggmd is run (from
geo_stop_gmds) which causes the system to crash with a data
storage interrupt.
PROBLEM CONCLUSION:
A programming error in the krpc kernel extension was fixed.
------
APAR: IY23806 COMPID: 5765E8200 REL: 230
ABSTRACT: HAGEO UTILITIES OPTION ON GEORM MAIN PANEL GEORM
PROBLEM DESCRIPTION:
HAGEO messages still appear in the geoRM smit panels.
PROBLEM CONCLUSION:
Replace the occurrences of HAGEO with geoRM.
------
APAR: IY23807 COMPID: 5765E8200 REL: 230
ABSTRACT: GEORM: SMIT ADD NODE SCREEN SITE NAME IN WRONG PLACE GEORM
PROBLEM DESCRIPTION:
When adding a node via the SMIT Add Node screen
(new_node.diaglog path) two fields are displayed in the
incorrect order in relation to the values at the right -
Site Name and Promote Failure Timeout. The command is
built and executes properly.
PROBLEM CONCLUSION:
The entries in the catalog need to be reversed.
------
APAR: IY23808 COMPID: 5765E7200 REL: 310
ABSTRACT: FILES WITH .BMP CANNOT BE OPENED FROM CLIENTS
PROBLEM DESCRIPTION:
Editing a file with a .bmp extension from a client fails.
PROBLEM CONCLUSION:
Enable sharing for .bmp files.
------
APAR: IY23816 COMPID: 5765B8100 REL: 220
ABSTRACT: 12 SECOND SILENCE ON LINE CAUSES HANGUP ON FRENCH ISDN SYSTEMS
PROBLEM DESCRIPTION:
After short utterance spoken by caller for recognition, he gets
12 second silence, then platform hangs up. This is on a French
ISDN system.
PROBLEM SUMMARY:
After the caller's utterance, the caller gets 12 seconds of
silence, then the platform hangs up. This is the case on French
ISDN systems.
PROBLEM CONCLUSION:
Correct handling when dealing with languages other than En_US.
------
APAR: IY23833 COMPID: 5765E8200 REL: 230
ABSTRACT: GEO_REMOTE_PEER_DOWN 258 : /GEO_SHOW_CONFIG: NOT FOUND HAGEO
PROBLEM DESCRIPTION:
Errors occur during cluster event processing, and the
/tmp/hacmp.out file contains:
Geo_remote_peer_down 258 : /geo_show_config: not found
PROBLEM CONCLUSION:
Modified the Geo_remote_peer_down script to correct a
programming error.
------
APAR: IY23834 COMPID: 5765E8200 REL: 230
ABSTRACT: CHANGE START_SERVER TO RUN CFGGMD IN PARALLEL HAGEO
PROBLEM DESCRIPTION:
The Geo_start_server scripts together with other scripts
used by HACMP to start GeoMirror devices configure the
GMDs serially. Although the startgmd command was modified
to work in parallel, the scripts do not take advantage of
this to run cfggmd in parallel.
PROBLEM CONCLUSION:
Replace the call to cfggmd in the scripts with a section that
builds a command line, then call the cfggmd command with this
command line to start the GMDs in parallel.
------
APAR: IY23837 COMPID: 5765E8200 REL: 230
ABSTRACT: GEO_VERIFY REQUIRES IP LABEL SAME AS HOSTNAME/NODENAME
PROBLEM DESCRIPTION:
If the user selects node names that are not the
same as the hostnames of the machines, geo_verify
fails because it cannot get the size of the remote devices.
PROBLEM CONCLUSION:
Change rpc.geod so that it uses an interface address, rather
than the hostname itself, to reach the remote host.
------
APAR: IY23858 COMPID: 5765E8500 REL: 200
ABSTRACT: INVALID I_CLEAR() IN TWDINIT
PROBLEM DESCRIPTION:
PMR 23588,235,631
MST STACK TRACE: 0x2ff3b400
(excpt=00000000:42000000:60014468:20005030:00000106)
(intpri=11)
IAR: .i_clear+c4 (0001d35c): tweqi r3,0x0
LR: .i_clear+48 (0001d2e0) 2ff3b070:
. twd:twdinit +510 (0163ecc0) 2ff3b2d0: .config_dd+2c0
(001bb37c) 2ff3b370: .sysconfig+17c (001bb95c)
2ff3b3c0: .sys_call_ret+0 (00003a90) 00000001: .low+0
(00000000)
PROBLEM SUMMARY:
Assert in i_clear()
MST STACK TRACE: 0x2ff3b400
(excpt=00000000:42000000:60014468:20005030:00000106)
(intpri=11)
IAR: .i_clear+c4 (0001d35c): tweqi r3,0x0
LR: .i_clear+48 (0001d2e0) 2ff3b070:
. twd:twdinit +510 (0163ecc0) 2ff3b2d0: .config_dd+2c0
(001bb37c) 2ff3b370: .sysconfig+17c (001bb95c)
2ff3b3c0: .sys_call_ret+0 (00003a90) 00000001: .low+0
(00000000)
PROBLEM CONCLUSION:
Remove the call i_clear(&brd_ictx[dds.brd_num].sintr);
------
APAR: IY23859 COMPID: 5765E7200 REL: 310
ABSTRACT: CHANGE GROUP TYPE OF DOMAIN NAME.
PROBLEM DESCRIPTION:
Network logon support cannot be started on the FastConnect
server; a PC claims to have the domain name.
PROBLEM CONCLUSION:
Register the domain name as a group name type instead of a
unique name type.
------
APAR: IY23869 COMPID: 5765E7200 REL: 310
ABSTRACT: RAS: OUTPUT CIFS-BUILD INFO TO CIFSLOG
PROBLEM DESCRIPTION:
CIFS customers who need service may have to perform extra
data collection to determine the exact CIFS version.
PROBLEM CONCLUSION:
Change the cifsServer Makefile to timestamp every CIFS build,
and write that build time to cifsLog when CIFS starts.
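The fix stamps the build time from the Makefile. A minimal C sketch of the same idea, using the standard compiler macros __DATE__ and __TIME__ instead (a swapped-in technique, not the actual CIFS change); the function name log_build_info and the "CIFS build:" label are hypothetical:

```c
#include <stdio.h>

/* Build time captured by the compiler when this file is compiled.
   __DATE__ is "Mmm dd yyyy" and __TIME__ is "hh:mm:ss" (standard C). */
static const char build_stamp[] = __DATE__ " " __TIME__;

/* Hypothetical startup hook: write the build time to the server log. */
void log_build_info(FILE *log) {
    fprintf(log, "CIFS build: %s\n", build_stamp);
}
```

Because the stamp is fixed at compile time, the logged value identifies the exact binary that is running without any extra data collection.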
------
APAR: IY23875 COMPID: 5765E5100 REL: 600
ABSTRACT: LIMITED-RESOURCE SESSION INDICATOR IS SET IN NLP.
PROBLEM DESCRIPTION:
The limited-resource session indicator (BIND offset 25) is
unexpectedly set in the NLP.
PROBLEM SUMMARY:
HPR sessions unexpectedly set as limited-resource.
PROBLEM CONCLUSION:
Code changed so that the HPR layer no longer always changes
the BIND to limited resource.
------
APAR: IY23927 COMPID: 5765E8500 REL: 200
ABSTRACT: MCA ARTIC960 PORTS GET INTO THE DEFINE STATE AFTER REBOOT
PROBLEM DESCRIPTION:
Cannot create ports for ARTIC960 MCA adapter after upgrading
to sx25.rte 1.1.5.16.
PROBLEM CONCLUSION:
This recovery procedure was intended for the ARTIC960Hx. It was
implemented in the MCA adapter as a possible recovery. The
procedure will be removed from the MCA microcode.
------
APAR: IY23961 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT SUBROUTINE FAILED: WASASSIGNED BUFMGR.C, LINE 5285
PROBLEM DESCRIPTION:
unassignBuffer has to wait if the buffer is being unpinned by
some other thread.
PROBLEM SUMMARY:
GPFS self check logic asserted at bufmgr.C line 5285
PROBLEM CONCLUSION:
unassignBuffer has to wait if the buffer is being unpinned
by some other thread.
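A minimal C sketch of "wait while another thread is unpinning" using a POSIX mutex and condition variable. The struct and function names are hypothetical and do not reflect GPFS's actual BufferDesc or bufmgr.C code:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical buffer descriptor; fields are illustrative only. */
struct buffer {
    pthread_mutex_t lock;
    pthread_cond_t  unpinned;  /* signalled when an unpin finishes */
    bool unpinning;            /* another thread is unpinning this buffer */
    bool assigned;
};

void buffer_init(struct buffer *b) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->unpinned, NULL);
    b->unpinning = false;
    b->assigned = true;
}

/* Called by the thread that performs the unpin. */
void buffer_begin_unpin(struct buffer *b) {
    pthread_mutex_lock(&b->lock);
    b->unpinning = true;
    pthread_mutex_unlock(&b->lock);
}

void buffer_end_unpin(struct buffer *b) {
    pthread_mutex_lock(&b->lock);
    b->unpinning = false;
    pthread_cond_broadcast(&b->unpinned);  /* wake any waiting unassigners */
    pthread_mutex_unlock(&b->lock);
}

/* Wait for an in-progress unpin to finish before unassigning,
   instead of asserting that the buffer is already settled. */
void buffer_unassign(struct buffer *b) {
    pthread_mutex_lock(&b->lock);
    while (b->unpinning)  /* loop also guards against spurious wakeups */
        pthread_cond_wait(&b->unpinned, &b->lock);
    b->assigned = false;
    pthread_mutex_unlock(&b->lock);
}
```

In real use buffer_begin_unpin/buffer_end_unpin run on one thread while buffer_unassign blocks on another until the unpin completes.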
------
APAR: IY23966 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT: LOP->GET_OBJ_STATUS() == LKOBJ::VALID
PROBLEM DESCRIPTION:
Assert: loP->get_obj_status() == LkObj::valid
A failure to allocate a new data block in modifyBuffer (e.g.,
due to a quota limit) was leaving the BufferDesc in a half-valid
state, causing an assert when attempting to read the buffer later.
PROBLEM SUMMARY:
GPFS self check logic asserted at: bufmgr.C, line 4081
PROBLEM CONCLUSION:
When looping through buffers to free off the clock list,
reset the err variable each time around the loop, and ignore
recently unpinned buffers.
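A minimal C sketch of why the err variable must be reset on each pass: without the reset, a flush error on one buffer would stop the next, clean buffer from being freed. The struct, the flush_buf helper, and the flush_fails test hook are hypothetical, not GPFS code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer on the clock list. */
struct clock_buf {
    bool recently_unpinned; /* just unpinned by another thread: leave alone */
    bool dirty;             /* must be flushed before it can be freed */
    bool flush_fails;       /* test hook: simulate a flush error */
    bool freed;
};

/* Hypothetical flush: returns 0 on success, -1 on error. */
static int flush_buf(struct clock_buf *b) {
    return b->flush_fails ? -1 : 0;
}

/* Free what can be freed; returns the number of buffers freed. */
size_t free_clock_list(struct clock_buf *bufs, size_t n) {
    size_t freed = 0;
    int err;
    for (size_t i = 0; i < n; i++) {
        err = 0;  /* reset each time: a prior flush error must not carry over */
        if (bufs[i].recently_unpinned)
            continue;  /* ignore recently unpinned buffers */
        if (bufs[i].dirty)
            err = flush_buf(&bufs[i]);
        if (err == 0) {
            bufs[i].freed = true;
            freed++;
        }
    }
    return freed;
}
```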
------
APAR: IY23968 COMPID: 5765B9501 REL: 330
ABSTRACT: ASSERT Q->MEMSTATE == BUFFER::MEMPINNED BUFMGR.C,
PROBLEM DESCRIPTION:
Assert q->memState == Buffer::memPinned bufmgr.C, line
When looping through buffers to free off the clock list,
reset the err variable each time around the loop and ignore
recently unpinned buffers.
PROBLEM SUMMARY:
GPFS self check logic declared an error when
trying to reopen a file where the last write had failed due
to the user quota being exceeded.
PROBLEM CONCLUSION:
A failure to allocate a new data block
in modifyBuffer (e.g., due to a quota limit) was leaving the
BufferDesc in a half-valid state, causing an assert when
attempting to read the buffer later.
------
APAR: IY23981 COMPID: 5765B9501 REL: 320
ABSTRACT: ASSERT Q->MEMSTATE == BUFFER::MEMPINNED BUFMGR.C,
PROBLEM DESCRIPTION:
Assert q->memState == Buffer::memPinned bufmgr.C, line
When looping through buffers to free off the clock list,
reset the err variable each time around the loop and ignore
recently unpinned buffers.
PROBLEM SUMMARY:
GPFS self check logic declared an error when
trying to reopen a file where the last write had failed due
to the user quota being exceeded.
PROBLEM CONCLUSION:
A failure to allocate a new data block
in modifyBuffer (e.g., due to a quota limit) was leaving the
BufferDesc in a half-valid state, causing an assert when
attempting to read the buffer later.
------
APAR: IY24250 COMPID: 5765B8100 REL: 220
ABSTRACT: SYSTEM CRASH DURING DISABLE_CHANNEL
PROBLEM DESCRIPTION:
Disable_Channel causes a system crash from .kwakeup at
.simple_lock+18.
PROBLEM CONCLUSION:
Make closing a channel fd scan through every
channel to find any channels that still reference this
fd, and clean them up.
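A minimal C sketch of the cleanup scan described in this conclusion: on close, every entry in a channel table is checked and any that still references the closed fd is detached. The channel table layout and the function name close_channel_fd are hypothetical:

```c
#include <stddef.h>

/* Hypothetical channel table entry; fd == -1 means "no fd attached". */
struct channel {
    int fd;
    int active;
};

/* On close of fd, detach every channel that still references it so no
   later operation (such as a wakeup) touches stale state.
   Returns the number of channels cleaned up. */
size_t close_channel_fd(struct channel chans[], size_t n, int fd) {
    size_t cleaned = 0;
    for (size_t i = 0; i < n; i++) {
        if (chans[i].fd == fd) {
            chans[i].fd = -1;
            chans[i].active = 0;
            cleaned++;
        }
    }
    return cleaned;
}
```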
------
APAR: IY24428 COMPID: 5765E8200 REL: 230
ABSTRACT: SYSTEM CRASH DURING GMD MKDEV
PROBLEM DESCRIPTION:
When a geomirror device is made Available by mkdev, the system
will frequently crash (888 102 700).
PROBLEM CONCLUSION:
The GMD device driver was not functioning correctly on
AIX 5.1 while configuring a GMD device that uses mwc mode.
The problem has been corrected.
------
APAR: IY24564 COMPID: 5765D5100 REL: 330
ABSTRACT: REQUIRED UPGRADES FOR R3.3.0
PROBLEM DESCRIPTION:
Required upgrades for R3.3.0.
------
APAR: IY24671 COMPID: 5765C3403 REL: 430
ABSTRACT: MEDIA ERRORS NOT PROPERLY REPORTED ON IDE CD-ROM
PROBLEM DESCRIPTION:
When the CD-ROM drive has problems reading the CD-ROM media
(due to scratches, incompatible data format, etc.), the error
is not properly reported to the application. As a
result, the user may unknowingly treat the bad data as good
data from the CD.
PROBLEM CONCLUSION:
The problem is caused by the hardware DMA engine incorrectly
reporting the status. To fix this, the device driver will
look at both the DMA status and the device's (CD-ROM
drive's) status, and report any error encountered.
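A minimal C sketch of the combined check: an error from either source fails the transfer, so a DMA engine that wrongly reports success cannot mask a media error from the drive. The struct and the encoding "0 means no error" are hypothetical assumptions, not the driver's real status registers:

```c
#include <stdint.h>

/* Hypothetical status values; 0 means "no error" for both sources. */
struct xfer_status {
    uint32_t dma_status;     /* as reported by the DMA engine */
    uint32_t device_status;  /* as reported by the CD-ROM drive */
};

/* Nonzero if either the DMA engine or the drive reports an error. */
int xfer_failed(const struct xfer_status *s) {
    return s->dma_status != 0 || s->device_status != 0;
}
```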
------
APAR: IY24926 COMPID: 5765D5100 REL: 320
ABSTRACT: LATEST PSSP 3.2.0 FIXES AS OF OCTOBER 2001
PROBLEM DESCRIPTION:
This is the latest PSSP PTF as of October 2001.
Order this apar to get all of the ptfs as of October 2001.
PROBLEM SUMMARY:
This is a packaging apar for PSSP 3.2.0 fixes
as of October 2001.
PROBLEM CONCLUSION:
This is a packaging apar for PSSP 3.2.0
fixes as of October 2001.
------