From: Yuri K (koroby398 ifrance.com)
Date: Sun Feb 10 2002 - 13:17:12 CST
Hello dreamwvr,
Sunday, February 10, 2002, 16:27:33, you wrote:
ddc> On Sun, Feb 10, 2002 at 09:13:06AM +0100, johansz free.fr wrote:
>> > article URL below) Sun Microsystems is going to incorporate Linux into
>> >
>> > its "high-end" offerings (currently they offer it only on their "low
>> > end" Cobalt appliances).
ddc> What does this have to do with BSD? or more specifically OpenBSD?
ddc> Just curious.
>> A year ago Sun became a Mandrake partner (Linux)
ddc> Must have just happened recently..
>> Sun decided to stop Solaris for Intel (for cost and support reasons)
ddc> Having used Solaris for Intel since its pre-release I can tell
ddc> you there was very little cost for them other than wrapping paper.
ddc> Why? Well, on a scale of 1 to 100, support for their product
ddc> was around a 1. ( If you were lucky. )
>> Solaris IS NOT THE BEST UNIX and patching each week has a cost...
>> and i am working for sun ...
ddc> There is nothing wrong with Solaris from a proprietary NIX point
ddc> of view. They are no better or worse than any other. Actually They
ddc> are much better than some I would like not to mention;-))
ddc> Driver support without story is always good as well.
here is your story, very insightful one:
(and it does cost money, especially when you have morons playing
managers)
From: Bruce Adler (bruce.NxOxSxPxAxMx.adler acm.org)
Subject: Re: Sun will start shipping x86 Linux machines..
Newsgroups: comp.unix.solaris, alt.solaris.x86
Date: 2002-02-09 15:52:40 PST
Well, that's not exactly true. There's a long history behind those
Adaptec and LSI drivers which almost no one outside of Sun knows about
(and the few managers left at Sun who were directly involved are
motivated to lie about what really happened to avoid making themselves
look like fools).
Adaptec and LSI have produced exactly one driver each (cadp160 and
symhisl). All the other Solaris drivers for their supported products
(adp, cadp, ncrs) are either drivers I wrote or are derived directly
from a driver I wrote (glm).
The rumor I heard was that NCR/Symbios/LSI Logic did the x86 symhisl
driver because of all the chips Sun buys for their SPARC products (it
was a one man project that took about 3-4 months and he did such a good
job that there's very little ongoing maintenance cost).
On the other hand, Adaptec initially did the x86 and SPARC versions of
their drivers because they thought they could steal the SPARC business
away from LSI Logic, but once they realized that was never going to happen,
they're now charging (or sharing the costs) with Sun for the x86 cadp160
driver (and told Sun, "take a hike", when Sun asked, "what about a driver
for your Windows-only RAID adapters").
Furthermore, the details of how those two drivers got created are,
I think, somewhat enlightening to understand why Sun's x86 IHV
effort was such an abysmal failure.
Contrary to what anyone thinks, the cadp160 driver is *not* based on
the cadp driver. That would have been too obvious and too simple. It's
also not based on any recent version of the adp driver. Rather, it's
based on (wait for it ...) the obsolete and very buggy 2.4 version
of the adp driver. Those of you that have been using Solaris since
the 2.4 release might remember that back then Solaris x86 didn't have
nearly the stability it has now. It took lots of bug fixes, patches,
and rewrites over the years between 1991 and 1995 to get Solaris
to the point of being as rock-stable as a Solaris SPARC system. The
old version of the adp driver had more than its own share of bugs. In
fact, the adp driver had so many known problems and missing features
that I completely redesigned it for one of the 2.5 DU releases (2.5
DU3, I think). It was that re-write that produced the first highly
stable version of the adp driver.
The cadp driver that's currently shipping is a Sun written driver
that's directly based on my version of Sun's adp driver. But that wasn't
what was supposed to happen. The Sun x86 IHV manager put a huge effort
into getting Adaptec to produce their own cadp driver without any
of Sun's engineering assistance (that was his definition of how
an IHV program should work; since he was in charge of the IHV program
and since he had less than zero engineering ability, he would have to
share the credit with others if Sun's IHV program involved any sort of
joint engineering effort between Sun and the IHV; he defined success
by how many Sun engineer jobs he could eliminate).
I think that his intent was that any questions from x86 IHVs could be
handled via Sun's regular support channels since, "that's what they're
there for" (and I think more importantly, because any such problems
would come out of someone else's budget and therefore he could claim
that he could get third-party drivers written that much more cheaply by
the IHVs than by writing them internally). Because he didn't consult any
experienced Sun device driver engineers while working with Adaptec,
Adaptec ended up with a lot of bad information and the wrong version of
Sun's adp driver as a starting point (as near as I was able to determine
long after the fact, what happened is that at the beginning of the cadp
project Adaptec told Sun's IHV manager that they already had the adp
source code, and even though Sun's IHV manager was aware that adp had
recently been completely re-designed, he never even considered asking
Adaptec what version of the adp source code they had).
The Adaptec cadp driver was supposed to be one of the first of a line
of x86 IHV drivers that (the first incarnation of) Sun's x86 IHV program
would produce (Sun has resurrected and "re-tooled" their IHV program
multiple times over the past few years but it's basically always been
the same and has always been a non-starter because it's always
been run by the same set of managers who are pigeon-holed doing
miscellaneous x86-related management tasks (i.e., they plateau-ed
years ago and are stuck off in an obscure branch of the Sun
management hierarchy)).
I strongly suspect that the primary reason Adaptec originally got involved
in the x86 IHV project was that Adaptec had much larger goals than an x86
driver. They weren't interested in simply adding another OS driver to
the floppies inside of their 2940 boxes. What they really wanted was to
sell a bunch of hardware to Sun to use on their SPARC boxes (Adaptec,
like Sun, gives the software away for free and makes their profits on
hardware sales). Initially they wanted Sun to buy their Ultra2 SCSI HBA
chips and eventually they wanted Sun to buy their Fibre Channel chips.
Towards that end, Adaptec delivered both an x86 version of the cadp
driver and a SPARC version of the cadp driver for Adaptec's then
shipping Ultra2 SCSI HBAs.
Adaptec figured that since the cadp driver for both platforms was compiled
from common source code, it was to their benefit to accept Sun's x86
IHV assistance in the hopes that would give them an advantage in their real
goal of selling Adaptec SCSI chips to Sun's SPARC hardware engineers. I
think Adaptec (wrongly) assumed that the Sun manager that ran the x86 IHV
program had some influence over (or even knew) the Sun managers who made
hardware procurement decisions (either that or the x86 IHV manager simply
lied to Adaptec, or made promises he couldn't possibly fulfill). On
the contrary, in fact when some of the SPARC-bigots at Sun reviewed
Adaptec's cadp source code, they saw some of the Solaris x86-specific
vestiges and made rude comments on the level of, "it's never going to be
as good as glm" (either because they didn't realize that a sub-human x86
programmer (me) had written the original version of their precious SPARC
glm driver, or because they didn't realize I would eventually indirectly
get copies of their emails).
Unfortunately neither the x86 nor the SPARC version of Adaptec's cadp driver
worked correctly when delivered. And because they were delivered to
separate departments within Sun, there were two separate groups of
engineers within Sun who independently told their managers that the
Adaptec cadp driver had so many flaws that it might never work correctly
without a total re-write (if I wasn't so lazy, I could probably still
find in one of my email archives the email I wrote that contains my
estimates for both options). My recollection is that about the only
difference in the two evaluations was that the SPARC engineers didn't know
that the reason Adaptec's driver was so flawed was that Adaptec was given
the wrong version of the driver (the 2.4 x86 adp driver) as a starting
point.
Because the x86 group couldn't just ignore not having a working driver
for the latest Adaptec Ultra2 SCSI products, I and a couple of other
engineers had to almost completely re-write the cadp driver. The re-write
used the latest and greatest 2.6 adp driver as a starting point. Actually
it wasn't a total re-write, it was kind of a piece-wise merge (using the
GUI merge tool in Workshop) because there was still a lot of commonality
between the obsolete 2.4 version of adp and the 2.6 version of adp. (If
Adaptec had been given the 2.5 or 2.6 version of adp as their starting
point, their version of cadp probably would have had less than half
the flaws it had, and I would have recommended fixing Adaptec's cadp
driver rather than a total re-write).
On the SPARC side of the house, they concluded that not only did Adaptec's
cadp driver just not work correctly, but that even if it could be made to
work correctly, they already had a driver (glm) that delivered much
better performance using a chip that cost a lot less.
In other words, on the SPARC platform an HBA based on the Adaptec chip
would always be less efficient than the already existing (and already
working) NCR/Symbios/LSI Logic based HBAs. Of course, the SPARC engineers
could easily have done that same analysis before Adaptec went to all the
trouble of writing a cadp driver, but you can't expect an x86 IHV manager
with zero engineering talent to know something like that's even possible,
or expect him to even think to ask whether there might be design
limitations that might make Adaptec's chip a poor choice for a SPARC box.
Therefore, contrary to Adaptec's hopes, there was zero chance that
Sun would replace their glm-based HBAs with a cadp-based HBA. And given
that Adaptec had nearly no sales of their earlier AHA-2940 w/Open Firmware
boards, Sun wasn't very concerned about not having a working SPARC cadp
driver for Adaptec's new Ultra2 HBA products. So Sun's SPARC people
basically told Adaptec, "thanks, but no thanks". Sun wasn't going to buy
any of Adaptec's Ultra2 chips. But they also made it clear to Adaptec
that they weren't going to stand in Adaptec's way if Adaptec wanted to
produce their own SPARC-compatible Ultra2 HBA using their own resources.
Adaptec obviously lost interest in all SPARC HBA products. Adaptec never
did a Solaris 7 64-bit SPARC version of their short-lived SPARC adp
driver, and there never was any public release of Adaptec's SPARC cadp
driver.
Without either a working x86 or SPARC version of Adaptec's cadp
driver, I and several other x86 engineers had to scramble to produce
a Sun version of the x86 cadp driver. Up until that point there
was almost no involvement by any Sun engineers in the cadp project.
It had been a one man show by Sun's x86 IHV manager. Yet when the dust
finally settled he managed to come out of it unscathed and blamed
the delays and added costs on the engineers rather than the flawed
process that produced a worthless cadp driver and wasted months of
precious calendar time. The same engineers that delivered the bad news
about the Adaptec version of the cadp driver also got stuck with
producing the replacement cadp driver (with very little help from a
disappointed Adaptec). Amazingly, those engineers got blamed by the x86
IHV manager both for goring his sacred-cow, and then got accused by him
of being incompetent when the re-written driver wasn't done on time due
to hidden bugs in Adaptec's firmware (which Adaptec never released
the source code for and took months to debug and fix).
In the two year period between Solaris 7 and Solaris 8, Adaptec
transformed their cadp driver (still based on the obsolete 2.4
version of adp) into the cadp160 driver. Of course, they stuck
with their original version of cadp and never adopted the Sun
rewritten version of cadp even though that's the only version that
ever worked correctly. During that same period Sun whittled away at
their own internal x86 driver group and x86 driver development budgets
so that there was zero possibility that they could even consider
updating their own cadp driver to support Adaptec's Ultra160 SCSI HBAs.
Therefore, a process of elimination left Adaptec's cadp160
driver as the only candidate regardless of how bad its quality was (i.e.,
it was a Hobson's Choice). AFAIK, Adaptec's cadp160 driver had (and
probably still has) a shitwad of bugs that were long ago eliminated from
the adp and cadp drivers (AFAIK only cadp160 bug reports get forwarded
to Adaptec; Adaptec never sees any of the current or prior adp and cadp
bug reports). The cadp160 driver is also still missing most of the
features that were added to the redesigned adp driver to make it more
compatible with all the SPARC HBA drivers (compare the cadp(7) and
cadp160(7) man pages to see at least some of the externally visible
deficiencies in the cadp160 driver).
The obvious question one would ask is why the hell would Sun do something
like that (i.e., give their IHV an obsolete driver on which to base a new
driver and then cut-off their access to any sort of engineering support)?
In my opinion, the answer to that question is that the managers who ran
Sun's x86 IHV program were talentless grunts who were more concerned with
shifting the responsibility for driver development off of their own backs
than they were concerned for producing a viable and quality product.
I think they reasoned that the only way they were ever going to
collect their full year-end bonuses (and get that big promotion to
something unrelated to x86 drivers) was to redefine their jobs so
that they wouldn't be held responsible no matter how bad the quality
of the new IHV drivers. When they first came up with the "shift all
new driver development to the IHVs" idea, they had already made a
very expensive and very unproductive mess of Sun's internal driver
development projects. But because they were the number one reason for that
mess, and because they had no engineering experience, I think even they
must have at least subconsciously realized that they couldn't
possibly straighten out their own mess. Their only alternative was to
stop doing x86 driver development internally (and to do it before
Sun's upper management noticed there was a major problem and brought in
an outside manager like they had to do back in 1993, which was the first
time the x86 driver development projects got totally out of control).
I think those managers figured that if they could just shift all the x86
driver development risks and responsibilities to the x86 IHVs (that they
swore they could easily line up) then they could get rid of all of their
own engineers. In other words, since they had zero engineering experience
and couldn't write a device driver if their lives depended on it, they
decided the problem was the expensive x86 programmers on their own staffs.
Their solution to that problem was to get rid of all of their staff except
themselves and end up with cushy jobs schmoozing the IHVs with a Sun expense account.
They'd have no responsibility, no engineering projects to screw up, and
they would still collect the same pay as before.
Of course, nobody would buy such a proposal if you told them that, if they
would just straighten out their own mess, that it would take no more effort
to develop x86 drivers internally than it's going to take to provide real
technical support to an active x86 IHV. And anybody with half a brain could
recognize that there are certain key x86 drivers that Sun can not simply
ignore the quality of. Therefore there's very little cost saving on what
was then the most expensive phase of a Sun device driver project. Sun would
have to, prior to every release, continue to fully QA test most of the x86
IHV drivers they lined up. And of course most of the release engineering
and packaging costs of drivers that are included on the installation
CDROM would continue to be paid by Sun.
It doesn't take rocket science to realize that over the lifetime of a
device driver, the initial development costs of that driver (if it's
done correctly) are a rather small fraction of the total lifetime costs
of that driver. So I've kind of always assumed that there was never any
sort of real cost-benefit analysis done for any of the x86 IHV projects
that Sun undertook. It was more likely, this is the only thing we know
how to do and/or we don't like doing x86 device drivers so what can we
do instead?
The story behind the Symbios/LSI Logic symhisl driver is a completely
different one than the story behind the cadp driver. And for a lot of
reasons, the Symbios symhisl project turned out completely differently.
Symbios's symhisl project started off the same as the Adaptec cadp
project. The x86 IHV manager attempted to persuade Symbios to take over
all responsibility for producing all future x86 drivers for their own
products. Also, like he did with the Adaptec cadp project, he
attempted to structure the Symbios symhisl project as a purely third-party
driver development project without any Sun engineering costs. In other
words, Sun was going to get a new x86 driver without having to pay for it
(if you ignore the x86 IHV manager's rather substantial salary and the
costs shifted to the tech support group because Symbios was getting
full Solaris tech support privileges without paying for it). Although
the Adaptec project followed that model initially (up until it
self-destructed), the Symbios project only followed that model in the
mind of the x86 IHV manager. The way the project evolved looked nothing
like the way the x86 IHV manager intended it to happen.
When Symbios agreed to write the symhisl driver, Sun already had well
written and well established x86 and SPARC drivers for all the existing
NCR/Symbios/LSI PCI-SCSI products (i.e., the ncrs and glm drivers).
Those drivers were mostly done without any assistance from Symbios. So
Symbios either didn't have any in-house Solaris driver writing talent
or perhaps didn't have anyone with that talent immediately available.
Therefore Symbios immediately hired an outside consultant to complete
the symhisl project (I'm pretty certain he was someone who had done
a lot of prior work for Symbios and so he knew the Symbios products as
well or better than any Sun engineer).
I don't remember exactly, but I think Symbios's consultant told me he
either had never previously written a Solaris driver, or had only previously
done one simple Solaris driver, or had done SPARC drivers but not x86 drivers.
That didn't matter much because he was a very smart and very talented
programmer. Also, unlike Adaptec, he requested up-front (or was just given)
all the source code for the latest and greatest ncrs and glm drivers (and
he probably also got from Symbios the source code for Symbios's existing
non-Solaris drivers). He also, very early on, established direct access
(via email and phone calls) to myself and some of the engineers responsible
for architecting the PCI-DDI and writing the "Writing Device Drivers"
document (and probably even the SPARC engineers that maintained the SPARC
glm driver).
So Symbios's consultant had about 100 times as much *current* and *correct*
information to start with than Adaptec did.
And regardless of how "arms-length" Sun's x86 IHV manager wanted to run
the x86 IHV program, there was a close enough relationship between
the various Sun engineers and the Symbios consultant. The communication
between him and Sun's engineers worked well enough that there
weren't any barriers preventing him from doing just as good a job as any Sun
engineer could do. (He in fact did such a thorough job that he discovered
and reported a half a dozen PCI-DDI related bugs in the kernel and the WDD
documentation).
In other words, regardless of how it said on paper that the x86 IHV
program was supposed to be run, the symhisl project was a highly back-door,
fully cooperative engineering effort between Sun and Symbios. The
Symbios consultant regularly mailed his questions directly to the Sun
engineers who could most quickly answer his questions and the
answers went back to him (usually) within hours.
Conversely with Adaptec, their questions were handled rather badly
(if they ever got answered at all). A technical question from Adaptec
required a days-long round-about trip via multiple levels of management
finally ending up in the hands of some random Sun support person who
frequently gave the wrong answer either because he/she didn't write
device drivers or because Adaptec's questions had been badly translated
from a foreign language! (Really, I'm not joking about this one.)
It's highly likely that Sun's x86 IHV manager was completely unaware
of all the engineering give and take that was going on between
Sun and Symbios (I certainly never bothered to inform the x86 IHV manager
about the on-going discussions with Symbios's consultant; if I ever
gave it a second thought I probably figured why risk breaking something
that's not broken).
Symbios's consultant was smart enough to recognize almost immediately
that given the huge amount of time and effort that Sun had already put
into both the ncrs and glm drivers, and given that all the
NCR/Symbios/LSI Logic PCI SCSI HBA chips are fundamentally backwards
compatible, and given that the SPARC glm driver ALREADY SUPPORTED THE
EXACT SAME CHIPS
as the ones he was writing an x86 driver for, that writing a completely
new x86 driver for the chip (which mostly was just a faster version of
the old chip that the ncrs driver already supported) was one of those
managerial/marketing decisions that made absolutely no sense at all.
My impression was, that instead of trying to talk Sun or Symbios out of
such an obviously silly, brain-dead, head-up-their-butts management
decision, that he saw it as an opportunity for him to both get someone else
to pay him a lot of money to learn how to write a Solaris HBA driver, and
as an opportunity to start from scratch and try to see how much he could
improve on Sun's ncrs/glm design with his own design.
Therefore, rather than it being a situation where a company (i.e.,
Adaptec) was trying to win (as cheaply as possible) a medium-sized
sales contract from another company (Sun), it was instead a golden
opportunity for a motivated consultant to step into a well established,
highly co-operative, existing relationship and use that situation
to broaden his set of skills and his resume (and get paid for it).
--
Best regards,
 Yuri mailto:koroby398ifrance.com