From: Tony Przygienda (prz_at_XEBEO.COM)
Date: Thu Nov 07 2002 - 10:17:03 CST

    Ash, Gerald R (Jerry), ALASO wrote:

    >Dave,
    >
    >>>Such failures are not the fault of the service provider
    >>>operation or the vendor/equipment implementation. They are
    >>>due to shortcomings in the link-state protocols themselves --
    >>>thus the need for the enhancements proposed in the draft.
    >>>
    >
    >>I strongly disagree with this statement. While the design of the
    >>protocols can make it challenging, there is ample room in
    >>implementation to provide stable and scalable networks.
    >>
    >>When a network collapses, the fault lies at the feet of the
    >>implementers. In every case I've seen (too many), the collapse was
    >>inevitable sooner or later, due to naive design choices in software,
    >>but at the same time was quite nonlinear in its onset (making any
    >>predictive or self-monitoring approach pretty hopeless.)
    >>
    >>There are some things that would make the job easier, at the cost
    >>of additional complexity, but pointing at network collapses
    >>and blaming the protocols is disingenuous.
    >>
    >
    >I think you should review the ample evidence presented in http://www.ietf.org/internet-drafts/draft-ash-manral-ospf-congestion-control-00.txt that the protocols need to be enhanced to better respond to congestion collapse:
    >
    >- Section 2: documented failures and their root-cause analysis, across multiple service provider networks (also review the cited references)
    >- Appendix B: vendor analysis of a realistic failure scenario similar to one experienced as discussed in Section 2 (perhaps you would like to provide your own analysis of this scenario based on your OSPF implementation)
    >- Appendix C: simulation analysis of protocol performance (other I-D's being discussed provide analysis of proposed protocol extensions)
    >
    >To say that network collapse in *every* case is due to *naive design choices* ignores the evidence/analysis presented. Based on the evidence/analysis, there is clearly room for the protocols to be improved to the point where networks *never* go down for hours or days at a time (drawing unwanted headlines & business impact).
    >
    >Jerry
    >
    Jerry, most of the things you say in your document (which is actually
    pretty good) have been known to people like Dave and other old-time
    implementors for years, and avoiding exactly those things through smart
    implementation techniques was what differentiated the haves from the
    have-nots. I remember learning some of those things myself through hard
    experience and some by looking at old hands' code ;-) [Albeit I also
    remember picking up a lot of smart control protocol ideas from your RTNR
    work]. I do not think that Dave is putting down what you say; rather
    (and I commit the stupidity of interpreting his words through my own
    beliefs), his point is that what your document describes are mostly
    _implementation_ issues, not _standardization_ issues, and therefore it
    is not a very wise idea to add them to the charter of a _standards_
    group. Good protocol specs are _not_ implementation cookbooks; they are
    documents governing bits on the wire in such a way that two people
    implementing things in vastly different ways can still talk to each
    other. Recommendations of implementation techniques prove, long-term,
    either inherently dangerous (as Joel pointed out, at a certain point
    adding more code to an implementation introduces more bugs than the
    performance gain is worth) or utterly ridiculous (look at the ISIS 0-63
    metric meant to make SPF real fast; it led to quite bad contortions).

        thanks

        -- tony