From: Ash, Gerald R (Jerry), ALASO (gash_at_ATT.COM)
Date: Thu Nov 07 2002 - 14:00:48 CST

    Rohit,

    > > The problem *was* in the flooding storm that was triggered
    > > in all the failures cited, and the inability to recover.

    > We may be talking about different incidents. The one that I
    > was involved in, was a clear implementation mistake
    > (a case was missed) which was the root cause. It should have
    > been caught in testing if somebody had bothered to
    > look at why certain LSAs were being generated. The excessive
    > flooding was simply the result.

    So we've both observed flooding storms and their bad effects. They can start in many different ways, and they sometimes bring the network down, with recovery taking hours or days (that is the problem we want to solve).
     
    > I couldn't say this better than Tony/Joel : "at a certain point adding
    > more code to an implementation introduces more bugs than the
    > performance gain is worth". I (like most developers) can attest to this
    > from first hand experience.

    I certainly agree with you, Dave, Joel, and Tony (thanks all!) that excellent implementation and testing are essential, and that adding complexity and more bugs surely isn't the goal. I too have experience designing, testing, and implementing new routing methods in large-scale applications, and I well appreciate these points.

    Summary: We've identified a problem that isn't solved entirely by good implementation, testing, and operation, so we feel some protocol extensions are necessary.

    Thanks,
    Regards,
    Jerry