<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 PUBLIC "" "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
]>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="info" docName="draft-deng-spring-sr-loop-free-02"
     ipr="trust200902">
  <front>
    <title abbrev="draft-deng-spring-sr-loop-free">SR based Loop-free
    implementation</title>

    <author fullname="Lijie Deng" initials="L" surname="Deng">
      <organization>China Telecom</organization>

      <address>
        <postal>
          <street>109, West Zhongshan Road, Tianhe District</street>

          <city>Guangzhou</city>

          <region>Guangdong</region>

          <code>510000</code>

          <country>China</country>
        </postal>

        <email>denglj4@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Yongqing Zhu" initials="Y" surname="Zhu">
      <organization>China Telecom</organization>

      <address>
        <postal>
          <street>109, West Zhongshan Road, Tianhe District</street>

          <city>Guangzhou</city>

          <region>Guangdong</region>

          <code>510000</code>

          <country>China</country>
        </postal>

        <email>zhuyq8@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Xuesong Geng" initials="X" surname="Geng">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>Huawei Building, No.156 Beiqing Rd</street>

          <city>Beijing</city>

          <region>Beijing</region>

          <code>100095</code>

          <country>China</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>gengxuesong@huawei.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Zhibo Hu" initials="Z" surname="Hu">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>Huawei Building, No.156 Beiqing Rd</street>

          <city>Beijing</city>

          <region>Beijing</region>

          <code>100095</code>

          <country>China</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>huzhibo@huawei.com</email>

        <uri/>
      </address>
    </author>

    <date day="24" month="May" year="2024"/>

    <area>RTG Area</area>

    <workgroup>Spring Working Group</workgroup>

    <keyword>RFC</keyword>

    <abstract>
      <t>Microloops are transient packet loops that occur in the network
      following a topology change (link down, link up, node fault, or metric
      change events). Microloops are caused by the non-simultaneous
      convergence of different nodes in the network. If a converged node sends
      traffic to a neighbor node that has not converged yet (or vice versa),
      traffic may loop between these nodes, resulting in packet loss,
      jitter, and out-of-order packets. This document presents some optional
      implementation methods for avoiding loops in different scenarios of
      IGP network convergence.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">

      <t>An IP network computes paths with a distributed IGP. If a node or
      link fails, a loop may occur on the network because the link-state
      databases (LSDBs) of different routers are temporarily out of sync.
      Take a link-state protocol such as IS-IS or OSPF as an example: each
      time the network topology changes, some routers need to update their
      FIBs based on the new topology. Because convergence time and
      convergence order differ between routers, their forwarding state may
      be inconsistent for a short time. Depending on the capability,
      configuration parameters, and service volume of each device, this
      period may last from milliseconds to seconds. During this period,
      each device on the packet forwarding path may be in either the
      pre-convergence state or the post-convergence state. If these states
      differ, forwarding routes may be inconsistent and a forwarding loop
      may occur. Such a loop disappears once all devices on the forwarding
      path complete convergence. This kind of transient loop is called a
      "microloop". Microloops may cause packet loss, delay variation, and
      out-of-order packets on the network.</t>

      <t>Segment Routing (SR), defined in <xref target="RFC8042"/>, can be
      used to avoid microloops on the network. When a loop may occur
      due to a network topology change, a network node creates a loop-free
      segment list to direct traffic to the destination address. After all
      network nodes have converged, the node returns to the normal
      forwarding state. This effectively eliminates loops on the network.</t>

      <t><xref target="I-D.bashandy-rtgwg-segment-routing-uloop"/> describes
      the basic principles of using Segment Routing to avoid microloops.
      This document describes some optional implementation methods
      of SR-based microloop avoidance in different scenarios.</t>
    </section>

    <section title="Conventions used in this document">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"/>.</t>
    </section>

    <section title="Anti-Microloop Scheme for Switching Scenarios">

      <t>Switching microloops are microloops that occur after a node or link
      fails. Along the traffic forwarding path, a loop may form if a node
      closer to the point of failure converges before a node farther from
      it. Figure 1 illustrates how a switching microloop forms. When the
      link between R3 and R5 fails, assume that R3 completes convergence
      first while R2 has not yet converged. R1 and R2 still forward packets
      along the pre-failure path to R3. Since R3 has converged, it forwards
      the traffic back to R2 according to its post-convergence route. A
      switching microloop therefore forms between R2 and R3.</t>

      <figure>
        <artwork align="center"><![CDATA[
 +----------------------------------------------------------------+
 |                                             X  link failure    |
 |                                                                |
 |   +-------+      +-------+       +-------+                     |
 |   |   R1  |------|   R2  |-------|   R3  |                     |
 |   +-------+  10  +-------+   10  +-------+                     |
 |                       |               |                        |
 |                       | 10            X  10                    |
 |                       |               |                        |
 |                  +-------+       +-------+        +-------+    |                  
 |                  |   R4  |-------|   R5  |--------|   R6  |    |
 |                  +-------+ 1000  +-------+   10   +-------+    |
 |                      End.X SID 4::1                            |
 |                                                                |
 +----------------------------------------------------------------+
Figure 1: Switching illustrative scenario, failure of link R3-R5
]]></artwork>
      </figure>
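
      <t>The loop described above can be reproduced with a small sketch. The
      Python below is illustrative only and does not describe any router
      implementation. It assumes that R6 is the destination of the traffic
      (the text above does not name one), takes the link metrics from
      Figure 1, and uses hypothetical helper names (build_graph, next_hops,
      trace). It computes each node's next hop before and after the R3-R5
      failure and then mixes the two views so that only R3 has
      converged.</t>

      <figure>
        <artwork><![CDATA[
```python
import heapq

# Link metrics copied from Figure 1; links are treated as bidirectional.
LINKS = {
    ("R1", "R2"): 10, ("R2", "R3"): 10, ("R2", "R4"): 10,
    ("R3", "R5"): 10, ("R4", "R5"): 1000, ("R5", "R6"): 10,
}


def build_graph(links):
    """Turn the undirected link table into an adjacency map."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def next_hops(graph, dst):
    """Dijkstra rooted at dst; returns {node: next hop toward dst}."""
    dist, hop, queue = {dst: 0}, {}, [(0, dst)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, metric in graph[node].items():
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                hop[nbr] = node  # nbr reaches dst through node
                heapq.heappush(queue, (d + metric, nbr))
    return hop


def trace(src, dst, fib):
    """Follow next hops from src; returns (path, looped)."""
    path = [src]
    while path[-1] != dst:
        nxt = fib[path[-1]]
        if nxt in path:
            return path + [nxt], True  # a node is revisited: loop
        path.append(nxt)
    return path, False


# Assumption: R6 is the destination of the traffic entering at R1.
old = next_hops(build_graph(LINKS), "R6")
failed = {k: v for k, v in LINKS.items() if k != ("R3", "R5")}
new = next_hops(build_graph(failed), "R6")

# Only R3 has converged; every other node still uses its old next hop.
converged = {"R3"}
mixed = {n: (new if n in converged else old)[n] for n in old}

print(trace("R1", "R6", mixed))
print(trace("R1", "R6", new))
```
]]></artwork>
      </figure>

      <t>Under these assumptions, the first trace revisits R2, which is the
      R2-R3 microloop. Once every node uses its post-failure next hop, the
      second trace reaches R6 through R4 and R5.</t>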

      <t>TI-LFA, whose fundamentals are described in <xref
      target="I-D.ietf-rtgwg-segment-routing-ti-lfa"/>, is deployed on all
      nodes of the network. When the link between R3 and R5 fails,
      convergence with switching anti-microloop deployed proceeds as
      follows:</t>

      <t><list style="symbols">
          <t>Phase 1: A hold-down timer T1 is configured on R3, the node
          adjacent to the failed link. For the duration of T1, R3 uses
          TI-LFA forwarding: packets are forwarded along the backup path
          with R2 as the next hop and are encapsulated with the SR repair
          list &lt;4::1&gt;.</t>

          <t>Phase 2: A hold-down timer T2 is configured on each remote
          node. For the duration of T2, the remote node forwards traffic to
          R3 by specifying the Node SID of R3.</t>

          <t>Phase 3: When T2 expires, the remote node switches to its
          normally converged forwarding state first.</t>

          <t>Phase 4: When T1 expires, R3 switches to its normally converged
          forwarding state.</t>
        </list>T1 must be longer than T2. This scheme is limited to a single
      point of failure; in the case of multiple simultaneous failures, the
      TI-LFA backup path may itself be affected.</t>
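
      <t>The four phases can be summarized as a small state function. This
      is a sketch under simplifying assumptions rather than a description of
      any router implementation: every node is assumed to start its timer at
      the same instant, the timer values are hypothetical, and the names
      forwarding_state and timeline are illustrative only.</t>

      <figure>
        <artwork><![CDATA[
```python
def forwarding_state(role, elapsed_ms, t1_ms, t2_ms):
    """Return how a node forwards traffic elapsed_ms after the failure.

    role is "adjacent" for R3 (next to the failed link) or "remote" for
    any other node. Assumption: all nodes start their timers at time 0.
    """
    if t1_ms <= t2_ms:
        raise ValueError("T1 must be longer than T2")
    if role == "adjacent":
        # Phases 1 and 4: TI-LFA repair until T1 expires.
        return "ti-lfa-repair" if elapsed_ms < t1_ms else "converged"
    if role == "remote":
        # Phases 2 and 3: steer to R3's Node SID until T2 expires.
        return "steer-to-r3" if elapsed_ms < t2_ms else "converged"
    raise ValueError("unknown role: " + role)


def timeline(t1_ms, t2_ms, step_ms):
    """List (time, R3 state, remote state) samples from 0 up to T1."""
    return [
        (t,
         forwarding_state("adjacent", t, t1_ms, t2_ms),
         forwarding_state("remote", t, t1_ms, t2_ms))
        for t in range(0, t1_ms + 1, step_ms)
    ]


# Hypothetical timer values, chosen only so that T1 > T2.
for sample in timeline(t1_ms=3000, t2_ms=1500, step_ms=500):
    print(sample)
```
]]></artwork>
      </figure>

      <t>In every sample, R3 is still in TI-LFA repair whenever a remote
      node is still steering traffic to it, which is the property that the
      T1 &gt; T2 requirement is meant to provide.</t>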
    </section>

    <section title="Anti-Microloop Scheme for Back-switching Scenarios">

      <t>Microloops may occur not only when a node or link fails, but also
      after the failed node or link recovers. Figure 2 illustrates how a
      back-switching microloop forms. After the failed node or link
      recovers, a loop may form if a node farther from the point of failure
      converges before a node closer to it.</t>

      <t>Initially, R1 forwards traffic to the destination node R6 along the
      path R1-&gt;R2-&gt;R3-&gt;R5-&gt;R6. When the link between R2 and R3
      fails, R1 forwards traffic to R6 along the re-converged path
      R1-&gt;R2-&gt;R4-&gt;R5-&gt;R6. After the link between R2 and R3
      recovers, assume that R4 is the first node to converge. R1 forwards
      traffic to R2. Since R2 has not yet converged, it still forwards the
      packet to R4 along the path used before the link recovered. R4 has
      already converged, so it forwards the packet back to R2 along the
      post-recovery path, and a microloop forms between R2 and R4.</t>

      <figure>
        <artwork align="center"><![CDATA[
 +---------------------------------------------------------------+
 |                                            & Link Recovery    |
 |                         End.X SID 2::3                        |
 |   +-------+      +-------+   &   +-------+                    |
 |   |   R1  |------|   R2  |-------|   R3  |                    |
 |   +-------+  10  +-------+  10   +-------+                    |
 |                       |               |                       |
 |                       | 10            | 10                    |
 |                       |               |                       |
 |                  +-------+       +-------+        +-------+   |                  
 |                  |   R4  |-------|   R5  |--------|   R6  |   |
 |                  +-------+ 1000  +-------+   10   +-------+   |
 |                                                               |
 |                                                               |
 +---------------------------------------------------------------+
Figure 2: Back-switching illustrative scenario, recovery of link R2-R3
]]></artwork>
      </figure>

      <t>Because the network does not enter the TI-LFA forwarding process
      after a failed node or link recovers, delayed convergence cannot be
      used in the back-switching scenario to prevent microloops as it is in
      the switching scenario. In the back-switching scenario, it is
      sufficient to specify the Adj-SID of the recovered link to achieve
      loop-free forwarding.</t>

      <t>The process above shows that the microloop forms because R4 cannot
      pre-install a loop-free path for the link-up event. Therefore, to
      eliminate the potential loop after the faulty link recovers, R4 needs
      to be able to converge to a loop-free path.</t>

      <t>When the faulty node or link recovers, microloops can be avoided
      simply by specifying the Adj-SID of the node adjacent to the recovered
      link. As shown in Figure 2, R4 detects that the faulty link R2-R3 has
      recovered and re-converges to the destination R6 along the path
      R4-&gt;R2-&gt;R3-&gt;R5-&gt;R6. The recovery of link R2-R3 does not
      affect the SR path from R4 to R2 or the SR path from R3 to R6, so both
      of them are loop-free. Since the only part affected is the hop from R2
      to R3, the loop-free path from R4 to R6 can be determined by
      explicitly specifying that hop. It is therefore only necessary to
      insert, into the converged path of R4, the End.X SID that instructs R2
      to forward the packet to R3. For example, if R4 inserts the
      anti-microloop segment list &lt;2::3&gt; into the packet before
      forwarding it to R2, the path from R4 to R6 is guaranteed to be
      loop-free.</t>
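
      <t>The effect of the segment list &lt;2::3&gt; can be checked with a
      small simulation. The Python below is a sketch under stated
      assumptions, not an implementation of SRv6 forwarding: an End.X SID is
      modeled as an (owner, neighbor) pair, each node uses either its
      pre-recovery or its post-recovery next hops, the recovered R2-R3 link
      is assumed to carry traffic in the data plane, and all function and
      variable names are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
import heapq
from itertools import product

# Figure 2 metrics. LINKS_UP includes the recovered R2-R3 link.
LINKS_UP = {
    ("R1", "R2"): 10, ("R2", "R3"): 10, ("R2", "R4"): 10,
    ("R3", "R5"): 10, ("R4", "R5"): 1000, ("R5", "R6"): 10,
}
LINKS_DOWN = {k: v for k, v in LINKS_UP.items() if k != ("R2", "R3")}


def next_hops(links, dst):
    """Dijkstra rooted at dst; returns {node: next hop toward dst}."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    dist, hop, queue = {dst: 0}, {}, [(0, dst)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, metric in graph[node].items():
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                hop[nbr] = node
                heapq.heappush(queue, (d + metric, nbr))
    return hop


NODES = sorted({n for link in LINKS_UP for n in link})
# FIB per topology view: fib[dst][node] is node's next hop toward dst.
OLD = {dst: next_hops(LINKS_DOWN, dst) for dst in NODES}
NEW = {dst: next_hops(LINKS_UP, dst) for dst in NODES}


def forward(src, dst, converged, segments=(), max_hops=10):
    """Walk a packet hop by hop; returns (path, delivered).

    segments models End.X SIDs as (owner, neighbor) pairs, so 2::3 of
    Figure 2 is written ("R2", "R3"). Nodes in `converged` use NEW.
    """
    node, path, pending = src, [src], list(segments)
    while node != dst:
        if len(path) > max_hops:
            return path, False  # hop budget exhausted: treat as a loop
        if pending and pending[0][0] == node:
            node = pending.pop(0)[1]  # End.X: cross the named adjacency
        else:
            target = pending[0][0] if pending else dst
            table = NEW if node in converged else OLD
            node = table[target][node]
        path.append(node)
    return path, True


# R4 has converged, R2 has not: plain forwarding loops between them.
print(forward("R4", "R6", {"R4"}))
# With the segment list <2::3> the packet is delivered ...
print(forward("R4", "R6", {"R4"}, [("R2", "R3")]))
# ... whichever of the other nodes have converged.
others = ["R1", "R2", "R3", "R5"]
for flags in product([False, True], repeat=len(others)):
    done = {"R4"} | {n for n, f in zip(others, flags) if f}
    assert forward("R4", "R6", done, [("R2", "R3")])[1]
```
]]></artwork>
      </figure>

      <t>In this model, the packet with the segment list follows
      R4-&gt;R2-&gt;R3-&gt;R5-&gt;R6 for every combination of convergence
      states of the other nodes, because R2 crosses the named adjacency
      without consulting its own route to R6.</t>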
    </section>

    <section title="Anti-Microloop Scheme for Multi-source Scenarios">

      <t>When an IPv4 or IPv6 prefix is advertised by multiple nodes in an
      IS-IS domain, the prefix has multiple route sources and is called a
      multi-source route. This section covers microloop avoidance in the
      multi-source scenario, where a microloop may occur when multiple nodes
      advertise the same route and the nodes in the network converge at
      different speeds.</t>

      <t>Multi-source microloops are prevented by adding SRv6 End.X and End
      SIDs to the segment list in the SRv6 scenario, or by adding Prefix-SIDs
      and Adj-SIDs to the label stack in the SR-MPLS scenario.</t>

      <t>The following example describes how the microloop happens when
      multiple nodes advertise the same route.</t>

      <t>1. R3 and R6 both advertise the route 2001:db8:3::. The link between
      R2 and R3 fails. Assume that R2 completes convergence first and that
      R1 has not yet converged.</t>

      <t>2. R1 forwards packets destined for the prefix 2001:db8:3:: to R2
      along the pre-failure path.</t>

      <t>3. Because R2 has converged, it forwards the packets back to R1
      according to the next hop of its post-convergence route. In this way,
      a loop is formed between R1 and R2.</t>

      <figure>
        <artwork align="center"><![CDATA[
 +---------------------------------------------------+
 |                                 X  link failure   |
 | 2001:db8:1::    2001:db8:2::      2001:db8:3::    |
 |   +-------+       +-------+        +-------+      |
 |   |   R1  |-------|   R2  |----X---|   R3  |      |
 |   +-------+  10   +-------+   10   +-------+      |
 |        |                                          |
 |        | 10                                       |
 |        |                                          |
 |   +-------+       +-------+        +-------+      |                  
 |   |   R4  |-------|   R5  |--------|   R6  |      |
 |   +-------+  10   +-------+   10   +-------+      |
 | 2001:db8:4::     2001:db8:5::     2001:db8:3::    |
 |                                                   |
 +---------------------------------------------------+
Figure 3: Multi-source illustrative scenario, failure of link R2-R3]]></artwork>
      </figure>

      <t>One possible solution is as follows. The preferred destination node
      for packets destined for 2001:db8:3:: changes from R3 to R6, but the
      converged path from R2 to R5 does not change. In this case, R2 can
      start a timer T1. Until T1 expires, for a packet that is to reach R6,
      R2 adds an End.X SID for the link between R5 and R6, or an End SID of
      R6, to the encapsulation to ensure that the packet is forwarded to R6.
      The basic principle for SR-MPLS is similar to that for SRv6.</t>
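
      <t>The multi-source loop and the effect of steering toward R6 can be
      sketched as follows. This Python is illustrative only: it models the
      shared prefix with a multi-source shortest-path computation, models
      the End SID of R6 simply as a destination that only R6 owns, takes the
      metrics from Figure 3, and uses hypothetical names throughout. It is
      not a description of how any router computes or installs these
      routes.</t>

      <figure>
        <artwork><![CDATA[
```python
import heapq

# Figure 3 metrics; both R3 and R6 advertise 2001:db8:3::.
LINKS = {
    ("R1", "R2"): 10, ("R2", "R3"): 10, ("R1", "R4"): 10,
    ("R4", "R5"): 10, ("R5", "R6"): 10,
}
FAILED = {k: v for k, v in LINKS.items() if k != ("R2", "R3")}
SOURCES = {"R3", "R6"}


def next_hops(links, owners):
    """Multi-source Dijkstra: next hop toward the nearest owner."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    dist = {o: 0 for o in owners}
    hop, queue = {}, sorted((0, o) for o in owners)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for nbr, metric in graph.get(node, {}).items():
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                hop[nbr] = node
                heapq.heappush(queue, (d + metric, nbr))
    return hop


def walk(src, fib, owners, max_hops=10):
    """Follow fib(node) from src; returns (path, delivered)."""
    path = [src]
    while path[-1] not in owners:
        if len(path) > max_hops:
            return path, False  # hop budget exhausted: treat as a loop
        path.append(fib(path[-1]))
    return path, True


converged = {"R2"}  # R2 has converged, R1 has not
old_any, new_any = next_hops(LINKS, SOURCES), next_hops(FAILED, SOURCES)
old_r6, new_r6 = next_hops(LINKS, {"R6"}), next_hops(FAILED, {"R6"})


def prefix_fib(node):
    """Next hop toward the multi-source prefix."""
    return (new_any if node in converged else old_any)[node]


def r6_fib(node):
    """Next hop toward a SID that only R6 owns (End SID of R6)."""
    return (new_r6 if node in converged else old_r6)[node]


# Plain forwarding: R1 (old) sends to R2, R2 (new) sends back to R1.
print(walk("R1", prefix_fib, SOURCES))
# R2 adds the End SID of R6, so later hops route toward R6 only.
print(walk("R2", r6_fib, {"R6"}))
```
]]></artwork>
      </figure>

      <t>In this model, R1's next hop toward R6 alone is R4 both before and
      after the failure, so a packet that R2 steers toward R6 passes through
      R1 once and then continues to R4, R5, and R6 instead of returning to
      R2.</t>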
    </section>

    <section title="Anti-Microloop Scheme for Multi-point Scenarios">
      <t>TBD</t>
    </section>

    <section title="Conclusion">
      <t>Loop prevention involves various scenarios and implementation
      methods. The implementation methods proposed in this document, which
      are based on the SR microloop avoidance mechanism, can serve as a
      basis for subsequent research and development.</t>
    </section>

    <section title="Security Considerations">
      <t>The behavior described in this document is functionality internal to
      a router that results in the ability to explicitly steer traffic over
      the post-convergence path after a remote topology change in a manner
      that guarantees loop-freeness. Because the behavior serves to minimize
      the disruption associated with topology changes, it can be seen as a
      modest security enhancement.</t>
    </section>

    <section title="IANA Considerations">
      <t>This document has no IANA actions.</t>
    </section>

    <section title="Acknowledgement">
      <t>The authors would like to thank everyone who contributed to the
      draft.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8042"?>

      <?rfc include='reference.I-D.ietf-rtgwg-segment-routing-ti-lfa'?>

      <?rfc include='reference.I-D.ietf-spring-segment-protection-sr-te-paths'?>

      <?rfc include='reference.I-D.bashandy-rtgwg-segment-routing-uloop'?>
    </references>
  </back>
</rfc>
