Difference between revisions of "RPKI"

imported>Ayounsi

Revision as of 12:01, 15 January 2020

Signing

Prefixes

All our prefixes have matching ROAs, for AS14907.

They are set up through the RIRs' hosted RPKI platforms.

RIR     Subnet                Max length
RIPE    185.15.56.0/22        up to /24
RIPE    2a02:ec80::/29        up to /48
RIPE    91.198.174.0/24       /24
ARIN    2620:0:860::/46       up to /48
ARIN    198.35.26.0/23        up to /24
ARIN    208.80.152.0/22       up to /24
APNIC   103.102.166.0/24      /24
APNIC   2001:df2:e500::/48    /48
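
To illustrate how a validator would treat announcements of these prefixes, here is a minimal sketch of route origin validation in the spirit of RFC 6811. It is not the code Routinator or the routers run. The `Roa` type and the `validate` function are hypothetical names introduced for this example, and the ROA list is copied from the table above on the assumption that "up to /N" is the ROA's maximum length.

```python
import ipaddress
from typing import NamedTuple


class Roa(NamedTuple):
    """Hypothetical, simplified ROA: prefix, maximum length, origin AS."""
    prefix: str
    max_length: int
    asn: int


# ROAs from the table above, all for AS14907.
ROAS = [
    Roa("185.15.56.0/22", 24, 14907),
    Roa("2a02:ec80::/29", 48, 14907),
    Roa("91.198.174.0/24", 24, 14907),
    Roa("2620:0:860::/46", 48, 14907),
    Roa("198.35.26.0/23", 24, 14907),
    Roa("208.80.152.0/22", 24, 14907),
    Roa("103.102.166.0/24", 24, 14907),
    Roa("2001:df2:e500::/48", 48, 14907),
]


def validate(route: str, origin_asn: int, roas=ROAS) -> str:
    """Return 'valid', 'invalid' or 'unknown' for an announced route."""
    net = ipaddress.ip_network(route)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # subnet_of() raises TypeError on mixed IPv4/IPv6, so check first.
        if net.version != roa_net.version or not net.subnet_of(roa_net):
            continue
        covered = True  # at least one ROA covers this route
        if roa.asn == origin_asn and net.prefixlen <= roa.max_length:
            return "valid"
    # Covered but no ROA matched origin and length: invalid.
    # Not covered by any ROA: unknown (not found).
    return "invalid" if covered else "unknown"
```

For example, `validate("185.15.56.0/24", 14907)` is valid, while the same prefix announced as a /25 or from another origin AS is invalid. A prefix covered by no ROA in the list is unknown.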

Monitoring

BGPmon: Network monitoring#RPKI Validation Failed

RIPE: Network monitoring#Resource Certification (RPKI) alerts

Validation

Tracking task: https://phabricator.wikimedia.org/T220669

Gerrit changes: https://gerrit.wikimedia.org/r/q/topic:%22rpki%22+(status:open%20OR%20status:merged)

VMs: https://netbox.wikimedia.org/virtualization/virtual-machines/?q=rpki (Routinator requirements)

Grafana: https://grafana.wikimedia.org/d/UwUa77GZk/rpki

Current status

In production: RPKI-invalid prefixes are rejected on peering links.

[Figure: RPKI validation infra.png]

Packaging

Progress is being made toward an official Debian package in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929024

There is also a request to SRE on how to package Rust/Go apps for our infra, in https://phabricator.wikimedia.org/T220836

In the meantime the package is built the following way, on a Cloud VM:

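# Build dependencies (musl is needed for a statically linked binary)
sudo apt-get install musl-tools build-essential
# Install the Rust toolchain with rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"
# cargo-deb builds a .deb package from the Cargo project
cargo install cargo-deb
git clone https://github.com/NLnetLabs/routinator/
cd routinator/
git checkout <version>
# Build the .deb against the static musl target
rustup target add x86_64-unknown-linux-musl
cargo deb --target=x86_64-unknown-linux-musl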

https://rpki.readthedocs.io/en/latest/routinator/installation.html#building-a-statically-linked-routinator

The resulting package is then added to Reprepro.

Router config

First we need the routers to talk to the Validators:

routing-options {
    [...]
    validation {
        group rpki {
            session 2620:0:861:103:10:64:32:19 {
                port 3323;
            }
            session 2620:0:860:101:10:192:0:103 {
                port 3323;
            }
        }
    }
}

Then we classify the learned prefixes and reject the invalid ones:

policy-options {
    [...]
    /* The same term is used in Transit_in */
    policy-statement BGP_IXP_in {
        [...]
        term rpki-classification {
            from policy BGP_rpki;
        }
        [...]
    }
    policy-statement BGP_community_actions {
        term rpki-invalids {
            from community RPKI:INVALID;
            then reject;
        }
        [...]
    }
    policy-statement BGP_rpki {
        term valid {
            from {
                protocol bgp;
                validation-database valid;
            }
            then {
                validation-state valid;
                community add RPKI:VALID;
            }
        }
        term invalid {
            from {
                protocol bgp;
                validation-database invalid;
            }
            then {
                validation-state invalid;
                community add RPKI:INVALID;
            }
        }
        term unknown {
            from {
                protocol bgp;
                validation-database unknown;
            }
            then {
                validation-state unknown;
                community add RPKI:UNKNOWN;
            }
        }
    }
}

We also set the validation state for prefixes exchanged over iBGP (internal) sessions, based on the community attached by the router that learned them:

policy-statement iBGP_rpki {
    term valid {
        from community RPKI:VALID;
        then validation-state valid;
    }
    term invalid {
        from community RPKI:INVALID;
        then validation-state invalid;
    }
    term unknown {
        from community RPKI:UNKNOWN;
        then validation-state unknown;
    }
}
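
The interplay of these policies can be summarised with a small model. This is only a sketch of the logic as read from the configuration above, not how Junos evaluates policies. The function names `ebgp_import` and `ibgp_state` are hypothetical, and the sketch assumes that BGP_community_actions is evaluated after the BGP_rpki classification on the same import chain.

```python
# Community attached for each validation state (from BGP_rpki).
COMMUNITY_FOR_STATE = {
    "valid": "RPKI:VALID",
    "invalid": "RPKI:INVALID",
    "unknown": "RPKI:UNKNOWN",
}

# Reverse mapping, used on iBGP sessions (from iBGP_rpki).
STATE_FOR_COMMUNITY = {c: s for s, c in COMMUNITY_FOR_STATE.items()}


def ebgp_import(validation_db_state: str):
    """Model of an external import: classify, tag, then reject invalids.

    Returns (accepted, community).
    """
    community = COMMUNITY_FOR_STATE[validation_db_state]
    # BGP_community_actions: routes tagged RPKI:INVALID are rejected.
    accepted = community != "RPKI:INVALID"
    return accepted, community


def ibgp_state(communities):
    """Model of iBGP_rpki: restore the validation state from the community."""
    for community in communities:
        if community in STATE_FOR_COMMUNITY:
            return STATE_FOR_COMMUNITY[community]
    return None  # no RPKI community: no term matches, state is left unset
```

In this model a valid or unknown route is accepted and tagged at the edge, and an internal peer recovers the same state from the tag without having to consult a validator itself.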

How-to

Identify if an issue is due to invalid RPKI

[Figure: example of an RPKI-invalid prefix, with valid less-specifics.]
  • Enter the IP of the user reporting an issue in https://stat.ripe.net/widget/prefix-routing-consistency
  • If the emoji is red/sad, then the IP is originating from an RPKI-invalid prefix. Hover over the face for more details.
  • If the invalid prefix is not also covered by a less-specific prefix (see image), traffic cannot be routed back to the client.
  • In that case, reach out to the provider so they fix their ROA, or disable validation (less preferred).

Disable validation

If validation is causing an issue and must be disabled quickly, stopping Routinator will not work: by default the routers keep the validator data cached for 1h.

On the Routinator side, you can add a local exception (not implemented in our infra yet).

On the router side, you can either (depending on scope):

  • Disable all validation: deactivate routing-options validation
  • Not act on validation data: TODO
  • Set a static override: see Juniper's doc

Monitoring

RPKI to router port

  • See the "Process" section below to check if the process is running
  • Check if the port (3323) is open in iptables
  • Check if Routinator listens on the port (sudo netstat -nlpt | grep routinator)
  • Test the port from a monitoring host (e.g. nc -zv <hostname> <port>)
  • Open a task, cc netops/traffic
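
When nc is not available on the monitoring host, the same reachability test can be sketched in Python. This is only an illustration of the check, not an existing monitoring script. The function name `rtr_port_open` is hypothetical, and 3323 is the RTR port used in the router configuration above.

```python
import socket


def rtr_port_open(host: str, port: int = 3323, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Roughly equivalent to `nc -zv <hostname> <port>`: it only checks that
    something accepts connections, not that it speaks the RTR protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Calling `rtr_port_open("<hostname>")` from a monitoring host returns False if the port is filtered, closed, or the host cannot be resolved.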

Process

Troubleshoot it like most processes:

  • sudo service routinator status
  • Routinator logs to syslog, check logstash or /var/log/syslog
  • Try to start it again: sudo service routinator start
  • Open a task, cc netops/traffic

Grafana alerts

Valid ROAs decreasing

A possible cause is that Routinator can't download the new ROAs from the repositories.

  • Check the logs for signs of rsync failure (e.g. rsync rpki.ripe.net/repository: rsync: mkstemp "/var/lib/routinator/repository/rpki.ripe.net[...]CAi" failed: Permission denied (13))
  • Try to manually run the rsync from a temporary directory
  • Ensure the server has connectivity to the internet (e.g. check the proxies)

Rsync status > 0

Look at the logs for more information on the failure.

Try to run the rsync manually, from a host not behind a proxy to rule out the proxies.

If the issue is on the rsync server side, ack the alert and monitor the issue (not actionable).

Possible future work

  • Add monitoring on the router side. Currently only screen scraping/netconf seems doable (no SNMP).
  • Encrypt the RTR traffic. Not a blocker as it's not PII and it's not leaving our infrastructure. Not supported on Junos.
  • Implement mechanism to easily add exceptions.

Resources

Routinator's doc: https://rpki.readthedocs.io/en/latest/routinator/index.html

Juniper's doc: https://www.juniper.net/documentation/en_US/junos/topics/topic-map/bgp-origin-as-validation.html