Google routing blunder sent Japan's Internet dark on Friday

Last Friday, someone in Google fat-thumbed a border gateway protocol (BGP) advertisement and sent Japanese Internet traffic into a black hole. The trouble began when The Chocolate Factory “leaked” a big route table to Verizon, the result of which was traffic from Japanese giants like NTT and KDDI was sent to Google on the …

  1. pdh

    Fragility

    So a few errant keystrokes by an anonymous Google administrator can knock the third-largest economy in the world off the Internet? I honestly had no idea the Internet was this fragile.

    1. james 68

      Re: Fragility

      It isn't and it didn't.

      Read my post below.

    2. ForthIsNotDead

      Re: Fragility

      They call it a "mistake".

      I call it a test.

  2. Lee D Silver badge

    I don't get why BGP doesn't have an inherent way to detect that such a path is dodgy (e.g. traffic loss/refusal reporting) and therefore downgrade it.

    It seems a major oversight in a routing protocol that the routes aren't tested, monitored, reported and downgraded if they aren't actually shifting traffic. Then such things are a few seconds of blip rather than hours of outage, and something flashing red on a software control panel somewhere saying "you cocked up, please fix".

    Is it really going to take a sustained, serious attack on such functions before we fix them? When someone like Russia wants to take out a country's internet, they could have a few related companies "accidentally" mis-advertise and cause no end of problems without having to lift a finger in terms of weaponry.

    This, and email. The last two vestiges of crappy protocols running the world and everyone knowing they're rubbish but NOBODY moving towards fixing them.

    1. eldakka
      Holmes

      I don't get why BGP doesn't have an inherent way to detect that such a path is dodgy (e.g. traffic loss/refusal reporting) and therefore downgrade it.

      The systems did report that the path was dodgy, hence users/clients getting errors when sending their data. The problem was that the issue lay in the "source of truth" for determining the routing of traffic to/from the affected (mostly Japanese) providers. So even though clients were getting errors, what good did that do? The source of truth said that to reach this ISP, you have to go there. If "there" doesn't work, nothing can be done: devices can only send data to where the source of truth (the BGP tables) tells them to send it, and if those locations don't work, they're fucked. The only way to fix it for specific clients would be to modify their nearest BGP router (if you had access to it), manually updating the local table and disabling automatic updates so the broadcast updates would be ignored. The full BGP table has well over 600,000 entries, have fun manually editing that...

      You may as well have said:

      I don't get why there isn't a golden key ("not a backdoor" /rolleyes) so only the government can de-crypt encrypted data and no-one else can (or so say the government)...

      I don't get why we don't have bases on the moon...

      I don't get why we all aren't driving electric cars...

      I don't get why we don't have fusion power...

      Maybe because it's hard?

      Just because we don't have it doesn't mean people (dozens, hundreds, thousands of them) aren't working on it, or that millions, perhaps billions, of dollars aren't being spent on solving it. And even if they did implement such a beast, it'd be slow and hence require much more processing power (hence expense) in border-gateway routers, add extra latency to the internet and require beefier connections (all this checking isn't free). It'd also take decades for all the various ISPs and tier 1 and 2 providers and so on to agree to upgrade the millions of devices that'd need to implement said protocol.

      1. deevee

        What has happened to the internet? The original design meant any link that was down was automatically bypassed and another route found. This is "supposed" to be the benefit of BGP.

        We have seen similar things happen a number of times over the years. Is it poor design, or poor implementation, or is it just organisations deciding to override the automatic healing functionality for commercial benefit?

        1. veti Silver badge

          What has happened to the internet?

          It grew.

          The original design was never envisaged to handle anything within about five orders of magnitude of today's traffic. Not surprising it creaks a bit from time to time.

        2. eldakka

          In addition to @veti's reply wrt scale, it should also be pointed out it was developed in an atmosphere of academic institution trust. It wasn't developed with malicious actors in mind, with people deliberately trying to break it or subvert it. It was developed in the good-faith belief that everyone administering it was technically competent and trustworthy and would have no reason to break it. Before the 'Axis of Evil', before 9/11, before the US State Department cable leaks, before Snowden confirmed the extent of government surveillance, etc.

          1. foxyshadis

            @eldakka

            You managed to completely miss the point with both replies. No one was asking for some kind of historical perspective on the protocol, no one cares, it sounds like you're trying to excuse away problems by claiming that there's nothing we can do because it was designed years ago.

            The whole point of the posts you're replying to is asking WHEN are they going to be fixed, so that a rogue actor can't maliciously bring down the internet easily, even if for a short time. (And ranting that no one seems to care enough about a gaping hole to do anything.)

            1. eldakka

              Re: @eldakka

              The whole point of the posts you're replying to is asking WHEN are they going to be fixed,

              No, they weren't. The posts I replied to were asking why it isn't already there. That is what I answered, adding that there are, in fact, people working on alternatives.

              However, it's like asking "how long is a piece of string" or "why we aren't on the moon yet" or "when will we have fusion power". The answers to those questions are the same as the answer to why BGP isn't fixed yet - it's technically hard, it's financially expensive, it's political, no-one knows HOW to fix it.

              It is not a simple problem, it is filled with historical baggage, and no-one has yet found a better solution that meets all the criteria:

              1) fixes the problems of people screwing around with it (inserting bad routes etc) (technical);

              2) performs fast enough to be implemented on current or near-term hardware (technical/financial);

              3) isn't cripplingly expensive to implement (financial/political);

              4) has agreement from the hundreds of stakeholders (political);

              5) will be able to be rolled out in a reasonable amount of time (political/technical/financial).

              So people are working on alternatives, but no-one has found a solution that fits all the criteria.

              It's like saying "if the nerds nerd harder I'm sure they can come up with a solution". That's easy for a backseat driver to say. If you don't like it, then YOU go and fix it, or YOU go and fund a team to fix it.

              It will be fixed when (if) it gets fixed.

          2. Allan George Dyer
            Holmes

            "An elegant protocol... for a more civilized age."

            @eldakka - No! This mythic time on the internet ended far earlier than you imagine. BGP-4, the version in use today, dates from 1994 and was updated in 2006, when security threats were already a widespread concern. Already, we'd had the Morris worm and the Michelangelo virus. The IPsec working group started about 1992, and the RFC was published in 1995, so someone was working on secure protocols when BGP was developed. Hell, the film WarGames came out a decade earlier, so network threats were even part of the popular consciousness!

            Rather than pointing the finger at academic institution trust, it might be worth looking at the cost trade-off of doing it properly, and the telecoms companies wanting to get into this new era quickly.

            icon - there's no Star Wars icon, and a deerstalker is very elegant.

      2. Nick Stallman

        But it doesn't have to go there. There are multiple routes via multiple providers.

        Prior to Google's announcement, a Japanese ISP already had one or more routes to each destination. The new 'shorter' Google route got added in addition to the already existing ones.

        With some sort of monitoring you could detect that routes via the new announcement are failing, then revert back to the longer pre-existing routes.
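        A rough sketch of that idea, assuming a health probe exists outside BGP (the protocol itself offers nothing like it). The `Route` model, the `is_delivering` probe and the peer names are all hypothetical, and the addresses and AS numbers come from the documentation ranges:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Route:
    """One candidate path to a destination prefix (hypothetical model)."""
    prefix: str
    next_hop: str
    as_path: tuple  # sequence of AS numbers; shorter is normally preferred


def pick_route(candidates: List[Route],
               is_delivering: Callable[[Route], bool]) -> Optional[Route]:
    """Return the shortest-AS-path route that the probe says still works.

    `is_delivering` stands in for whatever monitoring is assumed to exist.
    """
    for route in sorted(candidates, key=lambda r: len(r.as_path)):
        if is_delivering(route):
            return route
    return None  # every known path is failing


# The pre-existing, longer route and the newly leaked "shorter" one.
old = Route("203.0.113.0/24", "peer-a", (64500, 64501, 64502))
leaked = Route("203.0.113.0/24", "peer-b", (64510, 64502))

if __name__ == "__main__":
    black_holed = {"peer-b"}  # the probe reports traffic via peer-b is lost
    chosen = pick_route([old, leaked], lambda r: r.next_hop not in black_holed)
    print(chosen.next_hop)
```

        The hard part in practice is the probe, not the fallback: deciding per route, quickly and without false alarms, that traffic is being dropped.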

    2. Voland's right hand Silver badge

      I don't get why BGP doesn't have an inherent way to detect that such a path is dodgy

      It is not in the requirements. It is however in the requirements for the system which provisions the BGP announcement in large SPs like Google and Verizon as well as filters incoming announcements.

      This is a repeat of the usual idiocy of American private peering links.

      In America most peering is private with secret peering policies. There are very few fat links between oligopolists and there is NO ACL on the route announcements. So if someone fat-fingers an announcement the whole system goes into meltdown. This has happened again and again and again and will continue to happen. The first time I remember was as far back as 1996? or 1997 when some small Florida SP experimenting with gated source took down most of the USA Internet for a couple of hours.

      Compared to that in Europe most peering is public via peering points. Peering policies are PUBLIC and registered with RIPE in a format which is machine readable and everyone can form (and does form) an ACL on what to accept from a respective peer. As a result you get up to 3 points of enforcement: source, route server in the peering point, destination.

      The issue here is - yanks never learn. Not invented here (in the great Silly Valley), hence does not exist. They keep being whacked by this on a regular basis, but continue to suffer and enjoy it.

      1. Anonymous Coward
        Anonymous Coward

        The issue here is - yanks never learn. Not invented here (in the great Silly Valley), hence does not exist. They keep being whacked by this on a regular basis, but continue to suffer and enjoy it.

        Well, I couldn't care less if they engaged in self flagellation, but the problem (as with their financial mismanagement) is that anyone outside gets to "enjoy" the consequences as well and there is a distinct asymmetry in how the damage is dealt with. If it's a problem of US origin it's "oops, we did it again, just a silly mistake", if it's them furreiners it's obviously a heinous cyberattack and law enforcement starts pouring out like maggots from a disturbed dead badger.

        Maybe we should send a message in the only language they seem to understand: sue them.

        Yes, I'm grumpy. Now get me my coffee.

      2. patrickstar

        Most traffic in Europe is over private peering, not public exchanges. There simply isn't enough capacity on the public ones.

        And some exchanges (like NetNod) are far too expensive to justify hooking up unless you have very special requirements.

        Plus larger networks tend to have very restrictive peering policies, as usual.

        And really - as a smaller actor today, peering rarely makes sense financially since transit is so cheap and peering comes with a significant chunk of added complexity.

        For most of them, it's basically only relevant if you happen to be in the right place and your NOC has spare time to kill. A lot of those that still do it are mostly doing so out of habit. A few of the others have specific requirements (latency and such) to justify it.

        As to automatic route filtering via the RIPE DB - it's certainly done by some, but not many. Definitely not a universal practice.

        1. Tom Samplonius

          "Most traffic in Europé is over private peering, not public exchanges. There simply isn't enough capacity on the public ones."

          Source? Because I don't think there are any sources which say this, as volumes of privately peered data are confidential.

          Personally, I'd have to say that public peering is much bigger in Europe than private. DE-CIX, AMS-IX, and LINX carry a lot of traffic.

          1. patrickstar

            The source is having actually worked with several major European networks. The consensus has always been that private peering carries the majority of traffic exchange - certainly for the larger players.

            I'm sure you can find someone to confirm this if you ask around - the fact itself isn't exactly a secret.

            Yes - DE-CIX, AMS-IX and LINX carry a lot of traffic. Private peers carry a lot more than that.

            Fun game: Select a major network that hasn't anonymized its port statistics at the major IXes. Sum up all traffic on them. Note that it's far, far less (order of magnitude in some cases) than what you'd expect a network of that size to exchange with the others.

            (Note that I haven't really been involved in the ISP business for several years, so theoretically there's a possibility that this has changed since then. But I seriously doubt it.)

    3. Tom Samplonius

      "This, and email. The last two vestiges of crappy protocols running the world and everyone knowing they're rubbish but NOBODY moving towards fixing them."

      Just because you are not aware of it doesn't mean things are not happening. Most carriers have prefix filters set on network-to-network connections to filter out this type of thing. Clearly Verizon did not have this set up on their peering links with Google in Japan. But that doesn't mean it isn't getting done. The future is digital signatures on the routes themselves. This is happening, but progress is pretty slow.

      As far as email, not much can be done about it. DKIM was the last major advance, but many small mail servers don't support it yet. BGP is easier to extend, as it has plenty of knobs. And there are fewer parties, and all of the relationships are known in advance, unlike email.

  3. razorfishsl

    So all the traffic for Japan "accidentally" went into google?

    1. james 68

      Far from "all", in fact a reasonably small amount which by chance covered the government's supplier (which is why they're making a big noise over what amounts to a tiny drama).

      I live just outside Tokyo and saw no issues at all on either my home connection (AU, largest supplier in Japan) or on my mobile (UQ, a tiny supplier). Nobody else that I have talked to here in Japan noticed anything out of the ordinary either.

      1. Pascal Monett Silver badge

        Ah, the good old "I had no issue, so there was no problem".

        1. james 68

          @Pascal Monett

          "Ah, the good old "I had no issue, so there was no problem"."

          Well, when the claim is that "ZOMG!!! ALL OF JAPAN WAS TAKEN OFFLINE!!!!" and my case and all of those that I have interacted with clearly prove that the premise is incorrect, then yes, the fact that I did not have a problem is direct contradictory evidence and valid proof that the story is at best wrong and at worst clickbait.

          The article fails to point out that only a handful of providers who are affiliated with Verizon were actually affected and makes wild claims that this, somehow, is the entirety of Japanese internet infrastructure.

    2. Anonymous Coward
      Joke

      "So all the traffic for Japan "accidentally" went into google?"

      It was just a test which went out of control.... the transit infrastructure was not ready yet.... after all, with those new FCC rules, Google needs a way to get the same data the provider gets...

  4. Anonymous Coward
    Anonymous Coward

    Easy to do...

    "someone in Google fat-thumbed a border gateway protocol (GGP)"

    :)

    1. David 132 Silver badge
      Happy

      Re: Easy to do...

      I should point out that there is a "tips & corrections" link at the bottom of this, indeed of all Reg stories, and if you use it to notify them of their unfortunate case of Inadvertent Acronym Confusion (AAD) they'll not only remedy it, but probably send you a very nice & polite email to thank you for your diligence.

      1. David Roberts

        Re: Easy to do...tips and corrections

        It has also been pointed out in the past that this link does not work on all client systems.

      2. handleoclast
        Coat

        Re: Easy to do...

        I assumed it was intentional. Google Gateway Protocol.

      3. Anonymous Coward
        Anonymous Coward

        Re: Easy to do...

        I should point out that there is a "tips & corrections" link at the bottom of this

        .. and I should point out that, unlike the "Tips and corrections" link, the whooshing sound you heard is indeed far less extensively documented.

        I know it's Monday but Christ, relax (which is admittedly a very bad pun in itself, sorry). Have a beer.

      4. Snowy Silver badge

        Re: Easy to do...

        The "tips & corrections" link at the bottom of this, indeed of all Reg stories, only works if you have an email program installed and configured.

  5. bombastic bob Silver badge
    FAIL

    All Eggs, One Basket, big fat thumb go *squish*

    Yeah, kinda what the title says.

    NOT good. Someone needs a CLUE BAT for letting things get to that point.

  6. Anonymous Coward
    Anonymous Coward

    VZ asleep

    "In this case it appears Verizon had little or no filters" thus GarbageIn-GarbageOut.

  7. naive

    Internet is one of the world's most successful collaboration projects

    Collaboration means it is not owned by anyone, so it depends on parties collaborating to make it work.

    The standards we have now have been successfully deployed for nearly two decades.

    The fact that someone makes a mistake now and then doesn't mean we need a complete overhaul of things that work so well, or to introduce all kinds of new stuff that adds complexity and makes it harder to get things working well between, for example, AWS systems somewhere in VA and users in Pakistan.

    So no, internet management tools are not designed for Microsofties who are used to clicking a dozen "Are you sures". Which is good, because it created something great without introducing a sickening monopoly which left people like Bill Gates with more millions than most people in the world own in single dollars.

    1. Anonymous Coward
      Anonymous Coward

      Re: Internet is one of the world's most successful collaboration projects

      Yes, but.... You edit /etc/sudoers by hand or using visudo?

      It is not overly difficult to put in place checks so that errors of this sort are simply impossible. Google's internal systems while I was there were shockingly short on prechecks of this sort, and what was there was impossible to wrap into higher level systems in a sane fashion.

      At some point, "Oops, I just sinkholed 1000TB of net traffic" needs to be followed by "Oops, you just incurred a $1M fine". Companies will prioritize stability when they are forced to. Not before.

  8. Anonymous Coward
    Anonymous Coward

    All your base are belong to us

    All your base are belong to us

  9. JeffyPoooh
    Pint

    Can "border gateway protocol (BGP) advertisements" be spoofed?

    The word [BGP] "advertisement" is concerning. Advertising often means 'just putting it out there'. Hopefully they're signed and authenticated, if not always correct.

    So much for the Internet automatically "routing around damage" as was originally promised. Yes, it is more fragile (relying on single routes, single points of failure) than was originally intended and promised back in the early days.

    1. patrickstar

      Re: Can "border gateway protocol (BGP) advertisements" be spoofed?

      BGP announcements are not signed or authenticated in any way in most cases.

      And this is the very protocol that allows the Internet to route around damage. To be able to do that, you need dynamic routing. It could certainly be argued that a protocol with specified primary and secondary routes would be harder to screw up (a lot of telco stuff works like that), but then you wouldn't have an Internet as we know it.

  10. Concrete Gannet
    Mushroom

    Similar in Australia in 2012

    In February 2012, Dodo (a smallish ISP) sent a BGP change to Telstra (by far the biggest in Oz) implying Dodo was the entire Internet.

    https://bgpmon.net/how-the-internet-in-australia-went-down-under/

  11. Andy The Hat Silver badge

    Attack vector?

    Looking at this from a non-expert point of view, is this not a potential 'soft underbelly' attack vector? If one mis-configuration causes some mayhem, what would be the results of half a dozen simultaneous, deliberately planted false 'adverts'? Although it is obviously possible for an insider to deliberately or accidentally implement mis-configurations, is there enough inherent security to prevent an external third party from doing that with multiple targets?

    1. patrickstar

      Re: Attack vector?

      You need to compromise (or otherwise control) networks that peer with, buy transit from, or sell transit to, your targets. Or do so with networks that in turn do it with your targets, etc.

      And your desired announcements need to be accepted by whatever filters (prefix, AS path and/or max prefix limit) that may or may not be in place along the way.

      Generally, if you just buy a transit connection you will only be able to announce specific pre-arranged prefixes (you'd typically have them accept your prefixes and those of your customers - often this will need to be authenticated with corresponding RIPE database entries and such). The network you're buying transit from might then very well have to go through the same process with THEIR upstreams.
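      As a minimal sketch of what such a pre-arranged prefix filter does (IPv4 only; the function names, the /24 length cap and the allow-list are illustrative assumptions, not any router vendor's syntax):

```python
import ipaddress


def build_filter(allowed):
    """Parse the prefixes a customer has pre-arranged to announce."""
    return [ipaddress.ip_network(p) for p in allowed]


def accept(announced, allowed_nets, max_len=24):
    """Accept an announcement only if it sits inside an allowed prefix
    and is no more specific than max_len (a common, assumed, policy)."""
    net = ipaddress.ip_network(announced)
    if net.prefixlen > max_len:
        return False
    # subnet_of() raises TypeError across IP versions, so check that first.
    return any(net.version == a.version and net.subnet_of(a)
               for a in allowed_nets)


# Documentation prefixes standing in for a customer's registered space.
customer = build_filter(["198.51.100.0/24", "203.0.113.0/24"])

if __name__ == "__main__":
    for prefix in ("198.51.100.0/24", "192.0.2.0/24", "0.0.0.0/0"):
        print(prefix, accept(prefix, customer))
```

      In real deployments the allow-list would be generated from registry data rather than typed in by hand, which is exactly the part that gets skipped for big networks.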

      For filtering not to apply to you, you need to be a BIG network (so that it's too hard to maintain filter lists) and/or have stupid/lazy upstream providers.

      With peers (like via an internet exchange point or private peering hooked up in some DC cross-connect/meet-me room) however, filtering is often not as strict. Far too often it will only consist of a hard limit on the maximum number of prefixes, to protect against accidental misconfiguration (like announcing a full table, i.e. the entire Internet, to your peer).
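      The max-prefix safeguard is simple enough to sketch. The `PeerSession` class below is a hypothetical model, assuming the session is torn down and its routes discarded once the limit is exceeded:

```python
class PeerSession:
    """Toy model of a peering session guarded by a max-prefix limit."""

    def __init__(self, name, max_prefixes):
        self.name = name
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.up = True

    def receive(self, prefix):
        """Record an announcement; return False if it was not accepted."""
        if not self.up:
            return False
        self.prefixes.add(prefix)  # a repeated prefix does not count twice
        if len(self.prefixes) > self.max_prefixes:
            # Limit exceeded: assume the peer is leaking, drop everything.
            self.prefixes.clear()
            self.up = False
            return False
        return True


if __name__ == "__main__":
    session = PeerSession("small-peer", max_prefixes=3)
    for i in range(5):
        ok = session.receive(f"10.0.{i}.0/24")
        print(i, ok, session.up)
```

      Note what this does not catch: a peer that stays under the limit while announcing prefixes it has no business announcing.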

      Then after it's in place, the questions are how far your announcements will spread and if they will be preferred over the originals. The latter you can typically affect as a malicious actor by announcing more specific prefixes (for example, if your target announces a /22, you can announce four /24's) since the most specific route is preferred.
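      The /22 versus four /24s example can be worked through with longest-prefix matching. This is a sketch using Python's standard `ipaddress` module; the private 10.0.0.0/22 block and the "legit"/"hijacker" next-hop labels are made up for illustration:

```python
import ipaddress


def lookup(table, destination):
    """Longest-prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


target = ipaddress.ip_network("10.0.0.0/22")
table = {target: "legit"}
before = lookup(table, "10.0.1.7")

# The attacker announces the same address space as four more-specific /24s.
more_specifics = list(target.subnets(new_prefix=24))
for net in more_specifics:
    table[net] = "hijacker"
after = lookup(table, "10.0.1.7")

if __name__ == "__main__":
    print(before, after)
```

      The original /22 is still in the table throughout; it simply never wins a lookup once the /24s are present.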

      Localized routing disasters (like several networks peering with some fat-fingered guy at one or several IXes) are pretty common, but for global scale you need at least one of the big players to pick up the bogus announcements, and this is actually pretty rare. Maybe a few major incidents per year globally?

  12. Anonymous Coward
    Coat

    But....

    ...Google IS the internet, isn't it?
