Amazon's AWS S3 cloud storage evaporates: Top websites, Docker stung

Amazon Web Services is scrambling to recover from a cockup at its facility in Virginia, US, that is causing its S3 cloud storage to fail. The internet giant has yet to reveal the cause of the breakdown, which is plaguing storage buckets hosted in the US-East-1 region. The malady kicked off around 0944 Pacific Time (1744 UTC) …


  1. Andy the ex-Brit
    Mushroom

    Strava

    Strava is down due to this! How can I check how many miles I've ridden so far this month?

    1. Hedley Phillips

      Re: Strava

      If it's not on Strava it didn't happen

    2. BongoJoe

      Re: Strava

      Has the Ordnance Survey site gone down?

  2. Steve Davies 3 Silver badge
    Paris Hilton

    But....

    Isn't the selling point of all this cloudy stuff that it does not go down???????

    I guess the AWS cloud must have pissed down on someone until all the clouds disappeared.

    Paris because she is good at shedding tears.

    1. Just a geek
      Mushroom

      Re: But....

      Too many people (non IT folk) seem to think that the cloud is this magical place that never has an issue. No matter how many outages Amazon, Azure, etc have, people still seem to think that it's made of magic.

      Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure.

      1. bombastic bob Silver badge
        Devil

        Re: But....

        "Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure."

        always good advice.

        /me uses github. that's cloudy enough.

      2. Doctor Syntax Silver badge

        Re: But....

        "Deploy in the cloud by all means but still backup, replicate"

        We used to call that keeping a dog and barking yourself.

      3. Anonymous Coward
        Anonymous Coward

        @Geek

        "Too many people (non IT folk) seem to think that the cloud is this magical place that never has an issue."

        True, but whose fault is that? Isn't this exactly their whole selling point to begin with?

        I also don't think you should dismiss the whole argument that easily, because when properly set up you can get a redundant environment if you want to. The fact that it now doesn't work this way at AWS tells me more about their infrastructure than the (in)abilities of virtualized hosting.

      4. Dan 10

        Re: But....

        "Deploy in the cloud by all means but still backup, replicate, ensure that you don't have a single point of failure."

        Unfortunately, that is what they've done. This fault affects a specific region, and each region contains multiple availability zones. Each zone constitutes a logical datacentre, comprising multiple physical datacentres (between 3 and 6 in each AZ, I believe). Deployment across two or more AZs in a given region *is* removing the single points of failure. Supposedly. Didn't work this time.

        AWS don't particularly recommend deploying across more than one region, because each region is effectively a completely different cloud, common in branding, usage etc, but connected only via the public internet. Replication between zones within a region is fast and free, but replication between regions is slower and costs.

        Ultimately though, a well-designed AWS deployment, consisting of all the fault-tolerant bells and whistles, still has no upfront cost and is thus way more achievable than doing it on-prem. With said bells and whistles in place, only nuclear outages like this should cause the rare downtime you do get.
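For anyone wondering what "deploying across AZs or regions" looks like from the application side, here is a minimal Python sketch of a read path that falls back through replicas in preference order. It assumes the object has already been replicated to every location listed; the replica names and the `fetch` callables are hypothetical stand-ins for whatever client your stack actually uses, not an AWS API.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical fetcher: takes an object key, returns its bytes or raises OSError.
Fetcher = Callable[[str], bytes]


def read_with_failover(key: str, replicas: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, bytes]:
    """Try each replica in preference order; return (replica_name, data) from the first success."""
    errors: List[str] = []
    for name, fetch in replicas:
        try:
            return name, fetch(key)
        except OSError as exc:
            # Record the failure and fall through to the next replica.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))
```

Usage would be something like `read_with_failover(key, [("us-east-1", primary_fetch), ("eu-west-1", secondary_fetch)])`. Whether the second entry is another AZ or another region is exactly the cost and latency trade-off described above.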

    2. FIA Silver badge

      Re: But....

      Isn't the selling point of all this cloudy stuff that it does not go down???????

      No.

      It's that 'IT stuff' has become a utility, as in you only pay for what you use.

      This means you can build highly resilient and/or scalable systems without huge upfront costs.

      Doesn't mean people do though. ;)

      1. Anonymous Coward
        Anonymous Coward

        Re: But....

        Fact is any business running anything critical to the business on other people's servers better have a contract guaranteeing they get back more than the down time costs (goodwill for example ain't cheap) or the people responsible are simply shirking their fiduciary duty to the company.

        1. Adam 52 Silver badge

          Re: But....

          "better have a contract guaranteeing they get back more than the down time costs"

          Why? If the downtime is less than you'd get elsewhere, or if the savings are more than the cost or if the faster time to market means you make massively more than the cost then you're still up.

        2. FIA Silver badge

          Re: But....

          Fact is any business running anything critical to the business on other people's servers better have a contract guaranteeing they get back more than the down time costs (goodwill for example ain't cheap) or the people responsible are simply shirking their fiduciary duty to the company.

          No, that's exactly the opposite of what you should be doing, you're looking to apportion blame after the fact. This is little use if your business has gone bust due to the downtime. Better to design systems that minimise the risk of this happening in the first place.

          Using the cloud allows you to build complex systems with little upfront cost.

          That's it.

          This does mean that smaller companies can build an infrastructure that's distributed and resilient in a way that wasn't financially feasible 10-15 years ago; and larger companies can potentially significantly reduce their DR expenditure.

          It doesn't mean it'll never fail or require administration or backup or all the other things you should be doing with an IT infrastructure. It just means you don't spend a boatload upfront on kit.

          1. Anonymous Coward
            Anonymous Coward

            Re: But....

            >It just means you don't spend a boatload upfront on kit.

            And you generally have less say in how things are set up and run. Which is fine for some, I guess, but I personally wouldn't work for a company where I was responsible for production mission-critical software running on systems not owned by my company, contract or not. The edge in building a lifetime of skills is getting a say, directly and indirectly, on such matters.

            1. Anonymous Coward
              Anonymous Coward

              Re: But....

              That said, the cloud has its purposes. Definitely a cost saver for non-mission-critical, non-proprietary stuff. Still, when internal manufacturing is your core mission, the cloud is more a distraction for the bean counters than something to look forward to.

            2. Anonymous Coward
              Anonymous Coward

              Re: But....

              "It just means you don't spend a boatload upfront on kit."

              That is understating it.

              One of the huge advantages of public cloud is that you pay for actual utilization vs scaling to peak. That is huge. It would be worth using public cloud just for that benefit. As anyone who has ever sized on-prem infrastructure knows, you scale to peak (meaning that you are paying for infrastructure every day as though it is the busiest day in the history of the company, even though most days are not the busiest day in the history of the company) and then you add 20% to the sizing because no one can be certain that the peak will not increase at some point and you cannot just elastically add scale. That equals many, many billions of dollars every year in infrastructure which is purchased and never or very rarely used.
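To put rough numbers on the scale-to-peak point, here is a back-of-envelope sketch in Python. It assumes, purely for illustration, that a unit of capacity costs the same per hour whether you own it or rent it (in practice cloud unit prices are usually higher, which eats into the saving), and it uses the 20% headroom from the comment above. The function name and demand figures are hypothetical.

```python
from typing import Sequence, Tuple


def provisioning_comparison(
    hourly_demand: Sequence[float], unit_cost_per_hour: float, headroom: float = 0.20
) -> Tuple[float, float, float]:
    """Compare sizing for peak plus headroom against paying only for capacity actually used.

    Returns (scale_to_peak_cost, pay_per_use_cost, utilization_of_provisioned_capacity).
    Assumes identical unit pricing for both models, which is a simplification.
    """
    peak = max(hourly_demand)
    provisioned = peak * (1 + headroom)           # capacity bought up front
    hours = len(hourly_demand)
    used = sum(hourly_demand)                     # capacity-hours actually consumed
    scale_to_peak = provisioned * hours * unit_cost_per_hour
    pay_per_use = used * unit_cost_per_hour
    utilization = used / (provisioned * hours)
    return scale_to_peak, pay_per_use, utilization


# One busy hour a day at 100 units, 10 units the rest of the time.
demand = [10.0] * 23 + [100.0]
print(provisioning_comparison(demand, unit_cost_per_hour=1.0))
```

With that made-up demand profile you pay for roughly 2880 capacity-hours to cover a day that only uses 330, i.e. about 11% utilization of what was bought.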

          2. Doctor Syntax Silver badge

            Re: But....

            "It just means you don't spend a boatload upfront on kit."

            It also means your interests aren't necessarily at the front of the queue when it comes to recovering from this sort of (not) outage.

      2. Doctor Syntax Silver badge

        Re: But....

        "Doesn't mean people do though."

        Maybe because it's been sold as cheaper than running your own data centre.

        When IT try to persuade the business to make provision for this sort of thing it's probably dismissed as IT being profligate again or even IT trying to bump up costs so their own service is still competitive.

    3. macjules

      Re: But....

      Guys, EVERYTHING goes down on you at sometime or another.

      1. Anonymous Coward
        Anonymous Coward

        Re: But....

        >Guys, EVERYTHING goes down on you at sometime or another.

        Of course, but when you have a good working personal relationship with gentlemen as professional as yourself, with badge numbers only slightly different from your own, it causes a lot less panic. It's also much easier to contact exactly the right people at exactly the right time and get answers you can count on and the service you need, without, as others say, having to worry about whether someone is putting your company's interests first. If this is not the case at your company, you should start thinking about finding a new company.

        1. Anonymous Coward
          Anonymous Coward

          Re: But....

          >Guys, EVERYTHING goes down on you at sometime or another.

          The network goes down and occasionally hardware goes down, but fun fact: even after years of supporting it, I have never seen an HP-UX OS crash due to software. Of course, thanks to Red Hat and cheap commodity hardware rising (and not giving 2 shits about POSIX), and HP squeezing its last few customers, I probably, sadly, see more Linux kernel panics in my future. Sigh.

        2. macjules

          Re: But....

          Two words: Tier Caching. All our US sites use S3 in W.VA but not one was affected.

          1. Dan 10

            Re: But....

            @macjules

            Caching in the Cloudfront sense, or within S3 itself?

    4. TheVogon

      Re: But....

      No - that's never been the claim of cloud. They specifically tell you it's not 100% guaranteed. That's why anything that matters should be designed not to rely on a single cloud region....

    5. Anonymous Coward
      Anonymous Coward

      Re: But....

      Yes, exactly. All our deployment and storage services are dependent on S3 or S3-backed apps and were all critically impacted, but you wouldn't have noticed because our cloud based infrastructure was spread over many zones with enough resources (and cache) to weather the storm. We're a Fortune 500 company managing many hundreds of web services.

      1. Anonymous Coward
        Anonymous Coward

        Re: But....

        "our cloud based infrastructure was spread over many zones with enough resources (and cache) to weather the storm."

        Righto.

        Cache doesn't have everything in it though, so what happens when something uncached is required from somewhere else?

        Works, but slowly?

        Total failure of that request and anything related thereto?

        "High error rate"?

        Interested readers want to know.
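One plausible answer to those questions, sketched in Python: a read-through cache that serves hits from memory, fetches misses from the origin, and falls back to a stale copy only when it already holds one. The class, the TTL and the `origin` callable are hypothetical; the AC above doesn't say how their cache is actually built. What the sketch makes explicit is the failure mode asked about: an object that was never cached simply fails while the origin is down.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Minimal read-through cache that serves expired entries when the origin is unavailable."""

    def __init__(
        self,
        origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._origin = origin                       # hypothetical backing store, e.g. an S3 reader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (fetched_at, value)

    def get(self, key: str) -> Tuple[str, bytes]:
        """Return (status, value) where status is 'hit', 'miss' or 'stale'."""
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return "hit", cached[1]
        try:
            value = self._origin(key)
        except OSError:
            if cached is not None:
                # Origin is down but an expired copy exists: serve it stale.
                return "stale", cached[1]
            # Never cached and origin is down: the request fails outright.
            raise
        self._entries[key] = (now, value)
        return "miss", value
```

Serving stale data hides the outage from readers of popular objects; anything outside the cache still surfaces as an error, which is presumably where the "high error rate" wording comes from.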

    6. Jason Hindle

      Re: But....

      "Isn't the selling point of all this cloudy stuff that it does not go down???????"

      Not without multiple levels of geographic redundancy. It's hugely expensive for an event that might only happen once every few years. Those dumb pipes known as the carriers have it in spades*. The likes of Amazon and Google, not so much. I like carriers (from a technical perspective).

      * Even for voice mail, and no one uses that.

    7. Anonymous Coward
      Anonymous Coward

      Re: But....

      This is why a proper public cloud should be 100% automated. Not mostly automated like AWS.

  3. Anonymous Coward
    Anonymous Coward

    I'll punt these up in advance:

    "You can't trust the cloud"

    "It's the NSA installing a tap"

    "My data centre has been up for 30 years" (btw, so is Amazon's).

    Just to be smug, it took us 3 minutes from the first alert to switch from serving from US East and Ireland to Ireland and Frankfurt.
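The AC doesn't say how their three-minute switch was done. Here is a minimal sketch of one common approach, assuming some external health check has already marked each region up or down: serve from the first two healthy regions in a preference list. The function name and the health map are hypothetical; the region codes are the AWS names for N. Virginia (us-east-1), Ireland (eu-west-1) and Frankfurt (eu-central-1).

```python
from typing import List, Mapping, Sequence


def choose_serving_regions(
    preference: Sequence[str], healthy: Mapping[str, bool], count: int = 2
) -> List[str]:
    """Return up to `count` healthy regions, in preference order.

    Regions missing from `healthy` are treated as down. Raises if nothing is healthy.
    """
    chosen = [region for region in preference if healthy.get(region, False)][:count]
    if not chosen:
        raise RuntimeError("no healthy regions available")
    return chosen


PREFERENCE = ["us-east-1", "eu-west-1", "eu-central-1"]
print(choose_serving_regions(PREFERENCE, {"us-east-1": False, "eu-west-1": True, "eu-central-1": True}))
```

In a real setup the returned list would feed DNS weights or a load balancer; that wiring is deliberately left out here.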

    1. Valarian

      "Just to be smug, it took us 3 minutes from the first alert to switch from serving from US East and Ireland to Ireland and Frankfurt."

      This, times a thousand. Any website or service pinning itself to a single node of a by-design distributed storage facility deserves whatever arse-kicking their customers choose to administer. The cloud, as is so often the case, is not the problem here - it's how it's being (mis)used that is the cause of any woes.

      1. Anonymous Coward
        Anonymous Coward

        To be fair, S3 is supposed to be multi-AZ and resilient within a region, but as we saw with the last us-east outage and the recent London PoP outage, tropical storms and power failures are no respecters of architectural diagrams.

      2. Mage Silver badge
        Mushroom

        Cloud selling and Pricing

        Yes, the "Cloud" is the problem. The way it's hyped, priced and marketed encourages beancounters to outsource to it.

        Almost Zero regulation.

        No 3rd party audit or oversight

        No transparency on backup, resilience, security or privacy. Just vendor hype.

        There are things that are appropriate for the "Cloud". However, increasingly, due to the Cloud vendors' marketing, it is being used for inappropriate applications.

        1. Lusty

          Re: Cloud selling and Pricing

          No third party audit? Have you ever tried reading? AWS and Azure are probably the most audited data centres on the planet!

          1. Anonymous Coward
            Anonymous Coward

            Re: Cloud selling and Pricing

            Just shows you how useless audits are. All the audit is "do you do dumb things"? Nope. Okay you pass. I'm sure those accounting folk who do the audits like getting paid the big chunk of money my company pays them to say, yep, they say they do this.

            1. jMcPhee

              Re: Cloud selling and Pricing

              You left out some key steps the auditors follow:

              1) Pay us

              2) Show us you don't do dumb things

              3) Here are some pissant concerns/findings so we can say we did something. Oh, and here are some meaningless pain-in-the-ass findings to address because they are one auditor's special area of expertise - you should make his book mandatory reading.

              4) Your own in-house staff know about the real problems. But, "A prophet is not without honor except in his own country, among his own relatives, and in his own house."

              5) Set up the next audit. Don't forget about (1)

              1. Anonymous Coward
                Anonymous Coward

                Re: Cloud selling and Pricing (@jMcPhee)

                I've worked at a place where the internal risk reviews, done by an employee of a different department in the same company, were exactly like that.

                Real serious issues were not allowed to be raised. By order of the management, the only issues that were allowed to be mentioned were the ones that could be acceptably mitigated at no cost.

                So something like only having one developer who knew anything serious about the company's internally developed customer-specific architecture-specific version of gcc, one not used (let alone maintained) anywhere else in the world, wasn't considered a recordable risk by the auditor.

                Then one year the developer in question went on holiday and didn't come back. Never seen again.

                Still, it mustn't have been a problem, because it wasn't recorded as a risk.

        2. Anonymous Coward
          Anonymous Coward

          Re: Cloud selling and Pricing

          "Almost Zero regulation"

          Almost? Care to list any?

          I'd like to see the actual energy bill. Not a percentage estimate of what you save, but a percentage estimate of what Amazon does NOT save. Where's that at, in an NSA vault perhaps?

          "...most audited data centres on the planet!"

          Audited for what? Do you actually know, honestly know? Do you believe everything you read? Read this: the USA doesn't spy on its citizens.

          1. Lusty

            Re: Cloud selling and Pricing

            "Audited for what? Do you actually know, honestly know?"

            Yes. I and everyone else who bothered to look do know. It's quite well covered actually, and has to be to allow architects to do our work properly.

            Azure details are in the trust centre.

            https://azure.microsoft.com/en-gb/support/trust-center/

            AWS is in their compliance and assurance pages

            https://aws.amazon.com/compliance/

            1. TheVogon

              Re: Cloud selling and Pricing

              "Audited for what? Do you actually know, honestly know"

              There are 2 main types of data centre audit - security and environmental.

              Usually a security audit would be a one-off and would certify the facility to a specific standard, or just generally that it was secure by design and process with no significant security risks.

              An environmental audit should be conducted yearly on any critical datacentres, MERs, SERs, etc. Usually after your annual deep clean... This will give you an extensive report on everything from aircon, UPS and fire alarms to the type and size of the particles in the air! If you have any of the above facilities and aren't doing this, you should be. Two companies that can help are Bureau Veritas and Aquacair...

  4. Anonymous Coward
    Anonymous Coward

    I guess this guy finally broke AWS...

    https://www.reddit.com/r/DataHoarder/comments/5s7q04/i_hit_a_bit_of_a_milestone_today/

  5. Anonymous Coward
    Anonymous Coward

    It's not "high error rates", it's total failure to accept connections!

    $ telnet s3.amazonaws.com 443

    Trying 54.231.82.140...

    ^C

    $ telnet s3-external-1.amazonaws.com 443

    Trying 54.231.33.168...

    ^C

    These are the endpoints listed at http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
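What `telnet host 443` actually tests is only whether a TCP connection completes. The same check in Python, as a minimal sketch (the function name and the five-second default timeout are illustrative choices, not anything from the AWS documentation):

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake with host:port completes within `timeout` seconds.

    Roughly what `telnet host 443` checks: no TLS, no HTTP, just the TCP connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures
        # (socket.timeout and socket.gaierror are both OSError subclasses).
        return False
```

`tcp_reachable("s3.amazonaws.com", 443)` returning False after the full timeout corresponds to the hanging "Trying ..." lines above. The sketch reports an immediate refusal the same way; telling the two apart would mean catching `ConnectionRefusedError` and `socket.timeout` separately.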

    1. Lusty

      An advanced cloud storage service fails to accept telnet connections. Shocker. Telnet and ping are not reliable test tools. I'd expect these services to drop such fake connections as security risks.

      1. Sandtitz Silver badge
        Facepalm

        @Lusty

        "An advanced cloud storage service fails to accept telnet connections. Shocker. Telnet and ping are not reliable test tools. I'd expect these services to drop such fake connections as security risks."

        How is telnet to port 443 a 'fake connection and a security risk'?

        How can you drop telnet connections to port 443 but allow legitimate SSL traffic to the same port?

        1. Lusty

          Re: @Lusty

          "How is telnet to port 443 a 'fake connection and a security risk'?"

          The lack of any legitimate data would flag it up as a security risk. Using Telnet without encryption to connect to a TLS service is a dead giveaway that it's not legit since Telnet doesn't set up the TLS before the connection.

          If you lot think ping is a good way to test a network then you need to get out more. For ping to work, it needs the service accessible and running on the endpoint you're testing and requires that nothing drops the traffic in between. It's quite a common thing and might confirm a connection is up, but lack of a ping response tells you nothing about whether that connection is down, certainly not a non-ping service on that same endpoint.

          1. Alister

            Re: @Lusty

            @Lusty,

            You put:

            The lack of any legitimate data would flag it up as a security risk. Using Telnet without encryption to connect to a TLS service is a dead giveaway that it's not legit since Telnet doesn't set up the TLS before the connection.

            And just how do you imagine a TLS session starts? If you are using telnet to prove or disprove connectivity exists to a host, then the initial connection attempt is all you need, and that is the same for any TCP connection, whether it be a TLS negotiation or any other protocol.

            I agree with you about ping, most secured environments block ICMP traffic nowadays, however, it and traceroute are still useful for investigating latency and routing so long as you temporarily enable it on the endpoint.
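To illustrate the ordering being argued about: a TLS client first completes an ordinary TCP handshake and only then sends its ClientHello over that connection. A minimal Python sketch that reports which stage fails (the function name and return strings are hypothetical):

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Report how far a connection gets: 'tcp_failed', 'tls_failed' or 'ok'."""
    # Stage 1: plain TCP handshake, the only thing telnet or nc exercises.
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"
    # Stage 2: TLS handshake over the already-established TCP connection.
    context = ssl.create_default_context()
    try:
        with context.wrap_socket(raw, server_hostname=host):
            return "ok"
    except OSError:
        # ssl.SSLError, certificate errors and handshake timeouts are all OSError subclasses.
        return "tls_failed"
    finally:
        raw.close()  # harmless if wrap_socket has already taken over the socket
```

A telnet or `nc` test only covers stage 1. A "tls_failed" result means the TCP connection was accepted but the TLS handshake did not complete, whatever the reason.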

            1. Lusty

              Re: @Lusty

              TLS works at the transport layer, clue is in the name. The security device sitting between the AWS/Azure host and the network would likely terminate any connections which are not actually setting up a secure transport as part of that connection. In case you missed it, both services have installed custom silicon on the network side of the NIC for exactly this purpose.

              Telnet doesn't expose the transport layer, and so if this were terminated it would indeed show as no connectivity when the service is up for legitimate traffic.

              I've not tested whether these services work with a Telnet test - my point was that just like ICMP, it proves nothing about the service itself.

      2. Anonymous Coward
        Anonymous Coward

        Umm, do you know the basics of networking? Even if Amazon had the most amazing WAF that specifically looked for telnet vs. curl or code, they'd have to let them connect first on the standard port to start talking. Until a program starts talking specific protocols and going, the WAF is going to have to let them start.

        Telnet (or nc, or anything else in the world that can make a TCP connection) operates the same way at the most basic level: connecting out to a remote server on a specific port.

      3. disgruntled yank

        So let's not use telnet:

        $ curl https://s3-external-1.amazonaws.com

        ^C

        [me@mine ~]$ curl https://s3.amazonaws.com

        ^C

      4. Alister

        @Lusty

        I think you just blew any credibility you had to comment on networking subjects.

        1. Lusty

          @Alister, see other response regarding TLS and Telnet. Right back at you.

