$310m AWS S3-izure: Why everyone put their eggs in one region

With Amazon now recovered from a four-hour outage that brought a large portion of the internet to a grinding halt, analysts are looking back to see what lessons companies can learn from the ordeal. The system breakdown – or as AWS put it, "increased error rates" – knocked out a single region of the AWS S3 storage service on …

  1. AceRimmer

    Softlayer vs AWS

    "The issue with public cloud providers – particularly aggressively priced ones like Amazon – is that your data goes to the cheapest place. It is one of the tradeoffs you make when you go to Amazon versus an IBM Softlayer," Enderle said.

    http://www.networkworld.com/article/3020235/cloud-computing/and-the-cloud-provider-with-the-best-uptime-in-2015-is.html

    Softlayer is more expensive, has fewer features and is no more reliable than AWS.

    1. Anonymous Coward

      Re: Softlayer vs AWS

      You weren't seriously expecting meaningful and accurate analysis from Enderle were you? He's well-known as an "analyst"-for-hire. Shame on El Reg for quoting him.

  2. A Non e-mouse Silver badge
    Facepalm

    The takeaway, say the industry analysts, is that companies should consider building redundancy into their cloud instances just as they would for on-premises systems

    And this is new how...?

    1. gv

      "This could come in the form of setting up virtual machines in multiple regions or sticking with the hybrid approach of keeping both cloud and on-premises systems"

      Beancounters take note.

      1. billse10

        "Beancounters take note"

        just love your optimism!

    2. Anonymous Coward

      I thought that was the whole point of the cloud? That it automatically took care of redundancy for you...

      And here's me, stuck with a hot standby in a satellite building, connected over 10 Gbps.

  3. Justin Clift

    Self hosted S3

    There are decent S3 server implementations around, so it's definitely possible to self host S3.

    We use Minio, which is working well and looks to have a bright future.

  4. John Smith 19 Gold badge
    Holmes

    "companies should consider building redundancy into their cloud instances "

    Interesting about the pricing.

    Especially if they are like Adobe and Salesforce, who make it difficult (impossible?) to run locally.

    In fact it's not a cloud. It's a set of islands of processing and storage with different prices to ship to and from them, and between them.

    I had thought it was one price regardless of where Amazon puts your stuff. In fact I thought they load balanced, migrating processing as needed.

    Y'know, the whole "It's in the cloud, it doesn't matter where it runs" b***cks.

    Obviously I was wrong.

    This also looks like the AT&T phone system crash, when one (voice) exchange crashed and the bug in the failover software propagated the crash to other exchanges.

    1. Adam 52 Silver badge

      Re: "companies should consider building redundancy into their cloud instances "

      A cursory glance at the documentation would have educated you. I don't see how you making incorrect assumptions because you haven't done the most basic research is AWS's fault.

      1. John Robson Silver badge

        Re: "companies should consider building redundancy into their cloud instances "

        "A cursory glance at the documentation would have educated you. I don't see how you making incorrect assumptions because you haven't done the most basic research is AWS's fault."

        Why would I do that research - I don't run anything there.

        I kind of expect large cloud providers to do sane things with customer data...

        1. big_D Silver badge

          Re: "companies should consider building redundancy into their cloud instances "

          Exactly, the marketing for moving to the cloud has always been: "your data isn't in one place and if one server/location fails, you just keep on working."

          Never having trusted the cloud, I haven't any real experience of using it - other than the likes of GDrive/OneDrive for private use.

        2. tiggity Silver badge

          Re: "companies should consider building redundancy into their cloud instances "

          Sane things would include some forms of regionality (I'm talking generically here, not specific Amazon regions BTW).

          e.g.

          1. Data Protection rules may mean you can only host your data in certain countries and so a "stored anywhere" cloud may make you liable to fines

          2. "Your country" has various trade embargoes with other countries, and the "no business with X" rules would extend to cloudy storage there.

          3. Region close to you / most of your users should be more responsive as data transfer speed is not infinitely fast and so distance matters (how much it matters depends on amount of data being pumped & expected response time for user with that amount of data)

          4+ Umpteen other reasons why regionality is needed

          BUT..

          If a customer does not need regional lockin (to whatever size of "region") then I agree it would be nice if cloud providers made an "easy" cloud solution that just worked, with no hidden costs of moving data across "regions", but then they would lose a chance to make lots of lovely cash, so don't expect that anytime soon.

          1. Adam 52 Silver badge

            Re: "companies should consider building redundancy into their cloud instances "

            " an "easy" cloud solution that just worked"

            The problem with that is that as soon as you do multi-region products your regions are no longer independent. So software/config issues - like the DynamoDB one 18 months ago and most likely whatever Tuesday's was - will take out more than one region.

            Tuesday's event doesn't appear to have been environmental, so it would have affected multiple regions had they been linked.

            My suspicion is that this is why S3 cross-region replication is so hard (it does exist, btw).

            1. Anonymous Coward

              Re: "companies should consider building redundancy into their cloud instances "

              "...will take out more than one region."

              I don't think it would, but all other regions would be dragged down surely. Regardless of either situation, the only feasible fix would be to have a 1:1 backup region configured nearly entirely different on the back end. Maybe not, I really don't know. But obviously Amazon alone can't take care of all your needs, so that dilemma is finally over.

      2. Doctor Syntax Silver badge

        Re: "companies should consider building redundancy into their cloud instances "

        "A cursory glance at the documentation would have educated you."

        Does the marketing provide the same education? The decision makers all too often are those who not only wouldn't give the documentation even a glance but wouldn't understand it even after prolonged study. Those who get to read the documentation are those who have to read it to implement the decisions made above their heads.

        Cheaper than running your own data centre. Simple. We take all that hard stuff off your hands. That's what sways decision makers.

        1. localzuk Silver badge

          Re: "companies should consider building redundancy into their cloud instances "

          In terms of redundancy and resilience, hosting in a single region at AWS is no different to having your own single set of servers in your own data-center. The advantage is the scalability - need to triple your capacity? Click a button and it's done. Not so easy with your own servers.

          Resilience-wise, you need to design your applications to use redundancy really.

  5. Adam 52 Silver badge

    us-east-1 is Amazon's least reliable region. It's not in a great location from a weather perspective. It's also where AWS roll out their new versions first, which leads to problems like the DynamoDB secondary indexes issue that took out us-east-1 in September 2015.

    If you are going to rely on one region, don't make it us-east-1.

  6. Doogie Howser MD

    A third option

    As well as having hybrid deployments and multi-region designs, you could even go a step further and keep some stuff in AWS and some stuff in say Azure. I know it's more management and more cost, but the skills are largely transferable and the bottom line is, how much cash and reputation are you losing as a business when your cloud based app/whatever is off line?

    1. AMBxx Silver badge

      Re: A third option

      Would be interesting to have an article comparing how AWS & Azure would need to be configured. I use Azure for a lot of storage of installation files. Availability isn't a major concern as never urgent, but when setting up the storage you specify the level of redundancy and it's priced accordingly.

      That ranges from just local redundancy through geo-redundancy. Somewhere in the middle, you can have read-only redundancy. Seems like a good setup, but would be interesting to see what Amazon offer in comparison.

      1. Jim 43

        Re: A third option

        I like the Azure storage redundancy options as well -- It's one of the places that I think they beat AWS. With Amazon S3 you only have two redundancy choices out of the box: Standard and Reduced Redundancy.

        Standard claims 99.999999999% data durability and can sustain the concurrent loss of data in 2 facilities (in the same region).

        Reduced Redundancy claims 99.99% data durability and can sustain the loss of data in a single facility.

        If you want your data available in multiple regions then it's up to you to write that data to multiple places (buckets or containers). What's more, a lot of people get confused about this as the top of the console page says "Global" where you would normally select a region (I can forgive casual users for thinking that their data is stored in multiple regions by default). If you actually read the docs (or create a storage bucket) then you should understand that each storage bucket resides in a single region.
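        A minimal sketch of what "write that data to multiple places" could look like. This is not the AWS SDK: `RegionalBucket`, `put_everywhere` and `get_anywhere` are hypothetical names, and the buckets are in-memory stand-ins so the failover logic can be read in isolation.

```python
from dataclasses import dataclass, field


@dataclass
class RegionalBucket:
    """Hypothetical stand-in for a bucket that lives in exactly one region."""
    region: str
    available: bool = True
    objects: dict = field(default_factory=dict)

    def put(self, key, data):
        if not self.available:
            raise ConnectionError(f"{self.region} is unavailable")
        self.objects[key] = data

    def get(self, key):
        if not self.available:
            raise ConnectionError(f"{self.region} is unavailable")
        return self.objects[key]


def put_everywhere(buckets, key, data):
    """Write to every bucket; return the regions that accepted the write."""
    written = []
    for bucket in buckets:
        try:
            bucket.put(key, data)
            written.append(bucket.region)
        except ConnectionError:
            # A real system would queue a retry here; this sketch just skips.
            continue
    return written


def get_anywhere(buckets, key):
    """Read from the first bucket that is reachable and holds the key."""
    for bucket in buckets:
        try:
            return bucket.get(key)
        except (ConnectionError, KeyError):
            continue
    raise LookupError(key)


east = RegionalBucket("us-east-1")
west = RegionalBucket("us-west-2")
put_everywhere([east, west], "report.csv", b"payload")
east.available = False  # simulate the regional outage
print(get_anywhere([east, west], "report.csv"))
```

        The point of the sketch is only that the application, not the storage service, decides where the second copy goes and where reads fall back to.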

  7. brotherelf
    Trollface

    Ah yes, the usual pointy-haired problem: the service was too mission-critical to test if failover/restore/… would actually work, because "the competition promises nineteen nines". (For extra bonus PHB points, then stop funding backup/failover/… because "untested procedures do not actually increase availability".)

  8. Dan 10

    Re: "companies should consider building redundancy into their cloud instances"

    I think the author misses the point slightly, and states the bleeding obvious. Of course, any implementation with any value should be resilient, but each AWS region includes multiple availability zones, each containing multiple datacentres. Any deployment with resiliency across those *should* be resilient, period. Amazon make the point that replication within a region delivers HA/DR, is fast and free. Replication between regions adds complexity, is slower (as it's over the public internet) and costs, if only because one of the core tenets of cloud is paying for data egress out of the source.

    To put it another way, how many of you have your on-prem DCs spanning different regions? Off the top of my head, I would have to go back 7 jobs to find a place that did, and most of those are big enterprise shops.

    I think one possible takeaway is that Amazon's position that you don't need to deploy into multiple regions is now called into question. If I worked there, I'd be pushing for a new service in the form of direct connectivity (not via internet) between regions, with a lower price point for data transfers. AWS do offer this kind of connectivity to customer sites, but presumably anything between regions would be a fat pipe not specific to any single customer.

    Alternatively, perhaps the fact that so many orgs tried to fail over all at once is key, in which case maybe AWS needs to review its provisioning/overcommit policies.

  9. Jc (the real one)

    'Pouring one hundred gallons of water through a one gallon hose'

    If only commentators could use analogies that make sense. How long is a gallon? What diameter? ;)

    Jc

    1. Anonymous Coward

      "How long is a gallon?"

      8 pints long obviously ! Diameter depends on dimple or straight

    2. eldakka

      new reg unit?

  10. Doctor Syntax Silver badge

    Are we holding it wrong?

    Not the Cloud, not Amazon but the internet as a whole.

    Looking back at the origins of the internet it was preceded by separate networks which had a weird and wonderful spread of technologies.* The internet came along with its own protocols to connect these individual networks together but the original emphasis was on those individual networks. Local came first and the internet only provided what wasn't local.

    Now that we're almost all using the same internet protocols to run our LANs the distinction between local and remote has been blurred. We've forgotten to put local first but we're relying on an internet designed under the assumption that we would.

    Even if we go along with the old Sun slogan that the network is the computer we shouldn't extend that to believing that the internet is the computer. This was a reminder of that.

    *I remember one which consisted of little terminal boxes each with a single RS-232 connection daisy-chained together with coax. TV aerial coax with TV aerial connectors.

  11. PghMike

    inter-region transfer costs aren't free

    You pay about 5-10 cents/GB for transfers between regions. You even pay 1 or 2 cents per GB for transfers out of one availability zone into another in the same region. So redundancy isn't free.

    That's probably another reason it doesn't get done as often. But I suspect the biggest issue is the complexity, and dealing with additional latency when going between regions.
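    As a rough worked example of those rates, assuming a hypothetical workload that replicates 500 GB of changed data per day and a flat per-GB price (real pricing varies by region pair and volume tier):

```python
# Rates in US cents per GB, taken from the figures quoted above.
INTER_REGION_CENTS = (5, 10)
INTER_AZ_CENTS = (1, 2)


def transfer_cost_usd(gigabytes, cents_per_gb):
    """Cost in dollars of moving `gigabytes` at a flat per-GB rate."""
    return gigabytes * cents_per_gb / 100


def monthly_range_usd(gb_per_day, cents_range, days=30):
    """Low and high monthly cost for a steady daily replication volume."""
    low, high = cents_range
    monthly_gb = gb_per_day * days
    return (transfer_cost_usd(monthly_gb, low),
            transfer_cost_usd(monthly_gb, high))


# Hypothetical workload: 500 GB replicated per day (an assumption).
print(monthly_range_usd(500, INTER_REGION_CENTS))  # (750.0, 1500.0)
print(monthly_range_usd(500, INTER_AZ_CENTS))      # (150.0, 300.0)
```

    On those assumed numbers, cross-region redundancy costs roughly five times what cross-zone redundancy does, before any extra storage is counted.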

  12. GrapeBunch

    Pyrus vs Citrus trees

    If cloud costs .02 / GB, then the home user of the silver standard of storage devices (the external 2.5" USB3 magnetic drive) amortizes his investment in a few (say 3) months. It's a pears and oranges comparison, of course, but it does give a flavour for why (some) cloud storage companies keep losing money just to stay in the game. They have to be competitive both with the big guys, and with on-site storage.

    Some tech services are naturals for offshoring: coding, support, telemarketing, manufacturing etc. I wonder to what extent that applies to cloud storage. For example, keep your data in Poughkeepsie, but your redundancy in Poona?

  13. Ken Moorhouse Silver badge

    Inter-Regional Transfers...

    Need careful design.

    Does data get presented to the user only once it has replicated successfully, or could you be looking at stale data?

    If data is consistently presented, you then have latency to consider, as an earlier commentator I think implied. You have no control over how long an inter-regional transfer will take.

    If an outage occurs, what happens when one of your regions is internally (i.e., via the Cloud infrastructure) unable to replicate its data - does that cause the replication to fail, with the result that the whole lot falls over like a line of dominoes?

    Cloud is not a Plug & Play solution, it requires a lot of Difficult Questions to be answered. Are Cloud Providers giving you the correct answers to those questions?

  14. Jon 37

    If you're not testing it, it's probably broken...

    There's an open-source program, Chaos Gorilla, that will randomly kill an AWS availability zone for you (during normal business hours). This lets you discover whether or not your failover works at a time when it's easier and more convenient to fix it if you discover your failover doesn't work. (Your staff are all in the office, and the availability zone isn't really down so you can always just disable Chaos Gorilla).

    It's originally from Netflix:

    http://techblog.netflix.com/2011/07/netflix-simian-army.html
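    The idea can be sketched in a few lines. This is not Chaos Gorilla's actual code or API; `Zone`, `serve` and `drill` are hypothetical names, and the "outage" is just a flag on an in-memory object.

```python
import random


class Zone:
    """Hypothetical availability zone with a simple up/down flag."""

    def __init__(self, name):
        self.name = name
        self.up = True


def serve(zones):
    """Return the name of the first healthy zone, or None if all are down."""
    for zone in zones:
        if zone.up:
            return zone.name
    return None


def drill(zones, rng):
    """Take one random zone down, check who serves traffic, then restore it."""
    victim = rng.choice(zones)
    victim.up = False
    try:
        return victim.name, serve(zones)
    finally:
        # The outage is simulated, so it can always be undone.
        victim.up = True


zones = [Zone("zone-a"), Zone("zone-b"), Zone("zone-c")]
victim, survivor = drill(zones, random.Random())
print(f"killed {victim}, traffic served by {survivor}")
```

    If `survivor` ever comes back as `None`, the failover is broken, and you found out on a weekday afternoon rather than during a real outage.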

  15. phands
    Facepalm

    You Quoted Enderle?

    Serious loss of credibility for El Reg when you quote the drooling fool who supported SCO's attempt to kill Linux.

  16. Herby

    Pricing???

    The pricing of various amounts of storage being different for differing regions is a bit silly. It is similar to stock exchanges that quote different prices for the same stock (make $$$ fast).

    The price of $0.025/GB/month turns into profit quite quickly compared to available disks these days. I can easily purchase a 3TB drive for around $90 locally (off the shelf), and if I price it on a per-GB basis it turns out to be about $0.03/GB. I can't believe that it costs that much to keep it spinning, so there is a BUNCH of profit in the pricing.

    It all comes down to: If you want to have people keep their eggs in different baskets, you better price the baskets the same, or you will overload one of the baskets. Simple logic.

    Then again, maybe Amazon wants a single point of failure, who knows.

    1. MatthewSt

      Re: Pricing???

      This argument seems to do the rounds quite often. On a small scale, yes, you can do it for that price. However, you're missing the redundancy element (you need three to six times as many disks as the data you want to store), and you haven't counted any set-up costs (datacentre, the rest of the server the disks run in) or running costs (power, cooling, connectivity to the rest of the world), both of which need to be multiplied by three to six as well. Drives last about 4 years (https://www.backblaze.com/blog/how-long-do-disk-drives-last/), so on average you can expect to be replacing 25% of your disks per year.

      It's the rest of the variables (power, staffing, space, taxes) which cause the difference in pricing between regions.

      There's a lot more than this, but that's the start of the difference between buying the storage outright and using a service.
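      To put rough numbers on the disk part alone, here is a back-of-envelope sketch using only figures quoted in this thread ($90 for a 3TB drive, three to six copies, a 4-year drive life, $0.025/GB/month for the service). Treating 3TB as 3,000 GB and spreading the purchase price evenly over 48 months are simplifying assumptions.

```python
DRIVE_PRICE_USD = 90.0        # off-the-shelf 3TB drive, as quoted
DRIVE_CAPACITY_GB = 3000      # assumption: 1 TB = 1000 GB
DRIVE_LIFE_MONTHS = 48        # about 4 years, as quoted
CLOUD_USD_PER_GB_MONTH = 0.025


def hardware_cost_per_gb_month(copies):
    """Disk purchase cost per stored GB per month, for `copies` replicas."""
    per_gb = DRIVE_PRICE_USD / DRIVE_CAPACITY_GB
    return per_gb * copies / DRIVE_LIFE_MONTHS


for copies in (1, 3, 6):
    cost = hardware_cost_per_gb_month(copies)
    share = cost / CLOUD_USD_PER_GB_MONTH
    print(f"{copies} copies: ${cost:.5f}/GB/month ({share:.0%} of the service price)")
```

      On these figures the disks themselves come to roughly $0.002 to $0.004 per GB per month with three to six copies. Everything else on the list (servers, datacentre, power, cooling, connectivity, staff) has to come out of the rest, and none of that is estimated here.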

      1. Ken Moorhouse Silver badge

        Re: so you need three to six times the number of disks (for redundancy)

        Yes, but I hope that there is no implication in that statement that these extra resources are in the same location.

        Cos if they are, it is a single point of failure (suppose the datacentre succumbs to an earthquake, hurricane, fire, flood, plague of locusts....).

      2. Anonymous Coward

        Re: Pricing???

        *Disks* are cheap. *Storage* is expensive.

        ... and often difficult to do well.

    2. eldakka

      Re: Pricing???

      The costs for the identical work or facilities could be different in 2 different data centers.

      For example, electricity pricing can be different in different states, let alone countries. So a data centre in location 1 could be paying $0.15/kWh, and in another locality it's $0.11/kWh. And that $0.04/kWh difference adds up when you are talking many MWh/day of power usage.

      Real estate purchasing/rent/land taxes could be different.

      Labour costs could be different.

      Transport/supply costs could be different (could cost X to install 1PB in data centre 1, but 1.2X in data centre 2).

      Now, whether the costs should be amortised across the entire global infrastructure is a question for the accountants.
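      A quick worked example of how the electricity gap adds up. The two tariffs are the ones quoted above; the 10 MWh/day consumption is purely an assumed figure for illustration, not a real data centre's usage.

```python
def daily_power_cost_usd(mwh_per_day, cents_per_kwh):
    """Daily electricity bill in dollars for a given load and tariff."""
    kwh_per_day = mwh_per_day * 1000
    return kwh_per_day * cents_per_kwh / 100


ASSUMED_MWH_PER_DAY = 10  # hypothetical load, for illustration only

site_1 = daily_power_cost_usd(ASSUMED_MWH_PER_DAY, 15)  # $0.15/kWh
site_2 = daily_power_cost_usd(ASSUMED_MWH_PER_DAY, 11)  # $0.11/kWh
gap_per_day = site_1 - site_2
gap_per_year = gap_per_day * 365

print(gap_per_day)   # 400.0
print(gap_per_year)  # 146000.0
```

      The gap scales linearly with consumption, so a site drawing ten times the assumed load would see ten times the difference.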

  17. Fazal Majid

    US-East is popular

    Because half the US population lives in the Eastern Time Zone.

    Amazon only recently (4 months ago) opened its US-East-2 region. Many people haven't heard about it yet (I hadn't until just now) and in any case it is based in Ohio, which is nowhere near as big a connectivity hub as Virginia.
