Google Spanner in the NewSQL works?

The commercial release by Google of its Spanner database as a public beta last month came as both a pleasant surprise and a wake-up call: perhaps the end of NoSQL databases is in sight. Spanner represents the pinnacle of the so-called NewSQL movement, which is a rejection of NoSQL values and a return to the old values of SQL, …

  1. Anonymous Coward

    SQLITE3

    Is awesome. Try it folks.

    1. Geoffrey W

      Re: SQLITE3

      It may be awesome, and I use it often, but an embedded local database engine is not really the same level of technology as a distributed database server. The Jet4 database engine is pretty good too, despite its age, and better at multi-user use than SQLite, or was last time I looked, but they are minnows compared to Spanner.

      1. Anonymous Coward

        Re: SQLITE3

        As far as I am aware there's no limit on concurrent reads, but there is a limit on concurrent writes (one at a time).

        I use SQLite for APIs (using the Slim framework) and it's blazing fast when loaded into RAM. Saves buckets of cash on disk IO costs when used this way, even for much larger databases, since RAM is usually cheaper than disk IO (especially on Azure).

        Obviously if you have terabytes of data this isn't practical, but most people don't.

        Biggest database I have loaded into RAM is 51GB and the API it serves handles 200k requests a day with ease.
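
        If you're curious what the RAM trick looks like, here's a minimal Python sketch (the file and table names are made up for the example): open the on-disk file read-only, copy it page by page into a :memory: connection, and serve every read from RAM.

        import sqlite3

        # Open the on-disk database read-only, then copy it page by page
        # into an in-memory connection; every later read is served from RAM.
        disk = sqlite3.connect("file:api.db?mode=ro", uri=True)
        mem = sqlite3.connect(":memory:")
        disk.backup(mem)  # built-in backup API, Python 3.7+
        disk.close()

        # Reads never touch the disk now.
        row = mem.execute("SELECT count(*) FROM users").fetchone()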

        Using a combination of write queuing and read caching, the concurrent write limit generally isn't an issue though.

        Nginx is very useful for caching which helps prevent problems occurring when the database is locked for writing.

        Using an AMQP solution helps with setting up a jury-rigged write queuing system.
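
        Without AMQP, the jury-rigged version is just one writer thread draining a queue, as in this Python sketch (the DDL and endpoint names are invented):

        import queue
        import sqlite3
        import threading

        write_q = queue.Queue()

        def writer(db_path):
            # One writer thread serialises all writes, so SQLite's
            # single-writer limit is never hit by concurrent API workers.
            db = sqlite3.connect(db_path)
            while True:
                sql, params = write_q.get()
                db.execute(sql, params)
                db.commit()
                write_q.task_done()

        threading.Thread(target=writer, args=("api.db",), daemon=True).start()

        # API handlers enqueue and return immediately.
        write_q.put(("INSERT INTO hits (path) VALUES (?)", ("/v1/users",)))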

        That said, I generally don't subscribe to the "more boxes is more scale" way of thinking in terms of databases.

        This particular setup is a hard sell for shops used to the Microsoft mentality that calls for more boxes (and therefore more licenses... hmm, part of the plan, MS?) to increase scale... but I see that as a waste of resources, since it's unlikely that you are actually maxing a box out before you throw another in. What you really have is a bottleneck, something that can usually be resolved with more (or faster) NICs and faster storage.

  2. zebthecat

    "Traditional relational database system do this by only having one master, the keeper of the truth and slaves which keep copies and may actually be wrong!"

    Unless you use a master/master circular replication setup, that is.

    1. Anonymous Coward

      This is not a "traditional" system. It's a distributed system and it's either not truly master-master (i.e. sharded) or it's eventually consistent. Or worse (and surprisingly common), both.

      1. Anonymous Coward

        If you use replication then reads are pretty much 100% available for an RDBMS. Writes can be too, with certain caveats. Google's system doesn't actually offer anything over that. The election process can still be a problem, as there's a pause, and what about data which had been stored by the now-dead system? It won't be available on the copies yet... It doesn't look like they've solved anything at all!

    2. cmaurand

      You still want to be careful of writes to both masters. You can get data conflicts very easily. You'll need some sort of arbiter.
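
      For illustration, the crudest arbiter is last-write-wins, something like this Python sketch (the field names are invented). Note it resolves the conflict by silently discarding one of the writes, which is exactly why you want to be careful:

      from dataclasses import dataclass

      @dataclass
      class Version:
          value: str
          ts: float     # wall-clock time of the write
          node_id: int  # tie-breaker between masters

      def arbitrate(a: Version, b: Version) -> Version:
          # Naive last-write-wins: pick the newer write, break ties by
          # node id. Deterministic, but clock skew between masters means
          # "newer" can be wrong, and the losing write just vanishes.
          return max(a, b, key=lambda v: (v.ts, v.node_id))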

  3. frank ly

    What time is it?

    "This is an implementation of synchronized clocks using GPS receivers and atomic clocks in every data center. This can cause problems during a partition if a node can’t connect to a master – its clock will drift, causing the election of Paxos masters to slow down."

    I'd have thought that a GPS/atomic clock would have enough accuracy and resolution that any data centre that was unable to connect would still have the same time as other data centres, for quite a while.

    1. Paul Crawford Silver badge

      Re: What time is it?

      Exactly, if you use NTP and lose the time server link you get drift, but if you have local stratum-1 servers (i.e. time-servers that get their time from an atomic clock either directly, or most commonly from GPS time-transfer) that simply should not happen.

      Still, all that using 'time' as a marker does is reduce the window of uncertainty in any split-decision issues; it's not like an atomic (computing sense) transaction counter or similar that could be used to eliminate it. After all, you will get some variation in packet delays from originator(s) to SQL-like server(s), so time is not an absolute marker for event order in this case, but if you know your worst-case error is only tens of microseconds then you can at least narrow the window of event/decision uncertainty to be resolved.
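
      That's essentially what Spanner's TrueTime does, per the published paper: the clock API returns an interval rather than a single timestamp, and a commit waits out the uncertainty. A rough Python sketch of the idea (the error bound here is an assumed figure for illustration, not Google's):

      import time
      from dataclasses import dataclass

      EPSILON = 0.007  # assumed worst-case clock error bound, in seconds

      @dataclass
      class TTInterval:
          earliest: float  # true time is guaranteed to be >= this
          latest: float    # ...and <= this

      def tt_now() -> TTInterval:
          t = time.time()
          return TTInterval(t - EPSILON, t + EPSILON)

      def commit_wait(commit_ts: float) -> None:
          # Hold the commit until the timestamp is guaranteed to be in
          # the past everywhere; this narrows the uncertainty window
          # rather than eliminating it, exactly as described above.
          while tt_now().earliest <= commit_ts:
              time.sleep(EPSILON / 10)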

      Also (back to another rant of mine) on Google's time-smoothing: that is a bad idea, and only needed or possibly justified if you use time_t / UTC as your system clock. How do you guarantee drift at stable rates? Keeping all system clocks on atomic time (e.g. GPS, or TDT) avoids the leap-second issues and allows reliable syncing to an atomic-disciplined local clock.

  4. Dr Who

    Is it just me

    or was most of this article complete gobbledegook? I kept having to reread sections which looked on the surface like they should have meaning, but even with rereading might just as well have been a load of blahs.

    1. JLV

      Re: Is it just me

      I think it's unrealistic to expect a one page article to get just a random person up to speed on complex subjects like these.

      Having said that, I found what was covered clearly and concisely explained and, as a mostly db-side dev, had no great trouble following the general gist of it.

      Maybe it's just me though ;-)

      p.s. I recognize the possibility that you might know more about the subject and found the treatment too simple. However, bear in mind that much more detail might have lost the rest of us.

  5. Anonymous Coward

    @zebthecat, typically you would still only write to one master; this ensures that written data is "always" consistent, even if there is a split brain.

    There is an alternative which is quite common though, where there is a quorum of hosts: data can be written to any host, but the transaction is not committed until the majority agree it has been successful.
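
    In Python pseudocode the quorum rule is just this (the replica client and its write call are stand-ins, not any particular product's API):

    import concurrent.futures

    def quorum_write(replicas, key, value):
        # Fan the write out to every replica; commit only once a
        # majority have acknowledged it.
        needed = len(replicas) // 2 + 1
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = [pool.submit(r.write, key, value) for r in replicas]
            acks = 0
            for f in concurrent.futures.as_completed(futures):
                try:
                    acks += bool(f.result())
                except Exception:
                    pass  # an unreachable replica simply doesn't count
        return acks >= needed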

  6. AMBxx Silver badge
    Windows

    It's google

    Just as people start to use it, they'll get bored and move onto something else.

    Either that or it will be 'shared' data.

  7. WibbleMe

    I want to save data and I want to save it now. I don't want to define a new row with int or VARCHAR or define how large the table can be. My time is money, not someone's OCD time.

    1. AMBxx Silver badge

      How's that new Accounts system coming on?

    2. Doctor Syntax Silver badge

      "I want to save data and I want to save it now. I dont want to define a new row with int or VARCHAR or define how large the table can be."

      Don't we all? But the rest of us want to be able to find it again and to have it consistent when we find it, especially if multiple people are saving related data. We also have multiple people wanting to access the same data.

      However there's always the much under-rated cardfile.exe which may do just what you want.

  8. Anonymous Coward

    Relational DBs never went away.

    At least not for businesses with sufficiently complex data. NoSQL DBs are fine for things like data warehousing, where they simply have to serve the same preformatted data over and over, or for storage of simple web data. But when any kind of data relationships come into your processing, then they really start to suck.

    With MongoDB, for example, you have to structure your documents to support your application, which is fine for application A. But if application B wants to query the data in a different way (e.g. app A's sub-document data is app B's main driver keys) then you either duplicate everything into a different document format or put up with sucky performance as Mongo does table scans trying to find the data. Or you can join on collections using their hideously inefficient hack and borderline incomprehensible JavaScript/JSON syntax.

    Think SQL can get complex? Check out some Mongo JavaScript trying to do aggregations (their term for grouping) along with joining. You end up with so many nested documents and clauses it looks like an explosion in a bracket factory that would make even a die-hard Lisp programmer weep with envy.
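
    To put numbers on the bracket factory, here's a modest join-plus-group written as a pymongo-style pipeline (collections and fields invented for the example), with the one-statement SQL equivalent underneath:

    # Hypothetical "orders" joined to "customers", grouped by region.
    pipeline = [
        {"$lookup": {
            "from": "customers",
            "localField": "customer_id",
            "foreignField": "_id",
            "as": "customer",
        }},
        {"$unwind": "$customer"},
        {"$group": {
            "_id": "$customer.region",
            "total": {"$sum": "$amount"},
        }},
        {"$sort": {"total": -1}},
    ]
    # db.orders.aggregate(pipeline)
    #
    # The SQL equivalent is one readable statement:
    #   SELECT c.region, SUM(o.amount) AS total
    #   FROM orders o JOIN customers c ON c.id = o.customer_id
    #   GROUP BY c.region
    #   ORDER BY total DESC;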

    There was a very good reason relational DB theory was created - because it mirrors real-world business activity, for which flat-file DBs had reached the end of the road in the 70s. This is a crucial point that the NoSQL kids who refused to learn from history completely ignored (or didn't even know about), happily re-inventing 1950s & 1960s wheels while thinking they were being cutting edge. Oh, but they had some cool buzzwords & marketing. Yay.

    1. AMBxx Silver badge
      Joke

      Re: Relational DBs never went away.

      You old fogey. Who cares if a transaction is durable anyway :D Consistency is for wimps.

      1. PNGuinn
        Joke

        Re: Relational DBs never went away.

        REAL coders would do it with EMACS.

    2. JLV

      Re: Relational DBs never went away.

      I agree with your points in general. SQL has some massive advantages when dealing with structured data.

      It's a lingua franca, rather than requiring you to learn a query language for each product.

      Also, while setting up the tables may not be super easy (and why should it be?), you can do pretty much anything you want with the system afterwards, as long as your query skills and the performance are up to it.

      However, even outside of web-by/scaling/distribution problems, SQL does have at least one massive Achilles' heel: self-referential data. Parts-of-parts, like modelling a car engine for example, is not a good fit for the standard relational model. Configuration data is another case - defaults, options, overrides - they make sense in JSON, less so in SQL.

      I suspect a good solution might involve co-opting graph databases on occasion. And I rather like the liberty that comes with sprinkling some JSON data into PostgreSQL columns, without having to shoehorn every problem into SQL-type ERDs.

      i.e. I don't think MongoDB and the like are so hot outside of some particular, albeit broad, domains, but I like the fact that we are experiencing so much innovation in alternative ways to organize data rather than being forced to use SQL for everything.

      1. Doctor Syntax Silver badge

        Re: Relational DBs never went away.

        "SQL does have at least one massive Achille's heel - self-referential data, like parts-of-part"

        Oh dear. I wish someone had told me that before I wrote one of those. Especially with the fun complication that one part at the bottom of the assembly could have a serial number and that that serial number would apply to each part above it.

        1. Anonymous Coward
          Anonymous Coward

          Re: Relational DBs never went away.

          Configuration bits and pieces have never been a problem in my experience. Bits made from other bits (bill of materials) can be, but nowadays there are a few standard things you can do to work round that in a logical way, at least in 99% of cases I've seen.
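
          The usual workaround is a recursive common table expression. A self-contained Python/SQLite sketch, with a made-up three-level assembly for data:

          import sqlite3

          db = sqlite3.connect(":memory:")
          db.executescript("""
              CREATE TABLE part (id INTEGER PRIMARY KEY,
                                 name TEXT, parent_id INTEGER);
              INSERT INTO part VALUES (1, 'engine', NULL),
                                      (2, 'piston', 1),
                                      (3, 'piston ring', 2);
          """)

          # Walk the assembly tree top-down with a recursive CTE --
          # the standard SQL answer to parts-of-parts.
          rows = db.execute("""
              WITH RECURSIVE bom(id, name, depth) AS (
                  SELECT id, name, 0 FROM part WHERE parent_id IS NULL
                  UNION ALL
                  SELECT p.id, p.name, bom.depth + 1
                  FROM part p JOIN bom ON p.parent_id = bom.id
              )
              SELECT name, depth FROM bom
          """)
          for name, depth in rows:
              print("  " * depth + name)  # engine / piston / piston ring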

  9. Steve Channell

    When Data is Immutable the problem is simpler

    The classic problem with distributed relational databases occurs when attempting to update the same row in two locations. Heterogeneous distributed systems have to deal with consistency at the application level when ATM cash withdrawals happen at the same time as e-commerce, e-banking, batch and branch transactions. It turns out that replicating immutable transactions avoids many of the needs for ACID updates.

    Spanner is a great example of what can be achieved when you have control over a dedicated network, but so are Teradata, Netezza, Exadata and SQL Server PDW.

  10. Ilsa Loving

    Am I missing something?

    So I don't understand now. The architecture of Spanner sounds interesting and all, but based on what I've read in this article, the benefits of Spanner are primarily achieved by having a very high-quality infrastructure under it.

    Spanner itself is a NoSQL database that incorporates some of the SQL philosophy, but it is still just as susceptible to the exact same problems as literally every other database out there.

    Did I miss something? What can Spanner do that every other distributed database on the planet can't do, given the same high-quality infrastructure?

    1. Anonymous Coward

      Re: Am I missing something?

      The other major differentiator is the use of the Raft algorithm for leader election. The article says Paxos, but it's really Raft. This gives it a scalability that conventionally architected RDBMSs currently can't achieve, and is why the architecture's been lifted for Apache Kudu. This makes it Stupidly Scalable and very performant in the single datacentre. It's Google's addition of "TrueTime" that allows them to scale across datacentres.
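
      For anyone who hasn't met Raft: leader election hinges on randomised timeouts, so one node usually stands for election before its peers and collects a majority of votes. A toy Python simulation of that single idea (the timeout and RPC figures are illustrative assumptions, not Raft's spec):

      import random

      NODES = 5
      RPC_MS = 20  # assumed round trip for a RequestVote message

      def elect_leader():
          """Toy simulation: retry with fresh random timeouts until one
          candidate stands far enough ahead to collect a majority."""
          term = 0
          while True:
              term += 1
              # Each follower picks a randomised election timeout.
              timers = sorted((random.uniform(150, 300), node)
                              for node in range(NODES))
              (first_t, candidate), (second_t, _) = timers[0], timers[1]
              # If the fastest candidate's vote requests arrive before the
              # next timer fires, followers still hold their vote for this
              # term and grant it: majority reached, leader elected.
              if second_t - first_t > RPC_MS:
                  return candidate, term
              # Otherwise the vote splits; new term, new random timeouts.

      leader, term = elect_leader()
      print(f"node {leader} elected in term {term}")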

      1. Anonymous Coward

        Re: Am I missing something?

        "very performant in the single datacentre. "

        Please, can people stop using this made-up naff jargon word "performant"? It sounds geeky and a little bit try-hard. If you mean the software is competent, just say so.

        1. Anonymous Coward

          Re: Am I missing something?

          You can "scale out" an RDBMS too. You can use clustering so you can bring 100s or 1000s of cores into play or you can replicate doing the same but across geographic locations.

          The reality is though that 99.9% of businesses don't need that.

          1. Anonymous Coward

            Re: Am I missing something?

            "you can replicate doing the same but across geographic locations."

            Not really, you can't. There's a very good reason Google dropped their already heavily modified MySQL for Spanner. Critically, traditional transactional architectures come with huge trade-offs when you try to distribute transactions geographically. Throwing Paxos into the mix gives you the consensus semantics needed to scale transaction management, and throwing global time synchronisation on top allows you to use those semantics consistently (for some large value of "consistently").

            I mean, you're welcome to try, but you'll soon run into a wall.

  11. Kevin McMurtrie Silver badge

    99.9999%

    Are you kidding me? Calling services degraded or regionally unavailable rather than "down" is cheating when any given observer will regularly have issues with Google services not working.

  12. Tom 64
    Coffee/keyboard

    Not particularly interesting

    It sounds like a layer over BigTable to aid partitioning and querying.

    If you have been building scalable systems for any length of time, you'll already have developed all the strategies you need to make MySQL/NoSQL work for you, and at a fraction of the cost.

    1. Anonymous Coward

      Re: Not particularly interesting

      "It sounds like a layer over big table..."

      Only on the surface. BigTable is a log-structured merge-tree. It is very, very highly optimised for random IO, to the point where doing any kind of scan at all is going to slow performance to a shocking crawl. Spanner sits in the middle ground between immutable columnar data and pure key-value stores. Not as fast for raw analytics, and not as fast for raw IO, but mutable and analytical at the same time.
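
      For readers wondering what a log-structured merge-tree actually is, here's a toy Python sketch (a deliberate simplification; real LSM stores add write-ahead logs, bloom filters and background compaction). Writes land in a memtable and flush to immutable sorted runs, so point reads stay quick while a scan must consult every run:

      import bisect

      class TinyLSM:
          def __init__(self, memtable_limit=4):
              self.memtable = {}   # recent writes, mutable, in RAM
              self.runs = []       # immutable sorted runs, newest first
              self.limit = memtable_limit

          def put(self, key, value):
              self.memtable[key] = value
              if len(self.memtable) >= self.limit:
                  # Flush: freeze the memtable into a sorted run.
                  self.runs.insert(0, sorted(self.memtable.items()))
                  self.memtable = {}

          def get(self, key):
              # Point read: check the memtable, then newest run wins.
              if key in self.memtable:
                  return self.memtable[key]
              for run in self.runs:
                  i = bisect.bisect_left(run, (key, ""))
                  if i < len(run) and run[i][0] == key:
                      return run[i][1]
              return None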

    2. chu3

      Re: Not particularly interesting

      Really? I think that's true for NoSQL, but there isn't a clear leader for transactional MySQL workloads. There seem to be a lot of strategies, and a lot of bolt-on solutions to MySQL that implement some kind of middleware sharding layer. There are very few truly scalable (shared-nothing) strategies that also provide the guarantees of ACID, transactions without caveats, and all the things you would want from your operational data store (e.g. consistent backups). ClustrixDB seems to provide that (as well as some measure of MySQL compatibility), and NuoDB too (though I haven't tried it because I'd have to rewrite my app).

  13. Anonymous Coward

    Not much to see here

    mysql> DROP DATABASE SPANNER;

    Query OK, 0 people affected, 1 warning (0.00 sec)
