What your database needs is a good thermometer

In the very early days of databases, when they contained relatively small quantities of information, the distinction between frequencies of access was pretty immaterial. Now that we collect and store data in massive quantities, however, the distinction is becoming increasingly important. And apposite terms have evolved to …

COMMENTS

This topic is closed for new posts.
  1. Daniele Procida

    Was this an advertisement for Teradata?

    Now that commercial television will apparently be allowed product placement in its programming, is The Register going to follow suit?

    Interestingly, the format of this piece followed exactly that of Radio Four's "Thought for the Day":

    1. a description of a tasty real-life problem with a more-or-less novel analysis of it

    2. an abrupt switch into product-selling mode without skipping a beat

    3. a reassuring single sentence revisiting the original issue, to help you forget the evangelism without actually escaping it, and to seal the pretence that the whole thing has really been an investigation of a problem and not an attempt to sell you something.

    Not very nice.

  2. Pete 2 Silver badge

    Zero management

    The goal of most enterprise-class disk arrays seems to be the opposite of this. They work by configuring their disks as massively striped arrays (Stripe And Mirror Everything) that the admins can treat as one big bucket, with any performance issues (of which there are many) addressed by throwing more cache RAM or access paths at the problem.
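
    To make the "one big bucket" point concrete, here is a minimal sketch (with a made-up stripe unit and disk count) of how SAME-style striping maps logical blocks onto spindles. The admin sees one flat address space; the hotspots still land on particular physical disks underneath.

    ```python
    # Hypothetical SAME-style striping: the stripe unit and disk count are
    # made-up examples, not any vendor's actual geometry.

    STRIPE_UNIT_BLOCKS = 128   # blocks per stripe chunk
    N_DISKS = 8                # disks in the striped set

    def locate(lba: int) -> tuple[int, int]:
        """Map a logical block address to (disk index, physical block)."""
        chunk = lba // STRIPE_UNIT_BLOCKS       # which stripe chunk
        disk = chunk % N_DISKS                  # round-robin across disks
        stripe_row = chunk // N_DISKS           # depth into each disk
        return disk, stripe_row * STRIPE_UNIT_BLOCKS + lba % STRIPE_UNIT_BLOCKS

    # A hot table concentrated in a few chunks still hammers the same
    # spindles, however uniform the logical view looks.
    print(locate(0), locate(128), locate(1024))
    ```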

    The reason for this is essentially that data management is hard. It requires skills that are rare in most organisations, since they combine the talents of a DBA (to know where the database interactions are) with those of a storage-management person (to consider the logical-to-physical mapping of the data) and the knowledge of an applications specialist (to realise why these things are happening). A second factor, which is a BIG drag on large organisations' ability to manage their data, is the sheer amount of time it takes to migrate hot data to the correct locations - while the array in question is either still working or down for this maintenance. After all, the arrays we need to fix are the very ones with the hotspots that can't handle the extra I/Os needed to move their data off! Yes, there are tools out there that will "transparently" migrate data about without impacting I/O rates, but being able to forecast exactly what effect a mass-migration will have (and therefore whether it will, in fact, cure your problem) is very, very hard. Sometimes the law of unintended consequences means you end up with a worse problem than when you started.

    For these reasons most large organisations (i.e. the ones with the biggest performance problems) prefer solutions that merely require the application of money, rather than skills: for example intelligent controllers, mirrored copies that can be broken off from the main array and snapshots for backups.

    Of course, the best solution would be to know your application and business requirements so you can properly design your storage from day #1: knowing how the application would grow, how big it would eventually get, how many IOPS are needed and at what response time, and then doubling the answer to get close to the end result. But we can rule that out as a possibility, can't we?
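
    As a rough sketch of that day-one arithmetic (every workload number below is a hypothetical placeholder):

    ```python
    # Back-of-envelope spindle count from a forecast IOPS requirement;
    # all figures here are made-up examples, not measurements.

    peak_iops_needed = 20_000      # forecast application peak
    iops_per_15k_disk = 180        # rough figure for one 15K rpm spindle
    write_fraction = 0.3           # share of I/Os that are writes
    raid10_write_penalty = 2       # each write costs two disk I/Os in RAID 10

    effective_iops = peak_iops_needed * (
        (1 - write_fraction) + write_fraction * raid10_write_penalty
    )
    disks_needed = -(-int(effective_iops) // iops_per_15k_disk)  # ceiling

    # ...then double the answer, as above, to get close to the end result.
    print(f"{disks_needed} disks calculated, so buy {disks_needed * 2}")
    ```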

  3. Nathan Meyer

    Faster Hardware Will Always Be Defeated

    Having done my time in systems management and performance, I always find it touching when I read articles about trying to squeeze the maximum from new fast hardware. It matters not what you do: faster hardware will always be defeated by application programmers (following the latest development method) and customer requirements (for senseless functions they fondly imagine to be "cutting edge"). Got to love the system guys though, they do keep trying...

  4. John Smith 19 Gold badge
    Thumb Down

    Welcome back the Anamartic wafer-scale disk emulator

    Ho hum, here we go again.

    Something also tells me that Teradata may not be quite as unique in this as they claim.

    However, they may well have it better integrated into their system (IIRC they host on their own closed-source DB, not Oracle, SQL Server, DB2 or whatever).

  5. Britt Johnston

    data zoos

    Our databases are so tied up with access issues that there is no control of 'hot' data outside the transactional systems. Instead of managed filtering into OLAP warehouses, a download from the wild occasionally escapes into the business world, where it is replicated, joined with others and updated.

  6. Michael 77
    Paris Hilton

    @ Daniele

    Cynic!

    http://www.theregister.co.uk/2009/08/24/whitehorn_ssds_servers/

    Paris, because she's not an evangelist!

  7. The First Dave

    Power

    What is the basis for that tiny power consumption figure for SSDs? I'm presuming that you aren't comparing like for like in terms of size, and even so, the load factor is important for both types of drive.

  8. Chris C

    Accuracy?

    "Slow hard disks (5K rpm) 6 – 8 watts High Slow Cheap

    Fast hard disks (15K rpm) 16 watts Medium Fast Expensive

    SSD (Solid State Disks) 150mW Low Blistering Painfully expensive"

    Sorry, but I lost interest and stopped reading at the inaccurate table of drives. When the details of a three-product table aren't accurate, I have no faith that the rest of the article will be, either.

    "5K rpm" hard disks? Where? When was the last time anyone bought a 5K rpm hard disk? Searching my distributors, I could only find a few models still available in 5400 (5900 for a few Seagates) rpm. 7200 rpm has been the standard for a long time.

    And what's the deal with the SSD info? 150mW? Where? Which models? The Intel X-25E uses 2.4W (typical) for the 32GB model and 2.6W (typical) for the 64GB model. The OCZ Vertex EX (60GB) uses 2W active and 500mW idle. Then there's the issue of "blistering" speed. Yes, read speed is good, but write speed is horrible, especially for random writes. The Imation S-Class 27519, for example, has 19,000 random read IOPS but only 130 random write IOPS. The Intel X-25E has 35,000 random read IOPS but only 3,300 random write IOPS. Many benchmarks have shown SSDs to be horribly slow at small (4K) writes as well. So, if you're going to call an SSD "blistering", you better qualify it by saying that it's only "blistering" for reads, as write speed will likely be far less than even a 5400-rpm hard drive on a write-heavy basis.
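
    To put those numbers in perspective, here is a quick sketch of what the read/write asymmetry means for a write-heavy batch, using the IOPS figures quoted above plus a rough, assumed 100-IOPS figure for a conventional hard disk:

    ```python
    # Time to complete a batch of random 4K operations at the quoted IOPS
    # rates. The HDD figure and the batch size are assumptions.

    drives = {
        "Imation S-Class 27519":   {"read_iops": 19_000, "write_iops": 130},
        "Intel X25-E":             {"read_iops": 35_000, "write_iops": 3_300},
        "HDD (assumed ~100 IOPS)": {"read_iops": 100,    "write_iops": 100},
    }

    ops = 1_000_000  # hypothetical batch of random 4K operations

    for name, d in drives.items():
        print(f"{name}: {ops / d['read_iops']:,.0f}s reading, "
              f"{ops / d['write_iops']:,.0f}s writing")
    ```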

  9. alien anthropologist
    FAIL

    Sniff..

    ..sniff.. CRIPES! This article does not merely smell, it reeks!!

    The best place for hot data is in memory. Accessing data in memory in a storage array using fricken large pipes... like 40Gbit InfiniBand running the RDMA protocol... is fast. Very fast.

    The recipe is simple. A storage cell or array running 8 or more CPUs with 512MB RAM (RAM is cheap). 2 or more PCI-E InfiniBand ports running into an InfiniBand switch. The LUNs on the storage cell are published as SRP devices to database servers that are wired, via InfiniBand, into the same switch.

    And this is not just something we alone are using... Oracle's Exadata storage nodes use the exact same technology. And by all accounts, at truly blistering speeds (they also use special s/w juice at the storage layer that builds custom data blocks on the fly).

    Looking solely at the physical I/O storage layer is idiotic. For performance and scalability, you need to look at the complete I/O subsystem. Not just storage. And with RAM being pretty inexpensive and with incredible pipe sizes available with Infiniband.. this is not rocket science.
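
    For a sense of what the pipe sizes buy you, a back-of-envelope sketch (protocol overhead is ignored, and the working-set size is a made-up example):

    ```python
    # Time to pull a hot working set over different links, ignoring
    # protocol overhead and the latency benefits of RDMA.

    links_gbit = {
        "4Gb Fibre Channel": 4,
        "10 Gigabit Ethernet": 10,
        "40Gbit InfiniBand (QDR)": 40,
    }

    working_set_gb = 500  # hypothetical hot working set

    for name, gbit in links_gbit.items():
        seconds = working_set_gb * 8 / gbit  # GB -> gigabits, then divide
        print(f"{name}: ~{seconds:,.0f}s to move {working_set_gb} GB")
    ```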

  10. kingwahwah
    Headmaster

    @Nathan Meyer

    So true.

    I provided fibre-speed access to a DB sat on a SAS 15K array, only for developers to cripple it with bad code. Wildcard searches on a UK postcode database... but development time was "too precious" to fix it!
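
    For anyone wondering why a leading wildcard is quite so crippling: a minimal sketch using SQLite as a stand-in for the production database (the table and data are toy examples):

    ```python
    # A leading-wildcard LIKE forces a full table scan; a prefix LIKE can
    # range-scan the index. SQLite stands in for the real database here.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA case_sensitive_like = ON")  # enables SQLite's LIKE prefix optimisation
    conn.execute("CREATE TABLE postcodes (code TEXT)")
    conn.execute("CREATE INDEX idx_code ON postcodes (code)")
    conn.executemany(
        "INSERT INTO postcodes VALUES (?)",
        [(f"AB{n:04d}",) for n in range(10_000)],  # toy data, not real postcodes
    )

    # Leading wildcard: the index cannot help, so every row is scanned.
    print(conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM postcodes WHERE code LIKE '%1234'"
    ).fetchall())

    # Prefix search: the optimiser range-scans the index instead.
    print(conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM postcodes WHERE code LIKE 'AB12%'"
    ).fetchall())
    ```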

  11. Mark.Whitehorn

    General reply

    > Was this an advertisement for Teradata?

    No, is the simple answer. Teradata didn't pay for the article; it didn't even request it. I wrote the article because I have used SSDs in databases and found them efficacious in certain situations. Teradata has, in my opinion, an excellent database engine and makes innovative use of SSDs, so I mentioned the company and the product as well.

    >Interesting, the format of this piece followed exactly that of Radio Four's "Thought for the Day":

    I have to say that I am proud even to be mentioned in the same breath as Rabbi Bloom...

    >Something also tells me that Teradata may not be quite as unique in this as they claim.

    Certainly other companies can migrate data dependent upon its use but none, so far as I am aware, do it as effectively as Teradata – which is why I chose that company as the example.

    > What is the basis for that tiny power consumption figure for SSD's?

    > And what's the deal with the SSD info? 150mW? Where? Which models?

    The figures are for the Intel X18-M and X25-M Mainstream SATA Solid-State Drives. In fact, the figure I quoted was the active power consumption: 150 mW typical (PC workload); the idle figure is half that (75 mW typical).
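
    As a rough sketch of what that gap amounts to over a year of continuous running (drive figures are from the table above; the electricity tariff is a placeholder assumption):

    ```python
    # Annual energy per drive at the quoted power draws; the tariff is a
    # made-up example.

    hours_per_year = 24 * 365
    price_per_kwh = 0.12  # placeholder tariff

    drives_watts = {"5K rpm disk": 7.0, "15K rpm disk": 16.0, "SSD (X25-M)": 0.15}

    for name, watts in drives_watts.items():
        kwh = watts * hours_per_year / 1000
        print(f"{name}: {kwh:.1f} kWh/year, ~{kwh * price_per_kwh:.2f} per year each")
    ```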

    >Then there's the issue of "blistering" speed. Yes, read speed is good,

    >but write speed is horrible, especially for random writes.

    This certainly used to be true and you may have had poor experiences with early disks, but the most recent SSDs are in a different class.

    >The Imation S-Class 27519, for example, has 19,000 random read IOPS

    >but only 130 random write IOPS.

    >The Intel X-25E has 35,000 random read IOPS but only 3,300 random write IOPS.

    True, but the figures for the X18-M and X25-M are:

    Random I/O Operations Per Second (IOPS):

    Random 4 KB reads: up to 35,000 IOPS

    Random 4 KB writes:

    • 80 GB X25-M/X18-M - up to 6,600 IOPS

    • 160 GB X25-M/X18-M - up to 8,600 IOPS

    > Many benchmarks have shown SSDs to be horribly slow at small (4K) writes as well.

    Please note that the above figures are for 4K writes.

    >So, if you're going to call an SSD "blistering", you better qualify it by saying that it's >only "blistering" for reads, as write speed will likely be far less than even a

    >5400-rpm hard drive on a write-heavy basis.

    Given the figures quoted above for the new Intel disks, I'm happy that the use of the word 'blistering' is still appropriate, unqualified.

    >"5K rpm" hard disks? Where? When was the last time anyone bought a 5K rpm hard disk? >Searching my distributors, I could only find a few models still available in 5400

    >(5900 for a few Seagates) rpm. 7200 rpm has been the standard for a long time.

    Where? Still available at your distributors. When? People are still buying them now. And they are very cheap because, I agree, the standard disk of today is faster. What I was trying to do in this table was to plot the ends of the available disk spectrum. Had I started the table with a 7K rpm disk, someone would have pointed out, quite correctly, that 5K rpm disks were still available and I had missed them.

    >The best place for hot data is in memory.

    For speed, certainly, but speed is not the only consideration; Durability is also part of the ACID test...
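
    A minimal sketch of that Durability point: a commit is not durable until it has survived the trip past volatile memory to stable storage (the file name and record below are illustrative):

    ```python
    # Data "written" to memory or the OS page cache is fast but volatile;
    # it is not durable until flushed to stable storage.

    import os

    def durable_append(path: str, record: bytes) -> None:
        """Append a record and force it to stable storage before returning."""
        with open(path, "ab") as f:
            f.write(record)       # lands in buffers: fast, but volatile
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to reach the disk/SSD itself

    durable_append("commit.log", b"txn 42 committed\n")
    ```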
