12 simple rules: How Ted Codd transformed the humble database

Edgar – or Ted – Codd is one of the most influential figures in computing. Born 90 years ago today*, Codd – who passed away in 2003 – was the man who first conceived of the relational model for database management. Relational databases are today ubiquitous – on your PC, in your smartphone, in your bank’s ATMs, inside airline …

COMMENTS

This topic is closed for new posts.
  1. BlueGreen

    There's much to be said on sql

    and I was going to start, but then I realised I was missing the most important thing about him in the article:

    "Personally offended by US senator Joseph McCarthy’s Cold-War Communist-baiting, Codd abandoned IBM and the US entirely in 1953 and went to work across the border in Canada"

    1. Destroy All Monsters Silver badge
      Holmes

      Re: There's much to be said on sql

      The main thing to be said about SQL is that persons interested in learning about relational databases could do worse than check out Tutorial D.

  2. pierce

    wait, if wikipedia is wrong, and you have a reference to the correct information, you are supposed to update wikipedia and link in the reference.

    1. kventin

      """you are supposed to update wikipedia"""

      unless you can't be bothered. anyway, supposed by whom?

      1. FartingHippo
        Stop

        update wikipedia

        Well, someone's done it now.

  3. Anonymous Coward
    Anonymous Coward

    Ted didn't merely transform the database

    He normalised it (ducks for cover).....

    1. P. Lee
      Coat

      Re: Ted didn't merely transform the database

      He was the First to put it in its Normal Form.

  4. David Dawson

    NoSQL covers everything that is not SQL, not just key/ value.

    Key value is just one model, others are graph (neo4j) and document (mongodb, couchdb).

    So, nosql is a bit of a silly name, defining what something isn't, rather than what something is.

    1. ratfox
      Angel

      defining what something isn't

      You mean, like GNU, WINE… …Bing…?

    2. Anonymous Coward
      Anonymous Coward

      NoSQL

      Is not No SQL. It is Not ONLY SQL.

      1. Destroy All Monsters Silver badge
        Trollface

        Re: NoSQL

        Ni SQL?

        NI NI NI!

    3. Anonymous Coward
      Anonymous Coward

      That is "NoRelational", not NoSQL.

      The NoSQL crowd was a bit Taliban in naming their product that way. SQL is just a language; for that matter, you can use it for non-relational databases as well. And frankly I never understood their hate for SQL - after all it's a declarative language like most functional languages that are fashionable now, while SQL is not :).

      I understand that an RDBMS may not be the best choice for some types of data, but something Codd got right was that often the data themselves are more valuable than the applications accessing them. Applications come and go, data stay. Fully decoupling the data (and their model) from the applications accessing them was - and is - a big idea and improvement. NoSQL is fast, but its data model is far less decoupled from the representation of the application(s) accessing it. If data have no value outside the application, good, but when data are more valuable than the application, and multiple applications need to access the data in different ways, well, RDBMS are still a good idea.

      1. dan1980

        Re: That is "NoRelational", not NoSQL.

        Well put.

        The current love for NoSQL is due to the 'big data' idea, and ties into a previous Reg article about SANs apparently being not long for the world.

        The thing that many comments there ignored was that a big Hadoop cluster tied up with a NoSQL database all running on distributed DAS is just one component of a full solution. That giant data-crunching platform requires data to be fed to it from somewhere and that somewhere may well be an application (or 20) running on . . . RDBMS - be it a web platform or an ERP like SAP.

        In addition, to be any use, the output of those wonderful, distributed compute clusters must be somehow presented to the world and that presentation platform will, again, likely have some form of RDBMS running behind it.

        Not to mention that those systems will require backup solutions which, again, may involve an RDBMS.

        If NoSQL is 'taking over' from RDBMS then that really only represents the type of workloads that are currently being employed. Just remember that these 'big data' workloads are only possible due to the massive amounts of information being collected, processed and organised by other applications - applications which are quite likely to be relying on an RDBMS.

  5. smartypants

    You can store pictures, images etc. in a relational database...

    ...Whether you want to is a different matter though.

    Relational databases are brilliant. It's just a shame that they don't scale well to the size required by a growing number of use-cases.

    Here is a recommended guide to anyone who is a bit confused by the database options:

    http://howfuckedismydatabase.com/

    1. Matt 21

      Re: You can store pictures, images etc. in a relational database...

      I've seen no evidence that SQL doesn't scale, quite the opposite in fact. I've worked on many SQL databases running over the hundreds of TB and scaling has never been a problem.

    2. JLV
      Trollface

      Re: You can store pictures, images etc. in a relational database...

      Not only can you store all sorts of data, you can hang arbitrary attributes off objects.

      For example, suppose you have a table with a growing, but uncertain number of fields:

      Table Foo (id, bar1dt, bar2varchar, bar3numeric...) where you don't know how many bars you will end up with.

      Instead restructure it to hold its attributes in a child table: Table Foo (id) and Table FooAttrib (id, attrid, numvalue, dtvalue,varcharvalue).

      Bit of a head-twister, but works surprisingly well if you need that kind of flexibility in your application.
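      A minimal sketch of that child-table layout, assuming SQLite through Python's standard sqlite3 module. The table and column names follow the post; the column types, the composite key, the sample rows and the attributes() helper are illustrative assumptions, not anything the post specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Foo (id INTEGER PRIMARY KEY);
    -- One row per (object, attribute); only one value column is filled per row.
    CREATE TABLE FooAttrib (
        id           INTEGER NOT NULL REFERENCES Foo(id),
        attrid       TEXT    NOT NULL,
        numvalue     NUMERIC,
        dtvalue      TEXT,
        varcharvalue TEXT,
        PRIMARY KEY (id, attrid)
    );
""")

conn.execute("INSERT INTO Foo (id) VALUES (1)")
conn.executemany(
    "INSERT INTO FooAttrib (id, attrid, numvalue, dtvalue, varcharvalue) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "bar1", None, "2013-08-19", None),  # made-up date attribute
        (1, "bar2", None, None, "hello"),       # made-up varchar attribute
        (1, "bar3", 42, None, None),            # made-up numeric attribute
    ],
)


def attributes(conn, foo_id):
    """Return every attribute of one Foo row as an {attrid: value} dict."""
    rows = conn.execute(
        "SELECT attrid, COALESCE(numvalue, dtvalue, varcharvalue) "
        "FROM FooAttrib WHERE id = ? ORDER BY attrid",
        (foo_id,),
    )
    return dict(rows.fetchall())


attrs = attributes(conn, 1)
```

      Adding a fourth "bar" is then an INSERT rather than an ALTER TABLE, which is the flexibility being described. The cost is that the database no longer checks per-attribute types for you.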

      Relational databases do have one very large Achilles' heel, however, not mentioned here: they suck at same-type hierarchies. Parts-of-parts or directed graphs. Say if you want to identify the relationships between parts of an engine. Or a manager-to-employees hierarchy. The standard-ANSI/no vendor-extension SQL to express that is typically hand-rolled, clumsily-expressed recursion, and brutal.
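      For a concrete feel of the manager-to-employees case, here is a sketch using a recursive common table expression (WITH RECURSIVE, in the SQL standard since SQL:1999), run on SQLite through Python's sqlite3 module. The employee table, its rows and the reports_of() helper are made up for illustration. Whether the result counts as less brutal is left to the reader.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Same-type hierarchy: each employee row points at another employee row.
    CREATE TABLE employee (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES employee(id)
    );
    INSERT INTO employee VALUES
        (1, 'Ann', NULL),
        (2, 'Bob', 1),
        (3, 'Cy',  1),
        (4, 'Dee', 2),
        (5, 'Eli', 4);
""")


def reports_of(conn, manager_id):
    """Return (name, depth) for everyone below manager_id, at any depth."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, name, depth) AS (
            -- Anchor: the direct reports.
            SELECT id, name, 1 FROM employee WHERE manager_id = ?
            UNION ALL
            -- Recursive step: reports of anyone already in the chain.
            SELECT e.id, e.name, c.depth + 1
            FROM employee AS e JOIN chain AS c ON e.manager_id = c.id
        )
        SELECT name, depth FROM chain ORDER BY depth, name
        """,
        (manager_id,),
    )
    return rows.fetchall()


reports = reports_of(conn, 1)
```

      The query walks the tree one level per iteration, so it assumes the hierarchy has no cycles. A real directed graph would need extra bookkeeping to avoid looping.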

      Graphs are also a big part of social networks and this is probably a big part of why developers working on social networks, which are after all the most important apps ever (sarcasm), sneer at SQL's unworthiness.

      That and Java devs typically can't write to SQL without an ORM doing all the hand-holding so that also proves SQL sucks ;-)

  6. Anonymous Coward
    Anonymous Coward

    The relational model was a brilliant insight. It's such a shame that Codd's two biggest acolytes, Date and Warden/Darwen (why he can't decide how to spell his name is a mystery), are such egotistical and dogmatic fruit loops. Reading Date in particular reminds me of the Soviet propaganda tracts I had to study at university.

    1. Spoonsinger
      Coat

      Re: reminds me of the Soviet propaganda tracts I had to study at university

      You have a point, but reading Soviet propaganda tracts at university probably wouldn't have helped me pass the DB design modules. (I assume you went a different route).

      1. Destroy All Monsters Silver badge
        Big Brother

        Re: reminds me of the Soviet propaganda tracts I had to study at university

        But Soviet database are always CORRECT by order of the party

        1. Michael Wojcik Silver badge

          Re: reminds me of the Soviet propaganda tracts I had to study at university

          But Soviet database are always CORRECT by order of the party

          Technically, this algorithm is known as party-checking error correction.

    2. BlueGreen

      @ Chris Wareham

      I have never seen Darwen spelt any other way. However a quick search gets me this off wikipedia "His early works were published under the pseudonym of Andrew Warden".

      I don't believe they are egotistical and dogmatic. Date has strong opinions, then again his name is virtually synonymous with RDBs because he's done a lot of development on the maths behind it. I guess he has a right to hold those opinions, and it's your burden to show they're unreasonable.

      As for Darwen, well, I corresponded with him by email over an issue (limits of sql optimisation and perhaps how to deal with them) and he was polite, considerate and gave his time generously.

      I find I cannot upvote your post. Sorry.

    3. GreyWolf
      Holmes

      Chris Date was my instructor...

      ..on the various database models..he had to get pretty egotistical and dogmatic because of the "misleading" (aka pack of lies) FUD being spread by the Codasyl advocates. Chris came up with a simple and obvious Codasyl design, a simple and obvious program code example - no-one was ever able to debug it - if I remember rightly it was about 20 lines of code and contained at least 18 bugs. The fundamental problem that he was illustrating was that ring structures like Codasyl look nice in the abstract, but it is impossible to validate programs written for them.

  7. Michael Shelby

    "everybody understood the principle"

    But not everybody can apply it, which is why thedailywtf.com exists today.

  8. Anonymous Coward
    Anonymous Coward

    Relational ignored in 1976

    The 1976 book by James Martin (Principles of Database Management, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1976) fails to mention the relational model. I didn't read the 1989 revised edition....but maybe relational got a mention by then.

    1. This post has been deleted by its author

      1. Destroy All Monsters Silver badge
        Holmes

        Re: Relational ignored in 1976

        Ok, in "Computer Data-Base (sic) Organization" by James Martin ("The most up-to-date and thorough guide to the techniques of data base organization"), 1975 by Prentice-Hall, ISBN 0-13-165506-X, (printed on excellent paper in black and pink) we read:

        Part I (Logical Organization) Chapter 13: Relational Data Bases (pp 149-168)

        "Data-base systems run the danger of becoming cumbersome, inflexible and problematic. The logical linkages tend to multiply as new applications are added and as users request that new forms of query be answerable with the data. A high level of complexity will build up in many data-base systems. Unless the designers have conceptual clarity they will weave a tangled web. It is possible to avoid the entanglements that build up in tree and plex structures, by a technique called normalization. Normalization techniques have been designed and advocated by E.F. Codd. ... The enthusiasts of normalization have a vocabulary of their own and a tendency to dress up a basically simple subject in confusing language. The table, like that in Fig. 5.3, is referred to as a relation. A data base constructed using relations is referred to as a relational data base."

        There is also chapter 14: "Third Normal Form" ...

  9. Destroy All Monsters Silver badge

    1) Gavin, you got a typo: "Turning Award"

    2) Hands up who recognized the cover of the December 1972 issue of Communications of the ACM

  10. Anteaus

    Mainframe yes.. Website no.

    SQL may have been a brilliant innovation for mainframes crunching large amounts of data in a secure room, but it is a diabolically bad choice for Internet use, since it allows malicious commands to be injected into the data stream, which is especially dangerous in data entered from online forms. From what I've briefly read about it, it's not too clear if NoSQL is any more secure in that respect.

    1. Anonymous Coward
      Anonymous Coward

      Re: Mainframe yes.. Website no.

      It's just bad programming habits - in any context - including mainframes. SQL Injection has always been an issue since SQL commands can be easily built at run-time and sent to the database - the Internet just magnified it, but you can inject in any badly written application, web or not.

      Techniques like bind variables, stored procedures, grants, etc. have been available for a long time as well, but too many web "developers" never learnt how to properly code an application using SQL (and design a database) - after all, chaining some strings - without sanitizing them - is easier than declaring a variable, assigning the value and so on (or writing a stored procedure and/or assigning proper grants; just let everything connect as the database owner!), isn't it? It's vulnerable to injection, forces the database to reparse each statement, and litters the code, but hey, it's fast and easy.... after all what is important is the site design, isn't it?
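      To make the difference concrete, a small sketch contrasting chained strings with a bind variable, assuming SQLite via Python's sqlite3 module. The users table, its rows and both lookup functions are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, secret TEXT);
    INSERT INTO users VALUES ('alice', 'a-secret'), ('bob', 'b-secret');
""")

# Input as it might arrive from an online form.
malicious = "nobody' OR '1'='1"


def lookup_unsafe(conn, name):
    """Chains strings: the input becomes part of the SQL text itself."""
    sql = "SELECT secret FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def lookup_bound(conn, name):
    """Uses a bind variable: the input is only ever treated as a value."""
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()


leaked = lookup_unsafe(conn, malicious)  # WHERE clause is now always true
safe = lookup_bound(conn, malicious)     # no user is literally named that
```

      With the chained string the quote in the input closes the literal and the OR clause matches every row. With the bound parameter the same input is compared as an ordinary string and matches nothing.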

    2. Destroy All Monsters Silver badge
      Trollface

      Re: Mainframe yes.. Website no.

      a diabolically bad choice for Internet use

      This is like saying internal combustion engines are a diabolically bad choice for automotive devices because then people will crash them, drive while being drunk or text their friends.

    3. ARaybould

      Re: Mainframe yes.. Website no.

      The problem of SQL injection attacks has nothing to do with the relational model or even its implementations (given that secure ways to bind data exist), and everything to do with amateurism in web development.

  11. miket82
    Holmes

    DOB

    There is a rather large relational database in the UK called the registrar of births and deaths. Codd's date of birth could be verified by a simple query. Nothing like a primary source for data integrity! On the other hand try the internet, that's never wrong is it?

  12. Epobirs

    "Some of the biggest and most profitable names on the computing scene – Oracle, IBM and Microsoft – are currently working on relational database management systems."

    Odd wording there. It makes it sound as if those companies, which are long time players in the RDB industry, are just now preparing their first products.

    1. oolor

      Or perhaps software at that level is an ever evolving process of patches, optimization, and new features which are major parts of these companies' ongoing business.

  13. anaru

    Codd OUTER JOIN NoSQL?

    I had to read this twice before I realised there was no specific connection between Codd and NoSQL, despite what can be inferred from "On the eve of the anniversary of Codd's birth, NoSQL advocates will gather...".

    Here I was thinking the father of relational had had some kind of deathbed conversion!

  14. Stephen Channell
    Happy

    IBM loved relational

    It's not true that IBM was late to the Relational party, it is just that it had to reach a quality threshold before it would be allowed out as a program product. System-R is what Larry copied with Oracle (complete with Rule Based Optimiser).. it was not until Oracle7 that Larry's DB had a Cost-Based-Optimiser to compare with DB2 2.

    Far from cannibalising IMS, relational was criticised as a wheeze to sell more DASD (disk) & CPU, but the competitors could not argue with the mathematical proof of relational algebra & calculus in set theory: there is no information that can not be represented relationally.

    For a very long time IMS was much more scalable than DB2 (you could even mount IMS in DB2), but only for one use-case.. choose the wrong one, and DB reorganisation was a killer.

    1. Michael Wojcik Silver badge

      Re: IBM loved relational

      And IBM's System R, of course, is from the '70s. The System R project inspired Ingres (or more specifically, it inspired the use of a relational database for the INGRES project) and Oracle, as you noted, so Stonebraker and Ellison were in no way picking up a technology that IBM was ignoring. Stonebraker, in particular, made a tremendous contribution, but the article's narrative about IBM ignoring Codd's work is a fable.

      Even excluding System R, which I think was sold only on a limited basis (as a PRPQ, maybe?), IBM had a commercial relational database in '81, which is a mere two years after Oracle. They can scarcely be said to have been late to the party. And it's not like the IMS cash cow needed protecting; IMS DB is still going strong.

      Typical Gavin Clarke article, with ample technical errors. (Others have already pointed out such items as the bizarre mischaracterization of NoSQL as only key/value databases.) Sigh.

  15. Destroy All Monsters Silver badge
    Windows

    An interesting note found on a harddisk

    In "The Genesis of a Database Computer - A conversation with Jack Shemer and Phil Neches of Teradata Corporation - IEEE Computer Nov. 1984":

    This article gives the context of the DBC/1012 system with two interface processors, four access module processors, and four Winchester disk units. When fully extended to 1024 processors operating in parallel, the system will be capable of storing a terabyte (trillion bytes) of data.

    We read:

    Shemer: Another factor [in building the database computer] was the relational data model - the fourth generation of database management software. People wanted it but could not afford it, nor was it practical. The reason was that it took a tremendous number of MIPS to deliver the functionality of a relational system. However, running the software on a mainframe practically relegated the big computer to the level of a personal computer. Consequently, the user environment has retained what I call the machine-friendly forerunners, namely the hierarchical and network database management systems that emerged in the 60's. These approaches were designed to process efficiently in single data stream machine environments, while the relational model admitted to parallel processing.

    In the relational model, data is not explicitly ordered, since data items don't have pointers embedded in the data. Rather than traversing a family tree or hierarchy, you're dealing with rows and columns that represent the way most people like to view information. The relational system is synonymous with people-friendly; it's what people want, what the end user and the application programmer desire.

    The big problem was to make the relational system cost- and performance-effective. The only way to do that was to provide a great many processing cycles at low cost.

    ...

    Computer: It was an IBM scientist, E. F. Codd, who originally conceived the relational database model. What is IBM doing now?

    Shemer: IBM has taken what I regard as a two-phased approach. On the one hand, it has IMS and DL/I for the production environment. They use the hierarchical approach of the 60's, now almost 20 years old. IBM appears to be committed to that investment; it is telling users to keep IMS for high-volume applications. On the other hand, it has a new relational product called DB2 that is intended for the what-if query in the end-user environment. It is for the ultimate information user who may be a novice programmer or somebody not well versed in programming at all.

    As I see it, IBM has effectively segmented the database world into two disjointed environments. It has essentially stated that the relational system it will deliver under DB2 is not efficient in accommodating production processing demands. In other words, keep IMS for account rendition, master file maintenance, etc., and use DB2 for what-if queries. It is a real dilemma for users. Moreover, this approach complicates matters. You already have an IMS database, let's say. To build a relational database, you have to have a utility program to extract information from the IMS master file. You now have two databases. What's more, they run on different machine environments, producing multiple versions of the truth. One file or the other is always out of date. Having two databases is a step backward, because one of the prime reasons for creating database management systems in the 60's was to allow multiple applications to have access to the same data. That data should have the same value at the same instant of time for both the production application environment and the what-if query environment.

  16. Jim 59

    Cool

    Cool article. Cool bloke. Cool invention. I remember trying to get my head round the RDBMS notion in the 80s, helping Dad set up a database for work. It was called Smart or something, ran on DOS.

  17. John Savard

    One trivial example of "what's wrong" with the relational model is that you use a key value to reference an entry in another table, which requires looking that key up in the index table, instead of using a direct pointer. That requires more disk accesses, and is slower.

    Of course, it's also much easier to update. But some tables aren't updated often. So it isn't a bad thing to offer the choice of doing a few non-relational things. That, of course, doesn't detract from the benefits of the relational model noted in the article.

    1. Destroy All Monsters Silver badge
      Facepalm

      thatisthepoint.jpg

      To get rid of the unmaintainable mess of pointers.

      And mixing the two concepts is just ... no.

    2. Stephen Channell
      Meh

      Cluster Indexes address this

      With a cluster index data is organised as part of the index leaf, so there is no additional access, but it’s not always a good idea because the cascading effects of moving rows from page to page can kill concurrency. Compare that to a network DB where it was common to pre-allocate max space to avoid moving data & killing concurrency (CHAR(100) instead of VARCHAR(100)).. what you saved in IO scans you paid in IO reads.

  18. This post has been deleted by its author

  19. PAT MCCLUNG

    Ted Codd told me when we were at a meeting or conference sometime in the late 70s that he had read a scholarly paper which had asserted that tabular data could not be relational, and that was why he named his creation the Relational Model.

  20. ARaybould

    Not so Simple

    To someone approaching the relational model by learning SQL, it might superficially look like just 'data in tables', but it is not that simple. I recall being in a meeting in which the DBAs proudly unveiled their schema for a rewrite of their company's customer and order database. It was fully normalized, they claimed, but it turned out that they had packed a bunch of repeating groups into strings. What they thought was being clever merely revealed the superficiality of their understanding, and this decision caused no end of problems.

    BTW, I am aware that denormalization may be the right design choice, but that is beside the point here.
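    A small sketch of what "repeating groups packed into strings" can look like and why it bites, assuming SQLite via Python's sqlite3 module. The order tables and item names are hypothetical; the schema from the anecdote is not known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Packed: the repeating group hides inside one string column.
    CREATE TABLE orders_packed (order_id INTEGER PRIMARY KEY, items TEXT);
    INSERT INTO orders_packed VALUES
        (1, 'bolt,nut,washer'),
        (2, 'nut'),
        (3, 'locknut,bolt');

    -- Unpacked: one row per item, so each value is atomic.
    CREATE TABLE order_item (
        order_id INTEGER NOT NULL,
        item     TEXT    NOT NULL,
        PRIMARY KEY (order_id, item)
    );
""")

# Split each packed string into individual rows.
packed_rows = conn.execute("SELECT order_id, items FROM orders_packed").fetchall()
for order_id, items in packed_rows:
    conn.executemany(
        "INSERT INTO order_item (order_id, item) VALUES (?, ?)",
        [(order_id, item) for item in items.split(",")],
    )

# "Which orders contain a nut?" asked of both designs.
packed_hits = conn.execute(
    "SELECT order_id FROM orders_packed WHERE items LIKE '%nut%' ORDER BY order_id"
).fetchall()
normalized_hits = conn.execute(
    "SELECT order_id FROM order_item WHERE item = 'nut' ORDER BY order_id"
).fetchall()
```

    The substring search over the packed column also picks up order 3 because of 'locknut', while the one-row-per-item table answers the question exactly and can use an index to do it.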
