The glorious uncertainty: Backup world is having a GDPR moment

"The right to erasure is not absolute," the UK Information Commissioner's Office told us as the question of the backup tech industry's exposure to the EU's General Data Protection Regulation was raised in the week after it came into force. The concerns: Just last week, Curtis Preston, chief technical architect at Druva, raised …

  1. Anonymous Coward

    Blockchain ?

    Worth noting that once data is in a blockchain, it's there forever, unless the blockchain was designed to remove data before creation.

    The best you can do is post a correction/addendum later on.

    1. Anonymous Coward
      Facepalm

      Re: Blockchain ?

      Worth noting that once data is in a blockchain, it's there forever, unless the blockchain was designed to remove data before creation.

      Yes, all the banks and other people getting excited by blockchains recently mostly haven't considered this at all. Other than suggesting that blockchains should be exempt from GDPR!

      https://www.theverge.com/2018/4/5/17199210/blockchain-coin-center-gdpr-europe-bitcoin-data-privacy

      1. Anonymous Coward

        Re: banks ... haven't considered this at all.

        You know why ? Because it costs. Just like DDA requirements (which they also suggested shouldn't apply to them going back).

        1. Anonymous Coward
          Joke

          Re: banks ... haven't considered this at all.

          That's ok, just try asking the Register to forget all our private information related to the existence of reality... oh, I know some have already forgotten reality exists... ;)

    2. Claptrap314 Silver badge

      Re: Blockchain ?

      No. A majority of miners can rewrite the chain. This was done, for instance, in response to the Ethereum hack. For an internal blockchain, a rewrite is more-or-less trivial.

      Deleting data would be very similar, I believe, to purging proprietary data out of a git repo.

      Hard, but finite.
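
The disagreement above can be made concrete with a toy hash-linked chain. This is only a sketch under loud assumptions: there is no mining or consensus, and the names `block_hash`, `build_chain` and `is_valid` are invented for illustration, not taken from any real blockchain. It shows that editing one block in place breaks the links after it, and that whoever controls the chain (an internal one, say) can rebuild it after an erasure.

```python
import hashlib
import json


def block_hash(prev_hash: str, payload: str) -> str:
    """Hash a block's payload together with the previous block's hash."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Return a list of blocks, each linked to its predecessor by hash."""
    chain = []
    prev = "0" * 64  # placeholder hash for the first block
    for payload in payloads:
        digest = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def is_valid(chain) -> bool:
    """Check that every stored hash and link matches the block contents."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["payload"]):
            return False
        prev = block["hash"]
    return True


chain = build_chain(["alice paid bob", "fred's home address", "bob paid carol"])

# Editing one block in place is detectable: its stored hash no longer matches.
tampered = [dict(b) for b in chain]
tampered[1]["payload"] = "wibble"
tamper_detected = not is_valid(tampered)

# Whoever controls the chain can instead rebuild it from the edited payloads,
# which yields a new, internally consistent chain with different hashes.
rewritten = build_chain(["alice paid bob", "wibble", "bob paid carol"])
```

On a public chain the rebuild step is the expensive part, because every other participant has to accept the new hashes; on a private chain it is just a recomputation.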

  2. Anonymous Coward

    A minor point but doesn't Article 17 use the word erase, not delete?

    My understanding is that just changing the data to something useless ("wibble" would be my weapon of choice) complies with the letter and spirit of the regulation.

    I can't think of a situation where erasure would be accomplished better by obfuscation than deletion but my point is the rules don't say you have to delete.

    1. Anonymous Coward

      In the context of data, erase == delete

      erase

      VERB

      [WITH OBJECT]

      1 Rub out or remove (writing or marks)

      ‘graffiti had been erased from the wall’

      1.1 Remove all traces of; destroy or obliterate.

      ‘over twenty years the last vestiges of a rural economy were erased’

      ‘the magic of the landscape erased all else from her mind’

      1.2 Remove recorded material from (a magnetic tape or medium); delete (data) from a computer's memory.

      ‘the tape could be magnetically erased and reused’

      ‘the file has been erased from the hard disk’

      1. cream wobbly

        Beware of the Leopard?

        Not really feeling it, are you?

        In computing jargon, erasure is as simple as unlinking. The data's still there, but there's no direct path to its retrieval. It's the electronic version of "in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'."

        In legal jargon, erasure has a very definite meaning, closer to the common sense concept of erasure: the data's gone. This type of erasure in computing jargon means scrubbing; whether that's overwriting with zeroes, or "dd </dev/urandom >/that/guy" in a loop fifteen times with an upside-down chicken on your left elbow.

        Of course if you scrub something, you should log it...
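
As a rough illustration of the difference between unlinking and scrubbing described above, here is a minimal Python sketch that overwrites a file with random bytes before removing it, and logs the fact. The function name `scrub_file` and the single-pass default are assumptions for the example. Overwriting in place is also not guaranteed to destroy the old blocks on SSDs or on copy-on-write and journalling filesystems, so treat this as a sketch of the idea rather than a secure-erase tool.

```python
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scrub")


def scrub_file(path: str, passes: int = 1) -> None:
    """Overwrite a file in place with random bytes, then unlink it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            handle.write(os.urandom(size))
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the write to disk
    os.remove(path)
    # Log the fact of the scrub, not the contents that were scrubbed.
    log.info("scrubbed %s (%d bytes, %d passes)", path, size, passes)


# Usage: create a throwaway file, scrub it, confirm it is gone.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"name=Fred;address=1 Example Street")
os.close(fd)
scrub_file(demo_path)
still_exists = os.path.exists(demo_path)
```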

    2. Anonymous Coward

      Not really relevant for this issue...

      Whether you want to scrub it or permanently delete it, the challenge of finding the personally identifiable data in a backup is the same.

  3. Aladdin Sane

    Not my field of expertise

    But deleting on restore would seem to be the most logical way to go about this. My only question is, once you've "forgotten" about somebody, how do you remember to forget them on a restore?

    1. Anonymous Coward

      Re: Not my field of expertise

      Erase-on-restore is probably a nonstarter because it is technically trivial to *not* erase-on-restore, so the PII is still definitely available and identifiable. Likewise you've got to the root of the problem in that you need to be storing a unique (i.e. not anonymous) identifier to perform the erase-on-restore in the first place.

      Anonymisation of your backups through something like tokenisation or classic data mastering techniques is really your only option. If you delete the tokenisation key or the master record, the record in the backup becomes (to some extent) anonymous. However even this is thorny because simply removing explicit PII is not necessarily enough to anonymise the data. Depending on the data context it may be trivial to reconstruct the identity, even if all of the unique keys and identifying fields are now random garbage.

      Yes, this is hard. I suspect that, based on what the guidance eventually says, static/cold backups will have to be strictly time limited to a period less than what we're currently used to and justified as legitimate business purposes. As long as we're all perfectly clear with our data subjects that we're doing that, we should be fine.
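
The tokenisation idea in the comment above might look something like the following sketch. Everything here is hypothetical: the in-memory `vault` dict stands in for a token store that would have to live outside the long-lived backups (otherwise restoring the vault undoes the erasure), and `tokenise`, `forget` and `resolve` are names made up for the example.

```python
import secrets

# Hypothetical token vault: token -> identifying details. Assumed to be kept
# outside the long-lived backup set.
vault = {}


def tokenise(identity: dict) -> str:
    """Store the identifying fields in the vault and return a random token."""
    token = secrets.token_hex(16)
    vault[token] = identity
    return token


def forget(token: str) -> None:
    """Erase the link between a token and a person."""
    vault.pop(token, None)


def resolve(token: str):
    """Look up who a token refers to, or None if they have been forgotten."""
    return vault.get(token)


# What goes into the backup carries only the token, never the name or email.
token = tokenise({"name": "Fred Example", "email": "fred@example.com"})
backed_up_order = {"customer": token, "item": "tape drive", "postcode": "AB1 2CD"}

forget(token)
who = resolve(backed_up_order["customer"])  # the token no longer resolves
```

The `postcode` field is left in deliberately: it is the kind of residual context the comment warns about, since it can help re-identify someone even after the token is dead.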

      1. Adam 52 Silver badge

        Re: Not my field of expertise

        Storing a list of erased people is legitimate. There are plenty of reasons to do it (protection from non-compliance claims is the obvious one).

        Just because it's trivial not to erase on restore doesn't make it non-compliant. It's technically trivial to make your S3 bucket publicly visible, but as long as you don't do it you're OK.

      2. Doctor Syntax Silver badge

        Re: Not my field of expertise

        "Erase-on-restore is probably a nonstarter because it is technically trivial to *not* erase-on-restore"

        It's equally technically trivial to not act on the request in the first place. No difference.

        "If you delete the tokenisation key or the master record, the record in the backup becomes (to some extent) anonymous."

        How do you handle the restoration of the backup of the key?

    2. Chris 3

      Re: Not my field of expertise

      Keep a log of those people who have successfully requested deletion.

      If you restore a backup, re-run deletions from the time of the backup.

      That log would be covered by legitimate interest.
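
A minimal sketch of "re-run deletions from the time of the backup", assuming the log holds nothing but a record id and the time the erasure was carried out. The ids, dates and function names are invented for illustration.

```python
from datetime import datetime, timezone

# Hypothetical deletion log: (record_id, time the erasure was carried out).
deletion_log = [
    ("cust-1001", datetime(2018, 5, 26, tzinfo=timezone.utc)),
    ("cust-1002", datetime(2018, 6, 3, tzinfo=timezone.utc)),
    ("cust-1003", datetime(2018, 6, 20, tzinfo=timezone.utc)),
]


def deletions_to_replay(log, backup_taken_at):
    """Return ids erased at or after the time the backup was taken."""
    return [rid for rid, erased_at in log if erased_at >= backup_taken_at]


def reapply_deletions(restored, log, backup_taken_at):
    """Drop restored records whose erasure postdates the backup."""
    doomed = set(deletions_to_replay(log, backup_taken_at))
    return {rid: row for rid, row in restored.items() if rid not in doomed}


backup_taken_at = datetime(2018, 6, 1, tzinfo=timezone.utc)
restored = {
    "cust-1002": {"name": "Fred"},
    "cust-1003": {"name": "Wilma"},
    "cust-1004": {"name": "Barney"},
}
cleaned = reapply_deletions(restored, deletion_log, backup_taken_at)
```

Here `cust-1001` was erased before the backup was taken, so it is not in the restored data and does not need replaying.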

      1. really_adf

        Re: Not my field of expertise

        Keep a log of those people who have successfully requested deletion.

        If you restore a backup, re-run deletions from the time of the backup.

        That log would be covered by legitimate interest.

        Not sure your last point applies but I note only someone restoring data needs to be able to read the log and entries can be removed after the retention period for the data is reached.

        Seems like a pragmatic solution to me.
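
The point about removing entries once the retention period has passed can be sketched as a pruning step. The 90-day figure is purely an assumption for the example; the reasoning is that once every backup taken before an erasure has expired, the log entry can no longer match anything restorable.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: backups are kept for 90 days.
BACKUP_RETENTION = timedelta(days=90)


def prune_log(log, now):
    """Keep only entries that a still-existing backup could predate."""
    cutoff = now - BACKUP_RETENTION
    return [(rid, erased_at) for rid, erased_at in log if erased_at > cutoff]


now = datetime(2018, 9, 30, tzinfo=timezone.utc)
log = [
    ("cust-1001", datetime(2018, 5, 26, tzinfo=timezone.utc)),
    ("cust-1003", datetime(2018, 8, 20, tzinfo=timezone.utc)),
]
remaining = prune_log(log, now)
```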

    3. Anonymous Coward

      Re: Not my field of expertise

      > My only question is, once you've "forgotten" about somebody, how do you remember to forget them on a restore?

      One suggestion elsewhere that sounds reasonable is to one-way hash their identifiers (email, etc) and store those. The original identifiers can't be recovered, however newly restored records could then be hashed and compared. If any records match, nuke those ones.

      On a side note, it would be kind of amusing (not in a good way) to see a hash collision play out through various systems. And if someone's details have a hash collision on one system, there's a reasonable chance they'll also collide elsewhere too.
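
The hashed-identifier suggestion could be sketched as below. Two things here go beyond what the comment says and are assumptions: the identifier is normalised (trimmed, lower-cased) before hashing, and a keyed hash (HMAC-SHA-256 with a hypothetical `SECRET_KEY`) is used instead of a bare hash, because plain hashes of email addresses are easy to guess by brute force.

```python
import hashlib
import hmac

# Assumption: a secret key held only by the restore process.
SECRET_KEY = b"replace-with-a-real-secret"


def fingerprint(identifier: str) -> str:
    """One-way fingerprint of an identifier, normalised before hashing."""
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalised, hashlib.sha256).hexdigest()


# At erasure time: store only the fingerprint, then delete the record.
forgotten = {fingerprint("Fred@Example.com")}


def purge_restored(records):
    """Drop restored records whose email matches a stored fingerprint."""
    return [r for r in records if fingerprint(r["email"]) not in forgotten]


restored = [
    {"email": "fred@example.com", "name": "Fred"},
    {"email": "wilma@example.com", "name": "Wilma"},
]
kept = purge_restored(restored)
```

With a 256-bit digest an accidental collision is vanishingly unlikely; the collision worry applies mainly to short or truncated hashes.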

    4. biolo

      Re: Not my field of expertise

      In some cases it is possible. Say you've identified a row in an RDBMS that was covered by the RTBF request: if that row has a unique reference number (say customer or order ID), then you could add that unique reference number to your "future_forget" list. If the only way of identifying the row is by using the person's personal information, adding that to your "future_forget" list would have its own obvious GDPR problem, although you might be able to argue that that information was necessary in order to comply with GDPR and therefore lawful as long as you weren't using it to influence decisions. If the law requires you to retain info, then a GDPR request cannot compel you to delete it. Of course in this instance the data only exists because of the GDPR request, but surely you need to track RTBF requests, to show you have complied with them, and to do that you have to store the requester's personal info in your RTBF tracker. I think it's fair to say that this whole area is somewhat unclear.

    5. chivo243 Silver badge
      Boffin

      Re: Not my field of expertise

      @3 hrs Aladdin Sane

      "But deleting on restore would seem to be the most logical way to go about this. "

      That is what a GDPR readiness group told me a few weeks ago, and that a log of what's been deleted should be kept? Then that remaining data should be backed up again...

      Lovely!

    6. Anonymous Coward

      Re: Not my field of expertise

      I'm currently writing GDPR cleardown procedures for people that no longer need processing.

      When the cleardown is run, the unique id for the person is stored, and the date it was deleted. This cannot be tied to any personally identifying data by itself. Nothing else is retained.

      In the very unlikely event where a customer accidentally sets a person to have left years ago, dated some time ago, and the data are mistakenly deleted we can tell them when it was deleted. Can't get the data back, of course.

      1. Adam 52 Silver badge

        Re: Not my field of expertise

        "When the cleardown is run, the unique id for the person is stored, and the date it was deleted. This cannot be tied to any personally identifying data by itself. Nothing else is retained"

        This only works for very, very simple organisations. Most organisations will be worried about being sued or regulator investigations. Those organisations will have to store somewhere the identity associated with the unique id in order to defend themselves.

        As soon as you do that your data subject can be reidentified from information which can reasonably be expected to be... whatever the precise wording is.

        1. Anonymous Coward

          Re: Not my field of expertise

          This is for anything but a simple organisation. Data are held for third parties. They define personal data that are no longer required, the age of data to be cleared down, and when it should be activated. If the third party has a need to retain the data for their own internal or legal purposes, that is their choice. They know that once the clear down is activated, historic data cannot be recovered. The minimum retention period should be sufficient to comply with potential legal requirements, HMRC requests, and so on, if the third party has any sense.

          Remember the 'right to erasure' does not apply 'for the establishment, exercise or defence of legal claims'. Neither does it apply if the personal data is necessary 'for the purpose which you originally collected or processed it for'.

          I'm not involved in the legal side of the business, but the above seems to indicate that if there is a potential legal liability, data retention up until the limit of that liability is fine.

    7. Cynic_999

      Re: Not my field of expertise

      The solution is pretty obvious to me. Deletions are only performed on live (current) data, BUT a record or log of all such deletions is kept. If and when it is ever necessary to restore from a backup archive, that deletion log is used to immediately delete the same data on the newly restored records before going live with the restored media.

      Something that could be trivially automated so that it is applied automagically after any restore script is run.
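
A sketch of how the restore-then-purge step described above might be wired together, using an in-memory SQLite database. The table layout, the `restore_backup` stand-in and the `forgotten_ids` list are all hypothetical; a real restore would come from the backup product, and the deletion log would be stored somewhere the restore does not overwrite.

```python
import sqlite3


def restore_backup(conn):
    """Stand-in for a real restore: recreates the table as it was backed up."""
    conn.execute("DROP TABLE IF EXISTS customers")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Fred"), (2, "Wilma"), (3, "Barney")],
    )


def apply_deletion_log(conn, forgotten_ids):
    """Re-delete every logged id before the restored data goes live."""
    conn.executemany(
        "DELETE FROM customers WHERE id = ?", [(i,) for i in forgotten_ids]
    )


def restore_and_purge(conn, forgotten_ids):
    """Run the restore and the purge as a single step."""
    restore_backup(conn)
    apply_deletion_log(conn, forgotten_ids)
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical deletion log, kept outside the database being restored.
forgotten_ids = [2]
restore_and_purge(conn, forgotten_ids)
names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]
```

Bundling the two calls into `restore_and_purge` is the design point: nobody gets a restored database without the purge having run.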

    8. wcpreston

      Re: Not my field of expertise

      Well this IS my field of expertise, so I'll chime in. :)

      Many backup systems have added (or are in the process of adding) features to delete personal data from the backup -- to a certain extent. For example – depending on how your backup is stored – it is technically feasible to delete all spreadsheets, word processing files, or PDFs with a certain person's identifier in them. But asking that same backup product to delete the person's data from within a file or database – while keeping the rest of that file intact – is venturing into extremely dangerous waters. If that's what we're asking, I'll have to agree with the quote in the article from Linus Chang, "deleting data from a backup is a terrible idea because it risks corrupting the backup, breaking referential integrity, breaking applications that were expecting that data to be present, and importantly, breaking any checksums on the data that would prove that a restore was successful."

      That leaves the "delete on restore" option. My opinion is that a RTBF journal/database that stores ONLY the unique identifiers (and no other data) – while it sounds on the face a direct defiance of the RTBF – is the best way to ensure the person "stays forgotten" if there is ever a restore from older backups. It's even possible to have the backup system trigger the "make sure these people stay forgotten" process after a restore.

      The RTBF article of the GDPR says you can keep data required to defend against a legal claim. In addition to being used for this "make sure they stay forgotten" process, this database I'm proposing can also be used to prove when someone asked to be forgotten, when they were forgotten, etc, in case of a GDPR claim. In addition, the use I'm proposing is also to protect against a legal claim – that you said you forgot somebody that you ended up restoring from backup. Ergo, I think it should be OK to have a RTBF journal/database. I am not a lawyer and am not giving legal advice on GDPR. I'm just spitballing here.
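
The checksum concern in the Linus Chang quote above is easy to demonstrate. In this toy sketch the "backup image" is just a byte string and `digest` is a made-up helper; real backup formats are far more involved, but the effect is the same: redacting anything inside the image means the checksum recorded at backup time no longer verifies.

```python
import hashlib


def digest(blob: bytes) -> str:
    """SHA-256 checksum of a backup image."""
    return hashlib.sha256(blob).hexdigest()


# A toy "backup image" and the checksum recorded when it was written.
backup_image = b"id=1,name=Fred\nid=2,name=Wilma\n"
recorded = digest(backup_image)

# Redacting one person inside the image changes its checksum.
redacted_image = backup_image.replace(b"name=Fred", b"name=XXXX")
verifies_before = digest(backup_image) == recorded
verifies_after = digest(redacted_image) == recorded
```

Recomputing and re-recording the checksum after the edit would make verification pass again, but then it no longer proves the image is what was originally written.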

    9. Doctor Syntax Silver badge

      Re: Not my field of expertise

      My only question is, once you've "forgotten" about somebody, how do you remember to forget them on a restore?

      GDPR allows you to keep PII which is being held for a good reason. You couldn't, for instance, forget the delivery details of an order which is yet to be despatched. On this basis one should be able to hold the forget request until all the backups that the real data may be on have been superseded and wiped.

    10. bombastic bob Silver badge
      Devil

      Re: Not my field of expertise

      "how do you remember to forget them on a restore?"

      Ironically, you'll probably need to "remember" them in a "forget" data set. Google will have to "remember" them in a "forget" algorithm, too, so that search results don't show whatever it is that's supposed to be "forgotten". And don't even get started on www.archive.org [it's a GREAT resource for 'forgotten' data files and web pages that you might want to research, but are no longer available].

      I guess www.archive.org could just make it impossible for EU people to access it... or "remember" to "apply the 'forget filters'" if you have an EU IP address or something.

      I like the idea of GDPR in principle, but I think the actual mechanics of it will have too many unintended consequences [like for backups].

    11. spold Silver badge

      Re: Not my field of expertise

      >>> once you've "forgotten" about somebody, how do you remember to forget them on a restore?

      Tag the data subject records with a unique identifier (a meaningless but unique number - MBUN). When you forget someone and delete the records, that MBUN is no longer linked to an identity. Keep a list of all the MBUN identifiers whose records you forgot; when you restore, delete any records on your MBUN list.

      Of course, encrypt your backups - many authorities consider that in the event of a breach encrypted data is not a disclosure since the risk of re-identification is very small. Whether all this is sufficient is open to guidance and findings by individual DPAs but may well be a defensible position and unlikely to incur a huge fine.

  4. Anonymous Coward

    This is going to be interesting for marketing and mailshots to existing customers restored from a backup that asked for their data to be deleted.

    1. Adam 52 Silver badge

      You just re-delete as part of your restore process. Same as replaying a transaction log before bringing a database back up.

      1. AMBxx Silver badge

        You're assuming the log of those to be deleted has also been backed up.

    2. wcpreston

      I had a person reply to one of my Twitter comments that this exact thing happened to him. The company "forgot" him, then a while later he started getting emails from them. Upon investigation, they realized they had restored the marketing database to a point before he was forgotten. They had to institute the kind of process we're talking about.

  5. stungebag

    Not a problem

    I don't understand the problem. If a person's data is deleted then subsequent backups will not contain it. If it's ever necessary to restore from a backup taken prior to the deletion then later transactions, including the deletion, will be reapplied. Yes, it's theoretically possible to restore the backup and then do something nefarious, but if you're that sort of organisation you won't care about complying with GDPR in the first place.

    This is all being overthought, even though the Information Commissioner has repeatedly made it clear that enforcement will be appropriate to the organisation and circumstances and that those making an honest effort have nothing to fear.

    Maybe - just a thought - there are those trying to stir up GDPR FUD for financial gain. Oh, surely not?

    1. Aladdin Sane

      Re: Not a problem

      those making an honest effort have nothing to fear.

      Nurse! My sides!

    2. Duncan Macdonald

      Re: Not a problem - Not true

      This is another version of erase on restore. The problem is that an old copy of a database can be restored WITHOUT applying the later transactions (and this may well be the case for debugging a problem) in which case the person's data is accessible again.

      The question that the "right to be forgotten" legislation has to take into account is whether a commitment to delete a person's data if restored from a backup is sufficient.

      There is also the problem that an old backup of a user's files may contain personal data that was not identified in a search for such data because it was only in backups. (An example - a user had a spreadsheet with names and addresses that was deleted before GDPR came into force but which still exists on old backups.)

    3. rmason

      Re: Not a problem

      @Stungebag

      People keep backups. That's the bit you're missing.

      Not everyone is on one week/month retention then overwrite.

      We have old backups going back years. Deleting something from *all of them* would require significant time, and probably 5 or so old types of tape drive somehow being connected and brought back to life.

      1. Oliver Mayes

        Re: Not a problem

        "5 or so old types of tape drive somehow being connected and brought back to life."

        If it's that difficult for you to restore a backup, do you really have a backup?

        1. Anonymous Coward

          Re: Not a problem

          "do you really have a backup?"

          It depends on the requirements. If it's for a legal requirement with very infrequent and non urgent access then yes. There are specialist data recovery firms that maintain infrastructure to support just this type of access.

          There is a world of difference between having to retrieve something specific from a particular time and having to restore data from all of the backup tapes over a period of time. I've known multiple tape drives to be worn out during massive data disclosure exercises. Incidentally, if the tape movement logs show a bunch of your old tapes have been retrieved and updated, data disclosure becomes even more of a world of pain.

          1. Doctor Syntax Silver badge

            Re: Not a problem

            "If it's for a legal requirement with very infrequent and non urgent access then yes."

            If this means that the PII has to be retained for legal purposes then you're in the clear.

        2. Doctor Syntax Silver badge

          Re: Not a problem

          "If it's that difficult for you to restore a backup, do you really have a backup?"

          And why are you even keeping it that long?

    4. Anonymous Coward

      Re: Not a problem

      Where I work we're on an 11 year retention policy for every single piece of data. We have data stored on DDS DAT tapes, some of it is so old and has been deemed applicable to audit and must be kept. I remember an episode 2 years ago where the purchasing team had to source a working DAT tape drive from eBay or Craigslist or some such nonsense.

      This raises the question, how can I erase your data if I cannot read it back? I appreciate the arguments that "Ignorance is no excuse" and "It's your prerogative to ensure you can read backed up data", but those arguments will be tested at some point. You have Fred's data on a tape backup that you know you cannot dump in the bin but at the same time you can no longer read. What do you do? Break the law on retention or break the GDPR law? I suppose you pick the cheapest in terms of the fine and hope none of your customers find out! Ha ha!

      1. Doctor Syntax Silver badge

        Re: Not a problem

        "You have Fred's data on a tape backup that you know you cannot dump in the bin but at the same time you can no longer read."

        This raises questions about the sanity of the audit or about your failure to migrate the old data to new media once the old one becomes obsolete. It also raises the question of whether you have effectively forgotten everything on the old media already.

      2. Alan Brown Silver badge

        Re: Not a problem

        "Where I work we're on an 11 year retention policy for every single piece of data. We have data stored on DDS DAT tapes, some of it is so old and has been deemed applicable to audit and must be kept."

        I see this kind of misinterpretation of "kept" all the time.

        Keeping the data is not the same as keeping the media. Migrating backed up data to new media is critical to ensure that your backups actually remain accessible.

        If someone had to go find a drive on eBay, that's prima facie evidence that you haven't bothered actually periodically checking the integrity of those backups - which is a necessary requirement for any properly working backup system. As such the system should have a big red FAIL stamped on it.

        Two worst cases I can think of off the top of my head:

        1: The BBC micro and their Domesday book.

        2: The academic I work with who has a garage full of 1970s-era NASA 9-track tapes full of raw data from earth observation satellites he wants restored one day - for shits and giggles I found an outfit which can do it. They want over £250 per tape - not "because they can", but because the equipment is so fragile (and head wear such an issue) that it costs about that much just to keep it running (people scrounging old electronics to find working bits have to be paid)

        When I told him how much it'll cost, it put a dampener on his restoration plans. He'll never be able to afford it, but he won't admit defeat and bin the tapes either. Every year he delays his decision the per-tape cost continues to climb, and one day they won't be able to be restored at all. (My suspicion is that the original data is still online inside NASA somewhere anyway; they seldom discard things.)

    5. DavCrav

      Re: Not a problem

      "I don't understand the problem. If a person's data is deleted then subsequent backups will not contain it. If it's ever necessary to restore from a backup taken prior to the deletion then later transactions, including the deletion, will be reapplied. Yes. it's theoretically possible to restore the backup and then do something nefarious, but if you're that sort of organisation you won't care about complying with GDPR in the first place."

      GDPR means you shouldn't have that information any more. Not 'well, it's all the way at the back of the filing room, and I never go that far, so no worries, right?'

      GDPR and backups, without case law to guide, is an issue.

      1. wcpreston

        Re: Not a problem

        I'm not sure the filing room analogy works here. In this analogy, the filing room represents the main "production" copy of the data. And yes, you WOULD have to delete it in that case.

        But imagine, if you will, you had a scanned JPG of every piece of paper in each drawer, stored in a completely separate system. Continue to imagine that you don't have OCR, so you can't scan the contents of each JPG without physically pulling each one up, reading all the words in it, then moving on to the next image. Now imagine being asked to redact info from those JPGs without the ability to search them. Now you're a bit closer to what we're talking about here.

        I agree with your last comment. I look forward to further guidance from the ICO.

        1. DavCrav

          Re: Not a problem

          "I'm not sure the filing room analogy works here. In this analogy, the filing room represents the main "production" copy of the data. And yes, you WOULD have to delete it in that case.

          But imagine, if you will, you had a scanned JPG of every piece of paper in each drawer, stored in a completely separate system. Continue to imagine that you don't have OCR, so you can't scan the contents of each JPG without physically pulling each one up, reading all the words in it, then moving on to the next image. Now imagine being asked to redact info from those JPGs without the ability to search them. Now you're a bit closer to what we're talking about here."

          Just because it's difficult, maybe even impossible without deleting your backups, doesn't necessarily mean you don't have to do it under the law. If your backups are required by law, then you might have to restore them, delete the data, then re-archive.

          Nobody said that new legal requirements would be easy, but you know, people had two years to think about this. Why has it taken until a week after the law started for someone to say 'what about backups'?

          1. wcpreston

            Re: Not a problem

            We brought it up before. It's getting coverage now because the law is now in effect. Such is life with news.

            There are actually sections in the GDPR that speak to technical infeasibility and undue burden as a defense against certain requirements of the law. In addition, the need to keep the data for other valid business purposes is also a defense.

            As to what you're proposing (restore, delete, backup again) for every single request? The cost is so high that most companies would just pay the fine if the law were to be enforced that stringently. We're talking costs in the tens of millions every single time you get a request. My opinion is that is never going to happen. Not to mention the risk of doing something wrong and doing damage to the company.

            The ICO said they will provide guidance on this soon, and I for one am looking forward to it. I'm willing to bet the advice is going to be closer to what Robert Wassall said in the article. The data needs to not be accessible to production systems, not be used for any decisions, etc. To that I would add that a company must commit to deleting it if it ever DOES come out of the backup system via some kind of restore.

            My opinion so far.

            1. DavCrav

              Re: Not a problem

              "As to what you're proposing (restore, delete, backup again) for every single request? The cost is so high that most companies would just pay the fine if the law were to be enforced that stringently. We're talking costs in the tens of millions every single time you get a request."

              For now, it would cost a lot. My guess is that new backup software will be developed with extra functionality for GDPR and RTBF-based queries (which are of course different), and it will become 'best practice' to have that sort of backup, with the largest organizations being the first to be punished if they don't, and trickling down as technology is refreshed in companies.

              And all this will happen again (will it?) when ePrivacy Mk2 comes into force.

          2. Doctor Syntax Silver badge

            Re: Not a problem

            "Why has it taken until a week after the law started for someone to say 'what about backups'?"

            It hasn't.

    6. wcpreston

      Re: Not a problem

      I can only speak for myself. My blog was a response to comments I was seeing out there that suggested that GDPR RTBF was absolute and everyone should be able to delete personal data from production AND backup systems. So I suggested that wasn't possible given current technology (nor do I think full RTBF from backups is coming any time soon), and suggested the kind of process you mentioned in your comment. So I feel like I'm trying to clear up FUD, not create it.

    7. Doctor Syntax Silver badge

      Re: Not a problem

      " If it's ever necessary to restore from a backup taken prior to the deletion then later transactions, including the deletion, will be reapplied."

      You'd hope so but Murphy's Law can apply here.

  6. Dodgy Geezer Silver badge

    At some point....

    ...people will be asking the question: "Why exactly are we doing this? What is the cost/benefit ratio for this kind of work? Is this the best way to improve humanity's lot, or is it a completely excessive response to something which was a non-problem in the first place..."

    1. JimmyPage Silver badge
      Megaphone

      Re: Why exactly are we doing this?

      to comply with a regulatory framework which had to be imposed after "the industry" proved itself totally unable, unwilling, and increasingly unlikely to manage that itself.

      1. LeeH

        Re: Why exactly are we doing this?

        A regulation the EU apparatus have exempted themselves from https://www.express.co.uk/news/world/967585/gdpr-eu-personal-data-hack-leak-personal-data-brussels

    2. iron Silver badge

      Re: At some point....

      Because your non-problem has via Facebook and CA contributed to Brexit, Trump and god knows what else. Because we never gave these people consent to record our personal information, track and profile us for the purpose of showing adverts. Because given free rein the ethically challenged, Randian CEOs running most big IT firms will exploit you as far as they can just to gain an extra dollar. I could go on.

      You may not have recognised the problem but that does not mean there was no problem.

    3. CommanderGalaxian

      Re: At some point....

      @Dodgy Geezer "cost/benefit ratio"

      Pretty simple - for each offence you face being fined up to 4% of your global turnover. That's the sort of number that concentrates the mind of even the most gluttonous data hoarding spammers. The carrot approach didn't work, now time for the big stick.

  7. sitta_europea Silver badge

    I'm surprised nobody's yet mentioned HMRC.

    Business records, not unreasonably, will include personal information such as the billing address on an invoice.

    HMRC says "Thou shalt keep all this information for seven years."

    Customer says "Delete this information now".

    So who wins that one?

    1. Anonymous Coward
      Anonymous Coward

      Who wins ? - HMRC of course ....

      if they mandate something by law, it trumps GDPR (which will have exemptions for statutory cases anyway).

    2. Lee D Silver badge

      Consider a school.

      Current government advice for various data held is published in a nice compact table with type of data and years you need to keep it by law.

      Some of it brings the limit up to legally requiring storing certain personal data for 25 years after they were last a pupil. And, no, you can't anonymise it.

      As such, "right to be forgotten" for many such pieces of data is basically non-existent.

      Even financial records tend to hover about the 4-7 year mark for even the smallest business, and no, you can't just anonymise them (the taxman may have something to say about that should you be audited for, say, VAT or income tax for private contractors as in IR35).

      The right to be forgotten is a way off for most people, and requests are handled on an individual basis. But GDPR hasn't really considered it in terms of practical solutions.

      1. Chris 3

        I think you'll find...

        It's not

        > Some of it brings the limit up to legally requiring storing certain personal data for 25 years after they were last a pupil.

        But you should store it until the pupil is 25.

        1. Lee D Silver badge

          Re: I think you'll find...

          Depends what you read.

          Going by:

          https://www.education-ni.gov.uk/publications/disposal-records-schedule

          Then, yes. But other places give differing advice as that's the MINIMUM required (and some of that goes up until the pupil is aged 30!).

          If the table in that document isn't enough to convince government to set a single data retention standard, then nothing will.

        2. Paul IT

          Re: I think you'll find...

          In regards to Social Services retention periods...

          The following records are subject to statutory requirements:

          Case records relating to children who have been placed, to be retained until the 75th anniversary of the child’s birth or for 15 years after death if the child dies before age 18 (Arrangements for placement of Children (General) Regulations 1999). GDPR that one!

      2. iron Silver badge

        'But GDPR hasn't really considered it in terms of practical solutions.'

        Except it has. If you are required to keep a piece of data for legal reasons then you are allowed to keep it even in the face of a deletion request.

      3. fnusnu

        It's not an unqualified right to be forgotten...

        or did you forget this?

    3. katrinab Silver badge

      You keep the information, but don't use it for any purpose other than enabling HMRC to check whether you have paid the correct tax.

    4. ToddRundgrensUtopia

      One law does not trump another, so HMRC is a lawful basis for keeping/processing personal data

    5. Doctor Syntax Silver badge

      "So who wins that one?"

      Every time GDPR comes up we have to explain this all over again.

      E V E R Y bloody time!

      If anybody concerned with implementing GDPR compliance is still asking this sort of thing they're clearly out of their depth.

  8. HxBro
    Paris Hilton

    Seems an over complication

    1. Inform user you have encrypted backups that may hold data on them

    2. You can't delete the data in those backups as it's not technically feasible or practical, you can refuse a delete on these grounds.

    3. You will remove them after X months/years or whenever the backups go into rotation.

    4. If you do restore a backup that was taken before the user was removed from the data, just re-run the delete function.

    However to re-run the delete function again, you need to keep some personal data of who to delete, so in theory you can't delete them if you've forgotten them... Now I'm pretty good at forgetting things according to my wife, so I've deleted her and I now get excited to see a strange woman in my bed at night

    1. Aladdin Sane
      Trollface

      Re: Seems an over complication

      Just how strange is this woman?

  9. Chris Miller

    Gosh

    You mean legislators have no understanding of the technology they're legislating about?

    #sayitaintso

  10. anthonyhegedus Silver badge

    The data in the backup is never 'beyond use'. Isn't that the whole point of a backup? If someone's data is in a backup, then it's recoverable. Unless they implement this 'delete on restore' thing.

    1. Anonymous Coward
      Anonymous Coward

      I think it means beyond use in the sense that someone inspecting the raw backup data couldn't extract any meaningful information from it.

  11. Anonymous Coward
    Anonymous Coward

    I quite like the idea of technology having to adapt itself to human laws rather than the other way around. And obviously so here, because it's difficult, possibly costly, but by no means mathematically impossible (as is the case for Law Enforcement demanding magical decryption keys).

    1. Anonymous Coward
      Anonymous Coward

      "I quite like the idea of technology having to adapt itself to human laws"

      Good luck designing the gun that can't be used to commit a crime but otherwise functions as a gun.

      1. Aladdin Sane
        Terminator

        I think that if technology is adapting itself, GDPR will be the least of our worries.

  12. Valerion

    Snapshots

    Surely you wouldn't make them read/write and modify and go all horribly wrong like that as suggested?

    You'd do the delete on the primary instance, then take a new snapshot to replace the existing one with. Most places would do snapshot creation automated on a schedule anyway so it would probably just sort itself out overnight. You'd then just delete the old one.

    1. wcpreston

      Re: Snapshots

      No, I am NOT suggesting anyone do that with their snapshots. But your suggestion has another issue. The snapshot you're suggesting the company delete has other legitimate purposes. It's very common, for example, to keep all snapshots for 30, 60, 90 days, or even longer.

  13. Anonymous Coward
    Anonymous Coward

    I believe that up until around the late 80s or early 90s the examination boards for schools, colleges and universities just deleted all their records and binned the old exam papers a couple of years after the students did their exams. This led to a situation where people claimed that they had achieved a particular qualification but had lost their certificates, and the examination boards had to take their word for it and issue a new certificate with whatever grade they wanted on it.

    Back at the time I knew people who did this and went from average students to star pupils overnight.

  14. Alan Mackenzie
    Unhappy

    Serve them right!

    How come this is being presented as though it were a new problem? There's been a data protection act in Britain since, since, ..... was it 1983? I can't remember any more. That was 35 years ago, and it also contained the right to have false or outdated data deleted.

    Organisations and manufacturers have been arrogantly violating this legislation for decades, since respecting it would be expensive and inconvenient, and the legislation was toothless.

    Now I look forward to massive penalties to be suffered by those who've been holding unlawful backups for all these decades. It won't happen though. It never does with data protection. :-(

  15. Graham Lee

    Sometimes

    These backup vendors and data privacy consultants are trying to define what the right to erasure is but sometimes it's the broken heart that decides.

  16. Anonymous Coward
    Anonymous Coward

    Ivory tower IT

    I love the 'just delete on restore' answers - implying that the posters are working in wonderfully organised IT shops, with everything QA'd and stored on one monolithic central IT system. Maybe come down from the tower occasionally and meet the real world of personal data scattered in Excel spreadsheets, Word documents, pdfs and for all I know coded into C# objects.

    Now a sane person probably wouldn't lose much sleep over someone finding a list of invitees to a conference held 5 years ago, stored on a CD marked 'My C:\drive backup' - but so far there's little evidence of sanity around. The asylum's latest suggestion is a 'clear desk' policy, in case personal information is compromised, applied to workers who don't ever handle personal data - but I guess you can't be too careful - after all that scientific report you are referring to to do your job was written by a person, and that person's name is written on the cover, and where they work...won't anyone think of the children?

    1. stephanh

      Re: Ivory tower IT

      "meet the real world of personal data scattered in Excel spreadsheets, Word documents, pdfs and for all I know coded into C# objects."

      If that is how a company handles personal data, they will soon meet the real world of massive GDPR fines.

      That's why the more forward-looking organizations have spent the last two years changing from the "real world" you sketched to a world in which GDPR compliance is actually possible.

      1. Anonymous Coward
        Anonymous Coward

        Re: Ivory tower IT

        If it was HR data spread around the enterprise then any GDPR grief would be deserved - but your 'forward-looking' organisations either haven't been around for more than 2 years, have amazingly simple and tied down IT systems or are wilfully self deluded if they think they are on top of the 'list of names and emails in a spreadsheet/printout/filing cabinet' that seems to exercise the GDPR purists so much. Or they just took a 'nuke it all from orbit' approach, which might work if the past means nothing to you - might not work so well if your business is all about managing data and knowledge that has been collected in the past.

    2. Doctor Syntax Silver badge

      Re: Ivory tower IT

      "Maybe come down from the tower occasionally and meet the real world of personal data scattered in Excel spreadsheets, Word documents, pdfs and for all I know coded into C# objects."

      If this is the primary data storage then they have other problems already. If this is secondary storage - look for it particularly in Sales and Marketing or possibly HR - it needs to be dealt with. Audit the business and delete any of it you find. Permanently. Even if it means going through old file system backups (not the same problem as RDBMS as regards data integrity). In the real world it's this sort of secondary storage in the hands of users that's most likely to cause damage.

    3. Simon Manning
      WTF?

      Re: Ivory tower IT

      This reminds me of a time, recently, when my partner showed me some video or photographs of an event held in the assembly hall of the School in which she works. I asked what the white rectangles were on every single piece of child's artwork on the walls of the room.

      Her answer blew my mind; "It's the name and class of the child-artist, we have to cover them for GDPR and child-protection reasons..."

      ...wtf?

  17. Ken Hagan Gold badge

    Is this GDPR or Right to Be Forgotten?

    In recent weeks I've seen quite a lot of people pointing out that "data necessary to provide the service" is exempt from GDPR. A reliable backup system is *definitely* fairly and squarely in that category, particularly if there are legal consequences to delivering wrong answers. So I can see no GDPR angle on this.

    As for the right to be forgotten, well, IANAL but wasn't all this discussed at length some weeks or months ago?

    1. Doctor Syntax Silver badge

      Re: Is this GDPR or Right to Be Forgotten?

      "As for the right to be forgotten, well, IANAL but wasn't all this discussed at length some weeks or months ago?"

      Weeks and months ago. And still we have numpties crawling out of the woodwork asking about which law trumps which when storage is legally mandated.

  18. Zippy's Sausage Factory
    Meh

    I'm comfortable with people wiping my data from a production system, and keeping it in backup, provided those backups don't stay around forever and aren't sent to the NSA at end of life...

    Paranoid? Moi? Well, I prefer "Sabotage", or "Volume 4" actually, but "War Pigs" is still a great song...

  19. Anonymous Coward
    Anonymous Coward

    Those without sin

    GDPR feels like the age of prohibition. Laws created without application to reality.

    Even the well-intentioned can readily be made into criminals.

    "Forget" about this going well ...or I will sue.

  20. LeeH
    Mushroom

    I first wrote about the issue of data backups re GDPR a year or so ago and many times since. Amazing how many times I wrote about it and yet it is only getting attention now. I mentioned it in forums here on El Reg (a well read tech site), on Facebook (a well read social site), in my own blog posts on my own sites (sites that attract techies), on Quora in answer to GDPR and EU related questions, in other places around the web, to my clients (who understood my advice), and to every day people when GDPR came up for discussion. I doubt I am the only person to consider the implications of GDPR. What did I get for my reward? Take a guess.

    This is not a newly recognised issue. It is an obvious issue that has been willfully ignored. How did people miss this?

  21. Craigie

    easy deletey?

    ' It will automatically delete any references to record X so there aren’t any referential integrity problems'

    Does anyone seriously enable cascading deletes on production databases?

    1. Duncan Macdonald

      Re: easy deletey? - unfortunately YES

      Then they need to recover from backups!!!

      (Just imagine the result of DELETE FROM CUSTOMERS or for even more chaos DROP TABLE CUSTOMERS)
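For anyone who hasn't watched a cascade in action, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables are made up for illustration and are not taken from the article or from any particular product. It shows both the tidy case (one customer erased, their orders follow) and the slip described above (a DELETE with no WHERE clause taking every dependent row with it).

```python
import sqlite3

# Hypothetical two-table schema, in-memory so nothing real is touched.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL
               REFERENCES customers(id) ON DELETE CASCADE,
           item TEXT)"""
)
conn.execute("INSERT INTO customers VALUES (1, 'X'), (2, 'Y')")
conn.execute(
    "INSERT INTO orders VALUES (10, 1, 'widget'), (11, 1, 'gadget'), (12, 2, 'gizmo')"
)
conn.commit()

# Intended use: erase customer 1; their two orders are removed automatically.
conn.execute("DELETE FROM customers WHERE id = 1")
orders_after_erasure = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# The slip: no WHERE clause, so every remaining order goes as well.
conn.execute("DELETE FROM customers")
orders_after_slip = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Whether the cascade is a convenience or a hazard depends on who is allowed to run the second statement.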

  22. Anonymous Coward
    Anonymous Coward

    Does anyone use WayBack Machine?

    Example: "I want you to delete the page in Byte Magazine from 1998 which has a really awful review of some software which my company created, and which mentions me - by name - in an interview."

    *

    Good luck with that!

    1. Simon Manning

      Re: Does anyone use WayBack Machine?

      Would that be considered PII? After all if the article only states; "...this piece of tat, coded by John Smith..." then there would likely be many John Smiths alive at the time of writing...

  23. SGJ

    Technically difficult "is not going to wash"

    If the ICO is now of the view that compliance being technically difficult "is not going to wash" isn't it about time she acted against the Home Office's refusal to remove mug shots of innocent people from Police Databases?

    see https://www.theregister.co.uk/2017/02/25/custody_images_review/

  24. handleoclast

    Backups aren't the problem

    Really they're not. Well, not technically. Legislatively, perhaps.

    It's restoring them that is the problem. Or if it's backup to disk, then mere access can be a problem.

    Luckily, some bright spark mentioned in the article thought of that:

    The only practical thing to do is to detect and erase the information on restore, he suggested, which would be a big task but, in principle, doable.

    Erm, yeah, but I've deleted everything about Joe Bloggs of Wankstain, Essex, including his request to be deleted. So how do I know not to restore him?

    And once you've worked that one out, my favourite backup tool is rsync. Because it's bloody fast. You can even backup/restore an 80G server remotely over a shitty ADSL line in an hour (as long as the data on the server doesn't change much). If you want me to filter out Joe Bloggs from the restore then that is going to turn something fast into something slow, or at least require me to access his details in the backup so I can delete them manually before I do the restore, which legally I am probably not allowed to do. Also, can I do a full restore and delete him before I make the data live?

    The devil's in the details.

    1. Doctor Syntax Silver badge

      Re: Backups aren't the problem

      "Erm, yeah, but I've deleted everything about Joe Bloggs of Wankstain, Essex, including his request to be deleted."

      Two points. If you have some central record ID and that gets used as a foreign key in every other table affected then retain that foreign key. Otherwise retain the request. It will be needed to re-delete on restore. Without it you can't do as he asked, so if you deleted it you were doing it wrong.
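A rough sketch of that "retain the ID, re-delete on restore" idea, using Python's built-in sqlite3 module. The `customers` table, the `erasure_log` list and the `reapply_erasures` helper are all hypothetical names invented for this example. The assumption is that the log of erased record IDs lives somewhere outside the backup being restored, since that backup predates the request.

```python
import sqlite3

def reapply_erasures(conn, erased_ids):
    """Delete every logged record ID from a freshly restored database.

    Returns the number of rows removed, which is worth keeping for audit.
    """
    removed = 0
    for customer_id in erased_ids:
        cur = conn.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        removed += cur.rowcount
    conn.commit()
    return removed

# Simulate a restore from a backup taken before customer 2 asked to be erased.
restored = sqlite3.connect(":memory:")
restored.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
restored.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Alice"), (2, "Joe Bloggs"), (3, "Carol")],
)

# The retained log holds only the record ID, not the name or address.
erasure_log = [2]

removed = reapply_erasures(restored, erasure_log)
remaining = [row[0] for row in restored.execute("SELECT id FROM customers ORDER BY id")]
```

The step has to run before the restored data goes live, and it only works while record IDs stay stable between the backup and the current schema.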

  25. Claptrap314 Silver badge

    Ideology over mathematics

    As an American, I've pointedly avoided most of the GDPR discussions. But this discussion has most of the commenters sounding like LEAs on encryption.

    Consider the following scenario:

    Company A has PII on individual X.

    Company A dutifully keeps backups of data.

    Company A is merged into company B.

    Company B dutifully maintains company A's backups.

    Company A's data bases are migrated to company B's schemata.

    Individual X applies to company B to be forgotten.

    The data in company A's backups is not indexed in any meaningful way in the current schema. A restore of this data cannot be automatically purged.

    Or how about this?

    A company acquires a dataset, and backs it up.

    The company merges the dataset into its existing databases.

    A de-dupe process is run on the merged data.

    Someone demands erasure.

    Again, the de-dupe and merge processes make automatic deletion of restored data effectively impossible.

    And these are about perfectly run shops. Real world is going to have much more trouble. Criminalizing less-than-perfect behavior is not going to encourage innovation. "Best effort" is really the only standard that can work. Unless you love selective enforcement.

    1. Doctor Syntax Silver badge

      Re: Ideology over mathematics

      "The data in company A's backups is not indexed in any meaningful way in the current schema"

      You've merged the data into B's schema. Why are you keeping backups you can't use?

      "Again, the de-dupe and merge processes make automatic deletion of restored data effectively impossible."

      Why is it impossible? Haven't you indexed it? On de-dupe you already deleted an entry so why should deletion of another be a problem?

      Both your examples are, in fact the same: merged data sets. If the merged data set is usable it would need proper indexing and should, therefore, be possible to delete as required.

      1. Claptrap314 Silver badge

        Re: Ideology over mathematics

        I'm talking about backups of the pre-merged data. Identification of pre-merged data relevant to a deletion demand requires tracing the current data back through the merge process. But the merge process itself is not a process that is supposed to be reversible. Scripting an ill-defined process...is problematic.

  26. EnviableOne

    Article 17 AKA RTBF is Qualified

    Usually it is only applicable if processing is based on consent.

    So providing you are storing data on a legal, contractual or other allowed basis (not requiring the individual's consent) and only retaining it for as long as is necessary, then RTBF does not apply.

  27. David McCarthy

    The statement by the ICO spokesperson "data protection law is technology-neutral" is disingenuous.

    The European clots who put the Regulation together are, at best, technologically ignorant, and at worst not doing their job properly by creating regulations which cannot be complied with and enforced.

    Our new Policy says we won't delete backups because they are our DR lifeline, and as a micro-business we don't have the ability or resources to do anything different.

    So we've made it perfectly clear to anyone who wants to be forgotten. What more can we do?

  28. Nanashi

    I can think of an obvious way to handle this:

    Step 1: when backing up a user's data, encrypt it with a unique key for that user.

    Step 2: when a request-to-forget comes in, delete the key.

    Your backups will remain immutable but the user's data is inaccessible.

    Of course you'll need some system to manage all the encryption keys and that system is going to need its own backups, so this isn't a completely trivial solution, but encryption keys are small and it should be much easier to design a non-immutable backup system for them (especially since the keys themselves aren't PII, and you don't need to keep historical backups of them, just backups of the currently-valid set).
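Here is a small sketch of the two steps, in Python with the standard library only. Loud assumption: the cipher is a stand-in (a SHAKE-256 keystream XORed with the data, with no authentication) chosen purely so the example runs without extra packages. A real system would use a vetted authenticated cipher such as AES-GCM from a proper crypto library. The dictionary playing the key store and all the function names are hypothetical.

```python
import hashlib
import secrets

def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stand-in cipher for illustration only; XOR with the same keystream
    # both encrypts and decrypts.
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

def encrypt_for_user(keys: dict, user_id: str, plaintext: bytes) -> bytes:
    # Step 1: each user gets their own random key, created on first use.
    key = keys.setdefault(user_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)
    return nonce + _keystream_xor(key, nonce, plaintext)

def decrypt_for_user(keys: dict, user_id: str, blob: bytes) -> bytes:
    key = keys[user_id]  # raises KeyError once the user has been forgotten
    nonce, body = blob[:16], blob[16:]
    return _keystream_xor(key, nonce, body)

def forget_user(keys: dict, user_id: str) -> None:
    # Step 2: deleting the key is the only mutation; the backup is untouched.
    keys.pop(user_id, None)

keys = {}  # the small, mutable key store
backup = {  # the large, immutable backup
    "joe": encrypt_for_user(keys, "joe", b"Joe Bloggs, Essex"),
    "ann": encrypt_for_user(keys, "ann", b"Ann Other, Kent"),
}

forget_user(keys, "joe")

ann_plain = decrypt_for_user(keys, "ann", backup["ann"])
try:
    decrypt_for_user(keys, "joe", backup["joe"])
    joe_readable = True
except KeyError:
    joe_readable = False
```

Joe's ciphertext is still sitting in the backup, but without the key nothing can turn it back into his details. The hard part, as noted above, is making sure no old copy of the key store survives either.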

  29. patrickiest

    Some backup vendors DO know what they're backing up and CAN take action

    The industry sources may have been somewhat limited. Was very surprised to read the summary position: "Vendor software across the board doesn't know what it is backing up and won't necessarily easily or practicably find the data the subject has requested to be erased, according to industry sources.".

    Some vendors, such as Commvault, have considerable discovery and insight into the contents of data being backed up, including personal data, across their enterprises. They also have the ability to use that profiling to

    * enact and automate data policies

    * flag content for review

    * support prioritization of risk management according to various risk profiles and criticalities

    * perform proactive risk mitigation on the data from backups and directly from the data source

    * and support data subject requests including the right to access and right to be forgotten (with removal from backup sets and data sources).

    Points well taken from the article though. Many customers are challenged with how to deal with both the discovery and remediation of personal data within their environments. The market has some interesting options, so don't lose hope!

  30. jake_the_snake

    Backups are not archives

    I think that most people miss the key distinction between backups and archives. As other comments point out, backups are generally not indexed at a granular level. Archives are.

    Backups are an operational tool that allows for recovery in a disaster. As such, there should be no need to keep backups for longer than you would need to recover from in such a disaster.. say a month or two.

    Archives are generally kept online and provide fast access to specific records.

    If this regime is followed then the right to be forgotten is simple to implement. You delete or mask records in the archive and wait for backups to expire. And the records are gone in the 1 to 2 month period.
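The arithmetic behind "wait for backups to expire" can be sketched in a few lines of Python. The backup dates, the 60-day retention period and the `last_backup_expiry` helper are invented for illustration. The assumption is that only backups taken before the deletion date still contain the record.

```python
from datetime import date, timedelta

def last_backup_expiry(backup_dates, deleted_on, retention_days):
    """Return the date the last backup still holding the record expires.

    Returns None if no backup predates the deletion.
    """
    holding = [d for d in backup_dates if d < deleted_on]
    if not holding:
        return None
    return max(holding) + timedelta(days=retention_days)

# Weekly backups through May 2018, record deleted from the archive on 25 May.
weekly_backups = [date(2018, 5, day) for day in (1, 8, 15, 22, 29)]
gone_by = last_backup_expiry(weekly_backups, date(2018, 5, 25), retention_days=60)
```

With these numbers the last backup holding the record is the one from 22 May, so the date you can quote to the data subject is 60 days after that. With a retention period of years or "forever", the same sum gives a much less comfortable answer.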

    1. Alan Brown Silver badge

      Re: Backups are not archives

      "Backups are an operational tool that allows for recovery in a disaster. As such, there should be no need to keep backups for longer than you would need to recover from in such a disaster.. say a month or two."

      You'd think so, but we get a constant trickle of people asking to restore a file they realise has been deleted and find that it went away a year ago, or need to go back to an old dataset for some reason.

      None of this is PII stuff, but it raises another problem in many backup environments where policy is set to avoid backing up PII, but users do stupid things and PII data gets placed in the areas which _are_ being backed up.

      The counterpoint to this is when non-PII/personal data (such as statistical data or source code) is placed in personal space and someone else needs to access it long after the person in question has left. Not enough organisations have procedures in place to ensure that "business" data is not locked away or lost in this manner.

  31. Anonymous Coward
    Anonymous Coward

    Backups are GDPR compliant through retention.

    I work for a national health service servicing 2,8 million patients.

    We comply with GDPR, and any delete requests, through the backups' retention period.

    In other words, when the backup images expire, the data is gone.

    1. wcpreston

      Re: Backups are GDPR compliant through retention.

      Waiting for stuff to expire from backups is a great response. The problem is too many companies keep backups for years, decades, or even forever.

  32. Exportgoldman

    What about ex staff?

    Curious if this law also covers ex-staff, and if so does that mean an ex-employee can force a company to forget all their work on leaving? Chaos :)

    1. MdeB

      Re: What about ex staff?

      It amazes me that people are posting here without understanding the GDPR rules.

      "right to be forgotten" applies only to data that are held because the data subject has consented.

      It does not apply to data for which the organisation has (and has notified) another legal basis for processing the data.

      Thus complying with a request to delete personal data is not as simple as deleting all that subject's personal data (or making it anonymous by deleting the subject's name from the master index as some have suggested); it requires deleting a subset of data held. That also means that maintaining a list of requests to remove personal data IS allowed, because it is necessary to allow audit to show that you have complied with the request (and hence complied with the law, should it come to that).

  33. Claptrap314 Silver badge

    Yes and no. You CAN rewrite the blockchain if you get a majority of miners to agree to it. This was done, for instance, with the Ethereum breach. Doing it with an internal blockchain would not be that different from deleting data out of a git repo's history.
