The glorious uncertainty: Backup world is having a GDPR moment

"The right to erasure is not absolute," the UK Information Commissioner's Office told us as the question of the backup tech industry's exposure to the EU's General Data Protection Regulation was raised in the week after it came into force. The concerns Just last week, Curtis Preston, chief technical architect at Druva, raised …

  1. Anonymous Coward
    Anonymous Coward

    Blockchain ?

    Worth noting that once data is in a blockchain, it's there forever, unless the blockchain was designed to remove data before creation.

    The best you can do is post a correction/addendum later on.

    1. Anonymous Coward
      Facepalm

      Re: Blockchain ?

      "Worth noting that once data is in a blockchain, it's there forever, unless the blockchain was designed to remove data before creation."

      Yes, all the banks and other people getting excited by blockchains recently mostly haven't considered this at all. Other than suggesting that blockchains should be exempt from GDPR!

      https://www.theverge.com/2018/4/5/17199210/blockchain-coin-center-gdpr-europe-bitcoin-data-privacy

      1. Anonymous Coward
        Anonymous Coward

        Re: banks ... haven't considered this at all.

        You know why ? Because it costs. Just like DDA requirements (which they also suggested shouldn't apply to them going back).

        1. Anonymous Coward
          Joke

          Re: banks ... haven't considered this at all.

          That's ok, just try asking the Register to forget all our private information related to the existence of reality... oh, I know some have already forgotten reality exists... ;)

    2. Claptrap314 Silver badge

      Re: Blockchain ?

      No. A majority of miners can rewrite the chain. This was done, for instance, in response to the Ethereum hack. For an internal blockchain, a rewrite is more-or-less trivial.

      Deleting data would be very similar, I believe, to purging proprietary data out of a git repo.

      Hard, but finite.
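
      A minimal sketch of why such a rewrite is "hard, but finite", assuming a toy hash-linked chain rather than any real blockchain or git implementation (the names `Block`, `build_chain`, `is_valid` and `redact` are invented for illustration). Changing one block changes its hash, so every later block has to be re-linked:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    payload: str
    prev_hash: str

    @property
    def hash(self) -> str:
        # Each block's hash covers its payload and its parent's hash.
        return hashlib.sha256((self.prev_hash + self.payload).encode()).hexdigest()


def build_chain(payloads):
    chain, prev = [], "0" * 64
    for payload in payloads:
        block = Block(payload, prev)
        chain.append(block)
        prev = block.hash
    return chain


def is_valid(chain):
    # Valid means every block points at the current hash of its parent.
    prev = "0" * 64
    for block in chain:
        if block.prev_hash != prev:
            return False
        prev = block.hash
    return True


def redact(chain, index, replacement="[erased]"):
    """Replace one payload and re-link every later block.

    Returns the number of blocks whose hash changed.
    """
    chain[index].payload = replacement
    prev = chain[index].hash
    for block in chain[index + 1:]:
        block.prev_hash = prev
        prev = block.hash
    return len(chain) - index


chain = build_chain(["alice pays bob", "fred's address", "bob pays carol"])
old_tip = chain[-1].hash
rewritten = redact(chain, 1)
```

      The work is proportional to the number of blocks after the edit. In a real public chain the expensive part is not the rehashing shown here but getting the other participants to accept the rewritten history.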

  2. Anonymous Coward
    Anonymous Coward

    A minor point but doesn't Article 17 use the word erase, not delete?

    My understanding is that just changing the data to something useless ("wibble" would be my weapon of choice) complies with the letter and spirit of the regulation.

    I can't think of a situation where erasure would be accomplished better by obfuscation than deletion but my point is the rules don't say you have to delete.

    1. Anonymous Coward
      Anonymous Coward

      In the context of data, erase == delete

      erase

      VERB

      [WITH OBJECT]

      1 Rub out or remove (writing or marks)

      ‘graffiti had been erased from the wall’

      1.1 Remove all traces of; destroy or obliterate.

      ‘over twenty years the last vestiges of a rural economy were erased’

      ‘the magic of the landscape erased all else from her mind’

      1.2 Remove recorded material from (a magnetic tape or medium); delete (data) from a computer's memory.

      ‘the tape could be magnetically erased and reused’

      ‘the file has been erased from the hard disk’

      1. cream wobbly

        Beware of the Leopard?

        Not really feeling it, are you?

        In computing jargon, erasure is as simple as unlinking. The data's still there, but there's no direct path to its retrieval. It's the electronic version of "in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'."

        In legal jargon, erasure has a very definite meaning, closer to the common sense concept of erasure: the data's gone. This type of erasure in computing jargon means scrubbing; whether that's overwriting with zeroes, or "dd </dev/urandom >/that/guy" in a loop fifteen times with an upside-down chicken on your left elbow.

        Of course if you scrub something, you should log it...
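
        The difference between unlinking and scrubbing can be sketched roughly as follows. This is an illustration only: the function names are made up, a single pass of zeroes stands in for whatever overwrite policy you prefer, and on SSDs or copy-on-write filesystems an in-place overwrite is not guaranteed to land on the original blocks.

```python
import logging
import os
import tempfile

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scrub")


def unlink_only(path):
    # "Erasure" in the weak sense: the directory entry goes, the blocks may remain.
    os.remove(path)


def scrub(path, passes=1):
    """Overwrite the file's bytes in place, unlink it, and log that it happened."""
    size = os.path.getsize(path)
    with open(path, "r+b") as fh:
        for _ in range(passes):
            fh.seek(0)
            fh.write(b"\x00" * size)
            fh.flush()
            os.fsync(fh.fileno())
    os.remove(path)
    # Log the fact of the scrub, not the content that was scrubbed.
    log.info("scrubbed %d bytes in %d pass(es)", size, passes)
    return size


fd, path = tempfile.mkstemp()
os.write(fd, b"that guy's personal data")
os.close(fd)
scrubbed = scrub(path)
```

        Note that the log line records only the size and pass count; a scrub log that repeated the erased data would rather defeat the point.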

    2. Anonymous Coward
      Anonymous Coward

      Not really relevant for this issue...

      Whether you want to scrub it or permanently delete it, the challenge of finding the personally identifiable data in a backup is the same.

  3. Aladdin Sane

    Not my field of expertise

    But deleting on restore would seem to be the most logical way to go about this. My only question is, once you've "forgotten" about somebody, how do you remember to forget them on a restore?

    1. Anonymous Coward
      Anonymous Coward

      Re: Not my field of expertise

      Erase-on-restore is probably a nonstarter because it is technically trivial to *not* erase-on-restore, so the PII is still definitely available and identifiable. Likewise you've got to the root of the problem in that you need to be storing a unique (i.e. not anonymous) identifier to perform the erase-on-restore in the first place.

      Anonymisation of your backups through something like tokenisation or classic data mastering techniques is really your only option. If you delete the tokenisation key or the master record, the record in the backup becomes (to some extent) anonymous. However even this is thorny because simply removing explicit PII is not necessarily enough to anonymise the data. Depending on the data context it may be trivial to reconstruct the identity, even if all of the unique keys and identifying fields are now random garbage.

      Yes, this is hard. I suspect that, based on what the guidance eventually says, static/cold backups will have to be strictly time limited to a period less than what we're currently used to and justified as legitimate business purposes. As long as we're all perfectly clear with our data subjects that we're doing that, we should be fine.
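
      As a rough sketch of the tokenisation idea above (the `TokenVault` class and the field names are hypothetical, and a real vault would be a separate, access-controlled store rather than a dict): the backup only ever holds a random token, and destroying the vault entry cuts the link back to the person.

```python
import secrets


class TokenVault:
    """Hypothetical token store kept outside the long-term backups."""

    def __init__(self):
        self._by_token = {}

    def tokenise(self, value):
        token = secrets.token_hex(8)
        self._by_token[token] = value
        return token

    def resolve(self, token):
        # None means the mapping has been destroyed.
        return self._by_token.get(token)

    def forget(self, token):
        self._by_token.pop(token, None)


vault = TokenVault()
token = vault.tokenise("fred@example.com")

# The backup only ever sees the token, plus whatever non-key fields remain.
backup_row = {"customer": token, "postcode": "AB1 2CD", "order": "left-handed widget"}

vault.forget(token)
```

      After `forget`, the token in `backup_row` resolves to nothing, but the postcode and order are still there, which is exactly the re-identification risk described above.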

      1. Adam 52 Silver badge

        Re: Not my field of expertise

        Storing a list of erased people is legitimate. There are plenty of reasons to do it (protection from non-compliance claims is the obvious one).

        Just because it's trivial not to erase on restore doesn't make it non-compliant. It's technically trivial to make your S3 bucket publicly visible, but as long as you don't do it you're OK.

      2. Doctor Syntax Silver badge

        Re: Not my field of expertise

        "Erase-on-restore is probably a nonstarter because it is technically trivial to *not* erase-on-restore"

        It's equally technically trivial to not act on the request in the first place. No difference.

        "If you delete the tokenisation key or the master record, the record in the backup becomes (to some extent) anonymous."

        How do you handle the restoration of the backup of the key?

    2. Chris 3

      Re: Not my field of expertise

      Keep a log of those people who have successfully requested deletion.

      If you restore a backup, re-run deletions from the time of the backup.

      That log would be covered by legitimate interest.

      1. really_adf

        Re: Not my field of expertise

        "Keep a log of those people who have successfully requested deletion.

        If you restore a backup, re-run deletions from the time of the backup.

        That log would be covered by legitimate interest."

        Not sure your last point applies but I note only someone restoring data needs to be able to read the log and entries can be removed after the retention period for the data is reached.

        Seems like a pragmatic solution to me.
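
        The log-and-replay approach discussed here, including dropping entries once the retention period has passed, might look something like this. It is a sketch under stated assumptions: records are keyed by an opaque ID, the log holds only that ID and the erasure date, and the names `replay_deletions` and `prune_log` are invented.

```python
from datetime import date, timedelta

# Hypothetical deletion log: (opaque record ID, date the erasure was carried out).
deletion_log = [
    ("cust-1001", date(2018, 3, 1)),
    ("cust-1002", date(2018, 6, 10)),
    ("cust-1003", date(2018, 7, 2)),
]


def replay_deletions(restored, log, backup_taken):
    """Re-run every deletion carried out on or after the backup was taken."""
    doomed = {rec_id for rec_id, erased_on in log if erased_on >= backup_taken}
    return {rec_id: row for rec_id, row in restored.items() if rec_id not in doomed}


def prune_log(log, today, backup_retention):
    """Drop entries older than the oldest backup that could still be restored."""
    oldest_backup = today - backup_retention
    return [(rec_id, erased_on) for rec_id, erased_on in log if erased_on >= oldest_backup]


restored = {"cust-1002": {"plan": "gold"}, "cust-1004": {"plan": "basic"}}
live = replay_deletions(restored, deletion_log, backup_taken=date(2018, 6, 1))
kept = prune_log(deletion_log, today=date(2018, 8, 1), backup_retention=timedelta(days=90))
```

        Once every backup taken before an erasure has aged out, no remaining backup can contain that record, so the log entry itself can go.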

    3. Anonymous Coward
      Anonymous Coward

      Re: Not my field of expertise

      > My only question is, once you've "forgotten" about somebody, how do you remember to forget them on a restore?

      One suggestion elsewhere that sounds reasonable, is to one way hash their identifiers (email, etc) and store those. The original identifiers can't be recovered, however newly restored records could then be hashed and compared. If any records match, nuke those ones.

      On a side note, it would be kind of amusing (not in a good way) to see a hash collision play out through various systems. And if someone's details have a hash collision on one system, there's a reasonable chance they'll also collide elsewhere too.
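
      A minimal sketch of the hashed-identifier idea, with one deliberate change from the suggestion as stated: it uses a keyed hash (HMAC-SHA256) rather than a bare one, because an unkeyed hash of an email address can be reversed by simply hashing guesses. The key, the normalisation rule and the function names are all assumptions.

```python
import hashlib
import hmac

# Assumption: a secret key held outside the backups.
SECRET_KEY = b"replace-with-a-real-secret"


def fingerprint(identifier):
    # Normalise first so "Fred@Example.com" and "fred@example.com" match.
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalised, hashlib.sha256).hexdigest()


# Only fingerprints are stored, never the identifiers themselves.
forgotten = {fingerprint("Fred@Example.com")}


def purge_restored(records):
    """Drop restored records whose email fingerprint is on the forgotten list."""
    return [r for r in records if fingerprint(r["email"]) not in forgotten]


restored = [
    {"email": "fred@example.com", "name": "Fred"},
    {"email": "wilma@example.com", "name": "Wilma"},
]
survivors = purge_restored(restored)
```

      With a full 256-bit digest an accidental collision is vanishingly unlikely; the collision worry is more realistic if the hash is short or truncated.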

    4. biolo

      Re: Not my field of expertise

      In some cases it is possible. Say you've identified a row in an RDBMS that was covered by the RTBF request: if that row has a unique reference number (say customer or order ID), then you could add that unique reference number to your "future_forget" list. If the only way of identifying the row is by using the person's personal information, adding that to your "future_forget" list would have its own obvious GDPR problem, although you might be able to argue that that information was necessary in order to comply with GDPR and therefore lawful as long as you weren't using it to influence decisions. If the law requires you to retain info, then a GDPR request cannot compel you to delete it. Of course in this instance the data only exists because of the GDPR request, but surely you need to track RTBF requests, to show you have complied with them, and to do that you have to store the requester's personal info in your RTBF tracker. I think it's fair to say that this whole area is somewhat unclear.

    5. chivo243 Silver badge
      Boffin

      Re: Not my field of expertise

      @3 hrs Aladdin Sane

      "But deleting on restore would seem to be the most logical way to go about this. "

      That is what a GDPR readiness group told me a few weeks ago, and that a log of what's been deleted should be kept? Then that remaining data should be backed up again...

      Lovely!

    6. Anonymous Coward
      Anonymous Coward

      Re: Not my field of expertise

      I'm currently writing GDPR cleardown procedures for people that no longer need processing.

      When the cleardown is run, the unique id for the person is stored, and the date it was deleted. This cannot be tied to any personally identifying data by itself. Nothing else is retained.

      In the very unlikely event where a customer accidentally sets a person to have left years ago, dated some time ago, and the data are mistakenly deleted we can tell them when it was deleted. Can't get the data back, of course.

      1. Adam 52 Silver badge

        Re: Not my field of expertise

        "When the cleardown is run, the unique id for the person is stored, and the date it was deleted. This cannot be tied to any personally identifying data by itself. Nothing else is retained"

        This only works for very, very simple organisations. Most organisations will be worried about being sued or regulator investigations. Those organisations will have to store somewhere the identity associated with the unique id in order to defend themselves.

        As soon as you do that your data subject can be reidentified from information which can reasonably be expected to be... whatever the precise wording is.

        1. Anonymous Coward
          Anonymous Coward

          Re: Not my field of expertise

          This is for anything but a simple organisation. Data are held for third parties. They define personal data that are no longer required, the age of data to be cleared down, and when it should be activated. If the third party has a need to retain the data for their own internal or legal purposes, that is their choice. They know that once the clear down is activated, historic data cannot be recovered. The minimum retention period should be sufficient to comply with potential legal requirements, HMRC requests, and so on, if the third party has any sense.

          Remember the 'right to erasure' does not apply 'for the establishment, exercise or defence of legal claims.'. Neither does it apply if the personal data is necessary 'for the purpose which you originally collected or processed it for'.

          I'm not involved in the legal side of the business, but the above seems to indicate that if there is a potential legal liability, data retention up and until the limit of that liability is fine.

    7. Cynic_999

      Re: Not my field of expertise

      The solution is pretty obvious to me. Deletions are only performed on live (current) data, BUT a record or log of all such deletions is kept. If and when it is ever necessary to restore from a backup archive, that deletion log is used to immediately delete the same data on the newly restored records before going live with the restored media.

      Something that could be trivially automated so that it is applied automagically after any restore script is run.

    8. wcpreston

      Re: Not my field of expertise

      Well this IS my field of expertise, so I'll chime in. :)

      Many backup systems have added (or are in the process of adding) features to delete personal data from the backup -- to a certain extent. For example – depending on how your backup is stored – it is technically feasible to delete all spreadsheets, word processing files, or PDFs with a certain person's identifier in them. But asking that same backup product to delete the person's data from within a file or database – while keeping the rest of that file intact – is venturing into extremely dangerous waters. If that's what we're asking, I'll have to agree with the quote in the article from Linus Chang, "deleting data from a backup is a terrible idea because it risks corrupting the backup, breaking referential integrity, breaking applications that were expecting that data to be present, and importantly, breaking any checksums on the data that would prove that a restore was successful."

      That leaves the "delete on restore" option. My opinion is that a RTBF journal/database that stores ONLY the unique identifiers (and no other data) – while it sounds on the face a direct defiance of the RTBF – is the best way to ensure the person "stays forgotten" if there is ever a restore from older backups. It's even possible to have the backup system trigger the "make sure these people stay forgotten" process after a restore.

      The RTBF article of the GDPR says you can keep data required to defend against a legal claim. In addition to being used for this "make sure they stay forgotten" process, this database I'm proposing can also be used to prove when someone asked to be forgotten, when they were forgotten, etc, in case of a GDPR claim. In addition, the use I'm proposing is also to protect against a legal claim – that you said you forgot somebody that you ended up restoring from backup. Ergo, I think it should be OK to have a RTBF journal/database. I am not a lawyer and am not giving legal advice on GDPR. I'm just spitballing here.
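
      To illustrate the "breaking referential integrity" risk in the Linus Chang quote above, here is a small sketch using an in-memory SQLite database with a made-up two-table schema. Foreign-key enforcement is switched off to stand in for a backup tool that cuts a row out of the stored data without knowing the schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Stand-in for a tool editing the stored data behind the database's back.
db.execute("PRAGMA foreign_keys = OFF")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Fred'), (2, 'Wilma');
    INSERT INTO orders VALUES (10, 1, 'widget'), (11, 2, 'gadget');
""")

# Remove one person's row and nothing else.
db.execute("DELETE FROM customers WHERE id = 1")

# Each row returned here is a child row whose parent no longer exists.
orphans = db.execute("PRAGMA foreign_key_check").fetchall()
```

      The order for customer 1 is now an orphan. An application restored from that backup may well fall over when it tries to look up who placed it, which is the case for deleting after the restore, inside the application, instead.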

    9. Doctor Syntax Silver badge

      Re: Not my field of expertise

      "My only question is, once you've "forgotten" about somebody, how do you remember to forget them on a restore?"

      GDPR allows you to keep PII which is being held for a good reason. You couldn't, for instance, forget the delivery details of an order which is yet to be despatched. On this basis one should be able to hold the forget request until all the backups that the real data may be on have been superseded and wiped.

    10. bombastic bob Silver badge
      Devil

      Re: Not my field of expertise

      "how do you remember to forget them on a restore?"

      Ironically, you'll probably need to "remember" them in a "forget" data set. Google will have to "remember" them in a "forget" algorithm, too, so that search results don't show whatever it is that's supposed to be "forgotten". And don't even get started on www.archive.org [it's a GREAT resource for 'forgotten' data files and web pages that you might want to research, but are no longer available].

      I guess www.archive.org could just make it impossible for EU people to access it... or "remember" to "apply the 'forget filters'" if you have an EU IP address or something.

      I like the idea of GDPR in principle, but I think the actual mechanics of it will have too many unintended consequences [like for backups].

    11. spold Silver badge

      Re: Not my field of expertise

      >>> once you've "forgotten" about somebody, how do you remember to forget them on a restore?

      Tag the data subject records with a unique identifier (a meaningless but unique number - MBUN). When you forget someone and delete the records, that MBUN is no longer linked to an identity. Keep a list of all the MBUN identifiers whose records you forgot; when you restore, delete any records on that MBUN list.

      Of course, encrypt your backups - many authorities consider that in the event of a breach encrypted data is not a disclosure since the risk of re-identification is very small. Whether all this is sufficient is open to guidance and findings by individual DPAs but may well be a defensible position and unlikely to incur a huge fine.

  4. Anonymous Coward
    Anonymous Coward

    This is going to be interesting for marketing and mailshots to existing customers restored from a backup that asked for their data to be deleted.

    1. Adam 52 Silver badge

      You just re-delete as part of your restore process. Same as replaying a transaction log before bringing a database back up.

      1. AMBxx Silver badge

        You're assuming the log of those to be deleted has also been backed up.

    2. wcpreston

      I had a person reply to one of my twitter comments that this exact thing happened to him. The company "forgot" him, then a while later he started getting emails from them. Upon investigation, they realized they had restored the marketing database to a point before he was forgotten. They had to institute the kind of process we're talking about.

  5. stungebag

    Not a problem

    I don't understand the problem. If a person's data is deleted then subsequent backups will not contain it. If it's ever necessary to restore from a backup taken prior to the deletion then later transactions, including the deletion, will be reapplied. Yes, it's theoretically possible to restore the backup and then do something nefarious, but if you're that sort of organisation you won't care about complying with GDPR in the first place.

    This is all being overthought, even though the Information Commissioner has repeatedly made it clear that enforcement will be appropriate to the organisation and circumstances and that those making an honest effort have nothing to fear.

    Maybe - just a thought - there are those trying to stir up GDPR FUD for financial gain. Oh, surely not?

    1. Aladdin Sane

      Re: Not a problem

      "those making an honest effort have nothing to fear."

      Nurse! My sides!

    2. Duncan Macdonald

      Re: Not a problem - Not true

      This is another version of erase on restore. The problem is that an old copy of a database can be restored WITHOUT applying the later transactions (and this may well be the case for debugging a problem), in which case the person's data is accessible again.

      The question that the "right to be forgotten" legislation has to take into account is whether a commitment to delete a person's data if restored from a backup is sufficient.

      There is also the problem that an old backup of a user's files may contain personal data that was not identified in a search for such data because it was only in backups. (An example - a user had a spreadsheet with names and addresses that was deleted before GDPR came into force but which still exists on old backups.)

    3. rmason

      Re: Not a problem

      @Stungebag

      People keep backups. That's the bit you're missing.

      Not everyone is on one week/month retention then overwrite.

      We have old backups going back years. Deleting something from *all of them* would require significant time, and probably 5 or so old types of tape drive somehow being connected and brought back to life.

      1. Oliver Mayes

        Re: Not a problem

        "5 or so old types of tape drive somehow being connected and brought back to life."

        If it's that difficult for you to restore a backup, do you really have a backup?

        1. Anonymous Coward
          Anonymous Coward

          Re: Not a problem

          "do you really have a backup?"

          It depends on the requirements. If it's for a legal requirement with very infrequent and non urgent access then yes. There are specialist data recovery firms that maintain infrastructure to support just this type of access.

          There is a world of difference between having to retrieve something specific from a particular time and having to restore data from all of the backup tapes over a period of time. I've known multiple tape drives being worn out during massive data disclosure exercises. Incidentally if the tape movement logs show a bunch of your old tapes have been retrieved and updated, data disclosure becomes even more of a world of pain.

          1. Doctor Syntax Silver badge

            Re: Not a problem

            "If it's for a legal requirement with very infrequent and non urgent access then yes."

            If this means that the PII has to be retained for legal purposes then you're in the clear.

        2. Doctor Syntax Silver badge

          Re: Not a problem

          "If it's that difficult for you to restore a backup, do you really have a backup?"

          And why are you even keeping it that long?

    4. Anonymous Coward
      Anonymous Coward

      Re: Not a problem

      Where I work we're on an 11 year retention policy for every single piece of data. We have data stored on DDS DAT tapes, some of it is so old and has been deemed applicable to audit and must be kept. I remember an episode 2 years ago where the purchasing team had to source a working DAT tape drive from eBay or Craigslist or some such nonsense.

      This raises the question, how can I erase your data if I cannot read it back? I appreciate the argument that, "Ignorance is no excuse." and "It's your prerogative to ensure you can read backed up data." but that argument will be tested at some point. You have Fred's data on a tape backup that you know you cannot dump in the bin but at the same time you can no longer read. What do you do? Break the law on retention or break the GDPR law? I suppose you pick the cheapest in terms of the fine and hope none of your customers find out! Ha ha!

      1. Doctor Syntax Silver badge

        Re: Not a problem

        "You have Fred's data on a tape backup that you know you cannot dump in the bin but at the same time you can no longer read."

        This raises questions about the sanity of the audit or about your failure to migrate the old data to new media once the old one becomes obsolete. It also raises the question of whether you have effectively forgotten everything on the old media already.

      2. Alan Brown Silver badge

        Re: Not a problem

        "Where I work we're on an 11 year retention policy for every single piece of data. We have data stored on DDS DAT tapes, some of it is so old and has been deemed applicable to audit and must be kept."

        I see this kind of misinterpretation of "kept" all the time.

        Keeping the data is not the same as keeping the media. Migrating backed up data to new media is critical to ensure that your backups actually remain accessible.

        If someone had to go find a drive on eBay, that's prima-facie evidence that you haven't bothered actually periodically checking the integrity of those backups - which is a necessary requirement for any properly working backup system. As such the system should have a big red FAIL stamped on it.

        Two worst cases I can think of off the top of my head:

        1: The BBC Micro and their Domesday Project.

        2: The academic I work with who has a garage full of 1970s-era NASA 9-track tapes full of raw data from earth observation satellites he wants restored one day - for shits and giggles I found an outfit which can do it. They want over £250 per tape - not "because they can", but because the equipment is so fragile (and head wear such an issue) that it costs about that much just to keep it running (people scrounging old electronics to find working bits have to be paid)

        When I told him how much it'll cost, it put a dampener on his restoration plans. He'll never afford to be able to do it but he won't admit defeat and bin the tapes either. Every year he delays his decision the per-tape cost continues to climb and one day they won't be able to be restored at all (My suspicion is that the original data is still online inside NASA somewhere anyway, they seldom discard things)
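
        The periodic integrity check mentioned above can be as simple as keeping a manifest of digests and re-reading every archive against it. This is only a sketch: the file names are invented, and the "archives" are small temp files standing in for whatever media you actually use.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(manifest):
    """Return paths that are unreadable or no longer match their recorded digest."""
    failed = []
    for path, expected in manifest.items():
        try:
            if sha256_of(path) != expected:
                failed.append(path)
        except OSError:
            failed.append(path)
    return failed


workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "2007-archive.tar")
rotted = os.path.join(workdir, "2008-archive.tar")
for p in (good, rotted):
    with open(p, "wb") as fh:
        fh.write(b"archived data")

# Digests are recorded when the backup is written.
manifest = {p: sha256_of(p) for p in (good, rotted)}

# Simulate bit rot on one archive since the manifest was written.
with open(rotted, "wb") as fh:
    fh.write(b"archived dat?")

failures = verify(manifest)
```

        Run on a schedule, and again after every migration to new media, a check like this flags unreadable archives while there is still a working drive to copy them with.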

    5. DavCrav

      Re: Not a problem

      "I don't understand the problem. If a person's data is deleted then subsequent backups will not contain it. If it's ever necessary to restore from a backup taken prior to the deletion then later transactions, including the deletion, will be reapplied. Yes, it's theoretically possible to restore the backup and then do something nefarious, but if you're that sort of organisation you won't care about complying with GDPR in the first place."

      GDPR means you shouldn't have that information any more. Not 'well, it's all the way at the back of the filing room, and I never go that far, so no worries, right?'

      GDPR and backups, without case law to guide, is an issue.

      1. wcpreston

        Re: Not a problem

        I'm not sure the filing room analogy works here. In this analogy, the filing room represents the main "production" copy of the data. And yes, you WOULD have to delete it in that case.

        But imagine, if you will, you had a scanned JPG of every piece of paper in each drawer, stored in a completely separate system. Continue to imagine that you don't have OCR, so you can't scan the contents of each JPG without physically pulling each one up, reading all the words in it, then moving on to the next image. Now imagine being asked to redact info from those JPGs without the ability to search them. Now you're a bit closer to what we're talking about here.

        I agree with your last comment. I look forward to further guidance from the ICO.

        1. DavCrav

          Re: Not a problem

          "I'm not sure the filing room analogy works here. In this analogy, the filing room represents the main "production" copy of the data. And yes, you WOULD have to delete it in that case.

          But imagine, if you will, you had a scanned JPG of every piece of paper in each drawer, stored in a completely separate system. Continue to imagine that you don't have OCR, so you can't scan the contents of each JPG without physically pulling each one up, reading all the words in it, then moving on to the next image. Now imagine being asked to redact info from those JPGs without the ability to search them. Now you're a bit closer to what we're talking about here."

          Just because it's difficult, maybe even impossible without deleting your backups, doesn't necessarily mean you don't have to do it under the law. If your backups are required by law, then you might have to restore them, delete the data, then re-archive.

          Nobody said that new legal requirements would be easy, but you know, people had two years to think about this. Why has it taken until a week after the law started for someone to say 'what about backups'?

          1. wcpreston

            Re: Not a problem

            We brought it up before. It's getting coverage now because the law is now in effect. Such is life with news.

            There are actually sections in the GDPR that speak to technical infeasibility and undue burden as a defense against certain requirements of the law. In addition, the need to keep the data for other valid business purposes is also a defense.

            As to what you're proposing (restore, delete, backup again) for every single request? The cost is so high that most companies would just pay the fine if the law were to be enforced that stringently. We're talking costs in the tens of millions every single time you get a request. My opinion is that is never going to happen. Not to mention the risk of doing something wrong and doing damage to the company.

            The ICO said they will provide guidance on this soon, and I for one am looking forward to it. I'm willing to bet the advice is going to be closer to what Robert Wassall said in the article. The data needs to not be accessible to production systems, not be used for any decisions, etc. To that I would add that a company must commit to deleting it if it ever DOES come out of the backup system via some kind of restore.

            My opinion so far.

            1. DavCrav

              Re: Not a problem

              "As to what you're proposing (restore, delete, backup again) for every single request? The cost is so high that most companies would just pay the fine if the law were to be enforced that stringently. We're talking costs in the tens of millions every single time you get a request."

              For now, it would cost a lot. My guess is that new backup software will be developed with extra functionality for GDPR and RTBF-based queries (which are of course different), and it will become 'best practice' to have that sort of backup, with the largest organizations being the first to be punished if they don't, and trickling down as technology is refreshed in companies.

              And all this will happen again (will it?) when ePrivacy Mk2 comes into force.

          2. Doctor Syntax Silver badge

            Re: Not a problem

            "Why has it taken until a week after the law started for someone to say 'what about backups'?"

            It hasn't.

    6. wcpreston

      Re: Not a problem

      I can only speak for myself. My blog was a response to comments I was seeing out there that suggested that GDPR RTBF was absolute and everyone should be able to delete personal data from production AND backup systems. So I suggested that wasn't possible given current technology (nor do I think full RTBF from backups is coming any time soon), and suggested the kind of process you mentioned in your comment. So I feel like I'm trying to clear up FUD, not create it.

    7. Doctor Syntax Silver badge

      Re: Not a problem

      " If it's ever necessary to restore from a backup taken prior to the deletion then later transactions, including the deletion, will be reapplied."

      You'd hope so but Murphy's Law can apply here.

  6. Dodgy Geezer Silver badge

    At some point....

    ...people will be asking the question: "Why exactly are we doing this? What is the cost/benefit ratio for this kind of work? Is this the best way to improve humanity's lot, or is it a completely excessive response to something which was a non-problem in the first place..."
