The Australian Bureau of Statistics has made a hash of the census

The Australian Bureau of Statistics (ABS) has so badly mishandled the question of retaining names that its senior leadership need to consider their futures. The ABS is – sorry, was – probably one of Australia's most trusted bureaucracies, alongside the Bureau of Meteorology, the Australian Electoral Commission, and Geosciences …

  1. Diogenes

    Data matching

    Whilst the ABS may not be able to supply any non-anonymised data to another body, I can see nothing that stops, say, the ATO from sending the hash of "Rufus T Firefly, 123 Somewhere St, Freedonia, 1/1/1900" with a declared income range of 40-45k, and the ABS adding a 'yes/no' flag and sending it back. Or Satanlink sending the same hash with "4 kids y/n".

    This is why I do not trust the ABS as far as I can drop kick it, and I stopped trusting the BOM when they totally and utterly screwed up the Aussie temperature record (ACORN-SAT).

    1. Pompous Git Silver badge

      Re: Data matching

      I stopped trusting the BOM, when...

      I discovered that the reason they don't display data from before 1910 is that it shows Australia is cooler now than in the 19th C. Despite which we still keep hearing that the "unprecedented warming" is causing all sorts of mayhem.

      1. Trixr

        Re: Data matching

        Citation please.

  2. John Tserkezis

    I'm guessing there are going to be lots of Mr and Ms Smiths, Citizens and Mohameds then.

    And of course, there are going to be a heap of "we have nothing to hide, so we'll hide nothing" idiots.

    Let's see how that works out for them.

  3. Anonymous Coward

    Oops, I appear to have misspelled my name. How did that happen? Honest mistake.

    1. Pompous Git Silver badge

      Oops I appear to have misspelled my name. How did that happen.

      Maybe you copied it from the pharmacist's spelling. I've never known a bottle of medicine to ever have my name(s) spelt correctly over a period of 60 years and several countries and states.

    2. Anonymous Coward

      I didn't misspell my name - either accidentally or deliberately - and yet after seven months and three attempts at fixing it Medicare still won't let me change my address through MyGov because apparently the name I gave Medicare isn't the same as the name I gave the ATO.

      Makes me wonder what I could achieve if I actually got creative with it.

    3. bleh_meh

      encode your name before entering it into the forms online:

      censusfail.com

      1. ralphh

        Little Bobby Tables time?

  4. Anonymous Coward

    Stasi minded people

    How are these clowns going to verify the truth on the form?

    The main purpose of the census is to plan taxation - they may claim there are other purposes, but this is the only real reason to conduct the census - they think people are more honest to the census man than to the tax man, although they both work for the treasurer.

    If the government thinks that too many people have too much income, they will push up tax, and if there are too many people with low incomes, they will push up tax to pay for benefits. Notice the common thread here?

    If you are worried about names, just make them up - use the Bogan Name Generator, or characters out of Jane Austen novels. It is not as though they can cross-reference them with the Electoral Roll (that would be illegal for the moment).

    Would I trust these people to know what security is? Not likely. This is an organisation that has been quietly lobbying their masters to get their sticky little fingers onto the data collected by other agencies such as the Department of Human Services and Border Security. They also have never really been paragons of information security - the only reason why there has been no breach is because no one actually cares about their crappy data. But with names and addresses - that is now valuable.

    1. Mark 65

      Re: Stasi minded people

      If you are worried about names, just make them up - use the Bogan Name Generator, or characters out of Jane Austen novels. It is not as though they can cross-reference them with the Electoral Roll (that would be illegal for the moment).

      I wouldn't be trusting them not to cross-reference the data, though. I have no problem giving my name along with my income (the ATO already has this pair); it is more the extra superfluous and invasive shit they also want to know that irks me. That part they should only get without identifying information.

    2. Pompous Git Silver badge

      Re: Stasi minded people

      If you are worried about names, just make them up

      Which is of course an offence with the possibility of being fined $1,800.

  5. Oengus

    Not reversible but derivable...

    Richard Chirgwin with a date of birth and an address will produce the same SHA-256 key (c2483d63179b71b37334f730385272c81b5d6bd3ae6edffb49234cfeb7f7d9a6, I just tried it) no matter the source system – but the hash cannot be reversed to deliver my personal data.

    If I have the same data Name, DOB and Address and can generate the SHA-256 key, I can associate the SHA-256 key with the original source and build a table of data including the unique SHA-256 key (as a key) and now I can match the SHA-256 key from the ABS data and add Name and Address - data de-anonymised.
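The two paragraphs above can be sketched end to end: a deterministic hash over identifying fields gives the same key in every system, which is exactly what lets anyone holding the same fields re-attach names. This is a minimal sketch under assumptions: the article doesn't say how the fields were formatted or joined, so the normalisation and `|` delimiter here are hypothetical and won't reproduce the quoted c2483d... digest; `linkage_key` is a hypothetical helper, and the records (Rufus T Firefly borrowed from the first comment) are made up.

```python
import hashlib


def linkage_key(name: str, dob: str, address: str) -> str:
    """SHA-256 hex digest over normalised identifying fields (hypothetical scheme)."""
    def norm(field: str) -> str:
        # Lowercase and collapse whitespace so trivially different
        # formatting in two source systems still yields the same key.
        return " ".join(field.lower().split())

    joined = "|".join(norm(f) for f in (name, dob, address))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Step 1: two systems holding the same person produce the same key.
key_a = linkage_key("Rufus T Firefly", "1900-01-01", "123 Somewhere St Freedonia")
key_b = linkage_key("  rufus t FIREFLY", "1900-01-01", "123 somewhere st  freedonia")
print(key_a == key_b)  # True

# Step 2: an "anonymised" release with only keys and attributes (hypothetical).
released = {
    key_a: {"income_band": "40-45k"},
    linkage_key("Someone Else", "1970-03-03", "9 Nowhere Ave Perth"): {"income_band": "60-65k"},
}

# Records the attacker already holds from another source (hypothetical).
known_people = [
    ("Rufus T Firefly", "1900-01-01", "123 Somewhere St Freedonia"),
    ("Jane Citizen", "1985-06-15", "1 Example Rd Hobart"),
]

# Hash every known record with the same scheme, then join on the key.
lookup = {linkage_key(*person): person for person in known_people}
reidentified = {lookup[k]: attrs for k, attrs in released.items() if k in lookup}
print(reidentified)
# {('Rufus T Firefly', '1900-01-01', '123 Somewhere St Freedonia'): {'income_band': '40-45k'}}
```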

    1. -tim

      Re: Not reversible but derivable...

      This is why we don't let Reg Hacks write crypto :-) That SHA-256 hash is subject to simple brute-force attacks. There are only so many names, and most can be found in databases that aren't very expensive. There are about 10,000,000 addresses in Australia, and you can buy the same database the census used. There are fewer than 45,000 birthdays for people under 122 years old. To reverse the c2483d63179b71b37334f730385272c81b5d6bd3ae6edffb49234cfeb7f7d9a6 hash given your name, I would need to try at most 450,000,000,000 candidate values. My 18-month-old, $60 Antminer U3 does 63 billion hashes a second, meaning the proper software would take just over 7 seconds to reverse that hash. Without a name, it would require a dictionary of names, and there are about 60,000 given names and about 150,000 unique surnames in the US. Slight optimization of the search space would still result in large bitcoin operators being able to reverse most of the data in a short amount of time with equipment they have today.
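The search-space arithmetic above, worked through in Python. All figures are the comment's own rough estimates, taken at face value (whether that particular miner could be pointed at arbitrary inputs is not addressed here). The second case shows why narrowing the name space matters: unoptimised, on a single device, the unknown-name search is enormous, which is the gap that name dictionaries and large parallel operations close.

```python
# Figures from the comment above; all are rough estimates.
ADDRESSES = 10_000_000          # approx. addresses in Australia
BIRTH_DATES = 45_000            # 122 years x 365.25 days is about 44,560
HASH_RATE = 63_000_000_000      # hashes per second claimed for one Antminer U3
GIVEN_NAMES = 60_000            # US given names, per the comment
SURNAMES = 150_000              # US unique surnames, per the comment
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Known name: try every address against every birth date.
candidates = ADDRESSES * BIRTH_DATES
seconds = candidates / HASH_RATE
print(f"{candidates:,} candidates, {seconds:.1f} s")
# 450,000,000,000 candidates, 7.1 s

# Unknown name, no optimisation, one device: every given-name/surname pair too.
candidates_no_name = candidates * GIVEN_NAMES * SURNAMES
years_no_name = candidates_no_name / HASH_RATE / SECONDS_PER_YEAR
print(f"{years_no_name:,.0f} years")  # 2,037 years
```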

  6. Bubba Von Braun

    I refused to work on a project many years ago that would link the ATO, Social Security, Land records, Customs and Immigration, Veterans' Affairs, the ABS: pretty much the whole alphabet. The objective was to be able to data match from a single console across the organizations. At the time I joked I should buy shares in a fibre company.

    25 years on, it seems nothing has changed. Well, the tech now exists to do it, but the lessons haven't been learned.

    Remember the census taken in 1933: the government needed the information, "alles gute", and a few short years later it was an index that enabled genocide. Add the Five Eyes to this and I can see Donald ordering drone strikes in Australia to target Muslims and Jedi :-)

    I won't be completing the census... for some reason the site is always down for me ;-)

  7. Anonymous Coward

    Power outage, Internet down, Illness

    Sometimes you just can't get things done.

    1. Bubba Von Braun

      Re: Power outage, Internet down, Illness

      Nah, Proximity Tourette Syndrome. Go near the website, a census form, or a census collector and a stream of abuse starts flowing ;-)

  8. flibbertigibbet

    > It's a mess that the ABS created for itself.

    > It takes a lot to make me say “security is now no longer the primary consideration”, but that's what the ABS has achieved.

    The first statement is true. Their steadfast refusal to describe in concrete terms what their security arrangements are is bewildering, given the importance they place on getting accurate data. I suspect they haven't cottoned on to citizens becoming accustomed to instant fact checking: where before we were happy to make do with assurances from people in power, we now expect to be able to verify what they say. The ABS thinks it can get away without providing that verification, as it has done in the past.

    The second statement is a bit of a stretch given the first one. If you take what they have said, stitch it together (it's strewn across a wide variety of web pages), and then fill in the blanks in a positive way, you end up with a fairly secure arrangement. Well, as secure as it can be, given that ultimately it's all down to the whim of a minister. Security-wise, it isn't very different from what they did in the previous census.

    For now I'm giving them the benefit of the doubt and calling this a public relations disaster, caused by old men not realising people expect a greater level of transparency now we have the internet. Not only that, the same internet has given people a voice to complain loudly and widely if that transparency isn't delivered, allowing them to chip away at the trust the ABS needs from the populace to do its job well.

    They don't really have a choice here. The problem is that this is a new thing, and they haven't figured it out for themselves yet.

  9. Andrew Commons

    What is being missed....

    Is where the data is going to be as it is collected.

    The FQDN census.abs.gov.au resolves to 150.207.169.5 and 150.207.169.8. These are allocated to an ASN apparently assigned to IBM but used by Nextgen Group who provide network and data centre services and, far more interestingly, connection of networks to cloud services. They do not appear to offer hosting and are 70% Canadian owned.

    So if you are the ABS and want to collect information from every Australian household in the space of 24 hours (or probably less, really) with pretty much zero tolerance for failure, are you likely to have that capability in house? On standby for use once every five years?

    If not, then where could you find that short term capability?

    It would be a nice raw data set to grab if you were a foreign government.
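The lookup described above can be repeated with Python's standard `socket.getaddrinfo`. This is a minimal sketch: `resolve_ipv4` is a hypothetical helper, the addresses quoted in the comment date from 2016 so a lookup today will likely return something different, and mapping an address to the ASN that announces it needs a separate whois or RDAP query that isn't shown here.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses a hostname resolves to.

    Hypothetical helper. getaddrinfo returns one entry per socket type
    (stream, datagram, raw), so the set removes the repeats.
    """
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


# Needs network access, and the answer changes over time:
# print(resolve_ipv4("census.abs.gov.au"))
```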

    1. Anonymous Coward

      Re: What is being missed....

      I hear the Chinese already have all the data anyway.

      : p

      1. Ksam!

        Re: What is being missed....

        Oh please, you believe all that American propaganda?

        The Five Eyes spy on their citizens far more than China ever does.

  10. Anonymous Coward

    I can't wait.

    My real name is a little unusual in format*. Enough so that every government department, financial institution and other large organisation around finds a different way to put it in their system (sucks to be them!).

    I am somewhat eager to see what sort of a hash the Census Bureau's numptie web-form 'programmers' will make of it.

    *I don't have a family name. I don't want one. I don't need one. It is perfectly legal to not have one (the registrar's office explicitly checked that when I applied to drop the one I had previously).

    1. Adam 1

      Re: I can't wait.

      Maybe you can enter your surname as

      '); DROP TABLE residentdetails;--
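For anyone wondering whether that surname would actually bite: it only does if the form's backend pastes user input straight into SQL text. A minimal sketch with Python's built-in `sqlite3`, assuming a table called `residentdetails` as in the joke (the census system's real database and schema are unknown), shows a parameterised insert storing the string as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table named after the joke above.
conn.execute("CREATE TABLE residentdetails (surname TEXT)")

hostile = "'); DROP TABLE residentdetails;--"

# Parameterised insert: the value travels separately from the SQL text,
# so the quote, semicolon and comment marker are stored, never parsed as SQL.
conn.execute("INSERT INTO residentdetails (surname) VALUES (?)", (hostile,))

rows = conn.execute("SELECT surname FROM residentdetails").fetchall()
print(rows)  # [("'); DROP TABLE residentdetails;--",)]
```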

    2. Pompous Git Silver badge

      Re: I can't wait.

      I don't have a family name. I don't want one. I don't need one.

      Is that you, Informal?

  11. Anonymous Coward

    Hashing won't work. Anonymizing data is impossible.

    I don't mean to be all troll-like and obnoxious (it just happens naturally without effort), but has anyone here actually read a book on computer security? One cannot get privacy through hashing names and addresses. There are too many networks, with too much data, that's all too easy to correlate. And that's without personal unique identifiers. With unique identifiers (even hashed ones) it's not even hard. It's an Excel macro.

    Remember Iceland Kids: Data anonymization doesn't.

    Here, read this. They explain the concept with grammar and references and stuff.

    https://www.oreilly.com/ideas/anonymize-data-limits

    1. Adam 1

      Re: Hashing won't work. Anonymizing data is impossible.

      As an example of this, combining date of birth with gender and suburb gets you an average 90% match to one person.
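The claim above is about how often a combination of quasi-identifiers picks out exactly one person. A minimal sketch of measuring that on a dataset: group records by (date of birth, gender, suburb) and count how many sit in a group of one. The records are hypothetical toy data, so this does not reproduce the 90% figure; the smallest group size is the dataset's k in k-anonymity terms.

```python
from collections import Counter

QUASI_IDS = ("dob", "gender", "suburb")


def group_sizes(records, keys=QUASI_IDS):
    """Count how many records share each quasi-identifier combination."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def fraction_unique(records, keys=QUASI_IDS):
    """Share of records whose combination occurs exactly once."""
    if not records:
        return 0.0
    sizes = group_sizes(records, keys)
    unique = sum(1 for r in records if sizes[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def smallest_group(records, keys=QUASI_IDS):
    """The k in k-anonymity: size of the smallest group."""
    return min(group_sizes(records, keys).values())


sample = [  # hypothetical toy records
    {"dob": "1950-02-01", "gender": "M", "suburb": "Invermay"},
    {"dob": "1950-02-01", "gender": "M", "suburb": "Invermay"},
    {"dob": "1962-07-19", "gender": "F", "suburb": "Newstead"},
    {"dob": "1988-11-30", "gender": "F", "suburb": "Kings Meadows"},
]
print(fraction_unique(sample))  # 0.5
print(smallest_group(sample))   # 1
```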

      1. Pompous Git Silver badge

        Re: Hashing won't work. Anonymizing data is impossible.

        As an example of this, combining date of birth with gender and suburb gets you an average 90% match to one person.

        That doesn't surprise me. In late 1970 I was working for the Commonwealth Employment Service in Launceston (Tasmania), a rather small provincial city. We had two clients with the same name and DOB living in the same suburb.

        The other problem I have referred to already. My Christian name is Jonathan spelt as it is in the Bible. Almost nobody is capable of spelling it correctly, never mind government departments. Jonathon is the most popular, followed by Johnathon and Johnathan. The Australian Taxation Commissioner has managed to use all three misspellings over the years as well as sometimes managing to spell it correctly.

  12. Anonymous Coward

    Australia discovers absolute data security!

    Mr Turnbull today said the organisation "always protects people's privacy".

    "The security of their personal details is absolute and that is protected by law and by practice," he said.

    "That is a given."

    Head-desk-head-desk...

  13. DanielR

    Damn Straight.

    "If two data sets – the Census and the Pharmaceutical Benefits Scheme, for example – contain enough data points to consistently identify me, then a hash of that data would work just as well for anonymous analysis.

    Richard Chirgwin with a date of birth and an address will produce the same SHA-256 key (c2483d63179b71b37334f730385272c81b5d6bd3ae6edffb49234cfeb7f7d9a6, I just tried it) no matter the source system – but the hash cannot be reversed to deliver my personal data."

    Hash keys, as I've been carrying on about.

    The fact they made an excuse about using names as keys proves they need it for corporate data mining and scope creep.

    The fact they can't manage and design databases properly proves how hopeless they are.

    Then the data breaches come rolling in.

  14. DanielR

    Someone mentioned BOM. Their cloud servers were infiltrated and hacked by the Chinese. This will be the same.

    1. Anonymous Coward

      No, no. That was their excuse when they were called out for 'adjusting' the historic temperatures downwards to try and make the present temperatures appear higher than those of the past.

    2. DaveC449

      Oxymoron

      Maybe the Chinese will do a better job of running the country than the present mob. They would probably satisfy the right wing obsession with cutting costs and minimizing services without any problem at all and they have the perfect solution for dissension - eliminate it.

  15. xpusostomos

    hash no good

    Hashing achieves nothing much. It would take a computer half an hour to hash every name in Australia. Having made such an index, you just compare hashes to names and voila! All your data is suddenly re-identified. It wouldn't even be hard to speculatively hash many names. For example, take the 100 most common first names and surnames and probably 10% of your leaked data is suddenly re-identified. Expand to the 10,000 most common first names and surnames and you've probably re-acquired 90% of the data. In other words, hashing is a fool's paradise.
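The dictionary attack described above, sketched in Python under stated assumptions: names are hashed as lowercase "first surname" strings with plain unsalted SHA-256 (the real scheme isn't known), and the name lists are tiny hypothetical stand-ins for the top-N lists the comment mentions. The index is built once; every leaked hash is then a dictionary lookup.

```python
import hashlib
from itertools import product


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical stand-ins for "the N most common" first names and surnames.
first_names = ["john", "mary", "mohamed", "jonathan"]
surnames = ["smith", "citizen", "nguyen"]

# Build the index once: hash -> candidate name (4 x 3 = 12 entries here).
index = {
    sha256_hex(f"{first} {last}"): f"{first} {last}"
    for first, last in product(first_names, surnames)
}

# Hypothetical leaked hashes: one common name, one not in the lists.
leaked_hashes = [sha256_hex("mary nguyen"), sha256_hex("rufus firefly")]

# None means the name wasn't in the dictionary; a bigger list fixes that.
recovered = {h: index.get(h) for h in leaked_hashes}
print(list(recovered.values()))  # ['mary nguyen', None]
```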

    1. Adam 1

      Re: hash no good

      Half an hour? Does that include unboxing the computer and plugging it in?

      Relatively modest PCs can hash at "many billions per second" rates. Specifically designed hardware for bitcoin mining is measured in "many tens of billions per second".

  16. Anonymous Coward

    Boycott/data pollution?

    So given all of the problems being raised, how does one organise a boycott/disinformation action over the next month or so before the census closes? Scribble on toilet walls? Rant like a nutter at work?

    While I could possibly handle a fine (as long as it maxes out at the limit mentioned above), how do you convince even the others on this forum that it is worth taking the risk of polluting the data when they risk being punished? (Maybe it's already illegal to be having this conversation.)

    The chilling effect of a fine seems to have been very effective, judging from conversations I've had.

    So what to do, if anything?

  17. Richard D

    Found in the press

    An IBM Worldwide Security Solution Architect has waded into the census privacy debacle, declaring Australia’s sensitive census data will be “inevitably” hacked.

    As for the past three censuses, IBM has been contracted to help with the design of the online form and maintaining data capacity. IBM remains subject to the same electronic cyber security requirements as the ABS.

    Australia's top statistician has said the Australian Bureau of Statistics is "ready" for the census on August 9, describing the department as having "the best security features [for which] you could ever ask".

    Makes you feel all warm and fuzzy.

    1. tnosaj

      Re: Found in the press

      Linky for the IBM Worldwide Security Solution Architect's comments?

      1. diodesign (Written by Reg staff) Silver badge

        Re: Re: Found in the press

        Here (paywall). Basically, Philip Nye, an IBM security architect, tweeted: "Since Australia doesn’t have mandatory disclosure laws, will we ever find out when Census data is inevitably breached?"

        Which ppl took to mean: "Census will be breached." He deleted the tweet, it seems.

        C.

  18. This post has been deleted by its author

  19. Penguineer

    Sesame Street

    To add to the insult, the census is being marketed to us as if we were 5 year olds.

    Current TV ads for the census feature Count Von Count from Sesame Street.

    Or maybe I'm supposed to fill out their forms with a sense of nostalgia?

  20. John__Doh

    Lotus Notes

    The ABS website appears to use Lotus Notes (well, Domino). It doesn't exactly fill you with confidence that they use the latest online technology, does it!

  21. Dagg Silver badge

    Still CRAPPED

    I have just received a 3rd paper census form to fill in! The original arrived 2 days late and was filled in and posted back weeks ago... and they still keep posting census forms through the letter box.

    How much is this crap fest costing?
