8 out of 10 cats fear statistics – AI doesn't have this problem

If statistics were a human being, it would have been in deep therapy all of its 350-year life. The sessions might go like this: Statistics: "Everyone hates me." Pause. Therapist: "I'm sure it's not everyone..." Statistics: "And they misunderstand me." Pause. Therapist: "Sorry, I didn't quite get what you meant there..." …

  1. Anonymous Coward

    You can also include more data to get the answer you want if it's not "right" the first time. I suppose you can also change the calculation method, as sometimes there isn't just one set way of doing things.

    There are many ways to fudge numbers if you know how and have a deep understanding of the data you are using.

    You're right, though: people don't trust statistics because of politicians, and that's not going to change any time soon.

    1. Primus Secundus Tertius

      In fairness to politicians at least in the Western world: they are generally representative of ordinary people. It is other people that people don't trust.

    2. This post has been deleted by its author

    3. veti Silver badge

      I don't think we can blame this on "politicians". Everyone and her dog abuses statistics.

      And there are many ways to fudge numbers if you don't know how, and don't have a deep understanding of anything. One of the problems of statistics is not just that it's easy to do them wrong, but that it's actually really hard to do them right.

      1. John 209

        I think you can blame politicians. In the U.S., most politicians graduated from law schools and shunned quantitative courses in their academic careers, yet they deal with national and international policies of profound importance like macro-economics, international trade and international economics, as well as military capabilities, all of which are based largely on quantitative assessments. As for "Everyone and her dog...", the purpose of a representative democracy is not for the elected to represent the simple sum of constituents' opinions (if that were best, we'd have direct democracy); it's for putting people smarter than the average in power, to run the government.

  2. ElReg!comments!Pierre

    Statistics rulez

    Let's not forget that Nethack is almost entirely statistics, wrapped in a thin layer of "UI"...

    (well, technically the code is probabilities, but the observed effects are stats)

  3. elDog

    Statistics actually started much earlier - think sums and means

    I believe that ancient granaries and other warehouses didn't count every grain or small item; they sampled a cupful and then multiplied that sample by the number of cups (or whatever.)

    The imperial foot was calculated by getting a group of men to stand toe-to-heel and then dividing the total length by the number of men. This is an "average".

    The Seven Pillars of Statistical Knowledge is an excellent, short read.

    1. John 209

      Re: Statistics actually started much earlier - think sums and means

      True, but use of the mean or average (only a measure of central tendency) to represent a population or sample, without citing a measure of variation, is itself a frequent misuse of statistics in that relatively few in a population or sample are "average".
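To make the point about central tendency concrete, here is a minimal sketch. The two samples are made up purely for illustration; they share a mean but differ widely in spread, which the mean alone would hide.

```python
import statistics

# Two hypothetical samples (invented for illustration) with the same mean.
tight = [49, 50, 50, 50, 51]
wide = [10, 30, 50, 70, 90]

def summarize(sample):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(sample), statistics.stdev(sample)

for name, sample in (("tight", tight), ("wide", wide)):
    mean, sd = summarize(sample)
    print(f"{name}: mean={mean:.1f}, stdev={sd:.2f}")
```

Both samples report a mean of 50, yet only one value in the wide sample is actually "average".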

  4. TechnoTechno
    Thumb Up

    Nice Car

    1. 404

      Soon as I saw Pascal, I knew I was in trouble and my eyes glazed over.

      Pretty car though.

  5. Anonymous Coward

    The issue is a good percentage of the population doesn't grasp the concept of probability...

    ... otherwise they wouldn't waste money on bets and other games. On the other side, the small percentage of those offering bets and other games understand it very well (and hire mathematicians to ensure it when they're not sure).

    1. This post has been deleted by its author

  6. Mage Silver badge

    Without statistics, there can be no self-driving cars, no Siri and no Google.

    Rats.

    1. Vinyl-Junkie
      Joke

      Re: Without statistics, there can be no self-driving cars, no Siri and no Google.

      Yes, my first thought on reading that sentence was "presumably there's also an upside...."...

    2. allthecoolshortnamesweretaken

      Re: Without statistics, there can be no self-driving cars, no Siri and no Google.

      It's settled, then.

      Statistics must die.

  7. Cuddles

    When is a car not a car?

    "Google was 99 per cent sure my photo is of a car"

    But was somehow only 98.5% sure that it's a vehicle. While I'm sure the code to figure these things out is horribly clever and complicated, I'd have thought one of the first and simplest parts would be to check if one category is a subset of another and take that into account when figuring the overall probability.

    1. Thomas 6

      Re: When is a car not a car?

      The question is: when is a car not a vehicle?

      If it were a toy car, for instance, this would not necessarily be classed as a vehicle. Especially if it had non-moving parts.

      1. Anonymous Coward

        Re: When is a car not a car?

        How about a full size 2D picture of a car?

        What about a car on a fairground roundabout? Would a passing car's AI recognise a fixed circular locus?

        What would it do when confronted with an apparent herd of galloping horses - going up and down?

      2. Cuddles

        Re: When is a car not a car?

        "If it were a toy car, for instance, this would not necessarily be classed as a vehicle."

        Why not? A car is a vehicle, a toy car is a toy vehicle. No matter what modifiers you add to "car", there is never a situation where the same modifier cannot also be added to "vehicle".

        1. allthecoolshortnamesweretaken

          Re: When is a car not a car?

          Depends on, amongst other things, how you define "vehicle".

          If I define it as "something I can use to get myself from A to B", toys are not vehicles.

          If I define it as "some sort of box with wheels", a Matchbox toy is a vehicle; just like my trolley case or the IKEA thingy the rubber plant in the living room sits on.

          Tricky.

    2. Doctor Syntax Silver badge

      Re: When is a car not a car?

      As Magritte would tell you, it's not a car, it's a picture of a car.

      And I wonder what the self-driving car would make of this: https://www.google.co.uk/maps/@53.571895,-1.6610001,3a,15y,14.84h,76.54t/data=!3m6!1e1!3m4!1shgFY-Sgpy7aPBcFLra-3Tw!2e0!7i13312!8i6656?hl=en

      It's not a sheep. It's not even a picture of a sheep.

  8. Anonymous Coward

    In my experience the problem with applying Bayes Theorem is when someone has to decide on a weighting for something that is not clear cut. The probability works as long as the data doesn't break the constraints that were believed to operate on it.

    Usually the people using the algorithm have no idea how the original weighting was decided. Therefore they believe it in situations where they shouldn't.
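A minimal sketch of why the chosen weighting matters, assuming "weighting" here means the prior probability fed into Bayes' theorem. The likelihood values (0.9 and 0.05) and the priors are hypothetical, chosen only to show the sensitivity.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem: P(H | E) from a prior P(H) and the two likelihoods."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence

# Hypothetical likelihoods, held fixed while only the prior changes.
for prior in (0.5, 0.1, 0.01):
    print(f"prior={prior:<5} posterior={posterior(prior, 0.9, 0.05):.3f}")
```

With identical evidence, the posterior moves from about 0.95 to about 0.15 as the prior drops from 0.5 to 0.01, so anyone reusing the algorithm without knowing how the prior was set can be badly misled.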

  9. Anonymous Coward

    " It is approximately 2.8m tall. Humans (even with hats) of this height occur with a very low probability. The system will, for now, decide that this is not a human. "

    However it is quite common in some festivals for there to be one or more people walking on stilts - or wearing artificial heads that increase their size and height well beyond human limits. A human knows that inside that rig is an actual person. AI training is all about context.

    1. Disk0
      Facepalm

      Statistics killed Jesus

      A person might be carrying something or someone, like when they have a child on their shoulders, or they might be carrying a protest sign, or maybe even a wooden cross, in which case someone's dogma just ran over their karma.

      1. Frumious Bandersnatch

        Re: Statistics killed Jesus

        I think that you'll find that it's "someone's karma ran over his dogma"

  10. Vaidotas Zemlys

    Oh the irony

    Isn't it ironic that the article about misuse of statistics uses one of the 6 incorrect interpretations of p-values to explain the statistical significance of the chi-square test? No, the p-value 0.043 does not mean that the probability of this particular data occurring by random chance is 4.3%. Google for "asa p-values" for the ASA (American Statistical Association) statement on p-values. It lists the 6 most common mistakes of p-value interpretation, and number 2 is used in this article.

    1. Anonymous Coward

      Re: Oh the irony

      And a Chi-square test should only be applied to a Chi-squared distribution.

      I'll wager that a single day's data is an inappropriate sample size for one's new product line.

      And that real-world confounders, like whether you were advertising your new product before launch in lots of women's magazines, or on TV where the viewership was skewed towards women, should be considered before reaching for the calculator or Excel (other spreadsheet and statistical software are available).

      1. Primus Secundus Tertius

        Re: Oh the irony

        But the numbers calculated in this case for a chi-squared test probably are more or less chi-square distributed.

        If one assumes a binomial distribution of the raw figures, they deviate by two standard deviations from a 50-50 result. The probability of a deviation that size or more is roughly five percent.

      2. Kiwi
        Boffin

        Re: Oh the irony

        I'll wager that a single day's data is an inappropriate sample size for one's new product line.

        Was thinking very much the same. What was the first day of launch - a weekday (when more women are likely to be shopping during the day) or a weekend (when more men may be out)? There are claims that women have a tendency to buy stuff on sale and opening specials on impulse whereas men tend to take a bit more time (depending also on things like colour and so forth, not necessarily just the application - and the stress some people put themselves through over "do I go xbox or playstation?") (that's easy, both are made by evil companies, go PC all the way, and Linux.. Now, do I go Mint, or Debian, or CentOS, or Ubuntu, no wait Ubuntu did that advertising/spying thing, or SuSE, or...Maybe Devuan to get away from systemd).

        One day of sales data makes for poor sampling, and poor sampling makes for poor stats. I can see companies tweaking their advertising based on "more women bought this in the first day, therefore more women want it so we'll advertise to women" when it was the Friday before Fathers Day that the item went on sale, and it's a good looking but cheap Leatherman-like tool in a great looking pouch.

        I'm not a stats person but even I know that one day's worth of data would be a silly time to be analyzing what the data means!

    2. Frumious Bandersnatch

      Re: Oh the irony

      Absolutely.

      The way to look at this is to calculate the margin of error for this sample.

      The sample size is 2692+2128 = 4820

      We calculate the 95% margin of error as 1.96 * sqrt(0.5 * (1.0 - 0.5) / 4820) (footnote)

      This gives 0.0141157044469341, which says that if p=0.5, then 95% of the time the sampled proportion of women will be within 50 +/- 1.412%. This translates to a range of 4820/2 +/- 68 people, or [2342, 2478]. The value 2,128 is outside this range, so all we can say is that, using a 95% confidence interval, the assertion that males and females are equally represented (p=0.5) is not supported by the sample.

      Chi-squared is slightly different since it's a measure of fit of a set of individual observations to the expected, but the above is effectively its application to the average case (ie, it ignores the spread of individual samples). Neither provides a measure of how unrealistic/unexpected the result [set] is, as Vaidotas Zemlys has pointed out.

      footnote: http://www.dummies.com/education/math/statistics/how-to-calculate-the-margin-of-error-for-a-sample-proportion/
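The margin-of-error arithmetic above can be reproduced with a short sketch. The function name `margin_of_error` is a label chosen here for illustration, and the sample size is taken as quoted in the comment (2692 + 2128), not checked against the article.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a sample proportion: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1.0 - p) / n)

n = 2692 + 2128            # sample size as quoted in the comment above
moe = margin_of_error(n)   # margin as a proportion of the sample
half_width = moe * n       # the same margin expressed as a head count
low, high = n / 2 - half_width, n / 2 + half_width

print(f"n = {n}, margin = {moe:.4f}")
print(f"expected count under p=0.5: {n / 2:.0f} +/- {half_width:.0f}")
print(f"2128 inside interval? {low <= 2128 <= high}")
```

The z value of 1.96 is the usual multiplier for a 95% interval under the normal approximation; a different confidence level would need a different multiplier.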

  11. Whitter
    Boffin

    The purpose of statistics

    a) To describe a population clearly (be that theoretical or sampled)

    b) To compare populations meaningfully (and thus make value judgments)

    If nobody understands, then (a) has failed.

    If a conclusion is not supported by the statistics, then (b) has failed (and probably (a) too).

    i.e. The problem is not with statistics. It is with bad statistics.

    1. JEDIDIAH
      Mushroom

      Re: The purpose of statistics

      My favorite bit of statistical nonsense is a study that went out of its way to obscure any relevant information that might be meaningful to a normal person seeking information on the topic in question. They basically cooked the numbers in order to make any useful distinctions disappear.

    2. Calimero

      Re: The purpose of statistics

      And what is the definition of bad statistics? Or, rather, what is the definition of statistics? You only say what its purpose is (are, in fact, as you list two). We can then understand what the difference is between statistics and bad statistics, and where the study went wrong. In fact, there is nothing wrong with the study - it obeys statistics. We just forget (and forgive) the fact that if a medicine works in 99.999% of the patients, but it does not work on you, you will 100000% die (if we talk about an antibiotic to treat your flesh eating bacteria).

      Let us rather try to understand what conclusions we could draw from stats used to interpret weather data and those used to interpret data that will determine the percentage of people who will get a certain disease. If the weather prediction is wrong, the consequences could be: take an umbrella on a sunny day (mild annoyance), all the way to putting you in the middle of the twister. Ouch! If the data on disease say that only 0.02% of the population will get that, then you are at a low risk, indeed, of getting the disease. But if you got it, there is a very high likelihood that there will be no studies and dedicated medicine for it. Ouch again! So is this the fault of statistics? I cannot blame statistics, because there, on the first page of the book I have, it says clearly that winning the lottery is not guaranteed. So, use it with caution and with a clear understanding of what the interpretation of the data may be. I'd say, if we engage common sense, we'll be fine.

  12. Jeffrey Nonken

    Four out of five doctors surveyed agree!

  13. Flakk
    Joke

    "From then on it became increasingly popular for people to use statistical evidence rather than violence to support their arguments."

    Perhaps that's the source of the animosity toward statistics? It isn't quite as much fun to prove foul villainy upon the spreadsheet of one's opponent.

  14. Baldrickk

    Cat-food

    I remember hearing that before.

    Thing is, I always interpreted the updated message as "of those that responded to the survey"* when of course, they are discounting everyone who said "my cat doesn't exhibit a preference to a brand"

    So even the "clear" explanation can be mis-interpreted

    *note; I was a lot younger and much less of a sceptic at the time

  15. Primus Secundus Tertius

    Stats is hard

    Many of the techniques used in statistics can be mastered with practice, even by a social sciences graduate if they are willing. But the proof of these things is often extremely difficult: for example, that a binomial distribution with large numbers tends asymptotically to a Gaussian distribution.

    Stats therefore becomes a memory test, since it is difficult to re-prove a theorem that has slipped one's mind.

    1. Vaidotas Zemlys

      Re: Stats is hard

      Actually it is not that hard. Either use Stirling's formula, or the method of characteristic functions. The proof of the CLT for iid variables uses only a few tricks and thus it is not hard to re-prove. I suspect any active mathematician can reprove all the theorems from the undergraduate courses in mathematics. Math.stackexchange.com is a living proof of this.

      1. Calimero

        Re: Stats is hard

        El Reg told us that "Machine learning is hard. And sometimes deeply offensive" - I believe it - how can you not, if it was said here? :)

        http://www.theregister.co.uk/2015/07/01/google_photos_app_machine_learning_fail/

  16. ilmari

    I have to admit failure here, I simply can not comprehend the dice example, how 6 and 6 would be any less, or any more likely to come up than 6 and 5, or any other combination?

    1. Anonymous Coward

      There are 12 sides in total - six on each die.

      There are only a total of two faces, one on each die, that can produce a 6 - 6 combination.

      There are a total of four faces - 6 and 5 on each die - that can produce a 6 - 5 or 5 - 6 combination.

      So there are twice as many chances of a combination of a 6 and a 5.

    2. Cederic Silver badge

      I think it's the way it's written that causes confusion.

      Roll die one: 1/6 chance of being a 6.

      At this point, the chances of 6-6 are indeed the same as the chances of 6-5.

      The article isn't however exploring that scenario. It's treating 6-5 as meaning either of the dice shows the 6, and the other shows the 5.

      So 6-6 needs die one to be a 6, and die two to be a 6.

      6-5 needs die one to be a 6, two to be a 5

      5-6 needs one to be a 5, two to be a 6

      Thus the combination of 6 and 5 has two ways in which it can be reached, which is twice as many as the ways of rolling two 6s.

    3. Wensleydale Cheese

      "I have to admit failure here, I simply can not comprehend the dice example, how 6 and 6 would be any less, or any more likely to come up than 6 and 5, or any other combination?"

      Because it can be either:

      Dice A: 6, Dice B: 5

      or

      Dice A: 5, Dice B:6

      1. Prst. V.Jeltz Silver badge

        I shall throw my hat in the ring and see if I can explain it a different and hopefully simpler way...

        Look at how many ways you can get to your total

        12 - 6+6

        11 - 6+5 or 5+6

        04 - 2+2 or 3+1 or 1+3

        07 - 6+1 or 1+6 or 3+4 or 4+3 or 5+2 or 2+5

        08 - 6+2 or 2+6 or 5+3 or 3+5 or 4+4 so a lot more likely!

        No, wait, this could be easier. Let's just look at how many ways dice one could land and you could still make your total with the second dice (at 6 to 1).

        To get a 12 D1 needs to be 6

        To get a 11 D1 can be 6 or 5

        To get a 04 D1 can be 1 or 2 or 3

        To get a 07 D1 can be 1 or 2 or 3 or 4 or 5 or 6

        To get a 08 D1 can be 1 or 2 or 3 or 4 or 5 or 6

        I therefore conclude the odds of getting a 7 are the same as an 8. Is that true? I'm pretty confused now!
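One way to settle the dice question is to enumerate all 36 equally likely outcomes rather than reason about them. The sketch below assumes two fair six-sided dice; the helper name `ways_unordered` is invented here. It counts ordered (die A, die B) pairs, so 6-5 and 5-6 count separately.

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely (die A, die B) outcomes for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many ordered outcomes give each total.
ways_by_total = Counter(a + b for a, b in outcomes)

def ways_unordered(x, y):
    """Number of ordered outcomes that show the faces x and y in either order."""
    return sum(1 for a, b in outcomes if sorted((a, b)) == sorted((x, y)))

print("6 and 6:", ways_unordered(6, 6), "way(s) out of 36")
print("6 and 5:", ways_unordered(6, 5), "way(s) out of 36")
for total in (4, 7, 8, 11, 12):
    print(f"total {total:2d}: {ways_by_total[total]} way(s) out of 36")
```

The count for 8 comes out at five, not six: if die one shows a 1, die two would need a 7. So a 7 (six ways) is slightly more likely than an 8 (five ways), and a 6 and a 5 together is twice as likely as a double 6.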

  17. Anonymous Coward

    8 out 10!

    "Eight out of ten owners who expressed a preference said their cat prefers it."

    And that might not be entirely accurate either - it might really be "Eight out of ten owners who expressed a preference said their cat prefers it to not eating." e.g. it's so bad that 20% of the cats would rather commit suicide than eat it.

  18. Anonymous Coward

    Stats abuse

    20% of car accidents are caused by alcohol.

    Hence 80% of accidents are caused by sober drivers.

    Stay drunk and live.

    1. Kiwi

      Re: Stats abuse

      20% of car accidents are caused by alcohol.

      When we used to have the "bad weekend on the roads" stuff reported (or rather, when I could be bothered paying attention to the trash that passes itself off as NZ's equivalent to sewer tabloids news media), I often wanted to know other stats. Eg you could have a weekend where 27 people were killed (average being 5-6) - but they were killed in a bus that was hit by a landslide that nobody could've seen coming, and that being the only accident for the period. I figured out probably before I was 12 that what could be a much more useful stat is the number of accidents and numbers of injuries. And when fatality rates started dropping significantly in the 2000's, was it due to safer driving or much more likely to things like airbags entering the cheaper cars? Given the state of NZ's "world leading driver education and licensing" I'd place my money on "kiwi drivers are still idiots, but their cars help protect them".

      Stats without context are just as bad as stats made up on the spot, at least 98.775% of the time.

  19. Anonymous Coward

    The Goodies had it best...

    9 out of 10 doctors recommend this product. Mind you, we had to search a bit for the right 9 doctors...

    1. Wensleydale Cheese

      Re: The Goodies had it best...

      "9 out of 10 doctors recommend this product. Mind you, we had to search a bit for the right 9 doctors..."

      I recall Alan Freeman delivering the line "Four out of five can't tell the difference between Stork and butter".

      a) it was hard to tell on the tellies of the day how much Stork or butter was spread on those bits of bread handed out. Possibly no more than a smidgen.

      b) some years later I came across mention of an organic compound (began with a T?) which only 20% of the population could taste. I did wonder if a similar chemical was present in Stork.

  20. Anonymous Coward

    I don't understand why the chi squared test is the appropriate test for the women and men buyers. A quick bit of searching hasn't made it any clearer.

    I'm no expert, but it seems to me this corresponds to the binomial distribution. The question would be: what is the probability that at least 2,262 buyers out of 4,390 are of one gender if each gender had a 50% chance of buying? Using the formula for the c.d.f of the binomial distribution, the answer is 4.15%. Here's the formula, http://www.wolframalpha.com/input/?i=2*(1-sum+k%3D0+to+2262+of+(4390+choose+k)*.5%5Ek*.5%5E(4390-k))

    The article says 4.3%. I'm guessing there's a deep reason it's close, but it calls into question the use of the chi squared test. Why was it selected over others? How should one decide what test to use? How can one check their answer if they're not sure? This is exactly the problem people have with statistics. Hand-wavy, monkey see monkey do processes that people don't understand the idea behind. You can treat statistical program functions like black boxes, but then what black box do you choose?
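On why the two answers land so close: with two categories, the chi-squared statistic (one degree of freedom) equals the square of the z-score from the normal approximation to the binomial, so the two tests are approximating the same tail. The sketch below is a hedged comparison using only the Python standard library and the figures quoted in this comment (2,262 of 4,390). The function names are invented here. Note that the linked sum runs from k=0 to 2262, which corresponds to "more than 2,262" rather than "at least 2,262", so both variants are printed.

```python
import math

def binom_two_sided(n, k):
    """Two-sided exact p-value: 2 * P(X >= k) for X ~ Binomial(n, 0.5), with k > n/2."""
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1))
    return 2 * upper_tail / 2 ** n

def chi2_p_value_1dof(observed_a, observed_b):
    """Chi-squared goodness-of-fit statistic and p-value against a 50-50 split."""
    expected = (observed_a + observed_b) / 2
    stat = ((observed_a - expected) ** 2 + (observed_b - expected) ** 2) / expected
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

n, larger = 4390, 2262                 # figures as quoted in the comment above
stat, p_chi2 = chi2_p_value_1dof(larger, n - larger)
z = (larger - n / 2) / math.sqrt(n * 0.25)   # z-score under the normal approximation

print(f"chi-squared statistic = {stat:.3f}, z squared = {z ** 2:.3f}")
print(f"chi-squared p-value        = {p_chi2:.4f}")
print(f"exact binomial, X >= {larger} = {binom_two_sided(n, larger):.4f}")
print(f"exact binomial, X >  {larger} = {binom_two_sided(n, larger + 1):.4f}")
```

The exact binomial calculation needs no approximation at this sample size, which makes it a reasonable cross-check on whatever black-box test a statistics package offers.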
