Benchmark smartphone drama: We wouldn't call it cheating, says Huawei, but look, everyone's at it

Huawei has addressed the issue of tweaking a phone's performance to improve its benchmark scores, after being caught redhanded. The practice isn't new, and has been carried out by a number of phone makers in the past. It works like this: To boost a system's benchmark score, the phone detects whether a popular benchmark is …
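
For illustration only, here is a minimal Kotlin sketch of the kind of detection the article describes: check the foreground app against a list of known benchmark packages and hand back relaxed power/thermal limits when one matches. The package names and figures below are assumptions for the sake of the example, not Huawei's actual implementation.

```kotlin
// Purely illustrative: the package names and limits below are assumptions.
val knownBenchmarks = setOf(
    "com.antutu.ABenchMark",
    "com.primatelabs.geekbench",
    "com.futuremark.dmandroid.application"
)

data class PowerPolicy(val maxCpuFreqMhz: Int, val thermalLimitC: Int)

// Return relaxed limits when a recognised benchmark is in the foreground,
// and the ordinary everyday limits for everything else.
fun policyFor(foregroundPackage: String): PowerPolicy =
    if (foregroundPackage in knownBenchmarks)
        PowerPolicy(maxCpuFreqMhz = 2800, thermalLimitC = 48)   // "benchmark mode"
    else
        PowerPolicy(maxCpuFreqMhz = 2200, thermalLimitC = 42)   // normal throttling

fun main() {
    println(policyFor("com.primatelabs.geekbench"))   // gets the relaxed limits
    println(policyFor("com.android.chrome"))          // gets the everyday limits
}
```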

  1. Anonymous Coward
    Anonymous Coward

    Isn't this easy to fix?

    Mandate a temperature sensor be used while running benchmarks and placed on the hottest part of the phone. Report the temperature.

    1. Lee D Silver badge

      Re: Isn't this easy to fix?

      Or just stop using fabricated benchmarks that aren't indicative of much, and directly run the programs that people are actually likely to run.

      That way any "cheat" is then available to users just as it is to the benchmarkers, and any performance enhancement that costs battery life, runs hotter, relies on cheaper shaders, etc. will also show up in normal use of the product.

      Benchmarks are a silly idea because nobody wants to know the raw integer performance of a processor nowadays. It matters not and is hugely complicated by a myriad other factors (e.g. multi-processing, throttling, etc.).

      "How well does it run...(insert top-end stress-testing commercial software here)". In the PC world, that's whatever new game gets lowest FPS on everyone else's card. On a mobile? No reason you couldn't do a benchmark via something like Chrome / WebGL rendering, popular gaming apps, etc.

      It's like "fancy" interview questions. All you're doing is hiring people good at answering "fancy" questions.

      Rely on fabricated benchmarks and all you're doing is buying phones good at winning fabricated benchmark tests.

      But buy a phone that plays the equivalent of GTA V at 120fps on Ultra (or whatever), and you get... a phone that'll play that game like that. And it's hard to cheat that *AND* the next game in the series *AND* that other demanding game *AND* the game from 10 years ago without... making a phone that's generally good all round at that kind of activity.

      Benchmarks never meant anything back in the Dhrystone/Whetstone days, they don't mean anything now.

      1. Anonymous Coward
        Anonymous Coward

        Re: Isn't this easy to fix?

        Why fix it? Could these benchmark-only features be unlocked for other applications by spoofing their identity? The article makes it sound like an actual hardware improvement (albeit a risky one). If that's the case, I'd wager that phone gamers would risk it.

      2. Anonymous Coward
        Anonymous Coward

        Re: Isn't this easy to fix?

        "and run the programs that people are likely to run directly."

        It seems easy, doesn't it? But you're going to have to use the same data set and so on every time and that will be difficult. Games? You're going to have to find a way to cycle exactly the same game sequence. Email? You're going to have fun with that, sending thousands of messages to verify that as the available space fills up performance doesn't change. And if you use the same game with the same data, how long before the manufacturers get wise?

        If you look at the standard range of benchmarks, they do attempt to replicate real world problems like rendering; they haven't been simple Whetstone/Dhrystone (or even Linpack) in ages. The people who develop them are rather clever. But so are the manufacturers.

        Hence my apparently simplistic suggestion. What limits phone performance is normally thermal throttling. As means of heat dissipation get better, this becomes less of an issue so long as batteries hold up. So the best and simplest way to verify performance is to measure temperature. Better thermal design should result in higher performance, or in longer silicon life if it isn't pushed. But it will also become quite obvious if benchmark detection turns on more power, because temperatures will go rather high.
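
As a rough illustration of that temperature-logging idea, here is a minimal Kotlin sketch that samples the SoC temperature sensors once a second while a benchmark runs. It assumes a Linux/Android-style /sys/class/thermal interface; zone names, paths and the sampling interval vary by device, so treat it as a sketch rather than a finished tool.

```kotlin
import java.io.File

// Read every thermal zone's name and current temperature in degrees Celsius.
// Assumes the Linux/Android sysfs thermal interface is present and readable.
fun readTemperaturesC(): Map<String, Double> =
    File("/sys/class/thermal")
        .listFiles { f -> f.name.startsWith("thermal_zone") }
        .orEmpty()
        .associate { zone ->
            val type = File(zone, "type").readText().trim()
            val milliC = File(zone, "temp").readText().trim().toDouble()
            type to milliC / 1000.0          // values are reported in millidegrees
        }

fun main() {
    repeat(30) {                             // one sample per second for 30 s
        println(readTemperaturesC())         // log alongside the benchmark score
        Thread.sleep(1000)
    }
}
```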

        1. Lee D Silver badge

          Re: Isn't this easy to fix?

          "It seems easy, doesn't it? But you're going to have to use the same data set and so on every time and that will be difficult. Games? You're going to have to find a way to cycle exactly the same game sequence."

          Gosh. If only lots of games allowed you to play saved replays since, say, the days of Doom/Quake.

          The second the program you run is NOT the game the user will run, it can be detected (hell, my nVidia drivers do it automatically and "patch" shaders in games they recognise, and you can tell them to force Intel Optimus for one program and nVidia for another). If nVidia can do it for game settings profiles, manufacturers can do it for benchmark programs to cheat. But try cheating when the game being run is the game being benchmarked, and the only difference is that the reviewer loads in a replay he's created on one machine and loads on all the others (so you can't even detect a "standard" benchmark replay file).

          And what kind of insane person would benchmark email (but it's very easy to do)? You would benchmark, say, Chrome against a WebGL test suite. Good luck detecting that, especially if you use a different test suite for each review (not each device, but each comparison of devices).

          Comparing review models is really easy. Hell, you don't even have to load a huge set of licensed benchmark suites on every machine to do so. Literally a Steam account with a bunch of games... just like... a user with a bunch of games on Steam.

          With phones etc. it's even easier - have a profile of the App Store apps you're pushing down to them and push down the same apps on them all.

          As soon as you get into "benchmarking software", it's a lazy review. It's "let's just load this and check the number". Not, as stated, the temperature, CPU usage, whether it's getting priority, real-world use, etc. etc. etc.

      3. big_D Silver badge

        Re: Isn't this easy to fix?

        @Lee D I sort of agree with you. What I do like is the German magazine c't, which, when it benchmarks phones or laptops, uses disk- and CPU/graphics-intensive software and benchmarks to work out how long the device will run at full speed before throttling performance.

        If you are using an HD for heavy database work and it starts throttling after a few minutes of sustained activity, you might want to look elsewhere. The same goes for a notebook: if you are doing processor-intensive work and it throttles quickly (the early Lenovo Yoga Pros suffered from this), you know it isn't right for you.

        The same goes for phones. Raw performance isn't enough of an issue these days for the benchmarks to mean much, but I find the throttling information useful. For instance, a recent review of a new M.2 drive said it managed over 3GB/s for the first 45 seconds, but after 2 minutes it had collapsed to under 500MB/s, where it remained stable for the rest of the 10-minute test cycle.

        For me, that is useful information. Raw IOPS or throughput under "ideal" conditions, less so.
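
As a rough sketch of the kind of sustained-load test described above, the Kotlin below writes continuously and prints throughput per 15-second window, so thermal (or cache) throttling shows up as a drop over time. The file path, sizes and durations are illustrative, and a serious test would use direct I/O to keep the OS page cache out of the measurement.

```kotlin
import java.io.File
import java.io.RandomAccessFile

fun main() {
    val buffer = ByteArray(4 * 1024 * 1024)                   // 4 MiB per write
    val testFile = File("throttle_test.bin")                  // illustrative path
    RandomAccessFile(testFile, "rw").use { raf ->
        val start = System.nanoTime()
        var windowStart = start
        var windowBytes = 0L
        while (System.nanoTime() - start < 600_000_000_000L) {            // 10 minutes
            if (raf.filePointer > 8L * 1024 * 1024 * 1024) raf.seek(0)    // wrap at 8 GiB
            raf.write(buffer)
            windowBytes += buffer.size
            val now = System.nanoTime()
            if (now - windowStart >= 15_000_000_000L) {                   // 15 s window
                val mbPerSec = windowBytes / 1e6 / ((now - windowStart) / 1e9)
                println("%.0f MB/s".format(mbPerSec))         // throughput this window
                windowStart = now
                windowBytes = 0L
            }
        }
    }
    testFile.delete()
}
```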

      4. Cuddles

        Re: Isn't this easy to fix?

        "Benchmarks are a silly idea... But buy a phone that plays the equivalent of GTA V at 120fps on Ultra (or whatever), and you get... a phone that'll play that game like that."

        Which is... drumroll... a benchmark. Benchmarking is simply comparing tests run under standard conditions. Comparing how well computers can run GTA5 is no different from comparing how well they can run Geekbench. The only difference is that with dedicated benchmarking software, you can be sure of running exactly the same tests in exactly the same way every time, while running a messy game makes that much more difficult. The tradeoffs are that badly made benchmarking software might not be representative of real world use, and that since there are relatively few testing suites around it's easier to cheat as was done here.

        The reality is that no testing method is going to be perfect, but simply complaining about benchmarking in general doesn't make sense. Benchmarking software attempts to solve a real problem with comparing performance; the fact that it brings its own, different problems doesn't mean simply abandoning the whole idea will help since that still leaves you with the original problems to deal with.

        Ideally what you want to do is along the lines of the testing TechReport does, which combines a variety of purely synthetic benchmarks, realistic benchmarks, and a selection of common programs and games including ones known to tax systems in different ways. For example, their most recent CPU benchmarks include memory and maths tests with AIDA64, JavaScript performance in the browser, the WebXPRT benchmark, compiling with QTbench, 7-zip performance, Veracrypt disk encryption, Cinebench, Blender, Corona, Indigo, Handbrake, SPECwpc, Crysis 3, Deus Ex Mankind Divided, GTA 5, Hitman, and Far Cry 5. Six of those are dedicated benchmarking tools, six are standardised benchmarks performed with representative data sets in real programs, and five are best efforts at standardised tests in games.

        Every single part of that is a benchmark, and even the more synthetic ones play an important part in the overall evaluation. The problem with benchmarks is not that they're a silly idea or that they don't mean anything, but that far too often both the people carrying them out and the people looking at the results are lazy and don't understand what they're doing. Looking at the results of a single benchmark is indeed meaningless, probably more so for a synthetic one, but even realistic ones vary hugely depending on the task. There's no point looking at only the results for GTA5 if that relies heavily on single-core performance and completely misses memory bandwidth problems, and even adding a few other games in doesn't help if they all end up relying on the same aspects of the system (or, importantly, if you just blindly throw games at it assuming that will make a good test, without figuring out what you're actually testing). The only way to do benchmarking properly is as above - test lots of different things in different ways, and for some of that synthetic benchmarks are the best tool for the job.

        tl;dr - Benchmarking is far from a silly idea, and in fact pretty much every proposal for what should be done instead is just a different benchmark. The trick to doing it properly is simply to make sure you cover all aspects of performance, rather than just throwing one or two programs at it and calling it a day. If that's all you do, synthetic benchmarks are really no worse than anything else; you're not going to get a useful answer anyway.

    2. Mike Moyle

      Re: Isn't this easy to fix?

      Seems to me that the solution is to run benchmarks continuously until either the battery runs down or the phone cooks itself to death and publish those results alongside the battery life under normal use.

      It puts the benchmark and battery life numbers into a usable context.

  2. jms222

    VW

    So it's like the VW thing which they all probably do anyway.

    1. fidodogbreath

      Re: VW

      Or competitive cyclists doping.

      1. 45RPM Silver badge

        Re: VW

        @fidodogbreath

        I’m all in favour of a doped up sports league. Cyclists off their tits on performance enhancers, runners barely able to piss for drugs and so forth. See what the real limits of the human body are when chemically boosted.

        The athletes would have to be warned and accepting of the risk, of course, and a non-doped league run in parallel - but I think that it would make for great entertainment. Certainly more exciting than who has the fastest phone, anyway - who the hell cares as long as it’s quick enough?

        1. eldakka

          Re: VW

          I’m all in favour of a doped up sports league.

          Let's expand on this a bit further.

          In addition to general sports-enhancing drugs (steroids etc.), let's also allow categories for recreational drugs.

          Let's see the 100m sprint with those high on weed, vs those on cocaine, vs meth, vs LSD, etc.

          1. 45RPM Silver badge

            Re: VW

            @eldakka

            Hell yes. And make it relevant to people’s everyday lives too. The 100 metre coked up dash-and-grab with a 50” telly. For the cycling event it could be the pavement slalom with the winner being the first to nab 50 phones whilst amphetamined off their nuts. And the weed pizza discus, of course. An Olympics that people can relate to.

        2. caffeine addict

          Re: VW

          OR allow biomechanical mods. Replace Oscar Pistorius's blade springs with real springs and let him do the 100m and long jump at the same time.

          I know he's in prison, but with those new springs he wouldn't be for long. Just after he set a new high jump record...

        3. Cuddles

          Re: VW

          "I’m all in favour of a doped up sports league. Cyclists off their tits on performance enhancers, runners barely able to piss for drugs and so forth. See what that real limits of the human body are when chemically boosted."

          It's called the Olympics.

    2. Mark 85

      Re: VW

      Well... all the cool companies are doing it, from cars to phones to computers. If it has a processor, the data we see published is just so much BS. The performance figures have become just fluff, and meaningless.

    3. Adam 1

      Re: VW

      > So it's like the VW thing which they all probably do anyway.

      Yes, except I doubt that the synthetic benchmark faking will lead to thousands of deaths p.a.

      1. Adam 1

        Re: VW

        So why the downvotes? Peer-reviewed journals use too many big words for you? Or have you got some paper showing how a fake CPU mark score is causing deaths? Both are wrong, but your moral compass is pretty screwed up if you can't understand why one is a few orders of magnitude worse than the other.

  3. Oh Homer
    Coat

    Huawei man!

    Giz a deek at them bench things cos there's a reet fettle.

  4. J27

    Huawei doesn't have any credibility left. I'd just as soon buy a phone from North Korea.

    1. Pen-y-gors

      Just bought an Honor View 10 - seems pretty credible to me!

  5. JohnFen

    I'm shocked!

    Companies engage in deceptive practices to get good benchmark scores? I'm shocked! Shocked, I tell you!

    Huawei is correct, this practice is as common as dirt. That's why I stopped paying any attention to benchmarks decades ago. Huawei is incorrect, though, in thinking that this fact excuses any of them for doing it. It's always a terrible practice to lie (and this is absolutely lying), and companies should get reamed a new one for doing it -- even if every single other company does it too.

    1. Chris G

      Re: I'm shocked!

      I have based the purchase of my last two phones on them having the features I need, and then on as many reviews from buyers as I can find. The opinions of tech journos in Wired et al mean nothing to me, as my criteria are not theirs.

      The benchmarking likewise means little to me as I have never bothered to learn what they signify.

      Ultimately, statements from the manufacturer about how white your knickers will be washed or how long your batteries will last are just marketing. If something is so new that there are no recommendations from buyers, it's a case of you pays yer money and takes yer chances.

  6. Flak
    Joke

    Moral relativism

    If everyone does it, it is not wrong anymore!

  7. Chewi
    Coat

    How did they ever think they'd get Huawei with it!?

    Seriously El Reg, you missed a trick there.

    1. Adam 1

      Re: How did they ever think they'd get Huawei with it!?

      Where would the Honor be in that?

  8. Anonymous Coward
    Anonymous Coward

    This reminds me of those bogus AV tests

    I believe all the benchmark tests for "antivirus" products are a bunch of BS and are rife with AV companies adjusting their programs to artificially inflate an already bogus, misleading benchmark.

    One such company that comes to mind is Qihoo 360, which was accused of using a Bitdefender engine instead of their own QVM engine for benchmark tests.

    https://en.wikipedia.org/wiki/Qihoo_360#Controversies

    I feel that most of the "antivirus" programs (especially ones designed for Android) are complete BS to begin with and I've seen a recent disturbing trend in dodgy Android "antivirus" apps that start off seemingly OK but then after a month or so start exhibiting malware-like behaviour.

  9. Anonymous Coward
    Anonymous Coward

    Where are the Consumer-Agencies & Regulators here...

    Rigging emissions tests / exhaust fumes is bad, but this is OK? Bring on fines and concentrate minds, I say! Personally I buy far less tech now. Ever since LG smart TVs got caught phoning home USB filenames, you could tell ethics in tech were total history! I just laugh now at $900 smartphones. But hey, go for it! Get slurped and cheated - twice over.

    1. JohnFen

      Re: Where are the Consumer-Agencies & Regulators here...

      Well, in the US, rigging emissions tests is worse than this, because those tests were rigged in order to evade laws about permissible emissions. Nobody is evading laws by rigging benchmarks.

      From a consumer protection standpoint, you're right, but the US isn't so interested in consumer protection anymore.

  10. Flakk
  11. Jay Lenovo
    Angel

    App Neutrality

    Even though I work in the valley of the shadow of tech, I will trust no benchmark, for bias and big money surrounds me. Only in real world testing, will I take comfort.

    1. Commswonk

      Re: App Neutrality

      Even though I work in the valley of the shadow of tech, I will trust no benchmark, for bias and big money surrounds me. Only in real world testing, will I take comfort.

      Now that really is exquisite. Have an upvote; would award more if I could.

  12. This post has been deleted by its author

  13. Queeg

    From Huawei

    Yes we cheated, but we did it for you.

    Because we care so much.

    still waiting for that sarcasm icon El Reg

  14. ccc13481

    This has been done since the dawn of time:

    It was done with Windows 3.1 graphics drivers in 1993:

    See InfoWorld, March 1, 1993.

    1. Pedigree-Pete
      Pint

      Ref: Windows 3.1 graphics drivers in 1993

      Yep ccc13481, I remember it well. Hercules, S3, Datapath: they were all "at it". Cheers for the memories. PP
