So despite all the cash ploughed into big data, no one knows how to make it profitable

Mountains of cash keep pouring into the titans of big data despite the world's inability to do much of value with their software. To wit, in its last fiscal year, Cloudera pulled in over $260m, with analysts projecting over $330m for 2017. Hortonworks, for its part, yanked down $184m, up from $121m in 2015. Outside the open- …

  1. NE-bot
    Flame

    Data speculation

    ARGH! Big data is basically data speculation. Capture all the data-pokemon! You never know when you might combine this with that, or with a fucking outside data set, to magically make everything make sense. Big Data and IoT are basically money sinkholes for anyone who doesn't have a direct data-to-cash flow (mostly ad sellers and merchants). Even those looking for efficiency savings are only going to get returns if they are at some serious scale. Everyone else is just trying to perform some kind of data alchemy. Buy lead! Lots of it! Turn it into gold with our magic algorithm/sorcerer's stone.

    As the recent events with terrorists have proven, it's what you do with the data that counts, not the current mindless focus on hoovering it up.

    Fk sake.

    1. BillG
      Mushroom

      Re: Data speculation

      Thank you, Matt Asay, for saying what I told my old boss two years ago. Just emailed him this article.

    2. Flocke Kroes Silver badge

      Re: Data speculation - Done successfully

      Successful big data is rare, but the results can be huge. The obvious example is Robert Mercer, who had the data for a very successful "Get out the vote" campaign for Donald Trump. In the UK, the name you are looking for is Cambridge Analytica.

      1. oxfordmale78

        Re: Data speculation - Done successfully

        Based on their job ads, Cambridge Analytica have an impressive technology stack. However, as a data scientist myself, I find it is almost never the technology stack that is the limiting factor, but mapping and data quality issues instead. For example, how can I reliably map your username Flocke Kroes to your other social media accounts, and how can I infer your political opinions from your posts, particularly if you only ever post videos of cats or dogs on your public profile? Based on my personal experience, I don't believe Cambridge Analytica can deliver what they are promising. They seem to have nothing more than a glorified spreadsheet to record voter information derived from door-to-door leafleting, combined with a limited subset of voter information scraped from social media.

  2. Mage Silver badge
    Unhappy

    Big data profits?

    So what are the Facebook & Google parasites (on personal info) using to convince advertisers to spend with them and hoover up 98% of internet advertising revenue?

    Though if it IS a successful "Big Data" example, we'd be all better off without it.

    1. gv

      Re: Big data profits?

      It is reminiscent of the hype surrounding artificial intelligence in the 70s and 80s.

    2. Rafael #872397

      Re: Big data profits?

      I bet hardware sales are blooming! See, there's profit right there.

    3. Anonymous Coward
      Anonymous Coward

      Re: Big data profits?

      Google used to use probability theory and Bayes statistics. All of their early big data work is specifically designed to service problems from that space. Note the "used to" - this was the case when it had the Google internal culture, not Doubleclick one. That also worked.

      I am not quite sure what either one of them use now, but the amount of money they are throwing into AI and specifically neural networks is indicative of them no longer having the brainpower to try to figure out the underlying proper math from Probability Theory, Optimal Control, Bayes Statistics, etc. So instead of figuring out the math (which requires educated brainpower) everything is thrown into the big data numbercruncher.

    4. Anonymous Coward
      Anonymous Coward

      Re: Big data profits?

      So what are the Facebook & Google parasites (on personal info) using to convince advertisers to spend with them and hoover up 98% of internet advertising revenue?

      That in their respective fields and target markets they're a near monopoly?

  3. Steve Davies 3 Silver badge

    Big Data?

    Big bottomless pit where money goes and never comes back.

    The words

    Ponzi scheme, Bubble also come to mind.

    The only people who can even begin to get to grips with it all are the likes of the NSA, CIA, GCHQ etc and they don't always get it right.

    Perhaps in time it will prove to be useful, but when, eh? Get that right along with a few good investments and you are good to go.

    1. AMBxx Silver badge
      Coat

      Re: Big Data?

      Yep, feels like we've been here before!

      I worked for Vignette in the early 2000s. I joined in their first 'cash flow positive' quarter. Not even magic profit, just more money in than out. All downhill from there. Lasted less than 12 months.

      Mine's the one with the P45 in the pocket.

    2. Paul Smith

      Re: Big Data?

      What on earth makes you think that the civil servants in those organizations can handle large IT projects any better than the civil servants in any other part of government? How many convictions in public court have you seen as a result of their efforts? Or do you just believe them when they say they are doing a good job?

    3. Wensleydale Cheese

      Re: Big Data?

      "Ponzi scheme, Bubble also come to mind."

      I also suspect that much of it is like squirrelling away useful looking answers from technical IT forums.

      There comes a point when most of those answers cease to be of interest, simply because the products concerned have evolved, introducing a new set of questions, or even ceased to be.

      You may be able to re-use that data for other research, say linguistic analysis (thinking of AI here), but there are plenty of other sources for that.

    4. Ian Michael Gumby
      Flame

      @Steve Davies 3 Re: Big Data?

      Sorry, but there are many companies that are profiting from their use of big data solutions.

      "The answer may come down to one tweet from a Gartner analyst. The spoiler? Virtually no one has been successful with their big data projects. They're spending lots of money but having little success."

      Hogwash.

      The issue is that those who are successful are not sharing their success stories. Nor are they talking to Gartner.

      If Matt meant to talk about those companies supporting the open source product, then sure, Hortonworks and Cloudera are still burning thru cash. MapR hasn't gone public and they really haven't released their financials and their results may be interesting.

      There are a couple of reasons why Cloudera and Hortonworks are having problems turning a profit but that's a different story.

      Companies that have invested in Big Data are having mixed results.

      In part, they hire cheap labor who don't know what they are doing.

      In part, they are still early on the adoption curve and need to level set expectations.

      In part, many companies don't understand the value in their data and have over estimated its worth and under estimated the costs involved in attempting to extract that data.

      And yes, I am one of those 'experts' but rather than the blue talking head, I went with the flame icon.

      Matt needs to do his homework before writing yet another "Hadoop is Dead" story.

      The truth... Hadoop is hard. Too many learn only the basics yet fail to learn what is needed to make things actually work. Free clue: all of those onshore gray-haired guys who've been in IT for 25+ years ... you need those guys to help with the infrastructure engineering and to help level set expectations on what you can do with the data.

      And to your list:

      Facebook, Google, and others that I personally know but legally can't name.

  4. Anonymous Coward
    Anonymous Coward

    But, perhaps profitable for the companies using it

    This article shows how the suppliers of big data solutions are struggling to turn a profit, but is less demonstrative on whether companies that are using them are benefiting. I don't know either, but suspect that the ones getting the greatest value from it are also keen to keep quiet about it for competitive reasons.

    1. Paul Smith

      Re: But, perhaps profitable for the companies using it

      If I remember correctly, there was an article here on El Reg a couple of years ago that reported big data projects were returning a little under 50c for each dollar invested.

  5. Harry the Bastard

    it's not simply about the data

    it's also about the willingness, ability (and guts) to do what the data suggest doing

    if it's that large chunks of an organization could be radically transformed, shrunk, or even eliminated, the reaction tends not to be "woo yay go for it!"

  6. Your alien overlord - fear me

    Big Data is just like all those tax dodging schemes celebs and footballers sign up to. 'Lose' money and wipe it off your tax bill. Big Data doesn't really exist and when the taxman finds out, oh dear oh dear.

  7. jMcPhee

    We heard all about the 'new economy' in the 90's. Too many (excessively rich) people with too much cash to invest. Not much GDP gets created by knowing who watches which cat videos or who checked out Amazon for what brand of water faucet parts.

  8. Pat 11

    It doesn't have to actually work

    Google have got a great way to monetise big data but it's entirely independent of whether it actually improves results. All they had to do was convince marketing execs that it might give them an edge. Maybe it does, or maybe it doesn't, it's all the same when you have captured the advertising industry.

    When it comes to more important subjects like health, the evidence has to be a lot better, and nobody wants to be that guy who used an algorithm and killed a bunch of patients.

  9. Anonymous Coward
    Anonymous Coward

    Panning for mud?

    The raw reality in a lot of these Big Data projects is that there just isn't any valuable data to be found. The differences between individuals or buyers or whatever people are trying to sort is, in many cases, just noise, and not predictive of anything valuable. It doesn't matter how much mud you sift if there are no gold nuggets in the first case.

    1. James Anderson

      Re: Panning for mud?

      Too right.

      Don't normally pay much attention to the ads but lately I paid more attention.

      So I get lots of ads for travel to places I have just been to, hotels I have already booked, products I have already bought.

      I mean what's the point of sending an ad for a hotel I have already booked. I just bought a raspberry Pi , how likely is it that I will buy another one a week later?

      Headless chickens with a great sales team.

  10. Korev Silver badge
    Boffin

    HPC?

    Meanwhile in the less-hyped and unfashionable HPC space we're dealing with "Big Data" on a daily basis and helping our companies make money. Think seismic processing for oil & gas, CFD for cars, genomics and computational chemistry for pharmaceuticals. I wish people would look beyond the hype sometimes :)

  11. wyatt

    Company I work for is working with software that is able to extract and display results from gathered data. When asked what use it is, generally the answer is that it is 'interesting'. Not sure how much of a return it'll give for the cost of processing this data, and if the way it is processed will just return a result that you want, rather than one which gives you an insight into the business.

  12. Andy 73 Silver badge

    Operational costs and occasional wins

    I've ushered a couple of big data projects into production at a global company, and can say that for the right client, there is real value. The weasel words are 'for the right client'. Big data most often provides means to make marginal improvements in customer retention and spending, but until you're looking at millions of customers those improvements are elusive and the cost outweighs the return. It is possible to run such projects on cost effective 'mini clusters' and with a pretty small dev team, but more often than not, cautious management will load up such attempts with reassuringly expensive hardware and a 'big team' to match.

    The other angle is that reporting and compliance can benefit from big data - moving from reports that are typically days (if not weeks) out of date to near realtime, and archive free access to a company's entire history. That level of improved efficiency can be worthwhile, but more often than not, big data runs alongside existing traditional services and so is an additional cost rather than a cost effective replacement.

    It remains a very cluttered toolbox, requiring experienced devs who can integrate with the company as much as with the software. Building a solution is still a nuts and bolts job, which leaves employers at the mercy of over optimistic devs and divas.

  13. Emmeran

    42

    Yes, I thought it over quite thoroughly. It's 42.

  14. Anonymous Coward
    Anonymous Coward

    To me, big data is nothing more than the emperor's new clothes - nothing there at all.

    In my opinion, rather than investing (wasting) all that money on big data infrastructure, companies would be better off taking a cold hard look at the data they really need to keep around, and if it isn't DIRECTLY responsible for generating revenue, getting rid of it, and all the infrastructure that supports it.

    All the storage, servers, support staff, developers, consultants, software licensing...gone. I would bet in 90% of the cases, a company would save more money by ditching a big data solution and all its associated costs, than any amount of increased revenue it might bring in.

  15. Anonymous Coward
    Anonymous Coward

    But what IS this big data? what are they collecting? browsing habits? Fuck all use there except for parasitical ad slingers who end up bouncing off my ad blocker... fuck 'em

  16. Shameless Oracle Flack

    @Andy 73 said:

    "It remains a very cluttered toolbox, requiring experienced devs who can integrate with the company as much as with the software. Building a solution is still a nuts and bolts job, which leaves employers at the mercy of over optimistic devs and divas."

    Therein lies the problem, as Matt Asay the OP also pointed out. Companies really need an internal culture and org shift to make big data happen, combined with reasonable expectation setting. In a large company, big data requires almost a music business mindset: there will be lots of OK songs, some bombs, and a few big hits. Tolerance for failure and experimentation is required, as is organization agility (sadly lacking as pointed out by other commenters) that allows small teams to make small (and large) progress with lots of experimentation and risk-taking.

    One large company that has succeeded is EOG, the largest oil producer in North America. It starts with an agile culture and mindset, including an engineer CEO who experiments with data himself and constantly pushes his staff to develop new, data-driven apps that can incrementally improve operations and efficiency. The classic expensive, slow waterfall with an expected outcome, or aimlessly swimming around in a dataset to find "something", is a recipe for failure, and it's likely these org challenges that are the key hindrance to success at this point.

  17. Ilsa Loving

    Marketing trumps Technology

    I think this is one of the rare Register articles where the commenters provide valuable insights and really complement the article. Props to Andy and Shameless in particular.

    Big Data. The Cloud. AI. XML.

    Every major technology that comes about goes through this massive bubble cycle where marketers et al try to convince the world that this New Thing(tm) is a magic bullet that will solve all their problems.

    Worse is when the term is a co-op of existing technology, which then turns into background noise because said term is abused to the point of being nonsensical. I mean, what the hell IS "The Cloud" for example? No matter where you look, someone comes up with a new definition that just happens to help them peddle whatever widget they're peddling.

    And now everyone is going crazy over Hadoop and Big Data, even when it's not even remotely appropriate, because they read about it in a magazine as the "Next Big Thing" and wanted in on the bandwagon.

    Hadoop is a surprisingly powerful technology, but you have to fit the tool to the need, like every other tool ever conceived.

  18. Kevin McMurtrie Silver badge

    Math and Information Science failures

    If you teach a machine to analyze the behavior of millions of people, it just might learn to predict the behavior of millions of people. At the individual level, that means your machine maybe has a one-in-a-million chance of better prediction than manually researched market segment data. It's useless. Unlike manual research, big data doesn't even tell you why results are poor. The same goes for analyzing millions of low quality DNA tests for diseases, whether or not somebody is beautiful, or whatever the VC scam du jour is.

    The kind of "big data" needed for accurate prediction per individual is generally not legal or ethical to collect.

    1. Vaidotas Zemlys

      Re: Math and Information Science failures

      Nobody wants to predict an individual accurately, because it is not possible. All anybody wants is to identify a profitable slice of a group of individuals. The basic limitations of Statistics 101 still apply: you cannot fit more parameters than there are individual data points. And yes, I know about regularisation, etc, but if noise is stronger than the signal, you are basically screwed.
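To make the parameters-versus-data-points limit concrete, here is a minimal sketch (not anything from the comment itself). It assumes six made-up data points of pure Gaussian noise and fits them with a six-parameter polynomial via Lagrange interpolation; the helper name `lagrange_fit` is hypothetical. The fit reproduces every training point, which shows that a perfect fit says nothing about whether there was any signal to find.

```python
import random


def lagrange_fit(xs, ys):
    """Return the unique polynomial of degree len(xs) - 1 through all points."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


# Assumption: six observations that are pure noise, with no signal at all.
rng = random.Random(0)
xs = [float(i) for i in range(6)]
ys = [rng.gauss(0.0, 1.0) for _ in xs]

# Six parameters for six points: the model "explains" the noise perfectly.
model = lagrange_fit(xs, ys)
max_training_error = max(abs(model(x) - y) for x, y in zip(xs, ys))
print("largest error on the training points:", max_training_error)
```

With as many parameters as points, the training error carries no information, so any judgement about the model has to come from data it was not fitted on.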

  19. CCCP

    Sawing our collective branch

    The vitriol here is telling. Tech people are a very conservative bunch - which is really weird as technology, mostly, is at the forefront if not the driver of change. I counted one positive comment about how it could be made to work in some situations.

    The problem is that the only viable owner of big data is the CEO. But that is doomed at present as they don't understand the potential, and so get bored. Maybe in the next decade we'll see a new class of CEOs for whom data flow is the essence of the company, and therefore big data is in the DNA - and I don't mean the adslingers or so-called disruptors, but real companies.

  20. John Smith 19 Gold badge
    Gimp

    ""abundant data by itself solves nothing.""

    Much like all that personal data slurping by the various T&FLA's has found.

    When you're looking for a needle in a haystack is filling a field with haystacks a good plan?

    The correct answer is of course to get a metal detector.

    But what if you don't have one? Or you need the actual needle itself, not evidence it's in any given haystack?

  21. Alistair

    "abundant data by itself solves nothing."

    One of the reasons that big data projects tend to run into walls is that in many cases you have data streams from several arms of a company coming together. Each stream will have its own .... "gnomes" if you will, who have herded and cultivated that data for quite some time. And quite frequently the gnomes attached to each stream have no exposure to one another, or to each others work flows. This leads to the disparate pile of garbage that takes eons to sort through. There are no relationships built that can be used to collate the data in any form.

    What one requires is both an understanding of the data, where it comes from and its relationship to the *questions* you will be asking the data.

    Sadly, I am about to sound screamingly ageist.

    Experience. of the data. in all its chunks. may be required in order to develop the questions to be asked.

    Dumping everything into the lake without the concept of what will be done with it, or what questions will be asked is only going to make your 'big data' solution a 'big puddle of goo'.

    It most certainly is NOT "put EVERYTHING in here and all your questions will be answered" -- which is the interpretation I hear in many many many comments I've seen about big data. The objective rather is "Bring these analytical data here, and recreate the basic structure we already know works" - NOW add the stuff we're not too sure about and see what it tells us.

    a) build the framework of analysis

    b) add known data streams, verify that analysis looks reasonably similar to known good.

    c) add in data streams that are relevant - including relationships

    d) compare indicators and results.

    repeat c and d -- but make sure that the data scientist/analyst reviews are *listened* to at each iteration - it is possible that one can end up with more noise than data.

    Sadly what happens in many cases is "Build hardware/software/stack, toss ALL THE BASE IN " -- start looking at analysis. OH CRAP IT DUNNT WURK!
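Steps a) to d) above can be sketched roughly as follows. This is a toy under stated assumptions, not a real pipeline: the "analysis" is just the mean absolute error of averaged streams against known outcomes, and the stream names, numbers and the `build_incrementally` helper are all hypothetical.

```python
def error(streams, actual):
    """Mean absolute error of the averaged streams against known outcomes."""
    n = len(actual)
    combined = [sum(s[i] for s in streams) / len(streams) for i in range(n)]
    return sum(abs(c - a) for c, a in zip(combined, actual)) / n


def build_incrementally(known, candidates, actual, known_good_error, tolerance):
    # Step b: the known streams must first reproduce the known-good analysis.
    current = error(known, actual)
    if abs(current - known_good_error) > tolerance:
        raise ValueError("known streams do not reproduce the known-good analysis")

    accepted = list(known)
    rejected = []
    # Steps c and d, repeated: add one stream, compare, keep it only if it helps.
    for name, stream in candidates:
        trial = error(accepted + [stream], actual)
        if trial < current:
            accepted.append(stream)
            current = trial
        else:
            rejected.append(name)  # more noise than data
    return current, rejected


# Hypothetical figures: four periods of known outcomes and one trusted stream.
actual = [10, 12, 11, 13]
known = [[11, 13, 12, 14]]
candidates = [
    ("web_logs", [9, 11, 10, 12]),
    ("cat_videos", [50, 2, 30, 7]),
]

final_error, rejected = build_incrementally(
    known, candidates, actual, known_good_error=1.0, tolerance=0.1
)
print(final_error, rejected)
```

The point of the loop is only that each new stream has to earn its place against a result you already trust. In practice the accept or reject decision would also involve the analyst review mentioned above, not just a single number.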

    1. Lieutenant Frost
      Headmaster

      Re: "abundant data by itself solves nothing."

      I work for a company that deals in data mining of clinical systems. Basically, depending on what data you have (and what data you're willing to export to our system [and drop the $$ for accordingly]), you'll be able to write queries against the data which may be from completely independent systems that would allow you to connect dots that you would not normally be able to see in each disparate system on its own. In theory, it sounds cool - and when it works correctly, it is.

      - HOWEVER -

      This requires quite a few dots to be connected internally before such queries can be written:

      (i) The source system(s) must be capable of providing data to our systems (i.e. Hello, IoT) - Your systems have to be producing data for us to slurp up in the first place, be it the hospital EMR, heart/blood pressure/oxygen/etc monitors, barcode scanners, and so on.

      (ii) The source system(s) must be administered by people who are empowered to effect change in the organization - If your hospital's IT department can't get the C-suite to approve a simple maintenance contract for your critical literally-life-or-death systems, good luck getting them to agree to the cost or effort required in setting up an effective "big data" measure.

      (iii) The source system(s) must have users who can be considered subject-matter experts (SMEs) on how the systems work and what the data contained in them ~means~ - These are the people who are going to be able to validate that the data is moving from the source systems into the data mining solution correctly. This is where the experience part of the existing systems and user workflow comes into play.

      (iv) The source system(s) must be capable of providing the data to be analyzed - If you don't tell me when a patient enters or leaves a hospital, I can't tell you how many people came and went through admissions in a month. Sadly, this may be the most straightforward of the points being made.

      (v) The destination system must be capable of handling the data in a meaningful way - Let's face it, most of the data hoovered up by these big data systems is useless noise, loosely structured at best. Those familiar with healthcare systems will have at least heard of the HL7 standard, which suffers from the same level of bloat that all standards tend towards - systems that are built to handle HL7 messages are unwieldy, disgusting monsters by nature due to how loosely the data is structured. Put simply - the destination system has to know how to interpret the garbage being fire-hosed at it in a way that provides some level of value.
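Point (iv) in the list above can be illustrated with a small sketch. The records and field names (`patient`, `admitted`, `discharged`) are made up for illustration and do not come from any real EMR or HL7 feed; the idea is only that a record with no admission date cannot contribute to a monthly admissions count.

```python
from collections import Counter
from datetime import date

# Hypothetical records; field names are assumptions for illustration only.
records = [
    {"patient": "A", "admitted": date(2017, 1, 3), "discharged": date(2017, 1, 9)},
    {"patient": "B", "admitted": date(2017, 1, 20), "discharged": None},
    {"patient": "C", "admitted": None, "discharged": date(2017, 2, 2)},
    {"patient": "D", "admitted": date(2017, 2, 14), "discharged": date(2017, 2, 15)},
]


def admissions_per_month(rows):
    """Count admissions per (year, month); report rows that cannot be counted."""
    counts = Counter()
    uncounted = 0
    for row in rows:
        admitted = row["admitted"]
        if admitted is None:
            # The source system never told us when this patient arrived.
            uncounted += 1
            continue
        counts[(admitted.year, admitted.month)] += 1
    return dict(counts), uncounted


monthly, missing = admissions_per_month(records)
print(monthly, "records with no admission date:", missing)
```

Reporting the uncounted rows alongside the totals is a deliberate choice here: it makes the gap in the source data visible instead of silently undercounting.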

      So assuming you have technical requirements (i) - (v) met, you now have more problems:

      (a) Unless you know what you're looking for, all that data is useless - If you've ever needed an example of the phrase "the more you know, the less clear things become", then this situation fits perfectly. You need to bring the aforementioned SMEs together at this step to interpret the data being collected, and build a data model that can actually provide some level of value for the people who are footing the bill for the project. If they can't, then the project needs to be shelved and re-evaluated, because reaching this point without a clear idea of whether the data collection put in place will do what the big data system was brought in for is a huge red flag, and the objectives need to be re-evaluated.

      (b) Likewise, the culture around the system should be collecting ONLY what is useful, and discarding the rest - extraneous data that is collected only increases the chances of false positives and worsens the signal/noise ratio in general. Demographics is an example - you may care about whether a subset of the population has a particular affinity for dill vs. sweet pickles, but odds are you won't. Don't collect that if it's meaningless -now-, even if it *might* be useful later.

      (c) The people receiving the reports filled with these data points need to care - As was (once again) mentioned before, the people who are getting the output of these projects need to have some level of buy-in into the systems. There needs to be a clear picture drawn by the results of a big data project on how it will impact the organization, otherwise they'll just shrug their shoulders and all the work was for naught.

      At the end of the day, these systems need to be connected to actual delivered value (i.e. increase in the bottom-line) in order to be considered anywhere near a success (and thusly justifying the expense) where it counts. Getting all these ducks in a row in an ideal scenario is unlikely enough, once you roll the theory out to reality, it's easy to see why big data projects aren't really taking off or providing value the way the market has promised it would.

    2. allthecoolshortnamesweretaken

      Re: "abundant data by itself solves nothing."

      Isaac Asimov: The Machine That Won The War

  22. Anonymous Coward
    Anonymous Coward

    the main problem I see is that, just like a lack of cyber security professionals, there is a sea of tools and nobody knows how to use them proficiently. moreover, it's not just about using the tools, but about wrapping a compelling business use case and strategy around them. Most deep developers working on some type of Hadoop functions....yeah, not exactly going in front of a C Level board room. There is a serious disconnect in talent, vision, etc to put the right horsepower behind these tools.

    Let me guess - everyone in here bashing " big data"...are probably Unix admin? Oracle DBAs? Storage admins?

    The guy who will become a VP is the guy who improved a business strategy with big data...that guy right now is 1 out of a million. perhaps with more STEM education, and changes in career fields, the new age of IT admins will be primarily cyber security, big data, cloud architects, etc.

    The technology is there. the vision and execution are not. old dogs....new tricks....

  23. Roj Blake Silver badge

    Horses for Courses

    There are some people who are really good at data analysis because they've been doing it since before "big data" was a thing and because the data is relevant to the questions they want to ask. Banks and supermarkets for example.

    For most people however, they won't learn anything they didn't know already.

  24. Anonymous Coward
    Anonymous Coward

    Ransomware

    Some know how to make it profitable.

  25. Dinsdale247

    Theory, Meet Complexity

    As a recovering business analyst I would say this isn't a problem with big data, it's a problem with complexity. Complex solutions to implement, complex links to other systems, complex cleansing/processing systems, complex concepts required to know how to use the data.

    Anyone remember Enterprise Data Buses? That was a great idea until everyone realized how hard they were to implement and keep running.

  26. Hawk65

    People need to remember that you constantly have to get value from the data you are storing, as it costs you money to store. Storage is cheap, they say, but you're paying for it constantly. If you only get a nugget of information once in a while, then how long do you need to keep the data for? If you have data going back 20 years, is it skewing the results?

    My personal opinion is that we need to keep smaller subsets of data, a realistic purging point where data makes no sense to keep indefinitely. I think we also need to have some idea on what question we want to ask of our data to ensure the appropriate data is kept.

    Regards,

    Hawk65

    PS: Look at the organisations that tell you to keep all data, standing behind them are large storage or software organisations.

  27. Anonymous Coward
    Anonymous Coward

    You can't fight fashion

    "Virtually no one has been successful with their big data projects. They're spending lots of money but having little success".

    That reminds me of this well-known remark from a few years back:

    "The computer industry is the only industry that is more fashion-driven than women's fashion. Maybe I'm an idiot, but I have no idea what anyone is talking about. What is it? It's complete gibberish. It's insane. When is this idiocy going to stop?"

    - Larry Ellison at Oracle OpenWorld, September 2008.

    1. Anonymous Coward
      Anonymous Coward

      Cui bono?

      Although actually Larry Ellison and the crew at Oracle are among the only people to profit massively from this particular mindless fashion. Along with the vendors of server farms.

  28. Anonymous Coward
    Anonymous Coward

    it means asking the right questions that would have business value, implementing it in a logical and somewhat creative engineering manner and then understanding and interpreting the results.

    None of which any business manager is able to do.... they go in to it expecting that 'analytics' will provide them with 'magic' ideas and solutions they can pass off as their own.

    Sorry life isn't like that go back to highlighting spreadsheets then taking months of the year off for conferences and training....

  29. Jonathan 27

    I think it's just as likely that people will start getting angry at all the data capture and start lobbying to make "big data" illegal. Even most technical people I talk to don't realize the depth of the information that Google, Amazon, Microsoft and all the other big corporations (and a large number you've never heard of) store on you.

    You hear a big uproar about Windows "stealing your information", but average websites steal your information, including any website with ads of any kind. Android steals your information including tracking you everywhere you ever go (unless you're very careful about disabling it). iOS is nearly as bad.

    We'll see just how much the public can stand.

  30. Anonymous Coward
    Anonymous Coward

    Half the problem in my experience is capturing something worth analysing in the first place. Defect data is rather useless if one category field captures data on hand-wash bottles to fan control system failures!

    The second problem is the enormous legacy of non-digitised records. Drawings, microfilm, Mk. 1. Contents-of-Brainssss. Sort those elements out and big data can start to pay off.

    In the meantime though it's just a marketing gimmick to employ people while not solving the underlying problems...
