Databricks pushes machine learning on easy mode: Rock star data scientist, meet sweaty engineer

Ninety-nine per cent of companies are struggling to make a success of machine learning, according to execs at analytics biz Databricks. Firms have spent years amassing data in the hopes of doing something useful with it eventually. Meanwhile, the meteoric rise of Facebook and Google has helped make data analytics sexy, …

  1. Destroy All Monsters Silver badge

    Le sigh

    But the whole point of the work is to get the enterprise consistently and effectively deploying machine learning, so is the messaging working?

    Don't deploy a tool without knowing what it is good for.

    As in every endeavour of statistical processing:

    1) WHAT ARE THE QUESTIONS YOU ARE ASKING?

    2) HOW DO YOU ASCERTAIN THAT THE ANSWER IS CORRECT OR AT LEAST REASONABLE?

    3) DO YOU HAVE ENOUGH OF WHAT IT TAKES TO RUN EXPERIMENTS AND TEST HYPOTHESES?

  2. Anonymous Coward

    Marketing materials have echo

    This isn't the first time I've seen an analytics outfit completely echo the screams of businesses unable to deploy big data tools... KNIME comes to mind amongst a bunch of others. The sales hook is solid. But once again the real problems with organisations and mantronic processes are forgotten. Segregated databases designed in isolation, different naming schema, dissimilar but associated record keeping. "Fixing" those problems means wholesale system change on a large scale - not going to happen, sorry. Cost-benefit doesn't stack up. Come up with tools to let me plug those disparate worlds together and then analytics could come to life.

    In practice I've found the most powerful technique is dumping the contents of various enterprise DBs into SQLite, then gluing them together for hacking in Python and NumPy. Unless you can reduce the skills needed to do what I've just described, the mantronic solution is going to win every time.
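
A minimal sketch of the dump-and-glue approach described above, using only Python's standard `sqlite3` module. Everything in it is hypothetical and for illustration only: the `crm` and `billing` tables, their column names, the sample rows, and the `normalise_key` helper stand in for whatever two systems and naming schemes you actually have to reconcile.

```python
import sqlite3

# Hypothetical extracts from two systems that spell the same customer key differently.
crm_rows = [("C-001", "Acme Ltd"), ("C-002", "Globex")]
billing_rows = [("c001", 120.0), ("c001", 80.0), ("c002", 45.5)]


def normalise_key(raw):
    """Map both key styles ('C-001', 'c001') onto one canonical form ('C001')."""
    return raw.replace("-", "").upper()


def load_and_join(crm, billing):
    """Dump both extracts into one in-memory SQLite DB and total billing per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crm (cust_id TEXT, name TEXT)")
    conn.execute("CREATE TABLE billing (customer_ref TEXT, amount REAL)")
    # Normalise the keys on the way in so that a plain equality join works.
    conn.executemany(
        "INSERT INTO crm VALUES (?, ?)",
        [(normalise_key(key), name) for key, name in crm],
    )
    conn.executemany(
        "INSERT INTO billing VALUES (?, ?)",
        [(normalise_key(key), amount) for key, amount in billing],
    )
    rows = conn.execute(
        "SELECT crm.name, SUM(billing.amount) "
        "FROM crm JOIN billing ON crm.cust_id = billing.customer_ref "
        "GROUP BY crm.name ORDER BY crm.name"
    ).fetchall()
    conn.close()
    return rows


totals = load_and_join(crm_rows, billing_rows)
for name, total in totals:
    print(name, total)
```

The joined rows come back as ordinary Python tuples, so they can be handed to NumPy or any other tool from there. The key normalisation is the part that usually takes the effort in practice; the rule here is deliberately simplistic.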

    1. Korev Silver badge

      Re: Marketing materials have echo

      In practice I've found the most powerful technique is dumping the contents of various enterprise DBs into SQLite, then gluing them together for hacking in Python and NumPy

      This is what I tend to do too. Also, look into PyTables: it's excellent and very fast. Pandas can read and write the files too (although it sometimes creates sub-optimal files compared to native PyTables). I like creating well-indexed HDF5 files using PyTables and outperforming rather expensive databases :)

  3. SVV

    Don't employ a rock star data scientist

    They might choke to death on their own vomit.

    1. Old Coot

      Re: Don't employ a rock star data scientist

      That reminds me of a story that Richard Feynman tells (in one of his books) about working on the A-bomb project. He was 22, fresh out of physics grad school, working with a lot of famous older scientists, among them Niels Bohr. Bohr didn't like Feynman, but wanted Feynman to be present at all his meetings, simply because Feynman was such a smart-alec that he would say so when he thought something was a bad idea, or wouldn't work. (The other scientists were too intimidated by Bohr's reputation, and afraid of looking stupid.)

      In other words, Bohr was the 'rock star' physicist on the program, and most people believed what he said because of his formidable reputation and achievements. But Bohr knew that he could still be wrong, or even overlook something.

      So it is for the rock-star anything, even a legitimately gifted one. (Go back and listen to all the Beatles' albums; even they wrote some forgettable songs.)
