Or did they just include R because they're Pirates?
Dirty data, flogged cores: YES, Microsoft SQL Server R Services has its positives
The R language has enjoyed a great reputation in statistical computing and graphics for decades. However, it is also known as something for statisticians. Born around the time of Java, PHP and Python, R lags behind all three by a long chalk on the TIOBE rankings. Yet Microsoft spotted an opportunity in this era of analytics …
COMMENTS
-
-
Thursday 16th February 2017 19:21 GMT Adam 52
Re: Dirty data
You've not looked at many real-world transactional databases, have you?
I'll give you some examples:
1. Has your business remained unchanged for decades? You didn't start asking for email address at some point around the late 90s? Or maybe storing full years around 1999?
2. How many online orders do you have from Little Bobby Tables?
3. Have you ever sold a product at the wrong price and had to do a refund?
4. You remember when Yugoslavia split? How has that affected your regional roll-ups?
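To make the four examples above concrete, here is a minimal sketch of the kind of checks an analyst might run before trusting such rows. Everything in it is hypothetical: the `orders` rows, the field names, the `DEFUNCT_COUNTRIES` set and the `dirty_flags` helper are invented for illustration, and the name check in particular is deliberately crude.

```python
# Hypothetical order rows; field names and values are made up for illustration.
orders = [
    {"id": 1, "customer": "Alice", "email": None,
     "order_date": "97-03-14", "amount": 19.99, "country": "Yugoslavia"},
    {"id": 2, "customer": "Robert'); DROP TABLE Students;--",
     "email": "bobby@example.com",
     "order_date": "2004-06-01", "amount": 5.00, "country": "UK"},
    {"id": 3, "customer": "Carol", "email": "carol@example.com",
     "order_date": "2016-11-30", "amount": -42.50, "country": "Serbia"},
]

# Assumed lookup of countries that no longer exist under that name.
DEFUNCT_COUNTRIES = {"Yugoslavia"}


def dirty_flags(row):
    """Return a list of reasons this row needs attention before analysis."""
    flags = []
    if not row["email"]:
        # Column was added after the business started collecting data.
        flags.append("missing_email")
    if len(row["order_date"].split("-")[0]) == 2:
        # Year stored with two digits (pre-2000 habit).
        flags.append("two_digit_year")
    if ";" in row["customer"] or "--" in row["customer"]:
        # Crude check for SQL fragments in a name field.
        flags.append("suspicious_name")
    if row["amount"] < 0:
        # Negative amounts are refunds or corrections, not sales.
        flags.append("refund")
    if row["country"] in DEFUNCT_COUNTRIES:
        # Regional roll-ups need a mapping to successor states.
        flags.append("defunct_country")
    return flags


report = {row["id"]: dirty_flags(row) for row in orders}
for order_id, flags in report.items():
    print(order_id, flags)
```

None of these flags means the row is wrong as a transaction; they only mark rows that would skew an analysis if taken at face value.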
-
-
Thursday 16th February 2017 12:26 GMT Doctor Syntax
"Here's the real rub: running R Services in SQL Server 2016 is running analysis on your transactional databases. That's your live database, your R code is running inside your production database, eating the CPU cycles and disk access, slowing down your expensive SQL server."
"You could use a second server to run R, but then you've got the potential network bottleneck of moving the data back and forth between the machines."
"That's only one of the problems. The data inside a transactional database is not designed for analysis; it's likely to be dirty, inconsistent and full of errors."
Let me second Joe's comment about dirty data.
Apart from that, you can always restore your transactional DB backup to your analytical server. That way you get real data and test the restore procedures at the same time. There may, of course, be other issues with this - such as data protection - but the objection as quoted really doesn't stand up.
-
-
Thursday 16th February 2017 18:16 GMT W. Anderson
Not a new R language/Relational Database capability
Scientists, economists and others have reported using the R programming language in conjunction with the PostgreSQL object-relational database as well, so I do not see this opportunity as specific or original to Microsoft SQL Server, other than the company promoting this newfound functionality as its own.
-
Thursday 16th February 2017 21:48 GMT FozzyBear
Anyone
That attaches R or other analytical services directly to their production databases should be shot, drawn and quartered, stabbed, poisoned, drowned, impaled, tortured and then finally have their tattered remains hung above the entry to the IT department as a warning to others.
And I hope everyone recognises the restraint I have shown in the punishments that should be inflicted on the individual.
-
Friday 17th February 2017 01:13 GMT Anonymous Coward
Re: Anyone
I never had any problems using PL/R in PostgreSQL, even in production.
The whole argument in this article about using Python instead isn't even a real comparison. Server-side R and client-side Python are two completely different things. If the analysis involves retrieving huge amounts of data, then even if R is slower, it's going to be faster on the server side, as it's not going to involve transferring huge amounts of data.
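A rough sketch of the data-transfer point, using Python's built-in `sqlite3` as a stand-in for a real database server. This is not PL/R or SQL Server R Services, and since SQLite runs in-process nothing actually crosses a network here; the number of rows handed back to the client is only a proxy for what would travel over the wire. The `sales` table and its contents are invented for the example.

```python
import sqlite3
import statistics

# In-memory database standing in for a remote server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 2 else "south", float(i)) for i in range(1, 10001)],
)

# Client-side analysis: every row is shipped to the client first.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_rows_moved = len(rows)
client_means = {
    region: statistics.fmean([a for r, a in rows if r == region])
    for region in {r for r, _ in rows}
}

# Server-side analysis: the database aggregates, only the summary comes back.
summary = conn.execute(
    "SELECT region, AVG(amount) FROM sales GROUP BY region"
).fetchall()
server_rows_moved = len(summary)
server_means = dict(summary)

print("rows moved, client-side:", client_rows_moved)
print("rows moved, server-side:", server_rows_moved)
conn.close()
```

Both routes give the same per-region means; the difference is how much data has to leave the database to get them.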
-
-
Friday 17th February 2017 02:45 GMT Mark 65
"Another issue for newbies will be the need to get your head around the practicalities of R being a domain-specific language. You will almost certainly need an understanding of statistics to get meaningful answers out of your code."
I'm curious as to what analysis someone would be doing of the data without an understanding of statistics - mean, max, min?
-
Friday 17th February 2017 13:30 GMT P.B. Lecavalier
Hard is a Good Thing
"For all the criticism of the R language – it's hard to understand, slow and a memory hog"
If you find R hard, then you need to spend some time with SAS, which consists of 4 or 5 languages patched together because individually, each of them is utterly inadequate to get you anywhere. Whatever you do in R with 1 or 2 lines of code will take at least five times that in SAS, and forget about the concept of a "package". R is made for smart people by smart people, and it had better stay that way to keep the Excel and VBA kids at bay.
The article mentions Python as a great language (I agree, it is), but R's internal documentation is so much better than Python's. Why? Because it always features examples for simple things! Whenever I look through Python's standard library reference for a module that could be of some use, I have to look somewhere else to figure out how to do anything with it (i.e., OK, so this is an instance of some class, then what do we do with that?). Without Google, most people using Python would have a really hard time (myself included). Without Google, R would be an inconvenience, but I could still find my way around.
That it might be a "memory hog" on a server (I never had any issue with it, but I have never run it on a production server either) is not surprising, because its development did not focus on running it as a daemon.