WARNING: Since I was the reviewer…this may be construed as a biased review of this book.
As a “data science consultant”, I have a lot of engagements that start out with “help us scale our data scientists”.
A lot of engagements.
Within the first week of the engagement I invariably see the same anti-patterns repeated:
- A data scientist reads a large data set from the database and writes it to a CSV on their laptop
- The CSV is loaded into an R or pandas data frame
- Data wrangling (née ETL) is performed
- Some actual ML algos are applied
- The data is written back out to a CSV
- The CSV is loaded back into the database
There are so many “opportunities” in the above workflow.
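To make the round trip concrete, here is a minimal sketch of the workflow above next to an in-database alternative. It is not taken from the book. Python's built-in `sqlite3` stands in for SQL Server purely so the example runs anywhere, a simple sum per region stands in for the wrangling and ML steps, and the `sales` and `region_totals` tables and all function names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def make_demo_db():
    """Build an in-memory database with a small hypothetical `sales` table."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real SQL Server instance
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 20.0), ("west", 5.0), ("west", 15.0)],
    )
    conn.execute("CREATE TABLE region_totals (region TEXT, total REAL)")
    return conn


def read_totals(conn):
    """Return the contents of the results table in a stable order."""
    query = "SELECT region, total FROM region_totals ORDER BY region"
    return conn.execute(query).fetchall()


def totals_via_csv(conn, workdir):
    """The anti-pattern: export to CSV, process locally, write CSV, reload."""
    raw_path = os.path.join(workdir, "sales.csv")
    out_path = os.path.join(workdir, "totals.csv")

    # Dump the whole table to a CSV on the local disk.
    with open(raw_path, "w", newline="") as f:
        csv.writer(f).writerows(conn.execute("SELECT region, amount FROM sales"))

    # Load the CSV and do the "analysis" in local memory.
    totals = {}
    with open(raw_path, newline="") as f:
        for region, amount in csv.reader(f):
            totals[region] = totals.get(region, 0.0) + float(amount)

    # Write the results out to another CSV.
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(sorted(totals.items()))

    # Load that CSV back into the database.
    with open(out_path, newline="") as f:
        rows = [(region, float(total)) for region, total in csv.reader(f)]
    conn.execute("DELETE FROM region_totals")
    conn.executemany("INSERT INTO region_totals VALUES (?, ?)", rows)
    return read_totals(conn)


def totals_in_database(conn):
    """The alternative: one statement, and no data leaves the database."""
    conn.execute("DELETE FROM region_totals")
    conn.execute(
        "INSERT INTO region_totals "
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )
    return read_totals(conn)


if __name__ == "__main__":
    conn = make_demo_db()
    with tempfile.TemporaryDirectory() as workdir:
        print(totals_via_csv(conn, workdir))
    print(totals_in_database(conn))
```

Both functions leave the same rows in `region_totals`. The first moves every row across the wire twice and through two files on a laptop, while the second keeps the data where it already lives.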
To date, every data science, R, and Python book focuses on efficiencies in a single step of the above process. But no book looks at the workflow holistically and tries to solve the “how do I scale my data scientist” problem.
This book makes a great attempt at doing just that.
Most books with “machine learning” in the title spend far too much time on algorithms and the mechanics of the data science process. And that’s great if you are a budding data scientist. But not everyone is. Some folks are tasked with “scaling their data science team”.
This book is geared to that use case.
There is brief introductory prose where needed to help a DBA or data professional understand the underlying “data science problem”. But primarily this book is geared to helping DBAs understand how to help a data scientist scale. Some topics to illustrate my point:
- how to connect to SQL Server directly from RStudio and RTVS (R Tools for Visual Studio).
- “operationalizing R code”: i.e., as a DBA, how do I take my data scientist’s code and make it run natively in the SQL Server Machine Learning Services (MLS) engine?
- a later chapter on how to use R as a DBA to make predictions on SQL Server performance. That’s cool.
As an industry, we need more DBAs and data professionals (SQL Guys, ETL Guys, etc.) who understand “data science”, R, and Python, and more data scientists with a better understanding of databases. This book is a great first step. Well worth the money.
Thanks for reading. If you found this interesting please subscribe to my blog.