An effective solution to support enterprise analytics must do many things well. Here’s what I look for when evaluating platforms.
Think ahead to deployment
Although we practice data science, the last thing we want is a science experiment. We want deployed analytics that deliver business results. So I start my assessment of analytics platforms with deployment options. What are the varieties of analytics deployment that can integrate with my existing systems and business processes? Are the analytics feeding product recommendations or flagging product defects or alerting on fraudulent transactions or accelerating the processing of documents? A solid solution gives me deployment choices while maintaining unity and integrity at the core.
Be voracious: Find and join all relevant data
Regardless of the use case, business value driven by analytic insight depends on finding and connecting the data that relate to the outcomes we care about. Does the system give my team good access to every variety of data, to diverse data types from different parts of my organization, including massive unstructured data such as images sitting in object stores, ready to yield insight through analytic transformations?
Collect prisms: Assess, clean, and prepare potential predictors
Getting the raw data together is a crucial step, but only a first step. So much value comes from cleaning data and pre-processing it. Missing data needs to be found and treated. Skewed data needs to be normalized. Text can be cast into topics. Images can be converted into categories. With the right platform, each data field can be sent through a variety of appropriate transformation functions to produce an array of candidate predictors, some subset of which will prove predictively useful.
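To make the idea concrete, here is a minimal sketch of sending one raw field through several transformations to produce candidate predictors. It uses only the Python standard library; the field name `order_amount`, the helper names, and the choice of median imputation and a log transform are my own illustrative assumptions, not features of any particular platform.

```python
import math
from statistics import median


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def log_transform(values):
    """Compress a right-skewed, non-negative field with log(1 + x)."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # guard against a constant field
    return [(v - mean) / std for v in values]


def candidate_predictors(name, raw):
    """Send one raw field through several transformations."""
    filled = impute_missing(raw)
    return {
        f"{name}_was_missing": [1 if v is None else 0 for v in raw],
        f"{name}_imputed": filled,
        f"{name}_log": log_transform(filled),
        f"{name}_zscore": standardize(filled),
    }


# Hypothetical field: order amounts with one gap and a long right tail.
order_amount = [12.0, 15.0, None, 14.0, 950.0]
predictors = candidate_predictors("order_amount", order_amount)
```

Each key in `predictors` is one candidate. Which of them proves useful is a question for model training, which is why it pays to generate several cheaply and let the data decide.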
Go big and go fast: Scale algorithm training and deployment
Modern algorithms are hungry for more data. Make sure your solution can handle operations on lots of data quickly, both in transformation and predictive functions, both in training and deployment phases. The first go at a problem is rarely the right one: Enterprise analytics depends on being able to iterate quickly on massive data, over and over again, and your analytics solution needs to support the swift, scalable search for value.
From many, one: Harness diverse perspectives for more effective analytic innovation
A single analytic innovator takes multiple turns at the same problem, but more business value comes from harnessing creativity across a diverse team, all contributors bringing distinct perspectives and models to a shared task. I want a platform that gives the broad community of analytics professionals in my organization ways to participate in my analytics projects. We need support for diverse user types: business analysts, data scientists, developers, line-of-business managers; support for all the core analytics languages: SQL, Python, R; support for diverse front-end tools as well as deployment frameworks.
Does using the solution today make using it again tomorrow even better?
With the right platform, all the outputs of our analytics are reusable assets; we are always looking for assets we can leverage, not one-and-done answers. Every model should be available as a starting point for other analysts and data scientists. The data processing that we do to get ready for the modeling should be deployable independently of the model, so that those pre-processed predictors are available to other models. Sometimes an analytics pipeline is itself a transformation function: a deep learning model turns pictures into labels. The precursor, intermediate, and final results of a model all can be useful as ingredients in other analytics pipelines, more potential fuel for a better model next time. Make sure that the findings that emerge from the analytics, about the data and about the relationships between the data and the outcome, are captured as metadata so that future analysts aren't starting from nothing, but instead have analytics-based knowledge of what data, under what transformations, matters to which outcomes.
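As a rough illustration of reuse, the sketch below keeps a pre-processing step in a shared catalog, lets two separate models consume its output, and records a finding as metadata for the next analyst. Everything here is hypothetical: `AssetRegistry`, `clip_negative`, and the two toy scoring functions are placeholders I made up for the example, and a real platform would persist these assets rather than hold them in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Transform = Callable[[List[float]], List[float]]


@dataclass
class AssetRegistry:
    """Hypothetical in-memory catalog of reusable transformations and findings."""
    transforms: Dict[str, Transform] = field(default_factory=dict)
    findings: List[dict] = field(default_factory=list)

    def register(self, name: str, fn: Transform) -> None:
        self.transforms[name] = fn

    def apply(self, name: str, values: List[float]) -> List[float]:
        return self.transforms[name](values)

    def record_finding(self, source_field: str, transform: str,
                       outcome: str, note: str) -> None:
        self.findings.append({
            "field": source_field,
            "transform": transform,
            "outcome": outcome,
            "note": note,
        })

    def findings_for(self, outcome: str) -> List[dict]:
        return [f for f in self.findings if f["outcome"] == outcome]


def clip_negative(values: List[float]) -> List[float]:
    """Pre-processing step deployed independently of any one model."""
    return [max(v, 0.0) for v in values]


# Toy stand-ins for two different models that share one prepared predictor.
def churn_score(features: List[float]) -> float:
    return sum(features) / len(features)


def upsell_score(features: List[float]) -> float:
    return max(features)


registry = AssetRegistry()
registry.register("clip_negative", clip_negative)

balance = [-5.0, 10.0, 20.0]
prepared = registry.apply("clip_negative", balance)

churn = churn_score(prepared)
upsell = upsell_score(prepared)

# Capture what was learned so the next analyst does not start from nothing.
registry.record_finding("balance", "clip_negative", "churn",
                        "clipped balance was used in the first churn model")
```

The design point is the separation: the transformation lives in the catalog, not inside either model, so a third model can call `registry.apply("clip_negative", ...)` and consult `registry.findings_for("churn")` before building anything new.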
These are the questions I need answers for when I assess an analytics platform. The stronger the answers, the stronger the platform, and the faster my analytics team can get to business results.
Nick has worked for Teradata as a field data scientist and in product marketing for advanced analytics. Nick earned his PhD in business and his MS in statistics and machine learning from Stanford University. Prior to joining Teradata, he was a professor at the Kellogg School of Management at Northwestern University, and a data scientist at McKinsey & Co.