
Is failure good for your data scientists?

Chris Hillman

24 September 2017, 3 min read

If you’ve heard of data science (if you haven’t, where have you been and how did you find this blog?), you’ve probably heard of “fail fast”. The fail fast mentality is based on the notion that if an activity isn’t going to work, you should find out as quickly as possible, and stop doing it.

As the size, complexity and number of new data sources continue to increase, there is a corresponding increase in the value of discovery analytics. Discovery analytics is the method by which we uncover patterns in data and develop new use cases that lead to business value.

It is easy to see how discovery activities lead to a fail fast method. However, how can we learn from these failures, and how can we proceed without experiencing the same failures time and again?

Good failure, bad failure

There are two different types of failure possible in a data science project: good failures and bad failures. Good failures are a necessary part of the discovery process, and an important step in finding value in data. Bad failures, on the other hand, are those that could have been avoided, and are basically a waste of everybody’s time. Common causes of bad failures include:

Poor specification – this is not specific to data science and applies to any project that isn’t specified properly in terms of expected results and appropriate timelines.
Inappropriate projects for a data science methodology – it has become increasingly common to call all analytics data science. If a project can be solved using a standard data warehouse and business intelligence method, then you should probably just do that.
Poor expectation management – many data science projects suffer from this. It is important to ensure stakeholders are aware of what can and cannot be expected from the results.
Data supply – a vital first step in any analytics project is to ensure that the necessary data feeds are available and accessible.

Let’s talk about publication bias. This phenomenon occurs in the publication of scientific papers, where it is usual to only publish studies that produce positive results. What is far less common is to publish a paper that highlights the amount of work you did in order to fail to produce anything of any worth! The problem is that this leads to teams making the same mistakes, or proceeding down the same creative cul-de-sacs as so many before them. Because of publication bias, we do not learn from each other’s mistakes.

Exactly that situation can occur in a data science team. Unless a true collaborative environment exists for discovery and predictive model development, the same failures will be made over and over again by different members of the team.

Move out of the cul-de-sac

In order to benefit from the fail fast approach, data science teams need to adopt a best practice method of sharing results, methodologies and discovery work – especially when their work is considered a failure. This can be done in many ways, but some of the more effective include regular discussion – similar to agile methodology’s stand-up meetings – and using appropriate software to aid the process.

Software tools exist to facilitate collaboration, issue tracking, continuous documentation, source control and versioning of programme code, as well as task tracking. These tools create a lineage of activities that is permanent and searchable.
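To make the idea of a permanent, searchable lineage concrete, here is a minimal sketch in Python. It is not a description of any particular product: the `ExperimentRecord` fields, the function names and the JSON Lines file are all assumptions chosen for illustration. The point is simply that failed attempts get written down in the same place as successful ones, and that a colleague can search them before starting similar work.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class ExperimentRecord:
    """One discovery attempt, recorded whether or not it worked (hypothetical schema)."""
    author: str
    hypothesis: str
    method: str
    outcome: str  # free text, e.g. "failed", "succeeded", "inconclusive"
    notes: str = ""
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_experiment(log_path: Path, record: ExperimentRecord) -> None:
    """Append the record as one JSON line; earlier entries are never rewritten."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


def search_log(log_path: Path, keyword: str) -> list[dict]:
    """Return every logged record whose text fields mention the keyword."""
    if not log_path.exists():
        return []
    keyword = keyword.lower()
    matches = []
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines in the log file
            entry = json.loads(line)
            searchable = " ".join(
                entry[k] for k in ("hypothesis", "method", "outcome", "notes")
            ).lower()
            if keyword in searchable:
                matches.append(entry)
    return matches
```

Before starting a new piece of discovery work, a team member could call `search_log(Path("experiments.jsonl"), "churn")` to see whether anyone has already been down that road, and what they found. In practice a team would more likely rely on its existing issue tracker or source control history, but the principle is the same.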

If you want to hear more on this subject, why not come to see my presentation ‘My data scientists are failures’ at the Teradata PARTNERS conference in Anaheim this October?


Chris Hillman is the Senior Director, AI/ML in the International region and has been responsible for developing and articulating the Teradata Analytics 1-2-3 strategy and supporting the direction and development of ClearScape Analytics. Prior to this current role, Chris led the International Data Science Practice and has worked on a large number of AI projects in the International Region focusing on the generation of measurable ROI from Analytics in production at scale using Teradata, open source and other vendor technologies. Chris has spoken regularly at leading conferences including Strata, Gartner Analytics, O’Reilly AI and Hadoop World. Chris also worked to establish the Art of Analytics practice, promoting the value of producing striking visualisations that draw people into Data Science projects, while retaining a solid business-outcome foundation.
