TechBytes - Vantage Advanced Analytic Function: nPath
Hello and welcome to another TechBytes session. My name is Sri Raghavan, and today we’re going to be talking to you about our pattern matching function, which is nPath, which is one of the core and very first advanced analytic functions in Teradata Vantage. There are a couple of things we’re going to do today. First is, we’re going to define what nPath is. We’re going to cover a few use cases to which nPath can be applied. We’re going to show you a little bit of code and we’re also going to be showing you some visualizations associated with nPath. What is nPath? nPath is essentially a function which tells you, before a specific focused activity is arrived at, what are the different things that preceded it? What are the antecedent conditions? Let me give you an example. Let’s say you are a banker and you have tons of customers, but you’re interested in finding out why certain customers leave your bank. Now, they may be leaving for reasons that are good for them, and reasons that are bad for them. Now your job as a banker is to figure out how to determine churn, how to predict churn, meaning how do people leave? And one part of that is to find out if there are certain activities that you can see as a pattern that churners typically follow before they leave. That is what nPath does. It’s basically a behavioral analytic technique that looks at time series data, and creates sequences of activities before certain events that you’re focused on happen. What are the sample use cases to which nPath can be applied? These are things like customer churn. These are things like fraud activities. These are things like network outages. These are not just applied to people and their behaviors, but these are also applied to entities that are inanimate, like sensors for instance, or like network activities. All of these things are candidates to which nPath can be applied. 
Typically, nPath, just to be very clear, is not the only advanced analytic function that you will be using to be able to do analytics. Now nPath in and of itself is a very, very powerful solution, and it is something which can be shown in the form of Sankey diagrams, which you see here as a visual output, but also in terms of tree diagrams and so on. But the point is, nPath is the tip of the spear. You first do nPath and then downstream, you do other analytic functions like, let’s say a regression analysis, into which nPath output can be used as an input. So the point that I want you to take out of this is that nPath is a very powerful analytic function, but it’s used in conjunction with other analytic functions for you to address business outcomes, and it is part of what we call multi-genre advanced analytics. So let’s take a look at what nPath looks like in the code. So let’s go to Teradata Studio. I am showing you nPath code. Now, let me explain a few things that are happening here in the code before I go into the results in detail. I have this retail dataset that I’m using, and for those of you who have seen the Sessionization TechBytes, which is a precursor to watching nPath, you would have seen that I divide my retail dataset into two parts. Customers who’ve churned, and customers who’ve not churned. And this retail dataset consists of all customer activities that have been undertaken, and I define anybody who is a churner as somebody who’s returned a product, and I’m calling nPath on those customers. Now, I have this dataset called nPath_churn, where I’ve already segmented out my retail dataset with the churn flag equals Y, meaning that you have returned a product. Okay, now here’s how cool this is. nPath is a fairly complex function where it takes all those activities and it parses a sequence of those activities and it comes up with a nice little pattern. You don’t have to worry about the logic that underlies nPath. 
That is something which is transparent to you. All you have to say is select star from nPath. What Teradata Vantage does for you is call up the underlying function and all the logic of the underlying function, and all that it’s expecting you to do is to fill in the parameters. What are those parameters? Well, okay, remember, I said to do nPath, you need to know what it is you are anchoring your activity to, meaning that you are interested in all those sequences of activities that precede a particular activity, meaning product return in this case. If you’re a financial company, you’re interested, let’s say, in people who commit fraud. And if there’s a fraud indicator, that would be your focus activity. If you are, let’s say, a telecommunications company, you’re probably interested in subscription activity. You want to find out, what are the things that happened before that focus event. So my focus event here is product return, so that’s why I say event. Anything which is not a product return will be given the symbol E. Okay, that is, all those activities that are not a return will simply be called an E, and all those that are a product return will be called a C. Every customer in this dataset has only one C activity, the final activity, which is product return. They have a bunch of activities that precede product return, as you can tell, which is E. So what I’m looking for is a pattern, E*.C. What I’m trying to tell you is that in E*, the * indicates zero or more occurrences, so any number of activities, but they have to precede product return, meaning that the pattern is a specific indicator of what it is you’re looking for. I am basically saying, give me E*.C, which is, I’m saying, give me all those activities, but the first set of activities has to be anything which is not a product return. That’s why I call it an E. I’m basically saying it has to end with a C, which is a product return. 
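As an aside for readers following along without Vantage: the symbol mapping and the E*.C pattern described above can be imitated in a few lines of plain Python. This is only a sketch of the idea, not the nPath function itself. The event label `"product return"`, the helper names `to_symbols` and `first_match`, and the use of a regular expression in place of nPath's own matcher are all assumptions made for illustration.

```python
import re

# Assumed label for the focus event; the real dataset's labels may differ.
FOCUS_EVENT = "product return"

def to_symbols(events):
    """Map each event to 'C' if it is the focus event, otherwise 'E'."""
    return "".join("C" if e == FOCUS_EVENT else "E" for e in events)

def first_match(events):
    """Return the first run of events that fits E*.C, or None.

    The nPath pattern E*.C (zero or more E, then one C) is imitated
    here with the regular expression E*C over the symbol string.
    """
    match = re.search(r"E*C", to_symbols(events))
    if match is None:
        return None
    return events[match.start():match.end()]

if __name__ == "__main__":
    path = ["browsing", "return policy enquiry", "web chat", "product return"]
    print(to_symbols(path))
    print(first_match(path))
```

A customer whose only recorded event is the product return still matches, because the star allows zero E events. A customer with no product return yields `None`.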
I call up the pattern that I want, and I specify those events and I say, give me the very first instance of the customer ID, the timestamp of those activities, and give me the count of each pattern in my entire dataset. That’s what I have here. So I run nPath on that. For customer ID 64115, I have the customer ID itself (which is why I’m taking only the first instance of customer ID), the first instance of when the pattern started, the last instance of when the pattern ended, the number of events in the pattern (that’s why the count), and then I accumulate. Accumulate essentially puts together all the different activities that preceded product return. As you can see, this is a path. They start off browsing, then they go to return policy enquiry, and then they go to web chat and then product return. Here is my path for this customer. So here are all the customers for whom I have all the paths. As you can see, some customers just return a product straight away, which is this particular event, product return. They basically did not have any activities preceding it. They just simply returned the product. Now, this needs a little bit more explanation as to what I’m actually talking about. Now remember, I have a dataset, but I cut off the dataset at a certain time. I did not want to have the entire sequence of activities for all customers from the beginning of time. Because I wanted to run this very quickly, I cut off the dataset at a certain time, which means for this particular customer, 96046, I do not have any of the activities that preceded that time. That’s why you are, for this customer, seeing that she or he simply returned a product. Now you might ask me, “Hey, but you showed me a pattern which says show me anything which starts off as a non-product return, that is E,” right? And ends in product return. Well, this is where the star comes in handy. 
Star means any non-product return activity that happens, or doesn’t happen, which means this activity has to happen zero or more than zero times, one, two, three, four, and so on and so forth. So which means if I don’t have anything that is not a product return, that’s fine. It means, just show what it is I’ve done. There are some customers who have not ended up in product return, which means you won’t see them here, because they’re still customers, right? So I hope that made some sense. Now, you can change this a little bit and you can say, give me at least one instance of non-product return to a maximum of four, meaning that before I return my product, I need you to have done at least one, to a maximum of four activities that are not product return related, meaning that it has to be a complaint call, a product browsing, a return policy enquiry and so on and so forth. So that’s why these options here in the pattern really make a difference. It’s so flexible and so powerful that you can change a few things. The idea here is that, look, you’re looking for a pattern and you’re looking for an event that hews to that pattern. In this case, it’s product return, and you want to show all the paths. Okay, so far, so good as far as code is concerned. Let’s show the visualization as to what that looks like. So here’s a Sankey diagram, which we’ve created on Teradata Vantage’s App Center. I’ll have a separate Tech Byte on what App Center is, but really quickly, App Center is an app building framework for us to showcase visualizations. It does a lot more than visualizations. You can actually show operationalization of your analytics using App Center. You can actually use containers in which you can have things like Jupyter notebooks to be able to execute Python code. All of those things are advanced analytics capabilities, which we will go into in other Tech Bytes. 
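Returning briefly to the pattern options discussed before the App Center overview: the result columns (first customer ID, first and last timestamp, count, accumulate) and the "at least one, at most four" variant can be imitated in plain Python as a rough sketch. This is not the Vantage implementation. The function name `npath_like`, the row layout `(customer_id, timestamp, event)`, the timestamps, and the regular expressions standing in for nPath patterns are all assumptions chosen for illustration.

```python
import re

# Assumed label for the focus event.
FOCUS_EVENT = "product return"

def npath_like(rows, pattern=r"E*C"):
    """Imitate nPath-style output for one customer's rows.

    rows: list of (customer_id, timestamp, event) tuples in any order.
    pattern: a regular expression over the symbols E and C, standing in
             for an nPath pattern (for example E*C or E{1,4}C).
    """
    rows = sorted(rows, key=lambda r: r[1])  # order events by timestamp
    symbols = "".join("C" if r[2] == FOCUS_EVENT else "E" for r in rows)
    match = re.search(pattern, symbols)
    if match is None:
        return None
    matched = rows[match.start():match.end()]
    return {
        "customer_id": matched[0][0],        # first customer ID
        "start_ts": matched[0][1],           # first timestamp in the match
        "end_ts": matched[-1][1],            # last timestamp in the match
        "event_count": len(matched),         # count of matched events
        "path": [r[2] for r in matched],     # accumulate: the ordered events
    }

if __name__ == "__main__":
    # Illustrative rows only; the timestamps are made up.
    rows = [
        (64115, "2020-01-03T09:00", "web chat"),
        (64115, "2020-01-01T09:00", "browsing"),
        (64115, "2020-01-02T09:00", "return policy enquiry"),
        (64115, "2020-01-04T09:00", "product return"),
    ]
    print(npath_like(rows))
    print(npath_like(rows, r"E{1,4}C"))
```

With the bounded pattern, a customer whose only recorded event is the product return no longer appears in the output, because at least one preceding E is now required.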
But here quickly, what I want to show you here is the fact that I ran this nPath code that I just showed you and I rendered this visualization, and it’s a pretty interactive visualization. Here, as you can see, is product return. Now what it does is that it shows you all these antecedent conditions that happened prior to product return, but it’s very interactive, right? So sometimes you might want to say, hey, what is all this huge thing in the middle where people are returning products? Well, you want to see, it seems like, based on this, I can tell that there are so many people who’ve actually done complaint calls prior to that particular product return. Well, then you want to see, okay, where are all these complaint calls coming from? It seems like in this case, they’re coming from online feedback, product browsing, return policy enquiry and store visit, which you see on the left. Now, why is this visualization important to you? Well, it’s important to you because it tells you in a graphical form, not in a tabular form, what the distribution is, prior to these activities. Now if you are a retailer, as you are in this case, you will say, “Look, there seems to be a lot of complaint calls coming. Let’s go look at the nature of those complaints,” meaning let’s go to the call center, take the transcripts of all those calls, and start doing some sentiment and text analytics on that. Let’s extract some sentiments out of that. So see what I meant before, when I told you nPath is one of the analytic functions? It’s the tip of the spear. With nPath, now suppose you ask other questions. Okay, I see this path and some events seem to be more important because of the thickness of those nodes. In this case, complaint call seems to be very big. Okay, what are they complaining about? Are they complaining about the fact that I don’t have enough products on my shelf? Are they complaining about bad service at the store? 
Are they complaining about the fact that the store is 100 miles away from their residence, and therefore it’s very hard for them to actually go back and keep getting new products, and therefore they have to return them and go somewhere else which is a little bit closer to them? Or is it because the quality of the products is so bad that they really have to return them? These are all the things that you start asking when you do nPath, and when you look at the visualization. So that’s what nPath does. I’m taking way too much time, but I thank you very much for your patience. Thanks for listening so far.
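As a footnote to the Sankey discussion above: the thickness of each band in such a diagram comes down to counting how often one event is immediately followed by another across all accumulated paths. The sketch below shows one simple way that counting could be done in Python. It is an assumption about the general technique, not a description of how App Center builds its chart, and the function name `sankey_links` and the sample paths are made up for illustration.

```python
from collections import Counter

def sankey_links(paths):
    """Count (source, target) transitions across a list of event paths.

    Each path is an ordered list of event names, such as the accumulated
    path per customer. The counts can serve as link weights in a Sankey
    diagram.
    """
    links = Counter()
    for path in paths:
        # Pair each event with the one that immediately follows it.
        for source, target in zip(path, path[1:]):
            links[(source, target)] += 1
    return links

if __name__ == "__main__":
    # Illustrative paths only, not taken from the real dataset.
    paths = [
        ["browsing", "complaint call", "product return"],
        ["store visit", "complaint call", "product return"],
        ["product return"],
    ]
    for (source, target), weight in sankey_links(paths).most_common():
        print(f"{source} -> {target}: {weight}")
```

A path with a single event contributes no links, which is consistent with customers who returned a product with no recorded prior activity.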