Issue dated - 12th April 2004


Using Decision Trees to predict customer behaviour

In the latest instalment of his series on Customer Relationship Management, Khalid Sheikh explains how Decision Trees can be used in CRM to predict customer behaviour, a vital need in many businesses

A Decision Tree is a predictive model that makes predictions through a classification process. The model is drawn as an upside-down Tree, with the root at the top (or on the left-hand side) and the leaves at the bottom (or on the right-hand side).

Decision Trees represent rules. By following the Tree, you can decipher the rules and understand why a record is classified in a certain way. These rules can then be used to retrieve records falling into a certain category, and the known behaviour of the category is the predicted behaviour of the entity represented by the record.

In CRM, Decision Trees can be used to classify existing customer records into customer segments that behave in a particular manner. The process starts with data about customers whose behaviour is already known: for example, customers who responded to a promotional campaign and those who did not, or customers who have churned (left the service for a competitor) and those who have not. The Decision Tree developed from this data gives us the splitting attributes and criteria that divide customers into the two categories. Once the rules that determine the class to which each customer belongs are known, they can be used to classify existing customers and predict their future behaviour. For example, a customer whose record shows attributes similar to those of customers who have churned in the recent past is more likely to churn, and that is exactly the prediction marketers need in order to plan activities that pre-empt the churn.
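As a minimal sketch of how such a rule might be put to work, the Python snippet below stores a few customer records in an in-memory SQLite table and retrieves those that match a churn rule written as an SQL WHERE clause. The rule itself, the table name `customers`, and the columns `complaints` and `months_left` are hypothetical, invented for illustration; they are not taken from any real Decision Tree.

```python
import sqlite3

# Hypothetical churn rule, as it might be read off one Decision Tree path:
# "If the customer has made 3 or more complaints and fewer than 6 months of
#  the contract remain, then the customer is likely to churn."
CHURN_RULE_SQL = "complaints >= 3 AND months_left < 6"


def customers_likely_to_churn(conn: sqlite3.Connection) -> list:
    """Return the names of customers whose records satisfy the churn rule."""
    # The WHERE clause is a fixed constant here, not user input, so building
    # the query with an f-string is safe in this sketch.
    query = f"SELECT name FROM customers WHERE {CHURN_RULE_SQL} ORDER BY name"
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, complaints INTEGER, months_left INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Asha", 4, 2), ("Ravi", 0, 10), ("Meena", 3, 5), ("Vikram", 5, 12)],
)
print(customers_likely_to_churn(conn))  # ['Asha', 'Meena']
```

Because a Decision Tree rule is a conjunction of conditions on attributes, translating it into a WHERE clause like this is usually straightforward.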

Classification classes

A set of classification classes can be defined for a database having a large number of records such that each record belongs to one of the given classes. The classification process decides the class to which a given record belongs. The classification process in Decision Trees is also concerned with generating a description or a (predictive) model for each class from the given data set.

Predictive modelling

Predictive modelling is similar to the human learning experience in using observations to form a model of the important characteristics of some phenomenon. This approach uses generalisations of the ‘real world’ and the ability to fit new data into a general framework. Predictive modelling can be used to analyse an existing database to determine some essential characteristics (a model) of the data set. The model is developed using a supervised learning approach.

This has two phases: training and testing. Training builds a model using a large sample of historical data called a training set, while testing involves trying out the model on new, previously unseen data called a test set, to determine its accuracy and physical performance characteristics. Applications of predictive modelling include customer retention management, credit approval, cross-selling, and direct marketing. Supervised classification is one of the techniques associated with predictive modelling.
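The division of historical data into a training set and a test set can be sketched as a simple holdout split: shuffle the records, set a fraction aside for testing, and train on the rest. The 30% test fraction, the fixed seed, and the `train_test_split` helper below are illustrative assumptions, not part of the article.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle historical records and split them into (training set, test set)."""
    shuffled = list(records)               # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical history: 20 customers, every fourth one of whom churned.
history = [{"customer_id": i, "churned": i % 4 == 0} for i in range(20)]
training_set, test_set = train_test_split(history)
print(len(training_set), len(test_set))  # 14 6
```

Keeping the test records out of training is what lets them act as "new, previously unseen data" when the model's accuracy is measured.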

Supervised classification

In supervised classification,

  • A training data set is used to generate the class descriptions (predictive models). The class to which each record of the training set belongs is already known. The descriptions generated from the training set are then used to classify unclassified records.
  • A test data set is used to measure the effectiveness of a classification method. A set of test records whose classifications are already known is passed through the classifier, and the resulting classifications are compared with the known ones. The percentage of matching classifications is the measure of the method's effectiveness.
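The effectiveness measure just described, the percentage of test records whose predicted class matches the known class, takes only a few lines of Python. The function name and the sample labels below are hypothetical.

```python
def classification_accuracy(predicted, known):
    """Percentage of test records whose predicted class matches the known class."""
    if len(predicted) != len(known):
        raise ValueError("predicted and known must have the same length")
    if not known:
        raise ValueError("the test set is empty")
    matches = sum(p == k for p, k in zip(predicted, known))
    return 100.0 * matches / len(known)


known = ["play", "don't play", "play", "play", "don't play"]
predicted = ["play", "play", "play", "don't play", "don't play"]
print(classification_accuracy(predicted, known))  # 60.0
```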

There are several approaches to supervised classification. Decision Trees are especially attractive in a data-mining environment because they represent rules. Rules can easily be expressed in natural language, and they can easily be mapped to a database access language such as SQL. To summarise:

  • A Decision Tree represents a series of questions. Good questions produce a short series of questions.
  • Each question determines which follow-up question is best asked next.
  • The leaves represent the most specific classification for a data record, while the root represents the most general classification: the entire dataset. A data record enters the Decision Tree at the root node and works its way down until it reaches a leaf node, which determines the most specific classification of the record.
  • Effectiveness can be enhanced by pruning incompetent branches. Some paths are better than others because the rules associated with them are better, and removing the weaker branches improves the predictive effectiveness of the whole Tree.
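The "series of questions" view can be sketched in Python as a small tree of nodes: each internal node asks a question of the record and picks a branch, and each leaf carries a class. The churn-style tree below, with its attributes `complaints` and `months_left` and its thresholds, is invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    """A Decision Tree node: internal nodes ask a question, leaves carry a class."""
    question: Optional[Callable[[dict], str]] = None            # maps a record to a branch label
    children: Dict[str, "Node"] = field(default_factory=dict)  # branch label -> child node
    label: Optional[str] = None                                # class assigned at a leaf


def classify(node: Node, record: dict) -> str:
    """Enter at the root and follow one branch per question until a leaf is reached."""
    while node.label is None:
        node = node.children[node.question(record)]
    return node.label


# Hypothetical two-level tree: the root asks about complaints, and only
# customers with many complaints are asked about their remaining contract.
tree = Node(
    question=lambda r: "many complaints" if r["complaints"] >= 3 else "few complaints",
    children={
        "few complaints": Node(label="stays"),
        "many complaints": Node(
            question=lambda r: "ending soon" if r["months_left"] < 6 else "long contract",
            children={"ending soon": Node(label="churns"), "long contract": Node(label="stays")},
        ),
    },
)
print(classify(tree, {"complaints": 4, "months_left": 2}))  # churns
print(classify(tree, {"complaints": 1, "months_left": 2}))  # stays
```

Note how the answer at the root decides which question, if any, is asked next, so a record with few complaints is classified after a single question.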

Algorithm for building the Decision Tree

  • The algorithm attempts to find the test that splits the records in the best possible manner among the desired classes.
  • At each lower-level node below the root, whichever rule best splits that node's subset of records is applied.
  • The process of finding each additional level of the Tree continues, and the Tree is allowed to grow until no better way can be found to split the input records.
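The article does not say how the "best" split is measured. One common choice, used in ID3-style algorithms, is information gain: the reduction in the entropy of the class labels achieved by splitting on an attribute. The sketch below assumes that measure and categorical attributes; the six weather records are made up for illustration and are not the data of Figure 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Reduction in entropy obtained by splitting records on a categorical attribute."""
    parent = entropy([r[target] for r in records])
    subsets = {}
    for r in records:
        subsets.setdefault(r[attribute], []).append(r[target])
    weighted = sum(len(s) / len(records) * entropy(s) for s in subsets.values())
    return parent - weighted


def best_split(records, attributes, target):
    """Greedy step: pick the attribute whose split yields the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Made-up records for illustration only.
weather = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "no"},
]
print(best_split(weather, ["outlook", "windy"], "play"))  # outlook
```

On these records, splitting on outlook leaves only the rainy subset mixed, giving a gain of 2/3 bit, whereas splitting on windy leaves both subsets mixed and gains only about 0.08 bit; the same comparison is then repeated within each subset at the next level.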

Process of creating Decision Trees

All Decision Tree construction methods are based on the principle of recursively partitioning the data set until homogeneity is achieved. The construction of a Decision Tree involves the following phases:

  • Construction phase: The initial Decision Tree is constructed in this phase from the entire training data set. The training set is recursively partitioned into two or more sub-partitions using a splitting criterion, until a stopping criterion is met.
  • Pruning phase: The Tree constructed in the previous phase may not yield the best possible set of rules because of overfitting, so this phase removes some of the lower branches and nodes to improve performance. The training data set used to construct a Decision Tree is often not fully representative of the real-life situation and may contain noise. When building a Decision Tree from a noisy training set, it may be prudent to grow the Tree only deep enough to avoid incorporating unnecessary features, which would also make the Tree difficult to comprehend. A Decision Tree T is said to overfit the training data if there exists some other Decision Tree T’, a simplification of T, such that T has a smaller error than T’ over the training set but T’ has a smaller error than T over the entire distribution of instances. This situation is often a sign of noise in the training set.
  • Processing the pruned Tree: In this step, the Decision Tree is processed to improve understandability.
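The construction phase can be sketched as a recursive function. As an assumption for illustration, this version scores candidate splits with Gini impurity (the measure used by CART) and adds a `max_depth` limit as a simple way of growing the Tree only "deep enough", in the spirit of the pruning discussion; such a limit is a form of pre-pruning, not the post-pruning of a fully grown Tree. The function names and sample records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a homogeneous partition, larger for mixed ones."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def partition(records, attribute):
    """Group records by their value of a categorical attribute."""
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r)
    return groups


def build_tree(records, attributes, target, depth=0, max_depth=3):
    """Recursively partition records until a stopping criterion is met.

    Returns a class label (leaf) or a tuple (attribute, {value: subtree}).
    """
    labels = [r[target] for r in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: the partition is homogeneous, no attributes are left,
    # or the depth limit (a simple guard against overfitting) has been reached.
    if len(set(labels)) == 1 or not attributes or depth >= max_depth:
        return majority

    def weighted_gini(attr):
        groups = partition(records, attr)
        return sum(len(g) / len(records) * gini([r[target] for r in g]) for g in groups.values())

    best = min(attributes, key=weighted_gini)  # lowest impurity after the split
    remaining = [a for a in attributes if a != best]
    return (best, {value: build_tree(group, remaining, target, depth + 1, max_depth)
                   for value, group in partition(records, best).items()})


# Made-up records for illustration only.
weather = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "no"},
]
print(build_tree(weather, ["outlook", "windy"], "play"))
```

On these records the sketch splits first on outlook and then splits only the rainy branch on windy, because the sunny and overcast partitions are already homogeneous.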

Classification process

  • A record enters the Decision Tree at the root node. At the root, a test is applied to determine which child node the record will encounter next.
  • Splitting attribute: Associated with every node of the Decision Tree is an attribute, called the splitting attribute, whose values determine the partitioning of the data set when the node is expanded. In the example described next, outlook, humidity, and windy are the splitting attributes.
  • Splitting criterion: The qualifying condition on the splitting attribute is called the splitting criterion. For a numeric attribute, the criterion can be an equation or an inequality; for a categorical attribute, it is a membership condition on a subset of values. In the example, humidity ≤ 75% and humidity > 75% are the criteria for the humidity attribute, whereas the outlook being sunny, overcast, or rainy are the criteria for the outlook splitting attribute at the root.
  • This process is repeated until the record arrives at a leaf node. All the records that end up at a given leaf of the Tree are classified in the same way. There is a unique path from the root to each leaf. The path is a rule, which is used to classify the records.
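Because every root-to-leaf path is a rule, a Tree can be turned back into rules mechanically. The sketch below encodes the golf Tree as nested Python tuples, using the splitting attributes and criteria named above (categorical membership for outlook, an inequality for humidity, a yes/no test for windy) and the classes given in Table 1, then walks every path and emits one IF ... THEN rule per leaf. The data structure and the `extract_rules` function are assumptions made for illustration.

```python
# An internal node is (splitting attribute, [(criterion, subtree), ...]);
# a leaf is a class label string.
GOLF_TREE = (
    "outlook",
    [
        ("outlook is sunny", ("humidity", [
            ("humidity <= 75%", "play"),
            ("humidity > 75%", "do not play"),
        ])),
        ("outlook is overcast", "play"),
        ("outlook is rainy", ("windy", [
            ("not windy", "play"),
            ("windy", "do not play"),
        ])),
    ],
)


def extract_rules(tree, conditions=()):
    """Walk every root-to-leaf path; each path becomes one IF ... THEN ... rule."""
    if isinstance(tree, str):  # leaf: emit the rule for the path that led here
        return [f"IF {' AND '.join(conditions)} THEN {tree}"]
    _attribute, branches = tree
    rules = []
    for criterion, subtree in branches:
        rules.extend(extract_rules(subtree, conditions + (criterion,)))
    return rules


for rule in extract_rules(GOLF_TREE):
    print(rule)
```

Each printed line accumulates the criteria met along one path, so the five leaves of this Tree yield five rules, one per row of Table 1.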

Example: The example has been adapted from the book ‘Data Mining Techniques’ by Arun K Pujari, published in 2001 by Universities Press, Hyderabad. Based on the training data set shown in Figure 1, the task of the supervised classification process is to find a set of rules that tells us which values of outlook, temperature, humidity, and wind determine whether a golf player will choose to play golf. The training data, which contains the attribute values of golf players who decided to play and of those who decided not to play, is used to formulate the rules in Table 1. The rules are tested by predicting the behaviour depicted in the test data set and then comparing the predicted behaviour with the actual behaviour, which is already known. A match between the predicted and actual behaviour, shown by a check mark, indicates that the rule classified that record correctly, while a mismatch, shown by a cross, indicates that it classified the record incorrectly.

The accuracy of the classifier is determined by the percentage of the test data set that is correctly classified. The last column of the second table in Figure 1 shows the known classification of each record in the test set; this classification is taken to be the correct one. The column also shows whether the classification determined by the Decision Tree matches the known classification: a check mark indicates that the Tree's classification is the same as the one shown in the test data, and a cross indicates that it is the opposite. Table 1 shows the accuracy of each rule computed on this basis. Once fairly accurate rules are known, the Decision Tree can be built as shown in Figure 2. The Tree is then used to find the class to which a data element belongs, and the behaviour of that class is the predicted behaviour of the golf player in the situation described by the data element.
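Per-rule accuracy of the kind reported in Table 1 can be computed by checking, for each rule, the test records it covers and counting how many of them it classifies correctly. In the sketch below the five rules of Table 1 are encoded as Python predicates; the test records are made up for illustration (the actual test data of Figure 1 is not reproduced here), so the percentages printed do not correspond to those in Table 1.

```python
# The rules of Table 1 encoded as (rule number, condition, predicted class).
RULES = [
    (1, lambda r: r["outlook"] == "sunny" and r["humidity"] <= 75, "play"),
    (2, lambda r: r["outlook"] == "sunny" and r["humidity"] > 75, "do not play"),
    (3, lambda r: r["outlook"] == "overcast", "play"),
    (4, lambda r: r["outlook"] == "rainy" and not r["windy"], "play"),
    (5, lambda r: r["outlook"] == "rainy" and r["windy"], "do not play"),
]


def rule_accuracy(rules, test_set):
    """For each rule, the percentage of the test records it covers that it classifies correctly."""
    result = {}
    for number, condition, predicted in rules:
        covered = [r for r in test_set if condition(r)]
        if covered:  # rules that cover no test record are left out
            correct = sum(r["class"] == predicted for r in covered)
            result[number] = 100.0 * correct / len(covered)
    return result


# Made-up test records, for illustration only (not the data of Figure 1).
test_set = [
    {"outlook": "sunny", "humidity": 70, "windy": False, "class": "play"},
    {"outlook": "sunny", "humidity": 90, "windy": True, "class": "do not play"},
    {"outlook": "overcast", "humidity": 80, "windy": True, "class": "play"},
    {"outlook": "overcast", "humidity": 65, "windy": False, "class": "do not play"},
    {"outlook": "rainy", "humidity": 85, "windy": True, "class": "play"},
]
print(rule_accuracy(RULES, test_set))  # {1: 100.0, 2: 100.0, 3: 50.0, 5: 0.0}
```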


Table 1: The rules
Rule#  Rule description (If..., and if..., then...)                Accuracy
1      If it is sunny and the humidity is 75% or less, play         50%
2      If it is sunny and the humidity is above 75%, do not play    50%
3      If it is overcast, play                                      66.67%
4      If it is rainy and not windy, play                           50%
5      If it is rainy and windy, do not play                        0%

Figure 2: The Decision Tree (adapted from Pujari 2001)

The author is associate professor of Supply Chain Management at S P Jain Institute of Management & Research, Mumbai. He can be contacted at khalid_sheikh@hotmail.com



© Copyright 2003: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in
Mumbai by The Business Publications Division of the Indian Express Group of Newspapers.