Our Senior Advisory Board member Jan Rauch introduces data mining as a highly customisable and reliable method of both quantitative and qualitative analysis in political risk analysis.
Jan Rauch is one of PRINCEPS Advisory’s Senior Advisory Board members and a professor of informatics at the Prague University of Economics and Business. His teaching and research focus on data science. He specialises in mechanizing hypothesis formation and observational calculi.
First and foremost, what is data mining?
Data mining is a discipline of informatics, supported by a number of software solutions and theories, which aims to uncover interesting relationships within data sets and bring value to the owner of the data. It is directly related to the term “data science”.
How does the analytical process work, step by step?
We use the CRISP-DM methodology, which stands for “Cross-Industry Standard Process for Data Mining.” It consists of six phases: business understanding, data understanding, data preparation, modelling, evaluation, and deployment.
The aim of the first two phases is to formulate a reasonable analytical question, one with the potential to provide interesting answers to the data owner. The entire methodology, particularly its first two phases, is largely built on feedback: for example, the analytical question is informed by the real needs of the owner of the data sets, as well as by the limitations of the available data. Once the question has been determined, the data is prepared for a suitable analytical method, which is usually dictated by the question itself. The method produces results that may or may not contain the answers the data owner was looking for. If not, we reformulate the question and start the process over again. The resulting answers need to be interpreted in a wider context, and the information that proves useful is finally applied in practice. After some time, we reformulate the question again in light of the feedback we receive.
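The feedback loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation of CRISP-DM itself: the function `run_feedback_loop`, its callback parameters, and the toy data are all hypothetical names invented for this example.

```python
from typing import Any, Callable, List, Optional, Tuple

# The six CRISP-DM phases named in the interview.
PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modelling",
    "evaluation",
    "deployment",
]


def run_feedback_loop(
    question: Any,
    analyse: Callable[[Any], Any],
    is_useful: Callable[[Any], bool],
    reformulate: Callable[[Any, Any], Any],
    max_rounds: int = 5,
) -> Tuple[Optional[Any], Optional[Any], List[Tuple[Any, Any, bool]]]:
    """Repeat analyse -> evaluate -> reformulate until a result is useful.

    Hypothetical helper: all four callbacks stand in for work that is
    done by the data miner and the data owner in practice.
    """
    history = []
    for _ in range(max_rounds):
        result = analyse(question)          # data preparation + modelling
        useful = is_useful(result)          # evaluation with the data owner
        history.append((question, result, useful))
        if useful:
            return question, result, history  # ready for deployment
        question = reformulate(question, result)  # feedback step
    return None, None, history


# Toy usage (made-up numbers): the "question" is a threshold, and an
# answer counts as useful once it narrows the data to at most two records.
data = [3, 8, 12, 15, 21]
final_question, final_result, history = run_feedback_loop(
    question=0,
    analyse=lambda t: [x for x in data if x >= t],
    is_useful=lambda res: 0 < len(res) <= 2,
    reformulate=lambda t, res: t + 5,
)
```

In this toy run the question is reformulated three times before the evaluation step accepts the result, which mirrors the point that the first formulation of an analytical question is rarely the final one.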
How do you envision data mining being used for political risk analysis?
I think its potential use is very broad, but the true benefits will only become apparent from the first applications. As ever, we will have to think these through and test whether the results bring value to political analysts.
Which type of information that data mining can uncover is the most interesting for political risk analysis?
I think that would be so-called “nuggets”: deeply hidden, valuable information. It takes work to discover them, but I definitely think they exist, not just in the form of basic association rules but also as interesting exceptions to the general norm. For example, we can analyse a population histogram and identify a whole sub-population that runs entirely against the general trend. Other relationships also exist, for example action rules, which indicate the action the data owner should take to generate profit. For example, by analysing client data and accounting for clients' various characteristics and behaviours, we can identify how to influence their behaviour. Uncovering such information is the result of an interactive process set in motion by the data miner and the data owner.
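To make the idea of an association rule and its exceptions concrete, here is a minimal Python sketch. It assumes a rule of the form "if A then B", measures its confidence (the share of records satisfying A that also satisfy B), and then flags sub-populations in which the rule's confidence falls on the other side of a chosen threshold. The function names, the threshold, and the records are invented for illustration and do not come from any real data set or from a specific tool.

```python
from collections import defaultdict


def confidence(records, antecedent, consequent):
    """Share of records matching the antecedent that also match the consequent."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return None  # rule cannot be evaluated on this subset
    return sum(1 for r in matching if consequent(r)) / len(matching)


def exceptions_to_rule(records, group_key, antecedent, consequent, threshold=0.5):
    """Return overall confidence and the groups that contradict the overall trend."""
    overall = confidence(records, antecedent, consequent)
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r)
    exceptions = {}
    for name, rows in groups.items():
        c = confidence(rows, antecedent, consequent)
        if c is None or overall is None:
            continue
        # A group is an exception when it sits on the opposite side
        # of the threshold from the population as a whole.
        if (overall >= threshold) != (c >= threshold):
            exceptions[name] = c
    return overall, exceptions


# Made-up records: rule "urban -> supports_reform", grouped by region.
records = (
    [{"region": "A", "urban": True, "supports_reform": True}] * 4
    + [{"region": "B", "urban": True, "supports_reform": True}] * 3
    + [{"region": "B", "urban": True, "supports_reform": False}]
    + [{"region": "C", "urban": True, "supports_reform": False}] * 2
)

overall, exceptions = exceptions_to_rule(
    records,
    group_key="region",
    antecedent=lambda r: r["urban"],
    consequent=lambda r: r["supports_reform"],
)
```

With these invented records the rule holds for most of the population, while one region goes against it completely. That region is the kind of exception to the general norm that a plain overall figure would hide.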
How reliable are data mining predictions?
It mostly depends on the analytical question and on the complexity of the data set used. If some data is missing, the error rate will naturally be higher. But a well-formulated analytical question can bring it down to a minimum.