Kenex - Predictive Modelling

Predictive Modelling

Although a GIS is a perfect way of visualising data and producing maps from that data, GIS also allows you to create new data using statistically based gridding techniques or predictive maps using spatial data modelling techniques. This modelling is where businesses can really add value using their data rather than just passively using it to generate maps and figures. Kenex considers that the real value in your data, once converted to digital form, is created using these modelling techniques.

 

Basic statistical gridding allows you to predict unknown values from within a single layer such as topography, geochemistry, hydrology or climate data. However, the real power of GIS emerges when spatial modelling is applied to combine several layers and predict outcomes based on probability, such as:

  • Mineral prospectivity
  • Renewable energy project planning
  • Agricultural sustainability
  • Geotechnical risk
  • Environmental risk
  • Conservation planning

Spatial modelling uses multiple layers or themes related to the object or occurrence being searched for to statistically predict areas where it is most likely to be found. For example, conservation workers may be trying to find and protect a rare animal. They can use existing digital data about the conditions of the region they are searching in (e.g. vegetation types) along with their knowledge of the animal's preferred habitat to model areas where the animal is likely to be found. This animal might always live in an alpine climate, mostly be found in rock bluffs or boulder piles, and commonly be found on north-facing slopes. Spatial modelling can then use themes of climate, land cover and topographic slope to identify places where the rare animal might live.

 

The key to the success of spatial data modelling is the way it takes into account how strongly a particular theme is related to the occurrences being modelled (i.e. the probability of an occurrence within the area of that theme) and then combines all the themes, weighted according to their importance, to make a prediction. When the three themes (climate, land cover and slope) are combined, a probability map (like the one on the left here) is produced that shows the probability of an occurrence in any given area. The probability map then allows you to rank areas by how likely the animal is to occur there. The modelling and ranking allow the conservation worker to focus their efforts on the areas where the animal is most likely to be found.

 

As you can imagine, spatial modelling can be a very powerful tool that can be applied widely. Since the initial use of Weights of Evidence modelling in medical diagnosis and research, spatial data modelling has been successfully applied to mineral prospectivity, forestry, conservation, petroleum exploration, landslide occurrence, and could even be used to locate ideal housing and community locations for new families! 

 

Kenex specialises in all aspects of spatial modelling and, importantly, is experienced in its application in a business environment. Kenex has also been involved in the research and application of spatial data modelling to digital mapping software and regularly runs workshops at international conferences.

 

The simplest type of predictive spatial analysis is where maps, with the chosen input variable(s) represented by a series of integer values, are combined using arithmetic operators. This type of analysis takes no account of the relative importance of the variables being used and is based on expert opinion. Fuzzy Logic techniques address the problem of the relative importance of the data being used, but still rely on expert opinion to derive the weights that rank the relative importance of each variable for the map combination. Weights of Evidence, in contrast, uses statistical analysis of the map layers together with a training dataset to make less subjective decisions on how the map layers in any model are combined. Neural network techniques have been developed to mimic the thought process of the human brain and are entirely data-driven techniques whose results are difficult to interpret. More details of the particular techniques and their application are given in our links page.
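The simplest case described above, adding integer-coded maps cell by cell, can be sketched in a few lines of Python. This is only an illustration: the three small grids, their 0/1 class codes and the function name `index_overlay` are hypothetical and are not taken from any Kenex workflow.

```python
def index_overlay(*grids):
    """Add integer-coded grids cell by cell; every theme counts equally."""
    rows, cols = len(grids[0]), len(grids[0][0])
    for grid in grids:
        if len(grid) != rows or any(len(row) != cols for row in grid):
            raise ValueError("all grids must share the same shape")
    return [[sum(grid[r][c] for grid in grids) for c in range(cols)]
            for r in range(rows)]

# Hypothetical 2 x 3 themes: 1 = favourable class present, 0 = absent.
alpine_climate = [[1, 1, 0],
                  [1, 0, 0]]
rocky_cover = [[1, 0, 0],
               [1, 1, 0]]
north_facing = [[1, 0, 1],
                [0, 1, 0]]

score = index_overlay(alpine_climate, rocky_cover, north_facing)
print(score)  # higher totals mean more favourable themes overlap in that cell
```

Because each theme simply adds one to the total, a cell that scores 2 gives no indication of which two themes contributed or how much each should matter, which is the limitation the weighted techniques try to address.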

 

The three techniques on which the spatial modelling is based are outlined below:

 

 

Weights of Evidence modelling

Weights of Evidence is a Bayesian statistical approach that allows for the analysis and combination of data to predict the occurrence of events. It is based on the presence or absence of a characteristic or pattern and the occurrence of an event. The technique was initially developed as a diagnostic tool in medicine. In spatial analysis, it has been used extensively in the exploration and mining fields.

 

The prior probability of an occurrence can be estimated from the total number of training occurrences distributed over the region being targeted divided by the area of that region. A unit area is chosen that represents the potential areal extent of the occurrences being modelled and is used as a grid for the spatial calculations. Two weights can then be computed for each class in the themes of the model: a weight W+ for the area where the class is present and a weight W- for the area where it is absent, based on how the training points are distributed between the two. The contrast value C is the difference between the two (C = W+ - W-) and can be used as a measure of the strength of correlation between the theme being tested and the occurrence of the feature being modelled, e.g. the correlation between the theme of Alpine Tussock and a rare Powelliphanta New Zealand Land Snail. In this way a statistical measure of importance can be calculated for every variable that is to be input into the model, based on the prior probability and the presence or absence of the variable in question. The log odds of occurrence (logits) are then used to combine the statistically valid variables that represent the model to produce a probability map.
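The calculation described above can be sketched as follows. This is a minimal illustration, assuming the usual log-ratio form of the weights for a single binary class; the function names and the cell counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def weights_of_evidence(n_cells, n_occ, n_class, n_class_occ):
    """Return (W+, W-, C) for one binary theme class.

    n_cells      total unit cells in the study region
    n_occ        unit cells containing a training point
    n_class      unit cells where the class is present
    n_class_occ  unit cells where the class is present and a training point occurs
    """
    p_class_given_occ = n_class_occ / n_occ
    p_class_given_none = (n_class - n_class_occ) / (n_cells - n_occ)
    w_plus = math.log(p_class_given_occ / p_class_given_none)
    w_minus = math.log((1 - p_class_given_occ) / (1 - p_class_given_none))
    return w_plus, w_minus, w_plus - w_minus

def posterior_probability(prior, weights):
    """Add the weights to the prior logit and convert back to a probability."""
    logit = math.log(prior / (1 - prior)) + sum(weights)
    return 1 / (1 + math.exp(-logit))

# Hypothetical counts: 50 occurrences in 10,000 unit cells, 40 of them
# inside a class (e.g. an alpine tussock theme) that covers 2,000 cells.
n_cells, n_occ = 10_000, 50
prior = n_occ / n_cells
w_plus, w_minus, contrast = weights_of_evidence(n_cells, n_occ, 2_000, 40)

inside = posterior_probability(prior, [w_plus])    # cell where the class is present
outside = posterior_probability(prior, [w_minus])  # cell where the class is absent
print(w_plus, w_minus, contrast, inside, outside)
```

When several themes are combined, each cell takes W+ or W- from every theme and the weights are summed on the logit scale. That summation treats the themes as conditionally independent given an occurrence, so strongly overlapping themes would need to be checked before being added together. The sketch also does not guard against classes that contain none or all of the training points, where the logarithms are undefined.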

 

Fuzzy Logic modelling

Fuzzy Logic deals with the concept of 'partial truth', i.e. truth values between completely true and completely false. It was introduced by Dr. Lotfi Zadeh of UC Berkeley in the 1960s as a means to model the uncertainty of natural language. Zadeh says that rather than regarding fuzzy theory as a single theory, we should regard the process of fuzzification as a methodology to generalize any specific theory. Thus researchers have also introduced 'fuzzy calculus', 'fuzzy differential equations', and so on.

 

Fuzzy Logic is a popular and easily understood method for combining mineral exploration datasets using subjective judgment. Each exploration dataset to be used is weighted using a fuzzy membership function, which expresses the degree of importance of the various map layers as predictors of the deposit type under consideration. Themes may be combined by a variety of fuzzy combination operators (fuzzy AND, fuzzy OR, fuzzy gamma, etc) according to a scheme that may be represented with an inference network. The output from the fuzzy logic module is a map showing mineral favourability, combining the effects of the input evidential themes. No prior knowledge of mineral occurrence locations is required, so this method complements the 'data-driven' weights of evidence method, which requires that a set of training points (mineral occurrences) be known within the study area. 
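The combination operators named above can be illustrated with a short sketch. It assumes the commonly used definitions (minimum for fuzzy AND, maximum for fuzzy OR, and a gamma operator that blends the fuzzy algebraic sum and product); the linear membership function, its thresholds and the example membership values are hypothetical stand-ins for expert judgment.

```python
import math

def linear_membership(value, low, high):
    """Map a raw value to a 0..1 membership, rising linearly from low to high."""
    return min(1.0, max(0.0, (value - low) / (high - low)))

def fuzzy_and(memberships):
    return min(memberships)

def fuzzy_or(memberships):
    return max(memberships)

def fuzzy_product(memberships):
    return math.prod(memberships)

def fuzzy_sum(memberships):
    """Fuzzy algebraic sum: 1 minus the product of the complements."""
    return 1.0 - math.prod(1.0 - m for m in memberships)

def fuzzy_gamma(memberships, gamma):
    """Blend of algebraic sum and product; gamma=0 gives the product, gamma=1 the sum."""
    return fuzzy_sum(memberships) ** gamma * fuzzy_product(memberships) ** (1.0 - gamma)

# Hypothetical memberships of one map cell in three evidential themes.
cell = [0.8, 0.6, 0.9]
print(fuzzy_and(cell), fuzzy_or(cell))
print(fuzzy_product(cell), fuzzy_sum(cell), fuzzy_gamma(cell, 0.7))
```

Fuzzy AND is the most conservative choice because a single weak theme caps the result, while the gamma operator lets the modeller tune between the decreasing effect of the product and the increasing effect of the sum.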

 

Example of a Fuzzy Logic Decision Tree

 

Neural Network modelling

Neural Network analysis is a popular method used for multivariate prediction and two techniques are currently used.

 

A Self-organizing Neural Network automatically groups the input feature vectors into classes. This classification extracts knowledge from the data, because the known properties of one member of a class usually apply to its other members. The process is iterative: a coarse, arbitrary grouping is made initially and successive iterations refine it until the clusters no longer change. Each output feature vector has an associated fuzzy membership value in one or more clusters.
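The iterate-until-stable idea, with a fuzzy membership for every feature vector, can be illustrated with fuzzy c-means clustering. This is a swapped-in stand-in chosen for brevity, not a self-organizing neural network and not the method used in any particular software package; the sample points, starting centres and parameter values are hypothetical.

```python
def fuzzy_c_means(points, centres, m=2.0, max_iter=100, tol=1e-6):
    """Refine cluster centres iteratively; return (centres, memberships)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    def memberships_for(current):
        table = []
        for p in points:
            # Clamp distances so a point sitting on a centre does not divide by zero.
            d = [max(dist(p, c), 1e-12) for c in current]
            inv = [x ** (-2.0 / (m - 1.0)) for x in d]
            total = sum(inv)
            table.append([v / total for v in inv])
        return table

    for _ in range(max_iter):
        u = memberships_for(centres)
        new_centres = []
        for k in range(len(centres)):
            w = [row[k] ** m for row in u]
            total = sum(w)
            new_centres.append(tuple(
                sum(wi * p[j] for wi, p in zip(w, points)) / total
                for j in range(len(points[0]))))
        shift = max(dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:  # stop once the clusters no longer change
            break
    return centres, memberships_for(centres)

# Hypothetical feature vectors forming two well-separated groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
start = [(0.0, 0.0), (1.0, 1.0)]  # coarse, arbitrary initial grouping

centres, memberships = fuzzy_c_means(points, start)
for point, row in zip(points, memberships):
    print(point, [round(v, 3) for v in row])
```

Each row of memberships sums to one, so a vector lying between two clusters would be shared between them rather than forced into a single class.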

 

A Fuzzy Neural Network uses extra weights and relationships between variables to better model the output as a function of the inputs. It requires the user to associate the output values with the appropriate input feature vectors (fuzzy-membership values) and present all of the associated data to the network. The network then learns these input-output associations and will interpolate any given input feature vector in terms of the learned ones to provide an output fuzzy-membership value or vector (the combined data membership value or values). The main advantage of neural networks over the fuzzy logic, weights of evidence and logistic regression methods is that nonlinear relationships can be more readily modelled.

 

Why undertake Spatial Modelling?

  • Create predictive maps from digital data by correlating the information from different spatial datasets.
  • All the important spatial factors can be combined into a single predictive map using multi-variable modelling techniques.
  • Deals with data overload and quality issues and takes advantage of digital data, computer power and storage.
  • Modelling can provide an unbiased view of the data.
  • Allows spatial data and knowledge to be combined so that resources can be managed and targeted more effectively.
  • Dynamic data processing that can be updated or modified over time with new information.
  • Save time and money by putting resources into the most likely places the first time and by undertaking risk assessment and management of assets.

 

 

 
