Why, and How, Should Geologists Use Compositional Data Analysis/Normal Processing of the Data

Case One
For this case, I process the initial dataset as a whole, without differentiating between major and trace elements. Since we established that there are no statistical outliers and no zero values in the data, the first step was to determine the distribution of each element, using the kurtosis and skewness test described by Kashdan et al. (1979). As Table 2 shows, all elements follow a normal distribution except CaO and Na2O, so the next step was to transform the values of those two oxides into logarithms before testing their correlations.

'''Table 2. Results of the analysis of kurtosis and skewness.'''
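As an illustration only, a moment-based normality check can be sketched as follows. This is not necessarily the exact procedure of Kashdan et al. (1979): the sketch assumes the usual large-sample standard errors sqrt(6/n) for skewness and sqrt(24/n) for excess kurtosis, and the function name, the limit of 3 standard errors, and the sample values are all hypothetical.

```python
import math

def moment_normality_check(values, z_limit=3.0):
    """Return (skewness, excess_kurtosis, looks_normal) for non-constant data.

    Assumption: a variable "looks normal" when both moments lie within
    z_limit large-sample standard errors of zero.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0                       # excess kurtosis (0 for a normal law)
    se_skew = math.sqrt(6.0 / n)
    se_kurt = math.sqrt(24.0 / n)
    looks_normal = abs(skew) / se_skew < z_limit and abs(kurt) / se_kurt < z_limit
    return skew, kurt, looks_normal

# Hypothetical right-skewed values and their logarithms.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
logged = [math.log(v) for v in raw]
skew_raw = moment_normality_check(raw)[0]
skew_log = moment_normality_check(logged)[0]
```

In this toy case the raw values are right-skewed while their logarithms are symmetric, which is the motivation for log-transforming variables such as CaO and Na2O before correlating them.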

Using the data analysis capabilities of Excel, I then computed the correlation matrix of the data (Table 3).

'''Table 3. Correlation analysis of the initial dataset.'''
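The correlation matrix itself was produced in Excel. For readers who prefer a script, a plain-Python equivalent of a pairwise Pearson correlation table might look like the sketch below; the element values and the helper names `pearson` and `correlation_matrix` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns: dict element -> list of values.

    Returns a dict keyed by (element_a, element_b) for each unordered pair,
    i.e. the upper triangle of the correlation matrix.
    """
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical values, chosen to be exactly linear for clarity.
data = {
    "Cu": [10.0, 20.0, 30.0, 40.0],
    "As": [1.0, 2.0, 3.0, 4.0],
    "Co": [8.0, 6.0, 4.0, 2.0],
}
r = correlation_matrix(data)
```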

To determine the significance of the obtained correlations, I then calculated the critical value of Student using equation 3 (Table 4).


 * Equation 3: Critical value of Student to determine the significance of the obtained correlations.

Where tc is the critical value of Student, r is the correlation coefficient, and n is the number of samples.

'''Table 4. Critical values of Student for the correlation analysis of the initial dataset.'''

It is common practice in geology that, for n > 30 and a significance level of 0.05 (95% confidence), a correlation is considered significant if tc > 3. Table 5 shows which correlations from the initial dataset are significant.

'''Table 5. Results of the significant correlation analysis of the initial dataset.'''
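Equation 3 is not reproduced in this copy of the text. As a sketch only, the following assumes the standard Student statistic for a Pearson correlation, t = |r| sqrt(n - 2) / sqrt(1 - r^2), which may differ from the formula actually used; the limit of 3 is the rule of thumb quoted above, and the function names and sample correlations are hypothetical.

```python
import math

def student_t(r, n):
    """Student statistic for a Pearson correlation r computed from n samples.

    Assumption: the textbook form t = |r| * sqrt(n - 2) / sqrt(1 - r^2).
    """
    if abs(r) >= 1.0:
        return math.inf          # a perfect correlation is always significant
    return abs(r) * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def significant_pairs(correlations, n, t_limit=3.0):
    """Keep only the pairs whose statistic exceeds t_limit."""
    return {pair: r for pair, r in correlations.items() if student_t(r, n) > t_limit}

# Hypothetical correlations for n = 38 samples.
example = {("Cu", "As"): 0.5, ("Cu", "Sc"): 0.1}
kept = significant_pairs(example, n=38)
```

With these hypothetical numbers, only the Cu and As pair passes the threshold.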

Selecting the proper RCCs
Since the use of Range Correlation Coefficients (RCC) is not a common practice, I will explain in more detail the methodology for selecting them from the significant correlations.

 * a. Start by ranking all correlations from the highest positive to the lowest negative.
 * b. Start at the bottom of the list and select all the significant negative correlations first to create the initial coefficient.
 * c. After the first time you use a correlation pair, every time the same element appears again, put a dot on top of it, as shown in equation 4.


 * Equation 4: Example of the calculation of the multiplicative factor on a RCC.

This example means that you had a significant positive correlation between Cu and As and two significant negative correlations between Co and Cu and Co and As.

 * d. Once you finish with the significant negative correlations, go back to the top of the list and repeat the same process for the positive correlations.
 * e. If possible, combine the obtained coefficients.
 * f. If you get a contradictory result, e.g. an element that has “conflicting correlations” with previous elements, take those elements out of the coefficient and start a new one.
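The bookkeeping in steps a to d can be sketched in a few lines. This is one reading of the procedure, not a definitive implementation: it only orders the significant pairs and counts how often each element reappears (the "dots" of step c), and it does not build the coefficient itself. The function names and the correlation values are hypothetical.

```python
from collections import Counter

def processing_order(significant):
    """significant: dict (element_a, element_b) -> r.

    Rank from the highest positive to the lowest negative correlation (step a),
    then return the negative pairs starting from the bottom of the list (step b)
    followed by the positive pairs starting from the top (step d).
    """
    ranked = sorted(significant.items(), key=lambda item: item[1], reverse=True)
    negatives = [item for item in reversed(ranked) if item[1] < 0]
    positives = [item for item in ranked if item[1] > 0]
    return negatives + positives

def repeat_counts(ordered_pairs):
    """Number of times each element reappears after its first use (step c)."""
    seen = Counter()
    for (a, b), _ in ordered_pairs:
        seen[a] += 1
        seen[b] += 1
    return {element: count - 1 for element, count in seen.items()}

# The Cu, As, Co example from the text, with hypothetical r values.
pairs = {("Cu", "As"): 0.95, ("Co", "Cu"): -0.80, ("Co", "As"): -0.90}
order = processing_order(pairs)
dots = repeat_counts(order)
```

For this example the pairs are visited as Co and As, then Co and Cu, then Cu and As, and each element reappears once.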

You can reduce the size of the obtained RCC by eliminating the less frequent elements and subtracting their influence from the overall coefficient. For example, let us assume that we obtained the RCC represented in equation 5:


 * Equation 5: Hypothetical RCC to demonstrate the reduction process.

To eliminate Sc and L.O.I., we first eliminate L.O.I. and subtract 2 from every element of equation 5.


 * Equation 6: Hypothetical RCC without the L.O.I. component.

Then we would subtract the remaining Sc as shown in equation 7.


 * Equation 7: Hypothetical RCC without the remaining Sc.



RCC from the initial dataset
According to Table 5, and using the methodology just described, I obtained the RCC shown in equation 8.


 * Equation 8: RCC1 for the initial dataset.



In addition, because L.O.I. has a conflicting correlation with some elements of RCC1, I created a separate RCC for it, as shown in equation 9.


 * Equation 9: RCC 2 for the initial dataset.



Before we use SURFER v. 8.0 to graphically plot these coefficients over our model of mineralization, let us graphically analyze these coefficients using Grapher v. 7.0, also from the Golden Software suite of programs (www.goldensoftware.com).

Analysis of the RCCs from the initial dataset.
The objective of processing the data should be to obtain an RCC that is as similar as possible to our theoretical ones, represented by equation 1 and especially equation 2. We can see that we obtained all the embedded correlations, but “masked” by the presence of spurious (nonexistent) ones. For example:


 * 1) There is no correlation whatsoever between Al2O3 and the other elements (Fig. 10)
 * 2) Same situation with the Fe2O3 (Fig. 11).
 * 3) Same situation with the Sc (Fig. 12).
 * 4) Same situation with the TiO2 (Fig. 13).
 * 5) Same situation with the L.O.I. (Fig. 14).



Note, however, that if we choose only the strongest correlations (r > 0.92), we get an RCC with just the embedded correlations.

Case Two
A more common way to study this kind of data is to separate the major oxides from the trace elements and treat them separately. It is often intuitively clear to geologists that, when mixing percentages with ppm or ppb, the existing correlations between the trace elements are masked or eliminated by the relationships between the major oxides. I therefore separated the initial dataset into major oxides and trace elements (Tables 7 and 8 in the file Initial data.xls [worksheet “Processing”], located in the attached CD).

Major oxides
'''Table 6. Correlation for the major oxides from the initial dataset.'''

'''Table 7. Critical value of Student for the correlations of the major oxides of the original dataset.'''

'''Table 8. Significant correlations for the major oxides from the initial dataset.'''

Analysis of the RCCs from the major oxides of the initial dataset.
As was the case when we processed the whole dataset, here we obtain an RCC that contains the hypothetical one we are looking for (equation 1), but it is masked by the presence of other elements, many of them without real correlations between them (equation 10).

'''Equation 10. RCC3 for the major oxides of the initial dataset.'''



Trace elements
'''Table 9. Correlation analysis of the trace elements of the initial dataset.'''

'''Table 10. Critical value of Student for the correlations of the trace elements of the original dataset.'''

'''Table 11. Significant correlations for the trace elements from the initial dataset.'''

Analysis of the RCCs from the Trace Elements of the initial dataset.
As was the case when we processed the whole dataset, here we obtain an RCC that contains the hypothetical one we are looking for (equation 2), but it is masked by the presence of other elements, many of them without real correlations between them (equation 11).

'''Equation 11. RCC 4 for the trace elements of the initial dataset.'''



Graphical representation of the RCCs
Using SURFER v. 8 from Golden Software Inc. (you can download the demos from www.goldensoftware.com), I obtained Figures 15 – 18.



RCC1 maps only the southwestern border of the mineralized target, in a widely dispersed fan that makes it impossible to use as a targeting tool. This was our best RCC from the analysis of the whole dataset.

The only reason why RCC2 partially covers the ore body is the strong and real correlations between Ni, Co, and K2O. None of these elements, however, has a correlation with L.O.I.; this is therefore a classic example of a spurious correlation formed because we applied correlation analysis to a “closed” dataset.

RCC3 contains one of the embedded correlations (SiO2 vs. Al2O3), but it also contains several spurious correlations and, since it is a petrographic association, has little to do with the location of the ore body.

Finally, RCC4 is mostly our main embedded correlation and, although it also contains some spurious components (e.g. the correlation with Sc), it is not surprising that it maps the ore body perfectly.

Conclusions and recommendations from the processing of the initial dataset
Closed systems do provoke spurious correlations that mask the effectiveness of the established RCC. This is especially true when processing datasets that contain a combination of major oxides and trace elements. In those cases, I recommend using only the extremely intense correlations.

A more useful solution is to separate major oxides from trace elements and again concentrate only on the intense correlations. The disadvantage here is that we do not use the combined information of both groups of elements.
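The closure effect behind these conclusions can be reproduced with synthetic numbers. The sketch below is an illustration under stated assumptions, not part of the original processing: it draws three independent, hypothetical lognormal "elements", forces each sample to sum to 100, and compares the correlation of the first two parts before and after closure.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def close(rows, total=100.0):
    """Rescale every sample so that its parts sum to `total` (closure)."""
    return [[total * v / sum(row) for v in row] for row in rows]

def column(rows, j):
    return [row[j] for row in rows]

rng = random.Random(0)  # fixed seed so the run is repeatable
# Three independent parts: by construction there is no real correlation.
raw = [[rng.lognormvariate(0.0, 0.5) for _ in range(3)] for _ in range(2000)]
closed = close(raw)

r_raw = pearson(column(raw, 0), column(raw, 1))        # close to zero
r_closed = pearson(column(closed, 0), column(closed, 1))  # clearly negative
```

Because the three closed parts must always add up to 100, an increase in one part forces a decrease in the others, so a negative correlation appears even though the underlying variables are independent.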

Will transforming the data be more efficient in creating RCCs that help us target the mineralized zone within the granodiorite intrusive?

= Compositional Data Analysis =

The CoDaPack software (which is included in the attached CD; its user guide is presented in Appendix 1) offers three types of transformation: the Centered Log-Ratio transformation (CLR), the Additive Log-Ratio transformation (ALR), and the Isometric Log-Ratio transformation (ILR). The last two require a column with the residual (100 minus the sum of all the other components).

Centered Log-Ratio Transformation (CLR)
Appendix 1 contains the instructions on how to use the CoDaPack software. Since there are no zero values in our dataset, we can proceed directly to the CLR transformation (see Table 12).

'''Table 12. Results of the CLR transformation of the initial dataset.'''

As Table 12 shows, the dataset is now “open”, since the sum of all the components is equal to zero, not 100%.
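The zero-sum property follows directly from the definition of the transformation: each part is replaced by the logarithm of its ratio to the geometric mean of the sample. A minimal sketch, independent of CoDaPack and using hypothetical values, is:

```python
import math

def clr(row):
    """Centered log-ratio: log of each part minus the mean log of the sample."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)   # log of the geometric mean
    return [v - mean_log for v in logs]

# Hypothetical sample; all parts must be strictly positive.
sample = [62.0, 15.5, 4.2, 0.012, 0.0035]
transformed = clr(sample)
row_sum = sum(transformed)   # zero up to floating-point rounding
```

Multiplying every part of a sample by the same constant leaves the CLR values unchanged, since only ratios between parts enter the calculation.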

Once I achieved this transformation, I processed the data following the same steps as with the initial dataset. Tables 13 to 15 show the results of this process.

'''Table 13. Correlation analysis of the CLR transformed data.'''

'''Table 14. Critical value of Student of the CLR transformed data.'''

'''Table 15. Significant correlations of the CLR transformed data.'''

Using SYSTAT SPSS 10.0 for Windows, I constructed a matrix of scatter plots (Fig. 19) to confirm the results from Table 15, as well as some individual graphs using Grapher 7.0, which clearly show that all the correlations are now real (Figs. 20 – 23).



'''Figure 19. Matrix of scatter plots for the CLR transformed data.'''


 * [[Image:Real Correlations Figures_20_to_23.jpg]]

Equation 12 shows the RCC determined for the CLR transformed data and equation 13 shows the same RCC, but reduced by eliminating the SiO2 and the L.O.I.

'''Equation 12. RCC5 for the CLR transformed data.'''



'''Equation 13. RCC5a for the CLR transformed data after reducing the SiO2 and the L.O.I.'''



Fig. 24 shows that RCC5a can effectively target the copper mineralization within the granodiorite intrusive.



'''Figure 24. The RCC5a can effectively target the copper mineralization within the granodiorite intrusive.'''

Now, if we use only the strongest correlations (r > 0.95), we obtain RCC6 and RCC7 (equations 14 and 15).

'''Equation 14. RCC 6 for correlations stronger than ±0.95 for the CLR transformed data.'''



'''Equation 15. RCC 7 for correlations stronger than ±0.95 for the CLR transformed data.'''



As Fig. 25 shows, RCC 6 is an almost perfect match with the location of the ore body. RCC 7 (Fig. 26) is similar to RCC 3 and represents a petrologic association.



'''Figure 25. Almost perfect correspondence between the RCC 6 and the location of the ore body.'''



'''Figure 26. RCC 7 represents a petrologic association of major oxides.'''

Additive Log-Ratio Transformation (ALR)
Table 16 shows the results of the transformation of the original dataset. Tables 17 through 19 show the results of the correlation analysis.

'''Table 16. ALR transformed data.'''
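For reference, the ALR transformation takes the logarithm of each part divided by one chosen reference part. The sketch below assumes that the residual column (100 minus the sum of the other components) is used as the divisor; which divisor CoDaPack used in this run is not stated in the text, and the function names and sample values are hypothetical.

```python
import math

def with_residual(parts, total=100.0):
    """Append the residual (total minus the sum of the parts) as a last component."""
    residual = total - sum(parts)
    if residual <= 0:
        raise ValueError("the parts already reach or exceed the total")
    return parts + [residual]

def alr(composition):
    """Additive log-ratio with the LAST component as divisor (an assumption)."""
    log_divisor = math.log(composition[-1])
    return [math.log(v) - log_divisor for v in composition[:-1]]

# Hypothetical sample with three measured parts, in percent.
composition = with_residual([60.0, 15.0, 5.0])   # residual is 20.0
coordinates = alr(composition)                   # ln(60/20), ln(15/20), ln(5/20)
```

The result has one coordinate fewer than the composition, because the divisor itself carries no information after the ratio is taken.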

'''Table 17. Correlation analysis for the ALR transformed data.'''

I used SYSTAT SPSS 10.0 for Windows to construct a matrix of scatter plots (Fig. 27) to confirm the results from Table 17.



'''Figure 27. Matrix of scatter plots for the ALR transformed data.'''

'''Table 18. Critical values of Student for the ALR transformed data.'''

'''Table 19. Significant correlations of the ALR transformed data.'''

Equations 16 and 17 show the RCCs determined for the ALR transformed data. It is interesting to note that all the correlations here are positive.

'''Equation 16. RCC 8 of the ALR transformed data.'''



'''Equation 17. RCC 9 of the ALR transformed data.'''



Figures 28 and 29 show the spatial behavior of these RCCs with respect to the location of the ore body. If we combine both RCCs, we obtain equation 18 (Fig. 30).

'''Equation 18. Combination of RCC 9 and RCC 8 for the ALR transformed data.'''




 * [[Image:Best possible spatial correspondence Figures_28_to_30.jpg]]

Isometric Log-Ratio Transformation (ILR)
Table 20 shows the results of the transformation of the original dataset. Tables 21 through 23 show the results of the correlation analysis.

'''Table 20. ILR transformed data.'''
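An ILR transformation expresses a composition of D parts as D - 1 coordinates on an orthonormal basis. The sketch below uses one common choice of basis, in which coordinate i compares the geometric mean of the first i parts with part i + 1; the basis that CoDaPack applied here may be different, so individual coordinates need not match Table 20. The sample values are hypothetical and are assumed to already include the residual column.

```python
import math

def ilr(row):
    """Isometric log-ratio coordinates on a sequential (Helmert-type) basis.

    Coordinate i (i = 1 .. D-1) is
        sqrt(i / (i + 1)) * ln( geometric_mean(row[0:i]) / row[i] ).
    """
    logs = [math.log(v) for v in row]
    coordinates = []
    for i in range(1, len(logs)):
        mean_first = sum(logs[:i]) / i            # log of the geometric mean of the first i parts
        coordinates.append(math.sqrt(i / (i + 1)) * (mean_first - logs[i]))
    return coordinates

# Hypothetical four-part sample (three measured parts plus the residual).
row = [60.0, 15.0, 5.0, 20.0]
z = ilr(row)   # three coordinates
```

Whatever orthonormal basis is chosen, the sum of squares of the ILR coordinates equals the sum of squares of the CLR values of the same sample, which is what "isometric" refers to.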

'''Table 21. Correlation analysis for the ILR transformed data.'''

Figure 31 shows the result of a matrix of scatter plots constructed with SYSTAT SPSS 10.0 for Windows to test the results from Table 21.



'''Figure 31. Matrix of scatter plots for the ILR transformed data.'''

'''Table 22. Critical value of Student of the ILR transformed data.'''

'''Table 23. Significant correlations of the ILR transformed data.'''

Equations 19 through 22 show the RCCs determined for the ILR transformed data.

'''Equation 19. RCC 10 for the ILR transformed data.'''



'''Equation 20. RCC 11 for the ILR transformed data.'''



'''Equation 21. RCC 12 for the ILR transformed data.'''



'''Equation 22. RCC 13 for the ILR transformed data.'''



Figures 32 – 35 show the results of using these RCCs as targeting tools.



Conclusions and Recommendations from the Compositional Data Analysis
One very important effect of “opening” a dataset by using any of these transformations is that we get rid of all spurious correlations. The transformed data do contain unexpected correlations, but they are real.

Another important point is that we do not need to process the data separately (e.g. separating major oxides from trace elements), but can process the whole dataset taking advantage of the information contained in both groups.

Of all the RCCs obtained so far, RCC 8, RCC 9, and especially RCC9/8 (ALR) were by far the most efficient ones for targeting the copper mineralization.

The CLR transformed data also provided useful RCCs, especially if we concentrate on the higher correlations.

Finally, the ILR transformed data were effective as long as the geochemist “interprets” the coefficients and does not plot them blindly. For example, a geochemist should know that elements like Pb and Co usually concentrate below the ore body (inframinerals), and therefore, when using RCC 13, the investigator should concentrate on the lower values as an indication of the location of the ore body. We have a similar situation with RCC 11. The investigator should know that a common effect of sodic metasomatism is the leaching of MgO and K2O; therefore, the geochemist should be looking for lower values of the RCC.

In general, I can state that transformed data are more effective for locating the mineralized targets than the non-transformed dataset, and that the ALR method seems to be the most effective for processing this type of data. However, the geochemist should always use background knowledge to help decide on the most efficient RCC for the studied area.

I recommend the use of the software CoDaPack for the processing of any type of “closed” dataset.