Constructing trait correlation matrices

Protocol

 

Authors

Lawren Sack, Aurora Kagawa, Christine Scoffoni, Megan Bartlett, Marissa Caringella

Author affiliations/Institutions

University of California, Los Angeles

Overview

This protocol will provide you with rigorous correlation matrices, enabling consideration of as many traits as you wish, including coefficients for Pearson correlations on untransformed data, Pearson correlations on log-transformed data, and Spearman rank correlations, and their significance. This is especially useful and time saving when studying many traits to determine their correlative structure at a glance! See “ExampleMatrix.xls”; each cell contains the three correlation coefficients for each pair of traits, their significance, and is highlighted in yellow if the Spearman and Pearson correlations (on either raw or log-transformed) are significant.

Materials/Equipment

Units, terms, definitions

RAW data: This is to obtain the Pearson correlation matrix of your raw data. If the Pearson coefficient (rp) turns out significant (p < 0.05), there a linear correlation between the two traits tested.

RANK data: This is to obtain the Spearman correlation matrix of your data. It will get rid of the effect of one point driving a correlation, by ranking the data. To be sure the correlation between two traits is significant, the Spearman coefficient (rs) should always be significant (p < 0.05).

LOG data: This is to obtain the Pearson correlation matrix after logging your raw data. You want to log data when the correlation between two traits is non-linear; in such cases, using just raw data may not be significant. The Pearson coefficient determined for LOG data is effectively that for a power law fitted to your data. If the Pearson coefficient on the LOG data is significant (p < 0.05) then the relationship between the two traits is non-linear.

Procedure

1) Create a new folder called “correlation_matrix” to work from

2) Make a new Excel spreadsheet :

– RAWData.csv

  • ­ Paste your data (species means for given traits), not including the species name, with the first row being the name of your different traits (see attached RAWData.csv file as an example)
  • ­ Make sure your trait names have no spaces (replace spaces by underscores), the shorter the name the better
  • ­ Save under the name RAWData and as .csv (comma delimited)

3) In R

– Go to File -> Change dir… ->browse and select the folder you created under the name “correlation_matrix”

– Open the R correlation matrix script (“three_corr_matrix_5May15.R”), copy it and paste it in R, press enter. The script will run a Pearson correlation on the RAW data, rank the data and run a Spearman correlation on the RANK data, then log the data and run a Pearson correlation on the LOG data

– Once done, it will have created 6 new files (2 for each data set; one is the r value and the other is the p value) in the correlation matrix folder

4) In the Master_correlation spreadsheet (attached)

– Paste the data from RANKcor (giving you the r values from the RANK data)

– Then paste the data from RANKp (giving you the p-values from the RANK data)

– Now you want to create the RANK STARS section, where * will appear if the p-value was below <0.05, ** if p< 0.01 and *** if p< 0.001. The cell will remain blank if the p-value was not significant (p>0.05). Paste in the formula: =IF(cellx<0.05,IF(cellx<0.01,IF(cellx<0.001,”***”,”**”),”*”),””)

– Repeat the same process for RAWcor/RAWp and LOGcor/LOGp putting one below the other on the same Master spreadsheet

– Then create the overall significance rows below the LOG STAR: a YES will appear if your two traits have their RANK p-values <0.05 and their RAW or LOG p values <0.05. (for your correlation to be significant you always want your RANK data to be significant and your raw or log data to be significant as well).

– Go Home ->Conditional Formatting->Highlight cells rules->equal to ->type in YES with: custom format, go to fill and choose yellow as background. Press OK

– Your cells with YES now appear YELLOW!

– Concatenate the data to get your correlation matrix with stars:

To do so, use the formula: =CONCATENATE (rvalue from RANKcor, star from RANK STAR,”,”,rvalue from RAWcor, start from RAW star,”,”,rvalue from LOGcor, star from LOG star).

You now want to make the cells yellow in your concatenated matrix using the conditional formatting. To do this you need to paste the YES matrix into Word, and then copy it and paste the “format” back into Excel.

– Open Word document, change the size to 22X22 by going to Page Layout -> size ->more pare sizes and write down 22 x 22

– Select and copy your YES matrix from the Master_correlation spreadsheet, then paste it in your Word document. Select all and paste this in a new excel spreadsheet as html. Select all and copy, then paste on to your concatenated matrix as format. Cells that are significant will now appear yellow.

NOTE: you cannot import more than 60 columns in the word document, so if you are looking at more than 60 traits, you might want to do this step twice selecting half of your data

– To finalize, color in black the 1 to 1 diagonal

– YOUR CORRELATION MATRIX IS NOW READY

Links to resources and suppliers

http://www.r-project.org/