Marker Data Quality





Standards and methodology for marker data quality are a point of continuous attention within the GCP and are still under development.
Each dataset, including the molecular data, should carry a quality status.

The Data Resolution method is a GCP-wide accepted procedure for quantifying dataset quality that takes missing fields and missing values into account. With this method the consistency of molecular marker datasets can be determined.
The method is based on the average of repeated calculations of the correlation coefficient between pairs of markers, each marker originating from half the dataset. It is applicable to different types of molecular markers, ranging from AFLPs to SSRs.

Reference:
van Hintum TJ. 2007. Data resolution: a jackknife procedure for determining the consistency of molecular marker datasets. Theor Appl Genet. 2007 Aug;115(3):343-9. Epub 2007 May 15.
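
The split-half idea behind Data Resolution can be illustrated with a minimal Python sketch. It is not the published algorithm or the GCP implementation. It assumes one reading of the description above: the markers are repeatedly split into two random halves, a pairwise distance between samples is computed from each half, and the two sets of distances are correlated. The function names (mismatch_distance, pearson, data_resolution), the simple mismatch distance and the use of None for a missing score are all assumptions made for illustration; the cited paper remains the authority on the exact definition.

```python
import math
import random
from itertools import combinations


def mismatch_distance(a, b, markers):
    """Proportion of differing scores over the markers scored in both samples."""
    shared = [m for m in markers if a[m] is not None and b[m] is not None]
    if not shared:
        return None  # no comparable markers for this pair of samples
    return sum(a[m] != b[m] for m in shared) / len(shared)


def pearson(x, y):
    """Pearson correlation coefficient; NaN when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def data_resolution(data, n_repeats=100, seed=0):
    """Average split-half correlation of sample distances (illustrative only).

    data: list of samples, each a list of marker scores (None = missing).
    """
    rng = random.Random(seed)
    n_markers = len(data[0])
    pairs = list(combinations(range(len(data)), 2))
    correlations = []
    for _ in range(n_repeats):
        # Randomly split the marker indices into two halves.
        order = list(range(n_markers))
        rng.shuffle(order)
        half_a, half_b = order[: n_markers // 2], order[n_markers // 2:]
        xs, ys = [], []
        for i, j in pairs:
            da = mismatch_distance(data[i], data[j], half_a)
            db = mismatch_distance(data[i], data[j], half_b)
            if da is not None and db is not None:
                xs.append(da)
                ys.append(db)
        if len(xs) >= 2:
            r = pearson(xs, ys)
            if not math.isnan(r):
                correlations.append(r)
    # Average over all repeats that produced a defined correlation.
    return sum(correlations) / len(correlations) if correlations else math.nan


# Toy dataset: two groups of three identical samples, 20 markers each.
toy = [[0] * 20 for _ in range(3)] + [[1] * 20 for _ in range(3)]
dr = data_resolution(toy, n_repeats=50)
```

For this perfectly consistent toy dataset every split yields the same distances, so the average correlation is close to 1. Noisier or sparser data would be expected to give lower values.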


R scripts are being developed to quantify data quality. Finished scripts are available from the GCP repositories in Cropforge. With this software, meta-information on the GCP datasets, such as missing fields, frequencies, duplications, missing data and binning, can be quantified.
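
As an illustration of the kind of meta-information meant here, the following Python sketch computes the fraction of missing scores and the allele frequencies per marker, and lists samples with identical marker profiles. It does not reproduce the GCP R scripts. The function names (summarize_markers, duplicated_samples), the data layout (one list of scores per sample) and the use of None for a missing score are assumptions made for this example.

```python
from collections import Counter


def summarize_markers(data, marker_names):
    """Per-marker missing fraction and allele frequencies.

    data: list of samples, each a list of marker scores (None = missing).
    """
    n_samples = len(data)
    summary = {}
    for k, name in enumerate(marker_names):
        column = [row[k] for row in data]
        scored = [v for v in column if v is not None]
        counts = Counter(scored)
        summary[name] = {
            "missing_fraction": (n_samples - len(scored)) / n_samples,
            # Frequencies are relative to the scored (non-missing) samples.
            "allele_frequencies": {a: c / len(scored) for a, c in counts.items()},
        }
    return summary


def duplicated_samples(data, sample_names):
    """Pairs (first_seen, duplicate) of samples with identical profiles."""
    seen = {}
    duplicates = []
    for name, row in zip(sample_names, data):
        key = tuple(row)
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates


# Toy dataset: three samples scored for three markers.
toy = [
    [1, 0, None],
    [1, 1, 0],
    [1, 0, None],
]
report = summarize_markers(toy, ["M1", "M2", "M3"])
dups = duplicated_samples(toy, ["s1", "s2", "s3"])
```

In this sketch a missing score is treated as part of the profile when looking for duplicates, so two samples match only if they are missing the same markers. A real quality check might prefer to compare only the markers scored in both samples.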

The GCP makes use of these standardized quality check procedures for marker data.

Genotype data quality control methods are embedded in the ICRISAT LIMS.



  

GCP Bioinformatics
and Biometrics