• Microarray or mass spec?
  • Average by sample location?

  • Run PROC MEANS on each patient to verify normalization
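PROC MEANS is a SAS procedure. The function below is a rough Python analogue of running it with a BY patient statement: it reports N, mean, standard deviation, minimum and maximum per patient. The record layout, (patient_id, intensity) pairs, and the name `per_patient_summary` are hypothetical. The underlying idea is that if normalization worked, the per-patient summaries should be on a comparable scale.

```python
from collections import defaultdict
from statistics import mean, stdev


def per_patient_summary(records):
    """Per-patient N, mean, SD, min and max, roughly like PROC MEANS with BY patient.

    records: iterable of (patient_id, intensity) pairs (hypothetical layout).
    """
    by_patient = defaultdict(list)
    for patient_id, intensity in records:
        by_patient[patient_id].append(intensity)
    summary = {}
    for patient_id in sorted(by_patient):
        values = by_patient[patient_id]
        sd = stdev(values) if len(values) > 1 else 0.0  # stdev needs at least two values
        summary[patient_id] = (len(values), mean(values), sd, min(values), max(values))
    return summary


if __name__ == "__main__":
    demo = [(1, 2.0), (1, 4.0), (2, 3.0), (2, 3.0)]
    for pid, row in per_patient_summary(demo).items():
        print(pid, row)
```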

  • Create/verify the peak definition file for each dataset (if not provided by the investigator)
    • Peak ID: 1-n
    • Patient ID
    • Sample location number for the patient
    • Spectrum number for the sample location
    • Group
    • Easy ID for the investigator (uniform length)
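The peak definition layout above can be sketched as follows. This assumes one input row per spectrum, given as (patient_id, location, spectrum, group). `PeakRow`, `build_peak_definitions` and the `P..L..S..` easy-ID format are all hypothetical. Each field is zero-padded to the width of its largest value, so every easy ID has the same length.

```python
from dataclasses import dataclass


@dataclass
class PeakRow:
    peak_id: int
    patient_id: int
    location: int
    spectrum: int
    group: str
    easy_id: str


def build_peak_definitions(samples):
    """One row per spectrum, numbered 1..n.

    samples: list of (patient_id, location, spectrum, group) tuples (hypothetical layout).
    """
    # Zero-pad each field to the width of its largest value so all easy IDs are equally long.
    wp = len(str(max(s[0] for s in samples)))
    wl = len(str(max(s[1] for s in samples)))
    ws = len(str(max(s[2] for s in samples)))
    rows = []
    for peak_id, (patient, loc, spec, group) in enumerate(sorted(samples), start=1):
        easy_id = f"P{patient:0{wp}d}L{loc:0{wl}d}S{spec:0{ws}d}"  # hypothetical ID format
        rows.append(PeakRow(peak_id, patient, loc, spec, group, easy_id))
    return rows
```

For example, patients 1 and 12 yield easy IDs such as `P01L1S1` and `P12L2S3`, both seven characters long.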

  • Create/verify the patient definition file for each dataset (if not provided by the investigator)
    • Patient ID: 1-n
    • Patient name: a name the investigator can use to easily identify the patient
    • Number of sample locations
    • Number of spectra
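The patient definition file can be derived from the per-spectrum records. The sketch below assumes (patient_id, location, spectrum) tuples and a dictionary of investigator-supplied names; both layouts and the function name are hypothetical. It also checks that patient IDs run 1-n, as the checklist requires.

```python
from collections import defaultdict


def build_patient_definitions(samples, names):
    """One row per patient: (patient_id, name, n_sample_locations, n_spectra).

    samples: (patient_id, location, spectrum) tuples, one per spectrum (hypothetical layout).
    names: {patient_id: investigator-supplied name} (hypothetical).
    """
    locations = defaultdict(set)
    n_spectra = defaultdict(int)
    for patient, loc, _spec in samples:
        locations[patient].add(loc)
        n_spectra[patient] += 1
    ids = sorted(locations)
    if ids != list(range(1, len(ids) + 1)):
        raise ValueError(f"patient IDs are not 1..n: {ids}")
    return [(p, names[p], len(locations[p]), n_spectra[p]) for p in ids]
```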

  • If averaging, create the averaged peak definition file (SasAverageProgramTemplate)
    • Peak ID: 1-n
    • Patient ID
    • Group
    • Easy ID for the investigator (uniform length)
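After averaging, the spectra collapse to one row per patient. This is a Python sketch only, not the SasAverageProgramTemplate itself. It assumes the same hypothetical (patient_id, location, spectrum, group) input and a hypothetical `P..` easy-ID format. It refuses to proceed if a patient appears in more than one group, because an averaged row can carry only one group.

```python
def build_averaged_definitions(samples):
    """One (peak_id, patient_id, group, easy_id) row per patient, numbered 1..n.

    samples: (patient_id, location, spectrum, group) tuples (hypothetical layout).
    """
    groups = {}
    for patient, _loc, _spec, group in samples:
        if groups.setdefault(patient, group) != group:
            raise ValueError(f"patient {patient} appears in more than one group")
    width = len(str(max(groups)))  # pad so every easy ID has the same length
    return [
        (peak_id, patient, groups[patient], f"P{patient:0{width}d}")
        for peak_id, patient in enumerate(sorted(groups), start=1)
    ]
```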

  • Create/verify the gene/protein definition file
    • Gene/protein ID: 1-n
    • Name/label given by the investigator

  • Create the group definition file.
    • Show the number of patients/sample locations.
    • If there is more than one data set, define and name each one.
    • Define each grouping for each data set.
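The counts for the group definition file can be tallied per grouping. The sketch below assumes (patient_id, location, group) tuples for a single grouping; the layout and function name are hypothetical. A sample location is counted as a distinct (patient, location) pair, so repeated spectra at one location are not double-counted.

```python
from collections import defaultdict


def group_counts(samples):
    """Return {group: (n_patients, n_sample_locations)} for one grouping.

    samples: (patient_id, location, group) tuples (hypothetical layout).
    """
    patients = defaultdict(set)
    locations = defaultdict(set)
    for patient, loc, group in samples:
        patients[group].add(patient)
        locations[group].add((patient, loc))  # a location belongs to a specific patient
    return {g: (len(patients[g]), len(locations[g])) for g in sorted(patients)}
```

Run it once per grouping (and per data set) to fill in the file.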

  • Verify with the investigator:
    • Group definition file.
    • Number of patients.
    • Number of sample locations per patient.
    • Number of spectra.
    • Number of patients/sample locations in each group.
    • Number of groupings.

  • Generate scores
    • Mass spec (log10)
      • WGA - log
      • SAM - log
      • Info - log
      • t-test - log
      • Fisher - binary (0 stays 0, >0 becomes 1)
      • Wilcoxon - raw
    • Microarray (log2)
      • WGA - log
      • SAM - log
      • Info - log
      • t-test - log
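The per-score input transforms listed above can be sketched as a single dispatch function. The platform and score keys are hypothetical labels. Zeros are kept at 0 before logging, following the logging tip under General Tips. The function does not enforce that Fisher and Wilcoxon are listed only for mass spec.

```python
import math

LOG_BY_PLATFORM = {"mass_spec": math.log10, "microarray": math.log2}


def transform_for_score(values, platform, score):
    """Prepare intensities for one score.

    platform: "mass_spec" or "microarray"; score: "wga", "sam", "info", "ttest",
    "fisher" or "wilcoxon". These string keys are hypothetical labels.
    """
    log = LOG_BY_PLATFORM[platform]
    if score == "fisher":
        return [1 if v > 0 else 0 for v in values]  # presence/absence: 0 stays 0, >0 becomes 1
    if score == "wilcoxon":
        return list(values)  # raw values
    if score in ("wga", "sam", "info", "ttest"):
        return [log(v) if v > 0 else 0.0 for v in values]  # zeros (non-peaks) stay 0
    raise ValueError(f"unknown score: {score}")
```

Because the Wilcoxon test works on ranks and the log is monotonic, logging would not change its result anyway.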

  • Create a summary chart for the scores

  • Distance run
    • Verify sign function.
    • Verify distance function.

  • Distance graph labels for each grouping
    • Verify training data sizes against the group definition.
    • Verify testing data sizes against the group definition.
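One way to automate the size check is sketched below. It assumes, as a hypothetical, that each split's training and testing sets together cover every group exactly once with no overlap, and that groups are given as {group_name: set of patient IDs}. Adjust the rule if the distance runs use a different design.

```python
def check_split(group_def, train, test):
    """Compare one training/testing split against the group definition.

    group_def, train, test: {group_name: set of patient IDs} (hypothetical layout).
    Assumes train and test together should cover each group exactly, with no overlap.
    Returns a list of problem descriptions; an empty list means the sizes agree.
    """
    problems = []
    for group, members in sorted(group_def.items()):
        tr = train.get(group, set())
        te = test.get(group, set())
        overlap = tr & te
        if overlap:
            problems.append(f"{group}: in both train and test: {sorted(overlap)}")
        if tr | te != members:
            problems.append(
                f"{group}: train {len(tr)} + test {len(te)} does not match "
                f"group definition ({len(members)} patients)"
            )
    for group in sorted((set(train) | set(test)) - set(group_def)):
        problems.append(f"{group}: not in the group definition")
    return problems
```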

  • Training error checking
    • Sample 1-2 detailed distance outputs per grouping.
      • Verify sign function.
      • Verify distance function.
      • Verify grouping.
      • Verify groups.

  • General tips
    • SAS
      • Make sure that the t-test/Fisher/Wilcoxon script has the correct number of genes/proteins.
    • Averaging
      • Ignore 0s (average only peaks).
    • Logging
      • Log only peaks > 0; 0s (non-peaks) stay 0.
    • Average before taking the log. This will protect the data distribution (2004-04-19).
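The three averaging and logging tips combine into one step: average the nonzero values for each peak, then log only the averages that are above 0. The sketch assumes a hypothetical layout of equal-length intensity lists, one per spectrum, for a single patient or sample location. For comparison, averaging the logs instead would give the log of the geometric mean. In the test data below, the first peak's logged raw average is about 2.70, while the mean of its logs would be 2.0.

```python
import math


def average_then_log(spectra, log=math.log10):
    """Average replicate spectra peak by peak, ignoring 0s, then log the averages.

    spectra: list of equal-length intensity lists (hypothetical layout).
    Use log=math.log2 for microarray data.
    """
    result = []
    for column in zip(*spectra):  # one column per peak
        peaks = [v for v in column if v > 0]  # ignore 0s: average only real peaks
        avg = sum(peaks) / len(peaks) if peaks else 0.0
        result.append(log(avg) if avg > 0 else 0.0)  # log only values > 0; non-peaks stay 0
    return result
```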

-- JeremyRoberts - 09 Mar 2004
Topic revision: r9 - 19 Apr 2004, JeremyRoberts
 
