Nick Richardson's project - Adverse health outcomes of contemporary survivors of childhood and adolescent Hodgkin lymphoma

  • Overall goal is to evaluate neurocognitive function over time by using clinical and self-reported health outcomes of a Hodgkin’s survivor cohort previously ascertained via questionnaires from two(?) Children’s Oncology Group clinical trials.

Specific Aim 1: Determine the cumulative incidence of selected adverse physiologic and psychosocial outcomes in survivors of childhood and adolescent Hodgkin lymphoma treated in the Children’s Oncology Group from 2002 to 2009.

Specific Aim 2: Characterize the association between intensity of therapy and risk of selected long-term adverse health outcomes.

Aim 3: Determine the prevalence of high school graduation in survivors of childhood and adolescent Hodgkin lymphoma treated in the Children’s Oncology Group from 2002 to 2009.

  • This should be assessed among kids that are 13 to 17 at dx who also have a 5-year survey. The patients' cancer treatments should not have caused any interruption in school attendance.
  • Variable is school_complete
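A minimal sketch of how the Aim 3 prevalence could be computed. It assumes one record per patient, with `school_complete` coded 1/0 (the coding is an assumption) and two hypothetical fields, `age_dx` and `has_5yr_survey`. The Wilson interval is just one reasonable choice of confidence interval; the notes do not specify one.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("no eligible patients")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def graduation_prevalence(patients):
    """Prevalence of school_complete among patients aged 13-17 at dx with a 5-year survey."""
    eligible = [p for p in patients
                if 13 <= p["age_dx"] <= 17
                and p["has_5yr_survey"]
                and p["school_complete"] is not None]  # drop missing outcomes
    n = len(eligible)
    k = sum(1 for p in eligible if p["school_complete"] == 1)
    lo, hi = wilson_ci(k, n)
    return {"n": n, "graduated": k, "prevalence": k / n, "ci": (lo, hi)}

# Made-up records, for illustration only
example = [
    {"age_dx": 15, "has_5yr_survey": True, "school_complete": 1},
    {"age_dx": 16, "has_5yr_survey": True, "school_complete": 0},
    {"age_dx": 12, "has_5yr_survey": True, "school_complete": 1},   # too young at dx
    {"age_dx": 17, "has_5yr_survey": False, "school_complete": 1},  # no 5-year survey
    {"age_dx": 14, "has_5yr_survey": True, "school_complete": 1},
]
result = graduation_prevalence(example)
```

Patients with a missing `school_complete` are dropped from the denominator here; whether that is appropriate depends on how much is missing and why.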

Study design

  • There are longitudinal surveys. Nick is entering them in redcap.
  • The surveys are administered after completion of tx, serially. All the surveys are identical!!!
  • There are 2 old clinical trials
  • Deb is chair of AHOD0031, which is a subset of Children's Oncology Group. Deb will send us the paper they put in JCO.
  • CCSS is the Childhood Cancer Survivor Study
  • All of the patients got chemo; some of them also got radiation
  • 2-year project
  • Data is entered somewhere
  • All data for this project is collected.
  • Exclusion criteria: Patients who didn't receive tx. They are marked in the treatment column of the COG data: treat_no (treatment assignment) = 0, "Induction only".
  • See study schema graph. There are three treatment groups that were randomized, but part of the treatment assignment depended on how the patient responded.
  • Good responders are stratum RER (rapid early responder).
  • Poor responders are stratum SER (slow early responder).
Sample Size of Hodgkin Lymphoma Cohort (AHOD04N1)


  • Need to think carefully about how to analyze the data because all patients don't have the same amount of follow up.
  • They would like to make statements like "3% have hyperthyroidism by 5 years." I think we need to calculate that among the people with at least 5 years of follow-up.
  • For treatment comparisons, we are planning on doing subset analyses within good responders and within poor responders. This is because the randomization scheme depended on the patients' response to tx.
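The "by 5 years" calculation described above could be sketched as follows, computed separately within each stratum. The field names `fu_years` (follow-up in years) and `onset_years` (years from end of tx to onset, `None` if never reported) are hypothetical; `stratum` takes the RER/SER values from the COG data.

```python
from collections import defaultdict

def incidence_by_5_years(patients, horizon=5.0):
    """Per stratum: share of patients with >= horizon years of follow-up
    whose outcome onset was at or before the horizon."""
    counts = defaultdict(lambda: {"n": 0, "events": 0})
    for p in patients:
        if p["fu_years"] < horizon:
            continue  # not enough follow-up to be in the denominator
        c = counts[p["stratum"]]
        c["n"] += 1
        if p["onset_years"] is not None and p["onset_years"] <= horizon:
            c["events"] += 1
    return {s: {**c, "proportion": c["events"] / c["n"]} for s, c in counts.items()}

# Made-up records, for illustration only
patients = [
    {"stratum": "RER", "fu_years": 6.0, "onset_years": 2.5},
    {"stratum": "RER", "fu_years": 5.5, "onset_years": None},
    {"stratum": "RER", "fu_years": 3.0, "onset_years": 1.0},  # excluded: under 5 years of follow-up
    {"stratum": "SER", "fu_years": 7.0, "onset_years": 6.0},  # onset after the 5-year mark
    {"stratum": "SER", "fu_years": 5.0, "onset_years": 4.0},
]
by_stratum = incidence_by_5_years(patients)
```

Restricting to patients with full follow-up discards the rest of the cohort. A time-to-event estimator would be an alternative that also uses patients with shorter follow-up; that is not shown here.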

Clinical background

  • HD is Hodgkin Disease, but the appropriate terminology should now be Hodgkin lymphoma
  • Typical age of diagnosis is 15-17.
  • Treatment tends to last 6-8 months.
  • Leukemia and brain tumor patients are known to develop problems within 1 year post therapy
    • Rate of Special Education in general public: Children 3-21 years old served in federally supported programs for the disabled as of 2010 equals 13.1% for all disabilities. U.S. Department of Education, National Center for Education Statistics (2012). Digest of Education Statistics, 2011 (NCES 2012-001).
    • National Graduation Rates: The percentage of 16-24 year olds who have not earned a high school diploma or an equivalent GED was 7 percent in 2011
    • The 2011 graduation rate for full-time students pursuing a first time bachelor’s degree at a 4-year institution was 50%. U.S. Department of Education, National Center for Education Statistics. (2013). The Condition of Education 2013 (NCES 2013-037).


  • paper surveys and on disks
  • entered by Nick et al. into redcap
  • Nick has created the redcap db, not in production mode yet, by making fields based on the paper survey
  • I gave lots of input on the redcap db set up and on data request from ccss: childhood cancer survivor study
  • We will have the reg number with which to merge the survey data
  • Variables 2-16 will come from the Children's Oncology Group. Nick and I met 1/29/2015 about specs for the data request.
  • Check whether the survey date on the survey matches the survey dates given in COG
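A rough sketch of the survey-date check, assuming the survey export has one row per survey with `reg_number` and a hypothetical `survey_date` field, and that the COG data can be arranged as a mapping from reg number to the set of survey dates COG has on file. Both layouts are assumptions.

```python
from datetime import date

def check_survey_dates(survey_rows, cog_dates):
    """Return problems found when matching survey rows to COG survey dates.

    survey_rows: list of dicts with 'reg_number' and 'survey_date' (datetime.date)
    cog_dates:   dict mapping reg_number -> set of survey dates recorded by COG
    """
    problems = []
    for row in survey_rows:
        reg = row["reg_number"]
        if reg not in cog_dates:
            problems.append((reg, row["survey_date"], "reg number not in COG data"))
        elif row["survey_date"] not in cog_dates[reg]:
            problems.append((reg, row["survey_date"], "survey date not in COG data"))
    return problems

# Made-up values, for illustration only
surveys = [
    {"reg_number": 101, "survey_date": date(2009, 3, 1)},
    {"reg_number": 101, "survey_date": date(2011, 3, 15)},  # COG has 2011-03-01
    {"reg_number": 999, "survey_date": date(2010, 1, 1)},   # unknown reg number
]
cog = {101: {date(2009, 3, 1), date(2011, 3, 1)}}
problems = check_survey_dates(surveys, cog)
```

An exact-match rule is strict; a tolerance of a few days may be more practical once we see how the dates were recorded.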


  • Date of ending tx is dlasteoc
  • FU time is dlasteoc to last survey date
  • Need to code whether patient had each treatment type (any surveys say yes) and the corresponding date they had it (approximated by the date of the first survey they reported it), and the corresponding time they developed it in relation to date of tx.
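The derived variables above could be built per patient roughly like this. It assumes the surveys are in long format (one dict per survey) with a hypothetical `survey_date` field and yes/no items stored as True/False; `dlasteoc` is the end-of-treatment date from the COG data, and the item name `thyroid` is made up.

```python
from datetime import date

def summarize_patient(dlasteoc, surveys, item):
    """Follow-up time and first-report timing for one yes/no survey item.

    surveys: list of dicts with 'survey_date' and True/False item fields.
    """
    ordered = sorted(surveys, key=lambda s: s["survey_date"])
    last_date = ordered[-1]["survey_date"]
    # Date of the first survey on which the item was answered "yes", if any
    first_yes = next((s["survey_date"] for s in ordered if s.get(item)), None)
    return {
        "fu_days": (last_date - dlasteoc).days,
        "ever": first_yes is not None,
        "first_report_date": first_yes,
        "days_from_tx_end": None if first_yes is None else (first_yes - dlasteoc).days,
    }

# Made-up values, for illustration only
tx_end = date(2005, 6, 1)
pt_surveys = [
    {"survey_date": date(2010, 6, 1), "thyroid": True},
    {"survey_date": date(2006, 6, 1), "thyroid": False},
    {"survey_date": date(2008, 6, 1), "thyroid": True},
]
summary = summarize_patient(tx_end, pt_surveys, "thyroid")
```

The first "yes" date is only an upper bound on the true onset: the event happened some time between the previous survey and that one.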

  • There is a PDF containing the patient survey for the contemporary cohort of survivors on the research drive. It is under the bookmark "patient questionnaire" and comprises pages 55-73.
  • We don't have date of birth but could calculate it from age at enrollment and date of enrollment.
  • Time of diagnosis should be very close to time of study enrollment.
  • Here is a note from Lu Chen, who sent the data from Seattle:
Here is a spreadsheet for AHOD0031 data as you requested; this is still based on 3/2012 cutoff. We can update the data say after 3/2015 (so extend FU by 3 yrs) later, but want to set up things first to see if the dataset works for your purpose. You may work with this dataset now if you are ready to start the analysis, before we can get the updated dataset which can be in the exact same format with some more follow-up.


The 2 tabs are basically the same dataset, with the first tab having a variable name followed by a “label” that tries to explain the variable meaning, and the 2nd tab being the same dataset except 1st row being just the variable name without the label. I think most of the variable labels are self-explanatory as to what they are, with the values all in the “formatted or readable” format.


Below is the list of variables you want and included, with a few notes in the parenthesis ()

Date of diagnosis (and date of enrollment)

Date end of treatment (dlasteoc: the end date of the last End of Course (EOC) form, see EOC explanation below, and nlasteoc: the # for the last EOC form submitted)

    Age (at dx and enrollment)





    B symptoms – by type of B symptom (the overall variable and the 3 individual questions)

    Bulk disease (the overall variable and the 2 individual questions)

    Stratum (stratum: RER vs. SER vs. a few other special cases; treatment_no: randomized or assigned treatment group)

    ESR at diagnosis (esron: ESR collected at on-study)

    Ferritin at diagnosis (no data)

    Radiation fields (no data; however we included a variable below:

bifrtc4c6: whether a RT EOC, i.e., EOC4 or EOC6 was submitted; please note due to missing data submission or off-protocol RT,  no such form does not = no RT; also a RT EOC does not mean the patient finish the entire RT per protocol; it could be protocol RT with early off-protocol. So this variable SUGGESTs RT Yes/NO only)

    BMI or BSA at diagnosis (or height/weight if that is all you have): (Ht, Wt, and BSA reported on the first EOC form; 7 pts had no EOC form submitted so no data)

    Relapse and Date of same (date of relapse/PD, SMN, death, and last contact).

In addition, a variable called AMEND which indicates whether the patient was pre- or post-amendment #1 (3 cycles vs. 2 cycles).


EOC definitions are listed on the data collection form for reporting period, and copied below:

[] - Phase 1, Cycles 1-2 of ABVE-PC Induction (All Patients) (EOC #1 on SADD report)

[] - Phase 2, Cycles 3-4 of DECA (for SER Augmented therapy arm) (EOC #2 on SADD report)

[] - Phase 2, Cycles 3-4 of ABVE-PC Continuation (for RER and SER-Standard Arm) (EOC #3 on SADD report)

[] – Phase 3, End of IFRT (for RER cycle 5) (EOC #4 on SADD report)

[] - Phase 3, Cycles 5-6 of ABVE-PC (for SER Augmented therapy arm) (EOC #5 on SADD report)

[] – Phase 4, End of IFRT (for SER cycle 7) (EOC #6 on SADD report)


Meeting notes

2016 January 6

  • Accession number is just the order that the patient was enrolled in the study. Ignore it.
  • Nick will check on the data items I listed for him to check on my report.

OLD - Long-term Cohort - Study population includes 529 patients who will be at least five years from diagnosis at study entry, alive without evidence of recurrent disease, treated 1986-2000 on targeted risk-adapted chemotherapy protocols.

- Short-term Cohort - Study population includes 653 patients who are at least 1 year to greater than 10 years from diagnosis at study entry, alive without evidence of recurrent disease, treated 2002-2009 on targeted risk-adapted chemotherapy protocols.


  • If they show a deficit in function, they would like to do a prospective study or routinely apply neurocognitive testing. Perform established neurocognitive testing on Hodgkin’s survivors treated at Vanderbilt University Medical Center
  • For the aim of studying time of onset of deficits since tx, only the requirement of special education and the problem-solving questions will be useful (could establish this).
  • They think the problems develop soon after therapy.


    • For the risk factors, we need to make sure they are measured before the treatment in order to be considered risk factors. For the medical problems, we have this. For the psych ones, we do not.
    • descriptive graphs
    • propensity for finishing high school.
    • or make a more ordinal score for highest level of education
  • Inference issues
    • The outcomes and definitions for this study need to be completely overhauled. We are very limited by the data collected and (assumed) lack of planning and thought to study design for applicability to these research questions.
    • An issue with wanting to learn about time to onset of neurocognitive deficits is that we have cross-sectional (not longitudinal) data on the patients. We have one observation time for each patient, and the time of observation (since completion of therapy) varies across patients. If a patient is observed to have a deficit, the time of onset will not be known precisely, but we will know that it occurred before the time of observation.
    • We have, in a sense, time-to-event data for requiring special education. We have varying follow-up time among the patients, which should correlate with the outcome. One resolution for this is to define it as requiring special ed within one year of the end of tx. Again, the way we define it really needs to be as un-arbitrary and generalizable as possible.
    • How to analyze the 25 questions?
    • The educational attainment outcome is problematic because of differences in age at beginning of tx, duration of tx, and time (from end of tx) to assessment. I'm not sure we can come up with a definition/algorithm that will be generalizable.
  • We don't need to account for time in therapy when we decide what education level someone is expected to have completed. This is because they think the treatment wouldn't stop them from keeping up with school.
  • Info for sample size calculations
    • Base on power for estimating proportions within a certain margin. We will need (for power analysis and for comparison in paper) the rates of special education and of graduation rates.
    • Long-term Cohort - Study population includes 529 patients who will be at least five years from diagnosis at study entry, alive without evidence of recurrent disease, treated 1986-2000 on targeted risk-adapted chemotherapy protocols.
    • Short-term Cohort - Study population includes 653 patients who are at least 1 year to greater than 10 years from diagnosis at study entry, alive without evidence of recurrent disease, treated 2002-2009 on targeted risk-adapted chemotherapy protocols.
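To illustrate basing the sample size on the margin for an estimated proportion, here is a small sketch using the normal approximation. The cohort sizes (653, 529) and the reference rates (13.1% special education, 7% without a diploma) are the ones quoted in these notes; treating them as the expected proportions is an assumption, and the analyzable n will likely be smaller than the full cohort.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation CI for a proportion (z=1.96 gives ~95%)."""
    return z * math.sqrt(p * (1 - p) / n)

def n_for_margin(p, margin, z=1.96):
    """Smallest n for which the half-width is at most `margin`."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Expected precision with the full cohorts and the reference rates from the notes
for label, n in [("short-term", 653), ("long-term", 529)]:
    for rate in (0.131, 0.07):
        print(f"{label} cohort, n={n}, p={rate}: +/- {margin_of_error(rate, n):.3f}")

# Worst case (p = 0.5) for a +/- 5 percentage point margin
n_needed = n_for_margin(0.5, 0.05)
```

The normal approximation gets poor for proportions near 0 or 1 or for small subsets, so the subset analyses may need an exact or Wilson interval instead.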

Recommendations for redcap:

  • Please give us api access for the redcap project. (Did this for myself.)
  • I was able to access the user rights, and I added api import and export. Also changed the de-identified field because it says "De-identified means that all free-form text fields will be removed, as well as any date/time fields and Identifier fields." I changed it to "Remove all tagged identifier fields" for now. Previously, I had "view and edit" rights for "data entry rights," which "*only* pertain to a user's ability to view or edit data on a web page in REDCap (e.g., data entry forms, reports). It has no effect on data imports or data exports." I've changed it to "Read only," to prevent me from accidentally making changes in the database.
  • Added a record number which is just a unique id for each row in the data, or survey.
  • Would we need to change the reg_number variable so that it is not a record ID field? This would be so that we can try setting this up in a tall/long format, ie, with multiple rows per person. I think this won't be possible if we have the reg number defined as the record id. however, it may be possible that you have to have one record id. Could this just be any unique number for each row? Like 1, 2, 3, ...?
  • Recommend we add min and max to reg number. Probably min should be 0?
  • For accession number: should this be set to "Integer" for validation? I recall this variable is to identify which survey it is.(?) Are the accession numbers supposed to be 1, 2, or 3? If so, add max and min.
  • for the variable "date" I strongly suggest giving a min and max validation requirement. This prevents us from sending you a list of all the dates which were a thousand years ago. For example, you may know that all the surveys were collected between 2007 and 2014.
  • also for the date variable, I suggest you call this something more informative, like, surveyCompletionDate, or surveyDate.
  • Add field for survey date. There might have already been a date variable.
  • For question A1, right now you have this set up as one question. I suggest you instead delete this question and make an individual yes/no question for each of the answer choices (9). While it can be thought of as one question, it is really also a series of questions: did you see a gynecologist (y/n), did you see an internist (y/n), etc. When you set this up as one field in redcap and use checkboxes, it automatically creates one variable for each of the answer choices. However, the variable names are health_providers___1, health_providers___2, ...health_providers___9. These variable names don't give information about what the question is asking. This creates the potential for errors in data manipulation and analysis and even in interpreting output results. Your new names could be something like: providerNone, providerGyne, etc., or sawNoProvider, sawGyne, sawInternist, etc. This could be set up as a matrix of questions like you've done for other questions.
  • When designing new surveys or any data collection instruments, I'd very strongly suggest that you include a "no" field for each type of answer to questions such as this, where we would naturally think to write "check all that apply." The reason is that when you enter and analyze the data, you cannot legitimately distinguish between a "no" value and a missing value. All you know is that the person either checked the box or did not. If they didn't check it, we don't know whether it was because their answer was "no" or because they skipped the question or didn't read it.
  • For loc_health_care, is it possible for patients to have received care at more than one of the choices? If not, this one really should be either a radio button or dropdown list. That way you won't be able to accidentally enter two choices.
    • If they could have really gotten care at multiple locations, it would still be better as a series of y/n questions.
  • hosp_admission: set up validation for integer with min and max restrictions. Min is probably 0, and you probably have an idea of a number that would be the max possible (20? 8? 100?). This will prevent you having to go back and check the paper surveys if we find 187 in the database.
    • also variable name could be more informative: numberHospAdmissions? hospAdmissionTimes?
  • surgeries: could be more informative: any_surg, any_surgeries
  • antipsychotics: this isn't a database issue, but this question is about antidepressants mainly. Only one of the six examples given to the patient is an antipsychotic, while the others are antidepressants. "Antipsychotic Medications (for Depression, Anxiety, Post-traumatic Stress Disorder, or other Mood Disorders) such as Elavil, Prozac, Paxil, Zoloft, Navane, Ritalin or others." People analyzing the data itself, without reading the survey will assume that the question referred to antipsychotics, and their output and written summaries will refer to antipsychotics. This individual survey may be relatively short, so that you will probably remember what the question actually referred to, but when you combine it with the other sources of data, or you or the analyst comes back to the study 3 years later it will likely be assumed to refer to antipsychotics.
  • There's no field in the db for age of last menstrual period.
  • There's no field in db for H30, number of births
  • There's no field in db for question H.23. for other form of birth control, and also for the place where the patient can specify the other method.
  • Change order of yes/no to be consistent.
  • Add automatic skip patterns
  • I'm not sure if this information will/might be important or not in the future, but the survey includes a date of onset fields for the questions in B, C, and D, but I don't see them in the redcap database.
  • For the period_meds, the order of the answer choices, Yes/No, is opposite of that in this question on the survey (and also opposite of the order for the other y/n questions).

  • For the second cancers and relapses, there are fields in the survey for date of recurrence or dx for G3 and for G6, but I don't see a field to capture these in the database. I see them now. You probably should decide on one format for entering these dates, since they specify month and year. For example: mm/yyyy or mm-yyyy. Then, it would probably be helpful if you add some text in the "field note" field in the online designer for that question. For example, if you decide to use mm-yyyy, you can add "mm-yyyy" in the "field note" box to remind you what format to use as you're entering data.
  • Not sure if you need this, but there aren't any fields for age at first occurrence for H1-H5.
  • For K3, last_employed, the survey form provides a space for the person to enter their time in weeks, months, or years. Then on the redcap form, there is one field. An idea I have is to choose one of the three units and add a field note saying what units to enter the values in for that variable, and then, as you enter the data, convert whatever the paper survey has into the units you chose. There may be other ways I haven't thought of.
  • I noticed that there aren't any fields for M6-M9.
  • For pregnancies outcome, the answer choices are in the reverse order as they are on the survey. This might make it easy to make mistakes.
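Until min/max validation is set up in the redcap fields themselves, an after-the-fact range check on exported data could look like the sketch below. The limits are hypothetical placeholders (the real ones should come from Nick), and `record_id` stands in for whatever the unique row id ends up being called.

```python
from datetime import date

# Hypothetical limits; the real ones should come from the study team
RULES = {
    "hosp_admission": (0, 20),
    "date": (date(2007, 1, 1), date(2014, 12, 31)),
}

def out_of_range(records, rules=RULES):
    """Return (record_id, field, value) for every non-missing value outside its limits."""
    flagged = []
    for rec in records:
        for field, (lo, hi) in rules.items():
            value = rec.get(field)
            if value is None:
                continue  # missing values are handled separately
            if not (lo <= value <= hi):
                flagged.append((rec["record_id"], field, value))
    return flagged

# Made-up records, for illustration only
records = [
    {"record_id": 1, "hosp_admission": 2, "date": date(2010, 5, 3)},
    {"record_id": 2, "hosp_admission": 187, "date": date(2011, 8, 9)},
    {"record_id": 3, "hosp_admission": None, "date": date(1011, 8, 9)},
]
flagged = out_of_range(records)
```

This only catches values after they are entered; validation in the field definition is still preferable because it stops the typo at entry time.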

JoAnn will :
  • Check on wide vs long format in redcap. Decided to try to just make it a tall file, not using any type of longitudinal module. He will need to NOT identify any variables as identifiers, because it will probably not allow multiple entries per id.
  • Check on best way to get both the data from Children's Oncology Group and the survey data into a usable database. For example, is it better to keep variables in the database? Tatsuki says it's better to just make a separate dbase in redcap and then merge it. Discussed this with Will and met with Nick. We are recommending he ask for csv in utf-8 encoding with some specs for metadata. See email in folder.
  • Best way to do double data entry. Check with Will and possibly Veida about best ways. Will thinks just looking at them is probably good. Or we could do the redcap double entry module.

Send us:
  • Info about the overall big study that Deb is chair of
  • Paper surveys
  • paper in jco
  • protocol

Meeting notes

2015 April 1

  • They are still entering the data into redcap.
  • There will be updates to the aims based on the data they have available.
  • Aim 1 will be to describe the cumulative incidence of late effects and secondary malignancies.
Cumulative incidence is the number of new cases during a specified time period divided by the number of subjects at risk in the population at the beginning of that period.

It may also be approximated by the incidence rate multiplied by the duration of the period; this approximation only holds when the risk over the period is small.
  • Aim 2 will be a subset analysis on patients who were 13 to 18 years old at the time of diagnosis and use the 5 year data.
    • They want to evaluate cognitive function, but the only measure they have is whether they completed high school.
  • The questionnaires should be within a couple weeks of their target times.
  • Inclusion criteria for this aim is having "9 to 12 years" as the highest education completed at baseline. (dx?)

2015 January 29

  • Nick and I met to discuss how to request the data from Children's Oncology Group. We are sending a list of vars we need? but asking for all of them?
  • We will merge by reg number and accession number. The reg number is unique to each patient.
  • We don't need to enter it in redcap.
  • I asked him about making sure the id numbers are in a format consistent with those we already have.

2014 Feb 25 and March 7 Meetings

  • Survey data is completed
  • This is a subset of the stuff we talked about in Feb/March
  • Current data is in paper surveys and on disks

-- JoAnnAlvarez - 05 Mar 2015
Topic revision: r4 - 21 Jan 2016, JoAnnAlvarez
