-- JoAnnAlvarez - 28 Mar 2014

Active Surveillance Project


  • In PCOS data, is there a systematic difference in patients who have complete staging info and those who don't?
  • For the low risk category, it says that unknown stage in ceasar can qualify. Actually, missing stage is one of the exclusionary criteria for this analysis.
  • Missing values of comorbidities in ceasar. Calculating number of comorbidities if a few are missing? I think Dan told me to go ahead and do the imputation I proposed. This proposal is at the end of the active surv pdf.
  • For comorbidities, matt's paper used the six month surveys (in pcos) to define comorbidities. Should we be using the person file? It used the six month survey or imputed if missing with the 12 month survey. "The following indicators were set to yes if the person said he had even been told by a doctor of the condition: stroke, liver, ibd and ulcers. The rest of the indicators were only set to yes if the person answered that he was limited by the condition OR receiving medication for the condition."

Aim 1: To identify differences in the patient and disease characteristics among men with prostate cancer who chose conservative management (watchful waiting (WW) or active surveillance (AS)) from two eras (1990’s and 2010’s). We will compare the observation cohort within each study with respect to patient and disease characteristics. We hypothesize that use of surveillance for prostate cancer will have grown only slightly, but that the patients under surveillance in the later cohort will be younger, healthier, and have lower risk disease, indicating a modest increase in comfort among patients and providers with observing prostate cancer.


  • Aim 1: To identify differences in the patient and disease characteristics among men with prostate cancer who chose surveillance from two eras (1990’s and 2010’s). The goal is to determine the extent to which observation has shifted from a palliative strategy for high-risk prostate cancer patients (WW) to a deferred curative intervention strategy for men with lower-risk disease (AS). A part of Aim 1 is to look at this among low risk patients only.
  • Aim 1b: Some advocate for active surveillance for low risk men under 70.
  • For aim 1, we want to show both: 1) surveillance is used in different proportions in the 2 cohorts (this could be answered with univariate stats), and 2) patients in the new cohort who get AS are younger and healthier.
  • Look within low-risk group/ low-risk, low age, low comorbidity

Inclusion criteria

  • Men 80 and under, but 40 and over
  • PSA < 50 at baseline
  • Clinically localized
    • Only T1 and T2 (clinstage 11-15 in pcos) (Exclude T3)
    • What about patients who are missing stage?
  • Patients with complete data (revised)
    • The purpose of this requirement is just so they have the applicable variables.
    • CEASAR - must have baseline, 6-month, medical chart abst (do not need 12-month survey)
    • PCOS - 3718 with complete data. What data collections?
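
The criteria above can be sketched in R as a single pass that records why each patient is excluded, which also gives the counts of exclusions by reason. This is only an illustration under stated assumptions, not the project's code: the data frame `combined`, the column names `age` and `psa`, the example values, and the choice to exclude patients with missing stage are all hypothetical.

```r
# Hypothetical example data; column names and values are assumptions.
combined <- data.frame(
  id      = 1:6,
  age     = c(39, 45, 62, 81, 70, 58),
  psa     = c(4.2, 55.0, 8.1, 6.0, 12.3, 9.9),
  clinstg = c(11, 12, 15, 11, 21, NA)  # 11-15 = T1/T2 in PCOS; 21 and up excluded
)

# Return the first exclusion reason that applies to each row (NA = included).
exclusion_reason <- function(d) {
  reason <- rep(NA_character_, nrow(d))
  reason[is.na(reason) & (is.na(d$age) | d$age < 40 | d$age > 80)] <- "age outside 40-80"
  reason[is.na(reason) & (is.na(d$psa) | d$psa >= 50)] <- "PSA >= 50 or missing"
  reason[is.na(reason) & is.na(d$clinstg)] <- "missing stage"
  reason[is.na(reason) & !(d$clinstg %in% 11:15)] <- "not T1/T2"
  reason
}

combined$excluded_for <- exclusion_reason(combined)
analysis_set <- combined[is.na(combined$excluded_for), ]

# Counts of exclusions by reason (NA row = patients kept)
table(combined$excluded_for, useNA = "ifany")
```

Assigning reasons in a fixed order means each patient is counted under only one reason, so the counts add up to the number excluded.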

Data: ceasar and pcos

  • We need a combined data set with the PCOS and CEASAR data for analysis.
  • We did some preliminary analyses about a year ago, but now we are using more formal CEASAR variable definitions and we have more complete data. Previously, we had planned to use the combined data set (PCOS/Ceasar) that Sharon made for Matt's project. However, there are some variables that we need that aren't in it. (depression, passivity scale PDHCO, other?) Also, there are no exclusionary criteria applied. So I think that we will end up needing to re-do the data pulls from PCOS, data pull from ceasar, and the merge.
  • We need to select patients for the combined data set that form a group of comparable patients. (same inclusion criteria for both cohorts)
  • Alisha and Matt have previously combined these two cohorts for analysis.
  • In re-doing the pcos data creation code, we are not spending time aligning secondary treatments and other variables that are not relevant for the current project for now. For the AS project, we are only concerned with data from the first year.
  • For almost all variables, we will use the same variables that Sharon used and same way to calculate the quantities that Sharon used for Matt's paper.
  • However, for treatment, we need to use something based on treatment or tx.derived (in ceasar), since it has much more info.
  • There is a spreadsheet that tracks the combining of the two datasets. I will build on it as I go.
  • In PCOS, the "six month" data is considered baseline.

Outcome: primary treatment

  • Should be the treatment they got first.
  • Treatment considered surveillance/observation/no treatment if there is no evidence in our data that they got any tx other than observation/surveillance.
  • CEASAR: Use pdm.tx.Derived, which was created by categorizing tx.Derived. (TK copied the code into the )
  • PCOS: Use variable TRTMENT (person file). (For Matt's study, they used primtrt: primtrt = 1 for observation.) Here's how to categorize:
active surveillance     1 = Watchful waiting
surgery                 2 = Rad prost + XRT + hormone
surgery                 3 = Rad prost + XRT
radiation               4 = XRT + prost
surgery                 5 = Rad prost + XRT - unable to sequence
surgery                 6 = Rad prost + horm
surgery                 7 = Hormone + rad prost
surgery                 8 = Rad prost + hormone - unable to sequence
radiation               9 = XRT + hormone
radiation              10 = hormone + XRT
radiation              11 = XRT + hormone - unable to sequence
surgery                12 = Rad prost only
radiation              13 = XRT only
hormone                14 = Hormone only
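
The mapping above could be applied with a simple lookup vector. This is a minimal sketch: the object and function names are hypothetical, and code 1 is labeled "observation" here rather than "active surveillance", following the later meeting decision to rename that level.

```r
# Category for each TRTMENT code 1-14, in code order (from the list above).
trt_category <- c(
  "observation",                                # 1  Watchful waiting
  "surgery", "surgery",                         # 2, 3
  "radiation",                                  # 4  XRT + prost
  "surgery", "surgery", "surgery", "surgery",   # 5-8
  "radiation", "radiation", "radiation",        # 9-11
  "surgery",                                    # 12 Rad prost only
  "radiation",                                  # 13 XRT only
  "hormone"                                     # 14 Hormone only
)

# match() sends any value outside 1-14 (including NA and 0) to NA
# instead of silently dropping or misaligning it.
categorize_trtment <- function(trtment) {
  factor(trt_category[match(trtment, 1:14)],
         levels = c("observation", "surgery", "radiation", "hormone"))
}

categorize_trtment(c(1, 4, 12, 14, NA, 0))
```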

Exposures of interest:

  • Age at diagnosis (interact with cohort). Interest in the following groups: 40-59, 60-69, 70 +
  • Comorbidity (interact with cohort)
  • Cohort (ceasar/pcos)
  • Risk group (interact with cohort)
    • indicates chance of recurrence after treatment. We may call it modified D'Amico risk. This is orthogonal to comorbidity.
    • The D'Amico score combines clinical stage, Gleason, and PSA. It attempts to characterize risk of recurrence after tx. The D'Amico score is not available in PCOS, so we are making a modified version.
    • One form of this variable will have three categories: low, medium, and high. There will also be a very low risk group, which will be contained in the low group. This should be a separate variable. These definitions are kind of complicated, and they are in a separate table. There are different rules about how missing values of the criteria variables are handled.
    • Very low risk (binary): PSA <= 10 and Gleason 6 and (nonmissing) T1 (clinstg = 11 in PCOS, all T1 in Ceasar (T1a, T1b, T1c))
    • Low/Intermediate/High Risk variable:
      • Low risk: PSA <= 10 and Gleason 6 and either T1 or T2 or missing stage. (Actually, missing stage is one of the exclusionary criteria for this analysis.) So...
      • Intermediate: 10 < PSA <= 20, or Gleason 7 or T2b (last one is for ceasar patients)
      • High: PSA > 20 or Gleason 8-10 or stage T2c.
    • Must have all three low risk criteria to meet the low risk category. Any of the high risk criteria automatically qualifies you for the high risk category.
    • We could have defined risk with just one risk variable with four categories including low and very low. We are not doing that because there are people we are sure that are low, but we are not sure whether they qualify for very low or not. This is because the exact stage is unavailable for many of the pcos patients.
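
A rough sketch of the two risk variables described above follows. The real rules for missing values live in a separate table that is not reproduced here, so the handling below is an assumption: a criterion that cannot be evaluated is treated as not met for the three-level variable, and very low risk is left missing when it cannot be determined. The function names, the stage labels (including "T1/T2" for stage known to be T1 or T2 but not which), and reading "Gleason 6" as `gleason <= 6` are also assumptions.

```r
# Three-level modified D'Amico risk (assumed handling of missing values).
modified_damico <- function(psa, gleason, stage) {
  # (x) %in% TRUE turns NA (criterion cannot be evaluated) into FALSE
  high  <- (psa > 20) %in% TRUE | (gleason >= 8) %in% TRUE | stage %in% "T2c"
  inter <- (psa > 10 & psa <= 20) %in% TRUE | (gleason == 7) %in% TRUE |
    stage %in% "T2b"
  # Low requires all three criteria; missing stage qualifies, as written above
  low   <- (psa <= 10) %in% TRUE & (gleason <= 6) %in% TRUE &
    !(stage %in% c("T2b", "T2c"))
  risk <- ifelse(high, "high",
          ifelse(inter, "intermediate",
          ifelse(low, "low", NA_character_)))
  factor(risk, levels = c("low", "intermediate", "high"))
}

# Very low risk: TRUE/FALSE when known, NA when exact stage is unavailable.
very_low_risk <- function(psa, gleason, stage) {
  is_t1 <- ifelse(is.na(stage) | stage == "T1/T2", NA, grepl("^T1", stage))
  psa <= 10 & gleason <= 6 & is_t1
}

# Invented example values
psa     <- c(5, 5, 15, 25, 5, NA)
gleason <- c(6, 6, 6, 6, 7, 6)
stage   <- c("T1c", "T1/T2", "T2", "T1c", "T1c", "T1c")
data.frame(psa, gleason, stage,
           risk = modified_damico(psa, gleason, stage),
           very_low = very_low_risk(psa, gleason, stage))
```

Keeping very low risk as a separate variable with its own missing values reflects the point above: some patients are known to be low risk while their very low risk status is unknown.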

Covariates / variables

  • Race
    • Ceasar: race.Derived: from vand.race.f6 and imputed with race.hisp from registry if missing
    • PCOS: bestrace, from sample.csv from person file. Matt's paper used baseline/6month q 53 and 54
    • Matt's paper used black, white, hispanic, and other in the table
  • Overall qol (baseline): QOL is bladder/bowel/sexual function at baseline. We think there is a difference in reporting in the two cohorts. The newer cohort is more likely to give a true report of qol than the older one. Should we use z-scores (relative to the patient's cohort) for these measures?
    • Decided to use z-scores calculated on EPIC in ceasar and PCI in pcos. As a sensitivity analysis or for answers to reviewers' concerns, we could use the (unnormalized score) calculated on the common items (the way Matt did it).
  • Stage
    • Do not lump "T1/T2" (in pcos) in with other stages. Make a new stage var.
  • Gleason
    • We might use separate algorithms for the different cohorts if we use grade as a substitute for missing gleason in PCOS patients. Otherwise, we want to use the same definition for both cohorts.
  • Comorbidity
    • Calculate number of comorbidities based on the ones in the person data file. (pcos)
    • Number of comorbidities in model: 0, 1, and 2+. For the subset model among only the "Eligible for treatment group" low risk patients, we will use 0 comorbidities or hypertension only.
    • Dan wants to have the individual comorbidity variables in the combined data in case we need them, in addition to the composite
    • For modeling, we think either 0 vs 1 vs 2+, or 0/1 vs 2+, or 0/1/2 vs 3+. Need to ask Dan.
  • depression (actually, for these 2 vars, we may only need them for Ceasar) (not using for aim 1)
  • passivity scale (PDHCO) (not using for aim 1)
  • site: want to control for site. Need to revisit in meeting
  • Education (use the variables previously used by Sharon for Matt)
  • Marital status
    • PCOS: maritalStatus, Baseline/6mo – q56. If missing, used q61 from 12 month
    • ceasar: Marital.status.Derived, from bg.relationship.F06
  • Income (use the variables previously used by Sharon for Matt). They adjusted for inflation.
  • Insurance (use the variables previously used by Sharon for Matt)
    • The way the insurances were previously categorized did not fit the categorizations of ceasar or Matt's paper. I couldn't find code that made any sense, so I made my own code:
   # Insurance
   # Note: each assignment ends with a comma because it is an argument
   # inside a larger call (not shown).
   ## This is Frank's code that they used earlier.
   # If response is yes to A8B (Medicare), code as "Medicare".
   # Among remaining: if response is yes to A8D (private), A8E (HMO), or A8F (VA/military), code as "Private or military".
   # Among remaining: if response is yes to A8C (Medicaid or other public) or A8G (other), code as "Medicaid or other".
   # Among remaining: if response is yes to A8A, code as "No insurance".
   # Everyone else should be missing.
   insuranceFrank = factor(ifelse(a8b == 1, "Medicare",
      ifelse(a8d == 1 | a8e == 1 | a8f == 1, "Private or military",
      ifelse(a8c == 1 | a8g == 1, "Medicaid or other",
      ifelse(a8a == 1, "No insurance", NA_character_))))),
   # Separate private, military, and other
   insuranceNew = factor(ifelse(a8b == 1, "Medicare",
      ifelse(a8d == 1 | a8e == 1, "Private",
      ifelse(a8f == 1, "VA/Military",
      ifelse(a8c == 1, "Medicaid",
      ifelse(a8g == 1, "Other",
      ifelse(a8a == 1, "No insurance", NA_character_))))))),

   # Per Penson: first check if they said "no insurance." Then look at the variable insure.
   # as.character() keeps the labels if insure is stored as a factor;
   # ifelse() would otherwise return its integer codes.
   insuranceOld = factor(ifelse(a8a == 1, "No insurance", as.character(insure))),


Main model: tx ~ cohort + cohort*(age + risk + number of comorbidities) + other covariates + residual psa (lowest psa in the risk group?) + residual age spline?

Subset model: tx ~ cohort + cohort*(age + psa + number of comorbidities) + other covariates + residual age spline? among only 0 or 1 comorbidity, young, and low risk (and again in very low risk).
  • Employment: leave as four categories.
  • Overall health: have more categories: combine fair and poor.
  • Education: less than college (high school or less), college or some college, advanced degree? Or see what we did earlier.
  • Insurance: Other, medicaid and none should be a group. Private; medicare; VA/Military. (4 categories)
  • Outcome of interest: treatment (observation vs. definitive local therapy)
  • Model covariates (in addition to exposures of interest): race, baseline urinary function, baseline sexual function, baseline overall QOL, marital status, education, income, insurance, site. We will add terms for residual age and residual psa.
  • Sensitivity analyses/ other iterations
    • Only including sites that were in both ceasar and pcos, which are Utah, Los Angeles, and Atlanta (Emory).
    • Among only those eligible for any of the therapies (except hormones): young age (70 and under), few comorbs (0 or 1), low risk (low and very low). Do this also omitting people who only had hormones and no other treatment.
    • Omitting people who only had hormones and no other treatment
  • This should be a list of all the models we need:
    • main model
    • main model on subset on common study sites
    • main model on subset excluding people who only had hormones and no other treatment
    • low risk, young, few comorbs (Same model (except without risk variable))
    • very low risk, young, few comorbs (Same model (except without risk variable))
    • low risk, young, few comorbs on subset excluding people who only had hormones and no other treatment(Same model (except without risk variable))
    • very low risk, young, few comorbs on subset excluding people who only had hormones and no other treatment (Same model (except without risk variable))
  • We want to account for small differences in psa that are not captured by the risk score. One option would be to put PSA in the model in addition to risk stratum. Add psa as a separate covar.
  • Ways we can present the model results in a figure. Dan says he prefers predicted probabilities rather than odds or odds ratios because of their interpretability. Then said to use odds ratios to reduce the numbers of curves on figures.
  • Ways to present the model results: a forest plot with ORs of AS for age (eg, 60 compared to 75), comorbidities, PSA, and gleason, both ceasar and pcos
  • We are modeling under the assumption that PSA, gleason, comorbidities, and age do not interact with each other. We know that this is an oversimplification.
  • Clinically, Dan thinks there may be interplay between age, risk, and treatment (an interaction). Avoid 3-way interactions: while technically correct, they are too complex.
  • Discussed centering number of comorbidities by cohort mean, but we are considering these differences between the cohorts as real differences (as opposed to erroneous data (eg. incorrect reporting)), and thus not going to adjust.
  • Options for modeling comorbidities in regression: either number of comorbidities or make a composite heart problems (the three variables: angina, heart attack, ?), and a stroke indicator. We could also combine stroke with the heart variable, and it would be an indicator of cardiovascular problems.
  • Correlation due to site: We want to account for variability due to geographic location. This is not something we would care to estimate. Should we use a random effect for seer site? (Make sure the var for this in pcos is a factor, not numeric.)
  • Unadjusted output requested:
    • Get freqs of low risk patients and very low risk patients (1) 70 years old and under and (2) with 0-1 comorbidities. Also by cohort.(?)
    • Bivariate comorbidity association table with treatment choice. (?)
    • Figure. 9-panel figure based on regression model.
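
As a rough illustration of the main model form and of the predicted probabilities Dan prefers, the sketch below fits a logistic regression with cohort interactions on simulated data and predicts the probability of observation at ages 55, 65, and 75. Everything here is an assumption made for illustration: the data are simulated (not PCOS or CEASAR data), the column names are hypothetical, and the other covariates, site, and residual terms are left out.

```r
set.seed(1)
n <- 2000

# Simulated stand-in for the combined data set (values are not real).
sim <- data.frame(
  cohort = factor(sample(c("pcos", "ceasar"), n, replace = TRUE),
                  levels = c("pcos", "ceasar")),
  age    = round(runif(n, 40, 80)),
  risk   = factor(sample(c("low", "intermediate", "high"), n, replace = TRUE),
                  levels = c("low", "intermediate", "high")),
  comorb = factor(sample(c("0", "1", "2+"), n, replace = TRUE),
                  levels = c("0", "1", "2+"))
)

# Simulated outcome: 1 = observation, 0 = definitive local therapy
lp <- -3 + 0.04 * (sim$age - 60) - 0.8 * (sim$risk != "low") +
  0.5 * (sim$cohort == "ceasar")
sim$observation <- rbinom(n, 1, plogis(lp))

# cohort * (age + risk + comorb) expands to main effects plus
# cohort:age, cohort:risk, and cohort:comorb interactions.
fit <- glm(observation ~ cohort * (age + risk + comorb),
           family = binomial, data = sim)

# Predicted probability of observation on a cohort x age x risk grid,
# holding comorbidities at 0.
newdat <- expand.grid(
  cohort = factor(levels(sim$cohort), levels = levels(sim$cohort)),
  age    = c(55, 65, 75),
  risk   = factor(levels(sim$risk), levels = levels(sim$risk)),
  comorb = factor("0", levels = levels(sim$comorb))
)
newdat$p_obs <- predict(fit, newdata = newdat, type = "response")
newdat
```

The grid has one row per cohort, age, and risk combination, which is the kind of table a 9-panel figure (3 ages by 3 risk groups, one curve or bar per cohort) could be drawn from.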


  • looking at the unadjusted counts, we see that there is more sensible use of observation in the current era (ceasar)

Answered questions

  • For the calculation of the functional status variables, in PCOS, Frank's code is using the six month data. Is that correct? Yes. (The algorithms described in the documentation say we will use the baseline data. Are these available at baseline?) Dan emailed Matt and Dave about this. For pcos, six month is the baseline.

Completed (This may refer to stuff I did for the code to make the pcos data.)

  • Table 1. Sociodemographic characteristics (age, race, charlson) of PCOS and CEASAR cohorts (overall, among patients meeting inclusion criteria)
  • Table 2. Clinical characteristics of PCOS and CEASAR cohorts including PSA, clinical T stage, biopsy gleason, urinary function, sexual function, overall qol (SF36 single item)
  • Table 3. Treatment choice IN EACH COHORT by modified D'Amico risk stratum (unadjusted, including all treatment options - radiation, surgery, primary hormones, surveillance)
  • Table 4. Characteristics (Age, Comorb, PSA, Gleason, Clin stage) of surveillance patients in each cohort.
  • Stage
    • use clinstg in abstract
    • values of 13, 14, and 15 mean that the stage is known to be 1 or 2, but there is not enough information to determine which
    • exclusionary criteria: exclude 21 and greater
    • We will use this var for the purpose of defining a very low risk group, whose definition includes those with clinstg = 11.
    • We will not use stage in models for AS project because of the high amount of missing stage in pcos patients.
    • If I make a table for stage, those with 13, 14, and 15 will be missing. Make sure we are not recoding them to stage 1 or 2.
  • Gleason score
    • use d5_rec in abstract. (p 33 or 25 of the data dictionary)
    • For those that have a missing value (including 0 or 66, 99), impute using WHO grade, which is seergrde in person.csv using the following mapping:
      • 1 -> <6
      • 2 -> 7
      • 3 or 4 -> 8-10
    • Exclusionary criteria: exclude those with 0 or 66, but first impute using WHO grade
  • Treatment
    • trtment from person file (sample.csv) pulls together data from 4 different sources. This is a six-month observation.
    • primtrt in abstract.csv is from the medical chart and doesn't include any patient report. This is a six-month observation.
    • Group into the following categories: no treatment, active treatment, hormone only. (check)
    • We will use trtment. Impute missing values with primtrt1. (check)
  • PSA
    • use b2 from abstract
    • be sure to do recode according to data dic. (8888.8 and 9999.9 should be set to missing).
    • This is one of the exclusionary criteria (exclude over 50)
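
The PCOS recodes listed above can be sketched as follows. The variable names `b2`, `d5_rec`, `seergrde`, and `clinstg` come from the notes; the data frame `pcos`, the derived column names, and the example values are hypothetical, and the Gleason categories are kept exactly as written in the mapping above.

```r
# Hypothetical PCOS extract; values invented for illustration.
pcos <- data.frame(
  b2       = c(6.4, 8888.8, 9999.9, 12.0),  # PSA from abstract
  d5_rec   = c(7, 0, 99, 66),               # Gleason; 0, 66, 99 = missing codes
  seergrde = c(2, 3, NA, 1),                # WHO grade from person.csv
  clinstg  = c(11, 13, 12, 15)              # clinical stage from abstract
)

# PSA: special codes 8888.8 and 9999.9 become missing
pcos$psa <- ifelse(pcos$b2 %in% c(8888.8, 9999.9), NA, pcos$b2)

# Gleason: where missing, impute the category from WHO grade
grade_map <- c("1" = "<6", "2" = "7", "3" = "8-10", "4" = "8-10")
gleason_missing <- is.na(pcos$d5_rec) | pcos$d5_rec %in% c(0, 66, 99)
pcos$gleason <- ifelse(gleason_missing,
                       unname(grade_map[as.character(pcos$seergrde)]),
                       as.character(pcos$d5_rec))
pcos$gleason_imputed <- gleason_missing & !is.na(pcos$gleason)

# Stage for tables: 13-15 (T1 or T2, not known which) stay missing
# rather than being recoded to stage 1 or 2.
pcos$stage_for_table <- ifelse(pcos$clinstg %in% 13:15, NA, pcos$clinstg)

pcos
```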

Aim 2: Participatory decision-making: To determine the extent to which a malleable factor, involvement of the patient in medical decision-making (so-called ‘participatory’ or ‘shared’ decision-making), influenced patients to select AS in the modern CEASAR cohort, using available data from the baseline survey. We hypothesize that patient decisions to select AS will have been influenced by their level of involvement in participatory decision-making, controlling for components of the clinical scenario (age, comorbidities, clinical disease characteristics, and baseline urinary and sexual function.)

Aim 2 is to look at participatory decision making and tx choice in the ceasar group only. Sub-aim 2a is pdm as an outcome with race as the exposure.
  • 2a: race predicts pdm (Shreus is on this one)
  • 2b: pdm predicts treatment (Shreus is on this one)
  • Aim 2 is on Ceasar data only, not on combined data.
  • Sharon is working on this one.
  • My question for the participatory decision making study: are we going to include an interaction in the model for patients we think should go with AS?
  • Want to control for demographics (insurance, marital status), baseline function, psychosocial measures (depression, passivity scale (PDHCO)), comorbidities, risk. Eventually want to know if all these variables are confounders
  • looked at conceptual model for pdm and other predictors of tx type and then tx type predicting long term patient outcomes (EPIC
  • There will be a series of models with the following subsets: all, low risk only, everyone except hormone only patients, possibly by race.
  • Analysis
    • Model: Sharon will look at a correlation matrix for the independent variables in the model to aid in possibly paring down the model

Propensity score

  • Sherry wants to make a handful of composite variables and adjust for them. What is the goal of this? Could be just a personal preference.
  • Dan says that part of the drawback of the composite variables is that you do not get the interpretation of the model that you would get with

Data change log

  • 2015-04-03 I changed some tx assignments in the ceasar data. (See notes on ceasar data.) The effects on this cohort were that 12 patients were added to the observation group, two were removed from the surgery group, and 10 were removed from the radiation group.

Meeting notes

2015 April 20

  • For Active Surveillance, Dan B was saying that what John was talking about was a washing out of the difference in the effect of age by cohort(?), and that this should be explained in the text.
  • Which interaction was it?
risk by cohort age by cohort is getting washed out by the difference in how we use risk in the cohorts
  • What should the strategy regarding this be? We decided to explain it in a line or so of text within the results.
  • mmm. How do we know this anyway?
Should the explanation be in the discussion?

2015 April 13

  • John said the paper is focusing on two questions: (1) how are risk factors currently (Ceasar) being used to inform the choice of observation as a prostate cancer treatment, and (2) how has that changed from their role 10 years ago?
  • Finish table 3. Ceasar in numerator. Fill in the column that Dan wants with the p-value for interactions. Check whether the interaction term est and (wald?) pval are identical to the interaction term, and if not, why and what is it??
  • Get CIs for table 3
  • Fill in highlights in manuscript with numbers that they requested.

2015 January 30

  • The main conclusions are "we are using AS judiciously with regard to important clinical factors (age and risk group)"
    • Other "significant" results are site,
    • Comorbidities were not significant, so we need to interpret this
  • There is a draft

  • Make sure my tables in the report are formatted the same as he outlined.
    • Eg, match the categories.
    • We are asking that John fill in/format the tables in word himself.
  • Table 3 is like the ones on page 14. treatment by cohort columns are cohort ( ceasar and pcos ), stratified by risk. Kind of like stacking those tables on page 14.
Table 4: Multiple linear regression model for likelihood to elect observation (Entire cohort model)
    i. Included covariates in table: cohort, age, risk of recurrence, comorbidity
    ii. Will note in caption: “Adjusted for site, race, income, marital status, education, employment, sexual and urinary domain scores, insurance status, overall health”
    iii. Will note significant interaction terms cohort*age, cohort*risk in the text.
    iv. Joann/Totsuki: what is the simplest way to put this?

  • Table 4 seems to be communicating similar info as figure 1. But the fig has probabilities and the table has ORs.
  • The description of table 4 that John made had a description of the info they want to convey, and the mock up table that dan sent is his idea for doing this.
  • The interactions are between cohort and age, cohort and comorbs, and cohort and risk.
  • we kind of like the table dan sent.
  • Email John saying that there is a section about
  • there is a remaining question about getting numbers about psa draws and prostate biopsies post-dx. We have counts/info for the first two items, but he wants "Obtain men who transitioned to local therapy?"

2014 October 21

  • For age, do not use residual age and also age category. We will just use either a linear term or a spline.
  • We will then make the 9-panel figure just picking one age from each of the categories 40-59, 60-69, and 70-79: 55, 65, and 75.
  • We need to put out an abstract to AUA. Get John a couple of sentences about the methods.
  • We may later revisit the idea of subset model with psa as a separate linear term. (Right now, we are only using psa in the risk variable and not taking into account the differences within bins)

2014 October 8

  • For all the psa draws: treat all the 999, 9999, etc as no. The missings are no. Other questions: John.
  • Check report for tables that are cut off.
  • Discussed whether we really need to refit the models on low risk patients
  • Dan doesn't think we have info on the dates of prostate bx in ceasar.
  • Do the comorb imputation.
  • add residual age and residual PSA to models
  • make the 9-panel graphs we talked about
  • fit a confirmatory model in very low risk (or just re-run model, subdividing LR into VLR and not VLR)

2014 September 17

  • Another of their papers
  • Need to address how to model comorbidities: 0 vs 1 vs 2+, or 0/1 vs 2+, or 0/1/2 vs 3+??? This will inform the way we handle 1 or two missing items from the comorbs. In model: 0, 1, and 2+. For the subset model among only the "Eligible for treatment group" low risk patients, we will use 0 comorbidities or hypertension only.
  • Address how to address site variability

2014 September 3

  • How to model comorbs: TK and JA talked about this. We don't favor a linear term or using a spline. We think either 0 vs 1 vs 2+, or 0/1 vs 2+, or 0/1/2 vs 3+. Need to ask Dan.
  • work on models before getting the counts of psa draws, etc.
  • discussed how to deal with patients who are missing a few of the 8 comorbidities and thus have missing value of number of comorbs: decided to email Matt and Sharon asking what they did. Possible hot deck imputation. Could use info from either that patient or info from the whole distribution. This is partially dependent on the way we choose to model number of comorbs. Matt says he didn't do imputation but instead used individual types of comorbs.
  • need to revisit idea of proportion of variability due to site: defer discussion/decision to later meeting.
  • We will not use pdhco or depression for this study. Update notes in report, etc.
  • I've calculated any hormone tx correctly, however, that is not of interest for this study. What we need is an indicator of hormone only use. This is for the purpose of the secondary analysis excluding patients who only got hormone therapy.
  • Wants to submit to AUA with model results at beginning of November.
  • JA to do:
    • check dist of bl functional status in Matt's paper against those I've calculated.
    • work on modeling
    • make table of number of missing comorbidities by number of comorbidities among the answered (nonmissing) items. (group number of comorbs to 0, 1, or 2+)
    • later work on those other things Dan asked about: psa draws and prostate biopsies
    • email Matt and Sharon about how they handled comorbs
    • check out the error on page 12 of active Surv report.
    • Investigate the distributions/whatnot of the functional status z scores.
    • update notes re: use of pdhco and depression in models, etc.
    • Calculate indicator of only hormone tx. Take any hormone use out of tables and replace with hormone only. (Don't need to replace in the tx by cohort tables.) Update Ns reported at top of report.
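
The table of missing comorbidity items by number of comorbidities among the answered items (from the to-do list above) might be built along these lines. This is a sketch with invented data: only three of the eight comorbidity items are shown, and the column names are hypothetical.

```r
# Hypothetical comorbidity indicators (1 = yes, 0 = no, NA = not answered).
comorb <- data.frame(
  diabetes     = c(1, 0, NA, 0),
  hypertension = c(1, 0, 1, NA),
  stroke       = c(0, 0, 1, NA)
)

# Number of unanswered items per patient
n_missing <- rowSums(is.na(comorb))

# Number of comorbidities among the answered (nonmissing) items
n_yes <- rowSums(comorb == 1, na.rm = TRUE)

# Group the count to 0, 1, or 2+
n_yes_grp <- cut(n_yes, breaks = c(-Inf, 0, 1, Inf), labels = c("0", "1", "2+"))

tab <- table(n_missing, n_yes_grp)
tab
```

A patient who already has 2+ among the answered items stays in the 2+ group whatever the missing items are, so this table shows how many patients an imputation could actually move between groups.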

2014 August 25

  • Talked about whether it's necessary to include a propensity score for cohort. Convinced Dan that we don't. There are unmeasured differences between cohorts, but (1) we may not have all the vars necessary to capture the differences, and (2) to answer the research question, which is differences in AS patients in the different eras, we shouldn't adjust for time.
  • need to revisit idea of proportion of variability due to site
  • For the functional status covariate, we will use the z scores on EPIC/PCI. As a sensitivity analysis, we can use the (unnormalized score) calculated on the common items (the way Matt did it.)
  • looking at the unadjusted counts, we see that there is more sensible use of observation in the current era (ceasar)
  • Watchful waiting is different from active surveillance. Both are considered observation.
  • To dos:
    • Take out table of demographics by cohort among observation patients only
    • Add new table of tx choice by cohort among low risk and under 70 and comorbidity 0 or 1_added to code._
    • Check whether numbers of missing comorbidities in ceasar are due to patients missing the whole 6 month survey
    • Get counts of prostate biopsy among active surveillance patients. Use mca.prostate.biopsy.
    • Among ceasar patients, find number of PSA's drawn after diagnosis. The vars are mca.psa2.yn and mca.fu.psa1, (?? Need to look at redcap form.)

2014 August 18

  • Talked a little about modeling psa. Ask TK if there are problems with just putting psa itself in.

2014 August 6

  • Discussed modeling form and iterations/sensitivity analyses
  • Will do a subset model on low risk and under 70. Only subset analysis is omitting ADT.
  • Talked about modeling age and psa.

2014 July 23

  • Sharon will make a few changes to the dd for the three year surveys for purposes of sending to capsure. We will check with Eden to see if
  • The measure in pcos of urinary function should align (loosely) with the epic incontinence. We will ignore irritative for now.
  • UCLA PCI is the name of the instrument used in pcos for functional status
  • Discussed the way the risk strata variable is defined: Very low is contained in low. The sets High, Intermediate, and Low form a partition of the sample space. Patients who are known to be very low will be used in subsets (in a table on the subsets or in a regression on the subset of known very low.) The high/intermediate/low will be used for display in the tables and possibly as a regression covariate in some models.

2014 July 9

  • Did some review of progress.

2014 June 11

  • Looked at table with exclusion criteria.
  • Dan wants to see numbers excluded for the different reasons by cohort.
  • Change the levels of the treatment variable so that "active surveillance" is replaced by "observation."
  • Discussed the functional status variables. These will be used as covariates in the model for adjustment.
    • The functional status vars in PCOS are from an instrument preceding the EPIC. There are some items that both measures have in common.
    • We discussed ways to align these for comparison. Here are some options:
      • Just align the two sets of three summary scores and compare them as they are, even though they measure slightly different things.
      • Align them as Matt did for his previous paper. He extracted only the questions that the two scales had in common and then scored them.
      • Use Z scores that are normed for each cohort. This would involve centering and scaling by the cohort mean and standard deviation, respectively. This is attractive since we think that the reporting truthfulness varies by cohort/era.
    • Dan prefers the Z score method. I do also. We may end up having to do one of the other methods if the reviewers request it.
  • Discussed how to handle missing values of the individual comorbidities. Dan wants to do whatever Matt did. He emailed him about it.
  • After we get numbers of missings by cohort, send revised email to Matt and co.
  • I noticed that there's something weird about the comorbidity count variable. It doesn't sum to 8. Check this.
  • Calculate risk strata. See newer .doc file.
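
The cohort-normed Z score option discussed above can be sketched in base R as below. The data frame `fs`, the column name `urinary`, and the scores are invented for illustration; in practice the score would be EPIC-based in ceasar and PCI-based in pcos.

```r
# Hypothetical baseline functional status scores by cohort.
fs <- data.frame(
  cohort  = rep(c("pcos", "ceasar"), each = 4),
  urinary = c(60, 70, 80, 90, 75, 85, NA, 95)
)

# Center and scale by the mean and standard deviation of the values supplied
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# ave() applies zscore() within each cohort and returns the results
# in the original row order; missing scores stay missing.
fs$urinary_z <- ave(fs$urinary, fs$cohort, FUN = zscore)

fs
```

After this step each cohort has mean 0 and standard deviation 1 on the new scale, so the covariate expresses a patient's function relative to his own cohort rather than on the instrument's raw scale.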

2014 May 14

  • Discussed which variable for treatment in pcos to use. 4 = XRT + prost should be radiation.
  • Also discussed how to use the baseline functional status. Dan is going to send a note to the group. Will document.
  • Instead of spending time aligning the baseline functional status variables, we are going to think about this.
  • Focus on the tables and aligning the vars, including comorbidities, overall functional status, and risk definitions.

2014 April 30

  • Treatment variable: for ceasar primary analysis, they are excluding some patients. Pdm is including everyone. Aim 1 of active surveillance

2014 April 16

  • Dan talked about wanting parts of the model for the larger ceasar aim 1 to line up with the pdm predicting tx choice model. The demographics will be comorbidities (tibicat), disease chars (gleason, psa, clin stage), qol/function (urinary incontinence, bowel, sexual function, hormonal (all epics), sf36 mental and physical), psychosocial (ps? ps worry, cesd, social support), provider chars (pdm), and site
  • Dan was saying that conceptually, the propensity model for the main ceasar mod is/should be the same as the model that we're using for the pdm -> tx. I added the point that one difference could be that you could have a very 'saturated' model for the prop score model, meaning extreme flexibility in terms of nonlinearity and interactions, since the goal is only prediction, and less on validity, whereas in the pdm model, the estimation of the coefficients and their standard errors are more important, so you would model them more parsimoniously, only including the flexibility that one would expect and want to spend the df on. In light of this, Dan wants to use the prop model to start with and will probably want to combine some categories.

Topic revision: r57 - 19 Jan 2017, JoAnnAlvarez
