Difference: ClinicAnalyses (1 vs. 490)

Revision 490
Changes from r470 to r490
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Line: 6 to 6
 

Added:
>
>

2017 December 21

Paul Slocum, OB/GYN Fellow

  • "The project is evaluating two groups of patients. A group with intrinsic sphincter deficiency (ISD - as defined by urodynamic parameters) and a group without ISD. We are investigating to see if patients with ISD have an increased rate of urgency urinary incontinence resolution after a midurethral sling when compared to those without ISD. All patients have mixed urinary incontinence (stress and urge incontinence together) and subsequently had a midurethral sling for the stress component. Would like to have some help with analyzing some data that I have collected. Data is clean and in Stata already."

2017 December 14

NO CLINIC: Department Meeting

2017 December 7

Maria Powell, Postdoctoral Fellow, Otolaryngology & Lea Sayce, Senior Research Specialist

  • "The purpose of this study is to investigate the effectiveness of the most common treatment approaches for phonotraumatic benign vocal fold lesions. Additionally, we are interested in the barriers to access and utilization of services from a tertiary care voice clinic. Our aims are: AIM I: We will evaluate the treatment history of patients with phonotraumatic lesions prior to referral to a tertiary care voice clinic in order to develop a comprehensive understanding of common community based treatment approaches, health care utilization, and associated costs of treatment in this patient population. AIM II: We will determine the quality of life impact of patients with phonotraumatic lesions at initial study center presentation (to establish baseline) and approximately 6 months later (to measure pre to post treatment specific changes in quality of life). AIM III: We will perform a multivariate analysis of factors influencing responses to: 1) voice therapy alone versus response to 2) surgical treatment."
  • "Questions to be addressed: 1) Are the proposed stats appropriate for our dataset? 2) We have collected data from 53 of the proposed 150 patients. We would like to determine if our study is adequately powered with this smaller number, and if not, what our target enrollment should be based on our preliminary data."
  • Recruited eligible patients from clinics at three different sites. Patients completed baseline and follow-up (~6 months) REDCap surveys. Patients were emailed three times prior to considering them lost to follow-up. Have complete data for 48 patients collected from surveys or EHR; primary outcome is VHI quality of life score. Recorded whether a patient uses their voice professionally (high voice user and/or singer). The treatment type and number of voice therapy sessions are determined by physician recommendation, so it is possible that patients were transferred to another treatment arm.
  • Recommend adding plots of the data rather than reporting only p-values. Can use EHR data to validate the survey data within patients. May be able to include more patients with a retrospective chart review. Recommend concluding that you have not been able to demonstrate a difference. Is there a way of quantifying therapy by the number of sessions? Would then be able to assess correlation between therapy and VHI score.

Shelby Ploucher, Neurology/Movement Disorders & Mallory Hacker (PI) & Max Turchan

  • "Our goal is to obtain a quote required for VICTR voucher request. The project is analyzing the relationship between active contact locations of DBS leads and clinical outcomes. We are requesting the voucher for statistical support in analyzing the data."
  • Treatment is deep brain stimulation in a distinct STN location compared to outside that location (3 other possible locations) in early stage Parkinson's Disease (PD) patients. Location is determined by physician recommendation. Primary outcome is 24-month UPDRS-III OFF motor score. Covariates in linear regression model include 24-month amplitude (stimulation voltage), 24-month LEDD (medications), 24-month active contact position, an interaction between amplitude and position, baseline age, baseline disease duration, and baseline motor score. Have enrolled 14 patients.
  • Due to small sample size, recommend focusing on data visualization rather than a linear regression model. There is not enough data to adequately estimate the effects. Can start with the number of locations, calculate the minimum distance between these locations and the actual DBS location for each patient, and use this summary variable to test whether the distance is different in the good responders compared to the poor responders. Will then be able to determine the median distance to a sweet spot. Can weight the patients based on the amplitude.
  • Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.
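
The distance summary recommended above for the DBS contact locations can be sketched as follows. This is only an illustration: the coordinates, the responder labels, and the names `min_distance`, `candidate_sites`, and `patients` are hypothetical and not taken from the study. Comparing the two groups formally is left out.

```python
import math
from statistics import median

def min_distance(contact, candidate_sites):
    """Smallest Euclidean distance from one active contact to any candidate site."""
    return min(math.dist(contact, site) for site in candidate_sites)

# Hypothetical coordinates (mm); real values would come from lead localization.
candidate_sites = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
patients = [
    {"id": 1, "contact": (1.0, 0.0, 0.0), "good_responder": True},
    {"id": 2, "contact": (0.0, 2.0, 0.0), "good_responder": False},
    {"id": 3, "contact": (0.0, 0.0, 4.5), "good_responder": True},
]

# One summary value per patient: distance to the nearest candidate site.
distances = {p["id"]: min_distance(p["contact"], candidate_sites) for p in patients}

# Split the summary by response group so the two distributions can be compared.
good = [distances[p["id"]] for p in patients if p["good_responder"]]
poor = [distances[p["id"]] for p in patients if not p["good_responder"]]
median_good = median(good)
```

With 14 patients, the per-group lists would mainly feed plots, in line with the recommendation to favor visualization over regression.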

2017 November 30

Giovanna Giannico, Pathology

  • "Study design in retrospective studies addressing outcome prediction based on a variable. In the case of predicting biochemical recurrence (BCR) in prostate cancer based on the expression of a marker, is it correct to design the study to include: 1) cases that have a minimum follow up that is arbitrarily designated; 2) enrich the cohort with patients that experienced the event (BCR)? Would it rather be correct to select patients consecutively to avoid introducing a bias?"
  • Collected pilot data on 78 patients. Primary outcome is time to BCR following treatment. Recommend taking a consecutive series of patients who meet the eligibility criteria (cross-sectional study) to determine whether presence of the biomarker is associated with morphology. Potential issue with patients who follow up with their local oncologist rather than VUMC after surgical procedure. Some patients will be censored but should not be removed from the study.
  • Selecting for patients that had a BCR event is a case-control study design; this design can only determine whether a signal exists. Can also incorporate matching cases to controls on select covariates (ex. stage, grade, treatment type). A cohort study is required to assess the predictive nature of the biomarker (using survival analysis). For a retrospective cohort study, it will be better to define the cohort to increase likelihood of follow-up (only use information that is known at cohort entry).

WITHDREW: Maria Powell, Postdoctoral Fellow, Otolaryngology

2017 November 16

Claire Kelsey, Injury Prevention Intern & Purnima Unni, MPH, CHES, Pediatric Trauma Injury Prevention Manager

  • "I have spent a large portion of my semester researching for and crafting a pilot program addressing firearm safety that is geared towards both children and their parents. This pilot is part of an NIH grant our department is applying for. As I believe Purnima has told you, an aspect of the parent piece includes a few questions to be incorporated in the routine PCP injury prevention questionnaire about the storage and safety practices in regard to firearms for those who may own them. These questions would ideally be asked in conjunction with other home safety questions including those related to car seats, various poisonous cleaners and drowning prevention. I would like to get feedback about how many participants we would need to recruit in order to get significant data if we are aiming for a 25%-30% response rate on the survey also taking into account that some parents we ask for permission to participate may say no."
  • Tier 1 will take place in PCP clinic with 3 groups (randomized to control with no education, active education with PCP verbally explaining statistics and communicating safe practices, or passive education with pamphlets and posters in waiting room). Survey questions include how many firearms, how stored, and where stored (score range 0-3). Will survey parents again 2-4 months after intervention. Concern with response bias based on survey topic. Recommend adding survey questions to assess effectiveness of education type. Can do a paired analysis within parent before and after intervention. Is there a positive response (ex. individual parent score goes from 3 to 2)? For the statistical analysis, can exclude parents who do not own firearms or who score 3 on the before survey. How many subjects are needed to have 80% power to determine a difference? What is largest sample size that is feasible? Preliminary data will yield information on the proportion of parents who own a gun and will benefit from education (intervenable). Should incorporate this information into the power calculation for control vs. passive and passive vs. active education, and can include a power curve in the grant proposal. Multiple PCP offices will recruit subjects.
  • Tier 2 will be an in-school curriculum and behavioral skills training. Schools will be selected based on various socioeconomic levels in Davidson County and a rural county. Could select schools in known counties of concern.
  • Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.
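
For the Tier 1 survey, the step from "subjects needed for the analysis" to "parents who must be approached" is simple arithmetic once the response rate and the intervenable proportion are known. A minimal sketch, assuming a hypothetical target of 100 analyzable parents and a hypothetical intervenable fraction of 0.5; only the 25%-30% response rate comes from the request above, and `parents_to_approach` is an illustrative name.

```python
import math

def parents_to_approach(n_analyzable, response_rate, intervenable_fraction):
    """Parents to approach so that about n_analyzable remain after
    non-response and after excluding parents who cannot benefit
    (no firearms, or already at the best score on the first survey)."""
    if not (0 < response_rate <= 1 and 0 < intervenable_fraction <= 1):
        raise ValueError("rates must be in (0, 1]")
    return math.ceil(n_analyzable / (response_rate * intervenable_fraction))

# Hypothetical inputs: 100 analyzable parents, half of responders intervenable.
low_response = parents_to_approach(100, 0.25, 0.5)
high_response = parents_to_approach(100, 0.30, 0.5)
```

The target of 100 would be replaced by the output of the power calculation once preliminary data are available.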

2017 November 2

Adoma Manful, MPH Student

  • Aim is to determine which factors are associated with initiation of treatment for latent TB in a refugee population in Middle Tennessee. Refugees are required to complete a TB screen using a skin or blood test at Siloam Health Clinic within 90 days of arriving in U.S. (N = 1300), but some people do not attend the follow-up visit at the Metro Public Health Department (N = 748). Approximately 400 people actually initiated treatment. Primary hypothesis is that people who have more severe comorbidities (diabetes, hepatitis, HIV, etc.) are less likely to initiate treatment because they are focused on treating the more severe comorbidity. Also expect that people are less likely to initiate treatment after obtaining employment. Relocation or death is an issue with follow-up. Plan to match patient data across databases using first and last name and DOB. Able to rule out patients who develop active TB. Plan to use Charlson Comorbidity Index (range 0-6).
  • Recommend using the Elixhauser Index (EI) or another more comprehensive score rather than the CCI, because the CCI makes the arithmetic error of adding hazard ratios. Note that the database applies the ICD-9 codes used to calculate the EI unreliably. May want to research the clinical trial for Johnson & Johnson and Janssen Pharmaceuticals treatment for multi-drug resistant TB.

2017 October 19

James Cook, Medical Student

  • 51 nodules, 45 patients, 148 features. 35 patients have cancer. There are 10 outcomes.
  • Goal is to build a predictive model. Issue is that the model needs to work for patients outside of this cohort and there is an insufficient amount of data. There is access to a validation data set with n=100, although the majority likely have cancer.
  • May be able to answer a different question. Six features come up as statistically significant in the cancer literature. It might be worthwhile to look at those 6 features rather than doing data reduction from the 148 features. A drawback is that the effect size would need to be huge in order to pick up a difference in the current data set.

Sarah Osmundson, Obstetrics and Gynecology

  • Attended clinic on the recommendation of Daniel Byrne. Working on K-23 application. Aim is to predict what women need in terms of opioid supply after release from the hospital following C-section delivery. Surveyed 150 women, looked at demographics and characteristics and how much opioid was used. Need to make clear in write-up that a calibration analysis will be done, i.e., to what extent the model is really working when applied to new data. First, do internal calibration to assess how well the model appears to be doing by splitting the data into training and test sets. If the internal calibration looks good, then validate it on external data.
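
The internal calibration step described above could look roughly like this. It is a sketch under assumptions: the 30% test fraction, the number of bins, and the function names are illustrative choices, and fitting the prediction model itself is not shown.

```python
import random

def split_train_test(rows, test_fraction=0.3, seed=0):
    """Randomly split rows into (train, test) without modifying the input."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def calibration_table(predicted, observed, n_bins=5):
    """Sort by predicted value, cut into bins, and return
    (mean predicted, mean observed, bin size) for each non-empty bin."""
    pairs = sorted(zip(predicted, observed))
    size = len(pairs) / n_bins
    table = []
    for b in range(n_bins):
        chunk = pairs[round(b * size):round((b + 1) * size)]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            mean_obs = sum(o for _, o in chunk) / len(chunk)
            table.append((mean_pred, mean_obs, len(chunk)))
    return table
```

A model fitted on the training rows is well calibrated on the test rows when the mean predicted and mean observed values agree closely in every bin.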

Leah Acosta, Neurology/Cognitive Behavioral

  • "Analysis of implementation of a protocol to assess patients with normal pressure hydrocephalus (NPH), comparing pre-protocol to protocol patients, on different measures (e.g., percent who received a spinal tap, percent who reported improvement in certain symptoms). I want to make sure I did the analysis correctly, particularly since I’m dealing with small numbers (30 pre-protocol, 40 protocol) and some variables with outliers. I have my Stata commands and data is from REDCap, which I can send ahead of time if preferred."
  • Normal pressure hydrocephalus is a condition that results in cognitive decline, gait decline, and incontinence. Interested in comparing outcomes between patients who were assessed using the protocol and patients who were not. Main purpose of using the protocol is to undergo a spinal tap. In order to be considered in the population, the patient must be assessed by neurology and have one of gait decline, cognitive decline, or incontinence. Currently using Wilcoxon test, t-test for continuous variables and Fisher test, chi-square test for categorical variables. One suggestion is to show graphically the degree of improvement for quantitative tests for cognitive improvement rather than "improved" versus "didn't improve". Boxplots may be more informative to display skewness and outliers. In write-up, make sure it's clear what questions the analysis aims to answer before looking at results.

2017 October 12

Heidi Silver, Medicine/Gastroenterology

  • "To meet with William Dupont regarding reviewer comments on a submitted manuscript."

Jacob Fleming, Medical Student

  • "Retrospective database of TACE procedures; investigating outcome (post-embolization syndrome, readmission) by age (64 and younger vs. 65 and older). Want to clarify best way to address multivariate analysis."
  • TACE is a procedure for patients with liver cancer. Have 161 unique patients with 221 procedures. Outcomes of interest are readmission, post-embolization fever, nausea, pain, or portal vein thrombosis within 30 days of TACE. The Pugh score (continuous) was categorized as A vs. B or C. Recommend regression or principal components analysis to determine differences between the age groups. For example, a logistic regression (or probit) model for post-embolization nausea adjusting for age group and other covariates. Can use restricted cubic splines for variables with non-linear relationships. Due to small sample size, should limit model to most important covariates. An exploratory analysis can be done by plotting logit(P(disease)) vs. age. A good reference is Regression Modeling Strategies by Frank Harrell, Jr., PhD.
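
One way to prepare the exploratory logit-versus-age plot mentioned above is to compute an empirical logit within age bins. This is a sketch, not the clinic's prescribed method: the 10-year bin width and the 0.5 continuity correction (used so that bins with 0% or 100% events stay finite) are assumptions, as is the function name.

```python
import math
from collections import defaultdict

def empirical_logit_by_bin(ages, events, bin_width=10):
    """Return {bin start: empirical logit of the event proportion}.

    events are 0/1 indicators (e.g. post-embolization nausea).
    Adds 0.5 to both counts so the logit is defined in every bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [events, total]
    for age, event in zip(ages, events):
        start = int(age // bin_width) * bin_width
        counts[start][0] += event
        counts[start][1] += 1
    return {
        start: math.log((e + 0.5) / (n - e + 0.5))
        for start, (e, n) in sorted(counts.items())
    }
```

Plotting these values against the bin midpoints gives a rough view of whether the relationship with age looks linear on the logit scale.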

2017 October 5

Nicolas Baddour, Medical Student

  • "Our research question: Which factors are associated with higher risk of hospitalization for adult patients with CKD and which will be useful in the development of a hospitalization risk prediction model? My plan is to gather variables on a cohort we’ve defined from the RD, do univariate analysis on the variables to help filter significant ones for a model, then fit a Cox proportional hazard regression with our significant variables. I’m curious about potential confounders as well as limitations to the ways we are defining our cohorts."
  • Primary outcome is frequency of nephrology outpatient care. Limit sample to patients who established care prior to 2013. Time 0 is 1/1/2013. Covariates include hypertension, diabetes, etc. Concern with immortal time bias.
  • For secondary outcome as time to first hospitalization, may use a time-varying covariate Cox proportional hazards model. Time-varying covariates include age and laboratory measurements. If plan to use data collected at Time 0 to predict time to first hospitalization, a standard Cox PH model could be used.

Maureen Saint Georges, Pediatric Emergency Medicine Fellow

  • "We are starting a study looking at pediatric lacerations and randomizing them to sutures (control), Dermabond or Steri-Strips. While applying for a VICTR grant, they had some concerns about our sample size and so I wanted to go over our stats plan to make sure that it is appropriate."
  • Primary outcome is appearance score of scar (range 0-100, 100 is best). Sample size of 30 per arm, but reviewer is concerned that this calculation is for a superiority trial instead of a non-inferiority trial. Want confidence interval of the difference between the groups to have a half-width of 7.5 given the standard deviation is 15 and average appearance score is 60; the sample size should be 64 for 80% power or 86 for 90% power. Could also conduct an adaptive trial and periodically analyze the data.
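
The sample sizes quoted above can be checked approximately with the usual two-group formula, n per arm = 2((z_{1-alpha/2} + z_{power}) * SD / delta)^2. The sketch below assumes a two-sided alpha of 0.05, treats 7.5 as the difference to be detected, and reads the quoted numbers as per-arm; none of these is stated explicitly in the notes. It uses the normal approximation, which gives 63 and 85. The 64 and 86 in the notes are presumably from a t-distribution-based calculation, which typically adds one subject per arm.

```python
import math
from statistics import NormalDist

def n_per_arm(sd, delta, power, alpha=0.05):
    """Normal-approximation sample size per arm for comparing two means."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)

n_80 = n_per_arm(sd=15, delta=7.5, power=0.80)
n_90 = n_per_arm(sd=15, delta=7.5, power=0.90)
```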
 

2017 September 28

Maxim Turchan, Health and Policy/Services Analyst II, Department of Neurology, Movement Disorders

  • "I am interested in determining whether or not a 3-way interaction exists between 2 categorical variables and 1 continuous variable using a linear mixed effects model with an autoregressive covariance structure (due to repeated measures with unequal number of longitudinal follow-up per subject) via the “nlme” package in R. I am still relatively new to both R and mixed effects modeling, and while I believe that I carried out the analysis correctly (both methodologically and philosophically), I would love to review my logic, code, and interpretation of the results with someone with significantly more experience."
Added:
>
>
  • Have N=95 subjects and a total of 370 observations. Outcome is quality of life score (range 0-100, 100 is poor). Want to assess three-way interaction between disease duration, genotype, and neurosurgical intervention (p = .01). Linear mixed model includes 3 main effects, 3 two-way interactions, and 1 three-way interaction.
  • Suspect that correlation structure is actually compound symmetry blended with AR1 when include random effects. Look at plot of standardized residuals vs. fitted values. Note residual standard deviations for models stratified by genotype were not very different. Small sample size is adequate to include only one main effect in the model. May have an influential point in the mutant with DBS group; plan to rerun model after excluding this data point and to compare results.
  • Recommend fitting one model and estimating the contrast within the model to get more stability. Output the predicted values (mean on the square root scale) and the design matrix that gives you those values. Then subtract those 2 vectors and use formula to calculate standard error of the contrast. Look at correlation structure using disease duration on raw scale. Can look into how to specify covariance structure with respect to time. May also consider a generalized least squares model.
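
The contrast calculation recommended above amounts to subtracting two design-matrix rows to get a vector c, then taking c'b as the estimate and sqrt(c'Vc) as its standard error, where b holds the fitted coefficients and V their covariance matrix. A minimal sketch in plain Python; the coefficient and covariance values in the usage note are made up for illustration, and in practice b and V would be exported from the fitted model.

```python
import math

def contrast(x_a, x_b, beta, cov):
    """Estimate and standard error of the difference between two
    predicted means with design rows x_a and x_b."""
    c = [a - b for a, b in zip(x_a, x_b)]
    estimate = sum(ci * bi for ci, bi in zip(c, beta))
    k = len(c)
    variance = sum(c[i] * cov[i][j] * c[j] for i in range(k) for j in range(k))
    return estimate, math.sqrt(variance)

# Hypothetical three-coefficient model (intercept plus two effects).
beta = [50.0, 4.0, -2.0]
cov = [[4.0, 0.5, 0.0],
       [0.5, 1.0, 0.2],
       [0.0, 0.2, 0.25]]
estimate, se = contrast([1, 1, 1], [1, 0, 0], beta, cov)
```

Because the covariance between coefficients enters the formula, this is more stable than comparing estimates from separately fitted models.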
 

2017 September 21

Courtney Zola, Infectious Diseases Clinical/Research Fellow

  • "Retrospective analysis in the Synthetic derivative comparing HIV-infected subjects with echocardiograms to matched controls with echocardiograms to determine rate of pulmonary hypertension and mortality. Preliminary data show an increased mortality rate with HIV and PH, but it seems to be out of proportion to what you would expect. Trying to assess if HIV is an independent risk factor for PH and then analyze contributing or mitigating factors (nadir lifetime CD4 count, viral load, treatment of HIV, etc)."
Added:
>
>
  • Out of 8,500 HIV-infected patients, 1,050 had an echocardiogram (25% have PH based on RVSP >40). In general, retention for patients undergoing HIV treatment is good. In the general population, there are approximately 30,000 eligible patients with an echocardiogram. Some patients have repeated echocardiograms (especially for cardiovascular concerns). Plan to use 1:2 or 1:4 matching on age, race, and sex. The SD uses the Tennessee Death Index to gather mortality data. May be able to categorize cause of death for patients with PH. Also want to know if HIV treatment impacts PH, to compare change in RVSP to change in viral load in HIV-infected patients, and to determine attributable risk for HIV and HIV with PH.
  • Bryan Shepherd may have additional information on HIV-related mortality resources. May want to collect data on patient workup, including right heart catheterization. Should start with a research question and plan study design and data collection around the question.
 
Changed:
<
<

Justin Shinn, Otolaryngology Resident

>
>

No Show: Justin Shinn, Otolaryngology Resident

 
  • "Retrospective review evaluating botox injections for patients with synkinesis. Primary assessment is for patient outcomes (improvements based on validated questionnaires) in addition to dosing information, muscles injected, dosing over time, etc. Predominantly need statistical assistance as well as help using Stata."

2017 September 14

Line: 28 to 105
 

Theresa Chikopela, MSCI Student

  • "I am looking at endothelial dysfunction (ED), plasma nitric oxide and body fat mass in HIV infected individuals in Zambia. I would like to find out if the lean and the obese have increased ED compared to normal BMI HIV positive individuals. I would also like to find out if this increase is associated with the increased nitric oxide in these individuals as the patho-physiology states. I would like to address which design would best answer these questions and the statistical tests possible. I am also interested in verifying the calculation of sample size for this study."
Changed:
<
<
  • ED is known to result in cardiovascular disease. This study will compare ED (ICAM1, VCAM1) and BMI. Plan to take a convenience sample of patients who visit clinic. If need to conserve resources, may want to set quotas for BMI ranges to avoid oversampling in any given BMI range. Can utilize regression and correlation on BMI and body fat mass (continuous variables). To calculate sample size for correlation between BMI and endothelial dysfunction, use graph to determine desired sample size for a specific margin of error (1/2 width of 95% CI) in estimating the correlation coefficient (ex. 0.1 yields 200 patients). See biostat.mc.vanderbilt.edu/ClinStat and locate graph in Biostatistics for Biomedical Research by searching for keyword "precision".
>
>
  • ED is known to result in cardiovascular disease. This study will compare ED (ICAM1, VCAM1) and BMI. Plan to take a convenience sample of patients who visit clinic. If need to conserve resources, may want to set quotas for BMI ranges to avoid oversampling in any given BMI range. Can utilize regression and correlation on BMI and body fat mass (continuous variables). To calculate sample size for correlation between BMI and endothelial dysfunction, use graph to determine desired sample size for a specific margin of error (1/2 width of 95% CI) in estimating the correlation coefficient (ex. 0.1 yields 200 patients). See biostat.mc.vanderbilt.edu/ClinStat and locate graph in Biostatistics for Biomedical Research by searching for keyword "precision".
 

Freeman Chabala, Biochemistry

  • For HIV patients, ART is the first line of treatment. Acute Kidney Injury status for new HIV patients is unknown. Goal is to develop model to predict AKI status at 3-month follow-up visit using baseline biomarkers and demographics (age, BMI, CD4 count, etc.) collected at the initial visit. Plan to enroll patients with normal serum creatinine (SCr) at initial visit. SCr is the primary outcome (continuous). Depending on distribution of SCr, may need to use ordinal logistic regression model. Use baseline SCr, biomarkers, age, BMI, and CD4 count to predict SCr at 3-month follow-up visit. Note that using baseline data will not provide much information on how patients respond to ART by 3 months because ART is initiated at baseline. Can also use landmark analysis for dynamic prediction. Start with every patient enrolled, take those who make it to 3-month visit and set this as new baseline. Then take those who make it to 6-month visit and set this as new baseline, etc. Dataset will have multiple rows for each patient (one per visit).
Line: 58 to 135
 
  • May want to change outcome to whether pain needs were met. Recommend creating causal diagram to specify how variables are likely related. Since ibuprofen is standard of care, could give every woman a sealed pill bottle with 0-30 pills (randomized). For example, patient randomized to 10, 20, or 30 pills. Instruct patient only to break the seal if she has unmanageable pain, then determine proportion who broke the seal. There are additional pill bottle options to send wireless signal if one pill was taken from the bottle or to weigh itself automatically to know how many pills were removed.
  • Already use Meds to Beds Program. Do have option to provide an extra paper prescription that is good for 30 days and to follow-up to see if it was filled. Could randomize patients to group that does or does not receive an extra prescription. Another approach is to write a consensus guideline and gather data on whether pain needs were met.
  • If want to focus on local population, could reduce sample size to maximize resources to visit patients. It would also be more feasible for these patients to come in for another prescription. Structured counseling may be used to determine actual need. May want to change goal to predict which patients need the most pills.
Added:
>
>
  • If applying for a VICTR voucher, biostatistics assistance is very likely to take less than 90 hours, so a voucher should work.
 

WITHDREW: Brian Adkins, Pathology Resident

  • "Allo-antibodies against red cell antigens in pregnant women lead to poor fetal outcomes. As such OB/GYNs follow serial antibody titers. Traditional tube titration is slow and subjective. Automated gel titration is available but testing requires further understanding before clinical implementation. We are trying to figure out sample size and number of tests we should be running to determine clinical cut offs for antibody levels."
Revision 470
Changes from r450 to r470
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Line: 6 to 6
 

Added:
>
>

2017 September 28

Maxim Turchan, Health and Policy/Services Analyst II, Department of Neurology, Movement Disorders

  • "I am interested in determining whether or not a 3-way interaction exists between 2 categorical variables and 1 continuous variable using a linear mixed effects model with an autoregressive covariance structure (due to repeated measures with unequal number of longitudinal follow-up per subject) via the “nlme” package in R. I am still relatively new to both R and mixed effects modeling, and while I believe that I carried out the analysis correctly (both methodologically and philosophically), I would love to review my logic, code, and interpretation of the results with someone with significantly more experience."

2017 September 21

Courtney Zola, Infectious Diseases Clinical/Research Fellow

  • "Retrospective analysis in the Synthetic derivative comparing HIV-infected subjects with echocardiograms to matched controls with echocardiograms to determine rate of pulmonary hypertension and mortality. Preliminary data show an increased mortality rate with HIV and PH, but it seems to be out of proportion to what you would expect. Trying to assess if HIV is an independent risk factor for PH and then analyze contributing or mitigating factors (nadir lifetime CD4 count, viral load, treatment of HIV, etc)."

Justin Shinn, Otolaryngology Resident

  • "Retrospective review evaluating botox injections for patients with synkinesis. Primary assessment is for patient outcomes (improvements based on validated questionnaires) in addition to dosing information, muscles injected, dosing over time, etc. Predominantly need statistical assistance as well as help using Stata."

2017 September 14

No Show: Alice Hoyt, Medicine/Allergy

  • "The aims of this project are to determine the preparedness and knowledge of K-12 schools on the topics of asthma and food allergy, then to pilot an asthma telemedicine program."

Shelby Blalock, Pharmacy Resident

  • "My project is regarding gabapentin effect on opioid usage in orthopedic trauma patients. The purpose of this study is to evaluate the safety and efficacy of gabapentin use in patients with traumatic open fractures. I would like to attend the biostatistics clinic in order to get questions answered regarding statistical tests and analyses and overall methodology as I am interested in applying for a VICTR grant. I would greatly appreciate your advice in moving forward with this project."
  • Gabapentin is currently prescribed on a continuous dose based on physician preference. Goal is to see whether the continuous dose reduces need for opioids for pain management. Primary outcome is median morphine equivalent. The distribution of median morphine equivalent is not likely to be normal, so it could be treated as an ordinal variable in an ordinal logistic regression model. To avoid treatment by indication bias, recommend querying five experts to list cues they use to decide whether to prescribe gabapentin. Take all unique cues and make sure these data are collected and adjusted for in a multivariable analysis. Ideally want 10 patients per covariate in the ordinal logistic regression model. Sample size should be at least 200, but recommend gathering additional information on distribution of median morphine equivalent to calculate a more precise sample size.
  • Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.

Theresa Chikopela, MSCI Student

  • "I am looking at endothelial dysfunction (ED), plasma nitric oxide and body fat mass in HIV infected individuals in Zambia. I would like to find out if the lean and the obese have increased ED compared to normal BMI HIV positive individuals. I would also like to find out if this increase is associated with the increased nitric oxide in these individuals as the patho-physiology states. I would like to address which design would best answer these questions and the statistical tests possible. I am also interested in verifying the calculation of sample size for this study."
  • ED is known to result in cardiovascular disease. This study will compare ED (ICAM1, VCAM1) and BMI. Plan to take a convenience sample of patients who visit clinic. If need to conserve resources, may want to set quotas for BMI ranges to avoid oversampling in any given BMI range. Can utilize regression and correlation on BMI and body fat mass (continuous variables). To calculate sample size for correlation between BMI and endothelial dysfunction, use graph to determine desired sample size for a specific margin of error (1/2 width of 95% CI) in estimating the correlation coefficient (ex. 0.1 yields 200 patients). See biostat.mc.vanderbilt.edu/ClinStat and locate graph in Biostatistics for Biomedical Research by searching for keyword "precision".
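
The precision-based approach above (choose n so that the margin of error for the correlation coefficient is acceptable) can be approximated with Fisher's z-transformation. This is a sketch and not necessarily how the referenced graph was produced: the margin depends on the assumed true correlation `r`, so the result need not reproduce the 200 quoted in the notes.

```python
import math
from statistics import NormalDist

def correlation_margin(r, n, conf=0.95):
    """Approximate half-width of the confidence interval for a
    correlation coefficient, via Fisher's z-transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)
    half = z_crit / math.sqrt(n - 3)   # half-width on the z scale
    z = math.atanh(r)
    # Back-transform both limits and average the two sides.
    return (math.tanh(z + half) - math.tanh(z - half)) / 2

margin_r0 = correlation_margin(0.0, 200)
margin_r05 = correlation_margin(0.5, 200)
```

At n = 200 this gives a margin of roughly 0.14 if the true correlation is 0 and roughly 0.10 if it is 0.5, so the required sample size is sensitive to the correlation assumed.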

Freeman Chabala, Biochemistry

  • For HIV patients, ART is the first line of treatment. Acute Kidney Injury status for new HIV patients is unknown. Goal is to develop model to predict AKI status at 3-month follow-up visit using baseline biomarkers and demographics (age, BMI, CD4 count, etc.) collected at the initial visit. Plan to enroll patients with normal serum creatinine (SCr) at initial visit. SCr is the primary outcome (continuous). Depending on distribution of SCr, may need to use ordinal logistic regression model. Use baseline SCr, biomarkers, age, BMI, and CD4 count to predict SCr at 3-month follow-up visit. Note that using baseline data will not provide much information on how patients respond to ART by 3 months because ART is initiated at baseline. Can also use landmark analysis for dynamic prediction. Start with every patient enrolled, take those who make it to 3-month visit and set this as new baseline. Then take those who make it to 6-month visit and set this as new baseline, etc. Dataset will have multiple rows for each patient (one per visit).
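
The landmark data layout described above (one row per patient per visit reached, with that visit acting as the new baseline) might be assembled as in the sketch below. The visit schedule, the 3-month horizon, the field names, and the SCr values are all hypothetical; only SCr is carried here, whereas a real dataset would include the other predictors measured at each landmark.

```python
def landmark_rows(visits, landmarks=(0, 3, 6), horizon=3):
    """Build one row per patient per landmark visit actually attended.

    visits maps patient id -> {month: SCr value}. The outcome is the SCr
    at the next scheduled visit, or None if that visit is missing.
    """
    rows = []
    for patient_id, by_month in visits.items():
        for month in landmarks:
            if month not in by_month:
                continue  # patient did not make it to this landmark
            rows.append({
                "patient": patient_id,
                "landmark_month": month,
                "scr_at_landmark": by_month[month],
                "scr_at_next_visit": by_month.get(month + horizon),
            })
    return rows

# Hypothetical data: patient A attends three visits, patient B only baseline.
visits = {"A": {0: 0.8, 3: 0.9, 6: 1.1}, "B": {0: 0.7}}
rows = landmark_rows(visits)
```

Rows with a missing outcome are kept so that dropout at each landmark stays visible rather than being silently discarded.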

2017 September 7

Wendy Bottinor, Cardio-Oncology Fellow/MSCI

  • "We are looking for predictors of cardiovascular dysfunction in patients receiving VEGF inhibitors for treatment of renal cancer. I would like to look over the data set to make sure I am collecting the right information in the correct manner. I have an excel spreadsheet currently but I am planning to switch to a REDCap database that I have created but not put in production. I would also like to have a better understanding of how to analyze the data once it is collected."
  • Patients who receive VEGF inhibitors can also develop proteinuria. Laboratory data are collected at 3 time points (baseline, 2 and 4 weeks after starting treatment). Can use REDCap calendar to schedule lab draws for each patient. Goals are to assess how vascular function changes and baseline predictors of development of treatment side effects [hypertension (primary outcome) and proteinuria (continuous variable)]. This study design confounds treatment with temporal effects. Since the course of treatment is fairly long, it becomes hard to unravel the effects due to treatment versus the natural course of renal cancer. Patients may be prescribed an anti-hypertensive drug to treat hypertension or an ACE/ARB to treat proteinuria.
  • Recommend setting date of treatment start as Time 0. Before treatment is started, need to establish the extent to which each patient exhibits increasing SBP (or proteinuria) over time. Create spaghetti plot of each patient's SBP over time; change line color at time treatment is started. Decide whether to censor patient if SBP worsens. Calculate confidence band for the trend. Daily (or weekly) SBP measurements will provide more information before treatment is started. If patients are self-reporting SBP, it is useful if all standard machines are calibrated. Can build separate longitudinal models for degree of hypertension and proteinuria as a function of time. Can calculate margin of error in estimating mean SBP (ex. +/- 3-4 mmHg) using standard deviation of the first SBP measurement from each patient (similarly for proteinuria).
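The margin-of-error calculation for mean SBP mentioned above can be sketched as follows. The standard deviation of 15 mmHg is a hypothetical placeholder; in practice it would come from the first SBP measurement of each patient. The normal critical value 1.96 is used as an approximation.

```python
import math


def mean_margin_of_error(sd, n, z_crit=1.96):
    """Half-width of an approximate 95% CI for a mean."""
    return z_crit * sd / math.sqrt(n)


def n_for_margin(sd, margin, z_crit=1.96):
    """Smallest n whose approximate 95% CI half-width is at most `margin`."""
    return math.ceil((z_crit * sd / margin) ** 2)


ASSUMED_SD = 15.0  # hypothetical SD of first SBP measurement, in mmHg

for margin in (3.0, 4.0):
    n = n_for_margin(ASSUMED_SD, margin)
    print(f"+/- {margin} mmHg needs about {n} patients")
```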

2017 September 5

Paul Slocum, OB-GYN

  • 182 women received sling surgery. About 30% of these had ISD. The goal is to see if the urgency component of their leakage post-surgery differs between those with and without ISD before operation.
  • Post operation measurements at 2wks, 6wks, 24wks, and up to 52wks.
  • Primary endpoint is receiving treatment for urgency incontinence (yes/no).
  • There will be important demographic data to consider and to adjust for, some of it may be missing (i.e., probably involves multiple imputation or some other method for handling missing data)
  • Analyses could include multivariable logistic regression.
  • Another analysis option would be to account for the length of follow-up using an offset and perform some version of multivariable Poisson regression.
  • There are several secondary endpoints that will also be considered for analyses. Most of them are similar yes/no outcomes; there is also some survey data pre- and post-operation that is of interest.
  • Data has already been collected and is in REDCap, so should be no to very limited data management. Just analyses.
  • Bryan Shepherd was statistician at biostat clinic.

2017 August 31

Sarah Osmundson, Obstetrics and Gynecology/Maternal Fetal Medicine

  • "I am writing an NIH career development award (K23). I need help with the proposed analysis plan for creating a clinical prediction tool using already collected data. The purpose of my award will be to learn about predictive modeling with the mentorship of Frank Harrell. However I need to put a basic overview in my application and provide a rough sample size calculation."
  • Study will look at opioid use after C-section in patients who are opioid naive and did not have major C-section complications. Have pilot data on how much of the opioid prescription was used within 2 weeks of hospital discharge; 22% of patients said they finished the prescription. Surveyed mothers' emotional wellbeing and whether pain needs were met. Want to reduce unused tablets, so primary outcome is amount of leftover opioid medication. Goal is to develop clinical prediction model to use as prescribing tool at hospital discharge.
  • May want to change outcome to whether pain needs were met. Recommend creating causal diagram to specify how variables are likely related. Since ibuprofen is standard of care, could give every woman a sealed pill bottle with 0-30 pills (randomized). For example, patient randomized to 10, 20, or 30 pills. Instruct patient only to break the seal if she has unmanageable pain, then determine proportion who broke the seal. There are additional pill bottle options to send wireless signal if one pill was taken from the bottle or to weigh itself automatically to know how many pills were removed.
  • Already use Meds to Beds Program. Do have option to provide an extra paper prescription that is good for 30 days and to follow-up to see if it was filled. Could randomize patients to group that does or does not receive an extra prescription. Another approach is to write a consensus guideline and gather data on whether pain needs were met.
  • If want to focus on local population, could reduce sample size to maximize resources to visit patients. It would also be more feasible for these patients to come in for another prescription. Structured counseling may be used to determine actual need. May want to change goal to predict which patients need the most pills.

WITHDREW: Brian Adkins, Pathology Resident

  • "Allo-antibodies against red cell antigens in pregnant women lead to poor fetal outcomes. As such OB/GYNs follow serial antibody titers. Traditional tube titration is slow and subjective. Automated gel titration is available but testing requires further understanding before clinical implementation. We are trying to figure out sample size and number of tests we should be running to determine clinical cut offs for antibody levels."

2017 August 17

David Kent, Otolaryngology

  • Planning to conduct a meta-analysis of several observational studies and one RCT that assessed changes in AHI and ODI before and after treatment for sleep apnea patients. Recommend obtaining subject-level data from published studies and conducting a paired t test or building a regression model. Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.

Whitney Muhlestein, Medical Student

  • Goal is to determine whether medical student attitudes toward underserved populations change from beginning to end of one year working in a student-run clinic. Pre- and post-surveys were used to gather data on student attitudes and demographics; 30 out of 60 total students completed both the pre- and post-surveys. Gender and number of hours worked in the clinic were recorded for all students. Recommend looking at pre-survey empathy scores for students who did not complete the post-survey. May also predict post-survey completion using demographics and hours worked in the clinic.

2017 August 3

Quique Heurta, Clinical Fellow Allergy, Pulmonary & Critical Care Medicine

  • "Briefly, it’s a retrospective cohort of all hospital-acquired central line infections in adults at VUMC, and we are looking at factors associated with a poor outcome (specifically, the primary outcome is 60-day mortality or recurrence). We were mostly interested in whether prolonged antimicrobial treatment was associated with better outcomes. However, there’s an obvious issue here, which is that patients who die soon after diagnosis don’t have a chance to get a full course of antibiotics. I think this a competing risks problem, but I’m not sure exactly how to get around it. I have the exact dates of diagnosis, discharge/death, and start/stop dates for antibiotic treatment, so all the times can be calculated out."
  • Have ~400 subjects with measured outcomes. Collected information on blood cultures, immunosuppression status, and SOFA severity of illness score (range 0-30). Duration of antibiotics is not the same for all patients, and proportion of course completed should be included as a covariate in the model. Cannot confirm compliance for patients who were discharged on an antibiotic. Most patients are switched to oral antibiotics by discharge. Some patients have to stay on IV antibiotics after discharge; this is dependent upon the organism and provider recommendation. Mortality due to central line infection is included in the primary outcome. May want to consider restricting cohort to patients who completed course of antibiotics (and selected organisms with clear guidelines), then a logistic regression model would be appropriate. Time 0 can be day that antibiotic course was completed. Recommend including ICU stay and probability of death (because related to outcome) as covariates in the model. Watch for time-dependent covariates; consider a state transition model.

Maya Yiadom, MSCI Student

  • Currently enrolling patients in a clinical trial; the primary outcome is readmission within 30 days. Error has prevented enrollment of any Palliative Care or Geriatric patients. An interim analysis at the 50% enrollment time point revealed an actual readmission rate that is much lower than what was predicted in study design. Under what conditions can you change your expected detectable effect? Do not recommend extending enrollment for Palliative Care and Geriatric patients. Limited generalizability will be more accepted than changing current enrollment procedure. Need to make sure intervention is not watered down for patients from two medicine services when adding in Palliative Care and Geriatric patients.

2017 July 27

Bianca Flores, Neuroscience Graduate Student

  • "I would like to double check if I am using the right formula to find sample size for my animal studies."
  • Sample size of mice (mutant and wild type) and number of neurons per mouse. Will use fluorescence to measure change in intracellular chloride or cell volume between Time 1 (baseline/isotonic) and Time 2 (intervention) and between Time 2 and Time 3 (baseline/isotonic). The three measurements will be collected 10 minutes apart. Also plan to collect time to return to baseline (Time 3 - Time 2, i.e. rate of unswelling); this is dependent upon highest point of response to intervention. Start with repeated measures ANOVA likelihood ratio test (global test) for whether there is a difference in response between the two mice groups at any time point. If there is a difference, then test individual differences at each time point. If there is no difference, do not test at each time point. Recommend doing a simulation study (incorporating standard errors for neurons and mice) and reporting the approximate power for a given number of mice and neurons based on limited resources. To compare time to task completion between the two mice groups, a sample size calculation based on the two-sample t test is appropriate.

2017 July 6

Lauren Lee Wray, Clinical Pharmacology Research Analyst

  • "I have some methodological questions about subsequent analyses following a latent class analysis. Here is a little background information: N = 1,580 and 4 clusters were derived with participants who have different trigger symptoms. All clusters have the same underlying medical issue under study. Now, we are adding SNPs to the analysis plan. I need help choosing analyses to run for this data."
  • All patients have atrial fibrillation (AF). The triggers are dichotomous variables for caffeine intake, sleep, etc. The 4 clusters are no trigger, vagal, adrenal, and combination; smallest cluster has 92 patients. Plan to test 30 SNPs to determine whether patients in the same cluster have similar genetic markers for AF.
  • Recommend looking at distribution of variables (used to determine clusters) within each cluster to verify clusters are homogeneous. Can run logistic regression model for a given trigger (ex. caffeine intake) with 30 SNPs as covariates; then rank SNPs based on correlation coefficient or chi-square statistic. Use additive model on logit scale with SNPs coded as 0, 1, or 2 to account for dominant and recessive genes. The SNPs could be co-expressed, so recommend doing a correlation analysis of SNPs to assess collinearity. Use variable clustering to visualize redundancies among variables. It will be better if the number of SNPs in the model can be reduced.

WITHDREW: Margaret Taylor, School of Nursing Melrose Faculty Practice

  • "I have a deadline to meet and think y'all could answer some questions. Honestly, it's really simple stuff, so much so you will probably laugh but to me it's a big deal."
 

2017 June 22

Changed:
<
<

Sophie Katz, Pediatric Infectious Disease

>
>

Ritu Banarjee, Pediatric Infectious Disease

 
  • We are planning a prospective observational cohort study of procalcitonin levels (a hormone) and its kinetics in infants and children, and would greatly appreciate assistance in determining the number of subjects needed to get accurate ROC curves.
Added:
>
>
  • Meeting notes: Blood test used primarily for adults. Interested in how the test performs in children, from infants through 18-year-olds. Interested in identifying cut-offs for sensitivity and specificity. Primary interest is negative predictive value. Can compare to microbiology culture. Will enroll patients with infection to get both blood test and culture. Localized infection location. In adults, test discrimination is 81-94%.
  • If hypothesized gold-standard defined infection rate is 10%, can produce 95% confidence intervals for estimation of predictive value for comparison tests. Using estimates of prevalence, sensitivity, specificity, positive predictive value and negative predictive value, a sample size can be calculated based on the confidence interval desired. Email william.dupont@vanderbilt.edu for calculations. Recommend applying for 90-hour VICTR award ($5000) for biostatistics support.
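One common way to turn a desired confidence interval width into a sample size is sketched below for sensitivity and specificity only, using the normal approximation and inflating for prevalence. It is not the calculation performed for this study, and predictive values would need their own calculation. The sensitivity (0.90), specificity (0.85), and half-width (0.10) are hypothetical inputs; only the 10% infection rate comes from the notes.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence interval


def n_for_proportion(p, half_width, z=Z_95):
    """Subjects needed so the normal-approximation CI for p has this half-width."""
    return z ** 2 * p * (1 - p) / half_width ** 2


def total_enrollment(sensitivity, specificity, prevalence, half_width):
    """Total subjects so both sensitivity and specificity reach the target precision."""
    n_infected = n_for_proportion(sensitivity, half_width)
    n_uninfected = n_for_proportion(specificity, half_width)
    # Sensitivity is estimated only in infected subjects, specificity only
    # in uninfected subjects, so scale each count by its share of the cohort.
    total_for_sensitivity = math.ceil(n_infected / prevalence)
    total_for_specificity = math.ceil(n_uninfected / (1 - prevalence))
    return max(total_for_sensitivity, total_for_specificity)


# Hypothetical planning values; prevalence of 10% is taken from the notes.
print(total_enrollment(sensitivity=0.90, specificity=0.85,
                       prevalence=0.10, half_width=0.10))
```

With a low infection rate, the sensitivity requirement usually drives the total, because few enrolled subjects are infected.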
 

2017 June 15

Daniel Markwalter, Center for Biomedical Ethics and Society

  • We are conducting a study to better understand family perceptions of transitions in care in the pediatric critical care unit as well as facilitators and obstacles to family preparedness for transitions in this setting. We are using a grounded theory methodology and have data describing the percentages of certain subgroups that reference particular themes. We want to learn about the methods for comparing proportions between groups. For instance, if one group of 20 people reference a theme 85% of the time and a separate group of 25 people reference the theme 20% of the time, can we say these are statistically different?
Revision 450
Changes from r430 to r450
Line: 6 to 6
 

Added:
>
>

2017 June 22

Sophie Katz, Pediatric Infectious Disease

  • We are planning a prospective observational cohort study of procalcitonin levels (a hormone) and its kinetics in infants and children, and would greatly appreciate assistance in determining the number of subjects needed to get accurate ROC curves.

2017 June 15

Daniel Markwalter, Center for Biomedical Ethics and Society

  • We are conducting a study to better understand family perceptions of transitions in care in the pediatric critical care unit as well as facilitators and obstacles to family preparedness for transitions in this setting. We are using a grounded theory methodology and have data describing the percentages of certain subgroups that reference particular themes. We want to learn about the methods for comparing proportions between groups. For instance, if one group of 20 people reference a theme 85% of the time and a separate group of 25 people reference the theme 20% of the time, can we say these are statistically different?
  • Meeting notes: 4 main transitions in care. Read transcripts to develop complete collection of themes and grouping for words referenced. Consistent, conversational interview structure; might be slight nuances between interviews. Same process repeated with parent and physician. Want to compare the percentage of parents who identify a theme to the percentage of physicians who identify the same theme.
  • May not be necessary to specify that groups are statistically similar or different. Description may be sufficient. Recommend graphical display. Can produce confidence bounds for proportion estimates (Wilson binomial confidence interval). Hypothesis test null assumes that physicians and parents should be exactly aligned and they may not be.
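The Wilson interval suggested above can be sketched for the example in the question, reading "85% of 20" as 17 of 20 and "20% of 25" as 5 of 25. Those counts are an interpretation of the stated percentages, and the function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Counts implied by the example percentages in the question.
for label, successes, n in (("group 1", 17, 20), ("group 2", 5, 25)):
    low, high = wilson_interval(successes, n)
    print(f"{label}: {successes}/{n}, 95% CI {low:.3f} to {high:.3f}")
```

Plotting the two intervals side by side gives the graphical display recommended in the notes without requiring a formal hypothesis test.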

Ian Setliff, Pathology, Microbiology & Immunology

  • We have many features (100) of the antibody repertoire of each of several donors. Some of these are independent of each other, while others are not. We have a question of how to normalize our data correctly for subsequent analysis.
  • Meeting notes: Longitudinal dataset of 6 donors.

2017 June 1

Zachary Cox, Cardiology

  • "We need help determining the number of patients to enroll to have the power to determine a difference in our primary outcome in a randomized, prospective, parallel-design proof-of-concept clinical trial. We are comparing inhaled milrinone (investigational arm) to intravenous milrinone (control arm) on the primary outcome of cardiovascular hemodynamic variables."
  • Will include hospitalized patients with advanced heart failure and undergoing evaluation for heart transplant. Non-inferiority in outcome which is continuous hemodynamic output. A 20% change from baseline is considered standard by Medicare.
  • Recommend adjusting for baseline hemodynamic output in model to increase power. However, if correlation is < 0.5 between baseline and 72-hour measurements, then adjusting for baseline will just add noise. To calculate power, can utilize change from baseline. To estimate precision, need estimate of standard deviation of within subject variance at baseline. Recommend recruiting 20 patients per arm for a pilot study that can provide additional information to plan a large-scale trial.

Dan Ayers, Biostatistics

  • We are conducting a sensitivity analysis on a prediction model, adding patient outcomes from patients excluded from the original model build and comparing c-index and calibration slopes for goodness-of-fit. Our current process is:
    1. The sensitivity analysis will be conducted on the 2 sensitivity sets described below.
    1. Parameters of the full model will not be re-estimated.
    1. 20 datasets will be imputed using the new outcome set and all x-variables as in the original analysis. For sensitivity set 1, all new outcomes for the 65 added patients will equal 1. For sensitivity set 2, all new outcomes for the 65 added patients will equal 0.
    1. Prediction sets, using the original parameters of the full models, will be computed for each of the 20 datasets.
    1. The average prediction per patient will summarize that patient's prediction.
    1. The average prediction set will be compared to the observed data to derive a calibration curve. The normal parameters, Dxy, c-index, etc will be used to summarize the goodness of fit.
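The averaging and discrimination steps in the process above can be sketched as follows. This is a minimal illustration with made-up predictions from two imputed datasets rather than 20; the function names are hypothetical and the c-index is computed by brute-force pair counting, with Dxy derived as 2(c - 0.5).

```python
def average_predictions(prediction_sets):
    """Average each patient's predicted probability across imputed datasets."""
    n_sets = len(prediction_sets)
    n_patients = len(prediction_sets[0])
    return [sum(ps[i] for ps in prediction_sets) / n_sets
            for i in range(n_patients)]


def c_index(predictions, outcomes):
    """Concordance probability for a binary outcome (ties count one half)."""
    events = [p for p, y in zip(predictions, outcomes) if y == 1]
    non_events = [p for p, y in zip(predictions, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(non_events))


# Made-up predictions for 4 patients from 2 imputed datasets.
imputed_predictions = [[0.1, 0.4, 0.8, 0.6],
                       [0.3, 0.2, 0.6, 0.8]]
observed = [0, 1, 0, 1]

avg = average_predictions(imputed_predictions)
c = c_index(avg, observed)
print("c-index:", c, "Dxy:", 2 * (c - 0.5))
```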

  • Capture reasons physician did not order MRI to determine whether MRI data are missing at random. If data are not MAR, will need to do multiple imputation or exclude these observations. Use continuous probability (DOC) rather than categorizing as positive or negative. Can report median DOC stratified by MRI status and proportion MRI+ in DOC deciles.

Oscar Ayala, Graduate Student Biomedical Engineering

  • "I am working with Dr. Anita Mahadevan-Jansen. I am currently analyzing spectral data collected using Raman spectroscopy and trying to implement data reduction techniques to ultimately classify my results."
  • Goal to classify bacteria. Have 6 types of bacteria; some are wild type, and others are single gene mutations. Record number of photons (intensity) for 917 bacteria features. Collected multiple measurements from multiple colonies for a total of 162 intensity plots. Recommend grouping bacteria features into 50 intervals and summarizing average height in each interval. May consider PCA with penalization (smooth or non-smooth which penalizes some loadings to zero) and variable clustering analysis. Separately, could fit spline function with 30 knots and associate with classification. Preferable to use bootstrapping rather than cross-validation.

2017 May 25

Tanya Marvi, Medical Student

  • "My project is looking at platelet count in patients with musculoskeletal infection. I am working in Stata and would like some assistance with fitting splines and building a predictive model for using platelets and CRP to predict severity of infection. My goal is to build a model using an interaction term for interpreting the platelet count in the context of the CRP. Additionally, I would like help with fitting a proportional odds model with robust standard errors for platelet count over time. If it is possible to work with someone familiar with Stata that would be very helpful."
  • Have 150-250 previously healthy pediatric patients who came to ED, had orthopedic consult, and were admitted to hospital. Diagnosis can be inflammation (excluded from study), localized bacterial infection in joint (septic joint), or infection in muscle. Infection is confirmed with two positive blood cultures. Goal to use CRP biomarker (which increases during infection) and blood platelet level (BPL, which decreases during infection but rebounds over time) to predict outcome. BPL can be confounded with treatment, so antibiotic administration was documented. The outcomes are death (but 0 patients died), complications (n=15 patients), LOS in hours (surrogate for infection severity), and Peds charge weight (standardized cost which is indicative of infection severity). Make sure there is a temporal relationship between outcome and predictors.
  • Recommend using response feature analysis which uses a biologically meaningful summary of data for each patient (ex. area under the curve or slope coefficient). This removes the correlation in the data. With independent data, a simple fixed effects analysis can be done. For example, use hospitalization Day 5 CRP and BPL to predict LOS. Include absolute value of change in BPL from baseline in model. Can also do a landmark analysis; take patients who survive to time t and look at relative importance of BPL. Remember to take logarithm of ratios and show raw spaghetti plots (specify alpha saturation or use grayscale based on LOS). Stata 'mkspline' program can use linear or cubic splines. See William Dupont's book Statistical Modeling for Biomedical Researchers (2009). It is also possible to use loess regression and to calculate confidence intervals with bootstrapping.
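Response feature analysis, as described above, reduces each patient's repeated measurements to one summary value. A minimal sketch follows, using area under the curve and least-squares slope as the summaries; the patient data are invented for illustration and the function names are hypothetical.

```python
def trapezoid_auc(times, values):
    """Area under the measurement curve by the trapezoid rule."""
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))


def ols_slope(times, values):
    """Least-squares slope of values regressed on time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


# Invented platelet counts by hospital day, for illustration only.
patients = {
    "patient_1": ([1, 2, 3, 5], [180, 150, 170, 240]),
    "patient_2": ([1, 3, 5], [300, 280, 310]),
}

# One row per patient: the summaries are independent across patients,
# so they can be used as ordinary predictors (for example, of LOS).
for patient_id, (days, platelets) in patients.items():
    print(patient_id,
          "AUC:", trapezoid_auc(days, platelets),
          "slope:", round(ols_slope(days, platelets), 2))
```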

2017 May 18

Rita Pfeiffer, Graduate Student Department of Hearing and Speech Sciences/Program for Music, Mind, and Society

  • "I am investigating the feasibility of using a frame difference method to analyze movement interactions during social interactions between adults and preschoolers (who present with and without Autism). We obtained a segment of a social communication assessment, in which the experimenter points to posters to bid the child to jointly attend. After downsampling the video to 10 fps, we used a frame difference method to determine the number of pixel changes per video frame, thereby indicating the amount of movement. The output results in two time series: one capturing the movement from the experimenter, and the other capturing the child's movement. We are utilizing cross-correlation and coherence analyses to determine the relationship between the two time series."
  • Have enrolled 13 children aged 2-3 years who have or have not been diagnosed with autism. Recorded at most four 30-second videos per child. Planning time series analysis of movement data for experimenter and child; want to look at relationship between the two and potential differences in children with autism. Should we use cross-correlation or cross-covariance values to compare our data set? How can we best use our coherence analysis to compare our data sets?
  • How does one determine the appropriate window size to do our analyses (for FFT, coherence, etc)? Recommend using two-fold cross-validation. Split data into 2 halves, fit models with different tuning parameters (window size in this case) with first half, and see how well the model predicts the data in the second half (e.g. calculate observed - predicted values). Choose window size that gives the best performance for primary analysis. Can report sensitivity analyses using different window sizes to explain that primary analysis results are not entirely dependent on choice of window size.
  • May contact Hakmook Kang to discuss additional time series or functional data analysis questions during a Tuesday clinic.

2017 May 11

Vivian Kawai, Medicine/Clinical Pharmacology Research Assistant

  • "We are conducting a candidate gene study for gestational weight gain in BioVU. Unfortunately we have several patients with missing prepregnancy weight and will like to see if feasible to impute this information using pregnancy weights at different gestational ages. If so, what weights are needed for this."
  • Defined gestational diabetes using criteria or previous diagnosis in medical record. Need pre-pregnancy and pre-delivery weights to calculate gestational weight gain. 10% of controls and 25% of cases (with gestational diabetes) are missing pre-pregnancy weight. Matched cases and controls on age and number of previous pregnancies. Collected repeated weight measurements during pregnancy, but do not have information on baby's gender or birth weight. Plan to calculate risk score for gestational diabetes.
  • Recommend using multiple imputation to generate values for missing pre-pregnancy weights. Starting June 1st, can apply for VICTR voucher for 90 hours of biostatistics support.

Jiancong Liang, Pathology, Microbiology and Immunology

  • Have 30 cases of papillary thyroid carcinoma with classic variant. This cancer has a 100% cure rate. Noted an unusually high proportion of cases (40%) with Hashimoto thyroiditis. Endpoints were collected at time of cancer diagnosis and include tumor size, tumor stage, tumor multiplicity, and metastasis. Lymph node status and treatment response were collected later.
  • Already used Fisher's exact test to compare dichotomized tumor size between Hashimoto groups. Recommend not dichotomizing tumor size and using Wilcoxon rank-sum test to compare continuous variable between Hashimoto groups. Given small sample size, survival analysis will not be informative. Can generate Kaplan-Meier curves to observe trends. May want to use logistic regression model to predict Hashimoto status. Contact William Dupont for additional support.

2017 April 20

Bhumika Piya, PhD Student Sociology

  • "I am examining the relationship between arsenic content in drinking water and body weight status (underweight, healthy weight, and overweight) using multinomial logistic regression. Since I use survey data, I have some questions regarding Stata's svy function and robust standard errors (esp. linearized standard errors and how that affects statistical significance). P.S. My mentor is no longer at Vanderbilt and won't be able to attend the session with me."
  • 2500 household surveys conducted in 8 communities. Body weight status was self-reported on survey. Arsenic content and salinity were measured in each community several months after surveys were completed (mean of several measurements dichotomized into high/low). Other covariates include demographics (age, sex, religion, health status), environmental stress, and community economic development index.
  • Do not recommend using sampling methodology. Can use observational clinical study analysis to determine effect of arsenic on body weight status. Arsenic should be a continuous variable in the model, rather than high/low level. Look at sum of squares for arsenic when do or do not include community in the model. Do not include community variable if include continuous community characteristic (e.g. economic development index) variables in model. When reporting the final results, if the communities are no different other than arsenic level, then this is the effect of arsenic on body weight status. Another option is to fit two ordinal models, one for BMI amount lower than ideal BMI and another for BMI amount higher than ideal BMI.
 

2017 March 30

Hannah Dietrich, Student

  • "We are part of a QI project gathering data on health literacy levels in the pediatric general surgery clinic. We have finished gathering data for this project, and would like to focus our questions mainly on graphics and statistical analyses for our data. We are primarily examining correlations between health literacy scores and other factors such as income, no-show rates, etc."
Added:
>
>
  • Health literacy score measured on a 15-point scale (range 3-15). Have collected 60 surveys. Survey was validated at the VA but not in this clinic population. Do not have gold standard data to compare with survey data. If a new patient did not show, then could not gather survey data.
  • Goal to assess relationship between health literacy score and other patient characteristics (time between VUMC system entry and surgery, clinic no-show status). Can create histogram for health literacy score. Can use Wilcoxon rank-sum test (2 groups) or Kruskal-Wallis test (3 groups) to compare health literacy scores among groups. Recommend using a logistic regression model to predict clinic no-show status (outcome) using health literacy score and adjusting for patient characteristics (race, etc.). Need to decide on a desired level of precision to calculate required sample size.

Celestine Wanjalla, Infectious Diseases Postdoctoral Fellow

  • "Analyses of cross-reactive T cells in human PBMCs. Calculation of sample no and power and best statistical analysis for my first two aims."
  • Looking at T-cell (CD8alpha) responses to different peptides (tetramer NLV, 2B9, 2A12) in 7 subjects with confirmed CMV infection. Planning to submit grant proposal. Can include CMV negative subjects as negative controls. Sample size will depend on costs and desired level of precision; including more than 7 subjects (ex. 20 in each group) will increase statistical power. Can include graph of power curve in proposal.
 

2017 March 23

Kristy Broman, Surgery Resident

Line: 32 to 96
 

2017 March 2

Miguel Cuj, Graduate Student Latin American Studies

  • "My research project is about cross-sectional survey, in rural area of Guatemala about health status. 1) How compute the sample size in a target group in three small communities with only two selection criteria a) beneficiary of social program and b) older population >50 years old. 2) Which analysis statistic could you suggest beyond descriptive statistic? 3) Some issues about use of SP in clinical trial."
 
  • Concerns with chronic diseases in older population (diabetes, heart disease, etc.). Planning to collect survey data on demographics, health status, and opinions about armed conflict. Have a list of 150 potential subjects who meet eligibility criteria and their contact information. From this sampling frame, can select sample and randomize order of approaching subjects. For a qualitative study, can continue to collect surveys until reach saturation point for information. To calculate sample size based on desired margin of error for estimate of proportion, sample size equals 1/e^2 where e is the desired margin of error (as a proportion, e.g. 0.1). Similarly, can calculate expected standard errors based on sample size you can reach given limited resources. Do not need a power calculation because you are not testing a specific hypothesis. Can utilize information gathered from surveys to identify potential questions for focus groups.
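The 1/e^2 rule above is a rounded form of the worst-case 95% margin of error for a proportion: with p = 0.5, e = 1.96 × sqrt(0.25/n) ≈ 1/sqrt(n). A small sketch, with illustrative function names, shows the rule in both directions.

```python
import math


def n_for_margin_of_error(e):
    """Conservative sample size for a proportion: n = 1 / e^2."""
    # Round before taking the ceiling to absorb floating-point error.
    return math.ceil(round(1.0 / e ** 2, 9))


def margin_of_error_for_n(n):
    """Approximate worst-case 95% margin of error achievable with n surveys."""
    return 1.0 / math.sqrt(n)


for e in (0.05, 0.10, 0.15):
    print(f"margin {e:.2f} -> about {n_for_margin_of_error(e)} surveys")

# Precision attainable if every subject on the list of 150 is surveyed.
print(round(margin_of_error_for_n(150), 3))
```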

2017 February 23

Aaditi Naik, Undergraduate Student

Changed:
<
<
  • I am an Undergraduate Research Assistant in Dr. David Charles's Movement Disorders lab in the VUMC Neurology Department. Our project aims to identify the prevalence of four previously-identified non-motor markers – spatial discrimination threshold, temporal discrimination threshold, vibration-induced illusion of movement, and kinesthesia – in a population of cervical dystonia patients, unaffected family members, and healthy volunteers (control group). Consenting participants will receive a neurological examination performed by a movement disorders neurologist, followed by an assessment of the four non-motor symptoms. Through analysis of the concurrence of the non-motor features across the three groups of participants, we hope to identify a combination of non-motor symptoms which is more prevalent in the cervical dystonia group, and therefore may be indicative of disease development. Our specific questions are: 
  1. What is the appropriate statistical test to assess the association between multiple non-motor features (2, 3, or 4 features) and participant group (patient, non-affected family member, healthy volunteer)?
  2. Based upon the suggested analysis for point 1, what would be an appropriate sample size and power? Expect to enroll 80 subjects per group.
  3. Would Kruskal Wallis (ordinal/interval)/Chi-square (categorical)/Kaplan-Meier (survival) be appropriate statistical tests to assess the difference between three groups for an individual variable identified during the study, e.g. age of onset, gender, income, etc.?
  4. Is a logistical regression model the best statistical test to assess the association of non-motor features with cervical dystonia, based on prevalence rates in the participant groups (patient, non-affected family member, healthy volunteer)? If so, which type of logistical regression would be most appropriate? Recommend using multinomial regression for all three groups or binary logistic regression (looking at two groups at a time) to predict group membership. Given two categories for outcome, take number of subjects in smaller group and divide by 15. This is the number of variables that can be included in the model. Can use bootstrapping to assess stability of model.
  5. What are the best methods through which to report qualitative data related to clinical features of sensory tricks, such as the types of sensory tricks used, frequency of use, effectiveness of use, etc.? Descriptive statistics on any number of clinical features.
>
>
  • I am an Undergraduate Research Assistant in Dr. David Charles's Movement Disorders lab in the VUMC Neurology Department. Our project aims to identify the prevalence of four previously-identified non-motor markers – spatial discrimination threshold, temporal discrimination threshold, vibration-induced illusion of movement, and kinesthesia – in a population of cervical dystonia patients, unaffected family members, and healthy volunteers (control group). Consenting participants will receive a neurological examination performed by a movement disorders neurologist, followed by an assessment of the four non-motor symptoms. Through analysis of the concurrence of the non-motor features across the three groups of participants, we hope to identify a combination of non-motor symptoms which is more prevalent in the cervical dystonia group, and therefore may be indicative of disease development. Our specific questions are:

    1. What is the appropriate statistical test to assess the association between multiple non-motor features (2, 3, or 4 features) and participant group (patient, non-affected family member, healthy volunteer)?

    1. Based upon the suggested analysis for point 1, what would be an appropriate sample size and power? Expect to enroll 80 subjects per group.

    1. Would Kruskal Wallis (ordinal/interval)/Chi-square (categorical)/Kaplan-Meier (survival) be appropriate statistical tests to assess the difference between three groups for an individual variable identified during the study, e.g. age of onset, gender, income, etc.?

    1. Is a logistical regression model the best statistical test to assess the association of non-motor features with cervical dystonia, based on prevalence rates in the participant groups (patient, non-affected family member, healthy volunteer)? If so, which type of logistical regression would be most appropriate? Recommend using multinomial regression for all three groups or binary logistic regression (looking at two groups at a time) to predict group membership. Given two categories for outcome, take number of subjects in smaller group and divide by 15. This is the number of variables that can be included in the model. Can use bootstrapping to assess stability of model.

    1. What are the best methods through which to report qualitative data related to clinical features of sensory tricks, such as the types of sensory tricks used, frequency of use, effectiveness of use, etc.? Descriptive statistics on any number of clinical features.
 

Jessica Grahl, Pharmacy Resident

Line: 56 to 129
 

Rany Octaria, MPH Student

  • "I am currently finishing my thesis conducting Social Network of Hospital Patient Sharing in TN to identify at-risk facility for multidrug resistant organism spread. My biostatistics advisor, Yuwei Zhu, recommended me to use your service to get input regarding the statistics I can use for my social network analysis."
  • For Table 1, you may want to fit ERGMs (on the binary network, or using the method for count networks proposed in a paper in the Electronic Journal of Statistics) instead of linear regression. For Figure 2, a community detection algorithm based on either modularity or pseudo-likelihood can be run to see whether the automatically detected communities match the geographic distances between these hospitals.
Changed:
<
<
  • Want to identify which facilities are highly connected and why certain facilities are highly influential. Preliminary analysis used UCInet software. Recommend using StatNet software to run ERGM and using permutation tests. An R software package is available to use maximum likelihood for estimates. Recommend doing a sensitivity analysis with different thresholds (ex. 0, 1, >1). Data likely follow a Poisson or zero-inflated distribution; this can be verified by plotting a histogram. Binary network has degree (count of interactions); weight facilities by rank.
>
>
  • Want to identify which facilities are highly connected and why certain facilities are highly influential. Preliminary analysis used UCInet software. Recommend using StatNet software to run ERGM and using permutation tests. An R software package is available to use maximum likelihood for estimates. Recommend doing a sensitivity analysis with different thresholds (ex. 0, 1, >1). Data likely follow a Poisson or zero-inflated distribution; this can be verified by plotting a histogram. Binary network has degree (count of interactions); weight facilities by rank.
 

Ricardo Lugo, Cardiology Fellow

  • "Comparing Serum BNP level with 1) myocardial scar and 2) Ventricular Tachycardia recurrence rates after catheter ablation. I have a basic familiarity with R and am requesting assistance with 1) ensure I am using the appropriate analysis methods and 2) creating descriptive table (specifically patient characteristics stratified by BNP tertiles)."
Revision 430
Changes from r410 to r430
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Line: 6 to 6
 

Added:
>
>

2017 March 30

Hannah Dietrich, Student

  • "We are part of a QI project gathering data on health literacy levels in the pediatric general surgery clinic. We have finished gathering data for this project, and would like to focus our questions mainly on graphics and statistical analyses for our data. We are primarily examining correlations between health literacy scores and other factors such as income, no-shows rates, etc."

2017 March 23

Kristy Broman, Surgery Resident

  • "The question I am trying to answer is whether there is a way to compare two incidence ratio. I am using the SEER database and SEER Stat which has built in modules for calculating age standardized incidence ratio for specific events. The output I get is the total N, the total event number, and the standardized incidence ratio. This is essentially the ratio of observed to expected, but I cannot know how the expected is determined (this is a "black box" within the module. So I want to know if there is a way to essentially compare the already calculated standardized incidence ratios."
  • In cohort of patients with previous colon cancer, the outcome of interest is subsequent GI cancer. Want to know if SIR is significant. Module does not provide standard error or confidence interval for SIR. Reviewed formulas to calculate E* using SIR and given D along with the confidence interval.
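As a rough sketch of the calculation reviewed above, the expected count can be backed out of the reported SIR and the observed count D, and an approximate confidence interval can then be built around the SIR. The interval below uses Byar's approximation to the Poisson limits, which is an assumption here since the notes do not say which formula was reviewed. The function name and the example numbers (40 events, SIR of 1.6) are hypothetical.

```python
import math
from statistics import NormalDist


def sir_confidence_interval(observed, sir, confidence=0.95):
    """Recover the expected count from D and SIR, then return (E, lower, upper).

    The limits use Byar's approximation for a Poisson count (assumption).
    """
    expected = observed / sir  # SIR = D / E, so E = D / SIR
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    d = observed
    lower_count = d * (1 - 1 / (9 * d) - z / (3 * math.sqrt(d))) ** 3
    upper_count = (d + 1) * (
        1 - 1 / (9 * (d + 1)) + z / (3 * math.sqrt(d + 1))
    ) ** 3
    return expected, lower_count / expected, upper_count / expected


# Hypothetical example: 40 observed subsequent cancers with a reported SIR of 1.6.
expected, lower, upper = sir_confidence_interval(40, 1.6)
```

If the interval excludes 1, the observed count differs from the expected count at the chosen confidence level.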

Matthew Duvernay, Pharmacology

  • "I am initiating a pilot study to measure DNA methylation at the F2RL3 gene site in whole human blood samples and correlate this with blood cell function. There is a body of data comparing the levels of methylation in smokers and non-smokers at the particular gene that I am interested in. I would like to learn about how to estimate the optimal sample size needed in each of these two groups based on the variability in the published data sets."
  • Published data demonstrated that expression levels of receptor can change based on methylation status of gene. Hypomethylation is highly correlated with smoking. There is variation in methylation among smokers. Planning to look at platelets and monocytes from smokers and non-smokers. Evaluators will need to be blinded to smoking status. Expect different staining levels within a given cell. Classify cells as positive or negative and calculate proportion that are positive. Need to establish protocol for assessing intensity of expression.
  • Recommend using standard errors from published data to calculate sample size. Include multiple scenarios with a range of variances in proposal. Can also use possible number of samples given specific grant amount to conduct power analysis. With new pilot data, can calculate correlation between methylation and gene expression. Planning to apply for VICTR voucher in the future.

2017 March 16

WITHDREW: Tiffany Sarell, Clinical Pharmacist

2017 March 9

Dillon O'Neill, Medical Student

  • "I have a series of measurements made by 3 different readers. The measurements are a contiguous variable. Each reader read essentially all of the images in the dataset. Each reader also re-read 30 of the measurements 2 times. Need guidance as to most appropriate statistic for inter- and intra- observer reliability."
  • Retrospective look at pre and post angles using films from SCFE patients. Measure increased rate of avascular necrosis (AVN) in hips that were manipulated vs. hips that did not move at all. Determination of stable vs. unstable SCFE uses 15 degree difference between pre and post angles as the cutoff point. The cutoff point was determined a priori.
  • Recommend looking at correlation of deltas (difference between pre and post angles) between the 3 readers. High correlation between readers indicates consistency among readers. Krippendorff's alpha (Stata KRIPPALPHA) allows for multiple readers who do not have to rate every patient; values close to 1 indicate agreement, values near 0 indicate agreement no better than chance, and negative values indicate systematic disagreement. Goal to determine AVN rates in stable and unstable SCFE patients. Only 3 patients developed AVN. Recommend creating plots of data (ex. Bland-Altman-type plot for observed deltas from each reader (y) vs. mean delta (x)). Can also plot individual deltas. Should not compare observed rate to published rate because confidence interval for observed rate is so wide.

2017 March 2

Miguel Cuj, Graduate Student Latin American Studies

  • "My research project is about cross-sectional survey, in rural area of Guatemala about health status. 1) How compute the sample size in a target group in three small communities with only two selection criteria a) beneficiary of social program and b) older population >50 years old. 2) Which analysis statistic could you suggest beyond descriptive statistic? 3) Some issues about use of SP in clinical trial."
  • Concerns with chronic diseases in older population (diabetes, heart disease, etc.). Planning to collect survey data on demographics and health status opinions about armed conflict. Have a list of 150 potential subjects who meet eligibility criteria and their contact information. From this sampling frame, can select sample and randomize order of approaching subjects. For a qualitative study, can continue to collect surveys until the saturation point for information is reached. To calculate sample size based on desired margin of error for estimate of proportion, sample size is approximately 1/e^2, where e is the margin of error (a conservative rule for a 95% confidence interval that assumes a proportion near 0.5). Similarly, can calculate expected standard errors based on sample size you can reach given limited resources. Do not need a power calculation because you are not testing a specific hypothesis. Can utilize information gathered from surveys to identify potential questions for focus groups.
 

2017 February 23

Aaditi Naik, Undergraduate Student

Changed:
<
<
  • "I am an Undergraduate Research Assistant in Dr. David Charles's Movement Disorders lab in the VUMC Neurology Department. My project aims to identify the prevalence of four previously-identified non-motor markers – spatial discrimination threshold, temporal discrimination threshold, vibration-induced illusion of movement, and kinesthesia – in a population of cervical dystonia patients, unaffected family members, and healthy volunteers (control group). Consenting participants will receive a neurological examination performed by a movement disorders neurologist, followed by an assessment of the four non-motor symptoms. Through analysis of the concurrence of the non-motor features across the three groups of participants, we hope to identify a combination of non-motor symptoms which is more prevalent in the cervical dystonia group, and therefore may be indicative of disease development. This study will fill an important unmet need, as to our knowledge there are no published studies assessing the comorbid presentation of these four non-motor symptoms in a single sample."
>
>
  • I am an Undergraduate Research Assistant in Dr. David Charles's Movement Disorders lab in the VUMC Neurology Department. Our project aims to identify the prevalence of four previously-identified non-motor markers – spatial discrimination threshold, temporal discrimination threshold, vibration-induced illusion of movement, and kinesthesia – in a population of cervical dystonia patients, unaffected family members, and healthy volunteers (control group). Consenting participants will receive a neurological examination performed by a movement disorders neurologist, followed by an assessment of the four non-motor symptoms. Through analysis of the concurrence of the non-motor features across the three groups of participants, we hope to identify a combination of non-motor symptoms which is more prevalent in the cervical dystonia group, and therefore may be indicative of disease development. Our specific questions are: 
  1. What is the appropriate statistical test to assess the association between multiple non-motor features (2, 3, or 4 features) and participant group (patient, non-affected family member, healthy volunteer)?
  2. Based upon the suggested analysis for point 1, what would be an appropriate sample size and power? Expect to enroll 80 subjects per group.
  3. Would Kruskal Wallis (ordinal/interval)/Chi-square (categorical)/Kaplan-Meier (survival) be appropriate statistical tests to assess the difference between three groups for an individual variable identified during the study, e.g. age of onset, gender, income, etc.?
  4. Is a logistical regression model the best statistical test to assess the association of non-motor features with cervical dystonia, based on prevalence rates in the participant groups (patient, non-affected family member, healthy volunteer)? If so, which type of logistical regression would be most appropriate? Recommend using multinomial regression for all three groups or binary logistic regression (looking at two groups at a time) to predict group membership. Given two categories for outcome, take number of subjects in smaller group and divide by 15. This is the number of variables that can be included in the model. Can use bootstrapping to assess stability of model.
  5. What are the best methods through which to report qualitative data related to clinical features of sensory tricks, such as the types of sensory tricks used, frequency of use, effectiveness of use, etc.? Descriptive statistics on any number of clinical features.
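The divide-by-15 rule of thumb in item 4 amounts to simple arithmetic, sketched here for the planned 80 subjects per group. The function name is hypothetical, and the rule is only a rough guide to model size.

```python
def max_model_terms(smaller_group_n, subjects_per_term=15):
    """Rough upper bound on the number of model terms for a binary outcome."""
    return smaller_group_n // subjects_per_term


# With 80 subjects in the smaller outcome group, about 5 terms can be supported.
terms_for_80 = max_model_terms(80)
```

A predictor that needs more than one term (for example a categorical variable with several levels) counts once per term against this budget.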

Jessica Grahl, Pharmacy Resident

  • "Antimicrobials and Delirium Questions we were asked to answer: 1) Do we have the culture collected at time of giving antibiotic so we might differentiate whether the infection or the antibiotic caused delirium? 2) Do we have the CAM assessment in continuous scale not just delirium yes/no? Additionally we would like to address the statistical analysis plan associated with this project. Dr. Mayur Patel and Joanna Stollings will be accompanying me."
  • CAM ICU used to determine delirium status (positive, negative, or unable to assess) twice daily. Total number of patients is 521; 150 patients did not receive an antimicrobial. Defined 3 antimicrobial groups. In ICU, sepsis status based on SIRS criteria was collected daily. Other infection status unknown.
  • Difficult to interpret results if large number of patients die and are removed from the analysis. Recommending including death and coma as outcome categories. Specify time windows for primary analysis (ex. 2 days antimicrobial use, 3 days delirium assessment). Is it necessary to have a blank-out period between antimicrobial administration and delirium outcome assessment? Can analyze outcome in 12-hour sliding time windows.

2017 February 9

Rany Octaria, MPH Student

  • "I am currently finishing my thesis conducting Social Network of Hospital Patient Sharing in TN to identify at-risk facility for multidrug resistant organism spread. My biostatistics advisor, Yuwei Zhu, recommended me to use your service to get input regarding the statistics I can use for my social network analysis."
  • For Table 1, you may want to fit ERGMs (on the binary network, or using the method for count networks proposed in a paper in the Electronic Journal of Statistics) instead of linear regression. For Figure 2, a community detection algorithm based on either modularity or pseudo-likelihood can be run to see whether the automatically detected communities match the geographic distances between these hospitals.
  • Want to identify which facilities are highly connected and why certain facilities are highly influential. Preliminary analysis used UCInet software. Recommend using StatNet software to run ERGM and using permutation tests. An R software package is available to use maximum likelihood for estimates. Recommend doing a sensitivity analysis with different thresholds (ex. 0, 1, >1). Data likely follow a Poisson or zero-inflated distribution; this can be verified by plotting a histogram. Binary network has degree (count of interactions); weight facilities by rank.

Ricardo Lugo, Cardiology Fellow

  • "Comparing Serum BNP level with 1) myocardial scar and 2) Ventricular Tachycardia recurrence rates after catheter ablation. I have a basic familiarity with R and am requesting assistance with 1) ensure I am using the appropriate analysis methods and 2) creating descriptive table (specifically patient characteristics stratified by BNP tertiles)."
  • For a patient to have VT, they have to have something wrong with their heart. All patients in the study have some form of cardiomyopathy. The criteria for performing CA has been refined, and 1-3 per week are performed currently. Blood samples were collected at the time of CA procedure to measure BNP levels. There are a total of 59 patients and 30 events. The goal is to use biomarkers to identify which patients are good candidates for CA.
  • Preliminary analysis used Cox PH model for time to recurrent VT. Collected data on patient characteristics (comorbidities) and characteristics of CA procedure. Created BNP tertiles for Table 1; each category has ~20 patients. LVEF and Endocardial Area were shown to be different among the tertiles. Recommend reporting descriptive statistics for overall cohort; do not need to categorize BNP. Can still compare BNP levels between gender groups.
  • Ran Cox PH and multivariable Cox PH models with log(BNP). Given 30 events, you have 1-2 degrees of freedom. Can try non-linear term for BMI (ex. restricted cubic spline with 3-5 knots which will use 2-4 degrees of freedom). Plot overall Kaplan-Meier curve by log(BNP). Plot survival from multivariable model [ex. plot(predict(f, time=c(30, 60, 90)))]. Report concordance index for model. Cross-validation using bootstrapping may not perform well with only 59 patients.
  • May be covered by Cardiology collaboration plan with Meng Xu and Shi Huang.
 

2017 February 2

Miriam Lense, Otolaryngology

  • "At my previous visit, it was suggested I get information about the reliability of the measure I am using for a power analysis for growth curve analyses."
Added:
>
>
  • Collecting parent-reported vocabulary measure at 4 time points and global language and communication assessment and social entrainment measurement at 9 months. Test-retest reliability of measure at 12 months is low (0.61) but higher at 18m (~0.87).
  • Recommend starting with PS software (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize), but it assumes repeated measures are not correlated. Software produces power curves and text for grant proposal; sample size calculations are optimistic. Another option to run simulations of time series data in Stata or R to determine appropriate sample size.
 

Joseph Kuebker, Endourology Fellow

  • "We are going to retrospectively look at two groups of patients who underwent a ureteral stent for a ureteral stone and subsequent ureteoscopy. Our aims are to identify the rate of stone passage and thus subsequent negative ureteroscopy and predictors of this event."
Added:
>
>
  • In 89-92% of cases, the stone does not pass after the stent is placed, and a second surgery is required. Outcome of interest is result of ureteroscopy (positive vs. negative). Also plan to collect BMI, stone size, time between stent and URS, sex, side, use of alpha blocker, size of stent (2-3 categories), and stone location.
  • Recommend calculating sample size using PS software (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize). Select the Dichotomous tab: output=power, design=independent & case control & odds ratio & uncorrected chi-square test, alpha=.05 (Type I error), n=100, p_0=.11 (event rate in controls), m=1 (ratio cases to controls), psi=2; yields power=.407. Software produces power curves and text for grant proposal.
  • Second project looking at radiation dose from KUB plain film x-rays to evaluate kidney stone disease. Generally have 2 x-rays during evaluation. Imaging techniques have changed and can lead to dose creep. Techs sometimes use higher radiation dose for better exposure and interpretability of x-rays. Goal to compare radiation dose between historical plain film x-rays (average 0.7-0.8 mSv) and digital x-rays. Plan to collect age, gender, BMI, and diameter. Recommend a meta-analysis and calculation of confidence intervals.
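The PS inputs listed for the ureteroscopy project (p_0=.11, psi=2, n=100, m=1, alpha=.05) can be roughly cross-checked with a normal approximation to the uncorrected chi-square test. This is a sketch under stated assumptions: the function name is hypothetical, equal group sizes are assumed (m=1), and the PS software may use a different internal formula, so small discrepancies would not be surprising.

```python
import math
from statistics import NormalDist


def power_two_proportions(p0, odds_ratio, n_per_group, alpha=0.05):
    """Approximate two-sided power for comparing two independent proportions.

    p0 is the event rate in controls; the rate in cases is derived from the
    odds ratio. Uses the pooled variance under the null and the unpooled
    variance under the alternative (normal approximation).
    """
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = math.sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group)
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    delta = abs(p1 - p0)
    upper_tail = norm.cdf((delta - z_crit * se_null) / se_alt)
    lower_tail = norm.cdf((-delta - z_crit * se_null) / se_alt)
    return upper_tail + lower_tail


# Inputs taken from the PS settings in the notes.
power = power_two_proportions(p0=0.11, odds_ratio=2, n_per_group=100)
```

Under these assumptions the approximation lands close to the power of .407 reported from PS.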
 

2017 January 26

Drs. Andrew Link & Kristen Hoek, Pathology, Microbiology, Immun

Revision 410
Changes from r390 to r410
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Line: 6 to 6
 

Added:
>
>

2017 February 23

Aaditi Naik, Undergraduate Student

  • "I am an Undergraduate Research Assistant in Dr. David Charles's Movement Disorders lab in the VUMC Neurology Department. My project aims to identify the prevalence of four previously-identified non-motor markers – spatial discrimination threshold, temporal discrimination threshold, vibration-induced illusion of movement, and kinesthesia – in a population of cervical dystonia patients, unaffected family members, and healthy volunteers (control group). Consenting participants will receive a neurological examination performed by a movement disorders neurologist, followed by an assessment of the four non-motor symptoms. Through analysis of the concurrence of the non-motor features across the three groups of participants, we hope to identify a combination of non-motor symptoms which is more prevalent in the cervical dystonia group, and therefore may be indicative of disease development. This study will fill an important unmet need, as to our knowledge there are no published studies assessing the comorbid presentation of these four non-motor symptoms in a single sample."

2017 February 2

Miriam Lense, Otolaryngology

  • "At my previous visit, it was suggested I get information about the reliability of the measure I am using for a power analysis for growth curve analyses."

Joseph Kuebker, Endourology Fellow

  • "We are going to retrospectively look at two groups of patients who underwent a ureteral stent for a ureteral stone and subsequent ureteoscopy. Our aims are to identify the rate of stone passage and thus subsequent negative ureteroscopy and predictors of this event."

2017 January 26

Drs. Andrew Link & Kristen Hoek, Pathology, Microbiology, Immun

  • VR24294 VICTR 010917 "In an influenza vaccine clinical trial, we discovered a large number of both expected and unexpected differentially expressed human genes in primary innate immune cells 1 day after vaccination1. A group of these genes encode RNA-binding proteins and lncRNAs. Numerous studies have shown that these two classes of genes function as posttranscriptional regulators of gene expression2,3. The prolonged expression of pro-inflammatory genes can cause host tissue damage and has been implicated as the cause of autoimmune and other human diseases4. As a consequence, the innate response is tightly regulated and typically short-lived. We hypothesize that the RNA-binding protein and lncRNA genes expressed in human innate immune cells 1 day after stimulation function as either positive or negative posttranscriptional regulators of the innate response. Using human innate cell line models combined with functional and mechanistic experiments, this proposal experimentally tests the ability of candidate RNA-binding protein and lncRNAgenes to function as early innate response genes that posttranscriptionally regulate and modulate the human innate response."
  • Collected 6 immune types at 1, 3, 7, and 28 days post-vaccination from 35 subjects given placebo or influenza vaccine. The 840 samples were analyzed by a third party. Received a list of 80 genes likely involved in innate response and regulating the immune response. Next study will shut down one gene at a time. Will look at those 80 genes that are arrayed on a plate. Assessing each gene is an individual experiment to determine whether gene expression goes up or down. Plan to do this in triplicate and calculate false discovery rate.
  • When you calculate the false discovery rate when comparing two subjects, the power is equal to the alpha (ex. 0.05). What is the level of confidence that a gene labeled as a loser is actually a winner? Recommend analyzing all genes together and ranking genes in order of strength of association. Bootstrap to calculate false negative rate using Efron's method.
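For the investigators' planned false discovery rate calculation across the 80 genes, one common approach is the Benjamini-Hochberg step-up procedure, sketched below. The notes do not name a method, so this choice is an assumption, and it is separate from the bootstrap ranking approach the clinic recommended. The function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Benjamini-Hochberg step-up: find the largest rank k with
    p_(k) <= k * fdr / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per gene experiment.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, fdr=0.05)
```

Only the genes flagged True would be declared differentially expressed at the chosen FDR.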

Maya Yiadom

  • Outcome: time to readmission. Primary analysis: intention to treat analysis for not intend vs. intend to call. Secondary analyses: controls vs. reached vs. not reached.
  • Initial power calculation using PS software with alpha = 0.005 for interim analysis and alpha = 0.048 for final analysis, baseline time to readmission of 11.51 days, 90% power, and minimum detectable effect of 2 days (or 1 day) yielded 1524 and 3048 subjects, respectively. Another sample size calculation assuming a conservative 2% difference yielded 4344 patients per arm enrolled over 1.5 years for 80% power or 5805 patients per arm enrolled over 2 years for 90% power.
  • Recommend adding more looks to analysis. What is the largest sample size that could be enrolled within, say, 1 year? Collect this pilot data then plan a larger study. Another option is to continue enrolling patients until research question is answered. You should study readmissions at 90 days for better power. If there is a difference at 90 days, then there is also a difference at 30 days.

2017 January 19

Kathryn McCrystal Dahir, Medicine

  • "In essence this is a study where we examined rare variants in the ALPL gene that were available in BioVU and looked for expected associations as well as a new associations which were discovered via PheWAS. We had 180 cases plus matched controls which were manually reviewed by two reviewers that were blinded. The manual chart review of the patients record in the SD is in Excel. Additionally we have some very basic statistics in R and summarized here. We would appreciate some advice from bio-stats on better visualization/representation of the data."
  • Location of point mutation in ALPL gene is important in function of hypophosphatasia (HPP). Interested in heterozygous autosomal recessive mutation. New treatment recombinant alkaline phosphatase for pediatric cases. Found 13 rare ALPL variants in BioVU; excluded variant with 11% occurrence because too common. In BioVU, searched for oral surgery clinic visits, dental visits, bone fracture x-rays, and hysterectomies.
  • Want to determine if some SNPs are more pathogenic than others. Recommend using a logistic regression model and reporting odds ratios for disease for each SNP (or compound heterozygous models) along with the confidence intervals.
  • Have funding for statistical support. Contact Frank Harrell, PhD regarding collaboration options.

Leon Scott, Clinical Orthopaedics & Rehabilitation

  • Recommend collecting repeated measures of force from each subject (multiple steps). Calculate average maximum force for each subject. Then calculate confidence intervals for the two devices (want intervals to overlap). To prove equivalence, need to establish what is the likely difference between the two devices (want difference to be clinically trivial). Define a priori what is a clinically significant difference. Can use inter-rater reliability to estimate percentage of variation that is due to the difference in steps; report intraclass correlation coefficient. Available R software package 'CCRM'. Also recommend generating Bland-Altman plots for each subject or combining data across all subjects. Ideally, the points will scatter tightly around a flat horizontal line near zero difference; otherwise, the plot may show where the wearable sensor is not producing a good approximation of the force as measured by the laboratory instrument.
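The quantities behind a Bland-Altman plot can be sketched as follows, using one averaged maximum force per subject from each device as the notes suggest. The function name and data are hypothetical, and the 1.96 multiplier assumes approximately normal differences. Repeated steps within a subject would need a method that accounts for the clustering.

```python
from statistics import mean, stdev


def bland_altman(sensor, reference):
    """Return plot coordinates, mean bias, and 95% limits of agreement."""
    diffs = [s - r for s, r in zip(sensor, reference)]
    means = [(s + r) / 2 for s, r in zip(sensor, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "means": means,  # x-axis: average of the two devices
        "diffs": diffs,  # y-axis: sensor minus laboratory instrument
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Hypothetical per-subject average maximum forces from each device.
wearable = [10.0, 12.0, 14.0, 16.0]
laboratory = [9.0, 12.0, 15.0, 15.0]
result = bland_altman(wearable, laboratory)
```

Whether the resulting limits are acceptable depends on the clinically trivial difference defined in advance.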

2017 January 5

Sandip Chaugai, Clinical Pharmacology

  • "I only have a couple of quick questions on meta-regression analysis of calcium channel blockers in hypertension. I spoke to Daniel Byrne, and he suggested me to attend the clinic."
  • Have long-, intermediate-, and short-acting drug classes; outcomes include mortality and heart failure. Each study randomized and matched patients prior to estimating odds ratios. For each study, need to verify on which covariates patients were matched (ex. diabetes). Can only combine subgroups of studies that adjusted for same covariates in estimation of odds ratio. Potential issue given that the control drug was different in each of the studies. Also concerned with validity of comparing odds ratios among 3 drug classes because the drugs are used in different populations.
  • Need to look at confidence intervals from fixed and random effects models. Recommend including additional tick marks and labels on x-axis (log odds scale). Note that patient-level conclusions cannot be drawn from population-level data (i.e. ecological fallacy). Number of covariates in meta-regression model will be limited by number of studies that can be combined. Recommend deciding which predictors are most important and including them in meta-regression model. The standard errors will reflect the number of studies that were combined (larger SEs with fewer studies). Can assess stability of model by looking at how results change when 1 or 2 of the studies are removed from the meta-regression model.
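
A rough Python sketch of the fixed versus random effects comparison and the leave-one-out stability check described above. It pools log odds ratios without covariates, so it is simpler than a full meta-regression. The DerSimonian-Laird estimator is one common choice assumed here, and the study estimates are hypothetical.

```python
import math

def pool_fixed(log_ors, ses):
    """Inverse-variance fixed-effect pooled log odds ratio and its SE."""
    weights = [1 / se ** 2 for se in ses]
    estimate = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    return estimate, math.sqrt(1 / sum(weights))

def pool_random(log_ors, ses):
    """DerSimonian-Laird random-effects pooled log OR, its SE, and tau^2."""
    weights = [1 / se ** 2 for se in ses]
    fixed, _ = pool_fixed(log_ors, ses)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(log_ors) - 1)) / c)  # between-study variance
    star = [1 / (se ** 2 + tau2) for se in ses]
    estimate = sum(w * y for w, y in zip(star, log_ors)) / sum(star)
    return estimate, math.sqrt(1 / sum(star)), tau2

def leave_one_out(log_ors, ses):
    """Random-effects estimate after dropping each study in turn."""
    return [
        pool_random(log_ors[:i] + log_ors[i + 1:], ses[:i] + ses[i + 1:])[0]
        for i in range(len(log_ors))
    ]

# Hypothetical study-level log odds ratios and standard errors
log_ors = [-0.10, 0.25, 0.40, 0.05]
ses = [0.15, 0.20, 0.30, 0.10]

print("fixed :", pool_fixed(log_ors, ses))
print("random:", pool_random(log_ors, ses)[:2])
print("leave-one-out:", [round(e, 3) for e in leave_one_out(log_ors, ses)])
```

Large swings in the leave-one-out estimates would suggest that the pooled result depends heavily on a single study.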
 

2016 December 22

Autumn Bagwell, Vanderbilt Specialty Pharmacy

  • "We are a group of novice researchers that recently began a number of outcomes projects that are primarily retrospective reviews of cohorts of patients on specialty medications. We have completed some of the project proposals and know our endpoints and project aims, but need assistance in a couple areas: 1) estimating the time/funds required to complete the stats for our projects to better build our budget proposals for potential project sponsors, and 2) ensuring we are applying the appropriate statistical tests based on our endpoints."
Added:
>
>
  • Project 1: Have 200 osteoporosis patients who received Forteo in Endocrinology clinic. Collected data from VSP and non-VSP patients. Outcomes are drug treatment completion (Y/N) and clinical outcomes (DEXA risk score, FRAX score, number of fractures). Planning to do univariate analysis comparing patients who completed drug treatment vs. patients who did not complete drug treatment. Recommend collecting data on duration of treatment completed and adverse events. Dr. Frank Harrell will send rough estimate for budget anticipating 12 projects per year.

Miriam Lense, Otolaryngology

  • Asking for assistance with conducting a power analysis for a growth curve analysis in preparation for a grant. This project is looking at longitudinal trajectories of language development (using questionnaire data) across 2 samples of children (total n~70) with 2-4 variables of interest. I expect to use a random intercept and slopes model and expect to include both linear and quadratic growth terms.
  • Previous studies have shown that approximately 20% of children who have a sibling with a language developmental issue will also be diagnosed with a language developmental issue. Children are divided into 2 groups: low-risk or high-risk of language developmental issues. Language questionnaires will be completed by the parent at 4 time points (9, 12, 15, and 18 months of age, +/- 2 weeks around target time). Planning to do growth curve analysis across all patients, without stratification by risk group.
  • Recommend treating 9-month language measure as baseline and including this as a covariate in the regression model. Can estimate means at each of the other 3 time points. Need to estimate correlation using previous data and plug this into formula to calculate standard error. Since time points are equally spaced, may be able to assume AR(1) correlation structure. Recommend modeling actual time (age in months) when language questionnaire was completed. When estimating trajectories, can include confidence bands around trajectories. When comparing the two curves, look for differences in coefficients at any of the 3 time points. Another option is to randomize children to the number and timing of repeated language questionnaires.
  • Recommend writing proposal and returning to another Biostatistics clinic for additional feedback.
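
To illustrate how an assumed AR(1) correlation feeds into a standard error, here is a minimal Python sketch. The notes do not state which formula was intended, so this uses the variance of one child's mean over k equally spaced visits as an assumed example. The SD and correlation values are hypothetical placeholders for estimates from previous data.

```python
import math

def ar1_corr(k, rho):
    """k x k AR(1) correlation matrix: corr(i, j) = rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(k)] for i in range(k)]

def var_of_subject_mean(sigma, k, rho):
    """Variance of the mean of k repeated measures with AR(1) correlation."""
    corr = ar1_corr(k, rho)
    return sigma ** 2 * sum(sum(row) for row in corr) / k ** 2

# Hypothetical inputs: SD of the language score and lag-1 correlation
sigma, k, rho = 10.0, 3, 0.6
n_children = 70

se = math.sqrt(var_of_subject_mean(sigma, k, rho) / n_children)
print(f"SE of the overall mean across {n_children} children: {se:.2f}")
```

Higher correlation between visits means each extra visit adds less information, so the standard error shrinks more slowly than it would with independent measurements.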
 

2016 December 15

Changed:
<
<

Melissa Henry, Department of Hearing and Speech Sciences

  • TBD
>
>

WITHDREW: Melissa Henry, Department of Hearing and Speech Sciences

 

Jennifer Erves, Internal Medicine, Meharry Medical College

Changed:
<
<
  • Interpretation of ordinal logistic regression
>
>
  • Interpretation of ordinal logistic regression. Race (1=black, 2=other, 3=white, reference). Barriers score consists of 7 questions, and Benefits score consists of 7 questions. May consider dropping age, income, grade, and race from model and look at impact on odds ratios for scores. Need to report descriptive statistics for scores stratified by ordinal outcome variable. Since Barriers score is significant, it will be useful to tease apart which barriers are significant. Any correlation between independent variables does not affect validity of the model. Degree of collinearity can be assessed by calculating variance inflation factor (VIF).
  • Second model includes 4 interactions between race and scores. Need to exclude race=2 from the model and to test proportional odds assumption in the model. Recommend using Stata syntax to generate interaction terms and using 'lincom' command to calculate meaningful odds ratios. May calculate and plot predicted probabilities for each level of ordinal outcome stratified by race group. Also recommend adding figures to explain model graphically (ex. boxplots for each score stratified by outcome and by race).
  • Additional options: May consider a stratified analysis within race groups. Dichotomizing the outcome variable for binary logistic regression is appropriate to simplify interpretation. Will still have issues with power if polytomous regression is used.
  • Reference: Statistical Models for Biomedical Researchers by William Dupont
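
The arithmetic behind combining a main effect and an interaction term into one odds ratio (what Stata's 'lincom' reports) can be sketched in Python. The coefficient, variance, and covariance values below are hypothetical; in practice they come from the fitted model's coefficient vector and variance-covariance matrix.

```python
import math

def lincom_odds_ratio(b_main, b_inter, var_main, var_inter, cov, z=1.96):
    """Odds ratio and 95% CI for the sum of two model coefficients."""
    estimate = b_main + b_inter
    # Var(b1 + b2) = Var(b1) + Var(b2) + 2 * Cov(b1, b2)
    se = math.sqrt(var_main + var_inter + 2 * cov)
    return (
        math.exp(estimate),
        math.exp(estimate - z * se),
        math.exp(estimate + z * se),
    )

# Hypothetical values: Barriers score effect in the reference race group
# (b_main) plus the race-by-Barriers interaction (b_inter)
or_, lower, upper = lincom_odds_ratio(
    b_main=-0.20, b_inter=0.08, var_main=0.004, var_inter=0.009, cov=-0.003
)
print(f"OR per 1-point increase = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

The covariance term matters: ignoring it typically misstates the width of the interval for the combined effect.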
 

2016 December 8

Brent Cameron, Radiation Oncology Resident

Line: 55 to 102
 
  • "My faculty mentor is Cecilia Chung MD MPH; she is in the Division of Rheumatology and Immunology. I am hoping to attend Biostats Clinic to discuss my project on Resistant Hypertension in patients with Lupus. I am trying to apply for VICTR Biostatistics funding and would like a quote for the VICTR process. Additionally, we are in the process of extracting data from the synthetic derivative and would like advice on how to best format our dataset for analyses.
  • "The project is based in the Synthetic Derivative, so we have extensive de-identified longitudinal data on a cohort of n=1136 patients with Lupus as defined by an algorithm that has been validated to find patients with Lupus. The primary outcome of the study is to look at the incidence rate and prevalence of resistant hypertension (Blood pressure that is not controlled on 3 or more blood pressure medications) in patients with Lupus. We eventually hope to compare this to a matched control population (but for now we are just focusing on establishing the incidence rate and prevalence in the Lupus cohort). When establishing the incidence rate of resistant hypertension- we will pay close attention to the temporal relationship of Lupus and resistant hypertension. Patients will be considered to have resistant hypertension only in cases where this occurred AFTER the first ICD9 code for Lupus. Incidence rate will be defined as: patients with a history of SLE and development of resistant hypertension after first SLE ICD9 code / person years of observation time. We will extract data on patient age, gender, race, ethnicity, BMI, cholesterol (and the rest of the lipid panel), Creatinine, GFR, Lupus related labs including: ANA, anti-dsDNA (yes versus no), antiphospholipid antibodies (yes versus no), C3, C4. We are also interested in extracting data on comorbidities including Type 2 Diabetes (marked with a flag in the synthetic derivative), end stage renal disease, myocardial infarction, stroke (all based on ICD9 codes). We will also extract data on medications as listed in the synthetic derivative- categories of medications we are extracting are: Lupus medications (immunomodulators), Anti-malarials (used in treatment of Lupus), Corticosteroids and Blood pressure medications. We will possibly compare differences in the covariates listed above in patients with resistant hypertension versus patients without resistant hypertension. 
We will possibly compare differences in the covariates listed above in patients with resistant hypertension versus patients with controlled hypertension versus patients without hypertension.
  • "I have additionally attached a word document with a draft that has empty tables showing how I was thinking of displaying the data."
Changed:
<
<
  • VICTR application (35 hours) for statistical support (data analysis May 2017)
>
>
  • Recommend applying for 35-hour VICTR voucher for statistical support (data analysis May 2017)
 
  • Excel database structure for repeated SCr measurements. Recommend using REDCap database if possible.
  • Retrospective study from 1995-present. Expect ~200 out of 1136 patients to have resistant hypertension. Goal to establish incidence and prevalence rates. Survival analysis models 1) unadjusted and 2) adjusted for renal disease, hyperthyroidism, sleep apnea
Line: 1739 to 1787
 
META FILEATTACHMENT attachment="VR7850.R1_Pre-review_Questions_Round_2_10.25.16.docx" attr="" comment="" date="1479150220" name="VR7850.R1_Pre-review_Questions_Round_2_10.25.16.docx" path="VR7850.R1_Pre-review_Questions_Round_2_10.25.16.docx" size="517250" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="VICTR_Research_Proposal_Roman-no_CPT_2016.docx" attr="" comment="" date="1479150228" name="VICTR_Research_Proposal_Roman-no_CPT_2016.docx" path="VICTR_Research_Proposal_Roman-no_CPT_2016.docx" size="43476" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="BotScoringSheet.pdf" attr="" comment="" date="1479498680" name="BotScoringSheet.pdf" path="BotScoringSheet.pdf" size="345672" user="AmyPerkins" version="1"
Added:
>
>
META FILEATTACHMENT attachment="reserve_biostat_clinic_Thursday__January_19th.html" attr="" comment="" date="1484241042" name="reserve_biostat_clinic_Thursday__January_19th.html" path="reserve_biostat_clinic_Thursday_ January_19th.html" size="10182" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="VPA_poster_AES_2008_v5_12-3-08.pdf" attr="" comment="" date="1485540472" name="VPA_poster_AES_2008_v5_12-3-08.pdf" path="VPA_poster_AES_2008_v5_12-3-08.pdf" size="206311" user="AmyPerkins" version="1"
Revision 390
Changes from r370 to r390
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Click here for older notes
Added:
>
>

2016 December 22

Autumn Bagwell, Vanderbilt Specialty Pharmacy

  • "We are a group of novice researchers that recently began a number of outcomes projects that are primarily retrospective reviews of cohorts of patients on specialty medications. We have completed some of the project proposals and know our endpoints and project aims, but need assistance in a couple areas: 1) estimating the time/funds required to complete the stats for our projects to better build our budget proposals for potential project sponsors, and 2) ensuring we are applying the appropriate statistical tests based on our endpoints."

2016 December 15

Melissa Henry, Department of Hearing and Speech Sciences

  • TBD

Jennifer Erves, Internal Medicine, Meharry Medical College

  • Interpretation of ordinal logistic regression

2016 December 8

Brent Cameron, Radiation Oncology Resident

  • "Our questions are going to be centered around the specific statistical methods for analyzing our data. We plan on applying for VICTR voucher to have a biostatistician analyze the data. The trial involves quality of life metrics as well as more objective clinical metrics for evaluation of stereotactic radiosurgery for medically refractory tremor patients. Patients fill out questionnaires at baseline enrollment of the trial, then 3, 6, 9, and 12 months after the trial. We have enrolled 22 patients thus far. Some patients have completed the 12 month period. Others have just enrolled and may only have 3 month time point. As in any trial, some patients did not show up for all the appointments so not every patient may have all data points. The questionnaires have 30+ questions at each time point. We don’t know for sure which questions on the survey may be positive. There is published data on similar studies using a different technique. We would like to report the Vanderbilt experience to show that our method is equivalent to others."
  • Goal is to enroll 30 patients total. Clinical assessment of tremor (ordinal 0-4) and handwriting (ordinal 0-4) at baseline and 3, 6, 9, and 12 months. Psychological assessment at baseline and 6 months. There are slightly different questions on the questionnaires. Aims are to show improvement in quality of life and clinical assessments after the procedure and to show efficacy is similar to gamma knife surgery (using published data).
  • You will need to develop a primary analysis strategy. Can generate a combined score for most important factors. Recommend calculating area under the curve for this score over time and dividing by the length of time the patient was followed to obtain a time-averaged score. Can do a t test or signed-rank test on this calculated average. An exploratory analysis plotting scores for the factors may inform which factors can be clustered together. Recommend applying for 90-hour VICTR award for biostatistics support.
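
The time-averaged area under the curve suggested above can be sketched as follows. This is a minimal Python illustration using the trapezoid rule; the function name and the scores are hypothetical, and it assumes each patient has at least two visits.

```python
def time_averaged_auc(times, scores):
    """Trapezoid-rule area under the score curve divided by follow-up length."""
    if len(times) < 2 or len(times) != len(scores):
        raise ValueError("need at least two matched time points")
    points = list(zip(times, scores))
    area = sum(
        (t1 - t0) * (s0 + s1) / 2
        for (t0, s0), (t1, s1) in zip(points, points[1:])
    )
    return area / (times[-1] - times[0])

# Hypothetical tremor scores (0-4) for two patients with different follow-up
full_follow_up = time_averaged_auc([0, 3, 6, 9, 12], [4, 3, 2, 2, 1])
short_follow_up = time_averaged_auc([0, 3], [4, 3])
print(full_follow_up, short_follow_up)
```

Dividing by follow-up length puts patients with 3 months and 12 months of data on the same scale, so a single summary per patient can go into the t test or signed-rank test.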

Jennifer Erves, Internal Medicine, Meharry Medical College

  • "I am analyzing data to identify factors influencing parental willingness of adolescent participation in a clinical trial. The current analysis is an ordinal logistic regression, and we are inquiring if we should use a binary logistic regression. We are also inquiring of the Chi-Square analyses we propose to use to identify if racial differences exist in parental willingness of adolescent participation in clinical trials."
  • Started with 290 participants and were advised to remove patients with missing data (n = 51). Sample size is 239 participants. Willingness to participate is ordinal on a 5-point scale. You can use a proportional odds model for this ordinal outcome. Dichotomizing this outcome variable for binary logistic regression is appropriate to simplify interpretation. May want to use a global chunk test comparing models with and without interactions with race.
  • If race is not a significant predictor in the regression model, then it is not recommended to continue with the chi-square test. If you do continue with the chi-square test, you may need to use Bonferroni's adjustment for multiple comparisons (although this is a very conservative approach). You will need to clearly explain your approach in the methods section.
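
For reference, the Bonferroni adjustment mentioned above multiplies each p-value by the number of comparisons and caps the result at 1. A minimal Python sketch with hypothetical p-values:

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical p-values from three pairwise chi-square comparisons
raw = [0.01, 0.04, 0.50]
print([round(p, 3) for p in bonferroni(raw)])
```

With three comparisons, a raw p-value of 0.04 no longer falls below 0.05 after adjustment, which shows why the approach is described as conservative.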

2016 December 1

Adrienne Roman, Department of Hearing and Speech Sciences

  • "I am a postdoc in the Department of Hearing and Speech Sciences in Audiology. We are applying for a VICTR grant and received feedback on our application asking for us to consult with a biostatistician to better plan analyses. I have attached the pre-review questions we need to respond to as well as our application to VICTR. Our methodology is a bit complicated, but the goal of the study is to tell if individuals with cochlear implants (CIs) are affected when we increase individual's access to sound by having them sleep with their CIs on during the night. It will be a 5 week study where 2 weeks, individuals will sleep with their CIs on. We will also have other physiological data (cortisol and actigraphy measures) in addition to self-report surveys. Any feedback or recommendations would be greatly appreciated regarding advisement on statistical counseling."

Miller Tracy, Department of Psychiatry

  • Our project is analyzing data from the Bruininks-Oseretsky Test of Motor Proficiency (BOT2), which is an individually administered test used to measure gross and fine motor skills. I have attached a copy of the scoring page with the subtests that we use circled. We would like to do our analysis separately for kids (7-10 years) and adults (18-35 years). Within these groups we are comparing scores between participants that are typically developing and those that are on the autism spectrum (1 = kid w autism, 2 = TD kid, 3 = adult w autism, 4 = TD adult). We would like to compare differences in point scores between the typically developing and autism groups while controlling for age and possibly WASI score, which is a partial cognitive assessment that measures IQ. Note that point scores are derived from raw scores on the scoring sheet. We’d like to compare point score differences for subtest totals but it might also be beneficial to test differences on tasks within subtests. We are having trouble deciding which model would be best to use for this kind of data and analysis.
  • Each subtest score is ordinal. There are 20-30 subjects per group.
  • Recommend using rank-sum test and proportional odds regression (or polytomous regression which requires a large number of parameters). May consider generating scatterplots and boxplots to visualize data. Permutation tests assess whether the observed effect could arise by chance; the p-value is the proportion of permutations in which the test statistic is as extreme as or more extreme than the observed value. Can conduct separate regression analysis with ASD group using ordinal severity score.
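
A small Python sketch of the permutation p-value described above. It enumerates every relabeling of the pooled scores, which is only feasible for small groups; with 20-30 subjects per group a random sample of permutations would be used instead. The difference in means is an assumed test statistic and the scores are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    n = len(pooled)
    extreme = total = 0
    for idx in combinations(range(n), len(group_a)):
        chosen = set(idx)
        perm_a = [pooled[i] for i in idx]
        perm_b = [pooled[i] for i in range(n) if i not in chosen]
        # Count relabelings at least as extreme as the observed difference
        if abs(mean(perm_a) - mean(perm_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical subtest point scores for two small groups
print(permutation_p_value([1, 2, 3], [4, 5, 6]))
```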

2016 November 17

Tenisha Hinners, Pathology Resident

  • "I have been working on a retrospective case-control pilot study with Dr. Alison Woodworth, who was previously a faculty member in the Department of Pathology, Microbiology, and Immunology. The objective of the study is to determine the diagnostic utility of four serum markers (hCG, AFP, CA-125, CRP) and maternal age to predict ectopic pregnancy (EP) at presentation to the emergency department in pregnant women with symptoms of vaginal bleeding, abdominal pain, or cramping. Specifically, we want to know if a combination of these markers into a multivariable logistic regression model will provide a more powerful predictor of pregnancy outcome (viable pregnancy vs. spontaneous abortion vs. ectopic pregnancy) than any one marker on its own. We have collected data on a small sample size of 122 (16 EP, 30 spontaneous abortions (SA), 51 viable intrauterine pregnancies (VIP)) and want to know if this type of analysis is feasible with our numbers."
  • Gold standard for EP diagnosis is transvaginal ultrasound or laparoscopic surgery for EP documented in medical chart. Can look for patterns in scatterplots of each combination of markers with dots color coded by diagnosis. Compare means of biomarkers among diagnosis groups in an exploratory analysis. Recommend regressing each of the 4 markers against EP individually. May consider creating a weighted score of the markers but may still not achieve statistical significance. Will not have adequate power for a logistic regression model given sample size.
  • Recommend applying for 35-hour VICTR voucher for statistical support.

Logan LeBlanc, 3rd year medical student

  • The Vanderbilt Street Psychiatry Program assists mentally ill homeless patients in a street setting. Through a partnership with a local non-profit, qualifying patients are also applied for disability+medicaid coverage, and a small cohort of patients (~60) have been approved and received this coverage. Experientially, our program has recognized that patients have a significant reduction in frequency of VUMC ED visits and hospitalizations once they receive disability+medicaid coverage. We are hoping to better investigate and quantify this reduction.
  • We are conducting a retrospective cohort study among patients who have been approved for disability+medicaid coverage through our program. We have used starpanel records to total the number of ED visits, hospitalizations, and total length of stay for each patient in the cohort in the 1-year period preceding medicaid approval and the 1-year period following approval. We have also collected similar data for the 6-month and 18-month periods before and after approval, and I have questions about the best method of analysis.
  • Recommend calculating rate of ED use per 100 person-months before and after approval. Use total calendar time since approval even if patient has not been seen in the ED for a period of time. This assumes that the likelihood of transiency is the same for a pre-specified time window (ex. 6, 12, or 18 months) before and after approval. May want to consider limiting analysis to group of patients with documented ED visit at least 6 months (or 12m, 18m) prior to approval. Would know that patient lived in the area during that time.
  • Alternatively, could vary time window based on duration of specific patient's records. Potential bias with underestimating ED utilization prior to approval and bias with patients being more likely to stay in the area once they have approval and are receiving care.
  • Can return to clinic for additional assistance or apply for 35-hour VICTR voucher for statistical support.
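
The rate calculation recommended above can be sketched in Python as follows. The visit counts are hypothetical, and a fixed 12-month window before and after approval is assumed for every patient.

```python
def rate_per_100_person_months(event_counts, months_observed):
    """Events per 100 person-months of observation."""
    return 100 * sum(event_counts) / sum(months_observed)

# Hypothetical ED visit counts for three patients, 12 months each side
window = 12
visits_before = [4, 2, 6]
visits_after = [1, 0, 3]
months = [window] * len(visits_before)

before = rate_per_100_person_months(visits_before, months)
after = rate_per_100_person_months(visits_after, months)
rate_ratio = after / before

print(f"before={before:.1f}, after={after:.1f}, rate ratio={rate_ratio:.2f}")
```

Person-months are counted over the full calendar window even when a patient has no visits, matching the recommendation to use total calendar time since approval.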

2016 November 10

Jocelyn Durlacher, Medical Student

  • "My faculty mentor is Cecilia Chung MD MPH; she is in the Division of Rheumatology and Immunology. I am hoping to attend Biostats Clinic to discuss my project on Resistant Hypertension in patients with Lupus. I am trying to apply for VICTR Biostatistics funding and would like a quote for the VICTR process. Additionally, we are in the process of extracting data from the synthetic derivative and would like advice on how to best format our dataset for analyses.
  • "The project is based in the Synthetic Derivative, so we have extensive de-identified longitudinal data on a cohort of n=1136 patients with Lupus as defined by an algorithm that has been validated to find patients with Lupus. The primary outcome of the study is to look at the incidence rate and prevalence of resistant hypertension (Blood pressure that is not controlled on 3 or more blood pressure medications) in patients with Lupus. We eventually hope to compare this to a matched control population (but for now we are just focusing on establishing the incidence rate and prevalence in the Lupus cohort). When establishing the incidence rate of resistant hypertension- we will pay close attention to the temporal relationship of Lupus and resistant hypertension. Patients will be considered to have resistant hypertension only in cases where this occurred AFTER the first ICD9 code for Lupus. Incidence rate will be defined as: patients with a history of SLE and development of resistant hypertension after first SLE ICD9 code / person years of observation time. We will extract data on patient age, gender, race, ethnicity, BMI, cholesterol (and the rest of the lipid panel), Creatinine, GFR, Lupus related labs including: ANA, anti-dsDNA (yes versus no), antiphospholipid antibodies (yes versus no), C3, C4. We are also interested in extracting data on comorbidities including Type 2 Diabetes (marked with a flag in the synthetic derivative), end stage renal disease, myocardial infarction, stroke (all based on ICD9 codes). We will also extract data on medications as listed in the synthetic derivative- categories of medications we are extracting are: Lupus medications (immunomodulators), Anti-malarials (used in treatment of Lupus), Corticosteroids and Blood pressure medications. We will possibly compare differences in the covariates listed above in patients with resistant hypertension versus patients without resistant hypertension. 
We will possibly compare differences in the covariates listed above in patients with resistant hypertension versus patients with controlled hypertension versus patients without hypertension.
  • "I have additionally attached a word document with a draft that has empty tables showing how I was thinking of displaying the data."
  • VICTR application (35 hours) for statistical support (data analysis May 2017)
  • Excel database structure for repeated SCr measurements. Recommend using REDCap database if possible.
  • Retrospective study from 1995-present. Expect ~200 out of 1136 patients to have resistant hypertension. Goal to establish incidence and prevalence rates. Survival analysis models 1) unadjusted and 2) adjusted for renal disease, hyperthyroidism, sleep apnea
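
A minimal Python sketch of the incidence rate as defined in the request (new cases after the first SLE code divided by person-years of observation). The follow-up records are hypothetical, and the confidence interval uses a log-scale normal approximation, which is an assumption and only one of several possible choices.

```python
import math

def incidence_rate(follow_up, z=1.96):
    """Rate per person-year with an approximate 95% CI.

    follow_up holds (years, had_event) pairs, where years runs from the
    first SLE code to resistant hypertension or to the end of the record.
    """
    events = sum(1 for _, had_event in follow_up if had_event)
    person_years = sum(years for years, _ in follow_up)
    if events == 0:
        raise ValueError("log-scale interval needs at least one event")
    rate = events / person_years
    half_width = z / math.sqrt(events)  # SE of log(rate) is about 1/sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)

# Hypothetical records, not Synthetic Derivative data
records = [(2.0, True), (5.0, False), (3.0, True)]
rate, lower, upper = incidence_rate(records)
print(f"{1000 * rate:.0f} per 1000 person-years ({1000 * lower:.0f} to {1000 * upper:.0f})")
```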

Timothy Hegeman, Cardiovascular Medicine Fellow

  • "I have a project and would like to review the feasibility if possible. The practice of carotid artery stenosis screening using duplex ultrasound is common, but the impact of such screening is unknown. Question: Does screening with carotid ultrasound improve outcomes in cardiac surgery? Patients: All patient undergoing the CABG, SAVR, MVR, MVRe, TAVR, Aortic arch repair, LVAD, Transplant. Exposure: Receiving a carotid ultrasound at VUMC within 12 months of the procedure. Primary outcomes: Incidence of perioperative mortality and/or stroke. Secondary outcomes: mortality, stroke, for the CABG subgroup the number of grafts, LOS, reimbursement/cost."
  • Have 10 years of data and 10,000 cardiac surgeries with 40% having had an ultrasound. Expect event rates of 2-4% stroke and 2-4% mortality. Collecting age and status of carotid disease, smoking status, hyperlipidemia, and hypertension in 12 months prior to procedure. Do not have data on patients who had ultrasound and physician decided to change procedure or not to do the planned procedure, which will bias results. Recommend altering research question or narrowing population to reduce bias. Limitation of study when reason for ultrasound unknown.
  • Patients at higher risk of stroke/mortality (ex. having carotid disease) may be more likely to have ultrasound. Certain procedures (ex. valve surgery) have a higher risk of stroke. Urgency of procedure may be related to higher mortality. TAVR really only done in last 4 years. May be difficult to control for large number of procedures in statistical model. May consider adding some procedures to exclusion criteria.
  • Recommend applying for 90-hour VICTR award for statistical support. Will need to control for calendar year to account for changes in surgery procedure and the fact that fewer ultrasounds are being done in the last few years.

2016 November 3

Scott Karpowicz, Pharmacy Administration Resident

  • "We previously presented to the Monday clinic in August regarding my project proposal to examine the impact of a discharge prescription service on hospital readmissions at Vanderbilt. We’re currently in the application process for a VICTR grant (VR22383), and we’d like to request biostatistics assistance with data analysis. Dan Byrne provided some initial feedback on our VICTR application and requested that we visit another clinic for further discussion. Our responses to his questions are attached."
  • Note potential for physicians to recommend Meds-to-Beds service more often for sicker patients (indication bias). Possibility that patients who decline service cannot pay for meds at the bedside. Also unable to determine whether patients who decline service actually have the prescription(s) filled at an outside pharmacy.
  • Recommend calculating propensity score and either 1) including propensity score in your regression model or 2) weighting regression by propensity score. Intention-to-treat analysis will help with indication bias.
  • Will need statistical support to calculate propensity scores and build regression models.
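
As a rough illustration of option 2 above (weighting by propensity score), the Python sketch below computes inverse probability of treatment weights and a weighted difference in readmission risk. It assumes the propensity scores have already been estimated elsewhere (for example, from a logistic regression); all values are hypothetical.

```python
def ipw_weights(treated, propensity):
    """Inverse probability weights: 1/p if treated, 1/(1-p) if not."""
    return [1 / p if t else 1 / (1 - p) for t, p in zip(treated, propensity)]

def weighted_risk(outcome, weights, include):
    """Weighted proportion with the outcome among the selected patients."""
    num = sum(w * y for w, y, keep in zip(weights, outcome, include) if keep)
    den = sum(w for w, keep in zip(weights, include) if keep)
    return num / den

# Hypothetical data: service use, 30-day readmission, estimated propensity
treated = [True, True, False, False, True, False]
readmitted = [0, 1, 1, 0, 0, 1]
propensity = [0.8, 0.6, 0.3, 0.2, 0.5, 0.4]

w = ipw_weights(treated, propensity)
risk_treated = weighted_risk(readmitted, w, treated)
risk_control = weighted_risk(readmitted, w, [not t for t in treated])
print(f"weighted risk difference = {risk_treated - risk_control:.3f}")
```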

Yaa Kumah-Crystal, Department of Pediatrics

  • "I would like to review my criteria for identifying good matching controls for a cluster analysis I will be doing and I would like to run some thoughts by you statisticians to see if you have any additional suggestions.
  • "Aim: This study aims to improve communication between families managing pediatric diabetes and their providers. We hope to demonstrate that using before-visit questionnaires to help families identify their barriers to adherence will lead to better communication, and increased documentation of the family’s barriers to adherence in the provider’s clinical notes.
  • "Population: We have 17 provider participants. We have 102 intervention patients who will complete before-visit questionnaires to identify their diabetes barriers before their clinic visit. Each of the providers will have encounters with 6 different intervention patients that have completed a before-visit questionnaire. We will evaluate the providers notes after the encounter with their 6 intervention patients to see if their notes show an increased frequency of documentation of barriers to diabetes adherence. We will compare the frequency of this documentation of barriers to adherence for the providers notes in their 6 intervention patients that completed the before visit questionnaire prior to their clinic visit compared to the frequency of documentation of barriers for 12 control patient notes that did not complete a before visit questionnaire.
  • "Data collection: We will match the intervention patients to patients seen by the same provider during the intervention, and we will also use historical notes from the intervention patients seen by the same provider prior to the intervention as a basis of comparison. Patients will be matched based on clinical criteria relevant to their diabetes management and related to their potential barrier to adherence. The matching criteria will include: Age, gender, A1C, and duration of diabetes. We will use cluster sampling to determine the changes in documentation per provider for each cluster group. We will perform analysis on clinical notes generated from notes for patients that are in the intervention compared to notes for patients that are not in the intervention, that are generated during the intervention period. Notes generated by providers from patients that are not participating in the intervention will be analyzed to compare changes in documentation.
  • Sample Size Justification and Statistical Analysis Plan: In this study we plan to enroll 17 eligible providers with 103 patients in the intervention group and 17 providers with 206 patients in the control group within a 6 month duration period. We target enrollment in order to achieve an average of 6 intervention patient participants for each provider participant, and 12 control patients per provider participant. We will use Generalized Estimating Equation (GEE) method to adjust for the cluster effect within provider and to determine the intervention effect on the outcome of barriers to adherence, which is a grading scheme on a scale from 0 to 5. The average degree of documentation in the preliminary study was 1.55 (SD 1.72). Based on our preliminary study, in the worst case scenario for our cluster evaluation where the correlation coefficient in the clusters is 1, the valid sample size will reduce to the number of providers, which is 17 in both intervention and control groups. Assuming that the difference in the experimental and control means is 1.5 with standard deviation 1.7, based on the two-sample t-test statistics we will be able to reject the null hypothesis that the population means of the experimental and control groups are equal with probability (power) .829. In a best case scenario where the correlation coefficient is 0 within the clusters, it is equivalent to have a valid sample size of 103 in the experimental group and 206 in the control group. The power to reject the null hypothesis that the population means of the experimental and control groups are equal will increase to 1.000. The Type I errors associated with the previous two power calculations are 0.05. This analysis was performed using PS: Power and Sample Size Calculation version 3.0.43, by William D. Dupont and Walton D. Plummer, Jr."
  • Recommend a cluster randomized trial where the physicians are randomized to intervention or control (8 in each group) without any crossover. Control group will fill out another form or complete no forms. Will need at least 6 patients per physician. Outcome is whether physician documents certain information in the medical chart (ordinal variable with 6 levels). Will need to review physician documentation in a patient's chart even if the patient did not agree to complete the form.
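
The worst-case and best-case sample sizes described in the request follow from the usual design effect for clustered data, 1 + (m - 1) x ICC, where m is the number of patients per provider. A minimal Python sketch, using 102 intervention patients (17 providers x 6 patients) as in the Population paragraph; the intermediate ICC of 0.1 is a hypothetical value for illustration.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_patients, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_patients / design_effect(cluster_size, icc)

for icc in (0.0, 0.1, 1.0):
    n_eff = effective_sample_size(102, 6, icc)
    print(f"ICC={icc}: effective n = {n_eff:.1f}")
```

With ICC = 1 the 102 patients are worth only 17 independent observations (one per provider), and with ICC = 0 all 102 count, which matches the two scenarios in the power calculation.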
 

2016 October 27

Mali Schneiter, DO

  • Review of protocol: "Risk of Endometrial Hyperplasia and Carcinoma in Marathon Runners: A Cross Sectional Survey"
Added:
>
>
  • Risk factors for endometrial cancer include polycystic ovarian syndrome, early menarche, late menopause, high estrogen exposure, anovulatory processes, and obesity.
  • Sample size will depend on incidence of endometrial cancer in general population. Control data from WHO, but this does not include BMI data. This could result in confounding by BMI, if BMI affects cancer risk. It would be ideal to match on age and BMI. At which point in time does BMI matter?
  • May consider collecting pilot data demonstrating reduced risk of cancer in marathon runners. May also consider using available data in Synthetic Derivative on endometrial cancer diagnosis. This information could be used in a power calculation to assess feasibility of larger study.

Kelsey Gregory, Pediatrics Resident

  • Review of protocol: "iSLEEP (Improving Safe Sleep Learning and Education in the Early Period)"
  • Plan to enroll 200 mothers in a 2-month period at VUMC nursery. Randomized to 1 of 3 interventions: standard oral teaching, standard oral teaching + Video A, standard oral teaching + Video B.
  • May consider weighting questions when calculating survey summary score. Expect to see improvement in scores and one intervention to show higher level of improvement.

Rebekah Nevel, Pediatric Pulmonary Fellow

  • "I am doing a project on growth in children with one type of rare interstitial pediatric lung disease. I have a specific question regarding completion of a Kaplan Meier curve on duration of continuous supplemental oxygen requirement in those with and without failure to thrive."
  • Nested retrospective cohort within prospective study. Need to be cautious of immortal time bias when patients have been tracked since birth. Patients were enrolled at diagnosis (generally first 1-2 months of life).
  • Measurements: age at coming off supplemental O2, whether currently on supplemental O2, current age, and initial weight percentile.
  • Must define time of cohort entry (initiation of O2), time of exit (come off O2), end of study (8/2016), status (0 = still on O2, 1 = came off O2). Patients still on O2 at end of study are censored at that time (8/2016).
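
The entry, exit, and status definitions above map directly onto the inputs of a Kaplan-Meier estimate. The following is a minimal pure-Python sketch, assuming time is measured from initiation of O2; the function name and the example data are hypothetical, and in practice a statistical package would be used and one curve would be drawn per group (with and without failure to thrive).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the probability of still being on O2.

    times  : time from O2 initiation to coming off O2 or to censoring
    events : 1 = came off O2, 0 = still on O2 at end of study (censored)
    Returns a list of (time, probability) pairs at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # events and censorings leaving the risk set
    at_risk = len(times)
    prob = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            prob *= 1.0 - d / at_risk
            curve.append((t, prob))
        at_risk -= exit_counts[t]
    return curve


# Hypothetical data (months on O2); status 0 means censored at end of study.
months = [3, 5, 5, 8, 12, 12, 20]
status = [1, 1, 0, 1, 0, 1, 0]
for t, p in kaplan_meier(months, status):
    print(f"month {t}: estimated proportion still on O2 = {p:.3f}")
```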
 

2016 October 13

Joey Starnes, MD/MPH Student

Line: 1644 to 1732
 
META FILEATTACHMENT attachment="Aims-10-21-15.docx" attr="" comment="" date="1446130826" name="Aims-10-21-15.docx" path="Aims-10-21-15.docx" size="12834" user="LiWang" version="1"
META FILEATTACHMENT attachment="Pilot_Full_Protocol_4.7.16.docx" attr="" comment="Kropski - Phase Ib Trial protocol" date="1460554697" name="Pilot_Full_Protocol_4.7.16.docx" path="Pilot Full Protocol 4.7.16.docx" size="241895" user="JonKropski" version="1"
META FILEATTACHMENT attachment="Survey_Research_Protocol.docx" attr="" comment="" date="1476826956" name="Survey_Research_Protocol.docx" path="Survey Research Protocol.docx" size="17624" user="AmyPerkins" version="1"
Added:
>
>
META FILEATTACHMENT attachment="Gregory_Health_Sciences_Protocol_for__iSLEEP.doc" attr="" comment="" date="1477580922" name="Gregory_Health_Sciences_Protocol_for__iSLEEP.doc" path="Gregory_Health_Sciences_Protocol_for_ iSLEEP.doc" size="34304" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="VR22383_Pre-review_questions_for_PI.docx" attr="" comment="" date="1477940882" name="VR22383_Pre-review_questions_for_PI.docx" path="VR22383_Pre-review_questions_for_PI.docx" size="80151" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="J_Durlacher_Tables_for_Biostats_Clinic.docx" attr="" comment="" date="1478183283" name="J_Durlacher_Tables_for_Biostats_Clinic.docx" path="J_Durlacher_Tables_for_Biostats_Clinic.docx" size="63010" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="VICTR.Proposal.v1.pdf" attr="" comment="" date="1478183482" name="VICTR.Proposal.v1.pdf" path="VICTR.Proposal.v1.pdf" size="174384" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="VR7850.R1_Pre-review_Questions_Round_2_10.25.16.docx" attr="" comment="" date="1479150220" name="VR7850.R1_Pre-review_Questions_Round_2_10.25.16.docx" path="VR7850.R1_Pre-review_Questions_Round_2_10.25.16.docx" size="517250" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="VICTR_Research_Proposal_Roman-no_CPT_2016.docx" attr="" comment="" date="1479150228" name="VICTR_Research_Proposal_Roman-no_CPT_2016.docx" path="VICTR_Research_Proposal_Roman-no_CPT_2016.docx" size="43476" user="AmyPerkins" version="1"
META FILEATTACHMENT attachment="BotScoringSheet.pdf" attr="" comment="" date="1479498680" name="BotScoringSheet.pdf" path="BotScoringSheet.pdf" size="345672" user="AmyPerkins" version="1"
Revision 370
Changes from r350 to r370
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Click here for older notes
Changed:
<
<

2016 August 25

>
>

2016 October 27

Mali Schneiter, DO

  • Review of protocol: "Risk of Endometrial Hyperplasia and Carcinoma in Marathon Runners: A Cross Sectional Survey"

2016 October 13

Joey Starnes, MD/MPH Student

  • I have been helping with a project in the Department of Pediatric Surgery/Trauma under Purnima Unni. I was hoping to come to clinic to briefly discuss analysis of the dataset we have collected. This is a small dataset consisting of golf cart accidents identified from the medical record. We hope to characterize the nature of these accidents and the injuries associated with them, primarily through descriptive statistics and relative risk. We have also considered doing a heat map or GIS with zip code data.
  • Trauma database from 2008-2015 with 30 events (golf cart accident) out of 3300 trauma cases. About half of the cases were referred to Vanderbilt from another hospital. Variables include diagnosis, role of child in event, age, gender, treatment, disposition, zip code of injury location, and injury location on body. National dataset includes 1500 events with diagnosis and injury location on body
  • Primary research question: characterize golf cart accidents involving children. Recommend reporting descriptive statistics (including percentages of broad diagnoses) and discussing the limitation that this data includes tertiary cases

Joe Wick, Medical Student

  • "I am working on a project with Dr. Clint Devin in the department of orthopedic surgery. Our question regards inter-rater reliability. We are working on a project in which we intend to send a survey to physicians at other institutions asking them to determine whether patients need surgery based on imaging (CT scans, MRIs) that we send with the survey. Physicians will be able to select one of three answer choices on the survey. I am hoping that the biostats clinic can help us to determine the proper inter-rater reliability calculation to use (e.g. is Cohen’s kappa appropriate?), the proper number of patients to include on the survey, and the proper number of respondents/raters to send the survey to."
  • Recommendations can include surgery, back brace, or no additional treatment
  • From registry of 800 cases that were initially treated with a brace based on initial images (supine scans), identified 13 cases for which the recommendation changed to surgery after review of follow-up images (upright scans). Survey assumes initial images are sufficient and upright scans are not required. Do upright scans add anything to treatment decision? What is the utility of the follow-up images?
  • Surgeons identified to complete the survey are PI's colleagues
  • The kappa statistic adjusts for the frequency-of-event issue. High levels of agreement would yield low power, but this research question does not warrant a sample size calculation.
  • A prospective design could incorporate a patchwork assignment of reviewers (surgeons) to patients.
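
With more than two raters and three answer choices, one commonly used chance-corrected agreement statistic is Fleiss' kappa, which generalizes the two-rater setting of Cohen's kappa. The sketch below is offered as an illustration under that assumption and is not the clinic's stated recommendation; the function name and the example ratings are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters.

    counts[i][j] = number of raters who assigned case i to category j.
    Every case must be rated by the same number of raters (at least 2).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_cats = len(counts[0])
    # Overall proportion of ratings falling in each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_cats)
    ]
    # Observed agreement within each case, then averaged over cases.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_case) / n_cases
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    if p_exp == 1.0:
        return float("nan")  # every rating in one category: kappa undefined
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical ratings: 4 cases, 5 surgeons, categories
# (surgery, back brace, no additional treatment).
ratings = [
    [5, 0, 0],
    [1, 4, 0],
    [0, 3, 2],
    [2, 2, 1],
]
print(round(fleiss_kappa(ratings), 3))
```

When nearly all ratings fall in a single category, expected agreement approaches 1 and kappa becomes unstable, which is why the sketch returns NaN in the degenerate case.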

2016 September 29

 

Jeeyeon Cha, MD, PhD

Added:
>
>
  • I'm preparing a VICTR grant for a small clinical trial. I'd like to discuss experimental/study design, appropriate measurements and data analysis, and interpretation of results, but am open to discussing other measures/aspects as indicated. Please let me know if you require any further information.
  • Previous study of preterm birth in mice. Want to know if pathway is relevant to preterm birth in humans. Methods involve staining of placental sample. Plan to compare staining among 4 groups, preterm laboring vs. preterm non-laboring vs. term laboring vs. term non-laboring. Study has already been done in Japan with 6-8 samples per group.
  • What is the required sample size? Look at standard deviations from published Japanese study and assume a 25% higher SD for the new study. Can utilize PS: Power and Sample Size Calculation software (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize). Input in t-test tab: output sample size, independent design, input alpha=.05, power=.9, delta=2, sigma=3, m=1, graph difference in population means with x-axis range 1-4 and y-axis range 0-200 (max sample size), review description which explains required sample size per arm.
  • Recommend applying for 90-hour VICTR award for Biostatistics support beginning with study design through manuscript publication
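
The PS inputs listed above (alpha=.05, power=.9, delta=2, sigma=3, m=1) can be approximated by hand with the usual normal-approximation formula for two independent groups of equal size, n = 2 * ((z_alpha/2 + z_power) * sigma / delta)^2. This is a rough sketch and not the PS algorithm; PS works from the t distribution, so its answer would be expected to be slightly larger. The function name is an assumption.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for comparing two means.

    delta : difference in population means to detect
    sigma : within-group standard deviation
    Assumes equal group sizes (m = 1) and a two-sided test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Inputs taken from the notes above.
print(n_per_group(delta=2, sigma=3))  # 48 per arm under the normal approximation

# Sample size across the same range of differences as the suggested graph (1-4).
for d in (1, 2, 3, 4):
    print(d, n_per_group(delta=d, sigma=3))
```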

Lanier Sachs, Special Education

  • "I work for Drs. Laurie Cutting and Sheryl Rimrodt-Frierson at the Education and Brain Sciences Research lab in the department of Special Education. We are interested in attending the clinic to discuss participant randomization procedures for our upcoming IRB-approved clinical trial. We are seeking help determining best practices for randomizing 99 patients into three equal groups matched on both age and gender, and help in creating a randomization schedule to provide to VU IDS and CTC."
  • "Our study is a clinical trial involving both behavioral and pharmacological intervention in children ages 10-17 with Neurofibromatosis Type 1 and comorbid reading problems. The question we are asking is whether there are facilitative effects of the pharmacological agent on responsiveness to reading tutoring that improve learning, both at short-term and long-term time points. We plan to use slope-as-outcome hierarchical linear model in our analysis."
  • NF is a genetic disorder with prevalence of 1 in 3500. Goal is to address learning disabilities associated with disease.
  • Plan to recruit 99 subjects over 4 years based on previous power analysis. For randomization, plan to match on age and gender. How should this be done? IDS requested randomization document to explain how to randomize patients as they come in.
  • Classic method to use is randomized block design which produces a randomization scheme per block (ex. males aged 10-13y). REDCap incorporates randomization design. Can generate blocks per year of study enrollment, but this will add another level of complexity to the statistical analysis.
  • Use computer program (ex. R) to create ordered list of treatment assignment within each block. After setting the seed, use function sample() to specify block size and generate condition assignments for each of the blocks.
  • May consider applying for 35-hour VICTR voucher for Biostatistics support beginning with study design through manuscript publication
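
The notes suggest setting a seed and using R's sample() to generate the ordered assignments. The same idea is sketched here in Python for illustration: within each age-by-gender group (what the notes call a block), assignments are generated in shuffled sets of three so the three arms stay balanced as patients enroll. The arm labels, the age cut points other than the "males aged 10-13y" example, the seeds, and the number of sets per group are all hypothetical.

```python
import random

ARMS = ["A", "B", "C"]  # hypothetical labels for the three study groups


def permuted_block_schedule(n_sets, arms=ARMS, seed=2016):
    """Ordered assignment list made of shuffled sets, one of each arm per set."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = []
    for _ in range(n_sets):
        one_set = list(arms)
        rng.shuffle(one_set)
        schedule.extend(one_set)
    return schedule


# One schedule per age-by-gender group; only "male 10-13y" appears in the
# notes, the remaining groups are assumed for illustration.
groups = ["male 10-13y", "male 14-17y", "female 10-13y", "female 14-17y"]
schedules = {
    g: permuted_block_schedule(n_sets=9, seed=100 + i)
    for i, g in enumerate(groups)
}

for g, s in schedules.items():
    print(g, s[:6], "...")
```

Each new patient takes the next unused assignment from the list for his or her group. Generating more sets than expected per group leaves room for uneven enrollment across groups.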

Wyatt McDonnell, Pathology, Microbiology, & Immunology

  • Planning pilot study with 24 participants measured at 4 time points. Collecting blood sample and sequencing to develop antibody profile; there are 7 features of sequencing that will be documented. Do sequencing changes (biomarkers) predict co-infection state (categorical outcome: no infection, TB only, AIDS only, both TB & AIDS)? Group 1 will hold AIDS status constant; Group 2 will hold TB status constant.
  • Want to calculate power given sample size.

2016 September 22

Tarsheen Sethi, Clinical Fellow in Hematology-Oncology

  • "I am a hematology/oncology fellow and MSCI student and am working on a project titled "MYD88 and PD-1 Pathways in Central Nervous System Lymphoma" and need help with streamlining my data analysis."
  • Recommend constructing Kaplan-Meier curves.
  • Recommend collaborating with researchers at other care centers in order to increase sample size.
  • Use initial data to estimate the sample size needed for future studies.

Ryan Castoro, Physical Medicine

  • Preparing grant for VICTR. Need grantsmanship advice for a grant that does not plan to implement statistical analyses.
  • Recommend pursuing valid, small-sample exploratory analyses.
  • Recommend providing reviewers a plan to pursue scientific studies after initial exploratory phase.

2016 September 15

Grace Umutesi, MPH Student

  • "I am leaving for Kenya this weekend (Sat Sept 17th) to work on the Evaluation of the Kijabe Nurse Anesthetist training program, and I had a few questions concerning a sample calculation before I leave."
  • Planning to conduct an evaluation of 3 clinics since the placement of CRNAs with advanced training. This will include a facility assessment (supplies, personnel) and interviews with mothers to determine community perception of obstetric care received (ex. where was baby born, what influenced decision for this location, knowledge of available resources such as CRNA nurse). Want to compare historical opinions to opinions after CRNAs were placed (within the last 3 years).
  • How to decide number of individuals to interview? Want to compare rate of mothers who chose to deliver at hospital between historical and current groups. Recommend interviewing as many mothers as possible. We can later generate the necessary power curves.
  • Which individuals to interview? Want representative sample. Have eligibility form for inclusion criteria (age, multiple pregnancies). Will interview mothers at the clinic or visit homes. Limiting interviews to patients in the clinic would limit generalizability to entire community. Churches could be another potential source for interview. May want to look at national data (birth records, immunization records) or consider sampling 6-year-olds and interviewing their mothers. May also want to interview mothers at facilities without CRNAs.

Andrew McKown, Fellow in Pulmonary and Critical Care

  • "I would like some help please in interpreting an analysis. I have a dataset of ICU patients in which I am assessing whether steroid use lowers the risk of ARDS. Using multivariate logistic regression, there is a significant reduction in risk, but a reviewer requested an analysis accounting for competing risk with the outcome death. I am using the cmprsk package in R, but I need some assistance in interpreting the output."
  • Outcome of interest for acute respiratory distress syndrome (ARDS) is death within 96 hours of admission to ICU. Documented whether patient was on steroids prior to admission (prescribed for immunosuppression or asthma treatment). Transplant patients were excluded.
  • Do steroids reduce risk of ARDS? ARDS is a disease of inflammation and is sometimes treated with steroids.
  • Cohort includes 1080 patients, 30-40% developed ARDS, some died within 96 hours of admission. On the morning of ICU Day 2, a patient was enrolled in the study if (s)he had at least one risk factor for ARDS. Days are by calendar day, not 24 hours.
  • Time 0 (inception point) should be Day 2 to avoid immortal time bias. You cannot look forward (after Time 0) to define the cohort. This will exclude patients who had ARDS upon admission.
  • Options for outcome of interest: 1) death within 96 hours of Day 2; 2) ARDS or death; 3) death without ARDS and death with ARDS; 4) given patient died or had ARDS, what is probability that ARDS was diagnosed.
  • Because of confounding by indication, recommend propensity score analysis adjusted for ~50 variables (include splines for continuous variables)
  • Resources:

2016 September 8

William Martinez, Internal Medicine

  • "I would like to review my analysis of survey data we collected from 837 physicians. I have done most of the analysis in SAS. My primary question is how to do poststratification weights to adjust for survey nonresponse. We have demographics for the total group of physicians surveyed and the demographics of the respondents."
  • Surveyed interns and residents from six different institutions regarding attitudes and behaviors toward speaking up on safety issues. The overall response rate was 50%. Found differences in response rates by gender and postgraduate year (PGY, across institutions). Survey question responses were on a 5-point Likert scale.
  • Demographics included gender, specialty, PGY, study site, self-reported formal training in patient safety. Respondent demographics were self-reported, and some values are missing. Demographics of physicians surveyed were from administrative data (no missing data).
  • Generated 56 strata from institution, gender, PGY, and specialty. A select number of strata had zero physicians represented in sample.
  • Conducted analysis both with and without weights. Recommend comparing standard errors between weighted (expect to be larger) and unweighted analyses.
  • We have assumed that any missing data are missing at random.
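
As an illustration of the poststratification step discussed above, the basic weight for a stratum is the number of physicians surveyed in that stratum divided by the number of respondents in it. The sketch below uses hypothetical stratum labels and counts (the analysis itself was done in SAS) and flags strata with no respondents, which would need to be collapsed with a neighboring stratum before weighting.

```python
def poststratification_weights(population_counts, respondent_counts):
    """Weight per stratum = surveyed count / respondent count.

    Strata with no respondents get None, signalling that they must be
    collapsed with another stratum before weights can be assigned.
    """
    weights = {}
    for stratum, n_pop in population_counts.items():
        n_resp = respondent_counts.get(stratum, 0)
        weights[stratum] = n_pop / n_resp if n_resp > 0 else None
    return weights


# Hypothetical strata (institution, gender, PGY) and counts.
surveyed = {("Site1", "F", 1): 40, ("Site1", "M", 1): 50, ("Site2", "F", 2): 12}
responded = {("Site1", "F", 1): 25, ("Site1", "M", 1): 20}

w = poststratification_weights(surveyed, responded)
for stratum, weight in w.items():
    print(stratum, "needs collapsing" if weight is None else round(weight, 2))
```

Within each non-empty stratum the weighted respondents sum back to the surveyed count, which is a quick check that the weights were built correctly.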

2016 September 1

Dr. Karl Moons, Visiting Scholar

Please do not schedule any clients.

2016 August 25

WITHDREW: Jeeyeon Cha, MD, PhD

  I'm preparing a VICTR grant for a small clinical trial. I'd like to discuss experimental/study design, appropriate measurements and data analysis, and interpretation of results, but am open to discussing other measures/aspects as indicated. Please let me know if you require any further information.

2016 August 18

Whitney Muhlestein, Medical Student

Changed:
<
<
  • I am doing outcomes research in the Neurosurgery Department, and my mentor is Dr. Lola Chambless. I am using machine learning to predict outcomes but am also analyzing my data with classic statistics. I just want to make sure I am doing everything properly.
>
>
  • I am doing outcomes research in the Neurosurgery Department, and my mentor is Dr. Lola Chambless. I am using machine learning techniques to predict whether a patient is discharged to home or not following a particular neurosurgical procedure based on preoperative conditions.
  • I was interested in trying a machine learning approach because I hadn't seen it used a lot in neurosurgery outcomes research, and I wanted to see if I could build a more predictive model than a basic logistic regression using different classes of machine learning models in an ensemble approach. I trained my models (34 different machine learning models, including a logistic regression) using a training data set (67% of the data), and then validated those models on a holdout dataset (the remainder of the data). I ranked the predictive power of the models based on the AUC of the ROC curve from the holdout dataset. The model with the highest AUC ended up being an ensemble model combining a Random Forest Classifier, an Elastic Net Classifier (which is a regularized regression), and a Nystroem Kernel SVM.
  • I am also analyzing my data with classic statistics. Specifically, I am comparing characteristics of patients who discharge home and those who do not to look for statistical significance. I did some basic statistics comparing preoperative characteristics of patients who do go home and those who do not. I want to make sure that I am using the correct types of statistical tests and that I am treating missing data appropriately.
 

2016 August 11

Juan Pablo Arroyo, Internal Medicine

  • Assistance with VICTR application for a study on the role of chloride as a predictor of residual kidney function after donor nephrectomy. The PI is Dr. Gautam Bhave.
Changed:
<
<
  • Have sample of 850 patients
  • Planning to use linear regression model. Chloride (Cl) and creatinine (Cr) were measured at pre-surgery, immediately following surgery, and post-surgery. The dependent variable will be post-surgery Cr, and the independent variables will be pre-surgery and post-surgery Cl and pre-surgery Cr.
  • Would be ideal to have lag in Cl and Cr measurements and to study longitudinal measurements from healthy controls.
>
>
  • Have sample of 850 patients. Chloride (Cl) and creatinine (Cr) were measured at pre-surgery, immediately following surgery, and post-surgery.
  • Planning to use linear regression model. The dependent variable will be post-surgery Cr, and the independent variables will be pre-surgery Cl, post-surgery Cl, and pre-surgery Cr. It would be ideal to have lag in Cl and Cr measurements and to study longitudinal measurements from healthy controls.
 
  • Recommend applying for 90-hour VICTR award for Biostatistics support

Joseph Conrad, Chemistry

Changed:
<
<
  • "I’m working with my research group on a human subjects study design to compare the performance of standard of care rapid diagnostic tests for malaria and enhanced versions of these tests. The study will collect primary blood specimens from individuals in malaria endemic regions in rural Zambia and will be incorporated into the resubmission of an upcoming R01 application. This is a resubmission and previously received a priority score of 37 with criticism that the 900 person human subjects study was too ambitious for the apparent early stage technology."
  • "I’d like to attend an upcoming Biostats Clinic to discuss this study and receive feedback on study design and proposed statistical analysis and suggestions for improvement."
  • Issue with rapid diagnostic test having low sensitivity with low parasitemia levels; these infections go unrecognized.
  • Sample will be comprised of people presenting to a local clinic with malaria symptoms or people in households with a case of diagnosed malaria. Not selecting cases and controls based on gold standard test. Will have paired data.
  • Goal is to demonstrate that the enhanced rapid diagnostic test performs better than standard of care rapid diagnostic test. The gold standard for malaria diagnosis is thin smear microscopy or PCR (reference). Will compare results of rapid diagnostic tests to gold standard truth, then compare accuracies, sensitivities, and specificities of each rapid diagnostic test. Need to decide what margin of error for the difference in proportions would be acceptable to make conclusions. Reference: Agresti or Fleiss.
>
>
  • "I’m working with my research group on a human subjects study design to compare the performance of standard of care rapid diagnostic tests for malaria and enhanced versions of these tests. The study will collect primary blood specimens from individuals in malaria endemic regions in rural Zambia and will be incorporated into the resubmission of an upcoming R01 application. This is a resubmission and previously received a priority score of 37 with criticism that the 900 person human subjects study was too ambitious for the apparent early stage technology. I’d like to attend an upcoming Biostats Clinic to discuss this study and receive feedback on study design and proposed statistical analysis and suggestions for improvement."
  • There is an issue with the rapid diagnostic test having low sensitivity with low parasitemia levels; these infections go unrecognized. Sample will be comprised of people who present to a local clinic with malaria symptoms or people in households with a case of diagnosed malaria. We will not select cases and controls based on gold standard test. We will have paired data.
  • Goal is to demonstrate that the enhanced rapid diagnostic test performs better than standard of care rapid diagnostic test. The gold standard for malaria diagnosis is thin smear microscopy or PCR (reference). Plan to compare results of rapid diagnostic tests to gold standard truth, then compare accuracies, sensitivities, and specificities of each rapid diagnostic test. Need to decide what margin of error for the difference in proportions would be acceptable to make conclusions.
 

Brad Christensen, Internal Medicine

Changed:
<
<
  • Retrospective study of bone marrow disorder MDS
  • 2005-2015 sample includes 250 patients
  • Outcome: measure of scar tissue/fibrotic marrows on scale 0, 1, 2, 3. Patients with 0/1 generally have 14 years to AML, and patients with 2/3 have 5 years.
  • Assess trend in degree of fibrosis using rank correlation. Sample size calculation based on acceptable margin of error for half of width of confidence interval (see http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf page 8-14).
>
>
  • Retrospective study of bone marrow disorder MDS. Sample data from 2005-2015 includes 250 patients.
  • Outcome: measure of scar tissue/fibrotic marrows on scale 0-3. Patients with 0 or 1 generally have 14 years to AML, and patients with 2 or 3 have 5 years.
  • Assess trend in degree of fibrosis using rank correlation. Sample size calculation based on acceptable margin of error for half the width of the confidence interval (see http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf pp. 8-14).
 

2016 August 4

Tyler Casey, PharmD, PGY-2 Psychiatric Pharmacy Resident

Line: 60 to 141
 
  • Need to choose a central outcome measure; probably the Ottawa regret scale
  • Can size the study for power or for precision (margin of error; 1/2 the width of the confidence interval for the treatment difference)
  • Ottawa scale review paper of 5 or so studies, gives means and SDs from each study; 16 is a safe estimate
Changed:
<
<
  • The number of patients in each of 2 groups that is necessary to achieve a margin of error of 6 in estimating the difference in means with 0.95 confidence is __
>
>
  • The number of patients in each of 2 groups that is necessary to achieve a margin of error of 6 in estimating the difference in means with 0.95 confidence is __
 
  • If you wanted to achieve half that in the margin of error you would need 4x as many subjects
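
Sizing a two-group study for precision, as described above, can be sketched with the normal-approximation formula n = 2 * (z * SD / margin)^2 per group, where the margin of error is half the width of the confidence interval for the difference in means. The notes leave the resulting number blank, and this sketch does not claim to reproduce the clinic's figure; it assumes SD = 16 in both groups (the "safe estimate" mentioned above), equal group sizes, and a function name chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_margin(sd, margin, confidence=0.95):
    """Patients per group so that the CI half-width for a difference in
    means is about `margin` (normal approximation, equal group sizes)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return ceil(2.0 * (z * sd / margin) ** 2)


n_6 = n_per_group_for_margin(sd=16, margin=6)
n_3 = n_per_group_for_margin(sd=16, margin=3)
print(n_6, n_3, round(n_3 / n_6, 2))  # halving the margin needs about 4x the subjects
```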

2016 July 14

Line: 336 to 417
 
  • analyzing data regarding entrepreneurs in Haiti

2015 Oct 15

Sarah Tanaka, medical student

Changed:
<
<
  • I’m a medical student working on a project in ophthalmology. I am in need of some help with my project in planning the data collection process from Synthetic Derivative so that it will be most effective for analysis later by a biostatistician
>
>
  • I’m a medical student working on a project in ophthalmology. I am in need of some help with my project in planning the data collection process from Synthetic Derivative so that it will be most effective for analysis later by a biostatistician
 
  • Retrospective cohort study looking at risk factors (ventilation, hospital LOS, etc). Total 120 patients and 60 had events. Only baseline factors can be evaluated.

Jessica Hinshaw

  • Validation for dichotomous variables (I tried Cronbach’s alpha and had really low coefficients) and also analyzing tertiles
Line: 1561 to 1643
 
META FILEATTACHMENT attachment="z.pdf" attr="" comment="" date="1445530876" name="z.pdf" path="z.pdf" size="66045" user="ShiHuang" version="1"
META FILEATTACHMENT attachment="Aims-10-21-15.docx" attr="" comment="" date="1446130826" name="Aims-10-21-15.docx" path="Aims-10-21-15.docx" size="12834" user="LiWang" version="1"
META FILEATTACHMENT attachment="Pilot_Full_Protocol_4.7.16.docx" attr="" comment="Kropski - Phase Ib Trial protocol" date="1460554697" name="Pilot_Full_Protocol_4.7.16.docx" path="Pilot Full Protocol 4.7.16.docx" size="241895" user="JonKropski" version="1"
Added:
>
>
META FILEATTACHMENT attachment="Survey_Research_Protocol.docx" attr="" comment="" date="1476826956" name="Survey_Research_Protocol.docx" path="Survey Research Protocol.docx" size="17624" user="AmyPerkins" version="1"
Revision 350
Changes from r330 to r350
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Click here for older notes
Added:
>
>

2016 August 25

Jeeyeon Cha, MD, PhD

I'm preparing a VICTR grant for a small clinical trial. I'd like to discuss experimental/study design, appropriate measurements and data analysis, and interpretation of results, but am open to discussing other measures/aspects as indicated. Please let me know if you require any further information.

2016 August 18

Whitney Muhlestein, Medical Student

  • I am doing outcomes research in the Neurosurgery Department, and my mentor is Dr. Lola Chambless. I am using machine learning to predict outcomes but am also analyzing my data with classic statistics. I just want to make sure I am doing everything properly.

2016 August 11

Juan Pablo Arroyo, Internal Medicine

  • Assistance with VICTR application for a study on the role of chloride as a predictor of residual kidney function after donor nephrectomy. The PI is Dr. Gautam Bhave.
  • Have sample of 850 patients
  • Planning to use linear regression model. Chloride (Cl) and creatinine (Cr) were measured at pre-surgery, immediately following surgery, and post-surgery. The dependent variable will be post-surgery Cr, and the independent variables will be pre-surgery and post-surgery Cl and pre-surgery Cr.
  • Would be ideal to have lag in Cl and Cr measurements and to study longitudinal measurements from healthy controls.
  • Recommend applying for 90-hour VICTR award for Biostatistics support

Joseph Conrad, Chemistry

  • "I’m working with my research group on a human subjects study design to compare the performance of standard of care rapid diagnostic tests for malaria and enhanced versions of these tests. The study will collect primary blood specimens from individuals in malaria endemic regions in rural Zambia and will be incorporated into the resubmission of an upcoming R01 application. This is a resubmission and previously received a priority score of 37 with criticism that the 900 person human subjects study was too ambitious for the apparent early stage technology."
  • "I’d like to attend an upcoming Biostats Clinic to discuss this study and receive feedback on study design and proposed statistical analysis and suggestions for improvement."
  • Issue with rapid diagnostic test having low sensitivity with low parasitemia levels; these infections go unrecognized.
  • Sample will be comprised of people presenting to a local clinic with malaria symptoms or people in households with a case of diagnosed malaria. Not selecting cases and controls based on gold standard test. Will have paired data.
  • Goal is to demonstrate that the enhanced rapid diagnostic test performs better than standard of care rapid diagnostic test. The gold standard for malaria diagnosis is thin smear microscopy or PCR (reference). Will compare results of rapid diagnostic tests to gold standard truth, then compare accuracies, sensitivities, and specificities of each rapid diagnostic test. Need to decide what margin of error for the difference in proportions would be acceptable to make conclusions. Reference: Agresti or Fleiss.

Brad Christensen, Internal Medicine

  • Retrospective study of bone marrow disorder MDS
  • 2005-2015 sample includes 250 patients
  • Outcome: measure of scar tissue/fibrotic marrows on scale 0, 1, 2, 3. Patients with 0/1 generally have 14 years to AML, and patients with 2/3 have 5 years.
  • Assess trend in degree of fibrosis using rank correlation. Sample size calculation based on acceptable margin of error for half of width of confidence interval (see http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf page 8-14).

2016 August 4

Tyler Casey, PharmD, PGY-2 Psychiatric Pharmacy Resident

I am a psychiatric pharmacy resident and I am putting together a study proposal and am looking to get advice from the biostats clinic. My study will be looking at whether patients who are CYP2D6 poor metabolizers are more likely to have experienced adverse effects from antidepressants. The study is still in its early stages, I am looking to discuss: whether the study design I've selected is appropriate for my aims; what is the proper statistical analysis; and how do I find the appropriate patient sample size.
  • This is a retrospective study of patients who have experienced a major depression episode and non-response to an antidepressant. The BioVU database will be used to identify the patient's genotype and categorize the patient as a poor, intermediate, extensive, or ultrarapid metabolizer. It is cost-prohibitive to have genotyping done for patients who are not already in the BioVU database.
  • Can conduct a paired case-control study matching cases (poor metabolizers) and historical controls on pre-specified factors (ex. age, sex, race, medication adherence, and other factors that may be indicative of non-response). Match the patients based on probability of non-response. If a validated score has already been developed, this can be used. When calculating the sample size, note the prevalence of the genotype of interest (poor metabolizer) and determine a reasonably sized difference you want to be able to detect.
  • Another option is to adjust for additional factors in a regression analysis on all eligible patients. This will require 10-20 patients per degree of freedom in the model.
  • Need to develop a clear definition of adverse effects (ex. cause the patient to stop the drug, change therapy, or change dosage) because they can be highly varied.
  • Planning to apply for VICTR award (90 hours). Review http://biostat.mc.vanderbilt.edu/wiki/Main/VICTRBiostatPolicies and fill out the "VICTR Resource Request" at https://starbrite.vanderbilt.edu.

2016 July 28

Laurel Teller, Doctoral Student, Hearing and Speech Sciences

  • "My research project entitled "Does Complex Syntax in Parent Input Vary by Child Language Status?" will compare parent language input variables for children with different language levels. I need help to develop my regression analysis and correlations. I do not have data yet, but would like to talk through my research questions and how to set up the analyses. My research questions are as follows: 1) What is the relation between measures of parent complex syntax input and child language outcomes? (planning to use mixed effects linear regression); 2) How does the proportion of specific types of complex syntax compare among parents of children categorized in three language groups 16 months prior to the outset of the study? (planning to use ANOVA with linear contrasts); and 3) What is the relation between parent complex syntax and associated parent language measures and maternal education level? (planning to use Pearson's R)."
  • Sample includes families (parent and child pairs) in one of three child language levels. Ten "typical language" children were matched with ten "delayed receptive language" children. There were also five un-matched children with problems expressing themselves. The child's language level was assessed 16 months prior to assessing the complex syntax in the parent's input. To categorize the child's language level, cut points were applied to the child's performance on a test. It is recommended to research whether any published journal articles have justified the cut points that were used. Currently, there is no known global assessment of a child's ability to communicate.
  • The parent and child were recorded speaking in their home for one day. Portions of the recording were transcribed by the researcher until 200 utterances by the parent were transcribed. Complex syntax is defined as a sentence with more than one verb. The proportion of parent utterances classified as complex syntax (out of 200) was documented. The proportion of different types of complex syntax were also documented. Complex syntax does not account for speech rate or repetitiveness of speech. An algorithm was applied to account for the amount of background noise.
  • The researcher was not blinded to the child's language group when transcribing the recordings. It would be better to randomize the order of processing the tapes. Given the small sample size, the proportions need to have been measured with high precision (high test-retest reliability, high observer reliability, and low inter-observer variability).
  • Research Question (RQ) 1: It is recommended to use linear models (with all fixed effects) for each of the dependent variables 1) parent's proportion of complex syntax (mean 25%, SD 15) and 2) number of complex syntax types. The independent variables are child language score, gender, age, and ethnicity. Note that mixed effects linear models could be used if the data were longitudinal (repeated recordings) and clustered on families.
  • RQ 2: As a guideline, there should be ~15 families per research question, so the analysis should be simplified given the small sample size. It is recommended to use variable clustering to explain how the different types of complex syntax (proportions) run together. If any of the proportions were correlated, this would make the analysis more complex and require an even larger sample size to tease out the relationships. As a solution, you can use cluster analysis for the proportions and create cluster scores instead of disentangling the relationships.

2016 July 21

Chirayu Patel, Radiation Oncology

  • I need help to determine the appropriate sample size for a randomized clinical study of the impact of educational intervention in the clinic (using visual presentation during clinic consultation) on regret regarding decision to undergo radiation therapy vs. surgery, perceived side effects, and satisfaction with cancer care. We plan to use the EPIC side effects scale (EPIC 26; after treatment), Ottawa regret scale, and SCA service satisfaction scale for cancer. My mentors are Eric Shinohara and Austin Kirschner.
  • Planning session can happen at various durations after consultation; patients can forget radiation side effect discussion
  • Need to choose a central outcome measure; probably the Ottawa regret scale
  • Can size the study for power or for precision (margin of error; 1/2 the width of the confidence interval for the treatment difference)
  • Ottawa scale review paper of 5 or so studies, gives means and SDs from each study; 16 is a safe estimate
  • The number of patients in each of 2 groups that is necessary to achieve a margin of error of 6 in estimating the difference in means with 0.95 confidence is __
  • If you wanted to achieve half that margin of error, you would need 4x as many subjects
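
The precision-based sizing discussed above can be sketched as follows. This is a rough illustration that assumes a normal approximation, equal group sizes, and a common SD; a t-based calculation would give a slightly larger answer. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_for_margin(sd, margin, conf=0.95):
    """Subjects per group so that the half-width of the confidence interval
    for a difference in two means is at most `margin`.

    Solves margin = z * sd * sqrt(2 / n) for n (normal approximation).
    """
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return math.ceil(2 * (z * sd / margin) ** 2)

# SD of 16 and margin of 6 are the values mentioned in the notes
n_full = n_per_group_for_margin(sd=16, margin=6)
n_half = n_per_group_for_margin(sd=16, margin=3)
print(n_full, n_half, n_half / n_full)
```

Because the margin of error shrinks with the square root of the sample size, halving the margin roughly quadruples the required number of subjects.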

2016 July 14

Amanda Currie, Research Intern, Department of Neurology.

  • Regarding the appropriate use of matching for a clinical trial utilizing a historical control group. The data is collected from a prospective pilot clinical trial testing the safety and tolerability of deep brain stimulation (DBS) in early stage Parkinson’s disease (PD). The trial randomized 30 subjects to treatment with DBS + optimal drug therapy (ODT) or ODT alone, and the primary analysis at 2 years is reported in the attached manuscript (Charles et al., 2014). Fourteen subjects with DBS were followed for 3 additional years to gather long-term data.
  • We hope to compare data from subjects in the DBS + ODT group to a historical control group (treated with ODT). I believe that data from subjects randomized to the optimal medical management group of a 5-year trial of creatine in early stage PD are the best available control group. We have requested access to this dataset and are awaiting a reply (we hope to gain access to patient-level data). The primary outcomes paper for this trial is attached (NETPD, 2015). Although this was the best available control group I could find, some differences exist between the populations. Most notably, the inclusion criteria for the DBS study require an antiparkinsonian medication duration of 6 months – 4 years (mean 2.0 years), whereas the inclusion criteria for the creatine study require an antiparkinsonian medication duration of 3 months – 2 years (mean 0.8 years).
  • During this clinic, I would like to gain guidance on the best way to select a control group from this study of creatine in early PD. Specifically, I have the following questions:
    1. How many patients should be included in the control group for this analysis, and which factors should be used to select them (age, sex, antiparkinsonian medication duration, etc.)? As a reference, five-year data is available for approximately 345 subjects in the creatine study.
    2. Would it be possible to use some of the data for the 132 patients who completed six years of follow-up by creating a new “baseline” at their 1-year visit?
    3. Is there a way to create a model based on the 5-year data from the creatine study that could be used to predict patients’ scores on a number of measures at a time point that is comparable to the 5-year mark in the DBS study?

Jill Chafetz, Center for Professional Health

  • Wants to learn more about proportional odds regression

2016 July 7

Michael Ripperger, Student

  • "I have general questions regarding the optimal inferential statistical methods I could use to analyze a pre/post intervention program the hospital is undergoing. I will be interpreting program effectiveness with the available clinical data, but this is not a clinical study. I am working under Dr. Colin Walsh in the HARBOR Lab in the Department of Bioinformatics."
  • Subjects are high utilizers of ER services. This study is a paired design matching the same patient's pre- and post-intervention data. Cases are post-intervention; controls are pre-intervention.
  • Intervention is a specific care plan; outcome is the number of ER visits. Also planning to collect length of stay, type of visit, and cost to hospital.
  • Can generate a spaghetti plot of the number of ER visits pre and post. Can use a Wilcoxon signed rank test to compare number of ER visits between pre and post. Data are likely appropriate for a negative binomial model.
  • Would have been ideal to randomize subjects to intervention or control (no specific care plan) and to compare the number of ER visits between the two groups.
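
The pre/post comparison with a Wilcoxon signed rank test mentioned above can be sketched as follows. This is a simplified illustration: it drops zero differences, uses average ranks for ties, and relies on a normal approximation without tie or continuity corrections, so a statistical package should be used for a real analysis. The function name and visit counts are hypothetical.

```python
import math
from statistics import NormalDist

def signed_rank_test(pre, post):
    """Wilcoxon signed rank test for paired data (normal approximation).

    Returns (W+, z, two-sided p), where W+ is the sum of ranks of the
    positive post-minus-pre differences.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, giving tied values their average rank
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p

# Hypothetical ER visit counts for the same 8 patients before and after the care plan
pre_visits = [12, 9, 15, 7, 10, 8, 14, 11]
post_visits = [8, 7, 9, 6, 10, 9, 9, 5]
print(signed_rank_test(pre_visits, post_visits))
```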

Jill Chafetz, Center for Professional Health

  • "I have questions about interpreting the SPSS output from the severity study (earlier clinic). I did not see values for the independent variables as a whole (such as cohesion or flexibility), only for the levels, plus the regression did not show all of the levels. I also need information about comparing prevalence rates of ACE (Adverse Childhood Events) scores from a sample of MDs to a much larger sample of the general public. The larger sample data come from the CDC, but I have not found either raw data or published statistics that would allow me to run comparisons."
  • For the independent variable 'Threshold', SPSS automatically set category 4 (highest) as the baseline rather than category 0 (lowest). The log odds for the baseline category is equal to the estimated intercept (alpha) in the model. The odds ratio for category 0 compared to category 4 is e^beta0, where beta0 is the estimated coefficient for category 0. To make things easier to interpret, category 0 can be specified as the baseline category in SPSS.
  • Recommend collapsing dependent outcome 'Severity' into 2 categories to learn how SPSS handles the independent variable(s) in a binary logistic regression model. Then continue with proportional odds regression model.
 

2016 June 9

Jill Chafetz, Center for Professional Health

Revision 330
Changes from r310 to r330
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Changed:
<
<
Older Notes
>
>
Click here for older notes
 

Added:
>
>

2016 June 9

Jill Chafetz, Center for Professional Health

  • There were 279 physicians referred to a course in maintaining boundaries. Those with a very extreme violation were fired and not referred to the course. The collected data include severity of the boundary violation classified by the investigators (ordinal, 1-4, "other", "harassment" (toward colleague/staff only), "impropriety" (toward patient), "violation"(toward patient)) and type of family background assessed by 25-question instrument FACES (ordinal, 1-3, "balanced", "midrange", "extreme"). The 3 family background types were further subdivided into 16 subtypes (categorical), and one of the subtypes ("disengaged rigid") accounts for 30% of the physicians. The sixteen subtypes have been validated with good reliability. There is no control group with zero violations.
  • The question is whether the subtypes predict severity of the boundary violation using ordinal logistic regression.
  • Recommend using proportional odds regression lumping together "impropriety" and "harassment" categories due to concerns regarding appropriateness of ordering in severity of violation
  • Due to small cell counts, recommend re-categorizing the 16 family types into 2 variables: 1) cohesiveness ("separated", "connected", "disengaged", or "enmeshed") and 2) flexibility ("structured", "flexible", "rigid", or "chaotic")
  • Recommend including other covariates in the regression model to adjust for possible confounding: age, gender, specialty, race, marital status
  • May want to add control group with zero violations from previous study (n=117)
  • May want to contact Bill Cooper at Vanderbilt's CPPA regarding their program to address multiple complaints about a physician

2016 June 2

Alyssa Hasty, Molecular Physiology and Biophysics

  • Assistance with VICTR application and future R01 submission
  • "We have found in mice, that adipocyte iron concentration is associated with obesity-related metabolic disease. In humans, indices of overall body iron overload also correlate with metabolic disease. I would like to design a study using lean and obese subjects to determine whether their adipocyte iron concentrations relate to metabolic phenotypes. In addition, we are interested in the adipose tissue macrophage iron content and handling. I would love statistical assistance to determine how many subjects I will need and how I will perform the statistical analyses once we have all of the data."
  • Preliminary work on human adipose tissue samples from gastric bypass patients. Planning pilot study of subcutaneous adipose tissue samples from 5 lean and 5 obese subjects.
  • Plan to compare macrophage iron concentration and handling between two groups adjusting for 20 covariates. This would require a minimum of 200 subjects for a linear regression model (guideline of 10-15 observations per degree of freedom). Can consider using propensity scores for dimensionality reduction if relationship between covariate(s) and outcome is not of primary interest.
  • Pilot data collection will not be completed prior to August R01 submission. Can utilize PS software to generate power curves using prior mice data. Requesting VICTR voucher for statistical support to prepare R01 submission.
  • For analysis of grant data, can contact Richard Peek (GI) regarding Biostatistics collaboration.

Matt Lenert, Biomedical Informatics

  • Cases previously hospitalized for congestive heart failure (CHF) and controls previously hospitalized for another reason. Outcome is unplanned readmission within 30 days; have 643 events. May want to consider time to readmission as secondary outcome.
  • Collected information on treatments (ex. schedule follow-up with PCP, follow-up telephone from hospital 1 week post-discharge), discharge location, and risk profile. Only have date of death if occurred during hospital stay.
  • Goal to determine how treatments decrease risk of readmission and incorporate this into treatment decisions for future patients.
  • Can use logistic regression to compare unplanned readmission within 30 days between the two groups. Time to readmission can be analyzed using Cox model.
  • Have already tested second-order interactions in logistic regression model and calculated area under the ROC curve (AUC). Can use bootstrap to account for possible overfitting with stepwise selection method that was used.
  • May want to contact Dan Byrne in Biostatistics regarding similar studies and read the following article for more information: http://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-014-0241-z.

2016 May 26

Viraj Mehta, Ophthalmology

  • "I'm evaluating eye motility outcomes after surgery for orbital floor fractures in children. I have collected all the data, and needed help figuring out the best way to analyze it."
  • Viraj Mehta and his mentor Elizabeth Mawn came to clinic today. They have a small data set of children with orbital floor fractures and are interested if time to surgery affects improvements in eye motility. There may be some confounding in their data between time to surgery and external referrals, as children who present at Vanderbilt are operated on immediately.
  • They would like a small VICTR grant to help with their analysis, which is appropriate. I said that I would refer them to Amy for help with this process

2016 May 5

Lauren Heusinkveld, Neurology Division of Movement Disorders

  • Questions concerning appropriate methods for statistical analysis and handling missing data in quality of life measures in a recently completed clinical trial testing deep brain stimulation in Parkinson’s disease patients. Faculty mentors for this project are Mallory Hacker, PhD, and David Charles, MD.
  • Randomized 30 patients to treatment (DBS plus ODT or ODT alone); 28 completed 2-year trial. Dose increased in both treatment groups as trial progressed through time. At some point, increasing dose will no longer improve quality of life.
  • Outcome is quality of life measured by Parkinson's Disease Questionnaire (PDQ-39).
  • Missing 3-year data for 12 patients for the extension study (Year 3-5). Generate spaghetti plots and conduct linear regression on each individual patient's QoL scores. Collect slope of regression line for each patient and use rank-sum test to compare slopes between the two treatment groups. If the data are non-linear, then calculate area under the curve.
  • How should 4 patients who worsened and crossed over to DBS during the extension study be handled in the statistical analysis? Censor control patients at the time of treatment crossover.
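
The slope-per-patient approach described above can be sketched as follows. This is a minimal illustration under assumptions: each patient has at least two visits at distinct times, slopes come from ordinary least squares, and the groups are compared with the Mann-Whitney U statistic (equivalent to the Wilcoxon rank-sum statistic). All names and scores are hypothetical.

```python
def ols_slope(times, scores):
    """Least-squares slope of scores on times for one patient."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    return sxy / sxx

def mann_whitney_u(x, y):
    """U statistic for x versus y: number of (x, y) pairs with x > y, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

# Hypothetical PDQ-39 scores by year; missing visits are simply omitted
dbs_patients = [([0, 1, 2, 3], [20, 21, 21, 23]), ([0, 2, 5], [18, 19, 22])]
odt_patients = [([0, 1, 2], [19, 23, 26]), ([0, 1, 3, 5], [22, 24, 30, 35])]

dbs_slopes = [ols_slope(t, s) for t, s in dbs_patients]
odt_slopes = [ols_slope(t, s) for t, s in odt_patients]
u = mann_whitney_u(dbs_slopes, odt_slopes)
# u / (n1 * n2) estimates the probability that a DBS slope exceeds an ODT slope
print(dbs_slopes, odt_slopes, u / (len(dbs_slopes) * len(odt_slopes)))
```

Summarizing each patient by one number keeps the analysis simple when visits are missing, at the cost of treating slopes from few visits the same as slopes from many.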

2016 Apr 28

Bianca Flores, Student

  • "I am trying to use R for a 2 way ANOVA weighted means, however my dataset is not being read. My data I am analyzing pertains to mice (wildtype, heterozygotes, and homozygotes) to assess their performance on a motor task."

2016 Apr 21

Elizabeth Martinez, clinical fellow in pathology

  • The study is a clinicopathologic investigation into Acute Vascular Lesions in the Kidney Transplant and Relationship to Cellular and Antibody-Mediated Rejection through a retrospective review of kidney transplant biopsies with acute vascular rejection and assessment of the concurrence of T-cell mediated rejection by two different criteria (CCTT and Banff). Our cohort includes about 390 biopsies from a span of a decade (diagnosis rendered by VUMC Renal pathology division) all with vascular rejection. I have obtained data on patient demographics and clinical characteristics at time of biopsy, allograft characteristics, timing of rejection episode from data of transplant, C4d status, and when available donor specific antibody status, and followup information on clinical outcome/graft function/survival available on a subset within the cohort.
  • Some of my main concerns relate to the use of survival (K-M) curves to show graft survival from event of rejection episode and also the overall graft survival from time of transplant. I want to ensure I am going about this most soundly and appropriately. I would also like to depict graphically the timing distribution of the rejection episodes.
  • Dataset includes repeated biopsies for select patients who received a second graft after the first failed; subjects were followed for a 5-year period.
  • Censoring is okay as long as it is uninformative, where the risk of graft loss would have been the same if the subject was not censored.
  • Recommend generating a cumulative morbidity curve for time to graft loss using Kaplan-Meier and time of transplant as time 0. Do not recommend using time of rejection as time 0 because you need to account for the level of severity/progression at the time of rejection. Can create a table with the percentage of patients transplanted within prespecified time intervals to show variability in the time to rejection.
  • Recommend submitting a VICTR application for 90 hours of biostatistical consultation (https://starbrite.vanderbilt.edu/funding/)
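
The Kaplan-Meier cumulative curve for time to graft loss recommended above can be sketched as follows. This is a bare-bones illustration that assumes uninformative censoring and one record per graft with time measured from transplant; it does not compute confidence bands. The function name and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates.

    times  -- follow-up time from transplant for each graft
    events -- 1 if graft loss was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        losses = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            losses += data[i][1]
            leaving += 1
            i += 1
        if losses:
            survival *= 1 - losses / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in years; 0 marks a censored graft
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 0, 1])
cumulative_incidence = [(t, 1 - s) for t, s in curve]
print(cumulative_incidence)
```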

Cherie Fathy, MD/MPH Candidate

  • I will be requesting your help on my project on Pediatric Ocular Involvement in SJS/TEN
  • All 48 subjects have SJS/TEN, including 36 subjects who developed ocular involvement (OI). Goal is to identify independent risk factors for OI. A model of recurrent SJS/TEN would be limited given only 6 subjects had recurrence and the difficulty with determining status for all subjects.
  • There is a concern for overfitting a multivariable logistic regression model given that the effective sample size is 12. Any effects are at risk of being exaggerated.
  • Do not recommend including length of hospital stay as a covariate in the model since this is unknown at time of admission. Recommend uncorrected chi-square statistic over Fisher's exact test. If logistic regression models are used, confidence intervals for the odds ratios and ROC curves may be calculated.
  • There is not enough data to make definitive conclusions. This will be an exploratory analysis of a small dataset, so be cautious in discussing the results.

2016 Apr 14

Jonathan Kropski, Medicine, Pulmonary/Critical Care Fellow

  • Assistance with VICTR application
  • My primary need is assistance with the statistical analysis plan, and a quote for Biostatistics support for a Phase Ib clinical trial grant we are submitting to VICTR. The primary outcome is to demonstrate safety and tolerability of the proposed treatment 12 weeks after randomization. Our plan is to randomize 30 subjects 2:1 to active drug vs. placebo and follow them for 12 weeks to reach the primary endpoint (safety - proportion of patients who permanently discontinued therapy due to adverse events). Secondary clinical and biomarker endpoints will be assessed after 12 weeks, and 6, 9 and 12 months after randomization.
  • I have uploaded our proposed study protocol
  • Dr. Harrell recommended 90 hour request given multiple secondary analyses and time points.

Oakleigh Folkes, Student

  • "I have data from a behavior paradigm that I ran in which two mice enter a tube and the mouse that backs out is the loser and the mouse that remains is said to be dominant and the winner. Each time a mouse wins I give that mouse one point. I do not know now how to look at this data statistically, or if I need to. On the third day of the test I gave the mice a drug treatment. I do not know how to look at this data other than just based on observation."
  • "I also have data from a three chamber social approach, in which one mouse explores three chambers and one of the chambers contains a mouse. Time spent in each chamber is measured. I gave half the cohort a drug treatment, and the other vehicle. In this instance I do not know if I should use a one or a two way ANOVA."
 

2016 Apr 7

Chenjie Zeng, Epidemiology

  • "I plan to build a predictive model with newly-found risk factors and previously known factors using a cohort study data. The outcome is binary. I wish to know what would be the best way to test the added predictive values of the newly-found factors."
  • Needs assistance responding to comments from BioVU committee review; also planning to apply for VICTR voucher
Changed:
<
<
  • Recommend clarifying that insufficient sample size will be used to gather information to plan larger, adequately powered study
>
>
  • Recommend clarifying the insufficient sample size will be used to gather preliminary data to plan a larger, adequately powered study in the future
 
  • Planning to use logistic regression for binary outcome; covariates age, sex, race, and comorbidities
Changed:
<
<
  • Selected 40 SNPs to include in model based on biological plausibility; plan to use weights from previous analyses to calculate genetic risk score; recommend clarify that model will not be overfit with inclusion of genetic risk score
>
>
  • Selected 40 SNPs to include in model based on biological plausibility; plan to use weights from previous analyses to calculate genetic risk score. Recommend clarifying model will not be overfit when including genetic risk score as a covariate
 
  • Bootstrap on c statistic to assess optimism

Jennifer Madu, Graduate Student

  • "I need help in deciding what type of statistics I need for my research results. I am evaluating improvement of nurse's knowledge and attitudes on end of life utilizing a clinical course called the ELNEC(End of Life Nursing Education Consortium). Pre/post tests and surveys are to assess their knowledge before and after the ELNEC."
Changed:
<
<
  • Tests included questions regarding agreement with statement (true/false), but there is no overall scoring mechanism. Pre/post tests can be linked for individuals
  • Recommend univariate analyses for each question
  • May consider generating total score for test by summing total number of correct responses; can compare pre/post knowledge using sum rank test
  • Recommend creating plots with line connecting an individual's pre- to post-score; plot improvement scores as well. See parallel coordinate plot (search wikipedia for example).
>
>
  • Tests included true/false questions, but there is no overall scoring mechanism. Pre/post tests can be linked to individuals. Recommend univariate analyses for each question
  • May consider generating a total score for knowledge test by summing total number of correct responses; can compare pre/post knowledge scores using sum rank test
  • Recommend creating plots with line connecting an individual's pre- and post-score; plot improvement scores as well. See parallel coordinate plot (search wikipedia for example).
 

Paul Yoder, Dept. of Special Education

  • Needs help understanding methods of rate estimation in a partial-interval-estimated count framework.
Line: 180 to 251
 
  • analyzing data regarding entrepreneurs in Haiti

2015 Oct 15

Sarah Tanaka, medical student

Changed:
<
<
  • I’m a medical student working on a project in ophthalmology. I am in need of some help with my project in planning the data collection process from Synthetic Derivative so that it will be most effective for analysis later by a biostatistician
>
>
  • I’m a medical student working on a project in ophthalmology. I am in need of some help with my project in planning the data collection process from Synthetic Derivative so that it will be most effective for analysis later by a biostatistician
 
  • Retrospective cohort study looking at risk factors (ventilation, hospital LOS, etc). Total 120 patients and 60 had events. Only baseline factors can be evaluated.

Jessica Hinshaw

  • Validation for dichotomous variables (I tried Cronbach’s alpha and had really low coefficients) and also analyzing tertiles
Line: 1404 to 1475
 
META FILEATTACHMENT attachment="WHICAP_biostatppt05.15.15.pptx" attr="h" comment="" date="1432652490" name="WHICAP_biostatppt05.15.15.pptx" path="WHICAP_biostatppt05.15.15.pptx" size="92217" user="MeridithBlevins" version="1"
META FILEATTACHMENT attachment="z.pdf" attr="" comment="" date="1445530876" name="z.pdf" path="z.pdf" size="66045" user="ShiHuang" version="1"
META FILEATTACHMENT attachment="Aims-10-21-15.docx" attr="" comment="" date="1446130826" name="Aims-10-21-15.docx" path="Aims-10-21-15.docx" size="12834" user="LiWang" version="1"
Added:
>
>
META FILEATTACHMENT attachment="Pilot_Full_Protocol_4.7.16.docx" attr="" comment="Kropski - Phase Ib Trial protocol" date="1460554697" name="Pilot_Full_Protocol_4.7.16.docx" path="Pilot Full Protocol 4.7.16.docx" size="241895" user="JonKropski" version="1"
Revision 310
Changes from r290 to r310
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Line: 6 to 6
 

Added:
>
>

2016 Apr 7

Chenjie Zeng, Epidemiology

  • "I plan to build a predictive model with newly-found risk factors and previously known factors using a cohort study data. The outcome is binary. I wish to know what would be the best way to test the added predictive values of the newly-found factors."
  • Needs assistance responding to comments from BioVU committee review; also planning to apply for VICTR voucher
  • Recommend clarifying that insufficient sample size will be used to gather information to plan larger, adequately powered study
  • Planning to use logistic regression for binary outcome; covariates age, sex, race, and comorbidities
  • Selected 40 SNPs to include in model based on biological plausibility; plan to use weights from previous analyses to calculate genetic risk score; recommend clarify that model will not be overfit with inclusion of genetic risk score
  • Bootstrap on c statistic to assess optimism

Jennifer Madu, Graduate Student

  • "I need help in deciding what type of statistics I need for my research results. I am evaluating improvement of nurse's knowledge and attitudes on end of life utilizing a clinical course called the ELNEC(End of Life Nursing Education Consortium). Pre/post tests and surveys are to assess their knowledge before and after the ELNEC."
  • Tests included questions regarding agreement with statement (true/false), but there is no overall scoring mechanism. Pre/post tests can be linked for individuals
  • Recommend univariate analyses for each question
  • May consider generating total score for test by summing total number of correct responses; can compare pre/post knowledge using sum rank test
  • Recommend creating plots with line connecting an individual's pre- to post-score; plot improvement scores as well. See parallel coordinate plot (search wikipedia for example).

Paul Yoder, Dept. of Special Education

  • Needs help understanding methods of rate estimation in a partial-interval-estimated count framework.
  • Conducted data simulation using known rate of events
  • Given interval duration and true count, how do I know how much correction is applied by Poisson?
  • Estimated lambda calculated as -log(N0/N), where N0 is the number of intervals with no event, N1 is the number of intervals with at least one event, and N = N0+N1 is the total number of intervals
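
The estimate above follows from the Poisson assumption that the probability of an interval with no event is exp(-lambda), so lambda = -log(N0/N). A minimal sketch, assuming equal-length intervals and a constant event rate; the function names and counts are hypothetical:

```python
import math

def poisson_rate_per_interval(n_empty, n_total):
    """Estimated mean number of events per interval, lambda = -log(N0 / N)."""
    if not 0 < n_empty <= n_total:
        raise ValueError("need 0 < N0 <= N; lambda is undefined when no interval is empty")
    return -math.log(n_empty / n_total)

def poisson_correction(n_empty, n_total):
    """Estimated total count minus the raw number of intervals scored as 'event'."""
    estimated_count = poisson_rate_per_interval(n_empty, n_total) * n_total
    n_with_event = n_total - n_empty
    return estimated_count - n_with_event

# Hypothetical session: 100 intervals, 50 of them with no event
print(poisson_rate_per_interval(50, 100))  # events per interval
print(poisson_correction(50, 100))         # extra events credited by the Poisson model
```

The correction grows as fewer intervals are empty, since an interval scored as "event" is increasingly likely to contain more than one event.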

2016 Mar 31

Paul Yoder, Dept. of Special Education

  • Needs help understanding methods of rate estimation in a partial-interval-estimated count framework.

Mariana Ciobanu, Pediatric Neurology

  • Statistics help with headache quality improvement grant proposal
  • Pediatric Neurology clinic headache patients comprise 40% of clinic visits, and there is seasonality with more headache visits during school year.
  • Plan to evaluate current referral/triage system and identify multiple interventions to implement in Pediatric Neurology clinic, General Pediatrics clinic, and ED
  • Aims are to improve time spent waiting for appointment (internal vs. external referral) and to decrease number of ED visits related to headache
  • Recommend factorial cluster randomization approach to evaluate interventions rather than interrupted time series design; can randomize residents in their rotations
  • Do not recommend use of SPC charts as their purpose is to show uniformity over time, and the objective for this project is to compare randomized groups.
  • Discussed VICTR award options

2016 Mar 24

Mark Tyson and Rohan Bhalla, Urologic Surgery

  • Assistance with VICTR application -- "We are using a national dataset (NSQIP) to study length of stay after cystectomy."
  • The biostatistics conference room computer was not responsive during this consulting session, so please excuse the brevity. Mark Tyson and Rohan Bhalla would like to apply for biostatistics support through VICTR. They introduced their study and their main objective is to assess determinants of length of stay following surgery in a large (n=2000) national database. There are over 100 potential covariates and they would like to build a prediction model. Given the complexity of the data and the potential for some iterative work between the statistician(s) and researchers, we recommend they apply for 90 hours of biostat support (a VICTR award).

2016 Mar 17

Stephanie Moore, Pharmacology Graduate Student

  • Assistance with VICTR application
  • "We are beginning an investigation into SNPs of specific genes of interest to us in populations that develop aberrant mineralization following injury."
  • Future goal is to identify patients at risk of mineralization at time of injury
  • May want to consider a continuous response for mineralization severity score or an ordinal response for count of instances where mineralization is referenced in the subject's medical record
  • Recommend minimum of 10-20 subjects per factor when fitting model
  • Recommend working on analysis plan with Dr. Quinn Wells who may contact Dr. Frank Harrell for guidance

Cherie Fathy, Medical Student

  • Retrospective study in Ophthalmology to assess whether increasing age is associated with an increased risk for receiving unsolicited patient complaints.
  • Complaints are a rare event; do not expect risk to be proportional over time
  • Plan to record complaint (repeated measure) in dataset along with provider's age at the time of the complaint
  • Censored after last complaint is recorded
  • Recommend using Cox model with cluster sandwich estimator with age as a time-dependent covariate
  • Reference: Modeling Survival Data: Extending the Cox Model (2000) by Terry Therneau & Patricia Grambsch

2016 Mar 3

Arion Kennedy, Molecular Physiology and Biophysics

  • Assistance with VICTR response
  • I am interested in quantifying immune cells in liver biopsies of obese patients with various pathologies of nonalcoholic fatty liver disease (NAFLD).
  • Previous published data provides SEM for CD8 counts (analyzed on the wrong scale)
  • Compute SD by multiplying SEM by square root of number of subjects used in the SEM calculation
  • See Section 5.8.3 of Biostat for Biomedical Research at http://biostat.mc.vanderbilt.edu/ClinStat
  • t critical value is 2.1 when n1=n2=10
  • Get the margin of error for estimating the difference between any two of the means (using 0.95 confidence level)
  • Can probably use WebPlotDigitizer to digitize raw data
  • This would also allow taking logs and computing SD(log CD8 count)
  • Once you have SD(log) you can compute the margin of error on the log CD8 count scale
  • Antilog of this margin of error provides the multiplicative margin of error (fold change margin of error)
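The arithmetic in the bullets above can be sketched as follows. This is an illustration only: the SEM and log-scale SD values are made up, a common SD across groups is assumed, natural logs are assumed, and the critical value 2.1 is taken from the notes for n1 = n2 = 10.

```python
import math

T_CRIT = 2.1  # from the notes: t critical value when n1 = n2 = 10


def sd_from_sem(sem: float, n: int) -> float:
    """Recover the SD from a published SEM: SD = SEM * sqrt(n)."""
    return sem * math.sqrt(n)


def margin_of_error(sd: float, n1: int, n2: int, t_crit: float = T_CRIT) -> float:
    """0.95 margin of error for a difference in two means (common SD assumed)."""
    return t_crit * sd * math.sqrt(1 / n1 + 1 / n2)


# Hypothetical published SEM of 1.5 with 10 subjects per group.
sd = sd_from_sem(1.5, 10)
moe = margin_of_error(sd, 10, 10)

# Hypothetical SD of log(CD8 count) obtained from digitized raw data.
moe_log = margin_of_error(0.5, 10, 10)
fold_moe = math.exp(moe_log)  # multiplicative (fold-change) margin of error

print(f"SD = {sd:.3f}, margin of error = {moe:.3f}")
print(f"log-scale margin = {moe_log:.3f}, fold-change margin = {fold_moe:.2f}")
```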

Dafina Krasniqi

  • "I am uncertain about the test to use when selecting the sample size for a three arm test."

Mia Keeys, Sociology Graduate Student

  • "The problem has to do with calculating sample size in a three arm randomized control trial."
 

2016 Feb 25

Juan Pablo Arroyo, Internal Medicine

  • Assistance with VICTR application
  • "The goal of our study is to evaluate if there is a correlation between the levels of serum Cl, the creatinine and the hospital admission rate in patients with congestive heart failure in steady state conditions. The idea would be to perform a longitudinal retrospective analysis."
  • Also have retrospective data collected from kidney donors (5 visits over 1-year time period). Recommend change-point linear regression methods to quantify relationships between Cl, creatinine, and hospital admissions.
  • In the CHF population, Cl is known to affect kidney fill volume which is lowered by administration of diuretics. Plan to use training and testing data sets with k-fold cross-validation to develop tool to predict hospital readmissions. Recommend using Cox Proportional Hazards model for time to hospital readmission with predictors Cl, creatinine, and additional covariates.
  • Dan Byrne's group already utilizes Cl in models predicting VUMC hospital readmissions.
  • The next steps are to develop a protocol, research available patient numbers in Synthetic Derivative, and return to Biostatistics clinic for additional feedback and power analysis discussion before submission to VICTR.

2016 Feb 18

Cherie Fathy, Medical Student

  • "The topic is on the epidemiology and risk factors for ocular involvement of Pediatric EM/SJS/TEN (severe allergic reactions). The statistics are done, but just wanted to make sure that I did the right tests."

Bill Heerman, Pediatrics & Internal Medicine

  • Discussion of sample size calculation for prospective cohort study
  • "I am hopeful that we will be able to use a latent class analysis to identify growth trajectories that are associated with asthma incidence and severity."
 

2016 Feb 11

Katherine McDonell, Neurology

Revision 290
Changes from r270 to r290
Older Notes
Added:
>
>

2016 Feb 25

Juan Pablo Arroyo, Internal Medicine

  • Assistance with VICTR application

2016 Feb 11

Katherine McDonell, Neurology

  • Assistance with VICTR application

Jim May

  • Discussion of pilot RCT proposal

2016 Feb 4

Ben Theobald, Medical Student

  • Discuss use of the PLCO risk assessment model for lung cancer in the setting of incomplete data

Caitlin Ridgewell, MPH Student

  • Study of personality and psychotic symptomology
  • Question regarding clustering of patients into groups in the most efficient way vs. examining the data dimensionally

2016 Jan 28

Akshitkumar Mistry, Neurological Surgery

  • Discussion of meta analysis on survival time between SVZ positive and negative glioblastomas

May Ou, Chemical Engineering

  • Voucher question

Kimberly Albert, Center for Cognitive Medicine – Vanderbilt Psychiatry

  • Voucher question

2016 Jan 7

Chris Brown

  • I have data that was analyzed by Li Wang and have a few questions I was hoping to get answered.

Sudipa Sarkar, Endocrinology

  • Discussion of grant application to study statin exposure and non-alcoholic fatty liver disease

2015 Nov 19

Liane Moneta-Koehler

Interested in comparing grad school outcome predictiveness of old GRE and new GRE. If the types of students being admitted have changed as a result of the change in the test, the interpretation may be complex. The analysis is for BRET and the outcome is a score on the first year required course for all students. A suggestion was made to consider meta analysis but that doesn't seem to be appropriate here. The first period is 20 years. The second period is 4 years.
  • Should correlations be compared or should slopes be compared? Analysis of slopes would probably assume that the two GRE scores have the same standard deviation or are calibrated to each other.
    • Analysis of differences in slopes: fit a linear model first-year score = b0 + b1*GRE + b2*[new GRE] + b3*GRE*[new GRE]; test of interest is the interaction test (H0:b3=0); [x] is 1 if x is true, 0 otherwise. Test of student outcomes being better in one time period than another when the absolute value of GRE score (from whichever test) is held constant: H0: b2=b3=0. Data are stacked in a tall and thin dataset. Adjusting for intensity of biology coursework in undergrad could be important to do.
  • Because time spans are fairly long, one could also put time in the model as a smooth trend. A general approach is a regression spline in time with several knots, three of the knots being close to the GRE transition point. This model would not have [new GRE] in the model.
  • Is there any hint that grade inflation has changed over time so as to confound the result?
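A small sketch of the slope-comparison model above, using made-up stacked data and assuming the two GRE versions have been calibrated to a common scale. Because the model contains both [new GRE] and the GRE*[new GRE] interaction, its least-squares coefficients equal those from fitting a separate line within each period, so b2 and b3 can be read off as differences in intercepts and slopes. This gives point estimates only; testing H0: b3=0 requires standard errors from a regression routine.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Tall and thin dataset: (GRE score, new-GRE indicator, first-year score).
# All values are hypothetical.
rows = [
    (150, 0, 70), (155, 0, 74), (160, 0, 79), (165, 0, 83),
    (150, 1, 72), (155, 1, 75), (160, 1, 77), (165, 1, 80),
]

old_rows = [(gre, score) for gre, is_new, score in rows if is_new == 0]
new_rows = [(gre, score) for gre, is_new, score in rows if is_new == 1]

b0, b1 = ols_line(*zip(*old_rows))          # intercept and slope, old GRE
new_int, new_slope = ols_line(*zip(*new_rows))
b2 = new_int - b0                           # shift in intercept for new GRE
b3 = new_slope - b1                         # difference in slopes (interaction)

print(f"b0={b0:.2f} b1={b1:.2f} b2={b2:.2f} b3={b3:.2f}")
```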

David Paik, Division of Cardiovascular Medicine and Cell and Developmental Biology

  • Making associations with groundwater ion concentrations vs. mortality rates from heart disease in 99 larger cities. We have used Spearman rho test, but we would like to confirm with biostatisticians.
    • Have average income, education which could be covariates
    • Some useful methods: loess nonparametric smoother on top of a regular scatterplot, thermometer plots superimposed on US map with two thermometers per city; better: add a 3rd thermometer measuring geology; consider one of Daniel Carr's micrographics
    • What about climate? How does one adjust for latitude and longitude in a regression model? Perhaps a tensor spline.
    • Can get quick statistical assistance if the lead investigator has a primary appointment in Cardiovascular Medicine (then contact Frank Harrell); or by applying for a voucher from VICTR
  • Determining whether patients who were prescribed a particular drug for manic-depressive disorder have a lower incidence rate/mortality rate from heart disease. We have extracted patient data from Synthetic Derivative, where we know the number of patients who were prescribed a drug of our interest, and how many among this group have had heart disease. We also have the corresponding numbers from a control patient pool (who were matched by age, gender, and ethnicity). We would like to know the best way to compare these data and determine whether prescription of the drug does indeed affect the disease incidence. Also interested in associations with a pre-selected SNP.
    • Need to worry about confounding due to indication. Need to have very accurate determination of which controls have the disease that causes the drug to be prescribed, and very accurate prescription data.
  • A couple of other minor questions related to similar data analysis.

2015 Nov 12

  • I am a postdoc in the BRET office and am investigating how GRE scores predict PhD student success (for example the correlation between GRE-Quantitative scores and first year graduate GPA). Specifically, I am trying to compare the predictability of the new GRE to the old GRE. Since my independent variables (GRE scores) and populations differ, I believe I may need to use some type of meta-analysis technique to compare the correlations. Can you help me out with this? I typically run my stats in SPSS, but am open to solutions that use R or Prism.

Taylor Hudson, neurology department

  • To discuss a new study we would like to start. It will be based on a recently completed study on inter-rater reliability for which we will bring the data.
  • Two raters will evaluate each subject and decide whether to refer or not. Suggest McNemar's test or Kappa statistics. Can also calculate sample size based on prevalence, sensitivity, and specificity.
  • Apply for VICTR voucher for study design. Suggest $2000.
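A minimal sketch of the two statistics suggested above for a pair of raters making refer / do-not-refer decisions. The 2x2 counts are hypothetical, and the McNemar statistic shown is the version without continuity correction.

```python
def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two raters and a binary decision.

    a = both refer, b = only rater 1 refers,
    c = only rater 2 refers, d = neither refers.
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar chi-squared statistic (1 df) from the discordant counts."""
    return (b - c) ** 2 / (b + c)


# Hypothetical table for 100 subjects.
kappa = cohen_kappa(a=40, b=10, c=5, d=45)
chi2 = mcnemar_chi2(b=10, c=5)
print(f"kappa = {kappa:.2f}, McNemar chi-squared = {chi2:.2f}")
```

Kappa summarizes agreement beyond chance, while McNemar's test asks whether one rater refers systematically more often than the other; the two answer different questions.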

Stephania Miller-Hughes, Meharry Medical College

  • Quote to include as part of VICTR biostat voucher request. The voucher will be used for developing and writing a grant statistical analysis plan based on the multiphase optimization strategy which uses a factorial study design. Have informally discussed the proposed study design with Robert Greevy, PhD
  • Apply for VICTR voucher to help with study design, sample size and power calculation, and analysis plan. Estimate $5000.

2015 Nov 5

Melissa Ann Warren, Pulmonary and Critical Care Medicine

  • I am currently working on a clinical research project where I am attempting to create a CXR scoring system which can be used to predict outcomes in patients with acute lung injury. My mentor has previously created and validated a CXR scoring system in a population of patients as a means to predict pulmonary edema by comparing her score to measured lung weights. I have recently applied my new score to that same population of patients. At a glance, it does not appear that the correlation of CXR score to lung weight is much better using my score as compared to hers. However, I am wondering what test I can use to actually test this. Ex: if her r=0.63 and my new r=0.66, how can I tell if that is a significant change?

Mhd Wael Alrifai, Neonatology

  • Name of project: Parenteral Protein Calculator (PPC)
  • Type: Randomized controlled clinical trial, un-blinded
  • Help needed: Discussing the primary and secondary outcomes
  • Study status: Ongoing data collection

2015 Oct 29

Sheryl Rimrodt-Frierson

  • (1) recommended methods to answer the questions and (2) what other information I need to gather for a power analysis.

2015 Oct 22

Kelly Wolenberg

  • Statistical analysis regarding clinical ethics consult services.

Joey, MPH

  • analyzing data regarding entrepreneurs in Haiti

2015 Oct 15

Sarah Tanaka, medical student

  • I’m a medical student working on a project in ophthalmology. I am in need of some help with my project in planning the data collection process from Synthetic Derivative so that it will be most effective for analysis later by a biostatistician
  • Retrospective cohort study looking at risk factors (ventilation, hospital LOS, etc). Total 120 patients and 60 had events. Only baseline factors can be evaluated.

Jessica Hinshaw

  • Validation for dichotomous variables (I tried Cronbach’s alpha and had really low coefficients) and also analyzing tertiles
  • A study about baseline nutrition. Ten categories of different food.
 

2015 Oct 1

Jessica S. Thomas, Department of Pathology, Microbiology and Immunology

  • To discuss data analysis/interpretation for a project I am currently working on. The study is a comparative analysis of cytomegalovirus viral loads in whole blood and plasma using 4 different assay methodologies/testing platforms with the goal of understanding the interrelated effects of specimen type, assay methodology, use of different calibrants, and patient specific variables on CMV viral load quantitation. I have collected the viral load data on all testing platforms, as well as select clinical variables, on a cohort of 25 patients and would really appreciate some guidance/assistance in the statistical analysis.
Revision 270
Changes from r250 to r270
Older Notes
Added:
>
>

2015 Oct 1

Jessica S. Thomas, Department of Pathology, Microbiology and Immunology

  • To discuss data analysis/interpretation for a project I am currently working on. The study is a comparative analysis of cytomegalovirus viral loads in whole blood and plasma using 4 different assay methodologies/testing platforms with the goal of understanding the interrelated effects of specimen type, assay methodology, use of different calibrants, and patient specific variables on CMV viral load quantitation. I have collected the viral load data on all testing platforms, as well as select clinical variables, on a cohort of 25 patients and would really appreciate some guidance/assistance in the statistical analysis.

2015 Sep 24

Alvin D. Jeffery, U.S. Department of Veterans Affairs

  • To discuss a model I developed for predicting functional status of injured older adults (Dr. Maxwell is the PI)

Blair Taylor Stocks

  • My mentor and I are interested in creating a predictive/correlative model that could correlate the success of Type 1 Diabetes prevention trials with past data of the same therapeutics used to prevent or reverse T1D in mice. Since I have little experience in data mining from large clinical trials data, I was hoping to go over our thoughts on the model with you all, and see what statistical analyses would be best to use. I have a few papers on the topic that I could use as examples and we could go from there.

2015 Sep 17

Kathleene Wooldridge, MD, VA Quality Scholars Fellow, Instructor in Medicine

  • Discuss a strategy for an interrupted time series with control analysis that I’ve been working on in Stata for a quality improvement initiative.

Meera Kumar

  • We have data on an excel spreadsheet and need help analyzing it. The spreadsheet is on patients with a diagnosis of ALL (B and T cell) who have had allogeneic transplant.
  • Need biostat help with survival analysis. $2000 voucher is suggested.

2015 Sep 3

Justin M. Bachmann, Division of Cardiovascular Medicine

  • I’d like to evaluate the effect of a quality improvement intervention at the Dayani center on wait times for cardiac rehabilitation appointments after discharge. I have wait time data from 2013 to 2014 and the intervention was implemented throughout 8/2013 to 9/2013. Mean wait times were approximately 16 days during the first six months of 2013 and 11 days during the first six months of 2014, and this difference was significant using a t-test. This t-test analysis is admittedly crude, so I’d like to speak with the biostatisticians about next steps and how these data can best be presented visually. Possibilities include an interrupted time series analysis.

2015 Aug 27

Karl Zelik, Mechanical Engineering

  • Discuss sample size calculations for a grant proposal. An excerpt from the grant is attached, which overviews the study design (repeated measures design, with multiple parameter studies) and includes my own sample size calculations (which I would like to discuss and receive feedback on). Aim 1 is basic science, and Aim 2 is a nearly identical experiment but in a clinical population.
  • Statistical Analysis & Sample Size. For each parameter sweep, statistics will be computed on outcome measures using a repeated measures analysis of variance (ANOVA), with significance level of 0.05 and Holm-Sidak correction. To account for size differences between subjects all data will be non-dimensionalized prior to statistical analysis, using base units of gravitational acceleration, body mass and leg length (e.g., (Zelik and Kuo, 2010)). Non-dimensionalization will cause discrete independent variables (values listed in Table 1) to become continuous, so we will bin data prior to statistical analysis.
  • Paired design, use paired t-test for sample size calculation based on the standard deviation of the difference within subject. If the variation is small, try to use smaller type I error of 0.01. Can also plot power versus effect size.
  • Consider multivariable linear model for analysis.
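A rough sketch of the paired-design sample size calculation suggested above, based on the standard deviation of the within-subject difference. It uses the normal approximation, which slightly understates the n a t-based calculation would give; the SD and effect size are hypothetical.

```python
import math
from statistics import NormalDist


def paired_sample_size(sd_diff: float, delta: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of subjects for a two-sided paired comparison.

    sd_diff: SD of the within-subject differences.
    delta:   smallest mean difference worth detecting.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sd_diff / delta) ** 2)


# Hypothetical inputs: SD of differences 1.0, detectable difference 0.5.
n_at_05 = paired_sample_size(1.0, 0.5, alpha=0.05)
n_at_01 = paired_sample_size(1.0, 0.5, alpha=0.01)
print(n_at_05, n_at_01)

# Sample size versus effect size, the counterpart of a power curve.
for delta in (0.25, 0.5, 0.75, 1.0):
    print(delta, paired_sample_size(1.0, delta))
```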

Michael O’Connor

  • The project is a clinical trial in which children with cystic fibrosis were given a DHA supplementation pill at two different doses. It was a RCT with a cross-over design that included a placebo arm in addition to two arms for each of the doses of DHA. Blood, urine, and exhaled breath condensate were collected at baseline and after each of the study arms. The blood is analyzed for 20 different plasma fatty acids; the urine and exhaled breath are each analyzed for a metabolite of prostaglandin-E (single value). The goal enrollment was 18 participants, but was powered at 13 participants. We enrolled 17 participants, but 3 participants dropped-out prior to completing the first study arm (only have baseline values for these 3 participants). In addition, 1 participant only completed two of the study arms (placebo & high dose) and 1 participant completed just one arm (low dose). Luckily, these last two participants did different study arms.

My questions:

1. What is the best way to deal with the missing values for the two participants who did not complete the study, but completed at least one arm? From my limited research, I think I could do something like a Skillings-Mack test to deal with the participant who is missing just one arm, but this would not help with the one who is missing two arms. I also thought about “normalizing” the data to the baseline (dividing the values by the baseline) and then treating the values within each arm as unpaired and doing a Kruskal-Wallis test (this might have additional value in thinking about how much participants increased from their baseline).

2. I know I will need to correct for multiple comparisons with the plasma fatty acids (considering that I looked at 20 different plasma fatty acids on each sample). Is it appropriate to set my p-value at 0.05/20 = 0.0025?

3. The plasma fatty acids were run in triplicate and the value for each fatty acid is the average of the three runs, what is the best way to express the amount of variation?

  • Missing covariates can be imputed but imputation on response variable is not recommended.
  • Can do complete cases analysis and then include the other 3 pts for sensitivity analysis.
  • Start with the global test on whether any arm is different, if yes, then perform pairwise test.
  • Repeated measures design, fit model to adjust for baseline measures
  • For multiple comparisons, present both adjusted and unadjusted p-values
  • Scatter plot for the triplicate to examine the distribution, either average or median.
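To illustrate presenting adjusted and unadjusted p-values side by side, here is a small sketch of the Bonferroni adjustment (equivalent to the 0.05/20 threshold in the question) and of Holm's step-down adjustment, which is never less powerful than Bonferroni. Holm is offered here as one common option, not as the method chosen in clinic; the p-values are made up and only four are shown rather than 20.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


raw = [0.001, 0.02, 0.04, 0.30]  # hypothetical unadjusted p-values
for p, pb, ph in zip(raw, bonferroni(raw), holm(raw)):
    print(f"unadjusted={p:.3f}  Bonferroni={pb:.3f}  Holm={ph:.3f}")

per_test_threshold = 0.05 / 20  # threshold from the question: 0.0025
```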

2015 Aug 20

Gabriel Bryan Winberry, Pediatric

  • I am in the process of submitting an IRB and application for VICTR funding for my fellow project. I was hoping to meet with someone to discuss sample size and basic study design (I am working on a pilot study in adolescents with Crohn’s Disease).

Mark Tyson

  • I am finalizing a protocol for a non-inferiority trial comparing observation versus surgery for bladder cancer. The primary endpoint is proportion of patients who experience progression of disease at 12 months. We know from previous studies that the risk of progression at one year is about 1%. The largest tolerable margin we would accept is 5% in the observation group before we said that this method is unacceptable. Using a type 1 error rate of 5% and assuming a 10% dropout and a 10% cross over from the observation group to the surgery group, I have calculated that 135 patients will be needed to achieve an 80% power.
  • Defining non-inferiority margin: odds ratio vs. relative risk vs. absolute proportion
  • Develop utility measures or assign a multi-level rank based on panel discussion, use ordinal scale instead of binary outcome, in order to increase power.
  • Apply for $5000 voucher

2015 Aug 13

Kate Hartley, Brent Roach, Radiology

  • Two methods to measure tumor size: US vs. MRI. Want to know which method is better for certain cases.
  • Patient characteristics and tumor types are all available. The actual tumor size is also known for most patients.
  • Descriptive statistics and some graphs showing how US performs. Paired t-test, multiple linear regression with adjustment of other confounders (Regress MRI on US).

Tamala Bradham, QSRP

  • The effect of board time on patient outcomes such as mortality, readmission, etc.
  • Suggest multivariable logistic regression.
  • A VICTR voucher of $2000 is suggested.
 

2015 July 30

Nick Kramer, Meharry Medical College, M4

  • Nick is requesting additional input on an ongoing project.

Mark Tyson, Urologic Surgery

I am interested in having someone double check my sample size calculation for a randomized trial. Attached is our protocol, but briefly we are studying active surveillance versus surgery for low risk nonmuscle invasive bladder cancer. We estimated that a sample size of 148 eligible randomized patients is required to detect a 20% improvement in event-free survival in the active surveillance group by using a 5%-level one sided log-rank test with 90% power. In this calculation, we assume a 20% withdrawal rate. This study design calls for one-sided testing, since the standard (surgery) would only be affected if the active surveillance approach proved superior to the surgery group in terms of event-free survival. If the active surveillance group is the same as or inferior to the surgery group, surgery will remain the standard of care. Furthermore, on the basis of anecdotal experience, active surveillance is unlikely to result in a higher event rate than surgery, thereby justifying the one-sided approach.

2015 July 23

Andy Wooldridge, Palliative Care

I am involved in the beginning stages of a clinical research project involving palliative care and physical therapy in our palliative care unit. I was hoping to review the study design with someone and get tips on data collection and future analysis plan.
  • Currently, the outcome (Likert scales of confidence in providing care) is measured as a pre- and post-survey immediately before and immediately after a teaching/training intervention.
    • Think about whether a control group might increase the validity.
    • Include another post-measure following in-home practice.
    • Hard endpoints could include early discharge or patient QoL.
  • Sample size discussion: want the effect size to be smallest detectable difference of clinical significance.
    • For this Likert scale, we discussed dichotomizing the outcome to a binary endpoint only for purposes of sample size calculation. This might require less pilot data, but actually getting pilot data is more advisable.
    • Here's some example text from PS sample size software based on MADE UP effect sizes: We are planning a study of independent cases and controls with 1 control(s) per case. Prior data indicate that the failure rate among controls is 0.6. If the true failure rate for experimental subjects is 0.8, we will need to study 81 experimental subjects and 81 control subjects to be able to reject the null hypothesis that the failure rates for experimental and control subjects are equal with probability (power) 0.8. The Type I error probability associated with this test of this null hypothesis is 0.05. We will use an uncorrected chi-squared statistic to evaluate this null hypothesis.
  • Sometimes pre- and post- groups have a learning effect when responding to surveys. They may anchor their answers to the first survey. This is a potential source of bias: Hawthorne effect, learning fatigue, general time trend.
  • To do the analysis of a five point Likert scale, use a Wilcoxon test for the comparison. To power that test, you need to know the distribution of the five survey responses.
  • For survey development, consider using some validated surveys from the literature. If the survey is internet (e.g. REDCap) then consider using slider scales instead of a five point Likert, because the data will be more continuous which is statistically more powerful. Possible to do this on a paper form. Not sure whether this slider scale approach has been examined for purposes of phone/in-person interview. Note: you would still analyze this as an ordinal variable.
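For the dichotomized sample size example quoted from PS above, a sketch of a textbook calculation for comparing two independent proportions with an uncorrected chi-squared test. This is the usual normal-approximation formula, not necessarily the exact method PS implements, so the result is only expected to land near the 81 per group quoted; the failure rates 0.6 and 0.8 are the made-up values from the notes.

```python
import math
from statistics import NormalDist


def n_per_group(p0: float, p1: float,
                alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate subjects per group for a two-sided test of two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return numerator / (p1 - p0) ** 2


n_exact = n_per_group(0.6, 0.8)
print(f"about {n_exact:.1f} subjects per group before rounding")
```

Analyzing the full five-point scale with a Wilcoxon test, as recommended above, would generally need fewer subjects than this binary calculation suggests.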
 

Kathleene Wooldridge, MD, VA Quality Scholars Fellow, Instructor in Medicine

  • Questions on data cleaning using Stata.
  • one record per patient with up to 30 medications in columns. There are 8-9 check boxes for types of error per medication: same/omission/dose change. There is source of error which is also recorded as 3-4 check boxes, but there probably shouldn't be overlap (ie, multiple types)
  • Reshape the dataset from "wide" to "long" so that there is 1 patient - 1 medicine per row. Then you might reshape it back to patient level after some data manipulation.
  • See the Stata "reshape" command for converting from "wide" to "long" and vice versa. It is very helpful to experiment with a small data set to get the feel of how reshape works.
  • The Stata "egen" command can be very helpful for summarizing (means, counts, sums, etc.) within groups (e.g. all the records for a single patient). It is also good for indicating if a particular condition exists within groups.
  • The egen functions marked "allows by varlist" would be the most relevant here.
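The wide-to-long reshaping and within-patient summaries described above are done in Stata with reshape and egen. As a language-neutral illustration of the same idea, here is a small Python sketch; the column names (med1, error1, ...) and records are hypothetical, and "same" is assumed to mean no discrepancy.

```python
from collections import defaultdict

# Wide layout: one record per patient, medications spread across columns.
wide = [
    {"patient_id": 1, "med1": "aspirin", "error1": "omission",
     "med2": "metformin", "error2": "same"},
    {"patient_id": 2, "med1": "lisinopril", "error1": "dose change",
     "med2": None, "error2": None},
]
MAX_MEDS = 2  # the notes mention up to 30 medication columns

# Long layout: one row per patient-medication pair (like "reshape long").
long_rows = []
for record in wide:
    for j in range(1, MAX_MEDS + 1):
        med = record.get(f"med{j}")
        if med is None:
            continue  # skip empty medication slots
        long_rows.append({
            "patient_id": record["patient_id"],
            "med_num": j,
            "med": med,
            "error": record.get(f"error{j}"),
        })

# Within-patient summary (like "egen ... , by(patient_id)"):
# count medications with any discrepancy.
n_errors = defaultdict(int)
for row in long_rows:
    n_errors[row["patient_id"]] += int(row["error"] != "same")

print(len(long_rows), dict(n_errors))
```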
 

2015 July 16

Statisticians: Chris Fonnesbeck, Bill Dupont, Frank Harrell, Shi Huang

Robert Wilson, Orthopaedic Surgery

Revision 250
Changes from r230 to r250
Older Notes
 
Added:
>
>

2015 July 30

Nick Kramer, Meharry Medical College, M4

  • Nick is requesting additional input on an ongoing project.

Kathleene Wooldridge, MD, VA Quality Scholars Fellow, Instructor in Medicine

  • Questions on data cleaning using Stata.

2015 July 16

Statisticians: Chris Fonnesbeck, Bill Dupont, Frank Harrell, Shi Huang

Robert Wilson, Orthopaedic Surgery

I am a Vanderbilt orthopaedic surgery resident and I would like to attend the lunchtime biostats clinic this Thursday, July 16th. I am in the process of writing a grant for a project that would need a Markov decision model. I would like to meet with a statistician who would be interested in helping me create the Markov model for the project. The study will be looking at a comparison of the cost-effectiveness of 2 surgical interventions in pediatric orthopaedic cancer surgery.
  • Osteosarcoma
  • Two types of surgical implants: metal or allograft
  • 5-year cumulative mortality 0.3
  • Interested in simulation to estimate expected cost; some branches are death, revision surgery, etc.
  • Currently tabulating complication risks from the literature (infections, etc.)
  • Probabilistic graphical models - Bayesian networks - may be of use; more of a "closed" network than what a Markov sequential process would entail
  • Take a look at netica software; R has gRain and other packages
  • A sensitivity analysis will be needed to check assumptions about parameters
  • Options for biostat support: VICTR (general support), Kristin Archer-Swygert, VICC Shared Biostat Resource (Tatsuki Koyama), Chris Fonnesbeck (works with Kristin)
  • Talk to Dave Pinson at some point
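A toy sketch of the kind of expected-cost calculation a Markov cohort model would perform for one implant type. Only the 5-year cumulative mortality of 0.3 comes from the notes; it is converted to an annual probability assuming a constant hazard. The revision probabilities, costs, and function names are hypothetical placeholders, and discounting is omitted.

```python
def annual_probability(cumulative: float, years: int) -> float:
    """Convert a cumulative probability to a per-year one (constant hazard)."""
    return 1 - (1 - cumulative) ** (1 / years)


def expected_cost(years: int, initial_cost: float, p_death: float,
                  p_revision: float, cost_revision: float,
                  cost_followup: float):
    """Return (expected cost per patient, cumulative mortality) over the horizon."""
    alive = 1.0            # fraction of the cohort still alive
    total = initial_cost   # everyone incurs the index surgery
    for _ in range(years):
        # Survivors incur follow-up and, with some probability, a revision.
        total += alive * (cost_followup + p_revision * cost_revision)
        alive *= 1 - p_death
    return total, 1 - alive


P_DEATH = annual_probability(0.3, 5)  # from the 5-year mortality in the notes

# Hypothetical parameters for two implant types.
cost_metal, mortality = expected_cost(5, 50000.0, P_DEATH, 0.05, 20000.0, 1000.0)
cost_allograft, _ = expected_cost(5, 40000.0, P_DEATH, 0.10, 20000.0, 1000.0)
print(f"metal: {cost_metal:.0f}, allograft: {cost_allograft:.0f}")
```

Rerunning the function over a range of plausible values for each parameter is a simple form of the sensitivity analysis mentioned above.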

2015 July 2

Jennifer Cunningham Erves, Surgery

My name is Jennifer Cunningham, and I recently attended a bio-statistics clinic to discuss a logistic regression to identify factors associated with parental willingness and how those predictors differ by race. We have completed the analysis based on the advice received in the bio-statistics clinic on June 4. We were told we could return to discuss the steps taken to ensure we completed the analysis correctly.

Laila Agrawal, Hematology/Oncology

I am a hematology/oncology fellow. I would like to attend a biostat clinic to go over stats for a retrospective review on rates of follow-up after abnormal mammogram to compare rates of follow-up between different groups (insurance status, race/ethnicity, language, etc.).

2015 June 18

Sam Rubinstein, Internal Medicine

My name is Sam Rubinstein; I'm a PGY-1 in the department of internal medicine. I'm working on a research project involving the effect of stem cell transplantation on proteinuria in patients with amyloidosis; my PIs are Dr. Frank Cornell (dept of hematology/oncology) and Dr. Langone (dept of nephrology). We've collected and compiled our data, and would like to get funding so that we can have some assistance from a statistician at analyzing the data. I am in the process of writing an application for a $2000 voucher from VICTR, and it looks like the process involves attending a biostatistics clinic to approve the estimate. Would it be possible to arrange for a meeting on Wednesday or Thursday of next week so that we can get this process started?

Nick Kramer, Meharry M3

Chris and I are working on a systematic review of weight bearing after posterior acetabular fractures. Specifically we want to look at Merle d’Aubigne functional scores and complication rates. However, there is very little research that looks directly at this question. Thus, we have compiled every article we can find on posterior acetabular fractures that lists their Merle d'Aubigne scores and weight bearing protocol. Our primary question for you is how can we best interpret this data.

2015 June 4

Jennifer Cunningham, Post-Doctoral Fellow, Meharry-Vanderbilt Alliance

From email to clinic list:
I conducted a study looking at "Factors associated with Parental Willingness of their Adolescents Participation in HPV vaccine clinical trials". Specifically, I need assistance in conducting a logistic or multiple regression to identify factors associated with parental willingness and how those predictors differ by race.

My specific aim was the following: To identify parental willingness and factors influencing their willingness of their child’s participation in HPV vaccine clinical trials that may be unique to African American parents as compared to Caucasian parents of adolescent girls aged 9 to 12 years. The survey will be administered to parents in community organizations to demonstrate factors influencing parental willingness of their child’s participation in HPV vaccine clinical trials and how they differ across African American and Caucasian parents using multiple regression analysis.

The dependent variable is parental willingness for their adolescent to participate in HPV vaccine clinical research trials.

The independent variables were child's gender, child's health, child's insurance, HPV vaccine intent, knowledge of clinical trials (CT) prior to survey, type of CT information, comprehension of CT information, personal experience with CT, parent education level, parent gender, child race, parent race, benefits of HPV vaccine (MeanBenefits), barriers of HPV vaccine (MeanBarriers), knowledge of CT (MeanCRTKnowledge), advantages of CT for child (MeanCRTAdvantages), disadvantages of CT for child (MeanCRTDisadvantages), and trust of medical researchers (MeanTrust).
  • Conduct extensive descriptive statistics analysis understanding the relationship between the dependent variable, scales and demographic information;
  • Notes from Dandan Liu: treat the dependent variable (scored 1 to 5) as an ordinal variable and use a proportional odds regression model;
  • The primary analysis will investigate the effect of 6 scales on the outcome;
  • The secondary analysis will investigate heterogeneous effects by including an interaction term between race and the scale of interest.
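
As a sketch of the model suggested in these notes (the symbols are illustrative assumptions and do not come from the clinic), a proportional odds model for the 5-level willingness score Y, with one scale S, a race indicator R, and their interaction, could be written as:

```latex
\operatorname{logit} P(Y \ge j \mid S, R)
  = \alpha_j + \beta_1 S + \beta_2 R + \beta_3 \,(S \times R),
  \qquad j = 2, \dots, 5
```

Under this parameterization, \beta_3 describes how the effect of the scale differs by race. The primary analysis would presumably drop the interaction term and include the six scales as main effects.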

Kalpana Manthiram, Pediatric Infectious Disease

From email to clinic list:
My project is a clinical project in Pediatric Infectious Diseases. We are doing a matched statistical analysis of family history of cases with a fever syndrome and healthy controls.
  • Use McNemar's test for the matched case-control study
  • For categorical variables, use Bhapkar test
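
For a binary family-history variable, the McNemar comparison depends only on the discordant matched pairs. The following is a minimal sketch of the exact (binomial) version of the test; the function name and the counts are hypothetical and are not taken from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: pairs in which only the case has the family history
    c: pairs in which only the control has the family history
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 the discordant pairs split 50/50, so b ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, for illustration only.
p_value = mcnemar_exact(b=15, c=5)
print(f"exact McNemar p = {p_value:.4f}")
```

Concordant pairs do not enter the calculation, which is why the number of discordant pairs drives the power of a matched design.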

2015 May 28

Jeannine Skinner, Senior Research Associate, Meharry-Vanderbilt Alliance

From email to clinic list:
I would like to reserve a time for biostat clinic on Thursday, May 28th if possible. Attached is my presentation.
  • Notes from Frank Harrell: It is important to have a statistical analysis plan for each study. Does yours have one? It would not be a good idea to do a median split on age but rather to check for a smooth interaction with age.
  • Unlikely that we can modify existing analysis which seems OK, suggest senior author attend clinic to make case for re-analysis.

Magda Grabowska, Urologic Surgery

From email to clinic list:
I want to identify the number of patients I need to include on a tissue microarray in order to determine whether my protein of interest predicts biochemical failure for prostate cancer.
  • Prostate cancer tumors identified with three punches sampled to create slides. Outcome is recurrence. Would like to power study to look at one protein, with the ability to expand the study to include other biomarkers. The one protein would be measured as a score from 0-6.
    • Able to stratify patients into risk groups using nomograms and medical records (including Gleason score, etc.).
    • Have 8000 patients, half have tissue/follow-up. Need to ensure that follow-up is complete to rule out recurrence. Plan to do this via inclusion criteria. However, this cohort may no longer be "representative" because ideally inclusion criteria is applied at time of initial biopsy.
    • Outcome will be time to recurrence.
  • Suggest nested case-control study: cases are all patients who recur. For each case, select a control from risk set of men who had at least as much recurrence free follow-up. Maximize power this way, doing 1:1 or 2:1 controls:case.
  • Sample size: Using PS (available for free download on Vanderbilt Wiki or Bill Dupont's website), we did the following sample size calculation: We are planning a study of matched sets of cases and controls with 1 matched control(s) per case. Prior data indicate that the probability of exposure among controls is 0.1 and the correlation coefficient for exposure between matched cases and controls is 0. If the true odds ratio for disease in exposed subjects relative to unexposed subjects is 2, we will need to study 69 case patients with 1 matched control(s) per case to be able to reject the null hypothesis that this odds ratio equals 1 with probability (power) 0.8. The Type I error probability associated with this test of this null hypothesis is 0.05.
  • Suggest adjusting for Gleason score (and other known risk factors) in analysis. This would then measure the amount of info the stain gives above and beyond the Gleason score.
  • Our power calculation is very best case scenario where you are looking only at the stain by itself. If plan to adjust for other risk factors, odds ratio for stain might be decreased, so then you would need more patients. May want to bump up sample size as much as possible past these sample size calculations.
  • Tennessee tumor registry may be helpful source of follow-up.

2015 May 21

Danielle Kimmel, Chemistry

From email to clinic list:
I am Danielle Kimmel, and I am a postdoc in the chemistry department under David Wright. We are interested in developing a receiver operating characteristic curve for our rapid diagnostic tests for malaria. The end goal is for us to design an experiment to provide this information for any of our rapid diagnostics, but first we wanted to get a grasp of how many patient samples or tests needed to be performed.

James Harty, Radiology

From email to clinic list:
I'm a fourth year radiology resident doing a research project with Dr. Manning from VUIIS. I'm going to be submitting a VICTR proposal and need help with the Data Analysis/Sample Size Justification section.
 

2015 May 14

CANCELED for Employee Celebration picnic

Line: 13 to 126
 
I would like some statistical assistance in selecting the number of patients I would need to enroll for a study. My concept is validation of a FitBit's heart rate against telemetry in patients less than four years old who have cyanotic congenital heart disease and are admitted to the hospital for any reason. Validation of FitBit heart rate data in these patients is the first step, with the ultimate goal of providing a wearable sensor (i.e. FitBit) for near continuous home monitoring of this patient population in the hope of associating home monitoring data with outcomes. This may allow for development of predictive software to warn of clinical decompensation if near continuous home monitoring can be validated and adopted. Thank you for any assistance you can provide with this request.
  • Might focus on mean absolute difference between the new measurement and a gold standard, and a confidence interval for that
  • Question of how long to monitor a child e.g. 24h or a few hours
  • Next stage is to predict clinical endpoints, which will require a large number of subjects and events unless there is a hard clinical response that is a continuous measurement
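
The first suggestion, a mean absolute difference with a confidence interval, can be sketched as follows. This is only an illustration under stated assumptions: the function name and the heart-rate values are hypothetical, and the percentile bootstrap treats the paired readings as independent. With many readings per child, resampling would need to be done at the child level instead.

```python
import random
from statistics import mean


def mean_abs_diff_ci(device, reference, n_boot=2000, level=0.95, seed=0):
    """Mean absolute difference between paired readings, with a
    percentile bootstrap confidence interval."""
    if len(device) != len(reference) or not device:
        raise ValueError("need two non-empty sequences of equal length")
    abs_diffs = [abs(d - r) for d, r in zip(device, reference)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    boot_means = sorted(
        mean(rng.choices(abs_diffs, k=len(abs_diffs))) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return mean(abs_diffs), boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical heart rates (beats per minute), for illustration only.
wearable = [122, 130, 118, 141, 127, 135, 120, 129]
telemetry = [120, 133, 119, 138, 127, 139, 118, 130]
mad, lower, upper = mean_abs_diff_ci(wearable, telemetry)
print(f"mean absolute difference = {mad:.2f} bpm, 95% CI ({lower:.2f}, {upper:.2f})")
```

A sample size could then be chosen so that the expected width of this interval is acceptably small, rather than by powering a hypothesis test.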

Isaura Diaz, Mark Clay, Pediatrics, Divisions of Cardiology and Critical Care

I am interested in setting an appointment for one of the upcoming Wednesday or Thursday Biostat Clinics. I am with one of our clinical fellows on a quality improvement project which is primarily survey based. The project will include survey, intervention, and post survey follow-up. We need help on determining sample size, statistical methods needed to analyze data, etc.
  • Would like to have a general family satisfaction measure. May use # times language services were used, how often approached the family.
  • If base the analysis on frequency of use of services, it is very important to capture the patient load all along the course of the study so that the # services can be normalized for susceptible patient load
  • Simplest analysis (but requiring lots of assumptions) is to compare two counts (e.g., Poisson distribution)
  • Need to be clear on individual patient/family assessments vs. system assessments
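
To illustrate the "compare two counts" idea with normalization for patient load, here is a minimal sketch of a Poisson rate ratio with a Wald interval on the log scale. The function name and all numbers are hypothetical. The Poisson assumption (independent events, constant rate within each period) is exactly the kind of strong assumption the notes warn about.

```python
from math import exp, sqrt
from statistics import NormalDist


def rate_ratio(events_a, exposure_a, events_b, exposure_b, level=0.95):
    """Ratio of two Poisson rates (period b relative to period a) with a
    Wald interval computed on the log scale. Counts must be positive."""
    if min(events_a, events_b) <= 0 or min(exposure_a, exposure_b) <= 0:
        raise ValueError("counts and exposures must be positive")
    ratio = (events_b / exposure_b) / (events_a / exposure_a)
    se_log = sqrt(1 / events_a + 1 / events_b)
    z = NormalDist().inv_cdf((1 + level) / 2)
    return ratio, ratio * exp(-z * se_log), ratio * exp(z * se_log)


# Hypothetical: uses of language services per patient-day, before vs. after.
ratio, lower, upper = rate_ratio(events_a=40, exposure_a=1000,
                                 events_b=66, exposure_b=1100)
print(f"rate ratio = {ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

The exposure arguments are where the susceptible patient load enters, which is why it has to be recorded throughout the study.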

John Eifler, Urologic Oncology, Tim Shaver, Biochemistry

  • Retrospective case-control study of novel fusion events to correlate with biochemical recurrence of disease
  • Patients 2003-2009 having radical prostatectomy; looking at rising PSA (most happen in first 3y)
  • If biochemical recurrence can be measured on a continuous scale, power will be GREATLY increased
  • ~10% event rate (novel fusion)
  • To calculate sample size, try using software: http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize
    • To do the case-control sample size, you'll need additional information from the pilot data. Might be easier just to do a dichotomous power size and look at the change in sample size as the effect sizes change.
 

2015 Apr 30

Douglas Conway, Jill Pulley, VICTR

Line: 997 to 1128
 

Maxim Terekhov, Center for Human Genetics Research

  • I am working with Dr. Jonathan Wanderer on updating the PACU pain score analysis project and I have a couple of questions regarding the modeling methodology that was used.
  • Update models with more pts/more variables.

2013 Dec 19

Bennett Spetalnick, Tara A. Nielsen, OB/GYN

  • Retrospective study, about 2 year period. 4500 deliveries per year. About 400 vs 400 for each treatment group per year.
  • The primary endpoint is composite adverse outcome (maternal and neonatal).

Dia Beachboard, PMI

  • Electronic images of host cells invaded by virus. Compare wild type virus and mutant virus. One endpoint is the binary phenotype of the vesicle. Another endpoint is vesicle size.
  • 30% aberrant phenotype in wild type vs 60% in mutant virus. 200nm vesicle for wild type vs 280nm for mutant.
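
For the binary phenotype (30% vs. 60% aberrant), a rough sample size can be sketched with the usual normal-approximation formula for two independent proportions. The function name and the defaults (two-sided alpha of 0.05, power of 0.80) are assumptions, not values recorded in the notes. The calculation also assumes independent observations, so vesicles from the same cell or section would call for a larger number.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Independent observations needed per group to compare two
    proportions with a two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


n = n_per_group_two_proportions(0.30, 0.60)
print(f"about {n} independent observations per group")
```

The vesicle-size comparison (200nm vs. 280nm) would additionally need an estimate of the standard deviation of vesicle diameter, which is not given here.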

2013 Nov 21

Dia Beachboard, Department of Pathology, Microbiology, and Immunology and Pediatrics

  • I am trying to determine statistical power and what tests to use for electron microscopy quantification I am doing.

    I have two types of analysis that I will be performing on 3 data sets. The first analysis is a nominal categorical analysis of normal vs. abnormal vesicles within my EM sections. The second analysis will be to determine if there are differences in the diameter of these vesicles (quantitative/continuous).

    For my three data sets, I will be answering different questions. For data set 1, I have 5 samples and want to compare 4 mutant to wild type. For data set 2, I will have 6 samples but I will be comparing two samples at a time. I am testing 3 mutations in two backgrounds and will be determining if there are differences in the vesicle between background but not across mutations. For data set 3, I have wild type and mutant virus in the presence or absence of a drug (4 samples total) and want to compare each sample in the presence or absence of drug then compare whether the mutant has a different response to the drug than wild type.

    My specific questions about statistical power are: 1) how many cell sections do I need to image to get statistical power? 2) how many vesicles do I need to analyze for statistical power?

Allie Greenplate

  • I'm working on a VICTR grant application and need help determining a minimum sample size. The hypothesis of my proposal is that melanoma infiltrating T cells will have a different phenotypic and functional profile compared to healthy T cells. We have primary melanoma tumors from a clinical collaborator that have been disaggregated to single cells. From there I will perform mass cytometry on each sample to determine phenotype and function. Mass cytometry is similar to traditional fluorescent flow cytometry, however the antibodies are coupled to metal isotopes, allowing us to measure 30+ parameters as opposed to 4-8 parameters measured in fluorescent flow. This allows for greater resolution of small subpopulations.

2013 Nov 7

Imani Brown, MPH student

  • REDCap branching logic question. Whether missing field when not applicable will affect statistical analysis.
  • Suggest attend REDCap clinic about how to start new section.

2013 Oct 31

Sharidan Parr, Nephrology Fellow, Department of Nephrology and Hypertension

  • Retrospective study on hospitalized patients.
  • Propose to identify pts hospitalized with AKI and matched hospitalized pts without AKI, measure urine protein at baseline and at least one after baseline
  • Will do propensity score matching on age, ethnicity, diabetes (severity of DM by A1C), BP, medication exposure
  • Some pts do not have protein at baseline, for those pts, change from no to yes is also important
  • Can analyze the whole cohort on the post protein measures using linear mixed-effects model: post protein ~ baseline protein + AKI + covar
  • Can do subgroup analysis on pts with protein at baseline (continuous outcome) and without protein at baseline (binary outcome).
  • If post protein is measured at different times between patients, need to adjust for time length as well

2013 Oct 10

Pratik Pandharipande, Jennifer Guiseffi, Cardiovascular Medicine

  • REDCap database. N=234 total pts. Retrospective study.
  • There are two different surgical procedures.
  • Primary outcome: post-op day 0, 1, or 2 delirium development.
  • Covariates: age, charlson index, pre-op ef, intra-op midazolam
  • Second outcomes: 30 days and 6 months mortality, related to delirium.
  • Will apply for VICTR voucher for biostat support. Estimate ~$4000. Need $1000 cost share.

2013 Oct 03

Michael O’Connor, pediatric pulmonary fellow

  • Inpatient admission for cystic fibrosis exacerbations with increased airway clearance and intravenous antibiotics will result in improvement in serum fatty acid profiles as well as a decrease in pro-inflammatory metabolites

2013 Sep 26

Lynne Caples, Vanderbilt School of Medicine

  • ongoing cardiovascular RCT; want to stop early
  • Stopping rules are usually specified before the trial starts. Use of particular method depends on individual study. (type of disease, type of treatment, question of interest...)
  • Early stop due to efficacy.
  • The important thing is to provide proper operating characteristics

Heather Maune, PGY4 Obstetrics & Gynecology

  • A new policy on Preventing the Primary Cesarean Delivery
  • Interested in some outcomes before and after the implementation of the new policy. Time is a confounder.
  • Power analysis shows that patients collected over four months may be enough.

2013 Sep 19

Alex de Feria

  • Questions about what type of analysis and also about sample size for the database. The project is centered around patients seen at the Vanderbilt Center for Inherited Heart Disease.

Brooke Weaver, VPH

  • Have REDCap data base, want to generate some graphics.

2013 Sep 5

David Francis, ENT

  • Need biostatistics assistance with a surgical study. N=56.
  • Evaluate grade change over time for each patient up to 20 years
  • Grade has 4 levels: invasive, severe, mild/moderate, none.
  • Want to relate the grade change over time with the procedure. The patients started at different states and had different numbers of procedures at different times.
  • Longitudinal data analysis. Goal to publish paper.
  • Suggest apply for voucher in amount of $5000.

Jill Danford, ob/gyn

  • Retrospective study on the procedure of taking out the mesh. Outcome: after surgery pain (improved, unchanged, or worse)
  • N=226. Measurement of pain came from chart by description. About 70% of patients will have improved outcome. N=18 worse, N=44 unchanged.
  • Apply for VICTR voucher, suggest $2000.

Cesar Molina, Ortho

  • Retrospective study on development of minor complication, major complication, or death after procedure (2005-2011)
  • Problem: huge sample size, all risk factors are significant
  • Suggestion: report odds ratio with confidence interval as well as the p-value. Consider forest plot.
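
The point of this suggestion is that, with a very large sample, a p-value can be tiny even when the effect is small, so the estimate and its interval are more informative. A minimal sketch of an odds ratio with a Wald confidence interval from a 2x2 table follows. The function name and the table are hypothetical, and the adjusted odds ratios from the study's regression model would be reported the same way.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Wald CI for a 2x2 table:
    a = exposed with event,   b = exposed without event,
    c = unexposed with event, d = unexposed without event."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_hat = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf((1 + level) / 2)
    return or_hat, exp(log(or_hat) - z * se_log), exp(log(or_hat) + z * se_log)


# Hypothetical large-sample table, for illustration only.
or_hat, lower, upper = odds_ratio_ci(a=1100, b=48900, c=1000, d=49000)
print(f"OR = {or_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

In this made-up table the interval excludes 1 although the odds ratio is close to 1. Plotting such intervals for every risk factor side by side gives the forest plot mentioned in the notes.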

2013 Aug 29

Rachel Thakore, Vanderbilt Orthopaedic Institute Center for Health Policy

  • Compare length of stay and costs for tibia fracture patients who have received flap coverage (N=45) to those that have not (N=106)
  • Original data has been analyzed and manuscript has been written
  • Association between treatment and length of stay adjusting for age, gender, race, and ASA score; also want to look at the difference of cost between two groups
  • Retrospective chart review from 1/1/2010 to 12/31/2011, should consider time for cost analysis
  • The distribution of length of stay is probably skewed, can consider survival analysis

Pampee Young and Fred Oakley, Pathology

  • Look at transfusion reaction comparing peds with adult population, can use chi-square test
  • Only had age for pts with a reaction. There are several different reactions. Want to see the relationship between each reaction and age.
    • Logistic regression of certain reaction on age. Consider nonlinear effect of age.

2013 Aug 22

Heather Kistka, PGY-5, Department of Neurological Surgery

  • a survival analysis investigating the predictive value of pre-op KPS score compared to post op KPS score.
  • KPS score takes the value of 0, 10, 20, 30, ..., 90.
  • Usually pre-op KPS is predictive
  • N = about 140.
  • Question: which one is a better predictor for patients' survival, pre-op KPS or post-op KPS
  • Apply for a VICTR Voucher. $4000

2013 Aug 8

Shaoying Li, Dept. of Pathology, Microbiology and Immunology

  • T0 is time of diagnosis. Event is death. Censored at today. Many patients have very short follow-up time.
  • Would like to examine the association between some risk factors and prognosis (survival).

2013 July 25

Sharmin Basher, Cardiovascular Medicine - Fellow

  • Sample size/power: investigate the effectiveness of supplementary written information to women during cardiovascular disease prevention consulting compared to consulting alone.
  • Sample size/power using PS
  • need a method to calculate the total score

2013 June 13

Lara Hershcovitch, Emergency Med-Housestaff

  • Study of patients with Parkinson disease, patients will have medication or medication with deep brain stimulation

2013 June 6 - Canceled

2013 May 23

Erin Fortenberry, Department of Obstetrics and Gynecology

  • Cohort: all the moms that gave second birth during 2007-2013 and also had c-section before
  • Are interested in the effect of gestational age (37-42 weeks, could be in days) on the composite morbidities for unsuccessful TOLAC
  • Sample size estimation using PS

Shwin Krishna, Peds

  • Need biostat support for analysis only. Suggest applying for $2000 Voucher

2013 May 9

Wissam Abdallah, Ben Shoemaker, Fellow, Division of Cardiovascular Medicine

Correlation of left ventricular fibrosis by cardiac magnetic resonance imaging with success of radio-frequency ablation in atrial fibrillation
  • Check the estimate on the median time
  • Need to use simulation to calculate sample size based on Cox model; might dichotomize the continuous predictor and then use PS
  • For Aim 2, could use the margin of error of the correlation for sample size justification
  • Could apply for $2,000 for biostat support

Olivier Boutaud, Department of Pharmacology

  • Sample size calculation based on t test using PS

2013 April 25

Jenna Faircloth, Pharmacy Resident, VCH

IRB approved research project with two pharmacists and two physicians reviewing antithrombin III supplementation. I have completed data collection and summary but have questions about which statistical tests I should use for data analysis.

Natasha Rushing, OB/GYN Resident

Retrospective chart review focusing on outcomes associated with diagnosis of chorioamnionitis.

2013 April 04

Connie Lewis, Cardiology

  • Hypothesis: More African American males with hypertension have heart failure.
  • What are the risk factors associated with young people having heart failure.
  • n=360 patients in database all with heart failure (not case-control)
  • Descriptive summary table to describe risk factors for heart failure by age (20-30, 30-40,...etc.)
  • This is a pilot study, risk factors for heart failure in young patients have not yet been described in the literature
  • Create awareness of heart failure in a young population
  • About 20 hours of support is recommended

Summer Wirth, Erin Rebele, OB/GYN

  • The Role of Transvaginal Ultrasound in the Diagnosis of Retained Products of Conception.
  • Is radiology report consistent with the pathology findings.
  • Pathology is the gold standard, don't want to perform unneeded procedures
  • Can the radiology report be depended on to prevent unnecessary procedures.
  • 650 total D&C (final number will be the subset of D&C for retained products, about n=200), most of which will have had an ultrasound beforehand
  • Needed for residency presentation in May
  • About 20 hours of support is recommended

2013 March 28

Carla Sevin, Pulmonary Critical Care

  • After critical care, pts have follow up to determine if they are still having problems.
  • Studying if ICU recovery program is beneficial on outcomes (readmission, mortality)
  • Secondary outcomes: health care utilization, physical impairments, cognitive effects, quality of life.
  • Has intervention and control groups, plus follow-up over 6 months
  • Intervention gets a hospital visit.
  • Suggested getting an option for determining mortality.
  • Consider doing a time-to-event.
  • Suggested finding the confounders associated with readmission and mortality
  • Determine risk of the outcome in relation between the two groups in order to calculate sample size.
  • Using PS with a Detectable Alternative, Power = .8, m1 = 12, n = 220/group, accrual = 27mos, Follow-up =6, and 1:1 for the two groups: Results were .725 or 1.421.

2013 March 21

Ruki Odiete, Cardiology

  • Give rats diabetes and then give heart attack
  • There are 5 groups: control for DM, DM, control for MI (SM), MI, DM+MI; have 5-7 rats per group
  • Outcome variable is strain. There are differences in strain among groups. Adjusting for other variables is difficult due to the limited sample size
  • Try to plot and calculate correlation between LARSS (strain) and EDV (volume) for all together, then separate into 5 groups
  • Regress strain on volume and group: strain ~ volume + group
  • Suggestions: 1. plot raw data and check distribution; 2. fit regression model of strain on volume for all together; 3. fit regression model for DM and control only: strain ~ volume + group

Verena Wyvill Brown, Department of Pediatrics

  • Project to understand the patterns of child maltreatment reporting as it relates to cultural and societal variables in Latino, white, and black children in Middle Tennessee.

2013 March 14

Benjamin Shoemaker, Cardiovascular Medicine

  • Outcome: atrial fibrillation burden (continuous) after bariatric surgery. It will be repeatedly measured (up to every month) for 2-3 years.
  • Question1: association between weight loss and AF burden
  • Question2: association between repeated measures of AF burden and weight change over time as well as other comorbidities (6-10).
  • Sample size: 30 patients.
  • AF burden is expected to decrease over time. Comorbidities might also change over time
  • Can power the study based on the first primary aim, and put other as secondary analysis
  • VICTR voucher for biostat support in grant preparation in amount of $2000

Brandon Perry

  • Evaluation of an Early Hemoglobin A1c-Based Screening Strategy for Gestational Diabetes N=1850 with 150 GDM cases
  • Q1: Develop a risk assessment tool for early diagnosis of Gestational Diabetes
  • Q2: Calculate incidence and prevalence of GDM at Vanderbilt
  • Q3: Is there a statistically significant difference between GDM and controls regarding C-section rates, post-op complications, birth trauma, preeclampsia, and weight gain
  • Q4: Is there a statistically significant difference between GDM babies and control babies in terms of gestational age at delivery, baby location, birth weight, presence of shoulder dystocia, feeding issues, and baby bilirubin levels?
  • Q5: Calculate the time to event (% of patients that are being picked up early)?
  • Q6: Is there a difference in cost between GDM and control patients
  • Apply for VICTR voucher and $2000 is suggested

Kaushik Mukherjee, Surgery

  • Apply for VICTR voucher for obtaining data from national data base

2013 Feb 21

Fred Oakley, Pathology

  • Study on comparison of transfusion reactions between adult and pediatric populations
  • Total 108 events for peds and 277 events for adult during 2 years observed at Vanderbilt Medical Center
  • Need to know total number of transfusions (peds and adults) and report incidence rate for each adverse events
  • Since it is not a random sample, statistical test is not needed

J Newton, OB/GYN

  • Retrospective cohort study from two hospitals (academic medical center and community hospitals)
  • Compare exposed (borderline fluid level in pregnant women from ultrasound) with unexposed (normal fluid level) for a number of outcomes
  • There are total 175 exposed and 562 unexposed
  • 8 categorical outcomes and 3 continuous outcomes (age of mom, avg days in hospital, gestational age of baby)

2013 Feb 14

Romina Sosa, Clinical Pharmacology

  • The effect of aspirin on lung cancer is mediated by COX2
  • Treat the lung cancer patients with aspirin for 7 days and measure PGE2 (pilot study)
  • Applying for VICTR funding, requires sample size justification
  • Preliminary data on PGE2 level in the normal patients. 50 normal patients. Mean +/- SD before and after aspirin. 135 +/- 100 vs 75 +/- 60 (raw data is available to calculate within-subject correlation)
  • In lung cancer patients, would expect PGE2 level reduce by 40% with a baseline level of 135 or higher.
  • $500 biostat support
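
The note about within-subject correlation matters because the standard deviation of the before/after change, which drives a paired sample size, depends on it. Below is a minimal sketch using the SDs from the notes (100 and 60) and a 40% reduction from a baseline of 135. The correlation values, the function names, and the use of a normal approximation to the paired t-test are assumptions. The real correlation would come from the raw preliminary data.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sd_of_difference(sd_before, sd_after, rho):
    """SD of the within-subject change, given both SDs and their correlation."""
    return sqrt(sd_before ** 2 + sd_after ** 2 - 2 * rho * sd_before * sd_after)


def n_paired(delta, sd_diff, alpha=0.05, power=0.80):
    """Subjects needed to detect a mean paired change of `delta`
    (normal approximation to the paired t-test)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil((z * sd_diff / delta) ** 2)


# SDs of 100 and 60 come from the notes; the correlations are assumed values.
delta = 0.40 * 135  # a 40% reduction from a baseline of 135
for rho in (0.3, 0.5, 0.7):
    sd_d = sd_of_difference(100, 60, rho)
    print(f"rho = {rho}: SD of change = {sd_d:.1f}, n = {n_paired(delta, sd_d)}")
```

A higher within-subject correlation shrinks the SD of the change and therefore the required number of patients. Because the normal approximation is slightly optimistic for small samples, a t-based calculation (for example in PS) would give a somewhat larger n.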

David Johnson, Aaron Fritts, Radiology

  • The goal of the study is to optimize the use(?) of port catheters
  • Data set: 96 patients, during the past two years, who had a port catheter check.
  • The response variable: whether the catheter needs intervention (yes/no), about half ascertained; the presence of a problem with the catheter (multiple situations).
  • The independent variables: tip position (yes/no), left/right side, vein (2), operator (2 level plus few NAs)
  • Could apply for a Voucher of $4000. The department copay $1000.

2013 Feb 7

Bennett Landman, EE, Pathology

  • Analysis of survey results. There are three surveys. Most are descriptive statistics and comparisons between two groups.
  • Need expert on survey analysis to oversee the method used. PI can do the actual data analysis.
  • Suggest involving a biostatistician in the design of the survey
  • $500 VICTR voucher is appropriate

J Newton, Ob/Gyn

  • Analysis of two cohort studies, possibly combined or birth outcomes. Investigator will be teamed with Li Wang for some funding arrangement.

2013 Jan 17

Rachel Wolf

Third year medical student applying for funding for a clinical research project. Currently working on proposal and need some statistics guidance

2013 Jan 3

Kathy Niu

Outcome: Catatonia subtypes are categorical. Multiple visits over time. DSM IV or V to establish criteria over 2009-2012. N=610 with repeats ~279 patients

--Assuming two levels of Catatonia - stuporous vs. excited; can also be ordinal.

Predictor or etiology variable interest is whether it is medical or psychiatric with many different ways to assess.

Additional variables: Demographic variables medical history and drugs that they are taking.

Time effect may be associated with treatment variables. The medical era or practice is going to affect the outcome measurement or assessment.

Need to take into consideration the missingness of covariates.

Analysis considerations for multivariable regression. Correlated data analysis. Complex analysis example: Mixed Effect Logistic Regression with patient id as random effect.

May want to pick the first occurrence of the outcome as there are only very few with multiple outcomes and in that case the logistic regression analysis will be appropriate.

Additional notes: CTSA vouchers for biostatistical input. Check the "Excel spreadsheet from heaven vs. hell" on the biostatistics website.

2012 Dec 20

Paul Morphy

  • Study background: Special education, total of 38 subjects. Determine if two interventions are equivalent.
  • To test equivalence, two one-sided tests, or operationally 90% confidence interval (CI) is used to see if it would lie within the limit a priori decided (Reference: Schuirmann (Biometrics, 37: 617, 1981) proposed in bioequivalence testing).
  • The limit should be decided a priori: convention in education, a fifth of standard deviation considered for difference boundary.
  • For this study, the confidence interval approach is a better approach since it may be hard to set the limit that can be acceptable for everyone. Also calculate 95% CI.
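
The confidence interval version of the two one-sided tests can be sketched as follows: compute a 90% interval for the difference in means and check whether it lies entirely inside the pre-specified limits. The function name, the scores, and the margin are hypothetical. The sketch uses a normal quantile, so with only 38 subjects a t-based interval would be slightly wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def equivalence_by_ci(group_a, group_b, margin, level=0.90):
    """Return the CI for the difference in means and whether it lies
    entirely inside (-margin, +margin). Large-sample approximation."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(stdev(group_a) ** 2 / len(group_a)
              + stdev(group_b) ** 2 / len(group_b))
    z = NormalDist().inv_cdf((1 + level) / 2)
    lower, upper = diff - z * se, diff + z * se
    return lower, upper, (-margin < lower and upper < margin)


# Hypothetical post-test scores, for illustration only.
a = [52, 48, 50, 51, 49, 50, 53, 47, 50, 50]
b = [51, 49, 50, 50, 48, 52, 50, 49, 51, 50]
lower, upper, equivalent = equivalence_by_ci(a, b, margin=2.0)
print(f"90% CI ({lower:.2f}, {upper:.2f}); equivalent within +/-2: {equivalent}")
```

Reporting the interval itself, as the notes recommend, lets readers apply their own equivalence limit, such as the one-fifth of a standard deviation convention mentioned above.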

2012 Dec 13

Mike Baker, Cardiology; Stewart Benton, Medicine

  • Compare traditional treatment (put the valve in through operation) and a new procedure (via leg), primary outcome is measured bleeding parameters
  • Every patient will undergo the new procedure, and their outcomes will be compared with historical findings
  • If the outcome is binary, 10% will have normal values before procedure and 70% will have normal values after, use McNemar's test
  • If the outcome is continuous, use paired t-test to compare values before and after. Need to get estimate of the distribution and variation of the outcome from a pilot study (need about 6 or 7 subjects)
  • Better use the raw readings. Need to find the standard deviation of the pre post difference, also try to search literature for the distribution
  • Another approach is to sequentially estimate the margin of error and stop when it reaches some threshold that is defined in advance
  • Can apply for a VICTR voucher and estimate $2000 is suggested

2012 Dec 6

Edem Binka

  • Three groups of patients: specific renal support therapy, traditional renal support therapy, without any renal support therapy
  • Outcome: sodium and potassium level
  • Also want to look at association between age, weight, gender and outcome, should not be any other disease associated
  • Suggest descriptive statistics due to limited sample size, nonparametric test to compare between groups
  • If have ~50 patients, linear model is possible
  • Will apply for VICTR voucher support and $3000 is suggested

2012 Nov 29

Robyn Tamboli, Dept. of Surgery

  • Advised the investigator that sample size re-estimation needs careful thinking and planning; References on this topic were provided in clinic follow-up; A VICTR study design voucher was recommended.
  • Email: I need some guidance on a power calculation. I was awarded VICTR funds to look at how ghrelin affects insulin sensitivity before and after RYGB (VR178.3). I have complete pre and post data on four subjects. Would you advise that I re-determine my sample size based on the data from these subjects? Also, I would like to submit an amendment to add a lean comparator group and need to determine sample size. There is one paper on the effect of ghrelin infusion in lean subjects; however the ghrelin dose and methods for measuring insulin sensitivity are different from my approach. How do I determine sample size with the less than ideal available data?

2012 Nov 15

J Michael Newton, OB/GYN

  • pilot project to evaluate the association between BMI and pregnancy outcomes in obese women;
  • recommend a descriptive study to collect preliminary data;
  • n=360 with over-sample in extreme BMI range;
  • recommend a VICTR voucher of $3500.

William Dresen, medicine resident

  • Investigator email note:
I am a third year medicine resident working with Ben Shoemaker, one of the cardiology fellows, on a retrospective multi-variate model analyzing a-fib ablations
  • Retrospective review of sample size n=1800 with about 600 events, a-fib ablations. There is a list of about 20 variables;
  • recommend a tree-based model to understand the data and possibly logistic models to analyze the data;
  • Investigators requested a VICTR voucher of $2000.

2012 Nov 8

Peter R. Martin, Psychiatry and Pharmacology

  • Investigator email note:
I would like to request help with determining the sample size of a BioVu study to replicate preliminary genetic data we have previously obtained suggesting a protective human mu opioid receptor variant with respect to addiction. The only day I and my genetics collaborator Al George are available is Thurs Nov 8th. Please let us know how to proceed as this is my first time attending the Biostatistics Clinic.
  • Have identified two groups of subjects using the Synthetic Derivative (opioid and non-opioid dependent)
  • Look at association between addiction and phenotype
  • Try to match case and control by some important factors
  • Comparing 1.5% to 7%, need 169 cases and 169*2 controls
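
One way to arrive at a figure like this is the standard two-proportion sample size formula with unequal allocation. The sketch below rests on assumptions that are not stated in the note: a two-sided alpha of 0.05, 80% power, no continuity correction, and the variant frequency being 1.5% in cases and 7% in controls (consistent with a protective variant). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(p_case, p_control, controls_per_case=1, alpha=0.05, power=0.80):
    """Cases required to compare two proportions with m controls per case."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    m = controls_per_case
    # Pooled proportion under the null, weighted by the allocation ratio.
    p_bar = (p_case + m * p_control) / (1 + m)
    term_null = z_alpha * math.sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(
        p_case * (1 - p_case) + p_control * (1 - p_control) / m
    )
    return math.ceil(((term_null + term_alt) / (p_case - p_control)) ** 2)


n_cases = cases_needed(0.015, 0.07, controls_per_case=2)
print(n_cases, 2 * n_cases)
```

Under these assumed settings the formula gives 169 cases and 338 controls, matching the note. Matching on important factors, as suggested above, would call for a matched analysis and could change the required numbers.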

Romina Sosa

  • Investigator email note:
Question is on sample numbers I am writing a grant and need to figure out how many patients I will need to analyze to look at a biomarker. Briefly, I am looking at a marker on platelets that may be able to give good predictive information of platelet activation in disease processes. Our lab has preliminary data in a patient population with metabolic syndrome, that this is the case. I am trying to look at the same marker in a population with hematologic disease. How do I come up with numbers to write on the grant? Does the preliminary data we have in metabolic syndrome help me predict the sample numbers I will need?
  • Pts with metabolic syndrome, mean 9.6 (SD 3.7); normal pts, mean 3.7 (SD 1.1)
  • Plan study on pts with hematologic disease, need to calculate sample size.
  • If can get measures before and after treatment, then use paired test which needs less pts
  • Standardise treatment period (take second measurement after 3 months treatment)

2012 Nov 1

David Hak Kim, Cardiovascular Medicine Division

  • Investigator email note:
I would like to attend the biostatistics clinic regarding two related studies.

First, I have a database of 125 pts presenting with acute coronary syndrome under the age of 35 yrs old, hypothesizing that an increased BMI is related to worse atherosclerotic disease and outcomes. I would like to perform descriptive statistics of the cohort as well as stratify outcomes based on BMI. I have de-identified the data and it is attached.

The second study will be utilizing the synthetic derivative, identifying all patients under the age of 45 who have suffered an MI. The goal of the study is to stratify the group into different age groups (ie <20 yrs, 20-30, 30-40, >40) and note differences in the prevalence of traditional risk factors as well as BMI. Our hypothesis is that early presentation of MI is due to risk factors independent of traditional risk factors.

Would these be appropriate to present to the biostats clinic, and if so, Thursday?
  • Look at young patients with Acute Coronary Syndrome (ACS)
  • Study 1: retrospective cohort study, age between 18 and 35, ACS, year 2000 to present, N=124
    • A paper published based on 10 clinical trials about ACS (age < 45)
    • Want to compare between this study and the results published - not appropriate
    • Association between obesity and other risk factors in ACS
    • Suggest include all the pts with and without ACS and design as a case-control study
  • Study 2: use synthetic derivative to replicate the results published
  • Qualify for instant biostat voucher

2012 Oct 25

John Schneider, Department of Otolaryngology

  • Interested in conducting a survey looking at patient expectations regarding potential therapies for chronic sinusitis.
  • Now in the stage of developing the survey but need to conduct a power analysis to determine the sample size.
  • Patients have two choices for treatment: medical management or surgical management
  • Patients available for the study: already had multiple medications; already had multiple surgeries; or newly diagnosed patients without any treatment
  • Interested in identifying patients' characteristics regarding their choices of medical or surgical
  • Suggest do a pilot study first and the analysis will be mainly descriptive, then design a hypothesis generating study

Marissa Blanco, Kathryn Carlson, Department of Pediatrics

  • Study of developing appropriate discharge instructions for non-English speaking patients
  • Two arms: control and Spanish speakers
  • Will compare proportions for the two groups after intervention (control will remain same as 10%, Spanish speaker group will increase from 10% to 35%)
  • Can calculate power comparing two binomial distributions
  • Suggest work with Ben and Kelly through PEDS collaboration
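
The power calculation for comparing the two proportions (10% vs. 35%) can be sketched as below. This is a normal-approximation sketch with equal arm sizes and a two-sided alpha of 0.05, both of which are assumptions; the arm sizes tried are arbitrary and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two independent proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = math.sqrt(2 * p_bar * (1 - p_bar))        # per-subject SE under H0
    se_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))   # per-subject SE under H1
    shift = abs(p1 - p2) * math.sqrt(n_per_arm)
    return z.cdf((shift - z_alpha * se_null) / se_alt)


# Power at a few candidate arm sizes for 10% (control) vs. 35% (Spanish speakers).
for n in (30, 40, 50):
    print(n, round(power_two_proportions(0.10, 0.35, n), 3))
```

Scanning arm sizes this way shows where power first crosses the desired level; an exact binomial calculation may differ slightly at these small sizes.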

2012 Oct 18

No clinical investigators.

  • Discussed Prof. Pena's seminar and the paper on the FDR-controlling procedure by Benjamini and Hochberg (JRSSB, 1995) with a group of students.
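
For readers unfamiliar with the procedure, the step-up rule from that paper can be sketched in a few lines. The p-values below are made up for illustration and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m (step-up rule).
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values for illustration.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))
```

Note that every hypothesis ranked at or below the cutoff is rejected, even if its own p-value exceeds its own threshold; that is what distinguishes the step-up rule from a simple per-test comparison.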

2012 Oct 11

Vincent Agboto, Meharry Medical College CTSA

  • Vincent could not attend clinic on 10/11. Rescheduled for another date.

  • Investigator email note:
I am sending you this message because I want to pay a visit to the clinic this week with an investigator that I am helping with an R01 on vaccines for newborns in Bangladesh. We had a studio a few days ago and we are working on revising the draft based on the suggested changes at the studio. I revised the statistical concepts and I would like to stop by just to discuss them with you.
Please let me know if we can stop by on Thursday to discuss the statistical help that I provided to her so we can get her moving.

2012 Oct 4

Jeremy D. Moretz, PGY1 Pharmacy Practice Resident

  • Attended clinic on 10/4:
The research question was discussed, suggesting the possibility of treating bleeding as a continuous endpoint instead of binary; analyzing the association between blood thinning index and bleeding outcome might also be a possible research question; propensity score approach is needed in the logistic regression or GLM.

  • Investigator email note:
Briefly, we will be attempting to retrospectively analyze the bleeding risk associated with either argatroban or IV UFH drug use for patients listed for cardiac transplantation.
Project Title: Use of DTI Anticoagulation for HIT Risk Reduction during Heart Transplantation Listing and Evaluation
Questions to be Answered: Is there a difference in bleeding events among inpatients awaiting cardiac transplantation who are anticoagulated with argatroban as compared to intravenous UFH?
Expected Outcomes of the Study: We hypothesize that patients anticoagulated with argatroban will experience more bleeding events than those anticoagulated with IV UFH. The identification of bleeding risks and potential reduction of risk will enlighten current practice of anticoagulation and contribute to the understanding of the practice of inpatient anticoagulation in patients awaiting cardiac transplantation.

2012 Sep 27

Rajshri Mainthia, Cancer Center; Alexander Parikh, Surgery

My name is Rajshri Mainthia. Dr. Parikh (PI) and I are applying for a VICTR grant (voucher) to be used for biostats support for a project we are working on. In order to complete our application, we have been told that we should attend a biostats clinic to get an idea of specifically what type and how much biostats support we need.

If possible, we would like to come to the biostats clinic on Thursday Sept 27th.

Attached is a description of the project, as well as the data and results we have thus far. Also attached is the database as a Stata file.

The analysis we would be requesting help with includes:
1) Looking over our univariate analysis/demographics data (Tables 1-6)

2) Multivariate regression analysis.
Outcomes:
-Radiotherapy use in Stage 2A and higher rectal cancer (neoadjuvant, adjuvant, neither or both)
-Chemotherapy use in Stage 3A or higher (all sites)
-Sphincter preservation (LAR/coloanal vs APR, extent, etc) for rectal cancers only
-Overall survival

Primary exposure variable: Insurance type (4 types)

Covariates: age, gender, race, rural, metro, primary site, tumor grade, tumor stage, margins, #lymph nodes examined, time to first treatment, tumor site, and type of resection

  • Attended clinic on 9/27.
  • Study the influence of health insurance in Tennessee on the treatment of colorectal cancer
  • There were 4 insurance types: private, government, TN care/medicare, and uninsured
  • Primary hypothesis: patients with certain type of cancer at certain stage should be receiving certain treatment (like chemotherapy) regardless of what insurance he/she has
  • Need help with going through the univariate analysis, which has been done, and the multivariable model
  • Possibly apply for a VICTR voucher. Suggested $6000

Shanna Arnold, PUI

  • Study looking at whether biomarker can predict recurrence in bladder cancer patients within two years
  • Will have N=100 patients per year. About 50% will have recurrence within 2 years.
  • Asked about what study design can be used if want to stop the study earlier when observed a clinically meaningful hazard ratio
  • Can use a Bayesian design, but it requires a lot of work up front
  • A flexible frequentist method is also possible, as long as the stopping rule is clearly defined in advance (for example, after observing 50 events, we will stop and look at the data)

2012 Sep 20

Eric Millica, Dermatology

I am a dermatology resident looking to put together a clinical research project on diagnostic drift in the grading of atypia by dermatopathologists.  I am a little lost on the best way to set up the analysis looking at three classes of atypia and how they change over 3+ time periods for each pathologist (something along the lines of a kappa statistic, but with multiple time periods).

Some of the faculty members here just told me about your clinics and I see that you have one tomorrow.  I know it is less than two days notice, but I was wondering if it would be possible for me to come to the clinic tomorrow.  If not, do you know when your next clinical research clinic will be held?

Thanks,

 
Eric Millica

  • Has attended the clinic and may apply for a voucher for assistance with the sample size calculation and the analysis. $2000 is appropriate.
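
As a starting point for the kappa idea raised in the email, agreement between one pathologist's grades in two time periods could be summarized with Cohen's kappa, repeated for each pair of periods. This is only a sketch of the basic statistic, not a method for the multi-period design; the grades shown are hypothetical and so is the function name.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two sets of ratings of the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Agreement expected by chance from the marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical atypia grades for the same 8 slides in two periods.
period_1 = ["mild", "mild", "moderate", "severe", "mild", "moderate", "severe", "mild"]
period_2 = ["mild", "moderate", "moderate", "severe", "mild", "severe", "severe", "mild"]
print(round(cohen_kappa(period_1, period_2), 3))
```

Because the three classes are ordered, a weighted kappa may be a better fit; that choice was not discussed in the note.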

2012 Sep 13

Kendell Sowards, Surgery Trauma

  • Study of association between cpk and the outcome N=200 (renal failure)
  • Many patients had missing cpk values which are probably not random (need to justify)
  • cpk above 210 vs. below 210; dichotomization loses power and is not suggested
  • Try logistic regression, use original continuous cpk value and adjust for other factors
  • Some outcome only has 11 events. Is it possible to define the outcome as continuous?
  • Suggest apply for a design voucher $2000

Thais Plama, Urology

  • Study of urinary incontinence in relation to age, BMI, number of children, type of delivery
  • N=~700. Urinary incontinence as outcome, use logistic regression, age, BMI continuous
  • Score as outcome, use linear regression

2012 Aug 23

Claudia Ramirez, School of Medicine

  • Controlled clinical trial on bp-control drug.
  • 45 subjects in each arm. Within each arm, 3x3 crossover design.
  • Endpoints: MAP, plasma NE and other five secondary endpoints
  • substudy: 7 subjects in each arm
  • Applying for Voucher, estimate $7500

Phil Lammers, School of Medicine

  • Pilot study. The effect of Aspirin on PGE-M production derived from cox-2 activity in smokers
  • No previous information on effect size and variability available
  • Plan to enroll 20 male smokers at first, and then probably enroll more after effect size obtained
  • Applying for Voucher, estimate $6000 for sample size justification, study design, data analysis, manuscript preparation
  • PI is a Cancer Center member and can use collaboration instead of VICTR?

Heather Kistka, Neurosurgery

  • Retrospective study. The effect of anti-depression on the progression-free survival and overall survival
  • Large brain tumor dataset, 141 subjects dead
  • Covariates: age, diabetes, smoking, chemotherapy, gender
  • May want to also include patients who are still alive. Should have a representative sample of the population. Don't subset based on the endpoint of interest.

2012 Aug 16

Romina Sosa, Clinical Pharmacology

  • Question about power calculation
  • Patient with metabolic syndrome, two groups (intervention and placebo)
  • Outcome: Lysyk-MDA-crosslink
  • Mean (SE) for the two groups: 3.7 (0.44) N=6 vs. 9.58 (1.15) N=10
  • Use SD and the difference to calculate the power with PS software
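
Since the summary statistics are reported as standard errors, they need converting to standard deviations before a power calculation. The sketch below does the conversion (SD = SE × √N) and then applies a normal-approximation sample size formula for two independent means; the alpha of 0.05 and power of 0.80 are assumptions, the function names are hypothetical, and PS software uses a t-based calculation that would give a somewhat larger answer at sample sizes this small.

```python
import math
from statistics import NormalDist


def sd_from_se(se, n):
    """Recover the standard deviation from a standard error of the mean."""
    return se * math.sqrt(n)


def n_per_group(sd1, sd2, delta, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two means."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(z_total ** 2 * (sd1 ** 2 + sd2 ** 2) / delta ** 2)


sd_a = sd_from_se(0.44, 6)    # group with mean 3.7
sd_b = sd_from_se(1.15, 10)   # group with mean 9.58
print(round(sd_a, 2), round(sd_b, 2))
print(n_per_group(sd_a, sd_b, delta=9.58 - 3.7))
```

The very small result reflects how large the observed difference is relative to the variability; a more conservative target difference would be a reasonable planning choice.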

S.Nicole Chadha, Allergy Immunology

  • Retrospective chart review of 40 patients; Need descriptive statistics
  • Will apply VICTR voucher, estimate of ~20 hours ($2000)

2012 Aug 9

Victor Nwazue, Clinical Pharmacology

  • Sample size for a study of long term outcome
  • POTS patients who came to the clinic ~10 years ago and will bring them back (around 16 patients if all can come)
  • Pilot study with fixed sample size (N=16 maximum)
  • Outcome is continuous, have before and after measures, use Wilcoxon signed rank test (use nQuery to calculate power based on nonparametric test); or to use the precision approach (report margin of error)
  • Have to think about other confounders, like life styles

2012 July 5

VICTR statistician: Li Wang

Ryan Hollenbeck, Cardiology; Jeremy Pollock, internal medicine

  • Find optimal blood pressure in survivors of cardiac arrest treated with hypothermia
  • First 24 hours blood pressure. Survivors will stay in hospital ~ 10-12 days.
  • Outcome: in-hospital mortality. Secondary outcome: neurological function.
  • Usually blood pressure is measured every hour, but will be more frequent for sicker patients.
  • Dose of medication (to adjust the blood pressure) should be considered since it might relate to the outcome
  • Possible approach: AUC for time spent above 80.
  • Plot of blood pressure over time for each patient
  • N=200, 120 died, can have around ~5 covariates
  • Apply for VICTR voucher, suggested $6000
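
The figure of about 5 covariates is consistent with a common rule of thumb for binary outcomes: allow one candidate parameter per 10 to 20 subjects in the less frequent outcome class. The sketch below assumes 15 per parameter; that the clinic used exactly this rule is an assumption, and the function name is hypothetical.

```python
def max_candidate_parameters(n_total, n_events, per_parameter=15):
    """Rule-of-thumb limit on model parameters for a binary outcome."""
    # The limiting sample size is the smaller of events and non-events.
    limiting = min(n_events, n_total - n_events)
    return limiting // per_parameter


# N=200 with 120 deaths leaves 80 survivors as the limiting class.
print(max_candidate_parameters(200, 120))
```

Continuous predictors modeled with nonlinear terms use more than one parameter each, so the number of distinct covariates may be smaller than this count.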

Amlan Bhattacharjee, Anesthesiology

  • 60 cases, ~200 matched controls (matched on age, CPT code, surgeon)
  • Prediction of violations

2012 June 28

VICTR statistician: Chang Yu, Li Wang

James Johnson, Tom Talbot (Department of medicine)

  • Interrupted time series analysis
  • Total 3000-4000 observational opportunities; had a convenience sample every month and watched how many times they washed hands
  • Had data from 2006 to 2012; enhanced program started in 2009; percent of hand hygiene adherence over time
  • Adherence is defined as # correctly performed / # total opportunities
  • Sampling method: same observer goes to the same clinic every month
  • Primary question: compare adherence rate before and after intervention
  • Contact Frank; apply for VICTR voucher requesting to work with Amy Graves, estimated ~40 hours ($4000)

2012 June 06

Diego Hijano, Pediatrics

  • Study about breast milk vs. formula (N=700) within 28 days of birth
  • Outcome: disease yes/no (about 10% prevalence rate); secondary: severity
  • Collect information: mother, birth weight,
  • Plan to apply for VICTR voucher for data cleaning/checking, analysis, manuscript preparation, suggest $6000

2012 May 31

VICTR statistician:

Drs. Courtney Horton and Candice Mcnaughton

  • Retrospective chart review of pediatric trauma patients, Level I and Level II, starting 2008; N=4000 but will limit to those that have an INR value, to have data for a future study.
  • Using INR values as predictor of pediatric trauma patients outcome
  • Examine whether INR has added predictive ability to routinely collected clinical data
  • Outcomes considered: transfusion (Yes, No), among those transfused amount transfused, length of stay, in hospital mortality
  • Time to event analysis if looking at time to mortality and account for censoring.
  • Use INR as continuous variable - make recommendations to physicians using all available data
  • Other factors that are routinely collected, for adjustment: age, gender, race, insurance status, mechanism of injury, vital signs, mechanism of arrival, ESI, injury severity index. (Other measure: GCS, not as helpful among children)
  • Suggest apply for a VICTR voucher for pulling data. Peds. collaboration for analysis (Dr. Ben Saville)

2012 May 10

VICTR statistician: Chang Yu, Li Wang

Peter Bream, Everett Gu, Radiology

  • Retrospective review of the use of stent graft. ~100 grafts. If there is problem, will put stent
  • Follow total 20-30 stents
  • Interest in primary and secondary patency rate (the survival time for the graft)
  • The patients have more than one intervention. Record the data in long format: each patient will have multiple rows, one for each observation.
    • column names: id, age, gender, physician, visit date, outcome
  • Suggest apply for a VICTR voucher in amount of $6000. First $2000 is free, then 50% cost share.
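
A long-format file with the suggested columns might look like the sketch below. The rows, patient IDs, and outcome labels are hypothetical; only the column names come from the note (with `visit date` written as `visit_date`).

```python
import csv
import io

FIELDS = ["id", "age", "gender", "physician", "visit_date", "outcome"]

# Hypothetical records: one row per visit, so a patient can appear several times.
rows = [
    {"id": "P01", "age": 61, "gender": "F", "physician": "A", "visit_date": "2010-03-02", "outcome": "patent"},
    {"id": "P01", "age": 61, "gender": "F", "physician": "A", "visit_date": "2010-09-14", "outcome": "stenosis"},
    {"id": "P01", "age": 62, "gender": "F", "physician": "B", "visit_date": "2011-02-20", "outcome": "patent"},
    {"id": "P02", "age": 54, "gender": "M", "physician": "A", "visit_date": "2010-05-11", "outcome": "patent"},
]

# Write to an in-memory CSV; a real file would use open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
long_csv = buffer.getvalue()

# Reading it back: count visits per patient.
visits_per_patient = {}
for row in csv.DictReader(io.StringIO(long_csv)):
    visits_per_patient[row["id"]] = visits_per_patient.get(row["id"], 0) + 1
print(visits_per_patient)
```

Keeping one row per observation makes it straightforward to derive time from graft placement to each intervention, which is what patency analyses need.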

2012 Apr 19

Jennie Esbenshade, Peds, Hospital Med, ID, Adam Esbenshade MSCI

  • Consultants: Yuwei Zhu, Frank Harrell
  • Detecting flu virus in health care workers in pt care area 2009
  • If sx, asked to come back and get nasal swab
  • Serology also; no flu detected
  • Multiplex PCR to detect other viruses
  • Questionnaire data to capture symptoms
  • Random 200 regular non-sick swabs selected as controls
  • 2394 swabs, 42+ (35 sx, 7 asymptomatic)
  • 119 "sick" swabs
  • Also interested in variation with type of health care personnel
  • +/- specimen vs. cough, runny nose, aches, fatigue, fever >24h, sex, age, child @home, MD/RN
  • Did not use P-values in deciding which variables to put in multivariable model
  • Second model predicting the probability of + and sick from non-sx variables sex, age, MD/RN
  • Suggest variable clustering and redundancy analysis
  • Estimate standard $2000 biostat voucher will be adequate

Margot Lazow, working with Dr Kim in ophthalmology

  • Uveitis - autoimmune - immunosuppressants for a year - topical and systemic
  • Taken off treatment, look at recurrences of inflammation compared with those who did not have recurrence
  • What are the risk factors?
  • Age, sex, type of med, underlying med condition
  • Watch for treatment by indication bias in original selection of immunosuppressants; capture original reasons
  • Variable clustering may be helpful in dealing with many symptoms/risk factors
  • Ask at least 3 clinical experts to list factors thought to be related to recurrence and those thought to be related to drug prescribing
  • Duration of disease, severity of disease

2012 Apr 12

VICTR Biostatistician attendees:

Meredith Pugh (Pulmonary and Critical Care)

  • 10 year retrospective cohort study for a rare disease called PAH which is related to pulmonary hypertension
    • Have approximately 246 (min age 65 yr) patients meet inclusion criteria and of those, 37 had PAH
  • Looking for the association between PAH and following potential risk factors
    • age
    • gender
    • tissue disease
    • other clinical measurements
  • Thoughts on proposed analysis:
    • Logistic regression - report adjusted odds ratio and 95%CI for selected 3-4 important risk factors, may have model over-fitting problem if including too many covariates, ORs along with CIs can be displayed by figure
    • Descriptive analysis
  • Suggested possibly applying for a VICTR Biostat Resource Request
  • Because some analyses were done, recommend requesting $2,000.

2012 Mar 29

VICTR Biostatistician attendees: Chang Yu

John Grave, Peter ?? (Health Service Research)

  • Finite mixture models and methods to differentiate primary care provider from speciality visits based on medical records.

2012 Mar 8

VICTR Biostatistician attendees: Chang Yu, Li Wang, Hui Nian

Chris Anderson (Urology)

  • Using SEER data (national cancer registry) to look at bladder cancer patients
    • Have approximately 300 meet inclusion criteria
  • Looking at the management of these patients and their survival
    • Have good info on the patients' follow-up
      • Approx half died
    • Some received chemo after surgery --- about 70 patients.
    • Because treatment was not randomized, considering a propensity score analysis
    • Wanted to know if this is correct approach or if some other analysis is better
  • Thoughts on proposed analysis:
    • Logistic regression - what variables are associated with getting more chemo post-surgery --> where he wants to use the Propensity Score analysis
    • Cox regression - outcome is death and primary predictor is whether patient received chemo after surgery; use propensity score outcome as covariate

2012 Feb 23

VICTR Biostatistician attendees: Chang Yu

Nursing project on evidence-based practice survey and electronic medical record review.

Meta-analysis of respiratory infection in developed and developing countries.

2012 Feb 16

VICTR Biostatistician attendees: Chang Yu, Li Wang

Josh Smith, DBMI

  • Study drug adverse effects or indication
  • Have table listing findings for all the drugs, 164,000 pairs, want to know method of taking sample for the reviewers to review
  • Need to standardize the procedure, from the 3 sources to evaluate the drug symptoms as: correctly identified as AE, indications, or undefined
  • Can take the trained reviewers as the gold standard and compare our tool to the reviewers
  • website "sider" for side effects, can it be used as the gold standard?
    • compare the tool to "sider", pay attention to those which don't agree and let reviewers review them
    • also take sample from those not in "sider"

Diane Andens, CRC

  • Look at the correlation between the score from the questionnaire and the urine residual in the bladder
  • 25 subjects is a good number for a pilot study
  • Primarily descriptive: describe the distribution of the scores and compare to ultrasound results

2012 Jan 19

VICTR Biostat attendees: Li Wang, Chang Yu

Carrie Geisberg (Cardiology)

  • Has death proportions from cardiology patients from two different groups across time
    • Do not have raw data, have only aggregate --- eg, proportion of patients in each group who were dead at 1 year, 3 years, 5 years.
  • Issue with data: not necessarily same follow-up on all patients; data is broken down into time frames when patients received their surgery
  • Would like to know if proportion of dead patients at each of the time points is significantly different between the two groups.
  • Discussed trying to get a better "cohort" of patients --- ask for those patients who had surgery during a certain time frame and are followed for a specific period of time; then tally the proportion dead at 1, 3, and 5 years.
    • Also discussed trying to get "time to death/follow-up" (in days) for each patient, instead of aggregate level data.
  • Suggested possibly applying for a VICTR Biostat Resource Request
    • Because eventual goal is publication, recommend requesting $4,000.
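
The reason per-patient follow-up times are preferable to aggregate proportions is that they allow unequal follow-up to be handled through censoring. A minimal Kaplan-Meier sketch is shown below; the follow-up times and death indicators are hypothetical, and a real analysis would use an established survival package and compare groups with a log-rank test or Cox model.

```python
def kaplan_meier(times, died):
    """Return [(time, survival probability)] at each distinct death time."""
    data = sorted(zip(times, died))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


# Hypothetical follow-up in days; died=0 means alive at last follow-up (censored).
days = [200, 365, 400, 730, 900, 1100, 1500, 1825]
died = [1, 0, 1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(days, died):
    print(t, round(s, 3))
```

Survival at 1, 3, and 5 years can then be read off the curve for each group without requiring every patient to have the same length of follow-up.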

Beatrice Stefanescu (Neonatology/Peds)

  • Randomized control trial regarding ventilator associated pneumonia (VAP)
    • Want to look at time to off of ventilator --- may or may not have had VAP while on the ventilator
    • Recommend applying for a VICTR Voucher --- $2,000 should cover it
  • Have 2nd study looking at neurological impairment of babies (at 18 months) put on one of two different breathing machines

2011 Dec 8

VICTR Biostat attendees: Li Wang, Chang Yu

Rubin Baskir (Cardiology)

  • Brought in data in electronic form; was also able to get a raw death rate for each county.
  • Also have raw data from California
  • Generated a two-way table of Lithium Category vs Death Rate Category as well as Chi-Square test
  • Also generated a sunflower plot as discussed last week
  • Discussed performing a Poisson Regression to estimate the effect Lithium Category using the raw Death Rate as the outcome.
  • Suggested moving forward with the VICTR request for $2,000 Voucher in order to get "pretty graphs" and Poisson Regression analysis.

Dr. Zhaoliang Li (Cell Biology)

  • Has submitted a manuscript and needs some help on how to address the reviewers' comments.
  • Reviewers asked for some "additional" statistical analysis for a graph
    • Graph represents expression of over time
    • Graph has three lines depicting three different cell lines --- normal expression, under expression, and over expression
    • Each line represents 3 samples (ie, a mean +/- SD is shown each time point on each line)
    • Chang's suggestions:
      • Calculate slope for each line between time 0 and time 12hrs; report slopes only in response to reviewers.
      • Perform non-parametric Wilcoxon rank-sum (aka, Mann-Whitney U test) test to compare (1) over expression to normal expression and (2) under expression to normal expression at time point 12 hrs only.
        • So, will report two p-values.
      • Make sure you state limitations of data (ie, only 3 samples per cell line).
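
To make the sample size limitation concrete: with 3 samples per cell line there are only 20 ways to split six values into two groups of three, so an exact two-sided rank-sum p-value cannot fall below 2/20 = 0.10. The sketch below computes the exact p-value by enumeration; the expression values are hypothetical, as are the function names.

```python
from itertools import combinations


def midranks(values):
    """Ranks of the values (1-based), with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def exact_rank_sum_p(group_a, group_b):
    """Exact two-sided Wilcoxon rank-sum p-value by enumerating all group splits."""
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    center = n_a * (len(ranks) + 1) / 2          # expected rank sum under H0
    observed = abs(sum(ranks[:n_a]) - center)
    sums = [sum(c) for c in combinations(ranks, n_a)]
    extreme = sum(abs(s - center) >= observed - 1e-12 for s in sums)
    return extreme / len(sums)


# Hypothetical expression values at 12 hrs; the groups are completely separated.
normal = [1.0, 1.2, 1.1]
over = [2.3, 2.8, 2.5]
print(exact_rank_sum_p(over, normal))
```

Even with complete separation the p-value is 0.10, which is worth stating alongside the two reported p-values.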

2011 Dec 1

VICTR Biostat attendees: Li Wang

Rubin Baskir (Cardiology)

  • Has county level data from Texas for a given year --- ion level (in ground water) and death rate.
    • Both variables are categorical.
  • Would like to know if there is any correlation between ion level and death rate.
  • Recommend creating a 2-way table (ion level vs death rate).
  • Discussed creating a sunflower plot.
  • Discussed running a Chi-Square test, Spearman correlation, and Chi-Square trend test.
  • Recommend returning Thurs Dec 8 with data in electronic format.
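
The chi-square statistic for a two-way table of this kind can be computed by hand as sketched below. The county counts are hypothetical and so is the function name; the p-value would come from a chi-square distribution with the returned degrees of freedom, which is left to statistical software here.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical county counts: rows = ion level (low, medium, high),
# columns = death rate category (low, medium, high).
counties = [
    [30, 25, 20],
    [28, 30, 27],
    [18, 26, 50],
]
stat, df = chi_square(counties)
print(round(stat, 2), df)
```

Because both variables are ordered categories, the trend test and Spearman correlation mentioned above use information that this unordered test ignores.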

2011 Nov 10

VICTR Biostat attendees: Li Wang, Chang Yu

Jason Williams (VUIIS - Imaging Science)

  • Regarding a submitted VICTR request - have already submitted a protocol; initially requested $9,000 --- Dan Ayers helped write stat analysis plan and sample size.
  • In the past, Dan Ayers has supported PIs in related studies
  • Support requested is actually for meetings/consultation over the data accrual time period (2 years) and eventual analysis at end of study.
    • Dan Ayers would fulfill the meetings/consultation over the two years and mentoring of the MS statistician at the end of the 2 years
    • MS statistician would fulfill the analysis at the end of the 2 years
  • Li Wang to send an email to Frank Harrell (CC Chang Yu, Dan Byrne, and Jason Williams) to ensure this structure of work over time is "okay" under VICTR; also to ask Frank if this study may fall under the new Imaging Collaboration.
  • Update: Chang and Li met with Dr. Williams on Nov.16, 2011. We felt that $5000 is reasonable to apply for support of the statistical analysis and preparation for the manuscript, which will be accomplished by VICTR biostatisticians (master biostatistician doing the analysis under supervision of PhD biostatistician).

2011 Nov 3

VICTR Biostat attendees: Chang Yu, Terri Scott

Sabina Gesell (Pediatrics)

  • Completed study looking at children's physical activity (measured by accelerometers) during an after school program (either at a school or a community center)
  • Main objective: Compare the community center program to the school program --- community center program was designed to get the kids active.
  • Roughly 50 kids in each group
  • All African American children
  • Measured the children's activity multiple times in each child.
    • Have number of minutes spent in each level of activity (rest, sedentary, lowintensity, moderateintensity, and vigorousintensity).
    • Sum of these numbers = total time spent in the program (on that day) --> this total is different across the children (ie, some children stay longer than others).
  • How much time (specifically, proportion of time) is spent in (1) moderate+vigorous intensity activities, and (2) sedentary+rest
  • Possible regression: Poisson or Negative Binomial regression
  • Final goal: manuscript
  • Cost estimate: $6,000 (60 hours) --- will need to submit a protocol.

2011 Oct 20

VICTR Biostat attendees: Chang Yu, Dan Byrne, Terri Scott

Laurie Cutting (Pediatrics)

  • Sent email regarding "Neurobiology and Treatment of Reading Disability in NF1" NIH grant application.
  • Hypothesis: drug magnifies effect of tutoring
  • Discussed how many arms to move forward with -- 4 (placebo, only tutoring, only drug, or tutoring and drug) or 3 (only drug, only tutoring, or tutoring and drug)
  • Need revised sample size calc -- based on hypothesis, study should be powered based on the interaction effect.

Emily Reinke (Sports Medicine)

  • Take X-rays of individual's knees and measure the space (ie, distance in mms) between bones
  • Have two raters that will be making measurements from the same images using the same method
  • Need to determine how many times each rater needs to examine the same image and how many times the two raters need to examine the same image
  • Major aim of project: to measure and describe the distance --- so, don't need both raters to examine all images (that is, each rater can examine a subset of the images)
  • Recommend a Bland-Altman plot to further examine agreement between raters
  • nQuery Advisor -- sample size for a precision (ie, width of a confidence intervals) around the desired intra- and inter-rater ICC values.
    • Will determine how many images the raters will have to examine in common
    • Would be nice if the images chosen for the two raters to examine each are also the set of images each rater will examine twice

2011 Oct 13

VICTR Biostat attendees: Chang Yu, Dan Byrne, Terri Scott, Li Wang

James (Jim) Powers (Medicine)

  • Discussed "Exploring the Utility of Ultra-Brief Delirium Assessments in Non-Intensive Care Geriatric Population: the GEM study" research study that was explained in an email sent to the biostat-clinic email address on Oct 6, 2011.
  • Dr. Powers has conducted some initial analyses, but would like support for additional analyses.
  • Goal: manuscript.
  • Desired additional analyses, include examining the variability and confidence intervals (possible bootstrap CIs) in more detail. Also would like some possible sub-group analyses (may be difficult because have only 7 patients with delirium; possibly can use a mixed effect models). Possibly calculate kappa.
    • Would be interesting to look at time from admission to study involvement (and time from admission to unit to study involvement), where "study involvement" is defined as the first day the CAM/DSM-IV are assessed.
    • Would also be interesting to compare the data from each rater (for each patient) to each other.
  • IMPORTANT: because this is a VA project, can only have someone who has WOC clearance (eg, Ayumi, Jennifer, Svetlana, Sam, Shirley, David) work with the data --- none of the current VICTR folks have VA clearance.
    • Ayumi is happy to provide oversight; she suggests Svetlana or Jennifer for the bulk of the work.
  • Suggest requesting $4,000 Biostat support from VICTR.
    • Already has support from Chair for additional 50-50 cost sharing plan.

2011 Sept 29

VICTR Biostat attendees: Daniel Byrne, Chang Yu, Hui Nian, Li Wang

Mary Sundell, Nephrology and hypertension

  • Assess the agreement between two measurement methods
  • Each patient has observations at three time points
  • Suggested plotting the data at baseline as a start, can use Bland-Altman plot

Ryan Hollenbeck, Cardiovascular Medicine

  • Effect of early catheterization on clinical outcome
  • The patients were not randomized
  • Suggested using a propensity score to account for the group differences, matching, and a Cox proportional hazards model. The outcome is time to discharge from hospital, and the patients can be censored at that time. Also good to have a complete record of patients' current status and to analyze overall survival.
  • Will be applying for VICTR --- suggest applying for $2,000.

2011 Sept 22

VICTR Biostat attendees: Terri Scott, Li Wang

Shubhada Jagasia, Endocrinology

  • Reference to email sent by Brandon Perry (9/21/2011).
  • Related to "Hemoglobin A1C as a screening tool for gestational diabetes" study (VR2144)
  • Would like to look at sub-groups of women who have different HbA1c "profiles" across their pregnancy --- in terms of result of early screening, screening at usual time spot (24-28 weeks), etc.
  • Would like to also compare these subgroups of women to "control" women (ie, those whose HbA1c and glucose tests were normal throughout their pregnancy) --- would like to see if there are "predictors" of the various profiles.
    • Questions of interest: "Is there an HbA1c in the first trimester that correlates with GDM in the second trimester?"; "Is it possible to intervene with certain groups of women to reduce the incidence of GDM?"
  • Goal is a manuscript.
  • Will be applying for VICTR --- suggest applying for $4,000.

2011 Sept 15

VICTR Biostat attendees: Terri Scott

Pampee Young, Pathology

  • Human stem cells --- not understood if/how stem cells are different across people
    • Looking at possible potency assays; and wondering "are stem cells different?" (across people)
    • 10 patients; bone marrow from each; made stem cells from each person's sample
    • How do the "test" results compare?
  • Need support for analyzing the data --- descriptives (including, mean +/- SD; 25th, 50th, and 75th percentile; and range) and graphs (ie, boxplots and stripcharts).
    • Also would like to see how distributions differ across gender and age.
  • Need support for manuscript writing, etc.
  • Suggest requesting $4,000.

2011 Sept 8

VICTR Biostat attendees: Terri Scott

Oscar Gomez, Peds Infectious Disease

  • Wish to submit two Biostat requests to VICTR --- one for support for an IRB proposal on a grant that's already been submitted; and one for a grant to be submitted in Jan 2012.
  • For IRB proposal --- need confirmation of needed sample size as well as statistical analysis plan. Would be good to have a general review of study design and other statistically related issues. Lastly, discuss using REDCap to collect and manage data; include sentences in IRB proposal.
    • Feel $2,000 (~ 20 hours) would be sufficient support for work required.
  • Grant to be submitted in Jan 2012 will be an R01 --- Will need support regarding specific aims, study design, choice of measures (ie, outcomes and predictors), sample size calculation & justification, statistical analysis plan, estimate of biostats support needed for grant budget, and data collection/management.
    • Because it's an R01, would recommend asking for $4,000 support --- will need to submit a letter of support for the 50% cost-sharing plan (over $2,000).

2011 June 2

VICTR Biostat attendees: Terri Scott

Sunil Kripalani, Medicine

  • Submitted a Voucher request for sample size calc for R01 (Sept)
  • Will need: sample size calc, stat analysis, and study design
  • Will be looking at a subset of patients from the ISCHEMIA worldwide clinical trial (David Maron)
  • Subset who speak English and are in the US or Canada - want to engage them in an ancillary study -- either traditional clinical management or a telephone based intervention that would improve their medical adherence
  • Hypothesis: intervention would improve their medical adherence as well as cardio outcomes
  • Was suggested (from ISCHEMIA PI) to also have a life style intervention --- 2 by 2 factorial study
  • Was suggested (also from ISCHEMIA PI) to not randomize at the patient level because of another ancillary study that will be randomizing two different BP interventions --- so, have cluster randomization instead
  • Thoughts from Dr Kripalani: ask each patient at enrollment whether they are adherent, then enroll those folks who are not fully adherent
  • Thoughts of a three arm study --- adherence intervention, lifestyle intervention, and no intervention
  • Will have ~150 sites in US/Canada, but only ~800 patients across all sites --- so small number of patients per site (will affect sample size calc for cluster randomization)
  • Will be following the patients longitudinally for a year or so --- analysis will involve some repeated measures modeling
    • Will need to consider sample size in order to perform proposed analysis (yet to be determined) and dropout/missing data
  • Thoughts for revision of VICTR request:
    • Request Frank's involvement since he's been involved in the ISCHEMIA design
    • Increase requested amount to $4,000 (because of many nuances of study design)
    • Approach the sample size calc from the point of view of "we'll have X patients enrolled in the US/Canada, what effect size can we find with 80% power?"
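
A back-of-the-envelope sketch of the "what effect size can we find with 80% power?" framing under cluster randomization. It assumes two arms of equal size, equal cluster sizes, a continuous outcome, and the standard design effect 1 + (m - 1) × ICC. The ICC of 0.05 is a placeholder, not an estimate from ISCHEMIA, and the factorial or three-arm designs discussed above would need a different calculation.

```python
import math
from statistics import NormalDist

def design_effect(mean_cluster_size, icc):
    """Variance inflation from randomizing clusters (sites) rather than patients."""
    return 1 + (mean_cluster_size - 1) * icc

def detectable_effect(n_total, n_clusters, icc, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable in a two-arm comparison."""
    deff = design_effect(n_total / n_clusters, icc)
    n_eff_per_arm = n_total / deff / 2          # effective sample size per arm
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * math.sqrt(2 / n_eff_per_arm)

# ~800 patients over ~150 sites (from the notes); ICC is a hypothetical value.
deff = design_effect(800 / 150, icc=0.05)
d = detectable_effect(800, 150, icc=0.05)
print(f"design effect = {deff:.2f}, detectable standardized effect = {d:.2f}")
```

Because the number of patients per site is small, the design effect stays modest even for a moderate ICC. Dropout and missing data would reduce the effective sample size further.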

2011 May 26

VICTR Biostat attendees: Frank Harrell, Dan Byrne, Terri Scott

David Tabb (Biomedical Informatics)

  • Wanting to submit an R01 in October --- involving proteomics
  • Needs a statistician's time for writing the grant --- preferably someone with proteomics experience (eg, Ming Li)
  • Question: should Dr. Tabb use VICTR funds to get help? Or should he utilize Ming (and the Cancer group)?

Crystal Rice (CRC Nurse)

  • Comparing two types of saline flushes --- used to flush IVs
    • (1) pre-filled syringe; and (2) bag flush
  • Patients have complained about taste and/or smell of the different flushes
    • Would like to validate that the flushes actually cause a smell/taste
    • At this point, goal is not to determine if there is a difference (ie, hypothesis test) but to describe their reaction to the flush (if any)
  • Would like to blind patients to which flush they are receiving
    • Have a cross-over randomized trial
    • Would like to piggy back on an existing study that involves flushes
  • Smell - Yes/No; Taste - Yes/No; Degree of taste/smell; Type of taste (metallic, bitter, sweet, etc)?
  • Also collect simple descriptives of each patient --- age, gender, race, smoking, pregnant, concomitant meds
  • Plan to exclude chemo patients

Amy Dreischerf (Endocrinology) & Charles Keil (Human Nutrition & Gastro)

  • Have data from a joint PI from a previous genotype and nutrition study
  • Would like to explore some additional hypotheses with "extra" data that were collected in original study
  • Goal: poster (for conference) & brief summary of findings (for summer internship)

2011 May 5

VICTR Biostat attendees: Terri Scott, Frank Harrell

Karen Chen & Samit Patrawala (Dermatology)

  • Follow-up from 2011 April 21
  • Prioritized analysis:
    • Plaque & patch stage vs tumor stage patients (at time of diagnosis) -- looking at outcomes (ie, number of patients, outcome, survival, etc)
    • Medical therapy vs radiation therapy --- looking at outcomes
    • Whether received bexarotene (Targretin capsules) or not -- looking at outcomes
    • Focus will be descriptive --- should calculate some confidence intervals
  • Feel $6,000 estimate is accurate

Christi Parker (Pharmacy)

  • W/ Anesthesiology; studying cardiac surgery patients
  • Adrenal insufficiency
  • Etomidate - studying incidence of adrenal insufficiency in patients who receive Etomidate
  • Initial StarBRITE request: $2,000 for "Data analysis"
  • Desires:
    • Primary predictor: Etomidate yes/no
    • Primary outcome: Adrenal insufficiency yes/no
    • Cortisol levels w/in 72 hours of being induced with Etomidate for surgery
    • N approx 250 patients (1 record per)
    • Basic desired analysis: 2x2 of Etomidate vs Adrenal insufficiency
    • 2ndary outcomes: vasopressor hours, mechanical vent hours, hosp LOS, ICU LOS, receipt of stress dose steroids
      • Does Etomidate have an effect on the secondary outcomes
    • Possible adjusted analysis - will need to determine possible confounders
    • Suggest collecting continuous adrenal outcome
    • Want a manuscript --- way in the future!
  • Suggestions: continue with $2,000 request; draft detailed ranked analysis plan.
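
The basic 2x2 analysis could be summarized along these lines: risks in each group, the risk difference, and an odds ratio with a Wald confidence interval on the log scale. The cell counts are made up (they sum to the approximate N of 250 only for illustration), and an exact or adjusted analysis might be preferred once the real data and confounders are known.

```python
import math

def two_by_two(a, b, c, d, z=1.96):
    """a, b = exposed with/without outcome; c, d = unexposed with/without outcome."""
    risk_exposed, risk_unexposed = a / (a + b), c / (c + d)
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # Wald SE of log(OR)
    log_or = math.log(odds_ratio)
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
        "odds_ratio": odds_ratio,
        "or_ci": (math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)),
    }

# Hypothetical counts: Etomidate yes -> 30 insufficient, 70 not; no -> 20 and 130.
result = two_by_two(30, 70, 20, 130)
print(result)
```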

2011 April 21

VICTR Biostat attendees: Chang Yu, Terri Scott

Karen Chen (4th year medical student working with Dept of Dermatology)

  • 40 patients with Tumor stage T-cell lymphoma
    • Either initially diagnosed with tumor stage or progressed to tumor stage
  • Have collected various demographic and clinical data on each patient
    • Have REDCap database --- have a lot of questions that want to answer (what's possible will depend on the data once we get a look at it)
      • Data: clinical characteristics collected at presentation; data collected for each treatment period (possible multiple treatment periods; treatment periods can overlap)
    • Want to look at "survival" (time to death; time to progression; disease specific survival; time to "treatment failure")
    • Final goal: abstract/poster leading to a manuscript
  • Estimate of hours: 60 --> $6,000 --> will have to cost share on $4,000 (Need to talk with mentor to identify key question(s) and thus $ may vary.)

2011 April 14

VICTR Biostat attendees: Frank Harrell, Chang Yu, Terri Scott, Li Wang

James (Jim) Powers & Mac Buchowski (Medicine)

  • Elderly patients -- two groups (inpatient (much sicker) and "more healthy")
  • Two techniques for measuring amt of water in patient --- want to compare the two techniques (for clinical utility); one more invasive/time consuming than the other (other based on bedside measures)
  • Question: can we develop a "simple" measure (based on bedside measures) to predict stuff for care of these patients.
  • Statistical suggestions:
    1. Calculate rank correlation b/w the two measures
    2. Perform regression analysis -- try to predict one measure using the other (may adjust for other covariates)
      • Can also calculate mean abs error from predicting one w/ the other (good measure of clinical utility)
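
The two suggestions can be sketched as follows: a Spearman rank correlation (Pearson correlation of the ranks, with ties given average ranks) and the mean absolute error from a simple linear regression of the invasive measure on the bedside measure. The five data points are invented, no covariates are adjusted for, and the error is computed in-sample, so it will be optimistic.

```python
import math

def ranks(v):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def mean_abs_error(x, y):
    """Fit y = intercept + slope * x by least squares; return in-sample MAE."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return sum(abs(b - (intercept + slope * a)) for a, b in zip(x, y)) / len(x)

# Hypothetical paired measurements (bedside vs. invasive technique).
bedside = [1.0, 2.0, 3.0, 4.0, 5.0]
invasive = [2.1, 3.9, 6.2, 8.1, 9.8]
rho = spearman(bedside, invasive)
mae = mean_abs_error(bedside, invasive)
print(f"Spearman rho = {rho:.2f}, mean absolute error = {mae:.2f}")
```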

Ken Monahan & Evan Brittain (Cardiology)

  • Submitting grant (due May 15) to examine how right side of heart works using an ultrasound method.
    • Need help with statistical portions of grant -- sample size & stat analysis plan.
      • Feel $2,000 would be enough for this.
  • Bland-Altman comparison of MRI vs ultrasound method
  • Transit time --- using MRI and new technique
    • Standard deviations of transit time (using both methods) would be good to have for sample size calcs
  • Establish reproducibility in "normative" population; also, reproducibility in popn w/ known pulmonary hypertension
    • Interested in how transit time differs b/w groups
  • Sample size will be determined (at this point) based on funding
    • Consider calculating sample size needed to estimate a correlation coefficient w/in a specific "margin of error" -- need SD estimates
    • Would like to calculate sample size needed in the future for further studies
  • Recommendation for amt to specify in grant budget for stat support (if grant awarded): 100 hours of time (max)

2011 February 17

James (Jim) Powers, Medicine

  • Also with Jim, Bill Gregg (Informatics)
  • Attending VICTR junior biostatisticians: Terri Scott, Hui Nian, Li Wang
    • Frank Harrell and Cindy Chen also in attendance.
  • Discussed data to be pulled from StarPanel:
    • For each patient, at each clinic visit -- measures like # high risk meds, total # meds, # htn meds, # diabetes meds, # dementia medications, BP, GFR, Pneumovax, and Flu vaccine.
    • For each patient, for each day 1 or more "contacts" were made with Clinic --- total # contacts made on that day.
    • For each patient --- dates of any "visit" (hospital admission, ED visit, PCP visit) and type of visit.
      • Similarly, no shows/cancellations.
      • For hospital admissions -- length of stay, primary reason for admission.
        • Similar for ED visit, PCP visit.
  • Discussed possible analysis:

2011 January 27

James (Jim) Powers, Medicine

  • Also with Jim, Renee Porier
  • Attending VICTR junior biostatisticians: Terri Scott, Hui Nian
    • Also, Cindy Chen
  • Emailed Biostat Clinic email address with data set, protocol, and data questions on 1/21/2011.
    • NOTE: not all of the needed variables are included in the emailed data set, including BP and Study ID.
    • Also discussed getting longitudinal data on each patient --- emailed data set has one row per patient.
    • Also, the before & after group includes only those patients who came to the clinic throughout the study period --- that is, those that died (for instance) are not included.
  • Has submitted Biostat services request in StarBRITE -- perform statistical analyses and assist with publication.
    • Suggest to revise request to include discussion of what data needs to be collected (ie, pulled from StarPanel and StarTracker dashboard) for the analysis.
  • Interventional study in geriatric population --- have before & after measures for core group of ~600 patients.
  • Recommended revising request to $6,000 --- will need letter of support from Dept/Div.

2011 January 20

Michael Osgood, Research Fellow, Surgery

  • Goal of study: sample size calculation for a clinical trial
  • Compare two groups of patients with binary outcome
  • Estimated proportions in each group and used PS software to calculate the required sample size
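
For orientation only, a sketch of a common normal-approximation formula for the per-group sample size when comparing two proportions. The proportions 0.30 and 0.50 are placeholders, and the PS software used in clinic may use a different method and give somewhat different numbers.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided test of two proportions (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical event proportions in the two groups.
n = n_per_group(0.30, 0.50)
print(f"about {n} patients per group")
```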

Amy Pyle, Fellow, Pathology; Dan Anderson, RAII, Pathology

  • How to get Biostatistical help for a funded VICTR study

2010 June 17

Cyndya Shibao, Asst Prof, Clinical Pharmacology

  • Goal of study: differences in energy expenditure (total and resting) between autonomic failure patients (N=10) and matched controls (N=15)
  • Resting energy expenditure correlates highly with body fat free mass
    • Want to adjust for this and gender in a model
  • Interested in how to graph results
  • Linear model: REE ~ grp + FFM + grp*FFM
  • Could do a logistic regression with group as the outcome, adjusting for other factors
  • Could graph FFM by REE for both groups, adding a fitted line with confidence interval
  • Also recommend "Forest Plots"
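
To make the model line `REE ~ grp + FFM + grp*FFM` concrete, here is a small least-squares sketch using only the standard library. The data are synthetic and noise-free, generated from known coefficients so the fit can be checked; FFM is centered at 50 purely for numerical convenience. A real analysis would use a statistics package, include gender, and report standard errors.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [u - factor * v for u, v in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Synthetic data: grp = 1 for autonomic failure, 0 for controls (values are made up).
ffm = [40, 45, 50, 55, 60, 42, 48, 52, 58]
grp = [0, 0, 0, 0, 0, 1, 1, 1, 1]
ree = [1500 - 100 * g + 20 * (f - 50) + 5 * g * (f - 50) for f, g in zip(ffm, grp)]

# Design matrix columns: intercept, grp, centered FFM, grp x centered FFM.
X = [[1, g, f - 50, g * (f - 50)] for f, g in zip(ffm, grp)]
beta = ols(X, ree)
print([round(b, 3) for b in beta])
```

The interaction coefficient (last entry) is the difference in FFM slopes between the two groups, which is what the fitted lines in the suggested graph would show.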

Jorge Gamboa, Fellow, Clinical Pharmacology

  • 3 x 3 cross over design, patients (N=15) receive all three treatments with 7 day wash-out period in between each.
    • Each patient is measured 5 times per treatment
  • Problems with missing data - none are missing all time points
  • What would you like to compare among three treatments? Ending value? Peak? Time to peak? AUC?
  • Suggest making plot to start
    • x-axis - time; y-axis - outcome measure; color - treatment
    • Use graph to determine how to compare across groups
  • To deal with missing data, use mixed effects model.
    • Could also use data around it to impute
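
The candidate comparison measures listed above (ending value, peak, time to peak, AUC) can be computed per patient and treatment as in the sketch below. Missing time points are simply dropped, so the trapezoid rule bridges the gap with a straight line between neighbouring observations. This is a simple summary-measure illustration with made-up values, not the mixed effects model recommended in the notes.

```python
def summarize_curve(times, values):
    """Summary measures for one patient under one treatment; None marks a missing value."""
    pairs = [(t, v) for t, v in zip(times, values) if v is not None]
    auc = sum((t1 - t0) * (v0 + v1) / 2
              for (t0, v0), (t1, v1) in zip(pairs, pairs[1:]))   # trapezoid rule
    peak_time, peak = max(pairs, key=lambda p: p[1])
    return {"auc": auc, "peak": peak, "time_to_peak": peak_time, "end": pairs[-1][1]}

# Hypothetical: 5 measurements, with the third one missing.
times = [0, 30, 60, 90, 120]
values = [5.0, 9.0, None, 7.0, 6.0]
summary = summarize_curve(times, values)
print(summary)
```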

2010 June 10

Michelle Griffith and Jeff Boord, Endocrinology

  • Need help estimating statistical support needed for project
  • Working with Bioinformatics to get data concerning patients and blood sugar levels
    • Interested in building model predicting hypo- and hyper-glycemic events
    • About 35,000 admissions in dataset
    • Estimate what percent will have hypo- and hyper-glycemic events.
    • Consider separate models for each disease
  • Recommend getting initial voucher of 20 hours to get a better picture of what will be needed.
  • New system beginning July 1: After first $2,000, department will have to pay half of biostat request

2010 June 3

Jeff Kantor, Pediatric GI

  • Studying association between isoprostanes (measure of oxidative stress) and percent body fat
  • Retrospective cohort of 158 patients between the age of 8 and 17, currently healthy, many obese
    • Combination of patients previously studied in CRC
  • Chang suggested making a list of factors that could affect isoprostane levels
    • diet, stresses to body, undiagnosed DM, etc
  • Can include around 10 variables in the model.
    • would need to include age interaction with many variables.
  • Consider redundancy of some variables.
    • Ex: Weight in lbs and kgs are perfectly correlated and it would not make sense to put both in a model
    • Are there some variables which could be considered highly correlated?
Older Notes
 

META FILEATTACHMENT attachment="Skinner_biostat_pptabb.ppt" attr="h" comment="" date="1421090173" name="Skinner_biostat_pptabb.ppt" path="Skinner_biostat pptabb.ppt" size="507904" user="MeridithBlevins" version="1"
META FILEATTACHMENT attachment="SplinesWithInteraction.do" attr="" comment="" date="1426281572" name="SplinesWithInteraction.do" path="SplinesWithInteraction.do" size="6488" user="WilliamDupont" version="1"
META FILEATTACHMENT attachment="Preterm_babies_with_CHD_database.pdf" attr="" comment="For April 9 Consultation" date="1428425690" name="Preterm_babies_with_CHD_database.pdf" path="Preterm babies with CHD database.pdf" size="56644" user="MeridithBlevins" version="1"
META FILEATTACHMENT attachment="Clay_2015_Turner-Hazinski_4-8-15.pdf" attr="h" comment="" date="1429796067" name="Clay_2015_Turner-Hazinski_4-8-15.pdf" path="Clay_2015 Turner-Hazinski 4-8-15.pdf" size="378788" user="MeridithBlevins" version="1"
META FILEATTACHMENT attachment="WHICAP_biostatppt05.15.15.pptx" attr="h" comment="" date="1432652490" name="WHICAP_biostatppt05.15.15.pptx" path="WHICAP_biostatppt05.15.15.pptx" size="92217" user="MeridithBlevins" version="1"
Revision 230
Changes from r210 to r230

2015 May 14

CANCELED for Employee Celebration picnic

2015 May 7

Patrick Holmes, Pediatric Cardiology

From email to clinic list:
I would like some statistical assistance on selecting the number of patients I would need to enroll for a study. My concept is validation of a FitBit's heart rate against telemetry in patients less than four years old who have cyanotic congenital heart disease and are admitted to the hospital for any reason. Validation of FitBit heart rate data in these patients is the first step, with the ultimate goal of providing a wearable sensor (i.e. FitBit) for near continuous home monitoring of this patient population in the hope of associating home monitoring data with outcomes. This may allow for development of predictive software to warn of clinical decompensation if near continuous home monitoring can be validated and adopted. Thank you for any assistance you can provide with this request.

2015 Apr 30

Douglas Conway, Jill Pulley, VICTR

From email to clinic list:
The palliative care research team would like to schedule a biostats clinic in order to better understand the statistical components of a potential survey instrument to assess patient understanding of therapy, prognosis. We would also like input regarding contamination of our control group as the intervention (early palliative care consult) can be considered standard care and thus, control participants might get exposed to the intervention (although not “early”).
  • Difficulty in choosing primary endpoint. Something like ventilator-free days?
  • Can there be multiple endpoints? Some advantage in having a simpler endpoint that will convince skeptics. Some want a "hard" endpoint.
  • Multiple endpoints might be combined using utilities
  • Should assessments be made after death or prospectively?
  • What is a "safety" endpoint here? Caregiver burden?
  • Preference for validated instruments especially if need to combine multiple items; but doesn't hurt to ask a handful of individual questions for qualitative assessment
  • There are mixed method approaches
  • It may be easy to contaminate the control group, e.g., just by asking them questions
  • Post-death assessments may be most unbiased if timed wisely
    • Need to time carefully with regard to receipt of hospital/physician bills

Mark Clay, Pediatrics, Cardiology, and Critical Care

From email to clinic list:
I have written a research proposal, and my questions are: have I chosen the appropriate statistical analysis for the data to be obtained, how many subjects do I need for statistical power, and are there any methodology flaws? I have attached the proposal.
  • Correlation between body/extremity temperature and blood flow
  • Interested in exploratory analysis of surface temperature patterns and several traditional lab measurements
  • Some variation in hospital room ambient temperature; may need to accurately capture these temperatures for adjustment
  • Are changes in temperature important (central - peripheral) or are absolute temperatures important?
  • May do a 20-subject pilot study to find out sources of variation in the measurement due to camera angle etc.
  • Typically there is an esophageal or rectal temperature probe for measurement of core temp
  • Sample size will need to be larger if the model relating to cardiac output is an empirically-derived model (as opposed to plugging into an existing biomathematical model)
  • Sample size based on accurately estimating a linear or Spearman correlation coefficient: see Section 7.5.2 in http://biostat.mc.vanderbilt.edu/wiki/pub/Main/ClinStat/bbr.pdf
  • Consider a response feature approach to analyzing the data
  • Estimate pilot study sample size based on confidence interval widths.
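
One way to think about sample size in terms of confidence interval width for a correlation coefficient is the Fisher z-transformation, sketched below. This is a generic textbook approximation for a Pearson correlation and only roughly applicable to Spearman's. It is not necessarily the method used in the referenced bbr.pdf section, and the values r = 0.5 and a target width of 0.4 are placeholders.

```python
import math
from statistics import NormalDist

def corr_ci(r, n, conf=0.95):
    """Approximate CI for a correlation via Fisher's z (SE = 1 / sqrt(n - 3))."""
    z = math.atanh(r)
    half = NormalDist().inv_cdf(0.5 + conf / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

def n_for_width(r, max_width, conf=0.95):
    """Smallest n whose CI around an anticipated r is no wider than max_width."""
    n = 4
    while True:
        lo, hi = corr_ci(r, n, conf)
        if hi - lo <= max_width:
            return n
        n += 1

pilot_lo, pilot_hi = corr_ci(0.5, 20)      # precision with a 20-subject pilot
n_needed = n_for_width(0.5, max_width=0.4)
print(f"n = 20: CI ({pilot_lo:.2f}, {pilot_hi:.2f}); n for width 0.4: {n_needed}")
```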

2015 Apr 23

Mariu Carlo, Geriatric Medicine

From email to clinic list:
I’m planning to apply for VICTR funding for a research project I am doing. My research project is a secondary data analysis of the BRAIN-ICU cohort looking at whether executive dysfunction at 3 months after discharge from the ICU is an independent predictor of mental health outcomes.

Deonni Stolldorf, VA Quality Scholars Program

From email to clinic list:
I would like to present a project on April 23rd regarding a secondary data analysis related to Lean management. The purpose of the project is to determine how lean was implemented in different units and if the implementation approach was associated with the sustainability of Lean in these units.

2015 Apr 16

Colin Walsh, Biomedical Informatics and Medicine

From email to clinic list:
I’m interested in chatting with an expert about ways in which calibration of a predictive model (e.g. Regression analysis) can be compared between different models. I am comfortable with how to measure discrimination, calibration, and clinical usefulness, but would love to discuss whether a particular metric of calibration is more convincing to an expert audience in choosing between two hypothetical models.

Dr Walsh is currently working on a comparison of five hypothetical models for prediction of hospital readmission using four years of training data and one year of validation.

In his previous experience, most comparisons discuss discrimination using the C statistic and only briefly discuss calibration (using a plot). He is interested in discussing the most rigorous way to compare the calibration of models. Which is better?

A few discussion points.

-- Not recommended to split on time. "Any time you split on a variable that is easy to model, setting self up for failure."

-- Literature comparing calibration curves is sparse, but this is an area of interest. If two calibration curves can be calculated, may be able to bootstrap and obtain CIs.

-- May consider Spiegelhalter calibration as a starting point.

-- Recent paper on how to estimate calibration curves most accurately (PC Austin, EW Steyerberg Statistics in Medicine)
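
As a starting point for the Spiegelhalter suggestion, the sketch below computes the Spiegelhalter z statistic for one model's predicted probabilities against observed binary outcomes. Values near zero are consistent with good calibration. The six outcome/prediction pairs are invented, and comparing two models this way (or bootstrapping the difference) is an extension not shown here.

```python
import math
from statistics import NormalDist

def spiegelhalter_z(y, p):
    """Spiegelhalter's calibration z statistic and two-sided p-value.

    y: observed outcomes (0/1); p: predicted probabilities strictly between 0 and 1.
    """
    num = sum((yi - pi) * (1 - 2 * pi) for yi, pi in zip(y, p))
    var = sum((1 - 2 * pi) ** 2 * pi * (1 - pi) for pi in p)
    z = num / math.sqrt(var)   # undefined if every prediction equals 0.5
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical readmission outcomes and model predictions.
outcomes = [0, 0, 1, 1, 0, 1]
predicted = [0.1, 0.2, 0.8, 0.9, 0.3, 0.7]
z_stat, p_val = spiegelhalter_z(outcomes, predicted)
print(f"z = {z_stat:.3f}, p = {p_val:.3f}")
```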

Gerasimos Bastas, Physical Medicine & Rehabilitation

From email to clinic list:
This is in follow-up to previous meeting and to help ascertain sample size calculation for re-submission of a VICTR proposal.

At the previous clinic there was a general discussion on identifying the total number of subjects needed to perform the overall study. This calculated sample size (which is quite large) was considered scientifically worthwhile, but not feasible in the required time window according to recent feedback from VICTR.

At this clinic, there was discussion regarding the most appropriate way to propose a pilot study that VICTR would consider both feasible and worthwhile for funding. Prior work in the field is based on smaller sample size (less than 30).

For the first calculation, an appropriate level of precision was decided and the required sample size was calculated. It was suggested to look at sample size from the other direction. That is, determine what is a reasonable sample size that can be evaluated in a given time frame. Then, calculate the level of precision that would be obtained using that sample size and justify why that is scientifically relevant.
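
The "other direction" calculation can be illustrated with the half-width of a confidence interval for a mean at several feasible sample sizes. The SD of 10 is a placeholder, and the normal quantile is used for simplicity; with samples this small, a t quantile would give slightly wider intervals.

```python
import math
from statistics import NormalDist

def ci_half_width(n, sd, conf=0.95):
    """Approximate half-width of a CI for a mean, given n and an assumed SD."""
    q = NormalDist().inv_cdf(0.5 + conf / 2)
    return q * sd / math.sqrt(n)

# Hypothetical SD; candidate sample sizes that might be feasible in the time window.
for n in (10, 20, 30):
    print(f"n = {n}: mean estimated to within +/- {ci_half_width(n, sd=10):.1f}")
```

The justification then rests on whether the achievable margin of error is small enough to be scientifically useful.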

Pick a measurement that is interesting but difficult or unstable. Make calculations based on that measurement.

Recommended comparing continuous measurements; this allows a smaller sample size for a given power.

Plan to return next week to continue the discussion.

2015 Apr 9

Divya Suthar, Pediatric Cardiology

I have set up the REDCap database and would like to enlist help to ensure I have data entered in a format that is easy to analyze when we reach the analysis portion of the research.
  • A brief overview of the project: We aim to compare the morbidity and mortality of preterm babies with congenital heart disease with preterm babies with no heart disease. We will be looking at the following parameters: birth weight, gestational age, initial APGARs, immediate postnatal acid-base balance, presence of an antenatal diagnosis, gestational age at birth versus at the time of surgery (analyzing time and reasons for deferring surgery), presence of extracardiac anomalies, other co-morbidities of prematurity such as intraventricular hemorrhage, bronchopulmonary dysplasia, necrotizing enterocolitis, pulmonary hypertension, presence of aspiration/reflux and need for additional surgical procedures, presence of thrombosis, culture proven sepsis, presence of arrhythmias, medications such as inotropes, etc., need for cardio-pulmonary resuscitation, extracorporeal membrane oxygenation support while in the hospital, length of hospital stay, long-term follow-up and overall outcome. In addition, we aim to compare surgical outcomes in this preterm population with CHD with all infants <1 year who underwent open or closed heart surgery at our institution excluding the patients in this study. This comparison may extend to include recently published national surgical data with a similar cohort. We will also be comparing the incidence of heart defects in preterm babies with the incidence reported in a recent study by Egbe et al, using the Nationwide Inpatient Sample database, to identify any institutional differences.
  • Morbidity/mortality of pre-term babies with/without heart defects. Want to estimate outcomes to help advise parents on treatment decisions. Also interested in comparing term and pre-term babies who have open heart surgery.
  • Want to determine best approach to set up REDCap database. There is no longitudinal database.
    • REDCap has branching logic (i.e. skip patterns) that can be used when not all participants require entry of certain questions.
    • Recommend looking at Spreadsheet from Heaven on biostatistics wiki.
  • Discussing the analysis now.
    • Wish to estimate the incidence of complications among pre-term infants stratified by heart disease. And look at discharge and survival for pre-term vs. non-pre-term babies using Kaplan-Meier estimates.
  • It will be difficult to draw inferences about the effect of surgical intervention on pre-terms compared with normal pre-terms with no surgical intervention, because babies that require surgery are inherently different from those that do not receive it. Could derive a propensity score to indicate whether the surgery would be early or late. Sort patients by propensity score and trim the data. If there are some patients who should not have delayed intervention then they should not have entered a clinical trial, and are trimmed. Then analyze those patients not trimmed and adjust for propensity score. Need to convince yourself that the propensity score is effectively matching patients across relevant variables. Then create a standard "Table 1" using inverse probability weighting (IPW). If confounders balance across groups then you might proceed.
    • Advanced methods including marginal structural models may allow for consideration of counterfactuals, but they are usually used in longitudinal or repeated measure studies.
    • Otherwise, for descriptive studies may just acknowledge confounding by indication (i.e. surgery).
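
A rough sketch of the trimming and IPW balance check described above. It assumes a propensity score (probability of early surgery) has already been estimated for each baby. The eight records, the trimming bounds of 0.1 and 0.9, and the field names `early`, `ps`, and `gest_age` are all hypothetical. Balance is summarized with a standardized mean difference (SMD) before and after weighting.

```python
import math

def ipw_weight(treated, ps):
    """Inverse probability weight: 1/ps for early surgery, 1/(1 - ps) otherwise."""
    return 1 / ps if treated else 1 / (1 - ps)

def weighted_mean_var(values, weights):
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

def smd(rows, covariate, weighted):
    """Standardized mean difference of a covariate between the two groups."""
    stats = {}
    for arm in (1, 0):
        grp = [r for r in rows if r["early"] == arm]
        w = [ipw_weight(arm, r["ps"]) if weighted else 1.0 for r in grp]
        stats[arm] = weighted_mean_var([r[covariate] for r in grp], w)
    (m1, v1), (m0, v0) = stats[1], stats[0]
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)

# Hypothetical records: early = 1 for early surgery, ps = estimated propensity score.
rows = [
    {"early": 1, "ps": 0.95, "gest_age": 35}, {"early": 1, "ps": 0.80, "gest_age": 34},
    {"early": 1, "ps": 0.60, "gest_age": 33}, {"early": 1, "ps": 0.40, "gest_age": 32},
    {"early": 0, "ps": 0.70, "gest_age": 33}, {"early": 0, "ps": 0.30, "gest_age": 31},
    {"early": 0, "ps": 0.20, "gest_age": 30}, {"early": 0, "ps": 0.05, "gest_age": 29},
]
kept = [r for r in rows if 0.1 <= r["ps"] <= 0.9]   # trim extreme propensity scores
smd_raw = smd(kept, "gest_age", weighted=False)
smd_ipw = smd(kept, "gest_age", weighted=True)
print(f"kept {len(kept)} of {len(rows)}; SMD raw = {smd_raw:.2f}, IPW = {smd_ipw:.2f}")
```

An IPW-weighted "Table 1" would repeat this comparison for every relevant confounder; large remaining imbalances suggest the propensity model is not doing its job.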

Chengxian Zhang and Oscar Gomez, Pediatric Infectious Diseases

We have a clinical case/control study to study the associations of infectious agents and childhood diarrhea in Colombia, South America. In this study, both children with or without diarrhea were enrolled and the infectious agents were tested from the stool samples. We have collected some data for a pilot study paper and will publish a paper later this year for the whole study, so we need some advice on how to analyze and present the data with proper biostatistics methods.
  • Have about 450/450 cases/controls for study. Recruited cases by inclusion criteria of diarrheal illness and presentation to ED in hospitals. Recruited controls from healthy check-ups at same hospitals and matched for age group.
    • Concerns for case/control recruitment might be that they are confounded by socioeconomic status (for example). It is helpful to adjust for these potential confounders (if collected) in multivariable analysis.
  • Assuming cases/controls are comparable, using binary outcome (pathogen status) then do multivariable logistic regression with case status as relevant exposure. Consider adjusting for clinic site.
  • Analyzing the pilot data will likely give an underpowered comparison and make it difficult to fit the full regression model that this study may require. Consider looking at distributions as a first step, and also finalizing a Statistical Analysis Plan. The total number of pathogens is ~30, but the main interest is in ~9 emergent pathogens. Each might be a separate model unless you are interested in co-infection. If there are biological reasons to group pathogens, then you could look at co-infection.
    • Consider ranking the pathogen analyses in order of importance prior to analysis since there are issues of multiple comparisons (REFERENCE: Cook RJ, Farewell VT: Multiplicity considerations in the design and analysis of clinical trials. J Roy Statistical Society A 159:93-110; 1996).
  • Interested in supporting staff and/or faculty effort to analyze this data. Meridith will put them in touch with Leena Choi of allocation committee.

2015 Mar 26

Bryan Hill, Urogynecology

I am doing a retrospective chart review determining the relationship between mid-urethral pressure profiles and abdominal leak point pressures during urodynamic analysis.

Background:

Urodynamics is a test ordered for patients with urinary incontinence. It measures, among other things, the maximum pressure of the female urethra (mmHg) called MUCP, and pressure at which the patient leaks after bearing down (called LPP) at either capacity or at a set volume of 150 mL instilled into the bladder. During a clinic visit, the physician will examine the patient and ask her to squeeze her vaginal muscles (kegel squeeze) and this is recorded on a scale of 0, 1, 2, 3, 4 (zero absent squeeze, 4 max squeeze). Our aim is to determine if there is a correlation between MUCP and LPP. Our secondary aim is to determine if a relationship exists between MUCP, LPP, and Kegel squeeze. LPP and MUCP are measurable data corresponding to mmHg and the Kegel squeeze is ordinal data. Our questions are the following:

Would a Spearman's correlation be adequate for comparing ALPP and MUCP? The data appear to follow a normal distribution; however, we did not test for this. Therefore, we are using a Spearman's correlation rather than Pearson's because we are assuming the data are not "normal". This would be more conservative.

When we are comparing our ranked data to the measured data, can we still apply either a Spearman's or Pearson's correlation or should we use an ANOVA instead? Also, would linear regression be more appropriate in this situation?

  • Outcomes: 2 measures of pressure (MUCP and ALPP), also measuring Kegel muscle tone as 0-4 scale assigned by clinician.
  • Chart review of 550 records (400 complete) from research derivative. Included: Females with urodynamics, complaint of urinary incontinence. Exclude: males, comorbidity.
  • Recommend graphical representations of paired relationships (e.g. scatter plots). Spearman's correlation is appropriate; however, the interpretation may be less meaningful than a graphic. Could also compute a Spearman correlation between the muscle tone scale and the pressure measures.
  • What are thoughts on binning 0-1 as Kegel weak and 2-4 as Kegel strong? Not recommended, since dichotomizing the scale discards information.
  • Next step may be linear regression or a proportional odds model. If all outcomes are of equal interest, predict each one measure from the other two for all possible combinations (i.e. three models). This would assess whether all three tests are needed.
    • If you decide to do parametric analysis (like linear regression) then you could conduct some influence analyses to examine the outliers.
    • Include an interaction term between the two covariates.
    • If a secondary motivation is to use the noninvasive test (i.e. Kegel) to predict performance on the more intensive test, then maybe leave out the other pressure measure as a covariate (since it is collected as the intensive test).
  • How to handle out of range values that are still plausible? Could do proportional odds model for all three outcomes which is more robust to these values.
  • Take care not to correlate these outcomes with urinary incontinence (UI), because the sample is biased: every included patient had a complaint of UI. Limit analysis/conclusions to correlations within this population.
  • Future directions could be to use these measures to look at surgical outcomes.
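
The Spearman suggestion above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in the clinic: the helper names (`midranks`, `pearson`, `spearman`) and all data values are hypothetical. It shows that Spearman's rho is simply Pearson's correlation applied to ranks, with ties (which are unavoidable on a 0-4 Kegel scale) given the mean of their ranks.

```python
def midranks(values):
    """Rank each value (1 = smallest); tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(midranks(x), midranks(y))


# Hypothetical values for illustration only (not study data).
kegel = [0, 1, 1, 2, 2, 3, 3, 4]          # ordinal squeeze score, 0-4
mucp = [22, 35, 30, 41, 48, 55, 52, 70]   # pressure measure

print(f"Spearman rho (Kegel vs MUCP): {spearman(kegel, mucp):.3f}")
```

Because only ranks enter the calculation, plausible but extreme pressure values have limited influence on the result, which is one reason the rank-based approach suits these data.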

Lisa Jambusaria, Department of Obstetrics & Gynecology

- WILL RESCHEDULE Study Title: Incontinence Rates after Midurethral Sling Revision Techniques. I have performed a retrospective chart review of all midurethral sling revisions done at Vanderbilt University in the last ten years (303 patients) to determine the rate of incontinence, pain and urgency among the four different types of sling revision performed: midline transection, partial excision, full excision and urethrolysis. I need help with the data analysis: What test should I use to look for a difference among the four types of revision surgery in continence, pain and urgency outcomes, as well as in their baseline characteristics? How do I establish CIs? What type of table is appropriate to show my results? How do I make subgroups based on yes/no questions preoperatively and on whether the answers change postoperatively?

2015 Mar 19

Matt Rioth, Department of Hematology and Oncology and Biomedical Informatics

I have a dataset of cancer patients' tumor genetic sequencing that contains two types of results: variants of "known" and "unknown" significance. However, in many instances the known and unknown seem to be closely related (i.e. the variants of unknown significance are not happening due to random chance). We are trying to determine the best means of determining whether there is indeed an association between the known and unknown significance variants. I can upload a de-identified data file and provide more details on the analysis we have attempted.
  • Question from Frank: I want to ask you about the phrase "association between the known and unknown significance variants". I assume this should be "significant variants". Please elaborate on "significant". Any analysis that is restricted to statistically "significant" features may be problematic, causing bias and over-interpretation, plus multiplicity problems.
  • 90 patients all of whom have metastatic breast cancer. Breast Cancer tumor (targeted XM) sequencing yields variants of known clinical significance and unknown clinical significance (i.e. prognostic, therapeutic, or diagnostic significance). A nonsignificant variant may be a gene that appears rarely. There are a total of ~360 variants.
    • What are the differences between variants of known vs. unknown significance? Why are they classified that way? According to lab that assigns this category, this is an expert consensus and chart review.
    • Is it worth pursuing germline data? Have robust phenotype data for breast cancer (pathology, metastatic, lymph node sites).
  • Since there is no clinical outcome and no control group, it's difficult to assign significant variants. Might be worth looking at datasets that have noncancers and cancers in order to determine how genes/variants differ across groups. Or, within cancers, how they separate the tumor subtypes or another clinical situation.
  • Interested in looking at associations between "unknown significance variant" and "known significance variant". Distinguish between USV worth pursuing and USV that are not.
    • Use logistic regression to predict unknown variant using known variant -- this would have issues of overfitting and correlation among variants.
    • Logic regression: given variants A and B among the known variants, estimate the probability of having unknown variant Q. This is a regression tree-based method; an R package called LogicReg implements it.
    • Would it be interesting to group k variants associated with an organ or pathway? There are automated ways to group gene ontology labels based on metrics of distance between labels. Cluster variants with this gene ontology domain knowledge -- Python package GOgrapher.
  • It is difficult to distinguish between a common variant and a significant variant.
  • Discussed network graphics, which currently show an edge if both variants are observed in the same patient. Try using a different summary to draw the edge (e.g. Pearson correlation or Gini index). Color lines by positive/negative correlation and make line width proportional to the magnitude of the correlation. Only draw edges for USV to KSV and ignore all other pairs.
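
The edge rule discussed for the network graphic might be prototyped as follows. This is a sketch under stated assumptions: the variant names and presence/absence vectors are invented, `phi` is a hypothetical helper, and Pearson correlation of 0/1 indicators (the phi coefficient) is used as the edge summary. Only USV-to-KSV pairs are considered, as suggested in the notes.

```python
from itertools import product


def phi(a, b):
    """Pearson correlation of two 0/1 indicator vectors (phi coefficient).

    Returns None when either variant is present in all or in no patients,
    because the correlation is undefined in that case.
    """
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None
    return cov / (va * vb) ** 0.5


# Hypothetical presence (1) / absence (0) of each variant in six patients.
ksv = {"KSV_A": [1, 1, 0, 0, 1, 0], "KSV_B": [0, 1, 1, 0, 0, 1]}
usv = {"USV_Q": [1, 1, 0, 0, 1, 0], "USV_R": [1, 0, 0, 1, 1, 0]}

# One candidate edge per USV-KSV pair; all other pairs are ignored.
edges = []
for (u_name, u_vec), (k_name, k_vec) in product(usv.items(), ksv.items()):
    r = phi(u_vec, k_vec)
    if r is None:
        continue
    edges.append({
        "usv": u_name,
        "ksv": k_name,
        "sign": "positive" if r > 0 else "negative" if r < 0 else "zero",
        "width": abs(r),  # line width proportional to |correlation|
    })

for edge in edges:
    print(edge)
```

The resulting list could be handed to any graph-drawing tool; with ~360 variants in 90 patients, many correlations will be unstable, so the widths are descriptive rather than evidence of association.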

2015 Mar 12

Jessica Castilho, Division of Infectious Diseases

Jessica came for help with restricted cubic spline models with interaction terms in Stata. I (WDD) explained how to do this and how to graphically explore the data set to determine which models are reasonable. After meeting with her I created a Stata do file to illustrate these procedures. A copy of this do file is attached.

2015 Mar 5

Jessica Castilho, Division of Infectious Diseases

I have a question about calculating and graphing predicted values for a regression model that includes cubic splines and an interaction term using Stata for observational, clinical data. I was wondering if there would be someone available at the noon biostatistics clinics either Thursday or Friday of this week or (particularly given the weather) Monday March 2nd with Stata familiarity who could help me.

2015 Feb 26

Susan Salazar, OB-GYN

I attended a clinic last fall to get help with ideas for a database I am creating. I want to determine if anemia is correlated to pyelonephritis in our population of pregnant and recently delivered women. The medical biller has indicated that approximately 400 women over the past two years have that particular diagnosis code (we delivered approximately 9000 women over the past two years).

After a review of the literature, I’ve pared down the list of contributing factors to about 6. I would also like to gather data from charts from women who got pyelonephritis after they delivered.

My questions are thus:

If I review 400 records, how many “control” cases should I review? I was going to match them based on the due date of the person who actually had pyelonephritis. Does it need to be one for one or maybe just a percentage?

Should I pare the variables down even further? (maybe just three?) I have an idea of their weighted correlation so I could use the top three that seem to be most associated with pyelonephritis.

I can come to another clinic or I can email my protocol as it exists right now.
  • Our recommendation was to match no more than 3 or 4 controls per case (3:1 or 4:1), since additional controls add little statistical power. We also recommended that Susan attend a REDCap clinic to develop a database that will allow an easy transition from data collection to data analysis.
  • Given the scope of the work and that a manuscript is involved, we estimate that it will require about 45 hours of work at most.
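
The reasoning behind capping the match ratio at about 4:1 can be illustrated with a commonly cited rule of thumb: relative to an unlimited supply of controls, m controls per case achieve roughly m/(m+1) of the attainable efficiency. The sketch below simply tabulates that approximation; the function name is hypothetical and the rule is an approximation, not a power calculation for this study.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of m:1 matching relative to unlimited controls."""
    m = controls_per_case
    return m / (m + 1)


for m in range(1, 7):
    print(f"{m}:1 matching -> about {relative_efficiency(m):.0%} of maximum efficiency")
```

Going from 1:1 to 4:1 moves the approximation from 50% to 80%, while each further control adds only a few percentage points, which has to be weighed against the chart-review effort per record.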
 

2015 Feb 5

Jean P. Betancourt, Vanderbilt Stallworth Rehabilitation Hospital

Line: 1716 to 1928
 
    • Are there some variables which could be considered highly correlated?

META FILEATTACHMENT attachment="Skinner_biostat_pptabb.ppt" attr="h" comment="" date="1421090173" name="Skinner_biostat_pptabb.ppt" path="Skinner_biostat pptabb.ppt" size="507904" user="MeridithBlevins" version="1"
Added:
>
>
META FILEATTACHMENT attachment="SplinesWithInteraction.do" attr="" comment="" date="1426281572" name="SplinesWithInteraction.do" path="SplinesWithInteraction.do" size="6488" user="WilliamDupont" version="1"
META FILEATTACHMENT attachment="Preterm_babies_with_CHD_database.pdf" attr="" comment="For April 9 Consultation" date="1428425690" name="Preterm_babies_with_CHD_database.pdf" path="Preterm babies with CHD database.pdf" size="56644" user="MeridithBlevins" version="1"
META FILEATTACHMENT attachment="Clay_2015_Turner-Hazinski_4-8-15.pdf" attr="h" comment="" date="1429796067" name="Clay_2015_Turner-Hazinski_4-8-15.pdf" path="Clay_2015 Turner-Hazinski 4-8-15.pdf" size="378788" user="MeridithBlevins" version="1"
Revision 210
Changes from r190 to r210
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Added:
>
>

2015 Feb 5

Jean P. Betancourt, Vanderbilt Stallworth Rehabilitation Hospital

I am currently collecting diagnostic and demographic information on every patient who was admitted to Stallworth Rehabilitation Hospital and had to be readmitted to VUMC or another acute care center due to an acute medical event. Medicare tracks the percentage of patients in each rehabilitation hospital who have to be readmitted to an acute care center and imposes a penalty on hospitals above the national average. Vanderbilt is one such institution; therefore, I would like to be able to manipulate my saved data, which include variables such as length of hospital stay, admitting physician, time of admission and discharge, and reason for transfer, among others, and make comprehensive charts comparing the different physicians and looking for factors and trends, in order to identify risk factors for readmission and thus identify which patients are at risk in the future.

Peter Freswick and Maribeth Nicholson, Pediatric Gastroenterology Fellows

  • Our project has to do with inappropriate C. diff testing in the children's hospital before and after an intervention, with 13 months of data collected for each timeframe.
  • Suggestion to keep age as continuous, since it's likely that a 12-month-old and a 35-month-old have different C. diff testing rates.
  • A generalization of the interrupted time series: model time as a smooth curve/continuous effect with nonlinearity (e.g. restricted cubic splines) to observe the change over time.
  • Use Poisson regression with an offset for the hospital volume, or model fractions using linear regression. Consider an interaction effect between age and time to explore the intervention effect by age "group". Also, model age as a smooth curve.
  • To give a crude answer to the question "Is the age distribution of children who receive C. diff testing different pre- and post-intervention?" you can use a Wilcoxon rank sum test. This does NOT account for seasonality, and we suggest the above methods for VICTR support.
  • Would like to apply for a VICTR voucher to support this project. Suggest applying for a standard VICTR voucher; time required is 35 hours ($2000).
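
As a first look before the full model, the pre/post comparison of testing rates can be summarized as a rate ratio with a confidence interval. The sketch below is a crude illustration with invented counts; the function name is hypothetical. It corresponds to what a Poisson model with only a pre/post indicator and a log(volume) offset would estimate, and it deliberately ignores the time trend, seasonality, and age effects that the notes recommend modeling.

```python
import math


def rate_ratio(events_post, volume_post, events_pre, volume_pre, z=1.96):
    """Post/pre rate ratio with a Wald interval computed on the log scale.

    events_*: number of tests (Poisson counts); volume_*: hospital volume
    (e.g. admissions) used as the denominator, i.e. the offset.
    """
    rr = (events_post / volume_post) / (events_pre / volume_pre)
    se_log = math.sqrt(1 / events_post + 1 / events_pre)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical 13-month totals (not study data).
rr, lower, upper = rate_ratio(events_post=80, volume_post=6400,
                              events_pre=120, volume_pre=6000)
print(f"Rate ratio post vs pre: {rr:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```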

2015 Jan 29

Manisha Gupte, Cardiovascular Medicine

Working with Michael Hill PhD, Assistant Professor, Division of Cardiovascular Medicine, Dept. of Medicine
What we have done is treat our animals with either vehicle (saline) or Neuregulin (NRG) and measure cardiac function at various time points. The two function parameters that we measured were Fractional Shortening (FS) and Ejection Fraction (EF). I have attached the raw data file as well as graphs of the raw FS and EF values. As you will see, in the NRG-treated animals, FS increased during the later stages of the treatment period, while in the vehicle-treated animals FS decreased during that same time period. Thus, we have a difference in the FS values between the 2 groups during those treatment stages. The same holds for the EF values. What we would like to know is whether, at any of those time points, the difference in FS and EF values between the vehicle- and NRG-treated animals is statistically significant.
  • Looking at raw data for 3 rats - longitudinal trends - some inconsistencies
  • Would be good to estimate the confidence intervals for the differences in the two groups, over time
  • Analysis of last time point: significant by both the equal-variance t-test and the Wilcoxon 2-sample test
  • Full longitudinal analysis would be good (e.g., generalized least squares (growth curves))
    • For that, use the baseline variable as the covariate
  • Consider having replacement rats when deaths occur during surgery

Li Alemo Munters, Research Fellow, Division of Rheumatology

I need help to analyze the attached data set and present sensitivity, specificity, and positive and negative likelihood ratios for response in glycerol versus disease activity; glycerol versus VO2 max; glycerol versus 5VRM; and glycerol versus cycling time. I would like the values to be presented with 95% confidence intervals (CI). (A statistician that I have worked with in Sweden used MedCalc for Windows, version 12.5, MedCalc Software, Ostend, Belgium.) The study includes patients randomized to two groups (exercise group (=1) or control group (=2)). Each patient was either a responder (=1) or a non-responder (=2) in the variables mentioned above (included in the attached Excel file). (I have used Fisher's exact test to investigate the difference between the exercise group and the control group in the frequency of responders and non-responders in disease activity, VO2 max, 5 VRM, glycerol and cycling time.) But I don't know how to perform the analyses described above.
We are performing a pilot study where we hypothesize that patients with myositis have altered metabolic capacity/low insulin sensitivity in muscle and low muscle performance compared to matched healthy controls, and that endurance exercise improves metabolic capacity/insulin sensitivity and muscle performance in patients compared to a non-exercising control group. We use delta glycerol as a surrogate marker for metabolic capacity/insulin sensitivity, and muscle performance is assessed by cycling time to exhaustion.
The attached PowerPoint shows how I would like to present my data. Slide 1: compare patients and healthy controls at baseline on muscle performance (cycling time to exhaustion), with a scatterplot to display individual values. Slide 2: within-group comparison of baseline vs. follow-up for the exercise group and the non-exercising control group, with a line plot to display individual values from baseline to follow-up.
  • The file is in the conference room computer directory clinicuploads
  • Discussed significant information loss and arbitrariness of responder analysis by email
  • Also discussed conservatism of Fisher's "exact" test
  • Disease activity is determined from several ordinal and continuous variables; consider ranking all these variables across subjects and computing the average rank; analyze average rank vs. treatment group. Collapsing several variables into one.
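
The average-rank suggestion above can be sketched as follows. This is a minimal illustration with invented variable names and values; `midranks` and `average_rank_score` are hypothetical helpers. It assumes every variable is oriented the same way (higher means more active disease) before ranking.

```python
def midranks(values):
    """Rank each value (1 = smallest); tied values share the mean of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def average_rank_score(variables):
    """Collapse several variables into one score per subject.

    variables: dict mapping a variable name to a list of per-subject values.
    Each variable is ranked across subjects, then ranks are averaged within subject.
    """
    ranked = [midranks(values) for values in variables.values()]
    n_subjects = len(ranked[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n_subjects)]


# Hypothetical disease-activity components for six subjects (not study data).
activity = {
    "physician_global": [3, 1, 2, 2, 4, 0],        # ordinal
    "ck_level": [410, 120, 300, 250, 900, 95],     # continuous
}
group = [1, 1, 1, 2, 2, 2]  # 1 = exercise, 2 = control

scores = average_rank_score(activity)
for g, s in zip(group, scores):
    print(f"group {g}: average rank {s}")
```

The single score per subject can then be compared between treatment groups with a two-sample rank test, which avoids the information loss of a responder/non-responder split.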

    ## download and install R: http://www.r-project.org/
    ## graphical user interface: http://www.rstudio.com/

    ## set randomization seed and create fake data
    set.seed(1)
    x1 <- rnorm(18)
    x2 <- rnorm(18)
    y1 <- rnorm(18)
    y2 <- rnorm(18)
    group <- c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
    m1 <- m2 <- .5

    ## graphic 1: scatterplot with plotting symbol by group, plus a larger reference point
    plot(x1,y1,pch=group)
    points(m1,m2,cex=2)
    ?legend   ## help page for adding a legend

    ## second plot: axis labels use xlab/ylab; arrows join each (x1,y1) to its (x2,y2)
    plot(x1,y1,pch=group,xlab="label",ylab="label")
    points(x2,y2,pch=group)
    arrows(x1,y1,x2,y2)

  • Check out VICTR as a resource: http://biostat.mc.vanderbilt.edu/wiki/Main/VICTRBiostatPolicies

Richard Gumina, Cardiovascular Medicine (Time Permitting)

  • Email: When we plot out the %aggregation and genotype we have the following graph (below). It appears that there is a trend for decreased aggregation in the Aspirin treated group with AA versus AG versus GG. When John conducted t-test of aspirin AA versus Aspirin GG the p-value was 0.037. Is there a way to use these findings? Is there a way to assess for trend in the ASA data?
  • two continuous measures pre and post along with categorical SNP groups

2015 Jan 22

Richard Boyer, MD/PhD Candidate, School of Medicine

  • I would like to go over a few requested manuscript edits from a statistical editor. The study is a failure mode analysis of a particular medical infusion pump, where we performed benchtop trials with different flow conditions and evaluated pump performance. I've copied the reviewer comments below and bolded the comments that I would like to discuss.

From the Statistical Editor: When responding to each of these concerns, please display the new text using colored (e.g., red) font. Please do not use Microsoft Word Track Changes in the uploaded manuscript. Although Track Changes can be convenient, when uploaded into Editorial Manager formatting changes and updated field codes will be displayed making it difficult to view new numerical values and impractical to copy and paste text into the subsequent review, if necessary.

I was asked to perform a statistical review of the authors' paper. However, it is unclear to me what findings are intended to be reproducible. The journal requires the use of the STROBE checklist. Please do so. First, clearly separate what are intended to be interesting motivating observations versus what are intended to be science (i.e., reproducible). Second, for each of the findings meant to be reproducible, provide the actual P-value unless P < 0.0001. When P = 0.05, under typical conditions, if a trial were repeated, the probability is only around 50:50 of obtaining P <= 0.05 on the second trial. For P = 0.008, the chance is around 75%. P = 0.0002 gives you a 90% chance. The probability of failure is only around 5% until P < 0.0001. See Goodman SN. A comment on replication, P-values, and evidence. Statistics in Medicine 11:875-879, 1992. Another recent paper on the topic is: Johnson VE. Revised standards for statistical evidence. PNAS 110: 1913-1917, 2013. This is the reason for using the observed P-value itself unless P < 0.0001. Shafer SL and Dexter F had an editorial on the topic, Publication bias, retrospective bias, and reproducibility of significant results in observational studies. Anesthesia & Analgesia 2012;114:931-2."

Clearly indicate what are entirely independent replications and the sequences followed (i.e., randomization). For example, I consider the section P10L20 -L32. "Three Belmont disposable sets were occluded ... Each of the unilaterally blocked disposable sets was evaluated in triplicate trials at 10, 100, and 500 ml/min for 10' at each flow rate without interruption." What was the precise sequence? Does triplicate trials mean 10, 10, 10, 100, 100, 100, 500, 500, 500? Please be clearer in your revision. Incorporate the sequence and the resulting correlations in your statistical analyses.

P9L17 "Calcium effects were evaluated with the addition of IV calcium chloride (1000 mg of 10% CaCl2) to the base fluids in 100 mg increments up to 1000 mg in serial trials in triplicate as described above." Are these entirely separate experiments or the same disposable set and the solution is changed incrementally? How, then, was Figure 6 produced with different flows? Please do not write out a response, simply revise your Methods showing the new text. However, be sure that your statistical analysis uses an appropriate model for the correlations within sets and trials. Referring, for example, to Figure 6, there seem to be 30 different mean values. Does this imply that there were 90 different disposable sets or were there 3 sets and 30 measurements made using each of the 3 sets? If the latter, the correlations need to be taken into account. Such approaches are basic in engineering statistics. This is engineering statistics in that there are so many problems with small numbers of expensive machines, so many tests are run on the same machines, and thus correlations of results are considered.

Consider (simultaneously with multiple comparisons) the adjustment to maintain family-wise error rates. Please DO NOT hesitate to add the statistical model(s), either to the Statistical Methods or to an Appendix, whichever the authors prefer.

It is not clear to me that, once the correlations have been modeled, the authors will be able to test their statistical assumptions. If not possible, then please treat post hoc P < 0.01 as statistically significant.

2015 Jan 15

Jeannine Skinner, PhD, Senior Research Associate, Meharry-Vanderbilt Alliance

  • Discuss protocol for "Cross-sectional and longitudinal effects of vitamin D and pulse pressure on cognitive function". PowerPoint
  • Longitudinal sample of 80+ adults with continuous cognitive test outcomes and vitamin D as the exposure of interest. Would like to discuss analysis and VICTR support.
  • Suggest that vitamin D be included in any analysis as continuous measure (not dichotomized to <20 ng).
  • With longitudinal data, there exists correlation between visits of any one patient that require methods to account for this correlation. One suggestion is a 'response feature analysis' where the multiple measures for one patient may be summarized using one measure (e.g. slope, AUC). Some more advanced methods are more difficult to implement, but they could include mixed effects models or GEE.
  • If there is dropout, it is important to understand whether it is ignorable.
  • Assuming that the data is in good shape, it is reasonable to request a $2000 (35 hour) VICTR voucher for support of statistical analysis for 1 manuscript.
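
A response feature analysis of the kind mentioned above reduces each patient's repeated measures to a single number, so that ordinary methods for independent observations apply. The sketch below uses the least-squares slope as the feature; the patient identifiers, visit times, and scores are invented, and `slope` is a hypothetical helper.

```python
def slope(times, values):
    """Least-squares slope of values on times for one patient."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    sxx = sum((t - mt) ** 2 for t in times)
    sxy = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    return sxy / sxx


# Hypothetical data: patient -> (years since baseline, cognitive test score).
visits = {
    "p1": ([0, 1, 2], [28, 27, 26]),
    "p2": ([0, 1, 2, 3], [30, 30, 29, 29]),
    "p3": ([0, 2], [25, 21]),
}

# One summary value per patient: change in score per year.
slopes = {pid: slope(t, y) for pid, (t, y) in visits.items()}
for pid, s in slopes.items():
    print(f"{pid}: {s:+.2f} points per year")
```

The per-patient slopes could then be related to baseline vitamin D (kept continuous) with a simple regression or rank correlation. Patients with very few visits contribute noisier slopes, which is one reason mixed effects models or GEE may be preferred when feasible.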

Gerasimos (Makis) Bastas, MD, PhD, Assistant Professor of Physical Medicine & Rehabilitation

  • I would like to reserve some time to come for an initial consultation regarding a clinical care project for amputee care. I am planning to apply for VICTR funding soon. I am looking to identify and link meaningful metrics (such as pelvic motion) to amputee walking performance in order to develop a useful tool for clinical assessment.
  • There is a way to perform non-invasive pelvic motion analysis that provides:
    • Cadence
    • Speed
    • Stride length
    • % stride length/height
    • Gait cycle duration
    • Step length for each lower limb
    • Stance phase duration for each lower limb
    • Swing phase duration for each lower limb
    • Single support duration for each lower limb
    • Double support duration
    • Pelvic girdle angles (sagittal, coronal, transverse plane rotations) with estimation of side-to-side symmetry
    • I am looking to also collect data such as Age, Height, weight, Type of amputation (above or below knee), Type of surgical technique (myoplasty/myodesis), Residual limb length, Time since amputation, Need for prosthetic device revisions (how many / year), Activity level (hours of use per day), Level of activity intensity with device (mild, moderate, high-impact), Device composition (more for reference, e.g. type of foot, liner, knee)
  • In having participants perform walking trials, I am looking to establish amputee-specific patterns (means, ranges, and clinically significant threshold values of pelvic motion) and also to perform covariate analyses to see if specific indices can be developed to establish clinically significant metrics (and overall trends) that help monitor performance and suggest timely interventions.
  • Current gold standard is an expensive infrared set-up (instrumented gait motion capture lab), new tech is a sensor on a belt that can measure multiple movements. Want to ascertain whether the data from the tech can be used as a global tool for prosthetic performance. Would like to establish threshold values for "safe", "requires monitoring", "unsafe".
  • This would be an exploratory study with multiple levels of intervention. Possible to have 20 people with multiple measures and control over 1 aspect of limb.
    • Collecting a lot of data makes sense provided the variety of patients, movements and sensor output.
  • Initial aim would be to capture subjective feedback from patients/provider with the sensor output.
  • Difficulty sounds like establishing a gold standard. There are many methods for correlating this sensor output with a gold standard.
    • A different type of study would be to select patients with specific amputation and generate hypothesis surrounding gait measures that require intervention. Randomize to 2 groups -- 1 with device and 1 without device, then determine clinical outcomes.
    • May need to document what constitutes pathologic walking styles.
    • First do a descriptive study of groups of individuals and their movements.
    • Within subjects, are these measures reproducible and reliable?
  • Suggest that VICTR be used as mechanism to generate data for larger research study. How many patients? Hard to answer this question without a testable hypothesis.
    • Recommend collecting equal numbers in each group (i.e. normal, type of amputation). Simplify the data collection as much as possible. Erring on the side of a large amount of data to support exploratory analysis can't hurt if it's feasible.
  • Sample size options:
    • To officially establish a sample size, we could focus on the precision of the confidence intervals to estimate population means for subgroups.
    • If you have 20 patients per parameter, you have good predictive ability. An example of this type of justification is in: Arnold, Donald H., Tebeb Gebretsadik, Karel GM Moons, Frank E. Harrell, and Tina V. Hartert. "Development and Internal Validation of a Pediatric Acute Asthma Prediction Rule for Hospitalization." The Journal of Allergy and Clinical Immunology: In Practice (2014). Meridith will forward this PDF along.
  • A VICTR voucher of at least 90 hours is recommended. We advise working with the VICTR statistician prior to data collection. If analysis for a given study requires more than 90 hours of work, it will be the PI’s responsibility to provide a center number to the department of Biostatistics for the remainder. Bill Dupont, Yuwei Zhang, and Meridith Blevins were in attendance.
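
The precision-based sample size option above can be made concrete with a short calculation. The sketch below assumes a normal approximation with a known standard deviation; the function names and the example SD and margin are hypothetical, and in small subgroups a t-based interval would be somewhat wider.

```python
import math
from statistics import NormalDist


def margin_of_error(sd, n, conf=0.95):
    """Half-width of the confidence interval for a mean (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sd / math.sqrt(n)


def n_for_margin(sd, margin, conf=0.95):
    """Smallest n per subgroup whose CI half-width is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical example: a gait measure with SD 10 units in one subgroup.
for n in (10, 20, 40):
    print(f"n = {n}: mean estimated to within +/- {margin_of_error(10, n):.1f}")
print("n needed for +/- 5:", n_for_margin(sd=10, margin=5))
```

The required n depends on the SD of each measure, so preliminary data or published SDs for the pelvic motion metrics would be needed before fixing a number.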

2015 Jan 8

Fernanda Maruri, Research Coordinator, Infectious Diseases

  • I have a small dataset in which I compared the differences of a continuous variable with another variable that has 3 levels. I wanted to adjust for multiple comparisons so I did a Bonferroni, Benjamini-Hochberg correction, and Dwass, Steel, Critchlow-Fligner Method. I'm interested in knowing which method is more appropriate. I read some of the literature, and it looks like most advanced biostatisticians don't recommend doing multiple comparisons corrections for this kind of analysis, but in the past, when we submitted papers for publication, the reviewers requested multiple comparisons corrections. I would like to come to one of your clinics this week if you have room and talk to someone about this issue.
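
To make the difference between two of the corrections named above concrete, the following sketch computes Bonferroni and Benjamini-Hochberg adjusted p-values for three pairwise comparisons. The p-values are invented and the function names are hypothetical; the Dwass-Steel-Critchlow-Fligner procedure is a rank-based pairwise method and is not shown here.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values stay monotone in the original p-values.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for the three pairwise comparisons among 3 levels.
raw = [0.012, 0.030, 0.20]
print("Bonferroni:        ", [round(p, 3) for p in bonferroni(raw)])
print("Benjamini-Hochberg:", [round(p, 3) for p in benjamini_hochberg(raw)])
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, so the two answer different questions and the choice depends on which error rate the reviewers care about.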

2014 Dec 11

Jessica Wilson, Endocrinology Fellow

  • Retrospective descriptive study about glucagon stimulation test to diagnose growth hormone deficiency
  • Indications: concern about pituitary tumor or suspected growth hormone deficiency; pediatric history to be confirmed as an adult; or post-pituitary surgery
  • Data from adults 2008-2014, n=42; stable endocrinology attending and test used during this period
  • Interested in GH peak vs. nadir-to-peak glucose, sex, ...
  • Sometimes IGF I test used for screening
  • GH < 3 is currently suggested for diagnosis; suggestion that higher BMI should have lower cutoff
  • Blood glucose and hormones taken q30m for 4h
  • Sample includes some diabetics (all controlled)
  • Descriptive statistics - boxplots (including quartiles) will be useful; can also consider extended box plots
  • May be useful to compute Spearman rank correlation coefficients between all variables and calendar time (use year + fraction of a year)
  • Obesity and use of oral estrogen are of interest
  • Can use individual Wilcoxon or Spearman test for association with peak GH release
  • A multivariable model predicting GH may be of interest if a small number of highly clinically relevant predictors can be pre-specified
  • Scatterplots are almost always helpful
  • Correlation coefficients can be useful for getting a rough ranking of strength of association with GH across multiple patient characteristics
  • Sometimes it is interesting to try to correlate lab measurements with things that should not matter, to make sure they don't (e.g., time of day when patient study started)
  • Show "spaghetti plots" of time-response for glucose and GH; plus individual scatterplots (one for each patient) and hysteresis loops

2014 Dec 4

Angela Joanne Weingarten, pediatric cardiology

Project Description: Diagnostic cardiac catheterization in congenital heart disease is essential for accurately assessing hemodynamics in children and adult cardiac patients. The current standard for pressure measurements in congenital heart disease is fluid-filled catheters that transduce pressure in real time. These catheters are introduced to the heart using the Seldinger technique, in which guide wires are used to introduce catheters to the heart through the neck or groin. In our lab, we use catheters that are 4-7 French, which is an external diameter of 1.2-2.3 mm. While these catheters are good for most pressure measurements, there are certain situations when pressure measurements of smaller or stenotic structures are limited by the thickness of the catheter and its pliability. In adult cath labs, a new technology called pressure wires has been developed and is commonly used to measure pressure gradients in diseased coronary arteries. These wires are similar in size and structure to the introducer wires that we use in our lab to introduce catheters to the heart, but have the ability to transduce pressure. The diameter of the wire is 0.36 mm, so the wires are able to traverse stenotic coronary arteries. These devices are FDA approved for adult cardiac cath and have been described in the literature for congenital heart disease in children, but the pressure measurements have not been validated for children. The aim of our project is to validate these pressure wires in children with congenital heart disease by comparison with the standard catheter pressure measurements.
  • Question: help with evaluating the power I would need to validate a device used in the cath lab.
  • Compute mean absolute disagreement between two measurements
  • Make Bland-Altman plot (difference vs. average)
  • Issue of precision, not power
  • Precision (margin of error) gets better as the square root of the number of patients
  • To be able to do a sample size calculation requires preliminary data with differences
  • Or analyze data sequentially until margin of error is satisfactory
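
The agreement summaries listed above (mean absolute disagreement and the Bland-Altman quantities) can be computed as in the sketch below. The paired pressures are invented and `agreement_summary` is a hypothetical helper; the 1.96-SD limits of agreement are the conventional Bland-Altman choice and assume roughly normal differences.

```python
from statistics import mean, stdev


def agreement_summary(wire, catheter):
    """Summaries for paired measurements of the same pressure by two devices."""
    diffs = [w - c for w, c in zip(wire, catheter)]
    avgs = [(w + c) / 2 for w, c in zip(wire, catheter)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "mean_abs_disagreement": mean([abs(d) for d in diffs]),
        "bias": bias,  # mean difference (wire minus catheter)
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "points": list(zip(avgs, diffs)),  # (average, difference) pairs to plot
    }


# Hypothetical paired pressures in mmHg (not study data).
wire = [24, 31, 18, 45, 52, 27]
catheter = [25, 30, 20, 44, 55, 27]

summary = agreement_summary(wire, catheter)
print("Mean absolute disagreement:", round(summary["mean_abs_disagreement"], 2))
print("Bias:", round(summary["bias"], 2))
print("Limits of agreement:", [round(v, 2) for v in summary["limits_of_agreement"]])
```

Plotting `points` (difference on the vertical axis, average on the horizontal axis) gives the Bland-Altman plot; as more patients accrue, the same summaries can be recomputed until the margin of error is satisfactory.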

Paula DeWitt, Center for Biomedical Ethics and Society; Madhu Murphy, pediatric cardiac intensive care unit

Email: We are wanting to test the effectiveness of a “journey board” (see attachment) designed to better prepare parents for their child’s stay and eventual discharge from Vanderbilt’s pediatric cardiac intensive care unit. This will entail giving self-administered surveys with preparedness and satisfaction items to parents of children hospitalized in the pediatric ICU immediately before and immediately after the parent has been exposed to a 15-minute educational intervention using the journey board. The intervention will take place in the child’s hospital room (or another room in the unit) and will consist of a clinician walking the parent through the journey board, and answering any questions the parent may have. Immediately before this, a researcher (not the clinician) will approach the parent, explain the study, and ask if the parent would like to participate. If yes, the parent will be given two short (5-10 minute) self-administered surveys to complete. He/she will be asked to complete one prior to the intervention and one immediately after the intervention. The data will be used to assess the effectiveness of the journey board in preparing parents, and we would like your advice concerning numbers of parents we will need to interview to obtain statistical significance and statistical techniques to be used.
  • Frank's note: Design is confounded with time/fatigue/learning. Also there is little precedent for doing a pre-post study with such little time between pre and post. I think you will need to do a randomized study to attribute any effect to the intervention. Randomize 1/2 of families to get the intervention, 1/2 to get the prevailing treatment, and give survey at the "after" time point for both groups.
  • Discussed individual patient randomization vs. pre-post design (the latter based on calendar time, not pre and post within the same patient)
  • Staffing constraints - some randomized patients will not be assessable because no staff are present that day, but can likely assume this day-of-week effect is random
  • Sample size could be chosen so there is adequate precision or power for the single most important subscale in the parental stressor scale
  • Need a standard deviation of this scale, and a relevant difference on that scale not to miss, or a margin of error in estimating a mean difference between treatment arms
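
Once a standard deviation and a relevant difference (or a target margin of error) are available, the two sample size calculations mentioned above could look roughly like this. It is a sketch using the normal approximation for two equal-sized arms; the function names and the numbers in the example are hypothetical, not values from the parental stressor scale.

```python
import math
from statistics import NormalDist

def n_per_arm_for_power(sd, delta, alpha=0.05, power=0.9):
    """Per-arm n to detect a mean difference `delta` (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)

def n_per_arm_for_margin(sd, margin, conf=0.95):
    """Per-arm n so the CI for the difference in means has half-width `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(2 * (z * sd / margin) ** 2)

# Hypothetical inputs: SD of 10 points, difference not to miss of 5 points.
print(n_per_arm_for_power(sd=10, delta=5))
print(n_per_arm_for_margin(sd=10, margin=5))
```

The precision version needs fewer families than the power version for the same number of points because it only fixes the width of the interval, not the chance of detecting a specific difference.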

Ryan Delahanty, Epidemiology

  • ICD9 codes can be independent or dependent variables
  • 14,000 patient database; 500k total codes
  • Interested in precursor codes
  • Example question: using previous ICD9 codes to predict later readmission
  • Discussed two data reduction methods: AHRQ-type diagnostic groupings of ICD9 codes and projecting the codes onto UHC expected mortality (Dan Byrne)
  • May want to consider lab data, number of previous admissions, etc., and don't forget age (spline to account for nonlinearity); watch for importance of "present on admission" ICD9 flags
 

2014 Nov 20

Tracy Marien - Endourology and Laparoscopic Surgery

Added:
>
>
  My research is in regards to stone composition in the US and the correlation between various compositions and age, gender, and geography.
Added:
>
>
Large dataset from lab.
  • 100,000 patients in csv file, one row per patient with a cell that has mixed information for each component per type per patient (type = % vs. absolute)
  • Interested to know dominant components, do older patients have different stone types, are there geographical differences?
  • Location: city and state, zip code
  • Suggest applying for a standard VICTR voucher, time required 35 hours ($2000)

Kate Clouse, VIGH

Question about a power calculation in a study of pregnant women. Want to screen N pregnant women and estimate a strep point prevalence (expected range 0.05 - 0.25), then want power to detect differences in detection between 2 assays. Want to be able to detect an absolute difference of 0.1 or more. The two assays are done on the same samples, so the data are matched. Hope for discrepancy <= 0.05. The culture is considered the gold standard; the newer one is a rapid assay.
  • Estimate disagreement proportions
  • Estimate directional agreement proportions (sensitivity, specificity)
  • Oversimplification: what is the sample size N such that one can estimate the probability of disagreement to within a margin of error of +/- 0.05. Answer: N=380. N=1530 pairs to get a margin of error of +/- 0.025.
  • To estimate the sample size needed for sensitivity and specificity, need to know the proportion culture positive
  • Kappa statistics will also be of interest (chance-corrected agreement proportions; e.g., if both assays have a true positive probability of 0.9 and the samples were independent, both would be positive 0.81 of the time, and the assays would agree 0.82 of the time, just by chance)
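
A rough sketch of the two calculations above: the sample size for a given margin of error on the disagreement probability (worst case p = 0.5), and Cohen's kappa from a 2x2 table of paired assay results. Function and argument names are illustrative assumptions.

```python
import math

def n_for_proportion_margin(margin, p=0.5, z=1.96):
    """Pairs needed to estimate a proportion to within +/- margin (0.95 confidence)."""
    return math.ceil(z * z * p * (1 - p) / margin ** 2)

def cohen_kappa(both_pos, culture_only, rapid_only, both_neg):
    """Chance-corrected agreement between culture and rapid assay on matched samples."""
    n = both_pos + culture_only + rapid_only + both_neg
    observed = (both_pos + both_neg) / n
    p_culture = (both_pos + culture_only) / n
    p_rapid = (both_pos + rapid_only) / n
    # Agreement expected if the two assays were independent.
    chance = p_culture * p_rapid + (1 - p_culture) * (1 - p_rapid)
    return (observed - chance) / (1 - chance)

print(n_for_proportion_margin(0.05))   # close to the rounded N=380 quoted above
print(n_for_proportion_margin(0.025))  # close to the rounded N=1530 quoted above
```

A table in which both assays are positive 90% of the time but independently of each other (for example 81, 9, 9, 1 out of 100) gives a kappa of zero even though raw agreement is high.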

Meridith Blevins

Reviewing methods of a colleague's paper on sexual behavior in men. Goal is to create a sexual risk (for HIV) score based on 11 behaviours. So far a lot of mysterious univariable analysis has been done. Ordinal predictors currently being treated as polytomous. One general solution is to treat the predictors as polytomous but put a "successive category" penalty on regression coefficients connected to the same question. Hans van Houwelingen had a paper on this in Stat in Med.
  • Build a "best" model (e.g. using penalization) then ...
  • Do model approximation (sometimes called pre-conditioning in the literature) where you use backwards step-down regression predicting the predicted values from the "best" model to develop an easier-to-use sub-model; how many variables can be dropped before the approximation accuracy (R^2) drops below 0.95?
  • Maybe also entertain keeping the variables as linear in the model then at the last step using all the dummy variables for the remaining questions
 

2014 Nov 13

Schola Nwachukwu - Endocrinology

Added:
>
>
  My team and I need some advice on a research project we are working on. Our project is biomedical research that uses electronic medical records with associated genomic data as a tool for discovery in novel diabetes pathways. We are looking at type 2 diabetics with extreme phenotypes of glucose and lipids. We currently have data on triglycerides:HDL ratio which we have plotted into a histogram. We are hoping to get advice on the best sample size of extreme phenotypes to use based on the data we have.
  • Need to know minor allele frequencies
  • Suggest computing the total sample size n then genotyping the lowest n/2 ratio patients and the highest n/2 ratios; no need to solve for a cutoff
Line: 22 to 216
 
  • VALID cohort captures all VUMC ICUs 2006-2013
  • Use a multivariable propensity model as a descriptive tool to understand treatment selection
    • Main reason to use propensity score analysis is that the number of potential covariates is too large in relation to the effective sample size
    • If <= 15 clinically potentially interesting covariates, ordinary covariate adjustment may be fine; need to include all known reasons that steroids may be used
 
  • n=1000, 410 outcomes so effective sample size is good except for there being 130 on steroids
  • Dose-response curve is of interest; prednisone dose equivalents have been calculated
    • Regression spline in cube root of dose eq. may be worth trying
Line: 39 to 234
 

Brief project summary:
* A brief previously studied ultrasound protocol would be conducted at the bedside of patients on ventilation for <48hrs
  * The scan would sort patients in one of seven "diagnosis" bins
  * The standard of comparison would be the documented chart diagnosis of the cause for their respiratory failure
  * The goal would be to calculate sensitivities and specificities for each "diagnosis" bin
Line: 1518 to 1714
 
  • Consider redundancy of some variables.
    • Ex: Weight in lbs and kgs are perfectly correlated and it would not make sense to put both in a model
    • Are there some variables which could be considered highly correlated?
Added:
>
>

META FILEATTACHMENT attachment="Skinner_biostat_pptabb.ppt" attr="h" comment="" date="1421090173" name="Skinner_biostat_pptabb.ppt" path="Skinner_biostat pptabb.ppt" size="507904" user="MeridithBlevins" version="1"
Revision 190
Changes from r170 to r190
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Added:
>
>

2014 Nov 20

Tracy Marien - Endourology and Laparoscopic Surgery

My research is in regards to stone composition in the US and the correlation between various compositions and age, gender, and geography.

2014 Nov 13

Schola Nwachukwu - Endocrinology

My team and I need some advice on a research project we are working on. Our project is biomedical research that uses electronic medical records with associated genomic data as a tool for discovery in novel diabetes pathways. We are looking at type 2 diabetics with extreme phenotypes of glucose and lipids. We currently have data on triglycerides:HDL ratio which we have plotted into a histogram. We are hoping to get advice on the best sample size of extreme phenotypes to use based on the data we have.
  • Need to know minor allele frequencies
  • Suggest computing the total sample size n then genotyping the lowest n/2 ratio patients and the highest n/2 ratios; no need to solve for a cutoff
  • May need to do an analysis to show that triglycerides and HDL are irrelevant for this purpose once you know the ratio
  • Best to plot ratios using a log axis
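
The selection rule suggested above (genotype the lowest n/2 and highest n/2 ratios, with no cutoff to solve for) can be sketched as follows. The patient IDs, ratios, and function name are hypothetical.

```python
import math

def pick_extremes(ratios, n):
    """ratios: dict of patient_id -> TG:HDL ratio; n: total number to genotype."""
    if n % 2 or n > len(ratios):
        raise ValueError("n must be even and no larger than the cohort")
    half = n // 2
    ranked = sorted(ratios, key=ratios.get)  # patient IDs ordered by ratio
    return ranked[:half], ranked[-half:]

# Hypothetical ratios (illustration only).
ratios = {"a": 1.0, "b": 2.5, "c": 0.4, "d": 9.0, "e": 3.0, "f": 6.0}
low, high = pick_extremes(ratios, 4)

# Ratios are usually easier to inspect on a log scale.
log_ratios = {pid: math.log10(r) for pid, r in ratios.items()}
print(low, high)
```

Sorting and slicing makes the cutoff implicit: it is whatever ratio separates the n/2-th patient from the next one.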

Erin McGuinn, Matt Semler, General Internal Medicine Division

I will be coming to biostats clinic tomorrow at noon to discuss data analysis for a project I am doing with the VALID database regarding chronic glucocorticoid use and risk of ARDS. Setting is sepsis.
  • Look at steroid use pre-admission
  • Y=ARDS in first 96h; secondary in-hospital mortality
  • Confounding by indication especially severity of illness
  • VALID cohort captures all VUMC ICUs 2006-2013
  • Use a multivariable propensity model as a descriptive tool to understand treatment selection
    • Main reason to use propensity score analysis is that the number of potential covariates is too large in relation to the effective sample size
    • If <= 15 clinically potentially interesting covariates, ordinary covariate adjustment may be fine; need to include all known reasons that steroids may be used
  • n=1000, 410 outcomes so effective sample size is good except for there being 130 on steroids
  • Dose-response curve is of interest; prednisone dose equivalents have been calculated
    • Regression spline in cube root of dose eq. may be worth trying
    • Steroids may be long-term vs. short-term; a secondary analysis could include an interaction between dose and type
    • If dose eq. distribution varies greatly by short/long, there is great difficulty in figuring out where to put knots on spline functions; ordinary polynomial in cube root of dose eq. has worked well in other situations (quadratic probably, maybe cubic)
  • How to handle interplay between ARDS and death? Union the two outcomes? Or create an ordinal outcome scale (0, 1 (ARDS), 2 (death))

2014 Oct 30

Susan Eagle, Anesthesiology

Nick Salterelli, Emergency Medicine

I am working with two emergency physicians to build a clinical study to be performed in the ICU, so it looks like Wednesdays make the most sense, but any day is fine.

Brief project summary:

* A brief previously studied ultrasound protocol would be conducted at the bedside of patients on ventilation for <48hrs
* The scan would sort patients in one of seven "diagnosis" bins
* The standard of comparison would be the documented chart diagnosis of the cause for their respiratory failure
* The goal would be to calculate sensitivities and specificities for each "diagnosis" bin

The above would be the minimum acceptable analysis. I'd like to make further comparisons, but realize adding additional complexity may be limited by my ability to acquire an adequate sample size. In order to move forward, I was hoping for some help understanding how much larger sample sizes would need to be for the following comparisons:

1. The above protocol being completed by two groups of operators, and comparing their performance
2. The above protocol being completed with two different ultrasound devices, and comparing their performance
3. Combining 1 & 2 to compare both different operators and devices

Additional questions:

* Is multiple operators performing the protocol on one patient a valid way to increase sample size for these questions?

I will have funds available through the medical school student research program for statistical support down the line. Additionally, I plan to apply for a VICTR voucher once our protocol is complete. For now, hoping to get some of these questions answered so I can move forward in planning for funding/device acquisition/department approval/etc.
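
For the minimum analysis described in the email (sensitivity and specificity for each "diagnosis" bin against the chart diagnosis), one simple approach is to treat each bin as a one-vs-rest comparison. The sketch below assumes that approach; the bin labels and function name are hypothetical.

```python
def per_bin_sens_spec(scan, chart):
    """One-vs-rest sensitivity/specificity per bin; chart diagnosis is the reference."""
    results = {}
    for b in sorted(set(scan) | set(chart)):
        pairs = list(zip(scan, chart))
        tp = sum(s == b and c == b for s, c in pairs)
        fn = sum(s != b and c == b for s, c in pairs)
        fp = sum(s == b and c != b for s, c in pairs)
        tn = sum(s != b and c != b for s, c in pairs)
        results[b] = {
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "specificity": tn / (tn + fp) if tn + fp else None,
            "n_with_diagnosis": tp + fn,  # denominator that limits precision
        }
    return results

# Hypothetical labels for six patients (illustration only).
scan = ["A", "A", "B", "C", "B", "A"]
chart = ["A", "B", "B", "C", "B", "C"]
for bin_name, stats in per_bin_sens_spec(scan, chart).items():
    print(bin_name, stats)
```

The `n_with_diagnosis` count matters for planning: with seven bins, the rarest diagnosis determines how precisely its sensitivity can be estimated.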

2014 Oct 23

Christopher Brown, MD

My project is a randomized trial of once-daily versus twice-daily labs for patients who are being actively diuresed for congestive heart failure. My question is the best way to randomize the patients given the complexity of ordering the labs for the patients. More specifically, my question is: can I randomize based on the team they're placed on? Meaning team A does once-daily labs and team B does twice-daily labs and patients are randomly assigned to each team; does that count as randomization, or is en bloc sealed envelope a better methodology? Also hoping to determine the number of patients needed to achieve the appropriate power to detect a difference in outcomes.
Email from Robert Greevy:
 I often attend the Thursday clinic and I will tomorrow if I'm available. I always ask dozens of questions, and I've noted some I would likely ask below.
1) What is your primary outcome?
2) How could the twice-daily impact the outcome differently than once-daily?
3) What is the expectation of the outcome under the status quo, e.g. median time to some event, 30 day M&M rate, etc?
4) What is the smallest clinically meaningful effect size that you would like to be able to detect for that outcome?
5) If randomizing at the patient level, how well do you estimate the protocols would be adhered to, e.g. would we be lucky if half the patients actually got the number of labs they were assigned to get?
6) If randomizing at the team level, how many teams could be randomized? Are there alternatives to randomizing teams, e.g. randomizing study days such that the team on that particular day will follow a randomly assigned protocol?
7) If randomizing at some sort of cluster level, how well do you estimate the protocols would be adhered to?
8) What are the potential sources of bias to worry about, e.g. team quality, season, week day, etc.?
Your question is essentially how much do I need to randomize, and the answer depends on who you need to persuade and what are the limitations of your setting. Would randomizing groups A and B qualify as having done a randomized study? No, that study design would essentially require just one coin toss. If bias exists between the teams, e.g. one team provides better care, then the study will be biased.
That said, the alternative of randomizing at the patient level may not be preferable. If adherence to the randomized assignment would be poor, the study would require complex analysis and still not be very persuasive. The 2006 SPORT trial is a nice example: http://jama.jamanetwork.com/article.aspx?articleid=204281 and http://onlinelibrary.wiley.com/doi/10.1002/sim.4510/full .
I suspect there may be an in-between design, such as randomly assigning the intervention based on study day or something in that vein, that would strike the best balance between logistics/adherence and randomization helping to control for unobservable sources of bias.
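
As one illustration of the in-between design mentioned in the email, study days (rather than patients or teams) could be assigned to a protocol in randomly permuted blocks, so the two protocols stay balanced over calendar time. This is only a sketch; the block size, seed, and function name are assumptions.

```python
import random

def randomize_days(n_days, block_size=4, seed=2014):
    """Assign each study day to a lab protocol using permuted blocks."""
    if block_size % 2:
        raise ValueError("block_size must be even")
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = []
    while len(schedule) < n_days:
        block = ["once-daily"] * (block_size // 2) + ["twice-daily"] * (block_size // 2)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_days]

schedule = randomize_days(28)
for day, protocol in enumerate(schedule[:8], start=1):
    print(day, protocol)
```

Blocking keeps season and weekday effects from lining up with one protocol by chance. The analysis would still need to treat the day, not the patient, as the unit of randomization.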

2014 Oct 16

Mary DeAgostino, MPH student

I am working on my analysis plan for my MPH thesis project surrounding sex differences within the NUSTART study results, and would really appreciate some help in the structure for the analysis.
  • Sex differences in body composition following ART initiation in HIV-infected adults. Data is longitudinal with baseline, 6 weeks, 12 weeks. Intervention is nutritional supplement. N=1800, subset=900 with CRP (C-reactive Protein, marker for inflammation).
  • Outcomes: Fat Mass/Lean Mass, Upper Arm, Leg circumferences, etc. (Body Anthropometrics)
  • Exposure is time on ART and grouping by sex?
  • Control for age, CD4, etc., is there a difference between men/women in gains following ART?
  • Consider CRP in the above question by sex as well.
  • Response Feature Analysis: Take a biologically plausible summary of repeated measures (e.g. AUC, slope). This takes you back to 1 observation per patient. For fat mass gain, then AUC could capture the extent to which fat mass increases over time. If you expect a linear relationship between treatment and fat mass gain, then you could get the slope of fat mass for all visits and use this summary measure. If response feature is slope, can take logarithm before modeling.
  • Alternatively, some more "fancy" methods take correlation into account. Do both the response feature analysis and a mixed effects model, and check that the results agree.
    • A GEE model is a good method that uses the Huber-White sandwich estimator to adjust for correlation.
    • Repeated measures analysis with random slopes and random intercepts (mixed model).
    • For continuous data, generalized least squares takes into account the correlation between repeated measures on the same patient.
  • Use of multiple imputation: retains records with randomly missing data; probably wouldn't impute the main variables of interest (i.e., outcome and exposure)
  • Effect modifiers: put in an interaction term.
  • In limitations, acknowledge differences in completers/noncompleters.
  • Make sure that you look at the data to make sure the models aren't "screwy". Try spaghetti plots. Or draw spaghetti plots of subset of patients from percentiles.
  • For VICTR, number of hours would be 25 hours.
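
The response feature approach described above reduces each patient's repeated measures to one number. A minimal sketch using the per-patient least-squares slope is shown here; the row layout, names, and values are hypothetical.

```python
def ols_slope(times, values):
    """Least-squares slope of values on times (needs at least two distinct times)."""
    n = len(times)
    tbar, ybar = sum(times) / n, sum(values) / n
    sxx = sum((t - tbar) ** 2 for t in times)
    sxy = sum((t - tbar) * (y - ybar) for t, y in zip(times, values))
    return sxy / sxx

def slope_per_patient(long_rows):
    """long_rows: iterable of (patient_id, week, fat_mass) in long format."""
    by_patient = {}
    for pid, week, y in long_rows:
        by_patient.setdefault(pid, []).append((week, y))
    slopes = {}
    for pid, obs in by_patient.items():
        if len(obs) >= 2:  # patients with a single visit have no slope
            weeks, ys = zip(*sorted(obs))
            slopes[pid] = ols_slope(weeks, ys)
    return slopes

# Hypothetical fat mass (kg) at baseline, 6 and 12 weeks (illustration only).
rows = [("p1", 0, 10.0), ("p1", 6, 11.0), ("p1", 12, 12.0),
        ("p2", 0, 8.0), ("p2", 12, 8.6),
        ("p3", 0, 9.0)]
slopes = slope_per_patient(rows)
print(slopes)
```

With one slope per patient, the comparison between men and women becomes an ordinary regression on a single outcome. Patients dropped for having only one visit are part of the completer/noncompleter difference to acknowledge in the limitations.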

2014 Oct 9

Matthew Kolek, M.D. Vanderbilt Heart and Vascular Institute

I would like to reserve a spot for biostats clinic Thursday. I have 3 questions concerning my dataset: 1) best analysis 2) interim power calculation 3) quote for stats support for final analysis

I’m doing a VICTR-supported pharmacogenetic study to see if genetic variants modulate how patients with atrial fibrillation respond to beta-blockers. I’ve studied 31 patients so far and am in the process of asking VICTR for more funds. I would like to present VICTR with an interim power calculation. I would also like to ask VICTR for funding for stats support.
  • There will be about 80 subjects in total. Heart rate will be measured before and after taking atenolol.
  • Primary outcome is the heart rate after treatment, which is measured every minute. Since the total time for each patient is different, can calculate the adjusted area under the curve and then take the difference between before and after.
  • Imputation is not suggested because the data are not missing at random.
  • Consider doing a longitudinal analysis using generalized least squares using all the data.
  • Need to apply for a VICTR voucher for statistical analysis; suggest $4000 including manuscript preparation
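
One reading of the "adjusted area under the curve" above is the trapezoidal area divided by the length of the recording, which gives a time-averaged heart rate that is comparable across patients with different recording times. The sketch below assumes that interpretation; the readings are hypothetical.

```python
def time_averaged_auc(minutes, heart_rates):
    """Trapezoidal AUC divided by recording duration (a time-weighted mean)."""
    pts = list(zip(minutes, heart_rates))
    area = sum((t1 - t0) * (y0 + y1) / 2 for (t0, y0), (t1, y1) in zip(pts, pts[1:]))
    return area / (minutes[-1] - minutes[0])

# Hypothetical recordings for one patient (beats per minute, illustration only).
before = time_averaged_auc([0, 1, 2, 3], [100, 110, 105, 95])
after = time_averaged_auc([0, 1, 2, 3, 4, 5], [80, 82, 79, 81, 80, 78])
print(after - before)  # per-patient change, one number per patient
```

Dividing by duration is what makes a 3-minute and a 5-minute recording comparable; the raw areas would differ just because one recording is longer.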

Jo Ellen Wilson, Psychiatry

2014 Sep 18

Jo Ellen Wilson, Psychiatry

Jo Ellen would like help developing an analysis plan for her VICTR proposal concerning:

Brief Introduction and Background:

Delirium, a syndrome of acute brain dysfunction, is routinely screened for and recognized in intensive care unit (ICU) patients. Catatonia, a neuropsychiatric phenomenon characterized by psychomotor changes, can appear clinically indistinct from delirium in some patient settings, yet is not routinely screened for in the ICU setting. This study seeks to explore the relationship between delirium and catatonia, the extent to which an overlap syndrome exists, and the extent to which this overlap syndrome is clinically relevant.

Study Aims:
  1. To determine the degree of overlap between diagnostic criteria for delirium and catatonia in medical and surgical ICU patients. We hypothesize that delirium and catatonia will occur as an overlap syndrome in the critically ill population, such that those who meet delirium criteria will also frequently display signs of catatonia that are missed in routine practice. We will test this hypothesis by screening medical and surgical ICU patients on mechanical ventilation or in shock who have consented to the MIND-USA or MENDS-II studies and enrolling them into the D-Cat (delirium and catatonia) study.
  2. To determine if patients with delirium manifest more catatonic signs than those without delirium. We hypothesize that patients who are delirious (CAM+) will manifest higher Bush-Francis Catatonia Rating Scale scores than those who are not delirious (CAM-). To test this hypothesis, we will have each patient enrolled into the D-Cat study evaluated twice daily while in the ICU and once daily on the general medical ward by a reference rater (performing the CAM) and C&L Consultant Psychiatrist (who will perform the BFCRS).
  3. To understand the clinical relevance of the co-existence of delirium and catatonia as an overlap syndrome to clinical outcomes including length of stay, survival, and long-term cognitive impairment. We hypothesize that patients with delirium and catatonia (D+C+) will have a longer hospitalization, worse survival, and more severe decrements in long-term cognitive impairment than those without this overlap syndrome. To test this hypothesis, we will monitor length of stay, 3 and 12 month survival, and a battery of neuropsychological tests on all patients enrolled into the D-Cat study.
  4. To determine if patients with delirium who are on benzodiazepines experience fewer catatonic signs than those who are experiencing delirium in the absence of having received benzodiazepines. We hypothesize that the use of benzodiazepines in ICU patients and its differential effect on delirium vs. catatonia (i.e., benzodiazepines are thought to worsen delirium but are used as a treatment for catatonia) will serve as a means of better understanding the differences between and phenomenology of delirium and catatonia. To test this hypothesis, we will study the performance on the CAM and Bush-Francis Catatonia Rating Scale scores (see Aim 2) in light of benzodiazepine exposure in each patient and across the population.
  5. To determine if patients with a hyperactive delirium and catatonia experience an increase in the number of catatonic signs after starting an antipsychotic medication. We hypothesize that the use of antipsychotics in a subset of ICU patients who have a hyperactive delirium and (excited) catatonia will experience a worsening of their catatonic signs as compared to those with hypoactive delirium (i.e., antipsychotics are thought to improve delirium but anecdotally are thought to worsen an excited catatonia). As a complement to Aim 4, this will serve as a means of understanding better the differences between and phenomenology of delirium and catatonia. To test this hypothesis, we will study the performance on the CAM and Bush-Francis Catatonia Rating Scale scores (see Aim 2) in light of antipsychotic exposure in each patient and across the population.

  • Need to consider censoring due to death
  • Would be better to treat D and C always as ordinal severity levels rather than dichotomous
    • Can envision this as a scatterplot
    • Can phrase one of the questions as "What is the distribution of C for each level of D?", and can address synergy using continuous variable x continuous variable interaction assessment
  • Beware of confounding by indication: drug usage depends on severity of symptoms
  • Can study drug usage as a function of the joint levels of D and C (i.e., reverse modeling) to understand the impact of drug usage jointly on D and C
    • i.e., regression model predicting drug exposure from pre-drug D and C levels and from their interaction
  • What other variables should be in LOS, cognitive outcomes, and survival models? It will probably be important to adjust for prognostic factors (e.g. for survival, the acute physiology score of APACHE II or III); perhaps also consider number of days already in the ICU before the assessment began
  • Sample size minimum is 96 (number needed to estimate a single proportion with +/- 0.1 margin of error); will need more for associations; n=200 is perhaps OK
  • Pick most important aim to guide sample size
  • How to deal with longitudinal data?
    • For the purpose of relating C to D sometimes it is OK to pool all available days as if you had more subjects
      • For getting a confidence interval for the correlation coefficient (e.g. Spearman rho) can use the cluster bootstrap
      • N=200 yields a margin of error of +/- 0.14 in estimating a correlation coefficient
    • A summary statistic approach may come into play (e.g., per-patient area under the curve or average response; per-patient slope)
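
The cluster bootstrap mentioned above resamples whole patients, with all of their daily assessments, so the confidence interval for Spearman's rho respects within-patient correlation. This is a rough stdlib-only sketch with made-up daily scores. The helper names are hypothetical, and the margin-of-error helper uses the Fisher z approximation at a true correlation near zero, which is in line with the +/- 0.14 quoted for N=200.

```python
import math
import random

def ranks(xs):
    """Average ranks (1-based), with ties sharing the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None  # undefined when one variable is constant
    return sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / math.sqrt(sxx * syy)

def cluster_bootstrap_ci(days_by_patient, n_boot=1000, seed=1):
    """Percentile 0.95 CI for rho, resampling patients (clusters) with replacement."""
    rng = random.Random(seed)
    ids = list(days_by_patient)
    stats = []
    for _ in range(n_boot):
        sample = [pair for pid in rng.choices(ids, k=len(ids))
                  for pair in days_by_patient[pid]]
        d, c = zip(*sample)
        rho = spearman(d, c)
        if rho is not None:
            stats.append(rho)
    stats.sort()
    m = len(stats)
    return stats[int(0.025 * m)], stats[min(m - 1, int(0.975 * m))]

def approx_margin_for_correlation(n):
    """Approximate 0.95 margin of error for a correlation near zero (Fisher z)."""
    return math.tanh(1.96 / math.sqrt(n - 3))

# Made-up (delirium, catatonia) severity scores: 30 patients, 4 days each.
days = {pid: [(pid % 5 + k % 2, pid % 5 + (k + pid) % 3) for k in range(4)]
        for pid in range(30)}
print(cluster_bootstrap_ci(days, n_boot=200))
```

Resampling individual days instead of patients would treat the pooled days as independent and would tend to give intervals that are too narrow.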
 

2014 Sep 11

Michael O'Connor and Don Arnold

Line: 10 to 153
 
  1. Best statistical test for inter-rater reliability for this dataset (kappa?).
  2. Is it statistically ok to calculate inter-rater reliability for the complete AAIRS and for the individual components of the score?
  3. We’re not doing a sample size calculation, at least for the pre-video time period. Should we do one for the post-video?
Added:
>
>

Joseph Conrad, Wright Research Group, Vanderbilt University

  • Email: Would like to discuss more statistical analysis of flow cytometry (last discussed on May 22)
 

2014 Sep 4

Rifat Wahab, Department of Radiology

  • Email: Dr. Spalluto and I are currently conducting a survey at the One Hundred Oaks Breast Center. The survey is in regards to whether patients have a gender preference with the person performing their diagnostic mammograms and biopsies. Additionally, the study has been IRB approved. We need assistance from the Biostatistics department on how to best organize our results. Also, could you please help us determine how many surveys would we need to obtain to have statistically significant data?
Added:
>
>
  • Main question: What makes these patients comfortable, i.e. what are patient preferences?
  • Surveys filled out by patients before imaging procedure; anonymous
  • Started survey last month (on paper); not for women coming for screening exam
  • Variance in how survey is requested and how long in waiting room
  • Need to get an accurate count of the total number of women who came for the diagnostic test, and compare demographics of those doing the survey with overall demographics of women at the clinic (diagnostics only)
  • Regarding sample size, focus on precision. For a yes/no male/female question the margin of error (1/2 the width of the 0.95 confidence interval) goes down as the square root of the number of respondents. For n=96 the margin of error is 0.1 if the true probability is 0.5. For n=200 the margin of error is 0.07. For n=300 it is 0.06. For n=500 it is +/- 0.04.
    • When stratifying by race or education the denominators are lower and hence the margins of error are higher
    • Margins of error for measures of association are higher
  • One approach to sizing the study is to determine the fraction of subjects that are in the smallest category for which you really want to have an estimate of a probability and then make sure there are enough (e.g., 96) in that category
    • Or just compute the overall n that will achieve an adequate overall margin of error for estimating the prevalence of one category
  • If there is a desire to look at associations between two or more variables, it is best to completely pre-specify the associations to be tested, before looking at the data
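
The margins of error quoted above follow from the usual normal approximation for a proportion, taken at the worst case p = 0.5. A short sketch, with hypothetical function names, that reproduces them and sizes the study by its smallest category of interest:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the 0.95 confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def total_n_for_smallest_category(fraction, n_in_category=96):
    """Overall n so the smallest category of interest still has enough respondents."""
    return math.ceil(n_in_category / fraction)

for n in (96, 200, 300, 500):
    print(n, round(margin_of_error(n), 2))

# Hypothetical: the smallest subgroup of interest is a quarter of respondents.
print(total_n_for_smallest_category(0.25))
```

Going from n=96 to n=500 only cuts the margin of error from about 0.10 to about 0.04, which is the square-root law at work.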
 

Jejo Koola, VA Medical Informatics

  • Email: I would like assistance with a basic sample size question. Experimental design involves subjects evaluating a scenario and producing a yes/no answer. Outcome measure is % correct for each subject. There will be two cohorts, and each subject will analyze XXX scenarios. I need help figuring out the optimal combination of subjects and scenarios.
Added:
>
>
  • Visualization study for high dimensional data
  • Provider decisions using usual approach (having to open up and examine electronic medical record) vs. visualization
  • Attendings/residents from various specialities; disease=cirrhosis of liver
  • Visualization could provide uncertainty around the risk estimate and the applicability of the risk estimate to the current cohort
  • Suggested nailing down a clinical outcome measure before considering power
  • Debate in the literature on whether to present uncertainty measures to clinicians
  • Categorization of physician predictions doesn't help
  • See http://biostat.mc.vanderbilt.edu/wiki/pub/Main/FHHandouts/apha.pdf and Dialog.png
 

2014 Aug 28

NO CLINIC: Roundtable Discussion with Dr. Karl Moons

2014 Aug 21

Revision 170
Changes from r150 to r170
Line: 1 to 1
 
META TOPICPARENT name="GcrcCY"

Data and Analysis for Clinical and Health Research Clinic

Added:
>
>

2014 Sep 11

Michael O'Connor and Don Arnold

This project is ongoing and was discussed with Ben Saville during study design; we will now request VICTR funding for biostats support as we complete the project. We do not have any data yet to discuss, but our study aims are listed at the bottom of this email. Our main statistical questions are as follows:
  1. Best statistical test for inter-rater reliability for this dataset (kappa?).
  2. Is it statistically ok to calculate inter-rater reliability for the complete AAIRS and for the individual components of the score?
  3. We’re not doing a sample size calculation, at least for the pre-video time period. Should we do one for the post-video?

2014 Sep 4

Rifat Wahab, Department of Radiology

  • Email: Dr. Spalluto and I are currently conducting a survey at the One Hundred Oaks Breast Center. The survey is in regards to whether patients have a gender preference with the person performing their diagnostic mammograms and biopsies. Additionally, the study has been IRB approved. We need assistance from the Biostatistics department on how to best organize our results. Also, could you please help us determine how many surveys would we need to obtain to have statistically significant data?

Jejo Koola, VA Medical Informatics

  • Email: I would like assistance with a basic sample size question. Experimental design involves subjects evaluating a scenario and producing a yes/no answer. Outcome measure is % correct for each subject. There will be two cohorts, and each subject will analyze XXX scenarios. I need help figuring out the optimal combination of subjects and scenarios.

2014 Aug 28

NO CLINIC: Roundtable Discussion with Dr. Karl Moons

2014 Aug 21

L. Tyson Heller and Todd Rice, Internal Medicine

  • We would like to discuss a study on the use of Intraosseous Catheters during code situations for the establishment of central venous access.

2014 Aug 14

Reyna L. Gordon, Otolaryngology & VKC (Peabody)

  • Email: I am designing a study for an R03 proposal in which I'd like to use either Structural Equation Modeling or Hierarchical Regression, and I need help conducting a power analysis.
  • Correlation between musical rhythm and grammar in kids. Want to explain the mechanism behind this relationship in R03 proposal.
  • For the Stat Analysis Plan: Preliminary work will look at validity of grammar measures and how to combine them for one outcome.
    • Reference for mediation analysis: MacKinnon, David P., Amanda J. Fairchild, and Matthew S. Fritz. "Mediation analysis." Annual review of psychology 58 (2007): 593.
  • For the Sample Size: Instead of calculating for effect size, this study will need to give consideration to the large number of covariates being considered in whatever model (SEM, regression, HLM) in order to prevent overfitting and have reproducibility.
    • Why? powering to detect a correlation of 0.7 will give much smaller sample size than is needed to fit these large models
    • Discussed limiting sample size based on response type. For a continuous outcome, if the total number of parameters is p (fitted covariates), then the sample size you need for reproducibility is in the range defined by [10*p,20*p].
      • Reference table 4.1: Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis

2014 Aug 7

Jessica Pippen

  • I am working on a RCT investigating patient satisfaction with interval vs immediate postpartum IUD insertion. Attached is a draft of the protocol.

2014 July 31

Jo Ellen Wilson, Psychiatry

  • I am working on a VICTR application for funds for biostat support, and I'll need to include in my application the proposed length of time for analysis (cost, etc).
  • Study on Catatonia, and its association with delirium. Total N=62 patients.
  • Original estimate of $5000 for aim 1 and aim 2. Suggest $9000 for all five aims (need cost share of $3500).

2014 July 24

Daniel Croymans, Vanderbilt Health & Wellness Intern

  • We are analyzing health risk assessment data on 20K subjects over multiple collection years. I would like to attend clinic on 7/24 to discuss the following topics.
  • Errors with regression models (GBM): solved by creating a separate dataset that includes only subjects with dependent-variable data. It would be nice not to have to make a subset dataset every time. If this is the only possibility, can we go over how to create it automatically in R? It is currently done manually by sorting in Excel:

gbm_1 <- gbm(JobProductivityY11 ~ ., distribution = "poisson", shrinkage = 0.00001, cv.folds = 10, data = vandy)

Error in checkForRemoteErrors(val) : 8 nodes produced errors; first error: Missing values are not allowed in the response

  • GLM Diagnostics – others listed in attached R code.
    • # Print ANODEV table:

Anova(glm_y10g1,type="III")

exp.chi<-data.frame(data.frame(Anova(glm_y10g1,type="III"))[,1])

vandy_glm1 <- ddply(vandy, vandy$MedicalCareSickDaysY10)

BMI_calculatedY10, AgeCalcPSY10_AsOf01012013_Y10, GenderPSCodeY10, EthnicGroupAbbrY10, EducationLevelY10, FLSAStatusRecodeY10, SubstanceUseSmokingStatusY10)

rand.chi<- data.frame(rbind(replicate(1000, c(data.frame(with(vandy, Anova(glm(sample(MedicalCareSickDaysY10,16302, FALSE) ~ BMI_calculatedY10 + AgeCalcPSY10_AsOf01012013_Y10 + GenderPSCodeY10 + EthnicGroupAbbrY10 + EducationLevelY10 + FLSAStatusRecodeY10 + SubstanceUseSmokingStatusY10, family = poisson),type="III")))[,1]))))

summary(c(rand.chi[1,])>exp.chi[1,])

summary(c(rand.chi[2,])>exp.chi[2,])

summary(c(rand.chi[...,])>exp.chi[...,])

* ERROR:

exp.chi<-data.frame(data.frame(Anova(glm_y10g1,type="III"))[,1])

vandy_glm1 <- ddply(vandy, vandy$MedicalCareSickDaysY10)

Error in `[.data.frame`(envir, exprs) : undefined columns selected

* Additional ERROR: related to unequal x and y table.

* Review Spline graphs – how to overlay multiple graphs, and a few general tips for additional 'beautification' of the table :).

plsmo(vandy$BMI_calculatedY10, vandy$JobProductivityY10, method="supsmu", datadensity=TRUE)

plsmo(vandy$BMI_calculatedY10, vandy$JobLimitationsY10, method="supsmu")

* General advice on generalized linear models: evaluating model fit (goodness of fit), cross-validating the model, graphing the model.

* #GLM model Goetzel replication: http://www.ncbi.nlm.nih.gov/pubmed/20061888

* ## Visits_i = f(BMI_i, Age_i, Sex_i, Race/Ethnicity_i, Education_i, Profession_i, Smoking_i, Site), where Visits_i is an outcome variable for an individual i, and f indicates the link function for the model. All dependent variables, except presenteeism, were counts of events.

* #GLM Goetzel Year 10 variables - MedicalCareSickDaysY10, BMI_calculatedY10, AgeCalcPSY10_AsOf01012013_Y10, GenderPSCodeY10, EthnicGroupAbbrY10, EducationLevelY10, FLSAStatusRecodeY10, SubstanceUseSmokingStatusY10

glm_y10g1 <- glm(MedicalCareSickDaysY10 ~ BMI_calculatedY10 + AgeCalcPSY10_AsOf01012013_Y10 + GenderPSCodeY10 + EthnicGroupAbbrY10 + EducationLevelY10 + FLSAStatusRecodeY10 + SubstanceUseSmokingStatusY10, family = poisson, data = vandy)

summary(glm_y10g1)

* # GLM model after GBM – separate inquiry from above.

glm_1 <- glm(JobProductivityY11 ~ AgeCalcPSY10_AsOf01012013_Y10 + SickDays_HRA_Y6 + SickDays_HRA_Y5 + SickDays_HRA_Y4 + BMIY9 + SickDays_HRA_Y2 + SickDays_HRA_Y3 + BMI_calculatedY10, family = poisson,data = vandy1)

summary(glm_1)

* #GLM model Goodness-of-fit testing - if p < 0.05 then model is not a good fit for actual data.

with(glm_1, cbind(res.deviance = deviance, df = df.residual, p = pchisq(deviance, df.residual, lower.tail = FALSE)))

2014 July 10

Paula DeWitt and Jessica Turnbull, Center for Biomedical Ethics and Society

  • Email: I would like to sign up for the biostat clinic this Thursday at 12:00 at 2525 West End. One of the physicians in our office (Center for Biomedical Ethics) is a pediatric intensivist at Children’s Hospital (Jessica Turnbull). She e-mailed me an excel file (attached) with aggregated data from 2 surveys (pre- and post-test) conducted with pediatric bedside nurses before and after implementation of a “rounding tool.” Jessica asked me if it was possible to determine whether any of the pre- and post-test items are significantly different. 79 nurses took the pre-test and 49 nurses took the post-test. The surveys were anonymous, so they are not matched pairs.
  • Frank Harrell's Response: In general you can't make conclusions from pre-post designs even with perfect matching [due to general time trends, learning, fatigue, and wishful thinking]. Without the matching it's even more challenging. The REDCap Survey system has a way to link up anonymous responses across time, I believe, if set up prospectively. If with the existing data you do not have a way to describe differences between the 49 nurses and the 79 - 49 = 30 nurses, the survey may need to be re-done, unfortunately. We'll know more after the discussion but be prepared for bad news just in case.
  • Pediatric ICU palliative care; modeling after a Mt Sinai adult ICU project
  • Rounding tool triggers for consideration of palliative care + educational component
  • Pre -> educational lecture + checklist for consult interventions -> Post assessment
  • Discussed Hawthorne effect, seasonality, bias due to subjects dropping out and not having a post assessment
  • Was anonymous due to nurses' sensitivity to being judged for negative comments
  • There may be some nurses who took the post-test who did not take the pre-test
  • Need to determine how many nurses attended the lecture; think it is less than 1/2 of them
    • Attendance at a lecture is perhaps an outcome variable in and of itself
  • Can emphasize in an abstract or publication the feasibility argument; use this experience to be ready to do a cardiac study
  • Regarding analysis, without the ability to pair responses, would need to treat as unpaired (2-sample) but override the sample size in statistical tests to a total of 79 [but also need to note interpretation difficulties]
  • In future consider VICTR resources, e.g., to refine future designs

Thomas Reed

  • Summer research program with Diabetes, Kidney, GI. Medical student from GWU.
  • Interested in SPSS; can cover general questions in clinic that are not SPSS-specific
  • Need to present at symposium at end of July
  • Looking at SPRINT trial - keep SBP below 140 mmHg vs. below 120 mmHg; does this change outcomes (cardiac events)
  • Ambulatory blood pressure measurements at month 27; n=27, only 7 are dippers; monitored for 24h; measured q30m
    • Have means separately by sleep and non-sleep intervals
  • Interested in nighttime dipping of BP; doesn't happen for some subjects
  • Hypothesis is that non-dipping is associated with harder to control BP and will need more anti-hypertensive meds
  • Have the raw data
  • Simplest analysis is to compare the 24h profile to the current med usage
  • Need a continuous measure of dipping; otherwise statistical power will be minimal
  • One possibility: get the area under the SBP curve during sleep [normalized for time duration, i.e., mean SBP during sleep] and subtract it from the SBP at some universal reference time (right before sleep? noon?)
  • Correlate amount of dipping with amount of meds, show scatterplot
  • Need to plot reference SBP vs. AUC, and Bland-Altman plot (difference vs. sum of reference SBP and mean sleep SBP)
  • How to quantify intensity of anti-hypertensive med usage?
  • May also want to measure SBP control at surrounding visits
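
The continuous dipping measure suggested above (reference SBP minus the time-normalized area under the sleep SBP curve) can be sketched as follows. The readings and the reference value are hypothetical, and the trapezoid rule is an assumed choice for the area calculation.

```python
def time_weighted_mean(times_h, values):
    """Area under the curve (trapezoid rule) divided by the time span."""
    area = 0.0
    for i in range(1, len(times_h)):
        area += (times_h[i] - times_h[i - 1]) * (values[i] + values[i - 1]) / 2
    return area / (times_h[-1] - times_h[0])


def dipping(reference_sbp, sleep_times_h, sleep_sbp):
    """Reference SBP minus time-weighted mean SBP during sleep (mmHg)."""
    return reference_sbp - time_weighted_mean(sleep_times_h, sleep_sbp)


# Hypothetical subject: readings every 30 minutes over 2 hours of sleep.
sleep_times = [0.0, 0.5, 1.0, 1.5, 2.0]
sleep_sbp = [120, 116, 112, 116, 120]
dip = dipping(130, sleep_times, sleep_sbp)
print(f"dipping = {dip:.1f} mmHg")
```

One value per subject computed this way can then be plotted against the amount of medication, as in the scatterplot suggestion above.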

2014 July 3

Jennifer Cunningham-Erves, MMC Department of Surgery and Office of Research

  • Email: I would like to reserve a time to attend the Clinical and health research clinic on Thursday. Specifically, I need assistance in calculating the sample size for my current research study.
  • Administering a survey on parental willingness to consent adolescent participation in clinical trials of new vaccines/drugs.
  • Discussed the issue of responding positively versus actually enrolling child in trial -- could result in overestimation. Other bias might result from nonresponse (probability of participating in a trial might be linked to probability of responding to the survey); especially if African Americans are less willing to respond to the survey (as indicated historically).
    • Bias should at least be acknowledged.
    • Could simulate scenarios for bias from nonresponse to assess the bias effect (i.e., sensitivity analysis to assumption of random nonresponse).
  • Would like to test the association between race (AA/CA) and willingness to enroll.
    • Prior information: Proportion of AA in study: 30%. Proportion of CA parents willing: 50%. Proportion of AA parents willing: 10%.
    • May want to do a small pilot study to get a better idea of the proportions needed for sample size and power calculation.
  • PS Software available here: http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize
  • For examining factors associated with willingness, use logistic regression modeling. Suggest she choose the predictors that are important a priori (not based on statistical significance). In that case, sample size might be motivated by 10-20 events per parameter in the model.
    • Give some thought to interaction terms or consider modeling both races separately (though you lose power). Could consider using ridge regression to run a full (overparameterized) model as it shrinks poorly predictive/insignificant parameters. Stata/R would do this kind of regression, but probably not SPSS.
  • Consider applying for a CTSA voucher for the analysis portion of the study.
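
As a rough illustration of the race-by-willingness comparison, the prior proportions quoted above can be put into a standard normal-approximation formula for two proportions with unequal group sizes. This is a sketch only: the two-sided alpha of 0.05 and 80% power are assumptions not stated in the notes, and the PS software may use a different method and give different numbers.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1, p2, ratio, alpha=0.05, power=0.80):
    """Group sizes for comparing two proportions (normal approximation).

    ratio is n2 / n1. Returns (n1, n2), each rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2) / ratio
    n1 = math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)
    n2 = math.ceil(n1 * ratio)
    return n1, n2


# From the notes: 50% of CA parents willing, 10% of AA parents willing, 30% AA.
n_ca, n_aa = n_two_proportions(p1=0.50, p2=0.10, ratio=0.30 / 0.70)
print(f"CA parents: {n_ca}, AA parents: {n_aa}, total: {n_ca + n_aa}")
```

The calculation ignores nonresponse and the gap between stated and actual willingness, both of which were flagged above as sources of bias.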

2014 June 19

Tultul Nayyar, Department of Neuroscience & Pharmacology

  • Email: I would like to attend the clinic on Thursday (19th June, 2014) to get an estimate for a voucher for biostatistical analysis of my data. I have data from two groups of women on different parameters of depression.
  • Want to find a biomarker to distinguish between two groups. Have N=25 patients.
  • Estimate $2000 CTSA voucher.
 

2014 June 12

John M. Flynn, Internal Medicine, Vanderbilt University

  • Email: My topic is the study of CD39 mediated regulation of human platelet function and its influence on cardiovascular disease.
  • Identify a certain gene which should be correlated with CD39
  • Power is useful in designing a study. Since the data are already collected, calculating a confidence interval is more meaningful.
  • ADP-induced aggregation is the primary endpoint.
  • N=220. Frequency of A is 0.16. One-way analysis of variance to compare aggregation among GG (0.36), GA (0.48), and AA (0.16). If significant, then compare between two groups (AA vs. GA; AA vs. GG; GA vs. GG) using a two-sample t-test. Scatter plot to check the distribution of the aggregation. Boxplot is also helpful.
  • Need to have standard deviation of aggregation and a meaningful difference to calculate power. Control to case ratio is 0.84/0.16=5.25.
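
The group sizes implied by the proportions above can be worked out directly. This sketch assumes that the 0.16 figure refers to the AA genotype group (the reading consistent with the 0.84/0.16 ratio in the notes) and checks the three proportions against Hardy-Weinberg equilibrium, which is an added assumption and not something the notes state.

```python
import math

N = 220
genotype_freq = {"GG": 0.36, "GA": 0.48, "AA": 0.16}

# Expected subjects per genotype group.
expected = {g: N * f for g, f in genotype_freq.items()}

# Under Hardy-Weinberg equilibrium (an assumption), freq(AA) = q**2.
q = math.sqrt(genotype_freq["AA"])  # implied frequency of the A allele
hwe = {"GG": (1 - q) ** 2, "GA": 2 * q * (1 - q), "AA": q ** 2}

ratio = (1 - genotype_freq["AA"]) / genotype_freq["AA"]  # non-AA : AA

print({g: round(n, 1) for g, n in expected.items()})
print(f"implied A allele frequency: {q:.2f}")
print(f"non-AA to AA ratio: {ratio:.2f}")
```

The smallest group (AA) drives the precision of any comparison involving it, so its expected size is the number to watch when choosing a meaningful difference.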

2014 May 22

Joseph Conrad, Wright Research Group, Vanderbilt University

  • Email: I’ll be stopping by again with a grad student or two to discuss experimental design and analysis of flow cytometry data in the development of clinical diagnostic devices.
  • Discuss flow cytometry for detecting fluorescence of CD4 receptors
  • Want to establish that total fluorescence and cell count have a linear relationship. Collect data from N individuals with different dilutions/pp. Long-term goal is to predict CD4 count from the total fluorescence.
    • Keep all data points and use the "linear mixed effects" approach.
    • Can calculate a confidence band for the linear relationship in a typical person. Also, a 95% prediction band would give the range for most of the population.
    • From this model you can get estimates for experimental variability, population variability, and individual variability.
    • If you have data for HIV- and HIV+, can test for a difference in slopes of the two groups.
  • Another option would be to calculate the slope for each individual patient using simple linear regression. This gives estimates for how density varies across separate patients. Then you have one observation per patient, and you can use traditional tests and simple statistics that do not account for correlation. This approach is called "response feature analysis". It's a way of sidestepping this 'nasty' repeated measures issue.
    • Key to response feature: you need a biologically sensible summary (here: the slope).
  • Some talk of gating which is drawing clusters by hand to determine which data are eligible to calculate population statistics (subjective exclusion of intermediate subpopulations and/or outliers). There are computer algorithms, but they do not always outperform human judgement. Would be ideal to have objective 'gating'.
    • What about supervised learning algorithm that does allow some human judgement? It exists.
  • What sample size is needed to precisely estimate CD4 count from fluorescence? If only interested in CD4<250, 250<=CD4<500, CD4>=500? These numbers are commonly used in clinical algorithms. When considering patients below 250, we want to minimize FN at the risk of FP.
    • How many samples are necessary to ensure that our estimates of quality are precise?
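
The response feature approach described above (one slope per subject, then ordinary summaries of the slopes) can be sketched in a few lines. The subject labels and all data values here are invented for illustration only.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y on x for one subject."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: cell count (x) and total fluorescence (y) at several
# dilutions for each subject.
subjects = {
    "s1": ([100, 200, 400, 800], [210, 405, 790, 1620]),
    "s2": ([100, 200, 400, 800], [180, 370, 750, 1490]),
    "s3": ([100, 200, 400, 800], [230, 440, 900, 1760]),
}

# One summary value (the slope) per subject removes the within-subject correlation.
slopes = {sid: ols_slope(x, y) for sid, (x, y) in subjects.items()}
slope_values = list(slopes.values())
print(slopes)
print(f"mean slope = {mean(slope_values):.3f}, SD = {stdev(slope_values):.3f}")
```

With one slope per subject, a comparison of HIV- and HIV+ groups reduces to a two-sample comparison of slopes; the mixed-effects model remains the option that uses all data points.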

2014 May 15

Joseph Conrad, Wright Research Group, Vanderbilt University

  • Email: Applying for VICTR support to collect biospecimens (blood, saliva, urine, stool) that will be used as reagents in development of diagnostic devices to detect biomarkers of disease. The application requires a sample size justification. Also want to discuss study design and statistical analysis of flow cytometry experiments.
  • Collect data for diagnostic device development in RLS (Zambia). Proposed collection of 250 specimens (1 specimen/visit), need to justify.
    • Malaria, CD4, HIV, TB diagnostics would be tested in the same cohort.
    • Endpoint is to set up standard curves.
  • Start with gold standard count of cells, then put through 2 assays (one SOC; one experimental) and they each yield a count. Wish to demonstrate that the recovery rate (count recovered/gold standard count) is noninferior in the experimental condition to the SOC.
  • When creating standard curves using prediction models, consider using ten-fold cross-validation to assess the predictive accuracy. Plan to discuss modeling more next week.

2014 May 1

Amanda Salanitro Mixon, Section of Hospital Medicine, Vanderbilt University

  • Email: I need help recoding some variables, how to use skewed covariates, and the best multivariable model to choose. I have a dataset describing medication discrepancies in home health patients.
  • Discussed possibly restructuring data to one baseline record + one record per med per patient
    • Or more general, to allow extension to 1st clinic visit after home visit: yes/no flag for med at discharge, med at home, med at 1st clinic visit
  • Poisson and negative binomial models are good candidates. Need an offset for # of meds at hospital discharge.
  • Worthwhile to verify that errors of omission operate the same way with respect to covariate effects as errors of commission, dosage, frequency
  • Fit two separate models, temporarily assuming linearity for continuous covariate effects, and compare regression coefficients across the two model fits
  • Another feature of interest: time until resolution of medication discrepancy
  • Could model outcomes for individual medications using GEE-type polytomous logistic regression for Y=omission, commission, other
    • would not give credit or penalty for being on lots of meds
    • could have as a covariate the number of meds at d/c to test dilution of attention
  • Covariate transformations needed are only weakly related to marginal distributions of covariates

2014 April 24

Bindong Liu, Associate Professor of Microbiology and Immunology, Center for AIDS Health Disparities Research, Meharry Medical College

I am writing a grant application. I propose to test the effect of a new drug in blocking HIV transmission using female tissue samples. Basically, the tissue sample will be drug or mock treated, then HIV will be loaded on the tissue. The amount of HIV that passes through the tissue will be measured. By comparing the amount of HIV passing through the tissue between mock- and drug-treated samples, we will be able to calculate the efficiency of the drug in blocking HIV transmission. I would highly appreciate it if you could help me do a power analysis and sample size estimation.
  • Novel drug; nothing known about it in relation to HIV
  • Human tissue from surgery, pre-treated with drug or mock
  • Treated - untreated to measure effectiveness of drug
  • Goal: block HIV viral transmission
  • 40-50 samples/year
  • Grant application - 4 years, around 140 samples
  • Can tissue be split to allow each patient to be her own control? Almost always.
  • Need to know the transformation of viral load that makes the differences exchangeable (Bland-Altman)
    • If log transformation is known to yield a Gaussian distribution with constant variance, the log transformation probably works
  • Need to find a previous study that quantifies the whole distribution of viral load or provides an estimate of the standard deviation of viral load
    • Experiment needs to be similar in the ways that matter, regarding variability (treatment effect not relevant for this)
    • Can get an upper bound on the SD of differences in the log; provides conservative power or margin of error estimates

Stephen Clark, Assistant Professor, Department of Neurology, Division of Neuro-oncology

  • Clinical trial on brain tumors with methylating agent before death
  • Tumors pre-treatment, 4-6 brains at autopsy
  • CIMP methylator phenotype; epigenetics; can change meth. status of DNA
  • Global (within-patient proportion of cells meth.) + CIMP identification
  • Every patient is treated; pre-post paired design
  • Quantity of interest: double difference: difference in pre-post differences between normal and tumor tissue
  • Genetic heterogeneity across different regions
  • Discussed advantages of precision-based planning (margin of error that is likely to arise from the final estimate of the quantity of interest)
  • To do power or precision calculation requires a standard deviation estimate and an estimate of correlation between measurements within the same patient
  • If none of those are available, best approach is to say that this is a pilot study with unknown power/precision and the analytic plan is x
  • Additional complexities:
    • How to handle repeated measurements within patient (regions)
    • Need to show differences are exchangeable, e.g., they satisfy the Bland-Altman conditions (no correlation between Y-X and Y+X); sometimes proportions need to be transformed to achieve this
    • Different doses are used
    • Inherent selection problems associated with autopsy studies
  • Recommended VICTR voucher
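
The Bland-Altman condition mentioned above (no correlation between Y-X and Y+X) can be checked numerically before and after a candidate transformation. The paired values below are invented, and the log transformation is only one candidate; the notes point out that proportions may need a different one.

```python
import math
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def bland_altman_correlation(pre, post, transform=lambda v: v):
    """Correlation between (post - pre) and (post + pre) after a transform."""
    x = [transform(v) for v in pre]
    y = [transform(v) for v in post]
    diffs = [yi - xi for xi, yi in zip(x, y)]
    sums = [yi + xi for xi, yi in zip(x, y)]
    return pearson(diffs, sums)


# Hypothetical paired measurements with a roughly multiplicative effect.
pre = [100.0, 400.0, 1600.0, 6400.0, 25600.0]
post = [50.0, 160.0, 960.0, 2560.0, 12800.0]

raw_r = bland_altman_correlation(pre, post)
log_r = bland_altman_correlation(pre, post, transform=math.log)
print(f"raw scale: r = {raw_r:.2f}")
print(f"log scale: r = {log_r:.2f}")
```

A correlation near zero on the transformed scale suggests the differences can be treated as exchangeable on that scale; a strong correlation suggests trying another transformation.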

2014 April 17

Mallory Hacker, PhD, Department of Neurology

  • Email: "I am designing a BioVU study to examine the prevalence of a SNP in groups of patients with various movement disorders. I am looking for help selecting the proper control group for these analyses."
  • Using bioVU data and patients w/Parkinson's treated with Deep Brain Stimulation, do they have a higher prevalence of this SNP?
  • Wants to plan nested case control with Parkinson's patients w and w/o DBS. How to select a group of healthy controls? Decision to initiate patient on DBS should not be informed directly by allele (SNP data not available to physician).
  • Suggest she analyze w/2 parameter model so that you can separately assess the hetero and homozygous patients. There are 50 w/DBS and SNP info and 200 w/o DBS and SNP info. There could be 700 more that just need sequencing.
    • Need a power calculation and statement: "Estimate the odds ratios for DBS related to alleles?"
  • Using ICD-9 codes, can estimate diagnosis date of Parkinson's, then 9-12 years later they may receive DBS (<10% population). Can draw Kaplan-Meier plot of time from diagnosis to DBS (crude analysis). Can adjust for covariates in Cox regression model.
    • Only want to include patients with known date of diagnosis. Consider using only patients with some health care interaction prior to diagnosis (to ensure they were not transfers).
    • Important to assess dropout following diagnosis. If patients are not engaged in health care system for > 1 year, then censor at last encounter. This assumes non-informative censoring.
    • Give consideration to whether death is a competing risk with time to DBS.
  • Differences between groups (e.g. Table 1) will inform inclusion criteria, which is not all that different from designing a case-control. Important covariates for grant writing might be age, race, and gender.
  • When selecting control group, match on time from diagnosis, age, race and gender.
  • If you seek VICTR support, go heavy on programmer support and moderate on statistical support.
  • Can use PS software to estimate power for survival analysis: http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize
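
The crude Kaplan-Meier analysis of time from diagnosis to DBS can be sketched as below. The six follow-up records are invented, and censoring is assumed to be non-informative, as noted above.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the event-free probability.

    times  : follow-up time for each patient (e.g. years since diagnosis)
    events : 1 if DBS was received at that time, 0 if censored
    Returns a list of (time, probability of not yet having had the event).
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    curve, surv = [], 1.0
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1 - n_events / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical follow-up data for six patients.
years = [2, 5, 9, 10, 12, 12]
got_dbs = [0, 0, 1, 1, 0, 1]
km_curve = kaplan_meier(years, got_dbs)
for t, s in km_curve:
    print(f"year {t}: P(no DBS yet) = {s:.3f}")
```

This sketch does not handle death as a competing risk; if that matters, a cumulative incidence estimate would be the more appropriate summary.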

2014 April 10

Katie Rizzone, MD, Orthopaedics and Rehabilitation

  • Would like to discuss project looking at age of sports specialization and how it correlates to risk of injury in young athletes. It would be a retrospective cross-sectional design.
  • Sports specialization is the age at which an athlete begins playing one sport exclusively. The research concerns injury among kids who specialize at an early age, within an elite athlete population (D1 students).
  • Research questions: "What is the average age of sports specialization? (conditional on being a college athlete)" "Do athletes in major Division 1 programs (Vanderbilt) specialize earlier than athletes in mid-major D1 programs?" "Describe injury history by age of specialization and school."
    • Correlate with sport type, age, sex, race, and university (Belmont vs. Vanderbilt). Detail injury history.
    • Would be helpful to better understand the reason for specialization and maybe the amount of time devoted to specialized sport (athletic exposure annually "dose"). The dose will likely modify the effect of age on injury.
    • Need to give some thought to how Vanderbilt and Belmont students may be different otherwise (i.e., potential confounding).
  • Plan to distribute REDCap survey, aware of potential for recall bias. This is a descriptive study of student elite athletes.
  • Inclusion/exclusion criteria need to be specified, measurement of key outcomes (like injury) will be important, VERY important to maximize response (over 90% at a minimum).
  • Survey validation, can distribute survey and analyze results for validity/reliability. Or approach experts for feedback on "face and content validity".
  • Plan is to write one manuscript from the results. VICTR study support and/or voucher: 40 hours is a rough estimate for biostatistics support given the complexity of the analysis.

2014 April 3

Lisa Jambusaria MD, Fellow, Division of Female Pelvic Medicine and Reconstructive Surgery, Department of Obstetrics & Gynecology

I am designing a randomized cross over trial to study the potential therapeutic benefit of Montelukast for bladder pain syndrome. I wanted to see if I can meet at a Thursday stats clinic to go over the type of statistics we would require and the cost for funding application.
  • Protocol is to be uploaded to the main conference computer by the investigator by going here
  • Arithmetic error in dropout calculation; need 56/0.7 not 56 * 1.3
  • Wording of difference you want to be able to detect - based on minimum clinically meaningful change
  • Better way to justify sample size: state sample size that is affordable and compute power to detect a 30% improvement at that sample size; also quote the precision (margin of error; a function of standard deviation and n)
  • Y = 0-80 symptom scale;
  • Statistical methods such as the Wilcoxon signed-rank test assume interval scale and smooth distribution of Y
  • Applying for VICTR funding (study, not voucher)
  • Need to describe randomization process; randomization will be generated by a VICTR biostatistician
  • Double check that Invest. Pharmacy is comfortable logging into REDCap
  • Need to see if REDCap can add a second-period reminder
  • Estimated stat time to request: 20 hours ($2000)
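
The dropout correction noted above (56/0.7 rather than 56 * 1.3) is easy to verify. This sketch takes the 30% dropout rate implied by those two factors and uses integer arithmetic to avoid floating-point rounding.

```python
def inflate_for_dropout(n_completers, dropout_pct):
    """Enrollment needed so n_completers remain after dropout (rounded up)."""
    retained_pct = 100 - dropout_pct
    return -(-n_completers * 100 // retained_pct)  # integer ceiling division


n_needed = 56      # completers required, from the notes
dropout_pct = 30   # implied by the 0.7 and 1.3 factors in the notes

correct = inflate_for_dropout(n_needed, dropout_pct)    # 56 / 0.7
incorrect = -(-n_needed * (100 + dropout_pct) // 100)   # 56 * 1.3, rounded up

print(f"enroll {correct}: expect {correct * 70 / 100:.1f} completers")
print(f"enroll {incorrect}: expect {incorrect * 70 / 100:.1f} completers")
```

Multiplying by 1.3 leaves the study short of its target after dropout, which is why the division form is the one to use.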

Kathleen Weber VUSOM III

Study done by Martin Jordanoff in Radiology. Diagnostic accuracy study.
  • Y = binary
  • Does it matter whether reading a study outside their area of expertise?
  • Timed vs untimed images?
  • Attendings vs. residents?
  • 12 standard images, 14 radiographers; each read twice, total of 336 readings
  • Computer-controlled image presentation, diagnosis recorded
  • Need to account for intra-observer variability
  • Patients independent or radiographers independent?
  • Making inference to population of patients or population of radiologists?
  • Complete fixed effects analysis would make strong independence assumptions
  • Could one use random effects for both radiographers and patients?
  • Can adjust for radiographer characteristics as fixed effects

Jessica Pippen

Contraception is an important aspect of postpartum care. Women are often more motivated and receptive during this period to initiating contraception due to immediate access to healthcare and the new responsibility of caring for a newborn. Moreover, this time period presents increased risk for unintended pregnancy. Women may begin ovulation before the return of menses and reinitiation of sexual activity before the 6 week routine postpartum visit, the time at which most forms of contraception are begun. Nonetheless, there are still some barriers to implementing contraception within the recommended 3 week period and even by the six week postpartum follow up visit. These barriers vary by population and include inability to obtain prescriptions, failure to return to clinic for postpartum visit, or lack of understanding regarding the safety and side effects of contraception.

The IUD is a known effective long acting reversible form of contraception with a cumulative pregnancy rate of less than 1% within the first year of use. Traditionally IUDs are placed at the 6 week postpartum visit, known as interval insertion. However studies have shown safety and efficacy in placement of IUDs in the immediate postpartum period (within 10 minutes of placental delivery). Although this method ensures that patients initiate contraception within the recommended window, its main disadvantage is higher rates of expulsion (1 – 4.5 % versus 6 – 20 % within the first year). Nonetheless, for some patients in whom transportation, loss of insurance or work scheduling present barriers to interval initiation of contraception, the benefit outweighs the risk of expulsion. In this study we hope to compare patient satisfaction along with expulsion, breastfeeding, and unintended pregnancy rates in those patients who have immediate versus interval placement of IUDs (Paraguard and Mirena). Our question: Will immediate IUD placement decrease unintended pregnancy rates and provide better patient satisfaction than interval IUD placement?
 

2014 March 27

Stephanie Fecteau, VKC

  • Follow-up to earlier visit to go over latest analysis
  • Longitudinal analysis of cortisol levels
 

2014 March 20

Nyal Borges, Medicine

  • Would like to create a prediction model to estimate probability of discharge (survival) based on condition and co-morbidities. Rough rule of thumb is to have 10 events per parameter in the model. We are not concerned about covariate effects, but more interested in the actual prediction performance. Could consider logistic regression.
  • Machine learning might be a good setting too -- support vector machine, random forest methods, decision tree. These include cross validation which allows for assessing performance of the prediction. Outputs will be predictions.
  • Plan is to write 1 manuscript.
  • VICTR study support and/or voucher: 40 hours is a rough estimate for biostatistics support given the complexity of the analysis.

Ben Holmes and Nyal Borges, Medicine

  • Want VICTR support for project examining outcomes in 260 cardiac arrest patients with statins as main exposure. This is a retrospective cohort study. There is no censoring and minimal missing data. Recommend logistic regression for neurological outcomes score and survival. Potential confounders/moderators include: age, time to intermediate events, dose of anesthesia, indication for taking statins (co-morbidities).
  • Ridge or lasso regression could be used to shrink parameters in this case where there are many covariates to adjust for.
  • Plan is to write 1 manuscript.
  • VICTR study support and/or voucher: 40 hours is a rough estimate for biostatistics support given the complexity of the analysis.

2014 Feb 27

Natasha Rushing, Department of Obstetrics & Gynecology, PGY3

  • I am planning to attend the clinic on this thursday. I planning for my research and am attaching a copy of the lates IRB protocol draft. The question that I need answered is, how many charts do I need to review in order to power the study. Can you please let me know what information you will need so that I can have it prepared for the thursday session? Just an FYI, there are on average 4100-4500 deliveries at Vanderbilt per year. 85% of these are term deliveries (national statistic) and 2-4% of all term deliveries will have chorioamnionitis. The primary fetal outcome will be number of days for admission, and I'd like to be able to detect a difference of 3 days. The primary maternal outcome will be presence of postpartum fever. Not really sure how to address the difference I'd like to power since this is not a continuous variable.
  • Wants to correlate histological diagnosis with clinical diagnosis and compare stage I with severe patients.
  • Use the PS (Power and Sample Size) program for the sample size calculation.

2014 Feb 20

Yuwei, Statistician

  • Ran 250 models and 18 are significant. How should I adjust for multiple comparisons?
  • If there is a lot of data, refer to the Peter O'Brien manuscript. Predict which group each person is in based on their lab values.
  • For each marker, get R-squared (or Spearman's rho) and rank the markers. Use the bootstrap to get a confidence interval for each marker's rank. For example, marker #20 might be the best marker and never rank worse than 8th. This gives a perfect multiplicity adjustment. The confidence intervals get wider as more markers are compared with each other. Maybe subgroup this analysis by time rather than pooling all time points.
    • Reference for this method -- Frank will check.
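
The bootstrap ranking idea above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, the function names (`marker_ranks`, `bootstrap_rank_intervals`) are hypothetical, markers are ranked by absolute Spearman's rho, and simple percentile intervals are used. It is not the method from the reference Frank was going to check.

```python
import random

def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def marker_ranks(markers, outcome):
    """Rank markers by |rho| with the outcome; rank 1 is the strongest."""
    strength = [abs(spearman(m, outcome)) for m in markers]
    order = sorted(range(len(markers)), key=lambda j: -strength[j])
    ranks = [0] * len(markers)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks

def bootstrap_rank_intervals(markers, outcome, n_boot=500, level=0.95, seed=1):
    """Percentile bootstrap interval for each marker's rank (resampling subjects)."""
    rng = random.Random(seed)
    n = len(outcome)
    draws = [[] for _ in markers]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot_y = [outcome[i] for i in idx]
        boot_m = [[m[i] for i in idx] for m in markers]
        for j, r in enumerate(marker_ranks(boot_m, boot_y)):
            draws[j].append(r)
    tail = (1 - level) / 2
    intervals = []
    for d in draws:
        d.sort()
        lo = d[int(tail * n_boot)]
        hi = d[min(n_boot - 1, int((1 - tail) * n_boot))]
        intervals.append((lo, hi))
    return intervals

# Synthetic example (assumption): 60 subjects, 4 informative markers with
# increasing noise, plus 2 pure-noise markers.
rng = random.Random(0)
outcome = [rng.gauss(0, 1) for _ in range(60)]
markers = [[y + rng.gauss(0, sd) for y in outcome] for sd in (0.1, 1.0, 2.0, 4.0)]
markers += [[rng.gauss(0, 1) for _ in range(60)] for _ in range(2)]

observed = marker_ranks(markers, outcome)
intervals = bootstrap_rank_intervals(markers, outcome)
for j, (obs, (lo, hi)) in enumerate(zip(observed, intervals)):
    print(f"marker {j}: observed rank {obs}, 95% bootstrap interval {lo}-{hi}")
```

Markers whose rank intervals overlap heavily cannot be reliably ordered, which is how the interval width conveys the multiplicity penalty as more markers are compared.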

Paula DeWitt, PhD, Research Analyst, Center for Biomedical Ethics and Society

  • Email: Would like to follow up on information discussed at last clinic.
  • Other analysis options for overall improvement (not domain specific).
    • Get proportion of questions improved per subject divided by 8 and then calculate the mean proportion and confidence interval.
    • Add up all 8 questions on the 1-5 scale, pre and post, then compare the paired pre and post totals with a Wilcoxon signed-rank test.
  • If able to run a study with 4x the number of patients, then confidence intervals would be half as wide.
  • Confidence intervals are great summary measures because they give information on what is and is not going on (versus p-values which can only be used to reject the null).
    • The Wilson score interval is just one of several confidence intervals for a proportion.
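
The point about quadrupling the sample size follows from the standard error shrinking with the square root of n. A minimal sketch, assuming a normal-approximation interval for a mean and a hypothetical standard deviation of 0.25 for the per-subject proportion of questions improved (neither the SD nor the function name comes from the clinic discussion):

```python
from statistics import NormalDist

def ci_half_width(sd, n, level=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sd / n ** 0.5

sd = 0.25  # hypothetical between-subject SD, for illustration only
for n in (4, 16, 64):
    print(f"n = {n:2d}: mean +/- {ci_half_width(sd, n):.3f}")

# Ratio of half-widths when the sample size is multiplied by 4
ratio = ci_half_width(sd, 16) / ci_half_width(sd, 4)
print(f"half-width ratio (n=16 vs n=4): {ratio:.2f}")
```

With very small samples a t quantile would replace z, so the halving is approximate rather than exact in that case.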

2014 Feb 13

Paula DeWitt, PhD, Research Analyst, Center for Biomedical Ethics and Society

  • Email: We did a training session/simulation at CELA this week, focused on increasing Nurse Practitioner ability to prepare families for ICU care transitions. As a condition of VICTR funding, we did a pre- and post-test survey to see if the training increased the NPs’ ability to prepare families for the various transitions. There were only 4 NPs in the training, so the sample size is very small.... We are planning a grant proposal to do a larger study.
  • Paula did a feasibility study collecting pre-post data on 4 subjects.
  • Suggest looking at the improvement at the question level and summarizing the proportion of subjects who show an improvement from pre to post. If three subjects of four improved, you can report the proportion and 95% confidence interval.
  • Here are the 5 possible 95% Wilson score confidence intervals:
    • 4/4 successes = 100% (51%-100%)
    • 3/4 successes = 75% (30%-99%)
    • 2/4 successes = 50% (15%-85%)
    • 1/4 successes = 25% (1%-70%)
    • 0/4 successes = 0% (0%-49%)
    • Interpretation for 3/4 successes: Data are consistent with the underlying true chance of improvement being between 30% and 99%.
  • Link to Wilson Score interval in SPSS
  • Another option is to test whether the proportion of subjects who improved is greater than 50%, using a one-sided exact binomial test.
  • A second analysis would be to estimate for each person, the proportion of questions they improved upon. However, it gets away from assessing each individual domain. Consider the Wilcoxon signed-rank test in this situation.
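
For reference, the Wilson score interval can be computed directly. The sketch below uses the standard formula without continuity correction; the function name `wilson_interval` is hypothetical. For 4/4, 2/4 and 0/4 it reproduces the intervals listed above after rounding. For 3/4 and 1/4 the uncorrected formula gives roughly 30%-95% and 5%-70%, so the listed values for those two rows may come from a different variant (for example, a continuity correction).

```python
from statistics import NormalDist

def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * (p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5 / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at 0/n and n/n
    return max(0.0, center - half), min(1.0, center + half)

# All possible results with 4 subjects
intervals = {k: wilson_interval(k, 4) for k in range(5)}
for k in sorted(intervals, reverse=True):
    lo, hi = intervals[k]
    print(f"{k}/4 successes = {k / 4:.0%} ({lo:.0%}-{hi:.0%})")
```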

2014 Feb 6

John Koethe, MD, MSCI, Infectious Diseases

  • Email: I have a dataset from a large HIV treatment RCT in Zambia that I would like to use for a secondary analysis of body composition and inflammation. I’d like to apply for a VICTR voucher to support the analysis and wanted to get feedback on the project and a quote.
  • RCT just completed in Zambia/Tanzania looking at early mortality (specifically nutrition) in HIV patients with low BMI on treatment.
    • 1800 patients block randomized to receive either regular plumpy nut or plumpy nut with extra nutrients. Supplementation occurred 2-4 weeks before treatment.
  • Results from the parent trial were inconclusive; however, they wish to proceed with secondary analyses:
    • Reduced CRP is accompanied by increased lean muscle mass and grip strength after 6 weeks of tx/supplementation. N=360 at baseline and N=300 at 6 weeks.
      • Will need to give careful consideration to informative missingness (death/LTFU). If you did a complete case analysis, can you justify that the observed effect would have been larger had the CRP for death/LTFU been observed?
      • When correlating one change with another it is hard to interpret the causal pathway (it could be circular in reasoning).
      • Measure CRP at times 1 and 2 and predict mid-upper arm circumference at time 3. Then mid-upper arm circumference is a latent dependent variable.
      • May be easier to do baseline analyses, and then a landmark analysis.
      • When doing survival analysis, consider including both measurements and not change. Maybe even an interaction term.
      • Need to state the question with extreme specificity.
  • There are many databases, but they have been cleaned and concatenated for this analysis.
  • VICTR study support and/or voucher: 45 hours is a rough estimate for biostatistics support given the complexity of the analysis.

2014 Jan 30

Sylvie A. Akohoue, MMC

  • Topics: Missing data, Nonparametric tests with adjusted variables, and interpretation of results.
    • Discussed validation of scales to measure food intake and consumption behaviors.

Maxim Terekhov, Anesthesiology

  • Email: Population 1) We want to demonstrate difference in behavioral risk for LGBT vs non-LGBT patients and STD. 2) Approximately 220 employees have registered for same-sex domestic partner benefits; Reported incidence of smoking in LGBT patients approaches 50%, whereas it approaches 25% in the general population. We therefore anticipate that with 220 patients in the LGBT group, at a 2-sided alpha of 0.05, we will have 90% power to demonstrate a difference of X, between LGBT and non-LGBT groups.
  • Would like to estimate probability of smoking (or STD) while adjusting for covariates (e.g. alcohol consumption).
    • Need to understand how people are enrolled and identified as LGBT... (self report or somehow this is recorded via insurance).
      • Is it fair to treat LGBT as a homogeneous group?
    • Not clear what the control group would consist of. Representativeness of sample needs to be established.
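
To make the power statement in the email concrete, here is a rough sketch of the usual normal-approximation power calculation for comparing two independent proportions. It assumes an equally sized comparison group with 25% smoking prevalence, which is purely an assumption since the control group was not yet defined, and it ignores covariate adjustment. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for two independent proportions."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    # The negligible chance of rejecting in the wrong direction is ignored
    return nd.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

def n_per_group_two_proportions(p1, p2, power=0.90, alpha=0.05):
    """Approximate sample size per group needed to reach the requested power."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Smoking prevalence from the email: about 50% (LGBT) vs about 25% (general population)
power_220 = power_two_proportions(0.50, 0.25, n_per_group=220)
n_needed = n_per_group_two_proportions(0.50, 0.25, power=0.90)
print(f"power with 220 per group: {power_220:.4f}")
print(f"n per group for 90% power: {n_needed}")
```

Under these assumptions, 220 per group is far more than needed for a 50% vs 25% contrast, so the binding constraints are more likely the representativeness and definition of the groups than the raw sample size.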

2014 Jan 23

Laura Edwards, MPH student

  • I am looking at 1 year longitudinal health systems data measured each month in 10 districts in Mozambique. I want to confirm which statistical test to run and need help knowing how to best display this information graphically, because right now my graphs are separated by district and look really busy.

  • health indicators (targets within a health system), and how close districts get to the target
  • time series of proportions (% of target)
  • some districts have incomplete reporting

  • recommend time series analysis (e.g. logit-linear model of time)
  • for multivariate outcomes, might generate a score or fit a series of models to avoid having to deal with complex analysis
  • spline models

Some recommended references:

2014 Jan 9

Matthew Taussig, medical student

  • I have a few questions regarding statistical analysis of a case control study that I inherited in the Schoenecker lab.

    We are looking at levels of hypertension in children with slipped capital femoral epiphysis and tibia vara (orthopedic problems in children associated with obesity) compared to age and sex matched controls from an obesity clinic without orthopedic problems. I have done frequency matching and

    1) want to make sure I did that correctly, and 2) I want to know which analyses are most appropriate
  • Compare incidence of hypertension in the two disease groups. Both diseases are associated with obesity, so include obesity controls and compare the two disease groups to the controls.
  • Primary outcome: hypertension. Have continuous BP (average of three BP measurements) and a calculated percentile based on age and gender. Should use the original BP.
  • Suggest working with Ben Saville and Meng Xu through the Pediatric collaboration.
  • Since the sample size is large enough, can include all the patients and fit a linear model: Blood Pressure ~ age + gender + groups, or logistic regression: hypertension ~ age + gender + groups

Maxim Terekhov, Center for Human Genetics Research

  • I am working with Dr. Jonathan Wanderer on updating the PACU pain score analysis project and I have a couple of questions regarding the modeling methodology that was used.
  • Update models with more pts/more variables.

2013 Dec 19

Bennett Spetalnick, Tara A. Nielsen, OB/GYN

  • Retrospective study, about 2 year period. 4500 deliveries per year. About 400 vs 400 for each treatment group per year.
  • The primary endpoint is composite adverse outcome (maternal and neonatal).

Dia Beachboard, PMI

  • Electronic images of host cells invaded by virus. Compare wild type virus and mutant virus. One endpoint is the binary phenotype of the vesicle. Another endpoint is vesicle size.
  • 30% aberrant phenotype in wild type vs 60% in mutant virus. 200 nm vesicles for wild type vs 280 nm for mutant.

2013 Nov 21

Dia Beachboard, Department of Pathology, Microbiology, and Immunology and Pediatrics

  • I am trying to determine statistical power and what tests to use for electron microscopy quantification I am doing.

    I have two types of analysis that I will be performing on 3 data sets. The first analysis is a nominal categorical analysis of normal vs. abnormal vesicles within my EM sections. The second analysis will be to determine if there are differences in the diameter of these vesicles (quantitative/continuous).

    For my three data sets, I will be answering different questions. For data set 1, I have 5 samples and want to compare 4 mutant to wild type. For data set 2, I will have 6 samples but I will be comparing two samples at a time. I am testing 3 mutations in two backgrounds and will be determining if there are differences in the vesicle between background but not across mutations. For data set 3, I have wild type and mutant virus in the presence or absence of a drug (4 samples total) and want to compare each sample in the presence or absence of drug then compare whether the mutant has a different response to the drug than wild type.

    My specific questions about statistical power are: 1) how many cell sections do I need to image to get statistical power? 2) how many vesicles do I need to analyze for statistical power?

Allie Greenplate

  • I'm working on a VICTR grant application and need help determining a minimum sample size. The hypothesis of my proposal is that melanoma infiltrating T cells will have a different phenotypic and functional profile compared to healthy T cells. We have primary melanoma tumors from a clinical collaborator that have been disaggregated to single cells. From there I will perform mass cytometry on each sample to determine phenotype and function. Mass cytometry is similar to traditional fluorescent flow cytometry; however, the antibodies are coupled to metal isotopes, allowing us to measure 30+ parameters as opposed to the 4-8 parameters measured in fluorescent flow. This allows for greater resolution of small subpopulations.
 

2013 Nov 7

Imani Brown, MPH student

  • REDCap branching logic question: whether fields left missing because they are not applicable will affect the statistical analysis.
 
This site is powered by Foswiki. Copyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.