GenClinicAnalyses (16 Oct 2017, LaurieSamuels)

- 2017-10-16
- 2017-10-02
- 2017-09-25
- 2017-09-18
- 2017-09-11
- 2017-08-28
- 2017-08-14
- 2017-07-24
- 2017-07-10
- 2017-06-19
- 2017-06-12
- 2017-06-05
- 2017-05-29
- 2017-05-22
- 2017-05-15
- 2017-05-08
- 2017-05-01
- 2017-04-24
- 2017-04-17
- 2017-03-27
- 2017-03-20
- 2017-02-27
- 2017-02-20
- 2017-02-13
- 2017-02-06
- 2017-01-30
- 2017-01-09
- 2016-12-19
- 2016-11-21
- 2016-10-17
- 2016-10-10
- 2016-09-19
- 2016-09-12
- 2016-08-29
- 2016-07-25
- 2016-07-18
- 2016-06-20
- 2016-05-09
- 2016-05-02
- 2016-04-11
- 2016-03-14
- 2016-02-22
- 2016-02-01
- 2016-01-25
- 2016-01-04
- 2015 Dec 14
- 2015 Dec 7th
- 2015 Nov 30th
- 2015 Nov 23rd
- 2015 Nov 16th
- 2015 Nov 9th
- 2015 Nov 2nd
- 2015 Oct 19th
- 2015 Oct 5th
- 2015 Sep 28
- 2015 Sep 21
- 2015 Sep 14
- 2015 Aug 24
- 2015 Aug 17
- 2015 Aug 3
- 2015 June 29
- 2015 June 22
- 2015 June 15
- 2015 June 1
- 2015 May 11
- 2015 Apr 27
- 2015 Apr 20
- 2015 Apr 13
- 2015 Mar 9
- 2015 Feb 23
- 2015 Jan 12
- 2014 Dec 15
- 2014 Nov 24
- 2014 Nov 17
- 2014 Nov 3
- 2014 Sept 29
- 2014 Sept 22
- 2014 Sept 15
- 2014 Sep 8
- 2014 Aug 18
- 2014 July 21
- 14July14
- 23June14
- 28April14
- 14April14
- 31Mar14
- 10Feb14
- 27Jan14
- 13Jan14

- First step: see how comparable the responders are to the non-responders (or the whole set of programs) in terms of # residents, # MD/PhD residents, NIH funding amount
- From there we can talk about statistical testing. We will probably want to use a finite-population correction since the whole population = 63 programs
- Regardless of comparability, descriptive statistics (10 out of 23... etc.) will still be interesting to report
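
The finite-population correction mentioned above can be sketched as follows. This is a minimal illustration, assuming the responders can be treated as a simple random sample from the 63 programs (which does not address nonresponse bias); the function name is hypothetical, and the 10-of-23 figures are taken only from the illustrative example in the notes.

```python
import math

def proportion_se_fpc(successes, n, population):
    """Return (proportion, uncorrected SE, finite-population-corrected SE)."""
    p = successes / n
    se_uncorrected = math.sqrt(p * (1 - p) / n)
    # Finite-population correction: shrinks the SE as n approaches the population size
    fpc = math.sqrt((population - n) / (population - 1))
    return p, se_uncorrected, se_uncorrected * fpc

# Assumed example: 10 of 23 responding programs, population of 63 programs
p, se_plain, se_fpc = proportion_se_fpc(10, 23, 63)
print(round(p, 3), round(se_plain, 3), round(se_fpc, 3))
```

With 23 of 63 programs responding, the correction reduces the standard error by roughly 20%, which is why it matters here.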

- We are concerned because we don't know the number of times each system was tested. If it's not possible to get this information, one possibility might be to simulate data to try to get a sense of the possible scope of the impact of frequency-of-testing
- The overall project seems like a good fit for a VICTR voucher or short-term biostatistics support (its scope is too large for clinic). To inquire about short-term biostats support, email Yu Shyr, Chair. Another possibility might be working with a student (email Jeffrey Blume, director of graduate studies).
- Will need to keep in mind: some systems get swallowed up into other systems.
- Longitudinal data analysis won't be feasible without the complete testing data (we would need the non-violations in addition to the violations).
- Next level (after other issues resolved): geospatial correlation (tricky, though, because of the upstream/downstream issue)

- Pearson’s chi-squared or Fisher’s exact test, depending on n in each category
- Ordinal regression (vs. multinomial?)

- Pearson’s chi-squared or Fisher’s exact test, depending on n in each category
- Logistic regression

- You are welcome to come back to clinic, but as a member of the Nephrology division you are also welcome to work directly with Thomas Stewart
- Recruit patients across a range of SES's; will probably want to limit to patients who are either AA or white, due to likely low numbers in other groups

- You are welcome to come back to clinic, but as a member of the Gastroenterology division you may be able to work directly with Chris Slaughter
- Identifying appropriate controls for this study will be tricky

- for enrolled patients, look at previous 3 years then follow forward 2 years
- 250 children & 300 adults seen at clinic
- plasma & DNA samples
- outcome is pain and acute chest syndrome
- 3 genotypes and plasma biomarkers
- believe 2,2 genotype will have increased pain and chest syndrome (1,1 (26%) 1,2 (55%) 2,2 (19%))
- look at incident rate at 1 yr and 2 yr; poisson model; need to know what difference would be expected between the groups
- sample size - graph of incidence rate that can be detected vs sample size needed
- Simplest approach: find confidence interval formula for a Poisson rate; assume lowest true rate and solve for n such that multiplicative margin of error is 1.5 with 0.95 confidence
- Simplest confidence interval is lambda +- 1.96 * sqrt(lambda / n); once you have an upper limit on lambda can solve for n to give acceptable margin of error for lambda
- Would be better to get the multiplicative margin of error for the ratio of two Poisson rates (to simplify we may assume the sample size in each group is the lowest of the three genotype group sizes)
- See http://statsdirect.com/help/rates/compare_crude_incidence_rates.htm
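
The margin-of-error calculations above can be sketched as follows. This is a rough sketch under assumptions not stated in the notes: each patient contributes one unit of follow-up time (so n patients at rate lambda yield about n * lambda events), and the multiplicative margin of error comes from a normal approximation on the log scale (standard error of a log rate is about 1 / sqrt(events)). The function names and the example rate of 0.5 events per patient-year are hypothetical.

```python
import math
from statistics import NormalDist

def z_quantile(conf=0.95):
    """Two-sided standard normal critical value (about 1.96 for 0.95)."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def n_for_additive_margin(lam, margin, conf=0.95):
    """Solve z * sqrt(lam / n) = margin for n (the simple interval in the notes)."""
    return math.ceil(z_quantile(conf) ** 2 * lam / margin ** 2)

def n_for_multiplicative_margin(lam, mmoe, conf=0.95):
    """Smallest n with exp(z / sqrt(n * lam)) <= mmoe (log-scale approximation)."""
    events_needed = (z_quantile(conf) / math.log(mmoe)) ** 2
    return math.ceil(events_needed / lam)

def mmoe_rate_ratio(lam1, lam2, n_per_group, conf=0.95):
    """Multiplicative margin of error for the ratio of two Poisson rates."""
    se_log_ratio = math.sqrt(1 / (n_per_group * lam1) + 1 / (n_per_group * lam2))
    return math.exp(z_quantile(conf) * se_log_ratio)

# Hypothetical lowest plausible rate: 0.5 events per patient-year
print(n_for_additive_margin(0.5, 0.1))
print(n_for_multiplicative_margin(0.5, 1.5))
print(round(mmoe_rate_ratio(0.5, 1.0, 100), 2))
```

Evaluating `mmoe_rate_ratio` over a grid of group sizes gives the detectable-precision-versus-sample-size graph discussed above.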

- Perception of collaboration between doctors and nurses in Guyana
- Pre- and post-team-building exercise, then 4-6w later
- 27 participants with 2 dropouts by the end; 15 nurses, 10 doctors at the end
- Issue of using means vs. proportions for Likert scales; want to look at disagreements of perception before and after training
- Nurses used more spread of answers than doctors
- 2 demographic variables, 15 Likert questions; need to combine into a single global scale for graphical individual profiles and for stat analysis
- Can do a formal analysis of variability of responses within subject, e.g. compute the SD over 15 questions within subject and see if nurses have more variation than doctors
- Main analysis on mean
- Form within-person difference from baseline (paired data)
- Do 2-sample (unpaired) t-test comparing these differences - nurses vs doctors
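
The within-subject summaries and the comparison of changes can be sketched as follows. This is a minimal illustration with hypothetical function names; it uses the Welch (unequal-variance) form of the two-sample t statistic, which is a choice made here rather than something specified in the notes, and it stops at the test statistic because a p-value needs a t distribution that the Python standard library does not provide.

```python
import math
from statistics import mean, stdev, variance

def within_subject_sd(item_scores):
    """SD across one participant's 15 Likert items (spread of answers)."""
    return stdev(item_scores)

def change_from_baseline(pre_items, post_items):
    """Within-person difference in the mean item score (post minus pre)."""
    return mean(post_items) - mean(pre_items)

def welch_t(group_a, group_b):
    """Unpaired t statistic (Welch form) comparing two groups of differences."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se
```

In use, `change_from_baseline` would be applied to each participant, and `welch_t` to the resulting nurse and doctor lists of differences; `within_subject_sd` values can be compared between the groups the same way.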

- Be sure to graph all measurements (on summary score)
- Pre-post design often provides an upper limit to an intervention effect

- Sample size is fixed based on fellowship time. Power and sample size should be calculated accordingly.
- Keep the measure in its continuous form (0-100) instead of dichotomizing it.
- Consider involving a CTSA statistician early, at the design stage. Because this involves design, grant writing, data collection, data analysis, and manuscript preparation, about 90 hours of work may be needed.
- Because prediction is involved (identifying characteristics that are related to high measures), model validation should be considered.

- If dropout is not random, either GLS with a serial correlation structure or a linear mixed-effects model would be more appropriate than GEE.
- Do not collapse the diet variables into quintiles; leave them as continuous variables
- For the power calculation, it may be possible to ask for conditional approval to have access to a subset of the data to get estimates of the quantities needed for a power calculation.
- You can do a simplified power calculation with just one wave of data, and argue that the power will be higher when there are more data points per person.
- Possibly useful R packages: longpower (thank you for bringing this to our attention!), pwr (in particular, the pwr.f2.test function).
- Simulation could also be a useful approach, but it would also require some background information about the standard deviations of the variables

- Keep all levels of the "bother" variable (do not collapse it). When using it as an outcome, a good approach might be proportional odds logistic regression
- Consider trying to cluster the variables rather than the people (using factor analysis or another approach)
- A good reference for the preceding points: http://www.springer.com/gb/book/9783319194240
- Developing something like the Crohn's Disease Activity Index, https://en.wikipedia.org/wiki/Crohn%27s_Disease_Activity_Index, may be useful

- Email Hakmook Kang to talk about the possibility of working through the KC biostatistics core to get an estimate of how many children and timepoints you would need to do the flexible-breakpoint approach discussed in the article
- We also discussed an approach using restricted cubic splines. It's possible that this approach would let you use fewer subjects; it may be useful even though you are expecting a linear relationship

- See https://stats.idre.ucla.edu/stata/dae/ordered-logistic-regression/ for an explanation of proportional odds logistic regression and some helpful language for describing the results

- In deciding which categories to collapse, look at the sample overall (not by complication status)
- To increase power, consider treating the outcome as an ordinal, rather than binary, variable if there are enough people in the additional groups
- Look at the cross-tabulation between physician and sling type to see whether it is feasible to include both
- Leave the continuous variables as is (do not categorize them). May want to consider log-transforming age.
- Try variable clustering to see which variables may be collinear/redundant
- Consider combining less important (less interesting) variables into a score
- For binary logistic regression, we generally want to have 10--20 people in the smaller outcome group for every degree of freedom (continuous variable or single category) in the model
- If you apply for VICTR funding, we recommend the larger time amount if you are interested in a publication or presentation. In your application, you can cite these notes as evidence that you have been to a biostatistics clinic.
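
The 10--20 per degree of freedom rule of thumb above translates into simple arithmetic. A minimal sketch, with a hypothetical function name and hypothetical outcome counts:

```python
def max_model_df(n_with_outcome, n_without_outcome, per_df_low=10, per_df_high=20):
    """Range of model degrees of freedom supported by the smaller outcome group.

    Returns (conservative, liberal): smaller group / 20 and smaller group / 10.
    """
    smaller = min(n_with_outcome, n_without_outcome)
    return smaller // per_df_high, smaller // per_df_low

# Hypothetical example: 40 patients with a complication, 160 without
print(max_model_df(40, 160))
```

Remember that a categorical predictor with k levels uses k - 1 degrees of freedom, so collapsing sparse categories frees up part of this budget.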

- Get more information about the survey design (especially number of people surveyed) so that you can compare the response rates in 2012 and 2016. If they are not close to each other, it will be harder to justify comparing the results of the two surveys
- If possible, get info about demographic makeup of the people surveyed in 2012 and 2016 from the organization's records. If, for example, the mean age of respondents is very different from the known mean age of the people surveyed, you will know that in at least that one aspect, the respondents are not representative of the people surveyed.
- Chi-squared tests should be fine if the categories are exhaustive (but this is secondary to the nonresponse issue)
- If possible, get more info about the outcomes and model specifications used for the regressions in Table 3.

- This project investigates speech-language imbalances in children. We are interested in the best way to measure imbalances using five standardized tests. Simple range scatter and standard deviation have been discussed. We are also interested in the best way to analyze whether increased synchrony between the five tests is associated with a decrease in stuttering frequency based on two years of development.

- The objective of this study was to measure the energy expenditure (oxygen consumption O2/kg/min) of adults practicing common yoga movements. For each individual, participants were asked to do movements in a standing position, lying position, and seated position (body orientation). In addition, each movement was done with different variations serially. In addition, participants were asked to walk at low and moderate intensities to compare energy expenditure of a comparative aerobic exercise to yoga.

- This project investigates differences in skin conductance levels in children who stutter and are persisting, children who stuttered and recovered, and children who do not stutter. All children were followed 3-4 times across a two year period. At each visit, skin conductance levels were measured during a neutral video and speaking task, a positive emotion-inducing video and speaking task, and a negative emotion-inducing video and speaking task. We would like to discuss the best statistical models for our hypotheses.

- Note that at each timepoint, there are 7 skin conductance measures (a "baseline" and 6 other measures)

- Recommendations:
- Keep all possible timepoints from all possible subjects. Do not exclude subjects based on their trajectories or baseline characteristics
- Use continuous versions of the stuttering outcomes if possible; at a minimum, collapse the outcomes into 5 ordinal categories
- Use a longitudinal mixed-effects model. Each subject will contribute 1, 2, or 3 rows depending on how many of the timepoints they have. You can model severity as a function of time-1 severity, age, sex, the seven time-1 conductance measures (or a reduction thereof; try a redundancy analysis first), time in days, and squared time in days, with random effects for subject (and possibly time and squared time). We recommend a continuous-time correlation structure, but this might be tricky with the mixed-effects model; generalized least squares might work better.
- If we can get a clear, simple plan and the analysis is not a multi-step analysis and the dataset is clean (and tall and thin, with the relevant time-1 variables and non-identifying subject ID on each row), we may be able to conduct the analysis during a clinic.
- Starting next month, we will be able to take on longer short-term projects for a charge.
- The Kennedy Center statistics core may also be able to do this. If you come back to a clinic, please remind us to invite Hakmook.

- For each overall question category, try a scatterplot of a) the means and b) the standard deviations for each item, with staff values on the x-axis and parent values on the y-axis (or vice-versa). Label each point with the question number or a short phrase to identify it
- Do variable clustering within the staff items and the parent items, to see which items tend to be answered similarly by the same person (hcavar in Stata)
- Rather than doing several univariate analyses comparing the relationship between the demographic items and each survey item, do a single regression analysis for each survey item, with all the demographic items included in the model at once. Collapse the categorical items into 2 or at most 3 categories, and just assign numeric values (e.g. 1--5) to the levels in the binned *continuous* items like distance and treat those as continuous variables (so they will have just one term in the model). Actually, though, drop distance altogether and just use travel time. The overall F-statistic from the regression will tell you whether anything in the model matters. The best approach would be a proportional odds model, but ordinary regression will be next best.
- It's ok to take the means of means (across items in a particular category) and talk about those, but there aren't enough data points to warrant a statistical test.

- Instead of doing t-tests, do Wilcoxon rank-sum tests (there are only 5 response options)
- Rather than overlaying the parent and staff histograms, show the parent mean as a dot on the staff histograms
- Do the "dot-histograms" by hospital because the hospitals are so different, even if tests comparing hospitals are not significant
- Don't put too much weight on the p-values; this is exploratory research with relatively small sample sizes
- For the two similar staff questions, run a correlation on the responses to help justify using only one of the questions. Use a Spearman rank correlation.
- We don't think it would make sense to take the mean of the responses for the parent "how often" questions
- For any set of questions, it could be interesting to order the means to see which questions had the highest or lowest means, but it wouldn't make sense to do a statistical test comparing the means of the different items.

- Recommend a randomized cross-over study design with double blinding if possible
- Select a side-effect measurement tool
- Clearly state inclusion/exclusion criteria

- "I would like to request some time to talk to another statistician about exploratory factor analysis I am doing in R with the psych package. This procedure is fairly new to me and I have some questions that I would like help with."

- Association between ITSP and illness severity score
- Association between parenting style (PSDQ) and infant adaptation.

- Bladder neck size on incontinence, controlling for BMI, age, preop score, disease status, and stitch.
- Restricted cubic spline examples: MSCI Biostat II STATA

- I'm working on a project involving longitudinal data with children who stutter and persist, children who stutter and recover, and children who do not stutter.

- Matched design, 1:1, 1:many, BOOM
- match on socio-economic, clinical factors, etc.

- Change point analysis
- see if readmission rates change at time of policy implementation

- REQUEST FOR VICTR SUPPORT: Clinic statisticians recommend a 90 hour voucher.

- Developing a randomized controlled clinical trial in mental literacy. Working notion: increasing mental literacy and communication will in turn improve mental health outcomes.
- Plan to submit a concept paper to NIMH (National Institute of Mental Health). There are questions to address, and statistical expertise is wanted.
- Questions: 4 educational arms and a control group, for a total of 5 groups. Setting: community mental health clinics

- Consider cluster randomization. Figure out how many clinics you will have access to. With five arms, note that each clinic receives only one arm.
- How to assess "fidelity"? Record data consistently. One approach is to assess inter-rater reliability.

- Mediation analysis (Baron & Kenny, structural equation modeling). First you need to show that your intervention has an association with response variable. Mediator will be communication for example

- * (Y~X) Education is associated with improved mental health.
- * (X~M) Education works through health literacy and/or communication(Mediators) to improve mental health.
- Will I benefit from a cross-over design? We believe that once knowledge is gained it will be difficult to have a "wash out". A cross-over design is more appropriate for a setup such as the development of a new drug with a clear wash-out.
- Question from biostatisticians: do you need 4 arms? Can you combine some of these educational programs?
- Transient effect: Is it common in the literacy literature? Look into other clinical studies that require behavioral changes, such as in diabetes. There are issues of relapse and maintaining adherence.
- Timeline: Extend the follow-up time to two years to address the "transient effect", although most studies have short follow-up. Can you follow up subjects on StarPanel to show that you can address long-term effects? Need to sit down with statisticians to realistically address the multiple issues. How many clinics do you think that you could have access to? Recruitment time? How many subjects are needed?
- Consider short-term effects and long-term outcomes. Can you design your study pragmatically, without too much effort to collect data? For example, use doctors' entries from the real clinical setup for follow-up assessment.
- Recommendation: Follow up with VICTR voucher and statistician for help with proposal.

- Survey - baseline assessment - residents and attendings - 85 questions
- Additional survey after rotation

- Children previously seen - diagnostic visit; 4y ago; stuttering up to 18m; English is primary language
- New follow-up for status at one point in time
- Baseline variables that originate from continuous measurements (e.g., age at onset) need to be analyzed as continuous variables
- Include baseline stuttering severity as a predictor
- With a maximum of 150 children the maximum number of candidate predictors might be around 10 if the outcome variable is almost continuous (it's worse if outcome is almost binary)
- Stuttering is multi-dimensional, e.g., some children may reduce amount of speaking because of the problem, so they seem to stutter less
- May consider a compound summary of all the outcome measures, e.g., average rank across children; clinical ranking of scenarios can also be used
- Dependent variable needs to have at least 5 frequently occurring levels and be ordered or continuous
- If there is one standout, popular scale, that one could be used by itself
- Empirical variable selection requires an enormous sample size to reliably find the "right variables" so it's best not to use selection procedures; can find various approximations to the model for clinical non-computerized application
- Data reduction methods (variable clustering, principal components, redundancy analysis) can be useful for effectively reducing the number of predictors to use in the multivariable model

- To go over analysis produced by VICTR biostatisticians

- Best to present all the raw data
- Might use 3 quartiles (25th and 75th percentiles and median) as descriptive stats and use Wilcoxon signed rank test for testing for a difference between baseline and 4h
- There are also two types of samples - same study repeated with different samples, sample drug concentration
- Only have 2 patients; plan to have 5 later
- Better to not average over the 3 replicates - may hide variability
- Bland-Altman plot (mean-difference plot) is a good way to show agreement and whether variation is stable over base levels. If band of variability expands going from left to right, this is an indication that perhaps the analysis should be done on the log concentration scale.
- Other useful ways to summarize data: mean absolute difference between estimated and true concentrations - separately by no gel and gel
- Can also show mean absolute differences between replicates ignoring the true concentrations
- There are problems with lower limit of detection, representing missing values that are not randomly missing; ordinary analysis may be problematic
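
The agreement summaries above can be sketched as follows. This is a minimal illustration with hypothetical function names; it computes the points for a Bland-Altman plot, the conventional 1.96-SD limits of agreement (a common convention, not something specified in the notes), and the mean absolute difference. It does not handle values below the lower limit of detection, which need separate treatment.

```python
from statistics import mean, stdev

def bland_altman_points(estimated, true):
    """(pair average, estimated - true) for each sample: x and y of the plot."""
    return [((e + t) / 2, e - t) for e, t in zip(estimated, true)]

def limits_of_agreement(estimated, true):
    """Return (lower limit, bias, upper limit) using bias +/- 1.96 * SD of differences."""
    diffs = [e - t for e, t in zip(estimated, true)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias - spread, bias, bias + spread

def mean_absolute_difference(estimated, true):
    """Average size of the error, ignoring its direction."""
    return mean(abs(e - t) for e, t in zip(estimated, true))
```

Running these separately for the no-gel and gel samples, and on log concentrations if the spread widens with the level, follows the suggestions above.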

- Interested in variation over time within patient
- Variants are summarized into polygenic risk scores
- Difficulty in interpreting results if patients are being treated for the lab abnormality being studied
- How to define time zero?
- May want to ignore records corresponding to post-Rx periods
- Started with HDL
- Side study: confirm that med initiation that is supposed to modify HDL really does
- Simplest longitudinal analyses:
- Compute within-patient Gini's mean difference to correlate with gen. risk score; asks whether gen. risk is correlated with variability
- Similar but summarize with the median to correlate gen. risk with overall height of the longitudinal records
- Summarize entire longitudinal record with slope and intercept; AUC and relate summary measures to gen. risk score
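
The per-patient summaries listed above can be sketched as follows. This is a minimal illustration with hypothetical function names; Gini's mean difference is computed as the average absolute difference over all pairs of a patient's measurements, and the slope and intercept come from ordinary least squares on time. AUC is omitted for brevity.

```python
from itertools import combinations
from statistics import mean, median

def gini_mean_difference(values):
    """Mean absolute difference over all pairs of distinct observations."""
    pairs = list(combinations(values, 2))
    if not pairs:
        raise ValueError("need at least two measurements")
    return mean(abs(a - b) for a, b in pairs)

def summarize_patient(times, values):
    """Variability, level, and least-squares trend for one patient's record."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values)) / sxx
    return {
        "gmd": gini_mean_difference(values),
        "median": median(values),
        "slope": slope,
        "intercept": v_bar - slope * t_bar,
    }

# Hypothetical HDL record: measurement times in years, values in mg/dL
print(summarize_patient([0, 1, 2], [40, 42, 44]))
```

Each summary can then be correlated with the genetic risk score, one row per patient.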

- Would be useful to summarize the data using representative patients after clustering on mean HDL, shape, number of observations, maximum time gap between any two measurements
- Another type of analysis: summarize each patient using the 9 deciles of HDL; use these deciles to predict polygen. risk score
- Does not take time ordering into account
- Might add a slope or shape summary to the deciles

- I am a physical therapist in the Sports Medicine outpatient department and we are planning two studies that we would like to discuss. Primarily, though, we would like to discuss a prospective observational study we will be performing this coming school year with overhead athletes – we will be looking at the relationship of core strength to the likelihood of shoulder injury in overhead athletes. We plan to test the athletes’ core strength at start of their season and then collect data on injuries and time lost from playing their sport during the season. Specifically, we have questions about what our number of subjects should be in order to determine a difference and what we will need to do statistically in order to analyze the data.
- Outcomes: number of days (or proportion) lost during the season due to shoulder injuries
- Need information on the proportion of athletes who would get shoulder injury during a season. Sample size needed would be large if the proportion is very low.
- Could use logistic regression to examine association between core strength and incidence of injury
- Consider other factors that could affect shoulder injury such as the type of sport, number of years practicing, etc. These factors can be adjusted for in the regression model.
- To calculate the sample size, need to specify the outcome, type of analysis used, the meaningful difference (effect size: odds ratio of injury upon one unit change in core strength) you want to detect, and some preliminary data on the outcome measurements (rate or variation). A rule of thumb: 20 cases of injury are needed for each factor you'd like to analyze.
- Consider choosing a type of sports with the greatest association between core strength and shoulder injury.
- how to quantify core strength, a single summary score?
- A second study I am wondering about is an Anterior Cruciate Ligament Reconstruction study where we are going to compare a group of patients in a home-based program versus standard care (control). We want to do a feasibility study this year in our clinic, and I think it will be a prospective case-control study, or maybe a prospective cohort study. We also want to know about N size and analysis afterward.
- Enroll 7 patients in one month. Feasibility study.

- Parkinson's disease - norepinephrine; VICTR application
- Original intention peripheral blood pressure support
- Interested in a combined medication regimen
- Goal to get nor. into CNS
- Propose to study n=16 patients
- Need dose titration 100mg bid -> 600mg 3/day
- Which dose do patients tend to end up with?
- Is a safety & tolerability study, partly dose-finding
- Patient response that is monitored is blood pressure - minimizing orthostatic symptoms without side effects; target supine BP plus headaches, dizziness, mania; symptoms are of primary emphasis
- Is there an accepted symptom summary scale? If not may need to just count the number of symptoms present
- But dose adjustments are clinical adjustments based on a symptom "gestalt"
- Target for analysis is final dose
- Need SD of dose; best available data will probably come from what doses are used long-term in clinical practice; we'll assume this is a stand-in for the final tolerable dose
- Once a useful SD estimate is found, it can be used to compute the likely margin of error in estimating the population mean required dose when n=16, with say 0.95 confidence. The margin of error is the half-width of the confidence interval.
- Would be good to know what evidence exists for the usefulness of plasma drug concentrations in estimating the final required dose
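
The margin-of-error calculation for the mean final dose can be sketched as follows. This is a minimal illustration: the SD value is hypothetical (it would come from long-term clinical-practice doses, as noted above), the function name is an assumption, and the t quantile is hard-coded for n=16 (15 degrees of freedom) because the Python standard library has no t distribution.

```python
import math

# 0.975 quantile of the t distribution with 15 degrees of freedom (n = 16 only)
T_975_DF15 = 2.131

def margin_of_error_mean(sd, n=16, t_quantile=T_975_DF15):
    """Half-width of the 0.95 confidence interval for a population mean.

    Pass a different t_quantile if n is changed from 16.
    """
    return t_quantile * sd / math.sqrt(n)

# Hypothetical SD of final dose: 100 mg
print(round(margin_of_error_mean(100), 1))
```

With n=16 the margin is about 0.53 SDs, so the precision of the estimated mean dose depends directly on how variable final doses turn out to be.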

- PQI project. Two types of images (new vs. old method) were performed for each patient.
- Examine the agreement between the two methods based on the paired data (kappa stat). Readings are ordinal values.
- Let a few radiologists read the two sets of images in random order to study the agreement.
- May need a couple hundred patients and a few (2 to 6) radiologists. (We also want good agreement between radiologists; that is, readings from a given method should not depend heavily on the radiologist's experience.)
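
Because the readings are ordinal, one option is a weighted kappa, which gives partial credit for near-misses. The sketch below is an illustration rather than the analysis the clinic specified: the choice of linear weights, the function name, and the category coding are all assumptions.

```python
def weighted_kappa(ratings_a, ratings_b, categories, weight="linear"):
    """Cohen's weighted kappa for two sets of paired ordinal readings.

    categories must list every possible reading in order, lowest to highest.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)
    table = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        table[index[a]][index[b]] += 1
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weight == "linear" else d * d  # otherwise quadratic

    observed = sum(disagreement(i, j) * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(disagreement(i, j) * row_tot[i] * col_tot[j]
                   for i in range(k) for j in range(k)) / n ** 2
    if expected == 0:
        raise ValueError("kappa is undefined when all readings fall in one category")
    return 1 - observed / expected
```

The same function can compare the new and old methods within a radiologist, or two radiologists reading the same method.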

- Mary-Margaret Fill, TDH EIS
- Neonatal abstinence syndrome and long term outcomes
- Merge TennCare data with educational data
- Suggest regression model with traditional covariate adjustment unless need to do special matching (family, neighborhood)
- Biggest assumptions: children move away from TN for reasons unrelated to potential educational achievement
- Confounding: women giving birth to infant with NAS may tend to be different from those not having an NAS child; need to adjust for all factors related to this that might be associated with educational outcome
- Also what is the effect of school on test scores?
- Birth records have mother's educational level, zip code, tobacco use
- Matching records may be challenged by mother changing last name
- Might also look at infant and mother utilization of services, diagnosis of ADHD, etc.; cross-correlate with educational achievement

- See Chapter 8, P. 8-12 of http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf - suggest using the r=0 curve. This approach is using the margin of error based on 0.95 confidence limits. E.g.: "With a sample size of N subjects we can estimate the correlation coefficient between two variables to within a margin of +/- xx with 0.95 confidence (see graph)."
- Important to prioritize the comparisons and to report them in this pre-specified order so that no multiplicity corrections will be needed
- A regression model that allows for interaction between time since trauma and amount of trauma would allow for estimation of the time-decay or enhancement of memories-effect. The time interaction effect may be nonlinear.
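
The margin-of-error statement for a correlation coefficient can be approximated with Fisher's z transformation. This sketch assumes that approximation at r=0; the curve in the referenced graph may have been computed differently, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_margin_at_zero(n, conf=0.95):
    """Approximate half-width of the confidence interval for r when r = 0."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.tanh(z / math.sqrt(n - 3))

def n_for_correlation_margin(margin, conf=0.95):
    """Smallest n giving at most the requested margin of error at r = 0."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z / math.atanh(margin)) ** 2 + 3)

print(round(correlation_margin_at_zero(100), 3))
print(n_for_correlation_margin(0.2))
```

The r=0 case is the widest interval, so planning with it is conservative for any true correlation.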

- Animal model for exposure to stress, looking at differential response to stress
- Interested in susceptibility to stress
- Measure of anxiety is a key measure (high = more anxious)
- Each animal has a baseline measure
- Would be good to do a Tukey mean-difference plot (Bland-Altman plot) to be sure that the delta is an adequate summary of the two measures
- Also watch for floor and ceiling effects

- Using the delta as a continuous stress response measure will optimize power and minimize arbitrariness
- Discussed regression to the mean
- Problem with choice of anxiety measure out of many
- A composite measure may help, e.g., average z-score or average rank; can do Spearman rho rank correlation on the result, against another variable; can describe variability in ranks across anxiety measures
- Otherwise analyses of disparate measures can be hard to reconcile
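
The average-rank composite suggested above can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes every anxiety measure is oriented the same way (high = more anxious) and that each list holds one score per animal in the same order. Ties receive the average of the ranks they span.

```python
from statistics import mean

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks

def composite_rank(measures):
    """Average rank per animal across all measures (dict: name -> scores)."""
    ranked = [midranks(scores) for scores in measures.values()]
    return [mean(animal_ranks) for animal_ranks in zip(*ranked)]
```

The spread of an animal's ranks across measures (for example their range) describes how consistently the measures order that animal.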

- Shade Tree Clinic, where patients who have no insurance or not enough insurance can get medical services.
- Primary outcomes: number of ER visits, hospital length of stay. Will compare before and after pts visited the clinic.
- N=680 patients and estimate to have ~300 meet inclusion (time span between first visit and last visit greater or equal to 1 year).
- Need estimate for VICTR application. Suggest for $5000.

- Need a quote for biostatistical support for a VICTR grant submission
- Want to assess the association between brain tumor grade (total 128, 93 1s and 35 2s) and gender, age at diagnosis, Edema (0-3), draining vein, necrosis, location (8 different location).
- Will apply for VICTR voucher in amount of $2000.

- We will specifically be seeking some guidance regarding graphical representation of data related to statin doses in children and adolescents.

- Discussed analysis for reviewer's comments

- The purpose of the study was to evaluate whether patients with single ventricle physiology undergoing the second stage of surgical palliation, whose length to weight ratio was >90% were at higher risk for increased ICU length of stay, ventilator times, and increased non-invasive ventilation when compared to those whose length for weight was <90%. Analyzing the data with the Mann-Whitney U Test there was a statistically significant difference between ICU length of stay and ventilator hours for those with weight for length >90% compared to those <90%. However, I attempted to analyze the data again with Spearman’s to see if there was a correlation with increasing z-score percentile, and there was no statistically significant correlation.
- Clinic question: Has the data been analyzed appropriately to answer the question? Should I be concerned that Spearman’s correlation did not show a statistically significant correlation between the variables even though there was a statistically significant difference between the groups? Should I use and how might I best demonstrate association or risk related to weight for length z-score >90% with linear regression?

- I am designing a study for a small group of human subjects to test the feasibility of a new tool that I designed for breast cancer assessment using medical images. I would like some guidance on effective study designs for a small number of patients and for determining the accuracy of a new tool when there is no current clinical equivalent to compare to.
- Need a measurable outcome to calculate the required sample size

- The csv consists of sample ID, the covariates I want to test (age as an integer and categorical variable; poor.risk through transcription, which are all categorical variables; and num.muts, which is an integer) and the OS and PFS data (for censoring rows, 0=censored and 1=dead). I would like to include the interaction between age and poor.risk, because I have biological reason to believe that that interaction is relevant. My questions concern: measuring goodness of fit of the model; how to interpret the interaction term; how to estimate power, given the large number of covariates and small sample size

- "If possible I would like some help interpreting results of 2 Wilcoxon Rank Sum tests in which one is significant and the other is not."
- Compare

- The goal of the project is to examine the impact that the palliative care unit has had on the medical intensive care unit in terms of patient length of stay and mortality. I have collected data regarding some parameters pre and post opening of the palliative care unit. I am interested in the best approach in analyzing the data.
- Have data a year before and a year after the unit opened. Want to compare LOS and mortality in MICU. Both groups had palliative consult, only some patients after went to the palliative care unit.
- Will apply for VICTR biostat support. Suggest $5000 for the study.

- GDM project analysis. Association of household income and education with the five primary endpoints.

- To discuss experimental study design and data analysis for a project within REDCap
- Two REDCap databases can be merged based on a common identifier.

- R questions about fitting logistic regression model and plotting the figure.

- I am working with the data from the National Comorbidity Survey Replication, a nationally representative sample used to estimate prevalence rates of psychological disorders. I have questions about what types of analyses to use with a complex sampling design that includes strata, clusters, and weights.

- Use a subpopulation command instead of subset analysis.

- Stratified cluster randomized trial.
- Intervention group: predicted mortality risk score obtained for all the patients, based on which "top patients" will be provided with hospice and will be expected to get better life quality. Control group: standard care. Individual agencies (50 in total) will be randomized to intervention/control group.
- Cutoff of risk score may vary within and across the sites. Information obtained from prediction model: median life expectancy, probabilities of death during certain time periods.
- If the primary outcome is continuous, would need SD to calculate sample size. If it's the time to event, we will need expected median time in each group.
- Consider the flexible/sequential design, having pilot sites included in the final analysis.
- Biostat resources: VICTR Voucher (35 hours). Dr. Matt Shotwell

- Apply for Voucher (90 hours)
- Have collected car collision/victims data, demographics of the passengers, road characteristics.
- Define collision as fatal vs serious.
- Aim to develop a prediction model to predict the severity of collision based on location, time, etc.

- Our study is on incidence of eye disease seen at Vanderbilt. We have data on 33,000 patients looking at incidence of disease and I would like to discuss how to best analyze this data.
- Whether incidence at Vanderbilt can represent incidence in Nashville

- I was hoping to come to biostats clinic today to get some help with sample size calculations for my project.
- Cross over design. Within subject correlation is 0.7. Need power calculation.
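A rough sketch of how the within-subject correlation enters the crossover power calculation. It uses a normal approximation and ignores period and carry-over effects; the effect size `delta` and the SD are hypothetical placeholders, and only the correlation of 0.7 comes from the note above.

```python
import math
from statistics import NormalDist


def sd_of_paired_difference(sd, rho):
    """SD of a within-subject difference when both periods share SD `sd` and correlation `rho`."""
    return sd * math.sqrt(2.0 * (1.0 - rho))


def crossover_n(delta, sd, rho, alpha=0.05, power=0.80):
    """Approximate total number of subjects for a two-period crossover (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    sd_diff = sd_of_paired_difference(sd, rho)
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


# Hypothetical inputs: detect a difference of half an SD; rho = 0.7 as in the clinic note.
n_subjects = crossover_n(delta=0.5, sd=1.0, rho=0.7)
print(n_subjects)
```

A higher within-subject correlation shrinks the SD of the paired difference, which is why the crossover needs fewer subjects than a parallel-group design with the same effect size. A t-based calculation would give a slightly larger number.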

- Power analysis of survival analysis

- "I plan to implement a different type of interview in the first episode psychosis outpatient clinic at Vanderbilt Psychiatric Hospital and investigate how it contributes to improve adherence and management. The type of interview is called Shared decision making approach which is a little bit different to what we are used to. I am planning to train the MDs and providers on this technique and then compare measurable outcomes before and after the training. The outcomes would be no-show clinic rates, hospitalizations, etc. (things that are recorded automatically on the patient's chart). "
- Statistical tests: Wilcoxon signed rank test for continuous outcomes, and McNemar's test for binary outcome
- Primary outcome: number of times that pt did not show up within 3 months, ranging 0-4. Proportional odds logistic model to analyze. Number of no-shows after intervention ~ number of no-shows before intervention + age + gender
- Sample size calculation: use PS for paired binary outcome

- Want to correlate the resistance to therapy based on imaging with cell signal
- Wilcoxon signed rank test (in place of the paired t-test); Wilcoxon rank sum test (in place of the two-group t-test)
- Mixed-effects model to adjust for other covariates.

- Name of project: Parenteral Protein Calculator (PPC)
- Type: Randomized controlled clinical trial, un-blinded
- Help needed: Discussing the primary and secondary outcomes, designing the database
- Study status: IRB approved, enrollment starting next week
- Research question: the effect of intervention on the accuracy of protein prescription. The primary endpoint is the ratio of target days to total days (target days are the days when prescriptions are given with correct amount).

- My research topic is on the effect of statins on non-alcoholic fatty liver disease
- retrospective cohort study.

- I am interested in discussing sample size calculations. We are conducting a clinical trial evaluating the effectiveness of a prophylactic antiepileptic drug (levetiracetam) in brain tumor patients. For 14 days following surgery, patients will be randomized to either drug or no drug. The primary outcome is the development of a clinical seizure and the follow-up time to primary endpoint is 14 days.

- I would like to address a few questions regarding sample size calculation for a translational study on the role of alternate complement activation in sickle cell lung disease

- Retrospective cross-sectional study on heart failure patients.
- Outcome is the low potassium, related to urine output per hour.

- Criteria for giving EKG to diagnose STEMI.
- Trigger criteria: typical symptom, atypical S

- Compare CT values between four groups.
- Use nonparametric tests: Kruskal-Wallis test (in place of ANOVA), Wilcoxon rank sum test (in place of the two-sample t-test)

- sample size calculations for a grant proposal

- Had questions about VICTR proposal review. Suggest use Wilcoxon Rank Sum test or Wilcoxon Signed Rank test to compare between and within subjects
- Try to identify subset of b-cell in this set of subjects - will be able to provide descriptive statistics
- Consenting 60 subjects is estimated to yield 30 subjects. Will quantify b-cell and compare b-cell between two different locations. First get the percentage of b-cell in the mixture, then calculate the absolute number of b-cell per gram of tissue.
- Will find SD from preliminary data and calculate required sample size based on that.

- retrospective chart review of 1750 patients. correlation between screening exam results with 15 diseases. 923 patients had actual visits within two years (gold standard of disease).
- Analysis data set: two-by-two tables based on 923 patients. Compare demographics between the 923 patients and the remaining 827 (1750-923) patients.
- I am currently finishing a research project that is regarding various diagnoses that are able to be picked up on a screening exam (for diabetic retinopathy). To this point, I have calculated the following values for the 16 diagnoses of relevance: true positives/negatives, false positives/negatives, positive/negative predictive values, and sensitivities/specificities. However, I am unsure what the best test is to determine statistical significance or importance of these numbers--eg, do I use a 95% CI, odds ratio, etc. One issue with these results is that although I have a very large sample size for the initial screened population (over 900), many of the diagnoses have less than 5-10 true positive results.
- Zero or close to zero number in certain cells. Wilson confidence interval. binom.confint() of binom package.
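The notes point to `binom.confint()` in R for the Wilson interval. As a language-neutral illustration of what that interval computes, here is a minimal standard-library sketch; the function name `wilson_interval` and the example counts are assumptions for illustration, not study data.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, conf_level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical cell counts: 0 true positives out of 7 diseased patients.
low, high = wilson_interval(0, 7)
print(round(low, 3), round(high, 3))
```

Unlike the simple Wald interval, the Wilson interval does not collapse to zero width when a cell count is 0, which is the situation described for the rarer diagnoses.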

- We are working on a grant and we have some questions about a power calculation for a Repeated Measures ANOVA (based on effect size from a previous study).

- Discussion about microstimulation data to develop a test of the hypothesis that stimulating two areas in the brain from which evoked movements differ produces a blend of those movements (endpoint neuronal encoding)
- Need help understanding how to organize the data in order to build a model to explain physiological results (e.g., how the dual stimulation sites interact)
- Suggest applying for a $5000 VICTR voucher.

- I am requesting assistance in figuring out statistical significance. We see a trend in the data with the diagnosis of chronic lung disease leading to increased risk of death after trach placement vs other diagnosis.
- Babies in NICU, outcome is alive/died, want to compare chronic lung disease to other diagnoses.
- There were ~15 diagnoses, among whom 12 had chronic lung disease.
- Total 115 babies (25 died in NICU). Primary outcome is the death in NICU. 8 (or 11) babies who had lung disease and died.
- Plot Kaplan-Meier curve first for description, use log-rank test.
- Can use a Cox proportional hazards model to analyze the association between lung disease and survival in NICU.
- Could also apply for a $2000 VICTR voucher.
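To make the Kaplan-Meier step concrete, the sketch below computes the product-limit estimate by hand. In practice a survival package would be used; the toy times and event flags here are made up and are not the NICU data.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events: 1 = died, 0 = censored (e.g., discharged alive).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in days for three babies.
curve = kaplan_meier([1, 2, 3], [1, 0, 1])
print(curve)
```

Plotting one such curve per diagnosis group gives the descriptive picture, and the log-rank test then compares the curves.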

- My project is looking at radiation-induced atrial fibrillation, specifically in patients with breast and lung cancers. I have raw data extracted from the Synthetic Derivative and am hoping for some guidance regarding my data analysis plan and how I might be able to best display my data.
- There are ~3000 breast cancer pts (125 had AF), ~2000 lung cancer pts.
- To test the association between radiation and AF, include all pts (y=AF, x=radiation y/n, cancer side); then take subset of pts who had radiation, fit a model of radiation dose/side with AF.
- Length of follow up is different for all patients. Can use survival analysis. If certain proportion of pts died before developing AF, should treat those pts as competing risks events.
- If applying for a VICTR voucher, suggest $5000.

- Email: My research is investigating statin dose intensification according to the ACC/AHA 2013 Cholesterol Guidelines in post-ACS patients. I am interested in performing logistic regression analysis on ~300 patients and potentially Spearman rank r correlation coefficient.
- Two groups: historic control and intervention group. Binary outcome. Primary aim is to assess the outcome difference between groups.
- Chi-sq test and multivariable logistic regression can be used to test the primary hypothesis.
- Suggest propensity score adjustment.
- Will apply for a VICTR voucher in the amount of $2000.

- Email: I would like to request a biostats clinic reservation on Monday 5/11 from 12-1 for a comparative effectiveness research project. The main question is selection of the best primary outcome to maximize power for a population size that will be fixed (secondary to funding and patient enrollment). Our second question is the best statistical analysis method for 3 independent continuous variables (ANOVA vs the 2 experimental groups independently compared to the standard of care comparison arm). Please let me know if you would like me to send anything in advance.
- Use continuous outcome to maximize power
- Wilcoxon rank sum test to compare two new treatment groups to the standard care group
- Multivariable linear model adjusting for baseline weight and treatment regimen

- Email: We will be completing a retrospective chart review looking at pregnancy and delivery outcomes in women with gestational diabetes. We plan to use a REDCap database for data entry.

- I am a third year medical student working on a research project that is evaluating the teleretinal imaging program at the Nashville VA Hospital. I attended one of your clinics about a month and a half ago and greatly appreciate the help I received at that time. I have now completed my data collection and am moving on to the data analysis portion of the research, and would like to discuss my revised project with you to see what the best way is for me to proceed.
- As an overview, I am looking into the teleretinal screening program to evaluate its efficiency and its accuracy at diagnosing abnormalities other than diabetic retinopathy (the true purpose). I have recorded the data on the following topics:
- Demographics (Age, sex, ethnicity)
- Months from consult entry to screening
- Days from screening until note loaded to chart
- Screening diagnoses, diagnoses found at subsequent visits, and diagnoses found at previous visits
- No-show rate for the screenings
- Consult timing
- Months since prior screenings and clinic visits

- Had imaging readings and clinic diagnosis on ~1700 subjects. There were 18 diagnosis categories, looking at their agreement.
- Will apply for VICTR voucher. Suggest $2000 for up to 35 hours

- NICU data analysis
- time trend of gestational age when receiving ECMO (Y2004-2014) for C-section babies. To evaluate the effect of policy change (increase gestational age for C-section baby in 2007) on ECMO.
- Only have the information on birth year available. Fit a linear regression model
- Also have the information on the total number of all ECMO babies. With an assumption that the proportion of C-section babies remains the same, could fit a Poisson regression model.

- I have a retrospective dataset of patients who underwent a new cochlear implant programming procedure. The data contain pre- and post-intervention objective performance data, demographic data, and information about the cochlear implant type and location. I am trying to develop model(s) that can answer the following questions: 1) How can we predict whether a patient will be a responder to re-programming? 2) Which variables are most predictive of change in performance from baseline?
- 177 patients.
- Endpoint: measurement performance (0-100)
- Predictors: 15 ~ 20
- Fit a multivariable linear regression model. Predictor importance can be measured based on the model.

- We attended a biostats clinic on February 23rd to develop a statistical plan. Now that we have a dataset completed, we are having difficulty with our regression models and would appreciate your input.

- I would like to request a methods clinic (to review my methods) for a retrospective chart review study on female college athletes and stress fractures I am writing an IRB for.

- Study of diet intervention, body composition, insulin resistance, lipo.
- Could apply for a VICTR voucher of $4000.

- Frank's note: Design is confounded with time/fatigue/learning. Also there is little precedent for doing a pre-post study with such little time between pre and post. I think you will need to do a randomized study to attribute any effect to the intervention. Randomize 1/2 of families to get the intervention, 1/2 to get the prevailing treatment, and give survey at the "after" time point for both groups.

- TB clinic in Nashville, 203 cases (information on case only) in year 2013.
- Treatment: completed vs not completed (refused, lost to follow up, etc.)
- Research question: 1. treatment completion rate. 2. the association between patient's characteristics and treatment completion.
- Prepare data set as http://biostat.mc.vanderbilt.edu/wiki/Main/DataTransmissionProcedures
- Is there an association between country of origin and acceptance of treatment (accepted vs. not accepted)?
- Apply logistic regression analysis (dichotomous or binary response variable) and include the variables of interest.
- General rule of thumb: the smaller sample size/10 (the size of the less frequent outcome group divided by 10) helps you assess your regression power, i.e., how many variables you can include in the regression model.
- N=48 refused treatment; this will be the limiting sample size in the regression analysis.
- Country of origin is the main factor; will have to think of the best way of grouping.
- The covariates of interest: age (continuous, non-linear), gender, marital status (married vs. non-married), and country of origin.
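The rule of thumb above can be written out as a small calculation. This is only a sketch: it assumes all 203 cases have a known outcome, so the comparison group is 203 - 48 = 155, and the helper name `max_candidate_parameters` is made up for illustration.

```python
def max_candidate_parameters(n_events, n_nonevents, per_parameter=10):
    """Rule-of-thumb parameter budget: size of the smaller outcome group divided by 10."""
    limiting_sample_size = min(n_events, n_nonevents)
    return limiting_sample_size // per_parameter


n_total = 203    # TB cases in 2013 (from the notes)
n_refused = 48   # less frequent outcome group (from the notes)
budget = max_candidate_parameters(n_refused, n_total - n_refused)
print(budget)
```

The budget counts parameters, not variables: a non-linear age term or a country-of-origin factor with several levels each uses more than one, which is why the grouping of countries matters.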

- Total number of instruments per tray is 25-100; usually less than 50% are used.
- Will compare unnecessary cost between specialties
- Will apply for VICTR voucher. A standard $2000 is appropriate.

- Patients who underwent liver transplant and had a plastic stent to treat a leak; about 20-30% needed a metal stent later
- Want to predict early whether the patient needs a metal stent so the pt does not need to suffer pain
- The current data only give the need for a metal stent conditional on already having had a plastic stent
- Suggest do descriptive statistics and plan bigger study to develop prediction model
- Use R for internal validation and calibration using bootstrapping method (rms package)

- Want to know the relationship between Cortisone treatment and bacterial change.
- Each subject will be his own control: cortisone on one arm and no cortisone on the other. Each arm will be tested at two sites, one normal skin and one tape stripping skin. Observe bacterial change. Therefore, each subject will have 4 tested samples and each sample measured twice (total 8 per person)
- Look at treatment effect on normal skin. Suggest amount of $2000.

- There are limitations of pre post design. Many factors will affect the outcome besides the intervention like time.
- Box plot with raw data to explore the distribution
- Can use Wilcoxon signed rank test to compare continuous outcomes before and after
- Consider ANCOVA (Analysis of Covariance) to analyze post while adjusting for pre-specified covariates like previous experience

- Retrospective study of post renal transplant patients. Follow those patients for two years to observe a rare event.
- Describe users' characteristics. Some pts took medication for the entire 6 months, some stopped before 6 months for certain reasons, some retook it later. Can consider using a certain amount of time to define a user.
- Binary/categorical variables can be described as frequency and percentage

- Logistic regression model with robust standard error is appropriate.

- Could include baseline RSA if collinearity is not an issue.

- Generalized Linear Regression with Negative Binomial Distribution is good.

- Probably.

- Not a real outlier

- Take into account the correlation within each subject.
- Might have carry-over effects between different periods. Could test for equal carry-over effects.

- Outreach for engineering education
- looking at data from engineering camp for girls (looking for changes in self-efficacy)
- self-efficacy: the feeling that you can accomplish something in your life (this scale has been validated)
- some girls have participated for one year, some have participated for about three years

- also interested in differences in pre-post scores for the one year attended by student

- descriptive statistics: consider summary statistics across different categories (i.e. pre and post scores by different school types, year of study, grades)
- Consider repeated measures type analysis (longitudinal data analysis) for assessment of slope over time (year) of self-efficacy variable
- Per question of interest - may need to reformat data to “long style or vertical format” (i.e. have row 1 id=1: 2012 post self-efficacy score, row 2 id=1: 2013 post, row 3 id=1 2014 post self-efficacy score, etc. for each of the girls)
- adjust for age, school type (consider the role of additional potential confounders)
- Consider applying for VICTR funding for assistance in repeated measures type of analysis.
- Need to account for the correlated nature of data and verification of assumptions (such as in Mixed effects modeling or generalized least squares)
- Account for the missing data
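The "long style or vertical format" mentioned above can be illustrated with a small reshaping sketch. The column names (`post_2012`, etc.) and the scores are hypothetical; the real data set may be organized differently.

```python
def wide_to_long(rows, id_key, value_keys):
    """Turn one row per girl into one row per girl-year.

    value_keys maps a year to the wide-format column holding that year's score.
    """
    long_rows = []
    for row in rows:
        for year, key in value_keys.items():
            value = row.get(key)
            if value is None:
                continue  # no record for a year the girl did not attend
            long_rows.append({"id": row[id_key], "year": year, "post_score": value})
    return long_rows


# Hypothetical wide-format data: one row per girl, one column per year.
wide = [
    {"id": 1, "post_2012": 30, "post_2013": 34, "post_2014": 37},
    {"id": 2, "post_2012": None, "post_2013": 28, "post_2014": None},
]
long_format = wide_to_long(wide, "id", {2012: "post_2012", 2013: "post_2013", 2014: "post_2014"})
for record in long_format:
    print(record)
```

Longitudinal methods such as mixed-effects models work from this layout and use whatever years each girl contributed, though the reason scores are missing still needs to be considered.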

- Limitation: lack of control group (there is no way to conclude that the program is the only thing that is improving self efficacy)

- pre vs. post score for any given year
- consider doing boxplots for each of the pre and post scores for each year (these can serve as your summary statistics)
- Univariate analysis: Wilcoxon Signed rank test to see if there is a difference between the distributions of pre and post scores (data in horizontal format works)
- cautioned combining the pre-scores over the three years, and post scores over the three years (year may be a confounder and impact trend of the data)
- pre vs. post study (may see a difference, however no guarantee that improvement is from the program; not a randomized controlled trial)
- Motivated and selected group of girls and may have higher self-efficacy baseline score (pre) - consider comparing self-efficacy scores with those reported in other studies among girls.

- Two primary endpoints; the one with the greatest variability (glucose infusion rate needed to maintain desired blood glucose level) will be the conservative one to plan for
- 10 with type I DM, 10 without
- Other covariates: age, insulin required to maintain blood glucose, HbA1c
- Baseline liver glycogen assessment
- Start with 3 hour fructose infusion to stimulate liver glucose uptake vs. saline infusion (randomized), then insulin infusion, then 2 hour period where subjects become hypoglycemic (using clamp)
- Need a good estimate of the standard deviation across patients for infusion rate - use the dog data taking all relevant time periods and stratify by liver glycogen to compute 12 SDs; then we can compute an average SD by averaging the variances and taking the square root
- Need clinically relevant difference (in mean infusion rates) not to miss: estimate 1 mg/kg/min
- Language for grant application something like: The power calculation was based on a 2-sample t-test without covariate adjustment for HbA1c, age, etc. The actual statistical test will be ANCOVA adjusting for these factors, which will increase the actual power a bit (increase would be more had the sample size been larger; the sample size chosen has a penalty for estimating the effects of the baseline covariates).
- Last aim: the most general way to assess this is to fit a smooth function of time to the longitudinal (serial) measurements, separately for each of two groups, and test for differences in shape of the two curves. A convenient choice is to fit a quadratic function of time to each curve. This increases power over individual time point tests. Suggested statistical method: generalized least squares or mixed effects linear model.
- Suggested contacting Li Wang to tell her that a VICTR voucher is in the works
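A sketch of the SD-averaging step and a rough power check for the planned 10 vs. 10 comparison. The 12 SD values are not in these notes, so the numbers used are placeholders; the power function uses a normal approximation, which is somewhat optimistic compared with a t-test at this sample size.

```python
import math
from statistics import NormalDist


def pooled_sd(sds):
    """Average the variances (equal weights) and take the square root."""
    variances = [s * s for s in sds]
    return math.sqrt(sum(variances) / len(variances))


def approx_power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return NormalDist().cdf(delta / standard_error - z_alpha)


# Placeholder SDs standing in for the 12 stratum-specific values from the dog data.
example_sd = pooled_sd([3.0, 4.0])
# Difference of 1 mg/kg/min (from the notes) with a hypothetical SD of 1 and 10 per group.
example_power = approx_power_two_sample(delta=1.0, sd=1.0, n_per_group=10)
print(round(example_sd, 3), round(example_power, 3))
```

Equal weighting of the variances is reasonable when the strata have similar sizes; otherwise weighting by degrees of freedom would be the usual refinement.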

- Want to assess the impact of the implementation of an ASP on antibiotics use.
- Monthly antimicrobials (AMs) use in days from 2009-2012 April. Data from many hospitals including Vanderbilt. Want to compare Vanderbilt to ALLCHA.
- ASP intervention started 2012 March at Vanderbilt. Can see less use of AMs after intervention.
- The comparison of pre and post might be biased by other factors like time not just by intervention. Institution effect is hard to assess since all institutions started intervention at different times.
- Also needs to adjust for other factors like date for seasonal effect.
- Linear model of VCG ~ intervention + rcs(time)
- Better to have individual data for all the hospitals which had both pre and after data to assess intervention effect using mixed-effects model, or just compare between hospitals using data after intervention to see whether Vanderbilt does better than others
- Consider getting Vanderbilt's rank among all CHA

- wants to do a pilot study to get preliminary results for a grant submission.
- requesting data from Southern Community Cohort Study (SCCS). Needs power analysis and statistical plan for the data request.
- applying for VICTR biostats support for funding for this prelim project. Needs estimate.
- about 3200 men enrolled. max follow-up 10 years. about half finished the whole study period.
- Prediction of screening frequency by baseline characteristics. Association between prostate cancer stage and frequency of screening.
- all patient self-reported data, at 5 year and 10 year. (have you had screening within the last year?)
- GEE model of screening frequency (recent screening yes/no at 5 year, 10 year) on age, race, interaction between age and race, ...
- Ordinal logistic regression model of prostate cancer stage/grade on screening frequency (needs to be carefully defined) prior to diagnosis. Need to consider different follow-up of the patients.
- Contact Li Wang (li.wang@vanderbilt.edu) for budget estimate.

- How can I determine the required sample size (i.e. number of subjects or raters) for interval estimation of the Kappa statistic for an intraobserver and interobserver study with multiple raters? Our number of subjects is currently 20 (N=20) and our current number of raters is 27 (n=27). Further, we are hoping the given sample size will give at least 80% power at the 0.05 level of significance (two-sided).
- Use the R package kappaSize: library(kappaSize)

- Email: I am submitting an early career grant for a starter type project due August 1 and needed help with performing and writing up power/sample size calculations.
- Specific Aim #1: identify group of lupus patients of about 1135. Lupus nephritis patients of about 400. Nephritis is severity indicator.
- Specific Aim #2: Determine the association between ED use and meeting standards of quality of care in management of SLE and in the treatment of SLE nephritis, as defined by the Quality Indicator Set for SLE. For aims #2, I would likely be performing Chi squared tests comparing 3 groups (non, occasional, and frequent ER users) for most of those sub-aims.
- Specific Aim #3: Determine the association between ED use and corticosteroid use in SLE and SLE nephritis. For aim #3, I would likely be using multiple linear regression.
- For binary outcomes, use logistic regression with adjustment of other confounders.
- Ratio will be treated as continuous variable and will be analyzed using general linear model.
- Hypothesis: more ED use will have higher steroid dose. Will analyze current steroid dose and #ED visits in the past 12 months. Steroid dose will be an ordered categorical variable with 4 levels. Can use Chi-square test. Proportional odds model can be used to adjust for other confounders.
- Grant due Aug 1st, need to be done July 21st.

- Survey on quality of life (N=1000). There are 7 GOSE questions about health states (0-100). Can describe the distribution for each GOSE. Predictors include gender, age, and years of education.
- Want to compare between GOSE scores. Multiple comparison issues (21 comparisons).
- Can use a mixed-effects model taking into account within-subject correlation.
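The count of 21 comparisons follows from comparing every pair of the 7 questions. The sketch below shows the arithmetic and, as one simple option that the notes do not prescribe, the Bonferroni-adjusted significance level.

```python
from math import comb

n_questions = 7
n_pairs = comb(n_questions, 2)     # number of pairwise comparisons

# Bonferroni is an assumed illustration of a multiplicity adjustment, not the clinic's recommendation.
bonferroni_alpha = 0.05 / n_pairs

print(n_pairs, round(bonferroni_alpha, 5))
```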

- N=36 patients who had CLL transplant with two types (8 vs. 27). Want to compare survival between two groups.
- Time from transplant to death or relapse. Sample size is limited. Mainly descriptive. Want to write manuscript.
- Can apply for voucher of $4000.

- I am a fourth year medical student doing a project for dermatology. We are doing a meta-analysis of pediatric vitiligo patients to assess which populations need thyroid studies performed. I have a spreadsheet of the data. I need help analyzing it.
- Research question: the percentage of thyroid abnormalities in pediatric vitiligo patients.
- Only have aggregated data. Could have an overall estimate of percentage. Also could explore the variability between studies.
- Apply for a $2000 Voucher.

- One-year prospective study. Will record the numbers of surgeries in Ethiopia (an African country) and the number of perioperative mortalities.
- Sample size calculation to reach a desirable precision of mortality rate estimate.

- we want to find out if the IRLS estimation algorithm is reversible -- e.g., given only the Fisher information matrix and scoring function (and \beta coefficients), can we go back to the original Y or X matrices
- Context is confidentiality with data coming from multiple sites, with each site's data maintained independently, and controlled
- How to do model diagnostics without residuals?
- Does the distributed computing model lead to good statistical modeling practice? E.g.: covariate transformations, Y transformation, normality of residuals [could compute residual vector separately by center and share an ECDF of the residuals]
- How often are practitioners of distributed statistical analysis assuming linearity of covariate effects? Being careful about transforming Y or modeling Y robustly?
- Can't reverse the process to solve for an individual's datum if model is full rank, n > p, no parameter is devoted to only one subject, residual vector is secret
- If a single parameter is devoted to 5 subjects at one site, may possibly be able to solve for a summary statistic for the 5 (e.g., race has 4 levels and one of the levels only applies to 5 subjects at a site)

- May be able to discern that one site has an overall better level of Y than another site
- Not able to get a robust sandwich covariance matrix estimator if residual vector is not provided; sandwich estimation requires U matrix not just U vector
- Even if residuals are available, it may not be possible to work backwards to an individual from a given site because estimates come from a global beta vector over all sites
- We seldom use OLS with health care data; the need for weighted X'X (X'VX) instead of X'X as used in OLS makes the identification problem more difficult in general, because V is a function of the current beta estimate (for all sites combined)
- Worthwhile working out the special case where Y is binary and there is a single X that is binary or polytomous, and there is no special knowledge (e.g., k subjects are of type x and all have the same Y)
- Worth taking another look at data squashing

- Metabolic flux analysis
- Rate of metabolite turnover
- Which metabolic phenotypes are produced in high titre-achieving production processes
- Protein therapeutics; cost of production
- 14 conditions (cell lines); correlations between fluxes (80 reactions- flux, mass spec); looking for up-regulation
- 80 Spearman rank correlations x 14; each correlation 10 observations (clones)
- Two controls; secondary controls
- Independent experimental units: clones, manipulations of cell lines
- See if a unified model would be a better approach than pairwise analysis
- Must be able to precisely estimate a quantity such as a correlation coefficient in order to be reliable in picking "winners" across reactions
- Low precision (low number of independent experimental units) implies low probability of selecting the optimum reaction/condition
- Dimensionality is high enough that an "omics" method may be needed
- Recommend continuing discussion at a Tuesday or Friday clinic
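To give a sense of how imprecise a correlation based on 10 clones is, here is a sketch of an approximate confidence interval using the Fisher z transformation. That transformation is derived for Pearson correlations and is only a rough guide for Spearman correlations; the value r = 0.5 is a hypothetical example.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, conf_level=0.95):
    """Approximate CI for a correlation coefficient via the Fisher z transformation."""
    if n <= 3:
        raise ValueError("need n > 3")
    z_crit = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical observed correlation of 0.5 from 10 clones.
low, high = correlation_ci(0.5, 10)
print(round(low, 2), round(high, 2))
```

An interval this wide, repeated over 80 reactions and 14 conditions, illustrates why picking "winners" from pairwise correlations is unreliable with so few independent experimental units.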

- My project involves survey data of 220 Spanish and Arabic-speaking patients in the Center for Women's Health. I've completed all of the descriptive statistics but need help with the correlations. For example, I know from having surveyed patients myself that those patients who reported speaking "Arabic only" at home were more likely to self-report speaking English "not very well", but I don't know how to express this statistically.
- To test association between two variables A and B,
- If A is a continuous variable and B is categorical variable, use Kruskal Wallis test (or Wilcoxon rank-sum test)
- If A and B are both categorical variables, use chi-square test
- If A is ordinal variable and B is binary, use chi-square trend test
- If A and B are both continuous variables, use Spearman's correlation coefficient.
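The guidance above amounts to a lookup on the two variable types. A minimal sketch, with the type labels and the function name `suggest_test` chosen here purely for illustration:

```python
def suggest_test(type_a, type_b):
    """Return the test suggested in the clinic notes for a pair of variable types."""
    table = {
        ("continuous", "categorical"): "Kruskal-Wallis test (Wilcoxon rank-sum test for two groups)",
        ("categorical", "categorical"): "chi-square test",
        ("ordinal", "binary"): "chi-square trend test",
        ("continuous", "continuous"): "Spearman's correlation coefficient",
    }
    # The order of the two variables does not matter.
    for key in ((type_a, type_b), (type_b, type_a)):
        if key in table:
            return table[key]
    raise ValueError("combination not covered by the clinic notes")


print(suggest_test("binary", "ordinal"))
```

Combinations not listed in the notes raise an error rather than guessing, since the right choice then depends on the specific question.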

- What are major factors of degradation? Pulling apart mechanisms.
- Clinical target: liver tumors/biopsy; visualize needle
- What is the best study design?
- Ask trained readers to assess utility of image
- Discussed hypothesis testing vs estimation study
- One estimand could be the mean absolute number of levels different
- Can relate an ordinal measure to quantitative measures of image quality
- Can estimate # patients needed if have a reliable estimate of the standard deviation of an absolute difference of interest
- May consider progressively ruining an image to see when it becomes uninterpretable
- One goal is to develop a model to predict expert's quality rating from multiple quantitative physics-based measures
- May consider an ordinal response model / multinomial model

- can't arrive before 1pm on Wednesdays, so attending Monday clinic
- "I am going to perform an email survey of surgical residents (approx 5500 in the US) and wanted to know what you think an appropriate response rate would be and the best method to do statistical analysis (rough draft of survey attached). Or should the questions be revised to facilitate a better statistical analyisis?"
- make the variable as continuous as possible using sliding bar

- grant proposal relating to the development of new diagnostic technologies for neglected tropical diseases

- Survey on two cohorts, VA-based cohort and university-based cohort.
- Outcome: global physical and mental health score. Pain is part of global score, and also a barrier to level of reintegration success. Could calculate a global score without pain. Could examine how pain correlates with reintegration and outcome.
- A specific question (meaning of life) appears in two standardized questionnaires. Could include both in the model predicting outcome.

- Cortisol measures 3 per day
- % of increase because times not noted accurately
- Need Bland-Altman plot to check proper transformation: post - pre vs. (post + pre)/2 or log(post) - log(pre) vs. geometric mean of pre and post
- want the transformation that makes the graph flat and random
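
A minimal sketch of the comparison described above: compute the Bland-Altman coordinates on the raw scale and on the log scale, then look at the least-squares slope of difference against average as a crude stand-in for eyeballing whether the plot is flat. The cortisol values are invented, and the function names `bland_altman_points` and `slope` are hypothetical.

```python
import math

def bland_altman_points(pre, post, log_scale=False):
    """Return (x, y) pairs for a Bland-Altman plot.

    Raw scale: x = (post + pre) / 2, y = post - pre.
    Log scale: x = geometric mean of pre and post, y = log(post) - log(pre).
    """
    points = []
    for a, b in zip(pre, post):
        if log_scale:
            points.append((math.sqrt(a * b), math.log(b) - math.log(a)))
        else:
            points.append(((a + b) / 2, b - a))
    return points

def slope(points):
    """Least-squares slope of y on x; a value near zero suggests a flat plot."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Hypothetical cortisol values where post is roughly proportional to pre
pre = [4.0, 8.0, 12.0, 20.0, 30.0]
post = [6.2, 11.8, 18.5, 29.0, 46.0]
print("raw-scale slope:", slope(bland_altman_points(pre, post)))
print("log-scale slope:", slope(bland_altman_points(pre, post, log_scale=True)))
```

With data like these the raw differences grow with the average while the log differences do not, which would favor the log (ratio) scale. The slope only checks for trend; the spread of the points should still be inspected on the plot itself.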

- 1/2 of families received a service dog after 3 weeks
- Suggest a longitudinal analysis using all measurements (3 per day x 15 weeks, sampled on only one day per week), allowing for correlation
- Correlation structure based on approximate time of measurements in days + fraction of day
- Model smooth time trend, allowing for separate trend in those randomized to service dog; check for shape change between two groups
- Easiest-to-interpret method: generalized least squares with a continuous-time AR(1) correlation structure
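
To make the correlation structure concrete, here is a small sketch of a continuous-time AR(1) correlation matrix, in which the correlation between two measurements on the same subject is rho raised to the absolute time gap in days. The sampling times, the value rho = 0.6, and the function name `car1_correlation` are assumptions for illustration; in practice rho would be estimated as part of the generalized least squares fit (R's nlme package provides this structure as `corCAR1`).

```python
def car1_correlation(times, rho):
    """Continuous-time AR(1) correlation matrix: corr = rho ** |t_i - t_j|.

    times: measurement times in days (fractions of a day allowed).
    rho: correlation between measurements one day apart, 0 <= rho < 1.
    """
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1)")
    return [[rho ** abs(ti - tj) for tj in times] for ti in times]

# Hypothetical sampling times: three samples on day 0 (morning, afternoon,
# evening) and the morning sample one week later.
times = [0.30, 0.55, 0.85, 7.30]
for row in car1_correlation(times, rho=0.6):
    print(" ".join(f"{value:.3f}" for value in row))
```

Measurements taken hours apart on the same day come out highly correlated, while those a week apart are nearly uncorrelated, which is the behavior that unequally spaced sampling requires.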

- ECMO: what predicts survival to hospital discharge; initiated by cardiac surgeons
- Collecting patients from last 2 years (N=60 so far)
- Discussed margin of error of 0.1 in estimating a single probability with n=96
- Alternative endpoints: LOS, censoring on death, i.e., Y = time to successful discharge
- Or: ordinal outcome Y=1, 2, 3, ... longest LOS, dead = longest LOS + 1; effective sample size almost equal to # subjects
- Also have Glasgow coma scale at discharge; could factor into ordinal outcome
- May be possible to use a complex high-information scale to derive a severity of illness-based score that is then used to predict mortality
- Has reduced many variables to one
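
Two of the points above can be sketched numerically. The margin of error uses the usual normal approximation at the worst case p = 0.5, and the ordinal outcome places deaths one step beyond the longest observed length of stay. The patient values are invented, the function names are hypothetical, and taking "longest LOS" over all patients (not only survivors) is an assumption.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% CI for a single proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def ordinal_outcome(los_days, died):
    """Ordinal Y: LOS for survivors, longest observed LOS + 1 for deaths.

    Assumption: "longest LOS" is the maximum over all patients, so death
    always ranks worse than any observed stay.
    """
    worst = max(los_days) + 1
    return [worst if dead else los for los, dead in zip(los_days, died)]

print(round(margin_of_error(96), 3))  # 0.1, the margin discussed for n = 96
print(round(margin_of_error(60), 3))  # 0.127 with the N = 60 collected so far

# Hypothetical patients: LOS in days and in-hospital death indicator
los_days = [12, 30, 7, 45, 20]
died = [False, True, False, False, True]
print(ordinal_outcome(los_days, died))
```

Because only the ordering matters for an ordinal analysis, any value larger than every survivor's LOS would serve equally well for deaths.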

- What to do with patients who died before ECMO was available?

- CTE - Chronic Traumatic Encephalopathy, associated with multiple concussions. The survey is designed to ask about awareness of CTE among parents of young athletes (junior high and high school). The plan is to distribute the survey using Vanderbilt connections with local high schools.
- Recommendations:
- Maximize response rate (by giving parents incentives of some sort)
- Ensure that the survey is brief
- Make sure the responses are anonymous
- Use numbers instead of categories
- Simplify the language
- Branch questions
- Incorporate visual analog scale (instead of categories)
- Order questions in a logical way

Type | Attachment | Size | Date | Who | Comment |
---|---|---|---|---|---|
R | BoxPlotR.R | 5.7 K | 17 Apr 2006 - 11:44 | QingxiaChen | |
doc | InforegardingwhatmySPSSfilesays.doc | 24.5 K | 17 Apr 2006 - 11:44 | QingxiaChen | |
sxc | LOA_condensed_data.sxc | 22.1 K | 04 Dec 2006 - 09:17 | PatrickArbogast | Data from Edward Butterworth |
xls | Oluwole_Biostat_Clinic.xls | 46.5 K | 25 Aug 2014 - 11:30 | SharonPhillips | data file for Olalekan Oluwole |
doc | StatisticalAnalysisRequest.doc | 22.5 K | 17 Apr 2006 - 10:26 | QingxiaChen | |
png | WellsIschemicCollat.png | 37.0 K | 31 Jan 2011 - 13:58 | MattShotwell | |
png | WellsIschemicEF.png | 37.4 K | 31 Jan 2011 - 13:55 | MattShotwell | |
EXT | analysis | 3.9 K | 11 Feb 2006 - 20:30 | QingxiaChen | |
csv | biost_clinic_stephanie_vaughn.csv | 4.3 K | 23 Apr 2007 - 11:37 | PatrickArbogast | |
dta | biost_clinic_stephanie_vaughn.dta | 1.7 K | 01 May 2007 - 11:12 | PatrickArbogast | Stata datafile for Stephanie Vaughn |
log | biost_clinic_stephanie_vaughn.log | 8.1 K | 01 May 2007 - 11:13 | PatrickArbogast | Analysis results for Stephanie Vaughn from April 30th clinic |
xls | biost_clinic_stephanie_vaughn.xls | 25.0 K | 23 Apr 2007 - 11:37 | PatrickArbogast | |
csv | boxplotdata.csv | 2.7 K | 17 Apr 2006 - 10:27 | QingxiaChen | |
sxc | clintCarroll.sxc | 40.4 K | 26 Feb 2006 - 21:30 | FrankHarrell | Clint Carroll Langerhans Data |
sxw | clintCarrollabstract.sxw | 8.7 K | 26 Feb 2006 - 21:27 | FrankHarrell | Clint Carroll Langerhans Abstract |
doc | specificaims.doc | 25.5 K | 13 Feb 2006 - 10:11 | ChuanZhou | Specific Aims |
rda | tang.rda | 13.4 K | 19 Dec 2009 - 08:42 | FrankHarrell | Data from Yi Wei Tang processed using R code above |


Topic revision: r455 - 16 Oct 2017, LaurieSamuels

Copyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
