Vanderbilt Biostatistics Wiki > Main Web > CepiProject > CepiPlans (12 Nov 2010, JoAnnAlvarez)

- Baby Books Project
- Documents
- Meeting 10 Aug 09
- Meeting 22 Jun 09
- Meeting 01 Jun 09
- Meeting 23 Mar 09
- Meeting 19 Feb 09
- Meeting 9 Feb 09
- Meeting 26 Jan 09
- Meeting 17 Dec 08
- Meeting 10 Dec 08
- Meeting 25 Nov 08
- Meeting 19 Nov 08
- Meeting 12 Nov 08
- Meeting 29 Oct 08
- Meeting 22 Oct 08
- Phone Conference 16 Oct 08
- Meeting 8 Oct 08
- Meeting 5 Oct 08

- Principal Leadership Project
- Providence Project

- logicmodel.pdf
- Here is a diagram illustrating the research goals of the Baby Books project.

- psychometrics.pdf
- Here are some descriptive statistics on the different psychometric measures administered, by wave.

- factoranalysis.pdf
- Here are some confirmatory factor analyses that we did on each measure.

- implementation.pdf
- This document gives summaries of the implementation variables.

- baseline.pdf
- Baseline comparisons of the three groups.

- atrisk.pdf
- This document shows comparisons of the three treatment groups for several measures.

- simplifiedmodel.pdf
- Knowledge questions model.

- healthmaintenance.pdf
- This looks at adherence to immunization and well-child visit recommendations.

- homeinventory.pdf
- This looks at the Home Inventory overall summary score and several subscores.

- nutrition.pdf
- This looks at an overall percent correct for the nutrition behaviors outcome.

- readingfrequency.pdf
- This is the reading frequency outcome.

- hsa.pdf
- Two safety practices outcomes from the Home Safety Assessment.

- in.pdf
- Choke hazards.

- injuries.pdf
- Injuries.

- tvuse.pdf
- TV use outcome.

- We discussed the organization of Principal Leadership data
- Year 1 data is complete and ready to be analyzed
- We will write an analysis plan for year 1 data
- Years 2-3 have a different structure, but we can also think about/draft an analysis plan for all 3 years of data
- No timetables were discussed
- Len will get back to us about prioritizing covariates
- Carolyn will be filling role of Katie

- Video files
- For the toy play video, we have data for two waves. We will probably do a separate analysis on each wave.
- We may do a formal test that wave 7 has an even higher group effect than wave 6.
- We will look at the maximum score for every toy. We will think about how to incorporate the frequency.

- Aim 2
- We will use a linear mixed model.
- Should we include a group-knowledge interaction in the model?
- We will do a SEM as a secondary analysis, trying it first on the Home Inventory outcomes.

- Aim 1
- Should we model the differences between the taught and untaught scores and check for a group effect?
- JoAnn will run an ANOVA to check the effect of group on baseline knowledge (at wave 1).
- We will look up the cross-lagged panel design (for reciprocal effect analysis).
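The baseline check JoAnn plans can be sketched as a one-way ANOVA of knowledge score on group. This is an illustrative Python sketch only (the project's analyses are done in R/SAS), and the scores and group labels below are made up:

```python
# Illustrative sketch: one-way ANOVA testing for a group effect on
# baseline (wave 1) knowledge scores. All data here are fabricated.
from scipy.stats import f_oneway

baby_book = [0.62, 0.55, 0.71, 0.66, 0.58]      # hypothetical treatment group
control_book = [0.60, 0.52, 0.68, 0.64, 0.57]
no_book = [0.59, 0.54, 0.70, 0.61, 0.56]

f_stat, p_value = f_oneway(baby_book, control_book, no_book)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
# At baseline (before any books are given), a non-significant group
# effect is what we would hope to see if randomization worked.
```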

- Time variables in the various longitudinal models
- We will use the baby's age as the time variable.
- We may look at the length of time the child had the book as a separate side question.

- Multiple testing
- We discussed several ways to account for multiple testing and decided by consensus to use a significance level of 0.01.
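A minimal sketch of what the 0.01 consensus rule means in practice, set beside a Bonferroni correction for comparison. Python is used for illustration only, and the p-values are hypothetical:

```python
# With m hypothesis tests, a fixed per-test significance level of 0.01
# is compared against the Bonferroni threshold 0.05 / m.
# The p-values below are hypothetical.
p_values = [0.002, 0.015, 0.04, 0.0005, 0.20]
m = len(p_values)

fixed_hits = [p for p in p_values if p < 0.01]           # consensus rule
bonferroni_hits = [p for p in p_values if p < 0.05 / m]  # 0.05 / 5 = 0.01 here

# With exactly 5 tests the two rules happen to coincide; with more tests
# Bonferroni becomes stricter than the fixed 0.01 cutoff.
print(fixed_hits, bonferroni_hits)
```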

- Covariates
- Stephanie will make a list of core covariates to be used in every model, and a few others that will be included in specific models.
- JoAnn will make some tables to show the correlation between public assistance and education.
- For each outcome with a wave 1 measurement, we will include the wave 1 measurement as a baseline covariate in the analysis (the outcome will consist of all measurements after wave 1).

- Video files
- Aims 2-5 will rely on data coded on procoder video.
- There are three types of video files: cognitive assessment of child, reading interaction, and playing interaction.
- Eventually, some summary variables from the videos will be added to the main baby books data.
- Stephanie will send JoAnn and Ben codebooks for the video files, or information about the types of data available, including summary scores.
- Stephanie will communicate with John about transfer capabilities of the procoder files.

- Aim 2: To determine if increases in knowledge of child development and parenting strategies results in changes in parenting behavior.
- Could be partially addressed without the video files.
- For the main predictor, knowledge, Len and Stephanie are interested in looking at effects of knowledge in specific topics.
- There are about six topics, including safety, discipline, nutrition, sleep, and maternal health.
- Ben and Warren raised concerns about possible non-normality and lack of continuity (a positive probability mass at scores of zero) for these subgroups, because each subgroup contains only a small number of questions.
- To examine this, Ben and JoAnn will look at the distributions of these topic scores.

- For the outcome, parenting behaviors, there is a list of seven measures, each consisting of many variables. We will need to think about how to aggregate these into fewer measures (e.g., 7 summary scores). It was noted that, for example, the reading subscore in the HOME inventory, the reading video component, and the home literacy questionnaire could possibly all be combined.
- The HOME inventory has subscales that are used in the literature frequently.

- Important covariates
- Some of these are listed on the graphic representation under item 14.
- Stephanie will email Ben and JoAnn a list of important mediators, by research aim, in terms of type of variable (continuous, binary, ordinal, categorical) available.

- Discussed structural equation modeling versus GLMs.
- SEM will be used as a secondary analysis to confirm theory, but we will use simpler approaches for the primary analyses.

- Study group names: Len pointed out that the group labeled "Commercial Book" in the data did not receive a commercial book (post-meeting update: in one wave, the "control book" group did actually receive a commercial book). Laurie and JoAnn will do the coding necessary to rename "Commercial Book" as "Control Book."
- Material on inside and back covers will be considered separately from the material in the regular text. Laurie has already coded the numbers correct and numbers asked, for each mother, for each wave. Variable names can be found in N:\Projects\Baby Books\Data\BabyBooksVariablesToCalculate.xls.
- Combining the No Book and Control Book groups: Len proposes to combine the No Book and Control Book groups for analysis. Ben says this type of collapsing is routinely done in pharmaceutical trials to obtain greater statistical power. It may be of interest to test for a difference among all 3 groups, as well as the treatment group versus the No Book and Control Book groups combined, but with the understanding that such approaches lead to more hypothesis tests (and are therefore very much exploratory). One reason for not combining them would be that the Control Book group might differ systematically from the No Book group on the outcomes because those mothers are spending more time reading to their babies.
- Simplified model for immediate work: The outcome would be each mother's percent correct of all "taught" items, for the entire study. This would include only items that were previously "taught" in regular pages of the books, and not those items that were only taught on the covers. This would not penalize for questions that were not answered or waves that were missed. The percent of "untaught" items answered correctly would be used as a covariate in this model. A limited number of other important covariates could also be included. The analysis can be repeated for mother's percent correct of all "untaught" variables (with limited covariates), with the expectation of not finding significant differences. The analysis could also be repeated for "taught in prenatal" or "taught on inside/outside cover" questions, but this remained somewhat unresolved (and depends on whether prenatal questions were included in the "taught" items).
- Other model considered but ultimately rejected: We also considered doing a separate version of the model above for each wave. Some of the waves have very few items, so we thought this wouldn't work because the outcome for those waves would violate the assumption of continuity.
- Ben proposed a more complex joint model: a repeated-measures joint logistic model. This could incorporate question-specific covariates such as wave, prenatal book material, board book page, inside/outside cover, and the amount of time the mother was in possession of the book (as well as interactions with treatment). These question-specific variables may involve a moderate amount of programming and would certainly involve creating many new variables. The repeated-measures model would jointly model whether each question was answered correctly for all of the roughly 50 taught items, and would be an effective method of combining information from all waves (leading to better estimates of the variance) without losing important data.
- Attrition analysis: Len mentioned that we need to do a routine analysis of the patterns of missingness. This should be added to our to-do list.
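The simplified model described above (each mother's percent correct on "taught" items, with the "untaught" percent as a covariate) can be sketched as an ordinary least-squares fit. This Python sketch uses fabricated data and a hypothetical effect size; the real analysis lives in R/SAS:

```python
# Sketch of the simplified model: taught percent correct regressed on
# treatment group plus the untaught percent correct as a covariate.
# All data below are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 30
group = rng.integers(0, 2, size=n)          # 1 = baby book, 0 = combined control
untaught = rng.uniform(0.4, 0.8, size=n)    # covariate: untaught percent correct
# Simulate a true group effect of 0.15 on the taught percent correct.
taught = 0.3 + 0.15 * group + 0.5 * untaught + rng.normal(0, 0.05, size=n)

X = np.column_stack([np.ones(n), group, untaught])
beta, *_ = np.linalg.lstsq(X, taught, rcond=None)
print("intercept, group effect, untaught slope:", np.round(beta, 3))
```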

- Discussed variables to calculate assessing adherence to pediatrician-recommended check-up schedule.
- This adherence is a main outcome, yet subjects were not directly asked what the dates of the check-ups were.
- Discussed ways to deduce the dates from the data.
- Strategy is to see how many subjects' dates are available in the data.
- Plan to contact Ana Regina to discuss allocation of tasks.
- Planned to assess demographic differences between treatment groups.
- Laurie will email us the names of demographic variables.

- Discussed concepts of pre versus post measures. Considered alternate terminology: "taught" (presented in a book given before the wave in which the question is asked) versus "untaught".
- Established that all phone wave knowledge questions are "taught."
- One strategy considered: give each person two scores for each wave: taught and untaught. These could be the number correct or percent correct.
- The knowledge questions will be an outcome in a regression model. Some important variables to control for are the Wonderlic score and education level.
- There are five subscores of knowledge that will be considered.
- Knowledge is also a predictor variable for other outcomes.
- Issues considered, but not necessarily resolved:
- Do we need to take into account the length of time between the time the book was given and the time the relevant question was asked?
- For the knowledge questions, should we combine the no book group and the commercial book groups?
- Are trajectories for each individual an important outcome?
- What questions do we have enough data to answer?

- To investigate low alphas, JoAnn will compute alphas for the CES-D again without the three low-performing items. For some of the measures with low alphas, JoAnn and Ben will look at the alphas by interviewer, race, and education level.
- JoAnn will compute Cronbach's alpha on the Parenting Satisfaction Scale.
- JoAnn will re-run the factor analysis for the Home Inventory without the two items with zero sample variance. We will correct the error that Ben found in our factor structure for the Parenting Stress Index.
- We will make a table of numbers of complete cases for each measurement by wave.
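Cronbach's alpha, used throughout the tasks above, can be computed directly from its definition. A Python sketch on a fabricated item matrix (the project computes its alphas in R, on measures such as the CES-D and the Parenting Satisfaction Scale):

```python
# Sketch of Cronbach's alpha for an item-response matrix
# (rows = subjects, columns = items). Data below are fabricated.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_subjects, k_items) array of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Perfectly parallel items give alpha = 1; noisier items pull it down.
base = np.tile(np.arange(10.0), (4, 1)).T        # 10 subjects, 4 identical items
print(round(cronbach_alpha(base), 3))            # prints 1.0
```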

- We worked from the plans document that Stephanie sent us, which JoAnn has posted in an editable format here. The main headings are roughly in order of priority.
- JoAnn and Ben will run descriptive statistics on all variables by wave to check for discrepancies and usability.
- JoAnn and Ben will compute Cronbach's alpha statistic on psychometric measures to check for internal consistency and that items were correctly reversed where applicable.
- We talked briefly about the possibility of using factor analysis on psychometric measures.
- Ben and JoAnn will calculate medical adherence variables after meeting with Laurie to learn more information. The first ones to start on will be the immunization schedules and the well-child visits, because they can be done before the chart reviews.
- CepiProject will take over checking the variables that Laurie calculated.
- We agreed to meet at least every two weeks and maybe more often depending on progress.

- Decided to try the following way to export the R data into SAS:
- Use R code to retrieve the original internal values for all factor/categorical variables that were originally created at CepiProject. This would also remove the formats associated with these variables. This file is the sasdat.Rdata in Baby Books/DataR/dataprocessing. It will have to be uncompressed before being input to stat/transfer.
- Use stat/transfer to create a sas7bdat data file and also an automatically generated SAS program file to put formats on any factor variables that were created in R.
- The CepiProject team will apply formats to all the original factor variables by using the center's existing SAS code.

- We discussed how often this export would need to take place. Some suggestions were monthly, quarterly, or once after all data waves have been closed.
- We discussed whether it was necessary to export new variables and datasets created by Ben and JoAnn during the analysis. We agreed to discuss the group of new datasets periodically as a group and have the center decide on a per-dataset basis whether they see a potential future use.
- We decided to have weekly meetings with the entire research group.
- Ben and JoAnn may have 5% of their effort allocated to projects other than Baby Books.
- We decided to try to schedule another group meeting before Christmas.

- Ben and JoAnn have transferred their merge of the waves in R into SAS using stat/transfer. This requires the use of some of the stat/transfer options to automatically produce a SAS program to put formats on every variable that has value labels. So far this transfer looks successful, but the following are proposed to ensure the data's integrity:
- Ana Regina and Laurie will do some spot checks.
- JoAnn and Ben have exported the R version of wave2 into SAS. Laurie will use proc compare to compare this with the original wave2 SAS dataset.
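The spot check described above uses proc compare on the SAS side; the same idea on the other side of the transfer is a cell-by-cell diff of the two versions of a wave. A Python/pandas sketch with hypothetical column names and values:

```python
# Sketch of a proc-compare-style check: diff an original dataset against
# its round-tripped copy. Names and values here are hypothetical.
import pandas as pd

original = pd.DataFrame({"awfamid": [1, 2, 3], "score": [10.0, 12.5, 9.0]})
reimported = pd.DataFrame({"awfamid": [1, 2, 3], "score": [10.0, 12.5, 9.1]})

diff = original.compare(reimported)   # an empty frame means the copies agree
print(diff)                           # here it flags the 9.0 vs 9.1 mismatch
```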

- JoAnn will create a SAS dataset in the Linux version of SAS with a few new variables with formats to test whether the file will be available in Windows.
- Ben and JoAnn plan to email the entire group with a link to the twiki website when we have added updates about substantial progress.
- We reached a consensus on the need for monthly meetings with the entire research group.

- We discussed the progress of transferring the R data back into SAS. The factor variables do not have labels, but the other variables do. The factor value labels are all being converted to characters. Our next step is to try some different options in stat/transfer. Laurie will try these today if time permits; otherwise, JoAnn will try Thursday.
- We discussed comparing the pre- and post-knowledge tests (Opinions About Babies questionnaire). Stephanie has made a chart in Excel with the different questions on the tests. We need additional clarification regarding spreadsheet variables before proceeding. Ben will e-mail Stephanie. Depending on Stephanie's answer, McNemar's test may be the appropriate tool.
- Merge/Import is looking good so far. Ben will merge in SAS to check R.

- Summary
- Discussed incurred cost data and merging issues

- We can merge incurred cost variables into one large data set, and subset when it comes time for the analysis
- Laurie will check formats for aw1bkdys-aw8bkdys
- Laurie will check on one data value that should be missing but is coded in SAS as an actual date
- Laurie will make sure there are no missing values for awwave
- Laurie will check on $AGEUNITS. format for awoih01c-awoih08c vars
- JoAnn will give Ana Regina the merged R file as .Rdata file, with one new variable with R format and label. Ana Regina will try to get this back into SAS using Stat-transfer. If this doesn't work, we may need to do the merge and new variables in SAS before exporting to R.
- We discussed strategies for checking merges: checking main variables for basic statistics and n, especially ones that should not have any NA (awfamid); looking at the famid*wave frequency table and sending it to the data manager; and taking column sums for the number per wave, which should match the number of rows in each wave dataset.
- We plan to ask Ana Regina if there are any checks on the merge she wants at the next meeting.
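The merge checks listed above (famid-by-wave frequency table, column sums matching wave row counts, no missing IDs) can be sketched like this. Python/pandas is used for illustration, with made-up IDs and waves:

```python
# Sketch of the merge checks: crosstab of family ID by wave, with column
# sums checked against each wave's expected row count. Data are made up.
import pandas as pd

merged = pd.DataFrame({
    "awfamid": [1, 1, 2, 2, 3],
    "awwave":  [2, 3, 2, 3, 2],
})

freq = pd.crosstab(merged["awfamid"], merged["awwave"])
print(freq)

wave_counts = freq.sum(axis=0)            # rows contributed by each wave
assert wave_counts[2] == 3                # should equal wave 2's row count
assert merged["awfamid"].notna().all()    # the key variable must have no NA
```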

- Ben and JoAnn told Laurie about their progress so far in importing the data and verifying that it was imported correctly.
- We discussed how the missing values are being imported in R.
- For numeric variables: The four types of missing values from SAS are all NA (missing) in R.
- For character variables: so far, NONE of the SAS missing values are showing up as missing in R, but rather as character strings such as "M".
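A sketch of the fix this implies: sentinel strings standing in for SAS character missing values can be replaced with true missing values after import. Python/pandas is used for illustration; the sentinel code "M" follows the example in the notes:

```python
# Sketch: restore true missing values for a character variable whose SAS
# missing codes arrived as the literal string "M". Column name is hypothetical.
import pandas as pd

raw = pd.DataFrame({"comment": ["fine", "M", "ok", "M"]})
cleaned = raw.replace({"comment": {"M": pd.NA}})
print(cleaned["comment"].isna().sum())   # prints 2
```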

- Discussed folder organization. After we are sure about the data import, we plan to make a folder called "Data R" at the same level as the current "Data SAS" folder. JoAnn plans to make one separate R code file that imports the data.

- Discussed ways to check integrity of the data JoAnn and Ben imported from SAS to R. We decided that JoAnn will compare descriptive statistics computed on the SAS data with those computed on the new R data for several different types of variables that are likely to be problematic, especially dates.
- Next JoAnn and Ben plan to merge waves 2, 3, A, B, C, and D. We discussed some possible complications with the merge and appropriateness of making one big merged file or a couple smaller ones and agreed to talk to Ana Regina about the incurred cost variables. We decided that one way to check the merge would be for JoAnn and Ben to independently merge the files in R and SAS, respectively, and then compare the results.
- The next task for Ben and JoAnn is to check the variables that have been computed from other variables. One way would be to calculate each new variable in R and compare it to the existing one. Another would be to make two-by-two frequency tables for the variables that record whether the mother answered each question on the Opinions About Babies measurement.

- Discussed general outline of project
- Data cleaning and formatting to be coordinated through Laurie
- There will need to be some structure for weekly tasks and monitoring of the analysis. This can be done either through a spreadsheet or the Biostat Twiki. Tasks will be communicated and coordinated through Katie and Stephanie. Len will be kept in the loop mainly by Stephanie, but also by Katie, Ben, and JoAnn.
- Data will be transferred to R by Biostats for analyses.
- Ben and JoAnn will be sure to give detailed comments on the R code, so that others can understand the programs in the future
- Validation will need to occur for any type of merging (Should we merge all waves into one data set?). We will use unique ID for merging.
- Need to set time/date with Laurie to discuss adherence variables
- Preliminary analyses will be investigated after data cleaning/merging is complete.
- We discussed co-authorship. This will need some future discussion, but if Ben and JoAnn make an intellectual contribution then they would expect co-authorship. Stephanie expressed interest in single-author papers if she alone has the idea and knows exactly what data she needs, such that Biostats are simply crunching numbers without intellectual contributions. Ben and JoAnn feel this would be a rare occurrence, as their goal is to contribute intellectually to all analyses dealing with these data.
- Laurie will send Biostats a SAS transport file (.xpt) for Wave 2 data (both with and without new variables), Ben will send her example SAS code
- Bios group cannot run SAS files created by CepiProject unless they change all paths and directories. This is complicated because the SAS files generally reference several macros and other files with directory names. This may be a strong reason to have Laurie "SASify" the remaining data before turning over to Bios. We don't want to have 2 copies of SAS code (one for windows and one for Linux).

- Browsed data and dataSAS folders
- Discussed what group will manage merging (Biostat) and initial data sets (SASifying) for future waves (TBA)
- Need to set up meeting with Samuels to review data folders and files
- Discussed codes for missing data
- Discussed Dec 1 report deadline (talk to Stephanie for details)

- Summary
- We reviewed progress on the data import, merge, and calculated variable checking. We found and discussed some inconsistencies.

- We noted that the merged data set has an incorrect number of rows, 754, instead of 802. JoAnn plans to check on this.
- Laurie noticed that in some of the variable labels, unmatched quotes have been replaced by "\x92." JoAnn plans to try fixing this.
- On the pdf document, the function latex(describe()) does not display dates correctly, but the same variable's dates did display correctly somewhere else on the same document.
- JoAnn will re-run the program, specifying the missing values before assigning the labels and formats.
- We will add the studygrp variable to all the waves. It currently only appears in wave 1.
- We will import waves 0 and 1.
- We discussed whether certain variables should be included in the merge and came to the consensus that all variables will be in the merge.
- We plan to ask Ana Regina if there are any checks on the merge she wants at the next meeting.

- cronbachs.pdf
- Here is a document with info about missing data as well as reliability measures for the components of the instrument.

- CFA favors the six factor model over a one factor model.
- However, the one factor model may not be optimal. We will look at two- and three- factor models.
- If a two-factor model is superior to the six-factor model, we can throw out the factor one questions and replace them with the CEPI trust questions, since they are highly correlated.
- Warren noted that we may not have the appropriate data for EFA.
- Warren will make a scree plot for the teacher means data.

- Len suggested that aggregate data (averaged over teachers) should not be factor analyzed, for substantive reasons. He suggested using teacher-level records from the several samples, studying the effects of missing data, and then seeing whether the results of the psychometric analysis are consistent across all samples. Len wants a short form, perhaps using items that are rarely missing.
- Warren will run the data samples separately through his torture test, and see if results are consistent across samples.
- Ben will do a likelihood ratio test on the six factor model against the one factor model, and compare relevant fit statistics
- Planned to eventually look at how and what feedback was actually given to the principals (check with Katie).
- After each round of new results, we will meet with Len and Katie to plan subsequent steps.
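The likelihood ratio test Ben plans can be sketched as follows, given the fitted log-likelihoods of the nested one-factor and six-factor models. The log-likelihoods and degrees-of-freedom difference here are hypothetical placeholders:

```python
# Sketch of a likelihood ratio test comparing a six-factor model against
# a nested one-factor model. All numbers below are hypothetical.
from scipy.stats import chi2

loglik_one_factor = -1523.4   # hypothetical fitted log-likelihoods
loglik_six_factor = -1480.9
df_diff = 15                  # hypothetical extra free parameters in the 6-factor model

lr_stat = 2 * (loglik_six_factor - loglik_one_factor)
p_value = chi2.sf(lr_stat, df_diff)   # upper tail of the chi-square reference
print(f"LR = {lr_stat:.1f}, p = {p_value:.2e}")
```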

- Discussed pros and cons of aggregating by school.
- Aggregating by school
- We won't have enough data (n = 72 schools) for CFA.
- Conflict with basic principles of factor analysis: schools don't think about principals; teachers do.

- Not aggregating
- Missing data problem: only 25% of the data are complete cases. We could think about imputation to solve this. However, there are concerns that this would lead to fewer factors.
- Is there a multi-level CFA? Would this even be possible without teacher IDs?

- Discussed changing the instrument mid-study. All agree that it would be a slippery slope. CEPI wants to shorten the instrument with the goal of reducing the burden on participants.
- How would we shorten the instrument?

- Discussed Providence study and general design (patients nested within counselors nested within sites)
- Study was randomized by site
- 2×2 design for feedback vs. no feedback and training alliance vs. no training alliance (4 groups, about 10 sites per group)
- General #'s: ? observations per patient, >3 patients per counselor, 3-15 counselors per site, 32-40 sites, about 400 new patients (but collected data on old patients as well)
- They want to know if they need to account for correlation within site, even with a very small ICC (intraclass correlation). Can they justify this with a statistical argument?
- Trying to fit a random intercept and slope model, also adjusting for subject correlation (repeated measures), possibly region and counselor clustering
- There are potential power issues from adjusting, plus convergence issues with the mixed model
- Bios recommends a formal test for variance component(s) of site. Ben will research and get back to team.


