Statistical Graphics

DavidAirey 08 Nov 2004: During Frank's entertaining lecture on the use of graphics, Frank said he would like to see more use of confidence intervals, and not only for mean estimates: when the interest is in the difference between means, the confidence interval for that difference should be plotted too. I'm sifting through the notes and graphics for the example. Has anyone seen it?

FrankHarrell: See p. 14 of

DavidAirey 11 Nov 2004: In addition to Stata (and fumbling around with R 2.0), I use a fun statistics package for exploratory graphics called Data Desk, sadly now in a developmental coma. My confusion with the above stemmed from my use of boxplots in this package. I usually overlay a 95% confidence interval on the box. This CI is actually for the median, and it is constructed so as to allow inference at the 0.05 level across the boxes graphed (median ± 1.58 × (high hinge - low hinge)/sqrt(n); derived in Chapter 3 of Velleman and Hoaglin (1981)).
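
As a rough illustration of the interval described above, here is a minimal Python sketch of median ± 1.58 × (high hinge - low hinge)/sqrt(n). The function names are hypothetical, and the hinge rule used (medians of the lower and upper halves, with the overall median included in both halves when n is odd) is an assumption; Data Desk may compute hinges differently.

```python
import math
import statistics


def tukey_hinges(values):
    """Return (low hinge, high hinge) as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2  # size of each half, including the median if n is odd
    low_hinge = statistics.median(xs[:half])
    high_hinge = statistics.median(xs[n - half:])
    return low_hinge, high_hinge


def median_notch_interval(values):
    """Interval for the median: median +/- 1.58 * hinge spread / sqrt(n)."""
    n = len(values)
    med = statistics.median(values)
    low_hinge, high_hinge = tukey_hinges(values)
    half_width = 1.58 * (high_hinge - low_hinge) / math.sqrt(n)
    return med - half_width, med + half_width


if __name__ == "__main__":
    # Made-up data, for illustration only.
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    print(median_notch_interval(sample))
```

Two groups whose intervals of this kind do not overlap would be read as differing at roughly the 0.05 level, which is the comparison the overlay on the boxes is meant to support.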

FrankHarrell: There is a way to make approximately correct conclusions about significant differences based on overlap of two intervals, when those intervals have roughly 70% coverage. But it is always best to display an interval for the actual difference.
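
To make the two ideas above concrete, here is a small Python sketch under simplifying assumptions that are not stated in the thread: two independent groups, normal-theory intervals, and known standard errors. `diff_ci` gives the interval for the difference itself; `overlap_coverage` (a hypothetical helper) works out what per-group coverage would make "intervals just touch" coincide with a two-sided test at level alpha.

```python
import math
from statistics import NormalDist


def diff_ci(mean1, se1, mean2, se2, level=0.95):
    """Normal-theory interval for mean1 - mean2, assuming independent groups."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean1 - mean2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return diff - z * se_diff, diff + z * se_diff


def overlap_coverage(se1, se2, alpha=0.05):
    """Per-group coverage at which non-overlap matches a level-alpha test.

    Non-overlap means |diff| > z_each * (se1 + se2); significance means
    |diff| > z_alpha * sqrt(se1^2 + se2^2). Equating the two gives z_each.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_each = z_alpha * math.sqrt(se1 ** 2 + se2 ** 2) / (se1 + se2)
    return 2 * NormalDist().cdf(z_each) - 1


if __name__ == "__main__":
    # Made-up summary statistics, for illustration only.
    print(diff_ci(10.0, 1.0, 7.0, 1.0))
    print(overlap_coverage(1.0, 1.0))
    print(overlap_coverage(1.0, 5.0))
```

Under these assumptions the matching coverage depends on the ratio of the two standard errors, so any single coverage figure is only a rough guide. The interval for the difference has no such dependence, which is one reason to plot it directly.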

Statistical Software and Excel

Lynne Hinger (lynne.hinger@Vanderbilt.Edu) 10 Nov 2004: In the ExcelProblems handout it states:
 On allstat a few years ago, a weird example was mentioned. Enter a 
 column of zeros and set a cell to contain their sum. Now change one cell 
 to O (upper case O). Now change another cell to l (lower case l). What 
 happens? What would be reasonable behaviour for statistical 
 purposes? Do you want to trust this software?
I tried this in Excel and wasn't really sure what the problem was; the result was just the sum. He doesn't really explain the "reasonable behavior" he is looking for. It made sense to me (and that worries me!).

FrankHarrell: I haven't tried that myself. I take it that the non-numerics were not properly excluded from the sum or, more likely, that a sum of the non-missing values was computed instead of giving an answer of NA that would alert the analyst to bad data in the column. There may also be a problem when counting the number of non-missing entries for the purposes of getting a mean.
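
The contrast described above can be sketched in Python. This is an illustration only, not a reproduction of Excel's internals: `lenient_sum` mimics a sum that silently skips cells it cannot read as numbers, while `strict_sum` returns NaN (standing in for NA) as soon as any cell is non-numeric. All function names are hypothetical.

```python
import math


def to_number(cell):
    """Parse a cell as a float, or return None if it is not numeric."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return None


def lenient_sum(cells):
    """Sum the numeric cells and silently ignore everything else."""
    return sum(x for x in map(to_number, cells) if x is not None)


def strict_sum(cells):
    """Return NaN if any cell is non-numeric, so bad data cannot hide."""
    numbers = [to_number(c) for c in cells]
    if any(x is None for x in numbers):
        return math.nan
    return sum(numbers)


def bad_cells(cells):
    """List (position, value) for every cell that failed to parse."""
    return [(i, c) for i, c in enumerate(cells) if to_number(c) is None]


if __name__ == "__main__":
    # A column of zeros with a letter O and a lower-case l typed by mistake.
    column = ["0", "0", "O", "0", "l"]
    print(lenient_sum(column))  # still looks like a plausible total
    print(strict_sum(column))   # flags the problem instead
    print(bad_cells(column))    # shows where the typos are
```

The lenient version also shrinks the count of usable entries without saying so, which is the concern raised above about the denominator of a mean.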

