Difference: ExcelProblems (1 vs. 10)

Revision 10
10 Sep 2013 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"

Problems with Excel

Line: 7 to 7
 
Added:
>
>
 
From: P.J.Wells@OPEN.AC.UK [mailto:P.J.Wells@OPEN.AC.UK]
Revision 9
16 Apr 2013 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"

Problems with Excel

Line: 6 to 6
 
Added:
>
>
 
From: P.J.Wells@OPEN.AC.UK [mailto:P.J.Wells@OPEN.AC.UK]
Revision 8
29 Jan 2012 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"

Problems with Excel

Changed:
<
<
>
>
 
From: P.J.Wells@OPEN.AC.UK [mailto:P.J.Wells@OPEN.AC.UK]
Revision 7
12 Jul 2010 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"

Problems with Excel

Changed:
<
<
>
>
 
From: P.J.Wells@OPEN.AC.UK [mailto:P.J.Wells@OPEN.AC.UK]
Revision 6
21 Dec 2007 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"

Problems with Excel

Added:
>
>
 
From: P.J.Wells@OPEN.AC.UK [mailto:P.J.Wells@OPEN.AC.UK]
Revision 5
25 Sep 2007 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"

Problems with Excel

Changed:
<
<
>
>
 
Revision 4
25 Sep 2007 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"
Changed:
<
<
See also Patrick Burns' Spreadsheet Addiction page for general and computational problems with spreadsheets.

There is also an interesting article about how the open source gnumeric spreadsheet team corrected computational errors that Microsoft was unable to fix in Excel: http://www.csdassn.org/software_reports/gnumeric.pdf
>
>

Problems with Excel

 
From: P.J.Wells@OPEN.AC.UK [mailto:P.J.Wells@OPEN.AC.UK]
Revision 3
03 Jan 2005 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"
Changed:
<
<
See also an interesting article about how the open source gnumeric spreadsheet team corrected computational errors that Microsoft was unable to fix in Excel: http://www.csdassn.org/software_reports/gnumeric.pdf
>
>
See also Patrick Burns' Spreadsheet Addiction page for general and computational problems with spreadsheets.

There is also an interesting article about how the open source gnumeric spreadsheet team corrected computational errors that Microsoft was unable to fix in Excel: http://www.csdassn.org/software_reports/gnumeric.pdf
 
From: P.J.Wells@OPEN.AC.UK [mailto:P.J.Wells@OPEN.AC.UK]
Line: 294 to 296
  web-site at http://www.gre.ac.uk/~cd02/eusprig/ containing some interesting material and links.
Added:
>
>

On 19Dec04, Ted.Harding@nessie.mcc.ac.uk wrote:

There is a huge literature on this topic, some of it published in journals, much of it floating around on the web and in the archives of mailing lists.

Tim's reference above is interesting, but only one example. The McCullough and Wilson reference given there, though now somewhat dated, identifies many of the classic problems. Googling on mccullough wilson excel will throw up a host of followups.

Informed statistical comment on the problems of Excel encountered by serious users can be found by browsing in the mailing list ASSUME (Association of Statistics Specialists Using Microsoft Excel): http://www.jiscmail.ac.uk/lists/assume.html. The most recent serious issue reported there is the RAND() bug: see ASSUME archives for Dec 2003 followed up in the March 2004 archives. The latter point to a statement from Microsoft:
  SYMPTOMS
  When you use the RAND function in Microsoft Office Excel 2003,
  the RAND function may return negative numbers.
  CAUSE
  This problem may occur when you try to use many random numbers,
  and you update the RAND function multiple times. For example,
  this problem may occur when you update your Excel worksheet by
  pressing F9 ten times or more.
  RESOLUTION
  This problem is fixed in the Microsoft Excel 2003 Hotfix Package
  that is dated February 29, 2004. 
http://support.microsoft.com/default.aspx?scid=kb;en-us;834520

(and the deeper you probe in this, the worse it gets). While using Excel for statistics has some limited value in the context of initiating to statistics students whose IT experience is limited to exposure to courses on Excel and Word, and the teacher wants to build on such experience, I think that Excel should never be used for serious statistical work, for several reasons.
  1. The many reported (and some allegedly fixed) bugs in calculation and algorithms necessarily provoke suspicion that others still exist or may have been introduced. One simply cannot trust the results without checking.
  2. Too many things can be done silently and invisibly, "behind the spreadsheet", by Excel. Changes to data and differences between what is shown on the spreadsheet and what goes into exported files can arise without the user being aware of them. A particularly frightening example is the "sort" disaster reported to ASSUME (8 Dec 2003) by Allan Reese.
  3. Excel has some value as a straightforward data entry pad. However, I have seen far too many cases where sloppy usage has led to the resulting spreadsheet containing "information" which is either superfluous or wrong, in ways which would not be obvious to the user. For example, a "missing data" cell, if blank, may be interpreted as having value zero. Some people enter "." for missing data, but often are not consistent. If inadvertently a space is entered in a cell outside the intended row/column range of the data (or, I suspect, even if the spreadsheet cursor wanders outside the range) then when the sheet is exported (e.g. as "CSV") these extra rows and columns will be included. In one case I received an Excel spredsheet with hundreds of such extra rows and dozens of extra columns, together with dozens of cases where " " and "." had been used inconsistently, all this over and over on each of about 6 "worksheet" pages; not to mention data in the wrong columns etc. It took about 4 days of continuous work to clean this up. To be frank, for entering complex data the discipline enforced by a properly designed Data Entry Form in a database package would avoid such problems altogether, and such should be used. The illusion of success that Excel gives the user is a most treacherous danger and frankly I simply do not, in the first instance, trust data in a spreadsheet.
  4. The use of formulae in cells to generate cell values can cause all sorts of problems. One to especially watch out for is that a formula may have been wrongly or inappropriately "copied" from one column to another or from one worksheet to another. You can of course check this by moving the cell cursor to such cells and noting what the formula is, but as you can imagine this is a horribly uhpleasant process (and by the way take care that you don't inadvertently alter it while you're doing this!). Also see Allan Reese's "sort" disaster above, which was formula-induced.

I could go on. I've written at length already because many readers of R-help may be in situations where they necessarily receive data in Excel files, or have to use Excel themselves, and may not yet have become aware of the risks. So I'm writing as a warning to them: Don't trust Excel, but if you must use it then check everything, make sure it's what it should be, and make sure that it stays that way when the spreadsheet is accessed (as in (3) or (4) above, things may change invisibly).

 
Revision 2
26 Jun 2004 - Main.FrankHarrell
Line: 1 to 1
 
META TOPICPARENT name="StatComp"
See also an interesting article about how the open source gnumeric spreadsheet team corrected computational errors that Microsoft was unable to fix in Excel: http://www.csdassn.org/software_reports/gnumeric.pdf
Line: 7 to 7
  Sent: Donnerstag, 8. März 2001 19:34 To: allstat@JISCMAIL.AC.UK Subject: Summary: Excel for statistics
Changed:
<
<
>
>
 

Last month I posted a request for information about the alleged deficiencies
Line: 33 to 33
  I've tried to give appropriate credit for specific points, but as might be expected a number of references were the subject of multiple independent messages, so I hope that the following generic credit will do justice to
Changed:
<
<
>
>
  Andrew Bertie Jan Beyersmann Jill Binker
Line: 67 to 67
  +44 1908 654658

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Changed:
<
<
>
>
  1) Fundamental shortcomings of digital computation
Deleted:
<
<
  These naturally apply to all programs, including high-end specialist statistical ones; see B.D. McCullough's two-part article in The American Statistician (11/98, Vol 52, parts 4 and 5/99, Vol 53 part 2 -- available
Line: 89 to 88
  As a non-specialist, I found Part 1 especially chastening. For a simple demonstration I devised the following (prompted by a Excel-G post on the autofill feature); try:
Changed:
<
<
>
>
  Enter "100" in A1

Enter "=A1-0.1" in A2
Line: 97 to 96
  Use the auto-fill facility to copy A2 down through (say) A110

Set Excel to display 13 places of decimals in the relevant cells.
Changed:
<
<
>
>
  How many of these cells would you want to use as the basis for any further calculations which depended on the assumption that the contents were exact multiples of 0.1?
Line: 107 to 105
 

(a) A similar approach to the above is applied to Excel in:
Changed:
<
<
>
>
  McCullough, B. D. and Wilson, B. (1999) On the accuracy of statistical procedures in Microsoft Excel 97. Computational Statistics and Data Analysis 31:27-37.
Changed:
<
<
>
>
  For those whose institutions have the necessary subscriptions, this is available on-line at http://www.elsevier.nl/gej-ng/10/15/38/37/25/27/article.pdf.
Line: 141 to 136
  (c) Knut M. Wittkowski is working on a (very impressive) animated PowerPoint presentation on a number of issues, of which the current version is
Changed:
<
<
available from http://www.rucares.org/Course/Excel.ppv.
>
>
available from http://www.rucares.org/Course/Excel.ppv.
 

(d) Nick Cox pointed out the eccentric nature of Excel's approach to
Line: 201 to 197
 

... a tiny dataset that made Excel give a totally meaningless regression output.
Changed:
<
<
>
>
  x1 x2 x3 y 1 2 3 21 3 2 5 33 5 4 9 45 7 4 11 49 9 6 15 61
Changed:
<
<
>
>
  You do a model of y = x1 x2 x3

Excel gives you a load of old tosh.
Line: 298 to 294
  web-site at http://www.gre.ac.uk/~cd02/eusprig/ containing some interesting material and links.
Deleted:
<
<
 
 
This site is powered by FoswikiCopyright © 2013-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Vanderbilt Biostatistics Wiki? Send feedback