Converting Documents Produced by Sweave

Converting from LaTeX to html

The recommended format for statistical reports to send to collaborators is pdf produced by pdflatex. Sometimes it is necessary to give collaborators a version of a report that can be edited outside of LaTeX, or to post a report on a web site. Experiments with latex2rtf and hevea have shown that these are not adequate for reports that incorporate advanced features such as latex(describe()) output. One of the most reliable approaches is to use TtH to convert from LaTeX to html.

A Linux shell script, sweave2html, is needed to translate Sweave's LaTeX output for use with TtH (thanks to Ben Bolker for providing most of the code). Here are the steps needed to get started.

  1. Install necessary packages (instructions for Debian/Ubuntu variants):
    • Install package 'tth'
      • Using terminal: sudo apt-get install tth
    • Install package 'netpbm', which provides the ppmtogif executable, if your documents include LaTeX picture code such as that produced by latex.describe
      • Using terminal: sudo apt-get install netpbm
  2. Download 'sweave2html' script
    • Using the mouse
      1. Using the system menu, open the /home/username/bin/ folder (create it if it does not exist).
      2. Create and save a script file called 'sweave2html' (without the .txt extension) by right-clicking in the folder and selecting 'Create New > Text File'.
      3. Copy and paste the sweave2html commands (found in the sweave2html link) into the script file.
      4. Mark this script as executable by right-clicking on the sweave2html file, choosing 'Properties', and then the 'Permissions' tab. Check the 'Is executable' box.
    • Alternatively, the script may be created in a terminal by typing the commands:
      1. cd ~/bin
      2. wget http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SweaveConvert/sweave2html -nv (Note: type the web address, not the contents of the link.)
      3. chmod u+x sweave2html
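As an optional check after these steps, the sketch below reports whether tth, ppmtogif (from netpbm) and sweave2html can be found, creates ~/bin if it is missing, and says whether ~/bin is on your PATH, which the terminal steps above assume. It is a hypothetical helper (check_sweave2html_setup is not part of sweave2html) and assumes a Debian/Ubuntu-style system.

```shell
#!/bin/sh
# Hypothetical helper (not part of sweave2html): report whether the tools
# used by sweave2html are installed, and make sure ~/bin exists and is on PATH.
check_sweave2html_setup() {
  for cmd in tth ppmtogif sweave2html; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
    fi
  done
  mkdir -p "$HOME/bin" || return 1      # create ~/bin if it does not exist
  case ":$PATH:" in
    *":$HOME/bin:"*) echo "$HOME/bin is on PATH" ;;
    *) echo "add to your shell startup file: export PATH=\"\$HOME/bin:\$PATH\"" ;;
  esac
}
```

Run check_sweave2html_setup in a terminal; any line beginning with "missing:" points to a package or step above that still needs attention.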
To convert your .tex file to html and create all the needed graphics files, do the following.

  1. Create a folder called 'graphics' in your project directory and have Sweave use it by putting the following command in your .Rnw or .nw file: \SweaveOpts{prefix.string=graphics/plot}
  2. In a terminal (shell), change to the project directory containing the .Rnw or .nw file (using a command such as 'cd ~/pathname').
    1. Run Sweave by typing 'Sweave filename' (without the .Rnw or .nw extension).
    2. Type the command 'sweave2html tex_filename' (without the .tex extension).
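The two commands in step 2 can be chained into a single command. The sketch below is a hypothetical wrapper (rnw2html and rnw_basename are illustrative names, not part of sweave2html). It substitutes R CMD Sweave, the standard way to run Sweave from a shell, for the 'Sweave filename' command used above, and assumes sweave2html is already on your PATH.

```shell
#!/bin/sh
# Hypothetical wrapper chaining the two steps above.
# Assumptions: R is installed (R CMD Sweave stands in for the 'Sweave'
# command in the text) and sweave2html is on PATH.

# Strip an optional .Rnw or .nw extension: "foo.Rnw" -> "foo".
rnw_basename() {
  name=${1%.Rnw}
  name=${name%.nw}
  printf '%s\n' "$name"
}

rnw2html() {
  base=$(rnw_basename "$1")
  if [ -f "$base.Rnw" ]; then src="$base.Rnw"; else src="$base.nw"; fi
  mkdir -p graphics                 # folder named in prefix.string=graphics/plot
  R CMD Sweave "$src" || return 1   # writes $base.tex and figure files in graphics/
  sweave2html "$base"               # writes $base.html and the image files
}
```

For example, rnw2html foo and rnw2html foo.Rnw both process foo.Rnw.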
You can view filename.html in a browser such as Konqueror, then copy and paste it into an OpenOffice document and save it in a variety of formats, including Word. Use Select All (Ctrl+A), then copy and paste; all graphics and table formatting will be preserved.

It is important to give your collaborator all the .pdf files in the graphics directory to use in manuscripts; do not let them use the lower-resolution graphs included in the filename.html document. Bundle all the necessary files to send to the collaborator, for example with

zip /tmp/z.zip foo.pdf foo.html *.gif graphics/*.pdf graphics/*.png
E-mail /tmp/z.zip as an attachment.
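A slightly more defensive version of the zip command above is sketched here. It collects only the files that actually exist, so a pattern such as *.gif that matches nothing is skipped instead of being passed to zip literally, and it feeds the list to zip through zip's -@ option (read file names from standard input). The function names are hypothetical; the file layout (foo.pdf, foo.html, *.gif, graphics/) is taken from the command above.

```shell
#!/bin/sh
# Hypothetical helpers for bundling a report for a collaborator.

# Print, one per line, the files that exist for report "$1".
bundle_files() {
  for f in "$1.pdf" "$1.html" *.gif graphics/*.pdf graphics/*.png; do
    if [ -e "$f" ]; then            # unmatched globs stay literal and fail this test
      printf '%s\n' "$f"
    fi
  done
}

# Zip those files into "$2" (default /tmp/z.zip).
bundle_report() {
  out=${2:-/tmp/z.zip}
  bundle_files "$1" | zip "$out" -@
}
```

bundle_report foo writes /tmp/z.zip; bundle_report foo /tmp/report.zip writes the archive elsewhere.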

Using TeX4ht

The TeX4ht package is a comprehensive LaTeX to html converter. It can be installed easily using apt-get. In one test it performed well (including Greek letters, superscripts, and LaTeX picture environments), although I did not see how to get PostScript or pdf graphics to appear in the final output. Advanced summary.formula.reverse tables are handled nearly perfectly, including those that contain micro dot charts. TeX4ht is used as follows:
htlatex foo.tex            # produces foo.html
mk4ht oolatex foo.tex      # produces an OpenOffice document (foo.odt; older versions produced .sxw)
Note that the tth package has to be installed for htlatex to run completely.

My test of the oolatex option resulted in output that was not as good as running htlatex and opening the resulting .html file in OpenOffice. See StatReport for more information and example output, and note its comment about turning off picture links in the OpenOffice document.

Current Best Approach for Converting from LaTeX to Word

The following approach works well in many cases (e.g., documents with Greek letters, simple math expressions, and bibliographies). Define the following script as l2h in your ~/bin directory:
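#!/bin/sh
# l2h: convert $1.tex to html with TeX4ht, bundle the output, and open it in OpenOffice
htlatex "$1.tex"
rm -f "$1.idv" "$1.lg" "$1.tmp" "$1.4tc" "$1.xref" "$1.4ct"   # TeX4ht intermediate files
zip /tmp/$$.zip "$1.html" "$1.css" "$1"*x.png                 # html, style sheet and images
oowriter "$1.html"
# pop up a reminder window using wish, the Tcl/Tk windowing shell
echo "pack [button .h -text \"/tmp/$$.zip contains html and related files for\ncollaborator to unpack into one folder, or:\n\nClick Edit ... Links ... Break Link\nClick View and uncheck Notes, then\nSave as Word 97/2000/XP and exit OpenOffice\" -command exit]" | wish
rm -f "$1"*x.png "${1}2.html" "$1.dvi"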
Run l2h my to convert my.tex to html and open the result in OpenOffice, from which you can save it in Word 97/2000/XP format. A popup gives some pointers, such as breaking the links to pictures so that the document is self-contained if you e-mail it to someone. Examples are attached (see intro.tex and intro.doc below). This process gives you two options. First, you can e-mail your collaborator the .zip file that the script creates in /tmp. Second, you can save the result as a Word 97 document. Try having your collaborator use the html approach first: Word can open html files directly and will use the html style sheet (.css file) included in the zip file.

If your document uses few LaTeX packages, has simple tables, and makes little use of equations, a faster approach is to install the latex2rtf package, which very quickly converts LaTeX to rich text format (rtf), and use a command such as latex2rtf -o my.rtf my.tex.
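When several simple documents need converting, the latex2rtf command above can be applied to each .tex file in a directory. This is a minimal sketch using the same "-o output input" form shown in the text; the function name tex2rtf_all is hypothetical.

```shell
#!/bin/sh
# Hypothetical batch wrapper: convert every .tex file in the current
# directory to .rtf with latex2rtf, reporting failures without stopping.
tex2rtf_all() {
  for f in *.tex; do
    [ -e "$f" ] || continue               # no .tex files: the glob stays literal
    latex2rtf -o "${f%.tex}.rtf" "$f" || echo "latex2rtf failed on $f" >&2
  done
}
```

Running tex2rtf_all in a folder containing a.tex and b.tex should leave a.rtf and b.rtf alongside them.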

Using MathML, you can create a file that Firefox opens and renders with beautiful equations, without resorting to graphics. The attached intro.xhtml was created by running mk4ht xhmlatex intro.tex and then renaming intro.html to intro.xhtml. We don't currently know how to make OpenOffice open such files. To view intro.xhtml properly, save it to a local file so that you can open it outside of Foswiki.
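The two steps just described (run xhmlatex, then rename the output) can be wrapped in one function. This is a sketch under the assumption that TeX4ht's mk4ht is installed; the rename to .xhtml is presumably what lets Firefox parse the file as XHTML. The function name tex2xhtml is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: produce an XHTML+MathML file from a LaTeX document
# with TeX4ht's xhmlatex configuration, then rename the .html to .xhtml.
tex2xhtml() {
  base=${1%.tex}                        # accept "intro" or "intro.tex"
  mk4ht xhmlatex "$base.tex" || return 1
  mv "$base.html" "$base.xhtml"
}
```

For example, tex2xhtml intro should leave intro.xhtml in the current directory.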

Using OpenOffice Exclusively

The odfWeave package by Max Kuhn can be used to produce reports directly in open document format, and the output can be saved in Word format. At present, graphics are somewhat low resolution. Source code is similar to what is used with Sweave. Here is how to run an example in Linux, after installing the odfWeave package and the latest OpenOffice:
R                    # start R from a shell, then at the R prompt:
library(odfWeave)
odfWeave('/usr/local/lib/R/site-library/odfWeave/examples/examples.odt', '/tmp/out.odt')
You can then open /tmp/out.odt in OpenOffice Writer. Note: On some systems the correct file name will be /usr/lib/R/site-library/odfWeave/examples/examples.odt.
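The same example can be run non-interactively from a shell with Rscript. In this sketch, R's system.file() looks up where the odfWeave package is installed, which avoids having to choose between the /usr/local/lib and /usr/lib paths mentioned above. The function name and its optional output-file argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the odfWeave example with Rscript.
# system.file() returns the installed path of examples/examples.odt,
# whichever site library odfWeave was installed into.
odfweave_example() {
  out=${1:-/tmp/out.odt}
  Rscript -e "library(odfWeave)" \
          -e "odfWeave(system.file('examples', 'examples.odt', package = 'odfWeave'), '$out')"
}
```

odfweave_example writes /tmp/out.odt by default, which can then be opened in OpenOffice Writer as described above.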

This approach does not allow you to use the advanced table making capabilities of Hmisc that rely on LaTeX.

Weaving with Raw HTML

Greg Snow has written a document showing how to use raw HTML and the R2HTML package to produce .html reports.

Batch Conversion of Document Formats

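After downloading the ooconvert archive into /tmp, install it as follows:
cd /tmp
bunzip2 ooconvert-*
tar xvf ooconvert-*
# You may need to edit the line in ooconvert that refers to python2.3 so that it refers to python
sudo chmod a+x ooconvert
sudo mv ooconvert /usr/local/bin      # or: mv ooconvert ~/bin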
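Once ooconvert is installed, a batch conversion might look like the sketch below, which converts every .odt file in the current directory to Word .doc format. It uses the "ooconvert input output" form that appears in the ltx2doc script in the next section; the function name odt2doc_all is hypothetical.

```shell
#!/bin/sh
# Hypothetical batch wrapper: convert every .odt file in the current
# directory to .doc with ooconvert, reporting failures without stopping.
odt2doc_all() {
  for f in *.odt; do
    [ -e "$f" ] || continue               # no .odt files: the glob stays literal
    ooconvert "$f" "${f%.odt}.doc" || echo "ooconvert failed on $f" >&2
  done
}
```

Running odt2doc_all in a folder containing a.odt and b.odt should produce a.doc and b.doc.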

One-step Conversion of LaTeX Documents to Word

  • Install ooconvert, tth, tex4ht
  • Put the following script in ~/bin as ltx2doc and run chmod +x on it to make it executable
  • Run it by typing ltx2doc foo to convert foo.tex to foo.doc
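#!/bin/sh
# ltx2doc: convert $1.tex to $1.doc via TeX4ht's OpenOffice output and ooconvert
mk4ht oolatex "$1.tex"
rm -f "$1.css" "$1.idv" "$1.lg" "$1.tmp" "$1.4tc" "$1.xref" "$1.4ct"   # TeX4ht intermediate files
ooconvert "$1.odt" "$1.doc"
rm "$1.odt"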
But see above for a better approach through html and OpenOffice.

Converting to Word or OpenOffice by Converting from PDF

http://pdftoword.com does a surprisingly good job in many cases, including good handling of graphics. Convert your LaTeX document to pdf, then use this service to convert it to .doc or .rtf; the result will be e-mailed to you.
Topic attachments
Attachment | Size | Date | Who | Comment
htmlWeave.pdf | 60.8 K | 26 Jul 2006 - 22:59 | FrankHarrell | Automating Reports with Sweave by Greg Snow
intro.doc | 69.5 K | 06 May 2009 - 19:20 | FrankHarrell | Result of l2h intro after telling OpenOffice to save in Word format
intro.tex | 8.5 K | 06 May 2009 - 19:19 | FrankHarrell | LaTeX test file to try with l2h
intro.xhtml | 41.7 K | 17 May 2009 - 11:57 | FrankHarrell | Result of mk4ht xhmlatex intro.tex then renaming intro.html to intro.xhtml
sweave2html | 0.6 K | 14 May 2009 - 10:44 | WillGray | Sweave LaTeX to html converter
Topic revision: r28 - 29 May 2009 - 13:53:09 - FrankHarrell
 
Copyright © 2009 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.