The Statistical Computing Series

The Statistical Computing Series is a monthly event for learning various aspects of modern statistical computing from practitioners in the Department of Biostatistics. We focus on the R language, Python, and associated tools, but we welcome the broadest possible range of content on effective statistical computation. The format varies with the speaker and the topic, from lectures to demonstrations to hands-on workshops.

If you have a particular topic you would like to see covered, please send a request.

There have been several requests for coverage of various topics. Here is a short list, if you are interested in contributing but are seeking inspiration:

  • writing R functions with formula arguments
  • writing R functions with methods
  • using makefiles
  • other graphics packages (base graphics)
  • lme4/nlme
  • reshape (package not function)/plyr
  • R data structures
  • bootstrapping / random number generating
  • imputation (using various packages and functions)
  • bibtex
  • software for slide presentations

Time & Location

Fourth Friday of each month at 1:30 pm in the Biostatistics Conference Room (11105, 2525 West End Avenue).

Email Notification

We send out email notifications the week of a particular presentation. If you would like to be added to the list, please let us know.

Spring 2017 Schedule

Use R to animate travel history!

27 January, 2017 Minchun Zhou

Static plots are not always sufficient for visualizing data such as travel history. People travel a lot: with family, with friends, or alone. Animating your travel history is like reviving good memories. In this talk, I will briefly demonstrate how to plot data on Google Maps, draw great circles, and make animations using R!

Using the R package GMD to do collaborative statistical document construction

24 February, 2017 Nicholas Strayer

Lucy and I recently wrote the R package GMD to solve the problem of how to construct a statistical report or homework assignment while working simultaneously with collaborators. GMD is an alpha-level package that keeps a local .Rmd file in sync with a remote Google Doc. Simply paste the share URL of the Google Doc into the function, and R will automatically pull the Google Doc, write it to an .Rmd file on your local machine, and render the results. This effectively lets you use Google Docs as your text editor, with all its benefits of history and multi-user editing, while avoiding the hassle of continuously copying and pasting the text into R to check for syntax errors and the like.

Introduction to Variational Bayesian Methods

24 March, 2017 David Schlueter

In Bayesian analysis, the most common strategy for computing posterior quantities is Markov chain Monte Carlo (MCMC). Despite recent advances in efficient sampling, MCMC methods remain computationally intensive for data sets with more than a few thousand observations. A more scalable alternative to sampling is Variational Inference (VI), which reframes the problem of computing the posterior distribution as a minimization of the Kullback-Leibler divergence between the true posterior and a member of some approximating family. In this talk, we provide a basic overview of the VI framework as well as practical examples of its implementation using the Automatic Differentiation Variational Inference (ADVI) engine in PyMC3.
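As a rough illustration of the idea in this abstract (not the speaker's material, and not ADVI or PyMC3), the sketch below fits a Gaussian approximation q(mu) = N(m, s^2) to the posterior of a normal mean with known variance. It maximizes the evidence lower bound (ELBO), which is equivalent to minimizing the Kullback-Leibler divergence from q to the posterior. The model, the function names `fit_gaussian_vi` and `exact_posterior`, and all parameter values are assumptions chosen so that the exact posterior is available for comparison.

```python
import numpy as np


def exact_posterior(y, sigma, tau):
    """Conjugate posterior of mu for y_i ~ N(mu, sigma^2), mu ~ N(0, tau^2)."""
    precision = len(y) / sigma**2 + 1.0 / tau**2
    mean = (np.sum(y) / sigma**2) / precision
    return mean, np.sqrt(1.0 / precision)


def fit_gaussian_vi(y, sigma, tau, lr=0.01, n_steps=5000):
    """Fit q(mu) = N(m, s^2) by gradient ascent on the closed-form ELBO.

    The step size lr is an assumption; it must be small relative to
    1 / (n / sigma^2 + 1 / tau^2) for the iteration to be stable.
    """
    n = len(y)
    m, log_s = 0.0, 0.0  # start from q = N(0, 1); optimize log(s) so s stays positive
    for _ in range(n_steps):
        s2 = np.exp(2.0 * log_s)
        # Gradients of E_q[log p(y | mu)] + E_q[log p(mu)] + entropy(q)
        grad_m = np.sum(y - m) / sigma**2 - m / tau**2
        grad_log_s = 1.0 - s2 * (n / sigma**2 + 1.0 / tau**2)
        m += lr * grad_m
        log_s += lr * grad_log_s
    return m, np.exp(log_s)


# Simulated data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=50)

m, s = fit_gaussian_vi(y, sigma=1.0, tau=10.0)
post_mean, post_sd = exact_posterior(y, sigma=1.0, tau=10.0)
print(f"VI approximation: mean={m:.4f}, sd={s:.4f}")
print(f"Exact posterior:  mean={post_mean:.4f}, sd={post_sd:.4f}")
```

Because the true posterior here is itself Gaussian, the variational optimum coincides with it; in non-conjugate models the approximating family generally cannot match the posterior exactly, and the ELBO gradients are usually estimated rather than written in closed form.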

Gaussian Processes Made Easy

28 April, 2017 Chris Fonnesbeck

A common applied statistics task involves building regression models to characterize non-linear relationships between variables. It is possible to fit such models by assuming a particular non-linear structure, such as a sinusoidal, exponential, or polynomial function, to describe the response of one variable to another. Unless this relationship is obvious from the outset, however, this approach may require extensive model selection to ensure that the most appropriate model is retained. Alternatively, a non-parametric approach can be adopted by defining a set of knots across the variable space and using a spline or kernel regression to describe arbitrary non-linear relationships. However, knot layout procedures are somewhat ad hoc and can also involve variable selection. A third alternative is to adopt a Bayesian non-parametric strategy and directly model the unknown underlying function. For this, we can employ Gaussian process models. I will compare three packages for fitting GP models in Python that make building Bayesian non-parametric models easier than it has ever been.
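To make the Gaussian process idea concrete, here is a minimal NumPy sketch of GP regression with a squared-exponential kernel. It does not use any of the three packages compared in the talk (the abstract does not name them); the kernel choice, the function names `rbf_kernel` and `gp_posterior`, the hyperparameter values, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise_sd=0.1,
                 length_scale=1.0, variance=1.0):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, length_scale, variance)
    K += noise_sd**2 * np.eye(len(x_train))  # observation noise on the diagonal
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    K_ss = rbf_kernel(x_test, x_test, length_scale, variance)

    # Solve via the Cholesky factor instead of inverting K directly
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov


# Hypothetical data: noiseless samples from a sine curve
x_train = np.linspace(0.0, 5.0, 8)
y_train = np.sin(x_train)
x_test = np.linspace(0.0, 5.0, 50)

mean, cov = gp_posterior(x_train, y_train, x_test, noise_sd=0.05)
sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # guard against tiny negative round-off
print(mean[:3], sd[:3])
```

No functional form for the curve is specified anywhere; the kernel only encodes an assumption about smoothness, and the posterior standard deviation grows away from the observed inputs. In practice the hyperparameters would be estimated or given priors rather than fixed as they are here.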


Topic revision: r139 - 20 Mar 2017, ChrisFonnesbeck
