The Why and How of R package development: Part 1

UW R-Ladies

4/21/23

Topics

  1. What is an R package, and why would I make one?
  2. Special considerations for writing functions in R packages
  3. Brainstorm ideas for your own package
  4. Building a “dummy” package using RStudio and devtools
  5. Set up and start building your own package

What is an R package?

“Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data.”
-Hadley Wickham & Jenny Bryan in R Packages

R packages are…

  • Portable

    • Everything in an R package directory (functions, documentation, data, etc.) is “built” into a “tarball” (packageName.tar.gz) that is easy to download, install and load
  • Open source

    • If you install a package, you can see all the function code

Inside the geom_point() function in ggplot2

library(ggplot2)
geom_point
function (mapping = NULL, data = NULL, stat = "identity", position = "identity", 
    ..., na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) 
{
    layer(data = data, mapping = mapping, stat = stat, geom = GeomPoint, 
        position = position, show.legend = show.legend, inherit.aes = inherit.aes, 
        params = list2(na.rm = na.rm, ...))
}
<bytecode: 0x1176f60f0>
<environment: namespace:ggplot2>

Why should I make an R package?

  • To share R functions (and/or data) with others (for a general audience!)
  • To share R functions (and/or data) with others (in your lab, company, etc.)
    • via GitHub or other file sharing tool
  • To store functions (and/or data) for yourself!

R packages can be huge and complicated, or can have only one or two functions! It’s entirely up to you, and what you think will be useful for your intended audience.

Note: The more public-facing your R package, the more thorough the documentation should be, and the more “generalized” the functions should be.

Required Elements of an R Package

Package

R/: contains R code files that contain function(s)

man/: contains documentation files for each function

DESCRIPTION: A file containing key package metadata

NAMESPACE: A file that declares which functions your package exports and which functions it imports from other packages
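
To make the four required elements concrete, here is a minimal sketch that builds them by hand in a temporary folder using only base R. The package name "myPackage", the hello() function, and the placeholder metadata are all assumptions made up for illustration. In practice, tools like devtools usually create and update these files for you.

```r
# Sketch: create the required package elements in a temporary folder.
# "myPackage" and hello() are hypothetical names used only for illustration.
pkgDir <- file.path(tempdir(), "myPackage")
dir.create(file.path(pkgDir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkgDir, "man"), showWarnings = FALSE)  # left empty in this sketch

# R/: one code file containing one function
writeLines(
  c("hello <- function(name) {",
    "  paste(\"Hello\", name)",
    "}"),
  file.path(pkgDir, "R", "hello.R")
)

# DESCRIPTION: key package metadata, stored as "Field: value" lines
write.dcf(
  data.frame(
    Package = "myPackage",
    Title = "A Placeholder Title",
    Version = "0.0.1",
    Description = "A placeholder description.",
    stringsAsFactors = FALSE
  ),
  file = file.path(pkgDir, "DESCRIPTION")
)

# NAMESPACE: export hello() so that users of the package can call it
writeLines("export(hello)", file.path(pkgDir, "NAMESPACE"))

# list everything that was created
list.files(pkgDir, recursive = TRUE, include.dirs = TRUE)
```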

Optional additional elements

data/: Contains example dataset(s) as .rda files
inst/: Can contain multiple objects, including a CITATION file
tests/: Contains files that perform automated tests of each function
vignettes/: Contains package vignette(s) as .Rmd files
LICENSE: A file that explains the license you want to use for your package (e.g. Creative Commons, MIT, etc.)
README.Rmd: A file that explains your package; it is rendered to README.md, which displays on the GitHub repo homepage and on CRAN

R Function Basics

General function structure:

function_name <- function(parameters){
  output <- doSomething(parameters)
  return(output)
}

Example:

# define the function
mean_two_numbers <- function(num_1, num_2) {
  # store the result in 'avg' to avoid reusing the name of the built-in mean() function
  avg <- (num_1 + num_2) / 2
  return(avg)
}

# use the function
mean_two_numbers(1,2)
[1] 1.5

Practice!

  • Write a function with…
    • a single input: a “name” character string
    • a single output: the phrase “Hello [name]”

  • Write a function with…
    • a single input: a character string that is either “even” or “odd”
    • a single output: a numeric vector that returns the even or odd numbers between 1 and 10
    • now, add input(s) so the user can define the start and end of the numeric sequence
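
Try the exercises yourself first! If you get stuck, here is one possible solution sketch. The function names sayHello() and getEvenOrOdd(), and the argument names 'type', 'start' and 'end', are just assumptions chosen for this example; many other solutions work equally well.

```r
# Exercise 1: one input (a name), one output (a greeting)
sayHello <- function(name) {
  paste("Hello", name)
}

sayHello("R-Ladies")
# [1] "Hello R-Ladies"

# Exercise 2: return the even or odd numbers in a sequence.
# 'start' and 'end' default to 1 and 10, so the first version of the
# exercise is covered by calling the function with 'type' only.
getEvenOrOdd <- function(type, start = 1, end = 10) {
  nums <- seq(start, end)
  if (type == "even") {
    return(nums[nums %% 2 == 0])
  } else if (type == "odd") {
    return(nums[nums %% 2 == 1])
  } else {
    stop("The 'type' argument must be either \"even\" or \"odd\"")
  }
}

getEvenOrOdd("even")
# [1]  2  4  6  8 10
getEvenOrOdd("odd", start = 3, end = 7)
# [1] 3 5 7
```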

R Functions in Packages - Special Considerations to make the user experience better (and pass CRAN checks!)

Function names and arguments should be meaningful!

function1 <- function(argument1, argument2) {
  output1 <- argument1[argument2,]
  return(output1)
}
function1(argument1 = mtcars, argument2 = 2)
              mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4

Better…

getRow <- function(dat, rowInd) {
  newRow <- dat[rowInd,]
  return(newRow)
}

getRow(dat = mtcars, rowInd = 3)
            mpg cyl disp hp drat   wt  qsec vs am gear carb
Datsun 710 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1

Default arguments can make it easier on users

Example: The getRow() function automatically returns the first row of a data frame, unless the user specifies otherwise

getRow <- function(dat, rowInd = 1) {
  newRow <- dat[rowInd,]
  return(newRow)
}

getRow(dat = mtcars)
          mpg cyl disp  hp drat   wt  qsec vs am gear carb
Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

The user can still give a different argument if they’d like

getRow(dat = mtcars, rowInd = 4)
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1

Check that the function inputs are of the expected type (e.g. character, numeric, data frame, etc.)

getRow <- function(dat, rowInd = 1) {
  # check that the 'dat' argument is a data frame
  if (!is.data.frame(dat)) {
    stop("Wrong input")
  }
  # check that the 'rowInd' argument is numeric
  stopifnot(is.numeric(rowInd))
  
  newRow <- dat[rowInd,]
  return(newRow)
}

Try to make the ‘dat’ argument a vector

getRow(dat = c(1,2,3,4))
Error in getRow(dat = c(1, 2, 3, 4)): Wrong input

Try to make the ‘rowInd’ argument a character

getRow(dat = mtcars, rowInd = "3")
Error in getRow(dat = mtcars, rowInd = "3"): is.numeric(rowInd) is not TRUE

Provide informative error messages!

getRow <- function(dat, rowInd = 1) {
  # check that the 'dat' argument is a data frame
  if (!is.data.frame(dat)) {
    stop("Wrong input")
  }
  # check that the 'rowInd' argument is numeric
  stopifnot(is.numeric(rowInd))
  
  newRow <- dat[rowInd,]
  return(newRow)
}

getRow(dat = c(1,2,3,4))
Error in getRow(dat = c(1, 2, 3, 4)): Wrong input

Better…

getRow <- function(dat, rowInd = 1) {
  # check that the 'dat' argument is a data frame
  if (!is.data.frame(dat)) {
    stop("The 'dat' argument must be a data frame")
  }
  # check that the 'rowInd' argument is numeric
  stopifnot("The 'rowInd' argument must be numeric" = is.numeric(rowInd))
  
  newRow <- dat[rowInd,]
  return(newRow)
}

getRow(dat = c(1,2,3,4))
Error in getRow(dat = c(1, 2, 3, 4)): The 'dat' argument must be a data frame
getRow(dat = mtcars, rowInd = "3")
Error in getRow(dat = mtcars, rowInd = "3"): The 'rowInd' argument must be numeric

  • Consider including ‘...’ in the argument list, so extra arguments can be passed on to other functions

getRow <- function(dat, rowInd = 1, ...) {
  newRow <- dat[rowInd,]
  return(newRow)
}
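
Including ‘...’ is most useful when your function hands those extra arguments to another function. Here is a small sketch of that idea. The columnMean() function and the 'scores' data frame are hypothetical, made up for this example only.

```r
# columnMean() is a hypothetical helper: anything supplied through '...'
# is passed straight on to mean()
columnMean <- function(dat, colName, ...) {
  mean(dat[[colName]], ...)
}

# a made-up data frame with one missing value
scores <- data.frame(score = c(1, 2, NA, 6))

# without extra arguments, the missing value makes the result NA
columnMean(scores, "score")
# [1] NA

# the user can pass na.rm = TRUE through '...' to mean()
columnMean(scores, "score", na.rm = TRUE)
# [1] 3
```

The user gets access to the options of mean() without you having to list each one as an argument of your own function.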

  • Don’t make changes to the user’s environment without changing them back before exiting the function

    • Example: Don’t change settings in “par” without returning those settings to the state they were before the function!
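
One common way to restore “par” settings is to save the old values and register the reset with on.exit(), which runs when the function exits, even if it exits because of an error. Here is a minimal sketch; plotTwoPanels() is a hypothetical function written only to illustrate the pattern.

```r
# plotTwoPanels() is a hypothetical example function
plotTwoPanels <- function(x, y) {
  # par() returns the previous values of the settings it changes
  oldPar <- par(mfrow = c(1, 2))
  # put the user's settings back when the function exits
  on.exit(par(oldPar), add = TRUE)

  hist(x, main = "x")
  hist(y, main = "y")
  invisible(NULL)
}
```

After calling plotTwoPanels(), the user’s next plot is drawn in a single panel again, just like before the call.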

R Functions: Further Reading

Check out the “functions” chapter in Hadley Wickham’s book Advanced R

Cover of Advanced R Book

Next Friday, we’ll go over the basic steps of making a package using a fake example, and then you’ll start building your own basic package framework!

  • We’ll use RStudio and the “devtools” package, as well as GitHub for version control and easy sharing down the road!

Prepare for next week:

  1. What problem(s) will your package address?
    • Is there a set of functions you or your collaborators use all the time?
    • Have you figured out a workflow that could be helpful for others?
  2. How broad is your audience? How widely will you share your package?
  3. What will your package be called? Some helpful advice
library(available)
available("Yay", browse = FALSE)
  4. What other packages will yours “depend” on?

General Resources