Reproducible Research Using R: Packages & Functions Reference

Packages & Functions Reference

This table consolidates the packages and commands used throughout the book, what each command does, and where it is first introduced.

| Package | Command | What it does | First introduced |
|---|---|---|---|
| base R | `<-` | Assigns values to objects for later use. | Getting Started with R |
| base R | `c()` | Combines multiple values into a single vector. | Getting Started with R |
| base R | `:` | Creates integer sequences (e.g., 1:10). | Getting Started with R |
| base R | `[]` | Indexes and subsets elements from vectors or data frames. | Getting Started with R |
| base R | `$` | Accesses or creates columns within a data frame. | Getting Started with R |
| base R | `sqrt()` | Computes square roots. | Getting Started with R |
| base R | `log()` / `log10()` | Computes natural and base-10 logarithms. | Getting Started with R |
| base R | `round()` | Rounds numeric values to a specified number of digits. | Getting Started with R |
| base R | `class()` | Identifies the data type (class) of an object. | Getting Started with R |
| base R | `length()` | Returns the number of elements in a vector. | Getting Started with R |
| base R | `factor()` | Converts character data into categorical (factor) variables. | Getting Started with R |
| base R | `levels()` | Displays the levels associated with a factor. | Getting Started with R |
| base R | `data.frame()` | Combines vectors into a tabular data structure. | Getting Started with R |
| base R | `head()` / `tail()` | Displays the first or last rows of a dataset. | Getting Started with R |
| base R | `str()` | Displays the internal structure and data types of a dataset. | Getting Started with R |
| base R | `summary()` | Produces descriptive summaries of variables or model results. | Getting Started with R |
| base R | `table()` | Creates frequency tables for categorical data. | Getting Started with R |
| base R | `nrow()` / `ncol()` | Returns the number of rows or columns in a dataset. | Getting Started with R |
| base R | `colnames()` | Displays or modifies column names of a data frame. | Getting Started with R |
| base R | `read.csv()` | Imports CSV files into R as data frames. | Getting Started with R |
| base R | `getwd()` / `setwd()` | Gets or sets the current working directory. | Getting Started with R |
| base R | `install.packages()` | Installs packages from CRAN. | Getting Started with R |
| base R | `library()` | Loads an installed package into the current R session. | Getting Started with R |
| base R | `?function_name` | Accesses built-in help documentation for a function. | Getting Started with R |
| magrittr | `%>%` | Passes the result of one operation into the next. | Introduction to tidyverse |
| dplyr | `select()` | Chooses specific columns from a dataset. | Introduction to tidyverse |
| dplyr | `filter()` | Keeps rows that meet logical conditions. | Introduction to tidyverse |
| dplyr | `arrange()` | Orders rows based on column values. | Introduction to tidyverse |
| dplyr | `mutate()` | Creates or modifies columns. | Introduction to tidyverse |
| dplyr | `rename()` | Renames columns using new_name = old_name. | Introduction to tidyverse |
| dplyr | `distinct()` | Returns unique rows or value combinations. | Introduction to tidyverse |
| dplyr | `if_else()` | Creates values based on a binary condition. | Introduction to tidyverse |
| dplyr | `case_when()` | Applies multiple conditional rules. | Introduction to tidyverse |
| base R | `is.na()` | Identifies missing (NA) values. | Introduction to tidyverse |
| tidyr | `drop_na()` | Removes rows containing missing values. | Introduction to tidyverse |
| dplyr | `count()` | Counts observations by group. | Introduction to tidyverse |
| dplyr | `group_by()` | Groups data for grouped operations. | Introduction to tidyverse |
| dplyr | `summarise()` | Computes summary statistics for groups. | Introduction to tidyverse |
| dplyr | `n()` | Returns group size within summarise(). | Introduction to tidyverse |
| base R | `sessionInfo()` | Displays information about the current R session, including loaded packages. | Introduction to tidyverse |
| dplyr | `inner_join()` | Performs a SQL-style inner join, keeping only rows that match in both datasets. | Comparing Two Groups |
| tidyr | `pivot_longer()` | Converts data from wide format to long format. | Comparing Two Groups |
| tidyr | `pivot_wider()` | Converts data from long format back to wide format. | Comparing Two Groups |
| base R | `rbind()` | Combines multiple data frames by binding rows together. | Comparing Two Groups |
| base R | `merge()` | Joins two data frames together based on a shared key variable. | Comparing Two Groups |
| base R | `mean()` | Calculates the average of numeric values. | Comparing Two Groups |
| stats | `t.test()` | Tests whether two group means differ significantly. | Comparing Two Groups |
| stats | `cor()` | Computes correlation coefficients (Pearson by default). | Correlation |
| stats | `cor.test()` | Computes and tests correlations. | Correlation |
| base R | `pairs()` | Creates a scatterplot matrix. | Correlation |
| GGally | `ggpairs()` | Creates an enhanced scatterplot matrix with correlations. | Correlation |
| ppcor | `pcor.test()` | Computes partial correlations. | Correlation |
| base R | `ifelse()` | Recodes variables conditionally. | Correlation |
| base R | `set.seed()` | Ensures reproducibility when generating random data. | Comparing Multiple Means |
| stats | `rnorm()` | Generates random values from a normal distribution. | Comparing Multiple Means |
| stats | `aov()` | Fits ANOVA models. | Comparing Multiple Means |
| supernova | `supernova()` | Displays ANOVA results in structured tables. | Comparing Multiple Means |
| stats | `TukeyHSD()` | Performs post-hoc pairwise comparisons. | Comparing Multiple Means |
| base R | `plot()` | Visualizes post-hoc comparison results. | Comparing Multiple Means |
| AICcmodavg | `aictab()` | Compares models using AIC. | Comparing Multiple Means |
| stats | `xtabs()` | Constructs contingency tables using a formula interface. | Analyzing Categorical Data |
| janitor | `tabyl()` | Creates clean contingency tables. | Analyzing Categorical Data |
| janitor | `adorn_percentages()` | Converts counts to percentages. | Analyzing Categorical Data |
| janitor | `adorn_ns()` | Displays counts and percentages together. | Analyzing Categorical Data |
| janitor | `clean_names()` | Standardizes the column names of a data frame. | Correlation |
| stats | `chisq.test()` | Performs Chi-Square tests of independence. | Analyzing Categorical Data |
| gmodels | `CrossTable()` | Produces detailed cross-tabulations. | Analyzing Categorical Data |
| pheatmap | `pheatmap()` | Draws a heatmap of residuals or contributions. | Analyzing Categorical Data |
| rcompanion | `cramerV()` | Measures association strength between categorical variables. | Analyzing Categorical Data |
| stats | `lm()` | Fits linear regression models. | Linear Regression |
| broom | `tidy()` | Tidies model coefficients. | Linear Regression |
| broom | `glance()` | Extracts model-level statistics. | Linear Regression |
| lmtest | `bptest()` | Tests for heteroscedasticity (Breusch-Pagan test). | Linear Regression |
| stats | `AIC()` | Computes the Akaike information criterion for comparing regression models. | Linear Regression |
| stats | `step()` | Performs stepwise model selection. | Linear Regression |
| stats | `glm()` | Fits generalized linear models, including logistic regression (binomial family). | Logistic Regression |
| base R | `exp()` | Exponentiates values; used to convert log-odds to odds ratios. | Logistic Regression |
| caTools | `sample.split()` | Splits data into training/testing sets. | Logistic Regression |
| caret | `confusionMatrix()` | Evaluates classification performance. | Logistic Regression |
| pscl | `pR2()` | Computes pseudo R² values. | Logistic Regression |
| caret | `varImp()` | Assesses predictor importance. | Logistic Regression |
| car | `vif()` | Computes variance inflation factors to detect multicollinearity. | Logistic Regression |
| pROC | `roc()` | Builds ROC curves. | Logistic Regression |
| pROC | `auc()` | Computes area under the ROC curve (AUC). | Logistic Regression |
| knitr | `knitr::opts_chunk$set()` | Sets global chunk options in R Markdown. | Reproducible Reporting |
| knitr | `kable()` | Creates formatted tables for reports. | Reproducible Reporting |
| nycOpenData | `nyc311()` | Downloads NYC 311 Service Request data from NYC Open Data. | Reproducible Reporting |
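
To see how several of the entries above fit together, here is a minimal sketch that uses only functions that ship with R (base and stats), so no extra packages are needed. The `scores` data frame, its column names, and the simulated means are hypothetical and chosen only for illustration; they do not come from the book's datasets.

```r
# Make the simulated data reproducible.
set.seed(42)

# Hypothetical data: 20 simulated scores for each of two groups.
scores <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 20)),
  score = c(rnorm(20, mean = 50, sd = 10),
            rnorm(20, mean = 58, sd = 10))
)

# Explore the dataset.
str(scores)            # structure and column types
head(scores)           # first six rows
table(scores$group)    # number of rows per group
nrow(scores)           # total rows
ncol(scores)           # total columns

# Mean score per group, using $ and [] to subset.
mean_control   <- mean(scores$score[scores$group == "control"])
mean_treatment <- mean(scores$score[scores$group == "treatment"])
round(c(control = mean_control, treatment = mean_treatment), 2)

# Compare the two group means (Welch two-sample t-test by default).
result <- t.test(score ~ group, data = scores)
result
```

Because `set.seed()` is called first, rerunning the script on the same R version should reproduce the same simulated values and the same test result.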
citation("base")
#> To cite R in publications use:
#> 
#>   R Core Team (2025). _R: A Language and Environment
#>   for Statistical Computing_. R Foundation for
#>   Statistical Computing, Vienna, Austria.
#>   <https://www.R-project.org/>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {R: A Language and Environment for Statistical Computing},
#>     author = {{R Core Team}},
#>     organization = {R Foundation for Statistical Computing},
#>     address = {Vienna, Austria},
#>     year = {2025},
#>     url = {https://www.R-project.org/},
#>   }
#> 
#> We have invested a lot of time and effort in creating
#> R, please cite it when using it for data analysis.
#> See also 'citation("pkgname")' for citing R packages.
citation("ggplot2")
#> To cite ggplot2 in publications, please use
#> 
#>   H. Wickham. ggplot2: Elegant Graphics for Data
#>   Analysis. Springer-Verlag New York, 2016.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Book{,
#>     author = {Hadley Wickham},
#>     title = {ggplot2: Elegant Graphics for Data Analysis},
#>     publisher = {Springer-Verlag New York},
#>     year = {2016},
#>     isbn = {978-3-319-24277-4},
#>     url = {https://ggplot2.tidyverse.org},
#>   }
citation("dplyr")
#> To cite package 'dplyr' in publications use:
#> 
#>   Wickham H, François R, Henry L, Müller K, Vaughan D
#>   (2023). _dplyr: A Grammar of Data Manipulation_.
#>   doi:10.32614/CRAN.package.dplyr
#>   <https://doi.org/10.32614/CRAN.package.dplyr>, R
#>   package version 1.1.4,
#>   <https://CRAN.R-project.org/package=dplyr>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {dplyr: A Grammar of Data Manipulation},
#>     author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller and Davis Vaughan},
#>     year = {2023},
#>     note = {R package version 1.1.4},
#>     url = {https://CRAN.R-project.org/package=dplyr},
#>     doi = {10.32614/CRAN.package.dplyr},
#>   }
citation("tidyr")
#> To cite package 'tidyr' in publications use:
#> 
#>   Wickham H, Vaughan D, Girlich M (2024). _tidyr:
#>   Tidy Messy Data_. doi:10.32614/CRAN.package.tidyr
#>   <https://doi.org/10.32614/CRAN.package.tidyr>, R
#>   package version 1.3.1,
#>   <https://CRAN.R-project.org/package=tidyr>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {tidyr: Tidy Messy Data},
#>     author = {Hadley Wickham and Davis Vaughan and Maximilian Girlich},
#>     year = {2024},
#>     note = {R package version 1.3.1},
#>     url = {https://CRAN.R-project.org/package=tidyr},
#>     doi = {10.32614/CRAN.package.tidyr},
#>   }
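
The citation output above can also be collected programmatically instead of being copied by hand. The sketch below assumes you want one BibTeX file covering a set of packages. The package list, the `bib_file` name, and the use of a temporary file are illustrative choices, not part of the book's workflow.

```r
# Packages to cite; swap in the ones your analysis actually loads.
pkgs <- c("base", "stats", "utils")

# Keep only packages that are installed, so citation() does not error.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
pkgs <- pkgs[installed]

# Convert each package's citation to BibTeX text.
bib_entries <- lapply(pkgs, function(p) as.character(toBibtex(citation(p))))

# Write all entries to a .bib file (a temporary file in this sketch).
bib_file <- tempfile(fileext = ".bib")
writeLines(unlist(bib_entries), bib_file)

# Record the R version so the citations can be matched to the software used.
r_version <- R.version.string
r_version
```

Pairing the generated file with the output of `sessionInfo()` gives readers both the references and the exact package versions behind an analysis.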
