Reproducible Research Using R: How to Use This Book

How to Use This Book

This book is designed to be flexible. You can read it cover-to-cover, jump directly to specific chapters, or use it as a reference alongside your own projects.

0.4 Chapter Anatomy

The breakdown of the book is as follows:

  • Part I: Foundations
    • Getting Started with R
    • Working with Data Using the tidyverse
    • Data Visualization with ggplot2
  • Part II: Making Comparisons
    • Comparing Two Groups: Data Wrangling, Visualization, and t-Tests
    • Comparing Multiple Means
    • Analyzing Categorical Data
  • Part III: Relationships and Modeling
    • Correlation
    • Linear Regression
    • Logistic Regression
  • Part IV: Reproducible Communication
    • Reproducible Reporting with R Markdown

Most chapters follow a consistent structure:

  • Conceptual explanation of why a tool or method is useful
  • Step-by-step code examples
  • Visualizations and outputs
  • Interpretation and best practices
  • A checklist to reinforce reproducible habits

This repetition is intentional. Consistency helps build intuition.

0.5 Code, Data, and Reproducibility

All code in this book is meant to be run, modified, and occasionally broken. Learning happens when you experiment. As my father always says:

“That’s why they put erasers on pencils.”

The datasets used throughout the book are provided in a companion R package so that readers can load them directly without downloading files manually. This ensures that examples work the same way for everyone.
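As a rough sketch of what loading data from a package looks like, the example below uses R's built-in datasets package as a stand-in, because this section does not name the companion package. The choice of mtcars is an assumption made only for illustration; the book's own datasets will have different names.

```r
# Stand-in example: the built-in `datasets` package plays the role of the
# companion package here (the real package name is not given in this section).
data("mtcars", package = "datasets")

# Confirm the data arrived as a data frame and take a first look.
is.data.frame(mtcars)
dim(mtcars)
head(mtcars)
```

Because the data ships inside a package, every reader who installs the same package version starts from the same rows and columns, with no file paths to adjust.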

When figures, tables, or analyses appear in this book, they are generated directly from code—never copied and pasted from external software.
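One way to picture this workflow, using only base R: the sketch below builds a small summary table and saves a figure entirely from code. The mtcars data, the object names, and the output file name are illustrative assumptions, not the book's own examples.

```r
# Build a summary table from code: mean miles per gallon by cylinder count.
summary_table <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(summary_table)

# Write the matching figure to a file from code (hypothetical file name,
# saved to a temporary folder so nothing in your project is overwritten).
out_file <- file.path(tempdir(), "mpg_by_cyl.pdf")
pdf(out_file, width = 6, height = 4)
barplot(
  summary_table$mpg,
  names.arg = summary_table$cyl,
  xlab = "Number of cylinders",
  ylab = "Mean miles per gallon"
)
invisible(dev.off())  # close the graphics device so the file is written
```

If the underlying data changes, rerunning the script regenerates both the table and the figure, so the two cannot drift apart.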

0.6 Acknowledgments

This book would not exist without the curiosity, questions, and persistence of students at Brooklyn College. Their willingness to wrestle with messy data and imperfect code shaped both the content and the tone of this text.

Additional thanks go to the Open Educational Resources team at Brooklyn College and to the broader R community, whose commitment to open tools and shared knowledge makes projects like this possible.
