The 2025 Brooklyn Open Data Collection: Analyst Portfolios: About

table of contents
  1. About
    1. 0.1 How to Use This Book
    2. 0.2 Companion Textbook
    3. 0.3 Instructor Note
    4. 0.4 Why NYC Open Data?
    5. 0.5 Contributors
    6. 0.6 Acknowledgments
    7. 0.7 How to Cite This Volume
  2. 1 Toxic Homes: Exploring Mold Exposure Complaint and Domestic Violence Report Trends in NYC
    1. 1.1 Loading, Prepping, Cleaning, & Aggregating
      1. 1.1.1 Data Preparation & Cleaning
      2. 1.1.2 Aggregating Mold Data & DV Data
    2. 1.2 Exploring the Data
      1. 1.2.1 Domestic Violence Data
      2. 1.2.2 Mold Exposure Data
      3. 1.2.3 Summary Stats
      4. 1.2.4 Borough/Year Distributions
      5. 1.2.5 Heat Map
      6. 1.2.6 Preliminary Correlation
    3. 1.3 Temporal Trends
      1. 1.3.1 Exploring Mold Resolution
      2. 1.3.2 Quick Look at Resolution Time
      3. 1.3.3 Average Resolution Delay per Month
      4. 1.3.4 Lagged Data
    4. 1.4 Statistical Analysis
    5. 1.5 Regression Models
    6. 1.6 Discussion & Insights
  3. 2 Beating Around the Bush: Uncovering the Hidden Link Between Urban Trees and Wildlife Activity
    1. 2.1 Required Packages
    2. 2.2 Data and Methods
      1. 2.2.1 Data Sources
      2. 2.2.2 Data Cleaning and Preparation
    3. 2.3 Descriptive Analysis (Plots)
      1. 2.3.1 Street Tree Distribution Across Boroughs (Bar chart)
      2. 2.3.2 Wildlife Incidents Across Boroughs (Bar chart)
      3. 2.3.3 Combining Tree and Wildlife Data at the Borough Level (Table)
      4. 2.3.4 Wildlife Incidents Relative to Street Tree Availability (Standardized bar chart / rate per 10,000 trees)
      5. 2.3.5 Spatial Distribution of Street Trees (Binned spatial density plot / heatmap)
      6. 2.3.6 Park-Level Patterns in Wildlife Incidents (Faceted horizontal bar chart)
      7. 2.3.7 Species Involved in Wildlife Incidents (Faceted horizontal bar chart)
    4. 2.4 Inferential and Exploratory Analyses
      1. 2.4.1 Differences in Average Street Tree Size Across Boroughs (One-way ANOVA)
      2. 2.4.2 Association Between Borough and Wildlife Condition (Chi-square test of independence)
      3. 2.4.3 Exploratory Relationship Between Street Tree Abundance and Wildlife Incidents (Simple linear regression)
    5. 2.5 Discussion and Implications
      1. 2.5.1 Conclusion
      2. 2.5.2 Audience & Relevance
      3. 2.5.3 Connection to Open Data
  4. 3 Environmental Stressors and Social Complaints in New York City
    1. 3.1 Research Question
    2. 3.2 Data Sources
    3. 3.3 Reproducible Workflow
    4. 3.4 Loading Downloaded Excel Datasets
    5. 3.5 Accessing NYC Open Data via API (311 Noise Complaints)
    6. 3.6 Data Cleaning and Preparation
    7. 3.7 Merging Datasets
    8. 3.8 Descriptive Statistics
    9. 3.9 Visualization 1: Flooding Complaints by Borough
    10. 3.10 Visualization 2: Flooding and Noise Complaints
    11. 3.11 Statistical Analysis
    12. 3.12 Results
    13. 3.13 Discussion
    14. 3.14 Limitations and Future Directions
    15. 3.15 Connection to Open Data
    16. 3.16 Conclusion
  5. 4 The Madison Square Garden Effect in the NBA
    1. 4.0.1 What is Madison Square Garden?
    2. 4.0.2 What makes MSG so special?
    3. 4.0.3 Is the MSG effect real?
    4. 4.0.4 Three overarching research questions:
    5. 4.1 —————————————————————————–
    6. 4.2 NBA Data Project
    7. 4.3 —————————————————————————–
    8. 4.4 Q1: Do the New York Knicks experience a special home-court advantage due to playing at MSG?
    9. 4.5 —————————————————————————–
    10. 4.6 Q2: Do visiting players play differently at MSG than other arenas?
      1. 4.6.1 For context, let’s look at the league-wide home vs. away comparisons.
      2. 4.6.2 Let’s see if visiting players play better or worse at MSG compared to other away games.
    11. 4.7 —————————————————————————–
    12. 4.8 Q3: Who benefits the most from playing at MSG?
      1. 4.8.1 Which players put up the best performances at MSG? (min = 8 games played at MSG)
      2. 4.8.2 Who steps up their game the most playing at MSG vs. other away games?
      3. 4.8.3 Let’s also look at shooting efficiency.
      4. 4.8.4 How do the stars of the NBA today perform at MSG compared to other venues?
    13. 4.9 —————————————————————————–
    14. 4.10 Conclusion: Is the MSG Effect detectable?
      1. 4.10.1 On an individual player performance level: yes.
  6. 5 NYC Restaurants and Museums
    1. 5.1 Packages
    2. 5.2 Data Loading, Cleaning, and Merging
    3. 5.3 Loading Data
    4. 5.4 Cleaning and Merging Data Sets
      1. 5.4.1 Cleaning “restaurant_rating_data” Set
    5. 5.5 Cleaning “restaurant_data” Set
    6. 5.6 Merging Data Sets
    7. 5.7 Inputting Ratings for EACH Restaurant
    8. 5.8 Deleting Restaurants Without Rating from Google
    9. 5.9 Merging “dba” and “name” Columns
    10. 5.10 Deleting Unnecessary Columns in “merged_restaurant_data” Set
    11. 5.11 Cleaning “museum_data” Set
    12. 5.12 Goal 1: Statistical analysis (higher ratings)
    13. 5.13 Creating New Column
    14. 5.14 Typing “Yes” or “No”
    15. 5.15 Binning ratings into Groups
    16. 5.16 Contingency Table
    17. 5.17 Visualizing our Data
    18. 5.18 Chi-Square Test
      1. 5.18.1 Chi-Square Interpretation
    19. 5.19 Goal 2: Statistical analysis (Restaurant Violations)
    20. 5.20 Creating New Column
    21. 5.21 Typing “None” or “Critical”
    22. 5.22 Contingency Table
    23. 5.23 Visualizing our Data
    24. 5.24 Chi-Square Test
      1. 5.24.1 Interpretation
    25. 5.25 Fisher’s Exact Test
      1. 5.25.1 Interpretation
    26. 5.26 Goal 3: Creating an interactive Map
    27. 5.27 Conclusion
    28. 5.28 References
  7. 6 Leading Causes of Death and Indoor Environmental Complaints
    1. 6.1 Loading Libraries and importing data sets
    2. 6.2 Cleaning the data sets
    3. 6.3 Looking at both data sets
    4. 6.4 Visualizations
    5. 6.5 Pairing Complaint types with Causes of Death
    6. 6.6 Process of merging data
    7. 6.7 Merged Data
    8. 6.8 Correlation between causes of death and indoor environmental complaints
    9. 6.9 Linear Regression
    10. 6.10 Relevance and Conclusion
  8. 7 Social Infrastructure & Well-Being
    1. 7.1 Libraries Used
    2. 7.2 Data Loading
    3. 7.3 Cleaning
      1. 7.3.1 Basic Events Cleaning
      2. 7.3.2 BoroReport Cleaning
      3. 7.3.3 Final Events Cleaning
    4. 7.4 Events Count
    5. 7.5 SNAP Benefits Count
    6. 7.6 Merging
    7. 7.7 Linear Regression
    8. 7.8 Conclusion

About

During the Fall 2025 semester, students in the M.S. program in Psychological Research at Brooklyn College completed the inaugural offering of Reproducible Psychological Research. Using the R programming language, students developed weekly R Markdown documents to solve simulated real-world analytical problems using authentic datasets, with an emphasis on transparency, documentation, and reproducibility.

For their final projects, students were tasked with conducting independent, original research using open data related to New York City. Rather than working with pre-cleaned or artificial datasets, students engaged directly with messy, real-world data and were responsible for every step of the analytical workflow—from data acquisition and cleaning to analysis, visualization, and interpretation. A majority of projects utilized data from the NYC Open Data Portal, though students were encouraged to explore any open NYC-based data source that aligned with their research questions.

Each project in this volume represents a complete, reproducible research artifact. Students were required to meet the following criteria:

  1. The data must be openly available
  2. The data must meaningfully relate to New York City
  3. The research question, analysis, and interpretation must be original

Collectively, these projects demonstrate not only technical proficiency in R, but also the ability to ask meaningful questions about the city students live in, evaluate real-world data critically, and communicate findings in a clear, reproducible manner. This volume serves both as a showcase of student growth and as an example of how open data and open-source tools can be used to conduct rigorous, socially relevant research.

Chapters are organized in alphabetical order by the students’ last names.

0.1 How to Use This Book

This volume is designed for students, educators, and practitioners interested in applied data analysis, reproducible research, and open data. Each chapter represents an independent research project and can be read on its own. Readers are encouraged to explore the accompanying code, reproduce analyses, and adapt methods for their own work.

0.2 Companion Textbook

This volume is designed to accompany the open-access textbook Reproducible Research in R, which provides the conceptual foundations, worked examples, and technical instruction used throughout the course.

The textbook is freely available online at:

https://martinezc1-reproducible-research-in-r.share.connect.posit.cloud/

Readers new to reproducible research or the R programming language are encouraged to consult the textbook alongside this volume. While the projects in this book stand on their own, the textbook offers additional context on methodology, statistical reasoning, and reproducible workflows.

0.3 Instructor Note

This volume was developed as part of a graduate-level course emphasizing reproducible research practices, open-source tools, and applied data analysis. Students were encouraged to take intellectual risks, work with imperfect data, and document their analytical decisions transparently. The goal was not perfection, but clarity, rigor, and reproducibility.

Through this project, students aimed to:

  • Formulate original research questions using open data
  • Apply reproducible workflows in R and R Markdown
  • Critically evaluate real-world data limitations
  • Communicate findings clearly to a public audience

0.4 Why NYC Open Data?

New York City’s open data ecosystem provides a unique opportunity to study real-world phenomena at scale. By working with publicly available city data, students were able to connect statistical methods to the communities, systems, and policies that shape daily life in New York City. To support not only these projects, but also broader public access to NYC Open Data, the nycOpenData package was created.

0.5 Contributors

The following students contributed original research projects to this volume as part of the Fall 2025 offering of Reproducible Psychological Research. Links are provided for completed chapters included in this edition.

  • Crystal Adote - Chapter 6
  • Jonah Dratfield - Chapter 7
  • Joyce Escatel Flores - Chapter 5
  • Robert Hutto - Chapter 4
  • Isley Jean-Pierre - chapter forthcoming
  • Shannon Joyce - Chapter 1
  • Emma Valentina Tupone - Chapter 3
  • Xinru Wang - Chapter 2
  • Laura Werner - chapter forthcoming

0.6 Acknowledgments

Special thanks to the Brooklyn College Open Educational Resources (OER) team for their support throughout the development of this project. Their dedication to open education and student-centered learning helped make this volume possible.

0.7 How to Cite This Volume

If you use or reference this volume, please cite it as:

Martinez, C. A. (Ed.). (2025). NYC Open Data: Student Research Projects in Reproducible Psychological Research. Brooklyn College, CUNY. Open-access textbook.
