The 2025 Brooklyn Open Data Collection: Analyst Portfolios: Title Page

  • Project Home: Brooklyn Civic Data Lab


Table of Contents
  1. About
    1. 0.1 How to Use This Book
    2. 0.2 Companion Textbook
    3. 0.3 Instructor Note
    4. 0.4 Why NYC Open Data?
    5. 0.5 Contributors
    6. 0.6 Acknowledgments
    7. 0.7 How to Cite This Volume
  2. 1 Toxic Homes: Exploring Mold Exposure Complaint and Domestic Violence Report Trends in NYC
    1. 1.1 Loading, Prepping, Cleaning, & Aggregating
      1. 1.1.1 Data Preparation & Cleaning
      2. 1.1.2 Aggregating Mold Data & DV Data
    2. 1.2 Exploring the Data
      1. 1.2.1 Domestic Violence Data
      2. 1.2.2 Mold Exposure Data
      3. 1.2.3 Summary Stats
      4. 1.2.4 Borough/Year Distributions
      5. 1.2.5 Heat Map
      6. 1.2.6 Preliminary Correlation
    3. 1.3 Temporal Trends
      1. 1.3.1 Exploring Mold Resolution
      2. 1.3.2 Quick Look at Resolution Time
      3. 1.3.3 Average Resolution Delay per Month
      4. 1.3.4 Lagged Data
    4. 1.4 Statistical Analysis
    5. 1.5 Regression Models
    6. 1.6 Discussion & Insights
  3. 2 Beating Around the Bush: Uncovering the Hidden Link Between Urban Trees and Wildlife Activity
    1. 2.1 Required Packages
    2. 2.2 Data and Methods
      1. 2.2.1 Data Sources
      2. 2.2.2 Data Cleaning and Preparation
    3. 2.3 Descriptive Analysis (Plots)
      1. 2.3.1 Street Tree Distribution Across Boroughs (Bar chart)
      2. 2.3.2 Wildlife Incidents Across Boroughs (Bar chart)
      3. 2.3.3 Combining Tree and Wildlife Data at the Borough Level (Table)
      4. 2.3.4 Wildlife Incidents Relative to Street Tree Availability (Standardized bar chart / rate per 10,000 trees)
      5. 2.3.5 Spatial Distribution of Street Trees (Binned spatial density plot / heatmap)
      6. 2.3.6 Park-Level Patterns in Wildlife Incidents (Faceted horizontal bar chart)
      7. 2.3.7 Species Involved in Wildlife Incidents (Faceted horizontal bar chart)
    4. 2.4 Inferential and Exploratory Analyses
      1. 2.4.1 Differences in Average Street Tree Size Across Boroughs (One-way ANOVA)
      2. 2.4.2 Association Between Borough and Wildlife Condition (Chi-square test of independence)
      3. 2.4.3 Exploratory Relationship Between Street Tree Abundance and Wildlife Incidents (Simple linear regression)
    5. 2.5 Discussion and Implications
      1. 2.5.1 Conclusion
      2. 2.5.2 Audience & Relevance
      3. 2.5.3 Connection to Open Data
  4. 3 Environmental Stressors and Social Complaints in New York City
    1. 3.1 Research Question
    2. 3.2 Data Sources
    3. 3.3 Reproducible Workflow
    4. 3.4 Loading Downloaded Excel Datasets
    5. 3.5 Accessing NYC Open Data via API (311 Noise Complaints)
    6. 3.6 Data Cleaning and Preparation
    7. 3.7 Merging Datasets
    8. 3.8 Descriptive Statistics
    9. 3.9 Visualization 1: Flooding Complaints by Borough
    10. 3.10 Visualization 2: Flooding and Noise Complaints
    11. 3.11 Statistical Analysis
    12. 3.12 Results
    13. 3.13 Discussion
    14. 3.14 Limitations and Future Directions
    15. 3.15 Connection to Open Data
    16. 3.16 Conclusion
  5. 4 The Madison Square Garden Effect in the NBA
    1. 4.0.1 What is Madison Square Garden?
    2. 4.0.2 What makes MSG so special?
    3. 4.0.3 Is the MSG effect real?
    4. 4.0.4 Three overarching research questions:
    5. 4.1 —————————————————————————–
    6. 4.2 NBA Data Project
    7. 4.3 —————————————————————————–
    8. 4.4 Q1: Do the New York Knicks experience a special home-court advantage due to playing at MSG?
    9. 4.5 —————————————————————————–
    10. 4.6 Q2: Do visiting players play differently at MSG than other arenas?
      1. 4.6.1 For context, let’s look at the league-wide home vs. away comparisons.
      2. 4.6.2 Let’s see if visiting players play better or worse at MSG compared to other away games.
    11. 4.7 —————————————————————————–
    12. 4.8 Q3: Who benefits the most from playing at MSG?
      1. 4.8.1 Which players put up the best performances at MSG? (min = 8 games played at MSG)
      2. 4.8.2 Who steps up their game the most playing at MSG vs. other away games?
      3. 4.8.3 Let’s also look at shooting efficiency.
      4. 4.8.4 How do the stars of the NBA today perform at MSG compared to other venues?
    13. 4.9 —————————————————————————–
    14. 4.10 Conclusion: Is the MSG Effect detectable?
      1. 4.10.1 On an individual player performance level: yes.
  6. 5 NYC Restaurants and Museums
    1. 5.1 Packages
    2. 5.2 Data Loading, Cleaning, and Merging
    3. 5.3 Loading Data
    4. 5.4 Cleaning and Merging Data Sets
      1. 5.4.1 Cleaning “restaurant_rating_data” Set
    5. 5.5 Cleaning “restaurant_data” Set
    6. 5.6 Merging Data Sets
    7. 5.7 Inputting Ratings for EACH Restaurant
    8. 5.8 Deleting Restaurants Without Rating from Google
    9. 5.9 Merging “dba” and “name” Columns
    10. 5.10 Deleting Unnecessary Columns in “merged_restaurant_data” Set
    11. 5.11 Cleaning “museum_data” Set
    12. 5.12 Goal 1: Statistical analysis (higher ratings)
    13. 5.13 Creating New Column
    14. 5.14 Typing “Yes” or “No”
    15. 5.15 Binning ratings into Groups
    16. 5.16 Contingency Table
    17. 5.17 Visualizing our Data
    18. 5.18 Chi-Square Test
      1. 5.18.1 Chi-Square Interpretation
    19. 5.19 Goal 2: Statistical analysis (Restaurant Violations)
    20. 5.20 Creating New Column
    21. 5.21 Typing “None” or “Critical”
    22. 5.22 Contingency Table
    23. 5.23 Visualizing our Data
    24. 5.24 Chi-Square Test
      1. 5.24.1 Interpretation
    25. 5.25 Fisher’s Exact Test
      1. 5.25.1 Interpretation
    26. 5.26 Goal 3: Creating an interactive Map
    27. 5.27 Conclusion
    28. 5.28 References
  7. 6 Leading Causes of Death and Indoor Environmental Complaints
    1. 6.1 Loading Libraries and importing data sets
    2. 6.2 Cleaning the data sets
    3. 6.3 Looking at both data sets
    4. 6.4 Visualizations
    5. 6.5 Pairing Complaint types with Causes of Death
    6. 6.6 Process of merging data
    7. 6.7 Merged Data
    8. 6.8 Correlation between causes of death and indoor environmental complaints
    9. 6.9 Linear Regression
    10. 6.10 Relevance and Conclusion
  8. 7 Social Infrastructure & Well-Being
    1. 7.1 Libraries Used
    2. 7.2 Data Loading
    3. 7.3 Cleaning
      1. 7.3.1 Basic Events Cleaning
      2. 7.3.2 BoroReport Cleaning
      3. 7.3.3 Final Events Cleaning
    4. 7.4 Events Count
    5. 7.5 SNAP Benefits Count
    6. 7.6 Merging
    7. 7.7 Linear Regression
    8. 7.8 Conclusion

Reproducible Research Using R: NYC Open Data Projects

Christian Martinez

2026-01-06

