The 2025 Brooklyn Open Data Collection: Analyst Portfolios
Title Page
Brooklyn Civic Data Lab
Table of Contents
About
0.1 How to Use This Book
0.2 Companion Textbook
0.3 Instructor Note
0.4 Why NYC Open Data?
0.5 Contributors
0.6 Acknowledgments
0.7 How to Cite This Volume
1 Toxic Homes: Exploring Mold Exposure Complaint and Domestic Violence Report Trends in NYC
1.1 Loading, Prepping, Cleaning, & Aggregating
1.1.1 Data Preparation & Cleaning
1.1.2 Aggregating Mold Data & DV Data
1.2 Exploring the Data
1.2.1 Domestic Violence Data
1.2.2 Mold Exposure Data
1.2.3 Summary Stats
1.2.4 Borough/Year Distributions
1.2.5 Heat Map
1.2.6 Preliminary Correlation
1.3 Temporal Trends
1.3.1 Exploring Mold Resolution
1.3.2 Quick Look at Resolution Time
1.3.3 Average Resolution Delay per Month
1.3.4 Lagged Data
1.4 Statistical Analysis
1.5 Regression Models
1.6 Discussion & Insights
2 Beating Around the Bush: Uncovering the Hidden Link Between Urban Trees and Wildlife Activity
2.1 Required Packages
2.2 Data and Methods
2.2.1 Data Sources
2.2.2 Data Cleaning and Preparation
2.3 Descriptive Analysis (Plots)
2.3.1 Street Tree Distribution Across Boroughs (Bar chart)
2.3.2 Wildlife Incidents Across Boroughs (Bar chart)
2.3.3 Combining Tree and Wildlife Data at the Borough Level (Table)
2.3.4 Wildlife Incidents Relative to Street Tree Availability (Standardized bar chart / rate per 10,000 trees)
2.3.5 Spatial Distribution of Street Trees (Binned spatial density plot / heatmap)
2.3.6 Park-Level Patterns in Wildlife Incidents (Faceted horizontal bar chart)
2.3.7 Species Involved in Wildlife Incidents (Faceted horizontal bar chart)
2.4 Inferential and Exploratory Analyses
2.4.1 Differences in Average Street Tree Size Across Boroughs (One-way ANOVA)
2.4.2 Association Between Borough and Wildlife Condition (Chi-square test of independence)
2.4.3 Exploratory Relationship Between Street Tree Abundance and Wildlife Incidents (Simple linear regression)
2.5 Discussion and Implications
2.5.1 Conclusion
2.5.2 Audience & Relevance
2.5.3 Connection to Open Data
3 Environmental Stressors and Social Complaints in New York City
3.1 Research Question
3.2 Data Sources
3.3 Reproducible Workflow
3.4 Loading Downloaded Excel Datasets
3.5 Accessing NYC Open Data via API (311 Noise Complaints)
3.6 Data Cleaning and Preparation
3.7 Merging Datasets
3.8 Descriptive Statistics
3.9 Visualization 1: Flooding Complaints by Borough
3.10 Visualization 2: Flooding and Noise Complaints
3.11 Statistical Analysis
3.12 Results
3.13 Discussion
3.14 Limitations and Future Directions
3.15 Connection to Open Data
3.16 Conclusion
4 The Madison Square Garden Effect in the NBA
4.0.1 What is Madison Square Garden?
4.0.2 What makes MSG so special?
4.0.3 Is the MSG effect real?
4.0.4 Three overarching research questions:
4.1 —————————————————————————–
4.2 NBA Data Project
4.3 —————————————————————————–
4.4 Q1: Do the New York Knicks experience a special home-court advantage due to playing at MSG?
4.5 —————————————————————————–
4.6 Q2: Do visiting players play differently at MSG than other arenas?
4.6.1 For context, let’s look at the league-wide home vs. away comparisons.
4.6.2 Let’s see if visiting players play better or worse at MSG compared to other away games.
4.7 —————————————————————————–
4.8 Q3: Who benefits the most from playing at MSG?
4.8.1 Which players put up the best performances at MSG? (min = 8 games played at MSG)
4.8.2 Who steps up their game the most playing at MSG vs. other away games?
4.8.3 Let’s also look at shooting efficiency.
4.8.4 How do the stars of the NBA today perform at MSG compared to other venues?
4.9 —————————————————————————–
4.10 Conclusion: Is the MSG Effect detectable?
4.10.1 On an individual player performance level: yes.
5 NYC Restaurants and Museums
5.1 Packages
5.2 Data Loading, Cleaning, and Merging
5.3 Loading Data
5.4 Cleaning and Merging Data Sets
5.4.1 Cleaning “restaurant_rating_data” Set
5.5 Cleaning “restaurant_data” Set
5.6 Merging Data Sets
5.7 Inputting Ratings for EACH Restaurant
5.8 Deleting Restaurants Without Rating from Google
5.9 Merging “dba” and “name” Columns
5.10 Deleting Unnecessary Columns in “merged_restaurant_data” Set
5.11 Cleaning “museum_data” Set
5.12 Goal 1: Statistical analysis (higher ratings)
5.13 Creating New Column
5.14 Typing “Yes” or “No”
5.15 Binning ratings into Groups
5.16 Contingency Table
5.17 Visualizing our Data
5.18 Chi-Square Test
5.18.1 Chi-Square Interpretation
5.19 Goal 2: Statistical analysis (Restaurant Violations)
5.20 Creating New Column
5.21 Typing “None” or “Critical”
5.22 Contingency Table
5.23 Visualizing our Data
5.24 Chi-Square Test
5.24.1 Interpretation
5.25 Fisher’s Exact Test
5.25.1 Interpretation
5.26 Goal 3: Creating an interactive Map
5.27 Conclusion
5.28 References
6 Leading Causes of Death and Indoor Environmental Complaints
6.1 Loading Libraries and importing data sets
6.2 Cleaning the data sets
6.3 Looking at both data sets
6.4 Visualizations
6.5 Pairing Complaint types with Causes of Death
6.6 Process of merging data
6.7 Merged Data
6.8 Correlation between causes of death and indoor environmental complaints
6.9 Linear Regression
6.10 Relevance and Conclusion
7 Social Infrastructure & Well-Being
7.1 Libraries Used
7.2 Data Loading
7.3 Cleaning
7.3.1 Basic Events Cleaning
7.3.2 BoroReport Cleaning
7.3.3 Final Events Cleaning
7.4 Events Count
7.5 SNAP Benefits Count
7.6 Merging
7.7 Linear Regression
7.8 Conclusion
About This Text
Reproducible Research Using R: NYC Open Data Projects
Christian Martinez
2026-01-06