The 2025 Brooklyn Open Data Collection: Analyst Portfolios


1 Toxic Homes: Exploring Mold Exposure Complaint and Domestic Violence Report Trends in NYC

Author: Shannon Joyce

Housing conditions are an important factor in public health and well-being. Poor residential environments, such as mold exposure, can contribute to physical health problems and increased stress. Domestic violence is also influenced by environmental and social stressors, making housing conditions a relevant area of study.

This project examines the relationship between residential mold complaints and domestic violence (DV) reports in New York City from 2010 to 2024. I use two datasets from NYC Open Data: 311 Service Requests from 2010 to Present, from which I extract residential mold complaints, and NYPD Complaint Data Historic, from which I extract domestic violence reports. With these, I explore whether the two types of reports follow similar patterns over time. The goal is not to determine causation, but to understand whether mold complaints and DV reports tend to rise and fall together.

The analysis focuses on monthly aggregated data and includes exploratory summaries, correlation analyses, and regression models. I also explore delayed (or lagged) relationships and mold complaint resolution time to better understand how timing may play a role.

1.1 Loading, Prepping, Cleaning, & Aggregating

1.1.1 Data Preparation & Cleaning

library(tidyverse)
library(readxl)
library(ggplot2)
library(mosaic)
library(AICcmodavg)
library(knitr)
mold_data <- read_excel("311_Service_Requests_from_2010_to_Present_20251215.xlsx")
dv_data <- read_excel("NYPD_Complaint_Data_Historic_20251218.xlsx")
mold_data_clean <- mold_data %>%
  filter(`Location Type` %in% c(
    "RESIDENTIAL BUILDING", "Residential Building", "Loft Residence",
    "Mixed Use Building", "Apartment", "3+ Family Apartment Building",
    "1-2 Family Dwelling", "1-2 Family Mixed Use Building",
    "3+ Family Mixed Use Building", "Single Room Occupancy (SRO)"
  ))

dv_data_clean<- dv_data %>% rename(
  "complaint_number" = `CMPLNT_NUM`,
  "inc_occur_date" = `CMPLNT_FR_DT`,
  "inc_occur_time" = `CMPLNT_FR_TM`,
  "inc_end_date" = `CMPLNT_TO_DT`,
  "inc_end_time" = `CMPLNT_TO_TM`,
  "precinct_occur" = `ADDR_PCT_CD`,
  "report_date" = `RPT_DT`,
  "key_code" = `KY_CD`,
  "offense_type" = `OFNS_DESC`,
  "class_code" = `PD_CD`,
  "class_code_desc" = `PD_DESC`,
  "attempt_completion" = `CRM_ATPT_CPTD_CD`,
  "offense_level" = `LAW_CAT_CD`,
  "borough" = `BORO_NM`,
  "occur_location" = `LOC_OF_OCCUR_DESC`,
  "premise_desc" = `PREM_TYP_DESC`,
  "juris_code_desc" = `JURIS_DESC`,
  "jurisdiction" = `JURISDICTION_CODE`,
  "park_occur" = `PARKS_NM`,
  "development" = `HADEVELOPT`,
  "development_code" = `HOUSING_PSA`,
  "x_coord" = `X_COORD_CD`,
  "y_coord" = `Y_COORD_CD`,
  "suspect_age" = `SUSP_AGE_GROUP`,
  "suspect_race" = `SUSP_RACE`,
  "suspect_sex" = `SUSP_SEX`,
  "transit_district" = `TRANSIT_DISTRICT`,
  "patrol_borough" = `PATROL_BORO`,
  "station_name" = `STATION_NAME`,
  "victim_age" = `VIC_AGE_GROUP`,
  "victim_race" = `VIC_RACE`,
  "victim_sex" = `VIC_SEX`
  )

mold_data_clean <- mold_data_clean %>%
  mutate(Created_Date_Original = `Created Date`) %>% 
  separate(
    col = `Created Date`,
  into = c("Year","Month","Day"),
  sep = "-", 
  remove = FALSE
)

dv_data_clean <- dv_data_clean %>% separate(
  col = inc_occur_date,
  into = c("Year","Month","Day"),
  sep = "-"
)

mold_data_clean <- mold_data_clean %>%
  filter(Descriptor != "Unsafe Mold Cleanup")

mold_data_clean<- mold_data_clean %>% mutate(
  `Complaint Type` = recode(
    `Complaint Type`, 
    "UNSANITARY CONDITION" = "Mold",
    "Unsanitary Condition" = "Mold",
    "MOLD" = "Mold",
    "GENERAL" = "Mold",
    "GENERAL CONSTRUCTION" = "Mold"
  ))

mold_data_clean<- mold_data_clean %>% mutate(
  `Month` = recode(
    `Month`, 
    "01" = "01 - January",
    "02" = "02 - February",
    "03" = "03 - March",
    "04" = "04 - April",
    "05" = "05 - May",
    "06" = "06 - June",
    "07" = "07 - July",
    "08" = "08 - August",
    "09" = "09 - September",
    "10" = "10 - October",
    "11" = "11 - November",
    "12" = "12 - December"
    ))

dv_data_clean<- dv_data_clean %>% mutate(
  `Month` = recode(
    `Month`, 
    "01" = "01 - January",
    "02" = "02 - February",
    "03" = "03 - March",
    "04" = "04 - April",
    "05" = "05 - May",
    "06" = "06 - June",
    "07" = "07 - July",
    "08" = "08 - August",
    "09" = "09 - September",
    "10" = "10 - October",
    "11" = "11 - November",
    "12" = "12 - December"
  ))

dv_data_clean$complaint <- "DV"

dv_data_clean <- dv_data_clean %>%
  mutate(Year = as.numeric(Year)) %>%
  filter(Year >= 2010)

mold_data_clean<- mold_data_clean %>% mutate(
  Borough = case_when(
    City == "NEW YORK" ~ "MANHATTAN",
    City == "BROOKLYN" ~ "BROOKLYN",
    City == "ARVERNE" ~ "QUEENS",
    City == "BRONX" ~ "BRONX",
    City == "JAMAICA" ~ "QUEENS",
    City == "SPRINGFIELD GARDENS" ~ "QUEENS",
    City == "FLUSHING" ~ "QUEENS",
    City == "STATEN ISLAND" ~ "STATEN ISLAND",
    City == "RICHMOND HILL" ~ "QUEENS",
    City == "ASTORIA" ~ "QUEENS",
    City == "HOLLIS" ~ "QUEENS",
    City == "RIDGEWOOD" ~ "QUEENS",
    City == "FOREST HILLS" ~ "QUEENS",
    City == "ELMHURST" ~ "QUEENS",
    City == "MASPETH" ~ "QUEENS",
    City == "SOUTH RICHMOND HILL" ~ "QUEENS",
    City == "JACKSON HEIGHTS" ~ "QUEENS",
    City == "BAYSIDE" ~ "QUEENS",
    City == "FAR ROCKAWAY" ~ "QUEENS",
    City == "SAINT ALBANS" ~ "QUEENS",
    City == "CORONA" ~ "QUEENS",
    City == "WOODSIDE" ~ "QUEENS",
    City == "QUEENS VILLAGE" ~ "QUEENS",
    City == "REGO PARK" ~ "QUEENS",
    City == "ROSEDALE" ~ "QUEENS",
    City == "SUNNYSIDE" ~ "QUEENS",
    City == "OZONE PARK" ~ "QUEENS",
    City == "EAST ELMHURST" ~ "QUEENS",
    City == "MIDDLE VILLAGE" ~ "QUEENS",
    City == "WOODHAVEN" ~ "QUEENS",
    City == "SOUTH OZONE PARK" ~ "QUEENS",
    City == "ROCKAWAY PARK" ~ "QUEENS",
    City == "KEW GARDENS" ~ "QUEENS",
    City == "FRESH MEADOWS" ~ "QUEENS",
    City == "COLLEGE POINT" ~ "QUEENS",
    City == "LONG ISLAND CITY" ~ "QUEENS",
    City == "OAKLAND GARDENS" ~ "QUEENS",
    City == "WHITESTONE" ~ "QUEENS",
    City == "HOWARD BEACH" ~ "QUEENS",
    City == "CAMBRIA HEIGHTS" ~ "QUEENS",
    City == "BELLEROSE" ~ "QUEENS",
    City == "LITTLE NECK" ~ "QUEENS",
    City == "BREEZY POINT" ~ "QUEENS",
    City == "GLEN OAKS" ~ "QUEENS",
    City == "FLORAL PARK" ~ "QUEENS",
    City ==  "PELHAM" ~ "BRONX",
    City == "NEW HYDE PARK" ~ "QUEENS",
    City == "QUEENS" ~ "QUEENS",
    City == "MANHATTAN" ~ "MANHATTAN",
    City == "Far Rockaway" ~ "QUEENS",  
    City == "Astoria" ~ "QUEENS",
    City == "Elmhurst" ~ "QUEENS",
    City == "Corona" ~ "QUEENS",       
    City == "Ozone Park" ~ "QUEENS",
    City == "Forest Hills" ~ "QUEENS",
    City == "Jamaica" ~ "QUEENS",  
    City == "Arverne" ~ "QUEENS",
    City == "Bayside" ~ "QUEENS",
    City == "East Elmhurst" ~ "QUEENS", 
    City == "Flushing" ~ "QUEENS",
    City == "Middle Village" ~ "QUEENS",
    City == "Ridgewood" ~ "QUEENS",
    City == "Woodside" ~ "QUEENS",
    City == "Oakland Gardens" ~ "QUEENS",
    City == "Rego Park" ~ "QUEENS",
    City == "Hollis" ~ "QUEENS",
    City == "Saint Albans" ~ "QUEENS",
    City == "Springfield Gardens" ~ "QUEENS",
    City == "Kew Gardens" ~ "QUEENS",
    City == "Fresh Meadows" ~ "QUEENS",
    City == "Howard Beach" ~ "QUEENS",
    City == "South Richmond Hill" ~ "QUEENS",
    City == "Whitestone" ~ "QUEENS",
    City == "South Ozone Park" ~ "QUEENS",
    City == "College Point" ~ "QUEENS",
    City == "Jackson Heights" ~ "QUEENS",
    City == "Maspeth" ~ "QUEENS",
    City == "Long Island City" ~ "QUEENS",
    City == "Rockaway Park" ~ "QUEENS",
    City == "Sunnyside" ~ "QUEENS",
    City == "Woodhaven" ~ "QUEENS",
    City == "Floral Park" ~ "QUEENS",
    City == "Glen Oaks" ~ "QUEENS",
    City == "Queens Village" ~ "QUEENS",
    City == "Bellerose" ~ "QUEENS",
    City == "Little Neck" ~ "QUEENS",
    City == "Richmond Hill" ~ "QUEENS",
    City == "Rosedale" ~ "QUEENS",
    City == "Cambria Heights" ~ "QUEENS",
    City == "New Hyde Park" ~ "QUEENS",
    City == "Breezy Point" ~ "QUEENS",
  )
)

mold_data_clean <- mold_data_clean %>% filter(!is.na(Borough))
mold_data_clean<- mold_data_clean %>% filter(Year != 2025)

mold_data_clean <- mold_data_clean %>%
  mutate(
    created_date = as.Date(`Created Date`),
    closed_date  = as.Date(`Closed Date`),
    resolution_days = as.numeric(closed_date - created_date)
  )

kable(head(mold_data_clean, 3))
| Unique Key | Created Date | Year | Month | Day | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | Street Name | Cross Street 1 | Cross Street 2 | Intersection Street 1 | Intersection Street 2 | Address Type | City | Landmark | Facility Type | Status | Due Date | Resolution Description | Resolution Action Updated Date | Community Board | BBL | Borough | X Coordinate (State Plane) | Y Coordinate (State Plane) | Open Data Channel Type | Park Facility Name | Park Borough | Vehicle Type | Taxi Company Borough | Taxi Pick Up Location | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Latitude | Longitude | Location | Created_Date_Original | created_date | closed_date | resolution_days |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63573695 | 2024-12-31T23:06:14.000 | 2024 | 12 - December | 31T23:06:14.000 | 2025-01-19T11:58:36.000 | HPD | Department of Housing Preservation and Development | Mold | MOLD | RESIDENTIAL BUILDING | 10451 | 283 EAST 149 STREET | EAST 149 STREET | NA | NA | NA | NA | ADDRESS | BRONX | NA | NA | Closed | NA | HPD conducted an inspection of this complaint. The conditions observed by the inspector did not violate the housing laws enforced by HPD. The complaint has been closed. | 2025-01-19T00:00:00.000 | 01 BRONX | 2023310072 | BRONX | 1005910 | 236991 | PHONE | Unspecified | BRONX | NA | NA | NA | NA | NA | NA | NA | 40.81713 | -73.92175 | (40.81713468822815, -73.9217467475281) | 2024-12-31T23:06:14.000 | 2024-12-31 | 2025-01-19 | 19 |
| 63583460 | 2024-12-31T21:19:39.000 | 2024 | 12 - December | 31T21:19:39.000 | 2025-01-03T16:24:36.000 | HPD | Department of Housing Preservation and Development | Mold | MOLD | RESIDENTIAL BUILDING | 10468 | 2719 MORRIS AVENUE | MORRIS AVENUE | NA | NA | NA | NA | ADDRESS | BRONX | NA | NA | Closed | NA | HPD inspected this condition so the complaint has been closed. Violations were issued. The law provides the property owner time to correct the condition(s). Violation descriptions and the dates for the property owner to correct any violations are available at HPDONLINE. If the owner has not corrected the condition by the date provided, you may wish to bring a case in housing court seeking the correction of these conditions.To find out more about how to start a housing court case, visit HPD’s w | 2025-01-03T00:00:00.000 | 07 BRONX | 2033170043 | BRONX | 1013153 | 255563 | PHONE | Unspecified | BRONX | NA | NA | NA | NA | NA | NA | NA | 40.86809 | -73.89550 | (40.86808863174485, -73.89549921281306) | 2024-12-31T21:19:39.000 | 2024-12-31 | 2025-01-03 | 3 |
| 63583408 | 2024-12-31T20:55:28.000 | 2024 | 12 - December | 31T20:55:28.000 | 2025-01-10T16:55:45.000 | HPD | Department of Housing Preservation and Development | Mold | MOLD | RESIDENTIAL BUILDING | 11435 | 85-15 139 STREET | 139 STREET | NA | NA | NA | NA | ADDRESS | JAMAICA | NA | NA | Closed | NA | HPD conducted an inspection of this complaint. The conditions observed by the inspector did not violate the housing laws enforced by HPD. The complaint has been closed. | 2025-01-10T00:00:00.000 | 08 QUEENS | 4097100002 | QUEENS | 1034989 | 197495 | PHONE | Unspecified | QUEENS | NA | NA | NA | NA | NA | NA | NA | 40.70861 | -73.81699 | (40.7086094282564, -73.8169884642595) | 2024-12-31T20:55:28.000 | 2024-12-31 | 2025-01-10 | 10 |
kable(head(dv_data_clean, 5))
| complaint_number | Year | Month | Day | inc_occur_time | inc_end_date | inc_end_time | precinct_occur | report_date | key_code | offense_type | class_code | class_code_desc | attempt_completion | offense_level | borough | occur_location | premise_desc | juris_code_desc | jurisdiction | park_occur | development | development_code | x_coord | y_coord | suspect_age | suspect_race | suspect_sex | transit_district | Latitude | Longitude | Lat_Lon | patrol_borough | station_name | victim_age | victim_race | victim_sex | complaint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 298690828 | 2024 | 12 - December | 31T00:00:00.000 | 13:00:00 | 2024-12-31T00:00:00.000 | 13:10:00 | 113 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE-HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1046104 | 187464 | <18 | BLACK | F | NA | 40.68101 | -73.77699 | (40.681014, -73.776991) | PATROL BORO QUEENS SOUTH | (null) | <18 | BLACK | M | DV |
| 298698016 | 2024 | 12 - December | 31T00:00:00.000 | 08:00:00 | 2024-12-31T00:00:00.000 | 09:00:00 | 116 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE-HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1048028 | 178970 | 25-44 | BLACK | M | NA | 40.65769 | -73.77013 | (40.657687, -73.770132) | PATROL BORO QUEENS SOUTH | (null) | 18-24 | WHITE HISPANIC | F | DV |
| 298704508 | 2024 | 12 - December | 31T00:00:00.000 | 16:50:00 | 2024-12-31T00:00:00.000 | 16:56:00 | 107 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE - APT. HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1050645 | 203097 | UNKNOWN | BLACK | F | NA | 40.72389 | -73.76046 | (40.723891, -73.760464) | PATROL BORO QUEENS SOUTH | (null) | 65+ | WHITE | M | DV |
| 298678676 | 2024 | 12 - December | 31T00:00:00.000 | 07:00:00 | 2024-12-31T00:00:00.000 | 07:30:00 | 113 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE-HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1051478 | 189936 | 25-44 | BLACK | M | NA | 40.68776 | -73.75759 | (40.687762, -73.757589) | PATROL BORO QUEENS SOUTH | (null) | 45-64 | BLACK | M | DV |
| 298672417 | 2024 | 12 - December | 31T00:00:00.000 | 02:50:00 | 2024-12-31T00:00:00.000 | 02:55:00 | 101 | 2024-12-31T00:00:00.000 | 344 | ASSAULT 3 & RELATED OFFENSES | 101 | ASSAULT 3 | COMPLETED | MISDEMEANOR | QUEENS | INSIDE | RESIDENCE-HOUSE | N.Y. POLICE DEPT | 0 | (null) | (null) | NA | 1054075 | 157436 | 45-64 | BLACK | M | NA | 40.59854 | -73.74856 | (40.598536, -73.74856) | PATROL BORO QUEENS SOUTH | (null) | 45-64 | BLACK | F | DV |

This chunk handles the heavy cleaning of two large datasets: I standardize column names and values, fill in Borough from the City column, and drop rows where Borough is still NA. I also split the dates so that Year, Month, and Day can be used individually in the project.
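The City-to-Borough step above lists each neighborhood twice, once per capitalization. As a rough sketch of an alternative (not the method used in this chapter), the same mapping could be kept in a named lookup vector and matched after upper-casing. The names borough_lookup and city_to_borough are hypothetical, and only a handful of neighborhoods are included here for illustration.

```r
# Hypothetical lookup: upper-case City value -> borough.
# Only a few entries are shown; the full list is in the case_when() call above.
borough_lookup <- c(
  "NEW YORK"      = "MANHATTAN",
  "MANHATTAN"     = "MANHATTAN",
  "BROOKLYN"      = "BROOKLYN",
  "BRONX"         = "BRONX",
  "STATEN ISLAND" = "STATEN ISLAND",
  "QUEENS"        = "QUEENS",
  "JAMAICA"       = "QUEENS",
  "ASTORIA"       = "QUEENS",
  "FAR ROCKAWAY"  = "QUEENS"
)

city_to_borough <- function(city) {
  # Upper-casing and trimming lets "Astoria" and "ASTORIA" share one entry.
  key <- toupper(trimws(city))
  # Cities that are not in the lookup (or are missing) come back as NA.
  unname(borough_lookup[key])
}

example_cities <- c("Astoria", "JAMAICA", "Far Rockaway", "NEW YORK", "Hoboken", NA)
city_to_borough(example_cities)
```

With a lookup like this, a misspelled entry only needs to be fixed in one place, and unmatched cities still end up as NA so the later filter(!is.na(Borough)) behaves the same way.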

1.1.2 Aggregating Mold Data & DV Data

aggregated_dv_data <- dv_data_clean %>%
  group_by(Year, Month, borough) %>%
  summarise(
    `complaint` = n(),
    .groups = "drop"
  )
aggregated_dv_data <- aggregated_dv_data %>%
  filter(borough != "(null)")

aggregated_dv_data <- aggregated_dv_data %>%
  mutate(Year = as.character(Year))

aggregated_dv_data<- aggregated_dv_data %>% rename(
  "Borough" = "borough")

aggregated_dv_data<- aggregated_dv_data %>% rename(
  "DV Reports" = "complaint"
)

aggregated_mold_data <- mold_data_clean %>%
  group_by(Year, Month, Borough) %>%
  summarise(
  `Complaint Type` = n(),
  .groups = "drop"
)

aggregated_mold_data<- aggregated_mold_data %>% rename(
  "Mold Complaints" = "Complaint Type"
)

aggregated_dv_mold_data <- left_join(
  aggregated_dv_data,
  aggregated_mold_data,
  by = c("Borough", "Year", "Month")
)

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  mutate(
    year_month = paste(Year, Month)
  )

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  group_by(Borough) %>%
  mutate(time_index = row_number()) %>%
  ungroup()

kable(head(aggregated_dv_mold_data, 15))
| Year | Month | Borough | DV Reports | Mold Complaints | year_month | time_index |
|---|---|---|---|---|---|---|
| 2010 | 01 - January | BRONX | 910 | 954 | 2010 01 - January | 1 |
| 2010 | 01 - January | BROOKLYN | 1306 | 779 | 2010 01 - January | 1 |
| 2010 | 01 - January | MANHATTAN | 541 | 410 | 2010 01 - January | 1 |
| 2010 | 01 - January | QUEENS | 791 | 315 | 2010 01 - January | 1 |
| 2010 | 01 - January | STATEN ISLAND | 154 | 58 | 2010 01 - January | 1 |
| 2010 | 02 - February | BRONX | 818 | 738 | 2010 02 - February | 2 |
| 2010 | 02 - February | BROOKLYN | 938 | 651 | 2010 02 - February | 2 |
| 2010 | 02 - February | MANHATTAN | 424 | 338 | 2010 02 - February | 2 |
| 2010 | 02 - February | QUEENS | 603 | 273 | 2010 02 - February | 2 |
| 2010 | 02 - February | STATEN ISLAND | 151 | 29 | 2010 02 - February | 2 |
| 2010 | 03 - March | BRONX | 969 | 941 | 2010 03 - March | 3 |
| 2010 | 03 - March | BROOKLYN | 1149 | 870 | 2010 03 - March | 3 |
| 2010 | 03 - March | MANHATTAN | 500 | 415 | 2010 03 - March | 3 |
| 2010 | 03 - March | QUEENS | 758 | 395 | 2010 03 - March | 3 |
| 2010 | 03 - March | STATEN ISLAND | 147 | 55 | 2010 03 - March | 3 |

In this chunk, I aggregated the domestic violence reports and mold complaints datasets into one, grouped by month and borough, in order to easily analyze trends.
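The time_index above comes from row_number() within each borough, so it depends on the rows being sorted by month and on no borough missing a month. One possible alternative, sketched here under the assumption that Month labels always start with a two-digit month number (as in "01 - January"), is to build a real date for each row and count months from the first one. The month_start column and the toy data frame are illustrative names, not part of the original workflow.

```r
# Toy rows shaped like aggregated_dv_mold_data (labels copied from the table above).
toy <- data.frame(
  Year    = c("2010", "2010", "2010"),
  Month   = c("01 - January", "02 - February", "03 - March"),
  Borough = c("BRONX", "BRONX", "BRONX"),
  stringsAsFactors = FALSE
)

# The first two characters of the Month label are the month number.
toy$month_start <- as.Date(paste(toy$Year, substr(toy$Month, 1, 2), "01", sep = "-"))

# Whole months elapsed since the earliest month in the data, starting at 1.
first_month <- min(toy$month_start)
year_gap  <- as.integer(format(toy$month_start, "%Y")) - as.integer(format(first_month, "%Y"))
month_gap <- as.integer(format(toy$month_start, "%m")) - as.integer(format(first_month, "%m"))
toy$time_index <- 12L * year_gap + month_gap + 1L

toy
```

A date column like month_start can also go straight onto a ggplot x-axis, which would label the axis with years instead of row positions.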

1.2 Exploring the Data

1.2.1 Domestic Violence Data

1.2.1.1 Summary Stats

dv_summary <- aggregated_dv_data %>%
  summarise(
    total_reports = sum(`DV Reports`),
    start_year = min(as.numeric(Year)),
    end_year = max(as.numeric(Year)),
    boroughs = n_distinct(Borough),
    avg_monthly = mean(`DV Reports`)
  )

kable(dv_summary)
| total_reports | start_year | end_year | boroughs | avg_monthly |
|---|---|---|---|---|
| 669136 | 2010 | 2024 | 5 | 743.4844 |

From the start of 2010 through the end of 2024, a total of 669,136 domestic violence incidents were reported across the five boroughs of New York City. Because each row of the aggregated data is one borough in one month, avg_monthly works out to roughly 743 domestic violence reports per borough per month.
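To make the averaging explicit, here is a small arithmetic check. It assumes every borough has a row for all 180 months from 2010 through 2024 (5 × 180 = 900 rows), which is consistent with the df = 898 reported by the correlation test later in the chapter.

```r
total_reports <- 669136    # total_reports from dv_summary above
n_boroughs    <- 5
n_months      <- 15 * 12   # January 2010 through December 2024

# Mean over borough-month rows; this is what mean(`DV Reports`) computes.
per_borough_month <- total_reports / (n_boroughs * n_months)

# Citywide reports per calendar month, for comparison.
citywide_month <- total_reports / n_months

round(c(per_borough_month = per_borough_month, citywide_month = citywide_month), 1)
```

The first figure matches avg_monthly in the table; the second is five times larger because it adds the boroughs together before averaging.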

1.2.1.2 Borough/Year Distribution

dv_by_year_borough <- aggregated_dv_data %>%
  group_by(Year, Borough) %>%
  summarise(total_reports = sum(`DV Reports`)) %>%
  pivot_wider(names_from = Year,
    values_from = total_reports)

kable(dv_by_year_borough)
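| Borough | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BRONX | 11274 | 11441 | 12207 | 12630 | 12924 | 12594 | 12916 | 12569 | 13662 | 12943 | 12013 | 12499 | 14193 | 14149 | 14905 |
| BROOKLYN | 14094 | 14344 | 15232 | 15032 | 15022 | 14101 | 13771 | 13052 | 13039 | 12250 | 11256 | 12362 | 13286 | 13180 | 13652 |
| MANHATTAN | 6081 | 6128 | 6471 | 6409 | 6627 | 6555 | 6791 | 6586 | 6550 | 6864 | 6117 | 7160 | 7194 | 6880 | 7215 |
| QUEENS | 8726 | 9182 | 9128 | 9425 | 9467 | 8906 | 8697 | 8733 | 9047 | 9324 | 8778 | 9368 | 10205 | 10484 | 12071 |
| STATEN ISLAND | 2015 | 2056 | 2414 | 2242 | 2364 | 2178 | 2112 | 2036 | 2048 | 1868 | 1687 | 1871 | 2099 | 2126 | 2259 |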

From 2010 to 2024, Brooklyn and the Bronx consistently had the most domestic violence reports in NYC. Brooklyn led every year from 2010 through 2017; in 2018 the Bronx overtook it and has had the most reports every year since. Staten Island had the fewest reports each year.

Overall, domestic violence reports rose between 2010 and 2024, though not steadily (every borough dips in 2020, for example). Brooklyn is the only borough where reported incidents were lower in 2024 than in 2010. (This could be something interesting to look into!)
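The 2010-versus-2024 comparison can be checked directly from the first and last columns of the table above. This is only a sketch with the values typed in by hand; in the actual workflow they would come from dv_by_year_borough.

```r
# 2010 and 2024 totals copied from the borough-by-year table above.
dv_2010 <- c("BRONX" = 11274, "BROOKLYN" = 14094, "MANHATTAN" = 6081,
             "QUEENS" = 8726, "STATEN ISLAND" = 2015)
dv_2024 <- c("BRONX" = 14905, "BROOKLYN" = 13652, "MANHATTAN" = 7215,
             "QUEENS" = 12071, "STATEN ISLAND" = 2259)

# Percent change from 2010 to 2024 for each borough.
pct_change <- round(100 * (dv_2024 - dv_2010) / dv_2010, 1)
pct_change

# Boroughs with fewer reports in 2024 than in 2010.
names(pct_change)[pct_change < 0]
```

Expressing the change as a percentage also makes boroughs of very different sizes easier to compare than raw counts do.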

1.2.1.3 Heat Map

library(ggplot2)

dv_plot_data <- aggregated_dv_data %>%
  group_by(Year, Borough) %>%
  summarise(total_reports = sum(`DV Reports`))

ggplot(dv_plot_data, aes(x = Year, y = Borough, fill = total_reports)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkred") +
  labs(
    title = "DV Reports by Borough and Year",
    x = "Year",
    y = "Borough",
    fill = "DV Reports"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))

Above is a heat map of domestic violence incident reports from 2010 to 2024. It reflects what the previous table showed: the Bronx and Brooklyn tend to have a higher volume of reports, and Staten Island has stayed largely below the average.

1.2.2 Mold Exposure Data

1.2.3 Summary Stats

mold_summary <- aggregated_mold_data %>%
  summarise(
    total_complaints = sum(`Mold Complaints`),
    start_year = min(as.numeric(Year)),
    end_year = max(as.numeric(Year)),
    boroughs = n_distinct(Borough),
    avg_monthly = mean(`Mold Complaints`)
  )

kable(mold_summary)
| total_complaints | start_year | end_year | boroughs | avg_monthly |
|---|---|---|---|---|
| 388293 | 2010 | 2024 | 5 | 431.4367 |

From 2010 to 2024, a total of 388,293 mold complaints in residential buildings were reported across the five boroughs of New York City. Averaged over borough-month rows, that is roughly 431 residential mold complaints made to 311 per borough per month.

1.2.4 Borough/Year Distributions

mold_by_year_borough <- aggregated_mold_data %>%
  group_by(Year, Borough) %>%
  summarise(total_complaints = sum(`Mold Complaints`)) %>%
  pivot_wider(names_from = Year,
    values_from = total_complaints)

kable(mold_by_year_borough)
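| Borough | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BRONX | 7973 | 8726 | 7384 | 7778 | 8295 | 8434 | 7946 | 7510 | 9510 | 6460 | 5953 | 9378 | 9513 | 11438 | 12847 |
| BROOKLYN | 8244 | 10404 | 8373 | 9299 | 9280 | 9074 | 7700 | 7908 | 9747 | 6360 | 5397 | 8199 | 8265 | 10389 | 11092 |
| MANHATTAN | 3977 | 4403 | 3683 | 3840 | 4787 | 5062 | 4395 | 4031 | 4942 | 3119 | 2985 | 5027 | 5347 | 6660 | 6757 |
| QUEENS | 3258 | 3907 | 3205 | 3290 | 3351 | 3328 | 2875 | 2824 | 3569 | 2406 | 2027 | 3125 | 3347 | 4229 | 4736 |
| STATEN ISLAND | 615 | 852 | 730 | 757 | 747 | 688 | 720 | 691 | 864 | 529 | 504 | 777 | 721 | 960 | 770 |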

From 2010 to 2024, Brooklyn and the Bronx had the most residential mold complaints and traded first place from year to year. Staten Island had the fewest residential mold complaints to 311 each year.

Across all 5 boroughs, there are more mold complaints in 2024 than there were in 2010, with the overall trend being an increase in 311 complaints for residential mold.

1.2.5 Heat Map

mold_plot_data <- aggregated_mold_data %>%
  group_by(Year, Borough) %>%
  summarise(total_complaints = sum(`Mold Complaints`))

ggplot(mold_plot_data, aes(x = Year, y = Borough, fill = total_complaints)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "wheat", high = "darkgreen") +
  labs(
    title = "Mold Complaints by Borough and Year",
    x = "Year",
    y = "Borough",
    fill = "Mold Complaints"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))

Above is a heat map of residential mold complaints to 311 from 2010 to 2024. The pattern resembles the domestic violence heat map: the Bronx and Brooklyn have a higher volume of complaints, and Staten Island has stayed largely below the average.

1.2.6 Preliminary Correlation

cor.test(aggregated_dv_mold_data$`DV Reports`, 
         aggregated_dv_mold_data$`Mold Complaints`)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  x and y
#> t = 43.206, df = 898, p-value < 2.2e-16
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.7992741 0.8418473
#> sample estimates:
#>       cor 
#> 0.8217037
  • Strength: 0.82 (very strong)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

I ran a correlation between domestic violence reports and residential mold complaints to see if there was a substantial relationship between them, and there is! The relationship is positive and very strong: borough-months with more DV reports also tend to have more mold complaints, and borough-months with fewer DV reports tend to have fewer mold complaints. Because every borough-month is pooled together here, part of this strength may simply reflect borough size, since larger boroughs generate more of both.
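For readers wondering how the t statistic in the output relates to the correlation itself, the standard relationship for a Pearson test is t = r * sqrt(df) / sqrt(1 - r^2), with df = n - 2. The short sketch below plugs in the r and df printed above; the variable names are my own.

```r
r  <- 0.8217037   # sample correlation from the cor.test() output above
df <- 898         # degrees of freedom from the same output (n - 2)

# t statistic implied by r and df.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

round(t_stat, 3)
p_value < 2.2e-16
```

The same formula shows why a large sample makes even a modest r statistically significant: t grows with the square root of df.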

1.2.6.0.1 Let’s visualize this:
ggplot(aggregated_dv_mold_data, aes(x = `Mold Complaints`, y = `DV Reports`, color = Borough)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "gray18") +
  labs(title = "DV Reports vs Mold Complaints by Borough",
       x = "Mold Complaints",
       y = "DV Reports") +
  theme_minimal()

However, this only tells us that domestic violence reports and mold complaints co-occur, and it does not tell us anything about causality.

Let’s dive into how domestic violence reports and mold complaints develop over time!

1.3 Temporal Trends

1.3.0.1 Yearly Counts

yearly_counts_wide <- aggregated_dv_mold_data %>%
  group_by(Year) %>%
  summarise(
    `DV Reports` = sum(`DV Reports`, na.rm = TRUE),
    `Mold Complaints` = sum(`Mold Complaints`, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  pivot_longer(
    cols = c(`DV Reports`, `Mold Complaints`),
    names_to = "Type",
    values_to = "Count"
  ) %>%
  pivot_wider(
    names_from = Year,
    values_from = Count
  )

kable(yearly_counts_wide)
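| Type | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DV Reports | 42190 | 43151 | 45452 | 45738 | 46404 | 44334 | 44287 | 42976 | 44346 | 43249 | 39851 | 43260 | 46977 | 46819 | 50102 |
| Mold Complaints | 24067 | 28292 | 23375 | 24964 | 26460 | 26586 | 23636 | 22964 | 28632 | 18874 | 16866 | 26506 | 27193 | 33676 | 36202 |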

Looking at this table, we can see that DV reports and mold complaints have fluctuated over the years, but overall they trend upward. Both were higher in 2024 than in 2010.
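One way to see the ups and downs more concretely is to take year-over-year differences of the yearly totals. The sketch below types in the values from the table above; the names yoy and net_pct are illustrative only.

```r
years <- 2010:2024
# Yearly citywide totals copied from the table above.
dv   <- c(42190, 43151, 45452, 45738, 46404, 44334, 44287, 42976,
          44346, 43249, 39851, 43260, 46977, 46819, 50102)
mold <- c(24067, 28292, 23375, 24964, 26460, 26586, 23636, 22964,
          28632, 18874, 16866, 26506, 27193, 33676, 36202)

# Change from the previous year (so the first row is 2011).
yoy <- data.frame(
  year      = years[-1],
  dv_diff   = diff(dv),
  mold_diff = diff(mold)
)

# Years in which each series fell relative to the year before.
yoy$year[yoy$dv_diff < 0]
yoy$year[yoy$mold_diff < 0]

# Net percent change from 2010 to 2024.
net_pct <- c(
  dv   = 100 * (dv[15] - dv[1]) / dv[1],
  mold = 100 * (mold[15] - mold[1]) / mold[1]
)
round(net_pct, 1)
```

Both series fall in 2016, 2017, 2019, and 2020, which fits the impression that they fluctuate in similar years even though the net change is upward.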

1.3.0.2 Monthly Counts

monthly_counts <- aggregated_dv_mold_data %>%
  group_by(Year, Month) %>%
  summarise(
    total_dv = sum(`DV Reports`, na.rm = TRUE),
    total_mold = sum(`Mold Complaints`, na.rm = TRUE)
  )
#> `summarise()` has grouped output by 'Year'. You can
#> override using the `.groups` argument.

kable(head(monthly_counts, 12))
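| Year | Month | total_dv | total_mold |
|---|---|---|---|
| 2010 | 01 - January | 3702 | 2516 |
| 2010 | 02 - February | 2934 | 2029 |
| 2010 | 03 - March | 3523 | 2676 |
| 2010 | 04 - April | 3343 | 2251 |
| 2010 | 05 - May | 3583 | 1787 |
| 2010 | 06 - June | 3867 | 1856 |
| 2010 | 07 - July | 3872 | 1765 |
| 2010 | 08 - August | 3617 | 1958 |
| 2010 | 09 - September | 3618 | 1766 |
| 2010 | 10 - October | 3542 | 1969 |
| 2010 | 11 - November | 3365 | 1661 |
| 2010 | 12 - December | 3224 | 1833 |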

Above is a table that separates the total domestic violence reports and mold complaints by each month per year. If we plot this, we can see how they both trend over time compared to one another:

plot_data <- aggregated_dv_mold_data %>%
  pivot_longer(
    cols = c(`DV Reports`, `Mold Complaints`),
    names_to = "Type",
    values_to = "Count"
  )

ggplot(plot_data, aes(x = time_index, y = Count, color = Type)) +
  geom_line(linewidth = 0.5) +
  facet_wrap(~Borough) +
  scale_color_manual(values = c("DV Reports" = "darkred", "Mold Complaints" = "darkgreen")) +
  labs(
    title = "Monthly DV and Mold Reports by Borough",
    x = "Time",
    y = "Number of Reports",
    color = "Report Type"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45))

Here, we have line plots of domestic violence reports and residential mold complaints per month from January of 2010 to December of 2024 (faceted by borough). We can see similar peaks across the boroughs (especially The Bronx and Brooklyn).

So, how exactly does mold exposure relate to domestic violence reports over time?

We’ve established that a relationship exists between the two variables themselves, but we need to look closer at this data. How do domestic violence reports in a given month correlate with mold complaints in the same month, and how do those variables move together?

Month-by-Month DV vs. Mold Counts

cor.test(monthly_counts$total_dv, monthly_counts$total_mold)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  x and y
#> t = 5.1733, df = 178, p-value = 6.155e-07
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.2272817 0.4822876
#> sample estimates:
#>       cor 
#> 0.3615268
  • Strength: 0.36

  • Direction: positive

  • Significance: statistically significant (p<0.05)

This is a more realistic look at how mold complaints and DV reports move together across time! Because monthly_counts sums all five boroughs into one citywide total per month (180 months, hence df = 178), differences in borough size no longer inflate the correlation. The trade-off is that this test compares citywide totals, so it does not show whether the two series move together within any single borough. Month-by-month DV reports and mold complaints have a moderate, positive correlation with one another (r = 0.36), and this result is statistically significant. This suggests that, citywide, months with more mold complaints also tend to have more DV reports.
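One way to look at co-movement inside each borough, rather than in citywide totals, is to compute the correlation separately per borough. The sketch below is only an illustration: the data frame toy and its values are synthetic (not NYC data), and the column names dv and mold are made up for the example. It uses base R so it runs without the tidyverse.

```r
# Synthetic borough-month counts: illustrative values only, not NYC data.
set.seed(1)
toy <- data.frame(
  Borough = rep(c("A", "B"), each = 24),
  mold    = c(rpois(24, 900), rpois(24, 100))
)
# DV counts loosely track mold within each borough, at different base levels.
toy$dv <- round(ifelse(toy$Borough == "A", 800, 150) + 0.2 * toy$mold +
                  rnorm(48, sd = 10))

# Pooled correlation mixes between-borough level differences with co-movement.
pooled_r <- cor(toy$dv, toy$mold)

# Within-borough correlations isolate month-to-month co-movement.
within_r <- sapply(split(toy, toy$Borough), function(d) cor(d$dv, d$mold))

print(pooled_r)
print(within_r)
```

In this toy setup the pooled correlation is close to 1 simply because borough A is larger on both measures, which is the kind of size effect a per-borough view is meant to separate out.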

1.3.1 Exploring Mold Resolution

Counts of mold complaints provide insight into the number of housing issues that exist, but they do not tell us how long residents are exposed to mold. When investigating predictors of household stress, the length of time a complaint remains unresolved may be important.

So, let’s take a look at how resolution time may play a role in the relationship between domestic violence reports and mold complaints:

1.3.2 Quick Look at Resolution Time

res_time<- mold_data_clean %>%
  group_by(Borough) %>%
  summarise(
    avg_resolution_days = mean(resolution_days, na.rm = TRUE),
    median_days = median(resolution_days, na.rm = TRUE),
    min_days = min(resolution_days, na.rm = TRUE),
    max_days = max(resolution_days, na.rm = TRUE),
    n_complaints = n()
  ) %>%
  arrange(desc(avg_resolution_days))

kable(res_time)
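|Borough       | avg_resolution_days| median_days| min_days| max_days| n_complaints|
|:-------------|-------------------:|-----------:|--------:|--------:|------------:|
|MANHATTAN     |            20.60248|          11|        0|     3090|        69015|
|QUEENS        |            18.42648|          12|        0|     4184|        49477|
|STATEN ISLAND |            17.67769|          11|        0|      805|        10925|
|BRONX         |            15.02155|          10|        0|      975|       129145|
|BROOKLYN      |            14.03738|           9|        0|     3980|       129731|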

Right away, we see huge variation in the number of days it has taken for a residential mold complaint to be resolved. Some complaints are resolved the same day (a minimum of 0 days), while others take years (the longest, in Queens, took 4,184 days). The average number of days varies by borough, but it is roughly between 14 and 21 days, or 2 to 3 weeks. The medians are lower (9 to 12 days), which indicates that a small number of very long cases pull the averages up.
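To make the resolution measure concrete, here is a minimal sketch of how a days-to-resolution column could be derived from two date columns, and why a single long case separates the mean from the median. The column names created_date and closed_date and all four records are assumptions for illustration; the chapter's actual cleaning step may differ.

```r
# Hypothetical complaint records; the column names are assumptions.
complaints <- data.frame(
  created_date = as.Date(c("2024-01-02", "2024-01-05", "2024-01-10", "2024-02-01")),
  closed_date  = as.Date(c("2024-01-02", "2024-01-19", "2024-01-20", "2024-06-30"))
)

# Days between filing and closing; 0 means closed the same day.
complaints$resolution_days <- as.numeric(
  difftime(complaints$closed_date, complaints$created_date, units = "days")
)

# A single long-running case pulls the mean well above the median.
mean_days   <- mean(complaints$resolution_days, na.rm = TRUE)
median_days <- median(complaints$resolution_days, na.rm = TRUE)

print(complaints$resolution_days)
print(c(mean = mean_days, median = median_days))
```

Reporting both statistics, as the summary table does, is a reasonable way to keep a few multi-year cases from dominating the picture.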

1.3.3 Average Resolution Delay per Month

mold_monthly_resolution <- mold_data_clean %>%
  group_by(Year, Month) %>%
  summarise(
    avg_resolution_days = mean(resolution_days, na.rm = TRUE),
    n_complaints = n()
  ) %>%
  arrange(Year, Month)
#> `summarise()` has grouped output by 'Year'. You can
#> override using the `.groups` argument.

kable(head(mold_monthly_resolution, 12))
| Year|Month          | avg_resolution_days| n_complaints|
|----:|:--------------|-------------------:|------------:|
| 2010|01 - January   |            20.31474|         2516|
| 2010|02 - February  |            16.09571|         2029|
| 2010|03 - March     |            15.59963|         2676|
| 2010|04 - April     |            14.46246|         2251|
| 2010|05 - May       |            14.81757|         1787|
| 2010|06 - June      |            14.81358|         1856|
| 2010|07 - July      |            14.59217|         1765|
| 2010|08 - August    |            14.01535|         1958|
| 2010|09 - September |            13.32578|         1766|
| 2010|10 - October   |            18.39756|         1969|
| 2010|11 - November  |            20.97167|         1661|
| 2010|12 - December  |            26.94590|         1833|

Above, we see the monthly average number of days it took to resolve a mold complaint in 2010. Most months fall close to the borough averages from the previous table (roughly 13 to 21 days), with December (about 27 days) as a bit of an outlier.

1.3.4 Lagged Data

dv_mold_lagged <- aggregated_dv_mold_data %>%
  arrange(Borough, time_index) %>%
  group_by(Borough) %>%
  mutate(DV_next_month = lead(`DV Reports`, n = 1)) %>%
  ungroup()

kable(head(dv_mold_lagged, 12))
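| Year|Month          |Borough | DV Reports| Mold Complaints|year_month          | time_index| DV_next_month|
|----:|:--------------|:-------|----------:|---------------:|:-------------------|----------:|-------------:|
| 2010|01 - January   |BRONX   |        910|             954|2010 01 - January   |          1|           818|
| 2010|02 - February  |BRONX   |        818|             738|2010 02 - February  |          2|           969|
| 2010|03 - March     |BRONX   |        969|             941|2010 03 - March     |          3|           875|
| 2010|04 - April     |BRONX   |        875|             798|2010 04 - April     |          4|           940|
| 2010|05 - May       |BRONX   |        940|             576|2010 05 - May       |          5|          1015|
| 2010|06 - June      |BRONX   |       1015|             582|2010 06 - June      |          6|          1043|
| 2010|07 - July      |BRONX   |       1043|             553|2010 07 - July      |          7|           970|
| 2010|08 - August    |BRONX   |        970|             528|2010 08 - August    |          8|           983|
| 2010|09 - September |BRONX   |        983|             534|2010 09 - September |          9|           914|
| 2010|10 - October   |BRONX   |        914|             562|2010 10 - October   |         10|           959|
| 2010|11 - November  |BRONX   |        959|             513|2010 11 - November  |         11|           878|
| 2010|12 - December  |BRONX   |        878|             694|2010 12 - December  |         12|          1052|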

Because psychological effects related to mold exposure may not develop immediately, I wanted to explore whether there are delayed temporal patterns between mold complaints and domestic violence reports. Specifically, if mold complaints increase in one month, could this be associated with higher levels of domestic violence in the following month?

To examine this, I created a lagged dataset that matches residential mold complaints from one month with domestic violence incidents reported in the following month.
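The shift itself is easy to check by hand. The sketch below reproduces it in base R on the first four Bronx months from the table above, plus a made-up second borough ("X", hypothetical values) to show that values do not leak across groups. The ave() call is a stand-in for the grouped lead() used in the chapter, not the author's code.

```r
# First four Bronx months of 2010, copied from the table above;
# the second borough "X" is made up to show that groups do not leak.
d <- data.frame(
  Borough    = c(rep("BRONX", 4), rep("X", 3)),
  time_index = c(1:4, 1:3),
  dv         = c(910, 818, 969, 875, 50, 60, 70),
  mold       = c(954, 738, 941, 798, 5, 6, 7)
)
d <- d[order(d$Borough, d$time_index), ]

# Shift DV up by one row within each borough; the last month of each
# borough has no successor, so it becomes NA.
d$dv_next_month <- ave(d$dv, d$Borough, FUN = function(v) c(v[-1], NA))

# Rows with NA cannot be paired and drop out of a correlation.
complete <- d[!is.na(d$dv_next_month), ]
print(d)
```

In the full data this leaves one unpaired row per borough: 900 borough-month rows minus 5 gives 895 pairs, which is consistent with df = 893 (n - 2) in the lagged correlation test reported in section 1.4.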

1.3.4.0.1 Let’s visualize what this looks like:
ggplot(dv_mold_lagged, aes(x = `Mold Complaints`, y = DV_next_month)) +
  geom_point(alpha = 0.5, color = "darkgreen") +
  geom_smooth(method = "lm", color = "darkred") +
  labs(
    title = "Next Month DV Reports vs Current Month Mold Complaints",
    x = "Mold Complaints (Current Month)",
    y = "DV Reports (Next Month)"
  ) +
  theme_minimal()

These two variables still look very closely related! But does the one-month delay really have anything to do with it?

Let’s conduct some statistical tests to dig deeper!

1.4 Statistical Analysis

1.4.0.1 Lagged Data Correlation Analysis

cor.test(dv_mold_lagged$DV_next_month, dv_mold_lagged$`Mold Complaints`)
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  x and y
#> t = 41.32, df = 893, p-value < 2.2e-16
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  0.7865287 0.8316650
#> sample estimates:
#>       cor 
#> 0.8102952

I conducted a Pearson’s correlation test to examine the relationship between mold complaints in one month and DV reports the following month.

  • Strength: 0.81 (very strong)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

The results show a very strong positive correlation, suggesting that months with higher mold complaint counts are associated with higher domestic violence reports in the following month. However, this result is similar to the basic correlation between mold complaints and DV reports (conducted earlier on) and does not account for other factors that might influence the relationship, such as borough.
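When comparing this result with the month-by-month test in section 1.3, it helps to remember that the two tests run on different units. A Pearson test's t statistic and degrees of freedom determine its correlation through r = t / sqrt(t^2 + df), and df = n - 2 gives the sample size. The sketch below applies that identity to the two printed outputs; the helper name r_from_t is introduced here for illustration.

```r
# r recovered from a t statistic and its degrees of freedom.
r_from_t <- function(t, df) t / sqrt(t^2 + df)

r_citywide_monthly <- r_from_t(5.1733, 178)  # citywide monthly totals (section 1.3)
r_lagged_borough   <- r_from_t(41.32, 893)   # borough-month rows, DV one month ahead

# Sample sizes implied by df = n - 2.
n_citywide <- 178 + 2  # months
n_lagged   <- 893 + 2  # borough-months with a following month

print(round(c(r_citywide_monthly, r_lagged_borough), 4))
print(c(n_citywide, n_lagged))
```

The lagged test pools 895 borough-month rows, so its correlation also reflects size differences between boroughs, while the earlier 0.36 comes from 180 citywide months. That difference in units, not only the lag, separates the two numbers.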

To better understand how additional variables (such as borough and average resolution time) affect this association, I conducted regression analyses.

1.5 Regression Models

1.5.0.1 DV ~ Mold

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  mutate(Borough = factor(Borough)) %>%
  arrange(Year, Month)

lm_dv_mold <- lm(`DV Reports` ~ `Mold Complaints`,
  data = aggregated_dv_mold_data)

summary(lm_dv_mold)
#> 
#> Call:
#> lm(formula = `DV Reports` ~ `Mold Complaints`, data = aggregated_dv_mold_data)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -922.11 -177.91  -14.37  167.41  609.43 
#> 
#> Coefficients:
#>                    Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)       306.12047   12.25800   24.97   <2e-16 ***
#> `Mold Complaints`   1.01374    0.02346   43.21   <2e-16 ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 207.4 on 898 degrees of freedom
#> Multiple R-squared:  0.6752, Adjusted R-squared:  0.6748 
#> F-statistic:  1867 on 1 and 898 DF,  p-value: < 2.2e-16
AIC(lm_dv_mold)
#> [1] 12160.34

This linear regression model tests the association between monthly residential mold complaints and domestic violence reports across all 5 boroughs and all time periods.

  • Strength: strong (R^2 = 0.68)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

  • AIC: 12160.34

Results suggest a strong and statistically significant positive association between mold complaints and DV reports. On average, months with higher numbers of mold complaints are associated with higher numbers of reported domestic violence incidents. However, this model does not account for differences across boroughs or temporal patterns.
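As a reading aid, the coefficients can be turned into fitted values by hand. The sketch below copies the estimates from the printed summary and shows two things: the t value is the estimate divided by its standard error, and each additional mold complaint adds about one DV report to the fitted count. The function name predict_dv and the mold counts 500 and 1000 are arbitrary choices for illustration.

```r
# Coefficients copied from the summary(lm_dv_mold) output above.
intercept <- 306.12047
slope     <- 1.01374
slope_se  <- 0.02346

# The reported t value is the estimate divided by its standard error.
t_value <- slope / slope_se

# Fitted DV reports for a borough-month with a given number of mold complaints.
predict_dv <- function(mold) intercept + slope * mold
fit_500  <- predict_dv(500)
fit_1000 <- predict_dv(1000)

print(round(t_value, 2))
print(c(fit_500, fit_1000))
```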

1.5.0.2 DV ~ Mold + Borough

lm_borough <- lm(
  `DV Reports` ~ `Mold Complaints` + Borough,
  data = aggregated_dv_mold_data
)

summary(lm_borough)
#> 
#> Call:
#> lm(formula = `DV Reports` ~ `Mold Complaints` + Borough, data = aggregated_dv_mold_data)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -257.79  -42.87   -3.50   37.93  432.04 
#> 
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)
#> (Intercept)           931.33548   15.54248  59.922  < 2e-16
#> `Mold Complaints`       0.19574    0.01975   9.909  < 2e-16
#> BoroughBROOKLYN        59.10721    9.02091   6.552 9.56e-11
#> BoroughMANHATTAN     -452.89589   11.17680 -40.521  < 2e-16
#> BoroughQUEENS        -198.79959   12.56259 -15.825  < 2e-16
#> BoroughSTATEN ISLAND -768.91015   15.80207 -48.659  < 2e-16
#>                         
#> (Intercept)          ***
#> `Mold Complaints`    ***
#> BoroughBROOKLYN      ***
#> BoroughMANHATTAN     ***
#> BoroughQUEENS        ***
#> BoroughSTATEN ISLAND ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 85.58 on 894 degrees of freedom
#> Multiple R-squared:  0.9449, Adjusted R-squared:  0.9446 
#> F-statistic:  3069 on 5 and 894 DF,  p-value: < 2.2e-16
AIC(lm_borough)
#> [1] 10571.03

This linear regression model adds borough as a categorical predictor, so it tests the association between monthly residential mold complaints and domestic violence reports after accounting for each borough's baseline level, rather than pooling the whole city.

  • Strength: very strong (R^2 = 0.94)

  • Direction: positive

  • Significance: statistically significant (p<0.05)

  • AIC: 10571.03

The association between mold complaints and DV reports remains positive and statistically significant. Including boroughs greatly increases the R^2, showing that much of the variation in DV reports is explained by differences between boroughs rather than mold alone. The borough coefficients compare each borough to the Bronx and highlight that DV reporting levels differ substantially across boroughs.
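To see what "compared to the Bronx" means in practice, the borough coefficients can be read as offsets added to the intercept. The sketch below copies the estimates from the printed summary and computes the fitted DV count for each borough at the same mold count; the value 500 is arbitrary and chosen only for illustration.

```r
# Coefficients copied from the summary(lm_borough) output above.
# The Bronx is the reference level, so its offset is 0.
intercept  <- 931.33548
slope_mold <- 0.19574
offset <- c("BRONX" = 0, "BROOKLYN" = 59.10721, "MANHATTAN" = -452.89589,
            "QUEENS" = -198.79959, "STATEN ISLAND" = -768.91015)

# Fitted DV reports for each borough at the same mold count (500 is arbitrary).
fitted_at_500 <- intercept + offset + slope_mold * 500
print(round(fitted_at_500, 1))
```

At an identical mold count, the fitted values still differ by hundreds of reports between boroughs, which is why adding borough raises R^2 so sharply while the mold slope shrinks from about 1.01 to about 0.20.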

1.5.0.3 DV ~ Mold + Borough + Average Resolution Days

aggregated_dv_mold_data <- aggregated_dv_mold_data %>%
  left_join(mold_monthly_resolution, by = c("Year", "Month"))

lm_resolution_borough <- lm(
  `DV Reports` ~ `Mold Complaints` + Borough + avg_resolution_days,
  data = aggregated_dv_mold_data
)

summary(lm_resolution_borough)
#> 
#> Call:
#> lm(formula = `DV Reports` ~ `Mold Complaints` + Borough + avg_resolution_days, 
#>     data = aggregated_dv_mold_data)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -258.31  -43.84   -2.01   38.16  403.40 
#> 
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)
#> (Intercept)           994.96968   17.64318  56.394  < 2e-16
#> `Mold Complaints`       0.17645    0.01944   9.078  < 2e-16
#> BoroughBROOKLYN        59.17001    8.78658   6.734 2.95e-11
#> BoroughMANHATTAN     -459.33996   10.92506 -42.045  < 2e-16
#> BoroughQUEENS        -207.33753   12.29650 -16.862  < 2e-16
#> BoroughSTATEN ISLAND -781.57967   15.49694 -50.434  < 2e-16
#> avg_resolution_days    -3.03676    0.43241  -7.023 4.30e-12
#>                         
#> (Intercept)          ***
#> `Mold Complaints`    ***
#> BoroughBROOKLYN      ***
#> BoroughMANHATTAN     ***
#> BoroughQUEENS        ***
#> BoroughSTATEN ISLAND ***
#> avg_resolution_days  ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 83.35 on 893 degrees of freedom
#> Multiple R-squared:  0.9478, Adjusted R-squared:  0.9475 
#> F-statistic:  2704 on 6 and 893 DF,  p-value: < 2.2e-16
AIC(lm_resolution_borough)
#> [1] 10524.65

This linear regression model continues to test the association while incorporating resolution time of mold complaints.

  • Strength: very strong (R^2 = 0.95)

  • Direction: positive for mold complaints; negative for average resolution days

  • Significance: statistically significant (p<0.05)

  • AIC: 10524.65

Even after controlling for borough and resolution time, mold complaints remain a statistically significant predictor of DV reports. Average resolution days show a statistically significant negative association with DV reports: each additional day of average resolution time is associated with about 3 fewer DV reports in a borough-month, holding mold complaints and borough constant. Note that the resolution average is joined by Year and Month only, so it is a citywide monthly value shared by all five boroughs. While a negative association may intuitively feel like the opposite of the expected result, it could be due to many factors, such as a weak relationship between residents and city services. For instance, in periods when city officials took longer to respond to mold complaints, community members may have felt less confident that filing a domestic violence report would lead to help, and so reported less often.
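Because the resolution variable in this model is computed per month across the whole city, one possible variant would be to average resolution time per borough and month and join on Borough as well. The sketch below shows that idea on toy data in base R; every value is made up, and this is an alternative setup under those assumptions, not the analysis reported above.

```r
# Toy complaint-level data; values are illustrative assumptions.
mold_toy <- data.frame(
  Year    = 2010,
  Month   = c("01", "01", "01", "01", "02", "02"),
  Borough = c("BRONX", "BRONX", "QUEENS", "QUEENS", "BRONX", "QUEENS"),
  resolution_days = c(10, 20, 30, 50, 12, 8)
)

# Average resolution time per borough-month instead of per citywide month.
res_by_borough <- aggregate(resolution_days ~ Year + Month + Borough,
                            data = mold_toy, FUN = mean)
names(res_by_borough)[names(res_by_borough) == "resolution_days"] <-
  "avg_resolution_days"

# Toy borough-month panel to join onto.
panel_toy <- expand.grid(Year = 2010, Month = c("01", "02"),
                         Borough = c("BRONX", "QUEENS"),
                         stringsAsFactors = FALSE)

# Joining on Borough as well keeps each borough's own delay on its own rows.
panel_toy <- merge(panel_toy, res_by_borough,
                   by = c("Year", "Month", "Borough"), all.x = TRUE)
print(panel_toy)
```

With a borough-specific delay, the coefficient would describe how a borough's own backlog relates to its DV reports, which may or may not show the same negative sign.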

Overall, the final linear regression model is our best predictive model for domestic violence reports. With an R^2 of 0.95 and the lowest AIC of the three regression models (10524.65), this model best supports the hypothesis that domestic violence reports can be predicted by residential mold complaints in the same area at the same time.
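The model comparison can be summarized by putting the three AIC values side by side. The sketch below copies the values printed by the three AIC() calls in this section and computes each model's distance from the lowest one; the vector names are labels chosen here for readability.

```r
# AIC values copied from the three AIC() calls above.
aic <- c(mold_only = 12160.34, mold_borough = 10571.03,
         mold_borough_resolution = 10524.65)

# Difference from the best (lowest) AIC; 0 marks the preferred model.
delta_aic  <- aic - min(aic)
best_model <- names(which.min(aic))

print(round(delta_aic, 2))
print(best_model)
```

Most of the improvement comes from adding borough (a drop of about 1589 points), while adding average resolution days lowers AIC by a further 46 or so.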

1.6 Discussion & Insights

Overall, the results show a consistent positive association between residential mold complaints and domestic violence reports. Simple correlations suggest that months with more mold complaints tend to have more DV reports as well. This pattern appears at both yearly and monthly levels.

When borough differences are accounted for in regression models, the relationship between mold complaints and DV reports still remains statistically significant, but weaker. This suggests that while borough-level differences explain much of the variation, mold complaints still have an independent association with DV reports.

Adding average mold resolution time shows that longer resolution delays are associated with lower DV report counts. This may be due to reporting behavior or service engagement rather than a direct effect.

Lagged analyses were used to test whether mold complaints in one month are related to DV reports in the following month. Although the lagged relationship remains positive, it closely resembles the non-lagged results, suggesting that the one-month delay adds little beyond the same-month association.

In summary, the analysis suggests a consistent positive relationship between residential mold complaints and domestic violence reports, but borough-level differences and other contextual factors appear to drive much of the variation, highlighting the complexity of environmental and social influences on public health outcomes. In the future, I would like to look at neighborhood-specific trends, or DV/mold rates instead of counts as populations vary across boroughs.
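As a pointer for that future work, converting counts to rates is a one-line calculation once population figures are available. The sketch below uses two placeholder boroughs with round-number populations and invented counts; none of these are census figures or values from this chapter.

```r
# Placeholder populations: round numbers for illustration, NOT census figures.
pop <- c(A = 1000000, B = 2500000)

# Hypothetical monthly counts for the two placeholder boroughs.
counts <- data.frame(Borough = c("A", "B"),
                     dv = c(900, 1000), mold = c(950, 1100),
                     stringsAsFactors = FALSE)

# Rate per 100,000 residents = count / population * 100,000.
counts$dv_rate   <- unname(counts$dv   / pop[counts$Borough] * 1e5)
counts$mold_rate <- unname(counts$mold / pop[counts$Borough] * 1e5)
print(counts)
```

In this toy case borough B has more reports than A but a lower rate per resident, which is the kind of reversal that raw counts can hide.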
