The 2025 Brooklyn Open Data Collection: Analyst Portfolios: 6 Leading Causes of Death and Indoor Environmental Complaints

6 Leading Causes of Death and Indoor Environmental Complaints

Author: Crystal Adote

This project examines the leading causes of death in NYC from 2007 to 2014 and indoor environmental complaints (mold, indoor air quality, asbestos, and more) from 2010 to the present. I explore each data set and look for possible relationships between the two by creating visuals and running statistical analyses.

6.1 Loading Libraries and importing data sets

library(tidyverse)
library(skimr)
library(readxl)
library(ggplot2)
library(knitr)
library(lubridate)

causes_of_death<- read_xlsx("New_York_City_Leading_Causes_of_Death_data.xlsx")
indoor_complaints<- read_xlsx("Indoor_Environmental_Complaints_data.xlsx")

In this section I loaded all of the packages used throughout the project. The two data sets are the ‘Leading Causes of Death’ data and the ‘Indoor Environmental Complaints’ data from 311, both available on the NYC Open Data website.

6.2 Cleaning the data sets

# Drop address, location, and administrative columns that are not needed
indoor_complaints<- indoor_complaints %>%
  select(-c(Incident_Address, Incident_Address_Street_Number,
            Incident_Address_Street_Name, Incident_Address_Zip,
            Complaint_Status, Latitude, Longitude,
            `Community Board`, `Council District`, `Census Tract`,
            BIN, BBL, NTA, Deleted, Complaint_Number,
            Descriptor_1_311, Incident_Address_Borough)) %>%
  # Keep only the year in which each complaint was received
  mutate(Date_Received = year(Date_Received)) %>%
  rename(Year = Date_Received, complaint_type = Complaint_Type_311)

# Drop the rate, demographic, and death-count columns; rename the cause column
causes_of_death<- causes_of_death %>%
  select(-c(`Death Rate`, `Age Adjusted Death Rate`, Sex,
            `Race Ethnicity`, Deaths)) %>%
  rename(cause_of_death = `Leading Cause`)

indoor_complaints<- indoor_complaints %>% 
  mutate(complaint_type = recode(
     complaint_type,
    "MOLD"="Mold",
    "Asbestos/Garbage Nuisance"="Garbage Nuisance",
    "LEAD"="Lead",
    "NEW YORK"="NY",
    "ASBESTOS"="Asbestos",
    "IAQ"="Indoor Air Quality"
  ))
indoor_complaints<- indoor_complaints %>% 
  filter(!complaint_type %in% c("NY", "100", "04727995"))
causes_of_death<- causes_of_death %>% 
  filter(!cause_of_death %in% c("Human Immunodeficiency Virus Disease (HIV: B20-B24)", "Intentional Self-Harm (Suicide: X60-X84, Y87.0)",
                                "Essential Hypertension and Renal Diseases (I10, I12)", "Diabetes Mellitus (E10-E14)", "Mental and Behavioral Disorders due to Accidental Poisoning and Other Psychoactive Substance Use (F11-F16, F18-F19, X40-X42, X44)",
                                "Accidents Except Drug Posioning (V01-X39, X43, X45-X59, Y85-Y86)", "All Other Causes", "Certain Conditions originating in the Perinatal Period (P00-P96)", 
                                "Chronic Liver Disease and Cirrhosis (K70, K73)", "Nephritis, Nephrotic Syndrome and Nephrisis (N00-N07, N17-N19, N25-N27)", "Alzheimer's Disease (G30)", 
                                "Assault (Homicide: Y87.1, X85-Y09)", "Congenital Malformations, Deformations, and Chromosomal Abnormalities (Q00-Q99)",
                                "Septicemia (A40-A41)", "Viral Hepatitis (B15-B19)", "Aortic Aneurysm and Dissection (I71)", "Parkinson's Disease (G20)",
                                "Tuberculosis (A16-A19)","Mental and Behavioral Disorders due to Use of Alcohol (F10)", "Insitu or Benign / Uncertain Neoplasms (D00-D48)", "Atherosclerosis (I70)"))


complaints_summary<- indoor_complaints %>% add_count(complaint_type, name = "Number of Complaints")

deaths_summary <- causes_of_death %>%
  group_by(cause_of_death) %>%
  summarise(`Number of Deaths` = n(), .groups = "drop")

Here, I cleaned the two data sets and removed the columns that I don’t need. I also made the complaint type names consistent (e.g., “MOLD” became “Mold”) and removed “NY”, “04727995”, and “100” because they are not complaint types. I filtered out many causes of death so I could focus on five common, well-known causes (for example, ‘Chronic Lower Respiratory Diseases’), which makes it easier to analyze and explore the two data sets together. Finally, I added the number of complaints for each complaint type as a column in the complaints data, and built a summary with a count for each cause of death (the number of rows recorded for that cause).
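The two summaries above are built differently: `add_count()` keeps every complaint row and repeats the total on each one, while `group_by()` with `summarise()` (or `count()`) collapses to one row per group. A minimal sketch of the difference on a made-up table (the `toy_complaints` values and the `n_complaints` column name are hypothetical, not taken from the 311 data):

```r
library(dplyr)
library(tibble)

# Hypothetical complaints, one row per complaint (not the real 311 data)
toy_complaints <- tibble(
  complaint_type = c("Mold", "Mold", "Asbestos", "Mold", "Asbestos")
)

# add_count() keeps all rows and repeats the group total on each row
with_counts <- toy_complaints %>%
  add_count(complaint_type, name = "n_complaints")

# count() collapses the data to one row per complaint type
one_row_each <- toy_complaints %>%
  count(complaint_type, name = "n_complaints")

nrow(with_counts)   # 5 rows, one per original complaint
nrow(one_row_each)  # 2 rows, one per complaint type
```

This is why a table built with `add_count()` still has one row per complaint after later joins.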

6.3 Looking at both data sets

death_causes_cont_table<- table(causes_of_death$Year, causes_of_death$cause_of_death)
death_causes_cont_table
#>       
#>        Cerebrovascular Disease (Stroke: I60-I69)
#>   2007                                        11
#>   2008                                        11
#>   2009                                        11
#>   2010                                        12
#>   2011                                        10
#>   2012                                        12
#>   2013                                        11
#>   2014                                        12
#>       
#>        Chronic Lower Respiratory Diseases (J40-J47)
#>   2007                                           11
#>   2008                                           11
#>   2009                                           11
#>   2010                                           11
#>   2011                                           12
#>   2012                                           10
#>   2013                                           11
#>   2014                                           11
#>       
#>        Diseases of Heart (I00-I09, I11, I13, I20-I51)
#>   2007                                             12
#>   2008                                             12
#>   2009                                             12
#>   2010                                             12
#>   2011                                             12
#>   2012                                             12
#>   2013                                             12
#>   2014                                             12
#>       
#>        Influenza (Flu) and Pneumonia (J09-J18)
#>   2007                                      12
#>   2008                                      12
#>   2009                                      12
#>   2010                                      12
#>   2011                                      12
#>   2012                                      12
#>   2013                                      12
#>   2014                                      12
#>       
#>        Malignant Neoplasms (Cancer: C00-C97)
#>   2007                                    12
#>   2008                                    12
#>   2009                                    12
#>   2010                                    12
#>   2011                                    12
#>   2012                                    12
#>   2013                                    12
#>   2014                                    12
enviro_complaint_cont_table<- table(indoor_complaints$Year, indoor_complaints$complaint_type)
enviro_complaint_cont_table
#>       
#>        Asbestos Cooling Tower Garbage Nuisance
#>   2010      247             0                0
#>   2011      576             0                0
#>   2012      500             0                0
#>   2013      459             0                0
#>   2014      493             0                0
#>   2015      523             0                0
#>   2016      494             0                1
#>   2017      457            14                0
#>   2018      563             0                0
#>   2019      573             0                0
#>   2020      412             0                0
#>   2021      527             0                0
#>   2022      553             0                0
#>   2023      594             0                0
#>   2024      575             0                0
#>   2025      524             0                0
#>       
#>        Indoor Air Quality Indoor Sewage Lead Mold
#>   2010               2309             0    0   64
#>   2011               4148             0    0  225
#>   2012               4149             0    0  321
#>   2013               4458             0    0  410
#>   2014               4985             0    0  439
#>   2015               4808             0    0  344
#>   2016               4349             0    1  313
#>   2017               4407           863    0  346
#>   2018               4571          1131    0  438
#>   2019               3777          1293    0  414
#>   2020               3956          1201    0  188
#>   2021               5916           238    0  291
#>   2022               5999             0    0  282
#>   2023               7026             0    0  347
#>   2024               8324             0    0  381
#>   2025               8095             0    0  381
kable(enviro_complaint_cont_table)
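| Year | Asbestos| Cooling Tower| Garbage Nuisance| Indoor Air Quality| Indoor Sewage| Lead| Mold|
|:-----|--------:|-------------:|----------------:|------------------:|-------------:|----:|----:|
|2010  |      247|             0|                0|               2309|             0|    0|   64|
|2011  |      576|             0|                0|               4148|             0|    0|  225|
|2012  |      500|             0|                0|               4149|             0|    0|  321|
|2013  |      459|             0|                0|               4458|             0|    0|  410|
|2014  |      493|             0|                0|               4985|             0|    0|  439|
|2015  |      523|             0|                0|               4808|             0|    0|  344|
|2016  |      494|             0|                1|               4349|             0|    1|  313|
|2017  |      457|            14|                0|               4407|           863|    0|  346|
|2018  |      563|             0|                0|               4571|          1131|    0|  438|
|2019  |      573|             0|                0|               3777|          1293|    0|  414|
|2020  |      412|             0|                0|               3956|          1201|    0|  188|
|2021  |      527|             0|                0|               5916|           238|    0|  291|
|2022  |      553|             0|                0|               5999|             0|    0|  282|
|2023  |      594|             0|                0|               7026|             0|    0|  347|
|2024  |      575|             0|                0|               8324|             0|    0|  381|
|2025  |      524|             0|                0|               8095|             0|    0|  381|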
kable(death_causes_cont_table)
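| Year | Cerebrovascular Disease (Stroke: I60-I69)| Chronic Lower Respiratory Diseases (J40-J47)| Diseases of Heart (I00-I09, I11, I13, I20-I51)| Influenza (Flu) and Pneumonia (J09-J18)| Malignant Neoplasms (Cancer: C00-C97)|
|:-----|-----------------------------------------:|--------------------------------------------:|----------------------------------------------:|---------------------------------------:|-------------------------------------:|
|2007  |                                        11|                                           11|                                             12|                                      12|                                    12|
|2008  |                                        11|                                           11|                                             12|                                      12|                                    12|
|2009  |                                        11|                                           11|                                             12|                                      12|                                    12|
|2010  |                                        12|                                           11|                                             12|                                      12|                                    12|
|2011  |                                        10|                                           12|                                             12|                                      12|                                    12|
|2012  |                                        12|                                           10|                                             12|                                      12|                                    12|
|2013  |                                        11|                                           11|                                             12|                                      12|                                    12|
|2014  |                                        12|                                           11|                                             12|                                      12|                                    12|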

I created a contingency table for each data set. For the ‘Leading Causes of Death’ data set, I cross-tabulated year and cause of death. Because the `Deaths` column was removed during cleaning, each cell counts the records (rows) for that cause and year rather than the number of people who died. For example, there were 12 records for heart disease in 2007.

For the ‘Indoor Environmental Complaints’ data set, I also cross-tabulated year and complaint type to see how many complaints were made each year. For example, 500 asbestos complaints were filed in 2012.
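Since `table()` counts rows and the `Deaths` column was dropped in Section 6.2, the death table tallies records rather than deaths. The sketch below shows one way death totals could be computed if `Deaths` were kept. It uses a made-up table (`toy_deaths`, with hypothetical causes and values) and assumes `Deaths` is stored as a number, which may not hold for the raw file.

```r
library(dplyr)
library(tibble)

# Hypothetical rows shaped like the raw data before `Deaths` was dropped:
# one row per year, cause, and demographic group
toy_deaths <- tribble(
  ~Year, ~cause_of_death, ~Sex, ~Deaths,
  2007,  "Cause A",       "F",  100,
  2007,  "Cause A",       "M",  120,
  2007,  "Cause B",       "F",   30,
  2008,  "Cause A",       "F",   90
)

# Counting rows gives the number of records per year and cause
row_counts <- toy_deaths %>% count(Year, cause_of_death)

# Summing `Deaths` gives the number of deaths per year and cause
death_totals <- toy_deaths %>%
  group_by(Year, cause_of_death) %>%
  summarise(total_deaths = sum(Deaths, na.rm = TRUE), .groups = "drop")

row_counts
death_totals
```

In this toy example, "Cause A" in 2007 has 2 records but 220 deaths, which shows how far the two measures can diverge.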

6.4 Visualizations

# geom_bar() counts rows, so the y-axis is the number of complaints per year
complaint_and_year<- ggplot(indoor_complaints, aes(x=Year, fill=complaint_type))+
  geom_bar()+
  labs(
    title="Indoor Environmental Complaint Types across the Years",
    x="Year",
    y="Number of Complaints",
    fill="Complaint Type"
  ) +
  theme_classic()
complaint_and_year

Figure 6.1: Stacked bar graph of the number of indoor environmental complaints per year, by complaint type

This stacked bar graph shows the number of complaints of each type submitted from 2010 to the present. Indoor Air Quality was the most frequently filed complaint type every year. This raises the question of whether there could be a relationship between these complaints and causes of death.
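The claim that Indoor Air Quality leads every year can also be checked numerically. The sketch below re-enters four columns of the yearly table from Section 6.3 by hand (the object names `yearly` and `top_per_year` are my own, and the near-empty Cooling Tower, Garbage Nuisance, and Lead columns are left out because they never exceed 14) and picks the largest complaint type in each year.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Yearly counts copied from the contingency table in Section 6.3
yearly <- tibble(
  Year = 2010:2025,
  Asbestos = c(247, 576, 500, 459, 493, 523, 494, 457,
               563, 573, 412, 527, 553, 594, 575, 524),
  `Indoor Air Quality` = c(2309, 4148, 4149, 4458, 4985, 4808, 4349, 4407,
                           4571, 3777, 3956, 5916, 5999, 7026, 8324, 8095),
  `Indoor Sewage` = c(0, 0, 0, 0, 0, 0, 0, 863,
                      1131, 1293, 1201, 238, 0, 0, 0, 0),
  Mold = c(64, 225, 321, 410, 439, 344, 313, 346,
           438, 414, 188, 291, 282, 347, 381, 381)
)

# Reshape to long format and keep the largest complaint type per year
top_per_year <- yearly %>%
  pivot_longer(-Year, names_to = "complaint_type", values_to = "complaints") %>%
  group_by(Year) %>%
  slice_max(complaints, n = 1) %>%
  ungroup()

unique(top_per_year$complaint_type)  # "Indoor Air Quality"
```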

death_counts<- causes_of_death %>% count(Year, cause_of_death)

kable(death_counts)
| Year|cause_of_death                                 |  n|
|----:|:----------------------------------------------|--:|
| 2007|Cerebrovascular Disease (Stroke: I60-I69)      | 11|
| 2007|Chronic Lower Respiratory Diseases (J40-J47)   | 11|
| 2007|Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12|
| 2007|Influenza (Flu) and Pneumonia (J09-J18)        | 12|
| 2007|Malignant Neoplasms (Cancer: C00-C97)          | 12|
| 2008|Cerebrovascular Disease (Stroke: I60-I69)      | 11|
| 2008|Chronic Lower Respiratory Diseases (J40-J47)   | 11|
| 2008|Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12|
| 2008|Influenza (Flu) and Pneumonia (J09-J18)        | 12|
| 2008|Malignant Neoplasms (Cancer: C00-C97)          | 12|
| 2009|Cerebrovascular Disease (Stroke: I60-I69)      | 11|
| 2009|Chronic Lower Respiratory Diseases (J40-J47)   | 11|
| 2009|Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12|
| 2009|Influenza (Flu) and Pneumonia (J09-J18)        | 12|
| 2009|Malignant Neoplasms (Cancer: C00-C97)          | 12|
| 2010|Cerebrovascular Disease (Stroke: I60-I69)      | 12|
| 2010|Chronic Lower Respiratory Diseases (J40-J47)   | 11|
| 2010|Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12|
| 2010|Influenza (Flu) and Pneumonia (J09-J18)        | 12|
| 2010|Malignant Neoplasms (Cancer: C00-C97)          | 12|
| 2011|Cerebrovascular Disease (Stroke: I60-I69)      | 10|
| 2011|Chronic Lower Respiratory Diseases (J40-J47)   | 12|
| 2011|Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12|
| 2011|Influenza (Flu) and Pneumonia (J09-J18)        | 12|
| 2011|Malignant Neoplasms (Cancer: C00-C97)          | 12|
| 2012|Cerebrovascular Disease (Stroke: I60-I69)      | 12|
| 2012|Chronic Lower Respiratory Diseases (J40-J47)   | 10|
| 2012|Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12|
| 2012|Influenza (Flu) and Pneumonia (J09-J18)        | 12|
| 2012|Malignant Neoplasms (Cancer: C00-C97)          | 12|
| 2013|Cerebrovascular Disease (Stroke: I60-I69)      | 11|
| 2013|Chronic Lower Respiratory Diseases (J40-J47)   | 11|
| 2013|Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12|
| 2013|Influenza (Flu) and Pneumonia (J09-J18)        | 12|
| 2013|Malignant Neoplasms (Cancer: C00-C97)          | 12|
| 2014|Cerebrovascular Disease (Stroke: I60-I69)      | 12|
| 2014|Chronic Lower Respiratory Diseases (J40-J47)   | 11|
| 2014|Diseases of Heart (I00-I09, I11, I13, I20-I51) | 12|
| 2014|Influenza (Flu) and Pneumonia (J09-J18)        | 12|
| 2014|Malignant Neoplasms (Cancer: C00-C97)          | 12|
# The fill value n comes from death_counts: the number of rows per year and cause
death_causes_and_year<- ggplot(death_counts, aes(x=Year, y=cause_of_death, fill=n))+
  geom_tile()+
  labs(
    title="Leading Causes of Death Across the Years",
    x="Year",
    y="Leading Causes of Death",
    fill="Number of Deaths"
  ) +
  theme_minimal()
death_causes_and_year

Figure 6.2: Heatmap of five of the leading causes of death over the years

This heatmap shows the five causes of death that I chose to examine for this project; note that these are not the top five leading causes of death in the data. The map covers 2007 to 2014. The fill value `n` is the number of records for each cause and year, not a total of individual deaths, because the `Deaths` column was removed during cleaning. Across all eight years, cancer, influenza and pneumonia, and heart disease consistently had the highest counts (12 in every year). To build the heatmap, I first created a table that groups the leading causes of death data by year and cause and counts the rows in each group, and then used that table as the data for the plot.

6.5 Pairing Complaint types with Causes of Death

pairing_death_complaints <- tribble(
  ~complaint_type,        ~cause_of_death,
  
  "Indoor Air Quality",   "Influenza (Flu) and Pneumonia (J09-J18)",
  
  "Mold",                 "Chronic Lower Respiratory Diseases (J40-J47)",
  
  "Asbestos",             "Malignant Neoplasms (Cancer: C00-C97)",

  "Lead",                 "Cerebrovascular Disease (Stroke: I60-I69)",
  "Lead",                 "Diseases of Heart (I00-I09, I11, I13, I20-I51)",
  
  "Cooling Tower",        "Influenza (Flu) and Pneumonia (J09-J18)",
  
  "Indoor Sewage",        "Viral Hepatitis (B15-B19)",
  
  "Garbage Nuisance",     "Influenza (Flu) and Pneumonia (J09-J18)"
)

I created a separate data set that pairs certain complaint types with causes of death. These pairings do not mean that the complaint type causes the paired cause of death. They are my own assumptions and should not be read as established fact or as evidence of causation.
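One thing worth checking in a hand-built mapping is whether every paired cause still exists after cleaning. For instance, "Viral Hepatitis (B15-B19)" is in the pairing table but was filtered out of `causes_of_death` in Section 6.2, so the "Indoor Sewage" pairing cannot match anything. A minimal sketch of such a check with `anti_join()`, using shortened stand-in tables (`pairing` and `kept_causes` are hypothetical names holding only a few of the real values):

```r
library(dplyr)
library(tibble)

# A shortened copy of the pairing table
pairing <- tribble(
  ~complaint_type, ~cause_of_death,
  "Mold",          "Chronic Lower Respiratory Diseases (J40-J47)",
  "Asbestos",      "Malignant Neoplasms (Cancer: C00-C97)",
  "Indoor Sewage", "Viral Hepatitis (B15-B19)"
)

# Causes that survive the cleaning filter (subset shown for illustration)
kept_causes <- tibble(
  cause_of_death = c(
    "Chronic Lower Respiratory Diseases (J40-J47)",
    "Malignant Neoplasms (Cancer: C00-C97)"
  )
)

# anti_join() returns pairings whose cause has no match in the kept causes
unmatched <- pairing %>% anti_join(kept_causes, by = "cause_of_death")
unmatched
```

Any row returned here will produce NAs in a later `left_join()` and be dropped by an NA filter.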

6.6 Process of merging data

causes_of_death<- select(causes_of_death, -Year)
indoor_complaints<- select(indoor_complaints, -Year)
death_causes_labeled<- causes_of_death %>%
  left_join(pairing_death_complaints, by= "cause_of_death") %>%
  group_by(cause_of_death) %>%
  summarise(complaint_type = paste(unique(complaint_type), collapse = "; "),
            .groups = "drop")

I removed the ‘Year’ column from both data sets before merging because the two data sets cover different ranges of years. As a result, this analysis does not examine trends over time, since comparing mismatched years would complicate the analysis and make the results inaccurate.
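An alternative that this chapter does not pursue would be to keep only the years the two data sets share (2010 to 2014) and join on `Year`. The sketch below illustrates the idea using the Mold complaint counts and the Chronic Lower Respiratory Diseases record counts copied from the tables in Section 6.3; the object and column names are my own, and this is only one possible approach under those assumptions.

```r
library(dplyr)
library(tibble)

# Mold complaints per year, copied from Section 6.3 (2010 to 2014 only)
mold_by_year <- tibble(
  Year = 2010:2014,
  mold_complaints = c(64, 225, 321, 410, 439)
)

# Chronic Lower Respiratory Diseases record counts per year, from Section 6.3
clrd_by_year <- tibble(
  Year = 2007:2014,
  clrd_records = c(11, 11, 11, 11, 12, 10, 11, 11)
)

# inner_join() keeps only the years present in both tables
overlap <- inner_join(mold_by_year, clrd_by_year, by = "Year")
overlap
```

With only five overlapping years, any trend comparison built this way would still rest on very little data.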

6.7 Merged Data

death_and_complaints <- complaints_summary %>%
  left_join(pairing_death_complaints, by = "complaint_type") %>%
  left_join(deaths_summary, by = "cause_of_death") 

death_and_complaints<- death_and_complaints %>% select(-Year)


death_and_complaints<- death_and_complaints %>% 
  filter(
    !is.na(complaint_type),
    !is.na(cause_of_death),
    !is.na(`Number of Deaths`)
  )
# Take the first rows before formatting, so head() acts on the data
# rather than on the text lines that kable() returns
death_and_complaints %>% head(13) %>% kable()
|complaint_type     | Number of Complaints|cause_of_death                                 | Number of Deaths|
|:------------------|--------------------:|:----------------------------------------------|----------------:|
|Asbestos           |                 8070|Malignant Neoplasms (Cancer: C00-C97)          |               96|
|Indoor Air Quality |                81277|Influenza (Flu) and Pneumonia (J09-J18)        |               96|
|Asbestos           |                 8070|Malignant Neoplasms (Cancer: C00-C97)          |               96|
|Asbestos           |                 8070|Malignant Neoplasms (Cancer: C00-C97)          |               96|
|Indoor Air Quality |                81277|Influenza (Flu) and Pneumonia (J09-J18)        |               96|
|Indoor Air Quality |                81277|Influenza (Flu) and Pneumonia (J09-J18)        |               96|
|Asbestos           |                 8070|Malignant Neoplasms (Cancer: C00-C97)          |               96|
|Asbestos           |                 8070|Malignant Neoplasms (Cancer: C00-C97)          |               96|
|Mold               |                 5184|Chronic Lower Respiratory Diseases (J40-J47)   |               88|
|Indoor Air Quality |                81277|Influenza (Flu) and Pneumonia (J09-J18)        |               96|
|Asbestos           |                 8070|Malignant Neoplasms (Cancer: C00-C97)          |               96|
|Asbestos           |                 8070|Malignant Neoplasms (Cancer: C00-C97)          |               96|
|Indoor Air Quality |                81277|Influenza (Flu) and Pneumonia (J09-J18)        |               96|

I merged the data sets using the mapping table that I created to pair complaints with the five causes of death that I chose. After merging, I filtered out rows with NAs in any of these columns to make it easier to run statistical tests.

6.8 Correlation between causes of death and indoor environmental complaints

death_causes_complaint_cor<- cor(death_and_complaints$`Number of Complaints`, death_and_complaints$`Number of Deaths`)
death_causes_complaint_cor
#> [1] 0.6122905

I computed a correlation coefficient to examine whether there is a relationship between the number of complaints for each complaint type and the `Number of Deaths` value for its paired cause, among the five causes of death that I chose. The result is r ≈ 0.61 (0.6122905), which indicates a moderately positive relationship between the two columns.

However, it is important to note that the merged data set repeats the same pair of values on every row for a given complaint type and cause of death. Because of this, the correlation is influenced by how often each pairing is repeated and may not be fully accurate.
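To see how repetition can affect the result, the sketch below rebuilds the 13 rows printed in Section 6.7, which contain only three distinct pairings, and compares the correlation on the repeated rows with the correlation on the distinct pairings. The object and column names are my own, and because the full merged data contains more pairings than these three, the numbers here will not match the r reported above.

```r
library(dplyr)
library(tibble)

# The three distinct pairings visible in the printed rows of Section 6.7
pairs <- tibble(
  complaint_type = c("Asbestos", "Indoor Air Quality", "Mold"),
  n_complaints   = c(8070, 81277, 5184),
  n_deaths       = c(96, 96, 88)
)

# Repeat them as they appear in the printed rows (7, 5, and 1 times)
merged_like <- pairs[rep(1:3, times = c(7, 5, 1)), ]

# One row per pairing
deduped <- distinct(merged_like)

cor_repeated <- cor(merged_like$n_complaints, merged_like$n_deaths)
cor_distinct <- cor(deduped$n_complaints, deduped$n_deaths)

c(repeated = cor_repeated, distinct = cor_distinct)
```

Comparing the two values shows how much of a correlation comes from row repetition rather than from the pairings themselves.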

6.9 Linear Regression

lm_death_and_complaints<- lm(`Number of Deaths` ~ `Number of Complaints` + cause_of_death, data=death_and_complaints)
lm_death_and_complaints
#> 
#> Call:
#> lm(formula = `Number of Deaths` ~ `Number of Complaints` + cause_of_death, 
#>     data = death_and_complaints)
#> 
#> Coefficients:
#>                                                  (Intercept)  
#>                                                    9.000e+01  
#>                                       `Number of Complaints`  
#>                                                    1.068e-16  
#>   cause_of_deathChronic Lower Respiratory Diseases (J40-J47)  
#>                                                   -2.000e+00  
#> cause_of_deathDiseases of Heart (I00-I09, I11, I13, I20-I51)  
#>                                                    6.000e+00  
#>        cause_of_deathInfluenza (Flu) and Pneumonia (J09-J18)  
#>                                                    6.000e+00  
#>          cause_of_deathMalignant Neoplasms (Cancer: C00-C97)  
#>                                                    6.000e+00

I fit a linear regression to examine whether the number of indoor environmental complaints could predict the number of deaths for different causes of death. The coefficient on `Number of Complaints` is essentially zero (1.068e-16), so the number of complaints is not a predictor here, and the differences in the number of deaths are explained by the cause of death itself (e.g., heart disease, chronic lower respiratory diseases). Although I did not find a predictive effect, this regression suggests that there may not be a relationship between indoor environmental complaints and leading causes of death in this data.

Once again, the merged data repeats values in both the complaint type column and the cause of death column, so the results of this linear regression should be interpreted with caution.
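A likely reason the complaints coefficient is zero is structural: `Number of Deaths` takes a single value for each cause, so once `cause_of_death` is in the model there is nothing left for the number of complaints to explain. The sketch below reproduces this on a small stand-in table. The complaint totals are column sums from Section 6.3 and the death values come from Section 6.7, but the table itself, the shortened cause names, and the object names are my own simplification.

```r
library(tibble)

# Stand-in pairings: the outcome is constant within each cause
pairs <- tribble(
  ~complaint_type,      ~n_complaints, ~cause_of_death,                      ~n_deaths,
  "Indoor Air Quality", 81277,         "Influenza (Flu) and Pneumonia",      96,
  "Cooling Tower",      14,            "Influenza (Flu) and Pneumonia",      96,
  "Asbestos",           8070,          "Malignant Neoplasms",                96,
  "Mold",               5184,          "Chronic Lower Respiratory Diseases", 88
)

# Repeat each pairing so the table has duplicated rows, like the merged data
merged_like <- pairs[rep(seq_len(nrow(pairs)), each = 2), ]

fit <- lm(n_deaths ~ n_complaints + cause_of_death, data = merged_like)

# Slope on complaints: zero up to floating-point error
slope <- coef(fit)[["n_complaints"]]
slope
```

The cause indicators fit the outcome exactly, which is the same pattern as the coefficients in the regression output above.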

6.10 Relevance and Conclusion

This topic is important to the general community because it sheds light on the indoor environmental hazards that individuals file complaints about. It also sheds a little light on the leading causes of death and could make people wonder whether there is a relationship between indoor environmental hazards and leading causes of death in NYC. From this analysis, we were able to see that Indoor Air Quality was the most complained-about hazard over the last 15 years. That is important to know because it is a problem that does not seem to have improved over the years, which means it needs to be brought to the public’s attention and to policy makers as an ongoing complaint that needs action.

I chose to look at 5 leading causes of death out of the 26 causes provided in this data set. I did this to focus on some of the more common and better-known causes and to see whether there could be a relationship between the different complaint types and those 5 causes of death. Once again, I paired the causes of death with the complaint types myself, so this analysis does not establish causation.
