7 Social Infrastructure & Well-Being

Author: Jonah Dratfield

NYC Open Data provides numerous datasets about the city “as part of an initiative to improve the accessibility, transparency, and accountability of City government.” My presentation focuses on how public data can help ordinary citizens better understand, and potentially improve, the quality of life in New York City. My analysis centers on two existing datasets and a relationship between them, but it focuses equally on how future data collection can be improved to better serve that goal.

Many NYC Open Data datasets, such as 311 service request logs, provide valuable information for policymakers, administrators, or individuals with substantial financial or political power. However, these datasets are often difficult for ordinary residents to act upon. The majority of New Yorkers, for example, do not have the capacity to meaningfully influence the housing market.

That said, certain types of information can be (i) directly acted upon by individuals and (ii) translated into concrete, low-barrier actions. The field of positive psychology, which consistently finds that strong social relationships are the most reliable predictors of well-being, provides one framework for identifying this information. Considering this area of research, one might ask the following:

Can publicly available data be used to explore the conditions that best facilitate social connectedness, and thereby, most enhance quality of life?

The answer, at the moment, is a tentative yes. NYC Open Data does not currently include the validated measures psychologists typically use to assess social connectedness and well-being. Instead, researchers and citizens must rely on rough proxies, such as economic metrics. Over time, however, the number of datasets amenable to the type of analysis I propose can be expanded.

In this exploratory analysis, I examine whether the number of permitted events in a community district (i.e., gatherings, such as street fairs, that require city permits) predicts the number of monthly SNAP recipients in that district (i.e., low-income individuals who receive benefits that can be used to purchase food). (Note: SNAP stands for Supplemental Nutrition Assistance Program.) I conceptualize permitted events as a rough measure of social connectedness and the number of SNAP recipients per month as a rough measure of economic health and, thereby, overall well-being. Rather than treat these variables as definitive measures, I use them to demonstrate how fruitful this mode of research can be. I also conclude with several suggestions for improving data collection in this field.

7.1 Libraries Used

library(tidyverse)
#> ── Attaching core tidyverse packages ──── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.1     ✔ stringr   1.6.0
#> ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
#> ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
#> ✔ purrr     1.2.0     
#> ── Conflicts ────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(nycOpenData)
library(dplyr)
library(stringr)

7.2 Data Loading

First, I loaded records of NYC permitted events and NYC borough community reports using the NYC Open Data package that my professor (Christian Martinez) created.

Events <- nyc_permit_events_historic(limit = 10000, filters = list())
BoroReport <- nyc_borough_community_report(limit = 10000, filters = list())

knitr::kable(
  head(Events, 25),
  caption = "First 25 rows of Events"
)

Table 7.1: First 25 rows of Events
event_agency	event_id	event_name	start_date_time	end_date_time	event_type	event_borough	event_location	street_closure_type	community_board	police_precinct
43,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
43,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
43,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
43,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
43,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
43,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
N/A	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
N/A	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
N/A	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
N/A	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
NA	21124	Ganando Almas Para Cristo’\|08/28/10 01:00 PM\|08/28/10 06:00 PM\|Street Activity Permit Office\|Religious Event\|Bronx\| MORRIS AVENUE between EAST 196 STREET and EAST KINGSBRIDGE ROAD\|Full\|Full Street Closure \|7, \|52, \|	NA	NA	NA	NA	NA	NA	NA	NA
67,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
66,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
13,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
05,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
60,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
44,	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
Parks Department	886939	Summer on the Hudson Holiday on the Hudson	2026-12-05T16:30:00.000	2026-12-05T18:00:00.000	Special Event	Manhattan	West Harlem Piers: Marginal Street Between 125th 123rd St.	N/A	9,	26,
Parks Department	886939	Summer on the Hudson Holiday on the Hudson	2026-12-05T16:30:00.000	2026-12-05T18:00:00.000	Special Event	Manhattan	West Harlem Piers: Marginal Street Between 125th 123rd St.	N/A	9,	26,
Parks Department	886939	Summer on the Hudson Holiday on the Hudson	2026-12-05T16:30:00.000	2026-12-05T18:00:00.000	Special Event	Manhattan	West Harlem Piers: Marginal Street Between 125th 123rd St.	N/A	9,	26,
Parks Department	886939	Summer on the Hudson Holiday on the Hudson	2026-12-05T16:30:00.000	2026-12-05T18:00:00.000	Special Event	Manhattan	West Harlem Piers: Marginal Street Between 125th 123rd St.	N/A	9,	26,
Parks Department	886939	Summer on the Hudson Holiday on the Hudson	2026-12-05T16:30:00.000	2026-12-05T18:00:00.000	Special Event	Manhattan	West Harlem Piers: Marginal Street Between 125th 123rd St.	N/A	9,	26,
Parks Department	886939	Summer on the Hudson Holiday on the Hudson	2026-12-05T16:30:00.000	2026-12-05T18:00:00.000	Special Event	Manhattan	West Harlem Piers: Marginal Street Between 125th 123rd St.	N/A	9,	26,
Parks Department	899434	Junior Volunteer Corps	2026-12-05T13:00:00.000	2026-12-05T15:00:00.000	Special Event	Brooklyn	Prospect Park: Bandshell South	N/A	55,	78,
Parks Department	899434	Junior Volunteer Corps	2026-12-05T13:00:00.000	2026-12-05T15:00:00.000	Special Event	Brooklyn	Prospect Park: Bandshell South	N/A	55,	78,


knitr::kable(
  head(BoroReport, 25),
  caption = "First 25 rows of BoroReport"
)

Table 7.1: First 25 rows of BoroReport
month	borough	community_district	bc_snap_recipients	bc_snap_households	bc_ca_recipients	bc_ca_cases	bc_ma_only_enrollees	bc_total_ma_enrollees
2025-09-01T00:00:00.000	Staten_Island	S03	14469	9014	3561	2058	7510	13710
2025-09-01T00:00:00.000	Staten_Island	S02	19401	11080	4635	2612	9603	18324
2025-09-01T00:00:00.000	Staten_Island	S01	41291	22547	16929	8311	12480	37025
2025-09-01T00:00:00.000	Queens	Q14	32329	18077	14228	7051	10278	31230
2025-09-01T00:00:00.000	Queens	Q13	24177	15506	8303	4954	13676	25712
2025-09-01T00:00:00.000	Queens	Q12	53866	32609	23653	12689	22253	54341
2025-09-01T00:00:00.000	Queens	Q11	10421	6997	2037	1314	8253	11976
2025-09-01T00:00:00.000	Queens	Q10	19401	11975	5338	3009	9588	17649
2025-09-01T00:00:00.000	Queens	Q09	26116	15665	6911	4053	11865	22301
2025-09-01T00:00:00.000	Queens	Q08	23079	14038	6065	3527	12821	22782
2025-09-01T00:00:00.000	Queens	Q07	38431	25919	7700	5278	26503	40712
2025-09-01T00:00:00.000	Queens	Q06	14111	9139	2813	1685	7676	13557
2025-09-01T00:00:00.000	Queens	Q05	20634	12888	4587	2866	9926	17464
2025-09-01T00:00:00.000	Queens	Q04	28606	17742	5285	3318	13979	23492
2025-09-01T00:00:00.000	Queens	Q03	25425	15738	5632	3288	13002	22126
2025-09-01T00:00:00.000	Queens	Q02	13205	8772	4072	2577	8064	14152
2025-09-01T00:00:00.000	Queens	Q01	26630	16700	9699	5355	12358	27336
2025-09-01T00:00:00.000	Manhattan	M12	47829	33714	12617	8120	18676	40620
2025-09-01T00:00:00.000	Manhattan	M11	41971	27242	16403	10048	12306	36987
2025-09-01T00:00:00.000	Manhattan	M10	30823	21013	12758	8484	8749	26825
2025-09-01T00:00:00.000	Manhattan	M09	22012	15017	7668	4856	7413	19109
2025-09-01T00:00:00.000	Manhattan	M08	6913	5373	2295	1586	4574	8492
2025-09-01T00:00:00.000	Manhattan	M07	17042	12767	6099	4150	7977	17923
2025-09-01T00:00:00.000	Manhattan	M06	7464	5934	3252	2531	3301	7887
2025-09-01T00:00:00.000	Manhattan	M05	5531	4317	5186	2747	3064	9537

7.3 Cleaning

7.3.1 Basic Events Cleaning

After this, I removed all non-numeric characters from the community board listings in Events and converted the listings to numeric values.

Community boards correspond to community districts within the five boroughs and therefore function as geographical subdivisions of New York City. There are 59 community boards, as well as a number of so-called “joint-interest areas.” I removed non-numeric characters (such as letters, commas, and quotation marks) to standardize the community board notation in the dataset.

eventscleaner <- Events %>%
  mutate(
    # Keep only the digits in community_board (e.g., "9," becomes 9)
    cd_id = community_board %>%
      str_replace_all("[^0-9]", "") %>%
      as.numeric()
  )
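One caveat about the digit-stripping step can be illustrated with a small sketch. It assumes, hypothetically, that some `community_board` values list more than one board (the toy strings below are invented for illustration and are not taken from the dataset). In that case, removing every non-digit fuses the numbers together, whereas extracting the first run of digits keeps a single board.

```r
library(stringr)

# Toy values in the style of the community_board column (illustrative only;
# the multi-board string "1, 2," is a hypothetical case).
boards <- c("9,", "55,", "1, 2,", "N/A")

# Stripping every non-digit, as in the step above, fuses multiple numbers.
stripped <- boards |>
  str_replace_all("[^0-9]", "") |>
  as.numeric()

# Alternative sketch: keep only the first run of digits in each string.
first_board <- boards |>
  str_extract("[0-9]+") |>
  as.numeric()

print(data.frame(boards, stripped, first_board))
```

Whether the first listed board is the right one to keep is a judgment call; dropping multi-board rows would be another reasonable option.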

7.3.2 BoroReport Cleaning

In the borough report, I separated the community district field into a borough identifier and a numeric community board. I then recoded the borough identifiers as numeric prefixes and combined these with the community board numbers to create a standardized community district ID. The goal of this transformation was to make the notation in the BoroReport dataset equivalent to that in the Events dataset.

BoroReport <- BoroReport %>%
  
  mutate(
    snap_borough = str_extract(community_district, "^[A-Za-z]") |> str_to_upper(),
    snap_cb      = str_extract(community_district, "[0-9]+") |> as.numeric()
  ) %>%
  
  mutate(
    snap_borough_num = case_when(
      snap_borough == "M" ~ 100,  # Manhattan
      snap_borough == "B" ~ 200,  # Bronx
      snap_borough == "K" ~ 300,  # Brooklyn
      snap_borough == "Q" ~ 400,  # Queens
      snap_borough == "S" ~ 500,  # Staten Island
      TRUE ~ NA_real_
    ),
    
    cd_id = snap_borough_num + snap_cb
  )
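As a worked example of this encoding, the sketch below applies the same borough-prefix scheme to a few toy district codes. The lookup vector `borough_prefix` and the toy tibble are illustrative names introduced here, not part of the original analysis.

```r
library(dplyr)
library(stringr)

# Toy district codes in the BoroReport style (illustrative values only).
toy <- tibble(community_district = c("M05", "Q14", "S01", "K18", "B12"))

# Lookup vector: borough letter -> numeric prefix, mirroring the case_when() above.
borough_prefix <- c(M = 100, B = 200, K = 300, Q = 400, S = 500)

toy <- toy %>%
  mutate(
    letter = str_to_upper(str_extract(community_district, "^[A-Za-z]")),
    cd_id  = unname(borough_prefix[letter]) +
      as.numeric(str_extract(community_district, "[0-9]+"))
  )

print(toy)
```

Under this scheme "Q14" maps to 414 and "M05" maps to 105, which is the form the Events data is converted to in the next step.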

7.3.3 Final Events Cleaning

Finally, I applied the same numbering pattern to the Events dataset. I replaced the borough names with numeric prefixes and added these prefixes to the community board numbers.

eventscleaner <- eventscleaner %>%
  mutate(
    # Borough prefixes match those used for BoroReport
    borough_num = case_when(
      event_borough == "Manhattan" ~ 100,
      event_borough == "Bronx"     ~ 200,
      event_borough == "Brooklyn"  ~ 300,
      event_borough == "Queens"    ~ 400,
      event_borough %in% c("Staten_Island", "Staten Island") ~ 500,
      TRUE ~ NA_real_
    ),
    # Combine prefix and board number (e.g., Queens board 7 becomes 407)
    cd_id = borough_num + cd_id
  )

7.4 Events Count

Next, I counted the number of events per community district to get a better sense of the data.

events_cd <- eventscleaner %>%
  count(cd_id, name = "n_events")

knitr::kable(
  head(events_cd, 30),
  caption = "Number of Events Per CD"
)

Table 7.2: Number of Events Per CD
cd_id	n_events
101	2
107	6
108	277
109	12
111	96
164	235
211	11
228	213
301	16
302	13
305	19
306	21
307	33
310	89
311	6
312	78
315	21
316	13
318	135
355	6290
377	24
401	48
402	140
405	628
407	151
408	747
411	392
412	84
413	22
481	80

Across community districts, the mean number of permitted events was 312.5, with a median of 63. (Note: The right skew in the data was driven largely by the high event counts in joint-interest areas, such as the 6,290 events recorded for ID 355. These areas were dropped from the later analysis because the SNAP data contain no recipient counts for them.)

I then created a graph to display the number of events per district, in descending order. In the district identifiers shown on the y-axis, the final two digits give the community district number and the first digit indicates the borough:
- 1: Manhattan
- 2: The Bronx
- 3: Brooklyn
- 4: Queens
- 5: Staten Island

New York City has 59 community districts in total:
- Manhattan: 12 districts
- The Bronx: 12 districts
- Brooklyn: 18 districts
- Queens: 14 districts
- Staten Island: 3 districts
District numbers that do not follow this schema (for example, 55 and 64, which appear in the data as 355 and 164) refer to joint-interest areas rather than standard community districts.
- District 55 corresponds to Prospect Park.
- District 64 corresponds to Central Park.

A full list of community districts and joint-interest areas is available here.
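The schema above can be turned into a simple check. The sketch below builds the 59 standard IDs from the per-borough district counts and flags any other ID as a joint-interest area. The names `valid_cd_ids` and `joint_interest` are illustrative, and the four event counts are copied from Table 7.2.

```r
library(dplyr)

# Districts per borough, in prefix order:
# Manhattan, Bronx, Brooklyn, Queens, Staten Island.
districts_per_borough <- c(12, 12, 18, 14, 3)

# Standard IDs under the borough-prefix scheme (101-112, 201-212, ...).
valid_cd_ids <- unlist(lapply(
  seq_along(districts_per_borough),
  function(b) b * 100 + seq_len(districts_per_borough[b])
))

# A few rows from Table 7.2.
toy_counts <- tibble(
  cd_id    = c(108, 164, 355, 405),
  n_events = c(277, 235, 6290, 628)
)

# Anything outside the 59 standard IDs is treated as a joint-interest area.
flagged <- toy_counts %>%
  mutate(joint_interest = !(cd_id %in% valid_cd_ids))

print(flagged)
```

Here 164 (Central Park) and 355 (Prospect Park) are flagged, while 108 and 405 are treated as standard districts.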


events_cd %>%
  slice_max(n_events, n = 25) %>%
  ggplot(aes(x = reorder(cd_id, n_events), y = n_events)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Number of Events by Community District (Top 25)",
    x = "Community District",
    y = "Number of Events"
  ) +
  theme_minimal()

Figure 7.1: Number of permitted events by community district, ordered from highest to lowest for the top 25. Each horizontal bar represents a distinct community district. Community district identifiers are displayed on the y-axis and event counts are displayed on the x-axis.

7.5 SNAP Benefits Count

I also examined the number of SNAP recipients per district.

The chart below shows the mean number of SNAP recipients per month for each community district. (Note: Community districts do not have equal populations, so the number of SNAP recipients in a district does not indicate the share of its residents living in poverty. It still offers a useful snapshot of where need is concentrated.)

BoroReport <- BoroReport %>%
  mutate(
    bc_snap_recipients = as.numeric(bc_snap_recipients)
  )

snap_plot_data <- BoroReport %>%
  group_by(cd_id) %>%
  summarise(
    bc_snap_recipients = mean(bc_snap_recipients,na.rm = TRUE),
    .groups = "drop"
  )

snap_plot_data %>%
  slice_max(bc_snap_recipients, n = 25) %>%
  ggplot(
    aes(
      x = reorder(cd_id, bc_snap_recipients),
      y = bc_snap_recipients
    )
  ) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "SNAP Recipients by Community District (Top 25)",
    x = "Community District",
    y = "SNAP Recipients"
  ) +
  theme_minimal()

Figure 7.2: Mean number of SNAP recipients by community district, ordered from highest to lowest for the top 25. Each horizontal bar represents a distinct community district. Community district identifiers are displayed on the y-axis and SNAP recipients are displayed on the x-axis.

Across community districts, the mean number of SNAP recipients per month was 28,580, with a median of 25,456.
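Because raw counts are not adjusted for district size, one possible refinement is a rate per 1,000 residents. The sketch below shows only the arithmetic: the SNAP counts come from the BoroReport table above, but the `population` values are invented placeholders, since the datasets used here do not include district populations.

```r
library(dplyr)

# SNAP counts from the BoroReport table above; the population figures are
# hypothetical placeholders, NOT real census values.
toy <- tibble(
  cd_id              = c(412, 108),
  bc_snap_recipients = c(53866, 6913),
  population         = c(250000, 220000)
)

# Recipients per 1,000 residents.
toy <- toy %>%
  mutate(snap_per_1000 = 1000 * bc_snap_recipients / population)

print(toy)
```

With real population figures joined on `cd_id`, the same `mutate()` call would make districts of different sizes comparable.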

7.6 Merging

Finally, I merged the two datasets using the community district IDs (cd_id) I created earlier.

merged <- BoroReport %>%
  left_join(events_cd, by = "cd_id")

knitr::kable(
  head(merged, 25),
  caption = "First 25 rows of merged"
)

Table 7.3: First 25 rows of merged
month	borough	community_district	bc_snap_recipients	bc_snap_households	bc_ca_recipients	bc_ca_cases	bc_ma_only_enrollees	bc_total_ma_enrollees	snap_borough	snap_cb	snap_borough_num	cd_id	n_events
2025-09-01T00:00:00.000	Staten_Island	S03	14469	9014	3561	2058	7510	13710	S	3	500	503	NA
2025-09-01T00:00:00.000	Staten_Island	S02	19401	11080	4635	2612	9603	18324	S	2	500	502	NA
2025-09-01T00:00:00.000	Staten_Island	S01	41291	22547	16929	8311	12480	37025	S	1	500	501	NA
2025-09-01T00:00:00.000	Queens	Q14	32329	18077	14228	7051	10278	31230	Q	14	400	414	NA
2025-09-01T00:00:00.000	Queens	Q13	24177	15506	8303	4954	13676	25712	Q	13	400	413	22
2025-09-01T00:00:00.000	Queens	Q12	53866	32609	23653	12689	22253	54341	Q	12	400	412	84
2025-09-01T00:00:00.000	Queens	Q11	10421	6997	2037	1314	8253	11976	Q	11	400	411	392
2025-09-01T00:00:00.000	Queens	Q10	19401	11975	5338	3009	9588	17649	Q	10	400	410	NA
2025-09-01T00:00:00.000	Queens	Q09	26116	15665	6911	4053	11865	22301	Q	9	400	409	NA
2025-09-01T00:00:00.000	Queens	Q08	23079	14038	6065	3527	12821	22782	Q	8	400	408	747
2025-09-01T00:00:00.000	Queens	Q07	38431	25919	7700	5278	26503	40712	Q	7	400	407	151
2025-09-01T00:00:00.000	Queens	Q06	14111	9139	2813	1685	7676	13557	Q	6	400	406	NA
2025-09-01T00:00:00.000	Queens	Q05	20634	12888	4587	2866	9926	17464	Q	5	400	405	628
2025-09-01T00:00:00.000	Queens	Q04	28606	17742	5285	3318	13979	23492	Q	4	400	404	NA
2025-09-01T00:00:00.000	Queens	Q03	25425	15738	5632	3288	13002	22126	Q	3	400	403	NA
2025-09-01T00:00:00.000	Queens	Q02	13205	8772	4072	2577	8064	14152	Q	2	400	402	140
2025-09-01T00:00:00.000	Queens	Q01	26630	16700	9699	5355	12358	27336	Q	1	400	401	48
2025-09-01T00:00:00.000	Manhattan	M12	47829	33714	12617	8120	18676	40620	M	12	100	112	NA
2025-09-01T00:00:00.000	Manhattan	M11	41971	27242	16403	10048	12306	36987	M	11	100	111	96
2025-09-01T00:00:00.000	Manhattan	M10	30823	21013	12758	8484	8749	26825	M	10	100	110	NA
2025-09-01T00:00:00.000	Manhattan	M09	22012	15017	7668	4856	7413	19109	M	9	100	109	12
2025-09-01T00:00:00.000	Manhattan	M08	6913	5373	2295	1586	4574	8492	M	8	100	108	277
2025-09-01T00:00:00.000	Manhattan	M07	17042	12767	6099	4150	7977	17923	M	7	100	107	6
2025-09-01T00:00:00.000	Manhattan	M06	7464	5934	3252	2531	3301	7887	M	6	100	106	NA
2025-09-01T00:00:00.000	Manhattan	M05	5531	4317	5186	2747	3064	9537	M	5	100	105	NA
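The NA values in `n_events` come from IDs that have no match in the event counts. A quick way to see which keys fail to match in each direction is `anti_join()`. The sketch below uses toy tables (`snap_toy` and `events_toy` are illustrative names) with IDs and counts taken from the tables above.

```r
library(dplyr)

# Toy stand-ins for BoroReport and events_cd.
snap_toy   <- tibble(cd_id = c(401, 402, 403, 404))
events_toy <- tibble(cd_id = c(401, 402, 355), n_events = c(48, 140, 6290))

# Districts with SNAP rows but no event count: these get NA after left_join().
no_events <- anti_join(snap_toy, events_toy, by = "cd_id")

# Event IDs with no SNAP row (for example, joint-interest areas):
# left_join() from the SNAP side drops these entirely.
no_snap <- anti_join(events_toy, snap_toy, by = "cd_id")

print(no_events)
print(no_snap)
```

Running the same two calls on `BoroReport` and `events_cd` would show how many districts enter the regression without an event count.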

7.7 Linear Regression

I then conducted a linear regression to determine whether the number of permitted events predicts the number of SNAP recipients. The model was statistically significant, F(1, 723) = 45.34, p < .001, and explained approximately 6% of the variance in SNAP recipients (R² = .059). The number of events was a significant negative predictor of SNAP recipients, b = −21.30, SE = 3.16, t(723) = −6.73, p < .001. That is, each additional permitted event was associated with roughly 21 fewer SNAP recipients per month.

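(Note: The model dropped all rows with missing event counts, that is, rows for community districts with no events among the 10,000 event records loaded. Because event counts are totals per district rather than per month, the same count is repeated for every month of a district's SNAP data. Joint-interest areas have no SNAP rows, so the left join from BoroReport had already excluded them.)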

model1 <- lm(bc_snap_recipients ~ n_events, data = merged)
summary(model1)
#> 
#> Call:
#> lm(formula = bc_snap_recipients ~ n_events, data = merged)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -28081 -12534  -2646   8391  44572 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 30043.800    715.843  41.970  < 2e-16 ***
#> n_events      -21.303      3.164  -6.733 3.38e-11 ***
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 16210 on 723 degrees of freedom
#>   (986 observations deleted due to missingness)
#> Multiple R-squared:  0.059,  Adjusted R-squared:  0.0577 
#> F-statistic: 45.34 on 1 and 723 DF,  p-value: 3.385e-11
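As a consistency check on the reported statistics, a regression with a single predictor has F equal to the squared t value, and R² can be recovered from F and the residual degrees of freedom. The sketch below simply redoes that arithmetic with the rounded numbers printed in the summary above.

```r
# Rounded values copied from the model summary above.
t_value  <- -6.733
f_value  <- 45.34
df_resid <- 723

# With one predictor, F = t^2 (up to rounding).
t_squared <- t_value^2

# R^2 = F / (F + residual df) when the model has one predictor.
r_squared <- f_value / (f_value + df_resid)

print(c(t_squared = t_squared, r_squared = r_squared))
```

Both values agree with the summary to the precision shown (F of about 45.3 and R² of about .059).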


ggplot(
  merged,
  aes(
    x = n_events,
    y = bc_snap_recipients
  )
) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(
    title = "Permitted Events and SNAP Recipients by Community District",
    x = "Permitted Events",
    y = "SNAP Recipients"
  ) +
  theme_minimal()
#> `geom_smooth()` using formula = 'y ~ x'
#> Warning: Removed 986 rows containing non-finite outside the scale
#> range (`stat_smooth()`).
#> Warning: Removed 986 rows containing missing values or values
#> outside the scale range (`geom_point()`).

Figure 7.3: Relationship between permitted events and SNAP recipients. Each point represents one community district in one month of the merged data, and the line shows the fitted linear association between event counts and SNAP recipients.
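Since `merged` keeps one row per district per month while `n_events` is a single total per district, the rows within a district are not independent observations. One alternative, offered only as a sketch and not as the analysis reported above, is to collapse the data to one row per district before fitting. The toy data below reuse four district IDs and event counts from the tables above; the second month of SNAP values is invented for illustration.

```r
library(dplyr)

# Toy stand-in for `merged`: two months per district, with the district's
# event count repeated across months. Second-month SNAP values are invented.
toy_merged <- tibble(
  cd_id              = rep(c(401, 402, 405, 408), each = 2),
  bc_snap_recipients = c(26630, 26500, 13205, 13300,
                         20634, 20700, 23079, 23000),
  n_events           = rep(c(48, 140, 628, 747), each = 2)
)

# Collapse to one row per district: mean monthly SNAP count and event total.
district_level <- toy_merged %>%
  filter(!is.na(n_events)) %>%
  group_by(cd_id) %>%
  summarise(
    mean_snap = mean(bc_snap_recipients),
    n_events  = first(n_events),
    .groups   = "drop"
  )

# Same model form as above, fit on district-level rows.
model_district <- lm(mean_snap ~ n_events, data = district_level)
print(summary(model_district))
```

On the real data this would leave far fewer observations than 725, so the degrees of freedom and p-value would better reflect the number of distinct districts.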

7.8 Conclusion

Despite the significant p-value of this analysis, there are a number of limitations. As mentioned in the introduction, the number of SNAP recipients is an imperfect measure of economic well-being (not to mention holistic well-being). Likewise, permitted events are an imperfect indicator of social gatherings in an area. At a more granular level, SNAP counts are not normalized by district population, and major hubs of social activity, such as parks, are excluded from the regression.

However, these limitations point to ways in which data collection could be improved. Below, I outline several possibilities for making such improvements:

Better Dependent Variables

To meaningfully assess quality of life in NYC, future datasets should include more varied indicators of well-being and capture outcomes across the income distribution. Ideally, validated population-level measures of well-being and social connectedness would be available for use as dependent variables. In addition, economic proxies for well-being (such as median income) should be collected. Diverse datasets of this sort would provide a more complete picture of the psychological and economic well-being of NYC residents.

More Information about Social Gatherings

Currently, NYC Open Data has information about permitted events. Yet, there are countless other social gatherings that could be quantified as well. These include volunteer opportunities, Meetup groups, Eventbrite activities, Reddit meetups, and more. While an exhaustive catalog of social gatherings is not feasible, expanded coverage of accessible, low-barrier events would strengthen any analyses of social life in the city. It would also allow analysts to subdivide events in meaningful ways.

Geographic Information

Community districts provide a useful organizational unit, but many NYC datasets lack this data. In addition, even more detailed neighborhood-level data on events might provide information about areas with a shortage (or surplus) of social activity. Identifying such areas might support more strategic intervention. Finally, knowledge of individuals’ willingness (or lack of willingness) to travel might provide yet more valuable information. The prominence of parks in the event data suggests that social life is often organized around specific hubs. The practical accessibility of these hubs is yet another concept worth exploring.

Concrete Suggestions

There is no “control New York City.” As such, causality cannot be established through the analyses I describe. Nevertheless, if evidence were to suggest that certain types of social activities were associated with positive psychological outcomes, it would then be possible to recommend concrete actions to citizens who wished to improve civic and social life in New York. In this way, improved data infrastructure could help foster a stronger sense of civic autonomy among New Yorkers – as well as a happier, healthier New York City.
