Notes
Development
Understanding the complexities of New York City’s real estate market requires more than just crunching numbers—it calls for a blend of meticulous analysis and compelling storytelling. To make my findings both accessible and engaging, I adopted a structured, data-driven approach combined with intuitive visual narratives.
The journey began by clearly defining the research objectives and scope. My focus was on uncovering how historical trends, demographic shifts, and external events such as the COVID-19 pandemic have shaped the market. Specifically, I analyzed property prices and demand across a spectrum of real estate categories, including residential, commercial, industrial, and mixed-use properties. By examining data spanning the past two decades, I aimed to capture a comprehensive view of the market's evolution. Each trend, pattern, and anomaly became a vital clue in understanding the forces shaping NYC’s real estate landscape.
Research Objectives
But this research wasn’t just about collecting data; it was about finding the stories behind the numbers. Every shift in the data told a tale about the city’s growth, challenges, and transformation. My goal was to translate these findings into meaningful insights for anyone navigating NYC’s dynamic real estate market.
Data Sources
A comprehensive analysis of NYC’s real estate market requires robust property sales data to capture the dynamics of pricing, volume, and geography. To this end, I sourced property sales data from:
- Zillow: Provided historical pricing trends for various property types.
- NYC Department of Finance: Offered detailed sales transaction records, including sale prices, sale dates, property types, and ZIP codes.
- NYC Open Data: Contributed supplementary data on property use classifications, zoning, and building sizes.
Data Preparation
Beyond the property data, it is also crucial to understand the people behind the property, so I integrated demographic data from the U.S. Census Bureau (see Key Demographic Metrics). From the property sales sources, I derived five critical metrics:
- Sale price
- Gross square feet
- ZIP code
- Sale date
- Price per square foot
These metrics were critical for analyzing property values over time, understanding regional variations, and identifying market trends across NYC’s five boroughs.
Key Demographic Metrics
The analysis prioritized four key demographic metrics:
- Income Levels: Median household income to understand economic disparities and purchasing power across neighborhoods.
- Age Groups: Distribution of residents across different age categories, highlighting generational shifts in housing preferences.
- Education Levels: Percentage of residents with college degrees or higher, which often correlates with income and homeownership trends.
- Race and Ethnicity: Composition of neighborhoods to explore how cultural diversity intersects with real estate patterns.
The raw data gathered from these sources was not in good shape for downstream analysis, so my first task was to clean and standardize the datasets. This process was essential for ensuring accuracy, consistency, and meaningful integration.
To make the property sales dataset analysis-ready, I started by converting sale prices into USD and calculating the price per square foot—a key metric—by dividing the sale price by gross square footage. Sale dates were standardized into a uniform format (mm-dd-yyyy) to enable consistent time-based analysis. Missing or incomplete data presented challenges, but these were tackled systematically:
- Missing Records: I removed entries where both sale price and square footage were absent.
- ZIP Codes: Gaps were filled using public maps or postal databases.
- Incomplete Dates: Approximations were made based on available transaction timelines.
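The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact pipeline used in the study: the field names (`sale_price`, `gross_sqft`, `sale_date`), the list of accepted input date formats, and the helper functions are all hypothetical, and the sample records are made up.

```python
from datetime import datetime

# Hypothetical input date formats; the real sources may use others.
INPUT_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m-%d-%Y")


def normalize_date(raw):
    """Return the date as mm-dd-yyyy, or None if it cannot be parsed."""
    if not raw:
        return None
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%m-%d-%Y")
        except ValueError:
            continue
    return None


def price_per_sqft(sale_price, gross_sqft):
    """Sale price divided by gross square footage; None when not computable."""
    if sale_price is None or not gross_sqft:
        return None
    return sale_price / gross_sqft


def clean_sales(records):
    """Drop rows missing both price and square footage, then derive fields."""
    cleaned = []
    for rec in records:
        price = rec.get("sale_price")
        sqft = rec.get("gross_sqft")
        if price is None and sqft is None:
            continue  # nothing to analyze in this record
        cleaned.append({
            **rec,
            "sale_date": normalize_date(rec.get("sale_date")),
            "price_per_sqft": price_per_sqft(price, sqft),
        })
    return cleaned


# Made-up sample records, for illustration only.
sample = [
    {"sale_price": 900000, "gross_sqft": 1200, "sale_date": "2019-03-15"},
    {"sale_price": None, "gross_sqft": None, "sale_date": "03/20/2019"},
    {"sale_price": 500000, "gross_sqft": None, "sale_date": "04/01/2020"},
]
cleaned = clean_sales(sample)
```

In this sketch a record with a price but no square footage is kept, with `price_per_sqft` left as `None`, since only records missing both values are removed.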
Errors like outliers and duplicate entries were flagged and resolved, while inconsistent property classifications were standardized—for instance, merging "Residential Condo" with "Condominium." Smaller property categories were grouped into broader types to streamline analysis. To ensure geospatial accuracy, I validated the alignment of ZIP codes with their respective boroughs.
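One way these checks might look in code is sketched below. The text does not say which outlier rule was used, so the 1.5 × IQR fence here is an assumption chosen for illustration. The category mapping, the five-entry ZIP lookup, and all function names are likewise hypothetical; a real lookup would cover every NYC ZIP code.

```python
from statistics import quantiles

# Hypothetical mapping from raw labels to standardized property types.
CATEGORY_MAP = {
    "residential condo": "Condominium",
    "condominium": "Condominium",
}

# Tiny illustrative ZIP-to-borough lookup, not a complete table.
ZIP_TO_BOROUGH = {
    "10001": "Manhattan",
    "10451": "Bronx",
    "11201": "Brooklyn",
    "11101": "Queens",
    "10301": "Staten Island",
}


def standardize_category(raw):
    """Map a raw label to its standardized type, leaving unknown labels as-is."""
    return CATEGORY_MAP.get(raw.strip().lower(), raw.strip())


def drop_duplicates(records, key_fields):
    """Keep the first record for each combination of key fields."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences; values outside them are flagged as outliers."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def borough_matches(zip_code, borough):
    """True/False when the ZIP is in the lookup, None when it is unknown."""
    expected = ZIP_TO_BOROUGH.get(zip_code)
    return None if expected is None else expected == borough


# Made-up price-per-square-foot values, for illustration only.
prices = [400, 420, 450, 480, 500, 520, 5000]
low, high = iqr_bounds(prices)
outliers = [p for p in prices if p < low or p > high]
```

Flagged values are only candidates for review; whether to correct, cap, or drop them is a separate judgment call.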
When it comes to the demographic data, I adopted a distinct approach tailored to its population-focused nature. I grouped income levels into simple brackets (e.g., $0–$10k, $10k–$15k, $15k–$25k), standardized education categories (e.g., "High School," "Bachelor’s Degree or Higher"), and created clear racial and ethnic classifications, such as "White," "Black," "Hispanic/Latino," and "Asian."
Missing data appeared in every demographic dataset. For income gaps, I filled in median values from nearby ZIP codes or borough averages. For incomplete age distributions, I estimated values based on trends in similar neighborhoods. Discrepancies were resolved by cross-checking datasets for consistency and prioritizing the most recent and detailed sources, for example census data over NYC Open Data. Geographic maps helped verify accuracy, ensuring demographic details matched their ZIP codes and boroughs.
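The income bracketing and gap-filling described above could be sketched as follows. Only the three brackets named in the text are encoded. The catch-all label, the treatment of boundary values, the neighbor table, and the function names are assumptions for illustration, and the sample figures are made up.

```python
from statistics import mean, median

# Upper bounds (exclusive) for the brackets named in the text; everything
# above falls into a catch-all because the remaining brackets are not listed.
BRACKETS = [(10_000, "$0-$10k"), (15_000, "$10k-$15k"), (25_000, "$15k-$25k")]
CATCH_ALL = "$25k+"  # placeholder label, an assumption for this sketch


def income_bracket(income):
    """Return the bracket label for a household income in dollars."""
    for upper, label in BRACKETS:
        if income < upper:
            return label
    return CATCH_ALL


def fill_income(zip_code, incomes, neighbors, boroughs):
    """Fill a missing median income from nearby ZIPs, else the borough average.

    incomes:   ZIP -> median household income, or None when missing
    neighbors: ZIP -> list of nearby ZIPs (hypothetical adjacency table)
    boroughs:  ZIP -> borough name
    """
    if incomes.get(zip_code) is not None:
        return incomes[zip_code]
    nearby = [incomes[z] for z in neighbors.get(zip_code, [])
              if incomes.get(z) is not None]
    if nearby:
        return median(nearby)
    same_borough = [v for z, v in incomes.items()
                    if v is not None and boroughs.get(z) == boroughs.get(zip_code)]
    return mean(same_borough) if same_borough else None


# Placeholder ZIP labels and made-up incomes, for illustration only.
incomes = {"Z1": None, "Z2": 60000, "Z3": 80000, "Z4": None, "Z5": 50000}
neighbors = {"Z1": ["Z2", "Z3"], "Z4": []}
boroughs = {"Z1": "Brooklyn", "Z2": "Brooklyn", "Z3": "Brooklyn",
            "Z4": "Queens", "Z5": "Queens"}

filled_z1 = fill_income("Z1", incomes, neighbors, boroughs)  # from nearby ZIPs
filled_z4 = fill_income("Z4", incomes, neighbors, boroughs)  # borough fallback
```

Trying nearby ZIP codes first and falling back to the borough average keeps the imputed value as local as the available data allows.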
This data preparation process established a reliable foundation for analyzing New York City’s real estate market. The cleaned datasets could be integrated with one another, and aligning property metrics such as sale price and price per square foot with demographic factors like income, education, and age laid the groundwork for exploring the interplay between real estate performance and population characteristics. The next section will detail how these datasets were analyzed and visualized.