Lab 1: Census Data Quality for Policy Decisions
Evaluating Data Reliability for Algorithmic Decision-Making
Assignment Overview
Scenario
You are a data analyst for the Philadelphia Department of Human Services. The department is considering implementing an algorithmic system to identify communities that should receive priority for social service funding and outreach programs. Your supervisor has asked you to evaluate the quality and reliability of available census data to inform this decision.
Drawing on our Week 2 discussion of algorithmic bias, you need to assess not just what the data shows, but how reliable it is and what communities might be affected by data quality issues.
Learning Objectives
- Apply dplyr functions to real census data for policy analysis
- Evaluate data quality using margins of error
- Connect technical analysis to algorithmic decision-making
- Identify potential equity implications of data reliability issues
- Create professional documentation for policy stakeholders
Submission Instructions
Submit by posting your updated portfolio link on Canvas. Your assignment should be accessible at your-portfolio-url/labs/lab_1/
Make sure to update your _quarto.yml navigation to include this assignment under a “Labs” menu.
Part 1: Portfolio Integration
Create this assignment in your portfolio repository under a labs/lab_1/ folder structure. Update your navigation menu to include:
- text: Assignments
  menu:
    - href: labs/lab_1/your_file_name.qmd
      text: "Lab 1: Census Data Exploration"
If the text contains a special character such as a colon, wrap it in double quotes so that Quarto reads it as text.
Setup
State Selection: I have chosen Pennsylvania for this analysis for consistency.
Part 2: County-Level Resource Assessment
2.1 Data Retrieval
Your Task: Use get_acs() to retrieve county-level data for your chosen state.
Requirements:
- Geography: county level
- Variables: median household income (B19013_001) and total population (B01003_001)
- Year: 2022
- Survey: acs5
- Output format: wide
Hint: Remember to give your variables descriptive names using the variables = c(name = "code") syntax.
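library(tidycensus) # get_acs()
library(tidyverse)  # dplyr and stringr verbs, %>%
library(knitr)      # kable()
# Write your get_acs() code here
pa <- get_acs(geography = "county", variables = c(totpop = "B01003_001", medinc = "B19013_001"),
  state = "PA", survey = "acs5", year = 2022, output = "wide")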
# Clean the county names to remove state name and "County"
# Hint: use mutate() with str_remove()
pa_clean <- pa %>%
  mutate(
    county_name = str_remove(NAME, " County, Pennsylvania")
  )
# Display the first few rows
head(pa_clean)
# A tibble: 6 × 7
GEOID NAME totpopE totpopM medincE medincM county_name
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
1 42001 Adams County, Pennsylvania 104604 NA 78975 3334 Adams
2 42003 Allegheny County, Pennsylva… 1245310 NA 72537 869 Allegheny
3 42005 Armstrong County, Pennsylva… 65538 NA 61011 2202 Armstrong
4 42007 Beaver County, Pennsylvania 167629 NA 67194 1531 Beaver
5 42009 Bedford County, Pennsylvania 47613 NA 58337 2606 Bedford
6 42011 Berks County, Pennsylvania 428483 NA 74617 1191 Berks
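The column names in the output above follow from output = "wide": each named variable becomes an estimate column ending in E and a margin-of-error column ending in M. The sketch below only illustrates that naming pattern with string operations. It does not call the Census API, and wide_cols is a name introduced here for illustration.

```r
# Named vector as passed to get_acs(); the names become column prefixes
vars <- c(totpop = "B01003_001", medinc = "B19013_001")

# Interleave the estimate (E) and margin-of-error (M) suffixes per variable
wide_cols <- as.vector(rbind(paste0(names(vars), "E"), paste0(names(vars), "M")))
print(wide_cols)
```

The totpopM column is NA for every county shown. A likely explanation is that county total population is controlled to official population estimates, so no sampling margin of error is published for it.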
2.2 Data Quality Assessment
Your Task: Calculate margin of error percentages and create reliability categories.
Requirements:
- Calculate MOE percentage: (margin of error / estimate) * 100
- Create reliability categories:
  - High Confidence: MOE < 5%
  - Moderate Confidence: MOE 5-10%
  - Low Confidence: MOE > 10%
- Create a flag for unreliable estimates (MOE > 10%)
Hint: Use mutate() with case_when() for the categories.
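ACS margins of error are published at the 90 percent confidence level, so estimate ± MOE gives a 90 percent confidence interval, and dividing the MOE by 1.645 gives an approximate standard error. The following is a minimal worked example in base R using the Adams County figures from the output in section 2.1. The variable names are illustrative and not part of the assignment.

```r
# Adams County median household income and its margin of error (from section 2.1)
estimate <- 78975
moe <- 3334

# MOE as a percentage of the estimate, the metric used for the reliability categories
moe_pct <- round(moe / estimate * 100, 2)

# 90 percent confidence interval implied by the published MOE
ci_90 <- c(lower = estimate - moe, upper = estimate + moe)

# Approximate standard error and coefficient of variation (percent)
std_error <- moe / 1.645
cv_pct <- std_error / estimate * 100

print(moe_pct)
print(ci_90)
```

An MOE percentage of about 4.2 places Adams County just inside the High Confidence category.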
# Calculate MOE percentage and reliability categories using mutate()
pa_reliability <- pa_clean %>%
  mutate(
    moe_percentage = round((medincM / medincE) * 100, 2),
    reliability = case_when(
      moe_percentage < 5 ~ "High Confidence",
      moe_percentage >= 5 & moe_percentage <= 10 ~ "Moderate",
      moe_percentage > 10 ~ "Low Confidence"
    )
  )
# Create a summary showing count of counties in each reliability category
# Hint: use count() and mutate() to add percentages
pa_reliability %>%
  count(reliability) %>%
  mutate(
    percent = round(n / sum(n) * 100, 2)
  )
# A tibble: 2 × 3
reliability n percent
<chr> <int> <dbl>
1 High Confidence 57 85.1
2 Moderate 10 14.9
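The requirements for this section also ask for a flag marking unreliable estimates (MOE > 10%). The sketch below shows one way such a flag could be added. It uses four county rows copied from the outputs in this lab rather than a live API call, and the column name unreliable is an assumption made for illustration.

```r
library(dplyr)

# Four county rows taken from the outputs shown in this lab
counties <- tibble(
  county_name = c("Adams", "Allegheny", "Armstrong", "Cameron"),
  medincE = c(78975, 72537, 61011, 46186),
  medincM = c(3334, 869, 2202, 2605)
)

flagged <- counties %>%
  mutate(
    moe_percentage = round(medincM / medincE * 100, 2),
    unreliable = moe_percentage > 10 # TRUE only when MOE exceeds 10% of the estimate
  )

print(flagged)
```

None of these four rows is flagged, which is consistent with the summary above showing no county in the Low Confidence category.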
2.3 High Uncertainty Counties
Your Task: Identify the 5 counties with the highest MOE percentages.
Requirements:
- Sort by MOE percentage (highest first)
- Select the top 5 counties
- Display: county name, median income, margin of error, MOE percentage, reliability category
- Format as a professional table using kable()
Hint: Use arrange(), slice(), and select() functions.
# Create table of top 5 counties by MOE percentage
pa_reliability %>%
  arrange(desc(moe_percentage)) %>%
  slice(1:5) %>%
  select(county_name, medincE, moe_percentage, reliability) %>%
  # Format as table with kable() - include appropriate column names and caption
  kable(
    col.names = c(
      "County",
      "Median Household Income",
      "MOE Percentage (%)",
      "Reliability Category"
    ),
    caption = "Top 5 Pennsylvania Counties with Highest Income Estimate Uncertainty"
  )
| County | Median Household Income | MOE Percentage (%) | Reliability Category |
|---|---|---|---|
| Forest | 46188 | 9.99 | Moderate |
| Sullivan | 62910 | 9.25 | Moderate |
| Union | 64914 | 7.32 | Moderate |
| Montour | 72626 | 7.09 | Moderate |
| Elk | 61672 | 6.63 | Moderate |
Data Quality Commentary:
[Counties such as Forest, Sullivan, and Union show relatively high uncertainty in median household income estimates, which means algorithms relying on this data may make less reliable decisions for these areas. This could lead to misallocation of resources or misclassification in models used for funding, eligibility, or planning. Higher uncertainty in these counties is likely driven by small populations, rural characteristics, and limited survey samples, which increase margins of error in ACS estimates.]
Part 3: Neighborhood-Level Analysis
3.1 Focus Area Selection
Your Task: Select 2-3 counties from your reliability analysis for detailed tract-level study.
Strategy: Choose counties that represent different reliability levels (e.g., 1 high confidence, 1 moderate, 1 low confidence) to compare how data quality varies.
# Use filter() to select 2-3 counties from your county_reliability data
# Store the selected counties in a variable called selected_counties
selected_counties <- pa_reliability %>%
  group_by(reliability) %>%
  slice(1)
# Display the selected counties with their key characteristics
# Show: county name, median income, MOE percentage, reliability category
print(selected_counties) %>%
  select(county_name, medincE, moe_percentage, reliability)
# A tibble: 2 × 9
# Groups: reliability [2]
GEOID NAME totpopE totpopM medincE medincM county_name moe_percentage
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 42001 Adams County… 104604 NA 78975 3334 Adams 4.22
2 42023 Cameron Coun… 4536 NA 46186 2605 Cameron 5.64
# ℹ 1 more variable: reliability <chr>
# A tibble: 2 × 4
# Groups: reliability [2]
county_name medincE moe_percentage reliability
<chr> <dbl> <dbl> <chr>
1 Adams 78975 4.22 High Confidence
2 Cameron 46186 5.64 Moderate
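Comment on the output: [Using filter() to select specific counties requires a basic understanding of the data frame. Here, group_by() with slice(1) keeps the first county listed in each reliability group (Adams for High Confidence, Cameron for Moderate), not the county with the highest MOE percentage. No county falls in the Low Confidence group, so only two counties are selected.]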
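Which county represents each reliability group depends on the slicing verb. The sketch below contrasts slice(1), which keeps the first row of each group, with slice_max(), which keeps the row with the highest MOE percentage. It uses four county rows copied from the recommendation table in this lab, and the object names first_listed and highest_moe are illustrative.

```r
library(dplyr)

# Four county rows taken from the county-level results in this lab
counties <- tibble(
  county_name = c("Adams", "Pike", "Cameron", "Forest"),
  moe_percentage = c(4.22, 4.90, 5.64, 9.99),
  reliability = c("High Confidence", "High Confidence", "Moderate", "Moderate")
)

# First row of each group, in the order the rows appear
first_listed <- counties %>%
  group_by(reliability) %>%
  slice(1) %>%
  ungroup()

# Row with the largest MOE percentage in each group
highest_moe <- counties %>%
  group_by(reliability) %>%
  slice_max(moe_percentage, n = 1, with_ties = FALSE) %>%
  ungroup()

print(first_listed$county_name)
print(highest_moe$county_name)
```

Ending with ungroup() keeps later steps from silently inheriting the grouping.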
3.2 Tract-Level Demographics
Your Task: Get demographic data for census tracts in your selected counties.
Requirements:
- Geography: tract level
- Variables: white alone (B03002_003), Black/African American (B03002_004), Hispanic/Latino (B03002_012), total population (B03002_001)
- Use the same state and year as before
- Output format: wide
- Challenge: You’ll need county codes, not names. Look at the GEOID patterns in your county data for hints.
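The GEOID pattern mentioned in the challenge is that a county GEOID is the two-digit state FIPS code followed by the three-digit county FIPS code. A small base R sketch of splitting the two GEOIDs used in this lab follows. The object names are illustrative, and passing the result to the county argument of get_acs() is shown only as a comment because it needs an API key.

```r
# County GEOIDs for Adams and Cameron, as shown in the county-level output
county_geoids <- c(Adams = "42001", Cameron = "42023")

# Characters 1-2 are the state FIPS code, characters 3-5 the county FIPS code
state_code <- unique(substr(county_geoids, 1, 2))
county_codes <- substr(county_geoids, 3, 5)

print(state_code)
print(county_codes)

# These codes could then be supplied to the tract-level request, for example:
# get_acs(geography = "tract", state = "PA", county = county_codes, ...)
```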
# Define your race/ethnicity variables with descriptive names
race_vars <- c(
  white = "B03002_003",   # white alone
  black = "B03002_004",   # Black/African American
  his_lat = "B03002_012", # Hispanic/Latino
  totpop = "B03002_001"   # total population
)
# Use get_acs() to retrieve tract-level data
# The Census API request failed for Forest and Pike Counties,
# so Adams and Cameron Counties are used instead
county_fip <- c("42001", "42023") # Adams and Cameron
tract_data <- get_acs(
  geography = "tract",
  variables = race_vars,
  state = "PA",
  year = 2022,
  survey = "acs5",
  output = "wide"
) %>%
  # Clean county names
  mutate(
    County_clean1 = str_extract(NAME, "(?<=;).*?(?=;)") # extract whatever is between the two semicolons
  ) %>%
  mutate(
    County_clean = str_extract(County_clean1, "(?<=\\s)\\S+(?=\\s)") # extract the word between two spaces
  )
# Hint: You may need to specify county codes in the county parameter
# filter for Adams and Cameron County
selected_tract <- tract_data %>%
  filter(County_clean %in% c("Adams", "Cameron")) # %in% tests membership; == would recycle the vector and drop rows
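When filtering on more than one value, the choice between == and %in% matters. With ==, R recycles the shorter vector element by element, so each row is compared with only one of the two names and matching rows can be dropped without any warning. The base R sketch below uses a made-up vector of county labels to show the difference. The same logic applies inside filter().

```r
# Made-up county labels standing in for the County_clean column
county <- c("Adams", "Adams", "Cameron", "Cameron", "Adams", "Berks")

# == recycles c("Adams", "Cameron") along the rows: row 1 vs "Adams", row 2 vs "Cameron", ...
keep_eq <- county == c("Adams", "Cameron")

# %in% checks each row against the whole set of names
keep_in <- county %in% c("Adams", "Cameron")

print(sum(keep_eq)) # rows kept with ==
print(sum(keep_in)) # rows kept with %in%
```

Here == keeps only three of the five Adams and Cameron rows, while %in% keeps all five.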
# Calculate percentage of each group using mutate()
# Create percentages for white, Black, and Hispanic populations
pct_race <- selected_tract %>%
  mutate(
    pct_white = round(whiteE / totpopE, 3) * 100,
    pct_black = round(blackE / totpopE, 3) * 100,
    pct_his_lat = round(his_latE / totpopE, 3) * 100
  )
# Add readable tract and county name columns using str_extract() or similar
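pct_race_clean <- pct_race %>%
  mutate(
    TRACT = str_extract(NAME, "\\d+(\\.\\d+)?"), # tract number with an optional decimal suffix
    COUNTY = str_extract(NAME, "(?<=; )[^ ]+")   # first word after the first "; ", without a leading space
  )
3.3 Demographic Analysis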
Your Task: Analyze the demographic patterns in your selected areas.
# Find the tract with the highest percentage of Hispanic/Latino residents
# Hint: use arrange() and slice() to get the top tract
pct_race_clean %>%
  arrange(desc(pct_his_lat)) %>%
  slice(1)
# A tibble: 1 × 17
GEOID NAME whiteE whiteM blackE blackM his_latE his_latM totpopE totpopM
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 420010315… Cens… 2856 314 107 74 816 265 3908 292
# ℹ 7 more variables: County_clean1 <chr>, County_clean <chr>, pct_white <dbl>,
# pct_black <dbl>, pct_his_lat <dbl>, TRACT <chr>, COUNTY <chr>
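The tract above has an estimated Hispanic/Latino share of about 21 percent, but a share derived from two ACS estimates carries its own margin of error. The sketch below applies the Census Bureau's approximation formula for the MOE of a proportion to the figures in this output. tidycensus provides moe_prop() for this purpose. The base R function moe_of_share() is a hypothetical stand-in written here so the example runs without any package, and it handles one tract at a time.

```r
# Approximate MOE of a proportion p = num / denom (Census Bureau formula)
moe_of_share <- function(num, denom, moe_num, moe_denom) {
  p <- num / denom
  radicand <- moe_num^2 - p^2 * moe_denom^2
  # If the radicand is negative, fall back to the ratio form (plus sign)
  if (radicand < 0) radicand <- moe_num^2 + p^2 * moe_denom^2
  sqrt(radicand) / denom
}

# Figures from the tract shown above
share <- 816 / 3908
moe_share <- moe_of_share(num = 816, denom = 3908, moe_num = 265, moe_denom = 292)

print(round(share * 100, 1))     # estimated share, percent
print(round(moe_share * 100, 1)) # margin of error, percentage points
```

The share is roughly 20.9 percent with a margin of about 6.6 percentage points, so even the tract that ranks highest is measured with considerable uncertainty.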
# Calculate average demographics by county using group_by() and summarize()
# Show: number of tracts, average percentage for each racial/ethnic group
pct_race_clean %>%
  group_by(COUNTY) %>%
  summarize(
    n_tracts = n(),
    pct_white = round(mean(pct_white, na.rm = TRUE), 1),
    pct_black = round(mean(pct_black, na.rm = TRUE), 1),
    pct_his_lat = round(mean(pct_his_lat, na.rm = TRUE), 1)
  ) %>%
  # Create a nicely formatted table of your results using kable()
  kable(
    col.names = c(
      "County",
      "Number of Tracts",
      "Percent of White Residents (%)",
      "Percent of Black Residents (%)",
      "Percent of Hispanic or Latino Residents (%)"
    ),
    caption = "Average Demographics by County"
  )
| County | Number of Tracts | Percent of White Residents (%) | Percent of Black Residents (%) | Percent of Hispanic or Latino Residents (%) |
|---|---|---|---|---|
| Adams | 14 | 87.6 | 1.2 | 8.0 |
| Cameron | 1 | 88.6 | 0.0 | 2.9 |
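The figures in this table are unweighted means of tract-level percentages, which are not the same as county-wide shares when tracts differ in size. The sketch below illustrates the difference with entirely hypothetical tract counts for a hypothetical county "A". The column names mirror those used in this lab.

```r
library(dplyr)

# Hypothetical tracts: one small tract with a high share, two large tracts with low shares
tracts <- tibble(
  county = c("A", "A", "A"),
  totpopE = c(1000, 4000, 5000),
  his_latE = c(500, 400, 100)
)

summary_tbl <- tracts %>%
  mutate(pct = his_latE / totpopE * 100) %>%
  group_by(county) %>%
  summarize(
    mean_of_tract_pct = mean(pct),                        # each tract counts equally
    county_wide_pct = sum(his_latE) / sum(totpopE) * 100  # each resident counts equally
  )

print(summary_tbl)
```

In this made-up case the mean of tract percentages is about 20.7 while the county-wide share is 10, so the choice of summary should match the question being asked.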
Part 4: Comprehensive Data Quality Evaluation
4.1 MOE Analysis for Demographic Variables
Your Task: Examine margins of error for demographic variables to see if some communities have less reliable data.
Requirements:
- Calculate MOE percentages for each demographic variable
- Flag tracts where any demographic variable has MOE > 15%
- Create summary statistics
# Calculate MOE percentages for white, Black, and Hispanic variables
# Hint: use the same formula as before (margin/estimate * 100)
pct_race_clean <- pct_race_clean %>%
  mutate(
    MOE_white = round(whiteM / whiteE * 100, 2),
    MOE_black = round(blackM / blackE * 100, 2),
    MOE_his_lat = round(his_latM / his_latE * 100, 2)
  )
# Create a flag for tracts with high MOE on any demographic variable
# Use logical operators (| for OR) in an ifelse() statement
flag_tract <- pct_race_clean %>%
  mutate(
    MOE_issue = if_else(
      MOE_white > 15 | MOE_black > 15 | MOE_his_lat > 15,
      TRUE,
      FALSE
    )
  )
# Create summary statistics showing how many tracts have data quality issues
flag_tract %>%
  summarise(
    tracts_with_MOE_issues = sum(MOE_issue, na.rm = TRUE)
  )
# A tibble: 1 × 1
tracts_with_MOE_issues
<int>
1 15
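MOE percentages for small subgroups can be very large, and they become infinite when the estimate is zero, which also pushes a tract over any threshold. The sketch below computes the MOE percentage for the three estimates of the tract shown in section 3.3 and adds one hypothetical row with a zero estimate to show one possible way to handle that case: recording NA instead of Inf. Treating zero estimates as NA is an assumption, not a rule from the assignment.

```r
library(dplyr)

tract_example <- tibble(
  group = c("white", "black", "his_lat", "zero_estimate"),
  estimate = c(2856, 107, 816, 0),
  moe = c(314, 74, 265, 13) # last row is hypothetical
)

moe_tbl <- tract_example %>%
  mutate(
    # Record NA rather than Inf when the estimate is zero
    moe_pct = if_else(estimate > 0, round(moe / estimate * 100, 2), NA_real_),
    high_moe = moe_pct > 15
  )

print(moe_tbl)
```

For this tract the White estimate stays under the 15% threshold while the Black and Hispanic/Latino estimates exceed it, which illustrates why a single small subgroup is enough to flag a tract.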
4.2 Pattern Analysis
Your Task: Investigate whether data quality problems are randomly distributed or concentrated in certain types of communities.
# Group tracts by whether they have high MOE issues
# Categorize MOE > 100 as "high", MOE <= 100 as "low"
flag_tract <- flag_tract %>%
  mutate(
    MOE_category = case_when(
      MOE_white > 100 | MOE_black > 100 | MOE_his_lat > 100 ~ "high",
      TRUE ~ "low" # every remaining tract has no MOE above 100%
    )
  )
# Calculate average characteristics for each group:
# - population size, demographic percentages
# Use group_by() and summarize() to create this comparison
flag_tract %>%
  group_by(MOE_category) %>%
  summarise(
    avg_pop = round(mean(totpopE, na.rm = TRUE), 0),
    avg_pct_white = round(mean(pct_white, na.rm = TRUE), 1),
    avg_pct_black = round(mean(pct_black, na.rm = TRUE), 1),
    avg_pct_hispanic = round(mean(pct_his_lat, na.rm = TRUE), 1)
  ) %>%
  # Create a professional table showing the patterns
  kable(
    col.names = c(
      "MOE Issue Category",
      "Average Population Size",
      "Average Percentage of White Population (%)",
      "Average Percentage of Black Population (%)",
      "Average Percentage of Hispanic or Latino Population (%)"
    ),
    caption = "Average Demographic Characteristics for Tracts with Different Levels of MOE Issues"
  )
| MOE Issue Category | Average Population Size | Average Percentage of White Population (%) | Average Percentage of Black Population (%) | Average Percentage of Hispanic or Latino Population (%) |
|---|---|---|---|---|
| high | 2896 | 89.7 | 0.7 | 5.9 |
| low | 4645 | 82.0 | 2.5 | 12.3 |
Pattern Analysis: [Tracts in the high-MOE group have smaller populations and are more demographically homogeneous, with a larger share of White residents and smaller shares of Black and Hispanic or Latino residents. Smaller tracts have smaller survey samples, and in homogeneous tracts the minority subgroups are very small, so their estimates rest on few sampled households. Both factors increase statistical uncertainty and inflate margins of error relative to the estimates.]
Part 5: Policy Recommendations
5.1 Analysis Integration and Professional Summary
Your Task: Write an executive summary that integrates findings from all four analyses.
Executive Summary Requirements:
1. Overall Pattern Identification: What are the systematic patterns across all your analyses?
2. Equity Assessment: Which communities face the greatest risk of algorithmic bias based on your findings?
3. Root Cause Analysis: What underlying factors drive both data quality issues and bias risk?
4. Strategic Recommendations: What should the Department implement to address these systematic issues?
Executive Summary:
[At the county level, census income data for Pennsylvania are generally reliable. About 85% of the state’s 67 counties (57) have a median household income margin of error below 5% (“High Confidence”), and about 15% (10) fall between 5% and 10% (“Moderate”). No county exceeds 10% (“Low Confidence”). Because the Low Confidence category is empty, one county was selected from each of the two categories present (Adams and Cameron) for tract-level analysis.
At the tract level, data quality is much weaker. All 15 tracts examined have at least one racial or ethnic population estimate with an MOE above 15%. On average, tracts with the highest MOEs have smaller populations and are more racially homogeneous, with a higher proportion of White residents and smaller shares of Black and Hispanic or Latino residents. These patterns suggest that high MOEs are driven both by small total populations and by the interaction between survey sampling limitations and small subgroup populations, particularly in racially homogeneous tracts. When census data inform policy decisions, estimates for these communities should be interpreted with special care to avoid introducing statistical bias into policy design and implementation.
To address these systematic MOE issues, the Department should integrate data quality metrics directly into analytic and algorithmic workflows, particularly for tract-level decision-making. The following strategic recommendations outline practical steps to mitigate data uncertainty and reduce the risk of biased policy outcomes:
1. Incorporate data quality thresholds into decision-making.
2. Supplement ACS data with local or administrative data.
3. Prioritize transparency and documentation.]
5.2 Specific Recommendations
Your Task: Create a decision framework for algorithm implementation.
# Create a summary table using your county reliability data
# Include: county name, median income, MOE percentage, reliability category
# Add a new column with algorithm recommendations using case_when():
# - High Confidence: "Safe for algorithmic decisions"
# - Moderate Confidence: "Use with caution - monitor outcomes"
# - Low Confidence: "Requires manual review or additional data"
pa_recommendation <- pa_reliability %>%
  mutate(
    recommendation = case_when(
      reliability == "High Confidence" ~ "Safe for algorithmic decisions",
      reliability == "Moderate" ~ "Use with caution - monitor outcomes",
      reliability == "Low Confidence" ~ "Requires manual review or additional data",
      TRUE ~ NA_character_
    )
  ) %>%
  select(county_name, medincE, moe_percentage, reliability, recommendation)
# Format as a professional table with kable()
pa_recommendation %>%
  kable(
    col.names = c(
      "County",
      "Median Income",
      "MOE Percentage (%)",
      "MOE reliability",
      "Algorithm Recommendation"
    ),
    caption = "Algorithmic Use Recommendations Based on County-Level Data Reliability"
  )
| County | Median Income | MOE Percentage (%) | MOE reliability | Algorithm Recommendation |
|---|---|---|---|---|
| Adams | 78975 | 4.22 | High Confidence | Safe for algorithmic decisions |
| Allegheny | 72537 | 1.20 | High Confidence | Safe for algorithmic decisions |
| Armstrong | 61011 | 3.61 | High Confidence | Safe for algorithmic decisions |
| Beaver | 67194 | 2.28 | High Confidence | Safe for algorithmic decisions |
| Bedford | 58337 | 4.47 | High Confidence | Safe for algorithmic decisions |
| Berks | 74617 | 1.60 | High Confidence | Safe for algorithmic decisions |
| Blair | 59386 | 3.47 | High Confidence | Safe for algorithmic decisions |
| Bradford | 60650 | 3.57 | High Confidence | Safe for algorithmic decisions |
| Bucks | 107826 | 1.41 | High Confidence | Safe for algorithmic decisions |
| Butler | 82932 | 2.61 | High Confidence | Safe for algorithmic decisions |
| Cambria | 54221 | 3.34 | High Confidence | Safe for algorithmic decisions |
| Cameron | 46186 | 5.64 | Moderate | Use with caution - monitor outcomes |
| Carbon | 64538 | 5.31 | Moderate | Use with caution - monitor outcomes |
| Centre | 70087 | 2.77 | High Confidence | Safe for algorithmic decisions |
| Chester | 118574 | 1.70 | High Confidence | Safe for algorithmic decisions |
| Clarion | 58690 | 4.37 | High Confidence | Safe for algorithmic decisions |
| Clearfield | 56982 | 2.79 | High Confidence | Safe for algorithmic decisions |
| Clinton | 59011 | 3.86 | High Confidence | Safe for algorithmic decisions |
| Columbia | 59457 | 3.76 | High Confidence | Safe for algorithmic decisions |
| Crawford | 58734 | 3.91 | High Confidence | Safe for algorithmic decisions |
| Cumberland | 82849 | 2.20 | High Confidence | Safe for algorithmic decisions |
| Dauphin | 71046 | 2.27 | High Confidence | Safe for algorithmic decisions |
| Delaware | 86390 | 1.53 | High Confidence | Safe for algorithmic decisions |
| Elk | 61672 | 6.63 | Moderate | Use with caution - monitor outcomes |
| Erie | 59396 | 2.55 | High Confidence | Safe for algorithmic decisions |
| Fayette | 55579 | 4.16 | High Confidence | Safe for algorithmic decisions |
| Forest | 46188 | 9.99 | Moderate | Use with caution - monitor outcomes |
| Franklin | 71808 | 3.00 | High Confidence | Safe for algorithmic decisions |
| Fulton | 63153 | 3.65 | High Confidence | Safe for algorithmic decisions |
| Greene | 66283 | 6.41 | Moderate | Use with caution - monitor outcomes |
| Huntingdon | 61300 | 4.72 | High Confidence | Safe for algorithmic decisions |
| Indiana | 57170 | 4.65 | High Confidence | Safe for algorithmic decisions |
| Jefferson | 56607 | 3.41 | High Confidence | Safe for algorithmic decisions |
| Juniata | 61915 | 4.79 | High Confidence | Safe for algorithmic decisions |
| Lackawanna | 63739 | 2.58 | High Confidence | Safe for algorithmic decisions |
| Lancaster | 81458 | 1.79 | High Confidence | Safe for algorithmic decisions |
| Lawrence | 57585 | 3.07 | High Confidence | Safe for algorithmic decisions |
| Lebanon | 72532 | 2.69 | High Confidence | Safe for algorithmic decisions |
| Lehigh | 74973 | 2.00 | High Confidence | Safe for algorithmic decisions |
| Luzerne | 60836 | 2.35 | High Confidence | Safe for algorithmic decisions |
| Lycoming | 63437 | 4.39 | High Confidence | Safe for algorithmic decisions |
| McKean | 57861 | 4.75 | High Confidence | Safe for algorithmic decisions |
| Mercer | 57353 | 3.63 | High Confidence | Safe for algorithmic decisions |
| Mifflin | 58012 | 3.43 | High Confidence | Safe for algorithmic decisions |
| Monroe | 80656 | 3.17 | High Confidence | Safe for algorithmic decisions |
| Montgomery | 107441 | 1.27 | High Confidence | Safe for algorithmic decisions |
| Montour | 72626 | 7.09 | Moderate | Use with caution - monitor outcomes |
| Northampton | 82201 | 1.93 | High Confidence | Safe for algorithmic decisions |
| Northumberland | 55952 | 2.67 | High Confidence | Safe for algorithmic decisions |
| Perry | 76103 | 3.17 | High Confidence | Safe for algorithmic decisions |
| Philadelphia | 57537 | 1.38 | High Confidence | Safe for algorithmic decisions |
| Pike | 76416 | 4.90 | High Confidence | Safe for algorithmic decisions |
| Potter | 56491 | 4.42 | High Confidence | Safe for algorithmic decisions |
| Schuylkill | 63574 | 2.40 | High Confidence | Safe for algorithmic decisions |
| Snyder | 65914 | 5.56 | Moderate | Use with caution - monitor outcomes |
| Somerset | 57357 | 2.78 | High Confidence | Safe for algorithmic decisions |
| Sullivan | 62910 | 9.25 | Moderate | Use with caution - monitor outcomes |
| Susquehanna | 63968 | 3.14 | High Confidence | Safe for algorithmic decisions |
| Tioga | 59707 | 3.23 | High Confidence | Safe for algorithmic decisions |
| Union | 64914 | 7.32 | Moderate | Use with caution - monitor outcomes |
| Venango | 59278 | 3.45 | High Confidence | Safe for algorithmic decisions |
| Warren | 57925 | 5.19 | Moderate | Use with caution - monitor outcomes |
| Washington | 74403 | 2.38 | High Confidence | Safe for algorithmic decisions |
| Wayne | 59240 | 4.79 | High Confidence | Safe for algorithmic decisions |
| Westmoreland | 69454 | 1.99 | High Confidence | Safe for algorithmic decisions |
| Wyoming | 67968 | 3.85 | High Confidence | Safe for algorithmic decisions |
| York | 79183 | 1.79 | High Confidence | Safe for algorithmic decisions |
Key Recommendations:
Your Task: Use your analysis results to provide specific guidance to the department.
Counties suitable for immediate algorithmic implementation: [Counties classified as High Confidence (MOE below 5%) are appropriate for immediate use in algorithmic decision-making. This group covers 57 of the 67 counties, including Adams, Allegheny, Armstrong, Beaver, and Bedford. Their median income estimates are statistically reliable, which reduces the risk that automated systems will misallocate resources or misclassify need. Algorithms applied in these counties can be expected to perform consistently, provided routine validation checks remain in place.]
Counties requiring additional oversight: [Counties with Moderate Confidence data (MOE between 5% and 10%) should be included in algorithmic workflows but paired with active monitoring and evaluation. The ten counties in this group are Cameron, Carbon, Elk, Forest, Greene, Montour, Snyder, Sullivan, Union, and Warren. In these areas, algorithmic outputs should be reviewed periodically against observed outcomes to detect potential bias or instability. Performance audits or sensitivity checks can help ensure that moderate data uncertainty does not translate into systematic policy errors.]
Counties needing alternative approaches: [Counties identified as Low Confidence (MOE above 10%) require alternative approaches because of their high margins of error and limited data reliability. No Pennsylvania county falls into this category in the 2022 ACS 5-year data, although Forest County (9.99%) sits just below the threshold. For any county that does cross it, algorithmic outputs should not be used as the sole basis for decision-making. Instead, the Department should rely on manual review, aggregated or multi-year data, supplemental administrative records, or targeted local surveys to inform policy decisions and reduce the risk of statistical bias.]
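The three tiers above can be expressed as a small reusable rule. The sketch below is one possible form of it, using the same thresholds and labels as the table. The function name recommend_use() is hypothetical, and sending missing MOE values to manual review is an assumption made here, not something specified in the assignment.

```r
library(dplyr)

# Map an MOE percentage to the recommendation tiers used in this lab
recommend_use <- function(moe_pct) {
  case_when(
    is.na(moe_pct) ~ "Requires manual review or additional data", # assumption: missing MOE gets manual review
    moe_pct < 5 ~ "Safe for algorithmic decisions",
    moe_pct <= 10 ~ "Use with caution - monitor outcomes",
    TRUE ~ "Requires manual review or additional data"
  )
}

# Philadelphia (1.38) and Forest (9.99) come from the table above; 12 and NA are hypothetical inputs
examples <- recommend_use(c(1.38, 9.99, 12, NA))
print(examples)
```

Keeping the thresholds in one function makes it easier to test how sensitive the tier assignments are to the cutoffs, for example for Forest County at 9.99%.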
Questions for Further Investigation
- How do MOE patterns vary spatially across Pennsylvania, and are high-uncertainty tracts geographically clustered in rural, suburban, or peripheral urban areas?
- How do margins of error change over time, and are certain counties or tracts becoming more or less reliable?
Technical Notes
Data Sources:
- U.S. Census Bureau, American Community Survey 2018-2022 5-Year Estimates
- Retrieved via tidycensus R package on 2026-02-10
Reproducibility:
- All analysis conducted in R version 4.5.2
- Census API key required for replication
- Complete code and documentation available at: https://gabyxchen.github.io/PPA_Portfolio/
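Methodology Notes: [Adams and Cameron Counties were selected for detailed demographic analysis to represent their respective MOE reliability categories. They are the first counties listed in each category and were used after the tract-level request for Forest and Pike Counties failed. The selection is therefore arbitrary rather than random or purposive, which may influence the results.]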
Limitations: [Only one sample county was selected from each MOE risk category group, and the small sample size may limit the generalizability of the findings. Additionally, this analysis relies solely on 2022 ACS 5-year estimates, which capture conditions within a single time period. As a result, the study does not account for temporal variation or longer-term trends in demographic patterns and data reliability.]
Submission Checklist
Before submitting your portfolio link on Canvas:
Remember: Submit your portfolio URL on Canvas, not the file itself. Your assignment should be accessible at your-portfolio-url/labs/lab_1/your_file_name.html