A Ghost in the Machine
The journey began with a simple task: verify a number.
A press release from the Environmental Protection Agency (EPA) announced a significant enforcement action against a local industrial facility, citing a specific fine for emissions violations.
For a data journalist, the next step is reflexive: find the source document, the raw data that underpins the announcement.
This should have been a straightforward trip into the heart of the U.S. government’s open data ecosystem, a system designed for precisely this kind of public accountability.
Instead, it was a descent into a bureaucratic maze.
The links in the press release led to a generic portal page.
Searching the facility’s name on the EPA’s Enforcement and Compliance History Online (ECHO) database yielded a dozen different records, none of which matched the exact details of the announcement.
Some data were locked in non-searchable PDF documents, the digital equivalent of a photograph of a spreadsheet.1
Other records had cryptic violation codes and notes about data migration problems from state-level systems.2
After hours of painstaking “data wrangling”—the tedious process of cleaning, structuring, and piecing together messy information—a partial picture emerged, but the initial, clear-cut fact from the press release remained elusive, a ghost in the vast machine.3
This frustrating experience is a microcosm of a larger, national story.
The U.S. federal open data initiative represents one of the most ambitious attempts to democratize information in modern history.
We were promised a public library of Alexandria, a perfectly organized repository of national knowledge, accessible to all.1
Yet what users often find is a library with missing books, censored sections, and a complex, sometimes nonsensical, card catalog.
The success of this grand democratic experiment is not guaranteed by laws or websites alone.
It exists in a constant state of tension between its powerful ideals and the persistent, systemic threats of political interference, bureaucratic inertia, and technical decay.
This report is an investigation into that tension—a journey through the aisles of this great public library to examine the architecture of its promise, the evidence of its impact, the anatomy of its failures, and what is required to defend this fragile public good.
Part I: The Grand Promise: A Library of Alexandria for the People
The foundation of the federal open data movement rests on a simple but revolutionary principle: information generated and collected by the federal government is a national asset.1
This is not merely a technical or administrative policy; it is a philosophical and political claim that public disclosure is essential to the operation of a democracy and that citizens are entitled to the products of their government.6
Proponents frame the benefits in four key areas: enhancing transparency to reduce corruption, improving public services, fueling economic innovation, and increasing the efficiency of government itself.8
Economically, government data is treated as a “public good”—non-rivalrous (one person’s use does not diminish another’s) and non-excludable (it is available to all), much like a lighthouse guiding ships at sea.7
This framework explains why government, not the private sector, must serve as the primary steward of this resource.
The Historical Arc: From Reactive to Proactive Disclosure
The modern open data movement marks a fundamental shift in the philosophy of government transparency.
For decades, access was governed by a model of reactive disclosure.
Laws like the Freedom of Information Act (FOIA) empowered citizens to request information, but the government was only obligated to respond after being asked.
The new ideal is one of proactive disclosure, where public information is put online by default, accessible to anyone without needing to ask first.9
This evolution occurred over several decades, marked by key milestones:
- Early Seeds (Pre-2009): The concept is not entirely new. The Clinton-era Office of Management and Budget (OMB) Circular A-130, issued in 1994, was one of the first official policies to assert the public’s right to access government information through emerging information technology.6 Subsequent laws like the E-Government Act of 2002 furthered this goal, but a comprehensive, government-wide strategy was still lacking.1
- The Obama-Era Catalyst (2009-2013): The pivotal moment arrived on President Barack Obama’s first full day in office with his “Memorandum on Transparency and Open Government”.6 This memo established the principles of transparency, participation, and collaboration as touchstones for his administration.1 It was followed by the Open Government Directive (M-10-06), which was a crucial operational step. It mandated that federal agencies identify and publish at least three “high-value” datasets in open, machine-readable formats, leading directly to the launch of the central portal, Data.gov.11
- Policy Solidification (2013): The 2013 OMB memorandum M-13-13, “Open Data Policy—Managing Information as an Asset,” formalized the vision.6 It established the “open by default” principle for all government information and, critically, required agencies to create and maintain comprehensive inventories of their data assets. This was a vital step toward accountability, creating a public list of what data the government holds, whether it was public or not.14
The Capstone Legislation: The OPEN Government Data Act
The journey from idea to practice culminated in the Open, Public, Electronic, and Necessary (OPEN) Government Data Act.
It was enacted as Title II of the bipartisan Foundations for Evidence-Based Policymaking Act of 2018, a landmark piece of legislation that codified the core tenets of the Obama-era policies into permanent law.15
This move from executive policy to statute was a deliberate effort to make open data a durable feature of American governance, less susceptible to the shifting priorities of any single administration.17
The Act established several core mandates:
- Open by Default: It created a legal presumption that “Government data assets made available by an agency shall be published as machine-readable data…in an open format, and…under open licenses”.17 This shifted the burden of proof, forcing agencies to justify why data should not be open, rather than the other way around.
- Comprehensive Inventories: It legally requires agencies to maintain and publish a complete inventory of all their data assets, making it easier for the public and other agencies to discover what information exists.17
- Centralized Portal: It enshrined Data.gov in statute as the single public interface for open government data, ensuring its continued existence.15
- Public Engagement: It directs agencies to create processes for engaging with the public to help prioritize which datasets to release and to support innovative uses of that data.20
The progression from a presidential memo to a bipartisan law reveals a crucial understanding within the open data movement itself.
Executive orders are powerful but fragile; they can be undone with the stroke of a pen by a subsequent administration.
The push to pass the OPEN Government Data Act was a recognition of this political vulnerability.
By embedding the principles of open data into federal law, its advocates sought to build a more resilient foundation for transparency.
However, as later sections will show, even a statute is not a perfect defense against an unwilling or under-resourced executive branch, highlighting a fundamental tension between legislative intent and executive implementation.
Part II: Building the Shelves: The Architecture of Access
At the heart of the federal open data initiative is Data.gov, the public’s front door to this vast library of information.
Launched on May 21, 2009, with a modest 47 datasets, the portal has grown exponentially to catalog nearly 300,000 datasets from over 100 different organizations.12
Its stated mission is to “unleash the power of government open data” to inform decisions, drive innovation, achieve agency missions, and strengthen the foundation of transparent government.24
The portal is managed by the General Services Administration (GSA) and, in a move that aligns with its core philosophy, is built using open-source software like CKAN and 11ty.
This means the code behind the platform for open data is itself open for anyone to inspect, use, or adapt.15
How It Works: The “Harvesting” Model
A common misconception is that Data.gov hosts all the government’s data.
In reality, it functions more like a sophisticated card catalog for a highly decentralized library system.
The process relies on a “harvesting” model.26
Under the OPEN Government Data Act, each federal agency is required to create and maintain a comprehensive inventory of its data assets.
This inventory is published in a standardized data.json file hosted on the agency’s own website (e.g., gsa.gov/data.json).25
The Data.gov catalog is populated by a harvester that automatically and regularly visits these agency data.json files and collects the metadata—the descriptive information about each dataset, such as its title, description, keywords, and, most importantly, the link to access the actual data file.26
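The mechanics are simple enough to sketch. The JSON below is a hypothetical inventory excerpt modeled on the DCAT-US metadata schema that data.json files follow (the "dataset", "distribution", and "downloadURL" field names come from that schema; the records themselves are invented), and the loop mimics what a harvester collects: metadata and links, never the data files themselves.

```python
import json

# A hypothetical excerpt of an agency data.json inventory, modeled on the
# DCAT-US metadata schema that Data.gov's harvester consumes. The records
# are illustrative, not real agency entries.
inventory = json.loads("""
{
  "dataset": [
    {
      "title": "Facility Emissions Summary",
      "modified": "2024-06-30",
      "keyword": ["emissions", "enforcement"],
      "distribution": [
        {"mediaType": "text/csv",
         "downloadURL": "https://example.agency.gov/data/emissions.csv"}
      ]
    }
  ]
}
""")

# A harvester collects metadata only: titles, keywords, dates, and links.
# The actual data files stay on the agency's own servers.
for ds in inventory["dataset"]:
    links = [d["downloadURL"] for d in ds.get("distribution", [])]
    print(ds["title"], ds["modified"], links)
```

If the agency's entry holds a dead `downloadURL`, nothing in this pipeline notices: the catalog record is still well-formed, which is exactly how a sleek portal can lead to a broken link.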
This decentralized architecture is a pragmatic solution to an immense data management challenge.
A single, centralized repository for all raw federal data would be a technical and bureaucratic behemoth.
The harvesting model cleverly distributes the responsibility for data management to the individual agencies that create, understand, and use the data best.
However, this design is also the system’s fundamental structural weakness.
Data.gov has no direct control over the quality, timeliness, or even the existence of the data to which it links.
Its integrity is entirely dependent on the compliance and competence of more than 100 different federal organizations.23
If an agency provides a broken link, incomplete metadata, or an outdated file in its data.json inventory, Data.gov will faithfully reflect that failure.
This explains the often-jarring user experience of navigating a sleek, modern portal only to end up at a dead end.
The problem is rarely with the central catalog, but with the individual librarians failing to maintain their collections.
The system’s architecture mirrors the political structure of the U.S. government itself: a federation of semi-autonomous entities.
Its weaknesses are therefore not just technical bugs but inherent features of American bureaucracy.
The Ecosystem of Data and Strategy
The open data ecosystem extends beyond just federal agencies.
Data.gov also harvests metadata from states, cities, counties, and tribal governments that choose to participate, creating a multi-layered public resource.15
Furthermore, specialized data communities have been established to cater to specific user groups, such as Health.Data.gov for health data, geoplatform.gov for geospatial information, and a portal for legal materials.5
Guiding this entire architecture is the Federal Data Strategy (FDS), a 10-year vision for accelerating the use of data across government.27
The FDS outlines 40 specific “Practices” for agencies to adopt, grouped into three categories: building a data-valuing culture, governing and protecting data, and promoting efficient data use.28
This long-term strategy demonstrates a sophisticated understanding that true data-driven governance requires more than just publishing files; it requires a deep cultural and operational transformation within the government itself.
Part III: Lighting the Fires: When Data Fuels Discovery and Accountability
Despite its flaws, the federal open data initiative has produced profound and tangible successes.
When the system works, it provides the raw material for groundbreaking journalism, economic innovation, and more effective governance.
The following case studies illustrate the immense value unlocked when public data is put to use.
Case Study 1: Exposing “Sacrifice Zones” with Environmental Justice Journalism
One of the most powerful examples of open data’s impact is ProPublica’s landmark “Sacrifice Zones” investigation.29
Journalists used data from the EPA’s Toxics Release Inventory (TRI), a core federal dataset where industrial facilities must report their emissions of certain toxic chemicals.
This data feeds into the EPA’s Risk-Screening Environmental Indicators (RSEI) model, which estimates the potential health risks to surrounding communities.
While the data existed, it was dense and inaccessible to the average person.
The ProPublica team analyzed five years of RSEI data and, for the first time, mapped it at a neighborhood level.
The results were stunning, revealing more than 1,000 “hot spots” of cancer-causing industrial air pollution across the country, disproportionately located in low-income and minority communities.29
The investigation gave residents a view of their potential risk that was never before possible, provoked a direct response from the EPA, and empowered local advocates with hard evidence.
In a virtuous cycle, ProPublica then published its cleaned, analyzed data, making it a valuable resource for other researchers and journalists.30
Case Study 2: Tracking the Money for Fiscal Transparency
The Digital Accountability and Transparency Act (DATA Act) of 2014 was designed to create unprecedented transparency in federal spending.31
Its primary product is USAspending.gov, the official open data source for the nearly $4 trillion in annual federal spending.32
This portal allows journalists, watchdog groups, and the public to track federal contracts, grants, and loans, filtering by agency, recipient, location, and industry.
This data has been used to track the distribution of COVID-19 relief funds, analyze patterns in federal procurement, and expose potential waste.
It also serves an economic purpose: small businesses use the data to understand which agencies buy what products, helping them compete for federal contracts that might otherwise seem unattainable.32
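The kind of filtering and aggregation such spending data supports is easy to sketch. The CSV excerpt and its column names below are invented for illustration; real USAspending.gov exports use their own headers, but the pattern of grouping awards by agency and location is the same.

```python
import csv
import io
from collections import defaultdict

# A hypothetical excerpt of a USAspending-style award export.
# Column names are illustrative, not the real export schema.
csv_text = """\
award_id,awarding_agency,recipient_state,amount
C-001,Department of Energy,TX,1200000
C-002,Department of Energy,NM,300000
G-101,Department of Education,TX,450000
"""

# Sum award dollars by (agency, state) -- the basic move behind tracking
# where federal money flows.
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(csv_text)):
    totals[(row["awarding_agency"], row["recipient_state"])] += float(row["amount"])

for (agency, state), amount in sorted(totals.items()):
    print(f"{agency} -> {state}: ${amount:,.0f}")
```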
Case Study 3: Fueling the Economy with Innovation
Open data is not just a tool for accountability; it is a vital raw material for economic growth.
A 2013 McKinsey report estimated that open data could unlock $3 trillion to $5 trillion in annual economic value across seven sectors.8
- Weather Data: The free and open data provided by the National Oceanic and Atmospheric Administration (NOAA) is the foundation of the entire private weather industry. This data powers everything from the forecast app on a smartphone to sophisticated risk-management models used by the agriculture, insurance, and construction industries.33
- GPS Data: The U.S. government’s decision to open its Global Positioning System (GPS) for civilian use created a global public utility. This single act of data liberation spawned countless industries, from logistics and precision agriculture to ride-sharing and navigation apps, fundamentally reshaping the modern economy.34
- Health and Education Data: The Department of Education’s College Scorecard uses open data to help students and families compare universities based on cost, graduation rates, and post-graduation earnings.35 Similarly, data from Health.Data.gov has spurred the creation of tools that allow patients to compare the costs and safety records of hospitals and physicians, empowering consumer choice.5
Case Study 4: Improving Public Services and Crisis Response
Open data also has the power to make government itself more effective and responsive, particularly in times of crisis.
- Disaster Response: During the 2010 Deepwater Horizon oil spill, the integration of open data on weather patterns, ocean currents, and geography was critical for predicting the spill’s path and directing cleanup crews effectively.37 Following the 2011 earthquake in Christchurch, New Zealand, the local recovery authority’s sharing of geospatial data with construction companies was projected to save NZ$40 million by improving coordination and efficiency.38
- Public Health: Open data on health system performance has empowered citizens in countries from Uruguay to Burundi to demand better care from their governments.34 In Singapore, the government’s publication of a real-time dengue fever cluster map allowed the public to take precautions and helped health officials target interventions, demonstrating how transparency can directly improve public health outcomes.34
The following table provides a practical toolkit, summarizing key federal data portals and their demonstrated impact, transforming the narrative examples into a functional guide for civic engagement.
The Open Data Toolkit: A Guide to Key Federal Resources

| Portal/Dataset | Governing Agency | Type of Data | Real-World Impact/Story |
| --- | --- | --- | --- |
| USAspending.gov 32 | Department of the Treasury | Federal contracts, grants, loans, and other financial awards | Tracking pandemic relief funds; helping small businesses win federal contracts 33 |
| FBI Crime Data Explorer 40 | Department of Justice / FBI | National crime statistics, including hate crimes and use-of-force data | Analyzing national and local crime trends; informing public safety policy discussions |
| EPA Toxics Release Inventory (TRI) 30 | Environmental Protection Agency | Industrial emissions of toxic chemicals | ProPublica’s “Sacrifice Zones” investigation into cancer risk hot spots 29 |
| College Scorecard 36 | Department of Education | University costs, graduation rates, student debt, and post-college earnings | Helping students and families make informed decisions about higher education 35 |
| NOAA Weather Data 34 | National Oceanic and Atmospheric Administration | Weather forecasts, climate data, satellite imagery | Fueling the private weather industry; supporting agriculture and risk management 33 |
| Census Bureau Data 33 | Department of Commerce | Demographic, economic, and housing data | Informing business location decisions; guiding public service allocation |
Part IV: The Cracks in the Foundation: A System Under Siege
For all its successes, the federal open data ecosystem is deeply flawed and perpetually under threat.
The grand promise of a transparent, data-driven government is consistently undermined by systemic failures that range from the technical to the political.
These are not isolated incidents but recurring, structural pathologies that threaten the integrity of the entire enterprise.
1. The Quality Quagmire: “Garbage In, Garbage Out”
The most persistent and widespread complaint from data users is the poor quality of the data itself.
The principle of “garbage in, garbage out” holds true: if the underlying data is flawed, any analysis derived from it will also be flawed.41
Issues with completeness, accuracy, timeliness, and consistency are rampant across federal datasets.1
The Government Accountability Office (GAO) has repeatedly documented these shortcomings.
A review of USAspending.gov found that while data quality had improved since the passage of the DATA Act, persistent challenges remained.
Varying agency interpretations of data standards meant that data was not always comparable across the government, hindering the very transparency the law was meant to create.31
Similarly, the Federal IT Dashboard, designed to track spending on major technology projects, has been criticized for relying on self-reported agency data of questionable quality and completeness.1
Even agencies themselves acknowledge these issues.
The EPA maintains a “Known Data Problems” page for its compliance databases, which reads like a catalog of system failures: data migration problems between state and federal systems, delays in reporting, and entire categories of inspections that may not be recorded in the national database.2
This data quality quagmire erodes public trust and places an enormous “data wrangling” burden on users, effectively creating a barrier to access for any journalist, researcher, or citizen who lacks the time and technical skill to clean up the government’s messy data.3
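What that wrangling burden looks like in practice can be sketched in a few lines. The records below are invented, but the defects are the ones described above: the same facility spelled three ways, dates in mixed formats, and penalty fields that are blank or non-numeric.

```python
from datetime import datetime

# Hypothetical enforcement records with realistic defects: inconsistent
# names, mixed date formats, and missing or non-numeric penalty values.
raw_records = [
    {"facility": "  ACME Chemical ", "date": "2024-03-15", "penalty": "12000"},
    {"facility": "Acme Chemical", "date": "03/15/2024", "penalty": ""},
    {"facility": "ACME CHEMICAL", "date": "15 March 2024", "penalty": "N/A"},
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y")

def parse_date(text):
    # Try each known format; leave unparseable dates as gaps, never guesses.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean(record):
    return {
        # Collapse whitespace and normalize capitalization.
        "facility": " ".join(record["facility"].split()).title(),
        "date": parse_date(record["date"]),
        # Flag blank or non-numeric penalties as missing rather than zero.
        "penalty": int(record["penalty"]) if record["penalty"].isdigit() else None,
    }

cleaned = [clean(r) for r in raw_records]
# All three rows now name the same facility on the same date; only one
# carries a usable penalty amount, and the rest are flagged as missing.
```

Multiply this by millions of rows and dozens of undocumented quirks, and the hours lost to cleanup become the de facto price of admission to federal data.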
2. The Politics of Deletion: When the Library Burns its Own Books
The most direct and alarming threat to open data is its vulnerability to political interference.
Several reports have documented instances where politically inconvenient datasets have been removed, altered, or made more difficult to find on government websites, particularly those related to climate change, environmental justice, and LGBTQ+ issues.43
During the first 100 days of the second Trump administration, the Environmental Data & Governance Initiative (EDGI) documented the complete takedown of the EPA’s environmental justice website and its flagship EJScreen mapping tool.45
This followed patterns from the first term, when the EPA’s main climate change page was redirected and left dormant for 18 months.45
The damage extends beyond the simple deletion of files.
It represents an erosion of public trust in the integrity of all federal statistics.43
Furthermore, the loss is not just of data, but of the institutional knowledge, methodologies, and expert technical staff behind its collection and maintenance—assets that cannot be easily or quickly reconstituted.44
This threat has given rise to a movement of “data rescue” among journalists, academics, and digital archivists.
Organizations like the Internet Archive, Big Local News at Stanford, and Investigative Reporters and Editors (IRE) now scramble to download and preserve federal datasets, creating shadow archives out of fear that the official ones may disappear without warning.45
3. The Labyrinth of Bureaucracy: Underfunded and Overwhelmed
Even with the best of intentions, federal agencies often struggle to meet their open data obligations.
Mandates like the OPEN Government Data Act frequently arrive without dedicated funding, forcing agencies to stretch already thin resources to comply.1
In a survey of agency Chief Information Officers, many expressed support for open data but cited challenges in obtaining resources and balancing the work against other pressing priorities like cybersecurity and infrastructure modernization.1
This problem is compounded by the government’s reliance on legacy IT systems.
Federal agencies collectively spend about 80% of their annual IT budget—over $100 billion—simply maintaining these aging systems, some of which are decades old.48
These outdated platforms often create data silos, making it difficult and costly to extract information in modern, usable formats.
To make matters worse, the Office of Management and Budget (OMB) was years late in issuing the required implementation guidance for the OPEN Government Data Act, leaving agencies without clear marching orders on how to build their data inventories and make data open by default.20
4. The Format Farce: The Tyranny of the PDF
A core tenet of open data is that it must be “machine-readable,” meaning a computer can easily process and analyze it.17
Yet agencies frequently publish data in formats like PDF, which are designed for human eyes and are notoriously difficult for computers to parse.
A dataset released as a PDF table is a form of passive-aggressive compliance; the agency can claim it has made the information public while ensuring it is nearly useless for large-scale analysis without painstaking and error-prone data extraction.1
This practice once again raises the barrier to entry, favoring well-resourced organizations that can afford the labor to “liberate” data from its PDF prison.
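A small sketch shows why. The text below simulates what a PDF text extractor typically returns for a simple table (the content is hypothetical): the visual structure survives only as runs of whitespace, and recovering the columns requires fragile heuristics.

```python
# Text as it might come out of a PDF text extractor: the table survives
# only as whitespace-aligned columns. (Content is hypothetical.)
pdf_text = """\
Facility            State   Violations   Penalty
Acme Chemical       TX      4            $12,000
Beta Refining Co.   LA      2            $7,500
"""

lines = pdf_text.strip().splitlines()
header = lines[0].split()

# Naively splitting on whitespace breaks as soon as a cell contains a
# space ("Acme Chemical"), so we split each row from the RIGHT, peeling
# off the three single-token columns and keeping the rest as the name.
rows = []
for line in lines[1:]:
    name, state, violations, penalty = line.rsplit(None, 3)
    rows.append({
        "facility": name,
        "state": state,
        "violations": int(violations),
        "penalty": int(penalty.lstrip("$").replace(",", "")),
    })
```

Even this tidy example needs the right-to-left split to survive a multi-word facility name; a wrapped cell, a merged header, or a two-word state would break it outright. A CSV of the same table would need none of this.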
The following table synthesizes these systemic failures into a coherent framework, providing a clear diagnosis of the pathologies that plague the open data ecosystem.
Anatomy of a Data Debacle: Systemic Threats to Federal Open Data

| Failure Point | Description of the Problem | Real-World Example |
| --- | --- | --- |
| Data Quality | Data is inaccurate, incomplete, outdated, or inconsistent, eroding trust and making analysis unreliable. | The GAO finds persistent data quality and comparability issues in USAspending.gov due to varying agency interpretations of standards.31 |
| Political Interference | Datasets are removed, altered, or made inaccessible for political or ideological reasons, rather than scientific ones. | The EPA’s climate change and environmental justice websites and tools were taken down or altered during the Trump administration.45 |
| Resource Constraints | Agencies lack the funding, staff, and modern IT infrastructure to fully comply with open data mandates. | Agency CIOs report a lack of resources for open data initiatives, which must compete with priorities like cybersecurity.1 |
| Format & Standards | Data is published in non-machine-readable formats (e.g., PDF) or without consistent standards, rendering it difficult to use. | Datasets on Data.gov are often not in open formats or are updated infrequently, limiting their usefulness for analysis.1 |
Part V: Case File: The Fragmented Earth
Nowhere are the tensions of the federal open data system—its promise, its fragmentation, its quality issues, and its ultimate power when synthesized—more apparent than in the realm of environmental and public health data.
This domain serves as a perfect microcosm of the entire system, illustrating how the most valuable insights emerge not from pristine government datasets, but from the difficult work of stitching together a mosaic of flawed and fragmented information.
The Challenge: A Puzzle with Missing Pieces
Environmental governance in the United States is inherently fragmented.
Authority is divided across a complex web of federal agencies (like the EPA and the Department of the Interior), state environmental departments, and local governments.50
This “regulatory fragmentation” means that environmental data is also fragmented, collected by different entities, under different standards, and stored in disconnected silos.
This makes answering the most critical questions—such as linking a specific pollution source to a specific public health outcome—an immense challenge.
Researchers and journalists are confronted with a lack of standardized data, significant gaps in monitoring networks, and the scientific difficulty of assessing the combined effects of multiple pollutants.52
Even within a single agency like the EPA, key databases like the Toxics Release Inventory (TRI) and the Enforcement and Compliance History Online (ECHO) are plagued by known issues, including data migration failures from state partners, reporting delays, and complex violation codes that are difficult for outsiders to interpret.2
This is the “quality quagmire” in action, in a field where the stakes are human health and safety.
The political dimension is starkly visible here as well.
The documented takedown of the EPA’s environmental justice tools directly impeded the public’s ability to assess risk in their communities.45
In another case, the EPA suspended the scientist who created a key greenhouse gas database after he signed a letter critical of the administration, effectively halting updates to the government version and forcing the project into the private sector, where critics warned its credibility could be compromised.55
The Synthesis: Geospatial Analysis as a Superpower
Faced with this fragmented and flawed data landscape, journalists and researchers have turned to a powerful set of tools to create the coherent picture the government fails to provide: geospatial data analysis and Geographic Information Systems (GIS).56
GIS software allows an analyst to layer multiple, disparate datasets onto a single map.
One can take facility locations and emissions data from the EPA’s TRI, overlay it with demographic data on race and income from the U.S. Census Bureau, and add a third layer of public health data on disease prevalence from the Centers for Disease Control and Prevention (CDC).
This act of synthesis creates a new, more powerful dataset from the fragmented raw materials provided by the government.58
This is precisely the technique ProPublica used in its “Sacrifice Zones” investigation to connect the dots between industrial pollution and the communities living in its shadow.30
By visualizing the data spatially, journalists can identify patterns, correlations, and stories of injustice that are completely invisible in isolated spreadsheets or databases.29
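The core "overlay" operation can be sketched without any GIS software at all. The facility, tract centroids, and health figures below are invented, and nearest-centroid matching stands in for the true polygon overlays (with proper map projections) that tools like QGIS or geopandas perform; the point is only to show how three disconnected layers become one record.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Three hypothetical layers: a TRI-style facility, census-style tract
# centroids with demographics, and CDC-style health figures keyed by tract.
facilities = [
    {"name": "Acme Chemical", "lat": 29.76, "lon": -95.37, "emissions_lbs": 54000},
]
tracts = [
    {"tract": "48201A", "lat": 29.75, "lon": -95.36, "median_income": 31000},
    {"tract": "48201B", "lat": 29.90, "lon": -95.10, "median_income": 82000},
]
health = {"48201A": {"asthma_rate": 11.2}, "48201B": {"asthma_rate": 6.1}}

# The "overlay": assign each facility to its nearest tract centroid, then
# join the demographic and health layers onto that tract.
for f in facilities:
    nearest = min(tracts, key=lambda t: haversine_km(f["lat"], f["lon"], t["lat"], t["lon"]))
    picture = {**nearest, **health[nearest["tract"]], "emissions_lbs": f["emissions_lbs"]}
    print(picture)
```

The joined record places heavy emissions next to a low-income tract with elevated asthma rates, a correlation that none of the three source layers could show on its own.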
This case study reveals a profound truth about the open data ecosystem.
Its highest value may not lie in the publication of perfect, individual datasets, but in providing just enough raw material for external actors to perform the synthesis and analysis that government agencies are often unable or unwilling to do themselves.
The system, in effect, shifts the burden of creating coherent insight from the government to a civic-minded public of journalists, academics, and activists.
Open data is therefore not a passive resource to be consumed, but an active, often adversarial, process of reconstruction.
Conclusion: Guarding the Archives
The federal open data ecosystem is not the pristine, finished Library of Alexandria that was once envisioned.
It is a living, contested space—part functioning library, part crumbling archive, and part political battleground.
Its value is immense, but its condition is fragile.
Its shelves hold the potential for profound discovery and democratic accountability, but they are also plagued by neglect, decay, and censorship.
In this flawed reality, citizens cannot afford to be passive consumers of information.
They must become active guardians of the archives.
It is the nation’s data journalists, academic researchers, nonprofit watchdogs, and civic technologists who have become the de facto librarians and conservators of this public good.33
They are the ones finding the hidden information in obscure databases, repairing the broken links through tedious data wrangling, archiving the vulnerable collections before they are deleted, and, most importantly, telling the human stories that give the raw numbers meaning.47
To secure the future of this vital democratic infrastructure, action is required on multiple fronts.
Recommendations
- For Policymakers: The mandates of the OPEN Government Data Act must be fully funded. The authority of agency Chief Data Officers (CDOs) must be strengthened to enforce data quality standards across their organizations. Congress should enact stronger statutory protections to prevent the politically motivated removal of scientific data. Finally, new investments are needed to modernize legacy IT systems and improve data quality at the source, rather than placing the cleanup burden on the public.
- For Journalists and Researchers: Adopt a “trust but verify” approach to all government data; assume it is flawed until proven otherwise. Develop essential skills in data wrangling, cleaning, and geospatial analysis to overcome fragmentation.57 Proactively archive any dataset you rely on using tools like the Internet Archive’s Wayback Machine.45 Foster a collaborative environment to share the immense burden of cleaning and analyzing large, complex government datasets.
- For Citizens: Use the data. Explore portals like USAspending.gov and the College Scorecard. Demand better data from public officials. Support the journalistic and nonprofit organizations that do the hard work of turning data into accountability. Participate in initiatives like the Data Foundation’s #MyDataStory campaign to demonstrate to policymakers the real-world value of these public assets in helping families, businesses, and communities make better decisions.33
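The archiving advice above can be partly automated. The Internet Archive exposes a "Save Page Now" endpoint at web.archive.org/save/ that triggers a fresh capture of a URL; the sketch below builds that capture URL plus a timestamped name for a local copy. The ECHO page is just an example target, and the filename convention is this sketch's own invention.

```python
from datetime import datetime, timezone

def wayback_save_url(url):
    """The Internet Archive's 'Save Page Now' endpoint: fetching the
    returned URL asks the Wayback Machine to capture a fresh snapshot."""
    return "https://web.archive.org/save/" + url

def local_snapshot_name(url, when=None):
    """A timestamped filename for a belt-and-suspenders local copy."""
    when = when or datetime.now(timezone.utc)
    stem = url.split("//", 1)[-1].replace("/", "_")
    return f"{stem}.{when:%Y%m%dT%H%M%SZ}.snapshot"

page = "https://echo.epa.gov/resources/echo-data/known-data-problems"
print(wayback_save_url(page))
print(local_snapshot_name(page))
# With network access, urllib.request.urlopen(wayback_save_url(page))
# would trigger the capture; writing the response body to
# local_snapshot_name(page) keeps an independent local copy as well.
```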
Federal open data is not ultimately a technical project; it is a democratic one.
It is not a finished product but a continuous struggle.
The quality of our public data is a direct reflection of the health of our democracy, and defending the integrity of one is essential to preserving the vitality of the other.
Works cited
1. Open Data and Open Government – CIO Council, accessed August 12, 2025, https://www.cio.gov/assets/resources/sofit/02.03.sofit.open.govt.open.data.pdf
2. Known Data Problems | ECHO | US EPA, accessed August 12, 2025, https://echo.epa.gov/resources/echo-data/known-data-problems
3. What is Data Wrangling? Key Steps & Benefits | Qlik, accessed August 12, 2025, https://www.qlik.com/us/data-management/data-wrangling
4. Data Wrangling: What It Is, Why It Matters & How to Do It – Syracuse University’s iSchool, accessed August 12, 2025, https://ischool.syracuse.edu/data-wrangling/
5. Data.gov | The White House, accessed August 12, 2025, https://obamawhitehouse.archives.gov/21stcenturygov/tools/data-gov
6. U.S. Federal Open Data Policy, accessed August 12, 2025, https://opengovdata.io/2014/us-federal-open-data-policy/
7. Open data for official statistics: History, principles, and implementation, accessed August 12, 2025, https://opendatawatch.com/wp-content/uploads/2021/Publications/Open-Data-for-Official-Statistics-History-Principles-Implementation.pdf
8. Starting an Open Data Initiative – Open Government Data Toolkit – World Bank, accessed August 12, 2025, https://opendatatoolkit.worldbank.org/en/data/opendatatoolkit/starting
9. Open Data Policy Guidelines – Sunlight Foundation, accessed August 12, 2025, https://sunlightfoundation.com/opendataguidelines/
10. Open data 101: The history and principles of open data – Part 1 – Apolitical, accessed August 12, 2025, https://apolitical.co/solution-articles/en/open-data-101-the-history-and-principles-of-open-data-part-1
11. Open Data Policies | US EPA, accessed August 12, 2025, https://www.epa.gov/data/open-data-policies
12. Data.gov – Wikipedia, accessed August 12, 2025, https://en.wikipedia.org/wiki/Data.gov
13. FS Policy 901-1, Open Data Policy, accessed August 12, 2025, https://fiscaldata.treasury.gov/data/about-us/901-1%20Open%20Data%20Policy.pdf
14. Open Data Policy | National Archives, accessed August 12, 2025, https://www.archives.gov/data
15. Open Government – Data.gov, accessed August 12, 2025, https://data.gov/open-gov/
16. OPEN Government Data Act – GovInfo, accessed August 12, 2025, https://www.govinfo.gov/content/pkg/PLAW-115publ435/html/PLAW-115publ435.htm
- OPEN Government Data Act | LEARN – Data Foundation, accessed August 12, 2025, https://datafoundation.org/news/key-laws-open-data/118/118-OPEN-Government-Data-Act-
- Passed into Law: OPEN Government Data Act (S. 760 / H.R. 1770) – SPARC, accessed August 12, 2025, https://sparcopen.org/our-work/open-government-data-act/
- S.760 – OPEN Government Data Act 115th Congress (2017-2018), accessed August 12, 2025, https://www.congress.gov/bill/115th-congress/senate-bill/760
- Open Data: Additional Action Required for Full Public Access | U.S. GAO, accessed August 12, 2025, https://www.gao.gov/products/gao-22-104574
- H.R.1770 – 115th Congress (2017-2018): To expand the Government’s use and administration of data to facilitate transparency, effective governance, and innovation, and for other purposes., accessed August 12, 2025, https://www.congress.gov/bill/115th-congress/house-bill/1770
- Open Data Plan | GSA, accessed August 12, 2025, https://www.gsa.gov/governmentwide-initiatives/open-gsa/open-data-plan
- Data.gov Home – Data.gov, accessed August 12, 2025, https://data.gov/
- data.gov, accessed August 12, 2025, https://data.gov/#:~:text=The%20United%20States%20Government’s%20open,an%20open%20and%20transparent%20government.
- About Us – Data.gov, accessed August 12, 2025, https://data.gov/about/
- User Guide – Data.gov, accessed August 12, 2025, https://data.gov/user-guide/
- Federal Data Strategy: Welcome, accessed August 12, 2025, https://strategy.data.gov/
- Practices – Federal Data Strategy, accessed August 12, 2025, https://strategy.data.gov/practices/
- The Importance of Accessible Government Data in Advancing Environmental Justice – Scholarship Repository, accessed August 12, 2025, https://scholarship.law.wm.edu/cgi/viewcontent.cgi?article=1855&context=wmelpr
- We’re Releasing the Data Behind Our Toxic Air Analysis – ProPublica, accessed August 12, 2025, https://www.propublica.org/article/were-releasing-the-data-behind-our-toxic-air-analysis
- Data Act: Quality of Data Submissions Has Improved but Further Action Is Needed to Disclose Known Data Limitations – GAO, accessed August 12, 2025, https://www.gao.gov/products/gao-20-75
- USAspending: Government Spending Open Data, accessed August 12, 2025, https://www.usaspending.gov/
- MyDataStory – Data Foundation, accessed August 12, 2025, https://datafoundation.org/pages/mydatastory
- Open Data’s Impact – The GovLab, accessed August 12, 2025, https://odimpact.org/
- Open Data: Empowering Americans to Make Data-Driven Decisions | whitehouse.gov, accessed August 12, 2025, https://obamawhitehouse.archives.gov/blog/2016/02/05/open-data-empowering-americans-make-data-driven-decisions
- cloud.gov Pages – Success Stories, accessed August 12, 2025, https://cloud.gov/pages/success-stories/
- Five Examples of How Federal Agencies Use Big Data, accessed August 12, 2025, https://www.businessofgovernment.org/blog/five-examples-how-federal-agencies-use-big-data
- Global Impact – Open Data’s Impact, accessed August 12, 2025, https://odimpact.org/key-findings.html
- OPEN DATA IMPACT WHEN DEMAND AND SUPPLY MEET – The Governance Lab, accessed August 12, 2025, https://thegovlab.org/static/files/publications/open-data-impact-key-findings.pdf
- Crime/Law Enforcement Stats (UCR Program) – FBI, accessed August 12, 2025, https://www.fbi.gov/how-we-can-help-you/more-fbi-services-and-information/ucr
- Data science metaphors? : r/datascience – Reddit, accessed August 12, 2025, https://www.reddit.com/r/datascience/comments/1lvsh3e/data_science_metaphors/
- Open Data Explained: Benefits, Challenges, and Industry Use Cases – Acceldata, accessed August 12, 2025, https://www.acceldata.io/blog/open-data-explained-benefits-challenges-and-industry-use-cases
- A Crisis of Trust in Federal Data | Boston Indicators, accessed August 12, 2025, https://www.bostonindicators.org/article-pages/2025/march/federal-data-brief
- The US government data purge is a loss for policymaking and research | Brookings, accessed August 12, 2025, https://www.brookings.edu/articles/the-us-government-data-purge-is-a-loss-for-policymaking-and-research/
- Vanishing public data: How journalists can fight back – Journalism Institute, accessed August 12, 2025, https://www.pressclubinstitute.org/2025/07/22/vanishing-public-data-how-journalists-can-fight-back/
- How the Loss of Public Data Compromises Scientific Integrity | AcademyHealth, accessed August 12, 2025, https://academyhealth.org/blog/2025-04/how-loss-public-data-compromises-scientific-integrity
- How to Safeguard Data as Federal Databases Disappear – Education Writers Association, accessed August 12, 2025, https://ewa.org/educated-reporter/how-to-safeguard-data-as-federal-databases-disappear
- Overcoming the Chaos of Data Connectivity in Government – Database Trends and Applications, accessed August 12, 2025, https://www.dbta.com/Editorial/Trends-and-Applications/Overcoming-the-Chaos-of-Data-Connectivity-in-Government-162515.aspx
- White House finalizes OPEN Government Data Act guidance, restarts CDO Council, accessed August 12, 2025, https://fedscoop.com/white-house-open-government-data-act-restarts-cdo-council/
- Fragmented water quality governance – Rissman Research Group, accessed August 12, 2025, https://rissman.russell.wisc.edu/wp-content/uploads/sites/281/2011/12/Wardropper-et-al.-2015-Fragmented-water-quality-governance-mapping-LUP.pdf
- Regulatory Fragmentation | Kate Volkova, accessed August 12, 2025, https://www.evolkova.info/research/fragmentation/
- Research on Health Effects from Air Pollution | US EPA, accessed August 12, 2025, https://www.epa.gov/air-research/research-health-effects-air-pollution
- Gaps and future directions in research on health effects of air pollution – PMC, accessed August 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10363432/
- Air pollution and public health: emerging hazards and improved understanding of risk – PMC, accessed August 12, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC4516868/
- EPA halts updates to top greenhouse gas database after scientist’s suspension – EHN, accessed August 12, 2025, https://www.ehn.org/epa-halts-updates-to-top-greenhouse-gas-database-after-scientists-suspension
- Geospatial ESG – WWF Sight, accessed August 12, 2025, https://wwf-sight.org/geospatial-esg/
- What is Geospatial Data? | IBM, accessed August 12, 2025, https://www.ibm.com/think/topics/geospatial-data
- Geospatial Visualization of Environmental Data – EDM – ITRC, accessed August 12, 2025, https://edm-1.itrcweb.org/geospatial-visualization-of-environmental-data/
- Environmental Management with GIS & Spatial Analysis – CARTO, accessed August 12, 2025, https://carto.com/solutions/environmental-management
- About – Oxpeckers Investigative Environmental Journalism, accessed August 12, 2025, https://oxpeckers.org/about/
- The Uproot Project Database, accessed August 12, 2025, https://uprootproject.org/members/
- Data Journalist | Web – Join The Michigan Daily, accessed August 12, 2025, https://join.michigandaily.com/web/data-journalist/