This story is part of a collaboration with The Texas Observer, with support from the Pulitzer Center.

In Texas and New Mexico, state regulators are already on the hook for cleaning up more than 7,000 orphaned oil and gas wells, which can leak contaminants into nearby air and water and even become significant sources of methane emissions — contributing to climate change even though they’re no longer producing fuel.  

Our reporting shows that, in these two major oil-producing states, the true scale of oil well abandonment is likely far greater than the official numbers: Approximately 12,000 Texas wells are nearly statistically indistinguishable from the more than 6,000 already on the state’s rolls. Those additional wells are likely to become officially abandoned within the next four years, tripling the state’s current cleanup cost estimate. All in all, the Lone Star State could be looking at a bill of just under $1 billion. And in New Mexico, about 421 wells appear to be no different than the state’s 687 already-orphaned wells.

To estimate these figures, Grist and the Texas Observer conducted statistical modeling to understand the difference between the wells that the states currently consider to be merely inactive — that is, no longer producing oil or gas but likely to be revived by their operators in the future — and wells they consider abandoned or orphaned, which will likely require public funds to clean up.

Many inactive wells on the states’ lists have not produced oil or gas for years, and we wanted to determine whether any of these wells should be considered abandoned — in other words, whether they have characteristics nearly identical to “officially” abandoned wells but are not considered abandoned because they do not meet states’ formal definitions. To do that, we enlisted machine learning.

Here’s what that means from a technical standpoint: Statistically, we considered the problem of differentiating between inactive wells and abandoned wells to be a classification problem with a partially mislabeled training set. We assumed that inactive wells and abandoned wells would be largely separable based on identifiable characteristics, but upon training a machine classifier, we expected a small proportion of wells labeled by the states as “inactive” to be statistically indistinguishable from wells they’ve labeled as “abandoned.”

[Interactive figure: a limited version of our statistical model of Texas oil wells, with adjustable inputs for well age (years), time inactive (months), age of operator’s business (years), number of enforcement violations, cleanup deposit paid, and relative price of oil/gas*. All variables not seen here have been set to their mean values. (*This variable, in standard deviation units, represents the difference between oil/gas prices and their dataset averages.) Credit: Clayton Aldern / Grist]

To build our datasets, we joined the states’ official inactive well lists (a subset of which are already considered abandoned) to a set of data on wells and well operators gathered from agency, state, and federal sources. To collect this information, we filed public records requests with the Texas Railroad Commission and the New Mexico Oil Conservation Division and queried public databases from the Texas Workforce Commission, Texas Real Estate Research Center, New Mexico Workforce Connection, and the Federal Reserve. Each row in the final datasets (roughly 100,000 rows for Texas and 4,500 rows for New Mexico) represents a unique inactive well, characterized by variables like its type (oil or gas), depth, age, oversight district, projected plugging cost, and history of operation, inspections, and violations. To help determine the likelihood of abandonment, we also considered aggregate control variables like interest rates, county populations, unemployment rates, oil and gas firm counts, and normalized gas and oil prices at the initial date of well inactivity. Because variable availability differed between the two states (enforcement data were easier to come by in Texas, for example), we built a separate model for each state.
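As a rough illustration of what a price variable "in standard deviation units" means, the sketch below converts a handful of prices into distances from their average. The prices are invented, the function name `standardize` is ours, and using the population standard deviation is an assumption; the project's actual normalization lives in its published code.

```python
from statistics import mean, pstdev

def standardize(values):
    """Express each value as its distance from the mean, in standard deviation units."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]

# Hypothetical oil prices (dollars per barrel) on the dates five wells went inactive.
prices_at_inactivity = [40.0, 50.0, 60.0, 70.0, 80.0]
relative_prices = standardize(prices_at_inactivity)

for price, z in zip(prices_at_inactivity, relative_prices):
    print(f"${price:.0f} per barrel -> {z:+.2f} standard deviations from average")
```

A well that went inactive when prices sat exactly at the dataset average gets a value of zero; negative values mark wells that stopped producing when prices were below average.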

To begin to identify the characteristics that separate inactive and abandoned wells into distinct classes of data, we conducted a series of initial exploratory analyses. We found that, relative to inactive wells, abandoned wells tend to be shallower and have more registered violations with enforcement agencies. They have also been inactive for more time. Their operators tend to be younger organizations with less money committed to the state for future cleanup costs.

After building a series of logistic regression models to understand variance both within and between counties and districts, we constructed final state models using the LASSO, a penalized form of logistic regression that shrinks the coefficients of unimportant variables to zero during fitting, effectively eliminating them from the model. In a portion of the Texas dataset that was held out from model fitting, our final model specification correctly identified abandoned wells approximately 90 percent of the time and inactive wells about 86 percent of the time. In New Mexico, the analogous figures were 94 percent and 90 percent, respectively. In other words, our model was able to identify wells that the states would slot into the “abandoned” and “inactive” categories with a high degree of accuracy.
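For readers who want to see the mechanics, here is a minimal sketch of LASSO-style logistic regression and the two accuracy figures quoted above (the share of abandoned wells caught and the share of inactive wells caught). It is not the project's code, which was written in R. The fitting routine (proximal gradient descent), the penalty strength, and the tiny two-variable dataset are all assumptions made for illustration; the second variable is deliberately constructed to carry no information.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(value, threshold):
    """L1 proximal step: shrink toward zero and clip small values to exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(X, y, lam=0.05, step=0.5, n_iter=500):
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is left unpenalized
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict(w, b, xi):
    """Return 1 (abandoned) or 0 (inactive) using a 0.5 probability cutoff."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0

def sensitivity_specificity(w, b, X, y):
    """Share of true abandoned wells caught, and share of true inactive wells caught."""
    preds = [predict(w, b, xi) for xi in X]
    tp = sum(1 for p_, t in zip(preds, y) if p_ == 1 and t == 1)
    tn = sum(1 for p_, t in zip(preds, y) if p_ == 0 and t == 0)
    return tp / sum(y), tn / (len(y) - sum(y))

# Hypothetical training data. Column 0 stands in for a standardized, informative
# variable (such as time inactive); column 1 is balanced filler with no signal.
train_X, train_y = [], []
for signal in [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]:
    for filler in (-1.0, 1.0):
        train_X.append([signal, filler])
        train_y.append(1 if signal > 0 else 0)
for signal, label in [(-0.5, 1), (0.5, 0)]:  # a few wells that defy the pattern
    for filler in (-1.0, 1.0):
        train_X.append([signal, filler])
        train_y.append(label)

# Hypothetical held-out wells the model never saw during fitting.
test_X = [[0.75, 1.0], [1.25, -1.0], [1.75, 1.0], [-0.25, -1.0],
          [-0.75, 1.0], [-1.25, -1.0], [-1.75, 1.0], [-2.0, -1.0]]
test_y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, intercept = fit_lasso_logistic(train_X, train_y)
sens, spec = sensitivity_specificity(weights, intercept, test_X, test_y)
print("weights:", weights)
print(f"abandoned wells identified: {sens:.0%}; inactive wells identified: {spec:.0%}")
```

The penalty drives the filler variable's weight to exactly zero while keeping the informative one, which is the sense in which the LASSO drops unimportant variables. Scoring on wells held out from fitting is what makes the two percentages a fair test.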

Given these encouraging results, we turned our attention to the question of misclassification — figuring out which wells the model considered abandoned, but the state did not — and whether the errors made sense. We ran our entire dataset through the same routines and then examined the classification responses. Of the approximately 100,000 oil and gas wells in Texas, the model categorized about 12,000 wells the state considers to be inactive as abandoned, in addition to the roughly 6,000 that both the state and model agree are abandoned. In New Mexico, the model categorized about 400 inactive wells as abandoned, in addition to the state’s approximately 680.
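The comparison described above amounts to cross-tabulating two labels for every well and pulling out one cell of the table. A small sketch, with invented well identifiers and labels standing in for the real data:

```python
from collections import Counter

# Hypothetical wells: identifiers and labels are invented for illustration.
wells = [
    {"api": "well-001", "state_label": "abandoned", "model_label": "abandoned"},
    {"api": "well-002", "state_label": "inactive", "model_label": "abandoned"},
    {"api": "well-003", "state_label": "inactive", "model_label": "inactive"},
    {"api": "well-004", "state_label": "inactive", "model_label": "abandoned"},
    {"api": "well-005", "state_label": "abandoned", "model_label": "inactive"},
]

def cross_tabulate(records):
    """Count wells by (state label, model label) pair."""
    return Counter((r["state_label"], r["model_label"]) for r in records)

def flag_likely_abandoned(records):
    """Wells the state calls inactive but the model classifies as abandoned."""
    return [
        r["api"]
        for r in records
        if r["state_label"] == "inactive" and r["model_label"] == "abandoned"
    ]

counts = cross_tabulate(wells)
flagged = flag_likely_abandoned(wells)
print(counts)
print("flagged for a closer look:", flagged)
```

In the article's terms, the flagged list corresponds to the roughly 12,000 Texas wells, and the count where both labels read "abandoned" corresponds to the roughly 6,000 on which state and model agree.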

After conducting a series of additional statistical tests to understand the distributional differences between the misclassified wells and the state-labeled inactive and abandoned wells, we concluded that the wells misclassified by the model had characteristics that were by and large indistinguishable from wells the states already consider abandoned, except along a few very specific dimensions. Most notably, these wells have been inactive for less time than the states’ official abandoned wells, leading us to hypothesize that the only reason these wells do not appear on the abandoned well lists is that their operators have not yet satisfied the states’ delinquency criteria or declared bankruptcy. In other words, we hypothesize that the 12,000 additional Texas wells identified by the model as abandoned represent wells that are likely to be abandoned in the near future.

To estimate when the 12,000 wells may end up in the state’s hands, we compared how long each group has been inactive: The median difference in length of inactivity between the wells on the state’s official list and those identified by the model is about 50 months, or a little over four years. It’s fair to say, then, that we might expect those 12,000 wells to be abandoned within the next four years. (A naïve survival analysis of existing operators with inactive wells supported this theory; about 10 to 12 percent of past operators became delinquent within 50 months.)
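The arithmetic behind the timing estimate can be sketched in a few lines. The inactivity figures below are invented so that the gap comes out at 50 months, and reading "median difference" as the difference between the two groups' medians is our assumption.

```python
from statistics import median

def median_gap_months(abandoned_months, flagged_months):
    """Difference between the two groups' median months of inactivity."""
    return median(abandoned_months) - median(flagged_months)

# Hypothetical months of inactivity for each group of wells.
state_abandoned = [90, 110, 120, 150, 200]
model_flagged = [50, 60, 70, 80, 100]

gap = median_gap_months(state_abandoned, model_flagged)
print(f"median gap: {gap} months, or about {gap / 12:.1f} years")
```

If the flagged wells are simply earlier on the same path, the gap is a rough guide to how long it may take them to reach the state's list.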

To forecast the public costs of these potential future abandonments, we used the wells’ American Petroleum Institute identification numbers to perform lookups in state databases of plugging cost projections. While numerous independent analyses have indicated that plugging costs calculated by regulators are often an underestimate, we relied on the states’ numbers in order to arrive at a conservative estimate.
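The cost forecast is, at bottom, a lookup followed by a sum. A minimal sketch, assuming a table of projected plugging costs keyed by well identifier; the identifiers and dollar amounts are invented, and wells missing from the table are reported rather than guessed at.

```python
def total_plugging_cost(api_numbers, cost_by_api):
    """Sum projected plugging costs for the given wells and list any without a projection."""
    found = [api for api in api_numbers if api in cost_by_api]
    missing = [api for api in api_numbers if api not in cost_by_api]
    return sum(cost_by_api[api] for api in found), missing

# Hypothetical regulator cost projections, in dollars.
cost_by_api = {"well-002": 30000.0, "well-004": 45000.0, "well-007": 20000.0}
flagged_wells = ["well-002", "well-004", "well-009"]

total, missing = total_plugging_cost(flagged_wells, cost_by_api)
print(f"projected cleanup cost: ${total:,.0f}; no projection for {missing}")
```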

In a separate analysis, we also found that abandonment rates appear to be sensitive to oil prices, with annual rates rising dramatically when prices drop to about $50 per barrel and lower. In particular, to understand the tipping points below which abandonment rates might rise, we grouped state-level abandonment rates by year and compared them to the average oil price at the initial date the wells in question stopped producing (adjusted for inflation to 2019 dollars). Using the “segmented” package of the programming language R, we performed a “broken-line” analysis to construct a piecewise linear regression of abandonment rate onto oil price.
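To show what a broken-line fit does, the sketch below searches a grid of candidate break prices and, for each, fits two connected line segments by least squares, keeping the break with the smallest error. This is a simplified stand-in written in Python, not the estimation procedure of the R "segmented" package, and the prices and abandonment rates are invented and noise-free, with a kink placed at $50 per barrel.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_with_break(xs, ys, break_at):
    """Least-squares fit of y = a + b*x + d*max(x - break_at, 0); returns (coefficients, squared error)."""
    rows = [[1.0, x, max(x - break_at, 0.0)] for x in xs]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coef = solve_linear(XtX, Xty)
    sse = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2 for r, y in zip(rows, ys))
    return coef, sse

def broken_line_fit(xs, ys, candidates):
    """Pick the candidate break with the smallest squared error."""
    best = min(candidates, key=lambda c: fit_with_break(xs, ys, c)[1])
    return best, fit_with_break(xs, ys, best)[0]

# Hypothetical data: abandonment rate (percent) climbs as price falls below $50.
prices = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
rates = [2.0 + 0.3 * max(50.0 - p, 0.0) for p in prices]

candidates = [30.0 + 5.0 * k for k in range(13)]  # $30 to $90 in $5 steps
break_price, coef = broken_line_fit(prices, rates, candidates)
print(f"estimated break: ${break_price:.0f} per barrel")
print(f"slope below break: {coef[1]:.2f}; slope above break: {coef[1] + coef[2]:.2f}")
```

The slope below the break measures how quickly abandonment rises as prices fall; the slope above it measures how little price matters once oil is expensive enough. Real data are noisy, so an estimated break comes with uncertainty that this sketch does not quantify.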

The modeling methodology was independently reviewed by Lucas Merrill Brown, a software engineer now with the United States Digital Service, and Daniel Grzenda, a staff data scientist at the University of Chicago. Data, code, and an R Markdown file are available at our GitHub repository.

Interviews with state regulators suggest that our model is identifying oil and gas companies that are likely to abdicate their environmental responsibilities. The model identified about two dozen operators that were facing either lawsuits or enforcement action from the New Mexico State Land Office, which oversees oil and gas activity in the state. “A lot of these names are familiar to us,” said Ari Biernoff, general counsel at the land office. “Some of them are not, and that’s very intriguing and something that we need to take a closer look at.”

This story is part of the project “Waves of Abandonment,” a collaboration between Grist and The Texas Observer, an Austin-based nonprofit news organization that strives to make Texas a more equitable place by exposing injustice through investigative journalism, narrative storytelling, and cultural coverage. The project was supported by the Pulitzer Center.