This is a copy of an old post from my personal blog, placed here so that data posts will be all in one place.
This is the second post in a series on global warming data, about the basics of U.S. EPA's eGRID database.
eGRID's home page claims that it is “the preeminent source of air emissions data for the electric power sector,” and as far as the U.S. is concerned, that is probably true. It contains air emissions data for nitrogen oxides (NOx) and sulfur dioxide (SO2), which are of concern because they contribute to ground-level smog and acid rain. It contains data on emissions of mercury, a persistent bioaccumulative toxic. And it contains data on emissions of the greenhouse gases carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O). It also has information on how much power is generated, and how much fuel of each type is used, so that you can see how efficient each plant is.
eGRID is an odd database in that it's not a data collection; no one ever fills out a form to report their emissions to eGRID. Instead, it's a combination of data from various data collections, together with model estimates. Most of the data that go into eGRID were originally collected through a scatter of databases held by EPA and the Department of Energy. For the last decade or more, it's been very difficult for EPA to get any new, major data collections, so information has to be cobbled together from a number of sources, none of them designed to exactly address the problem.
One of the advantages of a yearly data collection is that it has to be released every year. The primary disadvantage of eGRID, in the past, was that it came out irregularly, and by the time it came out it sometimes used old versions of the data sources that it drew from. For instance, it's been released about once a year since 1998, except for a gap between May 2003 and Dec 2006. The Dept. of Energy databases that it draws from currently seem to be available up through 2006, and eGRID only has data through 2005. Still, a version has just been released – as of October 2008 – and that makes it up-to-date enough for all but the most picky and expert uses.
One of the large advantages of using eGRID is that some data quality work has been done to match the various databases together. I had to do that matching myself once, for an environmental group's report for which we couldn't use eGRID data, and it's something that you don't want to do unless you have no other choice. Even more important, it updates all plant ownership, parent company, merger data and so on to a single date: December 31 2007 in this case. Electric utilities try all sorts of tricks to confuse their paper trail or to take advantage of regulatory exemptions or make financial maneuvers; there has been a lot of buying and selling of power plants among various entities. Making sure that all of that is updated to a single date is a significant advance. What this means is that, for instance, a power plant that last reported in 2005 will be listed in eGRID as being owned by whichever company owned it on Dec 31 2007, not by whatever company owned it in 2005.
eGRID is used in all sorts of regulatory initiatives, for environmental disclosure, and in governmental and nonprofit electricity-information Web sites such as Power Profiler, Power Scorecard, or CARMA.  If you have a casual interest in your local electric power, you're probably better off with one of those.  But it's good for some people to look at eGRID, because more information is available through it directly, and because it sets the baseline that so many people work from.
There are a couple of reasons why eGRID may not be the best source for generally tracking electricity, as opposed to tracking sources of emissions due to electricity generation.  For one thing, it doesn't include any purchases of power, e.g. from Canada.  For another, the net generation amounts that it reports subtract generation used by the power plant itself, but don't take transmission and distribution losses into account, so the electricity that people actually use will have a lower efficiency with respect to emissions than is reported in eGRID.
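The adjustment for line losses is simple arithmetic. Here is a minimal sketch, assuming you have a plant-level emission rate in pounds per MWh and a loss fraction that you supply yourself. The function name is made up for illustration, and the 7% loss figure is only a placeholder, not a number from eGRID:

```python
def delivered_emission_rate(plant_rate_lb_per_mwh, loss_fraction):
    """Emission rate per MWh that actually reaches the customer."""
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be at least 0 and less than 1")
    # Each MWh delivered requires 1 / (1 - loss) MWh of net generation,
    # so the emissions per delivered MWh go up by the same factor.
    return plant_rate_lb_per_mwh / (1.0 - loss_fraction)


# Illustrative inputs only (assumed values, not taken from eGRID).
example_rate = delivered_emission_rate(2000.0, 0.07)
print(round(example_rate, 1))
```

The point is just that the rate at the wall socket is always somewhat higher than the rate at the plant fence, and the gap grows as losses grow.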
So how do you use eGRID? It's really just a set of three Excel files, so all you do is download them and open them on your computer – you can use OpenOffice. The most basic file holds information for each generating plant, and for subunits within plants. A second file, the aggregation file, adds things up – it combines individual plants into totals by state, owner, operator, parent company, grid, and for the whole U.S. The aggregation file has almost all of the same data fields as the plant file, so once you learn one of them, you learn the other. The third of the files is for state imports and exports, and you can probably ignore it.
(Note, though, that the aggregation file handles parent companies badly, in my opinion.  The people who made eGRID considered a parent company to be a holding company, not whatever company ultimately controls the plant, including the plant itself if there is no other owner.  Therefore, some plants in eGRID don't have parent companies.  That means that the parent company file, unlike the other aggregations, doesn't add up to the total of the individual plants.  I may try to get the people who make eGRID to change this, put in a parent company for every plant, and indicate whether a parent company is a holding company or not with some kind of data field.)
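To make the mismatch concrete, here is a small sketch using the six plants from the table at the end of this post, where W A Parish has no parent company listed. The grouping logic is my own guess at how such a gap arises, not EPA's actual aggregation procedure, and the function name is hypothetical:

```python
from collections import defaultdict

# (plant name, parent company, 2005 CO2 tons), copied from the table in this post.
PLANTS = [
    ("Scherer", "Southern Co", 26040793.5),
    ("James H Miller Jr", "Southern Co", 22509466.8),
    ("Bowen", "Southern Co", 22156373.7),
    ("Gibson", "Duke Energy", 21746394.3),
    ("Martin Lake", "Energy Future Holdings (TXU)", 21593119.5),
    ("W A Parish", "", 20703129.9),  # no parent company listed
]


def totals_by_parent(plants):
    """Sum CO2 by parent company, skipping plants with a blank parent."""
    totals = defaultdict(float)
    for _name, parent, co2_tons in plants:
        if parent:
            totals[parent] += co2_tons
    return dict(totals)


parent_totals = totals_by_parent(PLANTS)
plant_total = sum(co2_tons for _name, _parent, co2_tons in PLANTS)
missing = plant_total - sum(parent_totals.values())
print(f"Tons not covered by any parent company: {missing:,.1f}")
```

The tons that go missing are exactly those of the plants with a blank parent company field, which is why a parent company entry for every plant would make the totals add up.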
But the plant file is probably the most useful. EPA doesn't like to release information about individual plants, or companies, within its general summary documents, which are all that most people see if they see anything. It likes to release numbers about states, regions, industries, and so on, but saying that specific company ABC is responsible for x percent of pollution? You'll very rarely see that from EPA. So you'll have to dig it out for yourself.
The plant file contains sheets on generators and boilers: components of plants. Most users will probably skip those, although it's worth noting that they include the years when the equipment went into service, which can be important for some things. But you'll probably want the information on plants themselves. There are about 5,000 of them. You can look at the eGRID technical docs for explanations of the data elements.
What are some of the more useful data elements? Well, for the purpose of global warming, I'll look at CO2, ignoring methane and N2O for now. That's “plant annual CO2 emissions (tons)”, or PLCO2AN. A quick descending sort of the sheet by that field, and the top plant is the Scherer plant in Georgia, whose parent company is Southern Co. With 26 million tons of CO2, that's one percent of the total CO2 emissions for the whole database right there. There are only 68 plants that emitted more than 10 million tons. Those 68 plants account for 36% of the total emissions from electricity generation. That's about 12% of the total U.S. CO2 emissions from all sources, including cars, industry, houses, and residential electricity used by those light bulbs that people are always telling you to change whenever you say that we need to do something about global warming.
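If you'd rather script the sort than do it in the spreadsheet, something like the following would work. It's only a sketch: it assumes you've exported the plant sheet to CSV, and the column names PSTATABB and PNAME are my assumptions (only PLCO2AN is named above). The sample rows are the six plants from the table at the end of this post, so the share it prints is a share of those six plants, not of the whole database:

```python
import csv
import io

# Stand-in for a CSV export of the plant sheet (six rows from this post's table).
SAMPLE_CSV = """PSTATABB,PNAME,PLCO2AN
TX,Martin Lake,21593119.5
GA,Scherer,26040793.5
IN,Gibson,21746394.3
GA,Bowen,22156373.7
TX,W A Parish,20703129.9
AL,James H Miller Jr,22509466.8
"""


def load_plants(csv_file):
    """Read plant rows and convert the CO2 column to a number."""
    rows = []
    for row in csv.DictReader(csv_file):
        row["PLCO2AN"] = float(row["PLCO2AN"])
        rows.append(row)
    return rows


def rank_by_co2(plants):
    """Descending sort by annual CO2 tons."""
    return sorted(plants, key=lambda p: p["PLCO2AN"], reverse=True)


def share_above(plants, threshold_tons):
    """Count plants above the threshold and their share of the total."""
    total = sum(p["PLCO2AN"] for p in plants)
    big = [p for p in plants if p["PLCO2AN"] > threshold_tons]
    return len(big), sum(p["PLCO2AN"] for p in big) / total


plants = rank_by_co2(load_plants(io.StringIO(SAMPLE_CSV)))
count, share = share_above(plants, 22_000_000)
print(plants[0]["PNAME"], count, round(share, 3))
```

With the real file you would pass an open file handle to `load_plants` and use a threshold of 10 million tons; a real export will probably also have blank cells that need handling before the `float` conversion.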
But those plants generate electricity too, of course.  How much?  Well, there the whole thing is complicated by the fact that a single power plant might generate electricity from a wide range of fuels.  So just totaling up all the electricity from those plants is going to be a bit off.  But I can total up the net generation from combustion sources for them.  It's 31% of total U.S. generation from combustion sources – we're getting 36% of the CO2 for 31% of the power from combustion.  It's 22% of our power from all sources.
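Those percentages can be turned into a rough comparison. This is just arithmetic on the rounded figures quoted above, so treat the results as approximate; the function name is made up for illustration:

```python
def relative_intensity(share_of_emissions, share_of_generation):
    """Ratio of a group's emission rate to the average rate of the whole set.

    (E_group / E_all) / (G_group / G_all) equals
    (E_group / G_group) / (E_all / G_all), so a ratio of shares is
    also a ratio of emission rates.
    """
    return share_of_emissions / share_of_generation


# The 68 largest plants: 36% of the CO2 for 31% of combustion generation.
top68_vs_combustion = relative_intensity(36.0, 31.0)

# If 36% of electricity emissions is about 12% of all U.S. CO2,
# electricity generation as a whole is roughly this fraction of U.S. CO2.
electricity_share_of_us_co2 = 12.0 / 36.0

print(round(top68_vs_combustion, 2), round(electricity_share_of_us_co2, 2))
```

In other words, these plants put out something like 16% more CO2 per unit of combustion generation than combustion plants do on average, and the figures imply that electricity generation is about a third of total U.S. CO2 emissions.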
What I'd like to see for these top plants is how efficient they are in burning coal.  Coal is worse, from a CO2 standpoint, than natural gas, and coal burning efficiency varies by the equipment and the grade of coal used.  But I can't quite see how to do it.  The database includes an efficiency number that divides emissions of CO2 by the net generation from all combustion sources, but that includes oil and gas as well as coal.  There's a net generation only from coal number, but there doesn't appear to be a CO2 emissions only from coal number, so I don't see how to figure out an emissions rate that includes only coal in both the numerator and denominator.  Perhaps I could get it by digging into the boiler and generator data – but this post is too long as it is.
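Here is a sketch of why the mismatch matters. It assumes CO2 is reported in short tons (2,000 lb), and the plant figures are invented round numbers for illustration, not eGRID data; the dictionary keys are hypothetical as well:

```python
LB_PER_SHORT_TON = 2000.0  # assumes the CO2 field is in short tons


def emission_rate_lb_per_mwh(co2_tons, net_generation_mwh):
    """Pounds of CO2 emitted per MWh of net generation."""
    if net_generation_mwh <= 0:
        raise ValueError("net generation must be positive")
    return co2_tons * LB_PER_SHORT_TON / net_generation_mwh


# Invented plant that burns mostly coal plus some oil and gas.
plant = {
    "co2_tons": 10_000_000.0,        # CO2 from all fuels combined
    "combustion_mwh": 10_000_000.0,  # net generation, all combustion fuels
    "coal_mwh": 9_000_000.0,         # net generation from coal only
}

# Consistent: all-fuel CO2 over all-combustion generation.
combustion_rate = emission_rate_lb_per_mwh(plant["co2_tons"], plant["combustion_mwh"])

# Inconsistent: all-fuel CO2 over coal-only generation overstates the coal rate,
# because the numerator still includes CO2 from oil and gas.
naive_coal_rate = emission_rate_lb_per_mwh(plant["co2_tons"], plant["coal_mwh"])

print(round(combustion_rate, 1), round(naive_coal_rate, 1))
```

A true coal-only rate would need coal-only CO2 in the numerator, which is the number that doesn't appear to be in the plant sheet.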
So, finally, here's a table of the 6 largest plants for 2005 for CO2 emissions, those with more than 20 million tons.  You could get these yourself through the eGRID tables, but I might as well list them here for Google indexing purposes:
Top U.S. CO2 Emitting Electric Power Plants, 2005
| State | Plant name | Plant operator | Parent company | 2005 CO2 tons |
| --- | --- | --- | --- | ---: |
| GA | Scherer | Georgia Power Co | Southern Co | 26,040,793.5 | 
| AL | James H Miller Jr | Alabama Power Co | Southern Co | 22,509,466.8 | 
| GA | Bowen | Georgia Power Co | Southern Co | 22,156,373.7 | 
| IN | Gibson | Duke Indiana Inc | Duke Energy | 21,746,394.3 | 
| TX | Martin Lake | TXU Generation Co LP | Energy Future Holdings (TXU) | 21,593,119.5 | 
| TX | W A Parish | NRG Energy |  | 20,703,129.9 |