The Cato Institute’s “Deepbills” project has added semantically rich XML markup to every version of every bill in the 113th Congress. All of ‘em. That’s a pretty big accomplishment considering that there were 10,637 bills introduced in the last two-year meeting of our national legislature. Many of those bills were huge and many of them had multiple versions.
What is “semantically rich XML markup,” you ask? Basically, we’ve embedded codes into the text of each bill that make it easy for computers to automatically find key things in them.
CatoXML makes it easy for computers to find the references bills make to federal agencies and existing laws — and also to find the spending proposals in those bills. That gives you a better idea of what the bills do and who might be interested in them. It makes it easier for coders to build things that improve public understanding of what the bills contain. We’re talking about computer-aided oversight of Congress.
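To make that concrete, here is a minimal sketch of what "finding key things" can look like once the codes are in the text. The namespace URI, the `entity-ref` and `funds` element names, their attributes, and the sample bill text are all hypothetical stand-ins invented for illustration. They are not the actual CatoXML vocabulary, so check the project's own schema before pointing this at real bills.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical namespace and element names, for illustration only.
NS = {"ex": "http://example.org/bill-markup"}

# Made-up bill fragment with inline semantic markup.
SAMPLE_BILL = """\
<bill xmlns:ex="http://example.org/bill-markup">
  <section>
    <text>The <ex:entity-ref entity-type="federal-body">Department of Energy</ex:entity-ref>
    shall carry out <ex:entity-ref entity-type="law-citation" value="usc/42/7101">section 7101 of title 42, United States Code</ex:entity-ref>.</text>
    <text>There is authorized to be appropriated <ex:funds amount="5000000" year="2015">$5,000,000 for fiscal year 2015</ex:funds>.</text>
  </section>
</bill>
"""


def summarize_bill(xml_text):
    """Group tagged references by type and total the tagged dollar amounts."""
    root = ET.fromstring(xml_text)

    refs = defaultdict(list)
    for el in root.iterfind(".//ex:entity-ref", NS):
        # Collapse line breaks and extra spaces inside the tagged phrase.
        label = " ".join("".join(el.itertext()).split())
        refs[el.get("entity-type")].append(label)

    # Sum the machine-readable amounts rather than parsing the prose.
    total = sum(int(el.get("amount")) for el in root.iterfind(".//ex:funds", NS))
    return dict(refs), total


refs, total = summarize_bill(SAMPLE_BILL)
print(refs)
print(total)
```

The point of the sketch is that the program never has to guess what "section 7101 of title 42" means or pick a dollar figure out of a sentence. The markup already says which words are an agency, which are a citation, and how much money is on the table.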
The data has found a few uses, on sites run by the New York Times and the Washington Examiner, for example. Cornell’s Legal Information Institute used it to alert people looking up provisions of the U.S. Code when Congress might amend the laws they were studying. That’s a pretty cool way to get intelligent people engaged.
Now that there’s a full dataset for an entire Congress, I’ll be encouraging researchers and political scientists to use the data in more detailed studies of congressional behavior.
Just as importantly, the success of this effort serves as a proof of concept for Congress itself. In January, the House amended its rules, asking its administrative organs to “broaden the availability of legislative documents in machine readable formats.” We’ve shown that Congress can produce bills with meaningful machine-readability. Congress can make it easier for the Internet-using public to access, read, and understand bills. A good team of people is working behind the scenes on the Hill to do that.
Marking up every bill in the 113th Congress has had its challenges. (You might have guessed that from this triumphal announcement coming a half-year after the close of the 113th’s second session.) We have yet to decide if we’re going to continue the project in the current 114th Congress and into the future. If we do, we’ll revamp the technology and add more data elements that tell more revealing stories about what Congress is up to.
The public demand for members of Congress to “read the bill” is really a demand for more deliberative processes and better public understanding of what’s going on in Congress. That demand can be satisfied, I believe, if modern information technologies are applied to Congress’s archaic processes. We’ve shown the way with Deepbills.