FMS Symphony

Location: Columbia
Team: CSV Soundsystem
  • Brian Abelson: Open News Fellow, New York Times
  • Jake Bialer: Innovations Editor, Huffington Post
  • Burton DeWilde: Data Analyst, Harmony Institute
  • Michael Keller: Senior Data Reporter, Newsweek and The Daily Beast
  • Thomas Levine: Data Superhero
  • Cezary Podkul: Reporter, Reuters
Project Categories:
Innovation && Insight

Every day at 4pm, the United States Treasury sends out an email summarizing the cash spending and borrowing of the Federal government — all the money Uncle Sam took in that day from taxpayers, how he spent it, on what, and how much debt he took out to make it happen.

At a time of record fiscal deficits and continual debates over spending, taxation, and the debt, this daily accounting of our government's main checking account is an essential data point that the public should have ready access to. Yet the Treasury told us, in response to a Freedom of Information Act request, that it does not store this data in any format other than inconsistent, poorly structured text files that don't lend themselves to programmatic analysis. So Team CSV set out to liberate this dataset by scraping and parsing eight years' worth of Daily Treasury Statements.

Our goal is to create the first-ever electronically searchable database of the Federal government's daily cash spending and borrowing, update it daily, and allow the public to easily search, explore, and visualize how the government spends their tax dollars. We will also use the data to answer some basic questions about the country's finances: How does the government spend the tax dollars it takes in each day? Which programs are most responsible for driving up our national debt each week? How often do we spend more money than we take in as a country, and why?

Daily Treasury Statements were downloaded as fixed-width files (example), then painstakingly parsed in Python with Pandas.
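
As a rough illustration of the parsing step, here is a minimal sketch using `pandas.read_fwf`. The sample text, column names, and column boundaries are all hypothetical; the real statements are far less regular and needed much more special-casing than this.

```python
import io

import pandas as pd

# Hypothetical, simplified excerpt. Amounts are made up for illustration.
rows = [
    ("Federal Reserve Account", "52,345", "48,120"),
    ("Medicare", "1,234", "1,180"),
    ("Social Security Benefits", "10,002", "9,870"),
]
# Build fixed-width lines: a 30-character label, then two 10-character amounts.
sample = "".join(f"{name:<30}{today:>10}{prev:>10}\n" for name, today, prev in rows)

# Column boundaries are an assumption that matches the sample above.
colspecs = [(0, 30), (30, 40), (40, 50)]
df = pd.read_fwf(
    io.StringIO(sample),
    colspecs=colspecs,
    names=["item", "today", "previous"],
    header=None,
    dtype=str,
)

# Strip thousands separators and convert the amounts to integers.
for col in ["today", "previous"]:
    df[col] = df[col].str.replace(",", "", regex=False).astype(int)

print(df)
```

Reading every field as text first and converting afterwards makes it easier to spot rows where the layout shifted, which is the usual failure mode with inconsistent fixed-width files.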

We used various R sonification, processing, and visualization libraries to produce a multi-sensory interactive. These libraries include:
* plyr
* reshape2
* csvsoundsystem
* aplpack

We sought to produce an interactive that displays high-dimensional data all at once. We started with a 55-dimensional dataset consisting of:
* 1 date
* 52 daily line items
* 1 daily interest rate
* 1 debt ceiling

We added a few variables to assist with our analyses:
* Day of week
* Day of month
* Rolling mean
* Rolling z-score
* Daily balance
* Variance of the 52 line items
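
Several of the derived variables above could be computed along these lines. This is a sketch in pandas rather than the R code the team used; the column names, the three stand-in line items, the five-day window, and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical daily data: ten business days, one balance, three line items.
dates = pd.bdate_range("2012-01-02", periods=10)
df = pd.DataFrame({
    "date": dates,
    "balance": [50.0, 52.0, 47.0, 49.0, 55.0, 53.0, 60.0, 58.0, 54.0, 57.0],
    "item_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "item_b": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
    "item_c": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
})

# Calendar features.
df["day_of_week"] = df["date"].dt.dayofweek  # Monday is 0
df["day_of_month"] = df["date"].dt.day

# Rolling mean and rolling z-score over an assumed five-day window.
window = 5
rolling = df["balance"].rolling(window)
df["rolling_mean"] = rolling.mean()
df["rolling_z"] = (df["balance"] - rolling.mean()) / rolling.std()

# Variance across the line items for each day (sample variance, ddof=1).
df["item_variance"] = df[["item_a", "item_b", "item_c"]].var(axis=1)

print(df[["date", "day_of_week", "rolling_mean", "rolling_z", "item_variance"]])
```

The first `window - 1` rows of the rolling columns are missing by construction, since there is not yet a full window of history.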

And then reduced the now-61-dimensional dataset to an interactive.

We used principal component analysis to rotate the 52 line items and plotted the 15 highest-loaded components as Chernoff faces. We plotted the interest rate and the federal account balance, with line width set by the standard deviation of all the line items.
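
The rotation step can be sketched with a plain singular value decomposition. This is an illustration in NumPy, not the team's R code: the data is random stand-in data, the columns are standardized first (an assumption), and "15 highest-loaded components" is read here as the 15 components explaining the most variance.

```python
import numpy as np

# Random stand-in for a (days x 52 line items) matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 52))

# Standardize each line item so no single large program dominates.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal axes, ordered by singular value.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                    # data expressed in the rotated basis
explained = S**2 / np.sum(S**2)       # share of variance per component

# Keep the 15 leading components, one per facial feature.
top15 = scores[:, :15]
print(top15.shape, explained[:15].sum())
```

Each row of `top15` would then drive one face, with a plotting tool such as the aplpack package listed above mapping the 15 values to facial features.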

We represented similar data in audio. Chords were selected based on the derivative of account balance, and a melody was composed based on the federal interest rate. We also included a contrapuntal riff driven by the distance between accumulated federal debt and the legal debt ceiling.
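
One way such a mapping might look is sketched below. The specific choices are assumptions made for illustration, not the team's actual rules: a major chord when the balance rises and a minor chord when it falls, and the interest rate scaled linearly onto one octave of a C major scale in MIDI note numbers.

```python
MAJOR = (0, 4, 7)   # semitone offsets of a major triad
MINOR = (0, 3, 7)   # semitone offsets of a minor triad
C_MAJOR_SCALE = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI notes C4 to C5


def chord_for_change(delta, root=48):
    """Pick a triad from the day-over-day change in balance (assumed rule)."""
    intervals = MAJOR if delta >= 0 else MINOR
    return [root + i for i in intervals]


def note_for_rate(rate, lo, hi):
    """Scale a rate linearly onto the scale; lo maps to C4, hi to C5."""
    if hi == lo:
        return C_MAJOR_SCALE[0]
    frac = (rate - lo) / (hi - lo)
    return C_MAJOR_SCALE[round(frac * (len(C_MAJOR_SCALE) - 1))]


# Made-up inputs for illustration.
balance = [50, 52, 47, 49]
rates = [0.10, 0.12, 0.08, 0.15]

# Discrete derivative of the balance: one value per pair of consecutive days.
deltas = [b - a for a, b in zip(balance, balance[1:])]
chords = [chord_for_change(d) for d in deltas]
melody = [note_for_rate(r, min(rates), max(rates)) for r in rates]

print(chords)
print(melody)
```

The resulting note numbers could be handed to any MIDI or synthesis library; the contrapuntal line driven by the gap to the debt ceiling would follow the same pattern with a different input series.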

We also used the data to pinpoint which programs consume most of the government's resources on a daily basis. Our analysis shows that Medicare accounts for an alarmingly large share of the government's daily cash spending. In 2006, the Treasury's daily spending on Medicare equaled about 30% of the cash tax receipts it took in; by 2012, it had risen to about 40%. A daily plot of Medicare spending as a percentage of the Treasury's tax receipts illustrates this trend.
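
The ratio itself is simple to compute once the data is tabular. The sketch below uses pandas with made-up numbers and hypothetical column names; it shows only the arithmetic and does not reproduce the figures quoted above.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "date": pd.to_datetime(["2011-12-29", "2011-12-30", "2012-01-03", "2012-01-04"]),
    "medicare": [2.0, 3.0, 5.0, 3.0],
    "tax_receipts": [10.0, 10.0, 10.0, 10.0],
})

# Daily Medicare spending as a percentage of that day's tax receipts.
df["medicare_pct"] = 100 * df["medicare"] / df["tax_receipts"]

# Yearly share: sum both series first, then divide, so that days with
# small receipts do not dominate the average.
yearly = df.groupby(df["date"].dt.year)[["medicare", "tax_receipts"]].sum()
yearly_pct = 100 * yearly["medicare"] / yearly["tax_receipts"]

print(df[["date", "medicare_pct"]])
print(yearly_pct)
```

Summing before dividing is a design choice: averaging the daily percentages instead would weight a quiet day the same as a heavy tax-collection day.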

Current Results:

Next Steps:
  • Host the dataset as a searchable SQL database
  • Automatically update the SQL database on a daily basis to keep pace with posts by the FMS
  • Build a Twitter bot to automatically tweet out the government's most interesting financial developments each day
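
As a sketch of what the first two steps might involve, the example below loads parsed rows into SQLite using Python's standard `sqlite3` module. The table layout, the in-memory database, and the sample rows are assumptions; a hosted database would differ in the details.

```python
import sqlite3

# In-memory database for illustration; a real deployment would use a file or server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE line_items (
        date   TEXT    NOT NULL,
        item   TEXT    NOT NULL,
        amount INTEGER NOT NULL,
        PRIMARY KEY (date, item)
    )
    """
)

# Made-up rows standing in for one day's parsed statement.
rows = [
    ("2012-01-03", "Medicare", 1234),
    ("2012-01-03", "Social Security Benefits", 10002),
    ("2012-01-04", "Medicare", 1180),
]
conn.executemany("INSERT OR REPLACE INTO line_items VALUES (?, ?, ?)", rows)

# Re-running a daily update with the same (date, item) key overwrites the
# earlier row instead of duplicating it.
conn.execute(
    "INSERT OR REPLACE INTO line_items VALUES (?, ?, ?)",
    ("2012-01-03", "Medicare", 1250),
)
conn.commit()

# A simple search: total Medicare spending across all loaded days.
total = conn.execute(
    "SELECT SUM(amount) FROM line_items WHERE item = ?", ("Medicare",)
).fetchone()[0]
count = conn.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]
print(total, count)
```

Keying on date and item makes the daily update idempotent, which matters if a scheduled job is ever run twice for the same statement.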
A mix of R, Python with Pandas and Notebook, shell scripts, JavaScript, SQL.

Data Sources:
Financial Management Service daily treasury statements: