Please feel free to contribute project ideas on this page. Projects can be on any topic (climate change, campaign finance, lobbying, immigration and tax reform, etc.) as long as they fit the general theme of money's influence in politics. They can also be at any level: federal, state or local. Also, check out the discussions on the "Join This Wiki" page.

Ideas from Cathy O'Neal and Lee Drutman

A range of excellent ideas - please see them here

Climate change

  • "Can we mash up industry giving (fossil fuels and/or renewables) to lawmakers with climate patterns in their states?"
  • "Should we crowd-source weather anomalies?" - Kathy Kiely
    There is a "low-tech" (if you call ham radio low-tech) National Weather Service (NWS) program that has been doing this kind of work for about 30 years: ywarn/whatis.html

Open elections

There is no such thing as a national election; there are 50 state elections. The data that would allow journalists and hackers to analyze the returns from what should be the world's most sophisticated (if sophistication is measured in dollars) democracy are a jumble of mismatched sets. That's why the Knight Foundation made a grant to two data journalists, one from the New York Times and one from the Washington Post, to create the nation's first standardized set of election results. You can read more about the project here: One of the grantees, Derek Willis of the NYT, will join us remotely Saturday afternoon (Saturday morning PT) to discuss the project and see if he can recruit some of our participants to help. To get an idea of the records you'll be working with, take a look at these links, sent along by Derek:

Imagine your county is going broke. There is likely no direct trail of evidence explaining what led to the collapse. Rather than prove causation, it may be reasonable to assess some of the factors that raise questions about such counties. Using Benford's law, we are looking for trends in the aggregate figures for each county that reveal "under-randomization" of the human element, providing evidence of human-generated numbers that do not represent the real values of county budgets.
  • For example, capital expenditures (CapEx) in county X, Iowa, in 2009 amounted to $1,840,056.
    • By taking the individual account balances that sum to this total and comparing them with historical data, we may find significant repetitions of the digit "4" that point to unusual (non-random) patterns.
The same approach was applied after the EU debt crisis to look for 'cooked books' in countries like Greece. The subsequent paper "Fact and Fiction in EU-Governmental Economic Data," co-authored by Stefan Engel, found that Greece's economic data deviated more from Benford's law than that of any other EU country. Unfortunately, the numbers were crunched only about ten years after the main period of manipulation.
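As a rough illustration of the first-digit test described above, here is a minimal Python sketch. It is not the R package this project proposes to publish, and it makes its own assumptions. It checks only leading digits and compares their counts with Benford's expected shares, log10(1 + 1/d), using a chi-square statistic. The 5% critical value for 8 degrees of freedom (about 15.51) serves as an arbitrary cut-off.

```python
import math
from collections import Counter

# Benford's law: the expected share of leading digit d is log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CHI2_CRITICAL_8DF_5PCT = 15.507


def leading_digit(x):
    """Return the first significant digit of a non-zero int, float or Decimal."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Strip the sign, leading zeros and the decimal point: 0.0042 -> "42".
    return int(str(abs(x)).lstrip("0.")[0])


def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's law.

    Zeros are skipped. Returns (statistic, observed_counts).
    """
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero values to test")
    counts = Counter(digits)
    statistic = sum(
        (counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items()
    )
    return statistic, counts


def flag_for_review(values):
    """True when the digit pattern departs from Benford's law at the 5% level.

    A flag is a reason for a journalist to look closer, not proof of fraud.
    """
    statistic, _ = benford_chi_square(values)
    return statistic > CHI2_CRITICAL_8DF_5PCT
```

For instance, `flag_for_review(list(range(100, 1000)))` returns `True` because every leading digit appears equally often. The powers of two from 2 to 2**1000, a sequence known to follow Benford's law closely, are not flagged.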

#1 Goal: put the package online so that anyone can use the Benford program (an R package) to screen for possible fraud in comparable environments.
#2 Goal: provide insights to regulators on how they might proactively document cases of fraud rather than respond retroactively to financial crises.

#1 Non-goal: prove fraud. This program will not produce conclusive results. Instead, it will help journalists direct their scrutiny more accurately.

Tracking corporations around the globe and down the rabbit hole

The idea behind OpenCorporates is breathtakingly simple, and breathtakingly ambitious: create a database that will allow users to trace every corporate entity back to its corporate progenitor. Some interesting (beta) visualizations of their work are here: and here:
OpenCorporates needs help! Co-founder Chris Taggart will be in NYC to discuss the group's latest effort to map political donations to corporate networks and would be happy to have assistance on this worthy project.
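Tracing an entity back to its progenitor amounts to walking a chain of subsidiary-to-parent links. The Python sketch below does not use OpenCorporates' own API or data format. It assumes you have already built a plain dictionary mapping each company ID to its direct parent (a hypothetical input). It then rolls donation records up to each company's ultimate parent, which is one simple way to map donations onto corporate networks.

```python
from collections import defaultdict


def ownership_chain(company, parent_of):
    """Follow subsidiary -> parent links up to the top of the corporate tree.

    `parent_of` is a hypothetical dict mapping a company ID to its direct
    parent's ID. Returns the chain, starting with `company` and ending with
    its ultimate parent. Raises ValueError if the links form a cycle.
    """
    chain = [company]
    seen = {company}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            raise ValueError(f"ownership cycle involving {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


def donations_by_ultimate_parent(donations, parent_of):
    """Sum (company_id, amount) donation records under each ultimate parent."""
    totals = defaultdict(float)
    for company, amount in donations:
        totals[ownership_chain(company, parent_of)[-1]] += amount
    return dict(totals)
```

The sketch assumes each company has at most one parent. Joint ventures and split ownership would need a graph with weighted edges instead of a simple dictionary.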

Gun control

  • Can we organize an effort to follow gun money to state houses and even city councils, where many important gun laws are written?
  • Access to gun ownership records is restricted in most states (see Sunlight's reporting on restricted access to gun data).  Can we uncover the interest groups and track the political players that have kept this valuable information out of the public record? 
  • Can we create a database of all of the gun deaths that occur in the US, say from 2011 onwards, over a map, which can also be color-coded to show where gun regulation is most lax, or where gun prevalence is highest?
  • What's the National Rifle Association doing where you live? Can we make it easy for citizens to get information about the latest lobbying efforts of the NRA in their community?  See the Washington Post's summary of NRA successes at the state level
  • Can we create a map (state, local, or national) that changes over time (using a slider or something) showing gun-related deaths compared to gun legislation?

Campaign Finance & Spending

  • Associating campaign contributions by the district they are from (address) and the district they are going to (candidate)

  • Comparing contributions pre- and post-redistricting

  • Comparing socio-economic data (census, BLS data) with campaign contributions (How much of the funds come from the poorest/average income/richest neighborhoods)
  • Campaign finance can tell you which politicians a company is interested in. Lobbying reports can tell you which issues and bills it is interested in. Combine the two, and you may be able to see *why* a company gave to a particular politician.
  • Looking at campaign finance figures mapped against voter turnout and results to see how predictive they can be.

  • Campaign spending is often overlooked. It would be great to have a method for comparing spending across committee types (PAC, House, Senate), identify the outliers and figure out why their spending is so different or how it changed over time.  [ Candidate disbursements are available here: ]

  • How influential is out-of-state support? 
  • Associate campaign contributions from particular industries and/or interest groups with representatives' long-term voting records in the policy domains of interest to the industries/interest groups. 
  • Comparing campaign spending from a company's employees versus its executives. Do the companies that appear to support a specific politician actually support that politician? Or are large contributions from executives overshadowing the support coming from the majority of the company's employees?
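The employees-versus-executives comparison in the last bullet could start from contribution records that carry a free-text occupation field, as federal filings do. The Python sketch below is a hypothetical first pass. The keyword list used to spot executives is an assumption and is crude: it will miss many titles and mislabel some, such as "executive assistant". It also compares dollar shares by party, not head counts.

```python
from collections import Counter

# Hypothetical keywords for spotting executives in a free-text occupation
# field; a real project would need a longer, hand-checked list.
EXECUTIVE_KEYWORDS = ("ceo", "chief", "president", "chairman", "founder", "executive")


def is_executive(occupation):
    """Rough heuristic: does the occupation string look like an executive title?"""
    text = occupation.lower()
    return any(keyword in text for keyword in EXECUTIVE_KEYWORDS)


def party_share_by_role(contributions):
    """Share of dollars going to each party, split into executives and employees.

    `contributions` is an iterable of (occupation, party, amount) tuples for a
    single employer. Returns {"executives": {party: share}, "employees": {...}}.
    """
    totals = {"executives": Counter(), "employees": Counter()}
    for occupation, party, amount in contributions:
        role = "executives" if is_executive(occupation) else "employees"
        totals[role][party] += amount
    shares = {}
    for role, by_party in totals.items():
        role_total = sum(by_party.values())
        if role_total:
            shares[role] = {p: amt / role_total for p, amt in by_party.items()}
        else:
            shares[role] = {}
    return shares
```

To compare the number of donors instead of dollars, add 1 per record instead of `amount`.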

Tracking political ads

  • Assess the mood/negativity in campaign ads and correlate that with the funding sources. (The Wesleyan Media Project has done some work in this area using Kantar/CMAG data. Unfortunately, the data is proprietary, but WesMedia's analysis will be helpful on the tone of the campaign.)

  • Political Ad Sleuth has collected a large number of scanned ad-buy contracts from broadcasters' public files. Using computer vision and machine learning, cluster similarly structured forms together to aid information extraction.

  • Ad Hawk has amassed a large collection of political ads (more than 4,000). Using computer vision, extract text from these ads in order to understand more about the key terms and concepts introduced in each one.
  • Use MapLight to assess money spent on ballot measures over the years, and combine that with campaign ads supporting each measure to see money's influence on whether a measure passed and which issues are most likely to attract the most money and attention.


Immigration

  • President Obama promises that immigration reform will be on his 2013 agenda. What are the interest groups you want to watch? Are there stories the MSM is missing? See Sunlight's report on the immigration lobby to get the lay of the current landscape.
  • Can we make it easy for journalists to write about where money focused on immigration reform is coming from? There's a super PAC, Republicans for Immigration Reform, ready to spend big on the issue.
  • Q: I am interested in methods to estimate the numbers of undocumented immigrants in California. No official data source can help us estimate these numbers accurately, so how are estimates made? What data sources can help get us started?

    Response: The most definitive data is at the Pew Hispanic Center, which has used Census and other data to extrapolate the number of undocumented residents of the USA. Based on this story: 2/12/06/unauthorized-immigrants-11-1-million-in-2011/ it looks like a new release of data may be imminent.
  • In his inaugural address, President Obama spoke of immigration in the context of ensuring educational opportunity for future "engineers" and others. With recent movement on the DREAM Act and more research coming out on transnational students (children born in the US but deported back to Mexico with their parents), I would be interested to see data on how many colleges have enrolled undocumented students in the last three years. I would also like to compare that with the number of transnational students deported in the same time frame, to show how much work needs to be done.

  • Human Rights Watch has proposed several immigration-focused ideas:

    In 2011, Human Rights Watch released a report on immigration detention transfers. This report was based on a dataset of over 5 million records obtained from Immigration and Customs Enforcement (ICE) via FOIA request.  Our analysis of this data can be viewed here.

    We'd like to propose continued work on immigration data for the Datafest, as it would be wonderful to have additional quantitative and qualitative information which we could potentially use to enhance our advocacy efforts for immigration reforms that respect the human rights of non-citizens and their families.

    In keeping with the event's focus on money and politics, we'd like to propose that participants mine data for insights regarding immigration detention facilities. We'll provide two pieces of information. First, this dataset/map of all 1500+ facilities that held an immigration detainee between 1998 and 2010. On the page, you can look at the data table and download all of the data. Second, we have a basic spreadsheet of the top 100 facilities by number of detainees, which includes some qualitative information about the facilities, including the private companies that manage them (there are two major private detention corporations: The Geo Group and CCA).
     There are also two datasets on immigration detention facilities. One has all of the detention facilities that were in the raw data received from ICE (link) . The other has categorical data on the top 100 facilities by volume of detainees (link). If you use these datasets, please cite Enrique Piraces and Brian Root at Human Rights Watch.

    Questions we are interested in:
    • Can we confirm the management/ownership of each facility?
    • Can we confirm the geolocation data of each facility?
    • Who are the major elected officials for the geographic areas of each facility, at local, state and federal level?
      • How have these officials voted on immigration/detention related bills or fiscally-related votes?
    • Can we examine the lobbying efforts of the relevant private detention companies?
      • Which elected officials received funding? Which facilities are in their districts?  
      • What other political data can be accumulated regarding these companies' activities?
    • What data on funding/budget allocation can be gathered at the company and facility level? Can we trace money down from the federal government to the detention facility/company?
    If there is interest in working on this project, we can discuss ideas and options further. It is extremely important to HRW that all methods for data gathering/analysis be well documented as data sources would be critically vetted.
    Thank you for considering!
    Brian Root, PhD
    US Qualitative Analyst
    rootb AT

Big cities, big money?

  • The nation's two biggest cities will be holding mayoral elections in 2013. Both are sure to draw lots of campaign contributions and ethnic politics. How can we use data to illuminate the contests and draw comparisons and contrasts?

Wall Street/Silicon Valley Divide

  • These two mother lodes of campaign money went in opposite directions in 2012, with the right coast betting heavily on Romney and the left on Obama. What will it mean in 2013 for everyone who lives in between?

Analyzing Legislative Text Data

  • Journalists have discovered that private companies push for favorable changes in law by funding a think tank that produces model legislation for state legislatures. For example, see this NPR report. NLP could help scale up and speed up this kind of work. NLP experts can efficiently sift through state laws and detect essentially identical text and trace it back to the source.
  • Patterns in state-level legislation, possibly over time. Big policy shifts are often cumulative and build up over time.
  • Correlating the background of donors with the content of a politician's speeches and the bills s/he introduced or supported.
  • Look for correlations among the subjects of bills introduced in state legislatures, big companies within the sponsors' districts, and campaign donations, using Sunlight's Open States API.

  • Look for correlations between the passage or failure of bills and the national tragedies they follow. For example, compare gun control bills before and after Sandy Hook, disaster relief bills before and after Hurricane Sandy, etc.
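The copied-text idea in the first bullet of this section can be sketched with a standard near-duplicate technique: word shingling plus Jaccard similarity. This is a minimal Python illustration of that technique. It is not necessarily what the journalists or NLP experts mentioned above used. The shingle length and the 0.5 threshold are assumptions that would need tuning against bills known to be copies. Fetching bill text (for example, from Sunlight's Open States API) is left out.

```python
import re


def shingles(text, k=5):
    """Set of overlapping k-word sequences, lowercased with punctuation dropped."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_copies(model_bill, state_bills, threshold=0.5, k=5):
    """Return (bill_id, score) pairs for bills whose text overlaps the model bill.

    `state_bills` maps a bill ID to its full text. Results are sorted from the
    most to the least similar.
    """
    model = shingles(model_bill, k)
    matches = []
    for bill_id, text in state_bills.items():
        score = jaccard(model, shingles(text, k))
        if score >= threshold:
            matches.append((bill_id, score))
    return sorted(matches, key=lambda item: item[1], reverse=True)
```

Comparing every pair of bills this way is quadratic. For all state laws at once, a technique such as MinHash with locality-sensitive hashing would be the usual way to scale it.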

Social Media/Network Analysis

  • Analysis of social media and news reports may reveal new knowledge. For example, Nate Silver writes in his book “The Signal and the Noise” that professional economists didn’t see the housing bubble coming, yet the general public was concerned long before the crisis. Google searches for “housing bubble” grew tenfold in 2004-2005, and the notion was discussed 10 times a day in news media. Perhaps there are similar insights about money and politics hiding in online content and news reports.
  • Is social media a platform for event-driven or policy-driven discussion? One report noted that the Sandy Hook Elementary shootings spurred a debate about gun control on Twitter, whereas previous shootings stimulated social media conversation about topics pertinent only to the particular shooting. Does the same hold in social media for other issues like gay marriage, climate change and immigration reform?

Sacramento Bee's Phil Reese's 10 Ideas
  • How many California ballot measures in the last 10 years have been primarily supported by fewer than five people or corporations? How many such ballot measures have succeeded?
  • How many corporations or political action committees have given to both the California Democratic and Republican parties in the last decade? How many have given to candidates running against one another in the same race?
  • What state legislators benefited the most from redistricting by getting more wealthy potential donors in their district?
  • How often do corporations give almost exclusively to legislators on committees that directly regulate them?
  • How often did legislators go to free dinners or entertainment events at lobbyists' expense in the last decade while the state budget was late?
  • Which U.S. representatives receive the most money from retired individuals and how have they voted on key issues affecting Medicare and Social Security?
  • Do legislators who vote against the party line most often tend to get more or fewer political contributions?
  • How many donors extend their giving power by donating the maximum from each member of their family?
  • Have labor union contributions declined or risen as union membership falls in California?
  • How often do corporations give gifts directly to spouses and children of legislators to avoid statutory gift-giving limits?
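For the family-giving question above, one hypothetical approach is to treat contributors who share a last name and a normalized address as a household. You would then look for households in which two or more people each gave at least the legal limit to the same recipient. The Python sketch below assumes simple dict records. The limit is passed in because it varies by office and year. A shared surname and address is only a proxy for family, so every hit needs manual checking.

```python
from collections import defaultdict


def normalize_address(address):
    """Crude address key: lowercase, drop periods and commas, collapse spaces."""
    cleaned = address.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def maxed_out_households(contributions, limit):
    """Find households where two or more people gave at least `limit`.

    `contributions` is an iterable of dicts with "first", "last", "address",
    "recipient" and "amount" keys (a hypothetical record shape). Returns
    {(last_name, address, recipient): sorted first names}.
    """
    givers = defaultdict(set)
    for c in contributions:
        if c["amount"] >= limit:
            key = (c["last"].lower(), normalize_address(c["address"]), c["recipient"])
            givers[key].add(c["first"].lower())
    return {key: sorted(names) for key, names in givers.items() if len(names) >= 2}
```

Using a set of first names means one person giving twice is not counted as two family members.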

Data Visualization
  • Mapping a Changing Nation: The country is changing, and we need a way to study and map those changes at the community level, tracking how some places are getting richer while others are getting poorer, and how some places are growing more socially and politically liberal while others hold firm or grow more conservative. Building on the data-driven community types identified by Patchwork Nation and the success of WNYC’s Patchwork Nation map, can we create a mapping platform and approach that will let us visualize multiple data sets at the same time?

  • Can software developers produce widgets or apps that make it easy to update the data visualizations produced at the Datafest? Or even just compile them into a more readable website that doesn't involve clicking around on individual projects?


  • What data is available about the financing of city or county elections? Some cities collect and release political contribution information. A survey of which local governments have this information would be very interesting.
  • Wikipedia represents a dense network of connections that link people, ideas, events, etc. Can this data be analyzed during the Datafest? One project is an attempt to provide a database of 5 billion web pages for people to analyze. They held a contest to use the data, and one of the winners was Wikientities, which attempts to figure out the meaning of words from the most common links in Wikipedia. It might be a good jumping-off point for analyzing Wikipedia. Another contest project was Online Sentiment Towards Congressional Bills.
  • How is state-level giving different from national-level giving? When do big national organizations give at the local level? Are there organizations that give locally but not nationally? Are there interesting instances of local giving through subsidiaries that aren't obviously related to the parent company? Does the party breakdown change by locality? That is, will companies give heavily to Democrats in one locality while giving heavily to Republicans in another?
  • Charities and advocacy (not specifically campaign finance): can we correlate charitable giving by companies with advocacy by the charity that aligns with the companies' positions?

Revolving Door Data

  • Since 2008, lobbying firms have filed electronic disclosures with Congress. Among the most interesting and underused data sets are the lobbying registration forms, filed whenever a lobbying firm gets a new client. On these forms, and these forms only, individual lobbyists must disclose their past government employment going back 20 years. Lobbying registration data is potentially the best source of revolving-door information available, and yet no one is using it. The data is in a balky XML format that can be downloaded from this page. There is no standard format in which lobbyists disclose their past government employment, so you'll find some who worked for Sen. Edward Kennedy, Sen. Kennedy, Senator Kennedy, Office of Senator Edward Kennedy, and so on. To make the data useful, some cleaning will be necessary. We won't find every lobbyist (not every lobbyist will have picked up a new client in the last four years), but we will get a lot of them. When trying to work out who on Capitol Hill is doing favors for which special interests, it's always useful to see whom their former staffers represent. It's one of the first things I look for.
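Cleaning the past-employment field could begin with a crude normalizer that maps variants like those above to one (chamber, surname) key. The Python sketch below is an assumption-laden first pass, not a parser for the actual disclosure XML. The title and filler word lists are illustrative, and it assumes the surname is the last remaining word. It also does nothing useful for committee jobs or members who share a surname, which would still need hand review.

```python
import re

# Words that mark a Senate or House office (illustrative, not exhaustive).
SENATE_WORDS = {"sen", "senator", "senate"}
HOUSE_WORDS = {"rep", "representative", "congressman", "congresswoman", "house"}
# Words that carry no identifying information.
FILLER_WORDS = {"office", "of", "the", "u", "s", "us", "hon", "honorable"}


def office_key(raw):
    """Map a free-text past-employment entry to a (chamber, surname) key.

    Assumes the member's surname is the last remaining word, so members who
    share a surname collide, and committee jobs are not handled.
    """
    text = re.sub(r"\([^)]*\)", " ", raw.lower())  # drop "(D-MA)"-style notes
    words = re.findall(r"[a-z]+", text)
    if any(w in SENATE_WORDS for w in words):
        chamber = "senate"
    elif any(w in HOUSE_WORDS for w in words):
        chamber = "house"
    else:
        chamber = None
    skip = SENATE_WORDS | HOUSE_WORDS | FILLER_WORDS
    names = [w for w in words if w not in skip]
    return chamber, (names[-1] if names else None)


def group_variants(entries):
    """Group raw entries that normalize to the same (chamber, surname) key."""
    groups = {}
    for raw in entries:
        groups.setdefault(office_key(raw), []).append(raw)
    return groups
```

Reviewing the groups by hand is still necessary, but reading a few thousand keys is far quicker than reading every raw entry.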

Costs of Political Apathy

  • One frequent complaint about the U.S. political system is that so few people vote — it is generally between 50 and 60 percent for presidential elections, well behind other democracies in the developed world.
  • One idea posited by pundits, and by the President, is that corruption is prevalent due in part to this low participation rate. In his 2012 DNC speech, Obama said:
    "If you give up on the idea that your voice can make a difference, then other voices will fill the void — the lobbyists and special interests, the people with the $10 million checks who are trying to buy this election and those who are trying to make it harder for you to vote."
  • Is this true? Does a low rate of political participation lead to increased corruption? And how can we visualize this?
    • I already have raw vote totals from 2012 by county in machine-readable form
    • Voter registration data is available by state
    • Census data should help determine a rough approximation of "eligible voters"
    • Data is available showing how closely representatives hew to the interests of the people who donate to their campaigns, and the amount of cash they receive
  • People with experience in data mining, statistics, mapping and extracting data from PDFs (voter registration is not usually available in a machine-readable format), as well as people knowledgeable about the legislative process, would be helpful on this project.
  • It could be interesting to use Google Fusion Tables to create a map of the United States that shows both the percentage of the population that votes in each state and the total political donations to the recent presidential election from individuals in that state. The map could also be organized by broader regional areas, or by how the state voted in the election.
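The core arithmetic of the apathy question is turnout and a correlation against some measure of money. Turnout here means votes cast divided by the approximate number of eligible voters, estimated from Census data as the list above suggests. The Python sketch below uses donations per eligible voter as the money measure, an assumption in the spirit of the Fusion Tables idea. A correlation across states says nothing about corruption on its own.

```python
import math


def turnout(votes_cast, eligible_voters):
    """Turnout as a share of (approximate) eligible voters."""
    if eligible_voters <= 0:
        raise ValueError("eligible_voters must be positive")
    return votes_cast / eligible_voters


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def turnout_vs_donations(states):
    """Correlate turnout with donations per eligible voter across states.

    `states` maps a state name to a dict with "votes", "eligible" and
    "donations" (total individual donations in dollars); a hypothetical shape.
    """
    names = sorted(states)
    t = [turnout(states[s]["votes"], states[s]["eligible"]) for s in names]
    d = [states[s]["donations"] / states[s]["eligible"] for s in names]
    return pearson(t, d)
```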

Congresspeople Playing Hooky
  • Mash up Sunlight's Political Party Time site, which shows (user-submitted) parties that congresspeople plan to attend, with vote data to see if any congresspeople have missed a vote because they were at a party. Journalistic and dev collaborators of all skill levels would be welcome; Python or Ruby; I'm in NYC, but happy to work with Palo Alto-based folks. Contact me at jeremybmerrill at jeremybmerrill dot com.
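The Party Time mash-up boils down to an interval-overlap join between party listings and missed votes for the same member. The Python sketch below assumes hypothetical record shapes rather than the actual Party Time or roll-call data formats. It treats each vote as open for a short window around its recorded time. It also assumes a two-hour party when no end time is listed. Because the listings are user-submitted, a match shows a scheduling conflict, not proof of attendance.

```python
from datetime import datetime, timedelta

# Assumed length for a party listing that has no end time.
DEFAULT_PARTY_LENGTH = timedelta(hours=2)


def overlaps(start_a, end_a, start_b, end_b):
    """True if the intervals [start_a, end_a) and [start_b, end_b) overlap."""
    return start_a < end_b and start_b < end_a


def votes_missed_at_parties(parties, missed_votes, window=timedelta(minutes=15)):
    """Pair each missed vote with a party the same member was listed for.

    parties: dicts with "member", "start" and optionally "end" (datetimes).
    missed_votes: dicts with "member", "vote_id" and "time" (datetime).
    Each vote is treated as open for `window` either side of its recorded time.
    """
    matches = []
    for vote in missed_votes:
        vote_start, vote_end = vote["time"] - window, vote["time"] + window
        for party in parties:
            if party["member"] != vote["member"]:
                continue
            party_end = party.get("end") or party["start"] + DEFAULT_PARTY_LENGTH
            if overlaps(vote_start, vote_end, party["start"], party_end):
                matches.append((vote["member"], vote["vote_id"], party["start"]))
    return matches


if __name__ == "__main__":
    # Hypothetical example records.
    parties = [{"member": "Rep. X", "start": datetime(2013, 2, 5, 18, 0)}]
    missed = [{"member": "Rep. X", "vote_id": "roll-42", "time": datetime(2013, 2, 5, 18, 30)}]
    print(votes_missed_at_parties(parties, missed))
```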

In October 2012, the STOCK Act came into force. It requires US Senators and Members of the House of Representatives to report financial holdings and transactions, including loans and shares. The information is published on the website of the Office of the Clerk of the U.S. House of Representatives and by the Senate Office of Public Records (albeit as somewhat hostile, partially handwritten PDFs; a database is reportedly due later this year). It may be newsworthy to compare the financial position and performance of elected officials' holdings with market averages, population averages, etc. - Suggested by Fergus Pitt [@fergle on Twitter. Message me if you're interested.]


The next decade will be marked by dramatic changes in the United States. Shifts in politics, economics and culture that have been building for many years will be felt more concretely, and the impact of those changes will vary greatly at the community level. Some places are set to thrive, some will struggle to adjust, and some will fall far behind. This evolving community landscape is remaking America.

With Patchwork Nation we mapped and charted these differences across 12 types of communities around the United States at the county level. The project’s home page offers a good explanation of what it is, and the fine work of others, like WNYC, shows how the map data can be viewed in charts at the same time.

But as we move on and prepare to overhaul and rebuild the demographic community breakdown we created, and to expand its use in academia and research, we face some special challenges involving the visualizations.

  • How can we write code that takes complicated figures like unemployment, which are based on multiple numbers such as “workforce” and “number employed,” and quickly combines them to produce a “type average,” as seen with the vote in the WNYC map?
  • How can we display multiple data sets on the map and in the charts at the same time, so that we can show, say, unemployment or % Hispanic alongside Obama/Romney vote totals by county and in bar charts?
  • How can we do both of those things with time series data?
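For the first question, one common way to get a "type average" for a rate like unemployment is to sum the underlying counts across all counties of a type before dividing, instead of averaging the county rates. We don't know exactly how the WNYC map computes its averages, so treat this Python sketch as an assumption. It also groups by year, which covers the time-series case.

```python
from collections import defaultdict


def type_rates(counties):
    """Workforce-weighted unemployment rate per (community type, year).

    `counties` is an iterable of dicts with "type", "year", "workforce" and
    "employed" (a hypothetical record shape). Summing the components before
    dividing weights each county by its workforce, so a tiny county does not
    count as much as a large one.
    """
    workforce = defaultdict(int)
    unemployed = defaultdict(int)
    for c in counties:
        key = (c["type"], c["year"])
        workforce[key] += c["workforce"]
        unemployed[key] += c["workforce"] - c["employed"]
    return {key: unemployed[key] / workforce[key] for key in workforce if workforce[key]}
```

Take two counties of one type. One has a workforce of 1,000 and 10% unemployment; the other has a workforce of 100 and 50% unemployment. The weighted type average is 150 / 1,100, or about 13.6%, whereas a plain average of the two rates would give 30%.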


  • Generate a map of the country made of congressional districts, where the distance between two districts is the inverse of the number of votes their representatives have in common.
  • Identify names of important things: people, corporations, locations. This can be done as an NLP project in conjunction with network analysis.

  • Election spending on Google ads in 2012 was five times the 2008 level. According to Google's Chief Business Officer Nikesh Arora, in 9 of 11 “top Senate races … the candidate who spent more with Google was elected.” Causality or correlation?
  • I have clean, plaintext Supreme Court opinions from 2006 to 2011 and can trivially scrape and clean the rest. If anyone would like the data, let me know: jeremybmerrill at jeremybmerrill dot com
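The district-map idea a few bullets up starts with a pairwise agreement count. The Python sketch below reads "votes in common" as roll calls on which two districts' representatives voted the same way, which is an assumption. It turns each count into a distance of 1 / count. Placing the districts on a 2-D map from those distances would take a further layout step, such as multidimensional scaling, which is not shown here.

```python
from itertools import combinations


def agreement_counts(votes):
    """Count roll calls on which each pair of districts' members voted alike.

    `votes` maps a district ID to {roll_call_id: "yea" or "nay"}; a missed
    vote is simply absent. Only roll calls both members voted on are compared.
    """
    counts = {}
    for a, b in combinations(sorted(votes), 2):
        shared = votes[a].keys() & votes[b].keys()
        counts[(a, b)] = sum(1 for rc in shared if votes[a][rc] == votes[b][rc])
    return counts


def inverse_distances(counts):
    """Distance = 1 / (votes in common); pairs that never agree get infinity."""
    return {pair: (1 / n if n else float("inf")) for pair, n in counts.items()}
```

Raw counts favor members who simply cast more votes. Dividing by the number of shared roll calls would be a natural refinement.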