Hacking Our Cultural Heritage Datasets

April 28 and 29 were the dates of Transparency Camp 12 (TCamp), an unconference that brings together journalists, technologists, activists, and others to work on ways to promote and work with openness in government. The 30th was a special hack day on the Voter Information Project. It turns out I was in over my head in that context and couldn't really contribute. So instead I ducked out and built an Omeka site to display the data in Data.gov's CSV catalog of government agency datasets. (Full disclosure for those who don't know me: I work on Omeka for the Roy Rosenzweig Center for History and New Media.)

Here it is.

Here's why.

Data Sets Are Cultural Heritage Artifacts

If we live in the "information age", or better yet the "data age", then the sets of data that we, as a society, collect and use reflect that fact. They are cultural products of the time. What data was collected (and how, and in what formats) says something about this cultural moment. As such, datasets deserve to be treated as part of our cultural heritage just as much as artifacts from, say, the "space age". So using Omeka to republish datasets makes sense.

"Hacking" Need Not Be Reserved For Techies

I have to say I was a little surprised that I didn't see more of the DIY spirit at TCamp. Maybe that's because of my expectations from THATCamps. Either way, during one of the sessions the state of Data.gov was discussed. There was much dissatisfaction with the site's UI and UX, its updating, and more. But I still think it is remarkable that there is so much data there to grab and play with. That situation reminded me of the sagacity of Butt-Head when he said, "This sucks! Change it!" Putting aside for the moment the issue of data not being updated, if useful and interesting data is there for download but packaged in a bad site, I want to do something about it.

I'll call that hacking, even if it doesn't involve touching any code at all. It's taking data, manipulating it, and repurposing it. That's what coders do. But with the data available, noncoders can also engage in that kind of hacking. I want to send that message to the TCamp crowd to encourage more people to try low-tech hacking on the data that's there.

What I Did

I grabbed the CSV file (N.B. they say this set is updated daily, so this is a snapshot as of April 30, 2012) and poked around in it a little to see what was there and what munging I might need or want to do. It's a one-day hackathon, so I didn't want to get too fancy, but it did look like I'd want to do some work to distinguish the datasets listed from the agencies producing them. I created item types in Omeka for "Dataset" and "Agency" (I decided to mostly ignore, for now, the distinction in the data between agency and sub-agency, but did build a field for parent agency), with some associated metadata for each. Then the CSV Import plugin nommed it all in. I ran three different imports on the same file: one to map data onto datasets, one to make the agencies, and a third to make agencies out of the sub-agencies.

For the sake of discovery, I was pretty liberal with mapping data to tags. For all three imports, the category and keywords CSV fields were mapped onto both Dublin Core subjects and tags.
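The CSV Import plugin did all of this without any code. For readers who would rather see the mapping spelled out, here is a minimal Python sketch of what the three imports produce from a single row, including the category and keywords going to both Dublin Core Subject and tags. It is an illustration under assumptions, not the plugin's internals: the column names (`Name`, `Agency`, `Sub-Agency`, `Category`, `Keywords`), the comma separator for keywords, and the simplified field names in the output dicts are all hypothetical.

```python
import csv
import io

# Hypothetical column names; the actual Data.gov catalog headers may differ.
TITLE, AGENCY, SUB_AGENCY = "Name", "Agency", "Sub-Agency"
CATEGORY, KEYWORDS = "Category", "Keywords"


def field(row, name):
    """Read a column, treating a missing or short-row value as empty."""
    return (row.get(name) or "").strip()


def subject_terms(row):
    """Category plus comma-separated keywords, stripped and deduplicated in order."""
    raw = [field(row, CATEGORY)] + field(row, KEYWORDS).split(",")
    terms = []
    for term in (t.strip() for t in raw):
        if term and term not in terms:
            terms.append(term)
    return terms


def map_row(row):
    """Return the items the three imports would create from one CSV row."""
    terms = subject_terms(row)

    def item(item_type, title, **extra):
        # Every import maps the same terms to both Dublin Core Subject and tags.
        return {"item_type": item_type, "Title": title,
                "Subject": list(terms), "tags": list(terms), **extra}

    items = [
        # Import 1: the dataset, with its agency names kept as plain text.
        item("Dataset", field(row, TITLE),
             Agency=field(row, AGENCY), SubAgency=field(row, SUB_AGENCY)),
        # Import 2: one Agency per row, so duplicates pile up across rows.
        item("Agency", field(row, AGENCY), ParentAgency=""),
    ]
    if field(row, SUB_AGENCY):
        # Import 3: the sub-agency as its own Agency, parent recorded as text.
        items.append(item("Agency", field(row, SUB_AGENCY),
                          ParentAgency=field(row, AGENCY)))
    return items


if __name__ == "__main__":
    sample = io.StringIO(
        "Name,Agency,Sub-Agency,Category,Keywords\n"
        'Air Quality,EPA,Office of Air,Environment,"air, ozone, Environment"\n'
    )
    for row in csv.DictReader(sample):
        for it in map_row(row):
            print(it["item_type"], it["Title"], it["tags"])
```

Because each row yields its own Agency item, a file with many datasets per agency produces many duplicate Agency items, which is the cleanup problem the post returns to later.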

I broke the original CSV file into parts to fit within my host's upload size limits. After three or four parts, I noticed that the data was corrupted: the headings no longer matched up with the data in the columns. In the interest of pushing out a proof of concept, I declared "meh" and stopped after those parts. With some data cleanup, the site would probably hold somewhere between a little over two and two-and-a-half times as much data as it does now.
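One way to split a large CSV so each part stays under an upload limit, while flagging rows whose field count doesn't match the header (the symptom described above), is sketched below in Python. It is a sketch under assumptions: splitting by row count rather than by bytes, the 500-row default, and the `part_001.csv` naming are all choices made for the example. It parses with Python's `csv` module, so quoted fields containing commas or line breaks are never cut in half; that is one common way a hand-split file drifts out of alignment, though I can't say whether it is what happened here.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_part=500):
    """Split src into numbered part files, each beginning with the header row.

    Returns (part_paths, bad_lines). bad_lines holds the source line number
    where each skipped record ends; a record is skipped when its field count
    differs from the header's, i.e. when headings and columns would not line up.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts, bad_lines = [], []
    handle = writer = None
    written = 0
    try:
        # newline="" lets the csv module handle quoted commas and line breaks.
        with open(src, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)
            for record in reader:
                if len(record) != len(header):
                    bad_lines.append(reader.line_num)
                    continue  # skip misaligned rows instead of importing them
                if written % rows_per_part == 0:
                    # Start a new part and repeat the header so it imports on its own.
                    if handle is not None:
                        handle.close()
                    path = out_dir / f"part_{len(parts) + 1:03d}.csv"
                    handle = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(handle)
                    writer.writerow(header)
                    parts.append(path)
                writer.writerow(record)
                written += 1
    finally:
        if handle is not None:
            handle.close()
    return parts, bad_lines
```

Repeating the header in every part matters for a tool like CSV Import, which maps columns by their headings; a part without its own header row would have its first data row treated as headings.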

An implication of the idea that datasets are cultural heritage artifacts is that there should be some narrative and interpretation built up around them. So, to demonstrate that idea, I built a simple exhibit.

It would also be nice for people to contribute their own stories or information, so I fired up the MyOmeka and Commenting plugins.

Done. Data hacked and repurposed. Notice that I haven't touched a line of code yet. BUT HACKING WAS ACHIEVED!

Next, code hacking to add some niftiness.

One thing I wish Omeka did better is creating relations between items, like those between the Dataset and Agency types I'd created. So I wrote some scripts to first clean up the multiple representations of each Agency that appeared. (You'll notice that each row of the CSV would make a new Agency, hence many duplicates.) Plus, in that easy import, the agency for each dataset was recorded as plain text. Once I had removed the duplicates, it was fairly easy to go back through the datasets and change the plain-text info about the agency (and sub-agency) into a link to the record in Omeka. Last, I fired up another script to add links between the parent and child agencies.

UPDATE: Oops! It looks like the script isn't working right with all the sub-agencies. It might be that only one is surviving the rewrite into a link.
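The scripts themselves aren't published in this post. As a rough illustration only, here is a Python sketch of the same three steps (deduplicate agencies, turn plain-text agency names into links, connect parents and children), working on plain dicts rather than Omeka's database or API. The `id`/`title`/`parent` keys, the `agency`/`sub_agency` fields on datasets, case-insensitive title matching, and the `/items/show/<id>` link pattern are assumptions for the example. It does not reproduce or diagnose the sub-agency problem noted in the update.

```python
from collections import defaultdict
from html import escape


def item_link(item):
    # Assumed Omeka-style public URL pattern for an item.
    return f'<a href="/items/show/{item["id"]}">{escape(item["title"])}</a>'


def norm(name):
    """Normalize a title for matching: trim whitespace, ignore case."""
    return (name or "").strip().casefold()


def dedupe_agencies(agencies):
    """Keep the lowest-id Agency for each normalized title.

    Returns (canonical, duplicate_ids): canonical maps normalized title to the
    surviving item dict; duplicate_ids lists items that could be deleted.
    """
    canonical, duplicate_ids = {}, []
    for item in sorted(agencies, key=lambda a: a["id"]):
        key = norm(item["title"])
        if key in canonical:
            duplicate_ids.append(item["id"])
            # Keep a parent recorded on a duplicate if the survivor lacks one.
            if item.get("parent") and not canonical[key].get("parent"):
                canonical[key]["parent"] = item["parent"]
        else:
            canonical[key] = item
    return canonical, duplicate_ids


def link_datasets(datasets, canonical):
    """Replace plain-text agency names with links to the surviving items."""
    for ds in datasets:
        for field in ("agency", "sub_agency"):
            match = canonical.get(norm(ds.get(field)))
            if match:
                ds[field] = item_link(match)


def link_parents(canonical):
    """Map each parent agency's id to the ids of its child agencies."""
    children = defaultdict(list)
    for item in canonical.values():
        parent = canonical.get(norm(item.get("parent")))
        if parent and item["id"] not in children[parent["id"]]:
            children[parent["id"]].append(item["id"])
    return dict(children)
```

Doing the deduplication first is what makes the later steps simple: once each agency has exactly one surviving item, every plain-text name resolves to a single link target.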

Granted, this came together so quickly because I'm so familiar with Omeka, its plugins, and what they can do. Even so, data can pretty easily be repurposed into something that invites users to give feedback and build on it. I think Omeka is good for the job because of that philosophical stance of treating datasets as cultural heritage artifacts. But a savvy Drupal person could probably do similarly awesome things without writing code. I'm not sure, but I suspect that Drupal has tools that would get further before code-writing became necessary. The point is, if there's someone in your organization with some familiarity with any CMS, you're in a great position to start doing things with the data that's available now.

Anything Useful Here?

I think so. Complaints about interface, design, and user experience can certainly be lodged. I just took one of the available Omeka themes without modification beyond what's in the theme configuration. But, as a proof of concept, the important point is that we can go in and change it when we want to. In other words, the same data that's on Data.gov has been moved into a setting that lets us adapt the presentation and the interaction to our needs. That's a big step forward.

Out of the box, one of the interesting features is the tag cloud.

What Next?

The site is definitely a proof of concept of how we can start hacking the data that's available. Completely missing from this exercise is the really important call-to-action step. The site is almost entirely about information sharing, both by giving a new view on the catalog and by inviting site users to augment the information being displayed. Given the extraordinary site designs I saw at TCamp, there's certainly room to develop the theme (it's just out-of-the-box Emiglio) and do something really interesting that way. It's also possible that a different set of data would lend itself better to a call to action. Here are a few possible directions.

Crowdsourced monitoring of compliance with dataset release schedules
It would be trivial to build a plugin to mark which datasets are in compliance with their stated release schedule. Something fancier could record a history of compliance. That monitoring could be crowdsourced. If we had 100 people agreeing to check on 10 datasets each every quarter, that would be 1,000 datasets checked per quarter, which would cover things pretty well. And that would help keep pressure on agencies to adhere to their stated goals.
Patterns of data
Narratives are great for explaining patterns that we see, and that's what Omeka's exhibits are good for. We could bring together datasets to explain why they are important in aggregate, and to develop further directions for research, both journalistic and sociological.
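The core of the compliance idea is small. Omeka plugins are written in PHP, but the check itself can be sketched in a few lines of Python. This is a hypothetical sketch: the frequency labels and day counts in `MAX_GAP_DAYS`, the optional grace period, and the round-robin assignment of datasets to volunteers are assumptions, not anything Data.gov or Omeka provides.

```python
from datetime import date, timedelta

# Hypothetical mapping from a stated release frequency to the longest
# acceptable gap between updates; the catalog's real labels may differ.
MAX_GAP_DAYS = {"daily": 1, "weekly": 7, "monthly": 31,
                "quarterly": 92, "annually": 366}


def is_compliant(frequency: str, last_updated: date, today: date,
                 grace_days: int = 0):
    """True if the dataset was updated within its stated release window.

    Returns None for an unknown frequency label, so a volunteer can flag
    it for a human decision instead of guessing.
    """
    max_gap = MAX_GAP_DAYS.get(frequency.strip().lower())
    if max_gap is None:
        return None
    return today - last_updated <= timedelta(days=max_gap + grace_days)


def assign(dataset_ids, volunteers, per_person=10):
    """Round-robin datasets to volunteers, at most per_person each."""
    assignments = {v: [] for v in volunteers}
    capacity = len(volunteers) * per_person
    for i, ds in enumerate(dataset_ids[:capacity]):
        assignments[volunteers[i % len(volunteers)]].append(ds)
    return assignments
```

With 100 volunteers and `per_person=10`, `assign` hands out exactly 1,000 datasets per round, matching the numbers above; anything beyond that capacity waits for the next round. Recording each `is_compliant` result with its date would give the history of compliance the post mentions.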

Since this is a proof of concept, I'm not entirely sure how much additional work I'll do on the site. If interesting things start to happen there, I'll maintain it as best I can, and if things get really interesting I'll draw up feature requests. But if we get to that point, I'll need help. If you want to join the site to build more exhibits, I'll certainly add you. If you want to be an admin, all the better! My real hope, though, is that others will try a similar approach with different sets of data.





"Any medium powerful enough to extend man's reach is powerful enough to topple his world. To get the medium's magic to work for one's aims rather than against them is to attain literacy."
-- Alan Kay, "Computer Software", Scientific American, September 1984


I'm patrick_mj on Twitter


© Patrick Murray-John. All content is CC-BY. Drupal theme by Kiwi Themes.