Hacking on Cooper-Hewitt's data release at THATCamp, Or, How to get me to work for free (*)

For THATCamp Prime V, we tried out having a hackathon on a dataset. One suggested dataset was the Cooper-Hewitt data on GitHub. I tried putting it into an Omeka site to see what possibilities were there.

Hacking on the data

There were a couple of things that I wanted to do as I pulled the data into Omeka. First, I wanted to map the Cooper-Hewitt data onto Dublin Core as best I could. Sometimes this was a little tricky, since I wasn't entirely sure of the correct mappings, and was working from just the tiny handful of real records that I had browsed through.

In the first round, I mapped many things onto Dublin Core Subject -- things like 'culture', 'dynasty', and 'period'. I went with that because I figured that, as a broad term, "Subject" would be a nice way to start broad to see lots of material.

I also created a "Cooper-Hewitt" item type in Omeka, so that I could recreate those more precise fields. The upshot is that Dublin Core Subject incorporates a lot of different kinds of data found in the CH release, and the item type data maintains the distinctions. That might allow for broad connections via Dublin Core Subject, and more focused connections via the original data.

Since Omeka is mostly about publishing/displaying, I dropped some data, like references to collection ids where I had no data about the collection itself.

Some fields were a bit trickier. Where I could map a field in the CH data onto a Dublin Core field, I did that and let it drop from the item type data. On the next import, I will probably not do that. It seemed good at the time in terms of normalization, but once I saw the real outcomes I realized that it was a bad idea. If I can do some normalization into DC and maintain the original data, I should do that with as little data loss as possible. For example, 'provenance' and 'credit_line_repro' were both mapped onto DC Provenance and dropped after that. I should recreate both in the item type data to maintain fidelity to the original, even while I push things together in the DC data.
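Here is a minimal sketch, in Python, of what that next import could look like: values are copied into Dublin Core fields, and every original field stays in the item type data. The field names 'culture', 'dynasty', 'period', 'provenance', and 'credit_line_repro' are the ones from the CH data mentioned above. Everything else (the `DC_MAPPING` table, the `map_record` function, the dictionary layout, and the sample values) is hypothetical and is not Omeka's import API.

```python
# Hypothetical mapping from CH field names to Dublin Core element names.
# Only the pairings described in this post are listed.
DC_MAPPING = {
    "culture": "Subject",
    "dynasty": "Subject",
    "period": "Subject",
    "provenance": "Provenance",
    "credit_line_repro": "Provenance",
}


def map_record(ch_record):
    """Return (dublin_core, item_type) for one CH record.

    Values are copied into Dublin Core, never moved, so the item type
    data keeps every original field and its distinctions.
    """
    dublin_core = {}
    for field, value in ch_record.items():
        element = DC_MAPPING.get(field)
        if element and value:
            dublin_core.setdefault(element, []).append(value)
    item_type = dict(ch_record)  # untouched copy of the original fields
    return dublin_core, item_type


# Made-up sample record, for illustration only.
record = {
    "culture": "French",
    "period": "mid-19th century",
    "provenance": "Gift of a donor",
    "credit_line_repro": "Museum purchase",
}
dublin_core, item_type = map_record(record)
print(dublin_core)
print(item_type == record)
```

The point of the design is that the Dublin Core side can be as lossy as it likes, because the item type side is always a full copy.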

So I pulled in about a tenth of the data, just to see what it looked like and what ideas it sparked.

And the ideas it sparked -- hoo, boy -- led me into some. . . let's call them complications.

"Ownership" of data

Data came in happily, and I had a site to look at based on the CH data. The 'media_id' field purported to give a filename for a thumbnail that would connect with their media.csv file. In practice, the real URLs to thumbnails were a little different, but it was easy enough to build the correct URLs by ignoring the media.csv data and following the pattern seen in practice.

So I had a site to clicky-browse around. Turns out, even having pulled in only a fraction of the entire data, I started to see both potentials and problems.

For example, as I clicked around, I found myself wanting to see a screen that listed all of the possible values for a 'culture' or 'period', and get all the items that fit one of those criteria, like "show me all the items from the mid 19th century".

Programmatically, I'm pretty sure that's easy. But just to make sure, I took a dip into the database to see all the possible values for a field.

That's when I discovered this.

  • mid-19th C
  • Mid 19th Century
  • mid 19th Century
  • mid 19th century
  • mid-19th century
  • Mid-19th century
  • Mid-19th Century

Seven different ways in which the concept was expressed.

To be sure, the folks at CH were aware of problems like this. In general, this really isn't a problem. I mean, if I'm looking at just one record, card-catalog style, this gets me the info I need without ambiguity. As a human being reasonably skilled in reading, I'll transparently map each of these mentally onto the same concept without worry.

This is really only a problem when someone like me comes along and wants to work not from understanding an individual record, but from finding ways to group and regroup records with the help of technology.

So, if I wanted (as I do) to produce a page listing all the periods, so people could click to see all the items from that period, that would be inverting the way the data display is designed. That is, these seven different ways to express the same concept, the mid 19th century, work perfectly well if the starting point is a single item and the goal is to give information about it. But if we want to go the other way, starting with the concept of 'mid 19th century' and seeing what's relevant, this completely fails.
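To make the failure concrete, here is a small Python sketch that groups the seven values listed above under one normalized key. The `normalize_period` function and its rules (lowercase, hyphens to spaces, a trailing "C" expanded to "century") are my own assumptions, chosen only to cover these seven variants. They are not a general solution for the whole dataset.

```python
import re
from collections import defaultdict


def normalize_period(value):
    """Collapse spelling variants of a period label into one key."""
    text = value.strip().lower()
    text = text.replace("-", " ")               # "mid-19th" -> "mid 19th"
    text = re.sub(r"\bc\.?$", "century", text)  # trailing "c" or "c." -> "century"
    text = re.sub(r"\s+", " ", text)            # collapse repeated spaces
    return text


# The seven values found in the database, exactly as listed above.
variants = [
    "mid-19th C",
    "Mid 19th Century",
    "mid 19th Century",
    "mid 19th century",
    "mid-19th century",
    "Mid-19th century",
    "Mid-19th Century",
]

# Group the raw labels by their normalized key.
groups = defaultdict(list)
for label in variants:
    groups[normalize_period(label)].append(label)

print(dict(groups))
```

Without the normalization step, a "show me everything from this period" page would treat these as seven separate periods.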

That's when the magic of Cooper-Hewitt releasing the data on GitHub, under a CC-0 Public Domain Dedication, really hit me.

If the data had been available via an API, that would have put a huge burden on my site. I could have grabbed the data for the 'period', but to make it useful in my recontextualization of the data, I would have had to grab ALL the data, then normalize it, then display it. And, if I didn't have the rights to do what I needed, I would have had to do that ON EVERY PAGE DISPLAY. That is, without the licensed rights to manipulate and keep the data as I needed, the site would have churned to a halt.

Instead, I could operate on the data as I needed. Because in a sense I own it. It's in the public domain, and I have a site that wants to work with it. That means that the data really matters to me, because it is part of my site. So I want to make it better for my own purposes. But, also, since it is in the public domain, any improvements I make for my own purpose can and should go back into the public domain. Hopefully, that will help others. It's a wonderful, beautiful, feedback loop, no?

As a fork of CC-0 content from GitHub, it sets off a wonderful network of ownership of data, where each node in the network can participate in the happy feedback.

Google Refining for free

With an idea for a site and how it could display the data, I needed to start making the data work well with my ideas. Basically, I just wanted, for example, links to items related to the nineteenth century to show up together alphabetically. Clearly, starting with "Mid" would produce some problems when we talk about more than one century. "Mid 16th century" would show up close to "Mid 20th century". Not the desired outcome.
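The ordering problem is easy to reproduce. The Python sketch below compares a plain alphabetical sort with a sort that looks at the century number first. The `period_sort_key` function, the early/mid/late ordering, and the sample labels are assumptions for illustration. It also only changes the ordering and leaves the labels themselves alone, so it is a different route from cleaning up the values.

```python
import re

# Assumed ordering of qualifiers within a century; not from the CH data.
PART_ORDER = {"early": 0, "mid": 1, "late": 2}


def period_sort_key(label):
    """Sort by century number first, then by early/mid/late, then by text."""
    text = label.lower().replace("-", " ")
    match = re.search(r"(\d+)(?:st|nd|rd|th)", text)
    # Labels with no recognizable century number sort last.
    century = int(match.group(1)) if match else float("inf")
    # Labels with no qualifier sort before "early" within their century.
    part = next((PART_ORDER[w] for w in text.split() if w in PART_ORDER), -1)
    return (century, part, text)


# Made-up sample labels, already normalized to one spelling each.
labels = [
    "mid 20th century",
    "late 19th century",
    "mid 16th century",
    "early 19th century",
    "mid 19th century",
]

print(sorted(labels))                       # plain alphabetical order
print(sorted(labels, key=period_sort_key))  # grouped by century
```

In the plain sort, "mid 16th century" lands right next to "mid 19th century" and "mid 20th century", which is exactly the outcome I did not want.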

This turned out to be a great chance to use Google Refine for real. I'd played with it on fake data, but now I had data that I cared about because it would go into a site with my name on it -- and Cooper-Hewitt's.

For example, this

became very quickly and easily. Nice!

After a few hours cleaning up a LOT -- and I mean a LOT of data this way -- I suddenly asked myself why I was doing all this work for Cooper-Hewitt for free.

The point is that I wasn't, or at least not exclusively. I was doing the work for me, too, because I was taking some ownership of the data as it related to the representation of it that I wanted to produce. So, they'll get some cleaned up data if they accept the pull request, and I'll get some nicer data to produce a nicer display if I follow through and keep building the site.

So, why did I spend some hours (and will spend more) cleaning up data and contributing it back?

  1. The data is in the public domain, so I own it, and so do all of you, and we can do useful things with it.
  2. I came up with a small project of my own to do something possibly useful with it. The fact that the project was all mine in the implementation gave me an even greater sense of ownership of the data -- and hence responsibility for the data along with my responsibility for the project.

So What?

The folks at Cooper-Hewitt knew that there were some problems with the data, and probably were hoping that by releasing it this way they would get some folks to improve it in a crowdsourcing kind of way. Here's an idea on how to more strongly prompt that into action.

In addition to releasing the data, suggest some project ideas for people to pick up, even while pointing to possible data clean-up that might be needed for the project. For example:

  • Take the data and put it into ViewShare (you might also want to map some fields into more standard schema)
  • Take the data and put it into Omeka (you might need to normalize some of the field values)
  • Take the data and RDFize it (you might want to explore the possible schema available for the various fields, and mint Cool URIs for each record)

I'm sure there are other possible projects to suggest. I'm also pretty sure that various people -- and maybe even some university courses -- would welcome the suggestions for projects that could let them build something while engaging in the shared improvement of data that we all own.

(*) Roy Rosenzweig Center for History and New Media is disqualified from this offer


by Joan Beaudoin on Wed, 06/27/2012 - 09:20

Thanks for the post about your experiences with the dirty data. Your post highlights a historical issue in museums that I think involves a lack of funding for basic aspects such as description of their collections. The data is only as good as the knowledge and skill of the people doing the data entry. As you found, much of the work is done by staff who may have a great deal of knowledge concerning the objects themselves, but don't have a full understanding of what occurs when a single concept is entered multiple ways in the database (as in your Neoclassical example above). While entering data in this way is fine, at least on an intellectual level for each single record, when you try to perform queries on the data in the aggregate chaos ensues!

by Jeff on Fri, 08/23/2013 - 03:46

I totally feel your point about the "mid-19th century" dilemma. Makes it extra hard to properly format so many of my documents when there are so many different ways to express the same idea.


"Any medium powerful enough to extend man's reach is powerful enough to topple his world. To get the medium's magic to work for one's aims rather than against them is to attain literacy."
-- Alan Kay, "Computer Software", Scientific American, September 1984

I'm patrick_mj on Twitter

© Patrick Murray-John. All content is CC-BY. Drupal theme by Kiwi Themes.