I've been holding on to an article in my inbox for some time on "Overcoming Data Friction." In it, Jon Udell describes "data friction" as the barriers, intentional or not, that keep public data from being both available and usable. His article was prompted by the announcement that EveryBlock needed to hire a computer programmer to "scrape" data from public websites -- in other words (and I'm sure I'm putting this badly), to write a program that automatically pulls information from websites where you can see the data but would otherwise need to print it out and retype it for it to be of any use.
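For readers who want a concrete picture of "scraping," here is a minimal Python sketch. It is not EveryBlock's code: the HTML snippet, the restaurant names, and the helper names `TableScraper` and `scrape_table` are all made up for illustration, and a real scraper would first download the page and cope with much messier markup.

```python
from html.parser import HTMLParser

# Made-up sample page (assumption); a real scraper would fetch the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>Restaurant</th><th>Violation</th></tr>
  <tr><td>Example Deli</td><td>Unapproved shellfish</td></tr>
  <tr><td>Sample Cafe</td><td>Improper food storage</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Turn an HTML table into a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Once the rows are plain records like these, they can go straight into a spreadsheet or database with no retyping. The catch is that the scraper depends on the page layout and breaks whenever the site is redesigned.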
Take a moment to read the article and the responses to it. We all know the problem -- we spend too much time re-entering information from someone else's website into our Excel spreadsheets so that we can post it on our own websites. It's a waste of resources, and it doesn't need to be that way. Here's what Jon says:
Data friction can be intentional or not. When it’s intentional, you might have to file a FOIA request to get it. But in a lot of cases, it’s unintentional. The data is public, and intended to be widely seen and used, but isn’t readily reusable.
Consider the following two restaurant inspection records for Bully’s Deli in New York:
1. in the NYC Department of Health website
2. in EveryBlock
It’s the same data, from the same source, but EveryBlock makes better use of it. In the NYC website, you can search by ZIP code and number of violations. In EveryBlock you can search more powerfully, and you can ask and answer questions that matter to you. Maybe you care about shellfish. Have any Manhattan restaurants been cited recently for use of unapproved shellfish? Yes: five since January 21.
What EveryBlock is doing is completely aligned with the interests of the NYC Department of Health. Tax dollars are paying for those restaurant inspections. The information is published in order to make New York a safer and healthier place. It’s great to have this data available in any form, and it’s great to see EveryBlock adding value to it.
Now it’s time to grease the wheels.
Here’s one way that can happen. An enlightened city government can decide to publish this kind of data in a reusable way. I’ve written extensively about Washington DC’s groundbreaking DCStat program which does exactly that. I can’t wait to see what happens when EveryBlock goes to Washington.
But city governments shouldn’t have to go out of their way to provide web-facing data services and feeds. Databases should natively support them. That’s the idea behind Astoria (ADO.NET Services), which is discussed in this interview with Pablo Castro. If the NYC Department of Health had that kind of access layer sitting on top of its database, it wouldn’t put EveryBlock’s screen-scraper out of a job, it would just make that job a whole lot more interesting and effective.
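To make the payoff of a reusable feed concrete, here is a small Python sketch of the shellfish question from the quote, assuming the inspection data were published as CSV. The column names, restaurant names, and dates are invented for illustration; they are not the NYC Department of Health's actual format or records, and `recent_citations` is a hypothetical helper.

```python
import csv
import io
from datetime import date

# Invented sample feed (assumption): not real inspection records.
FEED = """restaurant,borough,inspected,violation
Example Deli,Manhattan,2008-02-03,Unapproved shellfish
Sample Cafe,Brooklyn,2008-02-10,Unapproved shellfish
Placeholder Grill,Manhattan,2008-01-05,Unapproved shellfish
Demo Diner,Manhattan,2008-02-12,Improper food storage
"""


def recent_citations(feed_text, borough, keyword, since):
    """Return rows for one borough whose violation mentions the keyword
    and whose inspection date falls on or after `since`."""
    reader = csv.DictReader(io.StringIO(feed_text))
    return [
        row
        for row in reader
        if row["borough"] == borough
        and keyword.lower() in row["violation"].lower()
        and date.fromisoformat(row["inspected"]) >= since
    ]


hits = recent_citations(FEED, "Manhattan", "shellfish", date(2008, 1, 21))
for hit in hits:
    print(hit["restaurant"], hit["inspected"])
```

With a feed like this, asking a new question means changing a filter, not writing a new scraper.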
With the work of the State of the USA project and its opportunity to push for data format standardization, and the efforts of the OECD to bring people together in using SDMX as a statistical data exchange standard, we have more opportunities to lessen "data friction." I don't know enough on the technical side of things to understand how this works. (That's why the article sat in my inbox so long.) But clearly, using a standard for sharing statistical data makes information-sharing much easier, and can only help the local community indicator efforts.
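A rough Python sketch of why a shared format helps: when two publishers use the same layout, one small reader handles both. The XML below is a deliberately simplified, made-up layout and is not the actual SDMX schema; the indicator name, the numbers, and the `read_series` helper are likewise assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Two made-up publishers using the same simplified layout (NOT real SDMX-ML).
CITY_A = """<dataset indicator="unemployment_rate" unit="percent">
  <obs period="2007" value="4.6"/>
  <obs period="2008" value="5.8"/>
</dataset>"""

CITY_B = """<dataset indicator="unemployment_rate" unit="percent">
  <obs period="2007" value="5.1"/>
  <obs period="2008" value="6.3"/>
</dataset>"""


def read_series(xml_text):
    """Parse one dataset into its indicator, unit, and values by period."""
    root = ET.fromstring(xml_text)
    return {
        "indicator": root.get("indicator"),
        "unit": root.get("unit"),
        "values": {
            obs.get("period"): float(obs.get("value"))
            for obs in root.findall("obs")
        },
    }


# The same reader works for both sources because they share one format.
for name, source in (("City A", CITY_A), ("City B", CITY_B)):
    series = read_series(source)
    print(name, series["indicator"], series["values"])
```

Without an agreed format, each new data source needs its own custom reader, which is exactly the friction a standard is meant to remove.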
Pedro Díaz Muñoz, Chair of the SDMX Sponsors Committee, said:
I firmly believe that the SDMX standards and guidelines provide cost-effective solutions for the production and exchange of official statistics between national and international statistical systems. As in the past, the SDMX Sponsoring Organisations encourage all interested parties at international and national level to contribute actively to the realisation of this vision by participating in the further development of the SDMX standards and guidelines as well as to its active implementation.
I think we can, in our local communities, push for adoption of SDMX standards. We can try to follow along as the process and standards are developed. Most importantly, however, in our local purchasing and development decisions, we can demand that our web developers adhere to SDMX standards, and in doing so help establish the international standard.
It should pay incredible dividends for us over time.
What are your thoughts? Is my understanding of SDMX off? What about XML? What should I have known in order to make this post more coherent?