Rethink Digital Archiving - An Important Question | NewAssignment.Net

by Tom Cheredar on September 28, 2008 - 2:47am.

Well before newspapers hit a decade of exclusive Web-only coverage, they may want to take a look at how they archive articles.

Breaking news is rarely published as one official report. Instead, there may be a first version of a few paragraphs explaining the situation, perhaps with a quote. Later, in a second version, this news “snippet” might be updated with whatever was available at the end of the daily news cycle. Then, the following day, a more complete third version is published and, let’s say, there is an error, so corrections need to be made. You’re now up to four versions of the report.

My guess is that some multi-version reports are archived separately and others are updated and archived without any acknowledgment of draft history. Logic would dictate that a final version (with corrections) should be the only one necessary to archive for the sake of clarity. On the other hand, publishing everything could provide an absolute record of information as it happened. But is either action really the best policy?
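To make the two policies concrete, here is a minimal sketch in Python of a report that keeps every version, with one method for each archiving choice described above. Every name in it (`Revision`, `ArchivedReport`, `final_only`, `full_history`, the `note` field, the example slug and timestamps) is hypothetical and made up for illustration; it does not describe how any newspaper's system actually works.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Revision:
    """One published version of a report (hypothetical structure)."""
    published_at: datetime
    body: str
    note: str = ""  # assumed free-text label, e.g. "correction"


@dataclass
class ArchivedReport:
    """A report plus every version of it that was ever published."""
    slug: str
    revisions: List[Revision] = field(default_factory=list)

    def add(self, revision: Revision) -> None:
        # Keep revisions in chronological order, whatever order they arrive in.
        self.revisions.append(revision)
        self.revisions.sort(key=lambda r: r.published_at)

    def final_only(self) -> Revision:
        # Policy 1: archive just the last (corrected) version.
        return self.revisions[-1]

    def full_history(self) -> List[Revision]:
        # Policy 2: archive everything, as it happened.
        return list(self.revisions)


# The four-version example from the post, with invented timestamps.
report = ArchivedReport(slug="example-breaking-story")
report.add(Revision(datetime(2008, 9, 27, 10, 0), "A few paragraphs and a quote.", "first brief"))
report.add(Revision(datetime(2008, 9, 27, 18, 0), "Updated at end of news cycle.", "update"))
report.add(Revision(datetime(2008, 9, 28, 8, 0), "More complete report.", "full story"))
report.add(Revision(datetime(2008, 9, 28, 12, 0), "Complete report, error fixed.", "correction"))

print(report.final_only().note)    # correction
print(len(report.full_history()))  # 4
```

The point of the sketch is only that both policies can sit on top of the same stored history: the decision about what to show readers can be made later, as long as the earlier versions were never thrown away.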

I highly doubt most publications on the web have given much thought to their digital archiving policy.

While there aren’t many news organizations old enough to anticipate what the future will hold for past sins of sloppy archiving, an examination of Wikipedia may give us some clues. The user-generated encyclopedia is hardly sloppy in its archiving, but disputes over how it filters out inadequate content are on the rise. Those who disagree with Wikipedia’s policies are creating alternatives to fill in the gaps.

A good example of this is Deletionpedia, a site made up of 60,000 deleted entries from the English Wikipedia. The site’s rationale for its actions states: “Who knows what might be lost to the world if we lose anything? And besides, storage is cheap!”

Ars Technica Senior Editor Nate Anderson wrote a great post in which he states:

“…such sites are part of a new, ‘never lose it’ approach to information collection. Certainly this has tremendous benefits, but in terms of sheer information overload, does it also have costs?”

It’s a question newspaper publishers should be asking their staffs now rather than later.


Deletionpedia: where entries too trivial for Wikipedia live on

Where do Wikipedia pages go to die? To the Deletionpedia, of course, a 60,000+ article archive of material that did not live up to Wikipedia’s high standards. Even the Wikipedia entry for Deletionpedia was in danger of deletion.

Actually, forget what I said a moment ago about high standards; Wikipedia is the sort of site where even Jimbo Wales can’t stand his own entry. “It pains me very much when I go to a conference and someone introduces me by reading from it,” he told the Wall Street Journal this summer.

The Journal piece, by noted science writer James Gleick, gives a nice overview of the factionalism and ideological differences that have turned Wikipedia editing into a full-contact sport. One of the many debates raging in the wikisphere concerns the limits of triviality: a list of minor Star Wars bounty hunters was axed, for instance, while a complete list of Jetsons episodes persists as a service to the world.

As the battle between deletionists and inclusionists leaves the corpses of battered articles scattered about the site, a bot grabs deleted articles and archives them for all to see on Deletionpedia. If you’re the sort of person who cries “censorship!” when a “list of films with monkeys in them” gets deleted for being “unmanageable listcruft,” worry no more.

The site is a fascinating reference work to the kind of material that even Wikipedia, a site that hosts 200 lists of animated TV show episodes, deems too obscure to be worthy. The shockingly long “Weapons of the Imperium (Warhammer 40,000)” was finally deleted on July 9, 2008, for instance, after being up on Wikipedia for 893 days.

3D chainsword model, you are not wanted at Wikipedia

After coverage of Deletionpedia this week from The Industry Standard and Slashdot, the “Deletionpedia” entry on Wikipedia was actually considered for deletion, though it was eventually saved. Phew!

Plenty of user-generated content simply isn’t very good, or doesn’t fall within established parameters, or violates copyright, or does something else that gets it yanked from the sites that host such material, but sites like Deletionpedia and Delutube (now down) have sprung up to archive the deletions.

This is the web’s archival impulse at work: “Who knows what might be lost to the world if we lose anything? And besides, storage is cheap!” But human history has been a record of such losses, most information slipping through the cracks of time and into the void; the human body is the same, at any given moment filtering out the majority of external stimuli in order to provide us that great gift of concentration, focus, attention itself.


Re Crowdsourcing

Dan, I’m right there with you. I really don’t see a downside to letting the crowd tag the story content for future searching. Yet, I’m still wary of how they update and file ongoing news.


Crowdsource the tagging of articles for archiving

I’ve long thought that newspapers, whose archives are so vast, could ease the burden of archiving their content by crowdsourcing the tags.

That’s the real problem right now: not the storage space (as you point out, it’s super cheap) but the actual grouping, culling, and arranging of content. Why not let the readers themselves do the work?