- Roy Rosenzweig, “Can History be Open Source? Wikipedia and the Future of the Past”
- Roger Bruce, “Capturing Expertise for the Evaluation of Photographs”
- Mark Lawrence Kornbluh, “From Digital Repositories to Information Habitats: H-Net, the Quilt Index, Cyber Infrastructure, and Digital Humanities”
- Cathy N. Norton, “The Encyclopedia of Life, Biodiversity Heritage Library, Biodiversity Informatics and Beyond Web 2.0”
- Jeffrey Schnapp, “Animating the Archive”
Reading Rosenzweig, Kornbluh, Norton, and Schnapp, I am struck by the overt idealism of Web 2.0. One could argue that a revolution of thought and feeling is well underway, that a true democratization of information is arriving, and that a new era of collaboration and true meritocracy is on the horizon. Rosenzweig discusses the challenges of overcoming what he calls “possessive individualism” (italics in original) and presents a well-reasoned case study of Wikipedia, analyzing both its achievements and its failures. Throughout his article I was impressed by his enthusiastic embrace of the ideas behind this “new” collaborative world. Rosenzweig appears to claim that new media is about ideals, not technology. He makes this case by challenging the collegiate business model and the fee-for-service model of exclusive online archives, by arguing that professional historians need to make online history better and more accessible to all, and by noting the Wikipedia ideal that poses a direct challenge to professional historians: there is no privileged position.
Rosenzweig suggests, and I agree, that collaboration is good, ego is bad, professionals owe it to amateurs to help them, and amateurs are well positioned to work with professionals on some of the data crunching. It sounds very utopian. In fact, it seems to mirror Google’s unofficial corporate motto: Don’t be evil.
Google is a pretty good example of the prevalence of ideals in this brave new world. Its corporate mission is “to organize the world’s information and make it universally accessible and useful.” Wow. This goal is so lofty that it might be considered hubris to think they could actually pull it off. BUT the Google phenomenon is real, and they are moving toward their mission. They are buoyed by belief, and their ten commandments appear to support the claim that they are a belief-based organization:
- Focus on the user and all else will follow.
- It’s best to do one thing really, really well.
- Fast is better than slow.
- Democracy on the web works.
- You don’t need to be at your desk to need an answer.
- You can make money without doing evil.
- There’s always more information out there.
- The need for information crosses all borders.
- You can be serious without a suit.
- Great just isn’t good enough.
These ten things, as Google calls them, are not technology-based or economic objectives… they are all-out philosophy. This seems to be exactly what Rosenzweig was commenting on. Kornbluh agrees, attacking the stovepiped, selfish mentality of past and current projects in favor of collaborative development, sharing, and exploration. This is an essential concept behind cloud computing, another Google-supported initiative. He describes the Quilt Index as a great success in this collaborative environment, and I have no doubt that it is. The fact that it has grown to such a degree is testimony to the value of standards-based development and collaboration.
Rosenzweig and Kornbluh idealistically point to the one thing your mother may have taught you: it is nice to share and play well with others. Ironically, this seems to fly in the face of current academic practice. While professional academic historians exude the collegiality of senators, they can be a rowdy and vindictive bunch; attend a controversial conference and watch the panel discussions for proof. After all, as Rosenzweig pointed out, a scholar’s measure is his or her reputation, gained through research, publication, and significant labor and preserved in the form of authorship of the results of that research. That is possessive individualism. If you take it away, what, then, will a scholar put on his or her CV?
Further challenging the ideals of the Web 2.0 utopia is the Wikipedian declaration that rank has no privilege. After years of servitude to academia, a scholar finds no laurels and no seats of honor. That is a hard pill to swallow, and it will be fought. If my academic opinion is weighted equally with that of a Pulitzer Prize-winning academician, or of a weekend warrior, where is the capitalistic incentive? Why work so hard?
Norton and Schnapp examine the possibilities of this new world and point to some of its obvious benefits. Norton discusses cloud-computing-esque notions of digitally crosswalking standards-based data indices. She gives the example of species names, whose naming conventions have changed over time; that information alone can save countless hours of cross-referencing. This efficiency allows more resources to go to research rather than to data mining. But the key is that there have to be multiple inputs to standards-based data. We have to share. Schnapp seems to agree when he examines how libraries and archives are shifting from a product-based model to a process-based one. In other words, they become enablers of data transfer, not necessarily its agents.
Despite traditional capitalist objections to this model of irrational belief and non-attributable sharing, it appears to work.
Wikipedia provides the evidence. Examining the discussion tab of the Wikipedia article on the Cuban Missile Crisis, one discovers a vibrant discussion of the material, a rather useful quality-grading scale within broader subcategories, and an importance scale. This is the most effective and efficient peer review I have encountered.
The article’s history tab gives a fair picture of how active this topic still is, with over 500 edits this year. The edit history may help with the attribution “problem” for academics and for the editors… but it fails to explain why people spend time editing the articles. Web 2.0 collaboration is a belief system that has significant advantages, and it seems to turn much of the capitalist model on its head. The economic and resource advantages of standards and collaboration are obvious, but attribution is a significant emotional component. By attribution I mean “ownership” of an idea, a conclusion, a process, etc.
Web 2.0 is not about technology or tools; it is about a balance of beliefs and a utopian vision for the world’s data.
Given the following:
- Necessity is the mother of invention.
- There is a need to “move” artifacts from the physical domain to the digital domain to increase access and transportability while protecting the original.
- To “do it right” could easily cost more than $100,000 for a simple text digitization project.
- A digital image of the original is just the beginning.
- Preservation also means resolving authenticity issues.
- Knowledge management dictates adequate access to the information contained “IN” the text, not just a representation “OF” the text.
I am experimenting with a low-cost tool suite to enable historians to digitally capture and then manage original texts. Admittedly, the digital capture piece is a tiny fraction of the project, but in a low-cost environment it may require the most labor. Converting the information “IN” an artifact, and creating the meta-data associated with it, are the trickier pieces; they require detailed analysis and consistent, uniform, logical rules to be made useful.
I would like to develop a methodology to capture and preserve an original artifact, efficiently convert the information contained in the artifact into a machine-readable form, establish a standard set of meta-data for a text-based artifact, and render the results as an authentic reproduction of the original in a generally accepted format that is fully Section 508-compliant and machine-searchable.
I think I can do the first half of that. I have no idea yet how to do the second half, but between MySQL, XML, and a few years of database development experience, I think I can pull it off.
Here’s what I have so far. Last spring, while volunteering at the National Guard Education Foundation with Zanya, I was given permission to digitally capture, from the original, The Militiaman’s Pocket Companion, published in 1822. All I walked away with were 5-megapixel JPEGs of each page, taken with my cheap HP digital camera. This is page 1 of the Preface:
Using Adobe Photoshop, I was able to create the following, easily readable reproduction:
Adobe Acrobat’s OCR capability produced about a 90% solution. Unsatisfied, I tested some other software and found that the freeware SimpleOCR achieved about a 98% solution:
After manually reviewing, comparing, and editing the copy against the original, I am left with this 99.x% machine-readable, fully searchable, copiable, and transformable data object.
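The 90% and 98% figures above are rough estimates. One way to put a number on them, sketched here as an assumption rather than as what I actually did, is character accuracy: one minus the edit (Levenshtein) distance between the raw OCR output and the hand-corrected transcription, divided by the length of the corrected text. The function names and the sample strings below are hypothetical placeholders, not text from the book.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_accuracy(ocr_text: str, corrected_text: str) -> float:
    """Share of characters the OCR got right, measured against a
    hand-corrected transcription (1.0 means a perfect match)."""
    if not corrected_text:
        return 1.0 if not ocr_text else 0.0
    errors = levenshtein(ocr_text, corrected_text)
    return max(0.0, 1.0 - errors / len(corrected_text))


# Hypothetical sample: three misread characters (h->b, l->1, e->c).
corrected = "The militia of the United States"
ocr = "Tbe mi1itia of the Unitcd States"
print(f"{char_accuracy(ocr, corrected):.1%}")  # → 90.6%
```

Run against each OCR tool's output for the same page, this would turn "about 90%" versus "about 98%" into a repeatable comparison.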
That’s the easy part. I think. From there, one has to develop appropriate data normalization for massive strings, and I am not sure how well that will work. Rules for generating appropriate meta-data must also be developed and applied to the database structure. Finally, the queries have to be written to extract portions of the text as required by the user… something I have zero experience with in the thin-client world of the Internet.
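A minimal sketch of one possible normalization, assuming Python’s built-in sqlite3 module as a stand-in for MySQL: rather than storing each page as one massive string, split it into numbered line rows keyed by artifact and page, so a query can return just the lines a user asks for. The table layout, function names, and sample text are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per artifact, one row per transcribed line.
SCHEMA = """
CREATE TABLE artifact (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    published INTEGER              -- year of publication
);
CREATE TABLE text_line (
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    page_no     INTEGER NOT NULL,
    line_no     INTEGER NOT NULL,
    line_text   TEXT NOT NULL,
    PRIMARY KEY (artifact_id, page_no, line_no)
);
"""


def load_page(conn: sqlite3.Connection, artifact_id: int, page_no: int, page_text: str) -> None:
    """Store one transcribed page as numbered line rows.
    Blank lines are skipped but still count toward the numbering."""
    rows = [(artifact_id, page_no, n, text)
            for n, text in enumerate(page_text.splitlines(), start=1)
            if text.strip()]
    conn.executemany("INSERT INTO text_line VALUES (?, ?, ?, ?)", rows)


def find_lines(conn: sqlite3.Connection, artifact_id: int, term: str) -> list:
    """Return (page_no, line_no, line_text) for each line containing term.
    SQLite's LIKE is case-insensitive for ASCII letters by default."""
    return conn.execute(
        "SELECT page_no, line_no, line_text FROM text_line "
        "WHERE artifact_id = ? AND line_text LIKE ? "
        "ORDER BY page_no, line_no",
        (artifact_id, f"%{term}%"),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO artifact VALUES (?, ?, ?)", (1, "Sample 1822 text", 1822))
# Placeholder page text, invented for illustration.
load_page(conn, 1, 1, "PREFACE\n\nA first line about the militia.\nA second line.")
hits = find_lines(conn, 1, "MILITIA")
print(hits)  # → [(1, 3, 'A first line about the militia.')]
```

Line-level rows are only one choice; paragraph- or sentence-level rows would follow the same pattern, and the right grain depends on the queries users actually need.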
Simultaneously, I would like to investigate XML’s properties to see whether it offers a more useful method. Further, using digital signatures and electronic certificates, I think I can vouch for the authenticity of the changes made and at least be able to claim the mistakes as mine alone.
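Here is a rough sketch, assuming Python’s standard xml.etree.ElementTree and hashlib modules, of wrapping a transcription and its meta-data in XML and recording a SHA-256 checksum for each page image and transcription. A checksum is only a fixity check: it shows whether a file has changed since the checksum was recorded, but unlike a digital signature backed by a certificate, it does not prove who made the record. Signing the digests would take a cryptography library not shown here. The element names are hypothetical, not a standard schema such as TEI.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest, used here as a fixity checksum."""
    return hashlib.sha256(data).hexdigest()


def build_record(title: str, year: int, pages: list, images: list) -> ET.Element:
    """Wrap meta-data plus one <page> per (transcription, image) pair.
    Element and attribute names are hypothetical, not a standard schema."""
    root = ET.Element("artifact")
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "date").text = str(year)
    for n, (text, image_bytes) in enumerate(zip(pages, images), start=1):
        page = ET.SubElement(root, "page", n=str(n))
        # Checksum of the raw page image as captured.
        ET.SubElement(page, "image", sha256=sha256_hex(image_bytes))
        # Transcription text plus a checksum of its UTF-8 bytes.
        transcription = ET.SubElement(page, "text", sha256=sha256_hex(text.encode("utf-8")))
        transcription.text = text
    return root


def text_unchanged(page: ET.Element, text: str) -> bool:
    """True if text still matches the checksum recorded for this page."""
    recorded = page.find("text").get("sha256")
    return sha256_hex(text.encode("utf-8")) == recorded


# Placeholder inputs, invented for illustration.
page_text = "PREFACE\nA first line of transcribed text."
record = build_record("Sample 1822 text", 1822, [page_text], [b"raw image bytes"])
print(ET.tostring(record, encoding="unicode"))
first_page = record.find("page")
print(text_unchanged(first_page, page_text))        # → True
print(text_unchanged(first_page, page_text + "!"))  # → False
```

Keeping the checksums inside the same record lets any later reader re-hash the files and see whether anything changed after transcription, which is a cheap first step before any heavier authentication.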
The intent is to develop a very low-cost, efficient methodology that accounts for preserving the original, converting it within acceptable guidelines, ensuring its authenticity, and enabling transmission over the web, thus allowing an unlimited number of reviewers to examine a text without harming the physical object. At a minimum, this text would be made available to the National Guard Education Foundation for display as part of its emerging web presence.
Having said that, I worry that this methodology and development may be too narrowly focused and not what this class intends. For the purposes of the project, I am focusing primarily on the archiving methodology and less on the actual document being preserved.
Thoughts? Comments? Derision?
Kojo Nnamdi examines the impact of what he calls “new new media” on NPR’s Tech Tuesday.
Essential point… Consumers are instant producers.
I really want to meet Errol Morris. Anyone who will go to those lengths to definitively un-definitively determine the order of a couple of 159-year-old photographs, with such humor and undaunted enthusiasm, is someone I want to know.
Much like the heroes in Alice’s Restaurant, digital historians are sidelined by various misdemeanors and conventions. Instead of a thrilling debate about garbage or cannonballs, though, I think the issues of veracity are the core problems historians have always faced. Any primary source could be a ruse. A witness is guaranteed to miss something, and diaries, while usually interesting reading, have to be weighed very gingerly. So do the challenges of the digital domain really make that much of a difference?
In matters of scale, probably; there is simply more digital data to consider. In matters of integrity, probably not; historians have to weigh each piece of evidence carefully and independently. So the key lesson here is not to blindly trust the picture, the email, the recording, or any other element of data, but to carefully correlate it with items that can be verified or at least corroborated.
Valid, correlated, corroborated data is one aspect of a larger problem that Cohen and Rosenzweig raise in describing the seven qualities of “digital media and networks that potentially allow us to do things better” (Digital History, p. 3). In discussing capacity, accessibility, flexibility, diversity, manipulability, interactivity, and hypertextuality (as well as their counterpart hazards: quality, durability, readability, passivity, and inaccessibility), they raise the larger topic of knowledge management.
Industry, government, and the military like the idea of knowledge management (KM) but have widely varying definitions of, and implications for, KM. In theory, data management manages data at the molecular level. Information management (IM) manages access to and transport of groups of data, allowing information built on that data to be developed and disseminated. KM is IM plus some nebulous measure of artificial intelligence (AI), experience, analysis, wisdom, and timing. In other words (at least in the military’s attempted implementation), KM is getting the right data to the right person at the right place and time to make the right decision.
While not peeling back the cover of that black box of hocus pocus, I think that digital historians face a similar task.
Digital history is the art of acquiring, assessing, making available, and analyzing evidence, and of effectively using digital means for better historical analysis, writing, conclusions, etc. In other words, digital history makes the right evidence available to the right researcher at the right time to inform the best conclusion.
In all of that hocus pocus, veracity of data, availability, readability, durability, and passivity are all concerns for the digital historian. Nevertheless, while utopia is not around the corner, major advances in tools and methods are affecting research. My own research into the Cuban Missile Crisis (www.october1962.com) was largely a digital affair from start to finish. If you examine my bibliography, you will see that the National Security Archive at George Washington University was critical to my research. Working from a computer, I accumulated hundreds of memos, messages, transcripts, orders, etc. that I could never have obtained in person. Once I had written up the research in a traditional format, I was able to transform it into a website and make some of my primary sources available for download.
All in all, I tend to side with Michael Frisch’s “tools-based” view of digital history: it is not quite a new field, but a new and decidedly powerful suite of tools emerging for historians.
In The Language of New Media (MIT Press, 2001), Lev Manovich makes several observations about new media. Beyond his “aim to describe and understand the logic driving the development of the language of new media” (p. 7), he raises several key, and troubling, issues. Among them are:
- What are some of the implications of “databases as a cultural form” (p. 219)?
- How do we protect history from the “new media [ability] to create versions of the same object…” (p. 39)?
In the foreword, Mark Tribe describes the “net art” community as one that “possessed an anarchic quality of entrepreneurial meritocracy strikingly different from the rest of the art world…” (p. xii). I think this aptly describes the impact of the database culture on the post-industrial cultures of the West. Tribe’s statement draws a comparison between an unstructured meritocracy (an oxymoron) and the implicit “rest of the world,” which is neither exactly anarchic nor meritocratic. One could describe the “rest of the world” in such loose terms effectively enough, but I believe this leaves an opening for the “rest of the world” to strive for structure, and for the database culture to require at least structure in its framework, if not value definitions. In other words, as Manovich points out, there are two ways to order data: flat and hierarchical. That is the key to the anarchy that describes the database culture.
The structure is found in the meritocracy… or, in this case, the rational order of the database. The more rational, the higher the order; the more flexible, the more useful. The anarchy stems from two opposing methods of implementing that order: flat and equally distributed (in its own way a form of meritocracy) or hierarchical.
Eschewing the esoteric, what are other implications of the database culture?
In the flat organization of data, one relies extensively on hyperlinking. Manovich argues that this is the demise of rhetoric (p. 77). It removes the building of an argument from a linear progression and presents data in a random-access scenario, raising the question: does this change the definition of an intelligent, capable, gifted being? From Plato to the Renaissance to the Enlightenment, the mark of an educated and truly intelligent person was his or her ability to accumulate knowledge and translate it into well-reasoned thought and logic. (Admittedly, this is a boorish oversimplification.) What, then, is the definition of an intelligent, capable, gifted being now?
In the Information Age, there is more relevant material available than can be consumed, let alone mastered. The cultural impact of this is that gifts of reason and rhetoric have indeed been replaced with capability in tools. It is no longer particularly valuable to know the last Aztec ruler, the strategic import of the Second Peloponnesian War, or the role of the church in the development of the printing press. It is valuable to know that the events occurred and that there may be some import associated with them, but above all it is critical to know where to find the data. Analysis of data occurs in near-real time from a vast library increasingly available at our fingertips. As a result, successful people in the current age are not those with vast knowledge of things, but those with vast access and experience in finding things out.
A second issue Manovich raises concerns authenticity. In a digital realm, how can we trust the data?
Below is a photograph of DeadGuyQuotes flying an airplane in 2008. On the left, you see the author in the right-hand seat of the aircraft, typically the co-pilot’s seat, implying he is not the “pilot-in-command” (PIC), a relevant term in the view of the Federal Aviation Administration. On the right, you see the pilot in the left-hand seat, the pilot’s seat, implying he is the PIC.
Which is it? Does it matter?
It does. I was actually flying from the co-pilot’s seat while a licensed pilot was PIC in the other seat. My medical certificate had expired, and I was not legally able to assume command of an aircraft. But with one very simple Photoshop edit, I became the pilot. A historian would need the dates and my logbook to attempt to validate the veracity of this photo.
Since my logbook does not reflect the day’s travel, the historian is left wondering whether I simply failed to enter the trip, chose to fly illegally, or there is a forgery somewhere. In my logbook the historian would also note an expired medical certificate, but that does not prove that a valid one does not exist somewhere else.
How do we protect the immeasurable amounts of data being collected? How do we determine integrity and authenticity?
In the near future, the historian and archivist will routinely sort through petabytes of email trying to establish a chain of events and discussions where today we are quite happy to swim in lakes of scanned memoranda. Conducting digital forensics on every source is impractical and cost-prohibitive.
I don’t have any particular solutions, technical or philosophical, but I am greatly concerned about this challenge. As a historian interested in the history of the American executive branch, and as a member of the executive branch of the government, I see a disaster in the making. We are not protecting our archives so that they will be available to future historians, and current projects such as the National Archives’ Electronic Records Archives seem doomed to failure because they try to do too much for too many with no standards.
Professor Cohen was right: the problem with blogs is not writing enough, but writing too much. There is much more to discuss on the notion of database culture and much more to develop on the preservation of history in the digital domain.
After a long-enjoyed hiatus… the DeadGuyQuotes blogger is back in business commenting on all things American history… OK, not all things, just some comments, ideally on executive, political, military, and technological history, with particular interest in the Cold War. Some previous research on the Cold War, specifically the Cuban Missile Crisis, can be perused at www.october1962.com. Comments here are always welcome!