Copyright © 2004 by The Chronicle of Higher Education; All Rights Reserved
From the issue dated April 30, 2004

ONLINE

Here Today, Gone Tomorrow: Studying How Online Footnotes Vanish


By SCOTT CARLSON


Scholarly citations in cyberspace are like atoms in various states of decay, say Michael Bugeja and Daniela Dimitrova.

Mr. Bugeja and Ms. Dimitrova, professors of journalism and communication at Iowa State University, are undertaking a study to determine what they call "the half-life of Internet footnotes." They define it as the typical length of time it takes for half of the Web addresses cited in a scholarly article to become outdated, broken, or changed.

Their preliminary investigation confirms what many librarians, archivists, and scholars know firsthand: The Internet is an unstable, fluid medium, unsuitable for the long-term archiving needs of academe.

"The footnote is basic to research," Mr. Bugeja says. "If we cannot rely on a footnote because the medium is too dynamic, then Internet scholarship will always be a second-class citizen in academe. Our hope is to recommend ways to stabilize a basic element of research, so that future generations can investigate the Web and Internet-related topics with the same reliability that one would find in the library."

In their preliminary study, the Iowa State professors examined the links cited in articles accepted in 2003 by the communication-technology division of the Association for Education in Journalism and Mass Communication. Of the 108 links examined, 40 percent no longer worked. The scholars estimate that the half-life of the links included in their study is about one year and three months.
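The article does not say how the researchers turned their counts into a half-life. One common way to get a half-life from a single snapshot is to assume that links fail at a constant rate, as in exponential decay (the "atoms in various states of decay" of the opening analogy). The short Python sketch below does that. The function name `half_life` and the constant-rate assumption are illustrative only and are not the researchers' published method.

```python
import math


def half_life(fraction_dead: float, elapsed_years: float) -> float:
    """Estimate the half-life of cited links, ASSUMING exponential decay.

    Model (an assumption, not the study's method): the surviving fraction
    S after t years is S = exp(-k * t), so k = -ln(S) / t and the
    half-life is ln(2) / k.
    """
    if not 0.0 < fraction_dead < 1.0:
        raise ValueError("fraction_dead must be strictly between 0 and 1")
    if elapsed_years <= 0:
        raise ValueError("elapsed_years must be positive")

    surviving = 1.0 - fraction_dead                     # S, the share of links still working
    decay_rate = -math.log(surviving) / elapsed_years   # k, per year
    return math.log(2) / decay_rate                     # years until half the links fail
```

For example, `half_life(0.5, 2.0)` returns 2.0: if half the links are dead after two years, the half-life is two years by definition. A real study would also have to account for links that were cited at different dates, which this single-snapshot model ignores.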

Mr. Bugeja and Ms. Dimitrova plan to expand their study to examine links used in the most popular journals in mass communication. At the end of their study, they will draw up a list of recommendations for preserving links cited in their discipline. Ms. Dimitrova says that they may recommend having either their professional association or the individual journals archive the Web pages cited in research.

Sources Disappear

The idea for the research came out of Mr. Bugeja's personal experience. He recently finished writing a book about technology and mass media. When he had nearly completed his first draft, he checked the Web pages he had cited and noticed that about 30 percent of them had vanished.

"This frightened me because this was a book about the Internet, and there was no way for me to write the book without incorporating these Web sites in my reference notes," he says.

Researchers in other disciplines have been looking into the longevity of Internet links as well.

Jonathan D. Wren, a research scientist in bioinformatics at the University of Oklahoma, recently published his research on the half-life of Web links cited in article abstracts on Medline, a database run by the National Library of Medicine.

He says he tested Web addresses found in abstracts, instead of in full-text articles, because the addresses cited in abstracts are generally essential to the content of the article.

Oddly, he says, the first two links published in Medline, in 1994, still worked. But of the 1,630 unique addresses found in abstracts over the past 10 years, 20 percent did not work. He estimates that the half-life of links in the database is seven years.

He is working with Robert P. Dellavalle, an assistant professor of dermatology at the University of Colorado Health Sciences Center, who has studied the problem of Web links in medical and science journals. They are compiling recommendations for preserving links for future researchers. Within a couple of weeks, the two scholars plan to begin a letter-writing campaign presenting those recommendations to the most prestigious journals in biology and medicine.

Coming up with the recommendations, however, is a bit of a challenge. Authors can print out and keep copies of Web pages, but that doesn't help if the authors' notes disappear, or if the cited pages feature programs that perform calculations or contain music or video.

"There is a difference between passive and active content," Mr. Wren says. "Before we start the letter-writing campaign, we want to look into this and not be naïve."

http://chronicle.com
Section: Information Technology
Volume 50, Issue 34, Page A33