Monday, November 3, 2008

Week 10 Readings

Digital Libraries: Challenges and Influential Work

Mischo sets out to describe the complexities involved in creating and maintaining digital libraries. He argues that a digital library is more than a collection of sound, image, or data files.

OAI Meta-data for Libraries

The Open Archives Initiative Metadata Harvesting Protocol is a way to collect meta-data describing the records held in archives and repositories. The OAIMH protocol was designed as a simple, low-barrier way to achieve interoperability through meta-data harvesting. Exactly how useful meta-data sharing will be has not yet been determined. However, considerable interest in OAI and experience with early OAIMH implementations are encouraging (Warner, S., Exposing and harvesting meta-data protocol, 2001).
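To make "harvesting" concrete, here is a minimal sketch of one harvester step: building a request and pulling record identifiers and titles out of the XML reply. It assumes the later OAI-PMH 2.0 conventions (the `ListRecords` verb, the `oai_dc` Dublin Core prefix, and `resumptionToken` paging), which may differ in detail from the version Warner describes. The base URL, the sample response, and the function names are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a follow-up page sends only the verb and token."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [t.text for t in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    # An empty token element signals the last page, so map "" to None.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)

# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://repository.example.org/oai"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
records, token = parse_list_records(SAMPLE)
print(records)  # [{'identifier': 'oai:repository.example.org:1', 'titles': ['An example record']}]
print(token)    # page2
```

A real harvester would loop, passing each returned token back into `list_records_url` until no token comes back; that loop is where the "low-barrier" design shows, since the whole exchange is plain HTTP requests and XML.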

Deep Web: Surfacing Hidden Value

Programs called spiders or web crawlers are deployed to scan web pages in search of content. Some content, however, escapes detection because it is buried or hidden from these commonly used programs. Think of it as viewing a photo: one can see what is in the foreground with little effort, but may need a magnifying glass to pick out finer details. Web crawlers function as this magnifying glass, but they are limited to commonly detectable elements. To see the finer details, or to detect encoded meta-data, a different program was needed. BrightPlanet technology was invented to read header packets and detect content by the size of the files. While this sounds simple, it is very effective: small content files take up few bytes, while larger ones take up many more.
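Why would a crawler miss content at all? One common reason is that the content sits in a database reachable only through a search form, so no static link ever points to it. The toy model below is a hypothetical sketch of that situation, not a description of how BrightPlanet works: a breadth-first, link-following crawler visits every linked page but never reaches the database records, which surface only through a directed query. The site map, records, and function names are all invented for illustration.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # a search form; its results exist only in response to a query
}

# Hypothetical records held in a database behind the search form.
DATABASE = {"/record/1": "report on wetlands", "/record/2": "census tables"}

def crawl(start, links):
    """Breadth-first crawl that only follows static links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def query(term, database):
    """Directed query: the only way this model exposes database records."""
    return sorted(url for url, text in database.items() if term in text)

surface = crawl("/", LINKS)
hidden = set(DATABASE) - surface
print(sorted(surface))            # ['/', '/about', '/search']
print(sorted(hidden))             # ['/record/1', '/record/2']
print(query("census", DATABASE))  # ['/record/2']
```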

Site characterization required three steps:

  1. Estimating the total number of records or documents contained on that site.
  2. Retrieving a random sample of at least ten results from each site and computing the mean document size in bytes, including HTML. This mean, multiplied by the total number of site records, gives the estimated total site size in bytes.
  3. Indexing and characterizing the search-page form on the site to determine subject coverage (Bergman, M., 2001).
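The arithmetic in step 2 can be sketched in a few lines. This is a minimal illustration, not Bergman's code: the function name is hypothetical, the record count from step 1 is simply taken as an input, step 3 is not modeled, and the record count and sample sizes below are invented.

```python
from statistics import mean

def estimate_site_size(total_records, sample_sizes_bytes):
    """Step 2: mean sampled document size (bytes) times the total record count."""
    if len(sample_sizes_bytes) < 10:
        raise ValueError("the method calls for a sample of at least ten results")
    return mean(sample_sizes_bytes) * total_records

# Invented example: ten sampled documents averaging 10,000 bytes each,
# on a site estimated (step 1) to hold 50,000 records.
SAMPLE_SIZES = [12_000, 8_000, 10_000, 9_500, 10_500,
                11_000, 9_000, 10_000, 10_000, 10_000]

size = estimate_site_size(50_000, SAMPLE_SIZES)
print(size)  # 500000000
```

Under these made-up figures the site would come to roughly 500 MB; the estimate is only as good as the record count from step 1 and how representative the random sample is.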
