Week 12 Reading Notes

Web Search Engines: Part 1 and Part 2 (Hawking, D.)

I would be very interested to learn what kinds of indexing methods search engines use to organize information on the Web, and how their crawling algorithms work.
I found the part about spam rejection very interesting, especially how some spam sites serve crawlers different content than they serve to site visitors.
I found both of these articles completely fascinating and have determined that I want to be a crawler when I grow up.
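
To make the spam point concrete for myself, here is a minimal sketch of how one might check whether a site shows a crawler something different from what it shows a visitor. It is only an illustration under my own assumptions, not the method described in the articles: the User-Agent strings, the function names, and the 0.5 threshold are all hypothetical, and comparing word sets is a deliberately crude stand-in for whatever real search engines do.

```python
import re
import urllib.request

# Hypothetical User-Agent strings, used only for illustration.
CRAWLER_UA = "ExampleBot/1.0"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def fetch(url, user_agent):
    """Download a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def word_set(html):
    """Strip tags crudely and return the set of lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(html_a, html_b):
    """Jaccard similarity of the two pages' word sets (1.0 means identical)."""
    a, b = word_set(html_a), word_set(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_cloaked(crawler_html, visitor_html, threshold=0.5):
    """Flag a page when the two versions share too few words (assumed threshold)."""
    return similarity(crawler_html, visitor_html) < threshold
```

In use, one would call `fetch(url, CRAWLER_UA)` and `fetch(url, BROWSER_UA)` on the same address and pass both results to `looks_cloaked`. A site can also tell crawlers apart by IP address rather than by User-Agent, which this sketch would not catch.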

Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting (Shreeves, S., et al.)

This article was written in 2005. How has OAI changed since then?
How was it decided that Dublin Core would become a sort of metadata standard for OAI?
For the Sheet Music Consortium, how effective has it been to have users annotate the metadata records?

White Paper: The Deep Web: Surfacing Hidden Value (Bergman, M.)

What search engines, if any, use BrightPlanet’s search technology that retrieves both deep and surface content?
How have the sizes of the surface Web and the deep Web changed since 2000?
Does having full access to the deep Web mean that we will be able to retrieve more accurate and relevant information or that we will simply retrieve more information that is perhaps irrelevant?
Are people not finding what they want on the Web because of limited access or because they lack sufficient searching skills, such as using synonyms for search terms? Or perhaps there’s already too much information that people have to sift through?

LIS 2600 - Intro to Information Technology