Choosing the Right Tool for the Job: Searchbots
by Rich Cummins
Before Gutenberg's printing press, the most common way to cross-reference was to ask the two oldest members of the village to tell you the same story. Then came the printing press, and, because new technologies almost always destroy as much as they enable, we traded a world based on memory and the art of storytelling for one based on a vast storehouse of knowledge, neatly linear and categorized into logical bundles. Gutenberg's revolution led to evolutions in indexing and other bibliographic techniques. By the Renaissance, for the first time in human experience, a person's knowledge horizon began to approach infinity, as the possibilities for cross-referencing became limitless.

The modern dilemmas of data storage and retrieval are actually several centuries old. The unique problem created by the advent of the Internet is that each and every "keyword" becomes its own category, leading to taxonomy ad absurdum. Hypertext has created the desperate situation in which Internet users are awash in a sea of disconnected pieces of data triggered by these so-called keywords. This is not entirely negative, of course: the fun aspect of hypertext is that knowledge can be serendipitous, intrinsically associative, and highly interactive. The downside is that each new thought alluringly propels the researcher through a labyrinth of unrelated connections, an unfortunate situation that underscores two of the most crucial aspects of searching the Web: developing a good search strategy by asking specific questions, and following up by thinking critically about the results you receive.

When teaching research strategies to college students, I like to share a quotation from the poet e.e. cummings: "always the beautiful answer who asks a more beautiful question" (cummings, 1972, p. 462). If, when beginning a search for a documented essay, students fail to narrow a broad subject by using a specific topic question, the results are diffuse. This happens partially because of the wealth of cross-referenced hits at the subject level, but it also occurs because the searcher has not identified what information is being sought, and thus he or she will not be able to interpret whatever information is found.

Equally unproductive is a poorly formed set of search terms. For example, if I want information on the impact of pollution on Mexico City's economy, and I enter "mexico city" and "pollution" in a search at google.com, I receive no fewer than 18,700 hits. My thought process, unfocused at the subject level, has produced thousands of starting points, rather than moving me forward from a specific starting point. A poorly phrased topic question yields only slightly better results. For a search at askjeeves.com, I used the question, "Why is Mexico City polluted?" I got URLs ranging from history to tourism to politics, and everything in between. A slightly better question at the same site, "What is the air quality in Mexico City?" gave me somewhat better results, while the google.com search string "mexico city" + "economic impact" + "pollution" returned 487 hits. That is more than I would like, but it is a number that I can easily scan for relevance and then condense into a short webliography.
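For readers who build search strings programmatically, the narrowing step can be sketched in a few lines of Python. This is only an illustration: the function name build_query is hypothetical, and the quoted-phrase-plus-sign format simply mirrors the search string shown above rather than any documented engine syntax.

```python
def build_query(*phrases):
    """Join phrases into one search string, quoting each phrase.

    Assumption: the '"phrase" + "phrase"' layout copies the example in the
    article; it is not taken from any search engine's documentation.
    """
    cleaned = [p.strip() for p in phrases if p.strip()]
    return " + ".join(f'"{p}"' for p in cleaned)


# The broad subject-level search from the article.
broad = build_query("mexico city", "pollution")

# The narrower search: one extra phrase states what is actually being asked.
narrow = build_query("mexico city", "economic impact", "pollution")

print(broad)
print(narrow)
```

The point of the sketch is that each added phrase encodes part of the topic question, so the query itself records how the subject was narrowed.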

To illustrate the need for critical thinking, I offer a story about my mother-in-law. She used to tell me frequently that one could find "everything" on the Internet. In reality, though, much of this "everything" was suspect information. A friend of hers might recite to her, say, an urban legend about carcinogens in common bathroom products. This, in turn, would create considerable anxiety in my mother-in-law that her grandchildren were at risk from brushing their teeth or washing their hands. Reviewing articles on the Web completed a perfect Catch-22 for her suspicions, as her friend directed her to Web pages describing the horrors of hairspray. When she showed me these pages, I pointed out the sites' various fallacies: a lack of credentials in scientific research, studies with ideological aims and conclusions, and so forth. She has since learned how to evaluate source material with a better set of critical questions. She now asks about authority, objectivity, currency, and accuracy when confronting the plethora of information on the Internet.

How does the average person locate information on the Internet? Most people I have asked use a search engine such as google.com, dogpile.com, or altavista.com (my favorites), and this type of engine remains the best choice for the majority of Internet users. Unfortunately, these engines can also be notoriously inefficient, returning several pages of "hits" ranked on a percentage scale that estimates the likelihood that the sites contain what you want. At the same time, these engines often miss the hits that might help the user (or they bury them on the 47th page). Web-wide engines also do not necessarily have query access to the specialized databases that store the information you really need.

In response, software authors continually develop specialized search tools called "searchbots." Searchbots locate images, maps, e-mail addresses, street addresses, and other information in specialized categories. Altavista.com, for example, has evolved from a one-size-fits-all site into a number of these smaller searchbots. Likewise, many Web developers use database-driven Web sites; the Internet Movie Database is a wonderful example.

Beyond these offerings, I recommend three inexpensive tools that can be used for specialized searches.

BullsEye Pro

BullsEye Pro is an index of indexes, a meta-searchbot. It searches more than 1,000 professional search engines and databases to answer queries that can be entered in "natural language." In other words, this tool sends your request to other search engines that then search through databases at national newspapers, Web sites, business and professional resources, health centers, and technical documents. A single search on a news item might locate relevant items in The New York Times, the Library of Congress, and USA Today, whereas a shopping question would initiate a search of databases on books, CDs, consumer electronics, toy shops, online auctions, and other relevant sources.
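The general idea of a meta-search (one query fanned out to several engines, with the results merged) can be sketched as follows. This is not how BullsEye Pro is implemented, which the article does not describe; the engine functions, their names, and the example.com URLs are all hypothetical stand-ins that return canned results instead of contacting any real service.

```python
from typing import Callable, Dict, List


# Hypothetical back ends: each takes a query and returns a list of URLs.
def news_engine(query: str) -> List[str]:
    return ["http://example.com/news/1", "http://example.com/shared"]


def library_engine(query: str) -> List[str]:
    return ["http://example.com/shared", "http://example.com/catalog/7"]


def meta_search(query: str,
                engines: Dict[str, Callable[[str], List[str]]]) -> List[str]:
    """Send one query to every engine and merge the results.

    Duplicate URLs are dropped; the first occurrence keeps its position.
    """
    seen = set()
    merged = []
    for engine in engines.values():
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


results = meta_search(
    "air quality in Mexico City",
    {"news": news_engine, "library": library_engine},
)
print(results)
```

Choosing a category in a tool of this kind would amount to choosing which set of engines the query is sent to.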

Installation of BullsEye takes just a few minutes, and once you have opened the program, you simply choose the category you want to search. In Figure 1, category buttons are shown on the left. To the right, I have opened the engine that allows Web searches in various languages and countries. Important features include the elimination of dead links, annotation tools, reporting and archiving functions, and a built-in browser that handles both HTML and PDF files well.

You can download a 30-day trial copy of BullsEye for free. The licensed copy is rather expensive ($249 US) for personal use, though worth consideration for use by librarians and researchers.

Subject Search Spider

The Internet has been fairly well dominated by English speakers so far, but many nations—and language groups—are becoming increasingly frequent users of the medium. Subject Search Spider will search in any of 35 languages, most of which are European with a few Asian languages also included.

The easiest way to use SSS is to click on the "Query" menu at the top of the screen and then choose "Search Wizard." The wizard will guide you through the process of creating a tightly defined search. You can also choose the particular engines you want to use for your query (travel, education, cooking, the entire Web, etc.). Your results will come back in hypertext on a Web page that you can save. The program also includes a webliography of sites that you have visited during your current search.

Figure 2 is a demonstration of a search I created for the term "Risteard." I selected the language Irish Gaelic. (Risteard O'Coimin is my name, Richard Cummins, in Gaelic.) The search allows for a large number of customizable parameters, and the results are saved in a folder for future reference.

Subject Search Spider costs $29.95 US, though you can use a limited trial version for free.

ImageWolf

ImageWolf is a premier search tool for locating images on the Web. Figure 3 shows a search that I conducted for pictures of golfing sensation Tiger Woods. I stopped the search after a minute, at which time ImageWolf had already located 179 pictures. With one menu click, I prepared a Web report summarizing the search and providing hyperlinks to the image files.

There are several advantages to using ImageWolf, the first of which is that this tool can search as many as 20 sites simultaneously, unlike others that ploddingly search one site at a time. There are also advanced search options that allow you to search for images or video only. The most useful feature of this tool is its capability to verify files, so that your report will not contain dead hyperlinks. ImageWolf also builds a library of sites where images are found, becoming more intelligent with every use. The disadvantage, as far as I can tell, is that a fair number of links are extraneous, but these can be more or less avoided by using the URL Library feature, which allows better navigation than the Image Files Web report. When scanning typical results, for example, I can confidently skip entries such as "Sign In or Register at NYTimes.com" (which contained an ad featuring Woods's picture), "Error," "Ohio.com," and so on.
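Two of the features described above, checking many sites at once and dropping dead links from the report, can be illustrated with a short Python sketch. It is an assumption-laden illustration, not ImageWolf's actual code: verify_links and fake_checker are hypothetical names, the URLs are invented, and the checker only pretends to test each link, where a real one would issue an HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor


def verify_links(urls, is_alive, max_workers=20):
    """Check URLs concurrently and keep only those reported as live.

    max_workers=20 echoes the "as many as 20 sites simultaneously" figure
    in the article; is_alive is any function that returns True or False.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input list.
        statuses = list(pool.map(is_alive, urls))
    return [url for url, ok in zip(urls, statuses) if ok]


# Hypothetical checker: treats any URL ending in "/missing.jpg" as dead.
def fake_checker(url):
    return not url.endswith("/missing.jpg")


candidates = [
    "http://example.com/golf/photo1.jpg",
    "http://example.com/golf/missing.jpg",
    "http://example.com/golf/photo2.jpg",
]
live = verify_links(candidates, fake_checker)
print(live)
```

Passing the checker in as a parameter keeps the sketch runnable without a network connection and makes it easy to swap in a real check later.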

The regular version of ImageWolf costs $29.95 US, and there is a free, less robust trial copy for unlimited use.

Staying on top of searchbot developments is time-consuming but necessary if you wish to use the Internet as the data warehouse that it is. The fact that content on the Internet doubles every month or two makes this paramount for anyone who wishes to be efficient and effective at finding information relevant to the task at hand. The good news, though, is that with a handful of well-chosen searchbots and some critical thinking skills, anyone can become a competent cybrarian chasing down electronic information across the wires of our emerging world infostructure.


cummings, e. e. (1972). Introduction. Complete poems: 1913-1962. New York: Harcourt Brace Jovanovich.

