Intag Classified Documents
Classified Document Number 8
Number: 008
Date: August 2001



Intelligent Bibliography

Basic AI - IR Bibliography

Most of this bibliography deals with advanced applications that try to optimize Web performance from the side of the Website owners, "de facto" ignoring the users' side. This characteristic is very important when building intelligent Websites. Our methodology initially emphasized the logic of the users' side, as the "other side" is by default well controlled. However, we reviewed the whole "state of the art" concerning the inner side in order to improve our methodology. It is our opinion that most current Web projects are oriented toward optimizing search, and consequently the matching between sites and users, on the supposition that what Websites say about themselves in predetermined places is the truth and nothing but the truth. It is well known that nothing is further from reality than this naive supposition, which makes a large part of AI - IR excellence useless.

8- FIRST within the vast world of AI – IR





AI-IR Authorities


·         001- Scientific Literature Digital Library, a Research Index of the NECI. The site was moving to 002-this new place, and it is now being replaced by new focuses of interest such as learning, vision and intelligence.


We went there to look for Basic Knowledge classification hints and volume data.


·         003-Semantic Networks For Conceptual Analysis, SENECA, by Ernesto García Camarero, J. García Sanz and M.F. Verdejo of the Centro de Cálculo, Universidad Complutense de Madrid and Universidad Politécnica de Madrid, Spain, 1980.


We went there to see how to encapsulate knowledge objects.


·         004-Logic Programming, presented within a Review of Knowledge Representation Languages (the document is provided in DVI format!), by Michel Chitta Baral, of the Computer Science Department of the University of Texas at El Paso, USA.


·         005-A Survey On Web Information Retrieval Technologies (the document is provided in PDF format), by Lan Huang, Computer Science Department, State University of New York at Stony Brook, May 1999.


·         006-Software Agents: An Overview, by Hyacinth S. Nwana of Intelligent Systems Research, Advanced Applications & Technology Department, BT Laboratories, Martlesham Heath, Ipswich, Suffolk, IP5 7RE, U.K., 1996.

We went there to see how HK classification could be performed by robots without human intervention, or at least with negligible human intervention, in the near future.



Comparative Search performance analysis


·         Done for the following search engines and directories:


·         007-Google: to see how the PageRank Algorithm and its Lexicon work;

·         008-Altavista: to compare its advanced search facility performance against Google;

·         009-Infoseek: to analyze its search among results feature;

·         010-Yahoo;

·         011-Infomine;

·         012-Britannica;

·         013-Galaxy; and

·         014-Librarian's Index.



·         015-HITS (published in French), which stands for Hypertext Induced Topic Search, created by Jon Kleinberg, from IBM, to identify sources of authority. We find it very useful to implement the concept of hubs (good sources of links) and authorities (good sources of content). I think most of our URLs will correspond to authorities. A good hub is one that points to many authorities and, conversely, a good authority is one that is pointed to by many hubs.


We went there to see how to find “good enough” authorities and hubs.
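
The mutual reinforcement between hubs and authorities described above can be sketched as a simple iteration. This is only a minimal illustration, assuming a small in-memory link graph; the function names, the page names and the fixed number of iterations are hypothetical choices, not part of the referenced work.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length so values stay bounded."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) scores for a link graph.

    links maps each page to the list of pages it points to
    (a hypothetical in-memory stand-in for crawled link data).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # A good authority is pointed to by many good hubs.
        new_auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        auth = normalize(new_auth)
        # A good hub points to many good authorities.
        new_hub = {page: sum(auth[t] for t in links.get(page, []))
                   for page in pages}
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical example: three link pages pointing to two content sites.
example_links = {
    "hub1": ["siteA", "siteB"],
    "hub2": ["siteA", "siteB"],
    "hub3": ["siteA"],
}
hub_scores, auth_scores = hits(example_links)
best_authority = max(auth_scores, key=auth_scores.get)
```

In this toy graph the page pointed to by all three link pages ends up as the strongest authority, and the pages that point to both sites score higher as hubs than the one that points to a single site.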



·         016-OGS: OGS, the Open Global Ranking Search Engine and Directory, is a distributed concept that tries to use the opinions of all search facilities.


We went there to see how to implement users’ opinions.


·         TAPER: TAPER, which stands for Taxonomy And Path Enhanced Retrieval system, was developed by 017-Soumen Chakrabarti, in collaboration with Byron Dom and Piotr Indyk, of the IBM Santa Teresa Research Lab, in 1997. You may also find the related document (in PDF) Using Taxonomy, Discriminants and Signatures for Navigating in Text Databases, written by Soumen Chakrabarti, Byron Dom, Rakesh Agrawal and Prabhakar Raghavan of the IBM Almaden Research Center, 1997.


·         018-How do we find information on the Web, by Kiduk Yang, March 29, 2001, School of Information and Library Sciences, University of North Carolina at Chapel Hill.




Web Sizing


·         019-DBLP Bibliography offers straightforward, largely common-sense procedures for statistically sampling the universes of search engines.


·         020-Cybermetrics, with works of Bharat and Broder related to this issue of measuring Cyberspace, such as Web mining and Web metrics.


·         021-Web Archeology, by the Research Group of Compaq where one of the archeologists is Andrei Broder.
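
As an illustration of what statistically sampling search engine universes can look like, the sketch below estimates the relative size of two indexes from random samples and their overlap. It is only a sketch under stated assumptions: both indexes are modelled as in-memory sets of URLs that can be sampled uniformly (which real engines do not allow directly), and all names are hypothetical. The underlying idea is that the share of A's sample found in B estimates |A∩B|/|A|, the share of B's sample found in A estimates |A∩B|/|B|, and their ratio estimates |A|/|B|.

```python
import random


def overlap_fraction(sample, other_index):
    """Fraction of the sampled URLs that also appear in the other index."""
    if not sample:
        return 0.0
    return sum(1 for url in sample if url in other_index) / len(sample)


def relative_size(index_a, index_b, sample_size, rng):
    """Estimate |A| / |B| from uniform samples of each index.

    index_a and index_b are sets of URLs (hypothetical stand-ins for
    what two search engines have indexed). Returns None when no
    sampled page of A is found in B, since the ratio is then undefined.
    """
    sample_a = rng.sample(sorted(index_a), min(sample_size, len(index_a)))
    sample_b = rng.sample(sorted(index_b), min(sample_size, len(index_b)))
    a_in_b = overlap_fraction(sample_a, index_b)  # estimates |A and B| / |A|
    b_in_a = overlap_fraction(sample_b, index_a)  # estimates |A and B| / |B|
    if a_in_b == 0:
        return None
    return b_in_a / a_in_b


# Hypothetical universes: engine A indexes 2,000 pages, engine B a
# 1,000-page subset of them.
engine_a = {f"page{i}" for i in range(2000)}
engine_b = {f"page{i}" for i in range(1000, 2000)}

exact_ratio = relative_size(engine_a, engine_b, 5000, random.Random(0))
sampled_ratio = relative_size(engine_a, engine_b, 200, random.Random(0))
```

With a sample as large as the indexes themselves the estimate is exact; with a small sample it only approximates the true ratio, which is the usual trade-off when the universes are too large to enumerate.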



KR - One outstanding IR “authority”


·         Knowledge Representation by John F. Sowa, August 1999.


We went there to look for systems thinking concerning our project in three areas: logic, 022-ontology and computation.



New and Old ideas


·         Clustering


·         023-Vivísimo. Clustering is a relatively “old” technology: once we get the answers to a query, they can be organized into meaningful clusters. If it works, and if it does not take significant processing time, it always adds to, and never subtracts from, a better understanding of the answers (no ranking is involved). You may see it in action in the new search engine Vivísimo, which originated at Carnegie Mellon University and was launched in February 2001. It works fine on scientific literature, Web pages, patent abstracts, newswires, meeting transcripts, and television transcripts. It is a meta search engine because it works over several search engines at a time, applying the clustering process to all the answers. It is especially apt when users do not know how to make accurate queries; otherwise its authors advise using regular search engines. As it works directly on the pipeline of answers, the procedure is titled “just in time clustering”.
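
To make the idea of clustering the answers as they arrive more concrete, here is a minimal sketch that groups result snippets by word overlap in a single pass. It is not Vivísimo's method, which is not described here; the stop-word list, the Jaccard similarity measure, the threshold and all names are assumptions chosen for illustration.

```python
import re

# Small illustrative stop-word list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with", "is"}


def terms(text):
    """Lower-cased content words of a snippet."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Shared words divided by all words of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_results(snippets, threshold=0.2):
    """Assign each snippet, in arrival order, to its most similar cluster.

    A new cluster is opened when no existing cluster reaches the
    threshold. Returns a list of clusters, each a list of snippets.
    """
    clusters = []  # each entry: {"terms": set of words, "items": snippets}
    for snippet in snippets:
        words = terms(snippet)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(words, cluster["terms"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"terms": set(words), "items": [snippet]})
        else:
            best["items"].append(snippet)
            best["terms"] |= words
    return [cluster["items"] for cluster in clusters]


# Hypothetical answers to the ambiguous query "jaguar".
answers = [
    "jaguar car dealer prices",
    "used jaguar car prices",
    "jaguar cat habitat rainforest",
    "rainforest habitat of the jaguar cat",
]
groups = cluster_results(answers)
```

Because each answer is placed as soon as it arrives, the grouping can run directly on the stream of results coming back from several engines, at the cost of depending on arrival order.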


·         024-Temporal Trends in Document Databases, by Alexandrin Popescul, Gary William Flake, Steve Lawrence, Lyle H. Ungar and C. Lee Giles, IEEE Advances in Digital Libraries, ADL 2000, Washington, DC, May 22–24, pp. 173–182, 2000. To check their results they used the Citeseer database available in the 025-Citeseer Index, which consists of 250,000 articles on Computer Science, of which they used 150,000. Their algorithm works on the ideas of co-citation and previously determined influential papers.
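
Co-citation itself is easy to illustrate: two papers are co-cited each time a third paper cites both of them. The sketch below only counts such pairs for a tiny hypothetical citation table; it does not reproduce the algorithm of the paper above, and the function and paper names are assumptions.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count how often each pair of papers is cited together.

    citations maps a citing paper to the papers it cites
    (hypothetical identifiers). Pairs are stored in sorted order.
    """
    counts = Counter()
    for cited in citations.values():
        # Every pair of distinct references in one paper is one co-citation.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citation table: three papers citing A, B and C.
example_citations = {
    "paper1": ["A", "B"],
    "paper2": ["A", "B", "C"],
    "paper3": ["B", "C"],
}
pair_counts = cocitation_counts(example_citations)
```

Pairs with high counts tend to belong to the same topic, which is why co-citation is a natural starting point for grouping documents around influential papers.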



·         026-Teoma is a project of the Computer Labs at Rutgers University, launched in May 2001, that tries to surpass Google.


We went there to see how to identify communities and the local authorities within them.



Basic UK Seeds


·         027-Collection of Glossaries is another laudable effort, made by Aussie Slang (that’s not a woman; it stands for Australian Slang!), to facilitate the users’ navigation. It is only a directory of glossaries and dictionaries, but it could be useful in the initial tasks of building the HKM, when gathering trees, paths and keywords of the Major Subjects of the HK (they say they have catalogued more than 3,200 glossaries, really an upper limit to the volume of our Thesaurus).



More about KR


·         028-Towards Knowledge Representation: The State of Data and Processing Modeling Standards, by Anthony K. Sarris of Ontek Corporation, 1996. It is another source to fully depict the state of the art related to KR. In the Web domain we are dealing with conventional knowledge, ultimately forms of classical written knowledge complemented with some images.


·         029-Library of Congress site: we can try to describe it in words, but it will be extremely difficult to provide a map of its built-in knowledge as an institution.


·         030-ISO, the International Standards Organization or International Organization for Standardization. We have to allocate room in the I-URL’s of FIRST to take into consideration that near future possibility.


We went there to see how it is possible to represent complex objects such as societies and enterprises.




HK Database models

Digital Library Initiative


·         031-National Science Foundation (NSF),

·         The DOD, Department of Defense's 032-Advanced Research Projects Agency,

·         And the 033-National Aeronautics and Space Administration.


·         034-National Library of Medicine, The Library of Congress

·         035-National Endowment for the Humanities

·         The FBI, 036-Federal Bureau of Investigation.




