Intag White papers
Maps of the Human Knowledge and I-Databases
HKM- Human Knowledge Maps
Some zoom views of its anatomy
Classified 001, 11 June 2002
Author: Juan Chamero, CEO Intag


HKM- Clones Network and Coopbots tasks


We may imagine the Webspace hosting a network of HKM (Human Knowledge Maps) clones in different evolutionary states, each described by [Lg-abc, t, URL, U(type, M)], where
  • Lg: the language/s, for instance En (English) or En/Sp (English – Spanish)
  • abc: a sequential number identifying the clone of an original HKM version
  • t: traffic measured in some adequate unit, for example giga visits
  • URL: the locator of the Website that hosts the associated HKM clone
  • U (type, M): the users’ market, defined by its type and its mass M measured in mega users
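
As an illustration only, the state descriptor above could be held in a small record. This is a minimal sketch: the class name, field names, the three-digit clone number and the example values are our assumptions, not part of the HKM specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CloneState:
    """One clone of the network: [Lg-abc, t, URL, U(type, M)]."""
    languages: Tuple[str, ...]   # Lg, e.g. ("En",) or ("En", "Sp")
    clone_number: int            # abc: sequential number of the clone
    traffic_giga_visits: float   # t: traffic, here in giga visits
    url: str                     # URL of the hosting Website
    market_type: str             # U.type: type of users' market
    market_mega_users: float     # U.M: market mass in mega users

    def label(self) -> str:
        """Render the descriptor in the bracket notation used in the text."""
        lg = "/".join(self.languages)
        return (f"[{lg}-{self.clone_number:03d}, {self.traffic_giga_visits}, "
                f"{self.url}, U({self.market_type}, {self.market_mega_users})]")


# Hypothetical clone, for illustration only.
clone = CloneState(("En", "Sp"), 7, 1.5, "http://clone.example", "academic", 2.0)
print(clone.label())
```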

We may also imagine a set of coopbots exchanging strategic information about the clones’ evolution, namely matchmaking efficiency, dropouts, database growth rates, search profiles, etc.





Anatomy of a HKM Clone

Let’s look at one of the clone hosts. The clone works in matchmaking mode with its users. It consists of nearly 500,000 human-made, agent-aided briefs pointing to an equal number of corresponding websites, each considered either an Authority or a Hub; we regard this as a necessary and sufficient basic approximation to the Human Knowledge at a given moment. At the “core” of each clone we have:

  • Small Green Oval: A set of nearly 250 trees, one for each Major Subject of the Human Knowledge. Each tree is the logical tree of a classical Major Subject program, for instance Microbiology;

  • Large Green Oval: A set of Manuals, one for each Major Subject (MS). Each will have from 60 to 120 pages of text, images and essential hyperlinks, totaling a Virtual Encyclopedia of nearly 20,000 pages, equivalent to a 40-volume collection;

  • Blue Oval: The Thesaurus, the whole set of keywords for the entire collection, from 400,000 to 600,000 units. That means a full and evolving Thesaurus for each human language, enriched over time in proportion to user traffic;

  • Heavy Green Small Oval: The whole set of threads for the whole collection. A thread is a string of keywords that has a special meaning within an MS without being a subject: a set of connected concepts that may become a subject if supported by high and/or consistent traffic.

After browsing the clone internally for a given “triad” [k, s, th] (keyword, subject, thread; the first mandatory, the second and third optional), the user may decide whether or not to visit a given Website (yellow oval).

The blue crown is built through user-clone interactions, for instance from orthographically correct keywords that are not present in the clone’s Thesaurus. For each of these “actually non-existent” keywords the system records statistics and accordingly sends robots into the Web space to gather information and global popularity statistics, in order to suggest keyword updates and/or new Authorities and Hubs for the next evolutionary step. For more details, see the description of how the Expert System works.

The aquamarine external crown consists of complementary URLs, with briefs shown as the search engines return them. These URLs act as bait for users: if they are browsed and/or selected often enough, they are considered potential candidates to become part of the clone. Robots guided by the clone’s administrators choose these complementary URLs. The algorithm that guides the robots in this intelligent preselection task must take into account popularity, age, and Website structural parameters.
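
The promotion of bait URLs could be sketched as a simple threshold test. The thresholds below (minimum number of presentations, minimum selection rate) and all names are our assumptions for illustration; the text only states that URLs selected often enough become candidates, and the age and structural parameters are left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ComplementaryUrl:
    url: str
    times_shown: int      # how often the raw brief was presented to users
    times_selected: int   # how often users browsed or selected it


def promotion_candidates(urls, min_shown=100, min_select_rate=0.2):
    """Return the URLs selected often enough to be reviewed for inclusion."""
    picked = []
    for u in urls:
        if u.times_shown <= 0 or u.times_shown < min_shown:
            continue  # not enough exposure to judge
        if u.times_selected / u.times_shown >= min_select_rate:
            picked.append(u.url)
    return picked


# Hypothetical bait URLs, for illustration only.
bait = [
    ComplementaryUrl("http://a.example", 500, 150),  # 30% selected
    ComplementaryUrl("http://b.example", 500, 10),   # 2% selected
    ComplementaryUrl("http://c.example", 20, 15),    # too few presentations
]
candidates = promotion_candidates(bait)
```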





Major Subject’s Anatomy


Below we depict in more detail how a Major Subject is organized within the host. Each MS comprises on average 2,500 briefs (I-URLs), 1,000 threads and 2,500 words (that the figure 2,500 appears twice is a mere coincidence). The program sector of the database holds the subject’s tree and the Manual. We also show the suggested I-URLs used to bait users (in fact we start by suggesting only raw URLs instead of I-URLs) and the suggested-keywords sector, enriched via user interactions.




Tree Evolution



The pieces of knowledge are presented logically as trees, complemented by a set of Thesauri and a set of Threads. However, the tree is the pillar of the knowledge at a given evolutionary state. We started the construction of the first version of the map with the tree, once approved by the academic staff for each MS. The initial trees must be considered only the best first approximations for a given set of MSs. Each users’ market is able to make the tree evolve to fit its own knowledge and information needs.

We depict in the figure a thread within a given MS, for instance DNA sequencing in Biology. The user-clone interactions, matches and mismatches alike, teach us many things: for instance, regular use of the first branch, no use of the second, heavy use of the third and rather weak use of the fourth. A singularity could appear at the third level (red) of the fourth branch: a sub-sub-subject that perhaps deserves to be considered a higher-level subject in its own right.

All these statistics and singularities are factors that make the trees and Thesaurus evolve, as we will see when studying the Expert System that makes the clones evolve.
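
To make the idea of branch statistics and singularities concrete, here is a minimal sketch. The tree layout, the hit counts and the rule used to flag a singularity (a node at the third level or deeper that concentrates at least half of the traffic of its own branch) are our assumptions, not the Expert System’s actual criteria.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    hits: int = 0                                   # matches recorded on this node itself
    children: List["Node"] = field(default_factory=list)


def total_hits(node):
    """Hits on a node plus all hits in the subtree below it."""
    return node.hits + sum(total_hits(c) for c in node.children)


def singularities(root, min_depth=3, min_share=0.5):
    """Flag deep nodes that concentrate a large share of their branch's traffic."""
    found = []

    def walk(node, depth, branch_total):
        if depth >= min_depth and branch_total and node.hits / branch_total >= min_share:
            found.append((node.name, depth))
        for child in node.children:
            walk(child, depth + 1, branch_total)

    for branch in root.children:                    # depth 1: the main branches
        walk(branch, 1, total_hits(branch))
    return found


# Hypothetical usage figures: regular, null, heavy and weak branches.
tree = Node("Biology", children=[
    Node("branch 1", 30),
    Node("branch 2", 0),
    Node("branch 3", 60),
    Node("branch 4", 4, [Node("4.1", 2, [Node("4.1.1", 14)])]),
])
branch_usage = {b.name: total_hits(b) for b in tree.children}
flagged = singularities(tree)
```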





How the HKM interface looks to users


The user queries the Expert System that governs the clone via a triad [k, s, th]. The first variable is mandatory, while the other two are either optional or set by default. Once the query is issued, the Expert System (XS) extracts from its database of briefs (I-URLs) those that match best: far fewer than in a classical search engine query, for instance from 10 to 40 instead of thousands to millions. The list is sorted by popularity and presence. Popularity is the number of real clicks, that is, accesses to the URL described in the brief, while presence measures how many times the brief was presented under any circumstances.

The outcome of the algorithm is also highly influenced by brief downloads, that is, by how many times the brief was loaded into a user’s cart.
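
The ranking algorithm itself is not specified here, so the following is only a sketch under stated assumptions: we score a brief as clicks per presentation, with cart downloads added with a weight of 2, and cut the list at 40 entries. The formula, the weight and the names are ours.

```python
from dataclasses import dataclass


@dataclass
class BriefStats:
    url: str
    clicks: int      # popularity: real accesses to the URL
    presented: int   # presence: times the brief was shown
    downloads: int   # times the brief was loaded into a user's cart


def score(brief, download_weight=2.0):
    """Assumed score: (clicks + weighted downloads) per presentation."""
    if brief.presented == 0:
        return 0.0
    return (brief.clicks + download_weight * brief.downloads) / brief.presented


def rank(briefs, limit=40):
    """Return at most `limit` briefs, best score first."""
    return sorted(briefs, key=score, reverse=True)[:limit]


# Hypothetical statistics, for illustration only.
briefs = [
    BriefStats("http://a.example", clicks=50, presented=1000, downloads=0),
    BriefStats("http://b.example", clicks=20, presented=100, downloads=5),
    BriefStats("http://c.example", clicks=0, presented=0, downloads=0),
]
ordered = [b.url for b in rank(briefs)]
```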

The brief (I-URL) consists of three parts, namely:
  • The body or Human Edited Brief: a summary written following a strict procedure that requires exhaustive global research of the Website commented on, sometimes aided by a set of utilities and Intelligent Agents;
  • The Certification Data Tags, which account for the website architecture and anatomy as described in the I-URL white paper;
  • The statistics and living history of the brief: the ranking assigned by the above-mentioned ranking algorithm, the traffic to date, the number of clicks made to retrieve the Website located at the URL, the search depth, that is, how many pages on average are browsed by each user (this computation strongly depends on the type of access permitted: free user, captive user, registered user, etc.), and declared satisfaction (this feature requires that some users be invited to give an evaluation or make a choice in a poll).
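
The three parts of a brief could be represented as follows. This is a sketch only: the field names and types are our assumptions, and the actual Certification Data Tags are defined in the I-URL white paper, not here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BriefHistory:
    """Statistics and living history of a brief."""
    ranking: float = 0.0
    traffic: int = 0
    clicks: int = 0
    pages_per_visit: List[int] = field(default_factory=list)
    satisfaction_votes: List[int] = field(default_factory=list)

    def search_depth(self) -> Optional[float]:
        """Average number of pages browsed per visit, if any visit is recorded."""
        if not self.pages_per_visit:
            return None
        return sum(self.pages_per_visit) / len(self.pages_per_visit)


@dataclass
class Brief:
    url: str
    body: str                                                          # Human Edited Brief
    certification_tags: Dict[str, str] = field(default_factory=dict)   # placeholder keys
    history: BriefHistory = field(default_factory=BriefHistory)


# Hypothetical brief, for illustration only.
brief = Brief("http://site.example", "Summary written by a human editor.")
brief.history.pages_per_visit.extend([2, 4, 6])
depth = brief.history.search_depth()
```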




The K – Process


By that we mean the keyword process. A user run is identified by an IP address and the time of the run. During that period the user queries the XS with a string of k’s via a triad, selecting a subject and/or a thread or leaving them at their defaults.

For each k, the XS checks for existence and errors. If the keyword exists, the XS runs the query and the user decides whether to browse the list totally or partially, whether to save some briefs, whether to make some clicks and, afterwards, whether to declare satisfaction upon request.

If the keyword does not exist, the XS checks for different types of grammatical, syntactic and orthographic errors. If it finds any, it classifies the error(s) and suggests new keyword(s). If not, it initiates a complex process of searching the Web and queues a demand for a potentially needed keyword. This process feeds the blue region of our second figure.
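
The branching of the K-process can be sketched as below. We stand in for the XS error checks with a plain string-similarity test from Python’s standard difflib module, which covers only spelling-like errors; the cutoff of 0.8, the function name and the demand counter are our assumptions.

```python
import difflib
from collections import Counter


def process_keyword(k, thesaurus, demand_queue):
    """Classify one keyword: run the query, suggest corrections, or queue a demand."""
    word = k.strip().lower()
    if word in thesaurus:
        return ("query", [word])                      # keyword exists: run the query
    # Assumed error check: look for near matches among the existing keywords.
    suggestions = difflib.get_close_matches(word, sorted(thesaurus), n=3, cutoff=0.8)
    if suggestions:
        return ("suggest", suggestions)               # probable typing error
    demand_queue[word] += 1                           # unknown keyword: record the demand
    return ("demand", [])


# Hypothetical thesaurus and run, for illustration only.
thesaurus = {"microbiology", "genetics", "sequencing"}
demand = Counter()
existing = process_keyword("Genetics", thesaurus, demand)
misspelled = process_keyword("microbiolgy", thesaurus, demand)
unknown = process_keyword("proteomics", thesaurus, demand)
```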




The XS tries to do its best to discover coherent clusters of demand, in order to learn as much as possible about the users’ market behavior and, beyond that, to suggest new threads and even new subjects to the HKM clone’s administrators. Our intuition is that users think in terms of clusters, that is, keywords and sets of commonplace clusters, rather than in terms of trees. If this intuition is confirmed, much could be gained in the matchmaking efficiency of the HKMs and of their clones.
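
One simple first step towards finding such clusters would be to count which keywords appear together within the same user run. The sketch below does only that; the minimum support of 2 runs and the sample data are our assumptions, and a real cluster discovery process would need considerably more than pair counts.

```python
from collections import Counter
from itertools import combinations


def keyword_pairs(runs, min_support=2):
    """Count keyword pairs that co-occur in at least `min_support` user runs."""
    pairs = Counter()
    for keywords in runs:
        # Each unordered pair is counted once per run.
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_support}


# Hypothetical user runs (one list of keywords per run).
runs = [
    ["dna", "sequencing", "genome"],
    ["dna", "sequencing"],
    ["genome", "protein"],
]
frequent = keyword_pairs(runs)
```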