Intag White Papers
User Tracking and Thesaurus Evolution
Aug 06 2002

Thesaurus Evolution, Keywords Popularity, and something more

The figure above depicts a typical user track. We may define in each track the following significant events:
  • enter: a new user enters a query, asking for a given keyword within a given subject (optional)
  • c: means a positive HKM database answer to a query; in the figure k1, k3, ..., kn have positive answers while k2 does not.
  • C: means that the user decides to retrieve one of the basic documents of the Web space catalogued as belonging to the Universal HK Virtual Library. This is a crucial point in the tracking: in effect, the user leaves the site to dive into the outside document.
  • error: another crucial point: the corresponding keyword (in the figure k2) leads to an error; presumably the referenced site is no longer hosted at that URL.
  • leave: the user leaves the system, but could still make a..
  • re-entry: the user re-enters the system, either for another keyword string within the same subject or for a different subject; this is very important from the point of view of HKM usage.
  • subject: the user is emphatically invited to report a subject, apart from keywords; however he/she is not obliged to provide it.
  • r: another crucial point: the user decides, in a way we can only describe statistically, either to return to the system or to continue browsing the Web space by his/her own means.
  • Main Subjects' Tutorials: FIRST may also offer users a set of tutorials where the main subjects of each Major Subject of the HKM are thoroughly explained.

Warning: So far we have been talking about existing keywords, that is, users querying the HKM by keywords it already holds. Perhaps the most crucial event occurs whenever a nonexistent keyword is queried, provided it is correctly spelled. In this case FIRST must investigate the following: a) test whether the keyword is absent from a specific main subject but present in the HKM database for the queried Major Subject; b) test whether the keyword is absent from the HKM database for the queried Major Subject but present in some other; c) test whether it is entirely outside the HKM.
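
The three tests a), b) and c) can be sketched as a single lookup. This is only an illustration: the nested-dictionary layout of the HKM (Major Subject, then main subject, then a set of keywords) and the names Lookup and classify_keyword are assumptions made here, not part of FIRST.

```python
from enum import Enum


class Lookup(Enum):
    FOUND = "found in the queried main subject"
    OTHER_MAIN_SUBJECT = "case a: found elsewhere in the queried Major Subject"
    OTHER_MAJOR_SUBJECT = "case b: found only in another Major Subject"
    OUTSIDE_HKM = "case c: absent from the HKM"


def classify_keyword(hkm, keyword, major, main):
    """Classify a correctly spelled keyword.

    hkm is a hypothetical layout: {major subject: {main subject: set of keywords}}.
    """
    subjects = hkm.get(major, {})
    if keyword in subjects.get(main, set()):
        return Lookup.FOUND
    # Case a: present under another main subject of the same Major Subject.
    if any(keyword in kws for kws in subjects.values()):
        return Lookup.OTHER_MAIN_SUBJECT
    # Case b: present only under a different Major Subject.
    if any(
        keyword in kws
        for other_major, subs in hkm.items()
        if other_major != major
        for kws in subs.values()
    ):
        return Lookup.OTHER_MAJOR_SUBJECT
    # Case c: not in the HKM at all.
    return Lookup.OUTSIDE_HKM
```

Each outcome other than FOUND would then be reported for editorial review, as described next.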

See below the different groups of keywords. FIRST must analyze the existence or non-existence of unrecognized keywords for all those groups. The FIRST Chief Editor must carefully review these cases once properly reported by.

Tracking "zoom"

We can gain further insight by looking more deeply into each event, namely:

Over c: Once a [keyword, subject] pair is keyed in and checked against all the programmed consistency tests, FIRST answers with a hierarchical list of either the selected I-URLs or their corresponding briefs. The latter invites the user to mark the most appropriate one with a click. The user can also navigate within the same list, that is, within the same pair.

Over error: Occasionally users may be given a wrong URL (although this kind of error must be avoided as much as possible). The system must make the most of these occasions by offering the user some alternatives: similar URLs (once it has checked that the link works properly!) and/or advice to consult related tutorials within the system. Independently, these events must trigger one of the intelligent search agents, either to locate where the URL may have migrated (the most probable case) or, in the extreme, to look for new documents. The candidate documents to replace the lost one must be sent to the FIRST Chief Editor, who finally approves or rejects the new document once the corresponding I-URL is edited. Once the new document is approved, its announcement must be emailed to the users who previously authorized the system to notify them.

Note: An internal clock measures the duration of each user's session: once the user has gone out to review something, the system waits a reasonable time and, if the user returns within it, treats the user as continuing the same session. A user may change subjects within one session.
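
A minimal sketch of such a session clock follows. The 30-minute gap is an assumption (the text only says "a reasonable time"), and SessionTracker and touch are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field

# Assumed value for the "reasonable time"; the source does not specify one.
SESSION_GAP_SECONDS = 30 * 60


@dataclass
class SessionTracker:
    gap: float = SESSION_GAP_SECONDS
    last_seen: dict = field(default_factory=dict)   # user -> time of last event
    session_no: dict = field(default_factory=dict)  # user -> current session number

    def touch(self, user, now):
        """Record activity at time `now` (in seconds) and return the session number."""
        previous = self.last_seen.get(user)
        # A first visit, or a return after more than `gap` seconds, opens a new session.
        if previous is None or now - previous > self.gap:
            self.session_no[user] = self.session_no.get(user, 0) + 1
        self.last_seen[user] = now
        return self.session_no[user]
```

A user who comes back from a C-type inspection within the gap keeps the same session number, so keyword strings for several subjects can be attached to one session.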

Possible strings are:

[k1, c, C, k2, k3, c, k4, c, C, leave] subject i
[k1, k2, c, c, c, k3, c, C, c, C, k4, k5, leave] subject j

In the first string, for subject i, the user clicked through to a URL after reviewing its I-URL, then returned and searched for k2 and k3, merely glancing without reading the list of I-URLs provided, then tried k4, clicked through to another URL, and finally left the system.

In the second string, for subject j, the user sweeps over k1 but extensively reviews the k2 list, makes two more searches with k3, then sweeps over k4 and k5, and finally leaves the system.

As our purpose is to keep only keywords strings, those strings could be summarized as follows:

[k1, k2, k3, k4] subject i

[k1, k2, k3, k4, k5] subject j

Here the colors run from a cold one (blue) to a very hot and active one (red). For each session and for each subject the keyword strings are saved for statistical purposes. Statistics are computed on the strings both as entered and in alphabetical order.
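
The reduction from a full track to a keyword string, and the two kinds of statistics, can be sketched as below. The sketch assumes that event markers and keywords never share a name, and it reads "as they are and alphabetical" as counting each string once in its original order and once sorted; MARKERS, keyword_string and string_statistics are illustrative names.

```python
from collections import Counter

# Event markers from the track notation; everything else is taken to be a keyword.
MARKERS = {"enter", "c", "C", "error", "leave", "re-entry", "r"}


def keyword_string(track):
    """Keep only the keywords of a track, in the order they were queried."""
    return [event for event in track if event not in MARKERS]


def string_statistics(tracks):
    """Count keyword strings as entered and in alphabetical order."""
    as_entered = Counter()
    alphabetical = Counter()
    for track in tracks:
        keywords = keyword_string(track)
        as_entered[tuple(keywords)] += 1
        alphabetical[tuple(sorted(keywords))] += 1
    return as_entered, alphabetical
```

Applied to the first track above, keyword_string yields the string k1, k2, k3, k4 shown for subject i.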

Thesaurus Evolution Mechanics

All keyword and I-URL traffic is statistically analyzed from time to time. Let's see how the Thesaurus evolves. For each keyword we have at each moment two variables: its quantitative presence within the Logical Tree structure and its popularity. We may define the following groups within the Thesaurus:

    a) Regular keywords
    b) Synonyms of specific keywords
    c) Related keywords to specific keywords
    d) Antonyms of specific keywords

a versus b and their respective popularities tell us how well designed the synonymies are.
a versus c and their respective popularities tell us about some semantic irregularities.
a versus d and their respective popularities tell us about searching patterns that must be deeply investigated.

For instance, if in politics we detect a high popularity of peace and, conversely, a low popularity of war, it suggests that people are changing their attitude concerning the crucial problem of peace versus war. We may also investigate all the other possible combinations: b versus c, b versus d, and c versus d.
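
One simple way to compare two keywords is the ratio of their query counts. The text does not define how popularity is measured, so raw query counts are an assumption here, and popularity_ratio is a hypothetical helper.

```python
from collections import Counter


def popularity_ratio(counts, regular, other):
    """Queries for `regular` per query for `other`.

    `counts` is a Counter of queries per keyword. Returns None when
    `other` was never queried, since the ratio is then undefined.
    """
    return counts[regular] / counts[other] if counts[other] else None


# Illustrative query log, not real data.
query_log = ["peace"] * 6 + ["war"] * 2 + ["treaty"] * 2
counts = Counter(query_log)
peace_vs_war = popularity_ratio(counts, "peace", "war")
```

The same helper applies to any pair drawn from groups a to d, for example a regular keyword against one of its synonyms.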

Analysis of some other types of user interactions

We may save all user logins and, from time to time, depersonalize them in order to define common behaviors and common searching patterns. We expect to find every imaginable kind of searching pattern, namely:
  • Users that dive wide and shallow systematically
  • Users that dive wide and deep systematically
  • Users that dive wide and shallow at random
  • Users that dive wide and deep at random
  • Users that dive focused and shallow systematically
  • Users that dive focused and deep systematically
  • Users that dive as if picking at random, sometimes shallow and sometimes deep

All these and many other categories are further divided into frequent and occasional users.

Another crucial event: users' feedback

A user can give feedback to FIRST in the following ways:
  • Making comments about I-URLs (c stage)
  • Making comments about specific URLs once reviewed (C stage)
  • Making comments from predetermined behavior-tracking places strategically distributed along the system, for instance: before entering the query process, during the query process, on leaving the query, and before leaving the site.
  • Making open suggestions from inside the site
  • Making open suggestions from outside the site

We may design the user interface to warn users when they are about to leave the system and to welcome them when they come back from C-type inspections. As noted above, users may occasionally be given a wrong address.

Path keyword string correspondences

We said that for each path of the initial logical tree of a given Major Subject of the HKM we define a string of keywords, where possible with priorities, let's say from left to right. After the three-stage procedure described in the FIRST white papers, we have an initial set of correspondences between paths and strings, both related to specific subjects under each Major Subject.

After a measurable evolutionary change in a given users' market, the initial World Virtual Library of HK changes. The figures above depict such a change. Some documents (central figure) will be considered "useless" (light yellow regions) and some will have been added to the system, extracted from the HK as_it_should_be region (reddish regions). Finally, the third figure shows how the actual World Library of the HK and its related HKM will look. Topologically, for the next evolutionary step we consider the situation to be as it was initially, with a red circle within a larger yellow circle but leaving a smaller yellow crown.

However, if we do not change the logical tree and the Thesaurus accordingly, the procedure will fail. Rather than making the red region converge to cover as much of the yellow region as possible, the procedure will enter a vicious circle.
