Discussion Notes

Jan 31, 2001

(courtesy Balaji K.S. with some edits and input by Rob Capra)

The discussions focused on three threads:

Is a web log useful for drawing inferences?

Multiple Windows

The user may have several windows open at the same time, which can make it tricky to draw inferences about the user's experience of a web site. For example, the user could open two windows to different sites (A and B) and then, for 15 minutes, do nothing on site A while working on site B. Based on its log data, site A might conclude that the user likes its page, when in reality the user spent all that time reading site B.
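To make the pitfall concrete, here is a minimal Python sketch of a naive dwell-time estimate that a site might compute from its own log. It assumes that the time between two consecutive requests is charged to the earlier page. The log entries and the name `naive_dwell_times` are hypothetical. Site A sees only its own requests, so the 15 minutes the user spent on site B are charged to `/products.html`.

```python
from datetime import datetime

# Hypothetical entries from site A's log only: (timestamp, page requested).
# Between 10:01:30 and 10:16:30 the user was actually reading site B in another window.
SITE_A_LOG = [
    ("2001-01-31 10:00:00", "/index.html"),
    ("2001-01-31 10:01:30", "/products.html"),
    ("2001-01-31 10:16:30", "/contact.html"),
]

def naive_dwell_times(log):
    """Charge the gap between consecutive requests (in minutes) to the earlier page."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), page) for ts, page in log]
    dwell = {}
    for (t0, page), (t1, _) in zip(parsed, parsed[1:]):
        dwell[page] = dwell.get(page, 0.0) + (t1 - t0).total_seconds() / 60
    return dwell

print(naive_dwell_times(SITE_A_LOG))  # {'/index.html': 1.5, '/products.html': 15.0}
```

The last page in the log receives no estimate at all, which is a further limitation of this heuristic.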

How to distinguish one user from another in the web log

Caching problem

Web site design depends on perception

Concept hierarchies can help give insights


Data Mining

For a particular scenario (or scenarios), several ideas are available to improve the efficiency of the data mining process: we survey them below.

Anti-monotonicity

If it is found (in a bottom-up fashion) that {b} has no support, or more generally falls below the minimum support threshold, then there is no need to look any higher at its supersets {b,c}, {b,d}, and {b,c,d}, since none of them can reach that threshold either:
                    {b,c,d}
                     / | \
                    /  |  \
                   /   |   \
                  /    |    \
              {b,d}  {b,c}  {d,c}
                | \  /   \  / |
                |  \/     \/  |
                |  /\     /\  |
                | /  \   /  \ |
               {b}    {d}   {c}
                 \     |     /
                  \    |    /
                   \   |   /
                    \  |  /
                      { }
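The bottom-up pruning above can be sketched as a level-wise (Apriori-style) search. The notes do not name a specific algorithm, so this is an illustrative choice, and the transactions, threshold, and function names are hypothetical. Because {b} is below the minimum support, none of its supersets is ever generated or counted.

```python
from itertools import combinations

# Hypothetical transactions (e.g., sets of pages per session); "b" occurs only once.
TRANSACTIONS = [{"c", "d"}, {"b", "c"}, {"c", "d"}, {"d"}]
MIN_SUPPORT = 2  # assumed absolute threshold for this sketch

def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def level_wise(transactions, min_support):
    """Bottom-up search; returns (frequent itemsets, itemsets whose support was counted)."""
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent, counted = [], []
    k = 1
    while level:
        survivors = []
        for cand in level:
            counted.append(cand)
            if support(cand, transactions) >= min_support:
                survivors.append(cand)
        frequent.extend(survivors)
        k += 1
        keep = set(survivors)
        # Join: only frequent itemsets are combined, so no superset of an
        # infrequent itemset (here {b}) is ever generated.
        joined = {x | y for x in survivors for y in survivors if len(x | y) == k}
        # Prune: drop any candidate that has an infrequent (k-1)-subset.
        level = [c for c in joined
                 if all(frozenset(s) in keep for s in combinations(c, k - 1))]
    return frequent, counted

frequent, counted = level_wise(TRANSACTIONS, MIN_SUPPORT)
print(sorted(sorted(s) for s in frequent))  # [['c'], ['c', 'd'], ['d']]
print(any("b" in s and len(s) > 1 for s in counted))  # False
```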

Query optimizers can use the anti-monotonicity constraint to selectively "reorder" query (mining) operations in an attempt to improve retrieval performance.
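One way such reordering can pay off is sketched below, under stated assumptions: a hypothetical anti-monotone constraint (the total "cost" of the pages in a pattern must not exceed a budget) is cheap to check, while support counting requires a pass over the data. Evaluating the cheap constraint first yields the same answers with fewer support counts. The costs, budget, and function names are illustrative only and are not taken from the discussed system.

```python
from itertools import combinations

# Hypothetical per-page costs and a budget constraint sum(cost) <= BUDGET.
# With non-negative costs the constraint is anti-monotone: adding items can
# only raise the total, so every superset of a violating set also violates it.
COSTS = {"a": 5, "b": 1, "c": 2, "d": 8}
BUDGET = 7
MIN_SUPPORT = 2
TRANSACTIONS = [{"a", "b", "c"}, {"b", "c", "d"}, {"a", "c"}, {"b", "c", "d"}]

def within_budget(itemset):
    return sum(COSTS[i] for i in itemset) <= BUDGET

def mine(constraint_first):
    """Level-wise search for itemsets that are frequent AND within budget.

    Returns (answers, support_counts); each support count stands in for an
    expensive pass over the data, while the budget check is cheap.
    """
    support_counts = 0

    def passes(cand):
        nonlocal support_counts
        if constraint_first and not within_budget(cand):
            return False  # rejected without touching the data
        support_counts += 1
        if sum(1 for t in TRANSACTIONS if cand <= t) < MIN_SUPPORT:
            return False
        return within_budget(cand)

    level = [frozenset([i]) for i in sorted(COSTS)]
    answers, k = [], 1
    while level:
        survivors = [c for c in level if passes(c)]
        answers.extend(survivors)
        k += 1
        keep = set(survivors)
        joined = {x | y for x in survivors for y in survivors if len(x | y) == k}
        # Both tests are anti-monotone, so a candidate with a failing
        # (k-1)-subset can be discarded without evaluating it.
        level = [c for c in joined
                 if all(frozenset(s) in keep for s in combinations(c, k - 1))]
    return set(answers), support_counts

print(mine(constraint_first=False)[1], mine(constraint_first=True)[1])  # 7 6
```

The saving here is small, but it grows with the number of candidates that violate the cheap constraint.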

Using Generality Orderings

Meta-patterns are patterns of patterns. Syntactic and semantic constraints on the form of the patterns sought can therefore be used to prune the search space of hypotheses.
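A minimal sketch of the syntactic side follows. It assumes a hypothetical template syntax in which a navigation pattern is a list of pages and `"*"` stands for zero or more arbitrary pages. When paths are enumerated over a hypothetical site graph, any prefix that can no longer fit the template is abandoned together with all of its extensions. The template language, site graph, and function names are assumptions made for illustration.

```python
# Hypothetical site graph: page -> pages it links to.
SITE = {
    "home": ["news", "products"],
    "news": ["home", "products"],
    "products": ["cart", "news"],
    "cart": ["checkout", "products"],
    "checkout": [],
}

def matches(path, template):
    """True if the whole path fits the template ('*' = zero or more pages)."""
    if not template:
        return not path
    if template[0] == "*":
        return matches(path, template[1:]) or (bool(path) and matches(path[1:], template))
    return bool(path) and path[0] == template[0] and matches(path[1:], template[1:])

def viable_prefix(path, template):
    """True if path could still be extended into a sequence that fits the template."""
    if not path:
        return True
    if not template:
        return False
    if template[0] == "*":
        return viable_prefix(path, template[1:]) or viable_prefix(path[1:], template)
    return path[0] == template[0] and viable_prefix(path[1:], template[1:])

def search(template, max_len, prune):
    """Enumerate navigation paths up to max_len; return (matching paths, paths visited)."""
    found, visited = [], 0
    stack = [[p] for p in SITE]
    while stack:
        path = stack.pop()
        if prune and not viable_prefix(path, template):
            continue  # no extension of this prefix can fit the template
        visited += 1
        if matches(path, template):
            found.append(path)
        if len(path) < max_len:
            stack.extend(path + [nxt] for nxt in SITE[path[-1]])
    return found, visited

TEMPLATE = ["home", "*", "checkout"]  # hypothetical meta-pattern
found, visited_pruned = search(TEMPLATE, max_len=5, prune=True)
_, visited_full = search(TEMPLATE, max_len=5, prune=False)
print(sorted(found))  # [['home', 'news', 'products', 'cart', 'checkout'], ['home', 'products', 'cart', 'checkout']]
print(visited_pruned, visited_full)
```

Both searches find the same patterns; the pruned one simply never explores paths that cannot start with `home`.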

Anytime Results

Data mining can be terminated when results of the desired fidelity are achieved.
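As one possible illustration (the notes do not specify a mechanism), the sketch below estimates the support of an itemset by random sampling and reports a Hoeffding error bound after every sample. The caller can stop at any point, and `mine_until` stops as soon as the bound reaches the desired fidelity `epsilon`. The data and function names are hypothetical.

```python
import math
import random

def anytime_support(transactions, itemset, delta=0.05, seed=0):
    """Yield (estimate, error_bound, n) after every sampled transaction.

    Samples are drawn with replacement, so by Hoeffding's inequality the
    estimate is within error_bound of the true support with probability
    at least 1 - delta, where error_bound = sqrt(ln(2/delta) / (2n)).
    """
    rng = random.Random(seed)
    hits = 0
    n = 0
    while True:
        n += 1
        if itemset <= rng.choice(transactions):
            hits += 1
        yield hits / n, math.sqrt(math.log(2 / delta) / (2 * n)), n

def mine_until(transactions, itemset, epsilon, delta=0.05):
    """Stop as soon as the guaranteed error bound reaches the desired fidelity."""
    for estimate, bound, n in anytime_support(transactions, itemset, delta):
        if bound <= epsilon:
            return estimate, bound, n

# Hypothetical data: {"b"} occurs in 30% of 1,000 transactions.
TRANSACTIONS = [{"b", "c"}] * 300 + [{"c"}] * 700
estimate, bound, n = mine_until(TRANSACTIONS, {"b"}, epsilon=0.05)
print(n)  # 738
```

Stopping earlier is also possible: the current estimate is still usable, just with a wider bound.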

Caveats with the WUM approach

The mining language does not support closure in the sense of SQL (i.e., the output of a mining query cannot seamlessly serve as the input to another mining query). Moreover, the expressiveness of the language is limited to propositional logic; first-order predicate logic would help in mining fundamentally relational patterns.


Placing weblog mining in a larger context

Enumeration

Each scenario is enumerated in advance to ensure that data mining and exploitation (of mined patterns) can make use of this information. There is some attempt to separate modeling of the system from targeting.

Different Sessions

Session information cannot easily be maintained. If a user accesses page2 and then goes to page0 in another window, both accesses might need to be considered part of a single session (e.g., in a "manual" information integration scenario); according to the authors, however, they are modeled as different sessions. (Cooley addresses only the caching issue, not the consistency of session information.)
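The next sketch shows why the outcome depends on the sessionization heuristic. It uses two generic heuristics, neither of which is claimed to be the authors' method: a 30-minute inactivity timeout, and a referrer-based rule that appends a request to the most recent session containing its referrer. The request log and all names are hypothetical. The same two-window activity yields one session under the first heuristic and two under the second.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Request:
    minute: float            # minutes since the first request
    page: str
    referrer: Optional[str]  # None when typed in or opened in a fresh window

def timeout_sessions(requests: List[Request], timeout: float = 30.0) -> List[List[str]]:
    """Start a new session whenever the gap between requests exceeds timeout."""
    sessions: List[List[str]] = []
    last = None
    for r in requests:
        if last is None or r.minute - last > timeout:
            sessions.append([])
        sessions[-1].append(r.page)
        last = r.minute
    return sessions

def referrer_sessions(requests: List[Request]) -> List[List[str]]:
    """Append each request to the most recent session containing its referrer."""
    sessions: List[List[str]] = []
    for r in requests:
        for s in reversed(sessions):
            if r.referrer is not None and r.referrer in s:
                s.append(r.page)
                break
        else:
            sessions.append([r.page])  # no matching referrer: start a new session
    return sessions

# One user, two windows: page0 is opened directly in a second window.
LOG = [
    Request(0, "page1", None),
    Request(1, "page2", "page1"),
    Request(2, "page0", None),
    Request(3, "page3", "page2"),
]
print(timeout_sessions(LOG))   # [['page1', 'page2', 'page0', 'page3']]
print(referrer_sessions(LOG))  # [['page1', 'page2', 'page3'], ['page0']]
```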

Interaction between sites

Mining a web log can give some information, but on its own it supports only limited inferences. Today, what a user needs is usually spread across different websites. The web log of a single website can show how a user navigates within that site: which links the user clicks, and so on. This can be useful for redesigning the site. If some links are never used, we can infer either that users did not like what the link offered or that the link was not placed in the proper place; the designer can then redesign the site and analyze the behaviour again.

However, the designer will not be able to learn the broader pattern (context, scenario): (1) from which website the user came to this site, and why; and (2) whether the output from this site is going to be used at a different site. That is, the interaction between different sites cannot be determined.
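The "unused links" check mentioned above can be sketched in a few lines. This assumes that the site's link structure is known and that each log entry provides a (referrer, requested page) pair. Both data sets and the function name are hypothetical. Note that page views served from a cache never reach the server log, so a link may be followed more often than the log shows.

```python
# Hypothetical link structure of one site: page -> links it contains.
SITE_LINKS = {
    "/index.html": {"/products.html", "/about.html", "/faq.html"},
    "/products.html": {"/order.html", "/index.html"},
    "/about.html": {"/index.html"},
}

# Hypothetical (referrer, requested page) pairs taken from the same site's log.
CLICKS = [
    ("/index.html", "/products.html"),
    ("/products.html", "/order.html"),
    ("/index.html", "/products.html"),
    ("/about.html", "/index.html"),
]

def unused_links(site_links, clicks):
    """Return (source, target) links that never appear as a followed click in the log."""
    followed = set(clicks)
    return sorted((src, dst) for src, dsts in site_links.items() for dst in dsts
                  if (src, dst) not in followed)

print(unused_links(SITE_LINKS, CLICKS))
# [('/index.html', '/about.html'), ('/index.html', '/faq.html'), ('/products.html', '/index.html')]
```

The output only says which links went unused. As noted above, the log cannot say whether users disliked the target or simply failed to notice a badly placed link.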

Evaluation

The authors' idea of coupling evaluation with a usability study is promising and should be developed further. Some of the statements and observations about how users prefer to interact with the system deserve particular attention: do users behave this way because it is what they want, or because that is how they think the system can work?


