FoD
From Daniel Crabtree
FoD - Festival of Doubt, Victoria University's weekly AI group meeting. Its web page is here (http://www.mcs.vuw.ac.nz/cgi-bin/wiki/festival). Other useful links include: [Events Listing (http://www.mcs.vuw.ac.nz/~fod/FoDevents.php?mode=future)], [FoDders, people who attend FoD (http://www.mcs.vuw.ac.nz/cgi-bin/wiki/festival?DubutanteS)]. Below you will find details on the presentations I have given at FoD, with the most recent talks listed first.
29 October 2007 - Understanding Query Aspects with applications to Interactive Query Expansion
This is a practice talk for a paper that I am presenting at WI 2007 in Fremont, California in a few days' time. The abstract of the paper follows. For many hard queries, users spend a lot of time refining their queries to find relevant documents. Many methods help by suggesting refinements, but it is hard for users to choose the best refinement, as the best refinements are often quite obscure. This paper presents Qasp, an approach that overcomes the limitations of other refinement approaches by using query aspects to find different refinements of ambiguous queries. Qasp clusters the refinements so that descriptive refinements occur together with more obscure and potentially better performing refinements, thereby explaining the effect of refinements to the user. Experiments are presented that show Qasp significantly increases the precision of hard queries. The experiments also show that Qasp’s clustering method does find meaningful groups of refinements that help users choose good refinements, which would otherwise be overlooked.
27 July 2007 - Exploiting Underrepresented Query Aspects for Automatic Query Expansion
This is a practice talk for a paper that I am presenting at KDD 2007 in San Jose in early August. The abstract of the paper follows. Users attempt to express their search goals through web search queries. When a search goal has multiple components or aspects, documents that represent all the aspects are likely to be more relevant than those that only represent some aspects. Current web search engines often produce result sets whose top ranking documents represent only a subset of the query aspects. By expanding the query using the right keywords, the search engine can find documents that represent more query aspects and performance improves. This paper describes AbraQ, an approach for automatically finding the right keywords to expand the query. AbraQ identifies the aspects in the query, identifies which aspects are underrepresented in the result set of the original query, and finally, for any particularly underrepresented aspect, identifies keywords that would enhance that aspect’s representation and automatically expands the query using the best one. The paper presents experiments that show AbraQ significantly increases the precision of hard queries, whereas traditional automatic query expansion techniques have not improved precision. AbraQ also compared favourably against a range of interactive query expansion techniques that require user involvement including clustering, web-log analysis, relevance feedback, and pseudo relevance feedback.
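The idea of an underrepresented aspect in the abstract above can be illustrated with a toy sketch. This is not AbraQ itself: the function names, the substring matching, the `top_k` cut-off, and the 0.5 threshold are all assumptions made for illustration, and the keyword selection step is left out entirely.

```python
def aspect_representation(aspects, results, top_k=10):
    """Fraction of the top results that mention each aspect (naive substring match)."""
    top = results[:top_k]
    scores = {}
    for aspect in aspects:
        hits = sum(1 for doc in top if aspect.lower() in doc.lower())
        scores[aspect] = hits / len(top) if top else 0.0
    return scores


def underrepresented(scores, threshold=0.5):
    """Aspects whose representation falls below an assumed threshold."""
    return [aspect for aspect, score in scores.items() if score < threshold]


# Hypothetical query "black bear attacks" split into two aspects,
# with made-up result snippets standing in for a search engine's output.
aspects = ["black bear", "attacks"]
results = [
    "black bear habitat and diet",
    "black bear facts",
    "black bear attacks in Canada",
    "black bear cubs",
]

scores = aspect_representation(aspects, results)
weak = underrepresented(scores)
print(scores, weak)
```

In this made-up example only one of the four snippets covers "attacks", so that is the aspect an expansion keyword would need to strengthen.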
15 May 2007 - QC4 - A Clustering Evaluation Method
This is my practice talk for PAKDD 2007, where I will be presenting this paper next week. It's about evaluating clustering algorithms. There is practically nothing about web pages or web search in this talk, but obviously there are some web examples thrown in for good measure, and of course the ubiquitous Jaguar example you are probably all too familiar with by now.
7 December 2006 - Query Directed Web Page Clustering
A practice talk for the paper "Query Directed Web Page Clustering", which I am presenting at Web Intelligence 2006 in Hong Kong at Christmas. It's an interesting clustering algorithm that gets very good performance. The presentation is to be 20 minutes long.
2 March 2006 - Improving Web Page Clustering with Global Document Analysis
This is just a short talk (15 minutes); it is a practice for a talk I am giving at a Workshop in Auckland on Saturday. I would like to have comments on the talk and get any last minute ideas for improving it. After the talk, I can discuss the work further and elaborate on anything that interests anyone, or if no-one is interested, it'll be a very short FoD. Here is the abstract for the talk: Web page clustering methods categorize and organize search results into semantically meaningful clusters that assist users with search refinement. Finding clusters that are semantically meaningful to users is difficult. In this presentation, I describe a new web page clustering algorithm that chooses clusters that more closely relate to the user's query. The algorithm uses term co-occurrence statistics to construct feasible clusters, which are merged, ranked, and selected using a heuristic model of web page clustering usability. The performance of the new algorithm is evaluated and compared against other algorithms, and a significant performance improvement is achieved over the other clustering algorithms.
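As a rough illustration of what term co-occurrence statistics look like, here is a minimal sketch that counts how many snippets contain each pair of terms. It covers only the counting step; the whitespace tokenisation, the function name, and the example snippets are assumptions, and the cluster construction, merging, ranking, and selection described in the abstract are not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for every pair of terms, the number of documents containing both."""
    pair_counts = Counter()
    for text in documents:
        # Sorting the unique terms gives each pair one canonical (a, b) order with a < b.
        terms = sorted(set(text.lower().split()))
        pair_counts.update(combinations(terms, 2))
    return pair_counts


# Made-up search result snippets for the query "jaguar".
snippets = [
    "jaguar car dealer",
    "jaguar car price",
    "jaguar cat habitat",
]

counts = cooccurrence_counts(snippets)
print(counts[("car", "jaguar")], counts[("cat", "jaguar")], counts[("car", "cat")])
```

Pairs that co-occur often (here "car" with "jaguar") hint at terms that could anchor a candidate cluster, while pairs that never co-occur (here "car" with "cat") suggest separate senses.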
21 October 2005 - A new approach to sandwiches - [Download Presentation Slides (http://www.danielcrabtree.com/downloads/research/my_presentation_slides/20051021%20FoD%20WI_IAT%2005%20Conference%20Report.ppt)]
My report on the WI/IAT 2005 international conference. This talk is going to be a summary and overview of interesting ideas, projects, and papers that I can remember from the conference. Additional ideas from people that I talked to are likely to be included. This talk is likely to be exceptionally light on technical details - very much in contrast to most of the actual talks at the conference. BTW: I have chosen to keep the automagically generated title. (Video extract of my conference talk as shown in FoD is not included with slides.)
2 September 2005 - Web Clustering - New Scoring and Selection Methods, New Evaluation Method - [Download New Scoring and Selection Method Slides (http://www.danielcrabtree.com/downloads/research/my_presentation_slides/20050921%20WI05%20Cluster%20Selection.ppt)] - [Download Evaluation Method Slides (http://www.danielcrabtree.com/downloads/research/my_presentation_slides/20050921%20WI05%20QC4.ppt)]
I will give a 20 minute presentation on a new scoring method and a new cluster selection method, and a 10 minute presentation on a new evaluation method. These are two talks that I will be giving at the Web Intelligence Conference in a week or so, so these will be practice runs. Please ask questions and help me sort out any problems with these talks. (Slides for download are the ones used at conference, slightly updated from suggestions given during this FoD.)
29 July 2005 - Web Clustering - A Sneak Peek - [Download Presentation Slides (http://www.danielcrabtree.com/downloads/research/my_presentation_slides/20050729%20FoD%20Web%20Clustering%20Sneak%20Peak.ppt)]
I will give a sneak peek into the content of two papers that I've had accepted at the Web Intelligence conference. The full presentation of these with slides will come in a few weeks. This sneak peek is just to introduce some of the interesting ideas in both papers, so I have a feeling for the type of content to include in my actual presentation. I will follow that with some sort of demonstration of my clustering system. I'm pretty sceptical as to whether a live demonstration of any new searches can be done due to time constraints, but there are a few prepared searches to look through. At the start of the talk we will decide on a one-word search and try to have it ready to view by the end of the talk, so bring ideas for that.
8 April 2005 - The problems of searching the web, web clustering, and possible solutions
This will be an informal talk about the problems that exist with searching the web, the capabilities of web clustering and how far it currently goes in addressing the problems of searching the web, and my possible solutions for how the remaining problems could be solved. The talk will start with a brief look at the big picture and the ultimate goal of search, and more specifically the ultimate way of obtaining information.
25 February 2005 - Automatic Meaning Discovery Using Google - [Download Presentation Slides (http://www.danielcrabtree.com/downloads/research/my_presentation_slides/20050225%20FoD%20Google.ppt)]
Exploration of the method described in Automatic Meaning Discovery Using Google (Cilibrasi & Vitanyi). Basically it’s an interesting paper about finding some kind of relationship or meaning or semantics of words, word-pairs, phrases, etc. using Google. I see it as having applications for clustering and in improving clustering results; for instance, they give an example of it separating colours from numbers using Google. They also show that given simple examples of terms, it can extract the 'gist' of their semantic relationship to other terms. I'm more interested in discussing the idea of the technique and its applications, and while it is based on some Bayesian, statistical, and other mathematical constructs, I will not be talking about these in great detail; so if you have any questions, turn to the paper, as that is what I would be doing. I intend to give a brief introduction on the paper and its techniques (brief relative to last week's), then to move to open discussion amongst the group.
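The central measure in that paper is the Normalized Google Distance, which compares how often two terms appear alone and together in search results. A minimal sketch follows, assuming page counts are already available; the function name, the handling of zero counts, and all counts and the index size in the example are illustrative assumptions, not live search figures.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from page counts.

    fx, fy: number of pages containing each term on its own
    fxy:    number of pages containing both terms
    n:      total number of pages indexed (a normalising constant)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Assumption: treat terms that never occur (or never co-occur) as infinitely far apart.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


N = 8_000_000_000  # assumed index size

# Illustrative counts: two terms that often appear together...
close = normalized_google_distance(fx=46_700_000, fy=12_200_000, fxy=2_630_000, n=N)
# ...and two terms that almost never do.
far = normalized_google_distance(fx=1_000_000, fy=1_000_000, fxy=10, n=N)

print(round(close, 3), round(far, 3))
```

Smaller values mean the terms tend to occur on the same pages, which is what makes the measure a candidate similarity signal for clustering.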
