Here, we study which ranking strategy, favoring precision or favoring recall, works better to filter the semantic data used for searching relevant documents on the web. Ranking criteria are based on:
- Confidence/quality (q@1: how good are the ontological translations used to derive the answer?)
- Popularity (p@1: which answer is returned by the most ontologies?)
- Meaning, given by a WordNet similarity measure (s@1: are the answers from different sources similar interpretations of the query?)
- A combination of all those criteria (c@1), as detailed in [4].
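As an illustration of the popularity criterion only, the following minimal sketch orders candidate answers by the number of ontologies that return them. It is not PowerAqua's implementation (the actual ranking is described in [4]); the function name `rank_by_popularity`, the ontology names and the answer sets are all hypothetical.

```python
from collections import Counter


def rank_by_popularity(answers_by_ontology):
    """Order answers by how many ontologies return them (most first)."""
    counts = Counter()
    for answers in answers_by_ontology.values():
        # Count each answer at most once per ontology, ignoring case.
        counts.update({a.strip().lower() for a in answers})
    # Descending count, then alphabetical order to break ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical input: ontology names and answers are invented for illustration.
example = {
    "ontology_a": ["Tahoe", "Mono Lake"],
    "ontology_b": ["tahoe"],
    "ontology_c": ["Tahoe", "Shasta"],
}
ranking = rank_by_popularity(example)
top_answer = ranking[0][0]
```

Under this reading, p@1 would keep only `top_answer`; matching answers by lower-cased label is a simplifying assumption.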
We manually evaluated the effectiveness of the query expansion. The keywords of a user query were used to retrieve documents from Yahoo, and the retrieved results were manually judged for relevance to calculate precision@10. We compare these results with the ones obtained by PowerAqua after performing query expansion with the set of selected answers.
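For reference, precision@10 is the fraction of the first ten retrieved results that are judged relevant. A minimal sketch, assuming the judgments are available as a rank-ordered list of booleans (the function name and the sample judgments are hypothetical):

```python
def precision_at_k(judgments, k=10):
    """Fraction of the top-k results judged relevant.

    judgments: list of booleans in rank order (True = relevant).
    Assumption: the denominator stays k even when fewer than k
    results are returned, as the x/10 score notation suggests.
    """
    return sum(judgments[:k]) / k


# Hypothetical judgments for one query; only the first 10 are counted.
judged = [True, True, False, True, True, True, False, True, True, True, False]
score = precision_at_k(judged)
```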
These experiments show that query expansion improves the quality of the keyword-based approach in more than half of the evaluated queries (11 of 20 for q@1), which means that the semantic data can help to enhance the quality or precision of the results the user is likely to read (@10). On average, all ranking algorithms outperformed the keyword baseline. The best ranking algorithms are those favoring precision (q@1) and reducing noise.
RESULTS (precision @ 10):
P@10 | Yahoo | q@1 | p@1 | s@1 | c@1 |
---|---|---|---|---|---|
Q1 | 0/10 | 8/10 | 8/10 | 0/10 | 8/10 |
Q2 | 8/10 | 7/10 | 5/10 | 9/10 | 4/10 |
Q3 | 3/10 | 7/10 | 7/10 | 7/10 | 7/10 |
Q4 | 2/10 | 5/10 | 5/10 | 5/10 | 5/10 |
Q5 | 3/10 | 3/10 | 3/10 | 3/10 | 3/10 |
Q6 | 1/10 | 5/10 | 5/10 | 5/10 | 3/10 |
Q7 | 1/10 | 2/10 | 2/10 | 2/10 | 2/10 |
Q8 | 2/10 | 2/10 | 0/10 | 2/10 | 2/10 |
Q9 | 2/10 | 2/10 | 2/10 | 2/10 | 2/10 |
Q10 | 4/10 | 6/10 | 6/10 | 6/10 | 6/10 |
Q11 | 6/10 | 10/10 | 10/10 | 10/10 | 10/10 |
Q12 | 6/10 | 6/10 | 6/10 | 6/10 | 6/10 |
Q13 | 4/10 | 4/10 | 4/10 | 4/10 | 4/10 |
Q14 | 3/10 | 3/10 | 3/10 | 3/10 | 3/10 |
Q15 | 4/10 | 7/10 | 7/10 | 7/10 | 7/10 |
Q16 | 0/10 | 0/10 | 0/10 | 0/10 | 0/10 |
Q17 | 10/10 | 10/10 | 10/10 | 10/10 | 10/10 |
Q18 | 9/10 | 10/10 | 10/10 | 10/10 | 10/10 |
Q19 | 7/10 | 10/10 | 10/10 | 10/10 | 10/10 |
Q20 | 1/10 | 6/10 | 6/10 | 6/10 | 6/10 |
Avg.(20) | 38% | 56.5% | 54.5% | 51.5% | 54% |
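The averages and the per-query comparison against the keyword baseline can be recomputed from the rows above. The sketch below copies the table values as listed; the helper names are hypothetical. Note that the s@1 rows as listed sum to 107 relevant results out of 200 (53.5%), which differs from the 51.5% shown in the table; it is not possible to tell from this page whether a cell or the reported average is off.

```python
# Relevant results out of 10 per query (Q1..Q20), copied from the table.
scores = {
    "Yahoo": [0, 8, 3, 2, 3, 1, 1, 2, 2, 4, 6, 6, 4, 3, 4, 0, 10, 9, 7, 1],
    "q@1": [8, 7, 7, 5, 3, 5, 2, 2, 2, 6, 10, 6, 4, 3, 7, 0, 10, 10, 10, 6],
    "p@1": [8, 5, 7, 5, 3, 5, 2, 0, 2, 6, 10, 6, 4, 3, 7, 0, 10, 10, 10, 6],
    "s@1": [0, 9, 7, 5, 3, 5, 2, 2, 2, 6, 10, 6, 4, 3, 7, 0, 10, 10, 10, 6],
    "c@1": [8, 4, 7, 5, 3, 3, 2, 2, 2, 6, 10, 6, 4, 3, 7, 0, 10, 10, 10, 6],
}


def average_percent(column):
    """Mean precision@10 over all queries, as a percentage."""
    return 100 * sum(column) / (10 * len(column))


def compare(baseline, system):
    """Count queries where the system is better, worse, or tied."""
    better = sum(s > b for b, s in zip(baseline, system))
    worse = sum(s < b for b, s in zip(baseline, system))
    return better, worse, len(baseline) - better - worse


averages = {name: average_percent(col) for name, col in scores.items()}
q_vs_yahoo = compare(scores["Yahoo"], scores["q@1"])
```

With these values, `q_vs_yahoo` counts 11 queries where q@1 beats the baseline, 1 where it is worse (Q2) and 8 ties.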
TOPICS (QUERIES)
Q1: who plays in Nirvana?
(who, play, nirvana) or (person, organization, play, nirvana)
Pages that list or talk about the members of Nirvana in a given context are valid;
however, pages about Nirvana that do not mention its members (e.g. only the discography) are not.
Therefore, the Wikipedia page of Nirvana (with the list of members) is correct, but results about guitar lessons, videos or the biography
of a given Nirvana member (if other members are not mentioned) are not valid results.
Q2: which islands belong to Spain
(islands, belong, Spain)
Articles (geography, tourism) that talk about (or list) various islands in Spain
(e.g. the names of the different Canary Islands in Spain) are considered valid
(even if they do not provide the complete list of Spanish islands).
Tourism articles about renting properties, ferries or sports offers (e.g. scuba-diving) on a given island are not valid.
Q3: which Russian Rivers flow into the Black Sea
Pages have to explicitly mention the river(s) that are in Russia and flow into the Black Sea (e.g. in Wikipedia).
Q4: What state is austin in?
(state, austin)
Pages that explicitly mention that Austin is the capital or is located in Texas.
Articles about Austin or Texas hospitals, libraries, real estate, highways, etc. are not valid.
Q5: How many rivers does Washington have?
(rivers, washington)
Pages that list more than one river in Washington
Graceful degradation: the list of PowerAqua answers contains labels that introduce noise (like columbia2, Snake2), which, together with all the answers, do not return any website.
Q6: What is the capital of maryland?
(capital, maryland)
Pages about the capital of Maryland that explicitly state that Annapolis is the capital of Maryland. Results such as "Venture Capital Firms", "Maryland Capital Enterprises", "The Capital, a daily newspaper published in Annapolis, Maryland" or "Capital News Service: a news wire affiliated to the University of Maryland" are not valid.
Q7: Which state is kalamazoo in?
(state, kalamazoo)
Pages that explicitly state that Kalamazoo is in Michigan. Pages about the weather in Kalamazoo, Michigan, or Kalamazoo university, etc. are not valid.
Q8: Find all the lakes in California
(lakes, California)
Pages about the town of Mammoth Lakes or the Canyon Lake community are not valid. Pages listing lakes in California are valid (even if the list is not complete), but pages describing only one lake are not.
Graceful degradation: too many results for confidence, synset and combination. Only one result for popularity (Tahoe), so all pages are about just this one lake.
Q9: San antonio is in what state?
Web pages that explicitly state that San Antonio is in Texas. Web pages about the University of Texas at San Antonio or San Antonio business directories, hotels and news are not relevant.
Q10: List me all films with Brad Pitt and Angelina Jolie
(films, Brad Pitt) (films, Angelina Jolie)
Web sites that talk about films in which both Angelina Jolie and Brad Pitt are involved. The relationship is not specified: Brad Pitt is also the producer of “A Mighty Heart”, and they both star in “Mr & Mrs Smith”. Pages talking about the relationship between the two actors, or news such as the appearance of the couple at the Cannes festival for a movie in which only Brad Pitt participates, without listing the movies in which both of them appear, are not relevant.
Q11: find me actors in Pulp Fiction
(actors, Pulp Fiction)
Pages about the movie that do not list the actors are not valid.
Q12: Give me films by david lynch
(films, david lynch)
Only pages that list or mention David Lynch's movies are valid; pages about one particular movie are not (e.g. the short films of David Lynch (DVD)).
Graceful degradation: too many answers plus one noisy answer.
Q13: Find me actors starring films directed by Francis Ford Coppola
(actors, starring, films) (actors/films, directed, Francis Ford Coppola)
Pages about the films, and their actors, directed by Francis Ford Coppola
Too many results. Graceful degradation
Q14: Give me films about Buenos Aires
(films, buenos aires)
Pages about the Buenos Aires film festival are not valid. Lists of films shot in Buenos Aires (Argentina) or about Buenos Aires are valid.
Too many results. Graceful degradation
Q15: Which books Stephen King wrote?
(books, Stephen King, wrote)
Pages that list Stephen King books. Articles about just one of his books are not valid (e.g. "on writing"). Pages listing audio books are valid too.
Q16: Which English actors play in Titanic?
(English actors, play, Titanic)
Pages that list British actors who play in Titanic are valid. Pages about only one actor are not valid, and pages that list actors without mentioning where they are from are not valid. Pages about English actors that do not explicitly mention that they appear in Titanic are not valid.
Q17: Where was Franz Kafka born?
(where, Franz Kafka, born)
All pages that mention that Franz Kafka was born in Prague are valid.
PowerAqua's answers are a bit noisy because, apart from Prague (birthplace), it also gives Vienna and Austria as results (place of death).
Q18: How many languages are spoken in Afghanistan?
(languages, spoken, Afghanistan)
List of languages spoken in Afghanistan
Q19: Give me the husbands of Elizabeth Taylor
(husbands, Elizabeth Taylor)
Pages that list all the husbands (not just the last one)
Q20: who is the wife of Tom Cruise?
(person/organization, wife, Tom Cruise)
Pages that list all the wives of Tom Cruise (not just one) are valid. Scientology websites or other news about Tom Cruise that do not mention his wives are not valid