Linked Data is producing a critical mass of semantic data. Despite its problematic aspects (heterogeneity, incompleteness, modeling errors), if the semantic data needed to answer a user query is available, it can be used to expand the query and improve the precision of the documents retrieved when querying the Web. If no semantic data is available, the system degrades gracefully and uses only the query terms. In this evaluation we probe the feasibility of this claim with 20 example queries whose answers are encoded, totally or partially, in DBpedia and in a set of medium-sized ontologies used in previous PowerAqua evaluations.
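The expansion-with-fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not PowerAqua's implementation: the function name `expand_query`, the quoting of multi-word terms, and the `max_answers` cut-off are all assumptions made for the example.

```python
def expand_query(keywords, semantic_answers, max_answers=10):
    """Build a web search string from query keywords plus semantic answers.

    Hypothetical sketch: `max_answers` is an assumed cut-off, not a value
    taken from the evaluation. With no answers, or with too many to be
    useful, the function falls back to the keywords alone.
    """
    terms = list(keywords)
    if 0 < len(semantic_answers) <= max_answers:
        for answer in semantic_answers:
            if answer not in terms:  # skip answers already in the query
                terms.append(answer)
    # Quote multi-word terms so a search engine treats them as phrases.
    return " ".join(f'"{t}"' if " " in t else t for t in terms)


print(expand_query(["capital", "maryland"], ["Annapolis"]))
print(expand_query(["capital", "maryland"], []))  # graceful degradation
```

With an empty answer list the second call returns only the original keywords, which is the graceful degradation case.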

Here, we study which ranking strategy, favoring precision or favoring recall, works better for filtering the semantic data used to search for relevant documents on the Web. The ranking criteria are based on:
  • Confidence/quality (q@1: how good are the ontological translations used to derive the answer?)
  • Popularity (p@1: which answer is returned by the most ontologies?)
  • Meaning given by a WordNet similarity measure (s@1: are the answers from different sources similar interpretations of the query?)
  • A combination of all those criteria (c@1), as detailed in [4].
The answers clustered at position one (@1) represent the best subset of results according to the chosen ranking method and are the ones used to perform query expansion.
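As a rough illustration of taking the cluster at position one, the sketch below keeps the answers that tie for the best score under a single criterion. Treating the @1 cluster as "all answers sharing the top score" is an assumption made here; the actual clustering and the combined criterion (c@1) are defined in [4] and are not reproduced. The function name and the scores are hypothetical.

```python
def cluster_at_one(scored_answers):
    """Return the answers tied for the highest score, sorted by label.

    `scored_answers` maps an answer label to a score under one ranking
    criterion (higher is better). This is an assumed simplification of
    the ranking step, not the method described in [4].
    """
    if not scored_answers:
        return []
    best = max(scored_answers.values())
    return sorted(a for a, s in scored_answers.items() if s == best)


# Hypothetical popularity scores: number of ontologies returning each answer.
popularity = {"Tahoe": 3, "Mono Lake": 1, "Clear Lake": 1}
print(cluster_at_one(popularity))
```

A precision-oriented criterion tends to keep a small cluster like this one, while a recall-oriented one would pass more answers on to the expansion step.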

We manually evaluated the effectiveness of the query expansion. The keywords of a user query were used to retrieve documents in Yahoo, and the retrieved results were manually judged for relevance to calculate precision@10. We compare these results with the ones obtained by PowerAqua after performing query expansion with the set of selected answers.
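For reference, precision@10 over a list of manual judgments can be computed as in the sketch below. The function name and the choice to divide by k even when fewer than k results come back (so that an empty result list scores 0/10) are assumptions for illustration.

```python
def precision_at_k(judgments, k=10):
    """Fraction of the top-k results judged relevant.

    `judgments` is a list of booleans in rank order (True = relevant).
    The denominator is always k, so missing results count as misses.
    """
    return sum(judgments[:k]) / k


# Eight relevant pages among the first ten results.
print(precision_at_k([True] * 8 + [False] * 2))
```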

These experiments show that query expansion improves on the keyword-based approach in more than half of the evaluated queries, which means that semantic data can help to enhance the precision of the results the user is most likely to read (the top 10). On average, all ranking algorithms outperformed the keyword baseline. The best ranking algorithms are those that favor precision (q@1) and reduce noise.

RESULTS (precision @ 10):
P@10      Yahoo   q@1     p@1     s@1     c@1
Q1        0/10    8/10    8/10    0/10    8/10
Q2        8/10    7/10    5/10    9/10    4/10
Q3        3/10    7/10    7/10    7/10    7/10
Q4        2/10    5/10    5/10    5/10    5/10
Q5        3/10    3/10    3/10    3/10    3/10
Q6        1/10    5/10    5/10    5/10    3/10
Q7        1/10    2/10    2/10    2/10    2/10
Q8        2/10    2/10    0/10    2/10    2/10
Q9        2/10    2/10    2/10    2/10    2/10
Q10       4/10    6/10    6/10    6/10    6/10
Q11       6/10    10/10   10/10   10/10   10/10
Q12       6/10    6/10    6/10    6/10    6/10
Q13       4/10    4/10    4/10    4/10    4/10
Q14       3/10    3/10    3/10    3/10    3/10
Q15       4/10    7/10    7/10    7/10    7/10
Q16       0/10    0/10    0/10    0/10    0/10
Q17       10/10   10/10   10/10   10/10   10/10
Q18       9/10    10/10   10/10   10/10   10/10
Q19       7/10    10/10   10/10   10/10   10/10
Q20       1/10    6/10    6/10    6/10    6/10
Avg.(20)  38%     56.5%   54.5%   51.5%   54%
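The comparison between the keyword baseline and q@1 can be re-derived from the per-query counts in the table. The sketch below copies the Yahoo and q@1 columns (relevant pages out of 10 for Q1 to Q20); the helper names are illustrative only.

```python
# Relevant pages out of 10 for Q1..Q20, copied from the table above.
yahoo  = [0, 8, 3, 2, 3, 1, 1, 2, 2, 4, 6, 6, 4, 3, 4, 0, 10, 9, 7, 1]
q_at_1 = [8, 7, 7, 5, 3, 5, 2, 2, 2, 6, 10, 6, 4, 3, 7, 0, 10, 10, 10, 6]


def mean_precision(hits, k=10):
    """Average precision@k over all queries, as a percentage."""
    return 100 * sum(hits) / (k * len(hits))


def compare(baseline, expanded):
    """Count queries where expansion is better, equal, or worse."""
    better = sum(e > b for b, e in zip(baseline, expanded))
    worse = sum(e < b for b, e in zip(baseline, expanded))
    return better, len(baseline) - better - worse, worse


print(mean_precision(yahoo), mean_precision(q_at_1))  # 38.0 56.5
print(compare(yahoo, q_at_1))                         # (11, 8, 1)
```

For q@1, expansion improves 11 of the 20 queries, leaves 8 unchanged (including the graceful degradation cases) and lowers precision only for Q2.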


TOPICS (QUERIES)


Q1: who plays in Nirvana?


music

(who, play, nirvana) or (person, organization, play, nirvana)
Pages that list or talk about the members of Nirvana in a given context are valid; however, pages about Nirvana that do not mention its members (e.g. only its discography) are not. Therefore, the Wikipedia page of Nirvana (with the list of members) is correct, but results about guitar lessons, videos or the biography of a given Nirvana member (if other members are not mentioned) are not valid results.




Q2: which islands belong to Spain


geography

(islands, belong, Spain)
Articles (geography, tourism) that talk about (or list) various islands in Spain (e.g. the names of the different Canary Islands in Spain) are considered valid, even if they do not provide the complete list of Spanish islands. Tourism articles about renting properties, ferries or sports on offer (e.g. scuba diving) on a given island are not valid.




Q3: which Russian Rivers flow into the Black Sea


geography
(russian rivers, flow, Black Sea)

Pages must explicitly mention the river(s) that are in Russia and flow into the Black Sea (e.g. in Wikipedia).




Q4: What state is austin in?


geography

(state, austin)
Pages that explicitly mention that Austin is the capital of, or is located in, Texas. Articles about Austin or Texas hospitals, libraries, real estate, highways, etc. are not valid.




Q5: How many rivers does Washington have?


geography
(rivers, washington)

Pages that list more than one river in Washington

Graceful degradation. The list of PowerAqua answers contains labels that introduce noise (like columbia2, Snake2), and the query expanded with all the answers does not return any web page.




Q6: What is the capital of maryland?


geography
(capital, maryland)

Pages about the capital of Maryland that explicitly state that Annapolis is the capital of Maryland. Results such as "Venture Capital Firms", "Maryland Capital Enterprises", "the capital, a daily newspaper published in the Annapolis, Maryland" or "capital news Service: a news wire affiliated to the university of Maryland" are not valid.




Q7: Which state is kalamazoo in?


geography
(state, kalamazoo)

Pages that explicitly state that Kalamazoo is in Michigan. Pages about the weather in Kalamazoo, Michigan, or about Kalamazoo university, etc. are not valid.




Q8: Find all the lakes in California


geography
(lakes, California)

Pages about the town of Mammoth Lakes or the Canyon Lake community are not valid. Pages listing lakes in California are valid (even if the list is not complete), but pages describing only one lake are not.


Graceful degradation. Too many results for confidence, synset and combination. Only one result for popularity (Tahoe), so all pages are just about this one lake.




Q9: San antonio is in what state?


geography
(state, san antonio)

Web pages that explicitly state that San Antonio is in Texas. Web pages about the university of Texas at San Antonio or San Antonio business directory, hotels and news are not relevant




Q10: List me all films with Brad Pitt and Angelina Jolie


actors and films
(films, Brad Pitt) (films, Angelina Jolie)

Web sites that talk about films in which both Angelina Jolie and Brad Pitt are involved. The relationship is not specified: Brad Pitt is also the producer of “A Mighty Heart”, and they both star in “Mr & Mrs Smith”. Pages talking about the relationship between the two actors, or news such as the couple's appearance at the Cannes festival for a movie in which only Brad Pitt participates, are not relevant if they do not list the movies in which both of them appear.




Q11: find me actors in Pulp Fiction


actors and films
(actors, Pulp Fiction)

Pages about the movie that do not list the actors are not valid.




Q12: Give me films by david lynch


actors and films
(films, david lynch)

Only pages that list or mention David Lynch's movies are valid; pages about one particular movie are not (e.g. the short films of david lynch (DVD)).


Graceful degradation. Too many answers, plus one noisy answer.




Q13: Find me actors starring films directed by Francis Ford Coppola


actors and films
(actors, starring, films) (actors/films, directed, Francis Ford Coppola)

Pages about the films, and their actors, directed by Francis Ford Coppola


Too many results. Graceful degradation




Q14: Give me films about Buenos Aires


actors and films
(films, buenos aires)

Pages about Buenos Aires film festival are not valid. List of films filmed in Buenos Aires (Argentina) or about Buenos Aires are valid


Too many results. Graceful degradation




Q15: Which books Stephen King wrote?


books
(books, Stephen King, wrote)

Pages that list Stephen King books. Articles about just one of his books are not valid (e.g. "on writing"). Pages listing audio books are valid too.




Q16: Which English actors play in Titanic?


actors and films
(English actors, play, Titanic)

Pages that list British actors who play in Titanic are valid. Pages about only one actor are not valid, and pages that list actors without mentioning where they are from are not valid. Pages about English actors that do not explicitly mention that they appear in Titanic are not valid.





Q17: Where was Franz Kafka born?


famous people
(where, Franz Kafka, born)

All pages that mention that Franz Kafka was born in Prague are valid.


PowerAqua's answers are a bit noisy: apart from Prague (birthplace), it also gives Vienna and Austria as results (place of death).




Q18: How many languages are spoken in Afghanistan?


languages
(languages, spoken, Afghanistan)

List of languages spoken in Afghanistan




Q19: Give me the husbands of Elizabeth Taylor


famous people
(husbands, Elizabeth Taylor)

Pages that list all the husbands (not just the last one)




Q20: who is the wife of Tom Cruise?


famous people
(person/organization, wife, Tom Cruise)

Pages that list all the wives of Tom Cruise (not just one) are valid. Scientology websites or other news about Tom Cruise that do not mention his wives are not valid






