TOPICS (QUERIES)
Which russian rivers flow into the Black sea?
1 answer from 2 semantic sources (Dnepr)
precision = 1/1, precision@1 = 1/1, recall@1 = 1/1
21.6 secs for mapping the onto-triples, 0 secs for fusion.
Which animals are reptiles?
13 answers from 12 sources (Alligator, Crocodile, Gecko, Lizard, Snake, Tortoise_Reptile, Turtle, etc.)
precision = 13/13, precision@1 = 13/13, recall@1 = 13/13
51.7 secs for mapping the onto-triples, 15.5 secs for fusion.
Where is kazakhstan?
19 answers from 13 sources (Asia and Kazakhstan itself are ranked first, followed by countries, cities or seas that border Kazakhstan. We consider only "Asia" a valid precise answer, although some of the other answers, such as Caspian_sea, Turkmenistan, Russia and People's Republic of China, are not strictly incorrect).
precision = 1/19, precision@1 = 1/2, recall@1 = 1/1
45.5 secs for mapping the onto-triples, 4.3 secs for fusion.
What borders Croatia?
20 answers from 19 sources (Europe, Bosnia_and_Herzegovina, Hungary, Montenegro, Serbia, Slovenia, etc., plus noisy answers such as Zagreb, war-and-revolution and duplicated terms for Croatia).
precision = 6/20, precision@1 = 1/1, recall@1 = 1/6
11.0 secs for mapping the onto-triples, 7.4 secs for fusion.
where is islam practiced?
20 answers from 10 sources (iran, iraq, christians-and-christianity, japan, mosques, new-york-city, nigeria, etc.).
precision = 6/20, precision@1 = 6/20, recall@1 = 6/6
14.7 secs for mapping the onto-triples, 22.2 secs for fusion.
who knows Enrico Motta?
24 answers from 27 sources (Barry_Norton, Carlos_Pedrinaci, Alan_Ruttenberg, Deborah_McGuinness, Denny_Vrandecic, Diana_Maynard, York_Sure, Stefan_Decker, etc.).
precision = 24/24, precision@1 = 2/2, recall@1 = 2/24
88.9 secs for mapping the onto-triples, 9.3 secs for fusion.
who has an interest in ontology evaluation?
2 answers from 11 sources (York_Sure, Denny_Vrandecic; all the sources are in fact from ontoworld.org or semanticweb.org, e.g. http://ontoworld.org/index.php/Special:ExportRDF/ESWC2006?xmlmime=rdf, http://ontoworld.org/index.php/Special:ExportRDF/OntoClean?xmlmime=rdf, etc.).
precision = 2/2, precision@1 = 2/2, recall@1 = 2/2
31.2 secs for mapping the onto-triples, 2.76 secs for fusion.
who works at AIFB?
31 answers from 37 sources (AIFB, Denny_Vrandecic, Anupriya_Ankolekar, Peter_Haase, Andreas_Eberhart, etc.).
precision = 30/31, precision@1 = 3/4, recall@1 = 3/30
74.0 secs for mapping the onto-triples, 28.2 secs for fusion.
which organizations are working on the Semantic Web?
17 answers from 13 sources (AIFB, DERI_Galway, DERI_Innsbruck, Salzburg_Research, etc.).
precision = 17/17, precision@1 = 3/3, recall@1 = 3/17
31.2 secs for mapping the onto-triples, 8.0 secs for fusion.
who attended both at ISWC2006 and ESWC2006?
72 answers from 53 sources (Abraham_Bernstein, Boris_Motik, Dieter_Fensel, Steffen_Staab, Tom_Heath, Aldo_Gangemi, etc.).
precision = 72/72, precision@1 = 5/5, recall@1 = 5/72
169.3 secs for mapping the onto-triples, 8.0 secs for fusion.
Which countries are in Europe?
9 answers from 3 sources (Belgium, Czech, France, Germany, Holland, EastEurope, etc.).
precision = 7/9, precision@1 = 7/7, recall@1 = 7/7
12.5 secs for mapping the onto-triples, 8.6 secs for fusion.
does Jones Marion takes steroids?
1 answer from 2 sources.
precision = 1/1, precision@1 = 1/1, recall@1 = 1/1
7.1 secs for mapping the onto-triples, 0 secs for fusion.
Who attended ESWC2007 ?
25 answers from 27 sources (Stefan_Decker, Abraham_Bernstein, Alexander_Loeser, etc.).
precision = 25/25, precision@1 = 1/1, recall@1 = 1/25
30.2 secs for mapping the onto-triples, 8.9 secs for fusion.
where is Tom Heath working?
32 answers from 30 sources (KMi, Knowledge_Media_Institute, Talis, Talis_Information_Ltd, The_Open_University, Chris_Bizer, ESWC2007, foaf.rdf, http://kmi.open.ac.uk/people/tom, technical_girls_and_guys).
precision = 7/32, precision@1 = 0/2, recall@1 = 0/25
36.4 secs for mapping the onto-triples, 8.1 secs for fusion.
What is Sierra Leone?
15 answers from 11 sources (Africa, lessDevelopedCountry, Atlantic_Ocean, Guinea, Liberia, Sierra Leone).
precision = 1/15, precision@1 = 1/2, recall@1 = 1/1
8.4 secs for mapping the onto-triples, 4.32 secs for fusion.
in which city is Barajas international airport?
1 answer from 1 source (Madrid)
precision = 1/1, precision@1 = 1/1, recall@1 = 1/1
3.7 secs for mapping the onto-triples, 0 secs for fusion.
what airport is in washington?
1 answer from 1 source (Dulles_International_Airport)
precision = 1/1, precision@1 = 1/1, recall@1 = 1/1
4.8 secs for mapping the onto-triples, 0 secs for fusion.
who works at the Open University?
5 answers from 3 sources (Andriy_Nikolov, Liliana_Cabral, Marta_Sabou, etc.)
precision = 5/5, precision@1 = 5/5, recall@1 = 5/5
28.0 secs for mapping the onto-triples, 0.8 secs for fusion.
Which sea is close to Volgograd?
2 answers from 4 sources (Azov_sea, Caspian_sea)
precision = 2/2, precision@1 = 1/1, recall@1 = 1/2
4.4 secs for mapping the onto-triples, 0.3 secs for fusion.
what is the religion in Russia?
3 answers from 2 sources (Animist, Islam, Russian_Orthodox)
precision = 3/3, precision@1 = 3/3, recall@1 = 3/3
4.6 secs for mapping the onto-triples, 0 secs for fusion.
Which are the universities based in Madrid
2 answers from 1 source (Universidad_Complutense_de_Madrid, Universidad_Politecnica_de_Madrid)
precision = 2/2, precision@1 = 0/2, recall@1 = 0/2 (merging failed)
3.9 secs for mapping the onto-triples, 0 secs for fusion.
where is Odessa?
1 answer from 2 sources (Russia)
precision = 1/1, precision@1 = 1/1, recall@1 = 1/1
2.8 secs for mapping the onto-triples, 0 secs for fusion.
Find me all cities in California
20 answers from 11 sources (Arizona, California, San Diego, Sacramento, San Francisco, Mexico, Santa Cruz, Pacific_Ocean)
precision = 6/20, precision@1 = 6/18, recall@1 = 6/6
14.7 secs for mapping the onto-triples, 6.2 secs for fusion.
In which countries are earthquakes
39 answers from 4 sources (alaska, seismology, earthquakes, food_contamination-and-poisoning)
precision = 1/39, precision@1 = 1/39, recall@1 = 1/39
5.9 secs for mapping the onto-triples, 2.2 secs for fusion.
Name the president of Russia
1 answer from 2 sources (Vladimir_Vladimirovich_Putin)
precision = 1/1, precision@1 = 1/1, recall@1 = 1/1
4.8 secs for mapping the onto-triples, 0 secs for fusion.
Give me oil industries in Russia
1 answer from 2 sources (Lukoil)
precision = 1/1, precision@1 = 1/1, recall@1 = 1/1
12.8 secs for mapping the onto-triples, 0 secs for fusion.
Give me all oceans and seas
11 answers from 10 sources (Arafura_Sea, Azov_Sea, Caspian_Sea, Gulf of Mexico, PacificOcean)
precision = 11/11, precision@1 = 6/6, recall@1 = 6/11
23.8 secs for mapping the onto-triples, 1.7 secs for fusion.
-
Results:
- Average seconds for mapping = 27.7 secs
- Average seconds for fusion = 5.43 secs
- Average seconds total = 33.1 secs
- Average precision = 77 %
- Average precision@1 = 83 %
- Average recall@1 = 69 %
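The averages above can be derived from the per-query fractions. The following is a minimal sketch that assumes a macro-average (the mean of the per-query ratios); the source does not say which averaging was used, so this is an assumption. Only four of the topics are copied in for illustration, so the printed values are not expected to match the reported averages. The names `results` and `macro_average` are hypothetical.

```python
from fractions import Fraction

# (query, precision, precision@1, recall@1) copied from four topics above.
# Each metric is a (numerator, denominator) pair, exactly as reported.
results = [
    ("Which russian rivers flow into the Black sea?", (1, 1), (1, 1), (1, 1)),
    ("Where is kazakhstan?", (1, 19), (1, 2), (1, 1)),
    ("What borders Croatia?", (6, 20), (1, 1), (1, 6)),
    ("who works at AIFB?", (30, 31), (3, 4), (3, 30)),
]


def macro_average(pairs):
    """Mean of per-query ratios, so every query weighs the same (assumption)."""
    ratios = [Fraction(num, den) for num, den in pairs]
    return float(sum(ratios) / len(ratios))


avg_precision = macro_average([r[1] for r in results])
avg_precision_at_1 = macro_average([r[2] for r in results])
avg_recall_at_1 = macro_average([r[3] for r in results])

# The reported total time is the sum of the two reported phase averages.
avg_total_secs = round(27.7 + 5.43, 1)

print(f"precision   = {avg_precision:.0%}")
print(f"precision@1 = {avg_precision_at_1:.0%}")
print(f"recall@1    = {avg_recall_at_1:.0%}")
print(f"total secs  = {avg_total_secs}")
```

The last value is consistent with the reported total: 27.7 secs for mapping plus 5.43 secs for fusion rounds to 33.1 secs.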
-
Problems and Limitations:
- Semantic sources on the open Web are of very poor quality: they are duplicated, very small (e.g. FOAF files) and very noisy.
- Large ontologies are missing (many of them are only available in .zip format and cannot be crawled by search engines).
- Lack of schema: no domain and range for properties, no type defined for instances or classes (or the type is defined only as Thing).
- Unintuitive labels (string-matching algorithms fail to find the correct term), e.g. the user keyword "pc member" could not be mapped to the property with label "Property-3AHas_PC_member".
- Domain sparseness (geography, semantic conferences, publications, FOAF files) and many unpopulated ontologies (taxonomies only).
- The main problem is that ontologies are split across different files, and those files are not recognized as part of the same graph or ontology (as they would be with Virtuoso or Sesame). As a result, the schema definition is not part of the instantiated triples (types are missing or defined somewhere else).
- Different graphs (URIs) for the same source. For example, for the query "where is islam practiced?" there are many files (with no associated schema) that contain mappings for islam. Most of these sources originate from the same domain, http://aaronland.info/nytimes, but they have different URIs and are not considered part of the same graph:
- Triples in http://aaronland.info/nytimes/knows/related/2004/05/13/index.rdf [3]
(nigeria, place, islam), (christians-and-christianity, idea, islam), (violence, idea, islam)
- Triples in http://aaronland.info/nytimes/knows/related/2004/05/17/index.rdf [2]
(iraq, place, islam), (politics-and-government, idea, islam)
- Triples in http://aaronland.info/nytimes/knows/related/2004/05/15/index.rdf [2]
(roman-catholic-church, organization, islam), (marriages, idea, islam)
- Triples in http://aaronland.info/nytimes/knows/related/2004/05/28/index.rdf [6]
(mosques, idea, islam), (new-york-city, idea, islam), (taxi-and-limousine-commission, organization, islam)
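The last limitation, one source spread over many URIs, can be illustrated with a small sketch. It groups files by host plus leading path segment and merges their triples into one logical graph. This is not how the evaluated system works; it is only one possible heuristic, and the names `graph_key`, `merge_by_key` and the `depth` parameter are hypothetical. The triples are taken from the first two files listed above.

```python
from collections import defaultdict
from urllib.parse import urlparse


def graph_key(uri, depth=1):
    """Return the host plus the first `depth` path segments of a URI."""
    parsed = urlparse(uri)
    segments = [s for s in parsed.path.split("/") if s]
    return parsed.netloc + "/" + "/".join(segments[:depth])


def merge_by_key(sources, depth=1):
    """Merge the triples of all files that share the same grouping key."""
    merged = defaultdict(list)
    for uri, triples in sources.items():
        merged[graph_key(uri, depth)].extend(triples)
    return dict(merged)


# Two of the files from the example above, each with its own URI.
sources = {
    "http://aaronland.info/nytimes/knows/related/2004/05/13/index.rdf": [
        ("nigeria", "place", "islam"),
        ("christians-and-christianity", "idea", "islam"),
        ("violence", "idea", "islam"),
    ],
    "http://aaronland.info/nytimes/knows/related/2004/05/17/index.rdf": [
        ("iraq", "place", "islam"),
        ("politics-and-government", "idea", "islam"),
    ],
}

merged = merge_by_key(sources)
for key, triples in merged.items():
    print(key, len(triples))
```

With `depth=1` both files fall under the key `aaronland.info/nytimes`, so their five triples end up in a single group. A heuristic like this could also merge unrelated files hosted under the same path, so the depth would need tuning per source.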