We are pleased to offer http://sameas.org/ as a service to provide you with help finding URIs.
It does what the name suggests: if you provide a URI, it will give you back any URIs known to it that may well be co-referent. In addition, by using sindice.com, if you search for a string, it will provide bundles of URIs obtained by looking up that string at Sindice and processing the resulting URIs at <sameAs>.
Apart from the obvious use of the service through the web-form, simple URIs can be provided to do lookups:
The following formats are supported –
application/rdf+xml, text/n3, application/json, text/plain
For example as URIs:
Or, use content negotiation. For example at the command line:
curl -iLH "Accept: application/rdf+xml" "http://sameas.org/?uri=http://dbpedia.org/resource/Tim_Berners-Lee"
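The same lookup can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the names build_lookup_request and fetch are hypothetical, and percent-encoding the looked-up URI assumes that the service decodes ordinary query strings (the curl example passes the URI unencoded).

```python
import urllib.parse
import urllib.request

# Service address as given in the text above.
SAMEAS_ENDPOINT = "http://sameas.org/"


def build_lookup_request(uri, accept="application/rdf+xml"):
    """Build (but do not send) a GET request asking <sameAs> about `uri`."""
    # Assumption: the service accepts a percent-encoded value for `uri`.
    # Encoding keeps characters such as '#' and '&' inside the query value.
    query = urllib.parse.urlencode({"uri": uri})
    return urllib.request.Request(
        SAMEAS_ENDPOINT + "?" + query,
        headers={"Accept": accept},  # content negotiation, as with curl -H
    )


def fetch(uri, accept="application/rdf+xml", timeout=30):
    """Send the request and return the body as text.

    urlopen follows HTTP redirects by default, much like curl -L.
    """
    request = build_lookup_request(uri, accept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Performs a real network call when run as a script.
    print(fetch("http://dbpedia.org/resource/Tim_Berners-Lee"))
```

Changing the accept argument to one of the other supported formats, such as application/json, should select that representation instead.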
Mischa Tuffield has commented that he can include the following statement in his FOAF file, pointing consumers of the file at the <sameAs> lookup for his own URI:
<http://mmt.me.uk/foaf.rdf#mischa> rdfs:seeAlso <http://sameas.org/?uri=http://mmt.me.uk/foaf.rdf#mischa> .
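A statement of this shape can be generated for any URI. The sketch below is only an illustration: see_also_statement is a hypothetical helper, and unlike the statement above it percent-encodes the subject URI inside the lookup address, so that a "#" stays part of the query value rather than being read as a fragment of the lookup URI. Whether the service treats the two forms identically is an assumption.

```python
import urllib.parse


def see_also_statement(subject_uri):
    """Return a Turtle statement linking `subject_uri` to its <sameAs> lookup.

    Assumes the rdfs: prefix is declared elsewhere in the Turtle file.
    """
    # Assumption: the service decodes a percent-encoded `uri` parameter.
    lookup = "http://sameas.org/?" + urllib.parse.urlencode({"uri": subject_uri})
    return "<{0}> rdfs:seeAlso <{1}> .".format(subject_uri, lookup)


if __name__ == "__main__":
    print(see_also_statement("http://mmt.me.uk/foaf.rdf#mischa"))
```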
Where does this co-reference data come from?
Well, this is rather a long story, as we did not set out to provide this service. But if I tell you the story, you might be able to assess the utility of it in your context.
As part of the RKBExplorer work, we needed to be able to manage co-reference between triplestores (see related publications). We had an existing infrastructure for doing this, the Co-Reference Service (CRS), and we populated these CRSes with the co-reference data we were generating on RKBExplorer.com. As the RKBExplorer application became more sophisticated, we needed to know co-reference information with other sites such as dbpedia and http://data.semanticweb.org/. This enabled us to use information such as descriptions from wikipedia/dbpedia, and the information on conferences and foaf relationships.
However, long ago we discovered that getting things even slightly wrong can cause serious problems once the "network effect" that we are seeking comes into play. A seemingly trivial error, such as a source telling us that two different people with the same name are the same person, can badly misrepresent the network of relationships between the entities related to them. Such problems would not arise if the raw data were simply being presented.
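To make the risk concrete, here is a minimal sketch of how one wrong link spreads, assuming co-reference is treated as an equivalence so that linked URIs are merged into a single bundle. The Bundles class and the example.org URIs are hypothetical; this is not the CRS implementation.

```python
class Bundles:
    """Toy co-reference store: URIs asserted to be the same share a bundle."""

    def __init__(self):
        self.parent = {}  # union-find forest: uri -> parent uri

    def find(self, uri):
        """Return the representative URI of the bundle containing `uri`."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            # Path halving keeps later lookups short.
            self.parent[uri] = self.parent[self.parent[uri]]
            uri = self.parent[uri]
        return uri

    def assert_same(self, a, b):
        """Record that `a` and `b` co-refer, merging their bundles."""
        self.parent[self.find(a)] = self.find(b)

    def bundle(self, uri):
        """Return every known URI in the same bundle as `uri`, sorted."""
        root = self.find(uri)
        return sorted(u for u in list(self.parent) if self.find(u) == root)


if __name__ == "__main__":
    store = Bundles()
    # Two different people who happen to share a name (hypothetical URIs).
    store.assert_same("http://example.org/a/smith", "http://example.org/b/smith")
    store.assert_same("http://example.org/c/smith", "http://example.org/d/smith")
    print(store.bundle("http://example.org/a/smith"))
    # One mistaken assertion from a single source merges both bundles.
    store.assert_same("http://example.org/b/smith", "http://example.org/c/smith")
    print(store.bundle("http://example.org/a/smith"))
```

After the mistaken assertion, everything attached to either person is attributed to both, which is why the choice of sources matters.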
So I set out to gather co-referent information from sources I thought were sufficiently accurate for my purposes.
I started with the data we already had, and indeed are still generating. I then went to the Linked Data cloud, and harvested from the RDF dumps and SPARQL endpoints that I deemed to be satisfactory. In addition I approached some people who were not publishing in a form I could easily harvest already, such as David Baxter of Opencyc, and asked them to provide the data to me directly.
I have avoided spidering the web for arbitrary data, and indeed would suggest that Sindice is a much better source for this than I can possibly provide.
The question of which predicates I might have used now arises. There is what I consider a deep irony here. For many years, we have been arguing (not always with great success) that the issue of co-reference is much more complicated than can be captured by a simple predicate such as owl:sameAs. On undertaking this task, I found that there are many predicates coming into existence that address this question. In assembling this site, I have used at least the following:
- <http://www.w3.org/2002/07/owl#sameAs>
- <http://www.rkbexplorer.com/ontologies/coref#coreferenceData>
- <http://umbel.org/umbel/sc/isLike>
- <http://www.w3.org/2004/02/skos/core#exactMatch>
- <http://www.w3.org/2004/02/skos/core#closeMatch>
- <http://open.vocab.org/terms/similarTo>
- <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym>
I accepted the idea of co-reference for each of these on a per source basis. The <sameAs> service currently has a single concept of co-reference, and publishes the data it has in a single way, for example using the owl:sameAs predicate.
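As a rough illustration of accepting predicates per source and publishing a single notion of co-reference, consider the sketch below. It is not the code behind the service: the ACCEPTED table, the pairing of sources with predicates, and the normalise function are all hypothetical, chosen only to show the shape of the idea.

```python
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
UMBEL_ISLIKE = "http://umbel.org/umbel/sc/isLike"
SKOS_CLOSEMATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

# Hypothetical per-source decisions: which predicate from which source is
# trusted to mean co-reference. The real pairings are not documented here.
ACCEPTED = {
    ("http://dbpedia.org/", OWL_SAMEAS),
    ("http://umbel.org/", UMBEL_ISLIKE),
}


def normalise(source, triples):
    """Yield owl:sameAs triples for the statements accepted from `source`.

    `triples` is an iterable of (subject, predicate, object) URI strings.
    Statements whose (source, predicate) pair is not accepted are dropped.
    """
    for subject, predicate, obj in triples:
        if (source, predicate) in ACCEPTED:
            yield (subject, OWL_SAMEAS, obj)


if __name__ == "__main__":
    harvested = [
        ("http://example.org/x", UMBEL_ISLIKE, "http://example.org/y"),
        ("http://example.org/x", SKOS_CLOSEMATCH, "http://example.org/z"),
    ]
    for triple in normalise("http://umbel.org/", harvested):
        print(triple)
```

Keeping the decision keyed on the (source, predicate) pair means the same predicate can be trusted from one publisher and ignored from another.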
I have to say it does raise the question of why there should be so many vocabularies that mint new URIs for these concepts.
So what sources? Here is a non-exhaustive list of places the data may have come from:
- http://go.bio2rdf.org/
- http://purl.org/hcls/
- http://moustaki.org/
- http://rdf.dmoz.org/
- http://doapstore.org/
- http://dbpedia.org/
- http://rdf.geospecies.org/
- http://www.yr-bcn.es/pmika/
- http://umbel.org/
- http://downloads.dbpedia.org/
- http://www.opencyc.org/
- http://hcls.deri.org/
- http://lingvoj.org/
- http://www.cs.vu.nl/STITCH/rameau/
- http://rkbexplorer.com/
- http://airports.dataincubator.org/
- http://telegraphis.net/
- http://ontologi.es/rail/stations
- http://data.linkedct.org/
- http://discogs.dataincubator.org/
- http://www.bbc.co.uk/music/
- http://linkedgeodata.org/
- http://data.nytimes.com/
- http://bnb.data.bl.uk
- http://d-nb.info
- http://data.bibsys.no
- http://nektar.oszk.hu
- http://dbpedia.org
- http://id.loc.gov
- http://id.ndl.go.jp
- http://stitch.cs.vu.nl
Finally, please be aware that the data is changing all the time. As people browse using RKBExplorer, the system examines the results and establishes co-reference as appropriate; thus the results provided by the RKBExplorer are intended to improve as time goes by, and also the <sameAs.org> reflection of that will change.
I hope that helps - I confess that in the early days I was simply getting data I needed, rather than preparing to document it.
There is currently no public service to enable arbitrary contribution to the contents of <sameAs>. If you have significant data you would be prepared to give us, then please contact us at the email below. On the other hand, if you have time to help us provide such services, then please feel free to offer your help.
License and Re-use
We believe that Linked Data needs to develop clear, focussed services that only do one or two things, so that they can be composed and utilised by more complex services, as well as facilitating re-use. We hope that <sameAs> fits into that category, and that Linked Data application builders will find it an appropriate and useful service for the important task of discovering co-referent URIs.
In addition, by providing formats oriented towards non-Linked Data applications, we hope that the use of Linked Data can be spread even wider. If you really want, there are a number of sameAs logos available.
There are currently roughly 18M URIs, with an average of about 3 URIs per bundle.
The information is provided as-is and without any warranty.
We acknowledge the partial financial support of:
- The ReSIST Project, funded by the EU under contract number 026764
- The Korean Institute of Science and Technology Information (KISTI)
We thank the Sindice team for providing the excellent Semantic Web service that allows us to do the text to URI mapping enhancement.
We thank all the people who have provided this information to us, either specifically or by publishing on the web.
There are a number of publications about this work.