Swoogle is a crawler-based indexing and retrieval system specifically designed for Semantic Web Documents (SWDs). It views the Semantic Web as a collection of interconnected SWDs distributed across the Internet.
An SWD is an online document written in RDF or OWL, typically with a file extension such as .rdf or .owl.
Unlike traditional search engines that return web pages, Swoogle returns links to specific documents (with extensions .rdf or .owl). This is a fundamental distinction!
Reusing ontologies is even more important than reusing URIs. Without ontology reuse, there would be no common language, no shared understanding, and no interoperability — effectively nullifying the Semantic Web's purpose.
The most popular use of Swoogle! Search for existing ontologies that match your needs using specific terms.
How it works: Query Swoogle with specific terms → Get matching ontologies → Follow links to evaluate if they fit your needs.
Search SWDs for specific resources to enable URI reuse.
Example: Search for a friend named "Liyang" → Find their existing URI → Reuse it to add additional information about that person.
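The URI-reuse workflow in this example can be sketched in a few lines of Python. This is only an illustration, not Swoogle's interface: the triples are plain tuples, the example.org URIs and the helper name find_uris_by_name are hypothetical, and only the FOAF namespace and its name and homepage properties are real vocabulary.

```python
# Minimal sketch of URI reuse, assuming triples are (subject, predicate, object) tuples.
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical triples that a search over existing SWDs might surface.
existing_triples = [
    ("http://example.org/people#liyang", FOAF + "name", "Liyang"),
    ("http://example.org/people#liyang", FOAF + "mbox", "mailto:liyang@example.org"),
    ("http://example.org/people#maria", FOAF + "name", "Maria"),
]


def find_uris_by_name(triples, name):
    """Return the subject URIs whose foaf:name contains the given text."""
    needle = name.lower()
    return sorted(
        {s for s, p, o in triples if p == FOAF + "name" and needle in o.lower()}
    )


# Step 1: find the URI that already identifies the person.
matches = find_uris_by_name(existing_triples, "Liyang")
person_uri = matches[0]

# Step 2: reuse that URI as the subject of new statements instead of minting a new one.
new_triples = [
    (person_uri, FOAF + "homepage", "http://example.org/~liyang"),
]
```

Because the new statement uses the same subject URI as the existing ones, an aggregator can merge all of them into a single description of the person.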
Explore interdocument relations using collected metadata.
Features: Follow namespaces to find related documents, discover internal linkages, study how ontologies are referenced and connected.
The NSF-supported SPIRE project uses Swoogle to find appropriate ontologies and terms to describe biological and ecological data and services. For example, it searches for ontologies using the keywords "before," "after," and "interval" to find terms that describe temporal relationships.
Two web crawlers that discover SWDs distributed across the Internet
Collects metadata for navigation and computation
Classifies relationships and calculates document ranks
Core search engine functionality for finding SWDs
Swoogle distinguishes between two types of SWDs: ontologies and instance documents.
ontoRatio: The fraction of the individuals defined in an SWD that are recognized as classes or properties. An SWD is considered a strict ontology if ontoRatio ≥ 0.8.
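The ontoRatio definition can be made concrete with a small calculation. The sketch below assumes that an individual is any subject carrying an rdf:type statement and that a short, simplified set of type names marks classes and properties. The prefixed names, the ex: terms, and the function names are illustrative choices, not Swoogle's actual implementation.

```python
# Minimal ontoRatio sketch over (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"

# Assumed, simplified set of types that mark an individual as a class or property.
SCHEMA_TYPES = {
    "rdfs:Class",
    "owl:Class",
    "rdf:Property",
    "owl:ObjectProperty",
    "owl:DatatypeProperty",
}


def onto_ratio(triples):
    """Fraction of typed individuals that are classes or properties."""
    types_by_subject = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types_by_subject.setdefault(s, set()).add(o)
    if not types_by_subject:
        return 0.0
    schema_count = sum(1 for t in types_by_subject.values() if t & SCHEMA_TYPES)
    return schema_count / len(types_by_subject)


def is_strict_ontology(triples, threshold=0.8):
    """Apply the ontoRatio >= 0.8 rule stated in the text."""
    return onto_ratio(triples) >= threshold


# One class and three instances of it: 1 of 4 individuals is a class.
color_doc = [
    ("ex:Color", RDF_TYPE, "owl:Class"),
    ("ex:blue", RDF_TYPE, "ex:Color"),
    ("ex:green", RDF_TYPE, "ex:Color"),
    ("ex:red", RDF_TYPE, "ex:Color"),
]
print(onto_ratio(color_doc))          # 0.25
print(is_strict_ontology(color_doc))  # False
```

A document like color_doc, dominated by instances, falls well below the 0.8 threshold, while a document that only declares classes and properties scores 1.00.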
[Figure: Swoogle crawler architecture, with components labeled Google API, Seeds, Focused Crawler, Swooglebot, Parser, and User Submissions]
Used to calculate ontoRatio, which determines whether an SWD is an ontology or an instance document.
Three interdocument relations:
Metadata serves two key purposes: (1) Provides a navigational tool for users to find documents efficiently, and (2) Supplies information needed to rank pages and terms.
When a user searches for a term, multiple documents may match. Ranking determines which document appears first, helping users find the most important and relevant results.
Similar to how Google's PageRank uses link analysis to rank web pages, Swoogle's ranking algorithms analyze the relationships between SWDs to determine relative importance. Documents that are more widely referenced or imported receive higher rankings.
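To illustrate the idea that widely referenced documents rank higher, here is a generic PageRank-style power iteration over a tiny document graph. It is not Swoogle's actual ranking algorithm, whose details are not given here. The document names, the damping factor of 0.85, and the function name rank_documents are assumptions made for the sketch.

```python
# Generic PageRank-style ranking over a "document -> referenced documents" map.
def rank_documents(links, damping=0.85, iterations=50):
    """Return a rank per document; the ranks sum to 1."""
    docs = set(links)
    for targets in links.values():
        docs.update(targets)
    docs = sorted(docs)  # fixed order keeps the arithmetic reproducible
    n = len(docs)
    rank = {d: 1.0 / n for d in docs}

    for _ in range(iterations):
        # Every document starts each round with the same baseline share.
        new_rank = {d: (1.0 - damping) / n for d in docs}
        for d in docs:
            targets = links.get(d, [])
            if targets:
                # Pass this document's rank to the documents it references.
                share = damping * rank[d] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A document with no outgoing references spreads its rank evenly.
                share = damping * rank[d] / n
                for t in docs:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: two instance documents and one ontology all reference foaf.owl.
links = {
    "alice.rdf": ["foaf.owl"],
    "bob.rdf": ["foaf.owl", "contact.owl"],
    "contact.owl": ["foaf.owl"],
}
ranks = rank_documents(links)
best = max(ranks, key=ranks.get)
print(best)  # foaf.owl
```

In this toy graph foaf.owl is referenced by every other document, so it ends up with the highest rank, and contact.owl outranks the two instance documents that nothing points to.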
Unlike traditional documents (streams of words), SWDs are collections of triples, and the parts of a triple are mostly URIs rather than words (an object may also be a literal value). How do you index them effectively?
Every URI has the form: namespace + localName
Swoogle partitions each URI and indexes on both parts. This allows users to search using just the localName without knowing the namespace!
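The partitioning step can be sketched as follows. The split rule used here (cut after the last "#", or after the last "/" when there is no "#") is a common convention and an assumption, not necessarily the rule Swoogle applies. The document URLs and the function names are hypothetical.

```python
from collections import defaultdict


def split_uri(uri):
    """Split a URI into (namespace, localName) at the last '#', else the last '/'."""
    cut = uri.rfind("#")
    if cut == -1:
        cut = uri.rfind("/")
    return uri[: cut + 1], uri[cut + 1 :]


def build_indexes(uris_by_document):
    """Index each document under both the namespace and the localName of its URIs."""
    by_namespace = defaultdict(set)
    by_local_name = defaultdict(set)
    for doc, uris in uris_by_document.items():
        for uri in uris:
            namespace, local_name = split_uri(uri)
            by_namespace[namespace].add(doc)
            if local_name:
                by_local_name[local_name.lower()].add(doc)
    return by_namespace, by_local_name


# Hypothetical documents and the URIs they mention.
uris_by_document = {
    "http://example.org/people.rdf": ["http://xmlns.com/foaf/0.1/Person"],
    "http://example.org/staff.owl": ["http://example.org/staff#Person"],
}
by_namespace, by_local_name = build_indexes(uris_by_document)

# A search on the localName alone finds both documents, whatever their namespace.
print(sorted(by_local_name["person"]))
```

Keeping a second index keyed by namespace also supports the navigation described earlier: given one document, you can look up every other document that uses the same namespace.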
[DEF] OnlineGamingAccount, Organization, Person, Personal, PersonalProfileDocument
SemanticWebDocument, RDFXML, ontoRatio(1.00), metadata, cached
[DEF] MasterThesis, Meeting, Member, Person, PhDStudent
SemanticWebDocument, RDFXML, ontoRatio(1.00), metadata, cached
[DEF] contact, contact-information, contact-person
SemanticWebDocument, RDFXML, ontoRatio(0.91), metadata, cached
Swoogle returns documents with their ontoRatio, metadata links, and cached versions. You can navigate through related namespaces and documents to explore the Semantic Web structure.
The search found the FOAF ontology, which defines the Person class, and an instance document (http://www.livejournal.com/community/lj_dev/data/foaf) that actually uses that class.
Swoogle is primarily designed for researchers and developers in the Semantic Web community. For casual users wanting to find "the best hotel in Las Vegas," Swoogle won't help — it returns SWDs, not general web information.
In the next chapter, we'll explore FOAF (Friend of a Friend), another popular real-world example of the Semantic Web at work!