We've learned about ontologies and semantic languages (RDF, RDFS, OWL). But how do we actually connect these powerful tools to the billions of existing Web pages?
The Semantic Web extends the current Web to make it more machine-understandable. To get there, two separate worlds need to be connected.
The current Web: HTML pages for humans, with no machine semantics.
Ontologies (OWL): machine-readable semantics.
Ontologies are independent of Web pages. An ontology doesn't provide a link to any specific Web page and is not linked from any Web page either. We need a bridge to connect them!
Semantic markup (also called annotation) adds semantics to current Web pages, connecting them to appropriate ontologies so machines can understand the content.
A markup file is normally an RDF document containing RDF statements that describe the content of a Web page by using terms defined in one or several ontologies.
When an agent reaches a Web page with a markup document, it reads the markup and loads related ontologies. The agent becomes a smart agent that can "understand" the content and "deduce" implied facts about the page.
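To make "deducing implied facts" concrete, here is a minimal sketch of one kind of inference an agent could perform once it has loaded an ontology: following subclass links upward from the class named in the markup. The class hierarchy below (SLR under Digital under Camera, Photographer under Person) is a hypothetical stand-in for the rdfs:subClassOf statements of a real ontology, and the function name is illustrative only.

```python
# Hypothetical class hierarchy, standing in for the rdfs:subClassOf
# statements an agent would load from a camera ontology.
SUBCLASS_OF = {
    "SLR": "Digital",
    "Digital": "Camera",
    "Photographer": "Person",
}


def inferred_types(asserted_type, subclass_of=SUBCLASS_OF):
    """Return the asserted class followed by every ancestor class."""
    types = [asserted_type]
    seen = {asserted_type}
    current = asserted_type
    while current in subclass_of:
        current = subclass_of[current]
        if current in seen:  # guard against accidental cycles
            break
        seen.add(current)
        types.append(current)
    return types


# The markup only says "this is an SLR"; the rest is deduced.
print(inferred_types("SLR"))  # ['SLR', 'Digital', 'Camera']
```

The markup never states that the item is a Camera; the agent derives that from the ontology, which is why the markup and the ontology have to be read together.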
1. Decide which ontology (or ontologies) to use. Read and understand the ontology to ensure it fits your needs. Try to reuse existing ontologies when possible, or extend them if needed.
2. Decide what content to mark up (not everything needs to be marked up). Create RDF statements based on the ontology. Use a simple editor or specialized tools. Always validate your document!
3. Inform the world that your page has a markup document. Add a <link> tag in the HTML header pointing to your markup file. Store the markup file where it can be accessed publicly.
"If there were an agent visiting my page, what information would it want to understand?" This helps you decide what content to mark up.
A simple HTML review page for the Nikon D70 digital camera. The page contains information for humans but no machine-readable semantics.
We choose our Camera ontology because:
This page, written by a photographer named "Liyang Yu", discusses a digital SLR camera named "Nikon D70" and gives specifications such as its 6-megapixel sensor.
We start by declaring that the page discusses an instance of the SLR class from our camera ontology.
The page author is Liyang Yu, who is a Photographer according to our ontology.
1. Page URL and title (using Dublin Core)
2. The camera is an SLR with a 6-megapixel sensor
3. Written by a Photographer named Liyang Yu
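The three parts listed above could be assembled into an RDF/XML markup document along the following lines. This is a sketch under stated assumptions, not the actual markup file: the ontology namespace, the page URL, the page title, and the property name `megapixels` are all hypothetical placeholders, since the real ones are not given here. The RDF, RDFS, and Dublin Core namespaces are the standard ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical values: the real ontology namespace and page URL are assumptions.
CAMERA = "http://example.org/camera.owl#"
PAGE_URL = "http://example.org/reviews/nikon-d70.html"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("dc", DC), ("camera", CAMERA)):
    ET.register_namespace(prefix, uri)


def q(namespace, local_name):
    """Build the '{namespace}name' form that ElementTree expects."""
    return "{%s}%s" % (namespace, local_name)


def build_markup():
    root = ET.Element(q(RDF, "RDF"))

    # 1. Page URL and title (using Dublin Core); the title text is assumed.
    page = ET.SubElement(root, q(RDF, "Description"), {q(RDF, "about"): PAGE_URL})
    ET.SubElement(page, q(DC, "title")).text = "Nikon D70 Review"

    # 2. The camera is an instance of the SLR class with a 6-megapixel sensor.
    #    'megapixels' is a hypothetical property name.
    camera = ET.SubElement(
        root, q(CAMERA, "SLR"), {q(RDF, "about"): PAGE_URL + "#Nikon-D70"}
    )
    ET.SubElement(camera, q(CAMERA, "megapixels")).text = "6"

    # 3. The author is an instance of the Photographer class.
    author = ET.SubElement(
        root, q(CAMERA, "Photographer"), {q(RDF, "about"): PAGE_URL + "#Liyang-Yu"}
    )
    ET.SubElement(author, q(RDFS, "label")).text = "Liyang Yu"

    return ET.tostring(root, encoding="unicode")


print(build_markup())
```

Writing the class name as the element name (for example `camera:SLR`) is the RDF/XML shorthand for declaring the resource's type, which is how the document states that the page discusses an instance of the SLR class.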
Add a <link> tag in the HTML header to point to your markup file. This tells agents where to find the semantic information.
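From the agent's side, discovering the markup file might look like the sketch below. It assumes the page advertises its markup with a link element whose type is `application/rdf+xml` (shown here with `rel="meta"`, one common convention); the sample page and its URLs are made up for illustration.

```python
from html.parser import HTMLParser


class MarkupLinkFinder(HTMLParser):
    """Collect href values of <link> tags that point to RDF/XML documents."""

    def __init__(self):
        super().__init__()
        self.markup_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # Assumption: markup files are advertised with this media type.
        if attributes.get("type") == "application/rdf+xml" and attributes.get("href"):
            self.markup_urls.append(attributes["href"])


def find_markup_links(html_text):
    finder = MarkupLinkFinder()
    finder.feed(html_text)
    return finder.markup_urls


# Hypothetical review page; the URLs are placeholders.
SAMPLE_PAGE = """<html><head>
<title>Nikon D70 Review</title>
<link rel="meta" type="application/rdf+xml"
      href="http://example.org/reviews/nikon-d70.rdf">
</head><body><p>Review text for human readers.</p></body></html>"""

print(find_markup_links(SAMPLE_PAGE))  # ['http://example.org/reviews/nikon-d70.rdf']
```

A page without such a link simply yields an empty list, and the agent treats it as an ordinary page with no markup.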
If you create your markup manually, always validate it! A simple syntax error may cause a chunk of your RDF document to be ignored by agents.
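As a first line of defense, you can at least confirm that a hand-written markup file is well-formed XML with an rdf:RDF root. The sketch below does only that; it is not a full RDF validator and does not check the statements against any ontology, so a dedicated validation tool is still needed. The function name and sample documents are illustrative.

```python
import xml.etree.ElementTree as ET

RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def check_markup(text):
    """Return (ok, message) for a basic well-formedness check."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return False, "not well-formed XML: %s" % err
    if root.tag != RDF_ROOT:
        return False, "root element is %s, expected rdf:RDF" % root.tag
    return True, "well-formed XML with an rdf:RDF root"


GOOD = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about="http://example.org/page.html"/>'
    "</rdf:RDF>"
)
# The rdf:Description element is never closed: a typical manual slip.
BAD = (
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:about="http://example.org/page.html">'
    "</rdf:RDF>"
)

print(check_markup(GOOD)[0])  # True
print(check_markup(BAD)[0])   # False
```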
SMORE (Semantic Markup, Ontology, and RDF Editor) is a tool from the University of Maryland that lets you create markup documents without deep OWL knowledge.
Use the File menu to load ontologies from your local machine or from the Web. There is no limit on how many ontologies you can load. The tool displays all classes in a tree view.
Enter the URL of the page you want to mark up in the text box. The page can be on your local machine or on a Web server. SMORE displays the ontology tree and the Web page side by side.
Click the "New Individual" tab, enter a name, then drag a class from the ontology tree to assign the type. The new instance appears in the instance window.
Click the Individuals tab to see available properties for each instance. Fill out the form to set property values. SMORE handles syntax and namespaces automatically.
Click the RDF/XML tab to see your markup document and make any final modifications. Because SMORE generates the RDF/XML for you, the syntax it produces is correct.
SMORE can also validate your ontology! If there's an error in your OWL file, SMORE won't open it.
Manual markup requires significant effort. Who will mark up the billions of existing pages? Why should page owners learn ontologies and do this work without a killer application to demonstrate the benefit?
Some research exists on automatic markup, but most techniques only work for technical texts. The Internet's heterogeneous, natural language content remains challenging.
If we can't modify the original pages, we need centralized markup storage. This requires building servers that store markup documents and index them efficiently. Agents would query this database for each page they visit.
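The core of such a centralized store is a lookup from a page URL to the markup documents that describe it. The in-memory sketch below illustrates the idea only; a real service would need persistent storage and a network API. The class name, the normalization rules, and the URLs are all assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


class MarkupRegistry:
    """Toy in-memory index from page URLs to markup document URLs."""

    def __init__(self):
        self._index = {}

    @staticmethod
    def _normalize(url):
        # Lowercase the scheme and host, and drop the fragment, so that
        # trivially different spellings of one page share one index key.
        parts = urlsplit(url)
        path = parts.path or "/"
        return urlunsplit(
            (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
        )

    def register(self, page_url, markup_url):
        self._index.setdefault(self._normalize(page_url), []).append(markup_url)

    def lookup(self, page_url):
        return list(self._index.get(self._normalize(page_url), []))


registry = MarkupRegistry()
registry.register(
    "http://Example.org/reviews/nikon-d70.html#specs",
    "http://markup.example.net/nikon-d70.rdf",
)
# An agent visiting the page asks the registry instead of the page itself.
print(registry.lookup("http://example.org/reviews/nikon-d70.html"))
# ['http://markup.example.net/nikon-d70.rdf']
```

Because the markup lives outside the page, anyone can describe a page they do not own, at the cost of one extra lookup per page visited.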
Without a killer Semantic Web application → no motivation for owners to mark up pages.
Without marked-up pages → killer applications cannot be created.
Instead of marking up the entire Internet, focus on specific domains (e.g., Bioinformatics) and manually mark up the majority of pages within that domain.
Create dedicated servers that store markup documents and index them efficiently for quick lookups.
Build excellent Semantic Web applications for specific domains to demonstrate the power and value of semantic markup.
As the community realizes the value, page owners will start marking up their own documents, much as Web sites became essential for business.
"Recall the days when only a few big companies had Web sites. Soon everyone realized that without a Web site, there could be a huge loss of business. Let us hope that such a day will soon arrive for the Semantic Web."
Compare traditional keyword search vs. semantic search:
Keyword search: matches "digital", "SLR", "amateur" as text strings. Cannot distinguish camera types or user expertise levels.
Semantic search: matches structured values from the markup.
Type: SLR, Target: Amateur Photographer, Price: <$1000
Type: SLR, Target: Entry Level, Price: <$1000
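The difference between the two kinds of search can be sketched in a few lines. The three pages below, including their text, field names, and prices, are invented sample data for illustration; they are not real search results.

```python
# Invented sample data: each page has free text plus structured markup values.
PAGES = [
    {
        "url": "http://example.org/a",
        "text": "A digital SLR for the amateur photographer",
        "markup": {"type": "SLR", "target": "Amateur", "price": 899},
    },
    {
        "url": "http://example.org/b",
        "text": "Why an amateur should skip the digital SLR and buy a compact",
        "markup": {"type": "Compact", "target": "Amateur", "price": 299},
    },
    {
        "url": "http://example.org/c",
        "text": "Professional full-frame body",
        "markup": {"type": "SLR", "target": "Professional", "price": 2999},
    },
]


def keyword_search(pages, words):
    """Return pages whose text contains every word as a plain substring."""
    lowered = [word.lower() for word in words]
    return [p["url"] for p in pages if all(w in p["text"].lower() for w in lowered)]


def semantic_search(pages, camera_type, target, max_price):
    """Return pages whose markup values satisfy the structured query."""
    return [
        p["url"]
        for p in pages
        if p["markup"]["type"] == camera_type
        and p["markup"]["target"] == target
        and p["markup"]["price"] < max_price
    ]


print(keyword_search(PAGES, ["digital", "SLR", "amateur"]))
# ['http://example.org/a', 'http://example.org/b']
print(semantic_search(PAGES, "SLR", "Amateur", 1000))
# ['http://example.org/a']
```

The keyword search also returns page b, which mentions all three words while recommending a different camera type; the structured query excludes it because its markup says the type is Compact.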
How would semantic markup change the way we search for products, research papers, or job listings? What other domains would benefit most?
What is the PRIMARY purpose of semantic markup?
Which step comes FIRST in the semantic markup procedure?
Semantic markup is the practical implementation of adding semantics to the Web. It's the crucial step that transforms our theoretical knowledge of ontologies into real-world machine understanding.
Practice creating markup documents for simple Web pages. Explore tools like SMORE. Consider how markup could enhance Web applications in your domain of interest.