Mark Up Your Web Document, Please!

Chapter 9: Semantic Markup — Building the Bridge Between Worlds

Eng. Dr. Tiroshan Madushanka

Learning Objectives

What you will learn in this chapter

By the end of this lecture, you will be able to:

  • Understand what semantic markup is and why it's essential
  • Explain the connection between the current Web and the Semantic Web
  • Follow the three-step procedure for semantic markup
  • Create markup documents manually using RDF
  • Use tools like SMORE to create markup documents
  • Discuss current challenges and issues in semantic markup

💭 Opening Thought

We've learned about ontologies and semantic languages (RDF, RDFS, OWL). But how do we actually connect these powerful tools to the billions of existing Web pages?

The Two Worlds Problem

Section 9.1: A Connection Between Two Worlds

The Core Challenge

The Semantic Web is about extending the current Web to make it more machine-understandable. But we have two separate worlds that need to be connected.

🌐 Current Web

HTML pages for humans
No machine semantics

🧠 Semantic World

Ontologies (OWL)
Machine-readable semantics

What's Missing?

Ontologies are independent of Web pages. An ontology doesn't provide a link to any specific Web page and is not linked from any Web page either. We need a bridge to connect them!

The Bridge = Semantic Markup

Semantic markup (also called annotation) adds semantics to current Web pages, connecting them to appropriate ontologies so machines can understand the content.

What is Semantic Markup?

Understanding the concept

Definition

A markup file is normally an RDF document containing RDF statements that describe the content of a Web page by using terms defined in one or several ontologies.

❌ Without Markup

  • Page looks great to humans
  • No machine semantics
  • Agent sees only HTML tags
  • Cannot "understand" content

✓ With Markup

  • Page still looks great to humans
  • Machine-readable semantics
  • Agent reads markup + ontology
  • Can understand and reason

How Does It Work?

When an agent reaches a Web page with a markup document, it reads the markup and loads related ontologies. The agent becomes a smart agent that can "understand" the content and "deduce" implied facts about the page.
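To make "deduce implied facts" concrete, here is a minimal sketch of the reasoning step. It assumes a tiny class hierarchy (SLR is a kind of Digital, which is a kind of Camera) and shortened, hypothetical names in place of full URIs; a real agent would load these from the ontology and the markup file.

```python
# Assumed fragment of a camera ontology: child class -> parent class.
SUBCLASS_OF = {
    "SLR": "Digital",
    "Digital": "Camera",
}

# What the markup states explicitly: one (subject, predicate, object) triple.
MARKUP = [
    ("Nikon-D70", "rdf:type", "SLR"),
]


def inferred_types(instance, markup, subclass_of):
    """Return every class the instance belongs to, stated or implied."""
    types = []
    for subject, predicate, obj in markup:
        if subject == instance and predicate == "rdf:type":
            current = obj
            # Walk up the subclass chain, collecting each ancestor once.
            while current is not None and current not in types:
                types.append(current)
                current = subclass_of.get(current)
    return types


print(inferred_types("Nikon-D70", MARKUP, SUBCLASS_OF))
```

The markup only says the Nikon-D70 is an SLR; the ontology lets the agent conclude it is also a Digital camera and a Camera.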

The Procedure of Semantic Markup

Section 9.1.2: Three steps to mark up your page

1. Choose Your Ontology

Decide which ontology (or ontologies) to use. Read and understand the ontology to ensure it fits your needs. Try to reuse existing ontologies when possible, or extend them if needed.

2. Create the Markup Document

Decide what content to mark up (not everything needs to be marked up). Create RDF statements based on the ontology. Use a simple editor or specialized tools. Always validate your document!

3. Link the Markup to Your Page

Inform the world that your page has a markup document. Add a <link> tag in the HTML header pointing to your markup file. Store the markup file where it can be accessed publicly.

💭 Key Question to Ask

"If there were an agent visiting my page, what information would it want to understand?" This helps you decide what content to mark up.

Manual Markup: A Real Example

Section 9.2: Marking up a Nikon D70 review page

Our Example Page

A simple HTML review page for the Nikon D70 digital camera. The page contains information for humans but no machine-readable semantics.

Step 1: Choosing the Ontology

We choose our Camera ontology because:

  • The page is a review, not a sales page
  • The ontology defines the terms the page relies on, such as Digital, SLR, and Photographer
  • A review needs no pricing or vendor information, so the ontology does not have to cover it

What We Want to Express

This page discusses a digital SLR camera named "Nikon D70", written by a photographer named "Liyang Yu", with specifications like 6 megapixels.

<html>
  <head>
    <title>Nikon D70 Review</title>
  </head>
  <body>
    <h2>D70 Review By Liyang Yu</h2>
    <img src="D70.jpg">
    <h4>Basic Information About Nikon-D70:</h4>
    <ul>
      <li>announced on 28th January 2004</li>
      <li>6 megapixel CCD sensor</li>
      ...
    </ul>
  </body>
</html>

Building the Markup Document

Step 2: Creating RDF statements

Initial Markup: Declaring the Camera Instance

We start by declaring that the page discusses an instance of the SLR class from our camera ontology.

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:camera="http://www.yuchen.net/photography/Camera.owl#">
  <rdf:Description rdf:about="...#Nikon-D70">
    <rdf:type rdf:resource="...Camera.owl#SLR"/>
    <rdfs:label>Nikon-D70</rdfs:label>
    <rdfs:label>D-70</rdfs:label>
  </rdf:Description>
</rdf:RDF>
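To see what an agent gets out of such a document, the sketch below reads a markup file of this shape with Python's standard XML parser and pulls out the subject, its type, and its labels. The example.org URIs are placeholders standing in for the elided ones, and describe is a hypothetical helper name; a production agent would more likely use a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Placeholder URIs (example.org) replace the elided "..." parts.
MARKUP = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Description rdf:about="http://example.org/review#Nikon-D70">
    <rdf:type rdf:resource="http://example.org/Camera.owl#SLR"/>
    <rdfs:label>Nikon-D70</rdfs:label>
    <rdfs:label>D-70</rdfs:label>
  </rdf:Description>
</rdf:RDF>"""


def describe(rdf_xml):
    """Return (subject, types, labels) for each rdf:Description element."""
    root = ET.fromstring(rdf_xml)
    results = []
    for node in root.findall(f"{{{RDF}}}Description"):
        subject = node.get(f"{{{RDF}}}about")
        types = [t.get(f"{{{RDF}}}resource")
                 for t in node.findall(f"{{{RDF}}}type")]
        labels = [label.text for label in node.findall(f"{{{RDFS}}}label")]
        results.append((subject, types, labels))
    return results


for subject, types, labels in describe(MARKUP):
    print(subject, types, labels)
```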

Enriching the Markup

Adding photographer and specifications

Adding the Author (Photographer)

The page author is Liyang Yu, who is a Photographer according to our ontology.

<!-- Add Photographer instance -->
<camera:Photographer rdf:about="...#LiyangYu">
  <rdfs:label>Liyang Yu</rdfs:label>
</camera:Photographer>

<!-- Add page metadata using Dublin Core
     (requires xmlns:dc="http://purl.org/dc/elements/1.1/" on rdf:RDF) -->
<rdf:Description rdf:about="...NikonD70Review.html">
  <dc:title>D70 Review By Liyang Yu</dc:title>
  <dc:creator>Liyang Yu</dc:creator>
</rdf:Description>

<!-- Add camera specifications; these property elements belong inside
     the description of the camera instance declared earlier -->
<rdf:Description rdf:about="...#Nikon-D70">
  <camera:pixel rdf:datatype="...#MegaPixel">6</camera:pixel>
  <camera:has_spec>
    <rdf:Description>
      <camera:model>Nikon-D70</camera:model>
    </rdf:Description>
  </camera:has_spec>
</rdf:Description>

Final Markup Conveys:

1. Page URL and title (using Dublin Core)
2. Camera is an SLR with 6 megapixel sensor
3. Written by a Photographer named Liyang Yu

Step 3: Linking the Markup

Making the markup discoverable

The Final Step

Add a <link> tag in the HTML header to point to your markup file. This tells agents where to find the semantic information.

<html>
  <head>
    <link rel="meta" type="application/rdf+xml" href="/markup.rdf"/>
    <title>Nikon D70 Review</title>
  </head>
  <body>
    ... (rest of the page) ...
  </body>
</html>
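From the agent's side, discovery could look roughly like the sketch below: scan the page for a link tag with rel="meta" and type="application/rdf+xml", then resolve its href against the page URL. The page URL and the names MarkupLinkFinder and find_markup are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkupLinkFinder(HTMLParser):
    """Collects href values of <link> tags that point at RDF markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        is_markup = (attributes.get("rel") == "meta"
                     and attributes.get("type") == "application/rdf+xml")
        if is_markup and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_markup(page_url, html):
    """Return absolute URLs of the markup documents a page links to."""
    finder = MarkupLinkFinder()
    finder.feed(html)
    return [urljoin(page_url, href) for href in finder.hrefs]


# Hypothetical page location; the HTML mirrors the slide's example.
PAGE_URL = "http://example.org/reviews/NikonD70Review.html"
PAGE_HTML = """<html><head>
<link rel="meta" type="application/rdf+xml" href="/markup.rdf"/>
<title>Nikon D70 Review</title>
</head><body></body></html>"""

print(find_markup(PAGE_URL, PAGE_HTML))
```

Because the href starts with a slash, it resolves against the site root rather than the page's own directory, which is worth remembering when deciding where to store the markup file.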

⚠️ Important Reminder

If you create your markup manually, always validate it! A simple syntax error may cause a chunk of your RDF document to be ignored by agents.

Validation Checklist

  • Use an RDF validator (as discussed in Chapter 3)
  • Ensure proper namespace declarations
  • Check that the markup file is publicly accessible
  • Verify the link tag points to the correct location
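A first-pass check can be automated before turning to a full RDF validator. The sketch below only tests XML well-formedness, the root element, and the presence of at least one statement; it is not a substitute for real RDF validation. The function name check_markup and the example.org URI are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def check_markup(rdf_xml):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(rdf_xml)
    except ET.ParseError as error:
        # Covers syntax errors, including prefixes used without a declaration.
        return [f"not well-formed XML: {error}"]
    problems = []
    if root.tag != f"{{{RDF_NS}}}RDF":
        problems.append("root element is not rdf:RDF")
    if len(root) == 0:
        problems.append("document contains no statements")
    return problems


GOOD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/NikonD70Review.html">
    <dc:title>D70 Review By Liyang Yu</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# Same document with the Dublin Core namespace declaration removed.
MISSING_NAMESPACE = GOOD.replace('xmlns:dc="http://purl.org/dc/elements/1.1/"', "")

print(check_markup(GOOD))
print(check_markup(MISSING_NAMESPACE))
```

The second document fails because dc: is used without being declared, which is exactly the kind of small slip that makes an agent discard part of a markup file.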

Markup Using Tools: SMORE

Section 9.3: Easier markup with GUI tools

What is SMORE?

SMORE (Semantic Markup, Ontology, and RDF Editor) is a tool from the University of Maryland that lets you create markup documents without deep OWL knowledge.

  • Step 1: Load Your Ontology

    Use the File menu to load ontologies from your local machine or the Web. There is no limit on how many ontologies you can load. The tool displays all classes in a tree view.

  • Step 2: Load Your Web Page

    Enter the URL in the textbox. The page can be on your local machine or a Web server. SMORE displays both the ontology tree and the Web page side by side.

  • Step 3: Create Individuals

    Click the "New Individual" tab, enter a name, then drag a class from the ontology tree to assign the type. The new instance appears in the instance window.

  • Step 4: Add Properties

    Click the Individuals tab to see available properties for each instance. Fill out the form to set property values. SMORE handles syntax and namespaces automatically.

  • Step 5: Export RDF/XML

    Click the RDF/XML tab to see your markup document and make any final modifications. Because SMORE generates the RDF/XML for you, its syntax is correct; validate again if you then edit it by hand.

Bonus Feature

SMORE can also validate your ontology! If there's an error in your OWL file, SMORE won't open it.
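Why does tool-generated markup avoid syntax errors? Because the document is built from a tree and serialized, never typed by hand. The sketch below shows the idea with Python's standard library; it is not how SMORE itself is implemented, and build_individual and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Ask the serializer to use the familiar prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)


def build_individual(about, class_uri, labels):
    """Serialize one typed, labelled individual as an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    node = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": about})
    ET.SubElement(node, f"{{{RDF}}}type", {f"{{{RDF}}}resource": class_uri})
    for text in labels:
        ET.SubElement(node, f"{{{RDFS}}}label").text = text
    return ET.tostring(root, encoding="unicode")


print(build_individual(
    "http://example.org/review#Nikon-D70",   # placeholder instance URI
    "http://example.org/Camera.owl#SLR",     # placeholder class URI
    ["Nikon-D70", "D-70"],
))
```

Namespace declarations and closing tags are produced by the serializer, so the author only supplies names and values, much like filling in a form.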

Semantic Markup Issues

Section 9.4: Current challenges and open questions

🤔 Who and Why?

Manual markup requires significant effort. Who will mark up the billions of existing pages? Why should page owners learn ontologies and do this work without a killer application to demonstrate the benefit?

🤖 Automatic Markup?

Some research exists on automatic markup, but most techniques only work for technical texts. The Internet's heterogeneous, natural language content remains challenging.

🏢 Centralized vs. Decentralized?

If we can't modify the original pages, we need centralized markup storage. This requires building servers that store markup documents and index them efficiently. Agents would query this database for each page they visit.
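At its simplest, a centralized store is an index from page URL to the markup documents that describe it. The in-memory sketch below is purely illustrative: MarkupRegistry and all URLs are hypothetical, and a real server would need persistence, access control, and a far more scalable index.

```python
class MarkupRegistry:
    """Toy central index: page URL -> URLs of markup documents about it."""

    def __init__(self):
        self._by_page = {}

    def register(self, page_url, markup_url):
        """Record that markup_url describes page_url."""
        self._by_page.setdefault(page_url, []).append(markup_url)

    def lookup(self, page_url):
        """Return known markup documents for a page (empty list if none)."""
        return list(self._by_page.get(page_url, []))


registry = MarkupRegistry()
registry.register("http://example.org/NikonD70Review.html",
                  "http://markup.example.net/nikon-d70.rdf")

# An agent visiting a page asks the registry instead of reading a <link> tag.
print(registry.lookup("http://example.org/NikonD70Review.html"))
print(registry.lookup("http://example.org/unmarked.html"))
```

Note that anyone can register markup for a page they do not own, which is what makes this approach workable when the original pages cannot be modified.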

The Chicken-and-Egg Dilemma

Without a killer Semantic Web application → no motivation for owners to mark up pages.
Without marked-up pages → killer applications cannot be created.

A Path Forward

Potential solutions to the markup challenge

Domain-Specific Approach

Instead of marking up the entire Internet, focus on specific domains (e.g., Bioinformatics) and manually mark up the majority of pages within that domain.

1. Build Centralized Markup Servers

Create dedicated servers that store markup documents and index them efficiently for quick lookups.

2. Create Domain-Specific Applications

Build excellent Semantic Web applications for specific domains to demonstrate the power and value of semantic markup.

3. Drive Organic Adoption

As the community realizes the value, page owners will start marking up their own documents, much as Web sites became essential for business.

"Recall the days when only a few big companies had Web sites. Soon everyone realized that without a Web site, there could be a huge loss of business. Let us hope that such a day will soon arrive for the Semantic Web."

Interactive: Semantic Search Demo

Experience the difference semantic markup makes

🔍 Search for Camera Reviews

Compare traditional keyword search vs. semantic search

Traditional Search

Page contains keywords...

Matches "digital", "SLR", "amateur" as text strings

No semantic understanding

Cannot distinguish camera types or user expertise levels

Semantic Search

Nikon D70 Review

Type: SLR, Target: Amateur Photographer, Price: <$1000

Canon EOS 300D Review

Type: SLR, Target: Entry Level, Price: <$1000
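The contrast can be reproduced in a few lines. In the sketch below, the first page's text never mentions "SLR" even though its markup types the camera as one, while the second page (a made-up accessories page) mentions "SLR" but is not about a camera at all. All page names, text, and markup values are illustrative assumptions.

```python
PAGES = {
    "NikonD70Review.html": {
        "text": "D70 Review By Liyang Yu. 6 megapixel CCD sensor.",
        "markup": {"type": "SLR", "pixel": 6},
    },
    "SLRStrapsForSale.html": {
        "text": "Straps for any digital SLR camera.",
        "markup": {},  # no semantic markup available
    },
}


def keyword_search(pages, word):
    """Match the word as a plain text string, ignoring case."""
    return sorted(url for url, page in pages.items()
                  if word.lower() in page["text"].lower())


def semantic_search(pages, camera_type):
    """Match pages whose markup says they describe the given camera type."""
    return sorted(url for url, page in pages.items()
                  if page["markup"].get("type") == camera_type)


print(keyword_search(PAGES, "SLR"))
print(semantic_search(PAGES, "SLR"))
```

Keyword search returns the straps page and misses the review; semantic search does the opposite, because it matches on what the page is about rather than on which strings it contains.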

💭 Discussion Question

How would semantic markup change the way we search for products, research papers, or job listings? What other domains would benefit most?

Knowledge Check

Test your understanding

Quiz: Semantic Markup

What is the PRIMARY purpose of semantic markup?

Which step comes FIRST in the semantic markup procedure?

Chapter Summary

Key takeaways from Lecture 09

What We Learned

  • Semantic markup bridges the gap between current Web pages and machine-readable ontologies
  • A markup file is an RDF document describing page content using ontology terms
  • The three-step procedure: Choose ontology → Create markup → Link to page
  • Markup can be created manually (using editors) or with tools (like SMORE)
  • Validation is crucial — syntax errors can cause markup to be ignored
  • Major challenges include: motivation, automation, and centralization

The Big Picture

Semantic markup is the practical implementation of adding semantics to the Web. It's the crucial step that transforms our theoretical knowledge of ontologies into real-world machine understanding.

📚 Next Steps

Practice creating markup documents for simple Web pages. Explore tools like SMORE. Consider how markup could enhance Web applications in your domain of interest.
