From Traditional Web to Semantic Web

Chapter 1: Introduction to the Semantic Web

Eng. Dr. Tiroshan Madushanka

Welcome! 👋

Understanding the Semantic Web

What is the Semantic Web?

Although about 290 web pages explain "what is Semantic Web," it is still quite unclear what it is, why we need it, how to build it, and how to use it.

Today's Learning Journey

  • Examine how we use the World Wide Web in our daily life
  • Study how search engines work in traditional Web environments
  • Understand the common difficulties we experience with the Web
  • Introduce the concept of the Semantic Web
  • Learn how added semantics changes search engines
  • Explore the fundamental concept of metadata

🤔 Opening Discussion (5 min)

Question: What frustrations have you experienced when searching for information on the web?

Think about: irrelevant results, too many results, difficulty finding specific information

1.1 What is the World Wide Web? 🌍

Understanding the foundation

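WWW stands for World Wide Web. It is often loosely called "the Internet," although strictly speaking the Web is a service that runs on top of the Internet. It's a magical place where: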

  • Anyone with a server can publish documents for the world to see
  • Documents can be hyperlinked to any other document
  • Platform and location don't matter

The Internet: Nearly Two Decades of Magic ✨

The Web has become the ultimate information source, presenting both intriguing challenges and promising opportunities.

Platform Independence Example:

A page served from a Unix server in Beijing, China can be viewed on a Macintosh machine in Atlanta, GA. You can browse it regardless of the underlying technology!

💭 Think About It (3 min)

How has the web changed your daily life in the past 5 years?

1.1.1 How Are We Using the Internet?

Three main activities

🎯 Three Main Uses

Search, Integration, and Web Mining

  • 🔍 1. Search

    Goal: Locate and access information or resources on the Web

    Examples:

    • Finding recipes for margaritas
    • Locating a real estate agent
    • Looking up technical specifications

    The Problem: Search engines rely on keyword matching, leading to many irrelevant results

  • 🔗 2. Integration

    Definition: Combining and aggregating resources on the Web so they can be collectively useful

    Common Example: Restaurant Search

    1. Search for Indian restaurants
    2. Pick a restaurant and note the address
    3. Open map utility to get directions

    Problem: Too much manual work - wouldn't it be nice to have automation?

  • ⛏️ 3. Web Data Mining

    Definition: The nontrivial extraction of useful information from large distributed datasets

    Professional Example: Air Traffic Control Analysis

    • Gathering historical takeoff rates from multiple airports
    • Analyzing weather impact patterns
    • Data exists but is scattered and mixed with irrelevant information

    Solution: Specialized crawler agents to collect and organize data

    Problem: Highly specialized, case-by-case development required

Complex Integration: Booking an Airline Ticket

  1. Consume the airline's Web service to get flight schedules
  2. Feed the selected flights to the travel agent's service for pricing
  3. Invoke the payment service to book the ticket

This involves three different Web services, but you still have to manually integrate these steps!
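The three steps can be sketched in a few lines of Python. This is only an illustration: the three service functions, the flight numbers, and the prices are hypothetical stand-ins, not real APIs. What it shows is that the glue code connecting the services has to be written by hand, because nothing in the services themselves tells a machine how their outputs and inputs fit together.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    number: str
    origin: str
    destination: str


# Hypothetical stand-ins for the three Web services. Real services would be
# remote calls, each with its own request and response format.
def airline_schedule_service(origin, destination):
    return [
        Flight("XX100", origin, destination),
        Flight("XX200", origin, destination),
    ]


def travel_agent_pricing_service(flight):
    prices = {"XX100": 420.0, "XX200": 385.0}  # made-up prices
    return prices[flight.number]


def payment_service(flight, amount):
    return f"BOOKED {flight.number} for {amount:.2f}"


def book_cheapest(origin, destination):
    # Step 1: get flight schedules from the airline's service.
    flights = airline_schedule_service(origin, destination)
    # Step 2: price each flight through the travel agent's service.
    priced = [(travel_agent_pricing_service(f), f) for f in flights]
    # Step 3: pay for the cheapest option.
    amount, flight = min(priced, key=lambda pair: pair[0])
    return payment_service(flight, amount)


print(book_cheapest("PEK", "ATL"))
```

Every line inside book_cheapest encodes knowledge a human developer supplied: which field of one response feeds which parameter of the next call.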

Interactive Demo: The Search Problem 🔍

Understanding keyword matching limitations

The Classic Example: Searching for "SOAP"

SOAP (Simple Object Access Protocol) is a W3C standard for Web services. But when you search for it...

Try It Yourself!

Search Results: ~128,000,000 results

1. Buy Ivory Soap - Best Detergent Online

Get the best deals on dish soap and laundry detergent...

2. Soap Opera Daily - Latest Episodes

Watch your favorite soap operas online...

3. Handmade Natural Soaps - Organic Products

Shop our collection of artisanal soaps...

4. SOAP Protocol Specification - W3C

Simple Object Access Protocol (SOAP) Version 1.2...

5. Soap Making Tutorial - DIY Guide

Learn how to make your own soap at home...

❌ The Problem

Search engines implement keyword matching. As long as a document contains the keyword, it's included in results.

Result: You get detergents, soap operas, and only after sifting through multiple pages do you find the W3C's SOAP specifications!
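A minimal sketch of keyword matching makes the limitation concrete. It assumes a search engine that does nothing more than a case-insensitive substring test, which is a simplification of what real engines do; the document titles are the five mock results listed above.

```python
# Titles copied from the mock result list for the "SOAP" query.
DOCUMENTS = [
    "Buy Ivory Soap - Best Detergent Online",
    "Soap Opera Daily - Latest Episodes",
    "Handmade Natural Soaps - Organic Products",
    "SOAP Protocol Specification - W3C",
    "Soap Making Tutorial - DIY Guide",
]


def keyword_search(query, documents):
    """Return every document whose text contains the query, ignoring case."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc.lower()]


results = keyword_search("SOAP", DOCUMENTS)
print(len(results), "of", len(DOCUMENTS), "documents match")
```

All five documents match, because the only information the function has is the sequence of characters. Nothing tells it which "soap" is a protocol and which is a detergent.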

💡 Discussion Point (5 min)

Why can't search engines distinguish between different meanings of the same word?

Hint: Think about what information is available to the search engine

1.1.2 What Stops Us From Doing More? 🚧

Identifying the core limitations

❌ Current Problem

  • ✗ Computers present information
  • ✗ Cannot "understand" information
  • ✗ All documents look the same to machines
  • ✗ Cannot make intelligent decisions
  • ✗ Too much manual work

✓ What We Need

  • ✓ Computers understand meaning
  • ✓ Make intelligent decisions
  • ✓ Filter information automatically
  • ✓ Automated integration
  • ✓ Global-scale processing

🎯 The Root Cause

The Internet is constructed such that it is oblivious to actual information content.

Web browsers, servers, and search engines cannot distinguish:

  • Weather forecasts from scientific papers
  • Personal homepages from corporate websites
  • Product descriptions from user reviews

💭 If We Had Magic Powers...

  • Reconstruct the Internet so computers understand information
  • Enable intelligent decisions on our behalf
  • Filter results before presentation
  • Automate integration tasks
  • Make Web mining less expensive

The Common Thread

All three activities (search, integration, web mining) suffer from the same underlying problem:

Documents only contain information for computers to PRESENT them, not to UNDERSTAND them.

1.2 A First Look at the Semantic Web 🚀

The vision for machine-understandable data

Tim Berners-Lee's Vision

"The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation... a web of data that can be processed directly and indirectly by machines."

— Tim Berners-Lee, James Hendler, Ora Lassila

W3C's Definition

"...the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications."

🎯 Key Concepts

  • Machine-Readable Web: Computers can understand, not just display
  • Automation: Intelligent agents can act on our behalf
  • Integration: Seamlessly combine resources
  • Reuse: Data can be repurposed across applications

What is the Semantic Web? - Complete Picture

  • ✓ Current Web is made up of many Web documents (pages)
  • ✓ Documents currently only give machines instructions about presentation
  • ✓ Machines have no idea about the meaning of documents
  • ✓ Every document looks the same to machines
  • ✓ Machines cannot make intelligent decisions
  • ✓ Developers cannot process documents on a global scale
  • Solution: Add extra data to documents to enable understanding
  • ✓ This extra information is called metadata
  • ✓ This modified Web is the Semantic Web
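As a rough illustration of how extra data changes things, the sketch below attaches a "subject" field to a few documents and lets a search function use it. The field name and its values are assumptions made up for this example; they are not taken from any particular standard.

```python
# Each document carries its text plus a piece of metadata describing it.
# The "subject" values are hypothetical labels chosen for illustration.
DOCUMENTS = [
    {"title": "Buy Ivory Soap - Best Detergent Online", "subject": "household products"},
    {"title": "Soap Opera Daily - Latest Episodes", "subject": "television"},
    {"title": "SOAP Protocol Specification - W3C", "subject": "web services"},
]


def search(query, subject, documents):
    """Keyword match on the title, then filter on the metadata field."""
    needle = query.lower()
    return [
        doc["title"]
        for doc in documents
        if needle in doc["title"].lower() and doc["subject"] == subject
    ]


print(search("soap", "web services", DOCUMENTS))
```

The keyword alone matches all three titles; the metadata field is what lets the machine discard the detergent and the soap opera.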

1.3 An Introduction to Metadata 📊

The building block of the Semantic Web

⚠️ Most Important Reason for Metadata

Metadata is STRUCTURED DATA that machines can READ and UNDERSTAND

1.3.1 The Basic Concept

Metadata = "Data about data"

It is data that describes information resources.

Example: Metadata for a Web Document

  • 📝 Title of the document
  • 👤 Author of the document
  • 📅 Date created
  • 🏷️ Subject/keywords
  • 📄 Format (HTML, PDF, etc.)
  • 🌐 Language

Why Metadata Standards?

Without standards, every document would have its own unique metadata structure. An automated agent couldn't process metadata uniformly.

Solution: Metadata Schema - an agreed-on set of criteria for describing data.
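The need for a schema can be sketched with a tiny agent that knows only one field name. The records, the field names, and the author "Alice" are invented for this example.

```python
# Two publishers describing their documents with their own field names.
ad_hoc_records = [
    {"author": "Liyang", "headline": "A joke"},
    {"written_by": "Alice", "name": "A recipe"},
]

# The same descriptions expressed with one agreed-on set of field names.
shared_schema_records = [
    {"creator": "Liyang", "title": "A joke"},
    {"creator": "Alice", "title": "A recipe"},
]


def list_creators(records):
    """An agent that only knows the agreed-on field name 'creator'."""
    return [record.get("creator") for record in records]


print(list_creators(ad_hoc_records))         # [None, None]
print(list_creators(shared_schema_records))  # ['Liyang', 'Alice']
```

With ad hoc field names the agent finds nothing, even though the information is present; with a shared schema the same code works on every record.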

Dublin Core (DC) Metadata Schema

Developed in March 1995, Dublin Core has 15 elements (originally 13), intended as the minimum needed to facilitate discovery of document-like objects.

<html>
  <head>
    <title>A joke written by Liyang</title>
    <meta name="DC.Title" content="a joke written by Liyang">
    <meta name="DC.Creator" content="Liyang">
    <meta name="DC.Type" content="text">
    <meta name="DC.Date" content="2004">
    <meta name="DC.Format" content="text/html">
    <meta name="DC.Identifier" content="http://www.example.com/joke">
  </head>
  <body>
    I decided to make my first son a medical doctor...
  </body>
</html>

Note: These metadata are not displayed by the browser - they're for automated agents!
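To see how an automated agent might read these tags, here is a minimal sketch using the HTMLParser class from Python's standard library. The class name DublinCoreExtractor is made up for this example, and the page text is the Dublin Core example shown above; a real agent would fetch pages over the network and handle many more cases.

```python
from html.parser import HTMLParser

# The Dublin Core example page, embedded as a string.
PAGE = """<html>
<head>
<title>A joke written by Liyang</title>
<meta name="DC.Title" content="a joke written by Liyang">
<meta name="DC.Creator" content="Liyang">
<meta name="DC.Type" content="text">
<meta name="DC.Date" content="2004">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://www.example.com/joke">
</head>
<body>I decided to make my first son a medical doctor...</body>
</html>"""


class DublinCoreExtractor(HTMLParser):
    """Collect <meta> tags whose name starts with 'DC.' into a dictionary."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            self.metadata[name] = attributes.get("content")


extractor = DublinCoreExtractor()
extractor.feed(PAGE)
for name, value in extractor.metadata.items():
    print(name, "=", value)
```

The agent never looks at the body text. It learns the creator, date, and format of the page purely from the structured metadata in the head.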

Summary & Knowledge Check 🎓

Test your understanding

🎯 Key Takeaways

  • The Semantic Web extends the current Web to enable machine processing
  • Three main Web activities: Search, Integration, Web Mining
  • All suffer from the same problem: machines can't understand meaning
  • Solution: Add metadata (structured, machine-readable data)
  • Metadata is the building block for the Semantic Web
  • Standards like Dublin Core ensure uniformity

📝 Quiz Time!

Question 1: What is the main problem with traditional web search engines?

A) They are too slow
B) They don't index enough pages
C) They can only do keyword matching and cannot understand meaning
D) They are expensive to use

Question 2: What does metadata provide?

A) Better graphics for web pages
B) Structured data that machines can read and understand
C) Faster loading times
D) Security features

Question 3: The three main uses of the Internet are:

A) Email, Social Media, Shopping
B) Search, Integration, Web Mining
C) Gaming, Streaming, Browsing
D) Reading, Writing, Computing

🎤 Final Discussion Questions (15 min)

  1. How might the Semantic Web change your daily internet activities?
  2. What challenges do you foresee in implementing the Semantic Web?
  3. Can you think of a specific application where semantic understanding would be crucial?
  4. How can we add metadata to existing web pages that we don't own?

🔜 Next Session Preview

In Chapter 2, we'll dive deep into how search engines work in both traditional and Semantic Web environments, with concrete examples showing the precise benefits of the Semantic Web!
