Understanding the Semantic Web
Although about 290 web pages explain "what is Semantic Web," it is still quite unclear what it is, why we need it, how to build it, and how to use it.
Question: What frustrations have you experienced when searching for information on the web?
Think about: irrelevant results, too many results, difficulty finding specific information
Understanding the foundation
WWW stands for World Wide Web, often loosely called "the Internet" (strictly speaking, the Web is one service that runs on top of the Internet).
The Web has become the ultimate information source, presenting both intriguing challenges and promising opportunities.
Platform Independence Example:
A page served from a Unix server in Beijing, China, can be viewed on a Macintosh machine in Atlanta, GA. You can browse it regardless of the underlying technology!
How has the web changed your daily life in the past 5 years?
Three main activities
Search, Integration, and Web Mining
Goal: Locate and access information or resources on the Web
Examples:
The Problem: Search engines rely on keyword matching, leading to many irrelevant results
Definition: Combining and aggregating resources on the Web so they can be collectively useful
Common Example: Restaurant Search
Problem: Too much manual work - wouldn't it be nice to have automation?
Definition: The nontrivial extraction of useful information from large distributed datasets
Professional Example: Air Traffic Control Analysis
Solution: Specialized crawler agents to collect and organize data
Problem: Highly specialized, case-by-case development required
Integration in practice: a task like the restaurant search involves three different Web services, but you still have to integrate the steps manually!
Understanding keyword matching limitations
SOAP (Simple Object Access Protocol) is a W3C standard for Web services. But when you search for it...
Get the best deals on dish soap and laundry detergent...
Watch your favorite soap operas online...
Shop our collection of artisanal soaps...
Simple Object Access Protocol (SOAP) Version 1.2...
Learn how to make your own soap at home...
Search engines implement keyword matching. As long as a document contains the keyword, it's included in results.
Result: You get detergents, soap operas, and only after sifting through multiple pages do you find the W3C's SOAP specifications!
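The behavior described above can be sketched in a few lines of Python. This is only a toy model, assuming a search engine that does nothing more than case-insensitive substring matching; the function name keyword_search is made up for illustration, and the documents are the five result snippets listed above.

```python
# Toy model of keyword matching. Real search engines are far more
# elaborate; this sketch assumes plain case-insensitive substring tests.

# The five result snippets from the SOAP example above.
DOCUMENTS = [
    "Get the best deals on dish soap and laundry detergent...",
    "Watch your favorite soap operas online...",
    "Shop our collection of artisanal soaps...",
    "Simple Object Access Protocol (SOAP) Version 1.2...",
    "Learn how to make your own soap at home...",
]


def keyword_search(query, documents):
    """Return every document that contains the query as a substring."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc.lower()]


results = keyword_search("SOAP", DOCUMENTS)
for rank, doc in enumerate(results, start=1):
    print(rank, doc)
```

Every one of the five snippets matches, because the only thing the function can check is whether the characters "soap" occur in the text. Nothing in the input tells it which document is about a protocol and which is about detergent.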
Why can't search engines distinguish between different meanings of the same word?
Hint: Think about what information is available to the search engine
Identifying the core limitations
The Internet is constructed such that it is oblivious to actual information content.
Web browsers, servers, and search engines cannot distinguish:
All three activities (search, integration, web mining) suffer from the same underlying problem:
Documents only contain information for computers to PRESENT them, not to UNDERSTAND them.
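To make this concrete, here is a rough sketch of what a program actually receives when it reads a Web page. It uses Python's standard html.parser module; the page fragment and the class name TagAndTextCollector are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this example.
PAGE = (
    "<html><body><h1>SOAP</h1>"
    "<p><b>Version 1.2</b> is available.</p></body></html>"
)


class TagAndTextCollector(HTMLParser):
    """Collect the two things a plain HTML page offers: tags and text."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


collector = TagAndTextCollector()
collector.feed(PAGE)
collector.close()
print("tags:", collector.tags)
print("text:", collector.text)
```

The tags say how to lay the page out (a heading, a paragraph, bold text), and the text is just a sequence of characters. Nothing in either list says whether "SOAP" here is a protocol, a detergent, or a TV genre.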
The vision for machine-understandable data
"The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation... a web of data that can be processed directly and indirectly by machines."
— Tim Berners-Lee, James Hendler, Ora Lassila
"...the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration, and reuse of data across various applications."
The building block of the Semantic Web
Metadata is STRUCTURED DATA that machines can READ and UNDERSTAND
Metadata = "Data about data"
It is data that describes information resources.
Without standards, every document would have its own unique metadata structure. An automated agent couldn't process metadata uniformly.
Solution: Metadata Schema - an agreed-on set of criteria for describing data.
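A minimal sketch of why an agreed-on schema matters to an automated agent. The schema fields, the records, and the helper names missing_fields and conforms are all hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical schema: the field names are assumptions for illustration.
SCHEMA = frozenset({"title", "creator", "subject", "date"})

# Invented records: the first follows the schema, the second uses its
# own private structure.
RECORDS = [
    {
        "title": "Example specification",
        "creator": "Example Org",
        "subject": "Web services",
        "date": "2000-01-01",
    },
    {"name": "Example recipe page", "author": "Someone"},
]


def missing_fields(record, schema=SCHEMA):
    """Return the schema fields the record lacks, in sorted order."""
    return sorted(schema.difference(record))


def conforms(record, schema=SCHEMA):
    """True when the record provides every field the schema requires."""
    return not missing_fields(record, schema)


for record in RECORDS:
    if conforms(record):
        print("ok:", record["title"])
    else:
        print("cannot process, missing:", missing_fields(record))
```

The agent can read the first record's title without guessing, because the schema guarantees the field exists. The second record may carry the same kind of information under "name" and "author", but an agent written against the schema has no way of knowing that.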
The Dublin Core schema, developed in March 1995, has 15 elements (originally 13), the minimum required to facilitate discovery of document-like objects.
Note: These metadata are not displayed by the browser - they're for automated agents!
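As an illustration, the sketch below assumes the 15-element schema described above is the Dublin Core Metadata Element Set and that the metadata is embedded in the page head as meta tags named "DC.<element>", a common convention. The page content and the class name DublinCoreExtractor are invented for this example.

```python
from html.parser import HTMLParser

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical page: the meta tags are invisible in a browser window.
PAGE = """<html><head>
<title>Example page</title>
<meta name="DC.title" content="Example page">
<meta name="DC.creator" content="Jane Doe">
<meta name="DC.date" content="2001-01-01">
</head><body><p>Visible text.</p></body></html>"""


class DublinCoreExtractor(HTMLParser):
    """Collect meta tags whose name has the form DC.<element>."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        prefix, _, element = name.partition(".")
        if prefix.lower() == "dc" and element.lower() in DC_ELEMENTS:
            self.metadata[element.lower()] = attributes.get("content") or ""


extractor = DublinCoreExtractor()
extractor.feed(PAGE)
extractor.close()
for element, value in extractor.metadata.items():
    print(element, "=", value)
```

A browser would render only "Visible text." from this page, while an automated agent can read the title, creator, and date directly, without having to interpret the visible content.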
Test your understanding
Question 1: What is the main problem with traditional web search engines?
Question 2: What does metadata provide?
Question 3: The three main uses of the Internet are:
In Chapter 2, we'll dive deep into how search engines work in both traditional and Semantic Web environments, with concrete examples showing the precise benefits of the Semantic Web!