There have been a number of stories and references to the “Deep Web” in the media over the last two months, including references in Season Two of the Netflix series “House of Cards.” With a renewed interest, I wanted to make sure that I was clear on the different terms associated with the Deep Web. My research prompted me to dig even deeper (pun intended).
The Surface Web
The surface web is the part of the web that is searched by search engines such as Google, Yahoo, and Bing. It is estimated that this surface layer accounts for only 1–5 percent of the entire web, as illustrated in a recently posted infographic from CNN. This surface layer excludes database search results and all corporate and academic sites behind a firewall. Search engines build and search from an index, so if a site is not part of the publicly searchable index, then it is not included in this layer. It is also possible for a website to intentionally opt out of search results by including a robots meta tag with the “noindex” directive in its pages.
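To make the meta tag idea concrete, here is a minimal Python sketch of how a well-behaved crawler might check whether a page asks to be left out of the index. The class name, function name, and sample pages are my own inventions for illustration; real search engines apply many more rules than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute is a comma-separated list of directives.
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def asks_not_to_be_indexed(html: str) -> bool:
    """Return True if the page carries a robots meta tag with 'noindex'."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical sample pages, made up for this example.
HIDDEN_PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>Private</body></html>"
)
PUBLIC_PAGE = "<html><head><title>Hello</title></head><body>Public</body></html>"

print(asks_not_to_be_indexed(HIDDEN_PAGE))  # True
print(asks_not_to_be_indexed(PUBLIC_PAGE))  # False
```

Note that the tag is only a request. It keeps a page out of the index of crawlers that choose to honor it, but it does not protect the page from anyone who has the address.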
The Deep Web
The Deep Web is the layer that lies below the surface. Every time you query an online database, the site generates a new results page on the fly. That new page, however, is not included in the surface layer index because web crawlers do not submit queries the way a person does. A web crawler can only build an index by visiting websites and following their links, as well as the links on other sites that point to them. Other examples of data in the Deep Web are academic journals that are either behind a “for fee” structure or protected by a firewall. All intranet data on corporate networks also resides in the Deep Web layer. Businesses such as Bright Planet provide services that assist you in navigating the Deep Web.
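A small Python sketch may help show why query results stay out of the index. The site map below is entirely hypothetical, and the crawler is deliberately simplistic: it only follows links it finds on pages, which is the behavior described above.

```python
from collections import deque

# Hypothetical site: each page maps to the links that appear on it.
SITE_LINKS = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # has a search form, but no link to any result page
    "/search?q=onions": ["/"],  # generated only when someone submits the form
}


def crawl(start: str, links: dict) -> set:
    """Breadth-first crawl that only follows links found on pages."""
    indexed = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in indexed or page not in links:
            continue
        indexed.add(page)
        queue.extend(links[page])
    return indexed


index = crawl("/", SITE_LINKS)
deep_pages = set(SITE_LINKS) - index

print(sorted(index))       # ['/', '/about', '/search']
print(sorted(deep_pages))  # ['/search?q=onions']
```

The results page exists on the site, but because no link leads to it, the crawler never sees it. In this toy model, that unreached page is the Deep Web.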
The Dark Web
The top two layers can be considered to house legitimate data and transactions; they simply represent information that can be searched and indexed by web crawlers (surface) and information that cannot be seen by automated searchers (deep). Within the Deep Web, however, is an isolated area called the dark web. This is the area where digital tracks are obscured and transactions for goods and services may or may not be legal or legitimate. You can access this part of the web through software such as the Tor Browser, which can be downloaded and allows access to the Tor network. Tor is an acronym that stands for “The Onion Router.” The name refers to the layers of encryption wrapped around each message, which are peeled away one at a time as the message passes through the network, like the layers of an onion. Tor operates by relaying traffic through a network of servers so that the originating address is hidden and the end user remains anonymous. This area may house legitimate anonymous transactions, but it is also home to drug dealing and other illicit trade.
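The onion idea can be illustrated with a toy Python sketch. This is not how Tor is actually implemented: there is no encryption here, and the relay names, the wrap and peel functions, and the packet layout are all assumptions made up for the example. It only shows the layering principle, where each relay removes one layer and learns nothing beyond the next hop.

```python
def wrap(message: str, relays: list, destination: str) -> dict:
    """Wrap the message in one layer per relay (toy model, no real encryption)."""
    # The next hop for each relay is the relay after it; the last one
    # hands the message to the destination.
    hops = relays[1:] + [destination]
    packet = message
    # Build from the inside out so the first relay's layer ends up outermost.
    for next_hop in reversed(hops):
        packet = {"next_hop": next_hop, "inner": packet}
    return packet


def peel(packet: dict):
    """What a single relay does: remove its layer and learn only the next hop."""
    return packet["next_hop"], packet["inner"]


# Hypothetical three-relay route.
packet = wrap("hello", ["relay_a", "relay_b", "relay_c"], "destination")

path = []
while isinstance(packet, dict):
    next_hop, packet = peel(packet)
    path.append(next_hop)

print(path)    # ['relay_b', 'relay_c', 'destination']
print(packet)  # hello
```

In the real network each layer is encrypted for one specific relay, so a relay cannot look inside the layers meant for the others. The first relay sees where the traffic came from but not where it is going, and the last relay sees the opposite.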
I think it is important to understand the terms for the different layers of the web and the purpose of each layer. Could you benefit from a service that mines the larger Deep Web for big data that is not available on the surface web? It is useful to be knowledgeable about all available options so you can provide the best IT service to your customers.
About Kelly Brown
Kelly Brown is an IT professional, adjunct faculty for the University of Oregon, and academic director of the UO Applied Information Management Master’s Degree Program. He writes about IT and business topics that keep him up at night.