Online Clarity

Online Media & Marketing Insight from an Online Marketing Consultant in South Africa

How Do Search Engines Work? Part 2

Continuing from my previous post, I’ll try to explain how search engines work.

The Performing Search Engine

At the end of the day, a search engine is nothing more than a software program designed to sift through the billions of pages recorded in its index, find matches to a search query and rank them in the order it believes is most relevant. Quite a mouthful.

You’re probably wondering how search engines go about determining relevancy, especially when confronted with hundreds of millions of web pages to sort through, right? Quite simply, each search engine has developed a set of rules and mathematical equations, known as an algorithm, which it uses to set the order of its rankings.
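No search engine publishes its algorithm, so here’s a sketch that swaps in TF-IDF, a well-known textbook relevance measure, just to show what “rules and mathematical equations” can look like in practice. The pages and their text are made up, and this is not how Google or any other engine actually ranks pages.

```python
import math
from collections import Counter

# Hypothetical mini-index: page URL -> page text (made-up examples)
PAGES = {
    "/music": "live music entertainment and concerts in cape town",
    "/movies": "movie entertainment news and movie reviews",
    "/finance": "movie profits and box office finance",
}

def tf_idf_score(query: str, page_text: str, all_texts: list[str]) -> float:
    """Sum of the TF-IDF weights of the query words on one page (illustrative only)."""
    words = page_text.split()
    counts = Counter(words)
    n_docs = len(all_texts)
    score = 0.0
    for term in query.lower().split():
        tf = counts[term] / len(words)                       # how often the word appears on this page
        df = sum(1 for t in all_texts if term in t.split())  # how many pages use the word at all
        if df:
            score += tf * math.log(n_docs / df)              # rarer words count for more
    return score

def rank(query: str) -> list[str]:
    """Order every page in the toy index by its score for the query, best first."""
    texts = list(PAGES.values())
    scored = {url: tf_idf_score(query, text, texts) for url, text in PAGES.items()}
    return sorted(scored, key=scored.get, reverse=True)

print(rank("movie reviews"))  # ['/movies', '/finance', '/music']
```

The page that uses “movie” twice and also mentions the rarer word “reviews” comes out on top; real engines combine many more signals than this.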

Exactly how a particular search engine's algorithm works is a closely kept secret, but some general rules are well known, and website owners often apply them to improve their site's ranking. This practice is referred to as search engine optimisation.

In a nutshell, search engines use on-page and off-page copy to group related pages into vertical themes. If we take pages relating to the entertainment industry, these themes or groups could be music entertainment, movie entertainment, movie star entertainment, etc. Each theme has common words and phrases that best describe the pages the group contains. Some pages may belong to more than one group. For instance, a page about movie profits could belong to both the financial and the entertainment groups.
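To make the idea of themes a little more concrete, here is a toy sketch that assigns a page to every theme whose vocabulary it shares enough words with. The theme names, word lists and the `min_overlap` threshold are all hypothetical; search engines don’t publish how they group pages, so treat this as an illustration of the idea rather than how any engine actually does it.

```python
# Hypothetical theme vocabularies; real engines do not publish theirs.
THEMES = {
    "music entertainment": {"music", "album", "concert", "band"},
    "movie entertainment": {"movie", "film", "cinema", "box", "office"},
    "finance": {"profit", "profits", "revenue", "earnings", "finance"},
}

def assign_themes(page_text: str, min_overlap: int = 2) -> list[str]:
    """Return every theme that shares at least `min_overlap` words with the page."""
    words = set(page_text.lower().split())
    return [theme for theme, vocab in THEMES.items() if len(words & vocab) >= min_overlap]

page = "Movie profits soar as box office revenue beats film forecasts"
print(assign_themes(page))  # ['movie entertainment', 'finance']
```

Just as with the movie-profits example in the text, the sample page lands in both an entertainment group and a financial group.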

The SERP (or Search Engine Results Page)

After applying its algorithm to its index of sites, a search engine comes up with a list of the most relevant results for the search conducted.

To simplify an otherwise complex process (and believe me, it’s complex), here is what happens when a user enters a search query. The search engine analyses the query and searches its index for the web pages it considers relevant. Once it has a shortlist of relevant pages, it calculates the order in which to present them to the user, based on further algorithmic factors. These could include the user's location and possibly even their search history.

Each search engine's algorithm is different, which is why different search engines may produce different results for the same query. Each search engine has its niche, and it is not uncommon for users to use more than one. This is why it is important for website owners to be indexed and ranked well on all the search engines, rather than concentrating on just one, such as Google.
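Here’s a rough sketch of the two steps described above: a shortlist of pages containing every query word, then an ordering based on a weighted score with a small boost for pages aimed at the user’s region. Every page, score and weight is invented, search history is left out for brevity, and the two weight settings simply stand in for two engines’ different algorithms.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    words: set[str]
    relevance: float   # hypothetical precomputed content score
    popularity: float  # hypothetical link-based score
    region: str        # where the page's audience is

# Made-up pages; the .example domains are placeholders.
PAGES = [
    Page("a.example/jhb-plumbers", {"plumber", "johannesburg"}, 0.9, 0.2, "ZA"),
    Page("b.example/plumbers-uk", {"plumber", "london"}, 0.6, 0.9, "UK"),
    Page("c.example/plumbing-guide", {"plumber", "guide"}, 0.5, 0.5, "ZA"),
    Page("d.example/recipes", {"recipe", "bread"}, 0.8, 0.8, "ZA"),
]

def shortlist(query: str) -> list[Page]:
    """Step 1: keep only the pages that contain every query word."""
    terms = set(query.lower().split())
    return [p for p in PAGES if terms <= p.words]

def serp(query: str, user_region: str, w_relevance: float,
         w_popularity: float, w_local: float) -> list[str]:
    """Step 2: order the shortlist using one engine's (made-up) weights."""
    def score(p: Page) -> float:
        local_boost = w_local if p.region == user_region else 0.0
        return w_relevance * p.relevance + w_popularity * p.popularity + local_boost
    return [p.url for p in sorted(shortlist(query), key=score, reverse=True)]

# Two hypothetical "engines" answering the same query for a South African user
print(serp("plumber", "ZA", w_relevance=1.0, w_popularity=0.2, w_local=0.3))
# ['a.example/jhb-plumbers', 'c.example/plumbing-guide', 'b.example/plumbers-uk']
print(serp("plumber", "ZA", w_relevance=0.3, w_popularity=1.0, w_local=0.0))
# ['b.example/plumbers-uk', 'c.example/plumbing-guide', 'a.example/jhb-plumbers']
```

Same index, same query, different weights: the order flips, which is the point about different engines producing different results.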

Conclusion

The aim of a search engine is to deliver appropriate, relevant, information-rich sites that will satisfy users first time round. It’s a very exciting challenge, and one that keeps producing new developments from the search engines. All you need to worry about, though, is making your site as informative, engaging, accessible and usable as possible. The rest will happen naturally.

16/01/2006 in SEO/SEM

How Do Search Engines Work? Part 1

I’m breaking this post into two. My aim is to shed some light on any uncertainty you may have about search engines, and to do so I’d like to keep it simple. After all, if you want to benefit from being listed on search engines, you’d better understand, in the simplest terms possible, how they work.

Think of the Number Three

Crawler-based search engines are made up of three major elements: the spider, the index, and the software. Each has its own function and together they produce what we have come to trust (or distrust) on the SERPs (Search Engine Results Pages).

The Crawling Spider

Also known as a web crawler or robot, a search engine spider is an automated program that reads web pages and follows any links (preferably text-based ones) to other pages within the site. This is often referred to as a site being "spidered" or "crawled". Three of the most active spiders on the Net are Googlebot (Google), Slurp (Yahoo!) and MSNBot (MSN Search).

Spiders start their journeys with a list of page URLs that have previously been added to their index (database). As the spider visits these pages, crawling the code and content, it adds any new pages (links) it finds to its index. In this sense, a spider feeds an evolving index, which is discussed below.
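The crawl-and-feed loop described above can be sketched as a breadth-first walk. To keep it self-contained, a hypothetical dictionary `WEB` stands in for actually downloading pages and extracting their links, and the www.example.co.za URLs are made up. Real spiders also deal with politeness rules, errors, duplicate content and much more.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "web": URL -> links found on that page (stands in for fetching HTML)
WEB = {
    "http://www.example.co.za/": ["/about", "/services", "http://other.example/"],
    "http://www.example.co.za/about": ["/", "/contact"],
    "http://www.example.co.za/services": ["/about"],
    "http://www.example.co.za/contact": [],
    "http://other.example/": [],
}

def crawl(seed_urls: list[str], index: set[str]) -> set[str]:
    """Start from known URLs, follow links within the same site, add each page reached to the index."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    while queue:
        url = queue.popleft()
        index.add(url)                   # the spider "feeds" the index
        for href in WEB.get(url, []):
            link = urljoin(url, href)    # resolve relative links such as "/about"
            same_site = urlparse(link).netloc == urlparse(url).netloc
            if same_site and link not in seen:
                seen.add(link)
                queue.append(link)
    return index

known = {"http://www.example.co.za/"}   # pages already in the index from earlier visits
crawl(list(known), known)
print(sorted(known))
```

Starting from the single known home page, the spider discovers the about, services and contact pages and adds them to the index, while skipping the link that points to another site.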

Search engine spiders return to the sites in their index on a regular basis, scanning for any changes. How often a spider returns is up to each search engine to decide, though website owners do have some control over which pages a spider visits (and, for some engines, how often) by placing a robots.txt file in the root of their site. Well-behaved spiders look for this file before crawling the site any further. So if, for instance, you didn’t want a page on your site to be crawled and listed on the search engines, you would add a rule for it to your robots.txt file.
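Python’s standard library happens to ship a robots.txt parser, `urllib.robotparser`, so here’s a sketch of how a spider might check those rules before crawling. The robots.txt content and the www.example.co.za URLs are made up for illustration, and `Crawl-delay` is a non-standard directive that not every search engine honours.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: keep all spiders out of /private/, and ask Slurp to wait 10 seconds between requests
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "http://www.example.co.za/private/draft.html"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.co.za/about.html"))          # True
print(parser.crawl_delay("Slurp"))                                                   # 10
```

In a live spider you would point `set_url()` at the site’s real /robots.txt and call `read()` instead of parsing a string.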

The Growing Index

An index is like a giant catalogue or inventory of websites containing a copy of every web page and file that the spider finds. If a web page changes, this catalogue is updated with the new information. To give you an idea of the size of these indexes, the latest figure released by Google is over 8 billion pages.
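One common way to organise such a catalogue (an assumption here, not something the search engines have spelled out) is to keep a copy of each page alongside a word-to-pages lookup table, often called an inverted index. The hypothetical class below updates both whenever a page changes; the class name and sample page are made up.

```python
from collections import defaultdict

class ToyIndex:
    """Hypothetical catalogue: a stored copy of each page plus a word -> URLs lookup table."""

    def __init__(self):
        self.copies = {}                  # URL -> stored copy of the page text
        self.postings = defaultdict(set)  # word -> URLs whose stored copy contains that word

    def update(self, url: str, text: str) -> bool:
        """Store a new or changed page; return False if the copy is unchanged."""
        old = self.copies.get(url)
        if old == text:
            return False
        if old is not None:
            for word in set(old.lower().split()):
                self.postings[word].discard(url)  # forget words from the outdated copy
        self.copies[url] = text
        for word in set(text.lower().split()):
            self.postings[word].add(url)
        return True

    def lookup(self, word: str) -> list[str]:
        return sorted(self.postings.get(word.lower(), set()))

idx = ToyIndex()
idx.update("/specials", "winter specials on laptops")
idx.update("/specials", "summer specials on laptops")  # the page changed, so the catalogue is updated
print(idx.lookup("winter"), idx.lookup("summer"))       # [] ['/specials']
```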

It sometimes takes a while for new pages, or changes that the spider finds, to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed". Until a spidered page has been indexed (added to the index), it will not be available to people searching with that search engine.
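The gap between “spidered” and “indexed” can be pictured as a waiting queue. In this hypothetical sketch, pages the spider finds sit in `pending` and only become searchable once `process_pending()` has run; the class and page names are made up, and real engines process their queues on their own schedules.

```python
from collections import deque

class ToySearchEngine:
    """Hypothetical pipeline: spidered pages wait in a queue until they are indexed."""

    def __init__(self):
        self.pending = deque()  # spidered, but not yet indexed
        self.index = {}         # indexed (searchable) pages: URL -> text

    def spider(self, url: str, text: str) -> None:
        self.pending.append((url, text))  # found by the spider, not searchable yet

    def process_pending(self) -> None:
        while self.pending:
            url, text = self.pending.popleft()
            self.index[url] = text        # now the page is in the index

    def search(self, word: str) -> list[str]:
        word = word.lower()
        return sorted(url for url, text in self.index.items() if word in text.lower().split())

engine = ToySearchEngine()
engine.spider("/podcasting", "can your business make use of podcasting")
print(engine.search("podcasting"))  # []  (spidered but not yet indexed)
engine.process_pending()
print(engine.search("podcasting"))  # ['/podcasting']
```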

16/01/2006 in SEO/SEM
