How do search engines work?


The term "search engine" is often used generically to describe both
crawler-based search engines and human-powered directories. These two types of
search engines gather their listings in radically different ways.

Crawler-Based Search Engines
Crawler-based search engines, such as Google, create their listings
automatically. They "crawl" or "spider" the web, then people search through what
they have found.

If you change your web pages, crawler-based search engines eventually find
these changes, and that can affect how you are listed. Page titles, body copy
and other elements all play a role.

Human-Powered Directories
A human-powered directory, such as the Open Directory, depends on humans for
its listings. You submit a short description to the directory for your entire
site, or editors write one for sites they review. A search looks for matches
only in the descriptions submitted.

Changing your web pages has no effect on your listing. Things that are
useful for improving a listing with a search engine have nothing to do with
improving a listing in a directory. The only exception is that a good site, with
good content, might be more likely to get reviewed for free than a poor site.

"Hybrid Search Engines" Or Mixed Results
In the web's early days, a search engine typically presented either crawler-based results or human-powered listings. Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over the other. For example, MSN Search is more likely to present human-powered listings from LookSmart. However, it also presents crawler-based results (as provided by Inktomi), especially for more obscure queries.

The Parts Of A Crawler-Based Search Engine
Crawler-based search engines have three major elements. First is the spider,
also called the crawler. The spider visits a web page, reads it, and then
follows links to other pages within the site. This is what it means when someone
refers to a site being "spidered" or "crawled." The spider returns to the site
on a regular basis, such as every month or two, to look for changes.
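
To make the idea concrete, a spider can be sketched in a few lines of Python. This is only a toy that assumes the requests and BeautifulSoup libraries are available; it is not how any real engine's crawler works:

# A toy spider: fetch a page, collect its links, and follow the ones that
# stay within the same site. Real spiders add politeness delays,
# robots.txt handling, and revisit schedules (e.g. every month or two).
from urllib.parse import urljoin, urlparse
import requests                    # assumed to be installed
from bs4 import BeautifulSoup      # assumed to be installed

def crawl(start_url, max_pages=10):
    site = urlparse(start_url).netloc
    seen, queue = set(), [start_url]
    pages = {}                     # url -> raw HTML
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=5).text
        except requests.RequestException:
            continue
        pages[url] = html
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, a["href"])
            if urlparse(link).netloc == site:   # stay within the site
                queue.append(link)
    return pages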

Everything the spider finds goes into the second part of the search engine,
the index. The index, sometimes called the catalog, is like a giant book
containing a copy of every web page that the spider finds. If a web page
changes, then this book is updated with new information.

Sometimes it can take a while for new pages or changes that the spider finds
to be added to the index. Thus, a web page may have been "spidered" but not yet
"indexed." Until it is indexed -- added to the index -- it is not available to
those searching with the search engine.
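
As a rough sketch of this "giant book," the toy spider's output could be fed into a simple inverted index that maps each word to the pages containing it. The tokenization here is an assumption and is far cruder than what real engines do:

# A toy index (catalog): maps each word to the set of URLs that contain it.
import re
from collections import defaultdict

def build_index(pages):            # pages: dict of url -> page text
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

A page that has been fetched by the spider but not yet run through build_index() is "spidered" but not "indexed" in the sense above, and searches cannot find it.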

Search engine software is the third part of a search engine. This is the
program that sifts through the millions of pages recorded in the index to find
matches to a search and rank them in order of what it believes is most relevant.
You can learn more about how search engine software ranks web pages in the aptly-named How Search Engines Rank Web Pages section below.
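
Continuing the toy sketch, the search engine software could be pictured as a function that looks up each query word in the index and returns the matching pages in ranked order. The placeholder score below is an assumption; the actual ranking rules are the subject of the How Search Engines Rank Web Pages section:

# Toy "search engine software": keep only pages containing every query
# word, then rank them with a (placeholder) relevance score.
def search(query, index, score=lambda url: 0.0):
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for w in words[1:]:
        results &= index.get(w, set())
    return sorted(results, key=score, reverse=True)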

Major Search Engines: The Same, But Different
All crawler-based search engines have the basic parts described above, but
there are differences in how these parts are tuned. That is why the same search
on different search engines often produces different results. Some of the
significant differences between the major crawler-based search engines are
summarized on the Search Engine Features Page. Information on this page has been
drawn from the help pages of each search engine, along with knowledge gained
from articles, reviews, books, independent research, tips from others and
additional information received directly from the various search engines.

Search Engines vs. Directories

Search Engines: Search engines create listings automatically by crawling a URL (uniform resource locator) and compiling information about that web site into a database. When "searching" one of these databases, results are presented with emphasis on certain criteria. The methodology of this search and delivery is known as an algorithm.

If you change one of your web pages, search engines eventually find those
changes, which can affect how you are listed.

Directories: Directories such as Yahoo! are maintained by humans who review inclusion requests for URLs. The human editors then sort these web sites into the appropriate categories.

You submit a short description to the directory for your entire site, or
editors write one for sites they review. A search looks for matches only in the
descriptions submitted.

How Search Engines Rank Web Pages

Search for anything using your favorite crawler-based search engine. Nearly
instantly, the search engine will sort through the millions of pages it knows
about and present you with ones that match your topic. The matches will even be
ranked, so that the most relevant ones come first.

Of course, the search engines don't always get it right. Non-relevant pages
make it through, and sometimes it may take a little more digging to find what
you are looking for. But, by and large, search engines do an amazing job.

As WebCrawler founder Brian Pinkerton puts it, "Imagine walking up to a librarian and saying, 'travel.' They're going to look at you with a blank face."

OK -- a librarian's not really going to stare at you with a vacant
expression. Instead, they're going to ask you questions to better understand
what you are looking for.

Unfortunately, search engines don't have the ability to ask a few questions
to focus your search, as a librarian can. They also can't rely on judgment and
past experience to rank web pages, in the way humans can.

So, how do crawler-based search engines go about determining relevancy, when
confronted with hundreds of millions of web pages to sort through? They follow a
set of rules, known as an algorithm. Exactly how a particular search engine's
algorithm works is a closely-kept trade secret. However, all major search
engines follow the general rules below.

Location, Location, Location...and Frequency

One of the main rules in a ranking algorithm involves the location and frequency of keywords on a web page. Call it the location/frequency method, for short.

Remember the librarian mentioned above? They need to find books to match
your request of "travel," so it makes sense that they first look at books with
travel in the title. Search engines operate the same way. Pages with the search
terms appearing in the HTML title tag are often assumed to be more relevant than
others to the topic.

Search engines will also check to see if the search keywords appear near the
top of a web page, such as in the headline or in the first few paragraphs of
text. They assume that any page relevant to the topic will mention those words
right from the beginning.

Frequency is the other major factor in how search engines determine
relevancy. A search engine will analyze how often keywords appear in relation to
other words in a web page. Those with a higher frequency are often deemed more
relevant than other web pages.
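
A crude sketch of the location/frequency method, continuing the toy Python above, might score a single page for a single query term as follows. The weights and the "first 50 words" cutoff are invented purely for illustration; every engine tunes such numbers differently, and secretly:

# Toy location/frequency score for one query term on one page.
# The weights (3.0, 1.5) and the 50-word cutoff are made up.
import re

def location_frequency_score(term, title, body):
    term = term.lower()
    title_words = re.findall(r"[a-z0-9]+", title.lower())
    body_words = re.findall(r"[a-z0-9]+", body.lower())
    score = 0.0
    if term in title_words:
        score += 3.0                       # term in the HTML title tag
    if term in body_words[:50]:
        score += 1.5                       # term near the top of the page
    if body_words:
        score += body_words.count(term) / len(body_words)   # relative frequency
    return score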

Spice In The Recipe

Now it's time to qualify the location/frequency method described above. All
the major search engines follow it to some degree, in the same way cooks may
follow a standard chili recipe. But cooks like to add their own secret
ingredients. In the same way, search engines add spice to the location/frequency
method. Nobody does it exactly the same, which is one reason why the same search
on different search engines produces different results.

To begin with, some search engines index more web pages than others. Some
search engines also index web pages more often than others. The result is that
no search engine has the exact same collection of web pages to search through.
That naturally produces differences, when comparing their results.

Search engines may also penalize pages or exclude them from the index, if
they detect search engine "spamming." An example is when a word is repeated
hundreds of times on a page, to increase the frequency and propel the page
higher in the listings. Search engines watch for common spamming methods in a
variety of ways, including following up on complaints from their users.
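
A very simple check for the keyword-stuffing example above could look like the sketch below. The 5% share and 100-word minimum are arbitrary assumptions, not thresholds used by any particular engine:

# Flag pages where one word makes up an implausibly large share of the text.
import re
from collections import Counter

def looks_stuffed(body, threshold=0.05):
    words = re.findall(r"[a-z0-9]+", body.lower())
    if len(words) < 100:                 # too short to judge
        return False
    top_count = Counter(words).most_common(1)[0][1]
    return top_count / len(words) > threshold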

Off The Page Factors

Crawler-based search engines have plenty of experience now with webmasters
who constantly rewrite their web pages in an attempt to gain better rankings.
Some sophisticated webmasters may even go to great lengths to "reverse engineer"
the location/frequency systems used by a particular search engine. Because of
this, all major search engines now also make use of "off the page" ranking
criteria.

Off the page factors are those that webmasters cannot easily influence.
Chief among these is link analysis. By analyzing how pages link to each other, a
search engine can both determine what a page is about and whether that page is
deemed to be "important" and thus deserving of a ranking boost. In addition,
sophisticated techniques are used to screen out attempts by webmasters to build
"artificial" links designed to boost their rankings.

Another off the page factor is clickthrough measurement. In short, this
means that a search engine may watch what results someone selects for a
particular search, then eventually drop high-ranking pages that aren't
attracting clicks, while promoting lower-ranking pages that do pull in visitors.
As with link analysis, systems are used to compensate for artificial clicks generated by eager webmasters.
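
Clickthrough measurement could be sketched as a small adjustment applied on top of whatever base ranking an engine uses. The 0.5 weight below is invented for illustration, and real systems would also filter out the artificial clicks mentioned above:

# Toy clickthrough adjustment: boost pages that searchers actually click
# when shown, relative to how often they are shown in the results.
def adjust_for_clicks(base_scores, impressions, clicks, weight=0.5):
    adjusted = {}
    for url, score in base_scores.items():
        shown = impressions.get(url, 0)
        ctr = clicks.get(url, 0) / shown if shown else 0.0
        adjusted[url] = score * (1 + weight * ctr)
    return adjusted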

Referenced By Danny Sullivan, Internet Consultant & Editor for
SearchEngineWatch.com

Source: http://www.submitawebsite.com/blog/how-does-search-work.html
