Search Engine Directories And How They Work

A web search engine is a software system designed to search for information on the World Wide Web.

The search results are generally presented as a list of results, often referred to as search engine results pages (SERPs).

The information may be a mix of web pages, images, and other types of files.

Some search engines also mine data available in databases or open directories.

Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.

Web search engines get their information by web crawling from site to site.

The "spider" checks for the standard filename robots.txt, addressed to it, before crawling a site; pages are then sent back to be indexed based on many factors, such as the titles, headings, and page content, as evidenced by the standard HTML markup of the informational content, or by its metadata in HTML meta tags.
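
As a rough sketch of that robots.txt check, Python's standard library ships a parser for the file; the crawler name and URLs below are placeholders, not those of any real engine.

    from urllib.robotparser import RobotFileParser

    # Hypothetical crawler name and target site, for illustration only.
    USER_AGENT = "ExampleSpider"
    SITE = "https://example.com"

    robots = RobotFileParser()
    robots.set_url(SITE + "/robots.txt")
    robots.read()  # fetch and parse the site's robots.txt

    page = SITE + "/some-page.html"
    if robots.can_fetch(USER_AGENT, page):
        print("Allowed to crawl:", page)
    else:
        print("robots.txt disallows:", page)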

Indexing means associating words and other definable tokens found on web pages with their domain names and HTML-based fields.

The associations are made in a public database, made available for web search queries. A query from a user can be a single word. The index helps find information relating to the query as quickly as possible.
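
A minimal sketch of such an index, assuming a toy corpus and a deliberately naive tokenizer (real engines do far more normalization), is a mapping from each token to the set of pages containing it:

    import re
    from collections import defaultdict

    # Toy corpus standing in for crawled pages; the URLs are invented.
    pages = {
        "https://example.com/a": "Search engines crawl and index the web.",
        "https://example.com/b": "A web directory is maintained by human editors.",
    }

    def tokenize(text):
        # Lowercase and split into word tokens; a stand-in for real tokenization.
        return re.findall(r"\w+", text.lower())

    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)

    print(sorted(index["web"]))  # both pages contain the token "web"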

Some of the techniques for indexing and caching are trade secrets, whereas web crawling is a straightforward process of visiting all sites on a systematic basis.

Between visits by the spider, the cached version of a page (some or all of the content needed to render it) stored in the search engine's working memory is quickly sent to an inquirer.

If a visit is overdue, the search engine can just act as a web proxy instead. In this case the page may differ from the search terms indexed.

The cached page holds the appearance of the version whose words were indexed, so a cached version of a page can still be useful when the actual page has been lost.
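
A crude sketch of that cache-or-proxy behaviour, with an invented freshness window and a placeholder fetch function, might look like this:

    import time

    CACHE_TTL = 7 * 24 * 3600  # assumed freshness window: one week
    cache = {}  # url -> (fetch_time, html)

    def fetch_live(url):
        # Placeholder for a real HTTP fetch (e.g. urllib.request.urlopen).
        return "<html>live copy of " + url + "</html>"

    def get_page(url):
        entry = cache.get(url)
        if entry and time.time() - entry[0] < CACHE_TTL:
            return entry[1]  # the spider's visit is recent: serve the cached copy
        html = fetch_live(url)  # visit is overdue: act as a web proxy instead
        cache[url] = (time.time(), html)
        return html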

Typically when a user enters a query into a search engine it is a few keywords. The index already has the names of the sites containing the keywords, and these are instantly obtained from the index.

The real processing load is in generating the web pages that are the search results list: every page in the entire list must be weighted according to information in the indexes.
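
As a toy version of that weighting step, one might score each candidate page by plain term frequency; the scoring rule and the numbers below are assumptions for illustration, since real engines combine many more signals:

    from collections import Counter

    # Per-page term frequencies; in a real engine these come from the index.
    term_counts = {
        "https://example.com/a": Counter({"web": 2, "search": 3}),
        "https://example.com/b": Counter({"web": 1, "directory": 4}),
    }

    def score(url, query_terms):
        # Assumed rule: sum the frequencies of the query words on the page.
        return sum(term_counts[url][t] for t in query_terms)

    def rank(candidates, query_terms):
        return sorted(candidates, key=lambda u: score(u, query_terms), reverse=True)

    print(rank(list(term_counts), ["web", "search"]))  # page /a ranks first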

Then the top search result item requires the lookup, reconstruction, and markup of the snippets showing the context of the keywords matched.

These are only part of the processing each search results web page requires, and further pages (beyond the top one) require more of this post-processing.
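
The snippet step mentioned above can be sketched, assuming the cached text of the page is at hand, by slicing a window of context around the first matched keyword:

    def make_snippet(text, keyword, radius=40):
        # Locate the keyword and return the surrounding context, ellipsized.
        pos = text.lower().find(keyword.lower())
        if pos == -1:
            return text[: 2 * radius] + "..."
        start = max(0, pos - radius)
        end = min(len(text), pos + len(keyword) + radius)
        return "..." + text[start:end] + "..."

    cached = "Search engines crawl the web, index pages, and build result snippets."
    print(make_snippet(cached, "index"))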

Submit the URLs for your websites about once a month. Manually submitting your site to the major search engines will cause it to be re-indexed so that new text and information are recognized and cataloged.

Although most major search engines have ‘robots’ or ‘spiders’ that roam the web cataloging sites, the frequency with which they visit your site depends on many factors: how established it is, how often it is updated, how many pertinent incoming links you have, and so on.

It is important to note that there are many search utilities on the web that are ‘powered’ by one of the major search engines, and are therefore essentially extensions of those engines.

This means that if you are registered with the main engine, you will likely also be included in the additional directories it powers.
