About: Notes about list of search engines (selection & classification criteria)

Multi-Level Engines (scan submitted page & pages several levels down)
    Simple search | Advanced search | Submit site

Double-Level Engines (scan submitted page & pages one level down)

Single-Level Engines (scan only the page that has been submitted)
    Scrub the Web: Simple search | Help | Submit site | Help

Russian Search Engines (provide support for Russian and English)
    Yandex: Simple search | Help | Advanced search | Help | Submit site | Help
It's obvious that the web as a resource (of just about anything) can only be as good as its best search tools. In order to find something you should either know the way to it, or have some tool which will permit you to find a "needle in a haystack" either by name or by some description of its properties. Search engines are exactly this sort of tool and, as a gateway to all other web resources, are one of the most valuable resources themselves. Unlike catalogs, which are organized as a hierarchy of categories, search engines organize their data in a way that relates a resource's properties to its location on the web. I think it isn't an exaggeration to say that it is search engines that make the web universally useful: if you can't find it, of what use is the fact that it's out there?
A search engine is basically a database of web content equipped with two mechanisms: a site-scanning robot (which moves through a web site collecting information about its pages and associating it with the pages' locations on the web, i.e. their URLs) and a search tool which permits the searcher to formulate a search request and get back a list of relevant URLs with short descriptions of the corresponding pages. The ability of a search engine to collect data is its most important characteristic. Search sophistication and interface convenience come second and third in priority: however brilliant they might be, you simply won't be able to find what was never placed into the search engine's database.
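The two mechanisms described above, and the depth-of-scan idea behind the engine categories in the list, can be sketched in a few lines of Python. The "site" here is a made-up in-memory dict (URL mapped to page text and outgoing links) standing in for real HTTP fetches, and the whole thing is an illustrative assumption, not any real engine's implementation:

```python
# Minimal sketch of a search engine's two mechanisms: a scanning
# "robot" that walks pages down to a depth limit, and a search tool
# that maps query words back to URLs. SITE is a hypothetical
# in-memory stand-in for real web pages.
from collections import defaultdict

SITE = {
    "/index.html": ("welcome to the davar web site", ["/a.html"]),
    "/a.html": ("notes about search engines", ["/b.html"]),
    "/b.html": ("deep page about catalogs", []),
}

def scan(start, max_depth):
    """Robot: index page text, following links at most max_depth levels down."""
    index = defaultdict(set)              # word -> set of URLs containing it
    frontier, seen = [(start, 0)], set()
    while frontier:
        url, depth = frontier.pop()
        if url in seen or url not in SITE:
            continue
        seen.add(url)
        text, links = SITE[url]
        for word in text.split():
            index[word].add(url)
        if depth < max_depth:             # a "single-level" engine uses max_depth=0
            frontier.extend((link, depth + 1) for link in links)
    return index

def search(index, query):
    """Search tool: URLs whose pages contain every word of the query."""
    sets = [index.get(w, set()) for w in query.split()]
    return sorted(set.intersection(*sets)) if sets else []

idx = scan("/index.html", max_depth=1)    # behaves like a "double-level" engine
print(search(idx, "search engines"))      # -> ['/a.html']
print(search(idx, "catalogs"))            # -> []  (/b.html is two levels down)
```

With `max_depth=1` the page two levels down is invisible to the searcher no matter how clever the query, which is exactly why scan depth was the first sorting criterion for the list.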
The task of finding the best search engines is greatly complicated by the fact that search engines and catalogs almost always look very similar: search engines try to complement their databases with category catalogs, while catalogs offer search capabilities (quite often using some external search engine). It is important, however, to realize the difference, and this involves reading a lot of fine print that is often quite artfully arranged to cloud the issue. And the difference is essential: while search engines try to cover the entire ocean of web content (always with a certain degree of success), catalogs are inherently selective, and this selection has already been made for you by somebody else.
(No offense to catalogs is intended here; they often prove quite useful when you can formulate what you are looking for, and the way to it, in terms of the hierarchy of categories. Unfortunately, most often this is not the case: you might have quite a vague idea of what you are looking for, and no idea at all how to get to it through the tree of possible categories. In such a case the search engine is practically the only tool for finding something on the web. Another problem with catalogs is the site submission process: it is manual and tedious, submissions get reviewed by the catalog staff, and as a result catalogs carry only a small fraction of the information available on the web.)
Once search engines are separated from catalogs that merely have a search capability, it's desirable to find out the differences between search engines themselves: any serious search takes time and effort, and it is definitely good to know which search engines will provide the best return. It appears that the only reliable way to tell the real differences between search engines is to conduct a series of experiments: submitting a web site of at least average size. This tests the most important search engine features, its ability to collect and hold data.
In line with everything said above, the search engines presented here were carefully selected by separating them from catalogs with a search capability and by conducting site submission experiments to find out how deep (and consequently how wide) each scanning robot goes through the hierarchy of a submitted site as it scans it, and how long the engine's database holds the retrieved data about the scanned site.
The second step after sorting search engines from catalogs was to separate the genuine ones from those that simply link to them as part of their advertising business (some search engines have several "face plates" of their own; this doesn't count, I simply chose one of them). I did my own research and looked for the opinions of others on the web (for one example, have a look at the Scrub the Web list; their comments make a lot of sense to me, though their list still lacks precision). This research narrowed the potential list of search engines down to a manageable size.
And finally, I conducted submission and search experiments using my own web site (maximum depth 4 levels, several hundred pages). I submitted the entry page only and then returned to search after about two months. I searched by URL where possible, or by the unique keywords "Davar Web Site", which are embedded in every page of my site (both in the Keywords meta tag and in the text) specifically for the purpose of easy page identification in a search engine's database.
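A small check in the spirit of this experiment, verifying that a marker phrase really is embedded both in the Keywords meta tag and in the visible text of a page before the page is submitted, can be sketched with the Python standard library. The HTML below is a made-up stand-in for a real page, and the parsing approach is an illustrative assumption:

```python
# Sketch: confirm a unique marker phrase appears both in the
# <meta name="keywords"> content and in the page's text, so it can
# later be found in a search engine's database.
from html.parser import HTMLParser

class MarkerCheck(HTMLParser):
    def __init__(self, marker):
        super().__init__()
        self.marker = marker.lower()
        self.keywords_ok = False    # marker seen in Keywords meta tag
        self.text_ok = False        # marker seen in page text

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() == "keywords" and \
               self.marker in (a.get("content") or "").lower():
                self.keywords_ok = True

    def handle_data(self, data):
        if self.marker in data.lower():
            self.text_ok = True

def has_marker(html, marker):
    parser = MarkerCheck(marker)
    parser.feed(html)
    return parser.keywords_ok and parser.text_ok

PAGE = """<html><head>
<meta name="keywords" content="search, engines, Davar Web Site">
</head><body><p>Part of the Davar Web Site.</p></body></html>"""

print(has_marker(PAGE, "Davar Web Site"))   # -> True
```

Requiring the marker in both places matters because some engines index only meta keywords while others index only visible text.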
I've thrown out of the list those search engines that didn't contain any references to the submitted site: they either do selective scans, fail to scan at all, or won't hold what they have scanned (needless to say, I've also thrown out engines which failed to provide a free and clear site submission procedure). Any provider of web content will soon notice that such engines lack performance and won't waste time submitting a site to them; consequently they will perform poorly for the searcher as well, because of the content they lack as a result of poor scanning techniques or policies.
I've sorted the remaining search engines into three main groups according to their ability to scan the pages of a submitted web site, plus one specialized group. As time passes, some good search engines get corrupted, either by cutting off free access, or by cutting off free site submission, and most often both. The content of the web is always changing and never static, so as soon as an engine stops adapting to it in real time, it loses its touch with reality and any value it might have had before. This is how my two former favorites, AltaVista and Northern Light, dropped out of the active game and became miserable shadows of themselves. Consequently, some categories of my search engine list might be empty at certain periods.
My recommendation for searching the web would be to use only the first group, and the last one if necessary. It makes sense to use a Simple search as a first approximation (check the Help; it might not be that "simple") and then, based on its results, switch to the Advanced search to make the search more precise. In any case you will need to look through the list of titles and descriptions of the pages found in the search engine's database, and make your own decision about which of them are worth browsing.
When I submit my site for scanning by the search engines (only when content gets added or changed, and no more often than once a month), I do it for all the search engines listed above, and it takes only 5-10 minutes using the direct links to the engines' site submission pages (preparing the information on a page for the search engine is covered in my description of Simple text extract presentation).
Whenever I come across a search engine that I haven't tried, I put it through the test described above in order to decide whether it should be included in my list, and in which category. From time to time I run control searches to make sure that my categories reflect the current performance of the listed search engines.
I would advise both web searchers and web developers to be skeptical about any offers (even free ones) to search or submit to thousands of search engines. I strongly doubt that the number of real search engines even reaches the dozens. As the number of fakes grows every day, the number of genuine search engines actually goes down, until maybe there will be only one engine "to rule them all". God bless Google and save it from getting corrupted as so many others already have!
Best of luck with your search; luck is also a very important component of search success. Unfortunately it can't be listed or classified, so I can only wish it to you.