How a search robot works
Yandex has a detailed FAQ on this subject at http://help.yandex.ru/webmaster/?id=995296
Detailed, but not very informative. For example, Yandex asks itself the direct question "What is a search robot and what does it do?" and answers:
A robot (English: crawler) keeps a list of URLs that it can index and regularly downloads the corresponding documents. If, while analyzing a document, the robot finds a new link, it adds that link to its list. Thus any document or site that is linked to can be found by the robot, and therefore by Yandex search.
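The loop Yandex describes (keep a list of URLs, download each document, add any new links to the list) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Yandex's robot: the WEB dictionary is a hypothetical stand-in for downloading and parsing real pages, and crawl() simply processes URLs in first-in, first-out order.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real robot would download each document over HTTP and parse it instead.
WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
    # No page links here, so the robot can only find it if it is submitted.
    "http://example.com/orphan": [],
}


def crawl(start_urls, get_links):
    """Visit every page reachable from start_urls, keeping a list of known URLs."""
    frontier = deque(start_urls)   # URLs waiting to be downloaded
    known = set(start_urls)        # every URL the robot has ever listed
    visited = []                   # order in which documents were processed
    while frontier:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):    # "analysis of the document"
            if link not in known:      # a new link: add it to the list
                known.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    print(crawl(["http://example.com/"], lambda u: WEB.get(u, [])))
```

Starting from the home page, the robot finds /, /a, /b and /c in that order, but never http://example.com/orphan, because no page links to it.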
As you can see, Yandex's answer covers only the second part of the question: we still have not learned what a robot actually is. Let us turn to the independent experts at Wikipedia.
A search robot (also called a web spider, spider, or crawler) is a program that is part of a search engine and is designed to crawl web pages in order to enter information about them (keywords) into the search engine's database. In essence, a spider closely resembles an ordinary web browser: it scans the contents of a page, sends it to the server of the search engine that owns it, and follows the links to further pages. Search engine owners usually limit how deep the spider goes into a site and the maximum size of the text it scans, so very large sites may not be fully indexed. Besides ordinary spiders, there are so-called "woodpeckers": robots that "tap" an indexed site to check that it is still reachable on the Internet.
The order in which pages are crawled, the frequency of visits, protection against loops, and the criteria for extracting key information are all determined by the search engine's algorithms.
In most cases, the robot moves from one page to another by following links contained on the first and subsequent pages.
Many search engines also let users add a site to the indexing queue themselves. This usually speeds up indexing considerably, and when no external links point to the site at all, it is the only way to announce that the site exists.
You can restrict indexing of your site with the robots.txt file, but some search engines may ignore it. Full protection from indexing requires a mechanism that spiders cannot yet get around, usually a password on the page or a requirement to fill in a registration form before the page content is shown.
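The behaviour described above (following links, limiting crawl depth and scanned text size, avoiding loops) can be sketched as a toy breadth-first spider. Everything here is an assumption for illustration, not any search engine's actual code: fetch is any function that returns a page's HTML or None (a real robot would download it over HTTP), max_depth counts clicks from the start page, max_chars caps how much of each document is scanned and stored, and stripping the #fragment before checking the seen set is just one simple guard against loops.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the target of every <a href="..."> as an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop "#fragment", so that
                # /a and /a#top count as one page (simple loop protection).
                url, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(url)


def spider(start, fetch, max_depth=2, max_chars=1000):
    """Breadth-first crawl from start.

    fetch(url) returns the page's HTML or None. Pages deeper than
    max_depth clicks are not fetched, and only the first max_chars
    characters of each page are scanned and stored.
    """
    index = {}
    seen = {start}                 # never queue the same URL twice
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:           # unreachable page: skip it
            continue
        page = html[:max_chars]    # size limit on scanned text
        index[url] = page
        if depth >= max_depth:     # depth limit: do not follow links further
            continue
        extractor = LinkExtractor(url)
        extractor.feed(page)
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return index


# Hypothetical in-memory site used instead of real HTTP downloads.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/a#top">A again</a>',
    "http://example.com/a": '<a href="b">B</a> <a href="/">Home</a>',
    "http://example.com/b": '<a href="c">C</a>',
    "http://example.com/c": "<p>A page three clicks deep</p>",
}

if __name__ == "__main__":
    print(sorted(spider("http://example.com/", PAGES.get, max_depth=2)))
```

With max_depth=2, http://example.com/c is three clicks from the home page and is never fetched, which is how a large site can end up only partly indexed.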
That is clearer. A robot is a program: it is built into a search engine as one of its parts and obeys that search engine's algorithms. The robot also answers to the site's author or administrator. To bring a search engine spider under control, the site administrator has to perform the proper ritual dance with a tambourine and write instructions in the robots.txt file, which tells the robot which pages it should not record in its index. Note that the robot can still reach these pages if they have incoming links; it simply does not record them in the index.
But since the robot obeys a particular search engine's algorithm, and that algorithm is likely to change, if you want absolute certainty that your sensitive data will not become public by mistake, it is better to play it safe and put a password on the page, or some other obstacle for the robot, such as an SMS lock :) Robots are, of course, constantly getting smarter, but something tells me they will essentially never learn to pay by card or by SMS.
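A password is the kind of obstacle recommended here. As a sketch, this is a server-side check of an HTTP Basic authentication header (RFC 7617); the USERNAME and PASSWORD values are hypothetical placeholders, and in practice the check would sit inside whatever web server or framework serves the page, over HTTPS, since Basic credentials are only base64-encoded, not encrypted.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; never hard-code real ones.
USERNAME = "admin"
PASSWORD = "s3cret"


def is_authorized(authorization_header):
    """Return True only if the HTTP Basic 'Authorization' header value
    carries the expected username and password."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False  # no credentials at all, e.g. a search robot
    try:
        supplied = base64.b64decode(authorization_header[len("Basic "):], validate=True)
    except ValueError:  # binascii.Error (malformed base64) subclasses ValueError
        return False
    expected = f"{USERNAME}:{PASSWORD}".encode("utf-8")
    # Constant-time comparison, so response timing does not reveal
    # how much of a guessed password was correct.
    return hmac.compare_digest(supplied, expected)
```

A robot sends no Authorization header, so is_authorized(None) is False and the protected content is never served to it, whatever its algorithm does with robots.txt.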
Further down there is a link to a script that lets you check which pages on your server are blocked from Yandex robots by the instructions in robots.txt: script
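The linked script is not reproduced here, but a similar check can be sketched with Python's standard urllib.robotparser module. The robots.txt text and the list of pages are hypothetical examples; blocked_pages() returns the URLs that the given robot is told not to fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and page list for an example site.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /admin/
Disallow: /search

User-agent: *
Disallow: /tmp/
"""

PAGES = [
    "http://example.com/",
    "http://example.com/admin/login.php",
    "http://example.com/search?q=robots",
    "http://example.com/tmp/cache.html",
]


def blocked_pages(robots_txt, urls, user_agent="Yandex"):
    """Return the URLs that robots.txt forbids for the given robot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in blocked_pages(ROBOTS_TXT, PAGES):
        print("blocked for Yandex:", url)
```

Because this robots.txt has its own section for Yandex, the parser applies only that section to Yandex: /tmp/cache.html is blocked for other robots (for example, blocked_pages(ROBOTS_TXT, PAGES, "Googlebot")) but not for Yandex. As noted above, this only works for robots that choose to read robots.txt.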