View Full Version : Robot.txt



aman1
19 Dec 2012, 01:20 AM
What is robots.txt, and what are the steps for using it?

Yesceeohhh
19 Dec 2012, 04:35 AM
A robots.txt file tells search engine spiders how to interact with your content when crawling and indexing it.

By default search engines are greedy. They want to index as much high quality information as they can, and they will assume that they can crawl everything unless you tell them otherwise.

If you specify data for all bots (*) and data for a specific bot (like GoogleBot) then the specific bot commands will be followed while that engine ignores the global/default bot commands.

If you have a global command that should also apply to a specific bot, and that bot has its own specific rules, then you need to repeat the global command in the section for that bot as well, as highlighted in this article by Ann Smarty.
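Here is a minimal sketch of that precedence rule, using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the "OtherBot" name are made up for illustration, and real crawlers may differ from Python's parser in the details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a global section plus a Googlebot-specific section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    return parser.can_fetch(agent, "https://example.com" + path)


# Googlebot has its own section, so the global /private/ rule is ignored for it.
print(allowed("Googlebot", "/private/page.html"))  # True
print(allowed("Googlebot", "/drafts/page.html"))   # False
# A bot without its own section falls back to the global (*) rules.
print(allowed("OtherBot", "/private/page.html"))   # False
```

To keep Googlebot out of /private/ as well, the "Disallow: /private/" line would have to be repeated inside the Googlebot section.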

When you block URLs from being crawled by Google via robots.txt, Google may still show those pages as URL-only listings in its search results. A better way to keep a particular page out of the index completely is to use a robots noindex meta tag on a per-page basis. You can tell search engines not to index a page, or not to index a page and not to follow its outbound links, by inserting one of the following tags in the HTML head of the document that you do not want indexed.
<meta name="robots" content="noindex">
<meta name="robots" content="noindex,nofollow">
Please note that if you block the search engines in robots.txt and also use the meta tags, they may never get to crawl the page and see the meta tags, so the URL may still appear in the search results as a URL-only listing.
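If you want to check which robots meta directives a page actually carries, something like the following could work. It is only a sketch using Python's standard html.parser module; the class name, the function name and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page using the second tag shown above.
SAMPLE = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print(sorted(robots_directives(SAMPLE)))  # ['nofollow', 'noindex']
```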

If you do not have a robots.txt file, your server will return a 404 error, which shows up in your server logs, whenever a bot tries to access your robots.txt file. You can upload a blank text file named robots.txt to the root of your site (e.g. seobook.com/robots.txt) if you want to stop getting 404 errors but do not want to offer any specific commands for bots.
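As a quick illustration that a blank robots.txt places no restrictions on bots, here is a small sketch with Python's standard urllib.robotparser module. The bot name and URL are placeholders, and other crawlers are assumed, not guaranteed, to treat an empty file the same way.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(text):
    """Build a parser from robots.txt content given as a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# A blank file contains no rules, so nothing is disallowed.
blank = parse_robots("")
print(blank.can_fetch("AnyBot", "https://example.com/any/page.html"))  # True
```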

Some search engines allow you to specify the address of an XML Sitemap in your robots.txt file, but if your site is well structured with a clean link structure you should not need to create an XML sitemap.
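For reference, the Sitemap line sits in robots.txt alongside the bot sections. The sketch below reads it back with Python's standard urllib.robotparser module; it assumes Python 3.8 or newer (where site_maps() is available), and the sitemap URL is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that allows everything and advertises one sitemap.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""


def sitemap_urls(text):
    """Return the Sitemap URLs listed in robots.txt content (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_WITH_SITEMAP))  # ['https://example.com/sitemap.xml']
```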

SASAtechno
19 Dec 2012, 05:54 AM
Hi, in my opinion it means your URL is online and visitors can see your webpage, but Google cannot crawl it.

Christy
20 Dec 2012, 12:31 AM
When a search engine crawler comes to your site, it looks for a special file called robots.txt. That file tells the search engine spider which pages of your website should be indexed and which should be ignored.

rashmiseoindia
21 Dec 2012, 11:21 PM
Robots.txt is a very helpful file that tells search engine spiders which URLs to crawl and index and which not to.

kathygreen
27 Dec 2012, 07:10 AM
robots.txt is a very useful file that lets you prevent search engines from crawling the links you don't want crawled. You can disallow any search engine from crawling your broken or dead links. This can be done through a webmaster tool.

alastairclark8
27 Dec 2012, 09:01 AM
robots.txt is a text file that a webmaster puts in the root directory of a website to tell crawler bots clearly which pages should not be crawled or indexed.

jacksamwhite
28 Dec 2012, 05:39 AM
Robots.txt is a text (not html) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall, or a kind of password protection). Putting up a robots.txt file is something like putting a note “Please, do not enter” on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why we say that if you have really sensitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.

alfraidjones
02 Jan 2013, 11:13 PM
Robots.txt is a text file created by webmasters to tell search engine robots how to crawl and index pages on their website.
