Implementing robots.txt in every site
Hi, could you please let me know how to implement robots.txt for every site (in MOSS 2007) and how it can be tested to confirm that it works? Thanks.
April 18th, 2011 2:28am

Put only one robots.txt file at the root of the web application; it is not necessary to put one on every subsite. Disallow the sites/pages in subsites directly from this file. Use these links for reference: http://support.microsoft.com/kb/837847 http://www.robotstxt.org/robotstxt.html
w: http://www.worldofsharepoint.com | t: @sharesandip
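As a rough illustration of the single root-level file described above, here is a minimal Python sketch that builds the robots.txt text from a list of subsite paths. The function name `build_robots_txt` and the subsite paths are hypothetical, chosen only for this example; they are not from the KB article or from SharePoint.

```python
def build_robots_txt(disallowed_paths, user_agent="*"):
    """Return robots.txt text that blocks the given paths for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    for path in disallowed_paths:
        # Paths are relative to the web application root and must start with "/".
        if not path.startswith("/"):
            path = "/" + path
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Hypothetical subsite and page paths, for illustration only.
robots_text = build_robots_txt(["/sites/hr/", "/sites/finance/Pages/draft.aspx"])
print(robots_text)
```

The printed text is what would be saved as robots.txt at the root of the web application; the subsites themselves need no file of their own.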
April 18th, 2011 3:54am

Could you please elaborate on the steps? Where exactly do I need to put the robots.txt file: inside the IIS folder or inside the content database? I know that robots.txt lets us selectively enable or disable crawling through the rules we put in the file. Can we give content managers an option to access robots.txt from the SharePoint page, so that they can enable or disable crawling for the page they are currently working on? Also, the first link seems to be more about TIFF files than about robots.txt. Many thanks...
April 18th, 2011 12:37pm

Nab, I just copied the text from the article that Sandeep mentioned.
How to use the Robots.txt file and HTML tags to prevent access to content on the portal site
You can use a Robots.txt file to control where robots (Web crawlers) can go on a Web site. You can also use the Robots.txt file to indicate whether to exclude specific crawlers. Web servers use these rules to control access to Web sites by preventing robots from accessing certain areas. SharePoint Portal Server 2003 and SharePoint Server 2007 look for this file when they crawl, and they obey the restrictions that are contained in the Robots.txt file.
You can prevent another server from crawling content on the portal site by modifying the Robots.txt file. For example, you might want to restrict a specific robot from accessing the server because the frequency of requests from the robot is blocking the Web site. You may also want to restrict all robots from certain areas on the server.
SharePoint Portal Server 2003 and SharePoint Server 2007 do not install a Robots.txt file. However, you can create a Robots.txt file and put it in the home directory of the default Web site on the server. To determine the home directory of the default Web site on the server, follow these steps:
1. Start Internet Information Services (IIS) Manager.
2. Expand <var style="box-sizing: border-box;">server name</var>, and then expand Web Sites.
3. Right-click Default Web Site, and then click Properties.
4. Click the Home Directory tab.
5. Make a note of the path that appears in the Local Path box, and then click Cancel.
Put the Robots.txt file in the path that appears in the Local Path box. For example, if the path is D:\Inetpub\Wwwroot, put the Robots.txt file in the D:\Inetpub\Wwwroot folder on the server. To confirm that the Robots.txt file is in the correct folder on the server, start your Web browser, and then type http://<var style="box-sizing: border-box;">server name</var>/robots.txt.
You can restrict access to certain documents by using HTML META tags. HTML META tags tell the robot whether a document can be included in the index and whether the robot can follow the links in the document, by using the INDEX/NOINDEX and FOLLOW/NOFOLLOW attributes in the tag. For example, you can mark a document with the following tag if you do not want the document crawled and you do not want links in the document followed: <META name="robots" content="NOINDEX, NOFOLLOW"> SharePoint Portal Server 2003 and SharePoint Server 2007 automatically obey the restrictions that are contained in the Robots.txt file.
V
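To check which directives a page actually carries in its robots META tag, something like the following Python sketch could be used. It relies only on the standard library `html.parser` module; the class name `RobotsMetaParser` and the helper `robots_directives` are made up for this example, and the HTML string is a stand-in for a real page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META> arrives as "meta".
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                token = token.strip().upper()
                if token:
                    self.directives.add(token)


def robots_directives(html_text):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Stand-in page source, for illustration only.
page = '<html><head><META name="robots" content="NOINDEX, NOFOLLOW"></head></html>'
print(robots_directives(page))
```

This only reports what the page declares; whether a particular crawler honors the tag is up to that crawler.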
April 18th, 2011 12:59pm

A couple of questions. In step 2, "Expand <var style="box-sizing: border-box;">server name</var>, and then expand Web Sites", I did not understand what to do. After opening inetmgr, where do I need to put it? Also, the article says: "To confirm that the Robots.txt file is in the correct folder on the server, start your Web browser, and then type http://<var style="box-sizing: border-box;">server name</var>/robots.txt." So if the server name is example.com, should I paste exactly http://<var style="box-sizing: border-box;">example.com</var>/robots.txt in the address bar? Could you please let me know what is to be done?
April 19th, 2011 12:07am

Nab, ignore the HTML tags. They were added as a result of copy/paste (a typical problem), and the same mistake appears twice here. The text should read as follows:
2. Expand server name, and then expand Web Sites.
To confirm that the Robots.txt file is in the correct folder on the server, start your Web browser, and then type http://server name/robots.txt.
Let me know if it is still not clear. Thanks, V
April 19th, 2011 10:46am

One last question: after putting the robots.txt file in place and configuring the directories to be excluded from crawling, how will I know that it is actually working? Is there some kind of tool to test it?
April 20th, 2011 10:47am

This article should give you a good idea of the tools that exist: http://www.google.com/support/webmasters/bin/answer.py?answer=156449 Thanks, V
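Besides the tool in the linked article, the rules can also be checked locally. The sketch below uses Python's standard `urllib.robotparser` module to ask whether given paths would be allowed under a robots.txt text. The helper name `check_rules`, the sample rules, and the paths are assumptions made for this example; `example.com` is the placeholder host used earlier in this thread.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_text, base_url, paths, user_agent="*"):
    """Map each path to True (allowed) or False (blocked) under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network access is needed.
    parser.parse(robots_text.splitlines())
    return {path: parser.can_fetch(user_agent, base_url + path) for path in paths}


# Hypothetical rules and paths, for illustration only.
sample = """User-agent: *
Disallow: /sites/hr/
"""
results = check_rules(
    sample,
    "http://example.com",
    ["/sites/hr/default.aspx", "/Pages/default.aspx"],
)
for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

To check the file as served by the server rather than a local copy, the same parser also offers `set_url()` and `read()`. Note that this only verifies what the rules say; confirming that a specific crawler respects them still means looking at that crawler's logs or index.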
April 20th, 2011 10:56am

This topic is archived. No further replies will be accepted.
