Robots.txt file generator
Transcript of Robots.txt file generator
What is a robots.txt file? The robots exclusion protocol (REP), or robots.txt, is a
text file webmasters create to instruct robots (typically search engine robots) how to crawl and index pages on
their website. In short, website owners use the /robots.txt file to give instructions about their site to
web robots; this is called the Robots Exclusion Protocol.
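For example, a minimal robots.txt that asks every crawler to stay out of a single directory (the directory name /private/ here is just an illustration) looks like this:

```
User-agent: *
Disallow: /private/
```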
How to create a robots.txt file for your website? Step 1: Go to the following website: http://tools.seobook.com/robots-txt/generator/
You will see the following screen:
Step 2: Suppose you don't want robots to have access to the about-us page of your website. Then just select /about-us.html from your website as shown in the image below:
Paste that path into the files or directories tab and click Add; your robots.txt will be ready as shown below:
Copy this code into Notepad and save it as robots.txt. One very important point: the file must be placed in your site's main (root) folder.
Step 3: Then upload the robots.txt file to your website using the FileZilla FTP client.
Step 4: Check whether the file was uploaded by appending /robots.txt to your website's URL.
If you see the file's contents, your robots.txt file was uploaded successfully to your website.
If you don't want to use the online tool, you can also write the code directly in Notepad and save the file as robots.txt.
Important note: The "/robots.txt" file is a text file with one or more records. It usually contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded. Note that you need a separate "Disallow" line for every URL prefix you want to exclude; you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank
lines within a record, as blank lines are used to delimit multiple records. Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot".
Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".
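A quick way to sanity-check these rules is Python's standard-library `urllib.robotparser`, which implements the robots exclusion protocol. The sketch below parses the example record from above and asks whether a crawler may fetch various paths (the robot name "MyBot" and the example.com URLs are just illustrations):

```python
import urllib.robotparser

# The single-record example from above, parsed from a string.
rules = """
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# Disallowed prefixes are blocked for every robot ('*' matches any robot).
print(rp.can_fetch("MyBot", "http://example.com/cgi-bin/script"))  # False
print(rp.can_fetch("MyBot", "http://example.com/tmp/cache.html"))  # False

# Everything not explicitly disallowed is fair game to retrieve.
print(rp.can_fetch("MyBot", "http://example.com/index.html"))      # True
```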
What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here are some examples:
1) To exclude all robots from the entire server:
User-agent: *
Disallow: /
2) To allow all robots complete access:
User-agent: *
Disallow:
(Or just create an empty "/robots.txt" file, or don't use one at all.)
3) To exclude all robots from part of the server:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
4) To exclude a single robot:
User-agent: BadBot
Disallow: /
5) To allow a single robot:
User-agent: Google
Disallow:

User-agent: *
Disallow: /
6) To exclude all files except one:
This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:
User-agent: *
Disallow: /~joe/stuff/
7) Alternatively, you can explicitly disallow each excluded page:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
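The "allow a single robot" pattern above can be verified the same way with Python's standard-library `urllib.robotparser`; the robot names and the example.com URL are just illustrations:

```python
import urllib.robotparser

# Allow only the robot named "Google"; block everyone else.
# The blank line separates the two records.
rules = """
User-agent: Google
Disallow:

User-agent: *
Disallow: /
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

# "Google" matches the first record (empty Disallow = full access);
# any other robot falls through to the '*' record and is blocked.
print(rp.can_fetch("Google", "http://example.com/page.html"))  # True
print(rp.can_fetch("BadBot", "http://example.com/page.html"))  # False
```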
THANK YOU