The robot text file, better known as robots.txt, is a long-standing Web standard that lets you ask Google and other search engines to stay out of parts of your site.
Why would you want to block Google from parts of your site? One important reason is to keep Google from indexing pages on your site that duplicate pages on other sites, such as the default WordPress pages. Duplicate content can hurt your rankings, because Google usually shows only one copy of a page in its results.
Another important reason is to prevent Google from linking to unprotected premium content on your website. For example, maybe you give out a free ebook to people who subscribe to your mailing list. You don’t want Google to link directly to this ebook, so you use the robot text file to ask Google not to crawl it. Keep in mind that this only keeps well-behaved robots away; anyone who has the direct link can still download the file.
For example, suppose the ebooks are stored in a folder called PDF in your site’s root directory. This is what you would add to block all search engines from that folder:
User-Agent: *
Disallow: /PDF/
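If you want to check a rule like this before uploading it, here is a minimal sketch using Python’s standard urllib.robotparser module. The domain example.com and the file name free-ebook.pdf are placeholders I made up for illustration, and this parser only approximates how a real search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the example above, as they would appear in robots.txt.
rules = """\
User-Agent: *
Disallow: /PDF/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# example.com and free-ebook.pdf are placeholders, not real files.
ebook_blocked = not parser.can_fetch("Googlebot", "http://example.com/PDF/free-ebook.pdf")
homepage_allowed = parser.can_fetch("Googlebot", "http://example.com/")
# Paths are case-sensitive, so a lowercase /pdf/ folder is NOT covered by the rule.
lowercase_allowed = parser.can_fetch("Googlebot", "http://example.com/pdf/free-ebook.pdf")

print("ebook blocked:", ebook_blocked)
print("homepage allowed:", homepage_allowed)
print("lowercase /pdf/ allowed:", lowercase_allowed)
```

The last check is the one that catches people out: the folder name in the Disallow line has to match the real folder name exactly, including capital letters.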
On the other hand, if you want your free book to go viral, don’t block the search engines from the book.
Some people also like to prevent Google from using their images in Google search or from downloading large files.
Also, if you have a large, high-authority WordPress site, Google may be loading the same page under several different URLs, using up a large part of your bandwidth and webserver processing power. Robot text file patterns can tell Google to skip the duplicate URLs so that it fetches each page only once.
Finally, you can tell Google about your XML or text sitemap using robots.txt, so it can find new pages on your site sooner than if you just wait for it to re-crawl your site.
Robot Txt File Basics
The robot text file is an optional file in the root directory of a website. Since you’re reading this, I assume you have a website. Take a moment to see if you already have a robot text file by going to the following URL: http://example.com/robots.txt
(Replace example.com with your domain name.)
Here is mine: Please note, it is a work in progress. I recently changed my WordPress theme, which meant I also had to do some of my own robot text file editing.
Be careful when editing this file: one small mistake can block the search engines from accessing your whole website.
If you get a 404 File Not Found error, you don’t have a robot text file. Otherwise, you will see a simple text file with lines labeled User-Agent, Allow, Disallow, and Sitemap, plus blank lines, and comment (“#”) lines.
What Things Mean in the Robot Text File
• User-Agent names the robot that the rules below it apply to; an asterisk (“*”) means every robot. The robot text file only applies to robots, also called spiders, that crawl your website for search engines and other automated online tools, not to people using Web browsers. Google’s crawler robot is called Googlebot, although Google also has a few other robots for its other search tools.
• Allow tells robots that they’re allowed to visit URLs that start with a particular path. Most robot text files tell robots that the root (“/”) path is OK to crawl.
• Disallow tells robots where they cannot go. Most of your time editing a robots.txt file will be spent crafting disallow lines.
• Sitemap points to your site map (or multiple sitemaps if you have a large site). You need a sitemap to use this, which requires something like the WordPress plugin XML Sitemap Generator.
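To see how those four kinds of lines fit together, here is a small sketch that reads a robot text file and lists each field with its value, skipping blank lines and “#” comments. The function name read_directives and the sample file are my own inventions for illustration; real crawlers are more forgiving about odd formatting than this.

```python
def read_directives(robots_txt: str) -> list[tuple[str, str]]:
    """Return (field, value) pairs, skipping blank lines and # comments."""
    directives = []
    for raw_line in robots_txt.splitlines():
        # Drop anything after "#", then trim surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        field, value = line.split(":", 1)
        directives.append((field.strip().lower(), value.strip()))
    return directives


# Hypothetical file contents; example.com is a placeholder domain.
sample = """\
# Hypothetical robots.txt for example.com
User-Agent: *
Allow: /
Disallow: /PDF/

Sitemap: http://example.com/sitemap.xml
"""

directives = read_directives(sample)
for field, value in directives:
    print(f"{field:<10} {value}")
```

Field names are lowercased here because crawlers generally treat User-Agent and user-agent as the same thing, while the path values are left exactly as written.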
Getting Your Robot Text File In WordPress
The following instructions will only work if you use WordPress to manage the root directory of your website. That means the main page of your blog has nothing after the domain name in its URL.
For example, if your main WordPress page is http://example.com/, then WordPress probably manages your robots.txt file. But if your main WordPress page is http://example.com/blog, then WordPress probably doesn’t manage your robots.txt file and you’ll have to work on it directly using FTP upload.
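The rule of thumb above can be written as a quick check. This is only a sketch of that rule, not a test of how your host is actually set up, and the function names wordpress_at_site_root and robots_txt_url are hypothetical.

```python
from urllib.parse import urlparse


def wordpress_at_site_root(home_url: str) -> bool:
    """True when the blog's main page has no path after the domain name."""
    return urlparse(home_url).path in ("", "/")


def robots_txt_url(home_url: str) -> str:
    """robots.txt always sits at the root of the domain, never in a subfolder."""
    parts = urlparse(home_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# example.com is a placeholder; try your own blog's main page URL.
for home in ("http://example.com/", "http://example.com/blog"):
    print(home, wordpress_at_site_root(home), robots_txt_url(home))
```

Either way, robots look for the file in the same place: directly under the domain name.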
By default, WordPress will create a restrictive robots.txt file if you use the WordPress settings to mark your blog as private. Most people have public sites, so the default WordPress robot text file is empty.
Some website hosting companies provide a default robot text file for WordPress—especially if you used a one-click install for WordPress. If so, you may need to edit your robots.txt file using FTP upload too.
But if none of the above is the case, you can probably have WordPress generate your robots.txt file for you.
Robots.txt WordPress Plugins
Several SEO plugins can generate a robots.txt file. I’d be careful using these if you do anything besides blogging with your site because they can stop Google from indexing legitimate pages. This can be one of those silly errors that cause your website rankings to drop fast.
Another plugin which automatically creates the robot text file is XML Sitemap Generator. It doesn’t block or allow anything—it simply includes a Sitemap line to tell Google and other search engines where to find your sitemap.
An example would be:
Sitemap: http://tips4pc.com/sitemap.xml
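Here is a short sketch showing that a file with only a Sitemap line really doesn’t block anything. It uses Python’s standard urllib.robotparser module (the site_maps method needs Python 3.8 or later); the page URL any-page is a placeholder.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "Sitemap: http://tips4pc.com/sitemap.xml",
])

# site_maps() returns a list of every Sitemap line found, or None if there are none.
sitemaps = parser.site_maps()

# With no Allow or Disallow lines, nothing is blocked ("any-page" is a placeholder).
everything_allowed = parser.can_fetch("Googlebot", "http://tips4pc.com/any-page")

print(sitemaps)
print(everything_allowed)
```

Because the result is a list, a file with several Sitemap lines would return one entry for each of them.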
There’s also a very old WordPress plugin which lets you edit your robot text file from within WordPress. I haven’t used this plugin, so I don’t know if it still works.
The Old-Fashioned Robots.Txt File Editor
If you want a custom robot text file, you can create one the old-fashioned way. Open a plain-text editor such as Windows Notepad, TextEdit on Mac OS X (in plain-text mode), or vi or emacs on Linux. Enter the following text:
User-Agent: *
Allow: /
The example file above will tell robots to act exactly like they would if you didn’t have a robot text file, so it won’t break anything on your site. Save the file as robots.txt and upload it to your webserver’s root directory using an FTP tool or your website hosting company’s online file manager.
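If you’d like to convince yourself that the two-line file behaves the same as having no rules at all, here is a minimal sketch using Python’s standard urllib.robotparser module. The helper build_parser and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(lines):
    """Parse robots.txt lines into a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


allow_all = build_parser(["User-Agent: *", "Allow: /"])
no_rules = build_parser([])  # stands in for a site with no rules at all

# Placeholder URLs; swap in pages from your own site.
urls = [
    "http://example.com/",
    "http://example.com/wp-admin/",
    "http://example.com/PDF/free-ebook.pdf",
]

same_answers = all(
    allow_all.can_fetch("Googlebot", url) == no_rules.can_fetch("Googlebot", url)
    for url in urls
)
print("identical behaviour:", same_answers)
```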
(The root directory is the same directory where you add the Google website verification code file, in case you’ve done that before.)
After the file is uploaded, use your Web browser to visit http://example.com/robots.txt (but use your domain instead). You should see the file you just uploaded. If you don’t, you will need to contact your hosting company for help.
What To Put In Your WordPress Robot Txt File
Your robot text file can be as simple as the example above or much more complicated. In general, you want to block the following:
• WordPress login and help directories, whose names all start with “wp-”. Put this line under “Allow: /”:
Disallow: /wp-*
• The rule above also blocks the WordPress uploads directory, /wp-content/uploads, where your images are stored. If you want your images to appear on Google and Bing image search, add the following line:
Allow: /wp-content/uploads
• If Google tries to index a trackback, it will just get an error page, so add this code too:
Disallow: */trackback
• If you are using Google AdSense, it is recommended that you add these lines so that Google’s AdSense crawler can read all of your content and serve targeted ads:
User-agent: Mediapartners-Google*
Allow: /
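The rules for all robots (“*”) overlap, so it helps to see which one wins for a given page. The sketch below is my own simplified version of the matching Google describes, where the longest matching rule wins and Allow wins a tie. It is not Google’s actual code, the names pattern_to_regex and is_allowed are hypothetical, and the test paths are made up.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a robots.txt path pattern into a regex: "*" is a wildcard,
    and a trailing "$" anchors the end of the URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching pattern wins; on a tie, Allow beats Disallow."""
    best_length, allowed = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            length = len(pattern)
            if length > best_length or (length == best_length and is_allow):
                best_length, allowed = length, is_allow
    return allowed


# The "User-Agent: *" rules from this section.
rules = [
    ("allow", "/"),
    ("disallow", "/wp-*"),
    ("allow", "/wp-content/uploads"),
    ("disallow", "*/trackback"),
]

# Made-up paths for illustration.
for path in ("/my-post/", "/wp-admin/", "/wp-content/uploads/photo.jpg", "/my-post/trackback"):
    print(f"{path:<32} {'allowed' if is_allowed(path, rules) else 'blocked'}")
```

Not every robot works this way. Some simpler parsers apply the first rule that matches in file order, so it is worth testing your file with the search engine’s own tools as well.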
Those simple robot text file commands should cover the most important parts of your site, but if you want more ideas, go to your favorite WordPress-based website and look at their robots.txt file.
Hi Millica its really very useful post,Good Research,Need more post like this,i tried this trick in my WordPress blog,Thanks.All the Articles in this blog is really cool.
Kamal G recently posted..Important General Knowledge Questions for Bank Exams Part-2
Nice detailed article on robots.txt.
I’ve a question can I add multiple sitemaps reference in robots.txt file?
like sitesmaps that have been created by different plugins and that have different names.
Karan recently posted..How To Add Google Custom Search On WordPress Site
Do you know of whether the WordPress SEO robots.txt is valid or not? I’ve been using this part of his plugin and doing okay with it.
tonygreene113 recently posted..French Patriot Mom Stands In Front of Islam – Against death threats
Hi Tony
I think you are talking about Yoast SEO.?? It is valid and true but I have had no luck editing the robots.txt in the SEO plugin. I have edited it on the server then the updates are there in Yoast immediately.
I have edited the .htaccess file though from inside that plugin.
Hi Mitz,
Your posts are always informative and inspiring. Most of the bloggers are least bothered about Robots.txt file. I’m one of them. Thanks for sharing these tips.
Suresh recently posted..3 new essential wordpress plugins
Nice info about robots.txt . but any other way to generate robots.txt for blogger.com platform ??? helpful lessons learning through this awesome post 🙂 🙂 Thank You
Ashvin Patel recently posted..Bigg Boss 7 News : Celebrity Guest Name and Host
Go to Your Site Settings › Search preferences › Crawlers and indexing to edit.
Thanks to Mitz, we now have a simple basic much needed tutorial which is perfect for newer webmasters and internet marketers, as I am an SEO guy and I cant tell you how many times people neglect having a ROBOTS.TXT file on their servers!
This is needed for Google(TM)SEO and I RECOMMEND YOU FOLLOW WHAT MITZ has posted if you want your website or blog to be Google(TM) best practices/webmaster guidelines SEO compliant.
I am happy to share this, as a homebound disabled veteran seo guy, freelancer, with my following on Google+, Twitter and Facebook as I consider this article to be a valuable resource even for people who know about it, I recommend you go back into your editors and review your source code for robots.txt to ensure your SYNTAX code is correct and does not prevent GOOGLEBOT from crawling your website.
I have bookmarked this blog and look forward to more informative and useful articles in the future by Mitz!
Ricky
Ricky Wright recently posted..100% MONEY BACK GUARANTEE ON SEO OR WEB DESIGN!
Thanks Ricky! Glad you like my article. You are right about many people neglecting this file so I thought I had better let them know a few details. 🙂
Really appreciate the sharing with your audience.
hi thanks for this article this file can be used to block search engines from visiting-crawling-indexing your html and other files
vaibhav recently posted..Facebook messenger application free download
Tutorial is quiet good regarding Robot txt file n its easy to follow. Thanks for sharing.. 🙂
Farhan Memon recently posted..Download ‘Pastel Beauty For iOS 7′ Wallpapers
They are still missing few things like pagination issue. replytocon problem. It has to be blocked in robots.txt file
Martin recently posted..UPDATE #4: Ten Beautiful Twitter Bootstrap templates to Any Needs
I have created some blogs through blogger.com belong to Google. But it is so strange that all my blogs can’t show their descriptions when I check with Google
Oh I dont know that, My robots.txt only got wp-admin, and wp-includs…thanks for nice tips
Shamim recently posted..How to run incompatible software in windows 8
Indeed a very useful tutorial for not only the newbies but also from experienced folks.As explained by Mitz, this file can be used to block search engines from visiting-crawling-indexing your html and other files.I would recommend the newbies to not mess with this file ..unless you know what you’re doing 😀
-Pramod
Useful things to apply because when you are facing some trouble in your site then you can use this tool to stop negativity on visitors.
Hyptia recently posted..Process Services in Zimbabwe | Zimbabwe Process Servers
Hi Mitz. I have a problem here, I have a sub domain (sub.domainblog.com) but I don’t want the sub domain crawled by google. How to write the robot.txt from my main blog?
Uphy recently posted..Pengertian Gerak
Hello Mitz, you’ve mentioned some good information about Robot.txt file various basics. It’s really crucial to know what is robot.txt and importance of it. To make sure a site is friendly before Google here mentioned information is educative and I’m pleased to learn very helpful lessons learning through this awesome presentation. Keep it up… 🙂
Thank you very much for this Robots.txt tutorial.
I had one question to ask regarding this, I heard that we should no-index tags and categories from our sites to improve SEO.
Is it true? Should we allow or disallow Tags, Author Archives and Categories ?
Jafar Dhada recently posted..How to Root Samsung Galaxy S3 GT-I9300