The robots.txt file is a simple text file that is placed in the root directory of a website and is used to tell web robots, also known as crawlers or spiders, which pages or files on the site they can or cannot access. In this blog post, we will discuss the importance of the robots.txt file, its syntax, and how to use it effectively.
The Importance of robots.txt File
The robots.txt file is an essential tool for website owners and webmasters who want to control how their website is crawled by search engines. It tells search engines like Google, Bing, Yahoo, and others which pages or sections of the website they may crawl. By using robots.txt, you can keep crawlers away from pages or sections that you do not want them to fetch. Note that robots.txt controls crawling rather than indexing: a blocked URL can still appear in search results if other pages link to it, so a page that must stay out of search results needs a noindex rule or password protection instead.
In addition to search engines, other web robots can also crawl your website, such as web scrapers, which extract data from your website for different purposes. By using robots.txt, you can ask these robots to stay away from your website or from specific pages. Keep in mind that the file is only a request: well-behaved robots follow it, but a scraper can simply ignore it, so robots.txt alone does not protect your content.
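Because the file always lives in the root directory of the site, a robot can work out where to look for it from any page address. Here is a minimal Python sketch of that lookup. The function name `robots_txt_url` and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, and replace everything else with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address, used only for illustration.
print(robots_txt_url("https://example.com/blog/2024/01/my-post.html?ref=home"))
```

Whatever page you start from, the result points to /robots.txt on the same host.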
How To Add Custom Robots.txt File in Blogger
A custom robots.txt is a file that we add to a blog or site to improve SEO. With this file, we can tell search engine crawlers which pages to crawl and which to skip. Alongside this tutorial, you may also want to set up Custom Header Tags in Blogger, which work in a similar way to the custom robots.txt.
- Sign in to Blogger and choose the blog which you want to customize.
- Go to “Settings” > “Search preferences”.
- Find “Custom robots.txt” under the “Crawlers and Indexing” section. Click on “Edit” on the right side of the option. Select “Yes”. A blank box will appear. Copy the code below and paste it into the box, replacing YOUR BLOG NAME with the address of your blog.

    User-agent: Mediapartners-Google
    Disallow:

    User-agent: *
    Disallow: /search
    Allow: /

    Sitemap: http://YOUR BLOG NAME/feeds/posts/default?orderby=UPDATED

- After that, click on “Save changes”. That’s all.
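If you want to see what these rules actually permit before saving them, you can feed them to Python's built-in `urllib.robotparser` module. This is only a rough sketch: the blog address example.blogspot.com and the sample paths are made up, and Python's parser may not match every search engine's behavior exactly.

```python
from urllib.robotparser import RobotFileParser

# The rules from the tutorial, with a made-up blog address in the Sitemap line.
BLOGGER_RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://example.blogspot.com/feeds/posts/default?orderby=UPDATED
"""

blogger_parser = RobotFileParser()
blogger_parser.parse(BLOGGER_RULES.splitlines())

# Google's AdSense crawler: the empty Disallow means nothing is blocked.
adsense_on_search = blogger_parser.can_fetch("Mediapartners-Google", "/search/label/news")
# Every other crawler: /search pages are blocked, normal posts are allowed.
other_on_search = blogger_parser.can_fetch("Googlebot", "/search/label/news")
other_on_post = blogger_parser.can_fetch("Googlebot", "/2024/01/my-post.html")

print(adsense_on_search, other_on_search, other_on_post)
```

In short, the AdSense crawler may fetch everything, while all other crawlers are kept out of the search and label pages but can still reach your posts.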
robots.txt File Explanation
If you have ever worked on a website or used a search engine, you might have come across a file called “robots.txt”. The rest of this post explains how the file is written and how to use it effectively.
Syntax of robots.txt File
The robots.txt file uses a simple syntax built around two main directives: User-agent and Disallow. User-agent names the web robot that the rules apply to, and Disallow lists the pages or sections of the website that the robot should not access. Here’s an example of a robots.txt file:
    User-agent: *
    Disallow: /admin/
    Disallow: /private/
In this example, the user-agent (*) specifies that the rules apply to all web robots, and the disallow rules indicate that the /admin/ and /private/ directories should not be accessed by web robots.
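To see these rules in action, here is a small sketch using Python's built-in `urllib.robotparser` module. The robot name ExampleBot and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

example_parser = RobotFileParser()
example_parser.parse(EXAMPLE_RULES.splitlines())

# ExampleBot is a hypothetical robot; the "*" group applies to any name.
for path in ["/admin/settings", "/private/notes.html", "/blog/hello-world"]:
    verdict = "allowed" if example_parser.can_fetch("ExampleBot", path) else "blocked"
    print(f"{path}: {verdict}")
```

The two paths under /admin/ and /private/ come back as blocked, and the blog path is allowed because no rule matches it.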
Note that you can also use the Allow directive to open up specific pages or sections inside a path that is otherwise disallowed.
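As a quick illustration of the Allow directive, the sketch below blocks a whole directory but makes an exception for one page. The file name press-kit.html and the robot name ExampleBot are made up. One caveat: Python's built-in parser applies the first rule that matches, so the Allow line is placed first here, while Google picks the most specific matching rule regardless of order.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ except for a single page.
ALLOW_RULES = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
"""

allow_parser = RobotFileParser()
allow_parser.parse(ALLOW_RULES.splitlines())

# The exception page is reachable, the rest of the directory is not.
press_kit_ok = allow_parser.can_fetch("ExampleBot", "/private/press-kit.html")
notes_ok = allow_parser.can_fetch("ExampleBot", "/private/notes.html")

print(press_kit_ok, notes_ok)
```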
Using robots.txt Effectively
While the robots.txt file can be an effective tool to control web robot access to your website, it’s important to use it correctly to avoid accidentally blocking legitimate access. Here are some tips to use robots.txt effectively:
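- Use the “Disallow” directive to keep crawlers out of pages or directories you do not want crawled. The file itself is public, so do not rely on it to hide truly sensitive content.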
- Use the “Allow” directive to explicitly allow access to certain pages or directories.
- Use the “*” wildcard as the “User-agent” value to apply a rule to all web robots.
- Use the “User-agent” directive with a robot’s name to target a rule at that particular robot.
- Test your robots.txt file using the Google Search Console or other tools to ensure that it’s working as intended.
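Alongside those tools, you can run a rough check of your own before publishing. The sketch below compares a draft file against a list of expectations using Python's built-in `urllib.robotparser`. The helper `check_rules`, the robot name BadBot, and the draft rules are all made up for illustration, and a search engine's own parser may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_txt, expectations):
    """Return (agent, path, expected, actual) for every expectation that fails."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for agent, path, expected in expectations:
        actual = parser.can_fetch(agent, path)
        if actual != expected:
            failures.append((agent, path, expected, actual))
    return failures


# Hypothetical draft: shut out one robot entirely, hide /admin/ from the rest.
DRAFT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

# Each entry is (robot name, path, should it be allowed?).
EXPECTATIONS = [
    ("BadBot", "/", False),
    ("Googlebot", "/admin/login", False),
    ("Googlebot", "/blog/hello-world", True),
]

problems = check_rules(DRAFT, EXPECTATIONS)
print("all checks passed" if not problems else problems)
```

If a rule accidentally blocks a page you care about, it shows up in the list of problems instead of going unnoticed.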