
In Practice: How To Locate, Create And Start Using Robots.Txt On Websites, Step By Step [2023]

In the previous article we covered the canonical tag, which, when missing or misplaced, can leave duplicate content, i.e. duplicate pages, on a site. When page duplication cannot be solved that way, we can resort to Robots.txt. Robots.txt is a text file stored at the root of a website, used to control and instruct search robots on how to handle the crawling and indexing of the site's pages. With Robots.txt you can allow and/or restrict pages from being crawled by search engines, or apply a rule to a single search engine only; in other words, if you have pages on your site that you don't want Google to find and index, you can use Robots.txt to restrict those pages for Google alone and not for another search engine, for example Bing.
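For instance, a robots.txt along the lines of the sketch below would keep one page away from Googlebot while leaving it open to every other crawler (the /private-page/ path is purely illustrative):

# Rules for Googlebot only: block the example page
User-agent: Googlebot
Disallow: /private-page/

# Rules for every other crawler: block nothing
User-agent: *
Disallow: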


• Robots.txt

• How to find a Robots.txt file

• Create a Robots.txt file

• Robots.txt directives

• Conclusion


Robots.Txt

Robots.txt is extremely important because if a site is very large or contains poorly performing pages, Google may not manage to crawl that many pages and may end up spending its crawl budget on the worst pages the site contains. An SEO professional needs to know how to work with the robots.txt file in order to present the best of a site to the search engines. It is also important to exclude resources such as PDFs, videos and images from search results, as these weigh a website down, unless the site exists for that very purpose.


How To Find A Robots.Txt File

The robots.txt file is hosted on the same server as your site. You can see this file in the backend/host, in the file manager that holds the entire site structure, and if you can't find it there you can always search for the file:



You can also type in the full URL of your home page and append /robots.txt to it, and you will see your robots.txt file:
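For example, for a site served at example.com (a placeholder domain used purely for illustration), the file would be reachable at:

https://www.example.com/robots.txt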



If you have a WordPress site you can use plugins, such as RankMath, that will generate a robots.txt file for you:



Create A Robots.Txt File

Creating a robots.txt file is very easy: just open a new .txt file in any text editor, name it "robots.txt", fill in the directives you want to set for your site, save it, and place the file in the backend of your site: in the file manager, in the root directory where all your site folders are.
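As a minimal sketch, the file's contents could be as simple as the lines below, blocking one hypothetical /wp-admin/ area and leaving the rest of the site open (the path is just an illustration):

# Apply to every crawler
User-agent: *
# Hypothetical folder to keep crawlers out of
Disallow: /wp-admin/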


Robots.Txt Directives

In the header (the <head> section) of each page of the site it is possible to place a robots meta tag:


<meta name="robots" content="noindex" />
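As a quick illustration of where it goes, the tag sits inside the page's <head>; the title below is just an example:

<head>
  <title>Example page</title>
  <!-- tells robots not to index this particular page -->
  <meta name="robots" content="noindex" />
</head>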


In robots.txt itself you can set crawling directives for your website as a whole. For this purpose there is the "Allow" directive, which tells search robots what they may crawl on the website. The lines below indicate that JavaScript and CSS files are allowed to be crawled and parsed:


Allow: /*.js$

Allow: /*.css$


You can also set the "User-agent" line, which determines which search robot the rules apply to, in this case "Googlebot", or, by placing an *, makes the rules apply to all existing search robots:


User-agent: Googlebot

User-agent: *


Additionally, there is the "Disallow" directive, which is used to prevent a page or folder from being crawled by search robots, in this case the page "beta.php" and the "/arquivos/" (files) folder:


Disallow: /beta.php

Disallow: /arquivos/


Finally, there is the directive for indicating the website's sitemap, which is very useful in helping search robots identify all the pages that exist in the domain. Nowadays this job is usually handed over to Google Search Console (formerly Google Webmaster Tools), which handles sitemap submission more effectively.
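The directive itself is a single line pointing at the sitemap's full URL; the address below is a placeholder:

# Placeholder sitemap location
Sitemap: https://www.example.com/sitemap.xml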



The Crawl-delay directive can also be useful: it specifies a crawl delay in seconds and serves to prevent robots from overloading a server (which would slow your site down). Note that Googlebot ignores this directive, although other crawlers, such as Bingbot, respect it:


Crawl-delay: 10

 

For a robots.txt file to be effective it must have a well-defined structure: start by declaring the User-agent, i.e. whether the site's pages will be crawled by all search engines or just one specific one. After stating the User-agent, specify which pages you want to allow to be crawled and/or which you want to restrict. If you want to set a Crawl-delay, it must also come after the User-agent, as in the example below.
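Putting the pieces together, a complete file following that order might look like this; the blocked paths and the sitemap URL are illustrative only:

# Rules for every crawler
User-agent: *
Crawl-delay: 10
Allow: /*.js$
Allow: /*.css$
Disallow: /beta.php
Disallow: /arquivos/

# Placeholder sitemap location
Sitemap: https://www.example.com/sitemap.xml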


You can test whether your robots.txt is in place on the site and whether it is well structured using either of these two options:

• The robots.txt testing tool in Search Console;

• Google's open-source robots.txt library.



 

Conclusion

Robots.txt is a crucial SEO technique and an asset to websites, both as a resource for making them faster and lighter and as a way of defining what is useful and should be shown to Google, and what should not. Learn more about other important SEO techniques that can complement this method.


Today, many companies need immediate results, but the truth is that they cannot afford to implement SEO internally while keeping their focus on the priorities of their core business. If you can't handle these steps yourself or don't have the time to put them in place, Bringlink SEO ensures you get the brand visibility and growth you deserve.


Talk to us: send an email to bringlinkseo@gmail.com.

 



