
Technical SEO Strategies: Canonicalization, Robots.txt, Redirects, HTTP Status Codes

We have already seen that technical SEO is about optimizing the internal structure of a website. In the previous article, Technical SEO Part 1, we covered indexing via XML sitemaps and how to handle a site migration. On the crawling side, we discussed crawl budget, log files, 404 errors, and site availability. These techniques all affect a site’s organic search results, but there is much more to cover.


In this article we will look at the remaining crawling techniques: servers/CDNs, browsers, robots directives, redirects, HTTP status codes, and canonicalization.



• Step 3 of the Strategy: Getting the Crawling Right

• Servers/CDNs

• Browser Compatibility

• Robots Directives

• Redirects

• HTTP Response Status Codes

• Canonicalization

• Conclusion


Servers/CDNs

CDN servers are very important these days. A CDN (content delivery network) is a group of geographically distributed servers that speeds up the delivery of web content by caching it on servers close to the user’s location, which makes content load significantly faster. CDN services were designed to solve the network congestion caused by delivering rich web content such as graphics and video over the Internet, but they can also give websites greater protection against concerns such as security attacks. An SEO should check what kind of server a site runs on and, if the site relies heavily on image or video content, propose to the IT team that it be served through a CDN.
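One practical way to check is to inspect the site’s HTTP response headers. As a minimal sketch (run from the browser console on the site itself, since browsers hide most headers on cross-origin requests), something like the following prints them; header names such as “server”, “via”, “x-cache” or “cf-cache-status” often betray a CDN:


fetch(window.location.href, { method: 'HEAD' })
  .then(response => {
    // Print every response header the browser exposes;
    // names like "server", "via", "x-cache" or "cf-cache-status"
    // are common fingerprints of a CDN in front of the site.
    for (const [name, value] of response.headers.entries()) {
      console.log(`${name}: ${value}`);
    }
  });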


Browser Compatibility

When creating a website, you need to consider the variety of browsers in use today (Chrome, Edge, Firefox, Safari, etc.). While most users are on modern browsers, some still rely on older ones such as Internet Explorer. Each browser renders websites differently, which can break the display of some pages. SEO technicians should therefore consider the limitations of each browser, know which browsers their target audience uses, and perform an SEO audit to verify that the domain displays correctly in each of them.
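One common coping technique, sketched here with an example API of our own choosing (the article does not prescribe any), is feature detection in JavaScript:


// Use a modern API only when the browser supports it,
// and fall back gracefully otherwise.
if ('IntersectionObserver' in window) {
  // e.g. lazy-load images as they scroll into view
  console.log('modern browser: lazy-loading enabled');
} else {
  // e.g. load everything up front on older browsers
  console.log('older browser: loading all images immediately');
}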


Robot Directives

Robots.txt is a text file stored at the root of the website that instructs search robots how to crawl it. In the header of each page of the site you can also place a robots meta tag, a “piece” of HTML that tells search engines how to crawl or index that particular page:


<meta name="robots" content="noindex" />


In robots.txt itself you can set crawling directives for the website as a whole. For this purpose there is the “Allow” directive, which tells search robots what they may crawl. The code below allows “JavaScript” and “CSS” files to be crawled and parsed:


Allow: /*.js$

Allow: /*.css$


You can also set the “User-agent” directive, which determines which search robot the rules are addressed to, for example “Googlebot”:


User-agent: Googlebot


Additionally, there is the “Disallow” directive, which is used to prevent search robots from crawling a page or directory, in this case “beta.php” and the “/arquivos/” (files) folder:


Disallow: /beta.php

Disallow: /arquivos/


Finally, there is the “Sitemap” directive, which points robots to the website’s sitemap and is very useful for helping them discover all the pages that exist in the domain. You can also submit the sitemap directly in Google Search Console (formerly Google Webmaster Tools), which handles this more effectively for Google, but the directive is still worth keeping for other crawlers.
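Putting the directives above together, a minimal robots.txt could look like this (the domain is a placeholder, reusing the earlier example paths):


User-agent: Googlebot
Allow: /*.js$
Allow: /*.css$
Disallow: /beta.php
Disallow: /arquivos/

Sitemap: https://www.example.com/sitemap.xml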




Redirects

Redirects are used in the pages of a site, more specifically on URLs/links, and serve to forward those links to another page or piece of content.


There are several types of redirects:

  • 300 – Multiple Choices: the server offers several possible resources and the client or user must choose one (a manual redirect);

  • 301 – Moved Permanently;

  • 302 – Found: the resource has moved temporarily;

  • 303 – See Other: the redirect does not lead to the requested resource itself but to another page, such as a confirmation page (for example, a form thank-you page);

  • 304 – Not Modified: used for cache revalidation; it indicates that the cached copy is still fresh and can be used;

  • 307 – Temporary Redirect: like 302, but the request method and body must not change;

  • 308 – Permanent Redirect: like 301, but the request method must not change.
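To make the two most common cases concrete, here is a minimal sketch of a server issuing a 301 and a 302; the Node.js server and the paths are our own illustrative assumptions, not something this article’s stack prescribes:


const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/old-page') {
    // Permanent move: browsers and search engines may cache it
    // and transfer signals to the new URL.
    res.writeHead(301, { Location: '/new-page' });
    res.end();
  } else if (req.url === '/promo') {
    // Temporary move: clients should keep requesting the original URL.
    res.writeHead(302, { Location: '/landing-page' });
    res.end();
  } else {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('OK');
  }
}).listen(8080);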


HTML Redirects

To create a redirect in the HTML of a page, developers can add a “meta” element with the “http-equiv” attribute set to “refresh” in the page header. When displaying the page, the browser finds this element and navigates to the indicated page. The “content” attribute starts with a number indicating how many seconds the browser should wait before redirecting to the provided URL; always set this to 0:

<head><meta http-equiv="refresh" content="0; URL=http://www.example.com/" /></head>


JavaScript Redirects

JavaScript redirects are created by assigning a new URL to the “window.location” property, which causes the new page to load. This redirect only works on clients that run JavaScript:


window.location = "http://www.example.com/";


Example code for the redirection approach:


fetch(`/api/products/${productId}`)
  .then(response => response.json())
  .then(product => {
    if (product.exists) {
      // show the product information on the page
      showProductDetails(product);
    } else {
      // this product does not exist, so this is an error page:
      // redirect to the 404 page on the server
      window.location.href = '/not-found';
    }
  });


Example code for the noindex tag approach:


fetch(`/api/products/${productId}`)
  .then(response => response.json())
  .then(product => {
    if (product.exists) {
      // show the product information on the page
      showProductDetails(product);
    } else {
      // this product does not exist, so this is an error page.
      // Note: this example assumes there is no other meta robots tag
      // present in the HTML.
      const metaRobots = document.createElement('meta');
      metaRobots.name = 'robots';
      metaRobots.content = 'noindex';
      document.head.appendChild(metaRobots);
    }
  });


Redirection Loops

Redirect loops happen when a chain of redirects leads back to a URL that has already been followed: a redirect to a certain URL already existed, and that URL was itself redirected to a new page. Most of the time this is a server problem; if you encounter the error right after modifying a server configuration, a redirect loop is the likely cause (browsers typically report it with an error such as ERR_TOO_MANY_REDIRECTS).
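As a hypothetical illustration, two conflicting rules like these in an Apache .htaccess are already enough to create a loop:


# Requesting /page-a now bounces between the two URLs
# until the browser gives up.
Redirect 301 /page-a /page-b
Redirect 301 /page-b /page-a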


In other cases it may be a browser-side problem caused by cookies; in that case, clear the browser’s cookies and cache manually. If you are on WordPress, another option that can solve the problem is to deactivate all the plugins on your site and reactivate them one by one, since a plugin may be causing this type of error.


In other cases it is related to the security protocol (SSL): set SSL back to the default settings and refresh the page. If the error message is still there, try resetting your SSL certificate completely. You can also check the .htaccess file, which controls most page redirects, and reset it with an FTP client (or the hosting control panel):

  • Use your FTP client to find your site files;

  • Access the WordPress file folder via the “Online File Manager”;

  • Go to your htdocs folder;

  • Find the .htaccess and download it (in case you need to restore it later);

  • Right-click it to edit it in the web text editor;

  • Reset the default settings (for a standard WordPress install, see the default rules below); then save and update your site;

  • If that hasn’t fixed the error, you can restore the .htaccess file you downloaded earlier.
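For reference, the default rules of a standard WordPress .htaccess are the following:


# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress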






Redirects In Common Servers

In “Apache”, redirects can be defined in the server configuration or in the “.htaccess” file of each directory. Here, the URL “http://example.com/” will be redirected to “http://www.example.com/”:


<VirtualHost *:80>
  ServerName example.com
  Redirect / http://www.example.com/
</VirtualHost>

 

“RedirectMatch” does the same, but uses a regular expression to define the collection of URLs that are affected; for example, all documents in the “images/” folder will be redirected to a different domain:

RedirectMatch ^/images/(.*)$ http://images.example.com/$1

 

In “Nginx” you create a server block specific to the content you want to be redirected:


server {
  listen 80;
  server_name example.com;
  return 301 $scheme://www.example.com$request_uri;
}


To apply a redirect only to a folder or a subset of pages, use the “rewrite” directive; the “redirect” flag issues a temporary (302) redirect, while “permanent” issues a 301:


rewrite ^/images/(.*)$ http://images.example.com/$1 redirect;

rewrite ^/images/(.*)$ http://images.example.com/$1 permanent;


Redirects are a way to forward traffic (and search engine bots) from one URL to another, which is extremely important for a site. Bear in mind, however, that a URL should always be redirected to the content most similar to it: if the destination is not relevant, Google may treat the redirect as a soft 404. How redirects are implemented depends on the CMS you use: through the site’s .htaccess file, by adding a server block to the nginx.conf file, or, in WordPress, with plugins such as “Easy Redirects Manager” or “Rank Math”, which bundles a number of SEO-related tools.
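For instance, assuming an Apache server, a single-page 301 in the site’s .htaccess is one line (paths and domain are placeholders):


Redirect 301 /old-page https://www.example.com/new-page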


HTTP Response Status Codes

HTTP response status codes indicate whether an HTTP request has completed correctly or not. Responses are grouped into five classes:

  • Informational responses (100–199);

  • Successful responses (200–299);

  • Redirects (300–399);

  • Client errors (400–499);

  • Server errors (500–599).
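As a quick, minimal sketch for checking the status code a page on your own site returns (the path is a placeholder), you can run this in the browser console:


fetch('/some-page')
  .then(response => {
    // response.status is the final HTTP status code
    // after any redirects have been followed.
    console.log(response.status, response.ok);
  });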


Canonicalization

Usually a domain can be reached through several URL combinations, which lets users access the same content in different ways within one domain, for example:


http://suaempresa.com/

https://www.suaempresa.com/

https://www.suaempresa.com/index.html


However, when there are multiple versions of the same page, Google will select only one to store in its index. This process is called canonicalization, and the URL selected as canonical is the one Google will show in the search results. The easiest way to see how Google has indexed a page is the URL Inspection tool in Google Search Console, which shows the canonical URL Google has selected.



There are two ways to canonicalize the site: through .htaccess or by organizing the links of the whole site manually. For example, to redirect the non-www version to the www version in .htaccess, you put the following code (using “example.com” as a placeholder domain):


RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) https://www.example.com/$1 [R=301,L]


The second option is canonical tags, which inform search engines that the URL in question is the master copy of a page: rel="canonical". Canonical tags are placed in the header of a page:


<link rel="canonical" href="https://www.website.com/pagina/" />


You can also rely on canonical URLs, the URL chosen as the primary one for a set of duplicate pages. In WordPress you can use the “Yoast SEO” or “Rank Math” plugins for these canonicalizations: on any page or post, open the “Advanced” tab of the Rank Math meta box and set the page’s canonical URL in the “Canonical URL” field.




 

Conclusion

In this article, Part 2 of our practical technical SEO series, we covered the remaining crawling techniques. In Part 3 we will talk about site architecture, Schema, JavaScript, and APIs.


Many companies today need immediate results, but the truth is that they cannot implement SEO in-house while staying focused on the priorities of their core business. If you can’t handle these steps yourself or don’t have the time to put them in place, Bringlink SEO ensures you get the brand visibility and growth you deserve.


Talk to us: send an email to bringlinkseo@gmail.com.


 






