A robots.txt file tells crawlers such as Googlebot which pages and URLs they may or may not crawl. In this article we will look at the robots.txt format and how it controls crawling.
What is robots.txt?
Search engines like Google, Bing, and Yahoo use a special program that travels across the internet, moving from one website to another to collect information about each site. This program is called a spider, crawler, or bot.
In the early days of the internet, computing power and memory were both expensive, and some website owners were troubled by search engine crawlers. Servers at that time were not powerful enough to handle robots visiting every page again and again, so servers would often go down, results would not be shown to users, and the website's resources would be exhausted.
To solve this, website owners came up with an idea: robots.txt. It is a file placed in the root directory of the website that instructs search engines which pages they are allowed to crawl and which they are not.
When robots visit your site, they follow the instructions in your robots.txt file. If a robot does not find a robots.txt file, it crawls the whole website, which can put unnecessary load on your site and cause the problems described above.
User-agent: *
Disallow: /
The User-agent line can also name a specific crawler. Some well-known user agents are Googlebot (Google's crawler) and Amazonbot (Amazon's crawler).
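As a quick sketch of how a well-behaved crawler reads these rules, Python's standard `urllib.robotparser` module can parse them the same way. The site URL below is just a placeholder:

```python
from urllib import robotparser

# A robots.txt that blocks every crawler from the whole site
rules = """\
User-agent: *
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot falls under the wildcard (*) rule, so it may not fetch anything
print(parser.can_fetch("Googlebot", "https://www.example.com/page.html"))  # False
```

Any crawler that respects the Robots Exclusion Protocol makes the same allow/disallow decision before requesting a page.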
How does it affect SEO?
Today about 98% of search traffic is handled by Google, so let's talk about Google only. Google assigns each website a crawl budget, which decides how much time its crawlers spend on your site.
The crawl budget depends on two things:
1- Server speed at crawl time, meaning whether your website loads slowly while a robot is visiting it.
2- How popular your website is and how much content it has. Google wants to keep itself updated, so it crawls popular, content-rich websites first and most often.
To make good use of robots.txt, keep your website well maintained; if there is any file you do not want crawled, you can block it with robots.txt.
What is the format of the robots.txt file?
If you want to stop a page from being crawled, for example one that contains your employee details or duplicate content, you can disallow it with the help of the robots.txt file.
for example,
your website name - https://www.yourwebsite.com
your folder name - sample
your page name - sample.html
Then you can block the page with robots.txt:
User-agent: *
Disallow: /sample/sample.html
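As a sketch (using the placeholder folder and page names above), you can block either the single page or the whole folder:

```text
# Block only the one page
User-agent: *
Disallow: /sample/sample.html

# Block the entire folder and everything inside it
User-agent: *
Disallow: /sample/
```

Note that a Disallow path ending in `/` covers the whole folder, while a full path covers just that one page.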
How to fix blocked by robots.txt?
If Google Search Console shows "blocked by robots.txt" in the Excluded section and you are worried about it, then I have a solution for you.
Friends, when Google Search Console shows "blocked by robots.txt", it means some of your pages or URLs have an issue. Let's learn how to fix this problem.
- Go to your blog
- Click Settings
- Click on Custom robots.txt
- Turn it on
- Paste your robots.txt content
- Save.
How to get your website's robots.txt file?
- Go to a sitemap generator
- Paste your website URL
- Click on Generate Sitemap
- Copy the generated sitemap code
- Paste it into your robots.txt section.
Disallow: /search
Disallow: /category/
Disallow: /tags/
Allow: /
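Putting the directives together, a typical robots.txt of this kind might look like the sketch below (the sitemap URL is a placeholder for your own):

```text
User-agent: *
Disallow: /search
Disallow: /category/
Disallow: /tags/
Allow: /

Sitemap: https://www.yourwebsite.com/sitemap.xml
```

The Sitemap line is optional but helps crawlers find all your allowed pages.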
After these settings, go to the custom robots header tags:
- Enable custom robots header tags
- Home page tags: turn on "all" and "noodp"
- Archive and search page tags: turn on "noindex" and "noodp"
- Post and page tags: turn on "all" and "noodp"
Once you have completed these settings, Google's crawlers will index your pages; this can take a few days or weeks.
How does Google bot work?
Googlebot visits your website and looks for the robots.txt file. It will not crawl the pages that are disallowed; the pages that are allowed will be crawled and then indexed. After crawling and indexing are finished, your website can rank in the search engine.
How can you check your website's robots.txt file?
It's simple: type your website address followed by /robots.txt into your browser or the Google search bar:
type:
https://yourwebsite.com/robots.txt
Conclusion
In the article above, I did my best to explain what robots.txt is, how it affects SEO, the format of the robots.txt file, how to fix "blocked by robots.txt", how to get your website's robots.txt file, and finally how Googlebot works. robots.txt is needed to give instructions to the Google bot.
Hopefully, I have managed to clear up your every doubt and query through this article. If you have any suggestions for the article, you are free to share them.