What is Robots.txt?
Robots.txt is a plain text file containing a few lines of directives. It is saved on your website or blog's server and tells web crawlers how to crawl and index your blog for the search results. That means you can restrict any web page on your blog from web crawlers so that it does not get indexed in search engines, for example your blog's label pages, a demo page, or any other page that is not important enough to index. Always remember that search crawlers check the robots.txt file before crawling any web page.
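As a quick sketch of that behavior, here is how a well-behaved crawler performs the check in Python, using the standard urllib.robotparser module. The domain and the crawler name ("MyCrawler") are placeholders, not real values:

# Minimal sketch: a polite crawler consults robots.txt before fetching a page.
# "yourdomain.com" and "MyCrawler" are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://www.yourdomain.com/robots.txt")
rp.read()  # download and parse the site's live robots.txt

url = "http://www.yourdomain.com/search/label/yourlabel"
if rp.can_fetch("MyCrawler", url):
    print("Allowed to crawl:", url)
else:
    print("robots.txt forbids crawling:", url)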
How To Add a Custom Robots.txt to Your Blog (Blogger)
1) Log in to your Blogger account.
2) Open the blog to which you want to add the custom robots.txt.
3) Go to Settings.
4) Go to Search preferences.
5) Find the Crawlers and indexing section.
6) Next to Custom robots.txt, click Edit.
7) For "Enable custom robots.txt content?", choose Yes.
8) Copy the data below and paste it into the editor, changing "yourdomain.com" to your own Blogger domain name.
9) Click Save Changes.
10) That's it, you are done adding a custom robots.txt to your blog.
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://www.yourdomain.com/atom.xml?redirect=false&start-index=1&max-results=500
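Once saved, Blogger serves this file at http://yourdomain.com/robots.txt, and you can open that URL in any browser to confirm it took effect. As a small sketch, you could also fetch it with Python's standard urllib (again, yourdomain.com is a placeholder):

# Sketch: fetch and print the robots.txt your blog is actually serving.
from urllib.request import urlopen

with urlopen("http://www.yourdomain.com/robots.txt") as resp:
    print(resp.read().decode("utf-8"))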
What Do the Above Lines Mean?
Mediapartners-Google:
Mediapartners-Google is the user agent for Google AdSense, which is used to serve more relevant ads on your site based on your content. If you disallow this user agent, you will not see any ads on the blocked pages.
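As a hedged illustration, the sketch below (Python's standard urllib.robotparser; the URL and the second bot name are made up) shows that Mediapartners-Google can still fetch a page that the * group blocks:

from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Mediapartners-Google",
    "Disallow:",           # empty Disallow = allow everything
    "",
    "User-agent: *",
    "Disallow: /search",
]
rp = RobotFileParser()
rp.parse(rules)

url = "http://www.yourdomain.com/search/label/seo"
print(rp.can_fetch("Mediapartners-Google", url))  # True: AdSense may read it
print(rp.can_fetch("SomeOtherBot", url))          # False: caught by the * group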
User-agent: *
Now that you know what a user agent is, what is User-agent: *? A user-agent group marked with an asterisk (*) applies to all crawlers and robots, whether that is a Bing robot, an affiliate crawler, or any other client software.
Disallow:
By adding a Disallow rule you are telling robots not to crawl or index the matching pages. Below User-agent: * you can see Disallow: /search, which means you are disallowing your blog's search results by default: you are keeping crawlers out of the /search directory that comes right after your domain name. So a search page like http://yourdomain.com/search/label/yourlabel will not be crawled and will never be indexed.
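Here is a short sketch of that path matching, again with urllib.robotparser and a placeholder domain; anything under /search is refused, while ordinary post URLs pass thanks to the Allow: / line explained below:

from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /search",   # block the /search directory
    "Allow: /",            # allow everything else
]
rp = RobotFileParser()
rp.parse(rules)

base = "http://www.yourdomain.com"
print(rp.can_fetch("*", base + "/search/label/yourlabel"))  # False: under /search
print(rp.can_fetch("*", base + "/2024/05/some-post.html"))  # True: allowed by /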
Allow: /
Allow: / simply means you are specifically allowing search engines to crawl all the other pages, starting from your home page.
Sitemap:
The Sitemap line helps crawlers find and index all of your accessible pages, which is why the default robots.txt above specifically points them to your sitemap. There is one issue with the default Blogger sitemap: the plain atom.xml feed only lists your most recent posts, which is why the URL above adds ?redirect=false&start-index=1&max-results=500 so that up to your latest 500 posts are covered.
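On Python 3.8 or newer, urllib.robotparser can also report the sitemap a robots.txt declares, which is a quick way to sanity-check the line (a sketch; the domain is a placeholder):

from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /search",
    "Allow: /",
    "Sitemap: http://www.yourdomain.com/atom.xml?redirect=false&start-index=1&max-results=500",
]
rp = RobotFileParser()
rp.parse(rules)
print(rp.site_maps())  # list of declared sitemap URLs, or None if absent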