A sitemap is important for any site. Its basic job is to point search engines to the canonical URL for each page, so that every page on the site is identified by one unique URL. Problems arise when a site has a large number of automatically generated pages: in that situation the crawler can struggle to decide which pages to crawl and how to handle pagination within the site architecture.
The spider decides which pages to crawl and when, and it can take its priorities from the entries in the XML sitemap.
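As a rough illustration of what such an XML sitemap entry looks like, here is a minimal Python sketch that builds a small sitemap with the standard library. The URLs, dates, and priority values are made-up examples, and the `build_urlset` helper is a hypothetical name; `lastmod` and `priority` are optional hints in the sitemap protocol, and search engines are free to ignore them.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Return sitemap XML (as a string) for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod" and "priority" are optional.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        # Optional hints: only written when the entry provides them.
        if "lastmod" in entry:
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
        if "priority" in entry:
            ET.SubElement(url, "priority").text = f"{entry['priority']:.1f}"
    return ET.tostring(urlset, encoding="unicode")


# Example data (assumed, not taken from a real site).
example_xml = build_urlset([
    {"loc": "https://example.com/", "lastmod": "2024-01-15", "priority": 1.0},
    {"loc": "https://example.com/blog/", "priority": 0.5},
])
print(example_xml)
```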
Here is why the sitemap matters to search engines:
The sitemap protocol was adopted by the major search engines in 2006. After that, Google and Bing created webmaster tools that let users detect issues and errors on their sites. Google in particular offers a flexible webmaster dashboard in which users can easily identify the issues and errors affecting their projects, and that feature sets Google apart from other search engines.
Here are some guidelines that Google gives to webmaster dashboard users:
- A single sitemap can list at most 50,000 URLs.
- A single sitemap file can be at most 50 MB (uncompressed).
- In one account you can submit up to 500 sitemap index files per site.
Basic sitemap optimization means checking the URLs in the sitemap for the following:
- Duplicate URLs for the same site
- URLs that return a 3xx, 4xx, or 5xx status code
- Meta rel canonicals that are not self-referential
- Noindex meta robots tags
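The checks above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the page data is assumed to come from some earlier crawl, and the `audit_sitemap_entries` function and the dictionary keys (`url`, `status`, `canonical`, `robots`) are hypothetical names chosen for this example, not part of any tool.

```python
from collections import Counter


def audit_sitemap_entries(pages):
    """Return {url: [problems]} for entries that should not be in the sitemap.

    Each page is a dict with the (assumed) keys url, status, canonical, robots.
    """
    counts = Counter(page["url"] for page in pages)
    problems = {}
    for page in pages:
        url = page["url"]
        found = []
        # Check 1: the same URL is listed more than once.
        if counts[url] > 1:
            found.append("duplicate URL")
        # Check 2: redirects (3xx) and errors (4xx, 5xx).
        if page["status"] >= 300:
            found.append(f"status {page['status']}")
        # Check 3: a canonical tag that points at a different URL.
        canonical = page.get("canonical")
        if canonical and canonical != url:
            found.append("canonical is not self-referential")
        # Check 4: a noindex directive in the meta robots tag.
        if "noindex" in (page.get("robots") or "").lower():
            found.append("noindex meta robots tag")
        if found:
            problems[url] = found
    return problems


# Made-up crawl results for illustration.
pages = [
    {"url": "https://example.com/a", "status": 200,
     "canonical": "https://example.com/a", "robots": "index,follow"},
    {"url": "https://example.com/b", "status": 301,
     "canonical": None, "robots": ""},
    {"url": "https://example.com/c", "status": 200,
     "canonical": "https://example.com/a", "robots": "noindex"},
    {"url": "https://example.com/d", "status": 200,
     "canonical": None, "robots": ""},
    {"url": "https://example.com/d", "status": 200,
     "canonical": None, "robots": ""},
]
report = audit_sitemap_entries(pages)
for url, found in report.items():
    print(url, "->", ", ".join(found))
```

Any URL that appears in the report is a candidate for removal from the sitemap or for a fix on the page itself.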
Another strength of the sitemap is that it shows which pages and which content should be crawled by the spider, as well as which areas should not be offered to Googlebot.
Additional advanced Google searches, like site:example.com/blog/ inurl:tag and inurl:author, can then be run to determine the scale of potential excess crawling and indexation. The same approach applies to dynamic sites, for example to navigation, pagination, and product sorting URLs.
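A similar estimate can be made offline from a list of crawled URLs. The sketch below is a minimal Python example, assuming hypothetical URL patterns (`tag` and `author` path segments, `page` and `sort` query parameters); a real site may use different names, so the markers would need adjusting. Note that it matches whole path segments, which is stricter than the substring matching of inurl:.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical markers; adjust them to the site's real URL scheme.
PATH_MARKERS = ("tag", "author")
QUERY_MARKERS = ("page", "sort")


def classify(url):
    """Label a URL as tag, author, page, sort, or plain content."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    for marker in PATH_MARKERS:
        if marker in segments:
            return marker
    query = parse_qs(parsed.query)
    for marker in QUERY_MARKERS:
        if marker in query:
            return marker
    return "content"


# Made-up URLs for illustration.
urls = [
    "https://example.com/blog/my-post/",
    "https://example.com/blog/tag/seo/",
    "https://example.com/blog/author/jane/",
    "https://example.com/shop/?page=2",
    "https://example.com/shop/?sort=price",
    "https://example.com/shop/shoes/",
]
summary = Counter(classify(u) for u in urls)
for label, count in summary.items():
    print(label, count)
```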
Here are the sitemap index tag definitions:
- <sitemapindex>
Required – Encapsulates information about all of the Sitemaps in the file.
- <sitemap>
Required – Encapsulates information about an individual Sitemap.
- <loc>
Required – Identifies the location of the Sitemap.
This location can be a Sitemap, an Atom file, RSS file or a simple text file.
- <lastmod>
Optional – Identifies the time that the corresponding Sitemap file was modified.
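To show how these tags fit together, here is a minimal Python sketch that splits a large URL list into sitemap-sized chunks and builds a sitemap index pointing at one file per chunk. It assumes the protocol limit of 50,000 URLs per sitemap file; the file names (`sitemap-1.xml` and so on), the date, and the helper names `chunk` and `build_sitemap_index` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # assumed per-file limit from the protocol


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locations, lastmod=None):
    """Return sitemap index XML (as a string) for the given sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc  # required
        if lastmod:
            ET.SubElement(sitemap, "lastmod").text = lastmod  # optional
    return ET.tostring(index, encoding="unicode")


# Made-up site with 120,000 pages, which needs three sitemap files.
urls = [f"https://example.com/page/{n}" for n in range(120_000)]
parts = chunk(urls)
locations = [
    f"https://example.com/sitemap-{i}.xml" for i in range(1, len(parts) + 1)
]
index_xml = build_sitemap_index(locations, lastmod="2024-01-15")
print(index_xml)
```

Each chunk in `parts` would then be written to its own sitemap file at the matching location.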