XML sitemap generator only crawling 20% of my site
-
Hi guys,
I am trying to submit the most recent XML sitemap but the sitemap generator tools are only crawling about 20% of my site. The site carries around 150 pages and only 37 show up on tools like xml-sitemaps.com. My goal is to get all the important URLs we care about into the XML sitemap.
How should I go about this?
Thanks
-
I believe it's not a significant issue if the sitemap encompasses the core framework of your website. As long as the sitemap is well-organized, omitting a few internal pages is acceptable since Googlebot will crawl all pages based on the sitemap. Take a look at the example page (https://convowear.in), which also excludes some pages, yet this doesn't impact the site crawler's functionality.
-
Yes, Yoast on WordPress works fine for sitemap generation. I would also recommend it. I use it on all of my blog sites.
-
If you are using WordPress, I would recommend the Yoast plugin. It regenerates the sitemap automatically. I also use it on my blog.
-
I'm using the Yoast SEO plugin for my website. It generates the sitemap automatically.
-
My new waterproof tent reviews blog is facing a crawling problem. How can I fix that?
-
Use Yoast or Rank Math to fix it.
SEO training in Isfahan: https://faneseo.com/seo-training-in-isfahan/
-
Patrick wrote a list of reasons why Screaming Frog might not be crawling certain pages here: https://moz.com/community/q/screamingfrog-won-t-crawl-my-site#reply_300029.
Hopefully that list can help you figure out your site's specific issue.
-
This doesn't really answer my question of why I am not able to get all links into the XML sitemap when using xml sitemap generators.
-
I think it's not a big deal if the sitemap covers the main structure of your site. If your sitemap is well structured, then missing some internal pages is acceptable, because Googlebot will crawl all of your pages based on your sitemap. You can see the following page, which also doesn't cover all of its pages, yet that has no effect on how the site is crawled.
-
Thanks Boyd, but unfortunately I am still missing a good chunk of URLs here and I am wondering why. Do those tools rely on internal links in order to find these pages?
-
Use Screaming Frog to crawl your site. It is free to download the software and you can use the free version to crawl up to 500 URLs.
After it crawls your site you can click on the Sitemaps tab and generate an XML sitemap file to use.
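On the question of whether these tools rely on internal links: crawl-based generators generally start from one URL and follow the links they find, so a page that nothing links to is never discovered. Below is a minimal Python sketch of that idea, not how xml-sitemaps.com or Screaming Frog are actually implemented. The names `discover`, `build_sitemap` and `SITE` are made up for illustration, and the "site" is an in-memory dictionary rather than real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import xml.etree.ElementTree as ET


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover(start_url, fetch):
    """Breadth-first crawl from start_url, following same-host links only.

    `fetch` is a caller-supplied function (an assumption of this sketch):
    it takes a URL and returns the page HTML, or None if the page
    cannot be loaded.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # URLs already queued
    pages = []                  # URLs that actually loaded
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(pages)


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical five-page site; no page links to /orphan.
SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="/blog/">Blog</a>',
    "https://example.com/about": '<a href="/">Home</a>',
    "https://example.com/blog/": '<a href="post-1">Post 1</a>',
    "https://example.com/blog/post-1": "",
    "https://example.com/orphan": "<p>No page links here</p>",
}

found = discover("https://example.com/", SITE.get)
print(build_sitemap(found))  # the /orphan page is never reached
```

If a generator returns far fewer pages than you expect, comparing its output against your own full list of URLs (for example an export from your CMS) should show which pages are unreachable by links. Those are the ones to link internally or add to the sitemap by hand.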