XML sitemap generator only crawling 20% of my site
-
Hi guys,
I am trying to submit the most recent XML sitemap but the sitemap generator tools are only crawling about 20% of my site. The site carries around 150 pages and only 37 show up on tools like xml-sitemaps.com. My goal is to get all the important URLs we care about into the XML sitemap.
How should I go about this?
Thanks
-
I believe it's not a significant issue if the sitemap encompasses the core framework of your website. As long as the sitemap is well-organized, omitting a few internal pages is acceptable since Googlebot will crawl all pages based on the sitemap. Take a look at this example page (https://convowear.in) that also excludes some pages, yet it doesn't impact the site crawler's functionality.
-
Yes, Yoast on WordPress works fine for sitemap generation. I would also recommend it. I use it on all of my blog sites.
-
If you are using WordPress, then I would recommend using the Yoast plugin. It generates the sitemap automatically and keeps it updated. I am also using it on my blog.
-
I'm using the Yoast SEO plugin for my website. It generates the sitemap automatically.
-
My new waterproof tent reviews blog is facing the same crawling problem. How can I fix that?
-
Use Yoast or Rank Math to fix it.
SEO training in Isfahan: https://faneseo.com/seo-training-in-isfahan/
-
Patrick wrote a list of reasons why Screaming Frog might not be crawling certain pages here: https://moz.com/community/q/screamingfrog-won-t-crawl-my-site#reply_300029.
Hopefully that list can help you figure out your site's specific issue.
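One possible cause, offered as an assumption rather than a diagnosis of this site: crawler-based generators usually discover pages by following internal links from a start URL, so any page that nothing links to is never found. A minimal sketch of that behavior follows, using a made-up in-memory link graph (the paths and the `discover` helper are hypothetical, not part of any tool mentioned here):

```python
from collections import deque


def discover(start, links):
    """Breadth-first walk over a link graph, similar in spirit to how a
    crawler-based sitemap generator finds pages from a start URL."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: "/orphan" exists, but no other page links to it.
links = {
    "/": ["/about", "/blog"],
    "/blog": ["/blog/post-1"],
    "/orphan": ["/"],
}

# Every page that exists, whether or not it is reachable from "/".
all_pages = set(links) | {t for targets in links.values() for t in targets}
found = discover("/", links)
missing = sorted(all_pages - found)
print(missing)
```

In this toy graph the walk never reaches "/orphan". Comparing a crawl result against a full URL list from your CMS or database, in the same way, would show which pages lack internal links.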
-
This doesn't really answer my question of why I am not able to get all links into the XML sitemap when using xml sitemap generators.
-
I think it's not a big deal if the sitemap covers the main structure of your site. If your sitemap is well structured, then missing some internal pages is acceptable because Googlebot will crawl all of your pages based on your sitemap. You can see the following page, which also doesn't cover all of its pages, but this has no effect on how the site is crawled.
-
Thanks Boyd, but unfortunately I am still missing a good chunk of URLs here and I am wondering why. Do those tools rely on internal links to find these pages?
-
Use Screaming Frog to crawl your site. It is free to download the software and you can use the free version to crawl up to 500 URLs.
After it crawls your site you can click on the Sitemaps tab and generate an XML sitemap file to use.
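If a crawler still misses pages and you already have the full list of URLs (for example, exported from your CMS), the sitemap can also be written directly without crawling. This is only a sketch under that assumption: the `build_sitemap` helper and the example.com URLs are hypothetical, and it emits only the required `<loc>` element for each entry.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical URL list; replace with the pages you want indexed.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/products?id=1&color=red",
])
print(sitemap_xml)
```

Saving the returned string as sitemap.xml (UTF-8 encoded) at the site root and submitting it in Search Console would then cover every URL in the list, whether or not it is linked internally.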