Sitemap generator partially finding list of website URLs

Taysir

Hi everyone,

When I create my XML sitemap here, the generator only detects a portion of the website. I am missing at least 20 URLs (blog pages + newly created resource pages). I have checked those missing URLs: all of them are indexed and none of them are blocked by robots.txt.

Any idea why this is happening? I need to make sure all the URLs I want are included in the XML sitemap.

Thanks!

Taysir

Gaston,

Interestingly enough, by default the generator located only half of the URLs. I hope that one of those 2 fields will do the trick.

Gaston Riera

Hi Taysir,

I've never used that service. I suspect that the section you refer to should do the trick.
I believe you know how many URLs there are in the whole site, so you can compare how many pro-sitemaps.com finds against your own numbers.
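
To make that comparison concrete, here is a minimal Python sketch. It assumes you have the generated sitemap as a string and your own list of expected URLs; the function names and the example.com URLs are hypothetical placeholders, not anything specific to pro-sitemaps.com.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_in_sitemap(sitemap_xml):
    """Return the set of <loc> values found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def missing_urls(expected, sitemap_xml):
    """Return expected URLs that the generated sitemap does not contain."""
    return sorted(set(expected) - urls_in_sitemap(sitemap_xml))


# Hypothetical generator output, standing in for the real file.
example_sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/blog/</loc></url>
</urlset>"""

# Hypothetical list of URLs you know exist on the site.
expected_urls = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/resources/new-page/",
]

for url in missing_urls(expected_urls, example_sitemap):
    print("Missing from sitemap:", url)
```

The comparison is an exact string match, so a trailing slash or an http/https difference will show up as a missing URL.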

Best luck!
GR

Taysir

Thanks for your response, Gaston. These pages are definitely not blocked by the robots.txt file. I think it is an internal linking problem. I actually subscribed to pro-sitemaps.com and was wondering if I should use this section to add the sitemap URLs that are missing: https://cl.ly/0k0t093f0Y1T

Do you think this would do the trick?

TucsonAZWebDesign

Google provides a basic template, so you could build the sitemap manually if you wished. In addition, this link has Google listing several dozen open source sitemap generators.
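
If you go the manual route, a short script can write the file for you. This is only a sketch using Python's standard library; `build_sitemap` and the example.com URLs are hypothetical, and it emits just the required `<loc>` element for each URL.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Build a minimal XML sitemap string from an iterable of absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # dict.fromkeys drops duplicates while keeping the original order.
    for url in dict.fromkeys(urls):
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical URLs, including the pages the generator missed.
pages = [
    "https://www.example.com/",
    "https://www.example.com/blog/first-post/",
    "https://www.example.com/resources/new-page/",
]

with open("sitemap.xml", "w", encoding="utf-8") as handle:
    handle.write(build_sitemap(pages))
```

The sitemap protocol caps a single file at 50,000 URLs, so a larger site would need several files plus a sitemap index.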

If Google Webmaster Tools can't fully read the one you generated, then an alternate generator should fix that for you. Good luck!

Gaston Riera

Hi Taysir!

Have you tried any other crawler to check whether those pages can be found?
I'd strongly suggest the Screaming Frog spider; the free version lets you crawl up to 500 URLs. It also has a feature to create sitemaps from the crawled URLs, although I don't know whether that is available in the free version.
Here is some info about that feature: XML sitemap generator - Screaming Frog

The usual reasons pages are not findable are:

Poor internal linking
Not having a sitemap (which is how you noticed the problem)
Blocked resources in robots.txt
Blocked pages with robots meta tag

That being said, it's completely normal for Google to have indexed pages that you can't find in an ad hoc crawl, because Googlebot could have found those pages through external links.
Also keep in mind that blocking pages with robots.txt or a robots meta tag will not prevent them from being indexed, nor will adding rules to block them make them drop out of the index.
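
The two robots checks from the list above can be scripted. This is a rough sketch using only Python's standard library; it works on text you have already downloaded (the robots.txt body and a page's HTML), and the helper names and example.com URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def blocked_by_robots_txt(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text disallows the URL for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def has_noindex(html):
    """Return True if the page HTML carries a noindex (or none) robots directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives or "none" in parser.directives


# Hypothetical inputs standing in for the real robots.txt and page source.
robots_txt = "User-agent: *\nDisallow: /private/\n"
page_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(blocked_by_robots_txt(robots_txt, "https://www.example.com/blog/post/"))
print(has_noindex(page_html))
```

This does not look at the X-Robots-Tag HTTP header, which can carry the same directives, so a page that passes both checks could still be excluded that way.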

Hope it helps.
Best luck
GR
