Sitemap generator partially finding list of website URLs
-
Hi everyone,
When creating my XML sitemap here it is only able to detect a portion of the website. I am missing at least 20 URLs (blog pages + newly created resource pages). I have checked those missing URLs and all of them are index and they're not blocked by the robots.txt.
Any idea why this is happening? I need to make sure all wanted URLs to be generated in an XML sitemap.
Thanks!
-
Gaston,
Interestingly enough by default the generator only located only half of the URLs. I hope that one of those 2 fields will do the trick.
-
Hi Taysir,
I´ve never used that service. I suspect that the section you refer to should do the trick.
I believe that you do know how many URLs there are in the whole site, so you can compare how much pro-sitemaps.com finds to your numbers.Best luck!
GR -
Thanks for your response Gaston. These pages are definitely not blocked by the robots.txt file. I think that it is an internal linking problem. I actually subscribed to pro-sitemap.com and was wondering if I should use this section and add remaining sitemap URLs that are missing: https://cl.ly/0k0t093f0Y1T
Do you think this would do the trick?
-
Google not only provides a basic template you could do the sitemap manually if you wished, and this link has Google listing several dozen open source sitemap generators.
If Google Webmaster's can't read the one you generated fully, then clearly an alternate generator should definitely fix that for you. Good luck!
-
Hi taysir!
Have you tried any other crawler to check whether those pages can be finded?
I'd strongly suggest you Screaming Frog spider, the free version allows you up to 500 URLs. Also, it has a feature to create sitemaps from the crawled URLs. Even though dont know if that available in the free version.
Here some info about that feature: XML sitemap genetator - Screaming FrogUsual issues in not being findable are:
- Poor internal linking
- Not having a sitemap (this is why you find out)
- Blocked resources in robots.txt
- Blocked pages with robots meta tag
That being said, its completely normal that Google has indexed pages that you cant find in a AdHoc crawl, that is because GoogleBot could have found those pages from external linking.
Also keep in mind that having pages blocked with Robots.txt or robots meta tag will not prevent that page from being indexed nor will make them deindex if you add some rules to block them.Hope it helps.
Best luck
GR
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
URL Parameters as pagination
Hi guys, due to some changes to our category pages our paginated urls will change so they will look like this: ...category/bagger/2?q=Bagger&startDate=26.06.2017&endDate=27.06.2017 You see they include a query parameter as well as a start and end date which will change daily. All URLs with pagination are on noindex/follow. I am worrying that the products which are linked from the category pages will not get crawled well when the URLs on which they are linked from change on a daily basis. Do you have some experience with this? Are there other things we need to worry about with these pagination URLs? cheers
Technical SEO | | JKMarketing0 -
White listing a site
A new clients site is blocked by a lot of Firewalls. And I can't work out why, the content is family friendly they sell nursery equipment. I've run it through the Google checker and there is no malicious software found on the site. Can anyone tell me what I need to do to get this site unblocked? The url is http://knuma.co.uk/
Technical SEO | | Marketing_Optimist0 -
Canonical URL on frontpage
I have a site where the CMS system have added a canonical URL on my frontpage, pointing to a subpage on my site. Something like on my domain root.Google is still showing MyDomain.com as the result in the search engines which is good, but can't this approach hurt my ranking? I mean it's basically telling google that my frontpage content is located far down the hierarki, instead of my domain root, which of course have the most authority.
Technical SEO | | EdmondHong87
Something seems to indicate that this could very well be the case, as we lost several placements after moving to this new CMS system a few months ago.0 -
Website cache?
Hi mozzers, I am conducting an audit and was looking at the cache version of it. The homepage is fine but all the other pages, I get a Google 404. I don't think this is normal. Can someone tell me more what could be the issue here? thanks
Technical SEO | | Ideas-Money-Art0 -
How could i create sitemap with 1000 page and should i update sitemap frequently?
My website have over 1000 pages but the sitemap creator tools i knew only create maximum 500 pages, how could i create sitemap with full of my webpage?
Technical SEO | | magician0 -
Question about construction of our sitemap URL in robots.txt file
Hi all, This is a Webmaster/SEO question. This is the sitemap URL currently in our robots.txt file: http://www.ccisolutions.com/sitemap.xml As you can see it leads to a page with two URLs on it. Is this a problem? Wouldn't it be better to list both of those XML files as separate line items in the robots.txt file? Thanks! Dana
Technical SEO | | danatanseo0 -
Find where the not selected pages are from
Hi all Can anyone suggest how I can find where gtoogle is finding approx. 1000 pages not to select? In round numbers I have 110 pages on the site site: searech shows all pages index status shows 110 slected and 1000 not selected. For the life of me I cannot fingure where these pages are coming from. I have set my prefered domain to www., setup 301 's to www. as per below RewriteCond %{HTTP_HOST} ^growingyourownveg.com$
Technical SEO | | spes123
RewriteRule ^(.*)$ "http://www.growingyourownveg.com/$1" [R=301,L] site is www.growingyourownveg.com any suggestions much appreciated Simon0 -
How to find out if I have been penalized?
I have launched a new website beginning January this year and have seen slowly more and more traffic coming from google to the website until the 20th of March where suddenly there are no more visitors from the google search engine. The only traffic left is from google images, social networks or other search engines. Without visitors from google search this reduces our overall traffic by ~66%. I can't easily find anymore our website in the search results of google by using terms which we usually ranked quite well. Nevertheless, the website is still indexed as I can find it using the "site:" search query. In google webmaster tools there are no messages and we have only been doing a bit of link building on website and blog directories (nothing excessive and nothing paid neither). Is there any way to find out if google penalized my website? I guess it has... and what would be the best thing to do right now? The website is hellasholiday (dot) com Thanks in advance for your idea and suggestions
Technical SEO | | socialtowards0