Sitemap generator partially finding list of website URLs
-
Hi everyone,
When creating my XML sitemap here it is only able to detect a portion of the website. I am missing at least 20 URLs (blog pages + newly created resource pages). I have checked those missing URLs and all of them are index and they're not blocked by the robots.txt.
Any idea why this is happening? I need to make sure all wanted URLs to be generated in an XML sitemap.
Thanks!
-
Gaston,
Interestingly enough by default the generator only located only half of the URLs. I hope that one of those 2 fields will do the trick.
-
Hi Taysir,
I´ve never used that service. I suspect that the section you refer to should do the trick.
I believe that you do know how many URLs there are in the whole site, so you can compare how much pro-sitemaps.com finds to your numbers.Best luck!
GR -
Thanks for your response Gaston. These pages are definitely not blocked by the robots.txt file. I think that it is an internal linking problem. I actually subscribed to pro-sitemap.com and was wondering if I should use this section and add remaining sitemap URLs that are missing: https://cl.ly/0k0t093f0Y1T
Do you think this would do the trick?
-
Google not only provides a basic template you could do the sitemap manually if you wished, and this link has Google listing several dozen open source sitemap generators.
If Google Webmaster's can't read the one you generated fully, then clearly an alternate generator should definitely fix that for you. Good luck!
-
Hi taysir!
Have you tried any other crawler to check whether those pages can be finded?
I'd strongly suggest you Screaming Frog spider, the free version allows you up to 500 URLs. Also, it has a feature to create sitemaps from the crawled URLs. Even though dont know if that available in the free version.
Here some info about that feature: XML sitemap genetator - Screaming FrogUsual issues in not being findable are:
- Poor internal linking
- Not having a sitemap (this is why you find out)
- Blocked resources in robots.txt
- Blocked pages with robots meta tag
That being said, its completely normal that Google has indexed pages that you cant find in a AdHoc crawl, that is because GoogleBot could have found those pages from external linking.
Also keep in mind that having pages blocked with Robots.txt or robots meta tag will not prevent that page from being indexed nor will make them deindex if you add some rules to block them.Hope it helps.
Best luck
GR
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Internal link is creating duplicate content issues and generating 404s from website crawl.
Not sure what the best way to describe it but the site is built with Elementor page builder. We are finding out that a feature that is included with a pop modal window renders an HTML code as so: Click So when crawled I think the crawling is linking itself for some reason so the crawl returns something like this: xyz.com/builder/listing/ - what we want what we don't want xyz.com/builder/listing/ xyz.com/builder/listing/%23elementor-action%3Aaction%3Dpopup%3Aopen%26settings%3DeyJpZCI6Ijc2MCIsInRvZ2dsZSI6ZmFsc2V9/ xyz.com/builder/listing/%23elementor-action%3Aaction%3Dpopup%3Aopen%26settings%3DeyJpZCI6Ijc2MCIsInRvZ2dsZSI6ZmFsc2V9//%23elementor-action%3Aaction%3Dpopup%3Aopen%26settings%3DeyJpZCI6Ijc2MCIsInRvZ2dsZSI6ZmFsc2V9/ so you'll notice how that string in the HREF is appended each time and it loops a couple times. Could I 301 this issue, what's the best way to go about handling something like this? It's causing duplicate meta descriptions/content errors for some listing pages we have. I did add a rel='nofollow' to the anchor tag with JavaScript but not sure if that'll help.
Technical SEO | | JoseG-LP0 -
Trailing Slashes on URLs
Hi everyone I have a question on trailing slashes in URL. The crux of it is this: is having both: example.com/subdirectory/ and: example.com/subdirectory on all of your subdirectories considered duplicate content by Google - or in some other way really bad? We have done a heck a lot of research into this, and it would seem...no one knows for sure (it is easy to get lost in a sea of Webmaster tool forums from 2012). Google itself has both URLs for it's subdirectories (try https://www.google.co.uk/maps and https://www.google.co.uk/maps/) as does Moz; and yet there are some rumblings on the internet of people who think you must put a 'redirect' (although not really a redirect as it isn't a 301) in your htaccess file to one or the other (so for example.com/subdirectory/ would 'forward' to example.com/subdirectory); and this is what bbc.co.uk do. We tried putting this htaccess 'forward' in as an experiment, but I noticed our site then stopped being fully crawled by Google bot, so we reversed it. Can any one shed any light?
Technical SEO | | NickOrbital0 -
Having Problems to Index all URLs on Sitemap
Hi all again ! Thanks in advance ! My client's site is having problems to index all its pages. I even bought the full extension of XML Sitemaps and the number of urls increased, but we still have problems to index all of them. What are the reasons? The robots.txt is open for all robots, we only prohibit users and spiders to enter our Intranet. I've read that duplicate content and 404's can be the reason. Anything else?
Technical SEO | | Tintanus0 -
Do the terms in a website url drive search hits
I've tried to do a search on a few key words that I knew was on my landing page and I couldn't get Google to find it. So I thought maybe I needed to change my url to reflect a few the terms.
Technical SEO | | Toal0 -
Website dropping in ranking
Hello My website is www.invitationsforless.ie and on google Ireland it was ranking no 6 for the keyword "wedding invitations" and I was doing quite well from this but it has now has moved down along the rankings to no 10 and I have gone from the first page on google to the second page which is very disappointing. I did recently change the blurb on my homepage - would this have effected it. Please help I don't know what to do Thanks Linda
Technical SEO | | invitationsforless0 -
Duplicate content with same URL?
SEOmoz is saying that I have duplicate content on: http://www.XXXX.com/content.asp?ID=ID http://www.XXXX.com/CONTENT.ASP?ID=ID The only difference I see in the URL is that the "content.asp" is capitalized in the second URL. Should I be worried about this or is this an issue with the SEOmoz crawl? Thanks for any help. Mike
Technical SEO | | Mike.Goracke0 -
Sitemap Creation
Hi I am looking for the best way to generate an XML sitemap for webmaster tools for my website http://www.cheapfindergames.com. I have come across http://www.xml-sitemaps.com/ but it only allows up to 500 links. Is there a PHP script that any experts could share that would create the XML map that I could upload please? Many Thanks
Technical SEO | | ocelot0 -
How to attach at text to image that other websites use from my website
I often have other websites link to my website. They will do this with an image that they pull off of my website. (actually my website continues to serve the image). These inbound links are great, but they don't have alt text. Is there a way for me to attach alt text to the images, or is this something the other website needs to code themselves?
Technical SEO | | EugeneF0