Sitemap generator partially finding list of website URLs
-
Hi everyone,
When creating my XML sitemap here it is only able to detect a portion of the website. I am missing at least 20 URLs (blog pages + newly created resource pages). I have checked those missing URLs and all of them are index and they're not blocked by the robots.txt.
Any idea why this is happening? I need to make sure all wanted URLs to be generated in an XML sitemap.
Thanks!
-
Gaston,
Interestingly enough by default the generator only located only half of the URLs. I hope that one of those 2 fields will do the trick.
-
Hi Taysir,
I´ve never used that service. I suspect that the section you refer to should do the trick.
I believe that you do know how many URLs there are in the whole site, so you can compare how much pro-sitemaps.com finds to your numbers.Best luck!
GR -
Thanks for your response Gaston. These pages are definitely not blocked by the robots.txt file. I think that it is an internal linking problem. I actually subscribed to pro-sitemap.com and was wondering if I should use this section and add remaining sitemap URLs that are missing: https://cl.ly/0k0t093f0Y1T
Do you think this would do the trick?
-
Google not only provides a basic template you could do the sitemap manually if you wished, and this link has Google listing several dozen open source sitemap generators.
If Google Webmaster's can't read the one you generated fully, then clearly an alternate generator should definitely fix that for you. Good luck!
-
Hi taysir!
Have you tried any other crawler to check whether those pages can be finded?
I'd strongly suggest you Screaming Frog spider, the free version allows you up to 500 URLs. Also, it has a feature to create sitemaps from the crawled URLs. Even though dont know if that available in the free version.
Here some info about that feature: XML sitemap genetator - Screaming FrogUsual issues in not being findable are:
- Poor internal linking
- Not having a sitemap (this is why you find out)
- Blocked resources in robots.txt
- Blocked pages with robots meta tag
That being said, its completely normal that Google has indexed pages that you cant find in a AdHoc crawl, that is because GoogleBot could have found those pages from external linking.
Also keep in mind that having pages blocked with Robots.txt or robots meta tag will not prevent that page from being indexed nor will make them deindex if you add some rules to block them.Hope it helps.
Best luck
GR
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Changing Urls
Hi All, I have a question I hope someone can help me with. I ran a scan on a website and it has a stack of urls that are far too long. I am going through and changing the urls to shorter ones. But my question is regarding redirections. Wordpress seems to be automatically redirecting the old urls to the new ones, should i be adding a more solid 301 in as well or is the wordpress redirect enough? I ask as they dont all seem to stay redirecting Thanks in advance for the help
Technical SEO | | DaleZon2 -
Video sitemap
Hello, I'm no Wordpress developer so need a little help please. I have manually created a video sitemap. It needs to be uploaded to the website. Where should the .xml file be uploaded onto Wordpress? Which directory? Is it Ok to add the code to a notepad file and upload? I'm trying to avoid the plugin route if possible. Thanks
Technical SEO | | AL123al0 -
CMS Auto Generated Sitemap Work Around?
Hey Moz Community, The Shopify ecommerce platform auto generates xml sitemaps and robots.txt for you. Frustratingly there is no way to augment either of these. If I noindex on a page it will still show up in the site map... Causing inconstancy with the sitemap submitted to GWT. In theory if put the MY version of the sitemap on site and point GWT to MY version.. Would this solve the inconstancy ? Or would Googlebot go in and still crawl the default /sitemap.xml anyway? Any suggestions and insight is greatly appreciated!
Technical SEO | | paul-bold0 -
URL removals
Hello there, I found out that some pages of the site have two different URL's pointing at the same page generating duplicate content, title and description. Is there a way to block one of them? cheers
Technical SEO | | PremioOscar0 -
Website Redirects
Background information: We have a website (devicelock.com) which is currently our corporate website. The company use to operate under (ntutility.com) which is now being redirected to devicelock.com via a DNS Forward - 302 Redirect. The IT admin (a founder of the company) is reluctant to change it to a 301. The current flow is ntutility.com redirects to protect-me.com then redirects again to devicelock.com. When i search up Devicelock on google, it shows up as ntutlity.com. There is no devicelock.com homepage on google search. Question: Are there any negative implications about this? Is this hurting our SEO in any way? When i do link building, will this have any negative affects? Will my links for devicelock be attributed to devicelock.com?
Technical SEO | | Devicelock0 -
Optimizing Flash websites
Does anyone know the latest on how well Google is indexing flash sites? I'm talking to a prospect now whose site is 100% flash, but looking at it all the content within the site is indexed and returning in the results. This is promising enough, but does anyone know exactly how well Google is indexing flash content these days and are there still any problems with that? Thanks!
Technical SEO | | MattBarker0 -
Would you shorten this url, and if so how?
I designed the structure of my website way before I even thought about SEO. I run a website that requires me to categorize articles is somewhat deep nested categories so an example url would be as follows http://www.yakangler.com/articles/news/new-products/boats/item/1442-jackson-kayak-launches-the-big-tuna Would you shorten the url to somethign like this? http://www.yakangler.com/a/n/np/b/item/1442-jackson-kayak-launches-the-big-tuna If so how would you manage the redirects I'm unsure how to add a 301 redirect in my .htaccess file that wouldn't require me to add one for every single article. Could I do it with a rule that recognizes only the middle part of the url and redirect it accordingly? Thanks for any advice you might have!
Technical SEO | | mr_w0 -
Sitemaps for Google
In Google Webmaster Central, if a URL is reported in your site map as 404 (Not found), I'm assuming Google will automatically clean it up and that the next time we generate a sitemap, it won't include the 404 URL. Is this true? Do we need to comb through our sitemap files and remove the 404 pages Google finds, our will it "automagically" be cleaned up by Google's next crawl of our site?
Technical SEO | | Prospector-Plastics0