Sitemaps and Indexed Pages
-
Hi guys,
I created an XML sitemap and submitted it for my client last month.
Now the developer of the site has also been messing around with a few things.
I've noticed on my Moz site crawl that indexed pages have dropped significantly.
Before I put my foot in it, I need to figure out if submitting the sitemap has caused this. Can a sitemap reduce the number of pages indexed?
Thanks
David.
-
Thanks Eli!
I guess I was wondering if the Moz bot only followed pages that were in the sitemap. The sitemap was generated by Screaming Frog, and I have trusted it to include all relevant pages!
I have put a more detailed description in the response below. Overall I need to investigate further, but I'm satisfied that the sitemap has not caused the drop!
-
Thanks Martijn!
I guess I was wondering if the Moz bot only followed pages that were in the sitemap. The sitemap was generated by Screaming Frog, and I have trusted it to include all relevant pages!
To elaborate:
There were about 80,000 pages, and I used canonicals, noindex, and redirects to clean up a rather large mess of filter URLs and duplicate content.
That dropped the count to about 14k pages. Then I submitted the sitemap last month, and now the crawl only found 4k pages.
Further investigation is needed on my part, but I wanted to double-check that this sudden drop was not because of the sitemap! Thanks for clarifying that!
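For anyone following the same investigation, one way to see which pages went missing is to diff the sitemap against a crawl export. This is only a rough sketch: the sample sitemap, the example.com URLs, and the function names `sitemap_urls` and `compare` are all made up for illustration, and it assumes the crawl export has already been loaded as a set of URLs.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def compare(sitemap_set, crawled_set):
    """Split URLs into those only in the sitemap and those only in the crawl."""
    return {
        "in_sitemap_not_crawled": sorted(sitemap_set - crawled_set),
        "crawled_not_in_sitemap": sorted(crawled_set - sitemap_set),
    }


# Hypothetical sample data; replace with the real sitemap and crawl export.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products</loc></url>
</urlset>"""

if __name__ == "__main__":
    crawled = {"https://example.com/", "https://example.com/blog"}
    print(compare(sitemap_urls(SAMPLE_SITEMAP), crawled))
```

URLs that appear in the sitemap but not in the crawl are the ones worth checking first for noindex tags, redirects, or robots.txt blocks.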
-
Hi David,
Messing up, changing, updating, or deleting a sitemap will not necessarily decrease the number of ranked or crawled pages. A sitemap is usually used as a signal to find new pages and to figure out whether old ones have been deleted. I would find it unlikely that your sitemap had a significant impact on which pages dropped out. If anything, you could see the opposite: an increase in pages indexed, submitted, or crawled after you submit a sitemap.
Martijn.
-
Hey David!
Thanks for reaching out to us!
Unfortunately, I am not an SEO consultant or web developer, so I cannot offer specific advice, but I'm sure there are loads of members here who would love to help and who have a lot more knowledge than I do! A few things I have picked up that may help are the following:
Try to determine when the drop started: did it happen when you submitted the XML sitemap, or when the developer changed certain things? This could help point to the reason for the drop in indexing. There are a variety of reasons why Google may choose not to index pages, but some common things to check are:
-
Check your robots.txt to ensure those pages are still crawlable
-
Check for duplicate content. Were there any canonical changes?
-
One of the tools you could use to keep track of ranking fluctuations is MozCast (http://mozcast.com/). Was there turbulence in the Google algorithm around the time the indexed pages dropped significantly?
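The robots.txt and canonical checks in the list above can be sketched with the Python standard library. Treat this as a rough illustration only: the function names `is_crawlable` and `index_signals`, the sample robots.txt rules, and the example.com URLs are all assumptions, and it works on text you have already downloaded instead of fetching anything itself.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class IndexSignals(HTMLParser):
    """Collect the robots meta directive and canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # A robots meta tag containing "noindex" keeps the page out of the index.
            self.noindex = "noindex" in (attrs.get("content") or "").lower()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def index_signals(html):
    """Return (noindex, canonical_url) for a page's HTML source."""
    parser = IndexSignals()
    parser.feed(html)
    return parser.noindex, parser.canonical


# Hypothetical sample inputs for illustration.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /filter/\n"
SAMPLE_HTML = (
    '<html><head><meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://example.com/products"></head></html>'
)

if __name__ == "__main__":
    print(is_crawlable(SAMPLE_ROBOTS, "https://example.com/filter/red"))
    print(index_signals(SAMPLE_HTML))
```

Running both checks over the URLs that dropped out should show whether they were blocked from crawling, marked noindex, or canonicalised to another page.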
If you want us to have a look at your specific campaign to investigate further, please pop an email over to help@moz.com.
Thanks!
Eli