Sitemaps and Indexed Pages
-
Hi guys,
I created an XML sitemap and submitted it for my client last month.
Now the developer of the site has also been messing around with a few things.
I've noticed on my Moz site crawl that indexed pages have dropped significantly.
Before I put my foot in it, I need to figure out if submitting the sitemap has caused this.. can a sitemap reduce the pages indexed?
Thanks
David.
-
Thanks Eli!
I guess I was wondering if the MOZ Bot only followed pages that were in the sitemap. It was generated by Screaming Frog I have trusted it to include all relevant pages!
I have put in a more detailed description in the response below. Overall I need to investigate further but i'm satisfied that the sitemap has not caused the drop!
-
Thanks Martijn!
I guess I was wondering if the MOZ Bot only followed pages that were in the sitemap. It was generated by Screaming Frog I have trusted it to include all relevant pages!
To elaborate.
There were about 80,000 pages and I used canonical, no index, and redirects to clean up a rather large mess of filter URL's and dup content.
That dropped the pages to about 14k. Then I submitted the sitemap last month and now the crawl only found 4k pages.
Further investigation is needed on my behalf but I wanted to double check that this sudden drop was not because of a sitemap! Thanks for clarifying that!
-
Hi David,
Messing up, Changing or Updating, Deleting a Sitemap is not necessarily something that will decrease the number of ranked or crawled pages. It usually is used a signal to find new pages and figure out if old ones are deleted. But the chances that your sitemap have had a significant impact in what kind of pages went down is something I would find unlikely. It could happen though that you'd see the opposite, an increase in pages indexed/submitted/crawled after you submit a sitemap.
Martijn.
-
Hey David!
Thanks for reaching out to us!
Unfortunately I am not an SEO consultant / Web Developer so I cannot offer specific advice, but I'm sure there are loads of members here who would love to help and have a lot more knowledge than I do! A few things I have picked up which may help are the following:
Try to determine when the drop started, did it drop when you submitted the XML sitemap or when the developer changed certain things? This could help point to the reason for the drop in indexing. There are a variety of reasons as to why Google may not choose to index pages, however some of the common ones are:
-
Check your robots.txt to ensure those pages are still crawlable
-
Check for duplicate content / was there any canonical changes?
-
One of the tools you could use to help keep track of ranking fluctuations is mozcast (http://mozcast.com/). Was there turbulence in the Google algorithm when the indexed pages dropped significantly?
If you want us to have a look at your specific campaign to investigate further could you please pop an email over to help@moz.com.
Thanks!
Eli
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Navigation pages with a PA 1
Ok, you guys probably think this is a new website and i should just wait, but this is not the case.
API | | Forresult
We have 2 websites (old) websites with a DA of 34 and a DA of 19 and high PA values on the mainpage. Our problem: All the other pages stay at a page authority of 1. One website is build in Magento and one in Wordpress. Both websites have deeplinks, footerlinks en in-contentlinks. The other pages don't get any linkjuice according to Moz. We don't use any robot noindex,nofollow or nofollow links and the menu structure isn't the problem. Is anyone familiare with the problem? I know is shouldn't be concerned about PA/DA, but i just can't explain what's going on.0 -
Crawler unable to access pages
Hi crawler is unable to access site and crawl properly. Mainly for the backlink checker, it's producing no results There is nothing in the robots.txt file blocking crawler access. Any help is much appreciated as it's driving me crazy!
API | | 2Cubedie0 -
Frequency of Moz page authority updates?
I have some new pages on my site, and Moz gives them a very low PA ranking. I am wondering if these scores are updated monthly or quarterly? I'm not sure how frequently to check back for updated scoring.
API | | AndrewMicek0 -
/index.php causing a few issues
Hey Mozzers, Our site uses magento. Pages within the site (not categories or products) are set to display as www.domain.co.uk/page-url/ The hta access is set to redirect all version such as www.domain.co.uk/page-url to a url ending in a / However in google analytics and in moz landing page tracker these urls are being represented by www.domain.co.uk/page-url/index.php When visiting www.domain.co.uk/page-url/index.php a 404 is displayed. I know that by default when directed to a directory it automatically finds and displays the index file. So i understand why this is happening to some degree. However, when manually visiting this link does not exist. This poses a problem when trying to view the landing pages information in moz pro. I have 20 keywords being tracked in relation to www.domain.co.uk/page-url/ but because moz is recording it as www.domain.co.uk/page-url/index.php the keywords are unrelated so not showing information in relation to the page. Any ideas?
API | | ATP0 -
Mozscape Index update frequency problems?
I'm new to Moz, only a member for a couple months now. But I already rely heavily on the mozscape index data for link building, as I'm sure many people do. I've been waiting for the latest update (due today after delay), but am not seeing any mention of the data yet - does it normally get added later in the day? I'm not that impatient that I can't wait until later today or tomorrow for this index update, but what I am curious about is whether Moz is struggling to keep up, and if updates will continue to get more and more rare? For example, in 2013 I count 28 index updates. In 2014 that number dropped to 14 updates (50% drop). In 2015, there was only 8 (another 43% drop), and so far this year (until the March 2nd update is posted) there has only been 1. This isn't just a complaint about updates, I'm hoping to get input from some of the more experienced Moz customers to better understand (with the exception of the catastrophic drive failure) the challenges that Moz is facing and what the future may hold for update frequency.
API | | kevin.kembel1 -
August 3rd Mozscape Index Update (our largest index, but nearly a monthly late)
Update 5:27pm 8/4 - the data in Open Site Explorer is up-to-date, as is the API and Mozbar. Moz Analytics campaigns are currently loading in the new data, and all campaigns should be fully up-to-date by 4-10pm tomorrow (8/5). However, your campaign may have the new data much earlier as it depends on where that campaign falls in the update ordering. Hey gang, I wanted to provide some transparency into the latest index update, as well as give some information about our plans going forward with future indices. The Good News: This index, now that it's delivered, is pretty impressive. Mozscape's August index is 407 Billion URLs in size, nearly 100 Billion (~25%) bigger than our last record index size. We indexed 2.18 trillion links for the first time ever (prior record was 1.54 trillion). Correlations for Page Authority have gone up from 0.319 to 0.333 in the latest index, suggesting that we're getting a slightly more accurate representation of Google's use of links in rankings from this data (DA correlations remain constant at 0.185) Our hit ratio for URLs in Google's SERPs has gone up considerably, from 69.97% in our previous index to 78.66% in the August update. This indicates we are crawling and indexing more of what Google shows in the search results (a good benchmark for us). Note that a large portion of what's missing will be things published in the last 30-60 days while we were processing the index (after crawling had stopped). The Bad News: August's index was late by ~25 days. We know that reliable, consistent, on-time Mozscape updates are critically important to everyone who uses Moz's products. We've been working hard for years to get these to a better place, but have struggled mightily. Our latest string of failures was completely new to the team - a bunch of problems and issues we've never seen before (some due to the index size, but many due to odd things like a massive group of what appear to be spam domains using the Palau TLD extension clogging up crawl/processing, large chunks of pages we crawled with 10s of thousands of links which slow down the MozRank calculations, etc). While there's no excuse for delays, and we don't want to pass these off as such, we do want to be transparent about why we were so late. Our future plans include scaling back the index sizes a bit, dealing with the issues around spam domains, large link-list pages, some of the odd patterns we see in .pl and .cn domains, and taking one extra person from the Big Data team off of work on the new index system (which will be much larger and real-time rather than updated every 30 days) to help with Mozscape indices. We believe these efforts, and the new monitoring systems we've got will help us get better at producing high quality, consistent indices. Question everyone always asks: Why did my PA/DA change?! There are tons of reasons why these can change, and they don't necessarily mean anything bad about your site, your SEO efforts, or whether your links are helping you rank. PA and DA are predictive, correlated metrics that say nothing about how you're actually performing. They merely map better than most metrics to Google's global rankings across large SERP sets (but not necessarily your SERPs, which is what you should care about). That said, here's some of the reasons PA/DA do shift: The domains/pages with the highest PA/DA scores gain even faster than most of the domains below them, making it harder each index to get higher scores (since PA/DA are on a logarithmic scale, this is smoothed out somewhat - it would be much worse on a conventional scale, e.g. Facebook.com 100, everyone else 0.0003). Google's ranking algorithm introduces new elements, changes, modifies what they care about, etc. Moz crawls a set of the web that does or doesn't include the pages that are more likely to point to a given domain than another. Although our crawl tends to be representative, if you've got lots of links from deep pages on less popular domains in a part of the web far from the mainstream, we may not consistently crawl those well (or, we could overcrawl your sector because it recently received powerful links from the center of the web). My advice, as always, is to use PA/DA as relative scores. If your scores are falling, but your competitors' are falling more, that's not a bad thing. If your scores are rising, but your competitors' are rising faster, they're probably gaining ground on you. And, if you're talking about score changes in the 1-4 points range, that's not necessarily anything but noise. PA/DA scores often shift 1-4 points up or down in a new index so don't sweat it! Let me know if you've got more questions and I'll do my best to answer. You can also refer to the API update page here: https://moz.com/products/api/updates
API | | randfish8 -
Have Questions about the Jan. 27th Mozscape Index Update? Get Answers Here!
Howdy y'all. I wanted to give a brief update (not quite worthy of a blog post, but more than would fit in a tweet) about the latest Mozscape index update. On January 27th, we released our largest web index ever, with 285 Billion unique URLs, and 1.25 Trillion links. Our previous index was also a record at 217 Billion pages, but this one is another 30% bigger. That's all good news - it means more links that you're seeking are likely to be in this index, and link counts, on average, will go up. There are two oddities about this index, however, that I should share: The first is that we broke one particular view of data - 301'ing links sorted by Page Authority doesn't work in this index, so we've defaulted to sorting 301s by Domain Authority. That should be fixed in the next index, and from our analytics, doesn't appear to be a hugely popular view, so it shouldn't affect many folks (you can always export to CSV and re-sort by PA in Excel if you need, too - note that if you have more than 10K links, OSE will only export the first 10K, so if you need more data, check out the API). The second is that we crawled a massively more diverse set of root domains than ever before. Whereas our previous index topped out at 192 million root domains, this latest one has 362 million (almost 1.9X as many unique, new domains we haven't crawled before). This means that DA and PA scores may fluctuate more than usual, as link diversity are big parts of those calculations and we've crawled a much larger swath of the deep, dark corners of the web (and non-US/non-.com domains, too). It also means that, for many of the big, more important sites on the web, we are crawling a little less deeply than we have in the past (the index grew by ~31% while the root domains grew by ~88%). Often, those deep pages on large sites do more internal than external linking, so this might not have a big impact, but it could depend on your field/niche and where your links come from. As always, my best suggestion is to make sure to compare your link data against your competition - that's a great way to see how relative changes are occurring and whether, generally speaking, you're losing or gaining ground in your field. If you have specific questions, feel free to leave them and I'll do my best to answer in a timely fashion. Thanks much! p.s. You can always find information about our index updates here.
API | | randfish8 -
Does Moz's crawlers use _escaped_fragment_ to inspect pages on a single-page application?
I just got started, but got a 902 error code on some pages, with a message saying there might be an outage on my site. That's certainly not the case, so I'm wondering if the crawlers actually respect and use the escaped_fragment query parameter. Thanks, David.
API | | CareerDean0