Sitemaps during a migration - which is the best way of dealing with them?
-
Many SEOs I know simply upload the new sitemap once the new site is launched. Some keep the old site's URLs in the new sitemap (for a while) to facilitate the migration. Others upload both the old and the new sitemaps together to support the migration. Which is the best way to proceed? Thanks, Luke
-
Very much appreciated CleverPhD!
-
Found this while looking for an answer to another question (I could not find it the other day). Straight from Google: do not include pages that do not exist in XML sitemaps.
http://googlewebmastercentral.blogspot.com/2014/10/best-practices-for-xml-sitemaps-rssatom.html
URLs
URLs in XML sitemaps and RSS/Atom feeds should adhere to the following guidelines:
- Only include URLs that can be fetched by Googlebot. A common mistake is including URLs disallowed by robots.txt — which cannot be fetched by Googlebot, or including URLs of pages that don't exist.
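As a rough illustration of that guideline, here is a minimal Python sketch that drops robots.txt-disallowed URLs from a list of sitemap candidates. The robots.txt rules and example.com URLs are made up for the example, and Python's standard-library parser does not match rules exactly the way Googlebot does (it has no wildcard support, for instance), so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def fetchable_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return only the URLs that the given robots.txt lets user_agent fetch."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines (no network access).
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt and candidate URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""

CANDIDATES = [
    "https://www.example.com/",
    "https://www.example.com/products/widget",
    "https://www.example.com/checkout/basket",
]

if __name__ == "__main__":
    for url in fetchable_urls(ROBOTS_TXT, CANDIDATES):
        print(url)
```

This only covers the robots.txt half of the guideline. Checking that each page actually exists would still need a request per URL.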
-
Mate nailed it completely!
-
I would say make sure that your new sitemap has all the latest URLs. The reason people say that you should have old URLs in the sitemap is so that Google can quickly crawl the old URLs to find the 301s to the new URLs.
I am not convinced that this helps. Why?
Google already has all your old URLs in its systems. You would be shocked how far back Google has data on your site with old URLs. I have a site that is over 10 years old and I still see URL structures referenced in Google from 7 years ago that have a 301 in place. Why is this?
Google will assume that, "Well, I know that this URL is a 301 or 404, but I am going to crawl it every once in a while just to make sure the webmaster did not do this by mistake." You can see this in the Search Console error or link reports: when you set up 301s or 404s, the old URLs may stay in there for months and can even come back after they have dropped out of the error list. I had a case where some old URLs kept showing up in the SERPs and in various Search Console reports for a site for 2 years after the 301s were properly set up. Why was this happening?
This is a large site, and we still had some old content linking to the old URLs. The solution was to delete the links in that old content and set up a canonical to self on all the pages to help give a definitive directive to Google. Google then finally replaced the old URLs with the new URLs in the SERPs and in the Search Console reports. The point here is that our site had previously been giving signals (links) that told Google some of the old URLs were still valid, and Google was giving us the benefit of the doubt.
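If you want to hunt down this kind of leftover signal yourself, something along these lines could work. It is only a sketch in Python: the page URL, the old URL set and the HTML are invented for the example, and it compares hrefs as exact strings, so relative links or trailing-slash variants would need normalizing first.

```python
from html.parser import HTMLParser
from typing import Optional


class LinkAudit(HTMLParser):
    """Collect <a href> targets and the rel=canonical URL from one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and attr.get("href"):
            self.links.append(attr["href"])
        elif tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")


def audit_page(page_url: str, html: str, old_urls: set[str]) -> dict:
    """Report links that still point at old URLs and whether the canonical is to self."""
    parser = LinkAudit()
    parser.feed(html)
    return {
        "stale_links": [link for link in parser.links if link in old_urls],
        "self_canonical": parser.canonical == page_url,
    }


# Hypothetical page and old URL list, for illustration only.
PAGE_URL = "https://www.example.com/new/widget"
OLD_URLS = {"https://www.example.com/old/widget.html"}
HTML = """
<html><head>
<link rel="canonical" href="https://www.example.com/new/widget">
</head><body>
<a href="https://www.example.com/old/widget.html">Widget</a>
<a href="https://www.example.com/new/gadget">Gadget</a>
</body></html>
"""

if __name__ == "__main__":
    print(audit_page(PAGE_URL, HTML, OLD_URLS))
```

On a large site you would run this over every crawled page and fix whatever turns up in `stale_links`.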
If you want to have the new URLs seen by Google, show them in your sitemap. Google already has all the old URLs and will check them and find the 301s and fix everything. I would also recommend the canonical to self on the new pages. Don't give any signals to Google that your old URLs are still valid by linking to them in any way, especially your sitemap. I would even go so far as to reach out to any important sites that link to old URLs to ask for an updated link to your site.
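To make the sitemap part concrete, here is a minimal Python sketch of building a sitemap that lists only the new URLs. It assumes you already have your redirect map as a dictionary of old URL to new URL. The function name and the example.com URLs are hypothetical, and a real sitemap file would normally also carry an XML declaration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(candidate_urls: list[str], redirects: dict[str, str]) -> str:
    """Return sitemap XML for the candidates, skipping any URL that is a 301 source."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in candidate_urls:
        if url in redirects:
            # Old URL that redirects elsewhere: keep it out of the sitemap.
            continue
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical redirect map (old URL -> new URL) and candidate list.
REDIRECTS = {
    "https://www.example.com/old/widget.html": "https://www.example.com/new/widget",
}
CANDIDATES = [
    "https://www.example.com/new/widget",
    "https://www.example.com/old/widget.html",  # slipped in by mistake
    "https://www.example.com/new/gadget",
]

if __name__ == "__main__":
    print(build_sitemap(CANDIDATES, REDIRECTS))
```

Filtering against the redirect map is just one way to do it. The point is that nothing that 301s should end up in the file you submit.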
As I mentioned above, I do not think putting old URLs that 301 to the new URLs in the sitemap gives you any "advantage" in getting the new URLs indexed quicker. Just watch your Google Search Console crawl stats. Once you do a major overhaul, you will see Google really crawl your site like crazy, and they will update things pretty quickly. Putting the old URLs in the sitemap is a conflicting signal in that process and has the potential to slow Google down IMHO.