Mapping and tracking old and new information architecture
-
Howdy.
So I'm working on "example.com", which has thousands of URLs. The site is going to be redesigned, with some changes to the information architecture.
I'm trying to think of a good way to organize and account for similarities and differences between the original information architecture and the new one. This should help with building 301s.
I've downloaded a list of URLs from example.com from Open Site Explorer. What I would love to do is generate a visual "tree" of the site based on the output from Open Site Explorer. It would basically look like a pyramid with all of the subfolders branching out.
Does anybody know of a tool out there that will do this for me? Or am I going to have a long day in Excel?
Any other thoughts on working through this process are welcome.
Thank you!
-
I wouldn't use OSE for this, since it may not have crawled all of your URLs.
I'd suggest using a dedicated crawler that you can set to crawl the whole site. Give Xenu, Screaming Frog, or the IIS SEO Toolkit a go to get a full picture of your URLs. After you go live, make sure you have mapped all of your old URLs across to the new ones.
In my experience, a site never actually looks like a tree; people just like to describe it that way because it's simple.
S
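On the original ask of visualising the structure: once you have a crawler's URL export, a few lines of script can nest the paths by subfolder before you bother with Excel. A rough sketch in Python (the URLs are made up; swap in your exported list):

```python
from urllib.parse import urlparse

def build_tree(urls):
    """Nest URL paths into a dict-of-dicts keyed by path segment."""
    tree = {}
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        node = tree
        for part in parts:
            # Descend into (or create) the child node for this segment.
            node = node.setdefault(part, {})
    return tree

def print_tree(node, indent=0):
    """Print the nested dict as an indented folder outline."""
    for name, children in sorted(node.items()):
        print("  " * indent + "/" + name)
        print_tree(children, indent + 1)

urls = [
    "http://example.com/products/widgets/blue",
    "http://example.com/products/widgets/red",
    "http://example.com/about",
]
print_tree(build_tree(urls))
```

The same nested structure can then be dumped side by side for the old and new architecture to spot which branches moved, which is the raw material for the 301 map.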
Related Questions
-
Issue with GA tracking and Native AMP
Hi everyone, We recently pushed a new version of our site (winefolly.com), which is completely AMP native on WordPress (using the official AMP for WordPress plugin). As part of the update, we also switched over to HTTPS. In hindsight we probably should have pushed the AMP version and the HTTPS change in separate updates. As a result of the update, the traffic in GA has dropped significantly despite the tracking code being added properly. I'm also having a hard time getting the previous views in GA working properly. The three views are: Sitewide (shop.winefolly.com and winefolly.com), Content only (winefolly.com), and Shop only (shop.winefolly.com).
The sitewide view seems to be working, though it's hard to know for sure, as the traffic seems pretty low (around 10 users at any given time) and I suspect it's mostly just picking up the shop traffic. The content-only view shows maybe one or two users and often none at all. I tried a bunch of different filters to track only the main site's content views, but in one instance the filter would work, then half an hour later it would revert to no traffic. The filter is set to custom > exclude > request URI with the following regex pattern: ^shop.winefolly.com$|^checkout.shopify.com$|/products/.|/account/.|/checkout/.|/collections/.|./orders/.|/cart|/account|/pages/.|/poll/.|/?mc_cid=.|/profile?.|/?u=.|/webstore/. Testing the filter, it strips out anything not related to the main site's content, but when I save the filter and view the updated results, the changes aren't reflected. I did read that there is a delay in filters being applied and that only a subset of the available data is used, but I just want to be sure I'm adding the filters correctly. I also tried setting the filter to predefined > exclude > hostname equal to shop.winefolly.com, but that didn't work either. The shop view seems to be working, but its tracking code is added via Shopify, so it makes sense that it would continue working as before.
The first thing I noticed when I checked the views is that they were still set to http, so I updated the URLs to https. I then checked the GA tracking code (which is added as a JSON object in the Analytics setting of the WordPress plugin). Unfortunately, while GA seems to be recording traffic, none of the GA validators seem to pick up the AMP tracking code (added using the amp-analytics tag), despite the JSON being confirmed as valid by the plugin. This morning I decided to try a different approach and add the tracking code via Google Tag Manager, as well as adding the new HTTPS domain to Google Search Console, but alas, no change. I spent the whole day yesterday reading every post I could on the topic, but was not able to find a solution, so I'm really hoping someone on Moz will be able to shed some light on what I'm doing wrong. Any suggestions or input would be very much appreciated. Cheers,
Technical SEO | winefolly
Chris (on behalf of WineFolly.com)
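One thing worth checking in the exclude pattern above: in most regex flavors (including the syntax GA filters accept), an unescaped "?" is a quantifier and an unescaped "." matches any character, so a fragment like "/?mc_cid=." matches far more request URIs than a literal "?mc_cid=" query string. A minimal Python sketch of the difference (the URIs are made up):

```python
import re

# Unescaped: "/?" means "an optional slash" and "." means "any character",
# so this matches any URI merely containing "mc_cid=" plus one more character.
loose = re.compile(r"/?mc_cid=.")

# Escaped: matches only a literal "?mc_cid=" query-string prefix.
strict = re.compile(r"/\?mc_cid=.")

assert loose.search("/blog/post-mc_cid=abc")            # unintended match
assert strict.search("/blog/post-mc_cid=abc") is None   # correctly ignored
assert strict.search("/?mc_cid=abc123")                 # intended match
```

The same reasoning applies to the other dots in the pattern; escaping them narrows the filter to what was actually intended.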
Is it possible to deindex old URLs that contain duplicate content?
Our client is a recruitment agency and their website used to contain a substantial amount of duplicate content, as many of the listed job descriptions were repeated and recycled. As a result, their rankings rarely progress beyond page 2 on Google. Although they have started using more unique content for each listing, it appears that old job listing pages are still indexed, so our assumption is that Google is holding down the rankings due to the amount of duplicate content present (one tool reported 43% duplicate content across the website). Looking at other recruitment websites, it appears that they block the actual job listings via the robots.txt file. Would blocking the job listing pages from being indexed, either by robots.txt or by a noindex tag, reduce the negative impact of the duplicate content, but also remove any link juice coming to those pages? In addition, expired job listing URLs stay live, which is likely to be increasing the overall duplicate content. Would it be worth removing these pages and setting up 404s, given that any links to these pages would be lost? If these pages are removed, is it possible to permanently deindex these URLs? Any help is greatly appreciated!
Technical SEO | ClickHub-Harry
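For reference, the two mechanisms weighed in the question above behave differently. A sketch, assuming the listings live under a hypothetical /jobs/ path:

```text
# robots.txt — blocks crawling, but URLs already in the index stay there,
# and a blocked page cannot pass link equity to anything it links to:
User-agent: *
Disallow: /jobs/

# A robots meta tag in each listing's <head> — the page is still crawled,
# gets dropped from the index, and "follow" lets link equity keep flowing:
<meta name="robots" content="noindex, follow">
```

Note that the two don't combine: a page blocked in robots.txt is never crawled, so a noindex tag on it is never seen. To deindex already-indexed listings, the pages must remain crawlable until the noindex (or a 404/410) has been picked up.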
Migrate Old Archive Content?
Hi, Our team has recently acquired several newsletter titles from a competitor. We are currently deciding how to handle the archive content on their website, which now belongs to us. We are thinking of leaving the content on their site (so as not to suddenly remove a chunk of their website and harm them) but also replicating it on ours, with a canonical link to say our website is the original source. The articles on their site go back as far as 2010. Do you think it would help or hinder our site to have a lot of old archive content added to it? I'm thinking of content freshness issues. Even though the content is old, some of it will still be interesting or relevant. Or do you think the authority and extra traffic this content could bring in make it worth migrating? Any help gratefully received on the old content issue or the idea of using canonical links in this way. Many Thanks
Technical SEO | frantan
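On the canonical idea in the question above: the usual pattern is a cross-domain rel=canonical on each page of the acquired site pointing at the replicated copy, with the copy referencing itself. A sketch with made-up URLs:

```html
<!-- In the <head> of the article on the acquired site, pointing at our copy: -->
<link rel="canonical" href="https://www.oursite.example/archive/2010/article-title/">

<!-- In the <head> of our replicated copy (self-referencing): -->
<link rel="canonical" href="https://www.oursite.example/archive/2010/article-title/">
```

Worth keeping in mind that search engines treat a cross-domain canonical as a strong hint rather than a directive, so the pages should be near-identical for it to be honoured.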
Should we re-use our old redirected domain for a new website?
Hi, One of our clients has an old domain that has been redirected to another website of his. Now he is asking us to build a new website for that domain and direct it back. This new website will be closely related to both its own old content and the site it has recently been redirected to. I guess the only benefit of this would be taking advantage of the age of the domain. Do you recommend doing that, or getting him a new domain?
Technical SEO | Dynamo-Web
301 from old domain to new domain
Hi, I need to create a 301 redirect for all internal pages located on organic7thheaven.com to the homepage of our new site at http://www.7thheavennaturals.com/ Currently internal pages of the old site such as the following are returning a page not found www.organic7thheaven.com/products/deepcleansing/miraclemud.asp Can anyone help me in setting up a .htaccess file for this problem please? Thanks
Technical SEO | MJMarketing
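For the thread above: assuming the old domain is served by Apache with mod_rewrite enabled, a catch-all rule in the old site's .htaccess along these lines would 301 every request to the new homepage. A sketch, not tested against the actual server configuration:

```apache
RewriteEngine On

# Match any host on the old domain (with or without "www")...
RewriteCond %{HTTP_HOST} ^(www\.)?organic7thheaven\.com$ [NC]

# ...and send every request, e.g. /products/deepcleansing/miraclemud.asp,
# to the new site's homepage with a permanent (301) redirect.
RewriteRule ^ http://www.7thheavennaturals.com/ [R=301,L]
```

If matching old pages exist on the new site, per-path rules (or RedirectMatch 301 directives) that map like to like would preserve more link equity than sending everything to the homepage.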
Changed CMS - Google indexes old and new pages
Hello again, after posting the problem below I received this answer and changed the sitemap name. Still I receive many duplicate titles and metas, as Google still compares old URLs to new ones and sees duplicate titles and descriptions. We have redirected all pages properly, we have changed the sitemap name, and the new sitemap is listed in Webmaster Tools; the old sitemap includes ONLY the new sitemap files.
"When you deleted the old sitemap and created a new one, did you use the same sitemap xml filename? They will still try to crawl old URLs that were in your previous sitemap (even if they aren't listed in the new one) until they receive a 404 response from the original sitemap."
If anyone can give me an idea why, after 3 months, Google still lists the old URLs, I'd be more than happy. Thanks a lot.
Original question: Hello, We have changed the CMS for our multiple-language website and properly redirected all old URLs to the new CMS, which is working just fine. Right after the first crawl, almost 4 weeks ago, we saw in Google Webmaster Tools and SEOmoz that for almost every single page Google indexes the old URL as well as the new one, and flags duplicate meta tags for these. We deleted the old sitemap and uploaded the new one, thinking that Google would then stop indexing the old URLs. But we still see a huge number of duplicate meta tags. Does anyone know what else we can do so Google does not index the old URLs anymore, but only the new ones? Thanks so much, Michelle
Technical SEO | Tit
How long does it take open site explorer to recognize new links?
I'm building a steady link profile to one of my websites and the new links still haven't shown up in open site explorer even after 2 months. How long does it take OSE to recognize new backlinks?
Technical SEO | C-Style