Joomla to WordPress site migration - thousands of 404s
-
I recently migrated a site from Joomla to WordPress. Beforehand, I exported the HTML pages from Joomla using Screaming Frog and set up 301 redirects for all of those pages.
However, Webmaster Tools is now telling me (a week after putting the redirects in place) that there are more than 7k 404s. Many of them aren't HTML pages, just index.php files, but I didn't think I would have to export these in my Screaming Frog crawl.
We have since done a blanket 301 redirect for anything with index.php in it, but Webmaster Tools is still picking them up as 404s.
So my question is: what should I have done when exporting from Screaming Frog to ensure I captured all pages to redirect, and what should I now do to fix the 404s that Webmaster Tools is picking up?
-
Hi There
Generally those types of 404s won't be too harmful - they sound like they may have been somewhat artificial WordPress pages.
What I would do is get your list now from Analytics or Webmaster Tools - this way you will capture URLs that actually got traffic or impressions in Google, and you can redirect those.
So run a landing pages report in Analytics and a top pages report in Webmaster Tools - maybe for the last 6 months. Create a text file of all the URLs, and run them in list mode through Screaming Frog. Redirect any that 404.
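If you would rather script that check than run it through Screaming Frog, here is a rough Python sketch of the same idea: merge the URL lists from the two reports, then keep the ones that answer with a 404. The function names (`merge_url_lists`, `http_status`, `find_404s`) are made up for illustration, and it assumes each report has already been exported to a plain list of full URLs.

```python
from typing import Callable, Iterable
import urllib.error
import urllib.request


def merge_url_lists(*lists: Iterable[str]) -> list[str]:
    """Combine several URL lists, dropping blanks and duplicates, keeping order."""
    seen: set[str] = set()
    merged: list[str] = []
    for urls in lists:
        for url in urls:
            url = url.strip()
            if url and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the first status code is reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code of a HEAD request to url, without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses arrive here; the code is what we want.
        return err.code


def find_404s(urls: Iterable[str],
              status_of: Callable[[str], int] = http_status) -> list[str]:
    """Return the URLs whose status is 404, i.e. the ones still needing a redirect."""
    return [url for url in urls if status_of(url) == 404]
```

You would call `find_404s(merge_url_lists(landing_pages, top_pages))` with the two exported lists. Connection failures (DNS errors, timeouts) are not handled here and would raise, so treat this as a starting point only.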
If you were to go back in time, what I would have done with Screaming Frog is let it crawl everything. You have to allow it to "follow redirects" and "ignore robots.txt" etc. I know Google is not supposed to crawl anything blocked in robots.txt, but basically you'd be letting Screaming Frog get to everything, so that you don't miss any URLs.
-
I know it doesn't create redirects, but I wanted to use it to figure out the list of files / pages to create 301 redirects for and then add these to the .htaccess file. However, was I incorrect to export just the HTML files from Screaming Frog? There were only 500 of these, but there are now 7,000 404s for PHP files in Webmaster Tools.
-
Hi,
Screaming Frog doesn't create redirects. You need to set them up on the server, using mod_rewrite, mod_alias or something similar.
Maybe the best option for your problem is to create a database that maps old pages -> new pages, and to redirect every request for an unknown page to its matching new page.
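To make the old pages -> new pages idea a bit more concrete, here is a minimal Python sketch of such a lookup. Everything in it is an assumption for illustration: the `REDIRECT_MAP` entries are invented example URLs, and `resolve_redirect` is a hypothetical helper, not part of Joomla, WordPress or any redirect plugin. In practice the map would live in a database table or be turned into server redirect rules.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical mapping of old Joomla URLs to new WordPress URLs.
# Keys are the path plus query string, because many old index.php
# URLs only differ in their query parameters.
REDIRECT_MAP = {
    "/index.php?option=com_content&view=article&id=12": "/about/",
    "/old-page.html": "/new-page/",
}


def resolve_redirect(old_url: str, redirect_map: dict[str, str]) -> Optional[str]:
    """Return the 301 target for old_url, or None if no mapping is known."""
    parts = urlsplit(old_url)
    key = parts.path + ("?" + parts.query if parts.query else "")
    if key in redirect_map:
        # Exact match on path and query string.
        return redirect_map[key]
    # Fall back to the bare path, so tracking parameters don't break the lookup.
    return redirect_map.get(parts.path)
```

A request that returns `None` has no known equivalent and can be left as a genuine 404 rather than being redirected somewhere unrelated.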
Related Questions
-
301 migration - Indexed Pages rising on old site
Hello, We did a 301 redirect from site a to site b back in March. I would check the index count on a daily basis using the query "site:sitename". The past couple of days, the number of indexed pages on the old domain (the one that was 301 redirected) has been rising, which is really concerning. We did the 301 redirect back in March 2016, and the indexed count went from 400k pages down to 78k. However, in the past 3 days it went from 78k to 89,500, and I'm worried that the number is going to continue to rise. My question: what would you do to investigate this issue? Would it be Screaming Frog, looking at the redirects? Or is this a unique scenario that would call for other steps/procedures?
Intermediate & Advanced SEO | | ggpaul5620 -
Merging Two Unrelated Sites into a Third Site
We have a new client interested in possibly merging 2 sites into one under the brand of a new parent company. Here's a breakdown of the scenario..... BrandA.com sells a variety of B2B widget-services via their online store. BrandB.com sells a variety of B2B thing-a-majig products and services (some of them large in size) not sold through an online store. These are sold more consultatively via a sales team. The new parent company, BrandA-B.com is considering combining the two sites under the new brand parent company domain. The Widget-services and Thing-A-Majigs have very little similarity or purchase crossover; so just because you're interested in one doesn't make you a good candidate for the other. We feel pretty confident that we can round-up all the necessary pages and inbound links to do proper transitioning to a new, separate third domain though we're not in agreement that this is the best course of action. Currently the individual brand sites are fairly well known in their industry and each ranks fairly well for a variety of important terms though there is room for improvement and each site has good links with the exception of the new site which has considerably fewer.
BrandA.com DA = 73 - 19 years old
BrandB.com DA = 55 - 18 years old
BrandA-B.com DA = 40 - 1 year old
Our SEO team members have opinions on what the potential outcome(s) of this would be but are wondering what the community here thinks. Will the combining of the sites cause a dilution of the topics of the two sites and hurt rankings? Will the combining of the domain authority help one set part of the business but hurt the other? What do you think? What would you do?
Intermediate & Advanced SEO | | OPM0 -
Severe health issues are found on your site. - Check site health (GWT)
Hi, We run a Magento website - When i log in to Google Webmaster Tools, I am getting this message: Severe health issues are found on your site. - Check site health. Is robots.txt blocking important pages? Some important page is blocked by robots.txt. Now, this is the weird part - the page being blocked is the admin page of magento - under www.domain.com/index.php/admin/etc..... Now, this message just wont go away - its been there for days now - so why does Google think this is an "important page"? It doesnt normally complain if you block other parts of the site ?? Any ideas? THanks
Intermediate & Advanced SEO | | bjs20100 -
What this site is doing? Does it look like cloaking to you?
Hi here, I was studying our competitors' SEO strategies, and I have noticed that one of our major competitors has set up something pretty weird from an SEO standpoint. I would like to know your thoughts on it, because I can't find a clear explanation for it. Here is the deal: the site is musicnotes.com, and their product pages are located inside the /sheetmusic/ directory, so if you want to see all their product pages indexed on Google, you can just type in Google: site:musicnotes.com inurl:/sheetmusic/ Then you will get about 290,000 indexed pages. Now, here is the tricky part: try to click on one of those links, and you will get a 302 redirect to a page that includes a meta "noindex, nofollow" directive. Isn't that pretty weird? Why would they want to "noindex, nofollow" a page reached from a 302 redirect? And how in the heck is the redirecting page still in the index?!! And how can Google allow that?! All this sounds weird to me and reminds me of the spammy techniques of the 90s called "cloaking"... what do you think?
Intermediate & Advanced SEO | | fablau0 -
Need help or explanation on my site!
My site has suffered greatly since the recent Google update. I have done everything as suggested. I had all bad links removed over 2 months ago. I have lowered keyword density (not easy since the keyword is in our company name!). I have rewritten various content and bolstered our existing content. What gives? What can I do? As an example, take the keyword "maysville plumber" - I rank about 40th for this keyword. The first three pages are filled with websites with literally NO content or no added value. Maysville is a town of about 1k residents - there is no competition. Before the update I was #1 for years on this particular keyword. And this is the case with 35 other cities (mostly small cities, but a few larger ones). Please help me understand or suggest what I can possibly do at this point. We have hundreds of pages of unique content on each and every page. We have zero duplicate content (I have run tests and crawlers). We have no fishy links. I have not gotten any messages from Google in Webmaster Tools. PLEASE HELP!! I asked a similar question a little while back and fixed all of the suggestions. My site is www.akinsplumbing.net.
Intermediate & Advanced SEO | | chuckakins0 -
Site structure question
Hello Everyone, I have a question regarding site structure and I would like to mastermind it with everyone. So I am optimizing a website for a Ford Dealership in Boston, MA. The way the site architecture is set up is as follows: Home >>>> New Inventory >>> Inventory Page (with search refinement choices) After you refine your search (let's say we choose a Ford F150 in white) it shows a page with images, price information and specs. (Nothing the bots or users can sink their teeth into) My thoughts are to create category pages for each Ford model with awesome written content and THEN link to the inventory pages. So it would look like this: Home >>> New Inventory >>> Ford F150 Awesome Category Page>>>>Ford F150 Inventory Page I would work hard at getting these category pages to rank for the vehicle for our GEO targeted locations. Here are my questions: Would you be annoyed to first land on a category page with lots of written text, reviews, images and videos, and then link off to the inventory page? Or would you prefer to go right from the new inventory page to the actual inventory page and start looking for vehicles? Thank you so much, Bill
Intermediate & Advanced SEO | | wparlaman0 -
What to do with WordPress generated pages?
I'm an SEOmoz Newbie and have a very specific question about the auto generated WordPress Pages. SEOmoz caught and labeled the auto generated WP pages as Crawl Warnings like: Long URL - 302 - Title Element Too Long - Missing Meta Description Tag - Too Many On-Page Links So I have learned the lesson and have now made those pages "no follow" / "no index." HOWEVER, WHAT DO I DO WITH THE ONES THAT HAVE ALREADY BEEN INDEXED? Do I... 1. Just leave them as is and hope they don't hurt me from an SEO perspective? 2. Redirect them all to a relevant page? I'm sure many people have had this issue. What do you think? Thanks Dominic
Intermediate & Advanced SEO | | amorbis0 -
On-Site Optimization Tips for Job site?
I am working on a job site that only ranks well for the homepage, with very low ranking internal pages. My job pages do not rank whatsoever; they are database driven and often turn into 404 pages after the job has been filled. The job pages have no content either. Does anybody have any technical on-site recommendations for this job site, especially regarding my internal pages? (Cross Country Allied.com)
Intermediate & Advanced SEO | | Melia0