Way to spider Wordpress site
-
I have an old Wordpress site and I want to move it to a new server and take it off Wordpress (too many hacks). I am trying to spider the site so as to get static, non-Wordpress, pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a Wordpress site into a non Wordpress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress updated? Do you get the normal URLs when viewing in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages.
Although it won't get the actual HTML on the pages, it could solve the URL issue perhaps.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not so experienced in migrating a WP site to non-WP, but I understand that the issue you're having is that the spider is returning index.html files for URLs like domain/page/.
It's normal: whichever spider you use, you'll always get an index.html file. Every directory has its index.html, which is the default file to show unless you establish something different with rewrite rules.
If you request /page/, the server will serve the index.html file in that directory. What you have to make sure of is that you set up a 301 redirect so that no index.html URL shows, redirecting it to the main / version of the page (with wildcards it is a one-line rule), and that your internal links all point to the / pages and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
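If you don't have an editor with bulk find and replace handy, the same link cleanup can be sketched in a few lines of Python. This is only a rough sketch, not a tested migration tool: the function names `strip_index_links` and `rewrite_site` are made up for illustration, it assumes the spidered files are UTF-8, and it only touches `href` attributes that end in `index.html`.

```python
import re
from pathlib import Path

# Matches href="…/index.html" or href="index.html" (either quote style),
# optionally followed by a #fragment or ?query. Names like "myindex.html"
# are left alone because index.html must start right after a "/" or the quote.
INDEX_LINK = re.compile(
    r'''(href\s*=\s*["'])([^"'#?]*/)?index\.html(?=["'#?])''',
    re.IGNORECASE,
)


def strip_index_links(html: str) -> str:
    """Point links at the directory URL instead of its index.html file."""
    def repl(match: re.Match) -> str:
        # A bare "index.html" link becomes "./" (the current directory).
        directory = match.group(2) or "./"
        return match.group(1) + directory

    return INDEX_LINK.sub(repl, html)


def rewrite_site(root: str) -> int:
    """Rewrite every .html file under root; return how many files changed.

    Assumption: files are UTF-8 encoded. Run this on a copy of the mirror.
    """
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = strip_index_links(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

For example, `strip_index_links('<a href="/page/index.html">x</a>')` gives back `<a href="/page/">x</a>`, and calling `rewrite_site` on the folder the spider produced applies that to every page. The 301 redirect away from the index.html URLs still has to be set up on the server separately.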
My only comment on your idea is that you may find it useful to build a PHP-driven site, using includes for the header, footer and nav/sidebar. Thinking ahead, if you want to change a portion of the page that repeats throughout the site, with static pages you'll have to change every page and upload them all, which is a huge job and leaves room for many human/machine errors.
Hope that helped you out!