Way to spider Wordpress site

DanCrean

I have an old WordPress site that I want to move to a new server and take off WordPress (too many hacks). I am trying to spider the site to get static, non-WordPress pages.

I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.

I tried using these spidering programs: WinHTTrack Website Copier and PageNest.

Does anyone know of another method of turning a WordPress site into a non-WordPress site?

evolvingSEO

Hi Dan

Hmm, that's a little strange. Two things:

1. Is WordPress updated? Do you get the normal URLs when viewing in your browser?
2. Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't get the actual HTML on the pages, it could perhaps solve the URL issue.

This blackhat world thread has a few options too.

-Dan

mememax

Hi Dan, I'm not so experienced in migrating a WP site to non-WP, but I understand the issue you're having: the spider is returning index.html files for URLs like domain/page/.

That's normal. Whichever spider you use, you'll always end up with an index.html file. Every directory has its index.html, which is the default file the server shows unless you set up something different with rewrite rules.
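To make that mapping concrete, here is a minimal Python sketch of how a file in the mirrored copy corresponds to the URL visitors would see. It assumes the server treats index.html as the directory index (a common default, but worth checking on your host); the function name `public_url` is just something made up for illustration.

```python
from pathlib import PurePosixPath


def public_url(mirror_path: str) -> str:
    """Map a file path inside a static mirror to the URL path a server exposes.

    Assumption: index.html is configured as the directory index file.
    """
    path = PurePosixPath("/" + mirror_path.lstrip("/"))
    if path.name == "index.html":
        parent = str(path.parent)
        # The root is already "/"; any other directory needs a trailing slash.
        return parent if parent == "/" else parent + "/"
    # Anything that is not a directory index keeps its own file name.
    return str(path)


print(public_url("page/index.html"))
print(public_url("index.html"))
print(public_url("about/team.html"))
```

So page/index.html on disk should still answer at /page/, which is the URL already in the search engine indices.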

If you request /page/, the server will return the index.html file in that directory. What you have to make sure of is that you set up a 301 redirect so that no index.html URL is shown and each one is redirected to its / version (with wildcards it's a one-line rule), and that your internal links all point to the / pages and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
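If you'd rather script the find and replace than use an editor, something like the following Python sketch should work. It is only an illustration: the function names are made up, it also handles single-quoted links (an addition to the replacement described above), and it works on raw bytes so it doesn't have to guess each file's encoding. Run it on a copy of the mirror first.

```python
from pathlib import Path


def strip_index_links(html: bytes) -> bytes:
    """Turn links ending in /index.html" into /" (same for single quotes)."""
    return html.replace(b'/index.html"', b'/"').replace(b"/index.html'", b"/'")


def rewrite_mirror(root: str) -> int:
    """Rewrite every .html file under root in place; return how many changed."""
    changed = 0
    for file in Path(root).rglob("*.html"):
        original = file.read_bytes()
        updated = strip_index_links(original)
        if updated != original:
            file.write_bytes(updated)
            changed += 1
    return changed
```

Calling `rewrite_mirror("mirror")`, where "mirror" is a placeholder for the folder your spider wrote to, returns the number of files it touched. Note that this is a plain string replacement, so it will also change any other text that happens to contain /index.html followed by a quote.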

One comment on your idea: you may find it useful to build a PHP-driven site, using includes for the header, footer, and nav/sidebar. Thinking ahead, if you want to change a portion of the page that repeats throughout a static site, you'll have to change every page and upload them all, which is a lot of work and leaves room for many human/machine errors.

Hope that helped you out!
