Way to spider a WordPress site
-
I have an old WordPress site that I want to move to a new server and take off WordPress (it has been hacked too many times). I am trying to spider the site to get static, non-WordPress pages.
I am having trouble doing this. When I spider the site, the URLs change. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects one by one.
I have tried these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a WordPress site into a non-WordPress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress up to date? Do you get the normal URLs when viewing the site in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't save the actual HTML of the pages, it could perhaps solve the URL issue.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not very experienced in migrating a WordPress site to a non-WordPress one, but I understand that the issue you're having is that the spider is returning index.html files for URLs like domain/page/.
That's normal: whichever spider you use, you'll always end up with an index.html file. Every directory has its own index.html, which is the default file the server shows unless you set up something different with rewrite rules.
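To make that mapping concrete, here is a minimal Python sketch of how a file saved by the spider corresponds to the URL it would normally be served at. It assumes the server's default directory index is index.html (as described above); the function name `served_url` is made up for illustration.

```python
from pathlib import PurePosixPath


def served_url(mirror_path: str, index_name: str = "index.html") -> str:
    """Return the URL path a mirrored file would typically be served at.

    Assumes the web server treats `index_name` as the directory index,
    so /page/index.html is reachable as /page/.
    """
    path = PurePosixPath("/" + mirror_path.lstrip("/"))
    if path.name == index_name:
        # The directory URL stands in for its index file.
        parent = str(path.parent)
        return parent if parent.endswith("/") else parent + "/"
    # Any other file keeps its own URL.
    return str(path)
```

Under that assumption, the file the spider saved as page/index.html is still reachable at /page/, which is the URL already in the search engine indices.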
If you request /page/, the server will serve the index.html file in that directory. What you need to make sure of is that you set up a 301 redirect so that any index.html URL is redirected to the corresponding / URL (with wildcards it is a one-line rule), and that your internal links all point to the / URLs and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
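If you'd rather script the find-and-replace than use an editor, something like the following Python sketch could work. It is only a rough illustration: the names `strip_index_links` and `rewrite_tree` are made up, it assumes the mirrored files are UTF-8 .html files under one folder, and it only touches href attributes that end in /index.html. Run it on a copy of the mirror first.

```python
import re
from pathlib import Path

# Matches href="…/index.html" (single or double quotes) and captures
# everything up to and including the slash before index.html.
INDEX_LINK = re.compile(r'''(href=["'][^"']*/)index\.html(?=["'#?])''')


def strip_index_links(html: str) -> str:
    """Rewrite links such as /page/index.html to /page/."""
    return INDEX_LINK.sub(r"\1", html)


def rewrite_tree(root: str) -> int:
    """Apply strip_index_links to every .html file under root.

    Returns the number of files that were changed.
    """
    changed = 0
    for file in Path(root).rglob("*.html"):
        original = file.read_text(encoding="utf-8")
        updated = strip_index_links(original)
        if updated != original:
            file.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

For example, `rewrite_tree("mirror")` would process a folder named mirror (a placeholder for wherever the spider saved the site). Relative links written as plain index.html, with no slash in front, are left alone on purpose, since the replacement for those depends on the page they appear on.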
One comment on your idea: you may find it useful to build a PHP-driven site instead, using includes for the header, footer and nav/sidebar. Thinking ahead, if you ever want to change a portion of the page that repeats throughout the site, with static files you'll have to make the change in every page and upload them all, which is a big job and leaves room for many human and machine errors.
Hope that helped you out!