Way to spider Wordpress site
-
I have an old Wordpress site and I want to move it to a new server and take it off Wordpress (too many hacks). I am trying to spider the site so as to get static, non-Wordpress, pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/ the URL I get out of the spider is /page/index.html And those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTack Website Copier and PageNest
Does anyone know of another method of turning a Wordpress site into a non Wordpress site?
-
Hi Dan
Hmm that's a little strange. Two things;
- is WordPress updated? Do you get the normal URLs when viewing in your browser?
- have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages Although it won't get the actual HTML on the pages, it could solve the URL issue perhaps.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not so experienced in migrating a WP to non -wp but I understand that the issue you're having is that the spider is returning index.htmlfiles for urls like domain/page/.
IT's normal, any spider you will use you'll always have and index.html file. Every directory has it's index.html which is the default file to show if you're not establishing something different with rewrite rules.
If you write /page/ the browser will read the index.html file. What you have to be sure is that you'll set up a 301 redirect to avoid any index.html url to show and have it redirected to the main / page (with wildcards is a one line rule) and that your internal links are pointing all to / pages and not to index.html version of it. You can jsut find and replace the /index.html" string into the html code with the /" text (dreamweaver or any html editor will do that in bulk.
Only one commentary on you idea is that you may consider useful to build a php driven site, using includes for header, footer and nav/sidebar, jsut because thinking ahead if you're willing to make changes to a portion of the page repeating throughout the site you'll have to make changes in all pages and uplaod them all which is quite huge to do and also let space for many human/machine errors.
Hope that helped you out!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing of Site Map
We recently launched a new site - on June 4th we submitted our site map to google and almost instantly had all 25,000 URL's crawled (yay!). On June 18th, we made some updates to the title & description tags for the majority of pages on our site and added new content to our home page so we submitted a new sitemap. So far the results have been underwhelming and google has indexed a very low number of the updated pages. As a result, only a handful of the new titles and descriptions are showing up on the SERP pages. Any ideas as to why this might be? What are the tricks to having google re-index all of the URLs in a sitemap?
Technical SEO | | Emily_A0 -
Switching site from http to https. Should I do entire site?
Good morning, As many of you have read, Google seems to have confirmed that they will give a small boost to sites with SSL certificates this morning. So my question is, does that mean we have to switch our entire site to https? Even simple information pages and blog posts? Or will we get credit for the https boost as long as the sensitive parts of our site have it? Anybody know? Thanks in advance.
Technical SEO | | rayvensoft1 -
Site Ranking Ahead of Us - Why?
I don't want to give details, but a competitor's site ranks ahead of us by a spot or 2 for some of our major keywords, and we can't figure out why. We have more content indexed, we have better content (my opinion), we're better optimized, we have significantly better MozRank and Majestic rankings, we have more links, we have better links. They have a plain Jane Wordpress blog, and their pages don't even have 300 characters. They don't have any awesome links. There are plenty of other sites besides ours that should outrank them. When I am using the various tools at my disposal, I can see why a competitor outranks us. I'm not obsessing, but this one, I just don't understand. Has anyone had this experience?
Technical SEO | | CsmBill0 -
Not sure which way to go or what to do?
Hi there, I have been a pro member of SEOmoz for a while now but this is my question in the forum and although I have looked through so much helpful information I was wondering if someone could give me some further advice and guidance? I have a 3 year old ecommerce website personalisedmugs.co.uk which until May 2012 had some excellent growth, we then lost around 50% of traffic due to reduced organic rankings in google. We then noticed a further drop again in September. From researching information I believe this drop was from the penguin update and EMD update? Since these updates we have: *Stopped working with a company in India whom was looking after SEO for us for 18 months redeveloped/designed website and upgraded software version constantly refreshed website with content as we always have done Modified internal anchor text (this did seem keyword rich) My next steps I believe before giving up 😞 is checking our links coming into website? Is anybody able to please help me with regards to our links or point me in the right direction. I have no idea where to start or what do now? Someone may see something really obvious so any help or guidance is greatly appreciated to assist me in gaining some UK organic rankings back. Kind Regards, Mark
Technical SEO | | SparkyMarky0 -
Brand New Site Penalized?
I recently launched 2 completely separate and unrelated websites at the same time. Both are new domains and hosting accounts. neither have any links. One is ranking for a branded search and the other is not. The interesting thing is that I tested both sites on the back end of my server before launch. The site that is not ranking for branded search IS ranking still on the back end of my site for the branded search. I have removed all content and 301 redirected the testing urls back to my portfolio page. Could this be do to Google indexing one but not the other. Does it have anything to do with testing on my server first and my DA being higher than current new sites? Or is it something completely different I'm missing completely. Is this a Penalty?
Technical SEO | | CDUBP0 -
How to find all the links to my site
hi i have been trying to find all the links that i have to my site http://www.clairehegarty.co.uk but i am not having any luck. I have used the open explorer but it is not showing all the links but when i go to my google webmaster page it shows me more pages than it does on the semoz tool. can anyone help me sort this out and find out exactly what links are going into my site many thanks
Technical SEO | | ClaireH-1848860 -
How to do a no follow on site search
We have a site search that is causing a huge amount of errors as the SEOmoz crawler is showing these as duplicate content. Our first thought was to do a no-follow on the site-search directory, but we realized that the site search is /site-search.aspx and URl strings appear at the end for hundreds of pages. How dow we/how can we no-follow an undetermined amount of URL strings?
Technical SEO | | Apptixweb0