Way to spider a WordPress site
-
I have an old WordPress site that I want to move to a new server and take off WordPress (too many hacks). I am trying to spider the site to get static, non-WordPress pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a WordPress site into a non-WordPress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress updated? Do you get the normal URLs when viewing in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't get the actual HTML on the pages, it could perhaps solve the URL issue.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not very experienced in migrating a WordPress site to a non-WordPress one, but I understand that the issue you're having is that the spider returns index.html files for URLs like domain/page/.
That's normal: any spider you use will give you an index.html file. Every directory has its own index.html, which is the default file the server shows unless you set up something different with rewrite rules.
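To make the directory-index behaviour concrete, here is a minimal Python sketch of how a typical static web server picks the file for a requested path. The function name `file_for_url` and the default index name are assumptions for illustration; real servers let you configure the index file name.

```python
def file_for_url(url_path: str, index_name: str = "index.html") -> str:
    """Map a request path to the file (relative to the site root) a static server would serve.

    Hypothetical helper: a path ending in "/" is treated as a directory,
    so the server falls back to that directory's index file.
    """
    relative = url_path.lstrip("/")
    if url_path.endswith("/"):
        return relative + index_name
    return relative


print(file_for_url("/page/"))       # page/index.html
print(file_for_url("/"))            # index.html
print(file_for_url("/about.html"))  # about.html
```

So the spidered /page/index.html file is what gets served when someone visits /page/, and the public URL does not have to change.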
If you request /page/, the server will serve the index.html file in that directory. What you have to make sure of is that you set up a 301 redirect so that any index.html URL is redirected to its / version (with wildcards, that is a one-line rule), and that your internal links all point to the / pages and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
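If you would rather script the bulk find-and-replace than use an editor, something like the following Python sketch could do it. It assumes the spidered files are UTF-8 and that the links appear in href attributes; the function names and the folder name in the usage line are made up for illustration. `redirect_target` only shows the logic of the one-line wildcard redirect, it is not a server configuration. Run it on a copy of the site first.

```python
import re
from pathlib import Path

# Matches href="…/index.html" (single or double quotes) and captures the part to keep.
INDEX_LINK = re.compile(r'(href=["\'])([^"\']*/)index\.html(["\'])', re.IGNORECASE)


def strip_index(html: str) -> str:
    """Rewrite links such as href="/page/index.html" to href="/page/"."""
    return INDEX_LINK.sub(r"\1\2\3", html)


def redirect_target(path: str):
    """Return the path a 301 redirect should point to, or None if none is needed."""
    if path.endswith("/index.html"):
        return path[: -len("index.html")]
    return None


def rewrite_tree(root: str) -> int:
    """Apply strip_index to every .html file under root; return how many files changed."""
    changed = 0
    for file in Path(root).rglob("*.html"):
        original = file.read_text(encoding="utf-8")  # assumption: files are UTF-8
        updated = strip_index(original)
        if updated != original:
            file.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

For example, `rewrite_tree("spidered-site")` would fix the links in every HTML file under a folder named spidered-site, and `redirect_target("/page/index.html")` returns `/page/`.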
One comment on your idea: you may find it useful to build a PHP-driven site, using includes for the header, footer, and nav/sidebar. Thinking ahead, if you want to change a portion of the page that repeats throughout a static site, you'll have to make the change in every page and upload them all, which is a huge job and leaves room for many human/machine errors.
Hope that helped you out!