Way to spider Wordpress site
-
I have an old WordPress site that I want to move to a new server and take off WordPress (too many hacks). I am trying to spider the site to get static, non-WordPress pages.
I am having trouble doing this. When I spider the site, it changes the URLs. For instance, if the URL is www.domain.com/page/, the URL I get out of the spider is /page/index.html, and those are not the URLs in the search engine indices. There are about 2000 pages on this site, so it is not feasible to set up 301 redirects.
I tried using these spidering programs: WinHTTrack Website Copier and PageNest.
Does anyone know of another method of turning a WordPress site into a non-WordPress site?
-
Hi Dan
Hmm, that's a little strange. Two things:
- Is WordPress updated? Do you get the normal URLs when viewing in your browser?
- Have you tried Screaming Frog SEO Spider? It's free to crawl up to 500 pages. Although it won't get the actual HTML on the pages, it could perhaps solve the URL issue.
This blackhat world thread has a few options too.
-Dan
-
Hi Dan, I'm not very experienced in migrating a WordPress site to a non-WordPress one, but I understand that the issue you're having is that the spider is returning index.html files for URLs like domain/page/.
That's normal: whichever spider you use, you'll always end up with an index.html file. Every directory has its own index.html, which is the default file the server shows unless you set up something different with rewrite rules.
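To make the point above concrete, here is a minimal Python sketch of how a typical static web server picks a file for a requested URL. The function name `file_for_request` and the default index name are illustrative assumptions; the real behaviour depends on how your server is configured.

```python
def file_for_request(url_path: str, index_name: str = "index.html") -> str:
    """Return the file a typical static server would serve for url_path.

    Assumption: the server uses a single default index file name and
    applies it to any request path that ends with a slash.
    """
    if url_path.endswith("/"):
        # A directory request such as /page/ is answered with /page/index.html
        return url_path + index_name
    # Anything else is treated as a direct file request
    return url_path
```

So a visitor (or a search engine) can keep requesting /page/ while the file on disk is /page/index.html; the public URL does not have to change.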
If you request /page/, the server will serve the index.html file in that directory. What you have to make sure of is that you set up a 301 redirect so that no index.html URL is shown and each one is redirected to its / version (with wildcards it's a one-line rule), and that your internal links all point to the / pages and not to the index.html versions. You can just find and replace the /index.html" string in the HTML code with /" (Dreamweaver or any HTML editor will do that in bulk).
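If you'd rather script the find and replace than do it in an editor, something along these lines should work. It is only a sketch: the function names are made up for illustration, it assumes the spidered files are UTF-8 and that links use double quotes, and it edits files in place, so run it on a copy of the site first.

```python
from pathlib import Path


def strip_index_links(html: str) -> str:
    """Replace every /index.html" with /" so links point at the directory URL."""
    return html.replace('/index.html"', '/"')


def rewrite_site(root: str) -> int:
    """Rewrite all .html files under root in place; return how many changed.

    Assumption: files are UTF-8 encoded. Links written with single quotes
    or without quotes are not touched by this simple string replacement.
    """
    changed = 0
    for path in Path(root).rglob("*.html"):
        original = path.read_text(encoding="utf-8")
        updated = strip_index_links(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

For example, calling `rewrite_site("spidered-copy")` (a hypothetical folder name) would turn `href="/page/index.html"` into `href="/page/"` in every page under that folder.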
One comment on your idea: you may find it useful to build a PHP-driven site, using includes for the header, footer, and nav/sidebar. Thinking ahead, if you want to change a portion of the page that repeats throughout the site, with static files you'll have to make the change in every page and upload them all, which is a lot of work and leaves room for many human/machine errors.
Hope that helped you out!