Googlebot and other spiders are searching for odd links in our website trying to understand why, and what to do about it.
-
I recently began work on an existing Wordpress website that was revamped about 3 months ago. https://thedoctorwithin.com. I'm a bit new to Wordpress, so I thought I should reach out to some of the experts in the community.Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor the database, as near as I can tell. When checking the source of these irrelevant links, I notice they’re all generated from various pages in the site, as well as non-existing pages, allegedly in the site, even though these pages have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
In other cases, these goofy URLs are even linked from the sitemap. BTW - all the URLs in the sitemap are valid URLs.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google, (and other engines,) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders, as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/. I suppose I could do the same, in regards to other legitimate spiders / search engines. Perhaps it would be better to use Mod Rewrite to lead spiders to pages that actually do exist.
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories aren't an option, in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets, or whatnot'. Guess it's an entirely new topic of discussion... if limiting the spurious spider searching, (from good spiders,) means that said spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vjay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP super cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to:
-
provide access via directories that don't really exist:
-
categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, will be try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
-
I have the same issue, I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress and it looks like that there might be spurious entries in the DB, which might also not be getting deleted due to the WP super cache plugin. You may try to empty your cache and install 'all 404 redirect' and 301 management plugins.
I hope this helps.
Regards,
Vijay
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Broken URL Links
Hi everyone, I have a question regarding broken URL links on my website. Late last year I move my site from an old platform to Shopify, and now have broken URL links giving out 4xx errors. When I look at Moz Pro>Campaigns>Insights>links, I can see the top broken URL links, however there is a difference if copy & paste URL directly from Moz Pro and by Export CSV file. For example below, If I copy and paste links direct from Moz Pro, it has the “http://” in front as below: http://www.thehairhub.com.au/WebRoot/ecshared01/Shops/thehairhub/57F3/1D8F/D244/C675/E27D/AC10/003F/35AD/manic-panic-colours.jpg But when I export the list of links as an CSV file, the http:// is removed. www.thehairhub.com.au/WebRoot/ecshared01/Shops/thehairhub/57F3/1D8F/D244/C675/E27D/AC10/003F/35AD/manic-panic-colours.jpg Another Example below: By copy & paste URL direct from Moz Pro
Technical SEO | | johnwall
http://thehairhub.com.au/Shop-Brands/Vitafive-CPR/CPR-Rescue By export CSV file.
thehairhub.com.au/Shop-Brands/Vitafive-CPR/CPR-Rescue Which one do I use to enter into the “Redirect From” field in Shopify URL Redirects? Do I need to have the http:// in front of the URL? Or is it not required for redirects to work? Kind Regards, John Wall
The Hair Hub0 -
Duplicate Version of My Website
Hello Again, Looking for a little help to help me understand what exactly is going on here. Ive taken over maintenance of a website and have so far fixed a lot of issues. ahrefs has shown me that a second version of my companies website exists that exists at a second url. This second website is linked to the actual company website like I haven't seen before. www(dot)#(dot)co(dot)uk is the main company website. But a second accessible version exists and is accessible at www(dot)#(dot)co(dot)uk The instruments version is a direct copy and all of the links point directly to my main site. Any changes I make on the main version are automatically applied to the other version. It shows up as a SPAM back link on moz as all of the link points to my website etc Ideally in my mind, the instruments version homepage should simply re-direct to the main homepage to solve this "duplicate content and spammy backlink" issue however, the instruments version is the same suffix that all our company emails work with. Basically, HELP lol. I have no understanding of how this is set up, and the best way in which to deal and if it could affect anything such as company emails.
Technical SEO | | ATP0 -
How do I optimize a website for SEO for a client that is using a subdirectory as a seperate website?
We launched a subdirectory site about two months ago for our client. What's happening is searches for the topic covered by the subdirectory are yielding search results for the old site and not the new site. We'd like to change this. Are there best practices for the subdirectory site Specifically we're looking for things we can do using sitemapping and Webmaster tools. Are there other technical things we can do? Thanks you.
Technical SEO | | IVSeoTeam120 -
My beta site (beta.website.com) has been inadvertently indexed. Its cached pages are taking traffic away from our real website (website.com). Should I just "NO INDEX" the entire beta site and if so, what's the best way to do this? Please advise.
My beta site (beta.website.com) has been inadvertently indexed. Its cached pages are taking traffic away from our real website (website.com). Should I just "NO INDEX" the entire beta site and if so, what's the best way to do this? Are there any other precautions I should be taking? Please advise.
Technical SEO | | BVREID0 -
HUGE decrease in links since website redesign
We recently added several new pages pages to our website. These new pages were constructed on a dev site, and then pushed live. Since the new site has gone live I have seen a huge decline in links. My external followed links have dropped from 3000 to 500 and my total website links have fallen from 35,000 to 4,500. I have done some research, and I think there is a server side issue. Where multiple versions of my URL may be running. The majority of the links built were pointing to the homepage. That being said I do not have access to our in-house dev person this week, so I am trying to identify the problem myself. I have used screaming frog to crawl my site and did not see any errors which stand out. I realize I probably need to use 301 redirects to solve this problem, I just need some guidance on how to identify what I need to 301 redirect. Second question. If I move a landing page out of the global navigation but it can still be reached through other pages on the website , will this cause issues?
Technical SEO | | GladdySEO0 -
Track outbound links
I would like to track outbound links at http://bit.ly/yYHmbf 1. Shall i add the following code before at the above page What does 100 means in above code ? 2. Then use this for each outgoing link ``` [onClick="recordOutboundLink(this, 'Outbound Links', 'example.com');return false;">](http://www.example.com) ``` [](http://www.example.com) ```[``` http://www.example.com is the outbound link Am i right on both counts ? where should i look for report in GA ? ```](http://www.example.com)
Technical SEO | | seoug_20050 -
SEO LINKS
New to S.E.O. so excuse my naivety. I have made lots of new links some of them paid for e.g. Best of the Web but I don’t see any change in the latest competitive link analysis. Some of the links we have been accepted for just do not show. Also the keywords we are trying to promote the most have disappeared off the radar for over 2 weeks now. I think we have followed the optimization suggestions correctly. Please could you enlighten me. Regards Paul www.curtainpolesemporium.co.uk
Technical SEO | | CPE0 -
Website has been penalized?
Hey guys, We have been link building and optimizing our website since the beginning of June 2010. Around August-September 2010, our site appeared on second page for the keywords we were targeting for around a week. They then dropped off the radar - although we could still see our website as #1 when searching for our company name, domain name, etc. So we figured we had been put into the 'google sandbox' sort of thing. That was fine, we dealt with that. Then in December 2010, we appeared on the first page for our keywords and maintained first page rankings, even moving up the top 10 for just over a month. On January 13th 2011, we disappeared from Google for all of the keywords we were targeting, we don't even come up in the top pages for company name search. Although we do come up when searching for our domain name in Google and we are being cached regularly. Before we dropped off the rankings in January, we did make some semi-major changes to our site, changing meta description, changing content around, adding a disclaimer to our pages with click tracking parameters (this is when SEOmoz prompted us that our disclaimer pages were duplicate content) so we added the disclaimer URL to our robots.txt so Google couldn't access it, we made the disclaimer an onclick link instead of href, we added nofollow to the link and also told Google to ignore these parameters in Google Webmaster Central. We have fixed the duplicate content side of things now, we have continued to link build and we have been adding content regularly. Do you think the duplicate content (for over 13,000 pages) could have triggered a loss in rankings? Or do you think it's something else? We index pages meta description and some subpages page titles and descriptions. We also fixed up HTML errors signaled in Google Webmaster Central and SEOmoz. The only other reason I think we could have been penalized, is due to having a link exchange script on our site, where people could add our link to their site and add theirs to ours, but we applied the nofollow attribute to those outbound links. Any information that will help me get our rankings back would be greatly appreciated!
Technical SEO | | bigtimeseo0