Googlebot and other spiders are requesting odd links on our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website that was revamped about 3 months ago: https://thedoctorwithin.com. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.
Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor the database, as near as I can tell. When checking the source of these irrelevant links, I notice they’re all reported as linked from various pages in the site, as well as from pages allegedly in the site that have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
In other cases, these goofy URLs are even reported as linked from the sitemap, although all the URLs in the sitemap are valid URLs.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
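One possible explanation for URLs shaped like the examples above (an assumption on my part, nothing in this thread confirms it) is a relative href, written without a leading slash, somewhere in a template or menu. A crawler resolves such a link against whatever URL it is currently on, so each hop appends another segment. A minimal Python sketch using the standard library's `urljoin`, with the href value being hypothetical:

```python
from urllib.parse import urljoin

# Hypothetical: a template emits this href without a leading slash.
relative_href = "feedback-and-testimonials/"

# The deep URL from the Search Console example above.
base = "https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/"

# A relative href is resolved against the current URL, so the path grows.
resolved = urljoin(base, relative_href)
print(resolved)

# A root-relative href (leading slash) always resolves to the flat URL.
root_relative = urljoin(base, "/feedback-and-testimonials/")
print(root_relative)
```

If this is the cause, viewing the HTML source of a valid page and looking for hrefs that lack a leading slash or a full domain would be a quick way to check.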
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
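For anyone wanting to confirm that a rule like this covers the deep /search/ URLs, here is a rough Python sketch using the standard library's `urllib.robotparser`. The robots.txt content below is assumed for illustration; the actual file is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may crawl the URL."""
    return parser.can_fetch(agent, url)


blocked = "https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/"
normal = "https://thedoctorwithin.com/feedback-and-testimonials/"

print(is_allowed(blocked))  # the /search/ branch
print(is_allowed(normal))   # a regular flat page
```

Note that robots.txt only asks well-behaved spiders not to crawl those paths; it does not explain where the links come from.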
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/. I suppose I could do the same, in regards to other legitimate spiders / search engines. Perhaps it would be better to use Mod Rewrite to lead spiders to pages that actually do exist.
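To illustrate the redirect idea, here is a small Python sketch of the mapping logic only, not an actual Mod Rewrite rule. It assumes the flat structure described above, so the last path segment of a spurious URL is treated as the slug of the real page. The `KNOWN_SLUGS` set and the `redirect_target` function are hypothetical; a real list would come from the sitemap.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical slugs for illustration; in practice, load these from the sitemap.
KNOWN_SLUGS = {
    "feedback-and-testimonials",
    "online-continuing-education",
    "consultations",
}


def redirect_target(url: str) -> Optional[str]:
    """Return the flat URL a spurious deep path could 301 to, or None to let it 404."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        return None  # already a flat URL; nothing to rewrite
    last = segments[-1]
    if last in KNOWN_SLUGS:
        return f"{parts.scheme}://{parts.netloc}/{last}/"
    return None  # unknown final segment (e.g. a page number)


deep = ("https://thedoctorwithin.com/search/newsletters/page/2/"
        "feedback-and-testimonials/feedback-and-testimonials/"
        "online-continuing-education/consultations/")
print(redirect_target(deep))
```

Running the 404 log through something like this first would show how many of the errant URLs a redirect rule could actually catch before committing to one in .htaccess.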
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option, in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets' or whatnot. Guess it's an entirely new topic of discussion: whether limiting the spurious spider searching (from good spiders) means that said spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP Super Cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, I will try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
I have the same issue. I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which might also not be getting deleted due to the WP Super Cache plugin. You may try to empty your cache and install 'All 404 Redirect' and 301 management plugins.
I hope this helps.
Regards,
Vijay