Googlebot and other spiders are requesting odd links on our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website that was revamped about 3 months ago: https://thedoctorwithin.com. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.
Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are present in neither the website nor the database, as near as I can tell. When checking the source of these irrelevant links, I notice they’re all reported as linked from various pages in the site, as well as from pages that have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
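One way nested paths like this can arise is a relative link with no leading slash on a page that a crawler reached under a deep path. This is an assumption, not something confirmed for this site. A minimal Python sketch using the standard library's `urljoin` shows how a crawler would resolve such a link; both `href` values are hypothetical and not taken from the site's markup:

```python
from urllib.parse import urljoin

# Page the crawler is on (the "linked from" URL in the example above).
base = "https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/"

# Hypothetical hrefs, assumed for illustration only.
relative_href = "feedback-and-testimonials/"        # no leading slash
root_relative_href = "/feedback-and-testimonials/"  # leading slash

# A relative href is appended to the current path, so the URL keeps growing.
resolved_relative = urljoin(base, relative_href)

# A root-relative href always resolves against the site root.
resolved_root = urljoin(base, root_relative_href)

print(resolved_relative)
print(resolved_root)
```

If that is the cause, each deep page that returns content instead of a 404 gives the crawler another base URL to append to, which would explain why the paths keep getting longer.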
In other cases, these goofy URLs are even reported as linked from the sitemap, although all the URLs in the sitemap are valid.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories is valid.
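A quick way to sanity-check a rule like this is Python's standard `urllib.robotparser`. The sketch below assumes the robots.txt contains a single `Disallow: /search/` rule for all user agents; the actual file on the site may differ, and `is_allowed` is a helper name made up for this example:

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file on the site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

def is_allowed(url, user_agent="Googlebot"):
    """Return True if the assumed robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

blocked = is_allowed("https://thedoctorwithin.com/search/newsletters/page/2/")
allowed = is_allowed("https://thedoctorwithin.com/feedback-and-testimonials/")
print(blocked, allowed)
```

Note that robots.txt only stops compliant crawlers from fetching these paths; it does not remove URLs that are already indexed.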
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/, and do the same with regard to other legitimate spiders and search engines. Perhaps it would be better to use mod_rewrite to lead spiders to pages that actually do exist.
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets' or whatnot. I guess that's an entirely new topic of discussion: whether limiting the spurious crawling (from good spiders) means those spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP Super Cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, I will try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
I have the same issue. I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which might also not be getting deleted due to the WP Super Cache plugin. You may try to empty your cache and install 'all 404 redirect' and 301 management plugins.
I hope this helps.
Regards,
Vijay