Googlebot and other spiders are searching for odd links in our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website, https://thedoctorwithin.com, that was revamped about 3 months ago. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.
Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor the database, as near as I can tell. When checking the source of these irrelevant links, I notice they’re all reported as linked from various pages in the site, as well as from pages that are allegedly in the site but have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
In other cases, these goofy URLs are even reported as linked from the sitemap, although all the URLs in the sitemap are valid URLs.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
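A quick way to sanity-check a rule like this is Python's standard `urllib.robotparser`. This is only a sketch: the robots.txt content below is an assumption (the actual file isn't shown here), and the second test URL is just an illustrative flat-structure page.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file on the site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the assumed robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# One of the spurious /search/ URLs, and an illustrative normal page.
blocked = "https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/"
normal = "https://thedoctorwithin.com/feedback-and-testimonials/"

print(is_allowed(blocked))
print(is_allowed(normal))
```

Note that robots.txt only asks well-behaved spiders not to crawl those paths. It doesn't make the URLs return anything different.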
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders, as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/, and do something similar for other legitimate spiders / search engines. Perhaps it would be better to use mod_rewrite to lead spiders to pages that actually do exist.
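Before writing any rewrite rules, it may help to separate these spider-invented paths from genuinely old incoming links in the 404 list. Here is a rough sketch of that triage. The two heuristics (more than one `/page/N/` segment, or the same slug twice in a row) are my own assumptions based on the example URLs above, and `looks_spurious` is a hypothetical helper name, not part of any tool.

```python
import re
from urllib.parse import urlparse

# Matches one pagination segment such as /page/7
PAGINATION = re.compile(r"/page/\d+(?=/|$)")


def looks_spurious(url: str) -> bool:
    """Heuristic guess: does this 404 URL look like a crawler-built path?"""
    path = urlparse(url).path
    segments = [s for s in path.split("/") if s]
    # A real archive URL should contain at most one /page/N/ segment.
    if len(PAGINATION.findall(path)) > 1:
        return True
    # The same slug appearing twice in a row is another giveaway.
    return any(a == b for a, b in zip(segments, segments[1:]))


not_found = [
    "https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/",
    "https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/",
    "https://thedoctorwithin.com/feedback-and-testimonials/",
]

for url in not_found:
    print(looks_spurious(url), url)
```

Anything flagged this way could be left to return a 404, while the unflagged leftovers are more likely to be old links worth redirecting.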
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
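On the "why": one guess I can at least test, though it isn't confirmed for this site, is relative links. If a template or a piece of content emits a link without a leading slash, a spider resolves it against whatever URL it is currently on, so each hop appends another segment. The sketch below uses the standard `urllib.parse.urljoin` to show the difference. The `href` values are assumptions, not copied from the site's markup.

```python
from urllib.parse import urljoin

# The page a spider claims to be on (taken from the example above).
base = "https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/"

# Assumed href without a leading slash: resolved relative to the current path.
relative = urljoin(base, "feedback-and-testimonials/")

# Assumed href with a leading slash: resolved against the site root.
absolute = urljoin(base, "/feedback-and-testimonials/")

print(relative)
print(absolute)
```

If this is what is happening, the first result matches the odd URL reported in Search Console, and the fix would be on the link markup rather than on the spiders.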
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option, in my case. Guess I need to look at more of the potential upside. Seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets', or whatnot. Guess it's an entirely new topic of discussion: whether limiting the spurious spider searching (from good spiders) means that said spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
Did a lot of work fixing links in the database.
The issue was occurring even before implementation of WP super cache, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, I will try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
I have the same issue. I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which might also not be getting deleted due to the WP Super Cache plugin. You may try to empty your cache and install 'all 404 redirect' and 301 management plugins.
I hope this helps.
Regards,
Vijay