Googlebot and other spiders are crawling odd links on our website. Trying to understand why, and what to do about it.
-
I recently began work on an existing WordPress website that was revamped about 3 months ago: https://thedoctorwithin.com. I'm a bit new to WordPress, so I thought I should reach out to some of the experts in the community.
Checking ‘Not found’ Crawl Errors in Google Search Console, I notice many irrelevant links that are not present in the website, nor in the database, as near as I can tell. When checking the reported source of these irrelevant links, I notice they’re all attributed to various pages in the site, as well as to pages that are allegedly in the site but have never existed.
For instance:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/ allegedly linked from:
- https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/ (doesn’t exist)
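One possible explanation, offered as an assumption rather than something confirmed for this site: URLs of this shape are exactly what a crawler computes when a page contains a relative link (no leading slash) and the server answers for the "parent" URL instead of returning a 404. The sketch below uses Python's standard `urljoin` to show the resolution; the two `href` values are hypothetical examples, not taken from the site's templates.

```python
from urllib.parse import urljoin

# The URL the crawler is currently on (the "allegedly linking" page above).
base = "https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/"

# Hypothetical hrefs: the first has no leading slash, the second does.
relative_href = "feedback-and-testimonials/"
root_relative_href = "/feedback-and-testimonials/"

# A relative href is appended to the current path, so the path keeps growing.
resolved_relative = urljoin(base, relative_href)

# A root-relative href always resolves from the site root.
resolved_root = urljoin(base, root_relative_href)

print(resolved_relative)
print(resolved_root)
```

If that is what is happening, the thing to look for would be links in menus, widgets, or post content written without a leading slash or full domain.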
In other cases, these goofy URLs are even reported as linked from the sitemap. BTW, all the URLs actually in the sitemap are valid.
Currently, the site has a flat structure. Nearly all the content is merely URL/content/ without further breakdown (or subdirectories). Previous site versions had a more varied page organization, but what I'm seeing doesn't seem to reflect the current page organization, nor the previous page organization.
Had a similar issue, due to use of Divi's search feature. Ended up with some pretty deep non-existent links branching off of /search/, such as:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/consultations/ allegedly linked from:
- https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/feedback-and-testimonials/online-continuing-education/ (doesn't exist).
I blocked the /search/ branches via robots.txt. No real loss, since neither /search/ nor any of its subdirectories are valid.
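A quick way to sanity-check a rule like this is Python's standard `urllib.robotparser`. The two rule lines below are an assumed reconstruction, since the actual robots.txt is not shown here, and `is_allowed` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules; the site's real robots.txt may differ.
robots_lines = [
    "User-agent: *",
    "Disallow: /search/",
]

parser = RobotFileParser()
parser.parse(robots_lines)


def is_allowed(url, agent="Googlebot"):
    """Return True if the given user agent may crawl the URL under these rules."""
    return parser.can_fetch(agent, url)


# One of the runaway /search/ URLs, and an ordinary flat page.
search_url = "https://thedoctorwithin.com/search/newsletters/page/2/feedback-and-testimonials/"
flat_url = "https://thedoctorwithin.com/feedback-and-testimonials/"

search_allowed = is_allowed(search_url)
flat_allowed = is_allowed(flat_url)
print(search_allowed, flat_allowed)
```

Note that a robots.txt rule only stops compliant crawlers from fetching those paths; it does not by itself stop the odd links from being generated.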
There are numerous pre-existing categories and tags on the site. The categories and tags aren't used as pages. I suspect Google (and other engines) might be creating arbitrary paths from these. Looking through the site’s 404 errors, I’m seeing the same behavior from Bing, Moz and other spiders as well.
I suppose I could use Search Console to remove URL/category/ and URL/tag/, and do the same with other legitimate spiders / search engines. Perhaps it would be better to use mod_rewrite to lead spiders to pages that actually do exist.
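To make the redirect idea concrete, here is a rough sketch of the mapping logic in Python rather than actual mod_rewrite rules. It assumes the flat structure described above, where every real page lives at URL/slug/. The `KNOWN_SLUGS` set and the `redirect_target` function are hypothetical names for illustration; a real list would come from the site's own pages.

```python
from urllib.parse import urlsplit

# Hypothetical slugs of pages that really exist in the flat structure.
KNOWN_SLUGS = {
    "feedback-and-testimonials",
    "online-continuing-education",
    "consultations",
}


def redirect_target(url):
    """Return the flat URL to 301 to, or None if the request should just 404."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]

    # Zero or one segment means the URL is already flat: no redirect needed.
    if len(segments) <= 1:
        return None

    # Only redirect when the final segment names a page that actually exists.
    last = segments[-1]
    if last in KNOWN_SLUGS:
        return f"{parts.scheme}://{parts.netloc}/{last}/"
    return None


deep = "https://thedoctorwithin.com/category/seminars/newsletters/page/7/newsletters/page/3/feedback-and-testimonials/"
print(redirect_target(deep))
```

Anything that does not end in a known slug is left to return a 404, which seems safer than redirecting every stray URL to some arbitrary page.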
- Looking forward to suggestions about best way to deal with these errant searches.
- Also curious to learn about why these are occurring.
Thank you.
-
Thanks, Kevin.
Glad I'm not the only one.
Disabling tags and categories isn't an option in my case. Guess I need to look at more of the potential upside. It seems tags and categories, if handled correctly, could provide a new way to engage visitors and search engines.
I've heard people refer to 'spidering budgets', or whatnot. Guess it's an entirely new topic of discussion... whether limiting the spurious crawling (by good spiders) means those spiders will spend more time on the conventional pathways of a site.
-
Thanks, Vijay.
Did a lot of work fixing links in the database.
The issue was occurring even before WP Super Cache was implemented, and before the link fixing.
Being new-ish to WP, it seems strange that it's so willing to provide access via directories that don't really exist: categories, tags, even search, if using a theme-provided site search.
I'm getting better at .htaccess, so I'm able to handle a lot of the old incoming links fairly well. In the case of these weird 'in the mind of the spiders' links, I will try to address these as well.
Thanks for your advice about 404 and 301 plugins. Time to look around and see what other useful tools are out there.
-
-
I have the same issue. I have stopped using tags because of all the irrelevant links they cause. Looking forward to reading the comments on this thread.
KJr
-
Hi There,
Your website is built on WordPress, and it looks like there might be spurious entries in the DB, which also might not be getting deleted because of the WP Super Cache plugin. You may want to empty your cache and install the 'all 404 redirect' plugin and a 301 management plugin.
I hope this helps.
Regards,
Vijay