Huge number of crawl anomalies and 404s - non-existent URLs
-
Hi there,
Our site was redesigned at the end of January 2020. Since the new site was launched we have seen a big drop in impressions (50-60%) and also a big drop in total and organic traffic (again 50-60%) when compared to the old site.
I know in the current climate some businesses will see a drop in traffic, however we are a tech business and some of our core search terms have increased in search volume as a result of remote-working.
According to Search Console, 82k URLs are excluded from coverage. The majority of these are classed as 'crawl anomaly' and there are 250+ 404s. Almost all of the URLs are non-existent: they have our root domain with a string of random characters on the end. Here are a couple of examples:
root.domain.com/96jumblestorebb42a1c2320800306682
root.domain.com/01sportsplazac9a3c52miz-63jth601
root.domain.com/39autoparts-agency26be7ff420582220
root.domain.com/05open-kitchenaf69a7a29510363
Is this a cause for concern? I'm thinking that all of these random fake URLs could be preventing genuine pages from being indexed, or that they could be having an impact on our search visibility. Can somebody advise, please?
Thanks!
-
Unlikely, as long as they're returning 404 errors you should be OK. Maybe update your disavow file and you should be good to go!
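To confirm that the fake URLs really do return 404 (or 410), a short script can check a sample of them. This is only a minimal sketch using Python's standard library: `fetch_status` and `urls_not_gone` are hypothetical helper names, not part of any SEO tool.

```python
from typing import Callable, Iterable, List
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Status codes that tell a crawler the page does not exist.
GONE_STATUSES = {404, 410}


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url.

    urlopen follows redirects, so this is the status of the final
    response. Connection failures (URLError) are not handled here.
    """
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return error.code


def urls_not_gone(
    urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> List[str]:
    """Return the URLs whose status is anything other than 404 or 410."""
    return [url for url in urls if get_status(url) not in GONE_STATUSES]
```

Feed it a sample of the excluded URLs exported from Search Console (with the scheme included). Anything it returns is answering with something other than 404/410 and is worth a closer look.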
-
Thanks for your reply.
I'm new to the business and I've found that the old website had a spam attack. All of these fake URLs are from the old pages (as they have 301s).
There are 82,000 crawl anomalies from these fake/spam URLs and around 200 404s. None of the fake/spam URLs have been indexed. Could this be having a negative effect on search visibility, DA or rankings?
Thanks!
-
It's tough to say without seeing the site. Overall it's unlikely if you don't use that string anywhere. We usually see it more for broken relative URLs. Maybe a third party site is using that string.
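For context on the broken relative URL case: when an href is missing its scheme, a crawler treats it as a path relative to the current page, which produces URLs that were never published. The sketch below illustrates this with Python's `urljoin`; `resolve_href` is a hypothetical helper name and the page and href values are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_href(page_url: str, href: str) -> str:
    """Resolve an href against the page it appears on, following the
    standard relative-reference rules that urljoin implements."""
    return urljoin(page_url, href)


# An href written without "https://" is treated as a relative path,
# so it gets appended to the directory of the linking page.
broken = resolve_href("https://root.domain.com/blog/", "www.example.com/page")

# The same link written with its scheme resolves to the intended site.
correct = resolve_href("https://root.domain.com/blog/", "https://www.example.com/page")
```

Here `broken` comes out as `https://root.domain.com/blog/www.example.com/page`, while `correct` stays `https://www.example.com/page`.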
-
Thanks for your reply. Would broken URLs from the internal linking structure explain the random characters? e.g. root.domain.com/96jumblestorebb42a1c2320800306682
We've never had any page content/urls relating to 'jumblestore'.
Thanks!
-
From what I can tell, this probably isn't the reason for the drops. I'd go back and ensure that any URLs that changed are 301 redirecting to the correct destination URL. I'd also check that no pages associated with high-volume keywords have been removed.
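If you have a mapping of old URLs to their intended new URLs, the redirect check can be scripted. The following is a rough sketch under that assumption: `first_hop` and `check_redirects` are hypothetical names, and it looks only at the first response for each old URL, so redirect chains would need extra handling.

```python
import http.client
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

Hop = Tuple[int, Optional[str]]


def first_hop(url: str, timeout: float = 10.0) -> Hop:
    """Return (status, Location header) for url without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check_redirects(
    expected: Dict[str, str],
    get_hop: Callable[[str], Hop] = first_hop,
) -> List[Tuple[str, str]]:
    """Return (old_url, problem) pairs for old URLs that do not 301
    straight to the destination listed in expected."""
    problems = []
    for old_url, new_url in expected.items():
        status, location = get_hop(old_url)
        # Location may be relative, so resolve it against the old URL.
        target = urljoin(old_url, location) if location else None
        if status != 301:
            problems.append((old_url, f"status {status}, expected 301"))
        elif target != new_url:
            problems.append((old_url, f"redirects to {target}, expected {new_url}"))
    return problems
```

An empty list means every old URL in the mapping answers with a 301 to the expected destination. Old URLs that 302, 404, or redirect to the homepage instead of the equivalent page show up with a short description of the problem.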
For your issue, Google is likely finding some broken URLs, possibly from your internal linking structure. Perform a crawl of the site and see if you can find "Inlinks" to those broken pages. If so, you can work with dev to eliminate the issue.
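If your crawler can export its links as (source page, link target) pairs, finding the inlinks comes down to a simple grouping step. This is a sketch under that assumption: `inlinks_to_broken` is a hypothetical helper, and the export format will differ between crawling tools.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def inlinks_to_broken(
    edges: Iterable[Tuple[str, str]],
    broken_urls: Iterable[str],
) -> Dict[str, List[str]]:
    """Group the pages that link to each broken URL.

    edges is an iterable of (source_page, link_target) pairs from a crawl;
    broken_urls is the list of URLs reported as 404 or crawl anomaly.
    """
    broken = set(broken_urls)
    inlinks = defaultdict(set)
    for source, target in edges:
        if target in broken:
            inlinks[target].add(source)
    # Sort the sources so the report is stable between runs.
    return {target: sorted(sources) for target, sources in inlinks.items()}
```

Broken URLs that do not appear in the result have no internal inlinks in the crawl, which would point to an external source, such as a third-party site, rather than your own templates.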
Related Questions
-
WWW vs Non WWW for EXISTING site.
This one has sort of been asked already but I cannot find an answer. When we evaluate a new SEO client, previously with Majestic we would review the root domain vs the subdomain (www) to see which had the higher Trust Flow and Citation Flow, and if there was a major difference, adjust the Google indexed domain to the higher performing one. Is there a way to do this with Moz? Domain Authority and Subdomain Authority are always returning the same DA for me. Thanks in advance.
Technical SEO | | practiceedge10 -
Why does Bing bot crawl so aggressively?
We observe that the Bing bot is crawling our site very aggressively. We set Bing's crawl control so that it should not crawl us during heavy traffic hours, but that did not change a thing. Does anyone else have this problem and, even better, a solution?
Technical SEO | | Roverandom1 -
WP URL issue - Concatenated URLs (LOTS of them)
WP is doing this somehow, and creating URLs for hundreds of pages that don't exist. HOW is this happening, and how do I stop It? I have many, many URLS like this: https://www.atouchofrust.com/terms-of-use/atouchofrust.com/vendor-news. Of note, atouchofrust.com/terms-of-use, and atouchofrust.com/vendor-news are both legit pages on the site. Why they are being concatenated is beyond my limited understanding of WP. Please, somebody, help. Cori
Technical SEO | | FlyingC0 -
Canonical sitemap URL different to website URL architecture
Hi, This may or may not be an issue, but I would like some SEO advice from someone who has a deeper understanding. I'm currently working on a client's site that has a bespoke CMS built by another development agency. The website currently has a sitemap with one link, e.g. www.example.com/category/page. This is obviously the page that is indexed in search engines. However, the website structure uses www.example.com/page, which isn't indexed in search engines as the links are canonical. The client is also using the second URL structure in all its offline and online advertising and internal links, and it has also been picked up by referral sites. I suspect this is not good practice... however, I'd like to understand whether there are any negative SEO effects from this structure. Does Google look at both pages with regard to visits, pageviews, bounce rate, etc. and combine the data OR just use the indexed version?
www.example.com/category/page - 63.5% of total pageviews
www.example.com/page - 34.31% of total pageviews
Thanks
Mike
Technical SEO | | MikeSutcliffe0 -
Removed URLs
Hi all, We have recently removed 200+ articles from our blog. However, those links are still being shown on Google weeks after their removal. Is there a way to speed up the process? What effect will this have on our SEO ranking?
Technical SEO | | businessowner0 -
AJAX and High Number Of URLS Indexed
I recently took over as the SEO for a large ecommerce site. Every month or so our webmaster tools account is hit with a warning for a high number of URLs. In each message they send there is a sample of problematic URLs. 98% of each sample is not an actual URL on our site but is an AJAX request URL that users are making. This is a server-side request, so the URL does not change when users make narrowing selections for items like size, color etc. Here is an example of what one of those looks like: Tire?0-1.IBehaviorListener.0-border-border_body-VehicleFilter-VehicleSelectPanel-VehicleAttrsForm-Makes We have over 3 million indexed URLs according to Google because of this. We are not submitting these URLs in our sitemaps; Googlebot is making lots of AJAX selections according to our server data. I have used the URL Handling Parameter Tool to target some of those parameters that are currently set to let Google decide, and set it so that "no urls" with those parameters are indexed. I still need more time to see how effective that will be, but it does seem to have slowed the number of URLs being indexed. Other notes: 1. Overall traffic to the site has been steady and even increasing. 2. Googlebot crawls an average of 241000 URLs each day according to our crawl stats. We are a large ecommerce site that sells parts, accessories and apparel in the power sports industry. 3. We are using the Wicket framework for our website. Thanks for your time.
Technical SEO | | RMATVMC0 -
Changed URL of all web pages to a new updated one - Keywords still pick the old URL
A month ago we updated our website and with that we created new URLs for each page. Under "On-Page", the keywords we put to check ranking on are still giving information on the old urls of our websites. Slowly, some new URLs are popping up. I'm wondering if there's a way I can manually make the keywords feedback information from the new urls.
Technical SEO | | Champions0 -
301 Redirecting weird URLs with % in them
I've been working on redirecting links reported as 404 in Google webmaster tools. I've stumbled upon 41 URLs that Google is reporting as 404 that include a '%' in the URL, but I don't know how to redirect. Here is an example: URL: bond_information.htm%20Surety%20Bond%20Information,%20with%20FAQ Attempted redirect: redirect 301 /bond_information.htm%20Surety%20Bond%20Information,%20with%20FAQ http://www.mysite.com/ Unfortunately, after implementing the redirect, http://www.mysite.com/bond_information.htm%20Surety%20Bond%20Information,%20with%20FAQ still resolves a 404 error. Anyone successfully fix these errors using Apache .htaccess?
Technical SEO | | TheDude0