Huge number of crawl anomalies and 404s - non- existent urls
-
Hi there,
Our site was redesigned at the end of January 2020. Since the new site was launched we have seen a big drop in impressions (50-60%) and also a big drop in total and organic traffic (again 50-60%) when compared to the old site.
I know in the current climate some businesses will see a drop in traffic, however we are a tech business and some of our core search terms have increased in search volume as a result of remote-working.
According to search console there are 82k urls excluded from coverage - the majority of these are classed as 'crawl anomaly' and there are 250+ 404's - almost all of the urls are non-existent, they have our root domain with a string of random characters on the end. Here are a couple of examples:
root.domain.com/96jumblestorebb42a1c2320800306682
root.domain.com/01sportsplazac9a3c52miz-63jth601
root.domain.com/39autoparts-agency26be7ff420582220
root.domain.com/05open-kitchenaf69a7a29510363
Is this a cause for concern? I'm thinking that all of these random fake urls could be preventing genuine pages from being indexed / or they could be having an impact on our search visibility. Can somebody advise please?
Thanks!
-
Unlikely, as long as they're returning 404 errors you should be OK. Maybe update your disavow file and you should be good to go!
-
Thanks for your reply.
I’m new to the business and I’ve found that that the old website had a spam attack, all of these fake urls are from the old pages (as they have 301s).
There are 82,000 crawl anomalies from these fake/spam URLs and around 200 404s. None of the fake /spam urls have been indexed. Could this be having a negative effect of search visibility/DA or rankings?
Thanks!
-
It's tough to say without seeing the site. Overall it's unlikely if you don't use that string anywhere. We usually see it more for broken relative URLs. Maybe a third party site is using that string.
-
Thanks for your reply, would broken urls from the internal linking structure explain the random characters? e.g. root.domain.com/96jumblestorebb42a1c2320800306682
We've never had any page content/urls relating to 'jumblestore'.
Thanks!
-
From what I can tell, this probably isn't the reasons for the drops. I'd go back and ensure that any URLs that changed are 301 redirecting to the correct destination URL. I'd also ensure that no pages that were associated with high volume keywords no longer exist.
For your issue, Google is likely finding some broken URLs, possibly from your internal linking structure. Perform a crawl of the site and see if you can find "Inlinks" to those broken pages. If so, you can work with dev to eliminate the issue.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Redirect indexed lightbox URLs?
Hello all, So I'm doing some technical SEO work on a client website and wanted to crowdsource some thoughts and suggestions. Without giving away the website name, here is the situation: The website has a dedicated /resources/ page. The bulk of the Resources are industry definitions, all encapsulated in colored boxes. When you click on the box, the definition opens in a lightbox with its own unique URL (Ex: /resources/?resource=augmented-reality). The information for these colored lightbox definitions is pulled from a normal resources page (Ex: /resources/augmented-reality/). Both of these URLs are indexed, leading to a lot of duplicate indexed content. How would you approach this? **Things to Consider: ** -Website is built on Wordpress with a custom theme.
Technical SEO | | Alces
-I have no idea how to even find settings for the lightbox (will be asking the client today).
-Right now my thought is to simply disallow the lightbox URL in robots.txt and hope Google will stop crawling and eventually drop from the index.
-I've considered adding the main resource page canonical to the lightbox URL, but it appears to be dynamically created and thus there is no place to access (outside of the FTP, I imagine?). I'm most rusty with stuff like this, so figured I'd appeal to the masses for some assistance. Thanks! -Brad0 -
Use existing page with bad URL or brand new URL?
Hello, We will be updating an existing page with more helpful information with the goal of reaching more potential customers through SEO and also attaching a SEM campaign to the specific landing page. The current URL of the page scores 25 on Page Authority, and has 2 links to it from blog articles (PA 35, 31). The current content needs to be rewritten to be more helpful and also needs some additional information. The downsides are that it has an "bad" URL- no target keyword and uses underscores. Which of the following choices would you make? 1. Update this old "bad" URL with new content. Benefit from the existing PA. -or- 2. Start with a new optimized URL, reusing some of the old content and utilizing a 301 redirect from the previous page? Thank you!
Technical SEO | | XLMarketing0 -
Changing site URL structure
Hey everybody, I'm looking for a bit of advice. A few weeks ago Google sent me an email saying all pages with any text input on them need to switch to https for those pages. This is no problem, I was slowly switching the site to https anyway using a 301 redirect. However, my site also has a language subfolder in the url, mysite.com/en/ mysite.com/ru/ etc. Due to poor work on my part the translations of the site haven't been updated in a long time and lots of the pages are in english even on the russian version etc. So I'm thinking of just removing this url structure and just having mysite.com My plan is to 301 all requests to https and remove the language subfolder in the url at the same time. So far the https switching hasn't changed my rankings. Am I more at risk of losing my rankings by doing this? Thanks!
Technical SEO | | Ruhol0 -
Spaces (actual spaces) in URL
Hi all, Is there a huge loss of SEO performance if a URL shows spaces with an actual space (i.e. %20) in the URL rather than a "-" (or indeed a "_")? I know the preferred option is to have a "-", but I am just wondering if it is worth our effort to manually change the "%20" to a "-" in all the instances? Thanks 🙂 Diana
Technical SEO | | Diana.varbanescu0 -
Changed URL of all web pages to a new updated one - Keywords still pick the old URL
A month ago we updated our website and with that we created new URLs for each page. Under "On-Page", the keywords we put to check ranking on are still giving information on the old urls of our websites. Slowly, some new URLs are popping up. I'm wondering if there's a way I can manually make the keywords feedback information from the new urls.
Technical SEO | | Champions0 -
How to increase the crawl rate?
hello, Our site was hosted in North America and Google was crawling it reasonably fast. Since our traffic is mostly from India we moved it to India, now the crawling is terribly slow from Google. Is there anyway to fix the crawl rate(we have increased the crawl rate in GWT)
Technical SEO | | greyniumseo0 -
Non existant URLs being generated in index
Hi all, I have a pretty big problem with my site at the moment which I'm worried will have an impact on my rankings. I've just had a crawl test done and for some reason I get a load of urls returned that don't actually exist... For example I am getting urls like this in my crawl test and xml sitemap: www.applicablejobs.com/jobs/add/android-designer/android-designer/android-designer/android-developer/android-developer/ www.applicablejobs.com/jobs/add/android-designer/android-designer/android-designer/android-developer/iphone-designer/ All the urls seem to start off with www.applicablejobs.com/jobs/ and there is an entry for every conceivable combination of slugs. I can only assume that if the crawl test and an xml sitemap generator is indexing these urls then Google and other search engines probably are too. Does anyone have any idea what might be causing this issue and what can I do to remove them from Googles index if they are? Thanks
Technical SEO | | Benji870 -
Are lots of links from an external site to non-existant pages on my site harmful?
Google Webmaster Tools is reporting a heck of a lot of 404s which are due to an external site linking incorrectly to my site. The site itself has scraped content from elsewhere and has created 100's of malformed URLs. Since it unlikely I will have any joy having these linked removed by the creator of the site, I'd like to know how much damage this could be doing, and if so, is there is anything I can do to minimise the impact? Thanks!
Technical SEO | | Nobody15569050351140