Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
[Very Urgent] More 100 "/search/adult-site-keywords" Crawl errors under Search Console
-
I just opened my G Search Console and was shocked to see more than 150 Not Found errors under Crawl errors. Mine is a Wordpress site (it's consistently updated too):
Here's how they show up:
Example 1:
- URL: www.example.com/search/adult-site-keyword/page2.html/feed/rss2
- Linked From: http://an-adult-image-hosting.com/search/adult-site-keyword/page2.html
Example 2 (this surprised me the most when I looked at the linked from data):
-
URL: www.example.com/search/adult-site-keyword-2.html/page/3/
-
Linked From:
-
www.example.com/search/adult-site-keyword-2.html/page/2/ (this is showing as if it's from our own site)
-
http://a-spammy-adult-site.com/search/adult-site-keyword-2.html
Example 3:
- URL: www.example.com/search/adult-site-keyword-3.html
- Linked From: http://an-adult-image-hosting.com/search/adult-site-keyword-3.html
How do I address this issue?
-
Here is what I would do
-
Disavow the domain that is linking to you from the adult site(s).
-
The fact that Google search console is showing that you have an internal page linking as well makes me want to know a) have you always owned this domain and maybe someone previously did link internally like this or b) you may have been or are hacked
In the case of b) this can be really tricky. I once had a site that in a crawl it was showing sitewide links to various external sites that we should not be linking to. When I looked at the internal pages via my browser, there was no link as far as I could see even though it showed up on the crawler report.
Here was the trick. The hacker had setup a script to only show the link when a bot was viewing the page. Plus, we were running mirrored servers and they had only hacked one server. So, the links only showed up when you were spidering a specific mirrored instance as a bot.
So thanks to the hacking, not only were we showing bad links to bad sites, we were doing this through cloaking methodology. Two strikes against us. Luckily we picked this up pretty quick and fixed immediately.
Use a spidering program or browser program to show a user agent of Googlebot and go visit your pages that are linking internally. You might be surprised.
Summary
Googlebot has a very long memory. It may be that this was an old issue that was fixed long ago. If that was the case, just show the 404s for the pages that do not exist, and disavow the bad domain and move on. Make sure that you have not been hacked as this would also be why this is showing.
Regardless, the fact that Google did find it at one point, you need to make sure you resolve. Pull all the URLs into a spreadsheet and run Screaming Frog in list mode to check them all to make sure you fix all of it.
-
-
Yep.. Looking if anyone can help with this..
-
Oh yea, I missed that. That's very strange, not sure how to explain that one!
-
Thanks for the response Logan. What you are saying definitely makes sense.. But it makes think why do I see something like Example 2 under Crawl errors. Why Google Search Console shows linked from as 2 URL - one the spammy site's and other is from my own website. How is that even possible?
-
I've seen similar situations, but never in bulk and not with adult sites. Basically what's happening is somehow a domain (or multiple) are linking to your site with inaccurate URLs. When bots crawling those sites find the links pointing to yours, they obviously hit a 404 page which triggers the error in Search Console.
Unfortunately, there's not too much you can do about this, as people (or automated spam programs) can create a link to any site and any time. You could disavow links from those sites, which might help from an SEO perspective, but it won't prevent the errors from showing up in your Crawl Error report.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Crawl Stats Decline After Site Launch (Pages Crawled Per Day, KB Downloaded Per Day)
Hi all, I have been looking into this for about a month and haven't been able to figure out what is going on with this situation. We recently did a website re-design and moved from a separate mobile site to responsive. After the launch, I immediately noticed a decline in pages crawled per day and KB downloaded per day in the crawl stats. I expected the opposite to happen as I figured Google would be crawling more pages for a while to figure out the new site. There was also an increase in time spent downloading a page. This has went back down but the pages crawled has never went back up. Some notes about the re-design: URLs did not change Mobile URLs were redirected Images were moved from a subdomain (images.sitename.com) to Amazon S3 Had an immediate decline in both organic and paid traffic (roughly 20-30% for each channel) I have not been able to find any glaring issues in search console as indexation looks good, no spike in 404s, or mobile usability issues. Just wondering if anyone has an idea or insight into what caused the drop in pages crawled? Here is the robots.txt and attaching a photo of the crawl stats. User-agent: ShopWiki Disallow: / User-agent: deepcrawl Disallow: / User-agent: Speedy Disallow: / User-agent: SLI_Systems_Indexer Disallow: / User-agent: Yandex Disallow: / User-agent: MJ12bot Disallow: / User-agent: BrightEdge Crawler/1.0 (crawler@brightedge.com) Disallow: / User-agent: * Crawl-delay: 5 Disallow: /cart/ Disallow: /compare/ ```[fSAOL0](https://ibb.co/fSAOL0)
Intermediate & Advanced SEO | | BandG0 -
Breaking up a site into multiple sites
Hi, I am working on plan to divide up mid-number DA website into multiple sites. So the current site's content will be divided up among these new sites. We can't share anything going forward because each site will be independent. The current homepage will change to just link out to the new sites and have minimal content. I am thinking the websites will take a hit in rankings but I don't know how much and how long the drop will last. I know if you redirect an entire domain to a new domain the impact is negligible but in this case I'm only redirecting parts of a site to a new domain. Say we rank #1 for "blue widget" on the current site. That page is going to be redirected to new site and new domain. How much of a drop can we expect? How hard will it be to rank for other new keywords say "purple widget" that we don't have now? How much link juice can i expect to pass from current website to new websites? Thank you in advance.
Intermediate & Advanced SEO | | timdavis0 -
Redirect wordpress from /%post_id%/%postname%/ to /blog/%postname%/
Hi what is the code to redirect wordpress blog from site.com/%post_id%/%postname%/ to site.com/blog/%postname%/ We are moving the site to a new server and new url structure. Thanks in advance
Intermediate & Advanced SEO | | Taiger0 -
Best way to "Prune" bad content from large sites?
I am in process of pruning my sites for low quality/thin content. The issue is that I have multiple sites with 40k + pages and need a more efficient way of finding the low quality content than looking at each page individually. Is there an ideal way to find the pages that are worth no indexing that will speed up the process but not potentially harm any valuable pages? Current plan of action is to pull data from analytics and if the url hasn't brought any traffic in the last 12 months then it is safe to assume it is a page that is not beneficial to the site. My concern is that some of these pages might have links pointing to them and I want to make sure we don't lose that link juice. But, assuming we just no index the pages we should still have the authority pass along...and in theory, the pages that haven't brought any traffic to the site in a year probably don't have much authority to begin with. Recommendations on best way to prune content on sites with hundreds of thousands of pages efficiently? Also, is there a benefit to no indexing the pages vs deleting them? What is the preferred method, and why?
Intermediate & Advanced SEO | | atomiconline0 -
Crawled page count in Search console
Hi Guys, I'm working on a project (premium-hookahs.nl) where I stumble upon a situation I can’t address. Attached is a screenshot of the crawled pages in Search Console. History: Doing to technical difficulties this webshop didn’t always no index filterpages resulting in thousands of duplicated pages. In reality this webshops has less than 1000 individual pages. At this point we took the following steps to result this: Noindex filterpages. Exclude those filterspages in Search Console and robots.txt. Canonical the filterpages to the relevant categoriepages. This however didn’t result in Google crawling less pages. Although the implementation wasn’t always sound (technical problems during updates) I’m sure this setup has been the same for the last two weeks. Personally I expected a drop of crawled pages but they are still sky high. Can’t imagine Google visits this site 40 times a day. To complicate the situation: We’re running an experiment to gain positions on around 250 long term searches. A few filters will be indexed (size, color, number of hoses and flavors) and three of them can be combined. This results in around 250 extra pages. Meta titles, descriptions, h1 and texts are unique as well. Questions: - Excluding in robots.txt should result in Google not crawling those pages right? - Is this number of crawled pages normal for a website with around 1000 unique pages? - What am I missing? BxlESTT
Intermediate & Advanced SEO | | Bob_van_Biezen0 -
Mobile Googlebot vs Desktop Googlebot - GWT reports - Crawl errors
Hi Everyone, I have a very specific SEO question. I am doing a site audit and one of the crawl reports is showing tons of 404's for the "smartphone" bot and with very recent crawl dates. If our website is responsive, and we do not have a mobile version of the website I do not understand why the desktop report version has tons of 404's and yet the smartphone does not. I think I am not understanding something conceptually. I think it has something to do with this little message in the Mobile crawl report. "Errors that occurred only when your site was crawled by Googlebot (errors didn't appear for desktop)." If I understand correctly, the "smartphone" report will only show URL's that are not on the desktop report. Is this correct?
Intermediate & Advanced SEO | | Carla_Dawson0 -
When doing a site search my homepage comes up second. Does that matter?
When I do a site: search the homepage comes up second. Does this matter?
Intermediate & Advanced SEO | | EcommerceSite0 -
Rel="self" and what to do with it?
Hey there Mozzers, Another question about a forum issue I encountered. When a forum thread has more than just one page as we all know the best course of action is to use rel="next" rel="prev" or rel="previous" But my forum automatically creates another line in the header called Rel="self" What that does is simple. If i have 3 pages http://www.example.com/article?story=abc1
Intermediate & Advanced SEO | | Angelos_Savvaidis
http://www.example.com/article?story=abc2
http://www.example.com/article?story=abc3 **instead of this ** On the first page, http://www.example.com/article?story=abc1 On the second page, http://www.example.com/article?story=abc2 On the third page, http://www.example.com/article?story=abc3: it creates this On the first page, http://www.example.com/article?story=abc1 So as you can see it creates a url by adding the ?page=1 and names it rel=self which actually gives back a duplicate page because now instead of just http://www.example.com/article?story=abc1 I also have the same page at http://www.example.com/article?story=abc1?page=1 Do i even need rel="self"? I thought that rel="next" and rel="prev" was enough? Should I change that?0