404 Errors for Form-Generated Pages - noindex, nofollow or 301 redirect?
-
Hi there
I wonder if someone can help me out and provide the best solution for a problem with form generated pages.
I have blocked the search results pages from being indexed by using the 'noindex' tag, and I wondered if I should take the same approach for the following pages. I have seen a huge increase in 404 errors since the new site structure went live and forms started being filled in. This is because every time a form is filled in, it generates a new page, which only Google Search Console is reporting as a 404.
Whilst some 404s can be explained and resolved, I wondered what the best way is to prevent Google from crawling pages like this: mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1008&homePage=Y
Implement a 301 redirect using rules, so that all these pages redirect to the homepage. Whilst in theory this will protect any linked-to pages, it does not resolve the issue of why GSC is recording them as 404s in the first place. It could also come across to Google as 100,000+ redirected links, which might look spammy.
Place a noindex tag on these pages too, so they will not get indexed, in the same way the search result pages are not being indexed.
Block in robots.txt - this will prevent any 'result' pages being crawled, which will free up the crawl time they currently take up. However, I'm not entirely sure the block will be possible. I would need to block anything after domain/webapp/wcs/stores/servlet/TopCategoriesDisplay?. Hopefully this is possible?
The noindex tag will take time to set up, as it needs to be scheduled in with the development team, but the robots.txt will be a quicker fix as this can be done in GSC.
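For anyone weighing the noindex option, here is a minimal Python sketch of how to check whether a page already carries a noindex directive, either in a robots meta tag or in an X-Robots-Tag response header. It assumes the HTML and headers have already been fetched by some other means, and the names `RobotsMetaParser` and `has_noindex` are made up for illustration. Bear in mind that a crawler can only see a noindex tag on a page it is allowed to crawl, so a robots.txt block on the same URLs would hide the tag from Google.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html, headers=None):
    """Return True if the page HTML or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = (headers or {}).get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header_value.split(",") if d.strip()}
    return "noindex" in (parser.directives | header_directives)


# Hypothetical page source, for illustration only.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

The header check matters for responses that are not HTML, where a meta tag cannot be used.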
I really appreciate any feedback on this one.
Many thanks
-
Hi there
I wonder if you would still be able to help. The number of 404s is increasing significantly and the majority only appear in GSC. The reason I think this could be related to the search URLs is that they are increasing significantly every day.
The robots.txt has blocked some, but as the number continues to increase I am thinking there could be a few reasons, which I need to look into more.
Siteliner cannot crawl the site, failing with 'too many redirections for this URL'. This is one reason why I suspect there is a wider issue to investigate with how http and https are being redirected.
Moz and Screaming Frog are recording some errors (which we expected and need to resolve), but in the hundreds, compared to the thousands recorded in GSC.
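A 'too many redirections' message usually points at a redirect loop or a very long chain. As a rough sketch of how that can be spotted, the Python below walks a redirect map and reports whether it ends cleanly, loops, or runs out of hops. The map is hand-written and hypothetical (it is not taken from the real site), and `trace_redirects` is a name invented for this example; in practice the map would be built from a crawl export.

```python
def trace_redirects(start_url, redirect_map, max_hops=10):
    """Follow redirects in a precomputed map.

    Returns (chain, status) where status is "ok", "loop" or "too_many_hops".
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):
        target = redirect_map.get(current)
        if target is None:
            return chain, "ok"  # current URL does not redirect any further
        chain.append(target)
        if target in seen:
            return chain, "loop"  # we have been here before
        seen.add(target)
        current = target
    return chain, "too_many_hops"


# Hypothetical example: http and https versions redirecting to each other.
loop_map = {
    "http://mydomain.com/": "https://mydomain.com/",
    "https://mydomain.com/": "http://mydomain.com/",
}
chain, status = trace_redirects("http://mydomain.com/", loop_map)
print(status, " -> ".join(chain))
```

If the http and https versions of a URL point at each other like this, a crawler will give up, which would be consistent with the message Siteliner reports.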
Any other ideas / suggestions would be appreciated.
Many thanks
-
Hi Ric,
That makes sense. So do these pages return a non-404 when reached from a search, while direct traffic gets a 404? Or are these 404s only appearing in GSC?
Did the robots.txt blocking work out? Are any of these URLs mentioned in the sitemap.xml? Have you tried crawling the site with a crawler like Screaming Frog to see if they surface there? If they do, you might need to approach your search results a different way.
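The sitemap question can be answered with a quick cross-check. Below is a small Python sketch, assuming you have the sitemap XML as a string and a list of 404 URLs exported from GSC; the sample sitemap, the URLs in it, and the function names are all placeholders for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> URLs listed in a sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text}


def errors_in_sitemap(error_urls, sitemap_xml):
    """Return the reported error URLs that are also listed in the sitemap."""
    listed = sitemap_urls(sitemap_xml)
    return sorted(url for url in error_urls if url in listed)


# Hypothetical sitemap content, not the real site's file.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mydomain.com/</loc></url>
  <url><loc>https://mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&amp;storeId=90</loc></url>
</urlset>"""

reported_404s = [
    "https://mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90",
    "https://mydomain.com/missing",
]
print(errors_in_sitemap(reported_404s, SAMPLE_SITEMAP))
```

Any URL that comes back from this check is one the site itself is telling Google to crawl, so it would need removing from the sitemap rather than just blocking.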
-
Hi - thank you for your response. Apologies, I meant test in GSC.
To answer your question, these are not soft 404s.
Many thanks
-
Hi Ric,
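I believe your first step would be blocking via robots.txt, with something along the lines of:
User-agent: *
Disallow: /webapp/wcs/stores/servlet/TopCategoriesDisplay?
Note that robots.txt paths start at the site root, so the domain is left out, and rules match by prefix, so no trailing wildcard is needed.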
But I think you are mistaken that you can make this change within GSC. You can test it in GSC, but this doesn't change anything on your site; you will still have to reach out to a dev to get the change made.
Out of curiosity, are these 404s being marked as soft 404s?
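Before asking a dev to deploy a rule, it can be sanity-checked offline. The Python below is a simplified sketch of the prefix matching that robots.txt Disallow rules use, with `*` as a wildcard and a trailing `$` as an end anchor. It deliberately ignores Allow rules, user-agent groups and rule precedence, the https scheme in the test URL is an assumption, and `rule_to_regex` and `is_blocked` are illustrative names rather than any tool's API.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule_path):
    """Turn a Disallow path into a regex: '*' matches anything, trailing '$' anchors the end."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url, disallow_rules):
    """Return True if the URL's path plus query string matches any Disallow rule."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    # re.match anchors at the start, which gives the prefix behaviour.
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


rules = ["/webapp/wcs/stores/servlet/TopCategoriesDisplay?"]
url = (
    "https://mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay"
    "?langId=-1&storeId=90&catalogId=1008&homePage=Y"
)
print(is_blocked(url, rules))
```

Because the rule ends in `?`, it only catches the parameterised URLs; the bare TopCategoriesDisplay path without a query string would still be crawlable under this sketch.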