404 Errors for Form Generated Pages - No index, no follow or 301 redirect
-
Hi there
I wonder if someone can help me out and suggest the best solution for a problem with form-generated pages.
I have blocked the search results pages from being indexed by using the 'noindex' tag, and I wondered if I should take the same approach for the following pages. I have seen a huge increase in 404 errors since the new site structure was introduced and forms started being filled in. Every time a form is filled in, it generates a new page, which only Google Search Console reports as a 404.
Whilst some 404s can be explained and resolved, I wondered what is the best way to prevent Google from crawling pages like this: mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1008&homePage=Y
The options I am considering are:
Implement a 301 redirect using rules, so that all these pages redirect to the homepage. Whilst in theory this will protect any linked-to pages, it does not explain why GSC is recording them as 404s in the first place. It could also come across to Google as 100,000+ redirected links, which might look spammy.
Place a noindex tag on these pages too, so they will not get picked up, in the same way the search result pages are not being indexed.
Block in robots.txt - this will prevent any 'result' pages being crawled, which will free up the crawl time they currently consume. However, I'm not entirely sure whether the block is possible: I would need to block anything after domain/webapp/wcs/stores/servlet/TopCategoriesDisplay?. Hopefully this is possible?
The noindex tag will take time to set up, as it needs to be scheduled in with the development team, but the robots.txt will be a quicker fix as this can be done in GSC.
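Once a noindex tag is in place, it helps to confirm that a page really carries it. The following is a minimal Python sketch, not a definitive check: it assumes the directive arrives either as a `<meta name="robots">` tag or as an `X-Robots-Tag` response header, and the names `RobotsMetaParser`, `has_noindex` and `sample_page` are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html, headers=None):
    """Return True if the page asks not to be indexed, via meta tag or header."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = ""
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            header_value = value.lower()
    header_directives = {part.strip() for part in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_directives


# Hypothetical page source, used only to exercise the function.
sample_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(sample_page))
```

The function only inspects the HTML and headers it is given, so fetching the page is left to whatever tool is already in use.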
I really appreciate any feedback on this one.
Many thanks
-
Hi there
I wonder if you would still be able to help. The number of 404s is increasing significantly and the majority only appear in GSC. The reason I think this could be search URL related is that they are increasing significantly every day.
The robots.txt has blocked some, but as the number continues to increase I am thinking there could be a few reasons, which I need to look into more.
A Siteliner report cannot crawl the site due to 'too many redirections for this URL'. This is one reason why I suspect there is a wider issue to investigate with the HTTP to HTTPS redirects.
Moz and Screaming Frog are recording some errors (which we expected and need to resolve), but in the 100s, compared to the 1000s recorded in GSC.
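A 'too many redirections' message often points to a redirect loop. As a rough Python sketch of how such a loop can be detected, assuming the redirect rules can be written down as a simple lookup (the function `trace_redirects`, the `looping_rules` table and the example.com URLs are all hypothetical, not taken from the site in question):

```python
def trace_redirects(start_url, next_hop, max_hops=10):
    """Follow redirects hop by hop and report how the chain ends.

    next_hop(url) returns the redirect target, or None for a final response.
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):
        target = next_hop(current)
        if target is None:
            return chain, "ok"
        chain.append(target)
        if target in seen:
            # The chain has come back to a URL it already visited.
            return chain, "loop"
        seen.add(target)
        current = target
    return chain, "too many hops"


# Hypothetical misconfiguration: HTTP sends visitors to HTTPS and HTTPS sends them back.
looping_rules = {
    "http://example.com/": "https://example.com/",
    "https://example.com/": "http://example.com/",
}
chain, status = trace_redirects("http://example.com/", looping_rules.get)
print(status, " -> ".join(chain))
```

In practice `next_hop` would be replaced by something that requests the URL without following redirects and returns the `Location` header.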
Any other ideas / suggestions would be appreciated.
Many thanks
-
Hi Ric,
That makes sense. So do these pages return a non-404 when reached from a search, while direct traffic results in a 404? Or are these 404s only appearing in GSC?
Did the robots.txt blocking work out? Are any of these URLs mentioned in the sitemap.xml? Have you tried crawling the site with a crawler like Screaming Frog to see if they surface in that? If they do, you might need to approach your search results a different way.
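One way to answer the sitemap question is to scan the sitemap for the servlet path. This is a small Python sketch under a few assumptions: the sitemap follows the standard sitemaps.org format, its content has already been downloaded into a string, and `sitemap_urls_containing` and `sample_sitemap` are illustrative names with made-up content.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls_containing(sitemap_xml, fragment):
    """Return every <loc> URL in the sitemap text that contains `fragment`."""
    root = ET.fromstring(sitemap_xml)
    locs = [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]
    return [url for url in locs if fragment in url]


# Made-up sitemap content, only to show the function working.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mydomain.com/</loc></url>
  <url><loc>https://mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&amp;storeId=90</loc></url>
</urlset>"""

for url in sitemap_urls_containing(sample_sitemap, "TopCategoriesDisplay"):
    print(url)
```

If this returns any URLs for the real sitemap, the form-generated pages are being submitted to Google directly and should be removed from the sitemap.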
-
Hi - thank you for your response. Apologies, I meant test in GSC.
To answer your question, these are not soft 404s.
Many thanks
-
Hi Ric,
I believe your first step would be blocking via robots.txt something along the lines of:
Disallow: /webapp/wcs/stores/servlet/TopCategoriesDisplay?
But I think you are mistaken that you can make this change within GSC. You can test in GSC, but this doesn't change anything on your site. You will still have to reach out to a dev to get this change completed.
Out of curiosity, are these 404s being marked as soft 404s?
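To see which URLs a rule like this would catch before handing it to a developer, the prefix matching can be sketched in Python. This is a simplified illustration and not Google's parser: it assumes the rule is written as a path beginning with `/`, as robots.txt expects, it ignores wildcards and Allow rules, and `is_disallowed` and the `ProductDisplay` URL are hypothetical.

```python
from urllib.parse import urlsplit


def is_disallowed(url, disallow_rules):
    """Simplified robots.txt check: a URL is blocked when its path plus query
    starts with any Disallow value. Wildcards and Allow rules are not handled."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return any(rule and target.startswith(rule) for rule in disallow_rules)


rules = ["/webapp/wcs/stores/servlet/TopCategoriesDisplay?"]
blocked = ("https://mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay"
           "?langId=-1&storeId=90&catalogId=1008&homePage=Y")
# Hypothetical URL for a different servlet, which the rule should leave alone.
other = "https://mydomain.com/webapp/wcs/stores/servlet/ProductDisplay?productId=1"

print(is_disallowed(blocked, rules))
print(is_disallowed(other, rules))
```

Because the rule ends in `?`, only URLs with a query string after TopCategoriesDisplay match, so the bare servlet URL would stay crawlable.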