404 Errors for Form-Generated Pages - Noindex, Nofollow or 301 Redirect
-
Hi there
I wonder if someone can help me out and provide the best solution for a problem with form-generated pages.
I have blocked the search results pages from being indexed by using the 'noindex' tag, and I wondered whether I should take the same approach for the pages below. I have seen a huge increase in 404 errors since the new site structure went live and forms started being filled in. This is because every time a form is filled in, it generates a new page, which only Google Search Console (GSC) is reporting as a 404.
Whilst some 404s can be explained and resolved, I wondered what the best way is to prevent Google from crawling pages like this one: mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1008&homePage=Y
Implement 301 redirects using redirect rules, so that all of these pages redirect to the homepage. Whilst in theory this will protect any linked-to pages, it does not resolve the issue of why GSC is recording them as 404s in the first place. It could also come across to Google as 100,000+ redirected links, which might look spammy.
Place a noindex tag on these pages too, so they will not get picked up, in the same way that the search results pages are not being indexed.
Block in robots.txt - this will prevent any 'result' pages from being crawled, which will free up the crawl time they are currently taking up. However, I'm not entirely sure whether the block will be possible: I would need to block anything after domain/webapp/wcs/stores/servlet/TopCategoriesDisplay?. Is this possible?
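For the 301 option, here is a minimal sketch of what the redirect rule would do, written as Python WSGI middleware purely for illustration. The site appears to run WebSphere Commerce (going by the /webapp/wcs/ path), so in practice the rule would live in the web server or application configuration; `FORM_PATH` and `redirect_form_pages` are hypothetical names.

```python
from typing import Callable, Iterable

# Hypothetical: the path of the form-generated pages, taken from the example URL.
FORM_PATH = "/webapp/wcs/stores/servlet/TopCategoriesDisplay"


def redirect_form_pages(app: Callable) -> Callable:
    """Wrap a WSGI app so parameterised FORM_PATH requests get a 301 to the homepage."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")
        query = environ.get("QUERY_STRING", "")
        if path == FORM_PATH and query:
            # Permanent redirect: every parameter combination collapses onto "/".
            start_response("301 Moved Permanently",
                           [("Location", "/"), ("Content-Length", "0")])
            return [b""]
        # Any other request is passed through to the wrapped application untouched.
        return app(environ, start_response)
    return wrapper
```

Note that this only changes what crawlers receive; as the question points out, it does not explain why GSC is seeing these URLs as 404s in the first place.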
The noindex tag will take time to set up, as it needs to be scheduled in with the development team, but the robots.txt change will be a quicker fix as this can be done in GSC.
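The noindex option does not have to be a template change: Google also honours an `X-Robots-Tag: noindex` HTTP response header, which can be applied to a whole URL pattern in one place. One caveat: Google can only see a noindex if it is allowed to crawl the page, so a robots.txt block on the same URLs would stop the noindex from being read. A minimal sketch, again as hypothetical Python WSGI middleware with an assumed `FORM_PATH`:

```python
from typing import Callable, Iterable

# Hypothetical: path of the form-generated pages, taken from the example URL.
FORM_PATH = "/webapp/wcs/stores/servlet/TopCategoriesDisplay"


def noindex_form_pages(app: Callable) -> Callable:
    """Wrap a WSGI app so FORM_PATH responses carry an X-Robots-Tag: noindex header."""
    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        is_form_page = environ.get("PATH_INFO", "") == FORM_PATH

        def tagging_start_response(status, headers, exc_info=None):
            if is_form_page:
                # Header equivalent of <meta name="robots" content="noindex">.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, tagging_start_response)
    return wrapper
```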
I would really appreciate any feedback on this one.
Many thanks
-
Hi there
I wonder if you would still be able to help. The number of 404s is increasing significantly, and the majority only appear in GSC. The reason I think this could be related to the search URLs is that they are increasing significantly every day.
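One way to test the theory that the growth comes from the search or form URLs is to export the 404 list from GSC and count how many URLs share each path once the query string is stripped. A minimal sketch, assuming the export has been reduced to a plain list of full URLs including the scheme; the sample URLs are hypothetical apart from the pattern in the original question.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_path(urls: list[str]) -> Counter:
    """Count URLs per path, ignoring query strings, so parameter variants group together."""
    counts: Counter = Counter()
    for url in urls:
        parts = urlsplit(url.strip())
        # Only the path is kept; the query string (langId, storeId, ...) is dropped.
        counts[parts.path or "/"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "https://mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1008&homePage=Y",
        "https://mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1009",
        "https://mydomain.com/some-old-page",
    ]
    for path, n in count_by_path(sample).most_common():
        print(n, path)
```

If one path dominates the counts, the 404s are coming from the parameterised pages rather than from genuinely missing content.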
The robots.txt rule has blocked some, but as the number continues to increase, I think there could be a few other causes, which I need to look into further.
A Siteliner report cannot crawl the site due to 'too many redirections for this URL'. This is one reason why I suspect there is a wider issue to investigate with the HTTPS/HTTP setup.
Moz and Screaming Frog are recording some errors (which we expected and need to resolve), but in the hundreds, compared to the thousands recorded in GSC.
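A 'too many redirections' error means the redirect chain never settles; one possible cause is an HTTP/HTTPS mismatch where one rule sends http to https and another sends it back. A minimal sketch for tracing the chain hop by hop with Python's standard library (`http.client`), stopping at a loop or a hop limit. The hop limit of 10 is an assumption, and HEAD requests are used to avoid downloading page bodies.

```python
import http.client
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_location(url: str) -> Optional[str]:
    """Send one HEAD request (no automatic following) and return the absolute Location, if any."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        conn.request("HEAD", target)
        resp = conn.getresponse()
        status, location = resp.status, resp.getheader("Location")
    finally:
        conn.close()
    if status in REDIRECT_CODES and location:
        return urljoin(url, location)  # Location headers may be relative
    return None


def trace_redirects(start: str,
                    get_location: Callable[[str], Optional[str]] = fetch_location,
                    max_hops: int = 10) -> tuple[list[str], str]:
    """Follow redirects from `start`; return the chain and 'ok', 'loop' or 'too many hops'."""
    chain, seen, url = [start], {start}, start
    for _ in range(max_hops):
        nxt = get_location(url)
        if nxt is None:
            return chain, "ok"
        chain.append(nxt)
        if nxt in seen:
            return chain, "loop"
        seen.add(nxt)
        url = nxt
    return chain, "too many hops"
```

Running `trace_redirects("http://mydomain.com/")` against the live site would show whether the http and https versions bounce between each other.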
Any other ideas / suggestions would be appreciated.
Many thanks
-
Hi Ric,
That makes sense. So do these pages return a non-404 when reached from a search, but a 404 when visited directly? Or are these 404s only appearing in GSC?
Did the robots.txt blocking work out? Are any of these URLs mentioned in the sitemap.xml? Have you tried crawling the site with a crawler like Screaming Frog to see if they surface in that? If they do, you might need to approach your search results in a different way.
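To answer the sitemap question quickly, here is a sketch that lists sitemap entries pointing at the parameterised TopCategoriesDisplay URLs. It assumes a single standard `urlset` sitemap (a sitemap index file would need each child sitemap fetched first), and the sample XML is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parameterised_urls_in_sitemap(xml_text: str, path: str) -> list[str]:
    """Return <loc> entries whose path equals `path` and that carry a query string."""
    root = ET.fromstring(xml_text)
    hits = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        if parts.path == path and parts.query:
            hits.append(url)
    return hits


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mydomain.com/</loc></url>
  <url><loc>https://mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&amp;storeId=90</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(parameterised_urls_in_sitemap(
        SAMPLE_SITEMAP, "/webapp/wcs/stores/servlet/TopCategoriesDisplay"))
```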
-
Hi - thank you for your response. Apologies, I meant that the change can be tested in GSC.
To answer your question, these are not soft 404s.
Many thanks
-
Hi Ric,
I believe your first step would be blocking these URLs via robots.txt, with something along the lines of:
User-agent: *
Disallow: /webapp/wcs/stores/servlet/TopCategoriesDisplay?
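robots.txt Disallow values are matched against the URL path plus query string, starting from the leading "/", with "*" as a wildcard and a trailing "$" as an end anchor (per Google's documented rules and RFC 9309). The sketch below is a simplified Python version of that matching, useful for checking which URLs a rule would catch; it ignores Allow rules, longest-match precedence and empty Disallow lines, and the rule strings are only examples.

```python
import re


def rule_matches(rule: str, path_and_query: str) -> bool:
    """Simplified robots.txt matching: prefix match, '*' wildcard, optional '$' end anchor."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape every literal character; turn each '*' into "any sequence of characters".
    pattern = "".join(".*" if ch == "*" else re.escape(ch) for ch in rule)
    if anchored:
        pattern += "$"
    # re.match anchors only at the start, which gives robots.txt's prefix semantics.
    return re.match(pattern, path_and_query) is not None


url = "/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1008&homePage=Y"
for rule in (
    "/webapp/wcs/stores/servlet/TopCategoriesDisplay?",
    "/webapp/wcs/stores/servlet/TopCategoriesDisplay?*",
    "domain/webapp/wcs/stores/servlet/TopCategoriesDisplay?*",
):
    print(rule, "->", rule_matches(rule, url))  # True, True, False
```

The trailing "*" adds nothing because matching is already by prefix, while a rule that starts with the domain name never matches, since the path always begins with "/". The bare TopCategoriesDisplay path with no "?" would stay crawlable under these rules.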
I think you are mistaken that you can make this change within GSC. You can test the rule in GSC, but that doesn't change anything on your site; you will still have to reach out to a developer to get the change made.
Out of curiosity, are these 404s being marked as soft 404s?
Related Questions
-
Page Indexing without content
Hello. I have a problem with a page being indexed without its content. I have a website in 3 different languages, and 2 of the language versions are indexing just fine, but one (the most important one) is being indexed without content. When searching using site:, the page comes up, but when searching unique keywords for which I should definitely rank, nothing comes up. This page was indexing just fine, and the problem arose a couple of days ago after a Google update finished. Looking further, the problem is language-related: every newly indexed page in that language has this problem, while pages that were last crawled around one week ago are just fine. Has anyone run into this type of problem?
Technical SEO | | AtuliSulava1 -
Keywords are indexed on the home page
Hello everyone. For one of our websites, we have optimized for many keywords. However, it seems that every keyword is indexed on the home page, and thus not ranked properly. This occurs only on one of our many websites. I am wondering if anyone knows the cause of this issue, and how to solve it. Thank you.
Technical SEO | | Ginovdw1 -
301 Redirects, Sitemaps and Indexing - How to hide redirected urls from search engines?
We have several pages on our site like this one, http://www.spectralink.com/solutions, which redirect to a deeper page, http://www.spectralink.com/solutions/work-smarter-not-harder. Both URLs are listed in the sitemap and both pages are being indexed. Should we remove those redirecting pages from the sitemap? Should we prevent the redirecting URL from being indexed? If so, what's the best way to do that?
Technical SEO | | HeroDesignStudio0 -
Why is this page not ranking but is indexed?
I have a page http://jobs.hays.co.uk/jobs-in-norfolk and it is indexed by Google but will not show up for any keywords I try. Any ideas?
Technical SEO | | S_Curtis0 -
301 redirects
Hello. Our site was recently rebuilt, and we switched from using index.php in all the URLs to not using it at all. We also changed the names of many of our pages, so the URLs have been renamed from "example.com/index.php/old_page_name/" to "example.com/new-page-name/". While we were at it, we changed from "_" to "-" as our word separator in the URLs. In the .htaccess file, we have a small block of code that strips out "index.php/" from all requests. This code redirects a request for "example.com/index.php/old_page_name/" to "example.com/old_page_name/". For your information, the code that strips out "index.php/" is:
RewriteCond %{THE_REQUEST} ^GET.*index\.php [NC]
RewriteCond %{THE_REQUEST} !/uSZWTLna/.*
RewriteRule (.*?)index\.php/*(.*) /$1$2 [R=301,L]
Then we have 301 redirects from "example.com/old_page_name/" to "example.com/new-page-name/". QUESTION 1: Is this two-step redirect approach okay, or would it be better to skip the separate index.php stripping code and simply have 301 redirects that include "index.php" in the URLs? QUESTION 2: Will we lose some of the benefit of the links that have to pass through a 301 redirect? QUESTION 3: We have 50 or so redirects. Will this affect the performance of the site? How many redirects does it take to start affecting performance? Thank you!
Technical SEO | | nyc-seo0 -
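One common way to avoid the two-step chain in QUESTION 1 is to generate every redirect so that it points straight at the final URL. A minimal sketch, assuming the redirects are available as a simple old-path to new-path mapping (the mapping below just reuses the example paths from the question); the flattened output could then be written out as individual 301 rules.

```python
def flatten_redirects(redirects: dict[str, str], max_depth: int = 20) -> dict[str, str]:
    """Map every source URL straight to its final destination so each redirect is one hop."""
    flat = {}
    for source, target in redirects.items():
        seen = {source}
        depth = 0
        # Keep following while the current target is itself redirected somewhere else.
        while target in redirects:
            if target in seen or depth >= max_depth:
                raise ValueError(f"redirect loop or overly long chain starting at {source}")
            seen.add(target)
            target = redirects[target]
            depth += 1
        flat[source] = target
    return flat


if __name__ == "__main__":
    chained = {
        "/index.php/old_page_name/": "/old_page_name/",  # step 1: strip index.php
        "/old_page_name/": "/new-page-name/",            # step 2: rename the page
    }
    for src, dst in flatten_redirects(chained).items():
        print(src, "->", dst)
```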
Index page
To the SEO experts this may well seem a silly question, so I apologise in advance; I try not to ask questions that I probably already know the answer to, but clarity is my goal. I have numerous sites, and as standard practice I always use .htaccess to redirect non-www to www and to redirect the index page to www.mysite.com. All straightforward; I have never questioned this practice and have always been advised it's the best practice to avoid duplicate content. Today I was looking at a CMS service for a customer's website. The website is already built and is a static site, so the CMS integration was going to mean a full rewrite of the website. A friend on another forum told me about a service called simple CMS; I had a look, and it looks perfect for the customer. I went to set it up on the client's site, and here is the problem: for the CMS software to work, it MUST access the index page, and because my index page is redirected to www.mysite.com, it won't work, as it can't find the index page (obviously). I questioned this with the software company, who told me that it must access the index page. I explained that it won't be able to, and why (because I have my index page redirected to avoid duplicate content). To my astonishment, the person there told me that duplicate content is a huge no-no with Google (that's not the astonishing part), but that it's not relevant to the index and non-index page of a website. This goes against everything I thought I knew. The person also reassured me that they have worked in SEO for 10 years. As I am a subscriber to SEOmoz and no one here has anything to gain but offering advice: is this true? Will it not be a duplicate content issue to serve both an index page and a non-index page? Will search engines not view this as duplicate content? Or is this SEO expert talking bull, which I suspect but cannot be sure of.
Any advice would be greatly appreciated. It would make my life a lot easier for the customer to use this CMS software, but I would be doing it at the risk of tarnishing the work they and I have done on their ranking status. Many thanks in advance, John
Technical SEO | | Johnny4B0 -
301 redirect from root to /index.aspx
I have taken over the SEO for www.domain.net. As I inherited it, www.domain.net is 301 redirected to www.domain.net/index.aspx. Looking at top pages and linking root domains in Open Site Explorer, I can see that www.domain.net/index.aspx has 1,006 linking root domains and www.domain.net has 806 linking root domains. I assume that www.domain.net is passing the value of its 806 linking domains to www.domain.net/index.aspx via the 301 redirect, and because of this I would expect www.domain.net/index.aspx to be the strongest page on the site and to be the URL that ranks in the listings for many relevant searches. However, it appears that www.domain.net is what is shown in the listings, not www.domain.net/index.aspx. Can anyone explain why this might be? If I do a site: search in Google, www.domain.net is indexed and not www.domain.net/index.aspx.
Technical SEO | | QubaSEO0 -
Does page speed affect what pages are in the index?
We have around 1.3m total pages, Google currently crawls on average 87k a day, and our average page load is 1.7 seconds. Out of those 1.3m pages (1.2m being "spun up"), Google has only indexed around 368k, and our SEO person is telling us that if we speed up the pages, Google will crawl them more and thus index more of them. I personally don't believe this. At 87k pages a day, Google has crawled our entire site in 2 weeks, so they should have all of our pages in their DB by now; I think they are not indexed because they are poorly generated pages, and it has nothing to do with the speed of the pages. Am I correct? Would speeding up the pages make Google crawl them faster and thus get more pages indexed?
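The arithmetic behind the "2 weeks" estimate, under the optimistic assumption that every crawled URL is a distinct page (in practice Googlebot also recrawls pages it has already seen, so full coverage would take longer), together with the indexed share:

```latex
\frac{1{,}300{,}000\ \text{pages}}{87{,}000\ \text{pages/day}} \approx 14.9\ \text{days} \approx 2\ \text{weeks},
\qquad
\frac{368{,}000\ \text{indexed}}{1{,}300{,}000\ \text{total}} \approx 28\%
```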
Technical SEO | | upper2bits0