Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
404 or 410 status code after deleting a real estate listing
-
Hi there,
We manage a website which generates an overview and detailpages of listings for several real estate agents.
When these listings have been sold, they are removed from the overview and pages. These listings appear as not found in the crawl error overview in Google Search Console. These pages appear as 404's, would changing this to 410's solve this problem? And if not, what fix could take care of this problem?
-
Good answer Dirk.
I like your idea of adding valuable, relevant content to the pages Dirk, good thinking.
Personally, I'd rather Iet Google know these pages are removed intentionally and not due to errors, so 410 rather than leaving as 40.
One thing to be mindful of, though, is how much crawl budget you're willing to give to these pages. If we're talking about a lot of pages in bulk, I'd be worried how much crawl budget they'd eat up over time. As you point out, they'd likely drop in rank anyway due to loss of internal links too, so might be the cost to the crawl budget isn't worth it?.
Another solution (using your idea Dirk), would be to somehow automate the process of, when a listing is marked as sold, the listing is removed, other properties in the same area are added (as you suggest), then some time later (month or two?), a 410 header set.
The other option would be to 301 the old pages back to the area page for the properties (perhaps with something like a bootstrap message saying the property is sold but others in the area are available). This would pass juice etc back to that page. but, of course, you'd be telling G that the page had permanently moved, which isn't quite the case.
-
The answer from Kristen is correct. However changing 404 to 410 will just let these pages appearing as 410 in the Search Console. The fact that they are appearing is not a problem - it's just that Google wants to notify you that pages return a 4xx status. If this is intended (like in your case) you can just ignore these messages and mark them as fixed.
In your case you could as well consider another option - remove the pages from the listings but keep them published (with status 200). Update the page, indicating that the original property is sold but list some other (similar) properties as an alternative. This way, if there are external pages linking to the property page the link value doesn't get lost and if people would accidentally land on this page they still find content which could be interesting to them (as you remove the navigation links to these pages they become orphans - so little change that they will rank very high in Google)
Dirk
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Forwarded vanity domains, suddenly resolving to 404 with appended URL's ending in random 5 characters
We have several vanity domains that forward to various pages on our primary domain.
Intermediate & Advanced SEO | | SS.Digital
e.g. www.vanity.com (301)--> www.mydomain.com/sub-page (200) These forwards have been in place for months or even years and have worked fine. As of yesterday, we have seen the following problem. We have made no changes in the forwarding settings. Now, inconsistently, they sometimes resolve and sometimes they do not. When we load the vanity URL with Chrome Dev Tools (Network Pane) open, it shows the following redirect chains, where xxxxx represents a random 5 character string of lower and upper case letters. (e.g. VGuTD) EXAMPLE:
www.vanity.com (302, Found) -->
www.vanity.com/xxxxx (302, Found) -->
www.vanity.com/xxxxx (302, Found) -->
www.vanity.com/xxxxx/xxxxx (302, Found) -->
www.mydomain.com/sub-page/xxxxx (404, Not Found) This is just one example, the amount of redirects, vary wildly. Sometimes there is only 1 redirect, sometimes there are as many as 5. Sometimes the request will ultimately resolve on the correct mydomain.com/sub-page, but usually it does not (as in the example above). We have cross-checked across every browser, device, private/non-private, cookies cleared, on and off of our network etc... This leads us to believe that it is not at the device or host level. Our Registrar is Godaddy. They have not encountered this issue before, and have no idea what this 5 character string is from. I tend to believe them because per our analytics, we have determined that this problem only started yesterday. Our primary question is, has anybody else encountered this problem either in the last couple days, or at any time in the past? We have come up with a solution that works to alleviate the problem, but to implement it across hundreds of vanity domains will take us an inordinate amount of time. Really hoping to fix the cause of the problem instead of just treating the symptom.0 -
SEO on Jobs sites: how to deal with expired listings with "Google for Jobs" around
Dear community, When dealing with expired job offers on jobs sites from a SEO perspective, most practitioners recommend to implement 301 redirects to category pages in order to keep the positive ranking signals of incoming links. Is it necessary to rethink this recommendation with "Google for Jobs" is around? Google's recommendations on how to handle expired job postings does not include 301 redirects. "To remove a job posting that is no longer available: Remove the job posting from your sitemap. Do one of the following: Note: Do NOT just add a message to the page indicating that the job has expired without also doing one of the following actions to remove the job posting from your sitemap. Remove the JobPosting markup from the page. Remove the page entirely (so that requesting it returns a 404 status code). Add a noindex meta tag to the page." Will implementing 301 redirects the chances to appear in "Google for Jobs"? What do you think?
Intermediate & Advanced SEO | | grnjbs07175 -
How can I make a list of all URLs indexed by Google?
I started working for this eCommerce site 2 months ago, and my SEO site audit revealed a massive spider trap. The site should have been 3500-ish pages, but Google has over 30K pages in its index. I'm trying to find a effective way of making a list of all URLs indexed by Google. Anyone? (I basically want to build a sitemap with all the indexed spider trap URLs, then set up 301 on those, then ping Google with the "defective" sitemap so they can see what the site really looks like and remove those URLs, shrinking the site back to around 3500 pages)
Intermediate & Advanced SEO | | Bryggselv.no0 -
Magento: Should we disable old URL's or delete the page altogether
Our developer tells us that we have a lot of 404 pages that are being included in our sitemap and the reason for this is because we have put 301 redirects on the old pages to new pages. We're using Magento and our current process is to simply disable, which then makes it a a 404. We then redirect this page using a 301 redirect to a new relevant page. The reason for redirecting these pages is because the old pages are still being indexed in Google. I understand 404 pages will eventually drop out of Google's index, but was wondering if we were somehow preventing them dropping out of the index by redirecting the URL's, causing the 404 pages to be added to the sitemap. My questions are: 1. Could we simply delete the entire unwanted page, so that it returns a 404 and drops out of Google's index altogether? 2. Because the 404 pages are in the sitemap, does this mean they will continue to be indexed by Google?
Intermediate & Advanced SEO | | andyheath0 -
How to outrank a directory listing with high DA but low PA?
My site is at 4th place, 3 places above it is a gumtree (similar to yell, yelp) listing. How can you figure out how difficult it would be outrank those pages? I mean obviously the pages would have low PA and they are top based on the high DA of the site. This also seems to go back to keyword research and difficulty, when I'm doing keyword research and I see a wikipedia site in top 5 rank, or a yell.com or perhaps an article in forbes.com outranks your site. Typically the problem seems to be Google giving a lot of credit to these pages rankings based on the high DA rather than PA of the pages. How would you gauge the difficulty of that keyword then if the competition are pages with very high DA which is impossible to compete with but low PA? Thanks
Intermediate & Advanced SEO | | magusara2 -
Not found errors (404) due to being hacked
Hi Moz Guru's Our website was hacked a few months ago, since then we have taken various measures, last one being redesigning the website all together and removing it from a WordPress platform. So far all is going well, except that the 404 not found errors keeps coming up in Google Webmaster tools. The URLs are spam pages that were created by the virus. And these spam pages have been indexed by Google, and now we are struggling to get rid of them. Is there any way we can deal with these 404 spam pages links? Is marking all of them as fixed in the webmaster tools - search console- crawl errors helpful in any way? Can this have a negative impact on the SEO ? Looking forward to your answers. Many thanks.
Intermediate & Advanced SEO | | monicapopa0 -
Ecommerce: A product in multiple categories with a canonical to create a ‘cluster’ in one primary category Vs. a single listing at root level with dynamic breadcrumb.
OK – bear with me on this… I am working on some pretty large ecommerce websites (50,000 + products) where it is appropriate for some individual products to be placed within multiple categories / sub-categories. For example, a Red Polo T-shirt could be placed within: Men’s > T-shirts >
Intermediate & Advanced SEO | | AbsoluteDesign
Men’s > T-shirts > Red T-shirts
Men’s > T-shirts > Polo T-shirts
Men’s > Sale > T-shirts
Etc. We’re getting great organic results for our general T-shirt page (for example) by clustering creative content within its structure – Top 10 tips on wearing a t-shirt (obviously not, but you get the idea). My instinct tells me to replicate this with products too. So, of all the location mentioned above, make sure all polo shirts (no matter what colour) have a canonical set within Men’s > T-shirts > Polo T-shirts. The presumption is that this will help build the authority of the Polo T-shirts page – this obviously presumes “Polo Shirts” get more search volume than “Red T-shirts”. My presumption why this is the best option is because it is very difficult to manage, particularly with a large inventory. And, from experience, taking the time and being meticulous when it comes to SEO is the only way to achieve success. From an administration point of view, it is a lot easier to have all product URLs at the root level and develop a dynamic breadcrumb trail – so all roads can lead to that one instance of the product. There's No need for canonicals; no need for ecommerce managers to remember which primary category to assign product types to; keeping everything at root level also means there no reason to worry about redirects if product move from sub-category to sub-category etc. What do you think is the best approach? Do 1000s of canonicals and redirect look ‘messy’ to a search engine overtime? Any thoughts and insights greatly received.0 -
Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)
Hi Guys, We have developed a plugin that allows us to display used vehicle listings from a centralized, third-party database. The functionality works similar to autotrader.com or cargurus.com, and there are two primary components: 1. Vehicle Listings Pages: this is the page where the user can use various filters to narrow the vehicle listings to find the vehicle they want.
Intermediate & Advanced SEO | | browndoginteractive
2. Vehicle Details Pages: this is the page where the user actually views the details about said vehicle. It is served up via Ajax, in a dialog box on the Vehicle Listings Pages. Example functionality: http://screencast.com/t/kArKm4tBo The Vehicle Listings pages (#1), we do want indexed and to rank. These pages have additional content besides the vehicle listings themselves, and those results are randomized or sliced/diced in different and unique ways. They're also updated twice per day. We do not want to index #2, the Vehicle Details pages, as these pages appear and disappear all of the time, based on dealer inventory, and don't have much value in the SERPs. Additionally, other sites such as autotrader.com, Yahoo Autos, and others draw from this same database, so we're worried about duplicate content. For instance, entering a snippet of dealer-provided content for one specific listing that Google indexed yielded 8,200+ results: Example Google query. We did not originally think that Google would even be able to index these pages, as they are served up via Ajax. However, it seems we were wrong, as Google has already begun indexing them. Not only is duplicate content an issue, but these pages are not meant for visitors to navigate to directly! If a user were to navigate to the url directly, from the SERPs, they would see a page that isn't styled right. Now we have to determine the right solution to keep these pages out of the index: robots.txt, noindex meta tags, or hash (#) internal links. Robots.txt Advantages: Super easy to implement Conserves crawl budget for large sites Ensures crawler doesn't get stuck. After all, if our website only has 500 pages that we really want indexed and ranked, and vehicle details pages constitute another 1,000,000,000 pages, it doesn't seem to make sense to make Googlebot crawl all of those pages. Robots.txt Disadvantages: Doesn't prevent pages from being indexed, as we've seen, probably because there are internal links to these pages. We could nofollow these internal links, thereby minimizing indexation, but this would lead to each 10-25 noindex internal links on each Vehicle Listings page (will Google think we're pagerank sculpting?) Noindex Advantages: Does prevent vehicle details pages from being indexed Allows ALL pages to be crawled (advantage?) Noindex Disadvantages: Difficult to implement (vehicle details pages are served using ajax, so they have no tag. Solution would have to involve X-Robots-Tag HTTP header and Apache, sending a noindex tag based on querystring variables, similar to this stackoverflow solution. This means the plugin functionality is no longer self-contained, and some hosts may not allow these types of Apache rewrites (as I understand it) Forces (or rather allows) Googlebot to crawl hundreds of thousands of noindex pages. I say "force" because of the crawl budget required. Crawler could get stuck/lost in so many pages, and my not like crawling a site with 1,000,000,000 pages, 99.9% of which are noindexed. Cannot be used in conjunction with robots.txt. After all, crawler never reads noindex meta tag if blocked by robots.txt Hash (#) URL Advantages: By using for links on Vehicle Listing pages to Vehicle Details pages (such as "Contact Seller" buttons), coupled with Javascript, crawler won't be able to follow/crawl these links. Best of both worlds: crawl budget isn't overtaxed by thousands of noindex pages, and internal links used to index robots.txt-disallowed pages are gone. Accomplishes same thing as "nofollowing" these links, but without looking like pagerank sculpting (?) Does not require complex Apache stuff Hash (#) URL Disdvantages: Is Google suspicious of sites with (some) internal links structured like this, since they can't crawl/follow them? Initially, we implemented robots.txt--the "sledgehammer solution." We figured that we'd have a happier crawler this way, as it wouldn't have to crawl zillions of partially duplicate vehicle details pages, and we wanted it to be like these pages didn't even exist. However, Google seems to be indexing many of these pages anyway, probably based on internal links pointing to them. We could nofollow the links pointing to these pages, but we don't want it to look like we're pagerank sculpting or something like that. If we implement noindex on these pages (and doing so is a difficult task itself), then we will be certain these pages aren't indexed. However, to do so we will have to remove the robots.txt disallowal, in order to let the crawler read the noindex tag on these pages. Intuitively, it doesn't make sense to me to make googlebot crawl zillions of vehicle details pages, all of which are noindexed, and it could easily get stuck/lost/etc. It seems like a waste of resources, and in some shadowy way bad for SEO. My developers are pushing for the third solution: using the hash URLs. This works on all hosts and keeps all functionality in the plugin self-contained (unlike noindex), and conserves crawl budget while keeping vehicle details page out of the index (unlike robots.txt). But I don't want Google to slap us 6-12 months from now because it doesn't like links like these (). Any thoughts or advice you guys have would be hugely appreciated, as I've been going in circles, circles, circles on this for a couple of days now. Also, I can provide a test site URL if you'd like to see the functionality in action.0