Robots.txt file in Shopify - Collection and Product Page Crawling Issue
-
Hi, I am working on a large eCommerce store with more than 1,000 products. We just migrated from WordPress to Shopify and are now seeing noindex issues. When I checked robots.txt I found the rules below, which are very confusing to me. **I don't understand what these directives mean.**
- Disallow: /collections/+
- Disallow: /collections/%2B
- Disallow: /collections/%2b
- Disallow: /blogs/+
- Disallow: /blogs/%2B
- Disallow: /blogs/%2b
My understanding is that this robots.txt disallows search engines from crawling all of my product pages ( collections/*+* ). Is this the pattern that is affecting the indexing of product pages?
Please explain how this robots.txt works in Shopify. Also, once a page has already been crawled and indexed by Google, what effect does a Disallow rule have?
Thanks.
-
Make sure your products are in your sitemap and that the sitemap has been resubmitted. You can also request indexing for individual products in Google Search Console.
-
Thank you for replying,
But our main issue is that all of the collection pages have already been crawled, while the product pages haven't been crawled yet. We can't figure out whether this is a robots.txt issue or some other crawling issue.
For example: the "www.abc.com/collection/" page is crawled, but the "www.abc.com/collection/product1/" page hasn't been crawled.
Any tips would be appreciated.
-
While you may not want that content indexed, it's still valuable for search engines to be able to crawl it and reach your most important content, like products.
If you are blocking your /collections pages, Google will not be able to see the meta robots noindex tag on those pages, which can leave them stuck in the index. Consider allowing robots to crawl your /collections pages but applying noindex to them if they are low value or duplicative.
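For what it's worth, the three /collections lines in your robots.txt are the same rule written three ways: `+` is the literal plus character, and `%2B` / `%2b` are its upper- and lower-case URL-encoded forms. Shopify uses `+` to join tags in filtered collection URLs, so these rules target tag-filter pages rather than products. A quick way to sanity-check what the rules actually block is Python's standard-library robots.txt parser, which applies the same prefix matching most crawlers use (the store domain and paths below are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# The relevant rules from the Shopify-generated robots.txt in question.
rules = [
    "User-agent: *",
    "Disallow: /collections/+",
    "Disallow: /collections/%2B",
    "Disallow: /collections/%2b",
    "Disallow: /blogs/+",
    "Disallow: /blogs/%2B",
    "Disallow: /blogs/%2b",
]

rp = RobotFileParser()
rp.parse(rules)

# Hypothetical URLs for illustration -- substitute your own store's paths.
base = "https://example-store.com"
for path in (
    "/collections/shirts",                     # plain collection page
    "/collections/shirts/products/blue-polo",  # product page
    "/collections/+new-arrivals",              # path containing a literal '+'
):
    print(path, "->", "allowed" if rp.can_fetch("*", base + path) else "blocked")
```

Here only the path containing a `+` comes back blocked; plain collection and product URLs are allowed, which suggests these particular Disallow lines are not what is keeping your product pages out of the index.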