Question about robots.txt
-
I just started my own e-commerce website and hosted it on Pinnacle Cart, one of the popular e-commerce platforms. It has a lot of functions, such as page sorting and a mobile website. I adjusted the URL parameters in Google Webmaster Tools three weeks ago, but I still get the same duplicate meta title and description errors in both the Google crawl and the SEOmoz crawl. I am not sure whether I made a mistake in choosing Pinnacle Cart, because it is not very flexible when it comes to editing the core website pages. There is no way to adjust the canonical tag, insert robots.txt on every page, and so on. However, it does let me submit a single robots.txt file and edit the .htaccess file. The website pages are in PHP.
For example, this URL:
www.mycompany.com has a duplicate title and description with www.mycompany.com/site-map.html (there is no way of editing the title and description of my sitemap)
Another error is
www.mycompany.com has a duplicate title and description with http://www.mycompany.com/brands?url=brands
Is it possible to exclude the URLs containing "url=" and my "sitemap.html" page in robots.txt? Or is the URL parameter setting in Google enough, and it just takes a lot of time?
Can somebody help me with the format of robots.txt, please? Thanks.
-
Thank you for your reply. This surely helps. I will probably edit the .htaccess file.
-
That's the problem with most site-builder-type programs: they are very limited.
Perhaps look at your site title and page titles. Usually the site title is included on all of your web pages, followed by the page title, so you could simply name your site www.yourcompany.com and then add an individual page title to each page.
A robots.txt file is not supposed to be added to every page. It is a single file at the root of the site, and it only tells the bots what to crawl and what not to crawl.
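Since the question asks about the format, here is a minimal sketch of what a single site-wide robots.txt could look like, checked with Python's standard urllib.robotparser. The rule shown is only an assumption built from the example URL in the question, not something Pinnacle Cart provides. Note that the standard-library parser does plain prefix matching, so the sketch uses a literal path prefix rather than a wildcard pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, based on the example URL in the question.
# One "User-agent" group applies to all bots; each "Disallow" line is a
# path prefix that those bots should not crawl.
ROBOTS_TXT = """\
User-agent: *
Disallow: /brands?url=
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The parameterised duplicate is blocked, the clean page is still crawlable.
blocked = parser.can_fetch("*", "http://www.mycompany.com/brands?url=brands")
allowed = parser.can_fetch("*", "http://www.mycompany.com/brands")

print("parameterised URL crawlable:", blocked)
print("clean URL crawlable:", allowed)
```

Keep in mind that a Disallow rule only stops crawling. It does not by itself remove a URL that is already in the index.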
If you can edit the .htaccess file, you should be able to get to the individual pages and insert or change the code for the titles. Just be aware that manual edits can work, but when you go back to make an edit in the builder, it may undo all of your manual changes. If that's the case, get your site perfect in the builder first, then make the individual code changes last.
Hope this helps.
-
I have no way of adding those either. Oops, thanks for the warning. I guess I will have to wait for Google to filter out the parameters.
Thanks for your answer.
-
You certainly don't want to block your sitemap file in robots.txt. It takes some time for Google to filter out the parameters and that is the right approach. If there is no way to change the title, I wouldn't be so concerned over a few pages with duplicate titles. Do you have the ability to add a noindex,follow meta tag on these pages?
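For reference, the tag mentioned above is written as <meta name="robots" content="noindex,follow"> inside the page's head section. If the platform ever lets you add it, a rough way to confirm it is actually present in the rendered HTML is sketched below using Python's standard html.parser. The sample page and the helper names are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, standing in for one of the duplicate pages.
SAMPLE_PAGE = """<html><head>
<title>My Company</title>
<meta name="robots" content="noindex,follow">
</head><body></body></html>"""

directives = robots_directives(SAMPLE_PAGE)
print("noindex" in directives, "follow" in directives)
```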