How to stop pages from an XML feed being crawled?
-
We have a site with an XML feed that goes out to many other sites.
The XML feed is behind a password-protected page, so we cannot use a canonical link to point back to the original URL. How do we stop the pages being crawled on all of the sites using the XML feed? With hundreds of sites using it after launch, it will cause instant duplicate content issues.
Thanks
-
You'll probably want to disallow spiders from crawling them with robots.txt.
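As a rough sketch of the robots.txt suggestion above: robots.txt rules apply per host, so each site that publishes the feed would need its own rule. The example below uses Python's standard `urllib.robotparser` to check that a rule blocks the feed-driven pages before it goes live. The `/feed-pages/` directory and the `partner.example` host are assumptions made up for illustration; they are not from the original question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a site using the feed could serve.
# The /feed-pages/ directory is an assumed location for pages built
# from the XML feed; substitute the real path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed-pages/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A page generated from the feed (hypothetical URL) should be blocked.
feed_page_ok = is_crawlable(
    ROBOTS_TXT, "https://partner.example/feed-pages/item-1.html"
)
# A page outside the feed directory should stay crawlable.
other_page_ok = is_crawlable(ROBOTS_TXT, "https://partner.example/about.html")

print("feed page crawlable:", feed_page_ok)
print("other page crawlable:", other_page_ok)
```

This only works cleanly if the feed-driven pages live under one path prefix on each site; if they are mixed in with original content, a single `Disallow` line will not separate them.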
Related Questions
-
Ecommerce category pages
Hi there, I've been thinking a lot about this lately. I work on a lot of webshops that are made by the same company. I don't like to say this, but not all of their shops perform well SEO-wise. They use a filtering system which occasionally creates hundreds to thousands of category pages.
Basically, what happens is this: a client that sells fashion has a site (www.client.com). They have main categories like 'Men', 'Women', 'Kids', and 'Sale'. So when you click on 'Men' in the main navigation, you get www.client.com/men/. Then you can filter on brand, subcategory, or color, so you get www.client.com/men/brand. The URL follows the order in which you filter, so you can also get to 'brand' via 'category': www.client.com/shoes/brand. Obviously, this page has the same content as www.client.com/brand/shoes, or even /shoes/brand/black and /men/shoes/brand/black if all of the brand's shoes happen to be black men's shoes.
Currently this is fixed by a dynamic canonical system that canonicalizes the brand/category combinations. So there can be 8000 URLs on the site, which canonicalize to about 4000 URLs. I have a gut feeling that this is still not a good situation for SEO, and I also believe it would be a lot better to have the filtering system default to a defined order, like /gender/category/brand/color, so you don't even need these excessive amounts of canonicalization. You can canonicalize the whole bunch, but you'd still offer thousands of useless pages for Google to waste its crawl budget on. Not to mention the time saved when crawling and analysing using Screaming Frog or other audit tools. Any opinions on this matter?
Intermediate & Advanced SEO | Adriaan.Multiply
-
Why would my total number of indexed pages stop increasing?
I have an ecommerce marketplace that has new items added daily. In Search Console, my indexed page count has always gone up almost every week, but it hasn't increased in 5 weeks. We haven't made any changes to the site and the sitemap looks good. Any ideas on what I should look for?
Intermediate & Advanced SEO | EcommerceSite
-
After server migration: crawling is slow and content changes on dynamic pages are not getting updated
Hello, I performed a server migration 2 days ago. All is well with traffic, which has moved to the new servers. However, with the previous host, a newly submitted article was indexed within minutes. Now, even after submitting a page for indexing, it takes a while to appear in search engines, and on some pages where content is updated daily, changes are not reflected despite submitting them for indexing. The site is http://www.mycarhelpline.com. I have checked robots, meta tags, and URL structure, and all remain intact. No unknown errors are reported in Google Webmaster. Could someone advise: is this normal due to the name server and IP address change, and can I expect it to correct itself automatically, or am I missing something? Kindly advise. Thanks
Intermediate & Advanced SEO | Modi
-
How can a page be indexed without being crawled?
Hey Moz fans,
In the Google getting started guide it says:
"Note: Pages may be indexed despite never having been crawled: the two processes are independent of each other. If enough information is available about a page, and the page is deemed relevant to users, search engine algorithms may decide to include it in the search results despite never having had access to the content directly. That said, there are simple mechanisms such as robots meta tags to make sure that pages are not indexed."
How can this happen? I don't really get the point.
Thank you
Intermediate & Advanced SEO | atakala
-
Please help with page
We used to use this page http://www.discountbannerprinting.co.uk/banners/vinyl-pvc-banners.html to rank for the words vinyl banner and PVC banner, but we have tried to focus the page only on PVC banners and move the vinyl banner words to http://www.discountbannerprinting.co.uk/. Yet for some reason, even though both have been spidered, Google has now chosen to rank this page http://www.discountbannerprinting.co.uk/stickers/vinyl-stickers.html for the vinyl banner words. How do I stop this from happening? I thought the home page would be powerful enough to rank for the word with a title inclusion and a spread of the word on the page. Also, if anyone can give their opinion on why they think http://www.discountbannerprinting.co.uk/banners/vinyl-pvc-banners.html does not rank very well, I would be truly appreciative.
Intermediate & Advanced SEO | BobAnderson
-
Content per page?
We used to have an article's worth of content in a scroll box created by our previous SEO. The problem was that it was very much keyword stuffed, link stuffed, and complete crap. We then removed this and added more content above the fold. The problem I have is that we are only able to add 150 - 250 words above the fold, and a bit of that is repeated across the pages. Would we benefit from putting an article at the bottom of each of our product pages? When I say article, I mean high-quality, in-depth content that goes into a lot more detail about the product, its history, and more. Would this help our SEO by giving the page more uniqueness and authority than a 200 - 250 word page has? The one problem I can see is whether an article's worth of content would be OK at the bottom of the page, particularly in a div tab or scroll box.
Intermediate & Advanced SEO | BobAnderson
-
Are links to on-page content crawled / have any effect on page rank?
Let's say I have a really long article that begins with links to anchors on the same page, e.g., Chapter 1, Chapter 2, etc., allowing the user to scroll down to different content. There are also other links on this page that link to other pages. A few questions: Googlebot arrives on the page. Does it crawl links that point to anchors on the same page? When link juice is divided among all the links on the page, do these links count, so that page rank is lost? Thanks!
Intermediate & Advanced SEO | anthematic
-
Crawl questions
My first website crawl indicated many issues. I corrected the issues, requested another crawl, and received the results. After viewing the Excel file I have some questions.
1. There are many pages with missing titles and meta descriptions in the Excel file. An example is http://www.terapvp.com/threads/help-us-decide-on-terapvp-com-logo.25/page-2. That page clearly has a meta description and title. It is a forum thread, and my forum software does a solid job of always providing those tags. Why would my crawl report not show this information? This occurs on numerous pages.
2. I believe all my canonical URLs are properly set. My crawl report has 3k+ records, largely because there are 10 records for many pages. These extra records are various sort orders and style differences for the same page, e.g. ?direction=asc. My need for a crawl report is to get actionable data so I can easily make SEO improvements to my site where necessary. These extra records don't provide any benefit. If the crawl report had determined there was no clear canonical URL, then I could understand, but that is not the case. An example is http://www.terapvp.com/forums/news/. If you look at the source you will clearly see the canonical tag. Where is the benefit in including the 10 other records in the crawl report, which show this same page in various sort orders? Am I missing anything?
3. My robots.txt appropriately blocks many pages that I do not wish to be crawled. What is the benefit of including these pages in the crawl report?
Perhaps I am overanalyzing this report. I have read many articles on SEO, but now that I have found SEOmoz, I can see I will need to "unlearn what I have learned". Many things, such as setting meta keyword tags, are clearly not helpful. I wish to focus my energy, and I was looking to the crawl report as my starting point. Either I am missing something, or the report design needs improvement.
Intermediate & Advanced SEO | RyanKent