Question about construction of our sitemap URL in robots.txt file
-
Hi all,
This is a Webmaster/SEO question. This is the sitemap URL currently in our robots.txt file:
http://www.ccisolutions.com/sitemap.xml
As you can see it leads to a page with two URLs on it. Is this a problem? Wouldn't it be better to list both of those XML files as separate line items in the robots.txt file?
Thanks!
Dana
-
Hi Jarno,
Thanks so very much! I have to say I am really liking the A1 generator. How awesome of you to follow up. I really appreciate that. Yes, if you want to send me the complete sitemap via PM, that would be awesome. I certainly hope I can return the favor. Happy Holidays!
Dana
-
Yes, we definitely use XENU, but I think I like Screaming Frog a bit better (although our IT Director swears it's broken).
-
Hi Christopher,
Thanks for the update. Yes, I looked at it too and other than it not being "pretty" XML, the data seemed to be okay. The one thing the A1 generator did that we couldn't do was assign values for importance and for how frequently specific pages are modified. If that data is accurate, that's pretty cool. I'm just not sure, although it does seem to have correctly identified the pages that are modified more frequently. I have 30 days to play with the free trial, but so far I think I like it a lot.
Dana
-
Dana,
It just finished scanning. Here are the results:
Internal Sitemap URLs:
- Listed found: 5248
- Listed deduced: 5301
- Analyzed content: 3110
- Analyzed references: 3176
External URLs:
- Listed found: 700
When I look at the overview of the results, I see a number of 301 redirects and canonical redirects (when tested again, they return code 200 OK). But I see a lot of pages.
When I build the sitemap it generates one file (no idea why not more than one) with all the links in the document. Google's sitemap protocol states it should follow the schema at sitemaps.org, which it does. The sitemaps.org protocol states that a sitemap cannot hold over 50,000 links and should be smaller than 10 MB in file size.
The one I just built for you is only 1 MB and contains fewer than 50,000 URLs, and thus it is allowed by Google.
http://www.sitemaps.org/protocol.html
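For anyone who wants to verify those two limits on their own file, here is a minimal Python sketch. It assumes the sitemap is a plain (uncompressed) urlset file; the function name `check_sitemap_limits` and the example.com URLs are made up for illustration. The 10 MB figure is the one quoted in this post; the current sitemaps.org protocol page lists 50 MB uncompressed, so adjust `MAX_BYTES` to whichever limit you want to check against.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10 MB as quoted in the post; adjust if needed


def check_sitemap_limits(xml_bytes: bytes) -> dict:
    """Count <url> entries and measure the size of one sitemap file."""
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    size_bytes = len(xml_bytes)
    return {
        "url_count": url_count,
        "size_bytes": size_bytes,
        "within_limits": url_count <= MAX_URLS and size_bytes <= MAX_BYTES,
    }


# Hypothetical two-URL sitemap, used only to exercise the function.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>http://www.example.com/</loc></url>"
    b"<url><loc>http://www.example.com/about</loc></url>"
    b"</urlset>"
)
report = check_sitemap_limits(sample)
print(report)
```

To check a real file, read it in binary mode and pass the bytes to the function.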
I can send you the entire sitemap in a personal message or by e-mail, if you'd like.
Hope this helps you further.
kind regards
Jarno
-
I started the scan and it's still busy:
2500 analyzed references so far.
I'll let you know how it turns out.
Jarno
-
Thanks Jarno. I really appreciate that. Yes, I had it selected to just scan for images (as prompted when I attempted to create an image sitemap). Let me know what you see? I am wondering if it is going around in circles?
Dana
-
Dana,
Sometimes that happens. Are you scanning for images or are you scanning the site?
I will check your site tomorrow with my full version and see what it does.
With some websites you'll get things like this, but it can be caused by loads of things. 3500 pages should not take 2 hours, only a couple of minutes. I'll check it first thing tomorrow. A1 is not installed on my laptop.
I'll let you know tomorrow.
Kind regards
Jarno
-
A1 Sitemap does 2 things:
1) It builds a file named sitemap.xml which contains all files on the website (this does not conform to the Google requirements).
2) It builds a number of files listed in sitemap-index.xml, one sitemap for every 100 pages. So if your website contains 2800 pages, you'll get loads of files: 28 sitemap files (sitemap-1.xml etc.) and 1 sitemap-index.xml file. This does meet the Google standards.
Afterwards you can do 2 things in Google Webmasters:
1) Enter the sitemap-index.xml file as a sitemap -> Google will follow everything and come to the grand total of 2800 pages.
2) Enter each sitemap separately -> same result, but you can pinpoint better where you have 100 pages and Google indexes fewer (can happen).
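The splitting described above can be sketched roughly like this in Python. This is not how A1 works internally, just an illustration of the idea: the function name `build_sitemaps`, the example.com URLs and the `base_url` parameter are assumptions, and only the file names (sitemap-1.xml, sitemap-index.xml) and the 100-pages-per-file figure come from the description.

```python
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_sitemaps(urls, base_url, per_file=100):
    """Return {filename: xml_text}: the child sitemaps plus sitemap-index.xml."""
    files = {}
    names = []
    for start in range(0, len(urls), per_file):
        name = f"sitemap-{start // per_file + 1}.xml"
        names.append(name)
        entries = "".join(
            f"  <url><loc>{escape(u)}</loc></url>\n"
            for u in urls[start:start + per_file]
        )
        files[name] = f'{XML_HEADER}<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'

    # The index lists the location of every child sitemap file.
    index_entries = "".join(
        f"  <sitemap><loc>{escape(base_url + name)}</loc></sitemap>\n"
        for name in names
    )
    files["sitemap-index.xml"] = (
        f'{XML_HEADER}<sitemapindex xmlns="{SITEMAP_NS}">\n'
        f"{index_entries}</sitemapindex>\n"
    )
    return files


# Hypothetical 2800-page site, matching the numbers in the example above.
urls = [f"http://www.example.com/page-{i}" for i in range(2800)]
files = build_sitemaps(urls, "http://www.example.com/")
print(len(files))  # 28 child sitemaps + 1 index file
```

Keeping each child file small is what makes the second option useful: if one file of 100 pages shows fewer indexed pages, you know where to look.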
Hope this helps
-
Hi again Jarno,
Is it normal for A1's sitemap generator's "Scan website" function for images to take over two hours? Our site is about 3,500 URLs. So far it has under "Internal 'sitemap' URLs" Listed found: 82076 (and climbing every few seconds).
I am wondering if there isn't something wrong? (I don't have any frame of reference since I've never used it before). Thanks!
Dana
-
I'm not familiar with the A1 Sitemap generator, but regarding the sitemap protocol, there is a limit on the size of a single sitemap.xml file, so for large sites, the sitemap must be split into multiple sitemap.xml files. And, the protocol has a method for indexing these multiple sitemap.xml files. It's sort of like an index to an index. None of my sites exceed the sitemap file limit, so I don't know which sitemap generators use this approach, but I would guess many of them do.
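To tell whether a given sitemap.xml is a plain sitemap or one of these index files, you can look at its root element. Here is a small Python sketch of that check; the function name `describe_sitemap` and the example.com file names are made up for illustration, and it assumes the file uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def describe_sitemap(xml_bytes: bytes):
    """Return ("index" | "urlset" | "unknown", list of <loc> values)."""
    root = ET.fromstring(xml_bytes)
    locs = [el.text.strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc") if el.text]
    if root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        return "index", locs   # each loc points to another sitemap file
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        return "urlset", locs  # each loc is a page URL
    return "unknown", locs


# Hypothetical index file that lists two child sitemaps.
sample_index = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<sitemap><loc>http://www.example.com/sitemap-pages.xml</loc></sitemap>"
    b"<sitemap><loc>http://www.example.com/sitemap-products.xml</loc></sitemap>"
    b"</sitemapindex>"
)
kind, locs = describe_sitemap(sample_index)
print(kind, locs)
```

A sitemap.xml that shows only two URLs, both ending in .xml, is most likely an index of this kind rather than a broken sitemap.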
Sitemap generators I have used include DMXZone which is a Dreamweaver plugin, and xml-sitemaps.com which includes a video sitemap generator.
Best,
Christopher
EDIT: PS: Your current sitemap looks fine to me.
-
Thanks Christopher,
Your answer took a moment to sink in, but I think I get it (I think I am coffee deprived this morning).
So, if I am using the A1 Sitemap generator that Jarno suggested, this sitemap index should automatically be generated based on the size of my generated sitemap. Is that correct?
-
Thanks Jarno,
I have downloaded and am trying the 30-day free trial of the A1 Sitemap Generator right now. Thanks for the tip. Can you comment on Christopher's remark below concerning sitemap indexes for larger sitemaps?
Can either you or Christopher give me more clarification on that. Is this what our IT director has attempted to do with the sitemap in our robots.txt file? If so, has it been done correctly?
Thanks!
-
There is a limit on the size of a sitemap, and to allow large sitemaps to be split into smaller sitemaps, the sitemap protocol includes a sitemapindex. See "Using Sitemap index files (to group multiple sitemap files)" here: http://www.sitemaps.org/protocol.html. Of course, it's also possible to list the multiple sitemaps in the robots.txt file, but automated sitemap generators will likely use the sitemapindex feature so that the robots.txt file does not have to be modified as the size of the site changes.
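To make the two options concrete, here is a short Python sketch that pulls the Sitemap lines out of a robots.txt file. Both robots.txt samples and the function name `sitemap_urls` are hypothetical; the only thing taken from the protocol is the `Sitemap:` directive itself.

```python
def sitemap_urls(robots_txt: str) -> list:
    """Collect the URLs from every 'Sitemap:' line in a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Option 1: a single line pointing at a sitemap index file.
with_index = """User-agent: *
Disallow: /checkout/
Sitemap: http://www.example.com/sitemap-index.xml
"""

# Option 2: every sitemap listed on its own line.
with_separate_lines = """User-agent: *
Disallow: /checkout/
Sitemap: http://www.example.com/sitemap-1.xml
Sitemap: http://www.example.com/sitemap-2.xml
"""

print(sitemap_urls(with_index))
print(sitemap_urls(with_separate_lines))
```

With option 1 the robots.txt file stays the same when the site grows and only the index file changes. With option 2 someone has to add a line for every new sitemap file.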
Best,
Christopher
-
Another tool to help generate a sitemap and even check broken links is called Xenu (weird logo, but good free product).
-
Dana,
The structure of your sitemap.xml is very strange to me. I use an external program to build the sitemap.xml for my entire website.
You now have a link in your robots.txt file pointing to a sitemap which contains 2 files (both .xml), each with a map of the site?
Why not use a program (free or paid, like Microsys A1, the one I use) to build 1 sitemap.xml and point to this file from your robots.txt?
Hope this helps.
If you do have any questions, please let me know.
kind regards
Jarno