Does Google index XML files?
-
Does Google or other search engines include XML files in their index? More specifically, I am wondering how Google knows the difference between an xml filetype and an RSS feed.
-
Yes, Google indexes XML files. You can try a search using filetype:xml
I am not an expert on RSS files but I believe the XML versions use the <rss>tag. If I were to take a guess, I would say Google can easily examine the file (they read pdf's for example) and determine if it is an RSS feed.</rss>
-
Google and other search engines know about XML files. The reason I don't necessarily say indexed is because you don't see them really popping up on a SERP page. Robots know the difference between a straight xml file and an rss feed because of the technical markup and structure of the actual file.
For example a very common use for Google and xml files is sitemaps, which you can give them the sitemap feed in Webmaster tools so they will know about the file. If you did a search for www.rootdomain.com + sitemap file you would never see an .xml file actually popup. You would need to visit the url directly.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google is indexing bad URLS
Hi All, The site I am working on is built on Wordpress. The plugin Revolution Slider was downloaded. While no longer utilized, it still remained on the site for some time. This plugin began creating hundreds of URLs containing nothing but code on the page. I noticed these URLs were being indexed by Google. The URLs follow the structure: www.mysite.com/wp-content/uploads/revslider/templates/this-part-changes/ I have done the following to prevent these URLs from being created & indexed: 1. Added a directive in my Htaccess to 404 all of these URLs 2. Blocked /wp-content/uploads/revslider/ in my robots.txt 3. Manually de-inedex each URL using the GSC tool 4. Deleted the plugin However, new URLs still appear in Google's index, despite being blocked by robots.txt and resolving to a 404. Can anyone suggest any next steps? I Thanks!
Technical SEO | | Tom3_150 -
Google dropping pages from SERPs even though indexed and cached. (Shift over to https suspected.)
Anybody know why pages that have previously been indexed - and that are still present in Google's cache - are now not appearing in Google SERPs? All the usual suspects - noindex, robots, duplication filter, 301s - have been ruled out. We shifted our site over from http to https last week and it appears to have started then, although we have also been playing around with our navigation structure a bit too. Here are a few examples... Example 1: Live URL: https://www.normanrecords.com/records/149002-memory-drawings-there-is-no-perfect-place Cached copy: http://webcache.googleusercontent.com/search?q=cache:https://www.normanrecords.com/records/149002-memory-drawings-there-is-no-perfect-place SERP (1): https://www.google.co.uk/search?q=memory+drawings+there+is+no+perfect+place SERP (2): https://www.google.co.uk/search?q=memory+drawings+there+is+no+perfect+place+site%3Awww.normanrecords.com Example 2: SERP: https://www.google.co.uk/search?q=deaf+center+recount+site%3Awww.normanrecords.com Live URL: https://www.normanrecords.com/records/149001-deaf-center-recount- Cached copy: http://webcache.googleusercontent.com/search?q=cache:https://www.normanrecords.com/records/149001-deaf-center-recount- These are pages that have been linked to from our homepage (Moz PA of 68) prominently for days, are present and correct in our sitemap (https://www.normanrecords.com/catalogue_sitemap.xml), have unique content, have decent on-page optimisation, etc. etc. We moved over to https on 11 Aug. There were some initial wobbles (e.g. 301s from normanrecords.com to www.normanrecords.com got caught up in a nasty loop due to the conflicting 301 from http to https) but these were quickly sorted (i.e. spotted and resolved within minutes). There have been some other changes made to the structure of the site (e.g. a reduction in the navigation options) but nothing I know of that would cause pages to drop like this. For the first example (Memory Drawings) we were ranking on the first page right up until this morning and have been receiving Google traffic for it ever since it was added to the site on 4 Aug. Any help very much appreciated! At the very end of my tether / understanding here... Cheers, Nathon
Technical SEO | | nathonraine0 -
Why wont google Index this page?
A week ago i accidentally changed this page settings in my CMS to "disable & dont index" as i was going to replace this page with another, but this didnt happen, but i forgot to switch the settings back! http://www.over50choices.co.uk/funeral-planning/funeral-plans Anyhow in an effort to get it back up quickly i submitted in GWTs but its still not indexed. When i use several SEO on page checking tools it has the Meta Title data as "Form" and not the correct title. Any ideas please? Yours frustrated Ash
Technical SEO | | AshShep10 -
Google indexing despite robots.txt block
Hi This subdomain has about 4'000 URLs indexed in Google, although it's blocked via robots.txt: https://www.google.com/search?safe=off&q=site%3Awww1.swisscom.ch&oq=site%3Awww1.swisscom.ch This has been the case for almost a year now, and it does not look like Google tends to respect the blocking in http://www1.swisscom.ch/robots.txt Any clues why this is or what I could do to resolve it? Thanks!
Technical SEO | | zeepartner0 -
Google Still Taking 2 - 3 Days to Index New Posts
My website is fairly new, it was launched about 3.5 months ago. I've been publishing new content daily (except on weekends) since then. Is there any reason why new posts don't get indexed faster? All of my posts gets +1's and they're shared on G+, FB and Twitter. My website's at www.webhostinghero.com
Technical SEO | | sbrault740 -
Should We Index These Category Pages?
Currently we have marked category pages like http://www.yournextshoes.com/celebrities/kim-kardashian/ as follow/noindex as they essentially do not include any original content. On the other hand, for someone searching for Kim Kardashian shoes, it's a highly relevant page as we provide links to all the Kim Kardashian shoe sightings that we have covered. Should we index the category pages or leave them unindexed?
Technical SEO | | Jantaro0 -
Google previews meanings
I have noticed that when I do a site:mysite.com search and all my indexed pages come up, that when I hover over the arrows and bring up the previews some of the pages have things in a red rectangle and some of the pages even have zig zag-like tears/perforations running through them. What do these signify?
Technical SEO | | VictorVC0 -
Mobile Google Not Indexing Mobile Website
Google currently does not index our mobile website. It has the WWW website in it's index. When a user from a mobile phone clicks on a mobile search result for WWW we redirect them to our mobile website. This is posing problems for us as our mobile website is a fraction of the # of pages/sections as our WWW. So for example, mobile search results show that we have a "careers" section; but that's not the case for the mobile website. As a result a user gets a 404. How do we force mobile Google to index our mobile website instead of our WWW?
Technical SEO | | RBA0