Sitemaps - Format Issue
-
Hi,
I have a small issue with a client site whose programmer seems unwilling to change things he has been doing the same way for a long time.
He has had this dynamic site set up for a few years, active in Google Webmaster Tools and others, but he is not happy with the traffic it is getting.
When I looked at Webmaster Tools, I saw that he has a sitemap registered, but it is /sitemap.php.
When I said that we should be offering the search engines /sitemap.xml, his response was that sitemap.php checks the site every day and generates /sitemap.xml; yet there is no /sitemap.xml registered in Webmaster Tools.
My gut is telling me that he should just register /sitemap.xml in Webmaster Tools, but it is a hard sell.
Does anyone have definitive experience with this setup and whether it is an issue?
My feeling is that it doesn't need to be rocket science...
Any input appreciated,
Sha
-
I have a sitemap.php on my sites. The file contains the PHP code that generates my XML sitemap. This is perfectly standard and common practice.
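To make that concrete, here is a minimal sketch of what such a script can look like. The domain and page list are placeholders; a real script would pull the URLs from the database or CMS rather than a hard-coded array.

    <?php
    // sitemap.php - a minimal sketch of a daily sitemap generator.
    // The URL list is hard-coded only to keep the sketch self-contained;
    // in practice it would come from the site's database or CMS.
    $pages = array(
        'http://www.mydomain.com/',
        'http://www.mydomain.com/about.html',
    );

    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($pages as $url) {
        $xml .= '  <url><loc>' . htmlspecialchars($url) . '</loc></url>' . "\n";
    }
    $xml .= '</urlset>' . "\n";

    // Write to one fixed location so the same sitemap URL can be
    // registered once and simply refreshed on every run.
    file_put_contents(__DIR__ . '/sitemap.xml', $xml);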
The question for your programmer is: where is the output XML file located? A sitemap program will output the file to the same location each time it is updated, so he should be able to provide you with a link to the file.
I would advise placing the output somewhere like a mydomain.com/sitemap directory. If a deeper directory is preferred, then add the location to robots.txt. Either way, it cannot hurt to update the sitemap in Google WMT. That said, it is not strictly necessary as long as you can confirm Google is getting the information.
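As a rough example, a robots.txt entry advertising a sitemap kept in a deeper directory might look like this (the domain and path are placeholders):

    Sitemap: http://www.mydomain.com/sitemap/sitemap.xml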
-
I haven't seen a sitemap.php in a long time, Sha. Certainly Google could read it if it wanted to, but whether it will is the question. I would be inclined to doubt it.
If he says that it's generating a sitemap.xml, but none is present in WMT, then I would respond that one of two things is happening:
1. It isn't generating the sitemap in XML format at all, but only in PHP, or
2. For some reason, the XML version is either not transmitted or not received.
The only other possibility that comes to mind is that perhaps the conversion from PHP to XML is not tagged in a fashion that would be recognized as an XML file, and WMT is detecting it as PHP and assigning it that status accordingly. I suppose that could happen, particularly if he is using an outdated plugin or if, in his own coding, the conversion is faulty.
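On that last point, one common culprit is the HTTP header. If the script serves the XML directly rather than writing a static file, it needs to announce the output as XML. A minimal, hedged sketch in PHP (the file path is assumed for illustration):

    <?php
    // Announce the body as XML so crawlers don't treat it as a PHP/HTML page.
    header('Content-Type: application/xml; charset=utf-8');
    // Serve the pre-generated file (path assumed for illustration).
    readfile(__DIR__ . '/sitemap.xml');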
I'd be interested in hearing what you ultimately learn on this.
Related Questions
-
Getting 'Indexed, not submitted in sitemap' for around a third of my site. But these pages ARE in the sitemap we submitted.
As in the title, we have a site with around 40k pages, but around a third of them are showing as "Indexed, not submitted in sitemap" in Google Search Console. We've double-checked the sitemaps we have submitted and the URLs are definitely in the sitemap. Any idea why this might be happening? Example URL with the error: https://www.teacherstoyourhome.co.uk/german-tutor/Egham Sitemap it is located on: https://www.teacherstoyourhome.co.uk/sitemap-subject-locations-surrey.xml
Technical SEO | TTYH
-
Any crawl issues with TLS 1.3?
Not a techie here... maybe this is to be expected, but ever since one of my client sites switched to TLS 1.3, I've had a couple of crawl issues and other hiccups. First, I noticed that I can't use HTTPSTATUS.io any more... it renders an error message for URLs on the site in question. I wrote to their support desk and they said they haven't updated to 1.3 yet. Bummer, because I loved httpstatus.io's functionality, especially getting bulk reports. Also, my Moz campaign crawls were failing. We are setting up a robots.txt directive to allow rogerbot (and the other bot) and will see if that works; see the sketch below. These failures are consistent with the date we switched to 1.3, and some testing confirmed it. Anyone else seeing these types of issues, and can you suggest any workarounds, solutions, or hacks to make my life easier? (Including an alternative to httpstatus.io... I have and use Screaming Frog... not as slick, I'm afraid!) Do you think there was a configuration error with the client's TLS 1.3 upgrade, or maybe they're using a problematic/older version of 1.3? Thanks
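(For reference, a minimal sketch of such a directive, assuming "the other bot" is Moz's dotbot and that the rest of the file stays as-is; exact crawler support for Allow rules can vary:)

    User-agent: rogerbot
    Allow: /

    User-agent: dotbot
    Allow: /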
Technical SEO | TimDickey
-
Magento Rewrite Issue
Moz's crawler has thrown up a bunch of crawl issues for my site. The site is Magento-based, and I recently updated the themes, so some routes may have become redundant. Moz has identified 289 pages with a temporary redirect. I thought Magento managed the redirects if I set "Auto-redirect to Base URL" to Yes (301 Moved Permanently), but this is enabled on my store and I still get the errors. The only thing I could think of was to add a robots.txt and handle the redirection of these links from there, but handling redirection for 289 links is no mean task. I am looking for any ideas that could fix this without me doing it manually.
Technical SEO | abhishek1986
-
How to fix this issue?
I redesigned my website from Wix to HTML. The URLs changed from http://www.spinteedubai.com/#!how-it-works/c46c to http://www.spinteedubai.com/how-it-works.html, and the same goes for all other pages. How can I fix this issue? Both pages were also indexed in Google.
Technical SEO | AlexanderWhite
-
Home page canonical issues
Hi, I've noticed I can access/view a client's site's home page using the following URL variations:
http://example.com/
http://example.com/index.html
http://www.example.com/
http://www.example.com/index.html
There's been no preference set in Google WMT, but Google has indexed and features this URL: http://example.com/. However, just to complicate matters, the vast majority of external links point to the 'www' version. Obviously I would like to tidy this up, and I have asked the client's web development company if they can place 301 redirects on the domains we no longer want to work. I received this reply, but I'm not sure whether it takes care of the duplicate issue: "Understand what you're saying, but this shouldn't be an issue regarding SEO. Essentially all the domains listed are linking to the same index.html page hosted at 1 location." My question is: do I need to place 301 redirects on the domains we don't want to work? And do I stick with the 'non www' version Google has indexed and try to change the external links so they point to it, or go with the 'www' version and set that as the preferred domain in Google WMT? My technical knowledge in this area is limited, so any help would be most appreciated. Regards, Simon.
Technical SEO | simon-145328
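(For what it's worth, a minimal sketch of how such host canonicalization is often handled in an Apache .htaccess file, assuming mod_rewrite is available and the 'www' version is chosen as canonical; the domain is a placeholder:)

    RewriteEngine On

    # Send any non-www request to the www host with a permanent redirect
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

    # Collapse /index.html onto the directory root
    RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]
-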
Interesting indexing issue - any input would be greatly appreciated!
A few months ago we did SEO for a website, just like any other website. However, we did not see the crawl/indexing results that we see with all of our other SEO projects: Google Webmaster Tools was indicating that only 1 page of the site (of only 20 pages) was indexed. The site was older and originally developed in Dreamweaver, so although that shouldn't have been an issue, we were desperate to solve the problem and ended up rebuilding the site in WordPress. While this actually helped increase the number of pages on the site that Google indexed (now all 20), we are still seeing strange things in the search results. For example, when we check rankings manually for a particular term, the new description is showing; however, it is displaying the old title text. Does anyone know what the problem could be? Thank you so much!!
Technical SEO | ZAG
-
MSNbot Issues
We found msnbot is making lots of requests to one URL at the same time. Even though we have caching, it triggers many simultaneous requests, so caching does not help at the moment. We could use a mutex to make sure the URL waits for the cache to generate (see the sketch below), but we are looking for a solution on the msnbot side:

123.253.27.53 [11/Dec/2012:14:15:10 -0600] "GET //Fun-Stuff HTTP/1.1" 200 0 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
123.253.27.53 [11/Dec/2012:14:15:10 -0600] "GET //Type-of-Resource/Fun-Stuff HTTP/1.1" 200 0 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
123.253.27.53 [11/Dec/2012:14:15:10 -0600] "GET /Browse//Fun-Stuff HTTP/1.1" 200 6708 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"

We found the following solution: http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot.aspx. Bing offers webmasters the ability to slow down the crawl rate to accommodate web server load issues:

User-Agent: *
Crawl-Delay: 10

We need to know whether it is safe to apply that, or any other advice. PS: msnbot gets so bad at times that it could trigger a DOS attack single-handedly! (http://www.semwisdom.com/blog/msnbot-stupid-plain-evil#axzz2EqmJM3er)
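(Just to illustrate the mutex idea mentioned above, a rough PHP sketch using an exclusive file lock to serialize cache regeneration; the cache path, TTL, and buildPage() stub are made up for the example:)

    <?php
    // Serialize cache regeneration so simultaneous bot hits don't all
    // rebuild the page at once (a sketch; names are illustrative).
    function buildPage() {
        return "<html><body>Fun Stuff</body></html>"; // stand-in for the real generator
    }

    $cacheFile = __DIR__ . '/cache/fun-stuff.html';
    $lock = fopen($cacheFile . '.lock', 'c');

    if (flock($lock, LOCK_EX)) {          // later requests wait here
        $stale = !file_exists($cacheFile) || filemtime($cacheFile) < time() - 3600;
        if ($stale) {
            file_put_contents($cacheFile, buildPage());
        }
        flock($lock, LOCK_UN);
    }
    fclose($lock);
    readfile($cacheFile);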
Technical SEO | tpt.com
-
Crawl issue
Hi, I have a problem with crawl stats. Crawls only return 3k pages, while my site has 27k pages indexed (mostly duplicated-content pages). Why such a low number of pages crawled? Any help is more than welcome. Dario. PS: I have more campaigns in place; might that be the reason?
Technical SEO | Mrlocicero