Sitemap issues

edward-may

Hi ALL

Okay, I'm a bit confused here: it says I have submitted 72 (pages, I'm assuming), and it's reporting that only 2 pages have been indexed?

I submitted a new sitemap for each of my 3 top-level domains and checked it today, and it's showing the attached result.

We are still having issues with meta tags showing up in the incorrect country.

If anyone knows how I can attend to this nightmare, it would be much appreciated lol

edward-may

Awesome response Dirk! Thanks again for your endless help!

PatrickDelehanty

Hey again!

...I can't believe I didn't think of the simplicity of this earlier...

"it's even faster if you sort the url's in alphabetical order & delete the rows containing priority / lastmod /.. - then you only need to do a find/replace on the <loc>/</loc> "

100,000 spreadsheets and I don't even think of sorting for this task. Unreal. I laughed at myself aloud when I read it.

Thank you per usual my friend!

DirkC

Hi Patrick,

Your method is a good one - I use more or less the same trick to retrieve url's from a sitemap in xls (it's even faster if you sort the url's in alphabetical order & delete the rows containing priority / lastmod /.. - then you only need to do a find/replace on the <loc>/</loc> )

It's just in this specific case as the sitemap was generated in Screaming Frog that it's easier to eliminate these redirected url's upfront.

Dirk

PatrickDelehanty

Thanks so much Dirk - this is great. I was speaking to how I found the specific errors. Thanks for posting this for the sitemap - definitely left a big chunk out on my part!

DirkC

Hi Justin

Patrick's how-to is correct - but as you are generating your sitemap using Screaming Frog, there is really no need to go through this manual processing.

If you only need to create the sitemap:

Go to Configuration > Spider -
Tab: Basic settings: uncheck everything apart from "Crawl Canonicals" (unchecking Images/CSS/JS/External links is not strictly necessary but speeds up the crawl)

Advanced: Check "Always Follow redirects" / "Respect Noindex" / "Respect Canonical"

After the crawl - generate the sitemap - it will now only contain the "final" url's - after the redirects.
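
If the crawl results are exported instead of regenerating the sitemap, the same "final URLs only" idea can be sketched in a few lines of Python. This is only an illustration, not what Screaming Frog does internally: the `redirects` mapping (source URL to redirect target) is assumed to come from your own crawl export, and the names `final_url` and `clean_sitemap` are made up for this example.

```python
def final_url(url, redirects, max_hops=10):
    """Follow a source -> target redirect mapping until a non-redirecting URL is reached."""
    seen = set()
    while url in redirects:
        # Guard against redirect loops and overly long chains.
        if url in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
        url = redirects[url]
    return url


def clean_sitemap(urls, redirects):
    """Replace every redirecting URL with its final destination and drop duplicates."""
    return sorted({final_url(u, redirects) for u in urls})


if __name__ == "__main__":
    # Hypothetical data: the redirect target below is invented for the example.
    redirects = {
        "https://www.zenory.com/blog/live-psychic-readings/":
            "https://www.zenory.com/blog/live-readings/",
    }
    urls = [
        "https://www.zenory.com/blog/live-psychic-readings/",
        "https://www.zenory.com/blog/tag/love/",
    ]
    for u in clean_sitemap(urls, redirects):
        print(u)
```

The loop guard matters because a misconfigured site can redirect A to B and B back to A; failing loudly is better than writing either URL into the sitemap.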

Hope this helps,

Dirk

PS Try to avoid internal links which are redirected - better to replace these links by links to the final destination

DirkC

Hi Justin

Probably the easiest way to eliminate these cross references is to ask your programmer to put all links as relative links rather than as absolute links. Relative links have the disadvantage that they can generate endless loops if something is wrong with the HTML - but this is something you can easily check with Screaming Frog.

If you check the .com version - for example https://www.zenory.com/blog/tag/love/ - it's calling zenory.co.nz for plenty of links (just check the source & search for .co.nz) - both the http & the https version

You can check all these pages by hand - but I guess your programmer must be able to do this in an automated way.

The same is true the other way round - on the .co.nz version you'll find references in the source to the .com version

In screaming frog - the links with "NZ" are the only ones which should stay absolute - as they point to the other version
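
One way a programmer might automate the "check the source & search for .co.nz" step is sketched below in Python, using only the standard library. It is a rough sketch under assumptions: the function name `cross_domain_refs`, the sample markup, and the `logo.png` path are invented for illustration, and fetching the page source is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _RefCollector(HTMLParser):
    """Collects (tag, url) pairs for every href/src attribute in a page."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.refs.append((tag, value))


def cross_domain_refs(html, sibling_hosts):
    """Return the (tag, url) pairs whose host is one of the sibling domains."""
    collector = _RefCollector()
    collector.feed(html)
    # Relative URLs have no hostname (None), so they never match.
    return [(tag, url) for tag, url in collector.refs
            if urlparse(url).hostname in sibling_hosts]


# Hypothetical snippet standing in for the source of a page on the .com site.
SAMPLE_HTML = """
<a href="https://www.zenory.co.nz/blog/tag/love/">Love</a>
<a href="/blog/tag/love/">Love</a>
<img src="http://www.zenory.co.nz/logo.png">
"""

if __name__ == "__main__":
    siblings = {"www.zenory.co.nz", "www.zenory.com.au"}
    for tag, url in cross_domain_refs(SAMPLE_HTML, siblings):
        print(tag, url)
```

The hits still need a manual look: the country-switcher links (the ones with "NZ" as anchor text) are meant to point at the other version and should stay absolute.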

Hope this clarifies

Dirk

edward-may

Wow thanks Patrick, let me run this and see how I go, thanks so much for your help!!!

edward-may

Dirk, thanks so much for your help!

Could you tell me how to identify the URLs that are cross-referencing? I tried using Screaming Frog, looked under **External**, and clicked on **Inlinks** and **Outlinks**. But what's really caught my eye is that a lot of the links are from the blog with the same anchor text "name", while others are showing up with a different name as well. Some show NZ NZ or AU AU as the anchor text, and I think this has to do with the flag drop-down used to change the top-level domains.

For eg:

FROM: https://www.zenory.co.nz/blog/tag/love/

TO: https://www.zenory.com.au/categories/love-relationships

Anchor Text: Twinflame Reading

PatrickDelehanty

Hi Justin

Yep! I use ScreamingFrog, here's how I do it:

Go to your /sitemap.xml
Select all + copy
Paste into Excel column A
Select column A
Turn "Wrap Text" off
Delete rows 1 through 5
Select column A again
"Find and Replace" the following:
<lastmod></lastmod>
<changefreq></changefreq>
daily
Whatever the date is
Priority numbers, usually 0.5 to 1.0
"Replace With" nothing, no spaces, nothing
You'll hit "Replace All" after every text string you put in, one at a time
With Column A still selected, hit F5
Click "Special"
Click "Blanks" and "OK"
Right click in the spreadsheet
Select "Delete" and "Shift cells up"

Voilà! You have your list. Now copy this list, and open ScreamingFrog. Click "Mode" up top and click "List". Click "Upload List" and click "Paste". Paste your URLs in there and hit Start.

Your sitemap will be crawled.
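
For anyone who would rather skip the spreadsheet, the extraction steps above can be approximated with a short Python script. This is a minimal sketch, assuming the sitemap follows the standard sitemaps.org format; the function name `sitemap_urls` and the sample XML are made up for the example, and downloading /sitemap.xml is left to you.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Return the <loc> values from a sitemap XML string, sorted and de-duplicated."""
    root = ET.fromstring(xml_text)
    # Only <loc> is read, so lastmod / changefreq / priority are ignored
    # without any find-and-replace.
    locs = {el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text}
    return sorted(locs)


# Hypothetical sample; a real sitemap would be read from a file or download.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.zenory.com/blog/live-psychic-readings/</loc>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://www.zenory.com/blog/chat-psychic-readings/</loc>
    <changefreq>daily</changefreq>
  </url>
</urlset>"""

if __name__ == "__main__":
    # One URL per line, ready to paste into Screaming Frog's list mode.
    print("\n".join(sitemap_urls(SAMPLE)))
```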

Here are URLs that returned 301 redirects:

https://www.zenory.com/blog/chat-psychic-readings/
https://www.zenory.com/blog/online-psychic-readings-private/
https://www.zenory.com/blog/live-psychic-readings/
https://www.zenory.com/blog/online-psychic-readings/

Here are URLs that returned 503 Service Unavailable codes twice, but 200s now:

I would check on that when you can. Check in Webmaster Tools whether any issues have shown up there as well.

Hope this helps! Good luck!

edward-may

Thanks so much Patrick! Can you recommend how I would go about finding the urls that are redirecting in the sitemap? I'm assuming screaming frog?

DirkC

Hi Justin

Google doesn't seem to be figuring out (even with the correct hreflang in place) which site should be shown for each country.

If you look at the cached versions of your .com.au & .com sites, it's always the .co.nz version which is cached - this is probably also the reason why the meta description is wrong (it's always coming from the .co.nz version) and why the % of url's indexed for each sitemap (for the .com & .com.au version) is so low.

Try to rigorously eliminate all cross-references in your site - to make it more obvious for Google that these are 3 different sites:

in the footer - the links in the second column (latest articles) are pointing to the .co.nz version - change these links to relative ones
on all sites there are elements you load from the .com domain (see latest blog entries - the images are loaded from the .com domain for all TLDs)

As long as you send these confusing signals to Google - Google will mix up the different versions of your site.
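
To make the "change these links to relative ones" step concrete, here is a small Python sketch of rewriting an absolute URL into a root-relative one when it points at one of the site's own domains. The helper name `to_root_relative` and the `site_hosts` set are assumptions for illustration; how the links are actually rewritten depends on the templates the site uses.

```python
from urllib.parse import urlsplit, urlunsplit


def to_root_relative(url, site_hosts):
    """Strip scheme and host from URLs on the site's own domains; leave others alone."""
    parts = urlsplit(url)
    if parts.hostname not in site_hosts:
        return url  # already relative, or a genuinely external link
    # Keep path, query and fragment; an empty path becomes "/".
    return urlunsplit(("", "", parts.path or "/", parts.query, parts.fragment))


if __name__ == "__main__":
    hosts = {"www.zenory.com", "www.zenory.co.nz", "www.zenory.com.au"}
    print(to_root_relative("https://www.zenory.co.nz/blog/tag/love/", hosts))
```

A rewrite like this should skip the country-switcher links, since those are supposed to point at the other version of the site.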

rgds,

Dirk

PatrickDelehanty

Hi there Justin

Everything looks fine from here - there are a couple URLs that need to be updated in your sitemap as they are redirecting.

Google takes time to index, so give this a little more time. You could ask Google to recrawl your URLs, but that isn't necessary at the moment; just something to note.

I would make sure your internal links are all good to go and "follow" so that crawlers can at least find URLs that way.

I did a quick site: search on Google, so far you have 58 pages indexed. You should be okay.

Hope this helps! Good luck!

Alick300

Hi Justin,

A similar question was asked in this post: http://moz.com/community/q/webmaster-tools-indexed-pages-vs-sitemap

Hope this helps you.

Thanks
