URL not indexed but shows in results?
-
We are working on a site that has a whole section that is not indexed (well, a few pages are). There is also a problem where two directories contain the same content, and it is the incorrect directory whose URLs are indexed.
The problem is that if I do a Google search to find a URL (typically location + term), the URL from the wrong directory comes up in the top 5. However, do a site: search for that URL and it is not indexed! What could be going on here?
There is nothing blocking it in robots.txt or the page source, and a Google Webmaster Tools (GWT) fetch works fine.
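For reference, the "nothing in robots or the source" check can be done programmatically. Below is a minimal sketch using only the Python standard library that tests a page against robots.txt rules and looks for a noindex meta robots tag; the directory names in any real run would be your own, and the functions here are illustrative helpers, not part of any Google tooling.

```python
from urllib import robotparser
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the content of any <meta name="robots"> tags on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append((attrs.get("content") or "").lower())


def has_noindex(html):
    """True if any meta robots tag on the page contains 'noindex'."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return any("noindex" in d for d in parser.directives)


def is_blocked_by_robots(robots_txt, page_url, user_agent="Googlebot"):
    """True if the given robots.txt text disallows the page for the user agent."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch(user_agent, page_url)
```

If both checks come back clean for the affected URLs (as the poster reports), the indexing gap lies elsewhere, e.g. in canonical handling.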
-
If you want to share a set of URLs, I'd be happy to take a look in case anything else jumps out.
-
I wouldn't say the question is answered as such; it's more that an issue has been identified. To me it looks like having a directory of URLs with a canonical set to another directory of duplicate URLs confuses Google.
Out of around 500 URLs, virtually none show as indexed individually, yet a site: search on the directory returns them. Some URLs were cached in the last day or two, and plenty throw a Google 404 page when checking for a cached version. It seems flaky all round.
-
It appears as though they are, though. Have you got what you need, then? Is your question answered?
-
The canonical issue is identified. This is more of an "I've never seen that" day. Yes, the directory site: search returns all the URLs, but do a site: search for individual URLs and 95% are not showing as indexed.
-
The site: command doesn't always show you every page that is indexed. You can:
- look to see if it has been cached (like you just did); or
- execute a specific site:domain.com/pagename.html or site:domain.com/section/ command to see if Google returns an indexed result; or
- look at Google Analytics to see if the page is receiving any search-engine-sourced page entries.
It sounds like your pages might, in fact, be indexed.
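For convenience, the specific queries suggested above can be generated from a page URL. This is a small illustrative helper (not anything official), which turns a URL into the exact page-level, section-level, domain-level, and cache queries to paste into Google:

```python
from urllib.parse import urlsplit


def index_check_queries(page_url):
    """Return the Google queries that test whether a specific page,
    its section, and its whole domain appear in the index."""
    parts = urlsplit(page_url)
    host = parts.netloc
    path = parts.path or "/"
    section = path.rsplit("/", 1)[0] + "/"
    return {
        "page": f"site:{host}{path}",
        "section": f"site:{host}{section}",
        "domain": f"site:{host}",
        "cache": f"cache:{host}{path}",
    }
```

For example, `index_check_queries("http://domain.com/section/pagename.html")` yields `site:domain.com/section/pagename.html`, `site:domain.com/section/`, `site:domain.com`, and `cache:domain.com/section/pagename.html`.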
As to the wrong directory's content getting indexed, I'm assuming you've noindexed one of the directories or assigned canonical tags indicating your strong preference. Both of these are only "suggestions" to Google. It can ignore you, and when that happens, situations like the one you describe occur.
The other thing to bear in mind is how long ago you noindexed or tagged your pages. It can take Google days, weeks, months and sometimes forever to catch up to your requested changes. You have to be patient and cross your fingers.
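One way to sanity-check the canonical setup across two duplicate directories is to extract each page's rel=canonical and confirm every duplicate points into the preferred directory. A rough sketch with the standard library (the directory and URL names here are hypothetical placeholders, and in practice you would feed in fetched HTML):

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag found."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "link" and (attrs.get("rel") or "").lower() == "canonical"
                and self.canonical is None):
            self.canonical = attrs.get("href")


def extract_canonical(html):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def mismatched_canonicals(pages, preferred_prefix):
    """pages: {url: html}. Returns URLs whose canonical is missing or
    does not point into the preferred directory."""
    bad = []
    for url, html in pages.items():
        canonical = extract_canonical(html)
        if canonical is None or not canonical.startswith(preferred_prefix):
            bad.append(url)
    return bad
```

Even with every canonical pointing the right way, as noted above Google treats the tag as a hint, so a clean report here does not guarantee the index will match quickly.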
-
Yes, a sample page is cached. It was cached today, yet that URL is not showing as indexed via site:. It wasn't showing as indexed yesterday either!
-
If you search for the page directly, can you see if a version of it has been cached?