Internal linking question
-
Hi there. Are all internal links listed in GWMT actually indexed?
-
Jonnygeekuk,
If GWT is telling you they are "aware" (whether indexed or not) of URLs that you do not want indexed, and you have either blocked them in the robots.txt file or with the robots meta tag, or the page serves a 404 or 410 response in the HTTP header, it wouldn't hurt to use the URL removal tool to remove those pages from the index just to be sure.
-
So, sounds like you're looking for a list of indexed pages? Will this tool help?
http://www.intavant.com/tools/google-indexed-pages-extractor/
-
I'm sorry it's taking me so long to get back to you on this. You say you're using the removal tool in Google Webmaster Tools?
I want to be certain you're not using the link disavow tool as a removal tool. Is that correct?
"Google updates its entire index regularly. When we crawl the web, we automatically find new pages, remove outdated links, and reflect updates to existing pages, keeping the Google index fresh and as up-to-date as possible.
If outdated pages from your site appear in the search results, ensure that the pages return a status of either 404 (not found) or 410 (gone) in the header. These status codes tell Googlebot that the requested URL isn't valid. Some servers are misconfigured to return a status of 200 (Successful) for pages that don't exist, which tells Googlebot that the requested URLs are valid and should be indexed. If a page returns a true 404 error via the http headers, anyone can remove it from the Google index using the webpage removal request tool. Outdated pages that don't return true 404 errors usually fall out of our index naturally when other pages stop linking to them."
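A quick way to see what your old pages actually return is a small script. The one below is only a sketch: the function names `removal_ready`, `fetch_status` and `report` are made up for illustration, and it assumes your server answers HEAD requests the same way it answers GET requests.

```python
import urllib.error
import urllib.request


def removal_ready(status_code: int) -> bool:
    """True when the status tells a crawler the URL is gone (404 or 410)."""
    return status_code in (404, 410)


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a HEAD request to url.

    Note: urllib follows redirects, so a redirected URL reports the
    status of the page it ends up on.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return error.code


def report(urls):
    """Print one line per URL with its status and what that means."""
    for url in urls:
        code = fetch_status(url)
        if removal_ready(code):
            verdict = "returns a true 404/410"
        else:
            verdict = "still looks valid to a crawler"
        print(f"{code} {url} -> {verdict}")
```

Calling `report()` with a list of the outdated URLs prints one line per URL. Anything that comes back as 200 for a page that no longer exists is the misconfiguration described in the quote above.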
Reincluding content in search
"Content removed using the URL removal tool will not appear in search results for a minimum of 90 days or until the content has been removed from the Google index. However, if you've updated robots.txt, added meta tags, or password-protected content to prevent it being crawled, the content should naturally have dropped out of our index, and you shouldn't need to worry about it reappearing after 90 days. You can reinclude your content at any time during the 90-day period by following the steps below.
Reinclude content:
- On the Webmaster Tools Home page, click the site you want.
- In the left-hand menu, click Optimization, and then click Remove URLs.
- Select the Removed content tab, and then click Reinclude next to the content you want to reinclude in the Google index.
Pending requests are usually processed within 3-5 business days."
-
Hi Chris, Thomas
Thanks for taking the time to reply.
Essentially, the reason I'm asking this question is that the site in question recently became heavily over-indexed because search filters and the like were being indexed. This resulted in a ton of thin content being indexed. We've since noindexed these pages, but they are taking time to drop out, so we are helping a little by using the removal tool in GWMT. A lot of these pages are hidden; it's difficult to find them in the main index, but Index Status says we still have >7k pages indexed when we really should have fewer than 2k. A site: command reveals about 9k, but only 600 are listed and they are all valid pages. Basically, we're trying to find the URLs to remove and noticed that a lot of them are listed in the Internal Links tab in GWMT. I just wondered whether it was advisable to remove these too, in addition to the 2.5k we have already removed.
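To make the "find the URLs to remove" step concrete, here is a rough sketch of how an exported URL list could be narrowed down. It assumes the filter pages can be recognised by their query parameters; the parameter names in `FILTER_PARAMS` and the function names are hypothetical and would need to match whatever the site really uses.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical filter parameter names; replace with the site's own.
FILTER_PARAMS = {"filter", "sort", "colour", "price"}


def is_filter_url(url: str) -> bool:
    """True when the URL carries at least one known filter parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name in FILTER_PARAMS for name in query)


def removal_candidates(exported_urls, valid_urls):
    """Return filter URLs from the export that are not known valid pages."""
    valid = set(valid_urls)
    return sorted(
        {url for url in exported_urls if url not in valid and is_filter_url(url)}
    )
```

The result is only a candidate list. Each URL on it should still be checked for a noindex tag or a 404/410 response before a removal request is submitted.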
-
Hi Johnny, I want to tell you that I agree with what Chris stated above, if you're looking for someone to confirm that. You also want to make sure you do not have more than 100 to 150 internal links on any one page of your site, as that can hurt Google's indexing of the website.
I also use a tool to make internal links, if that is what you are speaking of. It's called http://scribecontent.com. You can use it not only on WordPress but on all sites. I have found it to be extremely useful, but please be cautious about how many links you build internally so that you do not create a page that cannot be indexed correctly.
http://www.distilled.net/u/search-engine-basics/#crawling
I hope I've been of help,
Thomas
-
Hey JonnyG,
Be sure not to confuse links with URLs. Essentially, a link is a clickable thing on a web page that, when clicked, takes the user to another URL. A URL is an address (non-clickable). A web page is the resource that exists at a URL.
Anyway, the Internal Links tab shows how many links exist on your site that can take you to other pages on your site. However, if you click on the Health | Index Status tab, you'll get choices to see Basic and Advanced info on your indexed URLs. In the Advanced tab, you'll see the total number of pages Google has indexed on your site. Google's Webmaster Tools Help has a page on Index Status for more info.
Related Questions
-
Questions about canonicals
Howdy Moz community, I had a question regarding canonicals. I help a business with their SEO, and they are a service company. They have one physical location, but they serve multiple cities in the state. My question is in regards to canonicals and unique content. I hear that slightly differing content on each page won't matter much if most of the content is essentially the same. This business wants to create service pages for at least 10 other cities they service. The site currently only has pages that target one city. I was wondering if it would be beneficial to use a template for each city and then put a canonical there to say that it is an identical page to the main city page? Example: our first city was San Francisco, and we want to create city pages for Santa Rosa, Novato, San Jose, etc. If the content for the 2nd, 3rd, and 4th cities were the same as for the 1st city, with just the city name changed, would that hurt? Would putting a canonical help this issue, if I signal that it is the same as the 1st page? The reason I want to do this is that I have been getting concerns from my copywriter that after the 5th city, they can't seem to make the service pages that much different from the first 4 cities, in terms of wording of the content and its structure. I want to know if there is a simpler way to target multiple cities for local SEO reasons, like geo-targeted terms, without having to think of a completely new way to write out the same thing for each city service page, as this is very time consuming on my end. Main questions: Will making template service pages, changing the city name to target different geographic locations, putting a canonical tag on the new pages, and referring back to the main city page be effective in terms of ranking for multiple cities? Will doing this tell Google my content is thin, or be considered duplicate? Will this hurt my rankings? Thanks!
Technical SEO | | Ideas-Money-Art0 -
Page for Link Building
Hello Guys, My question is about a link building process. We all know that some directories/sites do require a reciprocal link. Does it make any sense to create a page on the website exclusively for reciprocal links? And what do we do with this webpage in terms of indexing, dofollow, crawling, etc.? Any suggestions are more than welcome 🙂 Tks in advance! PP
Technical SEO | | PedroM0 -
Internal Linking
Hello there, I own a "how to" website with 1000+ articles, and the number of articles is growing every day. Often some articles are easier to understand if I link a certain step to an article that was written before, because that article explains the step in more detail. Should I use "read here/read more" or the "title of the article I'm referring to" as anchor text? When is internal linking too much? Should I use nofollow?
Technical SEO | | FisnikSylka0 -
Link Indexing Thoughts
We have had several promotional articles put out for a few client sites (posted on sites, not article directories). That was in Sept, and it looks like they have not yet been indexed. Any ideas on how best to get them indexed? Not just these, but a lot of external links indexed quickly. Google seems to be slow getting to them (big web after all....)
Technical SEO | | OnlineAssetPartners0 -
Why are my links not being counted?
I have a site that has over 400 links going to it. When I use Moz Open Site Explorer or any other SEO tool, it says I have only 12 links. Does anyone know why this could be happening?
Technical SEO | | Goopping0 -
Robots.txt questions...
All, My site is rather complicated, but I will try to break down my question as simply as possible. I have a robots.txt document in the root level of my site to disallow robot access to /_system/, my CMS. This looks like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
I have another robots.txt file in another level down, which is my holiday database - www.mysite.com/holiday-database/ - this is to disallow access to /holiday-database/ControlPanel/, my database CMS. This looks like this:
User-agent: *
Disallow: /ControlPanel/
Am I correct in thinking that this file must also be in the root level, and not in the /holiday-database/ level? If so, should my new robots.txt file look like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
Or, like this:
# /robots.txt file for http://webcrawler.com/
# mail webmaster@webcrawler.com for constructive criticism
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
Thanks in advance. Matt
Technical SEO | | Horizon0 -
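The two candidate files in this question can be compared with Python's standard `urllib.robotparser`. This is a sketch under assumptions, not a statement of how any particular crawler behaves: it relies on the robots.txt convention that crawlers request the file only from the root of the host and match `Disallow` paths from the start of the URL path. The `allowed` helper and the `/login` test URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Root-level file using the full path to the control panel.
FULL_PATH_RULES = """\
User-agent: *
Disallow: /_system/
Disallow: /holiday-database/ControlPanel/
"""

# Root-level file using the path relative to /holiday-database/.
SHORT_PATH_RULES = """\
User-agent: *
Disallow: /_system/
Disallow: /ControlPanel/
"""


def allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True when the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URL inside the database control panel.
panel_url = "http://www.mysite.com/holiday-database/ControlPanel/login"

print(allowed(FULL_PATH_RULES, panel_url))   # full path rule
print(allowed(SHORT_PATH_RULES, panel_url))  # short path rule
```

Under this parser, only the full-path rule blocks the control panel URL, because `/ControlPanel/` is matched against the start of the path and this path starts with `/holiday-database/`.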
Absolute or Relative Internal Website Links
Hi, I am not sure what is considered best practice when linking between pages on the same site - absolute or relative: Link Or Link I notice a lot of CMS systems (WordPress) use the absolute method - is there a reason? Any help much appreciated. Barney.
Technical SEO | | barnst0 -
Linking out?
First of all, sorry this Q is all in one block, but iPads don't like this site or vc/vs. When using the SEOmoz on-site keyword optimizer tool, it suggests at least one link to be to an off-site page. Would it be considered a link exchange if we linked out to a niche SUPER Authority site that had a link back to our website? It seems like a naturally good strategy, but I'm afraid Google may not agree. If the answer is no, there are many similar sites that mention our company in very good ways (awards, etc.), but with no links. I would think this is a no-brainer. Personally I would like to eventually harvest all this press coverage to benefit our site. Btw, I was grey before I learned about SEOmoz, just like the rest of our niche. Now I'm shooting to be Snow White! Hopefully it works out. 🙂 I also wrote two landing pages that I tried to SEO the right way. I would love to hear your feedback to know if they are truly effective and if they are actually white. I think they are, but don't know "all" the rules of being white http://jamproa.com/ideology/product-innovation.php http://jamproa.com/industrial-design/what-is.php Thanks!
Technical SEO | | dmac0