Thanks for everything.
I'll stick to the slower method and see what's going on in the index.
Dear Dr. Meyers,
Very insightful!
I must clear all the irrelevant pages, and the sooner the better.
(1) could take months or years
(2) sounds like a very good approach - I'm building my sitemap with code, so that's not a problem. The only issue is that at a few hundred at a time it could also take a long time. And wouldn't Google spend a lot of time crawling those pages and index fewer of the fresh new ones?
(3) what about Google's removal tool? It's connected to my point in my last post about setting up a new site architecture:
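Point (2) above - building the sitemap with code and submitting a few hundred URLs at a time - can be sketched like this. A minimal sketch: the domain, URL pattern, and batch size are illustrative only, not taken from the actual site.

```python
# Minimal sketch of writing sitemap batches for match URLs.
# Domain, URL pattern, and batch size are assumptions for illustration.
import xml.etree.ElementTree as ET

def build_sitemap(urls, batch_size=500):
    """Yield sitemap XML strings, each covering at most batch_size URLs."""
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    for start in range(0, len(urls), batch_size):
        urlset = ET.Element("urlset", xmlns=ns)
        for url in urls[start:start + batch_size]:
            # each <url> entry needs at least a <loc> child
            loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
            loc.text = url
        yield ET.tostring(urlset, encoding="unicode")

matches = [f"www.domain.com/sport/match/T{i}vT{i+1}" for i in range(1200)]
sitemaps = list(build_sitemap(matches))
print(len(sitemaps))  # 3 batches: 500 + 500 + 200 URLs
```

Each yielded string is one sitemap file; they could also be tied together with a sitemap index file if the batches grow numerous.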
what do you think about this approach?
Thanks again for all your help, I really appreciate it!
Assaf.
Dear Mark,
*I've sent you a private message.
I'm starting to understand I have a much bigger problem.
*My index status contains 120k pages while only 2,000 are currently relevant.
Your suggestion is: after a match finishes, programmatically add a noindex tag to the page and Google will remove it from its index. That could work for relatively new pages, but since very old pages have no links or sitemap entries, it could take a very long time to clear the index, because they're rarely crawled - if at all.
so if today a match URL is like this: www.domain.com/sport/match/T1vT2
restrict www.domain.com/sport/match/ on robots.txt
and from now on create all new matches in a different folder, like: www.domain.com/sport/new-match-dir/T1vT2
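If you go that route, the robots.txt rule for the old folder would look roughly like this (a sketch based on the example URLs above). One caveat: robots.txt only blocks crawling, not indexing, so pages already in the index can linger as URL-only entries until Google drops them or they're removed through the removal tool.

```
User-agent: *
Disallow: /sport/match/
```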
Is this a good solution?
Wouldn't Google penalize me for removing a directory with 100k pages?
If it's a good approach, how much time will it take for Google to clear all those pages from its index?
I know it's a long one and I'll really appreciate your response.
Thanks a lot,
Assaf.
Dear Dr. Meyers,
I'm starting to understand I have a much bigger problem.
All finished matches are no longer relevant, and though you can reach them (their pages) from the SERPs or by direct URL, they don't appear in site links or the sitemap. So the best idea is to remove all these old pages from Google's index - they don't contribute, and they've made my index status contain 120k pages while only 2,000 are currently relevant.
This wastes Google's crawling on irrelevant pages, and there's a risk Google may see some of them as dupes, because in some cases most of the page is relatively similar.
One suggestion I got is: after a match finishes, programmatically add a noindex tag to the page and Google will remove it from its index. But will it remove the page if there are no links/sitemap entries pointing to it?
But I also have to handle the problem of the huge index - the approach above may (or may not) handle pages from now on, but what about all the far-past pages with finished matches? How can I remove them all from the index?
Adding <meta name="robots" content="noindex,follow"> to all of them could take months or more to clean the index, because they're probably rarely crawled.
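The "programmatically add noindex" idea can be sketched as a tiny template helper. This is only an illustration; the function name and the match_finished flag are hypothetical, not anything from the site's actual code.

```python
# Hypothetical template helper: emit the robots meta tag for a match page.
# match_finished would come from the site's DB; it is assumed here.
def robots_meta(match_finished: bool) -> str:
    if match_finished:
        # noindex asks Google to drop the page on its next crawl;
        # follow keeps passing link equity through outgoing links.
        return '<meta name="robots" content="noindex,follow">'
    return '<meta name="robots" content="index,follow">'

print(robots_meta(True))  # the tag a finished match page would carry
```

Note that for the tag to take effect, Google must still be able to crawl the page - which is exactly why it works slowly on pages that are rarely crawled.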
A more aggressive approach would be to change the site architecture and restrict, via robots.txt, the folder that holds all the past irrelevant pages.
so if today a match URL is like this: www.domain.com/sport/match/T1vT2
restrict www.domain.com/sport/match/ on robots.txt
and from now on create all new matches in a different folder, like: www.domain.com/sport/new-match/T1vT2
Is this a good solution?
Wouldn't Google penalize me for removing a directory with 100k pages?
If it's a good approach, how much time will it take for Google to clear all those pages from its index?
I know it's a long one and I'll really appreciate your response.
Thanks a lot,
Assaf.
Yes. When the 1st Panda update was rolled out I lost 50% of my traffic from Google and haven't really recovered since.
Thanks Mark!
Any good articles about how to recover from Panda?
Hi Mark
These pages are very important while they are relevant (before the match finishes) - they are the source of most of our traffic, which comes from long-tail searches.
Some of these pages have inbound links, and it would be a shame to lose all that juice.
Would noindex remove the pages from the Google index? How much time would it take? Wouldn't noindexing a huge number of pages also look suspicious?
By "evergreen pages" do you mean pages that are always relevant, like a League page / Sport page, etc.?
Thanks,
Assaf.
Hi,
I run a site about sport matches. Every match has a page, and the pages are generated automatically from the DB. Pages are not duplicated, but over time some look a little bit similar. After a match finishes, its page has no internal links or sitemap entry, but it's still reachable by direct URL and stays in Google's index. So over time we have more than 100,000 indexed pages.
Since past matches have no significance, they're not linked, and a match can repeat (which may look like duplicate content), what would you suggest we do when a match is finished - not linked, but still appearing in the index and the SERPs:
301 redirect the match page to the match category, which is higher in the hierarchy and is always relevant?
use rel=canonical pointing to the match category
do nothing...
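The 301 option above could be expressed as a single rewrite rule rather than 100,000 individual redirects. This is only a sketch: it assumes an Apache server and a hypothetical /sport/category/ target path, neither of which is stated in the question.

```
# Hypothetical Apache .htaccess sketch: send every URL under /sport/match/
# to an (assumed) category page with a permanent 301 redirect.
RewriteEngine On
RewriteRule ^sport/match/.+$ /sport/category/ [R=301,L]
```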
*A 301 redirect will shrink my index status; some say a high index status is good...
*Is it safe to 301 redirect 100,000 pages at once - wouldn't it look strange to Google?
*Would rel=canonical remove the past match pages from the index?
what do you think?
Thanks,
Assaf.
We're at the same stage.
Tips and ideas, please.
Thanks!
CuteRank - a small desktop app that can be updated on demand - shows top 100
The 1st thing is to add Google authorship (if you haven't already done it).