Thanks again Ryan!
Posts made by friendoffood
-
Home page links -- Ajax When Too Many?
My home page has links to major cities. If someone chooses a specific city, I want to give them the choice of a suburb within that city. With, say, 50 cities and 50 suburbs per city, that's 2,500 links on the home page. To avoid that many links on the home page (or any page), I would like to list just the 50 cities and pull up the suburbs via an Ajax call that search engines would not read or crawl. This would be better than clicking on a main city and then getting the city page, where they can then choose a suburb. Better to do it all at once.
Is it a bad idea to Ajax the subregions on the home page and to code it so Google, Bing, and other search engines don't crawl or even see anything on the home page related to the suburbs? The search engines will still find the suburb links because they are listed on each main city page.
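For concreteness, the suburb lists would come from a separate endpoint that I'd block in robots.txt, something like this (the /ajax/ path is just illustrative):

User-agent: *
Disallow: /ajax/

That way the home page HTML contains only the 50 city links, and the suburb links never appear in anything the engines fetch from it directly.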
-
RE: Upper to Lower Case and Endless Redirects
Hi Ryan,
The sitemap has yet to be submitted, as I've been cleaning up URL issues first. However, Google has crawled 15,000 pages anyway, and Bing several thousand. Everything is lowercase on the site now, and there are only 9 backlinks (all nofollow) from just 2 domains, as I haven't yet started building them. I'll wait a couple more months and see if the number gets closer to zero.
Thanks. Ted
-
Upper to Lower Case and Endless Redirects
I have a site that first got indexed about a year ago, for several months. I shut it down and reopened it about 5 months ago. Two months ago I discovered that the uppercase in page URLs like www.site.com/some-detail/Bizname/product was a no-no. There are no backlinks to these pages yet, so for the search engines I put in 301 redirects to the lowercase versions, thinking that after a few weeks Bing and Google would figure it out and no longer try to crawl them. FYI, there are thousands of these pages, and they are dynamically created.
Well, 2 months later Google is still crawling the uppercase URLs, even though it appears that only the lowercase ones are in the index (from when I do a site:www.site.com/some-detail search).
Bing is also crawling the uppercase URLs, although I'm not seeing any of the uppercase pages, and only a small percentage of the lowercase ones show up using a site:www.site.com/details... command.
Assuming there are no backlinks, will they eventually stop crawling those uppercase pages? If not, again assuming there are no backlinks, should I 410 the uppercase pages, or will that remove any credit I am getting for the pages having existed for over a year prior to the change from upper to lower case?
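For reference, since the pages are generated dynamically, the redirect is just a check at the top of the script -- roughly this (a simplified sketch; a real version would probably leave the query string alone):

<?php
// 301 any request whose path contains uppercase letters to the lowercase version.
$uri = $_SERVER['REQUEST_URI'];
$lower = strtolower($uri);
if ($uri !== $lower) {
    header('Location: ' . $lower, true, 301); // permanent redirect
    exit;
}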
-
Is it OK to have Search Engines Skip Ajax Content Execution?
I recently added some Ajax calls that automatically fill in small areas of my site when a page loads; the user doesn't have to click anything. Therefore, when Google and Bing crawl the site, the Ajax is executed too. However, my understanding is that this does not mean Google and Bing are also crawling the Ajax content.
I actually would prefer that the content not be executed OR crawled by them. In the case of Bing, I would prefer that the content not even be executed, because indications are that the program exits the Ajax page for Bing, since Bing isn't retaining the session variables that page uses. That makes me concerned that when this happens, Bing may not even be able to crawl the main content. So Ajax execution seems potentially risky for normal crawling in this case.
I would like to simply have my program skip the Ajax execution for Google and Bing by recognizing them in the user agent and using an 'if robot, skip Ajax' approach. I assume I could put the Ajax program in the robots.txt file, but that wouldn't keep Bing from executing it (and hitting the exit problem mentioned above). It would be simpler to just have them skip the Ajax execution altogether.
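Roughly what I have in mind (a sketch -- the user-agent patterns and script name are just illustrative):

<?php
// Only emit the ajax-loader script for visitors that don't look like known
// crawlers, so the bots never execute the ajax call at all.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isBot = (bool) preg_match('/googlebot|bingbot|msnbot|slurp/i', $ua);
if (!$isBot) {
    echo '<script src="/js/fill-in-content.js"></script>';
}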
Is that OK, or is there a chance the search engines will penalize my site if they find out (somehow) that I have different logic for them than for actual users? In the past this surely was not a concern, but I understand that Google is increasingly trying to behave like a browser, so it may increasingly have a problem with this approach.
Thoughts?
-
RE: Pages are Indexed but not Cached by Google. Why?
Thanks Travis,
I have discovered that for some users the initial page loads up to 3 times almost immediately, without it being visible. It never happens to me or on any of the browser/system combinations I use in remote-machine testing -- even if I match the setup of someone who is getting the problem -- but there is no question it is happening. This was triggering the robot message my site was giving. I don't yet know the cause, as the typical culprits don't apply. I relaxed the rule by 1 more load in 30 seconds, which is why you didn't get a message. Tomorrow I'm going to use the computer of someone who gets the problem, to try to narrow it down.
Agree on the human testing. Thanks for the suggestions.
take care
-
RE: Pages are Indexed but not Cached by Google. Why?
Hi Travis.
Thanks for the info re Screaming Frog.
I didn't whitelist your IP. I just changed the number in my files to something else, and it would already be unblocked anyway, since a day has passed.
The scenario you gave would be quite rare and wouldn't create a block, because a block requires everything happening in 30 seconds or less (and the session wouldn't expire in that time frame), or the same IP address also trying to crawl my site with a bot in the user agent (your scenario with Screaming Frog). But your experience and Max's are looking more and more commonplace, and I'm the fool who hasn't known that's what's happening, because I can't distinguish between you and a robot (which doesn't keep sessions).
All I need is to verify that I have a sessions problem. Here's what it takes:
1. use a desktop or laptop
2. remove all cookies related to qjamba
3. go to http://www.qjamba.com
4. choose an option (restaurants) and a location (Saint Louis) and click
5. Don't do anything else -- just close out the tab.
6. If your IP address changes, let me know when you do this so I can find it in the logs.
In all my usage it keeps the session between steps 3 and 4. It looks like for you and Max it doesn't do that, which means many of my users would be having the same terrible experience as you, for the 3 months the site has been live. It's a disaster. But I have to first verify that it really is a problem, and unfortunately I have to rely on strangers like you who are experiencing it to do that.
If you just do those steps, I promise we'll be done.
Thanks, Ted
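p.s. For anyone curious how I'll verify it, every page can log its session ID, so steps 3 and 4 should show the same ID if sessions are working. A minimal sketch of the check (the log path is just illustrative):

<?php
// Log timestamp, IP, session ID, and URL on every page load.
// If sessions persist, consecutive rows for one IP share a session ID.
session_start();
$line = sprintf("%s\t%s\t%s\t%s\n", date('c'), $_SERVER['REMOTE_ADDR'],
    session_id(), $_SERVER['REQUEST_URI']);
file_put_contents('/tmp/qjamba-sessions.log', $line, FILE_APPEND);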
-
RE: Pages are Indexed but not Cached by Google. Why?
Massimiliano, my guess at your path was the most logical conclusion, based on the fact that I have 3 records of the URLs you visited on my site, showing that the program didn't keep any session variables between the 3 URLs. You first went to Wildwood. Then you went to the home page. This implies that you either did that in a new tab, hit the back key, or modified the URL and removed the Wildwood part to go to the home page, as opposed to clicking on something on the page. Telling me I'm wrong at least lets me know I may have a serious problem to fix, but you are mistaken to think this is a robot problem. It is apparently a PHP session variable problem that none of my extensive (hundreds of hours of) testing has ever turned up.
This is a serious problem, unrelated to the OP and about 100 times more important than the OP, that I was hoping to get some help with, because it is very difficult to diagnose without feedback from those having the experience you had with my site. However, that's my problem and I'll have to deal with it. I don't know whether you just don't remember or aren't telling me because you think it is a robot problem, but if you do happen to recall the steps (or at least tell me it was all done in the same tab, or that you hit the back key), I'd appreciate whatever you can tell me. If I can't solve the problem, it probably means I'll have to shut down my website, which I've put more than 4 years of my life into. Seriously.
Thanks for your various other responses though. Take care. Ted
-
RE: Pages are Indexed but not Cached by Google. Why?
Massimiliano,
Can you tell me the steps that led to that error? It looks like you went directly to www.qjamba.com/local-coupons/wildwood/mo/all, then opened a separate tab and went to www.qjamba.com, and then either refreshed the home page or opened it again in another tab -- all within 30 seconds. That's the only way I have been able to reproduce this, because the block looks for 3 searches without any existing session within 30 seconds from the same IP address, the home page wipes out the session and cookies, and those are the URLs the db table shows you went to, in that order.
Normally a user stays in the same tab, so by the 2nd search they will have a session -- but your IP had no session with any of the searches. And normally you can't go to the home page from a location page. So I'm confused as to what you did, if it wasn't what I wrote above. If you didn't do this, then I'm worried about a serious programming problem having to do with the PHP sessions getting dropped between pages.
I've put a lot of time into this website, and a ton of testing of it too, and just went live a few months ago, so these kinds of problems are disheartening. Ironically, your experience is almost identical to Travis's, except that in your case you must have moved a little faster, since you got a different message. It would REALLY help me to get feedback from both of you, confirming what I wrote or setting me straight if you did something different.
-
RE: Pages are Indexed but not Cached by Google. Why?
Geez, I'm so pedantic sometimes. Just need to understand what this means:
<<Or, you make a new page. It gets crawled. Checking if it's indexed... no, no, no, no, yes?! That's how long it takes.>>
How do you do the bolded 'checking if it's indexed' part? site:www.site.com/thepage "my content change on the page" ?
And you did say that one can change and not the other, yet the page really has been indexed, right?
-
RE: Pages are Indexed but not Cached by Google. Why?
Thanks for sharing that. I was only kidding above, but obviously it's no joking matter when a user gets blocked like you were.
I just looked, and it blocks when something/someone clicks 3 times within 30 seconds. EDIT: but that's only if it isn't keeping the session between clicks -- see next post.
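The check itself is roughly this (a simplified sketch; table and column names, and the db credentials, are illustrative):

<?php
// Count sessionless page loads from this IP in the last 30 seconds;
// 3 or more triggers the 'suspicious activity' block. A visitor whose
// session persists between clicks skips the check entirely.
session_start();
if (empty($_SESSION['seen'])) {
    $pdo = new PDO('mysql:host=localhost;dbname=qjamba', 'user', 'pass'); // illustrative
    $pdo->prepare('INSERT INTO page_loads (ip, loaded_at) VALUES (?, NOW())')
        ->execute(array($_SERVER['REMOTE_ADDR']));
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM page_loads
                            WHERE ip = ? AND loaded_at > NOW() - INTERVAL 30 SECOND');
    $stmt->execute(array($_SERVER['REMOTE_ADDR']));
    if ($stmt->fetchColumn() >= 3) {
        exit('Suspicious activity detected.'); // the blocking message
    }
    $_SESSION['seen'] = true;
}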
-
RE: Pages are Indexed but not Cached by Google. Why?
I'm sorry, but once I know they have crawled a page, shouldn't there be a way to know when it has also been indexed? I know I can get them to crawl a page immediately, or nearly so, by fetching it. But I can't tell about the indexing. Are you saying that after they crawl the page, the time to indexing can vary by site, and there really is no way to know when it is in the new index? That is, if it shows as newly cached, that doesn't mean it has been indexed too; or it can be indexed and not show up in a site:www... search, etc.?
-
RE: Pages are Indexed but not Cached by Google. Why?
Massimiliano, thanks for your input. So you're one of them, huh? Good points. The last thing I want to do is annoy users, yet I also want to track 'real' usage, so there is a conflict. I know it is impossible to block everything I don't want, as there is always another trick to employ. I'll have to think about it more.
Yeah, the cut-and-paste blocking is annoying to anyone who would want to do it. But none of my users should want to. My content is in low demand, but I hate to make anything easier for potential competition, and some who might be interested won't know how to scrape. Anyway, thanks for your feedback on that too.
-
RE: Pages are Indexed but not Cached by Google. Why?
Well, I'm ready to test -- but still not quite sure how, since I don't know how to tell when Google has indexed the new content: sometimes it doesn't get cached, and sometimes it disappears from the site:www... listing. I've read that it only takes a couple of days after Google crawls the page, and I can go with that, but I was hoping there is a way to actually 'see' the evidence that it has been indexed.
So, while I've gotten some great input, I am somewhat unsatisfied, because I'm not sure how to tell WHEN my content has really been put in the index, with the algorithm updated for the newly crawled page.
-
RE: Pages are Indexed but not Cached by Google. Why?
I think there's been a misunderstanding. I'm not writing a bot. I am talking about making programming changes and then submitting the pages to Google via the fetch tool, to see how the changes affect my ranking as quickly as possible, instead of waiting for the next time Google crawls those pages -- which could be weeks. I think my earlier reply may have given you a different impression. I want to speed up the indexing by fetching the pages in Google and then looking to see what the effect is. My whole reason for starting this thread was confusion over how to tell when a page has been indexed, because of results (unexpected by me) with the cache and site:www... on Google.
-
RE: Pages are Indexed but not Cached by Google. Why?
Some of my pages are on Google's page 2 or 3, and a few on page 1, for certain search terms that don't have a lot of competition but that I know SOME people are using (they are in my logs) -- and those pages have virtually no backlinks. I want to boost the ones on page 2 or 3 to page 1 as quickly as possible, because p1 is 10x or more better than p2. Time/cost is an issue here: I can make changes overnight at no cost, as opposed to blogging or paying someone to blog.
Because domain authority and usage take so long, it seems worth tweaking/testing NOW to try to boost certain pages from p2 or p3 to page 1 virtually overnight, as opposed to waiting months on end for usage to kick in. I don't know why Google would penalize me for moving a menu or adding content -- basically for performing on-page SEO -- so it would be nice to be able to figure out which tools (cached pages, site:www, GWT, GA, or otherwise) to look at to know whether Google has re-indexed the new changes.
Of course, the biggest pages with the most common search terms probably HAVE to have plenty of backlinks and usage to get there, and I know that in the long run that's the way to success overall when there is high competition, but it just seems to me that on-page SEO is potentially very valuable when the competition is slimmer.
-
RE: When does Google index a fetched page?
For those following, see this link, where Ryan has provided some interesting answers regarding the cache and the site:www... command.
-
RE: Pages are Indexed but not Cached by Google. Why?
What a great answer, Ryan! Thanks. I'll tell you what my concern is. As a coupon site, I know that users don't want a bunch of wording at the beginning of pages. They just want to find the coupon and get it. But from what I've read, Google would probably reward the site more if there were beefier wording in it, instead of a bunch of listings that are closer in some ways to just a bunch of links, resembling a simple link page. I also have a 'mega-menu' on some of my pages, which I think is pretty user friendly, but I have read that Google might not know for sure whether it is part of the page content, and some forums I found talk about how rankings improved when menus were simplified. Lastly, I have a listing of location links at the top of the page, for users to 'drill down' closer to their neighborhood. This is just about the first thing Google sees, and it may again be confusing to Google as to what the page is all about.
So IF the lack of 'wording content' and the early placement of menu-type content are making my site hard to figure out from Google's perspective, I have alternatives, and I thought I could test them against Google ranking. For example, I can enter wording content early on, so as to 'beef up' the page so that it isn't just a bunch of coupon offer links. I also could Ajax the stuff that sits above the 'coupon content' so that Google doesn't read it and get confused, and then put the actual links for Google to read down at the bottom of the page. Both of those would be moves solely to satisfy Google, with no effect on the user. Google isn't perfect, and I don't want to be penalized on ranking as a result of not addressing Google's 'imperfections', as it seems every edge counts, and being on page 2 or 3 just isn't quite good enough. I view this as reasonable testing rather than devious manipulation, but of course what matters with Google ranking is what Google thinks.
So in these cases the user response will be neutral -- they generally won't care whether I have wording about what is on the page (especially if most of it requires clicking 'more info') or am Ajaxing the menu information -- they again just want to find coupons. But if Google cares, as I have read they do, then it would be nice to be able to verify that with some simple tests. It may be that my issues are somewhat unique as far as the typical webpage is concerned.
Having said all of that I do think your advice makes a ton of sense as the user is really what it is all about ultimately.
Thanks very much, and I'm giving you a 'good' answer as soon as I hear back!
-
RE: Pages are Indexed but not Cached by Google. Why?
Sorry about your being blocked. A day hadn't passed, due to the timing of your second visit -- sorry. I just changed the IP address in the table, so you aren't blocked now.
OK, I think I figured out what happened. You first went to the Ferguson page. You may have clicked on something, but the same page was reloaded. Then, in a different tab, you clicked through to my home page from a Google search results page. Then, in a third tab, you went directly to my home page. Then you ran Screaming Frog, and the program stopped it without a message, seeing the word 'spider' in the user agent. Then you tried it again, and it recognized that as a stopped bot and gave the message about suspicious activity.
The program wipes out sessions and cookies when a user goes to the home page (it's not even linked from anywhere), since that is just a location-choosing page, so when you opened it in a different tab the session was wiped out. It had nothing to do with you being in incognito or not having cookies allowed.
Does this sound like what you may have done, and in sequence?
That's what it looks like, and if correct, that is a huge relief for me, since it is not usual user activity. (Although I may have to reconsider whether it's still a poor approach.)
I don't know what happened with your second visit and the timeout. It's curious that you got some 60 pages crawled or so -- I don't suppose you have anything that would tell me the first 3 of those, so I can look into why it timed out? The table isn't keeping the IP on crawling, so I can only look those up by the URL crawled and the time.
-
RE: Pages are Indexed but not Cached by Google. Why?
Travis,
First of all, I absolutely appreciate all the time you are taking to address my issues here. Second, it IS very tempting to join you and any others here to go build houses or do something else, especially given the last few days. :)
Ok. I'll try to keep it short:
I wasn't thinking you had any bearing on my site going down, but maybe there was a 'Moz effect'. Hope not.
Re: Chrome Incognito Settings:
I'm really worried now that there is a sessions problem, since anyone with cookies allowed should have the session ID saved between pages -- in which case they would have only 1 entry in my 'user' table, and you had 3 in a short amount of time. That's why it thought you were a robot. I don't know how to duplicate the problem, though, because I've never had it personally, and I use a program that connects to other machines with all kinds of combinations of operating systems, browsers, and computers, and I have never had this problem with those. It's my problem; I'll have to figure it out somehow. I have many session variables, and it would be a huge overhaul to stop using sessions at this point. If you have any ideas (I'm using PHP), I'm all ears.
Re: Fun w/ Screaming Frog:
The IP for the instance 8.5 hours later was the same as your first one. Yet if you were spoofing, it shouldn't have said Screaming Frog in the user agent, right? It was in my 'bot-stopped' file as instantly stopped, because it was an unexpected bot. So I'm confused, unless perhaps you tried it separately from running with the spoof?
<<Normally Screaming Frog would display notifications, but in this instance the connection just timed out for requested URLs. It didn't appear to be a connectivity issue on my end, so... yeah...>>
OK.
Fun w/ Scraping and/or Spoofing:
I'll have to check into it. I've run YSlow and GTmetrix without problems. I see you tried to run it on the Ferguson page and the home page. I just ran the Ferguson page in GTmetrix -- which uses both PageSpeed (Google?) and YSlow -- and it ran OK, although not a great grade.
<<While I'm running off in an almost totally unrelated direction, I thought this was interesting. Apparently Bingbot can be cheeky at times.>>
That is interesting.
I'm worried now most about the session issue, as it may be affecting a lot of my users, and I've assumed multiple entries were from robots, which generally don't keep sessions between page crawls (actually, quite a few of the SEO crawlers do -- but Google, Bing, and Yahoo don't). If you are OK with going to my home page without incognito, clicking on a few pages, and letting me know the first part of your IP when you do it, it might really help me. You shouldn't be blocked anymore (the block lasts 1 day). But no worries if you're ready to move on.
Sorry, wasn't so short after all. Thanks again. Ted
-
RE: Pages are Indexed but not Cached by Google. Why?
Thanks Ryan. I am new to this, so I appreciate the cautions. Are you saying it is a bad idea to try to run tests on specific page changes (like reducing the size of a menu or adding more content in paragraph form, for example) to see how they affect ranking for given searches? I'm struggling with why that is a problem for Google. The SEO experts who run tests to get a sense of which factors are important -- isn't that exactly what they are doing in a lot of cases? I plan on using the Moz tools and have much to learn, but it just seems so logical to test things out the way I want to do.
The GWT index status page shows an increasing number of pages, but no detail on which ones have been indexed. That's why I was looking into the site: command and the cache: command in the first place. It seems those exist for a reason, and I thought it was to help us with specific pages. If that isn't it, why do they exist? Other than showing the number of indexed pages, I don't see much value in the index status page. Perhaps there is a good tutorial somewhere that helps people use GWT not just to find crawl errors or see general trends, but to actually implement SEO page changes to improve their site?
I hope this doesn't sound argumentative. I just don't quite understand your perspective on it.
-
RE: Pages are Indexed but not Cached by Google. Why?
Thanks. I will give you a 'good answer' soon, but I don't want to close this out yet, since there are some other issues going on with the other response. In addition, I'm trying to figure out the practical way to use all this info.
Here's what I was trying to do: make a program change, fetch with Google, and see in a couple of days whether the site has risen in the rankings. It was complicated (for me) by the fetched page being absent from the cache (even the one you found is not a recent cache), and by the site:www... results sometimes not appearing to have changed at all, even when there was a new cache. How is a person to test the immediate effect of a change on ranking if they can't tell what Google is doing?
I haven't found in GWT a list of my indexed pages, or the dates they were indexed, and certainly not the content that has been indexed, yet your answer above seems to indicate there is more under the hood than what I have found there. Is there really a way in GWT to see whether a specific page has been indexed (my site has thousands, so I would need to search for it), and when it was last indexed? I can tell from my db tables which pages were crawled and when, but not whether Google has added the new info into its indexing algorithm.
-
RE: Pages are Indexed but not Cached by Google. Why?
Thanks for your reply Ryan..
Well, that was easy. How come some pages have that code and others don't? It seems like a backward way to find the cached version of your own pages, too.
Thanks much.
Now, for a twist: I checked just now for site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all
and it is no longer there. It was there yesterday. Why would it suddenly disappear? I again checked Webmaster Tools, not finding any crawling errors or problems for it. So the question suddenly gets reversed: how can I now have a cached version without a site:www... listing?
-
RE: Pages are Indexed but not Cached by Google. Why?
Glad you didn't get frustrated. OK, let's see; there are a number of issues:
Screaming Frog: I see the entry, and we know why it was stopped yesterday (robots.txt).
Your activity yesterday: I see 3 entries for you yesterday. I'm very concerned I might have a sessions problem, because if you had cookies enabled, my tables should have shown only one entry for you, with several searches. If you didn't have cookies enabled, you should have gotten a message saying the site requires cookies. Did you get that message for each page you went to last night? If you didn't get it, I may have a serious issue to address -- it would really help in this case to know which items you have checked in Chrome's cookies settings. I used the site this morning, incognito, without blocking cookies, and my table showed just one entry and no blocking, so it would really help me to know what your cookie settings are, as I may have to program differently.
On your second visit last night:
First, did you run Screaming Frog on that second visit? Someone did, about 8 1/2 hours after your first visit.
Re the scraping with your Googlebot spoof: my program should never block the IP for Google, so are you sure it got blocked -- did it give you a message? If so, I need to figure that one out too, and it would help to have the IP you grabbed, if possible. As for the CSS and JavaScript, I don't know enough about that -- I don't think I have any hyperlinks in the JavaScript for it to find. I have both internal and external CSS and JS.
RE the robots file and Googlebot: it's crawling OK and was verified in Webmaster Tools, and the format I use is based on what I've read. While you are right that bad bots ignore robots.txt, I want to keep a lot of the neutral or even 'good' bots out, so that's why I do it that way. I have htaccess blocking too, although it's never a perfect thing.
RE the Ferguson page OP issue: while I haven't formally submitted the Ferguson page (how'd you like that for a city choice?) to Google via a sitemap (that's a long story), Google has been crawling that page since at least November 15, 2014. I don't know whether a cache existed at that time, as caching is something I have only learned about in the last few days, while trying to figure out whether Google indexed some changes I put in before fetching just a few days ago.
Re the noarchive question: I'm not sure what a noarchive directive is, so I'd have to say probably not on that one.
Thanks for whatever more help you can provide. I really hope to solve that session issue if it exists (it has been a concern for several months, but I wasn't sure how to distinguish between a robot, which doesn't keep sessions, and a real user -- like you!), and of course the actual OP issue too!
p.s. My server/site was down this morning, and my host didn't know why. Is it a bad idea to put one's website URLs on the Moz forum like I have (i.e., do bad bots crawl the Moz site looking for URLs to torture)?
-
RE: Pages are Indexed but not Cached by Google. Why?
Hi Travis,
Thank you for your reply, and I'm sorry for the frustration. I looked at my db tables to try to figure out which one was you and what actions you took, so I could diagnose why it gave you that message. Sounds like I should relax the restrictions on user actions.
I know it would be tempting to ignore me at this point, but it really could help me if I can find your activity in the database records the program keeps, so I can come up with a better solution. Could you give me part of your IP address so I can try to find your activity in my db table records and solve the problem that gave you that irritating message? I would really appreciate it, as there actually may be an underlying problem with how the site is storing sessions, which I've had a real hard time pinpointing, because it is very hard to distinguish a real user the program is having a sessions problem with from a returning robot that never keeps session cookies.
As for the robots.txt file, I purposely block all but a handful of crawlers, to avoid getting bombarded by crawlers that can slow down my site and make it difficult to gauge real users. This is nothing new for my site, and Google crawls my site every day. So I think perhaps you tried using a tool that it blocked, and I'm sorry for any confusion that caused. Which tool did you use?
I'm somewhat new to robot activity and how to deal with the pesky guys without causing user problems, so I am open to suggestions.
While I don't think the problems you had relate directly to the OP, I can certainly see how they would affect anyone trying to address the OP with a tool-oriented or hands-on interaction with my site -- again, sorry for the frustration. Thanks for your feedback.
-
Pages are Indexed but not Cached by Google. Why?
Here's an example:
I get a 404 error for this:
But a search for qjamba restaurant coupons gives a clear result, as does this:
site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all
What is going on? How can this page be indexed but not in the Google cache?
I should make clear that the page is not showing any kind of error in Webmaster Tools, and Google has been crawling pages just fine. This particular page was fetched by Google yesterday with no problems, and was even crawled twice more today by Google. Yet, no cache.
-
RE: Bingpreview/1.0b Useragent Adding Trailing Slash to All URLs
Will do. I forgot to mention that Bing is checking into it. But for the reasons you mentioned, I am still going to do the 301s. Thanks again.
-
RE: When does Google index a fetched page?
I'm going to post a question about the non-cached pages, as upon digging I'm not finding an answer.
And I'm reading that it seems to take a couple of days before indexing, but I'm seeing something strange that makes it confusing:
This page was cached a few days ago: http://webcache.googleusercontent.com/search?q=cache:http://www.qjamba.com/restaurants-coupons/wildwood/mo/all
The paragraph of wording content that starts with 'The Wildwood coupons page' was added as a test just 3 days ago, and then I ran a fetch. When I do a Google search for phrases in it, it does show up in Google results (like qjamba wildwood buried by the large national chains). So it looks like it indexed the new content.
But if you search for wildwood qjamba restaurants cafes, the result Google shows includes the word diners, which is gone from the cached content (it was previously in the meta description tag)! And if you then search wildwood qjamba restaurants diners, it doesn't come up! So this seems to indicate that the algorithm was applied to the cached file, but that what Google DISPLAYS when the user does a search is still older content that isn't even in the new cached file. Very odd.
I was thinking I could put changes on pages and test the effect on search results 1 or 2 days after fetching, but maybe it isn't that simple. Or maybe it is, and it's just hard to tell because of the timing of what Google is displaying.
I appreciate your feedback. I have H2 first on some pages because H1 was pretty big. I thought I read once that the main thing isn't whether you start with H1 or H2, but that you never want to put an H1 after an H2.
I'm blocking cut and paste just to make it harder for a copycat to pull the info. Maybe overkill, though.
Thanks again, Ted
-
RE: Bingpreview/1.0b Useragent Adding Trailing Slash to All URLs
Thanks for your reply, Cyrus. Wow, so much to learn.
I will put in logic via a mod_rewrite to basically remove the trailing slash and redirect to the resulting URL, because otherwise all the trailing-slash URLs will be different pages, showing basically a 'no-product' business and the like.
These are all dynamically generated pages, so I think as long as I resolve to the 'proper' no-slash version, I won't need to worry about anything else, like a rel=canonical tag, because there won't be any identical content.
Does that sound right to you?
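Something like this in .htaccess is what I have in mind (an untested sketch; it skips real directories and 301s everything else to the no-slash version):

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ /$1 [R=301,L]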
-
RE: When does Google index a fetched page?
Thanks.. That does help..
<<If you have a 404 for the cache: command, that page is not indexed; if searching the content of the page using site: you find a different page, it means the other page is indexed (and one possible explanation is a duplicate issue).>>
THIS page gives a 404:
but site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all
gives ONLY that exact same page. How can that be?
-
RE: When does Google index a fetched page?
Thanks, Massimiliano. I'll give you a 'good answer' here, and cross my fingers that this next round will work. I still don't understand the timing on site:www, nor what page+features is all about. I thought site:www was supposed to be the method people use to see what is currently indexed.
-
RE: When does Google index a fetched page?
I have a bigger problem than I realized:
I accidentally put duplicate content on my subcategory pages that was meant only for category pages. It's about 100-150 pages, and many of them have been crawled in the last few days. I have already changed the program so those pages don't have that content. Will I get penalized by Google -- de-indexed? Or should I be OK going forward, because the next time they crawl, it will be gone?
I'm going to start over with the fetching, since I made that mistake, but can you address the following, just so when I get back to this spot I'll maybe understand better?
1. When I type into the Google search bar lemay mo restaurant coupons smoothies qjamba
the description it gives is www.qjamba.com/restaurants-coupons/lemay/mo/smoothies -- "The Lemay coupons page features both national franchise printable restaurant coupons for companies such as KFC, Long John Silver's, and O'Charlies and ..."
BUT when I do a site:www.qjamba.com/restaurants-coupons/lemay/mo/smoothies it gives the description found in the meta description tag: www.qjamba.com/restaurants-coupons/.../smoothie... -- "Find Lemay all-free printable and mobile coupons for Smoothies, and more."
It looks like site:www does NOT always give the most recently indexed content, since 'The Lemay coupons page...' is the content I added 2 days ago for testing! Maybe that's because Lemay was one of the URLs I inadvertently created duplicate content for.
2. Are ANY of the cache: command, the page+features search, or site:www supposed to show the most recently indexed content?
-
RE: When does Google index a fetched page?
thanks.
That's weird, because doing the site: command separately for that first page gives different content for /smoothies than for /all:
site:www.qjamba.com/restaurants-coupons/lemay/mo/smoothies
site:www.qjamba.com/restaurants-coupons/lemay/mo/all
But why would that 'page+features' search show the same description when the descriptions are actually different? This seems like a different issue from my OP, but maybe it is related somehow -- and even if not, I probably should still understand it.
-
RE: When does Google index a fetched page?
You are missing a w there: it should be site:www and you have site:ww.
That's why I'm so confused -- the pages appear to be indexed from the past, they are in my db table with the date and time crawled (right after the fetch), and there is no manual penalty in Webmaster Tools.
Yet there is no sign it re-indexed after crawling 2 days ago. I could resubmit (there are 15 pages I fetched), but I'm not expecting a different response, and I need to understand what is happening in order to use this approach to test SEO changes.
thanks for sticking with this. Any more ideas on what is happening?
-
RE: When does Google index a fetched page?
Hi, thanks again.
this gives an error:
but the page exists, AND site:www.qjamba.com/restaurants-coupons/lemay/mo/all
has a result, so I'm not sure what a missing cache means in this case.
The log shows that it was crawled right after it was fetched, but the result for site:... doesn't reflect the changes on the page. So it appears not to have been re-indexed yet -- but then why isn't it in the cache?
-
RE: When does Google index a fetched page?
HI Massimiliano,
Thanks for your reply.
I'm getting an error in both FF and Chrome with this in the address bar. Have I misunderstood?
http://webcache.googleusercontent.com/search?q=cache:http://www.mysite.com/mypage
Is the command (assuming I can get it to work) supposed to show when the page was indexed, or last crawled?
I am storing when Google crawls, but I am wondering about the 'couple of days' part, since it has been 2 days now, and when I first did this a few days ago it was re-indexing within 5 minutes.
-
When does Google index a fetched page?
I have seen it index one of my pages within 5 minutes of fetching, but I have also read that it can take a day. I'm on day #2, and it appears that it has still not re-indexed the 15 pages I fetched. I changed the meta description in all of them, and added content to nearly all of them, but none of those changes show when I do a site:www.site/page search.
I'm trying to test changes in this manner, so it is important for me to know WHEN a fetched page has been indexed, or at least IF it has. How can I tell what is going on?
-
Bingpreview/1.0b Useragent Adding Trailing Slash to All URLs
The Bingpreview crawler, which I think exists to take snapshots of mobile-friendly pages, crawled my pages last night for the first time. However, it is adding a trailing slash to the end of each of my dynamic page URLs. The result is that my program serves the wrong page -- it is not expecting a trailing slash at the end of the URLs. It was 160 pages this time, but I have thousands of pages it could do this to.
I could try doing a mod_rewrite, but that seems like it should be unnecessary. ALL the other crawlers are crawling the proper URLs. None of my hyperlinks have the slash on the end. I have written to Bing to tell them about the problem.
Is anyone else having this issue? Any other suggestions for what to do?
The user agent is: Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 BingPreview/1.0b
-
RE: Why won't rogerbot crawl my page?
Thanks. The robots.txt file was the problem. It originally (yesterday) excluded rogerbot (by default), and then I remembered that and added it as rogerbot, but that didn't work. So I changed it to RogerBot, and that didn't work either. Today I removed the robots.txt file completely, and it worked. Then I put the file back with rogerbot, and it is working.
It APPEARS that maybe it read the robots.txt yesterday, before I put in rogerbot, and for some reason didn't re-read it after I put it in. I'll never know, but it is now working.
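For anyone who hits the same thing, the relevant records look roughly like this (simplified -- the wildcard block at the end is how I keep the other crawlers out):

User-agent: rogerbot
Disallow:

User-agent: *
Disallow: /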
Thanks for the help!
-
RE: Why won't rogerbot crawl my page?
Hi, sure, thanks. This page shouldn't have a speed issue, but maybe you can see what the problem is:
www.qjamba.com/local-coupons/wentzville/mo/all
Thanks.
-
Why won't rogerbot crawl my page?
How can I find out why rogerbot won't crawl an individual page I give it to crawl for page-grader? Google, Bing, and Yahoo all crawl my pages just fine, but I put one of the internal pages into page-grader to check for keywords, and it gave me an F -- it isn't crawling the page, because the keyword IS in the title and it says it isn't. How do I diagnose the problem?
-
RE: Attack of the dummy urls -- what to do?
Thanks Ray. Appreciate the advice!
-
RE: Attack of the dummy urls -- what to do?
Hi Ray-pp,
Thanks for your answer. I'm not getting anything significant, but occasionally a bot will come with extra stuff added to the parameter names, so it got me thinking that a malicious program or nasty competitor might want to do that to cause havoc. My understanding is that 404s don't hurt SEO ranking with Google, but the way things are set up now, no one would get a 404, and in fact Google would index the 'bad' pages, so maybe I need to do something proactive to 404 or 301 such pages so they never get put into an index at all.
Since my site has lots of dynamically generated pages, I've had my share of surprises, and I'm just trying to avoid any new ones!
-
Attack of the dummy urls -- what to do?
It occurs to me that a malicious program could set up thousands of links to dummy pages on a website:
www.mysite.com/dynamicpage/dummy123
www.mysite.com/dynamicpage/dummy456
etc..
How is this normally handled? Does a developer have to check all the parameters to see if they are valid and, if not, automatically return a 301 redirect or a 404 Not Found? This requires a table lookup of acceptable URL parameters for every new visitor.
I was thinking that bad URL names would be rare, so it would be OK to just stop the program with a message -- until I realized someone could intentionally set up links to nonexistent pages on a site.
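For what it's worth, the lookup I'm picturing is just a slug check before rendering anything, along these lines (a sketch; table, column, parameter names, and credentials are illustrative):

<?php
// Validate the dynamic part of the URL against known slugs;
// anything unknown gets a 404 instead of a generated page.
$slug = isset($_GET['page']) ? $_GET['page'] : '';
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass'); // illustrative
$stmt = $pdo->prepare('SELECT COUNT(*) FROM pages WHERE slug = ?');
$stmt->execute(array($slug));
if (!$stmt->fetchColumn()) {
    header('HTTP/1.1 404 Not Found');
    exit('Page not found.');
}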
-
RE: Is it better to find a page without the desired content, or not find the page?
Great answer Monica -- thank you!
-
RE: Do 404s really 'lose' link juice?
Matt, thanks. Good points for sure. My concern is that since something like 50% of new businesses close their doors within 5 years, the list of redirected URLs will just keep getting bigger over time. Is that a concern? I guess over time fewer people will link to the defunct businesses, but I will still have to track them. Maybe at some point, when the number of links to them is small, it would make sense to 404 them? Of course, I'd still need to track which ones to 404, so I'm now wondering when a 404 ever makes sense on previously legitimate pages.
Just to be clear -- redirecting does remove the old URL from the index, right?
-
Do 404s really 'lose' link juice?
It doesn't make sense to me that a 404 causes a loss of link juice, although that is what I've read. What if you have a page that is legitimate -- think of a merchant-oriented page where you sell an item for a given merchant -- and then the merchant closes its doors? It makes little sense 5 years later to still have their merchant page, so why would removing it from your site in any way hurt your site? I could redirect forever, but that makes little sense. What makes sense to me is keeping the page for a while, with an explanation and options for 'similar' products, and then eventually putting in a 404. I would think the eventual dropping out of the index actually REDUCES the overall link juice (i.e., fewer pages), so there is no harm in using a 404 in this way. It is also a way to keep the site from just getting bigger and bigger, with more and more 'bad' user experiences over time.
Am I looking at it wrong?
P.S. I've included this in 'link building' because it is related in a sense -- link 'paring'.
-
Is it better to find a page without the desired content, or not find the page?
Are there any studies that show which is best? If visitors find my page but not the specific thing they want on it, they may still find something of value. But if they don't, they may associate my site with poor results, which can be worse than their finding what they want at a competitor's site. IOW, maybe it is best to have pages that ONLY and ALWAYS have the desired content.
What do the studies suggest?
I'm asking because I have content that exists maybe 1/3 of the time and doesn't the other 2/3... think 'out of stock' products. So I'm wondering whether I should look into removing the page from being indexed during the 2/3, or should keep it. If I remove it, my concern is whether I lose the history/age factor that I've read Google finds important for credibility. Your thoughts?
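If I do go the removal route, one option I'm considering is a conditional noindex rather than deleting the page -- a sketch (the $inStock flag stands in for whatever my program already knows about the listing):

<?php
// Keep the page live but ask engines not to index it while the content
// is unavailable; the tag goes away when stock returns.
$inStock = false; // illustrative; set from the product data
if (!$inStock) {
    echo '<meta name="robots" content="noindex">' . "\n";
}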