Posts made by Tom3_15
-
RE: 404's being re-indexed
Just to follow up - I have now actually 410'd the pages, and the 410s are still being re-indexed.
-
RE: 404's being re-indexed
I'll check this one out as well, thanks! I used a header response extension called Web Developer, which reveals the presence of X-Robots-Tag headers.
-
RE: 404's being re-indexed
Thank you for the quick response,
The pages are truly removed. However, because there were so many of these types of pages that leaked into the index, I added a redirect to keep users on our site - no intention of being "shady", I just didn't want hundreds of 404s getting clicked and causing a very high bounce rate.
For the x-robots header, could you offer some insight into why my directive isn't working? I believe it's a regex issue with the wp-content match. I have tried to troubleshoot to no avail.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
I appreciate the help!
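As a hedged aside, one likely culprit: FilesMatch tests only the file name, never the directory portion of the path, so "(wp-content)" will not match URLs under /wp-content/. A minimal sketch of a path-based alternative, assuming Apache 2.4 expression support:
# FilesMatch matches file names only, not paths; match the request URI instead (Apache 2.4+).
<If "%{REQUEST_URI} =~ m#/wp-content/#">
Header set X-Robots-Tag "noindex, nofollow"
</If>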
-
RE: 404's being re-indexed
Thank you! I am in the process of doing so; however, with a 410 I cannot leave my JS redirect in place after the page loads, which creates some UX issues. Do you have any suggestions to remedy this?
Additionally, after the 410 the non-x-robots noindex is now being stripped, so the page only resolves to a 410 with no noindex or redirect. I am still working on a noindex header; as the 410 is served server-side, I assume a header would be the only way, correct?
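A minimal sketch of such a header, assuming Apache's mod_headers: note that plain "Header set" only applies to successful (2xx) responses, so the "always" condition is needed for the header to survive on a 410.
# "always" applies the header to error responses (e.g. 410) as well;
# plain "Header set" fires only on 2xx responses.
<If "%{REQUEST_URI} =~ m#/wp-content/#">
Header always set X-Robots-Tag "noindex"
</If>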
-
RE: 404's being re-indexed
Yes, all pages have a noindex. I have also tried to noindex them using htaccess to add an extra layer of security, but it seems to be incorrect. I believe it is an issue with the regex; I am attempting to match anything containing wp-content.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
-
404's being re-indexed
Hi All,
We are experiencing issues with pages that have been 404'd being indexed. Originally, these were /wp-content/ index pages that were included in Google's index. Once I realized this, I added a directive to our htaccess to 404 all of these pages, as there were hundreds. I tried to let Google crawl and remove these pages naturally, but after a few months I used the URL removal tool to remove them manually.
However, Google seems to be continually re-indexing these pages, even after they have been manually requested for removal in Search Console. Do you have suggestions? They all respond with 404s.
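For reference, a minimal sketch of such an htaccess rule, assuming mod_alias; the actual directive wasn't quoted, so the path here is a hypothetical stand-in:
# Hypothetical reconstruction: return 404 for everything under the offending path
# (the real rule was presumably scoped more narrowly).
RedirectMatch 404 ^/wp-content/.*$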
Thanks
-
International SEO And Duplicate Content Within The Same Language
Hello,
Currently, we have a .com English website serving an international clientele. As it stands, we do not target any countries in Google Search Console. However, the UK is an important market for us, and we are seeing very low traffic (almost entirely US). We would like to increase visibility in the UK, but currently for English speakers only. My question is this - would geo-targeting a subfolder have a positive impact on visibility/rankings, or would it create a duplicate content issue if both pieces of content are in English? My plan was:
1. Create a geo-targeted subfolder (website.com/uk/) that copies our website (we currently cannot create new unique content)
2. Go into GSC and geo-target the folder to the UK
3. Add hreflang annotations to the /uk/ pages to try to negate duplicate issues (sketched below). Additionally, I can add a rel=canonical tag if suggested; I just worry that, as an already international site, this will create competition between pages.
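A hedged guess at the annotations meant in step 3 - hreflang alternates, with each page listing itself and the other version, using the hypothetical URLs from the plan above:
<link rel="alternate" hreflang="en-us" href="https://website.com/" />
<link rel="alternate" hreflang="en-gb" href="https://website.com/uk/" />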
However, as we are currently only targeting a location and not the language at this very specific point, would adding a ccTLD be advised instead? The threat of duplicate content worries me less here as this is a topic Matt Cutts has addressed and said is not an issue.
I prefer the subfolder method to ccTLDs because it allows for more scalability; in the future I would like to target other countries and languages.
Ultimately right now, the goal is to increase UK traffic. Outside of UK backlinks, would any of the above URL geo-targeting help drive traffic?
Thanks
-
If we should add a .eu or remain .com solely
Hello,
Our company is international and we are looking to gain more traffic specifically from Europe. While I am aware that translating content into local languages, targeting local keywords, and gaining more European links will improve rankings, I am curious if it is worthwhile to have a company.eu domain in addition to our company.com domain.
Assuming the website's content and domain will be exactly the same, with the TLD (.eu vs .com) being the only change - will this benefit us, or will it hurt us by creating duplicate content, even if we create a separate GSC property for it with localized targeting and hreflang tags? Also - if we have multiple languages on our .eu website, can different paths have differing hreflangs?
I.e., company.eu/blog/german-content with a German hreflang and company.eu/blog/Italian-content with an Italian hreflang.
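A minimal sketch of what that per-path markup could look like, assuming standard hreflang annotations and the hypothetical URLs above; each page would carry the full set of alternates, itself included:
<link rel="alternate" hreflang="de" href="https://company.eu/blog/german-content" />
<link rel="alternate" hreflang="it" href="https://company.eu/blog/Italian-content" />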
I should note - we do not currently have an hreflang attribute set on our website, as content has always been correctly served to US-based English-speaking users - we do have the United States targeted in Google Search Console, though.
It would be ideal to target countries by subfolder instead, if that is just as useful. Otherwise, we would essentially be maintaining two sites.
Thanks!
-
RE: Dynamically Inserting Noindex With Javascript
It seemed to work. Hopefully the noindex is respected, thank you!
-
RE: Dynamically Inserting Noindex With Javascript
It looks like it is active. Thanks, John! Can you no-index an entire directory in GSC? I thought it was only per URL.
-
Dynamically Inserting Noindex With Javascript
Hello,
I have a broken plugin creating hundreds of wp-content directory pages that are being indexed by Google. I cannot access the source code of these pages to add a noindex to them. The page URLs all have the plugin name within them. To resolve the issue, I wrote a JavaScript solution that dynamically adds a noindex tag to any URL containing the plugin name. Would this noindex be respected by Google, and is there a way to immediately check that it is respected?
Currently, I cannot delete the plugin due to issues with its PHP.
If you would like to view the code: https://codepen.io/trodrick/pen/Gwwaej?editors=0010
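For context, a minimal sketch of the approach described (not necessarily the linked CodePen verbatim); the plugin slug here is a placeholder:
// If the current path contains the plugin's slug, inject a robots noindex meta tag.
(function () {
  var pluginSlug = 'broken-plugin'; // placeholder for the real plugin name
  if (window.location.pathname.indexOf(pluginSlug) !== -1) {
    var meta = document.createElement('meta');
    meta.name = 'robots';
    meta.content = 'noindex';
    document.head.appendChild(meta);
  }
})();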
Thanks!
-
RE: Google is indexing bad URLS
I do agree; I may have to pass this off to someone with more backend experience than myself. In terms of plugins, are you aware of any that will allow you to add noindex tags to an entire folder?
Thanks!
-
RE: Google is indexing bad URLS
Thank you for all your help. I added a directive to 410 the pages in my htaccess like so: Redirect 410 /revslider*/. However, it does not seem to work.
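As a hedged aside on why that may fail: mod_alias's Redirect takes a literal URL-path prefix and does not expand * wildcards, so "/revslider*/" never matches anything. RedirectMatch accepts a regex instead; a sketch using the path structure from this thread:
# Redirect matches literal prefixes only; RedirectMatch takes a regex.
RedirectMatch 410 ^/wp-content/uploads/revslider/.*$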
Currently, I am using Options All -Indexes to 404 the URLs. Although Google would not revisit a 410, I still remain worried: could it still initially index one? This seems to be the case with my 404 pages - Google is actively indexing the new 404 pages that the broken plugin is producing.
As I cannot seem to locate the directory in cPanel, adding a noindex to them has been tough. I will look for a plugin that can dynamically add one based on folder structure, because the URLs are still actively being created.
The ongoing creation of the URLs is the ultimate source of the issue; I expected that deleting the plugin would have resolved it, but that does not seem to be the case.
-
RE: Google is indexing bad URLS
Thank you for your response! I will certainly use the regex in my robots.txt and try to change my htaccess directive to 410 the pages.
However, the issue is that a defunct plugin is randomly creating hundreds of these URLs without my knowledge, and I cannot seem to access them. As this is the case, I can't add a noindex tag to them.
This is why I manually de-indexed each page using the GSC removal tool and then blocked them in my robots.txt. My hope was that after doing so, Google would no longer be able to find the bad URLs.
Despite this, Google is still actively crawling and indexing new URLs following this path, even though they are blocked by my robots.txt (validated). I am unsure how these URLs even continue to be created, as I deleted the plugin.
I had the idea to write a JavaScript program that would check the status code and insert a noindex tag if the header returned a 404, but I don't believe this would even be recognized by Google, as it would be inserted dynamically. Ultimately, I would like to find a way to get the plugin to stop creating these URLs; that way I can simply manually de-index them again.
Thanks,
-
Google is indexing bad URLS
Hi All,
The site I am working on is built on WordPress. The plugin Revolution Slider was downloaded; while no longer utilized, it remained on the site for some time. This plugin began creating hundreds of URLs containing nothing but code on the page. I noticed these URLs were being indexed by Google. The URLs follow the structure: www.mysite.com/wp-content/uploads/revslider/templates/this-part-changes/
I have done the following to prevent these URLs from being created & indexed:
1. Added a directive in my Htaccess to 404 all of these URLs
2. Blocked /wp-content/uploads/revslider/ in my robots.txt
3. Manually de-indexed each URL using the GSC removal tool
4. Deleted the plugin
However, new URLs still appear in Google's index, despite being blocked by robots.txt and resolving to a 404. Can anyone suggest any next steps?
Thanks!
-
Question Regarding Website Architecture
Hello All,
Our website currently has a general solutions subdirectory, which then links to each specific solution, following the path /solutions/ => /solutions/solution1/. As our solutions can be quite complex, we are adding another subdirectory to target individuals by profession. I would like to link from our profession pages to the varying solutions that help.
As both subdirectories will be top-level pages in the main menu, would linking from our professions to **solutions** be poor architecture? In this case the path would look like: /professions/ => /professions/profession1/ => /solutions/solution1/.
Thanks!
-
RE: How to allow bots to crawl all but WP-content
Thank you for the help, Gaston!
-
RE: How to allow bots to crawl all but WP-content
Can I do so with:
Allow: *.jpg
Allow: *.png
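A hedged sketch of how that is usually written for Google's robots.txt matcher - paths start with a slash, * matches any string, $ anchors the end, and the longer Allow rule wins over the shorter Disallow:
User-agent: *
Disallow: /wp-content/
Allow: /wp-content/uploads/*.jpg$
Allow: /wp-content/uploads/*.png$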
-
RE: How to allow bots to crawl all but WP-content
Thanks, Gaston. I should have been clearer about what I am looking to do. I am currently having an indexation issue. Somehow, pages are being automatically generated by WordPress.
These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.
Best
-
RE: How to allow bots to crawl all but WP-content
Gaston,
Thanks for the fast reply! My images folder does follow that format, which is what worries me, as we are blocking the wp-content folder.
Thanks!
-
RE: How to allow bots to crawl all but WP-content
Hi Gaston,
I just wanted to follow up with you with one last question if possible. Would this still allow my images and PDFs to be crawled and indexed?
Thanks!
-
RE: How to allow bots to crawl all but WP-content
Thank you for the response. I'm still a little uncertain: does the version you wrote allow bots to crawl the CSS and JS as well?
Best
-
How to allow bots to crawl all but WP-content
Hello,
I would like my website to remain crawlable to bots, but to block my wp-content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/

User-agent: GoogleBot
Allow: /

User-agent: GoogleBot-Mobile
Allow: /

User-agent: GoogleBot-Image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /
-
RE: Sudden Indexation of "Index of /wp-content/uploads/"
Using htaccess, I 404'd all the pages with "Options All -Indexes". Will this resolve the issue?
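A hedged note on that directive: it disables mod_autoindex, which is what generates the "Index of /..." listing pages, and a directory request without an index file then typically returns 403 Forbidden rather than 404, unless an ErrorDocument remaps it.
# Turns off auto-generated "Index of /..." directory listings (mod_autoindex).
# Directory URLs without an index file then usually return 403, not 404.
Options All -Indexes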
-
RE: Sudden Indexation of "Index of /wp-content/uploads/"
We use WordPress as our CMS and do indeed use Yoast. We have never had an issue with /wp-content/ being indexed before, and I have been very conscientious about keeping our index clean.
What confuses me is that this is an index of our wp-content, similar to a sitemap. I have not blocked it in robots.txt, as I do not know what is generating the index.
Thanks!
-
Sudden Indexation of "Index of /wp-content/uploads/"
Hi all,
I have suddenly noticed a massive jump in indexed pages. After performing a "site:" search, it was revealed that the sudden jump was due to the indexation of many pages with the SERP title "Index of /wp-content/uploads/" for many uploaded pieces of content and plugins.
This appeared approximately one month after switching to https. I have also noticed a decline in Bing rankings. Does anyone know what is causing this and how to fix it? To be clear, these pages are **not** normal /wp-content/uploads/ pages, but rather "Index of" pages being included in Google.
Thank you.
-
RE: General SSL Questions After Move
Thanks for the great responses, Donna and Trenton. I just had one follow-up question - I am still seeing small amounts of Search Console data on my http property. While I have read that this is not uncommon, it is mildly concerning, as I force https server-side. Should I be alarmed by this?
-
RE: Pages Competing With One Another
Sounds good, I'll give it a shot. Thank you guys!
-
RE: Pages Competing With One Another
Thanks for the input, Nicholas. This is what I was thinking; however, it seems that the blog post has been ranking for the last four days, and my solutions page isn't ranking at all for the keyword. Usually, the blog post would rank 1-2 days a week while the product page would rank the rest.
Would you still suggest de-optimizing the blog? Ranking for the keyword has been a months-long initiative, and I don't want to ruin my efforts.
Or should I wait and see if the product page begins ranking instead of the post again before de-optimizing the post?
-
RE: Pages Competing With One Another
Thanks for the response. I do have the keyword as the anchor text linking from my blog post to my product page. I don't know why, when one ranks, the other does not, rather than both ranking alongside each other.
Would de-optimizing my blog post allow for my product page to rank all the time - or will it cause a lack of coverage when the blog post would otherwise rank?
-
Pages Competing With One Another
Hello,
We are ranking for an acronym, which I understand can lead to fickle rankings. However, we have two pages ranking on pages one and two for the same keyword, but they do so in spite of each other.
By this I mean that one page will rank while the other is nowhere to be found. It seems that one page (a blog post) is more likely to rank on the weekends, while the product page is more likely to rank on the weekdays.
I would like the product page to rank all the time, and to target another keyword with the blog post. Would removing the keyword from the blog post allow the product page to rank all the time - or would it lead to no pages ranking during times when the blog post would otherwise be ranking?
I should note the blog post has more external links and is not exactly optimized for the keyword, while the product page has more internal links and is optimized for the keyword.
-
RE: General SSL Questions After Move
Awesome, thank you for answering everything!
-
General SSL Questions After Move
Hello,
We have moved our site to https, and Google Analytics seems to be tracking correctly. However, I have seen some conflicting information: should I create a new view in Analytics?
Additionally, should I also create a new https property in Google Search Console and set it as the preferred domain? If so, should I keep the old sitemap for my http property while updating the sitemap to https-only URLs for the https property?
Thirdly, should I create a new property, as well as new sitemaps, in Bing Webmaster Tools?
Finally, after doing a crawl on our http domain, which 301s to https, the crawl stopped at the redirect. Is this a result of using a free crawling tool, or will bots not be able to crawl my site after this redirect?
Thanks for all the help in advance, I know there are a lot of questions here.
-
RE: A crawl revealed two home pages
Thanks Nigel, after doing a little investigating, I believe Google Search Console may have added the trailing slash for formatting reasons. It appears with a trailing slash in the home view, where you can see domains; however, when viewing the preferred domain, it does not appear with a trailing slash. To test this, I used a practice site and added it without a trailing slash; following my submission, Google added a trailing slash under the domain view.
So I should be set?
Thanks!
-
RE: A crawl revealed two home pages
Thanks Nigel, what will happen to the existing data under the view of the current preferred domain with the trailing slash if I switch the preferred domain to the version without it? I worry that the existing data will be erased or not transferred.
-
RE: A crawl revealed two home pages
Thank you for the fast responses.
Currently, "www.domain.com/" has been claimed and set as preferred, all search console data appears on this account. (www and backslash)
"domain.com/" has also been claimed, with no data on this view.---(non www)
However, as stated, "www.domain.com/" (Preferred and with backslash) redirects to www.domain.com. So as per suggestions I should add "www.domain.com", should this now be my preferred domain?
Thanks guys!
-
RE: A crawl revealed two home pages
We are currently on http; however, the page domain.com/ seems to redirect to domain.com, as I cannot access domain.com/ without it bringing me to domain.com (sorry for the redundancy). However, the Moz crawl did not reveal a 301. Does this resolve the duplicate content issue? Thanks for the fast answers.
So far, only www and non-www have been claimed.
-
A crawl revealed two home pages
After doing a site crawl using the Moz tool, I have found two home pages: www.domain.com/ and www.domain.com. Both URLs have the exact same metrics, and I have set a preferred domain name in Google. Will this hurt SEO? Should I claim www.domain.com/ as well as www.domain.com and domain.com in Search Console?
Thanks