Hi there,
I'm using Firecheckout on a few projects, and it is really easy to use. (M 1.9.3.x)
Oh, sorry. Somehow I didn't get any notification on your reply.
For IIS you could go with the web.config of your website. The code will be something like:
<rule name="Force WWW and SSL" enabled="true" stopProcessing="true">
  <match url="(.*)" />
  <conditions logicalGrouping="MatchAny">
    <add input="{HTTP_HOST}" pattern="^www\." negate="true" />
    <add input="{HTTPS}" pattern="off" />
  </conditions>
  <action type="Redirect" url="https://www.domainname.com/{R:1}" appendQueryString="true" redirectType="Permanent" />
</rule>
Hi Sammy,
If I understand your question correctly, you need help with .htaccess code to force both https and www with the same rule? If so, this might be what you are looking for:
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\.domainname\.com$ [NC]
RewriteRule ^(.*)$ https://www.domainname.com/$1 [L,R=301]
Hi there,
Will the URL structure remain the same? If so, in the .htaccess file of the subdomain, you should add the following after the RewriteEngine On line:
RewriteCond %{HTTP_HOST} ^shop\.domain\.co\.uk$ [NC]
RewriteRule (.*) https://www.domain.co.uk/$1 [R=301,L]
This should do the trick and redirect https://shop.domain.co.uk/product-category/great-merchandise/?product_order=desc to https://www.domain.co.uk/product-category/great-merchandise/?product_order=desc
I hope this helped
You have my details on my profile. After we resolve it, we should post the solution here without domain-specific information, so it helps others in the future (if you don't mind).
Hi there,
Probably what is happening is that your plugins are not optimized for redirects. You should address it from your .htaccess file (the plugin probably adds the redirects, but they are not optimized). If you can give me access, I can help you out.
Hi James,
So far as I can see you have the following architecture:
Since the listing page pagination is blocked in the robots.txt, only the first 15 job postings are available to crawl via a normal crawl.
I would say you should remove the blocking from the robots.txt and focus on implementing correct pagination. Which method you choose is your decision, but allow the crawler to access all of your job posts. Check https://yoast.com/pagination-seo-best-practices/
Another thing I would change: make the job post title the anchor text for the link to the job posting (currently every single job is linked with "Find out more").
Also, if possible, create a separate sitemap.xml for your job posts and submit it in Search Console; this way you can keep track of any indexation anomalies.
Last but not least, focus on the quality of your content (just as Matt proposed in the first answer).
Good luck!
In my experience, it will help the overall site, but still... do not expect a huge impact from these. URLs are shared, but I don't believe people will start to link to them except in private conversations.
This is a technical question that they need to tackle from the database side. It can be implemented, but it needs a few extra development hours, depending on the complexity of your website architecture, the CMS used, etc. Anyway, you are changing the URLs, so don't forget about the best practices for them. Good luck!
Hi there,
I believe the most logical implementation would be to use "noindex, follow" meta robots on these pages.
I wouldn't use canonical because it does not serve this purpose. Also, make sure these pages are not blocked via robots.txt.
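For reference, the tag in the <head> of those pages would look like this (a minimal sketch):
<meta name="robots" content="noindex, follow">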
Hi Rachel,
As I have mentioned in the previous question, the best case would be to translate them (especially if you are creating a group of redirects now), but if that won't be implemented, then regardless check for the following after implementation:
Btw, a must-read article when we are talking about hreflang: https://moz.com/blog/hreflang-behaviour-insights by Dave Sottimano.
The idea (which we both highlighted) is that blocking your listing page via robots.txt is wrong; for pagination you have several methods to choose from (how you deal with it really depends on the technical possibilities you have on the project).
Regarding James' original question, my feeling is that he is somehow blocking his job posting pages. Cutting off access to these pages makes it really hard for Google, or any other search engine, to index them. But without a URL in front of us, we cannot really answer his question; we can only create theories that he can test.
Hi Rachel,
Regarding the language code in the URL, you can leave it (page-uk.html, page-es.html, etc.), but maybe it would be an idea to have a translated page URL for each language. For example:
This would serve a little bit better than the previous version, where you would have:
Usually this means a wrongly coded page, which gives you a loop when crawling. If you can show the site itself, I will gladly help you find it.
If you cannot disclose the URL, you can do the following: run a crawl with a tool such as Screaming Frog, filter for these URLs, and check their inlinks and anchor texts specifically for this type of URL. When you find a pattern, you will find where the code is broken. Good luck!
Hi Chuck,
I have seen something similar on one of my previous projects, where outside domains had been redirected to the project itself. So spammycrap.tld/UbaOZ had a redirect to ourproject.tld/UbaOZ/, and this way it was appearing in our Search Console error list.
I discovered it by going through all the backlinks from Ahrefs and Majestic. If this is also your case, unfortunately you cannot do much about it, especially if you do not have any control over those domains. What I did was create a list of these URLs and redirect them to a page on our project that returned a 410 status code.
Hopefully, this approach helps you.
István
Hi James,
First of all, you need to categorize these 404 pages: some may come from website sections that were deleted in the past and haven't been addressed. Other 404 URLs could appear through domain redirects to your website (unfortunately, these are harder to find, process and resolve).
For the first category (when sections have been deleted, moved, etc.) you will have to ask yourself which is the correct way to resolve the issue. A 301 redirect to the most relevant URLs? Or just a 410, letting search engines know that your pages are deleted and the URLs should be removed from the index? Please don't start redirecting every single 404 URL to your homepage or some other irrelevant page, or you will be creating soft 404s.
Regarding the /insertgibberishurlhere type of URLs, you should check what kind of domains are redirected to your domain (I have seen domains that got this kind of 404 not found error via massive domain redirects to a project). If this is the case, you first need to ask yourself what you are redirecting to your website. If the domains are not in your hands, you could also redirect all of these to a URL on your website that returns a 410 status code.
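If it helps, here is a minimal .htaccess sketch of both options, assuming Apache with mod_alias (the paths and domain are placeholders):
# 301: a deleted page that has a close, relevant equivalent
Redirect 301 /old-section/old-page.html https://www.yourdomain.com/new-section/new-page/
# 410: a page that is gone for good and should drop out of the index
Redirect 410 /deleted-section/gone-page.html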
Oh, and the obvious one: crawl your website with tools such as Screaming Frog, and make sure you are not creating the 404 URLs yourself (I almost forgot about this one).
Let me know, if you have further questions.
Sorry Richard, but using noindex together with a canonical link is not really good practice.
It's an old entry, but still true: https://www.seroundtable.com/noindex-canonical-google-18274.html
Hi James,
Regarding the robots.txt syntax:
Disallow: /jobs/? basically blocks every single URL that starts with /jobs/?
For example: domain.com/jobs/?sort-by=... will be blocked.
If you want to disallow query parameters in URLs, the correct implementation would be Disallow: /jobs/*?, or you can even specify which query parameter you want to block, for example Disallow: /jobs/*?page=
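To illustrate the syntax (a minimal sketch with hypothetical paths):
User-agent: *
# Blocks only paginated variants, e.g. /jobs/?page=2 or /jobs/london/?page=3
Disallow: /jobs/*?page=
# Or, to block any URL under /jobs/ that carries a query string:
# Disallow: /jobs/*?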
My question to you: are these jobs linked from any other page and/or the sitemap? Or only from the listing page, which has its pagination, sorting, etc. blocked by robots.txt? If they are not linked, it could be a simple case of orphan pages, where the crawler basically cannot access the job posting pages because there is no actual link to them. I know it is an old rule, but it is still true: Crawl > Index > Rank.
BTW, I don't know why you would block your pagination. There are other, more optimal implementations.
And there is always the scenario that was already described by Matt. But I believe in that case you would have at least some of the pages indexed, even if they are not going to rank well.
Also, make sure other technical implementations are not stopping your job posting pages from being indexed.
I'll check in a little bit later, currently, I am getting a DNS error when trying to access it.
Hey,
Try Fetch and Render in Google Search Console. There you can check if there is an X-Robots-Tag in the response header.
For reference: https://developers.google.com/search/reference/robots_meta_tag
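You can also check the header directly from the command line (a quick sketch; the URL is a placeholder):
curl -I https://www.example.com/some-page/
# then look for a line like this in the output:
# X-Robots-Tag: noindex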
Hi Bharat,
Your website is currently unreachable.
Hi there,
Yes, there are several ways you could do that, but my question is whether it is worth it or not. If we are talking about a large website, you could have issues with Google's crawl budget: the crawlers would have to go through an additional 301 to land on your homepage.
Google describes their best practices about redirecting 404 pages here: https://support.google.com/webmasters/answer/181708?hl=en
In my opinion, the decision should be determined by the size of the website. If we are talking about a big website, maybe it would be more beneficial to follow Google's guidelines and implement a 410 status code. If the website is small, maybe you could redirect the users to the homepage and hope they are going to continue their journey on your website.
István
Let me know how it turns out. If the problem persists, I'm glad to help. Good luck!
Hey,
Can you point out an example URL? (If you don't want to disclose the website URL here, you can do it via a personal message.) This way we can debug an exact URL and not just a theory.
Regarding blocking via robots.txt: it is never a good idea to block a search engine from URLs you want to deindex. That way Google's crawlers won't fetch and process the pages, and your URLs will stay in the search index.
Just check: https://support.google.com/webmasters/answer/6062608?hl=en
"While Google won't crawl or index the content blocked by robots.txt
, we might still find and index a disallowed URL if it is linked from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the page can still appear in Google search results."
In the case of 301 redirects (make sure you are not using 302s), if the crawler can access the page, the old URL should be removed from the index.
Sorry, I forgot to detail that case :) Thanks for pointing it out.
Hi there,
Regarding website.com/category/product
What you have to take into consideration: if a product is placed in more than one category, then this product is going to be indexable on more than one URL path. (Like Gaston mentioned below, in this case you need to take care of the duplication, which you can do with canonical links or a redirect to one path.)
For example, let's say you have a product which is in cat1, subcat1 and cat2. This way you will have a minimum of 3 available paths to the product:
This means that at the product level you will have to deal with internal duplicate content. This is why I usually prefer to use the website.com/product URL path (IMO).
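As a rough illustration, the category-path versions of the product page could point a canonical to whichever URL you pick as the preferred one (hypothetical URLs, just a sketch):
<!-- on website.com/cat1/product-name and website.com/cat2/product-name -->
<link rel="canonical" href="https://website.com/product-name" />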
Regarding /category/subcategory/ vs /category-subcategory/, this is really a technical question. How "deep" is your website going to be? Do you want a flat architecture?
I usually prefer the /category/subcategory/ structure, because the content of an e-commerce website should be built up around content silos, where you move from the general towards the specific (regardless of whether you achieve this with subcategories or with filters within categories). I really hope this helps you answer your questions.
Hi there,
I would say, discuss it with your developer. It happened with one of our developers, who had been using our company's subdomain for a personal project's development/staging environment (without our knowledge) and forgot to noindex the dev site.
I found it out with a site: search.
As far as I can see, the links are not live anymore, but they are still indexed under the old.britishcarregistrations.co.uk subdomain.
Hi there,
Just check the answer from the following question: https://moz.com/community/q/unable-to-crawl-after-301-permanent-redirect-how-to-fix-this.
Jordan Railsback describes the issue, and how to fix it.
If you want to go with a free tool, you can also check Xenu Link Sleuth (http://home.snafu.de/tilman/xenulink.html).
Just make a full crawl with the tool and export the page map to a tab-separated file. Then you can open this file in Excel (or any similar software). It should do the job.
Hi!
I personally like to use Kraken.io. Check their pricing, but I'd say it is a low-cost and efficient way to handle the images.
We have been using it with their Magento extension, but they also have a plugin for WordPress (https://kraken.io/plugins).
I know it is not the only solution, but it has worked well for me.
Good luck!
Hi David,
There was a very good article about this topic back in 2014 (I know it sounds a little bit old, but it is still very descriptive): https://moz.com/blog/seo-guide-to-google-webmaster-recommendations-for-pagination
We also had a similar implementation, and I went with Option 3B from the article pointed out above: Option 3: Implement Pagination Relationships + noindex, follow directive after page 2.
So you want to have only the first page indexed; set the robots directive to "noindex, follow" on every page after the first. Hint: if you use /page/ in your URL structure (vs. a page query parameter), you can also use that to check whether a page needs to be noindexed, as it should only appear from page 2 onwards.
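For illustration, the head of a deeper page could look something like this (a sketch with hypothetical URLs):
<!-- e.g. on https://www.example.com/category/page/2/ -->
<link rel="prev" href="https://www.example.com/category/" />
<link rel="next" href="https://www.example.com/category/page/3/" />
<meta name="robots" content="noindex, follow">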
I hope it helps.
I'm glad I could help! Let me know if you hit any walls with the implementation.
Hi there,
Usually my advice is to add any custom code after the default WordPress rules, just to keep it more organised. It is very important not to add your rules inside the WP section (# BEGIN WordPress -> # END WordPress), because WordPress can overwrite that block.
Also, I usually add a comment before every rule group I create (just to keep it organised, and if anything goes wrong I know where I need to revert/adjust; check Search Console for anomalies after implementation). You can add comments by starting the line with a # sign.
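To illustrate the layout, the .htaccess could end up looking roughly like this (a sketch; the redirect itself is just a placeholder):
# BEGIN WordPress
# ... default WordPress rewrite rules, left untouched ...
# END WordPress

# Custom redirects: old blog URLs
Redirect 301 /old-page.html http://www.yourdomaingoeshere.com/new-page/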
I hope it helps.
Oh, and BTW, when using Redirect 301, you should use a relative path (starting with /) for the OLD URL and an absolute URL for the NEW one, so the lines that you provided need to contain the full URL for the new version:
Redirect 301 /old-relative-path.html http://www.yourdomaingoeshere.com/newurl/
Hi there,
From what you are describing, the first thought that comes to mind is a wrongly implemented relative URL.
What I would do in this case: run a full crawl of the website with Screaming Frog (you will need the paid version) and make a bulk export of the 404 inlinks via Bulk Export -> Response Codes -> Client Error (4xx) Inlinks. I would use that list to find a pattern in the anchor texts used to generate these kinds of URLs.
Once you have found a pattern, you can dig into the source code of the pages where the links come from.
If you don't have a Screaming Frog license, send me a PM with the website and I will run a quick crawl for you.
Istvan
Hi,
You could add the following code to your .htaccess to redirect all dated URLs to the non-dated version:
RedirectMatch 301 ^(.*)/([0-9]+)/([0-9]+)/([0-9]+)/(.*)$ http://www.domain.com$1/$5
Change domain.com to your domain name.
This should create a redirect from http://www.website.com/blog/2016/04/10/topic-on-how-to-optimise-blog to http://www.website.com/blog/topic-on-how-to-optimise-blog (and every similar situation).
Hey,
If you check today's whiteboard Friday with Dr. Pete (https://moz.com/blog/arent-301s-302s-canonicals-all-basically-the-same-whiteboard-friday), he mentions this case:
"Some types of 302s just don't make sense at all. So if you're migrating from non-secure to secure, from HTTP to HTTPS and you set up a 302, that's a signal that doesn't quite make sense. Why would you temporarily migrate?"
So answering your question: Google probably treated your initial http -> https redirects as 301s.
Hey there,
Try to avoid using both canonical and noindex; it is not advised (source: https://www.seroundtable.com/noindex-canonical-google-18274.html).
If you are using a canonical on the subdomain, it should be more than enough to deindex the subdomain version (if it gets indexed in any way) and resolve the duplicate content issue.
Greetings, Keszi
Hi Jose,
The canonical "Warning" is a notification. Tools cannot tell which is the original page, but can alert you that you have a canonical link on the specific URL.
With this report and a little Excel work you can double-check your canonical implementation.
Greetings, Keszi
I personally check webmasterworld.com; they have a Google Updates and SERP Changes thread for each month.
Hi Adrienne,
I would try to use Barracuda's tool: http://barracuda.digital/panguin-tool/
Sometimes it gives you a clue about when exactly the drop happened (and which update was near that date).
Also, you could check Moz's update history: https://moz.com/google-algorithm-change
These will help you if your website has been hit by an algorithmic update.
Let me know if you need further assistance.
Keszi
Hi Simon,
I will quote from: https://support.google.com/webmasters/answer/6066468?hl=en
OR
Note that the URL is not unreachable for all of Google; it is just unreachable for the Fetch as Google simulation tool.
Keszi
Hi there,
Redirects should always be written with a relative path for the old URL and an absolute URL for the new one.
So the .htaccess file for http://tshirts.com/ should contain something like:
Redirect 301 /blue.html http://www.mainsite.com/blue-t-shirts.html
Redirect 301 /white.html http://www.mainsite.com/white-t-shirts.html
Redirect 301 /black-tshirts.html http://www.mainsite.com/bk-t-shirts.html
Redirect 301 / http://www.mainsite.com/tshirts.html
I think that should solve the issue.
Keszi
Hi there!
First of all, I believe that you shouldn't use both canonical and rel=prev/next. The two techniques do not work well together: "In cases of paginated content, we recommend either a rel=canonical from component pages to a single-page version of the article, or to use rel="prev" and rel="next" pagination markup." (quoted from http://googlewebmastercentral.blogspot.com/2013/04/5-common-mistakes-with-relcanonical.html)
Basically you have several possibilities:
I think the best method for you would be to have a rel="prev/next" and have the canonical removed.
I hope this helps, Keszi
Hi,
Could you point out the website, or tell us what platform you are using? Maybe it would be easier to help.
When everything else fails, I do the XML sitemaps manually (Notepad++ and Excel). But Screaming Frog is also helpful.
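If you go the manual route, the structure itself is simple (a minimal sketch with placeholder URLs, following the sitemaps.org protocol):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/some-page/</loc>
  </url>
</urlset>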
Keszi
Hi,
Yesterday there was a Mozscape Index update. (https://moz.com/products/api/updates)
More than likely you are seeing the effect of that update. It is enough for your current linking domains to have their DA drop, and that can affect your DA value. But as John mentioned above, if the rankings have not been affected, I would not worry about it.
Keep up the good job!
Keszi