Can Search Engines Read "incorrect" URLs?
-
I know that ideally a URL should look something like domain.com/topic, but if the URL contains additional characters, for example domain.com/topic?keyword, can the search engines still understand the complete words in the URL, even with the additional "incorrect" characters? Or do they stop "reading" once they hit odd characters?
Thanks!
-
A few other things to note for having parameters in URLs:
- In Google Webmaster Tools and Bing Webmaster Tools, you can instruct the search engines to ignore certain parameters, so that they'll treat domain.com/topic?keyword and domain.com/topic as the same page (if ?keyword doesn't change the page content)
- You can also place a rel=canonical link element on pages. For example, domain.com/topic?keyword could declare domain.com/topic as its canonical URL, so that its PageRank is passed along to that page.
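To make the two points above concrete, here is a minimal Python sketch of what "ignoring a parameter" and "rel canonical" amount to. The function names and the choice of which parameters to ignore are assumptions made for illustration; this is not how either webmaster tool works internally.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url, ignored_params):
    """Return url with the ignored query parameters removed (hypothetical helper)."""
    parts = urlsplit(url)
    # keep_blank_values=True so a bare "?keyword" is seen as the key "keyword"
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored_params
    ]
    # Rebuild the URL without a fragment; an empty query leaves no trailing "?"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url, ignored_params):
    """Build the <link> element that would go in the page's <head>."""
    href = escape(canonical_url(url, ignored_params), quote=True)
    return '<link rel="canonical" href="%s">' % href


print(canonical_url("http://domain.com/topic?keyword", {"keyword"}))
print(canonical_link_tag("http://domain.com/topic?keyword", {"keyword"}))
```

With `keyword` in the ignored set, both domain.com/topic?keyword and domain.com/topic resolve to the same canonical address, which is the effect the bullets above describe.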
-
Search engines will read all of your parameters unless you tell Google, via Webmaster Tools, which parameters to ignore. This can cause a problem with a URL like domain.com/topic?keyword&somefield: every page that includes the keyword plus other parameters is treated as a separate URL, and those pages share the link juice. So if somefield has 10 options, each indexed page gets roughly 1/10 of the value.
So it is better to use rewrites to include your keyword in the URL itself, and then mark the parameters as not to be indexed in Google and the other engines.
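The 1/10 arithmetic can be illustrated with a rough Python sketch. The ten `somefield` values and the helper names are made up for the example, and the even split is the simplification used in this answer, not a documented ranking formula.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_params(url, ignored):
    """Drop the ignored query parameters from url (hypothetical helper)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ignored
    ]
    # Note: urlencode writes a bare key such as "keyword" back out as "keyword="
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_variants(urls, ignored):
    """Group URLs that become identical once the ignored parameters are removed."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_params(url, ignored)].append(url)
    return dict(groups)


# Ten crawlable variants of the same page, one per (assumed) somefield value
variants = ["http://domain.com/topic?keyword&somefield=%d" % i for i in range(10)]

# Without any instruction, each variant is its own URL and gets an equal share
share_per_variant = 1 / len(variants)

# Once somefield is ignored, all ten collapse into a single group
groups = group_variants(variants, {"somefield"})
print(len(variants), "variants,", len(groups), "page after ignoring somefield")
```

Running the same grouping over a crawl export is one way to spot which parameters are multiplying your URLs before you decide which ones to mark as ignored.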
-
Search engines can read most characters in a URL string, but & in particular usually separates variables passed to a script, and those variables typically don't carry much valuable information about what a page is about. Sometimes those variables are the topic or category of a shopping cart, so I have to imagine that information could be taken into account, but for long URLs like the following it is hard to believe everything is factored into the URL's relevance to the keyword: http://www.google.com/search?q=long+url+string&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
Search engines index the whole URL, and if it contains keyword-rich content that can definitely help, both from having the keyword bolded in the snippet (CTR WIN!) and from a possible bump in the page's relevance to the keyword.
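To see what is actually inside that long URL, here is a small Python sketch that splits its query string into variables. It only shows how the string decomposes; the `query_parameters` name is made up for the example, and nothing here claims to reflect how a search engine weighs each variable.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(url):
    """Return the query-string variables of url as a dict of name -> list of values."""
    return parse_qs(urlsplit(url).query)


url = (
    "http://www.google.com/search?q=long+url+string&ie=utf-8&oe=utf-8"
    "&aq=t&rls=org.mozilla:en-US:official&client=firefox-a"
)

params = query_parameters(url)
for name, values in params.items():
    print(name, "=", values)
```

Of the six variables, only `q` holds the human-readable phrase; the rest are encoding and client settings, which is why it is hard to imagine all of them counting toward relevance.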
-
In general, search engines are able to identify keywords in the URL even if they are, e.g., a parameter that follows a "?" or another non-alphanumeric character. They might not treat it as being as strong a signal as when the keyword is part of the file name, subdomain or domain name, though. Hope that answers your question.
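As a rough illustration of the point about non-alphanumeric characters, the Python sketch below splits a URL into word tokens and records which part each came from. The `url_tokens` helper is hypothetical, and real search engine tokenization is not public, so treat this purely as a way to picture the idea.

```python
import re
from urllib.parse import unquote, unquote_plus, urlsplit


def url_tokens(url):
    """Split a URL into lowercase word tokens, grouped by host, path and query."""
    parts = urlsplit(url)
    sections = {
        "host": parts.netloc,
        "path": unquote(parts.path),
        # In query strings "+" conventionally stands for a space
        "query": unquote_plus(parts.query),
    }
    return {
        name: [token.lower() for token in re.split(r"[^A-Za-z0-9]+", text) if token]
        for name, text in sections.items()
    }


print(url_tokens("http://domain.com/topic?keyword"))
```

The "?" simply acts as one more separator, so `keyword` still comes out as a complete word; the grouping by section mirrors the idea that a word in the query may count for less than one in the path or domain.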