Geolocation is the act of identifying a user's location via his IP address. It allows ecommerce sites to deliver a stronger user experience based on each visitor’s location. Unfortunately, without careful architectural planning and testing, ecommerce developers can inadvertently lock out search engine crawlers in the process.
The challenge comes from the fact that search engine crawlers originate from a specific set of IP addresses. For example, Googlebot crawls from a range of IP addresses located in San Jose, Calif. Imagine the SEO consequences of a site developed to geolocate visitors on every page: Googlebot would be restricted to content suited for visitors in San Jose, excluding all other countries, states or other locations from the crawl.
A Simple Solution?
One possible solution would be to restrict the use of geolocation to specific pages of the site where it makes sense for the user experience. Use that geolocation information to set a cookie, and use the cookie to determine the geographically relevant content that each individual visitor is served. And lastly, include a crawlable and usable feature to manually switch the location.
For example, a hypothetical ecommerce site called FakeCompany that does business in the U.S., Canada and the U.K. would naturally want to geolocate at the page on which visitors enter. A cookie is set, and the user either navigates happily on or chooses to manually switch that location. The location selector itself can be as simple as static HTML links to the other locations, or as complex as a heavily styled fly-out menu, as long as it’s developed so that it degrades to crawlable HTML links when CSS, JavaScript and cookies are disabled.
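The precedence described above (an explicit location link wins, then the cookie, then geolocation on the entry page only) can be sketched in a few lines. This is a minimal illustration, not FakeCompany's actual code: the function name, the "us" fallback and the /us/, /ca/ and /uk/ path scheme are assumptions made for the example.

```python
from typing import Optional

# Assumed for illustration: one URL section per market, with the U.S. as fallback.
SUPPORTED = ("us", "ca", "uk")
DEFAULT = "us"


def resolve_location(path: str, cookie: Optional[str],
                     geo_guess: Optional[str], is_entry_page: bool) -> str:
    """Pick the location whose content should be served for one request."""
    # 1. A location in the URL always wins. This is what keeps the selector's
    #    plain HTML links crawlable: a bot with no cookies still gets /uk/.
    first_segment = path.strip("/").split("/", 1)[0].lower()
    if first_segment in SUPPORTED:
        return first_segment
    # 2. Otherwise honor a cookie set on an earlier page view.
    if cookie in SUPPORTED:
        return cookie
    # 3. Geolocate by IP only on the page where the visitor enters.
    if is_entry_page and geo_guess in SUPPORTED:
        return geo_guess
    # 4. No signal at all (the typical bot request): use the default.
    return DEFAULT
```

With this ordering, a crawler requesting /uk/shoes is served U.K. content regardless of where its IP address geolocates, because the URL is checked before anything else.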
What Will the Bots See?
There’s another consideration here, though. What exactly will the bots see when they come to the site? That’s an interesting question because bots don’t accept cookies, typically don’t execute JavaScript or apply CSS (though this may be changing), and crawl from a fixed location. For our three-location example ecommerce site, let’s say that Googlebot initiates its crawl at the home page of www.FakeCompany.com. Below is an example of the types of crawl barriers that can form when SEO hasn’t been planned for:

Is this a problem? Depending on FakeCompany’s business model and priorities, it may be almost entirely focused on its U.S. business and may not be terribly concerned about unrealized search-referred traffic from the U.K. and Canada. Or perhaps FakeCompany occupies a very specific niche with low organic search potential, has ubiquitous brand recognition and 100 percent of the market share. In most cases, though, an architectural hurdle to SEO is a problem in concept, at least. Let’s discover if it’s a problem in reality.
Identify the Crawl Barrier
If a crawl barrier does exist, indexation will be low in the areas that are not accessible. A set of "site:" queries in Google and the other major search engines will expose this data. You would search, for example, on site:www.fakecompany.com/us/ versus site:www.fakecompany.com/ca/ and site:www.fakecompany.com/uk/. For FakeCompany, an issue would be present if indexation is found to be high in the U.S. and low in the U.K. and Canada. A quick review of the analytics should also reveal that the U.K. and Canada portions of the site are not receiving organic search visits.
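As a rough aid for that comparison, the sketch below builds the "site:" queries and flags any section whose indexed-page count falls far below the best-indexed section. The counts have to be read off the search results by hand, and the 25 percent threshold and function names are assumptions for the example. A section that is simply smaller will also be flagged, so treat the output as a prompt for investigation rather than proof of a barrier.

```python
def site_queries(host, sections):
    """Build one site: query per top-level section of the site."""
    return [f"site:{host}/{section}/" for section in sections]


def underindexed(counts, ratio=0.25):
    """Return sections indexed at less than `ratio` of the best section.

    `counts` maps section name to the indexed-page count noted by hand
    from each site: query. The default ratio is an arbitrary assumption.
    """
    if not counts:
        return []
    best = max(counts.values())
    return [section for section, n in counts.items() if n < best * ratio]


queries = site_queries("www.fakecompany.com", ["us", "ca", "uk"])
# Made-up counts, purely to show the call; not real data.
flagged = underindexed({"us": 1000, "ca": 40, "uk": 900})
```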
The next step would be locating the crawl barrier. You could look at which pages are and aren’t indexed and deduce from there where the breakdown is. But I prefer to mimic Googlebot and attempt to navigate the site. The closest way to experience the bot’s-eye view is using Chris Pederick’s User Agent Switcher and Web Developer Toolbar for Firefox. Set the user agent to Googlebot 2.1 in "User Agent Switcher," and disable JavaScript, cookies and CSS in Web Developer Toolbar. This combination of add-ons is not a 100 percent accurate representation. But the sections of a site that are unnavigable will likely also be unnavigable to bots. Make sure to try all the navigational aspects of the site, especially the location selector. When a location is selected, does it persist or is the visit routed back to the U.S.? Note any areas where links were expected to lead to one page based on the site’s human usability, but instead another was served based on lack of JavaScript and cookie acceptance.
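The same check can be approximated in a script. The sketch below requests a page with a Googlebot-style user agent and no cookie handling, then lists which location sections are not reachable through plain HTML links. As with the browser add-ons, this is only an approximation: the request still comes from your own IP address, not Google's. The /us/, /ca/ and /uk/ prefixes and all function names are assumptions for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import Request, urlopen

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, ignoring anything script-driven."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def bot_request(url):
    """Build a request that identifies itself as Googlebot."""
    return Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_as_bot(url):
    """Fetch raw HTML. urlopen stores no cookies and runs no JavaScript."""
    with urlopen(bot_request(url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def missing_locations(html, prefixes=("/us/", "/ca/", "/uk/")):
    """Return the location prefixes that no plain HTML link points to."""
    collector = LinkCollector()
    collector.feed(html)
    paths = [urlparse(href).path for href in collector.hrefs]
    return [p for p in prefixes if not any(path.startswith(p) for path in paths)]


# A selector whose Canada link works only through JavaScript.
sample = ('<a href="/us/">U.S.</a>'
          '<a href="#" onclick="go(\'ca\')">Canada</a>'
          '<a href="http://www.fakecompany.com/uk/">U.K.</a>')
unreachable = missing_locations(sample)
```

In the sample markup, the Canada link depends on an onclick handler, so /ca/ is reported as unreachable. Running missing_locations(fetch_as_bot(url)) against a live page applies the same test to real markup.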
Look at the Text-Only Cache
Once the barriers are found, compare these pages with the cached text-only view of the pages in Google. For example, FakeCompany might suspect that the location selector in the header has been coded with too much reliance on JavaScript and is thus uncrawlable. Reviewing the navigation in Google’s text-only cache, with JavaScript, cookies and CSS still disabled, can unearth more clues about what Google actually has and hasn’t been able to execute on the page.
If indexation levels are consistently strong, then check rankings and visitor data. Even if a site is indexed, it may not have enough link popularity flowing to it to win rankings and drive visitors. This situation is more of a bottleneck than a barrier. If FakeCompany had a link popularity flow issue, we’d expect to see the U.S. pages ranking better and driving more organic search visitors, because the crawl paths are wide open and unrestricted. Each U.S. page links to the high-level U.S. pages and link popularity flows well deep into the site.
But the U.K. and Canada pages would be ranking more poorly and driving fewer organic search visitors because the only access a bot has to these pages is via a single HTML sitemap on the U.S. section. That sort of bottleneck in the bots’ crawl path would send the message that the Canada and U.K. pages were less important than the U.S. They’re relegated to a lower status in the architecture of FakeCompany’s site; so naturally the search engines consider them lower in value as well.
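One rough way to see such a bottleneck is to count how many clicks each page sits from the home page when only plain HTML links are followed. The sketch below runs a breadth-first search over a hand-built link map. The map is hypothetical and mirrors the scenario above, where the only path to Canada and the U.K. runs through a sitemap page in the U.S. section. Click depth is only a proxy here; it is not how search engines calculate link popularity.

```python
from collections import deque


def click_depths(links, start):
    """Breadth-first search: fewest clicks from `start` to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical link map: each page and the pages it links to in plain HTML.
site = {
    "/": ["/us/"],
    "/us/": ["/us/shoes/", "/us/sitemap/"],
    "/us/shoes/": ["/us/"],
    "/us/sitemap/": ["/ca/", "/uk/"],
    "/ca/": ["/ca/shoes/"],
    "/uk/": ["/uk/shoes/"],
}
depths = click_depths(site, "/")
```

In this map the U.S. category page is two clicks from the home page, while the equivalent Canada and U.K. pages are four clicks away and reachable through a single page. Pages missing from the result entirely are behind a barrier rather than a bottleneck.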
Why the Lower Ranking in Different Countries?
There could be other reasons for lower rankings and fewer visitors as well, such as different levels of brand recognition in different countries, different demand for the products, and different levels of competition. FakeCompany would naturally need to use this business knowledge to help analyze the extent to which organic search and link popularity are the issue versus other business factors.
Once any bottlenecks have been identified, again review the text-only cached version and analyze the bottlenecks using the add-ons mentioned above. If marketers are frustrated by the roundabout path they have to take when they turn off their site’s fancy bells and whistles to navigate like a bot, it’s a good bet that the bots likewise need to take circuitous routes to crawl the site. Every additional page that a bot has to crawl is a waste of link popularity. It's a bottleneck. And every bottleneck decreases the ability of the pages on the other side to rank and drive traffic and sales.
With the barriers and bottlenecks identified, a whole new round of work must be done to estimate the impact, the time and resources required to fix it, and the potential return on that investment when compared with the company’s other priorities.
