Do hashbang URLs make the website difficult for Google to crawl?

Our agency built a dynamic website that uses a lot of AJAX interactions and #! (hashbang) URLs: http://www.gunlawsbystate.com/

It's a long book which you can scroll through, and the URL in the address bar changes dynamically. We have to support IE, so please don't advise using pushState; the hashbang is the only option for us for now.

There's a navigation menu in the left sidebar that contains links to all chapters in the book.

An example of a link: http://www.gunlawsbystate.com/#!/federal-properety/national-parks-and-wildlife-refuges/

We are expecting Google to crawl this: http://www.gunlawsbystate.com/?_escaped_fragment_=/federal-properety/national-parks-and-wildlife-refuges/ which is a complete HTML snapshot of the section. (Plus, there are links to subsections, e.g. www.gunlawsbystate.com/#!/federal-properety/national-parks-and-wildlife-refuges/ii-change-in-the-law/ => www.gunlawsbystate.com/?_escaped_fragment_=/federal-properety/national-parks-and-wildlife-refuges/ii-change-in-the-law/.)

Everything appears to conform to Google's specification (developers.google.com/webmasters/ajax-crawling/docs/specification). The site has been live for about 3 months now. The homepage gets re-indexed every 10-15 days.

The problem is that for some reason Google doesn't crawl hashbang URLs properly. It seems like Google just "doesn't like" those URLs.

www.google.ru/search?&q=site%3Agunlawsbystate.com : just 67 pages are indexed. Notice that most of the pages Google has indexed have "normal" URLs (mostly WordPress blog posts, categories and tags), and only 5-10% of the result pages are hashbang URLs, although there are more than 400 book sections with unique content which Google should really like if it crawled them properly.

Could someone advise me on why Google does not crawl our book pages properly? Any help will be appreciated.

PS: I'm sorry for the non-clickable links; Stack Overflow doesn't let me post more than two.

UPD: The sitemap was submitted to Google a while ago. Google Webmaster Tools says that 518 URLs were submitted and just 62 URLs are indexed. Also, on the 'Index Status' page of Webmaster Tools I see 1196 pages 'Ever crawled' and 1071 pages 'Not selected'. It clearly points to the fact that, for some reason, Google doesn't index the #! pages that it visits frequently.


You are missing a few things. First, you need a meta tag to tell Google that the hash URLs can be accessed via a different URL:

<meta name="fragment" content="!">

Next, you need to serve a mapped version of each of these URLs to Googlebot.

When Google visits:

http://www.gunlawsbystate.com/#!/federal-regulation/airports-and-aircraft/ii-boarding-aircraft/

It will instead crawl:

http://www.gunlawsbystate.com/?_escaped_fragment_=/federal-regulation/airports-and-aircraft/ii-boarding-aircraft/

For that to work, you need to use something like PHP or ASP to serve up the correct page. ASP.NET routing would also work if you can get the plumbing correct. There are also services which will create these "snapshot" versions for you, and then your meta tag will point to their servers.
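PHP or ASP is one way to do it, but the same routing idea works in any server stack. As a rough sketch in Node.js (using only built-in modules; the snapshots/ directory and the file-naming scheme below are purely illustrative assumptions, not your actual setup): serve a pre-rendered HTML snapshot whenever the _escaped_fragment_ parameter is present, and the normal application shell otherwise.

    // server.js - minimal sketch of serving _escaped_fragment_ snapshots.
    var http = require('http');
    var url = require('url');
    var fs = require('fs');
    var path = require('path');

    http.createServer(function (req, res) {
      var query = url.parse(req.url, true).query;
      var fragment = query._escaped_fragment_;

      if (typeof fragment === 'string') {
        // Googlebot asked for ?_escaped_fragment_=/some/section/.
        // Map it to a pre-rendered snapshot file, e.g.
        // "/federal-regulation/airports-and-aircraft/" -> "federal-regulation__airports-and-aircraft.html"
        var name = fragment.replace(/^\/+|\/+$/g, '').replace(/\//g, '__') || 'home';
        var file = path.join(__dirname, 'snapshots', name + '.html');
        fs.readFile(file, function (err, html) {
          if (err) {
            res.writeHead(404, { 'Content-Type': 'text/plain' });
            res.end('Snapshot not found');
            return;
          }
          res.writeHead(200, { 'Content-Type': 'text/html' });
          res.end(html);
        });
        return;
      }

      // Normal visitors get the single-page AJAX application shell.
      fs.readFile(path.join(__dirname, 'index.html'), function (err, html) {
        if (err) {
          res.writeHead(500, { 'Content-Type': 'text/plain' });
          res.end('Server error');
          return;
        }
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end(html);
      });
    }).listen(8080);

The only important part is the branch on _escaped_fragment_: Googlebot gets static HTML with real links, everyone else gets the AJAX app.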


The AJAX crawling scheme has since been deprecated by Google, and Google is now not able to access the content under hashbang URLs.

Based on research, Google now avoids escaped-fragment URLs and suggests creating separate pages rather than using hashbangs.

So I think pushState is the other option that can be used in this case.
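As a minimal sketch of that approach (loadChapter below is a hypothetical stand-in for whatever AJAX call already renders a section, so this is only an illustration):

    // Render a chapter via AJAX, then rewrite the address bar with a real URL
    // instead of a #! fragment.
    function openChapter(path) {
      loadChapter(path); // hypothetical stand-in for the site's own AJAX rendering
      history.pushState({ path: path }, '', path);
    }

    // Re-render the correct section when the user presses Back/Forward.
    window.addEventListener('popstate', function (event) {
      if (event.state && event.state.path) {
        loadChapter(event.state.path);
      }
    });

With real URLs like /federal-regulation/airports-and-aircraft/, the server also has to answer direct requests for those paths (for example by returning the application shell), and each chapter then becomes crawlable as an ordinary page.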
