I love technical SEO (most of the time), but it can be frustrating to come across the same site problems over and over again. After years of doing SEO, I'm still surprised to see so many different websites suffering from the same issues.
1. Uppercase vs Lowercase URLs
2. Multiple versions of the homepage
3. Query parameters added to the end of URLs
- Waterproof jackets
- Hiking boots
- Women's walking trousers
- Size (i.e. Large)
- Colour (i.e. Black)
- Price (i.e. £49.99)
- Brand (i.e. North Face)
4. Soft 404 errors
5. 302 redirects instead of 301 redirects
6. Broken/Outdated sitemaps
A few uncommon technical problems
7. Ordering your robots.txt file wrong
8. Invisible character in robots.txt
9. Google crawling base64 URLs
10. Misconfigured servers
This write-up actually comes from Tom, who worked on this particular client project. We encountered a problem with a website's main landing/login page not ranking. The page had been ranking, then at some point dropped out, and the client was at a loss. The pages all looked fine, loaded fine, and didn't seem to be doing any cloaking as far as we could see.
After a lot of investigation and digging, it turned out there was a subtle problem caused by a misconfiguration of the server software: specifically, the HTTP headers the server was sending.
Normally, an 'Accept' header is sent by the client (your browser) to state which file types it understands, and only very rarely does this modify what the server does. When the server sends a file, it always includes a "Content-Type" header to specify whether the file is HTML, a PDF, a JPEG, or something else.
Their server (running Nginx) was returning a "Content-Type" that simply mirrored the first file type found in the client's "Accept" header. If you sent an Accept header that started with "text/html", then that is what the server would send back as the Content-Type header. This is peculiar behaviour, but it wasn't being noticed because browsers almost always send "text/html" at the start of their Accept header.
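To make the bug concrete, here's a minimal sketch of the broken logic in Python. The function names are my own (the actual Nginx configuration wasn't shared); the point is just that mirroring the first Accept token works by accident for browsers and fails for anything else:

```python
# Hypothetical reconstruction of the misbehaving server logic; the real
# Nginx configuration wasn't shared, so these names are illustrative.

def buggy_content_type(accept_header: str) -> str:
    """Mirror the first type in the client's Accept header back as Content-Type."""
    return accept_header.split(",")[0].strip()

def is_valid_content_type(value: str) -> bool:
    """A usable Content-Type has a concrete type and subtype, e.g. text/html."""
    type_, _, subtype = value.partition("/")
    return bool(type_) and bool(subtype) and "*" not in (type_, subtype)

# A browser's Accept header starts with text/html, so the bug goes unnoticed:
browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(buggy_content_type(browser))                           # text/html

# Googlebot sends "Accept: */*", which is not a valid Content-Type:
print(is_valid_content_type(buggy_content_type("*/*")))      # False
```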
However, Googlebot sends "Accept: */*" when it is crawling (meaning it accepts anything).
(See: http://webcache.googleusercontent.com/search?sourceid=chrome&ie=UTF-8&q=cache:http://www.ericgiguere.com/tools/http-header-viewer.html)
I found that if I sent an "Accept: */*" header, the server would fall over: */* is not a valid content-type, so the server sent an error response instead of the page.
Changing your browser's user agent to Googlebot does not change the HTTP headers it sends, and tools such as Web-Sniffer also don't send the same HTTP headers as Googlebot, so you would never notice this issue with them!
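Since spoofing the user agent alone doesn't help, the only way to reproduce the problem is to send Googlebot's actual Accept header yourself. A sketch using only Python's standard library; the local test server here merely stands in for the real (misconfigured) site by mirroring the first Accept type back as the Content-Type:

```python
# Reproduce Googlebot's "Accept: */*" request explicitly.
# The local server below is a stand-in for the misconfigured site.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class MirrorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        accept = self.headers.get("Accept", "text/html")
        self.send_response(200)
        # The bug: Content-Type mirrors the first type the client accepts.
        self.send_header("Content-Type", accept.split(",")[0].strip())
        self.end_headers()
        self.wfile.write(b"<html>hello</html>")

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), MirrorHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

def content_type_for(accept: str) -> str:
    req = urllib.request.Request(url, headers={
        "User-Agent": "Googlebot/2.1",  # the UA alone changes nothing...
        "Accept": accept,               # ...the Accept header is what matters
    })
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Content-Type"]

browser_ct = content_type_for("text/html,application/xhtml+xml")
googlebot_ct = content_type_for("*/*")
print(browser_ct)    # text/html
print(googlebot_ct)  # */* -- not a valid content-type
server.shutdown()
```

With a tool that lets you set arbitrary request headers (curl's `-H` flag works the same way), the failure shows up immediately, even though every browser-based check looks healthy.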
Within a few days of fixing the issue, the pages were re-indexed and the client saw a spike in revenue.