Troubleshooting Search Engine Problems
To combat the perennial problems webmasters of every kind face with
search engines, especially Google, here is a 14-point website tune-up plan
that may help:
- Get the server response headers for your homepage: the status code should
be 200. For a missing page it should be 404.
Resolve any canonical domain issues (www vs non-www: make a decision) before
they become a headache, by using a 301 redirect from the version you drop to
the one you retain. See the 301 redirection tutorial for Apache servers below.
- Run a spider simulator on the homepage and other pages. If it finds little
or no text, there is nothing for a search engine to index.
- Run Xenu Link Sleuth starting at the homepage and check the broken links
and the number of true links found. If it never completes the crawl, that
points to a problem. Compare the list of good links found with your sitemap.
- Run a web page spam detector. More than 100 outbound links on a page
signal a link farm, and linking to too many unrelated sites can be a
liability.
- Validate your homepage and other pages. A proper doctype and charset
help; broken code, especially at the block level, can prevent bots from
crawling the page.
- Pages that are not reachable through a normal crawl (orphan pages, or pages
reachable only through JavaScript, Ajax or Flash navigation) will not be
crawled and indexed properly, if at all, even if they are listed in the
sitemap. Make sure the sitemap is correctly laid out.
- Affiliate links on a page set alarm bells ringing. Make sure you have
plenty of original, unique content to supplement the products available
through the affiliate links, and use rel="nofollow" on all such links.
Forget about cloaking them with any sort of redirection; that is sneaky.
Remember that affiliate links are on a par with paid links.
- Page titles, headings in the text, proper use and distribution of keywords
and keyphrases (don't keyword-stuff, don't spam), anchor text, alt text for
images, increased use of CSS: all are part of on-page (internal)
optimization. Optimize images and other media files so pages load fast.
- Watch your page layout and page size carefully. Putting ads (AdSense or any
others) in the prime viewing area of a page (e.g. above the fold, above the
top menu, in the left navigation area, or smack bang where one expects actual
content) signals a low-quality site and a poor user experience. Disguising
ads as regular content or site links is a bad signal too. Penalty material.
- Get, or better, attract relevant, quality incoming links, but not from
link farms. Don't buy rank-passing links. Forget about blog and forum
signatures unless your post is truly appropriate and relevant to that forum
or blog. Don't spam. Be careful with any SEO specialists you hire: make sure
you know exactly what they do and that they don't break any guidelines. The
statistics are grim.
- Find out the indexing situation by searching Google.com for
site:example.com and site:www.example.com, and investigate the omitted
(previously called supplemental) pages. When Google indicates that similar
pages were omitted, that is usually because they have the same title or
description tag as pages already listed, and that should be fixed. Or they
have very thin content and don't deserve to be ranked. Or they are among
those blocked in robots.txt. Some may benefit from a robots "noindex" meta
tag instead of a robots.txt block. It depends.
- Have dead URLs that are currently indexed removed from the Google index by
requesting removal, unless you have appropriate equivalent new URLs to which
you can 301 redirect them.
- The robots.txt file is your friend. Tune it carefully, streamline it, and
make sure you know why you are blocking what you are blocking.
- Make use of Google Webmaster Tools.
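A spider simulator essentially strips a page down to what a text-only crawler sees. As a rough stand-in, here is a Python 3 sketch built on the standard `html.parser` module: it counts the visible words (ignoring the contents of `script` and `style`, since no JavaScript is executed) and lists the link targets. It is an approximation, not a reproduction of how any particular search engine parses pages.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collects roughly what a text-only crawler sees: words and link targets."""

    SKIP = {"script", "style"}  # contents are code, not indexable text

    def __init__(self) -> None:
        super().__init__()
        self.words: list[str] = []
        self.links: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def spider_view(html: str) -> tuple[int, list[str]]:
    """Return (number of visible words, list of href values) for a page."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return len(parser.words), parser.links
```

A page that builds all its content with `document.write` scores zero words here, which is exactly the warning sign the spider-simulator point is about.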
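The 100-outbound-link rule of thumb from the spam-detector point can be checked with a few lines of Python 3. This sketch counts links whose host differs from the page's own host and how many distinct sites they point to. Note that it treats www.example.com and example.com as different hosts, and the threshold is only the figure quoted above, not a documented search-engine limit.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Rule of thumb from the text above, not a documented search-engine limit.
LINK_FARM_THRESHOLD = 100


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def outbound_links(html: str, page_url: str) -> list[str]:
    """Absolute http(s) links whose host differs from the page's own host."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    own_host = urlsplit(page_url).hostname
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.hostname != own_host:
            result.append(absolute)
    return result


def link_profile(html: str, page_url: str) -> dict[str, object]:
    links = outbound_links(html, page_url)
    return {
        "outbound": len(links),
        "distinct_hosts": len({urlsplit(u).hostname for u in links}),
        "possible_link_farm": len(links) > LINK_FARM_THRESHOLD,
    }
```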
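Proper validation is a job for a real validator, but the doctype, charset and block-level points can be spot-checked in a script. This Python 3 sketch checks for an HTML doctype, for a charset declaration near the top of the page, and whether a handful of block-level elements with required end tags are opened and closed the same number of times. The tag list is an arbitrary choice for illustration, and matching counts do not prove the markup is valid.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Block-level elements whose end tags are required. p and li are left out
# because their end tags are optional in HTML. Arbitrary, illustrative list.
BLOCK_TAGS = {"div", "table", "ul", "ol", "form"}


class TagCounter(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.opened: Counter[str] = Counter()
        self.closed: Counter[str] = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.opened[tag] += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.closed[tag] += 1


def basic_markup_checks(html: str) -> dict[str, object]:
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    return {
        # Optional byte-order mark, whitespace, then <!DOCTYPE html ...>
        "doctype": re.match(r"\ufeff?\s*<!doctype\s+html\b", html, re.IGNORECASE) is not None,
        # HTML5 expects the charset declaration within the first 1024 bytes;
        # the first 1024 characters are used here as an approximation.
        "charset": re.search(r"<meta[^>]*charset\s*=", html[:1024], re.IGNORECASE) is not None,
        "unbalanced_blocks": sorted(t for t in BLOCK_TAGS if counter.opened[t] != counter.closed[t]),
    }
```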
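To audit the affiliate-link point, this Python 3 sketch lists links to affiliate hosts that lack `rel="nofollow"`. The names in `AFFILIATE_HOSTS` are hypothetical placeholders; substitute the affiliate networks you actually link to.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical affiliate hosts; replace with the ones you actually use.
AFFILIATE_HOSTS = {"affiliates.example-network.com", "shop.example-partner.com"}


class AffiliateLinkAudit(HTMLParser):
    def __init__(self, affiliate_hosts: set[str]) -> None:
        super().__init__()
        self.affiliate_hosts = affiliate_hosts
        self.missing_nofollow: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        # urlsplit lower-cases the host name, so case differences don't matter.
        if urlsplit(href).hostname in self.affiliate_hosts:
            rel_values = (attributes.get("rel") or "").lower().split()
            if "nofollow" not in rel_values:
                self.missing_nofollow.append(href)


def affiliate_links_missing_nofollow(html: str,
                                     affiliate_hosts: set[str] = AFFILIATE_HOSTS) -> list[str]:
    audit = AffiliateLinkAudit(affiliate_hosts)
    audit.feed(html)
    audit.close()
    return audit.missing_nofollow
```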
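Several of the on-page items (page title, headings, alt text) can be spot-checked with a short Python 3 script. This sketch reports the page title, the number of `h1` headings and the images that have no `alt` attribute at all. How much each of these matters is a judgment call; the script only gathers the facts.

```python
from html.parser import HTMLParser


class OnPageAudit(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.title_parts: list[str] = []
        self.in_title = False
        self.h1_count = 0
        self.images_without_alt: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "h1":
            self.h1_count += 1
        elif tag == "img" and "alt" not in attributes:
            # alt="" is fine for purely decorative images; only a missing alt is flagged.
            self.images_without_alt.append(attributes.get("src") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def on_page_audit(html: str) -> dict[str, object]:
    audit = OnPageAudit()
    audit.feed(html)
    audit.close()
    return {
        "title": " ".join("".join(audit.title_parts).split()),  # whitespace collapsed
        "h1_count": audit.h1_count,
        "images_without_alt": audit.images_without_alt,
    }
```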
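For the duplicate title and description issue in the indexing point, here is a Python 3 sketch that groups pages sharing the same value after collapsing whitespace and ignoring case. The `pages` dictionary is an assumed input, for example built by extracting each page's title and meta description during a crawl.

```python
from collections import defaultdict


def find_duplicates(pages: dict[str, dict[str, str]], field: str) -> dict[str, list[str]]:
    """Group URLs whose `field` (e.g. "title" or "description") is identical.

    Values are compared after collapsing whitespace and lower-casing, so
    "Blue  Widgets" and "blue widgets" count as the same.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, meta in pages.items():
        value = " ".join(meta.get(field, "").split()).lower()
        if value:  # a missing title or description is a separate problem
            groups[value].append(url)
    return {value: sorted(urls) for value, urls in groups.items() if len(urls) > 1}
```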
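Before changing robots.txt, it helps to test which URLs a set of rules actually blocks. Python's standard `urllib.robotparser` can do this offline. One caveat: it applies rules in file order and does not understand Google's `*` and `$` wildcard extensions, so treat its answers as an approximation of Googlebot's behaviour. The rules and URLs in the usage lines are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt rules block for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical rules and URLs, for illustration only.
rules = """User-agent: *
Disallow: /cgi-bin/
Disallow: /print/
"""
print(blocked_urls(rules, ["http://www.example.com/", "http://www.example.com/print/page1.html"]))
# ['http://www.example.com/print/page1.html']
```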
Finally, remember: CRAWLABILITY is the number 1 technical requirement for a
site even to begin being indexed. Watch Matt Cutts's video in which he
explains many of the concepts involved in SEO; he mentions crawlability at
around 1 minute and again at around 3 minutes into the video.
301 Redirection on an Apache Server
NB: There are more comprehensive tutorials at faqhowto.info.
Here I provide the 3 kinds of 301 redirect most often needed, plus a
combination. Others have done this, and maybe better, but I have pieced this
together and tested it on my server, so I know it works.
Add the following directives to an .htaccess file and upload it to the root
folder (using text/ASCII transfer mode).
NB: If your site is meant to work over https, replace http:// with https://
in all URLs in the examples below.
- Redirecting from an old domain to a new domain called www.newdomain.com,
preserving the same internal website structure and page names, so only the
domain changes. This goes into the .htaccess file on the OLD domain:
RewriteEngine on
RewriteRule (.*) http://www.newdomain.com/$1 [R=301,L]
- Redirecting non-www URLs to www URLs on the same domain, www.example.com:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
Or, to consolidate all parked domains and any IP-based or hosting-userid-based
addresses into one main domain:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
- Redirecting oldpg.html to newpg.html on the same domain, www.example.com
(keep it all on one line):
RedirectMatch 301 ^/oldpg\.html$ http://www.example.com/newpg.html
- Several URL-rewriting directives can be combined. For instance, this useful
combination redirects non-www URLs to www URLs and /index.html to / (the
website root or folder root):
RewriteEngine On
RewriteBase /
### redirect index.html to root /, straight to the www host (avoids a double redirect) ###
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ http://www.example.com/$1 [R=301,L]
### redirect non-www to www ###
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
####
But a word of caution: the above may interfere with certain addressing
schemes. In particular, if you are using FrontPage, you will need to make
adjustments. Refer to this article for more information.
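A note on the patterns in these directives: they are regular expressions, so an unescaped `.` matches any single character. `^/oldpg.html$` would therefore also match a request for `/oldpgxhtml`, and `^www.example\.com$` would accept a host such as `wwwxexample.com`; escaping each dot as `\.` restricts the match to a literal dot. Python's `re` module treats these simple constructs the same way Apache's regular expressions do, so it is a convenient place to try a pattern before uploading it:

```python
import re

# Unescaped dot: matches any character, so "/oldpgxhtml" slips through.
loose = re.compile(r"^/oldpg.html$")
# Escaped dot: matches only a literal ".".
strict = re.compile(r"^/oldpg\.html$")

for path in ("/oldpg.html", "/oldpgxhtml"):
    print(path, bool(loose.match(path)), bool(strict.match(path)))
# /oldpg.html True True
# /oldpgxhtml True False
```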