Ever since I started hosting this blog on WordPress, I have been looking for patterns and tweaks to reduce my bandwidth consumption. Most hosting providers charge you for bandwidth, so it is always a good idea to see where you can cut down.
When I say optimize, I don’t mean reducing size unnecessarily. A good theme with good content definitely adds to the experience of visitors to your site, and bandwidth optimization shouldn’t get in the way of that. A lot of these points are also closely related to the SEO of your site.
So here are some things you can do to optimize the bandwidth.
- Ensure that you have a /favicon.ico
A lot of browsers expect this file to be there and request it automatically. If you don’t have it, your site returns a 404 error, and every such request adds the size of your 404 page to your bandwidth.
- Make sure your 404 pages are not bulky
You never know how crawlers behave. If you have a URL like http://example.com/category/postname/post-id, many crawlers also request each of the parent levels. For example, they expect to see something at http://example.com/category/postname, http://example.com/category and so on. So either have some content there, set up a 301 redirect to a good URL, or at least make sure your 404 pages are small and informative.
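To get a feel for how much bandwidth missing URLs (including /favicon.ico) really cost, you can total the bytes served with a 404 status from your access log. The following is only a rough sketch: it assumes the common/combined log format that Apache and Nginx write by default, and the sample lines, addresses and sizes are made up for illustration.

```python
import re
from collections import Counter

# Picks the request path, status code and response size out of a
# common/combined log format line (assumed format, adjust to your server).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def bytes_spent_on_404s(lines):
    """Return a Counter mapping each missing path to the bytes served for it."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None or match.group("status") != "404":
            continue
        size = match.group("size")
        # A "-" in the size field means no body was sent.
        totals[match.group("path")] += 0 if size == "-" else int(size)
    return totals

# Made-up sample lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Oct/2010:13:55:36 +0000] "GET /favicon.ico HTTP/1.1" 404 5120',
    '203.0.113.5 - - [10/Oct/2010:13:55:40 +0000] "GET /category HTTP/1.1" 404 5120',
    '203.0.113.9 - - [10/Oct/2010:13:56:01 +0000] "GET /favicon.ico HTTP/1.1" 404 5120',
    '203.0.113.9 - - [10/Oct/2010:13:56:02 +0000] "GET /about HTTP/1.1" 200 20480',
]

# Print the most expensive missing paths first.
for path, total in bytes_spent_on_404s(sample_log).most_common():
    print(path, total)
```

In practice you would pass an open log file instead of `sample_log`; the paths at the top of the output are the ones worth fixing, redirecting or serving with a lighter 404 page.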
- Look for the bad bots
If you have a lot of content on your site, you might want to consider who should be allowed to crawl it and who shouldn’t. A simple way to do this is to use the Robots Exclusion Protocol (robots.txt) and add entries for the bots you don’t want crawling your site. This does not mean you should disallow everyone. Look for patterns. I have seen a few bots that don’t give you any SEO benefit (in other words, you never see any page views coming from those search engines), so I have tried blocking them while still allowing the lesser-known ones that don’t take up too much bandwidth.
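Before uploading a new robots.txt, it can help to check that the rules do what you intend. Here is a minimal sketch using Python’s standard `urllib.robotparser`; the user agent "BadBot" and the /wp-admin/ rule are hypothetical examples, not recommendations for any particular bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "BadBot" is a made-up user agent that is shut out
# completely, while everyone else is only kept out of /wp-admin/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

post_url = "http://example.com/category/postname"
print(is_allowed(ROBOTS_TXT, "BadBot", post_url))
print(is_allowed(ROBOTS_TXT, "Googlebot", post_url))
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/wp-admin/"))
```

Keep in mind that this only tells you what a well-behaved crawler should do with these rules; it says nothing about bots that ignore robots.txt.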
- Block the bad bots
This is related to the previous point. Sometimes a few bots don’t stop crawling your site in spite of their agent being added to robots.txt. In this case, you can block the IP range these bots come from. You can spot the offending addresses by looking at your access logs or recent visit statistics, use the botsvsbrowsers site to find out the range they belong to, and then deny the whole range of IP addresses.
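Most servers and firewalls accept ranges in CIDR notation, so once you know the first and last address of a bot’s range you can convert it with Python’s standard `ipaddress` module. This is just a sketch: the addresses below come from a block reserved for documentation and stand in for a real bot’s range, and the helper names are my own.

```python
import ipaddress

def range_to_cidr(first_ip, last_ip):
    """Summarize an inclusive IP range as a list of CIDR blocks."""
    first = ipaddress.ip_address(first_ip)
    last = ipaddress.ip_address(last_ip)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]

def is_blocked(ip, cidr_blocks):
    """Return True if ip falls inside any of the given CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(block) for block in cidr_blocks)

# Documentation-only addresses standing in for a misbehaving bot's range.
blocked = range_to_cidr("192.0.2.0", "192.0.2.127")
print(blocked)
print(is_blocked("192.0.2.42", blocked))
print(is_blocked("192.0.2.200", blocked))
```

The resulting blocks can then be pasted into whatever deny rule your server or hosting control panel uses. Checking a few known-good addresses with `is_blocked` first is a cheap way to avoid locking out real visitors.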
- Use a feed proxy
Even if your posts are updated only once a day, you will see bots hitting your feed at frequent intervals (sometimes as often as every 5 minutes). The simplest solution for this is to use a feed proxy like Feedburner. Not only will it serve your feed for you, it also gives you good graphs of the feed accesses and the number of readers you have.
- Be on the lookout for patterns
Most of the time you will see patterns in how visitors reach you. There are a few key phrases for which you rank high and through which people find your site. Now, you could either drown the visitor in everything you have, or you could show the visitor exactly what they are looking for. That way, you keep the content short and precise and also save on bandwidth. A simple thing like using the sidebar selectively on your pages is a good place to start.
- Make sure there is no duplicate content
Sometimes you make a copy of some content and update one version while keeping the old one. If the old content has already been indexed, you are now giving the crawlers two different URLs with very similar content. This is bad both for bandwidth usage and for your site’s SEO.
- Use nofollow on links that you don’t want to be followed
Crawlers read the robots.txt file only once per session (or even less often). So if you see a particular URL taking up too much bandwidth and you want to limit the crawl, a better thing to do is to use ‘nofollow’ on the links pointing to it, so that crawlers that honor it do not follow those links.
- Be aware of the tags you use
While stuffing your pages with a lot of tags can help with dumb SEO, be aware that visitors reaching your site via these keywords may not find what they want. This not only increases the bounce rate of your site but also adds needlessly to your bandwidth consumption.
- Decrease the crawl speed
If it is Google that is taking up a lot of your bandwidth, you can use Google Webmaster Tools to reduce the crawl speed of the bot.
- Reduce homepage clutter
The homepage is often the most frequently visited page. Depending on the bounce rate of your site, it makes sense to reduce the clutter on your homepage and keep only those features that are actually used.
- Smush the images, compress the CSS/JS
A very effective way to reduce bandwidth consumption is to reduce the size of the content that is returned. YSlow gives some interesting tips, including reducing the number of HTTP requests, using a CDN, setting Expires headers, enabling GZIP compression, using ETags, and compressing CSS, JS and images. To compress CSS files you can use a CSS compressor, for JS there are JS compressors, and for images there is Smush It.
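If you are unsure whether GZIP compression is worth enabling, you can estimate the saving on your own files first. The sketch below uses Python’s standard `gzip` module on a made-up, very repetitive stylesheet, so the ratio it prints will be better than what a real file gets; the function name is my own.

```python
import gzip

def gzip_savings(payload):
    """Return (original_size, compressed_size) in bytes for the payload."""
    compressed = gzip.compress(payload)
    return len(payload), len(compressed)

# Made-up CSS, repeated to imitate a larger stylesheet.
rule = ".post-title { margin: 0 0 10px 0; padding: 0; font-weight: bold; }\n"
css = (rule * 200).encode("utf-8")

original, compressed = gzip_savings(css)
saving = 100 * (1 - compressed / original)
print(f"{original} bytes -> {compressed} bytes ({saving:.0f}% smaller)")
```

To try it on a real file, read the file in binary mode and pass its contents to `gzip_savings`. Text formats such as HTML, CSS and JS usually shrink a lot, while already compressed images barely change, which is why images need a dedicated tool.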
As you can see, there are quite a few things you can do to optimize the bandwidth consumption.