When we decided to give our website a brand new look, we also evaluated whether to change content management systems (CMSs), and decided to migrate from DotNetNuke to Concrete5. DotNetNuke had served us well, but we wanted to move to a more modern and lightweight framework with a better administrative interface. This migration presented us with a few challenges to overcome.
While we redesigned our company website, we didn’t want to wait to launch it until we had migrated all of our product websites to Concrete5. To combine the two systems under one domain name, we leveraged Varnish as a proxy to split requests between our LAMP stack hosting Concrete5 and our Windows server hosting DotNetNuke. We’ve found Varnish to be a robust yet lightweight tool for proxying, caching, and load balancing. You can read more about how we used Varnish to host Walkshed on Amazon Web Services in this related blog post.
Another concern when migrating CMSs is making sure that the old URLs continue to function and issue the right redirects to users and search engines. Andrew Jennings took this project on and whipped up a URL redirection system for us using Apache’s mod_rewrite module.
- As requests come into Varnish, we first test to see if Concrete5 can respond to the URL.
- If Concrete5 isn’t managing the content for the URL, the request bounces into our redirection system that maps old URLs to new URLs using a binary hashtable and issues 301 permanent redirects to the client. We found that this approach is faster than simply relying on Apache .htaccess files since we have several thousand legacy URLs to support.
- If the URL request is not in the mapping file, we simply forward the request to the Windows server to be served by DotNetNuke.
- If DotNetNuke returns a 404 error, we redirect the request back to Concrete5 to respond with a 404 error so that our redirection is invisible to the user.
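The decision flow above can be sketched in a few lines of Python. This is only an illustration of the ordering of the checks: the real logic lives in Varnish and Apache mod_rewrite configuration, and every name here (`route`, `Action`, the callables, and the sample URLs) is hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Action(Enum):
    SERVE_CONCRETE5 = "serve from Concrete5"
    REDIRECT_301 = "301 redirect to new URL"
    PROXY_DOTNETNUKE = "forward to DotNetNuke"
    CONCRETE5_404 = "Concrete5 renders the 404 page"


def route(
    path: str,
    concrete5_has_page: Callable[[str], bool],
    legacy_map: Dict[str, str],
    dotnetnuke_status: Callable[[str], int],
) -> Tuple[Action, Optional[str]]:
    """Return the action for a request path, plus a redirect target if any."""
    # 1. Concrete5 gets the first chance to answer.
    if concrete5_has_page(path):
        return Action.SERVE_CONCRETE5, None
    # 2. Known legacy URLs get a permanent redirect (a hash lookup,
    #    modeled here with a plain dict).
    new_url = legacy_map.get(path)
    if new_url is not None:
        return Action.REDIRECT_301, new_url
    # 3. Everything else goes to DotNetNuke, unless it would 404,
    #    in which case Concrete5 serves the error page.
    if dotnetnuke_status(path) == 404:
        return Action.CONCRETE5_404, None
    return Action.PROXY_DOTNETNUKE, None
```

Keeping the legacy mapping in a hash lookup means the cost of step 2 stays roughly constant no matter how many old URLs are in the table, which is the property that matters with several thousand entries.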
A final goal of the CMS migration was to improve page load times for our visitors. Having pages that load quickly not only improves the experience for our web visitors, but also has an impact on search engine rankings. Concrete5 suggested we set up Alternative PHP Cache (APC) to speed up the pages it is serving, which has definitely improved performance on our new pages.
We also found a great plugin for WordPress (which runs our blogs) called W3 Total Cache that can leverage APC as well. It provides a host of other functionality that we’re exploring, including HTML, JS, and CSS minification as well as pushing design assets to Amazon’s CDN.
I always find it amusing how something as straightforward as a new website launch has so much happening behind the scenes. To find out more about the content that went into the new Azavea website, read Rachel’s related blog post.
We’re excited for voting to open today for our entry into the NYC Big Apps Contest — Walkshed NYC.
Walkshed is very CPU intensive since we generate heatmaps for users’ custom walkability factors on the fly. Building on the work we did with using Amazon’s content delivery network for RedistrictingTheNation.com, we decided to expand our use of Amazon Web Services (AWS) for Walkshed as well as incorporate technology from the open source Varnish project.
Varnish for hardening (and an easier life)
Varnish is an HTTP accelerator that runs on Linux (and other Unix-style OSes). We experimented with Varnish to meet a few goals:
- Caching frequently requested files and heatmap tiles (e.g. the default walkability heatmap tiles)
- Scaling by letting Varnish load balance between multiple servers
- Improving reliability by allowing Varnish to resubmit failed requests and monitor server health
By pointing Walkshed.org directly to Varnish, we are able to adjust server configurations on the fly without bringing down our application. Currently, Varnish provides load balancing between 4 server instances which generate tiles using Walkshed’s DecisionTree engine. About 50% of the HTTP requests running through Varnish are cache hits, which helps eliminate unnecessary traffic clogging up our application servers.
One instance is hosted on our private server and is often able to meet demand, but adding 3 High-CPU Extra Large Instances from Amazon lets us improve fault tolerance and handle larger bursts in traffic. Varnish also monitors the health of our servers and removes them from the cluster if they become unresponsive.
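To make the load balancing, retry, and health-check ideas concrete, here is a minimal Python sketch of a round-robin pool that skips unhealthy backends and resubmits a failed request to the next one. It is not how Varnish is implemented or configured: Varnish decides health from periodic probes, whereas this sketch simply marks a backend unhealthy after one failed request. The class, function, and backend names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional


class BackendPool:
    """Round-robin over backends, skipping any marked unhealthy."""

    def __init__(self, names: Iterable[str]) -> None:
        self.names: List[str] = list(names)
        self.healthy: Dict[str, bool] = {name: True for name in self.names}
        self._next = 0

    def mark(self, name: str, is_healthy: bool) -> None:
        self.healthy[name] = is_healthy

    def pick(self) -> str:
        # Try each backend at most once, starting where we left off.
        for _ in range(len(self.names)):
            name = self.names[self._next]
            self._next = (self._next + 1) % len(self.names)
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy backends")


def fetch_with_retry(
    pool: BackendPool,
    send: Callable[[str], str],
    max_attempts: int = 2,
) -> str:
    """Send a request, resubmitting to another backend on failure."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        backend = pool.pick()
        try:
            return send(backend)
        except ConnectionError as exc:
            # Simplification: one failure removes the backend from rotation.
            pool.mark(backend, False)
            last_error = exc
    raise RuntimeError("all attempts failed") from last_error
```

With one private server and three EC2 instances in the pool, a failure on any single backend costs the visitor one retry rather than an error page.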
Amazon EC2 Instances (bigger is better)
Our Amazon instances are using the new EBS-based images to improve boot speed. We’ve found that it takes about 7 minutes from when we launch an instance until it is successfully added to our Varnish pool, which certainly isn’t bad. By combining Varnish with Amazon’s on-demand resources, we should theoretically be able to scale as much as necessary. For this demo application, scaling is a manual process, but we are looking toward a future where the cluster would scale automatically based on demand.
We also experimented with a few EC2 instance sizes. Since our application is CPU intensive, we found we had to go with the High-CPU Extra Large Instance to get decent performance. The instances still don’t match the performance we get on our private VMware-based server; our hunch is that this is due to layers of virtualization causing memory allocation to be slow.
Technologies Used:
Most of the web applications we build are either used internally by our clients or have a steady stream of public user activity. With our recent Redistricting the Nation launch we wanted to experiment with some optimizations to make our site more resilient to traffic spikes as well as to improve the user experience.
Our strategy is broken down into a few components:
This post covers the Cloudfront CDN.
Previously, we had experimented with Amazon’s Web Services stack to host applications, but we hadn’t experimented with their Cloudfront CDN product. Pricing for the CDN is quite similar to Amazon S3 and allows organizations to build scalable applications without the upfront cost of most CDNs. We decided to use the CDN to host some large JavaScript assets as well as our image components.
Cloudfront is quite easy to set up. We simply created an Amazon S3 bucket called s3.azavea.com and pointed a CNAME record for s3.azavea.com to the full bucket domain, s3.azavea.com.s3.amazonaws.com. Then, we enabled a Cloudfront distribution for the s3.azavea.com bucket using the free tool Cloudberry. Finally, we set up a CNAME record pointing cdn.azavea.com to the Cloudfront distribution domain, d17ib0dlm1q8qa.cloudfront.net, and we were rolling.
Since the CDN is heavily cached, it was easiest to use s3.azavea.com links during development to reduce the amount of file versioning that was necessary. Once we were settled on our assets, we switched to cdn.azavea.com links and started using the CDN.
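One way to make that switch painless is to build every asset link through a single helper, so moving from the bucket to the CDN is a one-flag change. The sketch below is an assumption about how this could be done, not a description of our templates: the helper name, the `use_cdn` flag, and the sample path are hypothetical, while the two hostnames are the ones described above.

```python
S3_HOST = "s3.azavea.com"    # direct bucket access, handy during development
CDN_HOST = "cdn.azavea.com"  # Cloudfront distribution, heavily cached


def asset_url(path: str, use_cdn: bool) -> str:
    """Build an asset URL against either the S3 bucket or the CDN."""
    host = CDN_HOST if use_cdn else S3_HOST
    return f"http://{host}/{path.lstrip('/')}"


# During development:
print(asset_url("/js/app.js", use_cdn=False))  # http://s3.azavea.com/js/app.js
# Once assets are settled:
print(asset_url("/js/app.js", use_cdn=True))   # http://cdn.azavea.com/js/app.js
```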
The speed of the CDN is quite astounding. Splitting assets across another domain name also lets the browser request more files at once, which improves the user experience. We were quite pleased with how easily we could offload assets to Cloudfront and realize gains with limited time investment.
A few notes to keep in mind when you are working with a CDN for the first time:
- Since there is no way to flush assets out of Cloudfront’s edge nodes, be sure to use file name versioning. This was a bit alien to us, but is easy to incorporate once you think it through. One exception: we decided not to set a far-future expiration header on our PDF assets, as they are often linked to directly and we wanted to be able to update them regularly.
- Speaking of PDFs, it seems that while Cloudfront supports byte-range requests for assets, it doesn’t send the “Accept-Ranges: bytes” HTTP header. As a result, our large PDFs download fully before Adobe displays them within the browser. Unfortunately, there is no way to add this header at the moment.
- Cloudberry is great for adding HTTP headers to S3 assets. We decided that most of our assets would have a six-month cache lifespan by asserting the “Cache-Control: max-age=15552000” header.
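Two of the notes above are easy to illustrate in Python: file name versioning and the six-month max-age value. This is a sketch under stated assumptions rather than the tooling we used. Putting a content hash in the file name is one common versioning scheme (a hand-maintained version number works too), and the function names and sample file names are hypothetical. The max-age arithmetic treats six months as 180 days, which is what 15552000 seconds works out to.

```python
import hashlib
from pathlib import PurePosixPath

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def versioned_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension.

    Any change to the content yields a new name, so edge nodes fetch the
    new file instead of serving a stale cached copy.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def cache_control(days: int) -> str:
    """Build a Cache-Control header line with max-age given in days."""
    return f"Cache-Control: max-age={days * SECONDS_PER_DAY}"


print(cache_control(180))  # Cache-Control: max-age=15552000
```

With this scheme, a far-future cache lifetime is safe for versioned assets because a changed file is published under a new name, while files that must keep a stable URL (like directly linked PDFs) need a shorter lifetime instead.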