Getting an ArcGIS Server Map Cache in S3

When deciding how to best handle the air photos in the new Philadelphia Water Department Stormwater Map Viewer, we kicked around a few ideas. We decided to put the cache in Amazon’s Simple Storage Service to offload some of the local disk requirements and leverage their fast data storage and delivery infrastructure. In moving the process, we learned a few things:

Tune Your Cache

Make sure you spend time planning the cache. Not only will the cache look better in the final application, but it will also load to S3 faster and cost less in the long run.

  • Set the extents in the MXD or MSD before publishing to a map service. Transferring the 254-byte empty tiles put a lot of unnecessary burden on the upload process, and you also pay for them to be stored in the cloud. If it doesn’t need to be there, don’t build it.
  • Choose the correct image format for the cache. If you are caching a base map and do not need to support transparency, make it a JPEG. If it needs to support background transparency, use PNG. ESRI’s suggestions for planning a map cache can be found here.
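If a cache has already been built with empty tiles in it, one way to keep them out of the upload is to filter on file size. This is only a sketch, not the process we used: the function name is hypothetical, and it assumes every empty tile is exactly 254 bytes, so check a few tiles in your own cache first.

```python
import os

# Assumption: empty tiles are exactly 254 bytes, the size seen in this cache.
# Verify against your own cache; a real tile could in principle match too.
EMPTY_TILE_BYTES = 254


def tiles_worth_uploading(cache_root, empty_size=EMPTY_TILE_BYTES):
    """Yield paths of files under cache_root whose size differs from empty_size."""
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) != empty_size:
                yield path
```

Setting the extents before building the cache is still the better fix, since it also saves the time spent generating those tiles.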

Get a Good Tool to Transfer the Files

I started with the free version of CloudBerry Lab’s S3 Explorer, but I had to move over 90 GB of data to my S3 bucket. CloudBerry S3 Explorer – Pro supports multithreading, which allowed up to 5 threads to enumerate the folders, copy the files, or apply the ACL. It is a low-cost application that more than pays for itself when moving a lot of files up to a bucket.

When transferring the files up, I worked in blocks of directories rather than a whole scale level at a time. It was quicker to work with 20 to 30 subdirectories than to grab a whole scale level. It required a little more management on my end, but progress was steadier.
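I did the batching by hand in the transfer tool, but the same idea can be sketched in a few lines. The function name and the default batch size of 25 are my own choices for illustration, picked to sit inside the 20 to 30 range mentioned above.

```python
import os


def directory_batches(level_dir, batch_size=25):
    """Return the subdirectories of level_dir as sorted batches of batch_size."""
    subdirs = sorted(
        entry.path for entry in os.scandir(level_dir) if entry.is_dir()
    )
    return [subdirs[i:i + batch_size] for i in range(0, len(subdirs), batch_size)]
```

Each batch can then be handed to whatever upload tool you use, and a failed batch can be retried without restarting the whole scale level.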

Accessing the Tiles

ArcGIS Server does not support cloud-hosted caches at the 9.3.1 release, so you’ll have to roll your own access layer. The ESRI JavaScript API and Flex API can be extended to use caches hosted in the cloud (Flex example from Mansour Raad). For the Philly Storm Water project, we were using OpenLayers, and someone had already rolled one for us: there is a patch that lets the client-side library access the cache directly, without communicating through ArcGIS Server. The one thing to note is that the Tile Origin is pretty touchy; we had to make some adjustments to the origin values to make sure everything lined up correctly.
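To show why the origin matters so much, here is a rough sketch of how a client can work out which tile to request. It is not the OpenLayers patch itself. It assumes the origin is the upper-left corner of the tiling scheme, 256-pixel tiles, and the exploded cache naming of a two-digit level folder with eight-digit hexadecimal row and column names; the function names are hypothetical.

```python
import math


def tile_index(x, y, origin_x, origin_y, resolution, tile_size=256):
    """Return (row, col) of the tile containing map point (x, y).

    Assumes an upper-left tile origin, so columns grow eastward and
    rows grow southward.
    """
    tile_span = resolution * tile_size  # map units covered by one tile edge
    col = math.floor((x - origin_x) / tile_span)
    row = math.floor((origin_y - y) / tile_span)
    return row, col


def tile_key(level, row, col, ext="jpg"):
    """Build a relative tile path such as L05/R0000001a/C0000002b.jpg."""
    return f"L{level:02d}/R{row:08x}/C{col:08x}.{ext}"
```

Because the row and column come from a floor of the distance to the origin, a small error in the origin values can push points near a tile edge into the neighboring tile, which is the kind of misalignment we had to adjust for.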

Summary

Now that the site is up and we are starting to get some traffic, it is clear that putting the tiles in S3 was the right decision. There is no reason for ArcGIS Server to waste any cycles moving tiles around; let it do the heavy lifting with the vector layers and queries. Hopefully the rumors are true, and the ArcGIS Server 10 release will be more aligned with cloud computing. Until then, there are still plenty of ways to take advantage of the benefits.

Envisioning Development

This is so simple, it’s cool: http://envisioningdevelopment.net/map

I especially like the hourglass-like way the columns populate. It gives one the feel of really counting things, like when you switch between East Harlem and the Upper East Side.

I would like to be able to see the distribution over the whole city, or the gradients between neighborhoods, but that’s just me. I think the design is neat and clean, and tells a very compelling story.

Philadelphia Civic Hackathon creates a Gang Survey App

Sunlight Labs recently held its Great American Hackathon, an event that encourages groups in each region of the United States to gather together on one weekend and create software that will make government more open. Two Avencia employees, David Middlecamp and yours truly, participated in the Philadelphia version and also hosted the event in our offices. Josh Tauberer, a PhD candidate at UPenn and developer of GovTrack.us, organized the event.

Seven of us came together to create a web-based visualization and display tool based on data from the New Jersey Gang Survey 2007. The NJ State Police have been conducting these surveys every three years since 2001. Using Django, MySQL, OpenLayers, OpenStreetMap and ArcGIS Desktop, we put together a full-blown app in two days. Two analysts from the New Jersey State Police joined us on Saturday, explained the background on the data set, wrote up the text and other content for the site and answered questions on how the data was structured.

New Jersey Gang Survey Viewer

The result is The New Jersey Gang Survey Viewer. Check it out. I was amazed by how much a small group could accomplish in such a short time frame, particularly when most of the participants neither knew each other nor knew many of the technology tools when they started. The players were:

Scaling Walkshed.org with Varnish and Amazon Web Services

We’re excited for voting to open today for our entry into the NYC Big Apps Contest, Walkshed NYC.

Walkshed is very CPU intensive since we generate heatmaps for users’ custom walkability factors on the fly.  Building on the work we did with using Amazon’s content delivery network for RedistrictingTheNation.com, we decided to expand our use of Amazon Web Services (AWS) for Walkshed as well as incorporate technology from the open source Varnish project.

Varnish for hardening (and an easier life)

Varnish is an HTTP accelerator that runs on Linux (and other Unix-style OSes). We experimented with Varnish to meet a few goals:

  • Caching frequently requested files and heatmap tiles (i.e. the default walkability heatmap tiles)
  • Scaling by letting Varnish load balance between multiple servers
  • Improving reliability by allowing Varnish to resubmit failed requests and monitor server health

By pointing Walkshed.org directly to Varnish, we are able to adjust server configurations on the fly without bringing down our application.   Currently, Varnish provides load balancing between 4 server instances which generate tiles  using Walkshed’s DecisionTree engine.  About 50% of the HTTP requests running through Varnish are cache hits, which helps eliminate unnecessary traffic clogging up our application servers.

One instance is hosted on our private server and is often able to meet demand, but adding 3 High-CPU Extra Large Instances from Amazon lets us improve fault tolerance and handle larger bursts in traffic.  Varnish also monitors the health of our servers and removes them from the cluster if they become unresponsive.
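The behavior described above, rotating requests across servers and skipping any that fail a health check, can be pictured with a toy model. This is not Varnish code or configuration, and it is not how Varnish is implemented internally; the class and backend names are made up for illustration.

```python
class RoundRobinPool:
    """Toy round-robin pool that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._next = 0  # index of the backend to try first on the next pick

    def mark(self, backend, is_healthy):
        """Record the result of a health probe for one backend."""
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self):
        """Return the next healthy backend, or None if none are healthy."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        return None
```

With one private server and three EC2 instances in the pool, marking an instance unhealthy simply takes it out of the rotation until a later probe marks it healthy again.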

Amazon EC2 Instances (bigger is better)

Our Amazon instances are using the new EBS-based images to improve boot speed.   We’ve found that it takes about 7 minutes from when we launch an instance until it is successfully added to our Varnish pool, which certainly isn’t bad.   By combining Varnish with Amazon’s on-demand resources, we should theoretically be able to scale as much as necessary.  For this demo application, scaling is a manual process, but we are looking toward a future where the cluster would scale automatically based on demand.

We also experimented with a few EC2 instance sizes.   Since our application is CPU intensive we really found we had to go with the High-CPU Extra Large Instance to get decent performance.   The instances still don’t meet the performance we get on our private VMware-based server, but our hunch is that this is due to layers of virtualization causing memory allocation to be slow.

Technologies Used:

Ignite: Spatial, Boston

I got the opportunity to present at Ignite: Spatial, Boston a couple weeks ago.  I was fortunate to present Sourcemap.org in the company of other Boston area techies doing some cool work in laser scanning, CityML, social media and more.

All the videos are on YouTube. The presentation summaries are also online in this Google Doc.

Enjoy your spatial ignition this morning.

Echoes of the Browser Wars

I caught this link in my feeds today: http://radar.oreilly.com/2009/12/google-android-on-inevitabilit.html

A good read on where mobile devices are, and why it is a non-trivial thing to gain market share in the mobile market.  Specifically, the article discusses the hurdles that Google is trying to jump with its investment into Android, and how Apple is setting the bar high with its i* products.

One of the things that jumps out at me is that the technical challenges of mobile development are nearly synonymous with those in web application development. Mark Sigal points out that development in the mobile realm is essentially heterogeneous. A few weeks ago I had a conversation with a team lead at uLocate, who explained the matrix that characterizes this heterogeneity. It’s nuts. It’s a 4-dimensional matrix, where the dimensions are: Device, Carrier, Platform, and OS.

I’m comparing it to the browser wars because when I test KIF (Kaleidocade Indicators Framework) I look at the application across a 3-dimensional matrix, where the dimensions are: Browser, Version, and OS.
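To get a feel for how quickly such a matrix grows, here is a small sketch that enumerates every combination of dimension values. The function name is hypothetical, and any values you feed it are placeholders rather than my actual test list.

```python
from itertools import product


def build_matrix(**dimensions):
    """Return every combination of the given dimension values as a list of dicts."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


# Placeholder values: 4 browsers, 2 versions each, 3 operating systems.
web = build_matrix(
    browser=["A", "B", "C", "D"],
    version=["previous", "current"],
    os=["X", "Y", "Z"],
)
print(len(web))  # 4 * 2 * 3 combinations
```

The size is the product of the dimension lengths, so adding a fourth dimension with only five values multiplies the number of test cases by five.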

I can see how that similarity may make it easy for a developer to switch between developing a mobile application and developing a web application, since the testing strategy would be very similar.  I would like to see that transition be a smooth one (as a web developer and someone with a recreational interest in developing tools/toys for mobile devices), so that warms my heart.

However, what I see as a dangerous element to that matrix is how it can get so big so fast.  In the browser market, the matrix is limited to only a handful of items in each dimension.  In the mobile market, however, the number of handsets is always growing — so much so that it’s hard for developers to keep up.  Russell Beattie (Nokia employee) puts it this way (full article):

Multiply the number of models [Nokia puts out] per year (10-20) by the number of years Symbian’s been around by the various custom carrier modifications, and you get complete developer and consumer confusion.

From the chatter I’ve seen, it seems like it’s going to be a teething process for Google, then all-out mobile platform wars after that. The end result? Probably the same as where we are today in terms of browsers: supporting about 4 major browsers, with minor differences between them. That provides support to about 97% (as of 12/4/2009) of all browsers out there. Not bad, but it’ll take mobile a while to get there, and I suspect there will be some corporate bloodletting before it’s all over.

Wireframing w/ Balsamiq – 5 Tips

I have been spending a lot of time pulling together new wireframes for HunchLab using Balsamiq over the past couple of weeks. In doing so I have learned a few tricks along the way.

5. Get Additional Controls

If you are looking for particular interfaces or additional controls, check out Mockups to Go. I was able to find a ribbon toolbar, an iPhone interface and a series of pointers that I could bring right into the wireframes.

4. Change to System Fonts

I understand the application’s goal of making things look sketchy so that reviewers focus on functionality rather than the specifics of the UI, but I definitely believe there is a reason why Comic Sans has been identified as the worst font ever (link to Google search for some supporting details).

There are a couple of ways to do this. The first is the most straightforward: select ‘View’ from the toolbar at the top and then choose ‘Use System Fonts’.

Another (slightly more complicated) method is to edit the Balsamiq Mockups config file. Here are the detailed instructions from Balsamiq’s website. Going this path gives a little more flexibility in the font used in the design process.

3. Clone Current Mockup & Duplicate Feature

I found these two functions to be great time savers. With the duplicate feature tool, no matter how many features are selected, it is a single-click copy and paste function.

The cloning of mockups lets you easily duplicate the interface and either create the next step in a series of wireframes or rearrange the controls to look at a different configuration.

2. Internal Links to Replicate Interaction

I have been using the links on controls to allow users to interact with the wireframes. Before any code is written, users can interact with the UI and give feedback on workflows and page interactions.

By default when working in the ‘Full Screen Presentation’ mode, there is a large exaggerated cursor that was confusing to some of the people I worked with.

In the bottom right-hand corner of the presentation mode, there is a tool to set the cursor to be more like a normal user experience when interacting in a browser or desktop environment.

1. Link to APIs for Images

Creating graphics to represent graphs or maps can be a time-consuming task. There are a number of APIs out there that can be used to get a good representation of what will be presented in the interface. A few of my personal favorites are:

Google Static Maps API

Sparklines from Joe @ BitWorking


Google Charts API
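As an example of the idea, a placeholder map image for a wireframe can come from a single URL. This sketch builds one for the Google Static Maps API using its center, zoom, size and key parameters; the endpoint shown is the current one as far as I know, the function name is my own, and the key is a placeholder you would replace with yours.

```python
from urllib.parse import urlencode

# Assumption: the current Static Maps endpoint; check Google's documentation.
STATIC_MAPS_ENDPOINT = "https://maps.googleapis.com/maps/api/staticmap"


def static_map_url(center, zoom, width, height, api_key):
    """Build a Static Maps image URL for a wireframe placeholder."""
    params = {
        "center": center,            # address or "lat,lng"
        "zoom": zoom,                # integer zoom level
        "size": f"{width}x{height}",  # image size in pixels
        "key": api_key,              # placeholder; supply your own key
    }
    return f"{STATIC_MAPS_ENDPOINT}?{urlencode(params)}"


url = static_map_url("Philadelphia, PA", 12, 400, 300, "YOUR_API_KEY")
```

The resulting URL can be used wherever the mockup accepts an image address, so the wireframe shows a real map of the right area instead of a grey box.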