Monday, 5 March 2012

Managing Varnish Content with MULE

A brief post on event-driven management of Varnish Cache content.  This is something that Richard Guest and I have been experimenting with off and on for a while.  The concepts are similar to those covered by David Harrigan in a couple of recent blog posts (part 1, part 2).  David's posts explain the concepts well and are worth a read.  There are a few differences in our implementation that deserve a mention.

Banning Versus Purging

We want to let Varnish serve stale content when it can't reach the origin servers.  We also need to be able to use wildcards in the ban expressions.  This rules out HTTP purging for us - with an HTTP purge there is no option to use wildcards.  Also, with a successful purge the content is gone from the cache whether the origin server is reachable or not - there is no option for Varnish to serve stale content.


The ESB

We use MULE for our ESB and ActiveMQ as the messaging provider.  This isn't an important difference, but it's useful to know that you can manage content with a variety of tools.

Less Code

I like Java most of the time, but it can be frustratingly verbose to use with HTTP.  I don't want to have to write code just to use a custom string for the HTTP method (e.g., BAN).  The workaround is to change the Varnish config to use an HTTP method name that is already available in Java.

The Implementation

We're going to ban content with wildcards.  We have no other need for the HTTP DELETE method, so we use that to implement BAN in our Varnish config.  We also allow an additional HTTP header (X-BanExpression) to avoid having to deal with URL encoding for regular expressions.


This means we don't need any custom implementations for MULE HTTP outbound endpoints:
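As a rough sketch (this is not the actual project configuration - the flow name, topic name, Varnish host, and the Mule expression syntax are all assumptions), a flow can take a ban expression from a JMS message and hand it to Varnish with a plain HTTP outbound endpoint:

  <mule xmlns="http://www.mulesoft.org/schema/mule/core"
        xmlns:jms="http://www.mulesoft.org/schema/mule/jms"
        xmlns:http="http://www.mulesoft.org/schema/mule/http"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="
          http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/3.2/mule.xsd
          http://www.mulesoft.org/schema/mule/jms http://www.mulesoft.org/schema/mule/jms/3.2/mule-jms.xsd
          http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/3.2/mule-http.xsd">

    <jms:activemq-connector name="activemq" brokerURL="tcp://localhost:61616"/>

    <flow name="banVarnishContent">
      <!-- The ban expression (a regular expression) arrives as the JMS message payload -->
      <jms:inbound-endpoint topic="varnish.ban" connector-ref="activemq"/>
      <!-- Copy the payload to an outbound property, which the HTTP transport sends as a
           request header (expression syntax varies between Mule versions) -->
      <message-properties-transformer scope="outbound">
        <add-message-property key="X-BanExpression" value="#[payload:]"/>
      </message-properties-transformer>
      <!-- A standard endpoint with no custom code: DELETE is mapped to BAN in our Varnish config -->
      <http:outbound-endpoint host="varnish.example.com" port="80" method="DELETE"/>
    </flow>
  </mule>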


The MULE project is available on GitHub.

Other Considerations

We run multiple origin servers for Varnish.  This means we would like redundancy in our messaging as well.  The best way to achieve redundancy depends on the messaging provider.  If possible, it makes sense to run a messaging provider with each origin server and use failover in the ESB client on each Varnish server, as sketched below.  I've covered failover options with MULE and ActiveMQ previously.
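As a rough sketch (the host names here are hypothetical), the ActiveMQ connector on each Varnish server could list the brokers running alongside the origin servers in a failover URL:

  <jms:activemq-connector name="activemq"
      brokerURL="failover:(tcp://origin1.example.com:61616,tcp://origin2.example.com:61616)"/>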

Conclusions

This is a potentially powerful solution: efficient caching, with new content becoming available as soon as possible.  The flexibility of the implementation is also a testament to Varnish and its focus on doing one thing and doing it really well.

Saturday, 3 March 2012

Mule ESB - Resilient JMS

We use the Community Edition of Mule ESB for several tasks, including processing earthquake messages.  A critical function is to read earthquake information messages from SeisComP3.  These are output to a spool directory as SeisComPML, converted to a simple XML format, and put onto the messaging layer (in this case JMS provided by ActiveMQ).  The seiscomp-producer Mule project that does this is available on GitHub.

I've been working on converting the seiscomp-producer project from Mule 2 to Mule 3 and revisiting resilience in the ESB.  In this post we'll look at the ActiveMQ JMS connectors: getting the application to start with ActiveMQ down, and surviving ActiveMQ restarts, without having to restart Mule.  Retries for failed connectors are available in the Enterprise Edition of Mule, or by using the common retry policies with the Community Edition.  As it turns out, we don't need either with ActiveMQ.

Testing with Mule 3

I'm testing with Mule 3.2.0 and ActiveMQ 5.5.1.  Mule 3 makes it very easy to hot deploy multiple applications into the Mule server.  To help with testing seiscomp-producer I've created a very simple Mule application (chatter) that uses a Quartz timer to send a message to the logs every 5 seconds.
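A minimal sketch of that application (the exact configuration may differ) looks something like this:

  <mule xmlns="http://www.mulesoft.org/schema/mule/core"
        xmlns:quartz="http://www.mulesoft.org/schema/mule/quartz"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="
          http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/3.2/mule.xsd
          http://www.mulesoft.org/schema/mule/quartz http://www.mulesoft.org/schema/mule/quartz/3.2/mule-quartz.xsd">

    <flow name="chatter">
      <!-- Generate an event every 5 seconds -->
      <quartz:inbound-endpoint jobName="chatterJob" repeatInterval="5000">
        <quartz:event-generator-job>
          <quartz:payload>chatter is still alive</quartz:payload>
        </quartz:event-generator-job>
      </quartz:inbound-endpoint>
      <!-- Write a line to the Mule log so we can see the application is running -->
      <logger level="INFO" category="chatter" message="chatter is still alive"/>
    </flow>
  </mule>

Because chatter has no JMS dependency it should keep logging whatever happens to ActiveMQ, which makes it a handy control for the tests below.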

The seiscomp-producer uses two JMS ActiveMQ endpoints to send messages: one durable (for the quake messages, which must be delivered) and one non-durable for its own heartbeat messages.
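The connectors look something like this (the connector names are assumptions; see the project on GitHub for the real configuration):

  <jms:activemq-connector name="activemq-durable"
      brokerURL="${seiscomp.producer.amq.url}"
      persistentDelivery="true"/>

  <jms:activemq-connector name="activemq"
      brokerURL="${seiscomp.producer.amq.url}"
      persistentDelivery="false"/>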


In this case seiscomp.producer.amq.url=tcp://localhost:61616

With ActiveMQ running, when we start Mule the seiscomp-producer starts fine, connects to JMS, and starts sending its heartbeat messages to the soh JMS topic.


However, there are problems with this configuration:

  • If ActiveMQ is down when Mule starts, the seiscomp-producer is not started.
  • If ActiveMQ is restarted while Mule is running, the seiscomp-producer loses its connection to JMS and never reconnects.

The common retry policies address these issues by adding retries to Mule Community Edition connectors.  To get this functionality with Mule 3.2.0 and ActiveMQ 5.5.1, all we have to do is set the broker URL in the connectors to use the failover transport:

... brokerURL="failover:(${seiscomp.producer.amq.url})" ...

Now if we start Mule with ActiveMQ down, Mule starts OK (the chatter app starts logging), and when ActiveMQ starts, seiscomp-producer connects to JMS and starts sending messages.  Similarly, if we restart ActiveMQ while Mule is running, seiscomp-producer will reconnect to JMS.

Restarting ActiveMQ closes the socket nicely and gives the client a chance to notice the stopped connection.  What if ActiveMQ just goes away (e.g., the network connection is physically lost without a clean shutdown)?  We can test this by dropping packets using iptables.  Thanks to Richard Guest (a GNS Science coworker on the GeoNet Project) for the suggestion and the iptables rule to drop packets.   This time we run ActiveMQ on a remote server.  Once the connection is up, we drop all packets coming from the remote host.  From the perspective of the client, ActiveMQ is gone but there has been no clean shutdown to close the socket.  Here 'server' is the remote host that ActiveMQ is running on.

  iptables -I INPUT -s 'server' -j DROP

With packets being dropped, the client times out after the channel has been inactive for too long and attempts to reconnect using the failover URL.  This goes on until we stop dropping packets (simulating the ActiveMQ server becoming available again), at which point the client reconnects.

  iptables -D INPUT -s 'server' -j DROP

Using the failover broker URL you can define a list of brokers to try to connect to.  By default a broker is selected randomly from the list; append randomize=false to always use the first broker in the list:

 brokerURL="failover:(tcp://amq1.com:61616,tcp://amq2.com:61616)?randomize=false"

If the first broker fails, the client connects to the second broker.  With ActiveMQ 5.5.1 it does not then reconnect to the first broker when it becomes available again.  It looks like keeping the first broker as the preferred one should be possible in ActiveMQ 5.6 using priorityBackup.

The final connectors look something like:
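Again, the connector names are assumptions; only the brokerURL has changed from the earlier configuration:

  <jms:activemq-connector name="activemq-durable"
      brokerURL="failover:(${seiscomp.producer.amq.url})"
      persistentDelivery="true"/>

  <jms:activemq-connector name="activemq"
      brokerURL="failover:(${seiscomp.producer.amq.url})"
      persistentDelivery="false"/>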


Using ActiveMQ and Mule 3 we have easily achieved fault-tolerant messaging, with a range of options for controlling failover.

Finally, we're dealing with a very low message throughput.  We rely on message storage in Mule and ActiveMQ to keep important messages when the connection is lost.  If you try this approach with a high message throughput, do plenty of testing to ensure that messages do not get lost.