Monday, 1 October 2012

Optimizing Web Content – The Request is Evil, Stamp it Out Where Possible.


I've blogged previously about how we build fast web servers for hosting GeoNet content. If you care about a good user experience then you should care about having a fast web site and that means you should care about optimizing the content you serve as well as optimizing the web servers.

When you surf the web your browser requests objects (HTML, CSS, JS, images, etc.) from a web server, downloads them and uses those objects to render the web page that you view. The number of requests and the size of the objects all contribute to the so-called page weight. A weighty page is a slow page and can make for a bad browsing experience for your users.

As a quick aside; give the web developers a fighting chance and use a modern browser. If you're not using one then stop reading this and go install one now. Now! Go!

Consider the two web pages shown below. They both have exactly the same information content – the location of three earthquakes. The one on the left uses 9 requests and a download size of 92.4KB; the one on the right needs only 3 requests and a download size of 5.91KB. This is the sort of difference that can have a huge impact when your web servers are running at over 16,000 requests a second.


Now I'm not suggesting that we have no style or design on the internet (although you could probably run the entire web on a small fraction of the current number of servers and the CO2 emissions savings would be immense), but I am suggesting that you should strive to get your information density – in terms of requests and download size – as high as possible.

Optimizing Content



The fastest way to improve your web content performance is to become a Steve Souders devotee. He literally wrote the book on web performance, followed by another one. While you read his writings, or use any of the many open source tools he has been involved with, tip your hat to his current employer Google and his former employer Yahoo for supporting all of his work to improve the internet. If you work on web content and you haven't heard of Steve Souders then put down your triple latte and turn off that round-cornered MP3 player – you have important things to learn!

There are a couple of browser extensions that condense all of the rules and performance knowledge into simple-to-use tools – have a look at either YSlow or PageSpeed. The Google Chrome Developer Tools are also very useful.

The Request Is Evil


The greatest killer of a fast-loading web site is the request, and many web optimization techniques are about reducing the number of requests your browser needs to make to render a page. A request is made every time the browser has to fetch an object to display a web page. Making a request requires a TCP connection to the web server, and connections are expensive to set up and tear down. Browsers can make multiple requests (usually 6 to 8, depending on the browser) per connection to the server before the connection must be closed and re-established. This all takes time, and on a slow connection it can take a lot of time. In fact, depending on network speed, requests can have a much higher cost than large objects like images.

So if you want to make your web content faster, the first thing to get a handle on is the number of requests involved. Then start stamping out requests wherever you can. Reducing the number of requests is the quickest way to make your pages load faster.

So take a look at the number of requests needed to load your web pages or ours: open the developer tools in Google Chrome, load the page with an empty cache (or force a refresh) and look at the network tab. Check GeoNet's content (9–10 requests for the home page), then check some large online media sites (renowned criminals for slow pages). You may be shocked at the number of requests needed to load some of those pages – over 200?! Oh my, that's really going to stuff my web browsing experience.

Also, every single request requires web server resources to serve it. The more requests for your content, the bigger the hammer you're going to need on the server. This means that optimizing your web content can result in direct cost savings in the data centre.

Making it Better


To improve the performance of GeoNet web content we use many of the approaches advocated by Steve Souders. I'll cover the important ones here.

CSS Sprites


Use CSS sprites to combine images. Here's our sprite for the common logos and icons. Loading those images separately is 8 requests; combined it is 1. For the GeoNet home page, 5 of those images are used for the cost of only 1 request. That's a saving of 4 requests compared to loading them individually – instant win. The cost of having a slightly larger image than we need for the home page (the extra 3 images that aren't used on the home page) is more than covered by the reduction in the number of requests. But don't take my or anyone else's word for it: go dummy up some pages and investigate the size versus request trade-off using Hammerhead for yourself.
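For illustration only, here's a minimal CSS sketch of the technique – the file name, class names, sizes and offsets are made up, not our actual sprite:

/* One sprite image holds several icons; each class just shifts the visible region. */
.icon {
    background-image: url("/images/logos-sprite.png");  /* hypothetical path */
    background-repeat: no-repeat;
    display: inline-block;
    width: 32px;
    height: 32px;
}
.icon-rss      { background-position: 0 0; }
.icon-twitter  { background-position: -32px 0; }
.icon-facebook { background-position: -64px 0; }

Every icon on the page now comes out of the one image, so the browser pays for a single request no matter how many icons are displayed.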

Combine and Minify CSS and JavaScript


This reduces requests and also download size. We use YUI Compressor; there are other options out there. It can be hard to get this perfect across an entire site when different pages need different CSS and JavaScript.

And here's a hint: adding 'min' to your file names to show that you have minified them adds a completely unnecessary step to deployment (changing your script or CSS names in the pages). Keep the same name and minify the files as part of the deployment process, or possibly even on the fly as part of serving them.
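As a hedged sketch of that deployment step (assuming the YUI Compressor jar is on hand – the jar version and file names here are placeholders):

# Minify in place during deployment so the names referenced in the pages never change.
java -jar yuicompressor-2.4.7.jar --type css -o site.css.tmp site.css && mv site.css.tmp site.css
java -jar yuicompressor-2.4.7.jar --type js  -o site.js.tmp  site.js  && mv site.js.tmp  site.js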

Compress Images


Use a tool like pngcrush or an online service to compress images. The reductions, even with no loss of quality, can be significant.
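A typical pngcrush run looks something like this (file names are placeholders – check the options against your version):

# Lossless recompression: strip ancillary chunks, try colour-type reduction and brute-force the filter/compression methods.
pngcrush -rem alla -reduce -brute icons.png icons-crushed.png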

Set Far Future Expires Times on Content


The longer the expires time on an object, the more likely it will still be in the browser's cache next time it is needed, which can mean no request to the server is needed at all. The expires time is a web server configuration task that is made easier if you separate your content into directories organised by how often it needs to be updated (e.g., images versus icons).
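A minimal sketch of that idea for Apache httpd's mod_expires – the directory paths and lifetimes below are made up for illustration:

ExpiresActive On

# Logos and icons almost never change - let browsers keep them for a year.
<Directory "/var/www/static/icons">
    ExpiresDefault "access plus 1 year"
</Directory>

# Images that are replaced more often get a shorter lifetime.
<Directory "/var/www/static/images">
    ExpiresDefault "access plus 1 hour"
</Directory>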

Use Domain Sharding


A browser can make multiple requests (usually 6 to 8) per connection to a server. It can also make multiple parallel connections to different domains. You can trick a browser into making multiple parallel connections to the same server using a technique called domain sharding. Look at where all the images on a page like All Quakes come from and you will see that there are up to 6 server names (static1.geonet.org.nz, static2, etc.) involved in serving the small icon maps that show the quake location. All of those server names point to the same server. This is domain sharding. Conventional wisdom used to say 6 was the best number; there is currently some debate about this and the recommended number is being revised down. We may need to reduce the number of server names involved in serving GeoNet content.
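In markup terms the trick looks something like this – the image paths here are placeholders, only the host names matter, and both names resolve to the same server:

<!-- Two shards, one physical server: the browser opens a separate pool of connections for each name. -->
<img src="http://static1.geonet.org.nz/icons/quake-map-a.png" alt="quake location">
<img src="http://static2.geonet.org.nz/icons/quake-map-b.png" alt="quake location">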

Use Gzip Compression


Another web server technique to reduce the size of downloads on the wire. Across the web, only about two thirds of compressible material is actually served compressed. Roll your eyes and say lhaaaazy (then hurry off and check your settings).
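For Apache httpd this is mod_deflate territory. A minimal sketch – the MIME type list is an assumption, tune it to your own content:

# Compress the usual text types on the way out; images are already compressed, so leave them alone.
AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml application/javascript

You can then check a response from the command line with something like curl -sI -H 'Accept-Encoding: gzip' http://www.geonet.org.nz/ and look for a Content-Encoding: gzip header in the output.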


Conclusions


Web performance is a complex topic. Optimizing content can be tricky, and due to ever-changing browsers and techniques it can be hard to stay up to date (while writing this I've noted several things that we can improve). The best bet is to read one of the many excellent online resources, e.g., https://developers.google.com/speed/, and keep up with Steve Souders (if you can).

When you combine web content optimization with server optimization the difference between having a web site and having a fast web site can be an immense amount of work. But don't be put off, start now and make any improvement you can. Combine two images, stamp out one request and give yourself a pat on the back. Your users will thank you for it.


Monday, 17 September 2012

GeoNet Web Hosting – 16,257 Requests per Second. That's Apache httpd Serving Y'all



On July 3 2012 a deep magnitude 6.5 earthquake near Opunake was felt widely across New Zealand. Traffic to the GeoNet website topped out at 16,257 requests per second – only six minutes after the earthquake. It's this very fast rise, from the regular background traffic of a few requests a second to peaks like the one we just experienced, that makes our web hosting challenging. Our normal web traffic looks like a well organised denial of service attack.

The pattern of traffic to the GeoNet website www.geonet.org.nz before and after the Opunake earthquake. Traffic rises from the background of a few requests a second to 16,257 requests a second in six minutes.

To serve that traffic we use the venerable open source web server Apache httpd. We are not alone: since April 1996 httpd has been the most popular web server on the internet.

Making Apache httpd Fast

So it may be a surprise that we use Apache httpd, as there is an assumption that it is a heavy server and that there are faster alternatives. That may be true (see 'That Question' below). This isn't a web server shoot-out – it's about how we tuned httpd. The tuning approach applies to any web server.

The Apache httpd docs are excellent, and if you care about performance you should read at least the sections on performance tuning and multi-processing modules (MPMs).

Optimizing web server performance is simple, right? Just tweak your configuration to maximise the use of your hardware; the result is the number of concurrent connections you can handle. Here's how we did it:

Our Servers and Content

Servers: Cheap, nothing fancy. Quad processor, dual core, 8 GB RAM, running Linux. You can probably make an effective web server using other operating systems, just like you can probably win Bathurst using antiquated V8 technology, but you might need to ban the more efficient competitors first.

Hard drives: Dunno, who cares? Something fast I guess – at least probably not a tape drive, definitely not SSD. Modern Linux kernels use memory for disk caching to vastly speed up file access.

Virtual machines: Nope, not for this job. Instant performance loss even with the good virtualisation software.

We run six web servers in New Zealand, largely for reasons of geographic redundancy and uptime. They are separated into two Content Delivery Networks (CDNs). More on that in a future post.

The content being served here is static pages on disk – the ultimate cache. More on caching and dynamic content in a future post.

The Golden Rule: Swapping is Death

RAM used to be hugely expensive. Swapping allows a server to temporarily use disk in place of memory, and therefore fake having more RAM than you could afford – the only downside being a HUGE performance loss. Hard drive access is much, much slower than RAM, so if your process (say httpd) is being routinely swapped to disk it will slow to a crawl. Swapping is web server death.

To make a fast web server you want to tune things to get as many processes as possible to fit into the RAM available on the hardware. The objective becomes to get the Resident Set Size (RSS - the part of the web server process that is held in RAM) as small as possible. The smaller the RSS, the more processes the server can run and the more connections you can handle.


Apache httpd – All Things to All People

Apache httpd is a very flexible web server. Between its core functionality and additional modules, httpd can be used to solve a wide range of web hosting problems. The rich functionality comes at a cost: a larger RSS. If you use a package-installed httpd you may well get a lot of features you don't need. They are often provided as shared objects that can be dynamically loaded as required, but even that adds to the RSS because the shared object module itself must be enabled.

Apache httpd provides a number of different multi-processing modules (MPMs) that control how it handles requests for content. Choosing the correct one for your requirements can have a huge impact on performance. The default configuration choices tend to be conservative and are intended to provide the widest range of compatibility with possibly old software. The default choice for Unix systems is prefork, but the docs suggest this may not be the best choice for highly loaded systems. Don't trust me – go read the docs now. The docs also have this nugget:

'...Compilers are capable of optimizing a lot of functions if threads are used, but only if they know that threads are being used...'

Oh my, compilation... Fear not, it's easy as long as you don't peek under the covers.

Compiling and Tuning Apache

There are two tasks: choose a suitable MPM and get the RSS size down so we can run lots of processes. We use the worker MPM. If you need to run other modules or server-side scripting, you may have to use the prefork MPM.

We then look at what functionality we actually need Apache httpd to have and compile out everything we don't need:


./configure \
--prefix=/usr/local/apache \
--with-mpm=worker \
--enable-alias \
--enable-expires \
--enable-logio \
--enable-rewrite \
--enable-deflate \
...
--disable-authn \
--disable-authz \
--disable-auth \
--disable-dbd \
--disable-ext \
--disable-include \
...

make
make install

Follow this process and you will build an Apache httpd with the smallest RSS for your requirements.

Now just figure out how many processes you can run in the available RAM:

Find the total memory available:

cat /proc/meminfo | grep MemTotal

Stop httpd and figure out how much memory everything else running on the system is using:

ps aux | awk '{rss+=$6} END {print "rss =",rss/1000 "MB"}'

Subtract this from your total memory. Ponder the size of your content and how much memory caching you will use, and subtract a suitable number from the memory left. This is the total amount of RAM that you will let httpd use.

Fire up httpd and put it under load – you choose the tool; httperf is good. Then look at the httpd RSS:

ps -aylC httpd | grep httpd | awk '{httpdrss+=$8} END {print "rss =",httpdrss/NR/1024 "MB"}'
rss = 10.826MB

For our servers a calculator says we should be able to run 636 server processes, each running 50 threads per child worker, so 31,800 workers. Under test conditions httpd proved to be stable with this number of workers. For production we decided to be a little conservative and allow more memory overhead for other processes.
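Roughly, the arithmetic behind those numbers (using the 8 GB, 10.826 MB and 50-thread figures above, and rounding):

636 processes x 10.826 MB RSS ≈ 6.9 GB for httpd, leaving ~1.1 GB of the 8 GB for everything else
636 processes x 50 threads    = 31,800 workers (the test figure)
400 processes x 50 threads    = 20,000 workers (the more conservative production figure below)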

Finally, once you've come up with your number of workers, start them all at httpd start-up. When our web servers come under load the rise in traffic is near vertical, and the last thing we want httpd doing is spending time starting more threads when it should be serving content. Our worker configuration is then:

<IfModule mpm_worker_module>
  StartServers 400
  ServerLimit 400
  MaxClients 20000
  MinSpareThreads 1
  MaxSpareThreads 20000
  ThreadsPerChild 50
  MaxRequestsPerChild 0
</IfModule>


Firewall and Kernel

Finally, you may need to tune your firewall and kernel. Disable conntrack for http (port 80) connections:

...
-A PREROUTING -p tcp -m tcp --dport 80 -j NOTRACK
-A OUTPUT -p tcp -m tcp --sport 80 -j NOTRACK
...
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
...

And increase the size of the TCP backlog queue:

tail -n 1 /etc/sysctl.conf
-> net.ipv4.tcp_max_syn_backlog = 4096


Conclusions

Configured like this Apache httpd is a very fast and stable web server. It's served us, and you, very well.

Interestingly, optimizing web content for performance is at least as important as optimizing web server performance but often far more subjective because it introduces nebulous concepts like look and style. More on that in the future.

That Question

Inevitably someone asks, “Why don't you use Nginx? It's faster.” I have no idea if Nginx would be faster for our use. A cursory glance at many of the comparisons available on the web shows that they are not comparing like functionality – installing two web server packages with the default configurations and then comparing them without tuning proves nothing. A meaningful test has to be for the same functionality and needs to try to exhaust the critical hardware resources.

I recently saw an interesting presentation by Jim Jagielski that discussed new features in Apache httpd 2.4 and made some comparisons to Nginx. Nginx just wins out in some situations, but not all. People will quite rightly point out that Jim Jagielski is a co-founder of the Apache Software Foundation and probably unfairly imply a vested interest in the comparisons. The only certainty is that the competition will be good for all users of the internet. The debate and unfounded proclamations will continue.

Anyway, as of last week I don't care. We've switched www.geonet.org.nz to Varnish.



Monday, 5 March 2012

Managing Varnish Content with MULE

A brief post on event driven management of Varnish Cache content.  This is something that Richard Guest and I have been experimenting with off and on for a while.  The concepts are similar to those covered by David Harrigan in a couple of recent blog posts (part1, part2).  David's posts explain the concepts well and are worth a read.  There are a few differences with our implementation that deserve a mention.

Banning Versus Purging

We want to let Varnish serve stale content when it can't reach the origin servers.  We also need to be able to use wildcards in the ban expressions.  This rules out HTTP purging for us – with an HTTP purge there is no option to use wildcards.  Also, with a successful purge the content is gone from the cache whether the origin server is reachable or not – there is no option for Varnish to serve stale content.


The ESB

We use MULE for our ESB and ActiveMQ for the messaging provider.  Not an important difference, but useful to know that you can manage content with a variety of means.

Less Code

I like Java most of the time, but it can be frustratingly verbose to use with HTTP.  I don't want to have to write code just to send a request with a non-standard HTTP method (e.g., BAN).  The work-around is to change the Varnish config to use an HTTP method name that is available in Java.

The Implementation

We're going to ban content with wildcards.  We have no other need for the HTTP DELETE method, so we use that to implement BAN in our Varnish config.  We allow the use of an additional HTTP header (X-BanExpression) to avoid having to deal with URL encoding for regular expressions.
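On the Varnish side the idea looks something like the sketch below. This is a minimal illustration assuming Varnish 3 VCL, not our exact configuration – the ACL name and addresses are made up, and you would want to restrict who may issue bans:

# Hypothetical ACL of hosts allowed to issue bans.
acl banners {
    "localhost";
    "192.168.1.0"/24;
}

sub vcl_recv {
    # We have no other use for DELETE, so treat it as BAN.
    if (req.request == "DELETE") {
        if (!client.ip ~ banners) {
            error 405 "Not allowed.";
        }
        # The regular expression arrives in the X-BanExpression header, so no URL encoding is needed.
        ban("req.http.host == " + req.http.host + " && req.url ~ " + req.http.X-BanExpression);
        error 200 "Banned.";
    }
}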


This means we don't need any custom implementations for MULE HTTP outbound endpoints:
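A minimal sketch of the idea, assuming Mule 3's HTTP transport (the varnish.host placeholder and the example expression are made up):

<!-- Set the ban expression as an outbound property; the HTTP transport sends outbound properties as request headers. -->
<message-properties-transformer scope="outbound">
    <add-message-property key="X-BanExpression" value="^/quakes/.*"/>
</message-properties-transformer>

<!-- Issue the request with the standard DELETE method - no custom Java needed. -->
<http:outbound-endpoint host="${varnish.host}" port="80"
                        method="DELETE" exchange-pattern="request-response"/>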


The MULE project is available on Github.

Other Considerations

We run multiple origin servers for Varnish.  This means we would like redundancy in our messaging as well.  The best way to achieve redundancy depends on the messaging provider.  If possible, it makes sense to run a messaging provider with each origin server and use fail over in the ESB client on each Varnish server.  I've covered fail over options with MULE and ActiveMQ previously.

Conclusions

This is a potentially powerful solution – the option for efficient caching with new content being available as soon as possible.  The flexibility in implementation is also a testament to Varnish and its focus on doing one thing and doing it really well.

Saturday, 3 March 2012

Mule ESB - Resilient JMS

We use the Community Edition of Mule ESB for several tasks including processing earthquake messages.  A critical function is to read earthquake information messages from SeisComP3.  These are output to a spool directory as SeisComPML.  They are converted to a simple XML format and put onto the messaging (in this case JMS provided by ActiveMQ).  The seiscomp-producer Mule project that does this is available on github.

I've been working on converting the seiscomp-producer project from Mule 2 to Mule 3 and revisiting resilience in the ESB.  In this post we'll look at the ActiveMQ JMS connectors and getting the application to start with ActiveMQ down and survive ActiveMQ restarts without having to restart Mule.  Retries for failed connectors are available in the Enterprise Edition of Mule or by using the common retry policies with the Community Edition.  As it turns out we don't need either with ActiveMQ.

Testing with Mule 3

I'm testing with Mule 3.2.0 and ActiveMQ 5.5.1.  Mule 3 makes it very easy to hot deploy multiple applications into the Mule server.  To help with testing seiscomp-producer I've created a very simple Mule application that uses a quartz timer to send a message to the logs every 5 seconds.

The seiscomp-producer uses two JMS ActiveMQ endpoints to send messages.  One durable (for the quake messages that must be delivered) and one non-durable for its own heartbeat messages.
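Roughly sketched in Mule 3 terms (the connector names are made up, and persistent delivery stands in for 'durable' on the producer side), the two connectors look something like:

<!-- Durable/persistent delivery for the quake messages that must not be lost. -->
<jms:activemq-connector name="quakeConnector"
                        brokerURL="${seiscomp.producer.amq.url}"
                        specification="1.1"
                        persistentDelivery="true"/>

<!-- Non-persistent delivery is fine for the heartbeat (soh) messages. -->
<jms:activemq-connector name="sohConnector"
                        brokerURL="${seiscomp.producer.amq.url}"
                        specification="1.1"
                        persistentDelivery="false"/>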


In this case seiscomp.producer.amq.url=tcp://localhost:61616

With ActiveMQ running, when we start Mule the seiscomp-producer starts fine, connects to JMS, and starts sending its heartbeat messages to the soh JMS topic.


However, there are problems with this configuration:

  • If ActiveMQ is down when Mule starts the seiscomp-producer is not started.  
  • If ActiveMQ is restarted while Mule is running the seiscomp-producer loses its connection to JMS and never reconnects.

The common retry policies address these issues by adding retries to Mule Community Edition connectors.  To get this functionality with Mule 3.2.0 and ActiveMQ 5.5.1 all we have to do is set the broker URL in the connectors to failover:

... brokerURL="failover:(${seiscomp.producer.amq.url})" ...

Now if we start Mule with ActiveMQ down Mule starts ok (the chatter app starts logging) and when ActiveMQ starts then seiscomp-producer connects to JMS and starts sending messages.  Similarly, if we restart ActiveMQ while Mule is running then seiscomp-producer will reconnect to JMS.

Restarting ActiveMQ closes the socket nicely and gives the client a chance to notice the stopped connection.  But what if ActiveMQ just goes away (e.g., the network connection is physically lost without a nice shutdown)?  We can test this by dropping packets using iptables.  Thanks to Richard Guest (a GNS Science coworker on the GeoNet Project) for the suggestion and the iptables rule to drop packets.  This time we run ActiveMQ on a remote server.  Once the connection is up we drop all packets coming from the remote host.  From the perspective of the client, ActiveMQ is gone but there has been no nice shutdown to close the socket.  Here 'server' is the remote host that ActiveMQ is running on.

  iptables -I INPUT -s 'server' -j DROP

With packets being dropped the client times out after the channel has been inactive for too long and attempts to reconnect using the failover URL.  This goes on until we stop dropping packets, simulating the ActiveMQ server becoming available again, and then the client reconnects.

  iptables -D INPUT -s 'server' -j DROP

Using the failover broker URL you can define a list of hosts to try to connect to.  Brokers are either selected randomly from the list, or you can append randomize=false to always use the first broker in the list:

 brokerURL="failover:(tcp://amq1.com:61616,tcp://amq2.com:61616)?randomize=false"

If the first broker fails then the client connects to the second broker.  For ActiveMQ 5.5.1 it does not then reconnect to the first broker when it becomes available again.  It looks like keeping the first broker as the preferred one should be available as a feature in ActiveMQ 5.6 using priorityBackup.

The final connectors look something like:
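As a hedged sketch, building on the hypothetical connectors above, the only change is wrapping the broker URL in failover:

<jms:activemq-connector name="quakeConnector"
                        brokerURL="failover:(${seiscomp.producer.amq.url})"
                        specification="1.1"
                        persistentDelivery="true"/>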


Using ActiveMQ and Mule 3 we have easily achieved fault-tolerant messaging with a range of options for controlling failover.

Finally, we're dealing with a very low message throughput.  We rely on message storage in Mule and ActiveMQ to keep important messages when the connection is lost.  If you try this approach with a high message throughput do plenty of testing to ensure that messages do not get lost.

Monday, 30 January 2012

A Billion Hits - Countries and Browsers

Over 2011 the GeoNet web site received over one billion hits.  It's a surprising number given the optimization we do to reduce the number of hits (which put us in the top ten for download speed in an independent audit of New Zealand web sites).  We process the logs to track which browsers people are using so we know which versions to target.  We can also use the IP address from the request to figure out which country the request originated from, which can help with capacity planning.


Breakdown By Country

Not surprisingly the majority of the traffic for 2011 came from New Zealand (86%) - there were a lot of felt earthquakes in New Zealand.  The other perils (volcanoes and tsunamis) that cause a lot of international traffic to the GeoNet web site were fortunately quiet.

Traffic to the GeoNet web site by country for 2011 - New Zealand dominates at 86%.

Breakdown By Browser

With eighty-six percent of requests for 2011 coming from New Zealand we can make a reasonable survey of the web browsers being used in New Zealand.  We also look at the type of web browser used to make the requests by vendor and version.  Why do we care?  The browser fetches and renders HTML, CSS is used to style the rendered page (to change the look), and JavaScript is often used to change the style further and to make the page interactive.  There are several different versions of HTML, CSS, and JavaScript.  The browser vendors implement slightly different subsets of those standards and occasionally add their own extensions.  If a web page is anything more than very simple it can be a great challenge to get it to behave consistently in different browsers.  To be fair, this situation has improved immensely over the last year but is still complicated by requests from older versions of browsers.

Browser Vendors

The great majority of traffic to the GeoNet web site comes from four main web browsers: Chrome, Firefox, Safari, and Internet Explorer (IE).
Plotting traffic by browser vendor over the year we see similar trends to those seen elsewhere.  We don't seem to see the same rapid upswing in the use of Chrome that is seen elsewhere, and we're a little higher on Firefox and Safari.



Breakdown by Browser Version

This is where things get tough – there are different versions of every vendor's browser.  From the four vendors above we have seen requests from a total of twenty-two different browser versions.  If you want to see a web developer cry, take them out for a beer or two.  When the alcohol starts to replace the caffeine, they relax a little, and their fingers stop twitching, ask them about cross-browser compatibility.  They will probably spend the rest of the night sobbing on your shoulder.


Chrome and Firefox - The Rapid Releasers

Chrome and Firefox (since version 5) both use a rapid release schedule – they release new versions something like every six to sixteen weeks and push updates to the browser so that it is very easy for a user to stay on the latest version.  The aim is to get new features to the user quickly.  It makes the plots over the year look spiky – a new version is released and then replaced pretty quickly.

Chrome uses rapid releases.  A new version comes and goes pretty quickly in the traffic.

Firefox has used rapid release since version 5.  There are still some version 3 users hanging in there though.

Safari and IE

Safari and IE use a different release approach so the plots are smoother.



Time for an Upgrade?

If you are using an older version of one of these browsers it could be time for an upgrade.  Some of them are free.  Try as many as you can.  Give each one a couple of weeks to see if you really like it.  Visit some sites you are familiar with.  You might be surprised by extra functionality - somewhere a web developer has been heroically using progressive enhancement to make that site work in as many browsers as possible but work even better in those where it can.

It's not just the look you may notice either.  Modern browsers are incredibly fast at downloading and rendering web pages.  There are other features too – a favorite of mine is the separate process for each tab in Chrome.  Recall the old days when something bad happened on a web site and the whole browser crashed?  Now just the tab dies and you often get the option to wait for it to recover.  Nice!

Now if we can just get them all to implement the same HTML standard the same way, everyone will be winning.




Friday, 20 January 2012

Using Mule - Earthquakes on the BUS


Adopting SeisComP3 for locating earthquakes has been a cross-cutting change. As I mentioned in a previous post, SeisComP3 makes automatic earthquake locations and, as it iterates the solution, it releases many updates about an earthquake. Compare this to the current system (CUSP) where we get one or possibly two locations for any earthquake.

The current GeoNet web content is static content that is generated by daemon processes and rsynced to the public web servers. It is event driven: an XML message with earthquake details arrives from the location system, the web content is generated and then copied to the public servers. It takes around twenty to thirty seconds to update the servers. When it takes fifteen minutes or more to make an earthquake location, and there is usually only one iteration, this update time is acceptable. Switching to SeisComP3 means that there are many location messages about an earthquake and they arrive in the few minutes that it takes SeisComP3 to finish iterating the automatic location. Clearly we need a faster way to get locations to the web.


Taking the BUS

To get the earthquake location information to the web quicker we have adopted Mule ESB, an Enterprise Service Bus.

Note there is often confusion between ESB and messaging (e.g., JMS, AMQP).  An ESB implementation will often have messaging as one of its components.  A messaging solution on its own is not an ESB.

There are several open source ESB options available. When I was trying them out I found Mule to have a great mixture of features, documentation, community, and training available. Also, if we need them, there are commercial features and support options available beyond the open source version. These are all important features for building long term sustainable software solutions.

Our ESB implementation does have messaging: we use ActiveMQ, which implements JMS 1.1.  When I get a chance I'll also try out RabbitMQ, which implements AMQP and has some really nice features.

In the time we have been working on this project Mule version 3 has been released.  I've been having fun converting the ESB configuration from Mule 2 to Mule 3; it has some very nice features that make development, testing, and deployment a lot simpler.

Mule for GeoNet Rapid (Beta)

The primary use of Mule for GeoNet Rapid is to take earthquake messages from SeisComP3, transform them to a simple XML format, send them over the messaging, and insert them into the database.  The projects are available under GPL3 on Github:

  • seiscomp-producer - takes messages from SeisComP3, transforms them and puts them on the messaging. 
  • hazbus-xslt - XSLTs for simplifying the SeisComP3 output.  A dependency for seiscomp-producer.
  • quake-db-client - inserts quake messages into the database.

Note on loose coupling: Why transform from SeisComPML to a simple XML format before sending to the messaging?  This is loose coupling – the other clients of the ESB don't need to know about the input format.  We have to do a little more work on the input (the message transform), but the benefit is that when the output format of SeisComP3 changes we only have to change the transform and not all the other ESB clients as well.

Conceptually the ESB implementation looks like this.



A reasonable question may be, "Do you really need that ActiveMQ network bridge?"  The answer is it's there to give us a lot of flexibility for when we work on adding other instances of SeisComP3 for geographical redundancy.  It's also useful to abstract away the network configuration from the Mule applications.  It remains to be seen if this architecture will make it to full production.

We've achieved a lot with Mule.  It really makes integration problems like this a lot simpler.

Finally a big thanks to David Dossot and Bruce Snyder for the books they've written on these topics and some very useful discussions.



Monday, 16 January 2012

GeoNet Rapid - Being Faster

GeoNet Rapid - Introducing SeisComP3

Over the last year we have been working on implementing a new system for GeoNet to locate earthquakes in New Zealand.  Soon we will start a public beta of the new system.  This post introduces the star of the new system: SeisComP3.  One of the main goals of the project is to be faster at locating earthquakes.  We selected SeisComP3, which was developed within the German Indonesian Tsunami Early Warning System (GITEWS) project by GeoForschungsZentrum (GFZ) Potsdam and is now maintained by Gempa.  SeisComP3 is a distributed system that uses a messaging bus, which means it can be scaled to process many seismic sites.  GFZ are currently processing around nine hundred sites from around the globe in real time.  Throughout testing for New Zealand SeisComP3 has proven to be very impressive, and it looks like we should be able to make earthquake locations available on the web site much faster.

Locating Earthquakes - Being Faster

Being faster – easy, right?  Just stop doing the slow bit.  For GeoNet the slow part is that a person has to get involved.  The fastest we can currently locate an earthquake is around ten minutes, but it's usually more like fifteen to twenty minutes. During the night it can be a real challenge – a pager beep wakes the Duty Officer from a dead sleep, they get up and boot a computer, log in, bring up the seismograms, review, locate and finally post the event.  Learn more about locating earthquakes in the links at the end of this page.

SeisComP3 doesn't need manual help – it can make automatic earthquake locations.  As soon as it has enough data (from ten stations) it makes a first location.  As more data arrives, from more distant stations, it refines the location through an iterative process.  This process is finished when the last data arrives and is processed.  So how fast is it?  During the recent sequence of earthquakes on 23 December 2011 in Canterbury, the test version of SeisComP3 located one hundred and six earthquakes over magnitude three.  On average SeisComP3 had:

  • The first automatic location after two minutes.
  • The final automatic location after four minutes. 

Compare that to how long a manual location takes!  SeisComP3 also doesn't struggle when earthquakes are happening close together, which is exactly when it is very hard for the Duty Officer to keep up.


On 23 December 2011 (UTC) there were 106 earthquakes over magnitude 3 in the Canterbury region.  On average SeisComP3 (SC3) had a first automatic location two minutes after the earthquake occurred and a final automatic location after four minutes.  Compare this to the fifteen to twenty minutes it typically takes to make a manual location.

So, SeisComP3 makes earthquake locations very quickly.  However, because it is initially fully automatic it can make mistakes; things like noise, two earthquakes at the same time, or an earthquake being offshore (outside the network) can cause SeisComP3 to get it wrong.  For significant events a Duty Officer will still manually review SeisComP3's work.

We are currently working out the best way to display the earthquake information as SeisComP3 iterates the location.  We're also updating our delivery systems to handle the extra information.  I'll cover these changes in future posts.

Links


Links to information about locating earthquakes.