Tuesday, 13 May 2014

Logging - What's Your App Doing Right Now?

Knowing how your application is performing in production should be really important to you.  I've written before about gathering application metrics.  This post is about gathering log messages.  I've started writing this post twice before and it quickly turned into an unhelpful rant about the state of Java logging.  Third time's a charm for sticking to the useful stuff without throwing my toys.

There is a list of what I want out of logging at the end.  This may help you come up with a list of your own and also understand some of the decisions I've made.

At the moment I'm focused on logging from Mule ESB applications.  This means I'm using log4j 1.x.  I'm guessing you are as well.    I want per application log files and the short host name to appear in the logs.  I don't want to repackage the application as I move it through dev, test, and prod.

I'm using Logentries.com for my log collector as they provide many of the features on my list.  Importantly, their Java collector is really nice and open source.  If nothing else use that so you get multline logging - no more mangled stack traces in your logs.  As an added bonus their pricing is really nice. There are other SAAS options out there, or you could roll your own infrastructure with something like http://logstash.net/  For me this is not worth the time or overhead and Logentries provide a far nicer service than I could ever set up in the time I have available.

So now to logging from a Mule ESB application.  There are a couple of ways to shave this yak.  The main thing I've had to work around is that log4j 1.x can include system properties in the logging setup but can't read environment variables.  The approach I've taken is:
  1. Write Spring config to reconfigure logging inside a Mule application.
  2. Provide a Logentries token and other properties to the application via properties files.
  3. Use a Spring profile to enable the logging in production.

1. The Spring Config


This is reusable between Mule applications so I pulled it out into its own library.  It's available on Github and our public Maven repo.  Looking at the config it does the following;
  • Provides a bean to look up the host name.
  • Provides a log4j pattern.
  • Removes any existing log4j appenders.
  • Adds an appender to send to Logentries.
  • Adds a appender per application rolling file system log  to mimic the one that Mule usually provides.
Before you load this config your application needs to have the following properties defined:
  • le.token - a valid token for logentries.com
  • app.name - the name of the Mule application.
  • app.version - the version of the Mule application.
A shout out to Harry Lime for his answer on Stack Overflow that got me started on reconfiguring log4j with Spring.

2. Providing the Properties

le.token


We use the same approach with all our applications.  We bundle default and empty properties with the application and then load an option set of overrides from the file system.  The override file is provided by Puppet in production and contains, amongst other things, the correct Logentries token for the environment.

app.name and app.version


We build our Mule applications using our own Gradle plugin which writes a build-version.properties file into the application zip for us.  The application version comes from our Gradle build version plugin.  If you don't want to take the same approach then you could write them into your mule-config as global properties.

Loading these properties then looks like this.  The load order is important to ensure that the file from /etc/sysconfig overrides the one bundled in the Mule app.  Also setting ignore-unresolvable="true" is important as we don't have the same properties in all our files.



See the MuleSoft Blog for a different approach to providing application properties.


3. Spring Profile


Once the properties are set all that's left is to import the logging configuration.  We do this in a Spring profile which is enabled in production.


Conclusion


Here's what some log messages look like at Logentries.  Awesome stuff for two dependencies and a few lines of XML.  The log messages are from our testing version of the back end services for a new version of the mobile quake applications that we are working on.  Multiple location options and faster notifications coming soon to a phone near you.




Appendix 1 - Enabling the Spring Profile and Logs from Mule


I also want to see the logs from the Mule server itself at Logentries as well as be able to turn on the Spring profile in production.  I want multi line logging so I'm going to use the logentries-appender again.  Here's one way to do this.  It requires repackaging your Mule server and adding some files on disk in production.  The goal is to pass some extra System properties to the Mule JVM.

In the Mule distribution:


  • Add the logentries-appender.jar to lib/boot
  • Add an appender to conf/log4j.properties  This refers to System properties that we will set in a moment:


  • In the startup script bin/mule (adjust for your distro or O/S) set some additional environment variables:



  • Where /etc/sysconfig/mule contains:



  • Finally, edit conf/wrapper.conf so that the environment variables are passed to the Mule JVM as System properties.  Remember to increment the properties appropriately for your config file.


You can now repackage the Mule server for however you deploy it.

To enable logging from the Mule server to logentries I need to provide a file /etc/sysconfig/logentries.token the only contents of which are a valid logentries token.  If you don't want the Mule server to log to logentries then remove this file or leave it empty (in which case the logentries-appender will throw debug errors).  Alternatively, provide all environments with a separate logging token.

To set the Spring profile add the profile(s) required to the file /etc/sysconfig/spring.profile (e.g. 'production') and restart the Mule server.

Appendix 2 - What I Want From Logging

  • log messages in one place.
  • search.
  • alerting.
  • developers involved in logging from their apps, not sysadmins having to bolt it on for production.
  • multi line logging - once you've had this for Java stack traces it's hard to go back to syslog mangled error messages.
  • a log message is a line of text - don't force pseudo formats on me.
  • log direct from an application (no faffing with syslog please).
  • don't block the application while logging.
  • don't lose messages if possible.
  • logging that plays well with running in the cloud or on saas.
  • log to the local file system as well in case the central collector goes away.
  • a log per application.
  • search by host occasionally.
  • easy reconfiguration between dev, test, and prod.
  • no requirement to repackage my app for each environment. 
  • some nice graphs would be good.
  • log everything that maybe useful.
  • the option to archive logs for longer periods of time.
  • security.
  • application deployment markers.
  • fast ingestion of log messages into the collector.
  • someone else to run the collector for me.
  • beatings for people that use logging when they should use a debugger (and vis versa).
  • The Moon on a stick.
This list is pretty long already.  Fortunately I don't also have to deal with:
  • regulatory requirements for logging.
  • private data in the log messages. 

Appendix 3 - A Challenge


If this is like any other blog written about logging from Java then people reading this will doubtless mention a different framework or facade that I can throw in here to make my life magically easier.   Logging in Java is already confusing enough.  Simply name checking another approach does not improve the problem.  I need to get a small amount of text onto the network and into a collector.  I shouldn't have to perform multilayer classpath and configuration surgery to achieve that.  If you must suggest alternatives then please do it with concrete examples on how your suggestion actually improves logging from Java in my situation.  I doubt you will fit convincing, tested, and proven examples into blog comments.  

Better yet - help get log4j2 over the line to a production release and into use on some popular projects.  It has some really nice looking features that I think actually will make logging from Java simpler.  We've tested it in dev and like what we saw.  I expect it will be used our next non-Mule project.





  

Monday, 10 March 2014

Metrics - What's Your App Doing Right Now?

It's been a long time between updates, work, busy, mumble mumble mumble.

In a minute I'm going to talk about monitoring Java and that's enough reason to stop reading right now.  Fair enough.  But before you go:

  • How is your application performing in production right now?  
  • Are you meeting demand?  
  • If you add more load to a server will it cause application problems? 
  • Do you know why things fail when they do?  

Developers - you are not excused, DevOps is the new now.  Code that is not in production and performing well is worthless.  If instrumenting your code to collect metrics about it's performance in production is not already part of your regular work it soon will be.  If Java's not your thing but you get a bit mumble mumble on the topic of how your code performs in production then scan the pictures below and then head on over to Librato Metrics or Hosted Graphite and get busy.  They don't care what metrics you're collecting they just make it so easy that you have no excuses not get on with it right now.

Here's a picture.  It's performance metrics for earthquake messages going through the Mule ESB from the SeisComP3 earthquake location system to the web.  There is also a heart beat message tracked so that we know everything is still working between earthquakes.  There is a lot of information here.  Not least of which is it takes about 10 times longer to insert messages into the a database (akapp01) than it does to read them from the file system (akeqp01).  This is the sort of information that makes targeting performance improvement, trouble shooting, or capacity planning easy.  Without this sort of information you're left guessing about what to do when something is not right with your application.  Guessing only rarely leads to success.


The picture above is a Librato Metrics dashboard.  Here's one of similar information sent to Hosted Graphite.  


Librato Metrics and Hosted Graphite both have strengths and weaknesses.  Try both and see which suits your needs the best.  One of the biggest differences, if you need it, is the ability to alert on your metrics.

So to the Java bit.  Monitoring the JVM and processes running in it usually involves using JMX.  Accessing information via JMX is easy and secure sucks. Running in the cloud, with servers coming and going at a moments notice, makes this problem worse.  The obvious answer is to turn the metric gathering problem around and have a JVM agent push metrics to you.  There are services available that do this.  There is the awesome New Relic and others like it.  However, they all come at a cost.  Enough of a cost that I ended up rolling our own agent for collecting JVM metrics from Mule, Jetty, and Tomcat.

I didn't have to do much.  Librato Metrics is a great data store with fantastic visualisations for what I want.  Getting data out of JMX is the only hard work and fortunately the Jolokia project removes all the pain by providing an HTTP-JMX bridge.  I started off with a Jolokia agent being queried with a Perl script.  Once I was happy with the moving parts I've written agents the wrap Jolokia to run as an application in Mule or a Servlet container and no external script is needed.  The applications themselves periodically gather and send metrics to Librato or Hosted Graphite.  Getting metrics is as simple as adding some config properties to a server and dropping an app into Mule, Jetty, or Tomcat.  Gathering metrics is deploying an application - no fire wall changes, no adding servers to a remote collector process, very little pain at all.  

Here's some Jetty metrics for Jetty and the JVM it's running in under test load.  It looks to me that with a little bit of tuning I could handle a lot more requests with this server.



These are the projects, they are open source on Githib:
If Librato or Graphite are not your thing then Mule Metrics or App Server Metrics should be pretty easy to extend (implement one method).  There is a lot more that could be done with Jolokia and JMX beyond using it to extract metrics.  As we find places where we need more detail about an application then I think we will start to look seriously at using Coda Hale's Metrics as we need it.  We're also getting far more sophisticated about how we use logging but that's a different topic.  For now, if you haven't got it, then work on getting some insight into your applications in production and avoid having to mumble mumble mumble when problems arise.