Wednesday, May 27, 2009

FogBugzReporter: Introduction

This is the first post in what I plan to be a series about a small application named FogBugzReporter that I began writing in October of 2007.  I'd recently started using the on-demand edition of FogBugz from Fog Creek Software to track my time spent consulting and was disappointed to discover that it had no direct way for me to generate time reports.  Fortunately, I discovered that along with the web UI, FogBugz also had a pseudo-REST API for interacting with the application (I say "pseudo-REST" because almost all operations can be performed with HTTP GETs, including ones that change state on the server).  I read the API docs, quickly wrote a simple Swing application in Java, and made it available to the rest of the FogBugz community.  Soon after, I started experimenting with Groovy and realized that porting the app would be a nice way to get some experience with the language, including SwingBuilder and the great APIs for processing XML.  I did that and eventually moved the code from a private Subversion repository to one on Google Code under an Apache license.  Until a few days ago, it sat for the most part unchanged.  In the meantime, my Groovy skills have improved a bit (I'm still no expert), the language has evolved, and the technologies around it have grown.  In this series, I plan to gradually bring the application up to date and polish the code.  I'll update this posting with links to the other posts in the series as I write them.

Posts:

Planned Posts:

  • FogBugzReporter and Griffon
  • Suggestions?

Saturday, May 23, 2009

AWS Elastic MapReduce Webinar @Noon ET on May 28

Amazon's new Elastic MapReduce service makes it easier to run Map/Reduce jobs within EC2. If that sounds interesting, you can find out more about it by registering and "attending" the webinar they're holding on Thursday, May 28 at Noon ET. I'll be there, in a virtual sense. :)

Sunday, May 17, 2009

ConcurrentLinkedHashMap and Possible Alternatives

I mentioned in an earlier post that a project I'm working on needed a thread-safe class with behavior similar to LinkedHashMap, which maintains the order of entries while allowing sub-classes to set a (rather primitive) eviction policy.  I came upon and started using Ben Manes's ConcurrentLinkedHashMap, with only minor modifications to allow for easier sub-classing.  I've been pretty happy with it so far and have been meaning to take a look at recent changes that he and "zellster" have made since I last downloaded the code.
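
For readers who haven't used it, here is a minimal sketch of the plain LinkedHashMap behavior described above: a sub-class sets a size-based eviction policy by overriding removeEldestEntry().  The class name BoundedLruMap and the helper newSynchronized() are made up for illustration; this is not code from my project and it says nothing about how ConcurrentLinkedHashMap works internally.  Wrapping the map with Collections.synchronizedMap() does make it thread-safe, but only by funneling every call (including reads) through a single lock, which is presumably the kind of contention a concurrent implementation is meant to avoid.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the project or from ConcurrentLinkedHashMap.
class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruMap(int maxEntries) {
        // accessOrder = true orders entries from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Invoked by put() after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    // Coarse-grained thread safety: every operation takes the same lock.
    static <K, V> Map<K, V> newSynchronized(int maxEntries) {
        return Collections.synchronizedMap(new BoundedLruMap<K, V>(maxEntries));
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = newSynchronized(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" becomes the most recently used entry
        cache.put("c", 3);  // limit exceeded, so "b" (now the eldest) is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

The "rather primitive" part is visible here: the only decision the sub-class gets to make is whether the single eldest entry should go after an insertion.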

The same project is also using ehcache for caching.  I was looking forward to the release of version 1.6 and inquired about the expected release date on the developer's forum.  That kicked off an exchange between me and Greg Luck, the lead for ehcache, that eventually led to him mentioning some problems he'd encountered with ConcurrentLinkedHashMap.  I'm going to check in with Ben and let him respond to what Greg said, along with giving him a chance to provide a guess for when his new version of CLHM is likely to be usable.  I also need to look into what Greg was referring to in the changelog for ehcache 1.6 beta5 when he said, "Make MemoryStore eviction policies injectable."  Interesting…

Just to confuse matters, I was also recently looking over some materials discussing Infinispan, which will essentially be the 4.0 release of JBoss Cache, and saw this interesting blog entry on "Implementing a performant, thread-safe ordered data container".  In it, Manik Surtani mentions their newly implemented classes FIFODataContainer and LRUDataContainer, which sound like another plausible way to approach my problem.  I'll make sure to ask more about them in the comments for that post.

Saturday, May 16, 2009

JPC: x86 Emulator in Java

Last night, I was reading Caches and Maps in Terracotta by Alex Miller when I came upon his link to the JPC (a.k.a. JavaPC) Project.  It sounded familiar and I decided to follow the link.  It turns out to be an open source project implementing an impressively functional and performant x86 emulator in Java.  I had fun last night launching an MS-DOS environment within an applet (what a concept!) and playing Donkey Kong for the first time in ages.  According to the site, the emulated layer runs at about 20% of your processor's speed - not bad for an emulator written in pure Java.

Note: After my first visit to the site, it seemed to be having problems.  Just in case it had to do with visitor load, I changed the link above to point to the copy in Coral Cache.  The site is now totally back up and behaving normally.  If Coral has any problems or is slow, you can safely go to the site directly.

Update: I got an email from one of the JPC team members saying that they'll be at JavaOne this year with some new stuff.  I wish I could be there to see it.  I guess I'll have to wait for it to show up on their site.

Monday, May 11, 2009

Trying out DISQUS for Comments

I've decided to try out DISQUS for handling comments on my blog. Let me know what you think. I can always revert to Blogger comments if desired.

Update: It turns out that there's a bug in Blogger that precludes you from using DISQUS in conjunction with an external post editor. Since I'm happy with Windows Live Writer, I guess I'll disable DISQUS for the time being and revisit the decision if the bug gets fixed.

Sunday, May 10, 2009

VisualVM and Cutting Method Calls by Over 1000x

I’ve been using VisualVM on and off with modest success since I first found out about it at JavaOne 2008.  In addition, it’s been included as part of the standard JDK installation since JDK 6 update 7 (released in July of 2008).  I was quite happy when it helped me identify bottlenecks and cut down the time to run a suite of unit tests by 50% (from 2 minutes to 1 minute).  After all, the less time it takes to run unit tests, the more often they’ll get run.  However, my biggest success using VisualVM came in early April, when I was asked to figure out why a use case was exhibiting disturbing performance characteristics.

When I first walked through the use case, it was taking about 10 seconds to run with 50 items.  The time appeared to be proportional to the number of items or perhaps even worse.  This was a scary prospect given that there would often be more than 10,000 items in a real production setting and that the use case was intended to be interactive (as opposed to a batch or background task).  A careful code review might have revealed the problem, but I knew it would be far more efficient to profile the code as a way to identify hot spots.  I started up VisualVM, connected to the relevant JVM, turned on CPU profiling, and collected data for the use case.  Sadly, I seem to have lost the relevant snapshot, so you’ll have to trust me when I tell you that the method SqlQueryFile.isSupported() was being called over 36 million times!  I made some minor tweaks to the method itself which improved its performance by about 10 percent – okay, but not nearly enough.  I next identified what was effectively a loop invariant.  The isSupported() method was being called repeatedly with the same query and schema.  As you can see from the snippets below, I pulled that check out of the nested loop.

Before:
for(Platform platform=p; platform != null; platform=platform.getParent()) {
  for(SqlQuery q : metricQueries) {
    if (isSupported(q, datasourceSchema)) {
      if(q.getMetricPath().equalsIgnoreCase(metricPath) &&
         q.getPlatform().equalsIgnoreCase(platform.getName())) {
        if(ret==null) {
          ret=q;
        } else {
          if(isMoreRecentThan(q, ret)) {
            ret=q;
          }
        }
      }
    }
  }
}

After:
Collection<SqlQuery> supportedQueries = new ArrayList<SqlQuery>();
for (SqlQuery q : metricQueries) {
  if (isSupported(q, datasourceSchema) && q.getMetricPath().equalsIgnoreCase(metricPath)) {
    supportedQueries.add(q);
  }
}

// need to optimize this code for better performance
for (Platform platform = p; platform != null; platform = platform.getParent()) {
  for (SqlQuery q : supportedQueries) {
    if (q.getPlatform().equalsIgnoreCase(platform.getName()) &&
        (ret == null || isMoreRecentThan(q, ret))) {
      ret = q;
    }
  }
}

When I ran the profiler again, it turned out I'd vastly reduced both the number of calls to the isSupported() method (from ~36 million to ~9 million) and the share of time spent in it (from over 80% to 26.3%), as you can see in the picture below:

snapshot-040109-7pm

I probably could have left it there, but after taking another quick look, I noticed a very simple way to save even more time.  I could flip the order of the conditional expression in the first if statement.  That way, the cheap operation (which usually returns false) would come first and frequently allow us to skip the evaluation of isSupported().

if (q.getMetricPath().equalsIgnoreCase(metricPath) && isSupported(q, datasourceSchema)) {

To my surprise, this cut the number of calls to around 27,000 and the time spent in the method to 1%!

snapshot-040109-8pm

With these optimizations in place, the same use case with 11,000 items now took less than a second.  Thanks VisualVM!
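
If you'd like to see the effect of operand order for yourself, here is a small self-contained sketch.  It is not the production code: ShortCircuitDemo, pathMatches(), and the string "queries" are stand-ins I made up, and the isSupported() here does nothing except count how often it gets called.  The only real-world fact it relies on is that Java's && operator evaluates its right-hand operand only when the left-hand one is true.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real code: counts how often the costly check
// runs under each ordering of the && operands.
class ShortCircuitDemo {
    static int expensiveCalls = 0;

    // Stand-in for isSupported(): imagine this is slow; here it only counts calls.
    static boolean isSupported(String query) {
        expensiveCalls++;
        return true;
    }

    // Cheap check that is usually false, analogous to comparing the metric path.
    static boolean pathMatches(String query, String wanted) {
        return query.equalsIgnoreCase(wanted);
    }

    // Costly check first: it runs for every query.
    static int callsWithExpensiveFirst(List<String> queries, String wanted) {
        expensiveCalls = 0;
        for (String q : queries) {
            if (isSupported(q) && pathMatches(q, wanted)) {
                // a real implementation would collect q here
            }
        }
        return expensiveCalls;
    }

    // Cheap check first: && skips isSupported() whenever the path doesn't match.
    static int callsWithCheapFirst(List<String> queries, String wanted) {
        expensiveCalls = 0;
        for (String q : queries) {
            if (pathMatches(q, wanted) && isSupported(q)) {
                // a real implementation would collect q here
            }
        }
        return expensiveCalls;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<String>();
        for (int i = 0; i < 1000; i++) {
            queries.add("metric" + i);
        }
        System.out.println(callsWithExpensiveFirst(queries, "METRIC42")); // prints 1000
        System.out.println(callsWithCheapFirst(queries, "METRIC42"));     // prints 1
    }
}
```

The size of the savings depends entirely on how selective the cheap check is, which is why a profiler is a better guide than intuition.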

Update: This post was a second place winner in the Java VisualVM Blogging Contest!

Saturday, May 9, 2009

High Performance, Parallel, & Grid Computing

On Wednesday night (May 6th), I attended an alumni event held by the UMass Boston CS Department in the beautiful five-year-old Campus Center.  It was great to see my professors and former classmates, and to talk with current students about their software engineering projects.  Toward the end of the event, there was a talk given by an alumnus named Richard Anderson, the CTO of Symmetric Computing, a company that makes hardware and OS-level software for High Performance Computing (HPC).  Their slogan is something along the lines of “Supercomputing for the Masses”.  The speaker gave a fairly typical overview of the history of CPU performance, including references to Moore’s Law, hitting the gigahertz wall, and the emergence of multi-core chips, followed by a discussion of parallel programming.  He mentioned several times that the company would be making one of their 48 core/350GB systems available to the university for academic use.

It was a decent talk, if not extremely exciting.  It made me think of concurrent programming, grid/cloud computing, and the cool compute appliances made by Azul Systems.  Based on questions asked by me and others, I was disappointed to discover that although the technical gap between HPC and the rest of the computing world is narrowing, most of the HPC world seems unaware of this.  A lot of work has gone into making clusters of commodity hardware work together and making it easier to code for these architectures (think Google, Map/Reduce, etc.).  It would be nice if people in HPC and distributed computing could do a better job of exchanging ideas.  Even if typical clusters have to deal with latency issues and the inability to directly access large amounts of memory, that doesn’t explain why someone should code an HPC app in Fortran or C on a Symmetric box instead of in Java on a 432 core/384GB Azul box.

I realize that this is a bit of a rant.  If any readers can provide counter-arguments or counter-examples, I’d be happy (and reassured) to read them.

Update: I forgot to mention that if you’re doing academic research or teaching a class in clustering/distributed computing, you should check out the fairly new Amazon Web Services in Education program.  They provide free credits to educators, researchers, and students.  Pretty cool!

Sunday, May 3, 2009

Groovy versus Scala (Update)

Back in March of last year, I wrote a post comparing Groovy and Scala.  Enough time has passed that I thought it was worth revisiting the topic.

In my original post, I reached the conclusion that at least for me, it made sense to first focus on Groovy and then later learn more Scala.  I still feel that way, but my thoughts on the matter have evolved.  How about combining Groovy and Scala in the same program or architecture?  That thought had vaguely occurred to me already, but it was solidified by the discussion of Twitter’s use of Ruby and Scala.  They started off using Ruby (Ruby on Rails, in particular) to run their whole site.  When they ran into performance and scalability issues, they determined what was causing the most trouble (back-end tasks) and gradually converted the relevant parts of the system into Scala.  This kind of polyglot programming makes sense to me.  It follows the principle of “using the best tool for the job.”  I think the combination of Groovy and Scala makes even better sense: they’re designed to run and interoperate with other languages on the same VM.  Not surprisingly, I’m not the first person to think of this.  I found two posts from Andres Almiray (“Griffon: Groovy & Scala working together” and the follow up “Follow up on Griffon/Groovy/Scala”) showing some practical examples and another on a different blog discussing the ability to jointly compile “Groovy+Scala+Java”.  I’m excited to watch and perhaps participate in these developments.

Side Note: On the subject of Groovy, when my one-year IntelliJ license was close to expiring, I decided to check out NetBeans.  I was pleased to discover it had quite good Groovy and Grails support.  I’ve been using the 6.7 milestone builds and now the beta, in order to get support for Grails 1.1.  If you haven’t tried Groovy/Grails development in NetBeans yet, you should.

Update: Here are links to the items Andres mentioned in the comments

Update (Jan 12, 2010): Andres recently posted the highly relevant Groovy & Scala: a tale of two JVM languages