Monday, December 28, 2009

Handy But Hidden: Collections.newSetFromMap()

I was reminded again today of how much the core Java libraries have grown over the years.  It can be really hard to keep track of all the nice little features that get added in each release.  Today, I was wishing that there was a ConcurrentHashSet class or some other HashSet-style concurrent collection.  Before implementing something myself, I did some searching on Google and found a nice way to solve the problem.  In Java 6 (a.k.a. JDK 1.6), Sun added the utility method Collections.newSetFromMap(), which allows you to pass in an empty backing Map and get out a Set with behavior based on the underlying Map.  So, here is all I needed to do to build a concurrent HashSet:

Set<Observer> observers = Collections.newSetFromMap(new ConcurrentHashMap<Observer, Boolean>());

I also found a bug/enhancement request in Sun's bug database suggesting that this feature should be documented in any core Map for which there is no corresponding Set implementation, such as WeakHashMap and IdentityHashMap.  Given how long I went without knowing the method existed, I wish they'd done it.

Wednesday, December 2, 2009

MorphLabs, the Meta-Cloud Vendor?

Fascinating!  I just got an email from G2iX, the parent company of MorphLabs, inviting me to a webinar (Wed, Dec. 9, 2009 from 11AM - 12PM PST) in which they plan to demonstrate how to "Morph your Data Center into a Cloud Vendor":

Traditional data center and co-location vendors are under tremendous threat in the face of Amazon EC2 and related Web Services, and those providers who’re unable to adapt to some sort of cloud computing model face extinction as competing Infrastructure as a Service products become even more prevalent.

This webinar will demonstrate how the Morph Cloud Computing Platform can help regional data centers transform their business using their existing assets with minimal cost and hardware investments so that they can provide a cutting edge cloud solution to a broader customer base with rates that are more than competitive while still maintaining great profit margin.

For those who are unfamiliar with the company, MorphLabs provides a hosting platform that runs on top of Amazon's EC2 web service.  So, assuming I'm not mistaken, G2iX plans to help data centers other than Amazon's set up the Morph platform to run on their own servers.  My first thought was, "Why would they do that?  Won't this just create competition for MorphLabs?!"  However, once I mulled it over, I realized that if they structure things right, it could be quite beneficial to MorphLabs.  What if these Morph platform deployments simply became new pools of servers to which Morph apps could be deployed?  Viewed that way, it actually sounds quite clever.  I'm almost tempted to attend the webinar, even though it has no direct relevance to me. :)

Thursday, October 1, 2009

Great Star Trek:TOS Quote

My wife and I recently started watching Star Trek: The Original Series episodes via Netflix's Watch Instantly streaming service (they're also available on YouTube Shows).  Although it's a fun show, I still prefer The Next Generation and Deep Space Nine series.  I did, however, come upon a line that made me laugh out loud:

What makes you think you're a man?  You're an overgrown jackrabbit, an elf with a hyperactive thyroid!
Kirk intentionally trying to irritate Spock in This Side of Paradise, Season 1, Episode 24

Saturday, September 19, 2009

JavaScript Date Library Advice?

I've been doing some work recently with the Google Maps API, something I'll try to describe in a future post.  While doing that work, I've realized that I need (or would at least find very handy) a JavaScript library that provides more sophisticated date/time support than the standard JavaScript Date.  Searches have led me to two libraries which look plausible.  Datejs takes the approach of ornamenting the existing Date class with numerous additional functions.  I really like the coding style demonstrated in their tutorial, which is a point in their favor.  In contrast, Fleegix provides a Date class independent from, but interface-compatible with the standard Date (yay for duck-typing!).  They clearly view timezone support as critical for Dates, which I'm very happy to see, since dealing with them and daylight/standard time can be a real pain.  So, from a functional standpoint, both libraries look pretty strong.  I'm concerned that they're not particularly active, neither having any check-ins in 2009, but perhaps it's because they've been around for a while and work well already.  I can't say I found this discussion in the Datejs group very encouraging. :(

Any thoughts?  Anyone out there used either of these libraries or another one they'd recommend?

Update (May 26 2010): I just looked again at Fleegix and have some excellent news!  The projects are now hosted on GitHub and the timezone-js project has some updates from April of this year.  I'll have to give it a closer look soon.

Thursday, September 17, 2009

Subversion – Odd Problems and Funny Solutions

I've been using Subversion and TortoiseSVN since 2005.  They're both great technologies and I'm very happy they exist.  However, I think anyone who's been using them for a while has periodically encountered some weird problems and has had to come up with a few strange (and even funny) solutions.  I certainly have.  In this post, I describe one example.

Earlier this week, a developer I work with saw the following in a TortoiseSVN window when he ran update on a working copy:

Error: Couldn't do property merge 
Error: Unable to make name for 'C:\tempfile'

Not knowing how to proceed, he asked me for help.  We tried my typical first approach: running cleanup via Tortoise on the working copy.  That didn't fix it, so we ran a "Check for modifications" and discovered that there were a few temp files scattered through the directory tree.  We deleted those and tried again.  Still the same message, so on to Google.  I searched for various permutations of the error messages, but was unable to find anything other than code check-in comments for Subversion itself, which appropriately enough uses a Subversion repository.  For some reason, I wish I remember why, we next tried updating just one child directory within the working copy.  It worked!  We tried updating all the other child directories and those worked as well.  So, that suggested that there was something wrong with the Subversion metadata, but only at the top level.  We went into the .svn directory and started browsing through the files.  When we looked at the entries file, we noticed a few lines starting with "incomplete".  I tried moving the entries file out of the .svn directory to a totally different location in the file system, hoping that when we updated or ran another cleanup, the file would simply be recreated.  Unfortunately, with that file missing, the directory was no longer considered by Subversion to be a working copy.  Disappointed, I moved the entries file back into .svn and started thinking about what to do next.  On a whim, the guy I was helping tried updating again and it worked!  Somehow, changing the directory from a valid working copy to an invalid one and back again was enough to fix everything.  Who knew?!

Have any good Subversion/TortoiseSVN troubleshooting stories or troubleshooting tips of your own?  Please leave them in the comments.

Tuesday, August 25, 2009

Curious Concept's Excellent JSON Pretty Formatter and Validator

I just wrote a bit of code using Groovy HTTPBuilder to fetch and parse some JSON content to help answer someone's question on StackOverflow.  When trying to pull some interesting data out of the JSON to display, I realized that it would be much easier for me to understand the structure of the JSON if it were pretty-printed (or pretty-formatted).  When I googled for "JSON pretty format", I found a helpful blog entry pointing me to Curious Concept's JSON Formatter (& Validator).  It's a simple online tool that allows you to paste in either a URL pointing to JSON content or the content itself and display it in either compressed or readable form.  It might not have been very hard to code, but it's extremely useful.  Thanks Curious Concept!

Update: I'm trying out the JSONView Firefox plug-in.  So far it seems to work pretty well.  Even so, I expect I'll continue to use Curious Concept's tool for a while, since I spend a fair amount of time in Chrome.

Friday, August 21, 2009

Merger Madness & One Sample App To Rule Them All

VMware --- SpringSource --- /---G2One
\---CloudFoundry
Terracotta --- Ehcache
Oracle --- Sun


With all the mergers and acquisitions happening in the Java world, I thought I'd suggest a sample app someone could build to bring them all together.

Here it is: a Grails app using an Oracle DB with distributed caching provided by Terracotta and Ehcache, all running in a VMware cloud, with deployment and management handled by CloudFoundry.  I think that covers all the bases.  I have no idea what it will do yet, but let me know if I've missed anything. :)

Note: I actually have a lot of respect for all the technologies and people involved, I'm just feeling a bit overwhelmed with all of the recent consolidation!

Wednesday, August 19, 2009

Interruptible JDBC Statements

I work for a client on a product that makes direct queries against databases via JDBC. A while ago, I added some code so that a user could stop the execution of a series of queries by clicking on a cancel button. Behind the scenes, it interrupts the thread executing the queries. Since that thread checks whether it's been interrupted by calling Thread.currentThread().isInterrupted() before starting to execute each query, no new queries will start once the cancel button was pushed. However, any query that's already started will run to completion before the code discovers that a request to cancel had been made. Recently, my client decided that we should make cancellation more granular and add the ability to stop a query in mid-stream. Looking at the JDBC docs, there's nothing to indicate that any of the relevant methods respond to interrupts (none throw InterruptedExceptions or claim to wrap them in a SQLException), so I had to come up with another way. I settled on an approach where I submit the query as a Callable to an ExecutorService and then block, waiting for the result via Future.get(). If the thread is interrupted while we're waiting, get() throws an InterruptedException, which gives us the chance to call Statement.cancel(). It's pretty simple and works nicely. :)
Here's the code in a somewhat abridged/condensed form (I create the ExecutorService elsewhere using Executors.newCachedThreadPool()):
final String sql = "...<some SQL>...";
final Statement statement = conn.createStatement();
Future<ResultSet> queryFuture =
 execService.submit(new Callable<ResultSet>() {
   @Override
   public ResultSet call() throws Exception {
     statement.execute(sql);
     return statement.getResultSet();
   }
 }
);

ResultSet rs;
try {
 rs = queryFuture.get();
} catch (InterruptedException e) {
 logger.info("Query interrupted - calling Statement.cancel()");
 statement.cancel();
 throw e;
} catch (ExecutionException e) {
 //code to handle or rethrow the exception
}
By the way, if any of the java.util.concurrent classes above are unfamiliar to you, I highly recommend Java Concurrency in Practice by Brian Goetz (et al.). It's an excellent book.

Tuesday, August 18, 2009

Terracotta Acquires Ehcache

Exciting news!  This morning I received a note from Greg Luck via the Ehcache Open Discussion mailing list announcing that the Ehcache project was joining forces with Terracotta and that Greg would be joining the team at Terracotta, Inc.  Since there's been a Terracotta Integration Module for Ehcache for a while, I don't foresee any instant improvements with regard to integration between the two technologies, but I was quite excited to see the following in Greg's blog post announcing the news:

I am full-time on Ehcache. I have not had the time I would have liked to devote to Ehcache (I have been doing a miserly 10-15 hours per week for the past 6 years) but now I do. Look out!

Given what he's done so far with limited time, I'm looking forward to seeing what Greg can do when Ehcache becomes his full-time job!

For more information, see Terracotta's announcement, CTO of Terracotta Ari Zilka's blog post, or Alex Miller's blog post.

p.s. I submitted a patch that Greg ended up incorporating into Ehcache 1.6, so I feel a tiny bit of ownership toward the project – similar to the way I feel toward Tomcat.

Update: Terracotta and Ehcache are holding a webcast on August 20th at 4PM ET to further discuss the acquisition.  Also, I just submitted my second patch to Ehcache. :)

Tuesday, July 28, 2009

Attending SpringOne 2GX in October!

After much thought, I finally decided last night to register for the SpringOne 2GX conference running from October 19 through 22 in New Orleans.  The deadline for the super-early bird pricing is this Friday (July 31), so I thought I should make my decision.  I've been to talks on Groovy and Grails at NFJS and JavaOne, but I'm really looking forward to an entire conference devoted to those topics.  I hope to see you there!

Thursday, July 2, 2009

FogBugzReporter: NetBeans + Ivy = IvyBeans

(This is my second post in a series about FogBugzReporter, a small open source app that I originally wrote in Java and later ported to Groovy.  See the first post for more information and for links to other posts in the series as I write them.)

When I first ported FogBugzReporter to Groovy, I was able to take advantage of a free 1 year license for JetBrains IntelliJ IDEA (version 7) and its nice Groovy support.  I had good experiences with it, but when the year ran out, I was both reluctant to pay for a new license and curious to explore the Groovy and Grails support in NetBeans.  At the same time, I was interested in trying out Apache Ivy for dependency management.  As it turns out, Laurent Forêt wrote a plug-in for NetBeans called IvyBeans which integrates Ivy with the NetBeans internal build infrastructure.  So, if you reference a library in your ivy.xml, not only is it downloaded into your local repository (located in %USERPROFILE%/.netbeans/<netbeans_version>/modules/ext/cache on Windows) but it's also added as a NetBeans project dependency.  This post describes what it was like to move to NetBeans and start using Ivy.

Although I'd previously used NetBeans 6.1 and 6.5 for some small Grails apps, by the time I got around to moving FogBugzReporter into NetBeans, the 6.7 release process had already reached Milestone 2 or 3 (it has since been released!).  Since the Groovy/Grails plug-in for 6.7 provided better, more up-to-date support, I decided to jump directly to it.  After installing the base version and adding the Groovy/Grails plug-in, I looked over the options for opening the project in NetBeans.  I couldn't see any easy way to either import the project based on the existing IntelliJ files or create a new project pointing to the existing sources (any suggestions?), so I ended up doing something weird along the lines of creating a new project and checking out the files from Subversion on top of it.  I don't remember the exact steps, but even if I did I might not repeat them here, since I'm sure there's a better way to do it.  Once that was done, I was a bit disappointed to discover that I was unable to designate my Groovy script (MainFrame.groovy) as the project's main class.  This was even true after I converted the script into a class and gave it a main method.  Fortunately, it's still possible to run the app inside NetBeans by selecting that file and choosing Run File (Shift+F6) from the Run menu.

Next, it was time to try out Ivy and IvyBeans.  At the time, there was no up-to-date build available, so I built from HEAD, but at this point you can just download 1.1 Milestone 3.  Installation is as simple as extracting the zip you just downloaded and then following the basic plug-in install steps.  I searched on MVNRepository.com to find the appropriate org (a.k.a. group) and name for each of the two libraries my project uses.  I then removed the libraries from the NetBeans project, temporarily making the IDE unhappy.  After I added lines of the following form to ivy.xml and then executed a Clean and Build on the project, Ivy downloaded the libraries, IvyBeans made them available to the project, and the build completed successfully.  Pretty cool!

<dependency org="com.something" name="library_name" rev="1.2.3" conf="compile->*" />

My transition to NetBeans isn't totally finished, but it's almost there.  One thing I still need to figure out is what changes are needed to allow me to run an external build using the ant build.xml file generated by NetBeans – I think it involves specifying a few properties that are currently absent when ant is run standalone.

As I've said in most of my posts, please tell if you know of a better way to do any of what I've described above.  I'm definitely not an expert in NetBeans or Ivy.

Saturday, June 27, 2009

Embedded Groovy as an Application Extension Language

I recently worked with my client to add an extension mechanism (in this case, a very simple plug-in system) to their intranet Java web application.  It's initially meant for use by professional services staff and possibly other advanced users in the future.  Our first thought was to go with JavaScript via the Java Scripting API, since that pair of technologies was already in use elsewhere in the system for simple filtering expressions (e.g., "value > 10 && value < 100").  I wrote a few examples and discovered that the code quickly became a weird hybrid of Java and JavaScript that was very hard to read. ("Is that a Java String or a JavaScript String?…")  After a brief conversation with my client, we decided to instead go with Groovy for several reasons.  We can keep the code around as Strings for now, but easily compile it to class files later if our needs change.  Groovy's integration with Java is excellent and easy to understand.  If the user knows Java, but isn't comfortable with Groovy's many cool features, he or she can write normal Java (except for inner classes) and have it interpreted/compiled as Groovy.  What follows is a description of how I set it up.  Please let me know what you think of this approach, especially if you can suggest a way to improve it.

Each extension is a Groovy class extending a Java abstract adapter class that provides default implementations of all but one of the methods in a Java interface (ScriptingInterface).  The Java code used to load the extension gets a GroovyClassLoader using the following code:

GroovyClassLoader loader = new GroovyClassLoader(getClass().getClassLoader());

I use the following code to get the relevant class and instantiate it via its no-arg constructor, catching the (entertainingly-named) MultipleCompilationErrorsException along with several other exceptions:

String code = "<Groovy>"; 
Class<? extends ScriptingInterface> clazz = loader.parseClass(code); 
ScriptingInterface script = clazz.newInstance();

Does that sound like a reasonable way to do it?  It definitely works, but I'm not sure it's the best way.

Update: Ack!  For some reason, my Blogger settings changed from "New Posts Have Comments" to "New Posts Do Not Have Comments" through no action of my own!  While I try to figure out how to fix it (now that this post is no longer new), you can add any comments to this post's listing on DZone.

Update2: Problem fixed.  It turned out to be possible to turn on comments for a single post via the Blogger post editor.

Tuesday, June 16, 2009

Boston Grails Users' Group!

Last night, I watched an interview with Dave Klein done by Scott Davis of ThirstyHead at JavaOne 2009, in which they discussed Dave's upcoming book, Grails: A Quick-Start Guide. At the end of the interview, Dave mentioned that one of his kids had put together a site called g2groups.net, which helps people find and start Groovy/Grails user groups. I visited the site and was very excited to see that Boston has a newly formed group and that their first meeting is on Thursday! I have a conflict, but this is important enough that I plan to drop my original commitment to attend. See you there!

Wednesday, May 27, 2009

FogBugzReporter: Introduction

This is the first post in what I plan to be a series about a small application named FogBugzReporter that I began writing in October of 2007.  I'd recently started using the on demand edition of FogBugz from Fog Creek Software to track my time spent consulting and was disappointed to discover that it had no direct way for me to generate time reports.  Fortunately, I discovered that along with the web UI, FogBugz also had a pseudo-REST API for interacting with the application (I say "pseudo-REST" because almost all operations can be performed with HTTP GETs, including ones that change state on the server).  I read the API docs, quickly wrote a simple Swing application in Java, and made it available to the rest of the FogBugz community.  Soon after, I started experimenting with Groovy and realized that porting the app would be a nice way to get some experience with the language, including SwingBuilder and the great APIs for processing XML.  I did that and eventually moved the code from a private Subversion repository to one on Google Code under an Apache license.  Until a few days ago, it sat for the most part unchanged.  In the meantime, my Groovy skills have improved a bit (I'm still no expert), the language has evolved, and the technologies around it have grown.  In this series, I plan to gradually bring the application up-to-date and polish up the code.  I'll update this posting with links to the other posts in the series as I write them.

Posts:

Planned Posts:

  • FogBugzReporter and Griffon
  • Suggestions?

Saturday, May 23, 2009

AWS Elastic MapReduce Webinar @Noon ET on May 28

Amazon's new Elastic MapReduce service makes it easier to run Map/Reduce jobs within EC2. If that sounds interesting, you can find out more about it by registering and "attending" the webinar they're holding on Thursday, May 28 at Noon ET. I'll be there, in a virtual sense. :)

Sunday, May 17, 2009

ConcurrentLinkedHashMap and Possible Alternatives

I mentioned in an earlier post that a project I'm working on needed a thread-safe class with behavior similar to LinkedHashMap, which maintains the order of entries while allowing sub-classes to set a (rather primitive) eviction policy.  I came upon and started using Ben Manes's ConcurrentLinkedHashMap, with only minor modifications to allow for easier sub-classing.  I've been pretty happy with it so far and have been meaning to take a look at recent changes that he and "zellster" have made since I last downloaded the code.

The same project is also using ehcache for caching.  I was looking forward to the release of version 1.6 and inquired about the expected release date on the developer's forum.  That kicked off an exchange between me and Greg Luck, the lead for ehcache, that eventually led to him mentioning some problems he'd encountered with ConcurrentLinkedHashMap.  I'm going to check in with Ben and let him respond to what Greg said, along with giving him a chance to provide a guess for when his new version of CLHM is likely to be usable.  I also need to look into what Greg was referring to in the changelog for ehcache 1.6 beta5 when he said, "Make MemoryStore eviction policies injectable."  Interesting…

Just to confuse matters, I was also recently looking over some materials discussing Infinispan, which will essentially be the 4.0 release of JBoss Cache, and saw this interesting blog entry on "Implementing a performant, thread-safe ordered data container".  In it, Manik Surtani mentions their newly implemented classes FIFODataContainer and LRUDataContainer, which sound like another plausible way to approach my problem.  I'll make sure to ask more about them in the comments for that post.

Saturday, May 16, 2009

JPC: x86 Emulator in Java

Last night, I was reading Caches and Maps in Terracotta by Alex Miller when I came upon his link to the JPC (a.k.a. JavaPC) Project.  It sounded familiar and I decided to follow the link.  It turns out to be an open source project implementing an impressively functional and performant x86 emulator in Java.  I had fun last night launching an MS-DOS environment within an applet (what a concept!) and playing Donkey Kong for the first time in ages.  According to the site, the emulated layer runs at about 20% of your processor's speed - not bad for a emulator written in pure Java.

Note: After my first visit to the site, it seemed to be having problems.  Just in case it had to do with visitor load, I changed the link above to point to the copy in Coral Cache.  The site is now totally back up and behaving normally.  If Coral has any problems or is slow, you can safely go to the site directly.

Update: I got an email from one of the JPC team members saying that they'll be at JavaOne this year with some new stuff.  I wish I could be there to see it.  I guess I'll have to wait for it to show up on their site.

Monday, May 11, 2009

Trying out DISQUS for Comments

I've decided to try out DISQUS for handling comments on my blog. Let me know what you think. I can always revert back to Blogger comments if desired.

Update: It turns out that there's a bug in Blogger that precludes you from using DISQUS in conjunction with an external post editor. Since I'm happy with Windows Live Writer, I guess I'll disable DISQUS for the time being and revisit the decision if the bug gets fixed.

Sunday, May 10, 2009

VisualVM and Cutting Method Calls by Over 1000x

I’ve been using VisualVM on and off with modest success since I first found out about it at JavaOne 2008.  In addition, since JDK 6 update 7, it’s been included as part of the standard JDK installation (released in July of 2008).  I was quite happy when it helped me identify bottlenecks and cut down the time to run a suite of unit tests by 50% (from 2 minutes to 1 minute).  After all, the less time it takes to run unit tests, the more often they’ll get run.  However, my biggest success using VisualVM came in early April, when I was asked to figure out why a use case was exhibiting disturbing performance characteristics.

When I first walked through the use case, it was taking about 10 seconds to run with 50 items.  The time appeared to be proportional to the number of items or perhaps even worse.  This was a scary prospect given that there would often be more than 10,000 items in a real production setting and that the use case was intended to be interactive (as opposed to a batch or background task).  A careful code review might have revealed the problem, but I knew it would be far more efficient to profile the code as a way to identify hot spots.  I started up VisualVM, connected to the relevant JVM, turned on CPU profiling, and collected data for the use case.  Sadly, I seem to have lost the relevant snapshot, so you’ll have to trust me when I tell you that the method SqlQueryFile.isSupported() was being called over 36 million times!  I made some minor tweaks to the method itself which improved its performance by about 10 percent – okay, but not nearly enough.  I next identified what was effectively a loop invariant.  The isSupported() method was being called repeatedly with the same query and schema.  As you can see from the snippets below, I pulled that check out of the nested loop.

Before:
for(Platform platform=p; platform != null; platform=platform.getParent()) {
  for(SqlQuery q : metricQueries) {
    if (isSupported(q, datasourceSchema)) {
      if(q.getMetricPath().equalsIgnoreCase(metricPath) &&
         q.getPlatform().equalsIgnoreCase(platform.getName())) {
        if(ret==null) {
          ret=q;
        } else {
          if(isMoreRecentThan(q, ret)) {
            ret=q;
          }
        }
      }
    }
  }
}
After:
Collection supportedQueries = new ArrayList();
for (SqlQuery q : metricQueries) {
  if (isSupported(q, datasourceSchema) && q.getMetricPath().equalsIgnoreCase(metricPath)) {
    supportedQueries.add(q);
  }
}

// need to optimized this code for better performance
for (Platform platform = p; platform != null; platform = platform.getParent()) {
  for (SqlQuery q : supportedQueries) {
    if (q.getPlatform().equalsIgnoreCase(platform.getName()) &&
        (ret == null || isMoreRecentThan(q, ret))) {
      ret = q;
    }
  }
}

When I ran the profiler again, it turned out I'd vastly reduced the number of calls to (~36 million to ~9 million) and the time spent in (80+% to 26.3%) the isSupported() method, as you can see in the picture below:

snapshot-040109-7pm

I probably could have left it there, but after taking another quick look, I noticed a very simple way to save even more time.  I could flip the order of the conditional expression in the first if statement.  That way, the cheap operation (which usually returns false) would come first and frequently allow us to skip the evaluation of isSupported().

if (q.getMetricPath().equalsIgnoreCase(metricPath) && isSupported(q, datasourceSchema)) {

To my surprise, this cut the number of calls to around 27,000 and the time spent in the method to 1%!

snapshot-040109-8pm

With these optimizations in place, the same use case with 11,000 items now took less than a second.  Thanks VisualVM!

Update: This post was a second place winner in the Java VisualVM Blogging Contest!

Saturday, May 9, 2009

High Performance, Parallel, & Grid Computing

On Wednesday night (May 6th), I attended an alumni event held by the UMass Boston CS Department in the beautiful five year old Campus Center.  It was great to see my professors and former classmates, along with talking with current students about their software engineering projects.  Toward the end of the event, there was a talk given by an alumnus named Richard Anderson, the CTO of Symmetric Computing, a company that makes hardware and OS-level software for High Performance Computing (HPC).  Their slogan is something along the lines of “Supercomputing for the Masses”.  The speaker gave a fairly typical overview of the history of CPU performance, including references to Moore’s Law, hitting the gigahertz wall, and the emergence of multi-core chips, followed by a discussion of parallel programming.  He mentioned several times that the company would be making one of their 48 core/350GB systems available to the university for academic use.  It was a decent talk, if not extremely exciting.  It made me think of concurrent programming, grid/cloud computing, and the cool compute appliances made by Azul Systems.  Based on questions asked by myself and others, I was disappointed to discover that although the technical gap between HPC and the rest of the computing world is narrowing, most of the HPC world seems unaware of this.  A lot of work has gone into making clusters of commodity hardware work together and coding for these architectures easier (think Google, Map/Reduce, etc.).  It would be nice if people in HPC and distributed computing could do a better job of exchanging ideas.  Even if typical clusters have to deal with latency issues and the inability to directly access large amounts of memory, that doesn’t explain why someone should code an HPC app in Fortran or C on a Symmetric box instead of in Java on a 432 core/384GB Azul box.
I realize that this is a bit of a rant.  If any readers can provide counter-arguments or counter-examples, I’d be happy (and reassured) to read and know about them.
Update: I forgot to mention that if you’re doing academic research or teaching a class in clustering/distributed computing, you should check out the fairly new Amazon Web Services in Education program.  They provide free credits to educators, researchers, and students.  Pretty cool!

Sunday, May 3, 2009

Groovy versus Scala (Update)

Back in March of last year, I wrote a post comparing Groovy and Scala.  Enough time has passed that I thought it was worth revisiting the topic.

In my original post, I reached the conclusion that at least for me, it made sense to first focus on Groovy and then later learn more Scala.  I still feel that way, but my thoughts on the matter have evolved.  How about combining Groovy and Scala in the same program or architecture?  That thought had vaguely occurred to me already, but it was solidified by the discussion of Twitter’s use of Ruby and Scala.  They started off using Ruby (Ruby on Rails, in particular) to run their whole site.  When they ran into performance and scalability issues, they determined what was causing the most trouble (back-end tasks) and gradually converted the relevant parts of the system into Scala.  This kind of polyglot programming makes sense to me.  It follows the principle of “using the best tool for the job.”  I think the combination of Groovy and Scala makes even better sense: they’re designed to run and interoperate with other languages on the same VM.  Not surprisingly, I’m not the first person to think of this.  I found two posts from Andres Almiray (“Griffon: Groovy & Scala working together” and the follow up “Follow up on Griffon/Groovy/Scala”) showing some practical examples and another on a different blog discussing the ability jointly compile “Groovy+Scala+Java”.  I’m excited to watch and perhaps participate in these exciting developments.

Side Note: On the subject of Groovy, when my one-year IntelliJ license was close to expiring, I decided to check out NetBeans.  I was pleased to discover it had quite good Groovy and Grails support.  I’ve been using the 6.7 milestone builds and now the beta, in order to get support for Grails 1.1.  If you haven’t tried Groovy/Grails development in NetBeans yet, you should.

Update: Here are links to the items Andres mentioned in the comments

Update (Jan 12, 2010): Andres recently posted the highly relevant Groovy & Scala: a tale of two JVM languages

Saturday, April 25, 2009

Simple, But Very Clever

I was reading some articles on PCWorld (see my last post to find out why) and came upon an interesting link in one of the comments.  The link took me to SuperFiller.  It's a simple service which helps solve a problem I have all the time.  Have you ever put together an order on Amazon where all the items qualify for free shipping, but the total is a bit shy of the required 25 dollars?  Well, as the front page of the site shows, you can ask SuperFiller what items are available on Amazon for $1.47, or whatever amount you need.  It's simple and probably wasn't very hard to code, but someone had to think of it.

p.s. I'm guessing they make a small amount of money on each purchase through Amazon Associates referral fees.

Friday, April 24, 2009

I'm Quoted in PCWorld!

Juan Carlos Perez from IDG News Service (the parent company of PCWorld) contacted me today to ask if I'd be interested in speaking with him about the Amazon Web Services offerings. He explained that he was writing an article on the subject and wanted to hear how customers felt about it. We had a nice chat late this morning. I let him know that I've experimented with EC2 and S3, but haven't done any production work with either. I made sure to plug the Boston Scalability User Group (of which I've been a member since the first meeting) and mentioned that we'd already had Mike Culver of Amazon as a speaker.

The result was an article which appeared in PCWorld: "Amazon.com Eyes CIOs With Its AWS Cloud IT Services". My quotes were pretty clunky (apparently me thinks and me writes more gooder than me talks :) ), but I think readers will at least be able to figure out what I meant.

Update:
the article also appears in InfoWorld and ComputerWorld Norway

Another Update: I contacted the reporter about the clunkiness of my quotes.  He said that since the article had already been published, he would have to run any changes by his editors and it would effectively be treated as a retraction/correction.  That's obviously overkill, so I guess I'll just have to live it and follow his suggestion for next time: ask the author to run quotes by me before the article is filed.

Sunday, April 5, 2009

Blogger Causing Problems for SyntaxHighlighter

I kept noticing blog entries with beautifully rendered source code snippets and finally clicked the blue question mark in the upper left of one and discovered that they were using Alex Gorbatchev's SyntaxHighlighter JavaScript library. For my last post, I followed the instructions in the Usage Guide, using the hosted version since I can't upload JS files to Blogger, at least as far as I know. If you look at my post, you'll notice that I have two 1 line Groovy snippets and one 5 line Java snippet. The 1 liners were no problem, but when I first tried to display the 5 lines of Java it showed up as just one line with visible <br> tags where the line breaks should have been. It turns out that when I pasted the code into Blogger's post editor, even though it was in HTML source mode, it inserted <br> tags for the line breaks even though they were showing up inside of a <pre> tag. It turns out that there are two solutions to the problem. I'll explain the better one first, although I didn't find out about it until trying the other one.

When I asked Google for help, it led me to a page talking about how to use SyntaxHighlighter with Blogger. The post itself didn't contain any new information for me, but the comments were useful. "Robje" had apparently experienced exactly the same problem as me. "Debiprasad" then asked, "What is your settings for 'Convert line breaks' set to? In my case, if I set it to 'Yes', it's inserting <br/> into codes defined under <pre> tag." Ah ha! I looked on the "Formatting" subtab under the "Settings" tab in the Blogger control panel and discovered that mine was set to "Yes". Ack! What an annoying default! With it set to "No", it seems to fix the problem.

In case that doesn't work for you, Google also brought me to a page on what must been the old home for SyntaxHighlighter. Once again, the gold was in the comments. "fahd.shariff" mentioned that if you use the post editor in Blogger in Draft, this isn't a problem. As I said earlier, I gave this a try before finding the preferable solution and it worked for me so it's a reasonable solution to fall back on.

Succinctly Groovy

I was reading a post on displaying Google Visualization charts in Grails on mrhaki's blog and noticed something interesting. In his code, mrhaki wanted to get a handle on the newest (most recently modified) file in a directory. His approach was to list the files in the directory, sort them (in ascending order), reverse that, and then get the zeroth item:
def xmlFile = dir.listFiles().sort{ file -> file.lastModified() }.reverse()[0]
It's a nice example of how you can express a somewhat complex flow in a single line of Groovy. Still, it seemed like overkill to me and another reader who failed to identify him or herself. That reader linked to the Groovy Collections page and suggested using a Comparator that sorted the list in descending order to start with, but didn't include a code snippet. I liked the fact that this approach avoids the extra work to reverse the sorted list, but I wasn't sure it would be possible to express it so compactly and it still seemed more complicated than necessary. I looked over the Collections page and was reminded of the min(Closure) and max(Closure) methods available on Groovy's enhanced collections. So, rather than sorting the list of files, you can just ask for the max as determined by last modified date:
def xmlFile = dir.listFiles().max{ it.lastModified() }
Nice and simple, eh? Ignoring the overhead of imports and class/method declaration, neither of which are necessary with the Groovy snippet, here's the most compact way I know of expressing this using the Java standard libraries:
File xmlFile = Collections.max(Arrays.asList(dir.listFiles()), new Comparator<File>() {
  public int compare(File file1, File file2) {
    return (int)(file1.lastModified() - file2.lastModified());
  }
});
I'm glad Groovy's out there!

Saturday, March 14, 2009

A Teeny Tiny Bit of Fame

When I noticed I was credited for a fix on the changelog page for Tomcat 5.5.27 (search for "Passell" on the page if you don't believe me!), it made me feel a bit giddy. Does that make me a total dork? Well, I've been using Tomcat since it emerged out of the Apache JServ project many years back, so it makes me proud to think that I've helped make it better, even if just a little bit.
 
Update: my fix was incorporated into Tomcat 6.0.20 as well.

Why Twitter?

I'm not a technophobe. I don't blog as often as I mean to, but when I do, I enjoy it. I use IM on a regular basis. So, why do I have no interest in using Twitter? What do others see in it that I don't? Anybody out there reading this who can explain it to me?

Update: I have similar feelings about Google Buzz – I believe I have a good understanding of how it works, but I don't understand its value.

Friday, March 13, 2009

Amazon EC2 Reserved Instances

There have been some exciting developments recently in the world of Amazon EC2. For a long while, they've had compelling offerings for batch-oriented services and ones requiring dynamic scaling, but weren't as strong from a pricing perspective for interactive services with fairly steady demands. Well, that's no longer the case. Amazon just announced the introduction of "reserved instances" on EC2, which are different from regular ones in two primary ways. Reserved instances are guaranteed to be available when you ask to activate them and their pricing structure involves an up-front fee in return for a lower hourly operating cost. The availability guarantees are certainly an interesting development, but I'm not going to discuss those here. What currently excites me are the pricing options and what they mean for some architectures and demand patterns.

Others have already done the math comparing Amazon's reserved instance pricing with other hosting options and with standard EC2 on-demand pricing, but I've haven't seen anyone pointing out where the break-even point lies for a reserved instance versus an on-demand one. For example, a small instance costs 10 cents/hour on-demand and $325 (1 year plan) or $500 (3 year plan) and 3 cents/hour reserved. You'll only need to use a reserved instance for 4643 hours on a 1 year plan or 7143 hours on a 3 year plan before the reserved instance starts being cheaper per hour on average than an on-demand one. I've based this on the following equation: .1 * hours = upFrontCost + .03 * hours. For a 1 year plan, that works out to be 193.5 days/year, 16.1 days/month, or 53% of the time. For a 3 year plan, it's 2381 hours/year, 99.2 days/year, 8.3 days/month, or 27.2% of the time. (I'm not factoring in the time value of money here [paying $325 or $500 up-front is not the same as being able to spread it out over 1 to 3 years]. I'll leave it to the reader to do that.)

What does this all mean? If your architecture or typical demand requires 3 instances be running all the time, but you want to be able to dynamically scale up your instance usage, you can reserve the 3 consistently required instances and use on-demand ones to handle spikes in demand. In fact, it might even be worth reserving more instances even if you expect (with higher than 53% confidence), but aren't sure, you'll need them.

Update: On August 20, Amazon announced lower pricing for reserved instances.  I'd redo the math above, but Thomas Brox Røst did a particularly nice job in a post on his blog, so I'll let that do the talking.

Tuesday, March 3, 2009

Grails Demo Webinar

I just found out yesterday that Graeme Rocher, SpringSource's Head of Grails Development, will be giving a Grails demo webinar on Thursday, March 19. It will consist of building a Twitter clone using Grails. I've seen similar demos at SpringOne 2008 and JavaOne and I'd highly recommend "attending" if you're interested in seeing what Grails can do.