Strangeloop 2011 Day 2
I'm headed back home from Strangeloop 2011 this morning. Once again I booked an early flight, so I was up at 4:45 to get to the airport (when will I learn?). The conference was a smashing success as far as I am concerned. It was extremely well run and the talks were full of solid content. I didn't see nearly as much marketing as I've seen at other conferences, which was really nice. Most of the marketing I did see was companies trying to recruit new developers. There seems to be a lot of demand out there right now for innovative thinkers and people who are eager to stay on the cutting edge. Makes me think...
I started the day with a talk by Jake Luciani called "Hadoop and Cassandra". Basically this was an introduction to a tool called Brisk, which helps take some of the pain out of bringing up Hadoop clusters and running MapReduce jobs. In essence it embeds the components of Hadoop inside Cassandra and makes it easy to deploy and easy to scale with no downtime. It replaces HDFS with CassandraFS, which in and of itself looks really interesting. It's turning the Cassandra DB into a distributed file system. Very interesting how they are doing that. Sounds like a topic for another post once I've had some time to read more about it. Jake showed a demo that looked quite impressive: he brought up a four-node Hadoop on Cassandra cluster and ran a portfolio manager application, splitting it into an OLTP side and an OLAP side. Brisk definitely deserves further investigation.
The second talk of the day I went to was "Distributed Data Analysis with Hadoop and R", given by Jonathan Seidman and Ramesh Venkataramaiah from Orbitz. I've seen some of the things Orbitz has been doing before, so I was excited to see what they have been doing with Hadoop and R. After covering what R and Hadoop are, Ramesh described some of the problems they were trying to solve using analytics. One point that resonated with me was that using sampling to reduce the amount of data in your analysis is very bad for long tail distributions, because the rare items in the tail are exactly the ones a sample tends to miss. Definitely something to keep in mind. One other point, which I have heard before in other talks, is to always keep the source data that you use in your analyses. This is very good advice, both for future analyses and for letting others validate your work.
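To make the sampling point concrete, here is a small Python sketch of my own (not something from the talk). It assumes a made-up Zipf-like popularity curve where item k is picked with weight 1/k, then compares how many distinct items show up in the full data versus a 1% random sample. All the names and numbers are hypothetical.

```python
import random


def make_long_tail_events(n_items=10_000, n_events=200_000, seed=0):
    """Generate events over items with Zipf-like popularity (weight 1/k).

    The distribution and sizes are illustrative assumptions only.
    """
    rng = random.Random(seed)
    items = list(range(1, n_items + 1))
    weights = [1.0 / k for k in items]
    return rng.choices(items, weights=weights, k=n_events)


def sample_events(events, fraction, seed=1):
    """Take a simple random sample (without replacement) of the events."""
    rng = random.Random(seed)
    k = int(len(events) * fraction)
    return rng.sample(events, k)


def distinct_items(events):
    """Count how many different items appear at least once."""
    return len(set(events))


events = make_long_tail_events()
sample = sample_events(events, 0.01)

full_distinct = distinct_items(events)
sample_distinct = distinct_items(sample)

print("distinct items in full data:", full_distinct)
print("distinct items in 1% sample:", sample_distinct)
```

The sample can never contain more distinct items than it has events, so most of the tail simply vanishes from it, even though the popular head items are still well represented.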
Jonathan came on about halfway through to give more detailed information on hooking up R and Hadoop. I have to say I was a little disappointed to hear that the work in this area is incomplete, to say the least. He talked about using Hadoop streaming, Hadoop interactive, RHIPE and rmr (from Revolution Analytics). Out of these he spent the most time talking about RHIPE, since that is what they are using. He had very good things to say about rmr as well, but they hadn't done much with it yet since it was so new. Jonathan also mentioned JD Long's segue package, which I have seen JD demo at an R users meeting before. It is targeted toward applications which are embarrassingly parallel (big CPU, not big data), so it wouldn't be applicable in general. I came out of the talk interested in checking out both RHIPE and rmr, and I will keep segue in mind for where it fits. You can find the code for the talk and the slides online. I have to give credit to Orbitz for sharing this information with the community. They are doing some really interesting stuff.
Next I went to Benjamin Young's talk called "Why CouchDB?" I have been intrigued by CouchDB on and off for a while now, so I wanted to hear what was new and what set it apart from other document databases like MongoDB. One thing that turns me off from CouchDB is the Map/Reduce style views that you have to create to do queries. I just don't see how that is flexible enough, but maybe that's the idea. Benjamin made a strong pitch for the importance of replication of data, especially in a world that is becoming more and more mobile. It's an interesting idea to keep the data local so that you can access it very easily and quickly. I think that was the essence of Benjamin's talk. I don't think it swayed me toward CouchDB though, mainly because I don't think I really have an application that is in its wheelhouse. I did learn a lot more, so I will know when it fits.
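For anyone who hasn't seen the view style I'm grumbling about, here is a rough Python imitation of the idea. Real CouchDB views are JavaScript map (and optional reduce) functions stored in a design document; the sketch below only mimics the shape of that, and the documents and function names are made up for illustration.

```python
from collections import defaultdict


def map_doc(doc):
    """Play the role of a view's map function: emit (key, value) pairs."""
    if doc.get("type") == "talk":
        yield doc["track"], 1


def reduce_values(values):
    """Play the role of a reduce function: combine the values for one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


# Hypothetical documents, not from any real database.
docs = [
    {"type": "talk", "title": "Hadoop and Cassandra", "track": "data"},
    {"type": "talk", "title": "Why CouchDB?", "track": "data"},
    {"type": "panel", "title": "Languages panel", "track": "languages"},
    {"type": "talk", "title": "A language talk", "track": "languages"},
]

talks_per_track = build_view(docs, map_doc, reduce_values)
print(talks_per_track)
```

The point is that the question ("how many talks per track?") is baked into the map and reduce functions ahead of time. Asking a different question means writing a different view, which is the flexibility trade-off I mentioned.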
After lunch was a languages panel. Alex got together some of the leading minds in the field of computer language development. The panelists were:
- Gerald Sussman, who we all know as one of the inventors of Scheme and a longtime leading computer science professor at MIT
- Jeremy Ashkenas from the NY Times who has worked on CoffeeScript
- Rich Hickey, creator of Clojure
- Allen Wirfs-Brock, who has done quite a bit of work with ECMAScript standards
- Joe Pamer from Microsoft who works on F#
- Andrei Alexandrescu, who works on the D language
Published
21 September 2011