Latest Publications

import __future__

Recently ran into a “nice” feature of Python 2.5: dividing one integer by another truncates the result to an integer.

Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def e(d, dhat): return ((d - dhat)/d)**2

>>> e(40,35)
0
>>> from __future__ import division
>>> e(40,35)
0
>>> def e(d, dhat): return ((d - dhat)/d)**2

>>> e(40,35)
0.015625
>>>

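Obviously, this has been addressed in the future (namely Python 3.0, where / always performs true division), but it’s definitely a pitfall to avoid while using the stock Python on OS X. The other piece that caught me by surprise is that I needed to redefine my function: a __future__ import only affects code compiled after it, so the e() defined before the import kept the old integer division.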

Update: apparently this feature persists in “Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39)”
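For a script that has to run on these interpreters, the usual fix is to put the future import at the very top of the module. That way every function in the module is compiled with true division. Where truncation is actually wanted, use the explicit floor-division operator //. The following is a minimal sketch. The name e_floor is made up for illustration. On Python 3, / is already true division, so the import is a harmless no-op there.

```python
from __future__ import division  # must come before any other statement in the module


def e(d, dhat):
    """Squared relative error of the estimate dhat against the actual value d."""
    return ((d - dhat) / d) ** 2


def e_floor(d, dhat):
    """Same formula, but // explicitly requests the old floor (integer) division."""
    return ((d - dhat) // d) ** 2


print(e(40, 35))        # 0.015625
print(e_floor(40, 35))  # 0
```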

A View of the Internet

A view of the nearly 300,000 autonomous systems (AS) on the internet.

Figure: The whole internet

Collected using BGP data and processed/visualized with Python and matplotlib.
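One way to get from BGP data to a graph like this is to treat every pair of adjacent ASNs in an AS_PATH as an edge. The sketch below assumes AS_PATH strings of space-separated ASNs. The sample paths are made up for illustration and are not the data behind the figure. How the paths are pulled from a routing table dump is not covered. The resulting node degrees are the kind of thing one might map to marker size in matplotlib.

```python
from collections import defaultdict


def as_edges(as_path):
    """Turn an AS_PATH string like '3356 1299 13335' into undirected AS adjacencies."""
    hops = []
    for token in as_path.split():
        if not token.isdigit():
            continue  # skip AS_SET tokens such as '{64512,64513}' in this simple sketch
        asn = int(token)
        # Collapse AS-path prepending (the same ASN repeated back to back).
        if not hops or hops[-1] != asn:
            hops.append(asn)
    return {tuple(sorted(pair)) for pair in zip(hops, hops[1:])}


def build_graph(paths):
    """Build an adjacency map {asn: set(neighbour asns)} from many AS paths."""
    adjacency = defaultdict(set)
    for path in paths:
        for a, b in as_edges(path):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Illustrative sample paths only.
paths = ["3356 1299 13335", "174 3356 3356 15169", "1299 174"]
graph = build_graph(paths)
degrees = {asn: len(peers) for asn, peers in graph.items()}
print(len(graph), degrees[3356])  # 5 3
```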

Up for Air

I’ve been heads down at Elastra for the last year and a half. Their internal wiki is full of my contributions, much of which is specific to the team on the ground. In the last year, quite a bit has changed. I’ve made the jump from being a SOA specialist to embracing REST, I’ve gotten my hands around RDF/OWL and have come to love N3 over XML. I grew an engineering team from 3 to 25 in short order and learned a great deal in the process.

I’ve started to use Twitter and find it nice, but limiting when I want to write more. I’m looking for a WordPress plugin that will combine my posts with my tweets so I can keep the conversation in one space.

Looking forward, I’m coming up for air, and I’m lucky to be able to look around and comment on the state of affairs. There is quite a bit to discuss, considering cloud computing and the rise of a bookseller in rainy Seattle that changed the way people think about IT. This will have a ripple effect through the development community as we see the death of the appserver and the rebirth of something more…new and exciting.

I expect a focus on web technologies that appear simple at first but whose fundamentals shine as the layers peel away. It will require new ways of doing things, but this industry reinvents itself every 18 months. Keep an eye on Microsoft’s Azure and what they do around their developer tools.

Knocking MapReduce

MapReduce is the power application for grid computing. Grid computing works very well if the problem is “embarrassingly parallel,” but it seems to stop there. As Gregory Pfister argues in In Search of Clusters, you can’t expect to create a single system image or a single memory image without maxing out your network pipes. As long as the problem doesn’t require you to share state, grid computing works. MapReduce is one such technique that works very well for grid computing, and like most “new” things it has led many folks to proclaim the death of the “old,” in this case the relational database. I personally think that databases and MapReduce both solve real problems and are therefore valid approaches.
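To make “embarrassingly parallel” concrete, here is a single-process sketch of the MapReduce pattern. It uses the textbook word-count example and is not modeled on Google’s implementation. Each mapper sees only its own input and shares no state. The shuffle, which groups intermediate pairs by key, is the only step that moves data between workers. In a real cluster, that step is where the network pipes get used.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc):
    """Mapper: sees one input split only and emits (word, 1) pairs; no shared state."""
    return [(word, 1) for word in doc.lower().split()]


def shuffle(pairs):
    """Group intermediate values by key (the network-heavy step on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combines all values for one key independently of other keys."""
    return key, sum(values)


def map_reduce(docs):
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = map_reduce(["the cat sat", "the dog sat down"])
print(counts["the"], counts["sat"], counts["down"])  # 2 2 1
```

Because map_phase and reduce_phase never touch shared state, each call could run on a different machine. A problem that needs mappers to coordinate mid-computation does not fit this shape.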

I think the following post, MapReduce: A major step backward, helps put things in perspective. DeWitt and Stonebraker do a good job with the analysis, but occasionally step over the line of objectivity with key comments like “…[g]iven the experimental evaluations to date, we have serious doubts about how well MapReduce applications can scale.” I think Google’s implementation of MapReduce fits its problem very, very well and scales accordingly. However, I agree that MapReduce applied generally is not a solution that scales.

Web Security Context: Experience, Indicators, and Trust

I recently posted an idea for a Security Commons, similar to the Creative Commons, on the Security Catalyst Forum (login/registration required…highly recommended for the security conscious). I thought the idea was novel…not so: Web Security Context: Experience, Indicators, and Trust. It’ll probably take 3 years for it to come to fruition, but it’s the right start.

REST on the Edge

REST facilitates the last mile of integration. This may be from within the enterprise to foster enterprise mashups or to be consumed from afar across the internet in ways unforeseen.

The simple fact is that you may need or want to rely on REST on the edge, but you may also want to leverage SOAP’s security mechanisms as you get closer to your enterprise foo (ERP, CRM, etc.).


REST for the Enterprise? Can state transition be managed correctly?

REST in the Enterprise…is it possible?

…I don’t think you can express the business interface in a RESTful way. REST says PUT is enough (I need an out-of-band contract to express it). The reality is that sure, Google, Yahoo, Amazon are ready to sacrifice the business interface to gain in scalability. There is nothing wrong with that, they have few interfaces and can easily ask their “clients” to figure it out. Their interfaces are fairly static. In the enterprise you generally don’t care too much about scalability, you care a lot more about your content updates and state transitions. This is why you have information systems in the first place.

From a post on ebpml.org

I’ve personally been pretty comfortable with REST for ENTITY services. I’m still on the fence when it comes to reliable interactions where consistency is maintained. WS-ReliableMessaging from the WS-* camp basically provides a mechanism to guarantee state transitions in a nominal sense, through delivery assurances such as exactly-once and in-order messaging. REST seems to be trying to reinvent the wheel here. They may achieve it with fewer angle brackets, but will it really be a benefit?
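The usual REST answer to consistent state transitions is HTTP conditional requests rather than a reliable-messaging layer. The client sends back the ETag it last saw in an If-Match header. The server rejects the PUT with 412 Precondition Failed if the resource has changed in the meantime. The in-memory sketch below only mimics that behavior. ResourceStore and PreconditionFailed are hypothetical names, no HTTP is involved, and this gives optimistic concurrency, not the delivery guarantees of WS-ReliableMessaging.

```python
import hashlib
import json


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 Precondition Failed response."""


class ResourceStore:
    """Hypothetical in-memory store mimicking GET and PUT with If-Match."""

    def __init__(self):
        self._docs = {}

    @staticmethod
    def _etag(doc):
        # Derive a validator from the canonical JSON form of the representation.
        body = json.dumps(doc, sort_keys=True).encode("utf-8")
        return hashlib.sha1(body).hexdigest()

    def get(self, uri):
        doc = self._docs[uri]
        return doc, self._etag(doc)

    def put(self, uri, doc, if_match=None):
        current = self._docs.get(uri)
        # Reject the transition if the caller's view of the resource is stale.
        if if_match is not None and (current is None or self._etag(current) != if_match):
            raise PreconditionFailed(uri)
        self._docs[uri] = doc
        return self._etag(doc)


store = ResourceStore()
store.put("/orders/42", {"status": "open"})
doc, etag = store.get("/orders/42")
store.put("/orders/42", {"status": "shipped"}, if_match=etag)  # accepted
try:
    # A second client still holding the old ETag attempts a conflicting transition.
    store.put("/orders/42", {"status": "cancelled"}, if_match=etag)
    conflict = False
except PreconditionFailed:
    conflict = True
print(store.get("/orders/42")[0], conflict)
```

The losing client has to GET the resource again and decide whether its transition still makes sense. That out-of-band decision is exactly the kind of contract the quoted post worries about.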

Why do people insist on remembering passwords?

What’s the most secure password you can remember? It’s the one you don’t even know!

For the last two years I’ve been using KeePass (and KeePassX) to manage my passwords. I store my passwords on a dongle that I carry with me almost all the time. I have forgotten it on occasion, but I’ve never been in a situation where I needed it right away. For my high-security passwords, I make a point of not remembering them, and I put garbage into the challenge questions (basically an officially approved back door…and don’t get me started on sites that choose the questions for you). All this goes into KeePass. I also use a key file, stored on the computers that are approved to use KeePass.
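KeePass has its own generator, but the idea behind “the password you don’t even know” is easy to sketch. The sketch below assumes Python 3.6+ for the secrets module, which is newer than the interpreters mentioned elsewhere on this page. It draws passwords and challenge-question “answers” from a cryptographically secure random source. The entropy helper assumes every character is chosen uniformly and independently from the 94 printable ASCII symbols.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation characters = 94 printable symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length=32):
    """A password nobody memorises; it lives only in the password manager."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def garbage_answer(nbytes=24):
    """Random text for a security question, so the question stops being a back door."""
    return secrets.token_urlsafe(nbytes)


def entropy_bits(length, alphabet_size=len(ALPHABET)):
    """Entropy of a uniformly random password: length * log2(alphabet size)."""
    return length * math.log2(alphabet_size)


pw = random_password(120)
print(len(pw), round(entropy_bits(120)))
```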

For daily passwords, I have a rotating set that I use. These are considered medium grade and are only used on trusted sites. Anything else goes into KeePass.

I’ve heard of the technique of using an algorithm to create a unique password for each site. The problem is that you can only remember passwords up to a certain length before it becomes impossible, and I’ll use a 120-character password if I can. The algorithm method is also just too easy to abuse or potentially crack: once someone works out the rule, every password it produces is exposed.
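To illustrate why an algorithmic scheme is fragile, here is a deliberately naive, hypothetical rule. It is a memorized base plus a transformation of the site name. The derive and guess_rule functions and the base string are made up for illustration. The sketch assumes an attacker who has guessed the shape of the rule, for example after seeing a couple of leaked passwords. With that knowledge, a single leaked password gives up the base, and with it every other site’s password.

```python
def derive(site, base="Tr0ub4dor"):
    """Hypothetical 'algorithm' password: memorised base + rule applied to the site name."""
    name = site.split(".")[0]
    return base + name[:3].capitalize() + str(len(name))


def guess_rule(leaked_password, site):
    """Attacker side: strip the site-specific suffix to recover the memorised base."""
    name = site.split(".")[0]
    suffix = name[:3].capitalize() + str(len(name))
    if leaked_password.endswith(suffix):
        return leaked_password[: -len(suffix)]
    return None


leaked = derive("example.com")           # one site is breached
base = guess_rule(leaked, "example.com")
print(derive("mybank.com", base) == derive("mybank.com"))  # True
```

A randomly generated password stored in a manager has no rule to recover, so a leak stays confined to one site.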

Ditch the Server, Virtualize your Hardware

It looks like purchasing servers for your brand new web two-point-oh application is a sure way to get the cold shoulder in Silicon Valley. According to the Scobleizer blog, CrunchBase and Mogulus do not own a single server. It did require some changes to the fundamental way they built their applications, because local storage on an Amazon EC2 instance is completely volatile: data is lost when the virtual instance terminates unless, of course, you store it in Amazon S3. So forget transaction durability.
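The architectural change amounts to treating the instance’s disk as a cache and writing everything that matters through to durable storage. The sketch below is hypothetical. DurableStore stands in for S3, Instance stands in for an EC2 instance, and neither uses any real AWS API. It also shows why durability suffers: each write is a separate, non-transactional put.

```python
class DurableStore:
    """Hypothetical stand-in for S3: its contents outlive any single instance."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects.get(key)


class Instance:
    """Hypothetical stand-in for an EC2 instance whose local disk vanishes on termination."""

    def __init__(self, durable):
        self._local = {}          # instance storage: fast, but lost on terminate()
        self._durable = durable

    def save(self, key, value):
        self._local[key] = value
        # Write-through: the update only survives once it reaches durable storage.
        # A crash before this put loses it; there is no transaction spanning both.
        self._durable.put(key, value)

    def load(self, key):
        if key in self._local:
            return self._local[key]
        value = self._durable.get(key)
        if value is not None:
            self._local[key] = value  # re-warm the local copy after a restart
        return value

    def terminate(self):
        self._local.clear()


s3 = DurableStore()
first = Instance(s3)
first.save("user:1", "alice")
first.terminate()
replacement = Instance(s3)
print(replacement.load("user:1"))  # alice
```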

On the bright side, Elastra does have a unique solution in this space.  They provide a distributed file system backed by S3 for your EC2 instance.  This is definitely a company to watch.

The new way forward

A new pattern for scaling web-based systems is emerging: Cache Farms and Read Pooling.

For Java SE 7, JCache (JSR 107) is being proposed to provide a cache abstraction, although it has been criticized for being out of date. Read Pooling is typically handled at the JDBC driver level or with a JDBC proxy.
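The core of read pooling is a routing decision: writes go to the primary database, and reads are spread across replicas. The sketch below shows that decision in Python, in place of what a JDBC driver or proxy would do. ReadPoolRouter is a hypothetical name, and plain strings stand in for connections. Classifying statements by their first keyword is a simplifying assumption. A real proxy would also keep reads inside a write transaction on the primary and account for replication lag.

```python
import itertools


class ReadPoolRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql):
        words = sql.split(None, 1)
        # Crude classification: only statements starting with SELECT count as reads.
        is_read = bool(words) and words[0].upper() == "SELECT"
        return next(self._replicas) if is_read else self.primary


router = ReadPoolRouter("db-primary", ["db-replica-1", "db-replica-2"])
queries = ["SELECT 1", "UPDATE t SET x = 1", "select 2", "SELECT 3"]
print([router.route(q) for q in queries])
```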

There is still a good deal of engineering to be done with regard to these new technologies. I, for one, would like to see annotations that describe how an entity should be cached, something similar to ETags. At least on the Java runtime, various layers could then manage caching much more precisely, as requested by the domain object. For example, JPA’s cache could be told what to evict after how long, so that clients that can tolerate stale data can coexist with those that can’t.
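As a rough analogue of such an annotation, here is a Python decorator in which the function declares how long its result may be served from cache. The name cached, its ttl_seconds parameter, and load_product are hypothetical. None of this is JPA’s or JCache’s API. A fake clock is injected so the expiry can be shown deterministically. Stale-tolerant callers use the cached function, while strict callers bypass it through functools.wraps’ __wrapped__ attribute, so both kinds of client coexist.

```python
import functools
import time


def cached(ttl_seconds, clock=time.monotonic):
    """Hypothetical 'annotation': results may be reused for ttl_seconds, then evicted."""
    def decorator(fn):
        entries = {}  # args -> (value, time stored)

        @functools.wraps(fn)
        def wrapper(*args):
            now = clock()
            hit = entries.get(args)
            if hit is not None and now - hit[1] < ttl_seconds:
                return hit[0]  # still within the declared staleness window
            value = fn(*args)
            entries[args] = (value, now)
            return value

        wrapper.evict_all = entries.clear
        return wrapper
    return decorator


fake_now = [0.0]
calls = []


@cached(ttl_seconds=30, clock=lambda: fake_now[0])
def load_product(product_id):
    calls.append(product_id)  # records each real (uncached) load
    return {"id": product_id, "price": 10}


load_product(1)
load_product(1)             # served from cache
fake_now[0] = 31.0
load_product(1)             # entry older than 30s, so it is reloaded
cached_loads = len(calls)   # 2
load_product.__wrapped__(1)  # a strict client skips the cache entirely
print(cached_loads, len(calls))
```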