Knocking MapReduce

January 18th, 2008

MapReduce is the power application for grid computing.  Grid computing works very well if the problem is “embarrassingly parallel,” but it seems to stop there.  As the book In Search of Clusters argues, you can’t expect to create a single system image or a single memory image without maxing out your network pipes.  As long as the problem doesn’t require you to share state, grid computing works.  MapReduce is one such function that works very well for grid computing and, like most “new” things, it seems to require that folks proclaim the death of the “old,” in this case the relational database.  I personally think that databases and MapReduce both solve real problems and are therefore both valid approaches.
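To make the “no shared state” point concrete, here is a minimal word-count sketch in Java. It is not Google’s implementation: the class and method names are made up for illustration, and a local parallel stream stands in for grid nodes. The point is only that each map call touches nothing but its own chunk, so the work can be split freely, and state is combined only in the reduce step.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: names are illustrative, not from any MapReduce library.
class WordCountSketch {

    // Map phase: each chunk is counted on its own, with no shared state,
    // so every call could run on a different machine.
    static Map<String, Integer> mapChunk(String chunk) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : chunk.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: merge the partial counts produced by the map phase.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    // A local parallel stream stands in for the grid in this sketch.
    static Map<String, Integer> wordCount(List<String> chunks) {
        List<Map<String, Integer>> partials = chunks.parallelStream()
                .map(WordCountSketch::mapChunk)
                .collect(Collectors.toList());
        return reduce(partials);
    }
}
```

As soon as one chunk’s result depends on another chunk’s intermediate state, this clean split disappears, which is where the grid approach stops being a good fit.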

I think the following post, MapReduce: A major step backward, helps put things in perspective.  DeWitt and Stonebraker do a good job with the analysis, but occasionally step over the line of objectivity with key comments like “…[g]iven the experimental evaluations to date, we have serious doubts about how well MapReduce applications can scale.”  I think Google’s implementation of MapReduce fits the problem it was built for very, very well and scales accordingly.  However, I agree that MapReduce applied generally is not a solution that scales.

Web Security Context: Experience, Indicators, and Trust

November 24th, 2007

I recently posted an idea for a Security Commons, similar to the Creative Commons, on the Security Catalyst Forum (login/reg required…highly recommended for the security conscious). I thought the idea was novel…not so: Web Security Context: Experience, Indicators, and Trust.  It’ll probably take 3 years for it to come to fruition, but it’s the right start.

REST on the Edge

November 23rd, 2007

REST facilitates the last mile of integration. This may be from within the enterprise to foster enterprise mashups or to be consumed from afar across the internet in ways unforeseen.

The simple fact is that you may need/want to rely on REST on the edge, but you may also want to leverage SOAP’s security mechanisms as you get closer to your enterprise foo (ERP, CRM, etc.).

1 Raindrop

REST for the Enterprise? Can state transition be managed correctly?

November 21st, 2007

REST in the Enterprise…is it possible?

…I don’t think you can express the business interface in a RESTful way. REST says PUT is enough (I need an out-of-band contract to express it). The reality is that sure, Google, Yahoo, Amazon are ready to sacrifice the business interface to gain in scalability. There is nothing wrong with that, they have few interfaces and can easily ask their “clients” to figure it out. Their interfaces are fairly static. In the enterprise you generally don’t care too much about scalability, you care a lot more about your content updates and state transitions. This is why you have information systems in the first place.

From a post on ebpml.org

I’ve personally been pretty comfortable with REST for ENTITY services. I’m still on the fence when it comes to reliable interactions where consistency is maintained. WS-ReliableMessaging from the WS-* camp basically provides a mechanism to guarantee state transitions in a nominal sense. REST seems to be trying to reinvent the wheel here. They may achieve it with fewer angle brackets, but will it really be a benefit?

Why do people insist on remembering passwords?

November 20th, 2007

What’s the most secure password you can remember? It’s the one you don’t even know!

For the last two years I’ve been using KeePass (and KeePassX) to manage my passwords. I store my passwords on a dongle that I carry with me almost all the time. I have forgotten it on occasion, but I’ve never been in a situation where I needed it right away. For my high security passwords, I make a point of not remembering them, and I put garbage into the challenge phrases (basically an officially approved back door…and don’t get me started when they choose the questions for you). All this goes into KeePass. I also use a key file, stored on the computers that are approved to use KeePass.

For daily passwords, I have a rotating set that I use. These are considered medium grade and are only used on trusted sites. Anything else goes into KeePass.

I’ve heard of the technique of using an algorithm to create unique passwords. The problem is that you can only remember a password up to a certain length before it becomes impossible. I’ll use a 120 character password if I can. The algorithm method is just too easy to abuse or potentially hack.

Ditch the Server, Virtualize your Hardware

November 16th, 2007

It looks like purchasing servers for your brand new web two-point-oh application is a sure way to get a cold shoulder in Silicon Valley.  CrunchBase and Mogulus do not own a single server, according to the Scobleizer blog.  It did require some changes to the fundamental way they built their applications because Amazon’s EC2 is completely volatile.  Data is lost when the virtual instance terminates, unless, of course, you store it in Amazon’s S3.  So forget transaction durability.

On the bright side, Elastra does have a unique solution in this space.  They provide a distributed file system backed by S3 for your EC2 instance.  This is definitely a company to watch.

The new way forward

November 13th, 2007

There is a new pattern emerging for scaling web based systems: Cache Farms and Read Pooling.

For Java SE 7, JCache is being proposed to provide a cache abstraction, although it has been criticized for being out of date. Read Pooling is typically handled at the JDBC driver level or with a JDBC proxy.
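As a rough illustration of what a read-pooling proxy decides, the sketch below routes read-only work round-robin across replicas and sends everything else to the primary. The class ReadPoolRouter and its method names are hypothetical and do not come from any JDBC driver; the type parameter would be something like a DataSource in practice, and a real proxy would also have to deal with transactions, replication lag and failed replicas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the routing decision inside a read-pooling proxy.
class ReadPoolRouter<T> {
    private final T primary;
    private final List<T> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadPoolRouter(T primary, List<T> replicas) {
        this.primary = primary;
        this.replicas = new ArrayList<>(replicas);
    }

    // Writes (and reads when no replica exists) go to the primary;
    // read-only work is spread round-robin over the replicas.
    T route(boolean readOnly) {
        if (!readOnly || replicas.isEmpty()) {
            return primary;
        }
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }
}
```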

There is still a good deal of engineering to be done with regard to these new technologies. I for one would like to see annotations that describe how an entity should be cached, something similar to ETags. At least on the Java runtime, various layers could then manage caching much more precisely, as requested by the domain object. For example, letting JPA’s cache know what to evict and after how long, so that clients that can tolerate stale data can coexist with those that cannot.
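Something along these lines is what I have in mind. To be clear, the @CachePolicy annotation below does not exist in JCache, JPA or anywhere else; it and the two entity classes are invented purely to show how a domain object could declare its own caching tolerance and how a caching layer could read it back at runtime.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation: not part of any existing standard or library.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface CachePolicy {
    int maxAgeSeconds();                  // how long a cached copy may be served
    boolean allowStale() default false;   // may clients see slightly old data?
}

// Catalog data changes rarely, so stale reads are acceptable for a while.
@CachePolicy(maxAgeSeconds = 300, allowStale = true)
class ProductCatalogEntry { }

// Balances must always be fresh, so nothing may be cached.
@CachePolicy(maxAgeSeconds = 0)
class AccountBalance { }

class CachePolicyReader {
    // A caching layer could call this to learn how long it may keep an entity.
    // Types without the annotation are treated as not cacheable (0 seconds).
    static int maxAgeSeconds(Class<?> entityType) {
        CachePolicy policy = entityType.getAnnotation(CachePolicy.class);
        return policy == null ? 0 : policy.maxAgeSeconds();
    }
}
```

Defaulting unannotated types to “do not cache” is a design choice in this sketch: it keeps the conservative behaviour unless the domain object explicitly opts in.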

What is Your Identity Worth?

November 12th, 2007

El Reg recently posted an article about a man who lost his eBay account after 11 years due to a glitch.  This customer was a power user and had a 99.7 rating with over 700 posts.  According to Waldman: “Once your ID is deleted, you cannot get into the help system. I don’t exist. I’ve been made into another person.”

At what point does your online identity become valuable?  I have email addresses that, if I lost them, wouldn’t be the end of the world, just a hassle.  But if I lost an email address that I traded on and had built a solid reputation with, I would be extremely upset.  This non-tangible, (hopefully) non-transferable asset has value, and I’m not aware of any way to back it up or insure it as a private entity.

Engineering Performance Metrics…Here’s Looking at You

November 9th, 2007

One of the main tenets of Scrum is transparency.  One virtue of transparency is accountability.  With Scrum it becomes immediately apparent what is wrong, and it is up to the ScrumMaster to go make it right.  Scrum, purposefully, does not provide any metrics to help determine where the breakdown has occurred so managers can help remove the impediment.

Having spent some time on the Sales side of the organization, I can say it’s a very different story there.  Account managers are responsible for delivering revenue, period.  Their system is extremely transparent since it’s usually recorded on the company ledger every quarter.  Individual performance has even been refined to track performance through the entire sales life cycle (http://dashboardspy.com/dashboards/41/4-elements-of-sales-performance-and-their-metrics).  This helps keep sales efficient, lean and mean.

A TSS post seems to show that some people are more interested in pointing the finger at management than in asking to be measured themselves.  It’s hard, but being accountable helps focus one’s actions to ensure one is always providing value.

Schema First and Anemic Domain Models

November 8th, 2007

I was recently introduced to the notion of Anemic Objects while discussing schema-first versus POJO-first XML definition. I come from the schema-first camp and considered the resulting Java objects to be just a normal side effect of this approach. The alternative has scary ramifications from a system design/interoperability perspective.

Before I get into my thoughts on either approach, let’s cover what an anemic object is. According to Martin Fowler, they’re an anti-pattern.

The fundamental horror of this anti-pattern is that it’s so contrary to the basic idea of object-oriented design; which is to combine data and process together.

The service, or business, objects are responsible for extracting the required bits and performing everything from validation to the actual business logic.
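A small, made-up example shows the difference. The order classes below are hypothetical and are not taken from Fowler’s article: in the anemic version the object is a bag of getters and setters and a service owns the rules, while in the rich version the object validates itself and carries its own behavior.

```java
// Anemic style: the object only holds data...
class AnemicOrder {
    private int quantity;
    private long unitPriceCents;

    int getQuantity() { return quantity; }
    void setQuantity(int quantity) { this.quantity = quantity; }
    long getUnitPriceCents() { return unitPriceCents; }
    void setUnitPriceCents(long unitPriceCents) { this.unitPriceCents = unitPriceCents; }
}

// ...and a service extracts the bits, validates them and applies the logic.
class OrderService {
    static long total(AnemicOrder order) {
        if (order.getQuantity() <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        return order.getQuantity() * order.getUnitPriceCents();
    }
}

// Rich style: data and process live together, so an invalid order
// can never be constructed in the first place.
class RichOrder {
    private final int quantity;
    private final long unitPriceCents;

    RichOrder(int quantity, long unitPriceCents) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        this.quantity = quantity;
        this.unitPriceCents = unitPriceCents;
    }

    long total() {
        return quantity * unitPriceCents;
    }
}
```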

So is schema-first really a cause of this problem? Casually speaking, yes, but I believe it’s mostly due to the ease of code gen and the focus on ease of development. For most developers, learning XML Schema is not a simple undertaking, just like learning any language or standard. If I know Java, and a tool will create the required artifacts for me, then I can ignore what’s generated as long as it keeps in sync with my domain model.

Keeping in sync with a domain model can be challenging when a developer doesn’t control the underlying schema. Think about the experience of mapping a robust database schema to an object model. The relational/object impedance mismatch has resulted in numerous frameworks (Hibernate, JDO, JPA, EJB, TopLink and many more) to handle these differences. Even then, all of them allow a way to just use plain SQL and do the mapping yourself. The same mismatch applies to XML Schema. There are various frameworks, JAXB and XMLBeans for example, that help out in this area as well.

It’s easier to allow the Domain Model to drive these models or to just consider any schema-first output as a second class object or bit bucket. I’ll admit to subscribing to the latter. As an enterprise architect, it’s easier to think in terms of data schema and service endpoints that act on it. WS-* and REST both encourage this type of behavior. For the developer on the ground, this can be limiting as it does not help promote OO design.

Indeed often these models come with design rules that say that you are not to put any domain logic in the domain objects.

But it doesn’t have to be that way.

I’ll pick on the automated build process for a moment. I think what happens is that the auto-gen of either Schema or POJO gets baked into the build process and developers forget about it. The output is a secondary thought that is constantly sync’d with their domain. The problem is that the external parties that rely on those artifacts are also treated as second class. A developer changes an external interface contract without even thinking. This is the part that is scary.

The middle ground is to use the schema gen once and lock it down. Put it in the repository and protect it with the walls of governance. The auto-gen can still take place, but it also needs to verify that it has not broken the contract. This way, a schema that started from a POJO and was given official approval as the external interface is protected from unwarranted change, and the developer is granted the ability to grow the POJO into the rich domain model they want.
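The verification step can be very simple to start with. The sketch below is one assumed way to do it, not a feature of JAXB or any build tool: the class SchemaContractCheck is hypothetical, and it just compares the locked schema text with the freshly generated one after ignoring blank lines and indentation. A real check would more likely compare the XML structurally and might tolerate backward-compatible additions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical build-time guard: fails when generated output drifts
// from the schema that was approved and locked in the repository.
class SchemaContractCheck {

    // Ignore indentation and blank lines so that pure formatting
    // differences between generator runs do not count as a change.
    static List<String> normalize(String schema) {
        return Arrays.stream(schema.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    // True when the generated schema still matches the locked contract.
    static boolean honorsContract(String lockedSchema, String generatedSchema) {
        return normalize(lockedSchema).equals(normalize(generatedSchema));
    }
}
```

A build step would read both files, call honorsContract, and fail the build when it returns false, so a contract change becomes a deliberate, reviewed act instead of a side effect.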

This approach places constraints on the evolution of a domain object, so it’s recommended that some time be taken up front to get the domain object as close to the use case/user story as possible. Any later evolution will be quickly detected and will signal integration or legacy concerns immediately.

Anemic models can be addressed with a slight change in the build process and a stronger embrace of the binding tools that typically generate them.