Monday, February 25, 2008

Architect Versus Developer

Architect
We are adding these architectural requirements to the project. The purpose is to bring consistency between applications and enable us to realize better re-use across our code base.

Developer
Those "requirements" don't add any value to the application from my customer's perspective. They introduce complexity and risk as an added "benefit."

Architect
The complexity is a balancing act. While it may introduce some complexity to your application, the goal is to reduce the total complexity as seen from an organizational perspective. In other words, while we may add some here, we balance it out by reducing complexity in other places and by having consistency, which makes complexity more tolerable.

Developer
In the meantime, I'll never get my application done because I'll be building an infrastructure that doesn't give me an application. Customers don't buy architecture, they buy features.

Architect
But architecture enables features to be built. Done carefully, it provides for cost reduction and faster response in development. It also allows for better deployments, which means happier customers.

Developer
Go back to drawing your pictures and leave me alone to build my application.

Architect
It is that myopic attitude that codes us into a corner every time. Somebody has to look at the big picture.


I'm in an interesting dilemma. I'm the architect. I'm supposed to be thinking big picture. But I'm a developer at heart. I think pragmatically. This is leading me to ponder a lot.

The Assignment

Take a series of products (web and thick) that have been developed over time and distance, brought together now mostly by acquisitions, and meld them into a seamless whole. In the process, open up the internals so customers can customize their workflows using our products. Oh, and do this without disrupting development.

Do this without disrupting development? Architecture should never be a disruption to development; it should be the foundation. As such, if it is being seen as a disruption, that means that something is out of alignment. Until your environment is realigned, you will struggle. From my experience, these are the areas most likely to be out of alignment.

The Architecture: The architecture itself may be causing the misalignment. Most often, when the architecture is at fault, it is because it is too complex for the problem it is trying to solve. This tends to cause ripple effects through the organization as people fight back. Since this is the architect's primary domain, this should always be checked first. Be honest about it. As architects, we have to decompose complex problems, but it is possible to overdo it.

Far less often, but still possible, is an architecture that doesn't accomplish what it is setting out to do. If the architecture isn't complete, it will be difficult for people to catch the vision.

Developers: Developers and architects should not be constantly at odds, but sometimes it can feel that way. In my experience, the most challenging developers are the ones who want to know "Why?" It isn't that they shouldn't be allowed to ask, but rather that answering that question can be time consuming.

Answering "Why?" is a two-fold process. First, each aspect of the architecture should correspond to a real business driver. Second, there should be management agreement with those drivers. Ideally, all developers will be satisfied with the business drivers, but at some point, you may be forced to say "because it's the way we are doing it" and without the management agreement, you won't get anywhere.

Management: Management can throw things into confusion if they do not have a firm understanding of the answers to "Why?" There are problematic developers that will try to use management's lack of understanding. Management doesn't need to know the details, but they need to be sold well enough that they will back you up.

But management support isn't just about the problematic developer. It is also about the costs incurred while building the architecture. There is a cost associated with building an architecture, but it is a cost that should have positive business impact. Make sure you have management alignment.

I don't have things aligned yet. The fact that my assignment is coming under the terms of "don't disrupt development" means that management isn't clear on the value of changing the architecture versus continuing down the siloed application path. Instead, there needs to be a clear understanding that we are trading off a certain set of features for the opportunities that a consistent architecture provides.

And, honestly, if I'm having a conversation with myself like that one above, then I need to work on "why's" myself. If you aren't aligned with yourself, well, then you have a real problem.

Wednesday, February 20, 2008

Language Shootout Followup

In my previous post on Our Language Shootout on choosing between Jython, JRuby, and Groovy, I made the following comment:
"This is especially important around the APIs, since it uses the Java APIs directly..."

Which received this question:
What caused you to come to that conclusion? I can certainly see it, but only if you continue to program Java but in Groovy instead. If that's the case, why take the performance hit just to write the same (type of) code? (You could just as easily "not learn" the other languages and program Java in their syntax.)

Rather than continuing down the comment trail, I wanted to respond to this specifically because it was a very important factor in our decision.

I firmly believe that for us to see the benefit of using a language like Groovy, we have to change the way we write code. It isn't just a matter of writing Java without types; and, if that is what it becomes, our experiment will fail. Regardless of whether we are using Python, Ruby, or Groovy, we better be writing idiomatic Python, Ruby, or Groovy code and not writing idiomatic Java code in said language.

Personally, this actually pushed me more towards Ruby, as I think it would force people to abandon their Java ways a little more forcefully. However, there are real business factors that need to be balanced. In my previous post, I talked about the current disruption versus the long-term gains. Organizationally, we have to be careful how much current disruption we take.

So that brings me back to my quote and the question raised. As I mentioned, every one of these language implementations can use the Java APIs. However, beyond the Java APIs, Jython and JRuby also include the Python and Ruby core libraries as well.

What that means is that, in Jython and JRuby, you now have multiple ways of skinning the same cat: one way is through the language core libraries and one is through the Java APIs. In general, that probably isn't a big deal, except the languages are optimized around their respective libraries. For me to write idiomatic Python, I really need to learn the Python core libraries. The same holds true for Ruby as well as Groovy.
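Here is a small sketch of what I mean by "Java written in Python" versus idiomatic Python. It is only an illustration: the word list and both function names are made up, and the point is simply that the idiomatic version depends on knowing the Python core library (`collections.Counter`).

```python
from collections import Counter

# Hypothetical sample data, for illustration only.
words = ["groovy", "ruby", "python", "ruby", "groovy", "ruby"]


def count_words_java_style(items):
    """Java habits carried into Python: index loop, manual bookkeeping."""
    counts = {}
    i = 0
    while i < len(items):
        word = items[i]
        if word in counts:
            counts[word] = counts[word] + 1
        else:
            counts[word] = 1
        i = i + 1
    return counts


def count_words_idiomatic(items):
    """Idiomatic Python: let the core library do the counting."""
    return dict(Counter(items))


# Both versions produce the same counts; only the second one required
# knowing that Counter exists.
print(count_words_java_style(words) == count_words_idiomatic(words))
```

Neither version is wrong, but the first one gives up most of what the language has to offer, and the second one is only available to someone who has learned the library.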

But with Groovy, those core libraries are the Java APIs, which means that my developers don't have to learn a new API set in the process. That reduces the disruption in my current development process.

Now, that said, I also think that the Java APIs will reduce that top-line efficiency that could be gained. So this becomes a financial decision that includes things like amortization and the time-value of money. Given our current needs, I can't amortize this investment over much more than six months. Add to that the fact that money today is more valuable than that money in six months.
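To make the amortization argument concrete, here is a rough sketch of the comparison. Every number in it is invented for illustration (the costs, the monthly gains, the 12% annual discount rate, and the units, which you can read as person-weeks); it is not our actual analysis, only the shape of it.

```python
def present_value(monthly_gain, months, annual_rate):
    """Sum of monthly gains, each discounted back to today."""
    monthly_rate = annual_rate / 12
    return sum(
        monthly_gain / (1 + monthly_rate) ** m
        for m in range(1, months + 1)
    )


def net_value(upfront_cost, monthly_gain, months, annual_rate=0.12):
    """Discounted gains over the horizon minus the disruption paid up front."""
    return present_value(monthly_gain, months, annual_rate) - upfront_cost


# Hypothetical options: a low-disruption choice with a modest payoff and
# a high-disruption choice with a better top line.
LOW_DISRUPTION = {"upfront_cost": 10, "monthly_gain": 5}
HIGH_TOP_LINE = {"upfront_cost": 40, "monthly_gain": 9}

for horizon in (6, 24):
    low = net_value(months=horizon, **LOW_DISRUPTION)
    high = net_value(months=horizon, **HIGH_TOP_LINE)
    better = "low-disruption" if low > high else "high-top-line"
    print(f"{horizon} months: the {better} option comes out ahead")
```

With these made-up inputs, the low-disruption option is ahead over a six-month horizon and behind over a twenty-four-month one, which is the trade-off in a nutshell.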

So that is why we felt that Groovy won that point for our company's needs at this particular time. If I were starting from scratch, I can't guarantee that it would be the same decision, as some of these Java-transition sticking points wouldn't have the same pull and top-line efficiency would instead be paramount.

Not that Groovy is a bad place to be. :)

Tuesday, February 19, 2008

Our Dynamic Language Shootout

As a fresh lead architect at a company coming late to the world of web based software, I've been given some significant challenges, which I'll be writing about over the coming months. We have some very interesting challenges as we take a 20 year old architecture and move it into the 21st century. Our solutions currently leverage a combination of C, Java, Perl, and others.

One of the things that is clear to us is the need to quickly revamp our user interfaces. There has been a LOT of discussion about the benefits of dynamic languages and the ability to be more nimble when using them. As a result, we are making an investment in a dynamic language to help us in that regard. As we surveyed the landscape, we felt that our best options to look at were Ruby, Python, and Groovy. Scala is an interesting option, but we did not feel that we were ready for that significant a switch.

For a variety of deployment reasons, we've decided that whatever we choose will be deployed on the JVM. As a result, this comparison is for the JVM versions of the languages, e.g. JRuby, Jython, and, of course, Groovy, which has no other deployment option. I want to also clarify that I have the most experience with Python and I really like the language. There is no doubt that the language influenced me in my evaluation, but I really tried to remain objective in spite of that.

As I did the evaluation, I tried to come up with a broad spectrum of important information. Others at my company gave feedback on the important characteristics. In the end, these are the features that we felt were most important: the interaction between Java and the selected language, the IDE support, the learning curve, existing web frameworks, and the existing community support for the JVM implementation of the language.

Java Interaction

Several factors make up this feature. The most obvious is how easy it is to call into Java. Since we have a large amount of code in Java, we need to be able to easily access it. Of course, all of the languages manage this without any problems.

The more interesting aspect of this is what happens going the other way. All of the languages support compiling down to byte code, but how difficult is it to access code written in the language from Java? Also, since each of the languages is, in some way, a super-set of Java functionality, there needs to be a down-cast to the Java sub-set. What did that look like?

Groovy: Groovy was, without a doubt, the most straight-forward. Because Groovy supports applying types, overriding class methods is clean. Instantiating a Groovy class is the same as instantiating a Java class.

Jython: Jython is pretty similar to Groovy in its bi-directional support. It isn't quite as clean as the Groovy implementation as you are forced to use Docstrings to provide the additional type information that the class needs.

JRuby: Going from Java to JRuby is not trivial, even though JRuby compiles down to a class. The compiler seems to be primarily for faster JRuby-to-JRuby interaction.

Winner: Groovy
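For the curious, this is roughly what the Jython docstring approach mentioned above looks like. It is a sketch based on my understanding of the jythonc `@sig` convention; the class and method names are made up. Under plain CPython the docstring is just an inert string, so the snippet runs anywhere, but it only carries meaning when compiled with jythonc.

```python
class Greeter:
    # In a real Jython class meant for Java callers, this would normally
    # extend a Java class or interface; that part is omitted here so the
    # sketch stays runnable under plain Python.

    def greet(self, name):
        "@sig public String greet(String name)"
        # The docstring above is where the Java-facing signature lives.
        # Groovy would simply let you write the types on the method.
        return "Hello, " + name


print(Greeter().greet("World"))
```

Compare that with Groovy, where the types are part of the method declaration itself, and you can see why I called Groovy the cleaner of the two.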

IDE Support

In the Java world, the IDE reigns supreme. As I sit here typing this blog in Emacs, I'm perfectly comfortable leaving the IDE behind. In reality, most of our Java engineers would not be. The flip-side is that, with dynamic languages, the needs of the IDE are less than they are with Java. Our organization has standardized on IntelliJ IDEA, so that colors this.

I did not spend a lot of time looking at language-specific IDEs. Since our developers are Java developers and will continue developing Java code, we'd prefer to have them be in one environment.

Groovy: IntelliJ has a really good Groovy plug-in. IntelliJ seems pretty committed to Groovy as well. Honestly, the support was good enough that I didn't look at the Eclipse support. That commercial-level support is comforting.

Jython: PyDev with its commercial extensions was pretty good, if a little buggy. As I said, though, IDEA is our chosen platform, so a switch would be disruptive.

JRuby: There is an Eclipse plug-in for JRuby, but it was pretty weak. The IntelliJ plug-in seemed to be better.

Winner: Groovy

Learning Curve

We recognize that there is going to be a disruptive effect by bringing a new language into our environment. We know that, for some amount of time, productivity will be reduced with a follow-on increase in productivity. The variables that come into play are how long does it take to come back to current levels of productivity and how much of an increase in productivity do we gain when the line flattens out at the end.
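Those two variables can be turned into a toy model. The sketch below assumes a baseline of one unit of output per week, an immediate dip, a straight-line recovery, and a flat line afterwards; the function name, the linear ramp, and all of the inputs are my own simplifications, not measurements.

```python
def weeks_to_break_even(dip, recovery_weeks, final_gain, max_weeks=520):
    """Return the first week in which cumulative output has caught up with
    a baseline of 1.0 unit per week, or None if it never does.

    dip            -- productivity lost at the start (0.2 means 20% slower)
    recovery_weeks -- weeks until the line flattens out
    final_gain     -- productivity gained once it flattens (0.1 means 10% faster)
    """
    cumulative_delta = 0.0
    for week in range(1, max_weeks + 1):
        if week <= recovery_weeks:
            # Assumed linear ramp from (1 - dip) up to (1 + final_gain).
            progress = week / recovery_weeks
            productivity = (1 - dip) + progress * (dip + final_gain)
        else:
            productivity = 1 + final_gain
        cumulative_delta += productivity - 1.0
        if cumulative_delta >= 0:
            return week
    return None


# Hypothetical inputs: a gentle curve with a modest payoff versus a steep
# curve with a bigger payoff.
gentle = weeks_to_break_even(dip=0.2, recovery_weeks=8, final_gain=0.1)
steep = weeks_to_break_even(dip=0.5, recovery_weeks=20, final_gain=0.3)
print("gentle curve breaks even in week", gentle)
print("steep curve breaks even in week", steep)
```

The steeper curve can still win in the long run; the model only says how long you have to wait before the investment has paid for itself.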

In the end, this is all supposition and subjective blather. Take it for what it is worth, and remember we are talking about Java engineers here.

Groovy: As a super-set of Java, it has a very straight-forward learning curve from Java. This is especially important around the APIs, since it uses the Java APIs directly. I honestly don't know whether the top-line productivity is as high as Python and Ruby, but I don't have any evidence that it is not. My gut feel is that the Python and Ruby libraries are optimized more towards their languages and will give a higher top-line.

Jython: Python's pseudo-code syntax is a short hop from Java. While the Java APIs can be used, they aren't going to be as efficient as the native Python libraries. The biggest hurdle is the learning curve of those libraries.

JRuby: Given its closer functional ties, the learning curve for Ruby is highest of the three. It also has the same issues around the Java and native libraries. I honestly think that, once the curve is passed, JRuby could offer the most productivity. I've been nothing but impressed by what I've read about Ruby in that regard.

Winner: Groovy

Existing Web Frameworks

To a greater or lesser degree, the entire Java web world is open to each of these languages. However, the thing that made Ruby so powerful was Rails. Similarly, compare Python alone versus Python with a mature framework like Django. Groovy followed Ruby's lead by adding Grails, based heavily on Rails. These frameworks leverage the strengths of these languages, and, in my opinion, that is a significant piece of what makes these languages great.

Groovy: Grails is based on Rails, with the "heavy lifting" underneath being done by Spring and Hibernate. I like the maturity of the underlying technologies. I think Grails is on its way, if it doesn't get usurped by the Java platform's desire to make everything unbearably complicated. Given Groovy's and Grails' heavy Java emphasis, that is a major concern of mine.

Jython: *sigh* is all I can say here. While CPython has some great options, Jython went nowhere for two years. The main cause of this is two-fold: Jython's current version is 2.2.1, whereas CPython is at 2.5, and many frameworks require compiled C code for performance. Jython is just now coming back from that hiatus, but there aren't many options available for it. It looks like Django will be available soon, which will give it a much-needed boost, but in the meantime, it is a pretty desolate sphere that pretty much requires you to use a native Java technology.

JRuby: With its direct port of Rails, JRuby seems to come out on top here. Rails is a great package with some great options. JRuby does suffer some of the same compiled C problems as Jython, but since Rails is really the only web framework for Ruby, all focus could go towards that. Python does not have a "one and only" framework in the same way.

Winner: JRuby

JVM Community Support

We have an existing install and knowledge base built around the JVM that we are keeping. The disruption of moving to another deployment platform would be outrageous in a real business environment. As such, we focused our look at community support on the community around the JVM implementation of each language. In the end, community support will make or break all of these languages.

Fortunately, regardless of choice, they all have some great communities. There are exciting things happening in each of the communities. Honestly, I wish I had more time so I could participate more deeply in the communities.

Groovy: As the JVM is the only target for Groovy, the entire Groovy community is the JVM community. This obviously has some significant advantages for people looking to deploy on the JVM. It also seems to be picking up a lot of mind-share as the de facto "Java Scripting Language", which is helping that community.

Jython: As I mentioned above, Jython went through a dry period for a few years. That seems to have ended and a lot of exciting things are happening. First, the Jython community is doing a significant upgrade to bring Jython to the 2.5 Python specification. Second, PyPy is doing some very exciting things with Python overall, including the ability to target the JVM, LLVM, C, and JavaScript back-ends in an optimal fashion.

JRuby: Sun made a nod to the JRuby community when it hired the core JRuby developers. There is a lot of effort being made to make JRuby a better deployment option than CRuby, and I honestly think it has some great possibilities.

Winner: Groovy (by a nose, and only because of the number of people using it)

Conclusion

I don't think it should surprise you at this point that we chose Groovy. Even being openly biased towards Python first and Ruby second (hey, it's cooler :), I could not, in good conscience, choose either of them for melding into our existing environment.

If I were starting from scratch on a project, my choice would be very different. If I wanted to target the JVM, I would choose JRuby (at least until Jython 2.5 and Django are available); if I weren't targeting the JVM, then it would be, for me, Python, but I'd be equally comfortable choosing Ruby.

Regardless, it is going to be exciting to breathe some new life into some stilted development practices. I have good confidence that we will be very successful with this. In a later post, I'll discuss some of the ways we are going to be using Groovy and how we will decide on using Groovy or Java for the development of function points. I will also come back and discuss whether we get the benefit we hope for out of this, but it will be some time before that can be determined.

Saturday, February 16, 2008

An Arc-Tangent

There has been a lot of talk going on about Arc, Paul Graham's LISP derivative. I've been watching this discussion with some interest, not because I'm a LISPer, but because it is bringing up some interesting questions about programming languages and software development.

First, I want to be clear that this is not a critique of Arc. As I said, I'm not a LISPer. I would not be qualified to say much about the language itself. I have a great amount of respect for what Paul is trying to do: he is trying to make an appreciable difference in how software is built. Whether I agree with him or not does not matter in that context. He is putting his sweat into his beliefs. In that context, even if I totally disagreed with everything he said and did, he has still done it.

What is driving this post is the question of what makes a language "best." For instance, the primary tenet driving Paul's development is code size:
...making programs short is what high level languages are for. It may not be 100% accurate to say the power of a programming language is in inverse proportion to the length of programs written in it, but it's damned close.

I wholeheartedly agree that brevity is an important aspect of high level programming languages. I've switched from Java to Python in the majority of my work for exactly that reason. However, I do not think that the primary aspect of importance is brevity. In fact, I would go so far as to say that there is no primary driver of a good language.

So what would my ideal language look like? Like I said, brevity is important, but just as important is self-documentation. With those two items, you have a very good start to a language. However, those are not the only important aspects, in my mind. Added to brevity and self-documentation are clear flow control, a strong set of built-in libraries, and an emphasis on simplicity. In the end, what I'm looking for is an efficient language.

There is no doubt there is a relationship between code size, program grok-ability, and development speed. Code that is too long requires too many page faults to understand. Development speed is similarly related, where longer code has more constructs that need to be developed, tested, and debugged. Quite literally, the more you can do with a single line of code, the less opportunity you have to introduce a problem.

But can that go too far? For example, can you say that code that is too short slows down development? Can you put too much on a single line of code? At some point, I think the answer is 'yes'. At some point, code reaches a sufficiently dense size that it requires multiple mental translation phases to expand to a reasonable vocabulary, and therefore reasonable understanding. The litmus test there is "Do I think I need to comment this code to understand WHAT it is doing?" (as opposed to WHY it is doing it, which may be valid in any case). This is very hard to measure because the line of "too short" is going to vary significantly by developer and experience with a particular language, but I do firmly believe it is there. Furthermore, for code that somebody else will have to read (including yourself in the future), if you pass that point, you are going to pay a penalty in the future.
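As a sketch of that litmus test, here is the same small calculation written three ways in Python. The order data and function names are made up, and where the line of "too short" falls is, as I said, a matter of taste and experience; this is just where it falls for me.

```python
from functools import reduce

# Hypothetical orders: (name, quantity, unit price).
orders = [("widget", 3, 2.50), ("gadget", 0, 10.00), ("gizmo", 2, 4.00)]


def total_verbose(orders):
    """Too long: the intent is buried in bookkeeping."""
    total = 0.0
    index = 0
    while index < len(orders):
        order = orders[index]
        quantity = order[1]
        price = order[2]
        if quantity > 0:
            total = total + quantity * price
        index = index + 1
    return total


def total_balanced(orders):
    """About right: one line, and it reads like the problem statement."""
    return sum(qty * price for _name, qty, price in orders if qty > 0)


# Too dense: shorter to type, but I would need a comment explaining WHAT
# the boolean multiplication and positional indexes are doing.
total_dense = lambda o: reduce(lambda a, r: a + (r[1] > 0) * r[1] * r[2], o, 0.0)

print(total_verbose(orders) == total_balanced(orders) == total_dense(orders))
```

All three produce the same total. The first costs me time to write and scan, the last costs me time to decode, and the middle one is the point on the curve I'm after.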

So, regarding brevity, the ideal code length is the point where grok-ability and development speed curves have the most area under them. Java, in my opinion, is high on grok-ability but very low on development speed. Perl is low on grok-ability but high on development speed. Python has come in just right, for me. My short experience with Ruby and Groovy seems to put them in that category as well. Your mileage may vary.

Grok-ability brings up an important point, which is self-documentation. To what level does a language encourage self-documentation? If a language requires constant commenting or some other form of context switch to understand it, then it is probably not a highly efficient language. This, of course, is also impacted by one's knowledge of said language: the more you know the language, the easier it is to understand, and the more self-documenting it is. This is most important in the context of interacting with others' code.

An interesting challenge with self-documentation is understanding all of the ways code flows through the system. Branches, loops, goto's, breaks, labeled breaks, exceptions, come from's, alter, signals, continuations, function calls and undoubtedly others that I've never heard of, all increase the difficulty in understanding what the code does. Obviously branching and loops are necessary, but at some point, the complexity of it all might just overwhelm. As an example, a great many people consider exceptions to be evil. I'm not one of those, but a language ought to understand the implications of its flow control on the people who are both writing and reading the code.
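To illustrate the reading cost of flow control, here is a small Python sketch of the same logic with and without exceptions. The function names and inputs are made up. In the first version, the jump out of the loop starts inside `parse_port`, and nothing at the call site tells the reader that; in the second, every exit is visible where it happens.

```python
def parse_port(text):
    # int() raises ValueError on bad input; the caller cannot see that
    # from the call itself.
    return int(text)


def load_ports_with_exceptions(values):
    ports = []
    try:
        for value in values:
            ports.append(parse_port(value))  # may jump to the except below
    except ValueError:
        return []
    return ports


def load_ports_explicit(values):
    ports = []
    for value in values:
        if not value.isdigit():
            return []  # the only early exit, spelled out in place
        ports.append(int(value))
    return ports


print(load_ports_with_exceptions(["80", "443"]))
print(load_ports_explicit(["80", "http"]))
```

I'm not arguing that the second form is better in general. The exception version scales better when failures can come from many places; it just asks more of the reader.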

If brevity were everything that mattered, APL would have a much larger mind-share than it does. Included above are some of the things I think are important, but they are not all of them. As you go through your process of choosing a language, make sure that you understand your needs. And, if Arc is the right one for you, happy trails to you.