Monday, October 02, 2006

The Java Closure Spectrum

I have little doubt Java 7 will introduce closures in one form or another, but which form exactly?

On one end of the spectrum, anonymous inner classes already take us part of the way today, but their clunkiness leaves much to be desired. On the other end, Neal Gafter et al proposed an ambitious, Ruby-like extension dubbed BGGA closures (short for Bracha, Gafter, Gosling, Ahe, the authors' last names).

The power-to-weight ratio along the spectrum between today's clunky inner classes and the full BGGA proposal is nonlinear. In that vein, Josh Bloch, Doug Lea and I propose Concise Instance Creation Expressions (CICE).

Loosely speaking, simple syntax sugar for anonymous inner classes buys Java 90% of the power of BGGA closures while carrying only 10% of the weight. We think it's the "knee in the curve" where we get the most bang for our buck.

As for that remaining 10%, while the other leading brand omits a class name and supports non-local returns and limited custom control constructs, CICE closures strike a fine balance between brevity and explicitness and raise the already famous Java readability bar.
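
To make the "current clunk" concrete, here is a minimal sketch of the anonymous inner class form that works today, with a roughly equivalent CICE form shown only in a comment. The names `InnerClassClunk` and `sortByLength` are made up for illustration, and the commented CICE line is an approximation based on the syntax quoted in the comments on this post, not text copied from the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class InnerClassClunk {
    // Sorts a copy of the input by string length using an anonymous inner class.
    static List<String> sortByLength(List<String> words) {
        List<String> sorted = new ArrayList<String>(words);
        Collections.sort(sorted, new Comparator<String>() {
            public int compare(String a, String b) {
                return a.length() - b.length();
            }
        });
        // Under CICE the comparator argument would read roughly like this
        // (an approximation, and not valid Java today):
        //   Comparator<String>(String a, String b) { return a.length() - b.length(); }
        return sorted;
    }

    public static void main(String[] args) {
        // prints [sam, java, closure]
        System.out.println(sortByLength(Arrays.asList("closure", "sam", "java")));
    }
}
```

Only one line of the inner class carries the actual logic; the rest is ceremony, which is the part the sugar is meant to remove.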

Without further ado, Concise Instance Creation Expressions: Closures without Complexity. Let us know what you think.

30 Comments:

Blogger Eugene Kuleshov said...

Is exception handling part of the 10% you leave out of scope?

By the way, how about annotations on those closure methods?

7:24 PM  
Blogger Stephen said...

This definitely seems like a better proposal. Many feel (validly, I think) that the increasing complexity of the language (generics for instance) is a real risk to the readability and maintainability of code written in Java, and the future popularity of the language. BGGA seems to add too much complexity for too little gain; CICE definitely seems to have a simpler conceptual model. I hope all you smart folks figure out the perfect point on the curve to do closures right in Java.

7:30 PM  
Blogger Bob Lee said...

Eugene, better generic support for exceptions is orthogonal to closures and not part of the remaining 10%--it looks like we'll get it regardless of what happens with closures. As for CICE, the closure can throw whichever exceptions its SAM type allows.

Good question about annotations; the answer is up for debate.

8:10 PM  
Blogger Eugene Kuleshov said...

Bob, I just meant that the BGGA proposal allows rethrowing exceptions from within the closure body. It would be a shame to have to declare them in methods for CICE closures.

WRT annotations on closure methods, I don't see why not, because they will be just method annotations after all. On the other hand Neal for some reason was against annotations for BGGA.

10:31 PM  
Blogger Bob Lee said...

Eugene, you don't need to declare exceptions with CICE; they're simply copied from the SAM type just like everything else.

10:57 PM  
Blogger Eugene Kuleshov said...

Nice

11:07 PM  
Blogger Thomas said...

Bob, I actually believe that the code fragment below will work:

for (public int taskId = 0; taskId < NUM_TASKS; taskId++) {
executor.execute(Runnable(){ newTask(taskId); });
}

The reason I think so is that newTask(taskId) will copy the current value of taskId (pass by value to the method). Only after it is copied do we get back to the loop and the increment of taskId. Or am I missing something?

3:20 AM  
Anonymous Damon Hart-Davis said...

Hi,

All I really want from this is Curried functions for the logical clarity and wonderful partial-pre-compilation HotSpot-type-goodness this could bring in some places. Also, a pure-functional usage should bring some opportunities for safe automatic threading by the JVM.

I really liked functions in Standard ML 20 years ago, and some of the same in JDK 7 would be great.

Rgds

Damon

5:01 AM  
Blogger Ricky Clarkson said...

The strange rules on local variables seem, er, strange!

Local variables *read* by a closure would be implicitly final; that is, they would have to be 'definitely assigned'.

Local variables *written to* by a closure would have to be specified as public.

Personally I would prefer to go one of two (and a half) ways.

1. Unsafe but concise.

Make all local variables used by the closure into heap variables, as opposed to stack variables. The problem is that you may use the variables by accident, but that problem exists with anonymous/inner classes too.

If those locals can be made into finals, i.e., if the enclosing method definitely assigns, once, and the closure doesn't write to it, then you can keep it as a stack variable.

The upshot is that the programmer doesn't need to care whether it's final, or about 'publishing' variables to the closure.

2. Safe but verbose.

I'm not sure whether 'public' is the right word, but it's the one chosen in the spec. Make all variables that are used by the closure have to be explicitly public.

2.5. Same as above, but only make those that are written by the closure have to be explicitly public. Non-finals promoted to the heap automatically.

Another option instead of public is an annotation. Perhaps:

@Published int x;

Or @Published(READ_ONLY) int x;

Suppose a method calls two closures - I don't think there's a way of telling the compiler which closure can access which variables. Maybe this doesn't matter (it would matter more for bigger methods).

6:15 AM  
Blogger plesner said...

The proposed syntax might introduce local ambiguities in the grammar, situations where you have to read arbitrarily far ahead to disambiguate an expression:

foo(X < A, B, C, D, E > ( Y y ) { ... });
foo(X < A, B, C, D, E > ( Y ));

To determine whether "X < A" is a boolean expression or the start of a closure you have to read ahead past the 'Y'.

Maybe you have to put the 'new' keyword back in the syntax?

8:45 AM  
Blogger Bob Lee said...

Thomas, it will create a copy of taskId when the closure executes (possibly after the loop), not when the closure is created.

9:29 AM  
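
A rough way to see the behavior Bob describes in today's Java is to emulate the shared loop variable with a one-element array and run the tasks only after the loop ends. This is a sketch under that assumption, not CICE syntax; `SharedLoopVariable` and `runAfterLoop` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SharedLoopVariable {
    // Every task reads the same shared cell at execution time, not at creation time.
    static List<Integer> runAfterLoop(int numTasks) {
        final int[] taskId = {0}; // one shared cell standing in for "public int taskId"
        final List<Integer> seen = new ArrayList<Integer>();
        List<Runnable> tasks = new ArrayList<Runnable>();
        for (taskId[0] = 0; taskId[0] < numTasks; taskId[0]++) {
            tasks.add(new Runnable() {
                public void run() {
                    seen.add(taskId[0]); // the read happens when the task runs
                }
            });
        }
        for (Runnable task : tasks) { // run everything only after the loop has finished
            task.run();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runAfterLoop(3)); // prints [3, 3, 3]
    }
}
```

Every task observes the value the loop ended on, because the value is read when the task executes rather than when it was created.
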
Blogger Bob Lee said...

Ricky, another option is "!final". (Peter von der Ahe came up with that.) ;)

I like "published", too, but we don't want to emulate what should be language keywords using annotations. Adding java.lang.Published is equivalent to adding a new keyword anyway.

Plesner, I don't think parsing ahead is an issue; couldn't you make the same argument about invoking static methods with type parameters?

9:34 AM  
Blogger Eugene Kuleshov said...

!final is a nice one! But I'd like ~final better. :-)

9:42 AM  
Blogger Ricky Clarkson said...

When I first read about annotations, I thought 'great, hopefully we can use these instead of new keywords for each release'. However, yes, I see that this is flawed, as the intention of annotations was not to provide new grammar. They are too limited for that anyway.

I think what I was looking for are lisp macros (not that I know what those are (but if I keep typing with parentheses (that's these things ()) then I'll probably be most of the way there)).

I don't think you can make that argument about invoking static methods with type parameters, because you have to fully qualify those.

ClassName.<Param>method();

This is very annoying, I wish the generic type inference was better. I tend to make a named variable to avoid that syntax, triggering the limited inference, e.g.:

import static fpeas.function.FunctionUtility.createIdentityFunction;

Function<String,String> identity=createIdentityFunction();
doStuff(identity);

rather than:

doStuff(FunctionUtility.<String>createIdentityFunction());

To the point: .< is not ambiguous. In other generics situations, the stuff to the left and right of < is typenames, not variable values, so there is no ambiguity.

In C++, you have to put a space between >> when used for generics (e.g., list<list<string> >), but Java has so far managed to avoid that kind of thing.

9:50 AM  
Anonymous Quintesse said...

Well, I must say this proposal does not do anything for me. Writing slightly shorter code is not what it is all about for me; I still wouldn't be able to write things like:

items.find(foo) {
return foo.isWhatWeAreLookingFor();
}

which is one of the reasons I would want closures. I don't mind having to go through a lot of trouble making the API (the implementation of find() here) as long as its use is extremely simple!

That doesn't mean that I'm not worried that it will add too much complexity to the language; I'll just wait and see how the discussion turns out.

11:13 AM  
Anonymous Howard Lovatt said...

I think this is a very positive suggestion, but I need to show my bias. It is similar to an RFE I put in a while back:


Shorter Syntax for Common Operations

7:31 PM  
Blogger swankjesse said...

It seems like CICE misses more than 10% of the benefits of BGGA. Methods like withLock(), closeAtEnd() are very powerful for improving the readability of Java code. The beauty of these is that they allow us to simplify existing code without restructuring the continue, break and return statements.

But I agree that BGGA is quite complex and I'm glad to see a simpler approach.

Adding closures to Java might be just as difficult as adding object-orientation to C.

1:03 AM  
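
For readers who have not seen the idiom, a `withLock`-style helper can already be written against a plain `Runnable`; what a SAM-based version cannot do is let a `return`, `break` or `continue` inside the body affect the calling method. The sketch below uses hypothetical names (`WithLockSketch`, `withLock`, `demo`) and is not taken from either proposal.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class WithLockSketch {
    // Runs the body while holding the lock and always releases it afterwards.
    static void withLock(Lock lock, Runnable body) {
        lock.lock();
        try {
            body.run();
        } finally {
            lock.unlock();
        }
    }

    // Returns true if the lock was held inside the body and released afterwards.
    static boolean demo() {
        final ReentrantLock lock = new ReentrantLock();
        final boolean[] heldInside = new boolean[1];
        withLock(lock, new Runnable() {
            public void run() {
                heldInside[0] = lock.isHeldByCurrentThread();
                return; // returns from run() only, never from demo()
            }
        });
        return heldInside[0] && !lock.isLocked();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```
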
Anonymous Reinier Zwitserloot said...

I've been making a racket amongst all java-using professionals I know about the BGGA proposal, saying pretty much exactly the same thing - BGGA is waaaay too complicated for something that single-method interfaces already do virtually as well with FAR more readability and 'java-ness'. In fact, I had a proposal all ready to go, and it was exactly the same as yours, with the only difference being that I limited it to just interfaces and did not lop off the 'new' expression, so that parsers might have a slightly easier job with it. But the flexibility to be able to expand single-abstract-method abstract classes is sufficiently powerful (especially given stuff like LinkedHashMap and the like) that I like CICE even more.

CICE will continue the brilliant improvements to java, in the same vein that 'is' java, as the java5 changes. BGGA will try to turn java into some weak knockoff of Python, being neither as useful as Python or Ruby, nor as powerful in how it scales in the development process as java. BGGA would be a monumental mistake. CICE is -exactly- what's needed.

CICE for president^H^H^H^H^H^H^Hclosure proposal!

By the way, guys, do you need me to do anything to try and make this happen in favour of BGGA?

9:25 AM  
Anonymous Josh Bloch said...

To swankjesse: Funny you should mention this. I quite agree with you that "before-after" constructs such as withLock and closeAtEnd are important. So important, in fact, that they deserve their own purpose-built construct, as they have in C# (the "using" construct). I discussed this at the Java workshop at OOPSLA two years ago, the same year that Herb Sutter singled this out as Java's biggest shortcoming. At Sun's request, I wrote a proposal for this feature. This proposal is currently under consideration for inclusion in Java 7. Tell me what you think of it!

12:12 PM  
Anonymous Reinier and Roel said...

By the way - as far as the API is concerned: A friend and I also think the APIs should change to (strongly) favour SAMs. For example, WindowListener needs to be split into a number of much smaller listeners, each with only 1 method.

The API can be extended with abstract classes that define null implementations for all declared methods, except for one, which would be abstract. That way, you can write stuff like:

addWindowListener(CloseListener() {doStuff();});

12:58 PM  
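
The adapter idea above can be tried out with anonymous inner classes as they exist today. In this sketch, `LifecycleListener` is a hypothetical stand-in for a multi-method listener such as WindowListener, and `ClosingListener` is a hypothetical adapter that leaves exactly one method abstract; neither is a real API.

```java
import java.util.ArrayList;
import java.util.List;

class SamAdapterSketch {
    // Hypothetical multi-method listener interface.
    interface LifecycleListener {
        void opened(String name);
        void closing(String name);
        void closed(String name);
    }

    // Hypothetical adapter: empty bodies for everything except one abstract method,
    // which makes the class a single-abstract-method (SAM) type.
    static abstract class ClosingListener implements LifecycleListener {
        public void opened(String name) { }
        public void closed(String name) { }
        public abstract void closing(String name);
    }

    // Fires all three events at a listener that only cares about closing.
    static List<String> demo() {
        final List<String> log = new ArrayList<String>();
        LifecycleListener listener = new ClosingListener() {
            public void closing(String name) {
                log.add("closing " + name);
            }
        };
        listener.opened("main");
        listener.closing("main");
        listener.closed("main");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [closing main]
    }
}
```
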
Anonymous Reinier and Roel said...

One more note:

Check out the current version of Comparator.java. It actually defines -two- methods, not one.

public int compare() we all know about, but it also defines:

boolean equals(Object obj);

why? I have no clue. Possibly to allow adding extra javadoc clarifications.

Thus, the CICE spec needs a slight expansion:

CICE notation picks that method which is not already defined by Object (only applies to interfaces, not abstract classes). If there is more than 1 method in the interface not already defined in Object, generate the usual error - it's not a SAM.

1:01 PM  
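
The rule suggested above (ignore abstract methods that merely redeclare public methods of `Object`) can be sketched with reflection. This only illustrates the idea and is not how the CICE spec or any compiler determines SAM types; it also looks at public methods only, which is enough for interfaces. `SamCheck`, `samCandidates` and `redeclaresObjectMethod` are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SamCheck {
    // True if Object already has a public method with the same name and parameter types.
    static boolean redeclaresObjectMethod(Method m) {
        try {
            Object.class.getMethod(m.getName(), m.getParameterTypes());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Names of the public abstract methods that do not redeclare a method of Object.
    // A type would count as a SAM type under the suggested rule if exactly one remains.
    static List<String> samCandidates(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (Method m : type.getMethods()) {
            if (Modifier.isAbstract(m.getModifiers()) && !redeclaresObjectMethod(m)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(samCandidates(Comparator.class)); // prints [compare]
        System.out.println(samCandidates(Runnable.class));   // prints [run]
    }
}
```
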
Anonymous lf said...

My favorite parts of the original closure proposal were the type
inference and exception handling...

The original closure proposal allowed this:
enumerateMap(Key k, Value v : map) { k.sploot(v); }

This proposal, from what I gather, would require something like this:
enumerateMap(map, TwoArgCallableWithException<Key,Value,throws FooException|BarException>(Key k, Value v) { k.sploot(v); });

... which is of course still better than the 1.5 equivalent,
enumerateMap(map, new TwoArgCallableWithTwoExceptions<Key,Value,FooException,BarException>() {
  public void handle(Key k, Value v) throws FooException, BarException {
    k.sploot(v);
  }
});

The last section of this proposal mentions (possibly) doing type inference. If we get that plus some syntax to pass parameters
to the constructor of an abstract class, we can abuse the proposal by making some badly-named classes like these:

public abstract class withLock {
  protected abstract void run();
  public withLock(Lock lock) { ... }
}

withLock(lock) { ... }

public abstract class enumerateMap<K,V,throws X> {
  protected abstract void method(K k, V v) throws X;
  public enumerateMap(Map<K,V> map) throws X {
    for(Map.Entry<K,V> entry : map.entrySet())
      method(entry.getKey(), entry.getValue());
  }
}

enumerateMap(Key k, Value v : map) { k.sploot(v); }

Ouch :)

1:10 PM  
Blogger Lachlan O'Dea said...

I love it. A very straight-forward improvement with easily demonstrable benefits and very little extra complexity.

The BGGA proposal puts a lot of effort into a new syntax for declaring anonymous functions. It has new | syntax for delimiting exception names and redefines the meaning of null, for example. However, the current idiom of declaring "function types" via interfaces is not a major problem. The main problem is that creating instances of those function types is a pain, and this is the problem CICE is attacking.

7:55 PM  
Blogger Sony Mathew said...

I have to say I'm not a big fan of this proposal. It's got too many inconsistencies and clarity is lost.
I completely disapprove of the removal of 'final' and the use of 'public'. I find having 'final' is clear when accessed by anonymous classes. Additionally, the change in meaning when defining anonymous classes of SAM types is just confusing.

I would prefer adding new constructs altogether. For example (off the top of my head).
(Not sure if syntax collides with any existing semantics).

Block block = { return foo(); }

try {
block.exec(any,number,of,params,of,any,type);
}catch(BlockExecException x) {
x.getCause().printStackTrace();
}

//Cast Blocks to specific methods that one chooses to implement.
Runnable runable = (Runnable#run())block;

12:39 PM  
Blogger Xavi Miró said...

Josh,

is there a weblog or web page where we can comment on your proposal about automatic resource management?

Regards,

- Xavi

3:30 AM  
Anonymous Firefight said...

I like the proposal; it's simple and clean. I think keeping the new keyword would be a good thing to remove ambiguity, if not for the compiler then for the human reader.

But if that's all that closures are about, then I'm more than a little disappointed! Allowing a return or break in the middle of the construct, as BGGA does, seems like it would require more API documentation than inheritance does: you would probably need to see the actual source to be sure what you were doing. So much for abstraction... now you would be dependent on the implementation.

11:32 AM  
Anonymous Bharath said...

Bob, I fully, fully agree with you. The proposal put forth by Neal Gafter et al is frightening, to say the least. I'm posting (a slightly edited version of) my comment on the java.net editorial expressing concerns:

"Speaking of Closures, it's hard not to agree with Josh Bloch on the new Closures proposal. He expressed serious concerns in his interview about further changing an already complex type system just to accommodate Closures. The current proposal seems more like engineering for the sake of it without thinking about the learning curve for a newbie. Open sourcing Java will be mostly futile if we go the C++ way by introducing unnecessarily complex changes to the language based on the whims of a few theorists (even if they happen to be the pioneers in language theory). We need a more pragmatic solution that keeps in mind (the barriers to entry for) the average "Joe Java" as Josh says. While generics weren't over engineered to ape C++, I already notice newbies wincing at the prospect of having to understand them. That being the case, we need a more practical approach to implementing Closures instead of merely looking to satiate the engineering appetite of a few intellectuals."
I hope the better proposal wins, sanity prevails and the language benefits in the end.
By the way, Josh, the auto resource management proposal too is very elegant and simple. Hopefully, the JCP will accept it along with your closures proposal.

-Bharath
P.S: It'd be great if we could have a separate place to post comments on the auto resource management proposal.

9:43 AM  
Anonymous Anonymous said...

This is great to see. I suggested a similar syntax for the exact same thing in Reddit comments a while back. It's gratifying to see you guys proposing what I think is the simplest Closure extension to an already far too complex language. Java's syntactic and semantic complexity is astonishing given how Lisp does much more with so much less to learn.

One problem I have with the proposal: your way of getting around Java's final declaration. I think there is a much simpler, far cleaner approach: if a closure requires setting the value of an outer local variable that has to be declared final, say, final int foo;, then the declaration is automagically changed to final int[] foo = new int[1];, and all references to foo are just changed to foo[0].

The other option: just have Java 7 get rid of its stupid final requirement for closed local variables.

7:39 PM  
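
The array rewrite described above is something people already do by hand with anonymous inner classes. A minimal sketch, with the hypothetical names `ArrayCellWorkaround` and `sumViaCallback`:

```java
class ArrayCellWorkaround {
    // Accumulates a total from inside an inner class by writing to a one-element array.
    static int sumViaCallback(int[] values) {
        final int[] total = new int[1]; // the manual "final int[] foo = new int[1]" rewrite
        for (final int v : values) {
            Runnable add = new Runnable() {
                public void run() {
                    total[0] += v; // the reference is final, the cell it points to is not
                }
            };
            add.run();
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sumViaCallback(new int[] {1, 2, 3})); // prints 6
    }
}
```
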
Blogger Andrew said...

I like the CICE proposal generally because of its simplicity and usability and I think the Java community could accept it if it were limited to what its name implies: a concise way of creating instances of anonymous classes (for classes with very constrained characteristics). The problem comes when the proposal goes beyond its CICE name and seems as if it is trying to provide closures (e.g. sections III and IV). Unfortunately (and exactly as stated in this article) the CICE proposal does not specify binding ALL lexically scoped constructs - only local variable names - so it probably cannot be accepted by the Java community as addressing closures.

With regard to the CICE-ness of the proposal, I agree with some other posters here and recommend that the syntax not allow dropping "new" on the instantiation. The "new" keyword is critical to the reader's comprehension that an object is being created and should not be treated lightly.

With regard to the closure-ness of the proposal, I have objections. First, section I of the proposal states "Since release 1.1, Java has had closures in the form of..." I think that is just not true. Making a closure requires more than just the ability to pass one function into another function and invoke the first function from within the second. While such function passing and calling is necessary, the passed function is not closed unless its context is bound when the function is passed. As such, this proposal falls well short because it places almost no requirements on binding the passed function's context. Moreover, the rules proposed in section III seem almost as arbitrary as the current "annoying final" in Java.

It is annoying that Java requires local variables be marked final to be accessible by code in an anonymous inner class, but the right solution is not to further emphasize that arbitrary rule by having the compiler force all variables that could be final to be final (as is specified in section III). (The reason I say the "final" requirement is arbitrary is that there is no inherent Java language need for the constraint.) The correct solution is to demand that the language eliminate the constraint because finality should be independent of accessibility and "final" should modify only assignability rather than being overloaded to affect accessibility. Moreover, the requirement that local variables be "public" (or even "@Published") is contrary to normal Java lexical rules where local variables in any enclosing block are accessible by code in an enclosed block. (It seems to me that if there were a need to hide an enclosing scope's local variables from any of its enclosed code blocks, that would require some keyword or annotation. Java does not currently have a syntax for doing that, but I can imagine that Java could be expanded to allow a local variable to be declared "private" so that it would not be accessible to enclosed code blocks. That change to Java would not be specific to closures per se.) Using that argument should also doom any proposal to allow variables to be declared "!final" (or "~final").

There is a curious constraint in section III of the proposal that states "Formal parameters and for-loop variables may not be qualified as public." Notwithstanding that I've already rejected having to specify "public" on any local variables, I'd like to investigate why such a constraint would be imposed because I think this may be instructive for the general problem of closures in Java, or, more specifically, for the problem of binding the context of a method when it is passed (making a closure). In the proposal, the justification for the constraint is demonstrated using the following code snippets and claims:
1. for (public int taskId = 0; taskId < NUM_TASKS; taskId++) {
executor.execute(Runnable(){ newTask(taskId); });
}
is claimed to most likely not do what the author intended, and
2. for (int i = 0; i < NUM_TASKS; i++) {
int taskId = i;
executor.execute(Runnable(){ newTask(taskId); });
}
is claimed to do what the author intended.
When I read the code snippets, I concluded that either they will both fail or they will both work.

Let me be more specific. If the first code snippet fails as described in the proposal, it fails because a reference to "taskId" is bound in the closure. Assuming the same rules apply to the second code snippet (i.e. a reference to "taskId" is bound in the closure), surely it would fail also (but possibly in some other less predictable way). If the second code snippet is successful, it could be so only if the value of "taskId" were bound with the closure. But if we assume the same rules apply to the first snippet (i.e. the value of "taskId" is bound in the closure), then surely it would be successful.

So what would make the authors of this proposal think that such a code change would make the second snippet work where the first fails? One possibility is that someone thinks that a new "taskId" variable is magically "created" on each iteration through the for loop and that the various references are then bound with each iterated closure. But that can't possibly be being proposed because it would be so wrong. In fact, that would truly be a general misunderstanding of a large part of programming theory. The only other interpretation I can make where the first snippet fails and the second is successful is that in the first snippet, the reference to "taskId" is bound with the closure and in the second snippet, the value of "taskId" is bound for the closure. In the code snippets, the only real difference is that the variable being bound is "public" in one and not in the other. So does that mean that in this proposal, making a local variable "public" has the effect of binding its reference for the closure and that when a local variable is not "public", its value would be bound for the closure? If that is the case, it is a very bad idea.

At this point, astute readers have already recognized another obvious problem with that example. Specifically, "taskId" is a primitive and in the scenario described above where the code snippets fail, it appears that a reference to "taskId" is bound for the closure. A reference to a primitive? What does that mean? How un-Java-like. The good news is that in the scenario that succeeds, the value of the primitive is bound with the closure, which is totally Java-like. But this does raise the spectre of what would happen if "taskId" were not a primitive "int", but instead an object reference to an instantiated Integer. To be Java-like, the compiler would bind the value of the reference with the closure and then a code snippet similar to #1 above:
1a. for (Integer taskId = new Integer( 0 ); taskId < NUM_TASKS; taskId = taskId + 1) {
executor.execute(Runnable(){ newTask(taskId); });
}
would probably not do what the author intended, but a code snippet similar to #2 above:
2a. for (int i = 0; i < NUM_TASKS; i++) {
Integer taskId = new Integer( i );
executor.execute(Runnable(){ newTask(taskId); });
}
probably would do exactly what its author intended.
Note: 2a works specifically because it intentionally performs what I identified in snippet 2 as being not the case (i.e. the creation of a new variable for each iteration).

This discussion just leads back to the problem of closures generally, which is primarily: How can the bound context of one code block be preserved when the code block is passed as a parameter into another code block? More specifically for Java: How would Java need to change to support binding the context for a method and passing that bound context to another method? The CICE is nice, but it does not comprehend the complexity of the closure problem.

4:05 PM  
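
For comparison with the two loop snippets discussed in the comment above, this is how the second form behaves with anonymous inner classes in current Java: a local declared inside the loop body is a fresh variable on each iteration, and the inner class copies its value when the instance is created. How CICE itself would treat `public` locals is defined by the proposal, not by this sketch; `PerIterationCopy` and `runAfterLoop` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class PerIterationCopy {
    // Each task captures its own final copy of the loop index at creation time.
    static List<Integer> runAfterLoop(int numTasks) {
        final List<Integer> seen = new ArrayList<Integer>();
        List<Runnable> tasks = new ArrayList<Runnable>();
        for (int i = 0; i < numTasks; i++) {
            final int taskId = i; // a new final local on every iteration
            tasks.add(new Runnable() {
                public void run() {
                    seen.add(taskId); // value was copied when the Runnable was created
                }
            });
        }
        for (Runnable task : tasks) { // run everything only after the loop has finished
            task.run();
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(runAfterLoop(3)); // prints [0, 1, 2]
    }
}
```
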
Blogger Barney said...

I think that BGGA closures are about the worst thing that could happen to Java - a much bigger mess than type-erased generics.

It is exactly the kind of proposal which appeals to those dangerous individuals who only care that their code does what it says, not that it says what it does.

If I were such an individual I would long ago have abandoned Java for Ruby or some other write-only language. Java has been so spectacularly successful precisely because it has championed readability over writeability.

I like CICE, by and large. It's simple and intuitive and is almost "gotcha"-free. However I agree completely with the previous poster that straying into "closure-iness" with the public variable idea is a bad move, with a much higher "gotcha" potential (synchronization issues with a parallel executor).

However, the syntax could and IMO should be even more concise in some very common cases (perhaps the majority of cases).

Where the SAM method has no parameters, or where I don't care about the parameters, it should not be necessary to pass anything but the code block...

executor.execute({doRun()});

button.addActionListener({buttonClicked()});

I think that in either of the above cases it is absolutely transparent what is happening. The only problem I can see is where the method accepting the SAM is overloaded on other SAMs - which seems extremely unlikely. In such a case the compiler would raise a carefully worded error pointing out the ambiguity.

In such cases, we really would have reduced anonymous method calls to their essence - passing a line of code into a method.

1:12 PM  
