Friday, July 13, 2007

Immutability Is Hard

I recently coded an immutable byte array wrapper which went something like this:
public final class Bytes {

  final byte[] bytes;

  public Bytes(byte[] original) {
    int size = original.length;
    this.bytes = new byte[size];
    System.arraycopy(
        original, 0, this.bytes, 0, size);
  }

  public byte byteAt(int index) {
    return bytes[index];
  }

  public int size() {
    return bytes.length;
  }

  ...

I need to write the bytes to an OutputStream in a couple places, so I added a method which does just that:

  public void writeTo(OutputStream out)
      throws IOException {
    out.write(bytes);
  }

Did you catch my mistake?

writeTo() exposes our internal byte[] to user code making our class not so immutable. For example, zeroing out the internal bytes is now as easy as:

  Bytes bytes = ...;
  bytes.writeTo(new OutputStream() {
    @Override
    public void write(byte[] bytes) {
      Arrays.fill(bytes, 0);
    }
    ...    
  });

To make our class immutable again, we must modify writeTo() to copy the bytes into a temporary buffer first, and then pass that buffer to the OutputStream.

It's times like this that I wish Java had immutable arrays (with runtime checking). Maybe I'll see what we can do about that...

25 Comments:

Blogger Unknown said...

You might also want to make your bytes member field private, otherwise I can write a class in the same package to zap it :-)

11:38 AM  
Blogger Bob said...

Not if I seal my package. :)

11:47 AM  
Blogger あじ said...

It doesn't matter if the variable is private, or if the package is sealed. I can directly modify supposedly immutable String objects via reflection. It's trivial, efficient, and dangerous.

12:03 PM  
Blogger Bob said...

Not if I install a security manager. :)

12:11 PM  
Blogger Ricky Clarkson said...

The real issue is that you can't create your own array type, with its own rules. It's a case of the language being more low-level than it needs to be.

If you could, then you could have a[b]=0 throw some exception.

You can ask the language designers to include the kitchen sink, but their kitchen sink becomes outdated. It's better if they give you, the developer, the tools to create and update your own kitchen sink.

While working within a language with an outdated kitchen sink, you might be better off not going to extreme lengths to prevent misuse, but relying on convention (apply a general rule - don't modify your parameters). I imagine you could, statically, check all OutputStream code that you're going to use, to see whether it actually writes to the byte[] you pass to it. If none does, then I don't think it's worth worrying about.

12:14 PM  
Blogger Eugene Kuleshov said...

Bob, Ricky, this is sounds like a meta type system and JSR 308 could help with that at some extent. So, you can put in your own compile-time type checker that could, for example, verify the @Immutable contract. You may want to try a prototype compiler and type checking plugins Michael Ernst published at the JSR 308 web site.

12:46 PM  
Blogger Bob said...

I think compile time checking would be nice to have, but we really need runtime checking as this relates directly to security.

12:51 PM  
Blogger Julien Chastang said...

Yep. Common mistake. Making your array final guarantees that the array reference will not be reinstantiated. However, the final keyword make no guarantees about the contents of that array.

I was recently reading O'Reilly's Java Generics. Therein is a section on "Arrays as a Deprecated Type?". Have you considered using an unmodifiable Collection instead in the internals of your class?

1:05 PM  
Blogger Bob said...

Well, I know keeping a final reference to an array doesn't make the array immutable. My error was almost forgetting that OutputStream isn't safe. Almost. A chain is only as strong as its weakest link.

Using a collection would not be a good idea. First, I'm not using generics, so there's no need to not use an array. Second, generics don't work with primitives, so I'd have to box each byte. This means each byte will now take up 4 bytes of memory, not to mention the overhead from the level of indirection surrounding each byte.

1:16 PM  
Blogger Eric J. Schwarzenbach said...

I agree with Ricky Clarkson. What cost are you willing to pay to prevent malice as opposed to preventing honest (and not-improbable) mistakes? An OutputStream could write over your bytes, but no sane OutputStream would. One might even argue that for an OutputStream to do so violates the semantics of the class, even if these particular semantics are unfortunately not enforced by the language or encoded into its syntax.

1:54 PM  
Blogger Sam Halliday said...

Often safe copies are the only way to do immutability, and remember that Java 6 added the Arrays.copyOf* methods to assist when working with arrays.

On a bit of a tangent, but while we're on arrays... one thing I'd love to have in Java is subarrays. Anyone know if that will ever happen? Python style syntax would be awesome. (I mean proper subarrays that modify the original array).

2:03 PM  
Blogger Bob said...

Eric, security and stability are both issues in this case, and the class may be used with untrusted code.

5:57 PM  
Blogger Unknown said...

.clone() is an easier way to copy arrays (particularly from 1.5).

8:25 PM  
Blogger Reinier Zwitserloot said...

I seriously disagree with the entire idea behind this post.

Arrays should be DEPRECATED, not festooned with more features. We HAVE run-time checked read-only arrays. It's called Collections.unmodifiableList. Those are perfectly safe yet don't actually copy any data over.

Instead, I'd like to see stuff like this java puzzler addressed:

What does the following code emit?

int[] intArray = new int[] {1, 2, 3, 4, 5};
sysout(Arrays.asList(intArray).size());

answer below the dots:

.

.

.

.

.

.

It prints 1, not 5 like you'd expect, because in order to fit int[] into the "{T extends Object}..." vararg, the only way to do so is to set T to int[]. In other words, this produces a list of int[], not a list of integers.

There is in fact no JRE-proper way to convert an int[] into a List of integers; you'll have to write that method yourself. (AFAIK).

Improvements in that area would be much more useful compared to extending arrays.

1:40 AM  
Blogger Bob said...

Reinier, I think we agree. I'm not writing application logic. I'm writing a wrapper so users don't have to pass around raw byte[]. To put it another way, would you recommend that I use List<Character> over String? Sure, I could make my wrapper implement List<Byte>, but I don't think all the boxing and unboxing would be worth it.

8:30 AM  
Blogger Reinier Zwitserloot said...

That's a far point, file transfer is one of those areas where the nominal (constant) speed difference between List{Byte} and byte[] are definitely large enough to warrant using them. Is there perhaps some way we can massage the hotspot compiler to automatically realize, somehow, that a List{Byte} can be treated as a byte[]?

5:48 PM  
Blogger Unknown said...

If you want to wrap a byte[], java.nio.ByteBuffer.wrap.asReadOnly is (I believe) your slightly bloated friend [not checked]. Wrapping xyz[] in a custom List<Xyz> is not necessarily as slow as you might think.

BTW: Ensuring that the OutputStream is wrapped in a BufferedOutputStream is not hacker proof. You could write a byte at a time...

7:27 PM  
Blogger Bob said...

Thanks for the response, Tom. As you know, read-only != immutable, and I need immutability. Implementing List<Byte> might perform OK (especially considering I can preload and cache all the wrappers), but I don't think it makes much sense here. I need an API which plays nicely with existing IO-related APIs.

Good point about BufferedOutputStream; it will pass the byte[] through if it's bigger than the buffer. Guess I'll have to implement a SecureBufferedOutputStream which always goes through the buffer.

8:14 PM  
Blogger phill said...

Advocation of "Collections.unmodifiableList" remind me of people who touted 'Fast Lane Reader' as a J2EE Design Pattern.. both are silly hacks that should be considered temporary and unfortunate workarounds.

Immutable arrays would be great, and let's kill a numbe of birds with one stone. The 'const' keyword would be PERFECTION.

Comeon, Bob. Help us get const! It's long overdue.

1:55 PM  
Blogger Unknown said...

'const' keyword will not work. const byte[] means that the pointer is const but the content of bytes. The individual byte can still be mutated. The only safeguard is immutable byte[].

2:04 AM  
Blogger Bob said...

Michael, I think you're thinking of "final."

10:03 AM  
Blogger hlovatt said...

My pet project, Pattern Enforcing Compiler (PEC), is a Java compiler that extends the type checking to understand patterns. Its immutable pattern is described here:

http://pec.dev.java.net/nonav/compile/javadoc/pec/compile/immutable/package-summary.html

A general description is here:

http://pec.dev.java.net/nonav/compile/index.html

If this compiler was used to write an immutable class it would prevent using a primitive array and hence prevent the problem. You need to use ImmutableArrayList, which is supplied as part of the compiler, instead of a primitive array in immutables. However ImmutableArrayList can still be attacked using reflection.

11:16 PM  
Blogger Ricky Clarkson said...

Howard, I think you mean it would prevent the solution (Bob's byte[] would be rejected).

IDEA can flag up leaks of internal collections and arrays if you configure it as such.

4:19 AM  
Blogger Unknown said...

I must be missing something here but would it not be simpler write out each element of the array in the writeTo(...) method instead of creating a temporary byte[].

There is still a performance here with the additional method invocations but this could be buffered anyway.

3:07 PM  
Blogger Bob said...

William, that's a fine solution.

8:59 AM  

Post a Comment

<< Home