Friday, May 19, 2006

JavaOne: Session Concurrency

All this talk of HTTP session clustering from Tangosol, Terracotta, etc., got me to thinking about a common problem in web applications which use in-memory HttpSession implementations: developers tend to forget that multiple threads can concurrently access objects on the session. From my experience, web developers think even less about session concurrency than security (yes, it's possible).

It's terribly easy to focus on the happy path during development and ignore the edge cases (especially considering how difficult it is to test them). In the past, we web developers only had to worry about users who double clicked links and buttons, but now technologies like AJAX and asynchronous session replication have exacerbated the problem by increasing the likelihood of two threads stepping on each other and resulting in race conditions or deadlocks.

Case in point: I once had a server grind to a halt due to a strange deadlock. The stack dump showed that one thread was waiting for a lock held by another thread which seemed to be mysteriously stuck in Object.hashCode():

  daemon runnable
    at java.lang.Object.hashCode(Native Method)
    at java.util.HashMap$Entry.hashCode(Unknown Source)
    at java.util.AbstractMap.hashCode(Unknown Source)
    ...

How on Earth could a thread block in Object.hashCode()? The call to a non-synchronized HashMap in the stack trace unlocked the mystery. Someone nested a HashMap instance deeply within the session. While one request iterated over the map, another request came in and mutated the map which created a cycle in the HashMap's underlying data structure, and the first thread went into an infinite loop. Object.hashCode() didn't really block, but the loop was so tight, it sure looked that way.

How do you avoid this pitfall? There is no silver bullet. You could synchronize, but synchronizing at the appropriate level is tricky: too fine-grained, and you don't really solve the problem; too coarse, and a long-running request can block subsequent requests. Last I checked, Struts Action 1 performed no synchronization on session-scoped beans, and it wasn't a problem you could solve at the application level (i.e. how would you get Struts to acquire your lock before mapping request parameters to your session-scoped bean?). I'm not sure how JSF implementations and Spring WebFlow address this problem, but I'd love to hear. My current project manages concurrency quite successfully using an in-house wizard framework.

You could also serialize the session at the beginning and end of each request (each request would have its own copy of the session). I think Rails takes this approach by virtue of the fact that each request executes in its own process. I'm not sure how you would handle concurrent requests though; would the second request overwrite the session changes from the first? Does Rails write the problem off as too rare to worry about? I suppose you could mitigate this problem to some degree by storing the session in a hidden field on the client.
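To make the overwrite question concrete, here is a minimal sketch of the copy-per-request idea, assuming the session is a plain serializable HashMap. The class and method names (SessionCopyDemo, copy, simulate) are made up for illustration, and this is not how Rails or any particular container does it; it only shows that whichever request writes back last wins.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical demo class; the "session" is just a serializable map.
class SessionCopyDemo {
    // Deep-copies an object by round-tripping it through Java serialization.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Simulates two overlapping requests that each work on a private copy.
    // Returns the session as stored after both have written back.
    static HashMap<String, String> simulate() {
        HashMap<String, String> stored = new HashMap<>();
        stored.put("cart", "empty");

        // Both requests read the session before either one writes it back.
        HashMap<String, String> requestA = copy(stored);
        HashMap<String, String> requestB = copy(stored);

        requestA.put("cart", "1 item");
        requestB.put("theme", "dark");

        stored = copy(requestA); // request A finishes first
        stored = copy(requestB); // request B finishes last and replaces A's copy
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this sketch request A's cart update is silently lost: the stored session ends up with B's "theme" entry and the original "cart" value. No data structure gets corrupted, but the lost update is still a concurrency bug.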

Nowadays I think I favor keeping all your state in the database and using one of the aforementioned clustered caching frameworks to scale. Thoughts?

11 Comments:

Anonymous Anonymous said...

In Coherence*Web, we have an option to serialize (run one after another) all requests related to a particular session. It was designed to avoid stuff like this ;-)

Peace.

5:58 PM  
Blogger Bob said...

That's *exactly* what I meant by, "too coarse, and a long running request can block subsequent requests." Synchronizing all requests for a given session is as simple as:

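import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

public class SynchronizingFilter implements Filter {
  public void init(FilterConfig config) {}
  public void destroy() {}

  public void doFilter(ServletRequest request, ServletResponse response,
      FilterChain chain) throws IOException, ServletException {
    // Only one request per session runs past this point at a time.
    synchronized (((HttpServletRequest) request).getSession()) {
      chain.doFilter(request, response);
    }
  }
}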

(Assuming the underlying servlet container implementation returns the same HttpSession instance every time.)

You must take care to only lock requests that actually touch the session and to closely monitor the user experience (i.e. composite response times). If you apply your filter a little too widely, how much would it suck if the browser had to download all the CSS, Javascript, images, etc. serially? Your web site would feel dog slow.

Even if you do that, with the "serialize on the session" model, if one request takes a long time (running a report maybe), then the user can't make subsequent requests. They're effectively locked out of your application until they close their browser. Their requests all line up one after the other until the server runs out of threads.

Our in-house framework locks at a finer grain: per wizard instance (you can run different instances of the same wizard type in different windows). For you Seam users, the synonym for "wizard" in Seam is "conversation." Our framework also has the ability to abort a long-running request (in a much safer manner than Thread.stop()).

Lastly, there's no guarantee of what order requests will arrive in, but in reality they tend to arrive in the same order the user executed them. If you simply synchronize, you completely throw away that order, as the next request to execute turns into whichever thread gets the lock first, not whichever request came in first. Our framework takes care to keep the requests for a given wizard in the right order, and I hope Coherence*Web does the same for a given session.

I think there's also a bigger problem at play here: developers store data on the session when they probably shouldn't. Will a given feature still work if the user uses the browser back button? If not, you should probably bite the bullet and pass your state from page to page or something else.

7:18 PM  
Anonymous Anonymous said...

Firstly, I am a bit mystified about how the original loop was triggered. A HashMap iterator will fail fast with a ConcurrentModificationException if the map is modified during iteration (and iteration is what happens during hashCode()). I can't find the code path that causes this behaviour. How does the infinite loop occur? (Although, reading the HashMap docs, HashMap doesn't guarantee it will fail fast.)

Secondly, the way I have solved such things in the past is not necessarily by locking (unless you really need it), but by using concurrent implementations of the underlying collections such as ConcurrentHashMap and CopyOnWriteArrayList. These provide much better support for concurrent access and will not throw ConcurrentModificationException. Where I needed to guarantee integrity under concurrent access (or where there were large numbers of writes, in the case of CopyOnWriteArrayList), I have rolled my own collection wrappers that use a ReadWriteLock, taking out a read lock on read-only operations and a write lock on mutating operations before calling the wrapped delegate. It is still the client's job to take out a lock when iterating, of course.
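A wrapper along those lines might look roughly like the sketch below. It is an assumption-laden illustration, not the commenter's actual code: the class name ReadWriteLockedMap is hypothetical, it wraps a plain HashMap, and it covers only a few operations. Instead of handing the lock to the client for iteration, it exposes a forEach that holds the read lock for the whole traversal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.BiConsumer;

// Hypothetical wrapper: read lock for read-only calls, write lock for mutations.
class ReadWriteLockedMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    V get(K key) {
        lock.readLock().lock();
        try {
            return delegate.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    V put(K key, V value) {
        lock.writeLock().lock();
        try {
            return delegate.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Holds the read lock for the entire traversal, so writers wait until the
    // iteration is done. The action must not call put() on this map: a
    // ReentrantReadWriteLock cannot upgrade from read to write and would block.
    void forEach(BiConsumer<? super K, ? super V> action) {
        lock.readLock().lock();
        try {
            delegate.forEach(action);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadWriteLockedMap<String, Integer> map = new ReadWriteLockedMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Many readers can proceed in parallel while a single writer gets exclusive access, which is the trade-off that makes this attractive for read-heavy session data.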

9:42 PM  
Blogger Bob said...

I didn't dig too deeply after I realized it was a concurrency problem. Whether or not HashMap guarantees to fail fast, what matters is it doesn't guarantee anything when accessed concurrently; it's not thread safe. If you're interested in more, this case goes into a little more detail: http://blogs.opensymphony.com/plightbo/2005/07/hashmapget_can_cause_an_infini.html

I'm a huge fan of the new concurrent containers, but the right solution is to understand the problem at hand. Dropping in a concurrent implementation could simply mask the problem, i.e. the system may not blow up like it did here, but it will still fail in more mysterious ways.

10:45 PM  
Anonymous Anonymous said...

Nice Filter example! :-)

We do the concurrency management across the cluster, i.e. sequence access to the session not just on one box, but across all of them, with locality optimizations for sticky management.

Peace.

10:15 AM  
Blogger Bob said...

Locking in a cluster is orthogonal to the problems we've discussed. I actually doubt the value of synchronizing across multiple servers. In our application, we only send the user to a different server if their primary server goes down, which means they should almost never access more than one server at a time.

10:48 AM  
Anonymous Anonymous said...

Thanks for the link, very interesting.

Certainly the solution has to fit the problem; it's just that your particular problem was a concurrently modified Map whose internal structure got corrupted, which sounds like the perfect job for [trumpets blaze, a white horse rides in stage right carrying:] ConcurrentHashMap!

11:09 PM  
Blogger Bob said...

Simply dropping in ConcurrentHashMap assumes that the granularity of atomicity is a method on the map. If your code does something like put an item in the map and then expect that the map hasn't changed when it accesses the map again shortly thereafter, ConcurrentHashMap won't help. I love ConcurrentHashMap, but I usually find I can't just drop it into code which uses a HashMap without thinking about it. I usually have to refactor the code to take advantage of the new atomic map methods.
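As an illustration of that kind of refactoring, here is a small sketch of a per-key counter. A naive get-then-put would lose increments even on a ConcurrentHashMap, so the sketch retries with the atomic putIfAbsent and replace(key, oldValue, newValue) methods from ConcurrentMap. The class and method names (AtomicCounterDemo, increment, run) and the "hits" key are invented for the example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo of replacing check-then-act with atomic map methods.
class AtomicCounterDemo {
    // Increments the count for a key using only atomic map operations.
    static void increment(ConcurrentMap<String, Integer> counts, String key) {
        while (true) {
            Integer current = counts.get(key);
            if (current == null) {
                // Succeeds only if no other thread created the entry first.
                if (counts.putIfAbsent(key, 1) == null) {
                    return;
                }
            } else if (counts.replace(key, current, current + 1)) {
                // Succeeds only if the value is still the one we read.
                return;
            }
            // Another thread changed the entry in between; read again and retry.
        }
    }

    // Runs several threads that all increment the same key; returns the final count.
    static int run(int threads, int incrementsPerThread) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    increment(counts, "hits");
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

The point is that the unit of atomicity has to match what the code actually relies on; here that is "read, add one, write back", not a single get or put.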

8:30 AM  
Blogger Eugene Kuleshov said...

By the way, Bob, what servlet container are you running on? Speaking of concurrent modifications, it should be relatively easy to implement something to detect them, e.g. using some kind of versioning scheme... at least you can react accordingly.
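One way such a versioning scheme could look, purely as a sketch: keep a version number next to the session-scoped value, have each request remember the version it read, and refuse the write if the version has moved on. The VersionedValue class below is hypothetical and not taken from any container or framework.

```java
// Hypothetical holder that detects conflicting writes via a version counter.
class VersionedValue<T> {
    private T value;
    private long version;

    VersionedValue(T initial) {
        this.value = initial;
    }

    synchronized T get() {
        return value;
    }

    synchronized long version() {
        return version;
    }

    // Applies the update only if nobody has written since expectedVersion was read.
    // Returns false on a conflict so the caller can retry, merge, or report an error.
    synchronized boolean compareAndSet(long expectedVersion, T newValue) {
        if (version != expectedVersion) {
            return false;
        }
        value = newValue;
        version++;
        return true;
    }

    public static void main(String[] args) {
        VersionedValue<String> state = new VersionedValue<>("draft");
        long seenByA = state.version();
        long seenByB = state.version();
        System.out.println(state.compareAndSet(seenByA, "saved by A")); // accepted
        System.out.println(state.compareAndSet(seenByB, "saved by B")); // rejected: stale version
    }
}
```

This only detects the conflict; what "react accordingly" means (retry, merge, show an error) is still up to the application.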

11:06 AM  
Blogger Brian Egge said...

Sorry Jed, HashMaps aren't guaranteed to fail fast. If you're getting CMEs you should be very careful, because you'll eventually hit the case where you get stuck in the hashCode loop. Basically, the iterator checks for a modification every time it moves, but there is a small window where it can check for a modification and pass, and then the collection mutates before it gets the next element.

12:19 AM  
Blogger vishal said...

I agree that there's no silver bullet!
I asked my former colleagues at Interface21 the same question, and Keith and Juergen both pointed me to

http://www.springframework.org/docs/api/org/springframework/web/util/HttpSessionMutexListener.html

which basically explains that it's done using an HttpSessionListener that automatically exposes a session mutex when an HttpSession is created. Hence, instead of locking on the session object itself, you can acquire the lock on the mutex.
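The idea can be sketched without any servlet or Spring classes. In the sketch below a plain Map stands in for the HttpSession attributes, and the attribute name and helper methods are hypothetical stand-ins rather than Spring's actual API. A listener would call sessionCreated once per session, and application code would then synchronize on whatever getSessionMutex returns.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the session mutex idea; not Spring's real classes.
class SessionMutexSketch {
    // Made-up attribute name for this example.
    static final String MUTEX_ATTRIBUTE = "example.SESSION_MUTEX";

    // The mutex is serializable so it can travel with a serialized session.
    static final class Mutex implements Serializable {
    }

    // What a session-created listener would do: store one mutex per session.
    static void sessionCreated(Map<String, Object> session) {
        session.put(MUTEX_ATTRIBUTE, new Mutex());
    }

    // Returns the stored mutex, falling back to the session object itself.
    static Object getSessionMutex(Map<String, Object> session) {
        Object mutex = session.get(MUTEX_ATTRIBUTE);
        return mutex != null ? mutex : session;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new ConcurrentHashMap<>();
        sessionCreated(session);
        synchronized (getSessionMutex(session)) {
            // Read and update session-scoped state here.
            session.put("visits", 1);
        }
        System.out.println(session.get("visits"));
    }
}
```

Because the mutex is an attribute stored in the session, it stays the same object for as long as the session stays in memory, even if the container hands out a different HttpSession wrapper on each request. That sidesteps the assumption the SynchronizingFilter above depends on.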

10:24 PM  
