If you write tests for a process that gets registered right after it is spawned, you might get bitten by a race condition in the test module. But let’s start from the beginning with some code.
A super-simple gen_server:
Here is an overly simple bank accounting server (its logic is incomplete and really easy). I have written it as a joke too many times, and here I am using dict, which is not the neatest data structure for the job, but hey, we are talking about race conditions in unit testing.
I am linking to the file, because pasting 96 lines of code here would be too much.
The test module:
The tests are stupid tests. I am not focused on them right now, but only on the race condition. Remember?
-module(test_bank_server).
-include_lib("eunit/include/eunit.hrl").

create_account_test_() ->
    {setup,
     fun setup/0,
     fun teardown/1,
     fun(_Pid) ->
             bank_server:create_account(mirko),
             [
              ?_assertEqual(0, bank_server:balance(mirko))
             ]
     end}.

deposit_test_() ->
    {setup,
     fun setup/0,
     fun teardown/1,
     fun(_Pid) ->
             bank_server:create_account(mirko),
             bank_server:deposit(mirko, 10),
             bank_server:deposit(mirko, 99),
             bank_server:create_account(roberta),
             bank_server:deposit(roberta, 1000),
             [
              ?_assertEqual(109, bank_server:balance(mirko)),
              ?_assertEqual(1000, bank_server:balance(roberta))
             ]
     end}.

setup() ->
    {ok, Pid} = bank_server:start_link(),
    Pid.

teardown(_Pid) ->
    bank_server:stop().
And yep, here we have a race condition:
OK, as you can see, our test is failing, but it is not failing the right way. Something goes wrong inside our test setup, and it seems to be caused by the teardown/1 function.
Why?
The explanation is very simple. We are issuing an asynchronous call to stop the server. So EUnit calls teardown/1, which sends the asynchronous stop message to the gen_server, and then it goes on to the next test and calls setup/0. At that point the previous gen_server could still be alive, or it could be shutting down right in that moment.
We can’t register a process if the name is already registered or if the Pid does not exist on the local node. Here we are in the first case, which means a badarg error from register/2. Since I am using OTP, the error message is even clearer: the bank_server process is already started, because the process registered in the previous test generator is still there.
To tell the truth, I am using an asynchronous call here to shut down the server. I tried the same example with a synchronous call, and I avoided the race condition by returning the 4-element tuple {stop, normal, ok, State} from the gen_server, as Roberto Aloi wrote in this StackOverflow post. But his mate Adam Lindberg points out that we are still not avoiding the race condition even if we issue synchronous calls.
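The name clash itself can be reproduced without any gen_server. The following is only a minimal sketch: the module name register_clash_demo and the registered name demo_name are made up for illustration and are not part of the bank server code.

```erlang
-module(register_clash_demo).
-export([run/0]).

%% Hypothetical demo, not taken from the bank_server module.
run() ->
    %% First process: stays alive until it receives stop.
    First = spawn(fun() -> receive stop -> ok end end),
    true = register(demo_name, First),
    %% Second process: we try to give it the same name.
    Second = spawn(fun() -> receive stop -> ok end end),
    %% The name is still owned by First, so register/2 raises badarg.
    Result = try register(demo_name, Second) of
                 true -> registered
             catch
                 error:badarg -> name_taken
             end,
    %% Clean up: free the name explicitly, then stop both processes.
    true = unregister(demo_name),
    First ! stop,
    Second ! stop,
    Result.
```

Calling register_clash_demo:run() should return name_taken, which is the raw form of the already started error that gen_server:start_link reports when the old process still holds the name.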
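To make the difference between the two shutdown styles concrete, here is a rough sketch of a gen_server that offers both. The module name stop_sketch and the functions stop_async/0 and stop_sync/0 are assumptions for illustration; the real bank_server file may be written differently.

```erlang
-module(stop_sketch).
-behaviour(gen_server).

%% Hypothetical module, only meant to contrast the two ways to stop.
-export([start_link/0, stop_async/0, stop_sync/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
    %% Registers the process locally under the module name.
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Asynchronous stop: returns ok at once, the server may still be alive.
stop_async() ->
    gen_server:cast(?MODULE, stop).

%% Synchronous stop: returns only after the server has replied ok.
stop_sync() ->
    gen_server:call(?MODULE, stop).

init([]) ->
    {ok, []}.

%% The 4-element tuple replies ok to the caller and stops the server.
handle_call(stop, _From, State) ->
    {stop, normal, ok, State};
handle_call(_Request, _From, State) ->
    {reply, {error, unknown_request}, State}.

%% The cast version has nobody to reply to, it just stops.
handle_cast(stop, State) ->
    {stop, normal, State};
handle_cast(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.
```

With stop_sync/0 the caller gets ok back, but the reply alone does not prove that the process has already exited and released its name, which is the point Adam makes.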
And at the end of the day, it could be that you don’t want to use a synchronous call at all, so let’s go to the solutions.
How to avoid it (dirty solution):
This is a shitty solution (please don’t use it in your code or you will be screwed in this concurrent world). I am writing it down so you can see that giving the gen_server a few milliseconds to shut down is enough to make things work. It is still bad practice, because you are betting on timing with a hard-coded value, and that is not clean but really, really dirty.
setup() ->
    timer:sleep(100),
    bank_server:start_link().

teardown(_Pid) ->
    bank_server:stop().
So, yep, no race conditions here. But it is dangerous. What if the gen_server takes more time to shut down? Here comes Adam’s way, which is in my opinion the neat one.
How to avoid it (neat solution):
As I have already written above, here is Adam Lindberg’s solution. I found it in the previously mentioned StackOverflow discussion.
teardown(Pid) ->
    bank_server:stop(),
    wait_for_exit(Pid).

wait_for_exit(Pid) ->
    MRef = erlang:monitor(process, Pid),
    receive
        {'DOWN', MRef, _, _, _} ->
            ok
    end.
To get to the point, Adam’s way is to monitor the gen_server process and wait for its {'DOWN', Ref, process, Pid, Reason} message. This synchronizes teardown/1 with the next setup/0, and it is not hard coded, but really neat. It works well: you get things in sync in your test module, but you are not forcing your gen_server to expose a synchronous call to shut down.
And yep, it works:
In the very last message of the discussion, Adam says that it is always good practice to monitor processes that are supposed to die during testing, and I agree.
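One possible variation, which is not part of Adam’s answer, is to bound the wait so that a server that never dies cannot hang the whole test run. This is just a sketch under that assumption: the module name wait_sketch and the two-argument wait_for_exit/2 are made up here.

```erlang
-module(wait_sketch).
-export([wait_for_exit/2]).

%% Hypothetical variant of wait_for_exit/1 with a timeout in milliseconds.
wait_for_exit(Pid, Timeout) ->
    MRef = erlang:monitor(process, Pid),
    receive
        %% Delivered when Pid exits, or at once if it is already dead.
        {'DOWN', MRef, process, Pid, _Reason} ->
            ok
    after Timeout ->
        %% Give up: drop the monitor and flush any late 'DOWN' message.
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end.
```

In the test module, teardown/1 could then call something like ok = wait_sketch:wait_for_exit(Pid, 1000) after bank_server:stop(), so a stuck server shows up as a failed match instead of a frozen test.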
That’s it!
Time to wish you a great New Year’s Eve. See you within 24 hours in 2012; it will be a great year, with lots of things to do and lots of Erlang code to write. Starting from the Erlang Factory in Brussels: see you there, and ping me for beers or dinner and some Erlang discussions.
Bye.




