Wednesday, January 28, 2015

Resolving "The agent process was stopped while the test was running" on a TFS build agent

I'm currently working on a project that uses TFS for automated builds. Recently, our builds started failing with the error "The agent process was stopped while the test was running" while running unit tests. It wouldn't always happen during the same test, though it did frequently.

After inspecting the Event Viewer on the build machines, we found an error in the .NET Runtime that was taking down the build agent:

Application: QTAgent32_40.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Moq.MockException
Stack:
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.<ThrowAsync>b__5(System.Object)
   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
The exception that was being thrown on a ThreadPoolThread (always extremely bad, never good) was a MockException: our unit testing framework was throwing an exception that was taking down the build agent.  But where?

How We Solved It

By some miracle, one of our developers decided to run our problematic tests with mstest on the command line on his local machine. As it turns out, this was great because it showed the stack traces for the loose threads on the command line. Turns out we had a lot more loose threads than just the one that was taking down our test agent on the build machine. The command the developer used was (similar to) the following:

mstest.exe /testcontainer:"C:\path\to\my\test.dll" /noisolation

After auditing all of the errors by rooting them out with command line runs of mstest, the solutions to our problems all boiled down to one thing:

*ALWAYS* wrap the contents of 'async void' methods with a try-catch block!

No comments: