After inspecting the Event Viewer on the build machines, we found an error in the .NET Runtime that was taking down the build agent:
Application: QTAgent32_40.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Moq.MockException
Stack:
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.<ThrowAsync>b__5(System.Object)
   at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

The exception being thrown on a thread pool thread (always extremely bad, never good) was a Moq.MockException: our mocking framework, Moq, was throwing an exception that was taking down the build agent. But where?
How We Solved It
By some miracle, one of our developers decided to run the problematic tests with mstest from the command line on his local machine. This turned out to be exactly what we needed, because the command-line run printed the stack traces for the loose threads. It also showed that we had a lot more loose threads than just the one that was taking down the test agent on the build machine. The command the developer used was similar to the following:
mstest.exe /testcontainer:"C:\path\to\my\test.dll" /noisolation
After rooting out every one of these errors with command-line runs of mstest, the solution to all of them boiled down to one thing:
*ALWAYS* wrap the contents of 'async void' methods with a try-catch block!
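
To make the rule concrete, here is a minimal sketch of the pattern, not our actual code. The names AsyncVoidGuard, HandleEvent, DoWorkAsync and LastError are hypothetical and exist only for illustration. The point it tries to show: an 'async void' method returns no Task, so nobody can observe an exception that escapes it. With no SynchronizationContext present, that exception is rethrown on a thread pool thread and terminates the process, which matches the ThrowAsync frame in the stack trace above.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncVoidGuard
{
    // Hypothetical sink for failures; a real project would log or fail the test here.
    public static Exception LastError;

    // Hypothetical async work that fails after yielding to the thread pool.
    private static async Task DoWorkAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("boom");
    }

    // 'async void' gives the caller no Task to await, so the whole body is
    // wrapped in try-catch to keep any exception from escaping the method.
    public static async void HandleEvent(TaskCompletionSource<bool> done)
    {
        try
        {
            await DoWorkAsync();
        }
        catch (Exception ex)
        {
            // Without this catch, the exception would be rethrown on a
            // thread pool thread and crash the host process.
            LastError = ex;
        }
        finally
        {
            // Signal completion so the caller in this sketch can wait.
            done.TrySetResult(true);
        }
    }

    public static void Main()
    {
        var done = new TaskCompletionSource<bool>();
        HandleEvent(done);
        done.Task.Wait();
        Console.WriteLine(LastError == null ? "no error" : "caught: " + LastError.Message);
    }
}
```

Where you control the signature, returning Task instead of void is usually the better fix, since the caller can then await the method and see the failure. The try-catch is for the cases where 'async void' is forced on you, such as event handlers.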