So are there any studies or metrics that back up this assertion that strongly typed languages produce better apps, or is it just "common sense"?
There must be tons.
A quick shufti at the IEEE Xplore Library turns up this, behind the Dread Big Paywall:
"A controlled experiment to assess the benefits of procedure argument type checking"
Published in: IEEE Transactions on Software Engineering (Volume: 24, Issue: 4, Apr 1998)
Abstract: Type checking is considered an important mechanism for detecting programming errors, especially interface errors. This report describes an experiment to assess the defect-detection capabilities of static, intermodule type checking. The experiment uses ANSI C and Kernighan & Ritchie (K&R) C. The relevant difference is that the ANSI C compiler checks module interfaces (i.e., the parameter lists calls to external functions), whereas K&R C does not. The experiment employs a counterbalanced design in which each of the 40 subjects, most of them CS PhD students, writes two nontrivial programs that interface with a complex library (Motif). Each subject writes one program in ANSI C and one in K&R C. The input to each compiler run is saved and manually analyzed for defects. Results indicate that delivered ANSI C programs contain significantly fewer interface defects than delivered K&R C programs. Furthermore, after subjects have gained some familiarity with the interface they are using, ANSI C programmers remove defects faster and are more productive (measured in both delivery time and functionality implemented).
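For anyone who hasn't had to live with K&R C, here is a rough sketch of the kind of interface check the abstract is talking about. It is my own toy example, not anything from the paper: the names `scale` and `demo` are made up, and the K&R-style declaration appears only in a comment so that the snippet compiles cleanly on a modern compiler.

```c
#include <stdio.h>

/* ANSI C prototype: the compiler sees the parameter types at every call
 * site, so it can convert arguments and reject calls with the wrong
 * number of arguments. */
double scale(double value, int factor);

/* The K&R-style equivalent would be just:
 *
 *     double scale();
 *
 * which (before C23) says nothing about the parameters. A call such as
 * scale(3, 2) would then pass the int 3 where a double is expected, and
 * the compiler would not be required to say a word about it. */

double scale(double value, int factor)
{
    return value * factor;
}

/* Hypothetical caller, for illustration only. */
double demo(void)
{
    /* With the prototype in scope, the int 3 is converted to 3.0. */
    double result = scale(3, 2);

    /* scale(3);  <- with the prototype, this is a compile-time error */

    printf("scale(3, 2) = %g\n", result);
    return result;
}
```

Calling `demo()` from a `main` of your own exercises the checked call. The mistakes the study counted are presumably of the commented-out kind, multiplied across a library the size of Motif.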
I will see if my search-fu is good enough to find more before the 10 minutes are up.