Integrate the fp16 numerical checks script #226

Open — wants to merge 1 commit into base: master

Conversation
@cbcase (Collaborator) commented Mar 27, 2019

I've had this script sitting around for a while on a branch (amp_debug), so I've brought it up to date to integrate with master.

I'm not entirely sure how much (or whether) I want to document this -- it's meant primarily for internal use. The idea is that you hack the main train loop to do something like this:

x, y = load_batch()              # grab a single batch from your data pipeline

def loss_fn():                   # closure that re-runs forward + loss on that batch
    output = model(x)
    return criterion(output, y)

amp.run_amp_numerical_checks(model, loss_fn)

and then you get a report with a bunch of information:

  • Whether there are any overflows in forward/backward with no loss scaling
  • The largest loss scale that doesn't overflow
  • A comparison of the gradients computed with amp enabled vs. disabled (max diff and cosine similarity)
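For intuition, here's a rough, self-contained sketch of the two kinds of checks the report covers. This is not apex's actual implementation -- the function names, the power-of-two doubling search, and operating on plain flattened gradient lists are all illustrative assumptions:

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value

def largest_safe_scale(grads, max_scale=2.0 ** 24):
    """Double a power-of-two loss scale until the scaled grads would
    overflow fp16; return the last scale that stayed finite (or None
    if even a scale of 1.0 overflows). Illustrative sketch only."""
    safe, scale = None, 1.0
    peak = max(abs(g) for g in grads)
    while scale <= max_scale:
        if peak * scale > FP16_MAX:
            break
        safe = scale
        scale *= 2.0
    return safe

def grad_report(ref, test):
    """Max elementwise abs diff and cosine similarity between two
    flattened gradient vectors (e.g. amp disabled vs. enabled)."""
    max_diff = max(abs(a - b) for a, b in zip(ref, test))
    dot = sum(a * b for a, b in zip(ref, test))
    norm = math.sqrt(sum(a * a for a in ref)) * math.sqrt(sum(b * b for b in test))
    return max_diff, (dot / norm if norm else float("nan"))
```

For example, `largest_safe_scale([1.0])` returns 32768.0, since the next doubling (65536) would push the scaled gradient past fp16's max finite value.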