Integrate the fp16 numerical checks script #226

Open — wants to merge 1 commit into base: master

Conversation
@cbcase (Collaborator) commented Mar 27, 2019

I've had this script sitting around for a while on a branch (amp_debug), so I've brought it up to date to integrate with master.

I'm not entirely sure how much (or whether) I want to document this -- it's meant primarily for internal use. The idea is that you hack the main train loop to do something like this:

x, y = load_batch()              # grab a single batch from your data pipeline

def loss_fn():                   # closure that re-runs forward + loss on that batch
    output = model(x)
    return criterion(output, y)

amp.run_amp_numerical_checks(model, loss_fn)

and then you get a report with a bunch of information:

  • Whether there are any overflows in forward/backward with no loss scaling
  • The largest loss scale that doesn't overflow
  • A comparison of the gradients computed with amp enabled vs. disabled (max diff and cosine similarity)
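For intuition, here's a rough, self-contained sketch of the two kinds of checks the report covers. This is not apex's actual implementation -- the function names, the power-of-two doubling search, and operating on plain flattened gradient lists are all illustrative assumptions:

```python
import math

FP16_MAX = 65504.0  # largest finite float16 value

def largest_safe_scale(grads, max_scale=2.0 ** 24):
    """Double a power-of-two loss scale until the scaled grads would
    overflow fp16; return the last scale that stayed finite (or None
    if even a scale of 1.0 overflows). Illustrative sketch only."""
    safe, scale = None, 1.0
    peak = max(abs(g) for g in grads)
    while scale <= max_scale:
        if peak * scale > FP16_MAX:
            break
        safe = scale
        scale *= 2.0
    return safe

def grad_report(ref, test):
    """Max elementwise abs diff and cosine similarity between two
    flattened gradient vectors (e.g. amp disabled vs. enabled)."""
    max_diff = max(abs(a - b) for a, b in zip(ref, test))
    dot = sum(a * b for a, b in zip(ref, test))
    norm = math.sqrt(sum(a * a for a in ref)) * math.sqrt(sum(b * b for b in test))
    return max_diff, (dot / norm if norm else float("nan"))
```

For example, `largest_safe_scale([1.0])` returns 32768.0, since the next doubling (65536) would push the scaled gradient past fp16's max finite value.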