The lightweight PyTorch wrapper for ML researchers. Scale your models. Write less boilerplate

Latest commit

3 authors Allow user to select individual TPU core to train on (#1729)
* added tpu_id

added tpu_id to mixins

* train on individual tpu

* parallel loader if tpu_id is None

* removed progress_bar_refresh_rate

* chlog

* replaced num_tpu_cores with tpu_cores

* set tpu_id to None if int

* changed num_tpu_cores to tpu_cores in docs

* updated docs

* updated __init__.py
removed self.tpu_id for ParallelLoader

* Update pytorch_lightning/trainer/__init__.py

* check if tpu_cores is a list

Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>

* xla device conditional

* num_tpu_cores deprecation

* removed duplicate warning

* fixed pep8 error

* Revert "removed duplicate warning"

This reverts commit 8adb0a9

* deprecated api update

* fixed recursion error

* fixed tests

* fixed flake errors

* removed current_tpu_index

* Update CHANGELOG.md

* Update trainer.py

Co-authored-by: Jirka <jirka.borovec@seznam.cz>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: William Falcon <waf2107@columbia.edu>
Latest commit 7c7e50c May 17, 2020

Files

Type Name Latest commit message Commit time
.circleci Fix failing docs (#1821) May 14, 2020
.github Fix build Docker releases (#1783) May 12, 2020
benchmarks Option to provide seed to random generators to ensure reproducibility ( May 12, 2020
docker Create Dockerfile (#1569) Apr 25, 2020
docs Allow user to select individual TPU core to train on (#1729) May 17, 2020
pl_examples fix bugs in semantic segmentation example (#1824) May 14, 2020
pytorch_lightning Allow user to select individual TPU core to train on (#1729) May 17, 2020
tests Allow user to select individual TPU core to train on (#1729) May 17, 2020
.codecov.yml enable Codecov (#1133) Mar 14, 2020
.drone.yml doctest for .rst files (#1511) May 5, 2020
.gitignore Move generated RST files to subfolder (#1555) May 4, 2020
.mergify.yml add python 3.8 testing (#915) Apr 2, 2020
.pep8speaks.yml improve partial Codecov (#1172) Mar 19, 2020
.readthedocs.yml Fix failing docs (#1821) May 14, 2020
.run_local_tests.sh CI: split tests-examples (#990) Mar 25, 2020
.update.sh default test logger (#1478) Apr 22, 2020
CHANGELOG.md Allow user to select individual TPU core to train on (#1729) May 17, 2020
LICENSE update license (#809) Feb 9, 2020
MANIFEST.in release 0.7.6 (#1813) May 15, 2020
README.md Allow user to select individual TPU core to train on (#1729) May 17, 2020
environment.yml Replace meta_tags.csv with hparams.yaml (#1271) May 13, 2020
pyproject.toml Split callbacks (#849) Feb 23, 2020
requirements-extra.txt Tests/docker (#1573) Apr 23, 2020
requirements.txt Replace meta_tags.csv with hparams.yaml (#1271) May 13, 2020
setup.cfg added warning to crash (#1625) Apr 30, 2020
setup.py Fix typo (#1750) May 7, 2020

README.md

Logo

PyTorch Lightning

The lightweight PyTorch wrapper for ML researchers. Scale your models. Write less boilerplate.



Continuous Integration

System / PyTorch ver. | 1.1 (min. req.) | 1.2 | 1.3 | 1.4 | 1.5 (latest)
Linux py3.6 [CPU] | CircleCI | CircleCI | CircleCI | CircleCI | CircleCI
Linux py3.7 [GPU] | - | - | - | - | Build Status
Linux py3.6 / py3.7 / py3.8 | CI testing | - | - | - | CI testing
OSX py3.6 / py3.7 / py3.8 | CI testing | - | - | - | CI testing
Windows py3.6 / py3.7 / py3.8 | CI testing | - | - | CI testing | -

Simple installation from PyPI

pip install pytorch-lightning

Docs

Refactoring your PyTorch code + benefits + full walk-through

Watch the video

Demo

Here's a minimal example without a validation or test loop.

import os

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST

import pytorch_lightning as pl


# this is just a plain nn.Module with some structure
class LitClassifier(pl.LightningModule):

    def __init__(self):
        super().__init__()
        # one linear layer mapping a flattened 28x28 image to 10 class scores
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        # flatten each image to a vector before the linear layer
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_nb):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        tensorboard_logs = {'train_loss': loss}
        return {'loss': loss, 'log': tensorboard_logs}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

# train!
train_loader = DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=transforms.ToTensor()), batch_size=32)

model = LitClassifier()
trainer = pl.Trainer(gpus=8, precision=16)
trainer.fit(model, train_loader)

Other examples:
GAN
BERT
DQN
MNIST on TPUs

What is it?

READ THIS QUICK START PAGE

Lightning is a way to organize your PyTorch code to decouple the science code from the engineering. It's more of a PyTorch style-guide than a framework.

In Lightning, you organize your code into 3 distinct categories:

  1. Research code (goes in the LightningModule).
  2. Engineering code (you delete, and is handled by the Trainer).
  3. Non-essential research code (logging, etc... this goes in Callbacks).
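
The three-way split above can be pictured with a framework-free toy. This is only a sketch in plain Python: `ToyModule`, `ToyTrainer`, and `LossHistory` are hypothetical names that mimic the three roles and are not Lightning classes.

```python
# Framework-free toy: these classes only mimic the roles described above.
# ToyModule, ToyTrainer, and LossHistory are hypothetical, not Lightning APIs.

class ToyModule:
    """Research code: the model and how a single batch produces a loss."""

    def __init__(self):
        self.w = 0.0  # single weight of the model y = w * x

    def training_step(self, batch):
        x, y = batch
        error = self.w * x - y
        loss = error ** 2
        grad = 2 * error * x  # d(loss)/dw, derived by hand
        return {"loss": loss, "grad": grad}


class LossHistory:
    """Non-essential research code: logging lives in a callback."""

    def __init__(self):
        self.losses = []

    def on_batch_end(self, output):
        self.losses.append(output["loss"])


class ToyTrainer:
    """Engineering code: the loop, the updates, and callback dispatch."""

    def __init__(self, max_epochs=1, lr=0.1, callbacks=()):
        self.max_epochs = max_epochs
        self.lr = lr
        self.callbacks = list(callbacks)

    def fit(self, module, data):
        for _ in range(self.max_epochs):
            for batch in data:
                output = module.training_step(batch)
                module.w -= self.lr * output["grad"]
                for callback in self.callbacks:
                    callback.on_batch_end(output)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # samples of y = 2 * x
model = ToyModule()
history = LossHistory()
ToyTrainer(max_epochs=20, lr=0.05, callbacks=[history]).fit(model, data)
print(f"learned w = {model.w:.3f}")
```

The point of the split is that `ToyModule` never mentions loops or logging, so either of the other two pieces can be swapped without touching the research code.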

Here's an example of how to refactor your research code into a LightningModule.

PT to PL

The rest of the code is automated by the Trainer!

Testing Rigour

All the code automated by the Trainer is tested rigorously with every new PR.

In fact, we also train a few models using a vanilla PyTorch loop and compare with the same model trained using the Trainer to make sure we achieve the EXACT same results. Check out the parity tests here.
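
The idea of a parity test can be sketched without any framework. The example below is an illustration only, not Lightning's actual test suite: `vanilla_loop`, `ToyModule`, and `ToyTrainer` are hypothetical stand-ins for a hand-written loop and a Trainer-driven loop over the same model and data.

```python
# Toy parity check with hypothetical, framework-free stand-ins.

def sgd_grad(w, x, y):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2 * (w * x - y) * x


def vanilla_loop(data, lr, epochs):
    """Hand-written training loop: the 'plain' side of the comparison."""
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w -= lr * sgd_grad(w, x, y)
    return w


class ToyModule:
    """Holds the weight and the per-batch computation."""

    def __init__(self):
        self.w = 0.0

    def training_step(self, batch):
        x, y = batch
        return sgd_grad(self.w, x, y)


class ToyTrainer:
    """Owns the loop: the 'automated' side of the comparison."""

    def __init__(self, lr, max_epochs):
        self.lr = lr
        self.max_epochs = max_epochs

    def fit(self, module, data):
        for _ in range(self.max_epochs):
            for batch in data:
                module.w -= self.lr * module.training_step(batch)


data = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5)]  # samples of y = 3 * x
expected = vanilla_loop(data, lr=0.1, epochs=5)

module = ToyModule()
ToyTrainer(lr=0.1, max_epochs=5).fit(module, data)

# Parity means identical results, not merely close ones.
assert module.w == expected
```

Both paths perform the same floating-point operations in the same order, which is why an exact equality check is reasonable here.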

Overall, Lightning guarantees rigorously tested, correct, modern best practices for the automated parts.

How flexible is it?

As you see, you're just organizing your PyTorch code - there's no abstraction.

And for the stuff that the Trainer abstracts out, you can override any part you want to do things like implement your own distributed training, 16-bit precision, or even a custom backward pass.

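For example, here is how you could override the optimizer step:

from pytorch_lightning import LightningModule

class LitModel(LightningModule):
    def optimizer_step(self, current_epoch, batch_idx, optimizer, optimizer_idx,
                       second_order_closure=None):
        # apply the update, then clear the gradients for the next batch
        optimizer.step()
        optimizer.zero_grad()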
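
One common reason to override this hook is learning-rate warm-up. The sketch below shows only the arithmetic, under stated assumptions: `warmup_lr`, `apply_lr`, and the 500-step warm-up length are hypothetical, and a list of dicts stands in for `optimizer.param_groups` so the example runs without PyTorch. Inside a real override, the step counter would have to come from the Trainer (for instance `self.trainer.global_step`, if your version exposes it), and the learning rate would be written before calling `optimizer.step()`.

```python
def warmup_lr(global_step, base_lr, warmup_steps=500):
    """Linearly ramp the learning rate up to base_lr over warmup_steps steps."""
    scale = min(1.0, (global_step + 1) / warmup_steps)
    return scale * base_lr


def apply_lr(param_groups, lr):
    """Write lr into each parameter group (torch optimizers expose
    `param_groups` as a list of dicts with an 'lr' key)."""
    for group in param_groups:
        group["lr"] = lr


# Stand-in for `optimizer.param_groups`, so the sketch runs without PyTorch.
param_groups = [{"lr": 0.02}]
for step in (0, 249, 499, 1000):
    apply_lr(param_groups, warmup_lr(step, base_lr=0.02))
    print(step, param_groups[0]["lr"])
```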

For anything else you might need, we have an extensive callback system you can use to add arbitrary functionality not implemented by our team in the Trainer.

Who is Lightning for?

  • Professional researchers
  • Ph.D. students
  • Corporate production teams

If you're just getting into deep learning, we recommend you learn PyTorch first! Once you've implemented a few models, come back and use all the advanced features of Lightning :)

What does Lightning control for me?

Everything in blue! This is how Lightning separates the science (red) from the engineering (blue).

Overview

How much effort is it to convert?

If your code is not a huge mess you should be able to organize it into a LightningModule in less than 1 hour. If your code IS a mess, then you needed to clean up anyhow ;)

Check out this step-by-step guide.
Or watch this video.

Starting a new project?

Use our seed-project aimed at reproducibility!

Why do I want to use Lightning?

Although your research/production project might start simple, once you add things like GPU and TPU training, 16-bit precision, etc., you end up spending more time engineering than researching. Lightning automates AND rigorously tests those parts for you.

Support

  • 8 core contributors: a mix of professional engineers, research scientists, and Ph.D. students from top AI labs.
  • 100+ community contributors.

Lightning is also part of the PyTorch ecosystem which requires projects to have solid testing, documentation and support.




Realistic example

Here's how you would organize a realistic PyTorch project into Lightning.

PT to PL

The LightningModule defines a system such as seq-2-seq, GAN, etc... It can ALSO define a simple classifier.

In summary, you:

  1. Define a LightningModule
    class LitSystem(pl.LightningModule):

        def __init__(self):
            super().__init__()
            # not the best model...
            self.l1 = torch.nn.Linear(28 * 28, 10)

        def forward(self, x):
            return torch.relu(self.l1(x.view(x.size(0), -1)))

        def training_step(self, batch, batch_idx):
            ...
  2. Fit it with a Trainer
from pytorch_lightning import Trainer

model = LitSystem()

# most basic trainer, uses good defaults
trainer = Trainer()
trainer.fit(model)

Check out the COLAB demo here

What types of research works?

Anything! Remember that this is just organized PyTorch code. The training step defines the core complexity found in the training loop.

Could be as complex as a seq2seq

# define what happens for training here
def training_step(self, batch, batch_idx):
    x, y = batch

    # define your own forward and loss calculation
    hidden_states = self.encoder(x)

    # even as complex as a seq-2-seq + attn model
    # (this is just a toy, non-working example to illustrate)
    start_token = '<SOS>'
    last_hidden = torch.zeros(...)
    loss = 0
    for step in range(max_seq_len):
        attn_context = self.attention_nn(hidden_states, start_token)
        pred = self.decoder(start_token, attn_context, last_hidden)
        last_hidden = pred
        pred = self.predict_nn(pred)
        # compare the prediction (not the hidden state) with the target
        loss += self.loss(pred, y[step])

    # toy example as well
    loss = loss / max_seq_len
    return {'loss': loss}

Or as basic as CNN image classification

# define what happens for validation here
def validation_step(self, batch, batch_idx):
    x, y = batch

    # or as basic as a CNN classification
    out = self(x)
    loss = my_loss(out, y)
    return {'loss': loss}

And without changing a single line of code, you could run on CPUs

trainer = Trainer(max_epochs=1)

Or GPUs

# 8 GPUs
trainer = Trainer(max_epochs=1, gpus=8)

# 256 GPUs
trainer = Trainer(max_epochs=1, gpus=8, num_nodes=32)
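
The 256 in the comment above is simply GPUs per node times the number of nodes. A minimal sketch of that arithmetic follows; `total_gpus` and `global_rank` are hypothetical helpers, and the rank formula is the usual one-process-per-GPU convention, assumed here rather than taken from Lightning.

```python
def total_gpus(gpus_per_node, num_nodes=1):
    """Total number of GPUs across the cluster."""
    return gpus_per_node * num_nodes


def global_rank(node_rank, local_rank, gpus_per_node):
    """Index of one process among all processes, assuming one process per GPU."""
    return node_rank * gpus_per_node + local_rank


print(total_gpus(8))            # single node with 8 GPUs
print(total_gpus(8, 32))        # 32 nodes with 8 GPUs each
print(global_rank(31, 7, 8))    # last GPU on the last of 32 nodes
```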

Or TPUs

# Distributed training across 8 TPU cores
trainer = Trainer(tpu_cores=8)

# Single TPU core training
trainer = Trainer(tpu_cores=[1])
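
The two forms of `tpu_cores` above (an int for a number of cores, a one-element list for one specific core) can be told apart with a small check. This is a hypothetical helper written for illustration, not the Trainer's actual code; the accepted values (counts of 1 or 8, core indices 1 through 8) are assumptions.

```python
def parse_tpu_cores(tpu_cores):
    """Map a `tpu_cores` value to (num_cores, tpu_id).

    Hypothetical helper: an int means "use this many cores" with no core
    pinned (tpu_id is None); a one-element list pins training to that core.
    """
    if isinstance(tpu_cores, bool):
        raise ValueError("tpu_cores must be an int or a one-element list")
    if isinstance(tpu_cores, int):
        # Assumption: only 1 or 8 are meaningful core counts.
        if tpu_cores not in (1, 8):
            raise ValueError("tpu_cores as an int must be 1 or 8")
        return tpu_cores, None
    if isinstance(tpu_cores, list) and len(tpu_cores) == 1:
        core = tpu_cores[0]
        # Assumption: cores are indexed 1 through 8.
        if isinstance(core, int) and not isinstance(core, bool) and 1 <= core <= 8:
            return 1, core
    raise ValueError("tpu_cores must be 1, 8, or a one-element list such as [1]")


print(parse_tpu_cores(8))    # eight cores, no specific core selected
print(parse_tpu_cores([1]))  # one core, pinned to index 1
```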

When you're done training, run the test loop to measure test accuracy

trainer.test()

Visualization

Lightning has out-of-the-box integration with the popular logging/visualizing frameworks

tensorboard-support

Lightning automates 40+ parts of DL/ML research

  • GPU training
  • Distributed GPU (cluster) training
  • TPU training
  • EarlyStopping
  • Logging/Visualizing
  • Checkpointing
  • Experiment management
  • Full list here

Examples

Check out this awesome list of research papers and implementations done with Lightning.

Tutorials

Check out our introduction guide to get started. Or jump straight into our tutorials.


Asking for help

Welcome to the Lightning community!

If you have any questions, feel free to:

  1. Read the docs.
  2. Search through the issues.
  3. Ask on Stack Overflow with the tag pytorch-lightning.
  4. Join our Slack.

FAQ

How do I use Lightning for rapid research?
Here's a walk-through

Why was Lightning created?
Lightning has 3 goals in mind:

  1. Maximal flexibility while abstracting out the common boilerplate across research projects.
  2. Reproducibility. If all projects use the LightningModule template, it will be much easier to understand what's going on and where to look! It will also mean every implementation follows a standard format.
  3. Democratizing PyTorch power-user features. Distributed training? 16-bit precision? You know you need them but don't want to take the time to implement them? All good... these come built into Lightning.

How does Lightning compare with Ignite and fast.ai?
Here's a thorough comparison.

Is this another library I have to learn?
Nope! We use pure PyTorch everywhere and don't add unnecessary abstractions!

Are there plans to support Python 2?
Nope.

Are there plans to support virtualenv?
Nope. Please use Anaconda or Miniconda.

conda activate my_env
pip install pytorch-lightning

Which PyTorch versions do you support?

  • PyTorch 1.1.0
    # install pytorch 1.1.0 using the official instructions
    
    # install test-tube 0.6.7.6 which supports 1.1.0
    pip install test-tube==0.6.7.6
    
    # install latest Lightning version without upgrading deps
    pip install -U --no-deps pytorch-lightning
  • PyTorch 1.2.0+
    pip install pytorch-lightning

Custom installation

Bleeding edge

If you can't wait for the next release, install the most up-to-date code with:

  • using Git (locally clones the whole repo with full history)
    pip install git+https://github.com/PytorchLightning/pytorch-lightning.git@master --upgrade
  • using instant zip (last state of the repo without git history)
    pip install https://github.com/PytorchLightning/pytorch-lightning/archive/master.zip --upgrade

Any release installation

You can also install any past release 0.X.Y from this repository:

pip install https://github.com/PytorchLightning/pytorch-lightning/archive/0.X.Y.zip --upgrade
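
If you script your environment setup, the URL pattern above is easy to build from a version string. A small sketch follows; `release_zip_url` and `pip_command` are hypothetical helpers, and the version check only enforces the 0.X.Y shape mentioned above, not that the release actually exists.

```python
import re

BASE = "https://github.com/PytorchLightning/pytorch-lightning/archive"


def release_zip_url(version):
    """Build the archive URL for a release written as 0.X.Y."""
    if not re.fullmatch(r"0\.\d+\.\d+", version):
        raise ValueError(f"expected a version like 0.X.Y, got {version!r}")
    return f"{BASE}/{version}.zip"


def pip_command(version):
    """Argument list for the install command shown above."""
    return ["pip", "install", release_zip_url(version), "--upgrade"]


print(" ".join(pip_command("0.7.6")))
```

Returning the command as a list keeps it ready for `subprocess.run` without shell quoting.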

Lightning team

Leads

Core Maintainers

Funding

Building open-source software with only a few part-time people is hard! We've secured funding so that we can hire full-time staff, attend conferences, and move faster on the features you request.

Our goal is to build an incredible research platform and a big supportive community. Many open-source projects have gone on to fund operations through things like support and special help for big corporations!

If you are one of these corporations, please feel free to reach out to will@pytorchlightning.ai!

Bibtex

If you want to cite the framework feel free to use this (but only if you loved it 😊):

@article{falcon2019pytorch,
  title={PyTorch Lightning},
  author={Falcon, WA},
  journal={GitHub. Note: https://github.com/williamFalcon/pytorch-lightning},
  volume={3},
  year={2019}
}