reproducibility

Extracted from iterative/dvc#3608 (comment)

To match docs, where options have been ordered manually from more to less important (at least that's the idea).

Example difference:

λ dvc commit -h
usage: dvc commit [-h] [-q | -v] [-f] [-d] [-R] [targets [targets ...]]
...

optional arguments:
  -h, --help            show this help message and

Is there a method for setting a timeout period for an observer? For example, say the observer is a SQL database, and that database becomes inaccessible for a few minutes. Right now, the experiment simply fails. Is there an argument somewhere to avoid such a situation? Particularly for multi-day experiments.
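The excerpt does not say whether the library offers such an option. As a generic sketch of the behavior being asked for, a flaky observer call could be wrapped in a retry loop with exponential backoff. The helper name `call_with_retry` and all of its parameters are hypothetical and not part of any library's API.

```python
import time


def call_with_retry(action, retries=5, delay=1.0, backoff=2.0,
                    exceptions=(Exception,), sleep=time.sleep):
    """Call `action()`; on failure wait and retry, up to `retries` extra attempts.

    Hypothetical helper: `sleep` is injectable so the waiting can be stubbed out.
    """
    wait = delay
    for attempt in range(retries + 1):
        try:
            return action()
        except exceptions:
            if attempt == retries:
                # Out of attempts: let the last error propagate.
                raise
            sleep(wait)
            wait *= backoff
```

With `delay=60` and `retries=5`, the waits are 1, 2, 4, 8 and 16 minutes, so such a wrapper would ride out an outage of roughly half an hour before giving up.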

Description
In some rare cases, for example when you need to fine-tune a large model on a small dataset, the major part of the training loop is spent waiting for model checkpoints to be saved to a hard drive.

Proposal
It would be logical to allow a CheckpointCallback with parameter save_n_best=0 in a configuration, so that no best checkpoints are stored and the latest state of the model is used instead.

Motivation:
Some challenge hosts want to remove 'canceled' submissions, as well as submissions in which participants uploaded non-standard submission files, from their All Submissions view. Hence, we require this feature.

Deliverables:

Add a boolean field is_disabled with default as False in the Submissions model.
Add migration file for the added models.

Is your feature request related to a problem? Please describe.
The current documentation states that the --cluster-config is deprecated and it refers to the --profile section, which in turn refers to https://github.com/snakemake-profiles/doc

This change apparently carries a few assumptions, which do not necessarily hold:

clusters are by their very nature pretty heterogeneous, hence a clu

This would optionally store the postgres password, username, host, and port, which would be used by start_backend and by the main package.
This could also contain the desired API host and port.

I removed the content below from the vignette, in the name of having no executed code in the vignette. Would like to add it back in an article.

Advanced features

Embedded prose

Sometimes you want to mingle rendered code and prose. Put the embedded prose in as roxygen comments, i.e. comment lines that start with #'. This reprex code:

tmpfile <- tempfile(

@annakrystalli

Word of warning: This issue came up at an interesting talk by @annakrystalli. I have no time to help out, but she encouraged me to post this regardless.

Consider a hypothetical library X that, in version 1.0.0, contains an obscure bug where 0.5683/0 evaluates to -infinity, in violation of IEEE754. From the perspective of the library developers, this is a silly bug and a new version is releas

I just found that the hex equivalent of single quote and double quote in development SPEC should be switched. I.e. single quote is \x27, while double quote is \x22.

😸
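The corrected mapping can be checked against the ASCII code points. A minimal sketch, using Python only for illustration (the SPEC text itself is not reproduced here):

```python
# Hex escapes for the two quote characters, as stated in the correction.
single_quote = "\x27"
double_quote = "\x22"

# Both escapes decode to the expected characters.
assert single_quote == "'"
assert double_quote == '"'

# Going the other way: character to hex code point.
print(hex(ord("'")), hex(ord('"')))
```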

https://github.com/kislyuk/argcomplete allows for easy integration with bash auto-completion.

The assert_executable_exists(cmd) function checks for the presence of an executable in the environment where the workflow is executed.
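A minimal sketch of what such a check and its use in a constructor could look like. This is a stand-in written for illustration, not the project's actual implementation: the body of `assert_executable_exists`, the exception type, and the `ExampleRunner` class are all assumptions.

```python
import shutil


def assert_executable_exists(cmd: str) -> str:
    """Return the full path of `cmd`, or raise if it is not found on PATH.

    Illustrative stand-in; the real function may behave differently.
    """
    path = shutil.which(cmd)
    if path is None:
        raise RuntimeError(f"Required executable not found: {cmd}")
    return path


class ExampleRunner:
    """Hypothetical runner that validates its dependencies on construction."""

    def __init__(self, required_executables=()):
        # Fail early, before any workflow step is started.
        self.executables = {
            cmd: assert_executable_exists(cmd) for cmd in required_executables
        }
```

Checking in the constructor means a missing binary is reported when the runner is created rather than partway through a workflow.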

So, we can use this function to check for the presence of the required binaries/executables in the __init__() method (constructor) of the concerned runner classes, like [`run

We should have a default overlay containing a sort of LTS for standard data science libraries: tensorflow, pytorch, numpy, pandas, etc...

Many of these libraries are not always trivial to install, so this has added value by itself, besides the convenience for the explorative data scientist who wishes to use JupyterWith and should not have to be concerned with package setup.

The rendering of one section of the docs is broken:

https://anaconda-project.readthedocs.io/en/latest/user-guide/reference.html#variables-with-default-values

I'm not sure how to fix it: the RST looks OK.

In Step add option to:

persist small (1% to 5% of data) random sample of the output for browsing purposes.
This should be persisted to the separate directory (not to outputs).
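One way the requested option could behave, sketched as a standalone helper. The function name `persist_sample`, its parameters, and the JSON-lines output format are assumptions made for illustration and are not part of the library's Step API.

```python
import json
import random
from pathlib import Path


def persist_sample(records, sample_dir, fraction=0.01, seed=None):
    """Write a random sample of `records` as JSON lines under `sample_dir`.

    Hypothetical helper: `sample_dir` is meant to be separate from the
    directory that holds the full outputs.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    records = list(records)
    # Keep at least one record whenever there is any output at all.
    k = min(len(records), max(1, round(len(records) * fraction))) if records else 0
    sample = random.Random(seed).sample(records, k)

    out_dir = Path(sample_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "sample.jsonl"
    with out_path.open("w", encoding="utf-8") as fh:
        for rec in sample:
            fh.write(json.dumps(rec) + "\n")
    return out_path
```

Passing a fixed `seed` makes the browsable sample reproducible across runs, which may or may not be desirable for this feature.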

With future.batchtools and other wrappers, it becomes a bit tricky to track down where errors are coming from when running in batch mode. For instance, I got some:

Error : BatchtoolsError in BatchtoolsFuture ('future_lapply-1'): 'Work dir does not exist'

I wasn't sure if that was from the scheduler or batchtools, but it turns out it's from here:

https://github.com/mllg/batchtools/bl

This guide is getting pretty rusty. We should really think about what its function is. It has a lot of useful information, but I feel like the information is hard to absorb. There are too many sections, uneven content, and outdated information.

The first question is, in the condition the site is in now, does it serve a valuable function? And/Or does it do any harm staying up in the conditi

Is your feature request related to a problem? Please describe.
One of the reasons it's hard to write new documentation and update old documentation is because the documentation isn't well documented!

Describe the solution you'd like
Better document how to add documentation.

Additional context
One of the other reasons it's hard to add and update docs is because the organization

Admittedly, I'm not a pythonista, but I wonder whether there would be value in using bash versions of the three python scripts. For whatever reason, I'm running into problems with getting python installed correctly on my Mac. Once I got it pointed in the right direction, I ran into problems with installing numpy. It's quickly becoming a tutorial on installing python rather than make :)

I suspect the

I.e. remove tic/travis, instead write a few example YAML files for GH actions based on https://github.com/r-lib/actions & examples in https://github.com/ropenscilabs/actions_sandbox

What singularity image is used for this pipeline? Repo's readme says this pipeline can be run with singularity, but I don't see singularity: defined anywhere in the pipeline.

Using drag and drop can be annoying if nodes are far apart and zoomed out.

Here are 342 public repositories matching this topic...

iterative / dvc

IDSIA / sacred

catalyst-team / catalyst

joedicastro / vps-comparison

ropensci / drake

Cloud-CV / EvalAI

MaurizioFD / RecSys2019_DeepLearning_Evaluation

snakemake / snakemake

henripal / labnotebook

tidyverse / reprex

benmarwick / rrtools

minerva-ml / open-solution-home-credit

openwdl / wdl

datmo / datmo

kwotsin / mimicry

ddsjoberg / gtsummary

VIDA-NYU / reprozip

getpopper / popper

tweag / jupyterWith

plynx-team / plynx

Anaconda-Platform / anaconda-project

minerva-ml / steppy

ropensci / DataPackageR

mllg / batchtools

ropensci / reproducibility-guide

SwissDataScienceCenter / renku

swcarpentry / make-novice

lockedata / starters

snakemake-workflows / dna-seq-gatk-variant-calling

VisTrails / VisTrails
