dask

I just ran into an issue when trying to use to_csv with distributed workers that don't share a file system. I shouldn't have been surprised that writing to a local file system from a distributed worker doesn't work; it shouldn't work. But the only error I got was a File Not Found error. That led me to dask/dask#2656 (comment), which had the answer.

Now that mpdist has been implemented, it would be useful to have an mpdist tutorial.

The paper is here

The supporting website is here

The goal of this tutorial would be to reproduce Figure 5 from the paper

What happened:

When creating a LocalCluster object, the comm is started on a random high port, even if no other clusters are running.

What you expected to happen:

Should use port 8786.
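
The expected behaviour can be sketched with the standard library alone: try the default port first and fall back to an OS-assigned port only if the default is already taken. This is an illustration of the policy the report asks for, not how distributed is implemented; `pick_port` and `DEFAULT_PORT` are hypothetical names introduced here.

```python
import socket

DEFAULT_PORT = 8786  # the port the report expects the scheduler to use


def pick_port(preferred=DEFAULT_PORT, host="127.0.0.1"):
    """Return `preferred` if it can be bound on `host`, else an OS-assigned free port.

    Hypothetical helper for illustration only; it is not part of distributed.
    """
    # First try to bind the preferred port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, preferred))
            return preferred
        except OSError:
            pass  # the preferred port is in use or not permitted

    # Binding to port 0 asks the operating system for any free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as fallback:
        fallback.bind((host, 0))
        return fallback.getsockname()[1]


if __name__ == "__main__":
    print(pick_port())
```

The probe socket is closed before the function returns, so another process could claim the port in the meantime; a real server would keep the bound socket open instead of re-binding later.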

Minimal Complete Verifiable Example:

$ conda create -n dask-lc-test -c conda-forge -y python=3.8 ipython dask distributed
$ conda activate dask-lc-test

The `d

Describe the bug
According to the multiscene documentation, the property all_same_area does:

Determine if all contained Scenes have the same ‘area’.

However, I have created a multiscene where all scenes have the same area (they just differ between datasets), yet the property returns Fa

Code Sample, a minimal, complete, and verifiable piece of code

from pyresample.boundary import Boundary
b = Boundary(my_lons, my_lats)
print(b.contour_poly.area())

Problem description

The above code doesn't fail if the provided lons/lats are 2D (not sure on 3D+), but the class and all functions/utilities underneath it assume 1D arrays. The end results are incor

from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=1, memory='1GB')
print(cluster.job_script())

#!/usr/bin/env bash

#SBATCH -J dask-worker
#SBATCH -n 1
#SBATCH --cpus-per-task=1
#SBATCH --mem=954M
#SBATCH -t 00:30:00

/home/lesteve/miniconda3/bin/python -m distributed.cli.dask_worker tcp://192.168.0.11:44065 --nthreads 1 --memory-limit 1000.00MB -
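
The two memory figures in the script above are consistent if '1GB' is read as 10^9 bytes, the `--mem` value is expressed in units of 2^20 bytes and rounded up, and `--memory-limit` is expressed in decimal megabytes. The following sketch reproduces that arithmetic; it assumes this interpretation and is not taken from the dask-jobqueue source, and `to_mebibytes_ceil` is a hypothetical helper.

```python
import math


def to_mebibytes_ceil(n_bytes):
    """Convert a byte count to whole mebibytes (2**20 bytes), rounding up."""
    return math.ceil(n_bytes / 2**20)


requested = 1 * 10**9  # '1GB' read as a decimal gigabyte (assumption)

# 10**9 / 2**20 is about 953.67, which rounds up to 954.
print(f"--mem={to_mebibytes_ceil(requested)}M")

# The same byte count in decimal megabytes is exactly 1000.
print(f"--memory-limit {requested / 10**6:.2f}MB")
```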

Problem description

Our dask update graphs are not properly optimized.

We usually use dask.dataframe optimization and set ave_width=repartition_ratio for kartothek.io.dask.dataframe.update_dataset_from_ddf graphs. We should return an optimized graph from update_dataset_from_ddf to make our users' lives simpler.

We already have code that does this; whoever picks this up can ping me.

@romainr

The ML implementation is still a bit experimental - we can improve on this:

SHOW MODELS and DESCRIBE MODEL
Hyperparameter optimizations, AutoML-like behaviour
@romainr brought up the idea of exporting models
and some more showcases and examples

Currently all of the metrics computed are independent of a target variable or column, but if lens.summarise took the name of a column as the target variable, the output of some metrics could be more interpretable even if the target variable is not used in any kind of predictive modelling.

A good example of this could be PCA (see #14), which could plot the different categories of the target va

._datasets is dict.

Implement:

Help https://github.com/pydata/xarray/blob/66acafa7f1f1477cfd6c5b7c3458859763433092/xarray/core/dataset.py#L475

dask

Here are 209 public repositories matching this topic...

dask / dask

pydata / xarray

TDAmeritrade / stumpy

jmcarpenter2 / swifter

dask / distributed

ironmussa / Optimus

itamarst / eliot

pytroll / satpy

ranaroussi / pystore

timkpaine / paperboy

JiaweiZhuang / xESMF

pytroll / pyresample

dask / dask-jobqueue

JDASoftwareGroup / kartothek

dask / dask-ec2

nils-braun / dask-sql

facultyai / lens

ironmussa / Bumblebee

pangeo-data / climpred

dymaxionlabs / dask-rasterio

chmp / framequery

dask / knit

LDO-CERT / orochi

NCAR / ncar-python-tutorial

JSybrandt / agatha

fugue-project / fugue

radix-ai / graphchain

dgerlanc / dask-scaling-dataframe

MITgcm / xmitgcm

backtick-se / cowait
