data-wrangling

Here are 506 public repositories matching this topic...

baeolophus commented Jan 22, 2019

I suggest either adding a short code piece to use the rename() function to change the column "genus" to "genera" (thus alerting the learners to their relationship here, while adding a new function) or changing the column name in the original dataset. Otherwise, I've found that using the correct plural for genus confuses learners who are not biologists. Although it's the R ecology lesson and one
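
A minimal sketch of the suggested `rename()` step, assuming the dplyr package is available. The `surveys` data frame below is a hypothetical stand-in for the lesson's dataset; only the column name `genus` comes from the comment above.

```r
library(dplyr)

# Hypothetical stand-in for the lesson data; the values are illustrative only.
surveys <- data.frame(
  genus   = c("Dipodomys", "Neotoma", "Dipodomys"),
  species = c("merriami", "albigula", "ordii")
)

# rename() takes arguments of the form new_name = old_name.
surveys_renamed <- surveys %>%
  rename(genera = genus)

# Inspect the column names after renaming.
names(surveys_renamed)
```

Writing the result to a new object leaves the original column name in place, so learners can compare the two data frames side by side.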

mstrimas commented Jan 12, 2020

This challenge asks students to print an informative message if there are any records in gapminder for the year 2002. Two solutions are provided: one using any(gapminder$year == 2002) (note that any() isn't introduced until later in that episode) and a much more complicated one that counts the number of rows for the year 2002. It seems to me the only reasonable way to do this is with %in%
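
One way the `%in%` approach mentioned above might look, as a sketch rather than the lesson's official solution. The small `gapminder` data frame here is a hypothetical stand-in with made-up rows, and the message wording is an assumption.

```r
# Hypothetical stand-in for the gapminder data; the rows are illustrative only.
gapminder <- data.frame(
  country = c("A", "A", "B"),
  year    = c(1997, 2002, 2007)
)

# %in% returns a single TRUE/FALSE here because the left-hand side has length one.
if (2002 %in% gapminder$year) {
  msg <- "Record(s) for the year 2002 found."
} else {
  msg <- "No records for the year 2002."
}

print(msg)
```

Because the left-hand side is a single value, the condition is always a length-one logical, which is what `if()` expects.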

umnik20 commented May 4, 2020

Dear Community,

There is a typo in the section titled "The StringsAsFactors argument", after the second block of code that demonstrates the use of the str() function. Right after the code boxes, the text reads "We can see that the $Color and $State columns are factors and $Speed is a numeric column", but the box shows that the $Color column is a vector of strings.

Regards,

Rodolfo
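
The difference the comment above points at can be checked with a short sketch. The data frame values below are hypothetical (only the column names `Color`, `Speed`, and `State` come from the comment), and this is not the lesson's own code.

```r
# Hypothetical values; only the column names are taken from the comment above.
colors <- c("red", "white", "red")
speeds <- c(65, 72, 45)
states <- c("NewMexico", "Arizona", "Colorado")

# With stringsAsFactors = FALSE, text columns stay character vectors.
cars_chr <- data.frame(Color = colors, Speed = speeds, State = states,
                       stringsAsFactors = FALSE)

# With stringsAsFactors = TRUE, text columns are converted to factors.
cars_fct <- data.frame(Color = colors, Speed = speeds, State = states,
                       stringsAsFactors = TRUE)

# str() reports the type of each column, which is what the lesson text describes.
str(cars_chr)
str(cars_fct)
```

Comparing the two `str()` outputs shows whether a sentence describing the columns as factors matches the code block it follows.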

davis68 commented Sep 17, 2020
  • I felt like nunique was arbitrarily (re)introduced when it was necessary. It wouldn't be top-of-mind for students solving problems.
  • The lesson answers need to be adjacent to the exercises.
  • I like the pre-introduction of masks and then circling back around to explain them.
  • I feel like Part 4 needs to be broken up and integrated across other lessons: it felt thin on its own.
  • Horizo

jatonline commented Apr 12, 2021

In recent (non-Carpentry) Python courses, we have come across learners who have experience with Python and with JupyterLab or Jupyter Notebooks but are unaware that you can simply run a Python script from the command line. We have observed that this leads to confusion when they work with others who use script files.

I'm not for a second suggesting changing the way the les

lachlandeer commented Jul 30, 2018

In episode _episodes_rmd/12-time-series-raster.Rmd

There is a big chunk of code that can probably be made to look nicer via dplyr:

# Plot RGB data for Julian day 133
 RGB_133 <- stack("data/NEON-DS-Landsat-NDVI/HARV/2011/RGB/133_HARV_landRGB.tif")
 RGB_133_df <- raster::as.data.frame(RGB_133, xy = TRUE)
 quantiles = c(0.02, 0.98)
 r <- quantile(RGB_133_df$X133_HARV_landRGB.1, q

cmrfoley commented Jun 12, 2018

The discussion of data types and data structures in "Vectors and data types" could be clarified. Defining these terms before using them would help. Also note that the first sentence of the section reads "A vector is the most common and basic data type in R, and is pretty much the workhorse of R." Perhaps "basic data type" should be changed to "basic data structure".
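
The distinction raised above can be illustrated with a small sketch using base R only. The variable names are hypothetical: a vector is the structure that holds the values, while the data type describes what kind of values it holds.

```r
# Two vectors: the same data structure, different data types.
weights <- c(1.5, 2, 3)
animals <- c("mouse", "rat", "dog")

# Both objects are vectors (the data structure).
is.vector(weights)  # TRUE
is.vector(animals)  # TRUE

# typeof() reports the data type of the elements.
typeof(weights)     # "double"
typeof(animals)     # "character"
```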

cwant commented May 7, 2018

The Survey table has a field called quant that holds what type of reading was taken. The values in this column are rad, sal, and temp. There is no legend that explains what these mean on the page where the data is introduced (the selecting data chapter). Much later in the course it's mentioned that these mean 'radiation', 'salinity' and 'temperature', but I think it would also be helpful
