Tidyverse summary
9/26/2023

Hi tidyverse community, I am wondering if there is a recommended tidyverse workflow when you want to summarise multiple columns in a tibble using multiple arbitrary summary functions. In addition, the results should be contained in a 'tidy' tibble.

For example, I can summarise one column multiple ways (e.g. using min() and anyNA()):

    library(tidyverse)
    iris %>% summarise_at("Petal.Width", funs(min, anyNA))

I can then extend the previous example to summarise multiple columns:

    iris %>% summarise_at(vars(Sepal.Length:Petal.Width), funs(min, anyNA))
    #> Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min
    #> Sepal.Length_anyNA Sepal.Width_anyNA Petal.Length_anyNA

Unfortunately, the above result isn't tidy. Perhaps I should use gather and spread to get the desired output:

    iris %>%
      summarise_at(vars(Sepal.Length:Petal.Width), funs(min, anyNA)) %>%

This is where I wonder if I'm heading in the wrong direction. One downside of this approach is that the logical result for anyNA() is now coerced to numeric. Is there another tidyverse way I should do this?

My current workaround is to ditch summarise_at() completely and define a function, report, which returns a one-row tibble. The first column returned is the original tibble column name. Each remaining column relates to an arbitrary summary function.

    iris %>% select(Sepal.Length:Petal.Width) %>% imap_dfr(report)

This seems OK to me and was the approach I suggested as an answer to a recent question. But I'm not sure if the workaround is necessary, or whether I've missed an easy step somewhere.

Use library() to load packages at the top of each R script.

    df1 <- read.csv("./data/sleep.csv") # base R read.csv() function
    # same as df1 <- read.csv("data/sleep.csv")
    df2 <- fread("./data/sleep.csv") # fread() from library(data.table)
    df3 <- fread("data/sleep.csv") # same as above
    # READ: assign the output returned by read.csv("data/sleep.csv") to df1

The sleep dataset is actually a built-in dataset in R. Try typing sleep in your console, and ?sleep for more info on this dataset. R has a lot of built-in datasets; type data() in the console to see what datasets are available.

I always use fread() from the data.table package to read data now. It's super intelligent and fast (reads gigabytes of data in just a few seconds).

The . in the file path simply refers to the current working directory, so it can be dropped. Similarly, .. can be used to refer to the parent directory. If your current directory is /home/desktop, then .. refers to the parent directory of your current directory, which is the /home directory.

Comparing the outputs of read.csv(x) and fread(x)

    Classes 'data.table' and 'data.frame': 20 obs.
    summary(df3) # we use summary() for many many other purposes
    $ extra 0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0, 1.9, 0.
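To make the read.csv() versus fread() comparison concrete, here is a minimal sketch, assuming the data.table package is installed. Because data/sleep.csv may not exist on your machine, it first writes the built-in sleep dataset to a temporary file. That step, and the names csv_path and same_path, are assumptions added for illustration and are not part of the tutorial.

```r
library(data.table)

# Assumption: a temporary file stands in for the tutorial's data/sleep.csv.
csv_path <- file.path(tempdir(), "sleep.csv")
write.csv(sleep, csv_path, row.names = FALSE)

df1 <- read.csv(csv_path)  # base R: returns a plain data.frame
df3 <- fread(csv_path)     # data.table: returns a data.table (also a data.frame)

class(df1)
class(df3)

# "./sleep.csv" and "sleep.csv" resolve to the same file, because "." is
# just the current working directory.
old_wd <- setwd(tempdir())
same_path <- normalizePath("./sleep.csv") == normalizePath("sleep.csv")
setwd(old_wd)  # restore the original working directory
```

Both objects hold the same 20 rows. The difference is the class: anything that accepts a data.frame should also accept the fread() result.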
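For the summarise question, the one-row-tibble workaround might look something like the sketch below. The body of report() is an assumption. The post only says that the function returns the original column name first and then one column per summary function, so the column names variable, min and anyNA are hypothetical.

```r
library(tidyverse)

# Hypothetical helper: imap_dfr() passes each column as .x and its name as .y.
report <- function(.x, .y) {
  tibble(
    variable = .y,         # original column name
    min      = min(.x),    # numeric summary
    anyNA    = anyNA(.x)   # logical summary, stays logical
  )
}

result <- iris %>%
  select(Sepal.Length:Petal.Width) %>%
  imap_dfr(report)         # row-binds the one-row tibbles

result
```

Because each summary lives in its own column, anyNA keeps its logical type instead of being coerced to numeric, which is the issue the post raises with the gather and spread route.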