Combining .SD with renamed variable messes with names of .SD columns

I assume this is because you wrap .SD in a list (.()). list(.SD) creates a list that contains .SD as a single element, rather than .SD itself. This nesting is what affects the naming.

Check str of .SD wrapped in list:

dt[, str(.(.SD)), .SDcols = my_vars]
# List of 1
# $ :Classes ‘data.table’ and 'data.frame': 5 obs. of 2 variables:
# ..$ cyl: num [1:5] 6 6 4 6 8
# ..$ vs : num [1:5] 0 0 1 1 0

Corresponding output has the .SD. prefix:

dt[, .(.SD), .SDcols = my_vars]
# .SD.cyl .SD.vs
# 1: 6 0
# 2: 6 0
# 3: 4 1
# 4: 6 1
# 5: 8 0

Check str of .SD only:

dt[, str(.SD), .SDcols = my_vars]
# Classes ‘data.table’ and 'data.frame': 5 obs. of 2 variables:
# $ cyl: num 6 6 4 6 8
# $ vs : num 0 0 1 1 0

Given the basic property of j ("As long as j returns a list, each element of the list becomes a column in the resulting data.table") and the fact that .SD already is a list (check dt[, is.list(.SD)]), we can use c() to combine .SD with other list elements, e.g. your renamed column wrapped in a list:

dt[, c(.SD, .(z = gear)), .SDcols = my_vars]
# cyl vs z
# 1: 6 0 4
# 2: 6 0 4
# 3: 4 1 4
# 4: 6 1 3
# 5: 8 0 3
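
For a self-contained check of the naming behaviour, here is a minimal sketch. It assumes that dt is the first five rows of mtcars and that my_vars is c("cyl", "vs"). Both are reconstructed from the printed output above and are not stated in the original question.

```r
library(data.table)

# Assumption: dt and my_vars are reconstructed from the output shown above
dt <- as.data.table(head(mtcars, 5))
my_vars <- c("cyl", "vs")

# .SD wrapped in .(): a list whose single element is the whole .SD
wrapped <- dt[, .(.SD), .SDcols = my_vars]

# .SD combined with another list via c(): one flat list of columns
flat <- dt[, c(.SD, .(z = gear)), .SDcols = my_vars]

# Compare the resulting column names
names(wrapped)
names(flat)
```

Comparing the two names() calls should show that only the c() version keeps the plain .SD column names next to the renamed column.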

Is there some way to keep variable names from .SD + .SDcols together with non-.SD variable names in data.table?

Not really sure why, but wrapping .SD in parentheses works ;-)

DT[, .(a, (.SD)), .SDcols=x:y]
# a x v y
# 1: 1 b 1 1
# 2: 2 b 1 3
# 3: 3 b 1 6
# 4: 4 a 2 1
# 5: 5 a 2 3
# 6: 6 a 1 6
# 7: 7 c 1 1
# 8: 8 c 2 3
# 9: 9 c 2 6

Using data.table in R, how can I supply the entire .SD to j in addition to creating new variables?

It seems to work if you put .SD in c() instead of list().

library(data.table)

mpg[,c(.SD, .(gallon_ratio = hwy/cty))]

# manufacturer model displ year cyl trans drv cty hwy fl class gallon_ratio
# 1: audi a4 1.8 1999 4 auto(l5) f 18 29 p compact 1.611111
# 2: audi a4 1.8 1999 4 manual(m5) f 21 29 p compact 1.380952
# 3: audi a4 2.0 2008 4 manual(m6) f 20 31 p compact 1.550000
# 4: audi a4 2.0 2008 4 auto(av) f 21 30 p compact 1.428571
# 5: audi a4 2.8 1999 6 auto(l5) f 16 26 p compact 1.625000
---
#230: volkswagen passat 2.0 2008 4 auto(s6) f 19 28 p midsize 1.473684
#231: volkswagen passat 2.0 2008 4 manual(m6) f 21 29 p midsize 1.380952
#232: volkswagen passat 2.8 1999 6 auto(l5) f 16 26 p midsize 1.625000
#233: volkswagen passat 2.8 1999 6 manual(m5) f 18 26 p midsize 1.444444
#234: volkswagen passat 3.6 2008 6 auto(s6) f 17 26 p midsize 1.529412

Using both vectors of column names and hard-coded columns in j

You are looking for a list. The call below returns a new data.table and does not modify the original object:

 mtcars[,c(mget(mycols), .(newcol = get(mycol3)*3))]
cyl disp newcol
1: 6 160.0 12
2: 6 160.0 12
3: 4 108.0 3
4: 6 258.0 3
5: 8 360.0 6
6: 6 225.0 3
7: 8 360.0 12
8: 4 146.7 6
9: 4 140.8 6

edit:
as @Henrik pointed out, we need the mpg variable too:

mtcars[,c(mget(mycols), .(newcol = get(mycol3)*3, mpg = mpg))]
cyl disp newcol mpg
1: 6 160.0 12 21.0
2: 6 160.0 12 21.0
3: 4 108.0 3 22.8
4: 6 258.0 3 21.4
5: 8 360.0 6 18.7

or even

mtcars[,c(mget(mycols), .(newcol = get(mycol3)*3), .(mpg = mpg))]
cyl disp newcol mpg
1: 6 160.0 12 21.0
2: 6 160.0 12 21.0
3: 4 108.0 3 22.8
4: 6 258.0 3 21.4
5: 8 360.0 6 18.7
6: 6 225.0 3 18.1

That is, just add the named variable mpg, or any other, to the list.
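
A self-contained sketch of the same idea follows. The values of mycols and mycol3 are assumptions inferred from the printed output (newcol equals carb * 3), and mtcars_dt is a hypothetical name for a data.table copy of mtcars.

```r
library(data.table)

# Assumptions: these names are inferred from the output above, not given in the question
mtcars_dt <- as.data.table(mtcars)
mycols <- c("cyl", "disp")
mycol3 <- "carb"

# mget() returns a named list of the requested columns;
# c() appends the hard-coded columns to that list
res <- mtcars_dt[, c(mget(mycols), .(newcol = get(mycol3) * 3, mpg = mpg))]

names(res)

# No := was used, so mtcars_dt still has its original columns
ncol(mtcars_dt)
```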

I want to rename a value of a column

rename is for renaming the whole column, i.e. its name. What you want is to change some values of a column.

Try

library(tidyverse)
gdp %>%
  mutate(Country = if_else(Country == 'Russian Federation', 'Russia', Country))
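
If you would rather not load the tidyverse, the same replacement can be sketched in base R with logical indexing. This is an alternative to the mutate() call above, not a part of it, and the gdp data frame here is a hypothetical stand-in for the one in the question.

```r
# Hypothetical stand-in for the gdp data from the question
gdp <- data.frame(
  Country = c("Russian Federation", "France", "Russian Federation"),
  Value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Overwrite only the matching values; the column name itself stays "Country"
gdp$Country[gdp$Country == "Russian Federation"] <- "Russia"

gdp$Country
```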

Retaining all columns in both data tables during a join, then adding a column

Here's a workaround thanks to a post by sritchie73 at the link @Henrik provided in his comment above. One solution is to copy the variables which are used in the join prior to the join so that they're retained in the result and can be used in the calculation.

# Copy loc variables
dt1[, loc1 := loc]
dt2[, loc2 := loc]

# Perform join, calculate delta, drop loc1 & loc2
dt2[dt1,
    on = c("ID", "loc"),
    roll = "nearest"][
  , delta := abs(loc1 - loc2)][
  , c("loc1", "loc2") := NULL][]

which gives,

#     ID loc         e          f         g          h          a          b          c           d delta
# 1: E 2 0.6080648 0.59558616 0.9680243 0.65885155 0.75533475 0.46796072 0.07874670 0.372224933 1
# 2: B 22 0.2900181 0.89395076 0.5012072 0.81403388 0.24129711 0.66914193 0.11941211 0.330982361 5
# 3: C 23 0.7753557 0.31772779 0.3302613 0.02004258 0.32252276 0.09341920 0.29665070 0.563954195 6
# 4: A 46 0.1193827 0.89183103 0.7142606 0.17231293 0.62979589 0.19621242 0.48943734 0.318145133 0
# 5: B 26 0.2900181 0.89395076 0.5012072 0.81403388 0.65672029 0.45106318 0.47421905 0.605327569 1
# 6: E 17 0.4417452 0.03226111 0.5975499 0.49336668 0.83821385 0.99078941 0.93356571 0.459227328 2
# 7: D 24 0.8974042 0.90725532 0.5008502 0.21681072 0.86831894 0.41260922 0.65389531 0.930843432 14
# 8: D 24 0.8974042 0.90725532 0.5008502 0.21681072 0.82042112 0.82906524 0.59829109 0.859362233 14
# 9: D 44 0.3958956 0.06361996 0.8068514 0.56349064 0.29823590 0.04765864 0.65412304 0.742808806 1
# 10: E 11 0.4417452 0.03226111 0.5975499 0.49336668 0.15013055 0.83683385 0.18847332 0.139363770 4
# 11: D 11 0.5967619 0.23497655 0.5429504 0.56322079 0.68644344 0.46995509 0.35128292 0.910443478 8
# 12: A 50 0.1193827 0.89183103 0.7142606 0.17231293 0.65811523 0.48901176 0.96854302 0.875838825 4
# 13: E 17 0.4417452 0.03226111 0.5975499 0.49336668 0.93484739 0.57810132 0.75250483 0.607710552 2
# 14: A 21 0.4491745 0.61724476 0.3283133 0.51406071 0.96610736 0.03222779 0.05768814 0.436536989 4
# 15: A 6 0.4491745 0.61724476 0.3283133 0.51406071 0.69975907 0.35564120 0.42206040 0.309386788 19
# 16: B 49 0.1152318 0.99716746 0.1440101 0.70734795 0.05138897 0.80463532 0.41856763 0.421029334 6
# 17: C 9 0.1204828 0.47622000 0.6802176 0.36385191 0.98509395 0.49711655 0.68159049 0.003570911 3
# 18: D 7 0.5967619 0.23497655 0.5429504 0.56322079 0.69862668 0.91597522 0.53630369 0.297000037 4
# 19: C 8 0.1204828 0.47622000 0.6802176 0.36385191 0.80761410 0.87051653 0.93177628 0.671692311 2
# 20: B 5 0.5652708 0.50866629 0.3992037 0.87643314 0.69493460 0.99878010 0.77953456 0.820925302 1
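
To see the mechanics on something smaller, here is a minimal sketch of the same workaround. The tables and their values are made up for illustration and are not the data from the question.

```r
library(data.table)

# Hypothetical toy tables; only the column roles mirror the question
dt1 <- data.table(ID = c("A", "B"), loc = c(10, 30), a = c(0.1, 0.2))
dt2 <- data.table(ID = c("A", "A", "B", "B"),
                  loc = c(12, 25, 20, 31),
                  e = c("p", "q", "r", "s"))

# Copy the join variable on both sides so both values survive the join
dt1[, loc1 := loc]
dt2[, loc2 := loc]

# For each row of dt1, take the dt2 row with the same ID and the nearest loc
res <- dt2[dt1, on = c("ID", "loc"), roll = "nearest"]

# Distance between the two original positions, then drop the helper columns
res[, delta := abs(loc1 - loc2)]
res[, c("loc1", "loc2") := NULL]

res
```

Note that := works by reference, so dt1 and dt2 keep their loc1 and loc2 helper columns after this runs. Remove them afterwards if you do not want them.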

Combine two data frames by rows (rbind) when they have different sets of columns

rbind.fill from the package plyr might be what you are looking for.
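
To illustrate what "filling" means here, below is a hand-rolled base-R sketch of the idea: add each missing column as NA, then bind the rows. It is not plyr's implementation, and rbind_fill_sketch is a hypothetical helper name used only for this example.

```r
# Two data frames with different sets of columns
df1 <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)
df2 <- data.frame(a = 3L, c = TRUE)

# Hypothetical helper: pad both inputs to the union of columns, then rbind
rbind_fill_sketch <- function(x, y) {
  all_cols <- union(names(x), names(y))
  for (col in setdiff(all_cols, names(x))) x[[col]] <- NA
  for (col in setdiff(all_cols, names(y))) y[[col]] <- NA
  rbind(x[all_cols], y[all_cols])
}

combined <- rbind_fill_sketch(df1, df2)
combined
```

With plyr installed, rbind.fill(df1, df2) is the packaged way to do this for any number of data frames.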

how to start out with no results and get rid of all results on blank search for Sunspot?

You can more or less solve both issues with one fix.

# /app/controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    @products = []

    # If the search param with its whitespace stripped off
    # actually has something left, then search for it
    unless params[:search].nil? || params[:search].strip.empty?
      @search = Product.search do
        fulltext params[:search]
      end
      @products = @search.results
    end

    @products
  end
end

# /app/views/search/index.haml
- if @products.empty?
  Your search did not return any results.
- else
  -# display the results or do whatever you want to do when something is actually found

Basically, what I'm proposing is that you start SearchController#index by setting @products to an empty array. If the search param is passed in, we check its stripped result (whitespace removed) and see if there's anything left.

If the user has searched a bunch of spaces, then strip will have reduced that to nothing and the search will not be run.

In the event the stripped version of the search param does actually have something in there (aka probably valid search text) then do the search and set @products to the set of results.

Finally, return @products.

In your view, you can then check the @products array to see if it's empty or not. If it's empty then either the user searched for whitespace (bogus) or their search didn't return anything... so you can take appropriate action based on that.

Splitting a dataframe string column into multiple columns without a pattern

If I understood your desired output correctly then the following code should work for that:

# Given data example
tabla2 <- data.frame(Extra = c(
"IMPACT=MODIFIER;DISTANCE=3802;STRAND=1;MES-SWA_acceptor_alt=-1.269;MES-SWA_acceptor_diff=-4.016;MES-SWA_acceptor_ref=-5.005;MES-SWA_acceptor_ref_comp=-5.285;MES-SWA_donor_alt=-6.610;MES-SWA_donor_diff=0.781;MES-SWA_donor_ref=-1.165;MES-SWA_donor_ref_comp=-5.829",
"IMPACT=MODIFIER;STRAND=1;MES-SWA_acceptor_alt=0.965;MES-SWA_acceptor_diff=0.290;MES-SWA_acceptor_ref=1.255;MES-SWA_acceptor_ref_comp=1.255;MES-SWA_donor_alt=-9.796;MES-SWA_donor_diff=-1.219;MES-SWA_donor_ref=-10.341;MES-SWA_donor_ref_comp=-11.015"
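),
stringsAsFactors = FALSE # keep Extra as character so strsplit() also works in R < 4.0
)
# Empty data frame
temp_df <- data.frame()
# Split everything by ";"
temp_list <- strsplit(tabla2$Extra, split = ";")
# Cycle through elements to fill data frame
for (i in seq_along(temp_list)) {
  # Split each "key=value" pair into its key and its value
  temp_list_2 <- strsplit(temp_list[[i]], split = "=")
  for (j in seq_along(temp_list_2)) {
    # The key becomes the column name, the value fills row i
    temp_df[i, temp_list_2[[j]][1]] <- temp_list_2[[j]][2]
  }
}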

