How to Create Variable Columns and Fill Them Up

How to create variable columns and fill them up?

CSS multi-column layout (column-width and column-gap) can do this:

#parent {
background-color: firebrick;
column-width:120px; /* set the column width; the number of columns is then chosen automatically */
column-gap: 20px; /* replaces the horizontal margin between elements */
padding:0 10px;
}

.child {
background-color: #fff;
height: 30px;
display:inline-block; /* inline-block keeps a child from being split across two columns */
width:100%; /* full width, so each child still behaves like a block */
margin:10px 0; /* keep only top and bottom margins */
padding: 3px;
box-sizing:border-box;
}

<div id="parent">
<div class="child">child</div>
<div class="child">child</div>
<div class="child">child</div>
<div class="child">child</div>
<div class="child">child</div>
<div class="child">child</div>
<div class="child">child</div>
<div class="child">child</div>
</div>

How do I create a new variable in my dataframe filling the values with the dataframe name?

Before you concatenate/join your dataframes, add a new column to each one holding the country's name as the default value, then concatenate.

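import pandas as pd

print(df.name)  # assumes a .name attribute was set on each frame earlier
# Iran
print(df2.name)
# United States of America

df['Name'] = df.name  # the scalar name is repeated on every row
df2['Name'] = df2.name
countryDF = pd.concat([df, df2], axis=0).reset_index(drop=True)  # stack the rows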

I don't know what further manipulations you want to do (e.g. cutting out columns), so this only covers adding the name column.
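A minimal sketch of the same idea that does not rely on a custom .name attribute (pandas does not carry such attributes through operations like copy() or slicing): keep the frames in a dict keyed by country name and add the column with DataFrame.assign before concatenating. The frames and their year/value columns are hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy frames; the question's real data is not shown
frames = {
    "Iran": pd.DataFrame({"year": [2000, 2001], "value": [1.0, 2.0]}),
    "United States of America": pd.DataFrame({"year": [2000, 2001], "value": [3.0, 4.0]}),
}

# assign() adds a "Name" column holding the same scalar (the country name) on every row;
# ignore_index=True renumbers the stacked rows 0..n-1
countryDF = pd.concat(
    [frame.assign(Name=name) for name, frame in frames.items()],
    ignore_index=True,
)
print(countryDF)
```

Passing the dict itself to pd.concat would instead put the country names in the outer level of the row index rather than in a column.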

Create a new column and fill it with values from a set of multiple columns, conditional on column names

One option to achieve your desired result would be via an if condition:

library(dplyr)
library(stringr)
df %>%
rowwise() %>%
mutate(new_col = if (str_c('A0', X) %in% names(.)) get(str_c('A0', X)) else NA) %>%
ungroup()
#> # A tibble: 8 × 9
#> A01 A02 A03 A04 A05 A06 A07 X new_col
#> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 0 0 -5 -1 -1 2 3 2 0
#> 2 0 -1 -4 -3 -3 -3 -3 2 -1
#> 3 2 0 2 3 1 3 3 6 3
#> 4 0 1 -4 1 -1 1 1 7 1
#> 5 4 4 3 3 3 4 4 12 NA
#> 6 1 4 2 -3 0 0 0 15 NA
#> 7 10 9 8 9 7 7 7 22 NA
#> 8 10 12 12 12 10 12 9 24 NA
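For comparison, a rough pandas translation of the same row-wise lookup (a sketch, not part of the original R answer): build the column name "A0" + X for each row and return that cell if the column exists, otherwise NaN. The frame reuses the first five rows of the output above.

```python
import numpy as np
import pandas as pd

# First five rows of the example data shown above
df = pd.DataFrame(
    [[0, 0, -5, -1, -1, 2, 3, 2],
     [0, -1, -4, -3, -3, -3, -3, 2],
     [2, 0, 2, 3, 1, 3, 3, 6],
     [0, 1, -4, 1, -1, 1, 1, 7],
     [4, 4, 3, 3, 3, 4, 4, 12]],
    columns=["A01", "A02", "A03", "A04", "A05", "A06", "A07", "X"],
)

def pick_column(row):
    """Return the value of column "A0<X>" for this row, or NaN if no such column exists."""
    col = f"A0{row['X']}"
    return row[col] if col in row.index else np.nan

# axis=1 calls pick_column once per row, like rowwise() in dplyr
df["new_col"] = df.apply(pick_column, axis=1)
print(df)
```

Like the rowwise() version, this runs once per row, so on large frames a vectorized approach may be noticeably faster.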

Turn field names into column names for specific variables and fill them with certain logic

Not sure I fully understand the question, but the code below produces your example dataframe.

library(tidyverse)
product<-c("ab","ab","ab","ac","ac","ac")
shop<-c("sad","sad","sad","sadas","fghj","xzzv")
category<-c("a","a","a","c","b","b")
tempr<-c(35,35,14,24,14,5)
value<-c(0,0,-6,8,4,0)
store<-data.frame(product,shop,category,tempr,value)

store %>% filter(value != 0 ) %>% # remove 0 values
mutate(combined = paste0(tempr,"(",value,")")) %>% # combine columns for spread
select(-tempr,-value) %>% # drop the columns that were combined
spread(shop,combined) # spread to create shop columns holding the tempr(value) strings

# product category fghj sad sadas
# 1 ab a <NA> 14(-6) <NA>
# 2 ac b 14(4) <NA> <NA>
# 3 ac c <NA> <NA> 24(8)
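A comparable reshape in pandas, as a sketch rather than part of the original answer: filter out the zeros, build the "tempr(value)" string, then pivot with shop as the new columns. It assumes pandas 1.1 or later, where pivot accepts a list for index.

```python
import pandas as pd

store = pd.DataFrame({
    "product": ["ab", "ab", "ab", "ac", "ac", "ac"],
    "shop": ["sad", "sad", "sad", "sadas", "fghj", "xzzv"],
    "category": ["a", "a", "a", "c", "b", "b"],
    "tempr": [35, 35, 14, 24, 14, 5],
    "value": [0, 0, -6, 8, 4, 0],
})

kept = store[store["value"] != 0].copy()  # remove 0 values
kept["combined"] = (kept["tempr"].astype(str) + "("
                    + kept["value"].astype(str) + ")")  # e.g. "14(-6)"
wide = (kept.pivot(index=["product", "category"], columns="shop", values="combined")
            .reset_index())  # one column per shop, NaN where a product has no entry
wide.columns.name = None  # drop the leftover "shop" label on the column index
print(wide)
```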

Create new sequentially named variables and fill with mean of level

If I understood you correctly, I'd propose this giant ball of duct tape:

# fake data
dummydata <- data.frame(id=c(1:100),sex=rep(c(1,0),50),WBC=rnorm(100),RBC=rnorm(100))

# a function to calculate decile means
decilemean <- function(x) {
xrank <- rank(x)
xdec <- floor((xrank-1)/length(x)*10)+1
decmeans <- as.numeric(tapply(x,xdec,mean))
xdecmeans <- decmeans[xdec]
return(xdecmeans)
}

# loop through your data columns and make new columns
newcol <- 5 # the first new column to create
for(j in c(3,4)) { # all of your columns to decilemean-ify
dummydata[,newcol] <- NA
dummydata[dummydata$sex==0,newcol] <- decilemean(dummydata[dummydata$sex==0,j])
names(dummydata)[newcol] <- paste0(names(dummydata)[j],"_decmean_women")
dummydata[,newcol+1] <- NA
dummydata[dummydata$sex==1,newcol+1] <- decilemean(dummydata[dummydata$sex==1,j])
names(dummydata)[newcol+1] <- paste0(names(dummydata)[j],"_decmean_men")
newcol <- newcol+2
}

I'd recommend testing it, though ;)
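A pandas sketch of the same loop, offered as a translation rather than the original method. decile_mean uses the same rank rule as decilemean() above, written as floor((rank - 1) * 10 / n) + 1. The data is deterministic toy data instead of rnorm() so the result is easy to check, and it assumes there are no missing values in the measured columns.

```python
import numpy as np
import pandas as pd

def decile_mean(x: pd.Series) -> pd.Series:
    """Replace each value by the mean of its decile (deciles numbered 1..10)."""
    # multiplying before dividing avoids one floating-point rounding step
    dec = np.floor((x.rank() - 1) * 10 / len(x)).astype(int) + 1
    return x.groupby(dec).transform("mean")

# Deterministic toy data (hypothetical), 20 rows per sex
data = pd.DataFrame({
    "id": range(1, 41),
    "sex": [1, 0] * 20,
    "WBC": [float(i) for i in range(40)],
    "RBC": [float(40 - i) for i in range(40)],
})

for col in ["WBC", "RBC"]:
    for code, label in [(0, "women"), (1, "men")]:
        mask = data["sex"] == code
        # rows of the other sex stay NaN, like the NA-filled columns in the R loop
        data.loc[mask, f"{col}_decmean_{label}"] = decile_mean(data.loc[mask, col])
print(data.head())
```

Naming the new columns from the loop variables replaces the manual newcol counter in the R version.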

Creating columns for each observed value of a variable

A base R approach: split the outcome column by id and, within each group, build one row per observation containing all earlier outcome values, padded with NA up to n columns. Then rbind the per-group results into a single matrix and assign it to the new columns (this assumes df is sorted by id, as in the data below).

n <- 5
df[paste0("outcome_t", seq_len(n))] <- do.call(rbind,
lapply(split(df$outcome, df$id), function(x)
t(sapply(seq_along(x), function(y) c(x[seq_len(y - 1)], rep(NA, n - (y - 1)))))))

df
# id t outcome outcome_t1 outcome_t2 outcome_t3 outcome_t4 outcome_t5
#1 1 1 10 NA NA NA NA NA
#2 1 2 20 10 NA NA NA NA
#3 1 3 30 10 20 NA NA NA
#4 1 4 40 10 20 30 NA NA
#5 1 5 40 10 20 30 40 NA
#6 2 1 20 NA NA NA NA NA
#7 2 2 30 20 NA NA NA NA
#8 2 3 40 20 30 NA NA NA
#9 2 4 40 20 30 40 NA NA
#10 2 5 20 20 30 40 40 NA

A tidyverse option using separate():

library(tidyverse)

df %>%
group_by(id) %>%
mutate(new = map_chr(seq_along(outcome),
~paste0(outcome[seq_len(. - 1)], collapse = ","))) %>%
separate(new, into = paste0("outcome_t", seq_len(n)),
sep = ",", fill = "right") %>%
mutate(outcome_t1 = replace(outcome_t1, outcome_t1 == "", NA))

Data:

df <- data.frame(id = rep(c(1, 2), each = 5), t = 1:5, 
outcome = c(10, 20, 30, 40, 40, 20, 30, 40, 40, 20))
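A comparable pandas sketch (a translation, not part of the original answer): number the rows within each id with cumcount(), then for k = 1..n take the group's k-th outcome and keep it only on rows that come after it. Like the R code, it assumes rows are already ordered by t within each id.

```python
import pandas as pd

n = 5
df = pd.DataFrame({
    "id": [1] * 5 + [2] * 5,
    "t": list(range(1, 6)) * 2,
    "outcome": [10, 20, 30, 40, 40, 20, 30, 40, 40, 20],
})

pos = df.groupby("id").cumcount()  # 0-based position of each row within its id
for k in range(1, n + 1):
    # k-th outcome of each id, broadcast to every row of that id (NaN if the id is too short)
    kth = df.groupby("id")["outcome"].transform(
        lambda s: s.iloc[k - 1] if len(s) >= k else float("nan")
    )
    # a row only "knows" the k-th outcome once it is past position k - 1
    df[f"outcome_t{k}"] = kth.where(pos >= k)
print(df)
```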

How to create columns/variables by extracting characters from given column in R

Try

library(tidyr)
df_sep <- separate(df, key, into=c("State","Zip_Code", "Age_Group", "Race", "Gender"), sep="_")

State Zip_Code Age_Group Race Gender date census
1 01 35004 10-14 + M 11NOV2001 2.934397
2 01 35004 10-14 + M 06JAN2002 3.028231
3 01 35004 10-14 + M 07APR2002 3.180712
4 01 35004 10-14 + M 02JUN2002 3.274546
5 01 35004 10-14 + M 28JUL2002 3.368380
6 01 35004 10-14 + M 22SEP2002 3.462214
7 01 35004 10-14 + M 22DEC2002 3.614694
8 01 35004 10-14 + M 16FEB2003 3.708528
9 01 35004 10-14 + M 13JUL2003 3.954843
10 01 35004 10-14 + M 07SEP2003 4.048677
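In pandas, a comparable split (a sketch, not part of the original answer) is Series.str.split with expand=True, which returns one column per piece. The two input rows are taken from the example data in this answer.

```python
import pandas as pd

# Two rows from the example data
df = pd.DataFrame({
    "key": ["01_35004_10-14_+_M", "01_35004_10-14_+_M"],
    "date": ["11NOV2001", "06JAN2002"],
    "census": [2.934397, 3.028231],
})

parts = df["key"].str.split("_", expand=True)  # one column per "_"-separated piece
parts.columns = ["State", "Zip_Code", "Age_Group", "Race", "Gender"]
# like separate(), drop key and put the new columns in its place
df_sep = pd.concat([parts, df.drop(columns="key")], axis=1)
print(df_sep)
```

The pieces stay strings, so the leading zero in State is kept, as with separate()'s default.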

Edit: In your comments you made it clear that you want a solution that loops through observations. That approach is inefficient and, for good reason, usually considered bad practice. With that caveat, here is one way to do it:

First, add the (empty) columns to the dataframe. Using your approach, this would be:

Var = c("State","Zip_Code", "Age_Group", "Race", "Gender")
for(j in Var){
df <- within(df, assign(j, NA))
}

However, a more efficient approach would be:

df[, Var]<- NA

Both give:

head(df)
key date census State Zip_Code Age_Group Race Gender
1 01_35004_10-14_+_M 11NOV2001 2.934397 NA NA NA NA NA
2 01_35004_10-14_+_M 06JAN2002 3.028231 NA NA NA NA NA
3 01_35004_10-14_+_M 07APR2002 3.180712 NA NA NA NA NA
4 01_35004_10-14_+_M 02JUN2002 3.274546 NA NA NA NA NA
5 01_35004_10-14_+_M 28JUL2002 3.368380 NA NA NA NA NA
6 01_35004_10-14_+_M 22SEP2002 3.462214 NA NA NA NA NA

Now, for each observation, we split key into its components and fill columns 4 to 8 with the corresponding elements:

df[, Var] <- t(sapply(df$key, function(x) unlist(strsplit(as.character(x[1]), "_"))))

Here, sapply loops through the elements of df$key, passes each element as the argument to the function I defined, and collects the results in a matrix (one column per observation).

See:

sapply(df$key, function(x) unlist(strsplit(as.character(x[1]), "_")))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "01" "01" "01" "01" "01" "01" "01" "01" "01" "01"
[2,] "35004" "35004" "35004" "35004" "35004" "35004" "35004" "35004" "35004" "35004"
[3,] "10-14" "10-14" "10-14" "10-14" "10-14" "10-14" "10-14" "10-14" "10-14" "10-14"
[4,] "+" "+" "+" "+" "+" "+" "+" "+" "+" "+"
[5,] "M" "M" "M" "M" "M" "M" "M" "M" "M" "M"

Transposing that matrix with t() turns it into one row per observation, so that it fits into df[, Var]. The result is identical to the separate() output:

identical(df[,Var], df_sep[Var])
[1] TRUE

I assume some entries in df$key differ in format, which is why you may want to check each value first. To do so, you can extend the function inside the sapply call.
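If you do want to loop and check each key, a minimal Python sketch of such an extended function might look like this. The validation rule (exactly five "_"-separated parts, otherwise all missing) and the malformed second key are hypothetical, chosen only to illustrate the check.

```python
import pandas as pd

VARS = ["State", "Zip_Code", "Age_Group", "Race", "Gender"]

def split_key(key):
    """Split one key on "_"; return all-missing if the format looks wrong (hypothetical rule)."""
    pieces = str(key).split("_")
    if len(pieces) != len(VARS):
        return [None] * len(VARS)
    return pieces

# The second key is a made-up malformed value, only to exercise the check
df = pd.DataFrame({"key": ["01_35004_10-14_+_M", "01_35004_bad"]})
rows = [split_key(k) for k in df["key"]]  # loop over observations, like sapply
df = df.join(pd.DataFrame(rows, index=df.index, columns=VARS))
print(df)
```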

How do I fill a column with one value in Pandas?

Just select the column and assign to it as usual:

In [194]:
df['A'] = 'foo'
df

Out[194]:
A
0 foo
1 foo
2 foo
3 foo

Assigning a scalar value sets every row of the column to that same value.
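A short sketch of the same broadcast outside IPython, plus DataFrame.assign, which returns a new frame instead of modifying the original. Column B is a hypothetical existing column; the scalar is repeated once for each of its rows.

```python
import pandas as pd

df = pd.DataFrame({"B": [1, 2, 3, 4]})  # hypothetical existing column
df["A"] = "foo"       # the scalar is broadcast to every existing row
df2 = df.assign(C=0)  # same idea, but returns a new DataFrame; df is unchanged
print(df2)
```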


