## Add column which contains binned values of a numeric column

See `?cut` and specify `breaks` (and maybe `labels`).

    # intervals are right-closed by default: (0,4], (4,10], (10,15]
    x$bins <- cut(x$rank, breaks=c(0,4,10,15), labels=c("1-4","5-10","11-15"))
    x
    #   rank  name   info  bins
    # 1    1 steve    red   1-4
    # 2    3   joe   blue   1-4
    # 3    6  john  green  5-10
    # 4    3   liz yellow   1-4
    # 5   15   jon   pink 11-15

## Binning a column with Python Pandas

You can use `pandas.cut`:

    bins = [0, 1, 5, 10, 25, 50, 100]
    df['binned'] = pd.cut(df['percentage'], bins)
    print(df)
       percentage     binned
    0       46.50   (25, 50]
    1       44.20   (25, 50]
    2      100.00  (50, 100]
    3       42.12   (25, 50]

    bins = [0, 1, 5, 10, 25, 50, 100]
    labels = [1,2,3,4,5,6]
    df['binned'] = pd.cut(df['percentage'], bins=bins, labels=labels)
    print(df)
       percentage binned
    0       46.50      5
    1       44.20      5
    2      100.00      6
    3       42.12      5
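The snippets above assume that `pd` and `df` already exist. As a minimal self-contained sketch, the following rebuilds the frame from the four percentages shown in the output and applies `pd.cut` with and without labels. The column names `interval` and `label` and the variables `edge_default` and `edge_included` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data taken from the printed output above.
df = pd.DataFrame({"percentage": [46.50, 44.20, 100.00, 42.12]})

bins = [0, 1, 5, 10, 25, 50, 100]

# Without labels: each value maps to a right-closed interval such as (25, 50].
df["interval"] = pd.cut(df["percentage"], bins=bins)

# With labels: one label per interval, so len(labels) == len(bins) - 1.
labels = [1, 2, 3, 4, 5, 6]
df["label"] = pd.cut(df["percentage"], bins=bins, labels=labels)

# The lowest edge (0) is excluded by default, so a value of exactly 0 becomes NaN
# unless include_lowest=True is passed.
edge_default = pd.cut(pd.Series([0.0]), bins=bins)
edge_included = pd.cut(pd.Series([0.0]), bins=bins, include_lowest=True)

print(df)
```

If the data can contain values on the lowest edge or outside the edges entirely, it is worth checking the result for `NaN` before counting.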

Or `numpy.searchsorted`:

    bins = [0, 1, 5, 10, 25, 50, 100]
    df['binned'] = np.searchsorted(bins, df['percentage'].values)
    print(df)
       percentage  binned
    0       46.50       5
    1       44.20       5
    2      100.00       6
    3       42.12       5
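To see why `searchsorted` gives the same numbers as the labelled `cut` call, here is a small sketch comparing the two. It assumes the same bins and the four sample values from the output above; the names `values`, `idx` and `codes` are illustrative.

```python
import numpy as np
import pandas as pd

bins = [0, 1, 5, 10, 25, 50, 100]
values = np.array([46.50, 44.20, 100.00, 42.12])

# side="left" (the default) returns the first index i with bins[i] >= value,
# so i identifies the right-closed interval (bins[i-1], bins[i]].
idx = np.searchsorted(bins, values)

# pd.cut on an array returns a Categorical; its codes are the zero-based
# positions of the same intervals.
codes = pd.cut(values, bins=bins).codes

print(idx)
print(codes + 1)
```

The two approaches differ for out-of-range input: `searchsorted` still returns an index (0, or `len(bins)` for values above the last edge), whereas `cut` returns `NaN`.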

...and then `value_counts` or `groupby` and aggregate `size`:

    s = pd.cut(df['percentage'], bins=bins).value_counts()
    print(s)
    (25, 50]     3
    (50, 100]    1
    (10, 25]     0
    (5, 10]      0
    (1, 5]       0
    (0, 1]       0
    Name: percentage, dtype: int64

    s = df.groupby(pd.cut(df['percentage'], bins=bins)).size()
    print(s)
    percentage
    (0, 1]       0
    (1, 5]       0
    (5, 10]      0
    (10, 25]     0
    (25, 50]     3
    (50, 100]    1
    dtype: int64

By default, `cut` returns a categorical. `Series` methods like `Series.value_counts()` will use all categories, even if some categories are not present in the data (see the pandas documentation on operations with categoricals).
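A short sketch of what this means in practice, assuming the same sample data as above. The variable names (`binned`, `counts`, `non_empty`, `observed`) are illustrative, and filtering on `counts > 0` is just one possible way to drop empty bins.

```python
import pandas as pd

df = pd.DataFrame({"percentage": [46.50, 44.20, 100.00, 42.12]})
bins = [0, 1, 5, 10, 25, 50, 100]

binned = pd.cut(df["percentage"], bins=bins)

# The result is categorical: all six intervals are categories,
# even though only two of them occur in the data.
print(binned.cat.categories)

# value_counts reports every category, including those with a count of 0.
counts = binned.value_counts()

# Keep only the bins that actually occur.
non_empty = counts[counts > 0]

# observed=True asks groupby to skip unused categories as well.
observed = df.groupby(binned, observed=True).size()

print(counts)
print(observed)
```

Passing `observed` explicitly also avoids depending on the default, which has differed between pandas versions.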

## How do I reassign the values of a column based on different ranges in R?

We could use `case_when` from the `dplyr` package:

    library(dplyr)
    df %>%
      mutate(NEW = case_when(sleep_duration < 5 ~ 3,
                             sleep_duration >= 5 & sleep_duration < 6 ~ 2,
                             sleep_duration >= 6 & sleep_duration < 7 ~ 1,
                             sleep_duration >= 7 ~ 0))

Output:

      sleep_duration NEW
    1            6.0   1
    2            7.5   0
    3            8.0   0
    4           10.0   0
    5            5.0   2
    6            9.0   0

Data:

    df <- data.frame(sleep_duration = c(6, 7.5, 8, 10, 5, 9))
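For readers working in pandas rather than R, the same range-to-value mapping can be sketched with `pd.cut`. This is a counterpart under stated assumptions, not the answer's method: the open-ended bins are expressed with `np.inf`, and `right=False` is used so that intervals are left-closed like the `>=` / `<` conditions above.

```python
import numpy as np
import pandas as pd

# Same data as the R example.
df = pd.DataFrame({"sleep_duration": [6, 7.5, 8, 10, 5, 9]})

# right=False makes the intervals left-closed:
# [-inf, 5), [5, 6), [6, 7), [7, inf)
df["NEW"] = pd.cut(
    df["sleep_duration"],
    bins=[-np.inf, 5, 6, 7, np.inf],
    labels=[3, 2, 1, 0],
    right=False,
)

print(df)
```

The resulting column is categorical; `df["NEW"].astype(int)` converts it to plain integers if needed.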

## How to bin data based on values in one column, and count occurrences from another column excluding duplicates in R?

Will this work?

    df <- data.frame(CNV=c("1:10405137","1:10405137","1:10405137","1:101161140","1:110028467"),
                     r_value=c(0.035118621,0.070643341,0.391963719,0.376573375,0.950231679))
    > df # minimal example
              CNV    r_value
    1  1:10405137 0.03511862
    2  1:10405137 0.07064334
    3  1:10405137 0.39196372
    4 1:101161140 0.37657337
    5 1:110028467 0.95023168

    df1 <- transform(df, group=cut(r_value,
                                   breaks=c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 1),
                                   labels=c("<0.1", "0.1", "0.2", "0.3", "0.4", "0.5<")))
    res <- do.call(data.frame, aggregate(r_value~group, df1,
                                         FUN=function(x) c(Count=length(x))))
    > res # counts of intervals
      group r_value
    1  <0.1       2
    2   0.3       2
    3  0.5<       1

    dNew <- data.frame(group=levels(df1$group))
    dNew <- merge(res, dNew, all=TRUE)
    colnames(dNew) <- c("interval","count")
    > dNew # count of CNV by interval
      interval count
    1     <0.1     2
    2      0.1    NA
    3      0.2    NA
    4      0.3     2
    5      0.4    NA
    6     0.5<     1

Adapted from "Group/bin/bucket data in R and get count per bucket and sum of values per bucket".

## Add column into a dataframe based on condition

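EDIT:

Your code is not wrong.

You just have to map the result back to the factor levels, like this:

    # stringsAsFactors = TRUE is needed in R >= 4.0, where strings are no longer factors by default
    df <- data.frame(B=c("A","B","C","C"), C=c("A","C","B","B"), D=c("B","A","C","A"),
                     stringsAsFactors = TRUE)
    df$A <- levels(df$B)[with(df, ifelse(B == C, D, C))]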

To see why this happens, look at what `ifelse` does:

    debugonce(ifelse)
    ifelse(df$B==df$C,df$D,df$C)

Keep in mind: "Factor variables are stored, internally, as numeric variables together with their levels. The actual values of the numeric variable are 1, 2, and so on."

In particular, `ifelse` builds its answer vector from the logical `test`, so you start with a logical vector. It then subsets this `ans` vector, assigning the `yes` values where the test is true and the `no` values elsewhere. Because the target is a plain vector, R keeps only the underlying integer codes and drops the factor levels.

In short, something like this happens, and you lose the factor representation:

    a <- c(TRUE, FALSE)
    a[1] <- df$D[1]
    df$D
    a

Also try this working example (an alternative way to do the same thing):

    df <- data.frame(B=c("A","B","C","C"), C=c("A","C","B","B"), D=c("B","A","C","A"))
    df
    # return z when x and y match, otherwise y
    f <- function(x, y, z) {
      if (x == y) {
        z
      } else {
        y
      }
    }
    # apply f row by row and flatten the resulting list
    df$A <- unlist(Map(f, df$B, df$C, df$D))
