Aggregating All Unique Values of Each Column of Data Frame

Moved from comments:

library(data.table)

dt <- as.data.table(data)
dt[, lapply(.SD, function(x) toString(unique(x))), by = a]

giving:

   a               b      c                  d
1: 1 apples, oranges 12, 22             Monday
2: 2          apples 45, 67 Tuesday, Wednesday
3: 3      grapefruit     28            Tuesday

Aggregate unique values from multiple columns with pandas GroupBy

Use groupby and agg, and aggregate only unique values by calling Series.unique:

df.astype(str).groupby('prop1').agg(lambda x: ','.join(x.unique()))

            prop2       prop3      prop4
prop1
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0

To keep the groups in order of first appearance instead of sorted order, pass sort=False:

df.astype(str).groupby('prop1', sort=False).agg(lambda x: ','.join(x.unique()))

            prop2       prop3      prop4
prop1
L30    3,54,11,10    bob,john  11.2,10.0
K20       12,1,66  travis,leo   10.0,4.0

If handling NaNs is important, call fillna in advance:

import re
df.fillna('').astype(str).groupby('prop1').agg(
    # collapse repeated commas left by empty strings, then trim the ends
    lambda x: re.sub(',+', ',', ','.join(x.unique())).strip(',')
)

            prop2       prop3      prop4
prop1
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0
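
For a self-contained version of the steps above, here is a minimal sketch. The sample rows are hypothetical (the original frame is not shown) and were chosen only so the result resembles the output above. The helper name join_unique is also an assumption, and dropna is swapped in for the fillna plus regex approach, since it removes missing values before they are turned into strings.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original question's frame is not shown.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "L30", "L30", "K20", "K20", "K20"],
    "prop2": [3, 54, 11, 10, 12, 1, 66],
    "prop3": ["bob", "john", "bob", "john", "travis", "leo", "travis"],
    "prop4": [11.2, 10.0, np.nan, 11.2, 10.0, 4.0, 10.0],
})


def join_unique(col: pd.Series) -> str:
    """Join the distinct non-missing values of a column, in order of appearance."""
    return ",".join(col.dropna().astype(str).unique())


# One row per prop1 value; every other column becomes a comma-joined string.
result = df.groupby("prop1").agg(join_unique)
print(result)
```

Dropping missing values first means no empty strings enter the join, so no comma cleanup is needed afterwards.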

How to get unique values from multiple columns in a pandas groupby

You can do it with apply:

import numpy as np
# select several columns with a list; tuple-style ['l1','l2'] fails on current pandas
g = df.groupby('c')[['l1', 'l2']].apply(lambda x: list(np.unique(x)))
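
To see what this returns, here is a small sketch on a hypothetical frame that reuses the column names c, l1 and l2 from the snippet above; the values are made up for illustration. Note that np.unique pools both columns and sorts the result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame using the column names from the snippet above.
df = pd.DataFrame({
    "c": [1, 1, 2, 2],
    "l1": ["a", "b", "c", "b"],
    "l2": ["b", "d", "d", "e"],
})

# Select the value columns with a list, then pool both columns per group.
# np.unique flattens the sub-frame and returns its sorted distinct values.
g = df.groupby("c")[["l1", "l2"]].apply(lambda x: list(np.unique(x)))
print(g)
```

The result is a Series indexed by c whose values are Python lists, so the pooled values lose track of which column they came from.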

Aggregate in R based on unique values in column

We need to get the length of the unique elements:

aggregate(store ~ item, data = df, FUN = function(x) length(unique(x)))

Or if we are using dplyr

library(dplyr)
df %>%
  group_by(item) %>%
  summarise(storen = n_distinct(store))
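
For readers working in pandas rather than R, a rough counterpart of the count above might look like the following sketch. The data is hypothetical; only the column names item and store and the result name storen come from the R code.

```python
import pandas as pd

# Hypothetical data with the same column names as the R example.
df = pd.DataFrame({
    "item": ["pen", "pen", "pen", "ink", "ink"],
    "store": ["A", "B", "A", "C", "C"],
})

# nunique counts the distinct non-missing values within each group.
storen = df.groupby("item")["store"].nunique().rename("storen")
print(storen)
```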

Pandas, for each unique value in one column, get unique values in another column

Here are two strategies to do it. No doubt, there are other ways.

Assuming your dataframe looks something like this (obviously with more columns):

import pandas as pd
df = pd.DataFrame({'author':['a', 'a', 'b'], 'subreddit':['sr1', 'sr2', 'sr2']})

>>> df
  author subreddit
0      a       sr1
1      a       sr2
2      b       sr2
...

SOLUTION 1: groupby

More straightforward than solution 2, and similar to your first attempt:

group = df.groupby('author')

df2 = group.apply(lambda x: x['subreddit'].unique())

# Alternatively, same thing as a one liner:
# df2 = df.groupby('author').apply(lambda x: x['subreddit'].unique())

Result:

>>> df2
author
a    [sr1, sr2]
b         [sr2]

The author is the index, and the single column holds the list of all subreddits that author is active in (this is how I interpreted the output you described).

If you want each subreddit in a separate column, which may be more usable depending on what you do next, you can do this afterwards:

df2 = df2.apply(pd.Series)

Result:

>>> df2
          0    1
author
a       sr1  sr2
b       sr2  NaN

SOLUTION 2: iterate through the dataframe

You can make a new dataframe with all unique authors:

df2 = pd.DataFrame({'author':df.author.unique()})

And then just get the list of all unique subreddits they are active in, assigning it to a new column:

df2['subreddits'] = [list(set(df['subreddit'].loc[df['author'] == x['author']]))
                     for _, x in df2.iterrows()]

This gives you this:

>>> df2
  author  subreddits
0      a  [sr2, sr1]
1      b       [sr2]
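
The same table can also be built without a row loop. The sketch below is one possible variant, not part of either solution above: it selects the subreddit column before aggregating and reuses the small example frame from earlier in this answer.

```python
import pandas as pd

df = pd.DataFrame({"author": ["a", "a", "b"],
                   "subreddit": ["sr1", "sr2", "sr2"]})

# Select the column before aggregating; unique() keeps order of appearance.
df2 = (
    df.groupby("author")["subreddit"]
      .unique()
      .apply(list)                       # convert each array to a plain list
      .reset_index(name="subreddits")    # author becomes a regular column
)
print(df2)
```

Unlike the set-based loop, this keeps each author's subreddits in the order they first appear.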

Grouping unique column values to get average of each unique value in pandas dataframe column

Try casting the column to int before taking the mean per group:

df_Paid['Days'] = df_Paid['Days'].astype(int)
df_Paid.groupby(['Charge Code'])['Days'].mean()
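
If the Days column may contain entries that cannot be parsed as numbers, astype(int) will raise. The sketch below shows one way around that with pd.to_numeric, on hypothetical rows; only the names df_Paid, Charge Code and Days come from the snippet above, and avg_days is an illustrative name.

```python
import pandas as pd

# Hypothetical rows; 'Days' arrives as text, as it might after reading a CSV.
df_Paid = pd.DataFrame({
    "Charge Code": ["A1", "A1", "B2", "B2", "B2"],
    "Days": ["10", "20", "3", "n/a", "9"],
})

# errors="coerce" turns unparseable entries into NaN, which mean() skips;
# astype(int) would raise on them instead.
df_Paid["Days"] = pd.to_numeric(df_Paid["Days"], errors="coerce")
avg_days = df_Paid.groupby("Charge Code")["Days"].mean()
print(avg_days)
```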

List unique values for each column in a data frame

Let dat be your data frame after reading in the CSV file. You can then do:

ulst <- lapply(dat, unique)

If you further want to know the number of unique values for each column, do

k <- lengths(ulst)
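
A pandas counterpart of the two R calls above could be sketched as follows. The frame is hypothetical and stands in for dat; the names ulst and k are reused from the R code purely for readability.

```python
import pandas as pd

# Hypothetical frame standing in for `dat`.
dat = pd.DataFrame({"v1": [1, 2, 3, 1, 2],
                    "v2": ["a", "b", "a", "b", "a"]})

# One entry per column: the distinct values in order of first appearance.
ulst = {col: dat[col].unique().tolist() for col in dat.columns}

# Number of distinct non-missing values per column.
k = dat.nunique()
print(ulst)
print(k)
```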

Show unique values for each column

The number of unique values is different in those two columns, so you need to reduce each column to a single element.

library(tidyverse)  # purrr::map, stringr::str_c, dplyr::bind_rows, tidyr::gather

df2 <- map(df, ~ str_c(unique(.x), collapse = ",")) %>%
  bind_rows() %>%
  gather(key = col_name, value = col_unique)

> df2
# A tibble: 2 x 2
  col_name col_unique
  <chr>    <chr>
1 v1       1,2,3
2 v2       a,b
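
The same reduction can be sketched in pandas. The input frame below is hypothetical, built only so that it reduces to the two rows shown above; the column names col_name and col_unique mirror the R code.

```python
import pandas as pd

# Hypothetical input that reduces to the two rows shown above.
df = pd.DataFrame({"v1": [1, 2, 3, 1, 2],
                   "v2": ["a", "b", "a", "b", "a"]})

# Collapse each column to one comma-joined string, then reshape to long form.
df2 = (
    df.apply(lambda col: ",".join(col.astype(str).unique()))
      .rename_axis("col_name")
      .reset_index(name="col_unique")
)
print(df2)
```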

