"Unpacking" a Factor List from a Data.Frame

The answer depends on the format of category_list. If it is in fact a list column, holding one vector of categories per row, something like

mydf <- data.frame(ID = paste0('ID',1:3), 
category_list = I(list(c('cat1','cat2','cat3'), c('cat2','cat3'), c('cat1'))),
xval = 1:3, yval = 1:3)

or

library(data.table)
mydf <- as.data.frame(data.table(ID = paste0('ID',1:3),
category_list = list(c('cat1','cat2','cat3'), c('cat2','cat3'), c('cat1')),
xval = 1:3, yval = 1:3) )

Then you can use plyr and merge to create your long-form data:

library(plyr)
newdf <- merge(mydf, ddply(mydf, .(ID), summarize, cat_list = unlist(category_list)), by = 'ID')

   ID    category_list xval yval cat_list
1 ID1 cat1, cat2, cat3    1    1     cat1
2 ID1 cat1, cat2, cat3    1    1     cat2
3 ID1 cat1, cat2, cat3    1    1     cat3
4 ID2       cat2, cat3    2    2     cat2
5 ID2       cat2, cat3    2    2     cat3
6 ID3             cat1    3    3     cat1

Or a non-plyr approach that doesn't require merge:

do.call(rbind, lapply(split(mydf, mydf$ID), transform, cat_list = unlist(category_list)))
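
Since the later answers on this page use pandas, here is a rough pandas counterpart of the same wide-to-long reshaping. It is only a sketch: the frame below is a hand-built stand-in for mydf, and it assumes a pandas version that provides DataFrame.explode (0.25 or newer).

```python
import pandas as pd

# Stand-in for the R data frame above: one list of categories per row.
mydf = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3"],
    "category_list": [["cat1", "cat2", "cat3"], ["cat2", "cat3"], ["cat1"]],
    "xval": [1, 2, 3],
    "yval": [1, 2, 3],
})

# explode() emits one row per list element and repeats the other columns.
long_df = mydf.explode("category_list").reset_index(drop=True)
print(long_df)
```

Unlike the merge-based R answer, this replaces the list column with the unpacked values instead of keeping both side by side.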

Unpacking and merging lists in a column in data.frame

Here's a possible data.table approach

library(data.table)
setDT(dat)[, .(name = c(name, unlist(altNames))), by = id]
#       id  name
#  1: 1001  Joan
#  2: 1002  Jane
#  3: 1002 Janie
#  4: 1002 Janet
#  5: 1002   Jan
#  6: 1003  John
#  7: 1003   Jon
#  8: 1004  Bill
#  9: 1004  Will
# 10: 1005   Tom

Unpacking a list-column of multi-row tibbles while preserving the number of rows

Here's one solution using unnest_wider

library(tidyr)
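# fixed = TRUE matches the literal '...' prefix; as a regex, '...' would match any three characters
unnest_wider(tmp, y) %>%
  unnest_wider(a, names_repair = ~gsub('...', 'a', ., fixed = TRUE)) %>%
  unnest_wider(b, names_repair = ~gsub('...', 'b', ., fixed = TRUE))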

New names:
* `` -> ...1
...
New names:
* `` -> ...1
* `` -> ...2
# A tibble: 2 x 5
      x    a1    a2    b1    b2
  <dbl> <dbl> <int> <dbl> <int>
1     1     1    NA     2    NA
2     2     4     5     6     7

Unpacking unknown object from within DataFrame

It seems the column holds objects of a class with the attributes id and name, so you can try to build

{'id': st.id, 'name': st.name}

which means

df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})

or extract the values directly into separate columns

df['id']   = df['customer_details'].apply(lambda x: x.id)
df['name'] = df['customer_details'].apply(lambda x: x.name)

Example code:

import pandas as pd

class customer:
    def __init__(self, id_, name):
        self.id = id_
        self.name = name

    def __str__(self):
        return '<customer {{id: {}, name: {}}} as x>'.format(self.id, self.name)

data = {
    'trasaction_id': [1,2,3],
    'customer_details': [
        customer('A123', 'Tina'),
        customer('B456', 'Tony'),
        customer('C789', 'Tim')
    ],
}

df = pd.DataFrame(data)
print(df)

# ---

df['id'] = df['customer_details'].apply(lambda x: x.id)
df['name'] = df['customer_details'].apply(lambda x: x.name)
print(df)

df['customer_details'] = df['customer_details'].apply(lambda x: {'id': x.id, 'name': x.name})
print(df)

#new_df = pd.DataFrame( df['customer_details'].to_list() )

Result:

   trasaction_id                        customer_details
0              1  <customer {id: A123, name: Tina} as x>
1              2  <customer {id: B456, name: Tony} as x>
2              3   <customer {id: C789, name: Tim} as x>

   trasaction_id                        customer_details    id  name
0              1  <customer {id: A123, name: Tina} as x>  A123  Tina
1              2  <customer {id: B456, name: Tony} as x>  B456  Tony
2              3   <customer {id: C789, name: Tim} as x>  C789   Tim

   trasaction_id                customer_details    id  name
0              1  {'id': 'A123', 'name': 'Tina'}  A123  Tina
1              2  {'id': 'B456', 'name': 'Tony'}  B456  Tony
2              3   {'id': 'C789', 'name': 'Tim'}  C789   Tim
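
The commented-out new_df line in the example hints at a third option: once the column holds plain dicts, they can be expanded into one column per key in a single step. A minimal sketch, assuming every dict has the same keys (the frame here is rebuilt by hand so that the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    "trasaction_id": [1, 2, 3],
    "customer_details": [
        {"id": "A123", "name": "Tina"},
        {"id": "B456", "name": "Tony"},
        {"id": "C789", "name": "Tim"},
    ],
})

# One column per dict key; reuse df's index so the rows line up in the join.
details = pd.DataFrame(df["customer_details"].to_list(), index=df.index)
new_df = df.drop(columns="customer_details").join(details)
print(new_df)
```

Passing index=df.index matters when df does not have the default 0..n-1 index; without it the join would align on the wrong labels.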

EDIT: If the column holds strings instead of objects, you can use a regex to extract the values from each string

import pandas as pd
import re

data = {
    'trasaction_id': [1,2,3],
    'customer_details': [
        "<customer {id:'A123', name: 'Tina'} as x >",
        "<customer {id:'B456', name: 'Tony'} as x >",
        "<customer {id:'C789', name: 'Tim'} as x >",
    ]
}

df = pd.DataFrame(data)
print(df)

# ---

df['id'] = df['customer_details'].apply(lambda x: re.search("id:'(.*)',", x)[1])
df['name'] = df['customer_details'].apply(lambda x: re.search("name: '(.*)'}", x)[1])
print(df)

def myfunc(x):
    r = re.search("id:'(.*)', name: '(.*)'}", x)
    return {'id': r[1], 'name': r[2]}

df['customer_details'] = df['customer_details'].apply(myfunc)
print(df)

#new_df = pd.DataFrame( df['customer_details'].to_list() )
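
The patterns above use a greedy (.*) and index the match directly, so they raise a TypeError on any string that does not fit the layout. Below is a slightly more defensive sketch using only the standard library. The pattern and the helper name parse_customer are illustrative assumptions based on the sample strings, not part of the original answer.

```python
import re

# Assumed layout: "<customer {id:'...', name: '...'} as x >", as in the sample data.
# [^']* stops at the next quote, so a value can never swallow the following field.
CUSTOMER_RE = re.compile(r"id:\s*'(?P<id>[^']*)',\s*name:\s*'(?P<name>[^']*)'")

def parse_customer(text):
    """Return {'id': ..., 'name': ...}, or None when the text does not match."""
    match = CUSTOMER_RE.search(text)
    return match.groupdict() if match else None

rows = [
    "<customer {id:'A123', name: 'Tina'} as x >",
    "<customer {id:'B456', name: 'Tony'} as x >",
    "not a customer string",
]
parsed = [parse_customer(r) for r in rows]
print(parsed)
```

With a DataFrame, the same helper can be passed to apply, for example df['customer_details'].apply(parse_customer); rows that fail to parse then come back as None instead of stopping the whole run.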

How to unpack a Series of tuples in Pandas?

Maybe this is the most straightforward (and, I guess, the most Pythonic) way. Assign the result, because apply returns a new DataFrame:

out = out.apply(pd.Series)

If you want to rename the columns to something more meaningful, then:

out.columns=['Kstats','Pvalue']

If you do not want the index to carry a name:

out.index.name=None
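
apply(pd.Series) builds a Series for every row, which can get slow on long inputs. An alternative sketch builds the frame from the list of tuples in one go. The data below is made up; it assumes out is a Series of equal-length (statistic, p-value) tuples.

```python
import pandas as pd

# Hypothetical stand-in for `out`: a Series whose values are 2-tuples.
out = pd.Series([(0.12, 0.80), (0.45, 0.03)], index=["a", "b"], name="ks")

# Build the frame directly from the tuples, keeping the original index
# and naming the columns in the same call.
unpacked = pd.DataFrame(out.tolist(), index=out.index, columns=["Kstats", "Pvalue"])
print(unpacked)
```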

