Recursive Definitions in Pandas

Recursive definitions in Pandas

As I noted in a comment, you can use scipy.signal.lfilter to compute a recursion of the form B[k] = a*A[k] + b*B[k-1]. In this case (assuming A is a one-dimensional numpy array), all you need is:

B = lfilter([a], [1.0, -b], A)

Here's a complete script:

import numpy as np
from scipy.signal import lfilter

np.random.seed(123)

A = np.random.randn(10)
a = 2.0
b = 3.0

# Compute the recursion using lfilter.
# [a] and [1, -b] are the coefficients of the numerator and
# denominator, resp., of the filter's transfer function.
B = lfilter([a], [1, -b], A)

print(B)

# Compare to a simple loop.
B2 = np.empty(len(A))
for k in range(0, len(B2)):
    if k == 0:
        B2[k] = a*A[k]
    else:
        B2[k] = a*A[k] + b*B2[k-1]

print(B2)

print("max difference:", np.max(np.abs(B2 - B)))

The output of the script is:

[ -2.17126121e+00  -4.51909273e+00  -1.29913212e+01  -4.19865530e+01
  -1.27116859e+02  -3.78047705e+02  -1.13899647e+03  -3.41784725e+03
  -1.02510099e+04  -3.07547631e+04]
[ -2.17126121e+00  -4.51909273e+00  -1.29913212e+01  -4.19865530e+01
  -1.27116859e+02  -3.78047705e+02  -1.13899647e+03  -3.41784725e+03
  -1.02510099e+04  -3.07547631e+04]
max difference: 0.0
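
To see why [a] and [1, -b] are the right coefficients, it may help to write the recursion as a transfer function. The following is a short derivation, assuming zero initial conditions (B[k] = 0 for k < 0). Note that lfilter's own parameter names, b for the numerator and a for the denominator, are unrelated to the scalars a and b used in the script.

```latex
\begin{align*}
B[k] &= a\,A[k] + b\,B[k-1] \\
B(z) &= a\,A(z) + b\,z^{-1}B(z) \\
H(z) &= \frac{B(z)}{A(z)} = \frac{a}{1 - b\,z^{-1}}
\end{align*}
```

The numerator has the single coefficient a, and the denominator has the coefficients 1 and -b, which are the two lists passed to lfilter.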

Another example, in IPython, using a pandas DataFrame instead of a numpy array:

If you have

In [12]: df = pd.DataFrame([1, 7, 9, 5], columns=['A'])

In [13]: df
Out[13]:
   A
0  1
1  7
2  9
3  5

and you want to create a new column, B, such that B[k] = A[k] + 2*B[k-1] (with B[k] == 0 for k < 0), you can write

In [14]: df['B'] = lfilter([1], [1, -2], df['A'].astype(float))

In [15]: df
Out[15]:
   A   B
0  1   1
1  7   9
2  9  27
3  5  59
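
If scipy is not available, the same first-order recursion can be sketched with the standard library alone. The helper below is only an illustration: the name linear_recursion and its start parameter (the assumed value of B[-1]) are not part of the original answer, and the initial keyword of itertools.accumulate requires Python 3.8 or later.

```python
from itertools import accumulate


def linear_recursion(values, a, b, start=0.0):
    """Return B with B[k] = a*values[k] + b*B[k-1], taking B[-1] = start.

    Hypothetical helper for illustration; not part of scipy or pandas.
    """
    steps = accumulate(values, lambda prev, x: a * x + b * prev, initial=start)
    next(steps)  # drop the seed value, which is not part of the result
    return list(steps)


# Same numbers as the DataFrame example: B[k] = A[k] + 2*B[k-1].
print(linear_recursion([1, 7, 9, 5], a=1, b=2))  # [1.0, 9.0, 27.0, 59.0]
```

With a DataFrame, something like df['B'] = linear_recursion(df['A'], 1, 2) should work, although for long columns lfilter is likely to be much faster than a Python-level loop.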

Define recursive function in Pandas dataframe

You could try something like this.

import pandas as pd
import numpy as np
df = pd.DataFrame({'date': [1,2,3,4,5,6],
                   'col_1': [951, 909, 867, 844, 824, 826],
                   'col_2': [179, 170, 164, 159, 153, 149]})

col_2_update_list = []

for i, row in df.iterrows():
    if i != 0:
        today_col_1 = df.at[i, 'col_1']
        prev_day_col_2 = df.at[i-1, 'col_2']
        new_col_2_val = prev_day_col_2 * today_col_1
        col_2_update_list.append(new_col_2_val)
    else:
        # The first row has no previous day, so there is nothing to multiply.
        col_2_update_list.append(np.nan)

df['updated_col_2'] = col_2_update_list
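
Because each new value in the loop above depends only on the original col_2 of the previous row, and not on a previously computed result, the same column can probably be built without an explicit loop. A minimal sketch using Series.shift, with the same example data:

```python
import pandas as pd

df = pd.DataFrame({'date': [1, 2, 3, 4, 5, 6],
                   'col_1': [951, 909, 867, 844, 824, 826],
                   'col_2': [179, 170, 164, 159, 153, 149]})

# Previous row's col_2 times the current row's col_1. Row 0 has no
# predecessor, so shift() leaves NaN there, as in the loop version.
df['updated_col_2'] = df['col_2'].shift(1) * df['col_1']
print(df)
```

This shortcut does not apply to a truly recursive definition, where each value depends on the previously computed one; in that case a loop or a filter-based approach is still needed.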

Recursive Dictionary for Pandas Dataframe

Try:

df.groupby([0,1]).agg(list).to_dict('index')

{('a', 'a'): {'index': [0, 1], '2': [0.2, 0.4]},
 ('a', 'b'): {'index': [0, 1], '2': [0.4, 0.7]}}
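
The input frame is not shown in the answer, so here is a self-contained sketch with a hypothetical DataFrame. Its column labels (0, 1, 'index' and '2') and values are assumptions chosen only so that the result matches the dictionary above.

```python
import pandas as pd

# Hypothetical input; labels and values are assumed, not taken from the question.
df = pd.DataFrame({
    0: ['a', 'a', 'a', 'a'],
    1: ['a', 'b', 'a', 'b'],
    'index': [0, 0, 1, 1],
    '2': [0.2, 0.4, 0.4, 0.7],
})

# Collect each remaining column into a list per (0, 1) group, then turn
# every group row into an inner dictionary keyed by column name.
nested = df.groupby([0, 1]).agg(list).to_dict(orient='index')
print(nested)
```

The outer keys are tuples of the group values, and each inner dictionary maps a column name to the list of values in that group.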

Pandas - Recursively look for children in dataframe

If you only want to print an indented graph, you could use a simple recursive function:

def desc(i, indent=0):
    print(' '*indent + i)
    for j in df.loc[df['id2'] == i, 'id1']:
        desc(j, indent + 2)

for i in ('111', '222'): desc(i)

With the example df, it gives:

111
aaa
ccc
333
222
bbb
zzz
999
888
ddd
eee
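
The recursion above never terminates if the data contains a cycle (a node that is its own ancestor). The following is a hedged variant that returns the lines instead of printing them and stops when it meets a node already on the current path. The DataFrame here is hypothetical; only the column names id1 (child) and id2 (parent) come from the answer, and desc_lines is an illustrative name.

```python
import pandas as pd

# Hypothetical parent/child table: each row says "id1 is a child of id2".
df = pd.DataFrame({
    "id1": ["aaa", "bbb", "ccc", "ddd"],
    "id2": ["111", "222", "aaa", "bbb"],
})


def desc_lines(node, indent=0, path=()):
    """Return the indented subtree below `node` as a list of strings."""
    if node in path:  # node is its own ancestor: stop instead of looping forever
        return []
    lines = [" " * indent + node]
    for child in df.loc[df["id2"] == node, "id1"]:
        lines.extend(desc_lines(child, indent + 2, path + (node,)))
    return lines


tree = desc_lines("111") + desc_lines("222")
print("\n".join(tree))
```

Tracking the current path rather than a global set of visited nodes means a node with two different parents is still listed under each of them.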

Recursive loop over pandas dataframe

Here's how I would approach this (explanations in the comments):

# Replace NaN in df["Employee Number"] with empty string
df["Employee Number"] = df["Employee Number"].fillna("")

# Add a column with sets that contain the individual employee numbers
df["EN_Sets"] = df["Employee Number"].str.findall(r"\d+").apply(set)

# Build the maximal distinct employee number sets
en_sets = []
for en_set in df.EN_Sets:
    union_sets = []
    keep_sets = []
    for s in en_sets:
        if s.isdisjoint(en_set):
            keep_sets.append(s)
        else:
            union_sets.append(s)
    en_sets = keep_sets + [en_set.union(*union_sets)]

# Build a dictionary with the replacement strings as keys and the distinct
# sets as values
en_sets = {", ".join(sorted(s)): s for s in en_sets}

# Apply-function to replace the original employee number strings
def setting_en_numbers(s):
    for en_set_str, en_set in en_sets.items():
        if not s.isdisjoint(en_set):
            return en_set_str

# Apply the function to df["Employee Number"]
df["Employee Number"] = df.EN_Sets.apply(setting_en_numbers)
df = df[["Company", "Employee Number"]]

Result for

df:
   Company Employee Number
0        1              12
1        2          34, 12
2        3      56, 34, 78
3        4              90
4        5             NaN

is

   Company Employee Number
0        1  12, 34, 56, 78
1        2  12, 34, 56, 78
2        3  12, 34, 56, 78
3        4              90
4        5
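
The set-merging step is the core of this approach, so it may help to see it in isolation. The sketch below uses plain Python sets with the employee numbers from the example; merge_overlapping is a hypothetical helper name, not something from the original answer.

```python
def merge_overlapping(sets):
    """Merge sets that share at least one element, directly or transitively."""
    merged = []
    for current in sets:
        disjoint = [s for s in merged if s.isdisjoint(current)]
        overlapping = [s for s in merged if not s.isdisjoint(current)]
        # Fold every overlapping group and the current set into one group.
        merged = disjoint + [set(current).union(*overlapping)]
    return merged


groups = merge_overlapping([{"12"}, {"34", "12"}, {"56", "34", "78"}, {"90"}, set()])
print([sorted(g) for g in groups])
# [['12', '34', '56', '78'], ['90'], []]
```

The empty set from the NaN row is disjoint from everything, so it survives as its own group, just as in the DataFrame version.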

Recursive Operation in Pandas

You can do this with networkx: build a directed graph, then collect the path from the root to each leaf.

import networkx as nx
G = nx.from_pandas_edgelist(df, source='operator', target='nextval',
                            edge_attr=None, create_using=nx.DiGraph())
road = []
for n in G:
    if G.out_degree(n) == 0:  # leaf
        road.append(nx.shortest_path(G, 1, n))

road
Out[82]: [[1, 2, 4], [1, 3, 5, 6]]

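Update: to collect every simple path to each leaf, rather than only the shortest one, use all_simple_paths: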

import networkx as nx
G = nx.from_pandas_edgelist(df, source='operator', target='nextval',
                            edge_attr=None, create_using=nx.DiGraph())
road = []
for n in G:
    if G.out_degree(n) == 0:  # leaf
        road.append(list(nx.all_simple_paths(G, 1, n)))

road
Out[509]: [[[1, 3, 5, 6], [1, 6]], [[1, 2, 4]]]
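
For readers without networkx, the same root-to-leaf enumeration can be sketched as a plain depth-first search over an edge list. The edges below are hypothetical, chosen only to be consistent with the paths in the updated output, and root_to_leaf_paths is an illustrative name.

```python
def root_to_leaf_paths(edges, root):
    """Return every simple path from `root` to a node with no outgoing edge."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    paths = []

    def walk(node, path):
        path = path + [node]
        if not children.get(node):  # no outgoing edges: this node is a leaf
            paths.append(path)
            return
        for child in children[node]:
            if child not in path:  # keep paths simple, skip cycles
                walk(child, path)

    walk(root, [])
    return paths


# Hypothetical (operator, nextval) pairs.
edges = [(1, 2), (2, 4), (1, 3), (3, 5), (5, 6), (1, 6)]
print(root_to_leaf_paths(edges, 1))
# [[1, 2, 4], [1, 3, 5, 6], [1, 6]]
```

With a DataFrame, the edge list could be built as list(zip(df['operator'], df['nextval'])). Unlike the networkx version, this returns one flat list of paths rather than one list per leaf.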

