Easiest Way to Convert Two Columns to Python Dictionary

I just copied and pasted the important data into a text file called 'countries.txt', then did something like this:

import string

myFilename = "countries.txt"

myTuples = []


myFile = open(myFilename, 'r')

for line in myFile.readlines():
    splitLine = string.split(line)
    code = splitLine[-3]
    country = string.join(splitLine[:-3])
    myTuples.append(tuple([country, code]))

myDict = dict(myTuples)
print myDict

It's probably not the "best" way to do it, but it seems to work.

Here it is following John Machin's helpful recommendations:

import string

myFilename = "countries.txt"


myDict = {}

myFile = open(myFilename, 'r')

for line in myFile:
    splitLine = string.split(line)
    code = splitLine[-3]
    country = " ".join(splitLine[:-3])
    myDict[country] = code

print myDict
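
For comparison, here is a minimal Python 3 sketch of the same idea (assuming the same 'countries.txt' layout, with the code in the third field from the end of each line):

myDict = {}

with open("countries.txt") as myFile:
    for line in myFile:
        splitLine = line.split()           # str.split replaces the old string.split
        code = splitLine[-3]
        country = " ".join(splitLine[:-3])
        myDict[country] = code

print(myDict)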

How to convert a two column csv file to a dictionary in python

Trick if you always have only two columns:

dict(df.itertuples(False,None))
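
A quick sanity check; the DataFrame below is a stand-in built from the column names and output shown further down:

import pandas as pd

df = pd.DataFrame({"Name1": ["ASMITH", "JSMITH"],
                   "Name2": ["A Smith", "J Smith"]})

# index=False drops the index and name=None yields plain tuples,
# which dict() consumes as (key, value) pairs
print(dict(df.itertuples(index=False, name=None)))
# {'ASMITH': 'A Smith', 'JSMITH': 'J Smith'}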

Or make it a pandas.Series and use to_dict:

df.set_index("Name1")["Name2"].to_dict()

Output:

{'ASMITH': 'A Smith', 'JSMITH': 'J Smith'}

Note that if you need a mapper for pd.Series.replace, a Series works just as well as a dict.

s = df.set_index("Name1")["Name2"]
df["Name1"].replace(s, regex=True)

0 J Smith
1 A Smith
Name: Name1, dtype: object

Which also means that you can remove to_dict and cut some overhead:

large_df = df.sample(n=100000, replace=True)

%timeit large_df.set_index("Name1")["Name2"]
# 4.76 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit large_df.set_index("Name1")["Name2"].to_dict()
# 20.2 ms ± 976 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

How to convert two columns values into a key-value pair dictionary?

Create Series and convert to dict:

d = df.set_index('event_type')['count'].to_dict()
print (d)
{'a': 29, 'b': 1042, 'c': 2928, 'd': 4492}
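
A self-contained sketch, with the DataFrame reconstructed from the output above:

import pandas as pd

df = pd.DataFrame({'event_type': ['a', 'b', 'c', 'd'],
                   'count': [29, 1042, 2928, 4492]})

# the index becomes the keys, the column values become the values
d = df.set_index('event_type')['count'].to_dict()
print(d)
# {'a': 29, 'b': 1042, 'c': 2928, 'd': 4492}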

How to convert dataframe columns into a dictionary with one key and multiple values without tuples?

Use to_dict('list') on the transposed DataFrame:

df.set_index('km').T.to_dict('list')

output:

{24.6: ['test', 43, 555], 63.9: ['test', 31, 666]}

NB: if "km" contains duplicate values, only the last row for each key is kept, since dictionary keys must be unique.
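
If you do need to keep every row for a duplicated "km", one possible workaround (a sketch, not from the original answer) is to group first and collect a list of row-lists per key:

# one inner list per row, so duplicated keys keep all their rows
{k: g.drop(columns='km').values.tolist() for k, g in df.groupby('km')}
# with unique keys this gives e.g. {24.6: [['test', 43, 555]], ...}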

How to create a dictionary of two pandas DataFrame columns

In [9]: pd.Series(df.Letter.values,index=df.Position).to_dict()
Out[9]: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
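
For reproducibility, a minimal setup matching that output (a sketch; the original question's data is assumed):

import pandas as pd

df = pd.DataFrame({'Position': [1, 2, 3, 4, 5],
                   'Letter': ['a', 'b', 'c', 'd', 'e']})

# the Letter values become the dict values, keyed by Position
print(pd.Series(df.Letter.values, index=df.Position).to_dict())
# {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}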

Speed comparison (using Wouter's method)

In [5]: from numpy.random import randint

In [6]: df = pd.DataFrame(randint(0,10,10000).reshape(5000,2),columns=list('AB'))

In [7]: %timeit dict(zip(df.A,df.B))
1000 loops, best of 3: 1.27 ms per loop

In [8]: %timeit pd.Series(df.B.values,index=df.A).to_dict()
1000 loops, best of 3: 987 µs per loop

Pandas transform two columns of lists into a columns dictionary with repeated keys

Use a custom function with defaultdict if performance is important:

from collections import defaultdict

def f(x):
    # collect the values for each key, preserving order of first appearance
    d = defaultdict(list)
    for y, z in zip(*x):
        d[y].append(z)
    return d

df['New Dict Column'] = [ f(x) for x in df[['column1','column2']].to_numpy()]
print(df)
column1 column2 New Dict Column
0 [a, b, c, a] [1, 2, 3, 4] {'a': [1, 4], 'b': [2], 'c': [3]}
1 [b, b, a] [1, 2, 3] {'b': [1, 2], 'a': [3]}

Performance is really good, roughly 10 times faster:

#20k rows for test
df = pd.concat([df] * 10000, ignore_index=True)


In [211]: %timeit df.apply(lambda data: {k: [y for x, y in zip(data[0], data[1]) if x == k] for k in data[0]}, axis=1)
532 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [212]: %timeit [ f(x) for x in df[['column1','column2']].to_numpy()]
53.8 ms ± 596 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

python dataframe to dictionary with multiple columns in keys and values

You can loop through the DataFrame.

Assuming your DataFrame is called "df", this gives you the dict:

result_dict = {}
for idx, row in df.iterrows():
    result_dict[(row.origin, row.dest, row['product'], row.ship_date)] = (
        row.origin, row.dest, row['product'], row.truck_in)

Since looping through 400K rows will take some time, have a look at tqdm (https://tqdm.github.io/) for a progress bar with a time estimate, which quickly tells you whether the approach works for your dataset.
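
A minimal sketch of the tqdm idea (assuming tqdm is installed; total=len(df) lets it compute an ETA):

from tqdm import tqdm

result_dict = {}
# tqdm wraps the iterator and prints a progress bar as the loop runs
for idx, row in tqdm(df.iterrows(), total=len(df)):
    result_dict[(row.origin, row.dest, row['product'], row.ship_date)] = (
        row.origin, row.dest, row['product'], row.truck_in)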

Also note that 400K dictionary entries may take up a lot of memory, so try to estimate whether the dict fits into memory before committing to this approach.

Another way, which wastes memory but is faster, is to do it in pandas.

Create a new column holding the dictionary values:

df['value'] = df.apply(lambda x: (x.origin, x.dest, x['product'], x.truck_in), axis=1)

Then set the index and convert to a dict:

df.set_index(['origin','dest','product','ship_date'])['value'].to_dict()
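
As a further alternative (a sketch, not from the original answer), plain zip over the columns sidesteps both iterrows and apply:

# zip each group of columns into row tuples, then pair them up
keys = zip(df.origin, df.dest, df['product'], df.ship_date)
values = zip(df.origin, df.dest, df['product'], df.truck_in)
result_dict = dict(zip(keys, values))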

Pandas - Convert two columns into a new column as a dictionary

IIUC, you can use apply with a lambda:

In [19]:
df['merged'] = df.apply(lambda row: {row['Stage_Name']:row['Metrics']}, axis=1)
df

Out[19]:
Block_Name Metrics Stage_Name merged
0 A [(P, P), (Q, Q)] P {'P': [('P', 'P'), ('Q', 'Q')]}
1 B (K, K) K {'K': ('K', 'K')}
2 A (Z, Z) Z {'Z': ('Z', 'Z')}
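
If the apply turns out to be slow on a large frame, a hedged alternative is a plain list comprehension over the two columns, which builds the same per-row dicts:

# one {Stage_Name: Metrics} dict per row
df['merged'] = [{k: v} for k, v in zip(df['Stage_Name'], df['Metrics'])]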

