Easiest way to convert two columns to Python dictionary
I just copied and pasted the important data into a text file called 'countries.txt' then did something like this:
import string

myFilename = "countries.txt"
myTuples = []
myFile = open(myFilename, 'r')
for line in myFile.readlines():
    splitLine = string.split(line)
    code = splitLine[-3]
    country = string.join(splitLine[:-3])
    myTuples.append(tuple([country, code]))

myDict = dict(myTuples)
print myDict
It's probably not the "best" way to do it, but it seems to work.
Here it is following John Machin's helpful recommendations:
import string

myFilename = "countries.txt"
myDict = {}
myFile = open(myFilename, 'r')
for line in myFile:
    splitLine = string.split(line)
    code = splitLine[-3]
    country = " ".join(splitLine[:-3])
    myDict[country] = code

print myDict
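For readers on Python 3, where the `string` module's `split` and `join` functions no longer exist, a minimal sketch of the same parsing logic; the sample lines below are hypothetical, and the real file is assumed to end each line with a code followed by two numeric fields:

```python
# Python 3 sketch of the same approach; plain str methods replace the
# removed string-module functions. Sample data is hypothetical.
sample = """Afghanistan AF 93 004
United Kingdom GB 44 826
"""

my_dict = {}
for line in sample.splitlines():
    parts = line.split()
    code = parts[-3]                 # third-from-last field is the code
    country = " ".join(parts[:-3])   # the remaining fields form the name
    my_dict[country] = code

print(my_dict)  # {'Afghanistan': 'AF', 'United Kingdom': 'GB'}
```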
How to convert a two column csv file to a dictionary in python
A trick if you always have exactly two columns:
dict(df.itertuples(index=False, name=None))
Or make it a pandas.Series and use to_dict:
df.set_index("Name1")["Name2"].to_dict()
Output:
{'ASMITH': 'A Smith', 'JSMITH': 'J Smith'}
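Both one-liners in a self-contained sketch, using a hypothetical frame with the `Name1`/`Name2` columns from the output above:

```python
import pandas as pd

# Hypothetical two-column frame matching the answer's Name1/Name2 columns.
df = pd.DataFrame({"Name1": ["JSMITH", "ASMITH"],
                   "Name2": ["J Smith", "A Smith"]})

# itertuples(index=False, name=None) yields plain (Name1, Name2) tuples,
# which dict() consumes directly.
d1 = dict(df.itertuples(index=False, name=None))

# Equivalent route through a Series keyed by Name1.
d2 = df.set_index("Name1")["Name2"].to_dict()

print(d1)  # {'JSMITH': 'J Smith', 'ASMITH': 'A Smith'}
```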
Note that if you need a mapper for pd.Series.replace, a Series works just as well as a dict.
s = df.set_index("Name1")["Name2"]
df["Name1"].replace(s, regex=True)
0 J Smith
1 A Smith
Name: Name1, dtype: object
Which also means that you can remove to_dict and cut some overhead:
large_df = df.sample(n=100000, replace=True)
%timeit large_df.set_index("Name1")["Name2"]
# 4.76 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit large_df.set_index("Name1")["Name2"].to_dict()
# 20.2 ms ± 976 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
How to convert two columns values into a key-value pair dictionary?
Create a Series and convert to dict:
d = df.set_index('event_type')['count'].to_dict()
print (d)
{'a': 29, 'b': 1042, 'c': 2928, 'd': 4492}
How to convert dataframe columns into a dictionary with one key and multiple value without tuples?
Use to_dict('list') on the transposed DataFrame:
df.set_index('km').T.to_dict('list')
output:
{24.6: ['test', 43, 555], 63.9: ['test', 31, 666]}
Note that if "km" contains duplicated values, only the last row is kept, since keys in a dictionary must be unique.
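A quick sketch of that caveat, using a hypothetical frame where the "km" value 24.6 is duplicated (pandas also emits a UserWarning about non-unique columns here):

```python
import pandas as pd

# Hypothetical frame with a duplicated "km" value.
df = pd.DataFrame({"km": [24.6, 24.6, 63.9],
                   "name": ["first", "second", "test"],
                   "a": [1, 43, 31],
                   "b": [2, 555, 666]})

d = df.set_index("km").T.to_dict("list")
# The two 24.6 rows collapse into a single key; the unique 63.9 row
# survives intact.
print(len(d), d[63.9])  # 2 ['test', 31, 666]
```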
How to create a dictionary of two pandas DataFrame columns
In [9]: pd.Series(df.Letter.values,index=df.Position).to_dict()
Out[9]: {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
Speed comparison (using Wouter's method):
In [6]: df = pd.DataFrame(np.random.randint(0,10,10000).reshape(5000,2),columns=list('AB'))
In [7]: %timeit dict(zip(df.A,df.B))
1000 loops, best of 3: 1.27 ms per loop
In [8]: %timeit pd.Series(df.A.values,index=df.B).to_dict()
1000 loops, best of 3: 987 us per loop
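Both variants in a self-contained sketch, with a hypothetical frame matching the Position/Letter output above:

```python
import pandas as pd

# Hypothetical frame matching the answer's Position/Letter columns.
df = pd.DataFrame({"Position": [1, 2, 3, 4, 5],
                   "Letter": list("abcde")})

# Series keyed by Position, then to_dict.
d1 = pd.Series(df.Letter.values, index=df.Position).to_dict()

# Wouter's zip-based variant.
d2 = dict(zip(df.Position, df.Letter))

print(d1)  # {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
```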
Pandas transform two columns of lists into a columns dictionary with repeated keys
Use a custom function with defaultdict if performance is important:
from collections import defaultdict

def f(x):
    d = defaultdict(list)
    for y, z in zip(*x):
        d[y].append(z)
    return d

df['New Dict Column'] = [f(x) for x in df[['column1','column2']].to_numpy()]
print(df)
column1 column2 New Dict Column
0 [a, b, c, a] [1, 2, 3, 4] {'a': [1, 4], 'b': [2], 'c': [3]}
1 [b, b, a] [1, 2, 3] {'b': [1, 2], 'a': [3]}
Performance is really good, 10 times faster:
#20k rows for test
df = pd.concat([df] * 10000, ignore_index=True)
In [211]: %timeit df.apply(lambda data: {k: [y for x, y in zip(data[0], data[1]) if x == k] for k in data[0]}, axis=1)
532 ms ± 2.54 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [212]: %timeit [ f(x) for x in df[['column1','column2']].to_numpy()]
53.8 ms ± 596 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
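A self-contained sketch of the defaultdict approach in use, constructing a hypothetical two-row frame that matches the output shown above:

```python
from collections import defaultdict

import pandas as pd

def f(x):
    # Group each column2 value under its column1 key, in order of appearance.
    d = defaultdict(list)
    for y, z in zip(*x):
        d[y].append(z)
    return d

# Hypothetical frame of list-valued columns.
df = pd.DataFrame({"column1": [["a", "b", "c", "a"], ["b", "b", "a"]],
                   "column2": [[1, 2, 3, 4], [1, 2, 3]]})

df["New Dict Column"] = [f(x) for x in df[["column1", "column2"]].to_numpy()]
print(dict(df["New Dict Column"][0]))  # {'a': [1, 4], 'b': [2], 'c': [3]}
```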
python dataframe to dictionary with multiple columns in keys and values
You can loop through the DataFrame.
Assuming your DataFrame is called "df" this gives you the dict.
result_dict = {}
for idx, row in df.iterrows():
    result_dict[(row.origin, row.dest, row['product'], row.ship_date)] = (
        row.origin, row.dest, row['product'], row.truck_in)
Since looping through 400k rows will take some time, have a look at tqdm (https://tqdm.github.io/) to get a progress bar with a time estimate that quickly tells you if the approach works for your dataset.
Also, note that 400K dictionary entries may take up a lot of memory so you may try to estimate if the dict fits your memory.
Another way, which wastes memory but is faster, is to do it in Pandas.
Create a new column with the value for the dictionary
df['value'] = df.apply(lambda x: (x.origin, x.dest, x['product'], x.truck_in), axis=1)
Then set the index and convert to dict
df.set_index(['origin','dest','product','ship_date'])['value'].to_dict()
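Putting both steps together in a self-contained sketch, with hypothetical values for the question's columns:

```python
import pandas as pd

# Hypothetical data matching the question's column names.
df = pd.DataFrame({
    "origin": ["NYC", "LAX"],
    "dest": ["CHI", "DFW"],
    "product": ["p1", "p2"],
    "ship_date": ["2020-01-01", "2020-01-02"],
    "truck_in": ["2020-01-03", "2020-01-04"],
})

# Build the value tuple per row, then key by the four index columns.
df["value"] = df.apply(
    lambda x: (x.origin, x.dest, x["product"], x.truck_in), axis=1)
d = df.set_index(["origin", "dest", "product", "ship_date"])["value"].to_dict()

print(d[("NYC", "CHI", "p1", "2020-01-01")])  # ('NYC', 'CHI', 'p1', '2020-01-03')
```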
Pandas - Convert two columns into a new column as a dictionary
If I understand correctly, you can use apply with a lambda:
In [19]:
df['merged'] = df.apply(lambda row: {row['Stage_Name']:row['Metrics']}, axis=1)
df
Out[19]:
Block_Name Metrics Stage_Name merged
0 A [(P, P), (Q, Q)] P {'P': [('P', 'P'), ('Q', 'Q')]}
1 B (K, K) K {'K': ('K', 'K')}
2 A (Z, Z) Z {'Z': ('Z', 'Z')}