python: combine two columns into one and duplicate corresponding cells
This is the way to do it with stack, which is what you were trying before:
- Step 1: stack only col1 and col2 with df.stack() (dropping the NaNs as well), then use reset_index() to drop the column-name level and keep only the integer index, which will be used for the merge in the next step.
- Step 2: pd.merge() the initial DataFrame with the stacked one on their index.
- Step 3: Done!
import pandas as pd
# stack col1/col2, drop NaNs, then drop the column-name level of the index
a = df[['col1', 'col2']].stack().dropna().reset_index(level=1, drop=True).to_frame('col1')
pd.merge(df[['ID']], a, how='left', left_index=True, right_index=True)
ID col1
0 item_1 abc
1 item_2 bcd
2 item_3 NaN
3 item_4 mnb
3 item_4 lkj
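The three steps above can be checked end to end with a small, self-contained sketch. The input values are hypothetical, chosen only so that the result matches the output shown. It uses .stack().dropna() rather than stack(dropna=True), on the assumption that you may be on a recent pandas where the newer stack implementation does not accept the dropna argument.

```python
import numpy as np
import pandas as pd

# Hypothetical input, chosen so the result matches the output shown above
df = pd.DataFrame({
    'ID': ['item_1', 'item_2', 'item_3', 'item_4'],
    'col1': ['abc', np.nan, np.nan, 'mnb'],
    'col2': [np.nan, 'bcd', np.nan, 'lkj'],
})

# Step 1: stack col1/col2 into one Series indexed by (row, column name),
# drop NaNs, then drop the column-name level so only the row index remains
stacked = (
    df[['col1', 'col2']]
    .stack()
    .dropna()
    .reset_index(level=1, drop=True)
    .to_frame('col1')
)

# Step 2: left-merge on the index; rows with nothing stacked (item_3) get NaN,
# and rows with two values (item_4) are duplicated
result = pd.merge(df[['ID']], stacked, how='left',
                  left_index=True, right_index=True)
print(result)
```

Because the merge is a left join on the original index, every ID survives, and the order of values within a row follows the column order (col1 before col2).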
If you find this easier to understand, consider accepting it as the answer so that anyone looking for a similar solution can find it. Cheers!
How to merge only specific text parts of two columns into one column?
Split the values by . and select the first element of each resulting list with str[0] indexing. For Col1, rsplit with n=1 splits only at the last dot:
# Col1: keep everything before the last '.'; Col2: keep the integer part of the float
df['ColNew'] = (df['Col1'].str.rsplit('.', n=1).str[0] + '.' +
                df['Col2'].astype(str).str.split('.').str[0])
print(df)
Col1 Col2 ColNew
0 11.50.199.1 12121.0 11.50.199.12121
1 12.55.222.1 12121.0 12.55.222.12121
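A self-contained version of the split approach, using the two rows from the printed output as sample data, with the intermediate pieces named so each step is visible. The names prefix and suffix are just for illustration.

```python
import pandas as pd

# Sample data taken from the printed output above
df = pd.DataFrame({'Col1': ['11.50.199.1', '12.55.222.1'],
                   'Col2': [12121.0, 12121.0]})

# Everything before the last '.' in Col1 (n=1 splits once, from the right)
prefix = df['Col1'].str.rsplit('.', n=1).str[0]     # '11.50.199', '12.55.222'
# Integer part of Col2: the float 12121.0 becomes the string '12121.0', then '12121'
suffix = df['Col2'].astype(str).str.split('.').str[0]

df['ColNew'] = prefix + '.' + suffix
print(df)
```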
Or use Series.str.replace with a regular expression to remove the last . and the digits after it in both columns:
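# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
df['ColNew'] = (df['Col1'].str.replace(r'\.\d+$', '', regex=True) + '.' +
                df['Col2'].astype(str).str.replace(r'\.\d+$', '', regex=True))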
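A small sketch to check that the regex-based version gives the same result as the split-based one on the sample rows. The pattern \.\d+$ matches the final dot plus the digits after it; regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data taken from the printed output above
df = pd.DataFrame({'Col1': ['11.50.199.1', '12.55.222.1'],
                   'Col2': [12121.0, 12121.0]})

pattern = r'\.\d+$'  # last '.' followed by digits, anchored at the end

# Regex approach: strip the trailing '.digits' from both columns
via_replace = (df['Col1'].str.replace(pattern, '', regex=True) + '.' +
               df['Col2'].astype(str).str.replace(pattern, '', regex=True))

# Split approach, for comparison
via_split = (df['Col1'].str.rsplit('.', n=1).str[0] + '.' +
             df['Col2'].astype(str).str.split('.').str[0])

print(via_replace.tolist())
print(via_split.tolist())
```

Without regex=True on a recent pandas, the pattern would be searched for literally, nothing would match, and the columns would be concatenated unchanged.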
Merge DataFrames with Matching Values From Two Different Columns - Pandas
Use how='inner' in pd.merge:
merged_df = DF2.merge(DF1, how='inner', on=['date', 'hours'])
This performs an inner join, keeping only the rows whose (date, hours) pair appears in both DataFrames. Rows without a match are omitted, so the merge introduces no NaN in either the left or the right part of the merged DataFrame.
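A minimal sketch contrasting the inner join with a left join. DF1 and DF2 here are hypothetical frames; only the key columns date and hours come from the answer, and the load and price columns are made up for illustration.

```python
import pandas as pd

# Hypothetical frames; only the key columns 'date' and 'hours' come from the answer
DF1 = pd.DataFrame({'date': ['2021-01-01', '2021-01-01', '2021-01-02'],
                    'hours': [1, 2, 1],
                    'load': [10, 20, 30]})
DF2 = pd.DataFrame({'date': ['2021-01-01', '2021-01-02', '2021-01-03'],
                    'hours': [1, 1, 5],
                    'price': [0.5, 0.7, 0.9]})

# Inner join: keep only (date, hours) pairs present in both frames
merged_df = DF2.merge(DF1, how='inner', on=['date', 'hours'])

# For comparison: a left join keeps every DF2 row and fills missing matches with NaN
left_df = DF2.merge(DF1, how='left', on=['date', 'hours'])

print(merged_df)
print(left_df)
```

The row for 2021-01-03 at hour 5 has no partner in DF1, so it is dropped by the inner join but shows up with NaN in load under the left join.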
Python Pandas merge only certain columns
You could merge the sub-DataFrame (with just those columns):
df2[list('xab')] # df2 but only with columns x, a, and b
df1.merge(df2[list('xab')])
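A runnable sketch with hypothetical df1 and df2, assuming x is the only column the two frames share. With no on argument, merge joins on all common column names and uses an inner join by default.

```python
import pandas as pd

# Hypothetical frames: df1 and df2 share only the key column 'x'
df1 = pd.DataFrame({'x': [1, 2, 3], 'y': ['p', 'q', 'r']})
df2 = pd.DataFrame({'x': [1, 2, 4],
                    'a': [10, 20, 40],
                    'b': [100, 200, 400],
                    'c': ['u', 'v', 'w']})

# list('xab') == ['x', 'a', 'b'], so column 'c' never takes part in the merge
result = df1.merge(df2[list('xab')])
print(result)
```

If df1 also had a column named a or b, merge would silently join on it too; passing on='x' makes the key explicit.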
How to add or combine two columns into another one in a dataframe if they meet a condition
Use numpy.where, which handles conditionals in a vectorized way. It is akin to an if-else statement in Python, but it is applied to the whole column at once, which is much faster than looping over rows. I rarely use iterrows, since it is far less efficient than np.where.
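import numpy as np
dfn['c'] = np.where(dfn['a'] % 2 != 0,  # condition: 'a' is odd
                    dfn.a + dfn.b,      # value where the condition is True
                    dfn.a)              # value where it is False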
a b c
0 1 6 7
1 2 7 2
2 3 8 11
3 4 9 4
4 5 10 15
The first argument of np.where is the condition, which here checks whether the value in column 'a' is odd. Where it is True, the value is taken from the second argument (a + b); where it is False (a is even), it is taken from the third argument (a). Both arguments are computed for every row and np.where picks between them element by element, so you can think of it as a vectorized if-else statement.
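A self-contained sketch using the same a and b values as the output above, with an iterrows loop alongside purely to show that both give the same column; the loop is only for comparison, not a recommended approach.

```python
import numpy as np
import pandas as pd

# Same data as the output above
dfn = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 10]})

# Vectorized: np.where(condition, value_if_true, value_if_false), chosen elementwise
dfn['c'] = np.where(dfn['a'] % 2 != 0, dfn['a'] + dfn['b'], dfn['a'])

# Row-by-row equivalent with iterrows, for comparison only (slow on large frames)
c_loop = []
for _, row in dfn.iterrows():
    c_loop.append(row['a'] + row['b'] if row['a'] % 2 != 0 else row['a'])

print(dfn)
```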