Removing a character from entire data frame
You can use DataFrame.replace, and to limit the change to selected columns, work on a subset of the frame:
import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':['f;','d:','sda;sd'],
                   'D':['s','d;','d;p'],
                   'E':[5,3,6],
                   'F':[7,4,3]})
print(df)
   A  B       C    D  E  F
0  1  4      f;    s  5  7
1  2  5      d:   d;  3  4
2  3  6  sda;sd  d;p  6  3
cols_to_check = ['C','D', 'E']
print(df[cols_to_check])
        C    D  E
0      f;    s  5
1      d:   d;  3
2  sda;sd  d;p  6
df[cols_to_check] = df[cols_to_check].replace({';':''}, regex=True)
print(df)
   A  B      C   D  E  F
0  1  4      f   s  5  7
1  2  5     d:   d  3  4
2  3  6  sdasd  dp  6  3
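One caveat with regex=True is that each dictionary key is treated as a regular expression, so a character such as . or $ has to be escaped before it can be removed literally. A minimal sketch of that idea on plain Python lists, assuming a hypothetical helper remove_char (not part of pandas) and the sample values from column C above:

```python
import re

def remove_char(values, char):
    """Remove every literal occurrence of `char` from the string values."""
    pattern = re.escape(char)  # escape regex metacharacters such as '.' or '$'
    # Non-string values (for example the integers in column E) are left untouched
    return [re.sub(pattern, "", v) if isinstance(v, str) else v for v in values]

column_c = ['f;', 'd:', 'sda;sd']
print(remove_char(column_c, ';'))
print(remove_char(['1.5', 3, 'a.b'], '.'))
```

The escaped pattern could then serve as the key of the replace dictionary, for example {re.escape('.'): ''}.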
remove a character from the entire data frame
I would use lapply to loop over the columns and then remove the " characters using gsub.
df1[] <- lapply(df1, gsub, pattern='"', replacement='')
df1
# ID name value1 value2
#1 1 x a,b,c x
#2 2 y d,r z
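If needed, the column classes can then be changed with type.convert (as.is = TRUE keeps strings as character and avoids the warning that R 4.0 and later give when as.is is not specified):
df1[] <- lapply(df1, type.convert, as.is = TRUE)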
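For readers working in Python, the same column-by-column idea can be sketched without any libraries. This is only an illustration under assumptions: the data frame is represented as a plain dict of lists holding the values of the df1 sample data from this answer, and to_number_if_possible is a hypothetical stand-in for what type.convert does.

```python
# Values of df1, stored as a dict of columns (an assumption for this sketch)
df1 = {
    "ID": ['"1', '"2'],
    "name": ["x", "y"],
    "value1": ['a,"b,"c', 'd,"r"'],
    "value2": ['x"', 'z"'],
}

# Remove the double quote from every value in every column
cleaned = {col: [v.replace('"', "") for v in vals] for col, vals in df1.items()}

def to_number_if_possible(values):
    """Convert a column to int when every value parses, else keep the strings."""
    try:
        return [int(v) for v in values]
    except ValueError:
        return values

converted = {col: to_number_if_possible(vals) for col, vals in cleaned.items()}
print(converted)
```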
data
df1 <- structure(list(ID = c("\"1", "\"2"), name = c("x", "y"),
value1 = c("a,\"b,\"c",
"d,\"r\""), value2 = c("x\"", "z\"")), .Names = c("ID", "name",
"value1", "value2"), class = "data.frame", row.names = c(NA, -2L))
How to remove a character from some rows in a dataframe column?
Another way would be to use numpy.where and evaluate your conditions using str.startswith and str.endswith:
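import numpy as np

p = df['Price'].str
# Escape and anchor the patterns: with regex=True an unescaped '.' matches any character
df['Price'] = np.where(p.startswith('.'), p.replace(r'^\.', '', regex=True),
              np.where(p.endswith('.T'), p.replace(r'\.T$', '', regex=True), df['Price']))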
This checks whether each value in df['Price'] starts with . or ends with .T and removes that prefix or suffix.
Brand Price
0 Honda Civic 22000
1 Toyota Corolla 25000
2 Ford Focus 27000
3 Audi A4 TPX
4 Suzuki NKM1
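When a replacement is done with regex=True, the pattern is a regular expression, in which an unescaped . matches any character. A minimal sketch of the prefix and suffix check on plain strings, assuming a hypothetical helper clean_price and made-up raw values (the original input column is not shown above):

```python
import re

def clean_price(value):
    """Strip a leading '.' or a trailing '.T' from a price string."""
    if not isinstance(value, str):
        return value  # leave numbers and missing values as they are
    if value.startswith("."):
        return re.sub(r"^\.", "", value)   # escaped dot, anchored at the start
    if value.endswith(".T"):
        return re.sub(r"\.T$", "", value)  # escaped dot, anchored at the end
    return value

# Hypothetical raw values, chosen only for illustration
raw = [".22000", "25000.T", "27000", ".TPX", "NKM1.T"]
cleaned = [clean_price(v) for v in raw]
print(cleaned)

# An unescaped '.' is a wildcard, so it deletes every character
wiped = re.sub(".", "", ".22000")
print(repr(wiped))
```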
How to remove part of characters in data frame column
There are multiple ways of doing this:
- Use as.numeric on the column of your choice. Converting to a number drops the leading zeros.
raw$Zipcode <- as.numeric(raw$Zipcode)
- If you want it to stay a character, you can use the stringr package.
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+", "")
- There is another function called str_remove in the stringr package.
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")
- You can also use sub from base R.
raw$Zipcode <- sub("^0+", "", raw$Zipcode)
To remove a fixed number n of leading zeros instead, replace + with {n}. For instance, to remove two 0's use sub("^0{2}", "", raw$Zipcode).
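The difference between the + and {n} quantifiers is easy to check on a few strings. A small sketch in Python, whose re module treats these two patterns the same way; the zip codes are made up for illustration:

```python
import re

# Hypothetical zip codes, kept as strings so the leading zeros survive
zipcodes = ["00123", "01234", "12345", "00012"]

# ^0+ removes every leading zero
all_removed = [re.sub(r"^0+", "", z) for z in zipcodes]

# ^0{2} removes exactly two leading zeros, and only when two are present
two_removed = [re.sub(r"^0{2}", "", z) for z in zipcodes]

print(all_removed)
print(two_removed)
```

Note that "01234" is left unchanged by the second pattern because it starts with only one zero.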
How to remove special characters from rows in pandas dataframe
I have a different approach using regex. It deletes anything between brackets:
import re
import pandas as pd
df = {'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)'] }
df = pd.DataFrame(df)
df['LGA'] = [re.sub(r"[\(\[].*?[\)\]]", "", x).strip() for x in df['LGA']]  # raw string pattern; delete anything between brackets
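The .*? in that pattern is lazy, which matters when a value holds more than one bracketed group. A short sketch comparing it with a greedy .*, assuming the hypothetical helper names below and one made-up value with two groups:

```python
import re

def drop_bracketed(text):
    # Lazy .*? stops at the first closing bracket, so each group is removed on its own
    return re.sub(r"[\(\[].*?[\)\]]", "", text).strip()

def drop_bracketed_greedy(text):
    # Greedy .* runs to the last closing bracket and also removes the text in between
    return re.sub(r"[\(\[].*[\)\]]", "", text).strip()

print(drop_bracketed("Alpine (S)"))
print(drop_bracketed("Bass [x] Coast (S)"))
print(drop_bracketed_greedy("Bass [x] Coast (S)"))
```

With the lazy version the spaces on either side of a removed group stay behind, so the second result contains a double space.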
Remove unwanted parts from strings in a column
data['result'] = data['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
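Keep in mind that lstrip and rstrip take a set of characters, not a prefix or suffix, and keep stripping while the outermost character is in that set. A quick sketch with made-up values, wrapping the same expression in a hypothetical helper clean_result:

```python
def clean_result(value):
    # Strip any run of '+' or '-' on the left, then any run of a/A/b/B/c/C on the right
    return value.lstrip('+-').rstrip('aAbBcC')

samples = ['+3a', '-12bC', '+-+5A', '7', 'abc']
results = [clean_result(s) for s in samples]
print(results)
```

The last sample shows the set behaviour: 'abc' is stripped down to an empty string.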
Pandas: Remove all characters before a specific character in a dataframe column
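Using str.replace:
df["address"] = df["address"].str.replace(r'^[^,]*,\s*', '', regex=True)
The pattern matches everything from the start of the string up to and including the first comma, plus any whitespace that follows it. regex=True is required because, since pandas 2.0, str.replace treats the pattern as a literal string by default.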
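The behaviour of the pattern can be checked on plain strings with Python's re module. A minimal sketch; the addresses are made up for illustration:

```python
import re

PATTERN = r'^[^,]*,\s*'  # start of string, anything but a comma, the first comma, optional spaces

# Hypothetical addresses
addresses = ["12 Main St, Springfield, IL", "Unit 4,Riverside", "no comma here"]
trimmed = [re.sub(PATTERN, '', a) for a in addresses]
print(trimmed)
```

Only the text before the first comma is removed, and a value without a comma is returned unchanged.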
Removing special character from dataframe
That looks like a tuple to me, so give .str[0] a shot:
df['IP_ADDRESS'] = df['IP_ADDRESS'].str[0]
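Indexing with [0] only gives the desired result when each value really is a tuple (or list); on a plain string it returns the first character. A small sketch of the difference in plain Python, with a made-up address and a hypothetical helper first_if_tuple:

```python
def first_if_tuple(value):
    """Return the first element of a tuple or list, and anything else unchanged."""
    return value[0] if isinstance(value, (tuple, list)) else value

tuple_value = ("192.168.0.1",)   # hypothetical one-element tuple
string_value = "192.168.0.1"     # the same address as a plain string

print(tuple_value[0])    # the whole address
print(string_value[0])   # only the first character
print(first_if_tuple(tuple_value), first_if_tuple(string_value))
```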
R Remove string characters from a range of rows in a column
If we want to strip the unwanted characters and then filter, one option is trimws. By default it trims whitespace from both ends of the string (which is 'both' by default; specify 'left' or 'right' to trim only one side). Here the whitespace argument is given a regex that matches zero or more upper case letters followed by zero or more spaces ([A-Z]*\\s*). We then filter the rows where the elements are not blank:
library(dplyr)
df %>%
mutate(Date = trimws(Date, whitespace = "[A-Z]*\\s*")) %>%
filter(nzchar(Date))
-output
Date Date_Approved
1 1/27/2020 1/28/2020
2 1/29/2020 1/30/2020
3 1/30/2020 1/31/2020
4 2/1/2020 2/2/2020
5 2/9/2020 2/10/2020
6 2/15/2020 2/16/2020
7 2/16/2020 2/17/2020
8 2/17/2020 2/19/2020
9 2/18/2020 2/20/2020
10 2/22/2020 2/23/2020
11 2/25/2020 2/26/2020
12 2/28/2020 2/29/2020
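The same trim-then-filter idea can be sketched in Python. This is an approximation rather than a port of trimws: it uses the simpler character class [A-Z\s] at each end, and the input rows are made up because the original Date column is not shown.

```python
import re

def trim_letters_and_spaces(value):
    """Remove upper case letters and whitespace from both ends of a string."""
    value = re.sub(r"^[A-Z\s]+", "", value)   # left side
    return re.sub(r"[A-Z\s]+$", "", value)    # right side

# Hypothetical Date values: header-like rows mixed with real dates
rows = ["JANUARY", "1/27/2020", "  1/29/2020 ", "FEBRUARY ", "2/1/2020"]

cleaned = [trim_letters_and_spaces(r) for r in rows]
kept = [d for d in cleaned if d]  # drop rows that became empty, like filter(nzchar(Date))
print(kept)
```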
remove character for all column names in a data frame
If we only need to remove 'v', matching one or more digits (\\d+) at the end ($) is not needed, since the expected output also removes 'v' from the first column, 'q_ve5'.
library(dplyr)
library(stringr)
df %>%
rename_with(~ str_remove(., "v"), everything())
-output
# A tibble: 2 × 5
q_e5 q_f_1 q_f_2 q_e6 q_e8
<int> <int> <int> <int> <int>
1 1 3 3 5 5
2 2 4 4 6 6
Or without any packages
names(df) <- sub("v", "", names(df))
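Both str_remove and sub remove only the first match in each name. A short Python sketch of the same renaming, assuming made-up original column names (only 'q_ve5' is given above):

```python
# Hypothetical original column names; only 'q_ve5' is confirmed by the text above
columns = ["q_ve5", "q_vf_1", "q_vf_2", "q_ve6", "q_ve8"]

# count=1 removes only the first 'v', like sub() and str_remove() in R
renamed = [name.replace("v", "", 1) for name in columns]
print(renamed)

# Without the count every 'v' is removed, like gsub() or str_remove_all()
first_only = "v_ve1".replace("v", "", 1)
all_removed = "v_ve1".replace("v", "")
print(first_only, all_removed)
```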