Remove a Character from the Entire Data Frame

Removing a character from entire data frame

You can use DataFrame.replace; to limit the change to certain columns, select that subset of columns first:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':['f;','d:','sda;sd'],
                   'D':['s','d;','d;p'],
                   'E':[5,3,6],
                   'F':[7,4,3]})

print (df)
   A  B       C    D  E  F
0  1  4      f;    s  5  7
1  2  5      d:   d;  3  4
2  3  6  sda;sd  d;p  6  3

cols_to_check = ['C','D', 'E']

print (df[cols_to_check])
        C    D  E
0      f;    s  5
1      d:   d;  3
2  sda;sd  d;p  6

df[cols_to_check] = df[cols_to_check].replace({';':''}, regex=True)
print (df)
   A  B      C   D  E  F
0  1  4      f   s  5  7
1  2  5     d:   d  3  4
2  3  6  sdasd  dp  6  3
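
If the character should go from every column rather than a chosen subset, the same call can be applied to the whole frame. This is a minimal sketch, assuming a smaller version of the sample frame above; with regex=True only string cells are matched, so the numeric column is left as it is.

```python
import pandas as pd

# Smaller version of the sample frame, for illustration only
df = pd.DataFrame({'A': [1, 2, 3],
                   'C': ['f;', 'd:', 'sda;sd'],
                   'D': ['s', 'd;', 'd;p']})

# regex=True makes replace() match ';' inside each string,
# not only cells that are exactly equal to ';'
df = df.replace({';': ''}, regex=True)
print(df)
```

Without regex=True, replace only touches cells whose whole value equals ';', so nothing in this frame would change.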

Remove a character from the entire data frame

I would use lapply to loop over the columns and then replace the " using gsub.

df1[] <- lapply(df1, gsub, pattern='"', replacement='')
df1
#   ID name value1 value2
# 1  1    x  a,b,c      x
# 2  2    y    d,r      z

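If needed, the column classes can then be converted with type.convert (as.is = TRUE keeps character columns as character instead of turning them into factors):

df1[] <- lapply(df1, type.convert, as.is = TRUE)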

data

df1 <-  structure(list(ID = c("\"1", "\"2"), name = c("x", "y"),
value1 = c("a,\"b,\"c",
"d,\"r\""), value2 = c("x\"", "z\"")), .Names = c("ID", "name",
"value1", "value2"), class = "data.frame", row.names = c(NA, -2L))

How to remove a character from some rows in a dataframe column?

Another way would be to use numpy.where and evaluate your conditions using str.startswith and str.endswith:

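import numpy as np

p = df['Price'].str
# Escape the dot and anchor the patterns: with regex=True an unescaped '.' matches any character
df['Price'] = np.where(p.startswith('.'), p.replace(r'^\.', '', regex=True),
              np.where(p.endswith('.T'), p.replace(r'\.T$', '', regex=True), df['Price']))

This checks whether each value in df['Price'] starts with . or ends with .T and removes that part.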

            Brand  Price
0     Honda Civic  22000
1  Toyota Corolla  25000
2      Ford Focus  27000
3         Audi A4    TPX
4          Suzuki   NKM1
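
The same cleanup can also be written as a single str.replace with an alternation. This is only a sketch: the input values below are hypothetical, since the original data is not shown. Unlike the np.where version, a value that both starts with . and ends with .T would lose both parts.

```python
import pandas as pd

# Hypothetical input; the original Price values are not shown in the answer
df = pd.DataFrame({'Brand': ['Audi A4', 'Suzuki'],
                   'Price': ['.TPX', 'NKM1.T']})

# ^\. matches a literal dot at the start, \.T$ matches a literal ".T" at the end
df['Price'] = df['Price'].str.replace(r'^\.|\.T$', '', regex=True)
print(df['Price'].tolist())
```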

How to remove part of characters in data frame column

There are multiple ways of doing this:

  1. Using as.numeric on a column of your choice.
raw$Zipcode <- as.numeric(raw$Zipcode)

  2. If you want it to stay a character column, you can use the stringr package.
library(stringr)
raw$Zipcode <- str_replace(raw$Zipcode, "^0+", "")

  3. There is another function called str_remove in the stringr package.
raw$Zipcode <- str_remove(raw$Zipcode, "^0+")

  4. You can also use sub from base R.
raw$Zipcode <- sub("^0+", "", raw$Zipcode)

To remove exactly n leading zeroes instead of all of them, replace + with {n}.

For instance, to remove two 0's use sub("^0{2}", "", raw$Zipcode).
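
For comparison, the same two patterns behave the same way in pandas. This is a sketch with hypothetical zip codes stored as strings; the names all_removed and two_removed are just for illustration.

```python
import pandas as pd

# Hypothetical zip codes stored as strings
raw = pd.DataFrame({'Zipcode': ['00123', '01234', '12345']})

# ^0+ removes every leading zero
all_removed = raw['Zipcode'].str.replace(r'^0+', '', regex=True)

# ^0{2} removes exactly two leading zeros; values with fewer are left unchanged
two_removed = raw['Zipcode'].str.replace(r'^0{2}', '', regex=True)

print(all_removed.tolist())
print(two_removed.tolist())
```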

How to remove special characters from rows in pandas dataframe

I have a different approach using regex. It deletes the brackets and anything between them:

import re
import pandas as pd
df = {'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)'] }
df = pd.DataFrame(df)
# raw string avoids invalid-escape warnings; delete the brackets and anything between them
df['LGA'] = [re.sub(r"[\(\[].*?[\)\]]", "", x).strip() for x in df['LGA']]
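
The list comprehension can also be swapped for the vectorized string methods. A minimal sketch, assuming the same three sample values:

```python
import pandas as pd

df = pd.DataFrame({'LGA': ['Alpine (S)', 'Ararat (RC)', 'Bass Coast (S)']})

# Remove "(...)" or "[...]" groups, then trim the space left behind
df['LGA'] = (df['LGA']
             .str.replace(r'[\(\[].*?[\)\]]', '', regex=True)
             .str.strip())
print(df['LGA'].tolist())
```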

Remove unwanted parts from strings in a column

data['result'] = data['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))
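
Note that lstrip and rstrip take a set of characters, not a prefix or suffix: they keep stripping while the outermost character is in the set. A small sketch with hypothetical values, also showing the equivalent .str methods:

```python
import pandas as pd

# Hypothetical values for illustration
data = pd.DataFrame({'result': ['+3a', '-52B', '+-7cab', '12']})

# Element-wise version, as in the answer above
mapped = data['result'].map(lambda x: x.lstrip('+-').rstrip('aAbBcC'))

# Equivalent vectorized version using the .str accessor
vectorized = data['result'].str.lstrip('+-').str.rstrip('aAbBcC')

print(mapped.tolist())
```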

Pandas: Remove all characters before a specific character in a dataframe column

Using str.replace:

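# regex=True is required in pandas 2.0+, where str.replace treats the pattern literally by default
df["address"] = df["address"].str.replace(r'^[^,]*,\s*', '', regex=True)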

The pattern ^[^,]*,\s* matches everything from the start of the string up to and including the first comma, plus any whitespace after it.
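
To check the pattern outside pandas, it can be tried on a couple of plain strings with the re module. The addresses below are made up for illustration:

```python
import re

# Everything up to and including the first comma, plus trailing whitespace
pattern = re.compile(r'^[^,]*,\s*')

# Hypothetical sample strings
addresses = ['12 Main St, Springfield, IL', 'no comma here']
cleaned = [pattern.sub('', a) for a in addresses]
print(cleaned)
```

A string without a comma does not match and is returned unchanged.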

Removing special character from dataframe

That looks like a tuple to me, so give .str[0] a shot:

df['IP_ADDRESS'] = df['IP_ADDRESS'].str[0]
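
A small sketch of what that does, assuming each cell really holds a one-element tuple (the addresses are made up):

```python
import pandas as pd

# Hypothetical column where every cell is a one-element tuple
df = pd.DataFrame({'IP_ADDRESS': [('10.0.0.1',), ('192.168.1.5',)]})

# .str[0] indexes into each element, so it pulls the first item out of each tuple
df['IP_ADDRESS'] = df['IP_ADDRESS'].str[0]
print(df['IP_ADDRESS'].tolist())
```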

R Remove string characters from a range of rows in a column

If we want to substring and filter, an option is to use trimws. By default it trims whitespace from both ends of the string (set which to "left" or "right" to trim only one side). Here the whitespace argument is given a regex that matches zero or more upper case letters followed by zero or more spaces ([A-Z]*\\s*), so those characters are trimmed instead. Then filter the rows where the result is not blank:

library(dplyr)
df %>%
  mutate(Date = trimws(Date, whitespace = "[A-Z]*\\s*")) %>%
  filter(nzchar(Date))

-output

        Date Date_Approved
1  1/27/2020     1/28/2020
2  1/29/2020     1/30/2020
3  1/30/2020     1/31/2020
4   2/1/2020      2/2/2020
5   2/9/2020     2/10/2020
6  2/15/2020     2/16/2020
7  2/16/2020     2/17/2020
8  2/17/2020     2/19/2020
9  2/18/2020     2/20/2020
10 2/22/2020     2/23/2020
11 2/25/2020     2/26/2020
12 2/28/2020     2/29/2020

Remove a character from all column names in a data frame

If we only need to remove 'v', matching one or more digits (\\d+) at the end ($) is not needed, because the expected output also removes 'v' from the first column, 'q_ve5':

library(dplyr)
library(stringr)
df %>%
  rename_with(~ str_remove(., "v"), everything())

-output

# A tibble: 2 × 5
   q_e5 q_f_1 q_f_2  q_e6  q_e8
  <int> <int> <int> <int> <int>
1     1     3     3     5     5
2     2     4     4     6     6

Or without any packages

names(df) <- sub("v", "", names(df))
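
For comparison, a pandas version of the same rename might look like the sketch below. The input column names other than 'q_ve5' are assumptions, guessed from the output shown above. n=1 removes only the first 'v' in each name, which mirrors sub and str_remove.

```python
import pandas as pd

# Column names other than 'q_ve5' are assumed for illustration
df = pd.DataFrame([[1, 3, 3, 5, 5], [2, 4, 4, 6, 6]],
                  columns=['q_ve5', 'q_vf_1', 'q_vf_2', 'q_ve6', 'q_ve8'])

# Literal (non-regex) replacement of the first 'v' in each column name
df.columns = df.columns.str.replace('v', '', n=1, regex=False)
print(list(df.columns))
```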

