Select rows of a data.frame that contain only numbers in a certain column
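You could use grep. Note that the pattern [[:digit:]] matches any value containing at least one digit; anchor it (e.g. ^-?[[:digit:]]+$) if the whole value must be an integer: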
df[grep("[[:digit:]]", df$b), ]
# a b
#1 1 4
#2 5 -2
#3 3 1
#4 1 0
#6 6 2
Filtering Dataframe by keeping numeric values of a specific column only in R
You could use a regular expression to filter the relevant rows of your dataframe. The regular expression ^\\d+(\\.\\d+)?$ matches character values that contain only digits, optionally with . as a decimal separator (e.g. 2, 2.3). You could then convert the Cost column to numeric using as.numeric() if needed.
See the example below:
Group = c("A", "A", "A", "B", "B", "C", "C", "C")
Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
df = data.frame(Group, Cost, Year)
df[grep(pattern = "^\\d+(\\.\\d+)?$", df[,"Cost"]), ]
#> Group Cost Year
#> 1 A 21 2017
#> 2 A 22 2016
#> 4 B 12 2017
#> 5 B 11 2016
#> 8 C 13 2015
Note that this technique works even if your Cost column is of class factor, whereas df[!is.na(as.numeric(df$Cost)), ] does not, because as.numeric() on a factor returns the internal integer codes rather than the labels. For the latter you need to add as.character() first: df[!is.na(as.numeric(as.character(df$Cost))), ]. Both techniques keep factor levels.
Select all dataframe rows containing a specific integer
df.loc[df.x == 1, 'x'].count()
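The expression above counts the matching rows rather than returning them. A minimal sketch of both operations, assuming a small made-up frame with an integer column x (the data and the column y are illustrative, not taken from the question):

```python
import pandas as pd

# Hypothetical data: only the column name `x` comes from the answer above.
df = pd.DataFrame({"x": [1, 2, 1, 3], "y": ["a", "b", "c", "d"]})

# Boolean mask: True where x equals the target integer.
mask = df["x"] == 1

# Select the matching rows (all columns).
rows = df.loc[mask]

# Count the matching rows, as in the one-liner above.
n_matches = df.loc[mask, "x"].count()

print(rows)
print(n_matches)
```

count() skips missing values; since the selected x values cannot be missing here, mask.sum() would give the same number.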
How to check if a pandas dataframe contains only numeric values column-wise?
You can check that using to_numeric and coercing errors:
pd.to_numeric(df['column'], errors='coerce').notnull().all()
For all columns, you can iterate through the columns or just use apply:
df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())
E.g.
import numpy as np
import pandas as pd
df = pd.DataFrame({'col': [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})
Outputs
col False
col2 False
col3 True
dtype: bool
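Note that a pre-existing NaN also makes a column fail this check, because notnull() cannot tell an original missing value from a failed coercion. The same coercion can be used to keep only the rows whose value in one column is numeric. A sketch using the example frame from this answer; the variable names as_num, numeric_rows, and failed are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})

# Coerce one column; non-numeric entries (and existing NaN) become NaN.
as_num = pd.to_numeric(df['col'], errors='coerce')

# Keep only rows where the coercion produced a number.
numeric_rows = df[as_num.notnull()]

# Distinguish failed coercions from values that were already missing:
# NaN after coercion, but not missing in the original column.
failed = as_num.isna() & df['col'].notna()

print(numeric_rows)
print(df[failed])
```

Here the row holding 'a' is flagged as a failed coercion, while the row holding NaN is dropped from numeric_rows but not flagged.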
Select only a number of rows from a pandas Dataframe based on a condition
Not sure what your dataframe looks like, but you could group by club and then use head(16) to get only the first 16 rows of each group.
df.groupby('club').head(16)
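A minimal sketch of that idea, assuming a made-up frame with a club column (the player column and the group size of 2 are illustrative only; the answer above uses 16):

```python
import pandas as pd

# Hypothetical data: only the column name `club` comes from the answer above.
df = pd.DataFrame({
    "club": ["A", "A", "A", "B", "B", "C"],
    "player": ["p1", "p2", "p3", "p4", "p5", "p6"],
})

# head(n) on a groupby keeps the first n rows of each group,
# preserving the original row order and index.
first_two = df.groupby("club").head(2)

print(first_two)
```

Groups with fewer than n rows are returned whole, so club C keeps its single row.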
How to select only numbers from a dataframe in R using which()
You can use the built-in as.numeric() converter to do something like this:
x <- my_data_frame$Column.Title
xn <- as.numeric(x)
which(!is.na(xn))
This won't distinguish between NAs created by failed coercion and pre-existing (numeric) NA values.
If there's a small enough variety of "missing" values you could read the data in with read.csv(..., na.strings=c("NA","missing","no input"))
Selecting only numeric columns from a data frame
EDIT: updated to avoid use of ill-advised sapply.
Since a data frame is a list we can use the list-apply functions:
nums <- unlist(lapply(x, is.numeric), use.names = FALSE)
Then standard subsetting
x[ , nums]
## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
For a more idiomatic modern R I'd now recommend
x[ , purrr::map_lgl(x, is.numeric)]
The following is less code, depends less on R's particular quirks, and is robust to use on database-backed tibbles:
dplyr::select_if(x, is.numeric)
Newer versions of dplyr also support the following syntax:
x %>% dplyr::select(where(is.numeric))