Difference between `names(df[1]) <- ` and `names(df)[1] <- `

What I think is happening is that replacement into a data frame ignores the attributes (including the names) of the data frame supplied on the right-hand side. I am not 100% sure of this, but the following experiments appear to back it up:

df <- data.frame(a = 1:3, b = 5:7)
#   a b
# 1 1 5
# 2 2 6
# 3 3 7

df2 <- data.frame(c = 10:12)
#    c
# 1 10
# 2 11
# 3 12

df[1] <- df2[1] # in this case `df[1] <- df2` is equivalent

Which produces:

#    a b
# 1 10 5
# 2 11 6
# 3 12 7

Notice how the values changed for df, but not the names. Basically the replacement operator `[<-` only replaces the values. This is why the name was not updated. I believe this explains all the issues.

In the scenario:

names(df[2]) <- "x"

You can think of the assignment as follows (this is a simplification, see end of post for more detail):

tmp <- df[2]
#   b
# 1 5
# 2 6
# 3 7

names(tmp) <- "x"
#   x
# 1 5
# 2 6
# 3 7

df[2] <- tmp # `tmp` has "x" for names, but it is ignored!
#    a b
# 1 10 5
# 2 11 6
# 3 12 7

The last step is an assignment with `[<-`, which does not respect the names attribute of the RHS.

But in the scenario:

names(df)[2] <- "x"

you can think of the assignment as (again, a simplification):

tmp <- names(df)
# [1] "a" "b"

tmp[2] <- "x"
# [1] "a" "x"

names(df) <- tmp
#    a x
# 1 10 5
# 2 11 6
# 3 12 7

Notice how we assign directly to `names(df)`, instead of assigning into `df` with `[<-`, which ignores the attributes of the RHS. Similarly,

df[2] <- 2

works because we are assigning directly to the values, not the attributes, so there are no problems here.


EDIT: based on some commentary from @AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):

Version 1, `names(df[2]) <- "x"`, translates to:

df <- `[<-`(
  df, 2,
  value = `names<-`(   # `names<-` here returns a renamed one-column data frame
    `[`(df, 2),
    value = "x"
  )
)

Version 2, `names(df)[2] <- "x"`, translates to:

df <- `names<-`(
  df,
  `[<-`(
    names(df), 2, "x"
  )
)

Also, it turns out this is "documented" in The R Inferno, Section 8.2.34 (thanks @Frank):

right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed       b
#       1       2

Python Pandas - Find difference between two data frames

By using drop_duplicates

pd.concat([df1,df2]).drop_duplicates(keep=False)
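
As a minimal sketch of why this works, assume two small frames with no repeated rows of their own (`left` and `right` are illustrative names, not from the original question). A row shared by both frames appears twice after the concat, so `keep=False` drops every copy of it:

```python
import pandas as pd

# Hypothetical example frames; neither contains repeated rows.
left = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
right = pd.DataFrame({"A": [1, 5], "B": [2, 6]})

# Rows present in both frames occur twice after concat, and keep=False
# removes all occurrences of any duplicated row.
diff = pd.concat([left, right]).drop_duplicates(keep=False)
print(diff)
```

Note that the result holds rows unique to either frame (here including the row from `right`), and the original index labels are kept rather than renumbered.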

Update:

The above method only works when neither data frame already contains duplicate rows. For example:

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

This produces the output below, which is wrong because the repeated row (3, 4) of df1 is dropped as well:

Wrong output:

pd.concat([df1, df2]).drop_duplicates(keep=False)
Out[655]:
   A  B
1  2  3

Correct output:

Out[656]:
   A  B
1  2  3
2  3  4
3  3  4


How to achieve that?

Method 1: Using isin with tuple

df1[~df1.apply(tuple, axis=1).isin(df2.apply(tuple, axis=1))]
Out[657]:
   A  B
1  2  3
2  3  4
3  3  4
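
The one-liner in Method 1 can be unpacked step by step. The sketch below uses the same `df1` and `df2` as above; the intermediate names (`rows1`, `rows2`, `in_df2`, `only_in_df1`) are introduced here only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 3], "B": [2, 3, 4, 4]})
df2 = pd.DataFrame({"A": [1], "B": [2]})

# Turn every row into a hashable tuple so whole rows can be compared at once.
rows1 = df1.apply(tuple, axis=1)
rows2 = df2.apply(tuple, axis=1)

# Boolean mask: True where a row of df1 also occurs in df2.
in_df2 = rows1.isin(rows2)

# Keep the rows of df1 that are not in df2; repeated rows of df1 survive.
only_in_df1 = df1[~in_df2]
print(only_in_df1)
```

Because the mask is computed per row of `df1`, duplicates inside `df1` are not collapsed, which is what the `drop_duplicates` approach gets wrong.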

Method 2: merge with indicator

df1.merge(df2, indicator=True, how='left').loc[lambda x: x['_merge'] != 'both']
Out[421]:
   A  B     _merge
1  2  3  left_only
2  3  4  left_only
3  3  4  left_only
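
If the helper `_merge` column is not wanted in the result, Method 2 can be wrapped in a small function. This is only a sketch: `rows_only_in_left` is a hypothetical name, and it assumes the two frames share the same columns, so that the merge joins on all of them by default:

```python
import pandas as pd

def rows_only_in_left(left, right):
    """Return rows of `left` with no exact match in `right` (hypothetical helper)."""
    # indicator=True adds a `_merge` column: "both" or "left_only" for a left join.
    merged = left.merge(right, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

df1 = pd.DataFrame({"A": [1, 2, 3, 3], "B": [2, 3, 4, 4]})
df2 = pd.DataFrame({"A": [1], "B": [2]})

result = rows_only_in_left(df1, df2)
print(result)
```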

How to get the column names between two columns in R

You can use the dplyr package like this:

library(dplyr) 
df %>% select(Q1:Q5) %>% colnames()

or as a function that takes the two column names as strings:

find_colnames <- function(c1, c2, data) data %>% select(all_of(c1):all_of(c2)) %>% colnames()

How to rename a single column in a data.frame?

colnames(trSamp)[2] <- "newname2"

attempts to set the second column's name. Your object only has one column, so the command throws an error. This should be sufficient:

colnames(trSamp) <- "newname2"

R dataframe - Top n values in row with column names

You could pivot to long, group by the corresponding original row, use slice_max to get the top values, then pivot back to wide and bind that output to the original table.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

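iris %>%
  group_by(rn = row_number()) %>%
  pivot_longer(-c(Species, rn), names_to = 'col', values_to = 'high') %>%
  # rank by the values (`high`), not by the column names (`col`);
  # with_ties = FALSE keeps exactly two entries per row
  slice_max(high, n = 2, with_ties = FALSE) %>%
  mutate(nm = row_number()) %>%
  pivot_wider(values_from = c(high, col),
              names_from = nm) %>%
  ungroup() %>%
  select(-c(Species, rn)) %>%
  bind_cols(iris)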
#> # A tibble: 150 × 9
#>    high_1 high_2 col_1   col_2 Sepal.Length Sepal.Width Petal.Length Petal.Width
#>     <dbl>  <dbl> <chr>   <chr>        <dbl>       <dbl>        <dbl>       <dbl>
#>  1    5.1    3.5 Sepal.… Sepa…          5.1         3.5          1.4         0.2
#>  2    4.9    3   Sepal.… Sepa…          4.9         3            1.4         0.2
#>  3    4.7    3.2 Sepal.… Sepa…          4.7         3.2          1.3         0.2
#>  4    4.6    3.1 Sepal.… Sepa…          4.6         3.1          1.5         0.2
#>  5    5      3.6 Sepal.… Sepa…          5           3.6          1.4         0.2
#>  6    5.4    3.9 Sepal.… Sepa…          5.4         3.9          1.7         0.4
#>  7    4.6    3.4 Sepal.… Sepa…          4.6         3.4          1.4         0.3
#>  8    5      3.4 Sepal.… Sepa…          5           3.4          1.5         0.2
#>  9    4.4    2.9 Sepal.… Sepa…          4.4         2.9          1.4         0.2
#> 10    4.9    3.1 Sepal.… Sepa…          4.9         3.1          1.5         0.1
#> # … with 140 more rows, and 1 more variable: Species <fct>

Created on 2022-02-16 by the reprex package (v2.0.1)

Edited to remove the unnecessary rename and mutate, thanks to tip from @Onyambu!

Determine the data types of a data frame's columns

Your best bet to start is to use ?str(). To explore some examples, let's make some data:

set.seed(3221)  # this makes the example exactly reproducible
my.data <- data.frame(y=rnorm(5),
                      x1=c(1:5),
                      x2=c(TRUE, TRUE, FALSE, FALSE, FALSE),
                      X3=letters[1:5],
                      stringsAsFactors=TRUE)  # needed in R >= 4.0 for X3 to be a factor, as in the output below

@Wilmer E Henao H's solution is very streamlined:

sapply(my.data, class)
        y        x1        x2        X3
"numeric" "integer" "logical"  "factor"

Using str() gets you that information plus extra goodies (such as the levels of your factors and the first few values of each variable):

str(my.data)
'data.frame': 5 obs. of 4 variables:
$ y : num 1.03 1.599 -0.818 0.872 -2.682
$ x1: int 1 2 3 4 5
$ x2: logi TRUE TRUE FALSE FALSE FALSE
$ X3: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

@Gavin Simpson's approach is also streamlined, but provides slightly different information than class():

sapply(my.data, typeof)
        y        x1        x2        X3
 "double" "integer" "logical" "integer"

For more information about class, typeof, and the middle child, mode, see this excellent SO thread: "A comprehensive survey of the types of things in R; 'mode' and 'class' and 'typeof' are insufficient".

Row names & column names in R

As Oscar Wilde said:

Consistency is the last refuge of the unimaginative.

R is more of an evolved than a designed language, so these things happen. names() and colnames() both work on a data.frame, but names() returns NULL for a matrix, even one that has column names:

R> DF <- data.frame(foo=1:3, bar=LETTERS[1:3])
R> names(DF)
[1] "foo" "bar"
R> colnames(DF)
[1] "foo" "bar"
R> M <- matrix(1:9, ncol=3, dimnames=list(1:3, c("alpha","beta","gamma")))
R> names(M)
NULL
R> colnames(M)
[1] "alpha" "beta" "gamma"
R>

