Difference between `names(df[1]) <- ` and `names(df)[1] <- `

What I think is happening is that replacement into a data frame ignores the attributes (including the names) of the data frame the values are drawn from. I am not 100% sure of this, but the following experiments appear to back it up:

df <- data.frame(a = 1:3, b = 5:7)
#   a b
# 1 1 5
# 2 2 6
# 3 3 7

df2 <- data.frame(c = 10:12)
#    c
# 1 10
# 2 11
# 3 12

df[1] <- df2[1]   # in this case `df[1] <- df2` is equivalent

Which produces:

df
#    a b
# 1 10 5
# 2 11 6
# 3 12 7

Notice how the values changed for df, but not the names. Basically the replacement operator `[<-` only replaces the values. This is why the name was not updated. I believe this explains all the issues.

In the scenario:

names(df[2]) <- "x"

You can think of the assignment as follows (this is a simplification, see end of post for more detail):

tmp <- df[2]
#   b
# 1 5
# 2 6
# 3 7

names(tmp) <- "x"
#   x
# 1 5
# 2 6
# 3 7

df[2] <- tmp   # `tmp` has "x" for names, but it is ignored!
#    a b
# 1 10 5
# 2 11 6
# 3 12 7

The last step of which is an assignment with `[<-`, which doesn't respect the names attribute of the RHS.

But in the scenario:

names(df)[2] <- "x"

you can think of the assignment as (again, a simplification):

tmp <- names(df)
# [1] "a" "b"

tmp[2] <- "x"
# [1] "a" "x"

names(df) <- tmp
#    a x
# 1 10 5
# 2 11 6
# 3 12 7

Notice how we assign directly to the names, instead of assigning into df with `[<-`, which ignores attributes.

Finally,

df[2] <- 2

works because we are assigning directly to the values, not the attributes, so there are no problems here.

EDIT: based on some commentary from @AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):

Version 1, `names(df[2]) <- "x"`, translates to:

df <- `[<-`(
  df, 2, 
  value=`names<-`(   # `names<-` here returns a re-named one column data frame
    `[`(df, 2),       
    value="x"
) )

Version 2, `names(df)[2] <- "x"`, translates to:

df <- `names<-`(
  df,
  `[<-`(
     names(df), 2, "x"
) )
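
Both translations can be run directly as ordinary function calls. The following is a minimal sketch, assuming a fresh two-column data frame; the names `v1` and `v2` are introduced here only for illustration:

```r
df <- data.frame(a = 1:3, b = 5:7)

# Version 1: rename a one-column copy, then write it back with `[<-`.
# `[<-` on a data frame keeps the existing column name, so "x" is lost.
v1 <- `[<-`(df, 2, value = `names<-`(df[2], value = "x"))
names(v1)   # still "a" and "b"

# Version 2: modify the names vector, then assign it with `names<-`.
v2 <- `names<-`(df, value = `[<-`(names(df), 2, "x"))
names(v2)   # now "a" and "x"
```

Neither call modifies `df` itself; the `df <- ...` assignment in the translations above is what makes the change stick.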

Also, turns out this is "documented" in R Inferno Section 8.2.34 (Thanks @Frank):

right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed       b
#       1       2

Python Pandas - Find difference between two data frames

By using drop_duplicates

pd.concat([df1,df2]).drop_duplicates(keep=False)

Update:

The above method only works when neither data frame already contains duplicate rows of its own. For example:

df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})
df2=pd.DataFrame({'A':[1],'B':[2]})

It produces the output below, which is wrong because the duplicated row (3, 4) of df1 is dropped as well:

Wrong Output:

pd.concat([df1, df2]).drop_duplicates(keep=False)
Out[655]: 
   A  B
1  2  3

Correct Output:

   A  B
1  2  3
2  3  4
3  3  4

How to achieve that?

Method 1: Using isin with tuple

df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))]
Out[657]: 
   A  B
1  2  3
2  3  4
3  3  4

Method 2: merge with indicator

df1.merge(df2,indicator = True, how='left').loc[lambda x : x['_merge']!='both']
Out[421]: 
   A  B     _merge
1  2  3  left_only
2  3  4  left_only
3  3  4  left_only

How to get the column names between two columns in R

You can use the dplyr package like this:

library(dplyr) 
df %>% select(Q1:Q5) %>% colnames()

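or as a function:

# c1 and c2 are column names given as strings; all_of() avoids the
# tidyselect warning about using external vectors in a selection
find_colnames <- function(c1, c2, data) data %>% select(all_of(c1):all_of(c2)) %>% colnames()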
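
If dplyr is not available, the same lookup can be sketched in base R by matching the two names to their positions. This is only an illustration: the helper `cols_between` and the toy data frame `survey` are hypothetical and not part of the original answer.

```r
# Return the names of all columns from c1 to c2 (inclusive).
# c1 and c2 are column names given as strings.
cols_between <- function(data, c1, c2) {
  pos <- match(c(c1, c2), names(data))
  if (anyNA(pos)) stop("both column names must exist in `data`")
  names(data)[pos[1]:pos[2]]
}

survey <- data.frame(id = 1, Q1 = 2, Q2 = 3, Q3 = 4, Q4 = 5, Q5 = 6, done = TRUE)
cols_between(survey, "Q1", "Q5")
```

If `c1` comes after `c2` in the data frame, the names are returned in reverse order.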

How to rename a single column in a data.frame?

colnames(trSamp)[2] <- "newname2"

attempts to set the second column's name. Your object only has one column, so the command throws an error. This should be sufficient:

colnames(trSamp) <- "newname2"
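
To avoid depending on the column's position at all, the rename can be done by name with a logical index. A minimal sketch; the helper `rename_col` and the data frame `d` are hypothetical and only stand in for the asker's object:

```r
# Rename the column called `old` to `new`; fail loudly if it is missing.
rename_col <- function(data, old, new) {
  hit <- names(data) == old
  if (!any(hit)) stop("no column named '", old, "'")
  names(data)[hit] <- new
  data
}

d <- data.frame(id = 1:2, value = c(0.5, 1.5))
d <- rename_col(d, "value", "newname2")
names(d)   # the columns are now "id" and "newname2"
```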

R dataframe - Top n values in row with column names

You could pivot to long, group by the corresponding original row, use slice_max to get the top values, then pivot back to wide and bind that output to the original table.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

iris %>% 
  group_by(rn = row_number()) %>% 
  pivot_longer(-c(Species, rn), 'col', values_to = 'high') %>% 
  slice_max(high, n = 2) %>% 
  mutate(nm = row_number()) %>% 
  pivot_wider(values_from = c(high, col), 
              names_from = nm) %>% 
  ungroup() %>% 
  select(-c(Species, rn)) %>% 
  bind_cols(iris)
#> # A tibble: 150 × 9
#>    high_1 high_2 col_1   col_2 Sepal.Length Sepal.Width Petal.Length Petal.Width
#>     <dbl>  <dbl> <chr>   <chr>        <dbl>       <dbl>        <dbl>       <dbl>
#>  1    5.1    3.5 Sepal.… Sepa…          5.1         3.5          1.4         0.2
#>  2    4.9    3   Sepal.… Sepa…          4.9         3            1.4         0.2
#>  3    4.7    3.2 Sepal.… Sepa…          4.7         3.2          1.3         0.2
#>  4    4.6    3.1 Sepal.… Sepa…          4.6         3.1          1.5         0.2
#>  5    5      3.6 Sepal.… Sepa…          5           3.6          1.4         0.2
#>  6    5.4    3.9 Sepal.… Sepa…          5.4         3.9          1.7         0.4
#>  7    4.6    3.4 Sepal.… Sepa…          4.6         3.4          1.4         0.3
#>  8    5      3.4 Sepal.… Sepa…          5           3.4          1.5         0.2
#>  9    4.4    2.9 Sepal.… Sepa…          4.4         2.9          1.4         0.2
#> 10    4.9    3.1 Sepal.… Sepa…          4.9         3.1          1.5         0.1
#> # … with 140 more rows, and 1 more variable: Species <fct>

Created on 2022-02-16 by the reprex package (v2.0.1)

Edited to remove the unnecessary rename and mutate, thanks to tip from @Onyambu!
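
For comparison, the same idea can be sketched in base R without reshaping, by ordering each row of the numeric columns. This is an alternative technique, not the answer's method; the helper name `top_n_by_row` is hypothetical.

```r
# For each row, return the n largest numeric values and the columns they came from.
top_n_by_row <- function(data, n = 2) {
  # keep only the numeric columns, as a matrix for row-wise indexing
  num <- as.matrix(data[vapply(data, is.numeric, logical(1))])
  # idx[k, i] is the column position of the k-th largest value in row i
  idx <- matrix(
    apply(num, 1, function(r) order(r, decreasing = TRUE)[seq_len(n)]),
    nrow = n
  )
  out <- list()
  for (k in seq_len(n)) {
    out[[paste0("high_", k)]] <- num[cbind(seq_len(nrow(num)), idx[k, ])]
    out[[paste0("col_", k)]]  <- colnames(num)[idx[k, ]]
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

head(top_n_by_row(iris), 3)
```

The result has one row per input row, so it can be combined with the original table using `cbind()`.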

Determine the data types of a data frame's columns

Your best bet to start is to use ?str(). To explore some examples, let's make some data:

set.seed(3221)  # this makes the example exactly reproducible
my.data <- data.frame(y=rnorm(5), 
                      x1=c(1:5), 
                      x2=c(TRUE, TRUE, FALSE, FALSE, FALSE),
                      X3=letters[1:5],
                      stringsAsFactors=TRUE)  # needed in R >= 4.0 for X3 to be a factor, as in the output below

@Wilmer E Henao H's solution is very streamlined:

sapply(my.data, class)
        y        x1        x2        X3 
"numeric" "integer" "logical"  "factor"

Using str() gets you that information plus extra goodies (such as the levels of your factors and the first few values of each variable):

str(my.data)
'data.frame':  5 obs. of  4 variables:
$ y : num  1.03 1.599 -0.818 0.872 -2.682
$ x1: int  1 2 3 4 5
$ x2: logi  TRUE TRUE FALSE FALSE FALSE
$ X3: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5

@Gavin Simpson's approach is also streamlined, but provides slightly different information than class():

sapply(my.data, typeof)
       y        x1        x2        X3 
"double" "integer" "logical" "integer"

For more information about class, typeof, and the middle child, mode, see this excellent SO thread: A comprehensive survey of the types of things in R. 'mode' and 'class' and 'typeof' are insufficient.
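
The three functions can also be compared side by side for every column. A minimal sketch; the helper `describe_columns` and the data frame `toy` are hypothetical names used only for this illustration:

```r
# One row per column, reporting class(), typeof() and mode().
describe_columns <- function(data) {
  data.frame(
    column = names(data),
    class  = vapply(data, function(x) class(x)[1], character(1), USE.NAMES = FALSE),
    typeof = vapply(data, typeof, character(1), USE.NAMES = FALSE),
    mode   = vapply(data, mode, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

toy <- data.frame(n = c(1.5, 2), i = 1:2, f = factor(c("a", "b")))
describe_columns(toy)
```

For the factor column, `class()` reports "factor" while `typeof()` reports "integer" and `mode()` reports "numeric", which is the kind of difference the thread above discusses.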

Row names & column names in R

As Oscar Wilde said:

Consistency is the last refuge of the unimaginative.

R is more an evolved language than a designed one, so these things happen. names() and colnames() both work on a data.frame, but names() does not work on a matrix:

R> DF <- data.frame(foo=1:3, bar=LETTERS[1:3])
R> names(DF)
[1] "foo" "bar"
R> colnames(DF)
[1] "foo" "bar"
R> M <- matrix(1:9, ncol=3, dimnames=list(1:3, c("alpha","beta","gamma")))
R> names(M)
NULL
R> colnames(M)
[1] "alpha" "beta"  "gamma"
R>
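
The transcript above can be condensed into a few checks. A minimal sketch, assuming `DF` and `M` are rebuilt exactly as in the session; it relies on the fact that a matrix keeps its labels in `dimnames`, which is where `colnames()` reads them from:

```r
DF <- data.frame(foo = 1:3, bar = LETTERS[1:3])
M  <- matrix(1:9, ncol = 3, dimnames = list(1:3, c("alpha", "beta", "gamma")))

identical(names(DF), colnames(DF))         # TRUE: both give the column names
is.null(names(M))                          # TRUE: a matrix has no names attribute here
identical(colnames(M), dimnames(M)[[2]])   # TRUE: column labels live in dimnames
```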
