Difference between as.data.frame(x) and data.frame(x)

Difference between as.data.frame(x) and as(x, "data.frame")

Maybe the R authors thought replicating the first method would encourage bad coding practice. The first result does not look particularly worth emulating, because the column name is awkward to use. The data.frame() function delivers a much better behaved result for character values, because it passes the deparsed argument through make.names() and so creates a syntactically valid name:

> as.data.frame(c('a','b'))
  c("a", "b")
1           a
2           b

> data.frame(c('a','b'))
  c..a....b..
1           a
2           b

See what happens when you try to extract values using the name of that column: for the first result you need the exact deparsed text, in quotes or backticks. Since everyone knows that data frames are really list objects (right?), it would be more natural to expect coders to pass a named list:

> data.frame(list(b = c('a', 'b')))
  b
1 a
2 b

# same structure as
> as.data.frame(list(f = c('a', 'b')))
  f
1 a
2 b
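Comparing names() on the three results shows the difference directly. This is a small sketch assuming R 4.0 or later (where strings are no longer turned into factors); the object names df1, df2 and df3 are only illustrative.

```r
# Compare the column names each approach produces (assumes R >= 4.0)
df1 <- as.data.frame(c('a', 'b'))          # name is the deparsed call
df2 <- data.frame(c('a', 'b'))             # deparsed call passed through make.names()
df3 <- data.frame(list(b = c('a', 'b')))   # name taken from the list element

names(df1)   # "c(\"a\", \"b\")"
names(df2)   # "c..a....b.."
names(df3)   # "b"

# Extracting the column by name
df1[['c("a", "b")']]    # the exact deparsed text is needed
df1$`c("a", "b")`       # or backticks with $
df2$c..a....b..
df3$b                   # the easy one
```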

Alex's answer points you to the source of the as() function, which elaborates on and confirms joran's comment above. as() does not use S3 dispatch; instead it looks up registered coercion methods, created either by packages or with setAs(), a mechanism more commonly used when building S4 methods.

> setAs("character", "data.frame", function(from) as.data.frame.character(from))
> res <- as(c('a', 'b'), "data.frame")
> res
  from
1    a
2    b

The setAs() function also lets you apply custom coercion while reading data with the read.* functions, through their colClasses argument; see the question "How can I completely remove scientific notation for the entire R session".
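When colClasses names a class that is not one of the basic atomic types, read.table() reads that column as character and hands it to methods::as(). A minimal sketch, assuming numbers stored with thousands separators; the class name num.with.commas is hypothetical and exists only to tag the column, and the CSV text is made up.

```r
library(methods)  # setClass()/setAs(); attached by default in normal R sessions

# Hypothetical (virtual) class used only as a colClasses tag
setClass("num.with.commas")

# Coercion that read.csv() will use for columns tagged "num.with.commas"
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from, fixed = TRUE)))

csv <- 'id,amount
1,"1,234,567"
2,"89,000"'

dat <- read.csv(text = csv, colClasses = c("integer", "num.with.commas"))
str(dat)   # amount arrives as a plain numeric column
```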

In R, what is the difference between df[x] and df$x?

If I'm not mistaken, df$x is the same as df[['x']], except that $ also does partial name matching and only accepts a literal name. [[ selects a single element, whereas [ returns a list of the selected elements (for a data frame, another data frame). See also the language reference. I usually see [[ used for lists, [ for arrays, and $ for getting a single column or element. If the column name is stored in a variable or computed (for example df[[name]] or df[,name]), you have to use the [ or [[ notation. The [ notation is also used when selecting multiple columns, for example df[,c('name1', 'name2')]. I don't think there is an established best practice for this.
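A small data frame makes these differences concrete. The names df and col are illustrative, and the output comments assume R 4.0 or later.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

df["x"]        # a data frame with one column (a list of the selected columns)
df[["x"]]      # the column itself: the integer vector 1 2 3
df$x           # same as df[["x"]] for a literal, complete name

col <- "x"
df[[col]]      # works: [[ evaluates its argument
df$col         # NULL: $ looks for a column literally named "col"

df[, c("x", "y")]  # several columns: still a data frame
df[, "x"]          # a single column via [ , ] is simplified to a vector (drop = TRUE)
```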

What is the difference between data and data.frame in R?

If you're asking how to create a dataframe named people, so you can access the names of the people using people$students or people$teachers, then the code to achieve that is:

people <- data.frame(students = students$name, teachers = teachers$name)
people$students

people would be a data frame with two columns, students and teachers.

If you want a list, you can create a list object like the following:

people2 <- as.list(c("students" = students, "teachers" = teachers))
people2$students.name
# returns [1] Cedric Fred George

And people2 would be a list rather than a data frame.

When you print people2, each element is listed under a $ (dollar sign) header, for example $students.name; that is the name to use after $ to access it. If you wanted teachers.name, print(people2$teachers.name) will do that for you.
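To see what c() actually builds, here is a sketch with made-up stand-ins for the asker's students and teachers data frames (only the student names Cedric, Fred and George come from the answer; the teacher names are invented). It contrasts c(), which flattens the data frames into their columns, with list(), which keeps each data frame whole.

```r
# Hypothetical stand-ins for the asker's data frames
students <- data.frame(name = c("Cedric", "Fred", "George"))
teachers <- data.frame(name = c("Ada", "Grace", "Alan"))   # invented names

# One data frame, one column per group
people <- data.frame(students = students$name, teachers = teachers$name)
names(people)        # "students" "teachers"

# c() flattens both data frames into a single list of their columns;
# wrapping it in as.list() changes nothing, since c() already returns a list here
people2 <- c("students" = students, "teachers" = teachers)
names(people2)       # "students.name" "teachers.name"
people2$students.name

# list() keeps each data frame whole
people3 <- list(students = students, teachers = teachers)
is.data.frame(people3$students)   # TRUE
people3$teachers$name
```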

As for your other questions:

  1. Is dcd similar to a matrix with 101 rows and 19851 columns?

You can verify the dimensions of a matrix-like object using dim(), ncol() or nrow(). In your case, yes: it has 101 rows and 19851 columns.


  2. class(dcd) outputs "xyz" and "matrix"; does that mean dcd belongs to both the "xyz" and "matrix" types at the same time?

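Simplistically, you can think of dcd as an "xyz" object that also inherits from "matrix". The class attribute is a vector, and S3 dispatch tries its entries in order: a method for "xyz" is used if one exists, otherwise the method for "matrix". You may want to read about classes and inheritance in R.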


  3. How can I create data like pdb that includes multiple data frames?

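Look at my code above, with one caveat: people2 <- as.list(c("students" = students, "teachers" = teachers)) does not actually hold two data frames. c() flattens them into one list of their columns, which is why the element is called students.name. To keep each data frame whole as an element of a list, use list() instead: people3 <- list(students = students, teachers = teachers).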
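A toy object can mimic what class(dcd) reports. This sketch assumes only that dcd is a matrix whose class attribute is c("xyz", "matrix"), and uses a much smaller matrix than the real one; the generic describe() and its two methods are hypothetical, defined just to show that S3 dispatch tries the classes in order.

```r
# Toy stand-in for dcd: a matrix whose class vector is c("xyz", "matrix")
dcd <- matrix(0, nrow = 101, ncol = 6)
class(dcd) <- c("xyz", "matrix")

dim(dcd)                 # 101 6
inherits(dcd, "xyz")     # TRUE
inherits(dcd, "matrix")  # TRUE

# Hypothetical generic with one method per class
describe <- function(x, ...) UseMethod("describe")
describe.xyz <- function(x, ...) "xyz method"
describe.matrix <- function(x, ...) "matrix method"

describe(dcd)            # "xyz method": the first matching class wins
describe(unclass(dcd))   # "matrix method": a plain matrix falls back to its implicit class
```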

What is the difference in meaning between <- and = in data.frame?

No, this is not strange. You call the data.frame constructor with unnamed arguments in the first case and named arguments in the second.

I originally assumed that, since a data frame is a list, help(list) would explain the behaviour of data.frame(). Even though the philosophy is the same (named and unnamed arguments), that was a mistake: the answer is in the help page of data.frame.

From ?data.frame, here is the part about the names of the arguments:

If the arguments are all named and simple objects (not lists,
matrices of data frames) then the argument names give the column
names. For an unnamed simple argument, a deparsed version of the
argument is used as the name (with an enclosing I(...) removed).

So

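x <- data.frame(name <- c("n1", "n2"), age <- c(5, 6))

this behaves like the following, apart from the column names, which are deparsed from the whole expressions (such as name <- c("n1", "n2")) rather than from the bare values:

x <- data.frame(c("n1", "n2"), c(5, 6))   ## two unnamed arguments
name <- c("n1", "n2")                      ## plus two ordinary assignments
age <- c(5, 6)

Then for y, the arguments are named, so their names become the column names:

y <- data.frame(name = c("n1", "n2"), age = c(5, 6))   ## named arguments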
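Checking names() and the calling environment confirms both effects. A short sketch; the output comments assume R 4.0 or later, and the exact long names in x are not spelled out because they come from make.names() applied to the deparsed expressions.

```r
# <- inside the call: unnamed arguments, plus two assignments as a side effect
x <- data.frame(name <- c("n1", "n2"), age <- c(5, 6))
# = inside the call: named arguments
y <- data.frame(name = c("n1", "n2"), age = c(5, 6))

names(y)   # "name" "age"
names(x)   # long names built from the deparsed "name <- c(...)" expressions

# The assignments really happened in the calling environment
name       # "n1" "n2"
age        # 5 6
```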

But note that this explains the naming only for simple object arguments. The full naming rules are more complicated than adding some dots.
For example, I find it surprising that these two statements are equivalent (with check.names = TRUE or FALSE):

a <- data.frame(y <- list(x = 1))
a <- data.frame(y = list(x = 1))
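The equivalence appears to hold because a one-column list argument takes its column name from the list itself, so the outer name y is dropped either way. With two or more components, as far as I can tell from data.frame()'s naming code, a named argument prefixes its name, so the two forms diverge. A sketch; a1, a2, b1 and b2 are illustrative names.

```r
# One-component list: both forms give the column name "x"
a1 <- data.frame(y <- list(x = 1))
a2 <- data.frame(y = list(x = 1))
names(a1)   # "x"
names(a2)   # "x"

# Two components: the named form prefixes the argument name
b1 <- data.frame(y <- list(x = 1, z = 2))
b2 <- data.frame(y = list(x = 1, z = 2))
names(b1)   # "x" "z"
names(b2)   # "y.x" "y.z"
```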

Get difference between column strings in R dataframe

This works on a multi-row data frame, doing comparisons by row:

library(dplyr)

major <- c("T2A,C26T,G652A", "world")
minor <- c("T2A,C26T,G652A,C725T", "hello,world")

df <- data.frame(major, minor)

df %>%
  mutate(
    # split each string into a character vector (one list element per row)
    across(c(major, minor), ~ strsplit(.x, split = ","))
  ) %>%
  mutate(
    # elements of minor that are missing from major, row by row
    diff = mapply(setdiff, minor, major)
  )
#              major                   minor  diff
# 1 T2A, C26T, G652A T2A, C26T, G652A, C725T C725T
# 2            world            hello, world hello

Note that it does modify the major and minor columns, turning them into list columns containing character vectors within each row. You can use the .names argument to across if you need to keep the originals.
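If you would rather not depend on dplyr, a base R variant can leave the original columns untouched. This is a sketch of an alternative, not the answer's method: it uses Map() in place of mapply() and collapses each row's result into one comma-separated string, an assumption that keeps diff a plain character column even when a row has zero or several differences.

```r
major <- c("T2A,C26T,G652A", "world")
minor <- c("T2A,C26T,G652A,C725T", "hello,world")
df <- data.frame(major, minor)

# Split each row's strings; the original character columns are not modified
diffs <- Map(setdiff, strsplit(df$minor, ","), strsplit(df$major, ","))

# One string per row ("" when nothing differs, "a,b" when several items differ)
df$diff <- vapply(diffs, paste, character(1), collapse = ",")
df
```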
