Numbers as Column Names of Data Frames

Yes, because syntactically valid object names in R cannot start with a number. If you were to call attach() on the data.frame, column names starting with a number would cause some issues.

The data.frame (and read.table) function has a check.names parameter (default is TRUE):

If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names and are not duplicated. If necessary they are adjusted (by make.names) so that they are.

From ?make.names:

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. [...] The character "X" is prepended if necessary.
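
To make the rule concrete, here is a rough Python sketch of the renaming described in that passage. It is an approximation under stated assumptions, not R's implementation: the function name make_names_sketch is made up for illustration, only ASCII letters count as letters, and reserved words and duplicate names are not handled.

```python
import re


def make_names_sketch(name: str) -> str:
    """Approximate the make.names rule quoted above (hypothetical helper)."""
    # Anything that is not a letter, digit, dot or underscore becomes a dot.
    cleaned = re.sub(r"[^A-Za-z0-9._]", ".", name)
    # A valid name starts with a letter, or with a dot not followed by a digit.
    if not re.match(r"^([A-Za-z]|\.(?![0-9]))", cleaned):
        cleaned = "X" + cleaned
    return cleaned


for header in ["2019", "1. A", "sales", ".2way"]:
    print(header, "->", make_names_sketch(header))
```

A header such as "2019" comes out as "X2019", which is why numeric column names pick up an X prefix when check.names is TRUE.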

Index pandas DataFrame by column numbers, when column names are integers

This is exactly the purpose of iloc, which selects by integer position rather than by label:

In [37]: df
Out[37]:
   10  11  12  13  14  15  16
x   0   1   2   3   4   5   6
y   7   8   9  10  11  12  13
u  14  15  16  17  18  19  20
z  21  22  23  24  25  26  27
w  28  29  30  31  32  33  34

In [38]: df.iloc[:,[1,3]]
Out[38]:
   11  13
x   1   3
y   8  10
u  15  17
z  22  24
w  29  31
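
As a hedged sketch of why this matters when the labels are themselves integers, the snippet below rebuilds the frame shown above (assuming NumPy and pandas are installed) and compares position-based selection with label-based selection. The variable names by_position and by_label are only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame shown above: integer column labels 10..16, string row labels.
df = pd.DataFrame(
    np.arange(35).reshape(5, 7),
    index=list("xyuzw"),
    columns=range(10, 17),
)

# iloc counts positions from 0, regardless of what the labels are.
by_position = df.iloc[:, [1, 3]]

# loc looks up the labels themselves, which here are the integers 11 and 13.
by_label = df.loc[:, [11, 13]]

print(by_position.equals(by_label))
```

Plain df[1] would look for a column labelled 1 and raise a KeyError here, since no such label exists.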

Select columns from dataframe start with number

Taking into account that variable names starting with numbers will be converted to names starting with X, you could do:

library(tidyverse)

df %>%
  select(matches("^X[0-9]"))

which gives:

   X1..A X2..B X3..C X4..D
1
2 D A G
3 G
4 NA G
5 A G
6 D A G
7 A G
8 A G
9 D
10

With the same logic you can do your counts:

df %>%
  summarize(across(c(matches("^X[0-9]"), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))

which gives

  X1..A X2..B X3..C X4..D Concatenate
1     3     5     0     7           8

Although I'm not sure if you want to exclude the "NAG" value in the Concatenate column.

how to get numeric column names in pandas dataframe

Use select_dtypes with np.number to select all numeric columns:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4.5,5,4,5,5,4],
                   'C':[7.4,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':list('aaabbb')})

print (df)
   A    B    C  D  E
0  a  4.5  7.4  1  a
1  b  5.0  8.0  3  a
2  c  4.0  9.0  5  a
3  d  5.0  4.0  7  b
4  e  5.0  2.0  1  b
5  f  4.0  3.0  0  b

print (df.dtypes)
A     object
B    float64
C    float64
D      int64
E     object
dtype: object

cols = df.select_dtypes([np.number]).columns
print (cols)
Index(['B', 'C', 'D'], dtype='object')

It is also possible to specify float64 and int64 explicitly, which leaves out other numeric dtypes such as int32:

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4.5,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':list('aaabbb')})

df['D'] = df['D'].astype(np.int32)
print (df.dtypes)
A     object
B    float64
C      int64
D      int32
E     object
dtype: object

cols = df.select_dtypes([np.int64,np.float64]).columns
print (cols)
Index(['B', 'C'], dtype='object')
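
The two selections can also be checked side by side. The sketch below assumes NumPy and pandas are installed and uses a small frame similar to the one above; the string alias "number" is used in place of np.number, and the variable names all_numeric and only_64bit are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": list("abcdef"),
    "B": [4.5, 5, 4, 5, 5, 4],
    "C": [7, 8, 9, 4, 2, 3],
    "D": np.array([1, 3, 5, 7, 1, 0], dtype=np.int32),
})

# "number" covers every numeric dtype, including the int32 column D.
all_numeric = df.select_dtypes(include="number").columns.tolist()

# An explicit list keeps only the dtypes named in it, so D is left out.
only_64bit = df.select_dtypes(include=[np.int64, np.float64]).columns.tolist()

print(all_numeric)
print(only_64bit)
```

Calling tolist() on the resulting Index gives a plain Python list of column names, which is often easier to pass on to other code.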

getting column names as integers in r data.frame

When we use data.frame, specify check.names = FALSE. The default is TRUE, which prepends 'X' to non-standard column names (i.e. those starting with numbers):

sales_df <- data.frame(sales_tb, check.names = FALSE)

The column names attribute is always of class character, never integer. If we want integers, the conversion has to happen outside the data frame:

as.integer(colnames(sales_df))

How to reference column names that start with a number, in data.table

I think this is what you're looking for, though I'm not sure. data.table is different from data.frame. Please have a look at the data.table quick introduction, and then the FAQ (and also the reference manual if necessary).

require(data.table)
dt <- data.table("4PCS" = 1:3, y=3:1)
#   4PCS y
# 1:    1 3
# 2:    2 2
# 3:    3 1

# access column 4PCS
dt[, "4PCS"]

# returns a data.table
#    4PCS
# 1:    1
# 2:    2
# 3:    3

# to access multiple columns by name
dt[, c("4PCS", "y")]

Alternatively, if you need the column as a vector rather than a data.table, you can access it using the $ notation:

dt$`4PCS` # notice the ` because the variable begins with a number
# [1] 1 2 3

# alternatively, as mnel mentioned under comments:
dt[, `4PCS`]
# [1] 1 2 3

Or, if you know the column number, you can access it using [[ ]] as follows:

dt[[1]] # 4PCS is the first column here
# [1] 1 2 3

Edit:

Thanks @joran. I think you're looking for this:

dt[, `4PCS` + y]
# [1] 4 4 4

Fundamentally the issue is that 4PCS is not a valid variable name in R (try 4PCS <- 1, you'll get the same "unexpected symbol" error). So to refer to it, we have to use backticks (compare `4PCS` <- 1).


