How to Convert a Data Frame of Integer64 Values to Be a Matrix

How to convert a data frame of integer64 values to be a matrix?

For a plain integer64 vector (not yet a matrix), assigning the dim attribute directly seems to work:

> library(bit64)
> z <- as.integer64(1:10)
> z
integer64
 [1] 1  2  3  4  5  6  7  8  9  10
> dim(z) <- c(10, 1)
> z
integer64
      [,1]
 [1,] 1   
 [2,] 2   
 [3,] 3   
 [4,] 4   
 [5,] 5   
 [6,] 6   
 [7,] 7   
 [8,] 8   
 [9,] 9   
[10,] 10

For a data frame, cbinding the columns also works:

> df <- data.frame(x=as.integer64(1:5), y=as.integer64(6:10))
> df
  x  y
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10
> cbind(df$x, df$y)
integer64
     [,1] [,2]
[1,] 1    6   
[2,] 2    7   
[3,] 3    8   
[4,] 4    9   
[5,] 5    10

So, for an arbitrary number of columns, do.call is the way to go:

> do.call(cbind, df)
integer64
     x y 
[1,] 1 6 
[2,] 2 7 
[3,] 3 8 
[4,] 4 9 
[5,] 5 10

Why do big integers get deformed in R when I convert from a data frame or data table to a matrix?

Hi, maybe you should take a look at "Why are values changing when converting from data.frame to a numeric matrix?"

Look at Alex A.'s answer there and tell me whether it helps. I also think the cause is that the numeric values in your data frame are being treated as factors.

Alex A.'s code: y <- apply(as.matrix(x[, 1:5]), 2, as.numeric)

Edit: Never mind, it seems you have found your problem.

as.integer() on an int64 dataframe produces unexpected result

I think this is a limitation of bit64. bit64 uses the S3 method as.integer.integer64 to convert from int64 to int, but only for vectors (unlike base as.integer, which can be applied to other objects). The base as.integer doesn't know how to convert int64 to int on a data.frame or any other container.

So after loading bit64, as.integer will actually call as.integer.integer64 on all int64 vectors, but not on a data.frame or tibble.

Convert all columns from int64 to int32

You can build a dictionary that maps every int64 column (selected with DataFrame.select_dtypes) to int32 and pass it to DataFrame.astype. I am not sure whether this fails or misbehaves when the columns hold integers too large for int32:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

d = dict.fromkeys(df.select_dtypes(np.int64).columns, np.int32)
df = df.astype(d)
print (df.dtypes)
A    object
B     int32
C     int32
D     int32
E     int32
F    object
dtype: object
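
On the worry about big integers: a minimal sketch of one way to guard the cast, assuming you would rather leave a column as int64 than risk values that do not fit in int32. The helper name downcast_int64_columns and the sample frame are hypothetical, introduced only for illustration; np.iinfo supplies the int32 bounds.

```python
import numpy as np
import pandas as pd

def downcast_int64_columns(frame):
    """Cast int64 columns to int32 only when every value fits in int32.

    Hypothetical helper for illustration; not part of pandas.
    """
    limits = np.iinfo(np.int32)
    result = frame.copy()
    for col in frame.select_dtypes(include=[np.int64]).columns:
        col_min, col_max = frame[col].min(), frame[col].max()
        if limits.min <= col_min and col_max <= limits.max:
            result[col] = frame[col].astype(np.int32)
        # Otherwise the column is left as int64 so no value is mangled.
    return result

sample = pd.DataFrame({
    'small': np.array([1, 2, 3], dtype=np.int64),
    'big': np.array([1, 2, 2**40], dtype=np.int64),  # 2**40 exceeds the int32 range
    'text': list('abc'),
})

converted = downcast_int64_columns(sample)
print(converted.dtypes)
```

Here the column named small should come out as int32, while big stays int64 because one of its values is outside the int32 range.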

Convert integer64 into integer in R

The integer64 class is created by the bit64 package. Many questions about it have come up on SO over the years. To recover the data without mangling it, you need to use functions from the package that created the integer64 object.

 library(bit64)
 ?integer64
 # You might imagine that as.numeric should have an integer64 method,
 # ... but like me you would have been wrong.
 # Instead, division is defined for integer64 objects and it returns a double,
 # ... so divide by 1 (if and only if you have installed and loaded pkg:bit64).

  my_data3$export_value/1
 [1]      0      0      0      0 116290  66703      0      0      0      0      0      0      0      0      0  44671
 [17]      0      0      0   6350   7738      0      0 282161 148129   3499  14305  21185   4663  20628

I didn't notice earlier that both as.integer and as.double methods are defined for 'integer64' objects, so it would probably be better to use them. Certainly, if an integer is needed, it would be better to start with as.integer.

I suppose that one or more of those earlier questions and answers might have contained an answer to this question but I didn't come across a duplicate in the first 5 I looked at.

How do I convert integer 'category' dtypes in a Pandas DataFrame to 'int64'/'float64'?

Suppose the following dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({'cat_str': ['Hello', 'World'],
                   'cat_int': [0, 1],
                   'cat_float': [3.14, 2.71]}, dtype='category')
print(df.dtypes)

# Output
cat_str      category
cat_int      category
cat_float    category
dtype: object

You can try:

dtypes = {col: df[col].cat.categories.dtype for col in df.columns
             if np.issubdtype(df[col].cat.categories.dtype, np.number)}

df = df.astype(dtypes)
print(df.dtypes)

# Output
cat_str      category
cat_int         int64
cat_float     float64
dtype: object

Or if you want to remove all category dtypes, use:

dtypes = {col: df[col].cat.categories.dtype for col in df.columns}

df = df.astype(dtypes)
print(df.dtypes)

# Output
cat_str       object
cat_int        int64
cat_float    float64
dtype: object
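
Both snippets above assume every column is categorical, since the .cat accessor is only available on category columns. A minimal sketch for a frame that mixes categorical and ordinary columns follows. The frame mixed and the names numeric_cats and restored are made up for illustration, and pd.api.types.is_numeric_dtype is swapped in for np.issubdtype as the numeric check (note that it also treats boolean categories as numeric).

```python
import pandas as pd

mixed = pd.DataFrame({
    'cat_int': pd.Categorical([0, 1]),
    'cat_str': pd.Categorical(['Hello', 'World']),
    'plain': [10.5, 20.5],  # ordinary float column, no .cat accessor
})

# Collect only the categorical columns whose categories are numeric.
numeric_cats = {
    col: mixed[col].cat.categories.dtype
    for col in mixed.columns
    if isinstance(mixed[col].dtype, pd.CategoricalDtype)
    and pd.api.types.is_numeric_dtype(mixed[col].cat.categories.dtype)
}

restored = mixed.astype(numeric_cats)
print(restored.dtypes)
```

Only cat_int is converted here; cat_str stays categorical and plain is untouched.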

Convert sparse pandas dataframe with `NaN` into integer values

UPDATE:

If you need nice-looking values, you can do this:

In [84]: df.astype(object)
Out[84]:
   a  b    c
0  0  1    0
1  0  0    1
2  1  1    1
3  0  1    1
4  1  1  NaN

but every column then has the object dtype (the values are stored as generic Python objects):

In [85]: df.astype(object).dtypes
Out[85]:
a    object
b    object
c    object
dtype: object

Timings against a 500K-row DF:

In [86]: df = pd.concat([df] * 10**5, ignore_index=True)

In [87]: df.shape
Out[87]: (500000, 3)

In [88]: %timeit df.astype(object)
10 loops, best of 3: 113 ms per loop

In [89]: %timeit df.applymap(lambda x: int(x) if pd.notnull(x) else x).astype(object)
1 loop, best of 3: 7.86 s per loop

OLD answer:

AFAIK you can't do it with the default NumPy-backed integer dtypes: NaN is a floating-point value, so a column that contains it is stored as float.

Here is a demo:

In [52]: df
Out[52]:
     a    b    c
0  1.0  NaN  0.0
1  NaN  1.0  1.0
2  0.0  0.0  NaN

In [53]: df[pd.isnull(df)] = -1

In [54]: df
Out[54]:
     a    b    c
0  1.0 -1.0  0.0
1 -1.0  1.0  1.0
2  0.0  0.0 -1.0

In [55]: df = df.astype(int)

In [56]: df
Out[56]:
   a  b  c
0  1 -1  0
1 -1  1  1
2  0  0 -1

We are almost there; let's replace -1 with NaN (the columns turn back into floats):

In [57]: df[df < 0] = np.nan

In [58]: df
Out[58]:
     a    b    c
0  1.0  NaN  0.0
1  NaN  1.0  1.0
2  0.0  0.0  NaN

Another demo:

In [60]: df = pd.DataFrame(np.random.choice([0,1], (5,3)), columns=list('abc'))

In [61]: df
Out[61]:
   a  b  c
0  1  0  0
1  1  0  1
2  0  1  1
3  0  0  1
4  0  0  1

Look at what happens to column c if we change a single cell in it to NaN:

In [62]: df.loc[4, 'c'] = np.nan

In [63]: df
Out[63]:
   a  b    c
0  1  0  0.0
1  1  0  1.0
2  0  1  1.0
3  0  0  1.0
4  0  0  NaN
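
For completeness, a sketch under the assumption that a newer pandas is available (0.24 or later): the nullable extension dtype 'Int64' (capital I) can hold integers alongside missing values, so the float round trip shown above is not needed there. The frame below is rebuilt by hand to mirror the first demo.

```python
import numpy as np
import pandas as pd

# Same shape of data as the first demo: integer-valued floats plus NaN.
df = pd.DataFrame({'a': [1.0, np.nan, 0.0],
                   'b': [np.nan, 1.0, 0.0],
                   'c': [0.0, 1.0, np.nan]})

# 'Int64' is the nullable integer extension dtype, distinct from NumPy's 'int64'.
nullable = df.astype('Int64')
print(nullable)
print(nullable.dtypes)
```

The missing cells stay missing while the remaining values are stored as integers. This only works when the float values are whole numbers; fractional values cannot be cast safely.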
