How to convert a data frame of integer64 values to be a matrix?
For a plain integer64 vector, assigning the dim attribute directly seems to work:
> z <- as.integer64(1:10)
> z
integer64
[1] 1 2 3 4 5 6 7 8 9 10
> dim(z) <- c(10, 1)
> z
integer64
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
[5,] 5
[6,] 6
[7,] 7
[8,] 8
[9,] 9
[10,] 10
For a data frame, combining the columns with cbind also works:
> df <- data.frame(x=as.integer64(1:5), y=as.integer64(6:10))
> df
x y
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
> cbind(df$x, df$y)
integer64
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
So, for an arbitrary number of columns, do.call is the way to go:
> do.call(cbind, df)
integer64
x y
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
Why do big integers get deformed in R when I convert from a data frame or data table to a matrix?
Hi, maybe you should take a look at "Why are values changing when converting from data.frame to a numeric matrix?".
Look at Alex A.'s answer there and tell me whether it helps. I also think the problem arises because the numeric values in your data frame are being treated as factors.
Alex A.'s code: y <- apply(as.matrix(x[, 1:5]), 2, as.numeric)
Edit: Never mind, it seems you have already found your problem.
as.integer() on an int64 dataframe produces unexpected result
I think this is a limitation of bit64. bit64 uses the S3 method as.integer.integer64 to convert from int64 to int, but only for vectors (unlike base as.integer, which can be applied to other objects). The base as.integer doesn't know how to convert int64 to int on a data.frame or otherwise.
So after loading bit64, as.integer will actually call as.integer.integer64 on all int64 vectors, but not on a data.frame or tibble.
Convert all columns from int64 to int32
You can build a dictionary of all columns with int64 dtype via DataFrame.select_dtypes and convert them to int32 with DataFrame.astype, but I am not sure whether this fails when the integers are too large for int32:
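import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})
# Map every int64 column name to the target dtype, then cast in one call.
d = dict.fromkeys(df.select_dtypes(np.int64).columns, np.int32)
df = df.astype(d)
print(df.dtypes)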
A object
B int32
C int32
D int32
E int32
F object
dtype: object
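Since the answer above is unsure about large values, here is a minimal sketch of one way to guard the cast: check each int64 column against the int32 range reported by np.iinfo and convert only the columns that fit. The frame and the names small, big, label, safe_cols and converted are made up for illustration; this is one possible approach, not something the original answer prescribes.

```python
import numpy as np
import pandas as pd

# Illustrative frame (hypothetical data): "big" holds a value outside int32 range.
df = pd.DataFrame({
    "small": np.array([1, 2, 3], dtype=np.int64),
    "big": np.array([1, 2, 2**40], dtype=np.int64),
    "label": list("abc"),
})

info = np.iinfo(np.int32)  # bounds of the target dtype
int64_cols = df.select_dtypes(include="int64").columns

# Keep only the columns whose values all fit into int32.
safe_cols = [
    col for col in int64_cols
    if df[col].min() >= info.min and df[col].max() <= info.max
]

converted = df.astype(dict.fromkeys(safe_cols, np.int32))
print(converted.dtypes)
```

Columns that do not fit stay int64, so no value is changed by the cast.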
Convert integer64 into integer in R
The integer64 class is created by the bit64 package. Many questions about it have been asked and answered on SO over the years. To recover the data without mangling it, you need to use functions from the package that created the integer64 object.
library(bit64)
?integer64
# You might imagine that as.numeric should have an integer64 method.
# .... but like me you would have been wrong
# Instead, division is defined for integer64 objects and it returns a double.
# .... so divide by 1 (if and only if you have installed and loaded pkg:bit64)
my_data3$export_value/1
[1] 0 0 0 0 116290 66703 0 0 0 0 0 0 0 0 0 44671
[17] 0 0 0 6350 7738 0 0 282161 148129 3499 14305 21185 4663 20628
I didn't notice earlier that there are both as.integer and as.double methods defined for 'integer64' objects, so it might be better to use them. Certainly, if an integer is needed, it would be better to start with as.integer.
I suppose that one or more of those earlier questions and answers might have contained an answer to this question but I didn't come across a duplicate in the first 5 I looked at.
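One caveat about going through a double, sketched here in Python with NumPy rather than R (an analogy, not bit64 itself): a 64-bit float has a 53-bit significand, so 64-bit integers above 2^53 are not all representable exactly. The variable names are illustrative.

```python
import numpy as np

small = np.int64(282161)
large = np.int64(2**53 + 1)

# True division of an int64 yields a float64, much like dividing by 1 above.
small_as_double = small / 1
large_as_double = large / 1

print(int(small_as_double) == int(small))  # small values survive the round trip
print(int(large_as_double) == int(large))  # values above 2**53 may not
```

Values of the size shown in the output above are far below that limit, so the conversion is exact for them.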
How do I convert integer 'category' dtypes in a Pandas DataFrame to 'int64'/'float64'?
Suppose the following dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({'cat_str': ['Hello', 'World'],
'cat_int': [0, 1],
'cat_float': [3.14, 2.71]}, dtype='category')
print(df.dtypes)
# Output
cat_str category
cat_int category
cat_float category
dtype: object
You can try:
dtypes = {col: df[col].cat.categories.dtype for col in df.columns
if np.issubdtype(df[col].cat.categories.dtype, np.number)}
df = df.astype(dtypes)
print(df.dtypes)
# Output
cat_str category
cat_int int64
cat_float float64
dtype: object
Or if you want to remove all category dtypes, use:
dtypes = {col: df[col].cat.categories.dtype for col in df.columns}
df = df.astype(dtypes)
print(df.dtypes)
# Output
cat_str object
cat_int int64
cat_float float64
dtype: object
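The snippets above assume every column is categorical, because the .cat accessor raises on other dtypes. A minimal sketch of a variant for mixed frames follows; it restricts the loop to category columns and uses pandas.api.types.is_numeric_dtype for the check. The helper name decategorize_numeric and the sample frame are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def decategorize_numeric(frame):
    """Cast category columns with numeric categories back to that numeric dtype."""
    dtypes = {
        col: frame[col].cat.categories.dtype
        for col in frame.select_dtypes(include="category").columns
        if is_numeric_dtype(frame[col].cat.categories.dtype)
    }
    return frame.astype(dtypes)


# Illustrative frame mixing categorical and plain columns.
df = pd.DataFrame({
    "cat_str": pd.Series(["Hello", "World"], dtype="category"),
    "cat_int": pd.Series([0, 1], dtype="category"),
    "plain": [1.5, 2.5],
})

result = decategorize_numeric(df)
print(result.dtypes)
```

Non-categorical columns and string categories are left untouched.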
Convert sparse pandas dataframe with `NaN` into integer values
UPDATE:
If you need nice-looking values, you can do this:
In [84]: df.astype(object)
Out[84]:
a b c
0 0 1 0
1 0 0 1
2 1 1 1
3 0 1 1
4 1 1 NaN
but the columns then have object dtype (generic Python objects in pandas terms), not an integer dtype:
In [85]: df.astype(object).dtypes
Out[85]:
a object
b object
c object
dtype: object
Timings against 500K rows DF:
In [86]: df = pd.concat([df] * 10**5, ignore_index=True)
In [87]: df.shape
Out[87]: (500000, 3)
In [88]: %timeit df.astype(object)
10 loops, best of 3: 113 ms per loop
In [89]: %timeit df.applymap(lambda x: int(x) if pd.notnull(x) else x).astype(object)
1 loop, best of 3: 7.86 s per loop
OLD answer:
AFAIK you can't do it with regular integer columns: NaN is a floating-point value, so pandas upcasts any column that contains it to float.
Here is a demo:
In [52]: df
Out[52]:
a b c
0 1.0 NaN 0.0
1 NaN 1.0 1.0
2 0.0 0.0 NaN
In [53]: df[pd.isnull(df)] = -1
In [54]: df
Out[54]:
a b c
0 1.0 -1.0 0.0
1 -1.0 1.0 1.0
2 0.0 0.0 -1.0
In [55]: df = df.astype(int)
In [56]: df
Out[56]:
a b c
0 1 -1 0
1 -1 1 1
2 0 0 -1
We are almost there; let's replace -1 with NaN:
In [57]: df[df < 0] = np.nan
In [58]: df
Out[58]:
a b c
0 1.0 NaN 0.0
1 NaN 1.0 1.0
2 0.0 0.0 NaN
Another demo:
In [60]: df = pd.DataFrame(np.random.choice([0,1], (5,3)), columns=list('abc'))
In [61]: df
Out[61]:
a b c
0 1 0 0
1 1 0 1
2 0 1 1
3 0 0 1
4 0 0 1
Look what happens with the c column if we change a single cell in it to NaN:
In [62]: df.loc[4, 'c'] = np.nan
In [63]: df
Out[63]:
a b c
0 1 0 0.0
1 1 0 1.0
2 0 1 1.0
3 0 0 1.0
4 0 0 NaN
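Pandas releases since 0.24 also provide a nullable integer extension dtype, spelled "Int64" with a capital I, which can hold missing values alongside integers. A minimal sketch, assuming the float columns contain only whole numbers and NaN (the frame below mirrors the demo data; the name nullable is illustrative):

```python
import numpy as np
import pandas as pd

# Float columns holding whole numbers and NaN, as in the demo above.
df = pd.DataFrame({
    "a": [1.0, np.nan, 0.0],
    "b": [np.nan, 1.0, 0.0],
    "c": [0.0, 1.0, np.nan],
})

# "Int64" is the nullable extension dtype; NaN becomes pd.NA.
nullable = df.astype("Int64")
print(nullable)
print(nullable.dtypes)
```

The missing entries are shown as <NA>, and the remaining values are stored as integers rather than floats.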