Preserving large numbers
It's not in a "1.67E+12 format", it just won't print entirely using the defaults. R is reading it in just fine and the whole number is there.
> x <- 1665535004661
> x
[1] 1.665535e+12
> print(x, digits = 16)
[1] 1665535004661
See, the numbers were there all along. They don't get lost unless you have a really large number of digits (a double holds integers exactly up to 2^53, roughly 9e15). Sorting on what you read in will work fine, and to see the full values in your data.frame you can call print() explicitly with the digits argument instead of printing implicitly by typing the name.
How do I read large numbers precisely in R and perform arithmetic on them?
That's not large. It is merely a representation problem. Try this:
options(digits=22)
options('digits') defaults to 7, which is why you are seeing what you do. All twelve digits are being read and stored, but not printed by default.
Problem with storing and retrieving very large numbers in parquet format
The problem isn't related to Parquet, but to your initial conversion of the row_list to a pandas DataFrame:
row_list = get_row_list()
col_list = ['tree_id']
df = pd.DataFrame(row_list, columns=col_list)
>>> df
        tree_id
0           NaN
1  2.353130e+17
2           NaN
3  1.353130e+17
4  9.353130e+17
5  8.353130e+17
6           NaN
7           NaN
Because there are missing values, pandas creates a float64 column. And it is this int -> float conversion that loses the precision for such large integers.
Converting the float back to an integer later (when creating the pyarrow Table with a schema that forces an integer column) will then result in a slightly different value, as can be seen by doing this manually in Python as well:
>>> row_list[1]
235313013750949476
>>> df.loc[1, "tree_id"]
2.3531301375094947e+17
>>> int(df.loc[1, "tree_id"])
235313013750949472
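The same effect can be reproduced without pandas. The following is a minimal sketch using only plain Python; the helper name survives_float_round_trip is made up for illustration. It checks whether an integer comes back unchanged after an int -> float -> int round trip, which is only guaranteed up to 2**53 for a 64-bit float.

```python
# A 64-bit float has a 53-bit significand, so every integer up to 2**53
# is exactly representable; above that, some integers get rounded.
EXACT_LIMIT = 2**53  # 9007199254740992


def survives_float_round_trip(n: int) -> bool:
    """Return True if n is unchanged after int -> float -> int (hypothetical helper)."""
    return int(float(n)) == n


tree_id = 235313013750949476  # the value from row_list[1] above

print(survives_float_round_trip(tree_id))          # False
print(int(float(tree_id)))                         # 235313013750949472
print(survives_float_round_trip(1665535004661))    # True, well below 2**53
print(survives_float_round_trip(EXACT_LIMIT + 1))  # False, first integer that is lost
```

At this magnitude neighbouring floats are 32 apart, which is why the value snaps to the nearest multiple of 32.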
One possible solution is to avoid the temporary DataFrame. This will depend on your exact (real) use case of course, but if you start from a Python list as in the reproducible example above, you can also create a pyarrow.Table directly from this list of values (pa.table({"tree_id": row_list}, schema=..)), and this will preserve the exact values in the Parquet file.