How to decode PostgreSQL bytea column hex to int16/uint16 in r?
You can start with this conversion function, substitute a faster strsplit
and use readBin
on the result:
byteArray <- "\\xffff00000100020003000400050006000700080009000a00"
## Split a long string into a a vector of character pairs
Rcpp::cppFunction( code = '
CharacterVector strsplit2(const std::string& hex) {
unsigned int length = hex.length()/2;
CharacterVector res(length);
for (unsigned int i = 0; i < length; ++i) {
res(i) = hex.substr(2*i, 2);
}
return res;
}')
## A function to convert one string to an array of raw
f <- function(x) {
## Split a long string into a a vector of character pairs
x <- strsplit2(x)
## Remove the first element, "\\x"
x <- x[-1]
## Complete the conversion
as.raw(as.hexmode(x))
}
raw <- f(byteArray)
# int16
readBin(con = raw,
what = "integer",
n = length(raw) / 2,
size = 2,
signed = TRUE,
endian = "little")
# -1 0 1 2 3 4 5 6 7 8 9 10
# uint16
readBin(con = raw,
what = "integer",
n = length(raw) / 2,
size = 2,
signed = FALSE,
endian = "little")
# 65535 0 1 2 3 4 5 6 7 8 9 10
# int32
readBin(con = raw,
what = "integer",
n = length(raw) / 4,
size = 4,
signed = TRUE,
endian = "little")
# 65535 131073 262147 393221 524295 655369
This won't work for uint32
and (u)int64
, though, since R uses int32
internally. However, R can also use numerics
to store integers below 2^52. So we can use this:
# uint32
byteArray <- "\\xffffffff0100020003000400050006000700080009000a00"
int32 <- readBin(con = f(byteArray),
what = "integer",
n = length(raw) / 4,
size = 4,
signed = TRUE,
endian = "little")
ifelse(int32 < 0, int32 + 2^32, int32)
# 4294967295 131073 262147 393221 524295 655369
And for gzip
compressed data:
# gzip
byteArray <- "\\x1f8b080000000000000005c1870100200800209a56faffbd41d30dd3b285e37a52f9d033018818000000"
con <- gzcon(rawConnection(f(byteArray)))
readBin(con = con,
what = "integer",
n = length(raw) / 2,
size = 2,
signed = TRUE,
endian = "little")
close(con = con)
Since this is a real connection, we have to make sure to close it.
Bytea data format in R shiny
I dont believe you can convert it. Bytea data is basically bite data. I would suggest just not pulling it into the rdata frame. Bytea is normally bitdata that is an image or something that can not be stored in a traditional format.
Another approach would be to use a JDBC, that normally takes care of the transformations. I would take a look at require(RPostgreSQL) to get a sense of how it can be done.
You might be able to convert it back once it is loaded into R. Try something like this
udecode <- function(string){
uconv <- function(chars) intToUtf8(strtoi(chars, 16L))
ufilter <- function(string) {
if (substr(string, 1, 1)=="|") uconv(substr(string, 2, 5)) else string
}
string <- gsub("\\\\u([[:xdigit:]]{4})", ",|\\1,", string, perl=TRUE)
strings <- unlist(strsplit(string, ","))
string <- paste(sapply(strings, ufilter), collapse='')
return(string)
}
HDBC-postgreSQL bytea fields get returned as SqlByteString hex strings
This is a longtime known issue with this library. See this bug for example.
The broader problem is that getting raw bytes requires a fair amount of cleverness that the postgres api doesn't make obvious. You have to call your entire query with binary rather than text output (which is arguably better anyway, but would require a rewrite of that portion of the binding layer).
You can see where pqexecparams is called and note it is called with a last parameter of 0, which by the postgres docs means everything comes back in text. And for postgres that means this funny hex representation you see.
If that argument were swapped to 1, then things could be done more efficiently (including getting raw binary for bytea
fields) but Statement.hsc
would have to be pervasively rewritten to deserialize those binary values.
This is one of those things where its slightly irritating to lots of people, but nobody has yet been sufficiently motivated to go rewrite and debug the whole thing. But, of course, somebody really should! :-)
How to insert binary data into a PostgreSQL BYTEA column using the C++ libpqxx API?
Figured it out. Here is an example showing how to insert a bunch of binary objects into a table:
pqxx::connection conn( ... );
conn.prepare( "test", "INSERT INTO mytable( name, binfile ) VALUES ($1, $2)" );
pqxx::work work( conn );
for ( ... )
{
std::string name = "foo";
void * bin_data = ...; // obviously do what you need to get the binary data...
size_t bin_size = 123; // ...and the size of the binary data
pqxx::binarystring blob( bin_data, bin_size );
pqxx::result r = work.prepared( "test" )( name )( blob ).exec();
}
work.commit();
Here is how to get the binary data back out of the database:
pqxx::result result = work.exec( "SELECT * FROM mytable" );
for ( const auto &row : result )
{
pqxx::binarystring blob( row["binfile"] );
void * ptr = blob.data();
size_t len = blob.size();
...
}
Related Topics
How to Use R to Create a Word Co-Occurrence Matrix
Subsetting Data Based on Dynamic Column Names
How to Read Column Names 'As Is' from CSV File
How to Log Transform the Y-Axis of R Geom_Histogram in the Right Direction
Generating a Date from a String with a 'Month-Year' Format
How to Add a Legend for the Secondary Axis Ggplot
Filtering a Dataframe Showing Only Duplicates
Grouping Factor Levels in a Data.Table
R Geom_Tile Ggplot2 What Kind of Stat Is Applied
Data.Table: Sum by All Existing Combinations in Table
How to Draw Roc Curve Using Value of Confusion Matrix
How to Color Bar Plots When Using ..Prop.. in Ggplot
R - Identify Consecutive Sequences
Why Does Withcallinghandlers Still Stops Execution
How to Split a Vector by Delimiter
How to Highlight Area Between Two Lines? Ggplot
Create a New Column with Non-Null Columns' Names
How to Convert All Column Data Type to Numeric and Character Dynamically