How to Decode Postgresql Bytea Column Hex to Int16/Uint16 in R

How to decode PostgreSQL bytea column hex to int16/uint16 in r?

You can start with this conversion function, substitute a faster strsplit and use readBin on the result:

byteArray <- "\\xffff00000100020003000400050006000700080009000a00"

## Split a long string into a a vector of character pairs
Rcpp::cppFunction( code = '
CharacterVector strsplit2(const std::string& hex) {
unsigned int length = hex.length()/2;
CharacterVector res(length);
for (unsigned int i = 0; i < length; ++i) {
res(i) = hex.substr(2*i, 2);
}
return res;
}')

## A function to convert one string to an array of raw
f <- function(x) {
## Split a long string into a a vector of character pairs
x <- strsplit2(x)
## Remove the first element, "\\x"
x <- x[-1]
## Complete the conversion
as.raw(as.hexmode(x))
}

raw <- f(byteArray)
# int16
readBin(con = raw,
what = "integer",
n = length(raw) / 2,
size = 2,
signed = TRUE,
endian = "little")
# -1 0 1 2 3 4 5 6 7 8 9 10

# uint16
readBin(con = raw,
what = "integer",
n = length(raw) / 2,
size = 2,
signed = FALSE,
endian = "little")
# 65535 0 1 2 3 4 5 6 7 8 9 10

# int32
readBin(con = raw,
what = "integer",
n = length(raw) / 4,
size = 4,
signed = TRUE,
endian = "little")
# 65535 131073 262147 393221 524295 655369

This won't work for uint32 and (u)int64, though, since R uses int32 internally. However, R can also use numerics to store integers below 2^52. So we can use this:

# uint32
byteArray <- "\\xffffffff0100020003000400050006000700080009000a00"
int32 <- readBin(con = f(byteArray),
what = "integer",
n = length(raw) / 4,
size = 4,
signed = TRUE,
endian = "little")

ifelse(int32 < 0, int32 + 2^32, int32)
# 4294967295 131073 262147 393221 524295 655369

And for gzip compressed data:

# gzip
byteArray <- "\\x1f8b080000000000000005c1870100200800209a56faffbd41d30dd3b285e37a52f9d033018818000000"
con <- gzcon(rawConnection(f(byteArray)))
readBin(con = con,
what = "integer",
n = length(raw) / 2,
size = 2,
signed = TRUE,
endian = "little")
close(con = con)

Since this is a real connection, we have to make sure to close it.

Bytea data format in R shiny

I dont believe you can convert it. Bytea data is basically bite data. I would suggest just not pulling it into the rdata frame. Bytea is normally bitdata that is an image or something that can not be stored in a traditional format.

Another approach would be to use a JDBC, that normally takes care of the transformations. I would take a look at require(RPostgreSQL) to get a sense of how it can be done.

You might be able to convert it back once it is loaded into R. Try something like this

udecode <- function(string){
uconv <- function(chars) intToUtf8(strtoi(chars, 16L))
ufilter <- function(string) {
if (substr(string, 1, 1)=="|") uconv(substr(string, 2, 5)) else string
}
string <- gsub("\\\\u([[:xdigit:]]{4})", ",|\\1,", string, perl=TRUE)
strings <- unlist(strsplit(string, ","))
string <- paste(sapply(strings, ufilter), collapse='')
return(string)
}

HDBC-postgreSQL bytea fields get returned as SqlByteString hex strings

This is a longtime known issue with this library. See this bug for example.

The broader problem is that getting raw bytes requires a fair amount of cleverness that the postgres api doesn't make obvious. You have to call your entire query with binary rather than text output (which is arguably better anyway, but would require a rewrite of that portion of the binding layer).

You can see where pqexecparams is called and note it is called with a last parameter of 0, which by the postgres docs means everything comes back in text. And for postgres that means this funny hex representation you see.

If that argument were swapped to 1, then things could be done more efficiently (including getting raw binary for bytea fields) but Statement.hsc would have to be pervasively rewritten to deserialize those binary values.

This is one of those things where its slightly irritating to lots of people, but nobody has yet been sufficiently motivated to go rewrite and debug the whole thing. But, of course, somebody really should! :-)

How to insert binary data into a PostgreSQL BYTEA column using the C++ libpqxx API?

Figured it out. Here is an example showing how to insert a bunch of binary objects into a table:

pqxx::connection conn( ... );
conn.prepare( "test", "INSERT INTO mytable( name, binfile ) VALUES ($1, $2)" );
pqxx::work work( conn );
for ( ... )
{
std::string name = "foo";
void * bin_data = ...; // obviously do what you need to get the binary data...
size_t bin_size = 123; // ...and the size of the binary data

pqxx::binarystring blob( bin_data, bin_size );
pqxx::result r = work.prepared( "test" )( name )( blob ).exec();
}
work.commit();

Here is how to get the binary data back out of the database:

pqxx::result result = work.exec( "SELECT * FROM mytable" );
for ( const auto &row : result )
{
pqxx::binarystring blob( row["binfile"] );
void * ptr = blob.data();
size_t len = blob.size();
...
}


Related Topics



Leave a reply



Submit