as(x, 'double') and as.double(x) are inconsistent

as is for coercing to a new class, and double technically isn't a class but rather a storage.mode.

x <- 1:5   # assuming x is an integer vector, as in the question
y <- x
storage.mode(y) <- "double"
identical(x, y)
[1] FALSE
identical(as.double(x), y)
[1] TRUE

The argument "double" is handled as a special case by as and will attempt to coerce to the class numeric, which the class integer already inherits, therefore there is no change.

is.numeric(x)
[1] TRUE
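
To see the special-casing concretely, compare the class and underlying type after each call (a quick sketch, assuming x is an integer vector such as 1:5; the commented output reflects the behaviour described above):

x <- 1:5
class(as(x, "double"))    # "integer" -- as() maps "double" to "numeric" and leaves x untouched
class(as.double(x))       # "numeric" -- the implicit class of a double vector
typeof(as(x, "double"))   # "integer"
typeof(as.double(x))      # "double"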

Not so fast...

While the above made sense, there is some further confusion. From ?double:

It is a historical anomaly that R has two names for its floating-point
vectors, double and numeric (and formerly had real).

double is the name of the type. numeric is the name of the mode and
also of the implicit class. As an S4 formal class, use "numeric".

The potential confusion is that R has used mode "numeric" to mean
‘double or integer’, which conflicts with the S4 usage. Thus
is.numeric tests the mode, not the class, but as.numeric (which is
identical to as.double) coerces to the class.

Therefore, according to the documentation, as really should change x... I will investigate further.

The plot is thicker than whipped cream and cornflour soup...

Well, if you debug as, you find that the following method eventually gets created, rather than the c("ANY", "numeric") signature for the coerce generic being used (which would call as.numeric):

function (from, strict = TRUE)
    if (strict) {
        class(from) <- "numeric"
        from
    } else from

So actually, class<- gets called on x and this eventually means R_set_class is called from coerce.c. I believe the following part of the function determines the behaviour:

...
else if(!strcmp("numeric", valueString)) {
    setAttrib(obj, R_ClassSymbol, R_NilValue);
    if(IS_S4_OBJECT(obj)) /* NULL class is only valid for S3 objects */
        do_unsetS4(obj, value);
    switch(TYPEOF(obj)) {
    case INTSXP: case REALSXP: break;
    default: PROTECT(obj = coerceVector(obj, REALSXP));
        nProtect++;
    }
...

Note the switch statement: it breaks out without doing coercion in the case of integers and real values.
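
That branch can be reproduced from the R level: assigning class "numeric" to an integer vector simply drops the class attribute and performs no coercion (a small sketch, assuming an integer vector):

x <- 1:5
class(x) <- "numeric"   # hits the INTSXP case above: break, no coerceVector call
typeof(x)               # still "integer"
class(x)                # "integer" -- only the class attribute was removed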

Bug or not?

Whether or not this is a bug depends on your point of view. Integers are numeric in one sense, as confirmed by is.numeric(x) returning TRUE, but strictly speaking they are not of class numeric. On the other hand, since R promotes integers to double automatically whenever a computation requires it, one may view the two types conceptually as the same. There are two practical differences: (i) integers require less storage space, which may matter for large vectors, and (ii) when interacting with external code that has stricter type discipline, conversion costs may come into play.
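
For a rough feel for difference (i), compare the sizes of an integer vector and its double counterpart (a sketch; exact sizes vary slightly by platform):

x_int <- sample.int(100, 1e6, replace = TRUE)   # a plain integer vector of length one million
x_dbl <- as.double(x_int)
object.size(x_int)   # roughly 4 MB -- integers take 4 bytes per element
object.size(x_dbl)   # roughly 8 MB -- doubles take 8 bytes per element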

type/origin of R's 'as' function

as is not an S3 generic, but notice that you got a TRUE. (I got a FALSE.) That means you have loaded a package that defines as as an S4 generic. S3 generics work via class dispatch that employs a *.default function and the UseMethod function. The FALSE I get means there is no method defined for a generic as that could get looked up. One arguable reason for the lack of a generic as is that calling such a function with only one data object would not specify a "coercion destination"; the destination therefore needs to be built into the function name.
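
For contrast, this is roughly what S3 dispatch looks like: the generic's body is just a call to UseMethod, and methods are looked up by name on the class of the argument (a toy sketch with made-up function names):

# a toy S3 generic: dispatch happens through UseMethod on the class of the argument
describe <- function(x, ...) UseMethod("describe")
describe.integer <- function(x, ...) "an integer vector"
describe.default <- function(x, ...) "something else"

describe(1:3)   # "an integer vector"
describe(1.5)   # "something else" (falls through to describe.default)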

After declaring as to be Generic (note the capitalization, which is a hint that this applies to S4 features):

setGeneric("as")  # note that I didn't really even need to define any methods

get('as')
#--- output----
standardGeneric for "as" defined from package "methods"

function (object, Class, strict = TRUE, ext = possibleExtends(thisClass,
    Class))
standardGeneric("as")
<environment: 0x7fb1ba501740>
Methods may be defined for arguments: object, Class, strict, ext
Use showMethods("as") for currently available ones.

If I reboot R (and don't load any libraries that call setGeneric for 'as') I get:

get('as')
#--- output ---
function (object, Class, strict = TRUE, ext = possibleExtends(thisClass,
    Class))
{
    if (.identC(Class, "double"))
        Class <- "numeric"
    thisClass <- .class1(object)
    if (.identC(thisClass, Class) || .identC(Class, "ANY"))
        return(object)
    where <- .classEnv(thisClass, mustFind = FALSE)
    coerceFun <- getGeneric("coerce", where = where)
    coerceMethods <- .getMethodsTable(coerceFun, environment(coerceFun),
        inherited = TRUE)
    asMethod <- .quickCoerceSelect(thisClass, Class, coerceFun,
        coerceMethods, where)
    .... trimmed the rest of the code

But you ask "why", always a dangerous question when discussing language design, of course. I've flipped through the last chapter of Statistical Models in S which is the cited reference for most of the help pages that apply to S3 dispatch and find no discussion of either coercion or the as function. There is an implicit definition of "S3 generic" requiring the use of UseMethod but no mention of why as was left out of that strategy. I think of two possibilities: it is to prevent any sort of inheritance ambiguity in the application of the coercion, or it is an efficiency decision.

I should probably add that there is an S4 setAs-function and that you can find all the S4-coercion functions with showMethods("coerce").
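
As a small illustration (the class and slot names here are made up for the sketch), setAs registers a coercion that as will then dispatch to, and showMethods("coerce") will list it alongside the others:

# a hypothetical S4 class with a single numeric slot
setClass("Celsius", slots = c(temp = "numeric"))

# register a coercion from Celsius to numeric
setAs("Celsius", "numeric", function(from) from@temp)

as(new("Celsius", temp = 21.5), "numeric")   # 21.5
showMethods("coerce")                        # now includes Celsius -> numeric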

R: Inconsistent class returned by `median()` when vector labelled with Hmisc

user20650 points out in the comments that attributes are dropped or kept depending on the length of the vector x.

When we look at the code of the median.default method, we can see why. If length(x) is an even number, then mean is used (inside median), otherwise x is just sorted and subsetted which, unlike mean, doesn't remove the attributes.

# let's have a look at the median.default method
function (x, na.rm = FALSE, ...)
{
    if (is.factor(x) || is.data.frame(x))
        stop("need numeric data")
    if (length(names(x)))
        names(x) <- NULL
    if (na.rm)
        x <- x[!is.na(x)]
    else if (any(is.na(x)))
        return(x[FALSE][NA])
    n <- length(x)
    if (n == 0L)
        return(x[FALSE][NA])
    half <- (n + 1L)%/%2L
    if (n%%2L == 1L)
        # when length is odd: the attribute is kept
        sort(x, partial = half)[half]
    # when length is even: `mean` drops the attribute
    else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
}

Created on 2021-04-28 by the reprex package (v0.3.0)

Let's have another look at different vectors and how they behave. We can also define a keep_attr helper, which calls the wrapped function and then puts the input's attributes back on the result.

x1 <- 1
Hmisc::label(x1) = "qw"
class(median(x1)) # keeps attribute
#> [1] "labelled" "numeric"
class(mean(x1)) # drops attribute
#> [1] "numeric"

x2 <- c(1, 2)
Hmisc::label(x2) = "qw"
class(median(x2)) # uses mean
#> [1] "numeric"
class(mean(x2))
#> [1] "numeric"

x3 <- c(1, 2, NA)
Hmisc::label(x3) = "qw"
class(median(x3)) # doesn't use mean
#> [1] "labelled" "numeric"
class(mean(x3))
#> [1] "numeric"

keep_attr <- function(.f, x, ...) {
  x_att <- attributes(x)
  res <- .f(x, ...)
  attributes(res) <- x_att
  res
}

class(keep_attr(median, x2))
#> [1] "labelled" "numeric"
class(keep_attr(mean, x2))
#> [1] "labelled" "numeric"
keep_attr(median, x3, na.rm = TRUE)
#> qw
#> [1] 1.5

Created on 2021-04-28 by the reprex package (v0.3.0)

Update
Regarding your dplyr problem: I was now able to reproduce it (at first I forgot to label the cd4_count column and thought it was a dplyr versioning issue). However, the workaround with keep_attr seems to work.

library(dplyr)

data <-
  structure(
    list(
      cd4_count = c(
        30, 97, 210, NA, 358, 242, 126,
        792, 6, 145, 22, 150, 43, 23, 39, 953, 357, 427, 367, 239, 72,
        61, 61, 438, 392, 1092, 245, 326, 42, 135, 199, 158, 17, NA,
        287, 187, 252, 477, 157, NA, NA, 362, NA, 183, 885, 109, 321,
        286, 142, 797
      ),
      unsuccessful = c(
        0, 0, 0, 1, 0, 0, 1, 1, 0, 1,
        1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
        0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0
      )
    ),
    row.names = c(NA, 50L),
    class = "data.frame"
  )

# Add label to CD4 count, using Hmisc package
Hmisc::label(data$cd4_count) <- "CD4 count"

data %>%
  dplyr::group_by(unsuccessful) %>%
  dplyr::summarize_at(dplyr::vars(cd4_count), median, na.rm = TRUE)
#> Error: Problem with `summarise()` input `cd4_count`.
#> x Input `cd4_count` must return compatible vectors across groups
#> i Input `cd4_count` is `(function (x, na.rm = FALSE, ...) ...`.
#> i Result type for group 1 (unsuccessful = 0): <labelled>.
#> i Result type for group 2 (unsuccessful = 1): <double>.

data %>%
  dplyr::group_by(unsuccessful) %>%
  dplyr::summarize_at(dplyr::vars(cd4_count), ~ keep_attr(median, .x, na.rm = TRUE))
#> # A tibble: 2 x 2
#> unsuccessful cd4_count
#> <dbl> <labelled>
#> 1 0 210.0
#> 2 1 135.5

Created on 2021-04-28 by the reprex package (v0.3.0)

Unexpected Scala Map type

If you look at Scala's type hierarchy, you can see that the primitive types are not in a sub-type/super-type relationship. However, there is a mechanism called numeric widening which, for example, allows you to call a method that takes a Double argument by passing in, say, an Int; the Int is then automatically "widened" to Double.

This is the reason that List(1.0, 2) gives you List[Double].

But the Map constructor takes Tuple2[A, B] arguments. Numeric widening doesn't apply inside higher-order types, so the target type inference doesn't work for you if you mix numeric types.

case class Test[A](tup: (A, Char)*)
Test(1.0 -> 'A', 2 -> 'B') // AnyVal

Moreover, the arrow operator -> gets in your way:

Test[Double](2 -> 'B') // found: (Int, Char)  required: (Double, Char)

This is another limitation of the type inference, I think. Writing a tuple as a -> b is syntactic sugar for (a, b), provided by the implicit method any2ArrowAssoc in Predef. Without this indirection, if you construct the Tuple2 directly, it works:

Test[Double]((2, 'B'))

So the numeric widening still doesn't work, but at least you can enforce the type:

Map[Double, Char]((1.0, 'A'), (2, 'B'))

A final example showing numeric widening working:

def map[A, B](keys: A*)(values: B*) = Map((keys zip values): _*)
map(1.0, 2)('A', 'B') // Map[Double, Char]

Why doesn't this reinterpret_cast compile?

Perhaps a better way of thinking of reinterpret_cast is as the rogue operator that can "convert" pointers to apples into pointers to submarines.

By assigning the value returned by the cast to y, you're not really casting the value x, you're converting it. That is, y doesn't point to x and pretend that it points to a float. Conversion constructs a new value of type float and assigns it the value from x. There are several ways to do this conversion in C++, among them:

int main()
{
    int x = 42;
    float f = static_cast<float>(x);   // explicit C++-style cast
    float f2 = (float)x;               // C-style cast
    float f3 = float(x);               // function-style cast
    float f4 = x;                      // implicit conversion
    return 0;
}

The only real difference is that the last one (an implicit conversion) will generate a compiler diagnostic on higher warning levels. But they all do functionally the same thing -- and in many cases literally the same thing, as in the same machine code.

Now if you really do want to pretend that x is a float, then you really do want to cast x, by doing this:

#include <iostream>
using namespace std;

int main()
{
    int x = 42;
    float* pf = reinterpret_cast<float*>(&x);
    (*pf)++;
    cout << *pf;
    return 0;
}

You can see how dangerous this is. In fact, the output when I run this on my machine is 1, which is decidedly not 42+1.

Inconsistent printing of floats. Why DOES it work sometimes?

You are printing numpy.float64 objects, not the Python built-in float type, which uses David Gay's dtoa algorithm.

As of version 1.14, numpy uses the dragon4 algorithm to print floating point values, tuned to approach the same output as the David Gay algorithm used for the Python float type:

Numpy scalars use the dragon4 algorithm in "unique" mode (see below) for str/repr, in a way that tries to match python float output.

The numpy.format_float_positional() function documents this in a bit more detail:

unique : boolean, optional

If True, use a digit-generation strategy which gives the shortest representation which uniquely identifies the floating-point number from other values of the same type, by judicious rounding. If precision was omitted, print out all necessary digits, otherwise digit generation is cut off after precision digits and the remaining value is rounded.

So 0.2 can be uniquely represented by printing just 0.2, but the next value in the series (0.30000000000000004) can't; you have to include the extra digits to uniquely represent the exact value.

The how of this is actually quite involved; you can read a full report on this in Bungie's Destiny gameplay engineer Ryan Juckett's Printing Floating-Point Numbers series.

But basically, the code producing the string needs to determine the shortest decimal representation, among the decimal numbers clustered around the given floating point value, that cannot be read back as the next or the preceding representable floating point value:

[Image: floating point number line for 0.1, with the next and previous possible float values and their possible decimal representations]

This image comes from The Shortest Decimal String That Round-Trips: Examples by Rick Regan, which covers some other cases as well. Numbers in blue are possible float64 values, numbers in green are possible representations of decimal numbers. Note the grey half-way point markers: any representation that falls between the two half-way points around a float value is fair game, as all of those representations would produce the same float value.

The goal of both the David Gay and Dragon4 algorithms is to find the shortest decimal string output that would produce the exact same float value again. From the Python 3.1 What's New section on the David Gay approach:

Python now uses David Gay’s algorithm for finding the shortest floating point representation that doesn’t change its value. This should help mitigate some of the confusion surrounding binary floating point numbers.

The significance is easily seen with a number like 1.1 which does not have an exact equivalent in binary floating point. Since there is no exact equivalent, an expression like float('1.1') evaluates to the nearest representable value which is 0x1.199999999999ap+0 in hex or 1.100000000000000088817841970012523233890533447265625 in decimal. That nearest value was and still is used in subsequent floating point calculations.

What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001'. The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).

The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.

The new algorithm tends to emit cleaner representations when possible, but it does not change the underlying values. So, it is still the case that 1.1 + 2.2 != 3.3 even though the representations may suggest otherwise.

Inconsistent Accessibility: Parameter type is less accessible than method

The constructor of the public class clients is public, but it has a parameter of type ACTInterface that is private (is it nested in a class?). You can't do that. You need to make ACTInterface at least as accessible as clients.

Single vs Double quotes (' vs ")

The W3C says:

By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;.

So... there seems to be no difference; it only depends on your style.


