Difference between mean(c(1,2,21)) and mean(1,2,21)

mean(c(1,2,21))
#[1] 8

This passes a vector of three elements to the mean function, and the mean of these three elements is calculated.

mean(1,2,21)
#[1] 1

This passes 1 as the first argument, 2 as the second argument and 21 as the third argument to the mean function. mean dispatches these arguments to mean.default. In help("mean.default") you can find the arguments of this function:

  1. x: the object you want the mean of.
  2. trim: the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
  3. na.rm: a logical value indicating whether NA values should be stripped before the computation proceeds. (Since you pass a numeric value, it is coerced to logical automatically.)

So trim = 2 is treated as the nearest endpoint 0.5 and na.rm = 21 is coerced to TRUE, which means you effectively calculate this:

mean.default(1, 0.5, TRUE)
#[1] 1

C++: what does (a << b) mean?

1 << 1 means:

00000000 00000001 changes to 00000000 00000010

1 << 8 means:

00000000 00000001 changes to 00000001 00000000

It's a bit shift operation. For each unit in the shift count on the right, you can think of it as multiplying the value on the left by 2. So 2 << 1 = 4 and 2 << 2 = 8. This can be more efficient than repeated multiplication by 2.

Also, you can do 4 >> 1 = 2 (and 5 >> 1 = 2, since the result rounds down) as the inverse operation.
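
The same shift operators exist in Python and behave the same way for non-negative integers, so here is a quick illustrative sketch of the multiply/divide-by-powers-of-two equivalence (Python rather than C++, purely for convenience):

print(1 << 8)    # 256, i.e. 1 * 2**8
print(2 << 1)    # 4,   i.e. 2 * 2**1
print(2 << 2)    # 8,   i.e. 2 * 2**2
print(4 >> 1)    # 2,   i.e. 4 // 2
print(5 >> 1)    # 2,   floor of 5 / 2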

What is the difference between i = i + 1 and i += 1 in a 'for' loop?

The difference is that one modifies the data structure itself in place (b += 1), while the other just rebinds the variable to a new object (a = a + 1).


Just for completeness:

x += y does not always do an in-place operation; there are (at least) three exceptions:

  • If x doesn't implement an __iadd__ method then the x += y statement is just a shorthand for x = x + y. This would be the case if x was something like an int.

  • If __iadd__ returns NotImplemented, Python falls back to x = x + y.

  • The __iadd__ method could theoretically be implemented to not work in place. It'd be really weird to do that, though.

As it happens, your bs are numpy.ndarrays, which implement __iadd__ and return self, so your second loop modifies the original array in place.
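
A minimal sketch of that distinction, checking object identity with id() (the variable names here are just for illustration):

import numpy as np

a = 1
old_id = id(a)
a += 1                    # int defines no __iadd__, so this is really a = a + 1
print(id(a) == old_id)    # False: a now refers to a new object

b = np.array([1, 2, 3])
old_id = id(b)
b += 1                    # ndarray.__iadd__ updates the data in place and returns self
print(id(b) == old_id)    # True: same object, modified in place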

You can read more on this in the Python documentation of "Emulating Numeric Types".

These [__i*__] methods are called to implement the augmented arithmetic assignments (+=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=). These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If a specific method is not defined, the augmented assignment falls back to the normal methods. For instance, if x is an instance of a class with an __iadd__() method, x += y is equivalent to x = x.__iadd__(y) . Otherwise, x.__add__(y) and y.__radd__(x) are considered, as with the evaluation of x + y. In certain situations, augmented assignment can result in unexpected errors (see Why does a_tuple[i] += ["item"] raise an exception when the addition works?), but this behavior is in fact part of the data model.
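
For completeness, a toy class (hypothetical, not from the question) that implements __iadd__ in the mutate-and-return-self style the documentation describes:

class Accumulator:
    def __init__(self, value=0):
        self.value = value

    def __iadd__(self, other):
        # modify self in place and return it, as the data model recommends
        self.value += other
        return self

acc = Accumulator()
alias = acc
acc += 5
print(alias.value)   # 5 -- alias and acc still name the same, mutated object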

The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe

The R Language Definition is handy for answering these types of questions:

  • http://cran.r-project.org/doc/manuals/R-lang.html#Indexing


R has three basic indexing operators, with syntax displayed by the following examples

x[i]
x[i, j]
x[[i]]
x[[i, j]]
x$a
x$"a"

For vectors and matrices the [[ forms are rarely used, although they have some slight semantic differences from the [ form (e.g. it drops any names or dimnames attribute, and that partial matching is used for character indices). When indexing multi-dimensional structures with a single index, x[[i]] or x[i] will return the ith sequential element of x.

For lists, one generally uses [[ to select any single element, whereas [ returns a list of the selected elements.

The [[ form allows only a single element to be selected using integer or character indices, whereas [ allows indexing by vectors. Note though that for a list, the index can be a vector and each element of the vector is applied in turn to the list, the selected component, the selected component of that component, and so on. The result is still a single element.

Closest subsequent index for a specified value

Find the location of each occurrence of the value (numeric or character)

int = c(1, 1, 0, 5, 2, 0, 0, 2)
value = 0
idx = which(int == value)
## [1] 3 6 7

Expand the index to indicate, for each position, the nearest subsequent occurrence of the value of interest, with NA after its last occurrence in int.

nearest = rep(NA, length(int))
nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
## [1] 3 3 3 6 6 6 7 NA

Use simple arithmetic to find the difference between the index of the current value and the index of the nearest value

abs(seq_along(int) - nearest)
## [1] 2 1 0 2 1 0 0 NA

Written as a function

f <- function(x, value) {
    idx = which(x == value)
    nearest = rep(NA, length(x))
    if (length(idx)) # non-NA values only if `value` in `x`
        nearest[1:max(idx)] = rep(idx, diff(c(0, idx)))
    abs(seq_along(x) - nearest)
}

We have (the character example assumes something like char = c("A", "B", "C", "A", "A"), which is consistent with the output shown):

> f(int, 0)
[1] 2 1 0 2 1 0 0 NA
> f(int, 1)
[1] 0 0 NA NA NA NA NA NA
> f(int, 2)
[1] 4 3 2 1 0 2 1 0
> f(char, "A")
[1] 0 2 1 0 0
> f(char, "B")
[1] 1 0 NA NA NA
> f(char, "C")
[1] 2 1 0 NA NA

The solution doesn't involve recursion or R-level loops, so it should be fast even for long vectors.

NA problem when calculating mean by group

df <- within(df, {new = ave(old, groupID, FUN = function(x) mean(x, na.rm = TRUE))})

This is for the case where you don't want to rewrite all your input data in a different (numeric) format.

Compute differences between all variable pairs in R

Using base R:

df_dist <- t(apply(df, 1, dist))
colnames(df_dist) <- apply(combn(names(df), 2), 2, paste0, collapse = "_")

If you really want to use a tidyverse approach, you could go with c_across(), but this also removes the names, and it is much slower if your data is huge.

What is the difference between '/' and '//' when used for division?

In Python 3.x, 5 / 2 will return 2.5 and 5 // 2 will return 2. The former is floating point division, and the latter is floor division, sometimes also called integer division.

In Python 2.2 or later in the 2.x line, there is no difference for integers unless you perform a from __future__ import division, which causes Python 2.x to adopt the 3.x behavior.

Regardless of the future import, 5.0 // 2 will return 2.0 since that's the floor division result of the operation.
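
A quick Python 3 check of these points (the last line is a reminder that floor division rounds toward negative infinity, not toward zero):

print(5 / 2)      # 2.5  true division
print(5 // 2)     # 2    floor division
print(5.0 // 2)   # 2.0  floor division; the result keeps the float type
print(-5 // 2)    # -3   floors toward negative infinity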

You can find a detailed description at PEP 238: Changing the Division Operator.

What does .view() do in PyTorch?

view() reshapes the tensor without copying memory, similar to numpy's reshape().

Given a tensor a with 16 elements:

import torch
a = torch.arange(1., 17.)  # torch.range(1, 16) is deprecated; this gives the same 16 values 1..16

To reshape this tensor to make it a 4 x 4 tensor, use:

a = a.view(4, 4)

Now a will be a 4 x 4 tensor. Note that after the reshape the total number of elements needs to remain the same. Reshaping the tensor a to a 3 x 5 tensor would not be appropriate, because 3 x 5 = 15 does not equal 16.

What is the meaning of parameter -1?

If there is a situation where you don't know how many rows you want but are sure of the number of columns, you can specify that dimension as -1. (Note that you can extend this to tensors with more dimensions; only one of the axis values can be -1.) This is a way of telling the library: "give me a tensor that has this many columns, and you compute the appropriate number of rows necessary to make this happen".

This can be seen in typical model definition code. After a line like x = self.pool(F.relu(self.conv2(x))) in the forward function, you will have a feature map with 16 channels. You have to flatten this to give it to the fully connected layer, so you tell PyTorch to reshape the tensor you obtained to have a specific number of columns and let it decide the number of rows by itself.
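
A minimal sketch of the -1 behaviour (the sizes below are made up purely for illustration):

import torch

t = torch.arange(32.)   # 32 elements in a flat tensor
m = t.view(-1, 8)       # ask for 8 columns; PyTorch infers 32 / 8 = 4 rows
print(m.shape)          # torch.Size([4, 8])

# In a forward() method the same idea flattens a feature map for a fully
# connected layer while keeping the batch dimension:
# x = x.view(x.size(0), -1)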


