Is There an Elegant Way to Exclude the First Value of a Range

Is there an elegant way to exclude the first value of a range?

No.

((0+1)..10)

How to: Ruby Range that doesn't include the first value

No, there is no built-in support for such a range. You might want to roll your own Range-like class if this behavior is necessary.
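A Range-like class of this kind is short to write. The sketch below uses Python for illustration. The class name `OpenStartRange` is hypothetical. It mirrors Ruby's inclusive `0..10` end but excludes the start value.

```python
class OpenStartRange:
    """Hypothetical range covering start < x <= stop (the start value is excluded)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __contains__(self, x):
        # Membership works for any comparable value, not just integers.
        return self.start < x <= self.stop

    def __iter__(self):
        # Iteration assumes integer bounds, like an Integer range in Ruby.
        return iter(range(self.start + 1, self.stop + 1))


r = OpenStartRange(0, 10)
print(list(r))                    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(0 in r, 10 in r, 0.5 in r)  # False True True
```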

Is there a way to iterate over a range, excluding a value

((0..10).to_a - [6]).each do |i|
  puts i  # every value from 0 to 10 except 6
end
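The same filtering can also be done lazily, without building an intermediate array and subtracting from it. Here is a minimal Python sketch. The generator name `range_excluding` is hypothetical. It treats the bounds as inclusive, like Ruby's `0..10`.

```python
def range_excluding(start, stop, excluded):
    """Yield start..stop inclusive (like Ruby's start..stop), skipping `excluded`."""
    for i in range(start, stop + 1):
        if i != excluded:
            yield i


for i in range_excluding(0, 10, 6):
    print(i)  # prints 0 through 10 except 6, one per line
```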

Is there a more elegant way to find where a value fits in a range of values?

First, your code is redundant: you repeat several value checks. Instead:

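def score(rev):
    # Each elif is reached only when the previous check failed,
    # so only the lower bound needs to be tested.
    if rev >= 300000000:
        return 10  # this is the score
    elif rev >= 200000000:
        return 7
    elif rev >= 100000000:
        return 5
    elif rev >= 30000000:
        return 3
    else:
        return 1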

Now, generalize the problem: make a list of cutoffs and corresponding scores:

cutoff = [
    (3e8, 10),
    (2e8,  7),
    (1e8,  5),
    (3e7,  3),
    (0,    1)
]

Iterate through this list from the top, checking rev against each cutoff value. The first time rev passes a ">=" check, return that entry's score.

Implementation is left as an exercise for the student. Look out for the end-of-list case, too.
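Here is one possible implementation to check yours against. It is a sketch that assumes the descending cutoff list above. In the end-of-list case, where rev is below every cutoff (for example, a negative value), it falls back to the last score. This matches the `else` branch of the if/elif version.

```python
# Cutoffs in descending order, as in the list above.
CUTOFFS = [
    (3e8, 10),
    (2e8, 7),
    (1e8, 5),
    (3e7, 3),
    (0, 1),
]


def score(rev, cutoffs=CUTOFFS):
    """Return the score of the first (highest) cutoff that rev meets or exceeds."""
    for threshold, points in cutoffs:
        if rev >= threshold:
            return points
    # End of list: rev is below every cutoff, so use the lowest score.
    return cutoffs[-1][1]


print(score(150_000_000))  # 5
```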

Is there an elegant way to keep only the top[2~3] values in each row of a matrix?

First, use np.argsort() to find which locations have the highest values:

sort = np.argsort(df)

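This gives a DataFrame whose column names are meaningless. Because argsort sorts in ascending order, the rightmost column holds the position of each row's largest value, and the second and third columns from the right hold the positions of the second and third largest values, which are the desired indices within each row: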

        316  320  359  370  910
userId                         
316       4    3    1    2    0
320       3    4    0    2    1
359       4    0    1    3    2
370       1    0    4    2    3
910       1    0    2    3    4

Next, construct a boolean mask that is True at the locations above:

mask = np.zeros(df.shape, bool)
rows = np.arange(len(df))
mask[rows, sort.iloc[:,-2]] = True
mask[rows, sort.iloc[:,-3]] = True

Now you have the mask you need:

array([[False,  True,  True, False, False],
       [ True, False,  True, False, False],
       [False,  True, False,  True, False],
       [False, False,  True, False,  True],
       [False, False,  True,  True, False]], dtype=bool)

Finally, df.where(mask):

             316       320       359       370       910
userId                                                  
316          NaN  0.202133  0.208618       NaN       NaN
320     0.202133       NaN  0.242837       NaN       NaN
359          NaN  0.242837       NaN  0.357620       NaN
370          NaN       NaN  0.357620       NaN  0.317371
910          NaN       NaN  0.175914  0.317371       NaN
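The same argsort-plus-mask approach also works on a plain NumPy array. This sketch assumes you want to keep the values ranked lo through hi in each row, where rank 1 is the largest, and replace the rest with NaN. The function name `keep_ranks` is hypothetical. Instead of one mask assignment per rank, `np.put_along_axis` writes True at all selected positions in a single call.

```python
import numpy as np


def keep_ranks(a, lo, hi):
    """Keep the values ranked lo..hi (1 = largest) in each row; set the rest to NaN."""
    n = a.shape[1]
    order = np.argsort(a, axis=1)        # ascending, so the largest is in the last column
    cols = order[:, n - hi:n - lo + 1]   # positions of ranks hi..lo in each row
    mask = np.zeros(a.shape, dtype=bool)
    np.put_along_axis(mask, cols, True, axis=1)
    return np.where(mask, a, np.nan)


a = np.array([[1.0, 0.2, 0.5, 0.9],
              [0.3, 1.0, 0.8, 0.1]])
print(keep_ranks(a, 2, 3))  # row 0 keeps 0.5 and 0.9, row 1 keeps 0.3 and 0.8
```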

find first element within a range not included in another range

Based on std::set_difference, I have written something like this (like std::set_difference, it requires both ranges to be sorted):

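/*
 * first difference of elements: the first element present in the first
 * (sorted) range and not present in the second (sorted) range
 */
template <class T>
T first_difference(T* first1, T* last1,
                   T* first2, T* last2)
{
  while (first1 != last1 && first2 != last2)
  {
    if (*first1 < *first2) { return *first1; } // *first1 cannot occur later in range 2
    else if (*first2 < *first1) ++first2;      // skip smaller elements of range 2
    else { ++first1; ++first2; }               // present in both ranges: skip it
  }
  if (first1 != last1 && first2 == last2) return *first1; // range 2 exhausted
  return T(); // no difference found (ambiguous if T() is a valid element)
}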

usage:

#include <algorithm>
#include <cassert>

int main() {

    int first[] = {5, 10, 15, 20, 25};
    int second[] = {50, 40, 30, 20, 10};

    // both ranges must be sorted before calling first_difference
    std::sort(first, first + 5);   //  5 10 15 20 25
    std::sort(second, second + 5); // 10 20 30 40 50

    int i = first_difference(first, first + 5, second, second + 5);
    assert(i == 5);
    return 0;
}
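For comparison, here is the same merge-style scan sketched in Python. Returning None when nothing is found avoids the ambiguity of returning a sentinel such as 0, which could also be a legitimate element. Like the C++ version, it assumes both inputs are sorted.

```python
def first_difference(a, b):
    """Return the first element of sorted list a that is not in sorted list b, or None."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            return a[i]      # a[i] is smaller than everything left in b
        elif b[j] < a[i]:
            j += 1           # skip elements of b that are too small
        else:
            i += 1           # equal: present in both, move past it
            j += 1
    # b is exhausted: any remaining a[i] is missing from b.
    return a[i] if i < len(a) else None


print(first_difference([5, 10, 15, 20, 25], [10, 20, 30, 40, 50]))  # 5
```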

Elegant way to skip elements in an iterable

Use the consume() recipe from the itertools documentation to skip n elements:

import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

Note the islice(iterator, n, n) call: it skips the first n elements and then yields nothing, so next() falls back to its default of None.
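Here is a quick check of consume(), together with the related nth() recipe from the same itertools documentation. nth() uses islice() to return a single element instead of discarding everything. Both recipes are reproduced so the block runs on its own.

```python
import collections
from itertools import islice, permutations


def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        collections.deque(iterator, maxlen=0)  # exhaust the iterator at C speed
    else:
        next(islice(iterator, n, n), None)     # skip n items, yield nothing


def nth(iterable, n, default=None):
    "Returns the nth item or a default value."
    return next(islice(iterable, n, None), default)


it = iter(range(10))
consume(it, 3)   # skips 0, 1, 2
print(next(it))  # 3

print(nth(permutations(range(3)), 4))  # (2, 0, 1)
print(nth(range(5), 10))               # None, because index 10 is past the end
```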

Simplified to your example, where you want to skip 999999 elements, then return element 1000000:

return next(islice(permutations(range(10)), 999999, 1000000))

islice() advances the iterator in C, so it avoids the per-element overhead of a Python-level loop.

To illustrate, here are the timings for just 10 repeats of each method:

>>> from itertools import islice, permutations
>>> from timeit import timeit
>>> def list_index():
...     return list(permutations(range(10)))[999999]
... 
>>> def for_loop():
...     p = permutations(range(10))
...     for i in xrange(999999): p.next()
...     return p.next()
... 
>>> def enumerate_loop():
...     p = permutations(range(10))
...     for i, element in enumerate(p):
...         if i == 999999:
...             return element
... 
>>> def islice_next():
...     return next(islice(permutations(range(10)), 999999, 1000000))
... 
>>> timeit('f()', 'from __main__ import list_index as f', number=10)
5.550895929336548
>>> timeit('f()', 'from __main__ import for_loop as f', number=10)
1.6166789531707764
>>> timeit('f()', 'from __main__ import enumerate_loop as f', number=10)
1.2498459815979004
>>> timeit('f()', 'from __main__ import islice_next as f', number=10)
0.18969106674194336

In this run, the islice() method is about 6.6 times faster than the next fastest method, enumerate_loop().

How to remove an observation from a column that falls outside a desired range without leaving an NA

A numeric column can have normal values, NA, Inf, -Inf and NaN. But "empty" is not a possible value.

NA exists to mark that a value isn't available, which seems to be exactly what you want! Using a negative number is just a more awkward way of doing the same thing: you'd have to remove all negative numbers before calculating the mean, sum, etc. With NA you get the same effect, and support for it is typically built into the functions: just specify na.rm=TRUE.

df1 <- data.frame(col_a=c("male","female","male"),col_b=seq(1,30),col_c=seq(11,40))
df1$col_b[df1$col_b<5|df1$col_b>20] <- NA
sum(df1$col_b, na.rm=TRUE)    # 200
median(df1$col_b, na.rm=TRUE) # 12.5
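For comparison, the same pattern works in Python with NumPy. In this sketch NaN plays the role of R's NA. The NaN-aware functions np.nansum and np.nanmedian skip missing values much like na.rm=TRUE does. The array mirrors df1$col_b above.

```python
import numpy as np

# Values 1..30 as floats, so that NaN can be stored in the array.
col_b = np.arange(1, 31, dtype=float)
# Mark out-of-range values as missing instead of deleting them.
col_b[(col_b < 5) | (col_b > 20)] = np.nan

print(np.nansum(col_b))     # 200.0
print(np.nanmedian(col_b))  # 12.5
```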

How do I get the last 5 elements, excluding the first element from an array?

You can call:

arr.slice(Math.max(arr.length - 5, 1))

If you don't want to exclude the first element, use:

arr.slice(Math.max(arr.length - 5, 0))
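The same clamp-the-start-index trick carries over to Python slicing. Here is a minimal sketch with a hypothetical helper name, `last_n_skip_first`. Clamping the start to 1 means that lists shorter than six items still drop their first element.

```python
def last_n_skip_first(arr, n=5):
    """Last n items of arr, never including arr[0] (hypothetical helper)."""
    # Clamp the start index to 1 so short lists still lose their first item.
    return arr[max(len(arr) - n, 1):]


print(last_n_skip_first(list(range(1, 11))))  # [6, 7, 8, 9, 10]
print(last_n_skip_first([1, 2, 3]))           # [2, 3]
```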
