Why does 1..99,999 == "1..99,999" in R, but 100,000 != "100,000"?
Have a look at as.character(100000). Its value is not equal to "100000" (have a look for yourself), and R is essentially just telling you so.
as.character(100000)
# [1] "1e+05"
Here, from ?Comparison, are R's rules for applying relational operators to values of different types:
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of
precedence being character, complex, numeric, integer, logical and
raw.
Those rules mean that when you test whether 1=="1", say, R first converts the numeric value on the LHS to a character string, and then tests for equality of the character strings on the LHS and RHS. In some cases those will be equal, but in other cases they will not. Which cases produce inequality depends on the current settings of options("scipen") and options("digits").
So, when you type 100000=="100000", it is as if you were actually performing the following test. (Note that internally, R probably uses something other than as.character() to perform the conversion.)
as.character(100000)=="100000"
# [1] FALSE
Why does "one" < 2 equal FALSE in R?
From help("<"):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
So in this case, the numeric is of lower precedence than the character, and 2 is coerced to the character "2". Comparison of strings in character vectors is lexicographic which, as I understand it, is alphabetic but locale-dependent. In most locales digits sort before letters, so "2" comes before "one" and the test "one" < "2" is FALSE.
Why is the expression "1"==1 evaluating to TRUE?
From help("=="):
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
So the numeric 1 is converted to the character "1", and the comparison "1" == "1" is TRUE.
random forest by group processing/scoring
Assuming your sales dataset is 3,000 * 300 = 900,000 rows and both dataframes have a customer_id column, you can do something like:
library(randomForest)

pred_groups <- split(seq_len(nrow(sales_score)), sales_score$customer_id)
# pred_groups is now a list whose names are the customer_ids and whose
# elements are integer vectors of row numbers. Now iterate over each customer
# and make predictions on the scoring set.
preds <- unsplit(structure(lapply(names(pred_groups), function(customer_id) {
  # Train using only observations for this customer.
  # Note we are comparing character to integer, but R's natural type
  # coercion should still give the correct answer.
  train_rows <- sales$customer_id == customer_id
  sales.rf <- randomForest(Sales ~ ., ntree = 500,
                           data = sales[train_rows, ], importance = TRUE)
  # Now make predictions only for this customer.
  predict(sales.rf, sales_score[pred_groups[[customer_id]], ])
}), .Names = names(pred_groups)), sales_score$customer_id)
print(head(preds)) # Should now be a vector of predicted scores whose length is
                   # the number of rows in sales_score.
Edit: Per @joran, here is a solution with a for loop:
pred_groups <- split(seq_len(nrow(sales_score)), sales_score$customer_id)
preds <- numeric(nrow(sales_score))
for (customer_id in names(pred_groups)) {
  # Train on this customer's rows only.
  train_rows <- sales$customer_id == customer_id
  sales.rf <- randomForest(Sales ~ ., ntree = 500,
                           data = sales[train_rows, ], importance = TRUE)
  # Score this customer's rows and write them back in place.
  pred_rows <- pred_groups[[customer_id]]
  preds[pred_rows] <- predict(sales.rf, sales_score[pred_rows, ])
}
My FB HackerCup code too slow of large inputs
Here is my O(k) solution, which is based on the same idea as above, but runs much faster.
# Python 2 (uses xrange and the print statement).
import sys

f = open(sys.argv[1], 'r')
T = int(f.readline())

def next(ary, start):
    # Return the last index j >= start of the unbroken run of non-zero
    # entries beginning at start, or start - 1 if ary[start] is zero.
    j = start
    l = len(ary)
    ret = start - 1
    while j < l and ary[j]:
        ret = j
        j += 1
    return ret

for t in range(T):
    n, k = map(int, f.readline().strip().split(' '))
    a, b, c, r = map(int, f.readline().strip().split(' '))
    m = [0] * (4 * k)
    s = [0] * (k+1)   # s[v] counts occurrences of value v (v <= k)
    m[0] = a
    if m[0] <= k:
        s[m[0]] = 1
    for i in xrange(1, k):
        m[i] = (b * m[i-1] + c) % r
        if m[i] < k+1:
            s[m[i]] += 1

    p = next(s, 0)
    m[k] = p + 1
    p = next(s, p+2)

    for i in xrange(k+1, n):
        if m[i-k-1] > p or s[m[i-k-1]] > 1:
            m[i] = p + 1
            if m[i-k-1] <= k:
                s[m[i-k-1]] -= 1
            s[m[i]] += 1
            p = next(s, p+2)
        else:
            m[i] = m[i-k-1]
        if p == k:
            break   # the sequence now repeats with period k+1

    if p != k:
        print 'Case #%d: %d' % (t+1, m[n-1])
    else:
        print 'Case #%d: %d' % (t+1, m[i-k + (n-i+k+k) % (k+1)])
The key point here is that m[i] never exceeds k for i >= k, and if we keep track of p, the largest value such that every number from 0 to p can be found among the previous k numbers, then p never decreases.
If m[i-k-1] is larger than p, then obviously we should set m[i] to p+1, and p increases by at least 1.
If m[i-k-1] is smaller than or equal to p, then we have to check whether the same number also occurs in m[i-k:i]. If not, m[i] should be set equal to m[i-k-1]; if so, m[i] should be set to p+1, just as in the case where m[i-k-1] is larger than p.
Once p is equal to k, the sequence starts to cycle with period k+1, so we can break out of the calculation and print the answer right away.
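The period-(k+1) observation above can be sketched on its own in Python 3. This is not the answer's code: it swaps the p pointer for a min-heap of values missing from the current window, which costs O(k log k) instead of O(k) but is easier to check. The function name find_min_last is made up for illustration, and the sketch assumes the problem is as the code above implies: the first k values come from the generator m[i] = (b * m[i-1] + c) % r, and every later value is the smallest non-negative integer absent from the previous k values.

```python
import heapq
from collections import Counter


def find_min_last(n, k, a, b, c, r):
    """Return m[n-1] (hypothetical helper, see the assumptions above)."""
    # First k values come straight from the generator.
    m = [a]
    for _ in range(1, k):
        m.append((b * m[-1] + c) % r)
    if n <= k:
        return m[n - 1]

    # counts tracks the current window of k values; missing is a min-heap
    # of the values in 0..k that the window does not contain.
    counts = Counter(m)
    missing = [v for v in range(k + 1) if counts[v] == 0]
    heapq.heapify(missing)

    # m[k..2k] is one full period: k+1 distinct values drawn from 0..k.
    cycle = []
    for i in range(k, 2 * k + 1):
        v = heapq.heappop(missing)      # smallest value absent from the window
        m.append(v)
        cycle.append(v)
        counts[v] += 1
        out = m[i - k]                  # value sliding out of the window
        counts[out] -= 1
        if counts[out] == 0 and out <= k:
            heapq.heappush(missing, out)

    # From index k onward the sequence repeats with period k+1.
    return cycle[(n - 1 - k) % (k + 1)]


if __name__ == "__main__":
    # Arbitrary illustrative parameters, not taken from the contest input.
    print(find_min_last(10**9, 5, 3, 4, 5, 11))
```

Because only one period is ever materialised, the running time does not depend on n, which is what makes the large inputs tractable.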
How to round an integer to the closest hundred?
Try the Math.Round method. Here's how:
Math.Round(76d / 100d, 0) * 100;    // 100
Math.Round(121d / 100d, 0) * 100;   // 100
Math.Round(9660d / 100d, 0) * 100;  // 9700
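The divide, round, multiply arithmetic used above can be tried out in a short Python 3 sketch. This is Python, not the C# API, and both function names are made up for illustration. One thing it makes visible is tie handling: Python's round(), like the default mode of Math.Round as far as I know, sends exact halves to the even neighbour, so 250 becomes 200 rather than 300. The second function is an integer-only variant that always rounds ties upward, in case that is what you want.

```python
def nearest_hundred_even(x):
    """Divide, round, multiply; round() sends ties to the even neighbour."""
    return round(x / 100) * 100


def nearest_hundred_half_up(x):
    """Integer-only variant (hypothetical alternative): ties always go up."""
    return (x + 50) // 100 * 100


if __name__ == "__main__":
    for value in (76, 121, 9660, 250):
        print(value, nearest_hundred_even(value), nearest_hundred_half_up(value))
```

The two variants agree everywhere except on values ending in exactly 50.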
knitr vs. interactive R behaviour
Thanks to Aleksey Vorona and Duncan Murdoch, this bug is now fixed in R-devel!
See: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15411