R: sample() command subject to a constraint

To make sure you're sampling uniformly, you could just generate all the permutations and limit to those that sum to 7:

library(gtools)
# all ordered tuples of length 7 drawn from 0:7, repetition allowed
perms <- permutations(8, 7, 0:7, repeats.allowed=TRUE)
# keep only the tuples that sum to 7
perms7 <- perms[rowSums(perms) == 7,]

From nrow(perms7), we see there are only 1716 possible permutations that sum to 7. Now you can uniformly sample from the permutations:

set.seed(144)
my.perms <- perms7[sample(nrow(perms7), 100000, replace=TRUE),]
head(my.perms)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,]    0    0    0    2    5    0    0
# [2,]    1    3    0    1    2    0    0
# [3,]    1    4    1    1    0    0    0
# [4,]    1    0    0    3    0    3    0
# [5,]    0    2    0    0    0    5    0
# [6,]    1    1    2    0    0    2    1

An advantage of this approach is that it's easy to see that we're sampling uniformly at random. Also, it's quite quick -- building perms7 took 0.3 seconds on my computer and building a 1 million-row my.perms took 0.04 seconds. If you need to draw many vectors this will be quite a bit quicker than a recursive approach because you're just using matrix indexing into perms7 instead of generating each vector separately.
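
As a rough check of those timings, here is a sketch of my own (machine-dependent, not the original benchmark):

system.time(perms7 <- perms[rowSums(perms) == 7, ])
system.time(my.perms.big <- perms7[sample(nrow(perms7), 1e6, replace = TRUE), ])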

Here's a distribution of counts of numbers in the sample:

#      0      1      2      3      4      5      6      7 
# 323347 188162 102812  51344  22811   8629   2472    423 
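
That tabulation can be reproduced directly from the sampled matrix (a one-line sketch):

table(my.perms)  # counts over all 100,000 x 7 = 700,000 entries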

Setting up an arbitrage strategy in R by solving system of inequalities subject to an equality constraint

We first explain why the code in the question got the results that it did.

Note that there is an implicit constraint of x >= 0.

Regarding the subject line's reference to an equality constraint: there are no equality constraints in the code shown in the question.

Regarding the minimum: in the lp arguments, > means >=, so x = 0 is clearly feasible, and since all components of the objective vector are positive, the minimum is 0. From ?lp:

const.dir: Vector of character strings giving the direction of the
constraint: each value should be one of "<," "<=," "=," "==,"
">," or ">=". (In each pair the two values are identical.)

Regarding the maximum: there are no upper-bound constraints on the solution and the objective vector's components are all positive, so there is no finite solution. Whether the linear program succeeded should always be checked before displaying the solution; if it failed, the solution should not be displayed since it has no meaning.

Regarding the code: cbind already produces a matrix, so it is pointless to convert it to a data frame and then back to a matrix. Also, the objective can be expressed as a plain vector, and the rhs of the constraints can be written as a scalar, which will be recycled to the appropriate length. We can write the code in the question equivalently as:

library(lpSolve)

A <- cbind(2:0, 1, 0:2, 3:1, c(1, 1, 0))
S <- c(1, 1, 1, 2, 1/3)

res1 <- lp("min", S, A, ">=", 0)
res1
## Success: the objective function is 0
res1$solution
## [1] 0 0 0 0 0

res2 <- lp("max", S, A, ">=", 0)
res2
## Error: status 3
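
Following the advice above about checking success before showing a solution, a guard might look like this (a sketch; lpSolve reports status 0 on success, and status 3 corresponds to an unbounded problem):

if (res2$status == 0) {
  print(res2$solution)    # only meaningful on success
} else {
  cat("lp failed with status", res2$status, "\n")
}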

CVXR

It is easier to formulate this using CVXR as shown below.

Find a vector x which satisfies Ax >= 0 and Sx == 0. (A and S are from above.) We add constraints -1 <= x <= 1 to keep the solution within bounds and use an arbitrary objective sum(x) since we are only looking for any feasible solution.

library(CVXR)

x <- Variable(5)
objective <- Maximize(sum(x)) # arbitrary objective
constraints <- list(A %*% x >= 0, sum(S*x) == 0, x >= -1, x <= 1)
problem <- Problem(objective, constraints)
soln <- solve(problem)

giving:

> soln$status
[1] "optimal"

> soln$value
[1] 1.66666

> soln$getValue(x)
           [,1]
[1,]  0.8788689
[2,]  0.4790521
[3,]  0.3087133
[4,] -0.9999857
[5,]  1.0000113

lpSolve again

To do this using lpSolve make the change of variables

x = 2*y-1

or equivalently

y = (x+1)/2

which converts

Ax >= 0
Sx == 0
-1 <= x <= 1

to

2Ay >= A1
2Sy == S1
0 <= y <= 1

where 1 denotes a vector of ones, so A1 = rowSums(A) and S1 = sum(S). Thus we write:

f.con <- rbind(2*A, 2*S, diag(5))
f.dir <- c(rep(">=",3), "=", rep("<=", 5))
f.rhs <- c(rowSums(A), sum(S), rep(1, 5))

res3 <- lp("max", rep(1, 5), f.con, f.dir, f.rhs)
res3
## Success: the objective function is 3.333333
2*res3$solution-1
## [1] 1.000000e+00 -4.966028e-13 6.666667e-01 -1.000000e+00 1.000000e+00
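
We can confirm that the recovered x satisfies the original constraints, up to numerical tolerance (a quick verification sketch):

x3 <- 2 * res3$solution - 1
all(A %*% x3 >= -1e-8)   # Ax >= 0
sum(S * x3)              # approximately 0, i.e. Sx == 0
range(x3)                # within [-1, 1]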

Random sequence from fixed ensemble that contains at least one of each character

f <- function(x, n){
  # include each element of x once, top up to length n with random draws
  # from x, then shuffle so the guaranteed elements land in random positions
  sample(c(x, sample(x, n - length(x), replace = TRUE)))
}
f(letters[1:3], 5)
# [1] "a" "c" "a" "b" "a"
f(letters[1:3], 5)
# [1] "a" "a" "b" "b" "c"
f(letters[1:3], 5)
# [1] "a" "a" "b" "c" "a"
f(letters[1:3], 5)
# [1] "b" "c" "b" "c" "a"

Constraints for a Linear Model Using Quadprog for Ranges/Limits on Coefficients

Just to clarify: You give an expected output "where the values would be predicted fitted values of X1, X2, X3 and X4". I assume you are referring to estimated (not fitted) parameter values.

It's quite straightforward to implement constrained parameters when modelling data in rstan. rstan is an R interface to Stan, which in turn is a probabilistic programming language for statistical Bayesian inference.

Here is an example based on the sample data you provide.

  1. First, let's define our model. Note that we constrain parameters beta2, beta3 and beta4 according to your requirements.

    model <- "
    data {
    int<lower=1> n; // number of observations
    int<lower=0> k; // number of parameters
    matrix[n, k] X; // data
    vector[n] y; // response
    }

    parameters {
    real beta1; // X1 unconstrained
    real<lower=negative_infinity(), upper=0.899> beta2; // X2 <= .899
    real<lower=0, upper=0.5> beta3; // 0 <= X3 <= 0.5
    real<lower=0, upper=0.334> beta4; // 0 <= X4 <= 0.334
    real sigma; // residuals
    }

    model {
    // Likelihood
    y ~ normal(beta1 * X[, 1] + beta2 * X[, 2] + beta3 * X[, 3] + beta4 * X[, 4], sigma);
    }"
  2. Next, we fit the model to your sample data.

    library(rstan)
    rstan_options(auto_write = TRUE)
    options(mc.cores = parallel::detectCores())

    df <- cbind.data.frame(Y, X)
    fit <- stan(
      model_code = model,
      data = list(
        n = nrow(df),
        k = ncol(df[, -1]),
        X = df[, -1],
        y = df[, 1]))
  3. Finally, let's print parameter estimates in a tidy way

    library(broom)
    tidy(fit, conf.int = TRUE)
    ## A tibble: 5 x 5
    #   term  estimate std.error conf.low conf.high
    #   <chr>    <dbl>     <dbl>    <dbl>     <dbl>
    # 1 beta1   29.2      6.53    16.9      42.8
    # 2 beta2    0.609    0.234    0.0149    0.889
    # 3 beta3    0.207    0.138    0.00909   0.479
    # 4 beta4    0.164    0.0954   0.00780   0.326
    # 5 sigma   16.2      5.16     9.42     29.1

    We can also plot parameter estimates including CI; a sketch follows below.

Note that parameter estimates are consistent with the imposed constraints.

[Plot: parameter estimates with credible intervals]
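
One way to produce such a plot (a sketch using rstan's default plot method for stanfit objects; the pars argument simply selects the coefficients):

plot(fit, pars = c("beta1", "beta2", "beta3", "beta4"))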

List all permutations of k numbers, taken from 0:k, that sum to k

There's a package for that...

library(partitions)
parts(7)
#[1,] 7 6 5 5 4 4 4 3 3 3 3 2 2 2 1
#[2,] 0 1 2 1 3 2 1 3 2 2 1 2 2 1 1
#[3,] 0 0 0 1 0 1 1 1 2 1 1 2 1 1 1
#[4,] 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1
#[5,] 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
#[6,] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
#[7,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

You appear to be looking for compositions(), e.g. for k=4:

parts(4)

#[1,] 4 3 2 2 1
#[2,] 0 1 2 1 1
#[3,] 0 0 0 1 1
#[4,] 0 0 0 0 1

compositions(4,4)
#[1,] 4 3 2 1 0 3 2 1 0 2 1 0 1 0 0 3 2 1 0 2 1 0 1 0 0 2 1 0 1 0 0 1 0 0 0
#[2,] 0 1 2 3 4 0 1 2 3 0 1 2 0 1 0 0 1 2 3 0 1 2 0 1 0 0 1 2 0 1 0 0 1 0 0
#[3,] 0 0 0 0 0 1 1 1 1 2 2 2 3 3 4 0 0 0 0 1 1 1 2 2 3 0 0 0 1 1 2 0 0 1 0
#[4,] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 4

And just to check your math... :-)

ncol(compositions(8,8))
#[1] 6435
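
That count also follows from stars and bars: the compositions of k into k nonnegative parts number C(2k-1, k-1). (This check is my aside, not part of the original answer.)

choose(2*8 - 1, 8 - 1)
#[1] 6435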

R: Generating all permutations of N weights in multiples of P

Assuming the order of the weights matters, these are compositions; if it doesn't, they are partitions. In either case, they are restricted by the number of parts, which you have called N (the code below uses numparts). There is also the question of whether weights of 0 are allowed.

Since you want the weights to add up to 1, you need 1/p to be an integer, which in the following code is sumparts; it does not depend on the number of weights. Once you have the compositions, you can multiply them by p, i.e. divide by sumparts, to get your weights.

R has a partitions package to generate such compositions or restricted partitions. The following code should be self-explanatory: each column in the matrix is a set of weights. I have taken seven weights and p=0.1 or 10%, and prohibited weights of 0: this gives 84 possibilities; allowing weights of 0 would mean 8008 possibilities. With p=0.01 or 1% there would be 1,120,529,256 possibilities without weights of 0, and 1,705,904,746 with (these counts are checked just after the output below). If order does not matter, use restrictedparts instead of compositions.

> library(partitions)
> numparts <- 7 # number of weights
> sumparts <- 10 # reciprocal of p
> weights <- compositions(n=sumparts, m=numparts, include.zero=FALSE)/sumparts
> weights

[1,] 0.4 0.3 0.2 0.1 0.3 0.2 0.1 0.2 0.1 0.1 0.3 0.2 0.1 0.2 0.1 0.1 0.2 0.1 0.1
[2,] 0.1 0.2 0.3 0.4 0.1 0.2 0.3 0.1 0.2 0.1 0.1 0.2 0.3 0.1 0.2 0.1 0.1 0.2 0.1
[3,] 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.1 0.1 0.1 0.2 0.2 0.3 0.1 0.1 0.2
[4,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.2 0.2 0.2 0.2 0.3 0.3 0.3
[5,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
[6,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
[7,] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

(output truncated; the full matrix has 84 columns, one weight vector per column)
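
The possibility counts quoted above can be verified with stars and bars (a check I've added: positive compositions of sumparts into numparts parts number choose(sumparts-1, numparts-1), and with zeros allowed, choose(sumparts+numparts-1, numparts-1)):

choose(10 - 1, 7 - 1)        # p = 0.1,  no zeros:   84
choose(10 + 7 - 1, 7 - 1)    # p = 0.1,  with zeros: 8008
choose(100 - 1, 7 - 1)       # p = 0.01, no zeros:   1120529256
choose(100 + 7 - 1, 7 - 1)   # p = 0.01, with zeros: 1705904746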

How should I use {targets} when I have multiple data files

This is the pattern that I normally use (these targets go inside the list() returned by _targets.R):

tar_files(
  file_paths,
  "file_paths_folder" %>%
    list.files(full.names = TRUE)
),
tar_target(
  processed_files,
  file_paths %>%
    readxl::read_excel() %>%    # can be anything: read_csv, parquet, etc.
    janitor::clean_names() %>%  # start processing
    mutate_at(vars(a, b, c), as.Date, format = "%Y-%m-%d"), # can be really complex operations
  pattern = map(file_paths)
)
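
For context, here is a minimal sketch of how this fragment might be wrapped in a _targets.R file (the folder name file_paths_folder comes from the answer; the package setup is my assumption):

# _targets.R (sketch)
library(targets)
library(tarchetypes)   # provides tar_files()
tar_option_set(packages = c("dplyr", "readxl", "janitor"))

list(
  # ... the tar_files()/tar_target() pair shown above ...
)

Running tar_make() then builds one dynamic branch of processed_files per file, so on later runs only the files whose contents changed are reprocessed.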

