How to Pad a Vector with Na from the Front

How can I pad a vector with NA from the front?

Assuming v1 has the desired length and v2 is shorter (or the same length) these left pad v2 with NA values to the length of v1. The first four assume numeric vectors although they can be modified to also work more generally by replacing NA*v1 in the code with rep(NA, length(v1)).

replace(NA * v1, seq(to = length(v1), length = length(v2)), v2)

rev(replace(NA * v1, seq_along(v2), rev(v2)))

replace(NA * v1, seq_along(v2) + length(v1) - length(v2), v2)

tail(c(NA * v1, v2), length(v1))

c(rep(NA, length(v1) - length(v2)), v2)

The fourth is the shortest. The first two and fourth do not involve any explicit arithmetic calculations other than multiplying v1 with NA values. The second is likely slow since it involves two applications of rev.

Adding NA's to a vector

You could use your own modification of diff:

mydiff <- function(data, diff){
  c(diff(data, lag = diff), rep(NA, diff))
}

mydiff(foo, 1)
[1]  0.62 -0.62 -1.38  2.57  0.43 -0.87    NA

data.frame(foo = foo,  diff = mydiff(foo, 3))

     foo  diff
1 102.25 -1.38
2 102.87  0.57
3 102.25  1.62
4 100.87  2.13
5 103.44    NA
6 103.87    NA
7 103.00    NA

adding a variable length padding to each element in a string/character vector

A vectorised base R option :

vec <- c("dog", "cat", "mouse", "hare", "snake") 
n <- max(nchar(vec))
paste0(vec, strrep('+', n - nchar(vec)))
#[1] "dog++" "cat++" "mouse" "hare+" "snake"

Calculate derivative diff() and keep length - add NA

From this answer to a question of mine.

If you were looking for a generic way to prepend NA

pad  <- function(x, n) {
    len.diff <- n - length(x)
    c(rep(NA, len.diff), x) 
} 

x <- 1:10
dif <- pad(diff(x, lag=1), length(x))

but if you are not afraid to bring in zoo library it's better to do:

library(zoo)
x <- 1:5
as.vector(diff(zoo(x), na.pad=TRUE)) # convert x to zoo first, then diff (that invokes zoo's diff which takes a na.pad=TRUE)
# NA 1 1 1 1 (same length as original x vector)

R Filling missing values with NA for a data frame

You could do:

data.frame(sapply(dyem_list, "length<-", max(lengths(dyem_list))))

   location         organization person date            Jobs
1       USA            Microsoft   NULL 1989             CEO
2 Singapore University of London   NULL 2001        Chairman
3        UK               Boeing   NULL 2018     VP of sales
4      NULL                Apple   NULL NULL General Manager
5      NULL                 NULL   NULL NULL        Director

Where dyem_list is the following:

dyem_list <- list(
  location = list("USA","Singapore","UK"),
  organization = list("Microsoft","University of London","Boeing","Apple"),
  person = list(),
  date = list("1989","2001","2018"),
  Jobs = list("CEO","Chairman","VP of sales","General Manager","Director")
)

python how to pad numpy array with zeros

Very simple, you create an array containing zeros using the reference shape:

result = np.zeros(b.shape)
# actually you can also use result = np.zeros_like(b) 
# but that also copies the dtype not only the shape

and then insert the array where you need it:

result[:a.shape[0],:a.shape[1]] = a

and voila you have padded it:

print(result)
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

You can also make it a bit more general if you define where your upper left element should be inserted

result = np.zeros_like(b)
x_offset = 1  # 0 would be what you wanted
y_offset = 1  # 0 in your case
result[x_offset:a.shape[0]+x_offset,y_offset:a.shape[1]+y_offset] = a
result

array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.]])

but then be careful that you don't have offsets bigger than allowed. For x_offset = 2 for example this will fail.

If you have an arbitary number of dimensions you can define a list of slices to insert the original array. I've found it interesting to play around a bit and created a padding function that can pad (with offset) an arbitary shaped array as long as the array and reference have the same number of dimensions and the offsets are not too big.

def pad(array, reference, offsets):
    """
    array: Array to be padded
    reference: Reference array with the desired shape
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    """
    # Create an array of zeros with the reference shape
    result = np.zeros(reference.shape)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offset[dim], offset[dim] + array.shape[dim]) for dim in range(a.ndim)]
    # Insert the array in the result at the specified offsets
    result[insertHere] = a
    return result

And some test cases:

import numpy as np

# 1 Dimension
a = np.ones(2)
b = np.ones(5)
offset = [3]
pad(a, b, offset)

# 3 Dimensions

a = np.ones((3,3,3))
b = np.ones((5,4,3))
offset = [1,0,0]
pad(a, b, offset)

Some built-in to pad a list in python

a += [''] * (N - len(a))

or if you don't want to change a in place

new_a = a + [''] * (N - len(a))

you can always create a subclass of list and call the method whatever you please

class MyList(list):
    def ljust(self, n, fillvalue=''):
        return self + [fillvalue] * (n - len(self))

a = MyList(['1'])
b = a.ljust(5, '')

Pad with leading zeros to common width

Simply following the advise in @joran's comment,

DB <- data.frame(
HOUR  = c(1, 10, 5, 20),
ID  = c(2, 4, 6, 6))

NHOUR <- sprintf("%02d",DB$HOUR) # fix to 2 characters 

cbind(NHOUR, DB) # combine old and newdata 
  NHOUR HOUR ID
1    01    1  2
2    10   10  4
3    05    5  6
4    20   20  6

Update 2013-01-21 23:42:00Z Inspired by daroczig's performance test below, and because I wanted to try out the microbenchmark package, I've updated this question with a small performance test of my own comparing the three different solutions suggested in this thread.

# install.packages(c("microbenchmark", "stringr"), dependencies = TRUE)
require(microbenchmark)
require(stringr)

SPRINTF <- function(x) sprintf("%02d", x)
FORMATC <- function(x) formatC(x, width = 2,flag = 0)
STR_PAD <- function(x) str_pad(x, width=2, side="left", pad="0")

x <- round(runif(1e5)*10)
res <- microbenchmark(SPRINTF(x), STR_PAD(x), FORMATC(x), times = 15)

## Print results:
print(res)
Unit: milliseconds
        expr       min        lq    median        uq      max
1 FORMATC(x) 623.53785 629.69005 638.78667 671.22769 679.8790
2 SPRINTF(x)  34.35783  34.81807  35.04618  35.53696  37.1622
3 STR_PAD(x) 116.54969 118.41944 118.97363 120.05729 163.9664

### Plot results:
boxplot(res)

Box Plot of microbenchmark results

numpy pad array with nan, getting strange float instead

The result of pad has the same type as the input. np.nan is a float

In [874]: np.pad(np.ones(2,dtype=int),1,mode='constant',constant_values=(np.nan,))
Out[874]: array([-2147483648,           1,           1, -2147483648])

In [875]: np.pad(np.ones(2,dtype=float),1,mode='constant',constant_values=(np.nan,))
Out[875]: array([ nan,   1.,   1.,  nan])

The int pad is np.nan cast as an integer:

In [878]: np.array(np.nan).astype(int)
Out[878]: array(-2147483648)

Replacing NAs with latest non-NA value

You probably want to use the na.locf() function from the zoo package to carry the last observation forward to replace your NA values.

Here is the beginning of its usage example from the help page:

library(zoo)

az <- zoo(1:6)

bz <- zoo(c(2,NA,1,4,5,2))

na.locf(bz)
1 2 3 4 5 6 
2 2 1 4 5 2 

na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6 
2 1 1 4 5 2 

cz <- zoo(c(NA,9,3,2,3,2))

na.locf(cz)
2 3 4 5 6 
9 3 2 3 2