How to Pad a Vector with NA from the Front

How can I pad a vector with NA from the front?

Assuming v1 has the desired length and v2 is shorter (or the same length), the following expressions left-pad v2 with NA values to the length of v1. The first four assume numeric vectors, although they can be modified to work more generally by replacing NA * v1 in the code with rep(NA, length(v1)).

replace(NA * v1, seq(to = length(v1), length = length(v2)), v2)

rev(replace(NA * v1, seq_along(v2), rev(v2)))

replace(NA * v1, seq_along(v2) + length(v1) - length(v2), v2)

tail(c(NA * v1, v2), length(v1))

c(rep(NA, length(v1) - length(v2)), v2)

The fourth is the shortest. The first two and the fourth do not involve any explicit arithmetic other than multiplying v1 by NA. The second is likely slow since it involves two applications of rev.

Adding NA's to a vector

You could use your own modification of diff:

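mydiff <- function(data, diff){
  c(diff(data, lag = diff), rep(NA, diff))
}

# foo as shown in the data frame output further down
foo <- c(102.25, 102.87, 102.25, 100.87, 103.44, 103.87, 103.00)

mydiff(foo, 1)
[1]  0.62 -0.62 -1.38  2.57  0.43 -0.87    NA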

data.frame(foo = foo, diff = mydiff(foo, 3))

     foo  diff
1 102.25 -1.38
2 102.87  0.57
3 102.25  1.62
4 100.87  2.13
5 103.44    NA
6 103.87    NA
7 103.00    NA

adding a variable length padding to each element in a string/character vector

A vectorised base R option:

vec <- c("dog", "cat", "mouse", "hare", "snake") 
n <- max(nchar(vec))
paste0(vec, strrep('+', n - nchar(vec)))
#[1] "dog++" "cat++" "mouse" "hare+" "snake"

Calculate derivative diff() and keep length - add NA

From this answer to a question of mine.

If you are looking for a generic way to prepend NA:

pad <- function(x, n) {
  len.diff <- n - length(x)
  c(rep(NA, len.diff), x)
}

x <- 1:10
dif <- pad(diff(x, lag=1), length(x))

If you don't mind bringing in the zoo library, though, it is simpler to do:

library(zoo)
x <- 1:5
as.vector(diff(zoo(x), na.pad=TRUE)) # convert x to zoo first, then diff (that invokes zoo's diff which takes a na.pad=TRUE)
# NA 1 1 1 1 (same length as original x vector)

R Filling missing values with NA for a data frame

You could do:

data.frame(sapply(dyem_list, "length<-", max(lengths(dyem_list))))

   location         organization person date            Jobs
1       USA            Microsoft   NULL 1989             CEO
2 Singapore University of London   NULL 2001        Chairman
3        UK               Boeing   NULL 2018     VP of sales
4      NULL                Apple   NULL NULL General Manager
5      NULL                 NULL   NULL NULL        Director

Where dyem_list is the following:

dyem_list <- list(
location = list("USA","Singapore","UK"),
organization = list("Microsoft","University of London","Boeing","Apple"),
person = list(),
date = list("1989","2001","2018"),
Jobs = list("CEO","Chairman","VP of sales","General Manager","Director")
)

python how to pad numpy array with zeros

This is simple: create an array of zeros with the reference shape:

result = np.zeros(b.shape)
# actually you can also use result = np.zeros_like(b)
# but that also copies the dtype not only the shape

and then insert the array where you need it:

result[:a.shape[0],:a.shape[1]] = a

and voilà, you have padded it:

result
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

You can also make it a bit more general by defining where the upper-left element should be inserted:

result = np.zeros_like(b)
x_offset = 1 # 0 would be what you wanted
y_offset = 1 # 0 in your case
result[x_offset:a.shape[0]+x_offset,y_offset:a.shape[1]+y_offset] = a
result

array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.]])

but be careful that the offsets are not bigger than allowed. With x_offset = 2, for example, this will fail, because the three rows of a no longer fit into the rows that remain.
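The same placement can also be expressed with np.pad by turning each offset into a (before, after) pair of pad widths, which makes the "offset too big" case easy to detect up front. The following is only a sketch under the shapes implied by the output above (a is 3x5, the target is 4x6); the helper name pad_to_shape is hypothetical and not part of NumPy.

```python
import numpy as np

def pad_to_shape(array, shape, offsets):
    """Zero-pad `array` to `shape`, placing its first element at `offsets`."""
    pad_width = []
    for size, target, offset in zip(array.shape, shape, offsets):
        # Number of zeros that must follow the array along this axis
        after = target - size - offset
        if offset < 0 or after < 0:
            raise ValueError("offset does not fit inside the target shape")
        pad_width.append((offset, after))
    return np.pad(array, pad_width, mode="constant", constant_values=0)

a = np.ones((3, 5))
padded = pad_to_shape(a, (4, 6), (1, 1))  # zeros in the first row and column
```

With offsets (2, 0) the helper raises a ValueError instead of failing inside the slice assignment.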


If you have an arbitrary number of dimensions, you can define a list of slices to insert the original array. I found it interesting to play around a bit and created a padding function that can pad (with offset) an arbitrarily shaped array, as long as the array and the reference have the same number of dimensions and the offsets are not too big.

def pad(array, reference, offsets):
    """
    array: Array to be padded
    reference: Reference array with the desired shape
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    """
    # Create an array of zeros with the reference shape
    result = np.zeros(reference.shape)
    # Create a tuple of slices from offset to offset + shape in each dimension
    # (NumPy needs a tuple, not a list, to index with several slices)
    insertHere = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim))
    # Insert the array in the result at the specified offsets
    result[insertHere] = array
    return result

And some test cases:

import numpy as np

# 1 Dimension
a = np.ones(2)
b = np.ones(5)
offset = [3]
pad(a, b, offset)

# 3 Dimensions

a = np.ones((3,3,3))
b = np.ones((5,4,3))
offset = [1,0,0]
pad(a, b, offset)

Some built-in to pad a list in python

a += [''] * (N - len(a))

or, if you don't want to change a in place:

new_a = a + [''] * (N - len(a))

You can always create a subclass of list and call the method whatever you please:

class MyList(list):
    def ljust(self, n, fillvalue=''):
        return self + [fillvalue] * (n - len(self))

a = MyList(['1'])
b = a.ljust(5, '')
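One detail worth knowing about the expressions above: multiplying a list by zero or a negative number gives an empty list, so a list that is already longer than N is left unchanged rather than truncated. A small sketch illustrating this (the function name pad_list is hypothetical):

```python
def pad_list(items, n, fillvalue=""):
    """Return a new list padded on the right with `fillvalue` up to length `n`."""
    # [fillvalue] * k is empty for k <= 0, so longer lists pass through unchanged
    return list(items) + [fillvalue] * (n - len(items))

short = pad_list(["1"], 5)           # padded up to five elements
long = pad_list(["a", "b", "c"], 2)  # already longer than n, returned as is
```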

Pad with leading zeros to common width

Simply following the advice in @joran's comment:

DB <- data.frame(
HOUR = c(1, 10, 5, 20),
ID = c(2, 4, 6, 6))

NHOUR <- sprintf("%02d",DB$HOUR) # fix to 2 characters

cbind(NHOUR, DB) # combine old and new data
  NHOUR HOUR ID
1    01    1  2
2    10   10  4
3    05    5  6
4    20   20  6

Update 2013-01-21 23:42:00Z Inspired by daroczig's performance test below, and because I wanted to try out the microbenchmark package, I've updated this question with a small performance test of my own comparing the three different solutions suggested in this thread.

# install.packages(c("microbenchmark", "stringr"), dependencies = TRUE)
require(microbenchmark)
require(stringr)

SPRINTF <- function(x) sprintf("%02d", x)
FORMATC <- function(x) formatC(x, width = 2, flag = "0")
STR_PAD <- function(x) str_pad(x, width=2, side="left", pad="0")

x <- round(runif(1e5)*10)
res <- microbenchmark(SPRINTF(x), STR_PAD(x), FORMATC(x), times = 15)

## Print results:
print(res)
Unit: milliseconds
        expr       min        lq    median        uq      max
1 FORMATC(x) 623.53785 629.69005 638.78667 671.22769 679.8790
2 SPRINTF(x)  34.35783  34.81807  35.04618  35.53696  37.1622
3 STR_PAD(x) 116.54969 118.41944 118.97363 120.05729 163.9664

### Plot results:
boxplot(res)

[Figure: box plot of microbenchmark results]

numpy pad array with nan, getting strange float instead

The result of pad has the same dtype as the input, and np.nan is a float, so it cannot be stored as such in an integer array:

In [874]: np.pad(np.ones(2,dtype=int),1,mode='constant',constant_values=(np.nan,))
Out[874]: array([-2147483648, 1, 1, -2147483648])

In [875]: np.pad(np.ones(2,dtype=float),1,mode='constant',constant_values=(np.nan,))
Out[875]: array([ nan, 1., 1., nan])

The int pad is np.nan cast as an integer:

In [878]: np.array(np.nan).astype(int)
Out[878]: array(-2147483648)
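A common way around this is to convert the integer array to a float dtype before padding, since NaN exists only for floating-point types. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

ints = np.ones(2, dtype=int)
# NaN only exists for floating-point dtypes, so convert before padding
padded = np.pad(ints.astype(float), 1, mode="constant", constant_values=np.nan)
```

The result is a float array whose first and last elements are nan, at the cost of no longer having an integer dtype.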

Replacing NAs with latest non-NA value

You probably want to use the na.locf() function from the zoo package to carry the last observation forward to replace your NA values.

Here is the beginning of its usage example from the help page:

library(zoo)

az <- zoo(1:6)

bz <- zoo(c(2,NA,1,4,5,2))

na.locf(bz)
1 2 3 4 5 6
2 2 1 4 5 2

na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6
2 1 1 4 5 2

cz <- zoo(c(NA,9,3,2,3,2))

na.locf(cz)
2 3 4 5 6
9 3 2 3 2

