How to find the indices where there are n consecutive zeroes in a row
Here are two base R approaches:
1) rle First run rle and then compute ok to pick out the sequences of zeros that are more than 3 long. We then compute the starts and ends of all repeated sequences, subsetting to the ok ones at the end.
with(rle(x), {
  ok <- values == 0 & lengths > 3
  ends <- cumsum(lengths)
  starts <- ends - lengths + 1
  data.frame(starts, ends)[ok, ]
})
giving:
starts ends
1 6 17
2 34 58
3 72 89
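For readers working in Python, the same run-length idea can be sketched with itertools.groupby. This is only an illustration, not part of the original answer: the function name zero_runs_longer_than and the sample vector are hypothetical, and positions are reported 1-based to match the R output.

```python
from itertools import groupby

def zero_runs_longer_than(x, min_len=3):
    """Return 1-based (start, end) pairs for runs of zeros longer than min_len."""
    runs = []
    end = 0  # running total of run lengths, like cumsum(lengths) in R
    for value, group in groupby(x):
        length = sum(1 for _ in group)
        end += length
        start = end - length + 1
        if value == 0 and length > min_len:
            runs.append((start, end))
    return runs

# Hypothetical sample data (the original x is not shown in the answer).
x = [5, 0, 0, 0, 0, 0, 2, 0, 0, 3, 0, 0, 0, 0]
print(zero_runs_longer_than(x))  # [(2, 6), (11, 14)]
```

The run of two zeros in the middle is dropped because it is not longer than min_len.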
2) gregexpr Take the sign of each number, which is 0 or 1 for non-negative input (a negative value would give the two-character string -1 and shift every later position), and concatenate the results into one long string. Then use gregexpr to find the locations of runs of at least 4 zeros. The result gives the starts; each end is the start plus the match.length attribute minus 1.
s <- paste(sign(x), collapse = "")
g <- gregexpr("0{4,}", s)[[1]]
data.frame(starts = 0, ends = attr(g, "match.length") - 1) + g
giving:
starts ends
1 6 17
2 34 58
3 72 89
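The string-matching idea carries over to Python's re module. The sketch below is an assumption-laden illustration rather than a translation of the answer: zero_runs_regex and the sample vector are hypothetical, and every non-zero value (including negatives) is mapped to the single character "1" so that string positions line up with element positions.

```python
import re

def zero_runs_regex(x, min_len=4):
    """Return 1-based (start, end) pairs for runs of at least min_len zeros."""
    # Map each element to one character: "0" for zero, "1" for anything else.
    s = "".join("0" if v == 0 else "1" for v in x)
    pattern = re.compile("0{%d,}" % min_len)
    # m.start() is 0-based and m.end() is exclusive, so start + 1 and end
    # are the 1-based inclusive positions.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(s)]

# Hypothetical sample data, including a negative value.
x = [5, 0, 0, 0, 0, 0, -2, 0, 0, 3, 0, 0, 0, 0]
print(zero_runs_regex(x))  # [(2, 6), (11, 14)]
```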
Find consecutive zeroes in a row
#Had to fix Client 4, one number was missing
DF <- read.table(text = 'Clients Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
"Client 1" 123 768 678 452 213 123 55 10 0 0 0 0
"Client 2" 549 542 21 321 31 59 998 0 546 980 0 987
"Client 3" 500 0 500 0 500 0 500 0 500 0 500 0
"Client 4" 126 545 2315 27 268 126 56 0 0 0 0 0
"Client 5" 546 546 0 0 0 328 486 326 0 0 66 0
"Client 6" 0 0 0 25 78 563 698 631 230 53 0 0', header = TRUE)
Loop over the rows, reverse each one, and find the first non-zero entry; if the client never had a transaction, return length(x):
n <- apply(DF[, -1], 1, function(x) if (any(x != 0)) which.max(rev(x) != 0) - 1 else length(x))
#[1] 4 0 1 5 1 2
DF$Clients[n >= 3]
#[1] Client 1 Client 4
#Levels: Client 1 Client 2 Client 3 Client 4 Client 5 Client 6
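The counting step can also be sketched in plain Python. The helper name trailing_zeros is hypothetical, and only three of the six clients from the table above are reproduced to keep the example short.

```python
def trailing_zeros(row):
    """Count the zeros at the end of row; an all-zero row returns len(row)."""
    count = 0
    for value in reversed(row):
        if value != 0:
            break
        count += 1
    return count

clients = {
    "Client 1": [123, 768, 678, 452, 213, 123, 55, 10, 0, 0, 0, 0],
    "Client 2": [549, 542, 21, 321, 31, 59, 998, 0, 546, 980, 0, 987],
    "Client 4": [126, 545, 2315, 27, 268, 126, 56, 0, 0, 0, 0, 0],
}
inactive = [name for name, row in clients.items() if trailing_zeros(row) >= 3]
print(inactive)  # ['Client 1', 'Client 4']
```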
Finding the consecutive zeros in a numpy array
Here's a fairly compact vectorized implementation. I've changed the requirements a bit, so the return value is a bit more "numpythonic": it creates an array with shape (m, 2), where m is the number of "runs" of zeros. The first column is the index of the first 0 in each run, and the second is the index of the first nonzero element after the run. (This indexing pattern matches, for example, how slicing works and how the range function works.)
import numpy as np
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
For example:
In [236]: a = [1, 2, 3, 0, 0, 0, 0, 0, 0, 4, 5, 6, 0, 0, 0, 0, 9, 8, 7, 0, 10, 11]
In [237]: runs = zero_runs(a)
In [238]: runs
Out[238]:
array([[ 3, 9],
[12, 16],
[19, 20]])
With this format, it is simple to get the number of zeros in each run:
In [239]: runs[:,1] - runs[:,0]
Out[239]: array([6, 4, 1])
It's always a good idea to check the edge cases:
In [240]: zero_runs([0,1,2])
Out[240]: array([[0, 1]])
In [241]: zero_runs([1,2,0])
Out[241]: array([[2, 3]])
In [242]: zero_runs([1,2,3])
Out[242]: array([], shape=(0, 2), dtype=int64)
In [243]: zero_runs([0,0,0])
Out[243]: array([[0, 3]])
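Because each run is a half-open (start, stop) pair, keeping only the runs with at least n zeros is a one-line filter. The sketch below uses plain tuples so that it runs without NumPy; runs_at_least is a hypothetical helper, and the input is the result of zero_runs(a) shown earlier, written out by hand.

```python
def runs_at_least(runs, n):
    """Keep the half-open (start, stop) runs that contain at least n zeros."""
    return [(start, stop) for start, stop in runs if stop - start >= n]

# The rows of Out[238] above, copied as tuples.
runs = [(3, 9), (12, 16), (19, 20)]
print(runs_at_least(runs, 4))  # [(3, 9), (12, 16)]
```

With the NumPy array itself, a boolean mask such as runs[(runs[:, 1] - runs[:, 0]) >= n] should select the same rows.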
Finding the first number after consecutive zeros in data frame
We can use rle to select the first row after the first run of consecutive zeroes in each group (ID).
library(dplyr)
data %>%
  group_by(ID) %>%
  slice(with(rle(event == 0), sum(lengths[1:which.max(values)])) + 1)
# ID time event
# <int> <int> <dbl>
#1 1 8 1
#2 2 6 1
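A rough Python analogue of the rle step, for one group at a time, might look like the following. The function name and the per-group event lists are hypothetical (the original data frame is not shown), and the returned index is 0-based.

```python
from itertools import groupby

def first_after_zero_run(events):
    """Return the 0-based index of the first element after the first run
    of zeros, or None if there is no such element."""
    pos = 0
    for is_zero, group in groupby(events, key=lambda e: e == 0):
        pos += sum(1 for _ in group)  # position just past the current run
        if is_zero:
            return pos if pos < len(events) else None
    return None

# Hypothetical event columns keyed by group ID.
groups = {1: [1, 0, 0, 1, 0, 1], 2: [0, 0, 1, 1]}
print({gid: first_after_zero_run(ev) for gid, ev in groups.items()})  # {1: 3, 2: 2}
```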
Find instances within a column where consecutive rows are non zero?
IIUC, you can use cumsum to create the groupby key:
s1=s[s==0].groupby(s.ne(0).cumsum()).transform('size')
n=5
s[(s==0)&(s1==n)]
Out[753]:
5 0
6 0
7 0
8 0
9 0
dtype: int64
Input data:
import pandas as pd
l=[0,1,1,1,1,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,0,0,0]
s=pd.Series(l)
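To see why the cumulative sum works as a grouping key, here is a pandas-free sketch of the same logic. The running count of non-zero elements stays constant across a run of zeros, so all zeros in one run share a key. The helper name zeros_in_runs_of is hypothetical.

```python
from itertools import accumulate

def zeros_in_runs_of(values, n):
    """Return the 0-based positions of zeros that sit in a run of exactly n zeros."""
    # Running count of non-zero elements: constant across a run of zeros.
    key = list(accumulate(int(v != 0) for v in values))
    sizes = {}
    for v, k in zip(values, key):
        if v == 0:
            sizes[k] = sizes.get(k, 0) + 1
    return [i for i, (v, k) in enumerate(zip(values, key))
            if v == 0 and sizes[k] == n]

l = [0,1,1,1,1,0,0,0,0,0,1,1,1,0,0,1,1,1,1,1,0,0,0]
print(zeros_in_runs_of(l, 5))  # [5, 6, 7, 8, 9]
```

The positions match the index of the pandas result shown above.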
Python - Identify groups of consecutive 0's and replace them
I think this handles it.
newa = []
span = 0
for n in a:
    # Is this number non-zero?
    if n:
        # Yes. Have we just passed a string of zeros?
        if span:
            # Yes. Average this value and the last non-zero value
            # and duplicate for as many zeros as we saw.
            avg = (newa[-1] + n) / 2
            newa.extend( [avg] * span )
            span = 0
        # Always add this number to the new list.
        newa.append( n )
    else:
        # No, this number was a zero. Just count it.
        span += 1
Can this series end with a span of zeros? Only you know whether that's a concern or not.
EDIT: to ignore runs of zeros longer than 5:
newa = []
span = 0
for n in a:
    # Is this number non-zero?
    if n:
        # Yes. Have we just passed a string of zeros?
        if span:
            # Yes. Average this value and the last non-zero value
            # and duplicate for as many zeros as we saw.
            if span > 5:
                avg = 0
            else:
                avg = (newa[-1] + n) / 2
            newa.extend( [avg] * span )
            span = 0
        # Always add this number to the new list.
        newa.append( n )
    else:
        # No, this number was a zero. Just count it.
        span += 1
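If the series can start or end with zeros, one possible variant wraps the loop in a function and leaves those edge runs untouched, since they lack a neighbour on one side. This is a sketch under that assumption, not the answerer's code; fill_zero_spans and max_span are hypothetical names.

```python
def fill_zero_spans(a, max_span=5):
    """Replace each interior run of zeros (at most max_span long) with the
    average of its non-zero neighbours; all other zeros are kept."""
    out = []
    span = 0
    for n in a:
        if n:
            if span:
                # Leading runs have no left neighbour; long runs are left alone.
                if out and span <= max_span:
                    fill = (out[-1] + n) / 2
                else:
                    fill = 0
                out.extend([fill] * span)
                span = 0
            out.append(n)
        else:
            span += 1
    out.extend([0] * span)  # trailing zeros have no right neighbour
    return out

print(fill_zero_spans([0, 2, 0, 0, 4, 0]))  # [0, 2, 3.0, 3.0, 4, 0]
```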
get index of the first block of at least n consecutive False values in boolean array
I think for this linear search operation a python implementation is ok. My suggestion looks like this:
def find_block(arr, n_at_least=1):
    current_index = 0
    current_count = 0
    for index, item in enumerate(arr):
        if item:
            current_count = 0
            current_index = index + 1
        else:
            current_count += 1
        if current_count == n_at_least:
            return current_index
    return None # Make sure this is outside for loop
Running this function yields the following outputs:
>>> import numpy
>>> w = numpy.array([True, False, True, True, False, False, False])
>>> find_block(w, n_at_least=1)
1
>>> find_block(w, n_at_least=3)
4
>>> find_block(w, n_at_least=4)
>>> # None
Lowest starting row indices for minimum 2 consecutive non-zero values per column
Minimum 2 consecutive non-zero values case
%// Mask of non-zeros in input, A
mask = A~=0
%// Find starting row indices along with boolean valid flags for minimum two
%// consecutive nonzeros in each column
[valid,idx] = max(mask(1:end-1,:) & mask(2:end,:),[],1)
%// Use the valid flags to set invalid row indices to zeros
out = idx.*valid
Sample run -
A =
0 0 0 0 -4 3
0 2 1 0 0 0
0 5 0 8 7 0
0 9 10 3 1 2
mask =
0 0 0 0 1 1
0 1 1 0 0 0
0 1 0 1 1 0
0 1 1 1 1 1
valid =
0 1 0 1 1 0
idx =
1 2 1 3 3 1
out =
0 2 0 3 3 0
Generic case
For the generic case of a minimum of N consecutive non-zeros, you can use 2D convolution with a kernel that is a column vector of N ones, like so -
mask = A~=0 %// Mask of non-zeros in input, A
%// Find starting row indices along with boolean valid flags for minimum N
%// consecutive nonzeros in each column
[valid,idx] = max(conv2(double(mask),ones(N,1),'valid')==N,[],1)
%// Use the valid flags to set invalid row indices to zeros
out = idx.*valid
Please note that the 2D convolution could be replaced by a separable convolution version, as mentioned in the comments by Luis, and that seems to be a bit faster. So, conv2(double(mask),ones(N,1),'valid') could be replaced by conv2(ones(N,1),1,double(mask),'valid').
Sample run -
A =
0 0 0 0 0 3
0 2 1 0 1 2
0 5 0 8 7 9
0 9 0 3 1 2
mask =
0 0 0 0 0 1
0 1 1 0 1 1
0 1 0 1 1 1
0 1 0 1 1 1
N =
3
valid =
0 1 0 0 1 1
idx =
1 2 1 1 2 1
out =
0 2 0 0 2 1
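The windowed test behind the convolution can be sketched in plain Python: a window of N rows qualifies exactly when all N entries are non-zero, which is the condition that the convolution with ones(N,1) equals N. The function name first_run_start is hypothetical; the matrix is the one from the sample run above, and row indices are 1-based with 0 meaning "no such run", as in the MATLAB output.

```python
def first_run_start(A, N):
    """For each column of A (a list of rows), return the 1-based row index
    where the first run of at least N consecutive non-zeros starts, or 0."""
    n_rows, n_cols = len(A), len(A[0])
    out = []
    for c in range(n_cols):
        start = 0
        for r in range(n_rows - N + 1):
            # Every entry in the N-row window must be non-zero.
            if all(A[r + k][c] != 0 for k in range(N)):
                start = r + 1
                break
        out.append(start)
    return out

A = [[0, 0, 0, 0, 0, 3],
     [0, 2, 1, 0, 1, 2],
     [0, 5, 0, 8, 7, 9],
     [0, 9, 0, 3, 1, 2]]
print(first_run_start(A, 3))  # [0, 2, 0, 0, 2, 1]
```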
Find the lowest location within rows where there are non-zero elements for each column in a matrix in MATLAB
This should do it:
A = [0, 0, 0, 0, 4, 3;
0, 2, 1, 0, 0, 0;
0, 5, 0, 8, 7, 0;
8, 9, 10, 3, 0, 2];
indices = repmat((1:size(A, 1))', [1, size(A, 2)]);
indices(A == 0) = NaN;
min(indices, [], 1)
Here indices is:
indices =
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
4 4 4 4 4 4
We then set every element of indices to NaN wherever A is zero, which gives us:
indices =
NaN NaN NaN NaN 1 1
NaN 2 2 NaN NaN NaN
NaN 3 NaN 3 3 NaN
4 4 4 4 NaN 4
We then simply take the minimum of each column; min ignores NaN values by default, so the result is the first non-zero row of each column.
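The same result can be sketched in plain Python by scanning each column for its first non-zero entry. The function name first_nonzero_row is hypothetical; row indices are 1-based to match MATLAB, and an all-zero column gives None here, which is a choice made for this sketch.

```python
def first_nonzero_row(A):
    """Return, per column, the 1-based index of the first non-zero row
    (None if the column is all zeros)."""
    result = []
    for column in zip(*A):  # zip(*A) iterates over the columns of A
        idx = next((r for r, v in enumerate(column, start=1) if v != 0), None)
        result.append(idx)
    return result

A = [[0, 0, 0, 0, 4, 3],
     [0, 2, 1, 0, 0, 0],
     [0, 5, 0, 8, 7, 0],
     [8, 9, 10, 3, 0, 2]]
print(first_nonzero_row(A))  # [4, 2, 2, 3, 1, 1]
```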
How to list the index of all consecutive and single values in a row in R matrix
Here you go. I think this should work with your data:
val = 1;
counter = 1;
temp = matrix();
for (i in 1:nrow(mdata))
{
  for (j in 1:ncol(mdata))
  {
    if (mdata[i,j] == -3)
    {
      # Count how many -3 values follow; stop at the last column so that
      # j + val stays in bounds.
      while (j + val <= ncol(mdata))
      {
        if (mdata[i,j + val] == -3)
        {
          counter = counter + 1;
          val = val + 1;
          next;
        }
        else
        {
          break;
        }
      }
      if (counter == 1)
      {
        #print(j);
        #print(mdata[i, (j - 1):(j + 1)]);
        temp <- t(as.matrix(mdata[i, (j - 1):(j + 1)]))
        cat("\n This is with counter 1 \n")
        print(temp)
        cat("\n matrix: temp-1", temp[,1],"temp-2", temp[,3],"\n");
        to.avg <- c(temp[,1], temp[,3]);
        avg<-mean(to.avg)
        mdata[i,j] = avg;
      }
      else
      {
        temp <- t(as.matrix(mdata[i,(j - 1):(j + counter)]))
        cat("\n This is with multiple count \n")
        cat(counter,"consecutive values were found, processing accordingly \n")
        print(temp);
        for (k in 0:(counter-1))
        {
          # cat("\n reading temp at the start \n")
          # print(temp)
          cat("\n K is ",(k+1), "and array is",length(temp),"long \n")
          to.avg <- c(temp[,(k+1)], temp[,length(temp)]);
          cat("averaging", temp[,(k+1)],"and", temp[,length(temp)]);
          avg<-mean(to.avg)
          cat("\n average =",avg);
          temp[,(k+2)] = avg;
          # cat("\n reading temp as this \n")
          # print(temp)
          mdata[i,j+k]=avg
        }
      }
    }
    else
    {
      mdata[i,j] = mdata[i,j];
    }
    val = 1;
    counter = 1;
  }
}
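The loop above first locates each run of -3 in a row and counts its length before filling it. That locating step on its own can be sketched in Python as follows; sentinel_runs and the sample row are hypothetical, and positions are 0-based.

```python
from itertools import groupby

def sentinel_runs(row, sentinel=-3):
    """Return 0-based (start, length) pairs for each run of sentinel in row."""
    runs = []
    pos = 0
    for is_hit, group in groupby(row, key=lambda v: v == sentinel):
        length = sum(1 for _ in group)
        if is_hit:
            runs.append((pos, length))
        pos += length
    return runs

# Hypothetical row with one single -3 and one run of three.
print(sentinel_runs([4, -3, 6, -3, -3, -3, 10]))  # [(1, 1), (3, 3)]
```

A length of 1 corresponds to the counter == 1 branch of the R code, and longer runs to the multiple-count branch.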