Put Every N Rows of Input into a New Column

Put every N rows of input into a new column

Replace the 5 in the following script with your number. xargs -n5 wraps the input into groups of five per line, and the awk part then transposes the result:

seq 20|xargs -n5| awk '{for (i=1;i<=NF;i++) a[i,NR]=$i; }END{
for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++)printf a[i,j]" "; print "" }}'

output:

1 6 11 16 
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20

Note that seq 20 above is there just to generate a number sequence for testing. You don't need it in your real work.
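
For example, to run the same pipeline on a real input file instead (numbers.txt below is just a hypothetical single-column file), feed it to xargs in place of seq:

xargs -n5 < numbers.txt | awk '{for (i=1;i<=NF;i++) a[i,NR]=$i; }END{
for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++)printf a[i,j]" "; print "" }}'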

EDIT

As pointed out by sudo_O, I am adding a pure awk solution:

 awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }' file

test

kent$  seq 20| awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }'     
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20

kent$ seq 12| awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }'
1 6 11
2 7 12
3 8
4 9
5 10

Put every X rows of input into a new column

I assume each block has the same pattern, that is, the first column repeats in the same order [ABCD ASDF ... QWER] in every block.
If so, you have to take the first column of the first BLOCK [47288 lines] and write it to the target file.
Then you have to take the second column of each BLOCK and paste it onto the target file.
I tried with this data file:


ABCD 1001
EFGH 1002
IJKL 1003
MNOP 1004
QRST 1005
UVWX 1006
ABCD 2001
EFGH 2002
IJKL 2003
MNOP 2004
QRST 2005
UVWX 2006
ABCD 3001
EFGH 3002
IJKL 3003
MNOP 3004
QRST 3005
UVWX 3006
ABCD 4001
EFGH 4002
IJKL 4003
MNOP 4004
QRST 4005
UVWX 4006
ABCD 5001
EFGH 5002
IJKL 5003
MNOP 5004
QRST 5005
UVWX 5006

And with this script:


#!/bin/bash

# target number of lines per block, change to 47288
LINES=6
INPUT='data.txt'
TOTALLINES=$(wc --lines < "$INPUT")
TOTALBLOCKS=$((TOTALLINES / LINES))


# first block of the target file: the first column of the first LINES lines of the data file
head -n $LINES "$INPUT" | cut --delimiter=' ' --fields=1 > target.txt

# get the second column of each block and paste it onto the target file
BLOCK=1
while [ $BLOCK -le $TOTALBLOCKS ]
do
    HEADVALUE=$((BLOCK * LINES))
    head -n $HEADVALUE "$INPUT" | tail -n $LINES | cut --delimiter=' ' --fields=2 > tmpcol.txt
    cp target.txt targettmp.txt
    paste -d' ' targettmp.txt tmpcol.txt > target.txt
    BLOCK=$((BLOCK+1))
done

# removing temp files
rm -f targettmp.txt
rm -f tmpcol.txt

And I got this target file:


ABCD 1001 2001 3001 4001 5001
EFGH 1002 2002 3002 4002 5002
IJKL 1003 2003 3003 4003 5003
MNOP 1004 2004 3004 4004 5004
QRST 1005 2005 3005 4005 5005
UVWX 1006 2006 3006 4006 5006

I hope this helps you.
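
For reference, the same result can also be sketched as a single awk pass that simply groups the second column by the first column, preserving the order of first appearance, which avoids the repeated head/tail passes over the input (using the data.txt above):

awk '{ if (!($1 in row)) { order[++n] = $1; row[$1] = $1 }
       row[$1] = row[$1] " " $2 }
     END { for (i = 1; i <= n; i++) print row[order[i]] }' data.txt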

Convert one column into a new column every 5 rows (a numeric interval)

Yes, you can do the following:

as.data.frame(split(data, 1:x))

where x = number of rows / 5; in your example x = 2, as you have 10 observations.

awk insert rows of one file as new columns to every nth rows of another file

Try this; see mywiki.wooledge - Process Substitution for details on the <() syntax. The sed 'p;p' prints each input line twice in addition to the automatic print, so every line of file2 is tripled to line up with the three corresponding rows of file1.

$ # transforming file2
$ cut -d' ' -f2-3 file2.txt | sed 'p;p'
2 3
2 3
2 3
5 6
5 6
5 6
8 9
8 9
8 9

$ # then paste it together with required fields from file1
$ paste -d' ' <(cut -d' ' -f1-3 file1.txt) <(cut -d' ' -f2-3 file2.txt | sed 'p;p')
a b c 2 3
d e f 2 3
g h i 2 3
j k l 5 6
m n o 5 6
o q r 5 6
s t u 8 9
v w x 8 9
y z Z 8 9
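
If the grouping factor is something other than 3, the sed 'p;p' part can be swapped for a small awk repeater so the count stays a parameter; with n=3 this is equivalent to the command above (a sketch):

$ paste -d' ' <(cut -d' ' -f1-3 file1.txt) <(cut -d' ' -f2-3 file2.txt | awk -v n=3 '{for (i = 0; i < n; i++) print}')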



Speed comparison, time shown for two consecutive runs

$ perl -0777 -ne 'print $_ x 1000000' file1.txt > f1
$ perl -0777 -ne 'print $_ x 1000000' file2.txt > f2
$ du -h f1 f2
95M f1
18M f2


$ time paste -d' ' <(cut -d' ' -f1-3 f1) <(cut -d' ' -f2-3 f2 | sed 'p;p') > t1

real 0m1.362s
real 0m1.154s

$ time awk '1;1;1' f2 | awk 'FNR==NR{a[FNR]=$2" "$3;next};{$NF=a[FNR]};1' - f1 > t2

real 0m12.088s
real 0m13.028s

$ time awk '{
if (c==3) c=0;
printf "%s %s %s ",$1,$2,$3;
if (!c++){ getline < "f2"; f4=$2; f5=$3 }
printf "%s %s\n",f4,f5
}' f1 > t3

real 0m13.629s
real 0m13.380s

$ time awk '{
if (c==3) c=0;
main_fields=$1 OFS $2 OFS $3;
if (!c++){ getline < "f2"; f4=$2; f5=$3 }
printf "%s %s %s\n", main_fields, f4, f5
}' f1 > t4

real 0m13.265s
real 0m13.896s

$ diff -s t1 t2
Files t1 and t2 are identical
$ diff -s t1 t3
Files t1 and t3 are identical
$ diff -s t1 t4
Files t1 and t4 are identical

Making every other row into a new column

awk to the rescue!

awk '{k=$1 FS $2}                             # key = first two fields
NR==1 {p0=$0; pk=k}                           # header line pairs with itself below, doubling the condition columns
pk==k {split(p0,a); for(i=3;i<=NF;i++) $i=a[i] FS $i; print}   # same key as the saved line: interleave fields and print
pk!=k {p0=$0; pk=$1 FS $2}' file              # new key: save this line and wait for its partner

samples pops condition_1 condition_1 condition_2 condition_2 condition_3 condition_3
A10051 15 1 2 3 4 4 4
A10052 15 2 2 1 1 4 4

This will work for an unspecified number of columns and records, as long as they are all well-formed (same number of columns) and grouped (lines with the same key are consecutive).

Adding a column to a dataframe after every nth column

Create a new DataFrame by indexing every 3rd column, add .5 to its column labels so it sorts into the right positions, and join it to the original with concat:

df.columns = np.arange(len(df.columns))

df1 = pd.DataFrame(50, index=df.index, columns= df.columns[2::3] + .5)

df2 = pd.concat([df, df1], axis=1).sort_index(axis=1)
df2.columns = np.arange(len(df2.columns))
print (df2)
0 1 2 3 4 5 6 7 8 9 10 11 12
0 a b c 50 d e f 50 g h i 50 j
1 k l m 50 n o p 50 q r s 50 t

add values to a dataframe every nth row or corresponding to a spec. column value

If I understand your problem, the following should work:

## Compute the rows of abto that should receive the values from databtz1$abtl.wert.
## Here we look for changes between adjoining rows of the Blatt# column using diff:
## a non-zero diff means the following row starts a new Blatt# block. Prepending
## TRUE marks the first row as well, so rind is TRUE exactly on the first row of
## each block.
rind <- c(TRUE, diff(abto$`Blatt#`) != 0)

## Then it is just a matter of adding a new column to abto named length
## and inserting the values from databtz1$abtl.wert only for the rows
## that are marked (identified by which(rind==1))
result <- data.frame(abto, length=rep(NA,nrow(abto)))
result[which(rind==TRUE),"length"] <- databtz1$abtl.wert

This answer assumes that the number of rows in databtz1 matches the number of rows at which abto$Blatt# changes.

With this input data:

abto <- structure(list(See = c("ABT", "ABT", "ABT", "ABT", "ABT", "ABT", 
"ABT", "ABT", "ABT", "ABT", "ABT", "ABT", "ABT", "ABT", "ABT",
"ABT", "ABT", "ABT", "ABT"), Transekt = c("A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A"), plant = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), `Blatt#` = c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 1L, 1L, 1L, 2L, 2L
), breiteo = c(2.0182, 1.973, 1.8024, 2.2081, 2.2858, 1.8532,
2.0384, 2.0757, 1.9567, 1.908, 1.8346, 2.0282, 2.1356, 1.7594,
1.6457, 1.6942, 2.0544, 2.1342, 1.9107), breitez = c(5.398, 4.2522,
3.7587, 4.288, 6.1115, 5.7426, 4.9074, 4.8801, 4.8879, 5.0652,
4.8862, 4.5545, 5.7157, 6.1688, 5.2868, 5.0414, 5.6711, 5.2867,
6.2139), bez = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)), .Names = c("See", "Transekt",
"plant", "Blatt#", "breiteo", "breitez", "bez"), class = "data.frame", row.names = c(NA,
-19L))

databtz1 <- structure(list(abtl.wert = c(22.7738, 24.9137, 24.9474, 25.0498,
25.0431, 21.1024, 28.9083)), .Names = "abtl.wert", class = "data.frame", row.names = c(NA,
-7L))

I get this:

print(result)
## See Transekt plant Blatt. breiteo breitez bez length
##1 ABT A 1 1 2.0182 5.3980 1 22.7738
##2 ABT A 1 1 1.9730 4.2522 1 NA
##3 ABT A 1 1 1.8024 3.7587 1 NA
##4 ABT A 1 2 2.2081 4.2880 2 24.9137
##5 ABT A 1 2 2.2858 6.1115 2 NA
##6 ABT A 1 2 1.8532 5.7426 2 NA
##7 ABT A 1 3 2.0384 4.9074 2 24.9474
##8 ABT A 1 3 2.0757 4.8801 2 NA
##9 ABT A 1 4 1.9567 4.8879 2 25.0498
##10 ABT A 1 4 1.9080 5.0652 2 NA
##11 ABT A 1 4 1.8346 4.8862 2 NA
##12 ABT A 1 5 2.0282 4.5545 1 25.0431
##13 ABT A 1 5 2.1356 5.7157 1 NA
##14 ABT A 1 5 1.7594 6.1688 1 NA
##15 ABT A 2 1 1.6457 5.2868 1 21.1024
##16 ABT A 2 1 1.6942 5.0414 1 NA
##17 ABT A 2 1 2.0544 5.6711 1 NA
##18 ABT A 2 2 2.1342 5.2867 2 28.9083
##19 ABT A 2 2 1.9107 6.2139 2 NA

Note that I removed a row with Blatt# equal to 3 from your originally posted data to simulate your irregularities, and added a row to databtz1$abtl.wert so that the number of Blatt# blocks matches the number of rows in databtz1$abtl.wert.

using awk to record certain column value every Nth row

Each line read is split into fields; $2 is the second column.

NR is the number of the current line being read, so the command below prints column 2 of line 304 and of every 243rd line after it (lines 304, 547, 790, ...).

Give this a try:

awk 'NR>=304 && !((NR-304)%243) { print $2;}' file.txt
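
Equivalently, with the offset and interval passed in as awk variables (the same condition, just parameterised):

awk -v start=304 -v step=243 'NR >= start && (NR - start) % step == 0 { print $2 }' file.txt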

Pandas every nth row

I'd use iloc, which takes a row/column slice based on integer position and follows normal Python syntax. If you want every 5th row:

df.iloc[::5, :]

