Put Every N Rows of Input into a New Column

Put every N rows of input into a new column

Replace the 5 in the following script with your number. xargs -n5 wraps the input into groups of five per line, and the awk part then transposes the result:

seq 20|xargs -n5| awk '{for (i=1;i<=NF;i++) a[i,NR]=$i; }END{
for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++)printf a[i,j]" "; print "" }}'

output:

1 6 11 16 
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20

Note that seq 20 above is there just to generate a number sequence for testing. You don't need it in your real work.
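
For example, to run the same pipeline on a real input file instead (numbers.txt below is just a hypothetical single-column file), feed it to xargs in place of seq:

xargs -n5 < numbers.txt | awk '{for (i=1;i<=NF;i++) a[i,NR]=$i; }END{
for(i=1;i<=NF;i++) {for(j=1;j<=NR;j++)printf a[i,j]" "; print "" }}'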

EDIT

As pointed out by sudo_O, I am adding a pure awk solution:

 awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }' file

test

kent$  seq 20| awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }'     
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20

kent$ seq 12| awk -vn=5 '{a[NR]=$0}END{ x=1; while (x<=n){ for(i=x;i<=length(a);i+=n) printf a[i]" "; print ""; x++; } }'
1 6 11
2 7 12
3 8
4 9
5 10

Put every X rows of input into a new column

I assume each block has the same pattern, that is, the first column repeats in the same order [ABCD ASDF ... QWER] in every block.
If so, you have to take the first column of the first BLOCK [47288 lines] and write it to the target file.
Then you have to take the second column of each BLOCK and paste it onto the target file.
I tried with this data file:


ABCD 1001
EFGH 1002
IJKL 1003
MNOP 1004
QRST 1005
UVWX 1006
ABCD 2001
EFGH 2002
IJKL 2003
MNOP 2004
QRST 2005
UVWX 2006
ABCD 3001
EFGH 3002
IJKL 3003
MNOP 3004
QRST 3005
UVWX 3006
ABCD 4001
EFGH 4002
IJKL 4003
MNOP 4004
QRST 4005
UVWX 4006
ABCD 5001
EFGH 5002
IJKL 5003
MNOP 5004
QRST 5005
UVWX 5006

And with this script:


#!/bin/bash

# target number of lines per block, change to 47288
LINES=6
INPUT='data.txt'
TOTALLINES=$(wc --lines < "$INPUT")
TOTALBLOCKS=$((TOTALLINES / LINES))


# first block of the target file: the first column of the first LINES lines of the data file
head -n $LINES "$INPUT" | cut --delimiter=' ' --fields=1 > target.txt

# get the second column of each block and paste it onto the target file
BLOCK=1
while [ $BLOCK -le $TOTALBLOCKS ]
do
    HEADVALUE=$((BLOCK * LINES))
    head -n $HEADVALUE "$INPUT" | tail -n $LINES | cut --delimiter=' ' --fields=2 > tmpcol.txt
    cp target.txt targettmp.txt
    paste -d' ' targettmp.txt tmpcol.txt > target.txt
    BLOCK=$((BLOCK+1))
done

# removing temp files
rm -f targettmp.txt
rm -f tmpcol.txt

And I got this target file:


ABCD 1001 2001 3001 4001 5001
EFGH 1002 2002 3002 4002 5002
IJKL 1003 2003 3003 4003 5003
MNOP 1004 2004 3004 4004 5004
QRST 1005 2005 3005 4005 5005
UVWX 1006 2006 3006 4006 5006

I hope this helps you.
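
For reference, the same result can also be sketched as a single awk pass that simply groups the second column by the first column, preserving the order of first appearance, which avoids the repeated head/tail passes over the input (using the data.txt above):

awk '{ if (!($1 in row)) { order[++n] = $1; row[$1] = $1 }
       row[$1] = row[$1] " " $2 }
     END { for (i = 1; i <= n; i++) print row[order[i]] }' data.txt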

Convert one column into a new column every 5 rows (a numeric interval)

Yes, you can do the following:

as.data.frame(split(data, 1:x))

where x = number of rows / 5; in your example x = 2, as you have 10 observations.

awk insert rows of one file as new columns to every nth rows of another file

Try this; see mywiki.wooledge - Process Substitution for details on the <() syntax. The sed 'p;p' prints each input line twice in addition to the automatic print, so every line of file2 is tripled to line up with the three corresponding rows of file1.

$ # transforming file2
$ cut -d' ' -f2-3 file2.txt | sed 'p;p'
2 3
2 3
2 3
5 6
5 6
5 6
8 9
8 9
8 9

$ # then paste it together with required fields from file1
$ paste -d' ' <(cut -d' ' -f1-3 file1.txt) <(cut -d' ' -f2-3 file2.txt | sed 'p;p')
a b c 2 3
d e f 2 3
g h i 2 3
j k l 5 6
m n o 5 6
o q r 5 6
s t u 8 9
v w x 8 9
y z Z 8 9
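
If the grouping factor is something other than 3, the sed 'p;p' part can be swapped for a small awk repeater so the count stays a parameter; with n=3 this is equivalent to the command above (a sketch):

$ paste -d' ' <(cut -d' ' -f1-3 file1.txt) <(cut -d' ' -f2-3 file2.txt | awk -v n=3 '{for (i = 0; i < n; i++) print}')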



Speed comparison, time shown for two consecutive runs

$ perl -0777 -ne 'print $_ x 1000000' file1.txt > f1
$ perl -0777 -ne 'print $_ x 1000000' file2.txt > f2
$ du -h f1 f2
95M f1
18M f2


$ time paste -d' ' <(cut -d' ' -f1-3 f1) <(cut -d' ' -f2-3 f2 | sed 'p;p') > t1

real 0m1.362s
real 0m1.154s

$ time awk '1;1;1' f2 | awk 'FNR==NR{a[FNR]=$2" "$3;next};{$NF=a[FNR]};1' - f1 > t2

real 0m12.088s
real 0m13.028s

$ time awk '{
if (c==3) c=0;
printf "%s %s %s ",$1,$2,$3;
if (!c++){ getline < "f2"; f4=$2; f5=$3 }
printf "%s %s\n",f4,f5
}' f1 > t3

real 0m13.629s
real 0m13.380s

$ time awk '{
if (c==3) c=0;
main_fields=$1 OFS $2 OFS $3;
if (!c++){ getline < "f2"; f4=$2; f5=$3 }
printf "%s %s %s\n", main_fields, f4, f5
}' f1 > t4

real 0m13.265s
real 0m13.896s

$ diff -s t1 t2
Files t1 and t2 are identical
$ diff -s t1 t3
Files t1 and t3 are identical
$ diff -s t1 t4
Files t1 and t4 are identical

Making every other row into a new column

awk to the rescue!

awk '{k=$1 FS $2}                             # key = first two fields
NR==1 {p0=$0; pk=k}                           # header line pairs with itself below, doubling the condition columns
pk==k {split(p0,a); for(i=3;i<=NF;i++) $i=a[i] FS $i; print}   # same key as the saved line: interleave fields and print
pk!=k {p0=$0; pk=$1 FS $2}' file              # new key: save this line and wait for its partner

samples pops condition_1 condition_1 condition_2 condition_2 condition_3 condition_3
A10051 15 1 2 3 4 4 4
A10052 15 2 2 1 1 4 4

This will work for an unspecified number of columns and records, as long as they are all well-formed (same number of columns) and grouped (lines with the same key are consecutive).

Adding a column to a dataframe after every nth column

Create a new DataFrame by indexing every 3rd column, add .5 to its column labels so it sorts into the right positions, and join it to the original with concat:

df.columns = np.arange(len(df.columns))

df1 = pd.DataFrame(50, index=df.index, columns= df.columns[2::3] + .5)

df2 = pd.concat([df, df1], axis=1).sort_index(axis=1)
df2.columns = np.arange(len(df2.columns))
print (df2)
0 1 2 3 4 5 6 7 8 9 10 11 12
0 a b c 50 d e f 50 g h i 50 j
1 k l m 50 n o p 50 q r s 50 t

add values to a dataframe every nth row or corresponding to a spec. column value

If I understand your problem, the following should work:

## Compute the rows of abto that should receive the values from databtz1$abtl.wert.
## Here we look for changes between adjoining rows of the Blatt# column using diff:
## a non-zero diff means the following row starts a new Blatt# block. Prepending
## TRUE marks the first row as well, so rind is TRUE exactly on the first row of
## each block.
rind <- c(TRUE, diff(abto$`Blatt#`) != 0)

## Then it is just a matter of adding a new column to abto named length
## and inserting the values from databtz1$abtl.wert only for the rows
## that are marked (identified by which(rind==1))
result <- data.frame(abto, length=rep(NA,nrow(abto)))
result[which(rind==TRUE),"length"] <- databtz1$abtl.wert

This answer assumes that the number of rows in databtz1 matches the number of rows at which abto$Blatt# changes.

With this input data:

abto <- structure(list(See = c("ABT", "ABT", "ABT", "ABT", "ABT", "ABT", 
"ABT", "ABT", "ABT", "ABT", "ABT", "ABT", "ABT", "ABT", "ABT",
"ABT", "ABT", "ABT", "ABT"), Transekt = c("A", "A", "A", "A",
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"A", "A"), plant = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), `Blatt#` = c(1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 1L, 1L, 1L, 2L, 2L
), breiteo = c(2.0182, 1.973, 1.8024, 2.2081, 2.2858, 1.8532,
2.0384, 2.0757, 1.9567, 1.908, 1.8346, 2.0282, 2.1356, 1.7594,
1.6457, 1.6942, 2.0544, 2.1342, 1.9107), breitez = c(5.398, 4.2522,
3.7587, 4.288, 6.1115, 5.7426, 4.9074, 4.8801, 4.8879, 5.0652,
4.8862, 4.5545, 5.7157, 6.1688, 5.2868, 5.0414, 5.6711, 5.2867,
6.2139), bez = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)), .Names = c("See", "Transekt",
"plant", "Blatt#", "breiteo", "breitez", "bez"), class = "data.frame", row.names = c(NA,
-19L))

databtz1 <- structure(list(abtl.wert = c(22.7738, 24.9137, 24.9474, 25.0498,
25.0431, 21.1024, 28.9083)), .Names = "abtl.wert", class = "data.frame", row.names = c(NA,
-7L))

I get this:

print(result)
## See Transekt plant Blatt. breiteo breitez bez length
##1 ABT A 1 1 2.0182 5.3980 1 22.7738
##2 ABT A 1 1 1.9730 4.2522 1 NA
##3 ABT A 1 1 1.8024 3.7587 1 NA
##4 ABT A 1 2 2.2081 4.2880 2 24.9137
##5 ABT A 1 2 2.2858 6.1115 2 NA
##6 ABT A 1 2 1.8532 5.7426 2 NA
##7 ABT A 1 3 2.0384 4.9074 2 24.9474
##8 ABT A 1 3 2.0757 4.8801 2 NA
##9 ABT A 1 4 1.9567 4.8879 2 25.0498
##10 ABT A 1 4 1.9080 5.0652 2 NA
##11 ABT A 1 4 1.8346 4.8862 2 NA
##12 ABT A 1 5 2.0282 4.5545 1 25.0431
##13 ABT A 1 5 2.1356 5.7157 1 NA
##14 ABT A 1 5 1.7594 6.1688 1 NA
##15 ABT A 2 1 1.6457 5.2868 1 21.1024
##16 ABT A 2 1 1.6942 5.0414 1 NA
##17 ABT A 2 1 2.0544 5.6711 1 NA
##18 ABT A 2 2 2.1342 5.2867 2 28.9083
##19 ABT A 2 2 1.9107 6.2139 2 NA

Note that I removed a row with Blatt# equal to 3 from your originally posted data to simulate your irregularities, and added a row to databtz1$abtl.wert so that the number of Blatt# blocks matches the number of rows in databtz1$abtl.wert.

using awk to record certain column value every Nth row

Each line read is split into fields; $2 is the second column.

NR is the number of the current line being read, so the command below prints column 2 of line 304 and of every 243rd line after it (lines 304, 547, 790, ...).

Give this a try:

awk 'NR>=304 && !((NR-304)%243) { print $2;}' file.txt
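
Equivalently, with the offset and interval passed in as awk variables (the same condition, just parameterised):

awk -v start=304 -v step=243 'NR >= start && (NR - start) % step == 0 { print $2 }' file.txt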

Pandas every nth row

I'd use iloc, which takes a row/column slice based on integer position and follows normal Python syntax. If you want every 5th row:

df.iloc[::5, :]

