Print a File in Multiple Columns Based on Delimiter

Print a file in multiple columns based on delimiter

Alright, since apparently there is no clean way to do this, I came up with my own solution. It's a bit messy and requires GNU screen to be installed, but it works: it handles any number of lines within or around the blocks, each column automatically resizes to 50% of the screen, and each column prints independently of the other with a fixed number of newlines between blocks. It also refreshes automatically every x seconds (120 in my example).

#!/bin/bash

# Set up a vertical 50/50 split in the running "testscr" session,
# with a tail -f of one temp file per column
screen -S testscr -X layout save default
screen -S testscr -X split -v
screen -S testscr -X screen tail -f /tmp/testscr1.txt
screen -S testscr -X focus
screen -S testscr -X screen tail -f /tmp/testscr2.txt

while : ; do
    echo "" > /tmp/testscr1.txt   # truncate both column files
    echo "" > /tmp/testscr2.txt
    cfile=1 # current column
    ctype=0 # start or end of block

    while read -r; do
        if [[ $REPLY == "------------------------------------------------------------" ]]; then
            if [[ $ctype -eq 0 ]]; then
                ctype=1   # separator opens a block
            else
                # separator closes a block: write it, pad with blank
                # lines, then switch to the other column
                if [[ $cfile -eq 1 ]]; then
                    echo "${REPLY}" >> /tmp/testscr1.txt
                    echo "" >> /tmp/testscr1.txt
                    echo "" >> /tmp/testscr1.txt
                    cfile=2
                else
                    echo "${REPLY}" >> /tmp/testscr2.txt
                    echo "" >> /tmp/testscr2.txt
                    echo "" >> /tmp/testscr2.txt
                    cfile=1
                fi
                ctype=0
            fi
        fi
        if [[ $ctype -eq 1 ]]; then
            if [[ $cfile -eq 1 ]]; then
                echo "${REPLY}" >> /tmp/testscr1.txt
            else
                echo "${REPLY}" >> /tmp/testscr2.txt
            fi
        fi
    done < "$1"
    sleep 120
done

First, start a screen session with screen -S testscr; then, either within or outside the session, execute the script above. It splits the screen vertically, 50% per column, and runs tail -f in each half; it then walks through the input file and writes block by block to the two temp files in the desired way. Since it all runs in an infinite while loop, the displayed output is effectively refreshed every x seconds (here 120).
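
As a usage sketch, assuming the script above is saved as columns.sh (a placeholder name, not from the original post) and made executable:

$ screen -S testscr           # start the named session first
$ ./columns.sh input.txt      # input.txt holds the dash-delimited blocks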

awk multiple delimiters and print multiple columns

Awk can deal with multiple delimiters, because the field separator is a regular expression; here -F'[(/% ]' splits on any of (, /, % or space:

$ awk -F'[(/% ]' '{printf "%s",$1" "$2" "$3" "$4" "$5","$8","$9","$10",";for(i=12;i<=NF;i++)printf "%s ",$i;print ""}' file
May 24 2013 18:13:24 ROUTER1,01IFNET,4,UPDOWN,The state of interface GigabitEthernet0 0 22 was changed to DOWN.
May 24 2013 17:59:33 ROUTER1,01FIB,3,REFRESH_END,FIB refreshing end, the refresh group map is 0!
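
To see the bracket-expression separator in isolation, here is a toy check with made-up input:

$ echo 'a(b/c%d e' | awk -F'[(/% ]' '{print $2, $4}'
b d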

Print multiple fields in AWK but split one of them based on a different delimiter

$ cut -d: -f1 file
1 mcu
2 disney

or, if the real file is more convoluted than the example in your question, then maybe this will do what you really need:

$ awk -F'[\t:]' -v OFS='\t' '{print $1, $2}' file
1 mcu
2 disney
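
The input file isn't shown in either snippet; a made-up file consistent with the output above (tab-separated columns, with a colon inside the second) behaves like this:

$ printf '1\tmcu:extra\n2\tdisney:extra\n' > file
$ awk -F'[\t:]' -v OFS='\t' '{print $1, $2}' file
1 mcu
2 disney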

How to split a text file into multiple columns with Spark

Using RDD API: your mistake is that String.split expects a regular expression, in which the pipe ("|") is a special character meaning "OR", so splitting on "|" splits between every character. Also, you should start from index 0 when converting the array into a tuple.

The fix is simple - escape that character:

 sc.textFile("D:/data/dnr10.txt")
.map(_.split("\\|"))
.map(c => (c(0),c(1),c(2),c(3)))
.toDF()

Using Dataframe API: the same issue with escaping the pipe applies here. You can also simplify the code by splitting once and reusing that split column when selecting the output columns:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType

results1.withColumn("split", split($"all", "\\|")).select(
  $"split" getItem 0 cast IntegerType as "DEPT_NO",
  $"split" getItem 3 cast IntegerType as "ART_GRP_NO",
  $"split" getItem 7 as "ART_NO"
)

Using Spark 2.0 built-in CSV support: if you're using Spark 2.0+, you can let the framework do all the hard work for you - use format "csv" and set the delimiter to be the pipe character:

val result = sqlContext.read
  .option("header", "true")
  .option("delimiter", "|")
  .option("inferSchema", "true")
  .format("csv")
  .load("D:/data/dnr10.txt")

result.show()
// +-------+----------+------+---+
// |DEPT_NO|ART_GRP_NO|ART_NO| TT|
// +-------+----------+------+---+
// |     29|       102|354814|SKO|
// |     29|       102|342677|SKO|
// |     29|       102|334634|DUR|
// |     29|       102|276728|I-P|
// +-------+----------+------+---+

result.printSchema()
// root
// |-- DEPT_NO: integer (nullable = true)
// |-- ART_GRP_NO: integer (nullable = true)
// |-- ART_NO: integer (nullable = true)
// |-- TT: string (nullable = true)

You'll get the column names, the right types - everything... :)

Split column into multiple based on match/delimiter using bash awk

Here's a csplit+paste solution

$ csplit --suppress-matched -zs test.file2 /male_position/ {*}
$ ls
test.file2 xx00 xx01 xx02
$ paste xx*
0.00 0 0
0.00 5 1
1.05 10 2
1.05 3
1.05 5
1.05
3.1
5.11
12.74

From man csplit

csplit - split a file into sections determined by context lines

-z, --elide-empty-files
       remove empty output files

-s, --quiet, --silent
       do not print counts of output file sizes

--suppress-matched
       suppress the lines matching PATTERN

  • /male_position/ is the regex used to split the input file
  • {*} specifies to create as many splits as possible
  • use -f and -n options to change the default output file names
  • paste xx* to paste the files column-wise, TAB is the default separator (a toy demo follows this list)
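
Since the asker's test.file2 isn't reproduced here, this self-contained run with made-up input shows the mechanics end to end:

$ printf '1\n5\nmale_position\n2\n3\nmale_position\n4\n' > toy.txt
$ csplit --suppress-matched -zs toy.txt /male_position/ '{*}'
$ paste xx*
1 2 4
5 3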

split one file into multiple files according to columns using bash cut or awk

With awk:

awk -F '[\t;]' '{for(i=1; i<=NF; i++) print $i >> ("column" i ".txt")}' file

Tab and semicolon are used as field separators. NF holds the number of fields in the current row; $i is the content of field i, and i is the field's position. The parentheses around the output file name keep the string concatenation unambiguous across awk implementations.

This creates 11 files. column11.txt contains:

k
p
k
k
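
To verify the mechanics without the original input, here is a minimal run on a made-up two-row, three-column file:

$ printf 'a\tb;c\nx\ty;z\n' > sample.txt
$ awk -F '[\t;]' '{for(i=1; i<=NF; i++) print $i >> ("column" i ".txt")}' sample.txt
$ cat column3.txt
c
z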

