Print a file in multiple columns based on delimiter
Alright, since apparently there is no clean way to do this, I came up with my own solution. It's a bit messy and requires GNU screen to be installed, but it works: any number of lines within or around the blocks, two columns taking 50% of the screen each (resizing automatically), and each column prints independently of the other with a fixed number of newlines between blocks. It also refreshes automatically every x seconds (120 in my example).
#!/bin/bash
# Set up a vertical 50/50 split in the running "testscr" session,
# each half tailing one of the two column files.
screen -S testscr -X layout save default
screen -S testscr -X split -v
screen -S testscr -X screen tail -f /tmp/testscr1.txt
screen -S testscr -X focus
screen -S testscr -X screen tail -f /tmp/testscr2.txt

while : ; do
    echo "" > /tmp/testscr1.txt
    echo "" > /tmp/testscr2.txt
    cfile=1   # current column (1 or 2)
    ctype=0   # 0 = outside a block, 1 = inside a block
    while read -r; do
        if [[ $REPLY == "------------------------------------------------------------" ]]; then
            if [[ $ctype -eq 0 ]]; then
                ctype=1   # opening separator: a block starts
            else
                # closing separator: finish the block, then switch columns
                if [[ $cfile -eq 1 ]]; then
                    echo "${REPLY}" >> /tmp/testscr1.txt
                    echo "" >> /tmp/testscr1.txt
                    echo "" >> /tmp/testscr1.txt
                    cfile=2
                else
                    echo "${REPLY}" >> /tmp/testscr2.txt
                    echo "" >> /tmp/testscr2.txt
                    echo "" >> /tmp/testscr2.txt
                    cfile=1
                fi
                ctype=0
            fi
        fi
        if [[ $ctype -eq 1 ]]; then
            if [[ $cfile -eq 1 ]]; then
                echo "${REPLY}" >> /tmp/testscr1.txt
            else
                echo "${REPLY}" >> /tmp/testscr2.txt
            fi
        fi
    done < "$1"
    sleep 120
done
First, start a screen session with screen -S testscr; then, either inside or outside the session, run the script above. It splits the screen vertically into two 50% columns and runs tail -f in each, then walks through the input file, writing block by block to the two tmp files in alternation. Since it runs in an infinite loop, the displayed output is refreshed automatically every x seconds (here 120).
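The block-distribution logic can be tried on its own, without screen. This is a minimal sketch under assumptions: hypothetical file names blocks.txt, col1.txt and col2.txt, and blocks framed by the same 60-dash separator line the script looks for:

```shell
# Standalone sketch of the block parser (no screen needed).
# Blocks framed by a 60-dash line alternate between two column files.
sep=$(printf '%060d' 0 | tr '0' '-')   # a line of 60 dashes

printf '%s\n' "$sep" 'first block' "$sep" "$sep" 'second block' "$sep" > blocks.txt

: > col1.txt
: > col2.txt
cfile=1
ctype=0
while read -r line; do
    if [[ $line == "$sep" ]]; then
        if [[ $ctype -eq 0 ]]; then
            ctype=1                           # opening separator
        else
            echo "$line" >> "col${cfile}.txt" # closing separator
            [[ $cfile -eq 1 ]] && cfile=2 || cfile=1
            ctype=0
        fi
    fi
    if [[ $ctype -eq 1 ]]; then
        echo "$line" >> "col${cfile}.txt"
    fi
done < blocks.txt
```

Afterwards col1.txt holds the first block and col2.txt the second, each framed by its separator lines, ready to be tailed side by side.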
awk multiple delimiter and print multiple column
Awk can deal with multiple delimiters:
$ awk -F'[(/% ]' '{printf "%s",$1" "$2" "$3" "$4" "$5","$8","$9","$10",";for(i=12;i<=NF;i++)printf "%s ",$i;print ""}' file
May 24 2013 18:13:24 ROUTER1,01IFNET,4,UPDOWN,The state of interface GigabitEthernet0 0 22 was changed to DOWN.
May 24 2013 17:59:33 ROUTER1,01FIB,3,REFRESH_END,FIB refreshing end, the refresh group map is 0!
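The bracket expression can be seen in isolation on a made-up line (the input string here is invented for illustration):

```shell
# -F'[(/% ]' treats '(', '/', '%' and space as field separators,
# all at once, so mixed punctuation splits into clean fields.
echo 'GigabitEthernet0(0/22%was down' | awk -F'[(/% ]' '{print $1, $2, $3, $4, $5}'
# -> GigabitEthernet0 0 22 was down
```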
Print multiple fields in AWK but split one of them based on a different delimiter
$ cut -d: -f1 file
1 mcu
2 disney
or if the real file is more convoluted than the example in your question then maybe this will do what you really need:
$ awk -F'[\t:]' -v OFS='\t' '{print $1, $2}' file
1 mcu
2 disney
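A sketch of the "more convoluted" case, with an assumed input where the lines carry extra colon-separated junk after the wanted field:

```shell
# -F'[\t:]' splits on TAB *and* ':'; OFS='\t' re-joins the kept
# fields with a TAB, dropping everything after the second field.
printf '1\tmcu:extra:stuff\n2\tdisney:more\n' |
awk -F'[\t:]' -v OFS='\t' '{print $1, $2}'
```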
How to split a text file into multiple columns with Spark
Using RDD API: your mistake is that String.split expects a regular expression, where the pipe ("|") is a special character meaning "OR" — on its own it matches the empty string, so the split happens between every character. Also, you should start from index 0 when converting the array into a tuple.
The fix is simple - escape that character:
sc.textFile("D:/data/dnr10.txt")
.map(_.split("\\|"))
.map(c => (c(0),c(1),c(2),c(3)))
.toDF()
Using Dataframe API: the same issue with escaping the pipe applies here. Plus you can simplify the code by splitting once and using that split column multiple times when selecting the columns:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType
results1.withColumn("split", split($"all", "\\|")).select(
$"split" getItem 0 cast IntegerType as "DEPT_NO",
$"split" getItem 3 cast IntegerType as "ART_GRP_NO",
$"split" getItem 7 as "ART_NO"
)
Using Spark 2.0 built-in CSV support: if you're using Spark 2.0+, you can let the framework do all the hard work for you - use format "csv" and set the delimiter to be the pipe character:
val result = sqlContext.read
.option("header", "true")
.option("delimiter", "|")
.option("inferSchema", "true")
.format("csv")
.load("D:/data/dnr10.txt")
result.show()
// +-------+----------+------+---+
// |DEPT_NO|ART_GRP_NO|ART_NO| TT|
// +-------+----------+------+---+
// | 29| 102|354814|SKO|
// | 29| 102|342677|SKO|
// | 29| 102|334634|DUR|
// | 29| 102|276728|I-P|
// +-------+----------+------+---+
result.printSchema()
// root
// |-- DEPT_NO: integer (nullable = true)
// |-- ART_GRP_NO: integer (nullable = true)
// |-- ART_NO: integer (nullable = true)
// |-- TT: string (nullable = true)
You'll get the column names, the right types - everything... :)
Split column into multiple based on match/delimiter using bash awk
Here's a csplit+paste solution
$ csplit --suppress-matched -zs test.file2 /male_position/ {*}
$ ls
test.file2 xx00 xx01 xx02
$ paste xx*
0.00 0 0
0.00 5 1
1.05 10 2
1.05 3
1.05 5
1.05
3.1
5.11
12.74
From man csplit:
csplit - split a file into sections determined by context lines
-z, --elide-empty-files
        remove empty output files
-s, --quiet, --silent
        do not print counts of output file sizes
--suppress-matched
        suppress the lines matching PATTERN
/male_position/ is the regex used to split the input file. {*} specifies to create as many splits as possible. Use the -f and -n options to change the default output file names. paste xx* pastes the files column-wise; TAB is the default separator.
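The same pipeline can be reproduced end to end on a tiny made-up input (hypothetical file name mini.file; GNU csplit is assumed, since --suppress-matched is a GNU extension):

```shell
# Reproduce the csplit+paste trick on a small invented input:
# three sections of numbers, each preceded by a "male_position" line.
cd "$(mktemp -d)"   # work in a clean directory so the xx* glob is safe
printf '%s\n' 0.00 0.00 1.05 male_position 0 5 10 male_position 0 1 2 > mini.file

csplit --suppress-matched -zs mini.file /male_position/ '{*}'
paste xx*   # the three sections become three TAB-separated columns
```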
split one file into multiple files according to columns using bash cut or awk
With awk:
awk -F '[\t;]' '{for(i=1; i<=NF; i++) print $i >> ("column" i ".txt")}' file
Tab and semicolon are used as field separators. NF holds the number of fields in the current row, $i the content of the i-th column, and i the number of the current column. (The parentheses around the file-name expression in the redirection keep it portable across awk implementations.)
This creates 11 files. column11.txt contains:
k
p
k
k
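A quick sketch of the fan-out on a made-up two-row input (hypothetical file name f.txt):

```shell
# Each TAB- or ';'-separated field lands in its own columnN.txt file.
cd "$(mktemp -d)"   # clean directory so no stale columnN.txt files interfere
printf 'a\tb;c\nx\ty;z\n' > f.txt
awk -F'[\t;]' '{for (i = 1; i <= NF; i++) print $i >> ("column" i ".txt")}' f.txt
cat column2.txt   # prints "b" then "y"
```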