Count the number of columns in a pipe-delimited file
awk -F\| '{print NF}'
gives the correct result.
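Note that this prints a field count for every input line. To check just the header row of a file (here hypothetically named data.txt), feed it a single line:
head -1 data.txt | awk -F'|' '{print NF}'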
How to count the number of unique values of a field in a pipe-delimited text file?
With macOS's built-in awk, you can do it like this:
awk -F'|' '{print $4}' YourFile | sort | uniq
Output
Blue
Green
Your question title implies you expect the answer to be 2, because there are two unique values. In that case, count the lines too:
awk -F'|' '{print $4}' file | sort | uniq | wc -l
2
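If the file is large, you can skip the sort entirely and count the distinct values in a single awk pass (a sketch, assuming field 4 as above):
awk -F'|' '!seen[$4]++ { n++ } END { print n }' file
The associative array seen records each value the first time it appears, so n ends up as the number of distinct values.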
Count unique values in each column of a pipe-delimited text file using Perl
Not the most elegant, but the fastest I could craft for your new requirements (note that the script below is Python, not Perl):
import glob
import os
import sys

path = "/tmp"
file_mask = "file*.txt"
results = {}

for file in glob.glob(os.path.join(path, file_mask)):
    column_names = {}
    exchange_col = None
    with open(file, "r") as f:
        for line_num, line in enumerate(f):
            # process header
            if not line_num:
                line_parsed = line.strip().split("|")
                for column_num, column in enumerate(line_parsed):
                    # "exchnage" is the column name as it is (mis)spelled in the source files
                    if column.strip() == "exchnage":
                        exchange_col = column_num
                    else:
                        column_names[column_num] = column.strip()
                if exchange_col is None:
                    print("Can't find exchnage field")
                    sys.exit(1)
                continue
            line_parsed = line.strip().split("|")
            if len(line_parsed) != len(column_names) + 1:
                continue
            # prepare an empty structure for the exchange, if not added yet
            exchange = line_parsed[exchange_col].strip()
            if exchange not in results:
                results[exchange] = {column_name: set() for column_name in column_names.values()}
            # add unique items to the exchange
            for column_num, column in enumerate(line_parsed):
                column_val = column.strip()
                # add only non-empty values
                if column_val and column_num != exchange_col:
                    results[exchange][column_names[column_num]].add(column_val)

column_names = column_names.values()
print("exchnage|" + "|".join("%8s" % c for c in column_names))
for exchange, values in results.items():
    print("%8s|" % exchange + "|".join("%8s" % len(values[column]) for column in column_names))
Program output (your new files, with their different column orders, were used as input):
$ python3 parser.py
exchnage| ticker| sedol| cusip
newyork| 2| 2| 3
london| 3| 2| 3
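If you do not need the per-exchange grouping, a single awk pass can count the distinct values in every column of one file (a sketch, assuming a header row and a file hypothetically named file.txt):
awk -F'|' 'NR > 1 { for (i = 1; i <= NF; i++) if ($i != "" && !seen[i, $i]++) count[i]++ }
           END { for (i = 1; i <= NF; i++) print i, count[i] }' file.txt
Here seen[i, $i] tracks which values have been observed in each column, and count[i] holds the number of distinct non-empty values per column.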
How to obtain the max length of fields in a huge pipe-delimited file
When I import text data into a database, typically I first read the data into a staging table where all the columns are long-enough character fields (say, varchar(8000)).
Then, I load from the staging table into the final table:
create table RealTable (
RealTableId int identity(1, 1) primary key,
Column1 int,
Column2 datetime,
Column3 varchar(12),
. . .
);
insert into RealTable(<all columns but id>)
    select (case when column1 not like '%[^0-9]%' then cast(column1 as int) end),
           (case when isdate(column2) = 1 then cast(column2 as datetime) end),
           . . .
I find it much easier to debug type issues inside the database rather than when inserting into the database.
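To answer the title question directly: a single awk pass can report the maximum length of each field, which helps choose the staging column sizes (a sketch, assuming a file hypothetically named huge.txt):
awk -F'|' '{ for (i = 1; i <= NF; i++) if (length($i) > max[i]) max[i] = length($i) }
           END { for (i = 1; i <= NF; i++) print i, max[i] }' huge.txt
This streams the file once and keeps only one counter per column, so it works on files far too large to load into memory.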