Count Number of Columns in a Pipe-Delimited File

Count the number of columns in a pipe-delimited file.

awk -F\| '{print NF}'

gives the correct result.
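If you would rather avoid awk, here is a minimal Python sketch of the same idea; the file name data.txt is only a placeholder:

# Count the number of pipe-delimited columns in the first line of a file.
# "data.txt" is a hypothetical placeholder name.
with open("data.txt") as f:
    first_line = f.readline().rstrip("\n")

# Splitting on "|" yields one element per column, just like awk's NF.
print(len(first_line.split("|")))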

How to count the number of unique values of a field in a pipe-delimited text file?

With macOS's built-in awk, you can do it like this:

awk -F'|' '{print $4}' YourFile | sort | uniq

Output

Blue
Green

Your question title implies you expect the answer to be 2, because there are two unique values. In that case, count the lines too:

awk -F'|' '{print $4}' file | sort | uniq | wc -l
2
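For a single pass without sort and uniq, here is a rough Python equivalent of the pipeline above; the field index 3 (the fourth column) and the file name are taken from that example:

# Collect the distinct values of the fourth "|"-separated field.
unique_values = set()
with open("YourFile") as f:
    for line in f:
        fields = line.rstrip("\n").split("|")
        if len(fields) > 3:  # skip short or malformed lines
            unique_values.add(fields[3])

print(sorted(unique_values))  # the distinct values, e.g. Blue and Green
print(len(unique_values))     # the count, e.g. 2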

Count unique values in each column of a pipe-delimited text file using Perl

Not the most elegant, but the fastest I could put together for your new requirements:

import glob
import os
import sys

path = "/tmp"
file_mask = "file*.txt"
results = {}

for file in glob.glob(os.path.join(path, file_mask)):
    column_names = {}
    exchange_col = None
    with open(file, "r") as f:
        for line_num, line in enumerate(f):
            # process the header line
            if not line_num:
                line_parsed = line.strip().split("|")
                for column_num, column in enumerate(line_parsed):
                    if column.strip() == "exchnage":
                        exchange_col = column_num
                    else:
                        column_names[column_num] = column.strip()
                if exchange_col is None:
                    print("Can't find exchnage field")
                    sys.exit(1)
                continue
            line_parsed = line.strip().split("|")
            if len(line_parsed) != len(column_names) + 1:
                continue
            # prepare an empty structure for the exchange, if not added yet
            if line_parsed[exchange_col].strip() not in results:
                results[line_parsed[exchange_col].strip()] = {column_name: set() for column_name in column_names.values()}
            # add unique items for the exchange
            for column_num, column in enumerate(line_parsed):
                column_val = column.strip()
                # add only non-empty values
                if column_val and column_num != exchange_col:
                    results[line_parsed[exchange_col].strip()][column_names[column_num]].add(column_val)

column_names = list(column_names.values())
print("exchnage|" + "|".join("%8s" % c for c in column_names))
for exchange, values in results.items():
    print("%8s|" % exchange + "|".join("%8s" % str(len(values[column])) for column in column_names))

Program output (your new files, with a different column order, were used as input):

$ python parser.py
exchnage|  ticker|   sedol|   cusip
 newyork|       2|       2|       3
  london|       3|       2|       3

How to obtain the maximum length of fields in a huge pipe-delimited file

When I import text data into a database, I typically first read the data into a staging table where all the columns are long-enough character fields (say, varchar(8000)).

Then, I load from the staging table into the final table:

create table RealTable (
    RealTableId int identity(1, 1) primary key,
    Column1 int,
    Column2 datetime,
    Column3 varchar(12),
    . . .
);

insert into RealTable(<all columns but id>)
    select (case when column1 not like '%[^0-9]%' then cast(column1 as int) end),
           (case when isdate(column2) = 1 then cast(column2 as datetime) end),
           . . .

I find it much easier to debug type issues inside the database rather than when inserting into the database.
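If you also want the maximum length seen in each field, for example to size the varchar columns of the real table, one pass over the file is enough. This is only a sketch, assuming the first line is a header and using the placeholder file name huge_file.txt:

import csv

# Track the longest value seen in each pipe-delimited column.
# "huge_file.txt" is a hypothetical placeholder name.
with open("huge_file.txt", newline="") as f:
    reader = csv.reader(f, delimiter="|")
    header = next(reader)
    max_lengths = {name: 0 for name in header}
    for row in reader:
        for name, value in zip(header, row):
            if len(value) > max_lengths[name]:
                max_lengths[name] = len(value)

for name, length in max_lengths.items():
    print(f"{name}: {length}")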


