Count Number of Columns in a Pipe-Delimited File

Count the number of columns in a pipe-delimited file.

awk -F\| '{print NF}'

gives the correct result.
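If you would rather avoid awk, here is a minimal Python sketch of the same idea; the file name data.txt is only a placeholder:

# Count the number of pipe-delimited columns in the first line of a file.
# "data.txt" is a hypothetical placeholder name.
with open("data.txt") as f:
    first_line = f.readline().rstrip("\n")

# Splitting on "|" yields one element per column, just like awk's NF.
print(len(first_line.split("|")))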

How to count the number of unique values of a field in a pipe-delimited text file?

With macOS's built-in awk, you can do it like this:

awk -F'|' '{print $4}' YourFile | sort | uniq

Output

Blue
Green

Your question title implies you expect the answer to be 2, because there are two unique values. In that case, count the lines too:

awk -F'|' '{print $4}' file | sort | uniq | wc -l
2
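For a single pass without sort and uniq, here is a rough Python equivalent of the pipeline above; the field index 3 (the fourth column) and the file name are taken from that example:

# Collect the distinct values of the fourth "|"-separated field.
unique_values = set()
with open("YourFile") as f:
    for line in f:
        fields = line.rstrip("\n").split("|")
        if len(fields) > 3:  # skip short or malformed lines
            unique_values.add(fields[3])

print(sorted(unique_values))  # the distinct values, e.g. Blue and Green
print(len(unique_values))     # the count, e.g. 2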

Count unique values in each column of a pipe-delimited text file using Perl

Not the most elegant, but the fastest I could put together for your new requirements:

import glob
import os
import sys

path = "/tmp"
file_mask = "file*.txt"
results = {}

for file in glob.glob(os.path.join(path, file_mask)):
    column_names = {}
    exchange_col = None
    with open(file, "r") as f:
        for line_num, line in enumerate(f):
            # process the header line
            if not line_num:
                line_parsed = line.strip().split("|")
                for column_num, column in enumerate(line_parsed):
                    if column.strip() == "exchnage":
                        exchange_col = column_num
                    else:
                        column_names[column_num] = column.strip()
                if exchange_col is None:
                    print("Can't find exchnage field")
                    sys.exit(1)
                continue
            line_parsed = line.strip().split("|")
            if len(line_parsed) != len(column_names) + 1:
                continue
            # prepare an empty structure for the exchange, if not added yet
            if line_parsed[exchange_col].strip() not in results:
                results[line_parsed[exchange_col].strip()] = {column_name: set() for column_name in column_names.values()}
            # add unique items for the exchange
            for column_num, column in enumerate(line_parsed):
                column_val = column.strip()
                # add only non-empty values
                if column_val and column_num != exchange_col:
                    results[line_parsed[exchange_col].strip()][column_names[column_num]].add(column_val)

column_names = list(column_names.values())
print("exchnage|" + "|".join("%8s" % c for c in column_names))
for exchange, values in results.items():
    print("%8s|" % exchange + "|".join("%8s" % str(len(values[column])) for column in column_names))

Program output (your new files, with a different column order, were used as input):

$ python parser.py
exchnage|  ticker|   sedol|   cusip
 newyork|       2|       2|       3
  london|       3|       2|       3

How to obtain the maximum length of fields in a huge pipe-delimited file

When I import text data into a database, I typically first read the data into a staging table where all the columns are long-enough character fields (say, varchar(8000)).

Then, I load from the staging table into the final table:

create table RealTable (
    RealTableId int identity(1, 1) primary key,
    Column1 int,
    Column2 datetime,
    Column3 varchar(12),
    . . .
);

insert into RealTable(<all columns but id>)
    select (case when column1 not like '%[^0-9]%' then cast(column1 as int) end),
           (case when isdate(column2) = 1 then cast(column2 as datetime) end),
           . . .

I find it much easier to debug type issues inside the database rather than when inserting into the database.
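If you also want the maximum length seen in each field, for example to size the varchar columns of the real table, one pass over the file is enough. This is only a sketch, assuming the first line is a header and using the placeholder file name huge_file.txt:

import csv

# Track the longest value seen in each pipe-delimited column.
# "huge_file.txt" is a hypothetical placeholder name.
with open("huge_file.txt", newline="") as f:
    reader = csv.reader(f, delimiter="|")
    header = next(reader)
    max_lengths = {name: 0 for name in header}
    for row in reader:
        for name, value in zip(header, row):
            if len(value) > max_lengths[name]:
                max_lengths[name] = len(value)

for name, length in max_lengths.items():
    print(f"{name}: {length}")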


