Line Count Within Text Files Having Multiple Lines and Single Lines

The number of columns differs between the two cases. To make it generic, I wrote a Perl script that prints the row count. It builds a regex from the header line and uses it to count the rows. I assume that the first line is always the header and therefore determines the number of columns.

#!/usr/bin/perl -w

open(FH, $ARGV[0]) or die "Failed to open file: $!";

# Get the columns from the header line and use them to construct the regex
my $head = <FH>;
my @col = split(",", $head);   # Columns array
my $col_cnt = scalar(@col);    # Column count

# Read the rest of the rows into one string
my $rows = "";
while (<FH>) {
    $rows .= $_;
}
close(FH);

# Create a regex based on the number of columns
# E.g. for 3 columns, the regex should be
# ".*?",".*?",".*?"
# where each ".*?" matches anything between " and "
my $i = 0;
while ($i < $col_cnt) {
    $col[$i++] = "\".*?\"";
}
my $regex = join(",", @col);

# /s lets . match newlines, so the data is treated as a single line
# /g for global matching
my @row_cnt = $rows =~ m/($regex)/sg;
print "Row count: " . scalar(@row_cnt) . "\n";

Save it as row_count.pl, make it executable, and run it as ./row_count.pl filename
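For comparison, here is a minimal Python sketch of the same count. It swaps the header-derived regex for the standard csv module, which already treats a quoted field containing line breaks as part of one record. The function name count_records and the sample data are made up for illustration, and the sketch assumes the same layout as above: a header on the first line and double-quoted, comma-separated fields.

```python
import csv
import io

def count_records(stream):
    """Count data records after the header, even when quoted fields span lines."""
    reader = csv.reader(stream)
    header = next(reader, None)   # first record is assumed to be the header
    if header is None:
        return 0                  # empty input
    return sum(1 for _ in reader)

# Hypothetical sample: the first data record is spread over two physical lines.
sample = '"id","note","flag"\n"1","spans\ntwo lines","x"\n"2","single","y"\n'
record_count = count_records(io.StringIO(sample, newline=""))
print("Row count:", record_count)
```

When reading a real file, open it with newline="" so the csv module sees the embedded line breaks unchanged.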

Why do the line counts differ when using two different ways to load text?

So, looking at the documentation for str.splitlines, we see that the line delimiters for this method are a superset of "universal newlines":

This method splits on the following line boundaries. In particular,
the boundaries are a superset of universal newlines.

Representation    Description
\n                Line Feed
\r                Carriage Return
\r\n              Carriage Return + Line Feed
\v or \x0b        Line Tabulation
\f or \x0c        Form Feed
\x1c              File Separator
\x1d              Group Separator
\x1e              Record Separator
\x85              Next Line (C1 Control Code)
\u2028            Line Separator
\u2029            Paragraph Separator
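A small sketch of how the two counts can diverge. The sample string is invented for illustration, and io.StringIO stands in for a file opened in text mode; iterating over it breaks lines only at \n, while str.splitlines also breaks at the form feed and file separator characters.

```python
import io

# Hypothetical text containing a form feed (\x0c) and a file separator (\x1c).
text = "one\x0ctwo\nthree\x1cfour\n"

# str.splitlines breaks on every boundary in the table above.
via_splitlines = text.splitlines()

# Iterating over a text stream breaks only at newline characters.
via_iteration = list(io.StringIO(text))

print(len(via_splitlines), via_splitlines)
print(len(via_iteration), via_iteration)
```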

Reading text Files - single line vs. multiple lines

There are many possible ways of doing this. Which is best for you might depend on how long these files are and how important performance is.

A simple solution is to just read characters one at a time until you hit your tilde delimiter.
The routine ReadOneItem below shows how this can be done.

procedure TForm1.Button1Click(Sender: TObject);
const
  FileName = 'c:\kuiper\test2.txt';
var
  MyFile : textfile;
  Buffer : string;

  // Read one item from text file MyFile.
  // Load characters one at a time.
  // Ignore CR and LF characters
  // Stop reading at end-of-file, or when a '~' is read

  function ReadOneItem : string;
  var
    C : char;
  begin
    Result := '';

    // loop continues until break
    while true do
    begin

      // are we at the end-of-file? If so we're done
      if eof(MyFile) then
        break;

      // read in the next character
      read(MyFile, C);

      // ignore CR and LF
      if (C = #13) or (C = #10) then
        {do nothing}
      else
      begin

        // add the character to the end
        Result := Result + C;

        // if this is the delimiter then stop reading
        if C = '~' then
          break;
      end;
    end;
  end;


begin
  assignfile(MyFile, FileName);
  reset(MyFile);
  try

    while not EOF(MyFile) do
    begin
      Buffer := ReadOneItem;
      Memo1.Lines.Add(Buffer);
    end;

  finally
    closefile(MyFile);
  end;
end;
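The same character-at-a-time idea can be sketched in Python. This is an illustration only: the generator name read_items and the sample text are made up, and unlike the routine above it does not emit an empty item when only line breaks follow the last tilde.

```python
import io

def read_items(stream, delimiter="~"):
    """Yield items from a text stream, ignoring CR/LF, each ending at the delimiter."""
    item = []
    while True:
        ch = stream.read(1)
        if ch == "":              # end of file
            break
        if ch in "\r\n":          # ignore CR and LF
            continue
        item.append(ch)
        if ch == delimiter:       # delimiter closes the current item
            yield "".join(item)
            item = []
    if item:                      # trailing item with no closing delimiter
        yield "".join(item)

# Hypothetical input: items broken across lines, last item unterminated.
items = list(read_items(io.StringIO("ab\ncd~ef~\r\ngh")))
print(items)
```

As in the original routine, the delimiter is kept at the end of each item.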

Extract common lines from multiple text files and display original line numbers

Use an array to associate each unique line with a space-separated list of the line numbers at which it appears in each file. At the end, print that list followed by the line itself for every line that was found in all three files.

awk '{
    n[$0] = n[$0] FNR OFS
    c[$0]++
}
END {
    for (r in c)
        if (c[r] == 3)
            print n[r] r
}' file1 file2 file3

If the number of files is unknown, refer to Ravinder's answer, or just replace the hardcoded 3 in the END block with ARGC-1 as shown there.
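For readers who find the awk arrays terse, here is a rough Python sketch of the same bookkeeping. The function name common_lines and the sample data are hypothetical, and each file is represented as a list of lines already stripped of newlines. Like the awk version, it counts occurrences rather than distinct files, so it assumes a line appears at most once per file.

```python
from collections import defaultdict

def common_lines(files):
    """Map each line seen in every file to its line numbers, in file order."""
    positions = defaultdict(list)   # plays the role of n[] in the awk script
    counts = defaultdict(int)       # plays the role of c[]
    for lines in files:
        for lineno, line in enumerate(lines, start=1):
            positions[line].append(lineno)
            counts[line] += 1
    # Comparing against len(files) corresponds to using ARGC-1 instead of 3.
    return {line: positions[line] for line in counts if counts[line] == len(files)}

# Hypothetical contents of three files.
sample_files = [["a", "b", "c"], ["b", "x", "a"], ["q", "a", "b"]]
result = common_lines(sample_files)
for line, numbers in result.items():
    print(*numbers, line)
```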

Search a text file for a multi line string and return line number in Python

Use .count() and the match object to count the number of newlines before the match:

import re

with open('example.txt', 'r') as file:
    content = file.read()
    match = re.search('second line\nThis third line', content)
    if match:
        print('Found a match starting on line', content.count('\n', 0, match.start()))

match.start() is the position of the start of the match in content.

content.count('\n', 0, match.start()) counts the number of newlines in content between character position 0 and the start of the match.

Use 1 + content.count('\n', 0, match.start()) if you prefer line numbers to start at 1 instead of 0.
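If the text may occur more than once, the same newline-counting idea extends to every match. The sketch below is one possible variant, not part of the answer above: the helper name match_line_numbers and the sample string are invented, it reports 1-based line numbers, and it uses re.escape so the search text is taken literally rather than as a regular expression.

```python
import re

def match_line_numbers(content, needle):
    """Return the 1-based line number at which each occurrence of needle starts."""
    pattern = re.compile(re.escape(needle))   # treat needle as literal text
    return [
        1 + content.count("\n", 0, m.start())
        for m in pattern.finditer(content)
    ]

# Hypothetical content where the two-line text occurs twice.
sample = "a\nb\nc\na\nb\n"
line_numbers = match_line_numbers(sample, "a\nb")
print(line_numbers)
```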

How to count lines in a document?

Use wc:

wc -l <filename>

This will output the number of lines in <filename>:

$ wc -l /dir/file.txt
3272485 /dir/file.txt

Or, to omit the <filename> from the result, use wc -l < <filename>:

$ wc -l < /dir/file.txt
3272485

You can also pipe data to wc:

$ cat /dir/file.txt | wc -l
3272485
$ curl yahoo.com --silent | wc -l
63
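Where wc is not available, a comparable count can be sketched in Python. Like wc -l, this counts newline characters, so a final line without a trailing newline is not counted. The function name count_newlines and the chunk size are arbitrary choices for illustration.

```python
import io

def count_newlines(stream, chunk_size=1 << 16):
    """Count newline bytes in a binary stream, reading it in fixed-size chunks."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:             # empty bytes object means end of stream
            break
        total += chunk.count(b"\n")
    return total

# Hypothetical in-memory data; for a real file use open(path, "rb").
newline_count = count_newlines(io.BytesIO(b"a\nb\nc"))
print(newline_count)
```

Reading in chunks keeps memory use flat even for very large files.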

Convert multiple lines to single line using RStudio text editor

Not an automated solution per se, but many people don't know about RStudio's support for multiline cursors. Just hold Alt (Option on Mac) and drag across multiple lines, then press backspace a few times:

Also very handy for adding commas or closing parentheses to multiple lines at once.


