Line Count Within Text Files Having Multiple Lines and Single Lines

The number of columns differs between the two cases. To make this generic, I wrote a Perl script that prints the row count. It builds a regex from the header line and uses it to count the rows. I assume the first line is always the header, so it gives the number of columns.

#!/usr/bin/perl -w
use strict;

open(my $fh, '<', $ARGV[0]) or die "Failed to open file: $!";

# Get columns from the header and use them to construct the regex
my $head = <$fh>;
my @col = split(",", $head); # Columns array
my $col_cnt = scalar(@col);  # Columns count

# Read the rest of the rows
my $rows = '';
while (<$fh>) {
    $rows .= $_;
}
close($fh);

# Create regex based on the number of columns
# E.g. for 3 columns, the regex should be
# ".*?",".*?",".*?"
# where each ".*?" matches anything between " and "
my $i = 0;
while ($i < $col_cnt) {
    $col[$i++] = "\".*?\"";
}
my $regex = join(",", @col);

# /s lets . match newlines, so the data is treated as a single line
# /g for global matching
my @row_cnt = $rows =~ m/($regex)/sg;
print "Row count:" . scalar(@row_cnt) . "\n";

Save it as row_count.pl, make it executable, and run it as ./row_count.pl filename
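For comparison, here is a minimal Python sketch of the same idea. It swaps the header-derived regex for the standard csv module, which already treats a newline inside a quoted field as part of that field. The function name count_records and the sample data are made up for illustration, and the sketch assumes the first record is a header, as the Perl script does.

```python
import csv
import io

def count_records(text, has_header=True):
    """Count CSV records; newlines inside quoted fields do not end a record."""
    # newline="" leaves line endings untouched so the csv module can handle them
    reader = csv.reader(io.StringIO(text, newline=""))
    rows = [row for row in reader if row]  # skip completely blank lines
    if has_header and rows:
        return len(rows) - 1
    return len(rows)

# Hypothetical sample: one header, one record spanning two lines, one single-line record
sample = '"a","b","c"\n"1","two\nlines","3"\n"4","5","6"\n'
print("Row count:", count_records(sample))
```

For a real file, the text could come from open(path, newline="").read(), or the file object could be passed to csv.reader directly.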

Why do the line counts differ between two different ways of loading text?

So, looking at the documentation for str.splitlines, we see that the line delimiters for this method are a superset of "universal newlines":

This method splits on the following line boundaries. In particular,
the boundaries are a superset of universal newlines.

Representation	Description
`\n`	Line Feed
`\r`	Carriage Return
`\r\n`	Carriage Return + Line Feed
`\v` or `\x0b`	Line Tabulation
`\f` or `\x0c`	Form Feed
`\x1c`	File Separator
`\x1d`	Group Separator
`\x1e`	Record Separator
`\x85`	Next Line (C1 Control Code)
`\u2028`	Line Separator
`\u2029`	Paragraph Separator
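A small sketch of how this can produce different counts: the made-up string below contains a form feed and a Unicode line separator, which str.splitlines treats as line boundaries but which are not counted when you only look for "\n".

```python
# Hypothetical sample text with a form feed (\x0c) and a line separator (\u2028)
text = "one\ntwo\x0cstill line two\nthree\u2028still line three\n"

# splitlines breaks on every boundary listed in the table above
splitlines_count = len(text.splitlines())

# counting "\n" only sees the three line feeds
newline_count = text.count("\n")

print(splitlines_count, newline_count)
```

If the two counts must agree, one option is to split on "\n" explicitly instead of calling splitlines.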

Reading text Files - single line vs. multiple lines

There are many possible ways of doing this. Which is best for you might depend on how long these files are and how important performance is.

A simple solution is to just read characters one at a time until you hit your tilde delimiter.
The routine ReadOneItem below shows how this can be done.

procedure TForm1.Button1Click(Sender: TObject);
const
  FileName = 'c:\kuiper\test2.txt';
var
  MyFile : textfile;
  Buffer : string;

  // Read one item from text file MyFile.
  // Load characters one at a time.
  // Ignore CR and LF characters
  // Stop reading at end-of-file, or when a '~' is read

  function ReadOneItem : string;
  var
    C : char;
  begin
    Result := '';

    // loop continues until break
    while true do
      begin

        // are we at the end-of-file? If so we're done
        if eof(MyFile) then
          break;

        // read in the next character
        read ( MyFile, C );

        // ignore CR and LF
        if ( C = #13 ) or ( C = #10 ) then
          {do nothing}
        else
          begin

            // add the character to the end
            Result := Result + C;

            // if this is the delimiter then stop reading
            if C = '~' then
              break;
          end;
      end;
  end;


begin
  assignfile ( MyFile, FileName );
  reset ( MyFile );
  try

    while not EOF(MyFile) do
      begin
        Buffer := ReadOneItem;
        // skip the empty item left by a trailing CR/LF at end-of-file
        if Buffer <> '' then
          Memo1.Lines.Add ( Buffer );
      end;

  finally
    closefile ( MyFile );
  end;
end;
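The same character-at-a-time approach can be sketched in Python. This is only an illustration of the logic in ReadOneItem, not a translation of the Delphi form code; the function name read_items and the sample string are hypothetical. As in the routine above, CR and LF are ignored and the '~' delimiter is kept at the end of each item.

```python
def read_items(text, delimiter="~"):
    """Split text into items ending at the delimiter, ignoring CR and LF."""
    items = []
    current = []
    for ch in text:
        if ch in "\r\n":
            continue  # ignore CR and LF
        current.append(ch)
        if ch == delimiter:
            # the delimiter closes the item and stays part of it
            items.append("".join(current))
            current = []
    if current:
        # text after the last delimiter forms a final item
        items.append("".join(current))
    return items

print(read_items("ab\r\ncd~ef~\r\ngh"))
```

For large files, the characters could be read from the file in chunks rather than from a single string.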

Extract common lines from multiple text files and display original line numbers

In an array, associate each unique line with a space-separated list of the line numbers where it appears in each file, then at the end print that list next to the line if the line was found in all three files. This assumes no line is repeated within a single file, since c counts occurrences rather than files.

awk '{
  n[$0] = n[$0] FNR OFS
  c[$0]++
}
END {
  for (r in c)
    if (c[r] == 3)
      print n[r] r
}' file1 file2 file3

If the number of files is unknown, refer to Ravinder's answer, or just replace the hardcoded 3 in the END block with ARGC-1 as shown there.
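A rough Python sketch of the same bookkeeping, for readers who prefer it to awk. The function name common_lines and the sample lists are made up for illustration. Unlike the awk version, it records only the first position of a line in each file, so a line repeated within one file does not distort the result, and it works for any number of files.

```python
def common_lines(files):
    """Map each line found in every file to its first line number per file.

    files is a list of line lists, one list per file.
    """
    seen = {}
    for file_idx, lines in enumerate(files):
        for lineno, line in enumerate(lines, start=1):
            # keep only the first occurrence of the line in this file
            seen.setdefault(line, {}).setdefault(file_idx, lineno)
    return {
        line: [positions[i] for i in range(len(files))]
        for line, positions in seen.items()
        if len(positions) == len(files)
    }

# Hypothetical contents of three files
files = [["a", "b", "c"], ["b", "x", "a"], ["a", "b"]]
for line, numbers in common_lines(files).items():
    print(*numbers, line)
```

With real files, each list could be built with open(path).read().split("\n").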

Search a text file for a multi line string and return line number in Python

Use .count() and the match object to count the number of newlines before the match:

import re

with open('example.txt', 'r') as file:
    content = file.read()
match = re.search('second line\nThis third line', content)
if match:
    print('Found a match starting on line', content.count('\n', 0, match.start()))

match.start() is the position of the start of the match in content.

content.count('\n', 0, match.start()) counts the number of newlines in content between character position 0 and the start of the match.

Use 1 + content.count('\n', 0, match.start()) if you prefer line numbers to start at 1 instead of 0.

How to count lines in a document?

Use wc:

wc -l <filename>

This will output the number of lines in <filename>:

$ wc -l /dir/file.txt
3272485 /dir/file.txt

Or, to omit the <filename> from the result, use wc -l < <filename>:

$ wc -l < /dir/file.txt
3272485

You can also pipe data to wc:

$ cat /dir/file.txt | wc -l
3272485
$ curl yahoo.com --silent | wc -l
63
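If wc is not available, a similar count can be sketched in Python. Like wc -l, this counts newline characters, so a final line without a trailing newline is not counted. The function name count_newlines and the chunk size are arbitrary choices for this example.

```python
import io

def count_newlines(stream, chunk_size=1 << 16):
    """Count b'\\n' bytes in a binary stream, reading it in chunks."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += chunk.count(b"\n")
    return total

# In-memory demo; for a real file use: with open(path, "rb") as f: count_newlines(f)
print(count_newlines(io.BytesIO(b"a\nb\nc")))
```

Reading in binary chunks keeps memory use flat even for very large files.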

Convert multiple lines to single line using RStudio text editor

Not an automated solution per se, but many people don't know about RStudio's support for multiline cursors. Just hold Alt (Option on Mac) and drag across multiple lines, then press backspace a few times.

Also very handy for adding commas or closing parentheses to multiple lines at once.
