Line count within text files having multi-line and single-line rows
The number of columns differs between the two cases. To make it generic, I wrote a Perl script that prints the row count. It generates a regex from the header and uses it to count the rows. I assumed that the first line always determines the number of columns.
#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', $ARGV[0]) or die "Failed to open file: $!";
# Get the columns from the header and use them to construct the regex
my $head = <$fh>;
my @col = split(",", $head); # Columns array
my $col_cnt = scalar(@col); # Column count
# Read the rest of the rows
my $rows = '';
while (<$fh>) {
    $rows .= $_;
}
close($fh);
# Create a regex based on the number of columns
# E.g. for 3 columns, the regex should be
# ".*?",".*?",".*?"
# where each ".*?" matches anything between " and "
my $i = 0;
while ($i < $col_cnt) {
    $col[$i++] = "\".*?\"";
}
my $regex = join(",", @col);
# /s lets . match newlines, so one row may span several lines
# /g for global matching
my @row_cnt = $rows =~ m/($regex)/sg;
print "Row count:" . scalar(@row_cnt) . "\n";
Save it as row_count.pl, make it executable,
and run it as ./row_count.pl filename
Why do the line counts differ when using two different ways to load text?
Looking at the documentation for str.splitlines, we see that the line delimiters for this method are a superset of "universal newlines":
This method splits on the following line boundaries. In particular,
the boundaries are a superset of universal newlines.
Representation | Description |
---|---|
\n | Line Feed |
\r | Carriage Return |
\r\n | Carriage Return + Line Feed |
\v or \x0b | Line Tabulation |
\f or \x0c | Form Feed |
\x1c | File Separator |
\x1d | Group Separator |
\x1e | Record Separator |
\x85 | Next Line (C1 Control Code) |
\u2028 | Line Separator |
\u2029 | Paragraph Separator |
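As a rough illustration of why the two counts can disagree, the sketch below counts the lines of one string in two ways. The sample text is made up for the example, and io.StringIO only stands in for a file opened in text mode; iterating over it splits on \n, while str.splitlines also splits on the extra boundaries from the table.

```python
import io

# Made-up sample: ordinary \n endings plus a form feed (\x0c)
# and a Unicode line separator (\u2028) inside two of the lines.
text = "first\nsecond\x0cstill second?\nthird\u2028still third?\n"

# Iterating a file-like object breaks lines only at newline characters.
file_like_count = sum(1 for _ in io.StringIO(text))

# str.splitlines() also breaks at \x0c and \u2028.
splitlines_count = len(text.splitlines())

print(file_like_count, splitlines_count)
```

If both approaches need to agree, splitting the content explicitly with text.split("\n") is one option, at the cost of treating a lone \r as ordinary data.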
Reading text files - single line vs. multiple lines
There are many possible ways of doing this. Which is best for you might depend on how long these files are and how important performance is.
A simple solution is to just read characters one at a time until you hit your tilde delimiter.
The routine ReadOneItem below shows how this can be done.
procedure TForm1.Button1Click(Sender: TObject);
const
  FileName = 'c:\kuiper\test2.txt';
var
  MyFile : textfile;
  Buffer : string;

  // Read one item from text file MyFile.
  // Load characters one at a time.
  // Ignore CR and LF characters
  // Stop reading at end-of-file, or when a '~' is read
  function ReadOneItem : string;
  var
    C : char;
  begin
    Result := '';
    // loop continues until break
    while true do
    begin
      // are we at the end-of-file? If so we're done
      if eof(MyFile) then
        break;
      // read in the next character
      read ( MyFile, C );
      // ignore CR and LF
      if ( C = #13 ) or ( C = #10 ) then
        {do nothing}
      else
      begin
        // add the character to the end
        Result := Result + C;
        // if this is the delimiter then stop reading
        if C = '~' then
          break;
      end;
    end;
  end;

begin
  assignfile ( MyFile, FileName );
  reset ( MyFile );
  try
    while not EOF(MyFile) do
    begin
      Buffer := ReadOneItem;
      Memo1 . Lines . Add ( Buffer );
    end;
  finally
    closefile ( MyFile );
  end;
end;
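For comparison, here is a minimal Python sketch of the same character-at-a-time idea. The name read_items and its delimiter parameter are hypothetical, chosen for this example only. Like the routine above, it skips CR and LF and keeps the '~' at the end of each item; unlike it, the sketch does not emit an empty item for trailing line breaks.

```python
import io
from typing import Iterator, TextIO


def read_items(stream: TextIO, delimiter: str = "~") -> Iterator[str]:
    """Yield items terminated by `delimiter`, ignoring CR and LF."""
    item = []
    while True:
        ch = stream.read(1)
        if ch == "":          # end of file
            break
        if ch in "\r\n":      # ignore line breaks inside an item
            continue
        item.append(ch)
        if ch == delimiter:   # delimiter closes the current item
            yield "".join(item)
            item = []
    if item:                  # trailing item without a delimiter
        yield "".join(item)


# io.StringIO stands in for an opened text file.
sample = io.StringIO("alpha~be\r\nta~gamma")
print(list(read_items(sample)))
```

For large files, reading in bigger chunks and splitting on the delimiter would likely be faster than one read call per character.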
Extract common lines from multiple text files and display original line numbers
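In the array n, associate each unique line with a space-separated list of the line numbers at which it is seen in each file. In the array c, count how often the line is seen. At the end, print the line numbers followed by the line itself for every line seen three times, that is, once in each of the three files. Note that c counts every occurrence, so this assumes a line appears at most once per file.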
awk '{
n[$0] = n[$0] FNR OFS
c[$0]++
}
END {
for (r in c)
if (c[r] == 3)
print n[r] r
}' file1 file2 file3
If the number of files is unknown, refer to Ravinder's answer, or just replace the hardcoded 3 in the END block with ARGC-1 as shown there.
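The same bookkeeping can be sketched in Python. The function name common_lines is hypothetical, and the input is assumed to be a list of already-read line lists, one per file. Unlike the awk version, this sketch tracks line numbers per file, so a line repeated within one file is not mistaken for a line common to all files.

```python
def common_lines(files):
    """Map each line found in every file to its line numbers per file.

    `files` is a list of line lists, one list per file.
    """
    positions = {}
    for index, lines in enumerate(files):
        for number, line in enumerate(lines, start=1):
            # One list of line numbers per file, created on first sight.
            per_file = positions.setdefault(line, [[] for _ in files])
            per_file[index].append(number)
    # Keep only lines that occur at least once in every file.
    return {line: nums for line, nums in positions.items() if all(nums)}


# Made-up sample data standing in for three files.
file1 = ["a", "b", "c"]
file2 = ["x", "b", "a"]
file3 = ["b", "b", "a"]
result = common_lines([file1, file2, file3])
for line, nums in result.items():
    print(nums, line)
```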
Search a text file for a multi line string and return line number in Python
Use .count() and the match object to count the number of newlines before the match:
import re

with open('example.txt', 'r') as file:
    content = file.read()

match = re.search('second line\nThis third line', content)
if match:
    print('Found a match starting on line', content.count('\n', 0, match.start()))
match.start() is the position of the start of the match in content.
content.count('\n', 0, match.start()) counts the number of newlines in content between character position 0 and the start of the match.
Use 1 + content.count('\n', 0, match.start()) if you prefer line numbers to start at 1 instead of 0.
How to count lines in a document?
Use wc:
wc -l <filename>
This will output the number of lines in <filename>:
$ wc -l /dir/file.txt
3272485 /dir/file.txt
Or, to omit the <filename> from the result, use wc -l < <filename>:
$ wc -l < /dir/file.txt
3272485
You can also pipe data to wc:
$ cat /dir/file.txt | wc -l
3272485
$ curl yahoo.com --silent | wc -l
63
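Where wc is not available, a similar count can be sketched in Python. The function name count_newlines and its chunk_size parameter are hypothetical. Like wc -l, it counts newline characters, so a final line without a trailing newline is not included.

```python
def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, reading it in fixed-size chunks."""
    total = 0
    with open(path, "rb") as handle:
        # Read binary chunks so large files are never loaded whole.
        while chunk := handle.read(chunk_size):
            total += chunk.count(b"\n")
    return total
```

Calling count_newlines("/dir/file.txt") would then return the same number that wc -l reports for that file.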
Convert multiple lines to single line using RStudio text editor
Not an automated solution per se, but many people don't know about RStudio's support for multiline cursors. Just hold Alt (Option on Mac) and drag across multiple lines, then press backspace a few times.
Also very handy for adding commas or closing parentheses to multiple lines at once.