How to Read First N Lines of a File

How to read first N lines of a file?

Python 3:

with open("datafile") as myfile:
    head = [next(myfile) for x in range(N)]
print(head)

Python 2:

with open("datafile") as myfile:
    head = [next(myfile) for x in xrange(N)]
print head

Here's another way (both Python 2 & 3):

from itertools import islice

with open("datafile") as myfile:
    head = list(islice(myfile, N))
print(head)
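The two approaches above differ when the file has fewer than N lines: a bare next(myfile) raises StopIteration, while islice simply stops early. The sketch below shows both side by side; the function names head_with_next and head_with_islice are made up for illustration, and passing a default to next() is one possible way to avoid the exception.

```python
from itertools import islice


def head_with_next(path, n):
    """First n lines via next(); hypothetical helper name."""
    lines = []
    with open(path) as f:
        for _ in range(n):
            line = next(f, None)  # the default None signals end of file
            if line is None:
                break
            lines.append(line)
    return lines


def head_with_islice(path, n):
    """First n lines via islice; hypothetical helper name."""
    with open(path) as f:
        # islice stops quietly if the file runs out before n lines
        return list(islice(f, n))
```

Both helpers return the lines with their trailing newlines intact, as the snippets above do.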

How to read file N lines at a time?

One solution would be a list comprehension and the slice operator:

with open(filename, 'r') as infile:
    lines = [line for line in infile][:N]

After this, lines is a list of the first N lines. However, this loads the complete file into memory before slicing. If you don't want that (i.e. if the file could be really large), there is another solution using islice from the itertools module:

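from itertools import islice
with open(filename, 'r') as infile:
    lines_gen = islice(infile, N)
    # consume lines_gen here, while the file is still open

lines_gen is an islice object, a lazy iterator that yields one line of the file at a time. It reads from the file only as you iterate, so use it inside the with block, in a loop like this:

    for line in lines_gen:
        print(line)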

Both solutions give you up to N lines (or fewer, if the file doesn't have that many).
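If the goal is really to walk through the whole file N lines at a time, as the question title suggests, the same islice call can be repeated until it returns nothing. This is a minimal sketch under that reading of the question; read_in_chunks is a hypothetical name, not something from the answers above.

```python
from itertools import islice


def read_in_chunks(file_obj, n):
    """Yield successive lists of up to n lines (hypothetical helper)."""
    while True:
        chunk = list(islice(file_obj, n))
        if not chunk:  # an empty list means the input is exhausted
            return
        yield chunk
```

It would be used as `for chunk in read_in_chunks(infile, N): ...` inside the with block. Only one chunk is held in memory at a time, and the last chunk may be shorter than N.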

How to print the first n lines of file?

Literally: read the first n lines, then stop.

def read_first_lines(filename, limit):
    result = []
    with open(filename, 'r') as input_file:
        # files are iterable, you can have a for-loop over a file.
        for line_number, line in enumerate(input_file):
            if line_number >= limit:  # line_number starts at 0.
                break
            result.append(line)
    return result

How to read first N lines of a text file and write it to another text file?

Well, you have it right: calling islice(f, n) on the open file object f will get you the first n lines of the file. The problem here is when you try and write these lines to another file.

The error is pretty intuitive (I've added the full error one receives in this case):

TypeError: write() argument must be str, not list

This is because f.write() accepts strings as parameters, not list types.

So, instead of dumping the list as is, write the contents of it in your other file using a for loop:

from itertools import islice

with open("input.txt", "r") as myfile:
    head = list(islice(myfile, 3))

# always remember, use files in a with statement
with open("output.txt", "w") as f2:
    for item in head:
        f2.write(item)

Provided that the contents of the list are all of type str, this works like a charm; if not, you just need to wrap each item in the for loop in a str() call to make sure it is converted to a string.

If you want an approach that doesn't require a loop, you could always consider using f.writelines() instead of f.write() (and, take a look at Jon's comment for another tip with writelines).
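A minimal sketch of the writelines() variant, assuming text files: writelines() accepts any iterable of strings, so the islice object can be passed to it directly without building a list first. The function name copy_head and its parameters are hypothetical.

```python
from itertools import islice


def copy_head(src_path, dst_path, n):
    """Copy the first n lines of src_path to dst_path (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        # writelines adds no newlines of its own; the lines read from
        # src already carry theirs
        dst.writelines(islice(src, n))
```

For the example above this would be called as copy_head("input.txt", "output.txt", 3).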

How can I read first n and last n lines from a file?

Chances are you're going to want something like:

... | awk -v OFS='\n' '{a[NR]=$0} END{print a[1], a[2], a[NR-1], a[NR]}'

or if you need to specify a number and taking into account @Wintermute's astute observation that you don't need to buffer the whole file, something like this is what you really want:

... | awk -v n=2 'NR<=n{print;next} {buf[((NR-1)%n)+1]=$0}
END{for (i=1;i<=n;i++) print buf[((NR+i-1)%n)+1]}'

I think the math is correct on that - hopefully you get the idea to use a rotating buffer indexed by the NR modded by the size of the buffer and adjusted to use indices in the range 1-n instead of 0-(n-1).

To help with comprehension of the modulus operator used in the indexing above, here is an example with intermediate print statements to show the logic as it executes:

$ cat file   
1
2
3
4
5
6
7
8


$ cat tst.awk                
BEGIN {
    print "Populating array by index ((NR-1)%n)+1:"
}
{
    buf[((NR-1)%n)+1] = $0

    printf "NR=%d, n=%d: ((NR-1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
        NR, n, NR-1, (NR-1)%n, ((NR-1)%n)+1, ((NR-1)%n)+1, buf[((NR-1)%n)+1]

}
END {
    print "\nAccessing array by index ((NR+i-1)%n)+1:"
    for (i=1;i<=n;i++) {
        printf "NR=%d, i=%d, n=%d: (((NR+i = %d) - 1 = %d) %%n = %d) +1 = %d -> buf[%d] = %s\n",
            NR, i, n, NR+i, NR+i-1, (NR+i-1)%n, ((NR+i-1)%n)+1, ((NR+i-1)%n)+1, buf[((NR+i-1)%n)+1]
    }
}
$
$ awk -v n=3 -f tst.awk file
Populating array by index ((NR-1)%n)+1:
NR=1, n=3: ((NR-1 = 0) %n = 0) +1 = 1 -> buf[1] = 1
NR=2, n=3: ((NR-1 = 1) %n = 1) +1 = 2 -> buf[2] = 2
NR=3, n=3: ((NR-1 = 2) %n = 2) +1 = 3 -> buf[3] = 3
NR=4, n=3: ((NR-1 = 3) %n = 0) +1 = 1 -> buf[1] = 4
NR=5, n=3: ((NR-1 = 4) %n = 1) +1 = 2 -> buf[2] = 5
NR=6, n=3: ((NR-1 = 5) %n = 2) +1 = 3 -> buf[3] = 6
NR=7, n=3: ((NR-1 = 6) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, n=3: ((NR-1 = 7) %n = 1) +1 = 2 -> buf[2] = 8

Accessing array by index ((NR+i-1)%n)+1:
NR=8, i=1, n=3: (((NR+i = 9) - 1 = 8) %n = 2) +1 = 3 -> buf[3] = 6
NR=8, i=2, n=3: (((NR+i = 10) - 1 = 9) %n = 0) +1 = 1 -> buf[1] = 7
NR=8, i=3, n=3: (((NR+i = 11) - 1 = 10) %n = 1) +1 = 2 -> buf[2] = 8
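For comparison with the Python answers elsewhere on this page, the same rotating-buffer idea can be sketched in Python, where collections.deque with a maxlen does the modulus bookkeeping for you. This is an illustration rather than a translation of the awk script; first_and_last is a hypothetical name.

```python
from collections import deque
from itertools import islice


def first_and_last(lines, n):
    """Return (first n items, last n items), reading the input once.

    Hypothetical helper. As in the awk one-liner, the first n lines are
    consumed before buffering starts, so the tail never repeats them.
    """
    it = iter(lines)
    head = list(islice(it, n))
    # a deque with maxlen=n keeps only the n most recent items
    tail = list(deque(it, maxlen=n))
    return head, tail
```

An open file object can be passed as lines. One difference from the awk version: when the input has fewer than 2n lines, the tail here is simply shorter.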

Read first N lines using readlines

from itertools import islice
with open("file.txt") as myfile:
    k = list(islice(myfile, n))
print(k)

or

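import numpy as np

with open('file.txt') as w:
    # assumes every line holds a single number; np.float was removed
    # from recent NumPy releases, so use the built-in float
    k = np.asarray(w.readlines(), dtype=float)
k = k[:n]  # the first n values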
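The readlines() version reads and converts the entire file before slicing. If only the first n numbers are needed, a sketch like the following converts just those lines; it assumes one number per line, and first_n_floats is a hypothetical name.

```python
from itertools import islice


def first_n_floats(file_obj, n):
    """Parse the first n lines of an open file as floats (hypothetical helper)."""
    # float() ignores surrounding whitespace, including the trailing newline
    return [float(line) for line in islice(file_obj, n)]
```

The resulting list can be handed to np.asarray afterwards if an array is required.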

Reading first N lines from a text file in C

With

if (i==0){
    some_variable = ((int) atoi(line_string));
    i++;
}
if (i==1){

You'll enter both ifs the first time round: the first block increments i from 0 to 1, so the second condition is already true in the same pass. You need an else so that the second branch is skipped whenever the first one ran:

if (i==0){
    some_variable = ((int) atoi(line_string));
    i++;
}
else if (i==1){

How can I view only the first n lines of the file?

Did you try the man page for head?

head -n 10 filename

