write() versus writelines() and concatenated strings

  • writelines() expects an iterable of strings.
  • write() expects a single string.

line1 + "\n" + line2 merges those strings together into a single string before passing it to write.

Note that if you have many lines, you may want to use "\n".join(list_of_lines).
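Here is a minimal sketch of the three options. io.StringIO stands in for a real text file, and the names line1, line2 and list_of_lines are only illustrative:

```python
import io

line1, line2 = "first", "second"
list_of_lines = ["alpha", "beta", "gamma"]

f = io.StringIO()  # in-memory stand-in for an open text file

# write(): one string. Concatenate first and add the newlines yourself.
f.write(line1 + "\n" + line2 + "\n")

# writelines(): an iterable of strings, written back to back with no separators added.
f.writelines([line1 + "\n", line2 + "\n"])

# "\n".join() puts newlines only between items, so the final one is added explicitly.
f.write("\n".join(list_of_lines) + "\n")

print(f.getvalue())
```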

Python writelines() and write() huge time difference

file.writelines() expects an iterable of strings. It then loops over the iterable and calls file.write() for each string. Written in Python, the method does essentially this:

def writelines(self, lines):
    for line in lines:
        self.write(line)

You are passing in a single large string, and a string is an iterable of strings too. When iterating you get individual characters, strings of length 1. So in effect you are making len(data) separate calls to file.write(). And that is slow, because you are building up a write buffer a single character at a time.

Don't pass in a single string to file.writelines(). Pass in a list or tuple or other iterable instead.
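The following sketch makes the per-character calls visible by counting how often write() is invoked. CountingStringIO is a hypothetical helper. The count depends on CPython's io.StringIO inheriting a writelines() that calls self.write() once per item, so treat this as an illustration and not as a guarantee for every Python implementation:

```python
import io

class CountingStringIO(io.StringIO):
    """Hypothetical helper: an in-memory text file that counts write() calls."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, s):
        self.write_calls += 1
        return super().write(s)

data = "line one\nline two\n"

# One big string: writelines() iterates it character by character.
slow = CountingStringIO()
slow.writelines(data)

# A list of lines: one write() call per line.
fast = CountingStringIO()
fast.writelines(data.splitlines(keepends=True))

print(slow.getvalue() == fast.getvalue())  # True: the text is identical
print(slow.write_calls, fast.write_calls)  # 18 2
```

The file contents are identical in both cases. Only the number of calls changes, and that number is what makes the single-string version slow.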

You could send in individual lines with added newline in a generator expression, for example:

 myWrite.writelines(line + '\n' for line in new_my_list)

Now, if you could make clean_data() a generator, yielding cleaned lines, you could stream data from the input file, through your data cleaning generator, and out to the output file without using any more memory than is required for the read and write buffers and however much state is needed to clean your lines:

# Plain 'r' and 'w' are enough here; the '+' modes also open each file for the other direction.
with open(inputPath, 'r') as myRead, open(outPath, 'w') as myWrite:
    myWrite.writelines(line + '\n' for line in clean_data(myRead))

In addition, I'd consider updating clean_data() to emit lines with newlines included.
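Below is a sketch of clean_data() as a generator that already emits newlines. The cleaning rule (strip surrounding whitespace, drop blank lines) is a hypothetical placeholder for whatever the real function does. io.StringIO objects stand in for the input and output files:

```python
import io

def clean_data(lines):
    """Hypothetical cleaner: strips whitespace, skips blank lines,
    and yields each cleaned line with its newline already attached."""
    for line in lines:
        cleaned = line.strip()
        if cleaned:
            yield cleaned + '\n'

source = io.StringIO("  alpha \n\n beta\n   \ngamma")  # stand-in for myRead
sink = io.StringIO()                                  # stand-in for myWrite

# Lines flow from source, through the generator, into sink one at a time.
sink.writelines(clean_data(source))
print(repr(sink.getvalue()))  # 'alpha\nbeta\ngamma\n'
```

With real files, the call becomes simply myWrite.writelines(clean_data(myRead)), because the generator already supplies the newlines.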

TypeError: writelines() requires an iterable argument

You are using writelines() but passing in one item at a time; file.writelines() expects an iterable (something producing a sequence of 0 or more values) instead.

There is no file.writeline() method; use file.write() instead, once per string:

caving.write(str(item[0]))  # write() only accepts strings, so convert numbers first
caving.write('\t')
caving.write(str(item[1]))
caving.write('\n')

If you are writing a tab-separated file, you might want to use the csv module instead:

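import csv

import numpy as np
from sklearn import preprocessing

def normalize(dataset):
    # The first two columns are the features; the third is the label.
    twoCol = [item[:2] for item in dataset]
    labels = [item[2] for item in dataset]
    twoColData = preprocessing.scale(np.asarray(twoCol, dtype=np.float64))

    # Python 3: open csv output in text mode with newline='' (Python 2 used 'wb').
    # lineterminator='\n' matches the manual write() version (csv defaults to '\r\n').
    with open('/home/nima/Desktop/ramin-ML-Project/caving.txt', 'w', newline='') as caving:
        writer = csv.writer(caving, delimiter='\t', lineterminator='\n')

        # zip() replaces Python 2's itertools.izip().
        for data, label in zip(twoColData, labels):
            if label == 'caving':
                writer.writerow(data)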

This produces the same output, but with less hassle.
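To check the "same output" claim on a small scale, this sketch runs both approaches on hypothetical rows that are already strings. lineterminator='\n' is set explicitly because csv.writer ends rows with '\r\n' by default:

```python
import csv
import io

# Hypothetical (feature1, feature2) rows, already converted to strings.
rows = [("0.5", "-1.2"), ("1.7", "0.3")]

# Manual version: four write() calls per row.
manual = io.StringIO()
for item in rows:
    manual.write(item[0])
    manual.write('\t')
    manual.write(item[1])
    manual.write('\n')

# csv version: the writer inserts the tabs and the line endings.
via_csv = io.StringIO()
writer = csv.writer(via_csv, delimiter='\t', lineterminator='\n')
for item in rows:
    writer.writerow(item)

print(manual.getvalue() == via_csv.getvalue())  # True
```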

Multiple lines in file appearing as only one line

writelines doesn't add separators between the lines, so you have to add them yourself:

def saveall(sname, senemyname, scheckpoint):
    # The with block closes the file automatically, even if an error occurs.
    with open("savefile.sav", "w") as file:
        file.writelines(line + '\n' for line in [sname, senemyname, scheckpoint])

saveall("John", "Steve", "Crossroads")

File content:

John
Steve
Crossroads

Correct way to write line to file?

This should be as simple as:

with open('somefile.txt', 'a') as the_file:
    the_file.write('Hello\n')

From the documentation:

Do not use os.linesep as a line terminator when writing files opened in text mode (the default); use a single '\n' instead, on all platforms.

Some useful reading:

  • The with statement
  • open()
    • 'a' to append
    • 'w' to write, truncating the file first
  • os (particularly os.linesep)
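The difference between 'w' and 'a' can be seen with a throwaway file. This is a minimal sketch. The temporary path and the read_text() helper exist only for illustration:

```python
import os
import tempfile

def read_text(path):
    """Hypothetical helper: return the whole file as a string."""
    with open(path) as f:
        return f.read()

# Throwaway file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'somefile.txt')

with open(path, 'w') as f:   # 'w' creates the file, or truncates an existing one
    f.write('first\n')       # plain '\n'; text mode translates it where needed

with open(path, 'a') as f:   # 'a' keeps existing content and writes at the end
    f.write('second\n')
after_append = read_text(path)    # 'first\nsecond\n'

with open(path, 'w') as f:   # 'w' again discards everything written so far
    f.write('third\n')
after_truncate = read_text(path)  # 'third\n'
```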

Is it better to Stream.Write multiple times or concatenate strings and Stream.Write once?

There is no single answer to this. You can imagine that writing byte by byte, or character by character, usually causes overhead because each chunk of data travels through the layers of abstraction.

However, you can also imagine that buffering as much as possible may not be optimal. For example, if you send data over a network stream, you want the network to start transmitting as soon as possible. While your application is busy buffering, nothing is being sent, so you may just be moving the delay around instead of removing it.

In the case of a FileStream, the operating system takes care of buffering, so in normal circumstances you probably won't notice any difference between your two approaches.

Just write the data in whatever way best fits your application. If you find that this is a bottleneck, put a buffered stream layer (such as System.IO.BufferedStream) between the StreamWriter and the underlying Stream to counter the problem.
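Python's io module uses the same layering: io.BufferedWriter sits between your code and a raw stream and combines small writes into larger ones. In this sketch, CountingRaw is a hypothetical raw stream that records each write reaching the "device." The exact number of raw writes is a CPython implementation detail, so read the count as illustrative:

```python
import io

class CountingRaw(io.RawIOBase):
    """Hypothetical raw sink: records every write that reaches it."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b
        return len(b)  # report that every byte was accepted

raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=1024)

for _ in range(100):
    buffered.write(b'chunk')  # 100 small writes of 5 bytes each

buffered.flush()
print(raw.calls, len(raw.data))  # 1 500
```

The 500 bytes fit in the 1024-byte buffer, so they reach the raw stream in a single write when the buffer is flushed.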

Writelines writes lines without newline, Just fills the file

This is actually a pretty common problem for newcomers to Python—especially since, across the standard library and popular third-party libraries, some reading functions strip out newlines, but almost no writing functions (except the log-related stuff) add them.

So, there's a lot of Python code out there that does things like:

fw.write('\n'.join(line_list) + '\n')

(writing a single string) or

fw.writelines(line + '\n' for line in line_list)

Either one is correct, and of course you could even write your own writelinesWithNewlines function that wraps it up…

But you should only do this if you can't avoid it.
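If you do need such a wrapper, a minimal version might look like this. writelines_with_newlines is a hypothetical name, and io.StringIO stands in for fw:

```python
import io

def writelines_with_newlines(f, lines):
    """Hypothetical helper: like f.writelines(), but ends every line with a newline."""
    f.writelines(line + '\n' for line in lines)

line_list = ['a', 'b', 'c']

joined = io.StringIO()
joined.write('\n'.join(line_list) + '\n')      # idiom 1: one big string

helper = io.StringIO()
writelines_with_newlines(helper, line_list)    # idiom 2, wrapped up

print(joined.getvalue() == helper.getvalue())  # True
```

One edge case differs: for an empty line_list, the join idiom still writes a single '\n', while the generator idiom writes nothing.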

It's better if you can create/keep the newlines in the first place—as in Greg Hewgill's suggestions:

line_list.append(new_line + "\n")

And it's even better if you can work at a higher level than raw lines of text, e.g., by using the csv module in the standard library, as esuaro suggests.

For example, right after defining fw, you might do this:

cw = csv.writer(fw, delimiter='|')

Then, instead of this:

new_line = d[looking_for]+'|'+'|'.join(columns[1:])
line_list.append(new_line)

You do this:

row_list.append([d[looking_for]] + columns[1:])

And at the end, instead of this:

fw.writelines(line_list)

You do this:

cw.writerows(row_list)
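Put together, the csv version looks roughly like this. The values of d, looking_for and columns are hypothetical stand-ins for the question's data, and io.StringIO replaces the real fw. lineterminator='\n' is passed because csv.writer defaults to '\r\n':

```python
import csv
import io

# Hypothetical inputs standing in for the question's d, looking_for and columns.
d = {'key1': 'Alice'}
looking_for = 'key1'
columns = ['key1', '42', 'x']

fw = io.StringIO()  # stands in for the real output file
cw = csv.writer(fw, delimiter='|', lineterminator='\n')

row_list = []
row_list.append([d[looking_for]] + columns[1:])  # each row is a list of fields

cw.writerows(row_list)  # the writer adds the delimiters and line endings
print(repr(fw.getvalue()))  # 'Alice|42|x\n'
```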

Finally, your design is "open a file, then build up a list of lines to add to the file, then write them all at once". If you're going to open the file up top, why not just write the lines one by one? Whether you're using simple writes or a csv.writer, it'll make your life simpler, and your code easier to read. (Sometimes there can be simplicity, efficiency, or correctness reasons to write a file all at once—but once you've moved the open all the way to the opposite end of the program from the write, you've pretty much lost any benefits of all-at-once.)
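Writing as you go could look like the sketch below. The records list is hypothetical, and io.StringIO stands in for the file opened at the top of the program:

```python
import io

# Hypothetical (key, value) pairs produced while the program runs.
records = [('Alice', '42'), ('Bob', '7')]

fw = io.StringIO()  # stands in for the file opened at the top
for key, value in records:
    # Each line is written as soon as it is ready, newline included,
    # so no list of pending lines has to be kept around.
    fw.write(key + '|' + value + '\n')

print(fw.getvalue(), end='')
```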

Python2 writelines() called with string

From the documentation:

writelines(lines)

Write a list of lines to the stream. Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.

Example:

with open('test.txt', 'w') as f:
    f.writelines(['a\n', 'a\n', 'a\n'])

Output:

a
a
a

Please note that the line separator \n is part of each string.
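Two contrasting calls show what happens otherwise. Leave out the separators and everything runs together. Pass a single string (as in the question title) and it still works, but writelines() iterates it one character at a time. io.StringIO is used here in place of a real file:

```python
import io

no_sep = io.StringIO()
no_sep.writelines(['a', 'a', 'a'])  # no '\n' in the strings
print(repr(no_sep.getvalue()))      # 'aaa'

from_str = io.StringIO()
from_str.writelines('a\na\na\n')    # a str is iterated character by character
print(repr(from_str.getvalue()))    # 'a\na\na\n'
```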