Character Reading from File in Python

How to read a single character at a time from a file in Python?

with open(filename) as f:
    while True:
        c = f.read(1)  # returns '' at end of file
        if not c:
            print("End of file")
            break
        print("Read a character:", c)
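The same loop can also be written with the two-argument form of iter(), which keeps calling f.read(1) until it returns the empty string. This is only a sketch: the helper name read_chars, the UTF-8 encoding, and the throwaway sample file are assumptions made for illustration.

```python
import os
import tempfile
from functools import partial


def read_chars(path, encoding="utf-8"):
    """Yield the contents of a text file one character at a time."""
    with open(path, encoding=encoding) as f:
        # iter(callable, sentinel) calls f.read(1) until it returns ''
        for c in iter(partial(f.read, 1), ""):
            yield c


# Demo on a temporary file (path and contents are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as out:
        out.write("héllo")
    chars = list(read_chars(sample_path))

print(chars)
```

Because the file is opened in text mode, each item is one decoded character, not one byte, so 'é' comes back as a single item.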

Character reading from file in Python

Ref: http://docs.python.org/howto/unicode

Reading Unicode from a file is therefore simple:

import codecs
with codecs.open('unicode.rst', encoding='utf-8') as f:
    for line in f:
        print(repr(line))

It's also possible to open files in update mode, allowing both reading and writing:

with codecs.open('test', encoding='utf-8', mode='w+') as f:
    f.write(u'\u4500 blah blah blah\n')
    f.seek(0)
    print(repr(f.readline()[:1]))

EDIT: I'm assuming that your intended goal is just to be able to read the file properly into a string in Python. If you're trying to convert to an ASCII string from Unicode, then there's really no direct way to do so, since the Unicode characters won't necessarily exist in ASCII.

If you're trying to convert to an ASCII string, try one of the following:

  1. Replace the specific unicode chars with ASCII equivalents, if you are only looking to handle a few special cases such as this particular example

  2. Use the unicodedata module's normalize() and the string.encode() method to convert as best you can to the next closest ASCII equivalent (Ref https://web.archive.org/web/20090228203858/http://techxplorer.com/2006/07/18/converting-unicode-to-ascii-using-python):

    >>> teststr
    u'I don\xe2\x80\x98t like this'
    >>> unicodedata.normalize('NFKD', teststr).encode('ascii', 'ignore')
    'I donat like this'
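For Python 3, where str is already Unicode, the two options above might look roughly like this. The REPLACEMENTS table, the function names, and the sample sentence are assumptions chosen for illustration, not part of the referenced article.

```python
import unicodedata

# Option 1: a hypothetical table of typographic characters and the
# ASCII stand-ins chosen for them.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
}


def replace_known(text):
    """Swap the characters listed in REPLACEMENTS for ASCII equivalents."""
    return text.translate(str.maketrans(REPLACEMENTS))


def to_ascii(text):
    """Option 2: decompose with NFKD, then drop whatever is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


sample = "I don\u2019t like caf\u00e9s \u2013 sorry"
step1 = replace_known(sample)   # quotes and dash replaced, accent kept
step2 = to_ascii(step1)         # accent stripped from the 'e'
print(step1)
print(step2)
```

Running option 1 first matters here: normalize() alone cannot map a curly quote or an en dash to ASCII, so encode(..., 'ignore') would silently drop them.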

Python : read text file character by character in loop

I'd approach this differently and write a function that takes a filename and returns a generator:

def reader(filename):
    with open(filename) as f:
        while True:
            # read next character
            char = f.read(1)
            # if not EOF, then at least 1 character was read, and
            # this is not empty
            if char:
                yield char
            else:
                return

Then you only need to give the filename once:

r = reader('filename')

The file stays open between reads, which is much faster than reopening it for every character. To fetch the next character, use the built-in next() function:

print(next(r))  # character at index 0
print(next(r))  # character at index 1
...

You can also use itertools functions such as islice on this object to slice out characters, or use it in a for loop:

# skip characters until newline
for c in r:
    if c == '\n':
        break
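As a rough illustration of combining islice with the for loop, the sketch below repeats the reader generator so that it runs on its own. The temporary file, its contents, and the explicit UTF-8 encoding are assumptions made for the demo.

```python
import os
import tempfile
from itertools import islice


def reader(filename):
    """Yield one character at a time until end of file."""
    with open(filename, encoding="utf-8") as f:
        while True:
            char = f.read(1)
            if not char:
                return
            yield char


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w", encoding="utf-8") as out:
        out.write("abcdef\nxyz")

    r = reader(path)
    # islice takes the next three characters and leaves the rest in r
    first_three = "".join(islice(r, 3))
    # skip characters until newline
    for c in r:
        if c == "\n":
            break
    # everything after the newline
    rest = "".join(r)

print(first_three, rest)
```

The generator remembers its position, so each step continues where the previous one stopped instead of starting again from the beginning of the file.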

Python not able to read – character from text file

In open(), the default encoding is platform dependent. According to the documentation, it is whatever locale.getpreferredencoding() returns, so you can call that function to find the default for your system.

For the second part of your question: since you do not get an error when you leave out utf-8 as the encoding, you could simply pass the value returned by locale.getpreferredencoding() as the encoding.
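A small sketch of both points: checking the platform default, and passing an explicit encoding so that a character such as the en dash round-trips regardless of that default. The file name and sample text are invented for the example.

```python
import locale
import os
import tempfile

# What open() falls back to when no encoding argument is given
default_encoding = locale.getpreferredencoding()
print("Default encoding:", default_encoding)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dash.txt")

    # Write a string containing an en dash (U+2013) as UTF-8
    with open(path, "w", encoding="utf-8") as out:
        out.write("2010\u20132020")

    # Read it back with the same explicit encoding
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # The raw bytes show how the dash is stored on disk
    with open(path, "rb") as f:
        raw = f.read()

print(repr(text))
print(raw)
```

If the file were read with a different encoding than the one used to write it, the three bytes of the dash could be decoded as other characters or raise UnicodeDecodeError, which is why stating the encoding explicitly is usually safer than relying on the default.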

Read file up to a character

This is still far from optimal, but it would be a pure-Python implementation of a very simple buffer:

def my_open(filename, char):
    with open(filename) as f:
        old_fb = ""  # leftover text after the last delimiter seen
        # read 1024 characters at a time until read() returns ''
        for file_buffer in iter(lambda: f.read(1024), ''):
            if old_fb:
                file_buffer = old_fb + file_buffer
            pos = file_buffer.find(char)
            while pos != -1 and file_buffer:
                yield file_buffer[:pos]
                file_buffer = file_buffer[pos+1:]
                pos = file_buffer.find(char)
            old_fb = file_buffer
        # whatever follows the final delimiter
        yield old_fb

# Usage:
for line in my_open("weirdfile", "~"):
    print(line)

Read a specific line or character from a text file, not recognizing the text

Your code works, but it cannot recognize the characters because readlines() keeps the trailing newline on each line, so it reads 'x\n' rather than 'x' and there is no literal match. Replace .readlines() with .read().splitlines() to solve this.
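The difference is easy to see side by side. In this sketch an io.StringIO object stands in for the open text file, and the three-line content is made up for illustration.

```python
import io

# io.StringIO behaves like a text file opened for reading
fake_file = io.StringIO("x\ny\nz\n")

with_newlines = fake_file.readlines()              # newline kept on each item
fake_file.seek(0)                                  # rewind before reading again
without_newlines = fake_file.read().splitlines()   # newlines stripped

print(with_newlines)
print(without_newlines)
print("x" in with_newlines, "x" in without_newlines)
```

An alternative that keeps readlines() is to call line.rstrip('\n') on each line before comparing.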
