Reading a Text File and Splitting It into Single Words in Python

Given this file:

$ cat words.txt
line1 word1 word2
line2 word3 word4
line3 word5 word6

If you just want one word at a time (ignoring the meaning of spaces vs line breaks in the file):

with open('words.txt', 'r') as f:
    for line in f:
        for word in line.split():
            print(word)

Prints:

line1
word1
word2
line2
...
word6

Similarly, if you want to flatten the file into a single list of words, you might do something like this:

with open('words.txt') as f:
    flat_list = [word for line in f for word in line.split()]

>>> flat_list
['line1', 'word1', 'word2', 'line2', 'word3', 'word4', 'line3', 'word5', 'word6']

This list produces the same output as the first example with print('\n'.join(flat_list)).
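If you would rather not nest two for clauses in one comprehension, the standard library's itertools.chain.from_iterable can do the flattening instead. A minimal sketch, using io.StringIO as a stand-in for the opened words.txt so it runs on its own:

```python
import io
from itertools import chain

# io.StringIO stands in for open('words.txt') so the example is self-contained
f = io.StringIO("line1 word1 word2\nline2 word3 word4\nline3 word5 word6\n")

# each line.split() yields a list; chain.from_iterable concatenates them lazily
flat_list = list(chain.from_iterable(line.split() for line in f))
print(flat_list)
```

The result is the same flat list as the comprehension gives.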

Or, if you want a nested list of the words in each line of the file (for example, to create a matrix of rows and columns from a file):

with open('words.txt') as f:
    matrix = [line.split() for line in f]

>>> matrix
[['line1', 'word1', 'word2'], ['line2', 'word3', 'word4'], ['line3', 'word5', 'word6']]
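Once you have the rows, zip(*matrix) regroups them into columns. A minimal sketch, again with io.StringIO in place of the real file; note that zip stops at the shortest row, so words past that length in ragged lines are dropped:

```python
import io

# io.StringIO stands in for open('words.txt')
f = io.StringIO("line1 word1 word2\nline2 word3 word4\nline3 word5 word6\n")
matrix = [line.split() for line in f]

# zip(*matrix) pairs up the i-th item of every row, i.e. it transposes rows to columns
columns = [list(col) for col in zip(*matrix)]
print(columns[1])  # ['word1', 'word3', 'word5']
```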

If you want a regex solution, which lets you keep the wordN words and skip the lineN words in the example file:

import re

with open("words.txt") as f:
    for line in f:
        for word in re.findall(r'\bword\d+', line):
            print(word)  # wordN by wordN, with no lineN

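Or, if you want that as a lazy, line-by-line generator with a regex (the generator must be consumed while the file is still open):

import re
with open("words.txt") as f:
    words = (word for line in f for word in re.findall(r'\w+', line))
    print(next(words))  # 'line1'; consume the generator before the with block closes the file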
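The two patterns used above behave differently: r'\w+' matches every run of word characters, while r'\bword\d+' keeps only tokens that start with "word" followed by digits. A small sketch comparing them on one line of the sample file:

```python
import re

line = "line1 word1 word2"

# \w+ matches every run of letters, digits, and underscores
all_words = re.findall(r'\w+', line)
# \bword\d+ matches "word" plus one or more digits, starting at a word boundary
only_words = re.findall(r'\bword\d+', line)

print(all_words)   # ['line1', 'word1', 'word2']
print(only_words)  # ['word1', 'word2']
```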

Split text file into lines by key word python

When you get the content of a .txt file like this...

with open("file.txt", 'r') as file:
    content = file.read()

...you have it as a single string, so you can split it with the str.split() method:

content = content.split(my_keyword)

You can do it with a function:

def splitter(path: str, keyword: str) -> list[str]:
    with open(path, 'r') as file:
        content = file.read()
    return content.split(keyword)

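which you can call this way, for example when file.txt contains just the text "I really like to write the word data, because I think it has a lot of meaning.":

>>> splitter("file.txt", "data")
['I really like to write the word ', ', because I think it has a lot of meaning.']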
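str.split drops the keyword itself, as the output above shows. If you need to keep it, re.split with a capturing group includes each match in the result. A minimal sketch; the helper name split_keep_keyword is hypothetical:

```python
import re

def split_keep_keyword(text: str, keyword: str) -> list[str]:
    # re.escape makes the keyword match literally; the capturing group
    # tells re.split to include each matched keyword in the result list
    return re.split(f"({re.escape(keyword)})", text)

text = "I really like to write the word data, because I think it has a lot of meaning."
parts = split_keep_keyword(text, "data")
print(parts)
# ['I really like to write the word ', 'data', ', because I think it has a lot of meaning.']
```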

How to split a text file to its words in python?

Nobody has suggested a generator, which surprises me. Here's how I would do it:

def words(stringIterable):
    # upcast the argument to an iterator; if it's already an iterator, it stays the same
    lineStream = iter(stringIterable)
    for line in lineStream:  # enumerate the lines
        for word in line.split():  # further break them down
            yield word

Now this can be used on simple lists of sentences that you might already have in memory:

listOfLines = ['hi there', 'how are you']
for word in words(listOfLines):
    print(word)

But it works just as well on a file, without reading the whole file into memory (here the script reads its own source, words.py):

with open('words.py', 'r') as myself:
    for word in words(myself):
        print(word)
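Because words() is lazy, you can stop early and the rest of the file is never read. A minimal sketch using itertools.islice and a condensed version of the words() generator above, with io.StringIO standing in for a large file:

```python
import io
from itertools import islice

def words(stringIterable):
    # condensed version of the generator above: yield each whitespace-separated word
    for line in stringIterable:
        yield from line.split()

# io.StringIO stands in for a large open file
f = io.StringIO("hi there\nhow are you\nthis line is never reached\n")
first_four = list(islice(words(f), 4))
print(first_four)  # ['hi', 'there', 'how', 'are']
```

Only the first two lines are consumed from f; the third is still unread when islice stops.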

splitting a text file into words using regex in python

You can use re.findall with the following regex pattern instead, to find all runs of two or more ASCII letters.

Change:

message = print(re.split(',.-\d\c\s',text))

to:

message = re.findall(r'[A-Za-z]{2,}', text)
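Note that [A-Za-z] only covers ASCII letters, so accented words get cut apart. A small sketch comparing it with [^\W\d_], a common idiom for "any letter" in Python 3 str patterns (a word character that is not a digit or underscore):

```python
import re

text = "I can't go-to 3 places, OK? Café naïve"

# ASCII letters only, two or more in a row
ascii_words = re.findall(r'[A-Za-z]{2,}', text)
# [^\W\d_] excludes non-word chars, digits and underscore, leaving letters
# (including non-ASCII ones such as é and ï)
unicode_words = re.findall(r'[^\W\d_]{2,}', text)

print(ascii_words)    # ['can', 'go', 'to', 'places', 'OK', 'Caf', 'na', 've']
print(unicode_words)  # ['can', 'go', 'to', 'places', 'OK', 'Café', 'naïve']
```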

Python - Split content of text file when specific string is found

Thanks to help from dantechguy and Thomas Weller, here is the solution to my problem:

# runs inside an async discord.py handler, where `info` and `message` are already defined
with open(info["textfile"], 'r') as file:  # `with` closes the file automatically when done
    # read the text file, trim surrounding whitespace, and split it into a list of strings at each dashed separator
    msg = file.read().strip().split("--------------------")
for item in msg:  # loop over each chunk
    print(item)  # print to double-check the output
    await message.author.send(item)  # send each chunk as a new Discord message

As they explained in their comments, all that was needed was to split the string on the dashed separator line, which produces a list of strings, and then send each item as a separate message.
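Splitting this way leaves the newlines around each separator attached to the chunks, and two separators in a row produce an empty string, which Discord may refuse to send. A minimal sketch that trims each chunk and drops empty ones; the helper name split_on_separator and the shorter sample separator are hypothetical:

```python
def split_on_separator(text: str, separator: str) -> list[str]:
    # split at every separator, strip whitespace/newlines around each chunk,
    # and skip chunks that are empty after stripping
    return [chunk.strip() for chunk in text.split(separator) if chunk.strip()]

sample = "first message\n----------\nsecond message\n----------\n\n----------\nthird"
chunks = split_on_separator(sample, "----------")
print(chunks)  # ['first message', 'second message', 'third']
```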

In Python, split a file into lines, and then print only ones starting with

In your loop, readlines() has already stored every line in the list lines, and the loop assigns each line in turn to the variable x. That is why you can use x directly inside the loop to get the current line.

with open('test.txt', 'r') as text:
    lines = text.readlines()
for x in lines:
    if x.startswith('result'):
        print(x, end='')  # x already ends with '\n'
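For large files you can skip readlines() and iterate the file object directly, so only one line is held in memory at a time. A minimal sketch; the generator name lines_starting_with is hypothetical, and io.StringIO stands in for test.txt:

```python
import io

def lines_starting_with(f, prefix):
    # iterate the file lazily; str.startswith also accepts a tuple of prefixes
    for line in f:
        if line.startswith(prefix):
            yield line.rstrip('\n')  # drop the line terminator

# io.StringIO stands in for open('test.txt', 'r')
f = io.StringIO("result: 1\nnoise\nresult: 2\n")
matches = list(lines_starting_with(f, 'result'))
print(matches)  # ['result: 1', 'result: 2']
```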

How can I split a text into sentences?

The Natural Language Toolkit (nltk.org) has what you need. This group posting indicates that the following does it:

import nltk.data

# assumes the Punkt tokenizer data has been installed with nltk.download()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
with open("test.txt") as fp:
    data = fp.read()
print('\n-----\n'.join(tokenizer.tokenize(data)))

(I haven't tried it!)
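For comparison, here is a crude standard-library heuristic (not NLTK, and not the Punkt algorithm) that splits after ., ! or ? followed by whitespace. It shows why a trained tokenizer is worth using: abbreviations break it. The function name naive_sentences is hypothetical:

```python
import re

def naive_sentences(text: str) -> list[str]:
    # split wherever whitespace follows ., ! or ?; knows nothing about abbreviations
    return re.split(r'(?<=[.!?])\s+', text.strip())

good = naive_sentences("Hello there. How are you? Fine!")
bad = naive_sentences("Mr. Smith arrived.")
print(good)  # ['Hello there.', 'How are you?', 'Fine!']
print(bad)   # ['Mr.', 'Smith arrived.']  (the abbreviation is split off wrongly)
```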

Python: Splitting words in text file into limited 40 characters AND filling the extra slot with spaces

Add the following:

def pad(line, limit):
    # right-pad line with spaces up to limit characters
    return line + " " * (limit - len(line))

def split_string(text, limit, sep=" "):
    words = text.split()
    if not words:  # empty input: nothing to wrap
        return []
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    res = []
    part = words[0]
    for word in words[1:]:
        # start a new line if appending sep + word would exceed limit
        if len(sep) + len(word) > limit - len(part):
            res.append(part)
            part = word
        else:
            part += sep + word
    if part:
        res.append(part)
    return [pad(l, limit) for l in res]
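The standard library's textwrap module implements a similar greedy wrap. A minimal sketch combining textwrap.wrap with str.ljust (which pads like pad() above); the helper name wrap_and_pad is hypothetical. Unlike the answer above, it does not raise on over-long words (break_long_words=False leaves them intact), and by default textwrap also breaks on hyphens, so results can differ for some inputs:

```python
import textwrap

def wrap_and_pad(text: str, limit: int) -> list[str]:
    # greedy wrap at whitespace into lines of at most `limit` characters
    lines = textwrap.wrap(text, width=limit, break_long_words=False)
    # left-justify each line, filling the remaining slots with spaces
    return [line.ljust(limit) for line in lines]

rows = wrap_and_pad("the quick brown fox jumps over the lazy dog", 10)
for row in rows:
    print(repr(row))
```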
