Reading a text file and splitting it into single words in python
Given this file:
$ cat words.txt
line1 word1 word2
line2 word3 word4
line3 word5 word6
If you just want one word at a time (ignoring the meaning of spaces vs line breaks in the file):
with open('words.txt', 'r') as f:
    for line in f:
        for word in line.split():
            print(word)
Prints:
line1
word1
word2
line2
...
word6
Similarly, if you want to flatten the file into a single list of words, you might do something like this:
with open('words.txt') as f:
    flat_list = [word for line in f for word in line.split()]
>>> flat_list
['line1', 'word1', 'word2', 'line2', 'word3', 'word4', 'line3', 'word5', 'word6']
Which can create the same output as the first example with print('\n'.join(flat_list))
...
Or, if you want a nested list of the words in each line of the file (for example, to create a matrix of rows and columns from a file):
with open('words.txt') as f:
    matrix = [line.split() for line in f]
>>> matrix
[['line1', 'word1', 'word2'], ['line2', 'word3', 'word4'], ['line3', 'word5', 'word6']]
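Once the file is in this nested form, the columns can be recovered by transposing the rows. This is only a minimal sketch: it assumes every line has the same number of words (zip stops at the shortest row), the sample rows are copied from the output above, and the names columns and second_column are made up for illustration.

```python
# Sample rows copied from the matrix output above
matrix = [
    ['line1', 'word1', 'word2'],
    ['line2', 'word3', 'word4'],
    ['line3', 'word5', 'word6'],
]

# zip(*matrix) groups the i-th item of every row, which gives the columns
columns = [list(col) for col in zip(*matrix)]

# Hypothetical name: the second word of every line
second_column = columns[1]
print(second_column)
```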
If you want a regex solution, which would allow you to filter wordN vs lineN type words in the example file:
import re
with open("words.txt") as f:
    for line in f:
        for word in re.findall(r'\bword\d+', line):
            # wordN by wordN with no lineN
            print(word)
Or, if you want that to be a line by line generator with a regex:
with open("words.txt") as f:
    words = (word for line in f for word in re.findall(r'\w+', line))
    for word in words:  # consume the generator while the file is still open
        print(word)
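To try the wordN filter without touching the disk, the file can be replaced by an in-memory stand-in. This is a rough sketch: io.StringIO plays the role of the open file, its content is copied from words.txt above, and only_words is a name chosen just for this example.

```python
import io
import re

# In-memory stand-in for open("words.txt"); same content as the sample file
f = io.StringIO("line1 word1 word2\nline2 word3 word4\nline3 word5 word6\n")

# Keep only tokens that look like wordN; lineN tokens do not match the pattern
only_words = [w for line in f for w in re.findall(r'\bword\d+', line)]
print(only_words)
```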
Split text file into lines by key word python
When you get the content of a .txt file like this...
with open("file.txt", 'r') as file:
    content = file.read()
...you have it as a string, so you can split it with the function str.split():
content = content.split(my_keyword)
You can do it with a function (note that str.split() returns a list of strings, not a single string):
def splitter(path: str, keyword: str) -> list[str]:
    with open(path, 'r') as file:
        content = file.read()
    return content.split(keyword)
that you can call this way:
>>> splitter("file.txt", "data")
["I really like to write the word ", ", because I think it has a lot of meaning."]
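As the output above shows, str.split() drops the keyword itself. If the keyword has to be kept, or only the first occurrence should split the text, the built-in str.partition() and the maxsplit argument may help. A small sketch on an in-memory string (the sample sentences are assumptions based on the output above, not the content of a real file):

```python
text = "I really like to write the word data, because I think it has a lot of meaning."

# split() removes the keyword from the result
parts = text.split("data")

# partition() splits on the first occurrence only and keeps the keyword
before, keyword, after = text.partition("data")

# maxsplit=1 also stops after the first occurrence, but still drops the keyword
first_only = "a data b data c".split("data", 1)

print(parts)
print(first_only)
```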
How to split a text file to its words in python?
Nobody has suggested a generator, I'm surprised. Here's how I would do it:
def words(stringIterable):
    # upcast the argument to an iterator; if it's an iterator already, it stays the same
    lineStream = iter(stringIterable)
    for line in lineStream:  # enumerate the lines
        for word in line.split():  # further break them down
            yield word
Now this can be used both on simple lists of sentences that you might have in memory already:
listOfLines = ['hi there', 'how are you']
for word in words(listOfLines):
    print(word)
But it will work just as well on a file, without needing to read the whole file in memory:
with open('words.py', 'r') as myself:
    for word in words(myself):
        print(word)
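Because the generator is lazy, you can also stop early and only pay for the words you actually consume. A sketch of that idea with itertools.islice; the generator is repeated here (with a snake_case parameter name) so the snippet runs on its own, and first_three is just an illustrative name.

```python
from itertools import islice

def words(string_iterable):
    # Same idea as the generator above: yield one word at a time
    for line in string_iterable:
        for word in line.split():
            yield word

lines = ['hi there', 'how are you']

# Take only the first three words; the remaining words are never produced
first_three = list(islice(words(lines), 3))
print(first_three)
```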
splitting a text file into words using regex in python
You can use re.findall with the following regex pattern instead to find all words that are more than 1 character long.
Change:
message = print(re.split(',.-\d\c\s',text))
to:
message = re.findall(r'[A-Za-z]{2,}', text)
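To see what that pattern keeps and drops, here is a quick sketch on a made-up sentence (the sample text is an assumption, not taken from the question). Single letters, digits and punctuation are all skipped.

```python
import re

# Hypothetical sample text
text = "Hello, world! I am 42 years old."

# Runs of two or more ASCII letters; "I" and "42" do not match
message = re.findall(r'[A-Za-z]{2,}', text)
print(message)
```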
Python - Split content of text file when specific string is found
Thanks to the assistance from dantechguy and Thomas Weller the solution to the problem I had is below:
with open(info["textfile"], 'r') as file:  # using with to open the file means we don't have to close it after finishing
    msg = file.read().strip().split("--------------------")  # reads the content of the text file, splits it wherever the separator is found and creates a list of strings
    for item in msg:  # for loop to call each item
        print(item)  # print to double check output
        await message.author.send(item)  # send each item as a new message in discord
As explained in their comments, all that needed to be done was to split on the dashed separator line, which turns the string into a list of strings, and then send each item as a message.
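One thing to watch for with this approach: if the file ends with a separator, the split leaves an empty string at the end of the list, and each piece keeps its surrounding newlines. A rough sketch of cleaning that up before sending, assuming the separator is 20 dashes as in the code above; the sample text and the names SEPARATOR, raw and chunks are made up for illustration.

```python
SEPARATOR = "-" * 20  # assumption: same 20-dash separator as in the code above

# Hypothetical file content that ends with a separator line
text = "first message\n" + SEPARATOR + "\nsecond message\n" + SEPARATOR + "\n"

raw = text.strip().split(SEPARATOR)

# Trim whitespace around each piece and drop pieces that end up empty
chunks = [piece.strip() for piece in raw if piece.strip()]
print(chunks)
```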
In Python, split a file into lines, and then print only ones starting with
In the loop you created, every line has already been read into the lines list, and the loop assigns each one to the variable x in turn. That's why you can use x directly inside the loop.
with open('test.txt', 'r') as text:
    lines = text.readlines()
    for x in lines:
        if x.startswith('result'):
            print(x)
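Each line read from a file still ends with its newline, so print(x) shows a blank line after every match. A small variation that strips the newline and iterates over the file directly instead of calling readlines(); io.StringIO stands in for test.txt here, and the sample content and the name matches are assumptions for the sake of the example.

```python
import io

# In-memory stand-in for open('test.txt'); the content is made up
text = io.StringIO("result: 1\nskip me\nresult: 2\n")

# Iterate line by line and keep only lines that start with 'result'
matches = [line.rstrip('\n') for line in text if line.startswith('result')]

for line in matches:
    print(line)
```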
How can I split a text into sentences?
The Natural Language Toolkit (nltk.org) has what you need. This group posting indicates this does it:
import nltk.data
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
with open("test.txt") as fp:
    data = fp.read()
print('\n-----\n'.join(tokenizer.tokenize(data)))
(I haven't tried it!)
Python: Splitting words in text file into limited 40 characters AND filling the extra slot with spaces
Add the following:
def pad(line, limit):
    return line + " " * (limit-len(line))

def split_string(text, limit, sep=" "):
    words = text.split()
    if max(map(len, words)) > limit:
        raise ValueError("limit is too small")
    res = []
    part = words[0]
    others = words[1:]
    for word in others:
        if len(sep)+len(word) > limit-len(part):
            res.append(part)
            part = word
        else:
            part += sep+word
    if part:
        res.append(part)
    result = [pad(l, limit) for l in res]
    return result
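For comparison, roughly the same result can be had from the standard library. This is a different technique from the function above, not a drop-in replacement: textwrap.wrap() breaks words longer than the limit by default instead of raising ValueError. The function name wrap_and_pad and the sample sentence are made up for this sketch.

```python
import textwrap

def wrap_and_pad(text, limit=40):
    # wrap() splits on whitespace into lines of at most `limit` characters;
    # ljust() then fills each line with trailing spaces up to `limit`
    return [line.ljust(limit) for line in textwrap.wrap(text, width=limit)]

lines = wrap_and_pad("the quick brown fox jumps over the lazy dog", limit=10)
for line in lines:
    print(repr(line))
```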