Finding The Longest Word in a Text File

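Normally you'd use a while read loop rather than for i in $(cat file). Here, though, you want the file split into individual words, so the for loop works. One caveat applies: the unquoted $(<so.txt) also undergoes pathname expansion. A token such as * would therefore be replaced by file names unless globbing is turned off with set -f.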

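#!/bin/bash
set -f   # disable globbing so a token like * is not expanded to file names
longest=0
longword=""
# $(<so.txt) is deliberately unquoted: it splits the file on whitespace
for word in $(<so.txt)
do
    len=${#word}
    if (( len > longest ))
    then
        longest=$len
        longword=$word
    fi
done
printf 'The longest word is %s and its length is %d.\n' "$longword" "$longest"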
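For comparison, here is a minimal Python sketch of the same single-pass idea. It reads the file line by line rather than loading it all at once. The function name longest_word is made up for this example. As in the Bash loop, punctuation counts as part of a word, and the first of several equally long words wins.

```python
def longest_word(path):
    """Return (word, length) for the longest whitespace-separated token in path.

    Like the Bash loop, this splits on any whitespace, counts punctuation as
    part of a word, and keeps the first token of maximal length.
    """
    longest = ""
    with open(path) as f:
        for line in f:                 # read line by line, not all at once
            for word in line.split():  # split on runs of whitespace
                if len(word) > len(longest):
                    longest = word
    return longest, len(longest)
```

Calling longest_word("so.txt") returns a (word, length) pair, which can be printed the same way as in the Bash version.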

How can I find the longest word in a text file?

So you want to find the longest word in your dictionary that can be spelled from some subset of a given set of letters.

To do so, use itertools.combinations(), starting with a length equal to the length of your string. Check all of those combinations against the dictionary of sorted words. If you don't find a match, decrease the combination length by one and try again.

You also want to load the entire dictionary into a hash-based structure such as a set, so that each lookup takes constant time on average. I've loaded the words into a dictionary whose key is the sorted string and whose value is the list of words with that same sorted representation.

Something like this:

import itertools
from collections import defaultdict

# Index the dictionary: sorted letters -> list of words made of exactly those letters
words = defaultdict(list)
with open('/usr/share/dict/words') as qfile:
    for word in qfile:
        word = word.rstrip('\n').lower()
        words[''.join(sorted(word))].append(word)

def longest_anagram(term, words):
    search_length = len(term)
    term = sorted(term)  # combinations() maintains sort order
    while search_length > 0:
        for combo in itertools.combinations(term, search_length):
            search = ''.join(combo)  # the sort above means we don't need to sort here
            if search in words:
                return words[search]
        search_length -= 1
    return None

found = longest_anagram('qugteroda', words)
for w in found or []:  # longest_anagram() returns None when nothing matches
    print(w)
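To try the index-plus-combinations idea without depending on /usr/share/dict/words, the sketch below builds the same kind of sorted-letters index from a small, hypothetical word list. The helper names build_index and longest_anagram are illustrative. The search follows the code above but counts the length down with range().

```python
import itertools
from collections import defaultdict


def build_index(word_list):
    """Map each word's sorted letters to the list of words with those letters."""
    index = defaultdict(list)
    for word in word_list:
        word = word.strip().lower()
        if word:  # skip blank entries
            index["".join(sorted(word))].append(word)
    return index


def longest_anagram(term, index):
    """Return the words for the longest letter-subset of term found in index, or None."""
    letters = sorted(term.lower())  # combinations() preserves this sorted order
    for length in range(len(letters), 0, -1):
        for combo in itertools.combinations(letters, length):
            key = "".join(combo)
            if key in index:  # membership test does not insert into the defaultdict
                return index[key]
    return None


# Hypothetical tiny word list standing in for /usr/share/dict/words.
index = build_index(["tea", "eat", "ate", "tear", "rate", "at"])
```

For example, "tea", "eat" and "ate" all share the key "aet". The letters of "taerx" first match at length 4, giving ["tear", "rate"].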

For completeness, I should mention that this approach is only practical for search strings of about 18 letters or fewer. The number of combinations to check grows as 2^n, and 2^18 is already 262,144. For longer inputs you're better off flipping the algorithm: sort the dictionary words by length into a list, longest first. Then go through the words and check whether each one can be built from the letters of the input string, much like @abarnert's answer.
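Here is one way to sketch the flipped approach in Python, assuming the word list already fits in memory. The words are sorted longest first, and collections.Counter checks whether a word needs any letter more often than the input provides. The name longest_buildable is hypothetical, and this is not a reproduction of @abarnert's answer.

```python
from collections import Counter


def longest_buildable(letters, word_list):
    """Return the longest word in word_list that can be spelled from letters.

    Each letter of `letters` may be used at most once. Words are tried from
    longest to shortest, so the first one that fits is a longest match.
    """
    available = Counter(letters.lower())
    for word in sorted(word_list, key=len, reverse=True):
        need = Counter(word.lower())
        # The word fits if no letter is needed more often than it is available
        # (a Counter returns 0 for letters it has never seen).
        if all(available[ch] >= cnt for ch, cnt in need.items()):
            return word
    return None
```

The cost is now roughly linear in the size of the dictionary rather than exponential in the length of the input.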

Python - Finding the longest word in a text file error

This should work for you:

from functools import reduce

def find_longest_word(filename):
    with open(filename, "r") as f:  # the file is closed automatically
        s = [y for x in f for y in x.split()]
    # Keep the earlier word unless the next one is strictly longer
    longest_word = reduce(lambda x, y: y if len(x) < len(y) else x, s, "")
    print("The longest word is", longest_word, "and it is", len(longest_word), "characters long")
    return longest_word

find_longest_word('input.txt')
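If functools.reduce() feels indirect, the built-in max() with key=len does the same job. It also returns the first of several equally long words. In the minimal sketch below, this variant returns the word without printing it. The default="" argument covers an empty file, where max() would otherwise raise ValueError.

```python
def find_longest_word(filename):
    """Return the first longest whitespace-separated word in filename."""
    with open(filename) as f:
        # max() returns the first maximal item, just like the reduce() version.
        return max((word for line in f for word in line.split()), key=len, default="")
```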

Finding the longest word in a .txt file without punctuation marks

You have to strip those characters from the words:

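with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
    for line in lines:
        if line.strip():  # skip blank lines, where max() would raise ValueError
            # The generator needs its own parentheses so that key=len goes to max(), not print()
            print(max((word.strip(",.:?;!\"") for word in line.split()), key=len))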

Or you can use a regular expression to extract everything that looks like a word. Note that \w matches letters, digits and the underscore, not only letters:

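import re

with open("original-3.txt", 'r') as file1:
    for line in file1:
        words = re.findall(r"\w+", line)  # runs of letters, digits and underscores
        if words:  # skip lines without any words
            print(max(words, key=len))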
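The snippets above report the longest word of each line. If you want a single result for the whole file, counting letters only, one possible sketch uses the pattern [^\W\d_]+, which means word characters minus digits and the underscore. The function name and the UTF-8 encoding are assumptions for illustration.

```python
import re

# Word characters minus digits and underscore, i.e. (essentially) letters.
# str patterns are Unicode-aware by default, so accented letters count too.
WORD_RE = re.compile(r"[^\W\d_]+")


def longest_word_in_file(path):
    """Return the longest run of letters anywhere in the file, or None if there is none."""
    longest = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            for word in WORD_RE.findall(line):
                if longest is None or len(word) > len(longest):
                    longest = word
    return longest
```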

Read Multiple text files and Finding longest word from each text files

The program you created keeps track of the longest word seen so far across all the files it has read. That is why 'disastrous' is reported as the longest word for both f1 and f2.

If you want the longest word of each file separately, rather than in comparison with the other files, add this line at the end of each iteration of the for loop in your getLongestWords() method. Put it after that file's result has been printed or stored:

 longestWord = ""; 

Also, getLongestWords() can be a void method, since you never use the string it returns.
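The sketch below shows the same reset in Python, using a hypothetical function longest_word_per_file. It returns a dict from each file path to that file's longest word.

```python
def longest_word_per_file(paths):
    """Return a dict mapping each path to the longest whitespace-separated word in it."""
    results = {}
    for path in paths:
        longest = ""  # reset for every file, so files are not compared with each other
        with open(path) as f:
            for line in f:
                for word in line.split():
                    if len(word) > len(longest):
                        longest = word
        results[path] = longest
    return results
```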

Longest word in file

First of all, you cannot store a pointer to the longest word like that. You reuse str for the next line, so its contents are overwritten and the pointer no longer points to the word you found.

Second, while strtok() appears simple at first, I prefer to apply a straightforward approach to a straightforward problem.
The problem is O(n), where n is the length of the document: you just need to go through it character by character. Since every line ends with a \n, you can still use a line-based approach here.

So, instead of strtok(), simply check whether each character is a legal word character (an alphabetic character, that is). You can easily do so with the standard library function isalpha() from the header ctype.h.

Below is the program. It copies the longest word into a dedicated buffer, uses isalpha(), and reads the file line by line, just like the code in the original question did.

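This code assumes that no line, including its trailing newline, is longer than 999 characters. A longer line would be split across two fgets() calls, and a word straddling the split would be counted as two shorter words.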

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Return the number of consecutive alphabetic characters in line,
   starting at index istart (line has length len). */
static size_t gulp(const char* line, size_t istart, size_t len) {
    size_t n = 0;
    for (size_t i = istart; i < len; i++, n++) {
        /* Cast to unsigned char: passing a negative char to isalpha() is undefined. */
        if (!isalpha((unsigned char)line[i])) {
            break;
        }
    }
    return n;
}

int main(void) {
    FILE* f = fopen("document.txt", "r");
    if (f == NULL) {
        perror("document.txt");
        return EXIT_FAILURE;
    }
    char line[1000];
    char longest_word[1000];
    size_t longest_word_length = 0;
    while (fgets(line, sizeof(line), f) != NULL) {
        size_t i0 = 0;
        size_t line_length = strlen(line);
        while (i0 < line_length) {
            if (isalpha((unsigned char)line[i0])) {
                /* Measure the word starting here; keep it if it is the longest so far. */
                size_t n = gulp(line, i0, line_length);
                if (n > longest_word_length) {
                    strncpy(longest_word, &line[i0], n);
                    longest_word[n] = '\0';
                    longest_word_length = n;
                }
                i0 = i0 + n;  /* skip past the word */
            } else {
                i0++;         /* skip a non-word character */
            }
        }
    }
    fclose(f);
    if (longest_word_length > 0) {
        printf("longest word: %s (%zu characters)\n",
               longest_word, longest_word_length);
    }
    return 0;
}
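The character-by-character scan does not actually need a line buffer. In this Python sketch, the text is read in fixed-size chunks and the word in progress is carried from one chunk to the next. As a result, the 999-character line limit of the C version does not apply. The function name and the chunk size are arbitrary choices, not taken from the answer. Note also that Python's str.isalpha() is Unicode-aware, while C's isalpha() in the default "C" locale only accepts ASCII letters.

```python
def longest_alpha_word(stream, chunk_size=4096):
    """Scan a text stream character by character and return its longest alphabetic run.

    The current word is carried over between chunks, so unlike a fixed-size
    fgets() buffer there is no limit on line or word length.
    """
    longest = ""
    current = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        for ch in chunk:
            if ch.isalpha():
                current.append(ch)
            else:
                # A non-letter ends the current word: compare and reset.
                if len(current) > len(longest):
                    longest = "".join(current)
                current = []
    # The last word may run right up to the end of the stream.
    if len(current) > len(longest):
        longest = "".join(current)
    return longest
```

For a file, pass an open text stream, for example inside with open("document.txt", encoding="utf-8") as f: followed by longest_alpha_word(f).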

How to find longest words from a text file with any characters?

Here you go, fully working I believe. It finds, prints, and returns the longest word in the text file:

import java.util.Scanner;
import java.io.File;
import java.io.FileNotFoundException;

public class hello {
    public static void main(String[] args) throws FileNotFoundException {
        new hello().getLongestWords();
    }

    public String getLongestWords() throws FileNotFoundException {
        String longestWord = "";
        String current;
        // try-with-resources closes the Scanner and the underlying file
        try (Scanner scan = new Scanner(new File("file.txt"))) {
            while (scan.hasNext()) {
                current = scan.next();  // next whitespace-separated token
                if (current.length() > longestWord.length()) {
                    longestWord = current;
                }
            }
        }
        System.out.println(longestWord);
        return longestWord;
    }
}

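To strip punctuation, add this line before you return. Java Strings are immutable, so replaceAll() returns a new string that has to be assigned; calling it without using the result changes nothing. Note that punctuation still counts toward a word's length while the words are being compared:

    longestWord = longestWord.replaceAll("[^a-zA-Z]", "");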

If you don't want to take words containing digits into account, use this condition:

if ((current.length() > longestWord.length()) && (!current.matches(".*\\d.*"))) {

Everything together:

import java.util.Scanner;
import java.io.*;

public class hello {
    public static void main(String[] args) throws FileNotFoundException {
        new hello().getLongestWords();
    }

    public String getLongestWords() throws FileNotFoundException {
        String longestWord = "";
        String current;
        // try-with-resources closes the Scanner and the underlying file
        try (Scanner scan = new Scanner(new File("file.txt"))) {
            while (scan.hasNext()) {
                current = scan.next();
                // Longer than the best so far, and contains no digit
                if ((current.length() > longestWord.length()) && (!current.matches(".*\\d.*"))) {
                    longestWord = current;
                }
            }
        }
        // Strings are immutable: assign the result, or the punctuation is not removed
        longestWord = longestWord.replaceAll("[^a-zA-Z]", "");
        System.out.println(longestWord);
        return longestWord;
    }
}
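You may prefer to have punctuation ignored while words are being compared, so that a token like hello!!! is measured as hello. In that case the cleanup can be done per token instead of once at the end. Here is a Python sketch of that variant; the function name longest_clean_word is hypothetical.

```python
import re


def longest_clean_word(path):
    """Return the longest word after dropping tokens with digits and stripping non-letters."""
    longest = ""
    with open(path) as f:
        for line in f:
            for token in line.split():
                if re.search(r"\d", token):
                    continue  # skip words with numbers, like the matches(".*\\d.*") check
                word = re.sub(r"[^a-zA-Z]", "", token)  # keep ASCII letters only
                if len(word) > len(longest):
                    longest = word
    return longest
```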

