How to Find the Particular Text Stored in the File "Data.Txt" and It Occurs Only Once

How to find files with multiple occurrences of a specific string

@echo off
for %%a in (*.txt) do (
    for /f %%b in ('type "%%a"^|find /c "myword"') do (
        if %%b geq 2 echo %%a [actual count: %%b]
    )
)

Notes:

  • find /c doesn't count occurrences of a string; it counts the lines that contain it (once or several times). That isn't the same thing, but it might be good enough for you.
  • You might want to use find /i /c to make the search case-insensitive (finding "myword" as well as "MyWord").
  • echo %%a [actual count: %%b] is for troubleshooting only; replace it with the copy command in your final code.
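Because find /c counts lines rather than occurrences, a short script is one way to get the real number of occurrences per file. This is a minimal Python sketch, not a replacement for the batch file: files_with_repeats is a hypothetical helper name, the files are assumed to be UTF-8 text, and matches are counted without overlap.

```python
from pathlib import Path


def files_with_repeats(folder, word, min_count=2, ignore_case=False):
    """Return (path, count) for every *.txt file in folder whose text
    contains `word` at least `min_count` times (non-overlapping matches)."""
    results = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps going if a file is not valid UTF-8
        text = path.read_text(encoding="utf-8", errors="replace")
        needle = word
        if ignore_case:
            # Rough equivalent of find /i: compare lower-cased text
            text, needle = text.lower(), word.lower()
        count = text.count(needle)  # counts occurrences, not lines
        if count >= min_count:
            results.append((path, count))
    return results
```

For example, files_with_repeats(".", "myword") returns the (path, count) pairs for the current folder; each path could then be copied with shutil.copy instead of being echoed.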

Linux: Extract values with one instance of record in file

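Try with:

sort input_file | uniq -u

uniq only compares adjacent lines, so the input has to be sorted first; otherwise a value whose duplicates are not next to each other would still be printed as unique.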

From the uniq manual:

-u, --unique

only print unique lines
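To get the lines that occur exactly once without sorting the file, so that they come out in their original order, you can count them instead. A minimal Python sketch; unique_lines is a hypothetical name, and lines are compared exactly, apart from the line terminator.

```python
from collections import Counter


def unique_lines(lines):
    """Return the lines that occur exactly once, in their original order."""
    # Drop only the line terminator so the last line matches even without "\n"
    stripped = [line.rstrip("\r\n") for line in lines]
    counts = Counter(stripped)
    return [line for line in stripped if counts[line] == 1]
```

With a file, pass the open file object: with open("input_file", encoding="utf-8") as f: print("\n".join(unique_lines(f))).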

Read a specific line of a text file into a variable in Python

You can iterate through the lines and, whenever the string occurs in a line, assign that line to a previously created string variable:

string = 'a word'
res_line = ""

with open('file.txt') as a_file:
    for line in a_file:
        # Overwritten on every match, so res_line ends up holding the LAST matching line
        if string in line:
            res_line = line

print(res_line)

You could also build a list containing every line in which the string occurs:

a_file = open('file.txt')
string = 'a word'

res = [line for line in a_file if string in line]
print(res)

a_file.close()
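The question title also asks how to read one specific line, by its number, into a variable. One way, assuming line numbers start at 1, is to skip ahead with itertools.islice so the file is never loaded into memory as a whole. read_line is a hypothetical helper name.

```python
from itertools import islice


def read_line(filename, line_number):
    """Return line `line_number` (1-based) without its newline,
    or None if the file has fewer lines."""
    if line_number < 1:
        raise ValueError("line_number must be >= 1")
    with open(filename, encoding="utf-8") as f:
        # islice skips the first line_number - 1 lines and yields the next one
        for line in islice(f, line_number - 1, line_number):
            return line.rstrip("\n")
    return None
```

For example, third = read_line('file.txt', 3) stores the third line in a variable.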

Find the lines in the text file

Not tested, but it should work:

def index(filename, words):
    with open(filename, "r") as f:
        for i, line in enumerate(f):
            for word in words:
                if word in line:
                    # Return only the first match (line numbers start at 1)
                    return "%s at line %i" % (word, i + 1)

print(index("some_filename", ["word1", "word2"]))

Or, to avoid the nested for loop:

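import itertools

def index(filename, words):
    with open(filename, "r") as f:
        # product() pairs every (line_index, line) with every word;
        # note that it reads all lines into memory before iterating
        for line, word in itertools.product(enumerate(f), words):
            if word in line[1]:
                return "%s at line %i" % (word, line[0] + 1)

print(index("some_filename", ["word1", "word2"]))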

And using a generator expression (unlike the versions above, this one reports every match instead of only the first):

import itertools

def index(filename, words):
    with open(filename, "r") as f:
        # One "word at line N" entry per match, joined with newlines
        return "\n".join("%s at line %i" % (word, line[0] + 1)
                         for line, word in itertools.product(enumerate(f), words)
                         if word in line[1])

print(index("some_filename", ["word1", "word2"]))
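If the caller needs the matches as data rather than as a formatted string, the same search can return (word, line_number) pairs. A minimal sketch; find_words is a hypothetical name, and like the versions above it uses plain substring matching, so "word1" also matches inside "word10".

```python
def find_words(filename, words):
    """Return (word, line_number) pairs, 1-based, for every line
    that contains one of `words` as a substring."""
    matches = []
    with open(filename, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            for word in words:
                if word in line:
                    matches.append((word, line_number))
    return matches
```

Returning tuples leaves formatting to the caller, e.g. for word, n in find_words("some_filename", ["word1", "word2"]): print(word, "at line", n).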

Reading certain letters after a specified string from a text file

Hm, basically 50MB is considered "small" nowadays. With that little data, you can read the whole file into one std::string and then do a linear search.

So, the algorithm is:

  1. Open the files and check whether they could be opened
  2. Read the complete file into a std::string
  3. Do a linear search for the string data-permalink=" (including the opening quote)
  4. Remember the start position of the permalink
  5. Search for the closing "
  6. Use std::string's substr function to create the output permalink string
  7. Write this to a file
  8. Go back to step 3 and continue searching after the closing "
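The same read-everything-then-scan approach can be sketched in Python to make the steps concrete; this is an illustration, not the C++ solution itself. extract_permalinks is a hypothetical name, and it assumes the text fits in memory and that a permalink never contains a double quote.

```python
def extract_permalinks(data):
    """Return every value found between 'data-permalink="' and the next '"'."""
    start_marker = 'data-permalink="'
    links = []
    pos = 0
    while True:
        begin = data.find(start_marker, pos)       # step 3: find the marker
        if begin == -1:
            break
        value_start = begin + len(start_marker)    # step 4: start of the link
        end = data.find('"', value_start)          # step 5: closing quote
        if end == -1:
            break
        links.append(data[value_start:end])        # step 6: cut out the link
        pos = end + 1                              # step 8: continue after the quote
    return links
```

Steps 1, 2 and 7 are plain file handling: read the input with open(...).read() and write each returned link followed by "\n" to the output file.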

I created a 70MB test file with random data.

The whole procedure takes less than 1 s, even with a slow linear search.

One caveat: you want to parse an HTML file. This approach will most probably not work in general, because HTML can contain nested structures. For that you should use an existing HTML parser.
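As a sketch of the parser-based route, Python's standard html.parser module can collect the attribute values directly, so text that merely looks like an attribute, single quotes, or upper-case attribute names no longer trip up the search. PermalinkCollector and extract_with_parser are hypothetical names; the attribute name data-permalink is taken from the question.

```python
from html.parser import HTMLParser


class PermalinkCollector(HTMLParser):
    """Collect the value of every data-permalink attribute in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lower-cased.
        # Self-closing tags like <img ... /> are routed here as well.
        for name, value in attrs:
            if name == "data-permalink" and value is not None:
                self.links.append(value)


def extract_with_parser(html_text):
    parser = PermalinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Unlike a plain string search, the parser ignores data-permalink=" when it appears in ordinary text content.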

Anyway, here is one of many possible solutions.

#include <iostream>
#include <fstream>
#include <string>
#include <random>
#include <iterator>
#include <algorithm>

std::string randomSourceCharacters{ " abcdefghijklmnopqrstuvwxyz" };
const std::string sourceFileName{ "r:\\test.txt" };
const std::string linkFileName{ "r:\\links.txt" };

// Create test data: 1,000,000 records of random text with a data-permalink="..." in the middle
void createRandomData() {
    std::random_device randomDevice;
    std::mt19937 randomGenerator(randomDevice());
    std::uniform_int_distribution<> randomCharacterDistribution(0, static_cast<int>(randomSourceCharacters.size()) - 1);
    std::uniform_int_distribution<> randomLength(10, 30);

    if (std::ofstream ofs{ sourceFileName }; ofs) {

        for (size_t i{}; i < 1000000; ++i) {

            const int prefixLength{ randomLength(randomGenerator) };
            const int linkLength{ randomLength(randomGenerator) };
            const int suffixLength{ randomLength(randomGenerator) };

            for (int k{}; k < prefixLength; ++k)
                ofs << randomSourceCharacters[randomCharacterDistribution(randomGenerator)];
            ofs << "data-permalink=\"";

            for (int k{}; k < linkLength; ++k)
                ofs << randomSourceCharacters[randomCharacterDistribution(randomGenerator)];
            ofs << "\"";

            for (int k{}; k < suffixLength; ++k)
                ofs << randomSourceCharacters[randomCharacterDistribution(randomGenerator)];
        }
    }
    else std::cerr << "\nError: Could not open source file '" << sourceFileName << "' for writing\n";
}

int main() {
    // Please uncomment if you want to create a file with test data
    // createRandomData();

    // Open source file for reading and check whether it could be opened
    if (std::ifstream ifs{ sourceFileName }; ifs) {

        // Open link file for writing and check whether it could be opened
        if (std::ofstream ofs{ linkFileName }; ofs) {

            // Read the complete file into a string
            std::string data(std::istreambuf_iterator<char>(ifs), {});

            const std::string searchString{ "data-permalink=\"" };
            const std::string permalinkEndString{ "\"" };

            // Do a linear search
            for (size_t posBegin{}; posBegin < data.length(); ) {

                // Search for the beginning of the permalink
                if (posBegin = data.find(searchString, posBegin); posBegin != std::string::npos) {

                    const size_t posStartForEndSearch = posBegin + searchString.length();

                    // Search for the end of the permalink
                    if (size_t posEnd = data.find(permalinkEndString, posStartForEndSearch); posEnd != std::string::npos) {

                        // Output result
                        const size_t lengthPermalink{ posEnd - posStartForEndSearch };
                        const std::string output{ data.substr(posStartForEndSearch, lengthPermalink) };
                        ofs << output << '\n';
                        posBegin = posEnd + 1;
                    }
                    else break;
                }
                else break;
            }
        }
        else std::cerr << "\nError: Could not open link file '" << linkFileName << "' for writing\n";
    }
    else std::cerr << "\nError: Could not open source file '" << sourceFileName << "' for reading\n";
}


Edit

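If you need unique links, you can store the results in a std::unordered_set and write them out afterwards. Note that std::unordered_set does not keep the links in the order in which they were found.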

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <algorithm>
#include <unordered_set>

const std::string sourceFileName{ "r:\\test.txt" };
const std::string linkFileName{ "r:\\links.txt" };

int main() {

    // Open source file for reading and check whether it could be opened
    if (std::ifstream ifs{ sourceFileName }; ifs) {

        // Open link file for writing and check whether it could be opened
        if (std::ofstream ofs{ linkFileName }; ofs) {

            // Read the complete file into a string
            std::string data(std::istreambuf_iterator<char>(ifs), {});

            const std::string searchString{ "data-permalink=\"" };
            const std::string permalinkEndString{ "\"" };

            // Here we will store unique results
            std::unordered_set<std::string> result{};

            // Do a linear search
            for (size_t posBegin{}; posBegin < data.length(); ) {

                // Search for the beginning of the permalink
                if (posBegin = data.find(searchString, posBegin); posBegin != std::string::npos) {

                    const size_t posStartForEndSearch = posBegin + searchString.length();

                    // Search for the end of the permalink
                    if (size_t posEnd = data.find(permalinkEndString, posStartForEndSearch); posEnd != std::string::npos) {

                        // Store the result; the set silently ignores duplicates
                        const size_t lengthPermalink{ posEnd - posStartForEndSearch };
                        const std::string output{ data.substr(posStartForEndSearch, lengthPermalink) };
                        result.insert(output);

                        posBegin = posEnd + 1;
                    }
                    else break;
                }
                else break;
            }
            // Write all unique links (in unspecified order)
            for (const std::string& link : result)
                ofs << link << '\n';
        }
        else std::cerr << "\nError: Could not open link file '" << linkFileName << "' for writing\n";
    }
    else std::cerr << "\nError: Could not open source file '" << sourceFileName << "' for reading\n";
}
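If you also want each link only once but in the order it first appears in the file, a container that remembers insertion order does the job. As a Python sketch (unique_in_order is a hypothetical name; it relies on dicts keeping insertion order, which is guaranteed since Python 3.7):

```python
def unique_in_order(links):
    """Return the links with duplicates removed, keeping first-seen order."""
    # dict keys are unique and remember the order in which they were inserted
    return list(dict.fromkeys(links))
```

It accepts any iterable of strings, such as the list of links collected by a search loop like the one above.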

