How to find files with multiple occurrences of a specific string
@echo off
for %%a in (*.txt) do (
    for /f %%b in ('type "%%a"^|find /c "myword"') do (
        if %%b geq 2 echo %%a [actual count: %%b]
    )
)
Notes:
- find /c doesn't count occurrences of a string; it counts the lines that contain it (once or several times). That isn't the same, but it might be good enough for you.
- You might want to use find /i /c to make the search case-insensitive (finding "myword" as well as "MyWord").
- echo %%a [actual count: %%b] is for troubleshooting only; replace it with the copy command in your final code.
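If real occurrence counts matter rather than matching-line counts, the idea can be sketched in Python. This is only an illustration under assumptions: the helper names count_occurrences and files_with_repeats are hypothetical, the files are assumed to be readable as text, and occurrences are counted without overlap.

```python
from pathlib import Path


def count_occurrences(text, word, ignore_case=True):
    """Count non-overlapping occurrences of word in text."""
    if ignore_case:
        text, word = text.lower(), word.lower()
    return text.count(word)


def files_with_repeats(folder, word, minimum=2):
    """Map each *.txt file name in folder to its count, keeping counts >= minimum."""
    result = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the scan going on undecodable bytes (an assumption)
        n = count_occurrences(path.read_text(errors="replace"), word)
        if n >= minimum:
            result[path.name] = n
    return result


# One line containing the word twice: a per-line count would report 1 here.
print(count_occurrences("myword and MyWord on one line\n", "myword"))  # prints 2
```

Calling files_with_repeats(".", "myword") would then list the files in the current directory that contain the word at least twice.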
Linux: Extract values with one instance of record in file
Try with:
uniq -u input_file
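From the uniq manual:
-u, --unique
    only print unique lines
Note that uniq only compares adjacent lines, so if repeated records are not already next to each other, sort the input first: sort input_file | uniq -u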
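When the original line order has to be kept and repeated lines are not necessarily adjacent, the same filtering can be sketched in Python. The function name lines_occurring_once is hypothetical, and the sketch assumes the input fits in memory.

```python
from collections import Counter


def lines_occurring_once(lines):
    """Return the lines that appear exactly once, in their original order."""
    stripped = [line.rstrip("\n") for line in lines]
    counts = Counter(stripped)
    return [line for line in stripped if counts[line] == 1]


# "a" appears twice (not adjacent), so only "b" and "c" survive.
sample = ["a\n", "b\n", "a\n", "c\n"]
print(lines_occurring_once(sample))  # prints ['b', 'c']
```

An open file object can be passed instead of the list, since it also yields lines.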
Read and get a specific line of the text file into a variable in Python
You can simply iterate through the lines and assign a line to a previously created string variable whenever the search string occurs in it (after the loop, the variable holds the last matching line):
string = 'a word'
res_line = ""
# the with statement closes the file even if an error occurs
with open('file.txt') as a_file:
    for line in a_file:
        if string in line:
            res_line = line
print(res_line)
You could also create a list that contains every line in which the string occurs:
string = 'a word'
with open('file.txt') as a_file:
    res = [line for line in a_file if string in line]
print(res)
Find the lines in the text file
Not tested, but it should work:
def index(filename, words):
    # return the first word found, together with its 1-based line number
    with open(filename, "r") as f:
        for i, line in enumerate(f):
            for word in words:
                if word in line:
                    return "%s at line %i" % (word, i + 1)

print(index("some_filename", ["word1", "word2"]))
Or, to avoid the nested for loop:
import itertools

def index(filename, words):
    with open(filename, "r") as f:
        # each item pairs an (index, line) tuple with one of the words
        for line, word in itertools.product(enumerate(f), words):
            if word in line[1]:
                return "%s at line %i" % (word, line[0] + 1)

print(index("some_filename", ["word1", "word2"]))
And using a generator expression, which reports every match instead of only the first:
import itertools

def index(filename, words):
    with open(filename, "r") as f:
        return "\n".join("%s at line %i" % (word, line[0] + 1) for line, word in itertools.product(enumerate(f), words) if word in line[1])

print(index("some_filename", ["word1", "word2"]))
Reading certain letters after a specified string from a text file
Hm, basically 50MB is considered "small" nowadays. With that small amount of data, you can read the whole file into one std::string and then do a linear search.
So, the algorithm is:
1. Open the files and check whether they could be opened
2. Read the complete file into a std::string
3. Do a linear search for the string "data-permalink=""
4. Remember the start position of the permalink
5. Search for the closing "
6. Use the std::string substr function to create the output permalink string
7. Write this to a file
8. Go to step 3 and continue the search after the closing "
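The search loop in these steps can be sketched compactly in Python, working on a string that is already in memory. This is an illustrative sketch, not the answer's implementation; the function name extract_permalinks and its parameters are hypothetical.

```python
def extract_permalinks(data, marker='data-permalink="', closing='"'):
    """Collect every substring between marker and the next closing quote."""
    links = []
    pos = 0
    while True:
        # find the start of the next marker
        start = data.find(marker, pos)
        if start == -1:
            break
        start += len(marker)  # first character of the permalink itself
        # find the closing quote
        end = data.find(closing, start)
        if end == -1:
            break
        links.append(data[start:end])
        pos = end + 1  # continue searching after the closing quote
    return links


sample = 'xx data-permalink="/a/1" yy data-permalink="/b/2" zz'
print(extract_permalinks(sample))  # prints ['/a/1', '/b/2']
```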
I created a 70MB test file with random data.
The whole procedure takes less than 1s, even with the slow linear search.
But a caveat: you want to parse an HTML file. A plain string search will most probably not work reliably, because of potential nested structures. For this you should use an existing HTML parser.
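To illustrate the parser route, here is a minimal sketch using Python's standard html.parser module. The class name PermalinkCollector is hypothetical, and the sketch assumes the links of interest are stored in a data-permalink attribute of some start tag.

```python
from html.parser import HTMLParser


class PermalinkCollector(HTMLParser):
    """Collect the values of data-permalink attributes from start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        for name, value in attrs:
            if name == "data-permalink" and value is not None:
                self.links.append(value)


# The second occurrence is plain text, not an attribute, so it is ignored.
sample = '<div data-permalink="/a/1"><p>text data-permalink="not-an-attribute"</p></div>'
collector = PermalinkCollector()
collector.feed(sample)
print(collector.links)  # prints ['/a/1']
```

Unlike a raw substring search, this only reports values that the parser recognizes as attributes.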
Anyway, here is one of many possible solutions.
#include <iostream>
#include <fstream>
#include <string>
#include <random>
#include <iterator>
#include <algorithm>

std::string randomSourceCharacters{ " abcdefghijklmnopqrstuvwxyz" };
const std::string sourceFileName{ "r:\\test.txt" };
const std::string linkFileName{ "r:\\links.txt" };

void createRandomData() {
    std::random_device randomDevice;
    std::mt19937 randomGenerator(randomDevice());
    std::uniform_int_distribution<> randomCharacterDistribution(0, static_cast<int>(randomSourceCharacters.size()) - 1);
    std::uniform_int_distribution<> randomLength(10, 30);

    if (std::ofstream ofs{ sourceFileName }; ofs) {
        for (size_t i{}; i < 1000000; ++i) {
            const int prefixLength{ randomLength(randomGenerator) };
            const int linkLength{ randomLength(randomGenerator) };
            const int suffixLength{ randomLength(randomGenerator) };

            // Random prefix, then the permalink attribute, then a random suffix
            for (int k{}; k < prefixLength; ++k)
                ofs << randomSourceCharacters[randomCharacterDistribution(randomGenerator)];
            ofs << "data-permalink=\"";
            for (int k{}; k < linkLength; ++k)
                ofs << randomSourceCharacters[randomCharacterDistribution(randomGenerator)];
            ofs << "\"";
            for (int k{}; k < suffixLength; ++k)
                ofs << randomSourceCharacters[randomCharacterDistribution(randomGenerator)];
        }
    }
    else std::cerr << "\nError: Could not open source file '" << sourceFileName << "' for writing\n";
}

int main() {
    // Please uncomment if you want to create a file with test data
    // createRandomData();

    // Open source file for reading and check, if file could be opened
    if (std::ifstream ifs{ sourceFileName }; ifs) {

        // Open link file for writing and check, if file could be opened
        if (std::ofstream ofs{ linkFileName }; ofs) {

            // Read the complete 50MB file into a string
            std::string data(std::istreambuf_iterator<char>(ifs), {});

            const std::string searchString{ "data-permalink=\"" };
            const std::string permalinkEndString{ "\"" };

            // Do a linear search
            for (size_t posBegin{}; posBegin < data.length(); ) {

                // Search for the begin of the permalink
                if (posBegin = data.find(searchString, posBegin); posBegin != std::string::npos) {

                    const size_t posStartForEndSearch = posBegin + searchString.length();

                    // Search for the end of the permalink
                    if (size_t posEnd = data.find(permalinkEndString, posStartForEndSearch); posEnd != std::string::npos) {

                        // Output result
                        const size_t lengthPermalink{ posEnd - posStartForEndSearch };
                        const std::string output{ data.substr(posStartForEndSearch, lengthPermalink) };
                        ofs << output << '\n';
                        posBegin = posEnd + 1;
                    }
                    else break;
                }
                else break;
            }
        }
        else std::cerr << "\nError: Could not open link file '" << linkFileName << "' for writing\n";
    }
    else std::cerr << "\nError: Could not open source file '" << sourceFileName << "' for reading\n";
}
Edit
If you need unique links, you may store the results in a std::unordered_set and then output them later.
#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <algorithm>
#include <unordered_set>

const std::string sourceFileName{ "r:\\test.txt" };
const std::string linkFileName{ "r:\\links.txt" };

int main() {
    // Open source file for reading and check, if file could be opened
    if (std::ifstream ifs{ sourceFileName }; ifs) {

        // Open link file for writing and check, if file could be opened
        if (std::ofstream ofs{ linkFileName }; ofs) {

            // Read the complete 50MB file into a string
            std::string data(std::istreambuf_iterator<char>(ifs), {});

            const std::string searchString{ "data-permalink=\"" };
            const std::string permalinkEndString{ "\"" };

            // Here we will store unique results
            std::unordered_set<std::string> result{};

            // Do a linear search
            for (size_t posBegin{}; posBegin < data.length(); ) {

                // Search for the begin of the permalink
                if (posBegin = data.find(searchString, posBegin); posBegin != std::string::npos) {

                    const size_t posStartForEndSearch = posBegin + searchString.length();

                    // Search for the end of the permalink
                    if (size_t posEnd = data.find(permalinkEndString, posStartForEndSearch); posEnd != std::string::npos) {

                        // Store result; duplicates are dropped by the set
                        const size_t lengthPermalink{ posEnd - posStartForEndSearch };
                        const std::string output{ data.substr(posStartForEndSearch, lengthPermalink) };
                        result.insert(output);
                        posBegin = posEnd + 1;
                    }
                    else break;
                }
                else break;
            }
            // Write all unique links to the output file
            for (const std::string& link : result)
                ofs << link << '\n';
        }
        else std::cerr << "\nError: Could not open link file '" << linkFileName << "' for writing\n";
    }
    else std::cerr << "\nError: Could not open source file '" << sourceFileName << "' for reading\n";
}
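A std::unordered_set does not guarantee any particular iteration order, so the links may come out in a different order than they appear in the file. If first-seen order matters, one possible approach is sketched below in Python; the helper name unique_in_order is hypothetical, and the sketch relies on dict preserving insertion order (Python 3.7 and later).

```python
def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and keep insertion order
    return list(dict.fromkeys(items))


links = ["/b/2", "/a/1", "/b/2"]
print(unique_in_order(links))  # prints ['/b/2', '/a/1']
```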