Remove match word from file
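Try this:
sed 's/\s*cmd//' File1.txt
Explanation:
s/ # substitute
\s* # match any whitespace before the word (\s is not part of POSIX sed)
cmd # your pattern
// # replace the match with nothing
Without the g flag, only the first match on each line is removed.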
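For comparison, the same substitution can be sketched in Python. This is only an illustration, assuming the word is cmd as in the sed command; the function name strip_first_word is hypothetical. Like the sed command, it removes only the first occurrence on each line.

```python
import re


def strip_first_word(line, word="cmd"):
    """Remove the first occurrence of `word` plus any whitespace right before it."""
    # re.escape keeps the word literal; count=1 mirrors sed's s/// without the g flag.
    return re.sub(r"\s*" + re.escape(word), "", line, count=1)


if __name__ == "__main__":
    print(strip_first_word("run cmd now"))
```

Note that, as with the sed version, the match is on a substring, so a longer word that contains cmd is also shortened.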
remove only matching word from file in java
To remove only 'medical_data' you need to change this code:
while ((currentLine = reader.readLine()) != null) {
currentLine = currentLine.replace(delete,"");
if (currentLine.equals("")) {
} else
writer.println(currentLine);
}
to this code:
while ((currentLine = reader.readLine()) != null) {
if (!currentLine.equals(delete)) {
writer.println(currentLine);
}
}
Then the output file looks like this:
medical_data01
medical_data02
Census_data_10000_k.gen
Census_data_10000_k.gen_01
Census_data_10000_k.gen_02
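To see why the change matters, here is a small Python sketch that contrasts the two loops. The function names and the sample input are assumptions made for illustration (the answer only shows the output file), and this is not the asker's actual code.

```python
def remove_substring(lines, delete):
    """Original behaviour: strip `delete` from every line, drop lines left empty."""
    stripped = (line.replace(delete, "") for line in lines)
    return [line for line in stripped if line != ""]


def remove_exact_lines(lines, delete):
    """Changed behaviour: drop only the lines that equal `delete` exactly."""
    return [line for line in lines if line != delete]


if __name__ == "__main__":
    # Assumed sample input, chosen to resemble the file names in the answer.
    sample = ["medical_data", "medical_data01", "medical_data02"]
    print(remove_substring(sample, "medical_data"))
    print(remove_exact_lines(sample, "medical_data"))
```

The first function damages medical_data01 and medical_data02 because it removes the substring from them too; the second leaves them intact.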
How to delete from a text file, all lines that contain a specific string?
To remove the line and print the output to standard out:
sed '/pattern to match/d' ./infile
To directly modify the file – does not work with BSD sed:
sed -i '/pattern to match/d' ./infile
Same, but for BSD sed (Mac OS X and FreeBSD) – does not work with GNU sed:
sed -i '' '/pattern to match/d' ./infile
To directly modify the file (and create a backup) – works with BSD and GNU sed:
sed -i.bak '/pattern to match/d' ./infile
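If sed is not available, the backup-then-rewrite behaviour of the last command can be approximated in Python. This is a rough sketch, not a drop-in replacement: the function name delete_matching_lines and the backup_suffix parameter are hypothetical, and Python's regex syntax differs from sed's in some details.

```python
import re
import shutil


def delete_matching_lines(path, pattern, backup_suffix=".bak"):
    """Delete every line of `path` that matches `pattern`, keeping a backup copy."""
    regex = re.compile(pattern)
    backup = path + backup_suffix
    shutil.copyfile(path, backup)  # keep the original, similar to sed -i.bak
    with open(backup, encoding="utf-8") as src, \
         open(path, "w", encoding="utf-8") as dst:
        for line in src:
            # search() matches anywhere in the line, like /pattern/ in sed
            if not regex.search(line):
                dst.write(line)
```

Calling delete_matching_lines("./infile", "pattern to match") would rewrite ./infile and leave the original in ./infile.bak.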
Deleting exact match String from text file
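If you are using Java 8 or later, you can do something like the following using the java.nio.file package:
// Needs java.nio.file.*, java.nio.charset.StandardCharsets, java.util.*, java.util.regex.Pattern, java.util.stream.*
Path p = Paths.get("PATH-TO-FILE");
List<String> lines;
// Close the stream before writing back to the same file.
try (Stream<String> stream = Files.lines(p)) {
    lines = stream
        // Pattern.quote makes replaceFirst treat the text literally, not as a regex.
        .map(str -> str.replaceFirst(Pattern.quote("STRING-TO-DELETE"), ""))
        .filter(str -> !str.isEmpty())
        .collect(Collectors.toList());
}
Files.write(p, lines, StandardCharsets.UTF_8);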
Remove words found in the second file
You may use this GNU awk command:
awk -v RS='[[:space:]]+' 'FNR == NR {seen[$1]; next} !($1 in seen) {ORS=RT; print}' remove.txt corpus.txt
On a 450MB remove.txt file, the above awk command took 1 min 16 sec to complete.
To make it more readable:
awk -v RS='[[:space:]]+' 'FNR == NR {
seen[$1]
next
}
!($1 in seen) {
ORS = RT
print
}' remove.txt corpus.txt
Earlier solution: using a single GNU sed script:
sed -f <(sed 's~.*~s/ *\\<&\\> *//~' remove.txt) corpus.txt
this is amessage to check ifwords are removed correctly. The second line may or may not havewords. The third line also need not be as clean as first and second line.
There can be paragraphs in the text corpus and the entire file should be checked for.
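The set-lookup idea behind the awk command above can also be sketched in Python. This is an illustration under assumptions, not a benchmarked replacement: the function name remove_words is hypothetical, and, as in the awk version, a removed word takes the whitespace that follows it along with it.

```python
import re


def remove_words(text, words_to_remove):
    """Drop every whitespace-delimited token of `text` found in `words_to_remove`."""
    seen = set(words_to_remove)  # set membership tests are O(1) on average
    # Splitting on a capturing group keeps the separators:
    # the list alternates token, separator, token, separator, ...
    parts = re.split(r"(\s+)", text)
    out = []
    for i in range(0, len(parts), 2):
        token = parts[i]
        sep = parts[i + 1] if i + 1 < len(parts) else ""
        if token not in seen:
            out.append(token + sep)
    return "".join(out)


if __name__ == "__main__":
    print(remove_words("this is a message to check", {"is", "a"}))
```

Because the separators are preserved, line breaks and paragraph breaks after the kept words survive.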
“sed” command to remove a line that matches an exact string on first word
The . is a metacharacter in regex which means "match any one character". So you accidentally created a regex that will also catch cnnPcom or cnn com or cnn\com. While it probably works for your needs, it would be better to be more explicit:
sed -r '/^cnn\.com\b/d' raw.txt
The difference here is the \ backslash before the . period. That escapes the period metacharacter so it's treated as a literal period.
As for your lines that start with a space, you can catch those in a single regex (Again escaping the period metacharacter):
sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d' raw.txt
The group (^[ ]*|^) says a line that starts with any number of repeating spaces (^[ ]*) OR (|) simply starts (^), which is then followed by your match for 127.0.0.1. Since [ ]* also matches zero spaces, the second alternative is redundant, though harmless.
And then for stringing these together you can use the | OR operator inside of parentheses to catch all of your matches:
sed -r '/(^[ ]*|^)(127\.0\.0\.1|cnn\.com|0\.0\.0\.0)\b/d' raw.txt
Alternatively you can use a ; semicolon to separate out the different regexes:
sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d; /(^[ ]*|^)cnn\.com\b/d; /(^[ ]*|^)0\.0\.0\.0\b/d;' raw.txt
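The effect of escaping the period, and of the combined alternation, can be checked quickly in Python. This is a minimal sketch; the names BLOCKED and filter_lines and the sample lines are made up for illustration, and Python's re module stands in for sed -r here.

```python
import re

# Leading spaces are optional, the dots are escaped,
# and \b ends the match at a word boundary.
BLOCKED = re.compile(r"^[ ]*(127\.0\.0\.1|cnn\.com|0\.0\.0\.0)\b")


def filter_lines(lines):
    """Return the lines that do not start with one of the blocked entries."""
    return [line for line in lines if not BLOCKED.match(line)]


if __name__ == "__main__":
    # Hypothetical sample lines.
    sample = ["cnn.com ads", "  127.0.0.1 localhost", "cnnPcom stays"]
    for line in filter_lines(sample):
        print(line)
```

With the unescaped pattern cnn.com, the line cnnPcom stays would have been deleted as well.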
Python Regex - remove words containing : from file
This might help. It removes every word that has ":" in it.
a = ":raining, raining:, rai:ning aaaaaaa"

def removeStr(val):
    # Keep only the words that do not contain a colon.
    return ":" not in val

each_line = " ".join(filter(removeStr, a.split()))
print(each_line)
Output:
aaaaaaa
Remove specific word from field
With the samples shown, could you please try the following.
awk '
match($0,/0 ,\(\(/){
val=substr($0,RSTART,RLENGTH)
sub(/.*,/,"",val)
print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
val=""
next
}
1
' Input_file
Explanation: a detailed explanation of the above.
awk ' ##Start the awk program.
match($0,/0 ,\(\(/){ ##Use the match function to find "0 ,((" in the line.
val=substr($0,RSTART,RLENGTH) ##Store the matched text in val.
sub(/.*,/,"",val) ##Remove everything up to and including the comma from val.
print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH) ##Print the text before the match, then val, then the text after the match.
val="" ##Reset val.
next ##Skip the remaining statements for this line.
}
1 ##Print the current line unchanged.
' Input_file ##Name of the input file.
File Handling in C - Removing specific words from a list in text file
- You open two files: the one you've got (for reading) and a new one (for writing).
- You loop through the first file reading each line in turn.
- You compare the contents of each line with the words you need to delete.
- If the line does not match any of the deletion words, then you write it to the new file.
If the manipulation that you need to do is much more complex then you can literally "read it into memory" using mmap(), but that is a more advanced technique; you need to treat the file as a byte array with no zero terminator and there are lots of ways to mess that up.
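The read, compare and write steps listed above look like this in outline. The sketch is in Python rather than C to keep it short, so treat it as pseudocode for the C version; the function name copy_without_words and its parameters are hypothetical, and it assumes one word per line.

```python
def copy_without_words(src_path, dst_path, words_to_delete):
    """Copy src_path to dst_path, skipping lines equal to a word in words_to_delete."""
    unwanted = set(words_to_delete)
    # Open the existing file for reading and a new file for writing.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Compare the line (without its newline) against the deletion words.
            if line.rstrip("\n") not in unwanted:
                dst.write(line)
```

Once the new file is complete, it can be renamed over the original if the change should appear to happen in place.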