Remove Match Word from File

Remove match word from file

Try this:

sed 's/\s*cmd//' File1.txt

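Explanation:

s/    # substitute
\s*   # also match (and so remove) any whitespace before the pattern; \s is a GNU extension
cmd   # your pattern
//    # replace with nothing

Only the first match on each line is replaced; add the g flag (s/\s*cmd//g) to remove every occurrence.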
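For comparison, here is a rough Python equivalent of that substitution. It is a sketch that works on one line at a time, and `remove_first_match` is a hypothetical helper name. Like the sed command without `g`, it removes only the first match on each line, together with any whitespace in front of it.

```python
import re

def remove_first_match(line, word="cmd"):
    # Hypothetical helper. \s* swallows the whitespace before the word,
    # as in the sed command; count=1 mirrors sed's one substitution per line.
    return re.sub(r"\s*" + re.escape(word), "", line, count=1)

print(remove_first_match("run cmd now"))   # run now
print(remove_first_match("cmdline tool"))  # line tool
```

Note that the pattern also matches cmd inside longer words such as cmdline. Add word boundaries (\b in Python, \< and \> in GNU sed) if that matters.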

Remove only matching word from file in Java

To remove only the line that is exactly 'medical_data' (and keep lines such as medical_data01), you need to change this code:

while ((currentLine = reader.readLine()) != null) {
    currentLine = currentLine.replace(delete, "");
    if (currentLine.equals("")) {
    } else
        writer.println(currentLine);
}

to this code:

while ((currentLine = reader.readLine()) != null) {
    // keep every line except the one that is exactly equal to delete
    if (!currentLine.equals(delete)) {
        writer.println(currentLine);
    }
}

Then the output file looks like this:

medical_data01
medical_data02
Census_data_10000_k.gen
Census_data_10000_k.gen_01
Census_data_10000_k.gen_02
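The same idea in Python, as a minimal sketch: keep every line that is not exactly equal to the string being deleted, so that longer lines which merely contain it survive. `drop_exact_lines` is a hypothetical name, and the sample lines are assumed, modelled on the output above.

```python
def drop_exact_lines(lines, delete):
    # Hypothetical helper: compare whole lines (ignoring the trailing newline)
    # instead of replacing substrings, so "medical_data01" is kept.
    return [line for line in lines if line.rstrip("\n") != delete]

lines = ["medical_data\n", "medical_data01\n", "medical_data02\n",
         "Census_data_10000_k.gen\n"]
print("".join(drop_exact_lines(lines, "medical_data")), end="")
```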

How to delete from a text file, all lines that contain a specific string?

To remove the matching lines and print the result to standard output:

sed '/pattern to match/d' ./infile

To modify the file in place (GNU sed; this does not work with BSD sed):

sed -i '/pattern to match/d' ./infile

The same for BSD sed (Mac OS X and FreeBSD); this form does not work with GNU sed:

sed -i '' '/pattern to match/d' ./infile

To modify the file in place and keep a backup (./infile.bak); this works with both BSD and GNU sed:

sed -i.bak '/pattern to match/d' ./infile
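For reference, a Python sketch of what `sed -i.bak '/pattern/d'` does: copy the file to a backup, then rewrite it without the matching lines. `delete_matching_lines` is a hypothetical helper, and Python's `re` syntax differs in places from sed's basic regular expressions, so patterns may need adjusting.

```python
import re
import shutil

def delete_matching_lines(path, pattern, backup_suffix=".bak"):
    # Hypothetical helper. Keep a copy of the original, like sed -i.bak
    shutil.copy2(path, path + backup_suffix)
    regex = re.compile(pattern)
    with open(path + backup_suffix) as src:
        kept = [line for line in src if not regex.search(line)]
    # Rewrite the original file with only the non-matching lines
    with open(path, "w") as dst:
        dst.writelines(kept)
```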

Deleting exact match String from text file

If you are using Java 8 or later, you can do something like the following with the java.nio.file package:

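import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

Path p = Paths.get("PATH-TO-FILE");
List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8).stream()  // readAllLines closes the file itself
        .map(str -> str.replaceFirst(Pattern.quote("STRING-TO-DELETE"), ""))  // literal text, not a regex; first occurrence only
        .filter(str -> !str.isEmpty())                                         // drop lines that are now empty
        .collect(Collectors.toList());
Files.write(p, lines, StandardCharsets.UTF_8);  // the enclosing method must handle IOException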
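A rough Python counterpart of that snippet, assuming the whole file fits in memory; `delete_string` is a hypothetical name. `str.replace` with a count of 1 removes the first literal occurrence on each line, and lines that end up empty are dropped.

```python
def delete_string(lines, target):
    # Hypothetical helper: remove the first literal occurrence of target
    # from each line and skip lines that end up empty.
    result = []
    for line in lines:
        stripped = line.replace(target, "", 1)  # literal text, not a regex
        if stripped:                            # drop lines that are now empty
            result.append(stripped)
    return result

print(delete_string(["foo", "a foo b", "bar"], "foo"))  # ['a  b', 'bar']
```

To apply it to a file, read the lines with `f.read().splitlines()` (no trailing newlines, like `Files.lines`) and write the result back one line at a time.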

Remove words found in the second file

You can use this GNU awk command:

awk -v RS='[[:space:]]+' 'FNR == NR {seen[$1]; next} !($1 in seen) {ORS=RT; print}' remove.txt corpus.txt

On a 450 MB remove.txt file, the awk command above took 1 min 16 sec to complete.

To make it more readable:

awk -v RS='[[:space:]]+' 'FNR == NR {
   seen[$1]
   next
}
!($1 in seen) {
   ORS = RT
   print
}' remove.txt corpus.txt
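To illustrate what the gawk program does with RS and RT, here is a Python sketch of the same logic (an illustration, not a drop-in replacement; `remove_words` is a hypothetical name). The text is split into words and the whitespace that follows each one, and a word is dropped together with its trailing whitespace, just as `ORS = RT` re-emits the separator only for the records that are printed.

```python
import re

def remove_words(text, remove):
    # Hypothetical helper mirroring the gawk program.
    # re.split with a capturing group returns word, sep, word, sep, ..., word
    parts = re.split(r"(\s+)", text)
    parts.append("")  # give the last word an empty separator
    kept = [word + sep
            for word, sep in zip(parts[0::2], parts[1::2])
            if word not in remove]
    return "".join(kept)

# With real files: remove = set(open("remove.txt").read().split())
print(remove_words("this is a test message\n", {"test"}), end="")  # this is a message
```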

Earlier solution, using a single GNU sed script:

sed -f <(sed 's~.*~s/ *\\<&\\> *//~' remove.txt) corpus.txt

Output:

this is amessage to check ifwords are removed correctly. The second line may or may not havewords. The third line also need not be as clean as first and second line.
There can be paragraphs in the text corpus and the entire file should be checked for.
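Each command generated by that script has the form `s/ *\<word\> *//`, so it deletes the spaces on both sides of the word as well, which joins the neighbouring words; without a `g` flag, only the first occurrence on each line is removed. A small Python sketch of the same per-word substitution, using `\b` in place of GNU sed's `\<` and `\>` (the helper name and sample sentence are assumptions for illustration):

```python
import re

def sed_style_remove(line, words):
    # Hypothetical helper: one substitution per word, like each
    # generated "s/ *\<word\> *//" command (first occurrence only).
    for w in words:
        line = re.sub(r" *\b" + re.escape(w) + r"\b *", "", line, count=1)
    return line

print(sed_style_remove("this is a test message", ["test"]))  # this is amessage
```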

“sed” command to remove a line that matches an exact string on first word

The . is a metacharacter in regex that means "match any one character", so you accidentally created a regex that also matches cnnPcom, cnn com or cnn\com. While that probably works for your needs, it is better to be explicit:

  sed -r '/^cnn\.com\b/d' raw.txt

The difference is the backslash (\) before the period (.). It escapes the period metacharacter so that it is treated as a literal period.
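The same effect can be checked with Python's `re` module, which treats the period the same way. In this small sketch (sample lines made up) only the escaped pattern rejects cnnPcom.

```python
import re

unescaped = re.compile(r"^cnn.com\b")  # "." matches any character
escaped = re.compile(r"^cnn\.com\b")   # "\." matches only a literal period

for line in ["cnn.com 1.2.3.4", "cnnPcom 1.2.3.4"]:
    print(line, bool(unescaped.search(line)), bool(escaped.search(line)))
```

`re.escape()` produces the escaped form for you; for example, `re.escape("cnn.com")` returns `cnn\.com`.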

As for your lines that start with spaces, you can catch those in a single regex (again escaping the period metacharacter):

  sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d' raw.txt

The group (^[ ]*|^) matches a line that starts with any number of spaces (^[ ]*) OR (|) simply the start of the line (^), followed by your match for 127.0.0.1. Because * also allows zero spaces, ^[ ]* on its own would match the same lines.

To combine these, use the | (OR) operator inside parentheses to catch all of your matches:

  sed -r '/(^[ ]*|^)(127\.0\.0\.1|cnn\.com|0\.0\.0\.0)\b/d' raw.txt

Alternatively, you can use a semicolon (;) to separate the different commands:

  sed -r '/(^[ ]*|^)127\.0\.0\.1\b/d; /(^[ ]*|^)cnn\.com\b/d; /(^[ ]*|^)0\.0\.0\.0\b/d;' raw.txt
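The same filter, sketched in Python for comparison: optional leading spaces, an alternation of escaped host names, and a word boundary. `HOSTS`, `drop_host_lines` and the sample lines are assumptions for illustration.

```python
import re

# Hypothetical list of entries to delete; re.escape turns "." into "\."
HOSTS = ["127.0.0.1", "cnn.com", "0.0.0.0"]
# ^ * allows zero or more leading spaces, like (^[ ]*|^)
pattern = re.compile(r"^ *(?:" + "|".join(map(re.escape, HOSTS)) + r")\b")

def drop_host_lines(lines):
    # Hypothetical helper: keep only the lines the pattern does not match
    return [line for line in lines if not pattern.search(line)]

sample = ["127.0.0.1 localhost", "  0.0.0.0 ads.example", "cnn.com", "8.8.8.8 dns"]
print(drop_host_lines(sample))  # ['8.8.8.8 dns']
```

Thanks to the trailing `\b`, a line such as `127.0.0.10 other` is kept, because there is no word boundary between `1` and `0`.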

Python Regex - remove words containing : from file

This might help. It removes every word that contains a colon (:).

a = ":raining, raining:, rai:ning  aaaaaaa"

def removeStr(val):
    # filter() keeps a word only when this returns True, i.e. it has no colon
    return ":" not in val

each_line = " ".join(filter(removeStr, a.split()))
print(each_line)

Output:

aaaaaaa

Remove specific word from field

Based on the samples shown, please try the following:

awk '
match($0,/0 ,\(\(/){
  val=substr($0,RSTART,RLENGTH)
  sub(/.*,/,"",val)
  print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)
  val=""
  next
}
1
'  Input_file

Explanation: the same program with a comment on each line.

awk '                               ##Start the awk program.
match($0,/0 ,\(\(/){                ##If the line contains "0 ,((" (match sets RSTART and RLENGTH):
  val=substr($0,RSTART,RLENGTH)     ##val holds the matched text, i.e. "0 ,((".
  sub(/.*,/,"",val)                 ##Delete everything up to and including the comma, leaving "((".
  print substr($0,1,RSTART-1) val substr($0,RSTART+RLENGTH)  ##Print the text before the match, val, and the rest of the line.
  val=""                            ##Reset val.
  next                              ##Skip the remaining rules for this line.
}
1                                   ##Print every other line unchanged (1 is always true).
' Input_file                        ##Name of the input file.
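For comparison, a Python sketch of the same edit. The question's samples are not reproduced here, so the example lines are made up and `strip_prefix` is a hypothetical name. Because awk's `sub()` turns the matched `0 ,((` into `((`, the net effect is replacing the first `0 ,((` on a line with `((`.

```python
def strip_prefix(line):
    # Hypothetical helper: like the awk program, replace the first
    # "0 ,((" with "((" and leave other lines unchanged.
    return line.replace("0 ,((", "((", 1)

print(strip_prefix("id,0 ,((a b))"))  # id,((a b))
print(strip_prefix("no match here"))  # no match here
```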

File Handling in C - Removing specific words from a list in text file

You open two files: the one you've got (for reading) and a new one (for writing).
You loop through the first file, reading each line in turn.
You compare the contents of each line with the words you need to delete.
If the line does not match any of the deletion words, you write it to the new file.
Finally, you close both files and replace the original with the new one (for example with remove() and rename() from <stdio.h>).
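The steps above, sketched in Python rather than C to show the flow. `remove_listed_lines` and the `.tmp` name for the new file are assumptions for illustration.

```python
import os

def remove_listed_lines(path, words):
    # Hypothetical helper following the steps above
    tmp_path = path + ".tmp"  # assumed name for the new file
    # Open the existing file for reading and a new one for writing
    with open(path) as src, open(tmp_path, "w") as dst:
        for line in src:  # read each line in turn
            # Write the line only if it matches none of the deletion words
            if line.rstrip("\n") not in words:
                dst.write(line)
    # Both files are closed here; replace the original with the new file
    os.replace(tmp_path, path)
```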

If the manipulation you need to do is much more complex, you can map the whole file into memory with mmap(), but that is a more advanced technique: you have to treat the file as a byte array with no zero terminator, and there are lots of ways to get that wrong.
