Why is my Bash script adding <feff> to the beginning of files?

U+FEFF is the code point for a byte order mark. Your files most likely contain data saved in UTF-16, and the BOM has been corrupted by your 'cleaning process', which most likely expects ASCII. It's probably not a good idea to remove the BOM; instead, fix your scripts so they don't corrupt it in the first place.
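To see which BOM you are dealing with, you can dump the first few bytes of a file (a minimal sketch, assuming xxd is available and suspect.txt stands for one of your files):

head -c 4 suspect.txt | xxd
# ff fe     -> UTF-16 little-endian BOM
# fe ff     -> UTF-16 big-endian BOM
# ef bb bf  -> UTF-8 BOM (shows up as U+FEFF once decoded)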

Add more hash to existing hash script in multiple files

You may use

sed -i 's/\bmd5([^()]*)/sha1(&)/gI' *.php

The expression (POSIX BRE, plus the GNU sed extensions \b and the I flag) matches:

  • \b - a word boundary
  • md5( - an md5( substring
  • [^()]* - 0 or more chars other than ( and )
  • ) - a ) char.

The sha1(&) replacement pattern replaces the match with sha1(, then the matched text (& recalls it), and then ).

See the online demo:

s='some query part....,MD5('"'"'$pass'"'"'),.....some query part
some query part....,md5('"'"'$pass'"'"'),.....some query part'
sed 's/\bmd5([^()]*)/sha1(&)/gI' <<< "$s"

Output:

some query part....,sha1(MD5('$pass')),.....some query part
some query part....,sha1(md5('$pass')),.....some query part
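Note that -i edits the files in place; GNU sed also accepts a backup suffix, so you can keep the originals around (a sketch, under the same GNU sed assumption):

# keeps a copy of each original file with a .bak suffix
sed -i.bak 's/\bmd5([^()]*)/sha1(&)/gI' *.php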

0xEF,0xBB,0xBF characters showing up in files. How to remove them?

perl -pi~ -CSD -e 's/^\x{FEFF}//' file1.js path/to/file2.js

I would assume this will break if you have other UTF-8 in your files, but if not, perhaps this workaround can help you. (Untested ...)

Edit: added the -CSD option, as per tchrist's comment.
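If Perl is not at hand, a GNU sed one-liner can strip a UTF-8 BOM in place as well (a sketch, assuming GNU sed, which understands \xNN escapes and -i):

# remove the UTF-8 BOM bytes from the start of the first line only
sed -i '1s/^\xEF\xBB\xBF//' file1.js path/to/file2.js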

remove feff from a file

Your input file has BOM (byte-order mark) characters, and Python doesn't strip them automatically when the file is encoded in UTF-8. See: Reading Unicode file data with BOM chars in Python

>>> s = '\xef\xbb\xbfABC'
>>> s.decode('utf8')
u'\ufeffABC'
>>> s.decode('utf-8-sig')
u'ABC'

So for your specific case, try something like

import csv
from io import StringIO

# decoding with utf-8-sig strips the BOM if present
s = StringIO(open(csvFile).read().decode('utf-8-sig'))
csvData = csv.reader(s)

Terrible style, but that script is a hacked-together one-shot job anyway. (The snippets above are Python 2; on Python 3 you can simply pass encoding='utf-8-sig' to open().)

How to pass a list of files to bash script with less code

Here is an example using arrays:

NAMES=( "jeff" "david" "kenny" "randy" )

for NAME in "${NAMES[@]}"; do
  # Do something with NAME
  echo "${NAME}"
done

And here is the documentation: https://linuxize.com/post/bash-arrays/
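Since the question is about passing a list of files to the script, you can also iterate over the positional parameters directly (a minimal sketch; process.sh is a hypothetical name, and quoting "$@" keeps filenames with spaces intact):

#!/bin/bash
# process.sh - loop over whatever file names were passed as arguments
for file in "$@"; do
  echo "processing: ${file}"
done

Invoked as ./process.sh *.sql, the shell expands the glob and the loop handles each file once.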

How am I supposed to handle the BOM while text processing using sys.stdin in Python 3?

As a complement to the existing answer, it is possible to filter the UTF-8 BOM from stdin with the codecs module. Simply use sys.stdin.buffer to access the underlying byte stream and decode it with a StreamReader:

import sys
import codecs

# trick to process sys.stdin with a custom encoding
fin = codecs.getreader('utf_8_sig')(sys.stdin.buffer, errors='replace')

if '-i' in sys.argv:  # for a command line option "-i <infile>"
    fin = open(sys.argv[sys.argv.index('-i') + 1], 'rt',
               encoding='utf_8_sig', errors='replace')

for line in fin:
    ...  # processing here
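To check the behaviour, replace the placeholder body with a print and pipe a BOM-prefixed line through the script (strip_bom.py is a hypothetical name for the script above); the BOM does not survive the utf_8_sig decoding:

printf '\xef\xbb\xbfhello\n' | python3 strip_bom.py
# prints "hello" with no U+FEFF in front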

How can I re-add a unicode byte order marker in linux?

Something like this (backup first):

for i in *.sql
do
  cp "$i" "$i.temp"
  printf '\xFF\xFE' > "$i"   # UTF-16LE BOM; use '\xEF\xBB\xBF' for UTF-8
  cat "$i.temp" >> "$i"
  rm "$i.temp"
done
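To confirm the marker landed, dump the first two bytes of a processed file (a sketch, assuming xxd; some.sql stands for any of the files):

head -c 2 some.sql | xxd
# expect: ff fe  (UTF-16LE byte order mark)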

