Why is my Bash script adding feff to the beginning of files?
U+FEFF is the code point for a byte order mark (BOM). Your files most likely contain data saved in UTF-16, and the BOM has been corrupted by your 'cleaning process', which is most likely expecting ASCII. Rather than removing the BOM, it is probably better to fix your scripts so they do not corrupt it in the first place.
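Before changing the scripts, it can help to check which BOM, if any, a file actually starts with. The following is a minimal Python sketch, not part of the original answer; the names BOMS, detect_bom and detect_file_bom are made up for illustration, and only the constants from the standard codecs module are relied on.

```python
import codecs

# Known BOM byte sequences. The UTF-32 ones come first because the
# UTF-32 LE BOM starts with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def detect_file_bom(path: str):
    """Hypothetical helper: inspect only the first four bytes of a file."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

If detect_file_bom reports utf-16-le or utf-16-be, an ASCII-oriented cleaning step is the likely culprit.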
Add more hash to existing hash script in multiple files
You may use
sed -i 's/\bmd5([^()]*)/sha1(&)/gI' *.php
The POSIX BRE expression (using the GNU sed extensions \b and the I flag) matches:
\b - a word boundary
md5( - an md5( substring
[^()]* - 0 or more chars other than ( and )
) - a ) char.
The sha1(&) replacement pattern replaces the match with sha1(, then the match value, and then ).
See the online demo:
s='some query part....,MD5('"'"'$pass'"'"'),.....some query part
some query part....,md5('"'"'$pass'"'"'),.....some query part'
sed 's/\bmd5([^()]*)/sha1(&)/gI' <<< "$s"
Output:
some query part....,sha1(MD5('$pass')),.....some query part
some query part....,sha1(md5('$pass')),.....some query part
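For comparison, here is a rough Python equivalent of the same substitution, offered as a sketch rather than a replacement for the sed command. The function name wrap_md5_in_sha1 is hypothetical. Note that in Python's regex syntax the literal parentheses must be escaped, unlike in a BRE.

```python
import re

# Same pattern as the sed expression: word boundary, "md5(", any run of
# characters other than parentheses, then ")". Case-insensitive like the I flag.
PATTERN = re.compile(r"\bmd5\([^()]*\)", re.IGNORECASE)


def wrap_md5_in_sha1(text: str) -> str:
    """Wrap every md5(...) call in sha1(...), keeping the original match."""
    return PATTERN.sub(lambda m: "sha1(" + m.group(0) + ")", text)


sample = "some query part....,MD5('$pass'),.....some query part"
print(wrap_md5_in_sha1(sample))
```

As with the sed version, running it twice on the same text wraps the call a second time, so it should be applied only once per file.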
0xEF,0xBB,0xBF character showing up in files. How to remove them?
perl -pi~ -CSD -e 's/^\x{feff}//' file1.js path/to/file2.js
I would assume the tool will break if you have other utf-8 in your files, but if not, perhaps this workaround can help you. (Untested ...)
Edit: added the -CSD option, as per tchrist's comment.
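If Perl is not available, the same cleanup can be sketched in Python at the byte level, which avoids decoding the rest of the file at all. This is only an illustration under the assumption that the files are small enough to read into memory; strip_utf8_bom and strip_bom_in_place are hypothetical names, and unlike perl -i~ this sketch keeps no backup copy.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading 0xEF,0xBB,0xBF sequence if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if the file was changed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # nothing to do, leave the file untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```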
remove feff from a file
Your input file starts with a BOM (byte-order mark), and Python doesn't strip it automatically when the file is decoded as utf8. See: Reading Unicode file data with BOM chars in Python
>>> s = '\xef\xbb\xbfABC'
>>> s.decode('utf8')
u'\ufeffABC'
>>> s.decode('utf-8-sig')
u'ABC'
So for your specific case, try something like
from io import StringIO
s = StringIO(open(csvFile).read().decode('utf-8-sig'))
csvData = csv.reader(s)
This is terrible style, but that script is hacked together for a one-shot job anyway.
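The snippets above are Python 2. On Python 3 the same idea needs less ceremony, because open() accepts an encoding directly. The following is a sketch under that assumption; read_csv_rows is a hypothetical helper name, and the in-memory bytes stand in for a real CSV file.

```python
import csv
import io


def read_csv_rows(path):
    """Read a CSV file, letting the utf-8-sig codec swallow a leading BOM."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.reader(f))


# In-memory demonstration: a UTF-8 BOM followed by two CSV rows.
raw = b"\xef\xbb\xbfname,age\r\nalice,30\r\n"
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="")
rows = list(csv.reader(text))
print(rows)
```

The utf-8-sig codec also reads files without a BOM, so it is safe to use when the input is a mix of both.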
How to pass a list of files to bash script with less code
Here is an example by using arrays:
NAMES=( "jeff" "david" "kenny" "randy" )
for NAME in "${NAMES[@]}"; do
    # Do something with NAME
    echo "${NAME}"
done
The documentation on Bash arrays is here: https://linuxize.com/post/bash-arrays/#:~:text=Bash%20supports%20one-dimensional%20numerically,1%20references%20the%20last%20element
How am I supposed to handle the BOM while text processing using sys.stdin in Python 3?
As a complement to the existing answer, it is possible to filter the UTF-8 BOM from stdin with the codecs module. You simply use sys.stdin.buffer to access the underlying byte stream and decode it with a StreamReader:
import sys
import codecs
# trick to process sys.stdin with a custom encoding
fin = codecs.getreader('utf_8_sig')(sys.stdin.buffer, errors='replace')
if '-i' in sys.argv:  # For command line option "-i <infile>"
    fin = open(sys.argv[sys.argv.index('-i') + 1], 'rt',
               encoding='utf_8_sig', errors='replace')
for line in fin:
    ...  # Processing here
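To see the effect without piping anything into a script, the same StreamReader can be pointed at an in-memory byte stream. This is only a small demonstration; the io.BytesIO object stands in for sys.stdin.buffer, and the sample bytes are made up.

```python
import codecs
import io

# Stand-in for sys.stdin.buffer: a UTF-8 BOM followed by two lines of text.
raw = io.BytesIO(b"\xef\xbb\xbffirst line\nsecond line\n")

# Same construction as for stdin: wrap the byte stream in a utf_8_sig reader.
reader = codecs.getreader("utf_8_sig")(raw, errors="replace")

# Iterating the reader yields decoded lines; the BOM is dropped by the codec.
lines = [line.rstrip("\n") for line in reader]
print(lines)
```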
How can I re-add a unicode byte order marker in linux?
Something like this (back up first):
# \xFF\xFE is the UTF-16 little-endian BOM
for i in *.sql
do
    cp "$i" "$i.temp"
    printf '\xFF\xFE' > "$i"
    cat "$i.temp" >> "$i"
    rm "$i.temp"
done
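The loop above prepends the BOM unconditionally, so running it twice leaves two BOMs in each file. A possible guard, sketched in Python with hypothetical names (add_bom, add_bom_to_file) and assuming the files are UTF-16 little-endian as the \xFF\xFE bytes imply:

```python
import codecs


def add_bom(data: bytes, bom: bytes = codecs.BOM_UTF16_LE) -> bytes:
    """Prepend the BOM only if the data does not already start with it."""
    return data if data.startswith(bom) else bom + data


def add_bom_to_file(path: str, bom: bytes = codecs.BOM_UTF16_LE) -> None:
    """Rewrite a file with a leading BOM. Back up the file first."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(add_bom(data, bom))
```

Because add_bom checks for an existing BOM, calling it repeatedly on the same data does not stack markers.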