Using grep with patternfile returns patterns (sequence names) not in patternfile
Use
grep -w -F -f patternfile.txt sequencefile.fastq > outputfile.txt
-w means to match the pattern only when it's surrounded by word boundaries. -F means to match fixed-text patterns, not regular expressions (this is probably not significant here, as your patterns don't seem to contain any characters that have special meaning, but it's good practice).
I suspect your pattern file contains a prefix of @NB501827:133:HMV5HAFX2:1:11101:26336:12921, so it's matching this line. The -w option will prevent matching these prefixes.
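To see why a prefix matches without -w, the difference can be sketched in Python. This is only an approximation of grep's behavior, not its implementation: a plain substring test stands in for grep -F, and lookarounds on \w stand in for the word-boundary rule of -w. The shortened pattern and the helper names are made up for illustration.

```python
import re

line = "@NB501827:133:HMV5HAFX2:1:11101:26336:12921"
# Hypothetical entry in patternfile.txt: the same name minus its last digit.
pattern = "@NB501827:133:HMV5HAFX2:1:11101:26336:1292"

def matches_fixed(pattern, line):
    """Rough stand-in for grep -F: the pattern occurs anywhere in the line."""
    return pattern in line

def matches_fixed_word(pattern, line):
    """Rough stand-in for grep -w -F: the match may not touch a word character."""
    regex = r"(?<!\w)" + re.escape(pattern) + r"(?!\w)"
    return re.search(regex, line) is not None

prefix_hit = matches_fixed(pattern, line)            # substring match succeeds
prefix_word_hit = matches_fixed_word(pattern, line)  # the trailing "1" blocks it
exact_word_hit = matches_fixed_word(line, line)      # the full name still matches
```

The shortened name matches as a substring, but is rejected once the character after the match must not be a letter, digit, or underscore.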
Issue in reading frames sequence wise from folder in Python
Try this:
import os
import re
files_list = os.listdir('/content/test1')  # use your folder path here
files_list.sort(key=lambda f: int(re.sub(r'\D', '', f)))  # sort by the digits in each name
For example:
files_list = ['frame0.jpg','frame1.jpg','frame10.jpg','frame100.jpg',
'frame101.jpg','frame2.jpg','frame20.jpg','frame3.jpg']
files_list.sort(key=lambda name: int(re.sub(r'\D', '', name)))
Output from the above:
['frame0.jpg',
'frame1.jpg',
'frame2.jpg',
'frame3.jpg',
'frame10.jpg',
'frame20.jpg',
'frame100.jpg',
'frame101.jpg']
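One caveat: the key above calls int() on an empty string, and so raises ValueError, for any name that contains no digits. It also joins every digit in the name, so an extension such as .mp4 would contribute a trailing 4. A slightly more defensive variant might look like the sketch below; the helper name frame_number and the choice to sort digit-free names first are assumptions for illustration.

```python
import re

def frame_number(name):
    """Return the integer formed by all digits in name, or -1 if it has none."""
    digits = re.sub(r'\D', '', name)
    return int(digits) if digits else -1

# Made-up folder listing that includes one file without any digits.
names = ['frame10.jpg', 'frame2.jpg', 'notes.txt', 'frame1.jpg']
sorted_names = sorted(names, key=frame_number)
print(sorted_names)
```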
How to store the output generated in python in a txt file
Only the last sequence is saved because the file is reopened on every iteration, which overwrites the previous contents. Open it once, before the loop.
I'd also recommend using f.write() or the print "chevron" form (print >>f, which is Python 2 syntax), i.e.:
f = open('outputfile', 'w')
for .....
    print >>f, name, ' ', 'Sequence:', ' ', Sequence, ' ', 'Disorder:', ' ', Disorder
f.close()
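On Python 3 the chevron form is not available; the same idea can be sketched with print(..., file=f) and a with block, which also closes the file automatically. The records list and the temporary output path below are made up for illustration and stand in for whatever the real loop computes.

```python
import os
import tempfile

# Hypothetical records standing in for the values produced inside the loop.
records = [('seq1', 'ACGT', '0011'), ('seq2', 'TTGA', '0100')]

out_path = os.path.join(tempfile.mkdtemp(), 'outputfile')

# Open the file once, before the loop, so each iteration adds a new line.
with open(out_path, 'w') as f:
    for name, sequence, disorder in records:
        print(name, 'Sequence:', sequence, 'Disorder:', disorder, file=f)

# Read the file back to confirm that every record was kept.
with open(out_path) as f:
    lines = f.read().splitlines()
```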
Unable to read Hadoop Sequence files through stdin with a streaming python map-reduce on AWS
You need to provide SequenceFileAsTextInputFormat as the inputformat to the Hadoop streaming jar.
I have never used Amazon AWS MapReduce, but on a normal Hadoop installation it would be done like this:
HADOOP=$HADOOP_HOME/bin/hadoop
$HADOOP jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
-input <input_directory> \
-output <output_directory> \
-mapper "mapper.py" \
-reducer "reducer.py" \
-inputformat SequenceFileAsTextInputFormat
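With this input format the mapper should receive each record on stdin as text, one record per line. The sketch below assumes the key and value arrive separated by a tab, which is the usual streaming convention but worth verifying on your cluster. The function names and the "length of value" map step are hypothetical, chosen only to show the parsing.

```python
import io

def parse_records(lines):
    """Yield (key, value) pairs from tab-separated text lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        yield key, value

def run_mapper(instream, outstream):
    """Hypothetical map step: emit each key with the length of its value."""
    for key, value in parse_records(instream):
        outstream.write(f"{key}\t{len(value)}\n")

# Demonstration with made-up records instead of real stdin.
sample = io.StringIO("k1\thello\nk2\tsequence file\n")
result = io.StringIO()
run_mapper(sample, result)
```

In an actual mapper.py the demonstration lines would be replaced by a call such as run_mapper(sys.stdin, sys.stdout), after importing sys.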
Reading a sequence of txt files
Try this:
all_files = {0: 'BDS00001', 1: 'BDS00002'}
for k, v in all_files.items():
    all_files[k] = v + '.txt'
print(all_files)  # {0: 'BDS00001.txt', 1: 'BDS00002.txt'}
#________________________READING FILE________________________#
for a in all_files.values():  # iterate over the file names, not the integer keys
    with open(a) as data:
        print(data.read())
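If the files follow a fixed numbering scheme, the names can also be generated instead of listed by hand. The sketch below assumes a five-digit zero-padded counter, as in BDS00001. The helper numbered_names, the range 1 to 3, and the temporary stand-in files are all made up for illustration.

```python
import os
import tempfile

def numbered_names(prefix, first, last, width=5, ext=".txt"):
    """Build names such as BDS00001.txt for first..last (inclusive)."""
    return [f"{prefix}{n:0{width}d}{ext}" for n in range(first, last + 1)]

names = numbered_names("BDS", 1, 3)

# Create small stand-in files in a temporary folder so the sketch runs anywhere.
folder = tempfile.mkdtemp()
for name in names:
    with open(os.path.join(folder, name), "w") as fh:
        fh.write("contents of " + name)

# Read the files back in sequence.
contents = []
for name in names:
    with open(os.path.join(folder, name)) as fh:
        contents.append(fh.read())
```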