Output File Not Created When Reading Sequences

Using grep with a pattern file returns sequence names that are not in the pattern file

Use

grep -w -F -f patternfile.txt sequencefile.fastq > outputfile.txt

-w means to match the pattern only when it's surrounded by word boundaries. -F means to match fixed-text patterns, not regular expressions (this is probably not significant here, as your patterns don't seem to contain any characters that have special meaning, but it's good practice).

I suspect your pattern file contains a prefix of @NB501827:133:HMV5HAFX2:1:11101:26336:12921, so grep is matching that line. The -w option will prevent matching these prefixes.

Issue in reading frames sequence-wise from a folder in Python

Try this

import os
import re

files_list = os.listdir('/content/test1')  # use your folder path here
files_list.sort(key=lambda f: int(re.sub(r'\D', '', f)))  # sort by the number embedded in each name

For example:

files_list = ['frame0.jpg', 'frame1.jpg', 'frame10.jpg', 'frame100.jpg',
              'frame101.jpg', 'frame2.jpg', 'frame20.jpg', 'frame3.jpg']
files_list.sort(key=lambda name: int(re.sub(r'\D', '', name)))

Output from above

['frame0.jpg',
'frame1.jpg',
'frame2.jpg',
'frame3.jpg',
'frame10.jpg',
'frame20.jpg',
'frame100.jpg',
'frame101.jpg']
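If the folder can also contain names without any digits (for example a stray Thumbs.db), int('') raises a ValueError; a slightly more defensive key, as a sketch under that assumption, is:

import re

def frame_number(name):
    # Return the number embedded in a filename, or -1 if there is none.
    digits = re.sub(r'\D', '', name)
    return int(digits) if digits else -1

files_list.sort(key=frame_number)  # digit-less names sort first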

How to store the output generated in Python in a txt file

It is only saving the last sequence because each iteration reopens the file in write mode and overwrites the previous contents. You should open the file once, before the loop.

I'd also recommend writing either with file.write() or by passing the file object to print() via its file argument, i.e.:

with open('outputfile', 'w') as f:
    for .....
        print(name, 'Sequence:', Sequence, 'Disorder:', Disorder, file=f)
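Put together, a minimal sketch of the whole loop, assuming a hypothetical records list of (name, sequence, disorder) tuples (adapt it to wherever your values actually come from):

records = [
    ('seq1', 'MKTAYIAKQR', 'ordered'),      # placeholder data, not from the question
    ('seq2', 'GGSGGSGGSG', 'disordered'),
]

with open('outputfile.txt', 'w') as f:
    for name, Sequence, Disorder in records:
        print(name, 'Sequence:', Sequence, 'Disorder:', Disorder, file=f)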

Unable to read Hadoop Sequence files through stdin with a streaming python map-reduce on AWS

You need to provide SequenceFileAsTextInputFormat as the -inputformat to the Hadoop streaming jar.

I have never used Amazon's AWS MapReduce, but on a normal Hadoop installation it would be done like this:

HADOOP=$HADOOP_HOME/bin/hadoop
$HADOOP jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
-input <input_directory> \
-output <output_directory> \
-mapper "mapper.py" \
-reducer "reducer.py" \
-inputformat SequenceFileAsTextInputFormat
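With SequenceFileAsTextInputFormat the keys and values are converted to text before they reach your script, so the mapper simply reads lines from stdin, typically as key<TAB>value. A minimal mapper.py sketch under that assumption (the emitted key/length pairs are placeholders, not your real logic):

#!/usr/bin/env python
import sys

for line in sys.stdin:
    line = line.rstrip('\n')
    # Assumes each input line is "key<TAB>value"; value is empty if no tab is present.
    key, _, value = line.partition('\t')
    # Emit tab-separated key/value pairs for the reducer; replace with your real logic.
    print('%s\t%s' % (key, len(value)))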

Reading a sequence of txt files

Try this.

all_files = {0: 'BDS00001', 1: 'BDS00002'}

# Append the .txt extension to every value
for k, v in all_files.items():
    all_files[k] = v + '.txt'

print(all_files)  # {0: 'BDS00001.txt', 1: 'BDS00002.txt'}

#________________________READING FILE________________________#

# Iterate over the filenames (the dict's values), not the integer keys
for a in all_files.values():
    with open(a) as data:
        print(data.read())
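If the files are really just a numbered sequence on disk, the dict is not strictly needed; a sketch using glob, assuming the same BDS*.txt naming, reads them in sorted order:

import glob

# Zero-padded names such as BDS00001.txt ... BDS00010.txt sort correctly as strings
for path in sorted(glob.glob('BDS*.txt')):
    with open(path) as data:
        print(data.read())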


