AMOS¶
AMOS is a file format that is similar to any assembly file format such as ACE or SAM. It contains information about each read that is used to assemble each contig.
The format is broken into different message blocks. For the Ray assembler, it produces an AMOS file that is broken into 3 types of message blocks
RED
{RED iid:\d+ eid:\d+ seq: [ATGC]+ . qlt: [A-Z]+ }
- iid
Integer identifier
- eid
Same as iid?
- seq
Sequence data
- qlt
Should be quality, but is only a series of D’s from Ray assembler
TLE
{TLE src:\d+ off:\d+ clr:\d+,\d+ }
- src
RED iid that was used
- off
One would think offset, but unsure what it actually means
- clr
Not sure what this is either
CTG
{CTG iid:\d+ eid:\w+ com: .*$ . seq: [ATGC]+ . qlt: [A-Z]+ . {TLE ... } }
- iid
integer id of contig
- eid
contig name
- com
Communication software that generated this contig
- seq
Contig sequence data
- qlt
Supposed to be contig quality data, but for Ray it only produces D’s
- TLE
0 or more TLE blocks that represent RED sequences that compose the contig
Parsing¶
bio_bits contains an interface to parse a given file handle that has been opened on an AMOS file.
To read in the AMOS file you simply do the following
from bio_bits import amos
a = None
with open('AMOS.afg') as fh:
a = amos.AMOS(fh)
CTG¶
To get information about the contigs(CTG) you can access the .ctgs
attribute.
The contigs are indexed based on their iid so to get the sequence of contig iid 1
you would do the following:
ctg = a.ctgs[1]
seq = ctg.seq
To retrieve all the reads(RED) that belong to a specific contig:
reads = []
for tle in ctg.tlelist:
reads.append(a.reds[tle.src])
RED¶
To get information about the reads(RED) you can access the .reds
attribute.
The reds are indexed based on their iid so to get the sequence of red iid 1 you
would do the following:
red = a.reds[1]
seq = red.seq
If you want to convert a RED entry into anything you can use the .format
method. The .format
method allows you to utilize any of the properties of
a RED object such as .iid
, .eid
, .seq
, .qlt
. You can see in
the examples below how to do this.
Examples¶
Here is an example of how to convert all RED blocks into a single fastq file
from bio_bits import amos
# Fastq format string
fastq_fmt = '@{iid}\n{seq}\n+\n{qlt}'
with open('amos.fastq','w') as fh_out:
with open('AMOS.afg') as fh_in:
for iid, red in amos.AMOS(fh_in).reds.items():
fq = red.format(fastq_fmt)
fh_out.write(fq + '\n')