A twoBit file is a highly efficient way to store genomic sequence. The format is defined
here. Note that lower-case nucleotides are considered masked in twoBit, which can cause such
sequence to be ignored when using the -mask option with gfServer; therefore, you may wish to
convert lower-case sequence to upper-case when preparing the FASTA file.
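One possible way to do the upper-case conversion is with a short awk filter. This is a minimal sketch, not part of the twoBit tooling: the file names example.fa and example.upper.fa are hypothetical, and the tiny FASTA is created here only so the example is self-contained.

```shell
# Create a tiny example FASTA (hypothetical file name) containing lower-case sequence.
printf '>chr1\nACGTacgtNNnn\n>chr2\nggccTTAA\n' > example.fa

# Upper-case the sequence lines only; header lines (starting with '>') pass through unchanged.
awk '/^>/ { print; next } { print toupper($0) }' example.fa > example.upper.fa

# Show the converted file.
cat example.upper.fa
```

Header lines are skipped so that sequence names keep their original case; only the nucleotide lines are changed.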
To complete the steps below you must first download the faToTwoBit, twoBitInfo, and
twoBitToFa utilities. For more information on downloading our command-line utilities, see
these instructions.
To create a twoBit file, follow these steps:
1. Run the faToTwoBit program on your FASTA file:
   faToTwoBit genome.fa genome.2bit
2. Run twoBitInfo to verify the sequences in this assembly and create a chrom.sizes file,
   which is useful for constructing the big* files in later processing steps:
   twoBitInfo genome.2bit stdout | sort -k2rn > genome.chrom.sizes
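To illustrate what the sort step in the twoBitInfo pipeline does, the sketch below applies the same sort to a small hand-written table of name and size columns. The file names and the size values are illustrative stand-ins, not output from a real assembly.

```shell
# Stand-in for twoBitInfo output: sequence name, a tab, then the sequence length.
printf 'chrM\t16571\nchr1\t249250621\nchr2\t243199373\n' > unsorted.sizes

# Sort on the second column, numerically (n) and in reverse (r), so the largest sequence comes first.
sort -k2rn unsorted.sizes > example.chrom.sizes

# Show the sorted table.
cat example.chrom.sizes
```

The result lists chr1 first, then chr2, then chrM, which is the largest-to-smallest ordering produced for genome.chrom.sizes.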
The twoBit commands can function with the .2bit file as a URL:
twoBitInfo -udcDir=. http://your-website.edu/~user/genome.2bit | sort -k2nr > genome.chrom.sizes
Sequence can be extracted from the .2bit file with the twoBitToFa command, for example:
twoBitToFa -seq=chr1 -udcDir=. http://your-website.edu/~user/genome.2bit stdout > genome.chr1.fa
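As an optional sanity check, the length of an extracted sequence can be compared with its entry in the chrom.sizes file. The sketch below is one way to do that with awk. It assumes a FASTA file holding a single sequence and uses small stand-in files with hypothetical names so that it runs on its own; in practice the two files would come from twoBitToFa and twoBitInfo.

```shell
# Stand-in files for illustration only (hypothetical names and contents).
printf '>chr1\nACGTACGT\nACGT\n' > example.chr1.fa
printf 'chr1\t12\n' > example.sizes

# Count bases on sequence lines only; header lines start with '>'.
fa_len=$(awk '!/^>/ { n += length($0) } END { print n + 0 }' example.chr1.fa)

# Look up the expected length of chr1 in the sizes file.
expected=$(awk '$1 == "chr1" { print $2 }' example.sizes)

if [ "$fa_len" -eq "$expected" ]; then
    echo "chr1 length matches: $fa_len"
else
    echo "chr1 length mismatch: $fa_len vs $expected" >&2
fi
```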