Quickly converting FASTA sequences to single-line format with awk and sed

Posted: 2018-03-27 00:00

Sometimes you want to put each FASTA sequence on a single line, as in the example below (the first block is the input, the second is the desired output). This makes further processing much easier.
I know that Perl and other scripting languages can do this; however, I would like to introduce two simple and fast ways to achieve it with awk and sed.


> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
> sq3
paxpaxpax
pax

> sq1 foofoofoobarfoofoofoo
> sq2 quxquxquxbarquxquxquxbarquxx
> sq3 paxpaxpaxpax

For awk:

awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0" ":$0 } END{print "";}' YourFile
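
If the one-liner is hard to read, the same logic can be spelled out roughly as follows. This is only a sketch: the file name sample.fa and the function name fasta_one_line are placeholders I made up for illustration, and the END rule is added so that the last record also ends with a newline.

```shell
# Recreate the sample input shown above (sample.fa is a placeholder name).
cat > sample.fa <<'EOF'
> sq1
foofoofoobar
foofoofoo
> sq2
quxquxquxbar
quxquxquxbar
quxx
> sq3
paxpaxpax
pax
EOF

# Hypothetical helper: print each FASTA record as "header sequence" on one line.
fasta_one_line() {
    awk '
        /^>/ {                        # header line
            if (NR > 1) print ""      # end the previous record with a newline
            printf "%s ", $0          # header, one space, no newline yet
            next
        }
        { printf "%s", $0 }           # sequence line: append without a newline
        END { if (NR > 0) print "" }  # end the last record with a newline
    ' "$1"
}

fasta_one_line sample.fa
```

Run on the sample input, this should reproduce the three-line output shown earlier.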

For sed:

sed -n '1{x;d;x};${H;x;s/\n/ /1;s/\n//g;p;b};/^>/{x;s/\n/ /1;s/\n//g;p;b};H' YourFile
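
The sed version relies on the hold space, so it may help to see it written one command per line with comments. This is a sketch of the same script, not a different method. The names fasta_one_line_sed and demo.fa are placeholders for illustration, and I assume a sed that accepts comment lines inside the script (GNU sed does).

```shell
# Hypothetical helper: the sed one-liner above, one command per line.
fasta_one_line_sed() {
    sed -n '
        # First line: move the header into the hold space, start the next cycle.
        1{
            x
            d
        }
        # Last line: append it to the held record, then print that record.
        ${
            H
            x
            s/\n/ /
            s/\n//g
            p
            b
        }
        # New header: swap out the finished record, print it, keep the header held.
        /^>/{
            x
            s/\n/ /
            s/\n//g
            p
            b
        }
        # Any other (sequence) line: append it to the held record.
        H
    ' "$1"
}

# Small demo input (demo.fa is a placeholder name).
printf '> sq1\nfoofoofoobar\nfoofoofoo\n> sq2\nquxx\n' > demo.fa
fasta_one_line_sed demo.fa
```

In each record the first newline becomes the space between header and sequence, and the remaining newlines are deleted. Note that the script expects the file to end with a sequence line, not a header.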

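Today I wanted to extract the contigs longer than 500 bp from my assembly result, so I did the following. Note that $5 assumes the header line splits into four whitespace-separated fields, so that the sequence lands in field 5; $NF selects the last field (the sequence) whatever the header looks like.

sed -n '1{x;d;x};${H;x;s/\n/ /1;s/\n//g;p;b};/^>/{x;s/\n/ /1;s/\n//g;p;b};H' YourFile | awk '{if (length($5)>500) print ">contig-"FNR"\n"$5}'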
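
The length filter can also be done in a single awk pass that does not depend on how many fields the header has. The sketch below is an alternative under my own assumptions, not the pipeline above: filter_long_contigs, contigs.fa and the min_len variable are hypothetical names. It numbers contigs by their record position in the input, which is what FNR gives in the pipeline above.

```shell
# Hypothetical helper: keep sequences strictly longer than a minimum length.
#   $1: FASTA file (multi-line or single-line sequences)
#   $2: minimum length in bp
filter_long_contigs() {
    awk -v min_len="$2" '
        # Print the record collected so far if it is long enough.
        function emit() {
            if (rec > 0 && length(seq) > min_len)
                print ">contig-" rec "\n" seq
        }
        /^>/ { emit(); rec++; seq = ""; next }  # new header: flush, start a record
        { seq = seq $0 }                        # sequence line: concatenate
        END { emit() }                          # flush the last record
    ' "$1"
}

# Small demo input (contigs.fa is a placeholder name); keep sequences > 5 bp.
printf '>a x\nAAAAAA\nCCCCCC\n>b y\nACG\n>c z\nGGGGTTTT\n' > contigs.fa
filter_long_contigs contigs.fa 5
```

For real assembly output the call would be something like filter_long_contigs YourFile 500.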