The problem with FTP

Using FTP to upload or download large file sets between here (UC Davis in northern California) and NCBI (on the east coast) is a nearly impossible task. It takes a long time to connect, if it connects at all. Once a download is going, you may see speeds of only a few hundred KB/s, and the connection dies constantly.

 

Aspera Connect to the rescue!

NCBI (NIH) recently began offering another service that uses Aspera to download and upload files. This NIH webpage

http://www.ncbi.nlm.nih.gov/public/

will give you a link to download an Aspera web browser plugin called Aspera Connect, as well as a list of the FTP sites now available for download using Aspera.

As of Nov. 2010, the Aspera plugin download for Linux is at version 2.4.5.30478. It can be installed with:

sh aspera-connect-2.4.5.30478-linux-64.sh

Along with the browser plugin, there is also a command-line utility called ascp, which is installed in

$HOME/.aspera/connect
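If the shell cannot find ascp after installation, a small helper can locate it. This is only a sketch: the name find_ascp is made up for illustration, and it assumes the binary is an executable file called ascp somewhere under $HOME/.aspera/connect (the exact subdirectory may differ between Connect versions).

```shell
# find_ascp [ROOT]
# Print the path of the first executable file named "ascp" under ROOT
# (default: $HOME/.aspera/connect). Returns 1 if nothing is found.
# NOTE: find_ascp is a hypothetical helper, not part of Aspera Connect.
find_ascp() {
    root="${1:-$HOME/.aspera/connect}"
    [ -d "$root" ] || return 1
    found=$(find "$root" -type f -name ascp -perm -u+x 2>/dev/null | head -n 1)
    [ -n "$found" ] || return 1
    printf '%s\n' "$found"
}
```

One way to use it is to put the directory it reports on your PATH, for example: ascp_bin=$(find_ascp) && PATH="$(dirname "$ascp_bin"):$PATH"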

You can use ascp to download NCBI files with this command:

ascp -i ~/.aspera/connect/etc/asperaweb_id_dsa.putty -Q -l100m anonftp@ftp-private.ncbi.nlm.nih.gov:/1GB .

This will download a test file called 1GB (1GB in size, I guess).
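For repeated downloads, the command above can be wrapped in a small shell function. The sketch below reuses exactly the flags, key file, and server from that command. The function name ncbi_fetch and the ASCP variable are assumptions introduced here for illustration. Setting ASCP=echo prints the arguments instead of starting a transfer, which is handy for checking the command line first.

```shell
# ncbi_fetch REMOTE_PATH [DEST] [RATE]
# Download REMOTE_PATH from the NCBI Aspera server into DEST (default: .)
# with a rate cap (default: 100m), using the same flags as the command above.
# NOTE: ncbi_fetch and ASCP are hypothetical names, not Aspera features.
ncbi_fetch() {
    if [ $# -lt 1 ]; then
        echo "usage: ncbi_fetch REMOTE_PATH [DEST] [RATE]" >&2
        return 2
    fi
    key="$HOME/.aspera/connect/etc/asperaweb_id_dsa.putty"
    "${ASCP:-ascp}" -i "$key" -Q "-l${3:-100m}" \
        "anonftp@ftp-private.ncbi.nlm.nih.gov:$1" "${2:-.}"
}
```

For example, ncbi_fetch /1GB fetches the test file into the current directory, and ncbi_fetch /1GB /tmp 50m does the same into /tmp with a lower cap.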

You can also upload your SRA and other NCBI submissions using the ascp command. This NCBI document explains the process that you need to follow:

1. Generate and save an SSH public/private key pair with putty (mainly a Windows SSH utility, but also available on Linux).

2. Save a copy of the public key in OpenSSH format and send this OpenSSH public key to NCBI (sra@ncbi.nlm.nih.gov) along with your Center Name (UCDBinfo for us).

Apparently, when you generate keys in Windows puttygen, the OpenSSH public key is displayed at the top of the window. Do not forget to move the mouse around (to gather some random entropy) during key generation; otherwise, puttygen won’t do anything. Once NCBI enables your key, you may upload your files.

Uploading files to NCBI (uses the “incoming” folder):

ascp -i ucdbioinfo.ppk -QTr <file to transfer> -l 300M asp-ucdbioinfo@upload.ncbi.nlm.nih.gov:incoming/

Note that ascp will continue for a few minutes after the file has been fully transferred. Just let it run until it finishes.

ucdbioinfo.ppk is our SSH RSA private key in putty format. Remember to use asp-ucdbioinfo (all lower case) as our account name.

Checking NCBI files under your account

ssh -i ucdbioinfo-id_rsa asp-ucdbioinfo@upload.ncbi.nlm.nih.gov

Use the OpenSSH private key here. According to the NCBI document, the following command will work for putty or

putty -i ucdbioinfo.ppk asp-ucdbioinfo@upload.ncbi.nlm.nih.gov

This fails on my Ubuntu 10.04 system; apparently, that version of Linux putty does not support the -i and username@host options. I had to invoke putty without any arguments and configure a profile for NCBI, and then it worked just fine! This is an account with limited command functionality (ls, mv, and rm seem to work; there is no “cd” command).

This mixing of OpenSSH and putty SSH private/public keys will probably confuse a large number of non-SSH-savvy souls out there. It is probably a design decision based on the fact that the majority of NCBI users run Windows, hence putty was chosen for ascp authentication. On the other hand, the NCBI server runs the OpenSSH daemon on Linux (SUSE Linux).

Notes on puttygen and ascp in Linux

To generate a putty private key:

puttygen -O private -t rsa -b 1024 -o ucdbioinfo.ppk

To generate an OpenSSH public key from the private key:

puttygen ucdbioinfo.ppk -O public-openssh -o ucdbioinfo-id_rsa.pub

To convert a putty private key to an OpenSSH private key:

puttygen ucdbioinfo.ppk -O private-openssh -o ucdbioinfo-id_rsa

To convert an OpenSSH private key to a putty private key:

puttygen ucdbioinfo-id_rsa -O private -o ucdbioinfo.ppk
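With several key files lying around, it is easy to lose track of which one is in which format. The sketch below guesses the format from the first line of the file: putty private keys start with "PuTTY-User-Key-File-", PEM-style OpenSSH private keys start with a "-----BEGIN ... PRIVATE KEY-----" line, and one-line OpenSSH public keys start with the key type (ssh-rsa or ssh-dss). The function name key_format is made up for illustration, and the check looks only at the header, so treat the answer as a hint rather than a validation.

```shell
# key_format FILE
# Print a guess at the key format based on the first line of FILE:
#   putty            putty private key (.ppk)
#   openssh-private  PEM-style OpenSSH private key
#   openssh-public   one-line OpenSSH public key (RSA or DSA)
#   unknown          anything else
# NOTE: key_format is a hypothetical helper; it only inspects the header.
key_format() {
    [ -r "$1" ] || { echo "key_format: cannot read $1" >&2; return 1; }
    first=$(head -n 1 "$1")
    case "$first" in
        PuTTY-User-Key-File-*)             echo putty ;;
        "-----BEGIN "*"PRIVATE KEY-----"*) echo openssh-private ;;
        ssh-rsa\ *|ssh-dss\ *)             echo openssh-public ;;
        *)                                 echo unknown ;;
    esac
}
```

For example, key_format ucdbioinfo.ppk should print putty, which is the format that ascp -i expects in the upload command.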

ascp expects a private key in putty format, either located in the $HOME/.ssh directory or given as a file name with an absolute path, such as

ascp -i /home/zwluxx/aspera/putty.ppk ...

The following command would fail even if your CWD is /home/zwluxx/aspera and putty.ppk is in that directory:

ascp -i putty.ppk ...

This is because ascp will look for the key at $HOME/.ssh/putty.ppk.
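One way to sidestep the relative-path problem is to expand the key path before handing it to ascp. The sketch below does this with a small helper. The name abs_key_path is hypothetical, and the function does not check that the file exists.

```shell
# abs_key_path KEYFILE
# Print KEYFILE unchanged if it is already an absolute path; otherwise
# prefix it with the current working directory.
# NOTE: abs_key_path is a hypothetical helper name.
abs_key_path() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$PWD" "$1" ;;
    esac
}
```

It can then be used inline, as in ascp -i "$(abs_key_path putty.ppk)" ..., so that ascp always receives an absolute path.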

I have also written a simple script to upload files to NCBI:

ascpToNCBI FileToNCBI Remote-NCBI-Directory
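The script itself is not reproduced here. For readers who want something similar, the following is a rough sketch of what such a wrapper could look like. It is not the actual ascpToNCBI script. It reuses the flags, account, and server from the upload command shown earlier, while the function name, the environment variables, and the default key location under $HOME/.ssh are all assumptions made for illustration.

```shell
# ascp_to_ncbi FILE REMOTE_DIR
# Upload FILE to REMOTE_DIR on the NCBI upload server with the flags used
# in the upload command earlier in this post.
# NOT the author's ascpToNCBI script; a hypothetical stand-in.
# Hypothetical environment variables (all optional):
#   NCBI_KEY      putty private key, absolute path
#                 (default: $HOME/.ssh/ucdbioinfo.ppk, an assumed location)
#   NCBI_ACCOUNT  upload account (default: asp-ucdbioinfo)
#   NCBI_RATE     rate cap passed to -l (default: 300M)
#   ASCP          ascp binary; ASCP=echo prints the arguments only
ascp_to_ncbi() {
    if [ $# -ne 2 ]; then
        echo "usage: ascp_to_ncbi FILE REMOTE_DIR" >&2
        return 2
    fi
    if [ ! -e "$1" ]; then
        echo "ascp_to_ncbi: no such file: $1" >&2
        return 1
    fi
    "${ASCP:-ascp}" -i "${NCBI_KEY:-$HOME/.ssh/ucdbioinfo.ppk}" -QTr \
        -l "${NCBI_RATE:-300M}" "$1" \
        "${NCBI_ACCOUNT:-asp-ucdbioinfo}@upload.ncbi.nlm.nih.gov:$2"
}
```

Running it once with ASCP=echo in front shows the exact command line that would be executed, without touching the network.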

Notes on Aspera ascp and ssh-agent interactions

When I tried to download a file from NCBI using the Aspera Connect command-line utility “ascp” on a Linux box, I was hit with the following error messages:

ascp: failed to open ssh session., exiting.

Session Stop  (Error: failed to open ssh session.)

I thought that it might be a problem on the NCBI side or an Aspera Connect version mismatch, but it turned out to be an issue with ssh-agent on my Linux box. I didn’t see any Aspera documentation mentioning ssh-agent working with or against ascp, but ascp does interact with it behind the scenes.

My guess is that Aspera Connect (ascp) incorporates code from OpenSSH or putty for authentication, and that it tries the keys in the ssh-agent first (either ignoring the -i option or timing out before it offers the key given with -i). In my case, I had a number of keys in my ssh-agent, and SSH authentication with the NCBI server would time out. Workarounds:

1. Delete all keys from your ssh-agent before you start your ascp download.

2. Put asperaweb_id_dsa.openssh as the first key in your ssh-agent. In this case, you can even omit the -i option:

ssh-add asperaweb_id_dsa.openssh
ascp -Q -l100m anonftp@ftp-private.ncbi.nlm.nih.gov:/1GB .

These commands work just fine.

Apparently, ascp works with ssh-agent as long as the asperaweb_id_dsa.openssh key is early in the ssh-agent list. The earlier mentioned step of converting an OpenSSH key to a putty key is not necessary on Linux and other Unix systems.
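Before starting a transfer, it can help to check how many keys the agent is holding. The sketch below counts them with ssh-add -l, which exits with 0 and prints one line per key, exits with 1 when the agent is empty, and exits with 2 when no agent can be contacted. The function name agent_key_count is made up for illustration.

```shell
# agent_key_count
# Print the number of keys held by the running ssh-agent.
# Returns 2 (and prints nothing on stdout) if no agent can be contacted.
# NOTE: agent_key_count is a hypothetical helper name.
agent_key_count() {
    if keys=$(ssh-add -l 2>/dev/null); then
        # Exit status 0: one line of output per key.
        printf '%s\n' "$keys" | wc -l | tr -d ' '
    else
        status=$?
        if [ "$status" -eq 1 ]; then
            # Exit status 1: the agent is running but holds no keys.
            echo 0
        else
            echo "agent_key_count: cannot contact ssh-agent" >&2
            return 2
        fi
    fi
}
```

If the count is more than a handful, ssh-add -D empties the agent (workaround 1); the keys can be re-added with ssh-add once the transfer is done.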

It appears that the Aspera Connect web plugin has the same problem as the command line, at least on Ubuntu Linux. I would get the following error message:

Error-failed to open ssh session. (Code 19)

The recipe to correct the problem is the same as above, even if you download or upload using your web browser!

ASCP is a bandwidth hog

Please set a sensible rate limit for ascp (-l50m on a gigabit network, for example). Otherwise, ascp will use all the bandwidth it sees. UDP has no mechanism to slow itself down, so your whole network will suffer dropped packets; web browsers complaining that websites are unavailable is one symptom of ascp hogging the network bandwidth.
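The -l50m example on a gigabit network works out to 5% of the link speed (50 of 1000), assuming the m suffix means megabits per second. The sketch below turns that ratio into a small calculator for other link speeds. The function name ascp_rate_flag and the 5% default are assumptions taken from that single example, not a recommendation from Aspera or NCBI, so adjust the percentage to whatever your network can tolerate.

```shell
# ascp_rate_flag LINK_MBPS [PERCENT]
# Print an ascp -l flag that caps the transfer at PERCENT (default: 5)
# of a link whose speed is LINK_MBPS megabits per second.
# The cap is rounded down to a whole number, with a minimum of 1.
# NOTE: ascp_rate_flag and the 5% default are hypothetical choices.
ascp_rate_flag() {
    link_mbps=$1
    percent=${2:-5}
    cap=$(( link_mbps * percent / 100 ))
    [ "$cap" -ge 1 ] || cap=1
    printf '%s\n' "-l${cap}m"
}
```

For a gigabit link, ascp_rate_flag 1000 prints -l50m, matching the example above; ascp_rate_flag 100 10 prints -l10m for a 100 Mb/s link capped at 10%.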