Unix File System Structure
Basic Unix Commands
connect to another computer
list files in current directory
ls
long format listing
ls -l
long format, human readable file sizes, reverse time listing
ls -lhtr
Print characters to screen:
display current working directory
pwd
make new directory “fastafiles”
mkdir fastafiles
change current working directory to “fastafiles”
cd fastafiles
move up one directory
cd ..
create a link to “contigs.fa”
ln -s contigs.fa link_to_contigs
edit file “contigs.fa”, using nano
nano contigs.fa
paged viewing of text file “contigs.fa”
more contigs.fa
scroll through text file “contigs.fa”
less contigs.fa
view contents of text file “contigs.fa”
cat contigs.fa
show the beginning of a file
head contigs.fa
show the end of a file
tail contigs.fa
search for text in a file line by line
grep ACGTACGTAAAA contigs.fa
Line count of a file
wc -l contigs.fa
Cut out certain columns in a file:
cut -d' ' -f1 contigs.fa
Sort a file:
compress a file or directory
uncompress a file
remove (empty) directory “subdir”
rmdir subdir
copy file “contigs.fa” to new file “contigs2.fa”
cp contigs.fa contigs2.fa
rename (or move) file “contigs2.fa” to “contigs3.fa”
mv contigs2.fa contigs3.fa
remove file “contigs3.fa”
rm contigs3.fa
download a file from the internet using a URL
display manual (or help) page for “command”
man command
clear the screen
clear
exit a terminal window
exit
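As a rough illustration of how several of the commands above fit together, here is a short practice session. It is only a sketch: it works inside a throwaway directory created with mktemp, and the FASTA content is made up for the example.

```shell
# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir fastafiles                    # make a new directory
cd fastafiles                       # change into it
pwd                                 # display the current working directory

# Create a tiny, made-up FASTA file (hypothetical data, for practice only).
printf '>seq1\nACGTACGTAAAA\n>seq2\nTTTTGGGGCCCC\n' > contigs.fa

ls -l                               # long format listing
cp contigs.fa contigs2.fa           # copy the file
mv contigs2.fa contigs3.fa          # rename the copy
ln -s contigs.fa link_to_contigs    # create a symbolic link
head -n 1 link_to_contigs           # reading the link reads contigs.fa
rm contigs3.fa                      # remove the renamed copy

cd ..                               # move up one directory
ls fastafiles                       # contigs.fa and link_to_contigs remain
```

Because everything happens under the temporary directory, the session can be repeated as often as you like without affecting your own files.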
Unix commands can be chained into a sequence in which the output of one command becomes the input to the next. The construct for doing this is the pipe character (|). For example:
grep ">" contigs.fa | wc -l
will count the number of lines with the text “>” in them. This is useful for counting the number of sequences in a fasta file.
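To try the pipe on something small, the sketch below builds a made-up three-sequence FASTA file in a temporary directory (the file contents and the variable names are assumptions for illustration) and counts its header lines two ways.

```shell
workdir=$(mktemp -d)

# Hypothetical three-sequence FASTA file, made up for this example.
printf '>contig1\nACGTACGTAAAA\n>contig2\nGGGGCCCC\n>contig3\nACGTACGTTTTT\n' > "$workdir/contigs.fa"

# grep prints the lines containing ">", and wc -l counts the lines it receives.
n_seqs=$(grep ">" "$workdir/contigs.fa" | wc -l)
echo "sequences: $n_seqs"

# grep can also count matching lines by itself with the -c switch.
n_seqs_direct=$(grep -c ">" "$workdir/contigs.fa")
echo "sequences (grep -c): $n_seqs_direct"
```

Note that the quotes around ">" matter: without them the shell would treat the character as an output redirection instead of passing it to grep.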
The output of a Unix command can be redirected to a file by using the greater-than character (>). For example:
grep "ACGTACGT" contigs.fa > lines.out
will create a file named lines.out that contains only the lines from contigs.fa that have the text “ACGTACGT” in them.
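A minimal sketch of redirection on a made-up file (the sequences below are invented for the example) shows what ends up in the output file:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical FASTA file; two of its sequence lines contain ACGTACGT.
printf '>contig1\nACGTACGTAAAA\n>contig2\nGGGGCCCC\n>contig3\nTTACGTACGTTT\n' > contigs.fa

# Nothing is printed to the screen; the matching lines go into lines.out.
grep "ACGTACGT" contigs.fa > lines.out

# Look at what was written.
cat lines.out
```

Keep in mind that > replaces the contents of the target file if it already exists, so check the file name before running the command.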
Most Unix commands have options (sometimes called “switches”) that change the behavior of the command. Typically, a switch is a “-” followed by a letter. For example:
wc -c contigs.fa
will count the number of characters in contigs.fa, however:
wc -l contigs.fa
will count the number of lines in contigs.fa. Switches can also be a “-” or “--” followed by a word. Switches can also take an argument that changes the behavior of the switch:
head -n 20 contigs.fa
head -n 40 contigs.fa
will give you the first 20 and 40 lines, respectively, of the file.
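The effect of switches is easiest to see on a very small file. The sketch below uses a made-up three-line file named sample.txt (an assumption for illustration) so the numbers are easy to check by eye.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Three short lines: "a", "bb", "ccc" (each followed by a newline).
printf 'a\nbb\nccc\n' > sample.txt

wc -l sample.txt      # -l: count lines
wc -c sample.txt      # -c: count bytes (the same as characters for plain ASCII text)

# -n takes an argument: how many lines head should print.
head -n 1 sample.txt
top_two=$(head -n 2 sample.txt)
echo "$top_two"
```

Changing only the switch (or its argument) changes what the same command reports about the same file.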
Relative vs. Absolute Path
You can specify a file name using either relative or absolute paths. A relative path is a path that is relative to your current directory. An absolute path is the full path name of the file, starting from the root directory (/).
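The difference can be sketched with a temporary directory. The directory layout and the variable name base below are assumptions made up for the example; the point is that the relative path changes when you move, while the absolute path does not.

```shell
base=$(mktemp -d)                       # an absolute path to a scratch directory
mkdir "$base/fastafiles"
echo ">seq1" > "$base/fastafiles/contigs.fa"

cd "$base"
cat fastafiles/contigs.fa               # relative: resolved from the current directory
cat "$base/fastafiles/contigs.fa"       # absolute: works from anywhere

cd fastafiles
cat contigs.fa                          # the relative path is different now that we moved
cat ../fastafiles/contigs.fa            # .. refers to the parent directory
cat "$base/fastafiles/contigs.fa"       # the absolute path is unchanged
```

All five cat commands print the same file; only the way the file is named differs.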
You can complete the name of a file/program by pressing the TAB key.
Pressing the Up and Down arrows will cycle through your list of commands, which you can edit and rerun.
Everything is Case Sensitive
All commands, file names, directories, etc. are case sensitive. For example, cd is not the same thing as CD. Most things on Unix systems are lower case, but they don’t have to be.
Spaces are delimiters
Remember that spaces are the boundaries between each piece of a command. Spaces are how Unix differentiates between different parts of the command.
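One practical consequence is that a file name containing a space has to be quoted or escaped, otherwise it is read as two separate arguments. The sketch below (file names invented for the example) shows the difference in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Unquoted: the space separates two arguments, so two files are created.
touch my contigs.fa

# Quoted: the whole string is one argument, so one file with a space in its name.
touch "my contigs.fa"

# A backslash before the space has the same effect as quoting.
touch my\ other.fa

ls -l
```

For this reason many people avoid spaces in file names altogether and use underscores or dashes instead.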
Perl is a very useful language that we often use for manipulating and transforming data. Learning Perl is beyond the scope of this tutorial, but it is something you may want to add to your bioinformatics toolkit.
Scripts are simply text files that have commands in them that can be run. Open a file using nano:
nano myscript
Then, add a few commands into the script:
ls -l
wc -l myscript
pwd
Save the file and exit nano. Then change the permissions on the script to make it executable:
chmod 755 myscript
Then run the script:
./myscript
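The whole sequence can also be tried without an editor. In the sketch below a here-document stands in for typing the commands into nano, and everything happens in a scratch directory; those two choices are assumptions made for the example, not part of the steps above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write the three commands into a file named myscript
# (the here-document replaces the interactive nano session).
cat > myscript <<'EOF'
ls -l
wc -l myscript
pwd
EOF

# 755 = owner may read, write, and execute; everyone else may read and execute.
chmod 755 myscript

# ./ tells the shell to look for the script in the current directory.
script_output=$(./myscript)
echo "$script_output"
```

The output is simply what the three commands would print if typed one after another: a listing of the directory, the line count of the script itself, and the current directory.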
Getting Help for a command
Many commands have help text that shows you the options and what they do. With some commands you can type the command by itself and it will print out the options. Other commands will need a “-h” or “--help” switch to get the help text.