Lesson Schedule
|
|
What is Version Control
|
|
Setting Up Git
|
Use git config with the --global option to configure a user name, email address, editor, and other preferences once per machine.
GitHub needs an SSH key to allow access
|
Creating a Repository
|
|
Tracking Changes
|
git status shows the status of a repository.
Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up) and the local repository (where commits are permanently recorded).
git add puts files in the staging area.
git commit saves the staged content as a new commit in the local repository.
Write commit messages that accurately describe your changes.
git log --decorate lists the commits made to the local repository and labels them with the branches and tags that point to them, including remote-tracking branches, so you can see whether they are up to date with a remote repository.
|
Exploring History
|
|
Remote Repositories
|
Git can easily synchronise your local repository with a remote one
GitHub needs an SSH key to allow access
Git can help you resolve ‘conflicting’ modifications to text files
|
Branching
|
Branches are parallel versions of a repository
You can easily switch between branches, and merge their changes
Branches help with code sharing and collaboration
|
Ignoring Things
|
|
Survey
|
|
Reference
|
|
Lesson Schedule
|
|
Introduction
|
Well-made software is easier to expand and reuse
You need to produce reproducible research.
You are a user of your own code.
|
Issues
|
Issues are a way of recording bugs or feature requests.
Issues can be categorised by type.
Issues can reference other issues, and be referenced by commits.
|
Project Management
|
Projects are broken down into self-contained tasks.
Tasks are represented as cards on a board.
Cards are arranged to show their status.
Issues can be added to project boards and labelled.
Project boards can show the priority of their tasks.
Forks are copies of entire repositories that can be synced up with the original.
|
Release Management
|
Releases are stable versions of the code.
Zenodo can automatically generate DOIs for releases.
Software licenses can restrict what others can do with your code.
|
Writing Sustainable Code
|
Always assume that someone else will read your code at a later date, and that someone may be your future self.
Give variables and functions descriptive names that add context and make your code more readable.
Add comments to explain why something was done in a certain way, if it is not obvious.
Don’t add comments that just restate what the code already clearly does.
Use docstrings at the start of functions and files to explain their behaviour and input/output parameters.
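A minimal sketch of these points in Python. It assumes a made-up temperature-conversion helper, and the function name and its purpose are illustrative rather than part of the lesson. It shows a descriptive name, a docstring covering behaviour and input/output, and a comment that explains why rather than what:

```python
def mean_temperature_celsius(readings_fahrenheit):
    """Return the mean of Fahrenheit readings, converted to Celsius.

    Parameters:
        readings_fahrenheit: a non-empty list of temperatures in degrees Fahrenheit.

    Returns:
        The mean temperature in degrees Celsius, as a float.
    """
    # Average first and convert once: the conversion is linear, so this gives
    # the same answer as converting every reading, with less arithmetic.
    mean_fahrenheit = sum(readings_fahrenheit) / len(readings_fahrenheit)
    return (mean_fahrenheit - 32) * 5 / 9


# Compare with a name that gives no context, which forces readers to decode the body:
# def f(x): return (sum(x) / len(x) - 32) * 5 / 9
```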
|
Managing a Mini-Project
|
Problems with code and documentation can be tracked as issues.
Issues can be managed on a project board.
Issues can be fixed using the feature-branch workflow.
Stable versions of the code can be published as releases.
|
Survey
|
|
Reference
|
|
Lesson Schedule
|
|
Python Basics
|
Start the Python interpreter by typing python in the shell.
Variables are names that refer to values stored in memory; they are used to access data.
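Here is a small example you could type into the interpreter. The variable names and values are hypothetical:

```python
# Assign values to names with =
patient_id = "001"
weight_kg = 60.0

# Use the names to get the stored values back and compute with them
weight_lb = 2.2 * weight_kg
print(patient_id, weight_kg, weight_lb)
```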
|
Arrays, Lists, etc.
|
A list is an ordered collection of items of any type.
Values in a list can be accessed using their index in square brackets, e.g. my_list[ix].
Lists can be manipulated in place using methods, e.g. my_list.reverse().
Ranges of values in a list can be obtained via slicing, e.g. my_list[start:stop], which includes start but excludes stop.
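The sketch below shows indexing, an in-place method and slicing. It uses an illustrative list named my_list, matching the names in the points above:

```python
# A list can hold items of any type, in order
my_list = [10, "twenty", 30.0, [40]]

# Access a value by its index (indices start at 0; -1 is the last item)
first = my_list[0]     # 10
last = my_list[-1]     # [40]

# Methods such as reverse() change the list in place (and return None)
my_list.reverse()      # my_list is now [[40], 30.0, "twenty", 10]

# Slicing returns a new list from index start up to, but not including, stop
middle = my_list[1:3]  # [30.0, "twenty"]
```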
|
Repeating actions using loops
|
|
Processing data files
|
The Python function open lets us read from (mode 'r') or write to (mode 'w') files by returning a file object (a file handle).
We can use string operations such as line.split(',') to process data in files.
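A minimal sketch that writes a small CSV file and then reads it back with open and line.split(','). The file name and contents are made up. The file goes in a temporary directory so the example runs anywhere:

```python
import os
import tempfile

# Write a small CSV file first so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w") as out_file:          # 'w': open for writing
    out_file.write("0,1,3\n2,5,4\n")

# Read it back line by line and split each line on commas
rows = []
with open(path, "r") as in_file:           # 'r': open for reading
    for line in in_file:
        fields = line.strip().split(",")   # e.g. ["0", "1", "3"]
        rows.append([int(value) for value in fields])

print(rows)  # [[0, 1, 3], [2, 5, 4]]
```

The with statement closes each file automatically when its block ends.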
|
Making choices
|
We can use logical operations to change the behavior of our code when it meets certain conditions.
Using if, elif, and else, we can check conditions and add a branch that runs if none of the conditions are met.
We can combine conditions using and and or to make more complicated logical statements.
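The sketch below branches with if/elif/else and combines conditions with and/or. All three helper functions are hypothetical examples:

```python
def describe(value):
    """Describe a number using if/elif/else."""
    if value > 0:
        return "positive"
    elif value < 0:
        return "negative"
    else:
        # Reached only when none of the conditions above are met
        return "zero"


def is_valid_percentage(value):
    # 'and': both comparisons must be true
    return value >= 0 and value <= 100


def is_weekend(day):
    # 'or': at least one comparison must be true
    return day == "Saturday" or day == "Sunday"
```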
|
Modularising your code using functions
|
A function is created using the def keyword.
Functions take parameters that are specified in the function definition, and use the return keyword to specify their output.
We can use a module to keep our functions separate from the main body of our code to improve code readability.
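A minimal sketch of defining functions with def and returning results with return. The temperature-conversion functions are illustrative:

```python
def fahr_to_celsius(temp_fahr):
    """Convert a temperature from Fahrenheit to Celsius."""
    # temp_fahr is the parameter named in the definition;
    # return sends the result back to the caller
    return (temp_fahr - 32) * 5 / 9


def fahr_to_kelvin(temp_fahr):
    """Convert Fahrenheit to Kelvin by reusing fahr_to_celsius."""
    return fahr_to_celsius(temp_fahr) + 273.15


boiling_celsius = fahr_to_celsius(212)  # 100.0
```

You can keep these out of the main script by saving them in their own file, say temperature.py (a hypothetical name). The main program would then use from temperature import fahr_to_celsius.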
|
Handling Errors
|
|
Command-Line Programs
|
Python uses the sys module to access command-line arguments: sys.argv is a list of them, with the script name first.
The output of a Python program can be used in a pipeline. However, Python raises an error if the next program in the pipeline stops reading early (a broken pipe), so we use the signal library to make it handle piped output correctly.
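The sketch below is a command-line program that reads sys.argv and restores the default SIGPIPE handling. It assumes a Unix-like system, because SIGPIPE does not exist on Windows. Echoing the arguments is an illustrative behaviour, and signal.SIG_DFL is one common way to apply the signal fix:

```python
import signal
import sys


def main(args):
    """Echo each command-line argument with its position; args mirrors sys.argv."""
    # args[0] is the script name, so the real arguments start at index 1
    lines = [f"{position}: {argument}" for position, argument in enumerate(args[1:], start=1)]
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    # Restore the default SIGPIPE behaviour (terminate quietly) so that piping
    # into a command that stops reading early does not raise BrokenPipeError.
    # SIGPIPE is missing on Windows, hence the check.
    if hasattr(signal, "SIGPIPE"):
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
    main(sys.argv)
```

If saved as echo_args.py (a hypothetical name), running python echo_args.py a b prints 1: a and 2: b on separate lines.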
|
Reading and analysing patient data using libraries
|
Python has many libraries that add to the core language to improve functionality in specific use cases.
NumPy is a numerical Python library that makes working with vectors, matrices, or large data tables easier.
NumPy can load datasets directly from CSV files, without using Python’s built-in file-handling functions.
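A minimal sketch of loading a CSV file with numpy.loadtxt, assuming NumPy is installed. The file and its values are made up. Treating rows as patients and columns as days is an assumption for illustration:

```python
import os
import tempfile

import numpy

# Create a small comma-separated file (made-up values) so the example runs anywhere
path = os.path.join(tempfile.mkdtemp(), "patients.csv")
with open(path, "w") as csv_file:
    csv_file.write("0,1,3\n2,5,4\n1,0,2\n")

# loadtxt parses the whole file into a 2-D array of floats in one call
data = numpy.loadtxt(fname=path, delimiter=",")

print(data.shape)                # (3, 3): 3 rows (patients) by 3 columns (days)
print(numpy.mean(data, axis=0))  # mean of each column: [1. 2. 3.]
```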
|
Data Visualisation
|
We can use matplotlib to create and manipulate a wide variety of plots in Python.
Once a plot has been made we can use matplotlib’s function savefig to output it in formats appropriate for publication.
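The sketch below saves a plot with savefig, assuming matplotlib is installed. The data are made-up values. It selects the non-interactive Agg backend so the script can run without a display:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt

days = [0, 1, 2, 3]
mean_values = [0.0, 1.5, 3.0, 2.5]  # made-up values for illustration

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(days, mean_values, marker="o")
ax.set_xlabel("Day")
ax.set_ylabel("Mean value")

# Vector formats such as PDF scale cleanly for publication;
# dpi sets the resolution of raster formats such as PNG.
out_dir = tempfile.mkdtemp()
fig.savefig(os.path.join(out_dir, "mean_values.pdf"))
fig.savefig(os.path.join(out_dir, "mean_values.png"), dpi=300)
plt.close(fig)
```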
|
Python Style Guide
|
|
Survey
|
|
Challenges
|
|
Why Python?
|
|
Reference
|
|
Lesson Schedule
|
|
Introducing the Shell
|
|
Files and Directories
|
The file system is responsible for managing information on the disk.
Information is stored in files, which are stored in directories (folders).
Directories can also store other directories, which then form a directory tree.
cd [path] changes the current working directory.
ls [path] prints a listing of a specific file or directory; ls on its own lists the current working directory.
pwd prints the user’s current working directory.
/ on its own is the root directory of the whole file system.
Most commands take options (flags) that begin with a dash (-).
A relative path specifies a location starting from the current location.
An absolute path specifies a location from the root of the file system.
Directory names in a path are separated with / on Unix, but \ on Windows.
. on its own means ‘the current directory’; .. means ‘the directory above the current one’.
--help is an option supported by many Bash commands, and by programs that can be run from within Bash, to display more information on how to use them.
man [command] displays the manual page for a given command.
|
Creating Things
|
Command line text editors let you edit files in the terminal.
You can open up files with either command-line or graphical text editors.
nano [path] creates a new text file at the location [path], or edits an existing one.
cat [path] prints the contents of a file.
rmdir [path] deletes an (empty) directory.
rm [path] deletes a file; rm -r [path] deletes a directory (and its contents!).
mv [old_path] [new_path] moves a file or directory from [old_path] to [new_path].
mv can be used to rename files, e.g. mv a.txt b.txt renames a.txt to b.txt.
Using . in mv can move a file without renaming it, e.g. mv a/file.txt b/. moves file.txt into b, keeping its name.
cp [original_path] [copy_path] creates a copy of a file at a new location.
|
Wildcards, Pipes and Filters
|
wc counts lines, words, and characters in its inputs.
* matches zero or more characters in a filename, so *.txt matches all files ending in .txt.
? matches any single character in a filename, so ?.txt matches a.txt but not any.txt.
cat displays the contents of its inputs.
sort sorts its inputs.
head displays the first 10 lines of its input by default.
tail displays the last 10 lines of its input by default.
command > [file] redirects a command’s output to a file (overwriting any existing content).
command >> [file] appends a command’s output to a file.
[first] | [second] is a pipeline: the output of the first command is used as the input to the second.
The best way to use the shell is to use pipes to combine simple single-purpose programs (filters).
|
Finding Things
|
find finds files with specific properties that match patterns.
grep selects lines in files that match patterns.
$([command]) inserts a command’s output in place.
|
Shell Scripts
|
Save commands in files (usually called shell scripts) for re-use.
bash [filename] runs the commands saved in a file.
$@ refers to all of a shell script’s command-line arguments.
$1, $2, etc., refer to the first command-line argument, the second command-line argument, etc.
Use Ctrl+R to search through the previously entered commands.
Use history to display recent commands, and ![number] to repeat a command by number.
Place variables in quotes if the values might have spaces in them.
Letting users decide what files to process is more flexible and more consistent with built-in Unix commands.
|
Loops
|
A for loop repeats commands once for every thing in a list.
Every for loop needs a variable to refer to the thing it is currently operating on.
Use $name to expand a variable (i.e., get its value). ${name} can also be used.
Do not use spaces, quotes, or wildcard characters such as ‘*’ or ‘?’ in filenames, as it complicates variable expansion.
Give files consistent names that are easy to match with wildcard patterns to make it easy to select them for looping.
|
Additional Exercises
|
date prints the current date in a specified format.
Scripts can save the output of a command to a variable using $(command)
basename removes directories from a path to a file, leaving only the name
cut lets you select specific columns from files, with -d',' letting you select the column separator, and -f letting you select the columns you want.
|
Survey
|
|
Reference
|
|