Electives & Concentrations (2024)

Introduction to TACC

January 2023

Students in the course will learn what a cluster is and how to use the world-class clusters available at the Texas Advanced Computing Center (TACC). The course will discuss the basic architecture of the Lonestar and Stampede computing clusters, how they compare to a regular computer, job launchers and job scheduling, and how to submit your own jobs to TACC. Custom tools by the Bioinformatics Consulting Group for job submission will be emphasized. Familiarity with a Unix command line is a prerequisite. Students must also establish a TACC account and can do so by visiting this link.

Previously offered:

Introduction to Python I

Python is a simple and popular programming language that can be used across platforms and is useful for a wide variety of tasks.

This short course is a basic introduction to scripting using Python. Skills taught will include data structures, input and output, loops, and, if time permits, function definitions. These tools will be useful for researchers in many fields for data management, automating tedious computational tasks, and handling 'big data'. This course is taught at an introductory level and is appropriate for students with no experience, but will contain material and techniques helpful to moderately experienced Python programmers.

Topics to be covered:

  • Introduction
  • Control Flow
  • Lists
  • Input and Output
  • Strings
  • Functions
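
As a rough illustration of how the topics above fit together, here is a minimal sketch that is not taken from the course materials. The function name `count_characters` and the sample strings are made up for this example. It combines a function definition, a loop, an if/else branch, a list of strings, and printed output.

```python
def count_characters(text):
    """Return a dictionary mapping each character in text to its count."""
    counts = {}
    for char in text.upper():      # loop over the characters of a string
        if char in counts:         # control flow: branch on a condition
            counts[char] += 1
        else:
            counts[char] = 1
    return counts


# A list of strings (made-up sample values, for illustration only).
samples = ["acgt", "GGCA", "ttta"]

for sample in samples:
    # Output: print each string next to its character counts.
    print(sample, count_characters(sample))
```

The same pattern (read values, loop over them, summarize, print) underlies many of the small data-management scripts the description mentions.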

Introduction to Unix

Learn the basics of using UNIX from the command line. Introductory topics include the filesystem, the shell, permissions, and text files. The course will touch on manipulating text files using standard UNIX utilities, how to string utilities together, and how to output the results to files. The goal of the course is to develop some basic comfort at the command line, get a sense of what's possible, and learn how to find help.

Introduction to R, Part I

This course introduces R, a free and open-source software package used for statistical computing and graphics. We will cover navigating the free graphical user interface RStudio, importing and exporting data, combining (merging) data, creating and manipulating variables, basic data descriptives, and visualization. The course will introduce installation of R packages to leverage two popular workflows: the tidyverse and ggplot2.

After completing this course, a new user should be able to:

  • Navigate RStudio.
  • Install and use R packages.
  • Import/export data from/to external files.
  • Create and manipulate new variables.
  • Generate simple descriptive statistics to summarize data.
  • Graphically display various types of data.
  • Edit features of graphs (titles/labels, colors, shading, etc.).
  • Make graphs using ggplot2.

Introduction to Data Visualization & SQL

This course introduces both the principles and the practice of scientific data visualization, especially as applied to large multivariate data sets. It covers common methods of visually summarizing data and illustrating relationships between variables of various common types (continuous, categorical, etc.), as well as design concepts for increasing the clarity of quantitative graphical communication. It also introduces modern "grammar of graphics" ideas as a foundation for thinking about, relating, and ultimately building new types of informative plots. Implementations of the covered methods in both R and Python will be presented.

Students should bring their own laptops to the course. Installation of either R (with ggplot2) or Python (with matplotlib, seaborn, and plotnine) prior to class is required.

Working with SQL Databases

This is an introductory course on the basics of database technology using MySQL, including the structured query language (SQL) along with modes of database interaction specific for bioinformatics workflows. We begin with a hands-on introduction to databases and the MySQLWorkbench user interface, creating and populating a schema, followed by simple and more complex queries to manipulate different subsets of data. Finally, we discuss bulk loading of database tables and programmatic database access, for example from Python.
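
To give a feel for what "programmatic database access from Python" can look like, here is a minimal sketch under stated assumptions. The course itself uses MySQL; this example substitutes Python's built-in `sqlite3` module so that it runs without a database server. The table name `gene`, its columns, and all rows are made up for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server
# used in the course, so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a small table (hypothetical name and columns).
cur.execute(
    "CREATE TABLE gene ("
    "id INTEGER PRIMARY KEY, symbol TEXT, chrom TEXT, length INTEGER)"
)

# Bulk-load several rows in one call; the values are invented placeholders.
rows = [
    ("geneA", "chr1", 1200),
    ("geneB", "chr1", 800),
    ("geneC", "chr2", 1500),
]
cur.executemany(
    "INSERT INTO gene (symbol, chrom, length) VALUES (?, ?, ?)", rows
)
conn.commit()

# A simple query, then a grouped summary query.
cur.execute("SELECT symbol FROM gene WHERE length > ? ORDER BY symbol", (1000,))
long_genes = [row[0] for row in cur.fetchall()]

cur.execute(
    "SELECT chrom, COUNT(*), AVG(length) FROM gene "
    "GROUP BY chrom ORDER BY chrom"
)
summary = cur.fetchall()

print(long_genes)
print(summary)
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation. MySQL client libraries for Python follow the same connect, cursor, execute, fetch pattern, though their placeholder syntax may differ.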

Introduction to R, Part II

This short course covers data analysis in R in greater depth than the introductory course. For various statistical methods (see list below), participants will learn how to prepare a dataset, test relevant assumptions, and carry out the analyses. In addition, students will be introduced to the premise of reproducible research and will create RMarkdown documents. This hands-on course will teach participants how to use R to run different types of analysis, interpret the output, and produce reproducible documents.

After completing this course, participants should be able to carry out:

  • Correlation and simple linear regression analysis.
  • Chi-squared tests.
  • T-tests and one-way ANOVA.
  • Multiple regression and multivariate ANOVA models.
  • Logistic regression.
  • Reproducible report creation in RStudio.

Machine Learning

This course introduces a selection of machine learning methods for both unsupervised learning (dimensionality reduction and clustering) and supervised learning (classification and regression). The phenomenon of model overfitting will be discussed along with techniques such as cross-validation for its assessment and quantification.
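
The idea of using cross-validation to detect overfitting can be sketched in a few lines. The example below is illustrative only and is not taken from the course: the 1-nearest-neighbour model, the function names, and the made-up data are all assumptions chosen so that the sketch needs nothing beyond the Python standard library.

```python
def predict_1nn(train, x):
    """Predict y for x by copying the y of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def mean_squared_error(train, test):
    """Average squared error of 1-NN predictions on the test points."""
    errors = [(predict_1nn(train, x) - y) ** 2 for x, y in test]
    return sum(errors) / len(errors)


def k_fold_cv(data, k):
    """Average held-out error over k folds; fold i holds out every k-th point."""
    fold_errors = []
    for i in range(k):
        test = data[i::k]
        train = [p for j, p in enumerate(data) if j % k != i]
        fold_errors.append(mean_squared_error(train, test))
    return sum(fold_errors) / k


# Made-up data: the line y = x with alternating +1 / -1 "noise".
data = [(x, x + (1 if x % 2 == 0 else -1)) for x in range(12)]

# Error on the same data the model memorized versus error on held-out folds.
train_error = mean_squared_error(data, data)
cv_error = k_fold_cv(data, k=4)
print("training error:", train_error)
print("cross-validated error:", cv_error)
```

Because a 1-nearest-neighbour model memorizes its training points, its training error is zero, while the cross-validated error is not. That gap is what cross-validation is meant to expose.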

Bash Scripts

This course will cover advanced topics in writing Bash shell scripts, providing tips, examples, and best practices for creating robust "pipeline scripts" that execute multiple processing steps. Topics include defining functions, argument processing and defaulting, error checking, effective use of awk, grep, and sed, as well as subtleties of Unix stream and text manipulation.

GitHub and Code Management

GitHub and Code Management is a hands-on workshop that will cover basic concepts and tools for version control and code management. In this course, you will learn Git and the commands you need for most day-to-day version control tasks. You'll also learn how to use GitHub to host your own Git repositories. More information about this course can be found at: https://alice-macqueen.github.io/2020-08-13-utexas/

Creating Publication Quality Graphics with ggplot2

ggplot2 is a plotting package in R that makes it simple to create complex plots from data in a data frame, allowing you to create and edit publication-quality plots with minimal adjustment and tweaking. In this course, you will learn how to produce plots using ggplot2, set universal plot settings and themes, apply faceting in ggplot2, and build and save complex and customized plots from data in a data frame.

Author: Lakeisha Bayer VM