Quantitative research resources

Organize

Organize your code, batch files, and even your data by using GitHub (GitHub in 10 easy steps) or another revision control tool.  Make changes to your programs iteratively: edit, run, and check in.  This lets you back out changes one at a time rather than dig through many changes you made at once.  Have your programs write log entries as they run so you can debug your code faster (see the sketch below).
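
To act on the logging suggestion, here is a minimal sketch using Python's standard logging module; the file name and messages are placeholders, not part of any specific U-M tool:

    import logging

    # Write timestamped log entries to a file alongside your results.
    logging.basicConfig(
        filename="analysis.log",  # hypothetical log file name
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )

    logging.info("Starting analysis run")
    try:
        result = 42  # placeholder for your actual computation
        logging.info("Computation finished: result=%s", result)
    except Exception:
        # Record the full traceback so failed runs are easy to diagnose.
        logging.exception("Run failed")
        raise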

Store your files in a safe place and back up regularly.

Geographical Information System (mapping, analysis, etc.): There are a variety of GIS options at the University of Michigan. See the Library Guide to learn more.

Basic computation resources

Some basic computation resources you can use (often a bit more capable than your own machine) are:

When you move to the next levels of computing, you will likely need to use the Linux command line (guide to learning it) and will need other programming skills.

See the CSCAR resources to get started, and learn more from trainings on tools such as Python and R.

Also check out David Blei's page.

Check out these training options:

  1. Big Data Summer Camp
  2. ARC: CSCAR events and TTC events
  3. Python:
    1. http://learnpythonthehardway.org/
    2. http://www.pythonlearn.com/book.php (this can be printed at the book station at the Library)
    3. Coursera has some great SI classes: https://www.coursera.org/specializations/data-science-python. The university now gives free versions of these courses to university students, staff, and alumni at online.umich.edu. The course materials are also available at www.py4e.com.
  4. Code 4 everyone - Computer Science
  5. Codecademy for Data Science
  6. Coursera for Data Scraping

Mid-range computing on a Linux-based statistical service

For jobs that take a long time to run, mid-range computing on a Linux-based statistical service is often a better choice than the basic computation resources.

ITS Statistical and Computation Service: You will need some Linux command line knowledge. See: Learning the Linux Command Line Interface
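
Long-running jobs benefit from checkpointing: saving intermediate state so an interrupted run can resume rather than start over. Here is a minimal sketch in Python; the file names and the work loop are placeholders, not part of the ITS service:

    import json
    import os

    CHECKPOINT = "checkpoint.json"  # hypothetical checkpoint file

    # Resume from the last saved state if a previous run was interrupted.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            state = json.load(f)
    else:
        state = {"next_item": 0, "results": []}

    items = list(range(1000))  # placeholder for your real work list

    for i in items[state["next_item"]:]:
        state["results"].append(i * i)  # placeholder computation
        state["next_item"] = i + 1
        if state["next_item"] % 100 == 0:
            # Periodically persist progress so little work is lost.
            with open(CHECKPOINT, "w") as f:
                json.dump(state, f)

    with open("final_results.json", "w") as f:
        json.dump(state["results"], f)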

High Performance Computing

High Performance Computing runs on Linux-based computer clusters that support multi-threaded and multi-node jobs, managed by a scheduler for optimal performance (run by LSA TS and ARC TS). Check out a Quick Introduction to HPC Resources.

It’s best to take some courses and consult with the folks at arcts-support@umich.edu before starting your work on these systems.  You need to know how to properly call your software and access your data, how jobs are scheduled and run, and which resources you need or will work best for your analysis.
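
As an illustration of working with a scheduler: Great Lakes uses the Slurm scheduler, and Slurm accepts job scripts in any language that uses # for comments, including Python. The sketch below is a minimal batch script; the job name, account, and resource requests are hypothetical, and partition names should be checked against the Great Lakes User's Guide:

    #!/usr/bin/env python3
    #SBATCH --job-name=example-job      # hypothetical job name
    #SBATCH --account=example-account   # replace with your Slurm account
    #SBATCH --partition=standard
    #SBATCH --time=01:00:00             # wall-clock limit: 1 hour
    #SBATCH --nodes=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=8G

    # The scheduler reads the #SBATCH directives above, then runs this
    # script on a compute node once the requested resources are free.
    import os

    print("Job", os.environ.get("SLURM_JOB_ID"), "running on",
          os.environ.get("SLURMD_NODENAME"))
    # ... your analysis code here ...

You would submit a script like this with sbatch and check its progress with squeue; a consult with ARC can help you size the resource requests.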

  • ArcConnect — Graphical user access to Great Lakes and the easiest way to use it. To log in, go to https://connect.arc-ts.umich.edu/ (with VPN enabled).
  • Great Lakes (cluster) — The standard starting point for cluster computing. The Great Lakes User's Guide may be helpful.
  • Armis — When you are working with secure or confidential data.
  • XSEDE — When you need more resources than you can afford locally and equivalent resources exist in this NSF-funded system of clusters. Get consulting help choosing which clusters to work on based on your data and experience.
  • Public Cloud — Reach out for a consult, and keep in mind that you have to pay to move your data in and out of this service if you use Amazon storage.
  • Yottabyte — For federally restricted analysis.
