This workshop provides an introduction to Jupyter Notebooks, which allow users to write and execute code and have become the de facto software environment for interactive data analysis, visualization, and machine learning. They are used by faculty for teaching, by researchers for data analysis, and by students for class projects. The workshop will teach how to use Jupyter Notebooks on local compute resources and, once local compute resources are exhausted, on supercomputers such as Expanse, located at the San Diego Supercomputer Center (SDSC) at the University of California San Diego (UCSD). The Expanse portal provides an integrated, easy-to-use web interface for accessing Expanse’s high-performance computing (HPC) resources. The workshop will cover setting up reproducible software environments with Conda that transfer from a local system to Expanse, demonstrate scaling up calculations to large datasets and parallel computing on Expanse, and teach how to run Jupyter Notebooks in batch mode.
1. Attendees will have a thorough understanding of the use of Jupyter Notebooks on high-performance computing (HPC) resources such as the Expanse supercomputer at SDSC.
2. Attendees will understand the basics of how to run Jupyter Notebooks securely in an HPC environment.
3. Attendees will have a good understanding of how to create reproducible software environments for running Jupyter Notebooks anywhere from a laptop to an HPC system.
4. Attendees will learn how to get compute time on HPC resources at SDSC.
Goals of the Workshop
Attendees will learn:
1. How to set up and use Conda environments for reproducible research
2. How to write and share computational analyses in Jupyter Notebooks
3. How to scale up data analysis to larger-than-memory (out-of-core) datasets and process them in parallel on CPU and GPU nodes
4. How to run Jupyter Notebooks on Expanse
5. How to get an allocation on Expanse via SDSC’s [email protected] program
Introduction to customizing Python environments
Computing with Jupyter Notebooks anywhere (from laptops to supercomputers) for reproducible research
Enrollment is free and is limited to 30 participants per workshop. It is hoped that the small workshop size will facilitate networking and promote collaboration across institutions by individuals who share common interests in research and education. Participation priority is for current HSI faculty and staff who teach undergraduate STEM courses. Non-HSI faculty and staff who teach undergraduate STEM courses are eligible to apply if they (1) currently collaborate as PIs/co-PIs on a funded or pending NSF EHR/DUE grant that includes HSI faculty/staff as PIs/co-PIs, or (2) would like to network to find HSI partners for future collaborative projects in education or research. Admission priority is for faculty within the first 10 years of their first academic tenure-track appointment. Applicants should be aware that the selection decision is final and that summary reviews are not provided.
NIH R24MH120037, NIH R01EB023297, NIH R01NS047293, NSF DBI1935749, NIH U24EB029005, NSF 2017767, NSF 1928224
Dr. Peter Rose is Director of the Structural Bioinformatics Lab and Lead for Bioinformatics and Biomedical Applications at the San Diego Supercomputer Center (SDSC), UC San Diego. He has previously led bioinformatics and scientific computing departments at Pfizer and Agouron Pharmaceuticals. He led the RCSB Protein Data Bank team at UCSD, one of the largest open-access databases in biology. In his current position at SDSC, he is involved in projects to integrate cross-disciplinary data for novel COVID-19 diagnostic and surveillance methods, and in the application of knowledge graphs to COVID-19 and precision medicine datasets. His research interests include the development of interactive and scalable platforms for data integration and machine learning in biomedicine and structural biology. He is an advocate for open-source software development and reproducible computational research.
Dr. Marty Kandes is a Computational and Data Science Research Specialist in the High-Performance Computing User Services Group at SDSC. He currently helps manage user support for Expanse and Voyager, SDSC’s two NSF-funded supercomputers, and maintains all of the Singularity and Docker containers supported on these systems. He is involved in research related to benchmarking of machine learning applications. Marty obtained his Ph.D. in Computational Science in 2015 from the Computational Science Research Center at San Diego State University, where his research focused on studying quantum systems in rotating frames of reference through the use of numerical simulation. He also holds an M.S. in Physics from San Diego State University and B.S. degrees in both Applied Mathematics and Physics from the University of Michigan, Ann Arbor. His current research interests include problems in Bayesian statistics, combinatorial optimization, nonlinear dynamical systems (e.g., epidemiological modeling), and numerical partial differential equations.
Nicole Wolter is a Computational and Data Science Research Specialist in the High-Performance Computing User Services Group at SDSC. She currently manages accounts and allocations and provides user support for the three HPC systems at SDSC. Nicole graduated from San Diego State University with a degree in Computer Science in 2001. She is currently helping users port their AI applications to SDSC’s NSF-funded AI supercomputer, Voyager.