Jupyter Notebook

Workshop Summary

This workshop introduces Jupyter Notebooks, which let users write and execute code and have become the de facto software environment for interactive data analysis, visualization, and machine learning. They are used by faculty for teaching, by researchers for data analysis, and by students for class projects. The workshop will teach how to use Jupyter Notebooks on local compute resources and, once those resources are exhausted, on supercomputers such as Expanse, located at the San Diego Supercomputer Center (SDSC) at the University of California San Diego (UCSD). The Expanse portal provides an integrated, easy-to-use web interface to Expanse’s high-performance computing (HPC) resources. The workshop will cover setting up reproducible software environments with Conda that transfer from a local system to Expanse, demonstrate scaling up calculations to large datasets and parallel computing on Expanse, and teach running Jupyter Notebooks in batch mode.

Learning Objectives

1. Attendees will have a thorough understanding of how to use Jupyter Notebooks on high-performance computing (HPC) resources such as the Expanse supercomputer at SDSC.
2. Attendees will understand the basics of running Jupyter Notebooks securely in an HPC environment.
3. Attendees will have a good understanding of how to create reproducible software environments for running Jupyter Notebooks anywhere from a laptop to an HPC system.
4. Attendees will learn how to get compute time on HPC resources at SDSC.

Goals of the Workshop

Attendees will learn how to:
1. Set up and use Conda environments for reproducible research
2. Write and share computational analyses in Jupyter Notebooks
3. Scale up data analysis to larger-than-memory (out-of-core) datasets and process them in parallel on CPU and GPU nodes
4. Run Jupyter Notebooks on Expanse
5. Get an allocation on Expanse via SDSC’s HPC@MSI program
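
As a rough illustration of the out-of-core idea in goal 3, the sketch below computes a column mean by reading a CSV file in fixed-size chunks, so only one chunk is in memory at a time. It is not taken from the workshop materials: the function names, the chunk size, and the sample data are hypothetical, and it uses only the Python standard library rather than whatever tools the workshop itself demonstrates.

```python
import csv
import io
from itertools import islice


def chunked_rows(reader, chunk_size):
    """Yield lists of at most chunk_size rows from a csv reader."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def out_of_core_mean(file_obj, column, chunk_size=1000):
    """Compute the mean of one numeric column without loading the whole file."""
    reader = csv.DictReader(file_obj)
    total = 0.0
    count = 0
    for chunk in chunked_rows(reader, chunk_size):
        # Only this chunk is held in memory; the running totals carry the state.
        total += sum(float(row[column]) for row in chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data rows found")
    return total / count


# Hypothetical in-memory stand-in for a large CSV file on disk.
sample = io.StringIO("value\n1\n2\n3\n4\n")
mean_value = out_of_core_mean(sample, "value", chunk_size=2)
print(mean_value)
```

The same pattern applies to any reduction that can be updated one chunk at a time, such as sums, counts, minima, and maxima. Because each chunk is processed independently before the totals are combined, the work can in principle be split across several workers.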

Schedule

Introduction to customizing Python environments

Computing with Jupyter Notebooks anywhere (from laptops to supercomputers) for reproducible research

HPC@MSI

Eligibility

Enrollment is free and is limited to 30 participants per workshop. It is hoped that the small workshop size will facilitate networking and promote collaboration across institutions by individuals who share common interests in research and education. Participation priority is for current HSI faculty and staff who teach undergraduate STEM courses. Non-HSI faculty and staff who teach undergraduate STEM courses are eligible to apply if they: 1) currently collaborate as PIs/co-PIs on a funded or pending NSF EHR/DUE grant that includes HSI faculty/staff as PIs/co-PIs or 2) would like to network to find HSI partners for future collaborative projects in education or research. Admission priority is for faculty within the first 10 years of their first academic tenure-track appointment. Applicants should be aware that the selection decision is final and summary reviews are not provided.

Acknowledging grants:

NIH R24MH120037, NIH R01EB023297, NIH R01NS047293, NSF DBI1935749, NIH U24EB029005, NSF 2017767, NSF 1928224

Workshop Details

  • Workshop Description

    This workshop introduces Jupyter Notebooks, which let users write and execute code and have become the de facto software environment for interactive data analysis, visualization, and machine learning. They are used by faculty for teaching, by researchers for data analysis, and by students for class projects.

  • Workshop Dates:

    10/21/2022 | 9 AM PDT – 12 PM PDT

  • Who should attend?

    Faculty and staff who are interested in learning about (i) Jupyter
    Notebooks and how to utilize them for teaching and research, (ii) how
    to scale up analysis using Jupyter Notebooks, and (iii) how to get
    supercomputer time allocation on SDSC’s Expanse. Some
    experience with the Linux command line, Python, and Jupyter
    Notebooks is recommended. Here are links to tutorials:

    http://www.ee.surrey.ac.uk/Teaching/Unix/

    https://automatetheboringstuff.com/

    https://jupyter.org/try-jupyter/lab/

  • Cost

    The workshop is free to all attendees.

  • Flyer

  • Apply Here

    To join this session, fill out the survey linked below.

    Apply today

Workshop Speakers

Dr. Peter Rose

Dr. Rose is Director of the Structural Bioinformatics Lab and Lead for Bioinformatics and Biomedical Applications at the San Diego Supercomputer Center (SDSC), UC San Diego. He has previously led bioinformatics and scientific computing departments at Pfizer and Agouron Pharmaceuticals. He led the RCSB Protein Data Bank team at UCSD, one of the largest open-access databases in biology. In his current position at SDSC, he is involved in projects to integrate cross-disciplinary data for novel COVID-19 diagnostic and surveillance methods, and the application of knowledge graphs to COVID-19 and precision medicine datasets. His research interests include the development of interactive and scalable platforms for data integration and machine learning in biomedicine and structural biology. He is an advocate for open-source software development and reproducible computational research.

Dr. Marty Kandes

Dr. Marty Kandes is a Computational and Data Science Research Specialist in the High-Performance Computing User Services Group at SDSC. He currently helps manage user support for Expanse and Voyager, SDSC’s two NSF-funded supercomputers, and maintains all of the Singularity and Docker containers supported on these systems. He is involved in research related to benchmarking of machine learning applications. Marty obtained his Ph.D. in Computational Science in 2015 from the Computational Science Research Center at San Diego State University, where his research focused on studying quantum systems in rotating frames of reference through the use of numerical simulation. He also holds an M.S. in Physics from San Diego State University and B.S. degrees in both Applied Mathematics and Physics from the University of Michigan, Ann Arbor. His current research interests include problems in Bayesian statistics, combinatorial optimization, nonlinear dynamical systems (e.g., epidemiological modeling), and numerical partial differential equations.

Nicole Wolter

Nicole Wolter is a Computational and Data Science Research Specialist in the High-Performance Computing User Services Group at SDSC. She currently manages accounts and allocations and provides user support for the three HPC systems at SDSC. Nicole graduated from San Diego State University with a degree in Computer Science in 2001. She is currently helping users port their AI applications to Voyager, SDSC’s NSF-funded AI supercomputer.


Workshop Attendees

October 21, 2022

  • Yunfei Hou – Associate Professor, California State San Bernardino
  • David Torres – Chair of Mathematics and Physical Science, Northern New Mexico College
  • Dimah Dera – Assistant Professor of Biology, University of Texas Rio Grande Valley
  • Eric Garcia – Postdoc & Research Manager, Old Dominion University
  • Sridhar Malkaram – Associate Professor, West Virginia State University
  • Supratik Kar – Assistant Professor, Kean University
  • Filippo Posta – Math Faculty, Estrella Mountain Community College
  • Celia Jenkins – Director of Grants Management, Cochise College
  • Kee Lam – Professor, Los Angeles City College
  • Frank Willmore – HPC Engineer, Boise State University
  • Taffeta Elliott – Associate Professor, New Mexico Institute of Mining and Technology
  • Vinod Gupta – IT Director, Columbia University
  • Kevin Labrador – University Researcher, University of the Philippines Mindanao
  • Santosh KC – Assistant Professor, San Jose State University
  • Fernando Garzon – Computational and Data Scientist, San Diego Supercomputer Center
  • Kenneth Yoshimoto – Computational Scientist, San Diego Supercomputer Center
  • Mahidhar Tatineni – Research Programmer Analyst, San Diego Supercomputer Center
  • Mary Thomas – Computational & Data Science Researcher, San Diego Supercomputer Center
  • Subhashini Sivagnanam – Cyberinfrastructure Solutions and Services, San Diego Supercomputer Center
  • Arun Ghosh – Professor, Purdue University
  • Sumit Saluja – Senior System Engineer, Columbia University