f Perez 1304 Ipython

  • SciComp Examples IPython A growing project Wrapup

    IPython: A tool for the lifecycle of computational ideas

    Fernando Pérez

    http://fperez.org, @fperez_org

    [email protected]

    Henry H. Wheeler Jr. Brain Imaging Center, UC Berkeley

    April 4, 2013

  • Outline

    1 Scientific Computing

    2 Two examples

    3 IPython: Interactive Python

    4 A growing project

    5 Wrapup

    FP (UC Berkeley) IPython April 4, 2013 2 / 51

  • Computing is not the third branch of science...

    It is now the backbone of theory and experiment!

    Computing in science must improve drastically before we can really call it scientific.

  • A crisis of credibility and real issues

    The Duke clinical trials scandal (Potti/Nevins)
        A compounding of (common and otherwise) data analysis errors.
        Lawsuits, resignations, careers destroyed.
        More importantly: patients were harmed.
        Major policy reviews and changes: NCI, IOM, ...
        More: see K. Baggerly's "starter set" page.

    The Duke situation is more common than we'd like to believe!
        Begley & Ellis, Nature, 3/28/12: "Drug development: Raise standards for preclinical cancer research."
        47 out of 53 "landmark" papers could not be replicated.

    Nature, Feb 2012, Ince et al: "The case for open computer programs"
        "The scientific community places more faith in computation than is justified"
        "anything less than the release of actual source code is an indefensible approach for any scientific results that depend on computation"

  • What does it take to get reproducible research results?

    Reproducible research practices!

    Reproducibility at publication time? It's already too late.

    Learn from a community (open source) where reproducibility is an everyday practice (by necessity).

  • FOSS better than scientific research?

    FOSS: Free and Open Source Software

    Public distributed version control: provenance tracking

  • Pull requests: ongoing peer review

  • Pull requests: back and forth discussion

  • Automated tests: validation

    The IPython build dashboard: immediate feedback

  • Versioned science

    Git: the tool you didn't know you needed

    Reproducibility? Tracking and recreating every step of your work.
    In the software world, it's called version control!

    Git: an enabling technology. Use version control for everything.

    Paper/grant writing (never get paper_v5_john.tex by email again!):

        git clone https://server.com/my-grant/repo.git
        cd repo
        make nsf-fastlane

    Everyday research: track your results.
    Collaboration: synchronize multi-author work.
    Teaching!

    A Git tutorial for scientists: http://bit.ly/YMBP83
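
    One way to see why Git can "track and recreate every step" is that it names every file version by a hash of its content. The sketch below is illustrative only and is not from the talk: it reproduces the object id Git assigns to a file ("blob") in a default SHA-1 repository, using nothing but the standard library. The function name and the sample data are hypothetical.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a file with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical versions of a results file.
results_v1 = b"trial,score\n1,0.82\n"
results_v2 = b"trial,score\n1,0.83\n"

# Identical content always maps to the same id; any change produces a new one.
same = git_blob_id(results_v1) == git_blob_id(results_v1)
changed = git_blob_id(results_v1) != git_blob_id(results_v2)
print(same, changed)  # True True

# The id of an empty file, as also reported by `git hash-object` on an empty file.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

    Because the id depends only on the bytes, a result recorded together with its commit can later be checked against exactly the inputs that produced it.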

  • The IBM Mark I at Harvard

  • In the beginning, IBM said... Let there be FORTRAN

  • Beyond (Floating Point) Number Crunching

    Hardware floating point
    Arbitrary precision integers
    Rationals
    Interval arithmetic
    Symbolic manipulation
    FORTRAN
    Extended precision floating point
    Text processing
    Databases
    Graphical user interfaces
    Web interfaces
    Hardware control
    Multi-language integration
    Data formats: HDF5, XML, ...
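
    As a small illustration of the first few items (not taken from the slides), Python's standard library already covers hardware floats, rationals and arbitrary precision integers. The variable names are only for this sketch.

```python
from fractions import Fraction

# Hardware floating point: fast, but 0.1 and 0.2 are not exactly representable.
float_equal = (0.1 + 0.2 == 0.3)
print(float_equal)  # False

# Rationals: exact arithmetic on ratios of integers.
exact_equal = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
print(exact_equal)  # True

# Arbitrary precision integers: Python ints grow as needed instead of overflowing.
big = 2 ** 100
print(big)  # 1267650600228229401496703205376
```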

  • "The purpose of computing is insight, not numbers."

    Richard Hamming, 1962

  • The computer as microscope

    Exploratory: the problem's definition evolves as we understand it.
    No requirements to build an application against.
    Mathematica, Maple, Matlab, IDL, etc.

    All have an interactive environment.

    Applications Languages

  • IPython: part of a Rich Ecosystem

    IPython

    NetworkX

  • The Lifecycle of a Scientific Idea (schematically)

    1 Individual exploratory work
    2 Collaborative development
    3 Production work (HPC, cloud, parallel)
    4 Publication (with reproducible results!)
    5 Education
    6 Goto 1.

    The problem with most tools: barriers and discontinuities in workflow in between all the steps.

  • Data analysis for epilepsy surgery

    Isolating the origin of drug-resistant epileptic seizures which require surgery.
    John Hunter, Department of Pediatric Neurology, University of Chicago.

  • Electrode location in 3D, combined with MRI data

  • Correlation analysis of seizure data

  • Matplotlib: 2d plotting

  • Matplotlib: 3d plotting

  • JPL: Mars mission trajectory design and nav data

    Ted Drain and Lynn Craig, Jet Propulsion Laboratory (NASA/Caltech)

    From: Name Elided
    Date: Oct 2, 2007 7:15 PM
    Subject: Fwd: matplotlib bug numbers
    To: John Hunter

    One of my lead developers mentioned that they had sent a bug to you about the annotations feature of MatPlotLib. Would you be able to let me know what the timeline is to resolve that bug? The reason is that the feature is needed for the Phoenix project and their arrival at Mars will be in March sometime, but they are doing their testing in the coming few months. This annotation feature is used on reports that present the analysis of the trajectory to the navigation team and it shows up on our schedule. It would really help me to know approximately when it could be resolved.

    B-plane plots are used to show the trajectory of a spacecraft with respect to the target body (specifically perpendicular to the incoming asymptote of the spacecraft trajectory) and we plot them with the y-axis inverted. The plot is used heavily in flight operations so it is important to our customers.

    In addition, we have what is called a "thundering herd" plot where many different trajectory solutions (determined from different measurement sources) are plotted together. The annotations are important there so we can see which plot corresponds to each source of data. I hope it helps to know how your code will be used in spacecraft navigation.

    Thanks for all your efforts.

  • JPL: Mars mission data visualization

    Expected communication power levels between an orbiting spacecraft and a lander as it goes through the atmosphere:

  • August 23, 2011

    The astronomy event of a generation

    Josh Bloom, UC Berkeley Astronomy

    @profjsb

  • Monday Tuesday Wednesday

    Supernova PTF11kly: "Event of a Generation" found on Tuesday

    Most nearby Type Ia supernova in > 25 years.
    Soon visible with binoculars.

    http://bit.ly/ptf11kly

  • Why IPython?

    (something other than "I'd rather not finish my dissertation")

  • The Lifecycle of a Scientific Idea (schematically)

    1 Individual exploratory work
    2 Collaborative development
    3 Production work (HPC, cloud, parallel)
    4 Publication (with reproducible results!)
    5 Education
    6 Goto 1.

    The problem with most tools: barriers and discontinuities in workflow in between all the steps.

  • IPython's goal: fluid transitions in all these steps

  • Demo

  • Pillar #1: An architecture for interactive computing

  • Pillar #2: the Notebook Format

    JSON, but version control-friendly.
    Easy for machine processing, fixable by hand if need be.
    Lots of hooks for metadata.
    Not Python-specific (Ruby, JS notebooks exist; R, Julia planned).
    Produce Markdown, reST, LaTeX, HTML, etc.

    An open format for sharing, publishing and archiving executable computational work.
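
    To make "easy for machine processing" concrete, here is a minimal sketch using only the standard `json` module. The notebook dictionary is hypothetical, and its field names follow the later nbformat 4 layout, which is an assumption on my part and not something stated on the slide.

```python
import json

# Hypothetical minimal notebook; field names assume the nbformat 4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# A tiny analysis"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 2 + 2"]},
    ],
}

# Stable key order and indentation keep line-based diffs small under version control.
text = json.dumps(notebook, indent=1, sort_keys=True)

# Any JSON-aware tool can process the document, e.g. pull out the code cells.
loaded = json.loads(text)
code = ["".join(cell["source"])
        for cell in loaded["cells"] if cell["cell_type"] == "code"]
print(code)  # ['x = 2 + 2']
```

    Since the file is plain text, the same document can be diffed, merged and fixed by hand with ordinary tools.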

  • Documented protocols and formats: a growing ecosystem around IPython

  • An Emacs Notebook Client!

    Takafumi Arakaki: http://tkf.github.com/emacs-ipython-notebook.

  • A vim client to control an IPython kernel/console

    Paul Ivanov (Berkeley), https://github.com/ivanov/vim-ipython

  • Microsoft Visual Studio 2010 integrated console

    Dino Viehland and Shahrokh Mortazavi (Microsoft); http://pytools.codeplex.com

  • Star Cluster: IPython parallel + Notebook on Amazon EC2

    Justin Riley (MIT): http://web.mit.edu/star/cluster

  • Other projects using IPython

    Scientific
        EPD: Enthought Python Distribution.
        Anaconda: Continuum Python Distribution.
        Sage: mathematics.
        PyRAF: Space Telescope Science Institute
        CASA: Nat. Radio Astronomy Observatory
        Ganga: CERN
        PyMAD: neutron spectrom., Laue Langevin
        Sardana: European Synchrotron Radiation
        ASCEND: eng. modeling (Carnegie Mellon).
        JModelica: dynamical systems.
        DASH: Denver Aerosol Sources and Health.
        Trilinos: Sandia National Lab.
        DoD: baseline configuration.
        NiPype: computational pipelines, MIT.
        PyIMSL Studio, by Visual Numerics.
        ...

    Web/Other
        Visual Studio 2010: MS.
        Django.
        Turbo Gears.
        Pylons web framework
        Zope and Plone CMS.
        Axon Shell, BBC Kamaelia.
        Schevo database.
        Pitz: distributed task/bug tracking.
        iVR (interactive Virtual Reality).
        Movable Python (portable Python environment).
        ...

  • Brian Granger Min Ragan-Kelley

    Thomas Kluyver Matthias Bussonnier Paul Ivanov Brad Froehle

    Jörgen Stenarson Robert Kern Evan Patterson Jonathan March

  • (Incomplete) Cast of Characters

    Brian Granger - Physics, Cal State San Luis Obispo
    Min Ragan-Kelley - Nuclear Engineering, UC Berkeley
    Matthias Bussonnier - Physics, Institut Curie, Paris
    Jonathan March - Enthought
    Thomas Kluyver - Biology, U. Sheffield
    Jörgen Stenarson - Elect. Engineering, Sweden.
    Paul Ivanov - Neuroscience, UC Berkeley.
    Robert Kern - Enthought
    Evan Patterson - Physics, Caltech/Enthought
    Brad Froehle - Mathematics, UC Berkeley
    Stefan van der Walt - UC Berkeley
    John Hunter - TradeLink Securities, Chicago.
    Prabhu Ramachandran - Aerospace Engineering, IIT Bombay.
    Satra Ghosh - MIT Neuroscience
    Gaël Varoquaux - Neurospin (Orsay, France)
    Ville Vainio - CS, Tampere University of Technology, Finland
    Barry Wark - Neuroscience, U. Washington.
    Ondrej Certik - Physics, U Nevada Reno
    Darren Dale - Cornell
    Justin Riley - MIT
    Mark Voorhies - UC San Francisco
    Nicholas Rougier - INRIA Nancy Grand Est
    Thomas Spura - Fedora project
    Many more! (~220 commit authors)

  • Support

    Thank you!

    Enthought, Austin, TX: Lots!
    Microsoft: WinHPC support, Visual Studio integration, Azure (thanks to Shahrokh Mortazavi).
    DoD/DRC Inc: funding through Sept. 2012 (thanks to Jose Unpingco and Chris Keees).
    NIH: via NiPy grant
    NSF: via Sage compmath grant
    Google: summer of code 2005, 2010.
    Tech-X Corp., Boulder, CO: Parallel/notebook (previous versions)
    Recent stable funding (2 years, 7 people, J. Taylor):

  • Open Source: skills, tools and practices we need!

    A culture where things get done.
    Wildly collaborative.
    Reproducible by necessity.
    Version control, testing, documentation, public peer review, etc.

  • Reward structure in academia: we punish all of the above

    Departmental boundaries: interdisciplinary work is a great buzzword, not such a great career path.

    Computational heritage is built on code, not on citations.

    Continuous evolution vs publication milestones.
    Authorship in collaborative works vs the first-author paper.
    Scholarship and intellectual effort embedded in the code.

  • Too few are lifting too many

    [Figure: Normalized commit rates since Jan-2010. x-axis: individual committer (1 to 10); y-axis: commit rate (0.0 to 1.0); series: cython, ipython, matplotlib, mayavi, numpy, scipy, sympy.]

  • NumFOCUS: Open Code, Better Science

    Support the development of core projects in education and research.
    Community-created and driven.
    A neutral ground for industry, academia and government.
    501(c)3: donations are tax-exempt in the USA.

    http://numfocus.org

  • The future of IPython: a 2-year roadmap

    Spring/summer 2013: IPython 1.0
        Notebook document management (nbconvert)
        JavaScript internals cleanup

    Fall 2013
        Interactive JavaScript API
        With callbacks to remote kernels.

    2014
        Multiuser server
        Simple to deploy
        Trusted (shell OK) Unix users in a lab, group, class, etc.

    https://github.com/ipython/ipython/wiki/Roadmap:-IPython

  • In closing: our vision of scientific computing

    Build on the right abstractions
        The kernel: unify interactive and parallel computing (you only have one brain!).
        A single protocol: many kernels, many clients.
        Communications and logging: the protocol is the notebook file format.
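
    The "single protocol, many clients" idea can be sketched with a toy, in-process stand-in. This is not IPython's actual messaging protocol (which runs over ZeroMQ and uses richer messages); `ToyKernel`, `client_send` and the message fields are hypothetical names chosen for illustration.

```python
import contextlib
import io
import json


class ToyKernel:
    """Hypothetical in-process 'kernel': one namespace, JSON messages in and out."""

    def __init__(self):
        self.namespace = {}  # state shared by every client
        self.log = []        # every request/reply pair, in order

    def handle(self, raw_request: str) -> str:
        request = json.loads(raw_request)
        stdout = io.StringIO()
        try:
            # Run the code in the shared namespace and capture what it prints.
            with contextlib.redirect_stdout(stdout):
                exec(request["code"], self.namespace)
            reply = {"status": "ok", "stdout": stdout.getvalue()}
        except Exception as exc:
            reply = {"status": "error", "error": repr(exc)}
        self.log.append({"request": request, "reply": reply})
        return json.dumps(reply)


def client_send(kernel, client_name, code):
    """Any client that can produce the JSON message can drive the kernel."""
    raw = json.dumps({"client": client_name, "code": code})
    return json.loads(kernel.handle(raw))


kernel = ToyKernel()
client_send(kernel, "terminal", "x = 21")                # one client sets state
reply = client_send(kernel, "notebook", "print(x * 2)")  # another client uses it
print(reply)            # {'status': 'ok', 'stdout': '42\n'}
print(len(kernel.log))  # 2
```

    The log of request/reply pairs is the part that hints at the slide's last point: a record of the messages is already most of a document of the session.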

    Insight and communication (Hamming)
        Literate computing vs literate programming.

    Build a community and an ecosystem
        "How to Scale a Code in the Human Dimension", M. Turk, http://arxiv.org/abs/1301.7064.
