PyData NYC 2015

PyData NYC 2015, November 10th 2015. Karim Chine. Towards a universal platform for data science on public and private clouds.

Transcript of PyData NYC 2015

Page 1

PyData NYC 2015, November 10th 2015

Karim Chine

Towards a universal platform for data science on public and private clouds

Page 2

A universal open platform for data science

Computational components: R packages, wrapped C, C++ and Fortran code, Python modules, Matlab toolkits… Open source or commercial.

Computational resources: clusters, grids, private or public clouds. Free or pay-per-use.

Computational GUIs: HTML5 and desktop workbench. Built-in views, plugins, collaborative views. Open source or commercial.

Computational scripts: R / Python / Matlab / Groovy.

Computational APIs: Java / SOAP / REST, stateless and stateful.

Computational storage: local, NFS, FTP, Amazon S3, EBS.

Generated computational web services: stateful or stateless, mapping of R objects/functions.

Elastic-R

Page 3

Infrastructure federation: rosetta virtual cloud

Public Clouds

Private Cloud

Page 4

AWS: programmable infrastructure

Command Line

Web Console

SDK

API

Page 5

Command Line

Web Console

SDK

API

rosettaHUB: programming with data and infrastructure

Page 6

Google Docs-like real-time collaboration

Page 7

Traceable and Reproducible data science

Elastic-R Amazon Machine Images (AMIs):
Elastic-R AMI 1: R 2.10, BioC 2.5
Elastic-R AMI 2: R 2.9, BioC 2.3
Elastic-R AMI 3: R 2.8, BioC 2.0

Amazon Elastic Block Stores (EBS):
Elastic-R EBS 1: Data Set XXX
Elastic-R EBS 2: Data Set YYY
Elastic-R EBS 3: Data Set ZZZ
Elastic-R EBS 4: Data Set VVV

[Diagram: Elastic-R AMI 2 (R 2.9, BioC 2.3) combined with Elastic-R EBS 4 (Data Set VVV), shown alongside Elastic-R.org]
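The idea on this slide, as we read it, is that an analysis becomes reproducible once both the software stack (an AMI pinning the R and Bioconductor versions) and the data (an EBS volume holding a specific data set) are named and versioned together. The following is a minimal Python sketch of that pairing under those assumptions. The classes `MachineImage`, `DataVolume`, `AnalysisRecord` and the function `check_reproducible` are hypothetical illustrations, not part of Elastic-R or rosettaHUB, and no AWS calls are made.

```python
from dataclasses import dataclass


# Hypothetical model: an AMI pins an R and a Bioconductor (BioC) version.
@dataclass(frozen=True)
class MachineImage:
    name: str
    r_version: str
    bioc_version: str


# Hypothetical model: an EBS volume holds one named data set.
@dataclass(frozen=True)
class DataVolume:
    name: str
    dataset: str


@dataclass(frozen=True)
class AnalysisRecord:
    """Pairs a pinned software stack (AMI) with a pinned data set (EBS volume)."""
    image: MachineImage
    volume: DataVolume

    def describe(self) -> str:
        return (f"{self.image.name} (R {self.image.r_version}, BioC {self.image.bioc_version}) "
                f"+ {self.volume.name} ({self.volume.dataset})")


# Catalogue mirroring the slide's images and volumes.
IMAGES = {
    "Elastic-R AMI 1": MachineImage("Elastic-R AMI 1", "2.10", "2.5"),
    "Elastic-R AMI 2": MachineImage("Elastic-R AMI 2", "2.9", "2.3"),
    "Elastic-R AMI 3": MachineImage("Elastic-R AMI 3", "2.8", "2.0"),
}
VOLUMES = {
    "Elastic-R EBS 1": DataVolume("Elastic-R EBS 1", "Data Set XXX"),
    "Elastic-R EBS 2": DataVolume("Elastic-R EBS 2", "Data Set YYY"),
    "Elastic-R EBS 3": DataVolume("Elastic-R EBS 3", "Data Set ZZZ"),
    "Elastic-R EBS 4": DataVolume("Elastic-R EBS 4", "Data Set VVV"),
}


def check_reproducible(record: AnalysisRecord) -> None:
    """Raise if the catalogue no longer provides the exact image and volume the record names."""
    if IMAGES.get(record.image.name) != record.image:
        raise ValueError(f"software stack changed or missing: {record.image.name}")
    if VOLUMES.get(record.volume.name) != record.volume:
        raise ValueError(f"data volume changed or missing: {record.volume.name}")


# The pairing highlighted on the slide: AMI 2 with EBS 4.
record = AnalysisRecord(IMAGES["Elastic-R AMI 2"], VOLUMES["Elastic-R EBS 4"])
check_reproducible(record)
print(record.describe())
```

Freezing the dataclasses makes each record immutable and comparable by value, so a later run can detect that the image or data set it depends on has drifted instead of silently using a different environment.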

Page 8

Architecture

Page 9

Architecture

Page 10

Data science universal engine:
Remote Java/R processes
Event-driven remote objects/engines: R, Python, Mathematica, Matlab, Scilab, ...
Collaborative spreadsheets
Collaborative scientific graphics canvas
Collaborative dashboard with collaborative widgets
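To make the "event-driven" and "collaborative" items on this slide concrete, here is a toy, in-process Python sketch of a shared spreadsheet that publishes every edit as an event to all subscribed views, so each participant's replica stays in sync. This is an assumption-laden illustration only. `SharedSpreadsheet` and `ClientView` are hypothetical names, and the sketch does not reproduce the platform's actual engine, network transport or APIs.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]
# A listener receives (user who made the edit, cell coordinates, new value).
Listener = Callable[[str, Cell, object], None]


class SharedSpreadsheet:
    """Toy shared document: every edit is published as an event to all subscribers."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, object] = {}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def set_cell(self, user: str, cell: Cell, value: object) -> None:
        self.cells[cell] = value
        # Push the change to every subscriber, including the editor's own view.
        for listener in self._listeners:
            listener(user, cell, value)


class ClientView:
    """A participant's local replica, kept in sync purely from change events."""

    def __init__(self, name: str, sheet: SharedSpreadsheet) -> None:
        self.name = name
        self.replica: Dict[Cell, object] = {}
        self.log: List[str] = []
        sheet.subscribe(self.on_change)

    def on_change(self, user: str, cell: Cell, value: object) -> None:
        self.replica[cell] = value
        self.log.append(f"{user} set {cell} = {value!r}")


sheet = SharedSpreadsheet()
alice = ClientView("alice", sheet)
bob = ClientView("bob", sheet)
sheet.set_cell("alice", (0, 0), 42)
sheet.set_cell("bob", (0, 1), "=A1*2")
print(bob.log)
```

A real multi-user system would also need a network transport and a conflict-resolution strategy for concurrent edits to the same cell; the sketch deliberately omits both and only shows the publish/subscribe flow.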

Page 11

www.rosettahub.com