Development and Deployment of the New Research Infrastructure for Open Science in Japan
Kazu YAMAJI, Professor and Director, Research Center for Open Science and Data Platform
National Institute of Informatics, Japan
TACC Workshop, 26th September 2019, 13:30


Page 2

NII is the Japanese NREN (National Research and Education Network)

International lines: to the US, to Asia, to Europe

Legend: domestic line (100 Gbps or more); international line (100 Gbps); international line (10 Gbps)

Type of organization               Number of organizations (coverage)
National Universities              86 (100%)
Municipal Universities             71 (78%)
Private Universities               348 (55%)
Junior Colleges                    62 (18%)
Colleges of Technology             55 (97%)
Inter-Univ. Research Institutes    16 (100%)
Labs and Others                    179
Total                              817

(As of March 2015)

SINET nodes shown on the map include Sapporo, Fukuoka, Osaka, and Tokyo.

• SINET is Japan's academic backbone network, serving more than 800 universities and research institutions and about 3 million users.

• SINET covers 100% of national, 78% of municipal, and 55% of private universities.
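As a quick consistency check on the connection table, the per-category counts can be summed, and the coverage percentages can be used to back-compute roughly how many organizations exist in each category. This is a minimal sketch; the category sizes it prints are rough estimates derived from rounded percentages, not figures given on the slide.

```python
# Organizations connected to SINET, with the coverage ratio shown on the slide
# (as of March 2015). None marks a category for which no ratio is given.
CONNECTED = {
    "National Universities": (86, 1.00),
    "Municipal Universities": (71, 0.78),
    "Private Universities": (348, 0.55),
    "Junior Colleges": (62, 0.18),
    "Colleges of Technology": (55, 0.97),
    "Inter-Univ. Research Institutes": (16, 1.00),
    "Labs and Others": (179, None),
}


def total_connected(table):
    """Sum the connected organizations across all categories."""
    return sum(count for count, _ in table.values())


def implied_category_sizes(table):
    """Rough size of each category, back-computed from a rounded coverage ratio."""
    return {
        name: round(count / ratio)
        for name, (count, ratio) in table.items()
        if ratio is not None
    }


print(total_connected(CONNECTED))  # 817, matching the slide's total
for name, size in implied_category_sizes(CONNECTED).items():
    print(f"{name}: roughly {size} organizations in the category")
```

Because the percentages are rounded, the back-computed sizes (for example, about 91 municipal universities) are only approximate.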

Page 3

21st Century Academic Information Infrastructure for Advancing Open Science

SINET5 (Network)
• Nationwide 100-Gbps backbone network and scalable network expansion
• High-speed direct international lines to the USA, Europe, and Asia
• Introduction of new technologies such as SDN in response to user needs

GakuNin-Cloud Direct Connection VPN (Cloud)
• Dramatic cost reduction and enhancement of the research and education environment through tailored cloud services

Resource
• Promotion of academic information circulation and open access
• Collaborative promotion of institutional repository expansion

Security (Flow Analysis)
• Network flow analysis and dynamic control
• Raising the security level for SINET users

GakuNin Federation (Federation)
• Collaborative enhancement of authentication between universities

Collaboration and Promotion in Research and Education

Page 4

Recent Trends in E-Infrastructure

Page 5

European Open Science Cloud (EOSC)

• Past: each institution or project has its own e-infrastructure

• Future: integrate existing e-infrastructures and make them available across the EU
  • Integrate services across layers, from the network to domain databases
  • Make them visible by developing a portal (EOSC-hub) and a discovery service (OpenAIRE)
  • Consider supporting long-tail domains such as the humanities and social sciences
  • Consider collaborating with small and medium-sized enterprises

• Common services
  • Federation services: AAI, accounting, monitoring, application repository, configuration management, marketplace
  • Basic infrastructure: compute and storage
  • Open collaboration platforms
  • Added value services: compute, data, and software management and preservation
• Community support services: thematic services

1. CLARIN (language resources)
2. DODAS-CMS (high-energy physics)
3. ECAS-ENES (climate analytics)
4. GEOSS (Earth observation)
5. OpenCoastS (coastal circulation forecast)
6. WeNMR (structural biology)
7. EO Pillar (Earth observation)
8. DARIAH (digital humanities)
9. LifeWatch (biodiversity)

Service layers: Portal, Discovery, Domain Services, Common Platform (PF) for RDM and Sharing, Authn/Authz, Compute and Storage, Network

Page 6

Australia

ARDC (Australian Research Data Commons) first appeared in the 2016 roadmap

Its service architecture is almost the same as that of EOSC

Page 7

Canada

Collaborative work between the NREN, HPC, and library communities

Page 8

Africa

Page 9

Africa

Collaborative work between the NREN and library communities

Page 10

What Can Be Learned from Recent Movements

• De facto service layer stack (from the network up):
  Network
  Identity and Access Federation
  Virtual Organization Platform
  Cloud Computing / HPC
  Common Services and Tools
  Domain-Centric Services
  Common Discovery Service

• Integration for a more useful infrastructure
  • Service integration: between services and domains
  • Organizational cooperation: business plan and budget

• Integrating existing e-infrastructures improves UX and cost-effectiveness, but such integration is hard for a single institution to develop

• It therefore requires cooperation to be facilitated at the national and regional levels

Each Domain using NII RDC

Page 11

Open Science E-Infrastructure in Japan: NII Research Data Cloud

Page 12

NII Research Data Cloud

• Discovery Platform (Discovery Service, exchanging metadata with an international metadata aggregator)
  • Metadata management
  • Linking function between articles and data
  • Researcher and research project identification and management function
  • Data exchange with international discovery services

• Publication Platform (Research Data Repository: institutional or subject repository, with DOI)
  • Data-oriented self-archiving function
  • Versioning and auto-packaging function
  • User-dependent personal data pseudonymization function

• RDM Platform (Research Data Management System)
  • High-speed access using SINET5
  • Data sharing function using virtual networks and ID federation
  • Effective data storage switcher

The diagram also shows: research data management user interface; access control; metadata management; institutional research data management; hot storage and cold storage; storage area for long-term preservation; private, shared, and public data; journal article supplemental data; data depositor and data user; Archive Exp/Store, Search/Find, Re-use, Metadata Aggregation; Exp Data; user flow and data flow.
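The slide lists an "effective data storage switcher" between hot and cold storage but does not say how it decides. One common approach is an age-based tiering policy: objects not accessed within a threshold move to cold storage, and objects that are used again move back. The sketch below assumes that policy; the `StoredObject` class, the 90-day threshold, and the tier names are hypothetical and are not NII's design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoredObject:
    """Hypothetical record for a file held on the RDM platform."""
    name: str
    last_access: datetime
    tier: str = "hot"  # "hot" (fast, costly) or "cold" (slow, cheap)


def switch_tiers(objects, now, cold_after=timedelta(days=90)):
    """Apply an age-based policy: idle objects go cold, recently used ones go hot.

    Returns the names of the objects whose tier changed.
    """
    changed = []
    for obj in objects:
        target = "cold" if now - obj.last_access >= cold_after else "hot"
        if obj.tier != target:
            obj.tier = target
            changed.append(obj.name)
    return changed


now = datetime(2019, 9, 26)
files = [
    StoredObject("raw_run_001.h5", now - timedelta(days=200)),
    StoredObject("analysis.ipynb", now - timedelta(days=3)),
]
print(switch_tiers(files, now))  # ['raw_run_001.h5']
```

A real switcher would also weigh file size, retrieval cost, and preservation policy; age alone is simply the easiest rule to state.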

Page 13

Discovery Service: CiNii

[Figure: Articles and Searches. CiNii usage by year, 2005 to 2016; series: Fulltext (internal), Meta Search, Search (thousand); second axis: Articles (thousand). Annotations: April 2007, indexed by Google; April 2009, drastic UI renewal.]

2016 monthly average: full-text downloads 4.52M; detail views 10.9M; searches 4.93M

Page 14

CiNii Knowledge Base

Content type     Count         Resources
Article          37,376,419    CiNii Articles, J-STAGE, Repository…
Book             11,801,960    NACSIS-CAT
Dissertation     634,467       NDL Online, Repository…
Project          848,051       KAKEN
Research Data    12,568,782    DataCite, Japan Link Center
Researcher       2,874,337     KAKEN, NACSIS-CAT
Total            66,104,016
Links            10,813,948    CiNii Articles, KAKEN

Biggest Knowledge Graph in Japan

CiNii KB could be used for Institutional Research

Entities and their identifiers:
• Conventional research output (book, paper, dissertation): DOI, Handle, URI, ISBN, ISSN...
• Journal article
• Research data: DOI, URI...
• Research project: project ID
• Funding agency: Crossref Funder, GRID, ISNI...
• Research institution: institutional ID, GRID, ISNI...
• Researcher: ID, ORCID...
• Research activity

The diagram also marks the coverage of the current CiNii.
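The knowledge base links articles, data, projects, researchers, and institutions through identifiers such as DOI, ORCID, and project IDs. A minimal sketch of such a graph follows, assuming nodes keyed by (scheme, identifier) pairs and typed links stored in both directions; the class, the relation labels, and all example identifiers are hypothetical and do not reflect how CiNii actually stores its data.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy research knowledge graph.

    Nodes are (scheme, identifier) pairs such as ("doi", "10.5555/x");
    edges are typed links stored in both directions.
    """

    def __init__(self):
        self.types = {}                # node -> entity type, e.g. "Article"
        self.edges = defaultdict(set)  # node -> {(relation, other node)}

    def add_entity(self, node, entity_type):
        self.types[node] = entity_type

    def link(self, src, relation, dst):
        # Record the link on both ends so it can be traversed either way.
        self.edges[src].add((relation, dst))
        self.edges[dst].add((relation, src))

    def neighbors(self, node, entity_type=None):
        """Nodes linked to `node`, optionally restricted to one entity type."""
        return sorted({
            other
            for _, other in self.edges[node]
            if entity_type is None or self.types.get(other) == entity_type
        })


# Hypothetical identifiers, for illustration only.
article = ("doi", "10.5555/article-1")
dataset = ("doi", "10.5555/data-1")
author = ("orcid", "0000-0000-0000-0000")
project = ("kaken", "JP00000000")

kg = KnowledgeGraph()
kg.add_entity(article, "Article")
kg.add_entity(dataset, "ResearchData")
kg.add_entity(author, "Researcher")
kg.add_entity(project, "Project")
kg.link(article, "is_supplemented_by", dataset)
kg.link(author, "authored", article)
kg.link(project, "produced", dataset)

# Which research data did this project produce?
print(kg.neighbors(project, "ResearchData"))  # [('doi', '10.5555/data-1')]
```

Keying nodes by persistent identifiers is what lets records harvested from different sources (for example KAKEN and DataCite) merge onto the same node.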

Page 15

JAIRO Cloud

• Background
  • Limited resources and technical knowledge hamper the implementation of institutional repositories (IRs), especially in small universities.
  • Since April 2012, JAIRO Cloud has provided a shared instance of an IR system on a virtual server hosted by NII.

• Service Architecture

Page 16

Number of Institutional Repositories in Japan

[Figure: Number of institutional repositories (IRs) in Japan per year, split into IRs run on university on-premise systems and IRs hosted by JAIRO Cloud (pilot operation and production operation). The total rises from 2 to 835 (the chart is annotated "829 IRs"); on-premise IRs peak at 316 and then decline to 226, while JAIRO Cloud production IRs grow from 73 to 568.]

Page 17

• Current system: WEKO2
  • Journal article repository
  • Functions have been added over time

• New system: WEKO3
  • Based on Invenio3, which was originally focused on data repositories
  • Integrates WEKO2 functions into Invenio3

WEKO3 Data Repository

Strengthen Conventional Functions

Effective Development and Operation

Realize a new publication platform based on the sophisticated Invenio3 architecture (architecturally, Invenio3 corresponds to our RDM platform)

Domain use cases through extensibility

Research Data ✖

Page 18

International Collaboration with CERN

Page 19

• Article repository (current role): preparing for migration from the current system to the new WEKO3

• Data Repository (New Role)


Workflow: enables the definition of the different workflow sets required by different operational models

Multi-tenancy: enables different data repositories to be operated within a single institutional repository service

Sustainability Flexibility

Data repositories are required at different levels: Laboratory < Project < Institution

Cloud-type Data Repository
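The multi-tenancy and workflow points describe one repository service hosting several data repositories, each with its own workflow set. A minimal sketch, assuming a workflow is an ordered list of named steps registered per tenant (laboratory, project, or institution); the classes, tenant names, and step labels are hypothetical and are not WEKO3's actual configuration model.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    """One data repository hosted inside a shared repository service."""
    name: str
    level: str  # e.g. "laboratory", "project", or "institution"
    workflows: dict = field(default_factory=dict)  # workflow name -> ordered steps


class RepositoryService:
    """Hypothetical multi-tenant repository service with per-tenant workflows."""

    def __init__(self):
        self.tenants = {}

    def add_tenant(self, tenant):
        if tenant.name in self.tenants:
            raise ValueError(f"tenant {tenant.name!r} already exists")
        self.tenants[tenant.name] = tenant

    def next_step(self, tenant_name, workflow, completed):
        """First step of the tenant's workflow not yet completed, or None if done."""
        for step in self.tenants[tenant_name].workflows[workflow]:
            if step not in completed:
                return step
        return None


service = RepositoryService()
service.add_tenant(Tenant("lab-a", "laboratory",
                          {"deposit": ["upload", "describe", "publish"]}))
service.add_tenant(Tenant("univ-x", "institution",
                          {"deposit": ["upload", "describe", "librarian_review", "publish"]}))

done = {"upload", "describe"}
print(service.next_step("lab-a", "deposit", done))   # publish
print(service.next_step("univ-x", "deposit", done))  # librarian_review
```

The same "deposit" workflow name resolves to different step sequences per tenant, which is the flexibility the slide attributes to different operational models.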

Page 20

International Collaboration with WACREN

Page 21

International Development of a New Invenio Flavor

Page 22

New Service

• Manage research data by research project
• Share research data among collaborators, with authentication by ID federation
• Connect cloud storage through various plugins
• Customize the selectable plugins depending on each university's environment and policy

RDM Platform (NII: frontend service)

Cloud storage (university: backend storage)
• Public cloud (provider DC)
• Private cloud (on-premise)
• NII storage: default (minimum?) storage provided by NII

An extension of the Open Science Framework developed by COS (Center for Open Science), USA
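The idea of connecting storage through plugins, with the selectable set customized per university, can be sketched as a common storage interface plus a per-institution allowlist, falling back to NII's default storage. All class names, plugin keys, and the in-memory backend below are hypothetical stand-ins; this is not the GakuNin RDM or OSF add-on API.

```python
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Common interface that every storage backend plugin implements."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...


class InMemoryStorage(StoragePlugin):
    """Stand-in backend; a real plugin would talk to S3, Swift, ownCloud, etc."""

    def __init__(self):
        self._blobs = {}

    def put(self, path, data):
        self._blobs[path] = data

    def get(self, path):
        return self._blobs[path]


# Registry of installed plugins (hypothetical keys, all mapped to the stand-in).
PLUGINS = {
    "nii-default": InMemoryStorage,
    "s3-compatible": InMemoryStorage,
    "owncloud": InMemoryStorage,
}

# Plugins each institution has enabled according to its environment and policy.
INSTITUTION_POLICY = {
    "univ-x": {"nii-default", "owncloud"},
}


def open_storage(institution, plugin_key):
    """Instantiate a plugin only if the institution's policy allows it.

    Institutions without an explicit policy get only NII's default storage.
    """
    allowed = INSTITUTION_POLICY.get(institution, {"nii-default"})
    if plugin_key not in allowed:
        raise PermissionError(f"{plugin_key!r} is not enabled for {institution!r}")
    return PLUGINS[plugin_key]()


store = open_storage("univ-x", "owncloud")
store.put("project1/data.csv", b"a,b\n1,2\n")
print(store.get("project1/data.csv"))
```

Separating the interface from the registry means the frontend can stay the same while each university swaps in its own backend storage.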

Page 23

New Functions Developed in FY2018 and FY2019

• New plugins
  • New external storage: ownCloud, S3-compatible storage, OpenStack Swift
  • Integration with the publication platform
  • Integration with data analysis tools: JupyterHub
  • Plugin SDK

• Research data management
  • Research footprint management
  • Metadata management
  • Workflow management

• Institutional management
  • Plugin selection
  • Statistics
  • Institutional templates

Page 24

Publication from Repository

• DOI registration
• Embargo control
• Metadata validation, etc.

Publication Platform
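Two of the publication-platform checks, embargo control and metadata validation, can be sketched in a few lines. This minimal sketch assumes a record is a plain dict with a hypothetical set of required fields and an optional embargo date; the actual DOI registration would go through a registration agency such as DataCite or Japan Link Center and is not shown.

```python
from datetime import date

# Hypothetical minimal schema; a real platform would follow its own metadata profile.
REQUIRED_FIELDS = ("title", "creators", "publication_date", "resource_type")


def validate_metadata(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    doi = record.get("doi")
    if doi is not None and not doi.startswith("10."):
        problems.append(f"malformed DOI: {doi}")  # every DOI prefix starts with "10."
    return problems


def is_file_public(record, today):
    """Files stay hidden until the embargo date (if any) has been reached."""
    embargo = record.get("embargo_until")
    return embargo is None or today >= embargo


record = {
    "title": "Example dataset",
    "creators": ["Researcher A"],
    "publication_date": date(2019, 9, 26),
    "resource_type": "dataset",
    "embargo_until": date(2020, 3, 31),
}
print(validate_metadata(record))                  # []
print(is_file_public(record, date(2019, 12, 1)))  # False
print(is_file_public(record, date(2020, 4, 1)))   # True
```

Keeping embargo as a pure date comparison lets the metadata be published immediately while the files become visible only after the embargo ends.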

Page 25

Integration with Data Analysis Tools

• GakuNin RDM add-on for a data analysis tool: JupyterHub
• Easy data transfer between GakuNin RDM and JupyterHub
• The GakuNin ID federation allows single sign-on between the systems

GakuNin RDM (storage, repository) connects to JupyterHub (programming, execution)

(Implemented in December 2018)

Page 26

Research Footprint Management

Commercial Time Stamping Authority (TSA)

Time stamp: 2007.11.8 10:05:32

Admin

National Institute of Informatics [Test] Project Log

Institutional Log
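Research footprint management records project activity with time stamps from a time stamping authority. One way to make such a log tamper-evident, sketched here as an assumption rather than the system's actual design, is to chain entries with SHA-256 so that each entry's digest also covers the previous one; in an RFC 3161 setting, a digest like this is what would be sent to the TSA for stamping. The entry format is hypothetical, the TSA request itself is not shown, and timestamps are plain strings supplied by the caller.

```python
import hashlib
import json


def _digest(entry):
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class FootprintLog:
    """Append-only, hash-chained log of project events (hypothetical format)."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, actor, action, timestamp):
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"actor": actor, "action": action, "timestamp": timestamp, "prev": prev}
        entry["hash"] = _digest(entry)  # digest is taken before the "hash" key is added
        self.entries.append(entry)
        return entry["hash"]            # value that could be submitted to a TSA

    def verify(self):
        """Recompute every digest; an edited or reordered entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = FootprintLog()
log.append("researcher-a", "upload data.csv", "2019-09-26T13:30:00Z")
log.append("researcher-b", "edit metadata", "2019-09-26T13:45:00Z")
print(log.verify())  # True
log.entries[0]["action"] = "upload other.csv"
print(log.verify())  # False
```

Stamping only the latest digest is enough to vouch for every earlier entry, since each digest depends on all of its predecessors.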

Page 27

Institutional Management Function

• Select institutionally available storage
• Select authorized external services
• Download institutional logs
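Downloading institutional logs, together with the statistics function listed for FY2018 and FY2019, can be sketched as filtering log records by institution, counting actions per project, and rendering the counts as CSV with the standard library. The record fields, institution names, and CSV columns are hypothetical.

```python
import csv
import io
from collections import Counter


def institution_stats(records, institution):
    """Count (project, action) pairs in one institution's log records."""
    counts = Counter(
        (r["project"], r["action"]) for r in records if r["institution"] == institution
    )
    return sorted(counts.items())


def stats_to_csv(stats):
    """Render (project, action) counts as CSV text suitable for download."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["project", "action", "count"])
    for (project, action), n in stats:
        writer.writerow([project, action, n])
    return buf.getvalue()


# Hypothetical log records.
logs = [
    {"institution": "univ-x", "project": "p1", "action": "upload"},
    {"institution": "univ-x", "project": "p1", "action": "upload"},
    {"institution": "univ-x", "project": "p2", "action": "download"},
    {"institution": "univ-y", "project": "p9", "action": "upload"},
]
print(stats_to_csv(institution_stats(logs, "univ-x")))
```

Filtering by institution before counting keeps one institution's administrators from seeing another institution's activity.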

Page 28

Experimental Plan with Universities and Research Institutions

• α testing #1: March 2017
  Objective: obtain feedback from IT centers in large-scale institutions
  Participants: Hokkaido University, Tohoku University, Kyoto University, Osaka University, Kyushu University, Nagoya Institute of Technology, National Institute for Environmental Studies

• α testing #2: October 2017
  Objective: obtain feedback on laboratory use cases
  Participants: The University of Tokyo, Nagoya University, University of Tsukuba, Keio University, University of Aizu, Fukushima Medical University, RIKEN, JAXA

• β testing #1: June 2018
  Objective: middle-scale experiment adopting the new functions developed in 2017

• Long-run study: from April 2019
  Objective: obtain feedback from institutional and domain-specific use cases

Page 29

Internal and External Collaboration

Research Center for Open Science and Data Platform

R&D Center for Academic Networks

Academic Authentication Systems Office

Center for Cloud R&D

AXIES Research Data Management WG

JPCOAR Research Data TF

University IT Center (system requirements, operation policy)

University Library (RDM training)

• Secure network • Service deployment

• Storage procurement • Data analysis infrastructure

• ID federation • VO platform

International

Page 30

Future Work

• Up to FY2019 Q2
  • RDM collaboration functionality and operational system
  • Repository migration tools
  • Discovery KB refinement using several different algorithms

• From FY2019 Q3

• FY2020: production-level operation

Develop an effective operational system

Extend data sources from domain databases

Case studies following the research data life cycle

Migration test in JAIRO Cloud

Feasibility study; obtain case studies

Page 31

Relationship between Research Data Infrastructure and Research Workflow

Research workflow stages: Project Start (Application), Member Management, Initial Setting; Experiment Data Acquisition; Data Management, Data Analysis; Paper Writing; Deposit with Supplemental Data to an Institutional or Domain Repository; Aggregation

Supporting infrastructure: RDM Platform, Publication Platform, Discovery Platform