Coinami

A Cryptocurrency with DNA Sequence Alignment as Proof-of-work
Halil I. Ozercan*, Atalay M. Ileri*, Alper Gundogdu, A. Kerim Senol, M. Yusuf Ozkaya, Can Alkan
Bilkent University, Ankara, Turkey


Page 1: Coinami


Page 2: Coinami

Atalay Mert Ileri: idea, protocols

Introduction to Research course project

H. Ibrahim Ozercan, M. Yusuf Ozkaya, A. Kerim Senol, Alper Gundogdu: developers, Bitcoin enthusiasts

Senior Design Project

Undergrad power

No grad students were harmed during the making of this project

Page 3: Coinami

HTS read alignment

Aligning HTS reads is a compute-intensive task
• ~35 CPU days per 30X genome using BWA
• ~18K human genomes per year can be sequenced using one HiSeq X Ten
• 630K CPU days = ~1,700 CPU years of alignment per HiSeq X Ten per year
• An estimated 1 million genomes by the end of 2017
• 35 million CPU days = ~100K CPU years, for alignment only
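The figures above can be checked with a short back-of-the-envelope calculation. This is a minimal sketch that only reuses the numbers quoted on the slide (35 CPU days per 30X genome, 18K genomes per HiSeq X Ten per year, 1 million genomes) and assumes a 365-day year; the constant and function names are illustrative.

```python
# Back-of-the-envelope alignment cost, using only the figures quoted on the slide.
CPU_DAYS_PER_30X_GENOME = 35            # BWA, per the slide
GENOMES_PER_YEAR_PER_HISEQ_X_TEN = 18_000
GENOMES_BY_END_OF_2017 = 1_000_000      # the slide's estimate
DAYS_PER_YEAR = 365                     # assumption: calendar year, no leap days


def alignment_cpu_days(genomes: int) -> int:
    """Total CPU days needed to align `genomes` 30X genomes."""
    return genomes * CPU_DAYS_PER_30X_GENOME


def cpu_years(cpu_days: float) -> float:
    """Convert CPU days to CPU years."""
    return cpu_days / DAYS_PER_YEAR


if __name__ == "__main__":
    per_machine = alignment_cpu_days(GENOMES_PER_YEAR_PER_HISEQ_X_TEN)
    worldwide = alignment_cpu_days(GENOMES_BY_END_OF_2017)
    print(f"One HiSeq X Ten, one year: {per_machine:,} CPU days = {cpu_years(per_machine):,.0f} CPU years")
    print(f"1M genomes: {worldwide:,} CPU days = {cpu_years(worldwide):,.0f} CPU years")
```

With these inputs the script reports 630,000 CPU days (about 1,726 CPU years) per machine-year and 35,000,000 CPU days (about 95,890 CPU years) for one million genomes.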

Page 4: Coinami

HTS read alignment (2)

Additionally, the reference human genome gets an update every 3-4 years, which:
• Fixes minor alleles
• Fixes collapsed duplications
• Fixes contig orientation (i.e. incorrect inversions)
• Adds new sequence

For better reliability, it is best to remap existing data to the new reference. All 1000 Genomes Project data are being remapped to GRCh38.

Page 5: Coinami

Remapping old, or mapping new?
• Large clusters are not infinite resources
• While old data are being remapped, more new data are generated, and these typically have higher priority
• The computational burden keeps increasing

Proposal: volunteer grid computing

Page 6: Coinami

Volunteer grid computing: BOINC (Berkeley Open Infrastructure for Network Computing)
• Volunteers download “problem sets” from the server, solve them in their “spare time”, and upload the results back
• Made popular by the SETI@home project
• Some bioinformatics applications have been ported (Rosetta@home, RNAworld, DENIS@home)
• Total computational power of 8.68 PetaFLOPS
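The download, solve, and upload cycle can be illustrated with a toy work-unit server. This is a minimal sketch of the general volunteer-computing pattern, not BOINC's actual API or protocol. `ToyWorkServer` and `volunteer_loop` are hypothetical names, and the "computation" (a reverse complement) is only a stand-in.

```python
from collections import deque
from typing import Any, Callable, Dict, Optional, Tuple


class ToyWorkServer:
    """Hypothetical server that hands out work units and collects results."""

    def __init__(self, problems):
        # Each work unit is an (unit_id, problem) pair.
        self.pending = deque(enumerate(problems))
        self.results: Dict[int, Any] = {}

    def fetch(self) -> Optional[Tuple[int, Any]]:
        """Give the next work unit to a volunteer, or None when none are left."""
        return self.pending.popleft() if self.pending else None

    def upload(self, unit_id: int, result: Any) -> None:
        """Store a volunteer's result for a work unit."""
        self.results[unit_id] = result


def volunteer_loop(server: ToyWorkServer, solve: Callable[[Any], Any]) -> int:
    """Download, solve, and upload work units until the server runs dry.

    Returns the number of work units this volunteer completed."""
    done = 0
    while (unit := server.fetch()) is not None:
        unit_id, problem = unit
        server.upload(unit_id, solve(problem))
        done += 1
    return done


if __name__ == "__main__":
    server = ToyWorkServer(["ACGT", "GGCA", "TTAG"])
    # Stand-in "computation": reverse-complement each toy sequence.
    complement = str.maketrans("ACGT", "TGCA")
    n = volunteer_loop(server, lambda s: s.translate(complement)[::-1])
    print(n, server.results)
```

A real volunteer grid must also deal with untrusted or lost results, which is exactly the verification problem the following slides raise for read mapping.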

Page 7: Coinami

Read mapping w/BOINC

Potential problems: data privacy, making sure the alignments are correct, and others

Main problem: HTS read mapping is heavy on CPU, RAM, and disk, so volunteers are less likely to dedicate such resources

Solution: motivate the volunteers

Page 8: Coinami

Cryptocurrencies

Digital “money” that uses cryptography to secure transactions and to control the creation of new units
• Bitcoin, Dogecoin, Litecoin, etc.

Two parts:
• Mining: generation of new “blocks”
• Transactions: money exchange between peers

Page 9: Coinami

Bitcoin

• Most popular cryptocurrency
• Invented in 2008; released as open-source software in 2009
• The block chain is the record of all transactions
• Completely decentralized
• Network hash rate in 2013: 2,798,377 GH/s; at the time of this talk: 353,633,397 GH/s
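The idea of a block chain as the record of transactions can be sketched as a hash-linked list, where each block commits to the digest of the block before it. This is a simplified illustration under stated assumptions: blocks are hashed as canonical JSON with SHA-256, and there is no proof-of-work, no signatures, and no Merkle tree. Bitcoin's real block format is a binary header and differs. The function names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def block_digest(block: Dict) -> str:
    """SHA-256 of a block's canonical JSON encoding (a simplification)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def append_block(chain: List[Dict], transactions: List[str]) -> None:
    """Add a block that commits to the digest of the previous block."""
    prev = block_digest(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "txs": transactions})


def is_valid(chain: List[Dict]) -> bool:
    """True if every block points at the digest of its predecessor."""
    return all(chain[i]["prev"] == block_digest(chain[i - 1]) for i in range(1, len(chain)))


if __name__ == "__main__":
    chain: List[Dict] = []
    append_block(chain, ["alice->bob: 5"])
    append_block(chain, ["bob->carol: 2"])
    print(is_valid(chain))
    chain[0]["txs"][0] = "alice->bob: 500"   # tamper with history
    print(is_valid(chain))
```

Editing an earlier block changes its digest, so the next block's `prev` link no longer matches. This is why the shared chain can serve as the transaction record.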

Page 10: Coinami

Useful in Amsterdam

Page 11: Coinami

Bitcoin blocks

Nonce: a number such that, when the block content is hashed together with the nonce, the result is numerically smaller than the difficulty target.
Proof-of-work: finding the nonce.
• Hard to calculate
• Easy to verify
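The asymmetry between finding and checking a nonce can be shown in a few lines. This minimal sketch assumes arbitrary header bytes, an 8-byte little-endian nonce, and a toy target. It uses double SHA-256 as Bitcoin does, but it does not reproduce Bitcoin's exact 80-byte header layout or its 32-bit nonce field.

```python
import hashlib
from typing import Optional


def block_hash(header: bytes, nonce: int) -> int:
    """Double SHA-256 of header+nonce, read as a big integer."""
    data = header + nonce.to_bytes(8, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")


def mine(header: bytes, target: int, max_nonce: int = 2**32) -> Optional[int]:
    """Hard part: try nonces one by one until the hash falls below the target."""
    for nonce in range(max_nonce):
        if block_hash(header, nonce) < target:
            return nonce
    return None


def verify(header: bytes, nonce: int, target: int) -> bool:
    """Easy part: a single hash computation checks a claimed nonce."""
    return block_hash(header, nonce) < target


if __name__ == "__main__":
    target = 2 ** (256 - 16)  # toy difficulty: about 65,536 attempts expected
    nonce = mine(b"toy block content", target)
    print(nonce, verify(b"toy block content", nonce, target))
```

Halving the target doubles the expected number of `mine` iterations, while `verify` always costs one hash. Coinami keeps this shape and swaps the search for useful work.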

Page 12: Coinami

Coinami: BOINC/Bitcoin hybrid
• Calculating the nonce in Bitcoin simply burns compute power; it has no practical use
• Idea: replace the nonce calculation with something useful, while keeping the rest of the cryptocurrency intact
• Coinami: Coin-Application Mediator Interface
• The “application” can be anything that is hard to compute and easy to verify
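One way to picture the mediator idea is a plug-in interface with two operations: an expensive `solve` and a cheap `verify`, where coins are minted only when verification passes. This is a hypothetical sketch, not Coinami's code. `Work`, `ToyFactoring`, and `mediate` are illustrative names. Factoring by trial division is a stand-in "application"; in Coinami the work is BWA read alignment.

```python
from abc import ABC, abstractmethod
from typing import Any


class Work(ABC):
    """Hypothetical plug-in: any problem that is hard to solve and easy to check."""

    @abstractmethod
    def solve(self, problem: Any) -> Any: ...

    @abstractmethod
    def verify(self, problem: Any, solution: Any) -> bool: ...


class ToyFactoring(Work):
    """Stand-in work: find a nontrivial factor of n by trial division."""

    def solve(self, n: int) -> int:
        d = 2
        while d * d <= n:
            if n % d == 0:
                return d
            d += 1
        raise ValueError("n is prime; no nontrivial factor")

    def verify(self, n: int, d: int) -> bool:
        # A single modulo check replaces the whole search.
        return 1 < d < n and n % d == 0


def mediate(work: Work, problem: Any, solution: Any, reward: int = 1) -> int:
    """Mint `reward` coins only if the submitted solution verifies."""
    return reward if work.verify(problem, solution) else 0


if __name__ == "__main__":
    work = ToyFactoring()
    answer = work.solve(91)
    print(answer, mediate(work, 91, answer), mediate(work, 91, 5))
```

The rest of the currency only calls `verify`, so the kind of work can change without touching the coin logic.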

Page 13: Coinami

Coinami: Features
Not decentralized, but many-centralized:
• Approved sequencing centers are the signing authorities
• The root authority merely keeps track of the signing authorities

Multiplexing reads from multiple samples prevents FASTQ file reconstruction and preserves data privacy

BWA is the read aligner, but it can be changed
• Decoy reads are used for verification: real reads with previously known alignment locations
• They are used to check whether the returned BAM is real BWA output or forged

Read names are also encrypted, so it is not possible to distinguish run IDs, sample names, or decoys vs. queries

Demultiplexing samples and verification (decoy map checking) are done simultaneously: O(1) verification
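Doing demultiplexing and decoy checking in the same pass can be sketched as follows. This is a minimal sketch under stated assumptions. Each returned alignment has already been reduced to a hypothetical `(read_id, chrom, pos)` record rather than parsed from BAM. The authority keeps two private lookup tables: decoy ID to expected position, and read ID to sample. A decoy must land exactly at its known position. The names are illustrative. Here each record costs a single dictionary lookup, so verification adds only constant work per read on top of demultiplexing.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alignment:
    read_id: str  # opaque (encrypted) read name returned by the miner
    chrom: str
    pos: int


def verify_and_demultiplex(
    alignments: List[Alignment],
    decoys: Dict[str, Tuple[str, int]],  # read_id -> known (chrom, pos) of a decoy
    origin: Dict[str, str],              # read_id -> sample; known only to the authority
) -> Tuple[bool, Dict[str, List[Alignment]]]:
    """Single pass: check decoys and route real reads to their samples."""
    by_sample: Dict[str, List[Alignment]] = defaultdict(list)
    decoys_seen = set()
    for aln in alignments:
        expected = decoys.get(aln.read_id)  # one constant-time lookup per record
        if expected is None:
            by_sample[origin[aln.read_id]].append(aln)
        elif (aln.chrom, aln.pos) == expected:
            decoys_seen.add(aln.read_id)
        else:
            return False, {}  # a decoy landed in the wrong place: reject the job
    if len(decoys_seen) != len(decoys):
        return False, {}  # missing decoys: the job was not fully aligned
    return True, dict(by_sample)
```

An unknown read ID raises `KeyError` here. A production check would decide explicitly how to treat such records.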

Page 14: Coinami

Coinami: Mining

Page 15: Coinami

Coinami Workflow

Page 16: Coinami

Coinami Workflow

Page 17: Coinami

Coinami Workflow

Page 18: Coinami

Coinami Workflow

Page 19: Coinami

Sample Job

//These two reads are coming from SAMPLE [email protected]/1
CCTTNATACTTCCTGGACACCAACTGTTATACNNNGGNNNNNNNNNNNNAATGTCNNNNNCCTGGCCTTTCAAAAGCATAGGGGAATAAATTNNTCAATAA
+
CCCC#EEEEEHHHHHHHHHHHHHHGHHHHH@@###69############;>;<;=#####:9;;;HDHHHEDAEDEEEEEEHHHEEHGGH48##7:<=:<H
@SAMPLE1.Read425356/1
ACCTAGAAGGCATGAAAAGATTAAGGAAATTTTTTAAAAAGATATTCAATGAAGAAAATATTTTGTTTTGGCTAGCATGTAAAGATTTCTTTTTTTAATGC
+
HHHHGHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHGHHDHHHGHHFGHHGHHHHFHHHHHB@DDEHHEEDGE8EDFFFIF@GGGGHHHHHHFGFHB

//These are from SAMPLE 2

@SAMPLE2.Read2340294/1
GGTAACGCTCTATGATCCAGTCGATTTTCAGAGAGACGATGGCCGAGAGATCCGGCTTACGACACTGCCCAAGGGATTAGTAGAACAACAGTGCCACAGGA
+
D@5EEGGGGBFFD8GBDDCDFEEBDADDD########################################################################
@SAMPLE2.Read4983594/1
GGTGATGACCATGTTTTTGGTTTATCGGCGGCCCCCCCCGCTGGCGGGGGTTTTTTTGCTATCCACCATTTTGGCGGCGCACCACTCTTGAGGGTGGTGCA
+
>6,6/:@;;>BEFAGGGGE7FGDCD?E=CD#######################################################################

//DECOY [email protected]:Z:35T64/1
AGACAAGGCAAATTAAAGGTTTAGTAAGCTAAGTGTTCATGAACACATGACAAAAACGTGCCTGCTACTATTGTTGGGTGGCATTCTATAAATGAAATTAA
+
HHHHHHHHHHHEDHHGHFHCHHFHHGHHHHFCFFHHHFHFHHHHFHHHHHHFHHHGHEFGHD@HCEGG@FFFHEDGFG<EGEEFG=GEEFEGGG=G@GEFF
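The sample job above mixes reads from two samples with a decoy. Building such a job can be sketched as pooling all reads, shuffling them, and writing 4-line FASTQ records. This is a hypothetical sketch: `Read`, `build_job`, and `to_fastq` are illustrative names. Read names are left in plain text here, while in Coinami they are encrypted before the job is sent out. A seeded `random.Random` is used only to keep the example reproducible; an authority would more plausibly use an unpredictable source such as `random.SystemRandom`.

```python
import random
from typing import Dict, List, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str


def build_job(samples: Dict[str, List[Read]], decoys: List[Read], seed: int) -> List[Read]:
    """Pool reads from several samples plus decoys and shuffle them together."""
    pooled = [read for reads in samples.values() for read in reads] + list(decoys)
    random.Random(seed).shuffle(pooled)
    return pooled


def to_fastq(reads: List[Read]) -> str:
    """Serialize reads as 4-line FASTQ records (header, sequence, '+', qualities)."""
    return "".join(f"@{r.name}\n{r.seq}\n+\n{r.qual}\n" for r in reads)


if __name__ == "__main__":
    samples = {
        "SAMPLE1": [Read("SAMPLE1.Read1/1", "ACGT", "IIII")],
        "SAMPLE2": [Read("SAMPLE2.Read1/1", "GGCA", "IIII")],
    }
    decoys = [Read("DECOY.1/1", "TTAG", "IIII")]
    print(to_fastq(build_job(samples, decoys, seed=42)), end="")
```

After shuffling, the order of records no longer tells a miner which sample a read came from. Combined with encrypted names, this is what makes reconstructing an individual FASTQ file difficult.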

Page 20: Coinami

Sample Job - Encrypted

@BF0C691315C8761672AEBD1F2A42ED43B4D0F9197BD3209B6CC13B27711CC946B21C6DAE1A008F75508C290B1C324EDB/1
TTGCTAAATATGCTGAAATATTCGGATTGACCTCTGCGGAAGCCAGTAAGGATATACGGCAGGGATTGAAGAGTTTCGCCGGGGAGGGAGGGGGTTTTTAT
+
GGFFGGGFGGGGFEFFE?GGGGDFGGGGGGGBGGBFGGGGBFEEFGA?GG8DD=DFGGGFFFB######################################
@C480AC6C6D59F77BB873186F1A5E524039D3FFE6567A40559D9434D888FAF7239FF2ECEFD07C79B2762E777D2A074BB3/1
GCCCTCACCGACTGCCATTGTCCCTAATGCACCGTAACGGGTGTGGCTGTCTGAGCCGAGGCATATTTTTGCGCCGCCTGGCATTATCTCCAGCACATATT
+
F@DCFB@ABBDB=CD>BDC8@4@@?<EFFDFFFBDEEAEEEEE=EDDBDA###################################################
@A78878C3BE292C0FE0F3E64D2AE9FB2640FFC6D006BC15CF107EA587DD6F0E0395E7F3ECA36A7A867C0DA19D16585146/1
GAAGAGAGCTTTATGAGTCTCATGGCTAAATCTACACTGATGAGGGCAGTGACCCGGAGGCTGGTTTATTAGTATGAAAAAGTACGTCCACTGATAAAACT
+
FEE=FF@EE8CDDCC>@@DD299@;+>:@<19<@>E;EEE2,@:=EEE=-7,7<:ADA@9B4B46<AA#################################
@FEAB1E450AF92466520964FD2B39E052AE07D3ECCE6C92460399749F597405B2FEB75F602573E255148F745AE88145BF/1
GTTCAGGGTGAGTCGAATGATCCCTTGCCCGCATTCAGCGGAACTGTTGAATATGGGCAAATTCAGGGAACAATAGACAACTTTCAGGAACTCAATGTGCA
+
HHHHFHHHDFE@FFFBGGEBCGEGGFGHHFHGHGCGGHGHGHGGHCC>=FDC?CDBEEBE+>A;5@AB;?0<<0@@C@ABEEE/.@:>::.7>>>@:6?:A

Public key encryption + base64 encoding
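Hiding read names behind public-key encryption and base64 can be sketched with textbook RSA. This is a deliberately insecure toy under loud assumptions. The key is built from two Mersenne primes (2^127 - 1 and 2^89 - 1), there is no padding, and which party holds which key is our assumption, not something the slide specifies. A real system would use a vetted cryptography library with padded RSA (e.g. OAEP). The encrypted names shown in the sample job appear as hexadecimal strings; this sketch follows the slide's wording and emits base64. Function and constant names are hypothetical.

```python
import base64

# Toy RSA key from two Mersenne primes. INSECURE: illustration only.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)
N_BYTES = (N.bit_length() + 7) // 8


def encrypt_name(name: str) -> str:
    """Public-key step: anyone holding (N, E) can produce this token."""
    m = int.from_bytes(name.encode(), "big")
    if m >= N:
        raise ValueError("name too long for this toy key")
    c = pow(m, E, N)
    return base64.b64encode(c.to_bytes(N_BYTES, "big")).decode()


def decrypt_name(token: str) -> str:
    """Private-key step: only the holder of D (e.g. the signing authority) can do this."""
    c = int.from_bytes(base64.b64decode(token), "big")
    m = pow(c, D, N)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode()


if __name__ == "__main__":
    token = encrypt_name("SAMPLE1.Read425356")
    print(token)
    print(decrypt_name(token))
```

Textbook RSA is deterministic, so identical names would produce identical tokens. That is one reason real deployments use randomized padding.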

Page 21: Coinami

Future directions

• Completely decouple the proof-of-work from read mapping: Docker-based plugins to change the “work”
• Miners -> employees
• Authority servers -> employers
• Root authority -> central bank
• Web-based GUI for “job descriptions”: a job bulletin board for different employers

Page 22: Coinami

Conclusions
• The volume of HTS data is monotonically increasing, and computational analysis is the bottleneck
• Reference updates add further burden
• But (fortunately) read mapping is an embarrassingly parallel problem
• Volunteer grids may help; “the market will decide”
• Coins give miners motivation, since alignment is compute intensive
• Decentralized transactions with centralized mining

Page 23: Coinami

Resources

Coinami web page (created as part of senior project) https://coinami.github.io/

GitHub page (code not public yet) https://github.com/coinami

Page 24: Coinami

Acknowledgements
Bilkent
• Atalay Mert İleri (now at MIT)
• Halil İbrahim Özercan (now a senior student)
• Alper Gündoğdu (now at Facebook)
• Ahmet Kerim Şenol (now at Google)
• M. Yusuf Özkaya (now at Georgia Tech)

Travel fellowship to Halil I. Özercan

Page 25: Coinami

Minin’, minin’, minin’
Though the reads are mappin’
Keep them coins signin’
Rawhide!