Coinami
A Cryptocurrency with DNA Sequence Alignment as Proof-of-work
Halil I. Ozercan*, Atalay M. Ileri*, Alper Gundogdu, A. Kerim Senol, M. Yusuf Ozkaya, Can Alkan
Bilkent University, Ankara, Turkey
Atalay Mert Ileri: Idea, protocols
Introduction to Research course project
H. Ibrahim Ozercan, M. Yusuf Ozkaya, A. Kerim Senol, Alper Gundogdu: Developers, Bitcoin enthusiasts
Senior Design Project
Undergrad power
No grad students were harmed during the making of this project
HTS read alignment
Aligning HTS reads is a compute-intensive task
~35 CPU days per 30X genome using BWA
~18K human genomes / year can be sequenced using HiSeqX Ten
630K CPU days = ~1800 CPU years per HiSeqX Ten
Estimated 1 million genomes by the end of 2017
35 million CPU days = ~100K CPU years for alignment only
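As a quick check of the arithmetic above, the sketch below multiplies out the slide's figures. The 365-day year is an assumption, and the slide's ~1800 and ~100K CPU years are rounded values.

```python
# Figures taken from the slide
CPU_DAYS_PER_GENOME = 35      # BWA alignment of one 30X human genome
GENOMES_PER_YEAR = 18_000     # yearly throughput of one HiSeqX Ten

# Assumption: convert CPU days to CPU years with a 365-day year
DAYS_PER_YEAR = 365


def alignment_cost(genomes, cpu_days_per_genome=CPU_DAYS_PER_GENOME):
    """Return (CPU days, CPU years) needed to align `genomes` genomes."""
    cpu_days = genomes * cpu_days_per_genome
    return cpu_days, cpu_days / DAYS_PER_YEAR


per_instrument = alignment_cost(GENOMES_PER_YEAR)
one_million = alignment_cost(1_000_000)

print(f"One HiSeqX Ten: {per_instrument[0]:,} CPU days, about {per_instrument[1]:,.0f} CPU years")
print(f"1M genomes:     {one_million[0]:,} CPU days, about {one_million[1]:,.0f} CPU years")
```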
HTS read alignment (2)
Additionally, the human reference genome gets an update every 3-4 years
Fixes minor alleles
Fixes collapsed duplications
Fixes contig orientation (i.e. incorrect inversions)
Adds new sequence
For better reliability it is best to remap existing data to the new reference
All 1000 Genomes Project data are being remapped to GRCh38
Remapping old, or mapping new?
Large clusters are not infinite resources
While remapping old data, more new data are generated, which typically have higher priority
Computational burden keeps increasing
Proposal: volunteer grid computing
Volunteer grid computing: BOINC
Berkeley Open Infrastructure for Network Computing
Volunteers download “problem sets” from the server, solve them in “spare time”, upload results back
Made popular with the SETI@home project
Some bioinformatics applications are ported (Rosetta@home, RNAworld, DENIS@home)
Total computational power of 8.68 PetaFLOPs
Read mapping w/BOINC
Data privacy, making sure the alignments are correct, other potential problems
Main problem: HTS read mapping uses more compute resources (CPU, RAM, and disk), so volunteers are less likely to dedicate such resources
Solution: Motivating volunteers
Cryptocurrencies
Digital “money” that uses cryptography to ensure security in transactions and to control creation of new units.
Bitcoin, Dogecoin, Litecoin, etc.
Two parts
Mining: generation of new “block”s
Transaction: money exchange between peers
Bitcoin
Most popular cryptocurrency
Invented in 2008, open-source software in 2009
Block chain is the source of transactions
Completely decentralized
In 2013: 2,798,377 GH/s
As of now: 353,633,397 GH/s
Useful in Amsterdam
Bitcoin blocks
Nonce: a number such that when the block content is hashed with the nonce, the result is numerically smaller than the difficulty target.
Proof-of-work: finding the nonce.
• Hard to calculate
• Easy to verify
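The "hard to calculate, easy to verify" asymmetry can be illustrated with a toy nonce search. This is a minimal sketch, not Bitcoin's actual block format: the block content, the 8-byte nonce encoding, and the very easy target are assumptions chosen so the search finishes quickly. Only the double SHA-256 hash mirrors what Bitcoin uses.

```python
import hashlib


def block_hash(content: bytes, nonce: int) -> int:
    """Double SHA-256 of content plus nonce, interpreted as a big integer."""
    data = content + nonce.to_bytes(8, "big")  # assumed toy encoding
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "big")


def mine(content: bytes, target: int, max_tries: int = 1_000_000):
    """Hard part: try nonces one by one until the hash falls below the target."""
    for nonce in range(max_tries):
        if block_hash(content, nonce) < target:
            return nonce
    return None  # no nonce found within max_tries


def verify(content: bytes, nonce: int, target: int) -> bool:
    """Easy part: a single hash evaluation checks the claimed nonce."""
    return block_hash(content, nonce) < target


# Toy target: roughly 1 in 4096 hashes qualifies (real targets are far smaller)
TARGET = 1 << 244
content = b"toy block: Alice pays Bob 1 coin"  # hypothetical block content

nonce = mine(content, TARGET)
print("nonce found:", nonce, "valid:", verify(content, nonce, TARGET))
```

Finding the nonce takes thousands of hash evaluations on average even at this toy difficulty, while checking it takes one.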
Coinami: BOINC/Bitcoin hybrid
Calculating the nonce in Bitcoin is simply burning up compute power. No practical use.
Idea: replace the nonce calculation with something useful, while keeping the rest of the cryptocurrency intact
Coinami: Coin-Application Mediator Interface
“Application” can be anything that is hard to compute, easy to verify
Coinami: Features
Not decentralized, but many-centralized.
Approved sequencing centers are signing authorities
Root authority merely keeps track of the signing authorities
Multiplexing reads from multiple samples prevents FASTQ file reconstruction and enables data privacy
BWA read aligner, but can be changed
Uses decoy reads for verification: real reads with previously known alignment locations. Used to check whether the returned BAM is real BWA output or forged.
Read names are also encrypted; it is not possible to distinguish run IDs, sample names, or decoys vs. queries
Demultiplexing samples and verification (decoy map checking) are done simultaneously
O(1) verification
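The multiplexing and decoy checks listed above can be sketched as follows. This is an illustration under stated assumptions, not Coinami's implementation: the function names and data layout are hypothetical, random opaque tokens stand in for the encrypted read names, and alignments are reduced to (chromosome, position) pairs instead of BAM records. Each returned read costs one dictionary lookup, which is one way to read the constant-time verification claim.

```python
import random
import secrets


def build_job(samples, decoys):
    """Mix reads from several samples with decoys under opaque names.

    samples: {sample_id: [read_sequence, ...]}
    decoys:  [(read_sequence, chrom, pos), ...] with known alignments
    Returns (job, key). The job goes to the miner; the key stays on the server.
    """
    job, key = [], {}
    for sample_id, reads in samples.items():
        for seq in reads:
            name = secrets.token_hex(16)  # stand-in for an encrypted read name
            key[name] = ("query", sample_id)
            job.append((name, seq))
    for seq, chrom, pos in decoys:
        name = secrets.token_hex(16)
        key[name] = ("decoy", (chrom, pos))
        job.append((name, seq))
    random.shuffle(job)  # the miner cannot tell samples or decoys apart by order
    return job, key


def verify_and_demultiplex(alignments, key):
    """Check decoys and split results per sample in a single pass.

    alignments: [(opaque_name, chrom, pos), ...] as returned by the miner
    Returns {sample_id: [(chrom, pos), ...]}, or None if the job is rejected.
    """
    per_sample, seen = {}, set()
    for name, chrom, pos in alignments:
        if name not in key or name in seen:
            return None  # unknown or duplicated read name
        seen.add(name)
        kind, info = key[name]
        if kind == "decoy":
            if info != (chrom, pos):
                return None  # decoy placed at the wrong location: forged output
        else:
            per_sample.setdefault(info, []).append((chrom, pos))
    if len(seen) != len(key):
        return None  # some reads were not returned
    return per_sample


# Hypothetical toy data
samples = {"SAMPLE1": ["ACGT", "TTGA"], "SAMPLE2": ["GGCC"]}
decoys = [("CATG", "chr1", 1000)]
job, key = build_job(samples, decoys)

# Stand-in for an honest miner: decoys land on their known locations
honest = [
    (name, *key[name][1]) if key[name][0] == "decoy" else (name, "chr2", 5)
    for name, _ in job
]
# Stand-in for a forger who returns made-up locations for every read
forged = [(name, "chrX", 1) for name, _ in job]

print(verify_and_demultiplex(honest, key))
print(verify_and_demultiplex(forged, key))
```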
Coinami: Mining
Coinami Workflow
Sample Job
//These two reads are coming from SAMPLE [email protected]/1
CCTTNATACTTCCTGGACACCAACTGTTATACNNNGGNNNNNNNNNNNNAATGTCNNNNNCCTGGCCTTTCAAAAGCATAGGGGAATAAATTNNTCAATAA
+
CCCC#EEEEEHHHHHHHHHHHHHHGHHHHH@@###69############;>;<;=#####:9;;;HDHHHEDAEDEEEEEEHHHEEHGGH48##7:<=:<H
@SAMPLE1.Read425356/1
ACCTAGAAGGCATGAAAAGATTAAGGAAATTTTTTAAAAAGATATTCAATGAAGAAAATATTTTGTTTTGGCTAGCATGTAAAGATTTCTTTTTTTAATGC
+
HHHHGHHHHHHHHGHHHHHHHHHHHHHHHHHHHHHHHHHHGHHDHHHGHHFGHHGHHHHFHHHHHB@DDEHHEEDGE8EDFFFIF@GGGGHHHHHHFGFHB
//These are from SAMPLE 2
@SAMPLE2.Read2340294/1
GGTAACGCTCTATGATCCAGTCGATTTTCAGAGAGACGATGGCCGAGAGATCCGGCTTACGACACTGCCCAAGGGATTAGTAGAACAACAGTGCCACAGGA
+
D@5EEGGGGBFFD8GBDDCDFEEBDADDD########################################################################
@SAMPLE2.Read4983594/1
GGTGATGACCATGTTTTTGGTTTATCGGCGGCCCCCCCCGCTGGCGGGGGTTTTTTTGCTATCCACCATTTTGGCGGCGCACCACTCTTGAGGGTGGTGCA
+
>6,6/:@;;>BEFAGGGGE7FGDCD?E=CD#######################################################################
//DECOY [email protected]:Z:35T64/1
AGACAAGGCAAATTAAAGGTTTAGTAAGCTAAGTGTTCATGAACACATGACAAAAACGTGCCTGCTACTATTGTTGGGTGGCATTCTATAAATGAAATTAA
+
HHHHHHHHHHHEDHHGHFHCHHFHHGHHHHFCFFHHHFHFHHHHFHHHHHHFHHHGHEFGHD@HCEGG@FFFHEDGFG<EGEEFG=GEEFEGGG=G@GEFF
Sample Job - Encrypted
@BF0C691315C8761672AEBD1F2A42ED43B4D0F9197BD3209B6CC13B27711CC946B21C6DAE1A008F75508C290B1C324EDB/1
TTGCTAAATATGCTGAAATATTCGGATTGACCTCTGCGGAAGCCAGTAAGGATATACGGCAGGGATTGAAGAGTTTCGCCGGGGAGGGAGGGGGTTTTTAT
+
GGFFGGGFGGGGFEFFE?GGGGDFGGGGGGGBGGBFGGGGBFEEFGA?GG8DD=DFGGGFFFB######################################
@C480AC6C6D59F77BB873186F1A5E524039D3FFE6567A40559D9434D888FAF7239FF2ECEFD07C79B2762E777D2A074BB3/1
GCCCTCACCGACTGCCATTGTCCCTAATGCACCGTAACGGGTGTGGCTGTCTGAGCCGAGGCATATTTTTGCGCCGCCTGGCATTATCTCCAGCACATATT
+
F@DCFB@ABBDB=CD>BDC8@4@@?<EFFDFFFBDEEAEEEEE=EDDBDA###################################################
@A78878C3BE292C0FE0F3E64D2AE9FB2640FFC6D006BC15CF107EA587DD6F0E0395E7F3ECA36A7A867C0DA19D16585146/1
GAAGAGAGCTTTATGAGTCTCATGGCTAAATCTACACTGATGAGGGCAGTGACCCGGAGGCTGGTTTATTAGTATGAAAAAGTACGTCCACTGATAAAACT
+
FEE=FF@EE8CDDCC>@@DD299@;+>:@<19<@>E;EEE2,@:=EEE=-7,7<:ADA@9B4B46<AA#################################
@FEAB1E450AF92466520964FD2B39E052AE07D3ECCE6C92460399749F597405B2FEB75F602573E255148F745AE88145BF/1
GTTCAGGGTGAGTCGAATGATCCCTTGCCCGCATTCAGCGGAACTGTTGAATATGGGCAAATTCAGGGAACAATAGACAACTTTCAGGAACTCAATGTGCA
+
HHHHFHHHDFE@FFFBGGEBCGEGGFGHHFHGHGCGGHGHGHGGHCC>=FDC?CDBEEBE+>A;5@AB;?0<<0@@C@ABEEE/.@:>::.7>>>@:6?:A
Public key encryption + base64 encoding
Future directions
Complete decoupling of read mapping as proof-of-work
Docker-based plugins to change the “work”
Miners -> employees
Authority servers -> employers
Root authority -> central bank
Web-based GUI for “job descriptions”
A job bulletin board for different employers
Conclusions
HTS data is monotonically increasing
Computational analysis is the bottleneck
Additional burden due to reference updates
But (fortunately) embarrassingly parallel problem
Voluntary grids may help
“Market will decide”
Coins give motivation to miners since alignment is compute intensive
Decentralized transaction with centralized mining
Resources
Coinami web page (created as part of senior project) https://coinami.github.io/
GitHub page (code not public yet) https://github.com/coinami
Acknowledgements
Bilkent
Atalay Mert İleri (now at MIT)
Halil İbrahim Özercan (now senior student)
Alper Gündoğdu (now at Facebook)
Ahmet Kerim Şenol (now at Google)
M. Yusuf Özkaya (now at Georgia Tech)
Travel fellowship to Halil I. Özercan
Minin’, minin’, minin’
Though the reads are mappin’
Keep them coins signin’
Rawhide!