Painless Data Storage with MongoDB & Go
steve-francia
• Author of Hugo, Cobra, Viper & More
• Chief Developer Advocate for MongoDB
• Gopher
@spf13
Why Go?
Why Another Language?
• Software is slow
• Software is hard to write
• Software doesn’t scale well
Go is Fast
• Go execution speed is close to C
• Go compile time rivals dynamic interpretation
Go is Friendly
• Feels like a dynamic language in many ways
• Very small core language, easy to remember all of it
• Single binary installation, no dependencies
• Extensive Tooling & StdLib
Go is Concurrent
• Concurrency is part of the language
• Any function can become a goroutine
• Goroutines run concurrently, communicate through channels
• Select waits for communication on any of a set of channels
MongoDB
Why Another Database?
• Databases are slow
• Relational structures don’t fit well with modern programming (ORMs)
• Databases don’t scale well
MongoDB is Fast
• Written in C++
• Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
• Runs nearly everywhere
• Data serialized as BSON (fast parsing)
• Full support for primary & secondary indexes
• Document model = less work
MongoDB is Friendly
• Ad hoc queries
• Real time aggregation
• Rich query capabilities
• Traditionally consistent
• Geospatial features
• Support for most programming languages
• Flexible schema
MongoDB is “Web Scale”
• Built-in sharding support distributes data across many nodes
• mongos intelligently routes to the correct nodes
• Aggregation done in parallel across nodes
Document Database
• Not for .PDF & .DOC files
• A document is essentially an associative array
• Document == JSON object
• Document == PHP Array
• Document == Python Dict
• Document == Ruby Hash
• etc
Data Serialization
• Applications need persistent data
• Serialization is the process of translating data structures into a format that can be stored
• The ideal format is accessible from many languages
BSON
• Inspired by JSON
• Cross-language binary serialization format
• Optimized for scanning
• Support for richer types
MongoDB & Go
Go’s Data Types
• Go uses strict & static typing
• Two types are similar to a BSON document:
• Struct
• Map
Serializing with BSON
    bob := &Person{
        Name:     "Bob",
        Birthday: time.Now(),
    }

    data, err := bson.Marshal(bob)
    if err != nil {
        return err
    }
    fmt.Printf("Data: %q\n", data)

    var person Person
    err = bson.Unmarshal(data, &person)
    if err != nil {
        return err
    }
    fmt.Printf("Person: %v\n", person)
Data: "%\x00\x00\x00\x02name\x00\x04\x00\x00\x00Bob\x00\tbirthday\x00\x80\r\x97|^\x00\x00\x00\x00"
Person: {Bob 2014-07-21 18:00:00 -0500 EST}
Equal After Marshaling
Struct
    type Project struct {
        Name       string `bson:"name"`
        ImportPath string `bson:"importPath"`
    }
    project := Project{name, path}
Custom Map
    project := map[string]string{"name": name, "importPath": path}
Document Slice
    project := bson.D{{"name", name}, {"importPath", path}}
mgo (mango)
• Pure Go
• Created in late 2010 ("Where do I put my Go data?")
• Adopted by Canonical and MongoDB Inc. itself
• Sponsored by MongoDB Inc. from late 2011
Connecting
Connecting
• Same interface for server, replica set, or shard
• Driver discovers and maintains topology
• Server added/removed, failovers, response times, etc.
    session, err := mgo.Dial("localhost")
    if err != nil {
        return err
    }
Sessions
• Sessions are lightweight
• Sessions are copied (settings preserved)
• Single management goroutine for all copied sessions
    func (s *Server) handle(w http.ResponseWriter, r *http.Request) {
        session := s.session.Copy()
        defer session.Close()
        // ... handle request ...
    }
Convenient Access
• Saves typing
• Uses the same session over and over
    projects := session.DB("OSCON").C("projects")
Writing
Defining Our Own Type
    type Project struct {
        Name       string `bson:"name,omitempty"`
        ImportPath string `bson:"importPath,omitempty"`
    }
Insert
    var projectList = []Project{
        {"gocheck", "gopkg.in/check.v1"},
        {"qml", "gopkg.in/qml.v0"},
        {"pipe", "gopkg.in/pipe.v2"},
        {"yaml", "gopkg.in/yaml.v1"},
    }

    for _, project := range projectList {
        err := projects.Insert(project)
        if err != nil {
            return err
        }
    }
    fmt.Println("Okay!")
Okay!
Update
    type M map[string]interface{}

    change := M{"$set": Project{ImportPath: "gopkg.in/qml.v1"}}

    err = projects.Update(Project{Name: "qml"}, change)
    if err != nil {
        return err
    }

    fmt.Println("Done!")
Done!
Querying
Find
    var project Project

    err = projects.Find(Project{Name: "qml"}).One(&project)
    if err != nil {
        return err
    }

    fmt.Printf("Project: %v\n", project)
Project: {qml gopkg.in/qml.v0}
Iterate
    iter := projects.Find(nil).Iter()

    var project Project
    for iter.Next(&project) {
        fmt.Printf("Project: %v\n", project)
    }

    return iter.Err()
Project: {gocheck gopkg.in/check.v1}
Project: {qml gopkg.in/qml.v0}
Project: {pipe gopkg.in/pipe.v2}
Project: {yaml gopkg.in/yaml.v1}
Nesting
    m := map[string]interface{}{
        "name": "godep",
        "tags": []string{"tool", "dependency"},
        "contact": bson.M{
            "name":  "Keith Rarick",
            "email": "[email protected]",
        },
    }

    err = projects.Insert(m)
    if err != nil {
        return err
    }
    fmt.Println("Okay!")
Okay!
Nesting II
    type Contact struct {
        Name  string
        Email string
    }

    type Project struct {
        Name    string
        Tags    []string `bson:",omitempty"`
        Contact Contact  `bson:",omitempty"`
    }

    var project Project
    err = projects.Find(Project{Name: "godep"}).One(&project)
    if err != nil {
        return err
    }

    pretty.Println("Project:", project)
Project: main.Project{
    Name: "godep",
    Tags: {"tool", "dependency"},
    Contact: {Name:"Keith Rarick", Email:"[email protected]"},
}
Indexing
• Compound
• List indexing (think tag lists)
• Geospatial
• Dense or sparse
• Full-text searching
    // Root field
    err = projects.EnsureIndexKey("name")
    ...

    // Nested field
    err = projects.EnsureIndexKey("author.email")
    ...
Concurrency
Concurrent
    func f(projects *mgo.Collection, name string, done chan error) {
        var project Project
        err := projects.Find(Project{Name: name}).One(&project)
        if err == nil {
            fmt.Printf("Project: %v\n", project)
        }
        done <- err
    }

    done := make(chan error)

    go f(projects, "qml", done)
    go f(projects, "gocheck", done)

    if err = firstError(2, done); err != nil {
        return err
    }
Project: {qml gopkg.in/qml.v1}
Project: {gocheck gopkg.in/check.v1}
A Common Approach
• Find 1 issued
• Doc 1 returned
• Find 2 issued
• Doc 2 returned
[Diagram: Find 1, Find 2, DB]
Concurrent Queries
• Find 1 issued
• Find 2 issued
• Doc 1 returned
• Doc 2 returned
[Diagram: Find 1, Find 2, DB]
Concurrent Loading
• Loads 200 results at a time
• Loads next batch with (0.25 * 200) results left to process
    session.SetBatch(200)
    session.SetPrefetch(0.25)

    for iter.Next(&result) {
        ...
    }
Handler With Session Copy
• Each Copy uses a different connection
• Closing session returns socket to the pool
• defer runs at end of function
    func (s *Server) handle(w http.ResponseWriter, r *http.Request) {
        session := s.session.Copy()
        defer session.Close()

        // ... handle request ...
    }
Handler With Single Session
• Shares a single connection
• Still quite efficient thanks to concurrent capabilities of Go + mgo
    func (s *Server) handle(w http.ResponseWriter, r *http.Request) {
        session := s.session

        // ... handle request ...
    }
GridFS
GridFS
• Not quite a file system
• Really useful for local file storage
• A convention, not a feature
• Supported by all drivers
• Fully replicated, sharded file storage
GridFS
    gridfs := session.DB("OSCON").GridFS("fs")

    file, err := gridfs.Create("cd.iso")
    if err != nil {
        return err
    }
    defer file.Close()

    started := time.Now()

    _, err = io.Copy(file, iso)
    if err != nil {
        return err
    }

    fmt.Printf("Wrote %d bytes in %s\n", file.Size(), time.Since(started))
Wrote 470386961 bytes in 7.0s
Full Featured
Features
• Transactions (mgo/txn experiment)
• Aggregation pipelines
• Full-text search
• Geospatial support
• Hadoop
In Conclusion
Getting Started
• 1. Install MongoDB
• 2. go get gopkg.in/mgo.v2
• 3. Start small
• 4. Build something great
Learning More
• MongoDB Manual
• Effective Go
• labix.org/mgo
• Workshop: Using mgo, on spf13.com
• @spf13
• Author of Hugo, Cobra, Viper & More
• Chief Developer Advocate for MongoDB
• Gopher
Thank You