I am trying to use the svgo package to plot points on an SVG file and display that using the web browser. From looking at the net/http documentation, I don't know how I could pass arguments into my svgWeb function.
The example below compiles and displays a triangle and a line in my web-browser, but what I would really like to do is plot xpts and ypts using the Polyline method. How can I pass the appropriate arguments or restructure this example to accomplish that task?
package main
import (
"github.com/ajstarks/svgo"
"log"
"net/http"
)
func svgWeb(w http.ResponseWriter, req *http.Request) {
w.Header().Set("Content-Type", "image/svg+xml")
xpts := []int{1, 200, 5}
ypts := []int{200, 400, 300}
s := svg.New(w)
s.Start(500, 500)
s.Line(5, 10, 400, 400, "stroke:black")
s.Polyline(xpts, ypts, "stroke:black")
s.End()
}
//// Main Program function
//////////////////////////////
func main() {
xpts := []int{}
ypts := []int{}
for i := 0; i < 100; i++ {
xpts = append(xpts, i)
ypts = append(ypts, i+5)
}
http.Handle("/economy", http.HandlerFunc(svgWeb))
err := http.ListenAndServe(":2003", nil)
if err != nil {
log.Fatal("ListenAndServe:", err)
}
}
If your arguments are meant to be supplied by the client, then they should be passed to your handler via the http.Request.
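For the client-supplied case, one option is to read the points from the query string. The sketch below is only an illustration: the parameter names x and y, the comma-separated format, and the helper parsePoints are assumptions, not part of the question. It exercises the handler in-process with net/http/httptest instead of starting a server.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
)

// parsePoints converts a comma-separated list such as "1,200,5" into ints.
func parsePoints(s string) ([]int, error) {
	if s == "" {
		return nil, nil
	}
	fields := strings.Split(s, ",")
	pts := make([]int, 0, len(fields))
	for _, f := range fields {
		n, err := strconv.Atoi(strings.TrimSpace(f))
		if err != nil {
			return nil, fmt.Errorf("bad point %q: %w", f, err)
		}
		pts = append(pts, n)
	}
	return pts, nil
}

// pointsHandler reads the hypothetical "x" and "y" query parameters.
func pointsHandler(w http.ResponseWriter, req *http.Request) {
	xpts, err := parsePoints(req.URL.Query().Get("x"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	ypts, err := parsePoints(req.URL.Query().Get("y"))
	if err != nil || len(xpts) != len(ypts) {
		http.Error(w, "x and y must be integer lists of equal length", http.StatusBadRequest)
		return
	}
	// At this point xpts and ypts could be handed to s.Polyline(xpts, ypts, ...).
	fmt.Fprintf(w, "x=%v y=%v\n", xpts, ypts)
}

func main() {
	// Call the handler directly with a recorded request.
	req := httptest.NewRequest("GET", "/economy?x=1,200,5&y=200,400,300", nil)
	rec := httptest.NewRecorder()
	pointsHandler(rec, req)
	fmt.Print(rec.Body.String())
}
```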
But if what you are trying to do is to drive your svgWeb handler by points that are not supplied by the client request, but rather by some other functions in your application generating these values internally, then one way would be to structure your handler into a struct and use member attributes.
The struct may look like this:
type SvgManager struct {
Xpts, Ypts []int
}
func (m *SvgManager) SvgWeb(w http.ResponseWriter, req *http.Request) {
w.Header().Set("Content-Type", "image/svg+xml")
s := svg.New(w)
s.Start(500, 500)
s.Line(5, 10, 400, 400, "stroke:black")
s.Polyline(m.Xpts, m.Ypts, "stroke:black")
s.End()
}
Then in your main:
manager := new(SvgManager)
for i := 0; i < 100; i++ {
manager.Xpts = append(manager.Xpts, i)
manager.Ypts = append(manager.Ypts, i+5)
}
// I only did this assignment to make the SO display shorter in width.
// Could have put it directly in the http.Handle()
handler := http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
manager.SvgWeb(w, req)
})
http.Handle("/economy", handler)
Now you have an SvgManager instance that could contain other handlers as well, and can be updated to affect the output of their handlers.
Satisfying the Handler interface
As mentioned by @Atom in the comments, you could completely avoid the closure and the wrapper by simply renaming your method to ServeHTTP. This would satisfy the Handler interface:
func (m *SvgManager) ServeHTTP(w http.ResponseWriter, req *http.Request) {
...
manager := new(SvgManager)
http.Handle("/economy", manager)
You should define your function inside main as an anonymous function. This way, it can refer to the local variables xpts and ypts (the function will be a closure).
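A minimal sketch of the closure approach follows. To stay dependency-free it writes the polyline element by hand instead of calling svgo; polyline and newSVGHandler are hypothetical names. It also returns the closure from a small factory function rather than defining it inline in main, which captures the slices in the same way.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// polyline renders the points as a bare SVG <polyline> element.
// It is a hand-written stand-in for svgo's s.Polyline and assumes
// ypts is at least as long as xpts.
func polyline(xpts, ypts []int) string {
	var b strings.Builder
	b.WriteString(`<polyline points="`)
	for i := range xpts {
		if i > 0 {
			b.WriteByte(' ')
		}
		fmt.Fprintf(&b, "%d,%d", xpts[i], ypts[i])
	}
	b.WriteString(`" style="stroke:black"/>`)
	return b.String()
}

// newSVGHandler returns a closure that captures xpts and ypts.
func newSVGHandler(xpts, ypts []int) http.HandlerFunc {
	return func(w http.ResponseWriter, req *http.Request) {
		w.Header().Set("Content-Type", "image/svg+xml")
		fmt.Fprintf(w, `<svg xmlns="http://www.w3.org/2000/svg" width="500" height="500">%s</svg>`,
			polyline(xpts, ypts))
	}
}

func main() {
	var xpts, ypts []int
	for i := 0; i < 3; i++ {
		xpts = append(xpts, i)
		ypts = append(ypts, i+5)
	}
	handler := newSVGHandler(xpts, ypts)

	// Exercise the handler in-process; a real program would pass it to http.Handle.
	rec := httptest.NewRecorder()
	handler(rec, httptest.NewRequest("GET", "/economy", nil))
	fmt.Println(rec.Body.String())
}
```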
I am working on a project where, during startup, I need to read certain files and store their contents in memory in a map. I then periodically look for new files and, if there are any, replace whatever was loaded into the map at startup with the new data. Each new file is a full state, so I want to refresh the in-memory map with it instead of appending to it.
The method loadAtStartupAndProcessNewChanges below is called during server startup; it reads the files and stores the data in memory. It also starts a goroutine, detectNewFiles, which periodically checks for new files and puts them on the deltaChan channel. That channel is later read by another goroutine, processNewFiles, which reads the new file and stores its data in the same map. If there is any error, we put it on the err channel. loadFiles is the function that reads the files and stores them in the map.
type customerConfig struct {
deltaChan chan string
err chan error
wg sync.WaitGroup
data *cmap.ConcurrentMap
}
// this is called during server startup.
func (r *customerConfig) loadAtStartupAndProcessNewChanges() error {
path, err := r.GetPath("...", "....")
if err != nil {
return err
}
r.wg.Add(1)
go r.detectNewFiles(path)
err = r.loadFiles(4, path)
if err != nil {
return err
}
r.wg.Add(1)
go r.processNewFiles()
return nil
}
This method figures out whether there are any new files that need to be consumed and, if so, puts them on the deltaChan channel, which is later consumed by the processNewFiles goroutine to read the files into memory. If there is an error, it adds the error to the error channel.
func (r *customerConfig) detectNewFiles(rootPath string) {
}
This reads all S3 files, stores them in memory and returns an error if there is one. In this method I clear the previous state of my map so that it can hold fresh state from the new files. This method is called during server startup and also whenever we need to process new files from the processNewFiles goroutine.
func (r *customerConfig) loadFiles(workers int, path string) error {
var err error
...
var files []string
files = .....
// reset the map so that it can have fresh state from new files.
r.data.Clear()
g, ctx := errgroup.WithContext(context.Background())
sem := make(chan struct{}, workers)
for _, file := range files {
select {
case <-ctx.Done():
break
case sem <- struct{}{}:
}
file := file
g.Go(func() error {
defer func() { <-sem }()
return r.read(spn, file, bucket)
})
}
if err := g.Wait(); err != nil {
return err
}
return nil
}
This method reads a file and adds its contents to the data concurrent map.
func (r *customerConfig) read(file string, bucket string) error {
// read file and store it in "data" concurrent map
// and if there is any error then return the error
var err error
fr, err := pars3.NewS3FileReader(context.Background(), bucket, file, r.s3Client.GetSession().Config)
if err != nil {
return errs.Wrap(err)
}
defer xio.CloseIgnoringErrors(fr)
pr, err := reader.NewParquetReader(fr, nil, 8)
if err != nil {
return errs.Wrap(err)
}
if pr.GetNumRows() == 0 {
spn.Infof("Skipping %s due to 0 rows", file)
return nil
}
for {
rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
if err != nil {
return errs.Wrap(err)
}
if len(rows) <= 0 {
break
}
byteSlice, err := json.Marshal(rows)
if err != nil {
return errs.Wrap(err)
}
var invMods []CompModel
err = json.Unmarshal(byteSlice, &invMods)
if err != nil {
return errs.Wrap(err)
}
for i := range invMods {
key := strconv.FormatInt(invMods[i].ProductID, 10) + ":" + strconv.Itoa(int(invMods[i].Iaz))
hasInventory := false
if invMods[i].Available > 0 {
hasInventory = true
}
r.data.Set(key, hasInventory)
}
}
return nil
}
This method picks up whatever is on the delta channel and, if there are new files, reads them by calling the loadFiles method. If there is an error, it adds the error to the error channel.
// processNewFiles - load new files found by detectNewFiles
func (r *customerConfig) processNewFiles() {
// find new files on delta channel
// and call "loadFiles" method to read it
// if there is any error, then it will add it to the error channel.
}
If there is any error on the error channel, it is logged by the method below:
func (r *customerConfig) handleError() {
// read error from error channel if there is any
// then log it
}
Problem Statement
The above logic works for me, but there is one bug that I have not been able to solve. As you can see, I have a concurrent map which I populate in my read method and clear completely in my loadFiles method. Whenever there is a new file on the delta channel I don't want to keep the previous state in the map, so I remove everything from the map and then add the new state from the new files.
Now, if there is any error in the read method, the bug appears: I have already cleared all the data in my data map, so it is left empty, which is not what I want. If there is any error, I would like to preserve the previous state in the data map. How can I resolve this issue in my current design?
Note: I am using golang concurrent map
I think your design is overcomplicated. It can be solved in a much simpler way, which gives all the benefits you desire:
safe for concurrent access
detected changes are reloaded
accessing the config gives you the most recent, successfully loaded config
the most recent config is always, immediately accessible, even if loading a new config due to detected changes takes long
if loading new config fails, the previous "snapshot" is kept and remains the current
as a bonus, it's much simpler and doesn't even use 3rd party libs
Let's see how to achieve this:
Have a CustomerConfig struct holding everything you want to cache (this is the "snapshot"):
type CustomerConfig struct {
Data map[string]bool
// Add other props if you need:
LoadedAt time.Time
}
Provide a function that loads the config you wish to cache. Note: this function is stateless, it does not access / operate on package level variables:
func loadConfig() (*CustomerConfig, error) {
cfg := &CustomerConfig{
Data: map[string]bool{},
LoadedAt: time.Now(),
}
// Logic to load files, and populate cfg.Data
// If an error occurs, return it
// If loading succeeds, return the config
return cfg, nil
}
Now let's create our "cache manager". The cache manager stores the actual / current config (the snapshot), and provides access to it. For safe concurrent access (and update), we use a sync.RWMutex. Also has means to stop the manager (to stop the concurrent refreshing):
type ConfigCache struct {
configMu sync.RWMutex
config *CustomerConfig
closeCh chan struct{}
}
Creating a cache loads the initial config. Also launches a goroutine that will be responsible to periodically check for changes.
func NewConfigCache() (*ConfigCache, error) {
cfg, err := loadConfig()
if err != nil {
return nil, fmt.Errorf("loading initial config failed: %w", err)
}
cc := &ConfigCache{
config: cfg,
closeCh: make(chan struct{}),
}
// launch goroutine to periodically check for changes, and load new configs
go cc.refresher()
return cc, nil
}
The refresher() periodically checks for changes, and if changes are detected, calls loadConfig() to load new data to be cached, and stores it as the current / actual config (while locking configMu). It also monitors closeCh to stop if that is requested:
func (cc *ConfigCache) refresher() {
ticker := time.NewTicker(1 * time.Minute) // Every minute
defer ticker.Stop()
for {
select {
case <-ticker.C:
// Check if there are changes
changes := false // logic to detect changes
if !changes {
continue // No changes, continue
}
// Changes! load new config:
cfg, err := loadConfig()
if err != nil {
log.Printf("Failed to load config: %v", err)
continue // Keep the previous config
}
// Apply / store new config
cc.configMu.Lock()
cc.config = cfg
cc.configMu.Unlock()
case <-cc.closeCh:
return
}
}
}
Closing the cache manager (the refresher goroutine) is as easy as:
func (cc *ConfigCache) Stop() {
close(cc.closeCh)
}
The last missing piece is how you access the current config. That's a simple GetConfig() method (that also uses configMu, but in read-only mode):
func (cc *ConfigCache) GetConfig() *CustomerConfig {
cc.configMu.RLock()
defer cc.configMu.RUnlock()
return cc.config
}
This is how you can use this:
cc, err := NewConfigCache()
if err != nil {
// Decide what to do: retry, terminate etc.
}
// Where ever, whenever you need the actual (most recent) config in your app:
cfg := cc.GetConfig()
// Use cfg
Before you shut down your app (or you want to stop the refreshing), you may call cc.Stop().
Add an RWMutex to protect collectedData against concurrent writes by the worker goroutines:
type customerConfig struct {
...
m sync.RWMutex
}
Instead of updating map in read method let read method just return the data and error
func (r *customerConfig) read(file string, bucket string) ([]CompModel, error) {
// read file data and return with error if any
var err error
fr, err := pars3.NewS3FileReader(context.Background(), bucket, file, r.s3Client.GetSession().Config)
if err != nil {
return nil, errs.Wrap(err)
}
defer xio.CloseIgnoringErrors(fr)
pr, err := reader.NewParquetReader(fr, nil, 8)
if err != nil {
return nil, errs.Wrap(err)
}
if pr.GetNumRows() == 0 {
spn.Infof("Skipping %s due to 0 rows", file)
return nil, errors.New("No Data")
}
var invMods = []CompModel{}
for {
rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
if err != nil {
return nil, errs.Wrap(err)
}
if len(rows) <= 0 {
break
}
byteSlice, err := json.Marshal(rows)
if err != nil {
return nil, errs.Wrap(err)
}
var jsonData []CompModel
err = json.Unmarshal(byteSlice, &jsonData)
if err != nil {
return nil, errs.Wrap(err)
}
invMods = append(invMods, jsonData...)
}
return invMods, nil
}
Then, in loadFiles, you can collect the data returned by the read method and, only if there is no error, clear and update the map; otherwise leave the old data as it was before:
func (r *customerConfig) loadFiles(workers int, path string) error {
var err error
...
var files []string
files = .....
// reset the map so that it can have fresh state from new files.
// r.data.Clear() <- remove the clear from here
g, ctx := errgroup.WithContext(context.Background())
sem := make(chan struct{}, workers)
collectedData := []CompModel{}
for _, file := range files {
select {
case <-ctx.Done():
break
case sem <- struct{}{}:
}
file := file
g.Go(func() error {
defer func() { <-sem }()
data, err := r.read(spn, file, bucket)
if err != nil {
return err
}
r.m.Lock()
collectedData = append(collectedData, data...)
r.m.Unlock()
return nil
})
}
if err := g.Wait(); err != nil {
return err
}
r.data.Clear()
for i := range collectedData {
key := strconv.FormatInt(collectedData[i].ProductID, 10) + ":" + strconv.Itoa(int(collectedData[i].Iaz))
hasInventory := false
if collectedData[i].Available > 0 {
hasInventory = true
}
r.data.Set(key, hasInventory)
}
return nil
}
Note: Since the code is not runnable, I have only updated the methods for reference; you may still need to review the locking around the shared slice for your case.
The same can be achieved with just three functions: detect, read and load. detect checks for new files at an interval and pushes any it finds to the delta channel. load takes the file path from the delta channel and calls read to get the data and an error; if there is no error, it clears the map and updates it with the new content, otherwise it logs the error. So you would have two goroutines and one function that is called by the load goroutine:
package main
import (
"fmt"
"time"
"os"
"os/signal"
"math/rand"
)
func main() {
fmt.Println(">>>", center("STARTED", 30), "<<<")
c := &Config{
InitialPath: "Old Path",
DetectInterval: 3000,
}
c.start()
fmt.Println(">>>", center("ENDED", 30), "<<<")
}
// https://stackoverflow.com/questions/41133006/how-to-fmt-printprint-this-on-the-center
func center(s string, w int) string {
return fmt.Sprintf("%[1]*s", -w, fmt.Sprintf("%[1]*s", (w + len(s))/2, s))
}
type Config struct {
deltaCh chan string
ticker *time.Ticker
stopSignal chan os.Signal
InitialPath string
DetectInterval time.Duration
}
func (c *Config) start() {
c.stopSignal = make(chan os.Signal, 1)
signal.Notify(c.stopSignal, os.Interrupt)
c.ticker = time.NewTicker(c.DetectInterval * time.Millisecond)
c.deltaCh = make(chan string, 1)
go c.detect()
go c.load()
if c.InitialPath != "" {
c.deltaCh <- c.InitialPath
}
<- c.stopSignal
c.ticker.Stop()
}
// Detect New Files
func (c *Config) detect() {
for {
select {
case <- c.stopSignal:
return
case <- c.ticker.C:
fmt.Println(">>>", center("DETECT", 30), "<<<")
c.deltaCh <- fmt.Sprintf("PATH %f", rand.Float64() * 1.5)
}
}
}
// Read Files
func read(path string) (map[string]int, error) {
data := make(map[string]int)
data[path] = 0
fmt.Println(">>>", center("READ", 30), "<<<")
fmt.Println(path)
return data, nil
}
// Load Files
func (c *Config) load() {
for {
select {
case <- c.stopSignal:
return
case path := <- c.deltaCh:
fmt.Println(">>>", center("LOAD", 30), "<<<")
data, err := read(path)
if err != nil {
fmt.Println("Log Error")
} else {
fmt.Println("Success", data)
}
fmt.Println()
}
}
}
Note: The sample code does not include the map; it can easily be updated to include it.
Just allocate a new map, like this:
var mu sync.Mutex
before := map[string]string{} // Some map before reading
after := make(map[string]string)
// Read files and fill `after` map
mu.Lock()
before = after
mu.Unlock()
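A slightly fuller sketch of this build-then-swap idea follows. The store type, its reload and get methods, and the simulated loader are hypothetical names used only for illustration. The point it shows is that the fresh map is built off to the side and swapped in only when loading succeeds, so a failed load leaves the previous snapshot untouched.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSimulated stands in for a failure while reading files.
var errSimulated = errors.New("simulated read failure")

// store holds the current snapshot; readers never see a half-built map.
type store struct {
	mu   sync.RWMutex
	data map[string]bool
}

// reload builds a fresh map with load and swaps it in only on success.
func (s *store) reload(load func() (map[string]bool, error)) error {
	fresh, err := load()
	if err != nil {
		return err // the previous snapshot stays in place
	}
	s.mu.Lock()
	s.data = fresh
	s.mu.Unlock()
	return nil
}

// get looks up a key in the current snapshot.
func (s *store) get(key string) (value, found bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	value, found = s.data[key]
	return value, found
}

func main() {
	s := &store{data: map[string]bool{"1:2": true}}

	// A failing load keeps the old data.
	err := s.reload(func() (map[string]bool, error) { return nil, errSimulated })
	_, found := s.get("1:2")
	fmt.Println("load failed:", err != nil, "old key still present:", found)

	// A successful load replaces the whole snapshot.
	_ = s.reload(func() (map[string]bool, error) {
		return map[string]bool{"3:4": false}, nil
	})
	_, found = s.get("1:2")
	fmt.Println("old key present after successful reload:", found)
}
```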
Instead of clearing the map in the loadFiles method, do something like this in read:
func (r *customerConfig) read(file string, bucket string) error {
m := cmap.New() // create a new map
// ...
for {
rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
if err != nil {
return errs.Wrap(err)
}
if len(rows) <= 0 {
break
}
byteSlice, err := json.Marshal(rows)
if err != nil {
return errs.Wrap(err)
}
var invMods []CompModel
err = json.Unmarshal(byteSlice, &invMods)
if err != nil {
return errs.Wrap(err)
}
for i := range invMods {
key := strconv.FormatInt(invMods[i].ProductID, 10) + ":" + strconv.Itoa(int(invMods[i].Iaz))
hasInventory := false
if invMods[i].Available > 0 {
hasInventory = true
}
m.Set(key, hasInventory)
}
}
r.data = m // Use the new map
return nil
}
I'm currently working with the vishvananda/netns package, trying to extract routes from a specific network namespace.
There is a defined Handle struct which is returned when I request a 'handle' for a specific network namespace. As such:
func NewHandleAt(ns netns.NsHandle, nlFamilies ...int) (*Handle, error)
This is then a receiver argument (?) to a function that requires that handle,
func (h *Handle) LinkList() ([]Link, error)
I'm new to go and not sure how to tie these together. I'm stuck with:
func (h *Handle) showInts() {
int, err := h.netlink.LinkList()
if err != nil {
log.Fatal(err)
}
for i, r := range int {
log.Printf("%d: %s", i, r.Attrs().Name)
}
}
func main() {
ints, err := netlink.LinkList()
if err != nil {
log.Fatal(err)
}
for i, r := range ints {
log.Printf("%d: %s", i, r.Attrs().Name)
}
pid, err := netns.GetFromPid(9097)
if err != nil {
log.Fatal(err)
}
netlink.NewHandleAt(pid)
showInts()
}
Update
While writing the original answer, I touched on a number of things without any clear structure, so here's a more structured version:
Depending on what you're actually asking (ie "How do I add a receiver function/method to an exported type", or "What the hell is a receiver function"), the answers are as follows:
How do I add a receiver function to an exported type?
Easy, same as you do with any other type. You were close, in fact. This doesn't work:
func (h *Handle) showInts() {}
Because you're adding a method to a Handle type in your own package. Given you have a main function, that would be the main package, which has no such type. You're trying to add it to the netlink.Handle type instead. In which case, you might expect this to work:
func (h *netlink.Handle) showInts(){}
The type is netlink.Handle in your main package after all... This, however, will not work either. The compiler will refuse to compile, telling you: "Cannot define new methods on non-local type". This is easily mitigated, though, by creating a new type and adding the method there:
type MyHandle netlink.Handle
func (h *MyHandle) showInts(){}
Be that as it may, the last 2 lines in your code strike me as wrong.
Given that NewHandleAt returns (*Handle, error), and netlink.Handle is a receiver argument, the correct way would be:
var mh *MyHandle
if h, err := netlink.NewHandleAt(pid); err != nil {
log.Fatal(err) // something went wrong
} else {
mh = (*MyHandle)(h)
}
mh.showInts() // call showInts on mh, which is of type *MyHandle
The fact that you've "wrapped" the external type in a custom type does mean you'll find yourself casting the same thing quite a lot. Say netlink.Handle has a Test method, and you want to call it inside showInts:
func (h *MyHandle) showInts() {
nh := (*netlink.Handle)(h) //cast required
nh.Test()
}
I'd also change the varname from pid to nsh or something, because it's a NsHandle, and not a pid after all...
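If the repeated conversions become a nuisance, embedding is another option worth considering: a struct that embeds the external pointer type gets its methods promoted. The sketch below uses a local stand-in type named Handle with a simplified LinkList (the real netlink method also returns an error), purely so that it runs without the netlink package; NSHandle and linkNames are hypothetical names.

```go
package main

import "fmt"

// Handle is a local stand-in for netlink.Handle, defined here only so the
// sketch runs without the netlink package.
type Handle struct{ name string }

// LinkList is a simplified stand-in for a method on the external type;
// it returns made-up interface names.
func (h *Handle) LinkList() []string { return []string{"lo", "eth0"} }

// NSHandle embeds *Handle, so Handle's methods are promoted to NSHandle
// and no conversion is needed to call them.
type NSHandle struct {
	*Handle
}

// linkNames calls the promoted LinkList method directly.
func (h *NSHandle) linkNames() []string {
	return h.LinkList()
}

func main() {
	nsh := &NSHandle{Handle: &Handle{name: "demo"}}
	for i, n := range nsh.linkNames() {
		fmt.Printf("%d: %s\n", i, n)
	}
}
```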
What is a receiver argument?
Because you wrote this:
This is then a receiver argument (?) to a function that requires that handle,
I get the impression you're not entirely clear on what a receiver argument is. Put simply, it's like a function argument, but instead of an argument that is just passed to a function, it's an argument that holds the object/value on which the function is called. Basically, it's the "instance" on which the function/method is called. Think of it as the this keyword in many OOP languages:
func (h *MyHandle) showInts() {
return
}
In something like C++, this would be:
class MyHandle : Handle
{
public:
void showInts(void) { return; } // replace h with this
};
There are significant differences, however:
The receiver argument can be a pointer, or a value - in case of a value receiver, the method cannot modify the receiver value
There's no such thing as private, public, or protected... at least not in the traditional OO way
...
There's quite a few differences, perhaps consider going through the golang tour. The stuff about go methods can be found here
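The difference between value and pointer receivers can be seen in a few lines. The counter type and its two methods are made up for this illustration.

```go
package main

import "fmt"

// counter is a toy type used only to contrast the two receiver kinds.
type counter struct{ n int }

// incValue has a value receiver: it increments a copy of the counter.
func (c counter) incValue() { c.n++ }

// incPointer has a pointer receiver: it increments the caller's counter.
func (c *counter) incPointer() { c.n++ }

func main() {
	c := counter{}
	c.incValue()
	fmt.Println(c.n) // 0: only the copy was changed
	c.incPointer()
	fmt.Println(c.n) // 1: the original was changed
}
```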
Other issues/weird things
After looking at your code again, I'm really not sure whether this is correct:
h.netlink.LinkList()
In your main function, you call netlink.LinkList(). h is a *netlink.Handle. If you need to call the netlink.LinkList function, it's highly likely h.netlink.LinkList is not what you want to do. Instead, you should simply call netlink.LinkList().
That's assuming you need to call the function in the first place.
Given that you've already called it in the main function, why not pass it as an argument?
//in main:
ints, err := netlink.LinkList()
//...
h.showInts(ints)
func (h *MyHandle)showInts(ll []netlink.Link) {
}
Thanks Elias, awesome answer!
From that, I've written the following code which will list interfaces belonging to a specific namespace. Thanks!
package main
import (
"github.com/vishvananda/netns"
"github.com/vishvananda/netlink"
"log"
)
type NSHandle netlink.Handle
func (h *NSHandle) showInts() {
nh := (*netlink.Handle)(h) //cast required
int, err := nh.LinkList()
if err != nil {
log.Fatal(err)
}
log.Printf("Namespace Ints:")
for i, r := range int {
log.Printf("%d: %s", i, r.Attrs().Name)
}
}
func getNSFromPID(pid int) (*NSHandle) {
hpid, err := netns.GetFromPid(pid)
if err != nil {
log.Fatal(err)
}
var nsh *NSHandle
if h, err := netlink.NewHandleAt(hpid); err != nil {
log.Fatal(err) // something went wrong
} else {
nsh = (*NSHandle)(h)
}
return nsh
}
func main() {
getNSFromPID(9115).showInts()
}
I have a package named "seeder":
package seeder
import "fmt"
func MyFunc1() {
fmt.Println("I am Masood")
}
func MyFunc2() {
fmt.Println("I am a programmer")
}
func MyFunc3() {
fmt.Println("I want to buy a car")
}
Now I want to call all functions with the MyFunc prefix:
package main
import "./seeder"
func main() {
for k := 1; k <= 3; k++ {
seeder.MyFunc1() // This calls MyFunc1 three times
}
}
I want something like this:
for k := 1; k <= 3; k++ {
seeder.MyFunc + k ()
}
and this output:
I am Masood
I am a programmer
I want to buy a car
EDIT1:
In this example, parentKey is a string variable which changes in a loop:
for parentKey, _ := range uRLSjson{
pppp := seeder + "." + strings.ToUpper(parentKey)
gorilla.HandleFunc("/", pppp).Name(parentKey)
}
But the compiler says:
use of package seeder without selector
You can't get a function by its name, and that is what you're trying to do. The reason is that if the Go tool can detect that a function is not referred to explicitly (and thus unreachable), it may not even get compiled into the executable binary. For details see Splitting client/server code.
With a function registry
One way to do what you want is to build a "function registry" prior to calling them:
registry := map[string]func(){
"MyFunc1": MyFunc1,
"MyFunc2": MyFunc2,
"MyFunc3": MyFunc3,
}
for k := 1; k <= 3; k++ {
registry[fmt.Sprintf("MyFunc%d", k)]()
}
Output (try it on the Go Playground):
Hello MyFunc1
Hello MyFunc2
Hello MyFunc3
Manual "routing"
Similar to the registry is inspecting the name and manually routing to the function, for example:
func callByName(name string) {
switch name {
case "MyFunc1":
MyFunc1()
case "MyFunc2":
MyFunc2()
case "MyFunc3":
MyFunc3()
default:
panic("Unknown function name")
}
}
Using it:
for k := 1; k <= 3; k++ {
callByName(fmt.Sprintf("MyFunc%d", k))
}
Try this on the Go Playground.
Note: It's up to you if you want to call the function identified by its name in the callByName() helper function, or you may choose to return a function value (of type func()) and have it called in the caller's place.
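A sketch of the second variant, where the helper returns the function value and the caller decides what to do with it. The name lookup is hypothetical; the three functions are the ones from the question.

```go
package main

import "fmt"

func MyFunc1() { fmt.Println("I am Masood") }
func MyFunc2() { fmt.Println("I am a programmer") }
func MyFunc3() { fmt.Println("I want to buy a car") }

// lookup returns the function registered under name, and false if there is none.
func lookup(name string) (func(), bool) {
	switch name {
	case "MyFunc1":
		return MyFunc1, true
	case "MyFunc2":
		return MyFunc2, true
	case "MyFunc3":
		return MyFunc3, true
	}
	return nil, false
}

func main() {
	// k runs to 4 on purpose to show the "not found" path.
	for k := 1; k <= 4; k++ {
		name := fmt.Sprintf("MyFunc%d", k)
		fn, ok := lookup(name)
		if !ok {
			fmt.Println("no function named", name)
			continue
		}
		fn()
	}
}
```

Returning a boolean instead of panicking lets the caller handle unknown names, which may be preferable when the names come from external input.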
Transforming functions to methods
Also note that if your functions would actually be methods of some type, you could do it without a registry. Using reflection, you can get a method by name: Value.MethodByName(). You can also get / enumerate all methods without knowing their names using Value.NumMethod() and Value.Method() (also see Type.NumMethod() and Type.Method() if you need the name of the method or its parameter types).
This is how it could be done:
type MyType int
func (m MyType) MyFunc1() {
fmt.Println("Hello MyFunc1")
}
func (m MyType) MyFunc2() {
fmt.Println("Hello MyFunc2")
}
func (m MyType) MyFunc3() {
fmt.Println("Hello MyFunc3")
}
func main() {
v := reflect.ValueOf(MyType(0))
for k := 1; k <= 3; k++ {
v.MethodByName(fmt.Sprintf("MyFunc%d", k)).Call(nil)
}
}
Output is the same. Try it on the Go Playground.
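If the method names are not known in advance, the enumeration route mentioned above can select them by prefix. This is a minimal sketch; methodsWithPrefix and the extra method Other are hypothetical and only there to show the filtering.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type MyType int

func (m MyType) MyFunc1() { fmt.Println("Hello MyFunc1") }
func (m MyType) MyFunc2() { fmt.Println("Hello MyFunc2") }
func (m MyType) Other()   { fmt.Println("not called") }

// methodsWithPrefix returns the names of v's exported methods that start with prefix.
func methodsWithPrefix(v interface{}, prefix string) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumMethod(); i++ {
		if name := t.Method(i).Name; strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	v := MyType(0)
	for _, name := range methodsWithPrefix(v, "MyFunc") {
		// Call each matching method with no arguments.
		reflect.ValueOf(v).MethodByName(name).Call(nil)
	}
}
```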
Another alternative would be to range over a slice of your functions:
package main
import (
"fmt"
)
func MyFunc1() {
fmt.Println("I am Masood")
}
func MyFunc2() {
fmt.Println("I am a programmer")
}
func MyFunc3() {
fmt.Println("I want to buy a car")
}
func main() {
for _, fn := range []func(){MyFunc1, MyFunc2, MyFunc3} {
fn()
}
}
I am trying to implement a multithreaded crawler in Go as a sample task to learn the language.
It is supposed to scan pages, follow links and save them to a DB.
To avoid duplicates, I'm trying to use a map where I record all the URLs I've already saved.
The synchronous version works fine, but I have trouble when I try to use goroutines.
I'm trying to use a mutex as a sync object for the map, and a channel as a way to coordinate goroutines. But obviously I don't have a clear understanding of them.
The problem is that I get many duplicate entries, so my map store/check does not work properly.
Here is my code:
package main
import (
"fmt"
"net/http"
"golang.org/x/net/html"
"strings"
"database/sql"
_ "github.com/ziutek/mymysql/godrv"
"io/ioutil"
"runtime/debug"
"sync"
)
const maxDepth = 2;
var workers = make(chan bool)
type Pages struct {
mu sync.Mutex
pagesMap map[string]bool
}
func main() {
var pagesMutex Pages
fmt.Println("Start")
const database = "gotest"
const user = "root"
const password = "123"
//open connection to DB
con, err := sql.Open("mymysql", database + "/" + user + "/" + password)
if err != nil { /* error handling */
fmt.Printf("%s", err)
debug.PrintStack()
}
fmt.Println("call 1st save site")
pagesMutex.pagesMap = make(map[string]bool)
go pagesMutex.saveSite(con, "http://golang.org/", 0)
fmt.Println("saving true to channel")
workers <- true
fmt.Println("finishing in main")
defer con.Close()
}
func (p *Pages) saveSite(con *sql.DB, url string, depth int) {
fmt.Println("Save ", url, depth)
fmt.Println("trying to lock")
p.mu.Lock()
fmt.Println("locked on mutex")
pageDownloaded := p.pagesMap[url] == true
if pageDownloaded {
p.mu.Unlock()
return
} else {
p.pagesMap[url] = true
}
p.mu.Unlock()
response, err := http.Get(url)
if err != nil {
fmt.Printf("%s", err)
debug.PrintStack()
} else {
defer response.Body.Close()
contents, err := ioutil.ReadAll(response.Body)
if err != nil {
fmt.Printf("%s", err)
debug.PrintStack()
}
_, err = con.Exec("insert into pages (url) values (?)", string(url))
if err != nil {
fmt.Printf("%s", err)
debug.PrintStack()
}
z := html.NewTokenizer(strings.NewReader((string(contents))))
for {
tokenType := z.Next()
if tokenType == html.ErrorToken {
return
}
token := z.Token()
switch tokenType {
case html.StartTagToken: // <tag>
tagName := token.Data
if strings.Compare(string(tagName), "a") == 0 {
for _, attr := range token.Attr {
if strings.Compare(attr.Key, "href") == 0 {
if depth < maxDepth {
urlNew := attr.Val
if !strings.HasPrefix(urlNew, "http") {
if strings.HasPrefix(urlNew, "/") {
urlNew = urlNew[1:]
}
urlNew = url + urlNew
}
//urlNew = path.Clean(urlNew)
go p.saveSite(con, urlNew, depth + 1)
}
}
}
}
case html.TextToken: // text between start and end tag
case html.EndTagToken: // </tag>
case html.SelfClosingTagToken: // <tag/>
}
}
}
val := <-workers
fmt.Println("finished Save Site", val)
}
Could someone explain to me how to do this properly, please?
Well, you have two choices. For a small and simple implementation, I would recommend separating the operations on the map into a separate structure.
// Index is a shared page index
type Index struct {
access sync.Mutex
pages map[string]bool
}
// Mark reports that a site has been visited
func (i *Index) Mark(name string) {
i.access.Lock()
i.pages[name] = true
i.access.Unlock()
}
// Visited returns true if a site has been visited
func (i *Index) Visited(name string) bool {
i.access.Lock()
defer i.access.Unlock()
return i.pages[name]
}
Then, add another structure like this:
// Crawler is a web spider :D
type Crawler struct {
index *Index // shared by pointer so every crawler uses the same mutex and map
/* ... other important stuff like visited sites ... */
}
// Crawl looks for content
func (c *Crawler) Crawl(site string) {
// Implement your logic here
// For example:
if !c.index.Visited(site) {
c.index.Mark(site) // When marked
}
}
That way you keep things nice and clear; it is probably a little more code, but definitely more readable. You need to instantiate the crawler like this:
sameIndex := &Index{pages: make(map[string]bool)}
asManyAsYouWant := Crawler{index: sameIndex} // They will share sameIndex
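One thing to watch with separate Visited and Mark calls is that two goroutines can both see a URL as unvisited before either marks it, which would still produce duplicates. A possible remedy is to do the check and the write under a single lock. The sketch below assumes a hypothetical MarkIfNew method and a crawlAll helper that uses a sync.WaitGroup to wait for the goroutines; the actual fetching and saving is replaced by a counter.

```go
package main

import (
	"fmt"
	"sync"
)

// Index is a shared set of visited URLs.
type Index struct {
	access sync.Mutex
	pages  map[string]bool
}

// MarkIfNew records name and reports whether this call was the first to see it.
// The check and the write happen under one lock, so two goroutines cannot both win.
func (i *Index) MarkIfNew(name string) bool {
	i.access.Lock()
	defer i.access.Unlock()
	if i.pages[name] {
		return false
	}
	i.pages[name] = true
	return true
}

// crawlAll starts one goroutine per URL and returns how many were actually processed.
func crawlAll(idx *Index, urls []string) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := 0
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			if !idx.MarkIfNew(u) {
				return // another goroutine already claimed this URL
			}
			mu.Lock()
			processed++ // stand-in for fetching and saving the page
			mu.Unlock()
		}(u)
	}
	wg.Wait() // block until every goroutine has finished
	return processed
}

func main() {
	idx := &Index{pages: make(map[string]bool)}
	urls := []string{"a", "b", "a", "c", "b", "a"}
	fmt.Println(crawlAll(idx, urls)) // 3 distinct URLs
}
```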
If you want to go further with a high level solution, then I would recommend Producer/Consumer architecture.
In the example below I've embedded http.ResponseWriter into my own struct called Response. I've also added an extra field called Status. Why can't I access that field from inside my root handler function?
When I print out the type of w in my root handler function it says it's of type main.Response which seems correct and when I print out the values of the struct I can see that Status is there. Why can't I access by going w.Status?
This is the contents of stdout:
main.Response
{ResponseWriter:0xc2080440a0 Status:0}
Code:
package main
import (
"fmt"
"reflect"
"net/http"
)
type Response struct {
http.ResponseWriter
Status int
}
func (r Response) WriteHeader(n int) {
r.Status = n
r.ResponseWriter.WriteHeader(n)
}
func middleware(h http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
resp := Response{ResponseWriter: w}
h.ServeHTTP(resp, r)
})
}
func root(w http.ResponseWriter, r *http.Request) {
w.Write([]byte("root"))
fmt.Println(reflect.TypeOf(w))
fmt.Printf("%+v\n", w)
fmt.Println(w.Status) // <--- This causes an error.
}
func main() {
http.Handle("/", middleware(http.HandlerFunc(root)))
http.ListenAndServe(":8000", nil)
}
w is a variable of type http.ResponseWriter. ResponseWriter does not have a field or method Status, only your Response type.
http.ResponseWriter is an interface type, and since your Response type implements it (because it embeds ResponseWriter), the w variable may hold a value of dynamic type Response (and in your case it does).
But to access the Response.Status field, you have to convert it to a value of type Response. For that, use a type assertion:
if resp, ok := w.(Response); ok {
// resp is of type Response, you can access its Status field
fmt.Println(resp.Status) // <--- properly prints status
}
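A related detail in the question's code: WriteHeader has a value receiver, so r.Status = n changes only a copy and the stored status would stay 0. The sketch below assumes switching to a pointer receiver, passing &Response{...} to the handler, and asserting to *Response instead. The serve helper is hypothetical and uses net/http/httptest to run the handler in-process.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Response wraps a ResponseWriter and remembers the status code.
type Response struct {
	http.ResponseWriter
	Status int
}

// WriteHeader uses a pointer receiver so the stored Status outlives the call.
func (r *Response) WriteHeader(n int) {
	r.Status = n
	r.ResponseWriter.WriteHeader(n)
}

// root sets a status and then reads it back through a type assertion.
func root(w http.ResponseWriter, req *http.Request) {
	w.WriteHeader(http.StatusTeapot)
	if resp, ok := w.(*Response); ok {
		fmt.Fprintf(w, "status=%d", resp.Status)
	}
}

// serve runs root against an in-memory recorder and returns the body.
func serve() string {
	rec := httptest.NewRecorder()
	root(&Response{ResponseWriter: rec}, httptest.NewRequest("GET", "/", nil))
	return rec.Body.String()
}

func main() {
	fmt.Println(serve()) // status=418
}
```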