Why do ANTLR4 (Go target) parsers require so many type assertions?

I'm migrating a project that includes an ANTLR-generated parser from Java to Go. Rather than using the visitor pattern, the project simply iterates through the contents of the parsed documents. Something like this, for example, in Java:
public static Context loadContext(ThriftParser p) {
    Context ret = new Context();
    DocumentContext doc = p.document();
    for (HeaderContext header : doc.header()) {
        if (header.namespace() != null) {
            ret.namespace = header.namespace().namespace_value().getText();
        }
    }
    for (DefinitionContext def : doc.definition()) {
        if (def.enum_rule() != null) {
            ret.Enums.add(def.enum_rule().identifier().getText());
        } else if (def.struct_rule() != null) {
            ret.Structs.add(def.struct_rule().identifier().getText());
        }
    }
    return ret;
}
Translating this to Go, I ended up with something like:
func LoadContext(p *parser.ThriftParser) Context {
    ret := Context{}
    doc := p.Document()
    for _, header := range doc.AllHeader() {
        if ns := header.Namespace(); ns != nil {
            ret.Namespace = ns.Namespace_value().GetText()
        }
    }
    for _, def := range doc.AllDefinition() {
        if o := def.Enum_rule(); o != nil {
            ret.Enums = append(ret.Enums, o.Identifier().GetText())
        } else if o := def.Struct_rule(); o != nil {
            ret.Structs = append(ret.Structs, o.Identifier().GetText())
        }
    }
    return ret
}
However, this fails to compile with:
./main.go:51:29: doc.AllHeader undefined (type parser.IDocumentContext has no field or method AllHeader)
./main.go:56:26: doc.AllDefinition undefined (type parser.IDocumentContext has no field or method AllDefinition)
To make this code build, it seems that a lot of type assertions are necessary, like:
func LoadContext(p *parser.ThriftParser) Context {
    ret := Context{}
    doc := p.Document().(*parser.DocumentContext)
    for _, header := range doc.AllHeader() {
        if ns := header.(*parser.HeaderContext).Namespace(); ns != nil {
            ret.Namespace = ns.(*parser.NamespaceContext).Namespace_value().GetText()
        }
    }
    for _, def := range doc.AllDefinition() {
        if o := def.(*parser.DefinitionContext).Enum_rule(); o != nil {
            ret.Enums = append(ret.Enums, o.(*parser.Enum_ruleContext).Identifier().GetText())
        } else if o := def.(*parser.DefinitionContext).Struct_rule(); o != nil {
            ret.Structs = append(ret.Structs, o.(*parser.Struct_ruleContext).Identifier().GetText())
        }
    }
    return ret
}
The getters on the Go parser seem to return interfaces rather than concrete types, and the interfaces don't include signatures for the methods on the concrete types.
The parser API would be much easier to use if the interfaces included the required method signatures or the getters returned concrete types.
Is this an omission in the parser API for the Go target, or am I misunderstanding how to use this?
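In the meantime I've been hiding the assertions behind a small generic helper (Go 1.18+). To be clear, as is my own hypothetical helper, not part of the generated API; it just keeps the call sites readable:
func as[T any](v interface{}) T {
    t, _ := v.(T) // yields the zero value (nil for pointers) if the assertion fails
    return t
}
With that, the first line becomes, for example:
doc := as[*parser.DocumentContext](p.Document())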
Here is the grammar I used with the examples above: Thrift.g4

Related

Load data from reading files during startup and then process new files and clear old state from the map

I am working on a project where, during startup, I need to read certain files and store them in memory in a map, and then periodically look for new files; if there are any, I replace whatever was previously in the map with this new data. Basically, every time a new file arrives (which is a full state), I want to refresh my in-memory map to that new state instead of appending to it.
The method below, loadAtStartupAndProcessNewChanges, is called during server startup; it reads the files and stores the data in memory. It also starts a go-routine, detectNewFiles, which periodically checks for new files and puts them on the deltaChan channel, which is later consumed by another go-routine, processNewFiles, that reads each new file and stores its data in the same map. If there is any error, we put it on the err channel. loadFiles is the function that reads files into memory and stores them in the map.
type customerConfig struct {
    deltaChan chan string
    err       chan error
    wg        sync.WaitGroup
    data      *cmap.ConcurrentMap
}

// this is called during server startup.
func (r *customerConfig) loadAtStartupAndProcessNewChanges() error {
    path, err := r.GetPath("...", "....")
    if err != nil {
        return err
    }

    r.wg.Add(1)
    go r.detectNewFiles(path)

    err = r.loadFiles(4, path)
    if err != nil {
        return err
    }

    r.wg.Add(1)
    go r.processNewFiles()
    return nil
}
This method figures out whether there are any new files that need to be consumed; if there are, it puts them on the deltaChan channel, which is later consumed by the processNewFiles go-routine, which reads the file into memory. If there is any error, it adds the error to the error channel.
func (r *customerConfig) detectNewFiles(rootPath string) {
}
This will read all S3 files, store their data in memory, and return an error if any. In this method I clear the previous state of my map so that it can have fresh state from the new files. This method is called during server startup and also whenever we need to process new files from the processNewFiles go-routine.
func (r *customerConfig) loadFiles(workers int, path string) error {
    var err error
    ...
    var files []string
    files = .....

    // reset the map so that it can have fresh state from new files.
    r.data.Clear()

    g, ctx := errgroup.WithContext(context.Background())
    sem := make(chan struct{}, workers)

    for _, file := range files {
        select {
        case <-ctx.Done():
            break
        case sem <- struct{}{}:
        }
        file := file
        g.Go(func() error {
            defer func() { <-sem }()
            return r.read(file, bucket)
        })
    }

    if err := g.Wait(); err != nil {
        return err
    }
    return nil
}
This method reads the files and adds the data to the data concurrent map.
func (r *customerConfig) read(file string, bucket string) error {
    // read file and store it in "data" concurrent map
    // and if there is any error then return the error
    var err error
    fr, err := pars3.NewS3FileReader(context.Background(), bucket, file, r.s3Client.GetSession().Config)
    if err != nil {
        return errs.Wrap(err)
    }
    defer xio.CloseIgnoringErrors(fr)

    pr, err := reader.NewParquetReader(fr, nil, 8)
    if err != nil {
        return errs.Wrap(err)
    }

    if pr.GetNumRows() == 0 {
        spn.Infof("Skipping %s due to 0 rows", file)
        return nil
    }

    for {
        rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
        if err != nil {
            return errs.Wrap(err)
        }
        if len(rows) <= 0 {
            break
        }

        byteSlice, err := json.Marshal(rows)
        if err != nil {
            return errs.Wrap(err)
        }

        var invMods []CompModel
        err = json.Unmarshal(byteSlice, &invMods)
        if err != nil {
            return errs.Wrap(err)
        }

        for i := range invMods {
            key := strconv.FormatInt(invMods[i].ProductID, 10) + ":" + strconv.Itoa(int(invMods[i].Iaz))
            hasInventory := false
            if invMods[i].Available > 0 {
                hasInventory = true
            }
            r.data.Set(key, hasInventory)
        }
    }
    return nil
}
This method picks up whatever is on the delta channel; if there are any new files, it starts reading them by calling the loadFiles method. If there is any error, it adds the error to the error channel.
// processNewFiles - load new files found by detectNewFiles
func (r *customerConfig) processNewFiles() {
    // find new files on delta channel
    // and call "loadFiles" method to read it
    // if there is any error, then it will add it to the error channel.
}
If there is any error on the error channel, it is logged by the method below:
func (r *customerConfig) handleError() {
    // read error from error channel if there is any
    // then log it
}
Problem Statement
The above logic works for me without any issues, but there is one small bug in my code that I am not able to figure out how to solve. As you can see, I have a concurrent map which I populate in my read method and clear entirely in the loadFiles method. Whenever there is a new file on the delta channel I don't want to keep the previous state in the map, which is why I remove everything from the map and then add the new state from the new files.
Now if there is any error in the read method, the bug appears: I have already cleared all the data in my data map, so I am left with an empty map, which is not what I want. Basically, if there is any error I would like to preserve the previous state in the data map. How can I resolve this issue within my current design?
Note: I am using golang concurrent map
I think your design is overcomplicated. It can be solved much more simply, while giving all the benefits you desire:
safe for concurrent access
detected changes are reloaded
accessing the config gives you the most recent, successfully loaded config
the most recent config is always, immediately accessible, even if loading a new config due to detected changes takes long
if loading new config fails, the previous "snapshot" is kept and remains the current
as a bonus, it's much simpler and doesn't even use 3rd party libs
Let's see how to achieve this:
Have a CustomerConfig struct holding everything you want to cache (this is the "snapshot"):
type CustomerConfig struct {
Data map[string]bool
// Add other props if you need:
LoadedAt time.Time
}
Provide a function that loads the config you wish to cache. Note: this function is stateless, it does not access / operate on package level variables:
func loadConfig() (*CustomerConfig, error) {
    cfg := &CustomerConfig{
        Data:     map[string]bool{},
        LoadedAt: time.Now(),
    }

    // Logic to load files, and populate cfg.Data
    // If an error occurs, return it
    // If loading succeeds, return the config

    return cfg, nil
}
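For illustration only, the elided loading logic could be fleshed out like this; it is a sketch that reads JSON files from a hypothetical local directory (the path and file layout are made up), and the S3/Parquet reading from the question would slot into the same shape:
func loadConfig() (*CustomerConfig, error) {
    cfg := &CustomerConfig{
        Data:     map[string]bool{},
        LoadedAt: time.Now(),
    }

    files, err := filepath.Glob("/etc/myapp/*.json") // hypothetical config location
    if err != nil {
        return nil, err
    }
    for _, f := range files {
        b, err := os.ReadFile(f)
        if err != nil {
            return nil, err // any failure aborts the load; the caller keeps the old snapshot
        }
        var entries map[string]bool
        if err := json.Unmarshal(b, &entries); err != nil {
            return nil, err
        }
        for k, v := range entries {
            cfg.Data[k] = v
        }
    }
    return cfg, nil
}
The important property is that loadConfig either returns a fully populated snapshot or an error, never a half-built one.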
Now let's create our "cache manager". The cache manager stores the actual / current config (the snapshot) and provides access to it. For safe concurrent access (and update), we use a sync.RWMutex. It also has the means to stop the manager (that is, the concurrent refreshing):
type ConfigCache struct {
    configMu sync.RWMutex
    config   *CustomerConfig
    closeCh  chan struct{}
}
Creating a cache loads the initial config. It also launches a goroutine responsible for periodically checking for changes:
func NewConfigCache() (*ConfigCache, error) {
    cfg, err := loadConfig()
    if err != nil {
        return nil, fmt.Errorf("loading initial config failed: %w", err)
    }

    cc := &ConfigCache{
        config:  cfg,
        closeCh: make(chan struct{}),
    }

    // launch goroutine to periodically check for changes, and load new configs
    go cc.refresher()

    return cc, nil
}
The refresher() periodically checks for changes, and if changes are detected, calls loadConfig() to load new data to be cached, and stores it as the current / actual config (while locking configMu). It also monitors closeCh to stop if that is requested:
func (cc *ConfigCache) refresher() {
    ticker := time.NewTicker(1 * time.Minute) // Every minute
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            // Check if there are changes
            changes := false // logic to detect changes
            if !changes {
                continue // No changes, continue
            }

            // Changes! load new config:
            cfg, err := loadConfig()
            if err != nil {
                log.Printf("Failed to load config: %v", err)
                continue // Keep the previous config
            }

            // Apply / store new config
            cc.configMu.Lock()
            cc.config = cfg
            cc.configMu.Unlock()

        case <-cc.closeCh:
            return
        }
    }
}
Closing the cache manager (the refresher goroutine) is as easy as:
func (cc *ConfigCache) Stop() {
    close(cc.closeCh)
}
The last missing piece is how you access the current config. That's a simple GetConfig() method (that also uses configMu, but in read-only mode):
func (cc *ConfigCache) GetConfig() *CustomerConfig {
    cc.configMu.RLock()
    defer cc.configMu.RUnlock()
    return cc.config
}
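As an aside, the same read-mostly snapshot can be held in an atomic pointer instead of being guarded by an RWMutex (a variant sketch, assuming Go 1.19+ for atomic.Pointer from sync/atomic):
type ConfigCache struct {
    config  atomic.Pointer[CustomerConfig]
    closeCh chan struct{}
}

func (cc *ConfigCache) GetConfig() *CustomerConfig {
    return cc.config.Load() // lock-free read of the current snapshot
}

// and in refresher(), storing a freshly loaded config becomes:
//     cc.config.Store(cfg)
Either way, readers always see a complete snapshot, never a partially loaded one.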
This is how you can use this:
cc, err := NewConfigCache()
if err != nil {
    // Decide what to do: retry, terminate etc.
}

// Wherever, whenever you need the actual (most recent) config in your app:
cfg := cc.GetConfig()
// Use cfg
Before you shut down your app (or you want to stop the refreshing), you may call cc.Stop().
Add an RWMutex to protect collectedData from concurrent writes by the worker goroutines:
type customerConfig struct {
    ...
    m sync.RWMutex
}
Instead of updating the map in the read method, have read just return the data and an error:
func (r *customerConfig) read(file string, bucket string) ([]CompModel, error) {
    // read file data and return with error if any
    var err error
    fr, err := pars3.NewS3FileReader(context.Background(), bucket, file, r.s3Client.GetSession().Config)
    if err != nil {
        return nil, errs.Wrap(err)
    }
    defer xio.CloseIgnoringErrors(fr)

    pr, err := reader.NewParquetReader(fr, nil, 8)
    if err != nil {
        return nil, errs.Wrap(err)
    }

    if pr.GetNumRows() == 0 {
        spn.Infof("Skipping %s due to 0 rows", file)
        return nil, errors.New("no data")
    }

    var invMods = []CompModel{}
    for {
        rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
        if err != nil {
            return nil, errs.Wrap(err)
        }
        if len(rows) <= 0 {
            break
        }

        byteSlice, err := json.Marshal(rows)
        if err != nil {
            return nil, errs.Wrap(err)
        }

        var jsonData []CompModel
        err = json.Unmarshal(byteSlice, &jsonData)
        if err != nil {
            return nil, errs.Wrap(err)
        }
        invMods = append(invMods, jsonData...)
    }
    return invMods, nil
}
Then, in loadFiles, collect the data returned by the read method; only if there was no error, clear and update the map, otherwise leave the old data as it was before:
func (r *customerConfig) loadFiles(workers int, path string) error {
    var err error
    ...
    var files []string
    files = .....
    // reset the map so that it can have fresh state from new files.
    // r.data.Clear() <- remove the clear from here

    g, ctx := errgroup.WithContext(context.Background())
    sem := make(chan struct{}, workers)

    collectedData := []CompModel{}

    for _, file := range files {
        select {
        case <-ctx.Done():
            break
        case sem <- struct{}{}:
        }
        file := file
        g.Go(func() error {
            defer func() { <-sem }()
            data, err := r.read(file, bucket)
            if err != nil {
                return err
            }
            r.m.Lock()
            collectedData = append(collectedData, data...)
            r.m.Unlock()
            return nil
        })
    }

    if err := g.Wait(); err != nil {
        return err
    }

    // only now, after all reads succeeded, clear and rebuild the map
    r.data.Clear()
    for i := range collectedData {
        key := strconv.FormatInt(collectedData[i].ProductID, 10) + ":" + strconv.Itoa(int(collectedData[i].Iaz))
        hasInventory := false
        if collectedData[i].Available > 0 {
            hasInventory = true
        }
        r.data.Set(key, hasInventory)
    }
    return nil
}
Note: since the surrounding code is not runnable, only the updated methods are shown for reference. The mutex shown above guards the concurrent appends to the collectedData slice by the worker goroutines.
The same can be achieved with just three functions: detect, read, and load. detect checks for new files at an interval and pushes to the delta channel if it finds any; load takes a file path from the delta channel and calls the read method to get the data and error, then, if there is no error, clears the map and updates it with the new content, otherwise it logs the error. So you would have two go-routines and one function, which is called by the load routine:
package main

import (
    "fmt"
    "math/rand"
    "os"
    "os/signal"
    "time"
)

func main() {
    fmt.Println(">>>", center("STARTED", 30), "<<<")
    c := &Config{
        InitialPath:    "Old Path",
        DetectInterval: 3000,
    }
    c.start()
    fmt.Println(">>>", center("ENDED", 30), "<<<")
}

// https://stackoverflow.com/questions/41133006/how-to-fmt-printprint-this-on-the-center
func center(s string, w int) string {
    return fmt.Sprintf("%[1]*s", -w, fmt.Sprintf("%[1]*s", (w+len(s))/2, s))
}

type Config struct {
    deltaCh        chan string
    ticker         *time.Ticker
    stopSignal     chan os.Signal
    InitialPath    string
    DetectInterval time.Duration
}

func (c *Config) start() {
    c.stopSignal = make(chan os.Signal, 1)
    signal.Notify(c.stopSignal, os.Interrupt)

    c.ticker = time.NewTicker(c.DetectInterval * time.Millisecond)
    c.deltaCh = make(chan string, 1)

    go c.detect()
    go c.load()

    if c.InitialPath != "" {
        c.deltaCh <- c.InitialPath
    }

    <-c.stopSignal
    c.ticker.Stop()
}

// Detect New Files
func (c *Config) detect() {
    for {
        select {
        case <-c.stopSignal:
            return
        case <-c.ticker.C:
            fmt.Println(">>>", center("DETECT", 30), "<<<")
            c.deltaCh <- fmt.Sprintf("PATH %f", rand.Float64()*1.5)
        }
    }
}

// Read Files
func read(path string) (map[string]int, error) {
    data := make(map[string]int)
    data[path] = 0
    fmt.Println(">>>", center("READ", 30), "<<<")
    fmt.Println(path)
    return data, nil
}

// Load Files
func (c *Config) load() {
    for {
        select {
        case <-c.stopSignal:
            return
        case path := <-c.deltaCh:
            fmt.Println(">>>", center("LOAD", 30), "<<<")
            data, err := read(path)
            if err != nil {
                fmt.Println("Log Error")
            } else {
                fmt.Println("Success", data)
            }
            fmt.Println()
        }
    }
}
Note: the map is not included in the sample code, but it can easily be extended to include one.
Just allocate a new map, like this:
var mu sync.Mutex
before := map[string]string{} // Some map before reading
after := make(map[string]string)
// Read files and fill `after` map
mu.Lock()
before = after
mu.Unlock()
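One caveat with this approach: readers must take the same mutex when they grab the current map (a minimal sketch):
mu.Lock()
m := before // take the current snapshot
mu.Unlock()
// m can now be read without the lock, provided an already-published
// snapshot is never written to again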
Instead of clearing the map in the loadFiles method, do something like this in read:
func (r *customerConfig) read(file string, bucket string) error {
    m := cmap.New() // create a new map
    // ...
    for {
        rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
        if err != nil {
            return errs.Wrap(err)
        }
        if len(rows) <= 0 {
            break
        }

        byteSlice, err := json.Marshal(rows)
        if err != nil {
            return errs.Wrap(err)
        }

        var invMods []CompModel
        err = json.Unmarshal(byteSlice, &invMods)
        if err != nil {
            return errs.Wrap(err)
        }

        for i := range invMods {
            key := strconv.FormatInt(invMods[i].ProductID, 10) + ":" + strconv.Itoa(int(invMods[i].Iaz))
            hasInventory := false
            if invMods[i].Available > 0 {
                hasInventory = true
            }
            m.Set(key, hasInventory)
        }
    }

    r.data = &m // Use the new map (data is declared as *cmap.ConcurrentMap)
    return nil
}

Printing a list from within a list in Go. Loop trouble

Trying to print a list of routes from within a network namespace. The netlink.RouteList function requires an Interface type. A list of all interfaces is gathered by LinkList().
I'm trying to call RouteList with every interface and print its output. RouteList returns Route values, from which I'm trying to print the int field LinkIndex.
It appears as if my loop
for j := range rt {
    log.Printf("Route: %d : %d", rt[j].LinkIndex)
}
isn't executing for some reason; running another Printf test in there yields nothing.
Why wouldn't this loop be called?
func (h *NSHandle) showInts() {
    nh := (*netlink.Handle)(h) // cast required

    int, err := nh.LinkList()
    if err != nil {
        log.Fatal(err)
    }

    log.Printf("Namespace Ints:")
    for i, r := range int {
        log.Printf("%d: %s", i, r.Attrs().Name)

        rt, err := netlink.RouteList(r, -1)
        if err != nil {
            log.Fatal(err)
        }
        for j := range rt {
            log.Printf("Route: %d : %d", rt[j].LinkIndex)
        }
    }
}
This was a bad question. Soon after posting I realised that the slice was empty because RouteList was being called without the receiver handle. With that fixed, the print loop is simply:
for i, r := range rt {
    log.Printf("%d: %d", i, r.LinkIndex)
}
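For completeness, the corrected call goes through the namespace handle rather than the package-level function (a sketch, assuming the vishvananda/netlink API, where Handle also has a RouteList method):
rt, err := nh.RouteList(r, -1) // method on the handle, so routes come from the right namespace
if err != nil {
    log.Fatal(err)
}
for i, route := range rt {
    log.Printf("%d: %d", i, route.LinkIndex)
}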

How to convert a string value to the correct reflect.Kind in go?

Is there a generic helper method in Go to convert a string to the correct value based on reflect.Kind?
Or do I need to implement the switch over all kinds myself?
I have a value like "143" as a string and a reflect.Value with kind "UInt16", and I would like to convert that string value and set it into the UInt16 field of my struct.
My current code looks like:
func setValueFromString(v reflect.Value, strVal string) error {
    switch v.Kind() {
    case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
        val, err := strconv.ParseInt(strVal, 0, 64)
        if err != nil {
            return err
        }
        if v.OverflowInt(val) {
            return errors.New("Int value too big: " + strVal)
        }
        v.SetInt(val)
    case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
        val, err := strconv.ParseUint(strVal, 0, 64)
        if err != nil {
            return err
        }
        if v.OverflowUint(val) {
            return errors.New("UInt value too big: " + strVal)
        }
        v.SetUint(val)
    case reflect.Float32:
        val, err := strconv.ParseFloat(strVal, 32)
        if err != nil {
            return err
        }
        v.SetFloat(val)
    case reflect.Float64:
        val, err := strconv.ParseFloat(strVal, 64)
        if err != nil {
            return err
        }
        v.SetFloat(val)
    case reflect.String:
        v.SetString(strVal)
    case reflect.Bool:
        val, err := strconv.ParseBool(strVal)
        if err != nil {
            return err
        }
        v.SetBool(val)
    default:
        return errors.New("Unsupported kind: " + v.Kind().String())
    }
    return nil
}
This works already, but I wonder if this is already implemented somewhere else.
Edit: Answer to the original question ("how to obtain a reflect.Kind from its string representation") is at the end. Answer to your edited question follows:
What you're doing is the fastest and "safest". If you don't want to hassle with that big switch, you may take advantage of e.g. the json package which already contains this switch to decode values from JSON string (in encoding/json/decode.go, unexported function literalStore()).
Your decoding function could look like this:
func Set(v interface{}, s string) error {
    return json.Unmarshal([]byte(s), v)
}
A simple call to json.Unmarshal(). Using / testing it:
{
    var v int
    err := Set(&v, "1")
    fmt.Println(v, err)
}
{
    var v int
    err := Set(&v, "d")
    fmt.Println(v, err)
}
{
    var v uint32
    err := Set(&v, "3")
    fmt.Println(v, err)
}
{
    var v bool
    err := Set(&v, "true")
    fmt.Println(v, err)
}
{
    var v float32
    err := Set(&v, `5.1`)
    fmt.Println(v, err)
}
{
    var v string
    err := Set(&v, strconv.Quote("abc"))
    fmt.Println(v, err)
}
One thing to note: when you want to pass a string, that must be quoted, e.g. with strconv.Quote(). Output (try it on the Go Playground):
1 <nil>
0 invalid character 'd' looking for beginning of value
3 <nil>
true <nil>
5.1 <nil>
abc <nil>
If you don't want to require quoted strings (which just complicates things), you may build it into the Set() function:
func Set(v interface{}, s string) error {
    if t := reflect.TypeOf(v); t.Kind() == reflect.Ptr &&
        t.Elem().Kind() == reflect.String {
        s = strconv.Quote(s)
    }
    return json.Unmarshal([]byte(s), v)
}
And then you may call it with the address of a string variable and a string value unquoted:
var v string
err := Set(&v, "abc")
fmt.Println(v, err)
Try this variant on the Go Playground.
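Another standard-library option worth noting: fmt.Sscan picks its parsing logic based on the type of the pointed-to value, so for simple kinds a one-liner also works. A sketch (setViaSscan is a hypothetical name; caveats: strings are cut at the first whitespace, and there is no quoting support):
func setViaSscan(v interface{}, s string) error {
    _, err := fmt.Sscan(s, v)
    return err
}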
Answer to the original question: how to obtain a reflect.Kind from its string representation:
Declaration of reflect.Kind:
type Kind uint
The different values of reflect.Kind are constants:
const (
    Invalid Kind = iota
    Bool
    Int
    Int8
    // ...
    Struct
    UnsafePointer
)
And the reflect package provides only a single method for the reflect.Kind type:
func (k Kind) String() string
So as it stands, you cannot obtain a reflect.Kind from its string representation (only the reverse direction is possible by using the Kind.String() method). But it's not that hard to provide this functionality.
What we'll do is we build a map from all the kinds:
var strKindMap = map[string]reflect.Kind{}
We init it like this:
func init() {
    for k := reflect.Invalid; k <= reflect.UnsafePointer; k++ {
        strKindMap[k.String()] = k
    }
}
This is possible and correct because constants are initialized using iota which evaluates to successive untyped integer constants, and the first value is reflect.Invalid and the last is reflect.UnsafePointer.
And now you can obtain reflect.Kind from its string representation by simply indexing this map. A helper function which does that:
func strToKind(s string) reflect.Kind {
    k, ok := strKindMap[s]
    if !ok {
        return reflect.Invalid
    }
    return k
}
And we're done. Testing / using it:
fmt.Printf("All: %#v\n", strKindMap)
for _, v := range []string{"Hey", "uint8", "ptr", "func", "chan", "interface"} {
fmt.Printf("String: %q, Kind: %v (%#v)\n", v, strToKind(v), strToKind(v))
}
Output (try it on the Go Playground):
All: map[string]reflect.Kind{"int64":0x6, "uint8":0x8, "uint64":0xb, "slice":0x17, "uintptr":0xc, "int8":0x3, "array":0x11, "interface":0x14, "unsafe.Pointer":0x1a, "complex64":0xf, "complex128":0x10, "int":0x2, "uint":0x7, "int16":0x4, "uint16":0x9, "map":0x15, "bool":0x1, "int32":0x5, "ptr":0x16, "string":0x18, "func":0x13, "struct":0x19, "invalid":0x0, "uint32":0xa, "float32":0xd, "float64":0xe, "chan":0x12}
String: "Hey", Kind: invalid (0x0)
String: "uint8", Kind: uint8 (0x8)
String: "ptr", Kind: ptr (0x16)
String: "func", Kind: func (0x13)
String: "chan", Kind: chan (0x12)
String: "interface", Kind: interface (0x14)

Multiple goroutines access/modify a list/map

I am trying to implement a multithreaded crawler in Go as a sample task to learn the language.
It is supposed to scan pages, follow links, and save them to a DB.
To avoid duplicates I'm trying to use a map where I save all the URLs I've already saved.
The synchronous version works fine, but I have trouble when I try to use goroutines.
I'm trying to use a mutex as a sync object for the map, and a channel as a way to coordinate goroutines. But obviously I don't have a clear understanding of them.
The problem is that I have many duplicate entries, so my map store/check does not work properly.
Here is my code:
package main

import (
    "database/sql"
    "fmt"
    "io/ioutil"
    "net/http"
    "runtime/debug"
    "strings"
    "sync"

    _ "github.com/ziutek/mymysql/godrv"
    "golang.org/x/net/html"
)

const maxDepth = 2

var workers = make(chan bool)

type Pages struct {
    mu       sync.Mutex
    pagesMap map[string]bool
}

func main() {
    var pagesMutex Pages
    fmt.Println("Start")
    const database = "gotest"
    const user = "root"
    const password = "123"

    //open connection to DB
    con, err := sql.Open("mymysql", database+"/"+user+"/"+password)
    if err != nil { /* error handling */
        fmt.Printf("%s", err)
        debug.PrintStack()
    }

    fmt.Println("call 1st save site")
    pagesMutex.pagesMap = make(map[string]bool)
    go pagesMutex.saveSite(con, "http://golang.org/", 0)

    fmt.Println("saving true to channel")
    workers <- true

    fmt.Println("finishing in main")
    defer con.Close()
}
func (p *Pages) saveSite(con *sql.DB, url string, depth int) {
    fmt.Println("Save ", url, depth)
    fmt.Println("trying to lock")
    p.mu.Lock()
    fmt.Println("locked on mutex")
    pageDownloaded := p.pagesMap[url] == true
    if pageDownloaded {
        p.mu.Unlock()
        return
    } else {
        p.pagesMap[url] = true
    }
    p.mu.Unlock()

    response, err := http.Get(url)
    if err != nil {
        fmt.Printf("%s", err)
        debug.PrintStack()
    } else {
        defer response.Body.Close()

        contents, err := ioutil.ReadAll(response.Body)
        if err != nil {
            fmt.Printf("%s", err)
            debug.PrintStack()
        }

        _, err = con.Exec("insert into pages (url) values (?)", string(url))
        if err != nil {
            fmt.Printf("%s", err)
            debug.PrintStack()
        }

        z := html.NewTokenizer(strings.NewReader((string(contents))))
        for {
            tokenType := z.Next()
            if tokenType == html.ErrorToken {
                return
            }

            token := z.Token()

            switch tokenType {
            case html.StartTagToken: // <tag>
                tagName := token.Data
                if strings.Compare(string(tagName), "a") == 0 {
                    for _, attr := range token.Attr {
                        if strings.Compare(attr.Key, "href") == 0 {
                            if depth < maxDepth {
                                urlNew := attr.Val
                                if !strings.HasPrefix(urlNew, "http") {
                                    if strings.HasPrefix(urlNew, "/") {
                                        urlNew = urlNew[1:]
                                    }
                                    urlNew = url + urlNew
                                }
                                //urlNew = path.Clean(urlNew)
                                go p.saveSite(con, urlNew, depth+1)
                            }
                        }
                    }
                }
            case html.TextToken: // text between start and end tag
            case html.EndTagToken: // </tag>
            case html.SelfClosingTagToken: // <tag/>
            }
        }
    }

    val := <-workers
    fmt.Println("finished Save Site", val)
}
Could someone explain to me how to do this properly, please?
Well, you have two choices. For a small and simple implementation, I would recommend separating the operations on the map into a separate structure.
// Index is a shared page index
type Index struct {
    access sync.Mutex
    pages  map[string]bool
}

// Mark reports that a site has been visited
// (pointer receiver, so the mutex is not copied)
func (i *Index) Mark(name string) {
    i.access.Lock()
    i.pages[name] = true
    i.access.Unlock()
}

// Visited returns true if a site has been visited
func (i *Index) Visited(name string) bool {
    i.access.Lock()
    defer i.access.Unlock()
    return i.pages[name]
}
Then, add another structure like this:
// Crawler is a web spider :D
type Crawler struct {
    index *Index // a pointer, so all crawlers share the same index
    /* ... other important stuff like visited sites ... */
}

// Crawl looks for content
func (c *Crawler) Crawl(site string) {
    // Implement your logic here
    // For example:
    if !c.index.Visited(site) {
        c.index.Mark(site) // When marked
    }
}
That way you keep things nice and clear: probably a little more code, but definitely more readable. You need to instantiate the crawlers like this:
sameIndex := &Index{pages: make(map[string]bool)}
asManyAsYouWant := Crawler{index: sameIndex} // they will all share sameIndex
If you want to go further with a high level solution, then I would recommend Producer/Consumer architecture.
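For a flavour of what that could look like, here is a minimal sketch of a producer/consumer crawl loop built on the Index type above (the channel size, worker count, and fetch step are placeholders; real fetching/parsing is elided):
func crawlAll(seed string, workers int, index *Index) {
    urls := make(chan string, 1000)
    var pending sync.WaitGroup

    // enqueue registers a URL as outstanding work; if the buffer is full
    // the URL is dropped (a real crawler would queue it elsewhere).
    enqueue := func(u string) {
        pending.Add(1)
        select {
        case urls <- u:
        default:
            pending.Done()
        }
    }
    enqueue(seed)

    // close the channel once every enqueued URL has been processed,
    // which lets the workers' range loops finish
    go func() { pending.Wait(); close(urls) }()

    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for u := range urls {
                if !index.Visited(u) {
                    index.Mark(u)
                    // fetch u here and enqueue() any links discovered on the page
                }
                pending.Done()
            }
        }()
    }
    wg.Wait()
}
The workers are the consumers, and each worker also acts as a producer when it discovers new links; pending counts outstanding URLs so the channel can be closed cleanly when the crawl is exhausted.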

Having trouble understanding interface/struct relationship

I am having difficulty understanding the relationship between interfaces and structs in go. I have declared an interface called Datatype as follows:
package main
type Datatype interface {
    Unmarshal(record []string) error
    String() string
}
I have also created several structs that implement this interface. Here is one simple example:
package main
import (
    "encoding/csv"
    "fmt"
    "reflect"
    "strconv"
    "time"

    "gopkg.in/validator.v2"
)
type User struct {
    Username    string `validate:"nonzero"`
    UserId      string `validate:"nonzero"`
    GivenName   string `validate:"nonzero"`
    FamilyName  string `validate:"nonzero"`
    Email       string `validate:"regexp=^[0-9a-zA-Z]+@[0-9a-zA-Z]+(\\.[0-9a-zA-Z]+)+$"`
    SMS         string `validate:"nonzero"`
    Phone       string `validate:"min=10"`
    DateOfBirth time.Time
}
type Users []User
func (u *User) Unmarshal(record []string) error {
    s := reflect.ValueOf(u).Elem()
    if s.NumField() != len(record) {
        return &FieldMismatch{s.NumField(), len(record)}
    }
    for i := 0; i < s.NumField(); i++ {
        f := s.Field(i)
        switch f.Type().String() {
        case "string":
            f.SetString(record[i])
        case "int", "int64":
            ival, err := strconv.ParseInt(record[i], 10, 0)
            if err != nil {
                return err
            }
            f.SetInt(ival)
        default:
            return &UnsupportedType{f.Type().String()}
        }
    }
    return nil
}
func (u *User) String() string {
    return fmt.Sprintf("%#v", u)
}
func (u *User) populateFrom(reader *csv.Reader) (Users, error) {
    var users Users
    for {
        record, err := reader.Read()
        check(err)
        err = u.Unmarshal(record)
        check(err)
        valid := validator.Validate(u)
        if valid == nil {
            user := *u
            users = append(users, user)
        } else {
            fmt.Println("Validation error?: ", valid)
        }
    }
    return users, nil
}
Problem:
As you can see, I also have a type called Users which is just []User. When I try to return this type from a function that has a return type of []Datatype, I get the following error message:
cannot use results (type Users) as type []Datatype in return argument
I'm sure I'm missing something obvious but it seems to me that this should work.
Question:
Could someone please explain why it does not work? Is there a better (more idiomatic) way to achieve this end result?
Slices are not covariant; even though User implements Datatype, []User does not implement []Datatype (because nothing implements []Datatype: it itself is not an interface type, it's just a slice type whose element type is an interface type).
Edited to add: As Dave C points out in a comment above, a closely-related question appears in the Go FAQ. [link] The Go FAQ is licensed in a way that's compatible with Stack Exchange content, so, here's the question in its entirety:
Can I convert a []T to an []interface{}?
Not directly, because they do not have the same representation in memory. It is necessary to copy the elements individually to the destination slice. This example converts a slice of int to a slice of interface{}:
t := []int{1, 2, 3, 4}
s := make([]interface{}, len(t))
for i, v := range t {
    s[i] = v
}
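Applied to the types in the question, the same element-by-element copy converts a Users value to []Datatype (a small sketch; note the &users[i], since Unmarshal and String are declared on *User, so it is *User that implements Datatype):
func toDatatypes(users Users) []Datatype {
    s := make([]Datatype, len(users))
    for i := range users {
        s[i] = &users[i] // *User implements Datatype
    }
    return s
}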
