How to get webpage content into a string using Go

I am trying to use Go and the http package to get the content of a webpage into a string, then be able to process the string. I am new to Go, so I am not entirely sure where to begin. Here is the function I am trying to make.
func OnPage(link string) {
}
I am not sure how to write the function. link is the URL of the webpage to fetch, and the result should be the content of that webpage as a string. For example, if I passed reddit as the link, the result would be the content of reddit in string form, and I could then process that string in different ways. From what I have read, I want to use the http package, but as I stated before, I do not know where to begin. Any help would be appreciated.

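package main
import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)
// OnPage downloads the page at link and returns its body as a string.
// It exits the program if the request or the read fails.
func OnPage(link string) string {
	res, err := http.Get(link)
	if err != nil {
		log.Fatal(err)
	}
	// Read the whole body, then close it so the connection is released.
	content, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	return string(content)
}
func main() {
	fmt.Println(OnPage("http://www.bbc.co.uk/news/uk-england-38003934"))
}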
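If exiting the program on every failure is too blunt, the same idea can be written so the caller decides what to do with an error. The following is only a sketch: fetchPage and fetchFromLocalServer are hypothetical names, the status check is an added assumption about what counts as success, and io.ReadAll assumes Go 1.16 or later. The local test server is there only so the example runs without network access.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchPage is a hypothetical variant of OnPage that returns errors to the caller.
func fetchPage(link string) (string, error) {
	res, err := http.Get(link)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()

	// Assumption: anything other than 200 OK is treated as a failure.
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", res.Status)
	}

	content, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return string(content), nil
}

// fetchFromLocalServer serves a fixed page from a throwaway local server and fetches it.
func fetchFromLocalServer() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html>hello</html>")
	}))
	defer srv.Close()
	return fetchPage(srv.URL)
}

func main() {
	page, err := fetchFromLocalServer()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(page)
}
```

In a real program, fetchPage would be called with the URL of the page to process, and the returned string can be passed on to whatever parsing comes next.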

Related

Matching text to images using fuzzy search

I am using this package: https://github.com/blevesearch/bleve to create a mapping of products to images.
It works fine when I use single terms, but not at all if I use an entire phrase. For instance, if I use this:
query := bleve.NewFuzzyQuery("lacteo")
it correctly maps to the right image. However, if I do this:
query := bleve.NewFuzzyQuery("lacteo leche yogurt cebolla")
it does not match anything at all.
What am I doing wrong here?
Set up the DB:
package main
import (
"github.com/blevesearch/bleve"
)
func main() {
message := []struct {
Id string
Body string
}{
{
Id: "lacteos.jpg",
Body: "lacteo leche yogurt cebolla",
},
{
Id: "cafe.jpg",
Body: "café yerba té",
},
{
Id: "queso.jpg",
Body: "lacteo leche yogurt cebolla queso",
},
{
Id: "harina.jpg",
Body: "harina",
},
}
mapping := bleve.NewIndexMapping()
index, err := bleve.New("example.bleve", mapping)
if err != nil {
panic(err)
}
index.Index(message[0].Id, message[0])
index.Index(message[1].Id, message[1])
index.Index(message[2].Id, message[2])
index.Index(message[3].Id, message[3])
}
Search for something:
package main
import (
"fmt"
"log"
"github.com/blevesearch/bleve"
)
func main() {
	// Open the index created by the first program and stop if it is missing.
	index, err := bleve.Open("example.bleve")
	if err != nil {
		log.Fatal(err)
	}
	query := bleve.NewFuzzyQuery("lacteo leche yogurt cebolla queso")
	query.SetFuzziness(2)
	searchRequest := bleve.NewSearchRequest(query)
	searchResult, err := index.Search(searchRequest)
	if err != nil {
		log.Fatal(err)
	}
	// Print the ID and score of every hit.
	for _, v := range searchResult.Hits {
		fmt.Println(v.ID)
		fmt.Println(v.Score)
		fmt.Println("-------------")
	}
}
So, after posting an issue on their repo (https://github.com/blevesearch/bleve/issues/1565), I found out that a fuzzy query over an entire phrase is actually not supported. I ended up adding a little more logic on my side to make this work.
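The extra logic is not shown in the answer, so what follows is only one possible reading of it, written with the standard library and not with the bleve API. It assumes a fuzzy match compares the query against individual indexed words by edit distance, which would explain why a whole phrase is too far from any single word while each word on its own matches. The names editDistance and fuzzyTermHits are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// editDistance returns the Levenshtein distance between a and b, compared rune by rune.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// fuzzyTermHits counts how many words of query are within maxDist edits of some word in body.
func fuzzyTermHits(query, body string, maxDist int) int {
	hits := 0
	tokens := strings.Fields(body)
	for _, term := range strings.Fields(query) {
		for _, tok := range tokens {
			if editDistance(term, tok) <= maxDist {
				hits++
				break
			}
		}
	}
	return hits
}

func main() {
	body := "lacteo leche yogurt cebolla queso"
	phrase := "lacteo leche yogurt cebolla"
	// The whole phrase is many edits away from any single word of the body.
	fmt.Println(editDistance(phrase, "lacteo"))
	// Checked word by word, every term of the phrase finds a close match.
	fmt.Println(fuzzyTermHits(phrase, body, 2))
}
```

A document could then be ranked by the number of hits, which is the kind of per-word logic that has to live outside a single fuzzy query.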

Remove all characters after a delimiter in a string

I am building a web crawler application in Go.
After downloading the HTML of a page, I separate out the URLs.
Some of the URLs contain a "#", such as "en.wikipedia.org/wiki/Race_condition#Computing". I would like to get rid of all characters following the "#", since these URLs lead to the same page anyway. Any advice on how to do so?
Use the url package:
u, _ := url.Parse("SOME_URL_HERE")
u.Fragment = ""
return u.String()
An improvement on the answer by Luke Joshua Park is to parse the URL relative to the URL of the source page. This creates an absolute URL from what might be a relative URL on the page (scheme not specified, host not specified, relative path). Another improvement is to check and handle errors.
func clean(pageURL, linkURL string) (string, error) {
	p, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	// Resolve the link against the page URL to get an absolute URL.
	l, err := p.Parse(linkURL)
	if err != nil {
		return "", err
	}
	l.Fragment = "" // chop off the fragment
	return l.String(), nil
}
If you are not interested in getting an absolute URL, then chop off everything after the #. This works because the only valid use of # in a URL is the fragment separator.
func clean(linkURL string) string {
i := strings.LastIndexByte(linkURL, '#')
if i < 0 {
return linkURL
}
return linkURL[:i]
}
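To see what resolving against the page URL buys a crawler, here is a small runnable sketch of the first approach. The function name stripFragment and the sample page and links are made up for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// stripFragment resolves linkURL against pageURL and drops the fragment.
func stripFragment(pageURL, linkURL string) (string, error) {
	page, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	link, err := page.Parse(linkURL)
	if err != nil {
		return "", err
	}
	link.Fragment = ""
	return link.String(), nil
}

func main() {
	page := "https://en.wikipedia.org/wiki/Main_Page"
	links := []string{
		"/wiki/Race_condition#Computing", // absolute path, host taken from the page
		"Race_condition#Computing",       // relative path
		"#top",                           // fragment only, points back at the page itself
	}
	for _, href := range links {
		cleaned, err := stripFragment(page, href)
		if err != nil {
			fmt.Println("skipping", href, err)
			continue
		}
		fmt.Println(cleaned)
	}
}
```

The first two links end up as the same absolute URL, so a crawler that keeps a set of visited URLs would fetch that page only once.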

How to separate arrays (type structs) in Go?

I just wrote this code to experiment with types; I will explain the problem below.
My code:
package main
import (
"fmt"
"math/rand"
"time"
)
type Games struct {
game string
creator string
}
func main() {
videogames := []Games{
{"inFamous", "Sucker Punch Games"},
{"Halo", "343 Games"},
{"JustCause", "Eidos"},
}
rand.Seed(time.Now().UTC().UnixNano())
i := rand.Intn(len(videogames))
fmt.Print(videogames[i])
}
If I run this, the result will be something like:
{inFamous Sucker Punch Games}
Now what I want to do is separate the fields so that the result will be:
Game = inFamous
Publisher = Sucker Punch Games
I also need to remove the opening and closing braces.
You need a stringer method to define how your object will be printed:
func (g Games) String() string {
	return fmt.Sprintf("Game = %v, Creator = %v", g.game, g.creator)
}
Check out the Tour of Go
fmt.Print() does not allow you to specify the format, but will use the type default format.
Instead, use fmt.Printf(). This should do what you need:
fmt.Printf("Game = %s\nPublisher = %s", videogames[i].game, videogames[i].creator)
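The two answers can be combined: a String method that produces exactly the two-line layout asked for, so that plain fmt.Print or fmt.Println gives the desired output. This is a sketch that reuses the Games type from the question; the choice of a newline between the fields is taken from the desired output, and the loop over all entries replaces the random pick only to keep the example deterministic.

```go
package main

import "fmt"

// Games mirrors the struct from the question.
type Games struct {
	game    string
	creator string
}

// String implements fmt.Stringer, so fmt.Print and fmt.Println use this format.
func (g Games) String() string {
	return fmt.Sprintf("Game = %s\nPublisher = %s", g.game, g.creator)
}

func main() {
	videogames := []Games{
		{"inFamous", "Sucker Punch Games"},
		{"Halo", "343 Games"},
		{"JustCause", "Eidos"},
	}
	// fmt.Println calls String() on each value, so no braces are printed.
	for _, g := range videogames {
		fmt.Println(g)
	}
}
```

With the method in place, the call fmt.Print(videogames[i]) from the question needs no change.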

GoLang put string in map

So, I'm trying to add a string to an existing map that is created from toml.
http://hastebin.com/vayolavose
When I try and build I get the error:
./web.go:56: arguments to copy have different element types: []proxy.Address and string
How would I go about converting it? I've been trying this for the past like 4 hours.
Thanks
The code below is your source code:
func handleAddFunc(w http.ResponseWriter, r *http.Request) {
backend := r.FormValue("backend")
key := r.FormValue("key")
if !isAuthorized(key) {
respond(w, r, 403, "")
return
}
w.Header().Set("Content-Type", "text/plain")
if !readConfig() {
return
}
activeAddrs = make([]proxy.Address, len(config.Proxy.ServerAddrs))
backendAddr = make([]proxy.Address, len(backend))
copy(backendAddr, config.Proxy.ServerAddrs)
copy(backendAddr, backend)
loadBalancer.SetAddrs(backendAddr)
fmt.Fprintf(w, "Input value of ", backend, "and here is the byte", backendAddr)
}
The error in your code is copy(backendAddr, backend): the variable backend is a string value from the request form, and copy cannot copy a string into a []proxy.Address. You can convert it into a []proxy.Address first, for example (I do not know the definition of proxy.Address, so this assumes it is a string-based type):
var backendAddr = []proxy.Address{}
for _, str := range strings.Split(backend, ",") {
	// Assumes proxy.Address can be converted from a string.
	backendAddr = append(backendAddr, proxy.Address(str))
}
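Since the real proxy.Address is not shown, the conversion can be tried out with a stand-in type. In this sketch Address is a hypothetical string-based replacement for proxy.Address, and parseAddrs is a made-up helper; if the real type is a struct, the conversion line would have to build that struct instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Address is a hypothetical stand-in for proxy.Address, assumed to be a string-based type.
type Address string

// parseAddrs splits a comma-separated list into addresses,
// trimming spaces and skipping empty entries.
func parseAddrs(backend string) []Address {
	var addrs []Address
	for _, s := range strings.Split(backend, ",") {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		addrs = append(addrs, Address(s))
	}
	return addrs
}

func main() {
	existing := []Address{"10.0.0.1:8080"}
	// append grows the slice as needed, so no make/copy bookkeeping is required.
	all := append(existing, parseAddrs("10.0.0.2:8080, 10.0.0.3:8080")...)
	fmt.Println(all)
}
```

Using append for both the configured addresses and the new ones also avoids the second copy in the handler overwriting what the first copy wrote.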

How to check file existence by its base name (without extension)?

The question is quite self-explanatory.
Could anybody show me a short and efficient way to check whether a file exists by its name, without the extension? It would be great if the code returned every match when the folder contains several files with the same base name.
Example:
folder/
file.html
file.md
UPDATE:
It is not obvious from the official documentation how to use filepath.Match() or filepath.Glob(), so here are some examples:
matches, _ := filepath.Glob("./folder/file*") // returns the paths of files that actually exist: [folder/file.html folder/file.md]
matchesToPattern, _ := filepath.Match("./folder/file*", "./folder/file.html") // returns true, but it only compares strings and does not check that the file exists
You need to use the path/filepath package.
The functions to check are Glob(), Match() and Walk(); pick whichever suits you best.
Here is the updated code:
package main
import (
"fmt"
"os"
"path/filepath"
"regexp"
)
func main() {
dirname := "." + string(filepath.Separator)
d, err := os.Open(dirname)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
defer d.Close()
fi, err := d.Readdir(-1)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
r, _ := regexp.Compile("f([a-z]+)le") // the string to match
for _, fi := range fi {
if fi.Mode().IsRegular() { // is file
if r.Match([]byte(fi.Name())) { // if it match
fmt.Println(fi.Name(), fi.Size(), "bytes")
}
}
}
}
With this approach you can also filter by date, size or other file properties, and extend it to include subfolders.
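For the original question, a Glob pattern of the base name followed by ".*" may be all that is needed. The sketch below assumes Go 1.16 or later for os.MkdirTemp and os.WriteFile; findByBaseName and demo are hypothetical names, and the temporary folder only recreates the example layout from the question. Note that this pattern also matches names such as file.tar.gz, does not match a file named just "file", and treats *, ? and [ in the base name as pattern characters.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// findByBaseName returns the paths in dir whose name is base followed by a dot and anything else.
func findByBaseName(dir, base string) ([]string, error) {
	return filepath.Glob(filepath.Join(dir, base+".*"))
}

// demo builds a throwaway folder like the one in the question
// and returns the names of the files matching the base name "file".
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "folder")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	for _, name := range []string{"file.html", "file.md", "other.txt"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return nil, err
		}
	}

	matches, err := findByBaseName(dir, "file")
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(matches))
	for _, m := range matches {
		names = append(names, filepath.Base(m))
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Println(names)
}
```

An empty result means no file with that base name exists, so len(matches) > 0 doubles as the existence check.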
