Remove all characters after a delimiter in a string - string

I am building a web crawler application in golang.
After downloading the HTML of a page, I separate out the URLs.
I am presented with URLs that have "#s" in them, such as "en.wikipedia.org/wiki/Race_condition#Computing". I would like to get rid of all characters following the "#", since these lead to the same page anyways. Any advice for how to do so?

Use the url package:
u, _ := url.Parse("SOME_URL_HERE")
u.Fragment = ""
return u.String()

An improvement on the answer by Luke Joshua Park is to parse the URL relative to the URL of the source page. This creates an absolute URL from what might be a relative URL on the page (scheme not specified, host not specified, relative path). Another improvement is to check and handle errors.
func clean(pageURL, linkURL string) (string, error) {
p, err := url.Parse(pageURL)
if err != nil {
return "", err
}
l, err := p.Parse(linkURL)
if err != nil {
return "", err
}
l.Fragment = "" // chop off the fragment
return l.String(), nil
}
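As a usage sketch for the two-argument function above: the page URL and the links below are made-up inputs, and cleanLink is just an illustrative name for the same logic, so treat this as one possible way to wire it into a crawler loop rather than the definitive one.

```go
package main

import (
	"fmt"
	"net/url"
)

// cleanLink resolves linkURL against pageURL and drops the fragment.
// The name is illustrative; the logic mirrors the clean function above.
func cleanLink(pageURL, linkURL string) (string, error) {
	p, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	l, err := p.Parse(linkURL) // resolves relative references against p
	if err != nil {
		return "", err
	}
	l.Fragment = "" // chop off the fragment
	return l.String(), nil
}

func main() {
	page := "https://en.wikipedia.org/wiki/Main_Page" // assumed source page
	links := []string{
		"/wiki/Race_condition#Computing",      // host-relative path
		"//en.wikipedia.org/wiki/Mutex#Types", // scheme-relative
		"https://go.dev/doc/#learning",        // already absolute
	}
	for _, link := range links {
		cleaned, err := cleanLink(page, link)
		if err != nil {
			fmt.Println("skipping", link, err)
			continue
		}
		fmt.Println(cleaned)
	}
}
```

Because every link comes back absolute and without a fragment, the results can be used directly as keys in a "visited" set.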
If you are not interested in getting an absolute URL, then chop off everything after the #. This works because the only valid use of # in a URL is the fragment separator.
func clean(linkURL string) string {
i := strings.LastIndexByte(linkURL, '#')
if i < 0 {
return linkURL
}
return linkURL[:i]
}

Related

Identifying an existing folder [duplicate]

This question already has answers here:
Expand tilde to home directory
(6 answers)
reader.ReadString does not strip out the first occurrence of delim
(4 answers)
Closed 3 years ago.
I have an issue where Go seems to be telling me that a folder doesn't exist, when it clearly does.
path, _ := reader.ReadString('\n')
path, err := expand(path)
fmt.Println("Path Expanded: ", path, err)
if err == nil {
if _, err2 := os.Lstat(path); err2 == nil {
fmt.Println("Valid Path")
} else if os.IsNotExist(err2) {
fmt.Println("Invalid Path")
fmt.Println(err2)
} else {
fmt.Println(err2)
}
}
The expand function simply translates the ~ to the homeDir.
func expand(path string) (string, error) {
if len(path) == 0 || path[0] != '~' {
return path, nil
}
usr, err := user.Current()
if err != nil {
return "", err
}
return filepath.Join(usr.HomeDir, path[1:]), nil
}
If I input the value of ~ it correctly translates it to /home/<user>/ but it ultimately states that the folder does not exist, even though it clearly does, and I know I have access to it, so it doesn't seem to be a permissions thing.
If I try /root/ as the input, I correctly get a permissions error, and I am OK with that. But I expect my ~ directory to return "Valid Path".
My error is almost always : no such file or directory
I am on Lubuntu 19.xx and it is a fairly fresh install, I am running this app from ~/Projects/src/Playground/AppName and I am using the bash terminal from vscode.
I have also tried both Lstat and Stat unsuccessfully, not to mention a ton of examples and different ways. I am sure this is some underlying linux thing that I don't understand...
The answer to this is that I was not trimming the result of ReadString, which still includes the \n delimiter. Adding strings.Trim(path, "\n") corrected my issue.
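A minimal sketch of that fix, with the input simulated from a string instead of a terminal. The helper names trimLineEnding and readLine are made up for illustration. It also strips a trailing \r, on the assumption that input may arrive with Windows-style line endings.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// trimLineEnding removes any trailing "\n" or "\r\n" left by ReadString.
func trimLineEnding(s string) string {
	return strings.TrimRight(s, "\r\n")
}

// readLine reads up to and including the next '\n' and returns the line
// without its line ending. If the input ends without a newline, the
// partial line is still returned.
func readLine(r *bufio.Reader) (string, error) {
	line, err := r.ReadString('\n')
	if err != nil && line == "" {
		return "", err
	}
	return trimLineEnding(line), nil
}

func main() {
	// Simulated user input; a real program would wrap os.Stdin instead.
	reader := bufio.NewReader(strings.NewReader("~/Projects\n"))
	path, err := readLine(reader)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("%q\n", path) // %q makes stray whitespace visible
}
```

Printing with %q while debugging is a quick way to spot a hidden newline in a path.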

How to get webpage content into a string using Go

I am trying to use Go and the http package to get the content of a webpage into a string, then be able to process the string. I am new to Go, so I am not entirely sure where to begin. Here is the function I am trying to make.
func OnPage(link string) {
}
I am not sure how to write the function. Link is the url of the webpage to use, and result would be the string from the webpage. So for example, if I used reddit as the link, then the result would just be the string form of the content on reddit, and I could process that string in different ways. From what I have read, I want to use the http package, but as I stated before, I do not know where to begin. Any help would be appreciated.
package main
import (
"fmt"
"io/ioutil"
"log"
"net/http"
)
func OnPage(link string) string {
res, err := http.Get(link)
if err != nil {
log.Fatal(err)
}
content, err := ioutil.ReadAll(res.Body)
res.Body.Close()
if err != nil {
log.Fatal(err)
}
return string(content)
}
func main() {
fmt.Println(OnPage("http://www.bbc.co.uk/news/uk-england-38003934"))
}
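The version above exits the whole program on any failure. As a sketch of an alternative, the variant below returns the error to the caller and checks the status code. The names fetchPage and demo are illustrative, and the page is served by a local test server so the example does not depend on an external site.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchPage returns the body of link as a string. Returning the error,
// rather than calling log.Fatal, lets the caller decide what to do.
func fetchPage(link string) (string, error) {
	res, err := http.Get(link)
	if err != nil {
		return "", err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET %s: unexpected status %s", link, res.Status)
	}
	content, err := io.ReadAll(res.Body)
	if err != nil {
		return "", err
	}
	return string(content), nil
}

// demo serves a fixed page from a local test server and fetches it.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html>hello</html>")
	}))
	defer srv.Close()
	return fetchPage(srv.URL)
}

func main() {
	page, err := demo()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println(page)
}
```

io.ReadAll requires Go 1.16 or later; on older toolchains ioutil.ReadAll does the same job.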

How to search a string in the elasticsearch document(indexed) in golang?

I am writing a function in golang to search for a string in elasticsearch documents which are indexed. I am using elasticsearch golang client elastic. For example consider the object is tweet,
type Tweet struct {
User string
Message string
Retweets int
}
And the search function is
func SearchProject() error{
// Search with a term query
termQuery := elastic.NewTermQuery("user", "olivere")
searchResult, err := client.Search().
Index("twitter"). // search in index "twitter"
Query(&termQuery). // specify the query
Sort("user", true). // sort by "user" field, ascending
From(0).Size(10). // take documents 0-9
Pretty(true). // pretty print request and response JSON
Do() // execute
if err != nil {
// Handle error
panic(err)
return err
}
// searchResult is of type SearchResult and returns hits, suggestions,
// and all kinds of other information from Elasticsearch.
fmt.Printf("Query took %d milliseconds\n", searchResult.TookInMillis)
// Each is a convenience function that iterates over hits in a search result.
// It makes sure you don't need to check for nil values in the response.
// However, it ignores errors in serialization. If you want full control
// over iterating the hits, see below.
var ttyp Tweet
for _, item := range searchResult.Each(reflect.TypeOf(ttyp)) {
t := item.(Tweet)
fmt.Printf("Tweet by %s: %s\n", t.User, t.Message)
}
// TotalHits is another convenience function that works even when something goes wrong.
fmt.Printf("Found a total of %d tweets\n", searchResult.TotalHits())
// Here's how you iterate through results with full control over each step.
if searchResult.Hits != nil {
fmt.Printf("Found a total of %d tweets\n", searchResult.Hits.TotalHits)
// Iterate through results
for _, hit := range searchResult.Hits.Hits {
// hit.Index contains the name of the index
// Deserialize hit.Source into a Tweet (could also be just a map[string]interface{}).
var t Tweet
err := json.Unmarshal(*hit.Source, &t)
if err != nil {
// Deserialization failed
}
// Work with tweet
fmt.Printf("Tweet by %s: %s\n", t.User, t.Message)
}
} else {
// No hits
fmt.Print("Found no tweets\n")
}
return nil
}
This search is printing tweets by the user 'olivere'. But if I give 'olive' then search is not working. How do I search for a string which is part of User/Message/Retweets?
And the Indexing function looks like this,
func IndexProject(p *objects.ElasticProject) error {
// Index a tweet (using JSON serialization)
tweet1 := `{"user" : "olivere", "message" : "It's a Raggy Waltz"}`
put1, err := client.Index().
Index("twitter").
Type("tweet").
Id("1").
BodyJson(tweet1).
Do()
if err != nil {
// Handle error
panic(err)
return err
}
fmt.Printf("Indexed tweet %s to index %s, type %s\n", put1.Id, put1.Index, put1.Type)
return nil
}
Output:
Indexed tweet 1 to index twitter, type tweet
Got document 1 in version 1 from index twitter, type tweet
Query took 4 milliseconds
Tweet by olivere: It's a Raggy Waltz
Found a total of 1 tweets
Found a total of 1 tweets
Tweet by olivere: It's a Raggy Waltz
Version
Go 1.4.2
Elasticsearch-1.4.4
Elasticsearch Go Library
github.com/olivere/elastic
Could anyone help me with this? Thank you.
How you search and find data depends on your analyser - from your code it's likely that the standard analyser is being used (i.e. you haven't specified an alternative in your mapping).
The Standard Analyser will only index complete words. So to match "olive" against "olivere" you could either:
1. Change the search process: e.g. switch from a term query to a Prefix query, or use a Query String query with a wildcard.
2. Change the index process: if you want to find strings within larger strings, then look at using nGrams or Edge nGrams in your analyser.
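To make the first option concrete, here is a sketch of the request body a prefix query sends to Elasticsearch, built with the standard library only so it does not depend on any particular version of the client. The helper name prefixQueryBody is made up, and the field and value are taken from the question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prefixQueryBody builds the JSON body for an Elasticsearch prefix query
// on a single field.
func prefixQueryBody(field, prefix string) (string, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"prefix": map[string]interface{}{
				field: prefix,
			},
		},
	}
	b, err := json.Marshal(body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := prefixQueryBody("user", "olive")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(body) // prints {"query":{"prefix":{"user":"olive"}}}
}
```

A prefix query is not analysed, so on a field indexed with the standard analyser the prefix should be given in lower case.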
multiQuery := elastic.NewMultiMatchQuery(
term,
"name", "address", "location", "email", "phone_number", "place", "postcode",
).Type("phrase_prefix")

GoLang put string in map

So, I'm trying to add a string to an existing map that is created from toml.
http://hastebin.com/vayolavose
When I try and build I get the error:
./web.go:56: arguments to copy have different element types: []proxy.Address and string
How would I go about converting it? I've been trying this for the past like 4 hours.
Thanks
The code below is your source code:
func handleAddFunc(w http.ResponseWriter, r *http.Request) {
backend := r.FormValue("backend")
key := r.FormValue("key")
if !isAuthorized(key) {
respond(w, r, 403, "")
return
}
w.Header().Set("Content-Type", "text/plain")
if !readConfig() {
return
}
activeAddrs = make([]proxy.Address, len(config.Proxy.ServerAddrs))
backendAddr = make([]proxy.Address, len(backend))
copy(backendAddr, config.Proxy.ServerAddrs)
copy(backendAddr, backend)
loadBalancer.SetAddrs(backendAddr)
fmt.Fprintf(w, "Input value of ", backend, "and here is the byte", backendAddr)
}
The error is in copy(backendAddr, backend): the variable backend is a string value from the request form, so it cannot be copied into a []proxy.Address. You could convert it into a []proxy.Address first, for example (I do not know the definition of proxy.Address, so this assumes it is a string type):
var backendAddr = []proxy.Address{}
for _, str := range strings.Split(backend, ",") {
backendAddr = append(backendAddr, proxy.Address(str)) // assumes proxy.Address has underlying type string
}
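A self-contained sketch of the same conversion follows. The Address type here is a hypothetical stand-in for proxy.Address, assumed to be a string type, and parseAddrs is an illustrative name. If the real proxy.Address is a struct, the conversion line would need to build that struct instead.

```go
package main

import (
	"fmt"
	"strings"
)

// Address is a hypothetical stand-in for proxy.Address, assumed here to
// have underlying type string.
type Address string

// parseAddrs splits a comma-separated form value into addresses,
// skipping empty entries and surrounding whitespace.
func parseAddrs(backend string) []Address {
	var addrs []Address
	for _, s := range strings.Split(backend, ",") {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		addrs = append(addrs, Address(s))
	}
	return addrs
}

func main() {
	existing := []Address{"10.0.0.1:8080"} // stands in for config.Proxy.ServerAddrs
	added := parseAddrs("10.0.0.2:8080, 10.0.0.3:8080")

	// append, unlike copy, grows the slice, so nothing is overwritten.
	all := append(existing, added...)
	fmt.Println(all)
}
```

Note that the two copy calls in the original handler both write to the start of backendAddr, so the second would overwrite the first even if the types matched. Using append avoids that.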

How to check file existence by its base name (without extension)?

Question is quite self-explanatory.
Could anybody show me a short and efficient way to check whether a file exists by its name (without extension)? It would be great if the code returned every occurrence when the folder has several files with the same base name.
Example:
folder/
file.html
file.md
UPDATE:
It is not obvious from the official documentation how to use the filepath.Match() or filepath.Glob() functions, so here are some examples:
matches, _ := filepath.Glob("./folder/file*") // returns paths to real files: [folder/file.html, folder/file.md]
matchesToPattern, _ := filepath.Match("./folder/file*", "./folder/file.html") // returns true, but it only compares strings and does not check the file system
You need to use the path/filepath package.
The functions to check are: Glob(), Match() and Walk() — pick whatever suits your taste better.
Here is the updated code :
package main
import (
"fmt"
"os"
"path/filepath"
"regexp"
)
func main() {
dirname := "." + string(filepath.Separator)
d, err := os.Open(dirname)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
defer d.Close()
fi, err := d.Readdir(-1)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
r := regexp.MustCompile("f([a-z]+)le") // the pattern to match; MustCompile panics on an invalid pattern
for _, f := range fi {
if f.Mode().IsRegular() { // is a regular file
if r.MatchString(f.Name()) { // if the name matches
fmt.Println(f.Name(), f.Size(), "bytes")
}
}
}
}
With this one you can also search for date, size, include subfolders or file properties.
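For the narrower case in the question (same base name, any extension), filepath.Glob alone may be enough. The sketch below uses the hypothetical helper findByBaseName and sets up a temporary directory with made-up file names for the demonstration. It assumes the directory and base name contain no glob metacharacters such as * or [.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
)

// findByBaseName returns the paths in dir named base plus any extension,
// e.g. base "file" matches file.html and file.md but not filename.txt.
func findByBaseName(dir, base string) ([]string, error) {
	return filepath.Glob(filepath.Join(dir, base+".*"))
}

// demo creates a few files in a temporary directory and returns the
// names that share the base name "file".
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "glob-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	for _, name := range []string{"file.html", "file.md", "filename.txt", "other.md"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			return nil, err
		}
	}

	matches, err := findByBaseName(dir, "file")
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(matches))
	for _, m := range matches {
		names = append(names, filepath.Base(m))
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println(names)
}
```

An empty result means no file with that base name exists. The pattern requires a dot after the base name, so a file named exactly "file" with no extension would need a separate check.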
