Matching text to images using fuzzy search - search

I am using the bleve package (https://github.com/blevesearch/bleve) to map products to images.
It works fine when I search for a single term, but not at all when I search for an entire phrase. For instance, if I use this:
query := bleve.NewFuzzyQuery("lacteo")
it correctly maps to the right image. However, if I do this:
query := bleve.NewFuzzyQuery("lacteo leche yogurt cebolla")
it does not match anything at all.
What am I doing wrong here?
Building the index:
package main
import (
	"github.com/blevesearch/bleve"
)
func main() {
	message := []struct {
		Id   string
		Body string
	}{
		{
			Id:   "lacteos.jpg",
			Body: "lacteo leche yogurt cebolla",
		},
		{
			Id:   "cafe.jpg",
			Body: "café yerba té",
		},
		{
			Id:   "queso.jpg",
			Body: "lacteo leche yogurt cebolla queso",
		},
		{
			Id:   "harina.jpg",
			Body: "harina",
		},
	}
	mapping := bleve.NewIndexMapping()
	index, err := bleve.New("example.bleve", mapping)
	if err != nil {
		panic(err)
	}
	defer index.Close()
	// Index every document under its image name and stop on the first error.
	for _, m := range message {
		if err := index.Index(m.Id, m); err != nil {
			panic(err)
		}
	}
}
Searching the index:
package main
import (
	"fmt"
	"log"
	"github.com/blevesearch/bleve"
)
func main() {
	index, err := bleve.Open("example.bleve")
	if err != nil {
		log.Fatal(err)
	}
	defer index.Close()
	query := bleve.NewFuzzyQuery("lacteo leche yogurt cebolla queso")
	query.SetFuzziness(2)
	searchRequest := bleve.NewSearchRequest(query)
	searchResult, err := index.Search(searchRequest)
	if err != nil {
		log.Fatal(err.Error())
	}
	// Print the ID and score of every hit.
	for _, v := range searchResult.Hits {
		fmt.Println(v.ID)
		fmt.Println(v.Score)
		fmt.Println("-------------")
	}
}

After posting an issue on their repo (https://github.com/blevesearch/bleve/issues/1565), I found out that this is not actually supported. I ended up adding a little more logic on my side to make this work.


CRC32 Checksum Calculation via Go

I am trying to write a Go function that produces the same result as the Ubuntu Linux "cksum" command, for example:
$ echo 123 > /tmp/foo
$ cksum /tmp/foo
2330645186 4 /tmp/foo
Could someone please provide a Go function that produces the first field of the above result ("2330645186")? Thank you.
(Update)
It turns out that cksum does not implement quite the same cyclic redundancy check as the usual CRC32. To test plain CRC32 (the value you would normally find listed as a CRC32 checksum) you can use the CRC calculator at http://zorc.breitbandkatze.de/; Go's hash/crc32.ChecksumIEEE implementation matches it.
To implement the cksum CRC process (also known as POSIX cksum), I instead wrote a Go version of the C algorithm found on a cksum man page (which uses a lookup table):
package main
import (
	"bufio"
	"fmt"
	"io"
	"os"
)
var tbl = [256]uint32{0x00000000, 0x04C11DB7, 0x09823B6E, 0x0D4326D9,
0x130476DC, 0x17C56B6B, 0x1A864DB2, 0x1E475005,
0x2608EDB8, 0x22C9F00F, 0x2F8AD6D6, 0x2B4BCB61,
0x350C9B64, 0x31CD86D3, 0x3C8EA00A, 0x384FBDBD,
0x4C11DB70, 0x48D0C6C7, 0x4593E01E, 0x4152FDA9,
0x5F15ADAC, 0x5BD4B01B, 0x569796C2, 0x52568B75,
0x6A1936C8, 0x6ED82B7F, 0x639B0DA6, 0x675A1011,
0x791D4014, 0x7DDC5DA3, 0x709F7B7A, 0x745E66CD,
0x9823B6E0, 0x9CE2AB57, 0x91A18D8E, 0x95609039,
0x8B27C03C, 0x8FE6DD8B, 0x82A5FB52, 0x8664E6E5,
0xBE2B5B58, 0xBAEA46EF, 0xB7A96036, 0xB3687D81,
0xAD2F2D84, 0xA9EE3033, 0xA4AD16EA, 0xA06C0B5D,
0xD4326D90, 0xD0F37027, 0xDDB056FE, 0xD9714B49,
0xC7361B4C, 0xC3F706FB, 0xCEB42022, 0xCA753D95,
0xF23A8028, 0xF6FB9D9F, 0xFBB8BB46, 0xFF79A6F1,
0xE13EF6F4, 0xE5FFEB43, 0xE8BCCD9A, 0xEC7DD02D,
0x34867077, 0x30476DC0, 0x3D044B19, 0x39C556AE,
0x278206AB, 0x23431B1C, 0x2E003DC5, 0x2AC12072,
0x128E9DCF, 0x164F8078, 0x1B0CA6A1, 0x1FCDBB16,
0x018AEB13, 0x054BF6A4, 0x0808D07D, 0x0CC9CDCA,
0x7897AB07, 0x7C56B6B0, 0x71159069, 0x75D48DDE,
0x6B93DDDB, 0x6F52C06C, 0x6211E6B5, 0x66D0FB02,
0x5E9F46BF, 0x5A5E5B08, 0x571D7DD1, 0x53DC6066,
0x4D9B3063, 0x495A2DD4, 0x44190B0D, 0x40D816BA,
0xACA5C697, 0xA864DB20, 0xA527FDF9, 0xA1E6E04E,
0xBFA1B04B, 0xBB60ADFC, 0xB6238B25, 0xB2E29692,
0x8AAD2B2F, 0x8E6C3698, 0x832F1041, 0x87EE0DF6,
0x99A95DF3, 0x9D684044, 0x902B669D, 0x94EA7B2A,
0xE0B41DE7, 0xE4750050, 0xE9362689, 0xEDF73B3E,
0xF3B06B3B, 0xF771768C, 0xFA325055, 0xFEF34DE2,
0xC6BCF05F, 0xC27DEDE8, 0xCF3ECB31, 0xCBFFD686,
0xD5B88683, 0xD1799B34, 0xDC3ABDED, 0xD8FBA05A,
0x690CE0EE, 0x6DCDFD59, 0x608EDB80, 0x644FC637,
0x7A089632, 0x7EC98B85, 0x738AAD5C, 0x774BB0EB,
0x4F040D56, 0x4BC510E1, 0x46863638, 0x42472B8F,
0x5C007B8A, 0x58C1663D, 0x558240E4, 0x51435D53,
0x251D3B9E, 0x21DC2629, 0x2C9F00F0, 0x285E1D47,
0x36194D42, 0x32D850F5, 0x3F9B762C, 0x3B5A6B9B,
0x0315D626, 0x07D4CB91, 0x0A97ED48, 0x0E56F0FF,
0x1011A0FA, 0x14D0BD4D, 0x19939B94, 0x1D528623,
0xF12F560E, 0xF5EE4BB9, 0xF8AD6D60, 0xFC6C70D7,
0xE22B20D2, 0xE6EA3D65, 0xEBA91BBC, 0xEF68060B,
0xD727BBB6, 0xD3E6A601, 0xDEA580D8, 0xDA649D6F,
0xC423CD6A, 0xC0E2D0DD, 0xCDA1F604, 0xC960EBB3,
0xBD3E8D7E, 0xB9FF90C9, 0xB4BCB610, 0xB07DABA7,
0xAE3AFBA2, 0xAAFBE615, 0xA7B8C0CC, 0xA379DD7B,
0x9B3660C6, 0x9FF77D71, 0x92B45BA8, 0x9675461F,
0x8832161A, 0x8CF30BAD, 0x81B02D74, 0x857130C3,
0x5D8A9099, 0x594B8D2E, 0x5408ABF7, 0x50C9B640,
0x4E8EE645, 0x4A4FFBF2, 0x470CDD2B, 0x43CDC09C,
0x7B827D21, 0x7F436096, 0x7200464F, 0x76C15BF8,
0x68860BFD, 0x6C47164A, 0x61043093, 0x65C52D24,
0x119B4BE9, 0x155A565E, 0x18197087, 0x1CD86D30,
0x029F3D35, 0x065E2082, 0x0B1D065B, 0x0FDC1BEC,
0x3793A651, 0x3352BBE6, 0x3E119D3F, 0x3AD08088,
0x2497D08D, 0x2056CD3A, 0x2D15EBE3, 0x29D4F654,
0xC5A92679, 0xC1683BCE, 0xCC2B1D17, 0xC8EA00A0,
0xD6AD50A5, 0xD26C4D12, 0xDF2F6BCB, 0xDBEE767C,
0xE3A1CBC1, 0xE760D676, 0xEA23F0AF, 0xEEE2ED18,
0xF0A5BD1D, 0xF464A0AA, 0xF9278673, 0xFDE69BC4,
0x89B8FD09, 0x8D79E0BE, 0x803AC667, 0x84FBDBD0,
0x9ABC8BD5, 0x9E7D9662, 0x933EB0BB, 0x97FFAD0C,
0xAFB010B1, 0xAB710D06, 0xA6322BDF, 0xA2F33668,
0xBCB4666D, 0xB8757BDA, 0xB5365D03, 0xB1F740B4}
// crc holds the running POSIX cksum state.
type crc struct {
	p, r  uint32
	Size  int
	final bool
}
func NewCrc() *crc {
	return &crc{0, 0, 0, false}
}
// Add feeds one byte of input into the checksum.
func (pr *crc) Add(b byte) {
	if pr.final {
		return
	}
	pr.r = (pr.r << 8) ^ tbl[byte(pr.r>>24)^b]
	pr.Size++
}
// Crc appends the input length, least significant byte first, and returns
// the complemented result.
func (pr *crc) Crc() uint32 {
	if pr.final {
		return pr.r
	}
	for m := pr.Size; m > 0; {
		b := byte(m & 0377)
		m = m >> 8
		pr.r = (pr.r << 8) ^ tbl[byte(pr.r>>24)^b]
	}
	pr.final = true // Prevent further modification
	pr.r = ^pr.r
	return pr.r
}
func cksum(filename string) (uint32, int, error) {
	f, err := os.Open(filename)
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()
	in := bufio.NewReader(f)
	pr := NewCrc()
	for done := false; !done; {
		switch b, err := in.ReadByte(); err {
		case io.EOF:
			done = true
		case nil:
			pr.Add(b)
		default:
			return 0, 0, err
		}
	}
	return pr.Crc(), pr.Size, nil
}
func main() {
	var filename = "foo"
	crc, size, err := cksum(filename)
	if err != nil {
		fmt.Println("Error: ", err)
		return
	}
	fmt.Printf("%d %d %s\n", crc, size, filename)
}
Obviously the filename is hardcoded here (to foo), but you could change that with flags. The content of foo is 123\n (** note: on Windows you will need to convert the line endings, otherwise the file is 5 bytes long). Results:
linux: $ cksum foo
2330645186 4 foo
linux: $ go run cksum.go
2330645186 4 foo
windows: > go run cksum.go **
2330645186 4 foo
Actually, I found a simpler answer to my original question, using this package:
https://pkg.go.dev/github.com/cxmcc/unixsums#section-readme
Here is a snippet that prints the POSIX cksum value of a file in Go:
data, err := ioutil.ReadFile("/tmp/test.loop")
if err != nil {
log.Fatal(err)
}
fmt.Printf("cksum: %d\n", cksum.Cksum(data))

How to get a Go test to match multiline output

I have the following code which generates some string output:
package formatter
import (
"bytes"
"log"
"text/template"
"github.com/foo/bar/internal/mapper"
)
// map of template functions that enable us to identify the final item within a
// collection being iterated over.
var fns = template.FuncMap{
"plus1": func(x int) int {
return x + 1
},
}
// Dot renders our results in dot format for use with graphviz
func Dot(results []mapper.Page) string {
dotTmpl := `digraph sitemap { {{range .}}
"{{.URL}}"
-> { {{$n := len .Anchors}}{{range $i, $v := .Anchors}}
"{{.}}"{{if eq (plus1 $i) $n}}{{else}},{{end}}{{end}}
} {{end}}
}`
tmpl, err := template.New("digraph").Funcs(fns).Parse(dotTmpl)
if err != nil {
log.Fatal(err)
}
var output bytes.Buffer
if err := tmpl.Execute(&output, results); err != nil {
log.Fatal(err)
}
return output.String()
}
It generates output like:
digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}
Below is a test for this functionality...
package formatter
import (
"testing"
"github.com/foo/bar/internal/mapper"
)
func TestDot(t *testing.T) {
input := []mapper.Page{
mapper.Page{
URL: "http://www.example.com/",
Anchors: []string{
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz",
},
Links: []string{
"http://www.example.com/foo.css",
"http://www.example.com/bar.css",
"http://www.example.com/baz.css",
},
Scripts: []string{
"http://www.example.com/foo.js",
"http://www.example.com/bar.js",
"http://www.example.com/baz.js",
},
},
}
output := `digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}`
actual := Dot(input)
if actual != output {
t.Errorf("expected: %s\ngot: %s", output, actual)
}
}
Which fails with the following error (which is related to the outputted format spacing)...
--- FAIL: TestDot (0.00s)
format_test.go:43: expected: digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}
got: digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}
I've tried tweaking my test output variable so that the spacing would align with what the real code actually outputs. That didn't work.
I also tried using strings.Replace() on both my output variable and the actual output, and bizarrely the output from my function (even though it was passed through strings.Replace) would still be multi-line (and so the test would fail).
Does anyone have any ideas how I can make the output consistent for the sake of code verification?
Thanks.
UPDATE
I tried the approach suggested by @icza and it still fails the test, although the output in the test looks more like what is expected:
=== RUN TestDot
--- FAIL: TestDot (0.00s)
format_test.go:65: expected: digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}
got: digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}
If you want to ignore format, you can use strings.Fields.
output := strings.Fields(`digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}`)
actual := strings.Fields(Dot(input))
if !equal(output,actual) {
// ...
}
where equal is a simple function that compares two slices.
The simplest solution is to use the same indentation in the test when specifying the expected output (the same as you use in the template).
You have:
output := `digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}`
Change it to:
output := `digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}`
Note that, for example, the final line is not indented. When you use a raw string literal, every character, including indentation characters, is part of the literal as-is.
Steps to create a correct, un-indented raw string literal
After all, this is not really a coding issue, but rather an issue of editors' auto-formatting and of how a raw string literal is defined. An easy way to get it right is to first write an empty raw string literal, add an empty line to it, and clear the auto-indentation inserted by the editor:
output := `
`
When you have this, copy-paste the correct input before the closing backtick, e.g.:
output := `
digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}`
And as a last step, remove the line break from the first line of the raw string literal, and you have the correct raw string literal:
output := `digraph sitemap {
"http://www.example.com/"
-> {
"http://www.example.com/foo",
"http://www.example.com/bar",
"http://www.example.com/baz"
}
}`
Once you have this, running gofmt or auto-formatting of editors will not mess with it anymore.
UPDATE:
I checked your updated test result. In the result you get, there is a space after the first line (digraph sitemap {) and also a space after the 3rd line (-> {), but you don't add those to your expected output. Either add them to your expected output too, or remove those spaces from the template! Strings are compared byte-wise, so every character (including white space) matters.
To remove those extra spaces from the template:
dotTmpl := `digraph sitemap { {{- range .}}
"{{.URL}}"
-> { {{- $n := len .Anchors}}{{range $i, $v := .Anchors}}
"{{.}}"{{if eq (plus1 $i) $n}}{{else}},{{end}}{{end}}
} {{end}}
}`
Note the use of {{-. It trims the white space before the template action; trim markers were added in Go 1.6.
The problem is that there is an extra space in your formatted text right after the {. You can fix it by trimming that space in the template. Writing the brace directly against the action as {{{range .}} does not parse, because the first two braces are read as the opening delimiter, so use a trim marker instead:
`digraph sitemap { {{- range .}}
"{{.URL}}"
-> { {{- $n := len .Anchors}}{{range $i, $v := .Anchors}}
"{{.}}"{{if eq (plus1 $i) $n}}{{else}},{{end}}{{end}}
}{{end}}
}`

Insert a mgo query []bson.M result into a file.txt as a string

I have to write to a file the result of an mgo (MongoDB) query, converted in Go, to get the ids of the images.
var path = "/home/Medo/text.txt"
pipe := cc.Pipe([]bson.M{
	{"$unwind": "$images"},
	{"$group": bson.M{"_id": "null", "images": bson.M{"$push": "$images"}}},
	{"$project": bson.M{"_id": 0}}})
response := []bson.M{}
errResponse := pipe.All(&response)
if errResponse != nil {
	fmt.Println("error Response: ", errResponse)
}
fmt.Println(response) // print to make sure that it is working
data, err22 := bson.Marshal(&response)
if err22 != nil {
	fmt.Println("error insertion ", err22)
}
s := string(data)
Here is the part where I have to create a file and write to it.
The problem is that when I write the result of the query to the text file, a sequential number is appended to each value, for example:
id of images
23456678`0`
24578689`1`
23678654`2`
12890762`3`
76543890`4`
64744848`5`
So each value is followed by a sequential number, and I can't figure out why. After getting the response from the query, I converted the BSON to []byte and then to string, but I keep getting those sequential numbers at the end of each result.
I'd like to drop those 0 1 2 3 4 5.
var _, errExistFile = os.Stat(path)
if os.IsNotExist(errExistFile) {
	var file, errCreateFile = os.Create(path)
	if isError(errCreateFile) {
		return
	}
	defer file.Close()
}
fmt.Println("==> done creating file", path)
var file, errii = os.OpenFile(path, os.O_RDWR, 0644)
if isError(errii) {
	return
}
defer file.Close()
// write the marshalled string to the file
_, erri := file.WriteString(s)
if isError(erri) {
	return
}
erri = file.Sync()
if isError(erri) {
	return
}
fmt.Println("==> done writing to file")
You could declare a simple struct, e.g.
type simple struct {
	ID    idtype `bson:"_id"`
	Image int    `bson:"images"`
}
The code to write the image ids to the file would be
// open file stuff…
result := simple{}
iter := collection.Find(nil).Iter()
for iter.Next(&result) {
	file.WriteString(fmt.Sprintf("%d\n", result.Image))
}
iter.Close()

How to get webpage content into a string using Go

I am trying to use Go and the http package to get the content of a webpage into a string, then be able to process the string. I am new to Go, so I am not entirely sure where to begin. Here is the function I am trying to make.
func OnPage(link string) {
}
I am not sure how to write the function. Link is the url of the webpage to use, and result would be the string from the webpage. So for example, if I used reddit as the link, then the result would just be the string form of the content on reddit, and I could process that string in different ways. From what I have read, I want to use the http package, but as I stated before, I do not know where to begin. Any help would be appreciated.
package main
import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)
// OnPage fetches link and returns the response body as a string.
func OnPage(link string) string {
	res, err := http.Get(link)
	if err != nil {
		log.Fatal(err)
	}
	content, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	return string(content)
}
func main() {
	fmt.Println(OnPage("http://www.bbc.co.uk/news/uk-england-38003934"))
}

How to check file existence by its base name (without extension)?

The question is quite self-explanatory.
Could anybody please show me how I can check for the existence of a file by its name (without extension) in a short and efficient way? It would be great if the code returned several matches when the folder has several files with the same name.
Example:
folder/
file.html
file.md
UPDATE:
It is not obvious from the official documentation how to use the filepath.Match() or filepath.Glob() functions, so here are some examples:
matches, _ := filepath.Glob("./folder/file*") // returns paths to real files: [folder/file.html, folder/file.md]
matchesToPattern, _ := filepath.Match("./folder/file*", "./folder/file.html") // returns true, but it only compares strings and doesn't check the file system
You need to use the path/filepath package.
The functions to check are Glob(), Match() and Walk(); pick whichever suits your taste better.
Here is the updated code:
package main
import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)
func main() {
	dirname := "." + string(filepath.Separator)
	d, err := os.Open(dirname)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer d.Close()
	fi, err := d.Readdir(-1)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	r := regexp.MustCompile("f([a-z]+)le") // the pattern to match
	for _, f := range fi {
		if f.Mode().IsRegular() { // is a regular file
			if r.MatchString(f.Name()) { // name matches the pattern
				fmt.Println(f.Name(), f.Size(), "bytes")
			}
		}
	}
}
With this approach you can also filter by date, size or other file properties, or include subfolders.
