Program Execution gets stuck - multithreading

I'm trying to run the following worker threads in a multi-threaded environment but it seems like all the threads get stuck and none of them are able to proceed.
func executeTask(currentTask task, serial bool, imagePixels map[string]*imagePixelContainer, completedTracker map[string]*partsCompleted,
bufferPixels map[string]*imagePixelContainer, nextEffects map[string]*nextEffect, pl *sync.Mutex, threadid int) {
newPixels := applyEffect(imagePixels[currentTask.image].pixels, currentTask.effect, currentTask.starty, currentTask.endy)
mergeComputation(newPixels, bufferPixels[currentTask.image], currentTask.starty, currentTask.endy)
completedTracker[currentTask.image].mux.Lock()
completedTracker[currentTask.image].completed += 1
if completedTracker[currentTask.image].completed == 3 {
imagePixels[currentTask.image].mux.Lock()
imagePixels[currentTask.image].pixels = duplicate(bufferPixels[currentTask.image].pixels)
imagePixels[currentTask.image].mux.Unlock()
completedTracker[currentTask.image].completed = 0
nextEffects[currentTask.image].mux.Lock()
nextEffects[currentTask.image].index += 1
nextEffects[currentTask.image].mux.Unlock()
}
completedTracker[currentTask.image].mux.Unlock()
}
func mergeComputation(computedPixels [][]Pixel, storing *imagePixelContainer, starty int, endy int) {
storing.mux.Lock()
for y := 0; y < len(computedPixels); y++ {
for x := 0; x < len(computedPixels[y]); x++ {
storing.pixels[starty][x] = computedPixels[y][x]
}
starty += 1
}
storing.mux.Unlock()
}
// Worker threads that get spawned
func worker(que *Queue, nextEffects map[string]*nextEffect, effectorder map[string][]string, imagePixels map[string]*imagePixelContainer,
completedTracker map[string]*partsCompleted, bufferPixels map[string]*imagePixelContainer, printlock *sync.Mutex, counter *int, threadid int) {
for !(que.Empty()) {
ctask := que.Pop()
if ctask.effect != "" { // Indicates that the task queue is empty at this moment
nextEffects[ctask.image].mux.Lock()
cond := ctask.effect != effectorder[ctask.image][nextEffects[ctask.image].index]
nextEffects[ctask.image].mux.Unlock()
if cond {
que.Push(ctask)
} else {
executeTask(ctask, false, imagePixels, completedTracker, bufferPixels, nextEffects, printlock, threadid)
}
}
}
*counter -= 1
}
The queue's Push and Pop are thread safe. Each map object has a string key to a pointer struct that holds the data variable and a mutex variable to protect that data. Thanks for the help :)
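One detail in the code above that may be worth checking: every worker runs `*counter -= 1` without holding a lock, which is a data race if `counter` is shared between goroutines. The following is a minimal sketch of waiting for workers with sync.WaitGroup instead, assuming the counter only exists so that the spawning code knows when all workers are done. The `runWorkers` helper and the exact fields of `partsCompleted` are assumptions for illustration, not the original definitions, and this is not claimed to be the cause of the hang.

```go
package main

import (
	"fmt"
	"sync"
)

// partsCompleted mirrors the shape described in the question: a value plus
// the mutex that protects it. The exact fields are an assumption.
type partsCompleted struct {
	mux       sync.Mutex
	completed int
}

// increment bumps the counter while holding the lock and returns the new value.
func (p *partsCompleted) increment() int {
	p.mux.Lock()
	defer p.mux.Unlock() // released on every return path
	p.completed++
	return p.completed
}

// runWorkers (hypothetical helper) starts n goroutines and blocks until all
// of them have returned, replacing the shared "*counter -= 1" bookkeeping.
func runWorkers(n int, work func(id int)) {
	var wg sync.WaitGroup
	for id := 0; id < n; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			work(id)
		}(id)
	}
	wg.Wait()
}

func main() {
	tracker := &partsCompleted{}
	runWorkers(4, func(id int) { tracker.increment() })
	fmt.Println("completed parts:", tracker.completed)
}
```

Using `defer` for the unlock keeps the lock from being held forever if a later change adds an early return between Lock and Unlock.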


Why is the following a memory leak? [duplicate]

I've got code that looks like this:
for (std::list<item*>::iterator i=items.begin();i!=items.end();i++)
{
bool isActive = (*i)->update();
//if (!isActive)
// items.remove(*i);
//else
other_code_involving(*i);
}
items.remove_if(CheckItemNotActive);
I'd like to remove inactive items immediately after updating them, in order to avoid walking the list again. But if I add the commented-out lines, I get an error when I get to i++: "List iterator not incrementable". I tried some alternatives which didn't increment in the for statement, but I couldn't get anything to work.
What's the best way to remove items as you are walking a std::list?
You have to increment the iterator first (with i++) and then remove the previous element (e.g., by using the returned value from i++). You can change the code to a while loop like so:
std::list<item*>::iterator i = items.begin();
while (i != items.end())
{
bool isActive = (*i)->update();
if (!isActive)
{
items.erase(i++); // alternatively, i = items.erase(i);
}
else
{
other_code_involving(*i);
++i;
}
}
You want to do:
i= items.erase(i);
That will correctly update the iterator to point to the location after the iterator you removed.
You need to do the combination of Kristo's answer and MSN's:
// Note: Using the pre-increment operator is preferred for iterators because
// there can be a performance gain.
//
// Note: As long as you are iterating from beginning to end, without inserting
// along the way you can safely save end once; otherwise get it at the
// top of each loop.
std::list< item * >::iterator iter = items.begin();
std::list< item * >::iterator end = items.end();
while (iter != end)
{
item * pItem = *iter;
if (pItem->update() == true)
{
other_code_involving(pItem);
++iter;
}
else
{
// BTW, who is deleting pItem, a.k.a. (*iter)?
iter = items.erase(iter);
}
}
Of course, the most efficient and SuperCool® STL-savvy thing would be something like this:
// This implementation of update executes other_code_involving(Item *) if
// this instance needs updating.
//
// This method returns true if this still needs future updates.
//
bool Item::update(void)
{
if (m_needsUpdates == true)
{
m_needsUpdates = other_code_involving(this);
}
return (m_needsUpdates);
}
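// This call does everything the previous loop did!!! (Including the fact
// that it isn't deleting the items that are erased!)
// Note: std::mem_fun was removed in C++17 and std::not1 in C++20; with a
// newer compiler, a lambda such as [](Item *p) { return !p->update(); }
// does the same job.
items.remove_if(std::not1(std::mem_fun(&Item::update)));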
To sum up, here are the available methods, each with an example:
1. Using a while loop:
list<int> lst{4, 1, 2, 3, 5};
auto it = lst.begin();
while (it != lst.end()){
if((*it % 2) == 1){
it = lst.erase(it);// erase and go to next
} else{
++it; // go to next
}
}
for(auto it:lst)cout<<it<<" ";
cout<<endl; //4 2
2. Using the remove_if member function of list:
list<int> lst{4, 1, 2, 3, 5};
lst.remove_if([](int a){return a % 2 == 1;});
for(auto it:lst)cout<<it<<" ";
cout<<endl; //4 2
3. Using the std::remove_if function combined with the erase member function:
list<int> lst{4, 1, 2, 3, 5};
lst.erase(std::remove_if(lst.begin(), lst.end(), [](int a){
return a % 2 == 1;
}), lst.end());
for(auto it:lst)cout<<it<<" ";
cout<<endl; //4 2
4. Using a for loop; take care to update the iterator:
list<int> lst{4, 1, 2, 3, 5};
for(auto it = lst.begin(); it != lst.end();++it){
if ((*it % 2) == 1){
it = lst.erase(it); // erase and go to next (erase returns the next iterator)
--it; // the for statement increments it again, so we go back one step
// caution: stepping back is undefined behaviour if the erased element was the first one
}
}
for(auto it:lst)cout<<it<<" ";
cout<<endl; //4 2
Use the std::remove_if algorithm.
Edit:
Work with collections should go like this:
1. Prepare the collection.
2. Process the collection.
Life will be easier if you don't mix these steps:
std::remove_if, or list::remove_if (if you know that you are working with a list and not with a generic TCollection)
std::for_each
The alternative for loop version to Kristo's answer.
You lose some efficiency, because you go backwards and then forwards again when deleting, but in exchange for the extra iterator increment you can declare the iterator in the loop scope and the code looks a bit cleaner. Which to choose depends on the priorities of the moment. Note that decrementing the iterator is undefined behaviour when the erased element is the first one in the list.
This answer is very late, I know...
typedef std::list<item*>::iterator item_iterator;
for(item_iterator i = items.begin(); i != items.end(); ++i)
{
bool isActive = (*i)->update();
if (!isActive)
{
items.erase(i--);
}
else
{
other_code_involving(*i);
}
}
Here's an example using a for loop that iterates the list and increments or revalidates the iterator in the event of an item being removed during traversal of the list.
for(auto i = items.begin(); i != items.end();)
{
if(bool isActive = (*i)->update())
{
other_code_involving(*i);
++i;
}
else
{
i = items.erase(i);
}
}
items.remove_if(CheckItemNotActive);
Removal invalidates only the iterators that point to the removed elements.
So in this case, after removing *i, i is invalidated and you cannot increment it.
What you can do is first save the iterator to the element that is to be removed, then increment the iterator, and then remove the saved one.
If you think of the std::list like a queue, then you can dequeue and enqueue all the items that you want to keep, but only dequeue (and not enqueue) the item you want to remove. Here's an example where I want to remove 5 from a list containing the numbers 1-10...
std::list<int> myList;
for (int n = 1; n <= 10; ++n)
{
    myList.push_back(n); // fill the list with the numbers 1-10
}
int size = myList.size(); // The size needs to be saved to iterate through the whole thing
for (int i = 0; i < size; ++i)
{
    int val = myList.back();
    myList.pop_back(); // dequeue
    if (val != 5)
    {
        myList.push_front(val); // enqueue if not 5
    }
}
myList will now only have numbers 1-4 and 6-10.
Iterating backwards avoids the effect of erasing an element on the remaining elements to be traversed:
typedef list<item*> list_t;
for ( list_t::iterator it = items.end() ; it != items.begin() ; ) {
--it;
bool remove = <determine whether to remove>
if ( remove ) {
it = items.erase( it ); // erase invalidates it, so continue from the iterator it returns
}
}
PS: see this, e.g., regarding backward iteration.
PS2: I did not thoroughly test whether it handles erasing elements at the ends well.
You can write
std::list<item*>::iterator i = items.begin();
while (i != items.end())
{
bool isActive = (*i)->update();
if (!isActive) {
i = items.erase(i);
} else {
other_code_involving(*i);
i++;
}
}
You can write equivalent code with std::list::remove_if, which is less verbose and more explicit:
items.remove_if([] (item *i) {
    bool isActive = i->update(); // i is already an item*, not an iterator
    if (!isActive)
        return true;
    other_code_involving(i);
    return false;
});
The std::vector::erase / std::remove_if idiom should be used when items is a vector instead of a list, to keep complexity at O(n), or when you write generic code and items might be a container with no efficient way to erase single items (like a vector):
items.erase(std::remove_if(begin(items), end(items), [] (item *i) {
    bool isActive = i->update();
    if (!isActive)
        return true;
    other_code_involving(i);
    return false;
}), end(items)); // the second argument is required to erase the whole tail
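Use a while loop; it's flexible and easy to read and write. Note that this version restarts from begin() after every erase, so it rescans elements it has already checked; continuing from the iterator returned by erase avoids that.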
auto textRegion = m_pdfTextRegions.begin();
while(textRegion != m_pdfTextRegions.end())
{
if ((*textRegion)->glyphs.empty())
{
m_pdfTextRegions.erase(textRegion);
textRegion = m_pdfTextRegions.begin();
}
else
textRegion++;
}
I'd like to share my method. This method also allows the insertion of the element to the back of the list during iteration
#include <iostream>
#include <list>
int main(int argc, char **argv) {
std::list<int> d;
for (int i = 0; i < 12; ++i) {
d.push_back(i);
}
auto it = d.begin();
int nelem = d.size(); // number of current elements
for (int ielem = 0; ielem < nelem; ++ielem) {
auto &i = *it;
if (i % 2 == 0) {
it = d.erase(it);
} else {
if (i % 3 == 0) {
d.push_back(3*i);
}
++it;
}
}
for (auto i : d) {
std::cout << i << ", ";
}
std::cout << std::endl;
// result should be: 1, 3, 5, 7, 9, 11, 9, 27,
return 0;
}
I think you have a bug there, I code this way:
for (std::list<CAudioChannel *>::iterator itAudioChannel = audioChannels.begin();
itAudioChannel != audioChannels.end(); )
{
CAudioChannel *audioChannel = *itAudioChannel;
std::list<CAudioChannel *>::iterator itCurrentAudioChannel = itAudioChannel;
itAudioChannel++;
if (audioChannel->destroyMe)
{
audioChannels.erase(itCurrentAudioChannel);
delete audioChannel;
continue;
}
audioChannel->Mix(outBuffer, numSamples);
}

Call functions with special prefix/suffix

I have a package named "seeder":
package seeder
import "fmt"
func MyFunc1() {
fmt.Println("I am Masood")
}
func MyFunc2() {
fmt.Println("I am a programmer")
}
func MyFunc3() {
fmt.Println("I want to buy a car")
}
Now I want to call all functions with MyFunc prefix
package main
import "./seeder"
func main() {
for k := 1; k <= 3; k++ {
seeder.MyFunc1() // This calls MyFunc1 three times
}
}
I want something like this:
for k := 1; k <= 3; k++ {
seeder.MyFunc + k ()
}
and this output:
I am Masood
I am a programmer
I want to buy a car
EDIT1:
In this example, parentKey is a string variable which changed in a loop
for parentKey, _ := range uRLSjson{
pppp := seeder + "." + strings.ToUpper(parentKey)
gorilla.HandleFunc("/", pppp).Name(parentKey)
}
But GC said:
use of package seeder without selector
You can't get a function by its name, and that is what you're trying to do. The reason is that if the Go tool can detect that a function is not referred to explicitly (and thus unreachable), it may not even get compiled into the executable binary. For details see Splitting client/server code.
With a function registry
One way to do what you want is to build a "function registry" prior to calling them:
registry := map[string]func(){
"MyFunc1": MyFunc1,
"MyFunc2": MyFunc2,
"MyFunc3": MyFunc3,
}
for k := 1; k <= 3; k++ {
registry[fmt.Sprintf("MyFunc%d", k)]()
}
Output (try it on the Go Playground):
Hello MyFunc1
Hello MyFunc2
Hello MyFunc3
Manual "routing"
Similar to the registry is inspecting the name and manually routing to the function, for example:
func callByName(name string) {
switch name {
case "MyFunc1":
MyFunc1()
case "MyFunc2":
MyFunc2()
case "MyFunc3":
MyFunc3()
default:
panic("Unknown function name")
}
}
Using it:
for k := 1; k <= 3; k++ {
callByName(fmt.Sprintf("MyFunc%d", k))
}
Try this on the Go Playground.
Note: It's up to you if you want to call the function identified by its name in the callByName() helper function, or you may choose to return a function value (of type func()) and have it called in the caller's place.
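The second option in the note above, returning a function value instead of calling it inside the helper, could look roughly like this. It is only a sketch: `funcByName` is a hypothetical name, and only two functions are registered so that the "not found" path is visible.

```go
package main

import "fmt"

func MyFunc1() { fmt.Println("Hello MyFunc1") }
func MyFunc2() { fmt.Println("Hello MyFunc2") }

// funcByName (hypothetical helper) returns the function known under name,
// and false when there is no such function.
func funcByName(name string) (func(), bool) {
	switch name {
	case "MyFunc1":
		return MyFunc1, true
	case "MyFunc2":
		return MyFunc2, true
	}
	return nil, false
}

func main() {
	for k := 1; k <= 3; k++ {
		name := fmt.Sprintf("MyFunc%d", k)
		if fn, ok := funcByName(name); ok {
			fn() // the caller decides when, and whether, to invoke it
		} else {
			fmt.Println("no function named", name)
		}
	}
}
```

Returning the boolean lets the caller handle an unknown name without a panic.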
Transforming functions to methods
Also note that if your functions would actually be methods of some type, you could do it without a registry. Using reflection, you can get a method by name: Value.MethodByName(). You can also get / enumerate all methods without knowing their names using Value.NumMethod() and Value.Method() (also see Type.NumMethod() and Type.Method() if you need the name of the method or its parameter types).
This is how it could be done:
type MyType int
func (m MyType) MyFunc1() {
fmt.Println("Hello MyFunc1")
}
func (m MyType) MyFunc2() {
fmt.Println("Hello MyFunc2")
}
func (m MyType) MyFunc3() {
fmt.Println("Hello MyFunc3")
}
func main() {
v := reflect.ValueOf(MyType(0))
for k := 1; k <= 3; k++ {
v.MethodByName(fmt.Sprintf("MyFunc%d", k)).Call(nil)
}
}
Output is the same. Try it on the Go Playground.
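Since the question asks for all functions with a given prefix, the enumeration route mentioned above (Type.NumMethod() and Type.Method()) may be closer to what is wanted. A minimal sketch, assuming the functions have been turned into exported methods as shown; `methodsWithPrefix` and the extra `Other` method are made up for the illustration:

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

type MyType int

func (m MyType) MyFunc1() { fmt.Println("Hello MyFunc1") }
func (m MyType) MyFunc2() { fmt.Println("Hello MyFunc2") }
func (m MyType) Other()   { fmt.Println("not called") }

// methodsWithPrefix (hypothetical helper) returns the names of the exported
// methods of v that start with prefix, in the order reflection reports them
// (sorted by name).
func methodsWithPrefix(v interface{}, prefix string) []string {
	var names []string
	t := reflect.TypeOf(v)
	for i := 0; i < t.NumMethod(); i++ {
		if name := t.Method(i).Name; strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	v := MyType(0)
	for _, name := range methodsWithPrefix(v, "MyFunc") {
		reflect.ValueOf(v).MethodByName(name).Call(nil)
	}
}
```

Only exported methods are visible this way, so lower-case method names would not be found.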
Another alternative would be to range over a slice of your functions:
package main
import (
"fmt"
)
func MyFunc1() {
fmt.Println("I am Masood")
}
func MyFunc2() {
fmt.Println("I am a programmer")
}
func MyFunc3() {
fmt.Println("I want to buy a car")
}
func main() {
for _, fn := range []func(){MyFunc1, MyFunc2, MyFunc3} {
fn()
}
}

structure with nested maps golang

Hi, I'm new to Go and was trying to figure out how maps work.
I have made up a little test program and can't seem to get it to work.
What am I doing wrong?
package main
import (
"fmt"
)
type Stats struct {
cnt int
category map[string]Events
}
type Events struct {
cnt int
event map[string]Event
}
type Event struct {
value int64
}
func main() {
stats := new(Stats)
stats.cnt = 33
stats.category["aa"].cnt = 66
stats.category["aa"].event["bb"].value = 99
fmt.Println(stats.cnt, stats.category["aa"].event["bb"].value)
}
There are a couple of issues with the code:
1. Maps need to be initialized using the make function. Currently they are nil.
2. A value returned from a map is non-addressable, because if the map grows its contents may be relocated, which would change their memory addresses. Hence we need to extract the value from the map into a variable explicitly, update it, and assign it back.
3. Alternatively, use pointers.
I have updated the solution to show both approaches: updating the returned value and assigning it back, and using a pointer.
http://play.golang.org/p/lv50AONXyU
package main
import (
"fmt"
)
type Stats struct {
cnt int
category map[string]Events
}
type Events struct {
cnt int
event map[string]*Event
}
type Event struct {
value int64
}
func main() {
stats := new(Stats)
stats.cnt = 33
stats.category = make(map[string]Events)
e, f := stats.category["aa"]
if !f {
e = Events{}
}
e.cnt = 66
e.event = make(map[string]*Event)
stats.category["aa"] = e
stats.category["aa"].event["bb"] = &Event{}
stats.category["aa"].event["bb"].value = 99
fmt.Println(stats)
fmt.Println(stats.cnt, stats.category["aa"].event["bb"].value)
}
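The first point, that a nil map cannot be written to, can be seen in isolation in a short sketch. `writeToMap` is a hypothetical helper that recovers from the panic only so that the program can report what happened:

```go
package main

import "fmt"

// writeToMap (hypothetical helper) stores key in m and reports whether the
// write succeeded. Writing to a nil map panics, which is recovered here.
func writeToMap(m map[string]int, key string) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
		}
	}()
	m[key] = 1
	return true
}

func main() {
	var uninitialised map[string]int // nil map: reads work, writes panic

	fmt.Println(uninitialised["aa"])                    // 0, reading is fine
	fmt.Println(writeToMap(uninitialised, "aa"))        // false
	fmt.Println(writeToMap(make(map[string]int), "aa")) // true
}
```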
Adding this as a different approach to the problem:
type Stats struct {
cnt int
categories map[string]*Events
}
func (s *Stats) Category(n string) (e *Events) {
if s.categories == nil {
s.categories = map[string]*Events{}
}
if e = s.categories[n]; e == nil {
e = &Events{}
s.categories[n] = e
}
return
}
type Events struct {
cnt int
events map[string]*Event
}
func (e *Events) Event(n string) (ev *Event) {
if e.events == nil {
e.events = map[string]*Event{}
}
if ev = e.events[n]; ev == nil {
ev = &Event{}
e.events[n] = ev
}
return
}
type Event struct {
value int64
}
func main() {
var stats Stats
stats.cnt = 33
stats.Category("aa").cnt = 66
stats.Category("aa").Event("bb").value = 99
fmt.Println(stats)
fmt.Println(stats.cnt, stats.Category("aa").Event("bb").value)
}
playground
There are a few issues with your approach.
You aren't initializing your maps. You need to create them first.
Maps return copies of their values. So when you pull out "aa" and modify it, you are getting a copy of "aa", changing it, then throwing it away. You need to put it back in the map, or use pointers.
Here's a working example (non-pointer version) on Play.
Notice the construction of the maps, and the re-assignment back to the map when modifying a value.
package main
import (
"fmt"
)
type Stats struct {
cnt int
category map[string]Events
}
type Events struct {
cnt int
event map[string]Event
}
type Event struct {
value int64
}
func main() {
stats := &Stats{category: map[string]Events{}}
stats.cnt = 33
tmpCat, ok := stats.category["aa"]
if !ok {
tmpCat = Events{event: map[string]Event{}}
}
tmpCat.cnt = 66
tmpEv := tmpCat.event["bb"]
tmpEv.value = 99
tmpCat.event["bb"] = tmpEv
stats.category["aa"] = tmpCat
fmt.Println(stats.cnt, stats.category["aa"].event["bb"].value)
}
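The copy behaviour described in this answer can also be shown on its own. This is a small sketch with made-up helper names, contrasting a map of struct values with a map of pointers:

```go
package main

import "fmt"

type Event struct{ value int64 }

// setOnCopy changes only a copy of the stored struct; the map is untouched.
func setOnCopy(m map[string]Event, key string, v int64) Event {
	ev := m[key]
	ev.value = v
	return ev
}

// setAndStore writes the modified copy back, which does update the map.
func setAndStore(m map[string]Event, key string, v int64) {
	m[key] = setOnCopy(m, key, v)
}

// setViaPointer modifies the struct that the map entry points to.
// It assumes the key is present; a missing key would yield a nil pointer.
func setViaPointer(m map[string]*Event, key string, v int64) {
	m[key].value = v
}

func main() {
	byValue := map[string]Event{"bb": {value: 1}}
	setOnCopy(byValue, "bb", 99)
	fmt.Println(byValue["bb"].value) // 1
	setAndStore(byValue, "bb", 99)
	fmt.Println(byValue["bb"].value) // 99

	byPointer := map[string]*Event{"bb": {value: 1}}
	setViaPointer(byPointer, "bb", 99)
	fmt.Println(byPointer["bb"].value) // 99
}
```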

Go: Excessive memory usage, memory leak

I am very, very memory careful as I have to write programs that need to cope with massive datasets.
Currently my application quickly reaches 32GB of memory, starts swapping, and then gets killed by the system.
I do not understand how this can be since all variables are collectable (in functions and quickly released) except TokensStruct and TokensCount in the Trainer struct. TokensCount is just a uint. TokensStruct is a 1,000,000 row slice of [5]uint32 and string, so that means 20 bytes + string, which we could call a maximum of 50 bytes per record. 50*1000000 = 50MB of memory required. So this script should therefore not use much more than 50MB + overhead + temporary collectable variables in the functions (maybe another 50MB max.) The maximum potential size of TokensStruct is 5,000,000, as this is the size of dictionary, but even then it would be only 250MB of memory. dictionary is a map and apparently uses around 600MB of memory, as that is how the app starts, but this is not an issue because dictionary is only loaded once and never written to again.
Instead it uses 32GB of memory then dies. By the speed that it does this I expect it would happily get to 1TB of memory if it could. The memory appears to increase in a linear fashion with the size of the files being loaded, meaning that it appears to never clear any memory at all. Everything that enters the app is allocated more memory and memory is never freed.
I tried implementing runtime.GC() in case the garbage collection wasn't running often enough, but this made no difference.
Since the memory usage increases in a linear fashion then this would imply that there is a memory leak in GetTokens() or LoadZip(). I don't know how this could be, since they are both functions and only do one task and then close. Or it could be that the tokens variable in Start() is the cause of the leak. Basically it looks like every file that is loaded and parsed is never released from memory, as that is the only way that the memory could fill up in a linear fashion and keep on rising up to 32GB++.
Absolute nightmare! What's wrong with Go? Any way to fix this?
package main
import (
"bytes"
"code.google.com/p/go.text/transform"
"code.google.com/p/go.text/unicode/norm"
"compress/zlib"
"encoding/gob"
"fmt"
"github.com/AlasdairF/BinSearch"
"io/ioutil"
"os"
"regexp"
"runtime"
"strings"
"unicode"
"unicode/utf8"
)
type TokensStruct struct {
binsearch.Key_string
Value [][5]uint32
}
type Trainer struct {
Tokens TokensStruct
TokensCount uint
}
func checkErr(err error) {
if err == nil {
return
}
fmt.Println(`Some Error:`, err)
panic(err)
}
// Local helper function for normalization of UTF8 strings.
func isMn(r rune) bool {
return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}
// This map is used by RemoveAccents function to convert non-accented characters.
var transliterations = map[rune]string{'Æ': "E", 'Ð': "D", 'Ł': "L", 'Ø': "OE", 'Þ': "Th", 'ß': "ss", 'æ': "e", 'ð': "d", 'ł': "l", 'ø': "oe", 'þ': "th", 'Œ': "OE", 'œ': "oe"}
// removeAccentsBytes converts accented UTF8 characters into their non-accented equivalents, from a []byte.
func removeAccentsBytesDashes(b []byte) ([]byte, error) {
mnBuf := make([]byte, len(b))
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
n, _, err := t.Transform(mnBuf, b, true)
if err != nil {
return nil, err
}
mnBuf = mnBuf[:n]
tlBuf := bytes.NewBuffer(make([]byte, 0, len(mnBuf)*2))
for i, w := 0, 0; i < len(mnBuf); i += w {
r, width := utf8.DecodeRune(mnBuf[i:])
if r == '-' {
tlBuf.WriteByte(' ')
} else {
if d, ok := transliterations[r]; ok {
tlBuf.WriteString(d)
} else {
tlBuf.WriteRune(r)
}
}
w = width
}
return tlBuf.Bytes(), nil
}
func LoadZip(filename string) ([]byte, error) {
// Open file for reading
fi, err := os.Open(filename)
if err != nil {
return nil, err
}
defer fi.Close()
// Attach ZIP reader
fz, err := zlib.NewReader(fi)
if err != nil {
return nil, err
}
defer fz.Close()
// Pull
data, err := ioutil.ReadAll(fz)
if err != nil {
return nil, err
}
return norm.NFC.Bytes(data), nil // return normalized
}
func getTokens(pibn string) []string {
var data []byte
var err error
data, err = LoadZip(`/storedir/` + pibn + `/text.zip`)
checkErr(err)
data, err = removeAccentsBytesDashes(data)
checkErr(err)
data = bytes.ToLower(data)
data = reg2.ReplaceAll(data, []byte("$2")) // remove contractions
data = reg.ReplaceAllLiteral(data, nil)
tokens := strings.Fields(string(data))
return tokens
}
func (t *Trainer) Start() {
data, err := ioutil.ReadFile(`list.txt`)
checkErr(err)
pibns := bytes.Fields(data)
for i, pibn := range pibns {
tokens := getTokens(string(pibn))
t.addTokens(tokens)
if i%100 == 0 {
runtime.GC() // I added this just to try to stop the memory craziness, but it makes no difference
}
}
}
func (t *Trainer) addTokens(tokens []string) {
for _, tok := range tokens {
if _, ok := dictionary[tok]; ok {
if indx, ok2 := t.Tokens.Find(tok); ok2 {
ar := t.Tokens.Value[indx]
ar[0]++
t.Tokens.Value[indx] = ar
t.TokensCount++
} else {
t.Tokens.AddKeyAt(tok, indx)
t.Tokens.Value = append(t.Tokens.Value, [5]uint32{0, 0, 0, 0, 0})
copy(t.Tokens.Value[indx+1:], t.Tokens.Value[indx:])
t.Tokens.Value[indx] = [5]uint32{1, 0, 0, 0, 0}
t.TokensCount++
}
}
}
return
}
func LoadDictionary() {
dictionary = make(map[string]bool)
data, err := ioutil.ReadFile(`dictionary`)
checkErr(err)
words := bytes.Fields(data)
for _, word := range words {
strword := string(word)
dictionary[strword] = false
}
}
var reg = regexp.MustCompile(`[^a-z0-9\s]`)
var reg2 = regexp.MustCompile(`\b(c|l|all|dall|dell|nell|sull|coll|pell|gl|agl|dagl|degl|negl|sugl|un|m|t|s|v|d|qu|n|j)'([a-z])`) //contractions
var dictionary map[string]bool
func main() {
trainer := new(Trainer)
LoadDictionary()
trainer.Start()
}
Make sure that if you're tokenizing from a large string, to avoid memory pinning. From the comments above, it sounds like the tokens are substrings of a large string.
You may need to add a little extra in your getTokens() function so it guarantees the tokens aren't pinning memory.
func getTokens(...) {
// near the end of the function, before returning tokens
for i, t := range(tokens) {
tokens[i] = string([]byte(t))
}
}
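On Go 1.18 or later (newer than this answer), strings.Clone states the same intent more directly: it returns a fresh copy that does not share memory with the original string. A minimal sketch, where `detachedFields` is a hypothetical helper and not part of the original program:

```go
package main

import (
	"fmt"
	"strings"
)

// detachedFields (hypothetical helper) splits text on whitespace and clones
// every token, so that no token keeps the large input string alive.
func detachedFields(text string) []string {
	tokens := strings.Fields(text)
	for i, tok := range tokens {
		tokens[i] = strings.Clone(tok) // Go 1.18+; allocates a fresh copy
	}
	return tokens
}

func main() {
	fmt.Println(detachedFields("memory pinning  example"))
}
```

Cloning only the tokens that are actually kept (for example, those added as new keys) would avoid copies for tokens that are discarded anyway.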
By the way, reading the whole file into memory using ioutil.ReadFile all at once looks dubious. Are you sure you can't use bufio.Scanner?
I'm looking at the code more closely... if you are truly concerned about memory, take advantage of io.Reader. You should try to avoid sucking in the content of a whole file at once. Use io.Reader and the transform "along the grain". The way you're using it now is against the grain of its intent. The whole point of the transform package you're using is to construct flexible Readers that can stream through data.
For example, here's a simplification of what you're doing:
package main
import (
"bufio"
"bytes"
"fmt"
"unicode/utf8"
"code.google.com/p/go.text/transform"
)
type AccentsTransformer map[rune]string
func (a AccentsTransformer) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
for nSrc < len(src) {
// If we're at the edge, note this and return.
if !atEOF && !utf8.FullRune(src[nSrc:]) {
err = transform.ErrShortSrc
return
}
r, width := utf8.DecodeRune(src[nSrc:])
if r == utf8.RuneError && width == 1 {
err = fmt.Errorf("Decoding error")
return
}
if d, ok := a[r]; ok {
if nDst+len(d) > len(dst) {
err = transform.ErrShortDst
return
}
copy(dst[nDst:], d)
nSrc += width
nDst += len(d)
continue
}
if nDst+width > len(dst) {
err = transform.ErrShortDst
return
}
copy(dst[nDst:], src[nSrc:nSrc+width])
nDst += width
nSrc += width
}
return
}
func main() {
transliterations := AccentsTransformer{'Æ': "E", 'Ø': "OE"}
testString := "cØØl beÆns"
b := transform.NewReader(bytes.NewBufferString(testString), transliterations)
scanner := bufio.NewScanner(b)
scanner.Split(bufio.ScanWords)
for scanner.Scan() {
fmt.Println("token:", scanner.Text())
}
}
It becomes really easy then to chain transformers together. So, for example, if we wanted to remove all hyphens from the input stream, it's just a matter of using transform.Chain appropriately:
func main() {
transliterations := AccentsTransformer{'Æ': "E", 'Ø': "OE"}
removeHyphens := transform.RemoveFunc(func(r rune) bool {
return r == '-'
})
allTransforms := transform.Chain(transliterations, removeHyphens)
testString := "cØØl beÆns - the next generation"
b := transform.NewReader(bytes.NewBufferString(testString), allTransforms)
scanner := bufio.NewScanner(b)
scanner.Split(bufio.ScanWords)
for scanner.Scan() {
fmt.Println("token:", scanner.Text())
}
}
I have not exhaustively tested the code above, so please don't just copy-and-paste it without sufficient tests. :P I just cooked it up fast. But this kind of approach --- avoiding whole-file reading --- will scale better because it will read the file in chunks.
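Applied to the compressed files in the question, the same idea would mean putting the scanner directly on top of the zlib reader instead of calling ioutil.ReadAll. The sketch below only counts tokens and uses in-memory data; `countTokens` and `compress` are hypothetical helpers, and in a real program an *os.File would be passed as the reader.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// countTokens (hypothetical helper) reads zlib-compressed text from r and
// counts whitespace-separated tokens without loading the whole input.
func countTokens(r io.Reader) (int, error) {
	zr, err := zlib.NewReader(r)
	if err != nil {
		return 0, err
	}
	defer zr.Close()

	scanner := bufio.NewScanner(zr)
	scanner.Split(bufio.ScanWords)
	n := 0
	for scanner.Scan() {
		n++ // scanner.Text() would give the current token
	}
	return n, scanner.Err()
}

// compress only exists to build in-memory input for the example.
func compress(s string) *bytes.Buffer {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	zw.Write([]byte(s))
	zw.Close()
	return &buf
}

func main() {
	n, err := countTokens(compress("the next generation of cool beans"))
	fmt.Println(n, err)
}
```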
1. How large are "list.txt" and "dictionary"? If they are very large, it is no wonder that memory usage is high.
pibns := bytes.Fields(data)
How large is len(pibns)?
2. Turn on GC tracing (run GODEBUG=gctrace=1 ./yourprogram) to see whether any garbage collection is happening.
3. Record some profiles, like this:
// Needs the imports fmt, log, os, runtime, runtime/pprof and time.
func lookupMem() {
	timestamp := fmt.Sprint(time.Now().Unix())
	if f, err := os.Create("mem_prof." + timestamp); err != nil {
		log.Printf("record memory profile failed: %v", err)
	} else {
		runtime.GC() // collect first so the profile reflects live objects
		pprof.WriteHeapProfile(f)
		f.Close()
	}
	if f, err := os.Create("heap_prof." + timestamp); err != nil {
		log.Printf("heap profile failed: %v", err)
	} else {
		pprof.Lookup("heap").WriteTo(f, 2)
		f.Close()
	}
}
func (t *Trainer) Start() {
.......
if i%1000==0 {
//if `len(pibns)` is not very large , record some meminfo
lookupMem()
}
.......
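A lighter-weight check than writing profile files is to print a few numbers from runtime.ReadMemStats at the same point in the loop. This is only a sketch; `heapStats` is a hypothetical helper, and the 64 MB allocation merely stands in for the data loaded per file.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapStats (hypothetical helper) returns the bytes of allocated heap
// objects and the number of completed GC cycles.
func heapStats() (heapAlloc uint64, numGC uint32) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.NumGC
}

func main() {
	before, _ := heapStats()
	data := make([]byte, 64<<20) // stand-in for one loaded file
	data[0] = 1
	after, gcs := heapStats()
	fmt.Printf("heap before: %d bytes, after: %d bytes, GC cycles: %d\n", before, after, gcs)
	runtime.KeepAlive(data) // keep the allocation live until here
}
```

If the reported heap keeps growing from file to file even right after a runtime.GC() call, something is still holding references to the old data.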

How to find out element position in slice?

How does one determine the position of an element present in slice?
I need something like the following:
type intSlice []int
func (slice intSlice) pos(value int) int {
for p, v := range slice {
if (v == value) {
return p
}
}
return -1
}
Sorry, there's no generic library function to do this. Go doesn't have a straightforward way of writing a function that can operate on any slice.
Your function works, although it would be a little better if you wrote it using range.
If you happen to have a byte slice, there is bytes.IndexByte.
You can create a generic function in an idiomatic Go way:
func SliceIndex(limit int, predicate func(i int) bool) int {
for i := 0; i < limit; i++ {
if predicate(i) {
return i
}
}
return -1
}
And usage:
xs := []int{2, 4, 6, 8}
ys := []string{"C", "B", "K", "A"}
fmt.Println(
SliceIndex(len(xs), func(i int) bool { return xs[i] == 5 }),
SliceIndex(len(xs), func(i int) bool { return xs[i] == 6 }),
SliceIndex(len(ys), func(i int) bool { return ys[i] == "Z" }),
SliceIndex(len(ys), func(i int) bool { return ys[i] == "A" }))
You could write a function;
func indexOf(element string, data []string) (int) {
for k, v := range data {
if element == v {
return k
}
}
return -1 //not found.
}
This returns the index of the string if it matches an element. If it's not found, it returns -1.
There is no library function for that. You have to write your own.
Go supports generics as of version 1.18, which allows you to create a function like yours as follows:
func IndexOf[T comparable](collection []T, el T) int {
for i, x := range collection {
if x == el {
return i
}
}
return -1
}
If you want to be able to call IndexOf on your collection you can alternatively use @mh-cbon's technique from the comments.
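One possible way to call IndexOf on the collection itself is to declare a generic slice type and attach the method to it. The comment being referred to is not shown here, so this is a sketch of one such approach, not necessarily that technique; `Slice` is a made-up name.

```go
package main

import "fmt"

// Slice is a hypothetical generic slice type that carries an IndexOf method.
type Slice[T comparable] []T

// IndexOf returns the position of the first element equal to el, or -1.
func (s Slice[T]) IndexOf(el T) int {
	for i, x := range s {
		if x == el {
			return i
		}
	}
	return -1
}

func main() {
	xs := Slice[int]{2, 4, 6, 8}
	fmt.Println(xs.IndexOf(6))                        // 2
	fmt.Println(Slice[string]{"C", "B"}.IndexOf("Z")) // -1
}
```

An existing []int can be converted with Slice[int](values) without copying the elements.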
You can just iterate over the slice and check whether an element matches your element of choice.
func index(slice []string, item string) int {
for i := range slice {
if slice[i] == item {
return i
}
}
return -1
}
Since Go 1.18 you can also use the experimental generic slices package from https://pkg.go.dev/golang.org/x/exp/slices like this:
package main
import (
"fmt"
"golang.org/x/exp/slices"
)
func main() {
s := []int{1,2,3,4,5}
wanted := 3
idx := slices.Index(s, wanted)
fmt.Printf("the index of %v is %v", wanted, idx)
}
It will return -1, if wanted is not in the slice. Test it at the playground.
This is my preferred way, since this might become part of the standard library someday.
Another option is to sort the slice using the sort package, then search for the thing you are looking for:
package main
import (
"sort"
"log"
)
var ints = [...]int{74, 59, 238, -784, 9845, 959, 905, 0, 0, 42, 7586, -5467984, 7586}
func main() {
data := ints
a := sort.IntSlice(data[0:])
sort.Sort(a)
pos := sort.SearchInts(a, -784)
log.Println("Sorted: ", a)
log.Println("Found at index ", pos)
}
prints
2009/11/10 23:00:00 Sorted: [-5467984 -784 0 0 42 59 74 238 905 959 7586 7586 9845]
2009/11/10 23:00:00 Found at index 1
This works for the basic types and you can always implement the sort interface for your own type if you need to work on a slice of other things. See http://golang.org/pkg/sort
Depends on what you are doing though.
I had the same issue a few months ago and I solved it in two ways:
First method:
func Find(slice interface{}, f func(value interface{}) bool) int {
s := reflect.ValueOf(slice)
if s.Kind() == reflect.Slice {
for index := 0; index < s.Len(); index++ {
if f(s.Index(index).Interface()) {
return index
}
}
}
return -1
}
Use example:
type UserInfo struct {
UserId int
}
func main() {
var (
destinationList []UserInfo
userId int = 123
)
destinationList = append(destinationList, UserInfo {
UserId : 23,
})
destinationList = append(destinationList, UserInfo {
UserId : 12,
})
idx := Find(destinationList, func(value interface{}) bool {
return value.(UserInfo).UserId == userId
})
if idx < 0 {
fmt.Println("not found")
} else {
fmt.Println(idx)
}
}
Second method with less computational cost:
func Search(length int, f func(index int) bool) int {
for index := 0; index < length; index++ {
if f(index) {
return index
}
}
return -1
}
Use example:
type UserInfo struct {
UserId int
}
func main() {
var (
destinationList []UserInfo
userId int = 123
)
destinationList = append(destinationList, UserInfo {
UserId : 23,
})
destinationList = append(destinationList, UserInfo {
UserId : 123,
})
idx := Search(len(destinationList), func(index int) bool {
return destinationList[index].UserId == userId
})
if idx < 0 {
fmt.Println("not found")
} else {
fmt.Println(idx)
}
}
Another option if your slice is sorted is to use SearchInts(a []int, x int) int which returns the element index if it's found or the index the element should be inserted at in case it is not present.
s := []int{3,2,1}
sort.Ints(s)
fmt.Println(sort.SearchInts(s, 1)) // 0
fmt.Println(sort.SearchInts(s, 4)) // 3
https://play.golang.org/p/OZhX_ymXstF
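Because SearchInts returns an insertion position when the value is missing, the result has to be checked before it is treated as "found". A minimal sketch with a hypothetical `indexInSorted` helper:

```go
package main

import (
	"fmt"
	"sort"
)

// indexInSorted (hypothetical helper) returns the position of x in the
// sorted slice s, or -1 when x is absent.
func indexInSorted(s []int, x int) int {
	i := sort.SearchInts(s, x)
	if i < len(s) && s[i] == x {
		return i
	}
	return -1
}

func main() {
	s := []int{3, 2, 1}
	sort.Ints(s)
	fmt.Println(indexInSorted(s, 1)) // 0
	fmt.Println(indexInSorted(s, 4)) // -1
	fmt.Println(indexInSorted(s, 0)) // -1, although SearchInts returns 0 here
}
```

Keep in mind that the index refers to the sorted slice, not to the element's position before sorting.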
