CRC32 Checksum Calculation via Go - linux

I'm trying to write a Go function that produces the same result as the Ubuntu Linux "cksum" command, for example:
$ echo 123 > /tmp/foo
$ cksum /tmp/foo
2330645186 4 /tmp/foo
Could someone please provide a Go function that produces the first field of the above result ("2330645186")? Thank you.

(Update)
It turns out cksum doesn't quite implement the standard CRC32. It uses the same generator polynomial, but it processes bits most-significant first, appends the input length to the data, and complements the result (this is the POSIX cksum algorithm). To test standard CRC32 (the value you'd normally find listed as a CRC32 checksum) you can use the CRC calculator at http://zorc.breitbandkatze.de/ - Go's hash/crc32.ChecksumIEEE implementation matches it.
To implement the cksum CRC process (also known as POSIX cksum) I instead wrote a Go version of the C algorithm found on a cksum man page (which uses a lookup table):
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

var tbl = [256]uint32{0x00000000, 0x04C11DB7, 0x09823B6E, 0x0D4326D9,
0x130476DC, 0x17C56B6B, 0x1A864DB2, 0x1E475005,
0x2608EDB8, 0x22C9F00F, 0x2F8AD6D6, 0x2B4BCB61,
0x350C9B64, 0x31CD86D3, 0x3C8EA00A, 0x384FBDBD,
0x4C11DB70, 0x48D0C6C7, 0x4593E01E, 0x4152FDA9,
0x5F15ADAC, 0x5BD4B01B, 0x569796C2, 0x52568B75,
0x6A1936C8, 0x6ED82B7F, 0x639B0DA6, 0x675A1011,
0x791D4014, 0x7DDC5DA3, 0x709F7B7A, 0x745E66CD,
0x9823B6E0, 0x9CE2AB57, 0x91A18D8E, 0x95609039,
0x8B27C03C, 0x8FE6DD8B, 0x82A5FB52, 0x8664E6E5,
0xBE2B5B58, 0xBAEA46EF, 0xB7A96036, 0xB3687D81,
0xAD2F2D84, 0xA9EE3033, 0xA4AD16EA, 0xA06C0B5D,
0xD4326D90, 0xD0F37027, 0xDDB056FE, 0xD9714B49,
0xC7361B4C, 0xC3F706FB, 0xCEB42022, 0xCA753D95,
0xF23A8028, 0xF6FB9D9F, 0xFBB8BB46, 0xFF79A6F1,
0xE13EF6F4, 0xE5FFEB43, 0xE8BCCD9A, 0xEC7DD02D,
0x34867077, 0x30476DC0, 0x3D044B19, 0x39C556AE,
0x278206AB, 0x23431B1C, 0x2E003DC5, 0x2AC12072,
0x128E9DCF, 0x164F8078, 0x1B0CA6A1, 0x1FCDBB16,
0x018AEB13, 0x054BF6A4, 0x0808D07D, 0x0CC9CDCA,
0x7897AB07, 0x7C56B6B0, 0x71159069, 0x75D48DDE,
0x6B93DDDB, 0x6F52C06C, 0x6211E6B5, 0x66D0FB02,
0x5E9F46BF, 0x5A5E5B08, 0x571D7DD1, 0x53DC6066,
0x4D9B3063, 0x495A2DD4, 0x44190B0D, 0x40D816BA,
0xACA5C697, 0xA864DB20, 0xA527FDF9, 0xA1E6E04E,
0xBFA1B04B, 0xBB60ADFC, 0xB6238B25, 0xB2E29692,
0x8AAD2B2F, 0x8E6C3698, 0x832F1041, 0x87EE0DF6,
0x99A95DF3, 0x9D684044, 0x902B669D, 0x94EA7B2A,
0xE0B41DE7, 0xE4750050, 0xE9362689, 0xEDF73B3E,
0xF3B06B3B, 0xF771768C, 0xFA325055, 0xFEF34DE2,
0xC6BCF05F, 0xC27DEDE8, 0xCF3ECB31, 0xCBFFD686,
0xD5B88683, 0xD1799B34, 0xDC3ABDED, 0xD8FBA05A,
0x690CE0EE, 0x6DCDFD59, 0x608EDB80, 0x644FC637,
0x7A089632, 0x7EC98B85, 0x738AAD5C, 0x774BB0EB,
0x4F040D56, 0x4BC510E1, 0x46863638, 0x42472B8F,
0x5C007B8A, 0x58C1663D, 0x558240E4, 0x51435D53,
0x251D3B9E, 0x21DC2629, 0x2C9F00F0, 0x285E1D47,
0x36194D42, 0x32D850F5, 0x3F9B762C, 0x3B5A6B9B,
0x0315D626, 0x07D4CB91, 0x0A97ED48, 0x0E56F0FF,
0x1011A0FA, 0x14D0BD4D, 0x19939B94, 0x1D528623,
0xF12F560E, 0xF5EE4BB9, 0xF8AD6D60, 0xFC6C70D7,
0xE22B20D2, 0xE6EA3D65, 0xEBA91BBC, 0xEF68060B,
0xD727BBB6, 0xD3E6A601, 0xDEA580D8, 0xDA649D6F,
0xC423CD6A, 0xC0E2D0DD, 0xCDA1F604, 0xC960EBB3,
0xBD3E8D7E, 0xB9FF90C9, 0xB4BCB610, 0xB07DABA7,
0xAE3AFBA2, 0xAAFBE615, 0xA7B8C0CC, 0xA379DD7B,
0x9B3660C6, 0x9FF77D71, 0x92B45BA8, 0x9675461F,
0x8832161A, 0x8CF30BAD, 0x81B02D74, 0x857130C3,
0x5D8A9099, 0x594B8D2E, 0x5408ABF7, 0x50C9B640,
0x4E8EE645, 0x4A4FFBF2, 0x470CDD2B, 0x43CDC09C,
0x7B827D21, 0x7F436096, 0x7200464F, 0x76C15BF8,
0x68860BFD, 0x6C47164A, 0x61043093, 0x65C52D24,
0x119B4BE9, 0x155A565E, 0x18197087, 0x1CD86D30,
0x029F3D35, 0x065E2082, 0x0B1D065B, 0x0FDC1BEC,
0x3793A651, 0x3352BBE6, 0x3E119D3F, 0x3AD08088,
0x2497D08D, 0x2056CD3A, 0x2D15EBE3, 0x29D4F654,
0xC5A92679, 0xC1683BCE, 0xCC2B1D17, 0xC8EA00A0,
0xD6AD50A5, 0xD26C4D12, 0xDF2F6BCB, 0xDBEE767C,
0xE3A1CBC1, 0xE760D676, 0xEA23F0AF, 0xEEE2ED18,
0xF0A5BD1D, 0xF464A0AA, 0xF9278673, 0xFDE69BC4,
0x89B8FD09, 0x8D79E0BE, 0x803AC667, 0x84FBDBD0,
0x9ABC8BD5, 0x9E7D9662, 0x933EB0BB, 0x97FFAD0C,
0xAFB010B1, 0xAB710D06, 0xA6322BDF, 0xA2F33668,
0xBCB4666D, 0xB8757BDA, 0xB5365D03, 0xB1F740B4}

type crc struct {
	p, r  uint32
	Size  int
	final bool
}

func NewCrc() *crc {
	return &crc{0, 0, 0, false}
}

// Add feeds one input byte into the running CRC.
func (pr *crc) Add(b byte) {
	if pr.final {
		return
	}
	pr.r = (pr.r << 8) ^ tbl[byte(pr.r>>24)^b]
	pr.Size++
}

// Crc appends the input length (least-significant byte first),
// complements the register, and returns the final checksum.
func (pr *crc) Crc() uint32 {
	if pr.final {
		return pr.r
	}
	for m := pr.Size; m > 0; {
		b := byte(m & 0377)
		m = m >> 8
		pr.r = (pr.r << 8) ^ tbl[byte(pr.r>>24)^b]
	}
	pr.final = true // Prevent further modification
	pr.r = ^pr.r
	return pr.r
}

// cksum returns the POSIX checksum and the size in bytes of the named file.
func cksum(filename string) (uint32, int, error) {
	f, err := os.Open(filename)
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()
	in := bufio.NewReader(f)
	pr := NewCrc()
	for done := false; !done; {
		switch b, err := in.ReadByte(); err {
		case io.EOF:
			done = true
		case nil:
			pr.Add(b)
		default:
			return 0, 0, err
		}
	}
	return pr.Crc(), pr.Size, nil
}

func main() {
	var filename = "foo"
	crc, size, err := cksum(filename)
	if err != nil {
		fmt.Println("Error: ", err)
		return
	}
	fmt.Printf("%d %d %s\n", crc, size, filename)
}
The filename is hardcoded here (to foo), but you could take it from a flag instead. The content of foo is 123\n (** note: on Windows you'll need to convert the line endings to LF, otherwise you get a 5-byte file). Results:
linux: $ cksum foo
2330645186 4 foo
linux: $ go run cksum.go
2330645186 4 foo
windows: > go run cksum.go **
2330645186 4 foo

Actually, I found a simpler answer to my original question, using the unixsums package:
https://pkg.go.dev/github.com/cxmcc/unixsums#section-readme
Here is a snippet that prints the POSIX cksum value of a file in Go:
data, err := ioutil.ReadFile("/tmp/test.loop")
if err != nil {
	log.Fatal(err)
}
fmt.Printf("cksum: %d\n", cksum.Cksum(data))

Related

Get last n files in directory sorted by timestamp without listing all files

I am trying to get the last N files from a directory, sorted by creation/modification time.
I am currently using this code:
files, err := ioutil.ReadDir(path)
if err != nil {
	return 0, err
}
sort.Slice(files, func(i, j int) bool {
	return files[i].ModTime().Before(files[j].ModTime())
})
The problem is that this directory is expected to hold about 2 million files, and loading all of them into a slice consumes a lot of memory (~800 MB). It is also unclear when the GC will reclaim that memory.
Is there another way to get the last N files in the directory, sorted by timestamp, without reading all of the entries into memory?
My first answer, using filepath.Walk, was still allocating a huge chunk of memory, as @Marc pointed out. So here is an improved algorithm.
Note: This is not an optimized algorithm. It only gives an idea of how to tackle the problem.
maxFiles := 5
batch := 100 // optimize to find good balance
dir, err := os.Open(path)
if err != nil {
	log.Fatal(err)
}
defer dir.Close()
var files []os.FileInfo
for {
	fs, err := dir.Readdir(batch)
	if err != nil {
		if err != io.EOF { // io.EOF only means the directory is exhausted
			log.Println(err)
		}
		break
	}
	for _, fileInfo := range fs {
		// Once the list is full, skip anything newer than its newest entry.
		if len(files) >= maxFiles && fileInfo.ModTime().After(files[len(files)-1].ModTime()) {
			continue
		}
		files = append(files, fileInfo)
		sort.Slice(files, func(i, j int) bool {
			return files[i].ModTime().Before(files[j].ModTime())
		})
		if len(files) > maxFiles {
			files = files[:maxFiles] // drop the newest entry that was pushed out
		}
	}
}
The basic idea is to keep only the maxFiles oldest files in memory and to discard newer ones, either immediately or as soon as an older file pushes them out of the list. (To keep the newest files instead, reverse the time comparisons.)
Instead of a slice, it might be worth looking into a B-tree (which is sorted internally) or a doubly linked list. You'll have to do some benchmarking to figure out what works best.

python3.PyImport_ImportModule(name) will emit a fatal error when called the second time

Environments:
macOS (Catalina, version 10.15.4)
Python 3.7.6
Go 1.13.8
I want to use go-python3 to invoke an algorithm written in Python 3, but, as the title says, a fatal error is generated the second time I invoke the algorithm. From the output, it seems that PyImport_ImportModule causes the error.
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xa pc=0x91256a3]
runtime stack:
runtime.throw(0x4967a75, 0x2a)
/usr/local/go/src/runtime/panic.go:774 +0x72
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:378 +0x47c
goroutine 41 [syscall]:
runtime.cgocall(0x4637740, 0xc000063c18, 0x48a4660)
/usr/local/go/src/runtime/cgocall.go:128 +0x5b fp=0xc000063be8 sp=0xc000063bb0 pc=0x4004d0b
github.com/DataDog/go-python3._Cfunc_PyImport_ImportModule(0x8061d90, 0x0)
_cgo_gotypes.go:3780 +0x4a fp=0xc000063c18 sp=0xc000063be8 pc=0x462c2fa
github.com/DataDog/go-python3.PyImport_ImportModule(0x49501f5, 0x8, 0x0)
/Users/zhao/go/pkg/mod/github.com/!data!dog/go-python3@v0.0.0-20191126174558-6ed25e33b3c4/import.go:24 +0x87 fp=0xc000063c80 sp=0xc000063c18 pc=0x462e267
PPGServer/pkg/algo.ImportModule(0x4964926, 0x26, 0x49501f5, 0x8, 0x1)
/Users/zhao/go/src/PPGServer/pkg/algo/ppg.go:42 +0x4cb fp=0xc000063d98 sp=0xc000063c80 pc=0x46332db
PPGServer/pkg/algo.CalcPre(0xc0003560c0, 0xd, 0x0, 0x0)
....
Here is the sample code.
A wrapper of PyImport_ImportModule:
// ImportModule will import python module from given directory
func ImportModule(dir, name string) *python3.PyObject {
	fmt.Println("python3.PyImport_ImportModule before")
	sysModule := python3.PyImport_ImportModule("sys") // import sys
	fmt.Println("python3.PyImport_ImportModule success")
	path := sysModule.GetAttrString("path") // path = sys.path
	ob := python3.PyList_GetItem(path, 1)
	fmt.Println("check:", python3.PyUnicode_Check(ob))
	fmt.Println("path:", python3.PyUnicode_AsUTF8(ob))
	fmt.Println("sysModule.GetAttrString success")
	python3.PyList_Insert(path, 0, python3.PyUnicode_FromString("/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages"))
	python3.PyList_Insert(path, 0, python3.PyUnicode_FromString(dir))
	fmt.Println("After module insert:", python3.PyUnicode_AsUTF8(python3.PyList_GetItem(path, 0)))
	fmt.Println("module name:", name)
	return python3.PyImport_ImportModule(name)
}
The algorithm is called in a new goroutine each time:
func CalcPre(dataFilePath string) (sbpI int, dbpI int) {
	python3.Py_Initialize()
	if !python3.Py_IsInitialized() {
		fmt.Println("Error initializing the python interpreter")
		os.Exit(1)
	}
	gstate = python3.PyGILState_Ensure()
	fmt.Println("Py_Initialize success")
	vbp := ImportModule("/Users/zhao/Desktop/lab/ppython", "value_bp")
	fmt.Println("ImportModule success")
	b := vbp.GetAttrString("estimate")
	fmt.Printf("[FUNC] b = %#v\n", b)
	bArgs := python3.PyTuple_New(1)
	python3.PyTuple_SetItem(bArgs, 0, python3.PyUnicode_FromString(dataFilePath))
	re := b.Call(bArgs, python3.Py_None)
	sbp := python3.PyTuple_GetItem(re, 0)
	dbp := python3.PyTuple_GetItem(re, 1)
	defer func() {
		python3.Py_Finalize()
		fmt.Println("python3.Py_Finalize()")
	}()
	sbpI = python3.PyLong_AsLong(sbp)
	dbpI = python3.PyLong_AsLong(dbp)
	python3.PyGILState_Release(gstate)
	return
}

func Calc(dataFilePath string) {
	CalcPre(dataFilePath)
}
A sample call looks like this: go Calc("aaa.csv").
To reproduce the problem, use the code and environment above and run the code in a goroutine, like go Calc("aaa.csv"). For simplicity, you can remove the algorithm part and keep only the skeleton.
This shorter version also reproduces the issue:
func CalcPre(dataFilePath string) (sbpI int, dbpI int) {
	python3.Py_Initialize()
	if !python3.Py_IsInitialized() {
		fmt.Println("Error initializing the python interpreter")
		os.Exit(1)
	}
	gstate = python3.PyGILState_Ensure()
	ImportModule("/Users/zhao/Desktop/lab/ppython", "value_bp")
	defer func() {
		python3.Py_Finalize()
		fmt.Println("python3.Py_Finalize()")
	}()
	return
}
Here is another way to achieve the same goal: just use the os/exec package.
Use exec.Command to set the executable and its arguments, for example:
cmd := exec.Command("python3", "test.py", "./srcdata/aaa.csv")
Use cmd.Dir to set the command's working directory.
Use output, err := cmd.CombinedOutput() to run the command and collect its combined stdout and stderr.
Do whatever you want with the output.
The full snippet is as follows:
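func Calc(dataFilePath string) (sI int, dI int) {
	cmd := exec.Command("python3", "test.py", "./srcdata/"+dataFilePath)
	cmd.Dir = "/Users/username/Desktop/lab/ppython"
	output, e := cmd.CombinedOutput() // stdout and stderr together
	if e != nil {
		fmt.Println("Python Execution Error :", e)
		return
	}
	// The script is expected to print the two integers on separate lines.
	strs := strings.Split(strings.TrimSpace(string(output)), "\n")
	if len(strs) < 2 {
		fmt.Println("Unexpected output :", string(output))
		return
	}
	if sI, e = strconv.Atoi(strings.TrimSpace(strs[0])); e != nil {
		fmt.Println("Parse Error :", e)
	}
	if dI, e = strconv.Atoi(strings.TrimSpace(strs[1])); e != nil {
		fmt.Println("Parse Error :", e)
	}
	return
}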
Anyway, this approach handles most of my inter-language calls.

Insert a mgo query []M.bson result into a file.txt as a string

I have to write the result of an mgo (MongoDB driver for Go) query to a file, in order to get the ids of images:
var path = "/home/Medo/text.txt"
pipe := cc.Pipe([]bson.M{
	{"$unwind": "$images"},
	{"$group": bson.M{"_id": "null", "images": bson.M{"$push": "$images"}}},
	{"$project": bson.M{"_id": 0}}})
response := []bson.M{}
errResponse := pipe.All(&response)
if errResponse != nil {
	fmt.Println("error Response: ", errResponse)
}
fmt.Println(response) // to print for making sure that it is working
data, err := bson.Marshal(&response)
if err != nil {
	fmt.Println("error insertion ", err)
}
s := string(data)
Below is the part where I create the file and write to it.
The problem is that, in the text file, each value is followed by an extra enumeration value, for example:
id of images
23456678`0`
24578689`1`
23678654`2`
12890762`3`
76543890`4`
64744848`5`
So each value is followed by a sequence number, and I can't figure out why. After getting the response from the query, I converted the BSON to []byte and then to string, but I keep getting those sequence numbers at the end of each result.
I'd like to drop those 0 1 2 3 4 5.
var _, errExistFile = os.Stat(path)
if os.IsNotExist(errExistFile) {
	var file, errCreateFile = os.Create(path)
	if isError(errCreateFile) {
		return
	}
	defer file.Close()
}
fmt.Println("==> done creating file", path)
var file, errii = os.OpenFile(path, os.O_RDWR, 0644)
if isError(errii) {
	return
}
defer file.Close()
// write some text line-by-line to file
_, erri := file.WriteString(s)
if isError(erri) {
	return
}
erri = file.Sync()
if isError(erri) {
	return
}
fmt.Println("==> done writing to file")
You could declare a simple struct, e.g.
type simple struct {
	ID    idtype `bson:"_id"`
	Image int    `bson:"images"`
}
The code to put the image ids into the file would be
open file stuff…
result := simple{}
iter := collection.Find(nil).Iter()
for iter.Next(&result) {
	fmt.Fprintf(file, "%d\n", result.Image)
}
if err := iter.Close(); err != nil {
	fmt.Println("error iterating: ", err)
}

How to check file existence by its base name (without extension)?

The question is quite self-explanatory.
Could anybody show me a short and efficient way to check whether a file exists by its name (without extension)? It would be great if the code returned all occurrences when the folder has several files with the same base name.
Example:
folder/
file.html
file.md
UPDATE:
It is not obvious from the official documentation how to use the filepath.Match() or filepath.Glob() functions, so here are some examples:
matches, _ := filepath.Glob("./folder/file*") // returns paths to real files: [folder/file.html, folder/file.md]
matchesToPattern, _ := filepath.Match("./folder/file*", "./folder/file.html") // returns true, but it only compares strings and doesn't check what is on disk
You need to use the path/filepath package.
The functions to look at are Glob(), Match() and Walk(); pick whichever suits you best.
Here is the updated code:
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

func main() {
	dirname := "." + string(filepath.Separator)
	d, err := os.Open(dirname)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer d.Close()
	fis, err := d.Readdir(-1)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	r := regexp.MustCompile("f([a-z]+)le") // the pattern to match
	for _, fi := range fis {
		// regular file whose name matches the pattern
		if fi.Mode().IsRegular() && r.MatchString(fi.Name()) {
			fmt.Println(fi.Name(), fi.Size(), "bytes")
		}
	}
}
With this approach you can also filter by date, size, or other file properties, or include subfolders.

Why file's name get messy using archive/zip in golang, linux?

I'm using golang's standard package archive/zip to wrap several files into a zipfile.
Here is my code for test:
package main

import (
	"archive/zip"
	"log"
	"os"
)

func main() {
	archive, _ := os.Create("/tmp/测试file.zip")
	w := zip.NewWriter(archive)
	// Add some files to the archive.
	var files = []struct {
		Name, Body string
	}{
		{"测试.txt", "test content: 测试"},
		{"test.txt", "test content: test"},
	}
	for _, file := range files {
		f, err := w.Create(file.Name)
		if err != nil {
			log.Fatal(err)
		}
		_, err = f.Write([]byte(file.Body))
		if err != nil {
			log.Fatal(err)
		}
	}
	err := w.Close()
	if err != nil {
		log.Fatal(err)
	}
}
Results:
I get a zip file named 测试file.zip under /tmp, as expected.
After unzipping it, I get two files: test.txt and ц╡ЛшпХ.txt, and the second name is garbled.
The contents of both files are as expected.
Why does this happen, and how can I fix it?
This might be an issue with unzip not handling UTF-8 names properly. Explicitly using the Chinese locale worked for me:
$ LANG=zh_ZH unzip 测试file.zip
Archive: 测试file.zip
inflating: 测试.txt
inflating: test.txt
$ cat *.txt
test content: testtest content: 测试
Another option is to encode the file name as GBK before adding it to the archive:
import (
	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)

filename, _, err = transform.String(simplifiedchinese.GBK.NewEncoder(), "测试.txt")
