Why do file names get garbled when using archive/zip in Go on Linux?

I'm using Go's standard archive/zip package to bundle several files into a zip file.
Here is my test code:
package main
import (
	"archive/zip"
	"log"
	"os"
)
func main() {
	archive, err := os.Create("/tmp/测试file.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer archive.Close()
	w := zip.NewWriter(archive)
	// Add some files to the archive.
	var files = []struct {
		Name, Body string
	}{
		{"测试.txt", "test content: 测试"},
		{"test.txt", "test content: test"},
	}
	for _, file := range files {
		f, err := w.Create(file.Name)
		if err != nil {
			log.Fatal(err)
		}
		_, err = f.Write([]byte(file.Body))
		if err != nil {
			log.Fatal(err)
		}
	}
	// Close flushes the central directory to the archive.
	err = w.Close()
	if err != nil {
		log.Fatal(err)
	}
}
Results:
I get a zip file named 测试file.zip under /tmp, as expected.
After unzipping it, I get two files, test.txt and ц╡ЛшпХ.txt; the second name is garbled.
The contents of both files are as expected.
Why does this happen, and how can I fix it?

This might be an issue with unzip not handling UTF-8 names properly. Explicitly setting a Chinese locale worked for me:
$ LANG=zh_ZH unzip 测试file.zip
Archive: 测试file.zip
inflating: 测试.txt
inflating: test.txt
$ cat *.txt
test content: testtest content: 测试

import (
	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)
// Convert the UTF-8 file name to GBK (presumably for use as the zip entry name).
filename, _, err = transform.String(simplifiedchinese.GBK.NewEncoder(), "测试.txt")

Related

Get last n files in directory sorted by timestamp without listing all files

I am trying to get the last N files from a directory, sorted by creation/modification time.
I am currently using this code:
files, err := ioutil.ReadDir(path)
if err != nil {
	return 0, err
}
sort.Slice(files, func(i, j int) bool {
	return files[i].ModTime().Before(files[j].ModTime())
})
The problem is that this directory is expected to hold about 2 million files, and loading all of them into a slice consumes a lot of memory (about 800 MB). It is also unclear when the GC will free that memory.
Is there another way to get the last N files in the directory, sorted by timestamp, without reading all of the entries into memory?
My first answer using filepath.Walk was still allocating a huge chunk of memory, as @Marc pointed out. So here is an improved algorithm.
Note: This is not an optimized algorithm. It is only meant to give an idea of how to tackle the problem.
maxFiles := 5
batch := 100 // tune to balance memory use against the number of reads
dir, err := os.Open(path)
if err != nil {
	log.Fatal(err)
}
defer dir.Close()
var files []os.FileInfo
for {
	fs, err := dir.Readdir(batch)
	if err != nil {
		if err != io.EOF { // io.EOF only signals the end of the directory
			log.Println(err)
		}
		break
	}
	for _, fileInfo := range fs {
		// Once the list is full, skip files newer than its newest entry.
		if len(files) >= maxFiles && fileInfo.ModTime().After(files[len(files)-1].ModTime()) {
			continue
		}
		files = append(files, fileInfo)
		sort.Slice(files, func(i, j int) bool {
			return files[i].ModTime().Before(files[j].ModTime())
		})
		// Drop the newest entry if the list has grown past maxFiles.
		if len(files) > maxFiles {
			files = files[:maxFiles]
		}
	}
}
The basic idea is to keep only the maxFiles oldest files in memory and to discard newer ones immediately, or as soon as an older file pushes them out of the list.
Instead of a slice, it might help to use a B-tree (which is sorted internally) or a doubly linked list. You'll have to do some benchmarking to figure out what is optimal.

CRC32 Checksum Calculation via Go

I am trying to create a Go function that produces the same result as the Ubuntu Linux "cksum" command, for example:
$ echo 123 > /tmp/foo
$ cksum /tmp/foo
2330645186 4 /tmp/foo
Could someone please provide a Go function that produces the first field of the above result ("2330645186")? Thank you.
(Update)
It turns out cksum doesn't (quite) implement a cyclic redundancy check based on the standard CRC32 process. To test plain CRC32 (the value you'd normally find listed as a CRC32 checksum) you can use the CRC calculator at http://zorc.breitbandkatze.de/; Go's hash/crc32.ChecksumIEEE implementation matches it.
To implement the cksum CRC process (also known as POSIX cksum), I instead wrote a Go version of the C algorithm found on a cksum man page (which uses a lookup table):
package main
import (
	"bufio"
	"fmt"
	"io"
	"os"
)
var tbl = [256]uint32{0x00000000, 0x04C11DB7, 0x09823B6E, 0x0D4326D9,
0x130476DC, 0x17C56B6B, 0x1A864DB2, 0x1E475005,
0x2608EDB8, 0x22C9F00F, 0x2F8AD6D6, 0x2B4BCB61,
0x350C9B64, 0x31CD86D3, 0x3C8EA00A, 0x384FBDBD,
0x4C11DB70, 0x48D0C6C7, 0x4593E01E, 0x4152FDA9,
0x5F15ADAC, 0x5BD4B01B, 0x569796C2, 0x52568B75,
0x6A1936C8, 0x6ED82B7F, 0x639B0DA6, 0x675A1011,
0x791D4014, 0x7DDC5DA3, 0x709F7B7A, 0x745E66CD,
0x9823B6E0, 0x9CE2AB57, 0x91A18D8E, 0x95609039,
0x8B27C03C, 0x8FE6DD8B, 0x82A5FB52, 0x8664E6E5,
0xBE2B5B58, 0xBAEA46EF, 0xB7A96036, 0xB3687D81,
0xAD2F2D84, 0xA9EE3033, 0xA4AD16EA, 0xA06C0B5D,
0xD4326D90, 0xD0F37027, 0xDDB056FE, 0xD9714B49,
0xC7361B4C, 0xC3F706FB, 0xCEB42022, 0xCA753D95,
0xF23A8028, 0xF6FB9D9F, 0xFBB8BB46, 0xFF79A6F1,
0xE13EF6F4, 0xE5FFEB43, 0xE8BCCD9A, 0xEC7DD02D,
0x34867077, 0x30476DC0, 0x3D044B19, 0x39C556AE,
0x278206AB, 0x23431B1C, 0x2E003DC5, 0x2AC12072,
0x128E9DCF, 0x164F8078, 0x1B0CA6A1, 0x1FCDBB16,
0x018AEB13, 0x054BF6A4, 0x0808D07D, 0x0CC9CDCA,
0x7897AB07, 0x7C56B6B0, 0x71159069, 0x75D48DDE,
0x6B93DDDB, 0x6F52C06C, 0x6211E6B5, 0x66D0FB02,
0x5E9F46BF, 0x5A5E5B08, 0x571D7DD1, 0x53DC6066,
0x4D9B3063, 0x495A2DD4, 0x44190B0D, 0x40D816BA,
0xACA5C697, 0xA864DB20, 0xA527FDF9, 0xA1E6E04E,
0xBFA1B04B, 0xBB60ADFC, 0xB6238B25, 0xB2E29692,
0x8AAD2B2F, 0x8E6C3698, 0x832F1041, 0x87EE0DF6,
0x99A95DF3, 0x9D684044, 0x902B669D, 0x94EA7B2A,
0xE0B41DE7, 0xE4750050, 0xE9362689, 0xEDF73B3E,
0xF3B06B3B, 0xF771768C, 0xFA325055, 0xFEF34DE2,
0xC6BCF05F, 0xC27DEDE8, 0xCF3ECB31, 0xCBFFD686,
0xD5B88683, 0xD1799B34, 0xDC3ABDED, 0xD8FBA05A,
0x690CE0EE, 0x6DCDFD59, 0x608EDB80, 0x644FC637,
0x7A089632, 0x7EC98B85, 0x738AAD5C, 0x774BB0EB,
0x4F040D56, 0x4BC510E1, 0x46863638, 0x42472B8F,
0x5C007B8A, 0x58C1663D, 0x558240E4, 0x51435D53,
0x251D3B9E, 0x21DC2629, 0x2C9F00F0, 0x285E1D47,
0x36194D42, 0x32D850F5, 0x3F9B762C, 0x3B5A6B9B,
0x0315D626, 0x07D4CB91, 0x0A97ED48, 0x0E56F0FF,
0x1011A0FA, 0x14D0BD4D, 0x19939B94, 0x1D528623,
0xF12F560E, 0xF5EE4BB9, 0xF8AD6D60, 0xFC6C70D7,
0xE22B20D2, 0xE6EA3D65, 0xEBA91BBC, 0xEF68060B,
0xD727BBB6, 0xD3E6A601, 0xDEA580D8, 0xDA649D6F,
0xC423CD6A, 0xC0E2D0DD, 0xCDA1F604, 0xC960EBB3,
0xBD3E8D7E, 0xB9FF90C9, 0xB4BCB610, 0xB07DABA7,
0xAE3AFBA2, 0xAAFBE615, 0xA7B8C0CC, 0xA379DD7B,
0x9B3660C6, 0x9FF77D71, 0x92B45BA8, 0x9675461F,
0x8832161A, 0x8CF30BAD, 0x81B02D74, 0x857130C3,
0x5D8A9099, 0x594B8D2E, 0x5408ABF7, 0x50C9B640,
0x4E8EE645, 0x4A4FFBF2, 0x470CDD2B, 0x43CDC09C,
0x7B827D21, 0x7F436096, 0x7200464F, 0x76C15BF8,
0x68860BFD, 0x6C47164A, 0x61043093, 0x65C52D24,
0x119B4BE9, 0x155A565E, 0x18197087, 0x1CD86D30,
0x029F3D35, 0x065E2082, 0x0B1D065B, 0x0FDC1BEC,
0x3793A651, 0x3352BBE6, 0x3E119D3F, 0x3AD08088,
0x2497D08D, 0x2056CD3A, 0x2D15EBE3, 0x29D4F654,
0xC5A92679, 0xC1683BCE, 0xCC2B1D17, 0xC8EA00A0,
0xD6AD50A5, 0xD26C4D12, 0xDF2F6BCB, 0xDBEE767C,
0xE3A1CBC1, 0xE760D676, 0xEA23F0AF, 0xEEE2ED18,
0xF0A5BD1D, 0xF464A0AA, 0xF9278673, 0xFDE69BC4,
0x89B8FD09, 0x8D79E0BE, 0x803AC667, 0x84FBDBD0,
0x9ABC8BD5, 0x9E7D9662, 0x933EB0BB, 0x97FFAD0C,
0xAFB010B1, 0xAB710D06, 0xA6322BDF, 0xA2F33668,
0xBCB4666D, 0xB8757BDA, 0xB5365D03, 0xB1F740B4}
type crc struct {
	p, r  uint32
	Size  int
	final bool
}
func NewCrc() *crc {
	return &crc{0, 0, 0, false}
}
func (pr *crc) Add(b byte) {
	if pr.final {
		return
	}
	pr.r = (pr.r << 8) ^ tbl[byte(pr.r>>24)^b]
	pr.Size++
}
func (pr *crc) Crc() uint32 {
	if pr.final {
		return pr.r
	}
	// Fold the input length into the CRC, low byte first.
	for m := pr.Size; m > 0; {
		b := byte(m & 0377)
		m = m >> 8
		pr.r = (pr.r << 8) ^ tbl[byte(pr.r>>24)^b]
	}
	pr.final = true // Prevent further modification
	pr.r = ^pr.r
	return pr.r
}
func cksum(filename string) (uint32, int, error) {
	f, err := os.Open(filename)
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()
	in := bufio.NewReader(f)
	pr := NewCrc()
	for done := false; !done; {
		switch b, err := in.ReadByte(); err {
		case io.EOF:
			done = true
		case nil:
			pr.Add(b)
		default:
			return 0, 0, err
		}
	}
	return pr.Crc(), pr.Size, nil
}
func main() {
	var filename = "foo"
	crc, size, err := cksum(filename)
	if err != nil {
		fmt.Println("Error: ", err)
		return
	}
	fmt.Printf("%d %d %s\n", crc, size, filename)
}
Obviously the filename is hardcoded here (to foo), but you could change that with flags. The content of foo is 123\n (** note: on Windows you'll need to convert the line endings, otherwise the file is 5 bytes). Results:
linux: $ cksum foo
2330645186 4 foo
linux: $ go run cksum.go
2330645186 4 foo
windows: > go run cksum.go **
2330645186 4 foo
Actually, I found a simpler answer to my original question, using:
https://pkg.go.dev/github.com/cxmcc/unixsums#section-readme
Here is a snippet that computes the POSIX cksum value of a file in Go:
data, err := ioutil.ReadFile("/tmp/test.loop")
if err != nil {
	log.Fatal(err)
}
fmt.Printf("cksum: %d\n", cksum.Cksum(data))

Identifying an existing folder [duplicate]

This question already has answers here:
Expand tilde to home directory
(6 answers)
reader.ReadString does not strip out the first occurrence of delim
(4 answers)
Closed 3 years ago.
I have an issue where Go seems to be telling me that a folder doesn't exist when it clearly does.
path, _ := reader.ReadString('\n')
path, err := expand(path)
fmt.Println("Path Expanded: ", path, err)
if err == nil {
	if _, err2 := os.Lstat(path); err2 == nil {
		fmt.Println("Valid Path")
	} else if os.IsNotExist(err2) {
		fmt.Println("Invalid Path")
		fmt.Println(err2)
	} else {
		fmt.Println(err2)
	}
}
The expand function simply translates the ~ to the homeDir.
func expand(path string) (string, error) {
	if len(path) == 0 || path[0] != '~' {
		return path, nil
	}
	usr, err := user.Current()
	if err != nil {
		return "", err
	}
	return filepath.Join(usr.HomeDir, path[1:]), nil
}
If I input ~, it is correctly translated to /home/<user>/, but the program ultimately reports that the folder does not exist, even though it clearly does. I know I have access to it, so it doesn't seem to be a permissions issue.
If I try /root/ as the input, I correctly get a permissions error, and I am OK with that. But I expect my ~ directory to return "Valid Path".
My error is almost always: no such file or directory
I am on Lubuntu 19.xx, a fairly fresh install. I am running this app from ~/Projects/src/Playground/AppName, using the bash terminal in VS Code.
I have also tried both Lstat and Stat without success, not to mention a ton of examples and different approaches. I am sure this is some underlying Linux thing that I don't understand...
The answer is that I was not trimming the result of ReadString, which still contains the \n delimiter. Adding strings.Trim(path, "\n") fixed my issue.

Insert an mgo query []bson.M result into a file.txt as a string

I have to write the result of an mgo (MongoDB) query to a file, converted in Go, to get the ids of images.
var path = "/home/Medo/text.txt"
pipe := cc.Pipe([]bson.M{
	{"$unwind": "$images"},
	{"$group": bson.M{"_id": "null", "images": bson.M{"$push": "$images"}}},
	{"$project": bson.M{"_id": 0}}})
response := []bson.M{}
errResponse := pipe.All(&response)
if errResponse != nil {
	fmt.Println("error Response: ", errResponse)
}
fmt.Println(response) // print to make sure that it is working
data, err22 := bson.Marshal(&response)
if err22 != nil {
	fmt.Println("error insertion ", err22)
}
s := string(data)
Here is the part where I have to create a file and write to it.
The problem is that when I write the result of the query to the text file, I get an enumeration value at the end of each value, for example:
id of images
23456678`0`
24578689`1`
23678654`2`
12890762`3`
76543890`4`
64744848`5`
So each value is followed by a sequential number, and I can't figure out why. After getting the response from the query, I converted the BSON to []byte and then to string, but I keep getting those sequential numbers at the end of each result.
I'd like to drop those 0 1 2 3 4 5.
var _, errExistFile = os.Stat(path)
if os.IsNotExist(errExistFile) {
	var file, errCreateFile = os.Create(path)
	if isError(errCreateFile) {
		return
	}
	defer file.Close()
}
fmt.Println("==> done creating file", path)
var file, errii = os.OpenFile(path, os.O_RDWR, 0644)
if isError(errii) {
	return
}
defer file.Close()
// write some text line-by-line to file
_, erri := file.WriteString(s)
if isError(erri) {
	return
}
erri = file.Sync()
if isError(erri) {
	return
}
fmt.Println("==> done writing to file")
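The trailing digits are most likely the array index keys ("0", "1", ...) that BSON stores alongside each array element: bson.Marshal produces the binary BSON encoding, not plain text. Instead, you could declare a simple struct, e.g.
type simple struct {
	ID    idtype `bson:"_id"`
	Image int    `bson:"images"`
}
The function to put the image ids into the file would be
// open file stuff…
result := simple{}
iter := collection.Find(nil).Iter()
for iter.Next(&result) {
	file.WriteString(fmt.Sprintf("%d\n", result.Image))
}
iter.Close()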

How to check file existence by its base name (without extension)?

The question is quite self-explanatory.
Could anybody show me a short and efficient way to check whether a file exists by its name without the extension? It would be great if the code returned every match when the folder contains several files with the same base name.
Example:
folder/
file.html
file.md
UPDATE:
It is not obvious from the official documentation how to use filepath.Match() or filepath.Glob(). So here are some examples:
matches, _ := filepath.Glob("./folder/file*") // returns paths to real files: [folder/file.html, folder/file.md]
matchesToPattern, _ := filepath.Match("./folder/file*", "./folder/file.html") // returns true, but it only compares strings and doesn't check the file system
You need to use the path/filepath package.
The functions to check are Glob(), Match() and Walk(); pick whichever suits your taste better.
Here is the updated code:
package main
import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)
func main() {
	dirname := "." + string(filepath.Separator)
	d, err := os.Open(dirname)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer d.Close()
	infos, err := d.Readdir(-1) // read all directory entries at once
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	r := regexp.MustCompile("f([a-z]+)le") // the pattern to match
	for _, fi := range infos {
		if fi.Mode().IsRegular() { // is a regular file
			if r.MatchString(fi.Name()) { // name matches the pattern
				fmt.Println(fi.Name(), fi.Size(), "bytes")
			}
		}
	}
}
With this approach you can also filter by date, size, or other file properties, or include subfolders.
