How to check file existence by its base name (without extension)?

The question is fairly self-explanatory.
Could anybody show me a short and efficient way to check for the existence of a file by its base name (without extension)? It would be great if the code returned several matches when the folder contains several files with the same base name.
Example:
folder/
	file.html
	file.md
UPDATE:
It is not obvious from the official documentation how to use the filepath.Match() and filepath.Glob() functions, so here are some examples:
matches, _ := filepath.Glob("./folder/file*") // returns paths to real files: [folder/file.html folder/file.md]
matchesToPattern, _ := filepath.Match("./folder/file*", "./folder/file.html") // returns true, but it only compares the strings and doesn't check the real files

You need to use the path/filepath package.
The functions to look at are Glob(), Match() and Walk(); pick whichever suits your taste best.

Here is the updated code:
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

func main() {
	dirname := "." + string(filepath.Separator)

	d, err := os.Open(dirname)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer d.Close()

	fi, err := d.Readdir(-1)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	r, _ := regexp.Compile("f([a-z]+)le") // the pattern to match
	for _, fi := range fi {
		if fi.Mode().IsRegular() { // is a regular file
			if r.MatchString(fi.Name()) { // name matches the pattern
				fmt.Println(fi.Name(), fi.Size(), "bytes")
			}
		}
	}
}
With this approach you can also filter on date, size, or other file properties, and include subfolders.
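For the base-name case specifically, here is a minimal sketch using filepath.Glob (the findByBaseName helper is our own name, not from the answer above; it assumes the extension always starts with a dot):

package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// findByBaseName returns every file in dir whose name without its
// extension equals base, e.g. folder/file.html and folder/file.md
// for base "file".
func findByBaseName(dir, base string) ([]string, error) {
	matches, err := filepath.Glob(filepath.Join(dir, base+".*"))
	if err != nil {
		return nil, err
	}
	var out []string
	for _, m := range matches {
		name := filepath.Base(m)
		// Filter out names like file.tar.gz when base is "file".
		if strings.TrimSuffix(name, filepath.Ext(name)) == base {
			out = append(out, m)
		}
	}
	return out, nil
}

func main() {
	files, err := findByBaseName("folder", "file")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(files) // e.g. [folder/file.html folder/file.md]
}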

Related

Get last n files in directory sorted by timestamp without listing all files

I am trying to get the last N files from a directory, sorted by creation/modification time.
I am currently using this code:
files, err := ioutil.ReadDir(path)
if err != nil {
	return 0, err
}
sort.Slice(files, func(i, j int) bool {
	return files[i].ModTime().Before(files[j].ModTime())
})
The problem here is that the expected number of files in this directory is around 2 million, and when I load all of them into a slice it consumes a lot of memory (~800 MB). It is also not certain when the GC will free that memory.
Is there another way to get the last N files in the directory sorted by timestamp without reading all of them into memory?
My first answer using filepath.Walk was still allocating a huge chunk of memory, as @Marc pointed out, so here is an improved algorithm.
Note: this is not an optimized algorithm; it is just meant to give an idea of how to tackle the problem.
maxFiles := 5
batch := 100 // tune to find a good balance

dir, err := os.Open(path)
if err != nil {
	log.Fatal(err)
}
defer dir.Close()

var files []os.FileInfo
for {
	fs, err := dir.Readdir(batch)
	if err == io.EOF { // requires the io import
		break
	}
	if err != nil {
		log.Println(err)
		break
	}
	for _, fileInfo := range fs {
		// Once the list is full, discard anything newer than the
		// newest file we are keeping.
		if maxFiles <= len(files) {
			lastFile := files[len(files)-1]
			if fileInfo.ModTime().After(lastFile.ModTime()) {
				continue
			}
		}
		files = append(files, fileInfo)
		sort.Slice(files, func(i, j int) bool {
			return files[i].ModTime().Before(files[j].ModTime())
		})
		if maxFiles < len(files) {
			files = files[:maxFiles]
		}
	}
}
The basic idea is to keep only the oldest X files in memory, discarding newer ones immediately or as soon as an older file pushes them out of the list.
Instead of a slice it might be helpful to look into using a B-tree (as it is sorted internally), a doubly linked list, or a heap, as in the sketch below. You'll have to do some benchmarking to figure out what is optimal.
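As one concrete variant, here is a minimal sketch of the same idea using container/heap instead of re-sorting the slice on every insert (the newestFirst type and oldestN helper are our own names, not from the answer above):

package main

import (
	"container/heap"
	"fmt"
	"io"
	"log"
	"os"
)

// newestFirst is a heap of FileInfo ordered so that the newest file
// sits at the root and is evicted first.
type newestFirst []os.FileInfo

func (h newestFirst) Len() int           { return len(h) }
func (h newestFirst) Less(i, j int) bool { return h[i].ModTime().After(h[j].ModTime()) }
func (h newestFirst) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }

func (h *newestFirst) Push(x interface{}) { *h = append(*h, x.(os.FileInfo)) }
func (h *newestFirst) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// oldestN reads the directory in batches and keeps at most n entries
// in memory at any time.
func oldestN(path string, n int) ([]os.FileInfo, error) {
	dir, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer dir.Close()

	h := &newestFirst{}
	for {
		fs, err := dir.Readdir(100)
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		for _, fi := range fs {
			heap.Push(h, fi)
			if h.Len() > n {
				heap.Pop(h) // evict the newest of the kept entries
			}
		}
	}
	return *h, nil // note: heap order, not fully sorted
}

func main() {
	files, err := oldestN(".", 5)
	if err != nil {
		log.Fatal(err)
	}
	for _, fi := range files {
		fmt.Println(fi.Name(), fi.ModTime())
	}
}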

Remove all characters after a delimiter in a string

I am building a web crawler application in golang.
After downloading the HTML of a page, I separate out the URLs.
I am presented with URLs that have "#s" in them, such as "en.wikipedia.org/wiki/Race_condition#Computing". I would like to get rid of all characters following the "#", since these lead to the same page anyways. Any advice for how to do so?
Use the net/url package:
u, _ := url.Parse("SOME_URL_HERE")
u.Fragment = ""
return u.String()
An improvement on the answer by Luke Joshua Park is to parse the URL relative to the URL of the source page. This creates an absolute URL from what might be a relative URL on the page (scheme not specified, host not specified, or a relative path). Another improvement is to check and handle errors.
func clean(pageURL, linkURL string) (string, error) {
	p, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	l, err := p.Parse(linkURL)
	if err != nil {
		return "", err
	}
	l.Fragment = "" // chop off the fragment
	return l.String(), nil
}
If you are not interested in getting an absolute URL, then chop off everything after the #. This works because the only valid use of # in a URL is as the fragment separator.
func clean(linkURL string) string {
	i := strings.LastIndexByte(linkURL, '#')
	if i < 0 {
		return linkURL
	}
	return linkURL[:i]
}

Identifying an existing folder [duplicate]

This question already has answers here:
Expand tilde to home directory
(6 answers)
reader.ReadString does not strip out the first occurrence of delim
(4 answers)
Closed 3 years ago.
I have an issue where it seems Go is telling me that a folder doesn't exist, when it clearly does.
path, _ := reader.ReadString('\n')
path, err := expand(path)
fmt.Println("Path Expanded: ", path, err)
if err == nil {
	if _, err2 := os.Lstat(path); err2 == nil {
		fmt.Println("Valid Path")
	} else if os.IsNotExist(err2) {
		fmt.Println("Invalid Path")
		fmt.Println(err2)
	} else {
		fmt.Println(err2)
	}
}
The expand function simply translates the ~ to the homeDir.
func expand(path string) (string, error) {
	if len(path) == 0 || path[0] != '~' {
		return path, nil
	}
	usr, err := user.Current()
	if err != nil {
		return "", err
	}
	return filepath.Join(usr.HomeDir, path[1:]), nil
}
If I input the value of ~ it correctly translates it to /home/<user>/, but it ultimately states that the folder does not exist, even though it clearly does. I know I have access to it, so it doesn't seem to be a permissions thing.
If I try /root/ as the input, I correctly get a permissions error, and I am OK with that. But I expect my ~ directory to return "Valid Path".
The error is almost always: no such file or directory.
I am on Lubuntu 19.xx, a fairly fresh install. I am running this app from ~/Projects/src/Playground/AppName, using the bash terminal in VS Code.
I have also tried both Lstat and Stat unsuccessfully, not to mention a ton of examples and different approaches. I am sure this is some underlying Linux thing that I don't understand...
The answer is that I was not trimming the string returned by ReadString, which still ends with the \n delimiter. Adding strings.Trim(path, "\n") corrected the issue.
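A minimal sketch of the fix (the TrimRight variant with "\r\n" is our suggestion to also cover Windows line endings; the original fix used strings.Trim(path, "\n")):

path, _ := reader.ReadString('\n')
path = strings.TrimRight(path, "\r\n") // drop the delimiter before using the path
path, err := expand(path)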

How to get webpage content into a string using Go

I am trying to use Go and the http package to get the content of a webpage into a string, then be able to process the string. I am new to Go, so I am not entirely sure where to begin. Here is the function I am trying to make.
func OnPage(link string) {
}
I am not sure how to write the function. Link is the url of the webpage to use, and result would be the string from the webpage. So for example, if I used reddit as the link, then the result would just be the string form of the content on reddit, and I could process that string in different ways. From what I have read, I want to use the http package, but as I stated before, I do not know where to begin. Any help would be appreciated.
package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func OnPage(link string) string {
	res, err := http.Get(link)
	if err != nil {
		log.Fatal(err)
	}
	content, err := ioutil.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	return string(content)
}

func main() {
	fmt.Println(OnPage("http://www.bbc.co.uk/news/uk-england-38003934"))
}
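One optional hardening step (our suggestion, not part of the original answer) is to check the HTTP status before trusting the body:

res, err := http.Get(link)
if err != nil {
	log.Fatal(err)
}
if res.StatusCode != http.StatusOK {
	log.Fatalf("unexpected status: %s", res.Status)
}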

Why do file names get messy using archive/zip in Go on Linux?

I'm using Go's standard package archive/zip to wrap several files into a zip file.
Here is my test code:
package main

import (
	"archive/zip"
	"log"
	"os"
)

func main() {
	archive, _ := os.Create("/tmp/测试file.zip")
	w := zip.NewWriter(archive)

	// Add some files to the archive.
	var files = []struct {
		Name, Body string
	}{
		{"测试.txt", "test content: 测试"},
		{"test.txt", "test content: test"},
	}
	for _, file := range files {
		f, err := w.Create(file.Name)
		if err != nil {
			log.Fatal(err)
		}
		_, err = f.Write([]byte(file.Body))
		if err != nil {
			log.Fatal(err)
		}
	}

	err := w.Close()
	if err != nil {
		log.Fatal(err)
	}
}
Results:
I get a zip file named 测试file.zip under /tmp, as expected.
After unzipping it, I get two files: test.txt and ц╡ЛшпХ.txt, and the second name is a mess.
The contents of both files are normal, as expected.
Why does this happen and how can I fix it?
This might be an issue with unzip not handling UTF-8 names properly. Explicitly using the Chinese locale worked for me:
$ LANG=zh_ZH unzip 测试file.zip
Archive: 测试file.zip
inflating: 测试.txt
inflating: test.txt
$ cat *.txt
test content: testtest content: 测试
Alternatively, encode the file name to GBK before adding it to the archive, so that non-UTF-8-aware unzip tools read it correctly:

import (
	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)

filename, _, err := transform.String(simplifiedchinese.GBK.NewEncoder(), "测试.txt")
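A fuller sketch of that approach (our own, assuming the consuming unzip expects GBK names; the NonUTF8 header field marks the name as not UTF-8-encoded):

package main

import (
	"archive/zip"
	"log"
	"os"

	"golang.org/x/text/encoding/simplifiedchinese"
	"golang.org/x/text/transform"
)

func main() {
	archive, err := os.Create("/tmp/测试file.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer archive.Close()

	w := zip.NewWriter(archive)
	defer w.Close()

	// Encode the entry name to GBK.
	name, _, err := transform.String(simplifiedchinese.GBK.NewEncoder(), "测试.txt")
	if err != nil {
		log.Fatal(err)
	}

	f, err := w.CreateHeader(&zip.FileHeader{
		Name:    name,
		Method:  zip.Deflate,
		NonUTF8: true, // tell readers the name is not UTF-8
	})
	if err != nil {
		log.Fatal(err)
	}
	if _, err := f.Write([]byte("test content: 测试")); err != nil {
		log.Fatal(err)
	}
}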
