I have the text below:
str := `
 

Maybe we should all just listen to
records and quit our jobs

— gach White —

AZ QUOTES

 

 

 `
And want to remove ALL empty lines.
I was able to remove the empty lines between the paragraphs with:
str = strings.Replace(str, "\n\n", "\n", -1)
fmt.Println(str)
And ended up with:
Maybe we should all just listen to
records and quit our jobs
— gach White —
AZ QUOTES
So I still have a couple of empty lines at the beginning and a few at the end. How can I get rid of them?
In my app I'm trying to extract the text from all PNG files in the same directory and print it in a clean format. My full code so far is:
package main
import (
"fmt"
"io/ioutil"
"os"
"os/exec"
"path/filepath"
"strings"
_ "image/png"
)
func main() {
var files []string
root := "."
err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
if filepath.Ext(path) == ".png" {
path = strings.TrimSuffix(path, filepath.Ext(path))
files = append(files, path)
}
return nil
})
if err != nil {
panic(err)
}
for _, file := range files {
fmt.Println(file)
err = exec.Command(`tesseract`, file+".png", file).Run()
if err != nil {
fmt.Printf("Error: %s\n", err)
} else {
b, err := ioutil.ReadFile(file + ".txt") // just pass the file name
if err != nil {
fmt.Print(err)
} else {
str := string(b) // convert content to a 'string'
str = strings.Replace(str, "\n\n", "\n", -1)
fmt.Println(str) // print the content as a 'string'
}
}
}
}
Split the string on \n, skip the elements that contain only whitespace, and then join the rest with \n:
func trimEmptyNewLines(str string) string{
strs := strings.Split(str, "\n")
str = ""
for _, s := range strs {
if len(strings.TrimSpace(s)) == 0 {
continue
}
str += s+"\n"
}
str = strings.TrimSuffix(str, "\n")
return str
}
You can use strings.TrimSpace to remove all leading and trailing whitespace:
str = strings.TrimSpace(str)
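Note that strings.TrimSpace only touches the two ends of the string, so blank lines between the paragraphs stay where they are. A minimal sketch of that behaviour follows; the sample string and the helper name trimEnds are made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// trimEnds is a hypothetical helper that simply wraps strings.TrimSpace.
func trimEnds(s string) string {
	return strings.TrimSpace(s)
}

func main() {
	s := "\n \n\nfirst line\n\nsecond line\n\n \n"
	// %q makes the newlines that remain inside the string visible.
	fmt.Printf("%q\n", trimEnds(s))
}
```

The interior "\n\n" survives, so this would probably need to be combined with one of the line-filtering approaches to remove all empty lines.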
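A slightly different answer: scan the bytes for the first and last characters that are not a newline or a space, and slice between them.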
package main
import (
"fmt"
)
func main() {
str := `
Maybe we should all just listen to
records and quit our jobs
— gach White —
AZ QUOTES
`
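first := -1 // index of the first non-whitespace byte; -1 means none seen yet
last := -1 // index of the last non-whitespace byte
for i, c := range []byte(str) {
if c != '\n' && c != ' ' {
if first == -1 {
first = i
}
last = i
}
}
if first == -1 {
str = "" // the string contains only newlines and spaces
} else {
str = str[first : last+1]
}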
fmt.Print(str)
}
I copied your string and turned it into JSON:
package main
import (
"encoding/json"
"log"
)
func main() {
// The string from the original post.
myString := `
Maybe we should all just listen to
records and quit our jobs
— gach White —
AZ QUOTES
`
// Marshal to json.
data, err := json.Marshal(myString)
if err != nil {
log.Fatalf("Failed to marshal string to JSON.\nError: %s", err.Error())
}
// Print the string to stdout.
println(string(data))
}
It'll probably be easier to see the whitespace in JSON.
"\n \n\nMaybe we should all just listen to\nrecords and quit our jobs\n\n— gach White —\n\nAZ QUOTES\n\n \n\n \n\n "
Do you see the problem here? There are a couple of spaces between your newline characters. Additionally, you have uneven numbers of newline characters, so replacing \n\n with \n won't behave as you'd like it to.
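Both effects can be reproduced on a tiny input. The sketch below uses made-up sample strings and a hypothetical helper name, collapseDoubleNewlines, that mirrors the replacement from the question:

```go
package main

import (
	"fmt"
	"strings"
)

// collapseDoubleNewlines mirrors the question's approach (hypothetical name).
func collapseDoubleNewlines(s string) string {
	return strings.Replace(s, "\n\n", "\n", -1)
}

func main() {
	// A line holding a single space sits between "a" and "b".
	fmt.Printf("%q\n", collapseDoubleNewlines("a\n \n\nb"))
	// Three newlines in a row: matches do not overlap, so one empty line is left.
	fmt.Printf("%q\n", collapseDoubleNewlines("a\n\n\nb"))
}
```

In the first case the whitespace-only line is still there afterwards, and in the second an empty line remains.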
I see one of your goals is this:
And want to remove ALL empty lines.
(I'm not addressing extracting text from PNG files, as that's a separate question.)
package main
import (
"encoding/json"
"log"
"strings"
)
func main() {
// The string from the original post.
myString := `
Maybe we should all just listen to
records and quit our jobs
— gach White —
AZ QUOTES
`
// Create a resulting string.
result := ""
// Iterate through the lines in this string.
for _, line := range strings.Split(myString, "\n") {
if line = strings.TrimSpace(line); line != "" {
result += line + "\n"
}
}
// Print the result to stdout.
println(result)
// Marshal the result to JSON.
resultJSON, err := json.Marshal(result)
if err != nil {
log.Fatalf("Failed to marshal result to JSON.\nError: %s", err.Error())
}
println(string(resultJSON))
}
Output (the built-in println writes to stderr rather than stdout):
Maybe we should all just listen to
records and quit our jobs
— gach White —
AZ QUOTES
"Maybe we should all just listen to\nrecords and quit our jobs\n— gach White —\nAZ QUOTES\n"
It looks like you have whitespace in between, e.g.
\n \n
So doing a regexp replace with the regular expression \n[ \t]*\n might be more sensible.
This won't remove single empty lines at the beginning, though. For that you would use ^\n* and replace with an empty string.
Refining this a bit further, you can add more whitespace characters such as \f and handle multiple empty lines at once:
\n([ \t\f]*\n)+
\n a newline
(...)+ followed by one or more
[ \t\f]*\n empty lines
This clears all empty lines in between, but may keep white space at the beginning or the end of the string. As suggested in other answers, adding a strings.TrimSpace() takes care of this.
Putting everything together gives
https://play.golang.org/p/E07ZkE2nlcp
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
str := `
Maybe we should all just listen to
records and quit our jobs
— gach White —
AZ QUOTES
`
re := regexp.MustCompile(`\n([ \t\f]*\n)+`)
str = re.ReplaceAllString(str, "\n")
str = strings.TrimSpace(str)
fmt.Println("---")
fmt.Println(str)
fmt.Println("---")
}
which finally shows
---
Maybe we should all just listen to
records and quit our jobs
— gach White —
AZ QUOTES
---
Related
I currently have a script that runs an OS command that returns a great deal of data. At the end of the data it gives a total, like this:
N Total.
N can be any number from 0 upward.
I want to perform this command, and take N then put it into a value. I have the command running and I'm storing it in a bytes.Buffer, however I'm unsure how to scrape this so that I only get the number. The "N Total." string is always at the end of the output. Any help would be appreciated as I've seen various different methods but they all seem quite convoluted.
You can use a bufio.Scanner to read the command's output line-wise. Then just remember the last line and parse it once the command has finished.
package main
import (
"bufio"
"fmt"
"io"
"os/exec"
"strings"
)
func main() {
r, w := io.Pipe()
cmd := exec.Command("fortune")
cmd.Stdout = w
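go func() {
// Close only the write end, passing along any error from the command;
// the scanner then sees EOF (or that error) once all output has been read.
w.CloseWithError(cmd.Run())
}()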
sc := bufio.NewScanner(r)
var lastLine string
for sc.Scan() {
line := sc.Text()
fmt.Println("debug:", line)
if strings.TrimSpace(line) != "" {
lastLine = line
}
}
fmt.Println(lastLine)
}
Sample output:
debug: "Get back to your stations!"
debug: "We're beaming down to the planet, sir."
debug: -- Kirk and Mr. Leslie, "This Side of Paradise",
debug: stardate 3417.3
stardate 3417.3
Parsing lastLine is left as an exercise for the reader.
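For completeness, one possible way to do that parsing is sketched below. It assumes the last line really has the form "N Total." from the question; parseTotal is a hypothetical helper name, not part of the answer above:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseTotal extracts N from a line of the form "N Total." (hypothetical helper).
func parseTotal(line string) (int, error) {
	fields := strings.Fields(line) // split on any run of whitespace
	if len(fields) == 0 {
		return 0, errors.New("empty line")
	}
	return strconv.Atoi(fields[0]) // the number is assumed to be the first field
}

func main() {
	n, err := parseTotal("1001 Total.")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(n)
}
```

Taking the first whitespace-separated field avoids depending on the exact wording or punctuation after the number.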
You can split the string by \n and get the last line.
package main
import (
"fmt"
"strconv"
"strings"
)
func main() {
output := `
Some os output
Some more os output
Again some os output
1001 Total`
// If you're getting the string from the bytes.Buffer do this:
// output := myBytesBuffer.String()
outputSplit := strings.Split(strings.TrimSpace(output), "\n") // Break into lines
// The total is assumed to be on the last line; change the index if it is not.
lastLine := outputSplit[len(outputSplit)-1]
lastLine = strings.Replace(lastLine, " Total", "", -1) // Remove text
number, err := strconv.Atoi(lastLine) // Convert from text to number
if err != nil {
fmt.Println("Error:", err)
return
}
fmt.Println(number)
}
peterSO points out that for big output the above may be slow.
Here's another way that uses a compiled regular expression to match against a small subset of the bytes.
package main
import (
"bytes"
"fmt"
"os/exec"
"regexp"
"strconv"
)
func main() {
// Create regular expression. You only create this once.
// Would be regexpNumber := regexp.MustCompile(`(\d+) Total`) for you
regexpNumber := regexp.MustCompile(`(\d+) bits physical`)
// Whatever your os command is
command := exec.Command("cat", "/proc/cpuinfo")
output, _ := command.Output()
// Your bytes.Buffer
var b bytes.Buffer
b.Write(output)
// Get end of bytes slice
var end []byte
if b.Len()-200 > 0 {
end = b.Bytes()[b.Len()-200:]
} else {
end = b.Bytes()
}
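// Get matches. matches[1] contains your number
matches := regexpNumber.FindSubmatch(end)
if matches == nil {
// FindSubmatch returns nil when nothing matched; indexing it would panic.
fmt.Println("no match found")
return
}
// Convert bytes to int
number, _ := strconv.Atoi(string(matches[1])) // Convert from text to number
fmt.Println(number)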
}
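If the total is always on the last line, a regular expression may not be needed at all. The sketch below is an alternative under that assumption: it looks for the last newline in the buffer's bytes instead of splitting the whole output. The helper name lastLine and the sample data are made up for illustration:

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

// lastLine returns the final non-empty line of buf without splitting all of it.
func lastLine(buf []byte) []byte {
	buf = bytes.TrimRight(buf, "\r\n \t") // drop trailing newlines and blanks
	if i := bytes.LastIndexByte(buf, '\n'); i >= 0 {
		return buf[i+1:]
	}
	return buf // the output was a single line
}

func main() {
	// Sample output standing in for the contents of the bytes.Buffer.
	out := []byte("Some os output\nSome more os output\n1001 Total.\n")
	fields := bytes.Fields(lastLine(out))
	if len(fields) == 0 {
		fmt.Println("no output")
		return
	}
	n, err := strconv.Atoi(string(fields[0]))
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(n)
}
```

With a real bytes.Buffer, the slice passed to lastLine would come from its Bytes method.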
I just started with Go, and I'm having a little trouble accomplishing what I want to do. I am loading a large text file in which each line begins with a word I want in my array, followed by single- and multi-space-delimited text I do not care about.
My first line of code creates an array of lines
lines := strings.Split( string( file ), "\n" )
The next step would be to truncate each line, which I can do with a split statement. I'm sure I could do this with a for loop, but I'm trying to learn some of the more efficient operations in Go (compared to C/C++).
I was hoping I could do something like this
lines := strings.Split( (lines...), " " )
Is there a better way to do this or should I just use some type of for loop?
Use bufio.NewScanner, then word := strings.Fields(scanner.Text()) and slice = append(slice, word[0]), as in this working sample:
package main
import (
"bufio"
"fmt"
"strings"
)
func main() {
s := ` wanted1 not wanted
wanted2 not wanted
wanted3 not wanted
`
slice := []string{}
// scanner := bufio.NewScanner(os.Stdin)
scanner := bufio.NewScanner(strings.NewReader(s))
for scanner.Scan() {
word := strings.Fields(scanner.Text())
if len(word) > 0 {
slice = append(slice, word[0])
}
}
fmt.Println(slice)
}
Or use strings.Fields(line), as in this working sample:
package main
import "fmt"
import "strings"
func main() {
s := `
wanted1 not wanted
wanted2 not wanted
wanted3 not wanted
`
lines := strings.Split(s, "\n")
slice := make([]string, 0, len(lines))
for _, line := range lines {
words := strings.Fields(line)
if len(words) > 0 {
slice = append(slice, words[0])
}
}
fmt.Println(slice)
}
output:
[wanted1 wanted2 wanted3]
If I have a multi line string like
this is a line

this is another line
what is the best way to remove the empty line? I could make it work by splitting, iterating, and doing a condition check, but is there a better way?
Similar to ΔλЛ's answer it can be done with strings.Replace:
func Replace(s, old, new string, n int) string
Replace returns a copy of the string s with the first n non-overlapping instances of old replaced by new. If old is empty, it matches at the beginning of the string and after each UTF-8 sequence, yielding up to k+1 replacements for a k-rune string. If n < 0, there is no limit on the number of replacements.
package main
import (
"fmt"
"strings"
)
func main() {
var s = `line 1
line 2
line 3`
s = strings.Replace(s, "\n\n", "\n", -1)
fmt.Println(s)
}
https://play.golang.org/p/lu5UI74SLo
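One caveat with a single Replace pass: matches do not overlap, so three or more consecutive newlines still leave an empty line behind. A possible workaround, sketched here with a hypothetical helper name, is to repeat the replacement until no "\n\n" is left:

```go
package main

import (
	"fmt"
	"strings"
)

// collapseNewlines repeats the replacement until no empty line remains
// between lines (hypothetical helper, not from the answer above).
func collapseNewlines(s string) string {
	for strings.Contains(s, "\n\n") {
		s = strings.Replace(s, "\n\n", "\n", -1)
	}
	return s
}

func main() {
	s := "line 1\n\n\nline 2\n\nline 3"
	fmt.Println(collapseNewlines(s))
}
```

This only handles truly empty lines. Lines that contain spaces or tabs, and a single leading or trailing newline, are left alone.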
Assuming that you want the same string with the empty lines removed as output, I would use regular expressions:
package main
import (
"fmt"
"regexp"
)
func main() {
var s = `line 1
line 2
line 3`
regex, err := regexp.Compile("\n\n")
if err != nil {
return
}
s = regex.ReplaceAllString(s, "\n")
fmt.Println(s)
}
A more generic approach might look something like this:
package main
import (
"fmt"
"regexp"
"strings"
)
func main() {
s := `
####
####
####
####
`
fmt.Println(regexp.MustCompile(`[\t\r\n]+`).ReplaceAllString(strings.TrimSpace(s), "\n"))
}
https://play.golang.org/p/uWyHfUIDw-o
I have this little bit of code that kept me busy the whole weekend.
package main
import (
"encoding/csv"
"fmt"
"log"
"os"
)
func main() {
f, err := os.Create("./test.csv")
if err != nil {
log.Fatalf("Error: %s", err)
}
defer f.Close()
w := csv.NewWriter(f)
var record []string
record = append(record, "Unquoted string")
s := "Cr#zy text with , and \\ and \" etc"
record = append(record, s)
fmt.Println(record)
w.Write(record)
record = make([]string, 0)
record = append(record, "Quoted string")
s = fmt.Sprintf("%q", s)
record = append(record, s)
fmt.Println(record)
w.Write(record)
w.Flush()
}
When run it prints out:
[Unquoted string Cr#zy text with , and \ and " etc]
[Quoted string "Cr#zy text with , and \\ and \" etc"]
The second, quoted text is exactly what I would wish to see in the CSV, but instead I get this:
Unquoted string,"Cr#zy text with , and \ and "" etc"
Quoted string,"""Cr#zy text with , and \\ and \"" etc"""
Where do those extra quotes come from and how do I avoid them?
I have tried a number of things, including strconv.Quote and the like, but I can't seem to find a perfect solution. Help, please?
It's part of the standard for storing data as CSV.
Double quote characters need to be escaped for parsing reasons.
A (double) quote character in a field must be represented by two (double) quote characters.
From: http://en.wikipedia.org/wiki/Comma-separated_values
You don't really have to worry because the CSV reader un-escapes the double quote.
Example:
package main
import (
"encoding/csv"
"fmt"
"os"
)
func checkError(e error){
if e != nil {
panic(e)
}
}
func writeCSV(){
fmt.Println("Writing csv")
f, err := os.Create("./test.csv")
checkError(err)
defer f.Close()
w := csv.NewWriter(f)
s := "Cr#zy text with , and \\ and \" etc"
record := []string{
"Unquoted string",
s,
}
fmt.Println(record)
w.Write(record)
record = []string{
"Quoted string",
fmt.Sprintf("%q",s),
}
fmt.Println(record)
w.Write(record)
w.Flush()
}
func readCSV(){
fmt.Println("Reading csv")
file, err := os.Open("./test.csv")
checkError(err) // check before deferring Close, otherwise this error is lost
defer file.Close()
cr := csv.NewReader(file)
records, err := cr.ReadAll()
checkError(err)
for _, record := range records {
fmt.Println(record)
}
}
func main() {
writeCSV()
readCSV()
}
Output
Writing csv
[Unquoted string Cr#zy text with , and \ and " etc]
[Quoted string "Cr#zy text with , and \\ and \" etc"]
Reading csv
[Unquoted string Cr#zy text with , and \ and " etc]
[Quoted string "Cr#zy text with , and \\ and \" etc"]
Here's the code for the write function.
func (w *Writer) Write(record []string) (err error)
I have a CSV file containing a line with a double-quoted string, like:
text;//*[#class="price"]/span;text
The csv Reader returned an error when reading this file.
What helped was:
reader := csv.NewReader(file)
reader.LazyQuotes = true
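A small sketch of the difference, using the sample line above. Setting Comma to ';' is an assumption based on the semicolons in that line, and parseLine is a hypothetical helper name:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLine reads a single semicolon-separated record (hypothetical helper).
func parseLine(line string, lazy bool) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = ';' // assumption: the file is semicolon-delimited
	// With LazyQuotes set, a quote may appear inside an unquoted field.
	r.LazyQuotes = lazy
	return r.Read()
}

func main() {
	line := `text;//*[#class="price"]/span;text`
	if _, err := parseLine(line, false); err != nil {
		fmt.Println("strict:", err)
	}
	record, err := parseLine(line, true)
	if err != nil {
		fmt.Println("lazy:", err)
		return
	}
	fmt.Printf("%q\n", record)
}
```

In strict mode the reader rejects the bare quotes in the middle field; with LazyQuotes the field is returned as written.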
The s variable's value is not what you think it is. http://play.golang.org/p/vAEYkINWnm
The file names.txt consists of many names in the form of:
"KELLEE","JOSLYN","JASON","INGER","INDIRA","GLINDA","GLENNIS"
Does anyone know how to split the string so that it is individual names separated by commas?
KELLEE,JOSLYN,JASON,INGER,INDIRA,GLINDA,GLENNIS
The following code splits by comma but leaves the quotes around each name. What is the escape character needed to split out the "? Can it be done in one Split statement, splitting on "," and leaving a comma as the separator?
package main
import "fmt"
import "io/ioutil"
import "strings"
func main() {
fData, err := ioutil.ReadFile("names.txt") // read in the external file
if err != nil {
fmt.Println("Err is ", err) // print any error
}
strbuffer := string(fData) // convert read in file to a string
arr := strings.Split(strbuffer, ",")
fmt.Println(arr)
}
By the way, this is part of Project Euler problem # 22. http://projecteuler.net/problem=22
Jeremy's answer is basically correct and does exactly what you asked for. But the format of your "names.txt" file is actually well known: it is called CSV (comma-separated values). Luckily, Go comes with an encoding/csv package (part of the standard library) for decoding and encoding such formats easily. Compared with your and Jeremy's solution, this package also gives exact error messages if the format is invalid, supports multi-line records, and properly unquotes quoted strings.
The basic usage looks like this:
package main
import (
"encoding/csv"
"fmt"
"io"
"os"
)
func main() {
file, err := os.Open("names.txt")
if err != nil {
fmt.Println("Error:", err)
return
}
defer file.Close()
reader := csv.NewReader(file)
for {
record, err := reader.Read()
if err == io.EOF {
break
} else if err != nil {
fmt.Println("Error:", err)
return
}
fmt.Println(record) // record has the type []string
}
}
There is also a ReadAll method that might make your program even shorter, assuming the whole file fits into memory.
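A minimal sketch of the ReadAll variant. To stay self-contained it reads from a string holding a shortened copy of the sample data; a real program would pass the opened file to csv.NewReader instead. The helper name readNames is hypothetical:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readNames parses CSV input in one go and flattens all records into one
// slice (hypothetical helper).
func readNames(input string) ([]string, error) {
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return nil, err
	}
	var names []string
	for _, record := range records {
		names = append(names, record...)
	}
	return names, nil
}

func main() {
	names, err := readNames(`"KELLEE","JOSLYN","JASON"`)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(strings.Join(names, ","))
}
```

The quotes are removed by the reader, so joining the names with a comma gives the format asked for in the question.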
Update: dystroy has just pointed out that your file has only one line anyway. The CSV reader works well for that too, but the following, less general solution should also be sufficient:
var name string
for {
// %q reads one double-quoted name; stop when no name could be scanned.
if n, _ := fmt.Fscanf(file, "%q,", &name); n != 1 {
break
}
fmt.Println("name:", name)
}
Split doesn't remove characters from the substrings. Your split is fine; you just need to process the slice afterwards with strings.Trim(val, "\"").
for i, val := range arr {
arr[i] = strings.Trim(val, "\"")
}
Now arr will have the leading and trailing "s removed.
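As for doing it with a single Split call: one possible alternative is to trim the outer quotes first and then split on the three-character separator `","`. This is a sketch under the assumption that every name is quoted and nothing but a trailing newline surrounds the data; splitNames is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// splitNames strips the outermost quotes and splits on `","` (hypothetical helper).
func splitNames(s string) []string {
	s = strings.TrimSpace(s) // drop a trailing newline, if any
	s = strings.Trim(s, `"`) // remove the first and last quote
	return strings.Split(s, `","`)
}

func main() {
	data := `"KELLEE","JOSLYN","JASON"` + "\n"
	fmt.Println(strings.Join(splitNames(data), ","))
}
```

Unlike the encoding/csv package, this does no validation, so malformed input would silently give odd results.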