Replacing NewLine Character with an Empty String - excel

I want to add a new column with blank fields in my existing CSV data.
My current code is like this:
Dim data As IEnumerable(Of String) =
    File.ReadLines(filename, Encoding.GetEncoding("iso-8859-1")).
        Select(Function(line, index)
                   If index = 0 Then
                       Return "new_column," & line
                   Else
                       Return "," & line
                   End If
               End Function)
File.WriteAllLines(savePath, data)
The problem is that the new column is also being added at the line breaks inside cells. So what I did was open the CSV file in Excel and use the following steps:
Ctrl + H to open the Find and Replace dialog box.
In the Find what text box, I used Ctrl + J to enter a line break character. I followed the instructions here
I tried using ReadAllText, but the rows are not in the correct order after writing the file.
Is there a way to do in VB.NET the same thing I am doing in Excel?

With a CSV file saved from Excel, if there is a line break in a cell then the cell value will be surrounded with double-quotes and the line break is represented by a Chr(10).
For example, a row whose third cell contains a line break, saved as a CSV file and opened in a hex editor, gives (note: 10 decimal = 0A hexadecimal)
43 6F 6C 20 41 2C 43 6F 6C 20 42 2C 22 4C 69 6E Col A,Col B,"Lin
65 0A 62 72 65 61 6B 22 2C 43 6F 6C 20 44 0D 0A e·break",Col D··
So you need something which will regard a line break inside a double-quoted string as not being a new line.
The TextFieldParser can be configured to do that by setting the .HasFieldsEnclosedInQuotes property to True.
For example, with the above data,
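Option Infer On
Option Strict On
Imports System.IO
Imports System.Linq
Imports System.Text
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic.FileIO
Module Module1
    Sub Main()
        Dim src = "C:\temp\Book1.csv"
        Dim dest = "C:\temp\newColumn.csv"
        Dim enc = Encoding.GetEncoding("iso-8859-1")
        ' Read with the same encoding used for writing (the parser defaults to UTF-8).
        Using tfp As New TextFieldParser(src, enc)
            tfp.HasFieldsEnclosedInQuotes = True
            tfp.Delimiters = {","}
            ' Fields containing a comma or a line feed must be re-quoted on output.
            Dim re As New Regex("[,\n]")
            Using sw As New StreamWriter(dest, False, enc)
                Dim isHeader = True
                While Not tfp.EndOfData
                    Dim thisLine = tfp.ReadFields()
                    ' Only the header row gets the column name; data rows get an empty field.
                    Dim prefix = If(isHeader, "new_column,", ",")
                    sw.WriteLine(prefix & String.Join(",", thisLine.Select(Function(p) If(re.IsMatch(p), Chr(34) & p & Chr(34), p))))
                    isHeader = False
                End While
            End Using
        End Using
    End Sub
End Module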
generates
6E 65 77 5F 63 6F 6C 75 6D 6E 2C 43 6F 6C 20 41 new_column,Col A
2C 43 6F 6C 20 42 2C 22 4C 69 6E 65 0A 62 72 65 ,Col B,"Line·bre
61 6B 22 2C 43 6F 6C 20 44 0D 0A ak",Col D··
This output can be opened in Excel.
You may need to make it more robust than putting double-quotes around only the entries that contain a Chr(10); for example, entries containing a comma need the quotes too.
I wrote it to quote fields containing commas as well, although that isn't shown in the example data.
Of course, once you have the individual entries from a line in the array thisLine, you could replace the Chr(10) with a space, if desired.
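For comparison, the same idea can be sketched outside VB.NET. This is a minimal Python sketch, assuming the standard csv module is an acceptable stand-in for TextFieldParser; the function name add_blank_column and the sample text are illustrative and do not come from the question.

```python
import csv
import io


def add_blank_column(src_text: str, header: str = "new_column") -> str:
    """Prepend a column to CSV text, keeping quoted line breaks inside their fields."""
    # newline="" stops the text layer from translating line endings,
    # so the csv module sees the embedded line feed exactly as stored.
    reader = csv.reader(io.StringIO(src_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    for index, row in enumerate(reader):
        # Only the header row gets the column name; data rows get an empty field.
        writer.writerow([header if index == 0 else ""] + row)
    return out.getvalue()


# Hypothetical sample: a header row with a line break in its third cell, then one data row.
sample = 'Col A,Col B,"Line\nbreak",Col D\r\n1,2,3,4\r\n'
result = add_blank_column(sample)
print(repr(result))
```

The writer re-quotes any field that contains a comma, a double-quote or a line break, which covers the robustness point made above.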

Related

Can't convert utf8 buffer into ISO8859_1 buffer without data loss

As an example, I'm trying to convert the following utf8 string into ISO8859_1
String = Plœmeurl’
The String's hex representation: 50 6c c5 93 6d 65 75 72 6c e2 80 99 20
After iconv-lite conversion to ISO8859_1:
50 6c 3f 6d 65 75 72 6c 3f 20
The hex value once inserted in my ISO8859_1 database looks like this:
Code:
var iconv = require('iconv-lite');
iconv.encode(Buffer.from(`Plœmeurl’ `, 'utf8'), "ISO-8859-1")
Am I doing something silly?
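The byte values in the question can be reproduced with a small Python sketch. It assumes Python's built-in "iso-8859-1" codec with errors="replace" behaves like the conversion shown above (unmappable characters become 3f); the variable names are illustrative.

```python
text = "Plœmeurl’ "

# The UTF-8 bytes quoted in the question.
utf8_hex = text.encode("utf-8").hex(" ")

# ISO-8859-1 only covers U+0000 to U+00FF, so anything above that has no byte value.
missing = [ch for ch in text if ord(ch) > 0xFF]

# With errors="replace", each unmappable character becomes "?" (0x3f).
latin1_hex = text.encode("iso-8859-1", errors="replace").hex(" ")

print(utf8_hex)
print(missing)
print(latin1_hex)
```

Under those assumptions the two question marks are the expected result rather than a coding mistake: œ (U+0153) and ’ (U+2019) have no ISO-8859-1 code, so some loss is unavoidable with that target encoding.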

Excel cannot import a CSV file containing fields with multiple lines

Microsoft Excel will write a CSV file containing fields with multiple lines. The newlines are 0A (UNIX-style) instead of 0D0A.
However, it will not correctly read the .csv file it just wrote. The fields that contain 0A newlines become new rows. How can this be overcome?
This Excel spreadsheet is saved as t-xl.xlsx and as a CSV file named t-xl.csv.
PS H:\r> Format-Hex .\t-xl.csv
Path: H:\r\t-xl.csv
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 66 31 2C 66 32 0D 0A 31 2C 66 6F 72 0D 0A 32 2C f1,f2..1,for..2,
00000010 22 6E 6F 77 0A 69 73 0A 74 68 65 22 0D 0A 33 2C "now.is.the"..3,
00000020 74 69 6D 65 0D 0A time..
When the t-xl.csv is loaded, Excel seems to remember and handle the newlines correctly (as it was in t-xl.xlsx).
However, when using Data > From Text, it will not handle the newlines correctly.
At least one CSV reference describes support for newlines embedded in fields. Is there any reason Microsoft Excel does not support this? http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
The Excel legacy text import wizard does not treat a quoted line break as part of the field; it splits the record there.
Opening the file directly with Excel, as you have seen, will respect the quoted line-breaks.
If you have Excel 2010+, you can use Power Query to Get & Transform from Text/CSV. There is an option to enable this (I believe it is enabled by default).
The only work-around for the legacy wizard of which I am aware would be to pre-process the file replacing the quoted line-break with something else, and then processing it again after import to replace "something else" with the line-break.
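The pre-processing step could look something like the following Python sketch. The placeholder text "<LF>" and the function name flatten_line_breaks are assumptions made for illustration; any marker that cannot occur in the real data would do, and it is replaced with a line break again after the import.

```python
import csv
import io

PLACEHOLDER = "<LF>"  # hypothetical marker; choose text that never appears in the data


def flatten_line_breaks(csv_text: str, placeholder: str = PLACEHOLDER) -> str:
    """Replace line breaks inside quoted fields so every record fits on one line."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\r\n")
    for row in reader:
        # Handle CRLF first so a Windows-style break yields a single placeholder.
        writer.writerow(
            [field.replace("\r\n", placeholder).replace("\n", placeholder) for field in row]
        )
    return out.getvalue()


# The same content as t-xl.csv in the hex dump above.
sample = 'f1,f2\r\n1,for\r\n2,"now\nis\nthe"\r\n3,time\r\n'
flattened = flatten_line_breaks(sample)
print(flattened)
```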

Go to next item with Find function in HxD editor

I have a file that I am trying to find some items in.
Here is an example extract:
68 65 6C 6C 6F 20 77 6F 72 6C 64 20 68 6F 77 20 61 72 65 20 79 6F 75 20 68 65 6C 6C 6F 20 77 6F 72 6C 64
In my editor it shows up as:
hello world how are you hello world
I used Ctrl + F to bring up Find.
I entered what I was looking for (hello) in Search for and pressed Search. The first result shows up, but my file contains multiple results as you can see above. How do I move to the next result?
I.e. from the first hello, to the second.
For the version of HxD I am using (1.7.7.0), this function is bound to F3. To search in the reverse direction, press Shift+F3.
Alternatively, both options are selectable under 'Search' located on the menu bar.

Groovy gives error expecting EOF, found '?' @ line 9, column 25

I'm using the following code to generate a random number in Groovy. I can run it in e.g. the Groovy Web Console (https://groovyconsole.appspot.com/) and it works, but it fails when I try to run it in Mule. Here is the code I use:
log.info ">>run"
Random random = new Random()
def ranInt = random.nextInt()
def ran = Math.abs(​ranInt)​%20​0;
log.info ">>sleep counter:"+flowVars.counter+" ran: "+ran
sleep(ran)
And here is an exception that gets thrown:
Caused by:
org.codehaus.groovy.control.MultipleCompilationErrorsException:
startup failed: Script26.groovy: 9: expecting EOF, found '?' @ line 9,
column 25. def ran = Math.abs(?400)?%20?0;
^
1 error
You have some extra Unicode characters in line 4 of the code you posted. If you convert that line to hex you will get:
64 65 66 20 72 61 6e 20 3d 20 4d 61 74 68 2e 61 62 73 28 e2 80 8b 72 61 6e 49 6e 74 29 e2 80 8b 25 32 30 e2 80 8b 30 3b
Now if you convert this hex back to ascii, you will get:
def ran = Math.abs(​ranInt)​%20​0;
There is a zero-width space (U+200B, encoded in UTF-8 as e2 80 8b) after the first (, after the ) and after the first 0. If you remove these characters, your code will compile correctly.
Here is the hex of the cleaned line:
64 65 66 20 72 61 6e 20 3d 20 4d 61 74 68 2e 61 62 73 28 72 61 6e 49 6e 74 29 25 32 30 30 3b
And the line itself:
def ran = Math.abs(ranInt)%200;
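Invisible characters like these are easier to find programmatically than by eye. The following Python sketch reports and strips characters in the Unicode "Cf" (format) category, which includes U+200B; the function names are illustrative, and other invisible characters may fall in different categories.

```python
import unicodedata


def find_format_chars(line: str) -> list[tuple[int, str]]:
    """Return (index, code point) for every invisible format character (category Cf)."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(line)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_format_chars(line: str) -> str:
    """Remove every character in category Cf from the line."""
    return "".join(ch for ch in line if unicodedata.category(ch) != "Cf")


# The offending line, with the zero-width spaces written as explicit escapes.
bad = "def ran = Math.abs(\u200branInt)\u200b%20\u200b0;"
positions = find_format_chars(bad)
cleaned = strip_format_chars(bad)
print(positions)
print(cleaned)
```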

bash command truncating

I have a bash file with the content
cd /var/www/path/to/folder
git pull
When I run it I get
: No such file or directorywww/path/to/folder
' is not a git command. See 'git --help'.
Did you mean this?
pull
Any idea why bash gets a truncated version of commands?
You have carriage returns (Windows text file line endings) in your bash script. Remove them.
The bash file should look like this under hexdump -C:
00000000 63 64 20 2f 76 61 72 2f 77 77 77 2f 70 61 74 68 |cd /var/www/path|
00000010 2f 74 6f 2f 66 6f 6c 64 65 72 0a 67 69 74 20 70 |/to/folder.git p|
00000020 75 6c 6c 0a |ull.|
00000024
But yours looks like this instead:
00000000 63 64 20 2f 76 61 72 2f 77 77 77 2f 70 61 74 68 |cd /var/www/path|
00000010 2f 74 6f 2f 66 6f 6c 64 65 72 0d 0a 67 69 74 20 |/to/folder..git |
00000020 70 75 6c 6c 0d 0a |pull..|
Note the extra 0d's (hex 0D = decimal 13 = ASCII carriage return, ANSI \r) in front of the 0a's (hex 0A = decimal 10 = ASCII linefeed, ANSI \n, which is what bash treats as the end of a line).
A carriage return is not whitespace in bash, so it is treated as part of the last argument on the command line. You're getting errors because the folder /var/www/path/to/folder\r doesn't exist and pull\r isn't a valid git subcommand.
When printed, a carriage return moves the cursor to the start of the line, which is why your error messages look wrong. Bash and git are printing something like foo.bash: line 1: cd: /var/www/path/to/folder\r: No such file or directory and git: 'pull\r' is not a git command. See 'git --help', but because the \r moves the cursor to the start of the line, the tail end of each message overwrites its beginning.
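There's a program called dos2unix that converts a text file from DOS to Unix line endings. Given a file name it converts that file in place, so use its new-file mode to write to a separate file:
dos2unix -n filename newfilename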
But that conversion really consists of nothing but deleting the carriage returns, which you could also do explicitly with tr:
tr -d '\r' <filename >newfilename
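The same deletion can be sketched in Python, working on bytes so that no line-ending translation gets in the way. The function name and the inline sample are illustrative; in practice the bytes would be read from and written back to the script file.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Delete every carriage return, the equivalent of tr -d '\\r'."""
    return data.replace(b"\r", b"")


# The script from the question as saved with Windows line endings.
script = b"cd /var/www/path/to/folder\r\ngit pull\r\n"
fixed = strip_carriage_returns(script)

print(script.count(b"\r"), "carriage returns removed")
print(fixed)
```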
