Convert multiple lines of text to Array - text

So I have a really long text document (10 million lines) and I want to convert that document to an array.
For example, my text document is:
a
b
c
and I want to convert this to:
["a","b","c"]
How can I do that?

First read the document as plain text, then call the split method with '\n' as the separator. This splits the document at every newline, giving one array element per line.
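The question does not name a language, so here is a minimal sketch in Python. The function names and the UTF-8 encoding are assumptions for illustration. It also shows one detail worth knowing: a plain split on '\n' leaves an empty last element when the text ends with a newline, which splitlines() avoids.

```python
def text_to_lines(text):
    # splitlines() handles "\n" and "\r\n" and does not produce a
    # trailing empty string when the text ends with a newline.
    return text.splitlines()


def file_to_lines(path, encoding="utf-8"):
    # Iterating over the file reads it line by line; the encoding is an
    # assumption and should match the actual document.
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


lines = text_to_lines("a\nb\nc\n")
naive = "a\nb\nc\n".split("\n")  # keeps an empty string at the end
```

For a file with 10 million lines, the whole list still has to fit in memory; if it does not, processing the lines one at a time inside the loop is the usual alternative.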

Related

How to Automatically add thousand separators for every number in a string?

How can I add a thousand separator to every number in my string?
So for example this string:
string = "123456,78+1234"
should be displayed as:
TextView = "123.456,78+1.234"
And the string should be editable, so the thousand separators should adapt when I remove or add a digit.
I have already read all the posts I could find about it, but I could never find an up-to-date working answer. So I would be really grateful for your help!
Your question contains two sub-questions:
A. You want to add thousand separators to a string which contains a group of numbers.
B. You want it to change.
And the answers are:
A: Your example uses , as a delimiter, so split the string on that delimiter into an array of strings.
Then iterate over the pieces and insert a dot after every third digit, counting from the right; you can also use String.format("%,d", substr.toLong()), which groups the digits with the default locale's grouping separator.
Lastly, join all of the pieces back together with , as the separator.
B: This one can be done in different ways. You may store the original string somewhere and observe it, so when it changes it goes to the function which does A, and use the function result the way you like (which I suppose is to be set in a TextView).
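As a rough illustration of A and B, here is a Python sketch of the string logic only (the TextView wiring is left out). It swaps in a regular expression for the split-and-join step, so that the + in the example is handled as well; the function names are hypothetical, and it assumes , is the decimal separator and . the thousand separator, as in the question.

```python
import re

# A run of digits, optionally followed by a decimal part such as ",78".
NUMBER = re.compile(r"(\d+)(,\d+)?")


def group_digits(digits, sep="."):
    # Insert sep before every complete group of three digits,
    # counting from the right end of the digit run.
    return re.sub(r"(?<=\d)(?=(?:\d{3})+$)", sep, digits)


def add_thousand_separators(text, sep="."):
    def repl(match):
        integer_part = match.group(1)
        fraction = match.group(2) or ""  # decimal part is left untouched
        return group_digits(integer_part, sep) + fraction

    return NUMBER.sub(repl, text)


def reformat_after_edit(displayed, sep="."):
    # Part B: drop the old separators from the edited text, then regroup.
    return add_thousand_separators(displayed.replace(sep, ""), sep)


formatted = add_thousand_separators("123456,78+1234")
```

Calling reformat_after_edit on every change of the displayed text keeps the separators in the right place after a digit is added or removed.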

Excel VBA Textfile to 2d array

I am new to excel vba. I want to read a textfile that contains text like this:
John Smith Engineer Chicago
Bob Alice Doctor New York
Jane Smith Teacher St. Louis
So, I want to convert this into a 2D array so if I do print(3,3), it should return 'Teacher'.
I am able to read the entire file contents into one string but am having difficulty converting it to a 2D array like the one above. Please advise on how to proceed. Thanks
Unless the text file has some specific structure to it, you're going to struggle a bit. Some questions whose answers might make it easier:
Does the text file contain line breaks at the end of each line?
Are all the names in [FirstName][LastName] format as per your example, or might some have more or fewer words?
Does the Occupation always come directly after the name?
Are there a (very) limited number of Occupations?
As mentioned by NautMeg, you have to make some assumptions about the data based on the provided sample.
However, we can assume that:
a space is the delimiter
The Final column is City, which can contain a space
there are 4 columns
First Name
Last Name
Profession
City/Location
Using this information:
While Not EOF(my_file)
    Line Input #my_file, text_line
    ' text_line contains the current line
    i = i + 1
    ' i is the line number
Wend
is how we retrieve each line. Then split each line on the delimiter:
Split ( Expression, [Delimiter], [Limit], [Compare] )
This will give you each item in the line. Items at index < 3 (0-based index) are distinct columns of data, and you can handle them however you want.
Items at index >= 3 all belong to the city, so join them back together into one string:
Join( SourceArray, [Delimiter] )
You'll likely want to make the delimiter in this case a single space, since the Split function removes the spaces it splits on.
That will allow you to parse the data as is.
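The split-then-join logic above can be sketched in Python to make the steps concrete; this is an illustration of the same idea rather than VBA code, and the names parse_line and parse_table are hypothetical. It relies on the same assumptions: a space delimiter, three fixed columns, and a city that may contain spaces.

```python
SAMPLE = """John Smith Engineer Chicago
Bob Alice Doctor New York
Jane Smith Teacher St. Louis"""


def parse_line(line, delimiter=" ", fixed_columns=3):
    parts = line.split(delimiter)
    # The first three items are their own columns; everything after
    # them is rejoined with the delimiter to rebuild the city.
    return parts[:fixed_columns] + [delimiter.join(parts[fixed_columns:])]


def parse_table(text):
    # One row per non-empty line.
    return [parse_line(line) for line in text.splitlines() if line.strip()]


table = parse_table(SAMPLE)
```

Python lists are 0-based, so the value the question calls (3,3) is table[2][2] here.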
However, for future reference if you can control the export of the text file, you should try exporting as a CSV file.
Good luck

extract data using regex?

I am writing code in Python to extract a paragraph from a document using regex. The data contains a lot of similar words, but I need to extract the paragraph when it hits the first recurring word.
ex: data.txt
extract data
useful data is extracted
extract numbers
useful numbers are extracted
extract variable
useful variables are extracted
The question is, I have to extract only the below:
"extract numbers
useful numbers are extracted"
You can use re.findall with the pattern ("([a-zA-Z].* *\n.[a-zA-Z .,']*)") to find all paragraphs. The same pattern also works for poems.
We save your data in the poem variable:
poem = """extract data
useful data is extracted
extract numbers
useful numbers are extracted
extract variable
useful variables are extracted"""
Now, we find all paragraphs and store them in the par variable:
import re
par = re.findall("([a-zA-Z].* *\n.[a-zA-Z .,']*)",poem)
Now par has three elements, which you can access as par[0], par[1] and par[2]; par[1] is the paragraph you asked for.
par[0] is:
'extract data\nuseful data is extracted'
par[1] is:
'extract numbers\nuseful numbers are extracted'
par[2] is:
'extract variable\nuseful variables are extracted'

Pascal reading a line of text into separate strings

Basically a line looks like this: 'number number text text text', with spaces dividing the items. The numbers are OK, because readln() just splits them at the space, but it reads the 3 texts as one. How can I read them into separate strings?
If anybody else faces this problem, here's a really easy solution I just found: read the whole thing into a string. Then find the first space with pos(' ', s), take the text after it with copy(s, spacepos + 1, 200), and then call delete(s, spacepos, 200) to cut that part off the first string, and voilà.
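The find-the-space-and-cut idea can be sketched in Python as follows. This is an illustration, not Pascal: str.find stands in for pos (it is 0-based and returns -1 when nothing is found), slicing stands in for copy and delete, and the function names are hypothetical.

```python
def split_first_word(line):
    # Locate the first space, like pos(' ', s) in the answer above.
    space_pos = line.find(" ")
    if space_pos == -1:
        return line, ""
    # Text before the space is the word; text after it is the remainder.
    return line[:space_pos], line[space_pos + 1:]


def split_words(line):
    words = []
    rest = line
    while rest:
        word, rest = split_first_word(rest)
        if word:  # skip empty pieces caused by repeated spaces
            words.append(word)
    return words


fields = split_words("12 34 foo bar baz")
```

Repeating the cut until nothing is left yields every item of the line as its own string.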

How to make excel treat text as string in Clojure using data.csv?

I am using data.csv to write export data to a csv file. However, I have some alphanumeric fields which are ids, and since they contain only digits, Excel is treating them as doubles and showing them in exponential form.
Is there a way to tell Excel to treat them as they are?
Excel displays long numbers in csv files in an abbreviated form with exponents.
Unfortunately there is no way to disable that functionality from within the generated csv.
Writing the ids as text shows the same abbreviated format. Your choices are:
1) Assuming the id number has fewer than 16 digits, you can go into Excel and change the cell format.
2) Alternatively, you can prepend an apostrophe or text character to your ids before you generate the csv. For example:
(ns sample.core
  (:use [clojure.data.csv]
        [clojure.java.io]))
(defn gen-csv [filename]
  (with-open [out-file (writer filename)]
    (write-csv out-file
               [["'123000000" "'45612333"]
                ["'789909990" "'90099999124"]])))
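For comparison, here is a small Python sketch of option 2 that adds the prefix programmatically instead of hard-coding it in the data. It uses the standard csv module; the function name write_ids_as_text and the choice of an in-memory buffer are assumptions for illustration.

```python
import csv
import io


def write_ids_as_text(rows, out, prefix="'"):
    # Prepend the prefix to every value so that a spreadsheet is less
    # likely to interpret the field as a number.
    writer = csv.writer(out)
    for row in rows:
        writer.writerow([prefix + str(value) for value in row])


buffer = io.StringIO()
write_ids_as_text([[123000000, 45612333], [789909990, 90099999124]], buffer)
csv_text = buffer.getvalue()
```

To write to a file instead of a buffer, open it with newline="" as the csv module documentation recommends.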
