I have over 20,000 records that are being exported from a program that look like this:
Parent : 000691195
CUSTNO : 115225036-AD
COMPANY : BROOK FURNITURE RENTAL
ADDRESS1 : 100 N FIELD DR
city : LAKE FOREST
STATE : IL
ZIP : 600452580
Parent : 000691195
CUSTNO : 116952265-AD
COMPANY : BROOK FURNITURE RENTAL
ADDRESS1 : 100 N FIELD DR STE 220
city : LAKE FOREST
STATE : IL
ZIP : 600452598
I need to transpose them into usable columns and rows, but I have no idea how to do that from the text file. I have looked at some answers on how to replace the carriage returns with commas, but the last piece of data in each record must not have a comma after it, so that it marks the end of the row, and some of the ADDRESS1 fields are empty anyway.
Any help would be appreciated.
You can complete that task by writing a small script, for example in Perl.
Perl (Practical Extraction and Reporting Language) was designed for text transformations of this kind.
If you do not know any programming language, you could hire a freelancer on upwork.com or a similar site to do it.
So I figured it out. It is actually a combination of a couple of different answers here on StackOverflow.
First I replaced ALL of the carriage returns with "^". This let me open the delimited file later without worrying about commas in company names.
Next, wherever one record ended and the next began, there was the text ^Parent, so I replaced it with \r\nParent, which put each record on its own line.
Finally I deleted all of the extraneous "header" information, which left me with just the data I needed.
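For anyone who would rather script the steps above, here is a minimal sketch in Python. It assumes every record starts with a "Parent" line and every line has the form "NAME : VALUE"; the function names and the FIELDS list are made up for illustration and are not part of the original export.

```python
import csv
import io

# Column order taken from the sample records in the question.
FIELDS = ["Parent", "CUSTNO", "COMPANY", "ADDRESS1", "city", "STATE", "ZIP"]


def records_to_rows(lines, first_field="Parent"):
    """Group 'NAME : VALUE' lines into one dict per record.

    Assumption: a new record begins whenever first_field appears.
    """
    rows, current = [], None
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # ignore lines that have no separator at all
        name, value = name.strip(), value.strip()
        if name == first_field:
            current = {}
            rows.append(current)
        if current is not None:
            current[name] = value  # an empty ADDRESS1 is stored as ""
    return rows


def rows_to_csv(rows, fields=FIELDS):
    """Write the records as CSV text; the csv module quotes embedded commas."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, restval="",
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because the csv module quotes any value that contains a comma, company names with commas do not need a special "^" delimiter, and no trailing comma is written at the end of a row.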
I am working on a dataset with thousands of rows in which the same text entries were captured in different styles, as in the table below:
**School Name**
Abirem school
Abirem sec School
Abirem Secondary school
Abirem second. School
Metropolitan elementary
Metropolitan Element.
Metropolitan ele
I need help extracting one unique value for each group of similar entries, regardless of the style in which it was entered. The output I want should look like this:
**School Name**
Abirem school
Metropolitan elementary
I have tried the functions EXACT, UNIQUE, MATCH and even XLOOKUP (with the wildcard option), but none of them gives me the output I want.
Is there a logical function that can be used?
This will prove to be tricky. Excel cannot know whether two names that look similar are actually meant to be the same. Even for us humans it is not trivial. I mean, would School1 ABC be the same as School1 DEF or not? Without knowing the geographical locations of these two schools, they could well be two different schools that share the first word of their names.
Either way, if you are willing to accept this ambiguity, you could match on the first word of each line and return only the first entry for each match:
Formula in C1:
=LET(a,A1:A7,UNIQUE(XLOOKUP(TEXTSPLIT(a," ")&" *",a&" ",a,,2)))
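The same first-word idea can be sketched outside Excel. The Python below is only an illustration under the same ambiguous assumption (entries that share a first word are treated as the same school); the function name is hypothetical.

```python
def first_by_leading_word(names):
    """Keep the first entry seen for each distinct leading word.

    The comparison ignores case; blank entries are skipped.
    """
    seen = {}
    for name in names:
        words = name.split()
        if not words:
            continue
        # setdefault only stores the name if this first word is new.
        seen.setdefault(words[0].casefold(), name)
    return list(seen.values())


schools = [
    "Abirem school",
    "Abirem sec School",
    "Abirem Secondary school",
    "Abirem second. School",
    "Metropolitan elementary",
    "Metropolitan Element.",
    "Metropolitan ele",
]
print(first_by_leading_word(schools))
# ['Abirem school', 'Metropolitan elementary']
```

Like the formula, this merges any two schools whose names start with the same word, so the result still needs a manual check.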
I have a datasheet in Excel with 154 columns. Column A holds a profile name, for example:
T_Data_Capture_CustomerData_(jp)
What I want to do is add a new column with the full name of the author, derived from the initials at the end of the profile name, e.g. _(jp) gives Johnson, Paul.
I have multiple profiles with different people and their initials, like (ss), (mwp), (an) etc., and I also have the full names of the authors and their initials in a separate datasheet that I can read the data from.
Also, the profile names don't all start the same way and they differ in length. Examples:
P_V8_Intersport_I_WE_IBD_AVIS_SAVE_XRange_(mi)
P_DSV-DM_Fortras-Release-6_BORD128_to_ALFLAT-ALBORD_(ak)
P_V4_100_Gardner_Denver_Credit_to_ALINVOICE_Part_01_Processing_Of_Data_(ss)
It would look something like this:
| Profile name | Author |
| --- | --- |
| T_Data_Capture_CustomerData_(jp) | Johnson, Paul |
| P_V8_Intersport_I_WE_IBD_AVIS_SAVE_XRange_(ss) | Smith, Sophie |
I just don't really know how to achieve this. Any help would be appreciated.
A simple version, which relies on the initials being exactly two letters directly before the closing parenthesis at the end of the name (with the initials and full names in E2:F3):
=VLOOKUP(LEFT(RIGHT(A1,3),2),E2:F3,2,0)
If the initials can be longer, you can use MID() with FIND() to locate the ( as the start and the ) as the end.
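As a sketch of that lookup outside Excel, the Python below pulls whatever sits between the last pair of parentheses and looks it up in a table of initials. The `authors` dictionary stands in for the separate datasheet and the function name is made up for the example.

```python
import re

# Matches the final "(...)" group at the very end of the profile name.
INITIALS = re.compile(r"\(([^()]+)\)\s*$")


def author_for(profile, authors):
    """Return the full name for the initials at the end of a profile name.

    Returns None when there are no trailing initials or they are unknown.
    """
    match = INITIALS.search(profile)
    if match is None:
        return None
    return authors.get(match.group(1).lower())


# Hypothetical stand-in for the datasheet of initials and full names.
authors = {"jp": "Johnson, Paul", "ss": "Smith, Sophie"}

print(author_for("T_Data_Capture_CustomerData_(jp)", authors))
# Johnson, Paul
```

This works for initials of any length, such as (mwp), as long as they are the last thing in the profile name.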
I am new to Excel VBA. I want to read a text file that contains text like this:
John Smith Engineer Chicago
Bob Alice Doctor New York
Jane Smith Teacher St. Louis
I want to convert this into a 2D array so that print(3,3) returns 'Teacher'.
I am able to read the entire file contents into one string, but I am having difficulty converting it to a 2D array like the above. Please advise on how to proceed. Thanks
Unless the text file has some specific structure to it, you're going to struggle a bit. Things that might make it easier are:
Does the text file contain line breaks at the end of each line?
Are all the names in [FirstName][LastName] format as per your example, or might some have more/fewer words?
Does the Occupation always come directly after the name?
Are there a (very) limited number of Occupations?
As mentioned by NautMeg, you have to make some assumptions about the data based on the provided sample.
We can assume that:
A space is the delimiter
The final column is City, which can contain a space
There are 4 columns:
First Name
Last Name
Profession
City/Location
Using this information:
While Not EOF(my_file)
    Line Input #my_file, text_line
    ' text_line now holds the current line
    i = i + 1
    ' i is the line number
Wend
is how we retrieve each line.
Split ( Expression, [Delimiter], [Limit], [Compare] )
This will give you each item in the line. Items with an index < 3 (0-based) are individual columns of data and you can handle them however you want.
Items with an index >= 3 all belong to the city, so join these together into one string.
Join( SourceArray, [Delimiter] )
You'll want the delimiter here to be a single space, since the Split function removes the spaces.
That will allow you to parse the data as is.
However, for future reference if you can control the export of the text file, you should try exporting as a CSV file.
Good luck
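To show the split-then-join logic in a compact form, here is a sketch in Python rather than VBA. It relies on the same assumptions as above (four columns, whitespace as the delimiter, only the last column may contain spaces); the function name is hypothetical. Limiting the number of splits keeps the city in one piece, so no separate join step is needed.

```python
def parse_people(text, columns=4):
    """Turn the file contents into a list of rows with `columns` items each.

    Only the first columns - 1 gaps are split on, so everything after
    the third item (the city) stays together as one string.
    """
    table = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            table.append(line.split(None, columns - 1))
    return table


text = """John Smith Engineer Chicago
Bob Alice Doctor New York
Jane Smith Teacher St. Louis"""

table = parse_people(text)
# Python lists are 0-based, so row 3, column 3 is table[2][2].
print(table[2][2])  # Teacher
print(table[1][3])  # New York
```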
I have two CSV files: one of 25,000 lines containing all the data, and one of 9,000 lines containing the names for which I need to get the data from the first one.
Someone told me that this would be fairly easy using Excel, but I can't seem to find a similar problem.
I've tried comparison tools, but they are not helping me isolate what I need.
Using this example
Master file :
Name;email;displayname
Bbob;Bbob#mail.com;Bob bob
Mmartha;Martha#mail.com;Mmartha
Cclaire;Cclaire#mail.com;cclair
Name file :
Name
Mmartha
Cclaire
What i need to get after comparison :
Name;email;displayname
Mmartha;Martha#mail.com;Mmartha
Cclaire;Cclaire#mail.com;cclair
So for each name in my second CSV, I need to get the entire line from the master CSV file.
Right now I can use Notepad compare, for example, but on 25,000 lines that means a lot of manual labor. I assume someone has faced a similar issue before.
I can't seem to find a solution right now, so here I am.
Apologies in advance for the Dutch screenshots. I'm unsure about the English terms in Power Query, but you should be able to follow the procedure.
Using Power Query:
Start Power Query
Load both sources, CSV1 and CSV2
Merge the queries as a new query
Select column 1 in both tables and choose the Inner join option
Result should look like this:
Use the first row as headers:
Delete the 4th column, then close and load the values
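The inner join from the steps above can also be sketched as a short script. This Python version assumes both files are semicolon-delimited and share a "Name" header, as in the example; the function name is made up for illustration.

```python
import csv
import io


def filter_master(master_file, names_file, key="Name", delimiter=";"):
    """Return the header and the master rows whose key is in the names file."""
    wanted = {row[key] for row in csv.DictReader(names_file, delimiter=delimiter)}
    reader = csv.DictReader(master_file, delimiter=delimiter)
    rows = [row for row in reader if row[key] in wanted]
    return reader.fieldnames, rows


# In-memory stand-ins for the two CSV files from the question.
master = io.StringIO(
    "Name;email;displayname\n"
    "Bbob;Bbob#mail.com;Bob bob\n"
    "Mmartha;Martha#mail.com;Mmartha\n"
    "Cclaire;Cclaire#mail.com;cclair\n"
)
names = io.StringIO("Name\nMmartha\nCclaire\n")

header, rows = filter_master(master, names)
print(";".join(header))
for row in rows:
    print(";".join(row[column] for column in header))
```

With real files, pass objects opened with `open(path, newline="")` instead of the `io.StringIO` stand-ins. The names are held in a set, so the match stays fast at 25,000 lines.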
I have a list of addresses in Excel. They are all the same, except for one difference - some have "US" at the end, while others end in a zip code. Two examples below:
142 N. Birchwood Louisville KY 40206 US
3937 Ludlow Street Philadelphia PA 19104
I am trying to extract the zip code for all the addresses in another column. To achieve this, I did a 2 step process.
=SUBSTITUTE(N2, "US", "") to delete US from every address.
=RIGHT(P2, 6) to extract the 6 characters from the right, which contain the 5 digit zip code.
The problem is that these functions are in two different columns and done separately. How do I combine these functions into one to get rid of the extra step?
Thank you!
Something like this should be enough to do the job:
=IF(RIGHT(A1,2)="US",MID(A1,LEN(A1)-7,5),RIGHT(A1,5))
The idea is to check for "US" only at the very end of the string. Depending on the result, the formula returns one of two options: MID(A1,LEN(A1)-7,5) when the address ends in "US", or RIGHT(A1,5) when it does not.
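The same check can be written as a small function. This Python sketch assumes, as the formula does, that the zip code is always five digits and is either the last thing in the address or followed only by "US"; the function name is hypothetical.

```python
def zip_code(address):
    """Return the 5-character zip code at the end of an address.

    A trailing "US" is dropped first; "US" elsewhere in the address
    is left alone, which a blanket substitution would not guarantee.
    """
    text = address.strip()
    if text.endswith("US"):
        text = text[:-2].rstrip()
    return text[-5:]


print(zip_code("142 N. Birchwood Louisville KY 40206 US"))   # 40206
print(zip_code("3937 Ludlow Street Philadelphia PA 19104"))  # 19104
```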