parsing specific data from .txt file to excel or something else - excel

I have extracted data from one source into a .txt file. The source is some sort of address book and I used a macro recorder for the extraction. Now I have several files which are all formatted in exactly the following way (example with 4 contacts):
Abbrucharbeiten
ATR Armbruster
Werkstr. 28
78727 Oberndorf
Tel. 0175 7441784
Fax 07423 6280
Abbrucharbeiten
Jensen & Sohn, Karl
Schallenberg 6A
25587 Münsterdorf
Tel. 04821 82538
Fax 04821 83381
Abbrucharbeiten
Kiwitt, R.
Auf der Heide 54
48282 Emsdetten
Tel. 02572 88559
Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau
Josef Grabmeier GmbH
Reitgesing 1
85560 Ebersberg
Tel. 08092 24701-0
Fax 08092 24701-24
The 1st row is always the field (name) of business
The 2nd row is always the name of the company/firm
The 3rd row is always the street address
The 4th row is always the zip code and place
and then
the 5th row and the next couple of rows (sometimes two rows, sometimes more) are either Tel. or Fax.
I want to format it as something like an Excel sheet:
Branche: Name: Address: Place: contact1: contact2:
1st row 2nd row 3rd row 4th row 5th row 6th row.....
Now the main problem is that I have over 500,000 contacts, and the last fields aren't always the same number of rows... I don't want to do it manually, please help me...

This is neither Python nor Visual Basic, but it shouldn't be very difficult to translate into those languages. This is Perl.
perl -lne '
## Print the header. Both header and data fields are separated with pipes.
## Contacts (contact1, contact2, etc.) are not included because at this
## moment I cannot know how many there will be. It could be done, but the
## script would be far more complex.
BEGIN {
push @header, q|Branche:|, q|Name:|, q|Address:|, q|Place:|;
printf qq|%s\n|, join q{|}, @header;
}
## Save information for each contact. At least six lines. Beyond that, only
## if the line begins with the string "Tel" or "Fax".
if ( $line < 6 || m/\A(?i:tel|fax)/ ) {
push @contact_info, $_;
++$line;
## Do not skip printing the last contact.
next unless eof;
}
## Print the contact info, reinitialize the data structures and repeat the
## process for the next one.
printf qq|%s\n|, join q{|}, @contact_info;
$line = 0;
undef @contact_info;
push @contact_info, $_;
++$line;
' infile
It's a one-liner (I know it doesn't look like one, but you can get rid of the comments and remove the newlines to get one), so run it directly from your shell. It yields:
Branche:|Name:|Address:|Place:
Abbrucharbeiten|ATR Armbruster|Werkstr. 28|78727 Oberndorf |Tel. 0175 7441784|Fax 07423 6280
Abbrucharbeiten|Jensen & Sohn, Karl|Schallenberg 6A|25587 Münsterdorf|Tel. 04821 82538|Fax 04821 83381
Abbrucharbeiten|Kiwitt, R.|Auf der Heide 54|48282 Emsdetten|Tel. 02572 88559|Tel. 0172 7624359
Abbrucharbeiten, Sand und Kies, Transporte, Kiesgruben, Erdbau|Josef Grabmeier GmbH|Reitgesing 1|85560 Ebersberg|Tel. 08092 24701-0|Fax 08092 24701-24
Take into account that I didn't print the full header and that fields are separated with pipes. I don't think that will be a problem when importing it into Excel.
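Since the question mentions Python, here is a rough translation of the same idea as a sketch (my addition, not the original answer's code; like the Perl above, it assumes that only the contact lines start with "Tel" or "Fax", and the file name is a placeholder):
# Rough Python sketch of the same idea: four fixed fields per contact,
# followed by a variable number of lines starting with "Tel" or "Fax".
import csv
import sys

def parse_contacts(path):
    contacts = []
    current = []
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            is_contact_line = line.lower().startswith(("tel", "fax"))
            if len(current) >= 4 and not is_contact_line:
                # A non-Tel/Fax line after the first four fields starts a new record.
                contacts.append(current)
                current = []
            current.append(line)
    if current:
        contacts.append(current)
    return contacts

writer = csv.writer(sys.stdout, delimiter="|")
writer.writerow(["Branche:", "Name:", "Address:", "Place:"])
for record in parse_contacts("infile.txt"):   # placeholder file name
    writer.writerow(record)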

Related

praat - delete segment

I have several speech files and I need to cut a certain part of the sound file, from 0.21 milliseconds to 0.45 milliseconds. The script below will select the sound segment from 0.21 milliseconds to 0.45 milliseconds and save it. I want to cut the segment from the speech file and then save it without it. I should probably add another line after "Move end of selection to nearest zero crossing" and change the "Write selected sound..." but I am not sure how exactly.
form Files
sentence InputDir ./
endform
createDirectory ("output")
Create Strings as file list... list 'inputDir$'*.wav
numberOfFiles = Get number of strings
for ifile to numberOfFiles
select Strings list
fileName$ = Get string... ifile
Read from file... 'inputDir$''fileName$'
sound_name$ = selected$ ("Sound")
select Sound 'sound_name$'
Edit
editor Sound 'sound_name$'
Select... 0.21 0.45
Move start of selection to nearest zero crossing
Move end of selection to nearest zero crossing
Write selected sound to WAV file... ./output/'fileName$'
endeditor
select all
minus Strings list
Remove
endfor
select all
Remove
I want to cut the segment from the speech file and then save it without it
It's not clear what you want to do with the deleted segment. Do you want to edit it out (as in, shorten the total duration of the sound?) or simply silence it (as in set the samples contained within to zero?).
Either way, you don't need to open the sound editor (the window with the sound wave and the spectrogram). Below I reproduced your script with some alternatives (and updated the syntax).
form Files
sentence Input_dir ./
positive Start 0.21
positive End 0.45
endform
createDirectory("output")
list = Create Strings as file list: "list", input_dir$ + "*.wav"
numberOfFiles = Get number of strings
for ifile to numberOfFiles
selectObject: list
fileName$ = Get string: ifile
sound = Read from file: input_dir$ + fileName$
sound_name$ = selected$("Sound")
# Find zero crossings
start = Get nearest zero crossing: 1, start
end = Get nearest zero crossing: 1, end
sound_end = Get total duration
# Cut the selected part out, shortening the sound
# by extracting the part before and after and concatenating
before = Extract part: 0, start, "rectangular", 1, "no"
selectObject: sound
after = Extract part: end, sound_end, "rectangular", 1, "no"
selectObject: before, after
new = Concatenate
removeObject: before, after
## Or, if you want to set the selected part to zero
## (since we already have the zero crossings)
# new = Set part to zero: start, end, "at exactly these times"
## Or, if what you want is to get only the selected part
## and not the original sound
# new = Extract part: start, end, "rectangular", 1, "no"
# Either way, the new, modified sound is selected
# and its ID is stored in `new`
Save as WAV file: "./output/" + fileName$
# I prefer a less shotgunny way to remove unwanted objects
removeObject: sound, new
endfor
removeObject: list
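If you would rather do this step outside Praat, here is a minimal Python sketch using the standard-library wave module (my addition; it assumes a plain PCM WAV file, takes the cut boundaries in seconds, and does not snap to zero crossings the way the Praat script does):
import wave

def cut_segment(in_path, out_path, start, end):
    # Remove the audio between start and end (in seconds) by concatenating
    # the frames before and after the segment.
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frame_bytes = src.getsampwidth() * src.getnchannels()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    start_byte = int(start * rate) * frame_bytes
    end_byte = int(end * rate) * frame_bytes
    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)   # frame count is corrected automatically on close
        dst.writeframes(frames[:start_byte] + frames[end_byte:])

cut_segment("input.wav", "output.wav", 0.21, 0.45)   # placeholder file names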

Split, escaping certain splits

I have a cell that contains multiple questions and answers, organised like a CSV, so a simple split using the comma as the delimiter should separate them easily.
Unfortunately, there are some values that use the comma as the decimal separator. Is there a way to escape the split for those occurrences?
Fortunately, my data can be split using ", " as separator, but if this wouldn't be the case, would there still be a solution besides manually replacing the decimal delimiter from a comma to a dot?
Example:
"Price: 0,09,Quantity: 12,Sold: Yes"
Using Split("Price: 0,09,Quantity: 12,Sold: Yes",",") would yield:
Price: 0
09
Quantity: 12
Sold: Yes
One possibility, given this test data, is to loop through the array after splitting, and whenever there's no : in the string, add this entry to the previous one.
The function that does this might look like this:
Public Function CleanUpSeparator(celldata As String) As String()
Dim ret() As String
Dim tmp() As String
Dim i As Integer, j As Integer
tmp = Split(celldata, ",")
For i = 0 To UBound(tmp)
If InStr(1, tmp(i), ":") < 1 Then
' Put this value on the previous line, and restore the comma
tmp(i - 1) = tmp(i - 1) & "," & tmp(i)
tmp(i) = ""
End If
Next i
j = 0
ReDim ret(j)
For i = 0 To UBound(tmp)
If tmp(i) <> "" Then
ret(j) = tmp(i)
j = j + 1
ReDim Preserve ret(j)
End If
Next i
ReDim Preserve ret(j - 1)
CleanUpSeparator = ret
End Function
Note that there's room for improvement, for instance by making the separator characters : and , into parameters.
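For comparison, the same "merge fragments that have no colon back onto the previous field" idea as a quick Python sketch (my addition; it likewise assumes every genuine field contains a colon):
def clean_split(cell, sep=",", marker=":"):
    # Split on the separator, then re-attach any piece without the marker
    # to the previous field, restoring the separator that was a decimal comma.
    result = []
    for part in cell.split(sep):
        if marker in part or not result:
            result.append(part)
        else:
            result[-1] += sep + part
    return result

print(clean_split("Price: 0,09,Quantity: 12,Sold: Yes"))
# ['Price: 0,09', 'Quantity: 12', 'Sold: Yes']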
I spent the last 24 hours or so puzzling over what I THINK is a completely analogous problem, so I'll share my solution here. Forgive me if I'm wrong about the applicability of my solution to this question. :-)
My Problem: I have a SharePoint list in which teachers (I'm an elementary school technology specialist) enter end-of-year award certificates for me to print. Teachers can enter multiple students' names for a given award, separating each name using a comma. I have a VBA macro in Access that turns each name into a separate record for mail merging. Okay, I lied. That was more of a story. HERE'S the problem: How can teachers add a student name like Hank Williams, Jr. (note the comma) without having the comma cause "Jr." to be interpreted as a separate student in my macro?
The full contents of the (SharePoint exported to Excel) field "Students" are stored within the macro in a variable called strStudentsBeforeSplit, and this string is eventually split with this statement:
strStudents = Split(strStudentsBeforeSplit, ",", -1, vbTextCompare)
So there's the problem, really. The Split function is using a comma as a separator, but poor student Hank Williams, Jr. has a comma in his name. What to do?
I spent a long time trying to figure out how to escape the comma. If this is possible, I never figured it out.
Lots of forum posts suggested using a different character as the separator. That's okay, I guess, but here's the solution I came up with:
Replace only the special commas preceding "Jr" with a different, uncommon character BEFORE the Split function runs.
Swap back to the commas after Split runs.
That's really the end of my post, but here are the lines from my macro that accomplish step 1. This may or may not be of interest because it really just deals with the minutiae of making the swap. Note that the code handles several different (mostly wrong) ways my teachers might type the "Jr" part of the name.
'Dealing with the comma before Jr. This will handle ", Jr." and ", Jr" and " Jr." and " Jr".
'Replaces the comma with ~ because commas are used to separate fields in Split function below.
'Will swap ~ back to comma later in UpdateQ_Comma_for_Jr query.
strStudentsBeforeSplit = Replace(strStudentsBeforeSplit, "Jr", "~ Jr.") 'Every Jr gets this treatment regardless of what else is around it.
'Note that because of previous Replace functions a few lines prior, the space between the comma and Jr will have been removed. This adds it back.
strStudentsBeforeSplit = Replace(strStudentsBeforeSplit, ",~ Jr", "~ Jr") 'If teacher had added a comma, strip it.
strStudentsBeforeSplit = Replace(strStudentsBeforeSplit, " ~ Jr", "~ Jr") 'In cases when teacher added Jr but no comma, remove the (now extra)...
'...space that was before Jr.
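For what it's worth, the same placeholder trick condensed into a Python sketch (my addition; the ~ placeholder comes from the post above, the sample names are made up):
# Protect the comma before "Jr" with a placeholder, split, then swap it back.
raw = "Hank Williams, Jr., Patsy Cline, Johnny Cash"
protected = raw.replace(", Jr", "~ Jr")                    # step 1: hide the special comma
students = [s.strip() for s in protected.split(",")]
students = [s.replace("~ Jr", ", Jr") for s in students]   # step 2: restore it
print(students)   # ['Hank Williams, Jr.', 'Patsy Cline', 'Johnny Cash']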

Adding a newline character within a cell (CSV)

I would like to import product descriptions that need to be logically broken up by things like description, dimensions, finishes etc. How can I insert a line break so that the breaks show up when I import the file?
This question was answered well at Can you encode CR/LF in into CSV files?.
Consider also reverse engineering multiple lines in Excel. To embed a newline in an Excel cell, press Alt+Enter. Then save the file as a .csv. You'll see that the double-quotes start on one line and each new line in the file is considered an embedded newline in the cell.
I struggled with this as well, but here's the solution. If you add " before and at the end of the CSV string you are trying to display, it will consolidate the lines into one cell while honoring the new line.
csvString += "\""+"Date Generated: \n" ;
csvString += "Doctor: " + "\n"+"\"" + "\n";
I had the same issue when trying to export the content of an email to CSV while still keeping the line breaks when importing into Excel.
I exported the content as this: ="Line 1"&CHAR(10)&"Line 2"
When I import it into Excel (Google Sheets), Excel treats it as a string and it still doesn't break onto a new line.
We need to trigger Excel to treat it as a formula via:
Format -> Number -> Scientific.
This is not a good way, but it resolved my issue.
Supposing you have a text variable containing:
const text = 'wonderful text with \n newline'
the newline in the CSV file is correctly interpreted once the string is enclosed in double quotes and spaces:
'" ' + text + ' "'
On Excel for Mac 2011, the newline had to be a \r instead of an \n
So
"\"first line\rsecond line\""
would show up as a cell with 2 lines
I was concatenating the variable and adding multiple items in the same row, so the code below worked for me. The "\n" newline has to be added at both the start and the end of each line; if you only add it at the end, the last one or two characters will wrap onto new lines.
$itemCode = '';
foreach ($returnData['repairdetail'] as $checkkey => $repairDetailData) {
    if ($checkkey > 0) {
        $itemCode .= "\n" . trim(@$repairDetailData['ItemMaster']->Item_Code) . "\n";
    } else {
        $itemCode .= "\n" . trim(@$repairDetailData['ItemMaster']->Item_Code) . "\n";
    }
    $repairDetaile[] = array(
        $itemCode,
    );
}
// pass the whole array to here
foreach ($repairDetaile as $csvData) {
    fputcsv($csv_file, $csvData, ',', '"');
}
fclose($csv_file);
I converted a pandas DataFrame to a CSV string using DataFrame.to_csv() and then looked at the results. It included \r\n as the end-of-line character(s). I suggest inserting these into your CSV string as your row separator.
Depending on the tools used to generate the CSV string, you may need to escape the \ character (writing \\r\\n).
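As a small addition (not from any of the answers above): if you are generating the file programmatically, a CSV writer that quotes its fields will handle embedded newlines for you. A minimal Python sketch:
import csv

with open("products.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["Name", "Description"])
    # The embedded \n characters end up inside one quoted cell,
    # which Excel displays as a multi-line cell.
    writer.writerow(["Widget", "Description: sturdy\nDimensions: 10x10 cm\nFinish: matte"])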

Formatting columns in exporting SAS table to Excel spreadsheet

I am automating the exporting of a data set from SAS to Excel using ODS ExcelXP:
Obs   PURCHASE_APR   annual_fee   Minimum_Cash_Advance
1     23.45%         NONE         $10
2     23.45%         NONE         $10
3     23.45%         NONE         $10
4     18.45%         NONE         $10
5     18.45%         NONE         $10
6     18.45%         NONE         $10
7     23.45%         NONE         $10
8     23.45%         NONE         $10
9     23.45%         NONE         $10
10    23.45%                      $0
11    23.45%                      $0
12    23.45%                      $0
In SAS, the columns are formatted as text and I want all of the columns to be imported as text into Excel. I've used the following code to create the file using PROC REPORT:
ods tagsets.ExcelXP path="H:/path" file="file.xls" style=myStyle
options(frozen_headers='yes' WrapText='no'
embedded_titles='yes' suppress_bylines='yes'
sheet_interval='none' sheet_label=' '
sheet_name='Solicited'
width_points='1' width_fudge='1'
absolute_column_width='100' autofit_height='yes'
zoom='100');
title1;
proc report data=testing2 nowd;
column purchase_APR annual_fee minimum_cash_advance;
define purchase_APR / display style(column)={tagattr='format:#'} 'PURCHASE_APR';
define annual_fee / display style(column)={tagattr='format:#'} 'ANNUAL_FEE';
define minimum_cash_advance / display style(column)={tagattr='format:#'} 'MINIMUM_CASH_ADVANCE';
run;
ods tagsets.ExcelXP close;
However, when opening up the Excel file, the Text fields have been somehow changed from 23.45%, $0, and $10 (text) to 0.2345, 0, and 10 (text) respectively.
How can I get the output in Excel to be just like the data set in SAS?
I have tried using specific formats to get them to look the same (i.e. tagattr='format:0.00%', etc.), but the output in Excel is numeric and not text format.
The proper way would be to modify how the template processes numbers. You can do that pretty easily in this case. You could even just comment out a line and one block of code, but here's the really proper answer.
Open the template in a text editor. We're going to add a couple of parameters, and implement them.
First, add the options to the $valid_options array. There are a bunch of lines like these, add these two more (Around line 635 or so):
set $valid_options["TEXTPERCENT"] "This value forces percentages to be displayed as text";
set $valid_options["TEXTCURRENCY"] "This value forces currency amounts to be displayed as text";
That text can be whatever you want, this is one interpretation. Now, around line 700 there are some lines setting the defaults, add these two:
set $option_defaults["TEXTCURRENCY"] 'no';
set $option_defaults["TEXTPERCENT"] 'no';
Now down much later (around row 1670) you have the section that defines $punctuation. We change how that works in order to remove "%" and "$" from the list if you set those options:
set $punctuation $thousands_separator " ";
set $punctuation $punctuation "%" /if ^$textpct;
set $punctuation $punctuation $currency_sym /if ^$textcurr;
(Basically, the pattern is set $variable value /if condition; we set $punctuation to start with $thousands_separator and then add in the "%" and the currency symbol only if the corresponding option is "no".)
Now around line 2100, in the "Yes/no on/off options... " section, we evaluate the options' values. (The code above uses these values, but that's okay; it's actually executed later.)
set $option_key 'TEXTPERCENT';
trigger do_yes_no;
eval $textpct $answer;
set $option_key 'TEXTCURRENCY';
trigger do_yes_no;
eval $textcurr $answer;
Finally, we implement things. Down around line 7400 is the value_type event, which is where the % and $ get removed and the numbers get adjusted to be 'real numbers' even if they shouldn't be. This is annoying. So we tell it not to.
do /if ^$textpct;
do /if $convert_percentages;
eval $tmp inputn($value, $test_format)/100;
else;
eval $tmp inputn($value, $test_format);
done;
/*putlog "Percent value:" $tmp;*/
set $value $tmp;
done;
We wrap the percent conversion code with do /if ^$textpct; and done, which tells it to skip doing the inputn (which will kill our percents). If we were cheating and not doing this the proper way, we could comment out this line:
set $value compress($value, $punctuation);
But since we fixed the $punctuation variable to contain (or not contain!) the right stuff already, this isn't an issue.
Now this will work! We just modify the tagset call:
*First include your tagset, which I put in c:\temp\ but you can put wherever and call whatever you like;
%include "c:\temp\excel_tpl_nocompress.txt";
ods tagsets.ExcelXP path="c:\temp\" file="testfile.xml"
options(frozen_headers='yes' WrapText='no'
embedded_titles='yes' suppress_bylines='yes'
sheet_interval='none' sheet_label=' '
sheet_name='Solicited' convert_percentages="no"
width_points='1' width_fudge='1'
absolute_column_width='100' autofit_height='yes'
textcurrency='yes' textpercent='yes'
zoom='100');
title1;
*Then add in the textpercent and textcurrency lines, and it should work as is.;
And now you're off to the races.
<Row ss:AutoFitHeight="1">
  <Cell ss:StyleID="data__l1" ss:Index="1"> <Data ss:Type="String">23.45%</Data> </Cell>
  <Cell ss:StyleID="data__c1" ss:Index="2"> <Data ss:Type="String" /> </Cell>
  <Cell ss:StyleID="data__l1" ss:Index="3"> <Data ss:Type="String">$0</Data> </Cell>
</Row>
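(An aside, not part of the original answer: what does the work in the row above is the ss:Type="String" attribute on each Data element, which is what keeps values like 23.45% and $0 as literal text when Excel opens the file. A tiny Python sketch of building such a row, purely for illustration:)
from xml.sax.saxutils import escape

def string_cell(value):
    # A SpreadsheetML cell typed as String keeps "23.45%" or "$0" as text.
    return '<Cell><Data ss:Type="String">%s</Data></Cell>' % escape(value)

row = '<Row ss:AutoFitHeight="1">' + ''.join(string_cell(v) for v in ["23.45%", "", "$0"]) + '</Row>'
print(row)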
I found the answer I needed. The ExcelXP options are documented on the following page: ExcelXP Options. I know it is a 'hack', but I changed the default options in tagsets.ExcelXP to currency_symbol = "|" and decimal_separator = "|", fooling SAS into thinking that it should look for a pipe instead of a dollar sign for currencies, and a pipe instead of a period for the decimal point in percentages. That way, when it came across $0 or 23.45%, it treated them as pure text.

Populate Drop Down List Box (DDLB) with two values in PowerBuilder

I've created a Drop Down List Box (DDLB) in my window (I'm using PowerBuilder 10.5). Once I would call my function, the DDLB would fill with all the different cities from my table. This is the code I've used:
FOR li_i=1 TO ii_br_red
ls_city = dw_city.GetItemString(li_i, 'city')
IF ddlb_city.FindItem(ls_city, 1) = -1 THEN
ddlb_city.AddItem(ls_city)
END IF
NEXT
Next part of the code is in the ddlb "selectionchanged" event...
dw_city.SetFilter("city = '" + this.text + "'")
dw_city.Filter()
This works great, and after calling my function (via click on a command button) I'd get a list of all different cities in my table, ex.
Paris
London
New York
Washington
No town would be listed twice.
What I need to do now is add a country next to every city in my DDLB. So that after clicking my command button I would get this in my DDLB:
Paris (France)
London (GB)
New York (USA)
Washington (USA)
Any advice? Thanks in advance...
SECOND QUESTION, similar to this subject: I have this SQL code:
SELECT distinct name FROM table1;
This gives me 8 different names. What I want to do is fill another DDLB, ddlb_1 with these names, but this must occur on the open event of my program. This is what I've written in the open event of my program:
string ls_name
SELECT distinct name INTO :ls_name FROM table1;
ddlb_1.AddItem(ls_name)
But this only gives me the first name. I'm guessing I need some kind of count, but I just can't pull it off.
If you do not want to change the design of the program, and since you state that the country is in the same DW, you could hack the code a little to add the country to the ddlb (I suppose that the country is available on the same row of the DW):
String ls_country
FOR li_i=1 TO ii_br_red
ls_city = dw_city.GetItemString(li_i, 'city')
IF ddlb_city.FindItem(ls_city, 1) = -1 THEN
ls_country = dw_city.GetItemString(li_i, 'country')
ddlb_city.AddItem(ls_city + ' (' + ls_country + ')')
END IF
NEXT
A quick and dirty hack to get back the value in the event to filter the DW would be
int p
string ls_city
ls_city = this.text
p = pos(ls_city, '(')
if p > 0 then ls_city = left(ls_city, p - 2) //skip the "space + (country)" part
dw_city.SetFilter("city = '" + ls_city + "'")
dw_city.Filter()
But this kind of code is difficult to maintain and should be replaced by something else, as the processing of the city value is strongly coupled to its representation in the list.
A better solution would be a DropDownDataWindow, or (worse) an array of city names in which the index of a "city (country)" entry in the ddlb corresponds to the index of the bare city name suitable for filtering the DW.
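(Just to make the string handling explicit, the same parsing idea as a short Python sketch; hypothetical, not PowerBuilder code:)
def bare_city(entry):
    # Recover the plain city name from an entry like "Paris (France)".
    p = entry.find(" (")
    return entry[:p] if p >= 0 else entry

print(bare_city("Paris (France)"))    # Paris
print(bare_city("Washington (USA)"))  # Washington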
I think you should modify the "source" datawindow's SELECT, and you should get the final result you want; then you would only need to copy the data from the datawindow to the ddlb. You should use distinct in the SELECT, something like this:
select distinct city + ' (' + country_code + ')' from cities_and_countries_table
Of course you should replace "city", "country_code" and the table name with the actual column and table names in your database. With this you will get every city only once, already concatenated with its country code.
Br. Gábor
Does it really have to be DDLB? I would give the user a Single Line Edit for the city name and filter the DW as the user types.
To answer my own second question, this is how I've done it finally...
String ls_name
DECLARE xy CURSOR FOR
SELECT distinct name FROM table1;
OPEN xy;
FETCH xy INTO :ls_name;
do until sqlca.sqlcode <> 0
ddlb_1.AddItem(ls_name);
FETCH xy INTO :ls_name;
loop
CLOSE xy;
I'm new to PowerBuilder, but I just handled that kind of scenario, although I used a DDW (Drop Down DataWindow) instead of a list box. In this case you can display more than one column as soon as the DW gets the focus, and you can populate the data dynamically. Give it a try. It worked for me. DWs are a pain in the neck when you're just starting (as in my case), but you can do a lot with them.
