Flat File Schema Wizard - xsd

I have an example of a tab-delimited flat file like this:
Expense Report Id Name Geography StartDate EndDate TotalExpense
123456789 JJ Thompson Atlanta 6/12/2011 6/18/2011 454.10
ExpenseDate Guests (Separated by comma) CompanyAffiliation Establishment Project
6/14/2011 "Norm McDonald, Gary Shandling" Two Guys Hamburgers Little Debbie's MumboJumbo
6/16/2011 IBN Yo MumboJumbo Conceirge
6/18/2011 Jimi Hendrix The Experience Electric Ladyland MumboJumbo Client
I have to convert XML based on a schema into a tab-delimited flat file that looks like this. When using the Flat File Schema Wizard, how can I keep the headers in there? I can do it without the headers, no problem.

One option is to create a top-level header element with minOccurs and maxOccurs both set to 1, containing one field per column header, each with the header text as its default value. You would need to map a non-repeating node on the left (source) side to this header element so that it gets created.
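To picture the target output, here is a minimal Python sketch (not BizTalk code) of the tab-delimited file the schema should produce: each header record appears exactly once and its fields carry the column names as fixed default values, followed by the data records. The function name build_flat_file is hypothetical, and the column boundaries of the sample detail row are an assumption, since the tabs were lost from the example.

```python
# Hypothetical sketch: models the flat file layout, not the BizTalk schema itself.
# Header records occur exactly once (minOccurs=1, maxOccurs=1) and their
# fields hold fixed default values equal to the column names.
REPORT_HEADER = ["Expense Report Id", "Name", "Geography",
                 "StartDate", "EndDate", "TotalExpense"]
DETAIL_HEADER = ["ExpenseDate", "Guests (Separated by comma)",
                 "CompanyAffiliation", "Establishment", "Project"]


def build_flat_file(report, details):
    """Return tab-delimited text: header, report row, detail header, detail rows."""
    lines = ["\t".join(REPORT_HEADER), "\t".join(report)]  # emitted once each
    lines.append("\t".join(DETAIL_HEADER))                 # emitted once
    lines.extend("\t".join(row) for row in details)        # repeating records
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    report = ["123456789", "JJ Thompson", "Atlanta",
              "6/12/2011", "6/18/2011", "454.10"]
    # Column boundaries for this detail row are assumed (tabs were lost).
    details = [["6/16/2011", "IBN", "Yo", "MumboJumbo", "Conceirge"]]
    print(build_flat_file(report, details), end="")
```

However many detail records are passed in, each header line is written only once, which is the behavior the min/max occurs 1 header element is meant to give you.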

Related

How to increment Google Docs heading assignments while retaining relationships?

I've got a big doc with lots of headings assigned.
I'd like to select a load of text and change all Heading 1 to Heading 2, all Heading 2 to Heading 3, etc., so that the relationships stay the same; every heading level just goes up (or down) by one.
E.g., before:
*Heading 1
some text
*Heading 2
more text
after:
*Heading 1
*Heading 2
some text
*Heading 3
more text
Basically, I want to add a new category 'above' the existing text.
Is anything like this possible?

Extracting text in excel

I have some text which I receive daily that I need to separate. I have hundreds of lines similar to the extract below:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
I need to extract individual snippets from this text, each into a separate cell: the date, month, company, size, and price. In this case, the result would be:
FEB50-40
APR
COMPANY A
100
0.40
The issue I'm struggling with is uniformity. For example, one line might have FEB50-FEB40, another FEB5-FEB40, or FEB50-FEB4. Another thing giving me difficulty is that some rows might have 'COMPANY A' and others 'COMPANYA' (one word instead of two).
Any ideas? I've been trying combinations of the below but I'm not able to have uniform results.
=TRIM(MID(SUBSTITUTE($D7," ",REPT(" ",LEN($D7))), (5)*LEN($D7)+1,LEN($D7)))
=MID($D7,20,21-10)
=TRIM(RIGHT(SUBSTITUTE($D6,"$",REPT("$",2)),4))
Sometimes I get 'FEB40-50(' or 'FEB40-FEB5' when it should be 'FEB40-FEB50'.
Thanks to whoever is able to help.
You might get to the limits of formulas with this scenario, but with Power Query you can still work.
As I see it, you want to apply the following logic to extract text from this string:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
text after the first : and before the first (
text between the brackets
text after the word OFFERS and before AT
text after AT
These can be easily translated into several "Split" scenarios inside Power Query.
Split by the custom delimiter ": " (colon and space) at each occurrence
Remove the first column
Split the new first column by " (" (space and bracket) at the leftmost delimiter
Replace ")" with nothing in the second column
Split the third column by the delimiter OFFERS
Split the new fourth column by the delimiter AT
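The same split steps can be sketched in Python to check the logic before building the query. This is not Power Query M: the function name split_commodity_line is hypothetical, it assumes every line follows the "label: spread (month): company OFFERS size AT price" layout, and stripping the "$" sign is an extra step that the list above does not include. Because it splits on delimiters rather than fixed positions, FEB5-FEB40 versus FEB50-FEB40 and COMPANYA versus COMPANY A make no difference.

```python
def split_commodity_line(line):
    """Mimic the Power Query split steps (sketch only, not M code)."""
    # Split by ": " at each occurrence, then drop the first column.
    parts = line.split(": ")[1:]
    if len(parts) != 2:
        raise ValueError(f"unexpected layout: {line!r}")
    spread_part, offer_part = parts
    # Split at the leftmost " (" and remove the closing bracket.
    spread, month = spread_part.split(" (", 1)
    month = month.replace(")", "")
    # Split by OFFERS, then split the remainder by AT.
    company, rest = offer_part.split(" OFFERS ", 1)
    size, price = rest.split(" AT ", 1)
    # Extra step, not in the answer: drop the "$" sign from the price.
    return [spread, month, company.strip(), size.strip(), price.strip().lstrip("$")]


line = "COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40"
print(split_commodity_line(line))
# ['FEB50-FEB40', 'APR', 'COMPANY A', '1000KB', '0.40']
```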
The screenshot shows the input data and the result in the Power Query editor after renaming the columns and before loading the query into the worksheet.
Once you have loaded the query, you can add / remove data in the input table and simply refresh the query to get your results. No formulas, just clicking ribbon commands.
You can take this further by removing the "KB" from the column, converting it to a number and dividing it by 100. Your business logic will drive what you want to do. Just take it one step at a time.

Excel VBA Textfile to 2d array

I am new to Excel VBA. I want to read a text file that contains text like this:
John Smith Engineer Chicago
Bob Alice Doctor New York
Jane Smith Teacher St. Louis
So, I want to convert this into a 2D array so if I do print(3,3), it should return 'Teacher'.
I am able to read the entire file contents into one string but am having difficulty converting it to a 2D array like above. Please advise on how to proceed. Thanks
Unless the text file has some specific structure to it, you're going to struggle a bit. Things that might make it easier are:
Does the text file contain line breaks at the end of each line?
Are all the names in [FirstName] [LastName] format as per your example, or might some have more/fewer words?
Does the Occupation always come directly after the name?
Are there a (very) limited number of Occupations?
As mentioned by NautMeg, you have to make some assumptions about the data based on the provided template.
Here we can assume that:
a space is the delimiter
The Final column is City, which can contain a space
there are 4 columns
First Name
Last Name
Profession
City/Location
Using this information:
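Dim my_file As Integer
Dim text_line As String
Dim i As Long
my_file = FreeFile
Open "C:\data\people.txt" For Input As #my_file ' hypothetical path, use your own file
While Not EOF(my_file)
    Line Input #my_file, text_line
    ' text_line contains the current line
    i = i + 1
    ' i is the line number
Wend
Close #my_file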
is how we retrieve each line.
Split ( Expression, [Delimiter], [Limit], [Compare] )
This will give you each item in the line. Items with index < 3 (0-based) are separate columns of data, and you can handle them however you want.
For index >= 3, join the items together into one string:
Join( SourceArray, [Delimiter] )
You'll likely want to make the delimiter in this case a simple space, since the split function will remove the space.
That will allow you to parse the data as is.
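The split-then-join approach can be sketched in Python under the same assumptions (one-word first name, last name and profession, with the city taking whatever is left). The function name parse_people is hypothetical. Python lists are 0-based, so the question's (3,3) corresponds to data[2][2].

```python
def parse_people(text):
    """Build a 2D list: first three whitespace-separated items, then the rest as City."""
    rows = []
    for line in text.splitlines():
        items = line.split()  # split on runs of whitespace
        if not items:
            continue  # skip blank lines
        # Items 0..2 are First Name, Last Name, Profession; join the rest as City.
        rows.append(items[:3] + [" ".join(items[3:])])
    return rows


sample = """John Smith Engineer Chicago
Bob Alice Doctor New York
Jane Smith Teacher St. Louis"""

data = parse_people(sample)
print(data[2][2])  # Teacher
print(data[1][3])  # New York
```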
However, for future reference, if you can control the export of the text file, you should try exporting it as a CSV file.
Good luck

Sorting txt data files while importing in Excel Data Query

I am trying to import approximately 190 .txt data files into Excel using the New Query tool (Data->New Query->From File->From Folder). In Windows Explorer the files are properly ordered: the first is 0summary, the second 30summary, etc.
However, when importing them through the query tool, the files are sorted as shown in the picture (see line 9, for example: that file is not in the right position):
The files are sorted by their first digit instead of by the numeric value. Is there a solution to this issue? I have tried putting a space between the number and "summary", but that didn't work either. I saw online that Excel doesn't recognize text within "" or after /, but Windows doesn't allow me to save the text files with those symbols in their names. Even when I removed the word "summary", the problem remained. Any suggestions?
If all your names include the word Summary:
You can add a column with Extract / Text Before Delimiter, enter "Summary" as the delimiter, change the column type to Number and sort on that column.
If the only numbers are those you wish to sort on, you can
add a custom column with just the numbers
Change the data type to whole number
sort on that.
The formula for the custom column:
Text.Select([Name],{"0".."9"})
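To see why the numeric column fixes the order, here is a Python analogue of Text.Select([Name],{"0".."9"}) followed by a whole-number sort. It is not M code, the function name sort_by_digits is hypothetical, and it assumes every file name contains at least one digit.

```python
def sort_by_digits(names):
    """Sort names by the whole number formed from all of their digits."""
    def digit_key(name):
        digits = "".join(ch for ch in name if ch in "0123456789")
        return int(digits)  # numeric, so 30 sorts before 120
    return sorted(names, key=digit_key)


names = ["120summary.txt", "0summary.txt", "30summary.txt"]
print(sorted(names))          # text order: ['0summary.txt', '120summary.txt', '30summary.txt']
print(sort_by_digits(names))  # ['0summary.txt', '30summary.txt', '120summary.txt']
```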
If the alpha portion varies, and you need to sort on that also, you can do something similar adding another column for the alpha portion, and sorting on that.
If there might be digits after the leading digits upon which you want to sort, then use the following formula for the added column which will extract only the digits at the beginning of the file name:
=Text.Middle([Name],0,Text.PositionOfAny([Name],{"A".."z"}))
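For names that also contain digits later on, a Python analogue of this Text.Middle / Text.PositionOfAny formula keeps only the digits before the first character in the "A".."z" range. It is a sketch, not M code; the function name leading_number is hypothetical, and it raises an error when a name has no leading digits or no letter at all.

```python
def leading_number(name):
    """Return the number formed by the characters before the first 'A'..'z' character."""
    for i, ch in enumerate(name):
        # Same range as {"A".."z"}: it also covers [ \ ] ^ _ and backtick.
        if "A" <= ch <= "z":
            return int(name[:i])  # int("") raises ValueError if nothing precedes it
    raise ValueError(f"no letter found in {name!r}")


files = ["30summary2.txt", "0summary.txt", "120summary.txt"]
print(sorted(files, key=leading_number))
# ['0summary.txt', '30summary2.txt', '120summary.txt']
```

With the all-digits approach, 30summary2.txt would be keyed as 302 instead of 30.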

Datacap: how to create a field array and merge or avoid split Excel sheets

I'm trying to get the rows of an Excel document. What I have achieved so far:
1. Retrieve .xls, .xlsx files
2. Convert those files to TIFF images
3. Enhance the images for better text recognition
4. Identify pages
5. Create the documents
6. Recognize pages and fields
7. Populate fields (this is where my problem is)
For example, in a table like
Name | Age | Size
Juan | 26 | 1.90m
Max | 25 | 1.85m
Victor | 26 | 1.65m
My project can find the keywords Name, Age and Size, and in the settings I can tell it that the value is one line down and to group the leading and trailing words, but it only fills the Name, Age and Size fields with the first values below and ignores the others. Datacap does not seem to have a field array type.
In the image, you can see that there is only one way to add fields, and they are scalar (just one value); Add Multiple only adds several fields at once, not a field of multiple values, haha.
This is how my fields get retrieved
Another problem I face is that my Excel sheet gets split to fit a document format; I was expecting the whole sheet to be converted into 1 document, not 4.
In the image, those 4 pages come from the same sheet (in the Excel file).
IBM's docs still lack information; some pages have only a title and zero content, lol.
Agreed on point 1: it does not support any field type like an array, which would be a more advanced feature. This feature is really needed, and we may see something from IBM going forward.
Coming back to the second point, Datacap converts the Excel file according to its print pages, just as when you print it. You have to add a ruleset to merge those pages into a single file. The most common way to do that is to use tiffmerge, which Datacap provides out of the box.
