Sorting txt data files while importing in Excel Data Query - excel

I am trying to enter approximately 190 txt datafiles in Excel using the New Query tool (Data->New Query->From File->From Folder). In the Windows explorer the data are properly ordered: the first being 0summary, the second 30summary etc.
However, when entering them through the query tool the files are sorted as shown in the picture (see line 9 for example, you will see that the file is not in the right position):
The files are sorted based on the first digit instead of the value represented. Is there a solution to this issue? I have tried putting space between the number and the summary but it also didn't work. I saw online that Excel doesn't recognize the text within "" or after /, but I am not allowed to save the text files with those symbols in their name in Windows. Even when removed the word summary the problem didn't fix. Any suggestions?

If all your names include the word Summary:
You can add a column "Extract" / "Text before delimiter" enter "Summary", change the column type to Number and sort over that column

If the only numbers are those you wish to sort on, you can
add a custom column with just the numbers
Change the data type to whole number
sort on that.
The formula for the custom column:
Text.Select([Name],{"0".."9"})
If the alpha portion varies, and you need to sort on that also, you can do something similar adding another column for the alpha portion, and sorting on that.
If there might be digits after the leading digits upon which you want to sort, then use the following formula for the added column which will extract only the digits at the beginning of the file name:
=Text.Middle([Name],0,Text.PositionOfAny([Name],{"A".."z"}))

Related

Extracting text in excel

I have some text which I receive daily that I need to seperate. I have hundreds of lines similar to the extract below:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
I need to extract individual snippets from this text, so for each in a seperate cell, I the result needs to be the date, month, company, size, and price. In the case, the result would be:
FEB50-40
APR
COMPANY A
100
0.40
The issue I'm struggling with is uniformity. For example one line might have FEB50-FEB40, another FEB5-FEB40, or FEB50-FEB4. Another example giving me difficult is that some rows might have 'COMPANY A' and the other 'COMPANYA' (one word instead of two).
Any ideas? I've been trying combinations of the below but I'm not able to have uniform results.
=TRIM(MID(SUBSTITUTE($D7," ",REPT(" ",LEN($D7))), (5)*LEN($D7)+1,LEN($D7)))
=MID($D7,20,21-10)
=TRIM(RIGHT(SUBSTITUTE($D6,"$",REPT("$",2)),4))
Sometimes I get
FEB40-50(' OR 'FEB40-FEB5'
when it should be
'FEB40-FEB50'`
Thank you to who is able to help.
You might get to the limits of formulas with this scenario, but with Power Query you can still work.
As I see it, you want to apply the following logic to extract text from this string:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
text after the first : and before the first (
text between the brackets
text after the word OFFERS and before AT
text after 'AT`
These can be easily translated into several "Split" scenarios inside Power Query.
split by custom delimiter : - that's colon and space - for each ocurrence
remove first column
Split new first column by ( - that's space and bracket - for leftmost
Replace ) with nothing in second column
Split third column by delimiter OFFERS
split new fourth column by delimiter AT
The screenshot shows the input data and the result in the Power Query editor after renaming the columns and before loading the query into the worksheet.
Once you have loaded the query, you can add / remove data in the input table and simply refresh the query to get your results. No formulas, just clicking ribbon commands.
You can take this further by removing the "KB" from the column, convert it to a number, divide it by 100. Your business processing logic will drive what you want to do. Just take it one step at a time.

Extracting text from complex string in excel

The attached image (link: https://i.stack.imgur.com/w0pEw.png) shows a range of cells (B1:B7) from a table I imported from the web. I need a formula that allows me to extract the names from each cell. In this case, my objective is to generate the following list of names, where each name is in its own cell: Erik Karlsson, P.K. Subban, John Tavares, Matthew Tkachuk, Steven Stamkos, Dustin Brown, Shea Weber.
I have been reading about left, right, and mid functions, but I'm confused by the irregular spacing and special characters (i.e. the box with question mark beside some names).
Can anyone help me extract the names? Thanks
Assuming that your cells follow the same format, you can use a variety of text functions to get the name.
This function requires the following format:
Some initial text, followed by
2 new lines in Excel (represented by CHAR(10)
The name, which consists of a first name, a space, then a last name
A second space on the same line as the name, followed by some additional text.
With this format, you can use the following formula (assuming your data is in an Excel table, with the column of initial data named Text):
=MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,SEARCH(" ",MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,LEN([#Text])),SEARCH(" ",MID([#Text],SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1,LEN([#Text])))+1)-1)
To come up with this formula, we take the following steps:
First, we figure out where the name starts. We know this occurs after the 2 new lines, so we use:
=SEARCH(CHAR(10),[#Text],SEARCH(CHAR(10),[#Text])+1)+1
The inner (occurring second) SEARCH finds the first new line, and the outer (occurring first) finds the 2nd new line.
Now that we have that value, we can use it to determine the rest of the string (after the 2 new lines). Let's say that the previous formula was stored in a table column called Start of Name. The 2nd formula will then be:
=MID([#Text],[#[Start of Name]],LEN([#Text]))
Note that we're using the length of the entire text, which by definition is more than we need. However, that's not an issue, since Excel returns the smaller amount between the last argument to MID and the actual length of the text.
Once we have the text from the start of the name on, we need to calculate the position of the 2nd space (where the name ends). To do that, we need to calculate the position of the first space. This is similar to how we calculated the start of the name earlier (which starts after 2 new lines). The function we need is:
=SEARCH(" ",[#[Rest of String]],SEARCH(" ",[#[Rest of String]])+1)-1
So now, we know where the name starts (after 2 new lines), and where it ends (after the 2nd space). Assuming we have these numbers stored in columns named Start of Name and To Second Space respectively, we can use the following formula to get the name:
=MID([#Text],[#[Start of Name]],[#[To Second Space]])
This is equivalent to the first formula: The difference is that the first formula doesn't use any "helper columns".
Of course, if any cell doesn't match this format, then you'll be out of luck. Using Excel formulas to parse text can be finicky and inflexible. For example, if someone has a middle name, or someone has a initials with spaces (e.g. P.K. Subban was P. K. Subban), or there was a Jr. or something, your job would be a lot harder.
Another alternative is to use regular expressions to get the data you want. I would recommend this thorough answer as a primer. Although you still have the same issues with name formats.
Finally, there's the obligatory Falsehoods Programmers Believe About Names as a warning against assuming any kind of standardized name format.

How do I import data from a .xlsx file to Filemaker Pro if the "matching" field has trailing zeros?

I am importing data from an Excel file into a Filemaker Pro database (FMP 12.0 v5 for Mac). I am using the imported data to "Update matching records in found set". However, the field that I am using to match occasionally contains trailing zeros.
When importing, FMP does not match the fields correctly, because it ignores the trailing zeros.
To explain further: the field in the database is a calculated text field, "courseID.personID", determined by concatenating the numerical "courseID" and "personID" fields (with a dot in between them). The field in my Excel file is formed similarly, using Excel formulae. Some "personID" values end in a zero, e.g. 120, and so courseID.personID becomes something like "123.120". I am matching the Excel field to the FMP field.
I first noticed this was happening, and was very careful to go back to Excel and make a new file (to start fresh), select all cells and set format to Text. Then, I did a Paste Special from my original data, and selected Paste as Values. All the cells in the courseID.personID column gave a "number stored as text error", with the option to convert the text to numbers. I selected the option to ignore the error, to leave all the data stored as text, with the intention of preserving the trailing zeros.
Alas, the issue persists. So, does anyone have any ideas of how to force Excel to format and communicate the proper values? Or, is it an issue of making FMP interpret the data properly, maybe by adjusting field types?
the field in the database is a calculated text field,
"courseID.personID", determined by concatenating the numerical
"courseID" and "personID" fields (with a dot in between them). The
field in my Excel file is formed similarly, using Excel formulae.
Come to think of it, the simplest solution would be to eliminate the calculation fields and use the original values for the import:

Join a path variable to each row in the import csv file

I have many Import files, which look like this
So there are sales values per Team Member, but NO period inside.
The period is coded in the Path like:
AllData\201501\Revenues.txt
AllData\201502\Revenues.txt
AllData\201503\Revenues.txt
I want to have the Periode from the path on each data row, so my final output table should look like this:
So I must bring the period from the path inside the file anyway.
The question how to access the path is solved in perfect example here:
How can I save a path criteria when I import from folders?
But there I have still the period on the "whole" text, not on the row.
In the linked question you can change the custom column formula from:
Text.FromBinary([Content])
to
Text.Split(Text.FromBinary([Content]), "#(000a)")
(depending on how line breaks are represented, you may need to use "#(000a)#(000d)" instead).
This will split the text at each new line, and you'll get a list of the name;value pairs. Click on the box with the two arrows next to the column name to expand the column. Each row should now have the period associated with the name;value pair. Finally, split the column by delimiter on the semicolon to separate the name from the value.
There are 2 options, both involve horrible looking equations.
First option, we assume the paths are going to have the period in the same position in the string.
for the example, we want the number between the 1st and 2nd slashes.
=TRIM(LEFT(SUBSTITUTE(MID(A1,FIND("|",SUBSTITUTE(A1,"\","|",1))+1,LEN(A1)),"\",REPT(" ",LEN(A1))),LEN(A1)))
If it's between a different set of slashes, alter the ,1 to tell the formula which slash to start from. If the number of slashes can be different, then we will have to try for the second option.
Second option, we assume that those are the only numbers in the path.
This formula will extract those numbers:
=SUMPRODUCT(MID(0&A1,LARGE(INDEX(ISNUMBER(--MID(A1,ROW($1:$25),1))* ROW($1:$25),0),ROW($1:$25))+1,1)*10^ROW($1:$25)/10)
Note that this will extract all the numbers from the string. If the path contains numbers, then these will get added to the string. e.g. C:\2014Data\201401\Revenues.txt would return 2014201401
If this doesn't take care of it, then it may be easier putting a column into the table yourself

Mid Function for Microsoft Excel to obtain column .txt file

Captain Morgan ------ Insane Journeys -------- A-
I have easily gotten the left and right side parts using Left() and Right() functions.
I want to use a function in excel (not vba) that will allow me to get the middle phrase in this sentence (The dashes are really excessive spaces). can I accomplish this with a Mid() function?
This is just 1 item on a list of 80 different things in 1 column that needs to be turned into 3 columns. Every item has different character lengths. So the length counts cannot be manually entered.
I agree with Text to Columns but the image in the other answer only has one space per row while OP has some spaces that are redundant and some that are not. For this I’d suggest a modified approach:
Replace all pairs of spaces with a character unlikely to be encountered – I’d suggest a pipe.
Apply Text to Columns with pipe as delimiter.
Apply TRIM to the middle column to remove any remaining redundant spaces (eg =TRIM(B1) copied down and then that column pasted as values over the source).
But to answer can I accomplish this with a Mid() function? I think yes though not cost effective for a mere 80 entries when there is a viable alternative.
Try to use "Text to columns" from Data Tab. It has option to split data to different columns using various criteria.
All you need to do is select data you want to split to columns and select criteria you need.
In your case it can be either Space or Other:. When you select Other: you can add your own criteria like "space dot space" or anything you need.
For more detailed information you can enter this link.

Resources