I have a csv file to import to Power Query (250,000 records). One field contains our fee codes which are usually entered as numbers but sometimes contain characters (e.g. 17, 17A, 67, 67A, etc). When I import the file to Power Query, the field is treated as a numeric column and all the data with letters is not imported. I can convert the field to text AFTER the import - but by then it is too late and I have lost all the non numeric data. How can I tell Power Query to bring in this field as TEXT not as a number?
Is there an easy way to change the way the data is imported without having to change the data file or manually create a schema file? P.S. I am new to Power Query so this may be something simple that I have overlooked - but I really have looked!
Power Query adds an automatic type conversion step after certain data sources like CSV. If you look at the Applied Steps list you'll see that the last step is one that changes the type. You can delete that step to undo the type change.
You can also edit the step by making the formula bar visible (which can be enabled by going to the View ribbon and checking the Formula Bar checkbox) and then editing the column type. For example, if your formula was:
= Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type datetime}, {"ID", Int64.Type}})
you can change ID to be a text type by changing Int64.Type to type text.
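For the fee-code case in the question, the edited query might look roughly like the sketch below. The file path, the comma delimiter and the column name "Fee Code" are assumptions for illustration; they are not given in the question.

```powerquery
let
    // Hypothetical path; replace with the real CSV location
    Source = Csv.Document(File.Contents("C:\Data\fees.csv"), [Delimiter = ",", QuoteStyle = QuoteStyle.Csv]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Type the fee code column as text so that values such as 17A are kept
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"Fee Code", type text}})
in
    #"Changed Type"
```

Csv.Document reads every field as text, so the letters are only lost when a later step converts the column to a number.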
I'm working with hundreds of .txt files and I have to combine them into 1 single .csv file
Above is a sample format of the text file (there are only 2 columns but have hundreds more rows)
I'm required to first transpose the contents of each .txt file, and then merge all the results into one table, where they all have a common header row (the column of 31330_at, 31385_at, 31463_s_at etc)
This is my first time working with power query and I'm not entirely sure how to do this as I've tried importing all files and transposing them all at once, but it doesn't work.
let
Source = Folder.Files("Directory"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".TXT")),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content", "Name"}),
#"Invert" = Table.TransformColumns(#"Removed Other Columns", {{"Content", each Table.Transpose(_)}}),
.....
I've tried the code above, but it fails with the error Expression.Error: We cannot convert a value of type Binary to type Table. at the #"Invert" step.
For reference, it's the same concept as this link https://stackguides.com/questions/57805673/how-to-transpose-multiple-csv-files-and-combine-in-excel-power-query
How do I fix this?
It looks like you already know how to pull the files from the folder, as shown by your Source = Folder.Files("Directory").
After combining the Binary files, you probably see something like this on your screen, where all of the txt files are appended one after the other:
But that's not what you want, right? You want the files appended based upon a transposed view of each file. I understand from your description above that the first column of each file contains the same information as the first column of every other file, and you want that first column to become the header row, with the values in column 2 of each file transposed into appended rows.
It looks like you are trying to do your transposing in the query that is generated and listed under Other Queries (probably called Directory for you).
Don't do the transposing there. Instead, look for the query called Transform Sample File, which should be listed under Helper Queries, and do the transposing in it.
Click on the query named Transform Sample File.
Then click Transform -> Transpose, to transpose your table.
Then click the ribbon button for Use First Row as Headers, to make the first row your headers.
Then click on that earlier query that is listed under Other Queries (probably called Directory for you)
...and you will see this error message:
This error is caused because the final step of the query is trying to change types using the old column names. So look to the right side of your screen and delete the Changed Type step by clicking on the X before Changed Type. (If you need to, you can change column types later, for the columns that need it.)
Then you should see what I understand you are wanting to see as a result.
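The same result can also be sketched as a single hand-written query instead of the generated helper queries. This is only an illustration: the tab delimiter is an assumption about the .txt files, and "Directory" is the placeholder path from the question. It also shows why the original #"Invert" step failed: the Content column holds binaries, which have to be parsed into tables before they can be transposed.

```powerquery
let
    // "Directory" is the placeholder folder path used in the question
    Source = Folder.Files("Directory"),
    // Compare extensions case-insensitively
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Lower([Extension]) = ".txt"),
    // Parse the binary into a table, transpose it, then promote the first row to headers.
    // The tab delimiter is an assumption about the file layout.
    ToTable = (content as binary) as table =>
        Table.PromoteHeaders(Table.Transpose(Csv.Document(content, [Delimiter = "#(tab)"]))),
    #"Added Table" = Table.AddColumn(#"Filtered Rows", "Data", each ToTable([Content])),
    // Append the per-file tables; columns are matched by header name
    Combined = Table.Combine(#"Added Table"[Data])
in
    Combined
```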
Hope you have a great day.
Recently I was trying to create a report in Excel using data from an HTML file. The HTML file is basically the web page where all the issues are stored and then filtered into small tables with what we need for the day. I can't get the data from the web directly: the company does not allow add-ins to log in to the site, and Get Data from Web does not work because the database security blocks it. The workaround is to save the page as HTML every time I need to make the report, overwriting the old file that is connected to the Excel workbook.
I managed to create the needed charts from the tables loaded from the HTML file into Excel, but I stumbled on an issue on the Power Query side. The tables in the saved page are not always the same: sometimes a column is missing because there were no issues for it, and the database hides it from the table automatically. When I refresh the query, it then displays the error "The Column X is missing from the table". I know it is missing, but I don't want to get the data again and redo everything each time a column is missing just so the chart updates correctly.
Is there a way to write code in the Power Query advanced editor so that the table updates anyway, even if a column is missing, without needing to re-code or get the data again every time? I'm trying to automate the process, so the less work needed to get the data, the better.
Thanks in advance!
*Edit: This is the source M code of the query:
let
Source = Web.Page(File.Contents("D:\AUTO.html")),
Data1 = Source{1}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data1,{{"Customer Impact", type text}, {"Yes", Int64.Type}, {"No", Int64.Type}, {"WIP", Int64.Type}, {"T:", Int64.Type}})
in
#"Changed Type"
The problem is with the #"Changed Type" step since it's trying to transform non-existing columns.
The simplest solution would be to just eliminate that step entirely and let the data come through without assigning types. That is, replace your query with this:
let
Source = Web.Page(File.Contents("D:\AUTO.html")),
Data1 = Source{1}[Data]
in
Data1
If the typing is important, you can write a more dynamic step to assign types that doesn't break. In this case, you'd need to provide details as to how that logic should work (e.g. "Customer Impact" is always present and should be text and the remainder should all be integers).
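As a rough sketch of such a dynamic step, assuming the columns and types listed in the original #"Changed Type" step are the ones wanted whenever they are present, the type list can be filtered against the columns that actually exist in the current file. The step names WantedTypes and PresentTypes are made up for this example.

```powerquery
let
    Source = Web.Page(File.Contents("D:\AUTO.html")),
    Data1 = Source{1}[Data],
    // Desired type for every column that might appear
    WantedTypes = {{"Customer Impact", type text}, {"Yes", Int64.Type}, {"No", Int64.Type}, {"WIP", Int64.Type}, {"T:", Int64.Type}},
    // Keep only the pairs whose column exists in this refresh
    PresentTypes = List.Select(WantedTypes, each List.Contains(Table.ColumnNames(Data1), _{0})),
    #"Changed Type" = Table.TransformColumnTypes(Data1, PresentTypes)
in
    #"Changed Type"
```

With this approach a missing column is simply skipped rather than raising an error. Anything downstream that refers to that column by name would still need its own handling.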
I'm using the Dutch version of Word and Excel 2016 to fill in data from an Excel table in a Word document using Mail Merge. When doing so, my dates are represented as a number. I tried using the \# format, both in English and in Dutch, but nothing is working. I checked the Excel file and the data is properly formatted as a date. So far, I tried the following formats in my Word document, including adding and removing spaces before and after the quotation marks:
{MERGEFIELD FieldName \# "dd-mm-jjjj"}
{MERGEFIELD FieldName \# "dd-MM-yyyy"}
{MERGEFIELD FieldName .\# "dd-MM-yyyy"} (adding the dot was only mentioned on one website)
I import the data using the 'Use an Existing List' and 'Insert Merge Fields' function in Word.
Does anyone know what I should change to get a proper Date format in my Word document?
FYI, other numbering formats are working fine.
If the dates are being represented as numbers, that means you have mixed data types in the Excel column.
By default, Word 2002 & later use the OLE DB provider to get records from the data source. Because the OLE DB provider is designed to return data in a way that is compatible with databases, it requires a specific data type for each field, and every record in that field must be of that data type. When using other data sources, the OLE DB provider queries the first 8 records to determine the data type for each field (the 8 can be changed in the Windows Registry, but it’s not advisable to do so). This can lead to unexpected results with data sources such as Excel workbooks, where rows (records) in a column (field) can have different data types.
When the OLE DB provider gets data from a column with mixed data types, records that don’t conform to the determined data type for the column are liable to not be handled correctly. The most common mailmerge issues arising out of this include:
numbers but not text or dates being output; and
dates being output as numbers,
for some records.
Ideally, one would ensure each field has only one data type. Workarounds include:
Inserting a dummy first record containing data in the format that is not being output correctly; or
Reordering the data so the first record has content in the format that is not otherwise being output correctly.
If you're unable to do either, see Importing Date and Time Values From Excel and Access in my Microsoft Word Date Calculation Tutorial, available at:
http://www.msofficeforums.com/word/38719-microsoft-word-date-calculation-tutorial.html
or:
http://www.gmayor.com/downloads.htm#Third_party
Do read the document's introductory material.
I have a CSV file with the following values:
3271.96;274;272;1;1;0;1;0.071690;0;0;0;0;0;0;1.753130;1.75;0;1.75;
But when I open the file with Excel I get this:
3271.96 274 272 1 1 0 1 0.071690 0 0 0 0 0 0 1.753.130 1.75 0 1.75
Why is "1.753130" converted into "1.753.130"? (1.753130 is a decimal number) how can I "force" Excel to understand that these are decimal numbers?
I create the CSV file with a web application, so it is difficult to just modify my Excel configuration, because many people visit my website and download the CSV file to their machines.
For users coming to this question with newer Excel versions like Excel 365...
As written at Professor Excel you could activate/restore "From Text (Legacy)" in the settings.
My preferred solution
File - Options - Data
Then you will be able to get the old import wizard... legacy, but in my opinion more intuitive.
Other possibilities
The linked Professor Excel website also shows other possibilities. With Excel's new import dialog, if you have several columns with numbers in a different locale from your computer's locale settings, the import takes much more effort.
With the old wizard you are set within a minute. With the new import dialog I haven't found yet a method to be as fast as with the legacy import method.
here is the answer I used:
go to Data tab on excel sheet.
click on from Text button.
then select text or csv file.
then the import wizard will come out. select comma separated or space separated option.
then select delimiter. (this is better if you don't want it to have problem while importing decimals)
then in the next window there will be Advanced option for General column type. Click the advanced button and choose how to separate decimals and thousands.
Change the decimal separator to a "." and replace the thousands separator with a space.
As of now (Sep, 2020), I managed to do this in a slightly different way. I'm using Excel from a Office 365 subscription.
With your Excel sheet open, go to:
Data (tab) > From Text/CSV (Get & Transform Data section)
Select your file (.txt or .csv), then you'll have 3 options:
File Origin: probably you won't have to change this
Delimiter: choose whatever your delimiter is (probably comma)
Data Type Detection: change this to "Do not detect data types"
rename the csv to .txt
open excel
go to file-->open and point to your txt file
go through the steps of importing it
make sure to use ; as the delimiter
I had the same problem but solely this solution didn't work out for me.
Before that I had to go to Office icon -> Excel Options -> Advanced and set the thousands delimiter from "." to "" (nothing).
There is a more straightforward method to import data from text/csv into Excel (2017):
Open a blank book in Excel and click on import data from text/csv.
Select the file.
The assistant will show a preview of the data, but if you are importing from a csv with decimal / scientific numbers all will be recognized as text.
Before importing, click on edit, you will see an Excel spreadsheet with a preview of your data.
If you click on the advanced editor button, a new window with the query Excel does will appear.
You will see something like:
let
Origin = Csv.Document(File.Contents("C:\Users\JoseEnriqueP\Downloads\evaluation_output.txt"),[Delimiter=",", Columns=8, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Updated type" = Table.TransformColumnTypes(Origin,{{"Column1", Int64.Type}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}})
in
#"Updated type"
Then, you can write down directly the types for each column:
- Text: type text
- Integers: Int64.Type
- Decimals: Double.Type
The import code would be as follows:
let
Origin = Csv.Document(File.Contents("C:\Users\JoseEnriqueP\Downloads\evaluation_output.txt"),[Delimiter=",", Columns=8, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Updated type" = Table.TransformColumnTypes(Origin,{{"Column1", Int64.Type}, {"Column2", Int64.Type}, {"Column3", Int64.Type}, {"Column4", type text}, {"Column5", Double.Type}, {"Column6", Double.Type}})
in
#"Updated type"
By doing this, you will get directly your data into Excel.
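For the dot-versus-comma problem in the original question, the conversion step can also be given a culture, so that "1.753130" is parsed with a dot as the decimal separator regardless of the computer's regional settings. The sketch below assumes a hypothetical file path and UTF-8 encoding; the semicolon delimiter and the column positions follow the sample row in the question, where the 15th value is 1.753130.

```powerquery
let
    // Hypothetical path; the semicolon delimiter matches the sample row in the question
    Source = Csv.Document(File.Contents("C:\Data\values.csv"), [Delimiter = ";", Encoding = 65001, QuoteStyle = QuoteStyle.None]),
    // The optional third argument is the culture used to parse the text as numbers
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Column1", type number}, {"Column15", type number}}, "en-US")
in
    #"Changed Type"
```

Only two columns are typed here to keep the example short; the remaining columns would be listed in the same way.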
If you have a newer version of Excel (e.g. Office 365) and you don't need to correct the file's encoding, here is what worked for me:
open the .csv file by double clicking it in your file explorer
select the column(s) containing decimal numbers
use Find and Replace to change all dots (.) to a comma (,) sign
This assumes that no other data transformations are needed (which would likely require going through the import wizard), and that the file's encoding is correctly recognized by Excel.
If encoding is also an issue, do the following before the steps above:
edit the file in Notepad++
open the Encoding menu tab
choose a desired value to convert the file's encoding
Some of the other answers work also, but for sheer simplicity, you can't beat the Find and Replace method. No matter what you do, here is the most important step: Live long and prosper!
Something that worked for me in 2012 version of Excel is that when you import data, you have the option to open a 'Transform Data' box. In this box on the right side panel, you can see a list of 'Applied Steps'. These are the steps which excel applies on the source file. You can remove the steps from this list which are causing problems.
I had a problem with excel ignoring the decimal point while importing from my text file but this resolved the issue.
When an excel data source is used in SSIS, the data types of each individual column are derived from the data in the columns. Is it possible to override this behaviour?
Ideally we would like every column delivered from the excel source to be string data type, so that data validation can be performed on the data received from the source in a later step in the data flow.
Currently, the Error Output tab can be used to ignore conversion failures - the data in question is then null, and the package will continue to execute. However, we want to know what the original data was so that an appropriate error message can be generated for that row.
According to this blog post, the problem is that the SSIS Excel driver determines the data type for each column based on reading values of the first 8 rows:
If the top 8 records contain equal number of numeric and character types – then the priority is numeric
If the majority of top 8 records are numeric then it assigns the data type as numeric and all character values are read as NULLs
If the majority of top 8 records are of character type then it assigns the data type as string and all numeric values are read as NULLs
The post outlines two things you can do to fix this:
First, add IMEX=1 to the end of your Excel driver connection string. This will allow Excel to read the values as Unicode. However, this is not sufficient if the data in the first 8 rows are numeric.
In the registry, change the value for HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows to 0. This will ensure that the driver looks at all the rows to determine the data type for the column.
Yes, you can. Just go into the output column list on the Excel source and set the type for each of the columns.
To get to the input columns list right click on the Excel source, select 'Show Advanced Editor', click the tab labeled 'Input and Output Properties'.
A potentially better solution is to use the derived column component where you can actually build "new" columns for each column in Excel. This has the benefits of
You have more control over what you convert to.
You can put in rules that control the change (i.e. if null give me an empty string, but if there is data then give me the data as a string)
Your data source is not tied directly to the rest of the process (i.e. you can change the source and the only place you will need to do work is in the derived column)
If your Excel file contains a number in the column in question in the first row of data, it seems that the SSIS engine will reset the type to a numeric type. It kept resetting mine. I went into my Excel file and changed the numbers to "Numbers stored as text" by placing a single quote in front of them. They are now read as text.
I also noticed that SSIS uses the first row to IGNORE what the programmer has indicated is the actual type of the data (I even told Excel to format the entire column as TEXT, but SSIS still used the data, which was a bunch of digits), and reset it. Once I fixed that by putting a single-quote in my Excel file in front of the number in the first row of data, I thought it would get it right, but no, there is additional work.
In fact, even though the SSIS External DataSource Column now has the type DT_WSTR, it will still read 43567192 as 4.35671E+007. So you have to go back into your Excel file and put single quotes in front of all the numbers.
Pretty LAME, Microsoft! But there's your solution. I have no idea what to do if the Excel file is not under your control.
I was looking for a solution for the similar issue, but didn't find anything on the internet. Although most of the found solutions work at design time, they don't work when you want to automate your SSIS package.
I resolved the issue and made it work by changing the properties of "Excel Source". By default the AccessMode property is set to OpenRowSet. If you change it to SQL Command, you can write your own SQL to convert any column as you wish.
For me SSIS was treating the NDCCode column as float, but I needed it as a string and so I used following SQL:
Select [Site], Cstr([NDCCode]) as NDCCode From [Sheet1$]
The Excel source in SSIS behaves erratically. SSIS determines the data type of a particular column by reading the first 10 rows, hence the issue. If you have a text column with null values in the first 10 rows, SSIS takes the data type as Int. After a bit of struggle, here is a workaround:
Insert a dummy row (preferably the first row) in the worksheet. I prefer doing this through a Script task; you may consider using some service to preprocess the file before SSIS connects to it
With the dummy row, you can be sure that the data types will be set as you need
Read the data using Excel source and filter out the dummy row before you take it for further processing.
I know it is a bit shabby, but it works :)
I could fix this issue. While creating the SSIS package, I manually changed the specific column to text (open the Excel file, select the column, right click on the column, select Format Cells, select Text in the Number tab and save the Excel file).
Now create the SSIS package and test it. It works. Now try to use the excel file where this column was not set as text.
It worked for me and I could execute the package successfully.
This can be resolved simply: just untick the box "First row as column names" and all data will be collected as the text data type. The only downside of this choice is that you have to manage the column names from the auto names (column 1, 2 etc.) and handle the first row, which contains the column names.
I had trouble implementing the solution here - I could follow the instructions, but it only gave new errors.
I solved my conversion issues by using a Data Conversion entity. This can be found on the SSIS Toolbox under Data Flow Transformations. I placed the Data Conversion between my Excel Source and OLE DB Destination, linked Excel to Data C, Data C to OLE DB, double clicked Data C to bring up a list of the data columns. Gave the problem column a new Alias, and changed the Data Type column.
Lastly, in the Mappings of the OLE DB Destination, use the Alias column name, rather than the original Excel column name. Job done.
You can use a Data Conversion component to convert to the desired data types.