I give my client a template that they are supposed to populate and then they upload the spreadsheet and I read the file with cfspreadsheet in order to copy the data into a database table.
Pretty easy. The template has only one column in it, and the client cannot upload a sheet with more than one column. This used to work.
The one column header is ING_CAS, but when I read the file in with cfspreadsheet I get COL_2, COL_3, ING_CAS. So not only are the blank cells being read, they are also being given default names because of the attribute headerrow="1".
I'm at a loss here. I keep downloading the template and selecting the extraneous blank rows and columns and deleting them but I have no control over the file once the client gets it.
Is there some strange setting I am missing that will make cfspreadsheet ignore blank cells?
<cfspreadsheet action="read" src="#theFile#" query="SpreadSheetData" headerrow="1">
<cfdump var="#SpreadSheetData#" />
I ended up writing a helper function that stripped out COL_(n) columns.
<cffunction name="CleanExcelQuery" access="public" returntype="query" output="false" hint="Strips out blank column headers picked up on read.">
    <cfargument name="SpreadSheetQuery" type="query" required="true" />
    <cfset var theColumnHeaders = SpreadSheetQuery.columnList>
    <cfset var theNewColumnHeaders = "">
    <!--- var-scope the loop index and the result so they do not leak out of the function --->
    <cfset var h = "">
    <cfset var newSpreadSheetQuery = "">
    <!--- Keep every column whose name does not start with the default COL_ prefix --->
    <cfloop list="#theColumnHeaders#" index="h">
        <cfif uCase(left(h, 4)) IS NOT "COL_">
            <cfset theNewColumnHeaders = ListAppend( theNewColumnHeaders, h )>
        </cfif>
    </cfloop>
    <cfquery name="newSpreadSheetQuery" dbtype="query">
        Select #theNewColumnHeaders#
        From SpreadSheetQuery
    </cfquery>
    <cfreturn newSpreadSheetQuery />
</cffunction>
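The filtering idea in the helper can be sketched outside ColdFusion as well. The Python below is only an illustration of the logic, not CFML, and it assumes the placeholder names always look like COL_ followed by digits, as in the question. Matching the full pattern rather than the four-character prefix avoids dropping a real header such as COL_NAME (a hypothetical header used here for illustration).

```python
import re

# Assumption: auto-generated headers look like COL_2, COL_3, ... as in the question.
DEFAULT_NAME = re.compile(r"^COL_\d+$", re.IGNORECASE)


def real_columns(column_list):
    """Return the names from a comma-separated column list that are not placeholders."""
    names = [name.strip() for name in column_list.split(",")]
    return [name for name in names if name and not DEFAULT_NAME.match(name)]


# The column list reported in the question.
kept = real_columns("COL_2,COL_3,ING_CAS")
print(kept)
```

If every column turns out to be a placeholder the result is an empty list, which the calling code would need to handle before building a SELECT statement.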
cfspreadsheet only omits cells that are completely blank: no value or format (such as when you select a cell and use "clear all"). If it is picking up "extra" columns, it is because one or more of the cells in that column have a value or a custom cell format. Meaning they are not really "blank".
If you know the column position, you can use the columns attribute to read only the values in that column. For example, to read column C:
<cfspreadsheet action="read"
src="c:/path/to/file.xls"
columns="3"
headerrow="1"
query="qResult" />
But I am not sure I understand why this is an issue. If you only need one column, simply ignore the other columns in your code. Can you elaborate on why this is causing an issue?
If you always know which columns you want to read, you can use this:
<cfspreadsheet action="read" src="#path#" query="data" headerrow="1" excludeHeaderRow = "true" columns = "1-5" >
The above code reads columns 1 through 5. You can also use Leigh's solution to read a single column, or you can do something like columns="1,3,6" (if I remember correctly) to read a custom set of columns.
The columns attribute reads only the columns you tell it to, without jumping around. I used this to read files that come from our clients, where I usually get a few columns that are not "blank" due to their format.
You can also check the CF documentation for cfspreadsheet to see what other values the columns attribute supports.
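To make the column specification concrete, here is a small Python sketch of how a value such as "1-5" or "1,3,6" expands into individual column positions. It is not how cfspreadsheet is implemented; it only assumes the spec is a comma-separated list of numbers and ranges, as described in the answers above, and parse_columns is a hypothetical name.

```python
def parse_columns(spec):
    """Expand a spec such as "1-3,6" into the list [1, 2, 3, 6]."""
    positions = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-", 1))
            positions.extend(range(start, end + 1))
        else:
            positions.append(int(part))
    return positions


print(parse_columns("1-5"))
print(parse_columns("1,3,6"))
```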
I want to reference all the data in my dynamic table, except for the first two columns. My goal is to return the header of the first column that isn't blank, starting with the third column. I have the formula figured out for everything except the starting with the third column part. Is there an easy way to accomplish this? I'm thinking I might have to just do something like
`=Table[#Data] unless in the range of the first two columns`
Hoping for an easier way though.
EDIT: if my request isn't clear enough, I am looking for a formula that would produce the following exact situation in these circumstances. It must work in a table that can change size without issue, it must ignore the first two columns, it must scan a complete column of data from left to right before moving onto the next column (most of the formulas I've tried would give the result Aug-21 here), and it must return the header in basically any format.
I don't have the time to write up a full answer for this, but you should use the "From Table" button in the "Get & Transform" section of the Data ribbon.
Then, in the query editor window, in the Home ribbon, click Manage > Reference. Finding the position of the first non-blank column will be harder and requires learning the Power Query language, probably something like clicking the Advanced Editor and adding steps like
let
    Source = #"YourSourceQueryName",
    ColumnNames = Table.ColumnNames(Source),
    ColumnsToRemove = 2 + List.PositionOf( // PositionOf is zero-based, returning -1 if all are blank
        List.Transform(
            List.RemoveFirstN( // list of column names except the first two
                ColumnNames,
                2
            ),
            // true when the column holds nothing but nulls
            (columnName) => List.IsEmpty(List.RemoveNulls(Table.Column(Source, columnName)))
        ),
        false
    ), // Power query is lazy, so this won't actually look at every column, it will stop when it finds the first column!
    ColumnNamesToKeep = List.RemoveFirstN(
        ColumnNames,
        ColumnsToRemove
    ),
    ReturnTable = if (ColumnsToRemove = 1) then
            "All columns were blank!" // PositionOf returned -1!
        else
            Table.SelectColumns(Source, ColumnNamesToKeep)
in
    ReturnTable
You can now use this in other queries or you can load it to your spreadsheet. Unfortunately, Power Query doesn't refresh live; you have to either explicitly refresh the query or use the "Refresh All" button in the Data ribbon.
(I stressed the word "like" because I didn't debug. May contain syntax errors or other issues for you to debug.)
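The scan the question asks for (skip the first two columns, then check each remaining column top to bottom before moving right) can be sketched as follows. This is Python rather than M or an Excel formula, the headers and rows are made-up sample data, and first_nonblank_header is a hypothetical name.

```python
def first_nonblank_header(headers, rows, skip=2):
    """Return the header of the first column after `skip` that holds any non-blank value.

    Columns are scanned left to right and each column is checked in full
    before moving on. Returns None when every remaining column is blank.
    """
    for col in range(skip, len(headers)):
        for row in rows:
            value = row[col] if col < len(row) else None
            if value is not None and str(value).strip() != "":
                return headers[col]
    return None


# Hypothetical table: two leading columns, then three month columns.
headers = ["Name", "ID", "Jul-21", "Aug-21", "Sep-21"]
rows = [
    ["a", 1, None, None, "x"],
    ["b", 2, None, "y", None],
]
print(first_nonblank_header(headers, rows))
```

A row-by-row scan of the same data would hit the Sep-21 value in the first row before the Aug-21 value in the second, which is the kind of wrong answer the question describes.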
I would like to grab the first rows of all CSV files in a folder. I have read that power query would probably be best.
I have gone to Excel > Data > Get Data > From Folder > OK. That has brought me to a table of all the csvs in the folder. I would like to grab the first row of all of these files. I do not want to import all rows of the tables because it was way too many rows. It is also too many tables to do one by one. Please tell me what I should do next. Thank you!
First image is where I am, Second image is where I would like to be
The approach below should give you a single table, wherein each column contains a given CSV's first row's values. It's not exactly what you've shown in your second image (namely, there are no blank columns in between each column of values), but it might still be okay for you.
You can parse a CSV with Csv.Document function (which should give you a table).
You can get the first row of the table (from the previous step) using:
Table.First and Record.FieldValues
or Table.PromoteHeaders and Table.ColumnNames
(It would make sense to create a custom function to do the above steps for you and then invoke the function for each CSV. See GetFirstRowOfCsv in the code below.)
The function above returns a list (containing the CSV's first row's values). Calling the function for all your CSVs should give you a list of lists, which you can then combine into a single table with Table.FromColumns.
Overall, starting from the Folder.Files call, the code looks like:
let
    filesInFolder = Folder.Files("C:\Users\"),
    GetFirstRowOfCsv = (someFile as binary) as list =>
        let
            csv = Csv.Document(someFile, [Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
            promoted = Table.PromoteHeaders(csv, [PromoteAllScalars=true]),
            firstRow = Table.ColumnNames(promoted)
        in firstRow,
    firstRowExtracted = Table.AddColumn(filesInFolder, "firstRowExtracted", each GetFirstRowOfCsv([Content]), type list),
    combined =
        let
            columns = firstRowExtracted[firstRowExtracted],
            headers = List.Transform(firstRowExtracted[Name], each Text.BeforeDelimiter(_, ".csv")),
            toTable = Table.FromColumns(columns, headers)
        in toTable
in
    combined
which gives me:
The null values are because there were more values in the first row of my ActionLinkTemplate.csv than the first rows of the other CSVs.
You will need to change the folder path in the above code to whatever it is on your machine.
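For comparison, the same "first row of every CSV in a folder" idea can be sketched in Python. This is not Power Query; it is a minimal illustration that assumes UTF-8, comma-delimited files, and first_rows is a hypothetical name. Only one row is read from each file, so large files are not loaded in full.

```python
import csv
from pathlib import Path


def first_rows(folder):
    """Map each CSV file name (without extension) to the values in its first row."""
    result = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # next() pulls a single row; an empty file yields an empty list.
            result[path.stem] = next(csv.reader(handle), [])
    return result
```

Each dictionary entry plays the role of one column in the combined table above, keyed by file name.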
In the GUI, you can select the top N row(s) where you choose N. Then you can expand all remaining rows.
Hi, I am trying to write VBA for Excel to clean up data elements that have extra information, without impacting the other elements.
I am writing VBA for the first time. My table is in the middle of the sheet.
Given Table and Requested Output.
I think your question was not clear in regard to the "steps" that you want to perform on your data (i.e. the exact logic or transformation that needs to be applied).
Based purely on your images and your comment, I make the "steps" to be:
Split any customer IDs in column valueC into multiple rows.
If column valueC does not contain customer IDs (i.e. is blank or contains non-customer ID text), leave it untouched.
My answer uses Power Query instead of VBA. If you are interested in trying it out, in Excel try clicking Data > Get Data > From Other Sources > Blank Query, then click Advanced Editor near the top-left, copy-paste the code below, then click Done.
You might need to change the name of the table in the first line of the code (below), as it was "Table1" for me, but I imagine yours is named something else. Also, the code below is case-sensitive. So if there is no column named exactly valueC, then you will get an error.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    fxProcessSomeText = (textToProcess as any) =>
        let
            canBeSplit = Text.StartsWith(textToProcess, "### customer id"),
            result = if textToProcess is null then null else if canBeSplit then Text.Split(Text.BetweenDelimiters(textToProcess, "### customer id", " ###"), ",") else {textToProcess}
        in
            result,
    invokeFunction = Table.TransformColumns(Source, {{"valueC", fxProcessSomeText}}),
    expanded = Table.ExpandListColumn(invokeFunction, "valueC"),
    reindex =
        let
            removeIndex = Table.RemoveColumns(expanded, {"index"}),
            addIndex = Table.AddIndexColumn(removeIndex, "index", 1, 1),
            moveIndex = Table.ReorderColumns(addIndex, List.Distinct(List.InsertRange(Table.ColumnNames(addIndex), 0, {"index"})))
        in
            moveIndex
in
    reindex
My output table contains more rows than yours. Also, the value in column valueA, row 11 is 1415 for me (it is 1234 in your requested output). Not sure if this is a mistake in your example, or if I'm missing some logic.
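The two steps (split customer IDs into separate rows, leave everything else untouched, then renumber the index) can also be sketched in Python to make the logic easier to follow. This is an illustration only: the sample rows are invented, the marker strings are taken from the query above, and the behaviour when the closing marker is missing is an assumption of this sketch.

```python
PREFIX = "### customer id"
SUFFIX = " ###"


def split_customer_ids(value):
    """Return the list of pieces that one valueC cell expands into."""
    if isinstance(value, str) and value.startswith(PREFIX):
        start = len(PREFIX)
        end = value.find(SUFFIX, start)
        # Assumption: if the closing marker is missing, keep the rest of the text.
        inner = value[start:end] if end != -1 else value[start:]
        return inner.split(",")
    return [value]  # blank or non-customer-ID values stay as a single row


def expand_rows(rows, column="valueC"):
    """Emit one output row per piece and renumber the index column from 1."""
    expanded = []
    for row in rows:
        for piece in split_customer_ids(row.get(column)):
            new_row = dict(row)
            new_row[column] = piece
            new_row["index"] = len(expanded) + 1
            expanded.append(new_row)
    return expanded


# Hypothetical input rows.
sample = [
    {"index": 1, "valueA": 1234, "valueC": "### customer id1001,1002,1003 ###"},
    {"index": 2, "valueA": 1415, "valueC": "some other note"},
    {"index": 3, "valueA": 1516, "valueC": None},
]
for row in expand_rows(sample):
    print(row)
```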
We have a spreadsheet that gets updated monthly, which queries some data from our server.
The query url looks like this:
http://example.com/?2016-01-31
The returned data is in a json format, like below:
{"CID":"1160","date":"2016-01-31","rate":{"USD":1.22}}
We only need the value of 1.22 from the above and I can get that inserted into the worksheet with no problem.
My questions:
1. How can I use a cell value [containing the date] to pass the date parameter [2016-01-31] in the query and display the result in the cell next to it?
2. There's a long list of dates in a column; can this query be filled down automatically for each date?
3. When I load the query result to the worksheet, it always loads in pairs [taking up two cells, one says "Value", the other contains the value, which is "1.22" in my case]. Ideally I would only need "1.22", not the title. Can this be removed? [Del won't work; it gives you a "Column 1" instead, or you have to hide the entire row, which will mess up the layout.]
I know this is a lot to ask but I've tried a lot of search and reading in the last few days and I have to say the M language beats me.
Thanks in advance.
Convert your Web.Contents() request into a function:
let
    myFunct = ( param as date ) => let
        // use the function's parameter (param), not the type name "date"
        x = Web.Contents(.... & Date.ToText(param) & ....)
    in
        x
in
    myFunct
Reference your data request function from a new query and include any transformations you need (in this case Json.Document, table expansions, and removing extraneous data). Feel free to delete all the extra data here, including columns that just contain the label 'value'.
(assuming your table of domain values already exists) add a custom column like
=Expand(myFunct( [someparameter] ))
edit: got home and got into my bookmarks. Here is a more detailed reference for what you are looking to do: http://datachix.com/2014/05/22/power-query-functions-some-scenarios/
For a table - Add column where you get data and parse JSON
let
tt=#table(
{"date"},{
{"2017-01-01"},
{"2017-01-02"},
{"2017-01-03"}
}),
add_col = Table.AddColumn(tt, "USD", each Json.Document(Web.Contents("http://example.com/?date="&[date]))[rate][USD])
in
add_col
If you need only one value
Json.Document(Web.Contents("http://example.com/?date="&YOUR_DATE_STRING))[rate][USD]
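The two pieces involved here, building the per-date URL and pulling rate.USD out of the JSON response, can be sketched in Python without touching the network. The URL pattern and the sample payload are copied from the question; build_url and extract_usd are hypothetical helper names, and the HTTP request itself is deliberately left out.

```python
import json

BASE_URL = "http://example.com/"  # placeholder host used in the question


def build_url(date_text):
    """Append an ISO date such as 2016-01-31 as the query string."""
    return f"{BASE_URL}?{date_text}"


def extract_usd(payload):
    """Return only the USD rate from a response body."""
    return json.loads(payload)["rate"]["USD"]


# Sample response body from the question.
sample = '{"CID":"1160","date":"2016-01-31","rate":{"USD":1.22}}'
print(build_url("2016-01-31"))
print(extract_usd(sample))
```

Applying these two steps to each date in a column gives one rate per row, which is what the added "USD" column does in the query above.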
I have created query using cfspreadsheet. Now I'm wondering if it's possible to convert query to tab delimited text file. This is my code to get the query:
<cfspreadsheet action="read" src="C:\myFiles\Records.xlsx" query="myQuery" headerrow="1">
Here is the list of my records from excel represented in cfquery:
FIRST_NAME LAST_NAME DOB GENDER
1 FIRST_NAME LAST_NAME DOB GENDER
2 Mike Johns 01/12/98 M
3 Helen Johns 2/2/01 F
I would like my text file to look like this if possible:
FIRST_NAME LAST_NAME DOB GENDER
Mike Johns 01/12/98 M
Helen Johns 2/2/01 F
Tab delimiter between the values and \n to create a newline. I have tried .csv but I could not get the file organized as I showed above. Also, if there is any other way to convert an .xlsx file to .txt, please let me know. I was looking at xp_cmdshell commands, but there is nothing that would be helpful in my case.
Here is the code that I used to get the .csv file:
<cfspreadsheet action="read" format="csv" src="C:\myFiles\Records.xlsx" name="myCsv">
Then I used FileWrite() to get .txt file:
<cfscript>
FileWrite("C:\myFiles\Records.txt", "#myCsv#");
</cfscript>
The code above gave me a tab-delimited text file, but one problem occurred: if a field was empty, that column disappeared. For example, if I did not have a value in my GENDER column, that column was not created.
Mike Johns 01/12/98
You may see this literally as a question of converting a query result-set into a tab-delimited CSV file. That is, without the involvement of cfspreadsheet. You will get an answer by slightly modifying the answer I gave to a similar question from you:
<cfspreadsheet
    action="read"
    src="C:\myFiles\Records.xlsx"
    query="excelquery"
    sheet="1">
<!--- Create the output file in the current directory --->
<cffile action="write" file="#expandPath('result.csv')#" output="">
<cfset columns = listToArray(excelquery.columnList)>
<cfoutput query="excelquery">
    <cfset rowData = arrayNew(1)>
    <cfloop from="1" to="#arrayLen(columns)#" index="n">
        <!--- Bracket notation replaces evaluate(); empty cells are kept as empty strings --->
        <cfset arrayAppend(rowData, excelquery[columns[n]][currentRow])>
    </cfloop>
    <!--- Tab-separated row data; joining an array keeps empty cells in position --->
    <cfset rowList = arrayToList(rowData, chr(9))>
    <!--- Append the row; cffile adds the newline itself by default, so no '<br>' is needed --->
    <cffile action="append" file="#expandPath('result.csv')#" output="#rowList#">
</cfoutput>
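The missing-column problem described in the question comes down to empty fields being dropped when a row is joined. As a language-neutral illustration (Python, not CFML), the sketch below joins every field with a tab, including empty ones, so each line always has the same number of columns. The column names are from the question; to_tab_delimited is a hypothetical name.

```python
def to_tab_delimited(columns, rows):
    """Build tab-delimited text with a header line; empty or missing values stay as empty fields."""
    lines = ["\t".join(columns)]
    for row in rows:
        fields = ["" if row.get(col) is None else str(row.get(col)) for col in columns]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


columns = ["FIRST_NAME", "LAST_NAME", "DOB", "GENDER"]
rows = [
    {"FIRST_NAME": "Mike", "LAST_NAME": "Johns", "DOB": "01/12/98", "GENDER": ""},
    {"FIRST_NAME": "Helen", "LAST_NAME": "Johns", "DOB": "2/2/01", "GENDER": "F"},
]
text = to_tab_delimited(columns, rows)
print(text)
```

The first data line ends with a trailing tab, which is the empty GENDER field being preserved rather than dropped.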