How to create a tab-delimited text file from a cfquery?

I have created a query using cfspreadsheet. Now I'm wondering whether it's possible to convert that query to a tab-delimited text file. This is the code I use to get the query:
<cfspreadsheet action="read" src="C:\myFiles\Records.xlsx" query="myQuery" headerrow="1">
Here are my records from Excel, as they appear in the cfquery:
FIRST_NAME LAST_NAME DOB GENDER
1 FIRST_NAME LAST_NAME DOB GENDER
2 Mike Johns 01/12/98 M
3 Helen Johns 2/2/01 F
I would like my text file to look like this if possible:
FIRST_NAME LAST_NAME DOB GENDER
Mike Johns 01/12/98 M
Helen Johns 2/2/01 F
I want a tab between the values and \n for each new line. I have tried .csv, but I could not get the file organized as shown above. Also, if there is any other way to convert an .xlsx file to .txt, please let me know. I looked at xp_cmdshell commands, but found nothing helpful for my case.
Here is the code I used to get a .csv file:
<cfspreadsheet action="read" format="csv" src="C:\myFiles\Records.xlsx" name="myCsv">
Then I used FileWrite() to write the .txt file:
<cfscript>
FileWrite("C:\myFiles\Records.txt", "#myCsv#");
</cfscript>
The code above gave me a tab-delimited text file, but one problem occurred: if a field was empty, its column disappeared. For example, if there was no value in the GENDER column, that column was not created:
Mike Johns 01/12/98

You can treat this purely as a question of converting a query result set into a tab-delimited file; cfspreadsheet is only needed to read the workbook. Slightly modifying the answer I gave to a similar question of yours gives:
<cfspreadsheet
action = "read"
src="C:\myFiles\Records.xlsx"
query="excelquery"
sheet="1">
<!--- Create an empty output file in the current directory (addNewLine="no" avoids a blank first line) --->
<cffile action="write" file="#expandpath('result.csv')#" output="" addNewLine="no">
<cfset columns = listToArray(excelquery.ColumnList)>
<cfloop query="excelquery">
<cfset rowData = []>
<cfloop from="1" to="#arraylen(columns)#" index="n">
<!--- Read each cell by column name and row number; empty cells stay as "" --->
<cfset arrayAppend(rowData, excelquery[columns[n]][excelquery.currentRow])>
</cfloop>
<!--- Tab-separated row data; arrayToList keeps empty cells as empty fields --->
<cfset rowList = arrayToList(rowData, chr(9))>
<!--- Append the row; cffile adds a newline after the output by default --->
<cffile action="append" file="#expandpath('result.csv')#" output="#rowList#">
</cfloop>
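The underlying problem in the question is that an empty cell must still produce its own field, so every line keeps the same number of tabs. As a language-neutral illustration only (Python, not ColdFusion; the function name rows_to_tsv is hypothetical), the standard csv module handles this directly:

```python
import csv
import io


def rows_to_tsv(header, rows):
    """Return header + rows as tab-delimited text, one line per row.

    Every cell becomes a field, so an empty ("" or None) cell still
    produces its tab separator and the column does not disappear.
    """
    buf = io.StringIO()
    # lineterminator="\n" writes plain newlines instead of the default "\r\n"
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    # csv.writer writes None as an empty field
    writer.writerows(rows)
    return buf.getvalue()


header = ["FIRST_NAME", "LAST_NAME", "DOB", "GENDER"]
rows = [["Mike", "Johns", "01/12/98", ""], ["Helen", "Johns", "2/2/01", "F"]]
print(rows_to_tsv(header, rows), end="")
```

Writing the result to a .txt file is then a single write of the returned string.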

Related

Expand variable number of text tags from XML data source in Excel get&transform

I'm trying to use Excel's Get & Transform functionality (previously known as Power Query) to import an XML data source. The data source has a list of b tags, each with a variable number of d tags in a c2 child, such as the following:
<a>
<b>
<c1>foo</c1>
<c2>
<d>bar</d>
</c2>
</b>
<b>
<c1>fuz</c1>
<c2>
<d>baz</d>
<d>quz</d>
</c2>
</b>
</a>
When I import this data with the following query, the data type of column c2.d differs between the two rows representing the b items: in the first row it is a general spreadsheet cell type, and in the second row it is a Table type.
let
Source = Xml.Tables(File.Contents("C:\Localdata\excel-powerquery-test2.xml")),
Table0 = Source{0}[Table],
#"Changed Type" = Table.TransformColumnTypes(Table0,{{"c1", type text}}),
#"Expanded c2" = Table.ExpandTableColumn(#"Changed Type", "c2", {"d"}, {"c2.d"})
in
#"Expanded c2"
It seems that for the first row, the d tag is automatically converted into a simple spreadsheet cell, because there is only one and it contains only text. For the second row, however, there are two d tags, so it stays a table. The problem is that I can neither load the data as is (the Table in the second row is loaded into the spreadsheet as the literal string "Table", so the actual data is lost), nor expand the Table further with Table.ExpandTableColumn, because it (rightly) complains that bar in the first row is not a table.
I presume the automatic conversion of a single text-only tag to a simple cell rather than a table happens in either Xml.Tables or ExpandTableColumn. The tooltip for Xml.Tables shows that it has an options parameter; unfortunately, the documentation for Xml.Tables does not give any details about it.
How can I expand this second row into two rows, one for each of the two d tags in the second b tag, both with "fuz" in the first column? This kind of expansion works fine when the d tags contain further XML tags, but apparently not when they contain only text.
Let's add a step to make sure everything is at the same level:
let
Source = Xml.Tables(File.Contents("C:\Localdata\excel-powerquery-test2.xml")),
Table0 = Source{0}[Table],
Expandc2 = Table.ExpandTableColumn(Table0, "c2", {"d"}, {"d"}),
ToLists = Table.TransformColumns(Expandc2,
{"d", each if _ is table then Table.ToList(_) else {_}}),
ExpandLists = Table.ExpandListColumn(ToLists, "d")
in
ExpandLists
The ToLists step turns the d column, which mixes a plain text value (first row) with a nested Table (second row), into a consistent list format:
c1 d
-----------------------
foo {"bar"}
fuz {"baz", "quz"}
The ExpandLists step can then expand the lists into rows without mixed data types.
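For comparison outside Power Query, the same "always treat d as a list" idea can be sketched with Python's standard xml.etree.ElementTree on the sample XML from the question. It only illustrates the expected result (one row per d tag); the function name expand_d_tags is hypothetical and this is not part of the M solution.

```python
import xml.etree.ElementTree as ET

# Sample data from the question
SAMPLE = """<a>
<b><c1>foo</c1><c2><d>bar</d></c2></b>
<b><c1>fuz</c1><c2><d>baz</d><d>quz</d></c2></b>
</a>"""


def expand_d_tags(xml_text):
    """Return one (c1, d) pair per <d> tag, whether a <b> has one or many."""
    root = ET.fromstring(xml_text)
    rows = []
    for b in root.iter("b"):
        c1 = b.findtext("c1")
        # findall always returns a list, so a single <d> and several <d>s
        # are handled the same way (the equivalent of the ToLists step)
        for d in b.findall("c2/d"):
            rows.append((c1, d.text))
    return rows


print(expand_d_tags(SAMPLE))
# [('foo', 'bar'), ('fuz', 'baz'), ('fuz', 'quz')]
```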

Creating csv file from pandas dataframe as per the sheet names in file

I have an Excel file with six sheets. I am writing a Python script to convert each sheet into a separate CSV file.
Each sheet looks like this; for example, the sheet named class5:
Name ID
Mark 11
Tom 22
Jane 33
The workbook has several sheets like this.
I need to convert each one into a CSV file containing just the Name and the class (the sheet name), like this:
Mark,class5
Tom,class5
Jane,class5
That is one sheet; since I have several, I am currently reading every sheet into its own DataFrame like this:
import pandas as pd

xls = pd.ExcelFile('path_of_file.xlsx')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')
df3 = pd.read_excel(xls, 'Sheet3')
How can I create a CSV file called class5.csv with the output above, and likewise for every other sheet (class6, class7, class8, ...)?
From your question, I assume you want the contents of each sheet saved to a separate CSV file that contains the Name column plus another column with the name of the sheet it came from, without a header.
If that's what you're after, you could do:
import pandas as pd

xls = pd.read_excel('path_of_file', sheet_name=None)
for sheet_name, df in xls.items():
    df['sheet'] = sheet_name
    df[['Name', 'sheet']].to_csv(f'{sheet_name}.csv', header=False, index=False)
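The key point is the sheet_name argument of read_excel: as the commenter on your question says, leave it as None and you get a dictionary of DataFrames, keyed by sheet name, that you can iterate through. Also pass index=False to to_csv; otherwise the row index is written as an extra first column.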
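To check the output format without an Excel file, here is a sketch that takes a dictionary of the shape read_excel returns with sheet_name=None (built by hand here) and produces the CSV text for each sheet. The function name is hypothetical; to write real files, pass a path such as f'{sheet_name}.csv' to to_csv instead of collecting strings.

```python
import pandas as pd


def sheets_to_csv_text(sheets):
    """Map each sheet name to CSV text with rows 'Name,<sheet name>'.

    `sheets` is assumed to look like the dict returned by
    pd.read_excel(path, sheet_name=None): {sheet name: DataFrame}.
    """
    out = {}
    for sheet_name, df in sheets.items():
        result = df[["Name"]].copy()
        result["sheet"] = sheet_name
        # No path given, so to_csv returns the text; header/index are dropped
        out[sheet_name] = result.to_csv(header=False, index=False)
    return out


sheets = {"class5": pd.DataFrame({"Name": ["Mark", "Tom", "Jane"], "ID": [11, 22, 33]})}
print(sheets_to_csv_text(sheets)["class5"])
```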

Creating csv file from pandas dataframe as per the sheet names in file from condition over a column

I have an Excel file with six sheets. I am writing a Python script to convert each sheet into a separate CSV file.
Each sheet looks like this; for example, the sheet named class5:
Name ID Result
Mark 11 Pass
Tom 22 Fail
Jane 33 Pass
Colin 44 Not Appeared
The workbook has several sheets like this.
I need to convert each one into a CSV file containing just the Name and the sheet name, only for 'Pass' and 'Fail' candidates and not for 'Not Appeared' candidates.
For example, the file class5.csv should contain just:
Mark,class5
Tom,class5
Jane,class5
Note: Colin is excluded because he did not appear.
That is one sheet; I have several like it. I wrote the script below, which reads the Excel file and creates the Name,class CSV files, but I cannot work out how to filter on the Result column to exclude 'Not Appeared'. Right now the output includes all Pass, Fail, and Not Appeared rows:
xls = pd.read_excel('path_of_file', sheet_name=None)
for sheet_name, df in xls.items():
    df['sheet'] = sheet_name
    df[['Name', 'sheet']].to_csv(f'{sheet_name}.csv', header=False, index=False)
How can I add a filter on the Result column to remove only the 'Not Appeared' students?
You can filter the rows by the condition and select the Name column in a single step, using boolean indexing with DataFrame.loc:
import pandas as pd

xls = pd.read_excel('path_of_file', sheet_name=None)
for sheet_name, df in xls.items():
    # Keep rows whose Result is not 'Not Appeared', and only the Name column
    df = df.loc[df["Result"] != 'Not Appeared', ['Name']].copy()
    df['sheet'] = sheet_name
    df.to_csv(f'{sheet_name}.csv', header=False, index=False)
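The question spells the value both as 'Not appeared' and 'Not Appeared', while the comparison in the answer is exact. If the spreadsheet data might vary in case or trailing spaces (an assumption, not something the question confirms), a sketch that normalizes the Result column before filtering could look like this. The sheets dictionary stands in for what read_excel(..., sheet_name=None) returns, and the function name is hypothetical.

```python
import pandas as pd


def filtered_sheets_to_csv_text(sheets, excluded="not appeared"):
    """Map each sheet name to 'Name,<sheet name>' CSV text, skipping excluded results."""
    out = {}
    for sheet_name, df in sheets.items():
        # Compare case-insensitively and ignore surrounding spaces
        result_norm = df["Result"].astype(str).str.strip().str.lower()
        kept = df.loc[result_norm != excluded, ["Name"]].copy()
        kept["sheet"] = sheet_name
        out[sheet_name] = kept.to_csv(header=False, index=False)
    return out


sheets = {
    "class5": pd.DataFrame({
        "Name": ["Mark", "Tom", "Jane", "Colin"],
        "ID": [11, 22, 33, 44],
        "Result": ["Pass", "Fail", "Pass", "Not Appeared"],
    })
}
print(filtered_sheets_to_csv_text(sheets)["class5"])
```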

Cleaning Excel Table using VBA without impacting the entire table and formatting

I am trying to write VBA for Excel to clean up data elements that contain extra information, without affecting the other elements.
This is my first time writing VBA, and my table is in the middle of the sheet.
Given Table and Requested Output.
I think your question was not clear about the "steps" you want to perform on your data (i.e. the exact logic or transformation that needs to be applied).
Based purely on your images and your comment, I make the "steps" to be:
Split any customer IDs in column valueC into multiple rows.
If column valueC does not contain customer IDs (i.e. is blank or contains non-customer ID text), leave it untouched.
My answer uses Power Query instead of VBA. If you are interested in trying it out, in Excel try clicking Data > Get Data > From Other Sources > Blank Query, then click Advanced Editor near the top-left, copy-paste the code below, then click Done.
You might need to change the name of the table in the first line of the code (below), as it was "Table1" for me, but I imagine yours is named something else. Also, the code below is case-sensitive. So if there is no column named exactly valueC, then you will get an error.
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
fxProcessSomeText = (textToProcess as any) =>
let
canBeSplit = Text.StartsWith(textToProcess, "### customer id"),
result = if textToProcess is null then null else if canBeSplit then Text.Split(Text.BetweenDelimiters(textToProcess, "### customer id", " ###"), ",") else {textToProcess}
in
result,
invokeFunction = Table.TransformColumns(Source, {{"valueC", fxProcessSomeText}}),
expanded = Table.ExpandListColumn(invokeFunction, "valueC"),
reindex =
let
removeIndex = Table.RemoveColumns(expanded, {"index"}),
addIndex = Table.AddIndexColumn(removeIndex, "index", 1, 1),
moveIndex = Table.ReorderColumns(addIndex, List.Distinct(List.InsertRange(Table.ColumnNames(addIndex), 0, {"index"})))
in
moveIndex
in
reindex
My output table contains more rows than yours. Also, the value in column valueA, row 11 is 1415 for me (it is 1234 in your requested output). I'm not sure whether this is a mistake in your example or whether I'm missing some logic.
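To make the M function's logic easier to follow, here is a rough Python equivalent of fxProcessSomeText plus the expand and reindex steps. It assumes the same "### customer id" and " ###" markers as the M code and represents rows as dictionaries; unlike the M code, it also trims spaces around each ID. The function names are hypothetical.

```python
PREFIX = "### customer id"  # start marker used by the M code
SUFFIX = " ###"             # end marker used by the M code


def split_customer_ids(text):
    """Return the list of values one valueC cell expands to."""
    if text is None:
        return [None]  # keep the row, with an empty valueC
    if text.startswith(PREFIX):
        inner = text[len(PREFIX):]
        end = inner.find(SUFFIX)
        if end != -1:
            inner = inner[:end]
        # Unlike the M code, trim spaces around each ID
        return [part.strip() for part in inner.split(",")]
    return [text]  # non-ID text is left untouched


def expand_and_reindex(rows):
    """Split valueC into one row per ID, then renumber 'index' from 1."""
    out = []
    for row in rows:
        for value in split_customer_ids(row["valueC"]):
            new_row = dict(row, valueC=value)
            new_row["index"] = len(out) + 1
            out.append(new_row)
    return out
```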

ColdFusion CFSpreadsheet reads empty cells

I give my client a template to populate; they upload the completed spreadsheet, and I read the file with cfspreadsheet to copy the data into a database table.
Pretty easy. The template has only one column in it, and the client cannot upload a sheet with more than one column in it. This used to work.
The single column header is ING_CAS, but when I read the file with cfspreadsheet I get COL_2, COL_3, ING_CAS. So not only are blank cells being read, they are also given default names because of the headerrow="1" attribute.
I'm at a loss here. I keep downloading the template, selecting the extraneous blank rows and columns, and deleting them, but I have no control over the file once the client gets it.
Is there some strange setting I am missing that will make cfspreadsheet ignore blank cells?
<cfspreadsheet action="read" src="#theFile#" query="SpreadSheetData" headerrow="1">
<cfdump var="#SpreadSheetData#" />
I ended up writing a helper function that stripped out COL_(n) columns.
<cffunction name="CleanExcelQuery" access="public" returntype="query" output="false" hint="Strips out blank column headers picked up on read.">
<cfargument name="SpreadSheetQuery" type="query" required="true" />
<!--- var-scope every local variable so it stays local to the function --->
<cfset var theColumnHeaders = arguments.SpreadSheetQuery.columnList>
<cfset var theNewColumnHeaders = "">
<cfset var h = "">
<cfset var newSpreadSheetQuery = "">
<cfloop list="#theColumnHeaders#" index="h">
<!--- Keep every column except the default COL_n names --->
<cfif uCase(left(h, 4)) IS NOT "COL_">
<cfset theNewColumnHeaders = ListAppend( theNewColumnHeaders, h )>
</cfif>
</cfloop>
<cfquery name="newSpreadSheetQuery" dbtype="query">
Select #theNewColumnHeaders#
From arguments.SpreadSheetQuery
</cfquery>
<cfreturn newSpreadSheetQuery />
</cffunction>
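The helper's filtering rule can be sketched in Python as well. One design choice differs: this hypothetical version matches only COL_ followed by digits, so a genuine header such as COL_TYPE survives, whereas the left(h, 4) check in the CFML helper would drop it.

```python
import re

# Default names for unnamed columns (COL_1, COL_2, ...);
# IGNORECASE mirrors the uCase() comparison in the CFML helper
PLACEHOLDER = re.compile(r"COL_\d+", re.IGNORECASE)


def drop_placeholder_columns(column_names):
    """Return the column names that are not COL_<number> placeholders."""
    return [name for name in column_names if not PLACEHOLDER.fullmatch(name)]


print(drop_placeholder_columns(["COL_2", "COL_3", "ING_CAS"]))
```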
cfspreadsheet only omits cells that are completely blank: no value and no format (such as when you select a cell and use "Clear All"). If it is picking up "extra" columns, it is because one or more cells in those columns have a value or a custom cell format, meaning they are not really blank.
If you know the column position, you can use the columns attribute to read only the values in that column. For example, to read column C:
<cfspreadsheet action="read"
src="c:/path/to/file.xls"
columns="3"
headerrow="1"
query="qResult" />
But I am not sure I understand why this is an issue. If you only need one column, simply ignore the other columns in your code. Can you elaborate on why this is causing an issue?
If you always know which columns you want to read, you can use this:
<cfspreadsheet action="read" src="#path#" query="data" headerrow="1" excludeHeaderRow = "true" columns = "1-5" >
The code above reads columns 1 through 5. You can also use Leigh's solution to read a single column (column 3 in that example), or use something like columns="1,3,6" (if I remember correctly) to read a custom set of columns.
The columns attribute reads only the columns you specify and skips the rest. I used this to read files that come from our clients, which usually contain a few columns that are not "blank" because of their formatting.
You can also check the ColdFusion documentation for cfspreadsheet to see what other values the columns attribute supports.
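To make the range syntax concrete, here is a hypothetical Python parser that expands a columns value such as "1-5" or "1,3,6" into the column numbers it selects. It is not ColdFusion's implementation; sorting and removing duplicates are choices made for this sketch.

```python
def parse_columns_spec(spec):
    """Expand a spec such as "1-5" or "1,3,6" into sorted, unique column numbers."""
    columns = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A hyphenated part is an inclusive range, e.g. "1-5"
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"bad range: {part}")
            columns.update(range(start, end + 1))
        else:
            columns.add(int(part))
    return sorted(columns)


print(parse_columns_spec("1-5"))
print(parse_columns_spec("1,3,6"))
```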
