In SQL I can alias columns like this:
Select Name as EmployeeName,
Age as EmployeeAge
From tableA.
How can I write a DataFrame to an Excel file with different column names? Something like:
df.to_excel(writer, columns=['Date' as TimeStamp,'Id' as DeliveryId],sheet_name='sales')
I don't think the columns parameter supports aliases like that; you need to rename the columns before calling DataFrame.to_excel:
d = {'Date':'TimeStamp','Id':'DeliveryId'}
df.rename(columns=d).to_excel(writer, sheet_name='sales')
If the DataFrame has more columns and you want to choose which ones to write with the columns parameter, pass the new names after renaming:
columns : sequence or list of str, optional
Columns to write.
d = {'Date':'TimeStamp','Id':'DeliveryId'}
df.rename(columns=d).to_excel(writer, columns=["TimeStamp","DeliveryId"], sheet_name='sales')
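The select-and-rename step can also be wrapped in a small helper that mimics SQL's SELECT ... AS. This is only a sketch: the helper name select_as and the sample data are made up for illustration. As far as I know, to_excel also accepts a list of aliases through its header parameter, which would avoid the rename, but check that against the documentation for your pandas version.

```python
import pandas as pd


def select_as(df, aliases):
    """Keep only the columns named in `aliases` (in that order) and rename them.

    `select_as` is a hypothetical helper, not part of pandas.
    """
    return df[list(aliases)].rename(columns=aliases)


# Toy data standing in for the real DataFrame from the question.
df = pd.DataFrame({
    "Date": ["2021-01-01", "2021-01-02"],
    "Id": [7, 8],
    "Other": ["x", "y"],
})

out = select_as(df, {"Date": "TimeStamp", "Id": "DeliveryId"})
print(out)

# Then write it with the writer from the question, for example:
# out.to_excel(writer, sheet_name="sales")
```

The dictionary keys do double duty here: they pick the columns and fix their order, so there is no need to repeat the new names in columns.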
I have a dataset and I want to extract one column's values into a list. How can I achieve that?
I have tried:
List<Row> rows = dF.select("col1").collectAsList();
But then how do I iterate over the values of col1?
Thanks
You can iterate over a list of rows the same way you can iterate over any list.
For example, assuming you want to access the first column (string type) of all the rows, you can use the following snippet:
import scala.collection.JavaConversions.asScalaBuffer
val rows = df.select("col1").collectAsList();
rows.map(r => r.getAs[String](0))
This is how you can iterate over the list collected from a column, but it is not recommended if the column is too big to fit in the driver node's memory:
df.select("col1").collectAsList()
.stream().forEach(row -> System.out.println(row.getString(0)));
Row has getters for other data types as well, such as getString, getLong, getInt, etc.
Briefly, I am doing these steps:
tables = camelot.read_pdf(doc_file)
tables[0].df
I am using tables[0].df.columns to get column names from the extracted table.
But it does not give the column names.
Tables extracted by Camelot do not have named columns; the columns carry a default integer index.
For example, for a three-column table, tables[0].df.columns returns:
RangeIndex(start=0, stop=3, step=1)
Instead, if the header ended up in the first row of the table, you can read that row as a list: tables[0].df.iloc[0].tolist().
The output could be:
['column1', 'column2', 'column3']
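If that first row really does hold the header, it can be promoted to the column names with plain pandas. A minimal sketch, using a hand-built DataFrame as a stand-in for tables[0].df (the cell values are invented, and the assumption that row 0 is the header must be checked for each PDF):

```python
import pandas as pd

# Stand-in for tables[0].df: integer column labels, header text in row 0.
raw = pd.DataFrame([
    ["column1", "column2", "column3"],
    ["a", "b", "c"],
    ["d", "e", "f"],
])

header = raw.iloc[0].tolist()               # first row as a list of names
body = raw.iloc[1:].reset_index(drop=True)  # remaining rows, reindexed from 0
body.columns = header                       # use the first row as column names

print(body.columns.tolist())
```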
I have a SharePoint list as a datasource in Power Query.
It has an "AttachmentFiles" column that contains a table, and from that table I want the values in the column "ServerRelativeURL".
I want to split that column so that each value in "ServerRelativeURL" gets its own column.
I can get the values if I use the expand table function, but that splits them into multiple rows, and I want to keep them in one row.
I only want one row per unique ID.
I can live with a fixed number of columns as there are usually no more than 3 attachments per ID.
I'm thinking that I can add a custom column that refers to "AttachmentFiles ServerRelativeURL Value(1)" but I don't know how.
Can anybody help?
Try this code:
let
fn = (x)=> {x, #table({"ServerRelativeUrl"},List.FirstN(List.Zip({{"a".."z"}}), x*2))},
Source = #table({"id", "AttachmentFiles"},{fn(2),fn(3),fn(1)}),
replace = Table.ReplaceValue(Source,0,0,(a,b,c)=>a[ServerRelativeUrl],{"AttachmentFiles"}),
cols = List.Transform({1..List.Max(List.Transform(replace[AttachmentFiles], List.Count))}, each "url"&Text.From(_)),
split = Table.SplitColumn(replace, "AttachmentFiles", (x)=>List.Transform({0..List.Count(x)-1}, each x{_}), cols)
in
split
I managed to solve it myself.
I added 3 custom columns like this
CustomColumn1: [AttachmentFiles]{0}
CustomColumn2: [AttachmentFiles]{1}
CustomColumn3: [AttachmentFiles]{2}
And expanded them with only the "ServerRelativeURL" selected.
It would be nice to have a dynamic solution. But this will work fine for now.
I need to plot the 'confirmed' values for the state 'kerala' in the column 'state/unionterritory' as a line plot.
So far I have written:
sns.lineplot(x=my_data['state/unionterritory'],y=my_data['confirmed'])
[https://www.kaggle.com/essentialguy/exercise-final-project]
This is the dataframe.head(); see the column names.
I assume you want to take the column state/unionterritory from your DataFrame and filter it so that it contains only the state Kerala:
my_data_kerala = my_data[my_data['state/unionterritory'] == 'Kerala']['state/unionterritory']
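For a line plot you would probably want to keep the whole rows for Kerala rather than only the state column, so that 'confirmed' is still available. A rough sketch with toy data follows; the 'date' column is an assumption (the question does not show which column should go on the x-axis), and the numbers are invented:

```python
import pandas as pd

# Toy stand-in for my_data; 'date' is a hypothetical column name.
my_data = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02"],
    "state/unionterritory": ["Kerala", "Delhi", "Kerala", "Delhi"],
    "confirmed": [3, 1, 5, 2],
})

# Boolean mask keeps every column, but only the rows for Kerala.
kerala = my_data[my_data["state/unionterritory"] == "Kerala"]
print(kerala[["date", "confirmed"]])

# With seaborn imported as sns, the plot call would then look like:
# sns.lineplot(x=kerala["date"], y=kerala["confirmed"])
```

Note that the comparison is case-sensitive, so the value has to match the spelling used in the data.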
I am trying to split a copy off of a Pandas dataframe starting after a certain column by header name.
So far, I've been able to manipulate the column headers or indexes according to a set number of known columns, like below. However, the number of columns will change, and I want to still extract every column that happens after.
In the example below, say I want to grab all columns from 'Tail' onward, however many 'Body' columns there are. So this sample, with X 'Body' columns:
df = pd.DataFrame({'Intro1': ['blah'],
'Intro2': ['blah'],'Intro3': ['blah'],'Body1': ['blah'],'Body2': ['blah'],'Body3': ['blah'],'Body4': ['blah'], ... 'BodyX': ['blah'],'Tail': ['blah'],'OtherTail': ['blah'],'StillAnotherTail': ['blah'],})
Should produce a copy of the dataframe as:
dftail = pd.DataFrame({'Tail': ['blah'],'OtherTail': ['blah'],'StillAnotherTail': ['blah'],})
Ideally I'd like to find a way to combine the two techniques below so that the selection starts at 'Tail' and goes to the end of the dataframe:
dftail = [col for col in df if col.startswith('Tail')]
dftail = df.iloc[:, 164:] # column number (164) will change based on 'Tail' index number
How about this:
df_tail = df.iloc[:, list(df.columns).index("Tail"):]
df_tail then prints out:
   Tail OtherTail StillAnotherTail
0  blah      blah             blah
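Two other ways to express the same slice, as a sketch on a shortened version of the sample frame (fewer Body columns than in the question). Both assume the label 'Tail' appears exactly once among the columns:

```python
import pandas as pd

# Shortened version of the sample DataFrame from the question.
df = pd.DataFrame({
    "Intro1": ["blah"],
    "Body1": ["blah"],
    "Body2": ["blah"],
    "Tail": ["blah"],
    "OtherTail": ["blah"],
    "StillAnotherTail": ["blah"],
})

# Option 1: look up the integer position of 'Tail', then slice by position.
start = df.columns.get_loc("Tail")
by_position = df.iloc[:, start:].copy()

# Option 2: slice by label; .loc label slices include the start label.
by_label = df.loc[:, "Tail":].copy()

print(by_position.columns.tolist())
```

The .copy() calls make the result an independent copy, which matches the "split a copy off" wording in the question.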