This may be a relatively amateur question, but how do I find the last row of a pandas dataframe containing data in python?
I have a poorly structured spreadsheet I am trying to read in and manipulate, but the doc has an excessive number of extra cells below the end of the actual data.
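One possible approach, sketched under the assumption that the extra cells come through as NaN/None when the sheet is read (cells holding empty strings or whitespace would need to be converted first). The frame and its column names here are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: three real rows followed by two empty padding rows.
df = pd.DataFrame({
    "a": [1, 2, 3, np.nan, np.nan],
    "b": ["x", None, "z", None, None],
})

# True for every row that holds at least one non-missing value.
has_data = df.notna().any(axis=1)

# Index label of the last row that contains data.
last_label = has_data[has_data].index[-1]

# Keep everything up to and including that row.
trimmed = df.loc[:last_label]
```

If empty rows in the middle of the data can also be discarded, `df.dropna(how="all")` is a shorter alternative.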
Related
Given a PDF (attached) with a table row split across multiple pages, with a page break in between, I am trying to extract the tabular data to a CSV using pdfplumber, but the data ends up in separate rows in the CSV. Basically, I would like to get this data in a single row.
With pdfplumber, is there a way to identify if the row has a horizontal border or not? If this information is available, it could help in merging the rows.
In the attached image, the cell contents are colour-coded grey.
So I've successfully imported a csv file as a data frame df
I need to compare successive rows to see which row in the 'Mass' column has the closest value to 20 and then save the entire content of the closest row in a list/any other format.
Any ideas on how I should go about doing this?
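A minimal sketch of one way to do this, assuming the 'Mass' column is numeric. The sample data and the 'Name' column are invented for illustration; only 'Mass' and the target value 20 come from the question.

```python
import pandas as pd

# Hypothetical data standing in for the imported CSV.
df = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "Mass": [12.5, 19.2, 23.0],
})

# Absolute distance of each Mass value from 20, then the label of the smallest.
closest_label = (df["Mass"] - 20).abs().idxmin()

# The whole row as a plain Python list.
closest_row = df.loc[closest_label].tolist()
```

Note that `idxmin` returns the first match if two rows are equally close.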
I am working with incomplete historical data and am using Python to select specific information from TXT files (e.g. via Regex) and write them to .csv tables.
Is it possible to write a certain item or a list of items to new rows in a particular column in an existing CSV file?
I can add individual strings or lists as consecutive new rows or columns to an existing table, but very often, I am only filling in "missing information".
It would be great to find a way to select the next row in the "n"-th column of a CSV table, or to select the column by name / column heading.
Have you considered using Pandas?
It has convenient methods for reading and writing csv-files. Working with columns, rows, and cells is quite intuitive.
It takes a little time to understand the basics of Pandas, but if you plan to work with CSV and CSV-like data more than once, it is worth it.
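As a rough sketch of what that could look like for the "missing information" case: the CSV content, column names, and values below are all invented, and the file is simulated with an in-memory string so the example is self-contained. It assumes the number of new items equals the number of empty cells in the chosen column.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a path to the existing file.
csv_text = "name,year,place\nA,1815,London\nB,1906,\nC,1912,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Select the column by position (third column) or directly by its heading.
col = df.columns[2]  # same as "place"

# Fill only the empty cells of that column, top to bottom.
new_items = ["New York", "Paris"]
missing = df[col].isna()
df.loc[missing, col] = new_items

# Write the table back out; pass a file path instead to overwrite the CSV.
csv_out = df.to_csv(index=False)
```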
I did an experiment, where I recorded two columns of data (column A and column B). However my experiment needs calculated data. My goal is to create a data frame myself (as you do in excel), but I want to incorporate the formulas I need to get the calculated values for my data.
Note: I'm NOT using an existing data frame. I want to create my own data using arrays. Exactly as you do in excel, you put the values in a table and then plot it.
For example:
Column A    Column B    Column C
1           1.1         Column A*constant
Should I create functions for each formula I need? But still I don't know how to incorporate these functions in the arrays.
I'm trying to do literally what you do in excel (create a table with values and incorporate the formula in each cell), but I'm not sure what will be the easiest way to do this for a beginner in Python.
P.S. I'm using Jupyter notebooks for this with Python 3.
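One way this could look in pandas, as a sketch: the measured values, the constant, and the `ratio` formula are placeholders, not taken from the experiment. A calculated column is just an expression over existing columns, and a formula can be wrapped in a function if it is reused.

```python
import numpy as np
import pandas as pd

CONSTANT = 2.5  # hypothetical constant

# Recorded data as arrays (placeholder values).
col_a = np.array([1.0, 2.0, 3.0])
col_b = np.array([1.1, 2.1, 3.1])

# Build the table from the arrays.
table = pd.DataFrame({"Column A": col_a, "Column B": col_b})

# Calculated column: applied to every row at once, like filling a formula down.
table["Column C"] = table["Column A"] * CONSTANT

# A formula wrapped in a function (hypothetical example formula).
def ratio(a, b):
    return a / b

table["Column D"] = ratio(table["Column A"], table["Column B"])
```

In a Jupyter notebook, `table.plot(x="Column A", y="Column C")` would then plot the result, provided matplotlib is installed.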
In general, when we import an Excel file into pandas as a data frame, the order of the rows differs from the order of the rows in the Excel sheet. I want the rows of the data frame to be in the same order as in the Excel sheet.
Without looking at any code, my guess is you have a parsing issue with pandas. You can try:
import pandas as pd

arx = pd.ExcelFile("yourExcel.xlsx")
# specify your sheets here
parsed = arx.parse("Sheet1")
If you can show your code, I may be able to help out a bit more
pandas parse
Not sure what you are trying to do but when I had the same issue I used the df.columns to get the order that is in the excel sheet. You can now put it in a list with the correct order.
from pandas import ExcelFile

workbook = ExcelFile('myfile.xlsx')
df = workbook.parse('sheet1')
df_index = list(df.columns)  # puts the column labels in a list, in sheet order
Let's say you know the column header and want the column number:
df_index.index('column header')
Hope this helped because it really helped me.
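For the column-number lookup, a small sketch of an alternative that skips the intermediate list, assuming the column labels are unique. The frame is built by hand here (with made-up headers) rather than parsed from a workbook, so the example runs without an Excel file.

```python
import pandas as pd

# Hypothetical frame standing in for the parsed sheet.
df = pd.DataFrame({"id": [1, 2], "column header": [10, 20], "notes": ["a", "b"]})

# Position via the list approach.
position_from_list = list(df.columns).index("column header")

# Same position straight from the column Index.
position_from_index = df.columns.get_loc("column header")
```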