Python3 - Delete Top Rows from CSV before headers - python-3.x

I have a CSV file that is a vulnerability report. At the top of the file, before the headers, there are 4 rows with one to two columns of text about the report. I am trying to write a script that deletes these rows so that I can combine multiple files for calculations, reporting, etc.
I have tried using pandas to convert the CSV into a DataFrame and then delete the rows, but because these top rows are not headers, the DataFrame conversion fails. Any advice on how I can delete these top four rows from the CSV? Thanks!

Use the skiprows parameter of read_csv. For example:
import pandas as pd
# Skip the first 2 rows of the CSV and build a DataFrame from the rest
usersDf = pd.read_csv('users.csv', skiprows=2)
# Skip the rows at specific 0-based line indices
usersDf = pd.read_csv('users.csv', skiprows=[0,2,5])
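For the report described in the question, which has four rows of text before the header, a minimal sketch might look like the following. The sample text and column names are invented for illustration, and io.StringIO stands in for the real file path.

```python
import io

import pandas as pd

# Hypothetical report: four preamble rows, then the real header row.
sample_report = io.StringIO(
    "Vulnerability Report\n"
    "Generated,2020-01-01\n"
    "Scanner,example\n"
    "Scope,internal\n"
    "host,severity,count\n"
    "10.0.0.1,high,3\n"
    "10.0.0.2,low,7\n"
)

# skiprows=4 drops the first four lines, so the fifth line becomes the header.
report_df = pd.read_csv(sample_report, skiprows=4)
print(report_df)
```

With real files, the same call can be applied to each path, and the resulting DataFrames can then be combined or written back with to_csv(index=False).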

Related

Pandas - concatenation of multiple excel files with different names but the same data type

I have about 50 Excel files with the '.xlsb' extension. I'd like to concatenate a specific worksheet from each into a pandas DataFrame (all worksheet names are the same). The problem is that the column names are not exactly the same in each worksheet. The code I wrote with pandas concatenates values into the same DataFrame column only when the column names match. For example, sometimes I have a column called FgsNr and sometimes FgNr. The data type and the meaning of both columns are exactly the same, and I would like to have them in the same column of the DataFrame, but pandas creates two separate columns and stacks together only the values listed under the same column name.
files = glob(r'C:\Users\Folder\*xlsb')
for file in files:
    Datafile = pd.concat(pd.read_excel(file, engine='pyxlsb', sheet_name='Sheet1', usecols='A:F', header=0) for file in files)
How could I correct the code so that it copies and concatenates all values by Excel column position, ignoring the column names?
When concatenating multiple DataFrames with the same format, you can use the snippet below for speed and efficiency.
The basic logic is to collect them in a list and concatenate once at the end.
from glob import glob
import pandas as pd
files = glob(r'C:\Users\Folder\*xlsb')
dfs = []
for file in files:
    # Read columns A:F of Sheet1 from each workbook
    df = pd.read_excel(file, engine='pyxlsb', sheet_name='Sheet1', usecols='A:F', header=0)
    dfs.append(df)
# Concatenate once, renumbering the index
large_df = pd.concat(dfs, ignore_index=True)
Also refer to the question below:
Creating an empty Pandas DataFrame, then filling it?
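Note that pd.concat aligns columns by name, so FgsNr and FgNr would still end up in separate columns. One possible way to stack by position instead is to give every frame the same column names before concatenating. The sketch below uses two small in-memory frames in place of the worksheets; concat_by_position and the column names are hypothetical, chosen only for illustration.

```python
import pandas as pd


def concat_by_position(frames, column_names):
    """Concatenate frames by column position, ignoring their original names."""
    renamed = []
    for frame in frames:
        frame = frame.copy()          # leave the caller's frame untouched
        frame.columns = column_names  # raises if the column count differs
        renamed.append(frame)
    return pd.concat(renamed, ignore_index=True)


# Two small frames standing in for worksheets with slightly different headers.
sheet_a = pd.DataFrame({"FgsNr": [1, 2], "Value": [10, 20]})
sheet_b = pd.DataFrame({"FgNr": [3], "Value": [30]})

combined = concat_by_position([sheet_a, sheet_b], ["FgsNr", "Value"])
print(combined)
```

This only makes sense when every worksheet has the same columns in the same order, as usecols='A:F' suggests here.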

Pandas read file incomplete column number comma separated file

Hello, I have a CSV file with no header, and I get an error while reading it. The data looks like this:
School1,std1,std2
Schoo2,std3,std4,std5,std6,std7
School4,std1,std6
School6,std9,std10
Because the rows have different numbers of columns, I am not able to read it:
df=pd.read_csv("test.txt",sep=",", header=None)
Can anyone suggest how I can read this file?
If you know the number of columns in your widest row, you can create a list of column names to serve as the header. In your example, the row at index 1 has 6 columns, so:
col_names = [1,2,3,4,5,6]
df = pd.read_csv("test.txt",sep=",", header=None, names=col_names)
If you're handling a huge number of rows and don't know the maximum number of columns, see this topic for a better solution:
import csv with different number of columns per row using Pandas
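When the widest row is not known in advance, one possible approach is to scan the data once with the standard csv module to find the maximum field count, then pass that many column names to read_csv. This is only a sketch: read_ragged_csv is a hypothetical helper, and it takes the text as a string to keep the example self-contained.

```python
import csv
import io

import pandas as pd


def read_ragged_csv(text, sep=","):
    """Read delimited text whose rows have differing numbers of fields."""
    # First pass: find the widest row.
    max_fields = max(len(row) for row in csv.reader(io.StringIO(text), delimiter=sep))
    # Second pass: one column name per possible field; short rows are padded with NaN.
    names = list(range(max_fields))
    return pd.read_csv(io.StringIO(text), sep=sep, header=None, names=names)


sample = (
    "School1,std1,std2\n"
    "Schoo2,std3,std4,std5,std6,std7\n"
    "School4,std1,std6\n"
    "School6,std9,std10\n"
)
df = read_ragged_csv(sample)
print(df.shape)
```

For a file on disk, the same two passes can be made by opening the file twice, at the cost of reading it twice.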

Write dataframe and strings in one single csv file

I want to export a DataFrame (500 rows, 2 columns) from Python to a CSV file.
However, I need the first 20 rows to contain some text/strings, and the DataFrame (500 rows, 2 columns) should start from the 21st row onwards.
I referred to the following link: Skip first rows when writing csv (pandas.DataFrame.to_csv). However, it does not satisfy my requirements.
Can somebody please let me know how to do this?
Get the first 20 rows and save them to another DataFrame.
Check whether they contain any null values.
If there are no null values, remove the first 20 rows.
Save df as a CSV file.
# Take the first 20 rows
df2 = df.head(20)
# True if any cell in those rows is null
df2 = df2.isnull().values.any()
if not df2:
    # Drop the first 20 rows
    df = df[20:]
df.to_csv('updated.csv')
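If the goal is instead to write 20 lines of text followed by the DataFrame, as the question describes, one possible approach is to write the text lines to an open file object and then pass that same object to to_csv. The sketch below writes to an in-memory buffer; write_with_preamble, the placeholder text, and the column names are assumptions made for illustration.

```python
import io

import pandas as pd


def write_with_preamble(buffer, preamble_lines, frame):
    """Write free-text lines first, then the DataFrame as CSV below them."""
    for line in preamble_lines:
        buffer.write(line + "\n")
    # to_csv accepts an open file-like object and writes at its current position.
    frame.to_csv(buffer, index=False)


data = pd.DataFrame({"x": range(500), "y": range(500)})
notes = [f"note line {i}" for i in range(1, 21)]  # 20 placeholder text rows

out = io.StringIO()
write_with_preamble(out, notes, data)
lines = out.getvalue().splitlines()
print(lines[19], lines[20], lines[21], sep="\n")
```

With a real file, the buffer would be something like open('out.csv', 'w', newline=''). In this sketch the header row lands on row 21 and the data begins on row 22; passing header=False to to_csv would start the data on row 21 instead.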

How can I split the headers of data and the data itself to their own respective columns?

I have a file with 312759 rows but only one column; the single header row contains all the different header names, so I need to separate each row into its own values and columns. The DataFrame is 312759 rows × 1 column, but I need 312759 rows × approximately 40 columns. I am new to Python and to the Stack Overflow community, so any help would be appreciated.
Read the data using pandas and drop the first 5 rows:
import pandas as pd
read = pd.read_csv('output.csv')
# Drop the first 5 rows in place
read.drop(read.head(5).index, inplace=True)
Then save it back as a .csv file:
read.to_csv("output2.csv")

What is the best way to add a column to a large data set in a CSV file using Python?

I have a data set (3000 rows and 20000 columns). I need to add a column with the header 'class', where all 3000 rows contain the same word, here 'Big'. How can I do that using Python? I tried to do it manually, but the file is too large for Excel and can't be loaded completely.
I know it may seem easy, but I'm new to Python and have tried several pieces of code; none of them gave the needed result.
Use the pandas module:
import pandas as pd
df = pd.read_csv(r'/path/to/file.csv').assign(Class='Big')
df.to_csv('/path/to/new_file.csv', index=False)
or as a one-liner:
pd.read_csv(r'/path/to/file.csv').assign(Class='Big') \
    .to_csv(r'/path/to/new_file.csv', index=False)
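If loading the whole file into memory is a concern, a different technique is to stream the rows with the standard csv module and append the constant value to each one. This is a sketch rather than a replacement for the pandas answer: add_constant_column is a hypothetical helper, and in-memory buffers stand in for the real input and output files.

```python
import csv
import io


def add_constant_column(src, dst, header, value):
    """Copy CSV rows from src to dst, appending one constant column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    # The first row is assumed to be the header row.
    writer.writerow(next(reader) + [header])
    for row in reader:
        writer.writerow(row + [value])


source = io.StringIO("a,b\n1,2\n3,4\n")
target = io.StringIO()
add_constant_column(source, target, "class", "Big")
print(target.getvalue())
```

With files on disk, both would be opened with newline='' as the csv module documentation recommends.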
UPDATE:
I have 9 files like the one you just helped me add a column to; each
one represents a class's attributes. Can you tell me how I can combine
these files into one CSV file, which will be 27000 rows and 30000
columns?
files = ['file1.csv','file2.csv', ...]
df = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
