Change one element of column heading in CSV using Pandas - python-3.x

I have created a CSV file which looks like this:
RigName,Date,DrillingMiles,TrippingMiles,CasingMiles,LinerMiles,JarringMiles,TotalMiles,Comments
0,08 July 2021,19.21,63.05,43.16,45.41,8.52,0,"Tested all totals. Edge cases for multiple clicks.
"
1,09 July 2021,19.21,63.05,43.16,45.41,8.52,0,"Test entry#2.
"
I want to change the 'RigName' header to a value the user enters. I have tried various ways of doing this; one of them is:
df= pd.read_csv('ton_miles_record.csv')
user_input = 'Rig805'
df.columns = df.columns.str.replace('RigName', user_input)
df.to_csv('new_csv.csv', header=True, index=False)
However, no matter what I do, the header in the resulting CSV file always comes out as:
Unnamed: 0,Date,DrillingMiles,TrippingMiles,CasingMiles,LinerMiles,JarringMiles,TotalMiles,Comments
Why am I getting 'Unnamed: 0' instead of the user input value?
Also, is there a way to rename 'RigName' by referring to its position, so that in future I can change any column heading by position?

Zubin, you need to change the column name by treating the columns as a list. The code below should do the trick, and it also shows how to access a column by position:
import pandas as pd
df = pd.read_csv('ton_miles_record.csv')
user_input = 'Rig805'
# Copy the labels into a plain list, change the first one, and assign back.
# (Writing into df.columns.values directly mutates the supposedly immutable Index.)
cols = df.columns.tolist()
cols[0] = user_input
df.columns = cols
df.to_csv('new_csv.csv', header=True, index=False)
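A likely cause of the 'Unnamed: 0' header, offered as a probable explanation rather than a confirmed diagnosis: if ton_miles_record.csv was originally written with to_csv without index=False, pandas stored the index as an extra first column with an empty header, and read_csv names an empty header 'Unnamed: 0'. That would also push 'RigName' to position 1, which fits the df.columns[1] used in the self-answer below. The sketch uses a small hypothetical frame and an in-memory buffer instead of the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for ton_miles_record.csv
df = pd.DataFrame({"RigName": ["A", "B"], "Date": ["08 July 2021", "09 July 2021"]})

buf = io.StringIO()
df.to_csv(buf)  # index=True by default: header line becomes ",RigName,Date"
text = buf.getvalue()

# Reading it back turns the empty header cell into "Unnamed: 0"
reread = pd.read_csv(io.StringIO(text))
print(list(reread.columns))  # ['Unnamed: 0', 'RigName', 'Date']

# index_col=0 treats that first column as the index instead
fixed = pd.read_csv(io.StringIO(text), index_col=0)
print(list(fixed.columns))  # ['RigName', 'Date']

# Now renaming by label works, and index=False keeps the index out of the output
renamed = fixed.rename(columns={"RigName": "Rig805"})
out_text = renamed.to_csv(index=False)
print(out_text.splitlines()[0])  # Rig805,Date
```

Writing the file with index=False in the first place avoids the extra column entirely.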

After 3 hours of trial and error (and a lot of searching in vain), I solved it by doing this:
df= pd.read_csv('ton_miles_record.csv')
user_input = 'SD555'
df.rename(columns={ df.columns[1]: user_input}, inplace=True)
df.to_csv('new_csv.csv', index=False)
I hope this helps someone else struggling as I was.
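For the "multiple changes by position" part of the question, one option is to edit a list copy of the labels, as in the first answer, driven by a {position: new_name} mapping. rename_positions is a hypothetical helper name and the column names are placeholders. One design difference worth knowing: df.rename(columns={df.columns[i]: ...}) renames by label, so it would rename every column sharing that label, whereas editing the list touches only the given positions.

```python
import pandas as pd


def rename_positions(df, mapping):
    """Return a copy of df whose columns at the given positions are renamed.

    mapping: {position: new_name}, e.g. {0: 'Rig805'}.
    """
    cols = list(df.columns)        # plain, mutable list of labels
    for position, new_name in mapping.items():
        cols[position] = new_name  # negative positions count from the end
    out = df.copy()
    out.columns = cols
    return out


df = pd.DataFrame(columns=["RigName", "Date", "DrillingMiles"])
renamed = rename_positions(df, {0: "Rig805", -1: "Drilling"})
print(list(renamed.columns))  # ['Rig805', 'Date', 'Drilling']
```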

Related

Use pandas to make sense of malformed Excel data

My job has me doing some data analysis, and the exported spreadsheet I am given (the ONLY form in which it can be given to me) has data that looks like this:
But what I need it to look like, ideally, would be something like this:
I've tried some other code, but to be honest I made a mangled mess and got rid of it, as I only succeeded in jumbling the data. I've done several other pandas projects where I was able to sort and make sense of the data, but that data had a consistent structure and was easier to work with. At this point I just don't see the logic for how to go about fixing the data. I would do it manually, but it's over 48k lines. Any help you can provide would be greatly appreciated.
Edit: This is what the data looks like if we 'delete blanks and shift-up'
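Try this:
import pandas as pd
df = pd.read_excel('your_excel_file.xlsx')
# Move each of the last four columns up so its values line up with the
# row that starts each record (the offsets depend on the sheet's layout).
for col in df.columns[-4:]:
    if col == 'Subscription Name':
        df[col] = df[col].shift(-1)
    elif col == 'Resource Group':
        df[col] = df[col].shift(-2)
    else:
        df[col] = df[col].shift(-3)
# Fill the remaining gaps downward, then drop the repeated rows this creates.
out = df.ffill().drop_duplicates().reset_index(drop=True)
print(out)  # or display(out) in Jupyter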
Edit:
You can also use:
out = df[df['Resource Name'].notna()].ffill()
Or, for better efficiency (as suggested by Vladimir Fokow):
out = df.dropna(how='all').ffill()
instead of:
out = df.ffill().drop_duplicates().reset_index(drop=True)
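The real spreadsheet isn't shown, so this sketch assumes a simplified, hypothetical layout in which each record starts in column A and its B and C values sit one and two rows lower. It applies the same two steps as the answer: shift each column up by its offset, then keep only the rows where the anchor column is filled (the role 'Resource Name' plays in the edit above).

```python
import numpy as np
import pandas as pd

# Hypothetical staggered export: each record starts in column "A",
# its "B" value sits one row lower and its "C" value two rows lower.
raw = pd.DataFrame({
    "A": ["r1", np.nan, np.nan, "r2", np.nan, np.nan],
    "B": [np.nan, "b1", np.nan, np.nan, "b2", np.nan],
    "C": [np.nan, np.nan, "c1", np.nan, np.nan, "c2"],
})

# Undo the stagger by moving each column up by its (assumed) offset.
offsets = {"B": 1, "C": 2}
aligned = raw.copy()
for col, k in offsets.items():
    aligned[col] = aligned[col].shift(-k)

# Keep only the rows that start a record, then renumber them.
out = aligned[aligned["A"].notna()].reset_index(drop=True)
print(out)
```

With the real sheet, the offsets dictionary would come from inspecting how many rows each field is displaced.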

Pandas text file to CSV

I'm trying to output only certain columns of a .txt file to a .csv file.
The pandas documentation and this answer got me this far:
import pandas as pd
read_file = pd.read_csv (r'death.txt')
header = ['County', 'Crude Rate']
read_file.to_csv (r'death.csv', columns=header, index=None)
But I receive an error:
KeyError: "None of [Index(['County', 'Crude Rate'], dtype='object')] are in the [columns]"
This is confusing as the .txt file I'm using is the following for hundreds of rows (from a government database):
"Notes" "County" "County Code" Deaths Population Crude Rate
"Autauga County, AL" "01001" 7893 918492 859.3
"Baldwin County, AL" "01003" 30292 3102984 976.2
"Barbour County, AL" "01005" 5197 499262 1040.9
I notice the first three columns have titles enclosed in quotes, and the last three do not. I have experimented with including quotes in my columns sequence (e.g. ""County"") but no luck. Based upon the error, I realize there is some discrepancy between column titles as I have typed them and how they are read in this script.
Any help in understanding this discrepancy is appreciated.
You are reading the file with the default options, which split fields on commas:
read_file = pd.read_csv (r'death.txt')
Change it to split on tabs:
read_file = pd.read_csv(r'death.txt', sep="\t")
Check the result:
read_file.columns
Index(['Notes', 'County', 'County Code', 'Deaths', 'Population', 'Crude Rate'], dtype='object')
Once your columns are OK, filter them first and then save:
read_file[['County', 'Crude Rate']].to_csv(r'death.csv', index=False)
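To check the separator fix without the real file, this sketch feeds a few lines shaped like death.txt to read_csv through an in-memory buffer. It assumes the fields are tab-separated (as the answer implies) and that each data row starts with an empty Notes field; usecols keeps only the two wanted columns at read time, so no filtering step is needed before saving.

```python
import io

import pandas as pd

# Assumed shape of death.txt: tab-separated, empty "Notes" field on data rows
text = (
    '"Notes"\t"County"\t"County Code"\tDeaths\tPopulation\tCrude Rate\n'
    '\t"Autauga County, AL"\t"01001"\t7893\t918492\t859.3\n'
    '\t"Baldwin County, AL"\t"01003"\t30292\t3102984\t976.2\n'
)

df = pd.read_csv(io.StringIO(text), sep="\t", usecols=["County", "Crude Rate"])
print(list(df.columns))  # ['County', 'Crude Rate']

out_csv = df.to_csv(index=False)  # returns the CSV text when no path is given
print(out_csv)
```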

How to save tuples output from a for loop to a DataFrame in Python

I have a dataset of 33k rows x 57 columns.
Some columns contain data that I want to translate using a dictionary.
I have done the translation, but now I want to write the translated data back to my dataset.
I have a problem saving the tuples produced by the for loop.
I use tuples to build the translation. .join and .append don't work in my case; I have tried many variations without success.
Any advice is welcome.
data = pd.read_csv(filepath, engine="python", sep=";", keep_default_na=False)
for index, row in data.iterrows():
row["translated"] = (tuple(slownik.get(znak) for znak in row["1st_service"]))
I just want print(data["1st_service"]) to show the translated data, not the values from before the for loop.
First of all, if your CSV doesn't already have a 'translated' column, you'll have to add it. Make it an object column so that each cell can hold a tuple:
data['translated'] = None
The problem is that the row object you're writing to is a copy of that row's data, not the DataFrame itself, so the assignment never reaches data. (The square brackets around the comprehension are optional; tuple() accepts a generator expression directly.) So change your last line to:
data.at[index, "translated"] = tuple(slownik.get(znak) for znak in row["1st_service"])
and you'll get a tuple written into that one cell.
In future, posting the exact error message you're getting is very helpful!
I managed it; here is the working code:
data = pd.read_csv(filepath, engine="python", sep=";", keep_default_na=False)
data.columns = []
slownik = dict([ ])
trans = ' '
for index, row in data.iterrows():
trans += str(tuple([slownik.get(znak) for znak in row["1st_service"]]))
data['1st_service'] = trans.split(')(')
data.to_csv("out.csv", index=False)
Can you tell me if this is done well?
Is there a faster way to do it?
I am doing this for 12 columns in a single for loop, as shown above.
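On the "faster way" question: a sketch that avoids both iterrows and the string join/split round-trip by building each translated column in one pass with a list comprehension. slownik, the column names and the sample rows are hypothetical placeholders, since the real dictionary and data aren't shown, and translate is a hypothetical helper name. It keeps the question's behaviour of mapping each character through the dictionary, with unknown characters becoming None.

```python
import pandas as pd

# Hypothetical translation table and data (the real ones aren't shown)
slownik = {"a": "A", "b": "B", "c": "C"}
data = pd.DataFrame({
    "1st_service": ["ab", "ca"],
    "2nd_service": ["bb", "a"],
})


def translate(text):
    # Map each character through the dictionary, as the question's loop does;
    # characters missing from the dictionary become None (dict.get default).
    return tuple(slownik.get(ch) for ch in text)


columns_to_translate = ["1st_service", "2nd_service"]  # e.g. the 12 columns
for col in columns_to_translate:
    # Replace the whole column at once instead of writing row by row.
    data[col] = [translate(s) for s in data[col]]

print(data["1st_service"].tolist())  # [('A', 'B'), ('C', 'A')]
```

Assigning a list of tuples to a column stores one tuple per cell; to_csv would then write each tuple as its text representation.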

How to use Pandas to display specific columns from csv file?

I have a CSV file with a number of columns containing student data. I want to display only the male students and their names. I used 1 for male students and 0 for female students. My code is:
import pandas as pd
data = pd.read_csv('normalizedDataset.csv')
results = pd.concat([data['name'], ['students']==1])
print results
I have got this error:
TypeError: cannot concatenate a non-NDFrame object
Can anyone help please. Thanks.
You can tell read_csv to load only certain columns when you read the file. Then use loc to select the rows where students equals 1.
data = pd.read_csv('normalizedDataset.csv', usecols=['name', 'students'])
data = data.loc[data.students == 1, :]
BTW, your original error occurs because you are trying to concatenate a Series (data['name']) with the boolean False:
>>> ['students']==1
False
No need to concat; you're stripping things away, not building.
Try:
data.loc[data['students'] == 1, 'name']
To provide clarity on why you were getting the error:
The second thing you were trying to concat was:
['students']==1
which is not an NDFrame object (it evaluates to False). You'd want to replace it with:
data[data['students']==1]['students']
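Putting the pieces together, a minimal sketch that reads only the needed columns and returns the names from rows where students == 1. The in-memory CSV is a hypothetical stand-in for normalizedDataset.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for normalizedDataset.csv (1 = male, 0 = female)
csv_text = "name,students,age\nAli,1,20\nBea,0,21\nCal,1,22\n"

data = pd.read_csv(io.StringIO(csv_text), usecols=["name", "students"])
# Boolean row mask plus a column label in a single .loc call
male_names = data.loc[data["students"] == 1, "name"]
print(male_names.tolist())  # ['Ali', 'Cal']
```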

Update Pandas DF during while loop (Python3, Pandas)

Some background: My code takes user input and applies it to my DF to remove certain rows. This process repeats as many times as the user would like. Unfortunately, I am not sure how to update my DF within the while loop I have created so that it keeps the changes being made:
data = ({'hello':['the man','is a','good guy']})
df = pd.DataFrame(data)
def func():
    while True:
        n = input('Words: ')
        if n == "Done":
            break
        elif n != "Done":
            pattern = '^'+''.join('(?=.*{})'.format(word) for word in n.split())
            df[df['hello'].str.contains(pattern)==False]
How do I update the DF at the end of each loop so the changes being made stay put?
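OK, I re-evaluated your problem, and my old answer was completely wrong.
What you want is the DataFrame.drop method, which can work in place. Note that drop takes index labels, not a boolean mask, so pass it the labels of the matching rows:
mask = df['hello'].str.contains(pattern)
df.drop(df.index[mask], inplace=True)
This modifies your DataFrame in place, so the change persists across loop iterations.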
Looks to me like you've already done all the hard work, but there are two problems.
Your last line doesn't store the result anywhere. Most Pandas operations are not "in-place", which means you have to store the result somewhere to be able to use it later.
df is a global variable, and assigning to it inside a function doesn't update the global one unless the function explicitly has a line stating global df (without it, Python treats df as a local name throughout func). See the good answers to this question for more detail.
So I think you just need to do:
df = df[df['hello'].str.contains(pattern)==False]
to fix problem one.
For problem two, give func a df parameter (def func(df):), do return df at the end of func, and then call it like:
df = func(df)
OR, start func with the line
global df
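A complete version of the parameter-and-return approach, as a sketch: filter_rows is a hypothetical name, and the queries list stands in for the successive input() calls so the logic can run without a console. Two small additions not in the answers above: ~ replaces == False, and re.escape protects words that contain regex metacharacters.

```python
import re

import pandas as pd


def filter_rows(df, queries):
    """Drop rows whose 'hello' text contains every word of each query."""
    for n in queries:
        if n == "Done":
            break
        # Lookahead pattern: matches only if all words appear, in any order.
        pattern = "^" + "".join("(?=.*{})".format(re.escape(word)) for word in n.split())
        # Reassign so the filtered frame carries into the next iteration.
        df = df[~df["hello"].str.contains(pattern)]
    return df


df = pd.DataFrame({"hello": ["the man", "is a", "good guy"]})
df = filter_rows(df, ["the man", "good", "Done", "is"])
print(df["hello"].tolist())  # ['is a']
```

Because the function returns the filtered frame and the caller rebinds df, no global statement is needed.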
