Related
I have a dataset given as such:
#Load the required libraries
import pandas as pd
#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
'LoanAmount': [123, 128, 66, 120, 141, 52,96,15,85,36,58,89],
}
#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
The dataset looks as such:
Here, the numbering in the 'Run_time' column restarts from 1 at several rows.
I want the 'Run_time' column to start at 1 only once and then keep counting.
The dataset needs to look as such:
Can somebody please let me know how to modify this column in Python such that the numbering is continuous?
import pandas as pd
#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
'LoanAmount': [123, 128, 66, 120, 141, 52,96,15,85,36,58,89],
}
#Convert to dataframe
df = pd.DataFrame(data)
# Renumber Run_time from 1 upward, using the default 0-based index
df['Run_time'] = df.index + 1
print(df)
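Note that `df.index + 1` relies on the DataFrame still having its default 0-based integer index. As a minimal sketch for the case where the index may have changed (for example after filtering or sorting), the numbering can be built from the row count instead. The shortened data and the custom index below are illustrative assumptions, not part of the original question.

```python
import pandas as pd

# Shortened version of the question's data (illustrative only)
df = pd.DataFrame({
    'team': ['A'] * 6,
    'Run_time': [1, 2, 3, 1, 2, 1],
})

# Simulate a non-default index, e.g. after filtering rows
df.index = [10, 11, 12, 20, 21, 30]

# Number the rows 1..n by position, independent of the index labels
df['Run_time'] = range(1, len(df) + 1)
print(df)
```

With the default index both approaches give the same numbering.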
I have a dataset given as such:
#Load the required libraries
import pandas as pd
#Create dataset
data = {'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
'Married': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes', 'No'],
'Self_Employed': ['No', 'No', 'Yes', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'No', 'No'],
'LoanAmount': [123, 128, 66, 120, 141, 52,96,15,85,36,58,89],
}
#Convert to dataframe
df = pd.DataFrame(data)
print("df = \n", df)
Here, I wish to add an additional column 'Last_entry' which will contain 0's and 1's.
This column appears such that, for team-A, the last run-time is 5. So that row has Last_entry=1 and all other run-times for team-A should be 0.
For team-B, the last run-time is 3. So that row has Last_entry=1 and all other run-times for team-B should be 0.
For team-C, the last run-time is 4. So that row has Last_entry=1 and all other run-times for team-C should be 0.
The net result needs to look as such:
New dataframe by adding additional column
Can somebody please let me know how to achieve this task in python?
I wish to add an additional column in an existing dataset by using python
You can use groupby and tail to get the last entry for each team. Then make a new column of zeroes, and set the resulting rows to one:
# Determine indices of the last entry for each team
last_entry_idx = df.groupby('team').tail(1).index
# Create the new column, defaulting to 0
df['Last_entry'] = 0
# Mark the last entry of each team with 1
df.loc[last_entry_idx, 'Last_entry'] = 1
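As an alternative sketch, the same flag can be computed in one step with `Series.duplicated`. This assumes, as in the question's data, that the rows of each team are already ordered so that the last row of a team is its last run.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Run_time': [1, 2, 3, 4, 5, 1, 2, 3, 1, 2, 3, 4],
})

# duplicated(keep='last') is False only for the last occurrence of each team,
# so negating it flags exactly the last row per team
df['Last_entry'] = (~df['team'].duplicated(keep='last')).astype(int)
print(df)
```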
I am unable to run this code and get a result. It gives an error message [TypeError: The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is object. ]. [data set][1]
import pandas
import matplotlib.pyplot as plt

headers = ['Day', 'Hour', 'time', 'T', 'P0', 'P', 'Pa', 'U', 'Ff', 'R', 'Q']
dtypes = {'Day': 'str', 'Hour': 'str', 'time': 'str', 'T': 'float', 'P0': 'float', 'Pa': 'float',
'U': 'float', 'Ff': 'float', 'R': 'float', 'Q': 'float'}
parse_dates = ['Day', 'Hour', 'time']
points = pandas.read_excel('last.xlsx', sheet_name=1,
names=headers, dtype=str, parse_dates=True)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
points.time = pandas.to_datetime(points.time)
points.iloc[:,3:] = points.iloc[:,3:].astype(float)
x = points['time'].values
y = points['T'].values
z = points['Q'].values
ax.scatter(x, y, z, c='r', marker='o')
plt.show()
What am I doing wrong?
[1]: https://disk.yandex.ru/i/JxQBErHf7hptJw
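One likely cause, offered as an assumption since the linked file was not inspected: the 3D scatter call receives datetime64 values for x and float values for y and z, and these cannot be combined into one numeric array. A minimal sketch of a workaround is to convert the time column to plain numbers (here, seconds since the first measurement) before plotting. The sample values below are made up and only stand in for the spreadsheet.

```python
import pandas as pd

# Made-up stand-in for the spreadsheet; the real columns may differ
points = pd.DataFrame({
    'time': ['2021-01-01 00:00', '2021-01-01 01:00', '2021-01-01 03:00'],
    'T': ['1.5', '2.0', '2.5'],
    'Q': ['10', '20', '30'],
})

points['time'] = pd.to_datetime(points['time'])

# Convert each measurement column explicitly so it is really numeric
for col in ['T', 'Q']:
    points[col] = pd.to_numeric(points[col])

# Express time as seconds elapsed since the first measurement
x = (points['time'] - points['time'].min()).dt.total_seconds().to_numpy()
y = points['T'].to_numpy()
z = points['Q'].to_numpy()
print(x, y, z)
```

The arrays x, y and z can then be passed to ax.scatter(x, y, z, c='r', marker='o') as in the question; the x axis will show elapsed seconds rather than dates.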
I have inherited this piece of code:
import numpy as np
import pandas as pd

dummy_data1 = {
'id': ['1', '2', '3', '4', '5'],
'Feature1': ['A', 'C', 'E', 'G', 'I'],
'Feature2': ['Mouse', 'dog', 'house and parrot', '23', np.nan],
'dates': ['12/12/2020','12/12/2020','12/12/2020','12/12/2020','12/12/2020']}
df1 = pd.DataFrame(dummy_data1, columns = ['id', 'Feature1', 'Feature2', 'dates'])
df1 = df1.assign(
Feature2=lambda df: df.Feature2.where(
~df.Feature2.str.isnumeric(),
pd.to_numeric(df.Feature2, errors="coerce").astype("Int64"),
)
)
print(df1)
I know that this is because of the np.nan value. What does the code do? My understanding is that it tries to convert each string to an integer if the string is numeric. Also, please tell me how to overcome this issue.
You can try pd.to_numeric() and then fill the resulting NaNs with the original values:
df1['Feature2'] = pd.to_numeric(df1['Feature2'], errors="coerce").fillna(df1['Feature2'])
Or keep the where() approach and use fillna() to handle the NaNs in your condition ~df.Feature2.str.isnumeric():
df1['Feature2'] = df1['Feature2'].where(~df1.Feature2.str.isnumeric().fillna(True),
    pd.to_numeric(df1.Feature2, errors="coerce").astype("Int64")
)
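As for what the inherited code does: `where` keeps the original value wherever the condition is True and takes the value from the second argument elsewhere, so numeric strings are replaced by integers. The failure most likely comes from `str.isnumeric()` returning NaN for the missing value, which `~` cannot negate. A minimal sketch of the same idea without a boolean mask follows; `to_int_if_numeric` is a hypothetical helper introduced only for illustration.

```python
import numpy as np
import pandas as pd

def to_int_if_numeric(value):
    """Hypothetical helper: turn digit-only strings into ints, leave the rest alone."""
    # isdecimal() is used instead of isnumeric() because int() accepts
    # every string for which isdecimal() is True
    if isinstance(value, str) and value.isdecimal():
        return int(value)
    return value

df1 = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5'],
    'Feature2': ['Mouse', 'dog', 'house and parrot', '23', np.nan],
})

# Build an object-dtype column so strings, ints and NaN can coexist
converted = [to_int_if_numeric(v) for v in df1['Feature2']]
df1['Feature2'] = pd.Series(converted, index=df1.index, dtype=object)
print(df1)
```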
Given the following data, how can I create a dictionary where the keys are the names of the students and the values are dictionaries in which each key is a test and its value is the grade the student got on it?
grades = [
['Students', 'Test 1', 'Test 2', 'Test 3'],
['Tomas', '100', '90', '80'],
['Marcos', '88', '99', '111'],
['Flavia', '45', '56', '67'],
['Ramon', '59', '61', '67'],
['Ursula', '73', '79', '83'],
['Federico', '89', '97', '101']
]
I tried doing this, but I don't know why it's not showing the grades correctly.
notas_dict={}
def dic(etiquets, notas):
    for i in range(len(etiquets)):
        notas_dict[etiquets[i]]=int(notas[i])
    return notas_dict
dic(['Test 1','Test 2', 'Test 3'], ['100','80','90'] )
dic_final={}
for line in grades[1:]:
    line_grades=[int(element) for element in line[1:]]
    dic_final[line[0]]=dic(['Test 1','Test 2', 'Test 3'], line_grades)
print(dic_final)
The output should be :
{'Tomas': {'Test 1': 100, 'Test 2': 90, 'Test 3': 80}, 'Marcos': {'Test 1': 88, 'Test 2': 99, 'Test 3': 111}, 'Flavia': {'Test 1': 45, 'Test 2': 56, 'Test 3': 67}, 'Ramon': {'Test 1': 59, 'Test 2': 61, 'Test 3': 67}, 'Ursula': {'Test 1': 73, 'Test 2': 79, 'Test 3': 83}, 'Federico': {'Test 1': 89, 'Test 2': 97, 'Test 3': 101}}
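The wrong output from the code above comes from `notas_dict` being created once, outside the function: every call to `dic` updates and returns that same dictionary, so every student ends up pointing at the grades of the last row. A minimal sketch of a fix that creates a fresh dictionary on each call is shown here, with shortened data for illustration.

```python
grades = [
    ['Students', 'Test 1', 'Test 2', 'Test 3'],
    ['Tomas', '100', '90', '80'],
    ['Marcos', '88', '99', '111'],
]

def dic(etiquets, notas):
    # Create a new dictionary on every call instead of reusing a global one
    notas_dict = {}
    for etiquet, nota in zip(etiquets, notas):
        notas_dict[etiquet] = int(nota)
    return notas_dict

dic_final = {}
for line in grades[1:]:
    # The test names come from the header row, the grades from the current row
    dic_final[line[0]] = dic(grades[0][1:], line[1:])

print(dic_final)
```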
You can use:
{i[0]:dict(zip(grades[0][1:],i[1:])) for i in grades[1:]}
results in:
{'Tomas': {'Test 1': '100', 'Test 2': '90', 'Test 3': '80'},
'Marcos': {'Test 1': '88', 'Test 2': '99', 'Test 3': '111'},
'Flavia': {'Test 1': '45', 'Test 2': '56', 'Test 3': '67'},
'Ramon': {'Test 1': '59', 'Test 2': '61', 'Test 3': '67'},
'Ursula': {'Test 1': '73', 'Test 2': '79', 'Test 3': '83'},
'Federico': {'Test 1': '89', 'Test 2': '97', 'Test 3': '101'}}
If you want to get grades as int:
{i[0]:dict(zip(grades[0][1:],list(map(int,i[1:])))) for i in grades[1:]}
Create a DataFrame, then use to_records to get a record array in which each record is a row. You can then slice each record by index.
grades = [
['Students', 'Test 1', 'Test 2', 'Test 3'],
['Tomas', '100', '90', '80'],
['Marcos', '88', '99', '111'],
['Flavia', '45', '56', '67'],
['Ramon', '59', '61', '67'],
['Ursula', '73', '79', '83'],
['Federico', '89', '97', '101']
]
import pandas as pd

# The first row holds the column names; the remaining rows are the data
columns = grades[0]
df = pd.DataFrame(grades[1:], columns=columns)
print(df.to_records())
output:
[(0, 'Tomas', '100', '90', '80') (1, 'Marcos', '88', '99', '111')
(2, 'Flavia', '45', '56', '67') (3, 'Ramon', '59', '61', '67')
(4, 'Ursula', '73', '79', '83') (5, 'Federico', '89', '97', '101')]
or
# Avoid naming the variable "dict", which would shadow the built-in type
rows = df.T.to_dict()
for k, v in rows.items():
    print(k, v['Students'], v['Test 1'], v['Test 2'], v['Test 3'])
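If the goal is the nested dictionary from the question rather than printed rows, a minimal sketch with pandas is to index by student and export with `to_dict(orient='index')`. The shortened data and the name `result` are illustrative assumptions.

```python
import pandas as pd

grades = [
    ['Students', 'Test 1', 'Test 2', 'Test 3'],
    ['Tomas', '100', '90', '80'],
    ['Marcos', '88', '99', '111'],
]

df = pd.DataFrame(grades[1:], columns=grades[0])

# Use the student name as the index, convert the grades to int,
# then export one inner dictionary per student
result = df.set_index('Students').astype(int).to_dict(orient='index')
print(result)
```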