How can I convert AJAX to dictionary in python? - python-3.x

I have the following AJAX submission data and I want to organize it in report format, so I have to convert it to table format.
The table should use field_name as the header row and field_value as the data rows.
Can anyone help me?
[{"field_name":"patientno","field_value":"1"},
{"field_name":"patient_unhcr_id","field_value":"1"},
{"field_name":"jps_file","field_value":"1"},
{"field_name":"patient_individual_id","field_value":"1"},
{"field_name":"name","field_value":"1"},
{"field_name":"name_in_arabic","field_value":"1"},
{"field_name":"age","field_value":"1"},
{"field_name":"age_category","field_value":"U5"},
{"field_name":"gender","field_value":"F"},
{"field_name":"coo","field_value":"Syria"},
{"field_name":"phone_number","field_value":"1"},
{"field_name":"governorate","field_value":"Mafraq"},
{"field_name":"bank_branch","field_value":"\u0641\u0631\u0639 \u0636\u0627\u062d\u064a\u0629 \u0627\u0644\u064a\u0627\u0633\u0645\u064a\u0646"},
{"field_name":"treatment_site","field_value":"Ramtha Governmental Hospital"},
{"field_name":"case_category","field_value":"CS"},
{"field_name":"description","field_value":"a"},
{"field_name":"eligibilities_","field_value":"Eligible Level 2"},{"field_name":"approved_amount_before_rounding","field_value":"21.5"},
{"field_name":"approved_amount","field_value":"20"},
{"field_name":"radio_buttons","field_value":"Yes"},
{"field_name":"recipient_name","field_value":"a"},
{"field_name":"recipient__dob","field_value":"02\/24\/2020"},
{"field_name":"gender_of_recpient_","field_value":"F"},
{"field_name":"recipient_unhcr_id_number","field_value":"1"},
{"field_name":"recipient_individual_id","field_value":"1"},
{"field_name":"relationship_to_patient","field_value":"Daughter-in-law"},
{"field_name":"recepient_phone_no","field_value":"1"},
{"field_name":"date_request_send_to_unhcr","field_value":"02\/17\/2020"},
{"field_name":"approval_date","field_value":"02\/24\/2020"},
{"field_name":"closure_date","field_value":"02\/11\/2020"},
{"field_name":"comment","field_value":"a"},
{"field_name":"attatchment","field_value":"http:\/\/192.168.1.52:9999\/wordpress\/wp-content\/uploads\/2020\/02\/IC-Weekly-Task-List-Template-8624.xlsx"}]

Here is my crude approach, hope it helps
import pandas as pd
ajax = [{"field_name":"patientno","field_value":"1"},{"field_name":"patient_unhcr_id","field_value":"1"},{"field_name":"jps_file","field_value":"1"},{"field_name":"patient_individual_id","field_value":"1"},{"field_name":"name","field_value":"1"},{"field_name":"name_in_arabic","field_value":"1"},{"field_name":"age","field_value":"1"},{"field_name":"age_category","field_value":"U5"},{"field_name":"gender","field_value":"F"},{"field_name":"coo","field_value":"Syria"},{"field_name":"phone_number","field_value":"1"},{"field_name":"governorate","field_value":"Mafraq"},{"field_name":"bank_branch","field_value":"\u0641\u0631\u0639 \u0636\u0627\u062d\u064a\u0629 \u0627\u0644\u064a\u0627\u0633\u0645\u064a\u0646"},{"field_name":"treatment_site","field_value":"Ramtha Governmental Hospital"},{"field_name":"case_category","field_value":"CS"},{"field_name":"description","field_value":"a"},{"field_name":"eligibilities_","field_value":"Eligible Level 2"},{"field_name":"approved_amount_before_rounding","field_value":"21.5"},{"field_name":"approved_amount","field_value":"20"},{"field_name":"radio_buttons","field_value":"Yes"},{"field_name":"recipient_name","field_value":"a"},{"field_name":"recipient__dob","field_value":"02/24/2020"},{"field_name":"gender_of_recpient_","field_value":"F"},{"field_name":"recipient_unhcr_id_number","field_value":"1"},{"field_name":"recipient_individual_id","field_value":"1"},{"field_name":"relationship_to_patient","field_value":"Daughter-in-law"},{"field_name":"recepient_phone_no","field_value":"1"},{"field_name":"date_request_send_to_unhcr","field_value":"02/17/2020"},{"field_name":"approval_date","field_value":"02/24/2020"},{"field_name":"closure_date","field_value":"02/11/2020"},{"field_name":"comment","field_value":"a"},{"field_name":"attatchment","field_value":"http://192.168.1.52:9999/wordpress/wp-content/uploads/2020/02/IC-Weekly-Task-List-Template-8624.xlsx"}]
df = pd.DataFrame(data=ajax).T
df.columns = df.iloc[0]
df = df.drop(df.index[0])
Basically you use the ajax list as data to create a dataframe, transpose it, set the first row as headers, and drop that row afterwards.
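A slightly shorter variant (a sketch, assuming ajax is the same parsed list as above) sets field_name as the index before transposing:
import pandas as pd
# if the payload arrives as a JSON string rather than a list,
# parse it first with json.loads(payload)
df = pd.DataFrame(ajax).set_index('field_name').T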

Related

How to extract a table from a website (URL) using Python

The NIST dataset website contains some data on copper. How can I grab the table on the left (titled "HTML table format") from the website using a Python script, preserve only the numbers in the second and third columns as shown in the picture below, and store all the data in a .csv file? I tried the code below, but it failed to get the correct format of the table.
import pandas as pd
# URL of the table
url = "https://physics.nist.gov/PhysRefData/XrayMassCoef/ElemTab/z29.html"
# Read the table into a pandas dataframe
df = pd.read_html(url, header=0, index_col=0)[0]
# Save the processed table to a CSV file
df.to_csv("nist_table.csv", index=False)
You could use:
.droplevel([0,1]) to remove the unwanted header rows
.dropna(axis=1, how='all') to remove the empty columns
.iloc[:,1:] to drop the first column and keep only the three data columns
Example
import pandas as pd
url = "https://physics.nist.gov/PhysRefData/XrayMassCoef/ElemTab/z29.html"
df = pd.read_html(url, header=[0,1,2,3])[1].droplevel([0,1], axis=1).dropna(axis=1, how='all').iloc[:,1:]
df
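Since the goal is a .csv file, you can then write the cleaned table out (the filename here is arbitrary):
# the row index is a synthetic RangeIndex, so drop it on export
df.to_csv("nist_table.csv", index=False)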
For parsing HTML documents, BeautifulSoup is a great Python package; together with the requests library you can extract the data you want.
The code below should extract the desired data:
# import packages/libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd
# define URL link variable, get the response and parse the HTML dom contents
url = "https://physics.nist.gov/PhysRefData/XrayMassCoef/ElemTab/z29.html"
response = requests.get(url).text
soup = BeautifulSoup(response, 'html.parser')
# declare table variable and use soup to find table in HTML dom
table = soup.find('table')
# iterate over table rows (tr) and append table data (td) to rows list
rows = []
for i, row in enumerate(table.find_all('tr')):
    # only append data if it's after the 3rd row -> (MeV),(cm2/g),(cm2/g)
    if i > 3:
        rows.append([value.text.strip() for value in row.find_all('td')])
# create DataFrame from the data appended to the rows list
df = pd.DataFrame(rows)
# export data to csv file called datafile
df.to_csv(r"datafile.csv")

How to add a column name to the dataframe storing the result of correlation of two columns in pyspark?

I have read a CSV file and need to find the correlation between two columns.
I am using df.stat.corr('Age','Exp') and the result is 0.7924058156930612.
But I want this result stored in another dataframe with the header "correlation":
correlation
0.7924058156930612
Following up on what @gupta_hemant commented.
You can create a new column as
df.withColumn("correlation", df.stat.corr("Age", "Exp").collect()[0].correlation)
(I am guessing the exact syntax here, but it should be something like this)
After reviewing the code, the syntax should be
import pyspark.sql.functions as F
df.withColumn("correlation", F.lit(df.stat.corr("Age", "Exp")))
Try this and let me know.
corrValue = df.stat.corr("Age", "Exp")
newDF = spark.createDataFrame(
    [(corrValue,)],  # note the trailing comma: a one-element tuple, i.e. one row
    ["corr"]
)
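For completeness, here is a minimal end-to-end sketch (the sample Age/Exp values are placeholders standing in for the CSV from the question):
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# placeholder data standing in for the CSV read in the question
df = spark.createDataFrame([(25, 2.0), (30, 5.0), (45, 20.0)], ["Age", "Exp"])

# corr returns a plain Python float
corr_value = df.stat.corr("Age", "Exp")

# wrap it in a one-row dataframe with the requested header
result = spark.createDataFrame([(corr_value,)], ["correlation"])
result.show()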

Issue when exporting dataframe to csv

I'm working on a mechanical engineering project. For the following code, the user enters the number of cylinders that their compressor has. A dataframe is then created with the correct number of columns and is exported to Excel as a CSV file.
The output dataframe looks exactly like I want it to, as shown in the first image, but when opened in Excel it looks like the second image:
[image 1: my dataframe]
[image 2: Excel table]
Why is my dataframe not exporting properly to Excel and what can I do to get the same dataframe in Excel?
import pandas as pd
CylinderNo = int(input('Enter CylinderNo: '))
new_number = CylinderNo*3
list1 = []
for i in range(1, CylinderNo+1):
    for j in range(0, 3):
        Cylinder_name = 'CylinderNo ' + str(i)
        list1.append(Cylinder_name)
df = pd.DataFrame(list1, columns=['Kurbel/Zylinder'])
list2 = ['Triebwerk', 'Packung', 'Ventile']*CylinderNo
Bauteil = {'Bauteil': list2}
df2 = pd.DataFrame(Bauteil, columns=['Bauteil'])
new = pd.concat([df, df2], axis=1)
list3 = ['Nan', 'Nan', 'Nan']*CylinderNo
Bewertung = {'Bewertung': list3}
df3 = pd.DataFrame(Bewertung, columns=['Bewertung'])
new2 = pd.concat([new, df3], axis=1)
Empfehlung = {'Empfehlung': list3}
df4 = pd.DataFrame(Empfehlung, columns=['Empfehlung'])
new3 = pd.concat([new2, df4], axis=1)
new3 = new3.set_index('Kurbel/Zylinder', append=True).swaplevel(0,1)
#export dataframe to csv
new3.to_csv('new3.csv')
To be clear, a comma-separated values (CSV) file is not an Excel format type or table. It is a delimited text file that Excel, like other applications, can open.
What you are comparing is simply presentation. Both data frames are exactly the same. For MultiIndex data frames, Pandas' print output does not repeat index values, for readability on the console or in an IDE like Jupyter. But such values are not removed from the underlying data frame, only from its presentation. If you re-order the indexes, you will see this presentation change. The full, complete data frame is what is exported to CSV. And ideally, for data integrity, you want the full data set exported with to_csv to be importable back into Pandas with read_csv (which can set indexes) or into other languages and applications.
Essentially, CSV is an industry format to store and transfer data. Consider using Excel spreadsheets, HTML markdown, or other reporting formats for your presentation needs. Therefore, to_csv may not be the best method. You could try to build the text file manually with Python I/O write methods (with open('new.csv', 'w') as f), but that would be an extensive workaround. See also @Jeff's answer here, but do note that the latter part of that solution does remove data.
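If what you actually want in Excel is a native workbook rather than a delimited text file, a minimal sketch (assuming an Excel writer engine such as openpyxl is installed) would be:
# write the same dataframe to a real .xlsx workbook; the sheet name is arbitrary
new3.to_excel('new3.xlsx', sheet_name='Report')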

Python3 - Return CSV with row-level errors for missing data

New to Python. I'm importing a CSV, then if any data is missing I need to return a CSV with an additional column to indicate which rows are missing data. Colleague suggested that I import CSV into a dataframe, then create a new dataframe with a "Comments" column, fill it with a comment on the intended rows, and append it to the original dataframe. I'm stuck at the step of filling my new dataframe, "dferr", with the correct number of rows that would match up to "dfinput".
Have Googled, "pandas csv return error column where data is missing", but haven't found anything related to creating a new CSV that marks bad rows. I don't even know if the proposed way is the best way to go about this.
import pandas as pd
dfinput = None
try:
    dfinput = pd.read_csv(r"C:\file.csv")
except:
    print("Uh oh!")
if dfinput is None:
    print("Ack!")
    quit(10)
dfinput.reset_index(level=None, drop=False, inplace=True, col_level=0,
                    col_fill='')
dferr = pd.DataFrame(columns=['comment'])
print("Empty DataFrame", dferr, sep='\n')
Expected results: "dferr" would have an index column with number of rows equal to "dfinput", and comments on the correct rows where "dfinput" has missing values.
Actual results: "dferr" is empty.
My understanding of 'missing data' here would be null values. It seems that for every row, you want the names of null fields.
df = pd.DataFrame([[1,2,3],
                   [4,None,6],
                   [None,8,None]],
                  columns=['foo','bar','baz'])
# Create a dataframe of True/False, True where a criterion is met
# (in this case, a null value)
nulls = df.isnull()
# Iterate through every row of *nulls*,
# and extract the column names where the value is True by boolean indexing
colnames = nulls.columns
null_labels = nulls.apply(lambda s:colnames[s], axis=1)
# Now you have a pd.Series where every entry is an array
# (technically, a pd.Index object)
# Pandas arrays have a vectorized .str.join method:
df['nullcols'] = null_labels.str.join(', ')
The .apply() method in pandas can sometimes be a bottleneck in your code; there are ways to avoid using this, but here it seemed to be the simplest solution I could think of.
EDIT: Here's an alternate one-liner (instead of using .apply) that might cut down computation time slightly:
import numpy as np
df['nullcols'] = [colnames[x] for x in nulls.values]
This might be even faster (a bit more work is required):
np.where(df.isnull(),df.columns,'')
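If you take the np.where route, you still need to join each row's names into a single string; one way, reusing nulls and colnames from above:
import numpy as np
# map the boolean mask to column names ('' where the value is present)
labels = np.where(nulls, colnames, '')
# join each row's non-empty names into one comma-separated string
df['nullcols'] = [', '.join(name for name in row if name) for row in labels]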

Error when using pandas read_excel(header=[0,1])

I'm trying to use pandas read_excel to work with a file. The file has two header rows, so I'm trying to use the MultiIndex feature that is part of the header keyword argument.
import pandas as pd, os
"""data in 2015 MOR Folder"""
filename = 'MOR-JANUARY 2015.xlsx'
print(os.path.isfile(filename))
df1 = pd.read_excel(filename, header=[0,1], sheetname='MOR')
print(df1)
The error I get is ValueError: Length of new names must be 1, got 2. The file is in this Google Drive folder: https://drive.google.com/drive/folders/0B0ynKIVAlSgidFFySWJoeFByMDQ?usp=sharing
I'm trying to follow the solution posted here
Read excel sheet with multiple header using Pandas
I could be mistaken, but I don't think pandas handles parsing Excel rows where there are merged cells. So in that first row, the merged cells get parsed as mostly empty cells. You'd need them nicely repeated to act correctly. This is what motivates the ffill below. If you can control the Excel workbook ahead of time, you might be able to use the code you have.
my solution
It's not pretty, but it'll get it done.
filename = 'MOR-JANUARY 2015.xlsx'
# read the sheet raw, with no header inference
df1 = pd.read_excel(filename, sheetname='MOR', header=None)
# forward-fill across columns so merged header cells repeat their values,
# then build a two-level column index from the first two header rows
mux = pd.MultiIndex.from_arrays(df1.ffill(1).values[:2, 1:], names=[None, 'DATE'])
# the body is everything from the third row on; the first column becomes the row index
df1 = pd.DataFrame(df1.values[2:, 1:], df1.values[2:, 0], mux)
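One note if you run this on a current pandas install: the sheetname keyword was later renamed to sheet_name (and the positional axis in ffill(1) is clearer as a keyword), so the read and fill lines become:
df1 = pd.read_excel(filename, sheet_name='MOR', header=None)
mux = pd.MultiIndex.from_arrays(df1.ffill(axis=1).values[:2, 1:], names=[None, 'DATE'])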
