How to insert values of variables dynamically in SQL Server Database using python script - python-3.x

First row in the data file:
1,Maria,Anders,Berlin,Germany,0300174321
f = open("Customer.csv", "r")
for row in f.readlines():
    a = row
    x = a.split(",")
    ID1 = print(x[0].replace("",""))
    FIRST_NM1 = print(x[1])
    LAST_NM1 = print(x[2])
    CITY1 = print(x[3])
    COUNTRY1 = print(x[4])
    PHONE1 = print(x[5])
    cursor = cs.cursor()
    cursor.execute("INSERT INTO sales.dbo.Customer_temp (ID,FIRST_NM,LAST_NM,CITY,COUNTRY,PHONE) VALUES ('%s','%s','%s','%s','%s','%s')" % (ID1, FIRST_NM1, LAST_NM1, CITY1, COUNTRY1, PHONE1))
cs.commit()
But it is inserting None into every row; could you please suggest a fix?

print() returns None, so each of those assignments stores None. Instead of printing the values you need to assign them directly:
ID1 = x[0]
FIRST_NM1 = x[1]
LAST_NM1 = x[2]
CITY1 = x[3]
and so on.
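With that fix the INSERT works, but building SQL with % string interpolation invites quoting bugs and SQL injection; DB-API parameter placeholders are safer. A minimal sketch of the pattern, run here against an in-memory sqlite3 table as a stand-in for the SQL Server connection (pyodbc uses the same ? placeholders; the table layout and sample row follow the question):

```python
import csv
import io
import sqlite3

# Stand-in for the SQL Server connection from the question; with pyodbc,
# cs = pyodbc.connect(...) and the rest of the code is unchanged.
cs = sqlite3.connect(":memory:")
cs.execute("CREATE TABLE Customer_temp (ID, FIRST_NM, LAST_NM, CITY, COUNTRY, PHONE)")

# In the real script this would be open("Customer.csv"); inlined here so it runs anywhere.
data = io.StringIO("1,Maria,Anders,Berlin,Germany,0300174321\n")

cursor = cs.cursor()
for row in csv.reader(data):
    # Placeholders let the driver handle quoting and escaping for each field.
    cursor.execute("INSERT INTO Customer_temp VALUES (?, ?, ?, ?, ?, ?)", row)
cs.commit()
```

Using csv.reader also handles quoted fields that contain commas, which a bare split(",") would break on.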

Related

assign to grouped dataframe in Pandas

I want to calculate lags of multiple columns. I am able to do that for each column separately, as shown below. How can I avoid the duplicated groupby and sorting?
# Pandas: previous week values
search = search.assign(asp_lstwk2=search.sort_values(by='firstdayofweek').groupby('asin_bk')['asp'].shift(1))\
    .assign(lbb_lstwk2=search.sort_values(by='firstdayofweek').groupby('asin_bk')['lbb'].shift(1))\
    .assign(repoos_lstwk2=search.sort_values(by='firstdayofweek').groupby('asin_bk')['repoos'].shift(1))\
    .assign(ordered_units_lstwk2=search.sort_values(by='firstdayofweek').groupby('asin_bk')['ordered_units'].shift(1))
Try:
search = search.join(search.sort_values(by='firstdayofweek')
                     .groupby('asin_bk')[['asp', 'lbb', 'repoos', 'ordered_units']]
                     .shift().add_suffix('_lstwk2'))
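A quick check of the pattern on a toy frame (made-up data with two of the original columns; only an illustration, not the asker's dataset):

```python
import pandas as pd

# Toy stand-in for `search`: two products ('asin_bk') over two weeks.
search = pd.DataFrame({
    'asin_bk': ['A', 'A', 'B', 'B'],
    'firstdayofweek': [1, 2, 1, 2],
    'asp': [10.0, 11.0, 20.0, 21.0],
    'lbb': [0.1, 0.2, 0.3, 0.4],
})

# One sort + one groupby produces last week's value for every column at once;
# join aligns the shifted frame back on the original index.
search = search.join(search.sort_values(by='firstdayofweek')
                     .groupby('asin_bk')[['asp', 'lbb']]
                     .shift().add_suffix('_lstwk2'))
print(search[['asin_bk', 'asp', 'asp_lstwk2']])
```

Each `*_lstwk2` column holds the previous week's value within its `asin_bk` group, with NaN for each group's first week.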

Replace empty string with null values in RDD

Hello, I would like to convert the empty strings in my RDD to 0.
I have read 20 files, and they are all in this format:
YEAR,MONTH,DAY,DAY_OF_WEEK,AIRLINE,FLIGHT_NUMBER,TAIL_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT,SCHEDULED_DEPARTURE,DEPARTURE_TIME,DEPARTURE_DELAY,TAXI_OUT,WHEELS_OFF,SCHEDULED_TIME,ELAPSED_TIME,AIR_TIME,DISTANCE,WHEELS_ON,TAXI_IN,SCHEDULED_ARRIVAL,ARRIVAL_TIME,ARRIVAL_DELAY,DIVERTED,CANCELLED,CANCELLATION_REASON,AIR_SYSTEM_DELAY,SECURITY_DELAY,AIRLINE_DELAY,LATE_AIRCRAFT_DELAY,WEATHER_DELAY
2015,2,6,5,OO,6271,N937SW,FAR,DEN,1712,1701,-11,15,1716,123,117,95,627,1751,7,1815,1758,-17,0,0,,,,,,
2015,1,19,1,AA,1605,N496AA,DFW,ONT,1740,1744,4,15,1759,193,198,175,1188,1854,8,1853,1902,9,0,0,,,,,,
2015,3,8,7,NK,1068,N519NK,LAS,CLE,2220,2210,-10,12,2222,238,229,208,1824,450,9,518,459,-19,0,0,,,,,,
2015,9,21,1,AA,1094,N3EDAA,DFW,BOS,1155,1155,0,12,1207,223,206,190,1562,1617,4,1638,1621,-17,0,0,,,,,,
I would like to fill these empty strings with the number 0.
def import_parse_rdd(data):
    # create rdd
    rdd = sc.textFile(data)
    # remove the header
    header = rdd.first()
    rdd = rdd.filter(lambda row: row != header)  # filter out header
    # split by comma
    split_rdd = rdd.map(lambda line: line.split(','))
    row_rdd = split_rdd.map(lambda line: Row(
        YEAR=int(line[0]), MONTH=int(line[1]), DAY=int(line[2]), DAY_OF_WEEK=int(line[3]),
        AIRLINE=line[4], FLIGHT_NUMBER=int(line[5]),
        TAIL_NUMBER=line[6], ORIGIN_AIRPORT=line[7], DESTINATION_AIRPORT=line[8],
        SCHEDULED_DEPARTURE=line[9], DEPARTURE_TIME=line[10], DEPARTURE_DELAY=line[11], TAXI_OUT=line[12],
        WHEELS_OFF=line[13], SCHEDULED_TIME=line[14], ELAPSED_TIME=line[15], AIR_TIME=line[16],
        DISTANCE=line[17], WHEELS_ON=line[18], TAXI_IN=line[19],
        SCHEDULED_ARRIVAL=line[20], ARRIVAL_TIME=line[21], ARRIVAL_DELAY=line[22], DIVERTED=line[23],
        CANCELLED=line[24], CANCELLATION_REASON=line[25], AIR_SYSTEM_DELAY=line[26],
        SECURITY_DELAY=line[27], AIRLINE_DELAY=line[28], LATE_AIRCRAFT_DELAY=line[29], WEATHER_DELAY=line[30]))
    return row_rdd
The above is the code I am running.
I am working with RDD Row objects, not a DataFrame.
You can use na.fill("0") to replace all nulls with "0" strings:
spark.read.csv("path/to/file").na.fill(value="0").show()
In case you need integers, you can change the schema to convert those string columns to integers.
You could also add this to your dataframe pipeline to apply the change to a column named 'col_name' (the pattern '^$' matches an empty value, and the replacement must be a string):
from pyspark.sql import functions as F
(...)
.withColumn('col_name', F.regexp_replace('col_name', '^$', '0'))
You could use this syntax directly in your code
You can add an if-else condition while creating the Row.
Let's consider WEATHER_DELAY:
row_rdd = split_rdd.map(lambda line: Row(  # ...all other columns...,
    WEATHER_DELAY=0 if line[30] == "" else line[30]))
Please allow me another try for your problem, using a standalone function mapped over the split RDD. Note map() rather than foreach(): foreach() is for side effects only and returns None, so assigning its result would lose the data.
def f(fields):
    return [0 if v == '' else v for v in fields]
(...)
split_rdd = split_rdd.map(f)  # to be applied before the Rows are built
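The per-field if-else generalizes to every column at once; a minimal sketch of the line-level transformation in plain Python (no Spark needed, so it can be unit-tested before handing it to split_rdd.map):

```python
def parse_line(line):
    """Split a CSV line and replace empty fields with 0, leaving other values as-is."""
    return [0 if field == "" else field for field in line.split(",")]

# Empty fields, including trailing ones, become 0; everything else is untouched.
parse_line("2015,2,6,,OO,,")
```

Type casts (e.g. int() on the delay columns) can then be applied safely, since no field is an empty string anymore.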

dataframe manipulation python based on conditons

input_df1:
    ID      MSG
    id-1    'msg1'
    id-2    'msg2'
    id-3    'msg3'
ref_df2:
    ID      MSG
    id-1    'msg1'
    id-2    'xyzz'
    id-4    'msg4'
I am trying to generate an output dataframe based on the following conditions:
If both 'id' and 'msg' values in input_df match the values in ref_df = matched
If the 'id' value in input_df doesn't exist in ref_df = notfound
If only the 'id' value in input_df matches the 'id' value in ref_df = not_matched
sample output:
    ID      MSG     flag
    id-1    'msg1'  matched
    id-2    'msg2'  not_matched
    id-3    'msg3'  notfound
I can do it using lists, but since I deal with huge amounts of data, performance is important, so I am looking for a much faster solution.
Any help will be highly appreciated.
Let's use map to look up the ids in the reference messages and use np.select:
ref_msg = df1['ID'].map(df2.set_index('ID')['MSG'])
df1['flag'] = np.select((ref_msg.isna(), ref_msg == df1['MSG']),
                        ('notfound', 'matched'), 'not_matched')
Output (df1):
     ID     MSG         flag
0  id-1  'msg1'      matched
1  id-2  'msg2'  not_matched
2  id-3  'msg3'     notfound
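A self-contained version of that approach, with the toy frames rebuilt from the question (assumes pandas and numpy are available):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID': ['id-1', 'id-2', 'id-3'],
                    'MSG': ['msg1', 'msg2', 'msg3']})
df2 = pd.DataFrame({'ID': ['id-1', 'id-2', 'id-4'],
                    'MSG': ['msg1', 'xyzz', 'msg4']})

# Look up each input ID in the reference frame; IDs missing from df2 become NaN.
ref_msg = df1['ID'].map(df2.set_index('ID')['MSG'])
# First matching condition wins: unknown ID -> notfound, equal MSG -> matched,
# everything else -> not_matched.
df1['flag'] = np.select((ref_msg.isna(), ref_msg == df1['MSG']),
                        ('notfound', 'matched'), 'not_matched')
print(df1)
```

Both steps are vectorized, so this scales to large frames without a Python-level loop.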
You can also use the indicator=True parameter of df.merge:
In [3867]: x = df1.merge(df2, how='outer', indicator=True).groupby('ID', as_index=False).last()
In [3864]: d = {'both': 'matched', 'right_only': 'not_matched', 'left_only': 'notfound'}
In [3869]: x['_merge'] = x['_merge'].map(d)
In [3871]: x
Out[3871]:
     ID     MSG       _merge
0  id-1  'msg1'      matched
1  id-2  'xyzz'  not_matched
2  id-3  'msg3'     notfound
A plain-Python alternative is to use dictionaries, whose O(1) lookups keep this reasonably fast even for large inputs:
list_ID_in = ['id-1', 'id-2', 'id-3']
list_msg_in = ['msg1', 'msg2', 'msg3']
list_ID_ref = ['id-1', 'id-2', 'id-4']
list_msg_ref = ['msg1', 'xyzz', 'msg4']
dict_in = dict(zip(list_ID_in, list_msg_in))
dict_ref = dict(zip(list_ID_ref, list_msg_ref))
list_out = [None] * len(dict_in)
for idx, key in enumerate(dict_in):
    try:
        if dict_ref[key] == dict_in[key]:
            list_out[idx] = 'matched'
        else:
            list_out[idx] = 'not_matched'
    except KeyError:
        list_out[idx] = 'notfound'

Python: Copying certain columns to an empty dataframe using For loop

wo = "C:/temp/temp/WO.xlsx"
dfwo = pd.read_excel(wo)
columnnames = ["TicketID", "CreateDate", "Status", "Summary", "CreatedBy", "Company"]
main = pd.DataFrame(columns=columnnames)
for i in range(0, 15):
    print(i)
    main["TicketID"][i] = dfwo["WO ID"][i]
    main["CreateDate"][i] = dfwo["WO Create TimeStamp"][i]
    main["Status"][i] = dfwo["Status"][i]
    main["Summary"][i] = dfwo["WO Summary"][i]
    main["CreatedBy"][i] = dfwo["Submitter Full Name"][i]
    main["Company"][i] = dfwo["Company"][i]
I am trying to copy selected columns from one df to another.
dfwo is a df read from Excel.
main is an empty dataframe meant to hold selected columns from dfwo.
When I run this code, it gives me the error "IndexError: index 0 is out of bounds for axis 0 with size 0".
Any suggestions, please?
Select the source columns and rename them. Note that rename expects an {old: new} mapping, so the keys must be the Excel column names:
wo = "C:/temp/temp/WO.xlsx"
dfwo = pd.read_excel(wo)
col_map = {
    "WO ID": "TicketID",
    "WO Create TimeStamp": "CreateDate",
    "Status": "Status",
    "WO Summary": "Summary",
    "Submitter Full Name": "CreatedBy",
    "Company": "Company",
}
main = dfwo[list(col_map)].rename(columns=col_map)
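A quick check of the select-and-rename pattern on a toy frame (made-up data, only two of the columns):

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet read by pd.read_excel.
dfwo = pd.DataFrame({'WO ID': [101, 102],
                     'WO Summary': ['printer down', 'vpn issue'],
                     'Unused': ['x', 'y']})

col_map = {'WO ID': 'TicketID', 'WO Summary': 'Summary'}
# Selecting with list(col_map) keeps only the wanted columns, in mapping order,
# and rename converts the source names to the target names.
main = dfwo[list(col_map)].rename(columns=col_map)
print(main)
```

This avoids the row-by-row loop entirely, so it also works regardless of how many rows the sheet has.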

Set text from sqlite rows in Pyqt textbrowser

I need a way to put some rows from an sqlite .db into a PyQt text browser, but the problem is that it only shows the last row.
cur = conn.cursor()
conn.text_factory = str
cur.execute("SELECT text FROM Translation WHERE priority = ?", (m,))
for row in cur:
    print('{0}'.format(row[0]))
    self.SearchResults.setPlainText('{0}'.format(m))
I used pandas, but:
query = "SELECT text FROM Translation WHERE priority=2;"
df = pd.read_sql_query(query,conn)
self.SearchResults.setPlainText('{0}'.format(df['text']))
This is not what I want. And neither is this:
cur = conn.cursor()
conn.text_factory = str
cur.execute(" SELECT text FROM Translation WHERE priority = ?", (m,))
all_rows=cur.fetchall()
self.SearchResults.setPlainText('{0}'.format(all_rows))
In the first code you are replacing the text every time you iterate; the solution is to use append():
cur = conn.cursor()
conn.text_factory = str
cur.execute("SELECT text FROM Translation WHERE priority = ?", (m,))
self.SearchResults.clear()  # clear previous text
for row in cur:
    self.SearchResults.append(str(row[0]))
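An alternative is to build the whole string first and call setPlainText() once. The query-and-join step is plain Python; it is sketched here against an in-memory database, with the table layout assumed from the question:

```python
import sqlite3

# In-memory stand-in for the .db file; schema and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Translation (text TEXT, priority INTEGER)")
cur.executemany("INSERT INTO Translation VALUES (?, ?)",
                [("hello", 2), ("world", 2), ("skip", 1)])

m = 2
cur.execute("SELECT text FROM Translation WHERE priority = ?", (m,))
# One string with one row per line; a single self.SearchResults.setPlainText(text)
# call would then display every matching row.
text = "\n".join(row[0] for row in cur.fetchall())
print(text)
```

A single widget update can also be noticeably faster than appending row by row when the result set is large.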
