I'm trying to build a MySQL INSERT query like this for inserting millions of records:
INSERT INTO mytable (fee, fi) VALUES
('data1',96)
,('data2',33)
,('boot',17)
My values are stored as tuples in a list:
datatuplst = [("data1",96), ("data2", 33),("data3", 17)]
My code:
c3 = con.cursor()
c3.execute("INSERT INTO `bigdataloadin` (`block_key`, `id`) VALUES %s" %','.join(datatuplst))
This is not working and I'm getting error:
TypeError: sequence item 0: expected str instance, tuple found
Need help on how to create the dynamic query with values stored in tuples list.
You can generate the string that you need. The error message explains the problem: str.join() only accepts strings, but ','.join(datatuplst) asks it to join tuples. Convert each tuple to its string form first, for example with a list comprehension:
','.join([str(el) for el in datatuplst])
The output for this statement is going to be: "('data1', 96),('data2', 33),('data3', 17)"
Then your actual INSERT statement will be interpreted as follows:
"INSERT INTO `bigdataloadin` (`block_key`, `id`) VALUES ('data1', 96),
('data2', 33),('data3', 17)"
Good luck!
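One caveat with the str() approach is that it relies on Python's tuple formatting rather than SQL escaping, so values containing quotes may produce invalid or unsafe SQL. A minimal sketch of an alternative, assuming the MySQL driver uses %s as its parameter placeholder (the helper name build_bulk_insert is hypothetical): build the placeholders dynamically and let the driver bind the values.

```python
from itertools import chain

def build_bulk_insert(table, columns, rows, placeholder="%s"):
    """Return (sql, params) for a multi-row INSERT using driver placeholders.

    Hypothetical helper: table and column names are assumed to be trusted,
    only the row values are passed as bound parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    col_sql = ", ".join(f"`{c}`" for c in columns)
    # One "(%s, %s)" group per row, one placeholder per column
    row_sql = "(" + ", ".join([placeholder] * len(columns)) + ")"
    values_sql = ", ".join([row_sql] * len(rows))
    sql = f"INSERT INTO `{table}` ({col_sql}) VALUES {values_sql}"
    # Flatten [(a, 1), (b, 2)] into [a, 1, b, 2] to match the placeholders
    params = list(chain.from_iterable(rows))
    return sql, params

datatuplst = [("data1", 96), ("data2", 33), ("data3", 17)]
sql, params = build_bulk_insert("bigdataloadin", ["block_key", "id"], datatuplst)
print(sql)
print(params)
```

The result would then be passed as c3.execute(sql, params). For millions of rows it is probably better to insert in batches rather than in one statement.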
I have a data frame (df).
The Data frame contains a string column called: supported_cpu.
The (supported_cpu) data is a string type separated by a comma.
I want to use this data for the ML model.
I had to get unique values for the column (supported_cpu). The output is a (list) of unique values.
def pars_string(df,col):
    #Separate the column from the string using split
    data=df[col].value_counts().reset_index()
    data['index']=data['index'].str.split(",")
    # Create a list including all of the items, which is separated by column
    df_01=[]
    for i in range(data.shape[0]):
        for j in data['index'][i]:
            df_01.append(j)
    # get unique value from sub_df
    list_01=list(set(df_01))
    # there are some leading or trailing spaces in the list_01 which need to be deleted to get unique value
    list_02=[x.strip(' ') for x in list_01]
    # get unique value from list_02
    list_03=list(set(list_02))
    return(list_03)
supported_cpu_list = pars_string(df=df,col='supported_cpu')
I want to map this output to the data frame to encode it for the ML model.
How could I store the output in the data frame? Note: some rows have multiple values (more than one CPU).
Input: string type separated by a comma
Output: I do not know what it should be.
I really recommend that anyone who is starting out with pandas read about vectorization and thinking in terms of columns (aka Series). This is the way it was built and the way it is supposed to be used.
From what I understand (I may be wrong), you want to get the unique values from the supported_cpu column. You could use the Series string methods to split that column, then flatten the resulting lists using itertools.chain:
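from itertools import chain
# Split each comma-separated string into a list of values
df['supported_cpu'] = df['supported_cpu'].str.split(pat=',')
# Flatten the per-row lists and keep only the unique values
unique_vals = set(chain(*df['supported_cpu'].tolist()))
# Drop empty strings
unique_vals = (item for item in unique_vals if item)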
Multi-values in some rows should be parsed to single values for later ML model training. The list can be converted to a DataFrame simply with pd.DataFrame(supported_cpu_list).
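One possible way to store multi-valued rows for a model is a multi-hot encoding, with one 0/1 column per unique CPU value. The following is a plain-Python sketch of that idea, not necessarily what the asker needs; the function name multi_hot and the sample values are hypothetical.

```python
def multi_hot(values, sep=","):
    """Encode separator-delimited strings as rows of 0/1 indicator columns."""
    # Split each string, strip surrounding whitespace, drop empty items
    parsed = [[p.strip() for p in v.split(sep) if p.strip()] for v in values]
    # Sorted unique categories give a stable column order
    categories = sorted({item for row in parsed for item in row})
    # One dict per input row: 1 if the category is present, else 0
    rows = [{c: int(c in row) for c in categories} for row in parsed]
    return categories, rows

# Hypothetical sample values standing in for df['supported_cpu']
sample = ["x86, arm64", "arm64", "x86,ppc64le"]
categories, encoded = multi_hot(sample)
print(categories)
for row in encoded:
    print(row)
```

The list of dicts can be passed to pd.DataFrame(encoded) and joined back onto the original frame. pandas also offers Series.str.get_dummies(sep=','), which produces a comparable indicator table once the whitespace around the values has been normalized.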
How do I make a multidimensional dictionary with multiple keys and values, and how do I print its keys and values?
from this format:
main_dictionary = {Mainkey: {keyA: value,
                             keyB: value,
                             keyC: value}}
I tried to do it, but it gives me an error on the manufacturer key. Here is my code:
car_dict[manufacturer] [type]= [( sedan, hatchback, sports)]
Here is my error:
File "E:/Programming Study/testupdate.py", line 19, in campany
car_dict[manufacturer] [type]= [( sedan, hatchback, sports)]
KeyError: 'Nissan'
And my printing code is:
for manufacuted_by, type,sedan,hatchback, sports in cabuyao_dict[bgy]:
    print("Manufacturer Name:", manufacuted_by)
    print('-' * 120)
    print("Car type:", type)
    print("Sedan:", sedan)
    print("Hatchback:", hatchback)
    print("Sports:", sports)
Thank you! I'm new in Python.
I think you have a slight misunderstanding of how a dict works, and how to "call back" the values inside of it.
Let's make two examples for how to create your data-structure:
car_dict = {}
car_dict["Nissan"] = {"types": ["sedan", "hatchback", "sports"]}
print(car_dict) # Output: {'Nissan': {'types': ['sedan', 'hatchback', 'sports']}}
from collections import defaultdict
car_dict2 = defaultdict(dict)
car_dict2["Nissan"]["types"] = ["sedan", "hatchback", "sports"]
print(car_dict2) # Output: defaultdict(<class 'dict'>, {'Nissan': {'types': ['sedan', 'hatchback', 'sports']}})
In both examples above, I first create a dictionary, and on the next line I add the values I want it to contain. In the first example, I give car_dict the key "Nissan" and set its value to a new dictionary containing some values.
In the second example I use defaultdict(dict), which basically has the logic of "if I am not given a value for a key, use the factory (dict) to create one".
Can you see the difference in how the values are initialized in the two methods?
When you called car_dict[manufacturer][type] in your code, you hadn't yet initialized car_dict["Nissan"], so when you tried to retrieve it, car_dict raised a KeyError.
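To make the failure concrete, here is a small sketch that reproduces the KeyError on a plain dict and then shows a third option, dict.setdefault, which creates the inner dictionary only when the key is missing. The variable name missing is just for illustration.

```python
car_dict = {}

# Assigning to a nested key fails because car_dict["Nissan"] does not exist yet
try:
    car_dict["Nissan"]["types"] = ["sedan", "hatchback", "sports"]
except KeyError as exc:
    missing = exc.args[0]
    print("KeyError for:", missing)

# setdefault returns the existing inner dict, or inserts and returns a new one
car_dict.setdefault("Nissan", {})["types"] = ["sedan", "hatchback", "sports"]
print(car_dict)
```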
As for printing out the values, you can do something like this:
for key in car_dict:
    manufacturer = key
    car_types = car_dict[key]["types"]
    print(f"The manufacturer '{manufacturer}' has the following types:")
    for t in car_types:
        print(t)
Output:
The manufacturer 'Nissan' has the following types:
sedan
hatchback
sports
When you loop through a dict, you are looping through only the keys that are contained in it by default. That means that we have to retrieve the values of key inside of the loop itself to be able to print them correctly.
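If you prefer to get the key and its value together, dict.items() yields both on each iteration. A short sketch using the same data as above (the lines list is only there to collect the output):

```python
car_dict = {"Nissan": {"types": ["sedan", "hatchback", "sports"]}}

lines = []
# items() yields (key, value) pairs, so no second lookup is needed
for manufacturer, details in car_dict.items():
    lines.append(f"The manufacturer '{manufacturer}' has the following types:")
    lines.extend(details["types"])

print("\n".join(lines))
```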
Also, as a side note: avoid using the names of built-ins such as type as variable names. Doing so shadows the built-in function in that scope, which can cause problems later when you need to check the types of variables.
I am using Python to query my DB and print out the values line by line as follows:
cursor.execute("SELECT id, name FROM playlists")
lists = cursor.fetchall()
for index, list in lists:
    print("{0} - {1}".format(list['id'], list['name']))
However, this brings back the following error when executed:
TypeError: string indices must be integers
The method fetchall() returns a list of tuples representing the rows, where each tuple contains the values for each column. Writing for index, list in lists already unpacks each tuple: index receives the id and list receives the name. Since list is then a plain string, list['id'] tries to index a string with a string, which raises the TypeError.
I assume you have columns 'id' and 'name'.
If you just want to print them:
for index, mylist in lists:
    print("{0} - {1}".format(index, mylist))
or
rows = cursor.fetchall()
for row in rows:
    print("{0} - {1}".format(row[0], row[1]))
or with f-strings:
rows = cursor.fetchall()
for index, mylist in rows:
    print(f"{index} - {mylist}")
Also, it is bad practice to give variables the same names as built-in types and functions.
In particular, don't use list as a variable name.
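If access by column name is what you are after, some drivers can return rows that support it. The question does not say which database driver is in use, so the following sketch uses the standard library's sqlite3 module as a stand-in, with a made-up playlists table; with sqlite3.Row as the row factory, both row[0] and row['id'] work.

```python
import sqlite3

# In-memory database with hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows now allow access by column name
conn.execute("CREATE TABLE playlists (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO playlists (name) VALUES (?)", [("Rock",), ("Jazz",)])

rows = conn.execute("SELECT id, name FROM playlists ORDER BY id").fetchall()
lines = [f"{row['id']} - {row['name']}" for row in rows]
for line in lines:
    print(line)
```

Other drivers usually provide something similar (for example a dictionary-style cursor), but the exact option depends on the library, so check its documentation.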
What is the correct method to have the list (countryList) be available via %s in the SQL statement?
# using psycopg2
countryList=['UK','France']
sql='SELECT * from countries WHERE country IN (%s)'
data=[countryList]
cur.execute(sql,data)
As it is now, it errors out after trying to run "WHERE country in (ARRAY[...])". Is there a way to do this other than through string manipulation?
Thanks
For the IN operator, you want a tuple instead of list, and remove parentheses from the SQL string.
# using psycopg2
data=('UK','France')
sql='SELECT * from countries WHERE country IN %s'
cur.execute(sql,(data,))
During debugging you can check that the SQL is built correctly with
cur.mogrify(sql, (data,))
To expand on the answer a little, address named parameters, and show how to convert a list to a tuple:
countryList = ['UK', 'France']
sql = 'SELECT * from countries WHERE country IN %(countryList)s'
cur.execute(sql, { # You can pass a dict for named parameters rather than a tuple. Makes debugging hella easier.
'countryList': tuple(countryList), # Converts the list to a tuple.
})
You could use a python list directly as below. It acts like the IN operator in SQL and also handles a blank list without throwing any error.
data=['UK','France']
sql='SELECT * from countries WHERE country = ANY (%s)'
cur.execute(sql,(data,))
source:
http://initd.org/psycopg/docs/usage.html#lists-adaptation
Since the psycopg3 question was marked as a duplicate, I'll add the answer to that here too.
In psycopg3, you cannot use IN %s with a tuple, like you could in psycopg2. Instead you have to use ANY() and wrap your list inside another list:
conn.execute("SELECT * FROM foo WHERE id = ANY(%s)", [[10,20,30]])
Docs: https://www.psycopg.org/psycopg3/docs/basic/from_pg2.html#you-cannot-use-in-s-with-a-tuple
I am trying to compare a list of links to the links stored in an SQLite database.
Assume the links in the database are:
link.com\page1
link.com\page2
link.com\page3
I have written the following code to check whether a given link exists in the database and add it if it does not.
links = ['link.com\page2', 'link.com\page4']
c.execute('SELECT link FROM ads')
previouslinks = c.fetchall()
for l in links:
    if l not in previouslinks:
        c.execute('''INSERT INTO ads(link) VALUES(?)''', (l))
        conn.commit()
    else:
        pass
The problem is that even though the link is in the database, the script does not recognise it!
When I print the previouslinks variable, the result looks something like this:
[('link.com\page1',), ('link.com\page2',), ('link.com\page3',)]
I think the problem is with the extra parentheses and commas, but I am not exactly sure.
fetchall() returns a list of rows, where each row is a tuple containing all column values. A tuple containing a string is not the same as the string.
You have to extract the values from the rows (and you don't need fetchall() when iterating over a cursor):
previouslinks = [row[0] for row in c]
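Putting it together, here is a self-contained sketch using an in-memory SQLite database. The single-column ads table is an assumption based on the question, and the added list is only there to show what was inserted. It stores the existing links in a set for fast membership tests and passes the parameter as a one-element tuple (l,), since (l) is just a string in parentheses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# Assumed schema: a single text column, as implied by the question
c.execute("CREATE TABLE ads (link TEXT)")
c.executemany(
    "INSERT INTO ads(link) VALUES (?)",
    [(r"link.com\page1",), (r"link.com\page2",), (r"link.com\page3",)],
)

links = [r"link.com\page2", r"link.com\page4"]

c.execute("SELECT link FROM ads")
# Extract the first column of each row; a set makes "in" checks cheap
previouslinks = {row[0] for row in c}

added = []
for l in links:
    if l not in previouslinks:
        # Parameters must be a sequence, hence the one-element tuple (l,)
        c.execute("INSERT INTO ads(link) VALUES (?)", (l,))
        added.append(l)
conn.commit()

total = c.execute("SELECT COUNT(*) FROM ads").fetchone()[0]
print(added, total)
```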