How to insert values from numpy into an SQL database with given columns? - python-3.x

I need to insert some columns into a table in my MariaDB database. The table is named Customer and has 6 columns: A, B, C, D, E, F. The primary keys are in column A, column B holds an address, C, D, and E contain None values, and F holds the zip code.
I have a pandas dataframe that follows a similar format. I converted it to a numpy array as follows:
data = df.iloc[:,1:4].values
data is therefore a numpy array with 3 columns, and I need it inserted into C, D, and E. I tried:
query = """
Insert Into Customer (C,D,E) VALUES (?,?,?)
"""
cur.executemany(query,data)
cur.commit()
But I get an error:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I solved it, although it is very slow. (Note that for the indexing below to work, data must also include the key column, e.g. data = df.iloc[:,0:4].values, so that row[0] is the key for A and row[1:4] are the new values for C, D, E.)
query = """
UPDATE Customer SET
    C = %s,
    D = %s,
    E = %s
WHERE A = %s
"""
for row in data:
    cur.execute(query, (row[1], row[2], row[3], row[0]))
con.commit()
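The per-row loop is slow mainly because every execute() is a separate round trip. A hedged alternative, assuming the same DB-API style connector with %s placeholders: convert the numpy rows to plain Python tuples and let executemany() run the same UPDATE as a batch. This also sidesteps the original "truth value of an array is ambiguous" error, which comes from handing numpy rows directly to the driver.
# Sketch only: data is assumed to hold (A, C, D, E) per row, as above
params = [(row[1], row[2], row[3], row[0]) for row in data.tolist()]
cur.executemany(query, params)  # same UPDATE statement, run as a batch
con.commit()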

Related

How to select corresponding items of a column without making it the index column in a pandas dataframe

I have a pandas dataframe like this:
How do I get the price of item1 without making the 'Items' column an index column?
I tried df['Price (R)'][item1] but it returns the price of item2, while I expect the output to be 1.
The loc operator is required in front of the selection brackets []. When using loc, the part before the comma selects the rows you want, and the part after the comma selects the columns. Therefore, the code can be:
result = df.loc[df['Items']=="item1","Price(R)"]
The resulting output is a pandas Series object.
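A small self-contained sketch of the same idea (the column names and values are assumed here, since the original dataframe was only shown as an image); .iloc[0] then turns the one-row Series into a scalar:
import pandas as pd

# Assumed data, standing in for the dataframe shown in the question
df = pd.DataFrame({'Items': ['item1', 'item2'], 'Price(R)': [1, 2]})

# Boolean mask on the rows, label selection on the column
result = df.loc[df['Items'] == 'item1', 'Price(R)']
print(result.iloc[0])  # -> 1, the scalar price of item1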

Slicing a Pandas dataframe by column name

I am trying to split a copy off of a Pandas dataframe starting after a certain column by header name.
So far, I've been able to manipulate the column headers or indexes according to a set number of known columns, like below. However, the number of columns will change, and I still want to extract every column that comes after.
In the example below, say I want to grab all columns from 'Tail' onward, even if the 'Body' columns run all the way to some column BodyX. So this sample with X Body columns:
df = pd.DataFrame({'Intro1': ['blah'], 'Intro2': ['blah'], 'Intro3': ['blah'],
                   'Body1': ['blah'], 'Body2': ['blah'], 'Body3': ['blah'],
                   'Body4': ['blah'], ... 'BodyX': ['blah'],
                   'Tail': ['blah'], 'OtherTail': ['blah'], 'StillAnotherTail': ['blah']})
Should produce a copy of the dataframe as:
dftail = pd.DataFrame({'Tail': ['blah'],'OtherTail': ['blah'],'StillAnotherTail': ['blah'],})
Ideally I'd like to find a way to combine the two techniques below so that the selection starts at 'Tail' and goes to the end of the dataframe:
dftail = [col for col in df if col.startswith('Tail')]
dftail = df.iloc[:, 164:] # column number (164) will change based on 'Tail' index number
How about this:
df_tail = df.iloc[:, list(df.columns).index("Tail"):]
df_tail then prints out:
Tail OtherTail StillAnotherTail
0 blah blah blah
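Two equivalent spellings that avoid converting the columns to a list (both assume the column labels are unique, as in the sample):
# Label-based slice: .loc accepts a slice of column labels (end-inclusive)
df_tail = df.loc[:, 'Tail':]
# Or look up the positional index of 'Tail' directly
df_tail = df.iloc[:, df.columns.get_loc('Tail'):]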

How do I filter a dataframe column using regex?

Here is my regex
date_regex='\d{1,2}\/\d{1,2}\/\d{4}$'
Here is my dates_as_first_row Dataframe
I am trying to select the date column, but I get an empty (377, 0) DataFrame.
date_column = dates_as_first_row.filter(regex=date_regex, axis='columns')
You can do this using .str.match, which tests the values in the column. (DataFrame.filter only matches the column labels against the regex, not the cell values, which is why your attempt returned an empty DataFrame.)
If your column is named '0', it looks like this:
indexer=df['0'].str.match('\d{1,2}\/\d{1,2}\/\d{4}$')
df[indexer]
If you want to select all rows which contain the pattern in any of the string columns, you can do:
indexer = (
    df.select_dtypes('O')              # select all object (string) columns
      .apply(lambda ser: ser.str.match('\d{1,2}\/\d{1,2}\/\d{4}$'),
             axis='index')             # True where the value in that column matches the pattern
      .any(axis='columns')             # True if any of the columns matched for that row
)
df[indexer]
Note, though, that this only works if all your object columns actually store strings. That is usually the case if you let pandas infer the column types itself when the dataframe is created. If it is not, you need to add a type check to avoid a runtime error:
import re

def filter_dates(ser):
    date_re = re.compile('\d{1,2}\/\d{1,2}\/\d{4}$')
    return ser.map(lambda val: type(val) == str and bool(date_re.match(val)))

df.select_dtypes('O').apply(filter_dates, axis='index').any(axis='columns')
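A quick usage sketch with assumed data, to show the mixed-type case the type check guards against (it reuses filter_dates from above):
import pandas as pd

# Assumed sample: one object column contains a non-string value (None)
sample = pd.DataFrame({'0': ['1/2/2020', 'not a date', None],
                       '1': ['x', '3/14/2021', 'y']})

mask = sample.select_dtypes('O').apply(filter_dates, axis='index').any(axis='columns')
print(sample[mask])  # keeps only the rows where some string value matches the date pattern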

Pass the column names from a list

I have a list of column names that varies every time. I need to pass the column names from the list (in the example below they are id and programid) to the when clause and check whether both columns hold null values. Please help me with the solution.
Pyspark Code:
ColumnList = ['id','programid']
joinSrcTgt.withColumn(
    'action',
    when(joinSrcTgt.id.isNull() & joinSrcTgt.prgmid.isNull(), 'insert')
)
You can use a list comprehension to check if each column is null:
[col(c).isNull() for c in ColumnList]
Then you can use functools.reduce to bitwise-and (&) these together:
from functools import reduce
from pyspark.sql.functions import col, when
ColumnList = ['id','programid']
joinSrcTgt.withColumn(
    'action',
    when(
        reduce(lambda a, b: a & b, [col(c).isNull() for c in ColumnList]),
        'insert'
    )
)
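Equivalently, operator.and_ can replace the lambda; this is a sketch against the same assumed joinSrcTgt dataframe:
from functools import reduce
from operator import and_
from pyspark.sql.functions import col, when

# Sketch only: joinSrcTgt and ColumnList are assumed to be defined as above
null_check = reduce(and_, [col(c).isNull() for c in ColumnList])
result = joinSrcTgt.withColumn('action', when(null_check, 'insert'))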

Assign values to a datetime column in Pandas / Rename a datetime column to a date column

Dataframe image
I have created the following dataframe 'user_char' in Pandas with:
## Create a new workbook User Char with empty datetime columns to import data from the ledger
user_char = all_users[['createdAt', 'uuid','gasType','role']]
## filter on consumers in the user_char table
user_char = user_char[user_char.role == 'CONSUMER']
user_char.set_index('uuid', inplace = True)
## creates datetime columns that need to be added to the existing df
user_char_rng = pd.date_range('3/1/2016', periods = 25, dtype = 'period[M]', freq = 'MS')
## converts date time index to a list
user_char_rng = list(user_char_rng)
## adds empty cols
user_char = user_char.reindex(columns = user_char.columns.tolist() + user_char_rng)
user_char
and I am trying to assign a value to the highlighted column using the following command:
user_char['2016-03-01 00:00:00'] = 1
but this keeps creating a new column rather than editing the existing one. How do I assign the value 1 to all the indices without adding a new column?
Also, how do I rename the datetime columns so that the timestamp is dropped and only the date part is kept?
Try
user_char.loc[:, '2016-03-01'] = 1
Because your column index is a DatetimeIndex, pandas is smart enough to translate the string '2016-03-01' into datetime format. Using loc[c] seems to hint to pandas to first look for c in the index, rather than create a new column named c.
Side note: the DatetimeIndex of time-series data is conventionally used as the (row) index of a DataFrame, not in the columns. (There's no technical reason why you can't use time in the columns, of course!) In my experience, most of the PyData stack is built to expect "tidy data", where each variable (like time) forms a column, and each observation (timestamp value) forms a row. The way you're doing it, you'll need to transpose your DataFrame before calling plot() on it, for example.
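For the second part of the question (dropping the timestamp from the column labels), one hedged option, assuming the added labels are pandas Timestamps as produced by date_range above, is to rebuild the labels from their .date() part:
import pandas as pd

# Sketch only: convert Timestamp column labels to plain dates, leave other labels alone
user_char.columns = [c.date() if isinstance(c, pd.Timestamp) else c
                     for c in user_char.columns]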
