I need an integer but it's a string with a comma - python-3.x

I'm using sqlite3 and trying to get the oid by using the title of the row and then trying to use that oid to update a column in my table.
allOID[0] is a tuple; when I print its type and then allOID itself, I get this:
>>> <class 'tuple'>
>>> [(1,)]
I'm trying to get the integer out of this tuple but the comma is throwing it off and I can't seem to get it.
Here is all of the code being used currently:
c.execute("""SELECT oid FROM books
WHERE title = :title""",
{
'title': title
})
allOID = c.fetchall()
print(type(allOID[0]))
print(allOID)
c.execute("SELECT * FROM books")
c.execute("""UPDATE books SET
rented = :rented
WHERE oid = :oid""",
{
'rented': rentedVar,
'oid': allOID[0]
})
Any help and comments are greatly appreciated!

The comma just indicates that it is a tuple with a single element.
Access it using allOID[0][0].
allOID[0] gets you the tuple out of the list of results; going one level further with allOID[0][0] gets you the first element of that tuple.
For more info, see the docs:
Empty tuples are constructed by an empty pair of parentheses; a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses). Ugly, but effective.
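As a minimal sketch of the fix in context, the snippet below uses an in-memory database and fetchone() instead of fetchall(), so there is only one level of indexing. The table layout, the sample title "Dune", and the value of rentedVar are assumptions made up for illustration; only the books table name and the two queries come from the question.

```python
import sqlite3

# Assumed setup: an in-memory database with a hypothetical books table.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE books (title TEXT, rented INTEGER)")
c.execute("INSERT INTO books (title, rented) VALUES (?, ?)", ("Dune", 0))

title = "Dune"   # hypothetical sample value
rentedVar = 1    # hypothetical sample value

c.execute("SELECT oid FROM books WHERE title = :title", {"title": title})
row = c.fetchone()  # a single row as a tuple such as (1,), or None if no match

if row is not None:
    oid = row[0]  # unpack the integer from the one-element tuple
    c.execute(
        "UPDATE books SET rented = :rented WHERE oid = :oid",
        {"rented": rentedVar, "oid": oid},
    )
    conn.commit()

# Read the value back to confirm the update reached the intended row.
rented_now = c.execute(
    "SELECT rented FROM books WHERE title = :title", {"title": title}
).fetchone()[0]
```

Checking for None before indexing avoids a TypeError when no book has the given title.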

Related

Python: (partial) matching elements of a list to DataFrame columns, returning entry of a different column

I am a beginner in python and have encountered the following problem: I have a long list of strings (I took 3 now for the example):
ENSEMBL_IDs = ['ENSG00000040608',
'ENSG00000070371',
'ENSG00000070413']
which are partial matches of the data in column 0 of my DataFrame genes_df (first 3 entries shown):
genes_list = (['ENSG00000040608.28', 'RTN4R'],
['ENSG00000070371.91', 'CLTCL1'],
['ENSG00000070413.17', 'DGCR2'])
genes_df = pd.DataFrame(genes_list)
The task I want to perform is conceptually not that difficult: I want to compare each element of ENSEMBL_IDs to genes_df.iloc[:,0]. These are partial matches: each element of ENSEMBL_IDs is contained within an entry of column 0 of genes_df, as outlined above. If an element of ENSEMBL_IDs matches an entry in genes_df.iloc[:,0] (which it does, apart from the extra numbers after the period, ".XX"), I want to return the corresponding value stored in column 1 of the genes_df DataFrame: the actual gene name, 'RTN4R' as an example.
I want to store these in a list. So, in the end, I would be left with a list like follows:
`genenames = ['RTN4R', 'CLTCL1', 'DGCR2']`
Some info that might be helpful: all of the entries in ENSEMBL_IDs are unique, and all of them are for sure contained in column 0 of genes_df.
I think I am looking for something along the lines of:
`genenames = []
for i in ENSEMBL_IDs:
    if i in genes_df.iloc[:,0]:
        genenames.append(# corresponding value in genes_df.iloc[:,1])`
I am sorry if the question has been asked before; I kept looking and was not able to find a solution that was applicable to my problem.
Thank you for your help!
Thanks also for the edit, English is not my first language, so the improvements were insightful.
You can get rid of the part after the dot (with str.extract or str.replace) before matching the values with isin:
m = genes_df[0].str.extract('([^.]+)', expand=False).isin(ENSEMBL_IDs)
# or
m = genes_df[0].str.replace(r'\..*$', '', regex=True).isin(ENSEMBL_IDs)
out = genes_df.loc[m, 1].tolist()
Or use a regex with str.match:
pattern = '|'.join(ENSEMBL_IDs)
m = genes_df[0].str.match(pattern)
out = genes_df.loc[m, 1].tolist()
Output: ['RTN4R', 'CLTCL1', 'DGCR2']
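Note that the isin and str.match approaches return the names in the row order of genes_df, not in the order of ENSEMBL_IDs. If the order of ENSEMBL_IDs matters, one possible alternative is a plain dictionary lookup. The sketch below works on the genes_list pairs from the question rather than on the DataFrame; the name name_by_id is made up for illustration.

```python
ENSEMBL_IDs = ['ENSG00000040608',
               'ENSG00000070371',
               'ENSG00000070413']

genes_list = [['ENSG00000040608.28', 'RTN4R'],
              ['ENSG00000070371.91', 'CLTCL1'],
              ['ENSG00000070413.17', 'DGCR2']]

# Map each ID, with its ".XX" version suffix stripped, to the gene name.
# With a DataFrame, the same pairs could come from zip(genes_df[0], genes_df[1]).
name_by_id = {versioned.split('.', 1)[0]: name for versioned, name in genes_list}

# Look the IDs up in their original order; a missing ID would raise KeyError.
genenames = [name_by_id[ensembl_id] for ensembl_id in ENSEMBL_IDs]
print(genenames)
```

This relies on the question's guarantee that every ID is unique and present in column 0.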

Returning a Pandas DataFrame Index as a String

I want to return the index of my DataFrame as a string. I am using this command: p_h = peak_hour_df.index.astype(str).str.zfill(4)
It is not working; I am getting this result: Index(['1645'], dtype='object', name
I need it to return the string '1645'. How do I accomplish this?
In short:
Do p_h = list(peak_hour_df.index.astype(str).str.zfill(4)). This returns a list, which you can then index.
In more detail:
When you do peak_hour_df.index.astype(str), as you see, the dtype is already object (string), so that job is done. Note that this is the type of the contents, not of the object itself. I am also leaving out .str.zfill(4), since it is an extra step that does not change the nature of the problem or the returned type.
The type of the whole object you are returning is pandas.core.indexes.base.Index. You can check this like so: type(peak_hour_df.index.astype(str)). If you want to return a single value from it as a str (e.g. the first value), you can either index the pandas object directly like so:
peak_hour_df.index.astype(str)[0]
or (as I show above) you can convert it to a list and then index that list (for some reason, most people find this more intuitive):
peak_hour_df.index.astype(str).to_list()[0]
list(peak_hour_df.index.astype(str))[0]

How to check if there are two identical strings in a list

I'm making a game of hangman. I use a list to keep track of the word that you are guessing for, and a list of blanks that you fill in. But I can't figure out what to do if for example someone's word was apple, and I guessed p.
My immediate thought was to find out whether a letter is in the word twice, then figure out where it is, and when they guess that letter, put it in both the first and second spot where that letter occurs. But I can't figure out:
How to test if two STRINGS are duplicates in a list, and
If I were to use list.index to test where the duplicate letters are, how do I find both positions instead of just one?
Create a string for your word
Create a string for user input
Cut your string into letters and keep it on a list/array
Get input
Cut input into letters and keep it on another array
Create a string = "--------" as displayed message
Using a for loop check every position in both array lists and compare them
If yourArray[i] == inputArray[i]
Then change displayedString[i] = inputArray[i], display the message, and get another input
If it doesn't match, leave the "-" signs
Display the "---a--b" string
One way to do it would be to go through the list one by one and check if something comes up twice.
def isDuplicate(myList):
    a = []  # strings seen so far
    index = 0
    for item in myList:
        if type(item) == str:
            if item in a:
                return index  # position of the first repeated string
            else:
                a.append(item)
        index += 1
    return False
This function goes through the list and adds what it has seen so far into another list. Each time it also checks if the item it is looking at is already in that list, meaning it has already been seen before. If it gets through the whole list without any duplicates, it returns False.
It also keeps track of the index it is on, so it can return that index if it does find a duplicate.
Alternatively, if you want to find multiple occurrences of a given string, you would use the same structure with some modifications.
def isDuplicate(myList, query):
    index = 0
    foundIndexes = []
    for item in myList:
        if item == query:
            foundIndexes.append(index)
        index += 1
    return foundIndexes
This would return a list of the indexes of all instances of query in myList.
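Applied to the hangman case, the same idea can be written with enumerate, which yields each position together with its letter. This is only a sketch: the function name reveal and the "_" placeholder are assumptions, not part of the original question.

```python
def reveal(word, guess, display):
    """Return a copy of display with every occurrence of guess filled in."""
    return [letter if letter == guess else shown
            for letter, shown in zip(word, display)]

word = list("apple")
display = ["_"] * len(word)

# All positions of the guessed letter, not just the first one that list.index finds.
positions = [i for i, letter in enumerate(word) if letter == "p"]

display = reveal(word, "p", display)
print(positions)
print(" ".join(display))
```

Because every position is compared against the guess, repeated letters need no special handling.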

Convert sql.GroovyRowResult with two columns to map

I have a groovy GroovyRowResult (using sql.rows) with two columns. The first column has a group by in the sql, so it will always be unique.
Now I would like to convert the whole result into a map, which contains the first column value as the key and the second column value as the value. I know that I can do this with each for every single row (see below), but was wondering if there is a better approach. The only thing I've found is entrySet(), which did not work. The error is:
Message: No signature of method:
java.util.ArrayList.entrySet() is applicable for argument types: () values: [])
So my each row way would be:
def myMap = [:]
result.each {
    myMap.put(it.column1, it.column2)
}
Another alternative would be to write the query in a way that returns a map with the first column as the key and the second as the value (or the whole row as the value), similar to Zend Framework's fetchAssoc() method.
You could do:
def myMap = result.collectEntries {
[ it.column1, it.column2 ]
}

Access list element using get()

I'm trying to use get() to access a list element in R, but am getting an error.
example.list <- list()
example.list$attribute <- c("test")
get("example.list") # Works just fine
get("example.list$attribute") # breaks
## Error in get("example.list$attribute") :
## object 'example.list$attribute' not found
Any tips? I am looping over a vector of strings which identify the list names, and this would be really useful.
Here's the incantation that you are probably looking for:
get("attribute", example.list)
# [1] "test"
Or perhaps, for your situation, this:
get("attribute", eval(as.symbol("example.list")))
# [1] "test"
# Applied to your situation, as I understand it...
example.list2 <- example.list
listNames <- c("example.list", "example.list2")
sapply(listNames, function(X) get("attribute", eval(as.symbol(X))))
# example.list example.list2
# "test" "test"
Why not simply:
example.list <- list(attribute="test")
listName <- "example.list"
get(listName)$attribute
# or, if both the list name and the element name are given as arguments:
elementName <- "attribute"
get(listName)[[elementName]]
If your strings contain more than just object names, e.g. operators like here, you can evaluate them as expressions as follows:
> string <- "example.list$attribute"
> eval(parse(text = string))
[1] "test"
If your strings are all of the type "object$attribute", you could also parse them into object/attribute, so you can still get the object, then extract the attribute with [[:
> parsed <- unlist(strsplit(string, "\\$"))
> get(parsed[1])[[parsed[2]]]
[1] "test"
flodel's answer worked for my application, so I'm gonna post what I built on it, even though this is pretty uninspired. You can access each list element with a for loop, like so:
#============== List with five elements of non-uniform length ================#
example.list=
list(letters[1:5], letters[6:10], letters[11:15], letters[16:20], letters[21:26])
#===============================================================================#
#====== for loop that names and concatenates each consecutive element ========#
derp=c(); for(i in 1:length(example.list))
{derp=append(derp,eval(parse(text=example.list[i])))}
derp #Not a particularly useful application here, but it proves the point.
I'm using code like this for a function that calls certain sets of columns from a data frame by the column names. The user enters a list with elements that each represent different sets of column names (each set is a group of items belonging to one measure), and the big data frame containing all those columns. The for loop applies each consecutive list element as the set of column names for an internal function* applied only to the currently named set of columns of the big data frame. It then populates one column per loop of a matrix with the output for the subset of the big data frame that corresponds to the names in the element of the list corresponding to that loop's number. After the for loop, the function ends by outputting that matrix it produced.
Not sure if you're looking to do something similar with your list elements, but I'm happy I picked up this trick. Thanks to everyone for the ideas!
"Second example" / tangential info regarding application in graded response model factor scoring:
Here's the function I described above, just in case anyone wants to calculate graded response model factor scores* in large batches. Each column of the output matrix corresponds to an element of the list (i.e., a latent trait with ordinal indicator items specified by column name in the list element), and the rows correspond to the rows of the data frame used as input. Each row should presumably contain mutually dependent observations, as from a given individual, to whom the factor scores in the same row of the output matrix belong. Also, I feel I should add that if all the items in a given list element use the exact same Likert scale rating options, the graded response model may be less appropriate for factor scoring than a rating scale model (cf. http://www.rasch.org/rmt/rmt143k.htm).
'grmscores'=function(ColumnNameList,DataFrame) {require(ltm) #(Rizopoulos,2006)
x = matrix ( NA , nrow = nrow ( DataFrame ), ncol = length ( ColumnNameList ))
for(i in 1:length(ColumnNameList)) #flodel's magic featured below!#
{x[,i]=factor.scores(grm(DataFrame[, eval(parse(text= ColumnNameList[i]))]),
resp.patterns=DataFrame[,eval(parse(text= ColumnNameList[i]))])$score.dat$z1}; x}
Reference
*Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses, Journal of Statistical Software, 17(5), 1-25. URL: http://www.jstatsoft.org/v17/i05/
