About lists in python - python-3.x

I have an Excel file with a column whose rows hold dates in this format: 25/02/2016. I want to save all of these dates in a list, with each row as a separate item. How do I do this? So far this is my code:
import openpyxl
wb = openpyxl.load_workbook('LOTERIAREAL.xlsx')
sheet = wb.get_active_sheet()
rowsnum = sheet.get_highest_row()
wholeNum = []
for n in range(1, rowsnum):
    wholeNum = sheet.cell(row=n, column=1).value
print(wholeNum[0])
When I use the print statement, instead of printing the value of the first row, which should be the first item in the list (e.g. 25/02/2016), it prints the first character of the row, which is the number 2. Apparently it is slicing through the date. I want the first row and subsequent rows saved as separate items in the list. What am I doing wrong? Thanks in advance.

wholeNum = sheet.cell(row=n, column=1).value assigns the value of the cell to the variable wholeNum, so you never add anything to the initial empty list; you just overwrite the value each time. When you evaluate wholeNum[0] at the end, wholeNum is the last string that was read, and you get its first character.
You probably want wholeNum.append(sheet.cell(row=n, column=1).value) to accumulate a list.

wholeNum =
This is an assignment. It makes the name wholeNum refer to whatever object the expression to the right of the = operator evaluates to.
for ...:
    wholeNum = ...
Performing assignment in a loop is frequently not useful. The name wholeNum will refer to whatever value was assigned to it in the last iteration of the loop. The other iterations have no discernible effect.
To append values to a list, use the .append() method.
for ...:
    wholeNum.append( ... )
print( wholeNum )
print( wholeNum[0] )
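To make the difference concrete, here is a minimal sketch that contrasts the two loops. The list cell_values is a hypothetical stand-in for the values read from the sheet, so the example runs without the Excel file.

```python
# Hypothetical stand-in for the values read from column 1 of the sheet.
cell_values = ["25/02/2016", "27/02/2016", "01/03/2016"]

# Assignment: the name is rebound on every iteration,
# so only the last string survives and the list is lost.
last_only = []
for value in cell_values:
    last_only = value

# Append: every value is added to the same list object.
dates = []
for value in cell_values:
    dates.append(value)

print(last_only[0])  # first character of the last string
print(dates[0])      # first item of the list
```

Separately from the assignment issue, note that range(1, rowsnum) stops at rowsnum - 1, so the original loop would likely skip the last row; range(1, rowsnum + 1) covers every row.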

Related

How to find String from multiple columns

I am trying to find a string from the first column in the other columns of the dataset. Each column of the dataset contains names. Below is my dataset.
I am trying the following code, but it doesn't find anything.
P_Name = new_data['P_Name_1']
for i in P_Name:
    new_data['new1'] = (new_data.iloc[:,1:].values == i).any(0)
new_data
If there is any similar name found (even just the first name or last name), it should appear in the new column.
First of all, be aware that every iteration of your loop will overwrite the entry new_data['new1'].
Also, x.any() returns a Boolean result that is True if any element of x is True (or 1), so what your code does is assign Boolean values to the column new_data['new1'].
I believe it would be easier for people to help if you can specify your problem more explicitly, for example, what's your desired output and what should the loop do?

Looping through a pandas dataframe

My variable noExperience1 is a dataframe
I am trying to go through this loop:
num = 0
for row in noExperience1:
    if noExperience1[row+1] - noExperience1[row] > num:
        num = noExperience1[row+1] - noExperience1[row]
print(num)
My goal is to find the biggest difference in y values from one x value to the next. But I get the error that the line of my if statement needs to be a string and not an integer. How do I fix this so I can have a number?
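You can't access a row of a dataframe with plain [] indexing: noExperience1[...] selects a column by label, and iterating over a dataframe yields its column labels, which is why row+1 fails with a string/integer error. Use loc or iloc to access rows. Here is a version that solves the problem as stated:
import pandas as pd

noExperience1 = pd.read_csv("../input/data.csv")  # read the CSV file
num = 0
for row in range(1, len(noExperience1)):  # row labels, starting at the second row
    # Difference between this row and the previous one.
    # int() here assumes the dataframe has a single numeric column.
    diff = int(noExperience1.loc[row] - noExperience1.loc[row - 1])
    if diff > num:
        num = diff
print(num)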
Note:
1. Column selection: DataFrame[ColName] gives you all entries of the specified column.
2. Row selection: DataFrame.loc[RowLabel] gives you the complete row with the specified label. With the default index, row labels start at 0.
Hope this helps.
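As an alternative to the explicit loop, the same "largest step from one row to the next" can be sketched with Series.diff(). The data and the column names "x" and "y" below are assumptions for illustration, since the original dataframe is not shown.

```python
import pandas as pd

# Hypothetical data; the column names "x" and "y" are assumptions.
noExperience1 = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                              "y": [10, 13, 11, 20, 21]})

# diff() subtracts the previous row from each row.
# The first entry has no predecessor, so it is NaN, and max() skips NaN.
steps = noExperience1["y"].diff()
biggest_increase = steps.max()

# If the direction of the change does not matter, compare absolute values.
biggest_change = steps.abs().max()

print(biggest_increase, biggest_change)
```

Unlike the loop, which starts from num = 0 and so never reports a negative result, max() here returns the largest difference even when every step is a decrease.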

Use writerow to put a list of elements in the same row

Here is the source code I have. xs is a list of time values (floats), and I am trying to put all of the elements of the list in the same row.
outFile=open('testing.csv','w',newline='')
writeFile=csv.writer(outFile)
writeFile.writerow(['cell'])
writeFile.writerow(['LevelLine for Impedance Magnitude:',baseline])
writeFile.writerow(['TimeMag (second):',xs])
First I tried using a for loop to write the numbers, such as:
for i in range(len(xs)-1):
    writeFile.writerow('TimeMag(second):',xs[i])
However, this code puts each xs[i] on a different row. I watched a video and checked the documentation of the csv writerow function, but I couldn't find anything that puts a list of elements on the same row.
I want my output like this:
output:
row 1: Cell
row 2: TimeMag(second): xs[0] xs[1] xs[2] and so on.
Please help, thank you!
You're close. The object you pass to writerow() should be a flat iterable containing the items you want to write to the row. Your attempt:
writeFile.writerow(['TimeMag (second):',xs])
...isn't flat, because it contains two elements, one of which (xs) is itself a list. Instead, you basically want to prepend your string ('TimeMag (second):') to the xs list. This should do what you're expecting:
writeFile.writerow(['TimeMag (second):'] + xs)
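A small self-contained sketch of that fix follows. The values of baseline and xs are made up for illustration, and an in-memory buffer replaces 'testing.csv' so nothing is written to disk.

```python
import csv
import io

# Hypothetical sample data standing in for the asker's values.
baseline = 0.5
xs = [0.1, 0.25, 0.4]

# In-memory buffer used instead of 'testing.csv' (an assumption for this sketch).
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["cell"])
writer.writerow(["LevelLine for Impedance Magnitude:", baseline])
# Concatenating the label with xs gives one flat list,
# so each number becomes its own cell in the same row.
writer.writerow(["TimeMag (second):"] + xs)

rows = buffer.getvalue().splitlines()
print(rows[2])
```

The + concatenation needs xs to be a list. If xs were a tuple or another iterable, ['TimeMag (second):', *xs] or ['TimeMag (second):'] + list(xs) should give the same flat row.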

Openpyxl to check for keywords, then modify adjacent cells to contain those keywords and total found

I'm using python 3.x and openpyxl to parse an excel .xlsx file.
For each row, I check a column (C) to see if any of those keywords match.
If so, I add them to a separate list variable and also determine how many keywords were matched.
I then want to add the actual keywords into the next cell, and the total of keywords into the cell after. This is where I am having trouble, actually writing the results.
contents of the keywords.txt and results.xlsx file
here
import openpyxl
# Here I read a keywords.txt file and input them into a keywords variable
# I throw away the first line to prevent a mismatch due to the unicode BOM
with open("keywords.txt") as f:
    f.readline()
    keywords = [line.rstrip("\n") for line in f]
# Load the workbook
wb = openpyxl.load_workbook("results.xlsx")
ws = wb.get_sheet_by_name("Sheet")
# Iterate through every row, only looking in column C for the keyword match.
for row in ws.iter_rows("C{}:E{}".format(ws.min_row, ws.max_row)):
    # if there's a match, add to the keywords_found list
    keywords_found = [key for key in keywords if key in row[0].value]
    # if any keywords found, enter the keywords in column D
    # and how many keywords into column E
    if len(keywords_found):
        row[1].value = keywords_found
        row[2].value = len(keywords_found)
Now, I understand where I'm going wrong, in that ws.iter_rows(..) returns a tuple, which can't be modified. I figure I could use two for loops, one for each row and another for the columns in each row, but this test is a small example of a real-world scenario, where the number of rows is in the tens of thousands.
I'm not quite sure what the best way to go about this is. Thank you in advance for any help that you can provide.
Use the ws['C'] and then the offset() method of the relevant cell.
Thanks Charlie for the offset() tip. I modified the code slightly and now it works a treat.
for row in ws.iter_rows("C{}:C{}"...)
    for cell in row:
        ....
        if len(keywords_found):
            cell.offset(0,1).value = str(keywords_found)
            cell.offset(0,2).value = str(len(keywords_found))
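For reference, here is a runnable sketch of the ws['C'] plus offset() approach. It builds a small in-memory workbook rather than loading results.xlsx, and the keywords and row contents are invented for illustration.

```python
from openpyxl import Workbook

# Hypothetical keywords and sheet contents; the real ones would come
# from keywords.txt and results.xlsx.
keywords = ["apple", "pear"]
wb = Workbook()
ws = wb.active
ws.append(["id1", "x", "apple and pear pie"])
ws.append(["id2", "y", "plain bread"])

# ws["C"] yields the cells of column C, one per row.
for cell in ws["C"]:
    text = cell.value or ""  # guard against empty cells
    keywords_found = [key for key in keywords if key in text]
    if keywords_found:
        # A cell cannot hold a list, so the keywords are joined into one string.
        cell.offset(row=0, column=1).value = ", ".join(keywords_found)
        cell.offset(row=0, column=2).value = len(keywords_found)

print(ws["D1"].value, ws["E1"].value)
```

Joining with ", " instead of calling str() on the list is a design choice made here to keep the cell text readable. With a workbook loaded from disk, the changes would still need to be written back with wb.save(...).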

pandas iterate rows and then break until condition

I have a column that's unorganized like this;
Name
Jack
James
Riddick
Random value
Another random value
What I'm trying to do is get only the names from this column, but I'm struggling to find a way to differentiate real names from random values. Fortunately the names are all together, and the random values are all together as well. The only thing I can do is iterate over the rows until I get to 'Random value' and then break off.
I've tried using lambdas for this, but with no success, as I don't think there's a way to break. And I'm not sure how a comprehension could work in this case.
Here's the example I've been trying to play with;
df['Name'] = df['Name'].map(lambda x: True if x != 'Random value' else break)
But the above doesn't work. Any suggestions on what could work based on what I'm trying to achieve? Thanks.
Find index of row containing 'Random value':
index_split = df[df.Name == 'Random value'].index.values[0]
Save the rows of random values for later use if you want:
random_values = df.iloc[index_split:]
Remove random values from the Names column:
df = df[0:index_split]
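Put together, the approach can be sketched as follows. The dataframe reproduces the example column from the question, and the split is done by position so it does not depend on the index labels; it assumes the marker 'Random value' is present.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jack", "James", "Riddick",
                            "Random value", "Another random value"]})

# Position of the first row equal to the marker.
# Assumption: the marker exists; if it did not, argmax() would return 0.
is_marker = (df["Name"] == "Random value").to_numpy()
split_pos = int(is_marker.argmax())

names = df.iloc[:split_pos]          # rows before the marker
random_values = df.iloc[split_pos:]  # the marker row and everything after it

print(names["Name"].tolist())
```

Slicing with iloc avoids the need for a row-by-row loop with break, which is what the lambda in the question was trying to emulate.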
