How to perform a check with a CSV file - Groovy

I want to know if there is a better way than iterating through a CSV file when performing a check. Essentially, I am using SoapUI (free version) to test a web service based on a search.
What I want to do is look at the response from a particular search request (the SOAP request step is named 'Search Request') and find all instances of the test ID that appear between the <TestID> XML tags, within both <IFInformation> and <OFInformation> (this will be in a Groovy script step).
import groovy.xml.XmlUtil

def groovyUtils = new com.eviware.soapui.support.GroovyUtils(context)
def response = messageExchange.response.responseContent
def xml = new XmlParser().parseText(response)

// collect every <TestID> under <IFInformation> and <OFInformation>
def ifIds = xml.'soap:Body'.IFInformation.TestID*.text()
def ofIds = xml.'soap:Body'.OFInformation.TestID*.text()
Now, for each instance of the ID found (the 'DepartureAirportId'), I want to check whether that ID is in a CSV file. The CSV file (let's call it Search.csv) has two columns, and both columns contain many rows. If the ID is found in any row of the first column, add 1 to the count for the variable 'Test1'; else, if it is found in the second column, add 1 to the count for 'Test2'. If it is not found in either column, add 1 to the count for 'NotFound'.
I don't know whether iterating through the CSV directly is the best approach, or whether I should read all the data from the CSV into a list and iterate over that, but I would like to know how this can be done, and the best way, for my own learning.
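A minimal sketch of that counting logic, assuming the ifIds/ofIds lists collected above, a comma-separated Search.csv, and SoapUI's log object (names are taken from the question and may differ in your project):
// load both CSV columns into sets for fast lookup
def col1 = [] as Set
def col2 = [] as Set
new File('Search.csv').splitEachLine(',') { fields ->
    if (fields.size() > 0) col1 << fields[0].trim()
    if (fields.size() > 1) col2 << fields[1].trim()
}

// classify every ID found in the response
int test1 = 0, test2 = 0, notFound = 0
(ifIds + ofIds).each { id ->
    if (col1.contains(id)) test1++
    else if (col2.contains(id)) test2++
    else notFound++
}
log.info "Test1=$test1, Test2=$test2, NotFound=$notFound"
Reading the file once into sets makes each lookup O(1), which also answers the iterate-vs-load question: load once, then check in memory.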

I don't know about your algorithm, but the easiest way to iterate through a simple CSV file in Groovy is line by line, splitting each line on the separator:
new File("/1.csv").splitEachLine(",") { line ->
    println " ${ line[0] } ${ line[1] } "
}
http://docs.groovy-lang.org/latest/html/groovy-jdk/java/io/File.html#splitEachLine(java.lang.String,%20groovy.lang.Closure)

You might want to use CSV Validator.
Format.of(String regex)
It should do the trick - just provide the literal you're looking for as the rule for the first column and check whether or not it throws an exception.

Related

JMeter: Get list of column headers and their index

I am looking for a solution to get a list of column headers and their indexes.
Task: pass the column headers from a CSV file, together with their indexes, to a variable for further use in an HTTP request. The idea is to create a mapping that indicates which column to use for a given header, as in the following example:
name  | position | company
John  | manager  | Alphabet
Smith | intern   | JP Morgan
So the request would contain:
{"name":0},{"position":1},{"company":2}
I use multiple files where the number of columns can be anything from 3 to 50 (or more), so there is no maximum size. I thought about the following approach:
Read all headers and split them by comma into a list / collection
Loop through them and use the index of each list item as the index I need:
Headers: | header_1 | header_2 | ... | header_n |
Request: {"header_1":0},{"header_2":1}...{"header_n":n-1}
Question: How do I iterate through all the columns when the size of the file is unknown?
I found this answer where the OP has a fixed number of columns, and the solution uses every value for a separate request, but I have to send only one request with the list of headers and indexes.
P.S. I am new to JMeter and Groovy, so I didn't have enough time to do full-scale research. An answer that also explains how to pass this variable to the request would be appreciated.
There are no "headers" or "columns" in CSV files; there is a first line, and entries delimited by commas.
So if your file looks like:
name,position,company
John,manager,Alphabet
Smith,intern,JP Morgan
You can generate the desired request body like this:
def payload = []
// read only the first line of the file (the header row)
def firstLine = new File('test.csv').readLines().get(0)
def entries = firstLine.split(',').size() - 1
0.upto(entries, { index ->
    // one {"header_n": index} entry per column
    payload.add([('header_' + (index + 1)): index])
})
vars.put('payload', new groovy.json.JsonBuilder(payload).toString())
and refer to the generated value as ${payload} where required.
More information:
Reading a File in Groovy
Apache Groovy - Parsing and producing JSON
Apache Groovy - Why and How You Should Use It
You can use Groovy to get the names of column headers.
File file = new File("<path-to-file-name>.csv")
String header_1, header_2, header_3
file.withReader { reader ->
    def lstColumns = reader.readLine().split(",")
    header_1 = lstColumns[0]
    header_2 = lstColumns[1]
    header_3 = lstColumns[2]
}
I need to get the value for a given index so the request would be like {"name":0},{"position":1},{"company":2}, i.e. with the real header names, not just "header_n"; with your heads-up and those links, I came up with the following:
def payload = []
def firstLine = new File('test.csv').readLines().get(0)
def entries = firstLine.split(',')
entries.eachWithIndex { value, index ->
    log.info("'${value}': ${index}")
    // add a map, not a string, so JsonBuilder emits {"name":0} instead of a quoted string
    payload.add([(value.trim()): index])
}
vars.put('payload', new groovy.json.JsonBuilder(payload).toString())
//log.info('payload : ' + new groovy.json.JsonBuilder(payload).toString())
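With the map entries above, ${payload} becomes [{"name":0},{"position":1},{"company":2}] for the sample file, which is exactly the requested format.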

Add column and values to CSV or Dataframe

Brand new to Python and programming. I have a function that extracts a file creation date from .csv files (the date is included in the file naming convention):
def get_filename_dates(self):
    """Extract date from filename and place it into a list"""
    for filename in self.file_list:
        try:
            date = re.search("([0-9]{2}[0-9]{2}[0-9]{2})",
                             filename).group(0)
            self.file_dates.append(date)
            self.file_dates.sort()
        except AttributeError:
            print("The following files have naming issues that prevented "
                  "date extraction:")
            print(f"\t{filename}")
    return self.file_dates
The data within these files are brought into a DataFrame:
def create_df(self):
    """Create DataFrame from list of files"""
    for i in range(0, len(self.file_dates)):
        self.agg_data = pd.read_csv(self.file_list[i])
        self.agg_data.insert(9, 'trade_date', self.file_dates[i],
                             allow_duplicates=False)
    return self.agg_data
As each file in file_list is processed, I need to insert its corresponding date into a new column (trade_date).
As written here, the value of the last index in the list returned by get_filename_dates() is duplicated into every row of the trade_date column -- presumably because each pass through the loop overwrites self.agg_data, so only the last file (and its date) survives.
My questions:
Is there an advantage to inserting data into the csv file using with open() vs. trying to match each file and corresponding date while iterating through files to create the DataFrame?
If there is no advantage to with open(), is there a different Pandas method that would allow me to manipulate the data as the DataFrame is created? In addition to the data insertion, there's other clean-up that I need to do. As it stands, I wrote a separate function for the clean-up; it's not complex and would be great to run everything in this one function, if possible.
Hope this makes sense -- thank you
You could grab each csv as an intermediate dataframe, do whatever cleaning you need to do, and use pd.concat() to concatenate them all together as you go. Something like this:
def create_df(self):
    """Create DataFrame from list of files"""
    self.agg_data = pd.DataFrame()
    for i, date in enumerate(self.file_dates):
        df_part = pd.read_csv(self.file_list[i])
        df_part['trade_date'] = date
        # --- Any other individual file level cleanup here ---
        self.agg_data = pd.concat([self.agg_data, df_part], axis=0)
    # --- Any aggregate-level cleanup here ---
    return self.agg_data
It makes sense to do as much of the preprocessing/cleanup as possible at the aggregate level.
I also took the liberty of converting the for-loop to use the more pythonic enumerate.
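One further sketch, under the same assumption that file_list and file_dates are index-aligned: calling pd.concat() inside the loop re-copies the accumulated frame on every iteration, so for many files it is usually cheaper to collect the parts in a list and concatenate once:
import pandas as pd

def create_df(self):
    """Create DataFrame from list of files, concatenating once at the end"""
    parts = []
    for path, date in zip(self.file_list, self.file_dates):
        df_part = pd.read_csv(path)
        df_part['trade_date'] = date  # per-file cleanup goes here
        parts.append(df_part)
    self.agg_data = pd.concat(parts, ignore_index=True)
    return self.agg_data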

Read data from csv into list of class objects - Python

I'm having trouble figuring this out. Basically I have a .csv file that has 7 employees with their first and last names, employee ID, dept #, and job title. My goal is for def readFile(employees) to accept an empty List (called employees), open the file for reading, and load all the employees from the file into a List of employee objects (employees). I already have my class built as:
class Employee:
    def __init__(self, fname, lname, eid, dept, title):
        self.__firstName = fname
        self.__lastName = lname
        self.__employeeID = int(eid)
        self.__department = int(dept)
        self.__title = title
I have a couple other class methods, but basically I don't quite understand how to properly load the file into a list of objects.
I was able to figure this out. I opened the file, read a line from it, stripped the \n, and split the data on commas. I used a while loop to keep reading lines, as long as the line wasn't empty, and appended each employee to my empty list. I also had to split the first indexed item, since the first and last name were together in one string and I needed them separate.
def readFile(employees):
    with open("employees.csv", "r") as f:
        line = f.readline().strip().split(",")
        while line != ['']:
            # the first field holds "First Last"; split it into two fields
            line = line[0].split(" ") + line[1:]
            employees.append(Employee(line[0], line[1], line[2], line[3], line[4]))
            line = f.readline().strip().split(",")
It most likely could be written better and more pythonic but it does what I need it to do.
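For reference, a slightly more pythonic sketch of the same loop, using the standard csv module (it assumes the same "First Last,eid,dept,title" layout as above):
import csv

def readFile(employees):
    with open("employees.csv", newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            first, last = row[0].split(" ", 1)  # "First Last" -> two fields
            employees.append(Employee(first, last, row[1], row[2], row[3]))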
Why don't you use pandas? You could define a pandas DataFrame of employees and use the row index to select each employee, and the name of each column to select a specific employee attribute.
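A minimal sketch of that idea (the column names here are assumptions based on the question):
import pandas as pd

# one row per employee, one column per attribute
df = pd.read_csv("employees.csv", header=None,
                 names=["name", "employee_id", "department", "title"])
print(df.loc[0, "title"])   # job title of the first employee
print(df["name"].tolist())  # all employee names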

Read from URL & Process data using list comprehension

I am new to Python and I am trying to read data from a URL. Basically I am reading historical stock data, getting the closing price, and saving the closing prices into a list. The closing price is at the 4th index (5th column) of each line. And I want to do all of this within a list comprehension.
Code snippet:
from urllib.request import urlopen
URL = "http://ichart.yahoo.com/table.csv?s=AAPL&a=3&b=1&c=2016&d=9&e=30&f=2016"
def downloadClosingPrice():
    urlHandler = urlopen(URL)
    next(urlHandler)  # skip the header line
    return [float(line.split(",")[4]) for line in urlHandler.read().decode("utf8").splitlines() if line]
closingPriceList = downloadClosingPrice()
The above code works just fine. I am able to read and fetch the required data. However, just out of curiosity: can the list comprehension be written in a simpler or easier way?
Thanks...
I tried out various ways, and this is how I could do the same thing using different forms of list comprehension:
return [float(line.decode("utf8").split(",")[4]) for line in urlHandler if line]
# return [float(line.decode("utf8").split(",")[4]) for line in urlHandler.readlines() if line]
# return [float(line.split(",")[4]) for line in urlHandler.read().decode("utf8").splitlines() if line]
The first one is better because it reads the response line by line, which saves memory. And of course it's simpler and easier to understand.
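As a further sketch (an addition, not part of the original answers): wrapping the response in a text stream lets the standard csv module do the splitting, which also copes with quoted fields:
import csv
import io
from urllib.request import urlopen

def downloadClosingPrice():
    with urlopen(URL) as urlHandler:
        text = io.TextIOWrapper(urlHandler, encoding="utf8")
        next(text)  # skip the header row
        return [float(row[4]) for row in csv.reader(text) if row]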

convert data string to list

I'm having some trouble processing some input.
I am reading data from a log file and storing the different values according to the name.
So my input string consists of an IP, a name, a time and a data value.
A log line looks like this, with \t spacing:
134.51.239.54 Steven 2015-01-01 06:09:01 5423
I'm reading in the values using this code:
loglines = file.splitlines()
data_fields = loglines[0]  # IP NAME DATE DATA
for logline in loglines[1:]:
    items = logline.split("\t")
    ip = items[0]
    name = items[1]
    date = items[2]
    data = items[3]
This works quite well, but I need to extract all the names into a list, and I haven't found a working solution.
When I use print name, I get:
Steven
Max
Paul
I do need a list of the names like this:
['Steven', 'Max', 'Paul',...]
There is probably a simple solution that I haven't figured out yet - can anybody help?
Thanks
Just create an empty list and add the names as you loop through the file.
Also note that if that file is very large, file.splitlines() is probably not the best idea, as it reads the entire file into memory -- and then you basically copy all of that by doing loglines[1:]. Better use the file object itself as an iterator. And don't use file as a variable name, as it shadows the type.
with open("some_file.log") as the_file:
    data_fields = next(the_file)  # consumes first line
    all_the_names = []  # this will hold the names
    for line in the_file:  # loops over the rest
        items = line.rstrip("\n").split("\t")  # strip the newline before splitting
        ip, name, date, data = items  # you can put all this in one line
        all_the_names.append(name)  # add the name to the list of names
Alternatively, you could use zip and map to put it all into one expression (using that loglines data), but you rather shouldn't do that... list(zip(*map(lambda s: s.split('\t'), loglines[1:])))[1] (in Python 3, zip returns an iterator, so it must be wrapped in list before indexing).
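For completeness, a minimal sketch (an addition, not from the original answer) that collects just the names with a list comprehension over the same loglines data:
# one name per log line, skipping the header line
all_the_names = [line.split("\t")[1] for line in loglines[1:]]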
