How to remove '/5' from CSV file - python-3.x

I am cleaning a restaurant data set using Pandas' read_csv.
I have columns like this:
name, online_order, book_table, rate, votes
xxxx, Yes, Yes, 4.5/5, 705
I expect them to be like this:
name, online_order, book_table, rate, votes
xxxx, Yes, Yes, 4.5, 705

You basically need to split the item(dataframe["rate"]) based on / and take out what you need. .apply this on your dataframe using lambda x: getRate(x)
def getRate(x):
return str(x).split("/")[0]
To use it with column name rate, we can use:
dataframe["rate"] = dataframe["rate"].apply(lambda x: getRate(x))

You can use the python .split() function to remove specific text, given that the text is consistently going to be "/5", and there are no instances of "/5" that you want to keep in that string. You can use it like this:
num = "4.5/5"
num.split("/5")[0]
output: '4.5'
If this isn't exactly what you need, there's more regex python functions here

You can use DataFrame.apply() to make your replacement operation on the ratecolumn:
def clean(x):
if "/" not in x :
return x
else:
return x[0:x.index('/')]
df.rate = df.rate.apply(lambda x : clean(x))
print(df)
Output
+----+-------+---------------+-------------+-------+-------+
| | name | online_order | book_table | rate | votes |
+----+-------+---------------+-------------+-------+-------+
| 0 | xxxx | Yes | Yes | 4.5 | 705 |
+----+-------+---------------+-------------+-------+-------+
EDIT
Edited to handle situations in which there could be multiple / or that it could be another number than /5 (ie : /4or /1/3 ...)

Related

Let the user create a custom format using a string

I'd like a user to be able to create a custom format in QtWidgets.QPlainTextEdit() and it would format the string and split out the results in another QtWidgets.QPlainTextEdit().
For example:
movie = {
"Title":"The Shawshank Redemption",
"Year":"1994",
"Rated":"R",
"Released":"14 Oct 1994",
"Runtime":"142 min",
"Genre":"Drama",
"Director":"Frank Darabont",
"Writer":"Stephen King (short story \"Rita Hayworth and Shawshank Redemption\"),Frank Darabont (screenplay)",
"Actors":"Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler",
"Plot":"Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.",
"Language":"English",
"Country":"USA",
"Awards":"Nominated for 7 Oscars. Another 21 wins & 36 nominations.",
"Poster":"https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmNhMC00ZDIzLWFmNTEtODM1ZmRlYWMwMWFmXkEyXkFqcGdeQXVyMTMxODk2OTU#._V1_SX300.jpg",
"Ratings": [
{
"Source":"Internet Movie Database",
"Value":"9.3/10"
},
{
"Source":"Rotten Tomatoes",
"Value":"91%"
},
{
"Source":"Metacritic",
"Value":"80/100"
}
],
"Metascore":"80",
"imdbRating":"9.3",
"imdbVotes":"2,367,380",
"imdbID":"tt0111161",
"Type":"movie",
"DVD":"15 Aug 2008",
"BoxOffice":"$28,699,976",
"Production":"Columbia Pictures, Castle Rock Entertainment",
"Website":"N/A"
}
custom_format = '[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | 📅 {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])
print(custom_format)
This code above, would easily print [ The Shawshank Redemption | ⌚ 142 min | ⭐ Drama | 📅 14 Oct 1994 | R ].
However, if I change this code from:
custom_format = '[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | 📅 {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])
To:
custom_format = "'[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | 📅 {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])"
Notice, that the whole thing is wrapped in "". Therefor its a string. Now doing this will not print out the format that I want.
The reason I wrapped it in "" is because when I add my original custom_format into a QtWidgets.QPlainTextEdit(), it converts it into a string it wont format later on.
So my original idea was, the user creates a custom format for themselves in a QtWidgets.QPlainTextEdit(). Then I copy that format, open a new window wher the movie json variable is contained and paste the format into another QtWidgets.QPlainTextEdit() where it would hopefuly show it formatted correctly.
Any help on this would be appreciated.
ADDITIONAL INFORMATION:
User creates their format inside QtWidgets.QPlainTextEdit().
Then the user clicks Test Format which should display [ The Shawshank Redemption | ⌚ 142 min | ⭐ Drama | 📅 14 Oct 1994 | R ] but instead it displays
Trying to use the full format command would require an eval(), which is normally considered not only bad practice, but also a serious security issue, especially when the input argument is completely set by the user.
Since the fields are known, I see little point in providing the whole format line, and it is better to parse the format string looking for keywords, then use keyword lookup to create the output.
class Formatter(QtWidgets.QWidget):
def __init__(self):
super().__init__()
layout = QtWidgets.QVBoxLayout(self)
self.formatBase = QtWidgets.QPlainTextEdit(
'[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | 📅 {Released} | {Rated} ]')
self.formatOutput = QtWidgets.QPlainTextEdit()
layout.addWidget(self.formatBase)
layout.addWidget(self.formatOutput)
self.formatBase.textChanged.connect(self.processFormat)
self.processFormat()
def processFormat(self):
format_str = self.formatBase.toPlainText()
# escape double braces
clean = re.sub('{{', '', re.sub('}}', '', format_str))
# capture keyword arguments
tokens = re.split(r'\{(.*?)\}', clean)
keywords = tokens[1::2]
try:
# build the dictionary with given arguments, unrecognized keywords
# are just printed back in the {key} form, in order let the
# user know that the key wasn't valid;
values = {k:movie.get(k, '{{{}}}'.format(k)) for k in keywords}
self.formatOutput.setPlainText(format_str.format(**values))
except (ValueError, KeyError):
# exception for unmatching braces
pass

Katalon Posibble to assert response = data?

I store json test data in excel file.
Make use of apache POI to read the json data and parse it as request body, call it from katalon.
Then I write many lines of assertion (groovy assert) to verify each line response = test data.
Example:
Assert test.responseText.fieldA == 'abc'
Assert test.responseText.fieldB == 'xyz'
And so on if I have total of 20 fields.
I'm thinking of there is better way to make use of the json data stored in data file.
To assert the response = test data. So I can save alot of time to key in each line and modify them is the test data changed.
Please advise if this can be achieved?
Here is an example: you have two excel sheets - current values and expected values (values you are testing against).
Current values:
No. | key | value
----+-----+------
1 a 100
2 b 6
3 c 13
Expected values:
No. | key | value
----+-----+------
1 a 100
2 b 6
3 c 14
You need to add those to Data Files:
The following code will compare the values in the for loop and the assertion will fail on the third run (13!=14):
def expectedData = findTestData("expected")
def currentData = findTestData("current")
for(i=1; i<=currentData.getRowNumbers(); i++){
assert currentData.getValue(2, i) == expectedData.getValue(2, i)
}
Failure message should look like this:
2020-07-02 15:16:40.471 ERROR c.k.katalon.core.main.TestCaseExecutor - ❌ Test Cases/table comparison FAILED.
Reason:
Assertion failed:
assert currentData.getValue(2, i) == expectedData.getValue(2, i)
| | | | | | |
| 14 3 | | 13 3
| | com.kms.katalon.core.testdata.reader.SheetPOI#5aabbb29
| false
com.kms.katalon.core.testdata.reader.SheetPOI#72c927f1

Is there a Python function that removes characters (with a digit) from a string?

I am working on a project about gentrification. My teammates pulled data from the census and cleaned it to get the values we need. The issue is, the zip code values won't print 0's (i.e. "2322" when it should be "02322"). We managed to find the tact value that prints the full zip code with the tact codes("ZCTA5 02322"). I want to remove "ZCTA5" to get the zip code alone.
I've tried the below code but it only gets rid of the "ZCTA" instead of "ZCTA5" (i.e. "502322"). I'm also concerned that if I manage to remove the 5 with the characters, it will remove all 5's in the zip codes as well.
From there I will be pulling from pgeocode to access the respective lat & lng values to create the heatmap. Please help?
I've tried the .replace(), .translate(), functions. Replace still prints the zip codes with 5. Translate gets an attribute error.
Sample data
Zipcode | Name | Change_In_Value | Change_In_Income | Change_In_Degree | Change_In_Rent
2322 | ZCTA5 02322 | -0.050242 | -0.010953 | 0.528509 | -0.013263
2324 | ZCTA5 02324 | 0.012279 | -0.022949 | -0.040456 | 0.210664
2330 | ZCTA5 02330 | 0.020438 | 0.087415 | -0.095076 | -0.147382
2332 | ZCTA5 02332 | 0.035024 | 0.054745 | 0.044315 | 1.273772
2333 | ZCTA5 02333 | -0.012588 | 0.079819 | 0.182517 | 0.156093
Translate
zipcode = []
test2 = gent_df['Name'] = gent_df['Name'].astype(str).translate({ord('ZCTA5'): None}).astype(int)
zipcode.append(test2)
test2.head()
Replace
zipcode = []
test2 = gent_df['Name'] = gent_df['Name'].astype(str).replace(r'\D', '').astype(int)
zipcode.append(test2)
test2.head()
Replace
Expected:
24093
26039
34785
38944
29826
Actual:
524093
526039
534785
538944
529826
Translate
Expected:
24093
26039
34785
38944
29826
Actual:
AttributeError Traceback (most recent call last)
<ipython-input-71-0e5ff4660e45> in <module>
3 zipcode = []
4
----> 5 test2 = gent_df['Name'] = gent_df['Name'].astype(str).translate({ord('ZCTA5'): None}).astype(int)
6 # zipcode.append(test2)
7 test2.head()
~\Anaconda3\envs\MyPyEnv\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5178 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5179 return self[name]
-> 5180 return object.__getattribute__(self, name)
5181
5182 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'translate'
It looks like you are using pandas so you should be able to use the .lstrip() method. I tried this on a sample df and it worked for me:
gent_df.Name = gent_df.Name.str.lstrip(to_strip='ZCTA5')
Here is a link to the library page for .strip(), .lstrip(), and .rstrip()
I hope this helps!
There are many ways to do this. I can think of 2 off the top of my head.
If you want to keep the last 5 characters of the zipcode string, regardless of whether they are digits or not:
gent_df['Name'] = gent_df['Name'].str[-5:]
If want to get the last 5 digits of the zipcode string:
gent_df['Name'] = gent_df['Name'].str.extract(r'(\d{5})$')[0]
Include some sample data for more specific answer.

How do I get the contents of a list as a string?

This is st[0:6], which is my list (actually it is traces).
I can get some trace by numbers, e.g. st[3], but I want to get some trace by string, for example the data for ADK.10.
I know this name (ADK), but now I don't know this index number.
The 6 Trace(s) in Stream are below:
II.AAK.00.BHZ | 2010-02-18T01:19:08.019500Z
II.AAK.10.BHZ | 2010-02-18T01:19:08.019500Z
IU.ADK.00.BHZ | 2010-02-18T01:18:31.019536Z
IU.ADK.10.BHZ | 2010-02-18T01:18:31.019536Z
IU.AFI.00.BHZ | 2010-02-18T01:23:13.023144Z
IU.AFI.10.BHZ | 2010-02-18T01:23:13.010644Z
I tried this code below, but I'm getting an error:
tr = st['*.RSSD.00.*']
I want to get the RSSD.00 data to tr. What should I do?
I don't know what do you mean by data, but you can extract it with regular expressions. Here an example:
import re
tr = "II.AAK.00.BHZ | 2010-02-18T01:19:08.019500Z"
MAGIC_REGEX = "\w{2}\.(?P<name>.*)\.(?P<value>\d+)\.\w{3}\s+\|\s+(?P<date>\S+)"
match = re.fullmatch(MAGIC_REGEX, tr)
print(match.groupdict())
Result:
{'name': 'AAK', 'value': '00', 'date': '2010-02-18T01:19:08.019500Z'}
groupdict() Gives a dictionary with each group, but you also have groups() which gives you a tuple.
('AAK', '00', '2010-02-18T01:19:08.019500Z')

Behave: Writing a Scenario Outline with dynamic examples

Gherkin / Behave Examples
Gherkin syntax features test automation using examples:
Feature: Scenario Outline (tutorial04)
Scenario Outline: Use Blender with <thing>
Given I put "<thing>" in a blender
When I switch the blender on
Then it should transform into "<other thing>"
Examples: Amphibians
| thing | other thing |
| Red Tree Frog | mush |
| apples | apple juice |
Examples: Consumer Electronics
| thing | other thing |
| iPhone | toxic waste |
| Galaxy Nexus | toxic waste |
The test suite would run four times, once for each example, giving a result similar to:
My problem
How can I test using confidential data in the Examples section? For example, I would like to test an internal API with user ids or SSN numbers, without keeping the data hard coded in the feature file.
Is there a way to load the Examples dynamically from an external source?
Update: Opened a github issue on the behave project.
I've come up with another solution (behave-1.2.6):
I managed to dynamically create examples for a Scenario Outline by using before_feature.
Given a feature file (x.feature):
Feature: Verify squared numbers
Scenario Outline: Verify square for <number>
Then the <number> squared is <result>
Examples: Static
| number | result |
| 1 | 1 |
| 2 | 4 |
| 3 | 9 |
| 4 | 16 |
# Use the tag to mark this outline
#dynamic
Scenario Outline: Verify square for <number>
Then the <number> squared is <result>
Examples: Dynamic
| number | result |
| . | . |
And the steps file (steps/x.step):
from behave import step
#step('the {number:d} squared is {result:d}')
def step_impl(context, number, result):
assert number*number == result
The trick is to use before_feature in environment.py as it has already parsed the examples tables to the scenario outlines, but hasn't generated the scenarios from the outline yet.
import behave
import copy
def before_feature(context, feature):
features = (s for s in feature.scenarios if type(s) == behave.model.ScenarioOutline and
'dynamic' in s.tags)
for s in features:
for e in s.examples:
orig = copy.deepcopy(e.table.rows[0])
e.table.rows = []
for num in range(1,5):
n = copy.deepcopy(orig)
# This relies on knowing that the table has two rows.
n.cells = ['{}'.format(num), '{}'.format(num*num)]
e.table.rows.append(n)
This will only operate on Scenario Outlines that are tagged with #dynamic.
The result is:
behave -k --no-capture
Feature: Verify squared numbers # features/x.feature:1
Scenario Outline: Verify square for 1 -- #1.1 Static # features/x.feature:8
Then the 1 squared is 1 # features/steps/x.py:3
Scenario Outline: Verify square for 2 -- #1.2 Static # features/x.feature:9
Then the 2 squared is 4 # features/steps/x.py:3
Scenario Outline: Verify square for 3 -- #1.3 Static # features/x.feature:10
Then the 3 squared is 9 # features/steps/x.py:3
Scenario Outline: Verify square for 4 -- #1.4 Static # features/x.feature:11
Then the 4 squared is 16 # features/steps/x.py:3
#dynamic
Scenario Outline: Verify square for 1 -- #1.1 Dynamic # features/x.feature:19
Then the 1 squared is 1 # features/steps/x.py:3
#dynamic
Scenario Outline: Verify square for 2 -- #1.2 Dynamic # features/x.feature:19
Then the 2 squared is 4 # features/steps/x.py:3
#dynamic
Scenario Outline: Verify square for 3 -- #1.3 Dynamic # features/x.feature:19
Then the 3 squared is 9 # features/steps/x.py:3
#dynamic
Scenario Outline: Verify square for 4 -- #1.4 Dynamic # features/x.feature:19
Then the 4 squared is 16 # features/steps/x.py:3
1 feature passed, 0 failed, 0 skipped
8 scenarios passed, 0 failed, 0 skipped
8 steps passed, 0 failed, 0 skipped, 0 undefined
Took 0m0.005s
This relies on having an Examples table with the correct shape as the final table, in my example, with two rows. I also don't fuss with creating new behave.model.Row objects, I just copy the one from the table and update it. For extra ugliness, if you're using a file, you can put the file name in the Examples table.
Got here looking for something else, but since I've been in similar situation with Cucumber before, maybe someone will also end up at this question, looking for a possible solution. My approach to this problem is to use BDD variables that I can later handle at runtime in my step_definitions. In my python code I can check what is the value of the Gherkin variable and map it to what's needed.
For this example:
Scenario Outline: Use Blender with <thing>
Given I put "<thing>" in a blender
When I switch the blender on
Then it should transform into "<other thing>"
Examples: Amphibians
| thing | other thing |
| Red Tree Frog | mush |
| iPhone | data.iPhone.secret_key | # can use .yaml syntax here as well
Would translate to such step_def code:
#given('I put "{thing}" in a blender')
def step_then_should_transform_into(context, other_thing):
if other_thing == BddVariablesEnum.SECRET_KEY:
basic_actions.load_secrets(context, key)
So all you have to do is to have well defined DSL layer.
Regarding the issue of using SSN numbers in testing, I'd just use fake SSNs and not worry that I'm leaking people's private information.
Ok, but what about the larger issue? You want to use a scenario outline with examples that you cannot put in your feature file. Whenever I've run into this problem what I did was to give a description of the data I need and let the step implementation either create the actual data set used for testing or fetch the data set from an existing test database.
Scenario Outline: Accessing the admin interface
Given a user who <status> an admin has logged in
Then the user <ability> see the admin interface
Examples: Users
| status | ability |
| is | can |
| is not | cannot |
There's no need to show any details about the user in the feature file. The step implementation is responsible for either creating or fetching the appropriate type of user depending on the value of status.

Resources