Let the user create a custom format using a string - python-3.x

I'd like users to be able to create a custom format in a QtWidgets.QPlainTextEdit(), have it applied to the data, and see the result in another QtWidgets.QPlainTextEdit().
For example:
movie = {
"Title":"The Shawshank Redemption",
"Year":"1994",
"Rated":"R",
"Released":"14 Oct 1994",
"Runtime":"142 min",
"Genre":"Drama",
"Director":"Frank Darabont",
"Writer":"Stephen King (short story \"Rita Hayworth and Shawshank Redemption\"),Frank Darabont (screenplay)",
"Actors":"Tim Robbins, Morgan Freeman, Bob Gunton, William Sadler",
"Plot":"Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.",
"Language":"English",
"Country":"USA",
"Awards":"Nominated for 7 Oscars. Another 21 wins & 36 nominations.",
"Poster":"https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmNhMC00ZDIzLWFmNTEtODM1ZmRlYWMwMWFmXkEyXkFqcGdeQXVyMTMxODk2OTU#._V1_SX300.jpg",
"Ratings": [
{
"Source":"Internet Movie Database",
"Value":"9.3/10"
},
{
"Source":"Rotten Tomatoes",
"Value":"91%"
},
{
"Source":"Metacritic",
"Value":"80/100"
}
],
"Metascore":"80",
"imdbRating":"9.3",
"imdbVotes":"2,367,380",
"imdbID":"tt0111161",
"Type":"movie",
"DVD":"15 Aug 2008",
"BoxOffice":"$28,699,976",
"Production":"Columbia Pictures, Castle Rock Entertainment",
"Website":"N/A"
}
custom_format = '[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | 📅 {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])
print(custom_format)
The code above prints [ The Shawshank Redemption | ⌚ 142 min | ⭐ Drama | 📅 14 Oct 1994 | R ].
However, if I change this code from:
custom_format = '[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | 📅 {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])
To:
custom_format = "'[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | 📅 {Released} | {Rated} ]'.format(Title=movie['Title'], Runtime=movie['Runtime'], Genre=movie['Genre'],Released=movie['Released'],Rated=movie['Rated'])"
Notice that the whole thing is wrapped in "", so it's a string. Now it no longer prints the format that I want.
I wrapped it in "" because when I add my original custom_format to a QtWidgets.QPlainTextEdit(), it becomes a plain string that won't be formatted later on.
So my original idea was: the user creates a custom format for themselves in a QtWidgets.QPlainTextEdit(). Then I copy that format, open a new window where the movie JSON variable is available, and paste the format into another QtWidgets.QPlainTextEdit(), where it would hopefully be shown formatted correctly.
Any help on this would be appreciated.
ADDITIONAL INFORMATION:
User creates their format inside QtWidgets.QPlainTextEdit().
Then the user clicks Test Format which should display [ The Shawshank Redemption | ⌚ 142 min | ⭐ Drama | 📅 14 Oct 1994 | R ] but instead it displays

Trying to use the full format command would require an eval(), which is normally considered not only bad practice, but also a serious security issue, especially when the input argument is completely set by the user.
Since the fields are known, I see little point in providing the whole format line, and it is better to parse the format string looking for keywords, then use keyword lookup to create the output.
import re
from PyQt5 import QtWidgets  # or PySide2 / PyQt6, whichever binding you use

class Formatter(QtWidgets.QWidget):
    def __init__(self):
        super().__init__()
        layout = QtWidgets.QVBoxLayout(self)
        self.formatBase = QtWidgets.QPlainTextEdit(
            '[ {Title} | ⌚ {Runtime} | ⭐ {Genre} | 📅 {Released} | {Rated} ]')
        self.formatOutput = QtWidgets.QPlainTextEdit()
        layout.addWidget(self.formatBase)
        layout.addWidget(self.formatOutput)
        self.formatBase.textChanged.connect(self.processFormat)
        self.processFormat()

    def processFormat(self):
        format_str = self.formatBase.toPlainText()
        # drop escaped double braces so they aren't mistaken for fields
        clean = re.sub('{{', '', re.sub('}}', '', format_str))
        # capture keyword arguments
        tokens = re.split(r'\{(.*?)\}', clean)
        keywords = tokens[1::2]
        try:
            # build the dictionary with the given arguments (movie is the dict
            # from the question); unrecognized keywords are printed back in the
            # {key} form, to let the user know that the key wasn't valid
            values = {k: movie.get(k, '{{{}}}'.format(k)) for k in keywords}
            self.formatOutput.setPlainText(format_str.format(**values))
        except (ValueError, KeyError, IndexError):
            # unmatched braces, or positional fields such as {} or {0}
            pass
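The same "parse the keywords, then look them up" idea can also be built from the standard library alone, without regular expressions. A minimal sketch, not the answerer's code: `string.Formatter().parse` lists the field names (it already understands `{{` escapes and format specs such as `{Genre:>10}`), and `str.format_map` with a dict subclass whose `__missing__` echoes unknown keys back as `{key}`. The names `SafeDict`, `field_names` and `render` are hypothetical.

```python
from string import Formatter


class SafeDict(dict):
    """dict that returns '{key}' for missing keys instead of raising KeyError."""

    def __missing__(self, key):
        return '{' + key + '}'


def field_names(template):
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for plain text and '' for a bare {}
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def render(template, data):
    # format_map looks keys up with data[key], so SafeDict.__missing__ applies
    try:
        return template.format_map(SafeDict(data))
    except (ValueError, IndexError, KeyError, AttributeError, TypeError):
        # unmatched braces, positional fields like {} or {0}, bad [index] or .attr
        return template
```

In the widget, `processFormat` would then reduce to `self.formatOutput.setPlainText(render(self.formatBase.toPlainText(), movie))`. Like the answer above, this never calls `eval()`; the user's text is only ever treated as a format template.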

Related

How can I extract an income statement from all company concepts?

All company concepts in XBRL format can be extracted with SEC's RESTful API.
For example, to get Tesla's concepts in XBRL format for 2020, I look up Tesla's CIK and build the API URL:
cik='1318605'
url = 'https://data.sec.gov/api/xbrl/companyfacts/CIK{:>010s}.json'.format(cik)
To select the 2020 annual financial statements with the elements fy and fp:
'fy' == 2020 and 'fp' == 'FY'
I wrote the whole Python code to call SEC's API:
import requests
import json
cik='1318605'
url = 'https://data.sec.gov/api/xbrl/companyfacts/CIK{:>010s}.json'.format(cik)
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36"
}
res = requests.get(url=url,headers=headers)
result = json.loads(res.text)
concepts_list = list(result['facts']['us-gaap'].keys())
data_concepts = result['facts']['us-gaap']
fdata = {}
for item in concepts_list:
    data = data_concepts[item]['units']
    data_units = list(data_concepts[item]['units'].keys())
    for data_units_attr in data_units:
        for record in data[data_units_attr]:
            if record['fy'] == 2020 and record['fp'] == 'FY':
                fdata[item] = record['val']
fdata contains all the company concepts and their 2020 values for Tesla; here is part of it:
fdata
{'AccountsAndNotesReceivableNet': 334000000,
'AccountsPayableCurrent': 6051000000,
'AccountsReceivableNetCurrent': 1886000000,
How can I get all the concepts belonging to the income statement? I want to extract them to build an income statement.
Maybe I should also add some values from dei in almost the same way as above:
EntityCommonStockSharesOutstanding:959853504
EntityPublicFloat:160570000000
It is simple to read a financial statement such as the income statement from the iXBRL file:
https://www.sec.gov/ix?doc=/Archives/edgar/data/0001318605/000156459021004599/tsla-10k_20201231.htm
I can get it there. Please help me replicate Tesla's 2020 annual income statement from SEC's RESTful API, or extract the income statement from the whole instance file:
https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/0001564590-21-004599.txt
If you tell me how to get all the concepts that belong to the income statement, I can do the job, and the balance sheet and cash flow statement can be extracted the same way. In my case fdata contains all concepts belonging to the income, balance sheet and cash flow statements. Which concept in fdata belongs to which financial statement? How do I map every concept to the income, balance sheet or cash flow statement?
#expression in pseudocode
income_statement = fdata[all_concepts_belong_to_income_statement]
balance_statement = fdata[all_concepts_belong_to_balance_statement]
cashflow_statement = fdata[all_concepts_belong_to_cashflow_statement]
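One caveat about the loop in the question: in the companyfacts JSON, fy and fp describe the filing a fact came from rather than the period it covers, so the prior-year comparatives in the 2020 10-K also carry fy == 2020 and fp == 'FY', and whichever record comes last overwrites fdata[item]. A minimal sketch that filters on the `frame` field SEC attaches to facts instead, assuming the CY#### (annual) and CY####Q#I (instant) frame naming described in SEC's API documentation and a fiscal year equal to the calendar year, as for Tesla. `annual_values` is a hypothetical helper name.

```python
def annual_values(companyfacts, year, taxonomy="us-gaap"):
    """Map each concept to its value for calendar year `year`, using SEC's `frame` field."""
    # CY2020 = annual duration (income/cash flow), CY2020Q4I = instant at year end (balance sheet)
    wanted = {f"CY{year}", f"CY{year}Q4I"}
    values = {}
    for concept, body in companyfacts["facts"][taxonomy].items():
        for records in body["units"].values():
            for record in records:
                # records without a frame (duplicates, odd periods) are skipped
                if record.get("frame") in wanted:
                    # if a concept has several units, the last one seen wins
                    values[concept] = record["val"]
    return values
```

With `result` parsed as in the question, `fdata = annual_values(result, 2020)` replaces the nested loop.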
If I understand you correctly, the following should get you at least part of the way there. I'm not going to try to replicate the financial statements; instead, I'll choose only one and show the concepts used in creating it. You'll have to take it from there.
I'll use the "income statement" (formally titled in the instance document "Consolidated Statements of Comprehensive Income (Loss) - USD ($) - $ in Millions") as the example. Note that here the data is not in JSON but in HTML.
#import the necessary libraries
import lxml.html as lh
import requests
#get the filing
url = 'https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/0001564590-21-004599.txt'
req = requests.get(url, headers={"User-Agent": "b2g"})
#parse the filing content
doc = lh.fromstring(req.content)
#locate the relevant table and get the data
concepts = doc.xpath("//p[@id='Consolidated_Statmnts_of_Cmprehnsve_Loss']/following-sibling::div[1]//table//@name")
for c in set(concepts):
    print(c)
Output:
There are 5 concepts, repeated 3 times each:
us-gaap:ComprehensiveIncomeNetOfTaxAttributableToNoncontrollingInterest
us-gaap:OtherComprehensiveIncomeLossForeignCurrencyTransactionAndTranslationAdjustmentNetOfTax
us-gaap:ProfitLoss
us-gaap:ComprehensiveIncomeNetOfTax
us-gaap:ComprehensiveIncomeNetOfTaxIncludingPortionAttributableToNoncontrollingInterest
I found an indirect way to extract the income statement from SEC's Financial Statement Data Sets. SEC already publishes all the data extracted from the raw XBRL files (the XBRL instances are published too), but the tool that parses the XBRL files into the four tab-delimited files num.txt, sub.txt, pre.txt and tag.txt is not published. What I want to do is build that tool myself.
Step 1:
Download the 2020 data sets from https://www.sec.gov/dera/data/financial-statement-data-sets.html and unzip them; this gives pre.txt, num.txt, sub.txt and tag.txt.
Step 2:
Create a database with tables pre, num, sub and tag whose columns match the fields in pre.txt, num.txt, sub.txt and tag.txt.
Step 3:
Import pre.txt, num.txt, sub.txt and tag.txt into tables pre, num, sub and tag.
Step 4:
Query in PostgreSQL.
The adsh (accession number) of Tesla's 2020 annual report is `0001564590-21-004599`:
\set accno '0001564590-21-004599'
select tag,value from num where adsh=:'accno' and ddate = '2020-12-31' and qtrs=4 and tag in
(select tag from pre where adsh=:'accno' and stmt='IS');
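If you'd rather skip the database, the same filter can be written with pandas directly on the extracted files. A sketch, assuming num.txt and pre.txt are loaded with `pd.read_csv(path, sep="\t")` (they are tab-delimited) and that ddate keeps its raw yyyymmdd integer form rather than the DATE type used in the PostgreSQL table; `income_statement` is a hypothetical helper name.

```python
import pandas as pd


def income_statement(num: pd.DataFrame, pre: pd.DataFrame, adsh: str,
                     ddate: int = 20201231, qtrs: int = 4) -> pd.DataFrame:
    """Full-year income-statement values for one filing, mirroring the SQL query above."""
    # tags the filer presented in the income statement (stmt == 'IS')
    is_tags = pre.loc[(pre["adsh"] == adsh) & (pre["stmt"] == "IS"), "tag"].unique()
    mask = (
        (num["adsh"] == adsh)
        & (num["ddate"] == ddate)   # raw num.txt stores ddate as yyyymmdd
        & (num["qtrs"] == qtrs)     # 4 quarters = a full-year amount
        & num["tag"].isin(is_tags)
    )
    return num.loc[mask, ["tag", "value"]].reset_index(drop=True)
```

Usage would be `income_statement(num, pre, "0001564590-21-004599")`, which should return the same tag/value pairs as the psql output below if the files are the ones imported in Step 3.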
tag | value
---------------------------------------------------------------------------------------------+------------------
NetIncomeLoss | 721000000.0000
OperatingLeasesIncomeStatementLeaseRevenue | 1052000000.0000
GrossProfit | 6630000000.0000
InterestExpense | 748000000.0000
CostOfRevenue | 24906000000.0000
WeightedAverageNumberOfSharesOutstandingBasic | 933000000.0000
IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest | 1154000000.0000
EarningsPerShareDiluted | 0.6400
ProfitLoss | 862000000.0000
OperatingExpenses | 4636000000.0000
InvestmentIncomeInterest | 30000000.0000
OperatingIncomeLoss | 1994000000.0000
SellingGeneralAndAdministrativeExpense | 3145000000.0000
NetIncomeLossAttributableToNoncontrollingInterest | 141000000.0000
StockholdersEquityNoteStockSplitConversionRatio1 | 5.0000
NetIncomeLossAvailableToCommonStockholdersBasic | 690000000.0000
Revenues | 31536000000.0000
IncomeTaxExpenseBenefit | 292000000.0000
OtherNonoperatingIncomeExpense | -122000000.0000
WeightedAverageNumberOfDilutedSharesOutstanding | 1083000000.0000
EarningsPerShareBasic | 0.7400
ResearchAndDevelopmentExpense | 1491000000.0000
BuyOutOfNoncontrollingInterest | 31000000.0000
CostOfAutomotiveLeasing | 563000000.0000
CostOfRevenuesAutomotive | 20259000000.0000
CostOfServicesAndOther | 2671000000.0000
SalesRevenueAutomotive | 27236000000.0000
SalesRevenueServicesAndOtherNet | 2306000000.0000
(28 rows)
stmt='IS' selects the tags in the income statement, and ddate='2020-12-31' selects the 2020 annual report. qtrs=4 is needed because qtrs is the number of quarters a value covers (0 for point-in-time values), so qtrs=4 keeps only full-year amounts; see SEC's field definitions for details.
Compare with https://www.nasdaq.com/market-activity/stocks/tsla/financials (partial; look at the 12/31/2020 column):
Period Ending: 12/31/2021 12/31/2020 12/31/2019 12/31/2018
Total Revenue $53,823,000 $31,536,000 $24,578,000 $21,461,000
Cost of Revenue $40,217,000 $24,906,000 $20,509,000 $17,419,000
Gross Profit $13,606,000 $6,630,000 $4,069,000 $4,042,000
Research and Development $2,593,000 $1,491,000 $1,343,000 $1,460,000
All the numbers match (Nasdaq reports in thousands). The XBRL tags Revenues, CostOfRevenue, GrossProfit and ResearchAndDevelopmentExpense correspond to the line items Total Revenue, Cost of Revenue, Gross Profit and Research and Development, so the numbers I get are right.
Eventually I'd like to find a direct way to parse the raw XBRL file to get the income statement. I'm not familiar with XBRL yet and hope the Stack Overflow community can help me reach that goal.
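The pre table also answers the pseudocode in the question: its stmt column records which statement each tag was presented in for a given filing (BS, IS and CF, plus codes such as EQ and CI). A sketch that uses such a mapping to split the fdata dictionary from the question; `split_by_statement` is hypothetical, and the mapping is assumed to come from the pre rows of the same filing, e.g. `pre[pre.adsh == accno].groupby("tag")["stmt"].agg(set).to_dict()`. A set is used because one tag (NetIncomeLoss, for instance) can appear in more than one statement.

```python
def split_by_statement(fdata, stmts_of):
    """Sort fdata's {tag: value} pairs into income, balance and cash-flow dicts.

    stmts_of maps a tag to the set of stmt codes it appears under, e.g.
    {"Revenues": {"IS"}, "NetIncomeLoss": {"IS", "CF"}}.
    """
    statements = {"IS": {}, "BS": {}, "CF": {}}
    for tag, value in fdata.items():
        for stmt in stmts_of.get(tag, ()):
            if stmt in statements:  # ignore EQ, CI, UN, ...
                statements[stmt][tag] = value
    return statements
```

`income_statement = split_by_statement(fdata, stmts_of)["IS"]` then corresponds to the first line of the pseudocode; tags with no entry in the mapping are simply left out.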

Katalon: possible to assert response = data?

I store JSON test data in an Excel file.
I use Apache POI to read the JSON data and parse it as the request body, then call it from Katalon.
Then I write many lines of assertions (Groovy assert) to verify each response field against the test data.
Example:
assert test.responseText.fieldA == 'abc'
assert test.responseText.fieldB == 'xyz'
And so on, for a total of 20 fields.
I think there must be a better way to use the JSON data stored in the data file
to assert response = test data, so I can save a lot of time typing each line and modifying them when the test data changes.
Please advise whether this can be achieved.
Here is an example: you have two Excel sheets, current values and expected values (the values you are testing against).
Current values:
No. | key | value
----+-----+------
1 a 100
2 b 6
3 c 13
Expected values:
No. | key | value
----+-----+------
1 a 100
2 b 6
3 c 14
You need to add both sheets to Data Files.
The following code will compare the values in the for loop and the assertion will fail on the third run (13!=14):
def expectedData = findTestData("expected")
def currentData = findTestData("current")
for (int i = 1; i <= currentData.getRowNumbers(); i++) {
    assert currentData.getValue(2, i) == expectedData.getValue(2, i)
}
Failure message should look like this:
2020-07-02 15:16:40.471 ERROR c.k.katalon.core.main.TestCaseExecutor - ❌ Test Cases/table comparison FAILED.
Reason:
Assertion failed:
assert currentData.getValue(2, i) == expectedData.getValue(2, i)
| | | | | | |
| 14 3 | | 13 3
| | com.kms.katalon.core.testdata.reader.SheetPOI@5aabbb29
| false
com.kms.katalon.core.testdata.reader.SheetPOI@72c927f1
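The question's underlying idea can also be handled in one step: parse both the response and the expected JSON and compare the whole objects rather than writing one assert per field. In Groovy that would be `new groovy.json.JsonSlurper().parseText(...)` on each side followed by a single assert on the two maps. The sketch below shows the idea in Python, the main language on this page, and additionally lists every mismatched field; `diff_fields` is a hypothetical helper, and lists are compared as whole values.

```python
class _Missing:
    """Marker for a key present on only one side (distinct from JSON null)."""

    def __repr__(self):
        return "<missing>"


MISSING = _Missing()


def diff_fields(actual, expected, path=""):
    """List every leaf where `actual` differs from `expected`, as 'path: actual != expected'."""
    if isinstance(actual, dict) and isinstance(expected, dict):
        problems = []
        for key in sorted(actual.keys() | expected.keys()):
            child = f"{path}.{key}" if path else key
            problems += diff_fields(actual.get(key, MISSING),
                                    expected.get(key, MISSING), child)
        return problems
    # leaves (strings, numbers, lists, None) are compared as whole values
    if actual != expected:
        return [f"{path}: {actual!r} != {expected!r}"]
    return []
```

In a test you would call `diff_fields(json.loads(response_text), json.loads(expected_text))` and assert that the list is empty, so a single failure message reports every mismatched field.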

Is there a Python function that removes characters (with a digit) from a string?

I am working on a project about gentrification. My teammates pulled data from the census and cleaned it to get the values we need. The issue is that the zip code values drop their leading 0s (i.e. "2322" when it should be "02322"). We managed to find the ZCTA value, which contains the full zip code with the ZCTA prefix ("ZCTA5 02322"). I want to remove "ZCTA5" to get the zip code alone.
I've tried the code below, but it only gets rid of "ZCTA" instead of "ZCTA5" (i.e. "502322"). I'm also concerned that if I manage to remove the 5 along with the letters, it will remove all 5s in the zip codes as well.
From there I will use pgeocode to look up the respective lat & lng values to create the heatmap. Please help.
I've tried the .replace() and .translate() functions. Replace still prints the zip codes with the 5; translate raises an AttributeError.
Sample data
Zipcode | Name | Change_In_Value | Change_In_Income | Change_In_Degree | Change_In_Rent
2322 | ZCTA5 02322 | -0.050242 | -0.010953 | 0.528509 | -0.013263
2324 | ZCTA5 02324 | 0.012279 | -0.022949 | -0.040456 | 0.210664
2330 | ZCTA5 02330 | 0.020438 | 0.087415 | -0.095076 | -0.147382
2332 | ZCTA5 02332 | 0.035024 | 0.054745 | 0.044315 | 1.273772
2333 | ZCTA5 02333 | -0.012588 | 0.079819 | 0.182517 | 0.156093
Translate
zipcode = []
test2 = gent_df['Name'] = gent_df['Name'].astype(str).translate({ord('ZCTA5'): None}).astype(int)
zipcode.append(test2)
test2.head()
Replace
zipcode = []
test2 = gent_df['Name'] = gent_df['Name'].astype(str).replace(r'\D', '').astype(int)
zipcode.append(test2)
test2.head()
Replace
Expected:
24093
26039
34785
38944
29826
Actual:
524093
526039
534785
538944
529826
Translate
Expected:
24093
26039
34785
38944
29826
Actual:
AttributeError Traceback (most recent call last)
<ipython-input-71-0e5ff4660e45> in <module>
3 zipcode = []
4
----> 5 test2 = gent_df['Name'] = gent_df['Name'].astype(str).translate({ord('ZCTA5'): None}).astype(int)
6 # zipcode.append(test2)
7 test2.head()
~\Anaconda3\envs\MyPyEnv\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5178 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5179 return self[name]
-> 5180 return object.__getattribute__(self, name)
5181
5182 def __setattr__(self, name, value):
AttributeError: 'Series' object has no attribute 'translate'
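It looks like you are using pandas, so you should be able to use the .str.lstrip() method. I tried this on a sample df and it worked for me. Note that to_strip is a set of characters, not a prefix: it stops at the space before the digits (so 5s inside the zip code are safe), and the leftover space is removed with .str.strip():
gent_df.Name = gent_df.Name.str.lstrip(to_strip='ZCTA5').str.strip()
See the pandas documentation for .str.strip(), .str.lstrip() and .str.rstrip().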
I hope this helps!
There are many ways to do this. I can think of 2 off the top of my head.
If you want to keep the last 5 characters of the zipcode string, regardless of whether they are digits or not:
gent_df['Name'] = gent_df['Name'].str[-5:]
If you want to get the last 5 digits of the zipcode string:
gent_df['Name'] = gent_df['Name'].str.extract(r'(\d{5})$')[0]
Include some sample data for a more specific answer.
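The leading zeros in the question disappear because the values end up as integers (note the `.astype(int)` at the end of both attempts), so the cleaned zip code should stay a string. A sketch of two options, assuming gent_df has the Zipcode and Name columns shown in the sample data; the helper names are hypothetical. The first strips the prefix only at the start of the string with an anchored regex; the second re-pads the numeric Zipcode column with `str.zfill`, which avoids touching Name at all.

```python
import pandas as pd


def zip_from_name(names: pd.Series) -> pd.Series:
    # Remove "ZCTA5" and the whitespace after it only at the start of the string,
    # so a 5 inside the zip code itself is left alone
    return names.astype(str).str.replace(r"^ZCTA5\s*", "", regex=True)


def zip_from_number(codes: pd.Series) -> pd.Series:
    # Re-pad zip codes that lost their leading zero when stored as numbers
    return codes.astype(int).astype(str).str.zfill(5)
```

For example, `gent_df['zip'] = zip_from_number(gent_df['Zipcode'])` turns 2322 into "02322" while leaving a value such as 55401 unchanged.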

How to remove '/5' from CSV file

I am cleaning a restaurant data set using Pandas' read_csv.
I have columns like this:
name, online_order, book_table, rate, votes
xxxx, Yes, Yes, 4.5/5, 705
I expect them to be like this:
name, online_order, book_table, rate, votes
xxxx, Yes, Yes, 4.5, 705
You basically need to split each item of dataframe["rate"] on / and keep the part you need. Apply this to your dataframe with lambda x: getRate(x):
def getRate(x):
    return str(x).split("/")[0]
To use it with column name rate, we can use:
dataframe["rate"] = dataframe["rate"].apply(lambda x: getRate(x))
You can use Python's .split() method to remove specific text, provided the text is always "/5" and there are no instances of "/5" in that string that you want to keep. You can use it like this:
num = "4.5/5"
num.split("/5")[0]
output: '4.5'
If this isn't exactly what you need, Python's re module offers more flexible regex functions.
You can use Series.apply() to perform the replacement on the rate column:
def clean(x):
    # leave non-strings (e.g. NaN from empty cells) and values without "/" unchanged
    if not isinstance(x, str) or "/" not in x:
        return x
    else:
        return x[0:x.index('/')]
df.rate = df.rate.apply(lambda x : clean(x))
print(df)
Output
+----+-------+---------------+-------------+-------+-------+
| | name | online_order | book_table | rate | votes |
+----+-------+---------------+-------------+-------+-------+
| 0 | xxxx | Yes | Yes | 4.5 | 705 |
+----+-------+---------------+-------------+-------+-------+
EDIT
Edited to handle cases where there could be multiple / or a number other than /5 (i.e. /4 or /1/3 ...).
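The answers above can also be written without apply, using pandas' vectorized string methods. A sketch, assuming the rate column is read as text; `clean_rate` is a hypothetical name, and the final conversion with `pd.to_numeric(..., errors="coerce")` goes a step beyond the answers above by turning anything non-numeric into NaN.

```python
import pandas as pd


def clean_rate(rate: pd.Series) -> pd.Series:
    # Keep the text before the first "/" ("4.5/5" -> "4.5"), then convert to float;
    # anything that isn't a number becomes NaN instead of raising
    before_slash = rate.astype(str).str.split("/").str[0].str.strip()
    return pd.to_numeric(before_slash, errors="coerce")
```

Usage: `df["rate"] = clean_rate(df["rate"])`. Splitting on the first "/" covers the /4 and /1/3 cases mentioned in the edit as well.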

How do I get the contents of a list as a string?

This is st[0:6], which is my list (actually a list of traces).
I can get a trace by number, e.g. st[3], but I want to get a trace by name, for example the data for ADK.10.
I know the name (ADK), but I don't know its index number.
The 6 Trace(s) in Stream are below:
II.AAK.00.BHZ | 2010-02-18T01:19:08.019500Z
II.AAK.10.BHZ | 2010-02-18T01:19:08.019500Z
IU.ADK.00.BHZ | 2010-02-18T01:18:31.019536Z
IU.ADK.10.BHZ | 2010-02-18T01:18:31.019536Z
IU.AFI.00.BHZ | 2010-02-18T01:23:13.023144Z
IU.AFI.10.BHZ | 2010-02-18T01:23:13.010644Z
I tried this code below, but I'm getting an error:
tr = st['*.RSSD.00.*']
I want to get the RSSD.00 data into tr. What should I do?
I don't know what you mean by data, but you can extract it with regular expressions. Here is an example:
import re
tr = "II.AAK.00.BHZ | 2010-02-18T01:19:08.019500Z"
MAGIC_REGEX = r"\w{2}\.(?P<name>.*)\.(?P<value>\d+)\.\w{3}\s+\|\s+(?P<date>\S+)"
match = re.fullmatch(MAGIC_REGEX, tr)
print(match.groupdict())
Result:
{'name': 'AAK', 'value': '00', 'date': '2010-02-18T01:19:08.019500Z'}
groupdict() gives a dictionary of the named groups; groups() gives a tuple instead:
('AAK', '00', '2010-02-18T01:19:08.019500Z')
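The printout ("The 6 Trace(s) in Stream") suggests st is an ObsPy Stream. If so, ObsPy's Stream.select accepts the parts of the trace id as keywords with wildcards, e.g. `st.select(station="ADK", location="10")`, and is the first thing to try instead of indexing with a string. For a plain list of trace IDs, a standard-library sketch with fnmatch gives the same wildcard matching as the question's '*.RSSD.00.*' pattern and also returns the index; `select_ids` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def select_ids(trace_ids, pattern):
    """Return (index, id) pairs whose id matches a shell-style pattern like '*.ADK.10.*'."""
    # fnmatchcase is case-sensitive on every OS, unlike fnmatch.fnmatch
    return [(i, tid) for i, tid in enumerate(trace_ids) if fnmatchcase(tid, pattern)]
```

With ObsPy traces the IDs are available as `tr.id`, so `select_ids([tr.id for tr in st], "*.ADK.10.*")` gives the index to use with `st[...]`; an empty list means no trace matches, as would happen for RSSD.00 in the six traces shown above.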
