From Excel to quantmod

I am doing some simple analysis using quantmod; my data is in a CSV file exported from Excel.
The first column is the date in YYYY-MM-DD format, and the next ten columns contain price data, each representing a fund or index. None of the data is on Yahoo, so I cannot use getSymbols.
Could someone give me code to bring this file into R in a format that works with quantmod, explained in a way a non-programmer can understand?

I think the issue is that when you read a CSV file into R, you get a dataframe object. Use the class() function to confirm.
library(tidyverse)
library(quantmod)
library(timetk)
my_data <- readr::read_csv('my excel file.csv')
class(my_data)
To use quantmod functions your data needs to be in an xts (time-series) object; it can't be in a dataframe. You can convert a dataframe that has a date/index column into an xts object using the timetk::tk_xts() function, and then you should be able to use quantmod functions for analysis on your data.
my_xts <- timetk::tk_xts(my_data)
quantmod::monthlyReturn(my_xts)

Related

Get word definitions from Google Translate using Python

Please help; I am making my own dictionary and can't figure out how to pull translation definitions from Google Translate. My idea is that Python will open my Excel file, where every cell in column 1 is a new word. Python will take each one in turn and translate it from English to Slovak using Google Translate, fetching not just the translated word but its definition(s) (if there is more than one definition, take them all) along with the part of speech (noun, adverb, verb, ...). It would then add these data to the Excel table, either in a new cell next to the original word or, if there are multiple definitions, in a new row for each definition.
I'm new to this so please excuse me.
To satisfy your requirements, one way is to do the following in your script:
You can use pandas.read_excel to read your Excel file, then do some data manipulation to get all the values in your column 1.
Once you have the values to translate, you can use something like googletrans, which drives Google Translate behind the scenes (see the short sketch after this list), or the paid Google Translation API. Based on your requirements, I suggest the Google Translation API, since it is capable of returning all possible definitions.
When you have your translations, it is up to you to transform the data so you can add it as a new column to your original Excel file. You can use pandas.ExcelWriter for this.
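For the free route, a minimal googletrans sketch might look like this. googletrans is an unofficial package whose API changes between releases, so treat this as an assumption-laden illustration rather than a guaranteed recipe:
from googletrans import Translator

# Unofficial package; this follows the googletrans 4.x interface.
translator = Translator()
result = translator.translate('run', src='en', dest='sk')
print(result.text)        # the Slovak translation
# extra_data may carry dictionary entries (definitions, parts of
# speech) when Google returns them; its structure is undocumented.
print(result.extra_data)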
I made this simple script that reads a CSV file (I don't have Excel installed on my machine), translates everything under the text column, and puts the results in the translated column. It's up to you to process the data differently if needed.
NOTE for the script below:
I used the Google Translation API, which is the paid service
Use pd.read_excel() to read Excel files
Adjust the column number based on your input file
sample_data.csv:
text,dummy_field
run,dummy1
how are you,dummy2
jump,dummy3
Sample script:
import pandas as pd
from google.cloud import translate_v2 as translate

def translate_text(text):
    translate_client = translate.Client()
    # target language code; the question asks for Slovak, which is 'sk'
    target = 'tl'
    result = translate_client.translate(text, target_language=target)
    return result["translatedText"]

def process_data(input_file):
    # df = pd.read_excel('test.xlsx', engine='openpyxl')
    df = pd.read_csv(input_file)
    df['translated'] = df['text'].apply(translate_text)
    # move column 'translated' to second column;
    # this position will depend on your actual data
    second_col = df.pop('translated')
    df.insert(1, 'translated', second_col)
    print(df)
    df.to_csv('./updated_data.csv', index=False)
    df.to_excel('./updated_data.xlsx', index=False)

process_data('sample_data.csv')
Output:
The printed dataframe, the generated updated_data.csv, and the generated updated_data.xlsx (screenshots omitted).
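Separately, since the question asks for one row per definition, here is a small hypothetical pandas sketch of that reshaping step using DataFrame.explode; the words and definition lists are made-up placeholders standing in for whatever your translation step returns:
import pandas as pd

# Hypothetical data: each word paired with a list of definitions.
df = pd.DataFrame({
    'word': ['run', 'jump'],
    'definitions': [['definition 1', 'definition 2'], ['definition 3']],
})
# One row per definition, as the question requires.
df = df.explode('definitions', ignore_index=True)
print(df)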

Is there any other way to parse the Excel file with irregular tables?

I used to use pandas to parse Excel files, and it worked pretty well when the data follows a table format. But recently I got new data that looks like this (screenshot of week-by-week tables):
When I use pandas to read the Excel file, it reads the entire spreadsheet instead of the individual weekly tables. My idea now is to reorganize the tables.
For example, while reading column B from row 10 to row 25, whenever I encounter the value "% Rejection" I would move right and read the percentage for each day (seven cells) to build the new table I want.
However, this feels inefficient, so I'm curious whether there is another way to parse the data. Any recommendation would be great. Thank you.
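For reference, the scan described above could be sketched with openpyxl as follows; the file name, the use of the active sheet, and the assumption that the seven daily values sit in columns C through I are all taken or guessed from the description:
from openpyxl import load_workbook

# Hypothetical workbook name; data_only=True returns computed values.
wb = load_workbook('weekly_report.xlsx', data_only=True)
ws = wb.active

weeks = []
for r in range(10, 26):                                   # rows 10..25
    if ws.cell(row=r, column=2).value == '% Rejection':   # column B
        # read the seven daily percentages to the right (columns C..I)
        daily = [ws.cell(row=r, column=c).value for c in range(3, 10)]
        weeks.append(daily)
print(weeks)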
Edit:
I wonder if I can parse the Excel file into a table that looks like this (target layout screenshot omitted):

Problem when importing table from pdf to python using tabula

When importing data from a PDF using tabula with Python, in some cases I obtain two or more columns merged into one. It does not happen with all the files obtained from the same PDF.
In this case, this is the code used to read the pdf:
from tabula import wrapper

tables = wrapper.read_pdf("933884 cco Saupa 1.pdf", multiple_tables=True, pages='all')
i = 1
for table in tables:
    table.to_excel('output' + str(i) + '.xlsx', index=False)
    i = i + 1
For example, when I print the first item of the dataframe obtained from one of these excel files, named "output_pd":
print(output_pd[0][1])
I obtain:
76) 858000015903708 77) 858000013641969 78)
The five numbers are in a single column, so I cannot treat them individually.
Is it possible to improve the data handling in these cases?
You could try manually editing the data in Excel. Text to Columns, under the Data tab in Excel, lets you split one column into multiple columns without too much work, but you would need to do it for every Excel file, which could be a pain.
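As a programmatic alternative to Text to Columns, a pandas sketch along these lines might work; the file name and the choice of the first column are assumptions:
import pandas as pd

# Split the merged column on whitespace into separate columns.
df = pd.read_excel('output1.xlsx')       # hypothetical output file
merged = df.columns[0]                   # assume the first column holds the merged values
parts = df[merged].astype(str).str.split(expand=True)
print(parts.head())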
By iterating over each item of each column of each dataframe in the list obtained with tabula's wrapper.read_pdf(file) (in this case, tables), it is possible to obtain clean data.
In this case:
prueba = []
for table in tables:
    for columna in table.columns:
        for item in str(table[columna]).split(" "):
            if "858" in str(item):
                prueba.append(item[0:15])
print(prueba[0:5])
This results in:
['858000019596025', '858000015903707', '858000013641975', '858000000610864', '858000013428853']
But tabula's wrapper.read_pdf does not read the whole initial PDF; two values on the last page are missed. So a small manual edit is still necessary.

How to search a text file in Python 3

I have a text file that has lists in it. How would I search for an individual list? I have tried using loops to find it, but every time it gives me an error, since I don't know what to search for.
I tried using an if statement to find it, but it returns -1.
Thanks for the help.
I was doing research on this last night. You can use pandas for this. See here: Load data from txt with pandas. One of the answers talks about lists in text files.
You can use:
data = pd.read_csv('output_list.txt', sep=" ", header=None)
data.columns = ["Name", "b", "c", "etc."]
Add sep=" " to your code, leaving a blank space between the quotes, so pandas can detect the spaces between values and sort them into columns. data.columns is for naming your columns.
With a JSON or XML format, text files become more searchable. In my research I decided to go with an XML approach. Here is a link to a blog that explains how to use Python with XML: http://www.austintaylor.io/lxml/python/pandas/xml/dataframe/2016/07/08/convert-xml-to-pandas-dataframe.
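As a rough illustration of that XML route, newer pandas versions can load XML directly; pandas.read_xml requires pandas 1.3 or later, and the file and column names here are assumptions:
import pandas as pd

# read_xml flattens repeating XML elements into dataframe rows.
df = pd.read_xml('records.xml')          # hypothetical XML file
# Once in a dataframe, searching is a plain filter.
print(df.loc[df['Name'] == 'bob'])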
If you want to search the dataframe, try:
import pandas as pd

txt_file = r'C:\path\to\your\txtfile.txt'  # raw string so the backslashes survive
df = pd.read_table(txt_file, sep=",")
row = df.loc[df['Name'] == 'bob']
print(row)
Now, depending on how your text file is formatted, this will not work for every text file. The idea of a pandas dataframe is that it gives the data a CSV-like structure, making the process repeatable and the results testable. Again, I recommend moving to a JSON or XML format before bringing pandas dataframes into your solution; you can then produce a consistent result that is testable too!

How to read mixed string and number data from csv in matlab and manipulate

I'm looking to write a script for MATLAB that will import data from a CSV file whose first row contains string headers, with each column containing string, date, or numeric data.
I then want to be able to filter the data in MATLAB according to instances of a particular string and number combination.
Any help appreciated!
Cheers!
I would recommend you start by reading the MATLAB documentation.
[num,txt,raw] = xlsread('myExample.xlsx')
This reads numeric, text, and combined data, so if your data is mixed you need the cell array raw. After that, you can do whatever you want with your cell array. (No further detail is given since the OP did not specify how the data would be filtered.)
Try using the readtable function in MATLAB.
It correctly imports a CSV file with a header and mixed data types.
xlsread imported my mixed CSV file quite incorrectly, repeating some rows while keeping the same total number of rows.
I got this after searching for a long time:
MATLAB Central Question/Answer
