GluonTS: how do I specify the X and the Y variable? - pytorch

How do I specify the X and the Y variables for my data frame?
Say I have something like:
Date       | Open       | High       | Close
2015-01-01 | 434.622009 | 436.062012 | 431.869995
I want X to be Open and High, and Y to be Close. Do I just do:
from gluonts.dataset.common import ListDataset

training_data = ListDataset(
[{"start": df.index[0], "target": df["Close"]}],
freq = "1d"
)
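In GluonTS the "target" field is the series being forecast (your Y). Known covariates such as Open and High (your X) usually go into a "feat_dynamic_real" field of the same data entry, shaped as one row per feature and one column per time step. Below is a minimal sketch of building such an entry with pandas only; the helper name `to_gluonts_entry` is hypothetical, and every row except the first is a made-up placeholder value.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in the question's layout; only the first row comes from the question
df = pd.DataFrame(
    {
        "Open": [434.622009, 432.0, 430.5],
        "High": [436.062012, 433.1, 431.9],
        "Close": [431.869995, 430.2, 429.8],
    },
    index=pd.date_range("2015-01-01", periods=3, freq="D"),
)


def to_gluonts_entry(frame, target_col, feature_cols):
    """Build one GluonTS-style data entry: target is Y, dynamic features are X."""
    return {
        "start": frame.index[0],
        "target": frame[target_col].to_numpy(),
        # Transpose so the shape is (num_features, num_timesteps)
        "feat_dynamic_real": frame[feature_cols].to_numpy().T,
    }


entry = to_gluonts_entry(df, "Close", ["Open", "High"])
```

The entry would then be wrapped as `ListDataset([entry], freq="D")`. The estimator also has to be told to use the features; in recent GluonTS versions the PyTorch DeepAR estimator takes a `num_feat_dynamic_real` argument, but check the documentation for your version. Note as well that at prediction time GluonTS generally expects dynamic features to cover the forecast horizon, so Open and High for the forecast days would need to be known in advance.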

Related

Pandas DataFrame: remove strings containing a certain character

I have a pandas DataFrame with a lot of text data. I want to remove all lines starting with a "*". I tried a small example as follows:
import re
import pandas as pd

string1 = '''* This needs to be gone
But this line should stay
*remove
* this too
End'''
string2 = '''* This needs to be gone
But this line should stay
*remove
* this too
End'''
df = pd.DataFrame({'a':[string1,string2]})
df['a'] = df['a'].map(lambda a: (re.sub(r'(?m)^\*.*\n?', '', a, flags=re.MULTILINE)))
It works perfectly. However, when I apply the same function to my original DataFrame, it does not work. Can you help me identify the issue?
df2['NewsText'] = df2['NewsText'].map(lambda a: (re.sub(r'(?m)^\*.*\n?', '', a, flags=re.MULTILINE)))
df2.head()
Please see the attached image of my original DataFrame.
Given your example data:
- .str.split('\n') splits each string into a list of its lines.
- .apply(lambda x: '\n'.join([y for y in x if '*' not in y])) uses a list comprehension to drop every line that contains * and then joins the remaining lines back into one string. You can also join with ' '.join or ''.join.
- Use .apply(lambda x: [y for y in x if '*' not in y]) instead if you want a list rather than one long string.
| | a |
|---:|:--------------------------|
| 0 | * This needs to be gone |
| | But this line should stay |
| | *remove |
| | * this too |
| | End |
| 1 | * This needs to be gone |
| | But this line should stay |
| | *remove |
| | * this too |
| | End |
# remove lines containing '*'
df['a'] = df['a'].str.split('\n').apply(lambda x: '\n'.join([y for y in x if '*' not in y]))
# final
| | a |
|---:|:--------------------------|
| 0 | But this line should stay |
| | End |
| 1 | But this line should stay |
| | End |
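The answer's filter drops any line that contains * anywhere, whereas the question asks for lines that start with *. The sketch below is an alternative, not the answer's code: it keeps lines such as "Keep 2 * 3" and also tolerates two situations that might explain why the regex behaves differently on real data, namely leading spaces before the * and non-string cells such as NaN (on which re.sub raises a TypeError). Whether either applies to the asker's DataFrame is an assumption; `drop_star_lines` is a hypothetical helper name.

```python
import pandas as pd


def drop_star_lines(text):
    """Remove lines whose first non-blank character is '*'; pass non-strings through."""
    if not isinstance(text, str):
        return text  # e.g. NaN/None in a real-world text column
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("*")]
    return "\n".join(kept)


sample = "* This needs to be gone\nBut this line should stay\n  *indented note\nKeep 2 * 3\n* this too\nEnd"
df = pd.DataFrame({"a": [sample, None]})
df["a"] = df["a"].map(drop_star_lines)
```

The first cell becomes "But this line should stay\nKeep 2 * 3\nEnd", and the missing value is left untouched instead of raising an error.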

Count categorical values in DataFrame

I have a DataFrame with only categorical values:
Row | Col1 | Col2 | ... | ColM
1   | X    | Y    | ... | X
2   | Z    | X    | ... | Y
3   | Y    | Z    | ... | X
... | ...  | ...  | ... | ...
N   | X    | Z    | ... | Z
I would like to count how many times each category appears in the whole DataFrame.
Example result:
X - 100 times
Y - 30 times
Z - 210 times
Thank you for your help.
The most performant option is to use np.unique with the return_counts flag set:
import numpy as np
import pandas as pd

u, c = np.unique(df, return_counts=True)
pd.Series(c, index=u)
There's also stack and value_counts, which is much slower, but simple and intuitive:
df.stack().value_counts()
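A small self-contained check that both approaches give the same counts; the column names and values are made-up placeholders. np.unique returns labels in sorted order while value_counts orders by frequency, so the results are compared as dictionaries.

```python
import numpy as np
import pandas as pd

# Made-up frame containing only categorical labels
df = pd.DataFrame({
    "Col1": ["X", "Z", "Y", "X"],
    "Col2": ["Y", "X", "Z", "Z"],
    "Col3": ["X", "Y", "X", "Z"],
})

# Option 1: np.unique flattens all cells and counts each distinct label
u, c = np.unique(df, return_counts=True)
counts_unique = pd.Series(c, index=u)

# Option 2: stack every column into one long Series, then count
counts_stack = df.stack().value_counts()
```

Here both give X: 5, Y: 3, Z: 4 across the 12 cells.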

Can you have wildcard characters in a cell in an IF statement?

I am trying to set up a table that will allow very quick classification of sites across about 50 different characteristics. The method I have in mind, though I'm unsure whether it's possible, is as follows:
Worksheet A: the raw data, about 100 rows x 50 columns, with each cell describing a characteristic of that row; the last column is the overall classification.
Worksheet B: a table of about 5 rows x 50 columns, with columns corresponding to the columns in Worksheet A.
A row of Worksheet B would look something like:
* | * | * | 1 | * | 3 | * | Y | * | ... | * | * | * |
And a row from Worksheet A that corresponds with this data would look something like:
A | B | C | 1 | 5 | 3 | Z | Y | 1 | ... | F | 2 | X | High Priority
The asterisks indicate wildcards: I don't care what those cells contain. All other cells are required conditions. I was then thinking of applying an array formula to the last column to get the classification, something like:
{=IF(AND(A2:BV2='Worksheet B'!$A$2:$BV$2), "High Priority", "Low Priority")}
But Excel treats the asterisks as literal strings, so the comparison evaluates to FALSE.
Is there a way to make this work? Or an alternative method that would be just as simple to implement?
I got to the bottom of it with a reasonably elegant solution. Please post criticisms if there is a situation where this won't work.
{=IF(SUM(IF(A2:BV2='Worksheet B'!A2:BV2,0,1))=COUNTIF('Worksheet B'!A2:BV2,"x"),"Top Priority","Low Priority")}
Here "x" marks the cells whose content I don't care about: instead of "*", I put "x" in those cells, so Worksheet B looks more like:
x | x | x | 1 | x | 3 | x | Y | x | ... | x | x | x |
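If anyone is interested, the formula works by counting all of the mismatched cells and checking that count against the number of "x" cells in Worksheet B. If the two numbers are equal, every mismatch falls in a "don't care" column, so all required conditions are met. This reasoning assumes no cell in Worksheet A literally contains "x"; otherwise a matching "x" can cancel out a failed required condition.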
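To make that caveat concrete, here is the formula's logic modeled in Python (an illustration only, not Excel code; `WILDCARD`, `matches_by_count` and `matches_per_column` are hypothetical names). The count-based check mirrors the posted formula; the per-column check requires each cell to be either a wildcard or an exact match, which avoids the false positive.

```python
WILDCARD = "x"  # marker used in Worksheet B for "don't care" cells


def matches_by_count(row, pattern):
    """Mirror of the posted formula: number of mismatches == number of wildcards."""
    mismatches = sum(1 for a, b in zip(row, pattern) if a != b)
    wildcards = sum(1 for b in pattern if b == WILDCARD)
    return mismatches == wildcards


def matches_per_column(row, pattern):
    """Each column must be a wildcard or an exact match."""
    return all(b == WILDCARD or a == b for a, b in zip(row, pattern))


pattern = ["x", "x", 1, "x", 3, "Y"]
row_ok = ["A", "B", 1, 5, 3, "Y"]
# Column 3 fails (2 != 1), but column 2 literally holds "x" and offsets the count
row_bad = ["A", "x", 2, 5, 3, "Y"]
```

Both checks accept row_ok, but only the per-column check rejects row_bad. In Excel, the per-column idea could be written without an array formula along the lines of =IF(SUMPRODUCT(('Worksheet B'!$A$2:$BV$2<>"x")*(A2:BV2<>'Worksheet B'!$A$2:$BV$2))=0,"High Priority","Low Priority"), which counts required columns that fail; this is an untested suggestion. Also note that Excel's = comparison of text is case-insensitive, while Python's == is not.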

Automatically calculate (or delete) rows in Excel when the first column changes

I have a big table where the first column, X, is the input column, and its range keeps changing.
The Y columns contain formulas and functions (VLOOKUP) that use column X as the lookup value; the other columns are calculated from other sheets.
| A | B | C | D | E
1 | X | Y | Y | Y | Y
2 | X | Y | Y | Y | Y
3 | X | Y | Y | Y | Y
4 | X | Y | Y | Y | Y
I insert (and delete) X values (the actual data) and then double-click the fill handle so the Y columns are calculated, but this doesn't work well because the X range keeps changing. I tried converting the range to a table (Ctrl+T), but that didn't work very well for me either; maybe I'm not using it properly.
Problem:
If I paste new X values, I need the Y columns to be calculated automatically; if I delete a few X rows, the corresponding Y values should be deleted too. Right now I get something like this:
| A | B | C | D | E
1 | X | Y | Y | Y | Y
2 | X | Y | Y | Y | Y
3 | | N/A | N/A | N/A | N/A
4 | | N/A | N/A | N/A | N/A
or:
| A | B | C | D | E
1 | X | Y | Y | Y | Y
2 | X | Y | Y | Y | Y
3 | X | | | |
What I need:
If I remove an X value, the Y values should disappear automatically:
| A | B | C | D | E
1 | X | Y | Y | Y | Y
2 | X | Y | Y | Y | Y
If I add an X value, the Y values should be calculated automatically:
| A | B | C | D | E
1 | X | Y | Y | Y | Y
2 | X | Y | Y | Y | Y
3 | X | Y | Y | Y | Y
Hope it's clear, thank you!
For the Y columns, you can wrap each formula in an IF:
=IF(A1>0, <Y column formula>, "")
or try changing the formula to
=IFERROR(<Y column formula>, "")
Or, if it's still slow and you only change the X column, you can use the code below:
' Goes in the worksheet's own code module (not a standard module)
Private Sub Worksheet_Change(ByVal Target As Range)
    ' Only react to single-cell edits in column A (the X column)
    If Target.Column = 1 And Target.Count = 1 Then
        If IsEmpty(Target.Value) Then
            ' X value cleared: delete the whole row
            Rows(Target.Row).Delete
        ElseIf Target.Row > 1 Then
            ' X value entered or modified: copy the Y formulas (columns B:E) from the row above
            Cells(Target.Row - 1, 2).Resize(1, 4).Copy Cells(Target.Row, 2).Resize(1, 4)
        End If
    End If
End Sub

Insert new columns only when ID is the same in Excel

I have two worksheets with similar table structures that look like this:
| ID | A | B | C |
+--------+-------+-------+-------+
| 1 | x | x | x |
| 4 | x | x | x |
| 12 | x | x | x |
| 3 | x | x | x |
| |
| ... (thousands of rows)
where x are values. Is it possible to create a new table (or worksheet) that combines the two worksheets only on rows where the ID in Worksheet1 matches (similar to a SQL join), so that the resulting table looks like:
| ID | A | B | C | D | E | F |
+--------+-------+-------+-------+-------+-------+-------+
| 1 | x | x | x | x | x | x |
| 4 | x | x | x | x | x | x |
| 12 | x | x | x | x | x | x |
| 3 | x | x | x | x | x | x |
| |
| etc...
Note that rows are only ever added to Worksheet1, never removed. Is VBA necessary, or can it be done with a formula? Thank you.
You can use VLOOKUP to solve this.
VLOOKUP searches for the ID in Sheet2 and returns the corresponding value from the column number you specify in the selected table.
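As a rough illustration of what an exact-match VLOOKUP (range_lookup set to FALSE) does, and how filling one lookup per extra column produces the combined table, here is a plain-Python model. It is not Excel code; the function name `vlookup`, the sheet contents and the placeholder values are all hypothetical. Returning None stands in for Excel's #N/A.

```python
def vlookup(lookup_value, table, col_index):
    """Model of Excel's exact-match VLOOKUP.

    Scans the first column top to bottom and returns the value in the
    1-based column col_index of the first matching row, or None where
    Excel would show #N/A.
    """
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None


# Hypothetical sheets: first column is the ID, the rest are values
sheet1 = [[1, "a1", "b1", "c1"], [4, "a4", "b4", "c4"],
          [12, "a12", "b12", "c12"], [3, "a3", "b3", "c3"]]
sheet2 = [[3, "d3", "e3", "f3"], [1, "d1", "e1", "f1"], [12, "d12", "e12", "f12"]]

# Keep only IDs present in both sheets (like an SQL inner join),
# appending Sheet2's columns 2-4 to each matching Sheet1 row
ids_in_sheet2 = {r[0] for r in sheet2}
combined = [row + [vlookup(row[0], sheet2, col) for col in (2, 3, 4)]
            for row in sheet1 if row[0] in ids_in_sheet2]
```

ID 4 has no match in Sheet2, so it is dropped; in Excel the equivalent rows would show #N/A and could be filtered out or hidden with IFERROR.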
