I uploaded a file to OpenAI with the purpose "answers", but I can't figure out how to access it without getting an error.
This is the contents of my file:
{"text":"We are focusing on improving communication to the customers."}
{"text":"The emphasis on the customer is working well."}
{"text":"Services initiatives"}
{"text":"CMM and focus on customer service."}
{"text":"We are working on CMM to focus on the customer."}
{"text":"Our emphasis on making our customers number one."}
{"text":"Customer relationships are continuing to improve."}
This is the file's metadata when I query OpenAI:
{"object":"file","id":"file-02fdY5PuZ1aO4cnd2r1PxaB7","purpose":"answers","filename":"f3f1c105-5217-4132-8950-a040b6183ed7","bytes":458,"created_at":1648396701,"status":"processed","status_details":null}
Now I ask a question at https://api.openai.com/v1/answers with this POST:
{"file":"file-02fdY5PuZ1aO4cnd2r1PxaB7","question":"What is the main theme?","search_model": "davinci","model": "davinci","examples_context": "In 2017, U.S. life expectancy was 78.6 years.","examples": [["What is human life expectancy in the United States?","The life expectancy in the United States is 78 years."]],"max_tokens": 100,"stop": ["\n", "<|endoftext|>"]}
The server returns a 400 error. If I replace the "file" field in my request with a "documents" array containing the same data, the request succeeds.
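For reference, here is a minimal sketch of the workaround I'm using, assuming the JSONL file shown above is saved locally (the filename `answers.jsonl` and the helper name are mine, for illustration). It reads each line and builds the list I pass as the "documents" array:

```python
import json

def jsonl_to_documents(path):
    """Read a JSONL file (one {"text": ...} object per line) and
    return a plain list of strings for the "documents" field."""
    docs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                docs.append(json.loads(line)["text"])
    return docs
```

The resulting list goes into the request body as `"documents": jsonl_to_documents("answers.jsonl")` in place of the `"file"` field.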
I don't know what else to look at. If anyone can suggest something, or sees an issue in what I am trying to do, I would deeply appreciate hearing from you.
Thanks
I am trying to write Python code with tweepy that will track all tweets from a specific country, since a given date, that contain some of my chosen keywords. I have chosen a lot of keywords, around 24-25.
My keywords are vigilance anticipation interesting ecstacy joy serenity admiration trust acceptance terror fear apprehensive amazement surprize distraction grief sadness pensiveness loathing disgust boredom rage anger annoyance.
For more context, my code so far is:
# Look up the place ID for Canada at country granularity
places = api.geo_search(query="Canada", granularity="country")
place_id = places[0].id

# Page through English tweets from that place since the given date
public_tweets = tweepy.Cursor(api.search,
                              q="place:" + place_id + " since:2020-03-01",
                              lang="en",
                              ).items(num_tweets)
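One way to fold the keywords into the query is to OR them together before adding the place and date operators. A sketch (the helper name `build_keyword_query` is mine; it assumes the standard search API's `OR` syntax, and note that the standard search endpoint caps query length, so a long keyword list may need to be split across several requests):

```python
def build_keyword_query(keywords, place_id, since):
    """Combine keywords into a single OR query, scoped to a place
    and restricted to tweets on or after the given date."""
    keyword_part = " OR ".join(keywords)
    return "({}) place:{} since:{}".format(keyword_part, place_id, since)

# Example: a short keyword list
q = build_keyword_query(["joy", "fear", "anger"], "abc123", "2020-03-01")
print(q)  # (joy OR fear OR anger) place:abc123 since:2020-03-01
```

The resulting string can then be passed as `q=` to `tweepy.Cursor(api.search, ...)` as in the snippet above.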
Please help me with this question as soon as possible.
Thank You
I have a CSV file of song lyrics that I took from Genius. Right now I'm preparing my data. I have two columns, "songs" and "artist". The "songs" column holds a lot of information: title, album, year, lyrics and URL. I need to split the "songs" column into 5 columns.
I tried to split the data by comma like this:
df = pd.read_csv('output.csv', header=None)
df = pd.DataFrame(df[0].str.split(',').tolist())
But with this code I got 122 columns, because every comma inside the lyrics created another column.
I guess I have to keep all my lyrics inside double quotes; then, if I split by comma, the full lyric will remain in a single column.
Does anyone know how I can do that?
Adding 1 sample of the data:
songs,artist
"{'title': 'Berzerk', 'album': 'The Marshall Mathers LP 2', 'year': '2013-08-27', 'lyrics': '[Verse 1]\nNow this shit\'s about to kick off, this party looks wack\nLet\'s take it back to straight hip-hop and start it from scratch\nI\'m \'bout to bloody this track up, everybody get back\nThat\'s why my pen needs a pad, \'cause my rhymes on the rag\nJust like I did with addiction, I\'m \'bout to kick it\nLike a magician, critics I turn to crickets\nGot \'em still on the fence whether to picket\nBut quick to get it impaled when I tell \'em, ""Stick it!""\nSo sick I\'m looking pale, wait, that\'s my pigment\n\'Bout to go ham, ya bish, shout out to Kendrick\nLet\'s bring it back to that vintage Slim, bitch!\nThe art of MCing mixed with da Vinci and MC Ren\nAnd I don\'t mean Stimpy\'s friend, bitch\nBeen Public Enemy since you thought PE was gym, bitch\n\n[Pre-Chorus]\nKick your shoes off, let your hair down\n(And go berserk) all night long\nGrow your beard out, just weird out\n(And go berserk) all night long\n\n[Chorus 1]\nWe\'re gonna rock this house until we knock it down\nSo turn the volume loud\n\'Cause it\'s mayhem \'til the A.M.\nSo, baby, make just like K-Fed\nAnd let yourself go, let yourself go\nSay ""Fuck it!"" before we kick the bucket\nLife\'s too short to not go for broke\nSo everybody, everybody, go berserk, grab your vial, yeah\n\n[Verse 2]\nGuess it\'s just the way that I\'m dressed, ain\'t it?\nKhakis pressed, Nike shoes crispy and fresh laced\nSo I guess it ain\'t that aftershave\nOr cologne that made \'em just faint\nPlus I showed up with a coat fresher than wet paint\nSo if love is a chess game, check mate\nBut girl, your body\'s bangin\', jump me in, dang, bang-bang\nYes siree \'Bob\', I was thinking the same thang\nSo come get on this Kid\'s rock, baw with da baw, dang-dang\nPow-p-p-p-pow, chica, pow, chica, wow-wow\nGot your gal blowin\' up a valve, valve-valve\nAin\'t slowin\' down, throw in the towel, towel-towel\nDumb it down, I don\'t know how, 
huh-huh, how-how\nAt least I know that I don\'t know\nQuestion is, are you bozos smart enough to feel stupid?\nHope so, now ho…\n\n[Pre-Chorus]\nKick your shoes off, let your hair down\n(And go berserk) all night long\nGrow your beard out, just weird out\n(And go berserk) all night long\n\n[Chorus 2]\nWe\'re gonna rock this house until we knock it down\nSo turn the volume loud\n\'Cause it\'s mayhem \'til the A.M.\nSo crank the bass up like crazy\nAnd let yourself go, let yourself go\nSay ""Fuck it!"" before we kick the bucket\nLife\'s too short to not go for broke\nSo everybody, everybody, go berzerk, get your vinyls!\n\n[Scratch]\n\n[Verse 3]\nThey say that love is powerful as cough syrup in styrofoam\nAll I know is I fell asleep and woke up in that Monte Carlo\nWith the ugly Kardashian, Lamar, oh\nSorry yo, we done both set the bar low\nFar as hard drugs are though, that\'s the past\nBut I done did enough codeine to knock Future into tomorrow\nAnd girl, I ain\'t got no money to borrow\nBut I am tryin\' to find a way to get you alone: car note\nOh, Marshall Mathers\nShithead with a potty mouth, get the bar of soap lathered\nKangol\'s and Carheartless Cargos\nGirl, you\'re fixin\' to get your heart broke\nDon\'t be absurd, ma\'am, you birdbrain, baby\nI ain\'t called anybody baby since Birdman, unless you\'re a swallow\nWord, Rick? 
(Word, man, you heard)\nBut don\'t get discouraged, girl\nThis is your jam, unless you got toe jam\n\n[Pre-Chorus]\nKick your shoes off, let your hair down\n(And go berserk) all night long\nGrow your beard out, just weird out\n(And go berserk) all night long\n\n[Chorus 1]\nWe\'re gonna rock this house until we knock it down\nSo turn the volume loud\n\'Cause it\'s mayhem \'til the A.M.\nSo, baby, make just like K-Fed\nAnd let yourself go, let yourself go\nSay ""Fuck it!"" before we kick the bucket\nLife\'s too short to not go for broke\nSo everybody, everybody, go berserk, grab your vial, yeah', 'image': 'https://images.genius.com/a47bb228d28fd8a0e6e73abfabef7832.1000x1000x1.jpg'}",Eminem
Try this. Each cell of the "songs" column is a Python dict literal stored as a string, so parse it with ast.literal_eval and expand the keys into columns:
import ast
import pandas as pd

raw = pd.read_csv("output.csv")
# Parse the dict literal stored as a string in each cell
raw["songs"] = raw["songs"].apply(ast.literal_eval)
# Expand each dict's keys (title, album, year, lyrics, image) into columns
songs = raw["songs"].apply(pd.Series)
result = pd.concat([raw[["artist"]], songs], axis=1)
result.head()
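To see why this works, here is a minimal, self-contained sketch (the shortened dict string below is made up for illustration):

```python
import ast
import pandas as pd

# A tiny stand-in for one cell of the "songs" column
cell = "{'title': 'Berzerk', 'year': '2013-08-27', 'lyrics': 'Now this...'}"

parsed = ast.literal_eval(cell)   # safely parses the Python dict literal
row = pd.Series(parsed)           # dict keys become the Series index

print(list(row.index))  # ['title', 'year', 'lyrics']
```

Splitting on commas can never work here, because the lyrics themselves contain commas; parsing the cell as the dict it actually is sidesteps the problem entirely.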
I was looking through the 1975 Oregon Trail Basic Code and found this line in it:
PRINT LIN(2)
I have searched quite a few places but can't find any reference to it.
Can anyone tell me what this means?
Edit:
Sorry, to be clear: I was asking what PRINT LIN(2) means. Does anyone know?
"Oregon Trail" source of 1975 at www.filfre.net/misc/oregon1975.bas was written in BASIC for a HP-2100 system.
This HP-2100 system was a series of minicomputers produced by Hewlett-Packard.
This system run an interpreted BASIC named "HP Time-Shared BASIC".
This is the reference manual of "TimeShared BASIC/2000 Level F".
About:
PRINT LIN(2)
Generates a carriage return and 2 (two) line feeds.
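For illustration, a tiny sketch (my own, not from the manual) showing the control characters this corresponds to:

```python
def lin(n):
    """Emulate HP Time-Shared BASIC's LIN(n): one carriage return
    followed by n line feeds."""
    return "\r" + "\n" * n

# PRINT LIN(2) thus emits a carriage return plus two line feeds,
# producing one blank line before the next output.
print(repr(lin(2)))  # '\r\n\n'
```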
"Oregon Trail" for year 1978 at www.filfre.net/misc/oregon1978.bas was written using BASIC for "CDC Cyber range of mainframe-class supercomputers of Control Data Corporation (CDC)".
Documentation
http://bitsavers.org/pdf/cdc/cyber/lang/basic/
19983900K_BASIC_Version_3_Reference_Manual_Aug84.pdf
I compare both sources (strip line number without reference by THEN, GOTO or GOSUB) at
Oregon Trail Compare
I'm having a problem inputting tab delimited files into the stanford classifier.
Although I was able to successfully walk through all the included stanford tutorials, including the newsgroup tutorial, when I try to input my own training and test data it doesn't load properly.
At first I thought the problem was that I was saving the data into a tab delimited file using an Excel spreadsheet and it was some kind of encoding issue.
But then I got exactly the same results when I did the following. First I literally typed the demo data below into gedit, making sure to use a tab between the politics/sports class and the ensuing text:
politics Obama today announced a new immigration policy.
sports The NBA all-star game was last weekend.
politics Both parties are eyeing the next midterm elections.
politics Congress votes tomorrow on electoral reforms.
sports The Lakers lost again last night, 102-100.
politics The Supreme Court will rule on gay marriage this spring.
sports The Red Sox report to spring training in two weeks.
sports Messi set a world record for goals in a calendar year in 2012.
politics The Senate will vote on a new budget proposal next week.
politics The President declared on Friday that he will veto any budget that doesn't include revenue increases.
I saved that as myproject/demo-train.txt and a similar file as myproject/demo-test.txt.
I then ran the following:
java -mx1800m -cp stanford-classifier.jar edu.stanford.nlp.classify.ColumnDataClassifier
-trainFile myproject/demo-train.txt -testFile myproject/demo-test.txt
The good news: this actually ran without throwing any errors.
The bad news: since it doesn't extract any features, it can't actually estimate a real model and the probability defaults to 1/n for each item, where n is the number of classes.
So then I ran the same command but with two basic options specified:
java -mx1800m -cp stanford-classifier.jar edu.stanford.nlp.classify.ColumnDataClassifier
-trainFile myproject/demo-train.txt -testFile myproject/demo-test.txt -2.useSplitWords =2.splitWordsRegexp "\s+"
That yielded:
Exception in thread "main" java.lang.RuntimeException: Training dataset could not be processed
at edu.stanford.nlp.classify.ColumnDataClassifier.readDataset(ColumnDataClassifier.java:402)
at edu.stanford.nlp.classify.ColumnDataClassifier.readTrainingExamples(ColumnDataClassifier.java:317)
at edu.stanford.nlp.classify.ColumnDataClassifier.trainClassifier(ColumnDataClassifier.java:1652)
at edu.stanford.nlp.classify.ColumnDataClassifier.main(ColumnDataClassifier.java:1628)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatum(ColumnDataClassifier.java:670)
at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatumFromLine(ColumnDataClassifier.java:267)
at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatum(ColumnDataClassifier.java:396)
... 3 more
These are exactly the same results I got when I used the real data I saved from Excel.
What's more, I don't know how to make sense of the ArrayIndexOutOfBoundsException. When I used readline in Python to print out the raw strings for both the demo files I created and the tutorial files that worked, nothing about the formatting seemed different. So I don't know why this exception would be raised for one set of files but not the other.
Finally, one other quirk. At one point I thought maybe line breaks were the problem. So I deleted all line breaks from the demo files while preserving tab breaks and ran the same command:
java -mx1800m -cp stanford-classifier.jar edu.stanford.nlp.classify.ColumnDataClassifier
-trainFile myproject/demo-train.txt -testFile myproject/demo-test.txt -2.useSplitWords =2.splitWordsRegexp "\s+"
Surprisingly, this time no java exceptions are thrown. But again, it's worthless: it treats the entire file as one observation, and can't properly fit a model as a result.
I've spent 8 hours on this now and have exhausted everything I can think of. I'm new to Java but I don't think that should be an issue here -- according to Stanford's API documentation for ColumnDataClassifier, all that's required is a tab delimited file.
Any help would be MUCH appreciated.
One last note: I've run these same commands with the same files on both Windows and Ubuntu, and the results are the same in each.
Use a properties file. In the Stanford classifier example:
trainFile=20news-bydate-devtrain-stanford-classifier.txt
testFile=20news-bydate-devtest-stanford-classifier.txt
2.useSplitWords=true
2.splitWordsTokenizerRegexp=[\\p{L}][\\p{L}0-9]*|(?:\\$ ?)?[0-9]+(?:\\.[0-9]{2})?%?|\\s+|[\\x80-\\uFFFD]|.
2.splitWordsIgnoreRegexp=\\s+
The number 2 at the start of the last three lines signifies the (0-indexed) column in your TSV file that the option applies to. Your file has only two columns, class (column 0) and text (column 1), so in your case you would use
trainFile=20news-bydate-devtrain-stanford-classifier.txt
testFile=20news-bydate-devtest-stanford-classifier.txt
1.useSplitWords=true
1.splitWordsTokenizerRegexp=[\\p{L}][\\p{L}0-9]*|(?:\\$ ?)?[0-9]+(?:\\.[0-9]{2})?%?|\\s+|[\\x80-\\uFFFD]|.
1.splitWordsIgnoreRegexp=\\s+
or if you want to run with command line arguments
java -mx1800m -cp stanford-classifier.jar edu.stanford.nlp.classify.ColumnDataClassifier -trainFile myproject/demo-train.txt -testFile myproject/demo-test.txt -1.useSplitWords -1.splitWordsRegexp "\s+"
I've faced the same error as you.
Pay attention to the tabs in the text you are classifying.
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
This means that at some point the classifier expected an array of 3 elements after splitting a line on tabs.
I ran a method that counts the number of tabs in each line; any line that does not have exactly two of them is where the error comes from.
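A sketch of such a check (my own helper, not part of the classifier), which reports every line whose tab count differs from what the column options expect:

```python
def find_bad_lines(path, expected_tabs=2):
    """Return (line_number, tab_count) for every line whose tab count
    differs from the expected number of column separators."""
    bad = []
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            tabs = line.rstrip("\n").count("\t")
            if tabs != expected_tabs:
                bad.append((i, tabs))
    return bad
```

For a two-column file like the demo data above (class<TAB>text), you would call it with `expected_tabs=1`; any line it reports is one the classifier cannot split into the expected number of columns.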