How to combine multiple columns in CSV file using pandas? [closed] - python-3.x

I have a CSV file of song lyrics that I took from Genius, and I am now preparing the data. It has two columns, "songs" and "artist". The "songs" column holds a lot of information: title, album, year, lyrics and URL. I need to split the "songs" column into 5 columns.
I tried to split the data by comma like this:
df = pd.read_csv('output.csv', header=None)
df = pd.DataFrame(df[0].str.split(',').tolist())
But with this code I got 122 columns, because every comma inside the lyrics created another column.
I guess I have to keep all the lyrics inside double quotes, so that splitting by comma leaves the full lyrics in a single column.
Does anyone know how I can do that?
Here is one sample row of the data:
songs,artist
"{'title': 'Berzerk', 'album': 'The Marshall Mathers LP 2', 'year': '2013-08-27', 'lyrics': '[Verse 1]\nNow this shit\'s about to kick off, this party looks wack\nLet\'s take it back to straight hip-hop and start it from scratch\nI\'m \'bout to bloody this track up, everybody get back\nThat\'s why my pen needs a pad, \'cause my rhymes on the rag\nJust like I did with addiction, I\'m \'bout to kick it\nLike a magician, critics I turn to crickets\nGot \'em still on the fence whether to picket\nBut quick to get it impaled when I tell \'em, ""Stick it!""\nSo sick I\'m looking pale, wait, that\'s my pigment\n\'Bout to go ham, ya bish, shout out to Kendrick\nLet\'s bring it back to that vintage Slim, bitch!\nThe art of MCing mixed with da Vinci and MC Ren\nAnd I don\'t mean Stimpy\'s friend, bitch\nBeen Public Enemy since you thought PE was gym, bitch\n\n[Pre-Chorus]\nKick your shoes off, let your hair down\n(And go berserk) all night long\nGrow your beard out, just weird out\n(And go berserk) all night long\n\n[Chorus 1]\nWe\'re gonna rock this house until we knock it down\nSo turn the volume loud\n\'Cause it\'s mayhem \'til the A.M.\nSo, baby, make just like K-Fed\nAnd let yourself go, let yourself go\nSay ""Fuck it!"" before we kick the bucket\nLife\'s too short to not go for broke\nSo everybody, everybody, go berserk, grab your vial, yeah\n\n[Verse 2]\nGuess it\'s just the way that I\'m dressed, ain\'t it?\nKhakis pressed, Nike shoes crispy and fresh laced\nSo I guess it ain\'t that aftershave\nOr cologne that made \'em just faint\nPlus I showed up with a coat fresher than wet paint\nSo if love is a chess game, check mate\nBut girl, your body\'s bangin\', jump me in, dang, bang-bang\nYes siree \'Bob\', I was thinking the same thang\nSo come get on this Kid\'s rock, baw with da baw, dang-dang\nPow-p-p-p-pow, chica, pow, chica, wow-wow\nGot your gal blowin\' up a valve, valve-valve\nAin\'t slowin\' down, throw in the towel, towel-towel\nDumb it down, I don\'t know how, 
huh-huh, how-how\nAt least I know that I don\'t know\nQuestion is, are you bozos smart enough to feel stupid?\nHope so, now ho…\n\n[Pre-Chorus]\nKick your shoes off, let your hair down\n(And go berserk) all night long\nGrow your beard out, just weird out\n(And go berserk) all night long\n\n[Chorus 2]\nWe\'re gonna rock this house until we knock it down\nSo turn the volume loud\n\'Cause it\'s mayhem \'til the A.M.\nSo crank the bass up like crazy\nAnd let yourself go, let yourself go\nSay ""Fuck it!"" before we kick the bucket\nLife\'s too short to not go for broke\nSo everybody, everybody, go berzerk, get your vinyls!\n\n[Scratch]\n\n[Verse 3]\nThey say that love is powerful as cough syrup in styrofoam\nAll I know is I fell asleep and woke up in that Monte Carlo\nWith the ugly Kardashian, Lamar, oh\nSorry yo, we done both set the bar low\nFar as hard drugs are though, that\'s the past\nBut I done did enough codeine to knock Future into tomorrow\nAnd girl, I ain\'t got no money to borrow\nBut I am tryin\' to find a way to get you alone: car note\nOh, Marshall Mathers\nShithead with a potty mouth, get the bar of soap lathered\nKangol\'s and Carheartless Cargos\nGirl, you\'re fixin\' to get your heart broke\nDon\'t be absurd, ma\'am, you birdbrain, baby\nI ain\'t called anybody baby since Birdman, unless you\'re a swallow\nWord, Rick? 
(Word, man, you heard)\nBut don\'t get discouraged, girl\nThis is your jam, unless you got toe jam\n\n[Pre-Chorus]\nKick your shoes off, let your hair down\n(And go berserk) all night long\nGrow your beard out, just weird out\n(And go berserk) all night long\n\n[Chorus 1]\nWe\'re gonna rock this house until we knock it down\nSo turn the volume loud\n\'Cause it\'s mayhem \'til the A.M.\nSo, baby, make just like K-Fed\nAnd let yourself go, let yourself go\nSay ""Fuck it!"" before we kick the bucket\nLife\'s too short to not go for broke\nSo everybody, everybody, go berserk, grab your vial, yeah', 'image': 'https://images.genius.com/a47bb228d28fd8a0e6e73abfabef7832.1000x1000x1.jpg'}",Eminem

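Try this. Each value in the "songs" column is a Python dict literal, so parse it with ast.literal_eval and expand the resulting dicts into columns: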
import ast
import pandas as pd
raw = pd.read_csv("output.csv")
raw["songs"] = raw["songs"].apply(lambda x: ast.literal_eval(x))
songs = raw["songs"].apply(pd.Series)
result = pd.concat([raw[["artist"]], songs], axis=1)
result.head()
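The same idea can be checked on a tiny in-memory sample before running it on the real file. This is only a sketch: the two rows below are made up (they are not from output.csv), and it builds the new columns with pd.DataFrame over a list of dicts rather than apply(pd.Series), which is an alternative and not the method used above.

```python
import ast
import io

import pandas as pd

# Hypothetical sample in the same shape as output.csv: the "songs" field is a
# double-quoted Python dict literal, so the commas inside it stay in one cell.
sample_csv = """songs,artist
"{'title': 'Song A', 'year': '2013', 'lyrics': 'one, two, three'}",Artist 1
"{'title': 'Song B', 'year': '2014', 'lyrics': 'four, five'}",Artist 2
"""

# read_csv honours the quotes, so each row has exactly two fields.
raw = pd.read_csv(io.StringIO(sample_csv))

# Turn each dict literal into a real dict, then into one column per key.
parsed = raw["songs"].apply(ast.literal_eval)
songs = pd.DataFrame(parsed.tolist(), index=raw.index)

result = pd.concat([raw[["artist"]], songs], axis=1)
print(result)
```

The original attempt produced 122 columns because str.split(',') ignores the quoting, while the CSV parser does not.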

Related

Fine Tuning GPT-3 for Consistent Output Format

I am trying to use Open AI API to create quiz questions with three incorrect answers and one correct answer. The prompt I use is
`Write a quiz on ${quiz_topic} with 5 questions. Each question has 3 incorrect answers and 1 correct answer. The correct answer is always the last answer. Write each answer on a separate line`
I would like to parse ChatGPT's answer, but the format is inconsistent. For instance, if quiz_topic is "literature", I get the following response:
Q1. Who wrote the novel "The Catcher in the Rye"?
A. J.D. Salinger
B. Ernest Hemingway
C. F. Scott Fitzgerald
D. Mark Twain
This is the correct format. But if I use "U.S History", I get the following output:
Q1: Who was the first president of the United States?
A1: George Washington Jr., Thomas Jefferson, John Adams, George Washington
This is a different format. So my goal is to fine-tune it using the article here
I have created a training set of 200 lines. Here is a sample:
{"prompt":"Write a quiz on general knowledge with 5 questions. Each question has 3 incorrect answers and 1 correct answer. The correct answer is always the last answer.Write each answer on a separate line \n\n---\n\n","completion":" Q. What is the Italian word for "tomato"?\nA. Aglio\nA. Cipolla\nA. Peperoncino\nA. Pomodoro\n\nQ. Which restaurant's mascot is a clown?\nA. Whataburger\nA. Burger King\nA. Sonic\nA. McDonald's\n\nQ. Which of these colours is NOT featured in the logo for Google?\nA. Yellow\nA. Blue\nA. Green\nA. Pink\n\nQ. In 2013 how much money was lost by Nigerian scams?\nA. $95 Million\nA. $956 Million\nA. $2.7 Billion\nA. $12.7 Billion\n\nQ. What is the name of Poland in Polish?\nA. Pupcia\nA. Polszka\nA. Póland\nA. Polska\n\n \n\n###\n\n"}
{"prompt":"Write a quiz on books with 5 questions. Each question has 3 incorrect answers and 1 correct answer. The correct answer is always the last answer.Write each answer on a separate line \n\n---\n\n","completion":" Q. What is the name of Eragon's dragon in "Eragon"?\nA. Glaedr\nA. Thorn\nA. Arya\nA. Saphira\n\nQ. In the "The Hobbit", who kills Smaug?\nA. Bilbo Baggins\nA. Gandalf the Grey\nA. Frodo\nA. Bard\n\nQ. What is Hermione Granger's middle name?\nA. Jane\nA. Emma\nA. Jo\nA. Jean\n\nQ. According to The Hitchhiker's Guide to the Galaxy book, the answer to life, the universe and everything else is...\nA. Loving everyone around you\nA. Chocolate\nA. Death\nA. 42\n\nQ. What is the name of the three headed dog in Harry Potter and the Sorcerer's Stone?\nA. Spike\nA. Poofy\nA. Spot\nA. Fluffy\n\n \n\n###\n\n"}
When I run the validation tool with the command
openai tools fine_tunes.prepare_data -f training.jsonl
I get the following message
- All prompts start with prefix `Write a quiz on `. Fine-tuning doesn't require the instruction specifying the task, or a few-shot example scenario. Most of the time you should only add the input data into the prompt, and the desired output into the completion
I don't understand why I must remove "Write a quiz on", so I suspect I have misunderstood how to fine-tune a model for consistent formatting.
Can anybody shed light on how to make sure I get the same formatting with the same prompt?
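One reading of the validator message is that the prompt only needs the varying input (here, the topic) followed by the fixed separator, because the task itself is learned from the examples. The sketch below builds a training line that way. The helper name make_record and the choice to keep only the topic in the prompt are assumptions for illustration. Using json.dumps also escapes inner double quotes such as those around "The Hobbit", which appear unescaped in the sample lines as displayed above.

```python
import json

SEPARATOR = "\n\n---\n\n"  # prompt suffix, as used in the sample above
STOP = "\n\n###\n\n"       # completion suffix, as used in the sample above


def make_record(topic, questions):
    """Build one JSONL line (hypothetical helper).

    questions: list of (question, [three wrong answers], correct answer).
    The correct answer is always written last, matching the desired format.
    """
    blocks = []
    for question, wrong, correct in questions:
        lines = [f"Q. {question}"] + [f"A. {a}" for a in wrong + [correct]]
        blocks.append("\n".join(lines))
    # Completions start with a space, as in the sample training lines.
    completion = " " + "\n\n".join(blocks) + STOP
    return json.dumps(
        {"prompt": topic + SEPARATOR, "completion": completion},
        ensure_ascii=False,
    )


line = make_record(
    "books",
    [('In "The Hobbit", who kills Smaug?',
      ["Bilbo Baggins", "Gandalf the Grey", "Frodo"], "Bard")],
)
print(line)
```

At inference time the prompt would then need the same shape (topic plus separator) for the learned format to apply.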

I've been working on a discord bot for at least a week and just today, my joke command started saying invalid syntax

I have been working on a discord.py bot when yesterday, on my bus, some of my code got deleted. Everything was working fine, but now my joke command doesn't work. I am pretty new to coding as this is my 2nd week, but can someone explain to me why this says invalid syntax?
#bot.command()
async def joke(ctx):
    responses = [
        "Two hunters are out in the woods when one of them collapses. He's not breathing so his friend calls 911. My friend is dead! What should I do? The operator replies, Calm down sir, first make sure that he's really dead. There's a silence, then a loud bang. Back on the phone, the guy says, Ok, now what?",
        "I threw a boomerang a few years ago; now I live in constant fear.",
        "Someone stole my mood ring. I don't know how I feel about that.",
        "Women call me ugly until they find out how much money I make. Then they call me ugly and poor.",
        "You're not completely useless: you can serve as a bad example.",
        "I broke my finger last week. On the other hand, I'm okay.",
        "Appearantly someone in London gets stabbed every 52 seconds. Poor bastard.",
        "A Roman legionaire walks into a bar, holds up two fingers, and says, Five Beers please!"
    await ctx.send(f'{random.choice(responses)}')
You're missing the closing bracket ] at the end of your list, so Python reports the syntax error on the line that follows it.
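A minimal sketch of the corrected shape, with the list shortened to three of the jokes. FakeContext is a hypothetical stand-in so the snippet runs without a bot. In the real bot the coroutine would presumably carry the @bot.command() decorator, which is an assumption, since the question shows that line starting with #.

```python
import asyncio
import random

responses = [
    "I threw a boomerang a few years ago; now I live in constant fear.",
    "Someone stole my mood ring. I don't know how I feel about that.",
    "I broke my finger last week. On the other hand, I'm okay.",
]  # <- the closing bracket that was missing


# In the bot this coroutine would be registered as a command
# (assumed: @bot.command() directly above the def).
async def joke(ctx):
    await ctx.send(random.choice(responses))


class FakeContext:
    """Hypothetical replacement for discord.py's ctx, for local testing only."""

    def __init__(self):
        self.sent = []

    async def send(self, message):
        self.sent.append(message)


ctx = FakeContext()
asyncio.run(joke(ctx))
print(ctx.sent[0])
```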

Python/Selenium - How to process extracted data from App store

I am using Selenium/Python to parse reviews from Apple app store. I used the following code to extract the data for the first five reviews:
URL: https://apps.apple.com/us/app/lemonade-insurance/id1055653645#see-all/reviews
wait = WebDriverWait(driver, 5)
response_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
response_container = []
for e in response_ratings[:5]:
    response_container.append(e.get_attribute('innerHTML'))
print(response_container[0])
Then, I print the first output
For the first review, I expect to get: stars 5 out of 5; date July 6, 2019; title Convenient and Affordable!!!!; review The Lemonade app is so easy to use as well as having affordable rates!...; and developer response Thanks so much for your awesome review!! We're so happy to have you in...
How do I get the above info? Thank you in advance for the help
You can use BeautifulSoup to parse the innerHTML and get what you're looking for.
One way of doing it would be:
import re
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
link = 'https://apps.apple.com/us/app/lemonade-insurance/id1055653645#see-all/reviews'
stars = re.compile(r"\d out of \d")
with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 10)
    driver.get(link)
    elements = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
    for elem in elements:
        s = BeautifulSoup(elem.get_attribute("innerHTML"), "html.parser")
        review_date = s.find("time").text
        review_body = s.find("p").text
        review_title = s.find("h3", {"data-test-customer-review-title": ""}).text.strip()
        review_stars = ''.join(re.findall(stars, str(s.find("figure"))))
        dev_response = s.find_all("p", {"data-test-bidi": ""})
        print(f"{review_title} | {review_date} | {review_stars}")
        print(review_body)
        print(dev_response[1].text if len(dev_response) > 1 else "")
        print("-" * 80)
This prints:
Convenient and Affordable!!!! | 07/06/2019 | 5 out of 5
The Lemonade app is so easy to use as well as having affordable rates! It took me all of 15 minutes to sign-up, pick a coverage and a deductible. Very nice customer service as well as very informative. At anytime of any day of the week I can log onto my account and check everything as well as make any necessary changes that I may need or want. The chat option feature works fantastic! Whenever I have a question I just go to the chat feature and within seconds someone is there to help and answer all my questions regarding their services and my plan coverage. I wished they had a referral feature cause I’ve already set up a couple of family members with the company as well. They were amazed that it only took about 10-15 minutes to setup and just how affordable it is!! I’ve gotten quotes from so many other companies but the monthly payments and deductible were too expensive, I was a little hesitant at first but I said hey I should at least give it a try and so far so good!!! I’m hoping that it’ll never come to the point where I’ll actually need to file a claim but if so I feel confident that the process will be easy and stress free considering how much stress I’m going to actually have due to a burglary or theft. I have faith that we’re going to have a very long relationship. Thanks to all the developers of the Lemonade App, the name of the company is nice too!!!!
Thanks so much for your awesome review!! We're so happy to have you in our Lemonade community!
--------------------------------------------------------------------------------
Made the best lemonade I’ve ever had! | 07/01/2018 | 5 out of 5
I have been telling EVERYONE about Lemonade. I don’t know how, but, getting insurance through your company is actually FUN! I have never had so much FUN doing a chore that typically involves a boring Q & A. The app really made me feel like I was getting insurance through a friend. I smiled with the shout out to my horoscope sign after entering my birthday, I loved the “making lemonade” process when getting the quote, and (because I have a future date for the policy to start) I absolutely adore the countdown. I look at it almost daily and become even more excited for my move (and I really don’t like having to move so this is really helping). The use of unclaimed funds going to charity actually makes my heart melt. By providing freedom to choose what organization you would like to contribute to truly makes me feel like I am giving back in some way, and it is beyond noble and inspiring for you to use that money to help others turn lemons into lemonade. Also, your customer service has been impeccable. Every question I have had has been answered quickly and by a friendly representative of the company. I don’t know much about the insurance world, other than we need to have it, but, you make me want to work for you!!! Where do I sign up?
--------------------------------------------------------------------------------

Searching a lot of keywords on twitter via tweepy

I am trying to write Python code with tweepy that collects all tweets from a specific country, posted since a given date, that contain any of a chosen set of keywords. I have chosen a lot of keywords, about 24-25.
My keywords are: vigilance, anticipation, interesting, ecstacy, joy, serenity, admiration, trust, acceptance, terror, fear, apprehensive, amazement, surprize, distraction, grief, sadness, pensiveness, loathing, disgust, boredom, rage, anger, annoyance.
For context, my code so far is:
places = api.geo_search(query="Canada",granularity="country")
place_id = places[0].id
public_tweets = tweepy.Cursor(api.search,
                              q="place:" + place_id + " since:2020-03-01",
                              lang="en",
                              ).items(num_tweets)
Please help me with this question as soon as possible.
Thank You
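One possible approach is to join the keywords with OR inside the q string and, because search queries have a length limit, split them over several queries. The sketch below only builds the query strings and does not call the Twitter API. The helper name build_queries, the 500-character default, and the assumption that the search endpoint accepts OR with parentheses for grouping are all illustrative and should be checked against the API documentation.

```python
def build_queries(keywords, place_id, since, max_len=500):
    """Group keywords into OR-queries no longer than max_len characters.

    Hypothetical helper: max_len is an assumed query-length limit.
    """
    suffix = f" place:{place_id} since:{since}"

    def render(words):
        return "(" + " OR ".join(words) + ")" + suffix

    queries, group = [], []
    for word in keywords:
        # Start a new query when adding this word would exceed the limit.
        if group and len(render(group + [word])) > max_len:
            queries.append(render(group))
            group = [word]
        else:
            group.append(word)
    if group:
        queries.append(render(group))
    return queries


keywords = ["joy", "fear", "anger"]
# "abc123" is a placeholder; the real value comes from places[0].id.
for q in build_queries(keywords, "abc123", "2020-03-01"):
    print(q)
```

Each resulting string would be passed as q= to tweepy.Cursor(api.search, ...) in place of the single query in the code above, and the results of the separate queries combined.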

How would I write a script to organize a list into a specific table-format?

I have a list of approximately 4,000-odd ancient Chinese proverbs I would like to import into Pleco (a Chinese dictionary app) for flashcards. However, Pleco needs them in a specific format (a table separated by tabs) and to do so manually would take forever.
Any idea how I would implement a script to automatically format the list?
e.g.
Ài fàn yǒu fàn; xī yī yǒu yī.
爱饭有饭, 惜衣有衣。
愛飯有飯, 惜衣有衣。
[Those who] treasure [their] food [will always] have food [and those who] take care of [their] clothing [will always] have clothes [to wear].
[An admonition to thrift; see also bùyī nuǎn below.]
CLOTHING FOOD THRIFT
[A2]
Ái gǒu yǎo de rén bù dōu shì zéi.
挨狗咬的人不都是贼。
挨狗咬的人不都是賊。
(lit) Not all who are bitten by dogs are thieves.
(fig) One should not make judgments based on superficial appearances. Things are not always as they (first) appear.
APPEARANCES JUDGMENTS
[A3]
Áiguo shé yǎo, jiàn shàn pǎo.
挨过蛇咬, 见鳝跑。
挨過蛇咬, 見鱔跑。
(lit) [One who has been] bitten by a snake [at the] sight [of an] eel [will] run away.
(fig) “Once bitten, twice shy.”
[See also yīzhāo bèi shé yǎo below.]
EXPERIENCE LEARNING
into:
爱饭有饭, 惜衣有衣 愛飯有飯, 惜衣有衣 [Those who] treasure [their] food [will always] have food [and those who] take care of [their] clothing [will always] have clothes [to wear].
[An admonition to thrift; see also bùyī nuǎn below.]
挨狗咬的人不都是贼。 挨狗咬的人不都是賊。 (lit) Not all who are bitten by dogs are thieves.
(fig) One should not make judgments based on superficial appearances. Things are not always as they (first) appear.
挨过蛇咬, 见鳝跑。 挨過蛇咬, 見鱔跑。 (lit) [One who has been] bitten by a snake [at the] sight [of an] eel [will] run away.
(fig) “Once bitten, twice shy.”
[See also yīzhāo bèi shé yǎo below.]
It needs to be in the form: Simplified characters TAB Traditional characters TAB TAB Definition
Please leave out the Pinyin (e.g. Ài fàn yǒu fàn; xī yī yǒu yī.), the identifier (e.g. [A1], [A2]) and the last line with the topics (e.g. CLOTHING FOOD THRIFT)
Further down the list the entries will not always follow this exact layout. Sometimes there will be more or fewer lines per proverb.
Thank you all so much! StackOverflow has been a huge help to me in my coding adventures.
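A rough sketch of one way to do this in Python, under these assumptions: an identifier line such as [A2] separates entries; within an entry the first line is the Pinyin, the second the simplified text and the third the traditional text; every remaining line is part of the definition; a line made only of capital letters and spaces is the topics line. The function name to_pleco_rows is hypothetical, and joining multi-line definitions with a space (so that each card stays on one line) is a guess about what the import expects.

```python
import re

ID_LINE = re.compile(r"\[[A-Z]+\d+\]")    # e.g. [A2]
TOPIC_LINE = re.compile(r"[A-Z][A-Z ]*")  # e.g. CLOTHING FOOD THRIFT


def to_pleco_rows(text, joiner=" "):
    """Return 'Simplified TAB Traditional TAB TAB Definition' rows."""
    rows, block = [], []

    def flush():
        # block = [pinyin, simplified, traditional, definition line(s)...]
        if len(block) >= 4:
            definition = joiner.join(block[3:])
            rows.append(f"{block[1]}\t{block[2]}\t\t{definition}")
        block.clear()

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if ID_LINE.fullmatch(line):
            flush()              # identifier marks the start of a new entry
        elif TOPIC_LINE.fullmatch(line):
            continue             # drop the topics line
        else:
            block.append(line)
    flush()
    return rows


# Two entries abridged from the example in the question.
sample = """Ài fàn yǒu fàn; xī yī yǒu yī.
爱饭有饭, 惜衣有衣。
愛飯有飯, 惜衣有衣。
[Those who] treasure [their] food [will always] have food.
[An admonition to thrift.]
CLOTHING FOOD THRIFT
[A2]
Ái gǒu yǎo de rén bù dōu shì zéi.
挨狗咬的人不都是贼。
挨狗咬的人不都是賊。
(lit) Not all who are bitten by dogs are thieves.
APPEARANCES JUDGMENTS
"""

for row in to_pleco_rows(sample):
    print(row)
```

Because the definition is simply "everything after the third line", entries with more or fewer definition lines are handled the same way. Entries that lack the Pinyin line or one of the two character lines would need extra handling.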
