Using unstack with Pandas - python-3.x
I am getting an exception when applying unstack, and would like to understand it.
For a reproducible example, load the data with pd.DataFrame(json.loads(titanic)), where titanic is the string:
'{"home.dest":{"0":"St Louis, MO","1":"Montreal, PQ \\/ Chesterville, ON","2":"Montreal, PQ \\/ Chesterville, ON","3":"Montreal, PQ \\/ Chesterville, ON","4":"Montreal, PQ \\/ Chesterville, ON","5":"New York, NY","6":"Hudson, NY","7":"Belfast, NI","8":"Bayside, Queens, NY","9":"Montevideo, Uruguay","10":"New York, NY","11":"New York, NY","12":"Paris, France","13":null,"14":"Hessle, Yorks","15":"New York, NY","16":"Montreal, PQ","17":"Montreal, PQ","18":null,"19":"Winnipeg, MN"},"pclass":{"0":1,"1":1,"2":1,"3":1,"4":1,"5":1,"6":1,"7":1,"8":1,"9":1,"10":1,"11":1,"12":1,"13":1,"14":1,"15":1,"16":1,"17":1,"18":1,"19":1},"survived":{"0":1,"1":1,"2":0,"3":0,"4":0,"5":1,"6":1,"7":0,"8":1,"9":0,"10":0,"11":1,"12":1,"13":1,"14":1,"15":0,"16":0,"17":1,"18":1,"19":0},"name":{"0":"Allen, Miss. Elisabeth Walton","1":"Allison, Master. Hudson Trevor","2":"Allison, Miss. Helen Loraine","3":"Allison, Mr. Hudson Joshua Creighton","4":"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)","5":"Anderson, Mr. Harry","6":"Andrews, Miss. Kornelia Theodosia","7":"Andrews, Mr. Thomas Jr","8":"Appleton, Mrs. Edward Dale (Charlotte Lamson)","9":"Artagaveytia, Mr. Ramon","10":"Astor, Col. John Jacob","11":"Astor, Mrs. John Jacob (Madeleine Talmadge Force)","12":"Aubart, Mme. Leontine Pauline","13":"Barber, Miss. Ellen \\"Nellie\\"","14":"Barkworth, Mr. Algernon Henry Wilson","15":"Baumann, Mr. John D","16":"Baxter, Mr. Quigg Edmond","17":"Baxter, Mrs. James (Helene DeLaudeniere Chaput)","18":"Bazzani, Miss. Albina","19":"Beattie, Mr. Thomson"},"sex":{"0":"female","1":"male","2":"female","3":"male","4":"female","5":"male","6":"female","7":"male","8":"female","9":"male","10":"male","11":"female","12":"female","13":"female","14":"male","15":"male","16":"male","17":"female","18":"female","19":"male"},"age":{"0":29.0,"1":0.92,"2":2.0,"3":30.0,"4":25.0,"5":48.0,"6":63.0,"7":39.0,"8":53.0,"9":71.0,"10":47.0,"11":18.0,"12":24.0,"13":26.0,"14":80.0,"15":null,"16":24.0,"17":50.0,"18":32.0,"19":36.0},"sibsp":{"0":0,"1":1,"2":1,"3":1,"4":1,"5":0,"6":1,"7":0,"8":2,"9":0,"10":1,"11":1,"12":0,"13":0,"14":0,"15":0,"16":0,"17":0,"18":0,"19":0},"parch":{"0":0,"1":2,"2":2,"3":2,"4":2,"5":0,"6":0,"7":0,"8":0,"9":0,"10":0,"11":0,"12":0,"13":0,"14":0,"15":0,"16":1,"17":1,"18":0,"19":0},"ticket":{"0":"24160","1":"113781","2":"113781","3":"113781","4":"113781","5":"19952","6":"13502","7":"112050","8":"11769","9":"PC 17609","10":"PC 17757","11":"PC 17757","12":"PC 17477","13":"19877","14":"27042","15":"PC 17318","16":"PC 17558","17":"PC 17558","18":"11813","19":"13050"},"fare":{"0":211.3375,"1":151.55,"2":151.55,"3":151.55,"4":151.55,"5":26.55,"6":77.9583,"7":0.0,"8":51.4792,"9":49.5042,"10":227.525,"11":227.525,"12":69.3,"13":78.85,"14":30.0,"15":25.925,"16":247.5208,"17":247.5208,"18":76.2917,"19":75.2417},"cabin":{"0":"B5","1":"C22 C26","2":"C22 C26","3":"C22 C26","4":"C22 C26","5":"E12","6":"D7","7":"A36","8":"C101","9":null,"10":"C62 C64","11":"C62 C64","12":"B35","13":null,"14":"A23","15":null,"16":"B58 B60","17":"B58 
B60","18":"D15","19":"C6"},"embarked":{"0":"S","1":"S","2":"S","3":"S","4":"S","5":"S","6":"S","7":"S","8":"S","9":"C","10":"C","11":"C","12":"C","13":"S","14":"S","15":"S","16":"C","17":"C","18":"C","19":"C"},"boat":{"0":"2","1":"11","2":null,"3":null,"4":null,"5":"3","6":"10","7":null,"8":"D","9":null,"10":null,"11":"4","12":"9","13":"6","14":"B","15":null,"16":null,"17":"6","18":"8","19":"A"},"body":{"0":null,"1":null,"2":null,"3":135.0,"4":null,"5":null,"6":null,"7":null,"8":null,"9":22.0,"10":124.0,"11":null,"12":null,"13":null,"14":null,"15":null,"16":null,"17":null,"18":null,"19":null}}'
I create a multi index with the following command:
titanic = titanic.set_index(['name', 'home.dest'])
Then I want to unstack.
titanic.unstack(level = 'home.dest')
I get the following exception message:
ValueError: Index contains duplicate entries, cannot reshape
The error is saying that the combination of columns from which you built the MultiIndex is not unique, so Pandas cannot unstack unambiguously.
One way to fix this is to guarantee the uniqueness by adding a counter.
counts = titanic.groupby(['name', 'home.dest']).cumcount().rename('Counter')
titanic = titanic.set_index(['name', 'home.dest', counts])
Then your unstack will work:
titanic.unstack(level = 'home.dest')
But I'd advise unstacking both of the new levels instead:
titanic.unstack(['home.dest', 'Counter'])
Otherwise, you'll have to aggregate with a groupby first:
titanic.groupby(['name', 'home.dest']).first().unstack()
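Put together, a minimal sketch of the counter fix (assuming titanic holds the JSON string shown above; df and wide are just illustrative names):

import json
import pandas as pd

df = pd.DataFrame(json.loads(titanic))  # titanic is the JSON string above

# cumcount() numbers repeated (name, home.dest) pairs 0, 1, 2, ...
# so the resulting three-level index is guaranteed to be unique
counts = df.groupby(['name', 'home.dest']).cumcount().rename('Counter')
df = df.set_index(['name', 'home.dest', counts])

wide = df.unstack(['home.dest', 'Counter'])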
Related
is there a method to detect person and associate a text?
I have a text like:

Take a loot at some of the first confirmed Forum speakers: John Sequiera Graduated in Biology at Facultad de Ciencias Exactas y Naturales, University of Buenos Aires, Argentina. In 2004 obtained a PhD in Biology (Molecular Neuroscience), at University of Buenos Aires, mentored by Prof. Marcelo Rubinstein. Between 2005 and 2008 pursued postdoctoral training at Pasteur Institute (Paris) mentored by Prof Jean-Pierre Changeux, to investigate the role of nicotinic receptors in executive behaviors. Motivated by a deep interest in investigating human neurological diseases, in 2009 joined the Institute of Psychiatry at King’s College London where she performed basic research with a translational perspective in the field of neurodegeneration. Since 2016 has been chief of instructors / Adjunct professor at University of Buenos Aires, Facultad de Ciencias Exactas y Naturales. Tom Gonzalez is a professor of Neuroscience at the Sussex Neuroscience, School of Life Sciences, University of Sussex. Prof. Baden studies how neurons and networks compute, using the beautiful collection of circuits that make up the vertebrate retina as a model.

I want to have as output:

[{"person": "John Sequiera", "content": "Graduated in Biology at Facultad...."}, {"person": "Tom Gonzalez", "content": "is a professor of Neuroscience at the Sussex..."}]

So we want to get the NER label PER for the person, and in content we put everything after the detected person until we find a new person in the text... Is it possible? I tried to use spaCy to extract the NER, but I have difficulty getting the content:

import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
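One possible sketch (my own suggestion, not from the original post): slice the raw text between consecutive PERSON entities (spaCy's English models label people as PERSON), treating everything up to the next detected name as that person's content:

import spacy

nlp = spacy.load("en_core_web_lg")  # requires the model to be installed

text = "...the forum text from the question..."  # placeholder
doc = nlp(text)

# Keep only PERSON entities, in document order
people = [ent for ent in doc.ents if ent.label_ == "PERSON"]

results = []
for i, ent in enumerate(people):
    # content runs from the end of this name to the start of the next one
    end = people[i + 1].start_char if i + 1 < len(people) else len(text)
    results.append({"person": ent.text,
                    "content": text[ent.end_char:end].strip()})

print(results)

This assumes the model actually detects each speaker's name; any name it misses will be merged into the previous person's content.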
How to resolve pandas length error for rows/columns
I raised the SO question here and was blessed to have an answer from @Scott Boston. However, I am raising another question about an error, ValueError: Columns must be same length as key, as I am reading a text file where the rows/columns are not all the same length. I tried googling but did not get an answer, and I don't want the longer lines to be skipped.

Error

b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'

My pandas dataframe generator

#!/usr/bin/python3
import pandas as pd

cvc_file = pd.read_csv('kids_cvc', header=None, error_bad_lines=False)
cvc_file[['cols', 0]] = cvc_file[0].str.split(':', expand=True)  # Split first column on ':'
df = cvc_file.set_index('cols').transpose()  # set_index and transpose
print(df)

Result

$ ./read_cvc.py
b'Skipping line 2: expected 13 fields, saw 14\nSkipping line 5: expected 13 fields, saw 14\nSkipping line 6: expected 13 fields, saw 16\nSkipping line 7: expected 13 fields, saw 14\nSkipping line 8: expected 13 fields, saw 15\nSkipping line 9: expected 13 fields, saw 14\nSkipping line 20: expected 13 fields, saw 19\nSkipping line 21: expected 13 fields, saw 16\nSkipping line 23: expected 13 fields, saw 14\nSkipping line 24: expected 13 fields, saw 16\nSkipping line 27: expected 13 fields, saw 14\n'
cols ab ad an ed eg et en eck ell it id ig im ish ob og ock ut ub ug um un ud uck ush
0 cab bad ban bed beg bet den beck bell bit bid big dim fish cob bog dock but cub bug bum bun bud buck gush
1 dab dad can fed keg get hen deck cell fit did dig him dish gob cog lock cut hub dug gum fun cud duck hush
2 gab had fan led leg jet men neck dell hit hid fig rim wish job dog rock gut nub hug hum gun dud luck lush
3 jab lad man red peg let pen peck jell kit kid gig brim swish lob fog sock hut rub jug mum nun mud muck mush
4 lab mad pan wed NaN met ten check sell lit lid jig grim NaN mob hog tock jut sub lug sum pun spud puck rush
5 nab pad ran bled NaN net then fleck tell pit rid pig skim NaN rob jog block nut tub mug chum run stud suck blush

File contents

$ cat kids_cvc
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
an: ban, can, fan, man, pan, ran, tan, van, clan, plan, scan, than
ag: bag, gag, hag, lag, nag, rag, sag, tag, wag, brag, drag, flag, snag, stag
ap: cap, gap, lap, map, nap, rap, sap, tap, yap, zap, chap, clap, flap, slap, snap, trap
am: bam, dam, ham, jam, ram, yam, clam, cram, scam, slam, spam, swam, tram, wham
ack: back, hack, jack, lack, pack, rack, sack, tack, black, crack, shack, snack, stack, quack, track
ash: bash, cash, dash, gash, hash, lash, mash, rash, sash, clash, crash, flash, slash, smash
ed: bed, fed, led, red, wed, bled, bred, fled, pled, sled, shed
eg: beg, keg, leg, peg
et: bet, get, jet, let, met, net, pet, set, vet, wet, yet, fret
en: den, hen, men, pen, ten, then, when
eck: beck, deck, neck, peck, check, fleck, speck, wreck
ell: bell, cell, dell, jell, sell, tell, well, yell, dwell, shell, smell, spell, swell
it: bit, fit, hit, kit, lit, pit, sit, wit, knit, quit, slit, spit
id: bid, did, hid, kid, lid, rid, skid, slid
ig: big, dig, fig, gig, jig, pig, rig, wig, zig, twig
im: dim, him, rim, brim, grim, skim, slim, swim, trim, whim
ip: dip, hip, lip, nip, rip, sip, tip, zip, chip, clip, drip, flip, grip, ship, skip, slip, snip, trip, whip
ick: kick, lick, nick, pick, sick, tick, wick, brick, chick, click, flick, quick, slick, stick, thick, trick
ish: fish, dish, wish, swish
in: bin, din, fin, pin, sin, tin, win, chin, grin, shin, skin, spin, thin, twin
ot: cot, dot, got, hot, jot, lot, not, pot, rot, tot, blot, knot, plot, shot, slot, spot
ob: cob, gob, job, lob, mob, rob, sob, blob, glob, knob, slob, snob
og: bog, cog, dog, fog, hog, jog, log, blog, clog, frog
op: cop, hop, mop, pop, top, chop, crop, drop, flop, glop, plop, shop, slop, stop
ock: dock, lock, rock, sock, tock, block, clock, flock, rock, shock, smock, stock
ut: but, cut, gut, hut, jut, nut, rut, shut
ub: cub, hub, nub, rub, sub, tub, grub, snub, stub
ug: bug, dug, hug, jug, lug, mug, pug, rug, tug, drug, plug, slug, snug
um: bum, gum, hum, mum, sum, chum, drum, glum, plum, scum, slum
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush

Note: it's making the first row/column a master one, which has 13 values, and skipping all lines that have more than 13 columns.
I couldn't figure out a pandas way to extend the columns, but converting the rows to a dictionary made things easier.

ss = '''
ab: cab, dab, gab, jab, lab, nab, tab, blab, crab, grab, scab, stab, slab
at: bat, cat, fat, hat, mat, pat, rat, sat, vat, brat, chat, flat, gnat, spat
ad: bad, dad, had, lad, mad, pad, sad, tad, glad
.......
un: bun, fun, gun, nun, pun, run, sun, spun, stun
ud: bud, cud, dud, mud, spud, stud, thud
uck: buck, duck, luck, muck, puck, suck, tuck, yuck, chuck, cluck, pluck, stuck, truck
ush: gush, hush, lush, mush, rush, blush, brush, crush, flush, slush
'''.strip()

with open('kids.cvc', 'w') as f:
    f.write(ss)  # write data file

######################################

import pandas as pd

dd = {}
maxcnt = 0
with open('kids.cvc') as f:
    lines = f.readlines()
for line in lines:
    line = line.strip()              # remove \n
    len1 = len(line)                 # words have a leading space
    line = line.replace(' ', '')
    cnt = len1 - len(line)           # get word (space) count
    if cnt > maxcnt:
        maxcnt = cnt                 # max word count
    rec = line.split(':')            # header : words
    dd[rec[0]] = rec[1].split(',')   # split words
for k in dd:
    dd[k] = dd[k] + [''] * (maxcnt - len(dd[k]))  # add extra values to match max column

df = pd.DataFrame(dd)  # convert dictionary to dataframe
print(df.to_string(index=False))

Output

ab at ad an ag ap am ack ash ed eg et en eck ell it id ig im ip ick ish in ot ob og op ock ut ub ug um un ud uck ush cab bat bad ban bag cap bam back bash bed beg bet den beck bell bit bid big dim dip kick fish bin cot cob bog cop dock but cub bug bum bun bud buck gush dab cat dad can gag gap dam hack cash fed keg get hen deck cell fit did dig him hip lick dish din dot gob cog hop lock cut hub dug gum fun cud duck hush gab fat had fan hag lap ham jack dash led leg jet men neck dell hit hid fig rim lip nick wish fin got job dog mop rock gut nub hug hum gun dud luck lush jab hat lad man lag map jam lack gash red peg let pen peck jell kit kid gig brim nip pick swish pin hot lob fog pop sock hut rub jug mum nun mud muck mush lab mat mad pan nag nap ram pack hash wed met ten check sell lit lid jig grim rip sick sin jot mob hog top tock jut sub lug sum pun spud puck rush nab pat pad ran rag rap yam rack lash bled net then fleck tell pit rid pig skim sip tick tin lot rob jog chop block nut tub mug chum run stud suck blush tab rat sad tan sag sap clam sack mash bred pet when speck well sit skid rig slim tip wick win not sob log crop clock rut grub pug drum sun thud tuck brush blab sat tad van tag tap cram tack rash fled set wreck yell wit slid wig swim zip brick chin pot blob blog drop flock shut snub rug glum spun yuck crush crab vat glad clan wag yap scam black sash pled vet dwell knit zig trim chip chick grin rot glob clog flop rock stub tug plum stun chuck flush grab brat plan brag zap slam crack clash sled wet shell quit twig whim clip click shin tot knob frog glop shock drug scum cluck slush scab chat scan drag chap spam shack crash shed yet smell slit drip flick skin blot slob plop smock plug slum pluck stab flat than flag clap swam snack flash fret spell spit flip quick spin knot snob shop stock slug stuck slab gnat snag flap tram stack slash swell grip slick thin plot slop snug truck spat stag slap wham quack smash ship stick twin shot stop snap track skip thick slot trap slip trick spot
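For what it's worth, a pandas-only sketch (my own suggestion, not part of the answer above): build the dict in one pass and let DataFrame.from_dict pad the ragged lists with NaN instead of padding manually:

import pandas as pd

with open('kids.cvc') as f:
    dd = {}
    for line in f:
        key, _, words = line.partition(':')
        dd[key.strip()] = [w.strip() for w in words.split(',')]

# from_dict(orient='index') pads rows of unequal length with NaN
df = pd.DataFrame.from_dict(dd, orient='index').transpose()
print(df.to_string(index=False))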
How to search for specific text in csv within a Pandas, python
Hello, I want to find the account text (the # handle) in the title column and save it in a new csv. Pandas can do it; I tried to make it work but it didn't.

This is my csv: http://www.sharecsv.com/s/c1ed9790f481a8d452049be439f4e3d8/Newnormal.csv

This is my code:

import pandas as pd

data = pd.read_csv("Newnormal.csv")
data.dropna(inplace=True)
sub = '#'
data["Indexes"] = data["title"].str.find(sub)
print(data)

I want results like this:

from, to, title
Xavier5501, KudiiThaufeeq, RT #KudiiThaufeeq: Royal Rape, Royal Harassment, Royal Cocktail Party, Royal Pedo, Royal Bidding, Royal Maalee Bayaan, Royal Slavery..et

Thank you.
- reduce records to only those that have an "#" in title
- define a new column which is the text between "#" and ":"
- you are left with some records where this leaves NaN in the to column; I've just filtered these out

df = pd.read_csv("Newnormal.csv")
df = df[df["title"].str.contains("#") == True]
df["to"] = df["title"].str.extract(r".*([#][A-Z,a-z,0-9,_]+[:])")
df = df[["from", "to", "title"]]
df[~df["to"].isna()].to_csv("ToNewNormal.csv", index=False)
df[~df["to"].isna()]

output

from to title
1 Xavier5501 #KudiiThaufeeq: RT #KudiiThaufeeq: Royal Rape, Royal Harassmen...
2 Suzane24979006 #USAID_NISHTHA: RT #USAID_NISHTHA: Don't step outside your hou...
3 sandeep_sprabhu #USAID_NISHTHA: RT #USAID_NISHTHA: Don't step outside your hou...
4 oliLince #Timothy_Hughes: RT #Timothy_Hughes: How to Get a Salesforce Th...
7 rismadwip #danielepermana: RT #danielepermana: Pak kasus covid per hari s...
... ... ... ...
992 Reptoid_Hunter #sapiofoxy: RT #sapiofoxy: I literally can't believe we ha...
994 KPCResearch #sapiofoxy: RT #sapiofoxy: I literally can't believe we ha...
995 GreySparkUK #VoxSmartGlobal: RT #VoxSmartGlobal: The #newnormal will see mo...
997 Gabboa10 #HuShameem: RT #HuShameem: One of #PGO_MV admin staff test...
999 wanjirunjendu #ntvkenya: RT #ntvkenya: AAK's Mugure Njendu shares insig...
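One small refinement worth noting (my suggestion, not part of the answer): inside [A-Z,a-z,0-9,_] the commas are matched literally, so the class also swallows commas; \w expresses the same letters, digits and underscore without that side effect:

# \w is [A-Za-z0-9_]; expand=False returns a Series for the single group
df["to"] = df["title"].str.extract(r"(#\w+:)", expand=False)

Note that without the leading .* this captures the first #name: in the title rather than the last.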
unable to get rid of all emojis
I need help removing emojis. I looked at some other Stack Overflow questions and this is what I did, but for some reason my code doesn't get rid of all the emojis.

d = {'alexveachfashion': 'Fashion Style * Haute Couture * Wearable Tech * VR\n👓👜⌚👠\nSoundCloud is Live #alexveach\n👇New YouTube Episodes ▶️👇',
     'andrewvng': 'Family | Fitness | Friends | Gym | Food',
     'runvi.official': 'Accurate measurement via SMART insoles & real-time AI coaching. Improve your technique & BOOST your performance with every run.\nSoon on Kickstarter!',
     'triing': 'Augmented Jewellery™️ • Montreal. Canada.',
     'gedeanekenshima': 'Prof na Etec Albert Einstein, Mestranda em Automação e Controle de Processos, Engenheira de Controle e Automação, Técnica em Automação Industrial.',
     'jetyourdaddy': '',
     'lavonne_sun': '☄️🔭 ✨\n°●°。Visual Narrative\nA creative heart with a poetic soul.\n————————————\nPARSONS —— Design & Technology',
     'taysearch': 'All the World’s Information At Your Fingertips. (Literally) Est. 1991🇺🇸 🎀#PrincessofSearch 🔎Sample 👇🏽 the Search Engine Here 🗽',
     'hijewellery': 'Fine 3D printed jewellery for tech lovers #3dprintedjewelry #wearabletech #jewellery',
     'yhanchristian': 'Estudante de Engenharia, Maker e viciado em café.',
     'femka': 'Fashion Futurist + Fashion Tech Lab Founder #technoirlab + Fashion Designer / Parsons & CSM Grad / Obsessed with #fashiontech #future #cryptocurrency',
     'sinhbisen': 'Creator, TRiiNG, augmented jewellery label ⭕️ Transhumanist ⭕️ Corporeal cartographer ⭕️',
     'stellawearables': '#StellaWearables ✉️Info#StellaWearables.com Premium Wearable Technology That Monitors Personal Health & Environments ☀️🏝🏜🏔',
     'ivoomi_india': 'We are the manufacturers of the most innovative technologies and user-friendly gadgets with a global presence.',
     'bgutenschwager': "When it comes to life, it's all about the experience.\nGoogle Mapper 🗺\n360 Photographer 📷\nBrand Rep #QuickTutor",
     'storiesofdesign': 'Putting stories at the heart of brands and businesses | Cornwall and London | #storiesofdesign',
     'trume.jp': '草創期から国産ウオッチの製造に取り組み、挑戦を続けてきたエプソンが世界に放つ新ブランド「TRUME」(トゥルーム)。目指すのは、最先端技術でアナログウオッチを極めるブランド。',
     'themarinesss': "I didn't choose the blog life, the blog life chose me | Aspiring Children's Book Author | www.slayathomemum.com",
     'ayowearable': 'The world’s first light-based wearable that helps you sleep better, beat jet lag and have more energy! #goAYO Get yours at:',
     'wearyourowntechs': 'Bringing you the latest trends, Current Products and Reviews of Wearable Technology. Discover how they can enhance your Life and Lifestyle',
     'roxfordwatches': 'The Roxford | The most stylish and customizable fitness smartwatch. Tracks your steps/calories/dist/sleep. Comes with FOUR bands, and a travel case!',
     'playertek': "Track your entire performance - every training session, every match. \nBecause the best players don't hide.",
     '_kate_hartman_': '',
     'hmsmc10': 'Health & Wellness 🍎\nBoston, MA 🏙\nSuffolk MPA ‘17 🎓 \n.\nJust Strong Ambassador 🏋🏻\u200d♀️',
     'gadgetxtreme': 'Dedicated to reviewing gadgets, technologies, internet products and breaking tech news. Follow us to see daily vblogs on all the disruptive tech..',
     'freedom.journey.leader': '📍MN\n🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿 \n📧Ashleybp5#gmail.com \n#homeschool #bossmom #builder #momlife',
     'arts_food_life': 'Life through my phone.',
     'medgizmo': 'Wearable #tech: #health #healthcare #wellness #gadgets #apps. Images/links provided as information resource only; doesn’t mean we endorse referenced',
     'sawearables': 'The home of wearable tech in South Africa!\n--> #WearableTech #WearableTechnology #FitnessTech Find your wearable #',
     'shop.mercury': 'Changing the way you charge.⚡️\nGet exclusive product discounts, and help us reach our goal below!🔋',
     'invisawear': 'PRE-ORDERS NOW AVAILABLE! Get yours 25% OFF here: #girlboss #wearabletech'}

for key in d:
    print("---with emojis----")
    print(d[key])
    print("---emojis removed----")
    x = ''.join(c for c in d[key] if c <= '\uFFFF')
    print(x)

output example

---with emojis----
📍MN
🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿
📧Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---emojis removed----
MN
Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.
Ashleybp5#gmail.com
#homeschool #bossmom #builder #momlife
---with emojis----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!🔋
---emojis removed----
Changing the way you charge.⚡️
Get exclusive product discounts, and help us reach our goal below!
There is no technical definition of what an "emoji" is. Various glyphs may be used to render printable characters, symbols, control characters and the like. What seems like an "emoji" to you may be part of normal script to others.

What you probably want to do is to look at the Unicode category of each character and filter out various categories. While this does not solve the "emoji"-definition-problem per se, you get much better control over what you are actually doing without removing, for example, literally all characters of languages spoken by 2/3 of the planet.

Instead of filtering out certain categories, you may filter everything except the lower- and uppercase letters (and numbers). However, be aware that ꙭ is not "the googly eyes emoji" but the CYRILLIC SMALL LETTER DOUBLE MONOCULAR O, which is a normal lowercase letter to millions of people.

For example:

import unicodedata

s = "🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿"

# Just filter category "symbol"
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', ))
print(t)

...results in

Wife • Homeschooling Mom to 5 • D Y I lover • Small town living in MN.

This may not be emoji-free enough, yet the • is technically a form of punctuation. So filter this as well:

# Filter symbols and punctuation. You may want 'Cc' as well,
# to get rid of control characters. Beware that newlines are a
# form of control character.
t = ''.join(c for c in s if unicodedata.category(c) not in ('So', 'Po'))
print(t)

And you get

Wife Homeschooling Mom to 5 D Y I lover Small town living in MN
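And a sketch of the allow-list variant mentioned above, keeping only letters, numbers and separators (Unicode categories starting with L, N and Z); note this also drops ordinary punctuation such as periods:

import unicodedata

s = "🍃Wife • Homeschooling Mom to 5 🐵 • D Y I lover 🔨 • Small town living in MN. 🌿"

# Keep letters (Lu, Ll, ...), numbers (Nd, ...) and separators (Zs, ...)
t = ''.join(c for c in s if unicodedata.category(c)[0] in 'LNZ')
print(t)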
Excel VlookUp tips finding a value within a range and returning the value of column next to it
I have an Excel spreadsheet; within that spreadsheet I have a ZIP column, then a ZIPCODES column, and finally a CountyName column. I am trying to write a formula that says: if the ZIP value is within ZIPCODES, place the CountyName that is in the same row. If it is NOT within this row of ZIPCODES, look into the next one; if the ZIP value is within that cell, place the CountyName of THAT row, and so on. I tried doing 89 IF statements, but Excel told me that only 64 statements can be nested. I tried using VLOOKUP but I keep getting a #N/A error.

ZIP ZIPCODES CountyName
45373 45105 45144 45616 45618 45650 45660 45679 45684 45693 45697 Adams
44022 45801 45802 45804 45805 45807 45808 45809 45817 45820 45833 45850 45854 45887 Allen
45319 44805 44838 44840 44842 44848 44859 44864 44866 44874 44880 Ashland
45168 44003 44004 44005 44010 44030 44032 44041 44047 44048 44068 44076 44082 44084 44085 44088 44093 44099 Ashtabula
43950 45701 45710 45711 45716 45717 45719 45723 45732 45735 45739 45740 45761 45764 45766 45776 45777 45778 45780 45782 Athens
45806 45806 45819 45865 45869 45870 45871 45884 45885 45888 45895 45896 Auglaize
43033 43713 43718 43719 43759 43902 43905 43906 43909 43912 43916 43927 43928 43933 43934 43935 43937 43940 43942 43947 43950 43951 43967 43972 43977 43983 43985 Belmont
45164 45101 45115 45118 45119 45121 45130 45131 45154 45167 45168 45171 Brown
45069 45003 45004 45011 45012 45013 45014 45015 45018 45025 45026 45042 45043 45044 45050 45053 45055 45056 45061 45062 45063 45064 45067 45069 45071 Butler
45157 44607 44615 44620 44631 44639 44644 44651 44675 Carroll
44629 43009 43044 43047 43060 43070 43072 43078 43083 43084 45389 Champaign
45369 43010 45319 45323 45341 45344 45349 45368 45369 45372 45501 45502 45503 45504 45505 45506 Clark
45158 45102 45103 45106 45112 45120 45122 45140 45145 45147 45150 45153 45156 45157 45158 45160 45176 45245 Clermont
45146 45107 45113 45114 45138 45146 45148 45159 45164 45166 45169 45177 Clinton
44840 43920 43945 43962 43968 44408 44413 44415 44423 44427 44431 44432 44441 44445 44455 44460 44490 44492 44493 44625 44634 44665 Columbiana
45863 43803 43805 43811 43812 43824 43828 43836 43843 43844 43845 Coshocton
45806 44820 44825 44827 44833 44854 44856 44860 44881 44887 Crawford
43009 44017 44022 44040 44070 44101 44102 44103 44104 44105 44106 44107 44108 44109 44110 44111 44112 44113 44114 44115 44116 44117 44118 44119 44120 44121 44122 44123 44124 44125 44126 44127 44128 44129 44130 44131 44132 44133 44134 44135 44136 44137 44138 44139 44140 44141 44142 44143 44144 44145 44146 44147 44149 44178 44181 44185 44188 44189 44190 44191 44192 44193 44194 44195 44197 44198 44199 Cuyahoga
45887 45303 45304 45328 45331 45332 45346 45348 45350 45351 45352 45358 45362 45380 45388 45390 Darke
45118 43512 43519 43520 43526 43530 43536 43549 43556 Defiance
45740 43003 43015 43021 43032 43035 43061 43065 43066 43074 43082 43240 Delaware
45372 43438 44089 44814 44816 44824 44839 44846 44870 44871 Erie
43910 43046 43102 43105 43107 43112 43130 43136 43147 43148 43150 43154 43155 43157 43163 Fairfield
45176 43106 43128 43142 43160 Fayette
43357 43002 43004 43016 43017 43026 43054 43068 43081 43085 43086 43109 43110 43119 43123 43125 43126 43137 43195 43196 43198 43199 43201 43202 43203 43204 43205 43206 43207 43209 43210 43211 43212 43213 43214 43215 43216 43217 43218 43219 43220 43221 43222 43223 43224 43226 43227 43228 43229 43230 43231 43232 43234 43235 43236 43251 43260 43265 43266 43268 43270 43271 43272 43279 43287 43291 43299 Franklin
44691 43502 43515 43521 43533 43540 43553 43558 43567 Fulton
43449 45614 45620 45623 45631 45643 45658 45674 45685 45686 Gallia
45373 44021 44023 44024 44026 44033 44046 44062 44064 44065 44072 44073 44080 44086 Geauga
43939 45301 45305 45307 45314 45316 45324 45335 45370 45384 45385 45387 45431 45432 45433 45434 45435 Greene
43215 43722 43723 43725 43732 43733 43736 43749 43750 43755 43768 43772 43773 43778 43780 Guernsey
43616 45001 45002 45030 45033 45041 45051 45052 45111 45174 45201 45202 45203 45204 45205 45206 45207 45208 45209 45211 45212 45213 45214 45215 45216 45217 45218 45219 45220 45221 45222 45223 45224 45225 45226 45227 45228 45229 45230 45231 45232 45233 45234 45235 45236 45237 45238 45239 45240 45241 45242 45243 45244 45246 45247 45248 45249 45250 45251 45252 45253 45254 45255 45258 45262 45263 45264 45267 45268 45269 45270 45271 45273 45274 45275 45277 45280 45296 45298 45299 45999 Hamilton
44123 44804 45814 45816 45839 45840 45841 45858 45867 45868 45881 45889 45890 45897 Hancock
43224 43326 43340 43345 43346 45810 45812 45835 45836 45843 45859 Hardin
44606 43907 43973 43974 43976 43981 43984 43986 43988 44693 44695 44699 Harrison
44817 43510 43516 43523 43524 43527 43532 43534 43535 43545 43548 43550 43555 Henry
45305 45110 45123 45132 45133 45135 45142 45155 45172 Highland
44646 43111 43127 43135 43138 43144 43149 43152 43158 Hocking
43777 44610 44611 44617 44628 44633 44637 44638 44654 44660 44661 44687 44690 Holmes
44047 44811 44826 44837 44847 44850 44851 44855 44857 44865 44888 44889 44890 Huron
44278 45621 45640 45656 45692 Jackson
45874 43901 43903 43908 43910 43913 43917 43925 43926 43930 43932 43938 43939 43941 43943 43944 43948 43952 43953 43961 43963 43964 43970 43971 Jefferson
43465 43005 43006 43011 43014 43019 43022 43028 43037 43048 43050 Knox
44124 44045 44057 44060 44061 44077 44081 44092 44094 44095 44096 44097 Lake
43469 45619 45638 45645 45659 45669 45675 45678 45680 45688 45696 Lawrence
44004 43001 43008 43013 43018 43023 43025 43027 43030 43031 43033 43055 43056 43058 43062 43071 43073 43080 43093 43098 43721 43740 Licking
45828 43310 43311 43318 43319 43324 43331 43333 43336 43343 43347 43348 43357 43358 43360 Logan
44471 44001 44011 44012 44028 44035 44036 44039 44044 44049 44050 44052 44053 44054 44055 44074 44090 Lorain
45142 43434 43504 43528 43537 43542 43547 43560 43566 43571 43601 43603 43604 43605 43606 43607 43608 43609 43610 43611 43612 43613 43614 43615 43616 43617 43618 43620 43623 43635 43652 43656 43657 43659 43660 43661 43666 43667 43681 43682 43697 43699 Lucas
44902 43064 43140 43143 43151 43153 43162 Madison
43314 44401 44405 44406 44416 44422 44429 44436 44442 44443 44451 44452 44454 44471 44501 44502 44503 44504 44505 44506 44507 44509 44510 44511 44512 44513 44514 44515 44555 44609 44619 44672 Mahoning
44087 43301 43302 43306 43307 43314 43322 43332 43335 43337 43341 43342 43356 Marion
43017 44212 44215 44233 44235 44251 44253 44254 44256 44258 44273 44274 44275 44280 44281 44282 Medina
45711 45720 45741 45743 45760 45769 45770 45771 45772 45775 45779 45783 Meigs
45371 45310 45822 45826 45828 45846 45860 45862 45866 45882 45883 Mercer
44212 45308 45312 45317 45318 45326 45337 45339 45356 45359 45361 45371 45373 45374 45383 Miami
45236 43716 43747 43752 43754 43757 43786 43789 43793 43914 43915 43931 43946 Monroe
44092 45309 45315 45322 45325 45327 45342 45343 45345 45354 45377 45401 45402 45403 45404 45405 45406 45408 45409 45410 45412 45413 45414 45415 45416 45417 45418 45419 45420 45422 45423 45424 45426 45427 45428 45429 45430 45437 45439 45440 45441 45448 45449 45454 45458 45459 45463 45469 45470 45475 45479 45481 45482 45490 Montgomery
44114 43728 43756 43758 43787 Morgan
45335 43315 43317 43320 43321 43325 43334 43338 43349 43350 Morrow
43105 43701 43701 43702 43702 43720 43727 43734 43735 43738 43746 43762 43767 43771 43777 43791 43802 43821 43822 43830 43842 Muskingum
45236 43711 43717 43724 43779 43788 45727 Noble
43723 43408 43412 43416 43430 43432 43433 43436 43439 43440 43445 43446 43449 43452 43456 43458 43468 Ottawa
45876 45813 45821 45849 45851 45855 45861 45873 45879 45880 Paulding
45338 43076 43730 43731 43739 43748 43760 43761 43764 43766 43782 43783 Perry
44113 43103 43113 43116 43117 43145 43146 43156 43164 Pickaway
44215 45613 45624 45642 45646 45661 45683 45687 45690 Pike
44114 44201 44202 44211 44231 44234 44240 44241 44242 44243 44255 44260 44265 44266 44272 44285 44288 44411 44412 44449 Portage
44012 45070 45311 45320 45321 45330 45338 45347 45378 45381 45382 Preble
44113 45815 45827 45830 45831 45837 45844 45848 45853 45856 45864 45875 45876 45877 45893 Putnam
44460 44813 44822 44843 44862 44875 44878 44901 44902 44903 44904 44905 44906 44907 44999 Richland
44837 43101 43115 45601 45612 45617 45628 45633 45644 45647 45673 45681 Ross
45429 43407 43410 43420 43431 43435 43442 43464 43469 Sandusky
44256 45629 45630 45636 45648 45652 45653 45657 45662 45663 45671 45677 45682 45694 45699 Scioto
45723 44802 44807 44809 44815 44818 44828 44830 44836 44841 44845 44853 44861 44867 44883 Seneca
45133 45302 45306 45333 45334 45336 45340 45353 45360 45363 45365 45367 45845 Shelby
44089 44601 44608 44613 44614 44626 44630 44632 44640 44641 44643 44646 44647 44648 44650 44652 44657 44662 44666 44669 44670 44685 44688 44689 44701 44702 44703 44704 44705 44706 44707 44708 44709 44710 44711 44712 44714 44718 44720 44721 44730 44735 44750 44767 44799 Stark
45424 44056 44067 44087 44203 44210 44216 44221 44222 44223 44224 44232 44236 44237 44250 44262 44264 44278 44286 44301 44302 44303 44304 44305 44306 44307 44308 44309 44310 44311 44312 44313 44314 44315 44316 44317 44319 44320 44321 44322 44325 44326 44328 44333 44334 44372 44393 44396 44398 44399 Summit
44661 44402 44403 44404 44410 44417 44418 44420 44424 44425 44428 44430 44437 44438 44439 44440 44444 44446 44450 44453 44470 44473 44481 44482 44483 44484 44485 44486 44488 44491 Trumbull
44122 43804 43832 43837 43840 44612 44621 44622 44624 44629 44653 44656 44663 44671 44678 44679 44680 44681 44682 44683 44697 Tuscarawas
43213 43007 43029 43036 43040 43041 43045 43067 43077 43344 Union
43125 45832 45838 45863 45874 45886 45891 45894 45898 45899 Van Wert
43457 45622 45634 45651 45654 45672 45695 45698 Vinton
44286 45005 45032 45034 45036 45039 45040 45054 45065 45066 45068 45152 45162 Warren
45888 45712 45713 45714 45715 45721 45724 45729 45734 45742 45744 45745 45746 45750 45767 45768 45773 45784 45786 45787 45788 45789 Washington
45011 44214 44217 44230 44270 44276 44287 44606 44618 44627 44636 44645 44659 44667 44676 44677 44691 Wayne
45014 43501 43505 43506 43517 43518 43531 43543 43554 43557 43570 Williams
44264 43402 43403 43406 43413 43414 43437 43441 43443 43447 43450 43451 43457 43460 43462 43463 43465 43466 43467 43511 43522 43525 43529 43541 43551 43552 43565 43569 43619 43654 44817 45872 Wood
45347 43316 43323 43330 43351 43359 44844 44849 44882 Wyandot

This is my VLOOKUP:

=VLOOKUP(I2, N2:N89, O2:O89, TRUE)

If the ZIP is found within the ZIPCODES range, return the CountyName of the row where it is found.
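A note on the formula (my suggestion, not part of the original post): VLOOKUP's third argument must be a column index number, not a range, so even as a plain lookup it would need to be written like =VLOOKUP(I2, N2:O89, 2, FALSE). But VLOOKUP also cannot match a value inside a longer text cell. One sketch that handles both, assuming the ZIP to look up is in I2, ZIPCODES in N2:N89, and CountyName in O2:O89, is a wildcard MATCH:

=INDEX($O$2:$O$89, MATCH("*" & I2 & "*", $N$2:$N$89, 0))

This searches each ZIPCODES cell for the ZIP as a substring (safe here because all ZIPs are five digits) and returns the CountyName from the first matching row.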