Analyzing .txt file consisting of tweets line by line


How would I use multiple fgetl operations to have MATLAB read my 30-line .txt file? I cannot just put all the lines into one variable because I need to analyze the information in the file line by line. The things I need to examine include how many lines are in the file and how many of certain letters or symbols are in each line.
I have started out with this code so far:
clear all
close all
clc
%% Questions Two
% part a
fid = fopen('twitter_data.txt');
twitter = fread(fid,inf,'*char')';
fclose(fid);
I just noticed that the above doesn't work, though, because I need the file line by line, not all the characters in one row vector.

You want to use fgetl. In addition, you could find patterns with strfind.
fid = fopen('twitter_data.txt');
twitter = fgetl(fid);
while ischar(twitter)
    % Process the current line here
    fprintf('Line contains %i # symbols\n', length(strfind(twitter,'#')));
    % Get the next line; fgetl returns -1 at the end of the file
    twitter = fgetl(fid);
end
fclose(fid);
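
For comparison with the Python threads below, here is a rough sketch of the same line-by-line loop in Python rather than MATLAB. The function name count_symbol_per_line is made up for illustration, and the temporary file is only a stand-in for twitter_data.txt.

```python
import os
import tempfile


def count_symbol_per_line(path, symbol="#"):
    """Return a list with the number of times `symbol` occurs on each line."""
    counts = []
    with open(path, encoding="utf-8") as f:
        for line in f:                      # the file object yields one line at a time
            counts.append(line.count(symbol))
    return counts


if __name__ == "__main__":
    # Small stand-in for twitter_data.txt so the sketch runs on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write("hello #world\nno tags here\n#a #b #c\n")
    counts = count_symbol_per_line(tmp.name)
    print("Number of lines:", len(counts))
    for i, n in enumerate(counts, start=1):
        print(f"Line {i} contains {n} # symbols")
    os.remove(tmp.name)
```

The length of the returned list answers the "how many lines" part of the question, and each entry answers the per-line count.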

Related

How to make it so my code remembers what it has written in a text file?

Hello, Python newbie here.
I have code that writes names into a text file. It takes the names from a website, and on that website the same name may appear multiple times. Within one run, it filters the duplicates down to a single name correctly by checking whether the name has already been written to the text file. But when I run the code again, it does not recognize the names that are already in the text file; it only filters out the names it has written during the same session. So my question is: how do I make it remember what it has written?
kaupan_nimi = driver.find_element_by_xpath("//span[@class='store_name']").text
with open("mainostetut_yritykset.txt", "r+") as tiedosto:
    if kaupan_nimi in tiedosto:
        print("\033[33mNimi oli jo tiedostossa\033[0m")  # "Name was already in the file"
    else:
        print("\033[32mUusi asiakas vahvistettu!\033[0m")  # "New customer confirmed!"
        # Writes the company's name to the text file
        tiedosto.seek(0)
        data = tiedosto.read(100)
        if len(data) > 0:
            tiedosto.write("\n")
        tiedosto.write(kaupan_nimi)
There is the code that I think is the problem. Please correct me if I am wrong.

There are two main issues with your current code.
The first is that you are likely only going to be able to detect duplicated names if they are back to back. That is, if the prior name that you're seeing again was the very last thing written into the file. That's because all the lines in the file except the last one will have newlines at the end of them, but your names do not have newlines. You're currently looking for an exact match for a name as a line, so you'll only ever have a chance to see that with the last line, since it doesn't have a newline yet. If the list of names you are processing is sorted, the duplicates will naturally be clumped together, but if you add in some other list of names later, it probably won't pick up exactly where the last list left off.
The second issue in your code is that it will tend to clobber anything that gets written more than 100 characters into the file, starting every new line at that point, once it starts filling up a bit.
Let's look at the different parts of your code:
if kaupan_nimi in tiedosto:
This is your duplicate check. It treats the file as an iterator and reads each line, checking whether kaupan_nimi is an exact match for any of them. This will fail for every line except possibly the last, because those lines end with "\n" while kaupan_nimi does not.
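
A small sketch of why the membership test misses, using a throwaway temporary file and made-up names (Alpha, Beta) rather than the real data:

```python
import os
import tempfile

# Throwaway file with two names; only the first line ends with a newline.
with tempfile.NamedTemporaryFile("w", delete=False, encoding="utf-8") as tmp:
    tmp.write("Alpha\nBeta")

with open(tmp.name, encoding="utf-8") as f:
    lines = list(f)                 # the same items that `x in f` compares against

found_alpha = "Alpha" in lines      # the stored line is "Alpha\n", so no exact match
found_beta = "Beta" in lines        # the last line has no newline yet, so it matches
found_alpha_stripped = "Alpha" in {line.strip() for line in lines}

print(lines, found_alpha, found_beta, found_alpha_stripped)
os.remove(tmp.name)
```

Stripping each line before comparing is what makes the earlier names visible again.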
I would suggest instead reading the file only once per batch of names, and keeping a set of names in your program's memory that you can check your names-to-be-added against. This will be more efficient, and won't require repeated reading from the disk, or run into newline issues.
tiedosto.seek(0)
data = tiedosto.read(100)
if len(data) > 0:
    tiedosto.write("\n")
This code appears to be checking if the file is empty or not. However, it always leaves the file position just past character 100 (or at the end of the file if there were fewer than 100 characters in it so far). You can probably fit several names in that first 100 characters, but after that, you'll always end up with the names starting at index 100 and going on from there. This means you'll get names written on top of each other.
If you take my earlier advice and keep a set of known names, you could check that set to see if it is empty or not. This doesn't require doing anything to the file, so the position you're operating on it can remain at the end all of the time. Another option is to always end every line in the file with a newline so that you don't need to worry about whether to prepend a newline only if the file isn't empty, since you know that at the end of the file you'll always be writing a fresh line. Just follow each name with a newline and you'll always be doing the right thing.
Here's how I'd put things together:
# if possible, do this only once, at the start of the website reading procedure:
with open("mainostetut_yritykset.txt", "r+") as tiedosto:
    known_names = set(name.strip() for name in tiedosto)  # names already in the file

    # do the next parts in some kind of loop over the names you want to add
    # (something() is a placeholder for wherever your names come from)
    for kaupan_nimi in something():
        if kaupan_nimi in known_names:  # duplicate found
            print("\033[33mNimi oli jo tiedostossa\033[0m")  # "Name was already in the file"
        else:  # not a duplicate
            print("\033[32mUusi asiakas vahvistettu!\033[0m")  # "New customer confirmed!"
            tiedosto.write(kaupan_nimi)  # write out the name
            tiedosto.write("\n")  # and always add a newline afterwards

            # alternatively, if you can't have a trailing newline at the end, use:
            # if known_names:
            #     tiedosto.write("\n")
            # tiedosto.write(kaupan_nimi)

            known_names.add(kaupan_nimi)  # update the set of names
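
For something that can be run as-is, here is a minimal sketch of the same idea wrapped in a function. The name add_new_names, the sample shop names, and the temporary path are all made up for illustration. It opens the file in "a+" mode instead of "r+" (a deliberate swap, so the file is created if it is missing and writes always go to the end), and it assumes that an existing file already ends with a newline.

```python
import os
import tempfile


def add_new_names(path, names):
    """Append names not yet in the file at `path`; return the names added."""
    # "a+" creates the file if it is missing and makes every write go to the end.
    with open(path, "a+", encoding="utf-8") as f:
        f.seek(0)                                   # "a+" starts at the end, so rewind to read
        known = {line.strip() for line in f if line.strip()}
        added = []
        for name in names:
            if name in known:
                continue                            # duplicate, skip it
            f.write(name + "\n")                    # every line ends with a newline
            known.add(name)
            added.append(name)
    return added


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "names.txt")
    print(add_new_names(path, ["Kauppa A", "Kauppa B", "Kauppa A"]))
    print(add_new_names(path, ["Kauppa B", "Kauppa C"]))  # simulates a second run
```

Because the set is rebuilt from the file on every call, names written in an earlier run are still recognized as duplicates.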

Python - How do I separate data into multiple lines

I have two strings that I want to put into a txt file, but when I try to write them, they both end up on the first line. I want the strings to be on separate lines. How do I do so?
Here is the writing part of my code:
saveFile = open('points.txt', 'w')
saveFile.write(str(jakesPoints))
saveFile.write(str(alexsPoints))
saveFile.close()
If jakesPoints was 10 and alexsPoints was 12, then the text file would be
1012
but I want it to be
10
12

You can use a newline character (\n) to move to a new line. For your example:
with open('points.txt', 'w') as saveFile:
    saveFile.write("{}\n".format(jakesPoints))
    saveFile.write("{}\n".format(alexsPoints))
Other things to note:
It is helpful to open files using with. It closes the file automatically when the block ends (which is typically preferred over trying to remember to call .close()).
The "{}\n".format() call converts your numbers to strings and adds the newline character. I found that https://pyformat.info/ explains the string formatters pretty well and highlights their main advantages.

with open('points.txt', 'w') as saveFile:
    saveFile.write(str(jakesPoints))
    saveFile.write("\n")
    saveFile.write(str(alexsPoints))
See the difference between the 'w' and 'a' modes used in open(). Also see join().
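
A brief sketch of both pointers, with the point values taken from the question and a temporary path standing in for points.txt (the extra value 15 is made up to show appending):

```python
import os
import tempfile

jakesPoints = 10   # example values taken from the question
alexsPoints = 12
path = os.path.join(tempfile.mkdtemp(), "points.txt")  # temporary stand-in for points.txt

# 'w' truncates the file first; join() puts a newline between the items.
with open(path, "w") as saveFile:
    saveFile.write("\n".join(str(p) for p in (jakesPoints, alexsPoints)) + "\n")

# 'a' keeps what is already there and adds to the end.
with open(path, "a") as saveFile:
    saveFile.write("15\n")

with open(path) as saveFile:
    contents = saveFile.read()
print(contents)
```

Opening with 'w' a second time would have discarded the first two lines, which is the difference the answer points to.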

Writing to a text file in separate lines

I am creating a program which, at the end of all the inputs, writes to a text file, and the output comes out as one big line. Other than going into my text file and manually splitting it into multiple lines, how do you write to a text file in Python on separate lines?
Eta:
Line one that holds one piece of inputted information
Line two that holds another piece of inputted information
Line three that holds a final piece of inputted information
I've tried writing twice to a file before closing it, but it returns an error saying it expected 1 argument and received more.

You should post at least a failed attempt that we can fix; but, due to the simplistic nature of the problem, I'll just give a quick answer anyway.
Steps:
1. Open the file in write ('w') mode (note that this blanks the file).
2. Write a line.
3. Write a newline ('\n'). Note that this step can be combined with the previous one.
4. Repeat steps 2 and 3 for all your lines.
5. Close the file.
So, here's an implementation of the above. Note how a with statement handles the first and last steps together (and has other benefits).
lines = ['Line one', 'Line two', 'Line three']
with open('your_file.txt', 'w') as f:
    for line in lines:
        f.write(line + '\n')
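
The "expected 1 argument" error in the question is consistent with passing several values to write() at once, since write() accepts a single string. That is a guess, because the failing code was not posted. A sketch of two alternatives, using a temporary path as a stand-in for your_file.txt:

```python
import os
import tempfile

lines = ["Line one", "Line two", "Line three"]
path = os.path.join(tempfile.mkdtemp(), "your_file.txt")  # temporary stand-in path

with open(path, "w") as f:
    # f.write("Line one", "Line two") would raise a TypeError: write() takes one string.
    print(lines[0], file=f)                           # print() adds the newline itself
    f.writelines(line + "\n" for line in lines[1:])   # writelines() adds no newlines on its own

with open(path) as f:
    written = f.read().splitlines()
print(written)
```

Either call can replace the explicit write(line + '\n') loop; which one reads better is a matter of taste.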

Splitting a csv file

I've used the following code in numerous programs and it has always worked...until now.
a = open('Filename.csv', 'r')
ba = a.read()
a.close()
b = list(zip(*(e.split(',') for e in ba)))
It has always split the csv file on the commas. Now I'm trying the same code with a csv file, and it is splitting the file on every single character, regardless of whether it is a letter or a number, uppercase or lowercase.
Is there better code to use to split up a file on the commas?

Oops, I think I just found my stupid mistake. Copying code from a couple of different sources doesn't always work out: it should have been readlines(), not read(). After some more head pounding, it finally caught my eye.
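
A short sketch of the difference, using an in-memory string with made-up contents in place of Filename.csv. It also shows the standard csv module as one possible answer to the "better code" question:

```python
import csv
import io

text = "a,b,c\n1,2,3\n"          # stand-in for the contents of Filename.csv

# Iterating over a string yields single characters, so each "row" is one character.
by_char = [e.split(",") for e in text]

# Iterating over lines gives one row per line, as intended.
by_line = [e.strip().split(",") for e in text.splitlines()]

# The csv module also copes with quoted fields that contain commas.
by_csv = list(csv.reader(io.StringIO(text)))

columns = list(zip(*by_csv))     # same transpose as in the question
print(by_char[:3], by_line, by_csv, columns)
```

With a real file, csv.reader can be given the open file object directly (opened with newline=''), so neither read() nor readlines() is needed.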

Tried to read a text file line by line. But line is split into two and getting stored on next line

I have a text file with special characters as well as normal characters. I am trying to read this file line by line. I have used
string[] lines = System.IO.File.ReadAllLines("Trial.txt");
To read it.
I set a breakpoint and inspected the values stored in lines. Some lines were broken partway through, with the rest stored as a new next line. When I checked the records, I found that the break occurs only at points where there are special characters, although it is not tied to one particular special character. If the file has a total of 10 lines and 1 line has this problem, it reads a total of 11 lines. Can anyone please help me out with this? The text file is in UTF-8 format.

The File.ReadAllLines method splits the file on carriage return ('\r'), new line ('\n'), or a carriage return followed by a new line (taken from here: http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx).
Check if the line that is not supposed to be split has either of those characters (judging from your reply to Luke Wyatt you probably have a new line ('\n') on that line at the point where it splits).
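
One way to do that check is to look at the raw bytes of the file. The sketch below is in Python rather than C#, and the function name find_line_breaks and the sample bytes are made up for illustration. It reports every carriage return and new line along with its byte offset, so a stray one in the middle of a record stands out.

```python
import re


def find_line_breaks(data: bytes):
    """Return (offset, kind) for every CRLF, lone CR, or lone LF in `data`."""
    kinds = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}
    # The pattern tries CRLF first so it is not reported as a CR followed by an LF.
    return [(m.start(), kinds[m.group()]) for m in re.finditer(rb"\r\n|\r|\n", data)]


sample = b"first line\r\nsecond \n line\r\nthird line\r\n"   # made-up file contents
breaks = find_line_breaks(sample)
print(breaks)
```

For the real file, the bytes could be read with open("Trial.txt", "rb").read(). A file that mostly uses one kind of line ending but has a few of another kind would point to the lines that get split.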
