I'm looking for a way to shorten a sentence (a text of a few lines) to produce a "readable" (not too long) file name.
The application scenario is a chatbot where a user can submit a media item, say a video, with some paired description text (a caption). The application would assign the video a readable file name, so that the video can be retrieved afterward by its file name.
Imagine a video paired with a more or less long text description of the scene, for example:
const videoDescription = 'beautiful yellow flowers on foreground, with a background with countryside meadows and many cows'
How could I shorten the description above into a "suitable" short file name?
OK, I could just use the sentence itself as the name, maybe a bit sanitized, like:
const videoFileName = 'beautiful_yellow_flowers_on_foreground_with_a_background_with_countryside_meadows_and_many_cows.MP4'
but that way I could exceed the 255-byte file name length limit (e.g. on Linux).
Any idea for a shortener algo?
Maybe I could build the shortened filename with word abbreviations?
Maybe I could remove from sentence articles, prepositions, etc.?
BTW, a minor issue: I'm working with the Italian language, so a bit of character sanitizing is required to produce good file names.
Last but not least, I'm looking for JavaScript/Node.js code.
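To make this concrete, here is the kind of untested sketch I have in mind (the stopword list, the 64-character cap, and the function name are just placeholders, not a definitive implementation):

// Untested sketch: drop stopwords, transliterate accented chars,
// keep only [a-z0-9_], and stop adding words past a length cap.
const STOPWORDS = new Set(['a', 'an', 'the', 'on', 'with', 'and', 'many',
                           'il', 'lo', 'la', 'un', 'una', 'di', 'con', 'e']);

function toFileName(description, maxLength = 64) {
  const words = description
    .toLowerCase()
    .normalize('NFD')                 // split accented chars (è -> e + combining accent)
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining accents
    .replace(/[^a-z0-9\s]/g, ' ')     // keep only letters, digits and spaces
    .split(/\s+/)
    .filter(w => w && !STOPWORDS.has(w));

  let name = '';
  for (const w of words) {
    if (name.length + w.length + 1 > maxLength) break; // stop before exceeding the cap
    name += (name ? '_' : '') + w;
  }
  return name + '.MP4';
}

With the sample description above, this yields beautiful_yellow_flowers_foreground_background_countryside.MP4.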
You can check if the length is larger than 255 and shorten the name if necessary. You should also check for duplicates and append -1, -2 and so on if necessary.
let filename = 'some_flowers_on_foreground_with_a_background_with_countryside_meadows_and_few_cows.MP4'
if (filename.length > 255) {
  filename = filename.slice(0, 255 - 4) + '.MP4' // truncate, leaving room to re-append the extension
}
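For the duplicate handling mentioned above, a sketch along these lines could work (existsInStore is a placeholder for however used names are tracked):

// Append -1, -2, ... until the candidate name is free.
function uniqueName(base, ext, existsInStore) {
  let candidate = base + ext;
  for (let n = 1; existsInStore(candidate); n++) {
    candidate = `${base}-${n}${ext}`;
  }
  return candidate;
}

// Example with an in-memory set:
const used = new Set(['flowers.MP4', 'flowers-1.MP4']);
console.log(uniqueName('flowers', '.MP4', name => used.has(name))); // flowers-2.MP4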
So far I was parsing the NotesCalendarEntry ICS manually and overwriting certain properties, and it worked fine. Today I stumbled upon a problem where a long summary name of the appointment gets split into multiple lines, and my parsing goes wrong: it replaces the part up to the first line break, and the old remainder is still there.
Here's how I do this "parsing":
NotesCalendarEntry calEntry = cal.getEntryByUNID(apptuid);
String iCalE = calEntry.read();
StringBuilder sb = new StringBuilder(iCalE);
int startIndex = iCalE.indexOf("BEGIN:VEVENT"); // care only about the VEVENT block
int tmpIndex = sb.indexOf("SUMMARY:", startIndex) + 8;
int lineBreakIndex = sb.indexOf("\n", tmpIndex);
if (sb.charAt(lineBreakIndex - 1) == '\r') // take \r\n into account if it exists
    lineBreakIndex--;
sb.delete(tmpIndex, lineBreakIndex); // delete old content
sb.insert(tmpIndex, subject);        // put my new content
It works when line breaks are where they are supposed to be, but with a long summary name, line breaks are inserted into the summary itself (not literal \r\n characters, but real line breaks).
I split the iCalE string by \r\n and got this (only a part obviously):
SEQUENCE:6
ATTENDEE;ROLE=CHAIR;PARTSTAT=ACCEPTED;CN="test/Test";RSVP=FALSE:
mailto:test#test.test
ATTENDEE;CUTYPE=ROOM;ROLE=REQ-PARTICIPANT;PARTSTAT=ACCEPTED
;CN="Room 2/Test";RSVP=TRUE:mailto:room2#test.test
CLASS:PUBLIC
DESCRIPTION:Test description\n
SUMMARY:Very long name asdjkasjdklsjlasdjlasjljraoisjroiasjroiasjoriasoiruasoiruoai Mee
ting long name
LOCATION:Room 2/Test
ORGANIZER;CN="test/Test":mailto:test#test.test
Each line is one array element from iCalE.split("\\r\\n");. As you can see, the Summary field got split into 2 lines, and a space was added after the line break.
Now I have no idea how to parse this correctly. I thought about finding the index of the next : instead of the next line break, and then finding the first line break before that : character, but that wouldn't work if the summary also contained a : after the injected line break, and it also wouldn't work on fields like that ORGANIZER;CN=, as it uses ; rather than :.
I tried importing the external ical4j JAR into my XPage to overcome this problem, and while everything is recognized in Domino Designer, it resulted in lots of NoClassDefFound exceptions when trying to reach my XPage service, despite the JARs being in the build path and all.
java.lang.NoClassDefFoundError: net.fortuna.ical4j.data.CalendarBuilder
How can I safely parse this manually, or how can I properly import the ical4j JAR into my XPage? I just want to modify three fields: DTSTART, DTEND and SUMMARY. With the dates I've had no problems so far. Fields like DESCRIPTION use a literal \n string to mark new lines, and it should be the same in other fields...
Update
So I have read more about iCalendar, and it turns out there is a standard mechanism for this called line folding: a CRLF line ending followed by a space (or tab). I made a while loop that keeps scanning until it finds a line break not followed by a space, and it works great so far. I will use this unless there's a better solution (ical4j is one, but I can't get it working with Domino).
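For reference, the unfolding itself can be done in one pass before any index-based editing; per RFC 5545, a fold is a CRLF followed by a single space or tab, so removing those sequences restores the logical lines (a minimal sketch):

// Unfold first, then edit: RFC 5545 folds are CRLF followed by one space or tab.
String unfolded = iCalE.replaceAll("\r\n[ \t]", "");
// ...edit SUMMARY on the unfolded text; when writing the entry back,
// lines longer than 75 octets should be folded again.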
Hi, when I try to download a file from the mainframe using Attachmate EXTRA!, it prepends my username to the file name, and I don't know where to turn that off.
For example, if the file name is yyyy.file.name, then when I try to transfer the file it transfers username.yyyy.file.name.
In version 3.4 the option to prepend the user name is turned off, but it still happens.
Enclose the entire dataset name (including the high-level qualifier) in single quotes. This is a TSO (not JCL) convention - if you refer to a dataset without single quotes, it pre-pends your user ID as the high-level qualifier; however if you place single quotes around the dataset name it will take it 'as is' (well, it will uppercase it, since all z/OS dataset names are uppercase, but otherwise it will be 'as is').
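For example, with a (made-up) user ID of JSMITH:

yyyy.file.name     ->  TSO resolves it as JSMITH.YYYY.FILE.NAME
'yyyy.file.name'   ->  taken as-is (uppercased): YYYY.FILE.NAME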
Here is the situation:
The first problem I'm having is with obtaining information from a CSV file. The purpose of the code I'm writing is to get a bunch of information on ZCTAs (zip codes) for a number of different cohorts (six are currently being used, but the code is meant to be flexible enough to handle any number of cohorts). One file contains the population, by cohort, for each ZCTA. Another file has the number of 'cases' (cases of cancer observed) for each cohort, for each ZCTA. Another file has the crude rate for each cohort for the state of Iowa (the focus of this research), i.e. the rate at which one can 'expect' to see cases of cancer in a population, by cohort. There are a couple of other files, but these are the focus, as this is where my issue shows up.
What my code does, initially, is read the population file and get the population of each cohort by ZCTA. Each ZCTA and its information is stored in a list, which is then stored in a list of lists (nested) containing all of the ZCTAs. The code then gets the crude rate. The crude rate is then multiplied by the appropriate cohort population for each ZCTA and summed with all of the other cohorts within that ZCTA, to get the total number of people we can EXPECT to see having cancer in each ZCTA. The population is also summed up. This information is stored in another list, as well as a list containing all of the ZCTAs. This information will be the focus (the list of all of the ZCTAs, each containing the total population and the total number of expected cases).
So, the problem is that I then need to take this newly acquired list, get the number of OBSERVED cases for each cohort, sum those together, append the sum to the appropriate ZCTA, and write it to a new file. I have code implemented that does this fine, EXCEPT that the bottom 22 or so ZCTAs don't get a number of observed cases. I don't know if it is the code or what, but it works for all of the other 906; it just misses the bottom 22.
The reader will find sample data for the files I've discussed (the observed case file, and the output file) at: Gist
Here is the code I'm using:
import csv

expectedcsv = open('ExpectedCases.csv', 'w', newline='')
expectedwriter = csv.writer(expectedcsv, delimiter=',')
expectedHeader = ['zcta', 'expected', 'pop', 'observed']
expectedwriter.writerow(expectedHeader)  # write the header row

# zctaPop was built earlier: one [zcta, expected, pop] list per ZCTA
for zcta in zctaPop:
    caseCounter = 0
    thecasescsv = open('NewCaseFile.csv', 'r', newline='')
    thecasesreader = csv.reader(thecasescsv, delimiter=',')
    for case in thecasesreader:
        if case[0] == zcta[0]:
            for i in range(3, len(case)):  # observed cases start in column 3
                caseCounter += int(case[i])
    zcta.append(caseCounter)
    expectedwriter.writerow(zcta)
    thecasescsv.close()

expectedcsv.close()
Something else I would also like to bring up is that later on in the code, the actual purpose of all of this is to create an SMR filter for each grid point. The grid points are somewhat arbitrary; they have been placed (via coordinates) over the entire state of Iowa. The SMR is the number of observed cases divided by the number of expected cases. The threshold, that is, how many expected cases for a particular filter, is set by the user. So, if a user wants a filter created on 150 expected cases (for each grid point), the code goes through the ZCTAs, summing up the expected cases until more than 150 are found. The distance to this last ZCTA is the 'radius' of the filter.
To do this, I built a distance matrix (the distance from each grid point to every ZCTA) and then sorted it, nearest to furthest. Because of the size of the file (2300 x 930), I have to read it line by line and get all of the information from the other files. So, starting with the nearest ZCTA, I get the population, expected cases, and observed cases (the problem with this file was discussed above) and add each to its respective counter (one for population, one for observed and one for expected). Then it goes to the next closest ZCTA and does the same, until the threshold is exceeded.
The problem here is that I couldn't use the csv module to read these files, as I was already reading from another file and the reader's position would be lost. So, I had to use a plain read() on the file, which then required some interesting use of maketrans and translate. I'm not sure it's efficient or works great. Everything seems to be fine, but without the above problem being fixed, it's impossible to tell. I have included the code below, but was wondering if anybody had any better ideas/suggestions?
expectedCSV = open('ExpectedCases.csv', 'r', newline='')
table = str.maketrans('\r', ' ')  # map carriage returns to spaces
content = expectedCSV.read()
expectedCSV.close()
content = content.translate(table)
content = content.split(sep='\n')  # one string per line

newContent = []
for item in content:
    newContent.append(item.split(sep=','))  # one list of fields per line
content = ' '  # drop the big string

# currentZcta and the three running totals come from the surrounding loop
for item in newContent:
    if item[0] == currentZcta:
        expectedTotal += float(item[1])
        totalPop += float(item[2])
        totalObservedCount += float(item[3])
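For what it's worth, an alternative I've been considering is reading ExpectedCases.csv once into a dict keyed by ZCTA, so each grid-point pass is a lookup rather than a full re-read (an untested sketch, assuming the zcta/expected/pop/observed layout written above):

import csv

# Load ExpectedCases.csv once; each lookup is then O(1).
expectedByZcta = {}
with open('ExpectedCases.csv', 'r', newline='') as expectedCSV:
    reader = csv.reader(expectedCSV, delimiter=',')
    next(reader)  # skip the header row written above
    for row in reader:
        expectedByZcta[row[0]] = (float(row[1]), float(row[2]), float(row[3]))

# Then, inside the distance-matrix loop:
expected, pop, observed = expectedByZcta[currentZcta]
expectedTotal += expected
totalPop += pop
totalObservedCount += observed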
Also, I couldn't figure out how to color the methods blue and the variables red, as some of the more awesome users of this site do. I would be very much interested in learning how to do that for future posts.
If anybody needs more info or anything clarified to help answer/formulate a solution, please, by all means, ask! Thanks for taking the time to read!
So, I ended up "solving" this by computing the observed cases along with the expected cases and population, opening the case file for each ZCTA as it is processed. This did not really solve the issue I was dealing with, but rather worked around it. I'm somewhat disappointed that more people didn't view and/or respond to this. If someone comes up with an answer to the actual problem, by all means, post it here. -Mike
I'm developing an application that allows using dictionaries (e.g. English - German or Country - Capital). There are just two very plain tables:
1) Dictionary:
int Id, string Title //PartitionKey="SomeConstString", Rowkey=Id.ToString()
2) Article:
int DictionaryId, string Word, string Meaning //PartitionKey="D" + DictionaryID, Rowkey=Word
I can add articles, but when trying to delete them I get the following problem: in every dictionary one or two articles are not deleted. Instead I get a ResourceNotFoundException. There is absolutely nothing special about those articles (e.g. Russia - Moscow). When I try to add articles with the same PartitionKey and RowKey I get an EntityAlreadyExistsException. I installed "Cloud Storage Studio" and found out that those entities really are still in the table. I tried to delete them manually but got the same ResourceNotFoundException in Storage Studio that I was getting in code. So if I add 100 articles and then try to delete them (in code, or in the studio via Ctrl+A -> Delete), 99 (or sometimes 98) are deleted and the others are not. I'm using the development storage emulator. Here is how I remove articles (I tried different approaches; the result is still the same):
public void DeleteAllArticlesFromDictionary(int dictionaryID) {
    TableServiceContext tableServiceContext = tableClient.GetDataServiceContext();
    string partitionKey = "D" + dictionaryID;
    Article[] articles = tableServiceContext.CreateQuery<Article>(articleTableName)
                                            .Where(a => a.PartitionKey == partitionKey)
                                            .ToArray();
    for (int i = 0; i < articles.Length; ++i)
        tableServiceContext.DeleteObject(articles[i]);
    tableServiceContext.SaveChanges();
}
Can anyone tell me what can possibly be wrong with this?
UPD: Works fine in Cloud
I think I found the problem: there was a space character (0x20, ' ') at the end of the word used as the RowKey, so it was, say, 'RowKey1 '. I removed the space and now everything is fine in dev storage too (it was not a problem in the real environment). However, it is rather confusing that spaces are handled correctly inside a key (say, 'Row Key 1' can be used without errors) but cause this behaviour as trailing (or maybe leading) characters. I've read about 'Characters Disallowed in Key Fields', but spaces were not mentioned there. I guess I should call Trim() on my strings before using them as keys.
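A one-line guard like this (a sketch of the Trim() idea) would have avoided it:

string safeWord = word.Trim(); // strip leading/trailing whitespace before using the word as a RowKey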