How to evaluate the contents of an each loop - erb

Here's the setup:
I have a data folder full of YAML files, each of which contains metadata about a post. In my .erb template, I'm looping through all of these with a simple each do loop.
Now, rather than print every YAML file in that folder, I want to print only those whose .yaml content contains an exact string.
Is this possible?
--
For example, let's say I have this structure for data
-data
--blogposts
---post1.yaml
---post2.yaml
...
---post100.yaml
and then, for the sake of simplicity, let's say each post#.yaml looks like this:
---
:id: 1
:title: my example title
:postCategory:
- :id: 1
  :categoryTitle: Business
And in my template, the loop to grab ALL posts obviously looks like:
<% data.blogposts.each do |id, post| %>
*display stuff*
<% end %>
What I really want is to evaluate each .yaml file within "blogposts", check whether there is a categoryTitle item that matches a string (let's say Business here), and if so, output it as an item in the loop. The goal is that I could use this on each "category page" and dynamically pull in only the posts whose category is being requested.

Use Enumerable#grep for the given category:
<% data.blogposts.each do |id, post| %>
  <% if File.readlines("#{post.name}.yaml").grep(/#{category}/).size > 0 %>
    *display stuff*
  <% end %>
<% end %>
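
Alternatively, since each post is already parsed from its YAML file, you can test the category field directly instead of re-reading the file from disk. A minimal sketch, assuming post carries the :postCategory array shown above:
<% data.blogposts.each do |id, post| %>
  <% if post[:postCategory].to_a.any? { |c| c[:categoryTitle] == "Business" } %>
    *display stuff*
  <% end %>
<% end %>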

I will go with something like this:
Check the YAML file when I receive it and search for categoryTitle.
Add the file name to a master.yaml file.
Read master.yaml to get the file names of all the files I want to display.
This way you don't loop through all your files every time; you just go through one file to identify the ones you really need to open.
So your folder structure will have one more file, for example:
-data
--master.yaml
--blogposts
---post1.yaml
---post2.yaml
...
---post100.yaml
And the master.yaml file structure may look like this:
:business:
- post1.yaml
- post3.yaml
:othertitle:
- post2.yaml
- post5.yaml
Just check master.yaml for the files with the title you want (e.g. business) and loop through those files only, instead of looping through all the files every time.
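
A minimal Ruby sketch of that lookup, assuming the master.yaml layout above, the data/blogposts paths from the question, and that YAML is available in the template context:
<% wanted = YAML.load_file("data/master.yaml")[:business] || [] %>
<% wanted.each do |filename| %>
  <% post = YAML.load_file("data/blogposts/#{filename}") %>
  *display stuff*
<% end %>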

Related

How to read multiple CSV (leaving out specific ones) from a nested directory in PySpark?

Let's say I have a directory called 'all_data', and inside this I have several other directories based on the date of the data they contain. These directories are named date_2020_11_01 to date_2020_11_30, and each one contains CSV files which I intend to read into a single dataframe.
But I don't want to read the data for date_2020_11_15 and date_2020_11_16. How do I do it?
I'm not sure how to exclude certain files, but you can specify a range of file names using brackets. The code below would select all files except 11_15 and 11_16:
spark.read.csv("date_2020_11_{1[0-4,7-9],[0,2-3][0-9]}.csv")
df = spark.read.format("parquet").option("header", "true").load(paths)
where paths is a list of all the paths where data is present, worked for me.
A simple method is to read the whole data directory as it is and apply a filter condition:
df.filter("dataColumn NOT IN ('date_2020_11_15', 'date_2020_11_16')")
Alternatively, you can use the os module to read the directory listing and filter out those date directories with a condition, as sketched below.
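
A minimal sketch of that os-module approach, assuming the all_data layout from the question and an existing SparkSession named spark:
import os

excluded = {"date_2020_11_15", "date_2020_11_16"}
# Collect every date directory except the two excluded dates.
paths = [
    os.path.join("all_data", d)
    for d in sorted(os.listdir("all_data"))
    if d.startswith("date_2020_11") and d not in excluded
]

df = spark.read.option("header", "true").csv(paths)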

exclude a certain path from all user searches

Unfortunately we have a special folder named "_archive" everywhere in our repository.
This folder has its purpose. But when searching for content/documents we want to exclude it and everything beneath "_archive".
So what I want is to exclude that path and its members from all user searches. The syntax is easy with FTS:
your_query AND -PATH:"//cm:_archive//*"
To test, take the PDF from https://www.docdroid.net/RmKj9gB/search-test.pdf.html and put it into your repo twice:
/some_random_path/search-test.pdf
/some_random_path/_archive/search-test.pdf
In the Node Browser everything works as expected:
TEXT:"HODOR" AND -PATH:"//cm:_archive//*"
= 1 result
TEXT:"HODOR"
= 2 results
So, my idea was to edit search.get.config.xml and add the exclusion to the list of properties:
<search>
  <default-operator>AND</default-operator>
  <default-query-template>%(cm:name cm:title cm:description ia:whatEvent
    ia:descriptionEvent lnk:title lnk:description TEXT TAG) AND -PATH:"//cm:_archive//*"
  </default-query-template>
</search>
But it does not work as intended! As soon as I use 'text:' or 'name:' in the search field, the exclusion seems to be ignored.
What other option do I have? Basically I just want to add the exclusion to the base query after the default query template is applied.
Version is Alfresco Community 5.0.d
thanks!
I think you're mistaken about what query templates are meant for; take a look at the wiki.
What you're basically doing there is programmatically saying: I've got a keyword, and I want to match that keyword against the following metadata fields.
By default it will match cm:name, cm:title, cm:description, etc. This can be changed to a custom field or, in other cases, to ALL.
So putting an extra AND or OR in there won't work, because this isn't the actual query that will be built. I could go on about query templates, but that won't do you any good.
In your case you'll need to modify Alfresco's search.get webscript, specifically the function getSearchResults(params) in search.lib.js (which gets imported).
Near the end of that method it does the following:
ftsQuery = '(' + ftsQuery + ') AND -TYPE:"cm:thumbnail" AND -TYPE:"cm:failedThumbnail" AND -TYPE:"cm:rating" AND -TYPE:"st:site"' + ' AND -ASPECT:"st:siteContainer" AND -ASPECT:"sys:hidden" AND -cm:creator:system AND -QNAME:comment\\-*';
Just append your PATH exclusion to that string and that will do.
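A sketch of the amended line, with only the trailing -PATH clause added:
ftsQuery = '(' + ftsQuery + ') AND -TYPE:"cm:thumbnail" AND -TYPE:"cm:failedThumbnail"' +
    ' AND -TYPE:"cm:rating" AND -TYPE:"st:site" AND -ASPECT:"st:siteContainer"' +
    ' AND -ASPECT:"sys:hidden" AND -cm:creator:system AND -QNAME:comment\\-*' +
    ' AND -PATH:"//cm:_archive//*"';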

Trying to output the page counts of a large number of PDF's to a log file

I have about 1,550 .pdf files that I want to find page counts for.
I used the command ls -Q | grep \.pdf > ../lslog.log to output all the file names with the extension .pdf, wrapped in double quotes, into a .log file. I then opened lslog.log in gedit and replaced all the double quotes with apostrophes so that I could use the files that contain parentheses in the final command.
When I use the command exiftool -"*Count*" (which outputs any EXIF tag of the selected file whose name contains the word "Count") on a single file, for example exiftool -"*Count*" 'examplePDF(withparantheses).pdf', I get something like "Page Count: 512" or whatever the page count happens to be.
However, when I use it on multiple files, for example: exiftool -"*Count*" 'examplePDF(withparantheses).pdf' 'anotherExamplePDF.pdf' I get
File not found: examplePDF(withparantheses).pdf,
======== anotherExamplePDF.pdf
Page Count : 362
1 image files read
1 files could not be read
So basically, I'm able to read the last file, but not the first one. This pattern continues as I add more files: it finds the file itself and the page count of the last file, but not the other files.
Do I need to input multiple files differently? I'm using a comma right now to separate files, but even without the comma I get the same result.
Does exiftool take multiple files?
I don't know exactly why you're getting that behaviour (though the trailing comma in "File not found: examplePDF(withparantheses).pdf," suggests the comma was treated as part of the filename, since exiftool takes space-separated filenames), but it looks to me like everything you're doing can be collapsed into one line:
exiftool -"*Count*" *.pdf
My output from a bunch of PDFs I had around looks like this:
======== 86A103EW00.pdf
Page Count : 494
======== DSET3.5_Reportable_Items_Linux.pdf
Page Count : 70
======== DSView 4 v4.1.0.36.pdf
Page Count : 7
======== DSView-Release-Notes-v4.1.0.77 (1).pdf
Page Count : 7
======== DSView-Release-Notes-v4.1.0.77.pdf
Page Count : 7
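
Since the goal is a log file, the output can be redirected in one go (pagecounts.log is just an assumed name):
exiftool -"*Count*" *.pdf > pagecounts.log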

pentaho create archive folder with MM-YYYY

I would like to archive every file in a folder by putting it in another archive folder with a name like this: "Archive/myfolder-06-2014"
My problem is how to retrieve the current month and year, and then how to create a folder with that name if it does not already exist.
This solution may be a little awkward (due to the required fuss) but it seems to work. The idea is to precompute the target filename in a separate transformation and store it as a system variable (TARGET_ZIP_FILENAME):
The transformation consists of the following steps:
Get the current time.
Provide the pattern of the target filename as a string constant.
Extract the month and year as formatted integers.
Replace the month in the pattern (the year works equivalently).
Set the resulting filename as a system variable.
The main job will call the transformation and use the system variable as the zip target filename.
Also you have to make sure that the setting Create parent folder is active.
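
As a lighter-weight sketch of the same idea, a single Modified Java Script Value step could compute the name directly; the field name target_zip_filename here is my own choice:
// Build "Archive/myfolder-MM-YYYY" for the current date.
var now = new Date();
var month = ("0" + (now.getMonth() + 1)).slice(-2); // zero-pad; getMonth() is 0-based
var year = now.getFullYear();
var target_zip_filename = "Archive/myfolder-" + month + "-" + year;
A Set Variables step can then promote target_zip_filename to TARGET_ZIP_FILENAME for the main job.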

How to handle multiple file upload in hunchentoot?

I know how to handle a single file upload in hunchentoot using hunchentoot:post-parameter, but when I add the multiple attribute, i.e. <input name="file" type="file" multiple="multiple"/>, (hunchentoot:post-parameter "file") gives me only one of the files. Is there (and what is) a mechanism for receiving all the files chosen by the user?
The Hunchentoot API does not directly give you access to multiple uploaded files, but you can use (hunchentoot:post-parameters *request*) to retrieve the list of all POST parameters (which includes the uploaded files). This will be an alist, and you can get a list of all uploaded files using standard alist techniques (e.g. (remove "file" (hunchentoot:post-parameters hunchentoot:*request*) :test (complement #'equal) :key #'car)).
This is a rather straightforward task in hunchentoot. Assuming you have an HTML <input> element with name="files" and multiple="multiple", you could access all the files associated with the "files" input like this:
(loop for post-parameter in (hunchentoot:post-parameters*)
      if (equal (car post-parameter) "files")
        collect post-parameter)
This will give you a list whose length should match the number of uploaded files associated with the name "files". Each of the elements will be a list that looks like this:
("files" #P"/temporary/file1" "name of file" "file type")
More information can be found in the very well-documented reference.
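
For context, here is a minimal handler sketch built around that loop; the handler name and the /upload URI are my own assumptions:
(hunchentoot:define-easy-handler (upload-handler :uri "/upload") ()
  ;; Each matching parameter is ("files" temp-pathname original-name content-type).
  (let ((files (loop for param in (hunchentoot:post-parameters*)
                     when (equal (car param) "files")
                       collect (cdr param))))
    (format nil "Received ~D file(s): ~{~A~^, ~}"
            (length files)
            (mapcar #'second files))))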
