I have a small single-board computer which will be running a Linux distribution and some programs, and which has a specific user configuration, directory structure, permission settings, etc.
My question is, what is the best way to maintain the system configuration for release? In my time thinking about this problem I've thought of a few ideas but each has its downsides.
Configure the system and burn the image to an ISO file for distribution
This one has the advantage that the system will be configured precisely the way I want it, but committing an ISO file to a repository is less than desirable since it is quite large, and checking out a new revision means reflashing the system.
Install a base OS (which is version locked) and write a shell script to configure the settings from scratch.
This one has the advantage that I could maintain the script in a repository and apply any config changes by pulling changes to the script and running it again. However, now I have to maintain a shell script to configure a system, and it's another place where something can go wrong.
I'm wondering what the best practices are in embedded in general so that I can maybe implement a good deployment and maintenance strategy.

Embedded systems tend to have a long lifetime. Do not be surprised if you need to refer to something released today in ten years' time. Make an ISO of the whole setup, source code, diagrams, everything... and store it away redundantly. Someone will be glad you did a decade from now. Just pretend it's going to last forever and that you'll have to answer a question or research a defect in ten years.


Check file history in Linux/Ubuntu/Centos

Can I check a file's history in Linux (Ubuntu/CentOS) the way I can with Git or SVN, that is, see its modifications by date? Is there any software that helps me do this?
Git and Subversion are software packages whose purpose in life is to keep track of content changes in the files of a project. Operating systems usually do not care about file history; they don't provide such a feature.
Windows and macOS include backup tools that run automatically in background (if they are enabled) from time to time and can be used to access some (not all) past versions of the files. This functionality comes with the cost of disk space used to store the past versions of files.
Linux doesn't provide such a tool (but you can install one, if you need it.)
I guess you are out of luck. You cannot recover a previous version of the file, but you can install backup software to avoid ending up in this situation in the future.
By default you can't. The filesystem simply stores the current state of the file, not its history. (As 1615903 pointed out in the comments, there are some versioned filesystems that keep track of this kind of history, but they are largely unsupported in Linux, which means you probably aren't dealing with one. If you are, the filesystem documentation can guide you through recovering your file.) It's possible that some forensics tool could at least attempt to recover part of a file's history, but I'm not sure of that (and it will probably fail if the older file's sectors have been overwritten).
For the future, you can prepare in advance for similar problems by setting up some incremental backup (it can be done pretty easily with rsync), but you will still only be able to go back to the points in time at which your backup script ran.
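One common way to do incremental backups with rsync is to take dated snapshots and let rsync's --link-dest option hard-link unchanged files to the previous snapshot, so each snapshot only costs the space of what changed. The following is a minimal sketch of that idea, not a finished backup tool; the snapshot naming scheme, the "latest" symlink, and the function names are assumptions made for illustration.

```python
import datetime
import pathlib
import subprocess


def build_rsync_command(source, backup_root, now=None):
    """Return (command, snapshot_dir) for one dated snapshot of `source`."""
    now = now or datetime.datetime.now()
    backup_root = pathlib.Path(backup_root)
    snapshot = backup_root / now.strftime("%Y-%m-%d_%H%M%S")
    latest = backup_root / "latest"  # hypothetical symlink to the newest snapshot

    command = ["rsync", "-a"]
    if latest.exists():
        # Unchanged files become hard links into the previous snapshot.
        command.append(f"--link-dest={latest.resolve()}")
    # Trailing slash: copy the contents of source, not the directory itself.
    command += [str(source).rstrip("/") + "/", str(snapshot)]
    return command, snapshot


def run_backup(source, backup_root):
    """Take one snapshot and repoint the 'latest' symlink at it."""
    backup_root = pathlib.Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    command, snapshot = build_rsync_command(source, backup_root)
    subprocess.run(command, check=True)
    latest = backup_root / "latest"
    if latest.is_symlink():
        latest.unlink()
    latest.symlink_to(snapshot.name)
```

Calling run_backup("/home/me/project", "/mnt/backup/project") from a cron job (both paths are placeholders) would give one directory per run, each of which looks like a full copy of the source at that time.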

Maintain multiple versions of Windows CE (QFEs)

We build firmware using Windows CE (6 and 7) on a Windows XP system. We often install the QFEs (CE patches/updates) from Microsoft as they are released. When we have to go back to a certain release to develop a patch, it can be a real pain because we will need to build a system with the same patch level that existed on the system at the time that the product was released. Is there any easy way to maintain a QFE history that can easily be reverted at any given time? Something along the lines of snapshotting the system state as it pertains to the CE install/QFEs at each release? We don't want to use virtual machine snapshots or anything that controls the state of anything outside of the Windows CE components for this. It is a pretty specific requirement, so I am guessing no, but perhaps someone has tackled this exact problem.
I understand that you're saying you don't want to use VMs, though I'm not entirely sure why. I'd recommend at least thinking about it.
Back when I controlled builds for multiple platforms across multiple OS versions, I used virtual machines for this. Each VM was a bare snapshot of a PC with the tools and SDKs installed. A build script would then pull the source for each BSP and build it nightly. The key is to maintain and archive "clean" VMs (without source) and just pitch the changes after doing builds. It was way faster and way cleaner than trying to keep the WINCEROOT for each QFE level in source control and pulling that; you have to reset the machine to zero in that case anyway to be confident there is no cross-pollution between levels.

How to determine that the shell script is safe

I downloaded this shell script from this site.
It's suspiciously large for a bash script, so I opened it with a text editor and noticed that after the code there are a lot of nonsense characters.
I'm afraid of giving the script execute permission with chmod +x. Can you advise me how to recognize whether it's safe, or how to restrict its rights on the system?
thank you
The "non-sense characters" indicate binary files that are included directly into the SH file. The script will use the file itself as a file archive and copy/extract files as needed. That's nothing unusual for an SH installer. (edit: for example, makeself)
As with other software, it's virtually impossible to decide whether or not running the script is "safe".
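To read the shell part of such an installer without executing anything, you can split the file where the text ends and the embedded binary begins. Here is a rough sketch of that; the function name and the heuristic (the first byte that is not printable ASCII, tab, or a line ending marks the payload) are my own assumptions, and a header containing non-ASCII comments would be cut too early.

```python
def split_self_extracting(data: bytes):
    """Split installer bytes into (shell_header, binary_payload).

    Heuristic: the header ends at the last newline before the first byte
    that is not printable ASCII, a tab, or a line ending.
    """
    text_bytes = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    for offset, byte in enumerate(data):
        if byte not in text_bytes:
            # Keep whole lines in the header; the rest is the payload.
            cut = data.rfind(b"\n", 0, offset) + 1
            return data[:cut], data[cut:]
    return data, b""  # no binary part found


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        header, payload = split_self_extracting(handle.read())
    print(header.decode("ascii"))
    print(f"# binary payload: {len(payload)} bytes", file=sys.stderr)
```

Reading the header tells you what the script does with the payload (where it extracts it, what it runs afterwards), but it says nothing about whether the extracted binaries themselves are trustworthy.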
Don't run it! That site is blocked where I work, because it's known to serve malware.
Now, as to verifying code, it's not really possible without isolating it completely (technically difficult, but a VM might serve if it has no known vulnerabilities) and running it to observe what it actually does. A healthy dose of mistrust is always useful when using third-party software, but of course nobody has time to verify all the software they run, or even a tiny fraction of it. It would take thousands (more likely millions) of work years, and would find enough bugs to keep developers busy for another thousand years. The best you can usually do is run only software which has been created or at least recommended by someone you trust at least somewhat. Trust has to be determined according to your own criteria, but here are some which would count in the software's favor for me:
Part of a major operating system/distribution. That means some larger organization has decided to trust it.
Source code is publicly available. At least any malware caused by company policy (see Sony CD debacle) would have a bigger chance of being discovered.
Source code is distributed on an appropriate platform. Sites like GitHub enable you to gauge the popularity of software and keep track of what's happening to it, while a random web site without any commenting features, version control, or bug database is an awful place to keep useful code.
While the source of the script does not seem trustworthy (IP address?), this might still be legit. With shell scripts it is possible to append binary content at the end and thus build a type of installer. Years ago, Sun would ship the JDK for Solaris in exactly that form. I don't know if that's still the case, though.
If you wanna test it without risk, I'd install a Linux in a VirtualBox (free virtual-machine software), run the script there and see what it does.
Addendum on "see what it does": There's a variety of tools on UNIX that you can use to observe what a program does while it runs, like strace and ltrace (both built on the ptrace system call). What might also be interesting is running the script in a chroot. That way you can easily find all files that are installed.
But at the end of the day this will probably yield more binary files which are not easy to examine (as probably any developer of anti-virus software will tell you). Therefore, if you don't trust the source at all, don't run it. Or if you must run it, do it in a VM where at least it won't be able to do too much damage or access any of your data.

Use "apt" or compile from scratch for a web service?

For the first time, I am writing a web service that will call upon external programs to process requests in batch. The front-end will accept file uploads and then place them in a queue. The workers on the backend will take that file, run it through ffmpeg and the rest of my pipeline, and send an email when the process is complete.
I have my backend process working on my computer (Ubuntu 10.04). The question is: should I try to re-create that pipeline using binaries that I've compiled from scratch? Or is it okay to use apt when configuring in The Real World?
Not all hosting services use Ubuntu, and not all give me root access. (I haven't chosen a host yet.) However, they will let me upload binaries to execute, and many give me shell access with gcc.
Usually this would be a no-brainer and I'd compile it all from scratch. But doing so - not to mention trying to figure out how to create a platform-independent .tar.gz binary - will be quite a task which ultimately doesn't really help me ship my product.
Do you have any thoughts on the best way to set up my stack so that I'm not tied to a specific hosting provider? Should I try creating my own .deb, which contains Ubuntu's version of ffmpeg (and other tools) with the configurations I need?
Short of a setup where I manage my own servers/VMs (which may very well be what I have to do), how might I accomplish this?
The question is: should I try to re-create that pipeline using binaries that I've compiled from scratch? Or is it okay to use apt when configuring in The Real World?
It's the reverse: it is not okay to deploy unpackaged software in The Real World, IMHO.
and not all give me root access
How would you deploy a .deb without root access? Chroot jails?
But doing so - not to mention trying to figure out how to create a platform-independent .tar.gz binary - will be quite a task which ultimately doesn't really help me ship my product.
+1 You answer your own question. Don't meddle unless you have to.
Do you have any thoughts on the best way to set up my stack so that I'm not tied to a specific hosting provider?
Only depend on well-packaged standard libs (such as ffmpeg). Otherwise include them in your own deployment. This problem has been solved for tens of thousands of Linux applications over decades now, so it should be feasible for you too.
Thinking outside the box:
Look at RightScale and other cloud providers/agents that have specialized images/tool chains especially for video encoding.
A 'regular' VPS provider (with Xen or Virtuozzo) will not normally be happy with these kinds of workload, but EC2, Rackspace and their lot will be absolutely fine with that.
In general, I wouldn't believe that a cloud infrastructure provider that doesn't grant root access will allow for computationally intensive workloads. $0.02

What's the best way to keep multiple Linux servers synced?

I have several different locations in a fairly wide area, each with a Linux server storing company data. This data changes every day in different ways at each different location. I need a way to keep this data up-to-date and synced between all these locations.
For example:
In one location someone places a set of images on their local server. In another location, someone else places a group of documents on their local server. A third location adds a handful of both images and documents to their server. In two other locations, no changes are made to their local servers at all. By the next morning, I need the servers at all five locations to have all those images and documents.
My first instinct is to use rsync and a cron job to do the syncing overnight (1 a.m. to 6 a.m. or so), when none of the bandwidth at our locations is being used. It seems to me that it would work best to have one server be the "central" server, pulling in all the files from the other servers first and then pushing those changes back out to each remote server. Or is there another, better way to perform this function?
The way I do it (on Debian/Ubuntu boxes):
Use dpkg --get-selections to get your installed packages
Use dpkg --set-selections to load that list on the other machine, then apt-get dselect-upgrade to actually install the selected packages
Use a source control solution to manage the configuration files. I use git in a centralized fashion, but subversion could be used just as easily.
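If you keep the saved selections list in source control as well, you can check how far a given machine has drifted from it. A small sketch of that comparison, assuming the usual two-column "package state" output of dpkg --get-selections; the function names are made up for this example.

```python
def parse_selections(text):
    """Parse `dpkg --get-selections` output into {package: state}."""
    selections = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            selections[parts[0]] = parts[1]
    return selections


def missing_packages(reference_text, target_text):
    """Packages marked 'install' in the reference list but not on the target."""
    reference = parse_selections(reference_text)
    target = parse_selections(target_text)
    return sorted(
        package
        for package, state in reference.items()
        if state == "install" and target.get(package) != "install"
    )
```

Feed it the saved list and the output of dpkg --get-selections on the machine you are checking; an empty result means every package from the reference is already selected for installation there.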
An alternative if rsync isn't the best solution for you is Unison. Unison works under Windows and it has some features for handling when there are changes on both sides (not necessarily needing to pick one server as the primary, as you've suggested).
Depending on how complex the task is, either may work.
One thing you could (theoretically) do is create a script using Python or something and the inotify kernel feature (through the pyinotify package, for example).
You can run the script, which registers to receive events on certain trees. Your script could then watch directories, and then update all the other servers as things change on each one.
For example, if someone uploads spreadsheet.doc to the server, the script sees it instantly; if the document doesn't get modified or deleted within, say, 5 minutes, the script could copy it to the other servers (e.g. through rsync)
A system like this could theoretically implement a sort of limited 'filesystem replication' from one machine to another. Kind of a neat idea, but you'd probably have to code it yourself.
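The "wait until the file has been quiet for 5 minutes" part of that idea can be sketched independently of inotify. The class below is only an illustration with hypothetical names: something else (pyinotify or any other watcher) would have to call modified() and deleted() as events arrive, and whatever ready() returns would then be handed to rsync.

```python
import time


class QuietFileTracker:
    """Remember when each path last changed and report the ones gone quiet."""

    def __init__(self, quiet_seconds=300, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock  # injectable so the logic can be tested without waiting
        self._last_change = {}

    def modified(self, path):
        # Every new event restarts the quiet period for that path.
        self._last_change[path] = self.clock()

    def deleted(self, path):
        # A deleted file no longer needs to be copied anywhere.
        self._last_change.pop(path, None)

    def ready(self):
        """Return (and forget) paths unchanged for at least quiet_seconds."""
        now = self.clock()
        done = sorted(
            path
            for path, changed_at in self._last_change.items()
            if now - changed_at >= self.quiet_seconds
        )
        for path in done:
            del self._last_change[path]
        return done
```

A main loop would poll ready() every few seconds and sync each returned path to the other servers. Propagating deletions is left out here and would need its own handling.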
AFAIK, rsync is your best choice. It supports partial file updates, among a variety of other features. Once set up, it is very reliable. You can even set up the cron job with timestamped log files to track what is updated in each run.
I don't know how practical this is, but a source control system might work here. At some point (perhaps each hour?) during the day, a cron job runs a commit, and overnight, each machine runs a checkout. You could run into issues with a long commit not being done when a checkout needs to run, and essentially the same thing could be done with rsync.
I guess what I'm thinking is that a central server would make your sync operation easier - conflicts can be handled once on central, then pushed out to the other machines.
rsync would be your best choice. But you need to carefully consider how you are going to resolve conflicts between updates to the same data on different sites. If site-1 has updated 'customers.doc' and site-2 has a different update to the same file, how are you going to resolve it?
I have to agree with Matt McMinn. Especially since it's company data, I'd use source control and, depending on the rate of change, run it more often.
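Whatever resolution policy you pick, the central server first has to notice such conflicts before it pushes anything out. One possible sketch, assuming you keep a manifest of content hashes from the last successful sync and build a fresh one per site each night; the manifest layout and function names are assumptions for illustration, and deletions are not handled.

```python
import hashlib


def file_digest(path):
    """SHA-256 of a file's content, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_conflicts(baseline, site_manifests):
    """Return {path: [sites]} for files changed differently at several sites.

    baseline:       {path: digest} as of the last sync
    site_manifests: {site_name: {path: digest}} as of tonight
    """
    all_paths = set()
    for manifest in site_manifests.values():
        all_paths |= set(manifest)

    conflicts = {}
    for path in sorted(all_paths):
        # Sites whose copy differs from what everyone had after the last sync.
        changed = {
            site: manifest[path]
            for site, manifest in site_manifests.items()
            if path in manifest and manifest[path] != baseline.get(path)
        }
        # More than one distinct new version means a real conflict.
        if len(set(changed.values())) > 1:
            conflicts[path] = sorted(changed)
    return conflicts
```

Files that appear in the result should be held back (or copied out under site-specific names) for a person to look at, while everything else can be synced automatically.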
I think the central clearinghouse is a good idea.
It depends on the following:
* How many servers/computers need to be synced?
** If there are too many servers, using rsync becomes a problem.
** Either you use threads and sync to multiple servers at the same time, or you sync to them one after the other. So you are looking at a high load on the source machine in the former case, or inconsistent data across the servers (in a cluster) at a given point in time in the latter.
* The size of the folders that need to be synced and how often they change
** If the data is huge, rsync will take time.
* The number of files
** If there are many files, and especially if they are small, rsync will again take a lot of time.
So whether to use rsync, NFS, or version control all depends on the scenario.
If there are few servers and just a small amount of data, then it makes sense to run rsync every hour.
You can also package the content into an RPM if the data changes only occasionally.
With the information provided, IMO version control will suit you best.
Rsync/scp might cause problems if two people upload different files with the same name.
NFS over multiple locations needs to be architected very carefully.
Why not have one or more repositories that everyone just commits to?
All you need to do is keep the repositories in sync.
If the data is huge and updates are frequent, then your repository server will need a good amount of RAM and a good I/O subsystem.
