How can I get the standard streams of my process without any library? - io

I'm writing a programming language of my own and I want to be able to create console applications, which means I need to be able to get at the standard streams of my process.
Can anyone give me any insight into how this is done in other languages?
For example, how does the Console class in C# work under the covers? Or what does cout in C++ compile to?
Any insight would help; I don't have a solid lead.

There is no portable solution without a library. Indeed, the point of the standard C I/O library is to provide a portable solution.
Each platform will also have its own libraries (for the various supported languages), which you could use if your compiler can produce an appropriate calling sequence.
The existence of the three standard streams (input, output and error output) is, in part, an artifact of the standard C library but there is usually an obvious translation from the platform's native interface.
For example, in the POSIX interface every I/O stream is referenced by a small integer called a "file descriptor" (fd), and fds 0, 1 and 2 are reserved for the three standard streams. POSIX shells arrange for these fds to be correctly associated when they launch a console application.
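For illustration, here is a minimal sketch of what the C library ultimately does on a POSIX system: writing straight to fd 1 with no stdio involved (write() is the thin C wrapper over the kernel's system call):

```c
/* POSIX only: talk to the standard output stream via its file descriptor. */
#include <unistd.h>   /* write(), STDOUT_FILENO */

int main(void) {
    const char msg[] = "hello from fd 1\n";
    /* fd 1 was associated with the console by the shell before main() ran */
    (void)write(STDOUT_FILENO, msg, sizeof msg - 1);
    return 0;
}
```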
Windows has a different mechanism, which you can probably read about in the Windows documentation. (The shell interface is very different.)
And so on.

Related

What programming languages support running in a limited manner for security and what proven issues complicate this?

I want to allow user scripting/programming of server-side programs. While I am seeking advice on serverfault regarding how to mitigate security risks on the system administration side of things, I am also wondering which languages provide facilities to restrict user scripts from executing dangerous code?
While I'll discuss Lua and Tcl, I am very interested in other options as well; I love to learn about new languages, and having a selection to choose from would be great from the standpoint of being able to pick one that would be ideal for users.
Lua:
Lua has a sandbox capability.
As an example of an issue that a developer may need to consider for Lua, there is this page regarding Lua bytecode as a vector of attack.
Tcl:
Tcl has the ability to create "safe interpreters":
https://www.tcl-lang.org/man/tcl8.6/TclCmd/interp.htm#M12
If the -safe switch is specified (or if the parent interpreter is a safe interpreter), the new child interpreter will be created as a safe interpreter with limited functionality; otherwise the child will include the full set of Tcl built-in commands and variables.
I haven't been able to find much information regarding issues a developer may need to consider for Tcl.
What are other languages with similar capabilities and do they have any proven known issues in this regard?
In Tcl with basic safe interpreters, all operations that touch the outside world at all (open, exec, source, load, socket, etc.) are removed from the safe interpreter: the commands that provide them are hidden (a special kind of naming that means they cannot be accessed from inside the interpreter), so it's trivially provable that, by default, the only possible problems are excessive memory or CPU use.
But what about how to actually let the safe interpreters do something useful?
Well, that's possible because every safe interpreter has a parent interpreter that is fully-enabled, and which can create inter-interpreter aliases: commands in the safe interpreter which invoke a defined stand-in in the controlling parent. It's good to think of those as being analogous to a system call (though much cheaper!) and they can provide exactly the operations that the application wishes to support. Of course, some substantial care is required if you take an argument to those commands that you intend to treat as a filename or network address, but you at least know that you only ever get poked at in ways that you expect. (The usual way to avoid problems with filenames is to only support abstract handles — simple names defined by the parent — that have no meaning other than “you can use this in these operations”. That's pretty much how Tcl's I/O channel handles work.)
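To make that concrete, here is a hedged sketch using Tcl's C embedding API (Tcl 8.6 names; the parent-side safe_read command and the config handle are invented for the example):

```c
/* Sketch: a safe child interpreter whose only window on the world is an
 * alias that runs a vetting command in the fully-enabled parent. */
#include <stdio.h>
#include <tcl.h>

int main(void) {
    Tcl_Interp *parent = Tcl_CreateInterp();

    /* Parent-side stand-in; it validates the abstract handle before
     * doing anything. safe_read and "config" are made-up names. */
    Tcl_Eval(parent,
        "proc safe_read {handle} {\n"
        "    if {$handle ne {config}} { error {unknown handle} }\n"
        "    return {contents of the config resource}\n"
        "}");

    /* isSafe = 1: open, exec, socket, source, load etc. are hidden
     * inside the new child interpreter. */
    Tcl_Interp *child = Tcl_CreateSlave(parent, "sandbox", 1);

    /* The 'system call': readfile in the child invokes safe_read in the parent. */
    Tcl_CreateAlias(child, "readfile", parent, "safe_read", 0, NULL);

    if (Tcl_Eval(child, "readfile config") == TCL_OK)
        printf("sandbox got: %s\n", Tcl_GetStringResult(child));
    return 0;
}
```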
There's also the full Safe Tcl built-in package, which provides a simulated full interpreter in a safe interpreter and which allows defined profiles of what can be accessed (e.g., just reading packages from a defined local package repository). I'm less certain that that's correct; it is quite complicated internally.

Explain how information flows between programming languages, for beginning programmers

I am an autodidact, teaching myself how to program and script in different languages (novice in: Java, C++, JavaScript/Node.js, HTML/CSS), designing projects and schematics, and adding electronics and peripherals.
What I have seen a lot of while researching is the use of multiple languages to achieve a set of goals (such as building a web server in JavaScript/Node to handle HTTP requests and responses, responding with a web page written in HTML, customized/stylized with CSS and embedded with JavaScript mannerisms; or, instead of Node, you write it in PHP or Python).
I'm having a hard time wrapping my mind around WHY multiple languages are used instead of just one (some high-level languages are capable of performing a large portion, if not all, of the required tasks) and HOW information is passed in between the different languages. Could one program call another (I know that an HTML file can make "calls" to CSS and Javascript files, so, I understand that instance)?
I think the reason why I am hung up on this is because of my inexperience and lack of knowledge of other common languages. Does that mean that certain languages are meant to handle only specific tasks in a specific manner?
I feel like some languages, such as Java and C++, for example, can be used in various ways and in various instances to handle a myriad of different tasks. Is that not true of some of the others (PHP & Python, for instance)?
I'm digging into the wealth of knowledge and the collective experience of some of the most brilliant minds this world has to offer, but remember that I am new to this. I don't have the advantage of doing this in a classroom, though I have read and own many books on programming in specific languages and the like. Please answer in a way that I and the others that may follow can understand.
Thank you for your time and I look forward to the responses.
Cheers.
Fantastic answers!
I'm curious though; when working towards a solution for a specific problem, when does the programmer know when to stop in one language and continue a segment in another language?
That's where I am confused. Is it typically up to the software developer and his/her own particular and artistic preference on how something is done or are certain things just not possible without using multiple languages?
I do understand scripting and when it's beneficial to use rather than a program or application and I know runtime execution/compiled code, environments and frameworks and virtual machines but none of that clearly lays out a defined perimeter or a limit in functionality/ability for any particular language. Why call a C++ function in Python? Could Python not accomplish what was needed in the first place and could choosing a more appropriate language have mitigated the need for adding another level of complexity to the solution? I may be overthinking it but knowing this will guide me in my learning and help me map out better solutions as a programmer.
Basically, the different technologies (browsers, operating systems, etc.) and, with them, programming languages evolved over time, so there are many different languages used in practice, for the same reason that there are multiple natural languages. You could design a web browser that supports Python instead of JavaScript for front-end programming, but this would involve designing the APIs the scripts use to access the pages (the HTML DOM), it would need to be supported by all major web browsers, standardized, and web developers would need to use it.
Yes, in many cases it is possible for programs written in one language to call programs written in another language. There needs to be some kind of interface connecting the two parts, depending on the context. For example:
C and C++ are both compiled languages; that is, they get translated into machine code to be executed by the processor. The positions in the machine code where the code for each function is located are stored in the compiled module. The linker is responsible for combining two modules (compiled .c files) such that function calls made in one module to a function defined in the other resolve to the right machine code. For a C++ program to call C code, one problem (among many others) is that the two languages name functions differently (C++ uses name mangling to encode the arguments and their types). In practice, the functions from the C code would need to be declared extern "C" in the C++ source code for the linker to set this up correctly.
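As a sketch, a C header meant to be usable from C++ typically wraps its declarations like this (mathlib.h and c_add are made-up names for the example):

```c
/* mathlib.h -- hypothetical C library header, callable from C++.
 * Without the extern "C" guard, a C++ compiler would mangle c_add
 * and the linker could not match it against the symbol emitted by
 * the C compiler. */
#ifndef MATHLIB_H
#define MATHLIB_H

#ifdef __cplusplus
extern "C" {            /* C++: use C linkage, i.e. unmangled names */
#endif

int c_add(int a, int b);   /* defined in mathlib.c, compiled as C */

#ifdef __cplusplus
}
#endif

#endif /* MATHLIB_H */
```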
JavaScript, CSS and HTML are interpreted and executed (in JavaScript's case) by the browser, but not necessarily translated into machine code. (JavaScript engines may use just-in-time compilation.) So the browser provides ways for the JavaScript code to access the CSS definitions, for example via element.style.color = ....
For scripting languages like Perl, PHP, Python, etc. to call each other, different libraries exist that handle the necessary intermediate steps ("glue code"). There are many possibilities; for instance, the PHP code could invoke the Python interpreter to execute a Python program, or it could pass data to a running Python program through the operating system's mechanisms, etc.
Wrappers such as SWIG allow C/C++ code to be called from scripting languages. They add the necessary symbols (functions) that Python calls internally. Then the C++ code is compiled as a Python extension, which is loaded by the Python interpreter (itself a compiled program) using the operating system's dynamic linker. The Python interpreter then interprets Python code in such a way that calling a given Python function results in the machine code of the extension's wrapper function being executed.
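For a feel of what such a wrapper looks like, here is a minimal hand-written CPython extension in C, the kind of glue SWIG would generate for you (the spam/add names follow the classic example from the Python documentation, not any real library):

```c
/* spam.c -- a minimal CPython extension module. */
#include <Python.h>

static PyObject *spam_add(PyObject *self, PyObject *args) {
    int a, b;
    if (!PyArg_ParseTuple(args, "ii", &a, &b))  /* unpack two Python ints */
        return NULL;                            /* propagate the TypeError */
    return PyLong_FromLong((long)(a + b));      /* box the result for Python */
}

static PyMethodDef SpamMethods[] = {
    {"add", spam_add, METH_VARARGS, "Add two integers in C."},
    {NULL, NULL, 0, NULL}                       /* sentinel */
};

static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT, "spam", NULL, -1, SpamMethods
};

PyMODINIT_FUNC PyInit_spam(void) {  /* found by the interpreter's module loader */
    return PyModule_Create(&spammodule);
}
```

Compiled as a shared library, the interpreter loads it with a plain import spam, after which spam.add(1, 2) runs the machine code above.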
There are many ways to classify programming languages into categories. For example from low level (machine code) to higher level (more abstraction, translation to machine code handled automatically):
Assembly (for expressing machine code instructions)
Compiled languages for system-level programming. (C, C++, Pascal, ...)
Compiled languages running in a VM (Java, C#, ...)
Scripting languages (Python, Perl, PHP, ...), less focused on efficiency but more flexible.
Higher-level domain-specific languages (MATLAB, AppleScript)
Shell scripting (bash, sh)
Programming is all about creating solutions to problems. People think differently. People see the world from different perspectives. People like to tweak solutions and play with tools. Languages are created by people in order to solve different problems and in some cases just for play. My response is more along the lines of 'Why would there only be one language?'.

How do programs use other programs that are not in the same programming language

How do programs use other programs that aren't in the same language?
For example, Windows is in C++ but the kernel is in C.
I've also seen Java programs use C programs as well.
How do they do this?
Do they use master classes? Like class Whatever : MasterClassName?
I'm not sure what you mean when you say you've seen a Java program use a C program. Do you mean a Java program use an executable that was generated from C code?
On a slightly different note, two programs can communicate with each other through, among other things, a DLL, a socket interface (TCP/UDP), a file, a database, and/or CORBA.
The programming language is not what is relevant for communication between programs. Programming languages are just a means of making programs readable for humans; what the computer actually executes is machine code.
In order to communicate, programs need to make several assumptions about what this communication looks like. This mechanism is generally called a protocol.
For example, applications communicate with an OS kernel typically via a syscall protocol. They store some special values in the processor registers, and use a machine code instruction to switch into kernel mode. The kernel then examines the aforementioned special values to decide which operation needs to be executed (e.g. open file, print on terminal, etc.). The meaning of specific values is specified by the protocol.
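As a concrete, platform-specific sketch, this is roughly what that protocol looks like on x86-64 Linux, where the value 1 in the rax register selects the write operation (register assignments and the syscall number come from that platform's ABI):

```c
/* Invoking the kernel's write operation directly, x86-64 Linux only. */
static long raw_write(int fd, const void *buf, unsigned long len) {
    long ret = 1;                            /* syscall number 1 = write */
    __asm__ volatile (
        "syscall"                            /* switch into kernel mode */
        : "+a"(ret)                          /* rax: syscall nr in, result out */
        : "D"((long)fd), "S"(buf), "d"(len)  /* rdi, rsi, rdx: the arguments */
        : "rcx", "r11", "memory");           /* clobbered by the syscall insn */
    return ret;
}

int main(void) {
    raw_write(1, "hello, kernel\n", 14);     /* fd 1 is standard output */
    return 0;
}
```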
DLLs, files, databases, IPC, networking etc. all are communication protocols, in this particular interpretation. They are usually implemented in terms of more low-level protocols, such as kernel syscalls, though.

Determine source language from a binary?

I responded to another question about developing for the iPhone in non-Objective-C languages, and I made the assertion that using, say, C# to write for the iPhone would strike an Apple reviewer wrong. I was speaking largely about UI elements differing between the ObjC and C# libraries in question, but a commenter made an interesting point, leading me to this question:
Is it possible to determine the language a program is written in, solely from its binary? If there are such methods, what are they?
Let's assume for the purposes of the question:
That from an interaction standpoint (console behavior, any GUI appearance, etc.) the two are identical.
That performance isn't a reliable indicator of language (no comparing, say, Java to C).
That you don't have an interpreter or something between you and the language - just raw executable binary.
Bonus points if you're as language-agnostic as possible.
Short answer: YES
Long answer:
If you look at a binary, you can find the names of the libraries that have been linked in. Opening cmd.exe in TextPad easily finds the following at hex offset 0x270: msvcrt.dll, KERNEL32.dll, NTDLL.DLL, USER32.dll, etc. msvcrt is the Microsoft 'C' runtime support functions. KERNEL32, NTDLL, and USER32.dll are OS specific libraries which tell you either the target platform, or the platform on which it was built, depending on how well the cross-platform development environment segregates the two.
Setting aside those clues, almost any C/C++ compiler will have to insert the names of functions into the binary; there is a list of all functions (or entry points) stored in a table. C++ 'mangles' the function names to encode the arguments and their types, to support overloaded methods. It is possible to obfuscate the function names, but they would still exist. The function signatures would include the number and types of the arguments, which can be used to trace the system or internal calls used in the program. At offset 0x4190 is "SetThreadUILanguage", which can be searched for to find out a lot about the development environment. I found the entry-point table at offset 0x1ED8A. I could easily see names like printf, exit, and scanf, along with __p__fmode, __p__commode, and __initenv.
Any executable for the x86 processor will have a data segment which will contain any static text that was included in the program. Back to cmd.exe (offset 0x42C8) is the text "S.o.f.t.w.a.r.e..P.o.l.i.c.i.e.s..M.i.c.r.o.s.o.f.t..W.i.n.d.o.w.s..S.y.s.t.e.m.". The string takes twice as many characters as is normally necessary because it was stored using double-wide characters, probably for internationalization. Error codes or messages are a prime source here.
At offset 0xB1B0 is "p.u.s.h.d", followed by mkdir, rmdir, chdir, md, rd, and cd; I left out the unprintable characters for readability. Those are all command arguments to cmd.exe.
For other programs, I've sometimes been able to find the path from which a program was compiled.
So, yes, it is possible to determine the source language from the binary.
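As an illustration of the technique, here is a toy C version of the classic 'strings' scan; it prints runs of printable characters, which is enough to surface linked library names, mangled symbols and static text (the real tool is more careful):

```c
/* mini-strings: dump printable runs of at least MIN_RUN characters. */
#include <stdio.h>
#include <ctype.h>

#define MIN_RUN 5   /* arbitrary threshold for an "interesting" string */

int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <binary>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    char run[4096];
    int n = 0, c;
    while ((c = fgetc(f)) != EOF) {
        if (isprint(c) && n < (int)sizeof run - 1) {
            run[n++] = (char)c;          /* extend the current run */
        } else {
            if (n >= MIN_RUN) { run[n] = '\0'; puts(run); }
            n = 0;                       /* non-printable byte ends the run */
        }
    }
    if (n >= MIN_RUN) { run[n] = '\0'; puts(run); }
    fclose(f);
    return 0;
}
```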
I'm not a compiler hacker (someday, I hope), but I figure that you may be able to find telltale signs in a binary file that would indicate what compiler generated it and some of the compiler options used, such as the level of optimization specified.
Strictly speaking, however, what you're asking is impossible. It could be that somebody sat down with a pen and paper and worked out the binary codes corresponding to the program that they wanted to write, and then typed that stuff out in a hex editor. Basically, they'd be programming in assembly without the assembler tool. Similarly, you may never be able to tell with certainty whether a native binary was written in straight assembler or in C with inline assembly.
As for virtual machine environments such as JVM and .NET, you should be able to identify the VM by the byte codes in the binary executable, I would expect. However you may not be able to tell what the source language was, such as C# versus Visual Basic, unless there are particular compiler quirks that tip you off.
What about these tools:
PE Detective
PEiD
Both are PE identifiers. OK, they're both for Windows, but that's what I was dealing with when I landed here.
I expect you could, if you disassemble the binary, or at least you may be able to identify the compiler, as not all compilers will use the same code for printf, for example; so Objective-C and GNU C should differ here.
You have excluded all byte-code languages, so this issue is going to be less common than expected.
First, run 'what' on some binaries and look at the output. CVS (and SVN) identifiers are scattered throughout the binary image, and most of those are from libraries.
Also, there's often a "map" to the various library functions. That's a big hint, too.
When the libraries are linked into the executable, there is often a map that's included in the binary file with names and offsets. It's part of creating "position independent code". You can't simply "hard-link" the various object files together. You need a map and you have to do some lookups when loading the binary into memory.
Finally, the start-up module for C, C++ (and I imagine C#) is unique to that compiler's default set of libraries.
Well, C is initially converted to assembly, so you could write all the C code in assembly.
No, the resulting binary is language-agnostic. Different compilers could even take the same source code and generate different binaries. That's why you don't see general-purpose decompilers that work on arbitrary binaries.
The command 'strings' could be used to get some hints as to what language was used (for instance, I just ran it on the stripped binary of a C application I wrote, and the first entries it finds are the libraries linked by the executable).

Bare metal cross compilers input

What are the input limitations of a bare-metal cross compiler? For instance, does it not compile programs with pointers or mallocs, or anything else that would require more than the underlying hardware? And how can one find out what these limitations are?
I also wanted to ask: I built a cross compiler targeting MIPS, and I need to create a MIPS executable using it, but I am not able to find where the executable is. There is one executable I found, mipsel-linux-cpp, which is supposed to compile, assemble and link and then produce a.out, but it is not doing so.
However, ./cc1 does give MIPS assembly.
There is an install folder which has a gcc executable that produces i386 assembly and then gives an exe. I don't understand how the gcc exe can give i386 and not MIPS assembly when I have specified the target as MIPS.
Please help; I'm really not able to understand what is happening.
I followed these steps:
1. Installed binutils 2.19
2. Configured gcc for MIPS (g++, core)
I would suggest that you should have started two separate questions.
The GNU toolchain does not have any OS dependencies, but the GNU C library does. Most bare-metal cross builds of GCC use the Newlib C library, which provides a set of syscall stubs that you must map to your target yourself. These stubs include the low-level calls necessary to implement stream I/O and heap management. They can be very simple or very complex, depending on your needs. If the only I/O support is a UART for stdin/stdout/stderr, then it is simple. You don't have to implement everything, but if you do not implement the I/O stubs, you won't be able to use printf(), for example. You must implement the sbrk()/sbrk_r() syscall if you want malloc() to work.
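As a hedged sketch (the uart_putc routine, the _end symbol and the exact stub signatures are placeholders; the real details come from your board support code, linker script and Newlib build), two typical stubs look roughly like this:

```c
/* Newlib syscall stubs for a hypothetical bare-metal board. */
#include <stddef.h>

extern void uart_putc(char c);   /* board-specific output routine (assumed) */
extern char _end;                /* end of static data, from the linker script */

static char *heap_ptr = &_end;

/* Called by Newlib beneath printf() and friends for fds 1 and 2. */
int _write(int fd, const char *buf, int len) {
    (void)fd;                    /* everything goes to the one UART here */
    for (int i = 0; i < len; i++)
        uart_putc(buf[i]);
    return len;
}

/* Called by Newlib beneath malloc() to grow the heap. */
void *_sbrk(ptrdiff_t incr) {
    char *prev = heap_ptr;
    heap_ptr += incr;            /* a real stub should check a heap limit */
    return prev;
}
```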
The GNU C++ library will work correctly with Newlib as its underlying library. If you use C++, the C runtime start-up (usually crt0.s) must include the static initialiser loop to invoke the constructors of any static objects that your code may include. The run-time start-up must also of course initialise the processor, clocks, SDRAM controller, timers, MMU etc; that is your responsibility, not the compiler's.
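For reference, the static-initialiser loop mentioned above usually amounts to walking a table of function pointers that the GNU linker collects (ELF .init_array convention assumed here), along these lines:

```c
/* Symbols placed around the constructor table by the GNU linker script. */
extern void (*__init_array_start[])(void);
extern void (*__init_array_end[])(void);

/* Call every static constructor; the run-time start-up does this
 * after initialising .data/.bss and before calling main(). */
static void run_static_constructors(void) {
    for (void (**ctor)(void) = __init_array_start; ctor < __init_array_end; ++ctor)
        (*ctor)();
}
```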
I have no experience of MIPS targets, but the principles are the same for all processors. There is a very useful article called "Building Bare Metal ARM with GNU" which you may find helpful; much of it will be relevant, especially the parts regarding implementing Newlib stubs.
Regarding your other question: if your compiler is called mipsel-linux-cpp, then it is not a 'bare-metal' build but rather a Linux build. Also, this executable does not really "compile, assemble and link"; it is rather a driver that separately calls the pre-processor, compiler, assembler and linker. It has to be configured correctly to invoke the cross-tools rather than the host tools. I generally invoke the linker separately in order to enforce decisions about which standard library to link (-nostdlib), and also because it makes more sense when an application is comprised of multiple execution units. I cannot offer much help beyond that here, since I have always used GNU ARM tools built by people with obviously more patience than me, and moreover hosted on Windows, where there is less possibility of the host tool-chain being invoked instead (one reason why I have also avoided tool-chains that rely on Cygwin).
EDIT
With more time available, I have rewritten my original answer in an attempt to provide something more useful.
I cannot provide a specific answer for your question. I have never tried to get code running on a MIPS machine. What I do have is plenty of experience getting a variety of "bare metal" boards up and running. All kinds of CPUs and all kinds of compilers and cross compilers. So I have an understanding of the principles that apply in all such situations. I will point out the kind of knowledge you will need to absorb before you can hope to succeed with a job like this, and hopefully I can list some links to resources to get you started on learning that knowledge.
I am worried that you don't seem to know that pointers are exactly the kind of thing a bare-metal compiler can handle; they are a basic machine primitive. This tells me you are probably not an expert embedded developer who is just stuck in this particular scenario. Never mind. There isn't anything magic about programming an embedded system, and you can learn what you need to know.
The first step is getting to understand the relationship between C and the machine you wish to run code on. Basically C is a portable assembly language. This means that C is good for manipulating the basic operations of the machine. In this sense the basic operations of the machine are reading and writing memory locations, performing arithmetic and boolean operations on the data read from memory, and making branching and looping decisions based on that data. In particular the C concept of pointers allows you to manipulate data at locations in memory that you specify.
So far so good, but just doing raw computations in memory is not usually enough: you need a way to input and output data from memory. To do that you need to manipulate the hardware peripherals on your board. If the hardware peripherals are memory mapped, then the machine registers used to control the peripherals look exactly like memory locations, and C can manipulate them directly. Even in that case, though, it is much more likely that doing useful I/O is best handled by extending the C core language with a library of routines provided just for that purpose. These library routines handle all the nasty details (timers, interrupts, non-memory-mapped I/O) involved in manipulating the peripheral hardware on the board, and wrap them up with a convenient C function call interface. The idea is that you can simply write printf("hello world"); and the library call takes care of the details of displaying the string.
An appropriately skilled developer knows how to adapt an existing I/O library to a new board, or how to develop new library routines to provide access to non-standard custom hardware. The classic way to develop these skills is to start with something simple, usually an LED for an output device and a switch for an input device. Write a program that pulses an LED in a predictable way, or reads a switch and reflects it on an LED. The first time you get this working will be hugely satisfying.
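That first program can be as small as the sketch below; the register address and pin number are invented, and in reality come straight out of your board's reference manual:

```c
/* Blink an LED through a memory-mapped GPIO register (made-up addresses). */
#include <stdint.h>

#define GPIO_ODR (*(volatile uint32_t *)0x40020014u)  /* hypothetical output register */
#define LED_BIT  (1u << 5)                            /* hypothetical LED pin */

static void delay(volatile uint32_t n) { while (n--) ; }  /* crude busy-wait */

int main(void) {
    for (;;) {
        GPIO_ODR ^= LED_BIT;   /* toggle the LED pin */
        delay(500000u);
    }
}
```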
Okay, I have rambled enough. It is time to provide some more resources for you to study. The good news is that there's never been a better time to learn how things work at the interface between hardware and software. There is a wealth of freely available code and docs. Stack Overflow is a great resource, as you know. Good luck! Links follow:
Embedded systems overview
Knowing the C language well is fundamental
Why not get your code working on a simulator before you try real hardware
Another emulated environment
Linux device drivers - an overlapping subject
Another book about bare metal programming
