Understanding Linux 02: Everything is a File
One of the core philosophies behind the Unix operating system is "Everything is a file". This design provides a uniform interface for applications and users: everything in the system, including processes, files, directories, sockets, and pipes, is represented by a file. This ghost from the past still heavily affects the inner workings of modern Linux distributions.
How can a socket or a process be a file? It is not an actual file; it is represented by a file. This means you can access every hardware and operating system capability of your computer using just file paths.
Before we get into examples, please keep in mind that there are many different ways to achieve any given task on Linux. These examples are designed to build intuition about "everything is a file". There are many specialized commands that read the information from these files and present it to the user with a better interface. The "everything is a file" approach makes it easier to develop such tools. Exploiting system capabilities organized like files is easier than learning different approaches and libraries to access different system capabilities.
Reading Hardware Utilization as File
Users can learn about hardware resources by using the proper file paths. No specialized tool is needed. To print all information about your CPU, including current clock speed and other information:
cat /proc/cpuinfo
...
cpu MHz : 4082.523
cache size : 9216 KB
physical id : 0
siblings : 6
core id : 5
...
The cat (short for “concatenate”) command prints the target file to the console and exits successfully. Another command we are going to use is
Try printing the following files with the cat command and inspect the outputs:
- /proc/meminfo
- /proc/devices
- /proc/diskstats
These outputs are not very human-readable, yet they include all the required information.
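Because these are ordinary text files, a few lines of shell are enough to turn them into something friendlier. The following is a minimal sketch, assuming a kernel recent enough to report the MemAvailable field; the variable names are only illustrative.

```shell
# Pull two fields out of /proc/meminfo; the values are reported in kB.
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)

# Fall back to 0 if a field is missing (assumption: older kernels lack MemAvailable).
mem_total_kb=${mem_total_kb:-0}
mem_avail_kb=${mem_avail_kb:-0}

# Convert kB to MiB with integer arithmetic.
echo "Total memory:     $((mem_total_kb / 1024)) MiB"
echo "Available memory: $((mem_avail_kb / 1024)) MiB"
```

This is roughly what specialized tools do behind the scenes: read the file, pick the interesting lines, and format them.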
Managing LEDs as File
Light-emitting diodes (LEDs) are the little blinking lights on the computer and on the devices plugged into it. These LEDs are represented by files under the proper folder:
ls /sys/class/leds/
input15::capslock input15::kana input15::scrolllock input16::numlock
input4::capslock input4::kana input4::scrolllock input4::numlock
input15::compose input15::numlock input16::capslock input16::scrolllock
...
The ls command lists the files in the given folder, or in the current folder if no target is given. Behaves similarly to DIR command and
Try the following command to read the status of the capslock LED on your keyboard:
cat /sys/class/leds/input15\:\:capslock/brightness
0
Not the most useful feature for daily life, yet like everything else, it is a file. Linux allows you to access and manage even a single LED on your hardware. This gets useful for USB devices with disturbingly bright LEDs, like older ALFA dongles.
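To see every LED at once, the same files can be read in a loop. This is a rough sketch that assumes the usual LED class layout, where each LED folder holds a brightness and a max_brightness file; on machines without any LEDs (virtual machines, containers) it simply reports zero.

```shell
# Walk over every LED the kernel exposes and print its current state.
led_dir=/sys/class/leds
led_count=0
for led in "$led_dir"/*; do
    # Skip when the pattern matched nothing or the file is not readable.
    [ -r "$led/brightness" ] || continue
    led_count=$((led_count + 1))
    echo "$(basename "$led"): $(cat "$led/brightness") of $(cat "$led/max_brightness")"
done
echo "Found $led_count LED(s)"

# Changing an LED is a write to the same file and usually needs root.
# The name below is the one from the listing above; yours will differ.
# echo 0 | sudo tee /sys/class/leds/input15::capslock/brightness
```

The commented-out last line is how an overly bright LED could be switched off, assuming the driver allows its brightness to be changed.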
Accessing Processes as File
If you list the files in the /proc folder using the ls command, you will see many directories with numeric names.
ls /proc/
1 1186 1359 1424 15291 1795 210 235 3492 48 55 698 834 crypto kpagecount swaps
10 1189 1362 1425 1559 18390 211 24 35 49 5599 700 835 devices kpageflags sys
These are the process IDs of the processes currently running on the operating system. Looking in these directories, we can learn everything about the given process.
Resource Usage and Resource Limits as File
How many resources can the process with ID 235 use?
cat /proc/235/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 128186 128186 processes
...
How much disk I/O has that process done?
cat /proc/235/io
rchar: 0
wchar: 0
syscr: 0
...
Process 235 has not performed any I/O operations.
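The same files exist for every process, including the shell you are typing in. The sketch below inspects the current shell instead of process 235, since that PID will not exist on most systems; it assumes the usual layout of the status and limits files.

```shell
# $$ expands to the PID of the current shell, so /proc/$$ is always present.
pid=$$

# The "Name:" line of /proc/<pid>/status holds the process name.
proc_name=$(awk '/^Name:/ {print $2}' "/proc/$pid/status")

# Soft limit on open files, taken from the "Max open files" row of limits.
max_open_files=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")

# Each entry in /proc/<pid>/fd is one open file descriptor.
open_fds=$(ls "/proc/$pid/fd" | wc -l)

echo "PID $pid ($proc_name): $open_fds of $max_open_files file descriptors in use"
```

Replacing $$ with another PID gives the same report for any process you are allowed to inspect.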
Recovering the Source Code of a Running Command from Memory
Memory objects stored in RAM are also easily accessible within the file structure. This can be leveraged to recover the source code or binaries of running programs even after the original files were deleted.
Consider this scenario: a program on your system started 30 days ago and is still running. By accident, somebody deleted the script from the server. The running script is not affected by the deletion of the file, but the file needs to be recovered.
We can easily recover the source code or binary of any given application in Linux:
$ echo "sleep 9999" > test.sh
$ bash test.sh &
[1] 31048
$ rm -v test.sh
$ cat /proc/31048/fd/255
sleep 9999
sleep is a basic command that simply waits for a given number of seconds and quits. For our example, it is used to keep the process alive for a while.
The rm command deletes the given file. Beware, there is no recycle bin in Linux. What is deleted with rm is often gone for good.
bash is the default shell we will be using during this tutorial and it will be explored in detail in later sections. For this example, it is used to execute the file as a script. The ampersand sign at the end of the command puts the command in the background and prints the process ID of the process.
As seen in the example, users can even reach the data of running processes using only file paths. In this example, bash holds the script open as file descriptor 255, which is why its content is still readable under /proc/31048/fd after the file name is gone.
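The same recovery works for any file that a process still holds open. The following sketch reproduces the situation inside a single shell: it opens a throwaway file on descriptor 3 (an arbitrary choice), deletes it, and reads the content back through /proc.

```shell
# Create a throwaway file; mktemp chooses the name.
tmpfile=$(mktemp)
echo "important data" > "$tmpfile"

# Open the file for reading on descriptor 3 of the current shell.
exec 3< "$tmpfile"

# Remove the name. The data survives while descriptor 3 stays open.
rm "$tmpfile"

# Read the content back through the shell's own entry under /proc.
recovered=$(cat "/proc/$$/fd/3")
echo "Recovered: $recovered"

# Close the descriptor; only now can the kernel release the data.
exec 3<&-
```

In a real incident you would redirect the output of cat into a new file instead of a variable, and do so before the process exits.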
Using File Paths to Create TCP Connections
Every resource you can reach over your network is also represented by a file. Consider the following file path:
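/dev/tcp/time.nist.gov/13
Even though it looks like a local file in the system, it represents connecting to port 13 (/13) of the host time.nist.gov (/time.nist.gov) using TCP (/tcp) capable devices (/dev). This port is used by the daytime protocol. Note that bash itself interprets this path when it appears in a redirection; it is not listed under /dev.
By using the core Linux command cat, we can read information from this port:
cat </dev/tcp/time.nist.gov/13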
59198 20-12-15 14:31:45 00 0 0 4.8 UTC(NIST) *
Daytime is a primitive protocol that directly responds to queries without any negotiation. Talking with HTTP servers is also possible by using extra commands.
exec 4<>/dev/tcp/mirrors.kernel.org/80
echo -e "GET / HTTP/1.1\r\nHost: mirrors.kernel.org\r\nConnection: close\r\n\r\n" >&4
cat <&4 | head
HTTP/1.1 200 OK
Server: nginx
...
The echo command is similar to the cat command. cat reads a file and prints it to the terminal. echo, on the other hand, prints its arguments to the terminal. In the example, it is used to send a GET request to the HTTP server.
The exec command is capable of modifying the running shell. We will discuss shells in detail in later sections. For now, this command is used to keep the connection to the file path open on file descriptor 4. Reading the time with the daytime protocol was easy because it is a primitive protocol: for any question, daytime replies with the current time, as shown in the example above. HTTP is a much more complex protocol that works with requests and responses. We used the exec command to keep the connection open for the response, according to the HTTP 1.1 specs.
Even though it works, this is not very useful for daily HTTP tasks. We will explain these commands in later sections; for now, try to focus only on the filesystem capabilities.
Pseudorandom number generators
There are some special files for generating pseudorandom numbers. These numbers are generated by the operating system and consumed by reading these special files.
To print 100 bytes of random data to the console:
head -c 100 </dev/urandom
There are also other generator files in the Linux operating system.
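Raw random bytes tend to garble the terminal, so in practice they are usually converted to printable text first. A small sketch, assuming the common od and tr utilities are available; the variable name token is only illustrative.

```shell
# Read 16 random bytes and render them as 32 hexadecimal characters.
# od -An -tx1 prints each byte as two hex digits without an address column;
# tr then removes the spaces and newlines between them.
token=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "Random token: $token"
```

Each run prints a different value, which makes this handy for throwaway identifiers or temporary passwords.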
Conclusion
As seen in the examples, there are many similar files that can alter the behavior of the operating system and the connected hardware. This design makes it easier for software to interact with these resources.
It is important to note that none of these files existed on disk before or after these commands. Linux does not save the data to files for the user to read later; the data is generated when a user attempts to access these files. The file size of these files is often 0.
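The zero file size is easy to check. The sketch below compares the size recorded in the file's metadata with the number of bytes that actually come back from a read; /proc/version is used here only as a convenient example, and stat -c assumes the GNU or BusyBox version of stat.

```shell
# Size according to the file's metadata.
reported_size=$(stat -c %s /proc/version)

# Number of bytes actually delivered when the file is read.
actual_bytes=$(wc -c < /proc/version)

echo "Size reported by stat:   $reported_size"
echo "Bytes returned by read:  $actual_bytes"
```

The first number is 0 while the second is not, because the kernel produces the content at the moment the file is read.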
In the next section we are going to discuss how these files are organized.