Linux | 19 December, 2020

Understanding Linux 02: Everything is a File

One of the core philosophies behind the Unix operating system is "Everything is a file". This design provides a uniform interface for applications and users: everything in the system, from processes, files, and directories to sockets and pipes, is represented by a file. This legacy of the past still heavily shapes the inner workings of modern Linux distributions.

How can a socket or a process be a file? It is not an actual file; it is represented by a file. This means you can access a wide range of the hardware and operating system capabilities of your computer using just file paths.

Before we get into examples, please keep in mind that there are many different ways of achieving any given task on Linux. These examples are designed to build intuition about "everything is a file". There are many specialized commands that read the information from these files and present it to the user with a better interface. The "everything is a file" approach makes it easier to develop such tools: using system capabilities that are organized like files is easier than learning a different approach and library for each capability.

Reading Hardware Utilization as File

Users can learn about hardware resources by reading the proper file paths. No specialized tool is needed. To print the available information about your CPU, including the current clock speed:

cat /proc/cpuinfo
...
cpu MHz		: 4082.523
cache size	: 9216 KB
physical id	: 0
siblings	: 6
core id		: 5
...

The cat (short for “concatenate”) command prints the target file to the console and exits successfully.

Try printing the following files with the cat command and inspect the outputs:

/proc/meminfo
/proc/devices
/proc/diskstats

These outputs are not very human-readable, yet they include all the required information.
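As a minimal sketch of how a friendlier tool can be built on top of these files, the snippet below extracts a single value from /proc/meminfo. It assumes the usual "MemTotal: <number> kB" line layout, and the variable name mem_total_kb is only illustrative.

```shell
# Read the total amount of RAM (in kB) from /proc/meminfo.
# Assumes a line of the form: "MemTotal:       <number> kB"
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "Total memory: ${mem_total_kb} kB"
```

Specialized commands that report memory usage can work in much the same way: they read the file and reformat what they find.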

Managing LEDs as File

Light-emitting diodes (LEDs) are the little indicator lights on your computer and the devices plugged into it. These LEDs are represented by files under the /sys/class/leds folder:

ls /sys/class/leds/
input15::capslock  input15::kana     input15::scrolllock  input16::numlock
input4::capslock   input4::kana      input4::scrolllock   input4::numlock
input15::compose   input15::numlock  input16::capslock    input16::scrolllock
...

The ls command lists the files in the given folder. It lists the current folder if no target is given. It behaves similarly to the DIR command.

Try the following command to read the status of the Caps Lock LED on your keyboard (the input15 part of the name will differ from machine to machine):

cat /sys/class/leds/input15\:\:capslock/brightness
0

This is not the most useful feature for daily life, yet like everything else, it is a file. Linux allows you to access and manage even a single LED on your hardware. This becomes useful for USB devices with disturbingly bright LEDs, like older ALFA dongles.
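The following is a small sketch that prints every LED together with its current brightness. The helper name show_leds is hypothetical, and the folder it scans defaults to /sys/class/leds. Changing an LED is a write to the same brightness file, which usually requires root, so that part is left as a comment.

```shell
# Print every LED found in a folder together with its current brightness.
# show_leds is a hypothetical helper; it defaults to /sys/class/leds.
show_leds() {
    led_dir=${1:-/sys/class/leds}
    for led in "$led_dir"/*; do
        # Skip entries without a readable brightness file (or an empty folder).
        [ -r "$led/brightness" ] || continue
        printf '%s %s\n' "$(basename "$led")" "$(cat "$led/brightness")"
    done
}

show_leds

# Turning an LED off is a write to the same file (usually requires root),
# for example with a device name taken from the listing:
# echo 0 | sudo tee /sys/class/leds/input15::capslock/brightness
```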

Accessing Processes as File

If you list the files in the /proc folder using the ls command, you will see many directories with numeric names.

ls /proc/
1      1186   1359   1424   15291  1795   210    235   3492  48    55    698  834        crypto       kpagecount    swaps
10     1189   1362   1425   1559   18390  211    24    35    49    5599  700  835        devices      kpageflags    sys

These are the process IDs of the processes currently running on the operating system. By looking into these directories, we can learn a great deal about the given process.
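As a rough sketch, a very small process lister can be written by walking these directories and reading the comm file, which holds the command name of each process. The helper name list_processes is hypothetical.

```shell
# Print "PID NAME" for every process visible under /proc.
# list_processes is a hypothetical helper name.
list_processes() {
    for dir in /proc/[0-9]*; do
        # A process may exit while we iterate, so ignore unreadable entries.
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s %s\n' "${dir#/proc/}" "$name"
    done
}

list_processes
```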

Resource Usage and Resource Limits as File

How many resources can the process with ID 235 use?

cat /proc/235/limits

Limit                     Soft Limit           Hard Limit           Units   
Max cpu time              unlimited            unlimited            seconds  
Max file size             unlimited            unlimited            bytes   
Max data size             unlimited            unlimited            bytes   
Max stack size            8388608              unlimited            bytes   
Max core file size        0                    unlimited            bytes   
Max resident set          unlimited            unlimited            bytes   
Max processes             128186               128186               processes
...

How much I/O has that process performed?

cat /proc/235/io
rchar: 0
wchar: 0
syscr: 0
...

Process 235 has not performed any I/O operations.
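The same files exist for every process, including the one you are typing into. The sketch below reads the soft limit for open files from /proc/self/limits and prints it next to the value reported by the shell's ulimit builtin. The variable name soft_nofile is illustrative, and the field position assumes the column layout shown in the limits listing.

```shell
# Soft limit for open files, read from the limits file of the current process.
# /proc/self always points at the process that opens it (awk here), which
# inherits its limits from the shell that started it.
soft_nofile=$(awk '/^Max open files/ {print $4}' /proc/self/limits)
echo "Open files (soft limit) from /proc:  $soft_nofile"
echo "Open files (soft limit) from ulimit: $(ulimit -Sn)"
```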

Recovering the Source Code of a Running Command from Memory

Files that a running process still holds open are also accessible within the file structure, even after they have been deleted: the kernel keeps the contents of a deleted file available until the last process using it closes it. This can be leveraged to recover the source code or binaries of running programs after the original files were deleted.

Consider this scenario: a script on your system was started 30 days ago and is still running. By accident, somebody deleted the script file from the server. The running process is not affected by the deletion, but the file needs to be recovered.

As long as the process is still running, we can recover the script through the file descriptors listed under its /proc directory. In this example, bash keeps the script it is executing open on file descriptor 255:

$ echo "sleep 9999" > test.sh
$ bash test.sh &
[1] 31048
$ rm -v test.sh
$ cat /proc/31048/fd/255
sleep 9999

sleep is a basic command that simply waits for the given number of seconds and exits. In our example, it is used to keep the process alive for a while.

The rm command deletes the given file. Beware, there is no recycle bin on the Linux command line. What is deleted with rm is often gone for good.

bash is the default shell we will be using during this tutorial, and it will be explored in detail in later sections. In this example, it is used to execute the file as a script. The ampersand sign at the end of the command puts the command in the background and prints the process ID of the new process.

As seen in the example, users can even reach the data held by running processes using only file paths.
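The same mechanism can be reproduced without a background script. The following is a minimal sketch that keeps a temporary file open on a file descriptor, deletes it, and then reads it back through /proc. The descriptor number 9 and the variable names are arbitrary choices for the illustration.

```shell
# Create a small file and keep it open on file descriptor 9.
tmpfile=$(mktemp)
echo "important data" > "$tmpfile"
exec 9< "$tmpfile"

# Delete it: the name is gone, but the data survives while fd 9 stays open.
rm "$tmpfile"

# Read the contents back through /proc ($$ is the PID of this shell).
recovered=$(cat "/proc/$$/fd/9")
echo "Recovered: $recovered"

# Close the descriptor; only now can the kernel release the data.
exec 9<&-
```

To actually restore a lost file, the output of cat would be redirected into a new file instead of a variable.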

Using File Paths to Create TCP Connections

Resources you can reach over your network can also be addressed with a file path. Consider the following path:

/dev/tcp/time.nist.gov/13

Even though it looks like a local file in the system, it represents a connection to port 13 of the host time.nist.gov over TCP. This port is used by the daytime protocol. Note that there is no such file on disk: the bash shell recognizes paths of this form in redirections and opens the TCP connection itself.

By combining such a redirection with the core Linux command cat, we can read information from this port:

cat </dev/tcp/time.nist.gov/13
59198 20-12-15 14:31:45 00 0 0   4.8 UTC(NIST) *

Daytime is a primitive protocol: the server sends the current time as soon as a client connects, without any negotiation. Talking to HTTP servers is also possible with a few extra commands:

exec 4<>/dev/tcp/mirrors.kernel.org/80
echo -en "GET / HTTP/1.1\r\nHost: mirrors.kernel.org\r\nConnection: close\r\n\r\n" >&4
cat <&4 | head
HTTP/1.1 200 OK
Server: nginx
...

The echo command is a close relative of the cat command. cat reads a file and prints its contents to the terminal. echo, on the other hand, prints its own arguments. In the example, the output of echo is redirected into the connection in order to send a GET request to the HTTP server.

The exec command is capable of modifying the running shell. We will be discussing shells in detail in later sections. For now, it is used to open the connection as file descriptor 4 of the shell, for both reading and writing, and to keep it open across commands. Reading the time with the daytime protocol was easy because it is a primitive protocol: daytime replies to any connection with the current time, as shown in the example above. HTTP is a much more complex protocol that works with requests and responses, so we used exec to keep a single connection open for sending the request and then reading the response.

Even though it works, this is not very useful for daily HTTP tasks. We will explain these commands in later sections; for the moment, try to focus only on the filesystem capabilities.
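For illustration only, the steps above can be wrapped into a small bash function that also closes the connection when it is done. This is a sketch, not a replacement for a real HTTP client: the name http_get and its arguments are hypothetical, it relies on the /dev/tcp redirection of bash, and it speaks plain HTTP only.

```shell
# Fetch a path over plain HTTP using bash's /dev/tcp redirection.
# http_get is a hypothetical helper: http_get HOST [PORT] [PATH]
http_get() {
    host=$1
    port=${2:-80}
    path=${3:-/}
    # Run in a subshell so descriptor 4 is closed automatically afterwards.
    (
        exec 4<>"/dev/tcp/$host/$port" || exit 1
        # A minimal HTTP/1.1 request; "Connection: close" asks the server to
        # end the connection after the response so that cat terminates.
        # The Host header omits the port, which is fine for the default port.
        printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' \
            "$path" "$host" >&4
        cat <&4
    )
}

# Example (needs network access):
# http_get mirrors.kernel.org 80 / | head
```

Running the body in a subshell is a design choice: the descriptor disappears together with the subshell, so the calling shell is left unchanged even when the connection fails.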

Pseudorandom Number Generators

There are some special files that generate pseudorandom numbers. These numbers are generated by the operating system and consumed by reading the special files.

To print 100 bytes of random data to the console:

head -c 100 </dev/urandom

There are other generator files in the Linux operating system as well, such as /dev/random.
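Raw random bytes are mostly unprintable, so they tend to clutter the terminal. As a small sketch, the bytes can be converted into hexadecimal text to obtain something usable, for example as a throwaway token. The variable name token and the length of 16 bytes are arbitrary choices.

```shell
# Read 16 random bytes and print them as 32 hexadecimal characters.
# od prints the bytes in hex; tr removes the spaces and line breaks.
token=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "$token"
```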

Conclusion

As seen in the examples, there are many similar files that can expose or alter the behavior of the operating system and the connected hardware. This design makes it easier for software to interact with these resources.

It is important to note that the contents of these files are not stored on disk, either before or after these commands. Linux does not save the data to files for the user to read later; the data is generated at the moment a user reads one of these files. This is why the reported size of these files is often 0.
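This can be checked with a short sketch that compares the size reported for /proc/cpuinfo with the number of bytes that actually arrive when the file is read. It assumes the stat command from GNU coreutils (the -c option), and the variable names are illustrative.

```shell
# Size reported by the filesystem versus bytes actually delivered on read.
reported_size=$(stat -c %s /proc/cpuinfo)
actual_bytes=$(cat /proc/cpuinfo | wc -c)
echo "Reported size: $reported_size bytes"
echo "Bytes read:    $actual_bytes"
```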

In the next section we are going to discuss how these files are organized.
