---
title: Understanding files 101
date: 2020-11-21
tags: [code]
description: One of the basics that is often overlooked is a general knowledge of common file types. So this is what this article is about. After reading this you will have a basic understanding of what any file is and how you might be able to interact with it.
---

One topic that seems to always stay relevant to me is the divide between
software developers and regular users. The common approach seems to be that
regular users are not trusted to understand many fundamental ideas of computers
and therefore get very limited apps compared to the power tools that are used
by developers. This approach of course only widens the divide.

On the other side of that argument we get something like
[code.org](https://code.org/) which is trying to teach kids how to code, but
doing so in a toy environment which is completely detached from reality.

I personally believe that we should concentrate on the basics that can be
applied in everyday life. If people know how to look something up on the
internet, they can teach themselves. If people know how to use right-click
menus, common keyboard shortcuts, or browser tabs, they can complete their
tasks much more efficiently. Teach a man to fish and all that.

One of these basics that is often overlooked is a general knowledge of common
file types. So this is what this article is about. After reading this you will
have a basic understanding of what any file is and how you might be able to
interact with it.

## A file is not a program

Before I get into it, I have to clear up a common misconception:

Files are often associated with programs. We say things like "Word file",
"Excel file", or "Photoshop file" and we expect that the respective program
will launch when we double-click such a file.

But that doesn't mean that these files are exclusive to those programs. There
could be many other programs that can open those files just as well. The
widespread claim "[you need Adobe Reader to open PDF
files](https://pdfreaders.org/)" is simply a lie used for marketing.

Just think of an mp3 audio file. When you double-click it, an audio player
launches. But you could install a different audio player and set it as the
default any time. The same goes for jpg or png images and many other types of
files.

With that out of the way, let's get to the actual files.

## Text files

You probably already know that files just consist of ones and zeroes, also
called *bits*. How these bits are interpreted depends on the specific type,
but many file types use the same basic structure: Groups of eight bits (we call
them *bytes*) are mapped to characters. For example, `01100001` is mapped to
`a`. This way we can create simple text files.
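If you want to see this mapping in action, here is a minimal Python sketch. The helper names `bits_to_text` and `text_to_bits` are made up for this article, and I assume plain ASCII to keep things simple.

```python
def bits_to_text(bits: str) -> str:
    """Turn space-separated groups of eight bits into ASCII text."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode("ascii")


def text_to_bits(text: str) -> str:
    """Turn ASCII text into space-separated groups of eight bits."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


print(bits_to_text("01100001"))  # a
print(text_to_bits("hi"))        # 01101000 01101001
```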

There are different mappings, but the most important ones are
[ASCII](https://en.wikipedia.org/wiki/ASCII) and
[UTF-8](https://en.wikipedia.org/wiki/UTF-8). ASCII is old and simple and only
contains the most important characters for the English language. UTF-8 is
newer and more complicated. It contains everything that ASCII does and then a
lot more, e.g. Chinese characters or emojis.
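A small sketch of the difference, again in Python. The helper `utf8_bytes` is just an illustrative name. The point is that UTF-8 uses a single byte for everything ASCII knows, and several bytes for everything else.

```python
def utf8_bytes(character: str) -> bytes:
    """Return the bytes that UTF-8 uses to store one character."""
    return character.encode("utf-8")


# These four characters need 1, 2, 3 and 4 bytes respectively.
for character in ["a", "ß", "中", "😀"]:
    encoded = utf8_bytes(character)
    print(character, len(encoded), encoded.hex())

# For plain English text, ASCII and UTF-8 produce exactly the same bytes.
same_bytes = "hello".encode("ascii") == "hello".encode("utf-8")
print(same_bytes)  # True
```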

When I say that bytes are mapped to characters, I use a very loose definition
of the term "character". It includes not only letters and digits, but also
punctuation, spaces (`00100000`), and even line breaks (`00001010`).

The programs that are used to view and edit text files are called "text
editors". The default text editor on Windows is called Notepad, the one on
macOS is called TextEdit, and a common one on Linux is called gedit.
Word is not a text editor in this sense, because it stores its documents in
much more complicated files that can also contain formatting and images, which
goes far beyond the simple mapping we are talking about here (we will get to
that).

If you come across a file that you don't recognize, it is often a good idea to
look at it in a text editor. If the file happens to contain text, you can read
it and maybe understand enough to know what to do next.
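The same trick can be done in a few lines of Python. This is only a rough sketch: the function name `peek`, the 200-byte limit, and the rule "zero bytes or invalid UTF-8 means it is not text" are my own assumptions, not a reliable file type detector. The two demo files are created on the fly so that the example runs anywhere.

```python
import codecs
import tempfile
from pathlib import Path


def peek(path, limit=200):
    """Return the start of a file as text, or None if it does not look like text."""
    with open(path, "rb") as f:
        data = f.read(limit)
    if b"\x00" in data:
        # Zero bytes almost never appear in real text files.
        return None
    # An incremental decoder tolerates a character cut off at the limit.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        return decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return None


with tempfile.TemporaryDirectory() as folder:
    text_file = Path(folder) / "notes.unknown"
    text_file.write_bytes("hello\nworld\n".encode("utf-8"))
    binary_file = Path(folder) / "blob.unknown"
    binary_file.write_bytes(bytes([0xFF, 0xD8, 0x00, 0x10]))

    text_result = peek(text_file)
    binary_result = peek(binary_file)

print(repr(text_result))  # 'hello\nworld\n'
print(binary_result)      # None
```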

## XML

Mapping bits to characters already gives us some structure, but apparently not
enough. So people have invented different formats on top of that. Probably the
most widespread of these formats is XML. An XML file looks roughly like this:

```
<animals>
	<animal name="dog">
		<sound>Bark</sound>
		<legs>4</legs>
	</animal>
	<animal name="cat">
		<sound>Meow</sound>
		<legs>4</legs>
	</animal>
</animals>
```

I am not going to explain all the details of XML, but I hope it is somewhat
self-explanatory. You can usually identify it by the use of all those angle
brackets.
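Because the structure is so regular, programs can read it easily. Here is a small sketch that reads the animals example with the XML parser from Python's standard library. The variable names are mine, and real-world XML files are usually messier than this.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<animals>
	<animal name="dog">
		<sound>Bark</sound>
		<legs>4</legs>
	</animal>
	<animal name="cat">
		<sound>Meow</sound>
		<legs>4</legs>
	</animal>
</animals>
"""

root = ET.fromstring(XML_TEXT)

# Collect the sound and the number of legs for each animal by name.
animals = {
    animal.get("name"): (animal.findtext("sound"), int(animal.findtext("legs")))
    for animal in root.findall("animal")
}

print(animals)  # {'dog': ('Bark', 4), 'cat': ('Meow', 4)}
```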

XML and its relatives are used virtually everywhere. For example: Every web
page is written in HTML, a close relative of XML that uses the same
angle-bracket style.

## ZIP

Text files are great because they are much easier to read compared to a stream
of ones and zeroes. However, that comes at the price of being less efficient.
For example: "14", when encoded as two characters, is `0011000100110100`. When
we encode "14" as a number directly we get `1110`, which is obviously much
shorter.
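You can check these numbers yourself with a short Python sketch. One caveat I am glossing over: in practice computers store whole bytes, so the number 14 would take up at least eight bits (`00001110`), which is still half the size of the text version.

```python
# "14" stored as text: one byte for "1" and one byte for "4".
text_bits = "".join(format(byte, "08b") for byte in "14".encode("ascii"))

# 14 stored directly as a binary number.
number_bits = format(14, "b")

# 14 stored as a number in one whole byte.
one_byte = (14).to_bytes(1, "big")

print(text_bits)    # 0011000100110100
print(number_bits)  # 1110
print(one_byte)     # b'\x0e'
```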

Some smart people have come up with a great solution for that: We can
*compress* files to get back some of that efficiency. So now we get the best of
both worlds: We can write some understandable text files and then bundle them
together into a single compressed ZIP file.
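To get a feeling for how much compression helps, here is a sketch that bundles two made-up text files into a ZIP archive in memory and compares the sizes. The file names and contents are invented for this example, and they are deliberately repetitive, so real files will compress less dramatically.

```python
import io
import zipfile

# Two invented, very repetitive text files.
files = {
    "animals.xml": '<animal name="dog"><sound>Bark</sound></animal>\n' * 200,
    "notes.txt": "hello " * 500,
}

# Write both files into one compressed ZIP archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for name, text in files.items():
        archive.writestr(name, text)

original_size = sum(len(text.encode("utf-8")) for text in files.values())
zip_size = len(buffer.getvalue())
print(original_size, "bytes of text became", zip_size, "bytes of ZIP")

# Unpacking gives back exactly the same text.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    restored = {name: archive.read(name).decode("utf-8") for name in archive.namelist()}
print(restored == files)  # True
```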

This is actually how many file types work these days. For example, try changing
the file extension of a modern MS Office file (e.g. `.docx` or `.xlsx`) to
`.zip`. You can now unpack it and see what it contains: You guessed it, a bunch
of XML files!
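You don't even have to rename the file if you use a few lines of Python. The sketch below uses a helper I called `list_zip_contents`. Since I can't ship a real Office document with this article, the demo builds a tiny stand-in archive in memory. It only borrows the entry name `word/document.xml` and is not a valid Word document. With a real file you would pass its path instead, e.g. a hypothetical `report.docx`.

```python
import io
import zipfile


def list_zip_contents(file):
    """Return the names inside a ZIP-based file, or None if it is not a ZIP."""
    if not zipfile.is_zipfile(file):
        return None
    with zipfile.ZipFile(file) as archive:
        return archive.namelist()


# Stand-in for a real document: a ZIP archive with a single XML file inside.
fake_document = io.BytesIO()
with zipfile.ZipFile(fake_document, "w") as archive:
    archive.writestr("word/document.xml", "<document>Hello</document>")

contents = list_zip_contents(fake_document)
print(contents)  # ['word/document.xml']

# A plain text file is not a ZIP archive.
not_a_zip = list_zip_contents(io.BytesIO(b"just some text"))
print(not_a_zip)  # None
```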

## Conclusion

Of course there are a lot more file types. But text files and ZIP alone already
cover a lot of ground.

I believe the awareness of text files might be the biggest factor in the divide
between software developers and regular users. All programming is done in text
files. Settings for programming tools are usually changed not in some graphical
dialog but by editing a configuration file. I write this article not in Word
but in my text editor.

I don't think you have to do everything in a text editor. But I also don't
think you should be dependent on a specific application for each task. Knowing
about file types gives you the freedom to switch applications or even
interact with files you have never seen before. It also opens up the world
of developer power tools, if you ever want to go there. It just generally
increases your ability to effectively and responsibly navigate the modern
digital world.
