Writing in RTL languages (those using abjad writing systems) comes with a few challenges, especially if your editor’s support for those languages isn’t good, which is the case for the majority of web-based editors (e.g. comment sections). The issues range from the wrong direction for text that mixes RTL and LTR characters to sentence-final punctuation marks jumping to the beginning of the sentence. In this post I’ll show how to solve issues of this kind using specific Unicode characters.
The first sentence has its exclamation mark at the beginning of the sentence, but the second sentence has its exclamation mark correctly at the end. Why? Because it has an extra 202B character at the beginning. We call these characters non-printing characters (for obvious reasons 😁). There are also characters that can change a paragraph to RTL or LTR, reverse a word, change the order of character blocks, and so on. The aim of this post is to get you familiar with some of these characters. I wrote it after I found this question on Stack Overflow.
Intro & Terminology
1- Directionality:
All characters have an intrinsic “directionality”. For example, Latin characters have Left-to-Right directionality and Persian/Arabic/Urdu characters have Right-to-Left directionality. Some characters, like the dot (.) and the comma (,), have neutral directionality. When we type a series of RTL characters, the text will be RTL, and when we type a series of LTR characters, the text will be LTR. (Oh really? 😂)
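As a rough illustration, Python’s standard `unicodedata` module exposes the bidirectional class that Unicode assigns to each character. The `direction` helper below is a hypothetical name of mine, and collapsing every class into three buckets is a simplification, not how a real text renderer classifies characters.

```python
import unicodedata

def direction(ch: str) -> str:
    """Coarse direction of a single character: 'LTR', 'RTL' or 'neutral'."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "LTR"
    if bidi in ("R", "AL"):  # R: right-to-left, AL: right-to-left Arabic letter
        return "RTL"
    return "neutral"  # simplification: everything else is lumped together

for ch in ("A", "س", ".", ","):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), direction(ch))
```

The raw class names printed in the middle column (such as `L`, `AL` and `CS`) come straight from the Unicode character database.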
2- Character Block:
Characters next to each other that have the same directionality are called “character blocks”! In the next image we have 7 character blocks.
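To make the idea concrete, here is a minimal sketch that splits a string into runs of characters sharing the same coarse direction. The helper names are hypothetical, and the sketch treats spaces and punctuation as their own “neutral” runs, whereas a real renderer resolves neutral characters from their surroundings, so the count it gives may differ from what you see on screen.

```python
import unicodedata
from itertools import groupby

def direction(ch: str) -> str:
    """Coarse direction of a single character (simplified)."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "LTR"
    if bidi in ("R", "AL"):
        return "RTL"
    return "neutral"

def character_blocks(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that share the same coarse direction."""
    return [(d, "".join(chars)) for d, chars in groupby(text, key=direction)]

sample = "مطالب upload شدند."
for d, block in character_blocks(sample):
    print(d, repr(block))
```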
3- Different Types of Non-printing Characters
Three of these types matter here: 1- Mark characters, 2- Embedding characters, 3- Override characters.
Mark Characters
1- 200F: Right-to-Left Mark
This character has an intrinsic right-to-left directionality but no visual appearance! Its use is that if we put it next to a neutral character, it can change the directionality of the associated character block to right-to-left.
A simple text in Stack Overflow’s answers section.
The same text with an extra 200F at the end.
2- 200E: Left-to-Right Mark
This character has an intrinsic left-to-right directionality and, just like the former one, has no visual appearance. We can use it alongside a neutral character to change the associated character block to left-to-right.
The text below has a dot at the end. After adding the 200E character, the directionality of the last character block is set to LTR, and the dot moves to the right of the English sentence.
Usage of 200E at the end of the sentence.
The raw text shown in VSCodium.
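If you would rather produce such text programmatically than type the marks by hand, a minimal Python sketch could look like this. The `end_with_mark` helper and the sample sentence are my own assumptions, and how the result is drawn still depends on the editor or browser that displays it.

```python
import unicodedata

RLM = "\u200f"  # Right-to-Left Mark
LRM = "\u200e"  # Left-to-Right Mark

def end_with_mark(sentence: str, mark: str) -> str:
    """Append an invisible mark right after the trailing neutral character."""
    return sentence + mark

# Both marks are invisible, but they carry a strong directionality.
for mark in (RLM, LRM):
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.bidirectional(mark))

original = "سلام دنیا!"
fixed = end_with_mark(original, RLM)
print(len(original), len(fixed))  # the fixed string is one code point longer
```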
Embedding Characters
1- 202B: Right-to-left Embedding
This is the character that we talked about in our first example. Its job is to change the directionality of a paragraph to RTL, or to be more precise, to change the order of the character blocks within a paragraph to Right-to-Left. A really useful scenario for this character is writing a mixed list of words: to make the list look as expected, a 202B must be added before each LTR character block.
Addition of 202B before each number (numbers are LTR).
The raw text shown in VSCodium.
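The same trick can be scripted. The sketch below prepends 202B to every non-empty line of a list; the function name and the sample list items are hypothetical, and it does not try to detect whether a line actually needs the character.

```python
RLE = "\u202b"  # Right-to-Left Embedding

def rle_prefix_lines(text: str) -> str:
    """Prepend RLE to each non-empty line that does not already start with it."""
    fixed = []
    for line in text.split("\n"):
        if line and not line.startswith(RLE):
            line = RLE + line
        fixed.append(line)
    return "\n".join(fixed)

todo = "1. سیب\n2. موز\n3. انار"
print(rle_prefix_lines(todo))
```

Running the function twice gives the same result as running it once, so it is safe to apply to text that has already been fixed.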
P.S: A Secondary Usage of 202B
A secondary use I personally had for it was writing English words and having them aligned to the right in editors such as Notepad, KWrite and Gedit. To do so, I add this character before the paragraph and the paragraph moves to the right. In this niche scenario, 200F can be used instead with no other changes.
Using 202B before 3rd and 4th paragraph
Not using 202B before 3rd and 4th paragraph
Another use case for this character is in the editors of some government websites that have no built-in way to detect the directionality of the text, so you end up with output like this: مطالب upload شدند.
So we add a 202B at the beginning and we’re good to go:
مطالب upload شدند.
2- 202A: Left-to-right Embedding
In the next image I’ve changed the directionality of each character block to LTR so that I could write a math equation using Persian words:
Addition of 202A before each word in the second Persian equation.
The raw text shown in VSCodium.
3- 202C: Pop Directional Formatting or PDF
This character marks the end of an embedded section. Imagine I want to use the 202A and 202B characters multiple times in a text: wherever I want to end an embedded section, I use 202C (PDF). I actually used this character in the Persian translation of this post. In that case, I wanted a left-to-right character block within a right-to-left paragraph, so I ended the LTR section with a 202C.
The first sentence in the next image is raw. In the second one, the directionality of the whole paragraph has changed to LTR. The third one has a single LTR character block while the rest of the paragraph is still RTL, which is what we expect.
The text mentioned
The raw text shown in VSCodium.
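A minimal sketch of the open-then-close pattern, assuming a hypothetical `embed` helper: the embedding character opens the section and 202C closes it, so the rest of the paragraph keeps its own direction.

```python
LRE = "\u202a"  # Left-to-Right Embedding
RLE = "\u202b"  # Right-to-Left Embedding
PDF = "\u202c"  # Pop Directional Formatting

def embed(text: str, rtl: bool = False) -> str:
    """Wrap text in an embedding and close it again with PDF."""
    return (RLE if rtl else LRE) + text + PDF

# One LTR block inside an otherwise RTL sentence.
sentence = "مطالب " + embed("upload (v2.0)") + " شدند."
print(repr(sentence))
```

Always pairing the opening character with a 202C keeps the embedding from leaking into whatever text follows.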
Override Characters
1- 202D & 202E
They don’t have many use cases, but they are the cutest non-printing characters 😇. They override the directionality of every single character in a character block! So if you type “Hello” and add a 202E (Right-to-Left Override) at the beginning, it’ll display as “olleH”, and if you write “سلام” with a 202D (Left-to-Right Override) at the beginning, it’ll display as “مالس”! Apart from this fun use case, the only other one I know of is an evil one 😈, which I’ll tell you about next.
Using override characters for fun!
The raw text shown in VSCodium.
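Here is a small sketch of building such strings in Python. The helper names are hypothetical, the closing 202C is my own addition to keep the override from affecting later text, and `approximate_display` only imitates what a bidi-aware renderer would draw for a simple single-script word.

```python
LRO = "\u202d"  # Left-to-Right Override
RLO = "\u202e"  # Right-to-Left Override
PDF = "\u202c"  # Pop Directional Formatting

def override(text: str, rtl: bool) -> str:
    """Force every character of text to be laid out in one direction."""
    return (RLO if rtl else LRO) + text + PDF

def approximate_display(word: str) -> str:
    """Rough imitation of how an overridden single-script word looks."""
    return word[::-1]

print(repr(override("Hello", rtl=True)))
print(approximate_display("Hello"))
```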
2- The Evil Use-case of Override Characters!
Look at the file below. What do you see? A harmless .png photo, right?
Image of a harmless .png photo.
But NOOO! It’s a Windows executable!
Behind the scenes of the ‘harmless’ file!
Now let me show you the raw text of the filename!
The raw text shown in VSCodium.
This is an evil use case of 202E: you can make an executable (probably a malicious one) look like an innocent picture! By the way, this method has become largely obsolete. Even if you change the name to look innocent, you still can’t bypass UAC (User Account Control) or antivirus software, or make a thumbnail that works on all Windows versions.
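On the defensive side, a filename can be checked for these characters before you trust its apparent extension. The sketch below is only an illustration: the function names and the sample filename are hypothetical, and the set covers just the seven characters discussed in this post.

```python
import os
import unicodedata

# Only the seven characters covered in this post.
BIDI_CONTROLS = {"\u200e", "\u200f", "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}

def find_bidi_controls(name: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every directional control in name."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(name) if ch in BIDI_CONTROLS]

def strip_bidi_controls(name: str) -> str:
    """Remove the directional controls so the real character order is visible."""
    return "".join(ch for ch in name if ch not in BIDI_CONTROLS)

suspicious = "photo\u202egnp.exe"  # a renderer may show this as "photoexe.png"
print(find_bidi_controls(suspicious))   # [(5, 'RIGHT-TO-LEFT OVERRIDE')]
print(os.path.splitext(suspicious)[1])  # .exe
print(strip_bidi_controls(suspicious))  # photognp.exe
```

The real extension is whatever comes last in the stored string, not what appears last on screen.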
Playground
Here is a playground for you to work with the characters and see how things work.
How To Use On Linux
The majority of Linux keyboards have these characters by default. The following table shows the location of each character:
| Location | Character |
| --- | --- |
| Alt + 0 | 200F |
| Alt + 9 | 200E |
| Alt + ] | 202B |
| Alt + [ | 202A |
| Alt + p | 202C |
| Alt + i | 202D |
| Alt + o | 202E |
How To Use On Windows
There are two ways to write these characters on Windows. The first is to hold down the Alt key and type the character’s decimal Unicode number on the numpad. The other is to add the characters to your keyboard by following Microsoft’s tutorial on how to Insert ASCII or Unicode Latin-based symbols and characters.
Characters On Windows
| Location | Character |
| --- | --- |
| Alt + 8207 | 200F |
| Alt + 8206 | 200E |
| Alt + 8235 | 202B |
| Alt + 8234 | 202A |
| Alt + 8236 | 202C |
| Alt + 8237 | 202D |
| Alt + 8238 | 202E |
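The numbers in the Windows table are simply the decimal form of each character’s hexadecimal code. If you want to double-check them, or work out the number for another character, a short sketch like this does the conversion (the `alt_code` name is mine):

```python
import unicodedata

HEX_CODES = ["200F", "200E", "202B", "202A", "202C", "202D", "202E"]

def alt_code(hex_code: str) -> int:
    """Convert a hexadecimal code point to the decimal number typed with Alt."""
    return int(hex_code, 16)

for hex_code in HEX_CODES:
    ch = chr(alt_code(hex_code))
    print(f"Alt + {alt_code(hex_code)}  U+{hex_code}  {unicodedata.name(ch)}")
```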
Outro
Thank you for reading my post. If you spot any errors, or have any suggestions about this post or anything else on my blog, I’ll be happy if you leave a comment or send me an email.
By elamir
🧠 Logician (INTP)
❤️ A good friend
💻 Software developer
💊 Medical science student
🌐 A global citizen from Iran