
Tuesday, December 5, 2017

My attempt at creating a substitution cipher

As I wandered around Bletchley Park on Sunday, I couldn't help but wonder if I could come up with a plaintext cipher of my own.

Now, I'm not well versed in any programming languages* - though I did once dick around in PHP - unless you count scripting something in mIRC. Nor am I any good at math, which rules out any complicated stuff like NAND or XOR, so my only option was a letter substitution cipher.
 
*09/02/2023: This is now available in Python.

The only problem with these is that they're incredibly trivial to crack, especially if each letter always maps to the same substitute: A = T, B = Z, C = L, etc.

Whilst walking around in circles - which is something I do whenever I'm lost in thought or bored (or both) - I came up with the solution: Kanji, Hiragana and Katakana.

This brought the number of usable characters to around 2,280.

Where it all began:

Here's what I did:

I used the following website [saiga-jp (archived)] to get a list of Kanji - which also included Jinmei level Kanji as a bonus, bringing the character count to over 3,000 - and Windows charmap to get the Hiragana and Katakana characters, including vu (ヴ), wi (ヰ) and the other unused ones. (I didn't include the small versions of characters like ぅ.)

Following that, I removed any characters that looked remotely similar to each other like Katakana ロ vs. Kanji 口 (mouth); or Katakana ニ vs. Kanji 二 (two).
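That clean-up was done by eye, but a filter along these lines could do the same job. This is only a sketch: `LOOKALIKE_PAIRS` and `remove_lookalikes` are made-up names, the pair list contains just the two examples mentioned above, and it assumes that both members of a pair get dropped.

```python
# Hypothetical look-alike list; only the two pairs named in the post.
LOOKALIKE_PAIRS = [
    ("\u30ed", "\u53e3"),  # Katakana RO vs. Kanji "mouth"
    ("\u30cb", "\u4e8c"),  # Katakana NI vs. Kanji "two"
]


def remove_lookalikes(pool, pairs=LOOKALIKE_PAIRS):
    """Return the pool without any character listed in a look-alike pair.

    Assumption: both members of a pair are removed. Order is preserved.
    """
    banned = {ch for pair in pairs for ch in pair}
    return [ch for ch in pool if ch not in banned]


# Tiny stand-in pool: the four look-alikes plus Hiragana A and I.
pool = ["\u30ed", "\u53e3", "\u30cb", "\u4e8c", "\u3042", "\u3044"]
cleaned = remove_lookalikes(pool)
print(cleaned)
```

Only the two Hiragana survive, since every other character in the stand-in pool belongs to a pair.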

The only ones I didn't remove were 夕 and タ because they looked quite different from each other - notice how the \ actually goes through the ノ in the second character? This seems to depend on the font, though, and in retrospect I probably should have removed them.

After I did this, I mixed up the characters in a random order in Notepad and worked out how many characters I would need per line, depending on which characters I permitted to be encoded.

The end result looked a little something like this.



I opted for an .ini file for this because I felt it was easier for me to work with.
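The shuffle-and-split step can be sketched roughly like this. Everything here is an assumption made for illustration: `build_key` is a hypothetical helper, the pool is a stand-in slice of the CJK Unified Ideographs block rather than the curated list, and the result is a plain dict rather than the actual .ini layout, which isn't reproduced here.

```python
import random


def build_key(pool, permitted, seed=None):
    """Shuffle the pool and deal an equal share of characters to each permitted symbol."""
    rng = random.Random(seed)  # not a cryptographic RNG; fine for a demo
    shuffled = list(pool)
    rng.shuffle(shuffled)
    per_symbol = len(shuffled) // len(permitted)
    if per_symbol < 2:
        # Each symbol needs at least two substitutes (see the cons further down).
        raise ValueError("need at least two characters per permitted symbol")
    return {
        symbol: shuffled[i * per_symbol:(i + 1) * per_symbol]
        for i, symbol in enumerate(permitted)
    }


# Stand-in pool: 120 consecutive code points, NOT the hand-picked character list.
pool = [chr(cp) for cp in range(0x4E00, 0x4E00 + 120)]
permitted = "abcdefghijklmnopqrstuvwxyz0123456789"
key = build_key(pool, permitted, seed=1)
print(len(key), len(key["a"]))
```

With 120 characters and 36 permitted symbols, each symbol gets 3 substitutes and the remainder is simply left unused.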

The logistics: 

The next stage was for me to write a half-assed mIRC alias to generate an encrypted string based on my input for demonstration purposes.

But before I could do this there were a few factors I needed to take into consideration in order to make the output harder (read: not impossible) to crack.

1.
A character may only be encrypted if it's in the list of pre-defined permitted characters. Any character that is not permitted (with the exception of space) must either be set as "X" or, if strict mode is applied, cause the encryption process to be terminated.

Hopefully this should force people to write characters like £ as "pounds," which also makes the string longer.

2.
All Japanese characters must be randomly picked from that specific character's line of characters. If you look at the above screenshot, line 4 holds the characters for the character "3." So: 3 => || 3 =>

However...
  
3.
Any character that follows itself in succession can never be given the same Japanese character as the one immediately before it.

Let's take the sound "Grrr," as an example. This has three r's. Therefore the second r cannot have the same character as the first r, and the third r cannot have the same character as the second r.

So Grrr => 貌よ一よ would be acceptable, but Grrr => 貌よよよ is not. However, if there is a space, having the same character as the last character before the space is also acceptable.

So Grrr r => 貌よ一よ よ is valid. 

Here's an example [imgur] of me testing this using two Kanji representing the letter "a."

With all that done, I coded an alias which can be downloaded from here. [github]

Pros:

1. Various strings for the same sentence, like so.
 



2. The sheer number of Kanji that can be used. The example ini file on GitHub contains around 3,200 Kanji. From what I've heard there are well over 8,000 Kanji, which works out at more than 85 characters per ASCII character.

3. Character and key order can be restructured freely.
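The first point is easy to see from the receiving end: anyone holding the key can turn it into a reverse lookup, and every variant of the ciphertext collapses back to the same plaintext. A minimal sketch, assuming a key shaped as a dict of symbol to substitutes and spaces left unencoded; `invert_key`, `decode` and `toy_key` are hypothetical names, not taken from the actual script.

```python
def invert_key(key):
    """Map every substitute character back to the one symbol that owns it."""
    inverse = {}
    for symbol, row in key.items():
        for cipher_char in row:
            if cipher_char in inverse:
                raise ValueError(f"{cipher_char!r} appears in more than one line")
            inverse[cipher_char] = symbol
    return inverse


def decode(ciphertext, key):
    """Decode a ciphertext whose spaces were left unencoded."""
    inverse = invert_key(key)
    return "".join(" " if ch == " " else inverse[ch] for ch in ciphertext)


# Toy key for demonstration only.
toy_key = {"G": ["貌", "山"], "r": ["よ", "一"]}
g1, g2 = toy_key["G"]
r1, r2 = toy_key["r"]
print(decode(g1 + r1 + r2 + r1, toy_key))  # Grrr
print(decode(g2 + r2 + r1 + r2, toy_key))  # Grrr
```

Two different strings, one message. The reverse lookup only depends on which line a character sits in, not on where that line or character is positioned, which is also why the key can be reordered freely.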

Cons: 

1. Requires at least two Japanese characters for one letter re: the doppelgänger problem.

2. Would require a Japanese locale installed on the computer, or at least good handwriting if someone writes down the ciphertext. (Though why would anyone do that in this day and age?)


Final word:

This seemed like an interesting idea, at least from my perspective. I'm not sure if anyone has ever attempted this before in a harder-to-crack form.

Now I just want to see if somebody can crack it.

If you want to have a go, then please try decrypting this message I wrote with a completely different ini key structure.


You won't get a prize if you do, but you may get a warm fuzzy feeling~
 
09/02/2023: Sadly, due to circumstances beyond my control - such as the password for the VeraCrypt file these were stored in being apparently incorrect - I've had to completely remake these. I will make a note of the links and re-add them if, by some freak of magic, I'm able to get back into the file.
 
1st: https://github.com/Jigsy1/IdeoCipher/issues/2 (EASY MODE - spaces and no padding)
2nd: https://github.com/Jigsy1/IdeoCipher/issues/3 (NORMAL MODE - encoded spaces, no padding)
3rd: https://github.com/Jigsy1/IdeoCipher/issues/4 (HARD MODE - encoded spaces and padding)

16/11/2018:

I was going to modify this blog post slightly to reflect a new name change, but, you know, *effort*.

Last night I found a very large list [rikai (archived)] (>20,000) of ideographs, which according to sqrt on IRC included Chinese ideographs. As a result, the script is henceforth referred to as the Ideograph Cipher.

And here's a message using the same structure as the third implementation, but with an extended file.
 
4th: https://github.com/Jigsy1/IdeoCipher/issues/5 (LUNATIC MODE) 
 

If you manage to succeed, leave a message in the comments, create an "issue" [github] on GitHub, or just comment on those specific GitHub threads.

Good luck!