Ruin your co-developers' life with Mimic, the Unicode substitution tool

This is an idea of superlative malice: a developer has posted a GitHub project that replaces ASCII characters in C# code with near-homoglyphs from the Unicode character set. Nobody would miss the substitution if emoji started popping up in their code, but “Mimic” from Greg Toombs is more subtle than that. His script, inspired …

  1. Anonymous Coward

    Ohh nasty

    Note to co-workers: don't try it, unless you want your desk phone to start ringing to the tune of Rick Astley, or something equally annoying to happen.

    1. Captain DaFt

      Re: Ohh nasty

      You're way too kind.

      I'd slip them one of these: http://www.xamuel.com/blank-mp3s/ for a ringtone, and watch the hilarity ensue as they miss calls.

      1. DanielN

        Re: Ohh nasty

        Heh. With silent mp3 comes silent responsibility.

  2. John Savard

    It's the Character Set's Fault!

    This is what we get for abandoning the punched card, and indulging in fripperies like lower-case on computers!

    More seriously, while Apple's new language Swift allows foreign-language characters in identifiers, in general it would seem to me that if compilers scanned for "illegal character" errors in code (outside of character string constants, and comments) before doing any other syntax checking - as they did in the old days of punched cards - bugs of this nature would be simple to identify and deal with.
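A check of the kind described above, flagging non-ASCII characters everywhere except string constants and comments, can be sketched with a tokenizer. This is only an illustration under stated assumptions: it uses Python's standard `tokenize` module, so it applies to Python source rather than C#, and the function name `non_ascii_outside_literals` is made up for this example. A real compiler would do the equivalent inside its own lexer.

```python
import io
import tokenize


def non_ascii_outside_literals(source):
    """Return (line, column, char) for non-ASCII characters in code tokens.

    Lines are 1-based and columns 0-based, as reported by tokenize.
    String literals and comments are skipped on purpose.
    """
    skipped = {tokenize.STRING, tokenize.COMMENT}
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skipped:
            continue
        row, col = tok.start
        for offset, ch in enumerate(tok.string):
            if ord(ch) > 127:
                hits.append((row, col + offset, ch))
    return hits


# Hypothetical sample: the identifier on line 2 contains a Cyrillic "a"
# (U+0430); the accented letters sit in a string and a comment.
sample = 'x = 1\nn\u0430me = "h\u00e9llo"  # caf\u00e9\n'
print(non_ascii_outside_literals(sample))
```

Only the look-alike letter in the identifier is reported; the accented characters inside the string and the comment are left alone.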

    1. Anonymous Coward

      Re: It's the Character Set's Fault!

      As one who learned on punched cards and punched paper tape, every program should check that every input is within a valid range before processing it further. Decades ago, as an EDP Auditor for a telco, I used to put randomly punched cards in input data to test this. You only need one COBOL program trying to execute the Data Division to learn. A side benefit would be a lack of buffer overflow hacks; just by checking for allowable length. Harumph, harumph.

  3. Gene Cash Silver badge

    EMACS native coding system functions

    You can also do ctrl-x <return> f (or meta-x set-buffer-file-coding-system) then pick us-ascii as the coding system for the file. Anything that doesn't translate gets flagged.

    I learned that back in the dark ages of TN3270 when it would randomly crap out doing the EBCDIC->ASCII conversion.

  4. ammabamma
    Trollface

    Abusing unicode text is fun!

    ˙"ʞɹoʍ ɹno pɐǝɹ oʇ buıןıǝɔ ǝɥʇ oʇ sɹıɐɥɔ ɹıǝɥʇ dɐɹʇs oʇ ǝʌɐɥ ɹǝbuoן ou ןןıʍ ʎɥɔɹɐuoɯ ǝɥʇ ɟo ʎuuɐɹʎʇ pǝʇsıɟ-uoɹı ǝɥʇ ɹǝpun ןןıʇs suısnoɔ uɐǝpodıʇuɐ ɹood ɹno" ʇɐɥʇ os suı-ʞɔǝɥɔ ʇsɹıɟ ןıɹdɐ ɹıǝɥʇ ɟo ǝuo uı sıɥʇ ǝʞıן buıɥʇǝɯos pıp ɐıןɐɹʇsnɐ uı ɯɐǝʇ ʇuǝɯdoןǝʌǝp ɹno 'ǝʇou snoıɔıןɐɯ ssǝן ʇɐɥʍǝɯos ɐ uo

    Great mischief can also be had with embedding bi-directional text marks in "normal" text

    1. Anonymous Coward

      Re: Abusing unicode text is fun!

      Good to know I haven't lost the ability to read the boss's papers lying on his desk without him realising it.

      The brain has to work a little harder when it sees a letter that could be valid the normal way up. n(u), a(e), m(w), g(b). There was an interesting study of how well we can cope with different orientations. Upside down is a relatively easy one. I wonder if upside down but written from right to left would be more difficult to read? On the other hand Hebrew doesn't take much practice to find it a natural direction with the different alphabet.

      1. DJV Silver badge
        Happy

        Re: Abusing unicode text is fun!

        Indeed, I learned my upside down reading skills at school by electing to sit at the front at the desk that butted up against the teacher's desk (of the same height). She always assumed it was the buggers at the back that were cheating!

  5. a_yank_lurker

    Wasn't C# developed by MicroSlurp?

  6. Kanhef

    A variation

    Write a program that makes substitutions only in variable names and classes/structs/etc. (but not standard library ones). The code will still compile and run the same, but trying to change it would be a nightmare. Might be useful if you have to let someone see your code, but don't want them to steal it.

    1. druck Silver badge
      Happy

      Re: A variation

      This is nothing new. A program for Arthur, the pre-RISC OS operating system on the Acorn Archimedes, was written in BASIC but protected itself from being reverse engineered by replacing every variable and function name with the string "ClaresMicroSupplies", or rather a subtly different combination of letter cases for each instance. It did make it completely impossible to follow, as you just can't tell the difference based on case alone.

      1. Anonymous Coward

        Re: A variation

        " It did make it completely impossible to follow, as you just can't tell the difference based on case alone."

        Someone once had a company programming standard that said assembler statement labels had to be letter-number combinations. Can't remember if that applied to naming variables too. The letters indicated the hierarchy of the subroutine in the program, and the number indicated that it was the nth label declared in that routine. For a GoTo language it made it very difficult, 10 years later, to understand how the code worked.

  7. allthecoolshortnamesweretaken

    Evil. Possible knighthood on behalf of the BOFH in the near future?

  8. Anonymous South African Coward Bronze badge

    Evil. Muhuhahahaha.

  9. Anonymous Coward

    I'm sure many a fun hour will be wasted looking for these substitutions.

    I've not pranked my co-developers, but I always thought a good one would be to slip a leading zero onto an integer constant, thus converting it to octal (at least in Java). How many people look at the value of constants when they fire up their IDE? The only danger is that there's a chance the change could make it into production.

    The funniest one I've done was to swap over some keys on another dev's keyboard. It was like watching someone riding a bike suddenly forget how it's done.

    1. Anonymous Coward

      I've had that happen unwittingly to me in production. strtoul, if you don't tell it otherwise (that is, if you pass it a base of 0), will implicitly interpret numbers with leading zeros as octal.

      I discovered this while writing some test code for a former employer. I tested it by entering the digits straight in; they used it by entering serial numbers with leading zeros. Thankfully the fix was easy.

      No, if I wanted to be nasty, it'd be slipping this script into the git hooks of their working directory so that it committed code with these substitutions.
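The leading-zero pitfall mentioned in this sub-thread can be sketched as follows. This is a simplified imitation of the base auto-detection that C's `strtoul` performs when given base 0, not a binding to the C function: `parse_auto_base` is a hypothetical helper, it handles only unsigned digit strings, and unlike `strtoul` it raises an error on invalid digits instead of stopping at them.

```python
def parse_auto_base(text):
    """Imitate strtoul's base-0 rule for plain unsigned digit strings.

    "0x..." is read as hexadecimal, a leading "0" means octal,
    and anything else is decimal.
    """
    s = text.strip().lower()
    if s.startswith("0x"):
        return int(s[2:], 16)
    if len(s) > 1 and s.startswith("0"):
        return int(s[1:], 8)
    return int(s, 10)


serial = "0123"  # hypothetical serial number typed with a leading zero
print(parse_auto_base(serial))  # auto-detected base: read as octal
print(int(serial, 10))          # explicit base 10: read as decimal
```

Passing the base explicitly, as in the second call, is the kind of easy fix the comment alludes to.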

    2. Stephen Wilkinson

      We amended one of our developer's keyboards so part of the bottom row read C U N T

    3. Anonymous Coward

      When our first microprocessor arrived we connected it to one of the Teletypes in the developers' terminal room as its first real test. It was programmed to present the mainframe login sequence - then to start over again. The unfortunate guinea pig didn't have to suffer their frustration too long though.

  10. GrumpenKraut
    Boffin

    Easy

    LC_ALL=C grep --color -n '[^ -~]' *.cc *.h

    Finds all non-ASCII chars (and also tabs); colorization is very useful here.

    1. GregToombs

      Re: Easy

      This is a handy trick - I'll be adding it to the wiki. Thank you.

      [The one caveat: it does not highlight spaces, and yes, mimic wreaks havoc on spaces.]
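Because colour highlighting cannot show a look-alike space, one way around the caveat is to report each offending character by code point and name instead. The sketch below is an assumption-laden illustration, not part of Mimic or its wiki: `describe_non_ascii` is a made-up name, and it uses the same "anything outside space to tilde" rule as the grep pattern, so tabs are reported too.

```python
import unicodedata


def describe_non_ascii(lines):
    """List (line, column, code point, name) for characters outside ' '..'~'.

    Line and column numbers are 1-based. Control characters such as tab
    have no Unicode name and are reported as "UNNAMED".
    """
    findings = []
    for lineno, line in enumerate(lines, start=1):
        for col, ch in enumerate(line.rstrip("\n"), start=1):
            if not (" " <= ch <= "~"):
                name = unicodedata.name(ch, "UNNAMED")
                findings.append((lineno, col, f"U+{ord(ch):04X}", name))
    return findings


# Hypothetical input: the space after "x" is really U+00A0.
code = ["int x\u00a0= 1;\n", "return x;\n"]
for finding in describe_non_ascii(code):
    print(finding)
```

To check a real file, the list can be replaced with an open file object, for example `open(path, encoding="utf-8")`.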

  11. Bc1609

    Working in international publishing

    Has at least given me a good grasp on handling non-ASCII characters. Many of our print systems pre-date any kind of extended character set, and so all of my team have a little toolbox of scripts to flag (and in many cases automatically replace) anything above chr127. Maybe I'll have to start running them over my code as well as the content...
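A "flag and, where possible, automatically replace" script of the sort described above might look roughly like this. It is a guess at the general approach, not the poster's actual tooling: `to_ascii` is a hypothetical name, and the replacement strategy (Unicode NFKD decomposition, dropping combining accents, and marking anything left over with "?") is just one common choice.

```python
import unicodedata


def to_ascii(text):
    """Return (cleaned_text, flagged_chars).

    Accented letters are reduced to their base letter via NFKD
    decomposition. Characters with no ASCII equivalent are replaced
    by "?" and collected in flagged_chars for manual review.
    """
    cleaned = []
    flagged = []
    for ch in unicodedata.normalize("NFKD", text):
        if ord(ch) < 128:
            cleaned.append(ch)
        elif unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        else:
            cleaned.append("?")
            flagged.append(ch)
    return "".join(cleaned), flagged


print(to_ascii("caf\u00e9"))      # accent can be stripped automatically
print(to_ascii("Ru\u01bfert"))    # wynn (U+01BF) has no ASCII equivalent
```

Look-alike letters from other scripts do not decompose to ASCII, so they end up in the flagged list rather than being silently "fixed".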

  12. Anonymous IV

    Already done in posts to local newspaper's website

    A troll, apparently with unfettered administrator access to our local newspaper's website comment columns, has regularly posted the following boilerplate for at least a year:

    "And when it comes to Gloucester Rugby, what the club really should have done, whilst he graced the Kingsholm turf, was to ensure that Ruƿert Hαrden (our first choice, our most talented and our most complete tight head prop) started as many games for Gloucester as was possible."

    Inquiring whether "most complete" means "ungelded" would draw a level of vituperation beyond anything I would ever wish to experience...

  13. brotherelf
    Paris Hilton

    So where's the news?

    Homograph attacks weren't all that surprising (in retrospect) in 2002.

    1. GregToombs
      Facepalm

      Re: So where's the news?

      I would agree with you: whereas some people seem surprised by the existence of this vulnerability and my somewhat sophomoric project to exploit it, I'm instead surprised that they're surprised. Unicode has been around for about a quarter century.
