Ohh nasty
Note to co-workers: don't try it, unless you want your deskphone to ring to the tune of Rick Astley, or something else equally annoying to happen.
This is an idea of superlative malice: a developer has posted a GitHub project that replaces ASCII characters in C# code with near-homoglyphs from the Unicode character set. Nobody would miss the substitution if emoji started popping up in their code, but “Mimic” from Greg Toombs is more subtle than that. His script, inspired …
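For anyone who hasn't seen the trick in action, here's a minimal Python sketch of a homoglyph swap. The four-entry table is an assumption chosen for illustration (Cyrillic а, е, о, plus U+037E GREEK QUESTION MARK standing in for a semicolon). It is not Mimic's actual substitution list, and the helper names are made up.

```python
import unicodedata

# Hypothetical substitution table: ASCII character -> look-alike code point.
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    ";": "\u037e",  # GREEK QUESTION MARK
}

def swap_homoglyphs(text: str) -> str:
    """Replace every mapped ASCII character with its look-alike."""
    return text.translate(str.maketrans(HOMOGLYPHS))

def describe(text: str) -> list[str]:
    """Name each non-ASCII character so the substitution becomes visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in text if ord(ch) > 127]

original = "int total = count;"
tampered = swap_homoglyphs(original)
print(tampered == original)  # False: looks the same on screen, compares unequal
print(describe(tampered))
```

The `describe` step is the point: the two strings are indistinguishable in most fonts until you ask for code point names.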
This is what we get for abandoning the punched card, and indulging in fripperies like lower-case on computers!
More seriously, while Apple's new language Swift allows foreign-language characters in identifiers, in general it seems to me that if compilers scanned for "illegal character" errors in code (outside of string constants and comments) before doing any other syntax checking, as they did in the old days of punched cards, bugs of this nature would be simple to identify and deal with.
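A rough sketch of that kind of pre-scan, in Python, assuming C-like lexical rules: `//` and `/* */` comments, and `"..."` and `'...'` literals with backslash escapes. C# verbatim strings, raw strings and nested comments aren't handled, and `find_illegal_chars` is a hypothetical name.

```python
def find_illegal_chars(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, char) for each non-ASCII character in code,
    ignoring // and /* */ comments and "..." / '...' literals."""
    hits = []
    state = "code"   # "code", "line_comment", "block_comment" or "string"
    quote = ""       # which quote character opened the current literal
    line, col = 1, 0
    i = 0
    while i < len(source):
        ch = source[i]
        nxt = source[i + 1] if i + 1 < len(source) else ""
        col += 1
        if state == "code":
            if ch == "/" and nxt == "/":
                state = "line_comment"
            elif ch == "/" and nxt == "*":
                state = "block_comment"
                i, col = i + 1, col + 1  # consume the '*' too
            elif ch in "\"'":
                state, quote = "string", ch
            elif ord(ch) > 127:
                hits.append((line, col, ch))
        elif state == "line_comment":
            if ch == "\n":
                state = "code"
        elif state == "block_comment":
            if ch == "*" and nxt == "/":
                state = "code"
                i, col = i + 1, col + 1  # consume the closing '/'
        else:  # inside a string or character literal
            if ch == "\\":
                i += 1  # skip the escaped character
                if nxt == "\n":
                    line, col = line + 1, 0
                else:
                    col += 1
            elif ch == quote:
                state = "code"
        if ch == "\n":
            line, col = line + 1, 0
        i += 1
    return hits

code = 'int a = 1\u037e // caf\u00e9\nstring s = "na\u00efve";\n'
print(find_illegal_chars(code))  # only the Greek question mark on line 1 is reported
```

The accented letters in the comment and the string literal pass, while the fake semicolon in the code is caught before any parsing happens.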
As one who learned on punched cards and punched paper tape, I'd say every program should check that every input is within a valid range before processing it further. Decades ago, as an EDP Auditor for a telco, I used to put randomly punched cards into input data to test this. You only need one COBOL program trying to execute the Data Division to learn. A side benefit would be an absence of buffer overflow hacks, simply from checking for allowable length. Harumph, harumph.
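As a sketch of that kind of up-front checking, here is a Python validator for one fixed-width record. The 80-column layout (account number in columns 1-8, a month in columns 9-10) is invented for illustration; the point is that length and ranges are checked before anything else touches the data.

```python
RECORD_LENGTH = 80  # one punched card's worth of columns

def validate_record(record: str) -> list[str]:
    """Return a list of problems; an empty list means the record may be processed.
    Hypothetical layout: cols 1-8 account number, cols 9-10 month (01-12),
    cols 11-80 free text."""
    if len(record) != RECORD_LENGTH:
        # Reject before slicing, so a short or overlong record can't be misread.
        return [f"length {len(record)}, expected {RECORD_LENGTH}"]
    errors = []
    if not record.isascii():
        errors.append("non-ASCII character present")
    account, month = record[0:8], record[8:10]
    if not (account.isascii() and account.isdigit()):
        errors.append(f"account {account!r} is not 8 digits")
    # isascii() first: str.isdigit() also accepts digits like '²' that int() rejects
    if not (month.isascii() and month.isdigit() and 1 <= int(month) <= 12):
        errors.append(f"month {month!r} outside 01-12")
    return errors

print(validate_record("12345678" + "07" + " " * 70))
print(validate_record("12345678" + "13" + " " * 70))
print(validate_record("a randomly punched card"))
```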
In Emacs, you can also do ctrl-x <return> f (or meta-x set-buffer-file-coding-system), then pick us-ascii as the coding system for the file. Anything that doesn't translate gets flagged.
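Outside Emacs, the same "does it translate?" test is easy to approximate: try to encode each character in the target coding system and report the ones that fail. This Python sketch is not how Emacs implements it, and the function name is hypothetical. `cp037` is Python's codec for one EBCDIC code page, in case anyone still has TN3270 flashbacks.

```python
def untranslatable(text: str, encoding: str = "ascii") -> list[tuple[int, int, str]]:
    """List (line, column, char) for characters that cannot be encoded in `encoding`."""
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                problems.append((lineno, col, ch))
    return problems

sample = "int count = 0;\nint c\u043eunt2 = 1;\n"
print(untranslatable(sample))           # the Cyrillic 'o' on line 2 is flagged
print(untranslatable(sample, "cp037"))  # EBCDIC code page 037 can't encode it either
```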
I learned that back in the dark ages of TN3270 when it would randomly crap out doing the EBCDIC->ASCII conversion.
˙"ʞɹoʍ ɹno pɐǝɹ oʇ buıןıǝɔ ǝɥʇ oʇ sɹıɐɥɔ ɹıǝɥʇ dɐɹʇs oʇ ǝʌɐɥ ɹǝbuoן ou ןןıʍ ʎɥɔɹɐuoɯ ǝɥʇ ɟo ʎuuɐɹʎʇ pǝʇsıɟ-uoɹı ǝɥʇ ɹǝpun ןןıʇs suısnoɔ uɐǝpodıʇuɐ ɹood ɹno" ʇɐɥʇ os suı-ʞɔǝɥɔ ʇsɹıɟ ןıɹdɐ ɹıǝɥʇ ɟo ǝuo uı sıɥʇ ǝʞıן buıɥʇǝɯos pıp ɐıןɐɹʇsnɐ uı ɯɐǝʇ ʇuǝɯdoןǝʌǝp ɹno 'ǝʇou snoıɔıןɐɯ ssǝן ʇɐɥʍǝɯos ɐ uo
Great mischief can also be had with embedding bi-directional text marks in "normal" text
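Bidi marks are easy to miss because they're invisible, but easy to find once you look for the actual code points. A minimal Python sketch follows. The set lists the explicit directional marks, embeddings, overrides and isolates from the Unicode Bidirectional Algorithm, the helper names are made up, and the sample is the old right-to-left-override filename trick, which a bidi-aware display shows as `reportexe.docx`.

```python
import unicodedata

# Explicit bidirectional formatting characters (Unicode Bidirectional Algorithm).
BIDI_CONTROLS = {
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM marks
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}

def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, character name) for each bidi control in text."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text) if ch in BIDI_CONTROLS]

def strip_bidi_controls(text: str) -> str:
    """Drop the controls, so the text displays in its stored order."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)

filename = "report\u202excod.exe"
print(find_bidi_controls(filename))   # [(6, 'RIGHT-TO-LEFT OVERRIDE')]
print(strip_bidi_controls(filename))  # reportxcod.exe
```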
Good to know I haven't lost the ability to read the boss's papers lying on his desk without him realising it.
The brain has to work a little harder when it sees a letter that could also be valid the normal way up: n(u), a(e), m(w), g(b). There was an interesting study of how well we cope with different orientations, and upside down is a relatively easy one. I wonder whether upside down but written from right to left would be more difficult to read. On the other hand, with Hebrew's different alphabet, it doesn't take much practice before right to left feels like a natural direction.
Write a program that makes substitutions only in variable names and classes/structs/etc. (but not standard library ones). The code will still compile and run the same, but trying to change it would be a nightmare. Might be useful if you have to let someone see your code, but don't want them to steal it.
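Here's a minimal Python sketch of that idea, assuming you hand it the set of your own identifiers to rename. Working out automatically which names are yours and which belong to the standard library is the hard part, and it isn't attempted here. It uses the standard `tokenize` module, so string literals and comments are left alone (f-string contents may not be, depending on Python version). The helpers `confusable_name` and `obfuscate` are hypothetical; renamed identifiers become runs of `l` and `I`, a classic confusable pair in many fonts.

```python
import io
import tokenize

def confusable_name(index: int, width: int = 8) -> str:
    """Hypothetical naming scheme: index in binary, 'l' for 0 and 'I' for 1."""
    bits = format(index, f"0{width}b")
    return "I" + bits.replace("0", "l").replace("1", "I")

def obfuscate(source: str, names: set[str]) -> str:
    """Rename every NAME token listed in `names`; keywords, builtins and
    library attributes not in `names` are untouched."""
    mapping: dict[str, str] = {}
    edits = []  # (row, col, old, new) in source order
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in names:
            if tok.string not in mapping:
                mapping[tok.string] = confusable_name(len(mapping))
            row, col = tok.start
            edits.append((row, col, tok.string, mapping[tok.string]))
    lines = io.StringIO(source).readlines()  # same line split as tokenize uses
    # Apply right to left so earlier column offsets on a line stay valid.
    for row, col, old, new in reversed(edits):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)

source = "def area(width, height):\n    return width * height\n"
print(obfuscate(source, {"area", "width", "height"}))
```

The output still compiles and behaves identically, which is exactly what makes it miserable to maintain.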
This is nothing new: a program for Arthur, the Acorn Archimedes' pre-RISC OS operating system, was written in BASIC but protected itself from reverse engineering by replacing every variable and function name with the string "ClaresMicroSupplies", or rather with a subtly different combination of letter cases for each instance. It did make it completely impossible to follow, as you just can't tell the difference based on case alone.
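We don't know exactly how that program chose its spellings, so here's a guess at a simple scheme in Python: treat the bits of an integer as upper/lower-case flags for each letter. With the 19 letters of "ClaresMicroSupplies", that allows 2^19 = 524,288 distinct spellings, which is plenty of variables. `case_variant` is a hypothetical helper.

```python
def case_variant(word: str, n: int) -> str:
    """Return the n-th case spelling of `word`: bit i of n upper-cases letter i."""
    letters = [i for i, ch in enumerate(word) if ch.isalpha()]
    if not 0 <= n < 2 ** len(letters):
        raise ValueError("n out of range for this word")
    chars = list(word.lower())
    for bit, pos in enumerate(letters):
        if (n >> bit) & 1:
            chars[pos] = chars[pos].upper()
    return "".join(chars)

word = "ClaresMicroSupplies"
print(2 ** sum(ch.isalpha() for ch in word))  # 524288 possible spellings
for n in range(3):
    print(case_variant(word, n))
```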
" It did make it completely impossible to follow, as you just can't tell the difference based on case alone."
Someone once had a company programming standard that said assembler statement labels had to be letter-number combinations. I can't remember if that applied to naming variables too. The letters indicated the subroutine's place in the program's hierarchy, and the number showed that it was the nth label declared in that routine. In a GoTo language, that made it very difficult, 10 years later, to understand how the code worked.
I'm sure many a fun hour will be wasted looking for these substitutions.
I've not pranked my co-developers, but I always thought a good one would be to slip a leading zero onto an integer constant, thus converting it to octal (at least in Java). How many people look at the value of constants when they fire up their IDE? The only danger is that the change could make it into production.
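Catching this prank is easier than spotting it by eye. Here's a rough Python lint, assuming C/Java-style source: it flags integer literals with a leading zero (so `0755` gets reported, since Java and C read it as octal, i.e. 493) and skips hex literals and digits after a decimal point. It doesn't understand strings or comments, so expect the odd false positive; the names are hypothetical.

```python
import re

# A leading 0 followed by more digits (optional long suffix), not part of an
# identifier, a hex literal, or the fractional part of a decimal number.
OCTAL_LIKE = re.compile(r"(?<![\w.])0\d+[lL]?(?![\w.])")

def octal_like_literals(source: str) -> list[tuple[int, str]]:
    """Return (line number, literal) for each integer literal with a leading zero."""
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        hits.extend((lineno, m.group()) for m in OCTAL_LIKE.finditer(line))
    return hits

java = "int mask = 0755;\nint n = 10;\ndouble d = 0.05;\nlong big = 010L;\nint hex = 0x1F;\n"
print(octal_like_literals(java))
```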
The funniest one I've done was swapping over some keys on another dev's keyboard. It was like watching someone riding a bike suddenly forget how it's done.
I've had that happen to me, unwittingly, in production. If you pass strtoul a base of 0 (auto-detect) rather than an explicit base, it will interpret numbers with leading zeros as octal.
I discovered this while writing some test code for a former employer: I tested it by entering the digits straight in, but they used it by entering serial numbers with leading zeros. Thankfully the fix was easy.
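For anyone bitten by the same thing, here's a rough Python emulation of the prefix rules `strtoul` applies when called with a base of 0. It only handles non-negative input, with no sign or overflow handling, and `strtoul_base0` is a hypothetical name. We don't know which fix was used in this case; the usual one in C is to pass an explicit base of 10, which the last line mirrors with Python's `int(..., 10)`.

```python
import string

def strtoul_base0(text: str) -> int:
    """Rough emulation of C strtoul(text, NULL, 0) for non-negative input:
    "0x"/"0X" prefix -> hex, any other leading "0" -> octal, else decimal.
    Like strtoul, parsing stops at the first character invalid for the base."""
    s = text.lstrip()
    if s[:2].lower() == "0x" and len(s) > 2 and s[2] in string.hexdigits:
        base, s = 16, s[2:]
    elif s.startswith("0"):
        base = 8
    else:
        base = 10
    valid = "0123456789abcdef"[:base]
    digits = ""
    for ch in s:
        if ch.lower() not in valid:
            break
        digits += ch
    return int(digits, base) if digits else 0

print(strtoul_base0("0123"))  # 83: the leading zero means octal
print(strtoul_base0("089"))   # 0: '8' isn't an octal digit, so parsing stops
print(int("0123", 10))        # 123: explicit base 10, leading zeros are just zeros
```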
No, if I wanted to be nasty, it'd be slipping this script into the git hooks of their working directory so that it committed code with these substitutions.
When our first microprocessor arrived, we connected it to one of the Teletypes in the developers' terminal room as its first real test. It was programmed to present the mainframe login sequence, then to start over again. The unfortunate guinea pig didn't have to suffer the frustration for too long, though.
It has at least given me a good grasp of handling non-ASCII characters. Many of our print systems pre-date any kind of extended character set, so everyone on my team has a little toolbox of scripts to flag (and in many cases automatically replace) anything above chr(127). Maybe I'll have to start running them over my code as well as the content...
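A minimal Python sketch of such a flag-and-replace script, assuming a crude fallback policy: characters whose NFKD decomposition contains ASCII (accented letters, for instance) keep that ASCII part, and anything else becomes `?`. The name `to_ascii` is made up.

```python
import unicodedata

def to_ascii(text: str, placeholder: str = "?") -> tuple[str, list[tuple[int, str]]]:
    """Flag every character above chr(127) and return an ASCII-only copy.
    Characters whose NFKD decomposition contains ASCII keep that ASCII part;
    characters with none (e.g. Greek or Cyrillic look-alikes) become `placeholder`."""
    flagged = []
    out = []
    for i, ch in enumerate(text):
        if ord(ch) <= 127:
            out.append(ch)
            continue
        flagged.append((i, ch))
        base = "".join(c for c in unicodedata.normalize("NFKD", ch) if ord(c) <= 127)
        out.append(base or placeholder)
    return "".join(out), flagged

print(to_ascii("caf\u00e9"))
print(to_ascii("Ru\u01bfert H\u03b1rden"))
```

Incidentally, U+037E GREEK QUESTION MARK decomposes to a plain semicolon, so that particular homoglyph gets quietly turned back into `;`, though it still shows up in the flagged list.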
A troll, apparently with unfettered administrator access to our local newspaper's website comment columns, has regularly posted the following boilerplate for at least a year:
"And when it comes to Gloucester Rugby, what the club really should have done, whilst he graced the Kingsholm turf, was to ensure that Ruƿert Hαrden (our first choice, our most talented and our most complete tight head prop) started as many games for Gloucester as was possible."
Inquiring whether "most complete" means "ungelded" would draw a level of vituperation beyond anything I would ever wish to experience...