Edge
Shows the real address in Edge on W10.
Click this link (don't fret, nothing malicious). Chances are your browser displays "apple.com" in the address bar. What about this one? Goes to "epic.com," right? Wrong. They are in fact carefully crafted but entirely legitimate domains in non-English languages that are designed to look exactly the same as common English words …
Looking at them on this Linux Mint box, my RSS reader shows them as https://xn--80ak6aa92e.com and https://www.xn--e1awd7f.com/ respectively.
My browser (Firefox on this box) shows the first one as the article describes, except for one significant difference: The 'l' looks like a capital I - presumably a side effect of the font in use here, but the important point is that for me it stands out a mile.
The second one, however, does just look like epic.com
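For anyone who wants to see what is hiding behind those xn-- strings, here is a minimal sketch using the punycode codec from Python's standard library. It only decodes the part of each label after the xn-- prefix and skips the IDNA validation a real browser performs; the helper name decode_label is made up for illustration.

```python
import unicodedata


def decode_label(label: str) -> str:
    """Decode one DNS label; labels without the xn-- prefix pass through unchanged."""
    if label.lower().startswith("xn--"):
        # Raw RFC 3492 decoding only, no IDNA/nameprep checks.
        return label[4:].encode("ascii").decode("punycode")
    return label


for host in ("xn--80ak6aa92e.com", "www.xn--e1awd7f.com"):
    decoded = ".".join(decode_label(part) for part in host.split("."))
    print(host, "->", decoded)
    # List the code points of the registered name (the label just before the TLD).
    for ch in decoded.split(".")[-2]:
        print(f"  U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Every letter in both names comes out as a Cyrillic code point, which is why the address bar and the RSS reader disagree.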
That decision was criminal in its stupidity. Example: НSВС.com - that is Russian N, Latin S, Russian V, Russian S, .com.
You can create a mixed-script homograph for nearly anything and it will be virtually indistinguishable from the real thing. Now throw in a certificate and voila - phishing, here it comes.
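A rough sketch of how a mixed-script label like that one could be spotted. It guesses the script from the first word of each character's Unicode name, which is a crude heuristic of my own choosing and not how any particular browser does it; scripts_in and is_mixed_script are hypothetical names.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Rough script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def is_mixed_script(label: str) -> bool:
    """True when the letters of a label come from more than one script."""
    return len(scripts_in(label)) > 1


# Cyrillic EN, Latin S, Cyrillic VE, Cyrillic ES (the example above).
spoof = "\u041dS\u0412\u0421"
print(scripts_in(spoof))
print(is_mixed_script(spoof))
print(is_mixed_script("HSBC"))
```

Note that a check like this says nothing about a label written entirely in one foreign script, such as the all-Cyrillic lookalikes from the article.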
This isn't a fix, it is a workaround. You fix the problem of being misled by malicious IDNs, but you get a new problem: you cannot see any IDNs at all.
It's like someone complaining that their editor doesn't work in Arabic, and being told that the fix is to write in English.
For most English speakers, not seeing IDNs is likely not much of an issue.
Maybe a compromise would be that with punycode 'true' it shows the punycode domain name in the address bar, so (English-speaking) people don't get fooled, but then shows the proper name when you hover over it if you were, e.g., visiting a Russian site.
Obviously another solution will need to be found for them, but English speakers are likely to be the target of the vast majority of hijacking attempts that use punycode domains masquerading as real ones.
Is a solution a bad one if it only fixes the majority of the problem, rather than 100% of it?
"Obviously another solution will need to be found for them, but English speakers are likely to be the target of the vast majority of hijacking attempts that use punycode domains masquerading as real ones."
No, you are only thinking of the problems that an anglophone will encounter from homographic IDN attacks. It is still a form of colonialism.
You haven't considered that, due to our earlier anglophone-only internet, most of those non-English speakers will actually be using a lot of domains that have English names, for instance paypal, google, mpay and so on. A workaround that "works" for anglophones but still allows the remaining 84% of the world to be pwned is not a valid solution.
For instance, a user in India almost certainly would want punycode on for local websites, but they still won't want to go to xn--mesa-g6d.in thinking it is mpesa.in.
If you can't apply the workaround, you'll need to check the certificate for sites you really care about (a Let's Encrypt cert is a red herring). It sucks that not only does the URL bar get spoofed, but NoScript sees no harm either, so a drive-by is that much more likely to happen (if you apply permanent exceptions to domains you trust).
But this is still a perfectly valid and complete "fix" for that person if that person only actually wants/needs to write in English.
Of course, the "fix" for the person who never needs to visit IDN domains is an "it's broken" for someone who does. Isn't it ?
Which is the real problem, no ?
But your text editor analogy falls somewhat short. A text editor that does not support Arabic cannot be used to send a document to someone that looks like English but is in fact Arabic.
"I sent the infidel the instructions for assembling a bomb, and they thought it was a shopping list because I used Arabic that made it look like a list of English words for grocery goods. How surprised will they be when they go out to buy milk and eggs and instead blow up the supermarket ?!"
:)
Sorry, this might seem a little simple, but we know which characters look like which in other languages. So when someone applies for a domain like raural.com that renders as paypal.com, simply flag it as unavailable, just as if someone owned the domain already. Surely it wouldn't take much longer for a script to permute the Unicode lookalikes of the domain you wish to buy, check all the possibilities, and come back with a big fat "computer says no" when you're trying to spoof a domain.
Yeah, a few people may end up not being able to get the domain they wish, but let's face it, most people buying a domain face that problem these days anyway, as someone's beaten them to all the good names.
Or am I over simplifying things? I could quite easily be, I'm rather the idiot..
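To make the idea concrete, here is a small sketch of that registration check. The lookalike table is hand-picked and deliberately tiny; a real check would need a far bigger one, and the names LOOKALIKES, skeleton and is_available are invented for this example.

```python
# Hand-picked Cyrillic -> Latin lookalikes. An assumption for illustration,
# nowhere near complete.
LOOKALIKES = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0443": "y",  # Cyrillic u
    "\u0445": "x",  # Cyrillic ha
    "\u0456": "i",  # Cyrillic Byelorussian-Ukrainian i
    "\u04cf": "l",  # Cyrillic palochka
}


def skeleton(label: str) -> str:
    """Fold every known lookalike onto the Latin letter it resembles."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())


def is_available(requested: str, taken: set[str]) -> bool:
    """'Computer says no' if the request looks like an already-taken name."""
    return skeleton(requested) not in {skeleton(name) for name in taken}


taken = {"paypal"}
spoof = "\u0440\u0430\u0443\u0440\u0430l"  # five Cyrillic letters plus a Latin l
print(is_available(spoof, taken))
print(is_available("example", taken))
```

Instead of generating every possible spoof of the requested name, this folds both sides to one canonical spelling and compares those, which amounts to the same check with a lot less work.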
you are. under your proposal, a hypothetical corporation peddling nuclear reactor fuel (mox.com) would be able to lock out an equally hypothetical innocent group of russian lichen-fanciers (мох.ru). The existence of a company website opal.com should not stop a hypothetical local nightclub in the middle of siberia from calling itself ора1.ru, after a little local river. ideally, these hypothetical russian entities should also be able to register their names in the .com or .org namespaces - saying otherwise would strongly imply that some animals are more equal than others.
most IDN are used entirely innocently, and are a great help online to those of us who do not speak english, or at least another language based on the latin alphabet, fluently. making them second-class does not help anybody.
Currently mox.com and mox.ru can both exist, even if owned by different entities. That's the whole point of having different namespaces. Given that, мох.ru should be allowed whether or not mox.com exists, so long as mox.ru doesn't exist.
If both spellings want the same namespace, as in мох.com and mox.com, then it should be handled as if the spellings were the same. First-come, first-served, or whatever the rule is. That isn't making IDN second class. It is treating them the same as everything else.
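A sketch of that rule, assuming a registry that treats lookalike spellings as the same name but only within one TLD. The three-letter lookalike table and the Registry class are hypothetical and exist only to show the behaviour.

```python
# Minimal lookalike table, just enough for the mox example (an assumption).
LOOKALIKES = {"\u043c": "m", "\u043e": "o", "\u0445": "x"}


def skeleton(label: str) -> str:
    """Fold known lookalikes onto the Latin letter they resemble."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())


class Registry:
    """First-come, first-served, with lookalikes colliding per TLD only."""

    def __init__(self) -> None:
        self._owners: dict[tuple[str, str], str] = {}

    def register(self, label: str, tld: str, owner: str) -> bool:
        key = (skeleton(label), tld)
        if key in self._owners:
            return False  # same spelling, as far as the eye can tell
        self._owners[key] = owner
        return True


registry = Registry()
cyrillic_mox = "\u043c\u043e\u0445"
print(registry.register("mox", "com", "reactor fuel corp"))    # first in .com
print(registry.register(cyrillic_mox, "ru", "lichen club"))    # different namespace
print(registry.register(cyrillic_mox, "com", "someone else"))  # clashes with mox.com
```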
With the launch of IDN equivalent TLDs for CNO along with the newGTLDs, ICANN had an ideal opportunity to fix this problem for good. Instead they made it worse.
What should have happened: Complete banning of mixing scripts between levels. All IDNs in CNO should have been moved over to their equivalent IDN newGTLD (e.g. Cyrillic .coms should have been grandfathered over to .ком, etc.) and the system returned to only ASCII registrations allowed in the plain old ASCII CNO TLDs.
Instead, ICANN sat on its hands and even let mixed scripts proliferate into the ASCII newGTLDs! So now you can register Chinese scripts in .xyz. How useful.
SSAC were asleep at the wheel.
But don't get me started.
In fact it's become such a huge mess that Verisign, having successfully applied for 12 transliterations of .com and .net, have only launched two of them - .コム for Japan and .닷컴/.닷넷 for Korea - and that was over a year ago. They have abandoned launching the rest. That would make for an interesting article in itself - why would a powerhouse like Verisign not be able to handle launching the lot of them at the same time, given they're for completely different markets?
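The "no mixing scripts between levels" rule can be sketched in a few lines. This is my own reading of it, not anything ICANN specifies: every label in a name, TLD included, has to be written in one and the same script, with the script guessed crudely from Unicode character names. script_of and levels_match are hypothetical helpers.

```python
import unicodedata


def script_of(label: str):
    """Return the single script of a label, or None if mixed or letterless."""
    scripts = {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}
    return scripts.pop() if len(scripts) == 1 else None


def levels_match(domain: str) -> bool:
    """True only when every level of the name uses the same single script."""
    scripts = [script_of(label) for label in domain.split(".")]
    return None not in scripts and len(set(scripts)) == 1


print(levels_match("example.com"))                           # ASCII in an ASCII TLD
print(levels_match("\u043c\u043e\u0445.\u043a\u043e\u043c"))  # Cyrillic in .ком
print(levels_match("\u043c\u043e\u0445.com"))                 # Cyrillic in ASCII .com
```

As written, a label made only of digits or hyphens counts as a mismatch, so a real rule would need to decide how to treat those.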
"different but look almost identical"
A letter is just a symbol with a certain shape - if two letters look identical, they are identical. It doesn't matter if different languages use that shape in different ways to represent different sounds, the only thing a computer needs to do is display the shape when told to do so; there's absolutely no reason to come up with multiple codes to represent the same shape just because that shape is used in different alphabets.
And before objections that the letters aren't quite identical and the minor differences justify the different codes, that sort of minor change is a function of font. The difference between a Times New Roman "P", a Comic Sans "P" and a Wingdings "P" is far greater than the difference between an English and a Russian "P". If you want Cyrillic-looking letters you choose a Cyrillic font, if you want Latin letters you choose a Latin font. Defining multiple codes for effectively identical letters really doesn't help matters.
So true - the issue is how the user responds to the symbol displayed and not what the computers are using internally to represent it.
There is a necessary trade-off between making things easy or friendly for the less IT-literate (i.e. most non-IT) people, and giving those same people a risk-proportionate way of avoiding ne'er do wells. The risk is browser makers/writers putting in things like that Firefox IDN punycode default to simultaneously shield users from the details while opening an avenue for said users to be misdirected by the ne'er do wells.
A typical UK or US English user is unlikely to need URLs that include Cyrillic or other variants of their normal symbols. Same for typical French or Arabic or other users - that should apply en masse per locale/region and doesn't seem to be a particularly insurmountable technical problem.
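A minimal sketch of what such a per-locale rule might look like, assuming the browser is told which scripts the user expects to see. The function display_host and the allowed_scripts parameter are made up for illustration, and the script guess from Unicode names is a crude stand-in for real script data.

```python
import unicodedata


def display_host(host: str, allowed_scripts: set[str]) -> str:
    """Show a label in Unicode only if all its letters use an expected script."""
    shown = []
    for label in host.split("."):
        if label.lower().startswith("xn--"):
            text = label[4:].encode("ascii").decode("punycode")
            scripts = {unicodedata.name(ch).split()[0]
                       for ch in text if ch.isalpha()}
            if scripts <= allowed_scripts:
                label = text  # expected script: show the readable form
            # otherwise leave the raw xn-- form visible
        shown.append(label)
    return ".".join(shown)


# A UK/US profile sees the raw punycode, a Russian profile sees Cyrillic.
print(display_host("xn--80ak6aa92e.com", {"LATIN"}))
print(display_host("xn--80ak6aa92e.com", {"LATIN", "CYRILLIC"}))
```

The second call is the catch: a user whose locale legitimately allows Cyrillic still gets shown the lookalike, so this narrows the problem without closing it.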
I'll tell you what. Poke both your eyes out so you're dependent upon a screen reader. Then see if it makes any difference which alphabet is used.
Hint: symbols that look the same may represent different phonemes in other languages.
If you still can't figure it out, well you're now blind so you won't be posting any more stupid comments. At least not until you've got the hang of that screen reader.