Seven pet h8s: Verity is sorely vexed

Bronze badge
Devil

On the subject of C++

I leave you with an "interview with Bjarne Stroustrup" - http://www.ganssle.com/tem/tem17.htm

8
0
Silver badge

Stuck-in-time Overflow

Google something site:stackoverflow.com and the top link will take you to an answer with a thousand upvotes from 2009 before the New Testament (C++11) arrived. Everything's moved on since then, unless the question is about makefiles.

Upvotes should disappear after a year and eventually old answers will sink and bright shiny new code will appear to take its place. Perhaps a special exemption can be made for Delphi.

8
1
Silver badge

Re: Stuck-in-time Overflow

Perhaps a special exemption can be made for Delphi.

Ooh, Miaow...

Fancy a saucer of milk?

:)

11
0
Roo
Silver badge
Windows

Re: Stuck-in-time Overflow

"Upvotes should disappear after a year and eventually old answers will sink and bright shiny new code will appear to take its place."

That's not a bad suggestion, but I'd like the option of specifying a time range. There is some perfectly good old code out there that needs love too. ;)

9
0
Silver badge

Re: Stuck-in-time Overflow

Exactly - occasionally someone gives you some old code from 10 years ago and you're looking at it wondering just WTF. SO is very handy for obscure answers sometimes.

5
0
Silver badge

Re: Stuck-in-time Overflow

There is some perfectly good old code out there that needs love too. ;)

The few times I thought I'd had a reasonably definitive answer to add to SO, it's refused to let me post due to lack of points.

I gave up trying.

Is SO ageist?

3
0
Bronze badge
Coat

Closures

Use Java. Call them lambda functions. It's an even less meaningful name, so at least you don't make any assumptions about what they're supposed to do. Which is, er, let me get back to you on that.

5
0
Silver badge

Re: Closures

Java is simply lifting Lisp terminology, and a lambda function is not necessarily a closure, if my admittedly poor grasp of the subject is worth writing home about. A closure is a Hard Sums concept that makes my head ache when I read about it.
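
For anyone who wants the distinction spelled out, here is a minimal sketch in Python rather than Java or Lisp. The names `double`, `make_adder` and `add_five` are made up for illustration. A lambda is only an anonymous function; it becomes a closure when it captures a variable from an enclosing scope.

```python
# A lambda with no captured variables: anonymous, but not a closure.
double = lambda x: x * 2

def make_adder(n):
    # The returned lambda captures n, so it is a closure: it still
    # remembers n after make_adder has returned.
    return lambda x: x + n

add_five = make_adder(5)

print(double.__closure__)                 # None: nothing was captured
print(add_five.__closure__ is not None)   # True: a cell holding n exists
print(add_five(10))                       # 15
```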

6
0
LDS
Silver badge
Devil

I'm not surprised most commenters against Unicode are anglophones

Who probably have little knowledge of foreign languages. The real issue is that most programming languages were developed in the US/UK on English-bound OSes, by people mostly speaking and writing applications in English - or in a single language.

And that led to the wrong idea that managing text is easy (so easy that C doesn't have a string type at all; you just use an array of bytes, "char" being a fake). The reality is that managing text (properly) *is one of the most difficult tasks* - especially when you have to manage more than one language at the same time.

Unluckily, languages were not designed by programmers or mathematicians. Therefore fitting them into simple data structures and algorithms is not easy at all.

Does the U in Unicode also mean "Ugly"? Probably. Just don't only complain. Propose a better working solution for a string data type that can handle multiple languages while being easy to use at the same time.

Also, stop using "strings" and byte arrays interchangeably. They are not the same. I understand that for beginners a string looks like the "definitive data type" because it can store anything - I've seen a PHP programmer doing bitwise calculations by turning integers into strings of "0" and "1" characters, and then comparing them - whoever teaches that is doing a disservice to beginners, and dynamically typed languages have made it even worse.

It's time to teach from the beginning how complex text manipulation is - because there is more than one language, and many are far more complex than English.

IMHO, all comp-sci graduates and practitioners should be required to know at least two languages besides English. They would start to understand why Unicode is a necessary evil - and the difference between text - and a sequence of bytes.

13
5

Unicode Strings...

"a better working solution for a string data type that can handle multiple languages while being easy to use at the same time"

The Go language does this very nicely. See http://golang.org.

0
0
Silver badge

Re: I'm not surprised most commenters against Unicode are anglophones

Better to discourage programmers from manipulating text at all. Once you allow programmers to treat human-readable text as chunks of "wordlets", you're asking for the pain of this:

snprintf(buf,len,"%d item%s", count, (count==1)?"":"s");

and this:*

snprintf(buf,len,"%d item%s %s%s.", count, (count==1)?"":"s"
,type==DLOAD?"download":"updat", iscomplete?"ed":"ing");

A brief overview of how Slavic languages handle pluralisation usually does the trick to dissuade practitioners of #1; those who'd describe #2 as "efficient" would take a lot more work ("but I'll just use whatever the Japanese for 'ing' is and it'll still be fine...") .

The best way around these is to leave the job of writing text to linguists, and limit the job of the software to simply choosing between complete sentences; or better - complete paragraphs.

* both of those examples are culled from real code, by the way. A long time ago I used to work with development teams to make their products localisable... most needed only a little work, some were really bad. Perversely, I'd find that the higher the number of non-native English speakers on the team, the harder the software was to localise into a non-English language.
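
A rough sketch of the "choose between complete sentences" approach, in Python for brevity. The `MESSAGES` catalogue, its keys and `status_message` are hypothetical names, and only an English table is shown; translators would supply a table of whole sentences per locale. The sentences are phrased so that the count sits outside the grammar, which sidesteps pluralisation in this particular case.

```python
# Hypothetical catalogue: each entry is a whole sentence written by a human,
# keyed by (kind, is_complete). The code never glues word fragments together.
MESSAGES = {
    "en": {
        ("download", True): "Items downloaded: {count}.",
        ("download", False): "Items downloading: {count}.",
        ("update", True): "Items updated: {count}.",
        ("update", False): "Items updating: {count}.",
    },
    # Other locales would be added here by translators, one full table each.
}

def status_message(locale, kind, is_complete, count):
    """Pick a complete sentence, then insert the value into its slot."""
    template = MESSAGES[locale][(kind, is_complete)]
    return template.format(count=count)

print(status_message("en", "update", False, 3))
print(status_message("en", "download", True, 1))
```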

11
0
Silver badge

Re: Unicode Strings...

For a supposedly Unicode native language, it ain't half difficult to use Unicode in Go.

In that example, the thing that works is the thing at the end, where you're faffing about with strings just as in C. Unless I can address and compare Unicode glyphs in strings as easily as addressing and comparing simple ASCII elements, it's... suboptimal.

3
0
LDS
Silver badge

"The Go language does this very nicely." ?

It looks to me as if it doesn't do anything special beyond assuming strings are UTF-8, and they still behave like arrays. Not very different from Python 3.

The issue many have with UTF is that not every sequence of bytes is valid. That's why concatenating an UTF string with a generic byte sequence is "risky" - you can obtain an invalid UTF string. That doesn't happen with ASCII strings - any sequence is valid even if it may contain unprintable "characters" (or print the wrong ones depending on the "codepage") - yet still they are valid ASCII strings.

Usually the main difference is that some function calls will balk at those invalid sequences, and throw errors/exceptions - which were not triggered when using ASCII, whatever the contents were. IMHO, it means lurking bugs are now surfacing, but many developers think instead that UTF broke their code.
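
The "balking" described above can be seen in a few lines of Python 3. This is only a sketch: `is_valid_utf8` is a made-up helper, and Latin-1 stands in here for "a single-byte codepage that accepts every byte value".

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text_bytes = "na\u00efve".encode("utf-8")    # well-formed UTF-8
mixed = text_bytes + b"\xff\xfe"             # text plus arbitrary bytes

print(is_valid_utf8(text_bytes))   # True
print(is_valid_utf8(mixed))        # False: 0xFF can never appear in UTF-8

# A single-byte codepage never complains, so the bug stays hidden: the call
# succeeds, but the result is not the text anyone intended.
garbled = mixed.decode("latin-1")
print(len(garbled) == len(mixed))  # True: one "character" per byte
```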

4
0
Silver badge

Re: I'm not surprised most commenters against Unicode are anglophones

Which commenters, where? The article doesn't, that I can see, mention Unicode.

Living, as I do, in a fairly polyglot city (for America), I do seem to know a fair number of programmers who can find their way around in more languages than English. But the suggestion of two languages beyond English, or even one, seems visionary.

A lot of the resistance comes not from Anglophone bigotry, but from the deep reluctance to change. That is not in itself a bad thing, since changing code that--within whatever boundaries--does work always risks problems. I understand the programmer who lacks enthusiasm for changing a lot of code so that he can put the tilde in "canon" or the umlaut in "Jager". For a new system, sure; but how often do we work with wholly new systems?

6
0
Silver badge

Re: "The Go language does this very nicely." ?

. That's why concatenating an UTF string with a generic byte sequence is "risky" - you can obtain an invalid UTF string

Byte concatenation of two well-formed UTF-8 strings results in one well-formed UTF-8 string. If you find yourself wondering about input strings that are likely to contain UTF-8 multibyte codes, it's a very strong sign that you shouldn't be doing anything at all with the "characters" that are inside them.

(Besides, byte-drops in UTF-8 are detectable by code, unlike older multibyte systems like Big5 or Shift-JIS where it's impossible to tell a corrupted sequence from an intentional input without using statistical analysis)
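
Both points can be checked quickly in Python 3. This is a sketch under stated assumptions, not a proof: the sample strings are arbitrary, and the dropped byte is simulated by slicing.

```python
first = "Zo\u00eb ".encode("utf-8")
second = "日本語".encode("utf-8")      # three characters, three bytes each

# Joining two well-formed byte strings gives a well-formed byte string.
joined = first + second
print(joined.decode("utf-8"))

# Simulate a dropped byte inside the first multibyte character of `second`.
damaged = second[:1] + second[2:]

# Strict decoding reports the corruption and where it starts.
try:
    damaged.decode("utf-8")
except UnicodeDecodeError as err:
    print("damage detected at byte offset", err.start)

# errors="replace" marks the damaged spot with U+FFFD, then resynchronises
# on the next lead byte, so the rest of the text survives.
repaired = damaged.decode("utf-8", errors="replace")
print("\ufffd" in repaired, repaired.endswith("本語"))
```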

Code should not try to write text. The smallest unit of text that you can legitimately concatenate is the sentence. Value insertion, not concatenation, is how you format text that will be read by a human.

The problem is that Unix and its descendants are designed around the assumption that there's really no difference between human-readable and machine-readable data.

7
1
Silver badge

Re: I'm not surprised most commenters against Unicode are anglophones

IMHO, all comp-sci graduates and practitioners should be required to know at least two languages besides English.

But...but...I've forgotten almost all the French and Latin I learned in four years of schooling. Does that mean I have to resign now, 35 years later?

8
0
LDS
Silver badge

Which commenters, where?

Read some of the links posted by Stob...

Of course the "two languages" was a provocation. But in my experience, invariably when in a forum I see people complaining about a language adopting Unicode they are native anglophones - and living in an anglophone country - which means they mostly don't encounter foreign languages.

Anyway, there's a big difference between hearing or even speaking more than one language and actually having to process text. There are many idiosyncrasies in how text in different languages is written and manipulated that are of course lost when speaking.

7
0
Silver badge

Re: I'm not surprised most commenters against Unicode are anglophones

All you have to remember is the rule: eye before serpent except after sheaves of corn and you are good to go.

6
0
Silver badge

Re: The best way around these is to leave the job of writing text to linguists

Nonsense! Display all messages in English using SHOUTY CAPS with extra spaces between the letters and any foreigner will understand perfectly.

17
0
Silver badge

Re: I'm not surprised most commenters against Unicode are anglophones

As someone who started writing stuff in Python a few years back, and read up on all the unicode stuff as a part of that, I have to say I think the Python developers are broadly correct: Python 2's approach was bad, and Python 3's approach is better. You can see why the Python 2-era designers thought it was a good idea to fudge everything so developers never 'had to worry about' the distinction between strings of bytes and text, and didn't have to care about the various ways of interpreting the former as the latter, In A Time When it was still kind of okay to pretend you only ever had to care about ASCII. But it really *isn't* a good idea, and any time you have to deal with anything but ASCII, it really does tend to lead to difficulties. I think they're right that it's better to explicitly acknowledge the difference in the language. And if you're writing pure Python 3, it's really not that hard to work with - it's not rocket surgery, just encode() and decode() as necessary. (Especially since the default encoding for Python 3 is UTF-8, which is going to be the one you want about 99.99% of the time anyhow).
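
A minimal illustration of that explicit split in Python 3, using the "Jager" example from earlier in the thread with its umlaut restored. Nothing here goes beyond the built-in `str.encode()` and `bytes.decode()`.

```python
text = "J\u00e4ger"               # str: a sequence of Unicode code points
raw = text.encode("utf-8")        # bytes: what goes to disk or over the wire

print(len(text))                  # 5 code points
print(len(raw))                   # 6 bytes: the a-umlaut takes two in UTF-8
print(raw.decode("utf-8") == text)

# Mixing the two types is an error rather than a silent guess, which is the
# behaviour Python 2 fudged.
try:
    text + raw
except TypeError as err:
    print("TypeError:", err)
```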

The reason people hate it so much, I think, is that it becomes unfortunately *much* more of a minefield if you want to write or maintain code that works with both Python 2 and Python 3. Which many many people do. Doing that in itself immediately gives you many, many more edge cases and headaches to deal with. (I've lost count of the number of times I wish I could go back in a time machine and change Python 2's default encoding to be UTF-8, for a start). It gets even worse if you're not starting from scratch - so you can just `from __future__ import unicode_literals` and get rid of quite a few of the issues, making at least *your* code behave quite a lot more similarly under the two interpreters, but are stuck with older code where you can't just do that because of existing assumptions about the way the code will handle strings on Python 2. If you're in that situation, it can be a bleeding nightmare.

It's definitely a shame, and I think some of the Python devs at this point wish they could do the whole thing over and try to somehow reduce the problems. But I don't think it's because the Python 3 design is wrong, or because the Python devs are arrogant, or anything. It's just one of those things that happens because this stuff is hard.

I don't think all the coders who complain are just knuckle-dragging Americans or Brits who refuse to believe that anyone really needs more than the sacred ASCII character set. You *do* get those types, but I think they're a minority. It's just that dealing with this stuff can be a pain even if you honestly do understand and accept the importance of handling ALL THE CHARACTERS.

6
0
Bronze badge

Re: I'm not surprised most commenters against Unicode are anglophones

It's not "English", it's the Roman alphabet; that level of coding also accommodates Greek and accented characters from derived languages. Unicode is needed for other scripts, chiefly Asian languages.

There is a reason why a Chinese typewriter is a large, heavy, expensive custom device while a corresponding European language one is (was) small, compact and mass produced. Unicode is like telling everyone they have to use a Chinese typewriter to write English just in case someone wants to write something in Mandarin. Even the Chinese were smart enough to realize that a culturally significant language isn't good for business -- they've had a rendering of their language in Roman script for decades.

4
6
Silver badge

Re: I'm not surprised most commenters against Unicode are anglophones

Um. No. Aside from Latin characters, Greek and 'Asian languages', you've got Cyrillic, Hebrew, Arabic and several different Indic scripts. Pre-Unicode, each of these had its own encoding - or multiple encodings - most of which didn't really take any account of how the others worked.

Unicode is massively, hugely better than the 'everyone gets their own encoding! You get an encoding! You get an encoding!' world we had before. Claiming that it only exists for Chinese (you know, that ridiculous little niche case, it's only *literally the world's most commonly spoken language*) is ridiculous.

(Also, it's rather silly to talk about 'write something in Mandarin'. Mandarin, Cantonese etc. are the *spoken dialect* variants of Chinese. There are two major written variants of Chinese, Traditional and Simplified. There's no 'written Mandarin Chinese', and there's no direct correlation between which spoken and which written variant you use, many combinations are possible).

11
1
Silver badge

Re: I'm not surprised most commenters against Unicode are anglophones

At the last company I did any international language coding for, the first thing I did was to write a DB for all the user interface text and a tool to allow whichever country was in control of their language interface to put in their own translations of the English versions. Even better, they got to be in charge of their own CSS, which I'd insisted on since the English master version had been working for nearly two years while the advertising department flip-flopped on colours, fonts and images. It's amazing how easy coding is when you can fuck off this sort of random noise and penis waving from the design process. Give people ownership of their part of the process and the means to ensure you can't be to blame, and they don't spend their lives trying to fuck you up.

1
0
Silver badge

Re: "The Go language does this very nicely." ?

Value insertion, not concatenation, is how you format text that will be read by a human.

Yes, well...

English is OK if you have 3 cases for none, 1 or plural.

But take Slavic languages:

1 rubl'
2 rublya
3 rublya
4 rublya
5-20 rublei
21 rubl'

1 god
2 goda
5 lyet

You need a whole code block to decide how to format a currency, date or a time ahead of selecting the proper sentence. And I am sure there are other language groups out there that are worse.

2
0
Gold badge

Re: I'm not surprised most commenters against Unicode are anglophones

"I've forgotten almost all the French and Latin ..."

I don't think French and Latin really count as two different languages. They are sufficiently different from English to make you curious about linguistics, but there are languages out there that will make you seriously wonder whether even whole sentences are the minimal unit of translation, or indeed whether it is actually possible to translate them into English without garbling at least some of the meaning.

Notions like "noun", "verb" or even "word" start to look flaky if you review *all* the languages of the world.

2
0
Silver badge

@Voyna I Mor Re: "The Go language does this very nicely." ?

Yes, I mentioned Slavic languages in a follow-up post, as one reason why building strings in code is a really bad idea.

Welsh is the only living language that can beat the Slavic family on complexity of pluralisation rules, but these are actually quite rigid rules, and can be expressed simply by code. Here is the list of pluralisation rules for most of the world's languages:

http://www.unicode.org/cldr/charts/27/supplemental/language_plural_rules.html

Localisation toolkits like GNU Gettext have a pluralisation mechanism built in, which lets you select the correct string for the value you're inserting ("%d day" or "%d days"), before inserting the value. It's also not limited to just two options, and the logic to select between the options is general enough to support any pluralisation scheme. If you want to know more, it's documented here: https://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms

If you're not on Linux (or a framework that relies on Gettext for its localisation), you can still use the same procedure, with only a small amount of additional code. The trick is knowing that you might have to do this; once you know you need to do it, implementing it is trivial. (I use a small C# class called PluralStringFormatter that implements this logic, with a "string selector" object that implements the string selection according to the current locale.)
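
The select-then-insert procedure might look roughly like this in Python. This is not the PluralStringFormatter class mentioned above, whose code isn't shown in the thread; `CATALOGUE`, `plural_index_en`, `plural_index_ru` and `format_count` are invented names. The Russian rule follows the usual Gettext plural formula for that language, and the forms are the transliterations used earlier in the thread.

```python
def plural_index_en(n):
    # English: one form for exactly 1, another for everything else.
    return 0 if n == 1 else 1

def plural_index_ru(n):
    # Russian: three forms, chosen by the last digits of the number.
    if n % 10 == 1 and n % 100 != 11:
        return 0                              # 1, 21, 31, ...
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1                              # 2-4, 22-24, ...
    return 2                                  # 0, 5-20, 25-30, ...

# Hypothetical catalogue: a selector plus the complete strings per locale.
CATALOGUE = {
    "en": (plural_index_en, ["{n} rouble", "{n} roubles"]),
    "ru": (plural_index_ru, ["{n} rubl'", "{n} rublya", "{n} rublei"]),
}

def format_count(locale, n):
    """Select the right string first, then insert the value into it."""
    select, forms = CATALOGUE[locale]
    return forms[select(n)].format(n=n)

for n in (1, 2, 5, 11, 21, 22):
    print(format_count("ru", n))
```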

1
0

You're back ...

... and it's so lovely to see you again! Stay.

37
0
Silver badge
Happy

Yah!

gcy2017 := New (Year);
gcy2017.happy := TRUE;

Great to have you back Verity.

5
0
Anonymous Coward

Re: Yah!

That's old Turbo Pascal code. Sons of Khan code now would be:

gcy2017 := TYear.Create;
try
  gcy2017.Happy := True;
  while not gcy2017.End do
  begin
    // < your tasks here >
  end;
finally
  // Trigger on 2018-01-01 00:00:00
  gcy2017.Free;
end;

2
0

I see a problem

"By the way, special mention for JavaScript, the runaway winner in this category: Broccoli, Brunch, Grunt, Gulp, Mimosa, Jake and Webpack, which list sounds like a roll call of firefighters in a forthcoming BBC puppet programme for the under-fives TheDonaldTrumpton."

Not sure Broccoli and Mimosa will qualify for visas under Obersturmbahnfuehrer Rudd's new listing program....

0
2
Silver badge

"a forthcoming BBC puppet programme for the under-fives TheDonaldTrumpton."

Priceless.

7
0
Silver badge

"a forthcoming BBC puppet programme for the under-fives TheDonaldTrumpton."

Priceless.

Priceless indeed.

But shouldn't that really read "... for the under-fives of any age ..."?

4
0
Roo
Silver badge
Windows

Rejection Criteria

Dear Verity

I fear that if I were to enforce the third rejection criteria ("probably cleverer than us.") pretty much everything would be rejected. Would 2 out of 3 suffice, or is that too little hitlerism ?

2
0
Bronze badge
Unhappy

Re: Rejection Criteria

Anyone who can garner a reputation higher than ten clearly has too much time on their hands, and/or some other terrible issue that means they should be Struck Off immediately.

2
0
Silver badge

Too many tools

We have too many tools and languages - and the solution is always to build a new one to fix the problems in the old tool/language. And of course it never does, it just moves them somewhere else - so we create a new tool/language ...

Sigmonster says, "A good programmer can write FORTRAN programs in any language"

12
0
Unhappy

Re: Too many tools

And a bad programmer can write appalling Fortran in any language

2
0
Bronze badge

Ah, NNTP, comp.lang.c++ ... fond memories

Just had a mosey over there - gratifying, at least, to see that the great Alf P. Steinbach is still going strong, still flying the flag of cool reason amid the roiling sea of trolls, nutters, spam and desperate ignorance.

4
0
Anonymous Coward

I started laughing at the innocent youngster that wrote this article

... when I got to the point where "make" is called "standard".

It was standard only in the sense that it had the same name, but a different syntax on each flavour of Unix. Until GNU make came and invented a new one, that at least was more portable.

But before that happened there was imake, an attempt to automatically create Makefiles for all those flavours, and then GNU automake, to do the same, but in a different way. And that was all before ant and XML came into fashion.

Nostalgia for the good old days of pure and clean Makefiles, good old days that never existed :)

6
0
Silver badge

Re: I started laughing at the innocent youngster that wrote this article

"good old days that never existed"

When you look carefully, they never did whatever the context.

5
0
Silver badge

Re: When you look carefully, they never did

I dunno. The time the pointy-haired DP Manager came in to nag the operators and got his tie caught in one of the carbon-paper take-up spools of the decolator during a run made me smile for months.

6
0
Silver badge

Re: I started laughing at the innocent youngster that wrote this article

I hate make and all its bastard offspring...

Hint if you need an extra tool to write the config for the tool that makes the config for the thing you are building; you are doing it wrong

These days I just use a bloody shell script to call the compiler, works every time and is 'human' readable

cMake, make & M4 can all burn in the pits of Microsoft for all I care

3
1

Re: I started laughing at the innocent youngster that wrote this article

+1

And from the sysadmin side, I am amazed anyone still uses Sendmail, for precisely that reason.

1
0

Re: I started laughing at the innocent youngster that wrote this article

>Hint if you need an extra tool to write the config for the tool that makes the config for the thing you are building; you are doing it wrong

>These days I just use a bloody shell script to call the compiler, works every time and is 'human' readable

hear hear. stuff i wrote 20 years ago still runs fine today, where i did it on the basis of "maximum simplicity to read & run". shell kicks arse, as do the other standard *nix tools. where i dipped into this-or-that "more efficient" or "tighter" DSLs, it's all dead code. dead effort. wasted time. bounded existence.

well, except make.

oh, and that no-source-control-HERE-let-alone-multiple-branches-thankyouverymuch thing i did/had to do, which apple later re-presented as TimeMachine. which you can still do in shell in 2mins. like i did.

1
0
Silver badge

Bah!

Python, where the intern opening and closing the text in the wrong editor can fuck the code to a fare-thee-well.

Seriously: All those good pythonic ideas saddled with the beyond idiotic significant leading whitespace design feature?

Even Cobol didn't fall for that one. Stay between columns 12 and 76 and remember to punctuate properly and you could do what the hell you liked with spacing.

Of all the things to get bent out of shape about with the flood of "C-like" languages in the wild, the curly brackets seems a bit .. silly.

12
0
Anonymous Coward

Re: Bah!

"Even Cobol didn't fall for that one. Stay between columns 12 and 76 and remember to punctuate properly and you could do what the hell you liked with spacing."

DEC's Terminal Format in Cobol (arguably) went one better - stuff like labels went in column 1, instructions were indented by a tab, thereafter what the hell you liked. Much faster to bash code in, and there was a reformatting utility to convert Terminal Format to column oriented format or back.

1
0
Bronze badge

Re: Bah!

Curly braces were introduced with Algol, one of the great-grandparents of programming languages (which actually predated computing as we know it).

5
0
Silver badge

Re: Bah!

I have vague memories of the time wasted hunting for the mismatched brace back when I first got my hands on a C compiler. Improved compiler error messages have helped, and always typing }«left arrow» immediately after { avoids the problem much of the time. When there is a brace problem, the best tool I have for fixing it promptly is decades of experience. Python indentation is really clear, the parser reports errors on the correct line, and more functionality fits on the screen because a C line with only a } only becomes a Python blank line if it improves clarity. Dealing with a Python indentation problem is easy for anyone who understands why a C compiler is going to scream if you save your code in docx format.

I have come across config files that use C style braces. When I find one, I want to find a machine belonging to the person responsible, delete a } in the middle of his configuration file and see how long it takes him to fix the problem with only the inevitable crappy diagnostics from his half baked parser.

1
0
Silver badge

Re: Bah!

"and see how long it takes him to fix the problem with only the inevitable crappy diagnostics from his half baked parser."

About twenty seconds at a guess. My half baked parser used a keyword followed by a brace followed by parameters, and then a closing brace. So when the parser stumbles across a new keyword it will report the missing '}' and then continue as if one had been present. It's called error recovery and recognising that files that are supposed to be created by software are often hacked about with by humans.
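
As a rough idea of what that kind of recovery can look like, here is a sketch in Python. It is not the parser described above: the file format (`keyword { params }`), the `KEYWORDS` set and the `parse` function are all assumptions made up for illustration.

```python
KEYWORDS = {"server", "logging"}   # hypothetical section keywords

def parse(text):
    """Parse 'keyword { params }' sections, recovering from a missing '}'."""
    tokens = text.replace("{", " { ").replace("}", " } ").split()
    sections, errors = {}, []
    current = None                 # name of the section currently open
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in KEYWORDS:
            if current is not None:
                # Error recovery: report the problem, then carry on as if
                # the closing brace had been present.
                errors.append(f"missing '}}' before '{tok}'")
            current = tok
            sections[current] = []
            if i + 1 < len(tokens) and tokens[i + 1] == "{":
                i += 1             # consume the opening brace
            else:
                errors.append(f"missing '{{' after '{tok}'")
        elif tok == "}":
            if current is None:
                errors.append("unexpected '}'")
            current = None
        elif tok == "{":
            errors.append("unexpected '{'")
        elif current is not None:
            sections[current].append(tok)
        else:
            errors.append(f"stray token '{tok}'")
        i += 1
    if current is not None:
        errors.append("missing '}' at end of input")
    return sections, errors

config = "server { host example port 8080 logging { level debug }"
sections, errors = parse(config)
print(sections)
print(errors)
```

With the closing brace of the first section deleted, both sections still parse and the error list names the place where the brace should have been. An extra '}' is reported as unexpected and otherwise ignored.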

Adding an extra '}' is more fun. ;-)

0
0
Silver badge
Gimp

Python 2.71828

I'm not surprised at continuing resistance to Python 3. My first and last experience of interacting with Python folk was receiving multiple repetitions of "why would you want to do that?", with variously emphasized sarcasms, all to one question where I needed to emulate data handling of a non-Python-invented format.

They simply were *against* anything externally derived. These are not the iconoclasts you were looking for.

5
0
