Issue Details (XML | Word | Printable)

Key: PCC-49
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Anders Magnusson
Reporter: Thorsten Glaser
Votes: 0
Watchers: 1

If you were logged in you would be able to see more operations.

pcc-current mksh-R37c optimisation ⇒ broken

Created: 18/Apr/09 01:40 PM   Updated: 21/May/09 04:22 PM
Component/s: Common code, i386 target, pcc backend
Affects Version/s: None
Fix Version/s: None

File Attachments: 1. Text File diff.txt (22 kB)
2. Text File diff.txt (37 kB)
3. Text File exec_diff.txt (13 kB)
4. Text File mksh-test.txt (14 kB)

Environment: MirBSD #10-current (i386)

 Description  « Hide
From: Thorsten Glaser <>
Message-ID: <>
Date: Fri, 17 Apr 2009 11:32:59 +0000 (UTC)
Subject: pcc-current mksh-R37c optimisation ⇒ broken

% wget
% gzip -dc mksh-R37c.cpio.gz | cpio -mid
% cd mksh
% env CC=pcc CFLAGS=-O2 sh -r
% ./mksh -c 'for perli in perl5 perl; do perlos=$($perli -e "print \$^O;") 2>&- || continue; print $perli
$perlos; done'

Watch it hang. (Alternatively: 「% ./ -v」)

% env CC=pcc CFLAGS=-O0 sh -r
% ./mksh -c 'for perli in perl5 perl; do perlos=$($perli -e "print \$^O;") 2>&- || continue; print $perli
$perlos; done'
perl mirbsd
% _

Note: this used to work (pcc from 2008/10/28 for instance).

"It is inappropriate to require that a time represented as
 seconds since the Epoch precisely represent the number of
 seconds between the referenced time and the Epoch."
        -- IEEE Std 1003.1b-1993 (POSIX) Section B.2.2.2

 All   Comments   Change History      Sort Order: Ascending order - Click to sort in descending order
Gregory McGarry added a comment - 27/Apr/09 02:35 AM
I had to fix some bugs in the i386 target for R37c to build on OSX. But otherwise:

$ ./mksh -c 'for perli in perl5 perl; do perlos=$($perli -e "print \$^O;") 2>&- || continue; print $perli $perlos; done'
perl darwin

Tests ran to completion too:

Total failed: 19 (18 unexpected)
Total passed: 264

Using gcc:

Total failed: 1 (as expected)
Total passed: 283

Gregory McGarry added a comment - 29/Apr/09 10:15 AM
Do the attached results reveal anything? I can't see how all these subtle differences could be due to a codegen bug.

Thorsten Glaser added a comment - 29/Apr/09 01:39 PM
Did you commit your fixes until now?

If I bootstrap pcc-current from CVS, I still get a compiler which does not
even build a nop programme (internal compiler error), even with
YACC=yacc in the configure environment and "make distclean" run
before each run.

If I use a pcc-current compiled by pcc-20081028, it still cannot run
the script or the above code snippet.

pcc-20081028 has no unexpected mksh test suite failure, so it's
definitively a regression.

Gregory McGarry added a comment - 05/May/09 01:24 AM
I committed fixes that should only affect OS X. I can try an ubuntu system.

I'll also try a build of the pcc-20081028 snapshot. Snapshots around October introduced problems when gcc-compat and i386 memory optimisations were introduced.

Gregory McGarry added a comment - 05/May/09 08:08 AM
I reproduced the problem on Ubuntu 8.03. Through a binary search, I was able to determine that the problems where introduced in pcc-090125.tgz, which includes the wchar_t detection changes by myself, but a very large change by Ragge to deal with gcc compatibility.

I will attach the differences introduced at that time.

Gregory McGarry added a comment - 05/May/09 08:10 AM - edited
Attached diff of changes introduced in pcc-090125.tgz on Jan24.

Thorsten Glaser added a comment - 07/May/09 07:59 PM

I cvs up -PAdC'd my pcc source tree today.

It passes a four-time bootstrap with
$ YACC=yacc CC=pcc CFLAGS=-O mksh configure --prefix=/usr/local/pcc && make && make install && make distclean
This is good news, as earlier versions failed that.

MirBSD 10 Kv#10uA7-20090425 GENERIC#1198 i386
(the version we currently offer on BitTorrent is about exactly that one)

The wstring.c test case (thanks for applying a fourth or so of my diff) also succeeds,
sans the fact that wprintf(3) is still missing from MirBSD libc (but the generated asm
code is okay).

The mksh issue still persists, though ( hangs, the perl sample too).

│CPU: 21.4% user, 0.0% nice, 77.3% system, 0.6% interrupt, 0.6% idle

│11028 tg -6 0 228K 748K sleep piperd 0:05 15.97% mksh

It seems to repeatedly do syscalls (mostly getpid, but also clock_gettime,
close, fcntl, fork!).

┌──┤ with CFLAGS=-O0 passed to mksh/
│Total failed: 3 (as expected)
│Total passed: 286

Thanks for looking into it.

Gregory McGarry added a comment - 08/May/09 01:27 AM
Right, this Jira issue is still very relevant. The diff.txt file attached to this issue contains the changes that introduced this problem. I'm confident it has something to do with the attribute changes. I don't begin to understand how it works. I'm also not sure whether it's possible to pull diff.txt apart to test individual changes in an effort to narrow in on the problem.

Gregory McGarry added a comment - 11/May/09 12:47 AM
I've attached the specific change that introduced the problem.

 cvs rdiff -u -D "2009-01-25 0:00" -D "2009-01-25 1:00" pcc

Gregory McGarry added a comment - 11/May/09 01:21 AM
The problem is actually in exec.c. A codegen bug...

I've attached a diff of exec.s generated by pcc before the pcc changes and after the pcc changes.

Gregory McGarry added a comment - 11/May/09 01:41 AM - edited
The three functions which have code differences are execute(), comexec() and herein(). They all call sigsetjmp(). The other function don't.

Thorsten Glaser added a comment - 11/May/09 11:24 AM
Ugh yes, sigsetjmp has been the cause of trouble for mksh several times
already; early pcc had a codegen bug there too; IA64 has non-natural
alignment restrictions on it, etc.

Unfortunately mksh has to use it ☺

Gregory McGarry added a comment - 11/May/09 11:19 PM
Actually, I think the problem is with volatile. These three functions use volatile and the others don't. Looking at the assembly, to looks like ebx is holding a value longer than in the previous version.

So maybe volatile isn't working anymore. The symptoms appear to agree, since the same commands are executed repeatedly. Do you have a simple test case for volatile?

Gregory McGarry added a comment - 13/May/09 11:09 AM
I factored comexec() and herein() into separate files. They compile and work okay with -O2.

Compiling exec.c without these functions only works without -O2.

Gregory McGarry added a comment - 13/May/09 11:18 AM
gcc can compile exec.c without problems. If I remove all the volatile qualifiers from exec.c and re-compile with gcc, then the symptom appears with gcc too.

I'd like to think that we've narrowed down the problem to volatile in execute() in exec.c.

Gregory McGarry added a comment - 13/May/09 11:23 AM

I removed the volatile qualifier from

const char ** volatile ap;

in exec.c::execute() and gcc fails with the same symptom. So maybe pcc has a problem with the const and volatile qualifiers at the same time.

Anders Magnusson added a comment - 13/May/09 02:28 PM
I'll take a look at it this evening. It's a simple test case, thanks Greg!

Anders Magnusson added a comment - 13/May/09 07:02 PM
I've just checked in a fix, hopefully it solves the problem.

Gregory McGarry added a comment - 14/May/09 12:40 AM
Woo hoo! It's fixed. Thanks ragge!

Gregory McGarry added a comment - 14/May/09 12:53 AM - edited
Info from mksh

-rwxr-xr-x 1 gmcgarry gmcgarry 233624 2009-05-14 08:35 mksh.gcc
-rwxr-xr-x 1 gmcgarry gmcgarry 221167 2009-05-14 08:34 mksh.pcc

$ time CC=gcc sh
real 0m20.622s
user 0m11.741s
sys 0m7.348s

$ time CC=pcc sh
real 0m11.083s
user 0m5.256s
sys 0m4.504s

$ time ./mksh.gcc
real 0m15.519s
user 0m0.364s
sys 0m2.072s

$ time ./mksh.pcc
real 0m15.615s
user 0m0.388s
sys 0m2.136s

summary: pcc produces ~10% smaller binaries, and is about ~2x faster. The resulting binary performs about the same.

Thorsten Glaser added a comment - 14/May/09 11:22 AM

While pcc still does not survive a four-times bootstrap, I could
indeed use a pcc built by a gcc-built pcc to see that this problem
is indeed fixed now and mksh's testsuite passes, on MirBSD.

I think this is enough to close this bug.

Anders Magnusson added a comment - 21/May/09 04:22 PM
Resolve requested by the originator.