
Key: PCC-466
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Iain Hibbert
Reporter: Volkmar Klatt
Votes: 0
Watchers: 2

pcc

pcc incorrectly ignores the content of string constants following a '\0'-character

Created: 12/Aug/14 02:38 PM   Updated: 22/Aug/14 03:17 PM
Component/s: None
Affects Version/s: None
Fix Version/s: None

File Attachments: 1. Text File tr_20140811_70.c (1 kB)

Environment:
pcc 1.1.0.DEVEL 20140807 for i686-pc-linux-gnu


Description:
/* tr_20140812_70.c - test file for pcc - Volkmar Klatt

   pcc incorrectly ignores the
   content of string constants (string literals, ANSI C, 6.4.5)
   following a '\0'-character

   pcc 1.1.0.DEVEL 20140807 for i686-pc-linux-gnu

   usage:
   gcc tr_20140812_70.c -o z
   pcc tr_20140812_70.c -o zz
   ./z # --> 0, -1, -1
   ./zz # --> -1, 0, 0 */

#include <stdio.h>
#include <string.h>
#include <inttypes.h>
#include <stdlib.h>

int cmp(const void *p1, const void *p2)
{
   /* compare single bytes as unsigned values; keep the const qualifier */
   return (*(const unsigned char *)p1 - *(const unsigned char *)p2);
}

void show_buf10(char b[10])
{
   int i;

   for (i=0; i<10; i++)
      printf("%i,", b[i]);
      
   printf("\n");
}

int main()
{
   char buf[10] = {0};   /* zero-filled: show_buf10() reads all 10 bytes, strcpy() writes only 9 */
   int n;

   /* 1: sorted buffer compared against a literal that starts with '\0' */
   strcpy(buf, "noticeme");
   qsort(buf, 9, 1, &cmp);
   show_buf10(buf);

   /* memcmp() cannot be blamed, it just gets wrong input */
   n = memcmp(buf, "\0ceeimnot", 9);
   printf("n == %i\n---\n", n);

   /* 2: two literals that differ only after a leading '\0' */
   n = memcmp("\0hihihiii", "\0hohohooo", 9);
   printf("n == %i\n---\n", n);
 
   /* 3: two literals that differ only after an embedded '\0' */
   n = memcmp("haha\0haaa", "haha\0hooo", 9);
   printf("n == %i\n", n);

   return 0;
}




Comments:
Anders Magnusson added a comment - 13/Aug/14 09:21 AM
This bug was introduced when Unicode identifiers were added. We just need to find out the best way to solve it :-)

Iain Hibbert added a comment - 13/Aug/14 09:31 PM
This happens because the string handling code now converts strings and holds them in binary format internally, but still relies upon the NUL terminator to detect the end of the string. If the string itself contains a NUL, that one is taken as the end of the string instead of the real terminator.

There are two ways to fix this, but alas neither is simple:

1. revert to the method used before, where the escaped form is held in memory
  - then unicode sequences are difficult to decode

2. go all the way and treat the string as a binary object
  - the patricia tree implementation needs to be rewritten, as it relies upon the NUL terminator, and a string/length tuple needs to be kept

I was working on it, but was away for a couple of weeks. I think I prefer the second method, but it is more complex and I don't want to disadvantage the compiler if possible. I hope to fix it in the next week or so.

rl added a comment - 19/Aug/14 12:00 PM
Wrong priority!
This is in no way a "minor" bug, there is no workaround. This is a showstopper.

Iain Hibbert added a comment - 22/Aug/14 03:17 PM
I have reworked the string handling, reverting to the method where we hold strings internally with escaped values, rather than as a straight binary string. This lets us have NUL values inside a string.

Perhaps later this can be reworked to use the second method I suggested earlier, but that is more complex.