C Text Processing: Output Wrong Size

I'm writing a C program that takes a text file of words and copies only the words that contain no capitalization or punctuation and are 4 or more characters long. I have tested the boolean functions int containsPunctuationOrCaps(char *word) and int longerThanThree(char *word), and they both work. However, my main function only prints words of up to seven characters; anything longer is truncated.

int main() {
  char *currentWord = malloc(36);
  int count = 0;
  char *Words[3000];
  FILE *fin, *fout;

  fin = fopen(INFILE,"r");
  if (fin==NULL) {
    printf("INPUT FILE NOT FOUND\n");
    return 1;
  } 
  while(fgets(currentWord, sizeof(currentWord), fin) != NULL) {
    if(!containsPunctuationOrCaps(currentWord) && longerThanThree(currentWord)) {
      Words[count]=currentWord;
      printf("%s\n",currentWord);
      count++;
    }
  }
  fclose(fin);    
}

When I change char *currentWord = malloc(36); to char currentWord[]; it doesn't read anything. How do I make this work?


You declare currentWord as a char *, which points to dynamically allocated memory. sizeof is evaluated at compile time and yields the size in bytes of the type of currentWord. Here that is the size of a pointer, which is apparently 8 bytes on your system, not the 36 bytes you allocated. Since fgets reads at most one character fewer than the size it is given and then appends a terminating null byte, each fgets call reads at most 7 characters.
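To make the difference concrete, here is a minimal sketch that reports what sizeof sees in each case. The function names size_seen_through_pointer and size_seen_through_array are made up for this illustration; the pointer size depends on the platform, so the code compares against sizeof(char *) rather than assuming 8.

```c
#include <stddef.h>
#include <stdlib.h>

/* sizeof applied to a pointer yields the size of the pointer itself,
   not the size of the 36-byte block it points to. */
size_t size_seen_through_pointer(void) {
    char *currentWord = malloc(36);
    size_t n = sizeof(currentWord);   /* same as sizeof(char *) */
    free(currentWord);                /* free(NULL) is harmless if malloc failed */
    return n;
}

/* sizeof applied to an array yields the size of the whole array in bytes. */
size_t size_seen_through_array(void) {
    char currentWord[36];
    return sizeof(currentWord);       /* 36, because sizeof(char) is 1 */
}
```

If you keep the malloc version, pass the allocated size to fgets yourself (for example via a named constant) instead of relying on sizeof.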

You could replace char *currentWord = malloc(36); (note: you never free the allocated memory) by char currentWord[36]; , which lets fgets read up to 35 characters per call. However, fgets always attempts to read until the end of the line (or until the buffer is full) and keeps the newline character, so if a line holds several words, the currentWord array will contain all of them.
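As a rough sketch of that array-based variant, the helper below reads one line and strips the newline that fgets leaves in the buffer. The name read_line and its return convention are assumptions made for this example, not part of the original program.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from `in` into buf (capacity cap, which should be the
   real array size, e.g. sizeof of a char[36]). Strips the trailing
   newline kept by fgets. Returns the resulting length, or -1 when no
   more input is available. */
int read_line(FILE *in, char *buf, size_t cap) {
    if (fgets(buf, (int)cap, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if present */
    return (int)strlen(buf);
}
```

Called as read_line(fin, currentWord, sizeof currentWord) with char currentWord[36];, a line such as "hello world" comes back as a single string of 11 characters, which is why a file with several words per line still needs to be split.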

You could split currentWord at spaces, but that requires additional checking logic at the end of the buffer (is the end of currentWord the end of a word or line, or was the buffer just full and the word continues?). The easiest way to accomplish what you want is probably to read the file character by character using getc. Streams opened with fopen are normally buffered already; see setbuf if you need to change that. As you read each character, check whether it is a word character or a non-word character (or EOF). For a word character, append it to the buffer. For a non-word character or EOF, append a terminator to the buffer and output the word if it meets your criteria. The currentWord buffer should be dynamically allocated (unless you know a definite upper bound on word length), and you may have to reallocate it if the word currently being read is longer than the allocated memory can hold.
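The character-by-character approach could look roughly like the sketch below. It assumes that words are separated by whitespace, so punctuation stays inside the word and is left for containsPunctuationOrCaps to reject. The name next_word, its parameters, and the initial capacity of 16 are choices made for this illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the next whitespace-delimited word from `in` into a buffer that
   grows as needed. *buf and *cap describe the caller's buffer (start with
   NULL and 0) and are updated whenever it is reallocated.
   Returns the word length, 0 at end of input, or -1 if memory runs out. */
long next_word(FILE *in, char **buf, size_t *cap) {
    size_t len = 0;
    int c;

    /* Skip any separators in front of the word. */
    while ((c = getc(in)) != EOF && isspace(c))
        ;

    while (c != EOF && !isspace(c)) {
        if (len + 1 >= *cap) {                 /* keep room for the terminator */
            size_t new_cap = *cap ? *cap * 2 : 16;
            char *tmp = realloc(*buf, new_cap);
            if (tmp == NULL)
                return -1;                     /* old buffer is still valid */
            *buf = tmp;
            *cap = new_cap;
        }
        (*buf)[len++] = (char)c;
        c = getc(in);
    }

    if (len > 0)
        (*buf)[len] = '\0';                    /* terminate the word */
    return (long)len;
}
```

In main you would start with char *currentWord = NULL; size_t cap = 0; and loop while next_word(fin, &currentWord, &cap) returns a positive value, applying your two checks to currentWord each time and calling free(currentWord) once after the loop. Note that storing currentWord itself in Words[count] would make every entry point to the same buffer, so each kept word needs its own copy.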


When you write

char currentWord[];

instead of

char *currentWord = malloc(36);

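you declare an array without a size. Standard C does not accept this declaration for a local variable unless an initializer supplies the size; a compiler that does accept it treats it as a 0-sized array of chars. When sizeof is applied to the name of an array (as opposed to a pointer), the result is the size in bytes of that array.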

For example, in this case:

char currentWord[10]; 

sizeof would have returned 10*sizeof(char), which is 10 because sizeof(char) is always 1.

In your case, the array is empty, so sizeof(currentWord) is 0 and fgets is told that the buffer has no room for any characters.
