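Problem description
Requirement: write a Java program that counts how many times each English word occurs in an article. When deciding what counts as a word, take into account the surrounding spaces, line-break characters, and the "-" hyphen; a hyphen joins its parts into a single word. This should be implemented with regular expressions. The specific rules are: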
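- Treat each of the following as one word:
don't, doesn't, didn't, can't, couldn't, wouldn't, isn't, aren't, wasn't, weren't
- Also treat each of the following as one word:
he's, she's, I'm, you're, we're, they're
- Do not count the following; remove them:
Shawn's, apple's, Jonas’, what's, 'twas
- ice-cream: when it is not split by a line break at the end of a line, treat it as one word, and do not remove the hyphen in the middle.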
Source of the problem and my own approach
After reading some references, I wrote a first draft:
(?:she's|he's|they're|we're|you're|I'm|It's)|(?:isn't|aren't|doesn't|don't|didn't|haven't|hadn't|hasn't|can't|couldn't|wasn't|weren't|wouldn't )
The test string is:
She's"1.tom:'what's your name.' Jame's Janes', didn't, character,wasn't,
ice-cream,
Relevant code
(?:she's|he's|they're|we're|you're|I'm|It's)|(?:isn't|aren't|doesn't|don't|didn't|haven't|hadn't|hasn't|can't|couldn't|wasn't|weren't|wouldn't )
What result do you expect? What error do you actually see?
However, the draft does not correctly recognize words, hyphens, or line breaks.
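To make the failure concrete, here is a small sketch that runs the draft pattern, copied unchanged, over the test string and collects every match. The class name `DraftRegexDemo` and the helper `findAll` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DraftRegexDemo {
    // The draft pattern from the question, unchanged
    // (including the trailing space inside "wouldn't ").
    static final String DRAFT =
            "(?:she's|he's|they're|we're|you're|I'm|It's)"
            + "|(?:isn't|aren't|doesn't|don't|didn't|haven't|hadn't|hasn't"
            + "|can't|couldn't|wasn't|weren't|wouldn't )";

    // The test string from the question; \n stands for the line break.
    static final String TEXT =
            "She's\"1.tom:'what's your name.' Jame's Janes', didn't, character,wasn't,\nice-cream,";

    // Hypothetical helper: returns every substring matched by the regex, in order.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll(DRAFT, TEXT)); // prints [he's, didn't, wasn't]
    }
}
```

Three things show up. "She's" is reported as "he's", because the pattern is case-sensitive and has no word boundaries, so it matches inside the longer word. Ordinary words such as "character" and "ice-cream" are never matched, because the pattern lists only contractions. And "wouldn't " can only match when a space follows it.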
Thanks in advance to any experts who can point the way! Please help me design this regular expression ^_^
This basically meets your requirements.
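A minimal sketch of one way to satisfy the rules in the question, not necessarily the only or the intended approach. It rests on several assumptions that the question does not state: a hyphen directly before a line break is treated as a line-wrap hyphen and removed together with the break; counting is case-insensitive; only ASCII letters form words; and the curly apostrophe ’ is treated like '. The names `WordCounter` and `count` are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {
    // A token: runs of letters joined by apostrophes or hyphens, with an
    // optional leading or trailing apostrophe (as in 'twas or Jonas').
    private static final Pattern TOKEN =
            Pattern.compile("['\u2019]?[A-Za-z]+(?:['\u2019-][A-Za-z]+)*['\u2019]?");

    // The only apostrophe forms that the rules count as words (lowercase).
    private static final Pattern KEPT_CONTRACTION = Pattern.compile(
            "(?:he|she)'s|i'm|(?:you|we|they)'re"
            + "|(?:do|does|did|ca|could|would|is|are|was|were)n't");

    public static Map<String, Integer> count(String text) {
        // Assumption: "-" followed by a line break is a wrap hyphen, so the
        // two halves are joined. A hyphen inside a line is left untouched.
        String joined = text.replaceAll("-\\R", "");
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(joined);
        while (m.find()) {
            // Assumption: fold case and normalize the curly apostrophe.
            String word = m.group().replace('\u2019', '\'').toLowerCase(Locale.ROOT);
            boolean hasApostrophe = word.indexOf('\'') >= 0;
            if (hasApostrophe && !KEPT_CONTRACTION.matcher(word).matches()) {
                continue; // possessives, what's, 'twas and similar are dropped
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text =
                "She's\"1.tom:'what's your name.' Jame's Janes', didn't, character,wasn't,\nice-cream,";
        System.out.println(count(text));
        // prints {she's=1, tom=1, your=1, name=1, didn't=1, character=1, wasn't=1, ice-cream=1}
    }
}
```

The sketch splits the work in two: one pattern finds candidate tokens, and a second pattern decides which apostrophe forms are kept. That keeps each expression short compared with a single combined pattern. One known limitation: a word wrapped directly in single quotes, such as 'hello', looks like an apostrophe form and is dropped.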