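The text to match is as follows:
Requirement: the alt attribute must not be hello, must not be the same as the image file name at the end of src, and must not contain Chinese characters.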
<img src="en/p_w_picpaths/main_page/cover.jpg" alt="hello" />
<img src="en/p_w_picpaths/main_page/cover.jpg" alt="cover" />
<img src="en/p_w_picpaths/main_page/cover.jpg" alt="我是汉字" />
<img src="en/p_w_picpaths/main_page/cover.jpg" alt=" yingwen " />
<img alt=" yingwen " src="en/p_w_picpaths/main_page/cover.jpg" />
<img src="p_w_picpaths/en_adu_1_6_1_submenu_07out.gif" alt=" Material2 " name="Material2" width="130" height="18" border="0" id="Materia12" ('Material2','','p_w_picpaths/en_adu_1_6_1_submenu_07in.gif',1)" />
<img src="p_w_picpaths/spacer.gif" width="1" height="55" alt="hello" />
<img title="動畫所見是牙冠的縱切面。蛀壞部分從琺瑯質開始,一直蔓延至象牙質,形成明顯的蛀牙洞" alt="動畫所見是牙冠的縱切面。蛀壞部分從琺瑯質開始,一直蔓延至象牙質,形成明顯的蛀牙洞" src="02_inside/teens_OD_3DAni_07a.gif">
There are three approaches:
The first two are the commonly used ones; the last is used less often but is very concise.
<img[^>]+(?<=alt=")(?!hello|\2)([a-zA-Z]+)[^>]*src=["']?([^"']+\/(\w+)\.(?:jpg|gif|png))"|<img[^>]+(?<=src=["']?)([^"']+\/(\w+)\.(?:jpg|gif|png))"[^>]*alt="(?!hello|\5)[a-zA-Z]+[^>]*>
<img\b(?=[^>]*src="(?:[^"]*/)?([^"\.]*)\.[^"]*")(?=[^>]*alt="(?!hello")(?!\1)[^"]+")[^>]*/>
(?=.*?/([a-zA-Z0-9_]+)\.(?:jpg|gif|png))<img[^>]+?alt="(?!\1|hello")[a-zA-Z0-9._]+"[^>]*>
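As a rough illustration of the lookahead approach, here is a minimal Python sketch adapted from the second pattern above. It is not the original pattern: the CJK range \u4e00-\u9fff, the \s* tolerance around the alt value, and the optional "/" before ">" are assumptions added for this sketch, and the function name alts_passing_rules is hypothetical. The first pattern relies on a variable-length lookbehind, which Python's re module does not accept, so it is not used here.

```python
import re

# Sketch only: adapted from the second pattern, with assumed additions
# (CJK exclusion, whitespace tolerance, optional self-closing slash).
IMG_PATTERN = re.compile(
    r'<img\b'
    # Lookahead 1: capture the file name (without extension) from src as group 1.
    r'(?=[^>]*\bsrc="(?:[^"]*/)?([^".]*)\.[^"]*")'
    # Lookahead 2: alt must not be "hello", must not equal group 1,
    # and must not contain characters in the assumed CJK range.
    r'(?=[^>]*\balt="(?!\s*hello\s*")(?!\s*\1\s*")([^"\u4e00-\u9fff]+)")'
    # Consume the rest of the tag.
    r'[^>]*>'
)


def alts_passing_rules(html):
    """Return the trimmed alt values of <img> tags that satisfy all three rules."""
    return [m.group(2).strip() for m in IMG_PATTERN.finditer(html)]


if __name__ == "__main__":
    sample = (
        '<img src="en/p_w_picpaths/main_page/cover.jpg" alt="hello" />\n'
        '<img src="en/p_w_picpaths/main_page/cover.jpg" alt="cover" />\n'
        '<img alt=" yingwen " src="en/p_w_picpaths/main_page/cover.jpg" />\n'
    )
    print(alts_passing_rules(sample))
```

Because both conditions are written as lookaheads anchored right after <img, the order of src and alt inside the tag does not matter.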
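The following patterns are commonly used to extract (capture) the value and name attribute values from input tags when the attribute order is not fixed; they extend the approaches above.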
<input\b(?=[^>]*value=\s*["']?([^"]*))(?=[^>]*name=\s*["']?([^"]*))[^>]*>
(?=.*?value=\s*["']?([^"]*))<input[^>]+name=["']?([^"]+)"[^>]*>
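The same idea can be tried in Python. The sketch below is adapted from the first input pattern above; the added \b anchors, the exclusion of single quotes from the captured value, the sample tags, and the function name extract_name_value are assumptions made for illustration. It only handles quoted attribute values.

```python
import re

# Sketch only: two independent lookaheads capture value and name,
# so the attributes may appear in either order inside the tag.
INPUT_PATTERN = re.compile(
    r'<input\b'
    r'(?=[^>]*\bvalue=\s*["\']?([^"\']*))'   # group 1: value
    r'(?=[^>]*\bname=\s*["\']?([^"\']*))'    # group 2: name
    r'[^>]*>'
)


def extract_name_value(html):
    """Return (name, value) pairs for each <input> tag that has both attributes."""
    return [(m.group(2), m.group(1)) for m in INPUT_PATTERN.finditer(html)]


if __name__ == "__main__":
    # Hypothetical sample tags, not taken from the original text.
    sample = (
        '<input type="text" name="user" value="alice" />\n'
        '<input value="42" name="answer">'
    )
    print(extract_name_value(sample))
```

A tag missing either attribute is skipped, because both lookaheads must succeed before the tag is consumed.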