Extract text from html with powershell - bad pattern

I want to extract this text

Spectrum Mortis - Bit Meseri - The Incantation (2022)
Hate Legions - Exitus Letalis (Tota Vita Nihil Aliud Quam Ad Mortem Iter Est) (2014)

from this html block

<span id='tid-span-369523'><a id="tid-link-369523" href="http://metalarea.org/forum/index.php?showtopic=369523" title="This topic was started: Sep 16 2022, 04:18:47">Spectrum Mortis - Bit Meseri - The Incantation (2022)</a></span><span id='tid-span-221568'><a id="tid-link-221568" href="http://metalarea.org/forum/index.php?showtopic=221568" title="This topic was started: Apr 11 2014, 14:31:18">Hate Legions - Exitus Letalis (Tota Vita Nihil Aliud Quam Ad Mortem Iter Est) (2014)</a></span>

I'm trying to use this code, but nothing is written to output2.txt

$html = Get-Content -Path 'C:\temp\html\metalarea2.html' -Raw
$pattern = '<span id="tid-span-\\d+"><a id="tid-link-\\d+" href=".+?" title=".+?">(.+?)</a></span>'
$matches = Select-String -InputObject $html -Pattern $pattern -AllMatches
$result = $matches | % { $_.Matches } | % { $_.Groups[1].Value }
$result | Out-File -FilePath "C:\temp\html\output2.txt"

I don't understand where the problem lies.

EDIT: SOLUTIONS

$pattern = '<span id=\x27tid-span-\d+\x27><a id="tid-link-\d+" href=".+?" title=".+?">(.+?)</a></span>'

OR

$pattern = '<a id="tid-link-\d+".+?>(.+?)</a>'
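For anyone hitting the same thing, here is a minimal sketch of why the first pattern matched nothing and how the second solution behaves. Two things differ between the original pattern and the HTML: inside a single-quoted PowerShell string a backslash is not an escape character, so `\\d` reaches the regex engine as an escaped backslash followed by a literal `d`; and the span id in the HTML is wrapped in single quotes, while the pattern expected double quotes. The sample HTML is inlined in a here-string instead of being read from the file, and `$titles` is just an illustrative name. `[regex]::Matches` is used here in place of `Select-String`, which also avoids assigning to `$matches`, an automatic variable in PowerShell.

```powershell
# Sample input copied from the question, inlined so the sketch is self-contained.
$html = @'
<span id='tid-span-369523'><a id="tid-link-369523" href="http://metalarea.org/forum/index.php?showtopic=369523" title="This topic was started: Sep 16 2022, 04:18:47">Spectrum Mortis - Bit Meseri - The Incantation (2022)</a></span><span id='tid-span-221568'><a id="tid-link-221568" href="http://metalarea.org/forum/index.php?showtopic=221568" title="This topic was started: Apr 11 2014, 14:31:18">Hate Legions - Exitus Letalis (Tota Vita Nihil Aliud Quam Ad Mortem Iter Est) (2014)</a></span>
'@

# In a single-quoted string, '\\d+' is the regex "literal backslash, then one or more d".
$doubleBackslash = '369523' -match '\\d+'   # False: there is no backslash in the input
$singleBackslash = '369523' -match '\d+'    # True: \d means "digit"

# Second solution from the post: anchor on the <a> tag only, so the quote style
# used on the surrounding <span> no longer matters.
$pattern = '<a id="tid-link-\d+".+?>(.+?)</a>'

# Matches() returns every match; capture group 1 holds the link text.
$titles = [regex]::Matches($html, $pattern) | ForEach-Object { $_.Groups[1].Value }

$titles
```

With the sample above this yields the two topic titles as separate strings. To write them to a file as in the question, pipe `$titles` to `Out-File`.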
