Hi all,
I'm hoping someone can help with this scenario. I have XML files that are generated from a video conferencing report. The data in the XML is structured like this:
Code:
<text textHeight="9.199219" lineSpacingFactor="1.1499023" leadingOffset="-1.5644531" hyperlinkType="None">
<reportElement mode="Opaque" x="20" y="131" width="539" height="11" forecolor="#FFFFFF" backcolor="#000000"/>
<font isBold="true" pdfEncoding="CP1252"/>
<textContent><![CDATA[Guest/Host: Guest]]></textContent>
</text>
<text textHeight="9.199219" lineSpacingFactor="1.1499023" leadingOffset="-1.5644531" hyperlinkType="None">
<reportElement style="DetailData" mode="Transparent" x="113" y="142" width="65" height="11"/>
<font pdfEncoding="CP1252"/>
<textContent><![CDATA[82.71.32.74]]></textContent>
</text>
<text textHeight="9.199219" lineSpacingFactor="1.1499023" leadingOffset="-1.5644531" hyperlinkType="None">
<reportElement style="DetailData" mode="Transparent" x="185" y="142" width="93" height="11"/>
<font pdfEncoding="CP1252"/>
<textContent><![CDATA[Guest]]></textContent>
</text>
<text textAlignment="Center" textHeight="9.199219" lineSpacingFactor="1.1499023" leadingOffset="-1.5644531" hyperlinkType="None">
<reportElement style="DetailData" mode="Transparent" x="286" y="142" width="79" height="11"/>
<font pdfEncoding="CP1252"/>
<textContent><![CDATA[30 Sep 2009 09:54:03]]></textContent>
</text>
<text textAlignment="Center" textHeight="9.199219" lineSpacingFactor="1.1499023" leadingOffset="-1.5644531" hyperlinkType="None">
<reportElement style="DetailData" mode="Transparent" x="373" y="142" width="86" height="11"/>
<font pdfEncoding="CP1252"/>
<textContent><![CDATA[30 Sep 2009 09:54:03]]></textContent>
</text>
<text textAlignment="Right" textHeight="9.199219" lineSpacingFactor="1.1499023" leadingOffset="-1.5644531" hyperlinkType="None">
<reportElement style="DetailData" mode="Transparent" x="466" y="142" width="36" height="11"/>
<font pdfEncoding="CP1252"/>
<textContent><![CDATA[0]]></textContent>
</text>
<text textAlignment="Right" textHeight="9.199219" lineSpacingFactor="1.1499023" leadingOffset="-1.5644531" hyperlinkType="None">
<reportElement style="DetailData" mode="Transparent" x="509" y="142" width="50" height="11"/>
<font pdfEncoding="CP1252"/>
<textContent><![CDATA[ 0.014]]></textContent>
</text>
<line>
<reportElement style="DetailData" mode="Opaque" x="20" y="154" width="540" height="1"/>
<graphicElement pen="Thin" fill="Solid"/>
</line>
<text textHeight="9.199219" lineSpacingFactor="1.1499023" leadingOffset="-1.5644531" hyperlinkType="None">
<reportElement style="DetailData" mode="Transparent" x="20" y="142" width="87" height="11"/>
<font pdfEncoding="CP1252"/>
<textContent><![CDATA[GVTest]]></textContent>
</text>
Basically, the data is laid out as a table, as you can see from the coordinates. The table has 7 columns. You can see a heading, for example:
<textContent><![CDATA[Guest/Host: Guest]]></textContent>
This is a heading, but the actual column data appears in the same format; for example, the column data under that heading would be:
<textContent><![CDATA[GVTest]]></textContent>
The data I need to pull out is the column data, as above, but there are no distinguishing tags for the different data and columns, and everything is wrapped in CDATA.
I don't think regular expressions would work, as most of the data differs each time and contains times, dates, names, etc.
Does anybody have any suggestions on how to parse this and get out the information I need?
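For reference, the only direction I can think of is to lean on the coordinates: treat every text element with the same y value as one table row, and order the cells in that row by x. Below is a rough Python sketch of that idea, not a tested solution. It makes several assumptions that may not hold for the real files: that the fragment sits under a single root element (the <page> wrapper here is made up), that there are no XML namespaces, and that data cells always carry style="DetailData" while headings do not, as in the sample above. The name extract_rows is just illustrative.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed-down fragment in the same shape as the report. The <page> wrapper
# is an assumption, added only so the fragment has a single root element.
SAMPLE = """<page>
<text><reportElement mode="Opaque" x="20" y="131" width="539" height="11"/>
<textContent><![CDATA[Guest/Host: Guest]]></textContent></text>
<text><reportElement style="DetailData" x="113" y="142" width="65" height="11"/>
<textContent><![CDATA[82.71.32.74]]></textContent></text>
<text><reportElement style="DetailData" x="185" y="142" width="93" height="11"/>
<textContent><![CDATA[Guest]]></textContent></text>
<text><reportElement style="DetailData" x="20" y="142" width="87" height="11"/>
<textContent><![CDATA[GVTest]]></textContent></text>
</page>"""


def extract_rows(xml_text):
    """Group DetailData cells into rows by y, ordered left to right by x."""
    root = ET.fromstring(xml_text)
    rows = defaultdict(list)
    for text_el in root.iter("text"):
        box = text_el.find("reportElement")
        content = text_el.find("textContent")
        if box is None or content is None:
            continue
        # Assumption: headings have no style attribute, data cells do.
        if box.get("style") != "DetailData":
            continue
        x, y = int(box.get("x")), int(box.get("y"))
        # The parser hands back CDATA as ordinary text, so no special handling.
        rows[y].append((x, (content.text or "").strip()))
    # Sort rows top to bottom, and cells within each row left to right.
    return [[value for _, value in sorted(cells)]
            for _, cells in sorted(rows.items())]


if __name__ == "__main__":
    for row in extract_rows(SAMPLE):
        print(row)
```

On the fragment above this should give one row with GVTest first, since its x of 20 is the smallest, even though it appears last in the file. If the real report repeats y values across pages, the grouping key would need to include the page as well.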