Performance degradation reading many files [Archive]

Thomasm

25th February 2004, 21:14

I am developing an integration to a third-party system that requires data communication via XML files. For this purpose I have developed a simple XML parser that works very nicely, but now that we are going into system testing I am running into problems.

The communication between the systems is quite heavy, with _many_ files in a short time; 2000 files at the same time will be a real-world scenario. The problem is that performance degrades with each file read, and you can notice a difference already after 50 files. We are talking about hours of processing before 2000 files are done. The first file takes only a few seconds, but the performance curve is steep.

I am closing each file after it is processed, so that should not be the issue? I have been looking for functions to purge memory or something similar, but I have not found anything.

Grateful for any help
-- Thomas Martensson

mark_h

25th February 2004, 23:24

Are you talking about system performance or session performance? From reading this post it looks like this session is consuming all available CPU. Is that correct? If so, maybe just building a slight delay into the session would allow other things onto the CPU. Sigh, if only you could set up scheduling like on the old HP3000 MPEX boxes.

Mark

lbencic

25th February 2004, 23:50

Also, what are you doing with the data? I only ask because of your other AFS post - are you calling the AFS code for each record of the 2000 you want to read at any given time? That's a big slowdown...
Maybe post the code so we can get a better idea of the problem... (you can change names to protect the innocent or leave out parts that do not apply to the problem)

Thomasm

26th February 2004, 00:18

Hi!

Thanks for your replies.

The performance problem is in the session only. OK, I have not been monitoring the system while running the session, but I am sure other users would have reacted. Also, the two posts are not related, so this is not an AFS issue.

What the session does is basically open a directory and, for each file with a matching name: open the file, read it line by line into a string, parse this string for data (it is XML) and store the data in local variables until one complete record has been picked up. Each record picked up is inserted into the database before the next record is read. When the whole file has been read, the session closes it, moves it to the 'processed' directory, opens the next one, and goes on until all files are read.
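For readers who do not know Baan 4GL, the loop described above can be sketched roughly like this in Python. This is only an illustration, not the session code: process_directory, parse_records and insert_record are made-up names, and the per-file timing is an extra added here because it would show whether each file really takes longer than the one before.

```python
import fnmatch
import os
import shutil
import time


def process_directory(in_dir, done_dir, pattern, parse_records, insert_record):
    """Process every matching file in in_dir; return (name, seconds) per file.

    parse_records and insert_record are hypothetical callables standing in
    for the XML parser and the database insert of the real session.
    """
    timings = []
    for name in sorted(os.listdir(in_dir)):
        if not fnmatch.fnmatch(name, pattern):
            continue  # skip files whose name does not match
        path = os.path.join(in_dir, name)
        started = time.perf_counter()
        # Open, read and parse; the file is closed when the with-block exits.
        with open(path, "r", encoding="utf-8") as handle:
            for record in parse_records(handle):
                insert_record(record)
        # Move the file to the 'processed' directory only after it is closed.
        shutil.move(path, os.path.join(done_dir, name))
        timings.append((name, time.perf_counter() - started))
    return timings
```

If the times in the returned list grow from file to file, something is carried over between files; if they stay flat, the slowdown is somewhere outside this loop.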

Here is the code that is doing the low level file handling. I have cut it short where it starts the actual parsing as it does not add any information to the problem, I think.

(Sorry, not sure how to format the code)

Function init.file.opening()
{
    cur.elem = "" | init
    prev.elem = "" | init
    _xml.buffer = ""
    _cur.pos = 0
    _EOF = false
}

|* Check return for errors
Function long open.import.file( string _import.file(1024) )
{
    init.file.opening()
    _import.file.pointer = seq.open( strip$(shiftl$(_import.file)), "rt")
    return( _import.file.pointer )
}

|* Check return for errors
Function long close.import.file()
{
    _seq.OK = seq.close( _import.file.pointer )

    | should move the file to the history directory

    return( _seq.OK = 0 )
}

| After opening the file this is the entry point to read it.
function long nextElement()
{
    long res
    long start.pos, end.pos, full.end.pos, start.end.tag.pos, comment.start.pos, spec.start.pos
    long space.pos
    long next.start.pos

    res = true

    | Read more into the buffer if there is room
    while ( (len(_xml.buffer) - _cur.pos) < _READ.SIZE ) and not _EOF
        | move the buffer
        _xml.buffer = _xml.buffer( (_cur.pos+1); len(_xml.buffer)-_cur.pos)
        _cur.pos = 0 | reset
        res = readNextLine()
    endwhile

    | Now do the parsing...
}

function long readNextLine()
{
    string local.buffer(_READ.SIZE)

    local.buffer = ""

    _seq.OK = seq.gets( local.buffer, _READ.SIZE, _import.file.pointer, GETS_ALL_CHARS )

    if ( _seq.OK <> 0 and _EOF ) then
        | reading again while we have reached EOF
        set.error("The XML Parser function readNextLine() was called after reaching EOF")
        return(false)
    endif

    if ( _seq.OK <> 0 ) then
        _EOF = true
    endif

    _xml.buffer = _xml.buffer & " " & strip$(shiftl$(whiteSpaceToSpace(local.buffer)))

    return(true) | OK if we are here
}
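
The buffer handling in nextElement() and readNextLine() can also be written out in Python to make the refill logic easier to follow. This is a rough sketch, not a translation: XmlBuffer and its method names are invented, the READ_SIZE value is a guess because the post does not show _READ.SIZE, and whiteSpaceToSpace() is assumed to replace every whitespace character with a space.

```python
import io
import re

READ_SIZE = 64  # assumed value; the post does not show _READ.SIZE


class XmlBuffer:
    """Sliding text buffer: unparsed text starts at cur_pos."""

    def __init__(self, stream, read_size=READ_SIZE):
        self.stream = stream
        self.read_size = read_size
        self.buffer = ""
        self.cur_pos = 0
        self.eof = False

    def read_next_line(self):
        # Read at most read_size characters, stopping at a line end.
        line = self.stream.readline(self.read_size)
        if line == "":
            self.eof = True
            return
        # Assumed whiteSpaceToSpace(): whitespace becomes spaces, then trim.
        self.buffer += " " + re.sub(r"\s", " ", line).strip()

    def refill(self):
        # Keep reading until read_size unparsed characters are buffered.
        while len(self.buffer) - self.cur_pos < self.read_size and not self.eof:
            # Drop the consumed prefix; this copies the unparsed remainder.
            self.buffer = self.buffer[self.cur_pos:]
            self.cur_pos = 0
            self.read_next_line()


if __name__ == "__main__":
    demo = XmlBuffer(io.StringIO("<a>\n  <b>x</b>\n</a>\n"), read_size=16)
    demo.refill()
    print(repr(demo.buffer))
```

Note that every pass through the refill loop copies the unparsed remainder of the buffer before appending the next line, in the sketch as in the original.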

mark_h

29th February 2004, 21:03

Where in the code do you notice the degradation? In the reading or in the parsing? Maybe if you attach all the code someone will see something. Is part of the parsing running an API function server? If so, that could be part of the problem. Sometimes I think function servers leave things hanging for the parent process.

Mark