Comments and Discussions

I want to parse a CString containing Unicode text (MFC project built with the Unicode character set). Following your sample, I use:
CString CszXML;
// ...
Cxml *xml = new Cxml();
xml->ParseString(CszXML.GetBuffer());
CszXML.ReleaseBuffer(); // every GetBuffer() call should be paired with ReleaseBuffer()
m_tree.DeleteAllItems();
CNode *root = xml->GetRootNode();
fill(NULL, root);
After that, the tree is displayed incorrectly. When I inspect CszXML in Debug mode in your sample (where it is read from a file), I can't work out how to convert my Unicode CString to the string type your demo uses.

OK, I will look into this within 48 hours at most.
I haven't worked on this project in quite a while and I can't solve this at first glance. I hope I will be able to help.

Thanks, it is a lightweight parser, though there are actually many good free implementations.
I only want to parse simple XML too.
modified on Thursday, October 21, 2010 9:19 PM

Stuff in brackets was added in response to the reply below, which wasn't addressed to me...
I can't comment on whether the world needs another XML parser but I got a dose of the screaming ab-dabs from looking in the xml class header. There's loads of the internal workings of the class leaking across the interface. Class interfaces should be complete and minimal and this one looks like it should be called something like xml_parser_with_base_64_encoding_and_decoding. Perhaps what you've got here is really a namespace and not a class?
Other awoooga signs just from the xml class definition...
- pointer members without a copy constructor (or preventing copying and assignment, missed that option out before)
- while we're on the subject, pointer members (requires manual resource management and a cluttered implementation)
- using character pointers instead of std::string/std::vector in the interface (vector is just as efficient as using a pointer for what you're doing and far safer)
- parsing interface taking character arrays instead of streams (streams are a more general interface and have a lot of the parsing you do built in without having to muck around with pointers)
- private member functions and little information hiding (private member functions are generally better implemented as helpers in the implementation file. About the only time you ever need them is if they're overridable. And it might pay to investigate the difference between encapsulation and information hiding)
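A minimal sketch of the value-semantics alternative the list argues for, assuming a hypothetical `XmlNode` type and `readDocument` function (neither is part of the article's `Cxml`/`CNode` classes): every member is a `std::string` or `std::vector`, so the compiler-generated copy operations are already correct, and input comes from any `std::istream`.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type: every member is a value type, so the compiler-generated
// copy constructor, copy assignment and destructor already do the right thing.
// (std::vector of an incomplete element type is guaranteed to work since C++17.)
struct XmlNode {
    std::string name;
    std::string text;
    std::vector<std::pair<std::string, std::string>> attributes;  // name/value pairs
    std::vector<XmlNode> children;
};

// Hypothetical entry point taking a stream instead of a character array, so the
// caller can pass a std::ifstream, a std::istringstream or any other stream.
// This sketch only reads the whole document; a real parser would tokenize it here.
std::string readDocument(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copies everything up to the end of the stream
    return buffer.str();
}
```

Copying an `XmlNode` copies its whole subtree; if that is too expensive, explicitly deleting the copy operations is the other option the list mentions.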
I (still) don't think I've got the stomach to look in the implementation.
modified on Tuesday, October 5, 2010 7:20 AM

Hi, I just wanted to say that there are ~1000 examples of simple parsers like this, because everyone needed a simple one and didn't use Google to look for an existing one. The reason that you state is complete nonsense, as you ignore fundamental laws of OOP, re-usability being one of them. If you're not able to use free libs existing on the market, you're only adding to the trash on the internet. No one is going to use your class, as it's simply not enough, not supported, and doesn't have a future. Just search Google for a simple XML parser, and you'll get what you wanted in the first place.
Don't make the internet a bigger trash heap than it is by contributing useless code to it.
Cheers.

For me, it isn't a matter of OOP (irrelevant), reuse (adding a useless library doesn't impair the users of good ones) or support (we can give a new project the benefit of the doubt).
The serious problem is that publishing woefully deficient pieces of code like this as "XML parsers", while they are knowingly and deliberately non-conformant, deceives users who don't even know about somewhat advanced features and damages XML as a reliable, universal standard.
If my XML might be consumed by some horrid hack that only understands some undefined, restricted subset of XML, I am dragged down to the level of ignorant yokels who don't even recognize they have a problem because "it works for them"; every time a "simple" not-quite-XML parser appears, the likelihood of such a decay of XML's expressiveness and usefulness increases.
modified on Monday, October 4, 2010 10:10 AM

"If you're not able to use free libs existing on the market, you're only adding to the trash on the internet"
Trust me, I am able to use free libraries; I just didn't in this project. I probably overestimate my capabilities as a programmer, but until I do good I think I have to do bad. Still, I think the underestimation of this project is a trend more than a fact.
"No one is going to use your class, as it's simply not enough, not supported, and doesn't have a future."
I am partially aware of that, though placing the project here under the observant eye of other developers may give it a chance to improve.
"Class interfaces should be complete and minimal and this one looks like it should be called something like xml_parser_with_base_64_encoding_and_decoding"
I will move those Base64 functions to the utils section. They are there because I needed a simple XML parser for an online backup service. A program creates a file like this:
<?xml version="1.0"?>
<BACKUP_SERVICE>
<XMLNAME>theIDofthefile</XMLNAME>
<STRUCTURE_NAME>ForExampleStudent</STRUCTURE_NAME>
<STRUCTURE_DATA>base64dataOfTheStructure</STRUCTURE_DATA>
</BACKUP_SERVICE>
And that is all; no extra stuff. This parser was intended to parse a structure like this, and it does a hell of a job. Your (plural) personal expectations are the ones that were misplaced, though they did motivate me to improve the project.
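As a sketch of moving the Base64 helpers out of the parser class, a decoder for the `STRUCTURE_DATA` payload could live in a free-standing namespace. `xml_utils` and `base64Decode` are hypothetical names, and the sketch assumes the standard RFC 4648 alphabet with `=` padding.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical utility namespace, kept separate from the parser class.
namespace xml_utils {

// Decodes standard Base64 (RFC 4648 alphabet, '=' padding).
// Whitespace is skipped; any other character outside the alphabet throws.
std::string base64Decode(const std::string& input) {
    // Maps one character of the alphabet to its 6-bit value, or -1.
    auto sextet = [](char c) -> int {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    };

    std::string out;
    unsigned int buffer = 0;  // pending bits, right-aligned
    int bits = 0;             // how many of them are valid
    for (char c : input) {
        if (c == '=') break;  // padding: no more data follows
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') continue;  // tolerate line wrapping
        const int v = sextet(c);
        if (v < 0) throw std::invalid_argument("invalid Base64 character");
        buffer = (buffer << 6) | static_cast<unsigned int>(v);
        bits += 6;
        if (bits >= 8) {  // a full byte is available
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFFu));
            buffer &= (1u << bits) - 1u;  // keep only the leftover bits
        }
    }
    return out;
}

}  // namespace xml_utils
```

Leftover bits after the last full byte (the ones implied by the `=` padding) are discarded, which is what a Base64 decoder is expected to do.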
"pointer members without a copy constructor"
I might be wrong, but I made a small hack here. If you have the time, please point out the downsides and advantages of implementing the copy constructor with regard to my project, not "this is how it is done, because it is the C++ way".
"using character pointers instead of std::string/std::vector in the interface" Live with it!
"parsing interface taking character arrays instead of streams"
Yes, but if you need to parse files larger than 2-3 MB you should use streams. This is not a complete solution; you will have to implement that yourself, and it is easy.
"private member functions and little information hiding" Spurious! How many private members and how much hiding is enough?

If you're going to reply to a message I've posted, please have the decency not to conflate it with other people's comments.
As for your delightful comment:
"Live with it!"
I have no intention of living with your code. Perhaps you should learn to "live with" criticism or not post it to public forums.

OK, my bad for mixing up the replies.
The "live with it" is because someone who is used to one way of coding has a philosophy, but it is neither better nor worse than mine.
They say: use new/delete, not malloc/free, because those are error-prone and that is the old way; this is the new way, the C++ way. Well, the answer is this:
If used incorrectly, everything is error-prone. In the new/malloc case the risk is equal. I am not the only one using and supporting the 'old functions';
here is just one of them: http://www.scs.stanford.edu/~dm/home/papers/c++-new.html
The same goes for char arrays: "they are error-prone, use std::string", because that is what you are comfortable with. Well, I am comfortable using char arrays. The end. I've had enough. From now on I will disregard anyone who rants about it without pointing out the error in the code that char arrays are supposedly so prone to!
I like criticism, but what you wrote is not criticism; it is ranting. Please, when you criticize me, try to contrast wrong/right and bad/better, with a reasonably complete explanation. I do not accept it otherwise!
If I have a function in my compiler's library, then I am entitled to use it as I see fit, and unless proven wrong I will continue to do so.
EDIT:
OK, I see you edited the first post; it is now closer to criticism than to a rant!
I will study it now.

Sorry, I'll edit it again to conform to your expectations.

Not implementing XML properly is unacceptable and inexcusable, especially considering that it isn't done as a private hack that doesn't hurt other people but on a site that is supposed to teach people how to program.
EDIT: In case it isn't obvious, I refer to not supporting comments, which implies not knowing how to design a parser in general, and not supporting actual Unicode, which destroys any hope of conformance and reliability.
EDIT 2: I looked at the code. Many more issues are apparent:
- including a 19 MB IntelliSense database that any user would either rebuild (if using Visual C++) or ignore (if not);
- using C++ for classes, but with eminently unsafe C strings, malloc(), memset(), etc.;
- bad spelling (UBKNOWN, GetParrent(), etc.);
- limited-size buffers like _TCHAR szAttrValBuff[256] = {0}; without checking, and further buffer overflow pearls like memset(szAttrValBuff,0,80*sizeof(_TCHAR)); and the use of lstrcat();
- even if length limits on names and attribute values were implemented safely, they would still be a blatant violation of the XML specification;
- no check on the sets of allowed characters in names, whitespace etc., only for delimiter characters;
- totally wrong attribute delimitation, misinterpreting single quotes inside doubly quoted attribute values and vice versa as ending delimiters without remembering how the attribute value is quoted:
if(c == CQUOTE || c == CDQUOTE)
{
    c = szXML[++k];
}
while(c != CQUOTE && c != CDQUOTE)
{
    if(c != CNEW && c != CTAB && c != CRET)
        concat(szAttrValBuff, c);
    c = szXML[++k];
}
- no support for CDATA sections;
- no support for processing instructions;
- no support for DTDs and entities.
Entities, attribute quoting and individual character validation are extremely important in practice; disregarding unlimited lengths, CDATA sections and processing instructions is a typical mistake of recurring attempts to make XML processing "simple" and "efficient"; deliberate and widespread memory corruption issues are just horrible.
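To illustrate the entity point, here is a minimal sketch, assuming a hypothetical `expandPredefinedEntities` helper, that expands only the five entities XML 1.0 predefines. Character references (`&#...;`) and DTD-declared entities are deliberately out of scope and raise an error instead of being passed through silently.

```cpp
#include <stdexcept>
#include <string>

// Expands &lt; &gt; &amp; &quot; &apos; in character data.
// Sketch only: any other entity or character reference throws.
std::string expandPredefinedEntities(const std::string& text) {
    static const struct { const char* name; char ch; } kEntities[] = {
        {"lt", '<'}, {"gt", '>'}, {"amp", '&'}, {"quot", '"'}, {"apos", '\''}};

    std::string out;
    out.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] != '&') { out.push_back(text[i]); continue; }
        const std::string::size_type semi = text.find(';', i + 1);
        if (semi == std::string::npos)
            throw std::runtime_error("unterminated entity reference");
        const std::string name = text.substr(i + 1, semi - i - 1);
        bool found = false;
        for (const auto& e : kEntities) {
            if (name == e.name) { out.push_back(e.ch); found = true; break; }
        }
        if (!found) throw std::runtime_error("unsupported entity: " + name);
        i = semi;  // the loop increment then moves past ';'
    }
    return out;
}
```

Because each reference is expanded exactly once, `&amp;lt;` correctly becomes the literal text `&lt;` rather than `<`.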
modified on Tuesday, September 28, 2010 3:20 AM

Thank you for the input!
This is, in my opinion, the first constructive criticism that I have received on this project. A +5 rating from me. I will address the problems you posted as soon as possible.
Thank you very much!

OK, as a result of your constructive criticism, the following problems have been addressed:
- UNICODE: I simply compiled with the Unicode character set and tested. It worked on the first try with no modification! However, it does not work for both Unicode AND ASCII at once; it depends on how it is compiled. I will find a way to improve this in the near future.
- Wrong attribute delimitation (misinterpreting single quotes inside double-quoted attribute values and vice versa) has been fixed:
if(c == CQUOTE || c == CDQUOTE)
{
    c = szXML[++k];
}
while(c != CQUOTE && c != CDQUOTE)
{
    if(c != CNEW && c != CTAB && c != CRET)
        concat(szAttrValBuff, c);
    c = szXML[++k];
}
became:
if(c == CQUOTE || c == CDQUOTE)
{
    cDelim = c;
    c = szXML[++k];
}
while(c != cDelim && cDelim != 0 && c != 0) // c != 0: stop at the end of the buffer if the closing quote is missing
{
    if(c != CNEW && c != CTAB && c != CRET)
        concat(szAttrValBuff, c);
    c = szXML[++k];
}
cDelim = 0;
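The loop above still appends into the fixed 256-character `szAttrValBuff` and appears to drop tabs and line breaks entirely. A bounds-safe sketch using `std::string`, with `readAttributeValue` as a hypothetical helper: it remembers the opening quote, rejects `<` as the XML specification requires, and turns tab, LF and CR into spaces (as XML attribute-value normalization does) instead of deleting them. Entity references are left unexpanded here.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: reads a quoted attribute value whose opening quote is at pos.
// On success, pos is left just past the closing quote.
std::string readAttributeValue(const std::string& xml, std::string::size_type& pos) {
    if (pos >= xml.size() || (xml[pos] != '"' && xml[pos] != '\''))
        throw std::runtime_error("attribute value must be quoted");

    const char delim = xml[pos];  // remember which quote opened the value
    const std::string::size_type end = xml.find(delim, pos + 1);
    if (end == std::string::npos)
        throw std::runtime_error("unterminated attribute value");

    std::string value;
    for (std::string::size_type i = pos + 1; i < end; ++i) {
        const char c = xml[i];
        if (c == '<')
            throw std::runtime_error("'<' is not allowed in an attribute value");
        if (c == '\r' && i + 1 < end && xml[i + 1] == '\n')
            continue;  // CR LF counts as a single line break
        // Normalize tab, LF and CR to a space instead of silently dropping them.
        value.push_back((c == '\t' || c == '\n' || c == '\r') ? ' ' : c);
    }
    pos = end + 1;
    return value;
}
```

Since the other quote character is never compared against `delim`, `'say "hi"'` and `"it's"` both come out intact.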
- The IntelliSense file has been removed. My apologies for that!
Again, thanks for taking the time to review my project. I will work on all the other problems you pointed out!

Added run-time UNICODE support.
From my tests it appears to work properly, but only time will expose any hidden bugs. This implementation created some problems: the project MUST be compiled with the Unicode character set in Visual Studio, and is therefore not portable to other compilers. Since my experience with other C++ compilers is limited, I will leave it as is for now, but I plan to replace the _TCHAR macro with the appropriate type in the future.
I am currently working on implementing CDATA sections.
The problem I have is the lack of test files. I am working on this also.
Will keep you guys posted on the progress.
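A minimal sketch of recognizing CDATA sections and comments (the latter was also flagged as missing earlier in the thread), assuming the document is already in a `std::string` and `pos` points at a `<`. `MarkupSection` and `scanSection` are hypothetical names. CDATA content is returned verbatim, and a comment containing `--` is rejected, as the XML specification requires.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result type for one markup construct that starts with '<'.
struct MarkupSection {
    enum Kind { Comment, CData, Other } kind;
    std::string content;  // comment body, or raw CDATA text
};

// True if s contains prefix starting at pos (pos must be <= s.size()).
static bool startsWithAt(const std::string& s, std::string::size_type pos,
                         const std::string& prefix) {
    return s.compare(pos, prefix.size(), prefix) == 0;
}

// Hypothetical scanner: if a comment or CDATA section starts at pos, returns it and
// advances pos past it; otherwise returns Other and leaves pos unchanged.
MarkupSection scanSection(const std::string& xml, std::string::size_type& pos) {
    const std::string cdataOpen = "<![CDATA[";
    const std::string commentOpen = "<!--";

    if (startsWithAt(xml, pos, cdataOpen)) {
        const auto start = pos + cdataOpen.size();
        const auto end = xml.find("]]>", start);  // nothing inside CDATA is markup
        if (end == std::string::npos)
            throw std::runtime_error("unterminated CDATA section");
        pos = end + 3;
        return {MarkupSection::CData, xml.substr(start, end - start)};
    }
    if (startsWithAt(xml, pos, commentOpen)) {
        const auto start = pos + commentOpen.size();
        const auto end = xml.find("--", start);
        if (end == std::string::npos)
            throw std::runtime_error("unterminated comment");
        if (!startsWithAt(xml, end, "-->"))  // "--" may only appear as part of "-->"
            throw std::runtime_error("'--' is not allowed inside a comment");
        pos = end + 3;
        return {MarkupSection::Comment, xml.substr(start, end - start)};
    }
    return {MarkupSection::Other, std::string()};
}
```

A CDATA section like `<![CDATA[a < b && c]]>` yields the text `a < b && c` unchanged, with no entity expansion or tag detection inside it.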

OK, as I promised, I have revised the project and corrected most of the issues that you found.
Still, DTDs and XSD are not yet supported. I tried, but it bit me. I will not give up, but for now I will just make the parser ignore them.
For the purpose for which I originally posted this project, I think it is complete. I did not try to make a W3C-compliant project, nor do I think that I can in the future.

If you don't understand that DTDs are a necessary foundation of XML, that standards imply compliance rather than implementing whatever you feel like implementing, and that your project is far from "complete", I can only feel sad for you and your career prospects as a programmer.

Sorry, I don't agree with either the "My Vote of 1" or the "My Vote of 2" message. Both seemed spurious.
If a C++ programmer does NOT know how to handle an ASCII-Z string, then they're in serious trouble anyway and should probably be using VB, C# or Java.
In terms of C++ objects instead of the C RTL, sometimes it's just a plain matter of speed. For example, we had a simple XML-like parser for financial trading data that used quite a bit of STL. We profiled the program and learned that 80% of our time was spent parsing, storing and accessing the messages! Converting to a C RTL approach reduced that to 5%.

Very poorly designed class. You basically do not use anything of C++ in your class. You use malloc instead of new (for no reason). Why reinvent the wheel when there are other, better XML parsers?

The reason for reinventing the wheel is in the introduction.
Oh, there's no reason for using new either! new calls the constructor, and there is no need for that here. I am used to using it for classes. Very poor design? Other than it being simple, please elaborate. Not great maybe, maybe not good either, but very poor? Why?
If you are going to throw dirt, then do it right!

Well, let's see. You use platform-specific code. No proper Unicode support (look into ICU). "It only works for ASCII" (who still only programs for ASCII? Is this 1990?). You use malloc, which belongs in C. This isn't C, unless you can really explain why you need malloc. No use of std::string/std::wstring (is there really a need to use C strings?), though I'm shocked you actually use std::list instead of rolling your own C-like interface. If you are going to write in C++, at least do it properly, because you could have designed it more efficiently than with C. Just because you can use C doesn't mean you should always use C. Also, by the way, no one uses (void) in parameter lists. By using C, you also show that it's okay and encourage it. That's why there are so many bad C++ programmers out there: you teach them to use C in C++ instead of proper C++.

Yes, I use platform-specific code! That makes it unportable, not rubbish. It is like Visual Studio: it cannot run on Linux. It is not portable, but it is not rubbish! I plan to make it portable, but I don't have the time.
The reason for malloc is that at some point I used realloc. I am also planning to use it again instead of calling free so often! I removed it during development because of some memory issues that turned out to come from somewhere else. I just forgot to reinstate the old realloc code!
It has been tested for ASCII only, but as you can see, it is fairly easy to test for UNICODE, and it will probably work as is (the classes, not the test project)! It has been designed with that in mind:
m_szValue = (_TCHAR*)malloc(l+sizeof(_TCHAR));
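Whether that allocation is large enough depends on what `l` holds: `malloc` takes a size in bytes, so `l + sizeof(_TCHAR)` is only right if `l` is already a byte count. A sketch of the character-count case, assuming a Unicode build where `_TCHAR` is `wchar_t` (both function names are hypothetical), next to the `std::wstring` equivalent suggested in the thread:

```cpp
#include <cstdlib>
#include <cwchar>
#include <string>

// Hypothetical helper, with wchar_t standing in for _TCHAR in a Unicode build.
// wcslen counts characters, so the byte size is (characters + 1 terminator)
// multiplied by sizeof(wchar_t).
wchar_t* duplicateWithMalloc(const wchar_t* src) {
    const std::size_t len = std::wcslen(src);
    auto* dst = static_cast<wchar_t*>(std::malloc((len + 1) * sizeof(wchar_t)));
    if (dst != nullptr)
        std::wmemcpy(dst, src, len + 1);  // copies the terminator as well
    return dst;                           // caller must call std::free()
}

// The std::wstring equivalent: sizing, terminator and release are handled for you,
// and c_str() still provides a const wchar_t* when a C API needs one.
std::wstring duplicateWithString(const wchar_t* src) {
    return std::wstring(src);
}
```

With `len` as a character count, `len + sizeof(wchar_t)` would under-allocate for any string longer than about one character, which is exactly the kind of silent overflow the earlier review pointed out.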
"Also, btw no one uses (void) in parameter names"
I won't dignify that.
"By using C also, you show that it's okay and encourage it. That's why there are so many bad C++ programmers out there. Because you teach them to use C in C++ instead of proper C++."
Well, if you know what you are doing, it really is OK, and I encourage it. Otherwise, go C#. Does proper C++, for you, mean using the proper libraries?

I never said it was rubbish because it wasn't portable, but it does cause points to be knocked off. You can easily just clear the string with std::string/std::wstring (what's the excuse there?). That also removes the need for malloc (and if you had used new, all you had to do was delete the C string and allocate it again). The only code that still uses (void) in parameter lists comes from C programmers with bad habits. Even the standard never requires it, so why use it? I don't care for C#, since there is no decent cross-OS GUI for it.
Proper C++ means using the right headers for the right job, in your case std::string/std::wstring over nasty, useless C strings. Why else do we have std::string/std::wstring in the first place? You can easily call c_str() on them and achieve the same effect. There are some places where C might be useful, but most of the time it's just a bad habit that nobody bothered to correct. Like in your case, where it serves no purpose and you can easily achieve the same effect.
If you don't like criticism, then don't post articles.

Sorry about my tone!
I'm trying to understand, but I don't really. I will ask some questions to clarify your point.
You're saying that I should have used something like:
basic_string <wchar_t> *m_szMyString;
instead of
_TCHAR * m_szMyString;
?

Actually, you can use std::basic_string<_TCHAR> and achieve the same effect, or simply use std::string (std::basic_string<char>) or std::wstring (std::basic_string<wchar_t>).
And yes, use those I just listed instead of what you are currently using. They offer a lot more functionality and are less error-prone. Read up on std::string/std::wstring in a C++ book.
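A sketch of what that could look like, mirroring how `<tchar.h>` picks `_TCHAR` from the `_UNICODE` macro (Visual Studio's Unicode character set defines both `UNICODE` and `_UNICODE`). `xml_char`, `xml_string` and `Node` are hypothetical names, not the article's classes.

```cpp
#include <string>

// Hypothetical character/string aliases chosen at compile time, like _TCHAR.
#ifdef _UNICODE
using xml_char = wchar_t;
#else
using xml_char = char;
#endif
using xml_string = std::basic_string<xml_char>;  // std::wstring or std::string

// Hypothetical node holding its value by value instead of `_TCHAR* m_szMyString;`
// or a pointer to a string: no malloc/free and no hand-written copy constructor.
class Node {
public:
    void setValue(const xml_char* s) { m_value = s; }
    const xml_char* value() const { return m_value.c_str(); }  // C string view when needed
    void clear() { m_value.clear(); }  // empties the string; implementations typically keep the capacity
private:
    xml_string m_value;
};
```

Because `m_value` is an object rather than a pointer, copying a `Node` copies the string, which also settles the copy-constructor question raised earlier in the thread.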
A Simple C++ XML parser with only the basic functionality
Type | Article |
Licence | CPOL |
First Posted | 21 Sep 2010 |
Views | 62,329 |
Downloads | 2,634 |
Bookmarked | 25 times |