(Net::LDAP) Automatically convert attributes into utf8 when writing

Discussion:

pe rl

2015-08-25 10:54:26 UTC

Hi, we are using an old version of Net::LDAP (0.39) in an old perl
installation (5.10.1). We recently changed the ldap server, and the new one
uses utf8 in the entry attributes, so we are getting problems when reading and
writing attributes with Net::LDAP.

To solve it, I have read the documentation of Net::LDAP 0.39 ( at
https://metacpan.org/pod/release/GBARR/perl-ldap-0.39/lib/Net/LDAP.pod ), and
I see there is an option ("raw") in the constructor to control which attributes
are treated as utf8. I have tested it, and it works for reading
from the ldap server (attribute strings are marked as utf8, so they are treated
correctly by our programs), but it doesn't work for writing (our latin1
strings are not being converted automatically into utf8 before being sent).

So it looks like the "raw" option works for reading but not for writing. Is
there any quick way to use the "raw" regex for writing as well? The alternative
would be to review all of our code and manually encode all the values to utf8
before passing them to Net::LDAP, but that would mean a lot of work. It would be
better if we could change the Net::LDAP library itself to automatically
convert attributes into utf8, the same as for reading. For example, a new
option "raw_for_writing" could be added to the constructor:

    $ldap = Net::LDAP->new(
        $server,
        port            => $port,
        raw             => qr/(?i:^jpegPhoto|;binary)/,
        raw_for_writing => 1,    # proposed option, not an existing one
    );

I see that the automatic conversion for reading is done in the "decode"
function of "Entry.pm":

    sub decode {

      ...

      if (CHECK_UTF8 && $arg{raw}) {
        $result->{objectName} = Encode::decode_utf8($result->{objectName})
          if ('dn' !~ /$arg{raw}/);

      ...

        foreach my $elem (@{$self->{asn}{attributes}}) {
          map { $_ = Encode::decode_utf8($_) } @{$elem->{vals}}
            if ($elem->{type} !~ /$arg{raw}/);
        }
      }

And I see that there is an "encode" function in "Entry.pm", that doesn't do
the magic:

    sub encode {
      $LDAPEntry->encode( shift->{asn} );
    }

Would it be sufficient to add some similar code to the "Entry::encode"
function in order to automatically encode attributes to utf8 before being sent?

Any suggestion to reduce the amount of code to be changed in our programs?

Thank you

Keutel, Jochen (mlists)

2015-08-25 11:04:21 UTC

Hello,
instead of patching Net::LDAP you should use utf8::encode() and
utf8::decode() in your perl code.

See http://perldoc.perl.org/5.10.1/utf8.html .
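
For a single value, that suggestion could look roughly like the sketch below. It assumes the application holds Latin-1 strings; the variable names are illustrative and not taken from the original code.

```perl
use strict;
use warnings;

# A Latin-1 string as a legacy application might hold it: one byte per character.
my $name = "Jos\xE9";            # "José", 4 characters

# utf8::encode() converts the string in place to its UTF-8 byte sequence.
my $wire = $name;
utf8::encode($wire);             # now "Jos\xC3\xA9", 5 bytes

# utf8::decode() reverses it, turning UTF-8 bytes back into characters.
my $back = $wire;
utf8::decode($back);

printf "%d %d %s\n", length($name), length($wire),
    ($back eq $name ? "round-trip ok" : "mismatch");
```

Both functions modify their argument in place, which is why the sketch works on copies.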

Regards, Jochen.

pe rl

2015-08-25 11:37:15 UTC

Thank you, I already knew utf8::encode() and utf8::decode().

They are not necessary when reading/searching in the ldap server, since
Net::LDAP already has a "raw" option in the constructor to automatically
encode/decode strings. It is working for us, and the only change required has
been to add the "raw" option to the constructor.

The problem appears when writing to the ldap server. I have started to modify
our code with utf8::encode(), adding it to every attribute in all of our
functions. The problem is that this is very laborious, since I will have to
touch every attribute that appears in our programs. We have a lot of functions
that create/modify/delete entries in the ldap server, so I will have to change
a lot of code to manually encode attribs to utf8, and then test all of the
changes.

It would be much simpler if Net::LDAP automatically encoded the
attributes using the regex passed in the "raw" option of the constructor,
since no changes would be needed in our programs. In my first message I pasted
the code in Net::LDAP that decodes the attributes when reading from the ldap
server, and it looks simple. Encoding attributes when writing to the
ldap server could probably be just as simple, and the changes required in
Net::LDAP are probably minimal compared to the changes required in our code.
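
Short of patching Net::LDAP, one way to keep the application changes small might be a single helper applied at each write call. The sketch below is only an illustration of that idea: `encode_attrs` and `$RAW` are hypothetical names, not part of Net::LDAP, and it assumes every non-binary value is a Latin-1 string that has not already been encoded.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical: the same regex that is passed as "raw" to the constructor.
my $RAW = qr/(?i:^jpegPhoto|;binary)/;

# Return a copy of an attribute hash with every value encoded to UTF-8 bytes,
# skipping attributes whose name matches $RAW (binary data stays untouched).
sub encode_attrs {
    my (%attrs) = @_;
    my %out;
    for my $type (keys %attrs) {
        my $val  = $attrs{$type};
        my @vals = ref $val eq 'ARRAY' ? @$val : ($val);
        @vals = map { Encode::encode('UTF-8', $_) } @vals
            unless $type =~ $RAW;
        $out{$type} = ref $val eq 'ARRAY' ? \@vals : $vals[0];
    }
    return %out;
}

my %encoded = encode_attrs(
    cn        => "Jos\xE9",
    sn        => [ "Mu\xF1oz", "Munoz" ],
    jpegPhoto => "\xFF\xD8\xFF",
);
```

The result could then be handed to the existing write calls, for example as `attrs => [ %encoded ]` in `$ldap->add`. Values that were already encoded would be encoded twice, so the helper is only safe if it is applied exactly once per value.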

Thank you

Peter Marschall

2015-08-29 11:54:15 UTC

Hi,

Post by pe rl
They are not necessary when reading/searching in the ldap server, since
Net::LDAP already has a "raw" option in the constructor to automatically
encode/decode strings. It is working for us, and the only change required
has been to add the "raw" option to the constructor.

I think you misinterpret the purpose of the raw option.

Its goal is to convert the byte strings coming from the LDAP server that
represent UTF-8 encoded directory strings from byte semantics to
Perl scalars with character semantics.

On the other hand, perl-ldap expects scalars in character semantics when
it comes to writing directory strings to an LDAP server.

It is not perl-ldap's job to translate between scalars in Perl's character
semantics and various input or output encodings of your application.
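
The difference between the two semantics can be seen in a small sketch. The sample value is made up for illustration and nothing here is specific to perl-ldap.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 bytes as they might arrive from an LDAP server: "Müller" is 7 bytes.
my $bytes = "M\xC3\xBCller";

# Decoding yields a scalar with character semantics: 6 characters.
my $chars = Encode::decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
```

The same name has a different length depending on whether the scalar holds encoded bytes or decoded characters, which is why mixing the two inside one application leads to trouble.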

Post by pe rl
The problem appears when writing to the ldap server. I have started to
modify our code with utf8::encode(), by adding it to every attribute in all
of our functions. The problem is that it is very inefficient, since I will
have to modify every attribute that appears in our programs. We have a lot
of functions that create/modify/delete entries in the ldap server, so I
will have to change a lot of code to manually encode attribs to utf8, and
then test all of the changes.

It is not perl-ldap's job to translate between scalars in Perl's character
semantics and various input or output encodings of your application.

This is the application's task.
If you - as you write - need to convert every attribute using utf8::encode(),
then your application seems to use a mixture of byte & character semantics.

In that case please do yourself a favour and switch over to character
semantics by correctly converting input to character semantics as soon as it
enters the program:
- for file & console input you can use the ":encoding(...)" layer to make
  sure you get character semantics instead of byte semantics
- for @ARGV a simple
    $_ = Encode::decode('UTF-8', $_) for @ARGV;
  should be sufficient.
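
Both points can be sketched together as follows. The in-memory "file", its ISO-8859-1 encoding and the simulated argument are assumptions made so that the example is self-contained; a real program would open its actual files and name their actual encoding.

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for a Latin-1 encoded input file (in-memory for the example).
my $latin1_file = "Jos\xE9\n";
open my $in, '<:encoding(ISO-8859-1)', \$latin1_file or die "open: $!";
chomp(my $line = <$in>);
close $in;

# Command-line arguments arrive as bytes; decode them once at startup.
# Assumes the arguments are UTF-8 encoded.
@ARGV = ("Jos\xC3\xA9");         # simulated argument for the example
$_ = Encode::decode('UTF-8', $_) for @ARGV;

# Both values are now the same 4-character string.
print "same\n" if $line eq $ARGV[0];
```

Once every input is decoded at the boundary like this, the rest of the program only ever sees character strings, whatever encoding each source used.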

You may also have a look at the 'utf8::all' package that does a lot of the
above for you automatically.

Please read the perlunicode manual page for more detailed information.

Best
Peter

--
Peter Marschall
***@adpm.de

pe rl

2015-08-31 07:42:09 UTC

Thank you for your information.

In the end I added "utf8::encode" to all the attribs, so now it works.

Converting our code (@_ and file i/o) to utf8 was an option, but I discarded it because we have a lot of files (our project is nearly a framework, not just a few files), including modules that read translation string files for several languages, so converting everything to utf8 would be a lot of extra work.

Our project is rather old; it was created back when utf8 was not yet in use. This is the reason why it is so difficult for us to convert everything to utf8. Anyway, I believe we will have to convert it some day, as you proposed.

Thank you
