Crypt::OpenSSL:X509 and UTF-8 strings

Bumped to top due to updates.

For my current project I look at a lot of X.509 certificates using Dan Sully’s Crypt::OpenSSL:X509 Perl module. I’m not using the version from CPAN, but his current codebase straight from his git repository.

While trying to store information about certs in a PostgreSQL DB which is set to UTF-8 strings, I encountered errors. Some debugging later I found that some of the certs had Umlauts in the subject field. The XS code from Crypt::OpenSSL:X509 wasn’t UTF-8 aware, causing automatic down-conversion to ISO-8859-1, which produced illegal byte sequence when parsed as UTF-8.

After some cursing and debugging I came up with this patch:


--- ../dsully-perl-crypt-openssl-x509/X509.xs 2009-03-06 22:22:44.000000000 +0100
+++ X509.xs 2009-08-17 14:46:00.000000000 +0200
@@ -73,6 +73,15 @@
return sv;
}

+static SV* sv_bio_utf8_on(BIO *bio) {
+
+ SV* sv;
+ sv = (SV *)BIO_get_callback_arg(bio);
+ SvUTF8_on(sv);
+ return sv;
+}
+
+
/*
static void sv_bio_error(BIO *bio) {

@@ -293,8 +302,10 @@
name = X509_get_issuer_name(x509);
}

+ /* this need not be pure ascii, try to get a native perl character string with utf8 */
+ sv_bio_utf8_on(bio);
/* this is prefered over X509_NAME_oneline() */
- X509_NAME_print_ex(bio, name, 0, XN_FLAG_SEP_CPLUS_SPC);
+ X509_NAME_print_ex(bio, name, 0, (XN_FLAG_SEP_CPLUS_SPC | ASN1_STRFLGS_UTF8_CONVERT) & ~ASN1_STRFLGS_ESC_MSB);

} else if (ix == 3) {

@@ -799,7 +810,8 @@
n = OBJ_nid2sn(nid);
}
BIO_printf(bio, "%s=", n);
- ASN1_STRING_print(bio, X509_NAME_ENTRY_get_data(name_entry));
+ sv_bio_utf8_on(bio);
+ ASN1_STRING_print_ex(bio, X509_NAME_ENTRY_get_data(name_entry),ASN1_STRFLGS_UTF8_CONVERT & ~ASN1_STRFLGS_ESC_MSB);
RETVAL = sv_bio_final(bio);

OUTPUT:

Basically, this just tells the openssl library to output UTF-8 and the perl core that the new strings are encoded in UTF-8.

This might be overkill in the cases where it’s not actually needed, but it should do no harm.

Update: My patch is now in the git repository.

Update2: Life is not that easy. Looking at more X.509 certs in the wild shows that openssl does not check whether it returns a valid UTF8 string. So stay tuned for additional patches in Dan’s git repository.

Update3: My patches are now integrated in the git version.